KR102034968B1

KR102034968B1 - Method and apparatus of image processing

Info

Publication number: KR102034968B1
Application number: KR1020180155767A
Authority: KR
Inventors: 김문철; 김수예; 김대은
Original assignee: 한국과학기술원
Priority date: 2017-12-06
Filing date: 2018-12-06
Publication date: 2019-10-21
Also published as: KR20190067113A

Abstract

An image processing method according to an embodiment includes: receiving an image; decomposing the image by performing a first convolution operation; performing a second convolution operation to extract a first feature map from one part of the decomposed image and a second feature map from another part of the decomposed image; and generating an HDR image by performing a third convolution operation on the first feature map and the second feature map.

Description

Image processing method and apparatus {METHOD AND APPARATUS OF IMAGE PROCESSING}

The embodiments below relate to an image processing method and apparatus.

The human visual system perceives finer detail than a standard dynamic range (SDR) display typically provides, and perceives the world as much brighter and with stronger contrast.

Recently released high dynamic range (HDR) consumer displays offer a peak brightness of 1000 cd/m² or more (compared with 100 cd/m² for SDR displays), a high contrast ratio, an increased bit depth of 10 bits or more, and a wide color gamut (WCG). However, while HDR TVs are readily available on the market, HDR content is seriously lacking.

Inverse tone mapping (ITM), also called reverse tone mapping, is a field of research in computer graphics that aims to predict HDR images from low dynamic range (LDR) images for better graphics rendering.

Another area of research, HDR imaging, uses multiple LDR images taken at different exposures to create a single HDR image that contains detail in saturated regions. In both areas, lighting calculations are performed in the HDR domain in the belief that doing so represents graphical or natural scenes more accurately on SDR displays. During rendering, the HDR images can be viewed on a professional HDR monitor.

As a result, the HDR domain referred to in these fields is not necessarily the same as that of currently available HDR consumer displays, and HDR images generated by ITM methods for such purposes are not suitable for viewing on an HDR TV.

When a conventional ITM method is applied to an HDR TV with a peak brightness of 1000 cd/m², the HDR capacity cannot be fully utilized because contrast or detail is weak or noise is amplified.

In other words, conventional inverse tone mapping methods convert an LDR image to HDR in order to work in the HDR domain for more natural graphics rendering, render mainly in the HDR domain, and finally tone-map the result back to LDR for viewing. An inverse tone-mapped HDR image is therefore not suitable for viewing directly on an HDR TV.

HDR images obtained with the prior art exhibit amplified noise in dark regions, or fail to express the contrast or detail of the image sufficiently.

The prior art also does not consider the color container (for example, BT.2020) or the transfer function (for example, PQ-EOTF), so its output must be converted manually before it can be used on an HDR TV.
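As a non-limiting illustration of the kind of transfer function mentioned above, the following sketch encodes absolute luminance with the PQ curve and decodes it again. The constants are those of SMPTE ST 2084; the function names `pq_encode` and `pq_decode` are hypothetical and do not appear in the embodiments, which do not specify how the conversion is implemented.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK = 10000.0  # luminance in cd/m^2 represented by a PQ signal of 1.0


def pq_encode(luminance):
    """Map absolute luminance in cd/m^2 to a PQ signal in [0, 1] (inverse EOTF)."""
    y = min(max(luminance, 0.0), PEAK) / PEAK
    yp = y ** M1
    return ((C1 + C2 * yp) / (1.0 + C3 * yp)) ** M2


def pq_decode(signal):
    """Map a PQ signal in [0, 1] back to luminance in cd/m^2 (EOTF)."""
    ep = min(max(signal, 0.0), 1.0) ** (1.0 / M2)
    y = (max(ep - C1, 0.0) / (C2 - C3 * ep)) ** (1.0 / M1)
    return PEAK * y
```

With this curve, the 100 cd/m² of an SDR display and the 1000 cd/m² of an HDR TV land at roughly one half and three quarters of the signal range, which is why linear or gamma-encoded output cannot simply be shown on an HDR TV without conversion.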

Embodiments may provide a technique for processing an LDR image into an HDR image.

The generating of the HDR image may include combining the first feature map and the second feature map, and generating the HDR image by performing the third convolution operation on the combined first and second feature maps.

The image processing method may further include training filter parameters of the first, second, and third convolution operations.

The training may include: training the filter parameters of the second and third convolution operations based on the image; after that, training the filter parameters of the first convolution operation; and after that, training the filter parameters of the first to third convolution operations.

The training of the filter parameters of the second and third convolution operations may include: generating one part of the decomposed image by filtering the image; generating another part of the decomposed image based on the image and the one part; and training the filter parameters of the second and third convolution operations based on the one part and the other part.

The generating of the one part may include generating the one part by applying edge-preserving filtering to the image.

The generating of the other part may include generating the other part by performing element-wise division on the image and the one part.

The training of the filter parameters of the first to third convolution operations may include training, end-to-end, a network in which the trained parameters of the first convolution operation are combined with the trained filter parameters of the second and third convolution operations.

An image processing apparatus according to an embodiment includes a receiver that receives an image, and a processor that decomposes the image by performing a first convolution operation, performs a second convolution operation to extract a first feature map from one part of the decomposed image and a second feature map from another part of the decomposed image, and generates an HDR image by performing a third convolution operation on the first feature map and the second feature map.

The processor may combine the first feature map and the second feature map, and generate the HDR image by performing the third convolution operation on the combined first and second feature maps.

The processor may train filter parameters of the first, second, and third convolution operations.

The processor may train the filter parameters of the second and third convolution operations based on the image, then train the filter parameters of the first convolution operation, and then train the filter parameters of the first to third convolution operations.

The processor may generate one part of the decomposed image by filtering the image, generate another part of the decomposed image based on the image and the one part, and train the filter parameters of the second and third convolution operations based on the one part and the other part.

The processor may generate the one part by applying edge-preserving filtering to the image.

The processor may generate the other part by performing element-wise division on the image and the one part.

The processor may train, end-to-end, a network in which the trained parameters of the first convolution operation are combined with the trained filter parameters of the second and third convolution operations.

FIG. 1 is a schematic block diagram of an image processing apparatus according to an embodiment.
FIG. 2 illustrates the structure of a neural network used by the image processing apparatus shown in FIG. 1.
FIG. 3 illustrates an operation of training the neural network shown in FIG. 2.
FIG. 4A illustrates an example of the LDR image shown in FIG. 3.
FIG. 4B illustrates an example of the base layer shown in FIG. 3.
FIG. 4C illustrates an example of the detail layer shown in FIG. 3.
FIG. 5A illustrates an example of a network that decomposes an image.
FIG. 5B illustrates another example of a network that decomposes an image.
FIG. 5C illustrates yet another example of a network that decomposes an image.
FIG. 6A illustrates an example of an unprocessed original image.
FIG. 6B illustrates an example of an image processed by the image processing apparatus.
FIG. 7A illustrates another example of an unprocessed original image.
FIG. 7B illustrates another example of an image processed by the image processing apparatus.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, various changes may be made to the embodiments, and the scope of the patent application is not restricted or limited by these embodiments. All changes, equivalents, and substitutes for the embodiments should be understood as falling within the scope of rights.

The terms used in the embodiments are for description only and should not be construed as limiting. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification exists, and should be understood as not excluding in advance the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Terms such as first and second may be used to describe various components, but the components should not be limited by those terms. The terms serve only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the embodiments, a first component may be called a second component, and similarly a second component may be called a first component.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the embodiments belong. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and shall not be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

In the description with reference to the accompanying drawings, the same components are given the same reference numerals regardless of the drawing, and redundant descriptions of them are omitted. In describing the embodiments, detailed descriptions of related known art are omitted where they would unnecessarily obscure the gist of the embodiments.

FIG. 1 is a schematic block diagram of an image processing apparatus according to an embodiment.

Referring to FIG. 1, the image processing apparatus 10 may receive and process an image. An image includes the likeness of an object formed by the refraction or reflection of light, and may refer to a representation of the shape of an object using lines or colors. For example, an image may consist of information in a form that a computer can process.

The image processing apparatus 10 may process a low dynamic range (LDR) image 400 (hereinafter referred to as the received image or the image) to generate a high dynamic range (HDR) image.

The image processing apparatus 10 may process the received image using a neural network. For example, the image processing apparatus 10 may generate the HDR image 500 by performing inverse tone mapping on the received image using a neural network.

Tone mapping may refer to a technique used in image processing and computer graphics to map one set of colors to another in order to approximate the appearance of the HDR image 500 on a medium with a limited dynamic range.

Prints, displays, and projectors all have a limited dynamic range and may therefore be unable to reproduce the full range of light intensities present in natural scenes. Tone mapping may address the problem of strongly reducing contrast from the scene's luminance to the displayable range while preserving the image detail and color appearance that matter for appreciating the original scene content.

Inverse tone mapping (ITM) is the opposite of tone mapping, and may refer to generating the HDR image 500 from the LDR image 400.

The image processing apparatus 10 extracts features of the input LDR image through a deep convolutional network and reconstructs an HDR image from them. Unlike conventional inverse tone mapping methods intended for computer graphics, it can thereby produce HDR images for viewing on an HDR TV.

The convolutional network used by the image processing apparatus 10 may first decompose the LDR image with a filter (for example, a guided filter) and use the decomposed image to train the latter part of the convolutional neural network. Then, a plurality of convolutional layers may be added in place of the filter, only the added layers may be pre-trained, and finally the entire convolutional neural network may be trained.

Through this training scheme, the image processing apparatus 10 may effectively train the end-to-end convolutional network structure and ultimately obtain HDR images of improved quality.

The image processing apparatus 10 includes a receiver 100 and a processor 200. The image processing apparatus 10 may further include a memory 300.

The receiver 100 may receive an image. For example, the image may include the LDR image 400. The receiver 100 may receive the image from outside the image processing apparatus 10 or from the memory 300.

The processor 200 may process the image by performing convolution operations. Specifically, the processor 200 may generate the HDR image 500 by processing the image through three parts of convolution operations.

The three parts may include a part that decomposes the image, a part that extracts features from the decomposed image, and a part that generates the HDR image 500 using the extracted features.

The processor 200 may decompose the image by performing a first convolution operation. The processor 200 may perform a second convolution operation to extract a first feature map from one part of the decomposed image and a second feature map from another part of the decomposed image.

The processor 200 may generate the HDR image 500 by performing a third convolution operation on the first feature map and the second feature map. The processor 200 may combine the first feature map and the second feature map, and generate the HDR image 500 by performing the third convolution operation on the combined first and second feature maps.

The processor 200 may train the neural network. For example, the processor 200 may train the filter parameters of the first, second, and third convolution operations, and may do so in a predetermined order.

For example, the processor 200 may train the filter parameters of the second and third convolution operations based on the image. After training the filter parameters of the second and third convolution operations, the processor 200 may train the filter parameters of the first convolution operation.

In training the filter parameters of the second and third convolution operations, the processor 200 may generate one part of the decomposed image by filtering the received image. For example, the processor 200 may generate the one part of the decomposed image by applying edge-preserving filtering to the received image.

The processor 200 may generate another part of the decomposed image based on the image and the one part of the decomposed image. For example, the processor 200 may generate the other part of the decomposed image by performing element-wise division on the image and the one part of the decomposed image.

The processor 200 may train the filter parameters of the second and third convolution operations based on the one part and the other part of the decomposed image. The operation in which the processor 200 trains the second and third convolution operations using the one part and the other part of the decomposed image is described in detail with reference to FIG. 3.

After training the filter parameters of the first convolution operation, the processor 200 may train the filter parameters of the first to third convolution operations. The processor 200 may train, end-to-end, a network in which the trained parameters of the first convolution operation are combined with the trained filter parameters of the second and third convolution operations.

The memory 300 may store instructions to be executed by the processor 200, filter parameters of the neural network, feature maps, and the like. The memory 300 may also store the LDR image 400 and the HDR image 500.

FIG. 2 illustrates the structure of a neural network used by the image processing apparatus shown in FIG. 1.

Referring to FIG. 2, the image processing apparatus 10 may generate the HDR image 500 by processing the LDR image 400 using a convolutional neural network. The convolutional neural network may consist of three parts.

The first part may be an LDR decomposition part 210, the second part may be a feature extraction part 230, and the third part may be an HDR reconstruction part 250.

In tone mapping, an edge-preserving filter (for example, a bilateral filter) is applied to the HDR input to decompose the image into a base layer and a detail layer, so that only the base layer is compressed while the detail layer is preserved. The processed base layer and the detail layer are then combined to obtain the final LDR output image.
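The base and detail idea described above can be sketched on a one-dimensional signal as follows. This is a minimal illustration under stated assumptions, not the method of any particular tone mapping operator: a plain moving average stands in for the edge-preserving filter, a power law with a hypothetical exponent `gamma` stands in for the base-layer compression, and the helper names are invented for the example.

```python
def moving_average(values, radius):
    """Sliding-window mean, clipped at the borders (stand-in for an edge-preserving filter)."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def tone_map(hdr, radius=1, gamma=0.5):
    """Compress only the base layer of a positive HDR signal and keep the detail layer."""
    base = moving_average(hdr, radius)
    detail = [x / b for x, b in zip(hdr, base)]      # detail = input / base
    compressed = [b ** gamma for b in base]          # only the base layer is compressed
    return [c * d for c, d in zip(compressed, detail)]


hdr_signal = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
ldr_signal = tone_map(hdr_signal)
```

Because the detail layer is carried through unchanged, local ratios between a sample and its neighbourhood survive while the overall range shrinks. With `gamma=1.0` the base layer is left uncompressed and the input is recovered.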

Conversely, the purpose of an ITM algorithm may be to generate the output HDR image 500 by predicting the lost detail together with an expanded base layer that matches the desired brightness. Once the image is decomposed into two parts with different characteristics, each branch can be processed appropriately for a more accurate prediction of the output image.

For this ITM processing, the image processing apparatus 10 may process the image using a neural network that includes the LDR decomposition part 210, the feature extraction part 230, and the HDR reconstruction part 250, as in the example of FIG. 2.

The LDR decomposition part 210 may decompose the received image into a plurality of channels (or feature maps). For example, the last layer of the LDR decomposition part 210 may have six output channels (for example, decomposed images).

The LDR decomposition part 210 may include a plurality of convolutional layers. For example, the LDR decomposition part 210 may consist of three convolutional layers.

The LDR decomposition part 210 may perform the first convolution operation to decompose each received image 400 into at least two different sets of feature maps. Each set may include a plurality of feature maps (for example, decomposed images). For example, the LDR decomposition part 210 may decompose the image into the first three feature maps (for example, one part of the decomposed image) and the last three feature maps (for example, the other part of the decomposed image). That is, the other part of the decomposed image may be what remains of the decomposed image after the one part is excluded.

The feature extraction part 230 may extract features from the decomposed image. Using a plurality of convolution branches, the feature extraction part 230 may extract features separately from the one part and the other part of the decomposed image.

Each convolution branch may include a plurality of convolutional layers and may perform the second convolution operation while focusing on the characteristics of its own input (for example, the one part or the other part of the decomposed image). By performing the second convolution operation, the feature extraction part 230 may generate a first feature map and a second feature map that contain the features of the respective parts of the decomposed image.

Finally, the HDR reconstruction part 250 may perform the third convolution operation to generate the HDR image 500 from the extracted features (for example, the first feature map and the second feature map). The HDR reconstruction part 250 may concatenate the feature maps extracted by the convolution branches and learn, through a plurality of convolutional layers, how to integrate them to finally produce the HDR image 500.
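The data flow through the three parts may be sketched as follows. This is a simplified illustration under stated assumptions: each part is reduced to a single 1x1 (pointwise) convolution with random, untrained weights, whereas the embodiments describe a plurality of convolutional layers per part; the branch width of 8 channels and all function and key names are hypothetical. Only the channel counts (3 in, 6 after decomposition, split 3 and 3) follow the description.

```python
import random


def pointwise_conv(pixels, weights):
    """1x1 convolution: apply the same channel-mixing matrix to every pixel."""
    return [[sum(w * c for w, c in zip(row, px)) for row in weights] for px in pixels]


def relu(pixels):
    return [[max(0.0, c) for c in px] for px in pixels]


def random_weights(out_channels, in_channels, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_channels)] for _ in range(out_channels)]


def forward(ldr_pixels, params):
    # Part 1: LDR decomposition, 3 input channels -> 6 decomposed channels.
    decomposed = pointwise_conv(ldr_pixels, params["decompose"])
    part_a = [px[:3] for px in decomposed]   # first three feature maps
    part_b = [px[3:] for px in decomposed]   # last three feature maps
    # Part 2: feature extraction, one branch per part.
    feat_a = relu(pointwise_conv(part_a, params["branch_a"]))
    feat_b = relu(pointwise_conv(part_b, params["branch_b"]))
    # Part 3: HDR reconstruction on the concatenated feature maps.
    merged = [a + b for a, b in zip(feat_a, feat_b)]
    return pointwise_conv(merged, params["reconstruct"])


rng = random.Random(0)
params = {
    "decompose": random_weights(6, 3, rng),
    "branch_a": random_weights(8, 3, rng),
    "branch_b": random_weights(8, 3, rng),
    "reconstruct": random_weights(3, 16, rng),
}
ldr_pixels = [[0.2, 0.4, 0.6], [0.9, 0.1, 0.5]]  # two RGB pixels
hdr_pixels = forward(ldr_pixels, params)
```

The image is represented as a flat list of pixels because a 1x1 convolution needs no spatial neighbourhood; a practical network would use spatial kernels and trained weights.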

The image processing apparatus 10 may optimize the LDR decomposition part 210, the feature extraction part 230, and the HDR reconstruction part 250 jointly. In addition, the image processing apparatus 10 may effectively train the feature extraction part 230 and the HDR reconstruction part 250 by decomposing the LDR image 400 using image filtering.

Hereinafter, the operation in which the image processing apparatus 10 trains the neural network is described with reference to FIGS. 3 to 4C.

FIG. 3 illustrates an operation of training the neural network shown in FIG. 2. FIG. 4A illustrates an example of the LDR image shown in FIG. 3, FIG. 4B illustrates an example of the base layer shown in FIG. 3, and FIG. 4C illustrates an example of the detail layer shown in FIG. 3.

Referring to FIGS. 3 to 4C, the image processing apparatus 10 may train the neural network. For example, the image processing apparatus 10 may train the convolutional neural network using a filter.

이미지 처리 장치(10)는 우선적으로, 특징 추출 파트(230) 및 HDR 복원 파트(250)를 사전 학습(pre-train) 시킬 수 있다. 이를 위해, LDR 분해 파트(210)를 LDR 입력에 대한 베이스 레이어(273) 및 디테일 레이어(275)를 유도 필터(guided-filter, 271)에 기초하여 분리하는 구성으로 대체할 수 있다.The image processing apparatus 10 may first pre-train the feature extraction part 230 and the HDR reconstruction part 250. To this end, the LDR decomposition part 210 may be replaced with a configuration in which the base layer 273 and the detail layer 275 for the LDR input are separated based on the guided-filter 271.

In the example of FIG. 3, the guided filter 271 may include an edge-preserving filter that does not suffer from gradient reversal artifacts. The base layer 273 may be extracted using the guided filter 271, and the detail layer 275 may be obtained by element-wise division of the LDR image 400 by the base layer 273.

The element-wise division that generates the detail layer 275 may be expressed as Equation 1.

[Equation 1]
$$I_{detail} = I_{LDR} \oslash I_{base}$$

Here, $I_{LDR}$ denotes the input LDR image 400, $I_{base}$ denotes the base layer 273, $I_{detail}$ denotes the detail layer 275, and $\oslash$ denotes the element-wise division operation. FIGS. 4A to 4C show examples of the LDR image 400, the base layer 273, and the detail layer 275, respectively.
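The base/detail separation described above can be sketched in a few lines. The following is a minimal illustration, not the implementation of the embodiment: it assumes a single-channel image, a self-guided filter (the image guides itself), and hypothetical parameter names `r`, `eps`, and `tiny`; the small constant `tiny` added to the denominator is an assumption introduced only to avoid division by zero.

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1) x (2r+1) window, replicating edge pixels at the borders."""
    h, w = x.shape
    padded = np.pad(x, r, mode="edge")
    out = np.zeros_like(x, dtype=np.float64)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2

def self_guided_filter(img, r=2, eps=1e-2):
    """Edge-preserving smoothing of a single-channel image guided by itself."""
    mean_i = box_mean(img, r)
    var_i = box_mean(img * img, r) - mean_i ** 2
    a = var_i / (var_i + eps)      # near 1 around edges, near 0 in flat areas
    b = (1.0 - a) * mean_i
    return box_mean(a, r) * img + box_mean(b, r)

def decompose(ldr, r=2, eps=1e-2, tiny=1e-6):
    """Split an LDR channel into a base layer and a detail layer."""
    base = self_guided_filter(ldr, r, eps)
    detail = ldr / (base + tiny)   # element-wise division; tiny is an assumed safeguard
    return base, detail

rng = np.random.default_rng(0)
ldr = rng.uniform(0.1, 1.0, size=(16, 16))   # hypothetical normalized LDR channel
base, detail = decompose(ldr)
```

Because the detail layer is a quotient, multiplying the base layer by the detail layer recovers the input up to the small safeguard term, which is why this separation is referred to as a multiplicative decomposition.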

After the feature extraction part 230 and the HDR reconstruction part 250 are pre-trained using the pre-training structure with guided filter 271 separation, the final convolutional neural network structure may be completed by replacing the guided filter 271 with a plurality of (e.g., three) convolution layers (e.g., the LDR decomposition part 210 of FIG. 2).

The three convolution layers of the substituted LDR decomposition part 210 may be trained with the same data, while the weights of the later layers are not updated because the learning rate of those layers is set to 0.

Through this, the convolution layers may be trained to decompose the LDR image 400 into feature maps (e.g., decomposed images), and, because the later layers have already been trained with the guided filter 271 separation, the final loss may be lowered.
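The effect of setting the learning rate of the later layers to 0 can be illustrated with a plain gradient step. This is a minimal sketch under assumptions: the dictionary keys, the toy parameter vectors, and the plain SGD update rule are hypothetical and stand in for whatever optimizer the embodiment actually uses.

```python
import numpy as np

def sgd_step(params, grads, learning_rates):
    """One plain SGD update with a separate learning rate per network part."""
    return {name: params[name] - learning_rates[name] * grads[name] for name in params}

# Toy parameters and gradients for the three parts of the network.
params = {
    "decomposition": np.ones(3),
    "feature_extraction": np.ones(3),
    "reconstruction": np.ones(3),
}
grads = {name: np.full(3, 0.5) for name in params}

# Only the decomposition layers learn; the pre-trained later layers are frozen.
rates = {"decomposition": 1e-4, "feature_extraction": 0.0, "reconstruction": 0.0}
updated = sgd_step(params, grads, rates)
```

With a rate of 0 the gradient still flows through the later layers to the decomposition layers, but the later weights themselves stay exactly as pre-trained.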

After the pre-training, the image processing apparatus 10 may train the convolutional neural network end-to-end with all parts integrated, in order to jointly optimize the separately trained feature extraction part 230, HDR reconstruction part 250, and LDR decomposition part 210.

Table 1 compares the PSNR performance according to the training order of each part and whether the guided filter 271 is used.

[Table 1]
Network part            Training order
Image decomposition     -        2nd      -        2nd
Feature extraction      -        1st      1st      1st
HDR reconstruction      -        1st      1st      1st
All                     1st      -        2nd      3rd
PSNR of Y (dB)          46.71    46.28    47.11    47.27
PSNR of YUV (dB)        48.80    48.15    48.98    49.21

Referring to Table 1, the PSNR performance is highest when the feature extraction part 230 and the HDR reconstruction part 250 are trained first using the guided filter 271, the LDR decomposition part 210 is then trained separately, and the entire network is finally integrated and trained.

FIGS. 5A to 5C show examples of networks that decompose an image.

Referring to FIGS. 5A to 5C, decomposing the LDR image 400 may help the feature extraction part 230 focus on each of the decomposed images. The effect of decomposing the LDR image 400 can be observed by comparing the different structures shown in FIGS. 5A to 5C.

The structure of FIG. 5A may be a simple structure composed of six convolution layers combined with residual learning. Because the 8 bit/pixel LDR image 400 and the 10 bit/pixel ground truth HDR image are both normalized to the range [0, 1], the network may learn the difference between the LDR image 400 and the HDR image 500 for more accurate prediction.

In this case, no decomposition is performed on the input LDR image 400, but residual learning may be interpreted as an additive separation of the output that lets the network focus only on predicting the difference between the LDR image 400 and the HDR image 500.
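The additive separation implied by residual learning can be written as a single expression. The sketch below is illustrative only; `residual_net` and `zero_net` are hypothetical stand-ins for the six convolution layers, and both images are assumed to be normalized to [0, 1] as stated above.

```python
import numpy as np

def predict_with_residual(ldr, residual_net):
    """Add the predicted LDR-to-HDR difference back onto the normalized LDR input."""
    return ldr + residual_net(ldr)

ldr = np.full((4, 4, 3), 0.5)            # hypothetical normalized LDR patch
zero_net = lambda x: np.zeros_like(x)    # stand-in network that predicts no difference
hdr_pred = predict_with_residual(ldr, zero_net)
```

A network that outputs zeros reproduces the input, so the layers only have to model how the HDR image departs from the LDR image.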

The structure of FIG. 5B uses the guided filter 271 for multiplicative input decomposition, and the two separate paths may predict the base layer 273 and the detail layer 275 of the HDR image 500, respectively. The predicted base layer 273 and detail layer 275 may then be multiplied element-wise to obtain the final HDR image 500.

By being provided with the ground truth base layer 273 and detail layer 275 of the HDR image 500, the structure shown in FIG. 5B may fully concentrate on the decomposition for the final prediction.

The structure of FIG. 5C also uses the guided filter 271 for multiplicative input decomposition, but the feature maps generated by the individual paths may be concatenated for direct prediction of the HDR image 500.

Whereas the structure of FIG. 5B forces the network to model the base layer 273 and the detail layer 275 of the HDR image 500 for integration by element-wise multiplication, the structure of FIG. 5C may learn, through its last three convolution layers, an optimal integration operation that lowers the final loss. The structure of FIG. 5B may be the same as the structure used for pre-training in FIG. 3.
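The two integration styles can be contrasted with array shapes alone. The following is a minimal sketch under assumptions: random arrays stand in for the outputs of the two paths, the weights are untrained, and `conv3x3` is a hypothetical helper (a zero-padded 3×3 cross-correlation, as is usual in CNN frameworks). The channel counts 32, 64, and 40 follow the (c) column of Table 2.

```python
import numpy as np

def conv3x3(x, w):
    """Zero-padded 3x3 convolution; x is (H, W, C_in), w is (3, 3, C_in, C_out)."""
    h, wd, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[3]))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + wd, :] @ w[dy, dx]
    return out

rng = np.random.default_rng(0)
h, wd = 8, 8

# FIG. 5B style: each path predicts a 3-channel layer, combined by multiplication.
base_pred = rng.uniform(0.0, 1.0, (h, wd, 3))
detail_pred = rng.uniform(0.5, 1.5, (h, wd, 3))
hdr_multiplied = base_pred * detail_pred

# FIG. 5C style: 32-channel feature maps from each path are concatenated and
# passed to a convolution layer whose weights would be learned.
feat_base = rng.standard_normal((h, wd, 32))
feat_detail = rng.standard_normal((h, wd, 32))
merged = np.concatenate([feat_base, feat_detail], axis=-1)
w4 = rng.standard_normal((3, 3, 64, 40)) * 0.01   # untrained placeholder weights
fused = conv3x3(merged, w4)
```

In the first style the combination rule is fixed in advance; in the second, the rule is whatever the trailing convolution layers learn from the concatenated features.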

The results of comparing the performance of the structures of FIGS. 5A to 5C are shown in Table 2.

[Table 2]
Structure          (a)*     (a)      (a)      (b)              (c)*             (c)
Layer              Number of filter channels (input, output)
1                  3,32     3,32     3,45     3,32 | 3,32      3,32 | 3,32      3,32 | 3,32
2                  32,32    32,32    45,45    32,32 | 32,32    32,32 | 32,32    32,32 | 32,32
3                  32,32    32,32    45,48    32,32 | 32,32    32,32 | 32,32    32,32 | 32,32
4                  32,32    32,32    48,45    32,32 | 32,32    32,52            64,40
5                  32,32    32,32    45,45    32,32 | 32,32    52,48            40,40
6                  32,3     32,3     45,3     32,3 | 32,3      48,3             40,3
Total parameters   38,592   38,592   77,760   77,184           77,328           77,112
PSNR of Y          45.46    46.36    46.36    46.84            46.73            47.03
PSNR of YUV        47.28    48.25    48.39    48.65            48.87            48.82

Here, (a)* denotes the structure of FIG. 5A without residual learning, and (c)* denotes the structure of FIG. 5C in which the feature maps after the three convolution layers are combined by element-wise multiplication instead of concatenation.

Referring to Table 2, the structures that decompose the image by guided filtering show higher PSNR performance. The performance may improve further when the integration is learned with convolution layers, as in (c), instead of imposing the multiplicative property of the decomposition, as in (b). In a simple structure such as (a), residual learning may be important.

For a fair comparison, the total number of parameters in the hidden layers may be adjusted so that the structures have similar numbers of parameters. The PSNR performance may be compared both on the Y channel alone and on all three YUV channels.
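The parameter totals listed in Table 2 can be checked from the channel counts. The sketch below assumes 3×3 filters and counts filter weights only (biases excluded); the variable names are hypothetical, and the assignment of channel pairs to structures follows the column order of Table 2.

```python
def conv_params(layers, k=3):
    """Count k x k filter weights (biases excluded) for (in, out) channel pairs."""
    return sum(k * k * c_in * c_out for c_in, c_out in layers)

plain = [(3, 32), (32, 32), (32, 32), (32, 32), (32, 32), (32, 3)]
wide = [(3, 45), (45, 45), (45, 48), (48, 45), (45, 45), (45, 3)]
branch = [(3, 32), (32, 32), (32, 32)]          # one of the two paths, layers 1 to 3
tail_c_star = [(32, 52), (52, 48), (48, 3)]     # (c)*: paths merged by multiplication
tail_c = [(64, 40), (40, 40), (40, 3)]          # (c): paths merged by concatenation

counts = {
    "(a)": conv_params(plain),
    "(a), wider": conv_params(wide),
    "(b)": 2 * conv_params(plain),
    "(c)*": 2 * conv_params(branch) + conv_params(tail_c_star),
    "(c)": 2 * conv_params(branch) + conv_params(tail_c),
}
```

Under these assumptions the five totals come out to 38,592, 77,760, 77,184, 77,328, and 77,112, so the wider single-path structure and the two-path structures are indeed comparable in size.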

Even with similar numbers of parameters, the maximum PSNR difference may be 0.67 dB when measured on Y only and 0.48 dB when measured on the YUV channels, depending on whether input decomposition is used and how the decomposed input is processed.

The best-performing structure is that of FIG. 5C, although (c)* may show similar performance. Here, the network is free to learn how to integrate the two feature extraction stages. Comparing the structures of FIGS. 5A and 5B confirms that input decomposition using the guided filter 271 is highly effective.

For a simple CNN structure such as that of FIG. 5A, it may be important to use residual learning to improve prediction. Letting the convolution layers focus on specific input decompositions with different characteristics, and learn to combine the information generated by the different branches, may be important for reconstructing a high-quality HDR image 500.

FIGS. 6A to 7B show examples of the original LDR image 400 and the HDR image 500 generated by the image processing apparatus 10.

FIGS. 6A and 7A show examples of unprocessed original images, and FIGS. 6B and 7B show examples of images processed by the image processing apparatus 10.

Referring to FIGS. 6A to 7B, 7268 frames of LDR-HDR data pairs at 3840×2160 UHD resolution, covering a variety of scenes, may be collected. The specifications of the data are shown in Table 3.

[Table 3]
Data type    Bit depth        Transfer function    Color container
LDR          8 bits/pixel     Gamma                BT.709
HDR          10 bits/pixel    PQ-EOTF              BT.2020

The HDR video was professionally shot and mastered, and both the LDR and HDR data may be normalized to the range [0, 1]. To build the training data, 20 sub-images of size 40×40 may be randomly cropped per frame with a frame stride of 30. This yields a total training set of size 40×40×3×4860.
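The size of the training set follows from the sampling rule above. The sketch below is illustrative only: the function name, the seed, and the uniform choice of crop positions are assumptions, and it returns crop coordinates rather than reading any video data.

```python
import numpy as np

def sample_patches(num_frames=7268, stride=30, per_frame=20, size=40,
                   height=2160, width=3840, seed=0):
    """Return (frame, top, left) triples for randomly placed size x size crops."""
    rng = np.random.default_rng(seed)
    picks = []
    for frame in range(0, num_frames, stride):
        tops = rng.integers(0, height - size + 1, per_frame)
        lefts = rng.integers(0, width - size + 1, per_frame)
        picks.extend((frame, int(t), int(l)) for t, l in zip(tops, lefts))
    return picks

patches = sample_patches()
```

Taking every 30th frame of 7268 frames gives 243 frames, and 20 crops from each gives the 4860 sub-images stated above.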

For testing, 14 frames may be selected from six different scenes not included in the training set. All videos may be converted to the YUV color space, and all three YUV channels may be used for training. Training on the Y channel alone is also possible, but using all three channels may be more suitable because the color container changes from BT.709 to BT.2020.
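The text does not state which conversion matrix is used for the YUV color space. As an illustration only, the sketch below assumes the standard luma coefficients of BT.709 and BT.2020, applies them to already transfer-encoded RGB values in [0, 1], and shifts the chroma channels by 0.5 so that all three channels stay in [0, 1]; the function name and the chroma offset are assumptions.

```python
import numpy as np

COEFFS = {"BT.709": (0.2126, 0.0722), "BT.2020": (0.2627, 0.0593)}  # (Kr, Kb)

def rgb_to_yuv(rgb, standard="BT.709"):
    """Convert (..., 3) RGB in [0, 1] to Y, U, V with chroma offset into [0, 1]."""
    kr, kb = COEFFS[standard]
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    u = (b - y) / (2.0 * (1.0 - kb)) + 0.5
    v = (r - y) / (2.0 * (1.0 - kr)) + 0.5
    return np.stack([y, u, v], axis=-1)

white = rgb_to_yuv(np.array([1.0, 1.0, 1.0]))
red_2020 = rgb_to_yuv(np.array([1.0, 0.0, 0.0]), "BT.2020")
```

Since the two containers use different coefficients, the same RGB triple maps to different U and V values, which is one way to see why the chroma channels carry information that the Y channel alone does not.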

Table 4 shows the PSNR benefit of using all three channels.

[Table 4]
Training data       Y only    YUV
PSNR of Y (dB)      44.36     46.36
PSNR of YUV (dB)    32.53     48.25

The large difference in PSNR measured over all three YUV channels arises partly because, without training on the U and V channels, the color container and transfer function of the LDR and HDR images may not be matched. The Y channel may also benefit from the complementary information in the U and V channels.
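For reference, PSNR on data normalized to [0, 1] can be computed as follows. This is a minimal sketch using the usual definition with a peak value of 1; the function name and the toy patch are hypothetical, and how the embodiment averages over frames is not specified here.

```python
import numpy as np

def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB for signals normalized to [0, peak]."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

target = np.full((4, 4, 3), 0.5)   # hypothetical YUV patch, channels last
pred = target + 0.1                # uniform error of 0.1 in every channel
psnr_y = psnr(pred[..., 0], target[..., 0])   # Y channel only
psnr_yuv = psnr(pred, target)                 # all three channels together
```

A uniform error of 0.1 corresponds to 20 dB; the values of 46 dB to 49 dB in the tables therefore correspond to far smaller errors on the [0, 1] scale.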

The weight decay may be set to 5×10⁻⁴ for the convolution filters and to 0 for the biases. The mini-batch size may be set to 32, and the learning rate may be set to 10⁻⁴ for the filters and 10⁻⁵ for the biases.

All convolution filters may be of size 3×3, and the weights may be initialized by Xavier initialization, which draws the weights from a normal distribution whose variance is expressed in terms of the numbers of input and output neurons. The loss function of the network may be given by Equation 2.

Here, θ denotes the set of model parameters, n denotes the number of training samples, and $I_{LDR}$ denotes the input LDR image 400. F denotes the nonlinear mapping function of the convolutional neural network, which gives the prediction of the network as $F(I_{LDR}; \theta)$, and $I_{HDR}$ denotes the ground truth HDR image.
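Equation 2 itself is not reproduced in this text. Purely as an illustration of how the symbols defined above fit together, the sketch below assumes a squared-error objective averaged over the n training samples; the squared L2 form, the 1/n scaling, and the names `l2_loss` and `toy_net` are all assumptions rather than the loss of the embodiment.

```python
import numpy as np

def l2_loss(predict, theta, ldr_batch, hdr_batch):
    """Average over n samples of the squared error between F(I_LDR; theta) and I_HDR."""
    n = len(ldr_batch)
    total = 0.0
    for ldr, hdr in zip(ldr_batch, hdr_batch):
        total += np.sum((predict(ldr, theta) - hdr) ** 2)
    return total / n

toy_net = lambda ldr, theta: ldr * theta      # toy stand-in for the mapping F
ldr_batch = [np.full((2, 2), 0.5), np.full((2, 2), 0.25)]
hdr_batch = [np.full((2, 2), 0.5), np.full((2, 2), 0.75)]
loss = l2_loss(toy_net, 1.0, ldr_batch, hdr_batch)
```

Whatever its exact form, the loss compares the prediction $F(I_{LDR}; \theta)$ with $I_{HDR}$ for each training pair and is minimized with respect to θ.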

The activation function may be the rectified linear unit (ReLU), which may be expressed as Equation 3.

[Equation 3]
$$\mathrm{ReLU}(x) = \max(0, x)$$

In addition, all network models may be implemented with the MatConvNet package.
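The Xavier initialization mentioned above can be sketched as follows. The exact variance formula is not given in the text; this example assumes the commonly used normal form with variance 2 / (fan_in + fan_out), and it is written in Python with NumPy for illustration rather than with the MatConvNet package. The function name is hypothetical.

```python
import numpy as np

def xavier_normal(k, c_in, c_out, rng):
    """Draw k x k x c_in x c_out filter weights from N(0, 2 / (fan_in + fan_out))."""
    fan_in = k * k * c_in      # inputs feeding one output neuron
    fan_out = k * k * c_out    # outputs fed by one input neuron
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(k, k, c_in, c_out))

rng = np.random.default_rng(0)
weights = xavier_normal(3, 32, 32, rng)   # one 3x3 layer with 32 input and 32 output channels
```

Scaling the variance by the layer width in this way keeps the magnitude of activations and gradients roughly stable from layer to layer at the start of training.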

Although high-end HDR TVs are readily available on the market, HDR video content is scarce. Accordingly, a method of up-converting legacy LDR video to HDR video for HDR TVs may be needed.

Existing ITM methods share the similar goal of up-conversion, but their ultimate purpose was to improve professional graphics rendering by transferring LDR scenes to the HDR domain, not to render inverse tone-mapped HDR images on consumer HDR TV displays. When viewed on a consumer HDR TV display, HDR images generated by previous ITM methods may exhibit noise amplification in dark regions, a lack of local contrast, or unnatural colors.

The image processing apparatus 10 may train the network to restore detail information and local contrast, thereby providing an ITM scheme based on a convolutional neural network for consumer HDR displays.

To predict the HDR image 500 accurately, the image processing apparatus 10 may train the parts of the network (LDR decomposition, feature extraction, and HDR reconstruction) separately before end-to-end training.

In particular, for the LDR decomposition in the pre-training stage, the image processing apparatus 10 may adopt the guided filter 271 so that the subsequent layers can each focus on a separately decomposed image in their own path.

The HDR reconstruction part 250 of the image processing apparatus 10 may learn how to integrate the feature maps extracted from the two paths. The resulting HDR image 500 may be free of artifacts and, with local contrast and details restored, close to the ground truth HDR image. The image processing apparatus 10 may make it possible to watch legacy LDR video directly as HDR video on ordinary HDR TV displays.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

1. An image processing method comprising:
receiving an image;
decomposing the image by performing a first convolution operation;
performing a second convolution operation to extract a first feature map from a portion of the decomposed image and a second feature map from another portion of the decomposed image; and
generating a high dynamic range (HDR) image by performing a third convolution operation on the first feature map and the second feature map.

2. The method of claim 1, wherein generating the HDR image comprises:
combining the first feature map and the second feature map; and
generating the HDR image by performing the third convolution operation on the combined first and second feature maps.

3. The method of claim 1, further comprising learning filter parameters of the first, second, and third convolution operations.

4. The method of claim 3, wherein the learning comprises:
learning the filter parameters of the second and third convolution operations based on the image;
after learning the filter parameters of the second and third convolution operations, learning the filter parameters of the first convolution operation; and
after learning the filter parameters of the first convolution operation, learning the filter parameters of the first to third convolution operations.

5. The method of claim 4, wherein learning the filter parameters of the second and third convolution operations comprises:
generating a portion of a decomposed image by filtering the image;
generating another portion of the decomposed image based on the image and the portion; and
learning the filter parameters of the second and third convolution operations based on the portion and the other portion.

6. The method of claim 5, wherein generating the portion comprises generating the portion by applying edge-preserving filtering to the image.

7. The method of claim 5, wherein generating the other portion comprises generating the other portion by performing element-wise division on the image and the portion.

8. The method of claim 5, wherein learning the filter parameters of the first to third convolution operations comprises training, end-to-end, a network in which the learned filter parameters of the first convolution operation and the learned filter parameters of the second and third convolution operations are combined.

9. An image processing apparatus comprising:
a receiver configured to receive an image; and
a processor configured to decompose the image by performing a first convolution operation, to extract a first feature map from a portion of the decomposed image and a second feature map from another portion of the decomposed image by performing a second convolution operation, and to generate an HDR image by performing a third convolution operation on the first feature map and the second feature map.

10. The apparatus of claim 9, wherein the processor is configured to combine the first feature map and the second feature map and to generate the HDR image by performing the third convolution operation on the combined first and second feature maps.

11. The apparatus of claim 9, wherein the processor is configured to learn filter parameters of the first, second, and third convolution operations.

12. The apparatus of claim 11, wherein the processor is configured to learn the filter parameters of the second and third convolution operations based on the image, to learn the filter parameters of the first convolution operation after learning the filter parameters of the second and third convolution operations, and to learn the filter parameters of the first to third convolution operations after learning the filter parameters of the first convolution operation.

13. The apparatus of claim 12, wherein the processor is configured to generate a portion of a decomposed image by filtering the image, to generate another portion of the decomposed image based on the image and the portion, and to learn the filter parameters of the second and third convolution operations based on the portion and the other portion.

14. The apparatus of claim 13, wherein the processor is configured to generate the portion by applying edge-preserving filtering to the image.

15. The apparatus of claim 13, wherein the processor is configured to generate the other portion by performing element-wise division on the image and the portion.

16. The apparatus of claim 13, wherein the processor is configured to train, end-to-end, a network in which the learned filter parameters of the first convolution operation and the learned filter parameters of the second and third convolution operations are combined.