KR102570131B1 - Method and Apparatus for Providing an HDR Environment Map from an LDR Image Based on Deep Learning

Info

Publication number: KR102570131B1
Application number: KR1020210165624A
Authority: KR
Inventors: 노준용; 서광균; 이지원; 유정은; 이하늬
Original assignee: 한국과학기술원
Priority date: 2021-11-26
Filing date: 2021-11-26
Publication date: 2023-08-25
Also published as: KR20230078146A

Abstract

Disclosed is a method of providing a high dynamic range (HDR) environment map from a low dynamic range (LDR) image based on deep learning. The disclosed method may include inputting an LDR image to a first deep neural network (DNN) network to output an LDR environment map, and inputting the LDR environment map to a second DNN network to output an HDR environment map. The LDR environment map includes information representing the position and intensity of a light source in the LDR image and information about the color tone of the LDR image, and the HDR environment map may have a higher dynamic range than the LDR environment map.

Description

Method and Apparatus for Providing an HDR Environment Map from an LDR Image Based on Deep Learning

The disclosure below relates to image processing technology based on deep learning.

Recent advances in technologies for creating virtual reality (VR), augmented reality (AR), and mixed reality (MR) have made interaction between real objects and virtual objects possible. A virtual object may be given an appearance close to that of a real object to provide a more realistic experience. What makes this possible is image-based lighting (IBL), a technique that lights virtual objects using a high dynamic range (HDR) environment map, which represents the light sources of a real environment as an image. Once the HDR environment map is generated, the information it carries about the position and intensity of the light sources in the input image and about the color tone of the input image is used to determine the direction of the light falling on a virtual object in the virtual environment, the ambient light, the color tone, and so on. On that basis, the virtual object can be rendered optimally in the virtual environment.

Conventional methods of generating an HDR environment map either use special equipment such as a camera, a fisheye lens, and a mirror ball, or take several photographs at different exposures and composite them. These methods have the drawback that special equipment must be used every time. For this reason, methods using deep learning have recently been proposed. Using deep learning has the advantage that an HDR environment map can be estimated from an input image by training a network on data.

The problem to be solved by the present disclosure is to provide a technology that can extract a light source suited to an input image, in the form of an HDR environment map, regardless of whether the input image from which the light source is to be extracted is an indoor image or an outdoor image.

The problems to be solved by the present disclosure are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

According to one aspect of the present disclosure, a method of providing a high dynamic range (HDR) environment map from a low dynamic range (LDR) image based on deep learning is provided. The method may include inputting an LDR image to a first deep neural network (DNN) network to output an LDR environment map, and inputting the LDR environment map to a second DNN network to output an HDR environment map. The LDR environment map includes information representing the position and intensity of a light source in the LDR image and information about the color tone of the LDR image, and the HDR environment map may have a higher dynamic range than the LDR environment map.

In one embodiment, the method further includes, before the step of inputting the LDR image to the first DNN network to output the LDR environment map, training the first DNN network with training data that includes a training LDR image.

In one embodiment, training the first DNN network with the training data that includes the training LDR image includes inputting the training LDR image to the first DNN network and outputting a first classification class value, a first light mask, and a first LDR environment map a plurality of times while updating the internal parameters of the first DNN network a plurality of times. Here, the training data further includes a second classification class value, a second light mask, and a second LDR environment map corresponding to the training LDR image. The second classification class value indicates whether the training LDR image is an indoor image or an outdoor image. The second light mask includes information representing the position and intensity of a light source in the training LDR image. The second LDR environment map includes information representing the position and intensity of the light source in the training LDR image and information about the color tone of the training LDR image.
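The training data described above can be pictured as one record per training LDR image. The following is a minimal sketch under stated assumptions: the names `FirstStageSample`, `pixel_count`, and `validate` are hypothetical, and an image is simplified to a single-channel 2D grid of floats, which the source does not specify.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical simplification: an image is a 2D grid of floats (one channel).
Image = List[List[float]]


def pixel_count(image: Image) -> int:
    """Total number of pixels in an image."""
    return sum(len(row) for row in image)


@dataclass
class FirstStageSample:
    """One supervised sample for the first DNN network (illustrative names)."""
    ldr_image: Image           # training LDR image (network input)
    target_class: float        # "second" class value: 0.0 indoor, 1.0 outdoor
    target_light_mask: Image   # "second" light mask: light position and intensity
    target_ldr_env_map: Image  # "second" LDR environment map: lights plus color tone


def validate(sample: FirstStageSample) -> None:
    """Check constraints the text states or implies for one sample."""
    if not 0.0 <= sample.target_class <= 1.0:
        raise ValueError("class value must lie in [0, 1]")
    # The text uses one pixel count N for both targets, so the counts must agree.
    if pixel_count(sample.target_light_mask) != pixel_count(sample.target_ldr_env_map):
        raise ValueError("light mask and LDR environment map must share a pixel count")
```

Keeping the three targets in one record mirrors the way the text pairs each training LDR image with its class value, light mask, and LDR environment map.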

In one embodiment, training the first DNN network with the training data further includes updating the internal parameters of the first DNN network a plurality of times so that the final loss function defined by the equations below approaches its minimum value.

[Equations: final loss function; loss functions for the classification class, the light mask, and the LDR environment map]

The equations define the final loss function in terms of three component loss functions and their weights. The loss function for the classification class is expressed in terms of the first classification class value and the second classification class value. The loss function for the light mask is expressed in terms of the i-th pixel value of the first light mask and the i-th pixel value of the second light mask. The loss function for the LDR environment map is expressed in terms of the i-th pixel value of the first LDR environment map and the i-th pixel value of the second LDR environment map. Here, N denotes the total number of pixels of the first light mask, the second light mask, the first LDR environment map, and the second LDR environment map, and i is an index denoting the pixel number. The loss function for the LDR environment map, the loss function for the light mask, and the loss function for the classification class each have a weight, and the sum of the three weights is 1.
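Because the equations themselves are not reproduced in this text, the following is only a sketch of one loss that is consistent with the definitions above. It assumes that the final loss is a weighted sum of the three component losses, that the class loss is binary cross-entropy, and that the two map losses are mean squared error over N pixels. None of these forms is confirmed by the source, and the function names and default weights are hypothetical.

```python
import math
from typing import Sequence


def bce(pred: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one scalar (assumed form of the class loss)."""
    p = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over N pixels (assumed form of the map losses)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def final_loss(cls_pred: float, cls_true: float,
               mask_pred: Sequence[float], mask_true: Sequence[float],
               env_pred: Sequence[float], env_true: Sequence[float],
               w_env: float = 0.5, w_mask: float = 0.3, w_cls: float = 0.2) -> float:
    """Assumed weighted sum of the three losses; the weights must sum to 1."""
    if not math.isclose(w_env + w_mask + w_cls, 1.0):
        raise ValueError("weights must sum to 1")
    return (w_env * mse(env_pred, env_true)
            + w_mask * mse(mask_pred, mask_true)
            + w_cls * bce(cls_pred, cls_true))
```

The only constraint taken directly from the text is that the three weights sum to 1. The default weights shown are placeholders.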

In one embodiment, the first classification class value is a value between 0 and 1. When the training LDR image is an indoor image, the second classification class value is 0 or a value close to 0, and when the training LDR image is an outdoor image, the second classification class value is 1 or a value close to 1.

In one embodiment, the method further includes, before the step of inputting the LDR image to the first DNN network to output the LDR environment map, training the second DNN network with training data that includes a training LDR environment map.

In one embodiment, training the second DNN network with the training data that includes the training LDR environment map includes inputting the training LDR environment map to the second DNN network and outputting a first HDR environment map a plurality of times while updating the internal parameters of the second DNN network a plurality of times. Here, the training data further includes a second HDR environment map corresponding to the training LDR environment map, and the second HDR environment map has a higher dynamic range than the training LDR environment map.

In one embodiment, training the second DNN network with the training data further includes updating the internal parameters of the second DNN network a plurality of times so that the loss function for the HDR environment map defined by the equation below approaches its minimum value.

[Equation: loss function for the HDR environment map]

The loss function for the HDR environment map is expressed in terms of the i-th pixel value of the first HDR environment map and the i-th pixel value of the second HDR environment map. Here, N denotes the total number of pixels of the first HDR environment map and the second HDR environment map, and i is an index denoting the pixel number.
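The phrase "updating the internal parameters a plurality of times so that the loss approaches its minimum" describes iterative optimization. The toy sketch below illustrates that idea with plain gradient descent. It is not the second DNN network: a single scalar `gain` stands in for all internal parameters, the loss is assumed to be mean squared error over N pixels, and the name `fit_gain`, the step count, and the learning rate are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_gain(ldr: Sequence[float], hdr_true: Sequence[float],
             steps: int = 200, lr: float = 0.1) -> Tuple[float, List[float]]:
    """Fit hdr ~= gain * ldr by gradient descent on mean squared error.

    Returns the fitted gain and the loss recorded before each update.
    """
    n = len(ldr)
    gain = 1.0                      # initial value of the stand-in parameter
    history: List[float] = []
    for _ in range(steps):
        pred = [gain * x for x in ldr]                       # "first" HDR map
        loss = sum((p - t) ** 2 for p, t in zip(pred, hdr_true)) / n
        history.append(loss)
        # Derivative of the mean squared error with respect to gain.
        grad = sum(2.0 * (p - t) * x for p, t, x in zip(pred, hdr_true, ldr)) / n
        gain -= lr * grad                                    # parameter update
    return gain, history
```

Each loop iteration corresponds to one output of a "first" HDR environment map followed by one parameter update, with the "second" HDR environment map (`hdr_true`) held fixed as the target.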

According to another aspect of the present disclosure, an apparatus for providing an HDR environment map from an LDR image based on deep learning is provided. The apparatus may include a database unit that stores an LDR image, and a processing engine that is communicatively coupled to the database unit, implements a first DNN network and a second DNN network, and is configured to retrieve the LDR image from the database unit and control the LDR image to be input to the first DNN network. The first DNN network is configured to output an LDR environment map in response to the LDR image being input. The second DNN network is connected to the first DNN network and is configured to output an HDR environment map in response to the LDR environment map being input. The LDR environment map includes information representing the position and intensity of a light source in the LDR image and information about the color tone of the LDR image, and the HDR environment map may have a higher dynamic range than the LDR environment map.

In one embodiment, the database unit further stores training data that includes a training LDR image, and the processing engine is further configured to train the first DNN network with the training data.

In one embodiment, the processing engine is further configured to input the training LDR image to the first DNN network and to control a first classification class value, a first light mask, and a first LDR environment map to be output a plurality of times while the internal parameters of the first DNN network are updated a plurality of times. Here, the training data further includes a second classification class value, a second light mask, and a second LDR environment map corresponding to the training LDR image. The second classification class value indicates whether the training LDR image is an indoor image or an outdoor image. The second light mask includes information representing the position and intensity of a light source in the training LDR image. The second LDR environment map includes information representing the position and intensity of the light source in the training LDR image and information about the color tone of the training LDR image.

In one embodiment, the processing engine is further configured to control the internal parameters of the first DNN network to be updated a plurality of times so that the final loss function defined by the equations below approaches its minimum value.

[Equations: final loss function; loss functions for the classification class, the light mask, and the LDR environment map]

The equations define the final loss function in terms of three component loss functions and their weights. The loss function for the classification class is expressed in terms of the first classification class value and the second classification class value. The loss function for the light mask is expressed in terms of the i-th pixel value of the first light mask and the i-th pixel value of the second light mask. The loss function for the LDR environment map is expressed in terms of the i-th pixel value of the first LDR environment map and the i-th pixel value of the second LDR environment map. Here, N denotes the total number of pixels of the first light mask, the second light mask, the first LDR environment map, and the second LDR environment map, and i is an index denoting the pixel number. The loss function for the LDR environment map, the loss function for the light mask, and the loss function for the classification class each have a weight, and the sum of the three weights is 1.

In one embodiment, the database unit further stores training data that includes a training LDR environment map, and the processing engine is further configured to train the second DNN network with the training data.

In one embodiment, the processing engine is further configured to input the training LDR environment map to the second DNN network and to control a first HDR environment map to be output a plurality of times while the internal parameters of the second DNN network are updated a plurality of times. Here, the training data further includes a second HDR environment map corresponding to the training LDR environment map, and the second HDR environment map has a higher dynamic range than the training LDR environment map.

In one embodiment, the processing engine is further configured to control the internal parameters of the second DNN network to be updated a plurality of times so that the loss function for the HDR environment map defined by the equation below approaches its minimum value.

[Equation: loss function for the HDR environment map]

The loss function for the HDR environment map is expressed in terms of the i-th pixel value of the first HDR environment map and the i-th pixel value of the second HDR environment map. Here, N denotes the total number of pixels of the first HDR environment map and the second HDR environment map, and i is an index denoting the pixel number.

According to the disclosed embodiments, there is a technical effect that a light source suited to an input image can be extracted in the form of an HDR environment map regardless of whether the input image from which the light source is to be extracted is an indoor image or an outdoor image.

FIG. 1 is a block diagram of an embodiment of an apparatus for providing a high dynamic range (HDR) environment map from a low dynamic range (LDR) image based on deep learning.
FIG. 2 is a diagram for explaining an embodiment of the structure of the first DNN network of FIG. 1.
FIG. 3 is a diagram for explaining an embodiment of the structure of the second DNN network of FIG. 1.
FIG. 4 is a flowchart for explaining an embodiment of a method of providing an HDR environment map from an LDR image based on deep learning.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only, and the embodiments may be modified and implemented in various forms. Accordingly, the form actually implemented is not limited to the specific embodiments disclosed, and the scope of the present disclosure includes changes, equivalents, or substitutes that fall within the technical idea described through the embodiments.

Terms such as "first" or "second" may be used to describe various components, but such terms should be interpreted only for the purpose of distinguishing one component from another. For example, a "first component" may be named a "second component", and similarly a "second component" may be named a "first component".

When a component is referred to as being "connected" to another component, it should be understood that it may be directly connected or coupled to that other component, but that other components may also exist in between.

A singular expression includes the plural unless the context clearly indicates otherwise. In the present disclosure, terms such as "include" or "have" are intended to specify that the described features, numbers, steps, operations, components, parts, or combinations thereof exist, and should be understood not to exclude in advance the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the relevant art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and, unless explicitly defined in the present disclosure, are not to be interpreted in an idealized or excessively formal sense.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. In the description with reference to the drawings, the same components are given the same reference numerals regardless of the figure, and redundant descriptions of them are omitted.

FIG. 1 is a block diagram of an embodiment of an apparatus for providing a high dynamic range (HDR) environment map from a low dynamic range (LDR) image based on deep learning.

As shown in FIG. 1, the HDR environment map providing apparatus 100 may include a database unit 110. The database unit 110 may store at least one LDR image from which a light source is to be extracted according to the present disclosure. The database unit 110 may further store training data. As described later, the training data may include a training LDR image; a standard classification class value, a standard light mask, and a standard LDR environment map, each corresponding to the training LDR image; a training LDR environment map; and a standard HDR environment map corresponding to the training LDR environment map. The LDR image and the training LDR image may be images with a low dynamic range, captured with a commonly used camera. Each pixel of the LDR image and of the training LDR image may have a pixel value from 0 to 255. The standard classification class value may be a standard value indicating whether the training LDR image is an indoor image or an outdoor image. In one embodiment, the standard classification class value is a value between 0 and 1. In one embodiment, the standard classification class value is 0 when the training LDR image is an indoor image and 1 when the training LDR image is an outdoor image. The standard light mask may be an image containing information that faithfully represents the position and intensity of the light source in the training LDR image. The standard LDR environment map may be an image containing information that faithfully represents not only the position and intensity of the light source in the training LDR image but also the color tone of the training LDR image. The standard HDR environment map may be an image having a higher dynamic range than the training LDR environment map. The database unit 110 may be implemented as any one storage medium among a flash memory type, a hard disk type, a MultiMedia Card (MMC), a card-type memory (for example, a Secure Digital (SD) card or an eXtreme Digital (XD) card), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), a magnetic memory, a magnetic disk, and an optical disk, but those skilled in the art will appreciate that the implementation of the database unit 110 is not limited to these.
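The text contrasts low and high dynamic range but does not define the term numerically. The sketch below assumes one common convention, the base-2 logarithm of the ratio between the largest and smallest positive pixel values (photographic stops). The function name and the sample HDR values are hypothetical.

```python
import math
from typing import Iterable


def dynamic_range_stops(values: Iterable[float], floor: float = 1e-6) -> float:
    """Return log2(max / min) over values above `floor`, i.e. the range in stops."""
    positive = [v for v in values if v > floor]
    if not positive:
        return 0.0
    return math.log2(max(positive) / min(positive))


ldr_pixels = [1, 64, 128, 255]           # 8-bit values, as in the text (0 to 255)
hdr_pixels = [0.01, 1.0, 50.0, 4000.0]   # hypothetical linear radiance values

ldr_stops = dynamic_range_stops(ldr_pixels)
hdr_stops = dynamic_range_stops(hdr_pixels)
```

Under this convention an 8-bit image can span at most log2(255) stops, just under 8, which is why a bright light source and a dim surface cannot both be represented faithfully in the same LDR image.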

The apparatus 100 may further include a processing engine 120 communicatively coupled to the database unit 110. The processing engine 120 may be configured to implement a first deep neural network (DNN) network 124 and a second DNN network 128, to retrieve a desired LDR image from the database unit 110, and to control the LDR image to be input to the first DNN network 124. The first DNN network 124 may be configured to output a classification class value, a light mask, and an LDR environment map in response to the LDR image being input. The second DNN network 128 may be connected to the first DNN network 124 and configured to output an HDR environment map in response to the LDR environment map being input from the first DNN network 124. The classification class value output from the first DNN network 124 may be a value indicating whether the LDR image is an indoor image or an outdoor image. Whereas the standard classification class value corresponding to the training LDR image is a standard value that faithfully indicates whether the training LDR image is an indoor image or an outdoor image, the fidelity of the classification class value may vary depending on how well the first DNN network 124 has been trained. The light mask output from the first DNN network 124 may include information representing the position and intensity of the light source in the LDR image. Whereas the standard light mask corresponding to the training LDR image is an image that represents the position and intensity of the light source in the training LDR image relatively faithfully, the fidelity of the light mask may vary depending on how well the first DNN network 124 has been trained. The LDR environment map output from the first DNN network 124 may be an RGB image that includes information representing the position and intensity of the light source in the LDR image and information about the color tone of the LDR image. Whereas the standard LDR environment map corresponding to the training LDR image is an image that represents the position and intensity of the light source and the color tone of the training LDR image relatively faithfully, how well the LDR environment map represents the position and intensity of the light source and the color tone of the LDR image may vary depending on how well the first DNN network 124 has been trained. The HDR environment map may be an image having a higher dynamic range than the LDR environment map. Whereas the standard HDR environment map corresponding to the training LDR environment map is a relatively high-fidelity image, the fidelity of the HDR environment map may vary depending on how well the second DNN network 128 has been trained. For these reasons, the processing engine 120 may be further configured to train the first DNN network 124 and the second DNN network 128 using the training data. The processing engine 120 may be implemented to train the first DNN network 124 and the second DNN network 128 by any one of supervised learning, semi-supervised learning, and unsupervised learning.

FIG. 2 is a diagram for explaining an embodiment of the structure of the first DNN network of FIG. 1.

Although not shown, the first DNN network 124 may include a plurality of layers so as to output a classification class value, a light mask 230, and an LDR environment map 240 in response to receiving an LDR image 220. In one embodiment, the first DNN network 124 includes an encoder composed of two convolution layers and five residual blocks, a fully-connected layer, and a classification head and two decoders connected to the fully-connected layer. In this embodiment, the classification head is placed on the path for outputting the classification class value, and the two decoders are placed on the paths for outputting the light mask 230 and the LDR environment map 240, respectively. In this embodiment, the classification head is composed of two fully-connected layers, and each of the two decoders is composed of one fully-connected layer, five upsampling layers, and one convolution layer. In one embodiment, batch normalization is applied to all layers of the first DNN network, a sigmoid activation function is applied to the classification head and to the decoder that generates the light mask 230, and a hyperbolic tangent (tanh) activation function is used in the decoder that generates the LDR environment map 240. An exemplary structure of the first DNN network 124 has been described above, but it should be recognized that the structure of the first DNN network 124 is not limited to this structure.
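The choice of output activation fixes the value range of each output path. The following is a minimal sketch in plain Python, not the disclosed implementation: the function names are illustrative, and the rescaling of the tanh output to the range (0, 1) is an assumption that the disclosure does not state.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_to_unit_range(x: float) -> float:
    """Apply tanh, then rescale from (-1, 1) to (0, 1).

    The rescaling is an assumption made for illustration only.
    """
    return (math.tanh(x) + 1.0) / 2.0


# Arbitrary pre-activation values, chosen only to show the output ranges.
raw_values = [-4.0, 0.0, 4.0]

# Sigmoid: used for the classification head and the light-mask decoder.
mask_values = [sigmoid(v) for v in raw_values]

# Tanh: used for the decoder that generates the LDR environment map.
env_values = [math.tanh(v) for v in raw_values]
```

A sigmoid output can be read directly as a value between 0 and 1, which matches the classification class value and a per-pixel mask, while a tanh output is centered on zero.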

FIG. 3 is a diagram for explaining an embodiment of the structure of the second DNN network of FIG. 1.

Although not shown, the second DNN network 128 may include a plurality of layers so as to output an HDR environment map 330 in response to receiving the LDR environment map 240. In one embodiment, the second DNN network 128 includes three convolution layers, six residual blocks, two upsampling layers, and one convolution layer. In one embodiment, a ReLU activation function and instance normalization are used in all layers of the second DNN network 128. The structure of the second DNN network 128 is likewise not limited to the exemplary structure described above, and it should be recognized that any DNN structure that receives an LDR environment map and outputs an HDR environment map falls within the scope of the present disclosure.
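The disclosure names ReLU and instance normalization without defining them. As a sketch of their usual definitions, assuming one channel of one sample stored as a flat list of floats (the `eps` value is a conventional choice, not taken from the disclosure):

```python
import math
from typing import List


def instance_norm(channel: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one channel of one sample to zero mean and unit variance.

    The statistics come from this single channel only, which is what
    distinguishes instance normalization from batch normalization.
    """
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


def relu(values: List[float]) -> List[float]:
    """Rectified linear unit: negative values are clamped to zero."""
    return [max(0.0, v) for v in values]


# Illustrative channel values, not data from the disclosure.
channel = [0.1, 0.4, 0.2, 0.9]
normalized = instance_norm(channel)
activated = relu(normalized)
```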

Referring back to FIG. 1, in order to train the first DNN network 124, the processing engine 120 may be further configured to input a training LDR image to the first DNN network 124 and to control the network so that a learning-state classification class value, a learning-state light mask, and a learning-state LDR environment map are output a plurality of times while the internal parameters of the first DNN network 124 are updated a plurality of times. The learning-state classification class value may be a value between 0 and 1. As training of the first DNN network 124 progresses, the learning-state classification class value may come to indicate more accurately whether the training LDR image is an indoor image or an outdoor image. When the training LDR image is an indoor image, the learning-state classification class value may approach 0 as training of the first DNN network 124 progresses. When the training LDR image is an outdoor image, the learning-state classification class value may approach 1 as training of the first DNN network 124 progresses. As training of the first DNN network 124 progresses, the learning-state light mask may come to represent the position and intensity of the light source in the training LDR image more accurately, and the learning-state LDR environment map may come to represent the position and intensity of the light source in the training LDR image and the color of the training LDR image more accurately.
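Because the classification class value lies between 0 and 1, with indoor images driven toward 0 and outdoor images toward 1, it can be turned into a label by thresholding. The sketch below is illustrative only: the function name and the 0.5 threshold are assumptions, not part of the disclosure.

```python
def interpret_class_value(value: float, threshold: float = 0.5) -> str:
    """Map a classification class value in [0, 1] to a scene label.

    Values near 0 are read as indoor and values near 1 as outdoor.
    The 0.5 threshold is an assumed convention.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("classification class value must lie in [0, 1]")
    return "outdoor" if value >= threshold else "indoor"


# Hypothetical network outputs at a late stage of training.
indoor_label = interpret_class_value(0.03)
outdoor_label = interpret_class_value(0.97)
```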

The processing engine 120 may be further configured to control the internal parameters of the first DNN network 124 to be updated a plurality of times so that the final loss function defined by Equations 1 to 4 below approaches a minimum value.

The quantities appearing here are the final loss function, the loss function for the LDR environment map, the loss function for the light mask, and the loss function for the classification class. There are also three weights, one for the LDR environment map loss function, one for the light mask loss function, and one for the classification class loss function, and the sum of the three weights is 1.

The quantities appearing here are the learning-state classification class value and the standard classification class value corresponding to the training LDR image.

Here, N denotes the total number of pixels of the learning-state light mask and of the standard light mask, and i is an index denoting the pixel number. The remaining quantities are the i-th pixel value of the learning-state light mask and the i-th pixel value of the standard light mask.

Here, N denotes the total number of pixels of the learning-state LDR environment map and of the standard LDR environment map, and i is an index denoting the pixel number. The remaining quantities are the i-th pixel value of the learning-state LDR environment map and the i-th pixel value of the standard LDR environment map.
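Equations 1 to 4 themselves are not reproduced in this text, so the sketch below should be read only as one possible instance of a three-term weighted loss whose weights sum to 1. The per-term forms (binary cross-entropy for the classification class, mean squared error over N pixels for the light mask and the LDR environment map), the function names, and the default weights are all assumptions.

```python
import math
from typing import Sequence


def bce(pred: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one value (assumed form of the class loss)."""
    p = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over N pixels (assumed form of the image losses)."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same pixel count")
    n = len(pred)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / n


def final_loss(env_pred, env_target, mask_pred, mask_target,
               cls_pred, cls_target,
               w_env=0.5, w_mask=0.3, w_cls=0.2) -> float:
    """Weighted sum of the three losses; the weights must sum to 1.

    The default weights are hypothetical values chosen for illustration.
    """
    if not math.isclose(w_env + w_mask + w_cls, 1.0):
        raise ValueError("loss weights must sum to 1")
    return (w_env * mse(env_pred, env_target)
            + w_mask * mse(mask_pred, mask_target)
            + w_cls * bce(cls_pred, cls_target))


# Illustrative values: a near-perfect prediction and a poor one.
good = final_loss([0.2, 0.4], [0.2, 0.4], [1.0, 0.0], [1.0, 0.0], 1.0, 1.0)
bad = final_loss([0.9, 0.1], [0.2, 0.4], [0.0, 1.0], [1.0, 0.0], 0.1, 1.0)
```

Training then amounts to repeatedly updating the network parameters so that this scalar decreases.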

In order to train the second DNN network 128, the processing engine 120 may be further configured to input a training LDR environment map to the second DNN network 128 and to control the network so that a learning-state HDR environment map is output a plurality of times while the internal parameters of the second DNN network 128 are updated a plurality of times. As training of the second DNN network 128 progresses, the dynamic range of the learning-state HDR environment map may increase relative to that of the training LDR environment map.

The processing engine 120 may be further configured to control the internal parameters of the second DNN network 128 to be updated a plurality of times so that the loss function for the HDR environment map defined by Equation 5 below approaches a minimum value.

Here, N denotes the total number of pixels of the learning-state HDR environment map and of the standard HDR environment map, and i is an index denoting the pixel number. The remaining quantities are the loss function for the HDR environment map, the i-th pixel value of the learning-state HDR environment map, and the i-th pixel value of the standard HDR environment map.
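The disclosure states that the HDR environment map has a higher dynamic range than the LDR environment map but does not say how dynamic range is measured. One common convention, used here purely as an assumption, is the base-2 logarithm of the ratio between the brightest and darkest luminance, expressed in stops. The pixel values below are made up for illustration.

```python
import math
from typing import Sequence


def dynamic_range_stops(luminances: Sequence[float], floor: float = 1e-4) -> float:
    """Return log2(brightest / darkest) for a set of pixel luminances.

    `floor` guards against division by zero for fully black pixels;
    its value is an arbitrary choice.
    """
    brightest = max(luminances)
    if brightest <= 0.0:
        raise ValueError("at least one luminance must be positive")
    darkest = max(min(luminances), floor)
    return math.log2(brightest / darkest)


# Hypothetical pixels: in the LDR map the light source is clipped at 1.0,
# while in the HDR map the same pixel keeps a much larger value.
ldr_pixels = [0.05, 0.5, 1.0]
hdr_pixels = [0.05, 0.5, 64.0]

ldr_stops = dynamic_range_stops(ldr_pixels)
hdr_stops = dynamic_range_stops(hdr_pixels)
```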

FIG. 4 is a flowchart for explaining an embodiment of a method of providing an HDR environment map from an LDR image based on deep learning.

As shown in FIG. 4, an embodiment of the method begins with step S405 of training the first DNN network 124 with training data including a training LDR image, a standard classification class value corresponding to the training LDR image, a standard light mask corresponding to the training LDR image, and a standard LDR environment map corresponding to the training LDR image. Here, the training LDR image may be an indoor image or an outdoor image. In this step, in order to train the first DNN network 124, the training LDR image is input to the first DNN network 124, and the network is controlled so that a learning-state classification class value, a learning-state light mask, and a learning-state LDR environment map are output a plurality of times while the internal parameters of the first DNN network 124 are updated a plurality of times. In this step, the internal parameters of the first DNN network 124 may be controlled to be updated a plurality of times so that the final loss function defined by Equations 1 to 4 above approaches a minimum value.

In step S410, the second DNN network 128 is trained with training data including a training LDR environment map and a standard HDR environment map corresponding to the training LDR environment map. In this step, in order to train the second DNN network 128, the training LDR environment map is input to the second DNN network 128, and the network is controlled so that a learning-state HDR environment map is output a plurality of times while the internal parameters of the second DNN network 128 are updated a plurality of times. In this step, the internal parameters of the second DNN network 128 may be controlled to be updated a plurality of times so that the loss function for the HDR environment map defined by Equation 5 above approaches a minimum value.

In step S415, the LDR image 220 is input to the first DNN network 124 to output the LDR environment map 240. In step S420, the LDR environment map 240 output from the first DNN network 124 is input to the second DNN network 128 to output the HDR environment map 330. As described above, the LDR environment map 240 may include information indicating the position and intensity of a light source in the LDR image 220 and information about the color of the LDR image 220, and the HDR environment map 330 may have a higher dynamic range than the LDR environment map 240.
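The inference part of the flow (steps S415 and S420) is a simple chain of the two trained networks. The sketch below shows only that data flow. The function names, the flat-list image type, and the two toy stand-in networks are hypothetical and do not reproduce the disclosed architectures.

```python
from typing import Callable, List, Tuple

Image = List[float]  # flattened pixel values, a stand-in for a real image tensor


def provide_hdr_environment_map(
    ldr_image: Image,
    first_network: Callable[[Image], Tuple[float, Image, Image]],
    second_network: Callable[[Image], Image],
) -> Image:
    """Chain the two trained networks (steps S415 and S420)."""
    # S415: the first network yields a class value, a light mask and an LDR
    # environment map; only the environment map is passed on.
    _class_value, _light_mask, ldr_env_map = first_network(ldr_image)
    # S420: the second network expands the LDR map into an HDR map.
    return second_network(ldr_env_map)


def toy_first_network(image: Image) -> Tuple[float, Image, Image]:
    """Hypothetical stand-in: marks bright pixels as light sources."""
    mask = [1.0 if v > 0.9 else 0.0 for v in image]
    return 1.0, mask, list(image)


def toy_second_network(env_map: Image) -> Image:
    """Hypothetical stand-in: boosts near-saturated pixels."""
    return [v * 4.0 if v > 0.9 else v for v in env_map]


hdr_map = provide_hdr_environment_map(
    [0.2, 0.95, 0.5], toy_first_network, toy_second_network
)
```

Steps S405 and S410 would run before this function is called, so that both callables already hold trained parameters.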

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and software applications executed on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to execution of software. For ease of understanding, a single processing device is sometimes described as being used, but a person of ordinary skill in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may command a processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on a computer-readable recording medium.

The method according to an embodiment may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination, and the program instructions recorded on the medium may be specially designed and configured for the embodiment or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on them. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims set forth below.

100: HDR environment map providing apparatus
110: Database unit
120: Processing engine
124: First DNN network
128: Second DNN network
220: LDR image
230: Light mask
240: LDR environment map
330: HDR environment map

Claims

A method of providing a high dynamic range (HDR) environment map from a low dynamic range (LDR) image based on deep learning, the method comprising:
inputting the LDR image to a first deep neural network (DNN) network to output an LDR environment map; and
inputting the LDR environment map to a second DNN network to output an HDR environment map,
wherein the LDR environment map includes information indicating the position and intensity of a light source in the LDR image and information about the color of the LDR image,
wherein the HDR environment map has a higher dynamic range than the LDR environment map,
wherein the method further comprises, before the step of inputting the LDR image to the first DNN network to output the LDR environment map, training the first DNN network with training data including a training LDR image, and
wherein the step of training the first DNN network with the training data including the training LDR image comprises
inputting the training LDR image to the first DNN network so that a first classification class value, a first light mask, and a first LDR environment map are output a plurality of times while internal parameters of the first DNN network are updated a plurality of times, wherein the training data further includes a second classification class value, a second light mask, and a second LDR environment map corresponding to the training LDR image, the second classification class value indicates whether the training LDR image is an indoor image or an outdoor image, the second light mask includes information indicating the position and intensity of a light source in the training LDR image, and the second LDR environment map includes information indicating the position and intensity of the light source in the training LDR image and information about the color of the training LDR image.

(Deleted)

The method of claim 1,
wherein the step of training the first DNN network with the training data further comprises updating the internal parameters of the first DNN network a plurality of times so that a final loss function defined by the following equations approaches a minimum value:

wherein the quantities in the equations are: the final loss function; the loss function for the classification class; the first classification class value; the second classification class value; the loss function for the light mask; N, the total number of pixels of each of the first light mask, the second light mask, the first LDR environment map, and the second LDR environment map; i, an index indicating a pixel number; the i-th pixel value of the first light mask; the i-th pixel value of the second light mask; the loss function for the LDR environment map; the i-th pixel value of the first LDR environment map; the i-th pixel value of the second LDR environment map; a weight for the loss function for the LDR environment map; a weight for the loss function for the light mask; and a weight for the loss function for the classification class, the sum of the three weights being 1.

The method of claim 4,
wherein the first classification class value is a value between 0 and 1, and
wherein, when the training LDR image is an indoor image, the second classification class value is 0 or a value close to 0, and when the training LDR image is an outdoor image, the second classification class value is 1 or a value close to 1.

The method of claim 1,
further comprising, before the step of inputting the LDR image to the first DNN network to output the LDR environment map, training the second DNN network with training data including a training LDR environment map.

The method of claim 6,
wherein the step of training the second DNN network with the training data including the training LDR environment map comprises
inputting the training LDR environment map to the second DNN network so that a first HDR environment map is output a plurality of times while internal parameters of the second DNN network are updated a plurality of times, wherein the training data further includes a second HDR environment map corresponding to the training LDR environment map, and the second HDR environment map has a higher dynamic range than the training LDR environment map.

The method of claim 7,
wherein the step of training the second DNN network with the training data further comprises updating the internal parameters of the second DNN network a plurality of times so that a loss function for the HDR environment map defined by the following equation approaches a minimum value:

wherein the quantities in the equation are: the loss function for the HDR environment map; N, the total number of pixels of each of the first HDR environment map and the second HDR environment map; i, an index indicating a pixel number; the i-th pixel value of the first HDR environment map; and the i-th pixel value of the second HDR environment map.

An apparatus for providing an HDR environment map from an LDR image based on deep learning, the apparatus comprising:
a database unit for storing LDR images; and
a processing engine communicatively coupled to the database unit, implementing a first DNN network and a second DNN network, and configured to retrieve the LDR image from the database unit and control it to be input to the first DNN network,
wherein the first DNN network is configured to output an LDR environment map in response to the LDR image being input,
wherein the second DNN network is connected to the first DNN network and is configured to output an HDR environment map in response to the LDR environment map being input,
wherein the LDR environment map includes information indicating the position and intensity of a light source in the LDR image and information about the color of the LDR image,
wherein the HDR environment map has a higher dynamic range than the LDR environment map,
wherein the database unit further stores training data including a training LDR image,
wherein the processing engine is further configured to train the first DNN network with the training data, and
wherein the processing engine is further configured to input the training LDR image to the first DNN network and control the first DNN network so that a first classification class value, a first light mask, and a first LDR environment map are output a plurality of times while internal parameters of the first DNN network are updated a plurality of times, wherein the training data further includes a second classification class value, a second light mask, and a second LDR environment map corresponding to the training LDR image, the second classification class value indicates whether the training LDR image is an indoor image or an outdoor image, the second light mask includes information indicating the position and intensity of a light source in the training LDR image, and the second LDR environment map includes information indicating the position and intensity of the light source in the training LDR image and information about the color of the training LDR image.

(Deleted)

The apparatus of claim 9,
wherein the processing engine is further configured to control the internal parameters of the first DNN network to be updated a plurality of times so that a final loss function defined by the following equations approaches a minimum value:

wherein the quantities in the equations are: the final loss function; the loss function for the classification class; the first classification class value; the second classification class value; the loss function for the light mask; N, the total number of pixels of each of the first light mask, the second light mask, the first LDR environment map, and the second LDR environment map; i, an index indicating a pixel number; the i-th pixel value of the first light mask; the i-th pixel value of the second light mask; the loss function for the LDR environment map; the i-th pixel value of the first LDR environment map; the i-th pixel value of the second LDR environment map; a weight for the loss function for the LDR environment map; a weight for the loss function for the light mask; and a weight for the loss function for the classification class, the sum of the three weights being 1.

The apparatus of claim 12,
wherein the first classification class value is a value between 0 and 1, and
wherein, when the training LDR image is an indoor image, the second classification class value is 0 or a value close to 0, and when the training LDR image is an outdoor image, the second classification class value is 1 or a value close to 1.

The apparatus of claim 9,
wherein the database unit further stores training data including a training LDR environment map, and
wherein the processing engine is further configured to train the second DNN network with the training data.

The apparatus of claim 14,
wherein the processing engine is further configured to input the training LDR environment map to the second DNN network and control the second DNN network so that a first HDR environment map is output a plurality of times while internal parameters of the second DNN network are updated a plurality of times, wherein the training data further includes a second HDR environment map corresponding to the training LDR environment map, and the second HDR environment map has a higher dynamic range than the training LDR environment map.

The apparatus of claim 15,
wherein the processing engine is further configured to control the internal parameters of the second DNN network to be updated a plurality of times so that a loss function for the HDR environment map defined by the following equation approaches a minimum value:

wherein the quantities in the equation are: the loss function for the HDR environment map; N, the total number of pixels of each of the first HDR environment map and the second HDR environment map; i, an index indicating a pixel number; the i-th pixel value of the first HDR environment map; and the i-th pixel value of the second HDR environment map.

A computer program stored in a computer-readable recording medium for executing the method of any one of claims 1 and 4 to 8.