KR20210082901A

KR20210082901A - Method for raw-to-rgb mapping using two-stage u-net with misaligned data, recording medium and device for performing the method

Info

Publication number: KR20210082901A
Application number: KR1020190175328A
Authority: KR
Inventors: 고성제; 엄광현; 조성진
Original assignee: 고려대학교 산학협력단
Priority date: 2019-12-26
Filing date: 2019-12-26
Publication date: 2021-07-06
Also published as: KR102350610B1

Abstract

A method for raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure comprises the steps of: obtaining a misaligned data set of input raw images and output RGB images captured by different cameras; generating a first U-Net neural network model consisting of a plurality of pooling layers and a plurality of upsampling layers; training the first U-Net neural network model on the data set using a loss function defined as the sum of a pixel loss, a feature loss, and a color loss between an output image and a target image; generating a second U-Net neural network model consisting of a plurality of pooling layers and a plurality of upsampling layers; and, using the same loss function, training the second U-Net neural network model on the result learned by the first U-Net neural network model, thereby aligning the raw images and RGB images into data pairs. Accordingly, a convolutional neural network model can be trained to generate RGB images of DSLR-level quality from images obtained from a raw sensor.

Description

Method for raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure, and recording medium and device for performing the method

The present invention relates to a method for raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure, and to a recording medium and a device for performing the method. More particularly, it relates to a deep-learning-based technique for mapping a raw Bayer image obtained from a smartphone camera to an RGB image acquired with a DSLR camera.

[Description of state-funded R&D]

This research was carried out in 2019 with funding from the government (Ministry of Science and ICT) and support from the Institute of Information & Communications Technology Planning & Evaluation (No. 2014-3-00077, Development of global multi-target tracking and situation prediction technology based on large-scale real-time video analysis).

Smartphone cameras are advancing rapidly, but physical limitations (small sensor size, small lens size, hardware cost, etc.) mean that they produce images of lower quality than those captured with a DSLR camera.

Research on improving the quality of images captured by smartphone cameras has therefore been under way recently. Some smartphones support saving the raw image data acquired from the camera sensor. Such raw images are converted into RGB images by the image signal processor (ISP) built into the camera.

However, the prior art has used only RGB images to bring mobile camera images up to DSLR-level quality. An RGB image is raw sensor data that has already been processed by the ISP built into the camera, so data loss has already occurred.

US 8,508,612 B2
KR 10-1996730 B1
US 2017/0270122 A1

D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision (IJCV), 60(2):91-110, 2004.
E. Schwartz, R. Giryes and A. M. Bronstein, "DeepISP: Toward Learning an End-to-End Image Processing Pipeline," IEEE Transactions on Image Processing, vol. 28, no. 2, pp. 912-923, Feb. 2019.

Accordingly, the technical problem addressed by the present invention arises from the above, and an object of the present invention is to provide a method for raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure.

Another object of the present invention is to provide a recording medium on which a computer program for performing the method for raw-to-RGB mapping using misaligned data based on the two-stage U-Net structure is recorded.

Yet another object of the present invention is to provide a device for performing the method for raw-to-RGB mapping using misaligned data based on the two-stage U-Net structure.

A method for raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure, according to an embodiment for realizing the above object of the present invention, includes: obtaining a misaligned data set of input raw images and output RGB images from different cameras; generating a first U-Net neural network model consisting of a plurality of pooling layers and a plurality of upsampling layers; training the first U-Net neural network model on the data set using a loss function defined as the sum of a pixel loss, a feature loss, and a color loss between an output image and a target image; generating a second U-Net neural network model consisting of a plurality of pooling layers and a plurality of upsampling layers; and, using the loss function, training the second U-Net neural network model on the result learned through the first U-Net neural network model, thereby aligning the raw images and RGB images into data pairs.

In an embodiment of the present invention, the step of training through the first U-Net neural network model may include defining the loss function using the difference between the output image and the target image for the data set.

In an embodiment of the present invention, the step of defining the loss function may include: calculating a pixel loss, which is the pixel-wise difference between the output image and the target image for the data set; calculating a feature loss, which is a perceptual loss based on features at or above a preset level; and calculating a color loss from the cosine distance between RGB color vectors.

In an embodiment of the present invention, the step of calculating the feature loss may extract features from layers at or above ReLU (Rectified Linear Unit) 4.

In an embodiment of the present invention, the step of calculating the color loss from the cosine distance between RGB color vectors may use images reduced by a factor of 2.

In an embodiment of the present invention, the method for raw-to-RGB mapping using misaligned data based on the two-stage U-Net structure may further include outputting an RGB image from a raw image in real time, based on the learned data pairs of raw images and RGB images.

In an embodiment of the present invention, the first U-Net neural network model and the second U-Net neural network model may include a channel attention mechanism and a long skip connection.

A computer-readable storage medium according to an embodiment for realizing another object of the present invention has recorded on it a computer program for performing the method for raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure.

A device for raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure, according to an embodiment for realizing yet another object of the present invention, includes: a first U-Net neural network model consisting of a plurality of pooling layers and a plurality of upsampling layers; a second U-Net neural network model consisting of a plurality of pooling layers and a plurality of upsampling layers and connected in series with the first U-Net neural network model to form a W-Net structure; a loss function unit that defines a loss function as the sum of a pixel loss, a feature loss, and a color loss between an output image and a target image; and an image learning unit that uses the loss function to train the first U-Net neural network model and the second U-Net neural network model on a misaligned data set of input raw images and output RGB images from different cameras, thereby aligning the raw images and RGB images into data pairs.

In an embodiment of the present invention, the loss function unit may include: a pixel loss unit that calculates a pixel loss, which is the pixel-wise difference between the output image and the target image for the data set; a feature loss unit that calculates a feature loss, which is a perceptual loss based on features at or above a preset level; and a color loss unit that calculates a color loss from the cosine distance between RGB color vectors.

In an embodiment of the present invention, the feature loss unit may extract features from layers at or above ReLU (Rectified Linear Unit) 4, and the color loss unit may use images reduced by a factor of 2.

In an embodiment of the present invention, the device for raw-to-RGB mapping using misaligned data based on the two-stage U-Net structure may further include an image conversion unit that outputs an RGB image from a raw image in real time, based on the learned data pairs of raw images and RGB images.

According to this method for raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure, a U-Net neural network model is trained on images of the same scene captured with different cameras, and a further U-Net neural network model is then applied to its output and trained. Accordingly, an image captured with a smartphone can be converted to a quality comparable to that of a DSLR image.

FIG. 1 is a block diagram of a device for raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure according to an embodiment of the present invention.
FIG. 2 shows an example of a misaligned raw image and RGB image.
FIG. 3 is a conceptual diagram showing a single neural network structure used in the present invention.
FIG. 4 is a conceptual diagram showing the two-stage W-Net structure formed by the first U-Net neural network model and the second U-Net neural network model of the present invention.
FIG. 5 is a detailed block diagram of the loss function unit of FIG. 1.
FIG. 6 is a flowchart of a method for raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure according to an embodiment of the present invention.
FIG. 7 is a table comparing performance across improvements to the network structure of the present invention.
FIG. 8 is a table comparing performance across loss function combinations of the present invention.
FIG. 9 is a table comparing performance on the data set of the present invention.
FIG. 10 shows example results according to the present invention.

The detailed description of the present invention that follows refers to the accompanying drawings, which show by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the present invention differ from one another but need not be mutually exclusive. For example, a particular shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. It should also be understood that the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the following detailed description is not to be taken in a limiting sense, and the scope of the invention, if properly described, is limited only by the appended claims, along with the full range of equivalents to which those claims are entitled. Like reference numerals in the drawings refer to the same or similar functions throughout the several aspects.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the drawings.

FIG. 1 is a block diagram of a device for raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure according to an embodiment of the present invention.

The device for raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure according to the present invention (10, hereinafter "the device") is a learner that trains on misaligned data pairs in order to obtain RGB images of DSLR-level quality from raw images captured with a mobile device.

Referring to FIG. 1, the device 10 according to the present invention includes a first U-Net neural network model 200, a second U-Net neural network model 400, a loss function unit 100, and an image learning unit 500. In an embodiment, the device 10 may further include an image conversion unit (not shown).

The first U-Net neural network model 200 and the second U-Net neural network model 400 each consist of a plurality of pooling layers and a plurality of upsampling layers, and are connected to each other to form a W-Net structure.

The loss function unit 100 defines a loss function as the sum of a pixel loss, a feature loss, and a color loss between the output image and the target image.

The image learning unit 500 uses the loss function to train the first U-Net neural network model 200 and the second U-Net neural network model 400 on a misaligned data set of input raw images and output RGB images from different cameras, thereby aligning the raw images and RGB images into data pairs.

In an embodiment, the input raw image may be an image captured with a smartphone, and the output RGB image may be an image captured with a high-quality DSLR camera.

The image conversion unit may output an RGB image from a raw image in real time, based on the learned data pairs of raw images and RGB images. For example, a raw image captured with a smartphone can be output at the quality level of an image captured with a DSLR camera.

Software (an application) for performing raw-to-RGB mapping using misaligned data based on a two-stage U-Net structure may be installed and executed on the device 10 of the present invention, and the first U-Net neural network model 200, the second U-Net neural network model 400, the loss function unit 100, and the image learning unit 500 may be controlled by that software running on the device 10.

The device 10 may be a separate terminal or a module of a terminal. The first U-Net neural network model 200, the second U-Net neural network model 400, the loss function unit 100, and the image learning unit 500 may be formed as an integrated module or as one or more modules. Conversely, each component may be formed as a separate module.

The device 10 may be mobile or stationary. The device 10 may take the form of a server or an engine, and may be referred to by other terms such as device, apparatus, terminal, UE (user equipment), MS (mobile station), wireless device, or handheld device.

The device 10 may execute or produce various software on top of an operating system (OS). The operating system is a system program that allows software to use the hardware of the device, and may include mobile operating systems such as Android OS, iOS, Windows Mobile OS, Bada OS, Symbian OS, and BlackBerry OS, as well as computer operating systems such as the Windows family, the Linux family, the Unix family, Mac, AIX, and HP-UX.

The prior art has used only RGB images to bring mobile camera images up to DSLR-level quality. However, an RGB image is raw sensor data that has already been processed by the image signal processor (ISP) built into the camera, so data may already have been lost.

Therefore, the present invention trains a convolutional neural network model that takes images obtained from a raw sensor as input and generates RGB images of DSLR-level quality.

In addition, SIFT keypoint matching is generally used to align image pairs captured with different cameras, but this matching also contains errors, so the pairs are not perfectly aligned.

FIG. 2 shows an example of a misaligned raw image and RGB image. Referring to FIG. 2, the left image was captured with a smartphone and the right image with a DSLR camera. Both show the same scene, but a vertical shift is visible between them. Thus, when the same scene is captured with different cameras, the images cannot be exactly identical.

To address this problem with the data set, the present invention uses a neural network structure and a loss function that differ from existing ones. These are described in detail below with reference to the drawings.

FIG. 3 is a conceptual diagram showing a single neural network structure used in the present invention. FIG. 4 is a conceptual diagram showing the two-stage W-Net structure formed by the first U-Net neural network model and the second U-Net neural network model of the present invention.

First, the neural network structure used in the present invention is the U-Net structure, and two neural network models are used. That is, as shown in FIG. 4, a network is formed by connecting the first U-Net neural network model 200 and the second U-Net neural network model 400.

For convenience, FIG. 3 shows only one of the two networks. Referring to FIG. 3, U-Net consists of a plurality of pooling layers and upsampling layers, which makes it effective when training with misaligned data.
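The tolerance to misalignment mentioned above can be illustrated with a toy example. The sketch below is not the patented network: it uses hypothetical helper names (`max_pool_2x2`, `upsample_2x`, `shift_down`, `count_diff`) and plain Python lists to show that a one-row vertical shift, which changes two pixels at full resolution, can vanish after a single 2x2 max-pooling step.

```python
def max_pool_2x2(img):
    """2x2 max pooling with stride 2 on a 2D list (height and width must be even)."""
    return [
        [max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
         for c in range(0, len(img[0]), 2)]
        for r in range(0, len(img), 2)
    ]


def upsample_2x(img):
    """Nearest-neighbour 2x upsampling: every value becomes a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def shift_down(img, fill=0):
    """Shift the image down by one row (a toy vertical misalignment)."""
    return [[fill] * len(img[0])] + [list(r) for r in img[:-1]]


def count_diff(a, b):
    """Number of positions at which two equally sized 2D lists differ."""
    return sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x != y)


scene = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 9, 0],
    [0, 0, 0, 0],
]
shifted = shift_down(scene)

full_res_diff = count_diff(scene, shifted)
pooled_diff = count_diff(max_pool_2x2(scene), max_pool_2x2(shifted))
restored = upsample_2x(max_pool_2x2(scene))
```

The result depends on where the feature sits relative to the pooling grid: a shift that crosses a 2x2 block boundary still changes the pooled map, so pooling reduces sensitivity to small shifts rather than removing it.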

The present invention further modifies the U-Net structure to improve performance. To this end, a channel attention mechanism is applied, as shown in FIG. 3, so that more useful features are extracted (the CA-Convs part). In addition, a long skip connection is added to make the network easier to train.
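The text does not specify the exact form of the channel attention block or where the long skip connection attaches. As a rough illustration only, the sketch below assumes a simplified squeeze-and-excitation style gate (one hypothetical learnable weight and bias per channel instead of fully connected layers) and models the skip connection as an element-wise sum.

```python
import math


def channel_attention(features, weights, biases):
    """Scale each channel by a sigmoid gate computed from its global average.

    features: list of channels, each a 2D list of floats.
    weights, biases: one hypothetical learnable scalar per channel.
    """
    out = []
    for chan, w, b in zip(features, weights, biases):
        n = sum(len(row) for row in chan)
        mean = sum(sum(row) for row in chan) / n          # squeeze: global average
        gate = 1.0 / (1.0 + math.exp(-(w * mean + b)))    # excite: value in (0, 1)
        out.append([[gate * v for v in row] for row in chan])
    return out


def add_skip(x, y):
    """Element-wise sum of two feature stacks of identical shape (skip connection)."""
    return [[[a + b for a, b in zip(rx, ry)] for rx, ry in zip(cx, cy)]
            for cx, cy in zip(x, y)]


feats = [
    [[1.0, 1.0], [1.0, 1.0]],   # channel 0
    [[2.0, 2.0], [2.0, 2.0]],   # channel 1
]
# Zero weights and biases give a gate of exactly 0.5 for every channel.
attended = channel_attention(feats, weights=[0.0, 0.0], biases=[0.0, 0.0])
with_skip = add_skip(feats, attended)
```

In a trained network the gates would differ per channel, so informative channels are passed through more strongly than uninformative ones, while the skip path keeps the original signal available to later layers.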

As shown in FIG. 4, the present invention connects two identically improved U-Nets in series to form two stages, and names the result W-Net. In the training process, the first U-Net 200 is trained using the loss function proposed in the present invention, and after that training is complete, the second U-Net 400 is trained using the same loss function.
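The sequential training schedule can be sketched with toy models. In the example below, each "U-Net" is replaced by a hypothetical one-dimensional affine map trained by gradient descent on mean squared error, and the first stage is held fixed while the second is trained. Both the stand-in loss and the freezing of the first stage are assumptions made for illustration; the text only states that the second network is trained after the first.

```python
def train_affine(inputs, targets, steps=2000, lr=0.01):
    """Fit y ~ a*x + b by gradient descent on MSE (toy stand-in for one U-Net)."""
    a, b = 1.0, 0.0
    n = len(inputs)
    for _ in range(steps):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(inputs, targets)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(inputs, targets)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def apply_affine(params, inputs):
    a, b = params
    return [a * x + b for x in inputs]


def mse(pred, targets):
    return sum((p - t) ** 2 for p, t in zip(pred, targets)) / len(targets)


raw = [0.0, 1.0, 2.0, 3.0]          # toy "raw" inputs
rgb = [1.0, 3.0, 5.0, 7.0]          # toy "RGB" targets (2*x + 1)

# Stage 1: train the first model on (raw, rgb); stopped early on purpose.
stage1 = train_affine(raw, rgb, steps=50)
stage1_out = apply_affine(stage1, raw)

# Stage 2: train the second model on (stage-1 output, rgb) with the same loss.
stage2 = train_affine(stage1_out, rgb)
final = apply_affine(stage2, stage1_out)

# Inference chains the two stages: raw -> stage 1 -> stage 2.
error_after_stage1 = mse(stage1_out, rgb)
error_after_stage2 = mse(final, rgb)
```

Because the second model starts from the identity map, it can only refine what the first stage already produces, which is the intuition behind stacking the two stages.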

FIG. 5 is a detailed block diagram of the loss function unit of FIG. 1.

Referring to FIG. 5, the loss function unit 100 includes a pixel loss unit 110 that calculates a pixel loss, which is the pixel-wise difference between the output image and the target image for the data set; a feature loss unit 130 that calculates a feature loss, which is a perceptual loss based on features at or above a preset level; and a color loss unit 150 that calculates a color loss from the cosine distance between RGB color vectors.

In an embodiment, the loss function unit 100 may further include a loss synthesis unit 170 that defines the final loss function by summing the pixel loss, the feature loss, and the color loss.

As already described, the loss function proposed in the present invention is a combination of three losses. When comparing the output image of the neural network model with the target image, the present invention uses a loss function that is less sensitive to data alignment and more sensitive to color differences.

First, a basic pixel loss is used. The pixel loss is defined as in Equation 1 below.

[Equation 1]

However, because of the data misalignment, using the pixel loss alone produces blurry result images, so the following loss functions are used in addition.
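Why a pixel loss alone favors blur on misaligned pairs can be seen in a small example. The sketch below assumes a mean absolute difference as the pixel loss; the exact norm in Equation 1 is not reproduced in this text, so that choice, along with the function name `pixel_loss`, is an assumption.

```python
def pixel_loss(output, target):
    """Mean absolute difference between two equally sized 2D images (assumed form)."""
    total = 0.0
    count = 0
    for row_out, row_tgt in zip(output, target):
        for o, t in zip(row_out, row_tgt):
            total += abs(o - t)
            count += 1
    return total / count


# Target: sharp horizontal stripes.
target = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
# Candidate 1: the same sharp stripes, shifted down by one row.
sharp_but_shifted = [[1.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
# Candidate 2: a flat gray image with no detail at all.
flat_gray = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

loss_shifted = pixel_loss(sharp_but_shifted, target)
loss_gray = pixel_loss(flat_gray, target)
```

Under this loss the detail-free gray image scores better than the sharp but shifted one, so a network trained only on pixel differences against shifted targets is pushed toward averaged, blurry outputs.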

Second, a perceptual loss based on high-level features is used as the feature loss. Because high-level features are produced through several rounds of downsampling, they are less sensitive to data misalignment.

In the present invention, features may be extracted from the relu4_1 or relu5_1 layer of a pre-trained VGG-19. Using the feature loss together with the pixel loss yields sharper result images than using the pixel loss alone. Equation 2 below defines the feature loss.

[Equation 2]

Here, the feature-extraction symbol denotes the relu4_1 or relu5_1 layer of VGG-19.

Third, a color loss is used to learn the raw-to-RGB color conversion effectively. The color loss is defined as the cosine distance between RGB color vectors. Training the network with the color loss included yields result images with more accurate colors. Equation 3 below defines the color loss.

[Equation 3]

In Equation 3, the images are reduced by a factor of 2, and the subscript is the pixel index.
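A minimal sketch of such a color loss follows. It assumes 2x2 block averaging for the downscaling and an average of the per-pixel cosine distances; the text only states that the image is reduced by a factor of 2 and that the loss is a cosine distance between RGB vectors, so both choices and all function names are assumptions.

```python
import math


def downscale_2x(img):
    """Halve an RGB image by averaging each 2x2 block; img[r][c] is an (R, G, B) tuple."""
    out = []
    for r in range(0, len(img), 2):
        row = []
        for c in range(0, len(img[0]), 2):
            block = (img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
            row.append(tuple(sum(p[k] for p in block) / 4.0 for k in range(3)))
        out.append(row)
    return out


def cosine_distance(u, v, eps=1e-8):
    """1 - cos(angle) between two RGB vectors; eps avoids division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm + eps)


def color_loss(output, target):
    """Average cosine distance between corresponding pixels after 2x downscaling."""
    small_out, small_tgt = downscale_2x(output), downscale_2x(target)
    dists = [cosine_distance(po, pt)
             for ro, rt in zip(small_out, small_tgt) for po, pt in zip(ro, rt)]
    return sum(dists) / len(dists)


def solid(color):
    """2x2 image filled with one RGB color."""
    return [[color, color], [color, color]]


red, dim_red, green = (1.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 1.0, 0.0)
loss_brightness_only = color_loss(solid(dim_red), solid(red))
loss_wrong_hue = color_loss(solid(green), solid(red))
```

The cosine distance compares the direction of the color vectors and ignores their length, so a brightness change costs almost nothing while a hue change is penalized heavily. Working on the downscaled image also makes the comparison less dependent on exact pixel alignment.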

In summary, the loss function proposed in the present invention is defined as the sum of the pixel loss, the feature loss, and the color loss, and is expressed as Equation 4 below.

[Equation 4]
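Read literally, the description of the combined loss is an unweighted sum of the three terms. One way to write it, using hypothetical symbols that are not taken from the source equations, is:

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{pixel}} + \mathcal{L}_{\text{feature}} + \mathcal{L}_{\text{color}}
```

Whether the original equation applies any weighting factors to the individual terms cannot be determined from this text.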

이와 같이 본 발명에서는 새롭게 도출된 손실 함수를 이용하여 상기 제1 U-Net 신경망 모델(200)을 학습하고, 학습이 완료되면, 상기 제2 U-Net 신경망 모델(400)에서 동일한 손실 함수를 이용하여 학습한다.As described above, in the present invention, the first U-Net neural network model 200 is learned using the newly derived loss function, and when the learning is completed, the same loss function is used in the second U-Net neural network model 400 . to learn

이러한 학습 결과, raw 영상과 RGB 영상의 데이터 쌍(pair)의 정렬 데이터를 획득할 수 있고, 학습된 raw 영상과 RGB 영상의 데이터 쌍(pair)을 이용하여, 이후 실시간으로 raw 영상으로부터 RGB 영상을 출력할 수 있다.As a result of this learning, alignment data of a data pair of a raw image and an RGB image can be acquired, and an RGB image can be generated from the raw image in real time by using the learned data pair of the raw image and the RGB image. can be printed out.

도 6은 본 발명의 일 실시예에 따른 두 단계 U-Net 구조 기반의 어긋난 데이터를 이용한 raw에서 RGB로의 매핑 방법의 흐름도이다.6 is a flowchart of a raw-to-RGB mapping method using misaligned data based on a two-step U-Net structure according to an embodiment of the present invention.

본 실시예에 따른 두 단계 U-Net 구조 기반의 어긋난 데이터를 이용한 raw에서 RGB로의 매핑 방법은, 도 1의 장치(10)와 실질적으로 동일한 구성에서 진행될 수 있다. The raw-to-RGB mapping method using misaligned data based on the two-step U-Net structure according to the present embodiment may be performed in substantially the same configuration as the device 10 of FIG. 1 .

따라서, 도 1의 장치(10)와 동일한 구성요소는 동일한 도면부호를 부여하고, 반복되는 설명은 생략한다. 또한, 본 실시예에 따른 두 단계 U-Net 구조 기반의 어긋난 데이터를 이용한 raw에서 RGB로의 매핑 방법은 두 단계 U-Net 구조 기반의 어긋난 데이터를 이용한 raw에서 RGB로의 매핑을 위한 소프트웨어(애플리케이션)에 의해 실행될 수 있다.Accordingly, the same components as those of the device 10 of FIG. 1 are given the same reference numerals, and repeated descriptions are omitted. In addition, the mapping method from raw to RGB using mismatched data based on the two-step U-Net structure according to this embodiment is software (application) for mapping from raw to RGB using misaligned data based on the two-step U-Net structure. can be executed by

In the present invention, a convolutional neural network model is trained to generate an RGB image of DSLR-level quality, using an image obtained from a raw sensor as input.

Referring to FIG. 6, in the raw-to-RGB mapping method using misaligned data based on the two-step U-Net structure according to the present embodiment, a misaligned data set of input raw images and output RGB images is first obtained from different cameras (step S10).

A first U-Net neural network model consisting of a plurality of pooling layers and a plurality of upsampling layers is generated (step S20).

In the present invention, the first U-Net neural network model and the second U-Net neural network model each consist of a plurality of pooling layers and a plurality of upsampling layers, and are connected to each other to form a W-Net structure.

The present invention further modifies the U-Net structure to improve performance. To this end, a channel attention mechanism is applied, as shown in FIG. 3, so that more useful features are extracted (the CA-Convs part). In addition, a long skip connection is added to make the neural network easier to train.
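The idea behind channel attention can be illustrated with a small pure-Python sketch. The exact design of the CA-Convs block in FIG. 3 is not described in this passage, so the squeeze-and-gate form below, the function name `channel_attention`, and the parameters `gate_weights` and `gate_biases` are all assumptions made for illustration.

```python
import math


def channel_attention(feature_maps, gate_weights, gate_biases):
    """Scale each channel by a sigmoid gate computed from channel averages.

    feature_maps: list of C channels, each a list of rows of floats.
    gate_weights: C x C matrix mixing the channel averages (hypothetical).
    gate_biases:  list of C biases (hypothetical).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    means = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Gate: one linear layer followed by a sigmoid yields values in (0, 1).
    gates = []
    for weight_row, bias in zip(gate_weights, gate_biases):
        z = sum(w * m for w, m in zip(weight_row, means)) + bias
        gates.append(1.0 / (1.0 + math.exp(-z)))
    # Rescale: every value in a channel is multiplied by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates
```

With learned gate parameters, channels that carry more useful information would receive gates closer to 1 and the others would be attenuated.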

First, the data set is learned through the first U-Net neural network model using a loss function defined as the sum of the pixel loss, the feature loss, and the color loss between an output image and a target image (step S30).

The loss function proposed in the present invention consists of a combination of three losses. In the present invention, when the output image of the neural network model is compared with the target image, a loss function is used that is less sensitive to data alignment and more sensitive to color differences.

First, a basic pixel loss is used. The pixel loss is defined as in Equation 1 above. However, because of the data misalignment, using the pixel loss alone yields blurry result images, so the following losses are additionally used.
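The sensitivity of a pixel loss to misalignment can be seen in a small sketch. Equation 1 is not reproduced in this section, so the mean absolute difference used here is an assumption, as are the function name `pixel_loss` and the toy images.

```python
def pixel_loss(output, target):
    """Mean absolute per-pixel difference between two equally sized images."""
    total, count = 0.0, 0
    for out_row, tgt_row in zip(output, target):
        for out_value, tgt_value in zip(out_row, tgt_row):
            total += abs(out_value - tgt_value)
            count += 1
    return total / count


# Toy grayscale images: the same vertical edge, shifted by one pixel.
edge = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
shifted_edge = [[0.0, 1.0, 1.0, 1.0], [0.0, 1.0, 1.0, 1.0]]

aligned_loss = pixel_loss(edge, edge)             # 0.0
misaligned_loss = pixel_loss(edge, shifted_edge)  # 0.25
```

The shifted pair contains the same content yet is penalized, which is consistent with the blurring described above: a network trained only on this loss is pushed toward averaging over possible edge positions.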

Second, a feature loss is used. In the present invention, features may be extracted from the relu4_1 or relu5_1 layer of a pre-trained VGG-19. Using the feature loss together with the pixel loss yields sharper result images than using the pixel loss alone. Equation 2 above defines the feature loss.
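The structure of a feature loss can be sketched as follows. Equation 2 is not reproduced in this section, so the mean squared difference is an assumption, and `toy_extractor` is a hypothetical stand-in for the pre-trained VGG-19 layer, used only so that the example runs without a deep learning library.

```python
def feature_loss(output, target, extract_features):
    """Mean squared difference between feature vectors of two images."""
    out_features = extract_features(output)
    tgt_features = extract_features(target)
    squared = [(a - b) ** 2 for a, b in zip(out_features, tgt_features)]
    return sum(squared) / len(squared)


def toy_extractor(image):
    """Hypothetical stand-in for a network layer: one mean value per row."""
    return [sum(row) / len(row) for row in image]


image_a = [[0.0, 0.0], [2.0, 2.0]]
image_b = [[1.0, 1.0], [2.0, 2.0]]
same_loss = feature_loss(image_a, image_a, toy_extractor)  # 0.0
diff_loss = feature_loss(image_a, image_b, toy_extractor)  # 0.5
```

In practice, `extract_features` would return the activations of the chosen VGG-19 layer, which summarize larger image regions and are therefore less affected by small spatial shifts than individual pixels.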

Third, a color loss is used to effectively learn the color conversion from raw to RGB. The color loss is defined as the cosine distance between RGB color vectors. Training the neural network with the color loss added yields result images with more accurate colors. Equation 3 above defines the color loss.
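A cosine-distance color loss can be sketched in a few lines. Equation 3 is not reproduced in this section, so the form "one minus cosine similarity, averaged over pixels" is an assumption, as are the function names and the `eps` guard against zero-length vectors.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """One minus the cosine similarity of two color vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / max(norm_u * norm_v, eps)


def color_loss(output, target):
    """Mean cosine distance over pixels; images are rows of (R, G, B) tuples."""
    distances = [
        cosine_distance(out_pixel, tgt_pixel)
        for out_row, tgt_row in zip(output, target)
        for out_pixel, tgt_pixel in zip(out_row, tgt_row)
    ]
    return sum(distances) / len(distances)
```

Because only the angle between color vectors matters, a pixel that is merely darker or brighter than its target incurs almost no penalty, while a hue shift does. The claims additionally state that the image is reduced by a factor of 2 for this loss; that step is omitted from the sketch.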

In conclusion, the loss function proposed in the present invention is defined as the sum of the pixel loss, the feature loss, and the color loss, and is expressed as Equation 4 above.

A second U-Net neural network model consisting of a plurality of pooling layers and a plurality of upsampling layers is generated (step S50).

The first U-Net neural network model is trained using the newly derived loss function, and once that training is complete, the second U-Net neural network model is trained using the same loss function.

In other words, using the loss function, the result learned through the first U-Net neural network model is learned through the second U-Net neural network model, and the raw images and RGB images are aligned into data pairs (step S60).

Accordingly, based on the learned data pairs of raw and RGB images, an RGB image can be output from a raw image in real time. For example, a raw image captured with a smartphone can be output at the quality level of an image captured with a DSLR camera.
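The two-stage schedule can be sketched schematically. The sketch assumes that the first model is held fixed while the second is trained on its outputs, which this passage does not state explicitly; `GainModel` and `fit_gain` are hypothetical toy stand-ins for a U-Net and its training loop.

```python
def train_two_stage(first_model, second_model, raw_images, rgb_targets, fit):
    """Train the first model, then train the second on the first's outputs."""
    # Stage 1: the first model learns the raw-to-RGB mapping directly.
    fit(first_model, raw_images, rgb_targets)
    # Stage 2: the second model refines the outputs of the trained first model.
    stage_one_outputs = [first_model.predict(raw) for raw in raw_images]
    fit(second_model, stage_one_outputs, rgb_targets)
    # Chaining both stages gives the combined W-shaped mapping.
    return lambda raw: second_model.predict(first_model.predict(raw))


class GainModel:
    """Toy one-parameter model standing in for a U-Net (hypothetical)."""

    def __init__(self):
        self.gain = 1.0

    def predict(self, value):
        return self.gain * value


def fit_gain(model, inputs, targets):
    """Closed-form least-squares fit of the single gain parameter."""
    model.gain = sum(i * t for i, t in zip(inputs, targets)) / sum(
        i * i for i in inputs
    )


w_net = train_two_stage(GainModel(), GainModel(), [1.0, 2.0], [2.0, 4.0], fit_gain)
```

In a real implementation, `fit` would minimize the combined pixel, feature, and color loss by gradient descent over image batches.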

Hereinafter, experimental results for evaluating the performance of the present invention are described.

An RGB image with DSLR-level quality is obtained from an input raw image captured with a smartphone. The neural network model developed in the present invention was trained and tested on the Zurich RAW2RGB (ZRR) dataset released for the Advances in Image Manipulation (AIM) raw-to-RGB mapping challenge at the International Conference on Computer Vision (ICCV). Peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are used as the performance metrics.
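For reference, PSNR follows the standard definition 10 · log10(MAX² / MSE). The sketch below assumes single-channel images stored as nested lists and an 8-bit peak value of 255; the function name and the `max_value` parameter are illustrative choices, not taken from the source.

```python
import math


def psnr(output, target, max_value=255.0):
    """Peak signal-to-noise ratio in decibels between two equally sized images."""
    squared_errors = [
        (out_value - tgt_value) ** 2
        for out_row, tgt_row in zip(output, target)
        for out_value, tgt_value in zip(out_row, tgt_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images have no noise
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate that the output is numerically closer to the target. SSIM, the second metric, compares local luminance, contrast, and structure and is not sketched here.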

FIG. 7 shows the performance improvement obtained from the network structure enhancements, and FIG. 8 shows the performance improvement obtained from the combination of loss functions. These results confirm that the neural network structure and loss function proposed in the present invention considerably improve performance.

FIG. 9 is a table showing a performance comparison on the data set used in the present invention.

Referring to FIG. 9, the present invention achieved higher performance than the other techniques that participated in the challenge, with the highest PSNR and SSIM. On another metric, the mean opinion score (MOS), which is a subjective evaluation by human raters, the present invention placed second, a narrow margin behind first place.

FIG. 10 shows example results according to the present invention, confirming that an RGB image of DSLR-level quality is obtained from a raw image captured with a mobile device.

The raw-to-RGB mapping method using misaligned data based on the two-step U-Net structure described above may be implemented as an application, or in the form of program instructions executable by various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the computer software field.

Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to embodiments, those skilled in the art will understand that it may be modified and changed in various ways without departing from the spirit and scope of the present invention as set forth in the claims below.

The present invention can obtain aligned data from misaligned data pairs of raw and RGB images and, using the learned data pairs of raw and RGB images, can subsequently output a DSLR-level image from a smartphone image captured in real time.

10: raw-to-RGB mapping device
100: loss function unit
200: first U-Net neural network model
400: second U-Net neural network model
500: image learning unit
110: pixel loss unit
130: feature loss unit
150: color loss unit
170: loss combining unit

Claims

A raw-to-RGB mapping method using misaligned data based on a two-step U-Net structure, the method comprising:
obtaining a misaligned data set of input raw images and output RGB images from different cameras;
generating a first U-Net neural network model consisting of a plurality of pooling layers and a plurality of upsampling layers;
learning the data set through the first U-Net neural network model using a loss function defined as the sum of a pixel loss, a feature loss, and a color loss between an output image and a target image;
generating a second U-Net neural network model consisting of a plurality of pooling layers and a plurality of upsampling layers; and
using the loss function, learning, through the second U-Net neural network model, the result learned through the first U-Net neural network model, and aligning the raw images and the RGB images into data pairs.

The method of claim 1, wherein the learning through the first U-Net neural network model comprises defining the loss function using the difference between the output image and the target image in the data set.

The method of claim 2, wherein defining the loss function comprises:
calculating a pixel loss, which is the pixel difference between an output image and a target image, using the data set;
calculating a feature loss that extracts features from a perceptual loss based on features above a preset level; and
calculating a color loss from the cosine distance between RGB color vectors.

The method of claim 3, wherein calculating the feature loss that extracts features from the perceptual loss based on features above the preset level comprises extracting features from a ReLU (Rectified Linear Unit) 4 or higher layer.

The method of claim 3, wherein calculating the color loss from the cosine distance between the RGB color vectors comprises reducing the image by a factor of 2.

The method of claim 1, further comprising outputting an RGB image from a raw image in real time based on the learned data pairs of raw and RGB images.

The method of claim 1, wherein the first U-Net neural network model and the second U-Net neural network model include a channel attention mechanism and a long skip connection.

A computer-readable storage medium on which is recorded a computer program for performing the raw-to-RGB mapping method using misaligned data based on a two-step U-Net structure according to any one of claims 1 to 7.

A raw-to-RGB mapping device using misaligned data based on a two-step U-Net structure, the device comprising:
a first U-Net neural network model consisting of a plurality of pooling layers and a plurality of upsampling layers;
a second U-Net neural network model consisting of a plurality of pooling layers and a plurality of upsampling layers, and connected to the first U-Net neural network model to form a W-Net structure;
a loss function unit that defines a loss function as the sum of a pixel loss, a feature loss, and a color loss between an output image and a target image; and
an image learning unit that, using the loss function, learns a misaligned data set of input raw images and output RGB images from different cameras through the first U-Net neural network model and the second U-Net neural network model, and aligns the data pairs.

The device of claim 9, wherein the loss function unit comprises:
a pixel loss unit that calculates a pixel loss, which is the pixel difference between an output image and a target image, using the data set;
a feature loss unit that calculates a feature loss that extracts features from a perceptual loss based on features above a preset level; and
a color loss unit that calculates a color loss from the cosine distance between RGB color vectors.

The device of claim 10, wherein the feature loss unit extracts features from a ReLU (Rectified Linear Unit) 4 or higher layer, and the color loss unit reduces the image by a factor of 2.

The device of claim 9, further comprising an image conversion unit that outputs an RGB image from a raw image in real time based on the learned data pairs of raw and RGB images.

The device of claim 9, wherein the first U-Net neural network model and the second U-Net neural network model include a channel attention mechanism and a long skip connection.