KR102083721B1

KR102083721B1 - Stereo Super-Resolution Imaging Method using Deep Convolutional Networks and Apparatus Therefor

Info

Publication number: KR102083721B1
Application number: KR1020180026275A
Authority: KR
Inventors: 김민혁; 전석준
Original assignee: 한국과학기술원
Priority date: 2018-03-06
Filing date: 2018-03-06
Publication date: 2020-03-02
Also published as: KR20190105769A

Abstract

Disclosed are a stereo super-resolution imaging method using deep learning and an apparatus therefor. A stereo super-resolution imaging method according to an embodiment of the present invention comprises: acquiring stereo images; and generating a super-resolution image from the stereo images using a trained deep neural network. Generating the super-resolution image comprises: reconstructing a high-resolution luminance image for the input stereo images using a first deep neural network that has learned a mapping, with respect to a preset first parameter, between stereo images and a high-resolution luminance image; and generating a super-resolution image for the reconstructed high-resolution luminance image using a second deep neural network that has learned a mapping, with respect to a preset second parameter, between a high-resolution image and a super-resolution image.

Description

Stereo Super-Resolution Imaging Method Using Deep Learning and Apparatus Therefor {Stereo Super-Resolution Imaging Method using Deep Convolutional Networks and Apparatus Therefor}

The present invention relates to stereo super-resolution imaging technology and, more specifically, to a stereo super-resolution imaging method and apparatus that can improve image resolution in a binocular (stereo) system using deep learning.

With the recent development of mobile photography, dual cameras are now commonly mounted on mobile devices. Because a dual camera captures twice as many images (a stereo pair) as a single camera, a super-resolution algorithm should be able to reconstruct a higher-resolution image from it than a single-image super-resolution method can.

However, existing stereo super-resolution methods do not improve image resolution beyond what recent single-image super-resolution methods achieve, so the advantage of dual cameras for the super-resolution problem has not been fully exploited.

Conventional super-resolution algorithms reconstruct a high-resolution image from low-resolution source images obtained by jittered subsampling with sub-pixel precision. Multiple shots or video frames of the same resolution are used as input.

However, because stereo images are not identical owing to parallax, it is not easy to combine a stereo pair to obtain a super-resolution image. In more detail, the depth obtained from a depth-from-stereo (DFS) algorithm can be used to register pixels of the left camera to the corresponding pixel positions of the right camera, so in principle conventional super-resolution algorithms should be applicable to a dual-camera setup. In practice, however, the accuracy of the disparity obtained from stereo imaging is considerably lower than the sub-pixel precision required for super-resolution reconstruction.

Recent super-resolution methods have attempted to produce high-resolution images from a single camera input using self-similar patches within the source image, the image itself with joint learning, convolutional neural networks (CNNs), and so on. However, these methods cannot be applied directly to stereo pairs because of parallax.

Accordingly, there is a need for a stereo super-resolution imaging method that can improve image resolution in a stereo system.

Embodiments of the present invention provide a stereo super-resolution imaging method and apparatus that can improve image resolution in a binocular (stereo) system using deep learning.

A stereo super-resolution imaging method according to an embodiment of the present invention comprises: acquiring stereo images; and generating a super-resolution image from the stereo images using a trained deep neural network.

Generating the super-resolution image may comprise: reconstructing a high-resolution luminance image for the input stereo images using a first deep neural network that has learned a mapping, with respect to a preset first parameter, between stereo images and a high-resolution luminance image; and generating a super-resolution image for the reconstructed high-resolution luminance image using a second deep neural network that has learned a mapping, with respect to a preset second parameter, between a high-resolution image and a super-resolution image.

Reconstructing the high-resolution luminance image may comprise converting each of the stereo images into a luminance image, generating a plurality of parallax-shifted luminance images from one of the converted luminance images, concatenating the other converted luminance image with the generated shifted luminance images, and reconstructing the high-resolution luminance image using the first deep neural network with the concatenated luminance images as input.

The first deep neural network may learn a luminance mapping between the parallax shifts of the stereo images and the high-resolution luminance image.

Generating the super-resolution image may comprise concatenating the reconstructed high-resolution luminance image with the chrominance components of each of the stereo images, and generating the super-resolution image through the second deep neural network using a semi-residual learning technique.

The second deep neural network may generate the super-resolution image without a rectified linear unit (ReLU) activation function in the last of its convolutional layers.

The mapping for the first parameter may include a luminance mapping, and the mapping for the second parameter may include a chrominance mapping.

A stereo super-resolution imaging apparatus according to an embodiment of the present invention comprises: an acquisition unit that acquires stereo images; and a generation unit that generates a super-resolution image from the stereo images using a trained deep neural network.

The generation unit comprises: a first deep neural network unit that learns a mapping, with respect to a preset first parameter, between stereo images and a high-resolution luminance image and reconstructs a high-resolution luminance image for the input stereo images; and a second deep neural network unit that learns a mapping, with respect to a preset second parameter, between a high-resolution image and a super-resolution image and generates a super-resolution image for the reconstructed high-resolution luminance image.

The first deep neural network unit may convert each of the stereo images into a luminance image, generate a plurality of parallax-shifted luminance images from one of the converted luminance images, concatenate the other converted luminance image with the generated shifted luminance images, and reconstruct the high-resolution luminance image with the concatenated luminance images as input.

The first deep neural network unit may learn a luminance mapping between the parallax shifts of the stereo images and the high-resolution luminance image.

The second deep neural network unit may concatenate the reconstructed high-resolution luminance image with the chrominance components of each of the stereo images and generate the super-resolution image using a semi-residual learning technique.

The second deep neural network unit may generate the super-resolution image without a rectified linear unit (ReLU) activation function in the last of the convolutional layers that make up the second deep neural network.

According to embodiments of the present invention, image resolution in a binocular (stereo) system can be improved using deep learning.

According to embodiments of the present invention, introducing deep learning into the super-resolution algorithm makes it possible to obtain high-resolution images, and because resolving power is improved even with a small number of input images, the method can be used effectively in consumer cameras as well.

The present invention can be used in fields where cameras are used, for example not only cameras for photography but also inspection equipment that requires high-resolution images, and it is particularly easy to apply to mobile and consumer cameras, where dual cameras are increasingly common. That is, the invention makes it possible to obtain, with a dual camera in a mobile environment, images of higher resolution than was previously attainable, and thus to take sharper photographs than before.

FIG. 1 is an exemplary diagram illustrating the combination of two stereo images with parallax.
FIG. 2 shows a conceptual configuration of the luminance super-resolution network.
FIG. 3 shows a conceptual configuration of the chrominance super-resolution network.
FIG. 4 is an exemplary diagram comparing super-resolution performance between learning from a stereo pair and learning from a stereo tensor.
FIG. 5 is an exemplary diagram comparing the effect on training of the activation function in the last layer, which computes the residual, of the luminance super-resolution network.
FIG. 6 is an exemplary diagram comparing bicubic interpolation of the chrominance channels with the chrominance network of the present invention.
FIG. 7 is an exemplary diagram comparing a conventional residual learning method with the method according to the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by the embodiments. Like reference numerals in the drawings denote like elements.

Embodiments of the present invention provide a super-resolution imaging technique that uses deep learning in a binocular (stereo) system; the gist is to obtain high-resolution images by introducing deep learning into the super-resolution algorithm.

The key intuition of the present invention is that disparity is measured as discrete samples of a continuous parallax shift. Instead of estimating disparity or depth from stereo, the invention implements a deep neural network that learns the mapping between discrete samples of the continuous parallax shift and the super-resolution image.

The deep neural network of the present invention consists of two deep CNNs and can learn an end-to-end mapping. The first deep CNN learns a luminance mapping from a pair of low-resolution stereo images to a super-resolution luminance image, and the second deep CNN learns a chrominance mapping from the low-resolution chrominance, together with the reconstructed super-resolution luminance, to a high-resolution color image.

The present invention thus exploits stereo input, which has not yet been fully utilized for super-resolution, to reconstruct a higher-resolution image from stereo input than single-image super-resolution methods can.

The present invention avoids computing per-pixel discrete disparity from a stereo pair. Instead, it uses deep neural networks to directly learn an end-to-end mapping from a given stereo pair of low-resolution images to a high-resolution output image.

For example, as shown in FIG. 1, which illustrates combining two stereo images with parallax, parallax arises from perspective projection between the two stereo images. Because of the geometric difference between parallax under perspective projection and similarity under an image transformation such as an affine transformation, the continuous shift due to parallax exists at sub-pixel resolution in the horizontal and vertical displacements of the stereo pair. The deep CNN of the present invention improves image resolution by exploiting this parallax-induced sub-pixel shift to reconstruct a high-resolution image without estimating depth or disparity.

The present invention formulates a deep neural network, for example a deep CNN, that learns the mapping between a low-resolution stereo image pair and a high-resolution image.

First, instead of using the usual RGB color channels, RGB colors are converted into YCbCr coefficients in the form of a 3D tensor of size $H \times W \times C$, where $H$ is the height, $W$ is the width, and $C$ is the number of color channels of the input image. The training dataset $\{X^{(i)}, Y^{(i)}\}_{i=1}^{N}$ is assumed to be given as $N$ low-resolution (LR) stereo image pairs $X = \{X_1, X_2\}$ and ground-truth high-resolution images $Y$, where $X_1$ is the left reference color image of the stereo pair and $X_2$ is the right image. The source low-resolution stereo images are upsampled using bicubic interpolation so that their resolution matches that of the high-resolution image.
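The color conversion step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the patent does not say which YCbCr variant is used, so the full-range ITU-R BT.601 coefficients (the JPEG convention) are assumed here, and the function name `rgb_to_ycbcr` is hypothetical.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an H x W x 3 RGB array with values in [0, 1] to YCbCr.

    Assumption: full-range ITU-R BT.601 coefficients (JPEG convention);
    the source text does not specify the YCbCr variant.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b               # luminance
    cb = 0.5 - 0.168736 * r - 0.331264 * g + 0.5 * b    # blue-difference chrominance
    cr = 0.5 + 0.5 * r - 0.418688 * g - 0.081312 * b    # red-difference chrominance
    return np.stack([y, cb, cr], axis=-1)               # H x W x C with C = 3

# A uniform mid-gray image carries no color, so Cb and Cr sit at the neutral value 0.5.
gray = np.full((2, 2, 3), 0.5)
ycbcr = rgb_to_ycbcr(gray)
```

The first channel of the result is what the luminance network would consume, and the remaining two channels are the chrominance inputs of the second network.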

The main purpose of the stereo super-resolution network is to learn a model $F$ that predicts a high-resolution image $\hat{Y} = F(X)$ from a given input $X$. The stereo super-resolution model of the present invention may consist of two deep CNNs, for example a luminance super-resolution network and a chrominance super-resolution network.

Luminance super-resolution network (or luminance super-resolution deep CNN): As in conventional single-image super-resolution methods, defining a luminance residual image $r_L = y_L - x_{1,L}$ allows the residual between the high-resolution and low-resolution images to be learned for a single camera. Here, $y_L$ is the high-resolution luminance image and $x_{1,L}$ is one of the low-resolution luminance stereo images.

Because the present invention uses the stereo input of a dual camera, the network takes both stereo images as input and learns the parallax shift in order to learn the residual for one side of the stereo pair.

The first network, the luminance super-resolution network, learns over the training dataset the residual $r_L$ between the high-resolution luminance $y_L$ and the low-resolution luminance stereo images $x_L$. To account for the sub-pixel shift due to parallax, the left and right images are repacked as in Equation 1 to produce a combined stereo image tensor $x_L$.

[Equation 1]

$$x_L = \left[\, x_{1,L},\; x_{2,L}^{s_1},\; x_{2,L}^{s_2},\; \dots,\; x_{2,L}^{s_M} \,\right]$$

Here, $s_j$ is the shift offset of the $j$-th layer, $M$ is the number of shifts, and $x_{2,L}^{s_j}$ denotes the right luminance image shifted by $s_j$.
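The repacking described for Equation 1 can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the function name `build_stereo_tensor`, the integer one-pixel shift offsets, the rightward shift direction, and the zero fill at the image border are all assumptions made for the example.

```python
import numpy as np

def build_stereo_tensor(left_y, right_y, num_shifts):
    """Stack the left luminance image with shifted copies of the right one.

    left_y, right_y: H x W luminance images of equal size.
    num_shifts: M, the number of shifted right images (offsets 1..M pixels, assumed).
    Returns an H x W x (M + 1) tensor.
    """
    layers = [left_y]
    for s in range(1, num_shifts + 1):
        shifted = np.zeros_like(right_y)
        # Move the right image's content s pixels to the right (assumed direction),
        # leaving zeros in the columns that have no source pixel.
        shifted[:, s:] = right_y[:, :-s]
        layers.append(shifted)
    return np.stack(layers, axis=-1)

left = np.ones((4, 8))
right = np.arange(32, dtype=float).reshape(4, 8)
tensor = build_stereo_tensor(left, right, num_shifts=3)
```

Each channel after the first holds the same right-view content at a different horizontal offset, so a convolution over the channel axis sees the left patch next to several candidate alignments of the right view without any explicit disparity estimate.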

The present invention minimizes the mean squared error (MSE) between the true luminance residual $r_L$ and the prediction $f(x_L)$, where $x_L$ is the stereo luminance tensor. Here, $f$ predicts the high-resolution luminance residual $\hat{r}_L$ for the low-resolution luminance image $x_{1,L}$: $f$ estimates $\hat{r}_L$ from the given $x_L$, reducing the input dimension through the network.

Once the residual prediction model $f$ has been trained, the high-resolution luminance image $\hat{y}_L$ can be reconstructed by adding the predicted residual to the left low-resolution luminance input, as in Equation 2.

[Equation 2]

$$\hat{y}_L = x_{1,L} + f(x_L)$$

Here, the function $f$ may comprise multiple sub-layers, each a convolutional layer $f_k$.
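The residual objective and the reconstruction of Equation 2 can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the trained network $f$ is replaced by a hand-supplied residual array, and the MSE is taken as a plain mean over pixels.

```python
import numpy as np

def luminance_residual_loss(predicted_residual, hr_luminance, lr_luminance_left):
    """Mean squared error between the true residual (HR minus left LR) and the prediction."""
    target = hr_luminance - lr_luminance_left
    return float(np.mean((target - predicted_residual) ** 2))

def reconstruct_luminance(lr_luminance_left, predicted_residual):
    """Equation 2: add the predicted residual to the (upsampled) left luminance input."""
    return lr_luminance_left + predicted_residual

x1 = np.array([[0.2, 0.4], [0.6, 0.8]])      # upsampled left LR luminance
y = np.array([[0.25, 0.35], [0.65, 0.75]])   # ground-truth HR luminance
perfect_residual = y - x1                     # what an ideal f would output

loss_perfect = luminance_residual_loss(perfect_residual, y, x1)
loss_zero = luminance_residual_loss(np.zeros_like(x1), y, x1)
y_hat = reconstruct_luminance(x1, perfect_residual)
```

Predicting a zero residual simply returns the upsampled input, so the loss measures how much detail the network adds beyond interpolation.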

The $k$-th convolutional layer $f_k(Z)$ can be expressed as Equation 3.

[Equation 3]

$$f_k(Z) = \max\left(0,\; W_k * Z + b_k\right)$$

Here, $W_k$ and $b_k$ are the convolution filter and bias of the $k$-th layer, $*$ denotes convolution, and $\max(0, \cdot)$ is the rectified linear unit (ReLU).

When the depth of the $k$-th layer is $n_k$, the kernel $W_k$ has size $n_{k-1} \times 3 \times 3 \times n_k$ and the bias $b_k$ has $n_k$ entries.
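A single layer of the form described for Equation 3 can be sketched in NumPy as follows. Several details are assumptions for illustration: the function name `conv3x3`, zero padding that keeps the spatial size, stride 1, and the deep-learning convention in which "convolution" is computed as cross-correlation. The kernel layout follows the input-channels x 3 x 3 x output-channels ordering used for the filter sizes in this document.

```python
import numpy as np

def conv3x3(z, w, b, apply_relu=True):
    """One 3x3 convolutional layer, optionally followed by ReLU.

    z: H x W x n_in feature map.
    w: n_in x 3 x 3 x n_out kernel.
    b: bias with n_out entries.
    """
    height, width, _ = z.shape
    padded = np.pad(z, ((1, 1), (1, 1), (0, 0)))        # zero padding (assumed)
    out = np.zeros((height, width, w.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            patch = padded[dy:dy + height, dx:dx + width, :]   # H x W x n_in
            out += patch @ w[:, dy, dx, :]                      # mix channels per tap
    out += b
    return np.maximum(out, 0.0) if apply_relu else out

rng = np.random.default_rng(0)
z = rng.standard_normal((5, 6, 2))
w_identity = np.zeros((2, 3, 3, 2))
w_identity[0, 1, 1, 0] = 1.0     # center tap passes channel 0 through
w_identity[1, 1, 1, 1] = 1.0     # center tap passes channel 1 through
with_relu = conv3x3(z, w_identity, np.zeros(2))
without_relu = conv3x3(z, w_identity, np.zeros(2), apply_relu=False)
```

With an identity kernel the layer reduces to ReLU applied to the input, which makes the effect of the activation easy to see: negative values survive only when `apply_relu` is false.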

Chrominance super-resolution network (or chrominance super-resolution deep CNN): The input to the chrominance super-resolution network is the reconstructed high-resolution luminance image $\hat{y}_L$ and the chrominance upsampled from the Cb and Cr channels of the low-resolution stereo image, $x_{Cb}$ and $x_{Cr}$. To construct the training dataset $\{x_C^{(i)}, Y^{(i)}\}$, the three input channels $x_C$ are concatenated as in Equation 4.

[Equation 4]

$$x_C = \left[\, \hat{y}_L,\; x_{Cb},\; x_{Cr} \,\right]$$

Here, $x_{Cb}$ and $x_{Cr}$ denote $x_{1,Cb}$ and $x_{1,Cr}$.

The main purpose of the chrominance super-resolution network is to reconstruct a high-resolution color image. The chrominance super-resolution network learns the residual between the high-resolution color image and the half-way-through low-resolution color image in stereo (high-resolution luminance combined with low-resolution chrominance). To this end, the present invention defines a chrominance residual image $r_C = Y - x_C$ and minimizes the mean squared error between $r_C$ and $g(x_C)$. Here, the function $g$ predicts the residual of the high-resolution colors from the combined input image $x_C$ of low-resolution chrominance and high-resolution luminance.

Unlike conventional residual learning, the present invention follows a semi-residual learning technique with an additional convolutional layer. For example, as shown in FIG. 3, the input consists of the high-resolution luminance image reconstructed by the luminance super-resolution network and the Cb and Cr components upsampled from the low-resolution image, which are concatenated. Instead of adding the residual prediction to the low-resolution image, the original image and the residual are concatenated and passed to a final convolutional layer. This semi-residual learning method improves accuracy.

Rather than the simple summation of conventional residual learning, the present invention combines the predicted residual $r$ with the fast-forwarded identity $x_C$ (the input colors) in the form of a convolutional layer. To compute their convolution, the predicted residual and the input colors are concatenated as in Equation 5.

[Equation 5]

$$z = \left[\, g(x_C),\; x_C \,\right]$$

Here, $z$ has $c$ channels and the predicted residual $g(x_C)$ has $c' = c - 3$ channels.

The last sub-layer $h$ takes as input the concatenation $z$ of the predicted residual and the input colors, without a rectified linear unit (ReLU) activation function, and reconstructs the final color image $\hat{Y}$ as in Equation 6.

[Equation 6]

$$\hat{Y} = h(z) = W_h * z + b_h$$

Here, the depth of the last layer is $n_h$, which sets the sizes of the kernel $W_h$ and the bias $b_h$.
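The difference between plain residual learning and the semi-residual scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions: the final layer is modeled as a 1x1 convolution (a per-pixel channel mix) because the kernel size of the last layer is not recoverable here, the residual is given three channels for the demonstration, and the function name `semi_residual_output` is hypothetical.

```python
import numpy as np

def semi_residual_output(predicted_residual, x_c, w_h, b_h):
    """Concatenate residual and input colors, then apply a linear layer without ReLU.

    predicted_residual: H x W x c' residual features.
    x_c: H x W x 3 input (HR luminance plus upsampled Cb and Cr).
    w_h: (c' + 3) x 3 weights of a 1x1 convolution (assumed kernel size).
    b_h: bias with 3 entries.
    """
    z = np.concatenate([predicted_residual, x_c], axis=-1)   # H x W x (c' + 3)
    return z @ w_h + b_h                                      # no activation function

rng = np.random.default_rng(1)
x_c = rng.random((4, 4, 3))
residual = rng.standard_normal((4, 4, 3)) * 0.05

# With weights [I; I] and zero bias the layer reproduces plain residual addition,
# so conventional residual learning is one special case the final layer can express.
w_sum = np.vstack([np.eye(3), np.eye(3)])
plain_residual = semi_residual_output(residual, x_c, w_sum, np.zeros(3))
```

Because the weights of the final layer are learned, the network is free to depart from the fixed one-to-one sum, which is the extra flexibility the semi-residual design provides over simple addition.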

As described above, the present invention can enhance high-frequency information in the final output through the chrominance upsampling network. In this way, the present invention can learn a model $F$ that predicts the high-resolution image $\hat{Y}$ from a given input $X$.

Hereinafter, the structures (architectures) of the luminance super-resolution network and the chrominance super-resolution network described above are explained in detail.

1. Luminance Super-Resolution Network Structure

Because the human visual system's ability to resolve contrast depends mainly on luminance, the present invention uses the luminance channel of the input images.

FIG. 2 shows a conceptual configuration of the luminance super-resolution network. The first layer receives a total of 33 concatenated images: a single left-view image used as the reference and 32 right-view images with parallax shifts. The last convolutional layer may omit the rectified linear unit (ReLU) when computing the residual.

The filter size of the first convolutional layer is 33×3×3×64. The present invention uses 14 inner layers, each with the same kernel size of 64×3×3×64. Each of these convolutional layers is followed by a ReLU. The last convolutional layer has a filter size of 64×3×3×1 and is used to reconstruct the residual image. Finally, the high-resolution luminance image is produced by adding the residual to the reference image captured by the left camera.
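The layer sizes listed above can be checked with a short calculation. The sketch below only tabulates the stated filter shapes and counts parameters; the one-bias-per-output-channel count is an assumption, and `count_parameters` is a hypothetical helper name.

```python
# (n_in, kernel_h, kernel_w, n_out) for each convolutional layer, as listed in the text:
# one input layer, 14 inner layers, and one output layer.
layer_shapes = [(33, 3, 3, 64)] + [(64, 3, 3, 64)] * 14 + [(64, 3, 3, 1)]

def count_parameters(shapes):
    """Return (number of kernel weights, number of bias values) for the given layers."""
    weights = sum(n_in * kh * kw * n_out for n_in, kh, kw, n_out in shapes)
    biases = sum(n_out for _, _, _, n_out in shapes)   # assumed: one bias per output channel
    return weights, biases

num_layers = len(layer_shapes)
weights, biases = count_parameters(layer_shapes)

# Each layer's output depth must equal the next layer's input depth.
channels_chain = all(a[3] == b[0] for a, b in zip(layer_shapes, layer_shapes[1:]))
```

Under these assumptions the network has 16 convolutional layers and roughly half a million weights, almost all of them in the 14 inner 64-to-64 layers.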

Receptive field: The size of the receptive field depends on the filter size and the number of layers. For example, a 3×3 filter with 16 layers yields a receptive field of 33×33. The present description assumes that the maximum disparity due to parallax is 64, which is larger than 33.
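The receptive-field figure quoted above follows from a simple formula for stacked convolutions. The sketch below assumes stride 1 and no dilation, which the text does not state explicitly; both function names are hypothetical.

```python
def receptive_field(num_layers, kernel_size=3):
    """Side length of the receptive field of stacked stride-1, undilated convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def layers_to_cover(extent, kernel_size=3):
    """Smallest number of stacked layers whose receptive field spans `extent` pixels."""
    return -(-(extent - 1) // (kernel_size - 1))   # ceiling division

rf_16 = receptive_field(16)            # sixteen 3x3 layers
layers_for_64 = layers_to_cover(64)    # depth needed to span a 64-pixel disparity
```

Each 3×3 layer widens the field by two pixels, so sixteen layers reach 33 pixels, while spanning 64 pixels by depth alone would take twice as many layers under the same assumptions.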

Adding more layers increases not only the receptive field but also the computational cost, and a larger receptive field does not necessarily guarantee better accuracy.

Stereo tensor with shift: Instead of using classical block matching to determine discrete disparity, the network uses a deep CNN to detect similar patches in the shifted input stereo tensor. That is, the network can detect patch similarity across stereo channels that contain successive parallax offsets. The closest patch in the source images can be used directly for reconstruction by the CNN, which enables sub-pixel-precision multi-sampling for image enhancement. Finding similar patches, regardless of patch correspondence for disparity, is a more effective way to improve patch resolution.

The present invention does not require a disparity map for super-resolution reconstruction. The stereo image tensor is built from 64 shifted images at one-pixel intervals, with a maximum shift of 64 pixels. Considering typical camera configurations, the maximum disparity of the images is assumed to be less than 64 pixels.

FIG. 4 compares, on the Middlebury dataset, the super-resolution performance of learning from a stereo pair with that of learning from a stereo tensor.

As FIG. 4 shows, learning with the stereo tensor improves resolution more than learning with the stereo pair.

Activation functions: The last convolutional layer of the network reconstructs the residual for the high-resolution luminance image. The activation function of the last layer determines how contrast variation in the residual image is handled.

FIG. 5 compares the effect on training of the activation function in the last layer, which computes the residual in the luminance super-resolution network, showing the performance obtained on the Middlebury dataset with ReLU, sigmoid, and no activation function.

As FIG. 5 shows, using no activation function yields the highest peak signal-to-noise ratio (PSNR), whereas the ReLU and sigmoid functions clamp the residual values to [0, ∞] and [-1, +1], respectively. This indicates that they cause contrast compression. The present invention therefore uses no activation function in the last layer, so that the high-frequency characteristics of the target image can be reproduced.
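The effect of clamping a signed residual can be illustrated with a small numerical sketch. It assumes an idealized predictor that outputs the exact residual, uses random hypothetical data rather than the Middlebury images, and covers only the ReLU case; the helper names `psnr` and `relu` are illustrative.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((reference - estimate) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
high_res = rng.random((32, 32))                       # hypothetical target luminance
upsampled = np.clip(high_res + rng.normal(0.0, 0.05, high_res.shape), 0.0, 1.0)
true_residual = high_res - upsampled                  # signed: positive and negative values

# Last layer without activation: the full signed residual is added back.
recon_linear = upsampled + true_residual
# Last layer with ReLU: negative residual values are clamped to zero.
recon_relu = upsampled + relu(true_residual)

psnr_linear = psnr(high_res, recon_linear)
psnr_relu = psnr(high_res, recon_relu)
```

Even with a perfect residual, the clamped variant cannot correct pixels where the upsampled image is too bright, so its PSNR is lower in this sketch.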

2. Chrominance super-resolution network structure

Conventional super-resolution algorithms reconstruct high-resolution color either by applying the algorithm independently to the three color channels, or by converting the RGB input to the YCbCr color space and applying the super-resolution method only to the Y (luminance) channel.
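The color-space conversion used by such conventional pipelines can be sketched as follows. The text does not say which YCbCr variant is meant; this sketch assumes the full-range BT.601 (JPEG-style) coefficients with values in [0, 1], and the function names are illustrative.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (..., 3) RGB array in [0, 1] to full-range BT.601 YCbCr (assumed variant)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 0.5
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 0.5
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_to_rgb(ycbcr):
    """Inverse of rgb_to_ycbcr, up to rounding of the published coefficients."""
    y, cb, cr = ycbcr[..., 0], ycbcr[..., 1] - 0.5, ycbcr[..., 2] - 0.5
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.stack([r, g, b], axis=-1)

# Hypothetical image: the Y channel is what a luminance-only method would super-resolve.
rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))
ycbcr = rgb_to_ycbcr(image)
restored = ycbcr_to_rgb(ycbcr)
```

Splitting the image this way is what lets a method spend most of its capacity on luminance while treating the two chrominance channels separately.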

Instead of bicubic interpolation, the present invention uses an additional network that improves the chrominance upsampling stage. Although the luminance component dominates color image resolution, upsampling low-resolution chrominance with bicubic interpolation can cause color bleeding at edges. The present invention overcomes this problem with the chrominance super-resolution network.

For example, as shown in FIG. 6, bicubic interpolation produces color bleeding at edges, whereas the method according to the present invention removes the color bleeding at edges through chrominance upsampling and yields clearer colors.

The chrominance super-resolution network of the present invention follows a general residual learning structure. It uses 15 convolutional layers for residual image learning and an additional layer for generating the final image from the low-resolution chromaticity channels and the reconstructed high-resolution luminance image. The kernel size of the first convolutional layer is 3×3×3×64, and each inner convolutional layer may have a kernel size of 64×3×3×64 for each convolution operation.

The residual obtained by residual learning is then concatenated to the reference image, and the result is applied to the last convolutional layer, which has a kernel size of 6×3×3×4. The last convolutional layer uses no activation function when computing the final result.

Residuals of chrominance SR: To preserve the high-frequency details of the reconstructed high-resolution luminance, the present invention uses semi-residual learning, which places a convolutional layer at the end of the network instead of summing with a fast-forwarded identity. Details of the Cb and Cr channels can be lost through the convolution stages, but the semi-residual approach can increase high-frequency detail.
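The difference between a plain residual sum and the semi-residual arrangement can be sketched in NumPy. This is an illustration under assumptions, not the patent's network: the kernel layout (input channels, 3, 3, output channels), the three-channel reference and residual, the zero padding, and all function names are hypothetical, and the final kernel here has three output channels for simplicity.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Zero-padded 3x3 'same' convolution (cross-correlation form).

    x: (H, W, C_in); kernel: (C_in, 3, 3, C_out), an assumed layout.
    """
    c_in, kh, kw, c_out = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    h, w, _ = x.shape
    out = np.zeros((h, w, c_out))
    for i in range(kh):
        for j in range(kw):
            patch = padded[i:i + h, j:j + w, :]       # (H, W, C_in)
            out += patch @ kernel[:, i, j, :]         # contracts over C_in
    return out

def plain_residual(reference, residual):
    """Conventional residual learning: identity skip connection plus residual."""
    return reference + residual

def semi_residual(reference, residual, kernel):
    """Concatenate reference and residual, then apply a final convolution.

    No activation function follows the final convolution.
    """
    stacked = np.concatenate([reference, residual], axis=-1)
    return conv2d_same(stacked, kernel)

def summing_kernel(channels):
    """Kernel for which the semi-residual layer reduces to the plain sum."""
    k = np.zeros((2 * channels, 3, 3, channels))
    for c in range(channels):
        k[c, 1, 1, c] = 1.0                 # pass the reference channel through
        k[channels + c, 1, 1, c] = 1.0      # add the matching residual channel
    return k

rng = np.random.default_rng(0)
reference = rng.random((6, 7, 3))
residual = rng.normal(0.0, 0.1, (6, 7, 3))
out_semi = semi_residual(reference, residual, summing_kernel(3))
out_plain = plain_residual(reference, residual)
```

With the hand-built `summing_kernel` the two outputs coincide, so the plain sum is one special case of the final convolution; a trained kernel is free to weight neighboring pixels and channels differently, which is one way to read the claim that the extra layer can recover more high-frequency detail.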

For example, as the comparison of residual learning methods on the Middlebury dataset in FIG. 7 shows, the semi-residual learning method, which includes an additional convolutional layer at the end of the chrominance network, increases the PSNR of the reconstructed image more than the bicubic upsampling method and the conventional residual learning method.

As described above, the present invention can generate a super-resolution color image from stereo images using two deep learning networks.

Here, the first network uses the two stereo images to improve the resolution of the luminance image. Instead of using the stereo pair directly, it learns stereo similarity by training the model on a stereo tensor that holds the shift candidates.

The second network upsamples the chrominance components of the image with reference to the high-resolution luminance image, and generates the super-resolution color image by improving the resolution of the predicted high-resolution chrominance image using an additional convolutional layer that enables semi-residual learning.

Although the present invention has been described in terms of luminance mapping and chrominance mapping, it is not limited thereto. It is apparent to those skilled in the art that it can be implemented using deep neural networks trained on mappings of other image-related parameters. The present invention can also be implemented using deep neural networks trained on a luminance-based mapping that encompasses both luminance mapping and chrominance mapping.

The present invention can be implemented as an apparatus and as a method. When configured as an apparatus, it can generate a super-resolution image for input stereo images using two networks, namely a first deep neural network unit that includes the first deep neural network and a second deep neural network unit that includes the second deep neural network.

Here, the first deep neural network unit may include the luminance network structure described above, that is, the network structure of FIG. 2, and the second deep neural network unit may include the chrominance network structure described above, that is, the network structure of FIG. 3.

The super-resolution imaging method using deep learning may reconstruct a high-resolution luminance image using the first deep neural network, which takes the stereo images as input, and may then generate the super-resolution image using the second deep neural network, which takes as input the high-resolution luminance image and the chrominance upsampled from the Cb and Cr channels of the stereo image.
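The data flow of this two-stage method can be sketched with the two networks passed in as callables. Everything named here is hypothetical: `stereo_super_resolve`, the use of the left view as the reference for chrominance, nearest-neighbor upsampling of Cb and Cr, and the stand-in "networks" in the usage lines, which only reproduce the expected shapes and are not trained models.

```python
import numpy as np

def upsample_nearest(img, scale):
    """Nearest-neighbor upsampling along the first two axes (assumed upsampler)."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)

def stereo_super_resolve(left_ycbcr, right_ycbcr, luminance_net, chroma_net, scale=2):
    """Two-stage pipeline: luminance first, then chrominance.

    left_ycbcr, right_ycbcr: (H, W, 3) arrays in YCbCr order.
    luminance_net(left_y, right_y) -> (scale*H, scale*W) high-resolution luminance.
    chroma_net(hr_y, up_cbcr) -> (scale*H, scale*W, 3) super-resolved YCbCr image.
    """
    # Stage 1: reconstruct high-resolution luminance from both views.
    hr_y = luminance_net(left_ycbcr[..., 0], right_ycbcr[..., 0])
    # Upsample the Cb and Cr channels of the reference view (left view assumed).
    up_cbcr = upsample_nearest(left_ycbcr[..., 1:], scale)
    # Stage 2: produce the final image from luminance plus upsampled chrominance.
    return chroma_net(hr_y, up_cbcr)

# Stand-in callables that only mimic the interfaces (not real networks).
fake_luminance_net = lambda left_y, right_y: upsample_nearest(left_y, 2)
fake_chroma_net = lambda hr_y, cbcr: np.concatenate([hr_y[..., None], cbcr], axis=-1)

rng = np.random.default_rng(0)
left_view = rng.random((5, 6, 3))
right_view = rng.random((5, 6, 3))
result = stereo_super_resolve(left_view, right_view, fake_luminance_net, fake_chroma_net)
```

Passing the networks as arguments keeps the sketch independent of any particular deep learning framework; real models would replace the two stand-ins.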

It is also apparent that, whether implemented as an apparatus or as a method, the invention may include all of the process and content of super-resolution imaging using deep learning described above.

The system or apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the systems, apparatuses, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but those of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or if components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims that follow.

Claims

A binocular-based super-resolution imaging method comprising:
obtaining stereo images; and
generating a super-resolution image from the stereo images using a trained deep neural network,
wherein the generating of the super-resolution image generates the super-resolution image from the stereo images using a deep neural network trained on parallax.

The method of claim 1,
wherein the generating of the super-resolution image comprises:
reconstructing a high-resolution luminance image for the input stereo images using a first deep neural network trained on a mapping for a preset first parameter between the stereo images and the high-resolution luminance image; and
generating a super-resolution image for the reconstructed high-resolution luminance image using a second deep neural network trained on a mapping for a preset second parameter between the high-resolution luminance image and the super-resolution image.

The method of claim 2,
wherein the reconstructing of the high-resolution luminance image comprises converting each of the stereo images into a luminance image, generating a plurality of parallax-shifted luminance images from one of the converted luminance images, concatenating the other converted luminance image with the generated parallax-shifted luminance images, and reconstructing the high-resolution luminance image using the first deep neural network with the concatenated luminance images as input.

The method of claim 2,
wherein the first deep neural network learns a luminance mapping between the parallax shift of the stereo images and the high-resolution luminance image.

The method of claim 2,
wherein the generating of the super-resolution image concatenates the reconstructed high-resolution luminance image with the chrominance component of each of the stereo images and generates the super-resolution image through the second deep neural network using a semi-residual learning technique.

The method of claim 2,
wherein the second deep neural network generates the super-resolution image without a rectified linear unit activation function in the last convolutional layer among the convolutional layers constituting the second deep neural network.

The method of claim 2,
wherein the mapping for the first parameter includes a luminance mapping, and
the mapping for the second parameter includes a chrominance mapping.

A binocular-based super-resolution imaging apparatus comprising:
an acquisition unit that acquires stereo images; and
a generation unit that generates a super-resolution image from the stereo images using a trained deep neural network,
wherein the generation unit generates the super-resolution image from the stereo images using a deep neural network trained on parallax.

The apparatus of claim 8,
wherein the generation unit comprises:
a first deep neural network unit that learns a mapping for a preset first parameter between the stereo images and a high-resolution luminance image and reconstructs a high-resolution luminance image for the input stereo images; and
a second deep neural network unit that learns a mapping for a preset second parameter between the high-resolution luminance image and the super-resolution image and generates a super-resolution image for the reconstructed high-resolution luminance image.

The apparatus of claim 9,
wherein the first deep neural network unit converts each of the stereo images into a luminance image, generates a plurality of parallax-shifted luminance images from one of the converted luminance images, concatenates the other converted luminance image with the generated parallax-shifted luminance images, and reconstructs the high-resolution luminance image with the concatenated luminance images as input.

The apparatus of claim 9,
wherein the first deep neural network unit learns a luminance mapping between the parallax shift of the stereo images and the high-resolution luminance image.

The apparatus of claim 9,
wherein the second deep neural network unit concatenates the reconstructed high-resolution luminance image with the chrominance component of each of the stereo images and generates the super-resolution image using a semi-residual learning technique.

The apparatus of claim 9,
wherein the second deep neural network unit generates the super-resolution image without a rectified linear unit activation function in the last convolutional layer among the convolutional layers constituting the second deep neural network.

The apparatus of claim 9,
wherein the mapping for the first parameter includes a luminance mapping, and
the mapping for the second parameter includes a chrominance mapping.