KR102334730B1

KR102334730B1 - Generative Adversarial Network for Joint Light Field Super-resolution and Deblurring and its Operation Method

Info

Publication number: KR102334730B1
Application number: KR1020200154826A
Authority: KR
Inventors: 박인규; 조나단사무엘
Original assignee: 인하대학교 산학협력단
Priority date: 2020-11-18
Filing date: 2020-11-18
Publication date: 2021-12-03

Abstract

An adversarial neural network model apparatus for simultaneously performing light field super-resolution and deblurring, and an operation method thereof, are proposed. The apparatus according to an embodiment includes: a super-resolution network that receives a subview image of a low-resolution or blurred light field and outputs a super-resolution reconstructed image; and a deblurring network that receives the super-resolution reconstructed image, passes it through one or more convolution blocks, and estimates a super-resolved image from which motion blur has been removed. An end-to-end network including the super-resolution network and the deblurring network can thereby be provided.

Description

Adversarial neural network model apparatus for simultaneous performance of light field super-resolution and blur removal and its operation method

The following embodiments relate to an adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal, and to an operation method thereof.

Light field imaging has recently attracted attention owing to the growing popularity of research topics such as view synthesis (angular-domain super-resolution (SR)), image refocusing, and depth estimation. However, image enhancement tasks such as spatial-domain super-resolution and motion blur removal for light fields have not received comparable attention. This is because a single light field image contains a large amount of data and its parallax constraints must be preserved. As a result, there are few notable recent studies on light field blur removal and super-resolution.

To perform light field image enhancement correctly, it is important to describe in detail the domains of the light field from which parallax information is obtained. A light field image has pixels distributed over a two-dimensional spatial domain and a two-dimensional angular domain. The additional angular domain, which conventional 2D images lack, gives rise to subview images that are related by parallax. As described by R. Ng (Non-Patent Document 11), each subview image of a light field is a two-dimensional image located at a specific angular position, and the coordinates of the spatial and angular domains are usually denoted by (x, y) and (u, v), respectively.

In early studies, light fields were acquired by rotating a small object in front of a single camera to generate multiple views. More recently, the Lytro, a user-friendly light field camera similar in size to a modern digital camera, has made light field acquisition convenient. Although the Lytro camera is no longer produced, it is still used to acquire multi-view images. These multi-view images are captured through microlenses located behind the main lens. Although the angular separation between the microlenses is small, it provides a plausible parallax effect and is therefore mainly used for depth estimation. However, because such techniques depend heavily on the quality of the acquired light field, a light field enhancement technique is needed after acquisition.

Super-resolution restores the blurred regions produced by the interpolated pixels of low-resolution images, or fills in pixels at new high-resolution positions. A state-of-the-art CNN-based super-resolution technique for 2D images, which enables fast convergence during training, is known from Kim et al. (Non-Patent Document 1). Unlike a 2D image, a light field image can be super-resolved in both the two-dimensional spatial domain and the two-dimensional angular domain. Angular-domain super-resolution is also known as view synthesis.

As with super-resolution, the main task of blur removal is to restore a sharp image from a blurred input. The present embodiment, however, focuses on blur caused by the motion of the camera and of objects. Earlier work on blur removal attempted to recover the point spread function, or blur kernel, that describes the motion. One prior study (Non-Patent Document 5) introduced a specific light field camera motion for blur removal, together with a 3-DOF transformation model for simulating light field camera motion.
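
The blur-kernel view mentioned above can be illustrated with a one-dimensional toy example. This sketch is not the 6-DOF model of the embodiment; it only assumes the classical model in which a blurred signal is the sharp signal convolved with a point spread function. The names `box_kernel` and `blur_1d`, the kernel length, and the border handling are illustrative choices.

```python
def box_kernel(length):
    """Uniform kernel modelling constant-speed linear motion (assumed toy PSF)."""
    return [1.0 / length] * length


def blur_1d(signal, kernel):
    """Convolve `signal` with `kernel`, replicating the border samples.

    The kernel is symmetric here, so correlation and convolution coincide.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), len(signal) - 1)  # clamp the index
            acc += w * signal[j]
        out.append(acc)
    return out


sharp = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]   # an ideal step edge
blurred = blur_1d(sharp, box_kernel(3))   # the edge is smeared over neighbouring samples
```

Kernel-based deblurring tries to estimate the kernel and undo this smearing, whereas the network described in this document learns the mapping from blurred to sharp images directly.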

(Non-Patent Document 1) J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 1646-1654, June 2016.
(Non-Patent Document 2) D. Krishnan, T. Tay, and R. Fergus, "Blind deconvolution using a normalized sparsity measure," In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 233-240, June 2011.
(Non-Patent Document 3) J. Pan, Z. Hu, Z. Su, and M. H. Yang, "Deblurring text images via L0-regularized intensity and gradient prior," In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 2901-2908, June 2014.
(Non-Patent Document 4) T. H. Kim, K. M. Lee, B. Scholkopf, and M. Hirsch, "Online video deblurring via dynamic temporal blending network," In Proc. of IEEE International Conference on Computer Vision, pages 4058-4067, 2017.
(Non-Patent Document 5) P. P. Srinivasan, R. Ng, and R. Ramamoorthi, "Light field blind motion deblurring," In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 3958-3966, July 2017.
(Non-Patent Document 6) K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, June 2016.
(Non-Patent Document 7) P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 1125-1134, July 2017.
(Non-Patent Document 8) Y. W. Tai, P. Tan, and M. S. Brown, "Richardson-Lucy deblurring for scenes under a projective motion path," IEEE Trans. on Pattern Analysis and Machine Intelligence, 33(8):1603-1618, August 2011.
(Non-Patent Document 9) Y. Bok, H. G. Jeon, and I. S. Kweon, "Geometric calibration of micro-lens-based light field cameras using line features," IEEE Trans. on Pattern Analysis and Machine Intelligence, 39(2):287-300, February 2017.
(Non-Patent Document 10) M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi, "Depth from combining defocus and correspondence using light-field cameras," In Proc. of the IEEE International Conference on Computer Vision, pages 673-680, December 2013.
(Non-Patent Document 11) R. Ng, "Fourier slice photography," ACM Trans. on Graphics, 24(3):735-744, July 2005.

The embodiments describe an adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal and an operation method thereof, and more specifically provide an end-to-end network technique that solves spatial-domain super-resolution and motion blur removal of light field images at the same time.

The embodiments use down/upscale parameters so that the technique can be applied at multiple resolutions, and aim to provide an adversarial neural network model apparatus, and an operation method thereof, that performs super-resolution and motion blur removal simultaneously in order to address the problems of current light field image enhancement.

An adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal according to an embodiment includes: a super-resolution network that receives a subview image of a low-resolution or blurred light field and outputs a super-resolution reconstructed image; and a blur removal network that receives the super-resolution reconstructed image, passes it through one or more convolution blocks, and estimates a super-resolved image from which motion blur has been removed. An end-to-end network including the super-resolution network and the blur removal network can thereby be provided.

In the super-resolution network, the resolution of the super-resolution reconstructed image is downsampled by a scale of d, and the input subview image may be concatenated with at least its neighboring images in order to obtain additional information about the neighboring subview images.

The apparatus may further include a discriminator network that trains the end-to-end network in a plurality of stages so as to allow backpropagation of the super-resolution and blur removal losses, and that is trained on a low-resolution, 6-DOF motion-blurred light field dataset.

A generative adversarial network (GAN) that processes the global and local regions of an image is provided. In a first stage, the discriminator network is trained using the entire region of the super-resolved, motion-deblurred image predicted by the generator network; in a second stage, the same discriminator network may additionally learn local regions.

An operation method of an adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal, performed by a computer device according to another embodiment, includes: receiving, at a super-resolution network, a subview image of a low-resolution or blurred light field and outputting a super-resolution reconstructed image; and receiving, at a blur removal network, the super-resolution reconstructed image, passing it through one or more convolution blocks, and estimating a super-resolved image from which motion blur has been removed. An end-to-end network including the super-resolution network and the blur removal network can thereby be provided.

According to the embodiments, down/upscale parameters allow the technique to be applied at multiple resolutions, and an adversarial neural network model apparatus, and an operation method thereof, can be provided that performs super-resolution and motion blur removal simultaneously in order to address the problems of current light field image enhancement.

FIG. 1 is a diagram illustrating a framework result according to an embodiment.
FIG. 2A is a diagram for explaining an adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal according to an embodiment.
FIG. 2B is a flowchart illustrating an operation method of an adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal according to an embodiment.
FIG. 3 is a diagram illustrating the structure of a super-resolution network according to an embodiment.
FIG. 4 is a diagram illustrating the structure of a blur removal network according to an embodiment.
FIG. 5 is a diagram for describing a discriminator network according to an embodiment.
FIG. 6 is a diagram illustrating qualitative result images of an algorithm according to an embodiment.
FIG. 7 is a diagram illustrating qualitative result images according to an embodiment.

Hereinafter, embodiments will be described with reference to the accompanying drawings. The described embodiments may, however, be modified in various other forms, and the scope of the present invention is not limited to the embodiments described below. The embodiments are provided in order to explain the present invention more completely to those of ordinary skill in the art. The shapes and sizes of elements in the drawings may be exaggerated for clarity.

FIG. 1 is a diagram illustrating a framework result according to an embodiment.

FIG. 1(a) shows the central subview image of a low-resolution, 6-DOF motion-blurred light field, and FIG. 1(b) shows the light field subview image after super-resolution and motion blur removal by the proposed technique. Here, the framework refers to the adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal.

As research on parallax-based image processing grows, restoring low-resolution and motion-blurred light field images has become essential. Such techniques are known as light field image enhancement, but there are few existing studies that solve two or more of these problems at the same time.

An embodiment of the present invention proposes a framework that performs light field spatial-domain super-resolution and motion blur removal simultaneously. Specifically, a simple network is built and trained on a low-resolution, 6-DOF motion-blurred light field dataset. In addition, a local-region optimization technique for the generative adversarial network is proposed to improve performance. The proposed framework is evaluated with quantitative and qualitative measures and is shown to outperform existing state-of-the-art techniques. In this embodiment, light field enhancement is defined as a 4x increase in spatial-domain resolution together with blur removal based on a 6-DOF motion model.

The embodiments below propose an end-to-end network that solves spatial-domain super-resolution and motion blur removal of light field images simultaneously. Network training involves several optimization stages in which reconstruction and adversarial losses are backpropagated. In the generative adversarial network stage, the global and local structures of the estimated result and of the input are combined and used in several loss computations. The local structure aims to force regions that have not been deblurred to match the ground truth, in a manner similar to image synthesis techniques. The contributions of this embodiment are summarized as follows.

* A formulation of 6-DOF light field camera motion on a reduced spatial scale, used to generate a low-resolution, motion-blurred light field dataset.

* A framework that performs light field spatial-domain super-resolution and motion blur removal simultaneously.

* A local masking and refinement strategy that improves performance during network training.

The remainder of this embodiment is organized as follows. First, a network that performs super-resolution and blur removal simultaneously is proposed, and SRNet, DebNet, and the discriminator network are introduced in detail. Next, the 6-DOF light field blur model and the optimization technique applied to the network are introduced, together with the low-resolution, motion-blurred light field dataset generated to train the network. The implementation details are then described, and quantitative and qualitative comparisons with existing techniques are presented.

FIG. 2A is a diagram for explaining an adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal according to an embodiment.

Referring to FIG. 2A, this embodiment proposes an end-to-end network that takes the stacked subview images of a light field as input and outputs the result of spatial-domain super-resolution and motion blur removal for a specific subview image.

The adversarial neural network model apparatus (adversarial network) for simultaneously performing light field super-resolution and blur removal according to an embodiment includes a joint SR and deblurring (JSRD) network. The joint network, defined as JSRD, consists simply of a super-resolution network (SRNet, 210) and a blur removal network (DebNet, 220).

JSRD is a network that receives an input of spatial size (H / d, W / d) and outputs a result of size (H, W). Because the network operates directly on 2D images, the four-dimensional light field representation (x, y, u, v) is reduced to three dimensions (x, y, a), and the subview images are processed one at a time in a recurrent manner. The domain reduction obtained by stacking subview images is described below.
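
The domain reduction and the input/output sizes can be sketched as plain index bookkeeping. The sketch assumes a row-major ordering of the angular grid, which the text does not specify; `flatten_angular`, `jsrd_shapes`, the 5x5 grid, and the image size are hypothetical names and values used only for illustration.

```python
def flatten_angular(n_u, n_v):
    """Map each angular coordinate (u, v) to a sequence index a (row-major, assumed)."""
    return {(u, v): u * n_v + v for u in range(n_u) for v in range(n_v)}


def jsrd_shapes(height, width, d):
    """Return (input, output) spatial sizes for a downscale factor d."""
    if height % d or width % d:
        raise ValueError("H and W must be divisible by d in this sketch")
    return (height // d, width // d), (height, width)


index_of = flatten_angular(5, 5)              # 25 subviews -> a = 0..24
center = index_of[(2, 2)]                     # central subview of a 5x5 grid
lr_size, hr_size = jsrd_shapes(320, 512, d=4)  # d = 4 matches the 4x enhancement
```

With this ordering the network would visit the subviews as a sequence a = 0, 1, ..., passing hidden features from one subview to the next.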

The end-to-end network is trained in several stages so as to allow backpropagation of the super-resolution and blur removal losses. A generative adversarial network that processes global and local mask regions is provided here. The adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal according to an embodiment is described in more detail below.

The adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal according to an embodiment may include a super-resolution network 210 and a blur removal network 220. An end-to-end network including the super-resolution network and the blur removal network can thereby be provided. Depending on the embodiment, the apparatus may further include a discriminator network, so that a generative adversarial network (GAN) that processes the global and local regions of an image can be provided.

The super-resolution network 210 may receive a subview image 201 of a low-resolution or blurred light field and output a super-resolution reconstructed image 202. Here, in the super-resolution network 210, the resolution of the super-resolution reconstructed image 202 is downsampled by a scale of d, and the input subview image 201 may be concatenated with at least its neighboring images in order to obtain additional information about the neighboring subview images.

The blur removal network 220 may receive the super-resolution reconstructed image 202, pass it through one or more convolution blocks, and estimate a super-resolved image 203 from which motion blur has been removed.

The embodiments may provide a generative adversarial network (GAN) that processes the global and local regions of an image. In this case, the super-resolution network 210 and the blur removal network 220 may be included in the generator network, and a discriminator network as described below may further be included.

The discriminator network may train the end-to-end network in a plurality of stages so as to allow backpropagation of the super-resolution and blur removal losses, and may be trained on a low-resolution, 6-DOF motion-blurred light field dataset.

Here, in the first stage the discriminator network is trained using the entire region of the super-resolved, motion-deblurred image 203 predicted by the generator network, and in the second stage the same discriminator network may additionally learn local regions.
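
The two-stage schedule can be sketched as follows. Only the order (global region first, then additional local regions with the same discriminator) comes from the text; that the second stage also keeps the global region, the crop position and size, and the helper names `crop` and `discriminator_inputs` are assumptions made for illustration.

```python
def crop(image, top, left, size):
    """Return a size x size local region of a 2D image given as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]


def discriminator_inputs(image, stage, top=0, left=0, size=2):
    """Regions shown to the single, shared discriminator in a given stage."""
    regions = [image]                       # stage 1: global region only
    if stage == 2:                          # stage 2: add a local region
        regions.append(crop(image, top, left, size))
    return regions


predicted = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 output
stage1 = discriminator_inputs(predicted, stage=1)
stage2 = discriminator_inputs(predicted, stage=2, top=1, left=1)
```

Reusing one discriminator for both region types, as the text describes, keeps the adversarial signal for the local region consistent with the one learned on the whole image.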

FIG. 2B is a flowchart illustrating an operation method of an adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal according to an embodiment.

Referring to FIG. 2B, an operation method of an adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal, performed by a computer device according to an embodiment, may include: receiving, at the super-resolution network, a subview image of a low-resolution or blurred light field and outputting a super-resolution reconstructed image (S110); and receiving, at the blur removal network, the super-resolution reconstructed image, passing it through one or more convolution blocks, and estimating a super-resolved image from which motion blur has been removed (S120). An end-to-end network including the super-resolution network and the blur removal network can thereby be provided.

The method may further include, at the discriminator network, training the end-to-end network in a plurality of stages so as to allow backpropagation of the super-resolution and blur removal losses, the training using a low-resolution, 6-DOF motion-blurred light field dataset (S130).

In the plurality of stages, the discriminator network is trained in the first stage using the entire region of the super-resolved, motion-deblurred image predicted by the generator network, and in the second stage the same discriminator network may additionally learn local regions.

FIG. 3 is a diagram illustrating the structure of a super-resolution network according to an embodiment.

As shown in FIG. 3, the first stage of the framework uses a recurrent U-Net to solve spatial-domain super-resolution. The network receives a subview image of a low-resolution, blurred light field as input and outputs a super-resolution reconstructed image. The image resolution is downsampled by a scale of d, and the input is concatenated with at least its neighboring subview images in order to obtain additional information about the neighboring views.
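
The neighbor concatenation can be sketched as below, assuming that "neighboring images" means the previous and next subviews in the stacked sequence and that indices are clamped at the ends of the sequence; neither detail is stated in the text. Images are represented as lists of channels, and `neighbor_indices` and `concat_with_neighbors` are hypothetical helpers.

```python
def neighbor_indices(a, count):
    """Indices of the previous, current, and next subview, clamped at the ends."""
    return [max(a - 1, 0), a, min(a + 1, count - 1)]


def concat_with_neighbors(stack, a):
    """Concatenate subview `a` with its neighbors along the channel axis.

    `stack[i]` is one subview given as a list of channels (e.g. R, G, B).
    """
    channels = []
    for i in neighbor_indices(a, len(stack)):
        channels.extend(stack[i])
    return channels


# Three RGB subviews with placeholder channel labels instead of pixel data.
stack = [[f"{c}{i}" for c in "RGB"] for i in range(3)]
net_input = concat_with_neighbors(stack, 1)   # 3 subviews x 3 channels = 9 channels
```

Under these assumptions the first convolution layer would see nine input channels for RGB subviews.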

First, the features pass through two convolution blocks normalized with instance normalization. The first and second convolutional layers have 64 and 32 channels, with 9x9 and 3x3 filters, respectively. The features extracted by these layers are concatenated with the hidden features from the previous image and processed within the U-Net. The U-Net integrates 6 encoders and 5 decoders. At each encoder layer the number of channels doubles while the spatial size is halved; conversely, at each decoder layer the spatial size doubles and the number of channels is halved. Note that each decoder layer receives the output of the encoder layer of the same size: the encoder output is concatenated with the decoder input. Each U-Net block consists of convolution/deconvolution, instance normalization (IN), and ReLU activation. Finally, the U-Net output passes through two simple Conv-IN-ReLU blocks. The output of the first block is propagated over the recurrent connection to the next image. The output of the second block is the residual super-resolution image, which is added to the input to produce the super-resolution restored image of the blurred input.
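The doubling and halving rule can be checked with a little shape bookkeeping. The following is a minimal sketch, not the patented implementation: the function name `unet_shapes`, the starting channel count `base_channels=32`, and the reading that all six encoders halve the spatial size are assumptions made for illustration only.

```python
def unet_shapes(height, width, base_channels=32, n_enc=6, n_dec=5):
    """Return (stage, channels, height, width) for each U-Net stage.

    base_channels is an assumed value; the text only fixes the
    doubling/halving rule and the 6-encoder / 5-decoder layout.
    """
    shapes = []
    c, h, w = base_channels, height, width
    for i in range(n_enc):
        # Encoder: channels double, spatial size halves.
        c, h, w = c * 2, h // 2, w // 2
        shapes.append((f"enc{i + 1}", c, h, w))
    for i in range(n_dec):
        # Decoder: spatial size doubles, channels halve.
        c, h, w = c // 2, h * 2, w * 2
        shapes.append((f"dec{i + 1}", c, h, w))
    return shapes


if __name__ == "__main__":
    for stage in unet_shapes(64, 64):
        print(stage)
```

Under these assumptions, decoder k ends at the same spatial size as encoder 6 - k, which is the pairing a same-size skip connection needs.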

Fig. 4 is a diagram illustrating the structure of the blur removal network according to an embodiment.

The next stage, the blur removal network, processes the output of the super-resolution network. As shown in Fig. 4, the input of the blur removal network passes through two convolution blocks, each a Conv-IN-ReLU combination. The features extracted by these first two convolutional layers are concatenated with the hidden features of the previous image. The concatenated features pass through 8 ResNet blocks (He et al., Non-Patent Document 6) that use instance normalization rather than batch normalization; each ResNet block is a Conv-IN-ReLU-Conv-IN combination. Inside a ResNet block, the features are activated by ReLU and pass through two convolutional layers. The first convolution block after the ResNet blocks outputs the hidden feature that is passed on to the next image. The last convolution block produces the residual deblurred image, which is added to the input to estimate the final super-resolved image with motion blur removed.
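Both networks share the same control flow: a hidden feature is carried from one subview image to the next, and each output is the input plus a predicted residual. The sketch below illustrates only that control flow. `run_recurrent` and `toy_step` are hypothetical names, and `toy_step` is a trivial stand-in for a real network pass, not anything described in the source.

```python
def run_recurrent(frames, step, hidden=None):
    """Process subview frames in order, carrying a hidden state.

    `step(frame, hidden)` is a hypothetical stand-in for one network pass;
    it returns (residual, new_hidden). Each output is input + residual.
    """
    outputs = []
    for frame in frames:
        residual, hidden = step(frame, hidden)
        outputs.append([x + r for x, r in zip(frame, residual)])
    return outputs, hidden


def toy_step(frame, hidden):
    # Toy rule: the residual is the previous frame's mean (0 for the first frame).
    prev_mean = 0.0 if hidden is None else hidden
    residual = [prev_mean for _ in frame]
    return residual, sum(frame) / len(frame)


if __name__ == "__main__":
    outs, last_hidden = run_recurrent([[1.0, 3.0], [2.0, 2.0]], toy_step)
    print(outs, last_hidden)
```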

Fig. 5 is a diagram for describing the discriminator network according to an embodiment.

Over the past few years, generative adversarial network (GAN) techniques have been introduced that synthesize plausible results given an input image from a different domain. As GANs came into wide use, however, the semantic information contained in a single image was reduced, which degraded performance. To address this, local information about specific regions of the image has been used. This local information strengthens the relationships between pixels and produces smoother results.

In this embodiment, a simple discriminator network that is trained in several stages is designed. In the first stage, the discriminator is trained on the entire region of the image predicted by the generator. In the second stage, the same discriminator additionally learns a specific region: the region M given by the absolute difference between the predicted image and the ground truth. In this way, regions where blur has not been removed are treated as a local prior by the discriminator. The discriminator network consists of 6 Conv-IN-LeakyReLU blocks and 1 Conv-IN-Sigmoid block, similar to the technique proposed by Isola et al. (Non-Patent Document 7).

Finally, as shown in Fig. 5, every layer except the last downsamples the spatial size by a factor of 2, and the number of channels increases on a scale of 64.

In Fig. 5, the prediction is the deblurred, super-resolved result produced by SR-net and Deb-net, and the ground truth is the sharp, super-resolved image. Fig. 5(a) shows the masking process, (b) shows the case where the network is trained on the entire region, and (c) shows the case where the network is trained on the masked region.

The network training technique is described below.

6-DOF Light Field Blur Model

In this embodiment, the blurred subview images are generated by simulating 6-DOF light field camera motion. Specifically, the sharp subview images are warped according to the motion and integrated over time. This approach is similar to the method of Tai et al. (Non-Patent Document 8), which warps a sharp 2D image with different homographies. The final blurred image is produced by averaging the warped images. The 3-DOF translation model and the 3-DOF rotation model, which together form the complete 6-DOF model, are described next.
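The "warp, then average" idea can be illustrated with a deliberately simplified sketch. It replaces the homography warps of the 6-DOF model with integer pixel translations, so it is an illustration of the averaging step only; `shift_image`, `simulate_blur`, and the border clamping are assumptions, not part of the source.

```python
def shift_image(img, dx, dy):
    """Translate a 2D image (list of rows) by integer (dx, dy), clamping at borders."""
    h, w = len(img), len(img[0])
    return [
        [img[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)] for x in range(w)]
        for y in range(h)
    ]


def simulate_blur(sharp, trajectory):
    """Average the sharp image warped along a camera trajectory of (dx, dy) steps."""
    h, w = len(sharp), len(sharp[0])
    acc = [[0.0] * w for _ in range(h)]
    for dx, dy in trajectory:
        warped = shift_image(sharp, dx, dy)
        for y in range(h):
            for x in range(w):
                acc[y][x] += warped[y][x]
    n = len(trajectory)
    return [[v / n for v in row] for row in acc]


if __name__ == "__main__":
    sharp = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    print(simulate_blur(sharp, [(-1, 0), (0, 0), (1, 0)]))
```

A single bright pixel moved one step left and right is smeared evenly across its row, which is the horizontal motion blur one would expect.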

First, the 3-DOF translation model is described.

The 3-DOF light field translation model is derived from 2D in-plane motion along the x- and y-axes and 1D out-of-plane motion along the z-axis. A blurred light field pixel (B) is obtained by integrating the sharp light field (G) warped by translation along these axes. This can be written compactly as follows.

[Equation 1]

Here, (x, u) denotes the spatial and angular coordinates, and the translation terms denote the in-plane and out-of-plane camera translations at each exposure time t within the total time T. x (and y) is divided by d, the downsampling (or upsampling) scale. For simplicity, the light field parameterization is reduced from 4D (x, y, u, v) to 2D (x, u).

Next, the 3-DOF rotation model is described.

Equation 1 is extended with a 3-DOF rotation approximation to model 6-DOF motion. The rotation consists of two models, out-of-plane (about the x-axis and the y-axis) and in-plane, which are described separately for clarity.

Out-of-plane rotation: The fronto-parallel subview image G(u) is modeled as being warped by a 3×3 homography matrix, defined as follows.

[Equation 2]

Here, the parameters are the focal length and the rotation angles about the y-axis and the x-axis.
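For orientation, the textbook homography induced by a pure camera rotation is H = K R K^-1, where K holds the focal length and R is the rotation. The sketch below builds that standard form; it is not a reconstruction of Equation 2, which may use a different parameterization or approximation. The function names, the rotation order R_y R_x, and the principal point at the origin are assumptions.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_homography(f, theta_x, theta_y):
    """H = K * R_y(theta_y) * R_x(theta_x) * K^-1 for focal length f (assumed form)."""
    cx, sx = math.cos(theta_x), math.sin(theta_x)
    cy, sy = math.cos(theta_y), math.sin(theta_y)
    k = [[f, 0, 0], [0, f, 0], [0, 0, 1]]
    k_inv = [[1 / f, 0, 0], [0, 1 / f, 0], [0, 0, 1]]
    r_x = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    r_y = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    return matmul(matmul(k, matmul(r_y, r_x)), k_inv)


def warp_point(h, x, y):
    """Apply homography h to pixel (x, y), measured from the principal point."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    wh = h[2][0] * x + h[2][1] * y + h[2][2]
    return xh / wh, yh / wh


if __name__ == "__main__":
    H = rotation_homography(100.0, 0.0, 0.01)
    print(warp_point(H, 0.0, 0.0))
```

With zero angles the warp is the identity, and a small rotation about the y-axis shifts the image center horizontally by f·tan(θ), which matches the intuition that out-of-plane rotation acts much like an in-plane shift for small angles.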

The out-of-plane rotation becomes an additional parameter of the translation model, and can therefore be expressed as follows.

[Equation 3]

Here, the updated terms denote the in-plane motion along the x-axis and the y-axis, respectively.

Applying the out-of-plane rotation, Equation 1 is modified as follows.

[Equation 4]

In-plane rotation: When a light field camera rotates in-plane, it rotates about a single axis, which passes through the central subview image. Each subview image is separated by a given distance, and these distance parameters serve as the baseline of the light field camera that produces the parallax effect. Calibration with the toolbox provided by Bok et al. (Non-Patent Document 9) gives a value of 0.9 pixels, but it is set to 1 pixel so that positions are exact. The in-plane rotation is modeled as follows.

[Equation 5]

[Equation 6]

Here, the angle parameter is the in-plane rotation angle.

Finally, by integrating Equations 5 and 6 into Equation 4, the full 6-DOF light field camera motion model can be expressed as follows.

[Equation 7]

Low-Resolution and Motion-Blurred Light Field Dataset

This section describes the dataset used to train the proposed network. Existing blur datasets focus mainly on deblurring 2D or 3D (video) images rather than light fields. Recently, large-scale light field datasets have become available for angular-domain super-resolution research.

Therefore, to generate a large-scale light field dataset that is both low-resolution and non-uniformly blurred, real light field data were captured with a Lytro Illum, and 3D scenes were collected from UnrealCV for synthetic images. The 3D scenes were included to add more variation in the color domain, because Lytro images tend to have low intensity. In the implementation, d is set to 4, which reduces the spatial coordinates by a factor of 4 and yields subview images with a spatial size of 80x128. The dataset is split into 360 light fields for training and 40 light fields for testing. The ratio of Lytro-captured data to 3D-scene data is 50:50 in both sets. The 6-DOF model simulates the shutter-time effect (T = 15) of the light field camera using Equation 7. Specifically, the motion is parameterized at each time t (t∈T) using a Bezier curve for the translation model and spherical linear interpolation for the rotation model.
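A camera trajectory of this kind can be sampled as follows. This is a minimal sketch under stated assumptions: the source names only "a Bezier curve" and "spherical linear interpolation", so the cubic degree, the unit-quaternion representation of rotation, and the names `bezier3`, `slerp`, and `sample_trajectory` are illustrative choices.

```python
import math


def bezier3(p0, p1, p2, p3, s):
    """Point on a cubic Bezier curve at parameter s in [0, 1]."""
    a, b, c, d = (1 - s) ** 3, 3 * (1 - s) ** 2 * s, 3 * (1 - s) * s ** 2, s ** 3
    return tuple(a * w + b * x + c * y + d * z for w, x, y, z in zip(p0, p1, p2, p3))


def slerp(q0, q1, s):
    """Spherical linear interpolation between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # take the shorter arc
        q1, dot = tuple(-b for b in q1), -dot
    if dot > 0.9995:  # nearly parallel: fall back to normalized linear interpolation
        q = tuple(a + s * (b - a) for a, b in zip(q0, q1))
        norm = math.sqrt(sum(v * v for v in q))
        return tuple(v / norm for v in q)
    theta = math.acos(dot)
    w0 = math.sin((1 - s) * theta) / math.sin(theta)
    w1 = math.sin(s * theta) / math.sin(theta)
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))


def sample_trajectory(ctrl_points, q_start, q_end, T=15):
    """Sample T camera poses: Bezier translation plus slerp rotation."""
    poses = []
    for t in range(T):
        s = t / (T - 1)
        poses.append((bezier3(*ctrl_points, s), slerp(q_start, q_end, s)))
    return poses


if __name__ == "__main__":
    ctrl = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    q_a = (1.0, 0.0, 0.0, 0.0)
    q_b = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
    for pose in sample_trajectory(ctrl, q_a, q_b)[:3]:
        print(pose)
```

Each of the T = 15 poses would then drive one warp of the sharp light field before the warped images are averaged.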

Optimization

In the initial stage, the super-resolution network and the blur removal network are trained individually, each with L1 and perceptual losses. The perceptual loss is the L2 error between the features of the predicted image and of the ground truth, extracted by a pretrained VGG-19 network. In the implementation, the features are taken from conv3.3 of VGG-19. The super-resolution and blur removal networks are each trained for 30 epochs. This stage corresponds to the image content part of the optimization and can be expressed as follows.

[Equation 8]

Here, the coefficient is a binary value that switches the loss computation at a given iteration.

The detailed super-resolution loss function can be expressed as follows.

[Equation 9]

Here, S' is the super-resolved version generated from the low-resolution, blurred subview image B, S is the ground truth of the super-resolution image, and N is the number of pixels. The feature term denotes features extracted from VGG, and the regularization term is the total variation loss, included to reduce noise.
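The three ingredients named here (an L1 term, a feature-space L2 term, and a total variation term) can be sketched as below. How Equation 9 combines and weights them is not recoverable from this text, so the plain weighted sum, the weights `w_perc` and `w_tv`, and the `features` callback standing in for VGG-19 are all assumptions.

```python
def l1_loss(pred, target):
    """Mean absolute error over all pixels of two 2D images."""
    n = len(pred) * len(pred[0])
    return sum(abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n


def feature_l2_loss(pred, target, features):
    """Mean squared error between feature vectors; `features` stands in for VGG-19."""
    fp, ft = features(pred), features(target)
    return sum((a - b) ** 2 for a, b in zip(fp, ft)) / len(fp)


def tv_loss(img):
    """Anisotropic total variation: summed neighbor differences per pixel."""
    h, w = len(img), len(img[0])
    dx = sum(abs(img[y][x + 1] - img[y][x]) for y in range(h) for x in range(w - 1))
    dy = sum(abs(img[y + 1][x] - img[y][x]) for y in range(h - 1) for x in range(w))
    return (dx + dy) / (h * w)


def content_loss(pred, target, features, w_perc=1.0, w_tv=1.0):
    """Weighted sum of L1, perceptual, and TV terms (weights are assumptions)."""
    return (l1_loss(pred, target)
            + w_perc * feature_l2_loss(pred, target, features)
            + w_tv * tv_loss(pred))


if __name__ == "__main__":
    flatten = lambda img: [v for row in img for v in row]  # toy feature extractor
    print(content_loss([[0, 1], [1, 0]], [[0, 0], [0, 0]], flatten))
```

In a real training loop the `features` callback would return activations from a pretrained network rather than the flattened pixels used in this toy call.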

In the same way, the loss function of the blur removal network can be defined as follows.

[Equation 10]

In the next stage, only the perceptual loss of the blur removal network is used. During this stage, all variables are fully backpropagated. The generator and discriminator networks are iterated for 200 epochs. For the first 100 epochs, the discriminator takes the entire region of the predicted image as input. The objective of this stage is an adversarial loss with an added gradient penalty, following the WGAN-GP approach. The global-region penalty can therefore be expressed as follows.

[Equation 11]

Here, x denotes the mixed distribution of the prediction D and the ground truth G, and the remaining symbols are the gradient and the critic (discriminator) function C.
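For reference, the gradient penalty in the original WGAN-GP formulation is λ(‖∇C(x̂)‖₂ − 1)², evaluated at a random mix x̂ of a real and a generated sample. The sketch below implements that standard form with a finite-difference gradient so it runs without a deep learning library. It is not a reconstruction of Equation 11; the function names and the default `lam=10.0` (the value used in the WGAN-GP paper) are assumptions.

```python
import math
import random


def interpolate(real, fake, eps):
    """Mix a ground-truth sample and a predicted sample element-wise."""
    return [eps * r + (1 - eps) * f for r, f in zip(real, fake)]


def numerical_gradient(critic, x, h=1e-5):
    """Central-difference gradient of a scalar critic at point x."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + h] + x[i + 1:]
        lo = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((critic(hi) - critic(lo)) / (2 * h))
    return grad


def gradient_penalty(critic, real, fake, eps=None, lam=10.0):
    """Standard WGAN-GP penalty: lam * (||grad C(x_hat)||_2 - 1)^2."""
    eps = random.random() if eps is None else eps
    x_hat = interpolate(real, fake, eps)
    norm = math.sqrt(sum(g * g for g in numerical_gradient(critic, x_hat)))
    return lam * (norm - 1.0) ** 2


if __name__ == "__main__":
    linear_critic = lambda x: 3.0 * x[0] + 4.0 * x[1]  # gradient norm is 5
    print(gradient_penalty(linear_critic, [1.0, 0.0], [0.0, 1.0], eps=0.5))
```

The penalty is zero only when the critic's gradient has unit norm at the mixed sample, which is how the method encourages a 1-Lipschitz critic.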

The global-region adversarial loss can be defined as follows.

[Equation 12]

Here, P_G and P_D are the distributions of the ground truth and of the predicted super-resolved, deblurred images, respectively.

The next stage enforces similarity to the ground truth in local regions. A binary mask M is generated from the absolute difference between D and S, with non-blurred regions set to 1 and the rest set to 0. This mask is similar to the light field map extracted in Tao et al. (Non-Patent Document 10), which separated a refocused light field image into blurred and non-blurred regions. The difference is that the map here is a simplified version containing only the values 0 and 1. With this approach, the non-blurred regions are given priority for processing by the generative adversarial network. Accordingly, Equation 11 is modified as follows.

[Equation 13]

Here, the input x and the final computation are processed using the mask M.
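A binary mask built from an absolute difference can be sketched as follows. The threshold value and which side of it maps to 1 are assumptions: this sketch marks pixels where the prediction still differs from the ground truth, which is one reading of the text, and the names `binary_mask` and `apply_mask` are illustrative.

```python
def binary_mask(pred, target, threshold=0.1):
    """Return 1 where |pred - target| exceeds the threshold, else 0 (assumed rule)."""
    return [[1 if abs(p - t) > threshold else 0 for p, t in zip(pr, tr)]
            for pr, tr in zip(pred, target)]


def apply_mask(img, mask):
    """Zero out pixels outside the masked region."""
    return [[v * m for v, m in zip(row, mrow)] for row, mrow in zip(img, mask)]


if __name__ == "__main__":
    pred = [[0.5, 0.2], [0.9, 0.0]]
    target = [[0.5, 0.5], [0.0, 0.05]]
    mask = binary_mask(pred, target)
    print(mask, apply_mask(pred, mask))
```

Masking both the discriminator input and the final computation, as the text describes, would correspond to calling `apply_mask` on each of them with the same mask.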

Masking the final computation requires keeping the local gradient penalty consistent. Therefore, another adversarial loss that uses this local region is added. Using the same discriminator C, the local adversarial loss can be defined as follows.

[Equation 14]

The next 100 epochs are run using the sum of the global and local adversarial losses. Overall, the total objective function of the approach can be written as follows.

[Equation 15]

Here, the constant has the value 100.

The experimental results are described below.

During training, each ground-truth subview image (G) is randomly cropped into a 128 × 128 pixel patch. With d set to 4, the low-resolution, blurred version B has a spatial size of 32 × 32. The batch size is 1, and the subview images are processed as a random sequence. From the 5 × 5 subview images, 5 sequential frames are sampled in spiral order. All networks are trained with the ADAM optimizer at a learning rate of 0.0001 for a total of 260 epochs. The first 60 epochs train SRNet and DeblurNet individually, the next 100 epochs train on the global region, and the last 100 epochs combine the global and local regions of the discriminator. Training took about 6 days on a GTX 1080 GPU.
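The 260-epoch schedule can be written down as a small lookup. This is a sketch only: the phase names are hypothetical, and training SRNet before DeblurNet within the first 60 epochs is an assumption, since the text gives 30 epochs to each without stating the order.

```python
def training_phase(epoch):
    """Map a 0-based epoch index to its training phase."""
    if epoch < 0 or epoch >= 260:
        raise ValueError("epoch must be in [0, 260)")
    if epoch < 30:
        return "pretrain_sr"          # SRNet alone (order assumed)
    if epoch < 60:
        return "pretrain_deblur"      # DeblurNet alone (order assumed)
    if epoch < 160:
        return "adversarial_global"   # discriminator sees the entire region
    return "adversarial_global_local"  # global plus masked local region


if __name__ == "__main__":
    counts = {}
    for e in range(260):
        counts[training_phase(e)] = counts.get(training_phase(e), 0) + 1
    print(counts)
```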

Quantitative Results

The results were compared with publicly available state-of-the-art super-resolution and deblurring algorithms that operate at 5 × 5 angular resolution. The light field deblurring algorithm of Srinivasan et al. (Non-Patent Document 5) was included even though its spatial resolution is limited. The recommended optimization-based algorithms of Krishnan et al. (Non-Patent Document 2) and Pan et al. (Non-Patent Document 3) were also used. For the learning-based case, the method of Kim et al. (Non-Patent Document 4), which performs well on video deblurring, was used. The 2D deblurring methods were applied to each subview image individually. Each value in Table 1 is the PSNR or SSIM averaged over the test set of 40 light fields.

[Table 1]

The best and second-best methods are marked in blue and red text, respectively. As shown in Table 1, the proposed method with the Refine strategy achieved the highest score on full-resolution light fields. This indicates that the proposed dataset and learning-based algorithm enhance light field images better than the state-of-the-art algorithms. This was achieved thanks to the accurate geometry approximation used when generating the dataset.

Qualitative Results

Fig. 6 is a diagram showing qualitative result images of the algorithm according to an embodiment.

To observe the performance qualitatively, visual results are shown in Fig. 6. The results of the L^* and Refine methods show more sharply restored edges than the other methods. Although the Refine results have higher metric scores, the L^* results are in fact visually sharper on both large and small edges. This is because the restored pixels of small edges generally lie in un-deblurred regions; with L^*, these regions are prioritized by the super-resolution and blur removal networks. Epipolar plane images (EPIs) are shown below and beside each subview image in Fig. 6 to demonstrate light field consistency. More qualitative results can be found in the supplementary material.

To analyze the effect of the optimization, an ablation study was performed by training the network with the same hyperparameters while excluding individual loss functions. Peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) were used to compare performance numerically. Higher PSNR and SSIM indicate better performance.
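PSNR follows directly from the mean squared error, PSNR = 10·log10(MAX² / MSE). The sketch below assumes images are 2D lists scaled to [0, max_val]; the function name and the choice to return infinity for identical images are illustrative conventions, not taken from the source.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two 2D images."""
    n = len(pred) * len(pred[0])
    mse = sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    print(psnr([[0.0, 0.0]], [[0.1, 0.1]]))
```

Because the scale is logarithmic, halving the MSE raises PSNR by about 3 dB, which gives a sense of the size of the gains reported in an ablation of this kind.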

[Table 2]

As Table 2 shows, SR + Deb, which relies only on L^Con, was weak at learning the geometry of the 6-DOF model. With the global region included in the adversarial loss, Equations 11 and 12 (+ Glob), performance increased by 1.5 dB in PSNR. Finally, adding the local losses of Equations 13 and 14 (+ Loc) yielded a still higher PSNR.

Fig. 7 is a diagram showing qualitative result images according to an embodiment.

This approach is sensitive, however, and in some cases produced jiggling noise on weak edges, as shown in Fig. 7. Therefore, a refinement strategy is used that starts from the weights trained with L^* and optimizes them with L^Con. This strategy is labeled Refine in Table 2, which reports the results of the comparison.

As described above, the embodiments propose a specialized neural-network-based light field enhancement framework for spatial super-resolution and motion blur removal of light field images. The network is fully convolutional and is designed with a recurrent approach so that it can operate at various resolutions. A low-resolution, parallax-consistent, 6-DOF motion-blurred light field dataset was used to train the network. Local adversarial and refinement strategies were introduced into the optimization procedure to improve performance. The proposed method outperformed other state-of-the-art methods both quantitatively and qualitatively.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will appreciate that the processing device may include multiple processing elements and/or multiple types of processing elements. For example, the processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device, in order to be interpreted by the processing device or to provide instructions or data to it. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described with reference to limited embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

1. An adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal, comprising:
a super-resolution network that receives a subview image of a low-resolution or blurred light field and outputs a super-resolution reconstructed image; and
a blur removal network that receives the super-resolution reconstructed image and passes it through one or more convolution blocks to estimate a super-resolved image with motion blur removed,
wherein the apparatus provides an end-to-end network comprising the super-resolution network and the blur removal network,
wherein the super-resolution network comprises two convolution blocks having filters of different sizes through which the subview image passes, a U-Net in which six encoders and five decoders are integrated, and convolution blocks through which the output of the U-Net passes, so as to output the super-resolution reconstructed image,
wherein the resolution of the super-resolution reconstructed image is downsampled by a scale d, and the input subview image is concatenated with at least its neighboring images to obtain additional information about the neighboring subview images, and
wherein the blur removal network comprises two convolution blocks having filters of different sizes through which the super-resolution reconstructed image passes, eight ResNet blocks, and convolution blocks through which the output of the ResNet blocks passes, so as to estimate the super-resolved image with motion blur removed.

2. (Deleted)

3. The apparatus of claim 1,
further comprising a discriminator network that trains the end-to-end network in multiple stages so that the losses of super-resolution restoration and blur removal can be backpropagated, the training using a low-resolution, 6-DOF motion-blurred light field dataset.

4. The apparatus of claim 3,
wherein the apparatus provides a generative adversarial network (GAN) that processes both global and local regions of an image, and
wherein, in a first step, the discriminator network is trained using the entire region of the super-resolved, motion-blur-removed image predicted by the generator network and, in a second step, the same discriminator network additionally learns a local region.
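
As a rough illustration of the two-step scheme in the claim above, the sketch below selects what a single discriminator would see in each step: the whole predicted image in the first step and a local region in the second. The function names are hypothetical, images are represented as nested lists of pixel values, and choosing the local region by random cropping with a fixed patch size is an assumption; the claim only says that an additional local region is learned.

```python
import random
from typing import List, Optional, Sequence, Tuple

Image = List[List[float]]


def crop_local_region(image: Sequence[Sequence[float]], patch_h: int,
                      patch_w: int, rng: random.Random) -> Image:
    """Cut a patch_h x patch_w window at a random position (assumed policy)."""
    height, width = len(image), len(image[0])
    if patch_h > height or patch_w > width:
        raise ValueError("patch is larger than the image")
    top = rng.randint(0, height - patch_h)    # randint is inclusive at both ends
    left = rng.randint(0, width - patch_w)
    return [list(row[left:left + patch_w]) for row in image[top:top + patch_h]]


def discriminator_input(image: Sequence[Sequence[float]], step: int,
                        patch_size: Tuple[int, int] = (2, 2),
                        rng: Optional[random.Random] = None) -> Image:
    """Step 1 returns the entire region; step 2 returns a local region."""
    if step == 1:
        return [list(row) for row in image]
    if step == 2:
        rng = rng or random.Random()
        return crop_local_region(image, patch_size[0], patch_size[1], rng)
    raise ValueError("step must be 1 or 2")
```

Both steps feed the same discriminator; only the region passed to it changes between steps.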

5. A method of operating an adversarial neural network model apparatus for simultaneously performing light field super-resolution and blur removal, the method being performed by a computer device and comprising:
receiving, at a super-resolution network, a subview image of a low-resolution or blurred light field and outputting a super-resolution reconstructed image; and
receiving, at a blur removal network, the super-resolution reconstructed image and passing it through at least one convolution block to estimate a super-resolved image from which motion blur has been removed,
wherein the super-resolution network and the blur removal network together provide an end-to-end network,
wherein the super-resolution network comprises two convolution blocks each having filters of a different size, a U-Net in which six encoders and five decoders are integrated, and convolution blocks through which the output of the U-Net is passed, the subview image being passed through these blocks to output the super-resolution reconstructed image,
wherein the resolution of the super-resolution reconstructed image is downsampled to a scale of d, and the input subview image is concatenated with at least its neighboring images to obtain additional information about the neighboring subview images, and
wherein the blur removal network comprises two convolution blocks each having filters of a different size, eight ResNet blocks, and convolution blocks through which the output of the ResNet blocks is passed, the super-resolution reconstructed image being passed through these blocks to estimate the super-resolved, motion-blur-removed image.