KR102442980B1

KR102442980B1 - Super-resolution method for multi-view 360-degree image based on equi-rectangular projection and image processing apparatus

Info

Publication number: KR102442980B1
Application number: KR1020200188790A
Authority: KR
Inventors: 강제원; 김희재; 이병욱
Original assignee: 이화여자대학교 산학협력단 (Ewha Womans University Industry-Academic Cooperation Foundation)
Priority date: 2020-12-31
Filing date: 2020-12-31
Publication date: 2022-09-13
Also published as: KR20220096396A

Abstract

A super-resolution method for ERP-based multi-view 360-degree images includes: receiving, by an image processing apparatus, a low-resolution target image and a high-resolution reference image from among a plurality of 360-degree images that provide multiple views; inputting, by the image processing apparatus, the target image and the reference image into a disparity estimation model to output disparity information between the target image and the reference image; aligning, by the image processing apparatus, the reference image based on the disparity information; and outputting, by the image processing apparatus, a high-resolution image of the target image by inputting the target image and the aligned reference image into a reconstruction layer composed of residual blocks. The target image and the reference image are 360-degree images modeled based on ERP (equirectangular projection).

Description

SUPER-RESOLUTION METHOD FOR MULTI-VIEW 360-DEGREE IMAGE BASED ON EQUI-RECTANGULAR PROJECTION AND IMAGE PROCESSING APPARATUS

The technology described below relates to a super-resolution technique for ERP-based multi-view 360-degree images.

With the development of multimedia technology, interest in immersive media has recently been increasing. A 360-degree image provides an omnidirectional view of the real physical environment from a specific viewpoint. A multi-view image refers to images acquired by a plurality of cameras with different viewpoints, and a multi-view 360-degree image refers to images acquired by a plurality of cameras that each capture a 360-degree image. There are various techniques for mapping between two-dimensional images and 360-degree images; ERP (equirectangular projection), which applies the cylindrical projection method, is a representative one.

Various image super-resolution techniques are being studied, including techniques that use a reference image. When images related to the image of a specific viewpoint are available, as with 360-degree images or multi-view 360-degree images, super-resolution using a reference image is possible.

US Patent Publication No. US 2013-0258048

In reference-based super-resolution, it is important to search for correspondences between the input images and to extract, from the reference image, the information that the low-resolution image can use. However, conventional deep learning models have difficulty compensating for the disparity between images when the viewpoint difference is large, and they cannot handle the nonlinear distortion that varies with latitude in ERP images.

The technology described below aims to provide a super-resolution technique for ERP-based multi-view 360-degree images.

A super-resolution method for ERP-based multi-view 360-degree images includes: receiving, by an image processing apparatus, a low-resolution target image and a high-resolution reference image from among a plurality of 360-degree images that provide multiple views; inputting, by the image processing apparatus, the target image and the reference image into a disparity estimation model to output disparity information between the target image and the reference image; aligning, by the image processing apparatus, the reference image based on the disparity information; and outputting, by the image processing apparatus, a high-resolution image of the target image by inputting the target image and the aligned reference image into a reconstruction layer composed of residual blocks.

An image processing apparatus for super-resolving ERP-based 360-degree images includes: an input device that receives a low-resolution target image and a high-resolution reference image from among a plurality of 360-degree images that provide multiple views; a storage device that stores a neural network model that super-resolves 360-degree images in consideration of the distortion between ERP images; and a computing device that inputs the target image and the reference image into the neural network model to generate disparity information between the target image and the reference image, and generates a high-resolution image of the target image using the target image and the reference image aligned based on the disparity information.

The technology described below aligns the reference image to the super-resolution target image based on the disparity between the input images, thereby performing super-resolution that is robust to ERP distortion.

FIG. 1 is an example of a multi-view 360-degree imaging system.
FIG. 2 is an example of a neural network model that performs super-resolution.
FIG. 3 is another example of a neural network model that performs super-resolution.
FIG. 4 is an example of a 360-degree disparity estimator with a pyramid structure.
FIG. 5 is an example of an image processing apparatus that performs super-resolution.

Since the technology described below can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the technology described below to specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes within the spirit and scope of the technology described below.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another. For example, a first component may be called a second component, and similarly a second component may be called a first component, without departing from the scope of the technology described below. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

As used herein, singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise, and terms such as "comprises" indicate the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, without excluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before describing the drawings in detail, it should be made clear that the division of components in this specification is merely a division according to the main function each component is responsible for. That is, two or more components described below may be combined into one component, or one component may be divided into two or more components by more finely subdivided functions. In addition, each component described below may additionally perform some or all of the functions of other components besides its own main function, and some of the main functions of a component may of course be performed exclusively by another component.

In addition, when a method or operating method is performed, the processes constituting the method may occur in an order different from the one stated, unless the context clearly requires a specific order. That is, the processes may occur in the stated order, substantially simultaneously, or in reverse order.

A 360-degree image refers to image content that provides a 360-degree view around a single point. A single 360-degree image is provided through images acquired by a 360-degree camera placed at one point. There are various techniques for mapping between two-dimensional images and 360-degree images; ERP, which applies the cylindrical projection method, is a representative one.

A multi-view 360-degree image is provided through images acquired by a plurality of 360-degree cameras placed in a given area. By registering and synthesizing the images acquired in that area, a multi-view 360-degree image provides a moving user with images from free viewpoints.

A low-resolution image is an image whose resolution is below a predetermined reference value. A 360-degree low-resolution image is a low-resolution image acquired by a 360-degree camera, and it may be an image obtained by downscaling an image acquired by a 360-degree camera. Hereinafter, a low-resolution image is referred to as an LR (low-resolution) image.
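As a minimal sketch of how an LR image might be derived from an HR capture, the following assumes a single-channel image stored as a list of rows and applies box (mean) filtering by an integer factor. The patent does not specify the downscaling filter; `downscale_box` is a hypothetical helper, and practical pipelines often use bicubic filtering instead.

```python
def downscale_box(img, factor):
    """Downscale a 2-D image (list of rows) by an integer factor with a box (mean) filter."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            # Average the factor x factor block whose top-left corner is (i, j).
            block = [img[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


hr = [[1, 1, 3, 3],
      [1, 1, 3, 3],
      [5, 5, 7, 7],
      [5, 5, 7, 7]]
print(downscale_box(hr, 2))  # [[1.0, 3.0], [5.0, 7.0]]
```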

A high-resolution image is an image whose resolution is equal to or higher than the reference value. A 360-degree high-resolution image is a high-resolution image acquired by a 360-degree camera: either the original image acquired by the camera or an image whose resolution is equal to or higher than the reference value. Hereinafter, a high-resolution image is referred to as an HR (high-resolution) image.

Super-resolution refers to a technique for converting a low-resolution image into a high-resolution image. Conventionally, super-resolution has been performed using techniques such as pixel interpolation. As noted above, the technology described below is a super-resolution technique for multi-view 360-degree images, hereinafter referred to simply as MV-SR (multi-view super-resolution).
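For contrast, the simplest pixel-interpolation upscaler just repeats each LR pixel. The sketch below (hypothetical helper `upscale_nearest`, single-channel list-of-rows images assumed) shows this nearest-neighbour baseline: it enlarges the image but cannot restore high-frequency detail, which is what learned super-resolution tries to recover.

```python
def upscale_nearest(img, factor):
    """Upscale a 2-D image by an integer factor by repeating each pixel (nearest neighbour)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]   # repeat each pixel horizontally
        out.extend([list(wide) for _ in range(factor)])  # repeat each row vertically
    return out


print(upscale_nearest([[1, 2], [3, 4]], 2))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```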

The technology described below performs super-resolution using a neural network model. Neural network models come in various types, such as recurrent neural networks (RNN), feedforward neural networks (FFNN), and convolutional neural networks (CNN). The following description focuses on CNNs, but the super-resolution technique is not limited to any particular neural network model.

FIG. 1 is an example of a multi-view 360-degree imaging system, specifically a system in which super-resolution is performed on the decoder side.

The encoder 30 receives images from the plurality of cameras 11 to 15 and may encode each image according to a 360-degree image format. In doing so, the encoder 30 may downscale some of the 360-degree images to low-resolution images. The storage device 50 stores the 360-degree images (video streams 11 to 15) acquired by cameras 11 to 15. Assume that video stream 13 is a low-resolution video.

The 360-degree images (video streams 11 to 15) are transmitted to the receiving end over a network. The decoder 70 decodes the encoded images, and the image processing apparatus 100 performs super-resolution on the decoded images. The image processing apparatus 100 converts video stream 13 into a high-resolution image using the low-resolution video stream 13 and at least one adjacent image (at least one of video streams 11, 12, 14, and 15). The storage device 90 may then store the 360-degree images (video streams 11 to 15), all of which are now high-resolution.

At the receiving end, the decoder 70 and the image processing apparatus 100 are shown separately; however, a single device may perform both decoding and super-resolution. Encoding and decoding are unrelated to the super-resolution process described below: the image processing apparatus performs super-resolution based only on the low-resolution image and the high-resolution image it references.

Hereinafter, an apparatus that performs super-resolution on multi-view 360-degree images is referred to as an image processing apparatus. The image processing apparatus may take various physical forms, such as a VR device, a PC, a server, or a chipset with an embedded program. It receives a plurality of 360-degree images and performs super-resolution.

An ERP image contains distortion in specific regions of the image, and this distortion varies with latitude. Multi-view images are acquired by a plurality of cameras placed at different positions. Even when several images capture the same point or area, the degree of ERP distortion differs depending on the camera position. It is therefore difficult to super-resolve an image acquired by one 360-degree camera using an image acquired by another camera. To address this, the following super-resolution process is proposed.

Among the plurality of 360-degree images, the image to be super-resolved is called the target image. The target image is a low-resolution image. It is super-resolved using the low-resolution target image itself and a high-resolution image acquired by a camera adjacent to the camera that captured the target image. This high-resolution image from the adjacent camera is called the reference image. The reference image contains all or part of the region captured in the target image. Referring to FIG. 1, if the target image is the image acquired by camera 13, the reference image may be the image acquired by any one of cameras 11, 12, 14, and 15.

Hereinafter, an apparatus that super-resolves a target image is referred to as an image processing apparatus. The image processing apparatus is a computer device capable of processing image data, for example a PC, a smart device, a server on a network, a TV set-top box, or a VR device.

FIG. 2 is an example of a neural network model 200 that performs super-resolution. The neural network model 200 of FIG. 2 receives 360-degree input images and outputs a 360-degree high-resolution image. The inputs are the low-resolution target image E^LR and the reference image E^Ref. The output is the high-resolution image E^SR obtained by super-resolving the target image E^LR.

FIG. 2 shows a model that super-resolves a real 360-degree image using transfer layers. The first transfer layer 211 receives E^LR and outputs a synthetic 360-degree image feature t^LR. The second transfer layer 212 receives E^Ref and outputs a synthetic 360-degree image feature t^Ref.

The disparity estimator 220 extracts the correlation between E^LR and E^Ref. The correlation includes information that aligns related features among the features extracted from each image.

The disparity estimator 220 computes the correlation D_r→l and thereby transfers the features of the reference image along with the low-resolution image, without flow supervision.

The disparity estimator 220 includes a component that extracts a first feature from t^LR of the target image, a component that extracts a second feature from t^Ref of the reference image, and a component that estimates the flow from the correlation between the first feature and the second feature.

Briefly, the first encoder 221 receives t^LR and extracts a basic feature f^LR, and the first latitude-aware convolution layer 222 reduces the disparity discrepancy between the ERP images in f^LR and outputs s^LR. The second encoder 223 receives t^Ref and extracts a basic feature f^Ref, and the second latitude-aware convolution layer 224 reduces the disparity discrepancy between the ERP images in f^Ref and outputs s^Ref. The double-headed arrows between the first encoder 221 and the second encoder 223, and between the first latitude-aware convolution layer 222 and the second latitude-aware convolution layer 224, indicate that they share kernel parameters.

The image processing apparatus performs a correlation operation 225 between s^LR and s^Ref, and the flow estimator 226 receives the correlation result and finally outputs the correlation D_r→l between E^LR and E^Ref. The image processing apparatus then warps E^Ref using D_r→l (240). The warped reference image corresponds to the result of aligning the reference image to the target image in consideration of the ERP distortion of the target image E^LR and the reference image E^Ref.
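The correlation operation 225 is not spelled out in the text. A common choice in optical-flow networks, assumed here, is a local cost volume: for every pixel and every displacement inside a small window, the channel-averaged product of the two feature maps. Because ERP longitude is periodic, horizontal displacements wrap around. `correlation` and `max_disp` are hypothetical names; in the model this would run on the LatConv outputs s^LR and s^Ref before the flow estimator.

```python
def correlation(feat_a, feat_b, max_disp):
    """Local correlation between two feature maps of shape [C][H][W].

    Returns cost[dy + max_disp][dx + max_disp][i][j], the channel-averaged
    product of feat_a at (i, j) and feat_b at (i + dy, j + dx).
    Horizontal offsets wrap around because ERP longitude is periodic;
    vertical offsets that fall outside the image leave the cost at 0.
    """
    c, h, w = len(feat_a), len(feat_a[0]), len(feat_a[0][0])
    size = 2 * max_disp + 1
    cost = [[[[0.0] * w for _ in range(h)] for _ in range(size)] for _ in range(size)]
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            plane = cost[dy + max_disp][dx + max_disp]
            for i in range(h):
                ii = i + dy
                if not 0 <= ii < h:
                    continue  # beyond the top/bottom edge
                for j in range(w):
                    jj = (j + dx) % w  # wrap across the 0/360-degree seam
                    dot = sum(feat_a[k][i][j] * feat_b[k][ii][jj] for k in range(c))
                    plane[i][j] = dot / c
    return cost


# One channel, one row: the bright pixel moves one column to the right.
target = [[[0.0, 1.0, 0.0, 0.0]]]
reference = [[[0.0, 0.0, 1.0, 0.0]]]
cost = correlation(target, reference, max_disp=1)
print(cost[1][2][0][1])  # 1.0: the best match for column 1 is at dx = +1
```

A flow estimator would then turn each pixel's vector of costs into a displacement; that step is learned and is not sketched here.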

If the warped reference image has negative features with respect to the target (features that degrade performance), super-resolution performance deteriorates. The neural network model 200 may further include a component for removing such performance-degrading features. The mask generator 230 receives t^LR and D_r→l and generates a mask M for removing the performance-degrading features included in the warped reference image.

The image processing apparatus may prepare the final feature data for super-resolution by elementwise multiplication of M and the warped reference image.
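The mask-and-multiply step can be sketched as follows. How the mask generator 230 produces M is not described, so this sketch simply assumes it outputs raw scores (`mask_logits`, hypothetical) that a sigmoid squashes into (0, 1) before the elementwise product with the warped reference features.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def masked_fusion(mask_logits, warped_ref):
    """Return M * warped_ref elementwise, with M = sigmoid(mask_logits) in (0, 1)."""
    return [[_sigmoid(z) * v for z, v in zip(z_row, v_row)]
            for z_row, v_row in zip(mask_logits, warped_ref)]


warped = [[4.0, 4.0, 4.0]]
logits = [[-50.0, 0.0, 50.0]]  # suppress, half weight, keep
print(masked_fusion(logits, warped))
```

Positions where M is close to 0 contribute almost nothing to the reconstruction, which is how such a mask can suppress reference features that do not match the target.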

The reconstruction layer 250 receives (i) the low-resolution target image E^LR (or t^LR) and (ii) the warped reference image, or the result of multiplying it by M, and performs super-resolution by reconstructing the image. The reconstruction layer 250 may be composed of residual blocks. A residual block adds a shortcut around its convolutions, so that, depending on what is most efficient to learn, an input feature can either be transformed by the convolutions or passed through largely unchanged.
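A residual block of the kind the reconstruction layer may use can be sketched as y = x + conv(ReLU(conv(x))). The single-channel 3×3 convolutions with zero padding and the ReLU are assumptions for illustration; the patent does not give the block's internal layout.

```python
def conv3x3(x, k):
    """Single-channel 3x3 convolution (cross-correlation) with zero padding."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for m in range(-1, 2):
                for n in range(-1, 2):
                    ii, jj = i + m, j + n
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += k[m + 1][n + 1] * x[ii][jj]
            out[i][j] = acc
    return out


def residual_block(x, k1, k2):
    """y = x + conv(relu(conv(x))); the shortcut carries x past the convolutions."""
    hidden = [[max(0.0, v) for v in row] for row in conv3x3(x, k1)]
    delta = conv3x3(hidden, k2)
    return [[a + b for a, b in zip(x_row, d_row)] for x_row, d_row in zip(x, delta)]


x = [[1.0, -2.0], [3.0, 4.0]]
zero = [[0.0] * 3 for _ in range(3)]
ident = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(residual_block(x, zero, zero))    # zero kernels: the block is the identity
print(residual_block(x, ident, ident))  # x + relu(x)
```

With zero kernels the block returns its input unchanged, which illustrates how the shortcut lets the network effectively skip the convolutions.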

The third transfer layer 260 converts the super-resolved synthetic 360-degree image output by the reconstruction layer 250 back into the real-image domain, and outputs the super-resolved 360-degree image E^SR. In the neural network model 200, the transfer layers 211, 212, and 260 allow a high-complexity real image to be super-resolved with low complexity in the synthetic-image domain.

FIG. 3 is another example of a neural network model 300 that performs super-resolution, in this case a super-resolution model for synthetic 360-degree images.

The input images are synthetic images: the low-resolution target image t^LR and the reference image t^Ref. The output is the high-resolution image t^SR obtained by super-resolving the target image t^LR.

The disparity estimator 310 extracts the correlation between t^LR and t^Ref. The correlation includes information that aligns related features among the features extracted from each image.

The disparity estimator 310 computes the correlation D_r→l and thereby transfers the features of the reference image along with the low-resolution image, without flow supervision.

The disparity estimator 310 includes a component that extracts a first feature from t^LR of the target image, a component that extracts a second feature from t^Ref of the reference image, and a component that estimates the flow from the correlation between the first feature and the second feature.

Briefly, the first encoder 311 receives t^LR and extracts a basic feature f^LR, and the first latitude-aware convolution layer 312 reduces the disparity discrepancy between the ERP images in f^LR and outputs s^LR. The second encoder 313 receives t^Ref and extracts a basic feature f^Ref, and the second latitude-aware convolution layer 314 reduces the disparity discrepancy between the ERP images in f^Ref and outputs s^Ref. The double-headed arrows between the first encoder 311 and the second encoder 313, and between the first latitude-aware convolution layer 312 and the second latitude-aware convolution layer 314, indicate that they share kernel parameters.

The image processing apparatus performs a correlation operation 315 between s^LR and s^Ref, and the flow estimator 326 receives the correlation result and finally outputs the correlation D_r→l between t^LR and t^Ref. The image processing apparatus then warps t^Ref using D_r→l (320). The warped reference image corresponds to the result of aligning the reference image to the target image in consideration of the ERP distortion of the target image t^LR and the reference image t^Ref.
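The warping step (320 here, 240 in FIG. 2) can be sketched as backward warping with bilinear interpolation. The sketch assumes that D_r→l is a per-pixel flow field (`flow_x`, `flow_y`, hypothetical names) giving, for each target pixel, the offset at which to sample the reference image; it wraps columns around the ERP seam and clamps rows at the top and bottom. The patent does not specify the interpolation or boundary handling actually used.

```python
import math


def warp_backward(ref, flow_x, flow_y):
    """Backward-warp `ref` toward the target view with a per-pixel flow field.

    out[i][j] = ref sampled at (i + flow_y[i][j], j + flow_x[i][j]) by bilinear
    interpolation; columns wrap around (ERP longitude is periodic) and rows are
    clamped at the top and bottom edges.
    """
    h, w = len(ref), len(ref[0])

    def pixel(r, c):
        r = min(max(r, 0), h - 1)  # clamp vertically
        return ref[r][c % w]       # wrap horizontally

    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            y = i + flow_y[i][j]
            x = j + flow_x[i][j]
            y0, x0 = math.floor(y), math.floor(x)
            fy, fx = y - y0, x - x0
            top = (1 - fx) * pixel(y0, x0) + fx * pixel(y0, x0 + 1)
            bot = (1 - fx) * pixel(y0 + 1, x0) + fx * pixel(y0 + 1, x0 + 1)
            out[i][j] = (1 - fy) * top + fy * bot
    return out


ref = [[0.0, 1.0, 2.0, 3.0]]
print(warp_backward(ref, [[1.0] * 4], [[0.0] * 4]))  # [[1.0, 2.0, 3.0, 0.0]]
```

Note how the last column samples across the seam and picks up the first column's value, as a 360-degree image requires.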

If the warped reference image has negative features with respect to the target (features that degrade performance), super-resolution performance deteriorates. The neural network model 300 may further include a component for removing such performance-degrading features. The mask generator 330 receives t^LR and D_r→l and generates a mask M for removing the performance-degrading features included in the warped reference image.

The image processing apparatus may prepare the final feature data for super-resolution by elementwise multiplication of M and the warped reference image.

The reconstruction layer 340 receives (i) t^LR and (ii) the warped reference image, or the result of multiplying it by M, and performs super-resolution by reconstructing the image. The reconstruction layer 340 may be composed of residual blocks, which, through their shortcut connections, let an input feature either be transformed by convolution or pass through largely unchanged, depending on what is most efficient to learn. The reconstruction layer 340 outputs the super-resolved synthetic 360-degree image t^SR.

In the neural network model 200 of FIG. 2 and the neural network model 300 of FIG. 3, the component designed to account for the distortion between ERP images is the disparity estimator, which is described below. The latitude-aware convolution layer (LatConv) is the key component for super-resolution that is robust to ERP distortion.

In an ERP image, pixels located at high latitudes of the spherical image are densely packed in the longitudinal direction: near the poles, many ERP pixels cover a short arc of the sphere. Therefore, to equalize the pixel density, the sampling must be expanded horizontally as one moves from the equatorial region toward the polar regions. LatConv performs this operation during feature extraction from ERP images.

LatConv adaptively sets the longitudinal sampling interval according to latitude. For a frame of height H and width W, the sampling interval a_i in the i-th row may be defined as in Equation 1.
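Equation 1 itself is not reproduced in this text. One common way to express the horizontal stretch of ERP, assumed here rather than taken from the patent, is a_i = 1 / cos(φ_i), where φ_i is the latitude at the centre of row i: a circle of latitude φ is shorter than the equator by a factor of cos φ, yet ERP stretches both to the full width W. `sampling_intervals` is a hypothetical helper.

```python
import math


def sampling_intervals(height):
    """Hypothetical horizontal sampling interval a_i for each ERP row i.

    Assumes row i is centred at latitude phi_i = (0.5 - (i + 0.5) / height) * pi
    and a_i = 1 / cos(phi_i), the horizontal stretch of ERP at that latitude.
    """
    intervals = []
    for i in range(height):
        phi = (0.5 - (i + 0.5) / height) * math.pi
        intervals.append(1.0 / math.cos(phi))
    return intervals


a = sampling_intervals(8)
print([round(v, 3) for v in a])  # largest near the poles, close to 1 at the equator
```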

Using a_i, the image processing apparatus may scale the input feature map f horizontally according to latitude. For each position (i, j), it may perform a (2K + 1)×(2K + 1) LatConv operation as in Equation 2.

Here, s is the output feature value and w is the kernel. K may be set to 1 in the proposed network. LatConv can be applied across channels regardless of the dimensionality of the input feature f, and it can be trained with standard backpropagation. When applying the kernel at the image boundary, the image processing apparatus may use circular padding in the horizontal direction and a mirror-symmetry scheme in the vertical direction.
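A minimal single-channel sketch of a (2K + 1) × (2K + 1) latitude-aware convolution with the boundary handling described above is shown below. Because Equations 1 and 2 are not reproduced here, several details are assumptions:
- a_i = 1/cos(latitude of row i);
- every tap in output row i uses that row's a_i;
- fractional column positions are linearly interpolated;
- "mirror symmetry" means reflecting with the edge row repeated.

For a multi-channel feature, the same routine would be applied per channel.

```python
import math
import numpy as np

def sampling_interval(i: int, H: int) -> float:
    lat = (0.5 - (i + 0.5) / H) * math.pi
    return 1.0 / math.cos(lat)  # ASSUMED form of Equation 1

def mirror_row(r: int, H: int) -> int:
    """Vertical mirror-symmetric padding: row -1 -> 0, row H -> H-1."""
    if r < 0:
        return -r - 1
    if r >= H:
        return 2 * H - r - 1
    return r

def sample_row_circular(row: np.ndarray, x: float) -> float:
    """Linear interpolation at fractional column x with horizontal wrap-around."""
    W = row.shape[0]
    x0 = math.floor(x)
    t = x - x0
    return (1 - t) * row[x0 % W] + t * row[(x0 + 1) % W]

def lat_conv(f: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Latitude-aware convolution of an H x W map f with a (2K+1) x (2K+1) kernel w."""
    H, W = f.shape
    K = w.shape[0] // 2
    s = np.zeros((H, W))
    for i in range(H):
        a = sampling_interval(i, H)  # horizontal tap spacing widens toward the poles
        for j in range(W):
            acc = 0.0
            for m in range(-K, K + 1):
                row = f[mirror_row(i + m, H)]
                for n in range(-K, K + 1):
                    acc += w[m + K, n + K] * sample_row_circular(row, j + a * n)
            s[i, j] = acc
    return s
```

With a centre-only kernel, the operation reduces to the identity. With a constant input, every output equals the kernel sum times that constant, because interpolation and padding both preserve constants.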

The disparity estimator is described next. The encoders 221, 223, 311, and 313, which are ordinary convolutional layers, extract f^LR and f^Ref. Because of the distortion between the two ERP images, it is difficult to find correspondences between f^LR and f^Ref. After passing through LatConv, the resulting s^LR and s^Ref exhibit similar degrees of ERP distortion.

FIG. 4 is an example of a 360-degree disparity estimator 400 with a pyramid structure.

The disparity estimator 400 includes an encoder 410, a location-aware convolutional layer (LatConv) 420, and a flow estimator 440.

The encoder 410 extracts features from each input image. It consists of separate layers that extract features from the target image and from the reference image, respectively. Features output from the same layer of the encoder 410 carry different ERP distortions; for example, A and B have different spherical distortion.

The encoder 410 may extract pyramid-like features from a plurality of layers. The image processing apparatus may estimate the disparity by comparing the target-image features and the reference-image features taken from the same layer of the encoder 410.
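The level-by-level comparison can be sketched as follows. Building the pyramid and pairing same-level features is the only point illustrated. Repeated 2x average pooling is a hypothetical stand-in for the learned, multi-channel encoder 410, and the function names are illustrative.

```python
import numpy as np

def avg_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling (odd trailing rows/columns are dropped)."""
    H, W = x.shape
    x = x[: H - H % 2, : W - W % 2]
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def build_pyramid(x: np.ndarray, levels: int) -> list:
    """Finest level first; each further level halves the resolution."""
    pyr = [x]
    for _ in range(levels - 1):
        pyr.append(avg_pool2(pyr[-1]))
    return pyr

def same_level_pairs(target: np.ndarray, reference: np.ndarray, levels: int) -> list:
    """(target, reference) feature pairs from the same level, coarsest first."""
    pt = build_pyramid(target, levels)
    pr = build_pyramid(reference, levels)
    return list(zip(pt, pr))[::-1]
```

The pairs are returned coarsest first because a pyramid disparity estimator typically refines its estimate from the coarse levels down to the fine ones.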

LatConv 420 performs a convolution on the features at each level of the pyramid. It receives f_l^LR and f_l^Ref and outputs s_l^LR and s_l^Ref, where l denotes the pyramid level. LatConv 420 aligns the reference-image features with the target-image features by reducing the latitude-dependent distortion of feature density along the longitude. As a result, the spherical distortion of s_l^LR and s_l^Ref can be matched. The distortion correction of LatConv 420 is performed as in Equation 2.

At each level l, the image processing apparatus matches the texture features of s_l^LR and s_l^Ref and feeds the resulting matching cost 430 into the flow estimator 440 to estimate the flow. To estimate the flow at the current level, it combines the feature volume and the estimated flow from the upper (coarser) pyramid level. The flow estimator 440 is a six-layer CNN that estimates the disparity. Using the estimated disparity, the image processing apparatus remaps the pixels of E^Ref to their corresponding coordinates, producing the aligned reference image.
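The text does not specify how the matching cost 430 is computed. The sketch below assumes a local correlation cost volume of the kind used by pyramid optical-flow networks. Each displacement (dy, dx) within ±d is scored by the channel-averaged product of the two features. The horizontal shift wraps around, consistent with the circular padding used for ERP, and the vertical shift replicates edge rows. The sketch also includes a nearest-neighbour flow upsampling step for passing a coarse estimate to the next finer level. The six-layer flow CNN itself is not shown, and all names are hypothetical.

```python
import numpy as np

def correlation_cost(s_lr: np.ndarray, s_ref: np.ndarray, d: int) -> np.ndarray:
    """Cost volume between C x H x W features for all (dy, dx) in [-d, d]^2.

    Returns an array of shape ((2d+1)**2, H, W), ordered dy-major then dx.
    """
    C, H, W = s_lr.shape
    ref = np.pad(s_ref, ((0, 0), (d, d), (0, 0)), mode="edge")  # vertical: edge rows
    costs = []
    for dy in range(-d, d + 1):
        rows = ref[:, d + dy : d + dy + H, :]  # rows[:, i] = s_ref[:, i + dy]
        for dx in range(-d, d + 1):
            shifted = np.roll(rows, -dx, axis=2)  # horizontal wrap: [..., j] = rows[..., j + dx]
            costs.append((s_lr * shifted).mean(axis=0))
    return np.stack(costs)

def upsample_flow(flow: np.ndarray) -> np.ndarray:
    """Carry a (2, h, w) flow to the next finer level: 2x nearest upsampling, values doubled."""
    return np.repeat(np.repeat(flow, 2, axis=1), 2, axis=2) * 2.0
```

When the reference features equal the target features shifted one column to the right, the peak of the cost volume lies at (dy, dx) = (0, 1).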

The mask generators 230 and 330 may receive the concatenation of E^LR, the aligned reference image, D_r→l, and the absolute difference between E^LR and the aligned reference image, and output a mask M. As shown in FIGS. 2 and 3, each of the mask generators 230 and 330 may consist of five convolutional layers and a sigmoid activation layer. The mask generators 230 and 330 determine which regions should be used for image reconstruction (super-resolution). Undesirable textures in the reference image may be suppressed (filtered) as in Equation 3 below, and the resulting data is input to the reconstruction layers 250 and 340.
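The mask-generator input and the suppression step can be sketched as follows. The sketch rests on several assumptions:
- Equation 3 is not reproduced here, so the gating is modelled as the element-wise product of the aligned reference image and M, as described for the computing device 530.
- All inputs share one spatial size (for example, E^LR upsampled beforehand).
- The mask logits, which would come from the five convolutional layers, are passed in directly.

The function names are hypothetical.

```python
import numpy as np

def mask_generator_input(e_lr, e_ref_aligned, disparity):
    """Channel-wise concatenation of E^LR, the aligned reference, D_{r->l}, and |E^LR - aligned|."""
    return np.concatenate(
        [e_lr, e_ref_aligned, disparity, np.abs(e_lr - e_ref_aligned)], axis=0
    )

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def suppress_reference(e_ref_aligned, mask_logits):
    """Gate the aligned reference with M = sigmoid(logits) in (0, 1), element-wise.

    A (1, H, W) mask broadcasts over all channels.
    """
    m = sigmoid(mask_logits)
    return m * e_ref_aligned
```

A mask value near 0 discards the reference texture at that pixel, and a value near 1 keeps it.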

The reconstruction layers 250 and 340 may consist of a plurality of residual blocks with 64 filters.

The loss functions used during training are described next. Equations 4 and 5 below are, respectively, the loss function of the reconstruction layer and the loss function of the warping process that aligns the reference image to the target.

Here, E^GT is the ground truth and E^SR is the super-resolution result. ρ may be set to 0.01. The overall loss function can be expressed as Equation 6 below.
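Equations 4 to 6 are not reproduced in this text. The sketch below therefore makes three assumptions:
- both losses are mean absolute (L1) errors;
- the warping loss compares the aligned reference image with the ground truth;
- the overall loss adds the warping loss weighted by ρ = 0.01.

The function names are hypothetical.

```python
import numpy as np

def l1(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean(np.abs(a - b)))

def total_loss(e_sr, e_gt, e_ref_aligned, rho: float = 0.01) -> float:
    # ASSUMED forms: L1 for Equation 4 and Equation 5, weighted sum for Equation 6
    loss_rec = l1(e_sr, e_gt)            # reconstruction loss
    loss_warp = l1(e_ref_aligned, e_gt)  # warping (alignment) loss
    return loss_rec + rho * loss_warp
```

The small weight ρ keeps the alignment term as an auxiliary signal, so reconstruction quality dominates training.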

FIG. 5 is an example of an image processing apparatus 500 that performs super-resolution. The image processing apparatus 500 may take the form of a VR device, a PC, a smart device, a network server, or the like.

The image processing apparatus 500 may include a storage device 510, a memory 520, a computing device 530, an interface device 540, and a communication device 550.

The storage device 510 may store programs or code for operating the image processing apparatus 500. It may store the neural network model 200 or 300 described above, as well as programs or code for training that model. It may also store the high-resolution target images generated by the neural network model.

The memory 520 may temporarily store data and information generated while the image processing apparatus 500 operates.

The interface device 540 receives commands and data from outside. It may receive information from a physically connected input device or a physical interface (keypad, touch panel, etc.). The interface device 540 may receive a neural network model, information for training the neural network model, training data, and the like, as well as parameter values for updating the neural network model. It may also receive a plurality of 360-degree images for super-resolution, including the target image and reference image described above.

The communication device 550 transmits and receives information over a wireless network. It may receive a neural network model, information for training the neural network model, training data, and the like, as well as parameter values for updating the neural network model. It may receive the target image and reference image to be input to the neural network model. It may also transmit the high-resolution target image generated by the neural network model to an external entity.

The interface device 540 and the communication device 550 may receive information and data from a user or an external entity, so they may be collectively referred to as an input device.

The computing device 530 controls the operation of the image processing apparatus 500 using the programs or code stored in the storage device 510, and performs super-resolution using the neural network model.

The computing device 530 may be a processor, an application processor (AP), a chip with an embedded program, or a similar device that processes data and performs predetermined operations.

The operation of the computing device 530 is described below with reference to the neural network model 200 of FIG. 2.

The computing device 530 inputs the target image E^LR, a live-action 360-degree image, to the first transition layer 211 to generate a synthetic 360-degree image t^LR. Likewise, it inputs the reference image E^Ref, also a live-action 360-degree image, to the second transition layer 212 to generate a synthetic 360-degree image t^Ref.

The computing device 530 inputs t^LR to the first encoder 221 to output the feature f^LR, and inputs f^LR to the first location-aware convolutional layer 222 to output s^LR. It inputs t^Ref to the second encoder 223 to output the feature f^Ref, and inputs f^Ref to the second location-aware convolutional layer 224 to output s^Ref.

The computing device 530 compares the features of s^LR and s^Ref level by level in a pyramid structure, as shown in FIG. 4, and outputs the correlation D_r→l using the flow estimation model 226.

The computing device 530 remaps (warps) the reference image E^Ref based on the correlation D_r→l to create the aligned reference image.
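The remapping step can be sketched as a backward warp with bilinear interpolation. The sketch assumes that the disparity is a two-channel field (vertical, horizontal displacement) defined on the target grid. Following the ERP boundary handling described for LatConv, columns wrap around and rows are clamped. The function name is hypothetical, and the image is single-channel for brevity.

```python
import numpy as np

def warp_reference(e_ref: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp an H x W reference onto the target grid.

    flow[0] and flow[1] are the vertical and horizontal displacements per
    target pixel; output[i, j] samples e_ref at (i + flow[0], j + flow[1]).
    """
    H, W = e_ref.shape
    ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    y = np.clip(ii + flow[0], 0, H - 1)  # clamp vertically
    x = jj + flow[1]                      # wrapped horizontally below
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    ty, tx = y - y0, x - x0
    y1 = np.minimum(y0 + 1, H - 1)
    x0w, x1w = x0 % W, (x0 + 1) % W       # 360-degree wrap-around
    top = (1 - tx) * e_ref[y0, x0w] + tx * e_ref[y0, x1w]
    bot = (1 - tx) * e_ref[y1, x0w] + tx * e_ref[y1, x1w]
    return (1 - ty) * top + ty * bot
```

A zero flow returns the reference unchanged. A constant horizontal flow of one pixel rotates the panorama by one column, with the last column wrapping around to the first.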

The computing device 530 inputs the target image E^LR, the correlation D_r→l, and the aligned reference image to the mask generation model (mask generation layer) 230 to generate a mask M.

The computing device 530 inputs the element-wise product of the aligned reference image and the mask M to the reconstruction layer 250 to generate a synthetic 360-degree high-resolution image.

The computing device 530 inputs the output of the reconstruction layer 250 to the third transition layer 260 to generate E^SR, the final live-action 360-degree image.
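The data flow through the neural network model 200, as walked through above, can be summarised in a short wiring sketch. Each stage is passed in as a callable in a dictionary. The key names and the function name are hypothetical, and the real layers are learned networks. The sketch follows the description, in which only the masked reference reaches the reconstruction layer, whereas claim 1 also lists the target image as a reconstruction input.

```python
from typing import Callable, Dict
import numpy as np

def super_resolve_model200(e_lr: np.ndarray, e_ref: np.ndarray,
                           net: Dict[str, Callable]) -> np.ndarray:
    t_lr = net["transition_in_lr"](e_lr)                   # first transition layer 211
    t_ref = net["transition_in_ref"](e_ref)                # second transition layer 212
    s_lr = net["latconv_lr"](net["encoder_lr"](t_lr))      # encoder 221 -> LatConv 222
    s_ref = net["latconv_ref"](net["encoder_ref"](t_ref))  # encoder 223 -> LatConv 224
    d = net["flow"](s_lr, s_ref)                           # flow estimation model 226 -> D_{r->l}
    e_ref_aligned = net["warp"](e_ref, d)                  # remap E^Ref using D_{r->l}
    m = net["mask"](e_lr, d, e_ref_aligned)                # mask generation layer 230
    t_sr = net["reconstruct"](m * e_ref_aligned)           # reconstruction layer 250
    return net["transition_out"](t_sr)                     # third transition layer 260 -> E^SR
```

Keeping each stage behind a callable makes it straightforward to swap in the variant of FIG. 3, which omits the transition layers, by passing identity functions for them.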

The operation of the computing device 530 is described below with reference to the neural network model 300 of FIG. 3.

The computing device 530 inputs the target image t^LR, a synthetic 360-degree image, to the first encoder 311 to output the feature f^LR, and inputs f^LR to the first location-aware convolutional layer 312 to output s^LR. It inputs the reference image t^Ref, also a synthetic 360-degree image, to the second encoder 313 to output the feature f^Ref, and inputs f^Ref to the second location-aware convolutional layer 314 to output s^Ref.

The computing device 530 compares the features of s^LR and s^Ref level by level in a pyramid structure, as shown in FIG. 4, and outputs the correlation D_r→l using the flow estimation model 326.

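The computing device 530 remaps (warps) the reference image based on the correlation D_r→l to create the aligned reference image.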

The computing device 530 inputs the target image E^LR, the correlation D_r→l, and the aligned reference image to the mask generation model (mask generation layer) 330 to generate a mask M.

The computing device 530 inputs the element-wise product of the aligned reference image and the mask M to the reconstruction layer 340 to generate t^SR, a synthetic 360-degree high-resolution image.

Meanwhile, the computing device 530 may use the 360-degree high-resolution image to synthesize a 360-degree image from the viewpoint of a user located at a specific point.

The output device 560 may output an interface screen for the super-resolution process, as well as the resulting high-resolution image.

In addition, the super-resolution method described above may be implemented as a program (or application) containing an algorithm executable on a computer. The program may be stored and provided on a transitory or non-transitory computer-readable medium.

A non-transitory readable medium is a medium that stores data semi-permanently and can be read by a device. It differs from media that hold data only momentarily, such as registers, caches, and memory. Specifically, the various applications or programs described above may be stored and provided on a non-transitory readable medium. Examples include a CD, DVD, hard disk, Blu-ray disc, USB drive, memory card, ROM (read-only memory), PROM (programmable read-only memory), EPROM (erasable PROM), EEPROM (electrically erasable PROM), or flash memory.

A transitory readable medium refers to various types of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synclink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).

This embodiment and the accompanying drawings merely illustrate part of the technical idea included in the technology described above. Modified examples and specific embodiments that those skilled in the art can readily infer, within the scope of the technical idea contained in the specification and drawings, are clearly included in the scope of rights of the technology described above.

Claims

An ERP-based super-resolution method for multi-view 360-degree images, comprising:
receiving, by an image processing apparatus, a low-resolution target image and a high-resolution reference image from among a plurality of 360-degree images providing multiple views;
outputting, by the image processing apparatus, disparity information of the target image and the reference image by inputting the target image and the reference image to a disparity estimation model;
aligning, by the image processing apparatus, the reference image based on the disparity information; and
outputting, by the image processing apparatus, a high-resolution image of the target image by inputting the target image and the aligned reference image to a reconstruction layer composed of residual blocks,
wherein the target image and the reference image are images modeled as 360-degree images based on ERP (equirectangular projection), and
the disparity estimation model performs, on each of the target image and the reference image, a convolution operation as shown in the following equation to extract features in which the pixels clustered according to latitude are leveled.

(where s(i,j) is the convolution operation for position (i, j), s is the output feature value, w is the kernel, K sets the size of the kernel, a_i is the sampling interval of the i-th row, and H is the height of the frame)

According to claim 1,
wherein the target image and the reference image are synthetic 360-degree images, and
the method further comprises converting, by the image processing apparatus, a target image and a reference image that are live-action 360-degree images into the target image and the reference image that are synthetic 360-degree images, respectively, by inputting each of them to a transition layer.

According to claim 1,
wherein the high-resolution image is a synthetic 360-degree image, and
the method further comprises converting, by the image processing apparatus, the high-resolution image into a live-action 360-degree image by inputting it to a transition layer.

According to claim 1,
wherein the disparity estimation model extracts feature values at a plurality of hierarchical levels for each of the target image and the reference image, inputs matching information between the features of the target image and the features of the reference image extracted at the same level to a flow estimator, and outputs the disparity information.

delete

According to claim 1,
further comprising generating, by the image processing apparatus, a mask by inputting the target image, the aligned reference image, and the disparity information into a mask generation model,
wherein the image processing apparatus outputs the high-resolution image by inputting, to the reconstruction layer, a value obtained by multiplying the aligned reference image by the mask, and the aligned reference image, and
the mask removes, from the input reference image, features that degrade the performance of generating the high-resolution image.

An ERP-based image processing apparatus for super-resolution of 360-degree images, comprising:
an input device for receiving a low-resolution target image and a high-resolution reference image from among a plurality of 360-degree images providing multiple views;
a storage device for storing a neural network model that super-resolves a 360-degree image in consideration of the distortion between ERP (equirectangular projection) images; and
a computing device that inputs the target image and the reference image to the neural network model to generate disparity information of the target image and the reference image, and generates a high-resolution image of the target image using the target image and the reference image aligned based on the disparity information,
wherein the target image and the reference image are images modeled as 360-degree images based on ERP,
the neural network model includes a mask generation layer that generates a mask from the target image, the aligned reference image, and the disparity information, and a reconstruction layer that outputs a high-resolution image of the target image using the aligned reference image,
the computing device outputs the high-resolution image by inputting, to the reconstruction layer, a value obtained by multiplying the aligned reference image by the mask, and the aligned reference image, and
the mask removes, from the input reference image, features that degrade the performance of generating the high-resolution image.

According to claim 7,
wherein the target image and the reference image are synthetic 360-degree images,
the neural network model includes an input-stage transition layer, and
the computing device inputs a target image and a reference image that are live-action 360-degree images to the input-stage transition layer to generate the target image and the reference image that are synthetic 360-degree images.

According to claim 7,
wherein the high-resolution image is a synthetic 360-degree image,
the neural network model includes an output-stage transition layer, and
the computing device converts the high-resolution image into a live-action 360-degree image by inputting it to the output-stage transition layer.

According to claim 7,
wherein the neural network model includes a disparity estimation layer, and
the computing device inputs the target image and the reference image to the disparity estimation layer to extract feature values at a plurality of hierarchical levels, and generates the disparity information by estimating the flow from matching information between the features of the target image and the features of the reference image extracted at the same level.

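According to claim 7,
wherein the neural network model includes a disparity estimation layer, and
the disparity estimation layer performs, on each of the target image and the reference image, a convolution operation as shown in the following equation to extract features in which the pixels clustered according to latitude are leveled.

(where s(i,j) is the convolution operation for position (i, j), s is the output feature value, w is the kernel, K sets the size of the kernel, a_i is the sampling interval of the i-th row, and H is the height of the frame)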

delete