KR20230052378A

KR20230052378A - Decoding method for multi-view image based on super-resolution and decoder

Info

Publication number: KR20230052378A
Application number: KR1020210135398A
Authority: KR
Inventors: 강제원
Original assignee: 이화여자대학교 산학협력단
Priority date: 2021-10-13
Filing date: 2021-10-13
Publication date: 2023-04-20

Abstract

A method of decoding multi-view video using super-resolution includes: receiving, by a decoder, a bitstream of a target image that was downsampled and encoded by an encoder; decoding, by the decoder, the low-resolution target image; and performing, by the decoder, super-resolution on the target image by inputting the target image and a reference image for the target image into a pre-trained neural network model.

Description

Multi-view video decoding method and decoder using super-resolution

The technology described below relates to coding multi-view video using super-resolution, and in particular to a method and a decoder for decoding multi-view video using super-resolution.

Multi-view video is video content that gives the user viewpoints in various directions by registering images captured by a plurality of cameras. With the emergence of virtual reality (VR) technology, demand for 360-degree multi-view video content is also increasing.

Korean Patent Registration No. 10-2141319

Multi-view video relies on images captured by multiple cameras. A natural 3D video service requires high-resolution multi-view images covering tens to hundreds of views. An efficient compression technique for multi-view video is therefore needed.

The technology described below provides a technique in which an encoder encodes multi-view images as low-resolution images for transmission, and a decoder applies super-resolution to the low-resolution images to provide high-resolution images.

A method of decoding multi-view video using super-resolution includes: receiving, by a decoder, a bitstream of a target image that was downsampled and encoded by an encoder; decoding, by the decoder, the low-resolution target image; and performing, by the decoder, super-resolution on the target image by inputting the target image and a reference image for the target image into a pre-trained neural network model.

A decoder that decodes multi-view video using super-resolution includes: an input device that receives a bitstream of a target image that was downsampled and encoded by an encoder; a storage device that stores a neural network model which receives a low-resolution image and a reference image and super-resolves the low-resolution image; and a computing device that decodes the low-resolution target image and performs super-resolution on it by inputting the target image and a reference image for the target image into the neural network model.

Because the encoder downsamples high-resolution images before encoding, the technology described below reduces the amount of transmitted data. In addition, the decoder can provide high-quality, high-resolution images by using a neural network model that performs super-resolution with the various reference images available in multi-view video.

FIG. 1 is a schematic example of a coding process for a multi-view image using super-resolution.
FIG. 2 is an example of a process of encoding and decoding a multi-view image using super-resolution.
FIG. 3 is an example of a neural network model that performs super-resolution.
FIG. 4 is an example of a decoder that reconstructs an image using super-resolution.

The technology described below may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the technology to particular embodiments, and it should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms, which serve only to distinguish one component from another. For example, without departing from the scope of the technology described below, a first component may be called a second component, and similarly a second component may be called a first component. The term "and/or" includes any combination of a plurality of related listed items, or any one of them.

As used in this specification, a singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "include" mean that the described features, numbers, steps, operations, components, parts, or combinations thereof are present, and do not exclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before the drawings are described in detail, it should be made clear that the components in this specification are distinguished only by the main function each one performs. That is, two or more components described below may be combined into one, or one component may be divided into two or more by more finely divided function. Each component may also perform some or all of the functions of other components in addition to its own main function, and some of the main functions of a component may be carried out entirely by another component.

Likewise, in performing a method or an operating method, the steps of the method may occur in an order different from the stated order unless the context clearly specifies a particular order. The steps may be performed in the stated order, substantially simultaneously, or in reverse order.

Multi-view video, or a multi-view image, means video content that provides various viewpoints through images of a plurality of different views. Multi-view video therefore includes 360-degree video, multi-view 360-degree video, and the like. It also includes multi-view 3D video that has a texture image and a depth image for each view.

A low-resolution image means an image obtained by downscaling, by a fixed factor, the image captured by the camera of a particular view, so that its resolution is below a reference value. Hereinafter a low-resolution image is called an LR (low resolution) image. A high-resolution image means the original image captured by the camera of that view, or an image whose resolution is at or above a reference value. Hereinafter a high-resolution image is called an HR (high resolution) image.

Super-resolution is a technique for converting a low-resolution image into a higher-resolution image. The technology described below compresses multi-view video into low-resolution images for delivery and reconstructs the images using super-resolution during decoding.

The technology described below performs super-resolution using an artificial neural network model. Neural network models include recurrent neural networks (RNNs), feedforward neural networks (FFNNs), convolutional neural networks (CNNs), and others. The description below focuses on a CNN-based model.

First, multi-view video coding in general is described. Multi-view video includes images from different views, so it can be encoded using various video sources in both time and space.

Each view of a multi-view video is either an independent view or a dependent view. An independent view is encoded using only its own information, without information from adjacent views; that is, it is compressed by temporal predictive coding. A dependent view, by contrast, may be encoded using information from an independent view.

The image of each view may have image information (texture) and depth information, which share the same view index. For an independent view, the image information is always encoded before the associated depth information. For a dependent view, the image information may be encoded before or after the associated depth information with the same view index. Depending on the encoding order, inter-layer predictive coding may be performed between the image information and the depth information. For example, if the image information is encoded first, the depth information may be encoded with reference to the information used to encode the associated image information.

Independent and dependent views are encoded by inter-view and temporal prediction using a hierarchical B structure. The first frame of an independent view is compressed as an I frame, and temporal compression is applied to the frames that follow. The first frame of a dependent view is a P or B frame that references the I frame. First frames of dependent views are encoded with inter-view prediction or intra-picture compression, without temporal prediction. Later frames of a dependent view may use both temporal and inter-view prediction.
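The frame-type rules in the preceding paragraph can be summarized in a short sketch. The names `Prediction` and `allowed_predictions` are hypothetical and only restate the text; they are not part of any codec API, and a real encoder would apply further constraints from the hierarchical B structure.

```python
from __future__ import annotations

from enum import Enum


class Prediction(Enum):
    INTRA = "intra"            # intra-picture coding, no reference frame
    INTER_TIME = "inter-time"  # temporal prediction from an earlier frame
    INTER_VIEW = "inter-view"  # prediction from another view at the same time


def allowed_predictions(is_independent_view: bool, frame_index: int) -> set[Prediction]:
    """Return the prediction tools the text permits for one frame.

    frame_index 0 is the first frame of the view (hypothetical indexing).
    """
    if is_independent_view:
        if frame_index == 0:
            # First frame of an independent view is an I frame.
            return {Prediction.INTRA}
        # Later frames of an independent view use temporal compression.
        return {Prediction.INTER_TIME}
    if frame_index == 0:
        # First frame of a dependent view: inter-view or intra, no temporal.
        return {Prediction.INTER_VIEW, Prediction.INTRA}
    # Later frames of a dependent view may use both kinds of prediction.
    return {Prediction.INTER_TIME, Prediction.INTER_VIEW}
```

For example, `allowed_predictions(False, 0)` excludes temporal prediction, which matches the statement that the first frames of dependent views are coded without it.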

The process of encoding and decoding multi-view video using super-resolution is described below. The image processing device that encodes is the encoder, and the image processing device that decodes is the decoder. Hereinafter, "image processing device" covers an encoder and/or a decoder. An image processing device is a computing device capable of processing image data, for example a PC, a smart device, a server on a network, a TV set-top box, or a VR device.

FIG. 1 is a schematic example of a coding process for a multi-view image using super-resolution. FIG. 1 illustrates the coding concept on a two-dimensional plane with a time axis and a view axis. It shows time t and time t-1 as example times, and view 1 and view 2 as example views. FIG. 1 assumes that the video frame at view 1 (F_{1,t}) and the video frame at time t-1 (F_{2,t-1}) have both been encoded and decoded. The frame to be encoded and decoded in FIG. 1 is the video frame F_{2,t}.

When encoding the video frame F_{2,t} at time t and view 2, the encoder downsamples the original image and encodes it as a low-resolution image F'_{2,t}.

The decoder performs super-resolution to restore the downsampled video frame F'_{2,t} to its original resolution. In doing so, the decoder may use a reference-image-based super-resolution scheme that also uses the received video frames F_{1,t} and F_{2,t-1}.

FIG. 2 is an example of a process 100 of encoding and decoding a multi-view image using super-resolution. As in FIG. 1, the description focuses on encoding and decoding the video frame F_{2,t}. FIG. 2 explains the operation of the encoder and the decoder together. The encoder may also include a component that decodes previous frames, for motion compensation and the like during encoding.

It is assumed that the encoder encoded the first video frame F_{1,t} at the original resolution and that the decoder decoded it at the original resolution. It is also assumed that encoding at time t-1 has been completed.

In this process the encoder receives the video frame F_{1,t} (210) and downsamples F_{1,t} to generate F'_{1,t} (120). The encoder may store the downsampled video frame F'_{1,t} in a decoded picture buffer (DPB). The decoder may likewise downsample the decoded image and store it in its DPB.

The encoder receives the video frame F_{2,t} (130).

The encoder downsamples F_{2,t} to generate the frame F'_{2,t} (140). That is, the encoder downsamples the video frame F_{2,t} of the additional view (view 2) and encodes it at low resolution. At this point the DPB holds F'_{1,t} of view 1 and F'_{2,t-1} of time t-1, both of which have the same resolution as F'_{2,t}.
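As a minimal sketch of the downsampling step and the DPB state described above, the following assumes a 2x2 averaging filter and a dictionary keyed by (view, time). Both are illustrative assumptions: the text does not specify the downsampling filter, the scale factor, or how the DPB is organized.

```python
def downsample_2x(frame):
    """Halve width and height by averaging each 2x2 block (assumed filter)."""
    height, width = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


# Hypothetical DPB keyed by (view, time); time t is written as 1 and t-1 as 0.
dpb = {}

# Toy 4x4 luma frame standing in for F_{1,t}.
f_1_t = [
    [10, 12, 14, 16],
    [10, 12, 14, 16],
    [20, 22, 24, 26],
    [20, 22, 24, 26],
]

# Store F'_{1,t} so it has the same resolution as the later F'_{2,t}.
dpb[(1, 1)] = downsample_2x(f_1_t)
```

Keeping the stored references at the low resolution is what lets the later prediction step compare F'_{2,t} with F'_{1,t} and F'_{2,t-1} sample by sample.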

Video frame encoding uses various schemes that exploit previously encoded results. Motion compensated prediction (MCP) encodes the frame at the current time using a frame from an earlier time; that is, MCP transmits only the residual relative to the motion that appears over the time difference. Disparity compensated prediction (DCP) uses a block from an adjacent view, different from the view of the current frame, as the prediction block and transmits only the residual.

Accordingly, the encoder may encode F'_{2,t} by DCP and/or MCP using F'_{1,t} and F'_{2,t-1}, respectively. To encode F'_{2,t}, the encoder compares F'_{2,t} with F'_{1,t} to predict the residual due to the view difference (DCP), or compares F'_{2,t} with F'_{2,t-1} to predict the residual due to the time difference (MCP) (150). The encoder may perform DCP or MCP alone or, in some cases, perform both and determine only the residual that does not overlap between the two.
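The idea of transmitting only a residual against a temporal or inter-view reference can be illustrated with a simplified sketch. It omits block-based motion and disparity search entirely and uses the reference frame directly as the prediction; the selection rule (smaller sum of absolute differences) and all function names are assumptions made for illustration, not the method claimed in the text.

```python
def residual(target, reference):
    """Per-sample difference between the target frame and its prediction."""
    return [
        [t - r for t, r in zip(target_row, reference_row)]
        for target_row, reference_row in zip(target, reference)
    ]


def sad(block):
    """Sum of absolute values, used here as a simple residual cost."""
    return sum(abs(value) for row in block for value in row)


def choose_prediction(target, inter_view_ref, temporal_ref):
    """Pick DCP or MCP, whichever leaves the cheaper residual (assumed rule)."""
    dcp = residual(target, inter_view_ref)  # reference plays the role of F'_{1,t}
    mcp = residual(target, temporal_ref)    # reference plays the role of F'_{2,t-1}
    return ("DCP", dcp) if sad(dcp) <= sad(mcp) else ("MCP", mcp)
```

The returned mode label corresponds to the side information the decoder would need in order to know which reference to use when it reverses the prediction.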

The rest of the encoding process is the same as in ordinary video coding. The encoder applies a discrete cosine transform (DCT) and quantization to the residual of F'_{2,t} (160). The encoder may then apply entropy coding to the quantized residual in the frequency domain (170) and generates the final bitstream.
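To make the transform and quantization step concrete, here is a minimal one-dimensional sketch using an orthonormal DCT-II and a uniform quantizer. Real codecs use two-dimensional integer transforms and rate-controlled quantization, so the function names, the 1D simplification, and the step size are illustrative assumptions.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a list of residual samples."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        total = sum(
            samples[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        )
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * total)
    return coeffs


def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = coeffs[0] * math.sqrt(1 / n)
        total += sum(
            coeffs[k] * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        samples.append(total)
    return samples


def quantize(coeffs, step):
    """Uniform quantization: map each coefficient to an integer level."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Approximate the coefficients from the integer levels."""
    return [level * step for level in levels]
```

A flat residual such as `[4, 4, 4, 4]` concentrates all of its energy in the first coefficient, which is why the quantized levels that reach the entropy coder are mostly zeros.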

The decoder receives the bitstream generated by the encoder (180). That is, the decoder receives the low-resolution image data (F'_{2,t}) and decodes the image. FIG. 2 does not show the decoder operation in detail. In brief, the decoder receives the residual data of F'_{2,t} and decodes F'_{2,t} by referring to previously decoded data using DCP and/or MCP, matching the scheme used for encoding.

The decoder then upsamples F'_{2,t} by super-resolution to reconstruct F_{2,t} (190). During decoding, as during encoding, the decoder's DPB holds F'_{1,t} and F'_{2,t-1}, because the image must be reconstructed from that information according to the residual prediction scheme. The decoder reconstructs F_{2,t} using an image in the DPB as the reference image.

A multi-view image may also be in a format that has a texture image and a depth image for each frame. In that case the prediction used during coding is texture-to-depth prediction rather than DCP. That is, the encoder may perform predictive coding between the texture image and the depth image according to the encoding order. For example, if the texture image is encoded first, the depth image may be encoded with reference to the information used to encode the corresponding texture image. Conversely, if the depth image is encoded first, the texture image may be encoded with reference to the information used to encode the corresponding depth image. Thus, unlike FIG. 2, the image processing device (encoder and decoder) may code the current depth image or texture image using the texture image or depth image that was coded first.

In FIG. 2, the decoder upsampled the image of a particular frame by super-resolution using a reference image. The super-resolution process is described below. The decoder may perform super-resolution using a neural network model.

FIG. 3 is an example of a neural network model 200 that performs super-resolution. The neural network model 200 of FIG. 3 receives a low-resolution image of a particular view at a particular time and outputs a high-resolution image. The low-resolution image to be super-resolved is called the target low-resolution image. The neural network model 200 super-resolves the target low-resolution image using a reference image. The inputs to the network are therefore the target low-resolution image E^{LR} and the reference image E^{Ref}. The output is the high-resolution image E^{SR}, obtained by upsampling E^{LR}.

The first convolution layer 211 receives E^{LR} and outputs the feature t^{LR}. The second convolution layer 212 receives E^{Ref} and outputs the feature t^{Ref}.

The disparity estimator 220 extracts the correlation between E^{LR} and E^{Ref}. The correlation includes information that aligns related features among those extracted from each image. The disparity estimator 220 is the layer that estimates the correlation between the two images.

The disparity estimator 220 computes the correlation D_{r→l} and passes the features of the reference image along with the low-resolution image, without flow supervision.

The disparity estimator 220 includes a component that extracts a first feature from t^{LR} of the target low-resolution image, a component that extracts a second feature from t^{Ref} of the reference image, and a component that estimates a flow from the correlation between the first feature and the second feature.

In brief, the first encoder 221 receives t^{LR} and extracts the basic feature f^{LR}, and the first position-aware (latitude-aware) convolution layer 222 reduces the disparity difference between the images on f^{LR} and outputs s^{LR}. Likewise, the second encoder 223 receives t^{Ref} and extracts the basic feature f^{Ref}, and the second position-aware convolution layer 224 reduces the disparity difference between the images on f^{Ref} and outputs s^{Ref}. The double-headed arrows between the first encoder 221 and the second encoder 223, and between the first position-aware convolution layer 222 and the second position-aware convolution layer 224, indicate that the paired components share kernel parameters.

The image processing device performs a correlation operation 225 on s^{LR} and s^{Ref}, and the flow estimator 226 receives the result of the correlation operation and finally outputs the correlation D_{r→l} between E^{LR} and E^{Ref}. The image processing device warps E^{Ref} using D_{r→l} to obtain a warped reference image (240). The warped reference image corresponds to the result of aligning the reference image to the target image.
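The warping step can be sketched as a backward warp that, for each target position, fetches the reference sample indicated by the estimated displacement. This sketch assumes integer, horizontal-only displacements and clamps at the image border; the form of D_{r→l} in the actual model (for example a dense two-dimensional flow with interpolated sampling) is not stated in the text, so `warp_horizontal` is purely illustrative.

```python
def warp_horizontal(reference, disparity):
    """Backward-warp `reference` toward the target view.

    disparity[y][x] is an integer horizontal offset (assumed form of the
    correlation output): the target sample at (y, x) is taken from the
    reference at (y, x + disparity[y][x]).
    """
    height, width = len(reference), len(reference[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            src_x = x + disparity[y][x]
            src_x = min(max(src_x, 0), width - 1)  # clamp to the image border
            row.append(reference[y][src_x])
        warped.append(row)
    return warped
```

With a constant offset of 1, the row `[1, 2, 3, 4]` becomes `[2, 3, 4, 4]`: every sample is pulled from one position to the right, and the last sample is repeated because of the border clamp.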

If the warped reference image has features that are detrimental with respect to the target (performance-degrading features), super-resolution performance suffers. The neural network model 200 may further include a component for removing such features. The mask generator 230 receives t^{LR} and D_{r→l} and generates a mask M for removing the performance-degrading features contained in the warped reference image.

The image processing device may multiply M and the warped reference image elementwise (elementwise multiplication) to prepare the final feature data for super-resolution.
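The elementwise masking can be illustrated as follows. The sketch assumes that the mask and the warped reference have the same shape and that mask values lie between 0 and 1, where 0 suppresses a position entirely; the text does not give the value range of M, so that range is an assumption.

```python
def apply_mask(mask, warped):
    """Elementwise product of mask M and the warped reference."""
    return [
        [m * value for m, value in zip(mask_row, warped_row)]
        for mask_row, warped_row in zip(mask, warped)
    ]
```

For instance, a mask row of `[1.0, 0.0]` keeps the first warped sample unchanged and removes the second, which is how unreliable reference positions would be kept out of the reconstruction input.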

The reconstruction layer 250 receives (i) the low-resolution target image E^{LR} (or t^{LR}) and (ii) the warped reference image, or the product of M and the warped reference image, and reconstructs the image to perform super-resolution. The reconstruction layer 250 may be composed of residual blocks. A residual block lets the input features either pass through convolution or bypass it on the way to the output, depending on what training finds effective. The reconstruction layer 250 outputs E^{SR}, the super-resolved version of E^{LR}.
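The bypass behaviour of a residual block can be shown with a small one-dimensional sketch in which the output is the input plus a convolutional branch. The layer layout (two convolutions with a ReLU between them), the shared kernel, and the zero padding are assumptions chosen for brevity; the text does not describe the internal structure of the blocks in the reconstruction layer 250.

```python
def conv1d_same(signal, kernel):
    """1D convolution (cross-correlation form) with zero padding, same length out."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(weight * padded[i + j] for j, weight in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(signal):
    """Rectified linear unit applied per sample."""
    return [max(0.0, value) for value in signal]


def residual_block(signal, kernel):
    """Compute y = x + conv(relu(conv(x))); the skip path carries x unchanged."""
    branch = conv1d_same(relu(conv1d_same(signal, kernel)), kernel)
    return [x + b for x, b in zip(signal, branch)]
```

When the kernel weights are all zero the branch contributes nothing and the block returns its input unchanged, which is the sense in which the convolution can effectively be skipped.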

In FIG. 2, the decoder received the residual of F'_{2,t} produced by MCP and reconstructed F_{2,t} using F'_{2,t-1}. In this case the neural network model 200 in the decoder receives the target low-resolution image F'_{2,t} and the reference image F'_{2,t-1} and produces F_{2,t}. Alternatively, the decoder receives the residual of F'_{2,t} produced by DCP and reconstructs F_{2,t} using F'_{1,t}. In this case the neural network model 200 in the decoder receives the target low-resolution image F'_{2,t} and the reference image F'_{1,t} and produces F_{2,t}. The encoder may use MCP and DCP selectively or in combination; in that case the decoder may additionally receive information about the residual generation scheme used by the encoder and select the reference image accordingly.

At the time a particular target low-resolution image is decoded, the decoder's DPB holds data for previously decoded frames (a reference image set). The decoder may therefore select, by a given criterion, the reference image for the target low-resolution image from the reference image set. The decoder may select the reference image by any one of the following criteria, explained here for the case of selecting a reference image for F_{2,t}.

① The decoder may select, from the reference image set stored in the DPB, the frame closest to F_{2,t} in view order. For example, although the description above uses F'_{1,t}, the decoder may instead select F'_{3,t} for super-resolution of F'_{2,t}.

② The decoder may select, from the reference image set stored in the DPB, the frame closest to F_{2,t} in time order.

③ The decoder may select an I frame from the reference image set stored in the DPB. For example, an I frame may be selected for super-resolution of an independent-view image.

④ The decoder may select, from the reference image set stored in the DPB, the frame with the lowest temporal layer ID (temporal ID). The temporal layer ID is information that HEVC introduced to support temporal scalability.

⑤ The decoder may select, from the reference image set stored in the DPB, the frame that uses the lowest quantization parameter.
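The five selection criteria above can be sketched as a single function over a list of DPB entries. The `DpbEntry` fields, the criterion names, and the choice to restrict criterion ① to frames at the same time instant and criterion ② to frames of the same view are all illustrative assumptions; the text only names the criteria and does not define a data structure or tie-breaking rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DpbEntry:
    view: int         # view index
    time: int         # time index
    frame_type: str   # "I", "P" or "B"
    temporal_id: int  # temporal layer ID
    qp: int           # quantization parameter used for the frame


def select_reference(dpb, target_view, target_time, criterion):
    """Pick one reference frame from the DPB by the named criterion."""
    if criterion == "nearest_view":
        # (1) Closest in view order, assumed to be at the same time instant.
        candidates = [e for e in dpb if e.time == target_time and e.view != target_view]
        return min(candidates, key=lambda e: abs(e.view - target_view))
    if criterion == "nearest_time":
        # (2) Closest in time order, assumed to be within the same view.
        candidates = [e for e in dpb if e.view == target_view and e.time != target_time]
        return min(candidates, key=lambda e: abs(e.time - target_time))
    if criterion == "i_frame":
        # (3) Any I frame in the reference image set (first one found).
        return next(e for e in dpb if e.frame_type == "I")
    if criterion == "lowest_temporal_id":
        # (4) Frame with the lowest temporal layer ID.
        return min(dpb, key=lambda e: e.temporal_id)
    if criterion == "lowest_qp":
        # (5) Frame coded with the lowest quantization parameter.
        return min(dpb, key=lambda e: e.qp)
    raise ValueError(f"unknown criterion: {criterion}")
```

Criteria ④ and ⑤ both favour frames that were coded at higher quality, which tend to carry more detail for the network to transfer to the target.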

FIG. 4 is an example of a decoder 400 that reconstructs an image using super-resolution. The decoder 400 may take the form of a VR device, a PC, a smart device, a network server, or the like.

The decoder 400 may include a storage device 410, a memory 420, an arithmetic device 430, an interface device 440, and a communication device 450.

The storage device 410 may store programs or code for the operation of the decoder 400.

The storage device 410 may store the aforementioned neural network model 200.

The storage device 410 may also store programs or code for training the neural network model 200 or 300.

The storage device 410 may store the decoded low-resolution image.

The storage device 410 may store previously decoded low-resolution images. The storage device 410 may store low-resolution images that serve as reference images for super-resolution.

The storage device 410 may store the high-resolution image generated by the neural network model.

The memory 420 may temporarily store data and information generated while the decoder 400 operates.

The interface device 440 is a device that receives certain commands and data from the outside. The interface device 440 may receive certain information from a physically connected input device or from a physical interface (a keypad, a touch panel, etc.).

The interface device 440 may receive a neural network model, information for training the neural network model, training data, and the like. The interface device 440 may also receive parameter values for updating the neural network model.

The interface device 440 may receive the bit stream generated by the encoder.

The interface device 440 may receive a reference image needed to super-resolve the target low-resolution image.

The communication device 450 transmits and receives certain information over a wireless network. The communication device 450 may receive a neural network model, information for training the neural network model, training data, and the like. The communication device 450 may receive parameter values for updating the neural network model.

The communication device 450 may receive the bit stream generated by the encoder.

The communication device 450 may receive a reference image needed to super-resolve the target low-resolution image.

The communication device 450 may transmit the high-resolution image generated by the neural network model to an external object.

The interface device 440 and the communication device 450 may receive certain information and data from a user or an external object. With respect to receiving information, the interface device 440 and the communication device 450 may be collectively referred to as an input device.

The arithmetic device 430 controls the operation of the decoder 400 using the programs or code stored in the storage device 410.

The arithmetic device 430 may extract the residual of the target low-resolution image from the bit stream. The arithmetic device 430 may decode the target low-resolution image using an image of an adjacent viewpoint or an image from a previous time, in a manner corresponding to the prediction method of the encoder. Furthermore, when the target low-resolution image is a texture image, the arithmetic device 430 may decode it using a depth image. When the target low-resolution image is a depth image, the arithmetic device 430 may decode it using a texture image.

The arithmetic device 430 upsamples the target low-resolution image by inputting the decoded target low-resolution image and a reference image to the neural network model. The arithmetic device 430 may select the reference image according to the prediction method of the encoder.

Furthermore, the arithmetic device 430 may use information related to the target low-resolution image to select one reference image from the reference image set. For example, the arithmetic device 430 may select, from the reference image set stored in the storage device, the reference image closest to the target low-resolution image in viewpoint. Alternatively, the arithmetic device 430 may select the reference image closest to the target low-resolution image in time order. Alternatively, the arithmetic device 430 may select an I frame from the reference image set. Alternatively, the arithmetic device 430 may select the reference image with the lowest temporal layer ID (temporal ID). Alternatively, the arithmetic device 430 may select the reference image that uses the lowest quantization parameter.
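The flow handled by the arithmetic device 430 (decode the low-resolution target, choose a reference image, run the model) can be sketched as follows. This is only an outline under stated assumptions: `reconstruct`, its callback parameters, and the `Image` type are hypothetical names, and `nearest_neighbor_model` is a trivial stand-in that repeats pixels. It is not the trained neural network described in this specification and it ignores the reference image entirely.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale image as rows of pixel values (assumed type)


def nearest_neighbor_model(target: Image, reference: Image, scale: int = 2) -> Image:
    """Stand-in for the trained network: upsample `target` by pixel repetition.

    The real model would also use `reference`; this placeholder ignores it.
    """
    out: Image = []
    for row in target:
        wide = [p for p in row for _ in range(scale)]   # repeat each pixel horizontally
        out.extend(list(wide) for _ in range(scale))    # repeat each row vertically
    return out


def reconstruct(
    decode: Callable[[bytes], Image],
    choose_reference: Callable[[Image], Image],
    model: Callable[[Image, Image], Image],
    bitstream: bytes,
) -> Image:
    """Run the three steps in order and return the high-resolution image."""
    low_res = decode(bitstream)            # step 1: decode the low-resolution target
    reference = choose_reference(low_res)  # step 2: pick a reference image
    return model(low_res, reference)       # step 3: super-resolve with the model
```

Passing the decoding, reference selection, and model steps in as callables keeps the outline independent of any particular codec or network, which is a design choice of this sketch rather than something the specification prescribes.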

The arithmetic device 430 may be a device that processes data and performs certain operations, such as a processor, an AP, or a chip with an embedded program.

In addition, the multi-view video coding method and the multi-view image decoding method using super-resolution described above may be implemented as a program (or application) containing an algorithm executable on a computer. The program may be stored and provided in a transitory or non-transitory computer-readable medium.

A non-transitory readable medium is not a medium that stores data for a short moment, such as a register, a cache, or a memory, but a medium that stores data semi-permanently and can be read by a device. Specifically, the various applications or programs described above may be stored and provided in a non-transitory readable medium such as a CD, a DVD, a hard disk, a Blu-ray disc, a USB device, a memory card, ROM (read-only memory), PROM (programmable read-only memory), EPROM (erasable PROM), EEPROM (electrically erasable PROM), or flash memory.

A transitory readable medium refers to various types of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synclink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).

The present embodiment and the drawings attached to this specification merely illustrate part of the technical idea included in the technology described above. It is evident that all modifications and specific embodiments that those skilled in the art can readily infer within the scope of the technical idea included in the specification and drawings fall within the scope of rights of the technology described above.

Claims

A method of decoding a multi-view video using super-resolution, the method comprising:
receiving, by a decoder, a bit stream of a target image that has been down-sampled and encoded by an encoder;
decoding, by the decoder, the low-resolution target image; and
performing, by the decoder, super-resolution on the target image by inputting the target image and a reference image for the target image to a pre-trained neural network model,
wherein the target image is an image of a specific viewpoint of the multi-view video at time t.

The method of claim 1,
wherein the reference image is determined according to the prediction technique used when the target image was encoded by the encoder.

The method of claim 1,
wherein the reference image is, from the reference image set previously decoded by the decoder, any one of (i) the image closest to the target image in viewpoint, (ii) the image closest to the target image in time, (iii) an I-frame image, (iv) the image having the lowest temporal ID, and (v) the image using the lowest quantization parameter.

The method of claim 1,
wherein the neural network model comprises:
a first convolution layer that receives the target image and extracts a first feature;
a second convolution layer that receives the reference image and extracts a second feature;
a disparity estimation layer that receives the first feature and the second feature and outputs a correlation between the target image and the reference image; and
a reconstruction layer that receives the target image and an image obtained by warping the reference image, based on the correlation, so as to be aligned with the target image, and reconstructs the target image.

A decoder for decoding a multi-view video using super-resolution, the decoder comprising:
an input device that receives a bit stream of a target image that has been down-sampled and encoded by an encoder;
a storage device that stores a neural network model that receives a low-resolution image and a reference image and super-resolves the low-resolution image; and
an arithmetic device that decodes the low-resolution target image and performs super-resolution on the target image by inputting the target image and a reference image for the target image to the neural network model,
wherein the target image is an image of a specific viewpoint of the multi-view video at time t.

The decoder of claim 5,
wherein the reference image is determined according to the prediction technique used when the target image was encoded by the encoder.

The decoder of claim 5,
wherein the reference image is, from the reference image set previously decoded by the decoder, any one of (i) the image closest to the target image in viewpoint, (ii) the image closest to the target image in time, (iii) an I-frame image, (iv) the image having the lowest temporal ID, and (v) the image using the lowest quantization parameter.

The decoder of claim 5,
wherein the neural network model comprises:
a first convolution layer that receives the target image and extracts a first feature;
a second convolution layer that receives the reference image and extracts a second feature;
a disparity estimation layer that receives the first feature and the second feature and outputs a correlation between the target image and the reference image; and
a reconstruction layer that receives the target image and an image obtained by warping the reference image, based on the correlation, so as to be aligned with the target image, and reconstructs the target image.