KR20200143287A

KR20200143287A - Method and apparatus for encoding/decoding image and recording medium for storing bitstream

Info

Publication number: KR20200143287A
Application number: KR1020200071443A
Authority: KR
Inventors: 이광순; 신홍창; 윤국진; 정준영
Original assignee: 한국전자통신연구원
Priority date: 2019-06-14
Filing date: 2020-06-12
Publication date: 2020-12-23

Abstract

Disclosed are an image encoding/decoding method, an apparatus therefor, and a recording medium storing a bitstream. A method of decoding a multi-view image including a basic view image and at least one additional view image according to an embodiment of the present invention comprises the steps of: obtaining a bitstream including basic view image encoding information on the basic view image and residual additional view image encoding information on a plurality of residual additional view images; decoding the basic view image and the plurality of residual additional view images based on the bitstream; and reconstructing the at least one additional view image from the plurality of residual additional view images, based on the basic view image encoding information, the residual additional view image encoding information, and the basic view image. The residual additional view image encoding information includes packing information of a patch, and the packing information includes information on the importance of an image region belonging to the additional view image.

Description

Image encoding/decoding method and apparatus, and recording medium storing a bitstream {METHOD AND APPARATUS FOR ENCODING/DECODING IMAGE AND RECORDING MEDIUM FOR STORING BITSTREAM}

The present invention relates to an image encoding/decoding method and apparatus, and a recording medium storing a bitstream. Specifically, the present invention relates to an image encoding/decoding method and apparatus, and a recording medium storing a bitstream, for providing omnidirectional video that can support motion parallax in response to a viewer's left/right and up/down translation as well as left/right and up/down rotation.

Virtual reality (VR) services generate 360-degree omnidirectional video (also called omnidirectional video or 360-degree video) in live-action or computer graphics (CG) form and play it on personal VR terminals (e.g., a head mounted display (HMD) or a smartphone), and they are evolving to maximize immersion and realism.

Studies to date indicate that six degrees of freedom (6DoF) must be reproduced in order to play natural, highly immersive 360-degree video through an HMD. That is, the HMD must play the image the viewer is looking at for viewer movement in six directions, including (1) left/right translation, (2) up/down rotation, (3) up/down translation, and (4) left/right rotation. Omnidirectional video to date that plays live-action images captured by cameras is 3DoF: it detects movement mainly in up/down rotation and left/right rotation among the above directions and plays the image accordingly, and it cannot provide the image corresponding to the viewer's left/right and up/down translation.

The Moving Picture Experts Group (MPEG) standardization group defines media for maximizing immersion as immersive media, and is developing, in stages, standards for the efficient encoding and transmission of the immersive video this requires. Specifically, as the next steps after 3DoF, the most basic immersive video, standardization is planned to proceed in stages through 3DoF+, immersive video that can reproduce motion parallax for a seated viewer; Omnidirectional 6DoF, which provides motion parallax corresponding to a few steps of viewer movement; and 6DoF, which provides full motion parallax following the viewer's free movement. When the immersive video uses omnidirectional video from multiple viewpoints (e.g., the equirectangular projection (ERP) format or the cubemap format), Windowed-6DoF may be similar to conventional multi-view video technology with horizontal/vertical parallax. Here, Windowed-6DoF is a technology that provides motion parallax through a single viewing window by using planar video (e.g., HD or UHD) from multiple viewpoints.

An object of the present invention is to provide an image encoding/decoding method and apparatus for supporting motion parallax, and a recording medium storing a bitstream.

Another object of the present invention is to provide an immersive video formatting method and apparatus for playing natural omnidirectional video through a VR terminal.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

A method of encoding a multi-view image including a basic view image and at least one additional view image according to an embodiment of the present invention includes: generating at least one patch by pruning the at least one additional view image based on the basic view image; packing the at least one patch; generating a plurality of residual additional view images based on the packed at least one patch; and outputting a bitstream including residual additional view image encoding information on the plurality of residual additional view images, wherein the packing of the at least one patch may be performed based on the importance of an image region belonging to the additional view image.

In the multi-view image encoding method, the packing of the at least one patch may include adjusting the size of the patch according to the importance of an image region belonging to the additional view image.
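The importance-dependent size adjustment can be pictured with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the specification: the function `scale_patch`, the importance range of 0 to 1, the `threshold` of 0.5, and the subsampling factor `low_factor`. The sketch keeps an important patch at full resolution and subsamples a less important one before packing.

```python
def scale_patch(patch, importance, threshold=0.5, low_factor=2):
    """Return a copy of a 2-D patch, subsampled when its importance is low.

    patch       -- list of rows, each a list of sample values
    importance  -- hypothetical score in [0, 1] for the source image region
    threshold   -- hypothetical cut-off below which the patch is shrunk
    low_factor  -- hypothetical subsampling step applied to both axes
    """
    if importance >= threshold:
        # Important region: keep every sample.
        return [row[:] for row in patch]
    # Less important region: keep every low_factor-th row and column.
    return [row[::low_factor] for row in patch[::low_factor]]


patch = [[r * 4 + c for c in range(4)] for r in range(4)]
small = scale_patch(patch, importance=0.2)
full = scale_patch(patch, importance=0.9)
```

A decoder would need the scale factor signalled as part of the packing information in order to restore the patch to its original size; how that is signalled is outside this sketch.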

In the multi-view image encoding method, the packing of the at least one patch may include rotating the patch according to the importance of an image region belonging to the additional view image.

In the multi-view image encoding method, the importance of an image region belonging to the additional view image may be determined based on at least one of a depth value of the image region, the position of the camera used to acquire the image region, whether the image region is a region of interest, and the complexity of the image region.
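One way to combine the four listed factors into a single importance value is a weighted sum. The sketch below is only an illustration under stated assumptions: the `Region` fields, the normalization of each factor to the range 0 to 1, the weights, and the direction of each term (nearer regions and cameras closer to the reference position counting as more important) are all hypothetical, since the text only names the factors.

```python
from dataclasses import dataclass


@dataclass
class Region:
    depth: float          # assumed normalized: 0 = nearest, 1 = farthest
    camera_offset: float  # assumed normalized distance of the capturing camera from the reference position
    is_roi: bool          # whether the region is a region of interest
    complexity: float     # assumed normalized texture complexity in [0, 1]


def importance(region, w_depth=0.4, w_cam=0.2, w_roi=0.2, w_cx=0.2):
    """Hypothetical weighted-sum importance score in [0, 1]."""
    return (w_depth * (1.0 - region.depth)
            + w_cam * (1.0 - region.camera_offset)
            + w_roi * (1.0 if region.is_roi else 0.0)
            + w_cx * region.complexity)


near_roi = Region(depth=0.0, camera_offset=0.0, is_roi=True, complexity=1.0)
far_flat = Region(depth=1.0, camera_offset=1.0, is_roi=False, complexity=0.0)
```

Because the method requires only "at least one of" the factors, setting the other weights to zero reduces the score to a single criterion.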

In the multi-view image encoding method, the packing of the at least one patch may include setting a guard band at a boundary portion of the at least one patch.

In the multi-view image encoding method, the setting of the guard band may include setting the guard band according to the importance of an image region belonging to the additional view image.

In the multi-view image encoding method, the setting of the guard band may include setting the guard band by copying sample values adjacent to a boundary region of the at least one patch.

In the multi-view image encoding method, the setting of the guard band may include setting the guard band by interpolating a plurality of samples included in a boundary region of the at least one patch.
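The two guard-band options described above, copying boundary samples and interpolating boundary samples, can be sketched as follows. The function names, the list-of-rows patch representation, and the reading of "interpolation" as a linear ramp between the facing boundary samples of two neighbouring patches are assumptions made for illustration only.

```python
def add_guard_band_copy(patch, width):
    """Surround a 2-D patch with `width` samples copied from its nearest edge."""
    # Extend every row to the left and right with its first and last sample.
    rows = [[row[0]] * width + list(row) + [row[-1]] * width for row in patch]
    # Extend upward and downward by repeating the first and last extended rows.
    top = [rows[0][:] for _ in range(width)]
    bottom = [rows[-1][:] for _ in range(width)]
    return top + rows + bottom


def interpolate_gap(left_sample, right_sample, width):
    """Fill a gap of `width` samples with a linear ramp between two boundary samples."""
    step = (right_sample - left_sample) / (width + 1)
    return [left_sample + step * (i + 1) for i in range(width)]


padded = add_guard_band_copy([[1, 2], [3, 4]], width=1)
ramp = interpolate_gap(0, 4, width=3)
```

A wider `width` could be chosen for patches from more important regions, which is one possible reading of setting the guard band according to importance.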

In the multi-view image encoding method, the packing of the at least one patch may include determining the similarity of a plurality of image regions belonging to the additional view image, and packing the at least one patch into a single patch based on the similarity determination result.
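Packing similar regions into a single patch can be sketched as a greedy grouping. The similarity measure (mean absolute difference), the `threshold`, and the function names below are hypothetical choices; the text states only that similarity is determined and that similar regions may be packed as one patch.

```python
def mean_abs_diff(a, b):
    """Mean absolute sample difference between two equally sized 2-D regions."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    return sum(abs(x - y) for x, y in zip(flat_a, flat_b)) / len(flat_a)


def merge_similar(regions, threshold=2.0):
    """Greedily map each region to an earlier, similar representative region.

    Returns (representatives, mapping): the indices of the regions that are
    actually packed, and for every region the index of the patch standing in for it.
    """
    representatives = []
    mapping = []
    for i, region in enumerate(regions):
        for rep in representatives:
            other = regions[rep]
            same_shape = (len(region) == len(other)
                          and len(region[0]) == len(other[0]))
            if same_shape and mean_abs_diff(region, other) <= threshold:
                mapping.append(rep)
                break
        else:
            # No similar representative found: pack this region as its own patch.
            representatives.append(i)
            mapping.append(i)
    return representatives, mapping


regions = [
    [[10, 10], [10, 10]],
    [[11, 10], [10, 9]],
    [[50, 50], [50, 50]],
]
representatives, mapping = merge_similar(regions)
```

The `mapping` list plays the role of the similarity information that the decoder side would need in order to reuse one packed patch for several image regions.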

In the multi-view image encoding method, the bitstream may further include basic view image encoding information on the basic view image.

A method of decoding a multi-view image including a basic view image and at least one additional view image according to an embodiment of the present invention includes: obtaining a bitstream including basic view image encoding information on the basic view image and residual additional view image encoding information on a plurality of residual additional view images; decoding the basic view image and the plurality of residual additional view images based on the bitstream; and reconstructing the at least one additional view image from the plurality of residual additional view images based on the basic view image encoding information, the residual additional view image encoding information, and the basic view image, wherein the residual additional view image encoding information may include packing information of a patch, and the packing information may include information on the importance of an image region belonging to the additional view image.

In the multi-view image decoding method, the size of the patch may be determined based on the importance of an image region belonging to the additional view image.

In the multi-view image decoding method, the importance of an image region belonging to the additional view image may be determined based on at least one of a depth value of the image region, the position of the camera used to acquire the image region, whether the image region is a region of interest, and the complexity of the image region.

In the multi-view image decoding method, the packing information may include information on a guard band of the patch.

In the multi-view image decoding method, the guard band may be determined based on the importance of an image region belonging to the additional view image.

In the multi-view image decoding method, the packing information may include information on the similarity of a plurality of image regions belonging to the additional view image.

In a non-transitory computer-readable recording medium storing a bitstream related to a method of encoding a multi-view image including a basic view image and at least one additional view image according to an embodiment of the present invention, the method of encoding the multi-view image includes: generating at least one patch by pruning the at least one additional view image based on the basic view image; packing the at least one patch; generating a plurality of residual additional view images based on the packed at least one patch; and outputting a bitstream including residual additional view image encoding information on the plurality of residual additional view images, wherein the packing of the at least one patch may be performed based on the importance of an image region belonging to the additional view image.

The features briefly summarized above are merely exemplary aspects of the detailed description of the present invention that follows, and do not limit the scope of the present invention.

According to the present invention, an image encoding/decoding method and apparatus for supporting motion parallax, and a recording medium storing a bitstream, can be provided.

In addition, according to the present invention, a method and apparatus can be provided that deliver a complete and natural stereoscopic image to a VR device by playing images corresponding to the viewer's up/down and left/right translation as well as the viewer's up/down and left/right rotation.

In addition, according to the present invention, scaling and rotating each patch reduces the amount of data transmitted, so that the packing efficiency of patches can be increased.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

FIG. 1 is a diagram for explaining the concept of immersive video according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the structure of an immersive video encoder according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining a patch packing process according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining a method of adjusting the size of a patch using the importance of the image region to be extracted as the patch, according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining a method of improving encoding efficiency by adjusting the width of a guard band, according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining a method in which multiple similar patch regions are packed into a single patch, according to an embodiment of the present invention.
FIG. 7 is a diagram for explaining a per-atlas patch packing method according to an embodiment of the present invention.
FIG. 8 is a diagram for explaining a method of applying different video encoding to each atlas according to an embodiment of the present invention.
FIG. 9 is a diagram for explaining a multi-view image encoding method according to an embodiment of the present invention.
FIG. 10 is a diagram for explaining a multi-view image decoding method according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. Like reference numerals in the drawings refer to the same or similar functions throughout the several aspects. The shapes and sizes of elements in the drawings may be exaggerated for clarity. The detailed description of exemplary embodiments below refers to the accompanying drawings, which show specific embodiments by way of example. These embodiments are described in sufficient detail to enable those skilled in the art to practice them. It should be understood that the various embodiments differ from one another but need not be mutually exclusive. For example, a specific shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the present invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the embodiment. Accordingly, the detailed description below is not to be taken in a limiting sense, and the scope of the exemplary embodiments, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled.

In the present invention, terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be named a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of the plurality of related listed items.

When a component of the present invention is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

The components shown in the embodiments of the present invention are illustrated independently to represent distinct characteristic functions; this does not mean that each component consists of separate hardware or a single software unit. That is, the components are listed separately for convenience of description. At least two of the components may be combined into one component, or one component may be divided into a plurality of components that perform its function, and such integrated and separated embodiments are also included in the scope of the present invention as long as they do not depart from its essence.

The terms used in the present invention are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In the present invention, terms such as "include" or "have" are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. That is, a statement in the present invention that a specific configuration is "included" does not exclude configurations other than that configuration, and means that additional configurations may be included in the practice of the present invention or within the scope of its technical idea.

Some components of the present invention may not be essential components that perform the essential functions of the invention, but optional components that merely improve performance. The present invention may be implemented with only the components essential to realizing its essence, excluding components used merely to improve performance, and a structure that includes only the essential components, excluding the optional performance-improving components, is also included in the scope of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In describing the embodiments of this specification, detailed descriptions of related known configurations or functions are omitted where they are judged likely to obscure the subject matter of this specification; the same reference numerals are used for the same components in the drawings, and redundant descriptions of the same components are omitted.

In this specification, immersive video is video that lets a user experience a sense of immersion, and may mean a 3DoF, 3DoF+, or 6DoF type of image. Here, immersion may be defined as a phenomenon in which the distinction between the real and virtual worlds becomes unclear.

In this specification, a video format may mean a standard that makes video usable in a specific program. In addition, formatting may mean the act of changing the form of a video to conform to that standard.

In this specification, "video" may be used with the same meaning as "image" and "moving picture".

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining the concept of immersive video according to an embodiment of the present invention.

Referring to FIG. 1, object 1 (O1) through object 4 (O4) may each denote a video region within an arbitrary scene, V_k an image acquired at the camera's center position (the basic view image or reference view image), X_k a viewpoint position (camera position), and D_k depth information at the camera's center position. To support six degrees of freedom (6DoF) according to the viewer's movement, immersive video may be generated using the basic video (V_k) seen from the central position (also called the center position or reference position) (X_k), the multi-position videos (V_k-2, V_k-1, ...) at the multiple viewpoint positions (X_k-2, X_k-1, ...) seen when the viewer moves, and related spatial information (e.g., depth information and camera information), and may be transmitted to the terminal through video compression and packet multiplexing. Here, the basic video and/or the multi-position videos may be planar video or omnidirectional video.

Accordingly, an immersive media system must be able to acquire, generate, transmit, and reproduce large-volume immersive video composed of multiple views, so the large volume of video data must be stored and compressed efficiently. When an immersive media system is configured for 3DoF+ or 6DoF, maintaining compatibility with existing immersive video (3DoF) must also be considered.

Meanwhile, the immersive video formatting apparatus may acquire a basic view image, multi-view images, and the like, and a receiver (not shown) may perform these operations.

FIG. 2 is a diagram for explaining the structure of an immersive video encoder according to an embodiment of the present invention.

FIG. 3 is a diagram for explaining a patch packing process according to an embodiment of the present invention.

Referring to FIG. 2, the immersive video encoder 200 according to an embodiment of the present invention may include a view optimizer 210, an atlas constructor 220, a video texture information and depth information encoder 230, and a metadata composer 240. However, the encoder is not necessarily limited thereto and may further include components that are not shown.

The view optimizer 210 may select at least one basic view image and at least one additional view image from at least one source view image captured by a plurality of cameras. Here, the source view image, the basic view image, and the additional view image may each include a texture component and a depth component. In addition, the at least one source view image may mean immersive video composed of multiple views. The terms texture component and depth component may be used with the same meaning as texture information and depth information, respectively.

The atlas constructor 220 may include a pruner 221, an aggregator 222, and a patch packer 223. However, the atlas constructor is not necessarily limited thereto and may further include components that are not shown.

The atlas constructor 220 may remove regions that overlap between view images, based on at least one of the depth information of the at least one basic view image selected by the view optimizer 210, the depth information of the at least one additional view image, or camera parameters.

Specifically, the atlas constructor 220 may perform a pruning process to extract, from the immersive video composed of multiple views, only the image information from which redundant regions have been removed. The pruning process may be performed by the pruner 221. The pruning process may be applied to the depth component of an image. Alternatively, it may be applied to the texture component as well as the depth component.

The pruning process may mean a process of extracting an image region that is not included in the basic view image or in the other additional view images and is visible only at the corresponding view position. As a result of the pruning process, a pruning mask image may be generated. The pruning mask image may serve to mark only the regions that need to be extracted from the image at the corresponding view position. In addition, the pruning mask image may be binary video data that marks the non-overlapping regions found by comparing the corresponding view image with the basic view image and the other additional view images using a 3D image signal.

A region of the pruning mask image that corresponds to valid texture data may be defined as a pruning mask.

A patch may be generated based on the texture data corresponding to the pruning mask. A patch may be defined as a rectangular region containing data of a non-overlapping region. The patch packer 223 may extract, from the view position images, the texture component (the non-overlapping image region) corresponding to the pruning mask of each view position image to generate the patches of each view position image, and may pack the patches of the view position images into a small number of images to generate a packed view. The packed view may be referred to as an atlas.
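As a rough illustration of the patch definition above, the following Python sketch derives a rectangular patch from a binary pruning mask. It is not taken from the specification: the names `patch_bounding_box` and `extract_patch` are hypothetical, the mask is assumed to be a list of rows holding 0 or 1, and the whole mask is treated as a single patch, whereas a practical packer would presumably split the mask into separate regions first.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of samples; for a mask, 1 = keep, 0 = pruned


def patch_bounding_box(mask: Image) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, height, width) of the smallest rectangle that
    covers every 1 in the mask, or None if the mask holds no 1."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    top, left = rows[0], cols[0]
    return top, left, rows[-1] - top + 1, cols[-1] - left + 1


def extract_patch(texture: Image, mask: Image) -> Optional[Image]:
    """Cut the rectangular patch out of the texture (hypothetical helper)."""
    box = patch_bounding_box(mask)
    if box is None:
        return None
    top, left, height, width = box
    return [row[left:left + width] for row in texture[top:top + height]]
```

The rectangle may contain samples that the mask marks as pruned; this matches the description of a patch as a rectangular region that contains the non-overlapping data rather than being exactly equal to it.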

The packed view may be the final format that is input to a general video codec (for example, AVC, HEVC, or VVC). That is, the basic view image and the packed view may be transmitted to a general video codec and decoded.

For example, referring to FIGS. 2 and 3, the atlas constructor 220 may receive a basic view image (V_k) and an additional view image (V_k-1) and perform pruning. In this case, a first mask (mask1) and a second mask (mask2) may be generated as pruning masks to extract the regions of object 1 (O1) and object 4 (O4), which are image regions that do not overlap between the basic view image and the additional view image. The image regions corresponding to the first mask and the second mask may be regions that are not visible in the basic view image and are visible only in the corresponding additional view image. Thereafter, the generated first patch (Patch1) and second patch (Patch2) may be packed to generate a packed view, that is, an atlas. Information on the plurality of residual additional view images including the atlas generated in this way, together with information on the basic view image, may be transmitted through a bitstream.

Meanwhile, when the patches extracted from each view image are generated as a packed view, the packed view consists of at least one image frame. It is therefore necessary to raise the packing efficiency of the patch packing process so as to reduce the size and/or number of packed views, and to raise the compression efficiency at the boundaries of the rectangular patches.

According to an embodiment of the present invention, one way to pack patches efficiently and reduce the amount of transmitted data is to scale each patch according to its importance and to place it in an empty space of the packed view while rotating it in a suitable direction. What matters here is minimizing the image quality degradation caused by scaling and rotating the patch when the decoder reconstructs the image. For a single-view image, scaling a patch according to whether it is a region of interest (ROI) may be considered, but immersive media, in which a plurality of view images and depth information are additionally transmitted, requires a more suitable patch scaling scheme.

FIG. 4 is a diagram illustrating a method of adjusting the size of a patch using the importance of the image region to be extracted as the patch, according to an embodiment of the present invention.

According to an embodiment of the present invention, when patches are packed to generate a packed view, the sizes of the patches may be adjusted differentially according to the importance of the image region to be extracted for each patch. The factors that determine the importance of an image region may include at least one of the depth value of the image region, the position of the camera used to acquire the image region, whether the region is a region of interest (ROI), the priority between images, and the pruning order.

In addition, when the packed view is generated, some patches may be rotated before being packed.

As a result, scaling and rotating each patch may reduce the amount of transmitted data and increase the packing efficiency of the patches.
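The scaling and rotation described above can be sketched in a few lines of Python. This is a minimal sketch under assumptions that are not in the specification: the helper names are hypothetical, importance is reduced to a boolean, unimportant patches are halved by plain decimation (no filtering), and rotation is limited to 90 degrees clockwise.

```python
from typing import List

Patch = List[List[int]]  # rows of samples


def rotate_cw90(patch: Patch) -> Patch:
    """Rotate a patch 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def downscale(patch: Patch, factor: int) -> Patch:
    """Shrink a patch by keeping every `factor`-th sample in each direction
    (plain decimation, assumed here only for brevity)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [row[::factor] for row in patch[::factor]]


def prepare_for_packing(patch: Patch, important: bool, rotate: bool) -> Patch:
    """Important patches keep their size; others are halved (assumed policy)."""
    out = patch if important else downscale(patch, 2)
    return rotate_cw90(out) if rotate else out
```

A decoder would need the scale factor and the rotation flag to undo these steps, which is why the text goes on to describe signalling them as metadata.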

When the factor that determines the size of a patch is the depth value, the importance of an image region may be determined by its depth value, because the region that is in focus and the region that the viewer gazes at when a multi-view image is acquired are mostly determined by the depth value.

When the factor that determines the size of a patch is the camera position, the importance may be set lower from the center of the multiple cameras toward the periphery. For example, when camera indices are assigned sequentially according to camera position, the index difference grows as a camera moves away from the center camera, so the importance of an image may be set lower as its index difference from the center camera grows. As another example, the importance of an image may be set according to a predetermined interval of the multiple cameras (for example, even-numbered or odd-numbered cameras). The predetermined interval may be a predefined value or may be determined adaptively according to the total number of images.
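The index-difference rule above can be illustrated with a short Python sketch. The function name `camera_importance` and the linear mapping to a score between 0 and 1 are assumptions made for illustration; the text only states that importance decreases as the index difference from the center camera grows.

```python
def camera_importance(camera_index: int, center_index: int, num_cameras: int) -> float:
    """Map the index distance from the center camera to a score in [0, 1].

    1.0 means the center camera; 0.0 means the camera farthest from it.
    The linear fall-off is an assumed policy, not part of the source text.
    """
    if not (0 <= camera_index < num_cameras and 0 <= center_index < num_cameras):
        raise ValueError("indices must lie in [0, num_cameras)")
    max_diff = max(center_index, num_cameras - 1 - center_index)
    if max_diff == 0:  # only one camera
        return 1.0
    return 1.0 - abs(camera_index - center_index) / max_diff
```

With five cameras indexed 0 to 4 and camera 2 at the center, cameras 1 and 3 receive a score of 0.5 and cameras 0 and 4 receive 0.0.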

When the factor that determines the size of a patch is the priority between images, a patch composed of a high-complexity video region may be assigned higher importance than a patch composed of a low-complexity region. This is because, when the image region in a patch has high complexity, resizing the patch degrades the image quality considerably on reconstruction.

For example, referring to FIG. 4, the second patch (Patch2), which corresponds to the patch regions of object 2 (O2) and object 4 (O4), may be scaled according to the importance of the corresponding image region as determined by one of the methods described above, and then rotated and packed.

Information indicating the importance of a patch may be encoded as metadata. For example, a 1-bit flag indicating whether the patch is included in an ROI may be encoded. Alternatively, information indicating an importance ranking may be encoded.

The scaling ratio of a patch may be determined according to the importance of the patch. For example, the scaling ratio may differ between important and non-important patches.

Alternatively, information indicating at least one of the scaling ratio and the rotation of a patch may be encoded as metadata. For example, information indicating at least one of the original size of the patch, the size of the patch after scaling, or the ratio between the sizes before and after scaling may be encoded and used to determine the scaling ratio of the patch. As another example, information indicating at least one of whether the patch is rotated clockwise, whether it is rotated counterclockwise, or the rotation angle may be encoded and used to determine whether the patch is rotated.
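To make the metadata options above concrete, the following Python sketch shows one way a decoder could recover the scaling ratio from a signalled original size and packed size. The `PatchMetadata` record and all of its field names are hypothetical and do not represent the actual bitstream syntax; rotation is assumed to be limited to 90 degrees clockwise.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PatchMetadata:
    """Illustrative per-patch metadata (field names are assumptions)."""
    orig_width: int      # patch size in the source view
    orig_height: int
    packed_width: int    # footprint inside the atlas, after scaling and rotation
    packed_height: int
    rotated_cw90: bool   # True if the patch was rotated 90 degrees clockwise
    is_roi: bool         # 1-bit importance flag


def scaling_ratio(meta: PatchMetadata) -> Tuple[float, float]:
    """Return (horizontal, vertical) scaling ratios relative to the original."""
    # A 90-degree rotation swaps width and height, so undo the swap first.
    if meta.rotated_cw90:
        width, height = meta.packed_height, meta.packed_width
    else:
        width, height = meta.packed_width, meta.packed_height
    return width / meta.orig_width, height / meta.orig_height
```

For instance, a 64x32 patch that is halved to 32x16 and then rotated occupies 16x32 samples in the atlas, and `scaling_ratio` returns `(0.5, 0.5)` for it.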

FIG. 5 is a diagram illustrating a method of improving encoding efficiency by adjusting the width of a guard band, according to an embodiment of the present invention.

The packed view generated in the patch packing process may be compressed by the texture component and depth component encoder 230 shown in FIG. 2. In this case, the boundaries between patches are high-frequency components, and the image regions on either side of a boundary are weakly correlated, which may lower compression efficiency.

Accordingly, according to an embodiment of the present invention, a guard band may be set in the boundary region of a patch, as shown in FIG. 5. The guard band is a region set in the boundary region of a patch and may mean a region that separates boundaries in order to prevent compression efficiency from dropping at the patch boundary. Encoding efficiency may be improved through appropriate filtering and filling of the guard band region. The guard band may be set for both the texture component and the depth component.

For example, the pixel values in the guard band may be generated by copying samples adjacent to the boundary region of the patch.

As another example, the pixel values in the guard band may be generated by interpolating a plurality of samples included in the boundary region of the patch.
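The copy-based option can be sketched as edge replication. The following Python function is a minimal sketch, not the specified procedure: `add_guard_band` is a hypothetical name, the same width is assumed on all four sides, and corner samples are filled from the nearest corner of the patch.

```python
from typing import List

Patch = List[List[int]]  # rows of samples


def add_guard_band(patch: Patch, width: int) -> Patch:
    """Surround a patch with a guard band filled by copying the nearest
    boundary sample (edge replication)."""
    if width < 0:
        raise ValueError("width must be >= 0")
    if not patch or not patch[0]:
        raise ValueError("patch must not be empty")
    # Extend each row to the left and right with its first and last sample.
    rows = [[row[0]] * width + list(row) + [row[-1]] * width for row in patch]
    # Repeat the first and last extended rows above and below the patch.
    top = [list(rows[0]) for _ in range(width)]
    bottom = [list(rows[-1]) for _ in range(width)]
    return top + rows + bottom
```

Because the band repeats the boundary samples, the transition from the patch into the band carries no new edge, which is the property the guard band relies on to limit high-frequency content at patch boundaries.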

To indicate whether a texture image pixel belongs to the guard band, specific levels of the depth information value may be assigned to the guard band, as shown on the right side of FIG. 5. For example, the lowest 32 levels of a 1024-level depth information value may indicate the guard band.
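The 1024-level example can be sketched as follows. Only the two constants come from the text; the function names and the linear mapping of real depth onto the remaining levels 32 to 1023 are assumptions added for illustration.

```python
DEPTH_LEVELS = 1024  # 10-bit depth samples, levels 0..1023 (from the example)
GUARD_LEVELS = 32    # lowest 32 levels reserved to mark the guard band


def is_guard_band(depth_sample: int) -> bool:
    """True if a depth sample falls in the levels reserved for the guard band."""
    return 0 <= depth_sample < GUARD_LEVELS


def quantize_depth(normalized: float) -> int:
    """Map a normalized depth in [0, 1] onto the non-reserved levels
    (assumed linear mapping; out-of-range input is clamped)."""
    clamped = min(max(normalized, 0.0), 1.0)
    return GUARD_LEVELS + round(clamped * (DEPTH_LEVELS - 1 - GUARD_LEVELS))
```

Under this assumed mapping no valid depth value can be mistaken for a guard band marker, at the cost of losing 32 of the 1024 levels of depth precision.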

In this case, the width of the guard band may be determined based on the importance of the image region, as in the description of FIG. 4 above. The importance of the image region may be determined based on at least one of the depth value of the image region, the position of the camera used to acquire the image region, whether the region is a region of interest (ROI), the priority between images, and the pruning order. In addition, the size of the guard band may be determined based on the size of the patch. The width of the guard band for each patch may also be determined based on the priority among patches.

According to an embodiment, the width of the guard band may be additionally transmitted through the metadata that defines the patch. Alternatively, information indicating the size of the horizontal guard band and information indicating the size of the vertical guard band may be encoded separately.

FIG. 6 is a diagram illustrating a method of packing multiple similar patch regions into a single patch, according to an embodiment of the present invention.

When patches are extracted from an additional view image using a pruning mask, similar image regions may exist. Accordingly, according to an embodiment of the present invention, the similarity between the extracted image regions may be determined, and multiple similar patch regions may be packed into a single patch to reduce the amount of data.

For example, when the pixel values in the extracted image regions are similar, the regions may be determined to be similar image regions. Accordingly, image regions containing similar pixel values may be packed into a single patch.
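The text does not specify how pixel-value similarity is measured. As one possible reading, the following Python sketch compares two equally sized regions by their mean absolute difference; the metric, the function names, and the default threshold of 4.0 are all assumptions.

```python
from typing import List

Region = List[List[int]]  # rows of samples


def mean_abs_diff(a: Region, b: Region) -> float:
    """Mean absolute difference between two regions of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("regions must have the same shape")
    count = sum(len(row) for row in a)
    if count == 0:
        raise ValueError("regions must not be empty")
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / count


def are_similar(a: Region, b: Region, threshold: float = 4.0) -> bool:
    """Treat two regions as similar when their mean absolute difference
    does not exceed the (assumed) threshold."""
    return mean_abs_diff(a, b) <= threshold
```

Regions judged similar in this way could share one packed patch, with each region's position in its source view kept in metadata.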

In this case, if applying an appropriate transform (for example, a similarity transform or an affine transform) between the extracted image regions increases their similarity, the transform may be applied and the regions may be packed into a single patch. The encoder may additionally transmit the transform parameters related to the transform, and the decoder may receive the parameters, perform the inverse transform, and reconstruct the regions.

For example, referring to FIG. 6, the image regions containing object 0 (O0) and object 1 (O1) in the additional view image V_k-1 may be determined to be similar image regions. Accordingly, the patch regions corresponding to object 0 and object 1 may be packed into a single patch, the first patch (Patch1), to reduce the amount of data.

FIG. 7 is a diagram illustrating a per-atlas patch packing method according to an embodiment of the present invention.

According to an embodiment, when patches are packed into an atlas (packed view), the patches may be separated into different atlases (for example, a first atlas (Atlas1) and a second atlas (Atlas2)) according to the importance of the image region to be extracted as each patch. The factors that determine the importance of an image region may include at least one of the depth value of the image region, the position of the camera used to acquire the image region, whether the region is a region of interest (ROI), the priority between images, and the pruning order.

Referring to FIG. 7, after the patches are separated into different atlases (for example, a first atlas and a second atlas), the atlas packed with patches of low importance (for example, the second atlas) may be down-scaled again, compressed, and transmitted. According to this embodiment, the size of an atlas generated in a general image format is adjusted at the server and at the terminal respectively, which may have the advantage of simple implementation.
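The atlas separation above can be sketched as a simple partition. In this Python sketch the importance score in the range 0 to 1, the threshold, the 0.5 down-scaling factor, and the dictionary layout are all hypothetical choices; the text only states that low-importance patches go to a separate atlas that is then down-scaled.

```python
from typing import Dict, List, Tuple

PatchInfo = Tuple[str, float]  # (patch id, importance score in [0, 1])


def split_into_atlases(
    patches: List[PatchInfo],
    threshold: float = 0.5,
    low_scale: float = 0.5,
) -> Dict[str, dict]:
    """Assign patches to two atlases by importance and record the scale
    at which each atlas would be encoded."""
    high = [pid for pid, importance in patches if importance >= threshold]
    low = [pid for pid, importance in patches if importance < threshold]
    return {
        "atlas1": {"patches": high, "scale": 1.0},       # kept at full size
        "atlas2": {"patches": low, "scale": low_scale},  # down-scaled as a whole
    }
```

Scaling a whole atlas rather than individual patches means that a single scale value per atlas is enough for the terminal to restore the original size.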

FIG. 8 is a diagram illustrating a method of encoding video differently for each atlas, according to an embodiment of the present invention.

Referring to FIG. 8, after the patches are separated into different atlases (for example, a first atlas and a second atlas), the atlas packed with patches of high importance may be compressed at a low compression rate by a first encoder (Encoder 1), and the atlas packed with patches of low importance may be compressed at a high compression rate by a second encoder (Encoder 2). Accordingly, this embodiment may provide different decoded image qualities while maintaining the same overall video compression rate.

In addition, according to this embodiment, in a service environment with a limited transmission rate, video regions of high importance can be viewed in high quality, whereas video regions of low importance are viewed in relatively low quality, which may raise overall viewer satisfaction.

FIG. 9 is a diagram illustrating a multi-view image encoding method according to an embodiment of the present invention.

Referring to FIG. 9, the multi-view image encoding apparatus may generate at least one patch by performing pruning on at least one additional view image based on the basic view image (S901).

The multi-view image encoding apparatus may then pack the at least one patch (S902). Step S902 may be performed based on the importance of an image region belonging to the additional view image.

Meanwhile, the importance of an image region belonging to the additional view image may be determined based on at least one of the depth value of the image region, the position of the camera used to acquire the image region, whether the image region is a region of interest, and the complexity of the image region.

Meanwhile, the step of packing the at least one patch (S902) may include adjusting the size of the patch according to the importance of the image region belonging to the additional view image.

In addition, the step of packing the at least one patch (S902) may include rotating the patch according to the importance of the image region belonging to the additional view image.

In addition, the step of packing the at least one patch (S902) may include setting a guard band at a boundary portion of the at least one patch.

Meanwhile, in the step of setting the guard band, the guard band may be set according to the importance of the image region belonging to the additional view image.

In addition, in the step of setting the guard band, the guard band may be set by copying sample values adjacent to the boundary region of the at least one patch.

In addition, in the step of setting the guard band, the guard band may be set by interpolating a plurality of samples included in the boundary region of the at least one patch.

Meanwhile, the step of packing the at least one patch (S902) may include determining the similarity of a plurality of image regions belonging to the additional view image, and packing the at least one patch into a single patch based on the result of the similarity determination.

The multi-view image encoding apparatus may then generate a plurality of residual additional view images based on the packed at least one patch (S903).

The multi-view image encoding apparatus may then output a bitstream including residual additional view image encoding information on the plurality of residual additional view images (S904). The bitstream may further include basic view image encoding information on the basic view image.

FIG. 10 is a diagram illustrating a multi-view image decoding method according to an embodiment of the present invention.

Referring to FIG. 10, the multi-view image decoding apparatus may obtain a bitstream including basic view image encoding information on the basic view image and residual additional view image encoding information on a plurality of residual additional view images (S1001). The residual additional view image encoding information may include packing information of a patch.

In addition, the packing information may include information on the importance of an image region belonging to the additional view image. The packing information may also include information on the guard band of the patch.

In addition, the packing information may include information on the similarity of a plurality of image regions belonging to the additional view image.

Meanwhile, the guard band may be determined based on the importance of the image region belonging to the additional view image.

In addition, the size of the patch may be determined based on the importance of the image region belonging to the additional view image.

The multi-view image decoding apparatus may then decode the basic view image and the plurality of residual additional view images based on the bitstream (S1002).

The multi-view image decoding apparatus may then reconstruct at least one additional view image from the plurality of residual additional view images, based on the basic view image encoding information, the residual additional view image encoding information, and the basic view image (S1003).

The bitstream generated by the image encoding method of the present invention may be temporarily stored in a computer-readable non-transitory recording medium, and may be encoded by the multi-view image encoding method described above.

Specifically, in a non-transitory computer-readable recording medium storing a bitstream related to a method of encoding a multi-view image including a basic view image and at least one additional view image, the method of encoding the multi-view image may include: generating at least one patch by performing pruning on the at least one additional view image based on the basic view image; packing the at least one patch; generating a plurality of residual additional view images based on the packed at least one patch; and outputting a bitstream including residual additional view image encoding information on the plurality of residual additional view images, wherein the packing of the at least one patch may be performed based on the importance of an image region belonging to the additional view image.

The exemplary methods of the present invention are expressed as a series of operations for clarity of description, but this is not intended to limit the order in which the steps are performed; where necessary, the steps may be performed simultaneously or in a different order. To implement a method according to the present invention, the illustrated steps may be supplemented with other steps, some steps may be omitted while the remaining steps are kept, or some steps may be omitted and additional steps included.

The various embodiments of the present invention do not list every possible combination but are intended to describe representative aspects of the present invention, and the matters described in the various embodiments may be applied independently or in combinations of two or more.

In addition, the various embodiments of the present invention may be implemented by hardware, firmware, software, or a combination thereof. For a hardware implementation, they may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), general-purpose processors, controllers, microcontrollers, microprocessors, and the like.

The scope of the present invention includes software or machine-executable instructions (for example, an operating system, applications, firmware, programs, and the like) that cause operations according to the methods of the various embodiments to be executed on a device or computer, and a non-transitory computer-readable medium that stores such software or instructions and is executable on a device or computer.

Claims

A method of encoding a multi-view image including a base view image and at least one additional view image, the method comprising:
generating at least one patch by performing pruning on the at least one additional view image based on the base view image;
packing the at least one patch;
generating a plurality of residual additional view images based on the at least one packed patch; and
outputting a bitstream including residual additional view image encoding information on the plurality of residual additional view images,
wherein the packing of the at least one patch
is performed based on an importance of an image region belonging to the additional view image.

The method of claim 1,
wherein the packing of the at least one patch
comprises adjusting a size of the patch according to the importance of the image region belonging to the additional view image.
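
As an illustration only, and not as the claimed implementation, the size adjustment can be sketched by mapping an importance value to a scale factor. The function name `scaled_patch_size`, the importance range [0, 1], the linear mapping, and the `min_scale` parameter are all assumptions introduced for this sketch; none of them is defined by the source.

```python
def scaled_patch_size(width, height, importance, min_scale=0.5):
    """Return a (width, height) for a patch, shrunk when importance is low.

    Assumptions (not from the source): importance lies in [0, 1], and the
    scale factor varies linearly from min_scale (importance 0) to 1.0
    (importance 1).
    """
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be in [0, 1]")
    scale = min_scale + (1.0 - min_scale) * importance
    # Keep at least one sample in each dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Under these assumptions a fully important patch keeps its original size, while less important patches occupy less area in the packed image.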

The method of claim 1,
wherein the packing of the at least one patch
comprises rotating the patch according to the importance of the image region belonging to the additional view image.

The method of claim 1,
wherein the importance of the image region belonging to the additional view image
is determined based on at least one of a depth value of the image region, a position of a camera used to acquire the image region, whether the image region is a region of interest (ROI), and a complexity of the image region.
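
A minimal sketch of how such cues might be combined into one importance score follows. The claim only lists the cues; the function name `region_importance`, the normalization of each cue to [0, 1], the direction of each cue (nearer depth and a camera closer to the base view count as more important), and the equal weighting are assumptions made for illustration.

```python
def region_importance(depth=None, camera_distance=None, is_roi=None,
                      complexity=None):
    """Combine whichever cues are available into a score in [0, 1].

    Assumptions (not from the source): every numeric cue is already
    normalized to [0, 1], and the available cues are averaged with
    equal weight.
    """
    cues = []
    if depth is not None:
        cues.append(1.0 - depth)            # assumed: 0 = nearest, 1 = farthest
    if camera_distance is not None:
        cues.append(1.0 - camera_distance)  # assumed: 0 = at the base-view camera
    if is_roi is not None:
        cues.append(1.0 if is_roi else 0.0)
    if complexity is not None:
        cues.append(complexity)             # assumed: more complex = more important
    if not cues:
        raise ValueError("at least one cue is required")
    return sum(cues) / len(cues)
```

Because every argument is optional, the sketch mirrors the "at least one of" wording: any subset of the four cues yields a score.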

The method of claim 1,
wherein the packing of the at least one patch
comprises setting a guard band at a boundary portion of the at least one patch.

The method of claim 5,
wherein the setting of the guard band
comprises setting the guard band according to the importance of the image region belonging to the additional view image.

The method of claim 5,
wherein the setting of the guard band
comprises setting the guard band by copying a sample value adjacent to a boundary region of the at least one patch.
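
Copying adjacent sample values into a guard band can be sketched as edge replication around a patch. The function name `pad_by_copy` and the representation of a patch as a list of rows are assumptions for illustration; the source does not prescribe a data layout or a band width.

```python
def pad_by_copy(patch, band):
    """Surround a 2D patch (list of rows) with a guard band of `band`
    samples, each copied from the nearest boundary sample of the patch."""
    if band < 0 or not patch or not patch[0]:
        raise ValueError("patch must be non-empty and band non-negative")
    height, width = len(patch), len(patch[0])

    def clamp(index, upper):
        # Map out-of-range coordinates onto the nearest valid index.
        return min(max(index, 0), upper)

    return [
        [patch[clamp(y, height - 1)][clamp(x, width - 1)]
         for x in range(-band, width + band)]
        for y in range(-band, height + band)
    ]
```

For example, a 2x2 patch with a one-sample band becomes a 4x4 block whose outer ring repeats the nearest patch samples.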

The method of claim 5,
wherein the setting of the guard band
comprises setting the guard band by interpolating a plurality of samples included in a boundary region of the at least one patch.
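
One possible reading of the interpolation variant, offered as a sketch under stated assumptions, fills the guard band between two boundary samples with linearly interpolated values. The function name `interpolated_band`, the choice of linear interpolation, and the use of exactly two boundary samples (one on each side of the band) are assumptions; the source does not specify the interpolation filter.

```python
def interpolated_band(left_sample, right_sample, band):
    """Return `band` guard samples linearly interpolated between the
    boundary sample on the left of the band and the one on its right."""
    if band < 1:
        raise ValueError("band must be at least 1")
    step = (right_sample - left_sample) / (band + 1)
    return [left_sample + step * (i + 1) for i in range(band)]
```

Compared with plain copying, this produces a gradual transition across the band rather than an abrupt change in sample values where two packed patches meet.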

The method of claim 1,
wherein the packing of the at least one patch comprises:
determining a similarity between a plurality of image regions belonging to the additional view image; and
packing the at least one patch into a single patch based on a result of the similarity determination.
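
The similarity-based step can be sketched as follows, assuming that similar regions are represented by a single patch. The similarity measure (mean absolute difference), the `threshold` parameter, the greedy grouping, and the function names `mean_abs_diff` and `merge_similar` are assumptions chosen for illustration; the source does not state which measure or grouping rule is used.

```python
def mean_abs_diff(a, b):
    """Mean absolute sample difference between two 2D regions holding
    the same number of samples."""
    flat_a = [s for row in a for s in row]
    flat_b = [s for row in b for s in row]
    if len(flat_a) != len(flat_b) or not flat_a:
        raise ValueError("regions must be non-empty and equally sized")
    return sum(abs(x - y) for x, y in zip(flat_a, flat_b)) / len(flat_a)


def merge_similar(regions, threshold):
    """Greedily keep one representative per group of similar regions.

    Returns (representatives, mapping), where mapping[i] is the index in
    representatives of the region that stands in for regions[i].
    """
    representatives, mapping = [], []
    for region in regions:
        for j, rep in enumerate(representatives):
            if mean_abs_diff(region, rep) <= threshold:
                mapping.append(j)  # similar enough: reuse existing patch
                break
        else:
            representatives.append(region)
            mapping.append(len(representatives) - 1)
    return representatives, mapping
```

In this sketch the `mapping` list plays the role of the similarity information a decoder would need to know which packed patch stands in for each original region.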

The method of claim 1,
wherein the bitstream
further includes base view image encoding information on the base view image.

A method of decoding a multi-view image including a base view image and at least one additional view image, the method comprising:
obtaining a bitstream including base view image encoding information on the base view image and residual additional view image encoding information on a plurality of residual additional view images;
decoding the base view image and the plurality of residual additional view images based on the bitstream; and
restoring the at least one additional view image from the plurality of residual additional view images based on the base view image encoding information, the residual additional view image encoding information, and the base view image,
wherein the residual additional view image encoding information includes packing information of a patch, and
the packing information includes information on an importance of an image region belonging to the additional view image.

The method of claim 11,
wherein a size of the patch is determined based on the importance of the image region belonging to the additional view image.

The method of claim 11,
wherein the importance of the image region belonging to the additional view image
is determined based on at least one of a depth value of the image region, a position of a camera used to acquire the image region, whether the image region is a region of interest (ROI), and a complexity of the image region.

The method of claim 11,
wherein the packing information includes information on a guard band of the patch.

The method of claim 14,
wherein the guard band is determined based on the importance of the image region belonging to the additional view image.

The method of claim 11,
wherein the packing information includes information on a similarity between a plurality of image regions belonging to the additional view image.

A non-transitory computer-readable recording medium storing a bitstream related to a method of encoding a multi-view image including a base view image and at least one additional view image,
wherein the method of encoding the multi-view image comprises:
generating at least one patch by performing pruning on the at least one additional view image based on the base view image;
packing the at least one patch;
generating a plurality of residual additional view images based on the at least one packed patch; and
outputting a bitstream including residual additional view image encoding information on the plurality of residual additional view images,
wherein the packing of the at least one patch
is performed based on an importance of an image region belonging to the additional view image.