KR20190050714A

KR20190050714A - A METHOD AND APPARATUS FOR ENCODING/DECODING 360 Virtual Reality VIDEO

Info

Publication number: KR20190050714A
Application number: KR1020180133502A
Authority: KR
Inventors: 김현철; 임성용; 석주명
Original assignee: 한국전자통신연구원
Priority date: 2017-11-03
Filing date: 2018-11-02
Publication date: 2019-05-13

Abstract

A method for encoding/decoding a 360-degree virtual reality (VR) image and an apparatus thereof are provided. The method for encoding an image comprises the steps of: dividing a 360-degree VR image into a plurality of regions based on a division structure of the 360-degree VR image; generating a region sequence by using the plurality of divided regions; generating a bit stream for the generated region sequence; and transmitting the generated bit stream, wherein the region sequence can include regions at the same position in at least one frame included in the 360-degree VR image.

Description

TECHNICAL FIELD

The present invention relates to encoding and decoding of interactive images, and more particularly to encoding and decoding of interactive images, such as 360 VR (Virtual Reality) images, in which the playback region changes according to the user's movement.

When an interactive image such as 360-degree VR (Virtual Reality) is served, the simplest approach is to encode the entire image and deliver it to the terminal; the terminal decodes the entire image and then renders only the portion corresponding to the viewport the user is looking at. However, encoding and transmitting the entire image in this way sends even the regions the user is not viewing at high quality, which wastes considerable network bandwidth.

Accordingly, methods are used that reduce the transmission bit rate by transmitting only the portion of the 360 VR image that the user can view at a given time.

In a 360 VR image, the playback region must change according to the user's movement. Because video encoding/decoding references previous frames and neighboring regions, the viewport region cannot be decoded when a previous frame or neighboring region needed to decode the current viewport is missing.

To avoid this problem, conventional techniques divide the input image into a plurality of tiles and encode each tile with an independent encoder.

In the conventional techniques, encoding and decoding the tiles independently requires as many video encoders and video decoders as there are tiles. This increases the cost of the encoder configuration. On the decoder side, most terminals cannot support as many decoders as there are tiles, which makes a general-purpose service difficult.

An object of the present invention is to provide a tile-based 360 VR image encoding method and a tile-based 360 VR image decoding method for encoding a high-resolution 360 VR image on a tile basis, using existing video encoders and video decoders without modification.

According to the present invention, there may be provided a 360 VR image encoding method comprising: dividing a 360 VR (Virtual Reality) image into a plurality of regions based on a division structure of the 360 VR image; generating a region sequence using the plurality of divided regions; generating a bitstream for the generated region sequence; and transmitting the generated bitstream, wherein the region sequence includes regions at the same position in at least one frame included in the 360 VR image.

In the 360 VR image encoding method according to the present invention, the region may be at least one of a tile and a sub-picture.

In the 360 VR image encoding method according to the present invention, the division structure of the 360 VR image may be determined per GOP (Group of Pictures), and generating the bitstream may include generating a bitstream for at least one region sequence included in the GOP.

In the 360 VR image encoding method according to the present invention, generating the bitstream may include repeating the generation so that bitstreams are generated for all region sequences included in the GOP.

In the 360 VR image encoding method according to the present invention, the bitstream may include a first bitstream and a second bitstream generated from at least one region sequence included in the GOP, and the first bitstream and the second bitstream may differ in image quality.

In the 360 VR image encoding method according to the present invention, the first bitstream may have higher image quality than the second bitstream.

In the 360 VR image encoding method according to the present invention, the first bitstream may be generated using a first video encoder, and the second bitstream may be generated using a second video encoder different from the first video encoder.

In the 360 VR image encoding method according to the present invention, the division structure of the 360 VR image may include at least one of the number of the regions, their positions, their size information, and a frame rate.

In the 360 VR image encoding method according to the present invention, the frame rate may be set so that the time taken to generate bitstreams for all region sequences included in a GOP equals the time taken to generate bitstreams for all frames included in the GOP.

Also, according to the present invention, there may be provided a 360 VR image decoding method comprising: receiving a bitstream encoded in units of region sequences; decoding the received bitstream to obtain a plurality of regions; and rendering an image to be played back based on the plurality of regions, wherein the region sequence may include regions at the same position in at least one frame included in a 360 VR (Virtual Reality) image.

In the 360 VR image decoding method according to the present invention, the region may be at least one of a tile and a sub-picture.

In the 360 VR image decoding method according to the present invention, the bitstream may include at least two bitstreams of different image quality, and for the viewport region a bitstream of higher image quality may be received than for the remaining regions outside the viewport region.

In the 360 VR image decoding method according to the present invention, the viewport region may be determined based on the first frame among the frames included in a GOP (Group of Pictures).

In the 360 VR image decoding method according to the present invention, the viewport region may be updated per GOP.

In the 360 VR image decoding method according to the present invention, the at least two bitstreams of different image quality may be decoded by a single video decoder.

In the 360 VR image decoding method according to the present invention, rendering the image to be played back may include arranging the plurality of regions in units of region sequences, placing each region at the same position it had when input to the video encoder.

In the 360 VR image decoding method according to the present invention, the arranging may be repeated until all region sequences included in the GOP have been arranged.

In the 360 VR image decoding method according to the present invention, when the viewport region changes, a bitstream of higher image quality than that of the remaining regions outside the viewport region may be received for the changed viewport region, based on at least one of the changed position and GOP information.

In the 360 VR image decoding method according to the present invention, the plurality of regions may be divided from the 360 VR image based on a division structure of the 360 VR image, and the division structure of the 360 VR image may include at least one of the number of the regions, their positions, their size information, and a frame rate.

In the 360 VR image decoding method according to the present invention, the frame rate may be set so that the time taken to generate bitstreams for all region sequences included in a GOP equals the time taken to generate bitstreams for all frames included in the GOP.

The 360 VR image encoding method and the 360 VR image decoding method according to embodiments of the present invention can encode and decode a 360 VR image on a tile or sub-picture basis without using multiple video encoders or video decoders.

In addition, the 360 VR image encoding method and the 360 VR image decoding method according to embodiments of the present invention have the advantage that they can be applied regardless of the existing video coding method used, such as H.264 or HEVC (High Efficiency Video Coding).

In addition, in the tile-based 360 VR image encoding method and the tile-based 360 VR image decoding method according to embodiments of the present invention, the tiles are encoded without spatial dependency on one another, so playback is not impaired even when only some of the tiles are transmitted. When a low-quality bitstream and a high-quality bitstream are used, smooth rendering can be provided with only two video encoders. In particular, the number of video encoders does not change even when many clients connect, and the methods can be applied regardless of the codec type.

In addition, because graphics cards now include built-in encoding functions, so that even a personal computer (PC) can perform high-speed video encoding and decoding, the 360 VR image encoding method and the 360 VR image decoding method of the present invention can contribute to extending personal broadcasting into the 360 VR domain.

FIG. 1 is a conceptual diagram for explaining a tile-based 360 VR image encoding and decoding process in a 360 VR system according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram for explaining in more detail a tile-based 360 VR image encoding process according to an embodiment of the present invention.
FIG. 3 is a conceptual diagram for explaining in more detail a tile-based 360 VR image decoding process according to an embodiment of the present invention.
FIG. 4 is a conceptual diagram for explaining a tile-based 360 VR image encoding process using two video encoders according to another embodiment of the present invention.
FIG. 5 is a conceptual diagram for explaining a tile-based 360 VR image decoding process when two encoders are used according to another embodiment of the present invention.
FIG. 6 is a conceptual diagram showing the viewport region 512 in frames F0, F1, ..., F29, F30, F31, ..., F59 as the user's head or gaze moves during the tile-based 360 VR image decoding process of FIG. 5.
FIG. 7 is a flowchart for explaining a tile-based 360 VR image encoding process in a 360 VR system according to an embodiment of the present invention.
FIG. 8 is a flowchart for explaining a tile-based 360 VR image decoding process in a 360 VR system according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to the specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

Terms including ordinals such as first and second may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but intervening components may also be present. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In the description, identical or corresponding components are given the same reference numerals regardless of the figure number, and redundant description of them is omitted.

The viewport is the region of the whole image that the user is looking at, and may be defined as the portion of the spherical image that is currently displayed and seen by the user.

The embodiments described below explain a method of dividing a 360 VR image into a plurality of regions and generating and parsing a bitstream per unit region. Here, the division unit of the 360 VR image projected onto a 2D plane may be a sub-picture, a tile, or the like. The divided regions may have equal sizes or different sizes. For example, one of the divided regions may have a size different from that of the other regions. Here, the size may be at least one of the width, height, or diagonal length of the region and a length at a predetermined position of the region.

In the present invention, the collection of spatial regions at the same position in each frame may be defined as a "set" or a "sequence". For example, a region set or region sequence may mean the collection of spatial regions located at the same position across a plurality of frames.
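The definition of a region sequence can be illustrated with a minimal sketch. It assumes 30 frames and 32 tiles per frame (the figures used later in the description), and it uses a `(frame index, tile index)` pair as a stand-in for the tile's pixel data; the function name `build_region_sequences` is hypothetical.

```python
from typing import Dict, List, Tuple

# (frame index, tile index): a placeholder for the tile's pixel data (assumption)
Tile = Tuple[int, int]


def build_region_sequences(num_frames: int, num_tiles: int) -> Dict[int, List[Tile]]:
    """Group co-located tiles across frames into one sequence per tile position."""
    sequences: Dict[int, List[Tile]] = {t: [] for t in range(num_tiles)}
    for frame in range(num_frames):
        for tile in range(num_tiles):
            sequences[tile].append((frame, tile))
    return sequences


seqs = build_region_sequences(num_frames=30, num_tiles=32)
```

Each entry of `seqs` holds one tile position followed through every frame, which is the unit the description later feeds to the encoder.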

In the embodiments described below, the division unit of the 360 VR image is assumed to be a tile, and each tile is assumed to have the same size. It is evident, however, that the embodiments also apply when the division unit of the 360-degree VR image is a sub-picture or when the tiles are not of uniform size.

FIG. 1 is a conceptual diagram for explaining a tile-based 360 VR image encoding and decoding process in a 360 VR (Virtual Reality) system according to an embodiment of the present invention.

As shown in FIG. 1, a 360 VR system according to an embodiment of the present invention consists of a 360 VR server 100a and a 360 VR terminal 100b. The 360 VR server 100a includes an input manager 10 and a video encoder 20, and the 360 VR terminal 100b includes a video decoder 30 and an output manager 40.

When a 360 VR image 11a is input to the 360 VR server 100a, the input manager 10 of the 360 VR server 100a may spatially divide the input 360 VR image 11a into a plurality of regions. For example, the input manager 10 may divide the 360-degree VR image 11a into a plurality of tiles and deliver at least one tile 13a to the video encoder 20 sequentially and at high speed. The video encoder 20 receives the at least one tile 13a and generates a bitstream 21 per tile set.

The video decoder 30 receives from the server the bitstreams of the tile sets needed to render the viewport, decodes the received tile-set bitstreams, and delivers the decoded tiles 13b to the output manager 40. The output manager 40 arranges the decoded tiles 13b so that they form the viewport of the 360 VR image and can be rendered.

FIG. 2 is a conceptual diagram for explaining in more detail a tile-based 360 VR image encoding process according to an embodiment of the present invention.

In the following, the tile-based 360 VR image encoding and decoding processes according to embodiments of the present invention are described on the assumption that the GOP (Group of Pictures) size set for encoding is 30 and that one frame is divided into 32 spatial regions, that is, 32 tiles. Depending on the encoding implementation, the GOP may of course have a size other than 30, and one frame may have a number of tiles other than 32.

Referring to FIG. 2, the first GOP consists of 30 frames, frame 0 through frame 29, and the second GOP consists of 30 frames, frame 30 through frame 59. The frames included in each GOP may have the same division structure. Here, the division structure may include at least one of the number of divided regions, the positions of the divided regions, or the sizes of the divided regions. The division structure of the frames may be set differently for each GOP. For example, when the second GOP has a division structure different from that of the first GOP, information on the updated division structure may be encoded for the second GOP.

In the example shown in FIG. 2, each of the frames F0, F1, F2, ..., F29 of the first GOP and the frames F30, F31, ..., F59 of the second GOP is shown as containing 32 tiles, T0, T1, ..., T31. Unlike the illustrated example, the frames of the first GOP and the frames of the second GOP may be set to contain different numbers of tiles. Alternatively, the number of tiles may be the same while the positions or sizes of the tiles are set differently.

Referring again to FIG. 2, the 360 VR image 11a is input to the input manager 10 sequentially in the order of frames F0, F1, ..., F29, F30, ..., F60, ..., as shown at 210.

A GOP is set when the image is encoded, and the input manager 10 may buffer one GOP's worth of images. As shown at 220 in FIG. 2, the input manager 10 divides the frames (F0, F1, ..., F29, F30, ..., F60, ...) included in each GOP into tiles and delivers the divided tiles 13a sequentially to the video encoder 20. Each tile may be encoded independently. For example, a motion constraint may be applied between tiles so that encoding parameters have no dependency across tiles.

The input manager 10 may sequentially input to the video encoder 20 the tiles at the same position from the first frame of the GOP to the last frame of the GOP (hereinafter referred to as a tile set or tile sequence). The video encoder 20 may then generate a bitstream per tile set instead of per frame. This process is repeated until all tile sets in the GOP have been input, so that as many bitstreams are generated as there are tile sets. The tile-set input process within a GOP is repeated for each GOP.
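The input order described above can be sketched as a generator. This is only an illustration under the stated assumptions (GOP size 30, 32 tiles per frame); the name `encoder_input_order` is hypothetical, and a real input manager would pass tile pixel data rather than indices.

```python
from typing import Iterator, Tuple


def encoder_input_order(num_frames: int, gop_size: int,
                        num_tiles: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (GOP index, tile index, frame index) in the order tiles reach the encoder."""
    for gop_start in range(0, num_frames, gop_size):
        gop_end = min(gop_start + gop_size, num_frames)
        for tile in range(num_tiles):                # one tile set at a time
            for frame in range(gop_start, gop_end):  # same position, first to last frame
                yield gop_start // gop_size, tile, frame


order = list(encoder_input_order(num_frames=60, gop_size=30, num_tiles=32))
```

In this ordering the encoder sees 30 consecutive pictures of tile T0, then 30 of T1, and so on, before moving to the next GOP, so each run of 30 pictures can be closed as its own bitstream.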

In this process, the video encoder 20 may be configured so that the processing time (n sec) for the frames in a GOP is the same whether encoding is performed per frame or per tile set. That is, the input manager 10 inputs the tile sets sequentially to the video encoder 20 so that the time (n sec) taken to process all tiles in the GOP (see 230 in FIG. 2) equals the time taken to process all frames in the GOP.

To this end, among the encoding parameters of the video encoder 20, the size of the input image 13a of the video encoder 20 may be set to the tile size, and the frame rate of the input image 13a of the video encoder 20 may be set to the total number of tiles in the GOP (that is, (frame rate of the 360 VR image) x (number of tiles)). For example, when the 360 VR image in the example of FIG. 2 is 3840 x 2160 in size and is input at a frame rate of 30 fps, the image-size encoding parameter of the video encoder 20 is the tile size, 480 x 540 (that is, the size of the image input to the video encoder 20 equals the tile size), and the frame-rate encoding parameter (the frame rate of the input image 13a of the video encoder 20) is 960 fps, the total number of tiles in the GOP. In general, encoding speed increases in proportion as the image size decreases, so the tiles can be processed at high speed. The frame rate of the input image may be included in the division structure described above.
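The parameter arithmetic in this example can be checked with a short sketch. The 8 x 4 tile grid is an assumption inferred from the stated sizes (3840 / 480 = 8 columns, 2160 / 540 = 4 rows, 32 tiles); the function name `tile_encoder_params` is hypothetical.

```python
from typing import Dict


def tile_encoder_params(width: int, height: int, fps: int,
                        cols: int, rows: int) -> Dict[str, int]:
    """Derive the encoder's input size and frame rate for tile-set encoding."""
    if width % cols or height % rows:
        raise ValueError("frame size must be divisible by the tile grid")
    return {
        "width": width // cols,           # encoder input width = tile width
        "height": height // rows,         # encoder input height = tile height
        "frame_rate": fps * cols * rows,  # source frame rate x number of tiles
    }


params = tile_encoder_params(width=3840, height=2160, fps=30, cols=8, rows=4)
# Pixels per second handled by the encoder stay the same as for whole frames.
pixel_rate_tiles = params["width"] * params["height"] * params["frame_rate"]
pixel_rate_frames = 3840 * 2160 * 30
```

The two pixel rates are equal, which is consistent with the requirement that a GOP take the same n sec whether it is encoded per frame or per tile set.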

FIG. 3 is a conceptual diagram for explaining in more detail a tile-based 360 VR image decoding process according to an embodiment of the present invention.

Referring to FIG. 3, the video decoder 30 may receive from the 360 VR server the bitstreams of the tile sets needed to compose the viewport, and may decode the received tile-set bitstreams in the reverse order of the encoding described above. Specifically, of the tile-set bitstreams, the video decoder 30 may receive and decode only those needed to compose the viewport.

In the 0th frame F0, the viewport, shown as rectangle 312, spans tiles T1, T2, T3, T4, T9, T10, T11, and T12.

Accordingly, the video decoder 30 sequentially receives and decodes the tile sets for tiles T1, T2, T3, T4, T9, T10, T11, and T12, which correspond to the viewport. Each tile set may include the co-located tiles in every frame of the GOP (that is, from the first frame F0 to the last frame F29). In other words, the tile sets used to decode the viewport of the first frame may include tiles T1, T2, T3, T4, T9, T10, T11, and T12 of every frame in the GOP. The dotted rectangle labeled 320 in FIG. 3 shows an example in which the tile sets decoded by the video decoder 30 for one GOP, which contain the tiles T1, T2, T3, T4, T9, T10, T11, and T12 of the first frame's viewport region, are output from the decoder over n sec.
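One way to pick the tile sets for a viewport is sketched below. It assumes an 8 x 4 grid of 480 x 540 tiles numbered in row-major order (consistent with the tile list above, but not stated explicitly in the description), treats the viewport as an axis-aligned pixel rectangle, and ignores wrap-around at the left and right edges of the projected image. The function name and the sample rectangle are hypothetical.

```python
from typing import List


def tiles_for_viewport(x: int, y: int, w: int, h: int,
                       tile_w: int = 480, tile_h: int = 540,
                       cols: int = 8, rows: int = 4) -> List[int]:
    """Return row-major indices of the tiles overlapped by the viewport rectangle."""
    first_col = max(x // tile_w, 0)
    last_col = min((x + w - 1) // tile_w, cols - 1)
    first_row = max(y // tile_h, 0)
    last_row = min((y + h - 1) // tile_h, rows - 1)
    return [r * cols + c
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# A sample viewport chosen to cover columns 1 to 4 of the top two tile rows.
needed = tiles_for_viewport(x=500, y=100, w=1800, h=900)
```

For this sample rectangle the result is the same eight tiles listed for rectangle 312, and the terminal would request only those tile-set bitstreams for the GOP.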

That is, as described above, the video decoder 30 receives and decodes the tile sets corresponding to the viewport and sequentially passes the decoded tiles 13b to the output manager 40.

The output manager 40 reconstructs the decoded tiles 13b into a 360 VR image that can be rendered. Reconstructing the 360 VR image requires knowing the position of each tile within the image; this position can be obtained, for example, using the Spatial Relationship Description (SRD) of the MPEG DASH standard.
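As a rough sketch of how a terminal might read a tile position from SRD metadata, the snippet below parses a comma-separated SRD `value` string. The field order used here (source id, x, y, width, height, total width, total height) reflects a common reading of the SRD scheme and should be checked against the standard; the class name, function name, and sample string are hypothetical.

```python
from typing import NamedTuple


class TilePosition(NamedTuple):
    source_id: int
    x: int
    y: int
    width: int
    height: int
    total_width: int
    total_height: int


def parse_srd_value(value: str) -> TilePosition:
    """Parse the first seven comma-separated integers of an SRD value string.

    Assumes all seven fields are present; the field order is an assumption
    to be verified against the MPEG DASH specification.
    """
    fields = [int(field) for field in value.split(",")]
    return TilePosition(*fields[:7])


# Hypothetical value for tile T1 of a 3840x1920 image split into 480x480 tiles.
position = parse_srd_value("0,480,0,480,480,3840,1920")
print(position.x, position.y)
```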

With the tile-based 360 VR image encoding and decoding methods according to the embodiment described above, a 360 VR image service can be provided with only one video encoder and one video decoder.

If the viewport does not change while a GOP is being played, selectively receiving the tile-set bitstreams based on the viewport of the first frame of the GOP, as in the example of FIG. 3, can improve system efficiency. With the method of FIG. 3, however, if the viewport changes during GOP playback because of the user's head or gaze movement, the tiles for the region newly brought into the viewport are not being received, so the changed viewport cannot be rendered completely. That is, the region corresponding to the changed viewport can be decoded only once the next GOP starts, so a viewport change during GOP playback may interrupt playback of the 360 VR image and inconvenience the viewer.

That is, if the viewport, indicated by 312 in FIG. 3, moves outside the region formed by tiles T1, T2, T3, T4, T9, T10, T11, and T12 within the first GOP, a new tile or tiles corresponding to the changed viewport are needed, but they cannot be decoded until the next GOP.

Accordingly, another embodiment of the present invention extends the system to use a plurality of video encoders so as to provide a 360 VR image service with smooth playback.

For convenience of description, the embodiments described below assume two encoders. Using more than two encoders also falls within the scope of the present invention.

FIG. 4 is a conceptual diagram illustrating a tile-based 360 VR image encoding process using two video encoders according to another embodiment of the present invention.

Referring to FIG. 4, a 360 VR system according to another embodiment of the present invention consists of a 360 VR server 400a and a 360 VR terminal 400b. The 360 VR server 400a includes an input manager 410, a first video encoder 420a, and a second video encoder 420b; the 360 VR terminal 400b includes a video decoder 430 and an output manager 440.

The input manager 410 and the video encoders 420a and 420b operate according to the tile-based high-speed encoding method described above.

Specifically, when a 360 VR image 401a is input to the 360 VR server 400a, the input manager 410 of the 360 VR server 400a divides the input 360 VR image 401a into a plurality of tiles and passes at least one tile 413a, at high speed and sequentially, to the first video encoder 420a and the second video encoder 420b. The video encoders 420a and 420b receive the at least one tile 413a and generate tile bitstreams 421a and 421b.

The first video encoder 420a and the second video encoder 420b may encode the same video source at different qualities. Specifically, the first video encoder 420a encodes the tiles at high quality to generate a high-quality bitstream, and the second video encoder 420b encodes the tiles at low quality to generate a low-quality bitstream.

The video decoder 430 may request from the server the tile-set bitstreams needed to render the viewport. For the region corresponding to the viewport, the video decoder 430 receives and decodes the high-quality tile-set bitstream 421a; for the region outside the viewport, it receives and decodes the low-quality tile-set bitstream 421b.

The video decoder 430 decodes the received tile streams and passes the decoded tiles 413b to the output manager 440, which arranges the decoded tiles 413b so that they form the viewport of the 360 VR image and can be rendered.

Thus, if the user's head or gaze movement takes the viewport outside the tile region encoded at high quality, the 360 VR image can be rendered from the decoded low-quality tile-set bitstream corresponding to the changed viewport. When the next GOP starts, the high-quality tile-set bitstreams to receive are reselected based on the changed viewport, which makes it possible to provide a 360 VR image service with smooth playback.
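The fallback behavior during a GOP can be illustrated with a small sketch: for each tile the current viewport needs, the renderer uses the high-quality decode if that tile was requested at high quality for this GOP, and the low-quality decode otherwise. The function name `layers_for_render` and the moved viewport are assumptions chosen for illustration.

```python
def layers_for_render(current_viewport, high_quality_tiles):
    """Map each tile in the current viewport to the quality layer available for it."""
    high = set(high_quality_tiles)
    return {tile: ("high" if tile in high else "low")
            for tile in sorted(current_viewport)}


# High-quality tiles chosen from the viewport at the first frame of the GOP.
high_tiles = [1, 2, 3, 4, 9, 10, 11, 12]
# Hypothetical viewport after the user turns one tile to the right mid-GOP.
moved_viewport = [2, 3, 4, 5, 10, 11, 12, 13]

layers = layers_for_render(moved_viewport, high_tiles)
print(layers)
```

In this example tiles T5 and T13 are rendered from the low-quality stream until the next GOP, while the other viewport tiles stay at high quality.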

The high-quality and low-quality tile-set bitstreams may be processed by a single video decoder or by two video decoders.

FIG. 5 is a conceptual diagram illustrating a tile-based 360 VR image decoding process when two encoders are used according to another embodiment of the present invention.

Referring to FIG. 5, the first GOP consists of 30 frames, frame 0 through frame 29, and the second GOP consists of 30 frames, frame 30 through frame 59. The frames in each GOP may share the same division structure. In the example of FIG. 5, each of frames F0, F1, F2, ..., F29 of the first GOP and frames F30, F31, ..., F59 of the second GOP contains 32 tiles, T0, T1, ..., T31. Unlike the illustrated example, the frames of the first GOP and the frames of the second GOP may be set to contain different numbers of tiles. Alternatively, the number of tiles may be the same while the tile positions or sizes differ.

In decoding the first GOP, the decoder 430 may receive and decode the high-quality tile-set bitstream for the regions (512-0 through 512-29) corresponding to the viewport of the first frame F0. Specifically, the decoder may decode the high-quality tile sets for T1, T2, T3, T4, T9, T10, T11, and T12. For the remaining region, outside the region corresponding to the viewport of F0, the decoder 430 may receive and decode the low-quality tile-set bitstream.

In decoding the second GOP, the decoder 430 may receive and decode the high-quality tile-set bitstream for the regions (512-30 through 512-59) corresponding to the viewport of the first frame F30. Specifically, the decoder 430 may decode the high-quality tile sets for T4, T5, T6, T7, T12, T13, T14, and T15. For the remaining region, outside the region corresponding to the viewport of F30, the decoder 430 may receive and decode the low-quality tile-set bitstream.
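The per-GOP selection in the two paragraphs above can be sketched as a request plan that samples the viewport at the first frame of each GOP. This is a minimal sketch, assuming 30-frame GOPs and 32 tiles as in FIG. 5; the function name `plan_requests` and the dictionary layout are hypothetical.

```python
def plan_requests(viewport_at_frame, num_frames, total_tiles=32, gop_size=30):
    """Build one request plan per GOP from the viewport at each GOP's first frame.

    viewport_at_frame maps a frame index to the tile indices in the viewport;
    only the entries for the first frame of each GOP are consulted.
    """
    plans = []
    for first_frame in range(0, num_frames, gop_size):
        high = set(viewport_at_frame[first_frame])
        low = set(range(total_tiles)) - high
        plans.append({
            "first_frame": first_frame,
            "high": sorted(high),   # tiles requested from the high-quality stream
            "low": sorted(low),     # remaining tiles, low-quality stream
        })
    return plans


viewports = {
    0: [1, 2, 3, 4, 9, 10, 11, 12],     # viewport at F0
    30: [4, 5, 6, 7, 12, 13, 14, 15],   # viewport at F30
}
plans = plan_requests(viewports, num_frames=60)
for plan in plans:
    print(plan["first_frame"], plan["high"])
```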

Corresponding to the per-frame high-quality tile regions of the GOPs described above, the high-quality portion of the tiles 413b decoded by the video decoder 430 is labeled 512.

As shown in FIG. 5, the video decoder 430 must process all of the low-quality and high-quality tiles in a GOP within n seconds.

FIG. 6 is a conceptual diagram showing the viewport region 512 in frames F0, F1, ..., F29, F30, F31, ..., F59 as the user's head or gaze moves during the tile-based 360 VR image decoding process of FIG. 5. For example, FIG. 6 shows the viewport region 512-0 in frame F0, the viewport region 512-1 in frame F1, ..., the viewport region 512-29 in frame F29, the viewport region 512-30 in frame F30, the viewport region 512-31 in frame F31, ..., and the viewport region 512-59 in frame F59.

FIG. 7 is a flowchart illustrating a tile-based 360 VR image encoding process in a 360 VR system according to an embodiment of the present invention.

Referring to FIG. 7, a tile-based 360 VR image encoding method according to an embodiment of the present invention may include dividing a 360 VR (Virtual Reality) image into a plurality of regions based on a division structure of the 360 VR image (S710), generating a region sequence using the plurality of divided regions (S720), generating a bitstream for the generated region sequence (S730), and/or transmitting the generated bitstream (S740).

The region sequence may include regions at the same position in at least one frame included in the 360 VR image.
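One way to picture a region sequence is as a regrouping of a GOP from "frames of regions" into "regions across frames". The sketch below assumes every frame in the GOP has the same division structure and represents each region by a label; the function name `build_region_sequences` is hypothetical.

```python
def build_region_sequences(gop_frames):
    """Group co-located regions across the frames of one GOP.

    gop_frames is a list of frames, each a list of regions in index order.
    The result has one list per region index, ordered by frame.
    """
    num_regions = len(gop_frames[0])
    return [[frame[region] for frame in gop_frames]
            for region in range(num_regions)]


# Toy GOP: 3 frames, 4 regions per frame, regions represented by labels.
gop = [[f"F{f}T{t}" for t in range(4)] for f in range(3)]
sequences = build_region_sequences(gop)
print(sequences[1])
```

Each resulting sequence can then be handed to an encoder as if it were a small video of its own.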

The region may be at least one of a tile and a subpicture.

The division structure of the 360 VR image may be determined per GOP (Group of Pictures).

Generating the bitstream for the generated region sequence (S730) may include generating a bitstream for at least one region sequence included in the GOP. It may also include repeating the generation until bitstreams have been generated for all region sequences included in the GOP.

The bitstream may include a first bitstream and a second bitstream generated from at least one region sequence included in the GOP, and the first bitstream and the second bitstream may differ in image quality. For example, the first bitstream may have higher image quality than the second bitstream. That is, the first bitstream may be a high-quality bitstream and the second bitstream a low-quality bitstream.

The first bitstream may be generated using a first video encoder, and the second bitstream may be generated using a second video encoder different from the first video encoder.

The division structure of the 360 VR image may include at least one of the number, position, and size information of the regions and a frame rate.

The frame rate may be set so that the time taken to generate bitstreams for all region sequences included in a GOP equals the time taken to generate bitstreams for all frames included in the GOP.
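The frame rate condition can be checked with simple arithmetic. If a GOP of N frames is split into K region sequences, encoding the regions one sequence at a time takes as long as encoding the N whole frames only if the encoder processes region frames K times faster than the source frame rate. The sketch below assumes a 30 fps source (the source frame rate is not stated in the text) with the 30-frame, 32-tile GOP of FIG. 5; the function name is hypothetical.

```python
def region_frame_rate(source_fps, num_regions):
    """Rate, in region frames per second, needed to keep pace with the source."""
    return source_fps * num_regions


gop_frames = 30      # frames per GOP, as in FIG. 5
regions = 32         # tiles per frame, as in FIG. 5
source_fps = 30.0    # assumed source frame rate

frame_time = gop_frames / source_fps                 # time to encode the GOP frame by frame
rate = region_frame_rate(source_fps, regions)
region_time = (gop_frames * regions) / rate          # time to encode all region sequences

print(rate, frame_time, region_time)
```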

FIG. 8 is a flowchart illustrating a tile-based 360 VR image decoding process in a 360 VR system according to an embodiment of the present invention.

Referring to FIG. 8, a tile-based 360 VR image decoding method according to an embodiment of the present invention may include receiving a bitstream encoded in units of region sequences (S810), decoding the received bitstream to obtain a plurality of regions (S820), and/or rendering an image to be reproduced based on the plurality of regions (S830).

The region sequence may include regions at the same position in at least one frame included in a 360 VR (Virtual Reality) image.

The region may be at least one of a tile and a subpicture.

The bitstream may include at least two bitstreams of different image quality, and for the viewport region a bitstream of higher image quality than for the remaining region may be received. That is, a high-quality bitstream may be received for the viewport region and a low-quality bitstream for the remaining region.

The viewport region may be determined based on the first frame among the frames included in a GOP (Group of Pictures).

The viewport region may be updated on a per-GOP basis.

The at least two bitstreams of different image quality may be decoded by a single video decoder. Alternatively, the high-quality bitstream and the low-quality bitstream may be processed by a plurality of video decoders.

Rendering the image to be reproduced based on the plurality of regions (S830) may include arranging the plurality of regions, one region sequence at a time, at the same positions they occupied when input to the video encoder.

The arranging may be repeated until all region sequences included in the GOP have been arranged.
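The arrangement step can be sketched as pasting each decoded region back at the grid position it had before encoding. This is a toy illustration using nested lists for pixel data and a row-by-row tile numbering; the function name `assemble_frame` and the 2x2 grid are assumptions.

```python
def assemble_frame(tiles, cols, rows, tile_w, tile_h):
    """Place decoded tiles back at their original grid positions.

    tiles maps a tile index to a 2D list with tile_h rows of tile_w values.
    Positions not covered by any tile stay None.
    """
    frame = [[None] * (cols * tile_w) for _ in range(rows * tile_h)]
    for index, tile in tiles.items():
        top = (index // cols) * tile_h
        left = (index % cols) * tile_w
        for dy in range(tile_h):
            frame[top + dy][left:left + tile_w] = tile[dy]
    return frame


# Toy example: four 2x2 tiles, each filled with its own index.
decoded = {i: [[i] * 2 for _ in range(2)] for i in range(4)}
frame = assemble_frame(decoded, cols=2, rows=2, tile_w=2, tile_h=2)
for row in frame:
    print(row)
```

Repeating this for each frame position of every region sequence rebuilds the frames of the GOP.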

When the viewport region changes, a bitstream of higher image quality than for the remaining region may be received for the changed viewport region, based on at least one of the changed position and GOP information.

The plurality of regions are divided from the 360 VR image based on the division structure of the 360 VR image, and the division structure of the 360 VR image may include at least one of the number, position, and size information of the regions and a frame rate.

Although the exemplary methods of the present disclosure are presented as a series of operations for clarity of description, this is not intended to limit the order in which the steps are performed; where necessary, the steps may be performed simultaneously or in a different order. To implement a method according to the present disclosure, the illustrated steps may be supplemented with other steps, some steps may be omitted while the rest are kept, or some steps may be omitted and additional steps included.

The various embodiments of the present disclosure do not enumerate every possible combination but are intended to describe representative aspects of the disclosure; the matters described in the various embodiments may be applied independently or in combinations of two or more.

In addition, the various embodiments of the present disclosure may be implemented in hardware, firmware, software, or a combination thereof. In a hardware implementation, they may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), general-purpose processors, controllers, microcontrollers, microprocessors, and the like.

The scope of the present disclosure includes software or machine-executable instructions (for example, an operating system, applications, firmware, or programs) that cause operations according to the methods of the various embodiments to be executed on a device or computer, and a non-transitory computer-readable medium on which such software or instructions are stored and from which they can be executed on a device or computer.

Claims

An image encoding method comprising:
dividing a 360 VR (Virtual Reality) image into a plurality of regions based on a division structure of the 360 VR image;
generating a region sequence using the plurality of divided regions;
generating a bitstream for the generated region sequence; and
transmitting the generated bitstream,
wherein the region sequence includes regions at the same position in at least one frame included in the 360 VR image.

The method of claim 1,
wherein the region is at least one of a tile and a subpicture.

The method of claim 1,
wherein the division structure of the 360 VR image is determined per GOP (Group of Pictures), and
wherein generating the bitstream comprises
generating a bitstream for at least one region sequence included in the GOP.

The method of claim 3,
wherein generating the bitstream comprises
repeating the generation until bitstreams are generated for all region sequences included in the GOP.

The method of claim 4,
wherein the bitstream includes
a first bitstream and a second bitstream generated from at least one region sequence included in the GOP, and
wherein the first bitstream and the second bitstream have different image qualities.

The method of claim 5,
wherein the first bitstream has higher image quality than the second bitstream.

The method of claim 5,
wherein the first bitstream is generated using a first video encoder, and
wherein the second bitstream is generated using a second video encoder different from the first video encoder.

The method of claim 1,
wherein the division structure of the 360 VR image includes at least one of the number, position, and size information of the regions and a frame rate.

The method of claim 8,
wherein the frame rate
is set so that a time for generating bitstreams for all region sequences included in a GOP is equal to a time for generating bitstreams for all frames included in the GOP.

An image decoding method comprising:
receiving a bitstream encoded in units of region sequences;
decoding the received bitstream to obtain a plurality of regions; and
rendering an image to be reproduced based on the plurality of regions,
wherein the region sequence includes regions at the same position in at least one frame included in a 360 VR (Virtual Reality) image.

The method of claim 10,
wherein the region is at least one of a tile and a subpicture.

The method of claim 10,
wherein the bitstream includes
at least two bitstreams having different image qualities, and
wherein, for a viewport region, a bitstream having higher image quality than for the remaining region excluding the viewport region is received.

The method of claim 12,
wherein the viewport region is determined based on the first frame among the frames included in a GOP (Group of Pictures).

The method of claim 12,
wherein the viewport region is updated on a per-GOP basis.

The method of claim 12,
wherein the at least two bitstreams having different image qualities are decoded by one video decoder.

The method of claim 10,
wherein rendering the image to be reproduced comprises
arranging the plurality of regions, in units of the region sequence, at the same positions as when input to a video encoder.

The method of claim 16,
wherein the arranging
is repeated until all region sequences included in the GOP are arranged.

The method of claim 12,
wherein, when the viewport region changes, a bitstream having higher image quality than for the remaining region excluding the viewport region is received for the changed viewport region, based on at least one of the changed position and GOP information.

The method of claim 10,
wherein the plurality of regions are divided from the 360 VR image based on a division structure of the 360 VR image, and
wherein the division structure of the 360 VR image includes at least one of the number, position, and size information of the regions and a frame rate.

The method of claim 19,
wherein the frame rate
is set so that a time for generating bitstreams for all region sequences included in a GOP is equal to a time for generating bitstreams for all frames included in the GOP.