KR101941789B1

KR101941789B1 - Virtual reality video transmission based on viewport and tile size

Info

Publication number: KR101941789B1
Application number: KR1020170170985A
Authority: KR
Inventors: 류은석; 손장우
Original assignee: 가천대학교 산학협력단
Priority date: 2017-12-13
Filing date: 2017-12-13
Publication date: 2019-01-24

Abstract

Disclosed is an image transmission method capable of reducing transmission bandwidth. The method comprises the operations of: dividing the entire area of a virtual reality space into at least one rectangular tile; generating standard-definition video data for the entire area of the virtual reality space; generating high-definition video data for at least one of the tiles included in a viewport, based at least in part on the ratio of the tiles included in the viewport that a user is viewing within the virtual reality space; and transmitting a bitstream including at least a part of the standard-definition video data and the high-definition video data.

Description

VIRTUAL REALITY VIDEO TRANSMISSION BASED ON VIEWPORT AND TILE SIZE

This specification relates to virtual reality video transmission based on viewport and tile size.

With the recent development of virtual reality technology and equipment, wearable devices such as head-mounted displays (HMDs) have been introduced. Because a head-mounted display plays video directly in front of the eyes and must render a spherical 360-degree scene, it requires video of ultra-high-definition (UHD) quality or higher.

Because transmitting ultra-high-definition video requires high bandwidth, video standardization meetings (MPEG, JCT-VC, JVET) are discussing, as one technique for lowering the bandwidth, transmitting only the video corresponding to the user's viewport in high quality and transmitting the remaining area in low quality.

To transmit only the video corresponding to the viewport, which is the area the user is looking at within the 360-degree virtual reality space, motion-constrained tile sets (MCTS) are being discussed. MCTS restrict motion prediction and compensation so that tiles can be transmitted individually or as partial sets.

When MCTS is applied and only the tile video corresponding to the viewport is transmitted, tiles that the viewport overlaps only slightly are also transmitted, which wastes bandwidth, while tiles that are not transmitted are decoded at low quality with no quality improvement. There is therefore a growing need for an efficient transmission technique that takes the viewport and the video tile size into account, and for a technique that can improve image quality.

This specification presents an image transmission method. The image transmission method may include: dividing the entire area of a virtual reality space into at least one rectangular tile; generating base layer video data or base-quality video data for the entire area of the virtual reality space; generating enhancement layer video data or high-quality video data for at least one of the tiles included in a viewport, based at least in part on the ratio of the tiles included in the viewport that the user is viewing within the virtual reality space; and transmitting a bitstream including the base layer video data or the base-quality video data, and at least a part of the enhancement layer video data or the high-quality video data.

The method and other embodiments may include the following features.

The ratio of the tiles included in the viewport may be obtained based at least in part on tile number information and inclusion ratio information, received from an image receiving apparatus, for the tiles included in the viewport.
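As an illustration of the tile number and inclusion ratio information, the following minimal Python sketch computes, for each tile, the fraction of the tile's area that lies inside the viewport. It rests on assumptions not stated in the text: the ratio is taken to be overlap area divided by tile area, the viewport is approximated as an axis-aligned rectangle on the projected frame, and wrap-around at the 360-degree seam is ignored. The names `Rect`, `overlap_area`, and `tile_ratios` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle on the projected frame (hypothetical helper type)."""
    x: float
    y: float
    w: float
    h: float


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles, or 0.0 if they do not overlap."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def tile_ratios(viewport: Rect, tiles: dict[int, Rect]) -> dict[int, float]:
    """Map tile number -> fraction of that tile covered by the viewport.

    Assumption: "inclusion ratio" means overlap area / tile area.
    Tiles that the viewport does not touch are omitted.
    """
    ratios = {}
    for number, tile in tiles.items():
        area = overlap_area(viewport, tile)
        if area > 0:
            ratios[number] = area / (tile.w * tile.h)
    return ratios
```

A client could report the resulting `{tile number: ratio}` pairs to the server alongside its viewport information.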

In addition, the operation of generating enhancement layer video data or high-quality video data for at least one of the tiles included in the viewport, based at least in part on the ratio of the tiles included in the viewport, may include: assigning priorities to the tiles included in the viewport in order from the tile with the largest ratio to the tile with the smallest ratio; and generating enhancement layer video data or high-quality video data for the at least one tile based on the priorities and the bandwidth of the communication line over which the bitstream is transmitted.

In addition, the operation of generating enhancement layer video data or high-quality video data for the at least one tile based on the priorities and the bandwidth of the communication line over which the bitstream is transmitted may generate the enhancement layer video data or high-quality video data in order from the highest-priority tile to the lowest-priority tile, up to the limit that the bandwidth allows.

In addition, the operation of generating enhancement layer video data or high-quality video data for at least one of the tiles included in the viewport, based at least in part on the ratio of the tiles included in the viewport, may generate enhancement layer video data or high-quality video data for those tiles included in the viewport whose ratio is equal to or greater than a specific value.
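The two selection rules described above (priority order within the available bandwidth, and a minimum-ratio threshold) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the per-tile cost table `tile_cost_bits`, the bit budget, tie-breaking by tile number, and the choice to stop at the first tile that no longer fits are all assumptions introduced here.

```python
def prioritize(ratios: dict[int, float]) -> list[int]:
    """Order tile numbers from largest to smallest inclusion ratio.

    Ties are broken by tile number (an assumption; the text does not say).
    """
    return sorted(ratios, key=lambda n: (-ratios[n], n))


def select_by_bandwidth(ratios: dict[int, float],
                        tile_cost_bits: dict[int, int],
                        budget_bits: int) -> list[int]:
    """Pick tiles in priority order until the bandwidth budget is exhausted.

    tile_cost_bits is a hypothetical estimate of the enhancement-layer size
    of each tile. Selection stops at the first tile that does not fit.
    """
    selected = []
    remaining = budget_bits
    for number in prioritize(ratios):
        cost = tile_cost_bits[number]
        if cost > remaining:
            break
        selected.append(number)
        remaining -= cost
    return selected


def select_by_threshold(ratios: dict[int, float], min_ratio: float) -> list[int]:
    """Pick every tile whose inclusion ratio is at least min_ratio."""
    return [n for n in prioritize(ratios) if ratios[n] >= min_ratio]
```

Stopping at the first tile that does not fit keeps strict priority order; skipping that tile and trying smaller ones would use the budget more fully at the cost of that ordering.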

This specification also presents an image transmission apparatus. The image transmission apparatus may include: a base layer encoder that generates base layer video data or base-quality video data for the entire area of a virtual reality space; an enhancement layer encoder that generates enhancement layer video data or high-quality video data for at least one of the tiles included in a viewport, based at least in part on the ratio of the tiles included in the viewport that the user is viewing within the virtual reality space; a multiplexer that generates a bitstream including the base layer video data or the base-quality video data, and at least a part of the enhancement layer video data or the high-quality video data; and a communication unit that receives ratio information for the tiles included in the viewport and transmits the bitstream.

The apparatus and other embodiments may include the following features.

The tiles included in the viewport may be assigned priorities in order from the tile with the largest ratio to the tile with the smallest ratio, and the enhancement layer encoder may generate enhancement layer video data or high-quality video data for the at least one tile based on the priorities and the bandwidth of the communication line over which the bitstream is transmitted.

In addition, the enhancement layer encoder may generate enhancement layer video data or high-quality video data in order from the highest-priority tile to the lowest-priority tile, within the limit that the bandwidth allows.

In addition, the enhancement layer encoder may generate enhancement layer video data or high-quality video data for those tiles included in the viewport whose ratio is equal to or greater than a specific value.

This specification also presents an image receiving method. The image receiving method may include: transmitting viewport information on the viewport that the user is viewing within a virtual reality space, tile number information for the tiles included in the viewport, and inclusion ratio information for the tiles included in the viewport; receiving base layer video data or base-quality video data for the entire virtual reality space, which is divided into at least one rectangular tile; receiving at least one piece of enhancement layer video data or high-quality video data for the tiles included in the viewport, which are prioritized according to the ratio; generating upsampled base layer video data or base-quality video data for those tiles included in the viewport for which the enhancement layer video data or the high-quality video data is not received; and decoding the image to be output to the user, based at least in part on the base layer video data or the base-quality video data, the enhancement layer video data or the high-quality video data, and the upsampled base layer video data or base-quality video data.

This specification further presents another image receiving method. The image receiving method may include: transmitting viewport information on the viewport that the user is viewing within a virtual reality space, tile number information for the tiles included in the viewport, and inclusion ratio information for the tiles included in the viewport; receiving base layer video data or base-quality video data for the entire virtual reality space, which is divided into at least one rectangular tile; receiving at least one piece of enhancement layer video data or high-quality video data for those tiles included in the viewport whose ratio is equal to or greater than a specific value; generating upsampled base layer video data or base-quality video data for those tiles included in the viewport for which the enhancement layer video data or the high-quality video data is not received; and decoding the image to be output to the user, based at least in part on the base layer video data or the base-quality video data, the enhancement layer video data or the high-quality video data, and the upsampled base layer video data or base-quality video data.
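The receiver-side fallback in both methods (use the enhancement data where it arrived, otherwise upsample the base data) can be sketched as below. This is a toy model under stated assumptions: tiles are represented as lists of pixel rows, and nearest-neighbor repetition stands in for whatever upsampling filter a real decoder would apply. The function names are hypothetical.

```python
def upsample_nearest(block: list[list[int]], factor: int) -> list[list[int]]:
    """Enlarge a 2D pixel block by repeating each pixel `factor` times in x and y.

    Nearest-neighbor repetition is a stand-in chosen for brevity.
    """
    out = []
    for row in block:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def reconstruct_viewport(viewport_tiles: list[int],
                         base_tiles: dict[int, list[list[int]]],
                         enhancement_tiles: dict[int, list[list[int]]],
                         factor: int) -> dict[int, list[list[int]]]:
    """Build the picture for each viewport tile.

    Tiles with received enhancement data use it directly; the others fall
    back to the upsampled base data.
    """
    result = {}
    for number in viewport_tiles:
        if number in enhancement_tiles:
            result[number] = enhancement_tiles[number]
        else:
            result[number] = upsample_nearest(base_tiles[number], factor)
    return result
```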

According to the embodiments disclosed herein, user latency is shorter and transmission bandwidth is lower than when all the tiles corresponding to the viewport are transmitted.

In addition, according to the embodiments disclosed herein, the discomfort felt by the user can be reduced even without transmitting all the tiles corresponding to the viewport.

In addition, according to the embodiments disclosed herein, high-quality video data can be transmitted adaptively according to the bandwidth conditions of the communication line used for video data transmission.

FIG. 1 illustrates an exemplary virtual reality system that provides virtual reality images.
FIG. 2 is a diagram illustrating an exemplary scalable video coding service.
FIG. 3 is a diagram illustrating an exemplary configuration of a server device.
FIG. 4 is a diagram illustrating an exemplary structure of an encoder.
FIG. 5 is a diagram illustrating an exemplary method of signaling a region of interest.
FIG. 6 is a diagram illustrating an exemplary configuration of a client device.
FIG. 7 is a diagram illustrating an exemplary configuration of a control unit.
FIG. 8 is a diagram illustrating an exemplary configuration of a decoder.
FIG. 9 illustrates tile transmission according to the priority of the tiles included in the viewport.
FIG. 10 illustrates tile transmission according to the minimum ratio of the tiles included in the viewport.
FIG. 11 illustrates an exemplary image transmission method.
FIG. 12 illustrates an exemplary image transmission apparatus.
FIG. 13 illustrates an example of an image receiving method based on priority.
FIG. 14 illustrates an example of an image receiving method based on a specific ratio.
FIG. 15 illustrates tile information conveyed in the signaling of image transmission.
FIG. 16 illustrates exemplary OMAF syntax.
FIG. 17 illustrates exemplary tile information syntax expressed in XML.

The technology disclosed herein can be applied to virtual reality video transmission based on viewport and tile size. However, the technology disclosed herein is not limited to this and can also be applied to any electronic device or method to which its technical idea is applicable.

Note that the technical terms used in this specification are used only to describe specific embodiments and are not intended to limit the spirit of the technology disclosed herein. Unless specifically defined otherwise in this specification, the technical terms used herein should be interpreted as they are generally understood by a person of ordinary skill in the field to which the disclosed technology belongs, and should be interpreted neither in an overly broad sense nor in an overly narrow sense. Where a technical term used herein is an incorrect term that fails to express the spirit of the disclosed technology accurately, it should be understood as replaced by a technical term that a person of ordinary skill in the field can correctly understand. General terms used herein should be interpreted as defined in dictionaries or according to context, and should not be interpreted in an overly narrow sense.

Terms that include ordinals, such as first and second, may be used herein to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be named a first component.

Hereinafter, the embodiments disclosed herein are described in detail with reference to the accompanying drawings. The same or similar components are given the same reference numerals regardless of the figure, and redundant descriptions of them are omitted.

In describing the technology disclosed herein, detailed descriptions of related known technologies are omitted where they are judged likely to obscure the gist of the disclosed technology. Note also that the accompanying drawings are provided only to make the spirit of the disclosed technology easy to understand, and the spirit of the technology should not be construed as limited by the drawings.

FIG. 1 illustrates an exemplary virtual reality system that provides virtual reality images.

The virtual reality system may be configured to include a virtual reality image generation device that generates a virtual reality image, a server device that encodes and transmits the input virtual reality image, and one or more client devices that decode the transmitted virtual reality image and output it to a user.

FIG. 1 shows a virtual reality system 100 that includes a virtual reality image generation device 110, a server device 120, and one or more client devices 130. The virtual reality system 100 may also be called a 360-degree image providing system. The number of each component shown in FIG. 1 is only illustrative and is not limiting.

The virtual reality image generation device 110 may include at least one camera module and may generate a spatial image by capturing the space in which it is located.

The server device 120 may generate a 360-degree image by stitching, projecting, and mapping the spatial images generated by and input from the virtual reality image generation device 110, adjust the generated 360-degree image to video data of the desired quality, and then encode it.

The server device 120 may also transmit a bitstream containing the video data and signaling data for the encoded 360-degree image to the client device 130 over a network.

The client device 130 may decode the received bitstream and output the 360-degree image to the user wearing the client device 130. The client device 130 may be a near-eye display device such as a head-mounted display (HMD).

Alternatively, the virtual reality image generation device 110 may be a computer system that generates images of a virtual 360-degree space implemented with computer graphics. The virtual reality image generation device 110 may also be a provider of virtual reality content such as virtual reality games.

The client device 130 may obtain user data from the user of that client device 130. The user data may include the user's image data, voice data, viewport data (gaze data), region of interest data, and additional data.

For example, the client device 130 may include at least one of a 2D/3D camera and an immersive camera for acquiring the user's image data. The 2D/3D camera can capture images with a viewing angle of 180 degrees or less. The immersive camera can capture images with a viewing angle of 360 degrees or less.

For example, the client device 130 may include at least one of a first client device 131 that obtains user data of a first user located at a first place, a second client device 133 that obtains user data of a second user located at a second place, and a third client device 135 that obtains user data of a third user located at a third place.

Each client device 130 may then transmit the acquired user data to the server device 120 over the network.

The server device 120 may receive at least one piece of user data from the client device 130. The server device 120 may generate an entire image of the virtual space based on the received user data. The entire image may be an immersive image that provides video in all 360-degree directions within the virtual space. The server device 120 may generate the entire image by mapping the image data included in the user data onto the virtual space.

The server device 120 may then transmit the entire image to each user.

Each client device 130 may receive the entire image and render and/or display, in the virtual space, only the area that its user is viewing.

FIG. 2 is a diagram illustrating an exemplary scalable video coding service.

A scalable video coding service is an image compression method for providing a variety of services scalably, in terms of time, space, and image quality, according to diverse user environments such as network conditions or terminal resolution in various multimedia environments. A scalable video coding service generally provides scalability in terms of spatial resolution, quality, and time.

Spatial scalability serves the same video encoded at a different resolution for each layer. With spatial scalability, video content can be provided adaptively to devices with various resolutions, such as digital TVs, notebooks, and smartphones.

Referring to the figure, a scalable video coding service can simultaneously support one or more TVs with different characteristics from a video service provider (VSP) through a home gateway in the home. For example, the scalable video coding service can simultaneously support an HDTV (High-Definition TV), an SDTV (Standard-Definition TV), and an LDTV (Low-Definition TV), each with a different resolution.

Temporal scalability adaptively adjusts the frame rate of the video in consideration of the network environment over which the content is transmitted or the performance of the terminal. For example, the service can be provided at a high frame rate of 60 FPS (frames per second) over a local area network and at a low frame rate of 16 FPS over a wireless broadband network such as a 3G mobile network, so that the user receives the video without interruption.

Quality scalability likewise serves content at various image qualities according to the network environment or terminal performance, so that the user can play video content stably.

A scalable video coding service may include a base layer and one or more enhancement layers. When the receiver receives only the base layer, it provides normal-quality video; when it receives the base layer and an enhancement layer together, it can provide high-quality video. That is, given a base layer and one or more enhancement layers, the more enhancement layers (e.g., enhancement layer 1, enhancement layer 2, ..., enhancement layer n) the receiver receives on top of the base layer, the better the image quality or the quality of the provided video.
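The layered behavior can be illustrated with a small sketch that reports the highest quality level a receiver can present from the layers it has received. It assumes, as is typical for layered coding but not stated in the text, that layer k can only be used when layers 0 through k-1 are also available; layer 0 denotes the base layer, and the function name is hypothetical.

```python
def highest_decodable_layer(received_layers: set[int]) -> int:
    """Return the highest usable layer index, or -1 if the base layer is missing.

    Assumption: each enhancement layer depends on all layers below it,
    so only a contiguous run starting at layer 0 (the base layer) counts.
    """
    level = -1
    while level + 1 in received_layers:
        level += 1
    return level
```

With only the base layer received the result is 0 (normal quality); each further contiguous enhancement layer raises it by one.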

Because the video of a scalable video coding service consists of multiple layers in this way, the receiver can quickly receive the small base layer data, quickly process and play normal-quality video, and, when needed, additionally receive enhancement layer video data to raise the quality of service.

FIG. 3 is a diagram illustrating an exemplary configuration of a server device.

The server device 300 may include a control unit 310 and/or a communication unit 320.

The control unit 310 may generate an entire image of the virtual space and encode the generated entire image. The control unit 310 may also control all operations of the server device 300. Details are described below.

The communication unit 320 may transmit data to and/or receive data from an external device and/or a client device. For example, the communication unit 320 may receive user data and/or signaling data from at least one client device. The communication unit 320 may also transmit the entire image of the virtual space and/or an image of a partial area to the client device.

The control unit 310 may include at least one of a signaling data extraction unit 311, an image generation unit 313, a region of interest determination unit 315, a signaling data generation unit 317, and/or an encoder 319.

시그널링 데이터 추출부(311)는 클라이언트 디바이스로부터 전송 받은 데이터로부터 시그널링 데이터를 추출할 수 있다. 예를 들어, 시그널링 데이터는 영상 구성 정보를 포함할 수 있다. 상기 영상 구성 정보는 가상 공간 내에서 사용자의 시선 방향을 지시하는 시선 정보 및 사용자의 시야각을 지시하는 줌 영역 정보를 포함할 수 있다. 또한, 상기 영상 구성 정보는 가상 공간 내에서 사용자의 뷰포트 정보를 포함할 수 있다.The signaling data extracting unit 311 can extract signaling data from the data received from the client device. For example, the signaling data may include image configuration information. The image configuration information may include gaze information indicating a gaze direction of a user and zoom area information indicating a viewing angle of a user in a virtual space. In addition, the image configuration information may include the viewport information of the user in the virtual space.

The image generation unit 313 may generate a full image of the virtual space and an image of a specific region within the virtual space.

The region-of-interest determination unit 315 may determine, within the entire area of the virtual space, a region of interest corresponding to the user's gaze direction. It may also determine the user's viewport within the entire area of the virtual space. For example, the region-of-interest determination unit 315 may determine the region of interest based on the gaze information and/or the zoom area information. For example, the region of interest may be the location of a tile where an important object will be located in the virtual space the user will see (for example, the location where a new enemy appears in a game, or the position of a speaker in the virtual space), and/or the place at which the user is looking. The region-of-interest determination unit 315 may also generate region-of-interest information indicating the region of interest corresponding to the user's gaze direction within the entire area of the virtual space, together with information about the user's viewport.

The signaling data generation unit 317 may generate signaling data for processing the full image. For example, the signaling data may carry the region-of-interest information and/or the viewport information. The signaling data may be transmitted via at least one of Supplemental Enhancement Information (SEI), Video Usability Information (VUI), a slice header, and a file describing the video data.

The encoder 319 may encode the full image based on the signaling data. For example, the encoder 319 may encode the full image in a manner customized for each user based on that user's gaze direction. For example, when a user looks at a specific point in the virtual space, the encoder may, based on the user's gaze in the virtual space, encode the image corresponding to that point in high quality and the image outside that point in low quality. Depending on the embodiment, the encoder 319 may include at least one of the signaling data extraction unit 311, the image generation unit 313, the region-of-interest determination unit 315, and/or the signaling data generation unit 317.

Hereinafter, an exemplary image transmission method using a region of interest is described.

The server device may receive video data and signaling data from at least one client device using the communication unit. The server device may then extract the signaling data using the signaling data extraction unit. For example, the signaling data may include viewpoint information and zoom area information.

The gaze information may indicate which area (point) the user is looking at in the virtual space. When the user looks at a specific area in the virtual space, the gaze information may indicate the direction from the user toward that area.

The zoom area information may indicate the magnification range and/or reduction range of the video data corresponding to the user's gaze direction. The zoom area information may also indicate the user's viewing angle. When the video data is enlarged based on the value of the zoom area information, the user sees only the specific area. When the video data is reduced based on the value of the zoom area information, the user sees not only the specific area but also part and/or all of the area outside it.

Then, the server device may generate a full image of the virtual space using the image generation unit.

Then, the server device may use the region-of-interest determination unit to identify, based on the signaling data, image configuration information on the viewpoint and zoom area at which each user is looking in the virtual space.

Then, the server device may use the region-of-interest determination unit to determine the user's region of interest based on the image configuration information.
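The determination of a region of interest from the image configuration information can be illustrated with a minimal sketch. The specification does not fix a projection or a tile grid, so the equirectangular layout, the 8 x 4 grid, the degree-based parameters, and the function name `roi_tiles` are all assumptions made only for illustration. The gaze direction is given as yaw and pitch, and the zoom area information is taken to be the horizontal and vertical viewing angles.

```python
import math

def roi_tiles(yaw_deg, pitch_deg, h_fov_deg, v_fov_deg, cols=8, rows=4):
    """Return sorted row-major tile indices that the viewport overlaps.

    Assumed layout (not from the specification): an equirectangular frame
    where yaw in [-180, 180) maps to columns and pitch in [-90, 90] maps
    to rows, with row 0 at the top (pitch = +90).
    """
    tile_w = 360.0 / cols
    tile_h = 180.0 / rows
    # Viewport edges in degrees, shifted to the ranges 0..360 and 0..180.
    left = yaw_deg + 180.0 - h_fov_deg / 2.0
    right = yaw_deg + 180.0 + h_fov_deg / 2.0
    top = max(0.0, 90.0 - pitch_deg - v_fov_deg / 2.0)
    bottom = min(180.0, 90.0 - pitch_deg + v_fov_deg / 2.0)

    first_col = math.floor(left / tile_w)
    last_col = math.ceil(right / tile_w) - 1
    first_row = math.floor(top / tile_h)
    last_row = min(rows - 1, math.ceil(bottom / tile_h) - 1)

    tiles = set()
    for r in range(first_row, last_row + 1):
        for c in range(first_col, last_col + 1):
            tiles.add(r * cols + (c % cols))  # wrap around the 360-degree seam
    return sorted(tiles)
```

A narrower viewing angle (zooming in) yields fewer tiles, and a gaze near the left or right edge of the frame wraps around to the opposite columns.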

When the signaling data (for example, at least one of the viewpoint information and the zoom area information) changes, the server device may receive new signaling data. In this case, the server device may determine a new region of interest based on the new signaling data.

Then, the server device may use the control unit to determine, based on the signaling data, whether the data currently being processed corresponds to the region of interest.

When the signaling data changes, the server device may determine, based on the new signaling data, whether the data currently being processed corresponds to the region of interest.

For data corresponding to the region of interest, the server device may use the encoder to encode the video data corresponding to the user's viewpoint (for example, the region of interest) in high quality. For example, the server device may generate base layer video data and enhancement layer video data for that video data and transmit them.

When the signaling data changes, the server device may transmit the video data corresponding to the new viewpoint (the new region of interest) as a high-quality image. If the server device had been transmitting a low-quality image but now transmits a high-quality image because the signaling data changed, the server device may additionally generate and/or transmit enhancement layer video data.

For data not corresponding to the region of interest, the server device may encode the video data that does not correspond to the user's viewpoint (for example, the non-interest region) in low quality. For example, the server device may generate only base layer video data for the video data that does not correspond to the user's viewpoint and transmit it.

When the signaling data changes, the server device may transmit the video data that does not correspond to the user's new viewpoint (the new non-interest region) as a low-quality image. If the server device had been transmitting a high-quality image but now transmits a low-quality image because the signaling data changed, the server device no longer generates and/or transmits the at least one enhancement layer video data, and generates and/or transmits only the base layer video data.
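The layer selection described above can be sketched as follows. This is only an illustration under assumed names: the region identifiers, `layers_per_region`, and `layer_changes` are hypothetical and do not appear in the specification. The first function assigns layers for a given region of interest; the second reports which regions gain or lose enhancement layer data when the region of interest changes.

```python
def layers_per_region(all_regions, roi_regions):
    """Map each region to the layers to send: base only, or base plus enhancement."""
    roi = set(roi_regions)
    return {
        region: ["base", "enhancement"] if region in roi else ["base"]
        for region in all_regions
    }

def layer_changes(old_roi, new_roi):
    """Report where enhancement layer data starts or stops after an ROI change."""
    old, new = set(old_roi), set(new_roi)
    return {
        "start_enhancement": sorted(new - old),  # was low quality, now high quality
        "stop_enhancement": sorted(old - new),   # was high quality, now base layer only
    }
```

Regions that remain in the region of interest across the change appear in neither list, so their enhancement layer data continues uninterrupted.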

That is, because the image quality obtained from the base layer video data alone is lower than the image quality obtained when the enhancement layer video data is also received, the client device may, at the moment it learns from a sensor or the like that the user has turned their head, receive enhancement layer video data for the video data corresponding to the user's gaze direction (for example, the region of interest). The client device can then provide high-quality video data to the user within a short time.

FIG. 4 is a diagram showing an exemplary structure of an encoder.

The encoder 400 (image encoding apparatus) may include at least one of a base layer encoder 410, at least one enhancement layer encoder 420, and a multiplexer 430.

The encoder 400 may encode the full image using a scalable video coding method. The scalable video coding method may include Scalable Video Coding (SVC) and/or Scalable High Efficiency Video Coding (SHVC).

A scalable video coding method is an image compression method for providing various services in a scalable manner in terms of temporal resolution, spatial resolution, and image quality, according to diverse user environments such as network conditions or terminal resolution in various multimedia environments. For example, the encoder 400 may encode images of two or more different qualities (or resolutions, or frame rates) for the same video data to generate a bitstream.

For example, to improve the compression performance of the video data, the encoder 400 may use inter-layer prediction tools, an encoding method that exploits redundancy between layers. Inter-layer prediction tools increase compression efficiency in the enhancement layer (EL) by removing the image redundancy that exists between layers.

The enhancement layer may be encoded with reference to information of a reference layer using the inter-layer prediction tools. The reference layer is the lower layer referred to when encoding the enhancement layer. Because the use of inter-layer tools creates a dependency between layers, decoding the image of the highest layer requires the bitstreams of all the lower layers it references. For an intermediate layer, decoding can be performed by obtaining only the bitstreams of the layer to be decoded and its lower layers. The bitstream of the lowest layer is the base layer (BL), which may be encoded by an encoder such as H.264/AVC or HEVC.
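The layer dependency can be made concrete with a small sketch. It assumes, purely for illustration, that each layer references exactly one lower layer and that layers are identified by integers; the mapping `reference_of` and the function `layers_needed` are hypothetical names.

```python
def layers_needed(target_layer, reference_of):
    """Return the layers whose bitstreams are required to decode target_layer.

    reference_of maps a layer id to the id of its reference layer,
    or to None for the base layer. The result lists the lowest layer first.
    """
    chain = []
    layer = target_layer
    while layer is not None:
        chain.append(layer)
        layer = reference_of.get(layer)
    return list(reversed(chain))

# Assumed three-layer stream: layer 0 is the base layer,
# layer 1 references layer 0, and layer 2 references layer 1.
reference_of = {0: None, 1: 0, 2: 1}
```

With this mapping, decoding the highest layer requires all three bitstreams, while the intermediate layer requires only the base layer and itself.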

The base layer encoder 410 may encode the full image to generate base layer video data (or a base layer bitstream) for the base layer. For example, the base layer video data may include video data for the entire area that the user views in the virtual space. The image of the base layer may be the image of the lowest quality.

The enhancement layer encoder 420 may encode the full image, based on the signaling data (for example, the region-of-interest information) and the base layer video data, to generate at least one piece of enhancement layer video data (or an enhancement layer bitstream) for at least one enhancement layer predicted from the base layer. The enhancement layer video data may include video data for the region of interest within the entire area.

The multiplexer 430 may multiplex the base layer video data, the at least one enhancement layer video data, and/or the signaling data, and generate a single bitstream corresponding to the full image.

FIG. 5 is a diagram illustrating an exemplary method of signaling a region of interest.

FIG. 5 shows a method of signaling a region of interest in scalable video.

The server device (or encoder) may divide one piece of video data (or picture) into several rectangular tiles. For example, the video data may be divided along Coding Tree Unit (CTU) boundaries. For example, one CTU may include a Y CTB, a Cb CTB, and a Cr CTB.

For fast user response, the server device may encode the base layer video data as a whole without dividing it into tiles. The server device may divide part or all of the video data of one or more enhancement layers into several tiles as needed and encode them.

That is, the server device may divide the enhancement layer video data into at least one tile and encode the tiles corresponding to the region of interest (ROI) 510.

Here, the region of interest 510 may correspond to the locations of the tiles where an important object that the user will see in the virtual space will be located (e.g., the location where a new enemy appears in a game, or the position of a speaker in the virtual space in video communication), and/or to the place at which the user is looking.

The server device may also generate region-of-interest information that includes tile information identifying at least one tile included in the region of interest. For example, the region-of-interest information may be generated by the region-of-interest determination unit, the signaling data generation unit, and/or the encoder.

Because the tiles of the region of interest 510 are contiguous, the tile information can be compressed effectively without carrying the number of every tile. For example, the tile information may include not only the numbers of all tiles in the region of interest but also the start and end tile numbers, coordinate point information, a list of Coding Unit (CU) numbers, or tile numbers expressed as a formula.
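One way to exploit the contiguity of the tiles is to send start and end numbers instead of every tile number. The following sketch shows this option only; the function names `compress_tiles` and `expand_tiles` and the (start, end) pair representation are assumptions for illustration, not a format defined by the specification.

```python
def compress_tiles(tile_numbers):
    """Collapse tile numbers into (start, end) runs of consecutive numbers."""
    runs = []
    for n in sorted(set(tile_numbers)):
        if runs and n == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], n)  # extend the current run
        else:
            runs.append((n, n))          # start a new run
    return runs

def expand_tiles(runs):
    """Recover the full list of tile numbers from (start, end) runs."""
    return [n for start, end in runs for n in range(start, end + 1)]
```

A region of interest covering tiles 5 through 8 is then carried as the single pair (5, 8), and a rectangular region spanning two rows of a row-major grid becomes one pair per row.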

The tile information of the non-interest region may be entropy-coded by the encoder and then transmitted to another client device, an image processing computing device, and/or a server.

The region-of-interest information may be delivered through a High-Level Syntax Protocol that carries session information. The region-of-interest information may also be delivered in packet units of the video standard such as Supplemental Enhancement Information (SEI), Video Usability Information (VUI), or a slice header. The region-of-interest information may also be delivered in a separate file describing the video file (e.g., the MPD of DASH).

Hereinafter, a method of signaling a region of interest in single-screen video is described.

For single-screen video rather than scalable video, the exemplary technique of this specification may lower image quality by downscaling (downsampling) the areas outside the region of interest (ROI). In the prior art, the filter information used for downscaling is not shared between the terminals using the service; either a single technique is agreed on in advance or only the encoder knows the filter information.

The server device, however, may deliver the filter information used during encoding to the client device, so that the client device (or HMD terminal) receiving the encoded image can improve, even slightly, the image quality of the downscaled area outside the region of interest. This technique can substantially reduce image processing time in practice and can provide improved image quality.

As described above, the server device may generate region-of-interest information. For example, the region-of-interest information may include filter information in addition to tile information. For example, the filter information may include the index of an agreed filter candidate and the values used in the filter.
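Region-of-interest information that carries both tile information and filter information might be packaged as in the sketch below. The specification does not define a wire format, so the JSON encoding, the field names, the candidate table `FILTER_CANDIDATES`, and the filter names in it are all hypothetical and serve only to illustrate the idea of sending an agreed candidate index together with the filter values.

```python
import json

# Hypothetical table of filter candidates agreed on by server and client.
FILTER_CANDIDATES = {0: "bilinear", 1: "bicubic", 2: "lanczos"}

def build_roi_info(tile_runs, filter_index, filter_values):
    """Serialize tile information and filter information into one message."""
    if filter_index not in FILTER_CANDIDATES:
        raise ValueError("unknown filter candidate")
    return json.dumps({
        "tiles": [list(run) for run in tile_runs],
        "filter": {"index": filter_index, "values": list(filter_values)},
    })

def parse_roi_info(message):
    """Recover the tile runs and resolve the filter index to its agreed name."""
    info = json.loads(message)
    info["filter"]["name"] = FILTER_CANDIDATES[info["filter"]["index"]]
    return info
```

Because both sides share the candidate table, only the index and the filter values need to be transmitted; the client can then choose a matching upscaling step for the area outside the region of interest.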

FIG. 6 is a diagram showing an exemplary configuration of a client device.

The client device 600 may include at least one of an image input unit 610, an audio input unit 620, a sensor unit 630, an image output unit 640, an audio output unit 650, a communication unit 660, and/or a control unit 670. For example, the client device 600 may be a head-mounted display (HMD). The control unit 670 of the client device 600 may be included in the client device 600 or may exist as a separate device.

The image input unit 610 may capture video data. The image input unit 610 may include at least one of a 2D/3D camera and/or an immersive camera that acquires an image of the user. The 2D/3D camera may capture an image with a viewing angle of 180 degrees or less. The immersive camera may capture an image with a viewing angle of 360 degrees or less.

The audio input unit 620 may record the user's voice. For example, the audio input unit 620 may include a microphone.

The sensor unit 630 may acquire information on the movement of the user's gaze. For example, the sensor unit 630 may include a gyro sensor that detects a change in the orientation of an object, an acceleration sensor that measures the acceleration of a moving object or the strength of an impact, and an external sensor that detects the user's gaze direction. Depending on the embodiment, the sensor unit 630 may include the image input unit 610 and the audio input unit 620.

The image output unit 640 may output image data received from the communication unit 660 or stored in a memory (not shown).

The audio output unit 650 may output audio data received from the communication unit 660 or stored in the memory.

The communication unit 660 may communicate with an external client device and/or a server device over a broadcast network, a wireless communication network, and/or broadband. For example, the communication unit 660 may include a transmitter (not shown) that transmits data and/or a receiver (not shown) that receives data.

The control unit 670 may control all operations of the client device 600. The control unit 670 may process the video data and signaling data received from the server device. Details of the control unit 670 are described below.

FIG. 7 is a diagram showing an exemplary configuration of the control unit.

The control unit 700 may process signaling data and/or video data. The control unit 700 may include at least one of a signaling data extraction unit 710, a decoder 720, a gaze determination unit 730, and/or a signaling data generation unit 740.

The signaling data extraction unit 710 may extract signaling data from the data received from the server device and/or another client device. For example, the signaling data may include region-of-interest information.

The decoder 720 may decode the video data based on the signaling data. For example, the decoder 720 may decode the full image in a manner customized for each user based on that user's gaze direction. For example, when a user looks at a specific area in the virtual space, the decoder 720 may, based on the user's gaze in the virtual space, decode the image corresponding to that area in high quality and the image outside that area in low quality. Depending on the embodiment, the decoder 720 may include at least one of the signaling data extraction unit 710, the gaze determination unit 730, and/or the signaling data generation unit 740.

The gaze determination unit 730 may determine the user's gaze in the virtual space and generate image configuration information. For example, the image configuration information may include gaze information indicating the gaze direction and/or zoom area information indicating the user's viewing angle.

The signaling data generation unit 740 may generate signaling data to be transmitted to the server device and/or another client device. For example, the signaling data may carry the image configuration information. The signaling data may be delivered through a High-Level Syntax Protocol that carries session information. The signaling data may be transmitted via at least one of Supplemental Enhancement Information (SEI), Video Usability Information (VUI), a slice header, and a file describing the video data.

FIG. 8 is a diagram showing an exemplary configuration of a decoder.

The decoder 800 may include at least one of an extractor 810, a base layer decoder 820, and/or at least one enhancement layer decoder 830.

The decoder 800 may decode the bitstream (video data) using the inverse process of the scalable video coding method.

The extractor 810 may receive a bitstream (video data) containing video data and signaling data, and selectively extract bitstreams according to the image quality of the video to be played. For example, the bitstream (video data) may include a base layer bitstream (base layer video data) for the base layer and at least one enhancement layer bitstream (enhancement layer video data) for at least one enhancement layer predicted from the base layer. The base layer bitstream (base layer video data) may include video data for the entire area of the virtual space. The at least one enhancement layer bitstream (enhancement layer video data) may include video data for the region of interest within the entire area.
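The selective extraction performed by the extractor can be sketched as a filter over a multiplexed stream. The packet representation used here, a tuple of kind, layer id, and payload, is an assumption made for illustration and is not the actual bitstream syntax of SVC or SHVC; `extract_layers` and `stream` are hypothetical names.

```python
def extract_layers(packets, target_layer):
    """Keep signaling packets and video packets up to target_layer.

    Each packet is a (kind, layer, payload) tuple, where kind is
    "signaling" or "video" and layer 0 denotes the base layer.
    """
    selected = []
    for kind, layer, payload in packets:
        if kind == "signaling" or layer <= target_layer:
            selected.append((kind, layer, payload))
    return selected

# Assumed multiplexed stream: signaling, base layer, two enhancement layers.
stream = [
    ("signaling", 0, "roi=11-12"),
    ("video", 0, "BL frame"),
    ("video", 1, "EL1 roi tiles"),
    ("video", 2, "EL2 roi tiles"),
]
```

Requesting layer 0 yields the signaling data and the base layer only, which corresponds to low-quality playback of the entire area; requesting a higher layer adds the enhancement layer data for the region of interest.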

The signaling data may also include region-of-interest information indicating the region of interest corresponding to the user's gaze direction within the entire area of a virtual space for a video conferencing service.

The base layer decoder 820 may decode the base layer bitstream (or base layer video data) for a low-quality image.

The enhancement layer decoder 830 may decode at least one enhancement layer bitstream (or enhancement layer video data) for a high-quality image based on the signaling data and/or the base layer bitstream (or base layer video data).

Hereinafter, a method of generating image configuration information for responding in real time to the movement of the user's gaze is described.

The image configuration information may include at least one of gaze information indicating the user's gaze direction and/or zoom area information indicating the user's viewing angle. The user's gaze means the direction in which the user looks within the virtual space, not the real space. The gaze information may include not only information indicating the user's current gaze direction but also information indicating the user's future gaze direction (for example, information on a gaze point expected to attract attention).

The client device may sense the user's action of looking at a specific area located in the virtual space around the user, and process it.

The client device may receive sensing information from the sensor unit using the control unit and/or the gaze determination unit. The sensing information may be an image captured by a camera or a voice recorded by a microphone. The sensing information may also be data detected by a gyro sensor, an acceleration sensor, and an external sensor.

The client device may also use the control unit and/or the gaze determination unit to identify the movement of the user's gaze based on the sensing information. For example, the client device may identify the movement of the user's gaze based on changes in the values of the sensing information.

The client device may also use the control unit and/or the gaze determination unit to generate image configuration information in the virtual reality space. For example, when the client device physically moves or the user's gaze moves, the client device may calculate the user's gaze information and/or zoom area information in the virtual reality space based on the sensing information.
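As one possible illustration of calculating gaze information from sensing information, the sketch below integrates gyro angular rates over a time step to update a yaw and pitch gaze direction. The specification does not prescribe this calculation; the simple Euler integration, the degree-per-second units, and the function name `integrate_gyro` are assumptions, and a practical device would typically fuse several sensors.

```python
def integrate_gyro(yaw_deg, pitch_deg, yaw_rate_dps, pitch_rate_dps, dt_s):
    """Update a (yaw, pitch) gaze direction from gyro angular rates.

    yaw is wrapped to [-180, 180) and pitch is clamped to [-90, 90].
    Rates are in degrees per second and dt_s is the elapsed time in seconds.
    """
    yaw = (yaw_deg + yaw_rate_dps * dt_s + 180.0) % 360.0 - 180.0
    pitch = max(-90.0, min(90.0, pitch_deg + pitch_rate_dps * dt_s))
    return yaw, pitch
```

Wrapping the yaw keeps the gaze direction valid when the user turns past the back of the scene, and clamping the pitch prevents the gaze from going beyond straight up or straight down.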

The client device may also transmit the image configuration information to the server device and/or another client device using the communication unit. The client device may also pass the image configuration information to its own other components.

The foregoing describes how the client device generates the image configuration information. The disclosure is not limited thereto, however: the server device may instead receive the sensing information from the client device and generate the image configuration information.

In addition, an external computing device connected to the client device may generate the image configuration information, and the computing device may deliver it to its own client device, another client device, and/or the server device.

The following describes how the client device signals the image configuration information.

Signaling the image configuration information (which includes viewpoint information and/or zoom area information) is a critical part of the system. If the image configuration information is signaled too frequently, it can burden the client device, the server device, and/or the entire network.

Accordingly, the client device may signal the image configuration information only when the user's image configuration information (or gaze information and/or zoom area information) changes. That is, the client device may transmit the user's gaze information to another client device and/or the server device only when that gaze information changes.
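
The change-only signaling rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the specification: the ViewInfo fields, the ChangeOnlySignaler class, and the send callback are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ViewInfo:
    """Hypothetical container for gaze and zoom information."""
    yaw: float
    pitch: float
    zoom: float = 1.0


class ChangeOnlySignaler:
    """Forwards view information only when it differs from the last value sent."""

    def __init__(self, send: Callable[[ViewInfo], None]) -> None:
        self._send = send                      # e.g. a network send function (assumption)
        self._last: Optional[ViewInfo] = None  # nothing has been sent yet

    def update(self, info: ViewInfo) -> bool:
        """Return True if the information was signaled, False if it was suppressed."""
        if info == self._last:
            return False                       # unchanged: skip signaling to save bandwidth
        self._send(info)
        self._last = info
        return True
```

A practical system would probably compare against a tolerance rather than exact equality, since sensor readings rarely repeat exactly; the specification does not state such a threshold.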

The description above has focused on the client device generating and/or transmitting the image configuration information; however, the server device may also receive the sensing information from the client device, generate the image configuration information based on the sensing information, and transmit the image configuration information to at least one client device.

The signaling mentioned above may be signaling among the server device, the client device, and/or an external computing device (if present). It may also be signaling between the client device and/or an external computing device (if present).

The following describes exemplary methods of transmitting high-level and low-level images.

Methods of transmitting high-level and low-level images based on the user's gaze information may include: switching layers of a scalable codec; rate control using the quantization parameter (QP) or the like, in the case of a single bitstream and real-time encoding; switching in units of chunks, in the case of a single bitstream such as DASH; downscaling/upscaling; and/or, in the case of rendering, high-quality rendering that uses more resources.

Although the exemplary technique described above refers to differential transmission using scalable video, even when an ordinary single-layer video coding technique is used, adjusting the quantization parameter or the degree of downscaling/upscaling can provide advantages such as lowering the overall bandwidth and responding quickly to movement of the user's gaze. In addition, when files transcoded in advance into bitstreams having several bitrates are used, the exemplary technique of this specification can switch between high-level and low-level images in units of chunks.

In addition, although this specification takes a virtual reality system as its example, it applies equally to VR (Virtual Reality) games using an HMD, AR (Augmented Reality) games, and the like. That is, both the technique of providing the region corresponding to the user's gaze as a high-level image and the technique of signaling only when the user looks somewhere other than a region or object the user is expected to look at apply in the same way as in the virtual reality system example.

Hereinafter, bandwidth-adaptive video data transmission according to the size of the tiles included in the viewport is described with reference to FIGS. 9 and 10.

FIG. 9 illustrates an example of tile transmission according to a priority based on inclusion in the viewport.

When one frame 910 of the video consists of 25 tiles as illustrated, and motion-constrained tile sets (MCTS) are applied so that the tiles corresponding to the viewport (tiles 12, 13, 17, and 18) are transmitted, the server device can compute, for each tile included in the viewport, the ratio of the included portion to the size of the tile (the inclusion ratio), and then list the tiles in descending order of that ratio.

In this example, tile 12 is included at 7%, tile 13 at 73%, tile 17 at 2%, and tile 18 at 18%, so the server device can assign first priority to tile 13, which contains the largest share of the viewport, second priority to tile 18, third priority to tile 12, and fourth priority to tile 17.
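
The priority assignment in this example amounts to sorting tiles by inclusion ratio in descending order. The sketch below reproduces it; the function name and the tie-breaking rule (lower tile number first) are assumptions, since the specification does not say how ties are resolved.

```python
from typing import Dict, List


def rank_tiles_by_inclusion(ratios: Dict[int, float]) -> List[int]:
    """Return tile numbers ordered from the highest inclusion ratio to the lowest."""
    # Ties are broken by tile number; this tie rule is an assumption of the sketch.
    return sorted(ratios, key=lambda tile: (-ratios[tile], tile))


# Inclusion ratios (percent) from the FIG. 9 example.
example_ratios = {12: 7.0, 13: 73.0, 17: 2.0, 18: 18.0}
priority_order = rank_tiles_by_inclusion(example_ratios)
```

Here priority_order lists tile 13 first, followed by tiles 18, 12, and 17, matching the priorities stated in the text.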

Depending on the bandwidth conditions between the server device and the client devices and/or the capabilities of the client devices, the server device can transmit all and/or some of the tiles included in the viewport to the client devices in that priority order (920).

In this case, the picture quality of tiles that are not transmitted can be improved at the client device through an error concealment technique that uses the upsampled base layer (930).

For example, between the server device and client device C, the current bandwidth allows only the tiles up to second priority to be transmitted, so the server device transmits tile 13 (first priority) and tile 18 (second priority) as high-quality video and does not transmit high-quality video for the remaining tiles 12 and 17. Client device C then uses the upsampled base layer for the untransmitted tiles 12 and 17 to improve the quality of the output image, which can reduce discomfort such as motion sickness that the user may feel.
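
One way to picture the bandwidth cutoff is as a walk down the priority list that stops when the next tile no longer fits. The sketch below is illustrative only: the per-tile costs and the budget in Mbps are hypothetical numbers, and the specification does not define how the server measures bandwidth or tile cost.

```python
from typing import Dict, List


def select_tiles_within_budget(
    priority_order: List[int],
    cost_mbps: Dict[int, float],
    budget_mbps: float,
) -> List[int]:
    """Pick high-quality tiles in priority order until the bandwidth budget is used up."""
    selected: List[int] = []
    remaining = budget_mbps
    for tile in priority_order:
        cost = cost_mbps[tile]
        if cost > remaining:
            break  # stop at the first tile that does not fit, keeping strict priority order
        selected.append(tile)
        remaining -= cost
    return selected


order = [13, 18, 12, 17]                   # priorities from the FIG. 9 example
costs = {tile: 4.0 for tile in order}      # hypothetical cost of each enhancement tile
sent_tiles = select_tiles_within_budget(order, costs, budget_mbps=9.0)
```

With these assumed numbers only tiles 13 and 18 fit, which corresponds to the client device C case in which tiles up to second priority are transmitted.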

Therefore, compared with the conventional approach of transmitting every tile corresponding to the viewport, the bandwidth-adaptive video data transmission method presented in this specification, which depends on the size of the tiles included in the viewport, shortens user-perceived delay, reduces transmission bandwidth, and can also reduce user discomfort.

FIG. 10 illustrates an example of tile transmission according to a minimum inclusion ratio of the tiles included in the viewport.

The method presented in this specification can set a minimum inclusion ratio for the tiles included in the viewport and transmit to the client device only those tiles whose inclusion ratio is at or above the set value. That is, regardless of the bandwidth conditions between the server device and the client device, bandwidth can be reduced by uniformly transmitting high-quality video data only for the tiles that satisfy the inclusion ratio set by the user.

FIG. 10(a) shows how the inclusion ratio of each tile included in the viewport is obtained and how the transmission priority is selected according to that ratio.

FIG. 10(b) shows an example in which high-quality video data is transmitted only for the tiles (tile 13 and tile 18) that satisfy the inclusion ratio set by the user (for example, 15% or more). Here, the client terminal can use the upsampled base layer for the untransmitted tiles 12 and 17 to improve the quality of the output image.

The minimum inclusion ratio for tiles in the viewport can be set according to the server and user environment. Error concealment can be applied to the tiles that are not transmitted, and minimizing wasted bandwidth enables an efficient high-quality communication service.
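
The minimum-ratio rule of FIG. 10 can be sketched as a simple filter. The function name is hypothetical; the 15% threshold and the tile ratios come from the example in the text.

```python
from typing import Dict, List


def tiles_meeting_min_ratio(ratios: Dict[int, float], min_ratio: float) -> List[int]:
    """Return tiles whose inclusion ratio is at or above min_ratio, highest ratio first."""
    kept = [tile for tile, ratio in ratios.items() if ratio >= min_ratio]
    return sorted(kept, key=lambda tile: (-ratios[tile], tile))


fig10_ratios = {12: 7.0, 13: 73.0, 17: 2.0, 18: 18.0}
high_quality_tiles = tiles_meeting_min_ratio(fig10_ratios, min_ratio=15.0)
```

Unlike the priority-and-budget approach, this selection does not depend on the measured bandwidth, so the same tiles are chosen for the same viewport every time.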

Hereinafter, the tile transmission method and transmission apparatus based on the inclusion ratio and priority of the tiles included in the viewport, as described above, are explained in detail with reference to FIGS. 11 and 12.

This specification uses scalable video coding as its example, representing the low-quality image as a base layer image and the high-quality image as an enhancement layer image; however, any video coding technique that can generate and transmit the video of the virtual space as a low-quality (or basic-quality) image and a high-quality image may be used.

Accordingly, in the embodiments, base layer video data means basic-quality or low-quality video data, and enhancement layer video data may mean high-quality video data. In addition, when enhancement layer video is used, video data of one or more layers may be used; when scalable video coding is not used, the multiple pieces of enhancement layer video data for one picture can be replaced with high-quality video data representing that one picture.

FIG. 11 illustrates an exemplary image transmission method.

According to the exemplary image transmission method shown in FIG. 11, the server device (video server) can first divide the entire area of the virtual reality space into one or more rectangular tiles (1101).

The server device can generate base layer video data or basic-quality video data for the entire area of the virtual reality space (1103).

Based at least in part on the tile number information and inclusion ratio information for the tiles included in the viewport, received from the image receiving apparatus, the server device can obtain the inclusion ratio of each tile included in the viewport and assign a priority to each tile in order from the highest ratio to the lowest (1105).

The server device can select the tiles to transmit according to priority, within the limit allowed by the bandwidth of the communication line over which the video data will be transmitted, and generate enhancement layer video data or high-quality video data for the selected tiles (1107).

The server device can transmit the base layer video data (low-quality video data) and the enhancement layer video data (high-quality video data) in the form of a bitstream (1109).

Here, the server device can reduce bandwidth by generating enhancement layer video data (that is, high-quality video data) only for the tiles to be transmitted according to priority, and then transmitting the generated video data. In other words, the server device generates enhancement layer video data (that is, high-quality video data) in order from the highest-priority tile to the lowest, up to the limit that the bandwidth allows.

Alternatively, the server device can generate enhancement layer video data (high-quality video data) only for those tiles in the viewport whose viewport inclusion ratio is at or above a specific value, and transmit only the generated tile images to the client device. The specific inclusion ratio can be set in consideration of the video server, the user environment, and so on.

FIG. 12 illustrates an exemplary image transmission apparatus.

The image transmission apparatus (server device) 1200 may include a base layer encoder 1210, an enhancement layer encoder 1220, a multiplexer 1230, and/or a communication unit 1240.

The base layer encoder 1210 can generate base layer video data (that is, low-quality video data) for the entire area of the virtual reality space.

The enhancement layer encoder 1220 can generate enhancement layer video data (that is, high-quality video data) for tiles included in the viewport the user is looking at within the virtual reality space. In doing so, the enhancement layer encoder 1220 does not generate enhancement layer video data for every tile included in the viewport; instead, it generates enhancement layer video according to a priority determined from the proportion of each tile that is included in the viewport.

In addition, the enhancement layer encoder 1220 can generate enhancement layer video data in order from the highest-priority tile to the lowest, within the limit that the bandwidth allows.

In addition, the enhancement layer encoder 1220 can generate enhancement layer video data for those tiles in the viewport whose inclusion ratio is at or above a specific value.

Here, the image transmission apparatus 1200 analyzes the bandwidth conditions, determines from the analysis how far down the priority ranking tiles can be admitted, and generates high-quality images only for the tiles within that rank.

The multiplexer 1230 then generates a bitstream containing the generated base layer video data and enhancement layer video data. The bitstream may contain not only the image data but also the numbers of the transmitted tiles.

The communication unit 1240 can receive, from a client device (not shown), the inclusion ratio information for the tiles included in the viewport, and can transmit the bitstream generated by the multiplexer 1230 to the client device under the command of a control unit (not shown).

That is, the image transmission apparatus 1200 assigns priorities to the tiles included in the viewport in order from the largest inclusion ratio to the smallest, and encodes the images of the tiles to be transmitted according to priority in high quality (enhancement layer). It encodes in basic quality (low quality) the images of tiles not transmitted in high quality, namely tiles in regions outside the viewport and tiles that are included in the viewport but fall below the transmission priority cutoff. This saves bandwidth while still delivering high-quality images.

FIG. 13 illustrates an example of an image reception method based on priority.

According to the exemplary image reception method shown in FIG. 13, the client device can first transmit information on the viewport the user is looking at within the virtual reality space, the number information of the tiles included in the viewport, and the inclusion ratio information of the tiles included in the viewport (1301).

The client device can receive base layer video data for the entire virtual reality space, which is divided into at least one rectangular tile (1303).

The client device can receive at least one piece of enhancement layer video data, within the limit that the bandwidth allows, for tiles that have been assigned priorities according to their viewport inclusion ratios (1305).

For tiles included in the viewport for which no enhancement layer video data is received, the client device can generate upsampled base layer video data (1307).

Based at least in part on the base layer video data, the enhancement layer video data, and the upsampled base layer video data, the client device can decode the image to be output to the user (1309) and output the decoded video image.
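
The client-side fallback in steps 1305 to 1309 can be sketched as a per-tile choice between received enhancement data and an upsampled base layer block. This is a deliberately simplified illustration: real scalable decoders use inter-layer prediction and proper interpolation filters, whereas the sketch below uses nearest-neighbor upsampling on small integer grids, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]  # a tile's pixels as rows of sample values (assumption)


def upsample_nearest(block: Block, factor: int) -> Block:
    """Enlarge a block by repeating every sample factor times in both directions."""
    return [
        [px for px in row for _ in range(factor)]
        for row in block
        for _ in range(factor)
    ]


def compose_viewport(
    viewport_tiles: List[int],
    enhancement: Dict[int, Block],
    base: Dict[int, Block],
    factor: int,
) -> Dict[int, Tuple[str, Block]]:
    """For each viewport tile, use enhancement data if received, else upsampled base data."""
    output: Dict[int, Tuple[str, Block]] = {}
    for tile in viewport_tiles:
        if tile in enhancement:
            output[tile] = ("enhancement", enhancement[tile])
        else:
            output[tile] = ("upsampled_base", upsample_nearest(base[tile], factor))
    return output
```

The returned label records which source each tile came from, which makes it easy to check that untransmitted tiles such as 12 and 17 fall back to the base layer.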

FIG. 14 illustrates an example of an image reception method based on a specific ratio.

According to the exemplary image reception method shown in FIG. 14, the client device can first transmit information on the viewport the user is looking at within the virtual reality space, the number information of the tiles included in the viewport, and the inclusion ratio information of the tiles included in the viewport (1401).

The client device can receive base layer video data for the entire virtual reality space, which is divided into at least one rectangular tile (1403).

The client device can receive at least one piece of enhancement layer video data for tiles whose viewport inclusion ratio is equal to or greater than a specific value (1405).

For tiles included in the viewport for which no enhancement layer video data is received, the client device can generate upsampled base layer video data (1407).

Based at least in part on the base layer video data, the enhancement layer video data, and the upsampled base layer video data, the client device can decode the image to be output to the user (1409) and output the decoded video image.

FIG. 15 illustrates, by way of example, the tile information conveyed in the signaling for image transmission.

The technique presented in this specification enables efficient, optimized transmission of motion-constrained tile sets (MCTS) that takes the viewport and tile size into account. Accordingly, as shown in FIG. 15, the signaling information presented in this specification, which the head-mounted display 1510 acting as the video receiving apparatus delivers to the 360-degree video streaming server 1520, may include per-tile viewport inclusion ratio information 1530 and information 1540 on the tiles in the viewport to be transmitted.

Although this specification gives the example of the per-tile viewport inclusion ratio information and the transmission tile information being delivered from the client device to the server device, the server device may instead calculate this information using only the viewport information delivered by the client device.
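
A server-side calculation of this kind could look like the sketch below. It rests on several assumptions that the specification does not fix: the frame is treated as a flat rectangle with a 5 x 5 grid of equal tiles numbered 1 to 25 in row-major order, the viewport is an axis-aligned rectangle that does not wrap around the frame edge, and the ratio is taken as the share of the viewport area falling in each tile (one reading consistent with the 7/73/2/18 example summing to 100).

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def viewport_shares(
    viewport: Rect,
    frame_w: float,
    frame_h: float,
    cols: int = 5,
    rows: int = 5,
) -> Dict[int, float]:
    """Return, per tile number, the percentage of the viewport area lying in that tile."""
    vl, vt, vr, vb = viewport
    v_area = (vr - vl) * (vb - vt)
    if v_area <= 0:
        raise ValueError("viewport must have positive area")
    tile_w, tile_h = frame_w / cols, frame_h / rows
    shares: Dict[int, float] = {}
    for r in range(rows):
        for c in range(cols):
            left, top = c * tile_w, r * tile_h
            overlap_w = min(vr, left + tile_w) - max(vl, left)
            overlap_h = min(vb, top + tile_h) - max(vt, top)
            if overlap_w > 0 and overlap_h > 0:
                tile_number = r * cols + c + 1  # 1-based, row-major numbering (assumption)
                shares[tile_number] = 100.0 * overlap_w * overlap_h / v_area
    return shares
```

For a 500 x 500 frame, a viewport of (150, 250, 250, 350) straddles tiles 12, 13, 17, and 18 equally. A real 360-degree system would need to map the spherical viewport onto the projection first, which this sketch omits.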

This information can also be delivered as additional information in OMAF (Omnidirectional Media Application Format), which carries supplementary information for the MPEG international standard for 360-degree video coding.

FIG. 16 illustrates an exemplary OMAF syntax.

FIG. 16 shows an example of OMAF syntax in international video standards such as H.264 AVC and H.265 HEVC.

The syntax at reference numeral 1600 in the figure is newly added by the embodiments of this specification; all other syntax is existing standard syntax.

When signaling is performed for every transmitted video picture, the high-efficiency video coding tile information can be conveyed in accordance with the syntax specification defined below.

In the syntax, u(n) denotes an unsigned integer of n bits, as is conventional in programming languages, and fields marked 'v' have a variable number of bits (read as 'varies' in the standard).

The center_yaw syntax element specifies the viewport orientation relative to the global coordinate axes and indicates the center of the viewport. Its value must lie within the range -180 * 2^16 to 180 * 2^16 - 1.

The center_pitch syntax element specifies the viewport orientation relative to the global coordinate axes and indicates the center of the viewport. Its value must lie within the range -90 * 2^16 to 90 * 2^16 - 1.

The center_roll syntax element specifies the viewport orientation relative to the global coordinate axes and indicates the roll coordinate of the viewport. Its value must lie within the range -180 * 2^16 to 180 * 2^16 - 1.

The hor_range syntax element indicates the horizontal range of the sphere region. The range is specified through the center point of the sphere region and must lie within 0 to 720 * 2^16.

The ver_range syntax element indicates the vertical range of the sphere region. The range is specified through the center point of the sphere region and must lie within 0 to 180 * 2^16.

The interpolate syntax element indicates whether linear interpolation is applied. A value of 1 indicates that linear interpolation is applied.

The tile_ratio_list[] syntax element conveys region-of-interest ratio information for every tile in the viewport.

The tile_id_list_trans[] syntax element conveys the list of numbers of the tiles in the viewport that are transmitted.
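
The value ranges above suggest that the angle fields are integers in units of 2^-16 degrees. The sketch below converts degrees to such units and checks the stated ranges; the unit interpretation, the inclusive treatment of both endpoints, and the helper names are assumptions made for illustration.

```python
UNITS_PER_DEGREE = 1 << 16  # assumed fixed-point scale: 2^16 units per degree

# Allowed (minimum, maximum) per field, taken from the ranges stated in the text.
FIELD_RANGES = {
    "center_yaw": (-180 * UNITS_PER_DEGREE, 180 * UNITS_PER_DEGREE - 1),
    "center_pitch": (-90 * UNITS_PER_DEGREE, 90 * UNITS_PER_DEGREE - 1),
    "center_roll": (-180 * UNITS_PER_DEGREE, 180 * UNITS_PER_DEGREE - 1),
    "hor_range": (0, 720 * UNITS_PER_DEGREE),
    "ver_range": (0, 180 * UNITS_PER_DEGREE),
}


def degrees_to_units(degrees: float) -> int:
    """Convert an angle in degrees to the assumed 2^-16 degree integer units."""
    return round(degrees * UNITS_PER_DEGREE)


def check_field(name: str, value: int) -> int:
    """Return value unchanged if it lies in the field's range, else raise ValueError."""
    low, high = FIELD_RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value
```

For instance, a yaw of 134 degrees becomes 134 * 65536 units, while a pitch of exactly 90 degrees is rejected because the stated upper bound is 90 * 2^16 - 1.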

The syntax and semantics defined above may each also be expressed in XML form for HTTP-based video communication such as MPEG DASH.

FIG. 17 illustrates an exemplary tile information syntax expressed in XML form.

Referring to FIG. 17, the figure shows one example of a tile information syntax that expresses, in XML form, the yaw coordinate (center_yaw="134"), the pitch coordinate (center_pitch="85"), the roll coordinate (center_roll="247"), whether linear interpolation is applied (interpolate="0"), the viewport inclusion ratio information for all tiles in the viewport (tile_ratio_list="73, 18, 7, 2"), and the list of tile numbers transmitted within the viewport (tile_id_list_trans="13, 18").
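
A fragment carrying these attributes could be produced and read back as sketched below with Python's standard xml.etree.ElementTree module. The attribute names and values come from the example above; the element name TileInfo and the helper functions are hypothetical, since the text does not give the enclosing element of FIG. 17.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_tile_info_xml(
    center_yaw: int,
    center_pitch: int,
    center_roll: int,
    interpolate: int,
    tile_ratio_list: List[int],
    tile_id_list_trans: List[int],
) -> str:
    """Serialize the tile information as a single XML element with attributes."""
    element = ET.Element(
        "TileInfo",  # hypothetical element name, not taken from the specification
        {
            "center_yaw": str(center_yaw),
            "center_pitch": str(center_pitch),
            "center_roll": str(center_roll),
            "interpolate": str(interpolate),
            "tile_ratio_list": ", ".join(str(r) for r in tile_ratio_list),
            "tile_id_list_trans": ", ".join(str(t) for t in tile_id_list_trans),
        },
    )
    return ET.tostring(element, encoding="unicode")


def parse_int_list(text: str) -> List[int]:
    """Parse a comma-separated attribute value such as "13, 18" into integers."""
    return [int(part) for part in text.split(",")]


xml_text = build_tile_info_xml(134, 85, 247, 0, [73, 18, 7, 2], [13, 18])
parsed = ET.fromstring(xml_text)
```

Reading the attributes back with parse_int_list recovers the ratio list and the transmitted tile numbers, which is the information a server would need in order to pick the enhancement tiles.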

Although the technique disclosed in this specification is described as applied to motion-constrained tile sets (MCTS), it can also be applied to other video parallel-processing techniques that support picture partitioning, such as slices and FMO (Flexible Macroblock Ordering).

The technique disclosed in this specification is also applicable to streaming services that divide a bitstream for transmission, such as MPEG DASH, Microsoft Smooth Streaming, and Apple HLS (HTTP Live Streaming).

The virtual reality system according to the embodiments disclosed in this specification can be implemented as computer-readable code on a computer-readable recording medium. Computer-readable recording media include every kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed across computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers in the technical field to which the technique of this specification belongs.

Preferred embodiments of the technique of this specification have been described above with reference to the accompanying drawings. The terms and words used in this specification and the claims are not to be construed as limited to their ordinary or dictionary meanings, but are to be construed with meanings and concepts consistent with the technical idea of the present invention.

The scope of the present invention is not limited to the embodiments disclosed in this specification; the present invention may be modified, changed, or improved in various forms within the spirit of the invention and the scope set forth in the claims.

1200: Image transmission apparatus
1210: Base layer encoder
1220: Enhancement layer encoder
1230: Multiplexer
1240: Communication unit

Claims

Dividing the entire area of a virtual reality space into at least one rectangular tile;
Generating base layer video data or basic-quality video data for the entire area of the virtual reality space;
Generating enhancement layer video data or high-quality video data for at least one of the tiles included in a viewport, based at least in part on the ratio of the tiles included in the viewport that a user is viewing within the virtual reality space; and
Transmitting a bitstream including at least a portion of the base layer video data or the basic-quality video data and of the enhancement layer video data or the high-quality video data.

2. The method according to claim 1,
wherein the ratio of the tiles included in the viewport is
based at least in part on information on the number of tiles included in the viewport and on ratio information of those tiles, both received from a video receiving apparatus.
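
As a non-limiting illustration of the number and ratio information referred to above, the following Python sketch computes, for each tile that overlaps the viewport, the fraction of the viewport area that the tile covers. This interpretation of "ratio" is an assumption, as are the flat (non-wrapping) rectangle model and the names `Rect`, `make_grid`, and `tile_ratios`; none of them appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in picture coordinates (hypothetical helper)."""
    x: float
    y: float
    w: float
    h: float


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0.0 if they do not overlap)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0.0) * max(dy, 0.0)


def make_grid(width: float, height: float, cols: int, rows: int) -> list[Rect]:
    """Divide the whole picture into rows x cols rectangular tiles, row-major."""
    tw, th = width / cols, height / rows
    return [Rect(c * tw, r * th, tw, th) for r in range(rows) for c in range(cols)]


def tile_ratios(viewport: Rect, tiles: list[Rect]) -> dict[int, float]:
    """Map tile index -> fraction of the viewport area covered by that tile.

    Assumption: "ratio" means the share of the viewport occupied by the tile.
    Tiles that do not overlap the viewport are omitted.
    """
    viewport_area = viewport.w * viewport.h
    ratios = {}
    for index, tile in enumerate(tiles):
        covered = overlap_area(viewport, tile)
        if covered > 0:
            ratios[index] = covered / viewport_area
    return ratios
```

Under these assumptions, the number information is simply `len(tile_ratios(viewport, tiles))`, and a receiving apparatus could report the dictionary itself as the ratio information.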

3. The method according to claim 1,
wherein generating the enhancement layer video data or high-quality video data for at least one of the tiles included in the viewport, based at least in part on the ratio of the tiles included in the viewport, comprises:
assigning priorities to the tiles included in the viewport in descending order of ratio, from the tile with the largest ratio to the tile with the smallest ratio; and
generating enhancement layer video data or high-quality video data for the at least one tile based on the priority and on the bandwidth of the communication line that transmits the bitstream.

4. The method according to claim 3,
wherein generating the enhancement layer video data or high-quality video data for the at least one tile based on the bandwidth of the communication line transmitting the bitstream and the priority comprises
generating the enhancement layer video data or high-quality video data in order from the highest-priority tile to lower-priority tiles, within the range that the bandwidth allows.
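
As a non-limiting illustration of priority-ordered, bandwidth-limited selection, the following Python sketch ranks viewport tiles by ratio and selects tiles for enhancement until a bit-rate budget is exhausted. The per-tile cost table, the kbps unit, the tie-break by tile index, and the function name are all assumptions made for the example; the specification does not fix any of them.

```python
def select_tiles_for_enhancement(
    ratios: dict[int, float],
    tile_cost_kbps: dict[int, float],
    budget_kbps: float,
) -> list[int]:
    """Return tile indices to enhance, highest priority first.

    ratios:         tile index -> ratio of the tile in the viewport
    tile_cost_kbps: tile index -> assumed bit rate of that tile's enhancement data
    budget_kbps:    assumed bandwidth left over for enhancement data
    """
    # Larger ratio means higher priority; equal ratios fall back to tile index
    # so that the ordering is deterministic (an assumption of this sketch).
    priority_order = sorted(ratios, key=lambda tile: (-ratios[tile], tile))

    selected = []
    remaining = budget_kbps
    for tile in priority_order:
        cost = tile_cost_kbps[tile]
        if cost > remaining:
            # Stop at the first tile that does not fit, so that a lower-priority
            # tile is never enhanced ahead of a higher-priority one.
            break
        selected.append(tile)
        remaining -= cost
    return selected
```

Stopping at the first tile that does not fit keeps the selection strictly in priority order. Skipping that tile and continuing with cheaper, lower-priority tiles would use the budget more fully, and is an equally plausible design under the same wording.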

5. The method according to claim 1,
wherein generating the enhancement layer video data or high-quality video data for at least one of the tiles included in the viewport, based at least in part on the ratio of the tiles included in the viewport, comprises
generating enhancement layer video data or high-quality video data for each tile, among the tiles included in the viewport, whose ratio is equal to or greater than a specific value.
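
As a non-limiting illustration of the threshold-based alternative, the following Python sketch keeps only the viewport tiles whose ratio is at least a given value. The function name and the example threshold are assumptions; the specification leaves the "specific value" open.

```python
def tiles_above_threshold(ratios: dict[int, float], threshold: float) -> list[int]:
    """Return, in ascending index order, the tiles whose ratio is >= threshold.

    ratios:    tile index -> ratio of the tile in the viewport
    threshold: the "specific value" (hypothetical; chosen by the implementer)
    """
    return sorted(tile for tile, ratio in ratios.items() if ratio >= threshold)
```

For example, with ratios `{0: 0.1, 1: 0.5, 2: 0.3, 3: 0.1}` and a threshold of `0.3`, tiles 1 and 2 would receive enhancement data, while tiles 0 and 3 would be served from the base layer only.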

6. A video transmission device comprising:
a base layer encoder that generates base layer video data or basic-quality video data for the entire area of a virtual reality space;
an enhancement layer encoder that generates enhancement layer video data or high-quality video data for at least one of the tiles included in a viewport, based at least in part on a ratio of the tiles included in the viewport that a user is viewing within the virtual reality space;
a multiplexer that generates a bitstream including at least a portion of the base layer video data or the basic-quality video data and the enhancement layer video data or the high-quality video data; and
a communication unit that receives ratio information of the tiles included in the viewport and transmits the bitstream.

7. The device according to claim 6,
wherein the tiles included in the viewport are assigned priorities in descending order of ratio, from the tile with the largest ratio to the tile with the smallest ratio, and
the enhancement layer encoder generates enhancement layer video data or high-quality video data for the at least one tile based on the priority and on the bandwidth of the communication line that transmits the bitstream.

8. The device according to claim 7,
wherein the enhancement layer encoder generates the enhancement layer video data or high-quality video data in order from the highest-priority tile to lower-priority tiles, within the range that the bandwidth allows.

9. The device according to claim 6,
wherein the enhancement layer encoder generates enhancement layer video data or high-quality video data for each tile, among the tiles included in the viewport, whose ratio is equal to or greater than a specific value.

10. An image reception method comprising:
transmitting viewport information for the viewport that a user is viewing in a virtual reality space, information on the number of tiles included in the viewport, and ratio information of the tiles included in the viewport;
receiving base layer video data or basic-quality video data for the entire area of the virtual reality space, which is divided into at least one rectangular tile;
receiving enhancement layer video data or high-quality video data for at least one of the tiles included in the viewport, the tiles being prioritized according to the ratio;
generating upsampled base layer video data or basic-quality video data for each tile, among the tiles included in the viewport, for which the enhancement layer video data or high-quality video data was not received; and
decoding an image to be output to the user based at least in part on the base layer video data or basic-quality video data, the enhancement layer video data or high-quality video data, and the upsampled base layer video data or basic-quality video data.

11. An image reception method comprising:
transmitting viewport information for the viewport that a user is viewing in a virtual reality space, information on the number of tiles included in the viewport, and ratio information of the tiles included in the viewport;
receiving base layer video data or basic-quality video data for the entire area of the virtual reality space, which is divided into at least one rectangular tile;
receiving enhancement layer video data or high-quality video data for at least one tile, among the tiles included in the viewport, whose ratio is equal to or greater than a specific value;
generating upsampled base layer video data or basic-quality video data for each tile, among the tiles included in the viewport, for which the enhancement layer video data or high-quality video data was not received; and
decoding an image to be output to the user based at least in part on the base layer video data or basic-quality video data, the enhancement layer video data or high-quality video data, and the upsampled base layer video data or basic-quality video data.
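
As a non-limiting illustration of the receiver-side fallback, the following Python sketch assembles the viewport from per-tile sample arrays: a tile with received enhancement data is used as is, and any other viewport tile is filled with upsampled base-quality samples. The sketch is deliberately simplified. It assumes tiles are already decoded into 2-D lists of samples, uses nearest-neighbour upsampling with an integer factor, and substitutes tiles outright rather than modelling inter-layer prediction as a real scalable decoder would. All names are hypothetical.

```python
def upsample_nearest(tile: list[list[int]], factor: int) -> list[list[int]]:
    """Enlarge a 2-D sample array by an integer factor (nearest neighbour)."""
    upsampled = []
    for row in tile:
        # Repeat every sample horizontally, then repeat the row vertically.
        wide_row = [sample for sample in row for _ in range(factor)]
        upsampled.extend(list(wide_row) for _ in range(factor))
    return upsampled


def reconstruct_viewport_tiles(
    viewport_tiles: list[int],
    base_tiles: dict[int, list[list[int]]],
    enhanced_tiles: dict[int, list[list[int]]],
    factor: int,
) -> dict[int, list[list[int]]]:
    """Return full-resolution samples for every tile in the viewport.

    viewport_tiles: indices of the tiles included in the viewport
    base_tiles:     decoded low-resolution samples, available for every tile
    enhanced_tiles: decoded high-resolution samples, only for tiles that were sent
    factor:         assumed resolution ratio between the two qualities
    """
    output = {}
    for tile in viewport_tiles:
        if tile in enhanced_tiles:
            output[tile] = enhanced_tiles[tile]
        else:
            # No enhancement data was received for this tile: fall back to
            # upsampled base-quality samples.
            output[tile] = upsample_nearest(base_tiles[tile], factor)
    return output
```

Because the base layer covers the entire virtual reality space, this fallback always has data to work with, even when the bandwidth budget or the ratio threshold excluded a tile from enhancement.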