KR101898822B1 - Virtual reality video streaming with viewport information signaling

Info

Publication number: KR101898822B1
Application number: KR1020170170823A
Authority: KR
Inventors: 류은석; 노현준
Original assignee: 가천대학교 산학협력단
Priority date: 2017-12-12
Filing date: 2017-12-12
Publication date: 2018-09-13

Abstract

According to the present specification, a video transmission method comprises: generating video data for a video streaming service for a virtual reality space; generating signaling data based at least in part on information about a current viewport that a user is viewing in the virtual reality space and information about a predicted viewport that the user is expected to view; and transmitting a bitstream including the video data and the signaling data. By transmitting high-quality video data only for the area corresponding to a viewport and basic-quality video data for the other areas, transmission bandwidth can be secured.

Description

VIRTUAL REALITY VIDEO STREAMING WITH VIEWPORT INFORMATION SIGNALING

This specification relates to streaming virtual reality video using viewport information signaling.

With the recent development of virtual reality technology and equipment, wearable devices such as the head-mounted display (HMD) have been introduced. Representative service scenarios for these devices include not only movie watching and games but also video conferencing and remote surgery. Of these, content such as games is easily accessible to general users and is a factor that motivates them to purchase a head-mounted display.

Cloud-based game streaming is also becoming widespread. In this technology, the main game-related computations are processed on a server, and a client connects to the server and receives the game screen in order to play the game. Its advantage is that high-end games can be played regardless of the client's computational performance.

Using a head-mounted display together with cloud-based game streaming is not yet common, but the combination is expected to become widespread in the near future because of the highly immersive nature of head-mounted displays and the advantages of cloud game streaming.

A difficult problem for content that uses a head-mounted display is that the video containing the entire 360-degree image, which appears very wide to the user's eyes, must have a very high pixel count. UHD-class video is therefore needed as content. In that case, it is difficult to secure bandwidth among a plurality of user terminals, and the large amount of video data to be processed makes it difficult to respond quickly to the user's head movement. Because cloud game content inherently incurs delay in the encoding/decoding and transmission processes, it has become necessary to reduce the required bandwidth and to ensure an immediate response.

This specification presents a video transmission method. The video transmission method includes: generating video data for a video streaming service for a virtual reality space; generating signaling data based at least in part on information about a current viewport that a user is viewing within the virtual reality space and information about a predicted viewport that the user is expected to view; and transmitting a bitstream including the video data and the signaling data. The video data includes basic-quality video data for the entire virtual reality space and high-quality video data for the current viewport and the predicted viewport. The high-quality video data may be transmitted for the area corresponding to the current viewport and the predicted viewport, and only the basic-quality video data may be transmitted for the area outside the current viewport and the predicted viewport.
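
The quality assignment described above can be illustrated with a minimal sketch. It assumes the virtual reality space is divided into numbered regions; the function name `plan_quality` and the labels `"basic"` and `"high"` are hypothetical and are not taken from the specification.

```python
def plan_quality(all_regions, current_viewport, predicted_viewport):
    """Return, for every region, the qualities that would be transmitted.

    Assumption: regions are identified by hashable ids, and the two
    viewports are given as collections of region ids.
    """
    # Regions in either viewport receive high-quality data in addition
    # to the basic-quality data that covers the whole space.
    high = set(current_viewport) | set(predicted_viewport)
    return {
        region: ("basic", "high") if region in high else ("basic",)
        for region in all_regions
    }


if __name__ == "__main__":
    plan = plan_quality(range(6), current_viewport={1, 2}, predicted_viewport={2, 3})
    for region, qualities in plan.items():
        print(region, qualities)
```

Only regions 1, 2, and 3 in this example would carry high-quality data; every other region carries basic-quality data alone, which is where the bandwidth saving would come from.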

The method and other embodiments may include the following features.

The high-quality video data may be divided into at least one rectangular tile, and the signaling data may include tile information identifying the at least one tile included in the current viewport and the predicted viewport.
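
As a rough illustration of such tile information, the sketch below lists the tiles overlapped by two viewport rectangles. It assumes a uniform grid of equally sized tiles, row-major tile numbering, and viewports given as pixel rectangles that do not wrap around the frame edge; `Rect`, `tiles_in_rect`, and `tile_info` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels
    w: int  # width, in pixels
    h: int  # height, in pixels


def tiles_in_rect(frame_w, frame_h, cols, rows, rect):
    """Return the ids of all grid tiles that overlap the rectangle."""
    tile_w = frame_w // cols
    tile_h = frame_h // rows
    # Clamp to the grid so a rectangle touching the frame border is safe.
    first_col = max(rect.x // tile_w, 0)
    last_col = min((rect.x + rect.w - 1) // tile_w, cols - 1)
    first_row = max(rect.y // tile_h, 0)
    last_row = min((rect.y + rect.h - 1) // tile_h, rows - 1)
    return {
        row * cols + col  # row-major tile id (an assumption of this sketch)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    }


def tile_info(frame_w, frame_h, cols, rows, current, predicted):
    """Sorted ids of the tiles covered by the current or predicted viewport."""
    covered = tiles_in_rect(frame_w, frame_h, cols, rows, current)
    covered |= tiles_in_rect(frame_w, frame_h, cols, rows, predicted)
    return sorted(covered)


if __name__ == "__main__":
    ids = tile_info(3840, 1920, 8, 4,
                    current=Rect(960, 480, 960, 480),
                    predicted=Rect(1920, 480, 480, 480))
    print(ids)
```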

The method may further include: determining whether the bandwidth of the communication line over which the video data is transmitted is sufficient to transmit all of the high-quality video data; and, if the bandwidth is determined to be insufficient, transmitting the at least one tile according to priority.

In addition, when the bandwidth is determined to be insufficient, the operation of transmitting at least some of the at least one tile according to priority may transmit the high-quality video data in order from the highest-priority tile to the lowest-priority tile, up to the limit that the bandwidth allows.

In addition, the priority may be determined according to the distance from the user to an object in the tile, with a higher priority assigned to the tile containing the object the closer the object is to the user.
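
The bandwidth check and distance-based priority can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: each tile is assumed to carry a known high-quality data size and a distance to its nearest object, and transmission stops at the first tile that no longer fits the budget. `Tile` and `select_tiles` are hypothetical names.

```python
from typing import NamedTuple


class Tile(NamedTuple):
    tile_id: int
    high_quality_bits: int   # size of the tile's high-quality data
    object_distance: float   # distance from the user to the object in the tile


def select_tiles(tiles, budget_bits):
    """Return the ids of tiles whose high-quality data would be sent."""
    total = sum(t.high_quality_bits for t in tiles)
    if total <= budget_bits:
        # Bandwidth is sufficient: send every tile's high-quality data.
        return [t.tile_id for t in tiles]

    # Bandwidth is insufficient: closer objects mean higher priority.
    ordered = sorted(tiles, key=lambda t: t.object_distance)
    chosen, used = [], 0
    for tile in ordered:
        if used + tile.high_quality_bits > budget_bits:
            break  # assumption: keep strict priority order, do not skip ahead
        chosen.append(tile.tile_id)
        used += tile.high_quality_bits
    return chosen


if __name__ == "__main__":
    tiles = [Tile(10, 400, 5.0), Tile(11, 300, 1.5), Tile(12, 500, 3.0)]
    print(select_tiles(tiles, budget_bits=2000))
    print(select_tiles(tiles, budget_bits=900))
```

Stopping at the first tile that does not fit keeps the transmission order strictly by priority; a variant could instead skip that tile and try smaller, lower-priority tiles.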

In addition, the predicted viewport may be determined based at least in part on the information about the current viewport and on the content of the virtual reality content.

In addition, the high-quality video data may include video data for the area, within the entire area of the virtual reality space, that corresponds to the current viewport and the predicted viewport.

In addition, the signaling data may be generated based on image configuration information, and the image configuration information may include gaze information indicating the user's viewport within the virtual reality space and zoom area information indicating the user's viewing angle.

In addition, the signaling data may be transmitted through at least one of a high-level syntax protocol that carries session information, SEI (Supplemental Enhancement Information), VUI (Video Usability Information), a slice header, and a file describing the video data.

In addition, the basic-quality video may include base layer video, and the high-quality video may include base layer video and enhancement layer video.

This specification also presents a video reception method. The video reception method includes: receiving a bitstream including video data and signaling data for a virtual reality space; decoding basic-quality video data based on the video data; and decoding high-quality video data based on the video data and the signaling data. The signaling data includes at least part of the information about a current viewport for the area the user is viewing within the virtual reality space and about a predicted viewport that the user is expected to view within the virtual reality space, and the high-quality video data may include video data corresponding to the current viewport and the predicted viewport.

The basic-quality video data may include video data for the entire area of the virtual reality space, and the high-quality video data may include video data for the area, within the entire area, that corresponds to the current viewport and the predicted viewport.

In addition, the high-quality video data may be divided into at least one rectangular tile, and the signaling data may include tile information identifying the at least one tile included in the current viewport and the predicted viewport.

This specification also presents a video transmission apparatus. The video transmission apparatus includes: an encoder that generates video data for a video streaming service for a virtual reality space; a signaling unit that generates signaling data based at least in part on information about a current viewport that a user is viewing within the virtual reality space and information about a predicted viewport that the user is expected to view; a multiplexer that generates a bitstream including the video data and the signaling data; and a transmission unit that transmits the bitstream. The video data includes basic-quality video data for the entire area of the virtual reality space and high-quality video data for the area, within the entire area, that corresponds to the current viewport and the predicted viewport. The high-quality video data may be transmitted together with the basic-quality video data for the area corresponding to the current viewport and the predicted viewport, and only the basic-quality video data may be transmitted for the area outside the current viewport and the predicted viewport.

The apparatus and other embodiments may include the following features.

The basic-quality video may include base layer video, and the high-quality video may include base layer video and enhancement layer video.

According to the embodiments disclosed in this specification, transmission bandwidth can be secured by transmitting only the area corresponding to the viewport as high-quality video data and transmitting the remaining area as basic-quality video data.

In addition, according to the embodiments disclosed in this specification, only the video corresponding to the current viewport and the predicted viewport is transmitted in high quality. Even if bandwidth becomes insufficient during transmission of the video data, the tile images to be transmitted are selected according to priority and the video data to be transmitted is adjusted flexibly according to bandwidth conditions, so that video data can be transmitted adaptively to the available bandwidth.

In addition, according to the embodiments disclosed in this specification, when virtual reality video data is transmitted from a server device to a plurality of client devices, high-quality video for the tiles in the current viewport and the predicted viewport can be transmitted selectively according to the state of each communication line between the server device and the client devices, so that bandwidth can also be secured flexibly on a communication line to which multiple clients are connected.

FIG. 1 illustrates an exemplary virtual reality system that provides virtual reality video.
FIG. 2 is a diagram illustrating an exemplary scalable video coding service.
FIG. 3 is a diagram illustrating an exemplary configuration of a server device.
FIG. 4 is a diagram illustrating an exemplary structure of an encoder.
FIG. 5 is a diagram illustrating an exemplary method of signaling a region of interest.
FIG. 6 is a diagram illustrating an exemplary configuration of a client device.
FIG. 7 is a diagram illustrating an exemplary configuration of a control unit.
FIG. 8 is a diagram illustrating an exemplary configuration of a decoder.
FIG. 9 illustrates an exemplary scalable video coding based streaming method for game video that prefetches the current viewport and the predicted viewport to increase video processing speed and reduce bandwidth.
FIG. 10 illustrates an exemplary high-quality game streaming method based on viewport prediction according to the distance of objects in the video.
FIG. 11 illustrates an exemplary video transmission method in a video server for a video streaming service.
FIG. 12 illustrates an exemplary video reception method in a client device for a video streaming service.
FIG. 13 illustrates an exemplary video transmission apparatus for a video streaming service.
FIG. 14 illustrates exemplary signaling of the predicted viewport and object distance information.
FIG. 15 illustrates an exemplary SEI payload syntax proposed for signaling the predicted viewport and object distance information.
FIG. 16 illustrates an exemplary viewport signaling specification per video picture.
FIG. 17 illustrates an exemplary signaling specification per file, chunk, and group of video pictures.
FIG. 18 illustrates an exemplary tile information syntax expressed in XML.

The technology disclosed in this specification can be applied to a virtual reality system that provides cloud-based video streaming. However, the technology is not limited thereto and can also be applied to any electronic device and method to which its technical idea is applicable.

Note that the technical terms used in this specification are used only to describe particular embodiments and are not intended to limit the spirit of the technology disclosed herein. Unless specifically defined otherwise in this specification, the technical terms used herein should be interpreted as having the meaning generally understood by a person of ordinary skill in the art to which the disclosed technology belongs, and should be interpreted neither in an excessively broad sense nor in an excessively narrow sense. If a technical term used herein is an incorrect term that fails to express the spirit of the disclosed technology accurately, it should be replaced by, and understood as, a technical term that a person of ordinary skill in the art can correctly understand. General terms used herein should be interpreted as defined in dictionaries or according to the surrounding context, and should not be interpreted in an excessively narrow sense.

Terms including ordinals, such as first and second, may be used in this specification to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be named a first component.

Hereinafter, the embodiments disclosed in this specification are described in detail with reference to the accompanying drawings. The same or similar components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions of them are omitted.

In describing the technology disclosed in this specification, a detailed description of related known technology is omitted where it is judged that it could obscure the gist of the disclosed technology. Note also that the accompanying drawings are provided only to make the spirit of the disclosed technology easy to understand, and the spirit of the technology should not be construed as being limited by the drawings.

FIG. 1 illustrates an exemplary virtual reality system that provides virtual reality video.

The virtual reality system may be configured to include a virtual reality image generation device that generates virtual reality video, a server device that encodes and transmits the input virtual reality video, and one or more client devices that decode the transmitted virtual reality video and output it to a user.

FIG. 1 illustrates a virtual reality system 100 that includes a virtual reality image generation device 110, a server device 120, and one or more client devices 130. The virtual reality system 100 may also be called a 360-degree video providing system. The number of each component shown in FIG. 1 is exemplary only and is not limiting.

The virtual reality image generation device 110 may include at least one camera module and may generate a spatial image by capturing the space in which the device is located.

The server device 120 may generate a 360-degree video by stitching, projecting, and mapping the spatial images generated by and input from the virtual reality image generation device 110, and may adjust the generated 360-degree video to video data of a desired quality and then encode it.

In addition, the server device 120 may transmit a bitstream including the video data for the encoded 360-degree video and signaling data to the client device 130 over a network.

The client device 130 may decode the received bitstream and output a 360-degree video to the user wearing the client device 130. The client device 130 may be a near-eye display device such as a head-mounted display (HMD).

Alternatively, the virtual reality image generation device 110 may be configured as a computer system that generates video of a virtual 360-degree space implemented with computer graphics. The virtual reality image generation device 110 may also be a provider of virtual reality content such as a virtual reality game.

The client device 130 may obtain user data from the user of that client device 130. The user data may include the user's image data, voice data, viewport data (gaze data), region of interest data, and additional data.

For example, the client device 130 may include at least one of a 2D/3D camera and an immersive camera for acquiring the user's image data. The 2D/3D camera can capture video with a viewing angle of 180 degrees or less. The immersive camera can capture video with a viewing angle of 360 degrees or less.

For example, the client device 130 may include at least one of a first client device 131 that obtains user data of a first user at a first location, a second client device 133 that obtains user data of a second user at a second location, and a third client device 135 that obtains user data of a third user at a third location.

Each client device 130 may then transmit the obtained user data to the server device 120 over the network.

The server device 120 may receive at least one set of user data from the client devices 130. The server device 120 may generate an entire image of the virtual space based on the received user data. The entire image may represent an immersive image that provides video in all 360-degree directions within the virtual space. The server device 120 may generate the entire image by mapping the image data included in the user data onto the virtual space.

The server device 120 may then transmit the entire image to each user.

Each client device 130 may receive the entire image and render and/or display, in the virtual space, only the area that its user is viewing.

FIG. 2 is a diagram illustrating an exemplary scalable video coding service.

A scalable video coding service is a video compression method for providing services in a scalable manner, in terms of time, space, and image quality, according to diverse user environments such as network conditions or terminal resolution in various multimedia environments. A scalable video coding service generally provides scalability in spatial resolution, quality, and time.

Spatial scalability is provided by encoding the same video at a different resolution in each layer. Spatial scalability makes it possible to deliver video content adaptively to devices with different resolutions, such as digital TVs, notebook computers, and smartphones.

Referring to the figure, a scalable video coding service can simultaneously support one or more TVs with different characteristics from a video service provider (VSP) through a home gateway in the home. For example, the scalable video coding service can simultaneously support an HDTV (High-Definition TV), an SDTV (Standard-Definition TV), and an LDTV (Low-Definition TV), which have different resolutions.

Temporal scalability adaptively adjusts the frame rate of the video in consideration of the network environment over which the content is transmitted or the performance of the terminal. For example, the service may be provided at a high frame rate of 60 FPS (frames per second) over a local area network, and at a low frame rate of 16 FPS over a wireless broadband network such as a 3G mobile network, so that the user can receive the video without interruption.
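
A frame-rate choice of this kind might be sketched as below. The bitrate figures in the ladder are invented for illustration and do not come from the specification; only the 60 FPS and 16 FPS operating points are taken from the text, and `pick_frame_rate` is a hypothetical helper.

```python
def pick_frame_rate(available_kbps, ladder):
    """Pick the highest frame rate whose required bitrate fits the link.

    `ladder` maps a frame rate (FPS) to the bitrate it needs (kbps).
    If nothing fits, fall back to the lowest frame rate in the ladder.
    """
    fitting = [fps for fps, kbps in ladder.items() if kbps <= available_kbps]
    return max(fitting) if fitting else min(ladder)


if __name__ == "__main__":
    # Hypothetical bitrate requirements for each frame rate.
    ladder = {60: 8000, 30: 4000, 16: 1500}
    print(pick_frame_rate(10000, ladder))  # e.g. a local area network
    print(pick_frame_rate(2000, ladder))   # e.g. a 3G mobile network
```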

Quality scalability likewise serves content at various image quality levels according to the network environment or the performance of the terminal, so that the user can play back video content stably.

A scalable video coding service may include a base layer and one or more enhancement layers. A receiver provides normal-quality video when it receives only the base layer, and can provide high-quality video when it receives the base layer together with an enhancement layer. That is, when there is a base layer and one or more enhancement layers, the more enhancement layers (e.g., enhancement layer 1, enhancement layer 2, …, enhancement layer n) the receiver obtains on top of the base layer, the better the image quality or the quality of the delivered video becomes.
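
The relationship between received layers and achievable quality can be sketched as follows. The sketch assumes that layer 0 is the base layer and that each enhancement layer can only be used when every layer below it has also been received; this dependency rule is an assumption of the example rather than a statement from the specification, and `highest_usable_layer` is a hypothetical name.

```python
def highest_usable_layer(received_layers):
    """Return the highest layer index that can be decoded, or -1 if none.

    Layer 0 is the base layer; layers 1..n are enhancement layers.
    """
    received = set(received_layers)
    level = -1
    # Walk upward from the base layer until a layer is missing.
    while level + 1 in received:
        level += 1
    return level


if __name__ == "__main__":
    print(highest_usable_layer({0}))        # base layer only: normal quality
    print(highest_usable_layer({0, 1, 2}))  # base plus two enhancement layers
    print(highest_usable_layer({0, 2}))     # layer 1 missing, so only the base
```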

Because the video of a scalable video coding service is thus composed of a plurality of layers, the receiver can quickly receive the small amount of base layer data, process and play normal-quality video promptly, and, when needed, additionally receive enhancement layer video data to raise the quality of the service.

FIG. 3 is a diagram illustrating an exemplary configuration of a server device.

The server device 300 may include a control unit 310 and/or a communication unit 320.

The control unit 310 may generate an entire image of the virtual space and encode the generated entire image. The control unit 310 may also control all operations of the server device 300. Details are described below.

The communication unit 320 may transmit data to and/or receive data from an external device and/or a client device. For example, the communication unit 320 may receive user data and/or signaling data from at least one client device. The communication unit 320 may also transmit the entire image of the virtual space and/or an image of a partial area to the client device.

The control unit 310 may include at least one of a signaling data extraction unit 311, an image generation unit 313, a region of interest determination unit 315, a signaling data generation unit 317, and/or an encoder 319.

The signaling data extraction unit 311 may extract signaling data from the data received from the client device. For example, the signaling data may include image configuration information. The image configuration information may include gaze information indicating the user's gaze direction within the virtual space and zoom area information indicating the user's viewing angle. The image configuration information may also include the user's viewport information within the virtual space.

The image generation unit 313 may generate a full image of the virtual space and an image of a specific region within the virtual space.

The region-of-interest determination unit 315 may determine, within the entire area of the virtual space, the region of interest corresponding to the user's gaze direction. It may also determine the user's viewport within the entire area of the virtual space. For example, the region-of-interest determination unit 315 may determine the region of interest based on the gaze information and/or the zoom region information. For example, the region of interest may be the location of the tiles where an important object will be placed in the virtual space the user will see (for example, the location where a new enemy appears in a game, or the position of a speaker in the virtual space), and/or the place at which the user is looking. In addition, the region-of-interest determination unit 315 may generate region-of-interest information indicating the region of interest corresponding to the user's gaze direction within the entire area of the virtual space, together with information on the user's viewport.

The signaling data generation unit 317 may generate signaling data for processing the full image. For example, the signaling data may carry the region-of-interest information and/or the viewport information. The signaling data may be transmitted through at least one of Supplemental Enhancement Information (SEI), Video Usability Information (VUI), a slice header, and a file describing the video data.

The encoder 319 may encode the full image based on the signaling data. For example, the encoder 319 may encode the full image in a manner customized for each user based on that user's gaze direction. For example, when a user looks at a specific point in the virtual space, the encoder may, based on the user's gaze in the virtual space, encode the image corresponding to that point in high quality and encode the image outside that point in low quality. Depending on the embodiment, the encoder 319 may include at least one of the signaling data extraction unit 311, the image generation unit 313, the region-of-interest determination unit 315, and/or the signaling data generation unit 317.

Hereinafter, an exemplary image transmission method using a region of interest is described.

The server device may receive video data and signaling data from at least one client device using the communication unit. The server device may also extract the signaling data using the signaling data extraction unit. For example, the signaling data may include viewpoint information and zoom region information.

The gaze information may indicate which region (point) the user is looking at within the virtual space. When the user looks at a specific region within the virtual space, the gaze information may indicate the direction from the user toward that region.

The zoom region information may indicate the enlargement range and/or reduction range of the video data corresponding to the user's gaze direction. The zoom region information may also indicate the user's viewing angle. When the video data is enlarged based on the value of the zoom region information, the user sees only the specific region. When the video data is reduced based on the value of the zoom region information, the user sees not only the specific region but also part and/or all of the area outside it.

The server device may then generate the full image of the virtual space using the image generation unit.

The server device may then use the region-of-interest determination unit to identify, based on the signaling data, the image configuration information on the viewpoint and zoom region that each user is looking at within the virtual space.

The server device may then use the region-of-interest determination unit to determine the user's region of interest based on the image configuration information.

When the signaling data (for example, at least one of the viewpoint information and the zoom region information) changes, the server device may receive new signaling data. In this case, the server device may determine a new region of interest based on the new signaling data.

The server device may then use the control unit to determine, based on the signaling data, whether the data currently being processed corresponds to the region of interest.

When the signaling data changes, the server device may determine, based on the new signaling data, whether the data currently being processed corresponds to the region of interest.

If the data corresponds to the region of interest, the server device may use the encoder to encode the video data corresponding to the user's viewpoint (for example, the region of interest) in high quality. For example, the server device may generate base layer video data and enhancement layer video data for that video data and transmit both.

When the signaling data changes, the server device may transmit the video data corresponding to the new viewpoint (the new region of interest) as a high-quality image. If the server device had been transmitting a low-quality image and, because the signaling data changed, now transmits a high-quality image, the server device may additionally generate and/or transmit enhancement layer video data.

If the data does not correspond to the region of interest, the server device may encode the video data that does not correspond to the user's viewpoint (for example, the non-region of interest) in low quality. For example, the server device may generate only base layer video data for the video data that does not correspond to the user's viewpoint and transmit it.

When the signaling data changes, the server device may transmit the video data that does not correspond to the user's new viewpoint (the new non-region of interest) as a low-quality image. If the server device had been transmitting a high-quality image and, because the signaling data changed, now transmits a low-quality image, the server device no longer generates and/or transmits the at least one enhancement layer video data and may generate and/or transmit only the base layer video data.

That is, the image quality obtained from the base layer video data alone is lower than the image quality obtained when the enhancement layer video data is also received. Therefore, at the moment the client device obtains, from a sensor or the like, the information that the user has turned their head, it may receive the enhancement layer video data for the video data corresponding to the user's gaze direction (for example, the region of interest). The client device may then provide high-quality video data to the user within a short time.
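
The per-region layer selection described above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the names `Region`, `layers_to_send`, and `plan_transmission`, and the use of integer tile identifiers, are assumptions introduced here.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    tile_id: int   # hypothetical identifier of a region (tile) of the full image
    in_roi: bool   # True if the region belongs to the current region of interest


def layers_to_send(region: Region) -> list[str]:
    """The base layer is always sent; the enhancement layer only for the ROI."""
    return ["base", "enhancement"] if region.in_roi else ["base"]


def plan_transmission(tile_ids, roi_tiles) -> dict[int, list[str]]:
    """Map every tile to the layers that would be generated and/or transmitted."""
    return {t: layers_to_send(Region(t, t in roi_tiles)) for t in tile_ids}


# The signaling data changes: the user's viewpoint moves from tile 1 to tile 2.
old_plan = plan_transmission(range(4), roi_tiles={1})
new_plan = plan_transmission(range(4), roi_tiles={2})
```

After the change, tile 1 falls back to the base layer only, while tile 2 gains the enhancement layer, which mirrors the two "signaling data changes" cases above.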

FIG. 4 is a diagram showing an exemplary structure of an encoder.

The encoder 400 (an image encoding apparatus) may include at least one of a base layer encoder 410, at least one enhancement layer encoder 420, and a multiplexer 430.

The encoder 400 may encode the full image using a scalable video coding method. The scalable video coding method may include Scalable Video Coding (SVC) and/or Scalable High Efficiency Video Coding (SHVC).

The scalable video coding method is an image compression method for providing services in a scalable (layered) manner in the temporal, spatial, and quality dimensions, according to diverse user environments such as network conditions or terminal resolution in various multimedia environments. For example, the encoder 400 may encode images of two or more different qualities (or resolutions, or frame rates) for the same video data to generate a bitstream.

For example, to improve the compression performance of the video data, the encoder 400 may use inter-layer prediction tools, an encoding method that exploits inter-layer redundancy. Inter-layer prediction tools improve the compression efficiency of the enhancement layer (EL) by removing the image redundancy that exists between layers.

The enhancement layer may be encoded with reference to information of a reference layer using the inter-layer prediction tools. The reference layer is the lower layer that is referenced when the enhancement layer is encoded. Because the use of inter-layer tools creates dependencies between layers, decoding the image of the highest layer requires the bitstreams of all referenced lower layers. For an intermediate layer, decoding may be performed by obtaining only the bitstreams of the layer to be decoded and its lower layers. The bitstream of the lowest layer is the base layer (BL) and may be encoded with an encoder such as H.264/AVC or HEVC.
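
The layer dependency described above can be illustrated with a short sketch. It assumes, for illustration only, that layers are numbered from 0 (the base layer) upward and that each layer references the layer directly below it; the function name `layers_required` is hypothetical.

```python
from __future__ import annotations


def layers_required(target_layer: int) -> list[int]:
    """Return the layer indices whose bitstreams are needed to decode target_layer.

    Assumes layer 0 is the base layer and every enhancement layer depends on
    the layer directly below it, so all lower layers are required.
    """
    if target_layer < 0:
        raise ValueError("layer index must be non-negative")
    return list(range(target_layer + 1))


base_only = layers_required(0)      # the base layer decodes on its own
intermediate = layers_required(1)   # needs layers 0 and 1
highest = layers_required(2)        # needs layers 0, 1 and 2
```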

The base layer encoder 410 may encode the full image to generate base layer video data (or a base layer bitstream) for the base layer. For example, the base layer video data may include video data for the entire area that the user views within the virtual space. The base layer image may be the image of the lowest quality.

The enhancement layer encoder 420 may encode the full image, based on the signaling data (for example, the region-of-interest information) and the base layer video data, to generate at least one enhancement layer video data (or enhancement layer bitstream) for at least one enhancement layer predicted from the base layer. The enhancement layer video data may include video data for the region of interest within the entire area.

The multiplexer 430 may multiplex the base layer video data, the at least one enhancement layer video data, and/or the signaling data, and generate a single bitstream corresponding to the full image.

FIG. 5 is a diagram illustrating an exemplary method of signaling a region of interest.

FIG. 5 shows a method of signaling a region of interest in scalable video.

The server device (or encoder) may divide one piece of video data (or one picture) into multiple rectangular tiles. For example, the video data may be divided along Coding Tree Unit (CTU) boundaries. For example, one CTU may include a Y CTB, a Cb CTB, and a Cr CTB.
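
As a rough illustration of tile partitioning along CTU boundaries, the sketch below splits one picture dimension into tiles whose edges fall on CTU multiples. The uniform split, the 64-pixel CTU size, and the function names are assumptions chosen for this example; the source does not specify how tile boundaries are chosen.

```python
from __future__ import annotations
import math


def tile_boundaries_ctu(length_px: int, ctu_size: int, num_tiles: int) -> list[int]:
    """Tile boundaries along one dimension, expressed in CTU units.

    The dimension is first measured in whole CTUs (rounding up), then divided
    as evenly as integer arithmetic allows.
    """
    n_ctus = math.ceil(length_px / ctu_size)
    return [(i * n_ctus) // num_tiles for i in range(num_tiles + 1)]


def tile_boundaries_px(length_px: int, ctu_size: int, num_tiles: int) -> list[int]:
    """The same boundaries in pixels, clamped to the picture edge."""
    return [min(b * ctu_size, length_px)
            for b in tile_boundaries_ctu(length_px, ctu_size, num_tiles)]


cols_ctu = tile_boundaries_ctu(1920, 64, 4)   # 30 CTUs wide, 4 tile columns
rows_px = tile_boundaries_px(1080, 64, 2)     # 17 CTUs high, 2 tile rows
```

Every interior boundary is a multiple of the CTU size; only the last boundary is clamped to the picture edge when the dimension is not a whole number of CTUs.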

For fast user response, the server device may encode the base layer video data as a whole without dividing it into tiles. The server device may divide part or all of the video data of one or more enhancement layers into multiple tiles, as needed, and encode them.

That is, the server device may divide the enhancement layer video data into at least one tile and encode the tiles corresponding to the region of interest (ROI) 510.

Here, the region of interest 510 may correspond to the location of the tiles where an important object that the user will see in the virtual space will be placed (e.g., the location where a new enemy appears in a game, or the position of a speaker in the virtual space in video communication), and/or to the place at which the user is looking.

The server device may also generate region-of-interest information that includes tile information identifying at least one tile included in the region of interest. For example, the region-of-interest information may be generated by the region-of-interest determination unit, the signaling data generation unit, and/or the encoder.

Because the tile information of the region of interest 510 is contiguous, it can be compressed effectively without listing every tile number. For example, the tile information may include not only the numbers of all tiles corresponding to the region of interest but also the start and end tile numbers, coordinate point information, a list of Coding Unit (CU) numbers, or tile numbers expressed as a formula.
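
One way to exploit the contiguity mentioned above is to send start and end tile numbers instead of every tile number. The sketch below is a minimal illustration of that idea; the helper names `to_ranges` and `from_ranges` are hypothetical and are not part of any video coding standard.

```python
from __future__ import annotations


def to_ranges(tile_ids) -> list[tuple[int, int]]:
    """Collapse tile numbers into inclusive (start, end) runs."""
    ranges: list[tuple[int, int]] = []
    for t in sorted(set(tile_ids)):
        if ranges and t == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], t)   # extend the current run
        else:
            ranges.append((t, t))             # start a new run
    return ranges


def from_ranges(ranges) -> list[int]:
    """Expand (start, end) runs back into the full list of tile numbers."""
    return [t for start, end in ranges for t in range(start, end + 1)]


roi_tiles = [5, 6, 7, 13, 14, 15]     # two rows of a rectangular ROI
compact = to_ranges(roi_tiles)        # two runs instead of six numbers
restored = from_ranges(compact)
```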

The tile information of the non-region of interest may be transmitted to another client device, an image processing computing device, and/or a server after passing through the entropy coding provided by the encoder.

The region-of-interest information may be delivered through a high-level syntax protocol that carries session information. The region-of-interest information may also be delivered in packet units of a video standard, such as Supplemental Enhancement Information (SEI), Video Usability Information (VUI), or a slice header. The region-of-interest information may also be delivered in a separate file that describes the video file (e.g., the MPD of DASH).

Hereinafter, a method of signaling a region of interest in single-screen video is described.

For single-screen video that is not scalable video, the exemplary technique of this specification may lower image quality by downscaling (downsampling) the regions outside the region of interest (ROI). In the prior art, the filter information used for downscaling is not shared among the terminals using the service; either a single technique is agreed upon in advance, or only the encoder knows the filter information.

However, the server device may transmit the filter information used during encoding to the client device (or HMD terminal) that receives the encoded image, so that the client device can improve, even slightly, the image quality of the downscaled area outside the region of interest. This technique can significantly reduce image processing time in practice and can improve image quality.

As described above, the server device may generate the region-of-interest information. For example, the region-of-interest information may include filter information in addition to the tile information. For example, the filter information may include the index of a filter among pre-agreed filter candidates and the values used in the filter.
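
The region-of-interest information described above (tile information plus filter information) could be modeled as a small record. The field names and the dictionary form below are assumptions made purely for illustration; the source does not define a concrete syntax.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FilterInfo:
    # Index into a list of filter candidates agreed upon in advance (hypothetical field).
    candidate_index: Optional[int] = None
    # Values used in the downscaling filter, e.g. its taps (hypothetical field).
    coefficients: List[float] = field(default_factory=list)


@dataclass
class RoiInfo:
    # Inclusive (start, end) tile-number runs that make up the region of interest.
    tile_ranges: List[Tuple[int, int]]
    # Present only when non-ROI areas were downscaled with a known filter.
    filter_info: Optional[FilterInfo] = None


info = RoiInfo(tile_ranges=[(5, 7)], filter_info=FilterInfo(candidate_index=2))
payload = asdict(info)   # plain dict, ready for whatever container carries the signaling
```

A client that receives `candidate_index` can select the matching upscaling filter without having to guess how the non-ROI area was downscaled.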

FIG. 6 is a diagram showing an exemplary configuration of a client device.

The client device 600 may include at least one of an image input unit 610, an audio input unit 620, a sensor unit 630, an image output unit 640, an audio output unit 650, a communication unit 660, and/or a control unit 670. For example, the client device 600 may be a head-mounted display (HMD). The control unit 670 of the client device 600 may be included in the client device 600 or may exist as a separate device.

The image input unit 610 may capture video data. The image input unit 610 may include at least one of a 2D/3D camera and/or an immersive camera that acquires an image of the user. The 2D/3D camera may capture images with a viewing angle of 180 degrees or less. The immersive camera may capture images with a viewing angle of 360 degrees or less.

The audio input unit 620 may record the user's voice. For example, the audio input unit 620 may include a microphone.

The sensor unit 630 may acquire information on the movement of the user's gaze. For example, the sensor unit 630 may include a gyro sensor that detects changes in the orientation of an object, an acceleration sensor that measures the acceleration of a moving object or the intensity of an impact, and an external sensor that detects the user's gaze direction. Depending on the embodiment, the sensor unit 630 may include the image input unit 610 and the audio input unit 620.

The image output unit 640 may output image data received from the communication unit 660 or stored in a memory (not shown).

The audio output unit 650 may output audio data received from the communication unit 660 or stored in the memory.

The communication unit 660 may communicate with an external client device and/or a server device through a broadcast network, a wireless communication network, and/or broadband. For example, the communication unit 660 may include a transmitting unit (not shown) that transmits data and/or a receiving unit (not shown) that receives data.

The control unit 670 may control all operations of the client device 600. The control unit 670 may process the video data and signaling data received from the server device. Details of the control unit 670 are described below.

FIG. 7 is a diagram showing an exemplary configuration of the control unit.

The control unit 700 may process signaling data and/or video data. The control unit 700 may include at least one of a signaling data extraction unit 710, a decoder 720, a gaze determination unit 730, and/or a signaling data generation unit 740.

The signaling data extraction unit 710 may extract signaling data from the data received from the server device and/or another client device. For example, the signaling data may include region-of-interest information.

The decoder 720 may decode the video data based on the signaling data. For example, the decoder 720 may decode the full image in a manner customized for each user based on that user's gaze direction. For example, when a user looks at a specific region in the virtual space, the decoder 720 may, based on the user's gaze in the virtual space, decode the image corresponding to that region in high quality and decode the image outside that region in low quality. Depending on the embodiment, the decoder 720 may include at least one of the signaling data extraction unit 710, the gaze determination unit 730, and/or the signaling data generation unit 740.

The gaze determination unit 730 may determine the user's gaze within the virtual space and generate image configuration information. For example, the image configuration information may include gaze information indicating the gaze direction and/or zoom region information indicating the user's viewing angle.

The signaling data generation unit 740 may generate signaling data to be transmitted to the server device and/or another client device. For example, the signaling data may carry the image configuration information. The signaling data may be delivered through a high-level syntax protocol that carries session information. The signaling data may be transmitted through at least one of Supplemental Enhancement Information (SEI), Video Usability Information (VUI), a slice header, and a file describing the video data.

FIG. 8 is a diagram showing an exemplary configuration of a decoder.

The decoder 800 may include at least one of an extractor 810, a base layer decoder 820, and/or at least one enhancement layer decoder 830.

The decoder 800 may decode the bitstream (video data) using the inverse of the scalable video coding process.

The extractor 810 may receive a bitstream (video data) that includes video data and signaling data, and selectively extract bitstreams according to the quality of the image to be played back. For example, the bitstream (video data) may include a base layer bitstream (base layer video data) for the base layer and at least one enhancement layer bitstream (enhancement layer video data) for at least one enhancement layer predicted from the base layer. The base layer bitstream (base layer video data) may include video data for the entire area of the virtual space. The at least one enhancement layer bitstream (enhancement layer video data) may include video data for the region of interest within the entire area.

The signaling data may also include region-of-interest information indicating the region of interest that corresponds to the user's gaze direction within the entire area of the virtual space for a video conferencing service.

The base layer decoder 820 may decode the base layer bitstream (or base layer video data) for a low-quality image.

The enhancement layer decoder 830 may decode at least one enhancement layer bitstream (or enhancement layer video data) for a high-quality image based on the signaling data and/or the base layer bitstream (or base layer video data).
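
The selective extraction performed by the extractor 810 can be sketched as a filter over layer-tagged packets. The `Packet` type, its `layer_id` field, and the `extract` function are hypothetical simplifications introduced here; real SVC/SHVC bitstreams carry layer identifiers in their NAL unit headers, which this sketch does not parse.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    layer_id: int    # 0 = base layer, 1 and above = enhancement layers (assumed numbering)
    payload: bytes


def extract(packets: List[Packet], max_layer: int) -> List[Packet]:
    """Keep only the packets needed to play back up to the requested layer."""
    return [p for p in packets if p.layer_id <= max_layer]


stream = [Packet(0, b"BL0"), Packet(1, b"EL0"), Packet(0, b"BL1"), Packet(1, b"EL1")]
low_quality = extract(stream, max_layer=0)    # would go to the base layer decoder only
high_quality = extract(stream, max_layer=1)   # base and enhancement layer packets
```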

Hereinafter, a method of generating image configuration information for responding to the movement of the user's gaze in real time is described.

The image configuration information may include at least one of gaze information indicating the user's gaze direction and/or zoom region information indicating the user's viewing angle. The user's gaze means the direction in which the user is looking within the virtual space, not the real space. The gaze information may include not only information indicating the user's current gaze direction but also information indicating the user's future gaze direction (for example, information on a gaze point that is expected to attract attention).

The client device may sense the action of looking at a specific region located in the virtual space around the user, and process it.

The client device may receive sensing information from the sensor unit using the control unit and/or the gaze determination unit. The sensing information may be an image captured by a camera or a voice recorded by a microphone. The sensing information may also be data detected by a gyro sensor, an acceleration sensor, and an external sensor.

The client device may also use the control unit and/or the gaze determination unit to identify the movement of the user's gaze based on the sensing information. For example, the client device may identify the movement of the user's gaze based on changes in the values of the sensing information.

The client device may also use the control unit and/or the gaze determination unit to generate image configuration information in the virtual reality space. For example, when the client device physically moves or the user's gaze moves, the client device may calculate the user's gaze information and/or zoom region information in the virtual reality space based on the sensing information.
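
As a hedged illustration of calculating gaze information from sensing information, the sketch below updates a yaw angle from a gyro rate and converts yaw and pitch into a unit gaze vector. The axis convention (yaw about the vertical axis, z pointing forward), the use of degrees, and both function names are assumptions; the source does not prescribe any particular formula.

```python
import math
from typing import Tuple


def integrate_yaw(yaw_deg: float, rate_deg_per_s: float, dt_s: float) -> float:
    """Advance the yaw angle by one gyro sample, wrapping into [0, 360)."""
    return (yaw_deg + rate_deg_per_s * dt_s) % 360.0


def gaze_direction(yaw_deg: float, pitch_deg: float) -> Tuple[float, float, float]:
    """Unit vector of the gaze in the virtual space (x right, y up, z forward)."""
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


yaw = integrate_yaw(350.0, 30.0, 1.0)   # the head turns 30 degrees in one second
forward = gaze_direction(0.0, 0.0)      # looking straight ahead
right = gaze_direction(90.0, 0.0)       # looking to the right
```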

The client device may also transmit the image configuration information to the server device and/or another client device using the communication unit. The client device may also pass the image configuration information to its own other components.

이상에서는 클라이언트 디바이스가 영상 구성 정보를 생성하는 방법을 설명하였다. 다만 이에 한정되지 않으며, 서버 디바이스가 클라이언트 디바이스로부터 센싱 정보를 수신하고, 영상 구성 정보를 생성할 수도 있다.In the foregoing, a method of generating image configuration information by a client device has been described. However, the present invention is not limited thereto, and the server device may receive the sensing information from the client device and generate the image configuration information.

또한, 클라이언트 디바이스와 연결된 외부의 컴퓨팅 디바이스가 영상 구성 정보를 생성할 수 있으며, 컴퓨팅 디바이스는 영상 구성 정보를 자신의 클라이언트 디바이스, 다른 클라이언트 디바이스, 및/또는 서버 디바이스로 전달할 수도 있다.In addition, an external computing device connected to the client device may generate image configuration information, and the computing device may communicate image configuration information to its client device, another client device, and / or a server device.

The following describes how the client device signals the image configuration information.

How the image configuration information (including viewpoint information and/or zoom area information) is signaled matters a great deal. If it is signaled too frequently, it can burden the client device, the server device, and/or the network as a whole.

The client device can therefore signal the image configuration information only when the user's image configuration information (or gaze information and/or zoom area information) changes. That is, the client device can transmit the user's gaze information to another client device and/or the server device only when that gaze information has changed.
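The change-only signaling described above can be sketched roughly as follows. This is a minimal illustration, not the specification's implementation: the `ViewInfo` fields, the `ChangeOnlySignaler` class, and the `tolerance_deg` threshold are all hypothetical names introduced here, and the specification does not say how a "change" is detected.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ViewInfo:
    """Hypothetical container for image configuration information."""
    yaw_deg: float       # horizontal gaze direction (assumed representation)
    pitch_deg: float     # vertical gaze direction (assumed representation)
    zoom_fov_deg: float  # field of view standing in for the zoom area


class ChangeOnlySignaler:
    """Forwards view information only when it differs from the last value sent."""

    def __init__(self, send: Callable[[ViewInfo], None], tolerance_deg: float = 0.0):
        self._send = send                 # e.g. a function that writes to the network
        self._tolerance = tolerance_deg   # assumed dead band; 0.0 means any change counts
        self._last: Optional[ViewInfo] = None

    def _changed(self, new: ViewInfo) -> bool:
        if self._last is None:
            return True  # nothing has been signaled yet
        old = self._last
        return (abs(new.yaw_deg - old.yaw_deg) > self._tolerance
                or abs(new.pitch_deg - old.pitch_deg) > self._tolerance
                or abs(new.zoom_fov_deg - old.zoom_fov_deg) > self._tolerance)

    def update(self, new: ViewInfo) -> bool:
        """Signal `new` if it changed; return True when a message was sent."""
        if not self._changed(new):
            return False
        self._send(new)
        self._last = new
        return True
```

Calling `update` once per sensor reading would then produce network traffic only for readings that actually move the gaze or zoom, which is the load reduction the paragraph above is after.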

The description above has focused on the client device generating and/or transmitting the image configuration information, but the server device may also receive sensing information from the client device, generate the image configuration information from it, and transmit the image configuration information to at least one client device.

The signaling mentioned above may be signaling among the server device, the client device, and/or an external computing device (if present). It may also be signaling between the client device and/or an external computing device (if present).

The following describes exemplary methods of transmitting high-quality and low-quality video.

Methods of transmitting high-quality or low-quality video based on the user's gaze information may include: switching layers of a scalable codec; rate control using the QP (Quantization Parameter) or similar for a single bitstream with real-time encoding; switching in units of chunks for a single bitstream such as DASH; downscaling/upscaling; and/or, in the case of rendering, high-quality rendering that uses more resources.

Although the exemplary technique described above is framed as differential transmission over scalable video, advantages such as lower overall bandwidth and fast response to the user's gaze movement can also be obtained with an ordinary single-layer video coding technique by adjusting the quantization parameter or the degree of downscaling/upscaling. In addition, when files transcoded in advance into bitstreams of several bitrates are used, the exemplary technique of this specification can switch between high-quality and low-quality video in units of chunks.

Although this specification uses a virtual reality system as its example, it applies equally to VR (Virtual Reality) games using an HMD, AR (Augmented Reality) games, and the like. That is, both providing high-quality video for the area the user is looking at, and signaling only when the user looks somewhere other than the area or object the user was expected to look at, apply just as they do in the virtual reality system example.

A technique that receives the whole video as one compressed video bitstream, decodes it, and renders the area the user is looking at in a virtual space receives the entire video (for example, 360-degree immersive video) as a bitstream. Because this video bitstream aggregates images that are each high-resolution, its total bandwidth is inevitably very large. To keep the total bandwidth from growing too large, scalable video technologies from the international video standards can be used, such as SVC and Scalable High Efficiency Video Coding, the scalable extension of HEVC.

FIG. 9 illustrates an exemplary scalable-video-coding-based streaming method for game video that prefetches the current viewport and the predicted viewport, which can raise video processing speed and lower bandwidth.

Referring to FIG. 9, the figure assumes a user playing a car racing game through cloud-based game streaming. Box 920 is the central part of the screen currently displayed and is the user's current viewport. Because this is the area the user is currently looking at, high image quality is required, so both the low-quality base layer video data of the scalable video coding and one or more high-quality enhancement layer video data are transmitted for it.

For the car racing game content in this example, the predicted viewport (910) can be expected to be the course directly ahead of the user. Prefetching this predicted viewport at high quality in advance therefore reduces latency.

For the area (930) outside the current viewport and the predicted viewport, only the low-quality base layer information is sent, which lowers the overall bandwidth. Here, the image quality and latency provided by the base layer must be kept at or above a certain level so as to reduce motion sickness, which is critical for virtual reality user services.

The main challenge in cloud-based game streaming is low-latency streaming. However, because a head-mounted display uses a wireless network, bandwidth fluctuation is always possible. If a particular area were fixed to always be transmitted at high quality, delivery of the game content could be delayed when the bandwidth suddenly drops.

FIG. 10 is a diagram showing an exemplary high-quality game streaming method based on a viewport prediction technique that uses the distance of objects in the video.

The technique presented in this specification predicts how likely each area is to become the user's viewport and gives high priority to the tiles contained in the predicted viewport. For example, distance information is used to determine which tiles contain many objects close to the user; priority is then assigned from the tiles with the most nearby objects down to those with the fewest, and high-quality video is transmitted starting from the highest-priority tile, which reduces bandwidth.

Referring to FIG. 10, high importance may be given to the tiles (1040, 1041, 1042) that contain objects close to the user (1010, 1011, 1012), medium importance to the tiles (1050, 1051) that contain objects at a medium distance from the user (1020, 1021), and low importance to the tiles (1060) that contain objects far from the user (1030, 1031). Once priorities have been assigned to the tiles in this way, if the bandwidth suddenly drops, high-quality video data is transmitted in order from high-priority tiles to low-priority tiles up to what the bandwidth allows, so that streaming continues without delay. In other words, the high-priority tiles are transmitted at high quality according to the available bandwidth. Here, the higher a tile's priority, the more likely the area containing that tile is to become the viewport.
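One way to picture the distance-based ranking of FIG. 10 is the sketch below. It is an illustration under stated assumptions: the specification gives no numeric thresholds, so `near_max` and `mid_max` are hypothetical, as are the function names and the choice to classify a tile by its nearest object and break ties by the number of nearby objects.

```python
from typing import Dict, List

# Importance classes; a lower value means a higher priority.
NEAR, MID, FAR = 0, 1, 2


def classify_tile(object_distances: List[float], near_max: float, mid_max: float) -> int:
    """Return the importance class of a tile from the distances of its objects."""
    if not object_distances:
        return FAR  # assumption: a tile with no objects gets the lowest importance
    nearest = min(object_distances)
    if nearest <= near_max:
        return NEAR
    if nearest <= mid_max:
        return MID
    return FAR


def rank_tiles(tiles: Dict[int, List[float]],
               near_max: float = 10.0,
               mid_max: float = 50.0) -> List[int]:
    """Order tile ids from highest to lowest priority.

    `tiles` maps a tile id to the distances (user to object) of the objects
    it contains. Within a class, tiles with more nearby objects come first.
    """
    def sort_key(tile_id: int):
        dists = tiles[tile_id]
        cls = classify_tile(dists, near_max, mid_max)
        near_count = sum(1 for d in dists if d <= near_max)
        return (cls, -near_count, tile_id)

    return sorted(tiles, key=sort_key)


# Tile ids borrow the reference numerals of FIG. 10; the distances are made up.
example = {1060: [120.0, 200.0], 1050: [30.0], 1040: [3.0, 5.0],
           1041: [8.0], 1051: [45.0], 1042: [2.0]}
ranking = rank_tiles(example)
```

With these made-up distances the near tiles (1040, 1041, 1042) rank ahead of the medium ones (1050, 1051), and the far tile (1060) comes last, matching the three importance levels described for the figure.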

For the distance-based tile prioritization technique illustrated here (higher priority meaning higher viewport likelihood), the high-quality and low-quality video can be produced with scalable video coding as described above, but any other technique capable of producing a difference in image quality can be applied as well.

The technique presented in this specification may be even more effective for streaming a role-playing game (RPG), in which the objects around the main character take priority.

The method presented in this specification uses scalable video coding to transmit the high-quality video as enhancement layer video and the low-quality video as base layer video. Without scalable video coding, however, the whole scene at basic quality can be sent as one video, and ordinary high-quality video can be used for the current viewport and the predicted viewport.

The method presented in this specification is applicable to other video parallel-processing techniques that support picture partitioning, for example slices and FMO (Flexible Macroblock Ordering). It can also be applied to streaming services that split the bitstream for delivery, such as MPEG DASH, Microsoft Smooth Streaming, and Apple HLS (HTTP Live Streaming).

The following describes, with reference to FIGS. 11 and 12, a method of reducing communication bandwidth in video streaming for a virtual reality space, such as game video streaming, by using current viewport information, predicted viewport information, and viewport priorities derived from distance information.

FIG. 11 illustrates an exemplary video transmission method in a video server for a video streaming service.

Referring to FIG. 11, in the video transmission method the video server first generates video data for a video streaming service for a virtual reality space (1101).

Next, the video server generates signaling data based at least in part on information about the current viewport the user is looking at in the virtual reality space and information about the predicted viewport the user is expected to look at (1103).

Next, the video server generates a bitstream containing the video data and the signaling data, and transmits the generated bitstream to the client device through the communication unit (1105).

The video data may include basic-quality video data for the entire virtual reality space and high-quality video data for the current viewport and the predicted viewport.

When scalable video coding is used, the video data may include base layer video data for the entire virtual reality space and at least one enhancement layer video data for the current viewport and the predicted viewport.

The video server can transmit high-quality video data for the areas corresponding to the current viewport and the predicted viewport, and only basic-quality video data for the areas outside the current viewport and the predicted viewport.

When the video data is scalable video data, the video server can transmit base layer video data together with enhancement layer video data for the areas corresponding to the current viewport and the predicted viewport, and only base layer video data for the areas outside them.
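The per-area layer choice just described can be expressed as a small lookup. This is only a sketch: `select_layers` and the string labels `"base"` and `"enhancement"` are hypothetical, and areas are identified here by tile id for convenience even though the specification allows the base layer itself to be coded without tiling.

```python
from typing import Dict, Iterable, List


def select_layers(all_tiles: Iterable[int],
                  current_viewport: Iterable[int],
                  predicted_viewport: Iterable[int]) -> Dict[int, List[str]]:
    """Map each tile id to the list of layers to transmit for that area.

    Areas in the current or predicted viewport get base + enhancement;
    every other area gets the base layer only.
    """
    high_quality = set(current_viewport) | set(predicted_viewport)
    return {
        tile: ["base", "enhancement"] if tile in high_quality else ["base"]
        for tile in all_tiles
    }


# Six areas, current viewport covers 2 and 3, predicted viewport covers 4.
plan = select_layers(range(6), current_viewport=[2, 3], predicted_viewport=[4])
```

In this example areas 2, 3, and 4 carry both layers while areas 0, 1, and 5 carry the base layer alone, which is where the bandwidth saving comes from.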

The at least one enhancement layer video data may be divided, layer by layer, into at least one rectangular tile, and the basic-quality (low-quality) video data and the high-quality video data may each be divided into at least one rectangular tile.

The signaling data may include tile information identifying at least one tile included in the current viewport and the predicted viewport.

The video transmission method in the video server may also determine whether the bandwidth of the communication line carrying the video data is sufficient to transmit all of the enhancement layer video data or high-quality video data (1107).

If the bandwidth of the communication line is determined to be insufficient, the tiles in the viewports can be transmitted according to priority (1109); if it is determined to be sufficient, all tiles in the current viewport and the predicted viewport can be transmitted (1111).

In priority-ordered transmission, the enhancement layer video data or high-quality video data can be transmitted in order from the highest-priority tile to the lowest, up to what the bandwidth allows.
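Steps 1107 to 1111 can be sketched as a budgeted selection over tiles that are already in priority order. The function name, the per-tile sizes, and the notion of a per-interval `budget_bits` are assumptions made for illustration; the specification does not define how the bandwidth limit is measured, nor whether a smaller low-priority tile may be sent when a larger high-priority one does not fit (this sketch stops at the first tile that does not fit).

```python
from typing import List, Tuple


def plan_enhancement_tiles(ranked_tiles: List[Tuple[int, int]],
                           budget_bits: int) -> List[int]:
    """Choose which high-quality tiles to send within a bandwidth budget.

    `ranked_tiles` holds (tile_id, size_in_bits) pairs, highest priority first.
    """
    total = sum(size for _, size in ranked_tiles)
    if total <= budget_bits:
        # Step 1107 -> 1111: bandwidth is sufficient, send every viewport tile.
        return [tile_id for tile_id, _ in ranked_tiles]

    # Step 1107 -> 1109: bandwidth is insufficient, send in priority order
    # until the next tile would exceed the budget.
    chosen: List[int] = []
    used = 0
    for tile_id, size in ranked_tiles:
        if used + size > budget_bits:
            break
        chosen.append(tile_id)
        used += size
    return chosen


# Hypothetical sizes; tile ids reuse the reference numerals of FIG. 10.
ranked = [(1040, 400), (1041, 300), (1050, 500), (1060, 200)]
```

Tiles left out of the result would still be covered by the basic-quality or base layer data, so the picture stays complete and only the quality of the lower-priority areas drops.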

The priority is determined by the distance within the virtual reality video from the user to the objects in a tile, such as buildings, particular items, game characters, or the road being driven, and the closer an object is to the user, the higher the priority that can be given to the tile containing it. That is, the closer the objects in a tile are to the user, the more likely the tiles containing them are to become the predicted viewport.

The predicted viewport may also be determined not only from the distance between the user and the objects in a tile, as described above, but also from information about the current viewport, including distance and direction from the current viewport, and from the content of the virtual reality material.

The enhancement layer video data may include high-quality video data for the areas, within the entire virtual reality space, that correspond to the current viewport and the predicted viewport.

The signaling data may be generated based on the image configuration information.

The image configuration information may include gaze information indicating the user's viewport in the virtual reality space and zoom area information indicating the user's viewing angle.

The signaling data may be transmitted through at least one of a high-level syntax protocol carrying session information, SEI (Supplemental Enhancement Information), VUI (Video Usability Information), a slice header, and a file describing the video data.

FIG. 12 illustrates an exemplary video reception method in a client device for a video streaming service.

Referring to FIG. 12, in the video reception method the client device, such as an HMD, can receive a bitstream containing video data and signaling data for a virtual reality space (1201). The client device can decode basic-quality video data based on the video data (1203). The client device can also decode high-quality video data based on the video data and the signaling data (1205).

The signaling data may include at least part of the information about the current viewport, the area the user is looking at in the virtual reality space, and about the predicted viewport the user is expected to look at in the virtual reality space.

The high-quality video data may include high-quality video data corresponding to the current viewport and the predicted viewport.

The basic-quality video data may include basic-quality (low-quality) video data for the entire virtual reality space.

Throughout this specification, basic-quality or low-quality video data may have image quality at or above a certain level, sufficient not to cause the user discomfort such as motion sickness in a virtual reality service.

The high-quality video data may include video data for the areas, within the entire space, that correspond to the current viewport and the predicted viewport.

The high-quality video data may be divided into at least one rectangular tile.

The signaling data may include tile information identifying at least one tile included in the current viewport and the predicted viewport.

The signaling data may be generated based on the image configuration information, and the image configuration information may include gaze information indicating the user's viewport in the virtual reality space and zoom area information indicating the user's viewing angle.

The signaling data may be transmitted through at least one of a high-level syntax protocol carrying session information, SEI (Supplemental Enhancement Information), VUI (Video Usability Information), a slice header, and a file describing the video data.

In scalable video coding, the basic-quality video data and high-quality video data described above may be base layer video data and enhancement layer video data, respectively, and the base layer video data can be decoded based on the received video data.

At least one enhancement layer video data can also be decoded based on the video data and the signaling data.

FIG. 13 illustrates an exemplary video transmission apparatus for a video streaming service.

Referring to FIG. 13, the video transmission apparatus (1300) comprises an encoder (1310), a signaling unit (1320), a multiplexer (1330), and a transmission unit (1340).

The encoder (1310) can generate video data for a video streaming service for a virtual reality space.

The signaling unit (1320) can generate signaling data based at least in part on information about the current viewport the user is looking at in the virtual reality space and information about the predicted viewport the user is expected to look at.

The multiplexer (1330) can generate a bitstream containing the video data and the signaling data.

The transmission unit (1340) can transmit the bitstream containing the video data and the signaling data.

The video data may include basic-quality video data or base layer video data for the entire virtual reality space, and high-quality video data or at least one enhancement layer video data for the areas, within the entire virtual reality space, that correspond to the current viewport and the predicted viewport.

The video transmission apparatus (1300) can transmit high-quality video data, or base layer video data together with enhancement layer video data, for the areas corresponding to the current viewport and the predicted viewport, and only basic-quality video data or base layer video data for the areas outside the current viewport and the predicted viewport.

The following describes the viewport and object distance information signaling scheme with reference to FIGS. 14 to 18.

A virtual reality content creator configures the signaling so that the objects the user's gaze is 'expected to be drawn to' are known in advance, allowing the video data to be prefetched up to the enhancement layer before the user's gaze turns toward them. Priority information for each tile is also transmitted so that it can be used.

The base layer may be encoded as a whole, without tiling, for fast user response time. One or more enhancement layers are encoded with part or all of the layer divided into multiple tiles as needed. Here, the viewport may be where the user's gaze is directed, or the tile position where an important object will be located in the virtual space the user is about to see. Because the tile numbers of a viewport are contiguous, they can be compressed effectively without sending every number (for example, as tile start and end numbers, coordinate point information, a list of coding unit (CU) numbers within tiles, or a formula that expresses the tile numbers).
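Of the compression options listed above, the start-and-end-number form is the simplest to illustrate. The sketch below is an assumption about how such a representation could look, not the encoding used by the specification; `to_ranges` and `from_ranges` are hypothetical helper names.

```python
from typing import Iterable, List, Tuple


def to_ranges(tile_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse tile numbers into inclusive (start, end) runs."""
    ranges: List[Tuple[int, int]] = []
    for tile in sorted(set(tile_ids)):
        if ranges and tile == ranges[-1][1] + 1:
            # Extends the current run by one tile.
            ranges[-1] = (ranges[-1][0], tile)
        else:
            # Starts a new run.
            ranges.append((tile, tile))
    return ranges


def from_ranges(ranges: Iterable[Tuple[int, int]]) -> List[int]:
    """Expand inclusive (start, end) runs back into individual tile numbers."""
    return [tile for start, end in ranges for tile in range(start, end + 1)]


viewport_tiles = [5, 6, 7, 8, 12, 13]
compact = to_ranges(viewport_tiles)
```

Here six tile numbers shrink to two pairs, and the saving grows with the length of each contiguous run; a viewport that spans several tile rows would typically yield one run per row.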

FIG. 14 illustrates exemplary signaling of predicted viewport and object distance information.

Referring to FIG. 14, the scalable video content contains base layer and enhancement layer video data, and the enhancement layer encoded at the server device is signaled based on current viewport information, predicted viewport information, and distance information.

As illustrated, this signaling may be carried by a high-level syntax protocol that carries session information, at the packet level in the SEI, VUI, or slice header of a video standard, or in a separate file describing the video file (for example, the MPD of DASH).

With the signaling presented in this specification, only specific tiles of the enhancement layer or high-quality video are received preferentially, which reduces overall latency, and only part of the picture is processed at high quality depending on bandwidth conditions, which can eliminate delays. This secures a fast response time for the HMD user and reduces motion sickness.

FIG. 15 shows an exemplary SEI payload syntax proposed for signaling predicted viewport and object distance information.

Referring to FIG. 15, "expected_tile_info" is shown as an example of SEI (Supplemental Enhancement Information) message payload syntax in an international video standard such as H.264 AVC or H.265 HEVC.

If the proposed syntax is assigned number 188, the syntax at reference numeral 1500 in the figure is newly added as an embodiment of this specification, and all other syntax is existing standard syntax.

FIG. 16 shows an exemplary per-video-picture viewport signaling specification.

FIG. 17 shows an exemplary signaling specification per file, chunk, and group of video pictures.

unsigned (n) denotes, as in ordinary programming languages, an unsigned number of 'n' bits.

The version_info syntax carries the version of the signaling specification and is expressed as unsigned 8-bit information.

The file_size syntax carries the file size and is expressed as unsigned 64-bit information.

The poc_num syntax carries the POC (Picture Order Count) information of a video standard such as HEVC, which is similar in meaning to the frame number in the earlier H.264 AVC standard. It is expressed as unsigned 32-bit information.

The info_mode syntax is the 'information mode' defined in this standard and is expressed as unsigned 4-bit information. A value of 0 indicates that the information is the same as the previous signaling information, 1 indicates the tile ids included in each predicted viewport, 2 indicates the distance information for the tiles included in each predicted viewport, and 3 indicates the viewport ids and tile ids being transmitted.
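The four info_mode values can be captured as an enumeration, sketched below. The member names are invented for readability, and treating the remaining 4-bit values (4 to 15) as invalid is an assumption, since the text only defines 0 to 3.

```python
from enum import IntEnum


class InfoMode(IntEnum):
    """info_mode values as described in the text; member names are illustrative."""
    SAME_AS_PREVIOUS = 0         # same as the previous signaling information
    VIEWPORT_TILE_IDS = 1        # tile ids in each predicted viewport
    VIEWPORT_TILE_DISTANCES = 2  # distance info for tiles in each predicted viewport
    TRANSMITTED_IDS = 3          # viewport ids and tile ids being transmitted


def parse_info_mode(value: int) -> InfoMode:
    """Validate a 4-bit field and convert it to an InfoMode.

    Raises ValueError for values outside 0..15 and, by assumption, for the
    values 4..15 that the text leaves undefined.
    """
    if not 0 <= value <= 0xF:
        raise ValueError(f"info_mode must fit in 4 bits, got {value}")
    return InfoMode(value)
```

A parser could branch on the returned member to decide which of the list fields to read next.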

The viewport_num syntax element indicates the number of predicted viewports and is represented as an unsigned 8-bit value.

The tile_num syntax element indicates the number of tiles in the picture and is represented as an unsigned 12-bit value.

The tile_id_list_in_viewport[] syntax element is the list of tile numbers within a viewport, each represented as an unsigned 12-bit value.

The tile_distance_list_in_viewport[] syntax element is the list of distance information for each tile within a viewport, and each distance is represented as an unsigned 16-bit value.

The viewport_id_list_trans[] syntax element is the list of viewport numbers being transmitted, each represented as an unsigned 12-bit value.

The tile_id_list_trans[] syntax element is the list of tile numbers being transmitted, each represented as an unsigned 12-bit value.

The user_info_flag syntax element is the flag of the additional user information mode. It indicates, as an unsigned 1-bit value, whether there is tile-related information that the user additionally wishes to transmit. A value of 0 indicates that there is no additional user information, and a value of 1 indicates that there is additional user information.

The user_info_size syntax element indicates the length of the additional user information and is represented as an unsigned 16-bit value.

The user_info_list[] syntax element is the list of additional user information, and each item is represented with a variable number of unsigned bits.
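
The bit widths above can be illustrated with a small bit-packing sketch in Python. The field order, the choice of fields for one record, the use of tile_num as the count of listed tiles, and all class and function names are assumptions made for illustration, because the layouts of FIG. 16 and FIG. 17 are not reproduced in the text. Only the bit widths and the info_mode values come from the description.

```python
from enum import IntEnum
from typing import List


class InfoMode(IntEnum):
    """info_mode values as described; the member names are hypothetical."""
    SAME_AS_PREVIOUS = 0
    TILE_IDS_PER_VIEWPORT = 1
    TILE_DISTANCES_PER_VIEWPORT = 2
    TRANSMITTED_VIEWPORT_AND_TILE_IDS = 3


class BitWriter:
    """Appends unsigned(n) fields most-significant bit first."""

    def __init__(self) -> None:
        self._value = 0
        self._nbits = 0

    def write(self, value: int, nbits: int) -> None:
        if not 0 <= value < (1 << nbits):
            raise ValueError(f"{value} does not fit in unsigned({nbits})")
        self._value = (self._value << nbits) | value
        self._nbits += nbits

    def to_bytes(self) -> bytes:
        pad = (-self._nbits) % 8  # zero-pad up to a whole number of bytes
        return (self._value << pad).to_bytes((self._nbits + pad) // 8, "big")


def pack_mode3(poc_num: int, viewport_ids: List[int], tile_ids: List[int]) -> bytes:
    """Pack one record for info_mode 3 (assumed field order)."""
    w = BitWriter()
    w.write(poc_num, 32)                                         # poc_num
    w.write(InfoMode.TRANSMITTED_VIEWPORT_AND_TILE_IDS, 4)       # info_mode
    w.write(len(viewport_ids), 8)                                # viewport_num
    w.write(len(tile_ids), 12)                                   # tile_num
    for viewport_id in viewport_ids:
        w.write(viewport_id, 12)                                 # viewport_id_list_trans[]
    for tile_id in tile_ids:
        w.write(tile_id, 12)                                     # tile_id_list_trans[]
    w.write(0, 1)                                                # user_info_flag
    return w.to_bytes()


record = pack_mode3(0, [1, 2], [4, 5, 12, 14, 17, 22])
```

With two viewport ids and six tile ids the record occupies 153 bits, which the sketch pads with zero bits to 20 bytes.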

The syntax and semantics defined above may also each be expressed in XML form in HTTP-based video communication such as MPEG DASH.

FIG. 18 illustrates exemplary tile information syntax expressed in XML form.

Referring to FIG. 18, an example is shown that expresses, in XML form, the information mode (info_mode = "3"), the additional user-defined mode flag (user_info_flag = "0"), the number of viewports (viewport_num = "2"), the number of tiles (tile_num = "6"), the transmitted viewport numbers (viewport_id_list_trans = "1 2"), and the transmitted tile numbers (tile_id_list_trans = "4 5 12 14 17 22").
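
A minimal Python sketch of producing and reading such an XML description is given below, using the standard-library xml.etree.ElementTree module. The element name "TileInfo" and the choice of XML attributes to carry the values are assumptions, since the exact markup of FIG. 18 is not reproduced in the text. Only the field names and example values come from the description.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_tile_info(viewport_ids: List[int], tile_ids: List[int]) -> ET.Element:
    """Build a hypothetical <TileInfo> element for info_mode 3."""
    return ET.Element("TileInfo", {
        "info_mode": "3",
        "user_info_flag": "0",
        "viewport_num": str(len(viewport_ids)),
        "tile_num": str(len(tile_ids)),
        "viewport_id_list_trans": " ".join(str(v) for v in viewport_ids),
        "tile_id_list_trans": " ".join(str(t) for t in tile_ids),
    })


def parse_tile_ids(xml_text: str) -> List[int]:
    """Read the space-separated list of transmitted tile ids back out."""
    element = ET.fromstring(xml_text)
    return [int(token) for token in element.get("tile_id_list_trans", "").split()]


element = build_tile_info([1, 2], [4, 5, 12, 14, 17, 22])
xml_text = ET.tostring(element, encoding="unicode")
tile_ids = parse_tile_ids(xml_text)
```

A client could use the parsed tile ids to decide which high-quality tile segments to request.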

Although the virtual reality video streaming methods presented in this specification are described in terms of scalable video and differentiated transmission based on viewport and distance information, they are also applicable to other parallel video processing techniques that support picture partitioning, such as slices and FMO (Flexible Macroblock Ordering). They are likewise applicable to streaming services that split and transmit a bitstream, such as MPEG DASH, Microsoft's Smooth Streaming, and Apple's HLS (HTTP Live Streaming).
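
The distance-based differentiated transmission referred to here can be sketched as follows in Python. The sketch assumes that each tile carries one distance value (from the user to the object in the tile), that a smaller distance means a higher priority, and that transmission stops at the first tile that no longer fits the available budget. The Tile class, the size_bits field, and the budget model are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tile:
    tile_id: int
    distance: int   # distance from the user to the object in the tile
    size_bits: int  # hypothetical size of the tile's high-quality data


def select_tiles(tiles: List[Tile], budget_bits: int) -> List[int]:
    """Return tile ids to send, nearest objects first, within the budget."""
    chosen: List[int] = []
    remaining = budget_bits
    # sorted() is stable, so tiles at equal distance keep their input order.
    for tile in sorted(tiles, key=lambda t: t.distance):
        if tile.size_bits > remaining:
            break  # strict priority order: stop at the first tile that does not fit
        chosen.append(tile.tile_id)
        remaining -= tile.size_bits
    return chosen


tiles = [Tile(4, 30, 100), Tile(5, 10, 100), Tile(12, 20, 100)]
partial = select_tiles(tiles, 250)   # bandwidth insufficient for all tiles
complete = select_tiles(tiles, 300)  # bandwidth sufficient for all tiles
```

When the budget covers every tile, all tiles are sent. Otherwise the tiles whose objects are farthest from the user are dropped first, and those areas fall back to the basic-quality video.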

The virtual reality system according to the embodiments disclosed herein can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording devices in which data readable by a computer system is stored. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers in the technical field to which this specification belongs.

Preferred embodiments of the technology of this specification have been described above with reference to the accompanying drawings. The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings, but should be construed with meanings and concepts consistent with the technical idea of the present invention.

The scope of the present invention is not limited to the embodiments disclosed herein, and the present invention may be modified, changed, or improved in various forms within the spirit of the present invention and the scope set forth in the claims.

1300: Video transmission device
1310: Encoder
1320: Signaling unit
1330: Multiplexer
1340: Transmission unit

Claims

Generating video data for a video streaming service for a virtual reality space;
Generating signaling data based at least in part on information about a current viewport the user is viewing within the virtual reality space and information about a predicted viewport that the user is expected to view;
Transmitting a bitstream including the video data and the signaling data, wherein the video data includes basic quality video data for the entire virtual reality space and high-definition video data for the current viewport and the predicted viewport, and the high-definition video data is divided into at least one rectangular tile;
Determining whether a bandwidth of a communication line for transmitting the video data is sufficient to transmit all of the high-definition video data; and
If the bandwidth is determined to be insufficient, transmitting the at least one tile in priority order,
Wherein the high-definition video data is transmitted for the area corresponding to the current viewport and the predicted viewport,
Wherein only the basic quality video data is transmitted for areas other than the current viewport and the predicted viewport, and
Wherein the signaling data comprises tile information identifying the at least one tile included in the current viewport and the predicted viewport.

delete

The method according to claim 1,
Wherein, if the bandwidth is determined to be insufficient, the operation of transmitting the at least one tile in priority order
Transmits the high-definition video data in order from the tile with the highest priority to the tile with the lowest priority.

5. The method of claim 4,
Wherein the priority is determined according to a distance from the user to an object in the tile,
And a higher priority is assigned to a tile the closer the object it contains is to the user.

The method according to claim 1,
Wherein the prediction viewport is determined based at least in part on information about the current viewport and content of a virtual reality content.

The method according to claim 1,
Wherein the high-definition video data includes video data for an area corresponding to the current viewport and the prediction viewport in the entire area of the virtual reality space.

6. The method of claim 5,
Wherein the signaling data is generated based on image configuration information,
Wherein the image configuration information includes sight line information indicating a viewport of the user and zoom area information indicating a viewing angle of the user in the virtual reality space.

The method according to claim 1,
Wherein the signaling data is transmitted through at least one of a High-Level Syntax Protocol, SEI (Supplemental Enhancement Information), VUI (Video Usability Information), a slice header, and a file describing the video data.

6. The method of claim 5,
Wherein the base picture quality video comprises base layer video,
Wherein the high definition video comprises a base layer video and an enhancement layer video.

Receiving a bitstream comprising video data and signaling data for a virtual reality space;
Decoding basic picture quality video data based on the video data; and
Decoding the high definition video data based on the video data and the signaling data,
Wherein the signaling data includes at least a portion of information about a current viewport for a region that the user is viewing within the virtual reality space and for a predicted viewport that the user is expected to see within the virtual reality space,
Wherein the high definition video data comprises video data corresponding to the current viewport and the prediction viewport,
Wherein the basic picture quality video data includes video data for the entire virtual reality space region,
Wherein the high definition video data includes video data for an area corresponding to the current viewport and the prediction viewport in the entire area,
Wherein the high-definition video data is divided into at least one tile of rectangular shape,
Wherein the signaling data includes tile information identifying the at least one tile included in the current viewport and the predictive viewport,
Wherein the signaling data is generated based on image configuration information,
Wherein the image configuration information includes sight line information indicating a viewport of the user in the virtual reality space and zoom area information indicating a viewing angle of the user.

delete

12. The method of claim 11,
Wherein the signaling data is transmitted through at least one of a High-Level Syntax Protocol, SEI (Supplemental Enhancement Information), VUI (Video Usability Information), a slice header, and a file describing the video data.

12. The method of claim 11,
Wherein the base picture quality video comprises base layer video,
Wherein the high definition video comprises a base layer video and an enhancement layer video.

An encoder for generating video data for a video streaming service for a virtual reality space;
A signaling unit for generating signaling data based at least in part on information about a current viewport the user is viewing in the virtual reality space and information on a prediction viewport that the user is expected to view;
A multiplexer for generating a bitstream including the video data and the signaling data; and
A transmitter for transmitting the bitstream,
Wherein the video data includes basic picture quality video data for the entire virtual reality space region and high-definition video data for an area corresponding to the current viewport and the predicted viewport within the entire region,
Wherein the high-definition video data is transmitted for the area corresponding to the current viewport and the predicted viewport,
Wherein only the basic picture quality video data is transmitted for areas other than the current viewport and the predicted viewport,
Wherein the high-definition video data is divided into at least one rectangular tile,
Wherein the signaling data includes tile information identifying the at least one tile included in the current viewport and the predicted viewport, and
Wherein the transmitter transmits the at least one tile in priority order when it is determined that a bandwidth of a communication line for transmitting the video data is not sufficient to transmit all of the high-definition video data.

18. The method of claim 17,
Wherein the base picture quality video comprises base layer video,
Wherein the high definition video comprises a base layer video and an enhancement layer video.