KR20180035089A

KR20180035089A - Providing virtual reality service considering region of interest

Info

Publication number: KR20180035089A
Application number: KR1020160125145A
Authority: KR
Inventors: 류은석
Original assignee: 가천대학교 산학협력단
Priority date: 2016-09-28
Filing date: 2016-09-28
Publication date: 2018-04-05
Anticipated expiration: 2036-09-28
Also published as: WO2018062641A1; KR101861929B1

Abstract

본 명세서는 가상 현실 서비스를 위한 비디오 데이터를 포함하는 비트스트림을 수신하는 단계, 상기 비디오 데이터는 기본 계층을 위한 기본 계층 비디오 데이터 및 상기 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 적어도 하나의 향상 계층 비디오 데이터를 포함하고; 상기 기본 계층 비디오 데이터를 디코딩하는 단계; 및 상기 기본 계층 비디오 데이터를 기초로 상기 적어도 하나의 향상 계층 비디오 데이터를 디코딩하는 단계를 포함하되, 상기 적어도 하나의 향상 계층 비디오 데이터는 가상 공간 내에서 적어도 하나의 관심 영역을 위한
비디오 데이터인 영상 수신 방법을 개시한다.The present disclosure relates to a method and apparatus for receiving a bitstream comprising video data for a virtual reality service, the video data comprising at least one enhancement layer for base layer video data for a base layer and at least one enhancement layer predicted from the base layer Layer video data; Decoding the base layer video data; And decoding the at least one enhancement layer video data based on the base layer video data, wherein the at least one enhancement layer video data includes at least one enhancement layer video data for at least one region of interest
A method for receiving video data is disclosed.

Description

PROVIDING VIRTUAL REALITY SERVICE CONSIDERING REGION OF INTEREST [0002]

본 명세서는 관심 영역을 고려한 가상 현실 서비스 제공에 관한 것이다.This specification relates to providing a virtual reality service in consideration of a region of interest.

최근 가상현실(Virtual Reality, VR) 기술 및 장비가 발전함에 따라 다양한 서비스가 실현되고 있다. 화상 회의 서비스는 가상 현실 기술을 기초로 구현되는 서비스의 예이다. 사용자가 화상 회의를 위하여 회의 참가자의 영상 정보를 포함한 멀티미디어 데이터를 처리하는 장치를 사용할 수 있다.Recently, various services have been realized as the technology and equipment of virtual reality (VR) have developed. Video conferencing services are examples of services implemented on the basis of virtual reality technology. A user may use a device for processing multimedia data including video information of a conference participant for video conferencing.

본 명세서는 가상 현실 내의 관심 영역 정보를 고려한 영상 처리를 제공한다.The present specification provides image processing that considers region of interest information within a virtual reality.

또한, 본 명세서는 사용자의 시선 정보에 따라 서로 다른 품질의 영상 처리를 제공한다.In addition, the present specification provides image processing of different quality according to the user's gaze information.

또한, 본 명세서는 사용자의 시선의 변동에 반응하는 영상 처리를 제공한다.The present specification also provides image processing responsive to variations in the user's gaze.

또한, 본 명세서는 사용자의 시선 변동에 대응하는 시그널링을 제공한다.In addition, the present specification provides signaling corresponding to a user's gaze variation.

본 명세서에 개시된 일 실시예에 따른 영상 수신 장치는 가상 현실 서비스를 위한 비디오 데이터를 포함하는 비트스트림을 수신하는 통신부, 상기 비디오 데이터는 기본 계층을 위한 기본 계층 비디오 데이터 및 상기 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 적어도 하나의 향상 계층 비디오 데이터를 포함하고; 상기 기본 계층 비디오 데이터를 디코딩하는 기본 계층 디코더; 및 상기 기본 계층 비디오 데이터를 기초로 상기 적어도 하나의 향상 계층 비디오 데이터를 디코딩하는 향상 계층 디코더를 포함하고, 상기 적어도 하나의 향상 계층 비디오 데이터는 가상 공간 내에서 적어도 하나의 관심 영역을 위한 비디오 데이터일 수 있다.According to another aspect of the present invention, there is provided an image receiving apparatus including a communication unit for receiving a bitstream including video data for a virtual reality service, the video data including base layer video data for a base layer and at least At least one enhancement layer video data for one enhancement layer; A base layer decoder for decoding the base layer video data; And an enhancement layer decoder for decoding the at least one enhancement layer video data based on the base layer video data, wherein the at least one enhancement layer video data is video data for at least one region of interest .

또한, 본 명세서에 개시된 다른 실시예에 따른 영상 수신 장치는 기본 계층을 위한 기본 계층 비디오 데이터 및 상기 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 적어도 하나의 향상 계층 비디오 데이터를 수신하는 통신부; 상기 기본 계층 비디오 데이터를 디코딩하는 제1 프로세서; 및 상기 제1 프로세서와 전기적으로 연결되어, 상기 기본 계층 비디오 데이터를 기초로 상기 적어도 하나의 향상 계층 비디오 데이터를 디코딩하는 제2 프로세서를 포함하되, 상기 적어도 하나의 향상 계층 비디오 데이터는 가상 공간 내에서 적어도 하나의 관심 영역을 위한 비디오 데이터일 수 있다.According to another aspect of the present invention, there is provided an image receiving apparatus including: a communication unit for receiving base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer; A first processor for decoding the base layer video data; And a second processor, electrically coupled to the first processor, for decoding the at least one enhancement layer video data based on the base layer video data, wherein the at least one enhancement layer video data is stored in the virtual space And may be video data for at least one region of interest.

또한, 본 명세서에 개시된 다른 실시예에 따른 영상 전송 장치는 기본 계층 비디오 데이터를 생성하는 기본 계층 인코더; 상기 기본 계층 비디오 데이터를 기초로 적어도 하나의 향상 계층 비디오 데이터를 생성하는 향상 계층 인코더; 및 가상 현실 서비스를 위한 비디오 데이터를 포함하는 비트스트림을 전송하는 통신부를 포함하고, 상기 비디오 데이터는 기본 계층을 위한 상기 기본 계층 비디오 데이터 및 상기 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 상기 적어도 하나의 향상 계층 비디오 데이터를 포함하고, 상기 적어도 하나의 향상 계층 비디오 데이터는 가상 공간 내에서 적어도 하나의 관심 영역을 위한 비디오 데이터일 수 있다.According to another aspect of the present invention, there is provided an image transmission apparatus including: a base layer encoder for generating base layer video data; An enhancement layer encoder for generating at least one enhancement layer video data based on the base layer video data; And a communication unit for transmitting a bitstream including video data for a virtual reality service, the video data comprising at least the base layer video data for a base layer and the at least one enhancement layer for at least one enhancement layer predicted from the base layer. One enhancement layer video data, and the at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.

또한, 본 명세서에 개시된 다른 실시예에 다른 영상 수신 방법은 가상 현실 서비스를 위한 비디오 데이터를 포함하는 비트스트림을 수신하는 단계, 상기 비디오 데이터는 기본 계층을 위한 기본 계층 비디오 데이터 및 상기 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 적어도 하나의 향상 계층 비디오 데이터를 포함하고; 상기 기본 계층 비디오 데이터를 디코딩하는 단계; 및 상기 기본 계층 비디오 데이터를 기초로 상기 적어도 하나의 향상 계층 비디오 데이터를 디코딩하는 단계를 포함하고, 상기 적어도 하나의 향상 계층 비디오 데이터는 가상 공간 내에서 적어도 하나의 관심 영역을 위한 비디오 데이터일 수 있다.In addition, another image receiving method according to another embodiment disclosed herein includes receiving a bitstream including video data for a virtual reality service, the video data including base layer video data for a base layer and prediction At least one enhancement layer video data for at least one enhancement layer to be enhanced; Decoding the base layer video data; And decoding the at least one enhancement layer video data based on the base layer video data, wherein the at least one enhancement layer video data may be video data for at least one region of interest within the virtual space .

또한, 본 명세서에 개시된 다른 실시예에 따른 영상 전송 방법은 기본 계층 비디오 데이터를 생성하는 단계; 상기 기본 계층 비디오 데이터를 기초로 적어도 하나의 향상 계층 비디오 데이터를 생성하는 단계; 및 가상 현실 서비스를 위한 비디오 데이터를 포함하는 비트스트림을 전송하는 단계를 포함하고, 상기 비디오 데이터는 기본 계층을 위한 상기 기본 계층 비디오 데이터 및 상기 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 상기 적어도 하나의 향상 계층 비디오 데이터를 포함하고, 상기 적어도 하나의 향상 계층 비디오 데이터는 가상 공간 내에서 적어도 하나의 관심 영역을 위한 비디오 데이터일 수 있다.According to another aspect of the present invention, there is provided an image transmission method including: generating base layer video data; Generating at least one enhancement layer video data based on the base layer video data; And transmitting a bitstream comprising video data for a virtual reality service, wherein the video data comprises at least the base layer video data for a base layer and the at least one enhancement layer for at least one enhancement layer predicted from the base layer. One enhancement layer video data, and the at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.

본 명세서에 개시된 기술에 따르면, 영상 처리 장치는 사용자의 시선을 기초로 서로 다른 영상 처리 방법을 적용할 수 있다. 또한 본 명세서에 개시된 기술에 따르면, 사용자의 시선 정보를 고려한 영상 처리 방법에 의하여, 화상 회의 장치, 예를 들어 HMD, 착용자가 느끼는 화질의 변화는 최소화하며, 영상 전송을 위한 대역폭(BW) 절약, 영상 처리 성능의 개선을 통한 소모 전력 감소 등의 효과가 있다.According to the technique disclosed in this specification, the image processing apparatus can apply different image processing methods based on the user's gaze. According to the technique disclosed in this specification, the image processing method considering the user's gaze information minimizes the change in the image quality felt by the video conferencing device, for example, the HMD and the wearer, and saves bandwidth (BW) And reduction of power consumption by improving image processing performance.

도 1은 예시적인 화상 회의 시스템을 나타낸 도면이다.
도 2는 예시적인 화상 회의 서비스를 나타낸 도면이다.
도 3은 예시적인 스케일러블 비디오 코딩 서비스를 나타낸 도면이다.
도 4는 서버 디바이스의 예시적인 구성을 나타낸 도면이다.
도 5는 인코더의 예시적인 구조를 나타낸 도면이다.
도 6은 스케일러블 비디오 코딩을 이용한 예시적인 화상 회의 서비스를 나타낸 도면이다.
도 7은 예시적인 영상 전송 방법을 나타낸 도면이다.
도 8은 관심 영역을 시그널링하는 예시적인 방법을 나타낸 도면이다.
도 9는 클라이언트 디바이스의 예시적인 구성을 나타낸 도면이다.
도 10은 제어부의 예시적인 구성을 나타낸 도면이다.
도 11은 디코더의 예시적인 구성을 나타낸 도면이다.
도 12은 영상 구성 정보를 생성 및/또는 전송하는 예시적인 방법을 나타낸 도면이다.
도 13은 클라이언트 디바이스가 영상 구성 정보를 시그널링 하는 예시적인 방법을 나타낸 도면이다.
도 14는 높고/낮은 수준의 영상을 전송하는 예시적인 방법을 나타낸 도면이다.
도 15는 예시적인 영상 복호화 방법을 나타낸 도면이다.
도 16은 예시적인 영상 부호화 방법을 나타낸 도면이다.
도 17은 관심 영역 정보의 예시적인 신택스를 나타낸 도면이다.
도 18은 XML 포맷의 예시적인 관심 영역 정보, 및 예시적인 SEI 메시지를 나타낸 도면이다..
도 19는 클라이언트 디바이스의 예시적인 프로토콜 스택을 도시한 도면이다.
도 20은 SLT 와 SLS (service layer signaling) 의 예시적인 관계를 도시한 도면이다.
도 21은 예시적인 SLT 를 도시한 도면이다.
도 22는 serviceCategory 속성의 예시적인 코드 벨류를 나타낸 도면이다.
도 23은 예시적인 SLS 부트스트래핑과 예시적인 서비스 디스커버리 과정을 도시한 도면이다.
도 24는 ROUTE/DASH 를 위한 예시적인 USBD/USD 프래그먼트를 도시한 도면이다.
도 25는 ROUTE/DASH 를 위한 예시적인 S-TSID 프래그먼트를 도시한 도면이다.
도 26은 예시적인 MPD 프래그먼트를 나타낸 도면이다.
도 27은 가상 현실 서비스를 복수의 ROUTE 세션을 통해서 수신하는 예시적인 과정을 나타낸 도면이다.
도 28는 클라이언트 디바이스의 예시적인 구성을 나타낸 도면이다.
도 29는 서버 디바이스의 예시적인 구성을 나타낸 도면이다.
도 30은 클라이언트 디바이스의 예시적인 동작을 나타낸 도면이다.
도 31은 서버 디바이스의 예시적인 동작을 나타낸 도면이다.1 is a diagram illustrating an exemplary video conferencing system.
2 is a diagram illustrating an exemplary video conferencing service.
3 is a diagram illustrating an exemplary scalable video coding service.
4 is a diagram showing an exemplary configuration of a server device.
5 is a diagram showing an exemplary structure of an encoder.
6 is a diagram illustrating an exemplary video conferencing service using scalable video coding.
7 is a diagram illustrating an exemplary image transmission method.
8 is a diagram illustrating an exemplary method of signaling a region of interest.
9 is a diagram showing an exemplary configuration of a client device.
10 is a diagram showing an exemplary configuration of the control unit.
11 is a diagram showing an exemplary configuration of a decoder.
12 is a diagram illustrating an exemplary method of generating and / or transmitting image configuration information.
13 is a diagram illustrating an exemplary method by which a client device signals image configuration information.
14 is a diagram illustrating an exemplary method of transmitting high / low level images.
15 is a diagram illustrating an exemplary image decoding method.
16 is a diagram illustrating an exemplary image encoding method.
17 is a diagram showing an exemplary syntax of the region of interest information.
18 is a diagram illustrating exemplary ROI information in an XML format and an exemplary SEI message.
19 is a diagram illustrating an exemplary protocol stack of a client device.
20 is an illustration showing an exemplary relationship between SLT and SLS (service layer signaling).
21 is a diagram showing an exemplary SLT.
22 is a diagram illustrating an exemplary code value of the serviceCategory attribute.
23 is a diagram illustrating an exemplary SLS bootstrapping and exemplary service discovery process.
24 is a diagram illustrating an exemplary USBD / USD fragment for ROUTE / DASH.
25 is a diagram illustrating an exemplary S-TSID fragment for ROUTE / DASH.
26 is a diagram illustrating an exemplary MPD fragment.
27 is a diagram illustrating an exemplary process of receiving a virtual reality service through a plurality of ROUTE sessions.
28 is a diagram showing an exemplary configuration of a client device.
29 is a diagram showing an exemplary configuration of a server device.
30 is a diagram illustrating an exemplary operation of a client device.
31 is a diagram showing an exemplary operation of the server device.

본 명세서에서 사용되는 기술적 용어는 단지 특정한 실시 예를 설명하기 위해 사용된 것으로, 본 명세서에 개시된 기술의 사상을 한정하려는 의도가 아님을 유의해야 한다. 또한, 본 명세서에서 사용되는 기술적 용어는 본 명세서에서 특별히 다른 의미로 정의되지 않는 한, 본 명세서에 개시된 기술이 속하는 분야에서 통상의 지식을 가진 자에 의해 일반적으로 이해되는 의미로 해석되어야 하며, 과도하게 포괄적인 의미로 해석되거나, 과도하게 축소된 의미로 해석되지 않아야 한다. 또한, 본 명세서에서 사용되는 기술적인 용어가 본 명세서에 개시된 기술의 사상을 정확하게 표현하지 못하는 잘못된 기술적 용어일 때에는, 본 명세서에 개시된 기술이 속하는 분야에서 통상의 지식을 가진 자가 올바르게 이해할 수 있는 기술적 용어로 대체되어 이해되어야 할 것이다. 또한, 본 명세서에서 사용되는 일반적인 용어는 사전에 정의되어 있는 바에 따라, 또는 전후 문맥 상에 따라 해석되어야 하며, 과도하게 축소된 의미로 해석되지 않아야 한다.It is noted that the technical terms used herein are used only to describe specific embodiments and are not intended to limit the scope of the technology disclosed herein. Also, the technical terms used herein should be interpreted as being generally understood by those skilled in the art to which the presently disclosed subject matter belongs, unless the context clearly dictates otherwise in this specification, Should not be construed in a broader sense, or interpreted in an oversimplified sense. It is also to be understood that the technical terms used herein are erroneous technical terms that do not accurately represent the spirit of the technology disclosed herein, it is to be understood that the technical terms used herein may be understood by those of ordinary skill in the art to which this disclosure belongs And it should be understood. Also, the general terms used in the present specification should be interpreted in accordance with the predefined or prior context, and should not be construed as being excessively reduced in meaning.

본 명세서에서 사용되는 제1, 제2 등과 같이 서수를 포함하는 용어는 다양한 구성 요소들을 설명하는데 사용될 수 있지만, 상기 구성 요소들은 상기 용어들에 의해 한정되어서는 안 된다. 상기 용어들은 하나의 구성요소를 다른 구성요소로부터 구별하는 목적으로만 사용된다. 예를 들어, 기술의 권리 범위를 벗어나지 않으면서 제1 구성 요소는 제2 구성 요소로 명명될 수 있고, 유사하게 제2 구성 요소도 제1 구성 요소로 명명될 수 있다.As used herein, terms including ordinals, such as first, second, etc., may be used to describe various elements, but the elements should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the description of the technology, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component.

이하, 첨부된 도면을 참조하여 본 명세서에 개시된 실시 예들을 상세히 설명하되, 도면 부호에 관계없이 동일하거나 유사한 구성 요소는 동일한 참조 번호를 부여하고 이에 대한 중복되는 설명은 생략하기로 한다.DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings, wherein like reference numerals denote like or similar elements, and redundant description thereof will be omitted.

또한, 본 명세서에 개시된 기술을 설명함에 있어서 관련된 공지 기술에 대한 구체적인 설명이 본 명세서에 개시된 기술의 요지를 흐릴 수 있다고 판단되는 경우 그 상세한 설명을 생략한다. 또한, 첨부된 도면은 본 명세서에 개시된 기술의 사상을 쉽게 이해할 수 있도록 하기 위한 것일 뿐, 첨부된 도면에 의해 그 기술의 사상이 제한되는 것으로 해석되어서는 아니 됨을 유의해야 한다.Further, in the description of the technology disclosed in this specification, a detailed description of related arts will be omitted if it is determined that the gist of the technology disclosed in this specification may be obscured. It is to be noted that the attached drawings are only for the purpose of easily understanding the concept of the technology disclosed in the present specification, and should not be construed as limiting the spirit of the technology by the attached drawings.

도 1은 예시적인 화상 회의 시스템을 나타낸 도면이다.1 is a diagram illustrating an exemplary video conferencing system.

화상 회의 시스템은 원격의 장소에 위치한 적어도 하나의 사용자에게 화상 회의 서비스를 제공할 수 있다. 화상 회의 서비스는 서로 다른 지역에 있는 사람들이 상대방을 직접 만나지 않고도 화면을 통해 서로의 얼굴을 보면서 회의를 할 수 있는 서비스를 말한다.The video conferencing system can provide video conferencing services to at least one user located at a remote location. Videoconferencing is a service where people in different regions can meet each other face-to-face without having to meet each other directly.

화상 회의 시스템은 두 가지의 모습으로 구성될 수 있다. 첫 째, 화상 회의 시스템은 각 사용자의 클라이언트 디바이스(예를 들어, HMD)끼리 직접 N:N 통신을 이용해서 이루어질 수 있다. 이 경우, 여러 시그널링과 영상 전송이 각각 이루어지므로 전체 대역폭은 많이 차지하지만, 화상 회의 시스템은 각 사용자에게 최적의 영상을 제공할 수 있다.The video conferencing system can be configured in two ways. First, the video conferencing system can be achieved using direct N: N communication between client devices (e.g., HMDs) of each user. In this case, since various signaling and image transmission are performed respectively, the total bandwidth is large, but the video conferencing system can provide an optimal image to each user.

둘 째, 화상 회의 시스템은 화상 회의를 위한 서버 디바이스(또는 중계 시스템)를 더 포함할 수 있다. 이 경우, 서버 디바이스는 각 클라이언트 디바이스로부터 적어도 하나의 비디오 영상을 수신하고, 적어도 하나의 비디오 영상을 취합/선별하여 각 클라이언트 디바이스로 서비스할 수 있다.Second, the video conferencing system may further include a server device (or relay system) for video conferencing. In this case, the server device may receive at least one video image from each client device, and may collect / select at least one video image to service each client device.

본 명세서에 설명된 예시적인 기술은 위의 두 가지 화상 회의 시스템에 모두 적용될 수 있으며, 이하에서는 두 번째 실시예를 중심으로 설명한다.The exemplary techniques described herein can be applied to both of the above two video conferencing systems, and the following description will focus on the second embodiment.

화상 회의 시스템(100)은 원격의 위치에 있는 적어도 하나의 사용자(110)를 위한 적어도 하나의 클라이언트 디바이스(120), 및/또는 서버 디바이스(130)를 포함할 수 있다.The video conferencing system 100 may include at least one client device 120 and / or a server device 130 for at least one user 110 at a remote location.

클라이언트 디바이스(120)는 해당 클라이언트 디바이스(120)를 사용하는 사용자(110)로부터 사용자 데이터를 획득할 수 있다. 사용자 데이터는 사용자의 영상 데이터, 음성 데이터, 및 부가 데이터를 포함할 수 있다.The client device 120 may obtain user data from the user 110 using the corresponding client device 120. [ The user data may include user image data, voice data, and additional data.

예를 들어, 클라이언트 디바이스(120)는 사용자(110)의 영상 데이터를 획득하는 2D/3D 카메라 및 Immersive 카메라 중에서 적어도 하나를 포함할 수 있다. 2D/3D 카메라는 180도 이하의 시야각을 가지는 영상을 촬영할 수 있다. Immersive 카메라는 360도 이하의 시야각을 가지는 영상을 촬영할 수 있다.For example, the client device 120 may include at least one of a 2D / 3D camera and an Immersive camera for acquiring image data of the user 110. The 2D / 3D camera can shoot an image having a viewing angle of 180 degrees or less. Immersive cameras can capture images with a viewing angle of 360 degrees or less.

예를 들어, 클라이언트 디바이스(120)는 제1 장소(Place 1)에 위치한 제1 사용자(111)의 사용자 데이터를 획득하는 제1 클라이언트 디바이스(121), 제2 장소(Place 2)에 위치한 제2 사용자(113)의 사용자 데이터를 획득하는 제2 클라이언트 디바이스(123), 및 제3 장소(Place 3)에 위치한 제3 사용자(115)의 사용자 데이터를 획득하는 제3 클라이언트 디바이스(125) 중에서 적어도 하나를 포함할 수 있다.For example, the client device 120 may include a first client device 121 that obtains user data of a first user 111 located at a first location (Place 1), a second client device 121 that obtains user data of a second user located at a second location (Place 2) At least one of the second client device 123 acquiring the user data of the user 113 and the third client device 125 acquiring the user data of the third user 115 located at the third place (Place 3) . &Lt; / RTI >

그리고 나서, 각각의 클라이언트 디바이스(120)는 획득한 사용자 데이터를 네트워크를 통하여 서버 디바이스(130)로 전송할 수 있다. Each client device 120 may then transmit the acquired user data to the server device 130 over the network.

서버 디바이스(130)는 클라이언트 디바이스(120)로부터 적어도 하나의 사용자 데이터를 수신할 수 있다. 서버 디바이스(130)는 수신한 사용자 데이터를 기초로 가상 공간에서의 화상 회의를 위한 전체 영상을 생성할 수 있다. 전체 영상은 가상 공간 내에서 360도 방향의 영상을 제공하는 immersive 영상을 나타낼 수 있다. 서버 디바이스(130)는 사용자 데이터에 포함된 영상 데이터를 가상 공간에 매핑하여 전체 영상을 생성할 수 있다..The server device 130 may receive at least one user data from the client device 120. The server device 130 can generate a full image for video conference in the virtual space based on the received user data. The whole image can represent an immersive image that provides an image of 360 degrees in the virtual space. The server device 130 may generate the entire image by mapping the image data included in the user data to the virtual space.

그리고 나서, 서버 디바이스(130)는 전체 영상을 각 사용자에게 전송할 수 있다. The server device 130 may then transmit the entire image to each user.

각각의 클라이언트 디바이스(120)는 전체 영상을 수신하고, 각 사용자가 바라보는 영역 만큼을 가상 공간에 렌더링 및/또는 디스플레이할 수 있다.Each client device 120 can receive the entire image and render and / or display as much of the area viewed by each user in the virtual space.

도 2는예시적인 화상 회의 서비스를 나타낸 도면이다.2 is a diagram illustrating an exemplary video conferencing service.

도면을 참고하면, 가상 공간 내에는 제1 사용자(210), 제2 사용자(220), 및 제3 사용자(230)가 존재할 수 있다. 제1 사용자(210), 제2 사용자(220), 및 제3 사용자(230)는 가상 공간 내에서 서로 상대방을 바라보면서 회의를 수행할 수 있다. 이하에서는, 제1 사용자(210)을 중심으로 설명한다.Referring to the drawing, a first user 210, a second user 220, and a third user 230 may exist in the virtual space. The first user 210, the second user 220, and the third user 230 can perform a conference while looking at each other in the virtual space. Hereinafter, the first user 210 will be mainly described.

화상 회의 시스템은 가상 공간 내에서 말을 하고 있는 화자 및/또는 제1 사용자(210)의 시선을 판단할 수 있다. 예를 들어, 제2 사용자(220)가 화자이고, 제1 사용자(210)는 제2 사용자를 바라볼 수 있다.The video conferencing system can determine the line of sight of the speaker and / or the first user 210 speaking in the virtual space. For example, the second user 220 may be the speaker and the first user 210 may view the second user.

이 경우, 화상 회의 시스템은 제1 사용자(210)가 바라보는 제2 사용자(220)에 대한 영상은 고 품질의 비디오 영상으로 제1 사용자(210)에게 전송할 수 있다. 또한, 화상 회의 시스템은 제1 사용자(220)의 시선 방향에서 보이지 않거나 일부만 보이는 제3 사용자(230)에 대한 영상은 저 품질의 비디오 영상으로 제1 사용자(210)에게 전송할 수 있다.In this case, the video conferencing system can transmit an image of the second user 220 viewed by the first user 210 to the first user 210 as a high-quality video image. In addition, the video conferencing system can transmit the image of the third user 230, which is not visible or partially visible in the direction of the first user 220, to the first user 210 as a low quality video image.

그 결과, 전체 영상을 모두 고품질의 비디오 영상으로 전송하는 종래의 방식에 비하여, 화상 회의 시스템은 사용자의 시선을 기초로 영상 처리 방법에 차이를 두어, 영상 전송을 위한 대역폭(BW)을 절약하고, 영상 처리 성능을 개선할 수 있다.As a result, compared with the conventional method of transmitting all the images as high-quality video images, the video conferencing system makes a difference in the image processing method based on the user's sight, saves bandwidth (BW) for image transmission, The image processing performance can be improved.

도 3은 예시적인 스케일러블 비디오 코딩 서비스를 나타낸 도면이다.3 is a diagram illustrating an exemplary scalable video coding service.

스케일러블 비디오 코딩 서비스는 다양한 멀티미디어 환경에서 네트워크의 상황 혹은 단말기의 해상도 등과 같은 다양한 사용자 환경에 따라 시간적, 공간적, 그리고 화질 관점에서 계층적(Scalable)으로 다양한 서비스를 제공하기 위한 영상 압축 방법이다. 스케일러블 비디오 코딩 서비스는 일반적으로 해상도(Spatial resolution), 품질(Quality), 및 시간(Temporal) 측면에서의 계층성(Scalability)을 제공한다.Scalable video coding service is an image compression method for providing various services in a scalable manner in terms of temporal, spatial, and image quality according to various user environments such as a network situation or a terminal resolution in various multimedia environments. Scalable video coding services generally provide scalability in terms of spatial resolution, quality, and temporal aspects.

공간적 계층성(Spatial scalability)은 동일한 영상에 대해 각 계층별로 다른 해상도를 가지고 부호화함으로써 서비스할 수 있다. 공간적 계층성을 이용하여 디지털 TV, 노트북, 스마트 폰 등 다양한 해상도를 갖는 디바이스에 대해 적응적으로 영상 콘텐츠를 제공하는 것이 가능하다.Spatial scalability can be provided by encoding the same image with different resolution for each layer. It is possible to adaptively provide image contents to devices having various resolutions such as a digital TV, a notebook, and a smart phone using spatial hierarchy.

도면을 참고하면, 스케일러블 비디오 코딩 서비스는 VSP(비디오 서비스 프로바이더; Video Service Provider)로부터 가정 내의 홈 게이트웨이 (Home Gateway)를 통해 동시에 하나 이상의 서로 다른 특성을 가진 TV를 지원할 수 있다. 예를 들어, 스케일러블 비디오 코딩 서비스는 서로 다른 해상도(Resolution)를 가지는 HDTV (High-Definition TV), SDTV (Standard-Definition TV), 및 LDTV (Low-Definition TV)를 동시에 지원할 수 있다.Referring to the drawings, a scalable video coding service can support one or more TVs having different characteristics from a video service provider (VSP) through a home gateway in the home. For example, the scalable video coding service can simultaneously support HDTV (High-Definition TV), SDTV (Standard-Definition TV), and LDTV (Low-Definition TV) having different resolutions.

시간적 계층성(Temporal scalability)은 콘텐츠가 전송되는 네트워크 환경 또는 단말기의 성능을 고려하여 영상의 프레임 레이트(Frame rate)를 적응적으로 조절할 수 있다. 예를 들어, 근거리 통신망을 이용하는 경우에는 60FPS(Frame Per Second)의 높은 프레임 레이트로 서비스를 제공하고, 3G 모바일 네트워크와 같은 무선 광대역 통신망을 사용하는 경우에는 16FPS의 낮은 프레임 레이트로 콘텐츠를 제공함으로써, 사용자가 영상을 끊김 없이 받아볼 수 있도록 서비스를 제공할 수 있다.Temporal scalability can adaptively adjust the frame rate of an image in consideration of the network environment in which the content is transmitted or the performance of the terminal. For example, when a local area network is used, a service is provided at a high frame rate of 60 frames per second (FPS). When a wireless broadband communication network such as a 3G mobile network is used, a content is provided at a low frame rate of 16 FPS, A service can be provided so that the user can receive the video without interruption.

품질 계층성(Quality scalability) 또한 네트워크 환경이나 단말기의 성능에 따라 다양한 화질의 콘텐츠를 서비스함으로써, 사용자가 영상 콘텐츠를 안정적으로 재생할 수 있도록 한다.Quality scalability In addition, by providing contents of various image quality according to the network environment or the performance of the terminal, the user can stably reproduce the image contents.

스케일러블 비디오 코딩 서비스는 각각 기본 계층 (Base layer)과 하나 이상의 향상 계층 (Enhancement layer(s))을 포함할 수 있다. 수신기는 기본 계층만 받았을 때는 일반 화질의 영상을 제공하고, 기본 계층 및 향상 계층을 함께 받으면 고화질을 제공할 수 있다. 즉, 기본 계층과 하나 이상의 향상 계층이 있을 때, 기본 계층을 받은 상태에서 향상 계층 (예: Enhancement layer 1, enhancement layer 2, …, enhancement layer n)을 더 받으면 받을수록 화질이나 제공하는 영상의 품질이 좋아진다.The scalable video coding service may each include a base layer and one or more enhancement layers (s). The receiver provides a normal image quality when receiving only the base layer, and can provide a high image quality when the base layer and the enhancement layer are received together. In other words, when there is a base layer and one or more enhancement layers, when an enhancement layer (for example, enhancement layer 1, enhancement layer 2, ..., enhancement layer n) is further received while receiving a base layer, Is better.

이와 같이, 스케일러블 비디오 코딩 서비스의 영상은 복수개의 계층으로 구성되어 있으므로, 수신기는 적은 용량의 기본 계층 데이터를 빨리 전송 받아 일반적 화질의 영상을 빨리 처리하여 재생하고, 필요 시 향상 계층 영상 데이터까지 추가로 받아서 서비스의 품질을 높일 수 있다.Thus, since the scalable video coding service is composed of a plurality of layers, the receiver can quickly receive the base layer data with a small capacity and quickly process and reproduce the image of general image quality, The service quality can be improved.

도 4는서버 디바이스의 예시적인 구성을 나타낸 도면이다.4 is a diagram showing an exemplary configuration of a server device.

서버 디바이스(400)는 제어부(410) 및/또는 통신부(420)을 포함할 수 있다.The server device 400 may include a control unit 410 and / or a communication unit 420.

제어부(410)은 가상 공간 내에서 화상 회의를 위한 전체 영상을 생성하고, 생성된 전체 영상을 인코딩할 수 있다. 또한, 제어부(410)는 서버 디바이스(400)의 모든 동작을 제어할 수 있다. 구체적인 내용은 이하에서 설명한다.The control unit 410 can generate the entire image for the video conference in the virtual space, and encode the entire generated image. In addition, the control unit 410 can control all the operations of the server device 400. Details will be described below.

통신부(420)는 외부 장치 및/또는 클라이언트 디바이스로 데이터를 전송 및/또는 수신할 수 있다. 예를 들어, 통신부(420)는 적어도 하나의 클라이언트 디바이스로부터 사용자 데이터 및/또는 시그널링 데이터를 수신할 수 있다. 또한, 통신부(420)은 가상 공간에서 화상 회의를 위한 전체 영상을 클라이언트 디바이스로 전송할 수 있다.The communication unit 420 may transmit and / or receive data to an external device and / or a client device. For example, the communication unit 420 may receive user data and / or signaling data from at least one client device. In addition, the communication unit 420 can transmit the entire image for video conference in the virtual space to the client device.

제어부(410)는 시그널링 데이터 추출부(411), 영상 생성부(413), 관심 영역 판단부(415), 시그널링 데이터 생성부(417), 및/또는 인코더(419) 중에서 적어도 하나를 포함할 수 있다.The control unit 410 may include at least one of a signaling data extraction unit 411, an image generation unit 413, a region of interest determination unit 415, a signaling data generation unit 417, and / or an encoder 419 have.

시그널링 데이터 추출부(411)는 클라이언트 디바이스로부터 전송 받은 데이터로부터 시그널링 데이터를 추출할 수 있다. 예를 들어, 시그널링 데이터는 영상 구성 정보를 포함할 수 있다. 영상 구성 정보는 가상 공간 내에서 사용자의 시선 방향을 지시하는 시선 정보 및 사용자의 시야각을 지시하는 줌 영역 정보를 포함할 수 있다.The signaling data extracting unit 411 can extract the signaling data from the data received from the client device. For example, the signaling data may include image configuration information. The image configuration information may include gaze information indicating a gaze direction of the user in the virtual space and zoom area information indicating a viewing angle of the user.

영상 생성부(413)는 적어도 하나의 클라이언트 디바이스로부터 수신한 영상을 기초로 가상 공간에서 화상 회의를 위한 전체 영상을 생성할 수 있다.The image generation unit 413 can generate a full image for video conference in the virtual space based on the image received from at least one client device.

관심 영역 판단부(417)는 화상 회의 서비스를 위한 가상 공간의 전체 영역 내에서 사용자의 시선 방향에 대응되는 관심 영역을 판단할 수 있다. 예를 들어, 관심 영역 판단부(417)는 시선 정보 및/또는 줌 영역 정보를 기초로 관심 영역을 판단할 수 있다. 예를 들어, 관심 영역은 사용자가 보게 될 가상의 공간에서 중요 오브젝트가 위치할 타일의 위치(예를 들어, 게임 등에서 새로운 적이 등장하는 위치, 가상 공간에서의 화자의 위치), 및/또는 사용자의 시선이 바라보는 곳일 수 있다. 또한, 관심 영역 판단부(417)는 화상 회의 서비스를 위한 가상 공간의 The region-of-interest determination unit 417 can determine a region of interest corresponding to the direction of the user's gaze within the entire region of the virtual space for the video conferencing service. For example, the ROI determination unit 417 may determine the ROI based on the sight line information and / or the zoom region information. For example, the region of interest may include a location of a tile where the important object is located in a virtual space to be viewed by the user (for example, a location where a new enemy appears in a game or the like, a position of a speaker in a virtual space) It can be a place to look at. In addition, the region-of-interest determination unit 417 determines the region of interest

전체 영역 내에서 사용자의 시선 방향에 대응되는 관심 영역을 지시하는 관심 영역 정보를 생성할 수 있다.It is possible to generate the area of interest information indicating the area of interest corresponding to the direction of the user's gaze within the entire area.

시그널링 데이터 생성부(413)는 전체 영상을 처리하기 위한 시그널링 데이터를 생성할 수 있다. 예를 들어, 시그널링 데이터는 관심 영역 정보를 전송할 수 있다. 시그널링 데이터는 SEI (Supplement Enhancement Information), VUI (video usability information), 슬라이스 헤더 (Slice Header), 및 비디오 데이터를 서술하는 파일 중에서 적어도 하나를 통하여 전송될 수 있다.The signaling data generation unit 413 may generate signaling data for processing the entire image. For example, the signaling data may transmit the region of interest information. The signaling data may be transmitted through at least one of Supplement Enhancement Information (SEI), video usability information (VUI), Slice Header, and a file describing video data.

인코더(419)는 시그널링 데이터를 기초로 전체 영상을 인코딩할 수 있다. 예를 들어, 인코더(419)는 각 사용자의 시선 방향을 기초로 각 사용자에게 커스터마이즈된 방식으로 전체 영상을 인코딩할 수 있다. 예를 들어, 가상 공간 내에서 제1 사용자가 제2 사용자를 바라보는 경우, 인코더는 가상 공간 내의 제1 사용자 시선을 기초로 제2 사용자에 해당하는 영상은 고화질로 인코딩하고, 제3 사용자에 해당하는 영상은 저화질로 인코딩할 수 있다. 실시예에 따라서, 인코더(419)는 시그널링 데이터 추출부(411), 영상 생성부(413), 관심 영역 판단부(415), 및/또는 시그널링 데이터 생성부(417) 중에서 적어도 하나를 포함할 수 있다.The encoder 419 can encode the entire image based on the signaling data. For example, the encoder 419 may encode the entire image in a customized manner for each user based on the viewing direction of each user. For example, when the first user views the second user in the virtual space, the encoder encodes the image corresponding to the second user in high image quality based on the first user's gaze in the virtual space, Can be encoded with low image quality. The encoder 419 may include at least one of a signaling data extraction unit 411, an image generation unit 413, a region of interest determination unit 415, and / or a signaling data generation unit 417 have.

도 5는 인코더의 예시적인 구조를 나타낸 도면이다.5 is a diagram showing an exemplary structure of an encoder.

인코더(500, 영상 부호화 장치)는 기본 계층 인코더(510), 적어도 하나의 향상 계층 인코더(520), 및 다중화기(530) 중에서 적어도 하나를 포함할 수 있다.The encoder 500 may include at least one of a base layer encoder 510, at least one enhancement layer encoder 520, and a multiplexer 530.

인코더(500)는 스케일러블 비디오 코딩 방법을 사용하여 전체 영상을 인코딩할 수 있다. 스케일러블 비디오 코딩 방법은 SVC(Scalable Video Coding) 및/또는 SHVC(Scalable High Efficiency Video Coding)를 포함할 수 있다.The encoder 500 may encode the entire image using a scalable video coding method. The scalable video coding method may include Scalable Video Coding (SVC) and / or Scalable High Efficiency Video Coding (SHVC).

스케일러블 비디오 코딩 방법은 다양한 멀티미디어 환경에서 네트워크의 상황 혹은 단말기의 해상도 등과 같은 다양한 사용자 환경에 따라서 시간적, 공간적, 및 화질 관점에서 계층적(Scalable)으로 다양한 서비스를 제공하기 위한 영상 압축 방법이다. 예를 들어, 인코더(500)는 동일한 비디오 영상에 대하여 두 가지 이상의 다른 품질(또는 해상도, 프레임 레이트)의 영상들을 인코딩하여 비트스트림을 생성할 수 있다.The scalable video coding method is an image compression method for providing a variety of services in a scalable manner in terms of temporal, spatial, and image quality according to various user environments such as a network situation or a terminal resolution in various multimedia environments. For example, the encoder 500 may encode images of two or more different quality (or resolution, frame rate) for the same video image to generate a bitstream.

예를 들어, 인코더(500)는 비디오 영상의 압축 성능을 높이기 위해서 계층 간 중복성을 이용한 인코딩 방법인 계층간 예측 툴(Inter-layer prediction tools)을 사용할 수 있다. 계층 간 예측 툴은 계층 간에 존재하는 영상의 중복성을 제거하여 향상 계층(Enhancement Layer)에서의 압출 효율을 높이는 기술이다.For example, the encoder 500 may use an inter-layer prediction tool, which is an encoding method using intra-layer redundancy, in order to enhance the compression performance of a video image. The inter-layer prediction tool is a technique for enhancing the extrusion efficiency in the enhancement layer by eliminating redundancy of images existing between layers.

향상 계층은 계층 간 예측 툴을 이용하여 참조 계층(Reference Layer)의 정보를 참조하여 인코딩될 수 있다. 참조 계층이란 향상 계층 인코딩 시 참조되는 하위 계층을 말한다. 여기서, 계층 간 툴을 사용함으로써 계층 사이에 의존성(Dependency)이 존재하기 때문에, 최상위 계층의 영상을 디코딩하기 위해서는 참조되는 모든 하위 계층의 비트스트림이 필요하다. 중간 계층에서는 디코딩 대상이 되는 계층과 그 하위 계층들의 비트스트림 만을 획득하여 디코딩을 수행할 수 있다. 최하위 계층의 비트스트림은 기본 계층(Base Layer)으로써, H.264/AVC, HEVC 등의 인코더로 인코딩될 수 있다.The enhancement layer can be encoded by referring to information of a reference layer using an inter-layer prediction tool. The reference layer refers to the lower layer that is referred to in the enhancement layer encoding. Here, since there is a dependency between layers by using a layer-to-layer tool, in order to decode the image of the highest layer, a bitstream of all lower layers to be referred to is required. In the middle layer, decoding can be performed by acquiring only a bitstream of a layer to be decoded and its lower layers. The bit stream of the lowest layer is a base layer, and can be encoded by an encoder such as H.264 / AVC, HEVC, or the like.

기본 계층 인코더(510)는 전체 영상을 인코딩하여 기본 계층을 위한 기본 계층 비디오 데이터(또는 기본 계층 비트스트림)를 생성할 수 있다. 예를 들어, 기본 계층 비디오 데이터는 사용자가 가상 공간 내에서 바라보는 전체 영역을 위한 비디오 데이터를 포함할 수 있다. 기본 계층의 영상은 가장 낮은 화질의 영상일 수 있다.The base layer encoder 510 may encode the entire image to generate base layer video data (or base layer bitstream) for the base layer. For example, the base layer video data may include video data for the entire area viewed by the user in the virtual space. The image of the base layer may be the image of the lowest image quality.

향상 계층 인코더(520)는, 시그널링 데이터(예를 들어, 관심 영역 정보) 및 기본 계층 비디오 데이터를 기초로, 전체 영상을 인코딩하여 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 적어도 하나의 향상 계층 비디오 데이터(또는 향상 계층 비트스트림)를 생성할 수 있다. 향상 계층 비디오 데이터는 전체 영역 내에서 관심 영역을 위한 비디오 데이터를 포함할 수 있다.The enhancement layer encoder 520 encodes the entire image based on signaling data (e.g., region of interest information) and base layer video data to generate at least one enhancement layer for at least one enhancement layer, Video data (or enhancement layer bitstream). The enhancement layer video data may include video data for a region of interest within the entire region.

다중화기(530)는 기본 계층 비디오 데이터, 적어도 하나의 향상 계층 비디오 데이터, 및/또는 시그널링 데이터를 멀티플렉싱하고, 전체 영상에 해당하는 하나의 비트스트림을 생성할 수 있다.The multiplexer 530 may multiplex the base layer video data, the at least one enhancement layer video data, and / or the signaling data, and generate one bitstream corresponding to the entire image.

도 6은 스케일러블 비디오 코딩을 이용한 예시적인 화상 회의 서비스를 나타낸 도면이다.6 is a diagram illustrating an exemplary video conferencing service using scalable video coding.

클라이언트 디바이스는 전체 영상을 하나의 압축된 영상 비트스트림(Bitstream)으로 받아서, 이를 디코딩(decoding)하고, 사용자가 바라보는 영역 만큼을 가상의 공간에 렌더(render)한다. 종래의 기술은 전체 영상(예를 들어, 360도immersive 영상)을 모두 고해상도(또는 높은 품질)의 영상으로 전송 및/또는 수신하므로, 고해상도의 영상이 모인 비트스트림의 총 대역폭은 매우 클 수 밖에 없다.The client device receives the entire image as one compressed image bitstream, decodes it, and renders as many areas as the user views in a virtual space. Since the conventional technique transmits and / or receives a whole image (for example, a 360-degree immersive image) as a high-resolution (or high-quality) image, the total bandwidth of a bit stream having a high resolution image is very large .

서버 디바이스는 스케일러블 비디오 코딩 방법을 사용할 수 있다. 이하에서는, 예시적인 기술을 구체적으로 설명한다.The server device may use a scalable video coding method. Hereinafter, an exemplary technique will be described in detail.

가상 공간(610)에는 제1 사용자(611), 제2 사용자(613), 및 제3 사용자(615)가 존재할 수 있다. 제1 사용자(611), 제2 사용자(613), 및 제3 사용자(615)는 가상 공간 영역(610) 내에서 회의를 할 수 있다.The first user 611, the second user 613, and the third user 615 may exist in the virtual space 610. The first user 611, the second user 613, and the third user 615 may conference within the virtual space region 610.

클라이언트 디바이스(미도시)는 가상 공간 내에서 화자 및 사용자의 시선을 판단하고, 영상 구성 정보를 생성할 수 있다. 클라이언트 디바이스는 영상 구성 정보를 처음으로 생성한 경우 또는 사용자의 시선이 화자를 향하지 않는 경우에 영상 구성 정보를 서버 디바이스 및/또는 다른 클라이언트 디바이스로 전송할 수 있다. The client device (not shown) can determine the line of sight of the speaker and the user in the virtual space, and can generate image configuration information. The client device may transmit the image configuration information to the server device and / or other client devices when the image configuration information is first created or when the user's gaze is not facing the speaker.

서버 디바이스(미도시)는 적어도 하나의 클라이언트 디바이스로부터 비디오 영상 및 시그널링 데이터를 수신하고, 가상 공간(610)에 대한 전체 영상을 생성할 수 있다.A server device (not shown) may receive video and signaling data from at least one client device, and may generate a full image of the virtual space 610.

그리고 나서, 서버 디바이스는 시그널링 데이터를 기초로 적어도 하나의 비디오 영상을 인코딩할 수 있다. 서버 디바이스는 영상 구성 정보(예를 들어, 시선 정보 및 중 영역 정보)를 기초로 상기 시선 방향에 해당하는 비디오 영상(또는 관심 영역)과 상기 시선 방향에 해당하지 않는 비디오 영상의 품질을 다르게 인코딩할 수 있다. 예를 들어, 서버 디바이스는 사용자의 시선 방향에 해당하는 비디오 영상은 고품질로 인코딩하고, 사용자의 시선 방향에 해당하지 않는 비디오 영상은 저품질로 인코딩할 수 있다.The server device may then encode at least one video image based on the signaling data. The server device encodes the quality of the video image (or the region of interest) corresponding to the viewing direction and the quality of the video image that does not correspond to the viewing direction differently based on the image configuration information (for example, the sight line information and the medium region information) . For example, the server device can encode a video image corresponding to the user's gaze direction at a high quality and a video image not corresponding to the user's gaze direction at a low quality.

도면을 참고하면, 제1 비디오 영상(630)은 제1 사용자(611)의 시선 방향에 해당하는 관심 영역의 비디오 영상이다. 제1 비디오 영상(630)은 고품질로 제1 사용자(611)에게 제공될 필요가 있다. 따라서, 서버 디바이스는 제1 비디오 영상(630)을 인코딩하여, 기본 계층 비디오 데이터(633)을 생성하고, 계층간 예측을 이용하여 적어도 하나의 향상 계층 비디오 데이터(635)를 생성할 수 있다.Referring to FIG. 6, the first video image 630 is a video image of a region of interest corresponding to the viewing direction of the first user 611. The first video image 630 needs to be provided to the first user 611 with high quality. Thus, the server device may encode the first video image 630 to generate base layer video data 633 and generate at least one enhancement layer video data 635 using inter-layer prediction.

제2 비디오 영상(650)은 제1 사용자(611)의 시선 방향에 해당하지 않는 비-관심영역의 비디오 영상이다. 제2 비디오 영상(650)은 저품질로 제1 사용자(611)에게 제공될 필요가 있다. 따라서, 서버 디바이스는 제2 비디오 영상(650)을 인코딩하여, 기본 계층 비디오 데이터(653)만을 생성할 수 있다.The second video image 650 is a video image of a non-interest area that does not correspond to the gaze direction of the first user 611. The second video image 650 needs to be provided to the first user 611 with a low quality. Accordingly, the server device may encode the second video image 650 to generate only the base layer video data 653. [

그리고 나서, 서버 디바이스는 인코딩된 적어도 하나의 비트스트림을 제1 사용자(611)가 사용하는 클라이언트 디바이스로 전송할 수 있다.The server device may then transmit the encoded at least one bitstream to the client device used by the first user 611.

결론적으로, 제1 사용자(611)가 제2 사용자(613) 만을 바라보거나 제3 사용자(615)가 제1 사용자(611)의 시야각 내에서 아주 작은 영역만 차지하고 있을 경우, 서버 디바이스는 제2 사용자(613)의 영상은 스케일러블 비디오 코딩에서 기본 계층 비디오 데이터 및 적어도 하나의 향상 계층 비디오 데이터로 전송할 수 있다. 또한, 서버 디바이스는 제3 사용자(615)의 영상은 기본 계층 비디오 데이터만을 전송할 수 있다.As a result, when the first user 611 views only the second user 613 or the third user 615 occupies only a very small area within the viewing angle of the first user 611, (613) may be transmitted in scalable video coding as base layer video data and at least one enhancement layer video data. In addition, the server device can transmit only the base layer video data to the image of the third user 615.

도 7은예시적인 영상 전송 방법을 나타낸 도면이다.7 is a diagram illustrating a method of transmitting an image.

서버 디바이스는, 통신부를 이용하여, 적어도 하나의 클라이언트 디바이스로부터 비디오 영상 및 시그널링 데이터를 수신할 수 있다. 또한, 서버 디바이스는, 시그널링 데이터 추출부를 이용하여, 시그널링 데이터를 추출할 수 있다. 예를 들어, 시그널링 데이터는 시점 정보 및 줌 영역 정보를 포함할 수 있다.The server device can receive video image and signaling data from at least one client device using the communication unit. Further, the server device can extract the signaling data using the signaling data extracting unit. For example, the signaling data may include viewpoint information and zoom region information.

시선 정보는 제1 사용자가 제2 사용자를 바라보는지 제3 사용자를 바라보는지 여부를 지시할 수 있다. 가상 공간 내에서 제1 사용자가 제2 사용자의 방향을 바라보면, 시선 정보는 제1 사용자에서 제2 사용자로 향하는 방향을 지시할 수 있다.The gaze information may indicate whether the first user views the second user or the third user. If the first user views the direction of the second user in the virtual space, the gaze information may indicate the direction from the first user to the second user.

줌 영역 정보는 사용자의 시선 방향에 해당하는 비디오 영상의 확대 범위 및/또는 축소 범위를 지시할 수 있다. 또한, 줌 영역 정보는 사용자의 시야각을 지시할 수 있다. 줌 영역 정보의 값을 기초로 비디오 영상이 확대되면, 제1 사용자는 제2 사용자만을 볼 수 있다. 줌 영역 정보의 값을 기초로 비디오 영상이 축소되면, 제1 사용자는 제2 사용자뿐만 아니라 제3 사용자의 일부 및/또는 전체를 볼 수 있다.The zoom area information may indicate an enlarged range and / or a reduced range of the video image corresponding to the user's gaze direction. In addition, the zoom area information can indicate the viewing angle of the user. If the video image is enlarged based on the value of the zoom area information, the first user can view only the second user. If the video image is reduced based on the value of the zoom area information, the first user can see part and / or entirety of the third user as well as the second user.

그리고 나서, 서버 디바이스는, 영상 생성부를 이용하여, 가상 공간에서 화상 회의를 위한 전체 영상을 생성할 수 있다. Then, the server device can generate the entire image for the video conference in the virtual space using the image generating unit.

그리고 나서, 서버 디바이스는, 관심 영역 판단부를 이용하여, 시그널링 데이터를 기초로 가상 공간 내에서 각 사용자가 바라보는 시점 및 줌(zoom) 영역에 대한 영상 구성 정보를 파악할 수 있다(710).Then, the server device can determine the image configuration information for the viewpoint and the zoom region of each user in the virtual space based on the signaling data using the region-of-interest determination unit 710.

그리고 나서, 서버 디바이스는, 관심 영역 판단부를 이용하여, 영상 구성 정보를 기초로 사용자의 관심 영역을 결정할 수 있다(720).Then, the server device can determine the region of interest of the user based on the image configuration information using the region-of-interest determination unit (720).

제1 사용자가 제2 사용자를 바라볼 경우, 제1 사용자가 바라보는 시선 방향에 해당하는 비디오 영상은 제2 사용자가 많은 영역을 차지하고, 제3 사용자는 적은 영역을 차지하거나 비디오 영상에 포함되지 않을 수도 있다. 이 경우, 관심 영역은 제2 사용자를 포함하는 영역이 될 수 있다. 관심 영역은 상기 시선 정보 및 줌 영역 정보에 따라서 변경될 수 있다.When the first user views the second user, the video image corresponding to the viewing direction of the first user occupies a large area of the second user, the third user occupies a small area, It is possible. In this case, the region of interest may be a region including the second user. The region of interest may be changed according to the gaze information and the zoom region information.

시그널링 데이터(예를 들어, 시점 정보 및 줌 영역 정보 중에서 적어도 하나)가 변경될 경우, 서버 디바이스는 새로운 시그널링 데이터를 수신할 수 있다. 이 경우, 서버 디바이스는 새로운 시그널링 데이터를 기초로 새로운 관심 영역을 결정할 수 있다..When the signaling data (for example, at least one of the view information and the zoom area information) is changed, the server device can receive new signaling data. In this case, the server device can determine a new region of interest based on the new signaling data.

그리고 나서, 서버 디바이스는, 제어부를 이용하여, 시그널링 데이터를 기초로 현재 처리하는 데이터가 관심 영역에 해당하는 데이터인지 아닌지 여부를 판단할 수 있다.Then, the server device can use the control unit to determine whether the data currently processed based on the signaling data is data corresponding to the region of interest.

시그널링 데이터가 변경되는 경우, 서버 디바이스는 새로운 시그널링 데이터를 기초로 현재 처리하는 데이터가 관심 영역에 해당하는 데이터인지 아닌지 여부를 판단할 수 있다.When the signaling data is changed, the server device can determine whether or not the data currently processed based on the new signaling data is data corresponding to the region of interest.

관심 영역에 해당하는 데이터일 경우, 서버 디바이스는, 인코더를 이용하여, 사용자의 시점에 해당하는 비디오 영상(예를 들어, 관심 영역)은 고품질로 인코딩할 수 있다(740). 예를 들어, 서버 디바이스는 해당 비디오 영상에 대하여 기본 계층 비디오 데이터 및 향상 계층 비디오 데이터를 생성하고, 이들을 전송할 수 있다.In case of data corresponding to a region of interest, the server device may encode a video image (for example, a region of interest) corresponding to a user's viewpoint at a high quality using an encoder (740). For example, the server device can generate base layer video data and enhancement layer video data for the video image and transmit them.

시그널링 데이터가 변경되는 경우, 서버 디바이스는 새로운 시점에 해당하는 비디오 영상(새로운 관심 영역)은 고품질의 영상으로 전송할 수 있다. 기존에 서버 디바이스가 저품질의 영상을 전송하고 있었으나 시그널링 데이터가 변경되어 서버 디바이스가 고품질의 영상을 전송하는 경우, 서버 디바이스는 향상 계층 비디오 데이터를 추가로 생성 및/또는 전송할 수 있다.When the signaling data is changed, the server device can transmit a video image (new interest area) corresponding to a new time point as a high-quality image. If the server device is transmitting a low-quality image but the signaling data is changed so that the server device transmits a high-quality image, the server device can additionally generate and / or transmit enhancement layer video data.

관심 영역에 해당하지 않는 데이터일 경우, 서버 디바이스는 사용자의 시점에 해당하지 않는 비디오 영상(예를 들어, 비-관심 영역)은 저품질로 인코딩할 수 있다(750). 예를 들어, 서버 디바이스는 사용자의 시점에 해당하지 않는 비디오 영상에 대하여 기본 계층 비디오 데이터만 생성하고, 이들을 전송할 수 있다.If the data does not correspond to a region of interest, the server device may encode a video image (e.g., a non-interest region) that does not correspond to a user's viewpoint at a low quality (750). For example, the server device may generate only base layer video data for video images not corresponding to the user's viewpoint, and may transmit them.

시그널링 데이터가 변경되는 경우, 서버 디바이스는 새로운 사용자의 시점에 해당하지 않는 비디오 영상(새로운 비-관심 영역)은 저품질의 영상으로 전송할 수 있다. 기존에 서버 디바이스가 고품질의 영상을 전송하고 있었으나 시그널링 데이터가 변경되어 서버 디바이스가 저품질의 영상을 전송하는 경우, 서버 디바이스는 더 이상 적어도 하나의 향상 계층 비디오 데이터를 생성 및/또는 전송하지 않고, 기본 계층 비디오 데이터만을 생성 및/또는 전송할 수 있다.When the signaling data is changed, the server device can transmit a video image (new non-interest area) not corresponding to a new user's viewpoint with a low-quality image. In the case where the server device is transmitting a high quality image but the signaling data is changed and the server device transmits a low quality image, the server device does not generate and / or transmit at least one enhancement layer video data, Only hierarchical video data can be generated and / or transmitted.

즉, 기본 계층 비디오 데이터를 수신했을 때의 비디오 영상의 화질은 향상 계층 비디오 데이터까지 받았을 때의 비디오 영상의 화질보다는 낮으므로, 클라이언트 디바이스는 사용자가 고개를 돌린 정보를 센서 등으로부터 얻는 순간에, 사용자의 시선 방향에 해당하는 비디오 영상(예를 들어, 관심 영역)에 대한 향상 계층 비디오 데이터를 수신할 수 있다. 그리고, 클라이언트 디바이스는 짧은 시간 내에 고화질의 비디오 영상을 사용자에게 제공할 수 있다.That is, since the image quality of the video image when the base layer video data is received is lower than the image quality of the video image when the enhancement layer video data is received, the client device, at the moment when the user obtains the information, Layer video data for a video image (e.g., region of interest) corresponding to the viewing direction of the video data. Then, the client device can provide the user with a high-quality video image in a short time.

본 명세서의 예시적인 방법은 사전에 일부 추가 영역의 데이터만 전송 받는 단순 pre-caching 방법이나, 사용자의 시선 방향에 해당하는 영역의 데이터만을 전송 받는 방법에 비해 큰 장점을 가진다.The exemplary method of the present invention has a great advantage over a simple pre-caching method in which only a part of additional area data is transmitted in advance, or a method of receiving only data in an area corresponding to the direction of the user's sight line.

본 명세서의 예시적인 방법은 모든 데이터를 고화질로 보내는 종래의 방식에 비하여 전체 대역폭을 낮출 수 있다.The exemplary method herein can reduce the overall bandwidth compared to conventional methods of sending all data at high quality.

또한, 본 명세서의 예시적인 방법은 사용자 시선 움직임에 실시간으로 반응하여 비디오 프로세싱 속도를 높일 수 있다.In addition, the exemplary method herein can increase the speed of video processing by reacting in real time to user gaze movements.

기존의 방법은 제1 사용자가 제2 사용자를 바라보다가 제3 사용자로 고개를 돌렸을 때, 클라이언트 디바이스(예를 들어, HMD의 센서 등)로 이 움직임을 파악하여 제3 사용자를 표현하기 위한 비디오 정보를 처리하고 화면에 재생한다. 기존의 방법은 매우 빨리 새로운 영역의 영상을 처리하는 것이 어렵기 때문에, 기존의 방법은 모든 데이터를 미리 받아두는 비효율적 방법을 사용했다.In the conventional method, when a first user looks at a second user and turns his / her head to a third user, a video for identifying the motion with a client device (e.g., a sensor of the HMD) Process information and play it on the screen. The conventional method is very difficult to process the image of the new area very quickly, and the conventional method uses an inefficient method of receiving all the data in advance.

하지만, 본 명세서의 예시적인 기술은 위의 스케일러블 비디오를 통한 적응적 비디오 전송을 하기 때문에, 제1 사용자가 제3 사용자로 고개를 돌렸을 때, 이미 가지고 있는 베이스 레이어 데이터를 이용하여 빠르게 사용자에게 응답할 수 있다. 본 명세서의 예시적인 기술은 전체 고화질 데이터를 처리할 때보다 더 빨리 비디오 영상을 재생할 수 있다. 따라서, 본 명세서의 예시적인 기술은 시선 움직임에 빠르게 반응하여 비디오 영상을 처리할 수 있다. However, since the exemplary technique of the present invention has an adaptive video transmission through the scalable video, when the first user turns his head to the third user, the base layer data that he already has, can do. The exemplary techniques herein can reproduce video images faster than when processing the entire high definition data. Thus, the exemplary techniques herein are capable of rapidly processing video images in response to eye movements.

도 8은 관심 영역을 시그널링하는 예시적인 방법을 나타낸 도면이다.8 is a diagram illustrating an exemplary method of signaling a region of interest.

도 (a)를 참조하면, 스케일러블 비디오에서의 관심 영역을 시그널링하는 방법을 나타낸다.Referring to Figure (a), there is shown a method of signaling a region of interest in scalable video.

서버 디바이스(또는 인코더)는 하나의 비디오 영상(또는 픽처)을 직사각형 모양을 갖는 여러 타일(Tile)들로 분할할 수 있다. 예를 들어, 비디오 영상은 Coding Tree Unit(CTU) 단위를 경계로 분할될 수 있다. 예를 들어, 하나의 CTU는 Y CTB, Cb CTB, 및 Cr CTB를 포함할 수 있다.A server device (or an encoder) can divide one video image (or picture) into a plurality of tiles having a rectangular shape. For example, a video image can be partitioned into Coding Tree Unit (CTU) units. For example, one CTU may include Y CTB, Cb CTB, and Cr CTB.

서버 디바이스는 빠른 사용자 응답을 위해서 기본 계층의 비디오 영상은 타일(Tile)로 분할하지 않고 전체적으로 인코딩할 수 있다. 그리고, 서버 디바이스는 하나 이상의 향상 계층들의 비디오 영상은 필요에 따라서 일부 또는 전체를 여러 타일(Tile)들로 분할하여 인코딩할 수 있다.The server device can encode the video image of the base layer as a whole without dividing the video image into a tile for fast user response. In addition, the server device may encode a video image of one or more enhancement layers by dividing a part or all of the video image into a plurality of tiles as needed.

즉, 서버 디바이스는 향상 계층의 비디오 영상은 적어도 하나의 타일로 분할하고, 관심 영역(810, ROI, Region of Interest)에 해당하는 타일들을 인코딩할 수 있다.That is, the server device may divide the video image of the enhancement layer into at least one tile and encode tiles corresponding to the region of interest 810 (ROI, Region of Interest).

이 때, 관심 영역(810)은 가상 공간에서 사용자가 보게 될 중요 오브젝트(Object)가 위치할 타일들의 위치 (e.g. 게임 등에서 새로운 적이 등장하는 위치, 화상 통신에서 가상공간에서의 화자의 위치), 및/또는 사용자의 시선이 바라보는 곳에 해당할 수 있다.In this case, the area of interest 810 includes a location of the tiles where the important object to be viewed by the user is to be located (e.g., a location where a new enemy appears in a game or the like, a position of a speaker in a virtual space in video communication) And / or where the user's gaze is being viewed.

또한, 서버 디바이스는 관심 영역에 포함 되는 적어도 하나의 타일을 식별하는 타일 정보를 포함하는 관심 영역 정보를 생성할 수 있다. 예를 들어, 관심 영역 정보는 관심 영역 판단부, 시그널링 데이터 생성부, 및/또는 인코더에 의해서 생성될 수 있다. The server device may also generate region of interest information including tile information identifying at least one tile included in the region of interest. For example, the region of interest information may be generated by the region of interest determiner, the signaling data generator, and / or the encoder.

관심 영역(810)의 타일 정보는 연속적이므로 모든 타일의 번호를 다 갖지 않더라도 효과적으로 압축될 수 있다. 예를 들어, 타일 정보는 관심 영역에 해당하는 모든 타일의 번호들뿐만 아니라 타일의 시작과 끝 번호, 좌표점 정보, CU (Coding Unit) 번호 리스트, 수식으로 표현된 타일 번호를 포함할 수 있다.Since the tile information of the region of interest 810 is continuous, it can be effectively compressed even if all the tiles are not numbered. For example, the tile information may include not only the numbers of all the tiles corresponding to the region of interest but also the beginning and ending numbers of the tiles, the coordinate point information, the CU (Coding Unit) number list, and the tile number expressed by the formula.

비-관심 영역의 타일 정보는 인코더가 제공하는 Entropy coding 을 거친 후 다른 클라이언트 디바이스, 영상 프로세싱 컴퓨팅 장비, 및/또는 서버로 전송될 수 있다.The tile information in the non-interest region may be sent to another client device, image processing computing device, and / or server after entropy coding provided by the encoder.

관심 영역 정보는 Session 정보를 실어나르는 high-level syntax 프로토콜을 통해 전해질 수 있다. 또한, 관심 영역 정보는 비디오 표준의 SEI (Supplement Enhancement Information), VUI (video usability information), 슬라이스 헤더 (Slice Header) 등의 패킷 단위에서 전해질 수 있다. 또한, 관심 영역 정보는 비디오 파일을 서술하는 별도의 파일로(e.g. DASH의 MPD) 전달될 수 있다.The region of interest can be delivered through a high-level syntax protocol that carries Session information. In addition, the region of interest may be transmitted in packet units such as SEI (Supplement Enhancement Information), VUI (video usability information), and slice header of a video standard. In addition, the region of interest information may be transferred to a separate file describing the video file (e.g., MPD of DASH).

화상 회의 시스템은 관심 영역 정보의 시그널링을 통해 향상계층의 필요한 타일만 클라이언트 디바이스 간에 및/또는 클라이언트 디바이스와 서버 디바이스 간에 전송 및/또는 수신함으로써, 전체적인 대역폭을 낮추고, 비디오 프로세싱 시간을 줄일 수 있다. 이는 빠른 HMD 사용자 응답시간을 보장하는데 중요하다.The video conferencing system can reduce the overall bandwidth and video processing time by transmitting and / or receiving only the required tiles of the enhancement layer between the client devices and / or between the client device and the server device through signaling of the area of interest information. This is important to ensure fast HMD user response time.

도 (b)를 참조하면, 단일 화면 비디오에서의 관심 영역을 시그널링하는 방법을 나타낸다.Referring to Figure (b), there is shown a method of signaling a region of interest in a single screen video.

본 명세서의 예시적인 기술은 스케일러블 비디오가 아닌 단일 화면 영상에서는 일반적으로 관심 영역(ROI)이 아닌 영역을 Downscaling (Downsampling)하는 방식으로 화질을 떨어뜨리는 기법을 사용할 수 있다. 종래 기술은 서비스를 이용하는 단말 간에 downscaling 을 위해 쓴 필터(filter) 정보(820)를 공유하지 않고, 처음부터 한가지 기술로 약속을 하거나 인코더만 필터 정보를 알고 있다. The exemplary technique of the present invention can use a technique of reducing the image quality by downscaling (downsampling) an area other than a ROI in a single-screen image, rather than a scalable video. The prior art does not share the filter information 820 used for downscaling between the terminals using the service, but makes an appointment from the beginning with only one technique, or only the encoder knows the filter information.

하지만, 서버 디바이스는, 인코딩 된 영상을 전달 받는 클라이언트 디바이스(또는 HMD 단말)에서 downscaling된 관심 영역 외 영역의 화질을 조금이라도 향상 시키기 위해, 인코딩 시에 사용된 필터 정보(820)를 클라이언트 디바이스로 전달할 수 있다. 이 기술은 실제로 영상 처리 시간을 상당히 줄일 수 있으며, 화질 향상을 제공할 수 있다.However, the server device transmits the filter information 820 used at the time of encoding to the client device in order to further improve the image quality of the downscaled area outside the area of interest down-streamed at the client device (or the HMD terminal) receiving the encoded image . This technique can actually reduce image processing time significantly and can provide image quality enhancement.

전술한 바와 같이, 서버 디바이스는 관심 영역 정보를 생성할 수 있다. 예를 들어, 관심 영역 정보는 타일 정보뿐만 아니라 필터 정보를 더 포함할 수 있다. 예를 들어, 필터 정보는 약속된 필터 후보들의 번호, 필터에 사용된 값들을 포함할 수 있다. As described above, the server device may generate the region of interest information. For example, the area of interest information may further include filter information as well as tile information. For example, the filter information may include the number of promised filter candidates, the values used in the filter.

도 9는 클라이언트 디바이스의 예시적인 구성을 나타낸 도면이다.9 is a diagram showing an exemplary configuration of a client device.

클라이언트 디바이스(900)는 영상 입력부(910), 오디오 입력부(920), 센서부(930), 영상 출력부(940), 오디오 출력부(950), 통신부(960), 및/또는 제어부(970) 중에서 적어도 하나를 포함할 수 있다. 예를 들어, 클라이언트 디바이스(900)는 HMD(Head Mounted Display)일 수 있다. 또한, 클라이언트 디바이스(900)의 제어부(970)은 클라이언트 디바이스(900)에 포함될 수도 있고, 별도의 장치로 존재할 수도 있다.The client device 900 includes an image input unit 910, an audio input unit 920, a sensor unit 930, an image output unit 940, an audio output unit 950, a communication unit 960, and / As shown in FIG. For example, the client device 900 may be an HMD (Head Mounted Display). The control unit 970 of the client device 900 may be included in the client device 900 or may be a separate device.

영상 입력부(910)는 비디오 영상을 촬영할 수 있다. 영상 입력부(910)는 사용자의 영상을 획득하는 2D/3D 카메라 및/또는 Immersive 카메라 중에서 적어도 하나를 포함할 수 있다. 2D/3D 카메라는 180도 이하의 시야각을 가지는 영상을 촬영할 수 있다. Immersive 카메라는 360도 이하의 시야각을 가지는 영상을 촬영할 수 있다.The video input unit 910 can capture a video image. The image input unit 910 may include at least one of a 2D / 3D camera and / or an immersive camera for acquiring a user's image. The 2D / 3D camera can shoot an image having a viewing angle of 180 degrees or less. Immersive cameras can capture images with a viewing angle of 360 degrees or less.

오디오 입력부(920)는 사용자의 음성을 녹음할 수 있다. 예를 들어, 오디오 입력부(920)는 마이크를 포함할 수 있다.The audio input unit 920 can record the user's voice. For example, the audio input 920 may include a microphone.

센서부(930)는 사용자 시선의 움직임에 대한 정보를 획득할 수 있다. 예를 들어, 센서부(930)는 물체의 방위 변화를 감지하는 자이로 센서, 이동하는 물체의 가속도나 충격의 세기를 측정하는 가속도 센서, 및 사용자의 시선 방향을 감지하는 외부 센서를 포함할 수 있다. 실시예에 따라서, 센서부(930)는 영상 입력부(910) 및 오디오 입력부(920)를 포함할 수도 있다.The sensor unit 930 can acquire information on the movement of the user's gaze. For example, the sensor unit 930 may include a gyro sensor for sensing a change in the azimuth of the object, an acceleration sensor for measuring the acceleration of the moving object or the intensity of the impact, and an external sensor for sensing the direction of the user's gaze . According to an embodiment, the sensor unit 930 may include an image input unit 910 and an audio input unit 920.

영상 출력부(940)는 통신부(960)로부터 수신되거나 메모리(미도시)에 저장된 영상 데이터를 출력할 수 있다.The video output unit 940 can output video data received from the communication unit 960 or stored in a memory (not shown).

오디오 출력부(950)는 통신부(960)로부터 수신되거나 메모리에 저장된 오디오 데이터를 출력할 수 있다.The audio output unit 950 can output audio data received from the communication unit 960 or stored in the memory.

통신부(960)는 방송망 및/또는 브로드밴드를 통해서 외부의 클라이언트 디바이스 및/또는 서버 디바이스와 통신할 수 있다. 예를 들어, 통신부(960)는 데이터를 전송하는 전송부(미도시) 및/또는 데이터를 수신하는 수신부(미도시)를 포함할 수 있다.The communication unit 960 can communicate with an external client device and / or a server device via a broadcasting network and / or broadband. For example, the communication unit 960 may include a transmitting unit (not shown) for transmitting data and / or a receiving unit (not shown) for receiving data.

제어부(970)는 클라이언트 디바이스(900)의 모든 동작을 제어할 수 있다. 제어부(970)는 서버 디바이스로부터 수신한 비디오 데이터 및 시그널링 데이터를 처리할 수 있다. 제어부(970)에 대한 구체적인 내용은 이하에서 설명한다.The control unit 970 can control all operations of the client device 900. [ The control unit 970 can process the video data and the signaling data received from the server device. Details of the control unit 970 will be described below.

도 10은 제어부의 예시적인 구성을 나타낸 도면이다.10 is a diagram showing an exemplary configuration of the control unit.

제어부(1000)는 시그널링 데이터 및/또는 비디오 데이터를 처리할 수 있다. 제어부(1000)는 시그널링 데이터 추출부(1010), 디코더(1020), 화자 판단부(1030), 시선 판단부(1040), 및/또는 시그널링 데이터 생성부(1050) 중에서 적어도 하나를 포함할 수 있다.The control unit 1000 may process the signaling data and / or the video data. The control unit 1000 may include at least one of a signaling data extraction unit 1010, a decoder 1020, a speaker determination unit 1030, a visual determination unit 1040, and / or a signaling data generation unit 1050 .

시그널링 데이터 추출부(1010)는 서버 디바이스 및/또는 다른 클라이언트 디바이스로부터 전송 받은 데이터로부터 시그널링 데이터를 추출할 수 있다. 예를 들어, 시그널링 데이터는 관심 영역 정보를 포함할 수 있다.The signaling data extraction unit 1010 can extract signaling data from data received from a server device and / or another client device. For example, the signaling data may include region of interest information.

디코더(1020)는 시그널링 데이터를 기초로 비디오 데이터를 디코딩할 수 있다. 예를 들어, 디코더(1020)는 각 사용자의 시선 방향을 기초로 각 사용자에게 커스터마이즈된 방식으로 전체 영상을 디코딩할 수 있다. 예를 들어, 가상 공간 내에서 제1 사용자가 제2 사용자를 바라보는 경우, 제1 사용자의 디코더(1020)는 가상 공간 내의 제1 사용자 시선을 기초로 제2 사용자에 해당하는 영상은 고화질로 디코딩하고, 제3 사용자에 해당하는 영상은 저화질로 디코딩할 수 있다. 실시예에 따라서, 디코더(1020)는 시그널링 데이터 추출부(1010), 화자 판단부(1030), 시선 판단부(1040), 및/또는 시그널링 데이터 생성부(1050) 중에서 적어도 하나를 포함할 수 있다.The decoder 1020 may decode the video data based on the signaling data. For example, the decoder 1020 may decode the entire image in a customized manner for each user based on the viewing direction of each user. For example, when the first user views the second user in the virtual space, the decoder 1020 of the first user decodes the video corresponding to the second user based on the first user's gaze in the virtual space, And the image corresponding to the third user can be decoded with a low quality. The decoder 1020 may include at least one of a signaling data extraction unit 1010, a speaker determination unit 1030, a visual determination unit 1040, and / or a signaling data generation unit 1050 .

화자 판단부(1030)는 음성 및/또는 주어진 옵션을 기초로 가상 공간 내에서 화자가 누구인지 여부를 판단할 수 있다.The speaker determination unit 1030 can determine whether the speaker is within the virtual space based on the voice and / or the given option.

시선 판단부(1040)는 가상 공간 내에서 사용자의 시선을 판단하고, 영상 구성 정보를 생성할 수 있다. 예를 들어, 영상 구성 정보는 시선 방향을 지시하는 시선 정보 및/또는 사용자의 시야각을 지시하는 줌 영역 정보를 포함할 수 있다.The line of sight determining unit 1040 can determine the user's line of sight in the virtual space and generate the image configuration information. For example, the image configuration information may include gaze information indicating a gaze direction and / or zoom area information indicating a viewing angle of a user.

시그널링 데이터 생성부(1050)는 서버 디바이스 및/또는 다른 클라이언트 디바이스로 전송하기 위한 시그널링 데이터를 생성할 수 있다. 예를 들어, 시그널링 데이터는 영상 구성 정보를 전송할 수 있다. 시그널링 데이터는 SEI (Supplement Enhancement Information), VUI (video usability information), 슬라이스 헤더 (Slice Header), 및 비디오 데이터를 서술하는 파일 중에서 적어도 하나를 통하여 전송될 수 있다.The signaling data generation unit 1050 may generate signaling data for transmission to a server device and / or another client device. For example, the signaling data may transmit image configuration information. The signaling data may be transmitted through at least one of Supplement Enhancement Information (SEI), video usability information (VUI), Slice Header, and a file describing video data.

도 11은 디코더의 예시적인 구성을 나타낸 도면이다.11 is a diagram showing an exemplary configuration of a decoder.

디코더(1100)는 추출기(1110), 기본 계층 디코더(1120), 및/또는 적어도 하나의 향상 계층 디코더(1130) 중에서 적어도 하나를 포함할 수 있다.The decoder 1100 may include at least one of an extractor 1110, a base layer decoder 1120, and / or at least one enhancement layer decoder 1130.

디코더(1100)는 스케일러블 비디오 코딩 방법의 역 과정을 이용하여 비트스트림(비디오 데이터)을 디코딩할 수 있다.The decoder 1100 may decode the bitstream (video data) using an inverse process of the scalable video coding method.

추출기(1110)는 비디오 데이터 및 시그널링 데이터를 포함하는 비트스트림(비디오 데이터)을 수신하고, 재생하고자 하는 영상의 화질에 따라서 비트스트림을 선택적으로 추출할 수 있다. 예를 들어, 비트스트림(비디오 데이터)은 기본 계층을 위한 기본 계층 비트스트림(기본 계층 비디오 데이터) 및 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 적어도 하나의 향상 계층 비트스트림(향상 계층 비디오 데이터)을 포함할 수 있다. 기본 계층 비트스트림(기본 계층 비디오 데이터)는 가상 공간의 전체 영역을 위한 위한 비디오 데이터를 포함할 수 있다. 적어도 하나의 향상 계층 비트스트림(향상 계층 비디오 데이터)는 전체 영역 내에서 관심 영역을 위한 비디오 데이터를 포함할 수 있다.The extractor 1110 receives the bitstream (video data) including the video data and the signaling data, and can selectively extract the bitstream according to the image quality of the image to be reproduced. For example, a bitstream (video data) may include a base layer bitstream (base layer video data) for a base layer and at least one enhancement layer bitstream for at least one enhancement layer predicted from the base layer ). The base layer bitstream (base layer video data) may include video data for the entire area of the virtual space. At least one enhancement layer bitstream (enhancement layer video data) may include video data for a region of interest within the entire region.

또한, 시그널링 데이터는 화상 회의 서비스를 위한 가상 공간의 전체 영역 내에서 사용자의 시선 방향에 대응되는 관심 영역을 지시하는 관심 영역 정보를 포함할 수 있다.The signaling data may also include region of interest information indicating a region of interest corresponding to the direction of the user's gaze within the entire region of the virtual space for the video conferencing service.

기본 계층 디코더(1120)는 저화질 영상을 위한 기본 계층의 비트스트림(또는 기본 계층 비디오 데이터)를 디코딩할 수 있다.The base layer decoder 1120 can decode a base layer bitstream (or base layer video data) for a low-quality image.

향상 계층 디코더(1130)는 시그널링 데이터 및/또는 기본 계층의 비트스트림(또는 기본 계층 비디오 데이터)를 기초로 고화질 영상을 위한 적어도 하나의 향상 계층의 비트스트림(또는 향상 계층 비디오 데이터)를 디코딩할 수 있다. The enhancement layer decoder 1130 can decode at least one enhancement layer bitstream (or enhancement layer video data) for the high-definition video based on the signaling data and / or the bitstream (or base layer video data) have.

도 12은 영상 구성 정보를 생성 및/또는 전송하는 예시적인 방법을 나타낸 도면이다.12 is a diagram illustrating an exemplary method of generating and / or transmitting image configuration information.

이하에서는, 사용자 시선의 움직임에 실시간으로 대응하기 위한 영상 구성 정보를 생성하는 방법에 대하여 설명한다. Hereinafter, a method of generating image configuration information for responding to the movement of the user's gaze in real time will be described.

영상 구성 정보는 사용자의 시선 방향을 지시하는 시선 정보 및/또는 사용자의 시야각을 지시하는 줌 영역 정보 중에서 적어도 하나를 포함할 수 있다. 사용자 시선이란 실제 공간이 아닌 가상 공간 내에서 사용자가 바라보는 방향을 의미한다. 또한, 시선 정보는 현재 사용자의 시선 방향을 지시하는 정보뿐만 아니라, 미래에 사용자의 시선 방향을 지시하는 정보(예를 들어, 주목을 받을 것이라 예상되는 시선 지점에 대한 정보)를 포함할 수 있다. The image configuration information may include at least one of gaze information indicating a gaze direction of a user and / or zoom area information indicating a viewing angle of a user. The user's gaze is the direction that the user looks in the virtual space, not the actual space. In addition, the gaze information may include information indicating the gaze direction of the user in the future (for example, information on gaze points that are expected to receive attention), as well as information indicating the gaze direction of the current user.

클라이언트 디바이스는 사용자를 중심으로 가상 공간에 위치한 다른 사용자를 바라보는 동작을 센싱하고, 이를 처리할 수 있다.The client device can sense the operation of looking at another user located in the virtual space around the user and process the operation.

클라이언트 디바이스는, 제어부 및/또는 시선 판단부를 이용하여, 센서부로부터 센싱 정보를 수신할 수 있다. 센싱 정보는 카메라에 의해 촬영된 영상, 마이크에 의해 녹음된 음성일 수 있다. 또한, 센싱 정보는 자이로 센서, 가속도 센서, 및 외부 센서에 의해서 감지된 데이터일 수 있다.The client device can receive the sensing information from the sensor unit using the control unit and / or the sight line determination unit. The sensing information may be a video shot by a camera, or a voice recorded by a microphone. In addition, the sensing information may be data sensed by a gyro sensor, an acceleration sensor, and an external sensor.

또한, 클라이언트 디바이스는, 제어부 및/또는 시선 판단부를 이용하여, 센싱 정보를 기초로 사용자 시선의 움직임을 확인할 수 있다(1210). 예를 들어, 클라이언트 디바이스는 센싱 정보가 가지는 값의 변화를 기초로 사용자 시선의 움직임을 확인할 수 있다.Also, the client device can check the movement of the user's gaze based on the sensing information using the control unit and / or the sight line determination unit (1210). For example, the client device can check the movement of the user's gaze based on the change of the value of the sensing information.

또한, 클라이언트 디바이스는, 제어부 및/또는 시선 판단부를 이용하여, 가상 회의 공간에서의 영상 구성 정보를 생성할 수 있다(1220). 예를 들어, 클라이언트 디바이스가 물리적으로 움직이거나 사용자의 시선이 움직이는 경우, 클라이언트 디바이스는 센싱 정보를 기초로 가상 회의 공간에서의 사용자의 시선 정보 및/또는 줌 영역 정보를 계산할 수 있다.In addition, the client device may generate image configuration information in the virtual conference space using the control unit and / or the visual determination unit (1220). For example, when the client device physically moves or the user's gaze moves, the client device can calculate the gaze information and / or the zoom area information of the user in the virtual meeting space based on the sensing information.

또한, 클라이언트 디바이스는, 통신부를 이용하여, 영상 구성 정보를 서버 디바이스 및/또는 다른 클라이언트 디바이스로 전송할 수 있다(1230). 또한, 클라이언트 디바이스는 영상 구성 정보를 자신의 다른 구성요소로 전달할 수도 있다.Further, the client device can transmit the image configuration information to the server device and / or another client device using the communication unit (1230). In addition, the client device may forward the video configuration information to its other components.

이상에서는 클라이언트 디바이스가 영상 구성 정보를 생성하는 방법을 설명하였다. 다만 이에 한정되지 않으며, 서버 디바이스가 클라이언트 디바이스로부터 센싱 정보를 수신하고, 영상 구성 정보를 생성할 수도 있다. In the foregoing, a method of generating image configuration information by a client device has been described. However, the present invention is not limited thereto, and the server device may receive the sensing information from the client device and generate the image configuration information.

또한, 클라이언트 디바이스와 연결된 외부의 컴퓨팅 디바이스가 영상 구성 정보를 생성할 수 있으며, 컴퓨팅 디바이스는 영상 구성 정보를 자신의 클라이언트 디바이스, 다른 클라이언트 디바이스, 및/또는 서버 디바이스로 전달할 수도 있다.In addition, an external computing device connected to the client device may generate image configuration information, and the computing device may communicate image configuration information to its client device, another client device, and / or a server device.

도 13은 클라이언트 디바이스가 영상 구성 정보를 시그널링 하는 예시적인 방법을 나타낸 도면이다.13 is a diagram illustrating an exemplary method by which a client device signals image configuration information.

영상 구성 정보(시점 정보 및/또는 줌 영역 정보를 포함)를 시그널링하는 부분은 매우 중요하다. 영상 구성 정보의 시그널링이 너무 잦을 경우, 클라이언트 디바이스, 서버 디바이스, 및/또는 전체 네트워크에 부담을 줄 수 있다.Signaling the video configuration information (including viewpoint information and / or zoom area information) is very important. If the signaling of the video configuration information is too frequent, it may place a burden on the client device, the server device, and / or the entire network.

따라서, 클라이언트 디바이스는 사용자의 영상 구성 정보(또는 시선 정보 및/또는 줌 영역 정보)가 변경되는 경우에만 영상 구성 정보를 시그널링할 수 있다. 즉, 클라이언트 디바이스는 사용자의 시선 정보가 변경되는 경우에만 사용자의 시선 정보를 다른 클라이언트 디바이스 및/또는 서버 디바이스로 전송할 수 있다.Accordingly, the client device can signal image configuration information only when the image configuration information (or gaze information and / or zoom area information) of the user is changed. That is, the client device can transmit the gaze information of the user to another client device and / or the server device only when the gaze information of the user is changed.

일 실시예로, 화상 회의에서 통상 화자가 주목되는 경우가 대부분인 점을 이용하여 목소리를 내는 화자가 사용자의 시선 방향과 다를 경우에만 시선 정보를 다른 사용자의 클라이언트 디바이스나 서버 디바이스로 시그널링 할 수 있다. In one embodiment, the visual information can be signaled to a client device or a server device of another user only when a speaker who speaks a voice is different from a direction of a user's gaze by using a point where a speaker is usually noticed in a video conference.

비록 말을 하고 있는 화자는 아니지만, 퍼포먼스를 하고 있거나(온라인 강의의 경우) 칠판에 무엇인가를 쓰는 등 현재 주목을 받아야 할 사용자의 경우는, 클라이언트 디바이스는 시스템상의 옵션(예를 들어, 화자 및/또는 강의자는 제2 사용자로 설정)을 통해서 화자에 대한 정보를 획득할 수 있다.In the case of a user who is not speaking, but is performing, or who needs to be noticed, such as writing something on the chalkboard, the client device may use options (eg, speaker and / Or the lecturer is set as the second user).

도면을 참고하면, 클라이언트 디바이스는, 제어부 및/또는 화자 판단부를 이용하여, 화상 회의를 위한 가상 공간 영역 내에서 화자가 누군지를 판단할 수 있다(1310). 예를 들어, 클라이언트 디바이스는 센싱 정보를 기초로 화자가 누구인지를 판단할 수 있다. 또한, 클라이언트 디바이스는 주어진 옵션에 따라서 화자가 누구인지를 판단할 수 있다.Referring to the drawing, the client device can determine the speaker within the virtual space area for the video conference using the control unit and / or the speaker determination unit (1310). For example, the client device can determine who the speaker is based on the sensing information. In addition, the client device can determine who is the speaker according to the given option.

그리고 나서, 클라이언트 디바이스는, 제어부 및/또는 시선 판단부를 이용하여, 사용자의 시선을 판단할 수 있다(1320). 예를 들어, 클라이언트 디바이스는, 제어부 및/또는 시선 판단부를 이용하여, 사용자의 시선을 기초로 영상 구성 정보를 생성할 수 있다.Then, the client device can determine the user's gaze using the control unit and / or the visual determination unit (1320). For example, the client device can generate image configuration information based on the user's gaze using the control unit and / or the visual determination unit.

그리고 나서, 클라이언트 디바이스는, 제어부 및/또는 시선 판단부를 이용하여, 사용자의 시선이 화자를 향하는지 여부를 판단할 수 있다(1330).Then, the client device can determine whether the user's gaze is directed to the speaker using the control unit and / or the gaze determination unit (1330).

사용자의 시선이 화자를 향하는 경우, 클라이언트 디바이스는, 통신부를 이용하여, 영상 구성 정보를 시그널링하지 않을 수 있다(1340). 이 경우, 클라이언트 디바이스는 사용자의 시선 방향에 있는 화자에 대한 영상은 계속 고품질로 수신할 수 있고, 사용자의 시선 방향에 없는 영상들은 계속 저품질로 수신할 수 있다.If the user's gaze is directed to the speaker, the client device may not signal the video configuration information using the communication unit (1340). In this case, the client device can continuously receive the image of the speaker in the direction of the user's eyes in high quality, and the images in the direction of the user's eyes can be continuously received in low quality.

사용자의 시선이 화자를 향하지 않는 경우, 클라이언트 디바이스는, 통신부를 이용하여, 영상 구성 정보를 시그널링할 수 있다(1350). 예를 들어, 처음에는 사용자의 시선이 화자를 향했으나 나중에 다른 곳으로 변경된 경우, 클라이언트 디바이스는 사용자의 새로운 시선 방향에 대한 영상 구성 정보를 시그널링할 수 있다. 즉, 클라이언트 디바이스는 새로운 시선 방향에 대한 영상 구성 정보를 다른 클라이언트 디바이스 및/또는 서버 디바이스로 전송할 수 있다. 이 경우, 클라이언트 디바이스는 사용자의 새로운 시선 방향에 해당하는 영상은 고품질로 수신할 수 있고, 사용자의 새로운 시선 방향에 해당하지 않는 영상(예를 들어, 화자에 해당하는 영상)은 저품질로 수신할 수 있다.If the user's line of sight does not point to the speaker, the client device can signal the video configuration information using the communication unit (1350). For example, if the user's gaze is initially directed to the speaker but later changed to another location, the client device may signal image configuration information for the new viewing direction of the user. That is, the client device may transmit image configuration information for a new viewing direction to another client device and / or a server device. In this case, the client device can receive the image corresponding to the user's new gaze direction with high quality, and can receive the image (for example, the image corresponding to the speaker) that does not correspond to the new gaze direction of the user with low quality have.

이상에서는 클라이언트 디바이스가 영상 구성 정보를 생성 및/또는 전송하는 것을 중심으로 설명하였지만, 서버 디바이스가 클라이언트 디바이스로부터 센싱 정보를 수신하고, 센싱 정보를 기초로 영상 구성 정보를 생성하고, 영상 구성 정보를 적어도 하나의 클라이언트 디바이스로 전송할 수도 있다.In the above description, the client device generates and / or transmits the image configuration information. However, the server device may receive the sensing information from the client device, generate the image configuration information based on the sensing information, It may be transmitted to one client device.

상술한 바와 같이, 클라이언트 디바이스(예를 들어, HMD)를 이용한 가상 공간에서의 화상 회의에서 사용자들이 모두 화자를 바라보고 있는 상황에서는, 화상 회의 시스템은 화자의 영상정보를 기본 계층 데이터 및 향상 계층 데이터의 스케일러블 비디오 데이터로 전송할 수 있다. 또한, 화상 회의 시스템은 화자가 아닌 다른 사용자를 바라보는 사용자로부터는 시그널링을 받아서 다른 사용자의 영상정보를 기본 계층 데이터 및 향상 계층 데이터의 스케일러블 비디오 데이터로 전송할 수 있다. 이를 통해서, 화상 회의 시스템은 전체 시스템 상의 시그널링을 크게 줄이면서도 사용자에게 빠르고 고화질의 영상 정보를 서비스할 수 있다.As described above, in a situation where users are all looking at a speaker in a video conference in a virtual space using a client device (e.g., HMD), the video conference system converts the speaker's video information into base layer data and enhancement layer data Of scalable video data. Also, the video conferencing system receives signaling from a user looking at a user other than the speaker, and can transmit video information of another user as scalable video data of base layer data and enhancement layer data. Through this, the video conferencing system can provide fast and high quality video information to the user while greatly reducing the signaling on the whole system.

이상에서 언급한 시그널링은 서버 디바이스, 클라이언트 디바이스, 및/또는 외부의 컴퓨팅 장치(존재하는 경우) 사이의 시그널링일 수 있다. 또한, 이상에서 언급한 시그널링은 클라이언트 디바이스 및/또는 외부의 컴퓨팅 장치(존재하는 경우) 사이의 시그널링일 수 있다.The above-mentioned signaling may be signaling between a server device, a client device, and / or an external computing device (if present). In addition, the above-mentioned signaling may be signaling between the client device and / or an external computing device (if present).

도 14는 높고/낮은 수준의 영상을 전송하는 예시적인 방법을 나타낸 도면이다.14 is a diagram illustrating an exemplary method of transmitting high / low level images.

사용자의 시선 정보를 기초로 높고/낮은 수준의 영상을 전송하는 방법은 스케일러블 코덱의 계층을 스위칭하는 방법(1410), 싱글 비트스트림 및 실시간 인코딩의 경우 QP(Quantization Parameter) 등을 이용한 Rate Control 방법(1420), DASH 등의 단일 비트스트림의 경우 Chunk 단위로 스위칭하는 방법(1430), Down Scaling/Up Scaling 방법(1440), 및/또는 Rendering 의 경우 더 많은 리소스를 활용한 고화질 Rendering 방법(1450)을 포함할 수 있다.A method of transmitting a high / low level image based on a user's gaze information includes a method 1410 for switching layers of a scalable codec, a rate control method using a quantization parameter (QP) for single bit stream and real time encoding, A method 1430 of switching a chunk unit 1430, a down scaling / up scaling method 1440, and / or a high-quality rendering method 1450 using more resources in the case of a single bit stream such as DASH, . &Lt; / RTI >

전술한 예시적인 기술은 비록 비록 스케일러블 비디오를 통한 차별적 전송 기법(1410)을 이야기하고 있지만, 단일 계층을 갖는 일반 비디오 코딩 기술을 사용할 경우에도, 양자화 계수 (1420, Quantization Parameter)나 Down/Up scaling 정도(1440)를 조절함으로써, 전체 대역폭을 낮추고, 빠르게 사용자 시선 움직임에 응답하는 등의 장점을 제공할 수 있다. 또한 미리 여러 비트레이트(bitrate)를 갖는 비트스트림(bitstream)으로 트랜스코딩 된 파일들을 사용할 경우, 본 명세서의 예시적인 기술은 청크(Chunk) 단위로 높은 수준의 영상과 낮은 수준의 영상 사이를 스위칭하여 제공할 수 있다(1430).Although the exemplary techniques described above refer to the differential transmission scheme 1410 through scalable video, even when a general video coding technique with a single layer is used, quantization coefficients (1420, Quantization Parameter) or Down / Up scaling By adjusting the degree 1440, it is possible to provide advantages such as lowering the overall bandwidth and responding quickly to the user's gaze movement. In addition, when using files that are transcoded into a bitstream having several bitrates in advance, the exemplary technique of the present invention switches between a high-level image and a low-level image on a chunk basis (1430).

또한, 본 명세서는 화상 회의 시스템을 예로 들고 있지만, 본 명세서는 HMD를 이용한 VR (Virtual Reality), AR (Augmented Reality) 게임 등에서도 똑같이 적용될 수 있다. 즉, 사용자가 바라보는 시선에 해당하는 영역을 높은 수준의 영상으로 제공하고, 사용자가 바라볼 것으로 예상되는 영역이나 오브젝트(Object)가 아닌 곳을 바라 볼 경우만 시그널링하는 기법 모두가 화상 회의 시스템의 예에서와 똑같이 적용될 수 있다.In addition, although the present specification exemplifies a video conference system, the present specification can be equally applied to VR (Virtual Reality) and AR (Augmented Reality) games using an HMD. That is, all of the techniques for providing a high-level region corresponding to the line of sight the user is looking at, and signaling only when the user looks at an area or an object that is not expected to be viewed, It can be applied just as in the example.

도 15는 예시적인 영상 복호화 방법을 나타낸 도면이다.15 is a diagram illustrating an exemplary image decoding method.

영상 복호화 장치(또는 디코더)는 추출기, 기본 계층 디코더, 및/또는 향상 계층 디코더 중에서 적어도 하나를 포함할 수 있다. 영상 복호화 장치 및/또는 영상 복호화 방법에 대한 내용은 전술한 서버 디바이스 및/또는 영상 복호화 장치(또는 디코더)에 대한 설명 중에서 관련된 내용을 모두 포함할 수 있다.The video decoding apparatus (or decoder) may include at least one of an extractor, a base layer decoder, and / or an enhancement layer decoder. The contents of the video decoding apparatus and / or the video decoding method may include all contents related to the server device and / or the video decoding apparatus (or decoder) described above.

영상 복호화 장치는, 추출기를 이용하여, 비디오 데이터 및 시그널링 데이터를 포함하는 비트스트림을 수신할 수 있다(1510). 영상 복호화 장치는 비디오 데이터로부터 시그널링 데이터, 기본 계층 비디오 데이터, 및/또는 적어도 하나의 향상 계층 비디오 데이터를 추출할 수 있다.The video decoding apparatus can receive a bitstream including video data and signaling data using an extractor (1510). The video decoding apparatus may extract signaling data, base layer video data, and / or at least one enhancement layer video data from the video data.

또한, 영상 복호화 장치는, 기본 계층 디코더를 이용하여, 기본 계층 비디오 데이터를 디코딩할 수 있다(1520).Further, the video decoding apparatus may decode the base layer video data using a base layer decoder (1520).

또한, 영상 복호화 장치는, 향상 계층 디코더를 이용하여, 시그널링 데이터 및 기본 계층 비디오 데이터를 기초로 적어도 하나의 향상 계층 비디오 데이터를 디코딩할 수 있다(1530).In addition, the video decoding apparatus may decode at least one enhancement layer video data based on the signaling data and the base layer video data using an enhancement layer decoder (1530).

예를 들어, 비디오 데이터는 기본 계층을 위한 상기 기본 계층 비디오 데이터 및 상기 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 상기 적어도 하나의 향상 계층 비디오 데이터를 포함할 수 있다.For example, the video data may comprise the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.

또한, 기본 계층 비디오 데이터는 전체 영역을 위한 비디오 데이터를 포함하고, 적어도 하나의 향상 계층 비디오 데이터는 전체 영역 내에서 관심 영역을 위한 비디오 데이터를 포함할 수 있다.In addition, the base layer video data may include video data for the entire area, and at least one enhancement layer video data may include video data for the area of interest within the entire area.

또한, 적어도 하나의 향상 계층은 각 계층별로 직사각형 모양의 적어도 하나의 타일로 분할되고, 관심 영역 정보는 관심 영역에 포함되는 적어도 하나의 타일을 식별하는 타일 정보를 포함할 수 있다.Also, the at least one enhancement layer may be divided into at least one tile having a rectangular shape for each layer, and the region of interest information may include tile information identifying at least one tile included in the region of interest.

또한, 관심 영역 정보는 영상 구성 정보를 기초로 생성되고, 영상 구성 정보는 가상 공간 내에서 사용자의 시선 방향을 지시하는 시선 정보 및 사용자의 시야각을 지시하는 줌 영역 정보를 포함할 수 있다.In addition, the ROI information is generated based on the ROI, and the ROI may include ROI information indicating the ROI of the user and ROI information indicating the viewing angle of the user in the ROI.

또한, 영상 구성 정보는 사용자의 시선 방향이 화자를 향하지 않는 경우에 시그널링될 수 있다. Also, the image configuration information can be signaled when the direction of the user's gaze does not face the speaker.

또한, 시그널링 데이터는 SEI (Supplement Enhancement Information), VUI (video usability information), 슬라이스 헤더 (Slice Header), 및 상기 비디오 데이터를 서술하는 파일 중에서 적어도 하나를 통하여 전송될 수 있다.Further, the signaling data may be transmitted through at least one of Supplement Enhancement Information (SEI), video usability information (VUI), slice header, and a file describing the video data.

도 16은 예시적인 영상 부호화 방법을 나타낸 도면이다. 16 is a diagram illustrating an exemplary image encoding method.

영상 부호화 장치(또는 인코더)는 기본 계층 인코더, 향상 계층 인코더,및/또는 다중화기 중에서 적어도 하나를 포함할 수 있다. 영상 부호화 장치 및/또는 영상 부호화 방법에 대한 내용은 전술한 클라이언트 디바이스 및/또는 영상 부호화 장치(또는 인코더)에 대한 설명 중에서 관련된 내용을 모두 포함할 수 있다.The image encoding apparatus (or encoder) may include at least one of a base layer encoder, an enhancement layer encoder, and / or a multiplexer. The content of the video encoding apparatus and / or the video encoding method may include any content related to the client device and / or the video encoding apparatus (or the encoder) described above.

영상 부호화 장치는, 기본 계층 인코더를 이용하여, 기본 계층 비디오 데이터를 생성할 수 있다(1610).The video encoding apparatus can generate base layer video data using a base layer encoder (1610).

또한, 영상 부호화 장치는, 향상 계층 인코더를 이용하여, 시그널링 데이터 및 기본 계층 비디오 데이터를 기초로 적어도 하나의 향상 계층 비디오 데이터를 생성할 수 있다.Further, the image encoding apparatus can generate at least one enhancement layer video data based on the signaling data and the base layer video data using an enhancement layer encoder.

또한, 영상 부호화 장치는, 다중화기를 이용하여, 비디오 데이터 및 시그널링 데이터를 포함하는 비트스트림을 생성할 수 있다.Further, the video encoding apparatus can generate a bitstream including video data and signaling data using a multiplexer.

영상 부호화 장치 및/또는 영상 부호화 방법은 영상 복호화 장치 및/또는 영상 복호화 방법의 역 과정의 동작을 수행할 수 있다. 또한, 이를 위해서 공통된 특징을 포함할 수 있다.The image encoding apparatus and / or the image encoding method may perform an inverse process of the image decoding apparatus and / or the image decoding method. Also, common features can be included for this purpose.

도 17은 관심 영역 정보의 예시적인 신택스를 나타낸 도면이다.17 is a diagram showing an exemplary syntax of the region of interest information.

도 (a)를 참조하면, 비디오 픽쳐 별 관심 영역 정보(sighted_tile_info)가 나타나 있다. 예를 들어, 관심 영역 정보는 info_mode 정보, tile_id_list_size 정보, tile_id_list 정보, cu_id_list_size 정보, cu_id_list 정보, user_info_flag 정보, user_info_size 정보, 및/또는 user_info_list 중에서 적어도 하나를 포함할 수 있다.Referring to FIG. 5A, interest area information (sighted_tile_info) for each video picture is shown. For example, the region of interest information may include at least one of info_mode information, tile_id_list_size information, tile_id_list information, cu_id_list_size information, cu_id_list information, user_info_flag information, user_info_size information, and / or user_info_list.

info_mode 정보는 픽쳐 별로 관심 영역을 표현하는 정보의 모드를 지시할 수 있다. info_mode 정보는 부호 없는 4 비트의 정보로 표현될 수 있다. 또는 info_mode 정보는 포함하고 있는 정보의 모드를 지시할 수 있다. 예를 들어, info_mode 정보의 값이 ‘0’이면, info_mode 정보는 이전의 정보의 모드를 그대로 사용한다고 지시할 수 있다. info_mode 정보의 값이 ‘1’이면, info_mode 정보는 관심 영역에 해당하는 모든 타일 번호 리스트를 지시할 수 있다. info_mode 정보의 값이 ‘2’이면, info_mode 정보는 관심 영역에 해당하는 연속된 타일의 시작 번The info_mode information may indicate a mode of information representing a region of interest for each picture. The info_mode information can be represented by 4 bits of unsigned information. Or info_mode information may indicate the mode of the containing information. For example, if the value of the info_mode information is '0', the info_mode information can indicate that the previous information mode is used as it is. If the value of info_mode information is '1', info_mode information can indicate all tile number lists corresponding to the area of interest. If the value of the info_mode information is '2', the info_mode information indicates the start time of the consecutive tiles corresponding to the region of interest

호 및 끝 번호를 지시할 수 있다. info_mode 정보의 값이 3’이면, info_mode 정보는 관심 영역의 좌상단 및 우하단 타일의 번호를 지시할 수 있다. info_mode 정보의 값이 ‘4’이면, info_mode 정보는 관심 영역에 해당하는 타일의 번호 및 타일에 포함되는 코딩 유닛(Coding Unit)의 번호를 지시할 수 있다. Call and end numbers. If the value of the info_mode information is 3 ', the info_mode information may indicate the upper left and lower right tile numbers of the ROI. If the value of the info_mode information is '4', the info_mode information may indicate the number of the tile corresponding to the area of interest and the number of the coding unit included in the tile.

tile_id_list_size 정보는 타일 번호 리스트의 길이를 지시할 수 있다. tile_id_list_size 정보는 부호 없는 8 비트의 정보로 표현될 수 있다. The tile_id_list_size information may indicate the length of the tile number list. The tile_id_list_size information can be represented by 8 bits of unsigned information.

tile_id_list 정보는, info_mode 정보를 기초로, 타일 번호 리스트를 포함할 수 있다. 각각의 타일 번호는 부호 없는 8 비트의 정보로 표현될 수 있다. tile_id_list 정보는, info_mode 정보를 기초로, 관심 영역에 해당하는 모든 타일의 번호(info_mode 정보=1 인 경우), 연속된 타일의 시작 번호 및 끝 번호(info_mode 정보=2 인 경우), 및 관심 영역의 좌상단 및 우하단 타일의 번호(info_mode 정보=3 인 경우) 중에서 하나를 포함할 수 있다.The tile_id_list information may include a tile number list based on the info_mode information. Each tile number can be represented by 8 bits of unsigned information. The tile_id_list information includes information on the number of all tiles corresponding to the area of interest (when info_mode information = 1), the start number and end number of consecutive tiles (when info_mode information = 2) And the number of the upper left and lower right tiles (when info_mode information = 3).

cu_id_list_size 정보는 코딩 유닛(Coding Unit) 리스트의 길이를 지시할 수 있다. cu_id_list_size 정보는 부호 없는 16 비트의 정보로 표현될 수 있다.The cu_id_list_size information may indicate the length of a Coding Unit list. The cu_id_list_size information can be represented by 16 bits of unsigned information.

cu_id_list 정보는, info_mode 정보를 기초로, 코딩 유닛 번호의 리스트를 포함할 수 있다. 각각의 코딩 유닛 번호는 부호 없는 16 비트의 정보로 표현될 수 있다. 예를 들어, cu_id_list 정보는, info_mode 정보를 기초로, 관심 영역에 해당하는 코딩 유닛 번호의 리스트(예를 들어, info_mode 정보=4 인 경우)를 지시할 수 있다.The cu_id_list information may include a list of coding unit numbers based on the info_mode information. Each coding unit number can be represented by 16 bits of unsigned information. For example, the cu_id_list information may indicate a list of coding unit numbers (for example, info_mode information = 4) corresponding to the region of interest, based on the info_mode information.

user_info_flag 정보는 추가 사용자 정보 모드를 지시하는 플래그일 수 있다. user_info_flag 정보는 사용자 및/또는 제공자가 추가로 전송하려는 타일 관련 정보가 있는지 여부를 지시할 수 있다. user_info_flag 정보는 부호 없는 1 비트의 정보로 표현될 수 있다. 예를 들어, user_info_flag 정보의 값이 ‘0’이면, 추가 사용자 정보가 없다고 지시할 수 있다. user_info_flag 정보의 값이 ‘1’이면, 추가 사용자 정보가 있다고 지시할 수 있다.The user_info_flag information may be a flag indicating an additional user information mode. The user_info_flag information may indicate whether the user and / or the provider have tile-related information to be transmitted further. The user_info_flag information can be represented by one bit of unsigned information. For example, if the value of the user_info_flag information is '0', it can be indicated that there is no additional user information. If the value of the user_info_flag information is '1', it can be indicated that there is additional user information.

user_info_size 정보는 추가 사용자 정보의 길이를 지시할 수 있다. user_info_size 정보는 부호 없는 16 비트의 정보로 표현될 수 있다.The user_info_size information may indicate the length of the additional user information. user_info_size information can be represented by 16 bits of unsigned information.

user_info_list 정보는 추가 사용자 정보의 리스트를 포함할 수 있다. 각각의 추가 사용자 정보는 부호 없는 변화 가능한 비트의 정보로 표현될 수 있다.The user_info_list information may include a list of additional user information. Each additional user information may be represented by information of unsigned changeable bits.

도 (b)를 참조하면, 파일, 청크, 비디오 픽쳐 그룹별 관심 영역 정보가 나타나 있다. 예를 들어, 관심 영역 정보는 버전 정보 필드, 전체 데이터 크기 필드, 및/또는 적어도 하나의 단위 정보 필드 중에서 적어도 하나를 포함할 수 있다.Referring to FIG. 5B, ROI information for each file, chunk, and video picture group is shown. For example, the region of interest information may include at least one of a version information field, an entire data size field, and / or at least one unit information field.

도면을 참조하면, 파일, 청크, 비디오 픽쳐 그룹별 관심 영역 정보(sighted_tile_info)가 나타나 있다. 예를 들어, 관심 영역 정보는 version_info 정보, file_size 정보, 및/또는 단위 정보 중에서 적어도 하나를 포함할 수 있다. Referring to the drawing, interest area information (sighted_tile_info) for each file, chunk, and video picture group is shown. For example, the region of interest information may include at least one of version_info information, file_size information, and / or unit information.

version_info 정보는 관심 영역 정보(또는 시그널링 규격)의 버전을 지시할 수 있다. version_info 정보는 부호 없는 8 비트의 정보로 표현될 수 있다.The version_info information may indicate the version of the region of interest information (or signaling specification). The version_info information can be represented by 8 bits of unsigned information.

file_size 정보는 단위 정보의 사이즈를 지시할 수 있다. file_size 정보는 부호 없는 64 비트의 정보로 표현될 수 있다. 예를 들어, file_size 정보는 파일 사이즈, 청크 사이즈, 비디오 픽쳐 그룹 사이즈를 지시할 수 있다. The file_size information may indicate the size of the unit information. The file_size information can be represented by 64 bits of unsigned information. For example, the file_size information may indicate a file size, a chunk size, and a video picture group size.

단위 정보는 파일 단위, 청크 단위, 및/또는 비디오 픽쳐 그룹 단위 별로 관심 영역 정보를 포함할 수 있다.The unit information may include ROI information by file unit, chunk unit, and / or video picture group unit.

단위 정보는 poc_num 정보, info_mode 정보, tile_id_list_size 정보, tile_id_list 정보, cu_id_list_size 정보, cu_id_list 정보, user_info_flag 정보, user_info_size 정보, 및/또는 user_info_list 정보 중에서 적어도 하나를 포함할 수 있다.The unit information may include at least one of poc_num information, info_mode information, tile_id_list_size information, tile_id_list information, cu_id_list_size information, cu_id_list information, user_info_flag information, user_info_size information, and / or user_info_list information.

poc_num 정보는 비디오 픽쳐의 번호를 지시할 수 있다. 예를 들어, 픽처 번호 필드는 HEVC에서는 POC(Picture Order Count)를 지시할 수 있으며, 일반 비디오 코덱에서는 해당 픽쳐(프레임) 번호를 지시할 수 있다. poc_num 정보는 부호 없는 32 비트의 정보로 표현될 수 있다.The poc_num information may indicate the video picture number. For example, a picture number field may indicate a picture order count (POC) in HEVC, and a picture (frame) number in a general video codec. The poc_num information can be represented by 32 bits of unsigned information.

info_mode 정보, tile_id_list_size 정보, tile_id_list 정보, cu_id_list_size 정보, cu_id_list 정보, user_info_flag 정보, user_info_size 정보, 및/또는 user_info_list 정보에 대한 구체적인 내용은 전술한 내용과 동일하므로 구체적인 설명은 생략한다. The detailed contents of the info_mode information, the tile_id_list_size information, the tile_id_list information, the cu_id_list_size information, the cu_id_list information, the user_info_flag information, the user_info_size information, and / or the user_info_list information are the same as those described above, and a detailed description thereof will be omitted.

관심 영역 정보는 서버 디바이스(또는 영상 전송 장치)에서 생성되고, 적어도 하나의 클라이언트 디바이스(또는 영상 수신 장치)로 전송될 수 있다.The area of interest information may be generated at a server device (or an image transmission device) and transmitted to at least one client device (or image receiving device).

또한, 관심 영역 정보는 적어도 하나의 클라이언트 디바이스(또는 영상 수신 장치) 에서 생성되고, 적어도 하나의 클라이언트 디바이스(또는 영상 수신 장치) 및/또는 서버 디바이스(또는 영상 전송 장치)로 전송될 수 있다. 이 경우, 클라이언트 디바이스 및/또는 클라이언트 디바이스의 제어부는 전술한 시그널링 데이터 추출부, 영상 생성부, 관심 영역 판단부, 시그널링 데이터 생성부, 및/또는 인코더를 더 포함할 수 있다.In addition, the area of interest information may be generated in at least one client device (or image receiving device) and transmitted to at least one client device (or image receiving device) and / or a server device (or image transmitting device). In this case, the control unit of the client device and / or the client device may further include the signaling data extraction unit, the image generation unit, the ROI determination unit, the signaling data generation unit, and / or the encoder.

도 18은 XML 포맷의 예시적인 관심 영역 정보, 및 예시적인 SEI 메시지를 나타낸 도면이다.18 is a diagram showing exemplary ROI information in an XML format, and an exemplary SEI message.

도 (a)를 참조하면, 관심 영역 정보(sighted_tile_info)는 XML 형태로 표현될 수 있다. 예를 들어, 관심 영역 정보(sighted_tile_info)는 info_mode 정보(‘3’), tile_id_list_size 정보(‘6’), 및/또는 tile_id_list 정보(‘6, 7, 8, 9, 10, 11, 12’)를 포함할 수 있다.Referring to FIG. 5A, the region of interest (sighted_tile_info) can be expressed in XML format. For example, the interest area information (sighted_tile_info) includes info_mode information ('3'), tile_id_list_size information ('6'), and / or tile_id_list information ('6, 7, 8, 9, 10, 11, 12') .

도 (b)를 참고하면, 국제 비디오 표준에서의 Supplemental Enhancement Information(SEI) 메시지의 페이로드 (payload) 구문 (Syntax)이 나타나 있다. SEI 메시지는 동영상 부호화 계층(VCL)의 복호화 과정에서 필수가 아닌 부가정보를 나타낸다.Referring to Figure (b), the payload syntax of a Supplemental Enhancement Information (SEI) message in an international video standard is shown. The SEI message indicates additional information that is not essential in the decoding process of the video coding layer (VCL).

관심 영역 정보(sighted_tile_info, 1810)는 고효율 비디오 부호화(HEVC), 엠펙-4 (MPEG-4), 및/또는 고급 비디오 부호화(AVC)의 SEI 메시지에 포함되어 방송망 및/또는 브로드밴드를 통하여 전송될 수 있다. 예를 들어, SEI 메시지는 압축된 비디오 데이터에 포함될 수 있다.The region of interest information (sighted_tile_info) 1810 may be included in SEI messages of High Efficiency Video Coding (HEVC), MPEG-4 and / or Advanced Video Coding (AVC) and transmitted over a broadcasting network and / or broadband have. For example, the SEI message may be included in the compressed video data.

이하에서는 가상 현실 서비스를 위한 비디오 데이터 및/또는 시그널링 데이터를 방송망 및/또는 브로드밴드를 통해서 전송 및/또는 수신하는 방법에 대하여 설명한다.Hereinafter, a method of transmitting and / or receiving video data and / or signaling data for a virtual reality service through a broadcasting network and / or broadband will be described.

도 19는 클라이언트 디바이스의 예시적인 프로토콜 스택을 도시한 도면이다.19 is a diagram illustrating an exemplary protocol stack of a client device.

본 도면에서 방송(broadcast) 쪽 프로토콜 스택 부분은, SLT(service list table) 와 MMTP(MPEG Media Transport Pprotocol) 를 통해 전송되는 부분, ROUTE(Real time Object delivery over Unidirectional Transport) 를 통해 전송되는 부분으로 나뉘어질 수 있다.In this figure, the broadcast protocol stack portion is divided into a portion transmitted through a service list table (SLT) and a portion transmitted through an MMTP (MPEG Media Transport Protocol), and a portion transmitted through ROUTE (Real time Object delivery over Unidirectional Transport) Can be.

SLT(1910) 는 UDP(User Datagram Protocol), IP(Internet Protocol ) 레이어를 거쳐 인캡슐레이션될 수 있다. MMTP(MPEG media transport Protocol) 는 MMT(MPEG media transport) 에서 정의되는 MPU(Media Processing Unit) 포맷으로 포맷된 데이터(1920)들과 MMTP 에 따른 시그널링 데이터(1930)들을 전송할 수 있다. 이 데이터들은 UDP, IP 레이어를 거쳐 인캡슐레이션될 수 있다. ROUTE 는 DASH(Dynamic Adaptive Streaming over HTTP) 세그먼트 형태로 포맷된 데이터들(1960)과 시그널링 데이터(1940)들, 그리고 NRT(Non-Real Time) 등의 논 타임드(non The SLT 1910 may be encapsulated via a UDP (User Datagram Protocol) or IP (Internet Protocol) layer. The MPEG media transport protocol (MMTP) can transmit data 1920 formatted in an MPU (Media Processing Unit) format defined in the MMT (MPEG media transport) and signaling data 1930 according to MMTP. These data can be encapsulated via UDP, IP layer. ROUTE includes data 1960 and signaling data 1940 formatted in the form of dynamic adaptive streaming over HTTP (DASH) segments, non-timed (non-real time) data such as NRT

timed) 데이터들(1950)을 전송할 수 있다. 이 데이터들 역시 UDP, IP 레이어를 거쳐 인캡슐레이션될 수 있다. timed < / RTI > This data can also be encapsulated via UDP, IP layer.

SLT 와 MMTP 를 통해 전송되는 부분, ROUTE 를 통해 전송되는 부분은 UDP, IP 레이어에서 처리된 후 링크 레이어(Data Link Layer)에서 다시 인캡슐레이션될 수 있다. 링크 레이어에서 처리된 방송 데이터는 피지컬 레이어에서 인코딩/인터리빙 등의 과정을 거쳐 방송 신호로서 멀티캐스트될 수 있다.The part transmitted through the SLT and the MMTP, the part transmitted through the ROUTE may be processed at the UDP, IP layer, and then re-encapsulated at the link layer (Data Link Layer). The broadcast data processed in the link layer can be multicasted as a broadcast signal through processes such as encoding / interleaving in the physical layer.

본 도면에서 브로드밴드(broadband) 쪽 프로토콜 스택 부분은, 전술한 바와 같이 HTTP(HyperText Transfer Protocol) 를 통하여 전송될 수 있다. DASH 세그먼트 형태로 포맷된 데이터들(1960)과 시그널링 데이터들(1980), NRT 등의 데이터(1970)가 HTTP 를 통하여 전송될 수 있다. 여기서 도시된 시그널링 데이터들(signaling)은 서비스에 관한 시그널링 데이터일 수 있다. 이 데이터들은 TCP(Transmission Control Protoco), IP 레이어를 거쳐 프로세싱된 후, 링크 레이어에서 인캡슐레이션될 수 있다. 이 후 처리된 브로드밴드 데이터는 피지컬 레이어에서 전송을 위한 처리를 거쳐 브로드밴드로 유니캐스트될 수 있다.In this figure, the broadband side protocol stack portion can be transmitted through HTTP (HyperText Transfer Protocol) as described above. Data 1960 formatted in the DASH segment format, signaling data 1980, and data 1970 such as NRT can be transmitted via HTTP. The signaling data shown herein may be signaling data relating to the service. These data can be processed via TCP (Transmission Control Protocol), IP layer, and then encapsulated at the link layer. This processed broadband data can be unicast to broadband through processing for transmission in the physical layer.

서비스는 전체적으로 사용자에게 보여주는 미디어 컴포넌트의 컬렉션일 수 있고, 컴포넌트는 여러 미디어 타입의 것일 수 있고, 서비스는 연속적이거나 간헐적일 수 있고, 서비스는 실시간이거나 비실시간일 수 있고, 실시간 서비스는 TV 프로그램의 시퀀스로 구성될 수 있다.The service may be a collection of media components that are displayed to the user as a whole, the component may be of several media types, the service may be continuous or intermittent, the service may be real or non real time, &Lt; / RTI >

서비스는 전술한 가상 현실 서비스 및/또는 증강 현실 서비스를 포함할 수 있다. 또한, 비디오 데이터 및/또는 오디오 데이터는 MPU 포맷으로 포맷된 데이터(1920), NRT 등의 논 타임드(non timed) 데이터(1950), 및/또는 DASH 세그먼트 형태로 포맷된 데이터(1960) 중에서 적어도 하나에 포함될 수 있다. 또한, 시그널링 데이터(예를 들어, 제1 시그널링 데이터, 제2 시그널링 데이터)는 SLT(1910), 시그널링 데이터(1930), 시그널링 데이터(1940), 및/또는 시그널링 데이터(1980) 중에서 적어도 하나에 포함될 수 있다.The service may include the virtual reality service and / or augmented reality service described above. Also, the video data and / or audio data may include at least one of data 1920 formatted in MPU format, non-timed data 1950 such as NRT, and / or data 1960 formatted in DASH segment format Can be included in one. Also, signaling data (e.g., first signaling data, second signaling data) may be included in at least one of SLT 1910, signaling data 1930, signaling data 1940, and / or signaling data 1980 .

도 20은 SLT 와 SLS (service layer signaling) 의 예시적인 관계를 도시한 도면이다.20 is an illustration showing an exemplary relationship between SLT and SLS (service layer signaling).

서비스 시그널링은 서비스 디스커버리 및 디스크립션 정보를 제공하고, 두 기능 컴포넌트를 포함한다. 이들은 SLT(2010)를 통한 부트스트랩 시그널링과 SLS(2020, 2030)이다. 예를 들어, MMTP에서의 SLS는 MMT 시그널링 컴포넌츠(2030)로 표현할 수 있다. 이들은 사용자 서비스를 발견하고 획득하는 데 필요한 정보를 나타낸다. SLT(2010)는 수신기가 기본 서비스 리스트를 작성하고 각 서비스에 대한 SLS(2020, 2030)의 발견을 부트스트랩 할 수 있게 해준다.Service signaling provides service discovery and description information, and includes two functional components. These are bootstrap signaling through SLT 2010 and SLS 2020, 2030. For example, the SLS in the MMTP can be represented as MMT signaling component 2030. These represent the information needed to discover and acquire user services. The SLT 2010 allows the receiver to create a basic service list and bootstrap discovery of SLSs 2020 and 2030 for each service.

SLT(2010)는 기본 서비스 정보의 매우 빠른 획득을 가능하게 한다. SLS(2020, 2030)는 수신기가 서비스와 그 컨텐츠 컴포넌트(비디오 데이터 또는 오디오 데이터 등)를 발견하고 이에 접속할 수 있게 해준다. SLT 2010 enables very fast acquisition of basic service information. SLSs 2020 and 2030 allow a receiver to discover and connect to services and their content components (such as video data or audio data).

전술한 바와 같이 SLT(2010) 는 UDP/IP 를 통해 전송될 수 있다. 이 때, 실시예에 따라 이 전송에 있어 가장 강건한(robust) 방법을 통해 SLT(2010) 에 해당하는 데이터가 전달될 수 있다.As described above, the SLT 2010 can be transmitted through UDP / IP. At this time, according to the embodiment, data corresponding to the SLT 2010 can be delivered through the most robust method in this transmission.

SLT(2010) 는 ROUTE 프로토콜에 의해 전달되는 SLS(2020) 에 접근하기 위한 액세스 정보를 가질 수 있다. 즉 SLT(2010) 는 ROUTE 프로토콜에 따른 SLS(2020) 에 부트스트래핑할 수 있다. 이 SLS(2020) 는 전술한 프로토콜 스택에서 ROUTE 윗 레이어에 위치하는 시그널링 정보로서, ROUTE/UDP/IP 를 통해 전달될 수 있다. 이 SLS(2020) 는 ROUTE 세션에 포함되는 LCT 세션들 중 하나를 통하여 전달될 수 있다. 이 SLS(2020) 를 이용하여 원하는 서비스에 해당하는 서비스 컴포넌트(2040)에 접근할 수 있다.SLT 2010 may have access information to access SLS 2020 carried by the ROUTE protocol. That is, the SLT 2010 can bootstrap the SLS 2020 according to the ROUTE protocol. The SLS 2020 is signaling information located in the ROUTE upper layer in the above-described protocol stack, and can be transmitted via ROUTE / UDP / IP. The SLS 2020 may be delivered via one of the LCT sessions included in the ROUTE session. The service component 2040 corresponding to a desired service can be accessed using the SLS 2020.

또한 SLT(2010) 는 MMTP 에 의해 전달되는 SLS(MMT 시그널링 컴포넌트, 2030)에 접근하기 위한 액세스 정보를 가질 수 있다. 즉, SLT(2010) 는 MMTP 에 따른 SLS(MMT 시그널링 컴포넌트, 2030) 에 부트스트래핑할 수 있다. 이 SLS(MMT 시그널링 컴포넌트, 2030) 는 MMT 에서 정의하는 MMTP 시그널링 메시지(Signaling Message)에 의해 전달될 수 있다. 이 SLS(MMT 시그널링 컴포넌트, 2030) 를 이용하여 원하는 서비스에 해당하는 스트리밍 서비스 컴포넌트(MPU, 2050) 에 접근할 수 있다. 전술한 바와 같이, 본 명세서에서는 NRT 서비스 컴포넌트(2060)는 ROUTE The SLT 2010 may also have access information to access the SLS (MMT signaling component, 2030) delivered by the MMTP. That is, the SLT 2010 may bootstrap the SLS (MMT signaling component) 2030 according to the MMTP. This SLS (MMT signaling component, 2030) can be delivered by an MMTP signaling message defined in the MMT. The SLS (MMT signaling component) 2030 can access the streaming service component (MPU) 2050 corresponding to a desired service. As described above, in this specification, the NRT service component 2060 includes ROUTE

프로토콜을 통해 전달되는데, MMTP 에 따른 SLS(MMT 시그널링 컴포넌트, 2030) 는 이에 접근하기 위한 정보도 포함할 수 있다. 브로드밴드 딜리버리에서, SLS는 HTTP(S)/TCP/IP로 전달된다.Protocol, the SLS (MMT signaling component) 2030 according to the MMTP may also include information for accessing it. In broadband delivery, SLS is delivered over HTTP (S) / TCP / IP.

서비스는 서비스 컴포넌츠(2040), 스트리밍 서비스 컴포넌츠(2050), 및/또는 NRT 서비스 컴포넌츠(2060) 중에서 적어도 하나에 포함될 수 있다. 또한, 시그널링 데이터(예를 들어, 제1 시그널링 데이터, 제2 시그널링 데이터)는 SLT(2010), SLS(2020), 및/또는 MMT 시그널링 컴포넌츠(2030) 중에서 적어도 하나에 포함될 수 있다.The service may be included in at least one of the service components 2040, the streaming service components 2050, and / or the NRT service components 2060. In addition, signaling data (e.g., first signaling data, second signaling data) may be included in at least one of SLT 2010, SLS 2020, and / or MMT signaling components 2030.

도 21은 예시적인 SLT 를 도시한 도면이다. 21 is a diagram showing an exemplary SLT.

SLT는 수신기가 채널 이름, 채널 넘버 등으로 그것이 수신할 수 있는 모든 서비스의 리스트를 구축할 수 있게 하는 빠른 채널 스캔을 지원한다. 또한 SLT는 수신기가 각 서비스에 대해 SLS를 발견할 수 있게 하는 부트스트랩 정보를 제공한다.SLT supports fast channel scans, allowing receivers to build up a list of all services it can receive with channel names, channel numbers, and so on. SLT also provides bootstrapping information that allows the receiver to discover the SLS for each service.

SLT는 @bsid, @sltCapabilities, sltInetUrl 엘레멘트, 및/또는 Service 엘레멘트 중에서 적어도 하나를 포함할 수 있다.The SLT may include at least one of the @bsid, @sltCapabilities, sltInetUrl elements, and / or Service elements.

@bsid는 브로드캐스트 스트림의 고유 식별자일 수 있다. @bsid의 값은 지역적인 단계에서 고유한 값을 가질 수 있다.@bsid may be a unique identifier of the broadcast stream. The value of @bsid can have a unique value at the local level.

@sltCapabilities는 해당 SLT에서 기술하는 모든 서비스에서 의미 있는 방송을 하기 위해 요구되는 사양을 의미한다.@sltCapabilities is a specification required for meaningful broadcasting in all services described in the SLT.

sltInetUrl 엘레멘트는 해당 SLT에서 기술하는 모든 서비스의 가이드 정보를 제공하는 ESG(Electronic Service Guide) 데이터 혹은 서비스 시그널링 정보를 브로드밴드망을 통해서 다운 받을 수 있는 URL(Uniform Resource Locator) 값을 의미한다. sltInetUrl 엘리먼트는 @URLtype을 포함할 수 있다. The sltInetUrl element is a URL (Uniform Resource Locator) value for downloading ESG (Electronic Service Guide) data or service signaling information providing guide information of all services described in the SLT through a broadband network. The sltInetUrl element can contain @URLtype.

@URLtype은 sltInetUrl엘레멘트가 지시하는 URL을 통해 다운 받을 수 있는 파일의 종류를 의미한다.@URLtype is the type of file that can be downloaded through the URL pointed to by the sltInetUrl element.

Service 엘레멘트는 서비스 정보를 포함할 수 있다. 서비스 엘레멘트는 @serviceId, @sltSvcSeqNum, @protected, @majorChannelNo, @minorChannelNo, @serviceCategory, @shortServiceName, @hidden, @broadbandAccessRequired, @svcCapabilities, BroadcastSignaling 엘레멘트, 및/또는 svcInetUrl 엘레멘트 중에서 적어도 하나를 포함할 수 있다.The Service element may contain service information. The service element may include at least one of @ serviceId, @ serviceSlcSeqNum, @protected, @majorChannelNo, @minorChannelNo, @serviceCategory, @shortServiceName, @hidden, @broadbandAccessRequired, @svcCapabilities, BroadcastSignaling element, and / or svcInetUrl element.

@serviceId는 서비스의 고유 식별자이다.@ serviceId is the unique identifier of the service.

@sltSvcSeqNum는 SLT에서 정의하는 각 서비스의 내용이 변경된 바 있는지에 대한 정보를 의미하는 값을 가진다.@sltSvcSeqNum has a value indicating whether the content of each service defined by SLT has been changed.

@protected는 “true” 값을 가질 경우, 해당 서비스를 화면에 보여주기 위해서는 서비스를 구성하는 컴포넌트 중 하나라도 보호가 되어있음을 의미한다.If @protected has a value of "true", it means that one of the components that make up the service is protected to show the service on the screen.

@majorChannelNo는 해당 서비스의 major 채널 넘버를 의미한다.@majorChannelNo means the major channel number of the service.

@minorChannelNo는 해당 서비스이 minor 채널 넘버를 의미한다.@minorChannelNo means the service's minor channel number.

@serviceCategory는 해당 서비스의 종류를 지시한다.@serviceCategory indicates the type of service.

@shortServiceName는 해당 서비스의 이름을 지시한다.@shortServiceName indicates the name of the service.

@hidden는 해당 서비스가 서비스 스캔 시, 사용자에게 보여져야 하는지 아닌지를 지시한다.@hidden indicates whether or not the service should be shown to the user when scanning the service.

@broadbandAccessRequired는 해당 서비스를 사용자에게 의미있게 보여주기 위해서 브로드밴드망에 접속을 해야하는지를 지시한다.@broadbandAccessRequired indicates whether a broadband network should be accessed to show the service to users in a meaningful way.

@svcCapabilities는 해당 서비스를 사용자에게 의미 있게 보여주기 위해 지원 해야 하는 사양을 지시한다.@svcCapabilities indicates specifications that must be supported to make the service meaningful to the user.

BroadcastSignaling 엘레멘트는 방송망으로 전송되는 시그널링의 전송 프로토콜, 위치, 식별자 값들에 대한 정의를 포함한다. BroadcastSignaling 엘레멘트는 @slsProtocol, @slsMajorProtocolVersion, @slsMinorProtocolVersion, @slsPlpId, @slsDestinationIpAddress, @slsDestinationUdpPort, 및/또는 @slsSourceIpAddress중에서 적어도 하나를 포함할 수 있다.The BroadcastSignaling element contains definitions for the transport protocol, location, and identifier values of the signaling sent to the broadcast network. The BroadcastSignaling element may include at least one of @slsProtocol, @slsMajorProtocolVersion, @slsMinorProtocolVersion, @slsPlpId, @slsDestinationIpAddress, @slsDestinationUdpPort, and / or @slsSourceIpAddress.

@slsProtocol는 해당 서비스의 SLS가 전송되는 프로토콜을 나타낸다.@slsProtocol indicates the protocol to which the SLS of the corresponding service is transmitted.

@slsMajorProtocolVersion는 해당 서비스의 SLS가 전송되는 프로토콜의 major 버전을 나타낸다.@slsMajorProtocolVersion indicates the major version of the protocol to which the SLS of the service is transmitted.

@slsMinorProtocolVersion는 해당 서비스의 SLS가 전송되는 프로토콜의 minor 버전을 나타낸다.@slsMinorProtocolVersion indicates the minor version of the protocol to which the SLS of the service is transmitted.

@slsPlpId는 SLS가 전송되는 PLP 식별자를 나타낸다.@slsPlpId indicates the PLP identifier to which the SLS is transmitted.

@slsDestinationIpAddress는 SLS 데이터의 destination IP 주소값을 나타낸다.@slsDestinationIpAddress indicates the destination IP address value of the SLS data.

@slsDestinationUdpPort는 SLS 데이터의 destination Port 값을 나타낸다.@slsDestinationUdpPort indicates the destination port value of the SLS data.

@slsSourceIpAddress는 SLS 데이터의 source IP 주소값을 나타낸다.@slsSourceIpAddress represents the source IP address value of the SLS data.

svcInetUrl 엘레멘트는 ESG 서비스 혹은 해당 서비스와 연관된 시그널링 데이터를 다운받을 수 있는 URL 값을 나타낸다. svcInetUrl 엘레멘트는 @URLtype을 포함할 수 있다.The svcInetUrl element indicates the URL value for downloading the ESG service or the signaling data associated with the service. The svcInetUrl element can contain @URLtype.

@URLtype는 svcInetUrl 엘레먼트가 지시하는 URL을 통해 다운 받을 수 있는 파일의 종류를 의미한다.@URLtype is the type of file that can be downloaded through the URL pointed to by the svcInetUrl element.

도 22는 serviceCategory 속성의 예시적인 코드 벨류를 나타낸 도면이다.22 is a diagram illustrating an exemplary code value of the serviceCategory attribute.

예를 들어, serviceCategory 속성의 값이 ‘0’이면, 서비스는 특정되지 않을 수 있다. serviceCategory 속성의 값이 ‘1’이면, 해당 서비스는 리니어 오디오/비디오 서비스일 수 있다. serviceCategory 속성의 값이 ‘2’이면, 해당 서비스는 리니어 오디오 서비스일 수 있다. serviceCategory 속성의 값이 ‘3’이면, 해당 서비스는 앱-베이스드 서비스일 수 있다. serviceCategory 속성의 값이 ‘4’이면, 해당 서비스는 전자 서비스 가이드(ESG) 서비스일 수 있다. serviceCategory 속성의 값이 ‘5’이면, 해당 서비스는 긴급 경보 서비스(EAS)일 수 있다. For example, if the value of the serviceCategory attribute is '0', the service may not be specified. If the value of the serviceCategory attribute is '1', the service may be a linear audio / video service. If the value of the serviceCategory attribute is '2', the service may be a linear audio service. If the value of the serviceCategory attribute is '3', the service may be an app-based service. If the value of the serviceCategory attribute is '4', the service may be an electronic service guide (ESG) service. If the value of the serviceCategory attribute is '5', the service may be an emergency alert service (EAS).

serviceCategory 속성의 값이 ‘6’이면, 해당 서비스는 가상 현실 및/또는 증강 현실 서비스일 수 있다.If the value of the serviceCategory attribute is '6', the service may be a virtual reality and / or augmented reality service.

화상 회의 서비스의 경우, serviceCategory 속성의 값은 ‘6’일 수 있다(2210).For a video conferencing service, the value of the serviceCategory attribute may be '6' (2210).

도 23은 예시적인 SLS 부트스트래핑과 예시적인 서비스 디스커버리 과정을 도시한 도면이다.23 is a diagram illustrating an exemplary SLS bootstrapping and exemplary service discovery process.

수신기는 SLT를 획득할 수 있다. SLT는 SLS 획득을 부트스트랩 하는데 사용되고, 그 후 SLS는 ROUTE 세션 또는 MMTP 세션에서 전달되는 서비스 컴포넌트를 획득하는 데 사용된다.The receiver can acquire the SLT. The SLT is used to bootstrap the SLS acquisition, after which the SLS is used to acquire the service component delivered in the ROUTE session or the MMTP session.

ROUTE 세션에서 전달되는 서비스와 관련하여, SLT는 PLPID(#1), 소스 IP 어드레스 (sIP1), 데스티네이션 IP 어드레스 (dIP1), 및 데스티네이션 포트 넘버 (dPort1)와 같은 SLS 부트스트래핑 정보를 제공한다. MMTP 세션에서 전달되는 서비스와 관련하여, SLT는 PLPID(#2), 데스티네이션 IP 어드레스 (dIP2), 및 데스티네이션 포트 넘버 (dPort2)와 같은 SLS 부트스트래핑 정보를 제공한다.With respect to the service delivered in the ROUTE session, the SLT provides SLS bootstrapping information such as PLPID (# 1), source IP address (sIPl), destination IP address (dIPl), and destination port number dPortl . With respect to the service delivered in the MMTP session, the SLT provides SLS bootstrapping information such as PLPID (# 2), Destination IP address (dIP2), and Destination port number (dPort2).

참고로, 브로드캐스트 스트림은 특정 대역 내에 집중된 캐리어 주파수 측면에서 정의된 RF 채널의 개념이다. PLP (physical layer pipe)는 RF 채널의 일부에 해당된다. 각 PLP는 특정 모듈레이션 및 코딩 파라미터를 갖는다.For reference, a broadcast stream is a concept of an RF channel defined in terms of a carrier frequency concentrated in a specific band. The physical layer pipe (PLP) is part of the RF channel. Each PLP has specific modulation and coding parameters.

ROUTE를 이용한 스트리밍 서비스 딜리버리에 대해, 수신기는 PLP 및 IP/UDP/LCT 세션으로 전달되는 SLS 프래그먼트를 획득할 수 있다. 이들 SLS 프래그먼트는 USBD/USD(User Service Bundle Description/User Service Description) 프래그먼트, S-TSID(Service-based Transport Session Instance Description) 프래그먼트, MPD(Media Presentation Description) 프래그먼트를 포함한다. 그것들은 하나의 서비스와 관련이 있다.For streaming service delivery using ROUTE, the receiver can acquire SLP fragments that are delivered in PLP and IP / UDP / LCT sessions. These SLS fragments include a USBD / USD (User Service Bundle Description / User Service Description) fragment, a Service-based Transport Session Instance Description (S-TSID) fragment, and a MPD (Media Presentation Description) fragment. They are related to one service.

MMTP를 이용한 스트리밍 서비스 딜리버리에 대해, 수신기는 PLP 및 MMTP 세션으로 전달되는 SLS 프래그먼트를 획득할 수 있다. 이들 SLS 프래그먼트는 USBD/USD 프래그먼트, MMT 시그널링 메시지를 포함할 수 있다. 그것들은 하나의 서비스와 관련이 있다. For streaming service delivery using MMTP, the receiver can obtain SLS fragments that are delivered in PLP and MMTP sessions. These SLS fragments may include USBD / USD fragments, MMT signaling messages. They are related to one service.

수신기는 SLS 프래그먼트를 기초로 비디오 컴포넌트 및/또는 오디오 컴포넌트를 획득할 수 있다.The receiver may obtain video components and / or audio components based on the SLS fragments.

도시된 실시예와는 달리, 하나의 ROUTE 또는 MMTP 세션은 복수개의 PLP 를 통해 전달될 수 있다. 즉, 하나의 서비스는 하나 이상의 PLP 를 통해 전달될 수도 있다. 전술한 바와 같이 하나의 LCT 세션은 하나의 PLP 를 통해 전달될 수 있다. 도시된 것과 달리 실시예에 따라 하나의 서비스를 구성하는 컴포넌트들이 서로 다른 ROUTE 세션들을 통해 전달될 수도 있다. 또한, 실시예에 따라 하나의 서비스를 구성하는 컴포넌트들이 서로 다른 MMTP 세션들을 통해 전달될 수도 있다. 실시예에 따라 하나의 서비스를 구성하는 컴포넌트들이 ROUTE 세션과 MMTP 세션에 Unlike the illustrated embodiment, one ROUTE or MMTP session may be delivered over a plurality of PLPs. That is, one service may be delivered via one or more PLPs. As described above, one LCT session can be transmitted through one PLP. Components which constitute one service may be delivered through different ROUTE sessions according to an embodiment. Also, according to an exemplary embodiment, the components configuring one service may be delivered through different MMTP sessions. According to the embodiment, the components constituting one service are connected to the ROUTE session and the MMTP session

나뉘어 전달될 수도 있다. 도시되지 않았으나, 하나의 서비스를 구성하는 컴포넌트가 브로드밴드를 통해 전달(하이브리드 딜리버리)되는 경우도 있을 수 있다.It can be divided and delivered. Although not shown, there may be a case where a component constituting one service is delivered through a broadband (hybrid delivery).

또한, 서비스 데이터(예를 들어, 비디오 컴포넌트 및/또는 오디오 컴포넌트) 및/또는 시그널링 데이터(예를 들어, SLS 프래그먼트)는 방송망 및/또는 브로드밴드를 통해서 전송될 수 있다.In addition, service data (e.g., video components and / or audio components) and / or signaling data (e.g., SLS fragments) may be transmitted over the broadcast network and / or broadband.

도 24는 ROUTE/DASH 를 위한 예시적인 USBD/USD 프래그먼트를 도시한 도면이다. 24 is a diagram illustrating an exemplary USBD / USD fragment for ROUTE / DASH.

USBD/USD (User Service Bundle Description/User Service Description) 프래그먼트는 서비스 레이어 특성을 서술하고, S-TSID 프래그먼트에 대한 URI(Uniform Resource Identifier) 레퍼런스 및 MPD 프래그먼트에 대한 URI 레퍼런스를 제공한다. 즉, USBD/USD 프래그먼트는 S-TSID 프래그먼트와 MPD 프래그먼트를 각각 레퍼런싱할 수 있다. USBD/USD 프래그먼트는 USBD 프래그먼트로 표현할 수 있다.The USBD / USD (User Service Bundle Description / User Service Description) fragment describes the service layer characteristics and provides a URI reference for the S-TSID fragment and a URI reference for the MPD fragment. That is, USBD / USD fragments can refer to S-TSID fragments and MPD fragments, respectively. USBD / USD fragments can be represented as USBD fragments.

USBD/USD 프래그먼트는 bundleDescription 루트 엘레멘트를 가질 수 있다. bundleDescription 루트 엘레멘트는 userServiceDescription 엘레멘트를 가질 수 있다. userServiceDescription 엘레멘트는 하나의 서비스에 대한 인스턴스일 수 있다.USBD / USD fragments can have a bundleDescription root element. The bundleDescription root element can have a userServiceDescription element. The userServiceDescription element can be an instance of one service.

userServiceDescription 엘레멘트는 @globalServiceId, @serviceId, @serviceStatus, @fullMPDUri, @sTSIDUri, name 엘레멘트, serviceLanguage 엘레멘트, deliveryMethod 엘레멘트, 및/또는 serviceLinakge 엘레멘트 중에서 적어도 하나를 포함할 수 있다.The userServiceDescription element may include at least one of @globalServiceId, @serviceId, @serviceStatus, @fullMPDUri, @sTSIDUri, name element, serviceLanguage element, deliveryMethod element, and / or serviceLinakge element.

@globalServiceId는 서비스를 식별하는 글로벌적으로 고유한 URI를 지시할 수 있다.@globalServiceId can point to a globally unique URI that identifies the service.

@serviceId는 SLT에 있는 해당하는 서비스 엔트리에 대한 레퍼런스이다.The @ serviceId is a reference to the corresponding service entry in the SLT.

@serviceStatus는 해당 서비스의 상태는 특정할 수 있다. 그 값은 해당 서비스가 활성화되어 있는지 비활성화되어 있는지를 나타낸다.@ serviceStatus can specify the status of the service. The value indicates whether the service is active or inactive.

@fullMPDUri는 브로드캐스트 및/또는 브로드밴드 상에서 전달되는 서비스의 컨텐츠 컴포넌트에 대한 디스크립션을 포함하는 MPD 프래그먼트를 레퍼런싱할 수 있다.@fullMPDUri may refer to an MPD fragment that contains a description of the content component of the service delivered on broadcast and / or broadband.

@sTSIDUri는 해당 서비스의 컨텐츠를 전달하는 전송 세션에 액세스 관련 파라미터를 제공하는 S-TSID 프래그먼트를 레퍼런싱할 수 있다.@sTSIDUri can refer to an S-TSID fragment that provides access-related parameters to the transport session carrying the contents of that service.

name 엘레먼트는 서비스의 네임을 나타낼 수 있다. name 엘레먼트는 서비스 네임의 언어를 나타내는 @lang을 포함할 수 있다.The name element can represent the name of the service. The name element can contain @lang, which indicates the language of the service name.

serviceLanguage 엘레먼트는 서비스의 이용 가능한 언어를 나타낼 수 있다. The serviceLanguage element may represent the language in which the service is available.

deliveryMethod 엘레먼트는 액세스의 브로드캐스트 및 (선택적으로) 브로드밴드 모드 상에서 서비스의 컨텐츠에 속하는 정보에 관련된 트랜스포트의 컨테이너일 수 있다. deliveryMethod 엘레멘트는 broadcastAppService 엘레멘트와 unicastAppService 엘레멘트를 포함할 수 있다. 각각의 하위 엘레멘트들은 basePattern 엘레멘트를 하위 엘레멘트로 가질 수 있다.The deliveryMethod element may be a container of a transport related to broadcast of access and (optionally) information pertaining to the content of the service in broadband mode. The deliveryMethod element can contain the broadcastAppService element and the unicastAppService element. Each subelement can have a basePattern element as a child element.

broadcastAppService 엘레멘트는 소속된 미디어 프레젠테이션의 모든 기간에 걸쳐 서비스에 속하는 해당 미디어 컴포넌트를 포함하는 다중화된 또는 비다중화된 형태의 브로드캐스트 상에서 전달되는 DASH 레프레젠테이션일 수 있다. 즉, 각각의 본 필드들은, 방송망을 통해 전달되는 DASH 레프레젠테이션(representation) 들을 의미할 수 있다.The broadcastAppService element may be a DASH presentation that is delivered on a multiplexed or unmultiplexed form of broadcast that includes corresponding media components belonging to the service over all periods of the media presentation to which it belongs. That is, each of these fields may refer to DASH representations transmitted over a broadcast network.

unicastAppService는 소속된 미디어 프레젠테이션의 모든 기간에 걸쳐 서비스에 속하는 구성 미디어 컨텐츠 컴포넌트를 포함하는 다중화된 또는 비다중화된 형태의 브로드밴드 상에서 전달되는 DASH 레프레젠테이션일 수 있다. 즉, 각각의 본 필드들은, 브로드밴드를 통해 전달되는 DASH 레프레젠테이션(representation) 들을 의미할 수 있다.The unicastAppService may be a DASH presentation that is delivered on a broadband or multiplexed or non-multiplexed form containing configuration media content components belonging to the service over all periods of the media presentation to which it belongs. That is, each of these fields may refer to DASH representations transmitted over broadband.

basePattern은 포함된 기간에 페어런트 레프레젠테이션의 미디어 분할을 요구하기 위해 DASH 클라이언트에 의해 사용되는 분할 URL의 모든 부분에 대해 매칭되도록 수신기에 의해 사용되는 문자 패턴일 수 있다.The basePattern may be a character pattern used by the receiver to match for all parts of the split URL used by the DASH client to request media splitting of the parent presentation to the included period.

serviceLinakge 엘레멘트는 서비스 링키지 정보를 포함할 수 있다.The serviceLinakge element may contain service linkage information.

도 25는 ROUTE/DASH 를 위한 예시적인 S-TSID 프래그먼트를 도시한 도면이다.25 is a diagram illustrating an exemplary S-TSID fragment for ROUTE / DASH.

S-TSID(Service-based Transport Session Instance Description) 프래그먼트는 서비스의 미디어 컨텐츠 컴포넌트가 전달되는 하나 이상의 ROUTE/LCT 세션에 대한 전송 세션 디스크립션 및 해당 LCT 세션에서 전달되는 딜리버리 오브젝트의 디스크립션을 제공한다. 수신기는 S-TSID 프래그먼트를 기초로 서비스에 포함되는 적어도 하나의 컴포넌트(예를 들어, 비디오 컴포넌트 및/또는 오디오 컴포넌트)를 획득할 수 있다.A Service-based Transport Session Instance Description (S-TSID) fragment provides a description of a transport session description for one or more ROUTE / LCT sessions over which a media content component of the service is delivered, and a delivery object delivered in that LCT session. The receiver may obtain at least one component (e.g., a video component and / or an audio component) included in the service based on the S-TSID fragment.

S-TSID 프래그먼트는 S-TSID 루트 엘레멘트를 포함할 수 있다. S-TSID 루트 엘레멘트는 @serviceId 및/또는 적어도 하나의 RS 엘레멘트를 포함할 수 있다.The S-TSID fragment may include the S-TSID root element. The S-TSID root element may contain @serviceId and / or at least one RS element.

@serviceID는 USD에서 서비스 엘레멘트에 해당하는 레퍼런스일 수 있다.@serviceID can be a reference to a service element in USD.

RS 엘레멘트는 해당 서비스 데이터들을 전달하는 ROUTE 세션에 대한 정보를 가질 수 있다.The RS element may have information about a ROUTE session that carries corresponding service data.

RS 엘레멘트는 @bsid, @sIpAddr, @dIpAddr, @dport, @PLPID 및/또는 적어도 하나의 LS 엘레멘트 중에서 적어도 하나를 포함할 수 있다.The RS element may contain at least one of @bsid, @sIpAddr, @dIpAddr, @dport, @PLPID, and / or at least one LS element.

@bsid는 broadcastAppService의 컨텐츠 컴포넌트가 전달되는 브로드캐스트 스트림의 식별자일 수 있다.@bsid may be the identifier of the broadcast stream to which the content component of the broadcastAppService is delivered.

@sIpAddr은 소스 IP 어드레스를 나타낼 수 있다. 여기서 소스 IP 어드레스는, 해당 서비스에 포함되는 서비스 컴포넌트를 전달하는 ROUTE 세션의 소스 IP 어드레스일 수 있다.@sIpAddr can indicate the source IP address. Where the source IP address may be the source IP address of the ROUTE session that carries the service component included in the service.

@dIpAddr은 데스티네이션 IP 어드레스를 나타낼 수 있다. 여기서 데스티네이션 IP 어드레스는, 해당 서비스에 포함되는 서비스 컴포넌트를 전달하는 ROUTE 세션의 데스티네이션 IP 어드레스일 수 있다.@dpAddr can represent the destination IP address. The destination IP address may be a destination IP address of a ROUTE session that carries a service component included in the service.

@dport는 데스티네이션 포트를 나타낼 수 있다. 여기서 데스티네이션 포트는, 해당 서비스에 포함되는 서비스 컴포넌트를 전달하는 ROUTE 세션의 데스티네이션 포트일 수 있다.@dport can represent a destination port. The destination port may be a destination port of a ROUTE session that carries a service component included in the service.

@PLPID 는 RS 엘레멘트로 표현되는 ROUTE 세션을 위한 PLP 의 ID 일 수 있다.@PLPID may be the ID of the PLP for the ROUTE session represented by the RS element.

LS 엘레멘트는 해당 서비스 데이터들을 전달하는 LCT 세션에 대한 정보를 가질 수 있다.The LS element may have information about an LCT session that delivers the corresponding service data.

LS 엘레멘트는 @tsi, @PLPID, @bw, @startTime, @endTime, SrcFlow 및/또는 RprFlow 를 포함할 수 있다.The LS element can contain @tsi, @PLPID, @bw, @startTime, @endTime, SrcFlow and / or RprFlow.

@tsi 는 해당 서비스의 서비스 컴포넌트가 전달되는 LCT 세션의 TSI 값을 지시할 수 있다.@tsi can indicate the TSI value of the LCT session over which the service component of the service is delivered.

@PLPID 는 해당 LCT 세션을 위한 PLP 의 ID 정보를 가질 수 있다. 이 값은 기본 ROUTE 세션 값을 덮어쓸 수도 있다.The @PLPID may have the ID information of the PLP for the corresponding LCT session. This value may override the default ROUTE session value.

@bw 는 최대 밴드위스 값을 지시할 수 있다. @startTime 은 해당 LCT 세션의 스타트 타임(Start time)을 지시할 수 있다. @endTime 은 해당 LCT 세션의 엔드 타임(End time)을 지시할 수 있다. SrcFlow 엘레멘트는 ROUTE 의 소스 플로우에 대해 기술할 수 있다. RprFlow 엘레멘트는 ROUTE 의 리페어 플로우에 대해 기술할 수 있다.@bw can indicate the maximum bandwidth value. @startTime can indicate the start time of the LCT session. @endTime can indicate the end time of the LCT session. The SrcFlow element can describe the source flow of ROUTE. The RprFlow element can describe the repair flow of ROUTE.

S-TSID는 관심 영역 정보를 포함할 수 있다. 구체적으로 RS 엘레멘트 및/또는 LS 엘레멘트는 관심 영역 정보를 포함할 수 있다.The S-TSID may include region of interest information. Specifically, the RS element and / or the LS element may include the region of interest information.

도 26은 예시적인 MPD 프래그먼트를 나타낸 도면이다.26 is a diagram illustrating an exemplary MPD fragment.

MPD(Media Presentation Description) 프래그먼트는 방송사에 의해 정해진 주어진 듀레이션의 리니어 서비스에 해당하는 DASH 미디어 프레젠테이션의 공식화된 디스크립션을 포함할 수 있다. MPD 프래그먼트는 주로 스트리밍 컨텐츠로서의 DASH 프래그먼트의 딜리버리를 위한 리니어 서비스와 관련된다. MPD는 프래그먼트 URL 형태의 리니어/스트리밍 서비스의 개별 미디어 컴포넌트에 대한 소스 식별자, 및 미디어 프레젠테이션 내의 식별된 리소스의 컨텍스트를 제공한다. MPD는 브로드캐스트 및/또는 브로드밴드를 통해서 전송될 수 있다.The MPD (Media Presentation Description) fragment may contain a formalized description of a DASH media presentation corresponding to a linear service of a given duration as determined by the broadcaster. MPD fragments are primarily concerned with linear services for delivery of DASH fragments as streaming content. The MPD provides a source identifier for the individual media component of the linear / streaming service in the form of a fragment URL, and the context of the identified resource within the media presentation. The MPD may be transmitted via broadcast and / or broadband.

MPD 프래그먼트는 피리어드(Period) 엘레멘트, 어뎁테이션 셋(Adaptation Set) 엘레멘트 및 레프리젠테이션 (Representation) 엘레멘트를 포함할 수 있다.The MPD fragment may include a Period element, an Adaptation Set element, and a Representation element.

피리어드 엘레멘트는 피리어드에 대한 정보를 포함한다. MPD 프래그먼트는 복수의 피리어드에 대한 정보를 포함할 수 있다. 피리어드는 미디어 컨텐츠 재생(presentation)의 연속한 시간 구간을 나타낸다.The period element contains information about the period. The MPD fragment may contain information about a plurality of periods. The period represents a continuous time interval of media content presentation.

어뎁테이션 셋 엘레멘트는 어뎁테이션 셋에 대한 정보를 포함한다. MPD 프래그먼트는 복수의 어뎁테이션 셋에 대한 정보를 포함할 수 있다. 어뎁테이션 셋은 상호전환 가능한 하나 또는 그 이상의 미디어 컨텐츠 컴포넌트를 포함하는 미디어 컴포넌트의 집합이다. 어뎁테이션 셋은 하나 또는 그 이상의 레프리젠테이션을 포함할 수 있다. 어뎁테이션 셋 각각은 서로 다른 언어의 오디오를 포함하거나 서로 다른 언어의 자막을 포함할 수 있다.An adaptation set element contains information about an adaptation set. The MPD fragment may contain information about a plurality of adaptation sets. An adaptation set is a collection of media components that contain one or more media content components that are interchangeable. An adaptation set may include one or more representations. Each of the adaptation sets may include audio of different languages or may include subtitles of different languages.

레프리젠테이션 엘레멘트는 레프리젠테이션에 대한 정보를 포함한다. MPD는 복수의 레프리젠테이션에 대한 정보를 포함할 수 있다. 레프리젠테이션은 하나 또는 그 이상의 미디어 컴포넌트들의 구조화된 모음으로서, 동일한 미디어 컨텐츠 컴포넌트에 대하여 서로 달리 인코딩된 복수의 레프리젠테이션이 존재할 수 있다. 한편, 비트스트림 스위칭(bitstream switching)이 가능한 경우, 전자 장치는 미디어 컨텐츠 재생 도중 업데이트된 정보에 기초하여 수신되는 레프리젠테이션을 다른 레프리젠테이션으로 전환할 수 있다. 특히 전자 장치는 대역폭의 환경에 따라 수신되는 레프리젠테이션을 다른 레프리젠테이션으로 전환할 수 있다. 레프리젠테이션은 복수의 세그먼트들로 분할된다.The presentation element contains information about the presentation. The MPD may include information about a plurality of representations. A representation is a structured collection of one or more media components, and there may be a plurality of differently encoded representations for the same media content component. On the other hand, when bitstream switching is enabled, the electronic device can switch the received presentation to another presentation based on the updated information during media content playback. In particular, the electronic device can convert the received presentation into another representation depending on the bandwidth environment. The representation is divided into a plurality of segments.

세그먼트는 미디어 컨텐츠 데이터의 단위이다. 레프리젠테이션은 HTTP 1.1(RFC 2616)에서 정의된 HTTP GET 또는 HTTP partial GET method를 이용한 전자 장치의 요청에 따라 세그먼트 또는 세그먼트의 일부분으로 전송될 수 있다.A segment is a unit of media content data. The representation may be sent as part of a segment or segment as requested by the electronic device using the HTTP GET or HTTP partial GET method defined in HTTP 1.1 (RFC 2616).

또한, 세그먼트는 복수의 서브 세그먼트들을 포함하여 구성될 수 있다. 서브세그먼트는 세그먼트 레벨에서 인덱스될 수 있는 가장 작은 단위(unit)를 의미할 수 있다. 세그먼트는 초기화 세그먼트(Initialization Segment), 미디어 세그먼트(Media Segment), 인덱스 세그먼트 Index Segment), 비트스트림 스위칭 세그먼트(BitstreamSwitching Segment) 등을 포함할 수 있다.Further, the segment may be configured to include a plurality of sub-segments. A sub-segment may mean the smallest unit that can be indexed at the segment level. The segment may include an Initialization Segment, a Media Segment, an Index Segment Index Segment, a Bitstream Switching Segment, and the like.

MPD 프래그먼트는 관심 영역 정보를 포함할 수 있다. 구체적으로 피리어드(Period) 엘레멘트, 어뎁테이션 셋(Adaptation Set) 엘레멘트 및/또는 레프리젠테이션 (Representation) 엘레멘트는 관심 영역 정보를 포함할 수 있다.The MPD fragment may include region of interest information. In particular, the Period element, the Adaptation Set element, and / or the Representation element may include the region of interest information.

도 27은 가상 현실 서비스를 복수의 ROUTE 세션을 통해서 수신하는 예시적인 과정을 나타낸 도면이다.27 is a diagram illustrating an exemplary process of receiving a virtual reality service through a plurality of ROUTE sessions.

클라이언트 디바이스(또는 수신기)는 방송망을 통하여 비트스트림을 수신할 수 있다. 예를 들어, 비트 스트림은 서비스를 위한 비디오 데이터 및 제2 시그널링 데이터를 포함할 수 있다. 예를 들어, 제2 시그널링 데이터는 SLT(2710) 및 SLS(2730)를 포함할 수 있다. 서비스는 가상 현실 서비스를 포함할 수 있다. 서비스 데이터는 기본 계층 서비스 데이터(2740) 및 향상 계층 서비스 데이터(2750)를 포함할 수 있다. The client device (or receiver) can receive the bitstream through the broadcast network. For example, the bitstream may comprise video data for service and second signaling data. For example, the second signaling data may include SLT 2710 and SLS 2730. The service may include a virtual reality service. The service data may include base layer service data 2740 and enhancement layer service data 2750.

비트스트림은 적어도 하나의 물리 계층 프레임을 포함할 수 있다. 물리 계층 프레임은 적어도 하나의 PLP를 포함할 수 있다. 예를 들어, PLP(#0)을 통하여 SLT(2710)가 전송될 수 있다.The bitstream may include at least one physical layer frame. The physical layer frame may include at least one PLP. For example, the SLT 2710 may be transmitted through the PLP (# 0).

또한, PLP(#1)은 제1 ROUTE 세션(ROUTE #1)을 포함할 수 있다. 1 ROUTE 세션(ROUTE #1)은 제1 LCT 세션(tsi-sls), 제2 LCT 세션(tsi-bv), 및 제3 LCT 세션(tsi-a)를 포함할 수 있다. 제1 LCT 세션(tsi-sls)을 통해서 SLS(2730)가 전송되고, 제2 LCT 세션(tsi-bv)을 통해서 기본 계층 비디오 데이터(Video Segment, 2740)가 전송되고, 및 제3 LCT 세션(tsi-a)를 통해서 오디오 데이터(Audio Segment)가 전송될 수 있다.Also, the PLP # 1 may include a first ROUTE session ROUTE # 1. One ROUTE session ROUTE # 1 may include a first LCT session tsi-sls, a second LCT session tsi-bv, and a third LCT session tsi-a. The SLS 2730 is transmitted via the first LCT session tsi-sls and the base layer video data 2740 is transmitted through the second LCT session tsi-bv, audio data can be transmitted through the tsi-a.

또한, PLP(#2)는 제2 ROUTE 세션(ROUTE #2)을 포함할 수 있고, 제2 ROUTE 세션(ROUTE #2)은 제4 LCT 세션(tsi-ev)를 포함할 수 있다. 제4 LCT 세션(tsi-ev)을 통해서 향상 계층 비디오 데이터(Video Segment, 2750)가 전송될 수 있다.Also, the PLP # 2 may include a second ROUTE session ROUTE # 2, and the second ROUTE session ROUTE # 2 may include a fourth LCT session tsi-ev. Enhanced layer video data (Video Segment) 2750 may be transmitted through the fourth LCT session (tsi-ev).

그리고 나서, 클라이언트 디바이스는 SLT(2710)를 획득할 수 있다. 예를 들어, SLT(2710)는 SLS(2730)를 획득하기 위한 부트스트랩 정보(2720)를 포함할 수 있다.The client device may then obtain the SLT 2710. For example, the SLT 2710 may include bootstrap information 2720 to obtain the SLS 2730.

그리고 나서, 클라이언트 디바이스는, 부트스트랩 정보(2720)을 기초로, 가상 현실 서비스를 위한 SLS(2730)를 획득할 수 있다. 예를 들어, SLS는 USBD/USD 프래그먼트, S-TSID 프래그먼트, 및/또는 MPD 프래그먼트를 포함할 수 있다. USBD/USD 프래그먼트, S-TSID 프래그먼트, 및/또는 MPD 프래그먼트 중에서 적어도 하나는 관심 영역 정보를 포함할 수 있다. 이하에서는 MPD 프래그먼트가 관심 영역 정보를 포함하는 것을 전제로 설명한다.The client device may then obtain the SLS 2730 for the virtual reality service based on the bootstrap information 2720. For example, the SLS may include a USBD / USD fragment, an S-TSID fragment, and / or an MPD fragment. At least one of the USBD / USD fragment, the S-TSID fragment, and / or the MPD fragment may include region of interest information. Hereinafter, it is assumed that the MPD fragment includes the region of interest information.

그리고 나서, 클라이언트 디바이스는, USBD/USD 프래그먼트를 기초로 S-TSID 프래그먼트 및/또는 MPD 프래그먼트를 획득할 수 있다. 클라이언트 디바이스는, S-TSID 프래그먼트 및 MPD 프래그먼트를 기초로, LCT 세션을 통해서 전송되는 미디어 컴포넌트와 MPD 프래그먼트의 레프리젠테이션을 매칭시킬 수 있다.The client device may then obtain the S-TSID fragment and / or the MPD fragment based on the USBD / USD fragment. The client device may match the representation of the MPD fragment with the media component transmitted via the LCT session, based on the S-TSID fragment and the MPD fragment.

그리고 나서, 클라이언트 디바이스는, S-TSID 프래그먼트의 RS 엘리먼트(ROUTE #1)를 기초로 기본 계층 비디오 데이터(2740) 및 오디오 데이터를 획득할 수 있다. 또한, 클라이언트 디바이스는, S-TSID 프래그먼트의 RS 엘리먼트(ROUTE #2)를 기초로 향상 계층 비디오 데이터(2750) 및 오디오 데이터를 획득할 수 있다.The client device may then obtain base layer video data 2740 and audio data based on the RS element ROUTE # 1 of the S-TSID fragment. In addition, the client device may obtain enhancement layer video data 2750 and audio data based on the RS element (ROUTE # 2) of the S-TSID fragment.

그리고 나서, 클라이언트 디바이스는, MPD 프래그먼트를 기초로, 서비스 데이터(예를 들어, 기본 계층 비디오 데이터, 향상 계층 비디오 데이터, 오디오 데이터)를 디코딩할 수 있다.The client device may then decode the service data (e.g., base layer video data, enhancement layer video data, audio data) based on the MPD fragment.

보다 구체적으로, 클라이언트 디바이스는, 기본 계층 비디오 데이터 및/또는를 관심 영역 정보를 기초로, 향상 계층 비디오 데이터를 디코딩할 수 있다.More specifically, the client device may decode the enhancement layer video data based on the base layer video data and / or the region of interest information.

이상에서는 향상 계층 비디오 데이터가 제2 ROUTE 세션(ROUTE #2)를 통해서 전송되는 것으로 설명하였지만, 향상 계층 비디오 데이터는 MMTP 세션을 통해서 전송될 수도 있다.In the above description, the enhancement layer video data is transmitted through the second ROUTE session (ROUTE # 2), but the enhancement layer video data may be transmitted through the MMTP session.

도 28는 클라이언트 디바이스의 예시적인 구성을 나타낸 도면이다.28 is a diagram showing an exemplary configuration of a client device.

도 (a)를 참조하면, 따른 클라이언트 디바이스(A2800)는 영상 입력부, 오디오 입력부, 센서부, 영상 출력부, 오디오 출력부, 통신부(A2810), 및/또는 제어부(A2820) 중에서 적어도 하나를 포함할 수 있다. 예를 들어, 클라이언트 디바이스(A2800)에 대한 구체적인 내용은 전술한 클라이언트 디바이스의 내용을 모두 포함할 수 있다.The client device A 2800 includes at least one of an image input unit, an audio input unit, a sensor unit, an image output unit, an audio output unit, a communication unit A 2810, and / or a control unit A 2820 . For example, the specific contents of the client device (A 2800) may include all the contents of the client device described above.

제어부(A2820)는 시그널링 데이터 추출부, 디코더, 화자 판단부, 시선 판단부, 및/또는 시그널링 데이터 생성부 중에서 적어도 하나를 포함할 수 있다. 예를 들어, 제어부(A2820)에 대한 구체적인 내용은 전술한 제어부의 내용을 모두 포함할 수 있다.The control unit A2820 may include at least one of a signaling data extraction unit, a decoder, a speaker determination unit, a gaze determination unit, and / or a signaling data generation unit. For example, the contents of the control unit A2820 may include all the contents of the control unit described above.

도면을 참조하면, 클라이언트 디바이스(또는 수신기, 영상 수신 장치)는 통신부(A2810), 및/또는 제어부(A2820)를 포함할 수 있다. 제어부(A2820)는 기본 계층 디코더(A2821) 및/또는 향상 계층 디코더(A2825)를 포함할 수 있다.Referring to the drawings, a client device (or receiver, image receiving apparatus) may include a communication unit A2810 and / or a control unit A2820. The control unit A2820 may include a base layer decoder A2821 and / or an enhancement layer decoder A2825.

통신부(A2810)는 가상 현실 서비스를 위한 비디오 데이터를 포함하는 비트스트림을 수신할 수 있다. 통신부(A2810)는 방송망 및/또는 브로드밴드를 통하여 비트스트림을 수신할 수 있다.The communication unit A2810 can receive a bit stream including video data for a virtual reality service. The communication unit A2810 can receive the bit stream through the broadcasting network and / or the broadband.

상기 비디오 데이터는 기본 계층을 위한 기본 계층 비디오 데이터 및 상기 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 적어도 하나의 향상 계층 비디오 데이터를 포함할 수 있다.The video data may include base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.

기본 계층 디코더(A2821)는 상기 기본 계층 비디오 데이터를 디코딩할 수 있다.The base layer decoder A2821 may decode the base layer video data.

향상 계층 디코더(A2825)는 상기 기본 계층 비디오 데이터를 기초로 상기 적어도 하나의 향상 계층 비디오 데이터를 디코딩할 수 있다.The enhancement layer decoder A2825 may decode the at least one enhancement layer video data based on the base layer video data.

상기 적어도 하나의 향상 계층 비디오 데이터는 가상 공간 내에서 적어도 하나의 관심 영역을 위한 비디오 데이터일 수 있다.The at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.

또한, 제어부(A2820)는 제1 시그널링 데이터를 생성하는 시그널링 데이터 생성부를 더 포함할 수 있다.In addition, the control unit A2820 may further include a signaling data generation unit for generating the first signaling data.

상기 제1 시그널링 데이터는 영상 구성 정보를 포함할 수 있다. 영상 구성 정보는 가상 공간 내에서 사용자의 시선 방향을 지시하는 시선 정보 및 사용자의 시야각을 지시하는 줌 영역 정보 중에서 적어도 하나를 포함할 수 있다.The first signaling data may include image configuration information. The image configuration information may include at least one of gaze information indicating a gaze direction of a user in the virtual space and zoom area information indicating a viewing angle of the user.

또한, 제어부(A2820)는 상기 시선 정보에 대응되는 시선 영역이 상기 적어도 하나의 관심 영역에 포함되는지 판단하는 시선 판단부를 더 포함할 수 있다.In addition, the control unit A2820 may further include a visual determination unit for determining whether a visual region corresponding to the visual information is included in the at least one region of interest.

또한, 상기 통신부(A2810)는, 상기 시선 영역이 상기 적어도 하나의 관심 영역 외의 영역에 포함되면, 상기 제1 시그널링 데이터를 서버(또는 서버 디바이스, 송신기, 영상 전송 장치) 및/또는 적어도 하나의 클라이언트 디바이스(또는 영상 수신 장치)로 전송할 수 있다.이 경우, 제1 시그널링 데이터를 수신한 서버, 서버 디바이스 및/또는 적어도 하나의 클라이언트 디바이스는 상기 적어도 하나의 관심 영역에 상기 시선 정보에 대응되는 시선 영역을 포함시킬 수 있다. 즉, 관심 영역은 가상 공간 내에서의 화자를 포함하는 영역, 적어도 하나의 향상 계층 비디오 데이터를 이용하여 표현되는 것으로 미리 정해진 영역, 시선 정보에 대응되는 시선 영역 중에서 적어도 하나를 포함할 수 있다.The communication unit A2810 may transmit the first signaling data to a server (or a server device, a transmitter, a video transmission device) and / or at least one client, if the gaze region is included in an area other than the at least one ROI The server device, the server device, and / or the at least one client device, which have received the first signaling data, may transmit to the device (or the image receiving device) . That is, the region of interest may include at least one of a region including a speaker in the virtual space, a region predetermined to be expressed using at least one enhancement layer video data, and a line of sight corresponding to the line of sight information.

또한, 상기 비트스트림은 제2 시그널링 데이터를 더 포함할 수 있다.In addition, the bitstream may further include second signaling data.

상기 통신부(A2810)는 상기 제2 시그널링 데이터를 기초로 상기 기본 계층 비디오 데이터 및 상기 적어도 하나의 향상 계층 비디오 데이터를 복수의 세션들을 통해서 독립적으로 수신할 수 있다.The communication unit A2810 can independently receive the base layer video data and the at least one enhancement layer video data through a plurality of sessions based on the second signaling data.

예를 들어, 통신부(A2810)는 기본 계층 비디오 데이터를 제1 ROUTE 세션을 통해서 수신하고, 적어도 하나의 향상 계층 비디오 데이터를 적어도 하나의 제2 ROUTE 세션을 통해서 수신할 수 있다. 또는, 통신부(A2810)는 기본 계층 비디오 데이터를 ROUTE 세션을 통해서 수신하고, 적어도 하나의 향상 계층 비디오 데이터를 적어도 하나의 MMTP 세션을 통해서 수신할 수 있다.For example, the communication unit A2810 can receive base layer video data through a first ROUTE session and receive at least one enhancement layer video data through at least one second ROUTE session. Alternatively, the communication unit A2810 may receive base layer video data through a ROUTE session and receive at least one enhancement layer video data through at least one MMTP session.

또한, 상기 제2 시그널링 데이터는 상기 비디오 데이터의 획득을 위한 정보를 포함하는 서비스 레이어 시그널링 데이터(또는 SLS) 및 상기 서비스 레이어 시그널링 데이터의 획득을 위한 정보를 포함하는 서비스 리스트 테이블(또는 SLT) 중에서 적어도 하나를 포함할 수 있다.The second signaling data may include at least one of a service layer signaling data (or SLS) including information for acquiring the video data and a service list table (or SLT) including information for acquiring the service layer signaling data. One can be included.

또한, 상기 서비스 리스트 테이블은 서비스의 카테고리를 지시하는 서비스 카테고리 속성을 포함할 수 있다. 예를 들어, 상기 서비스 카테고리 속성은 상기 가상 현실 서비스를 지시할 수 있다.In addition, the service list table may include a service category attribute indicating a category of the service. For example, the service category attribute may indicate the virtual reality service.

또한, 상기 서비스 레이어 시그널링 데이터는 상기 관심 영역 정보를 포함할 수 있다. 구체적으로 상기 서비스 레이어 시그널링 데이터는 상기 가상 현실 서비스를 위한 적어도 하나의 미디어 컴포넌트들이 전송되는 세션에 대한 정보를 포함하는 S-TSID 프래그먼트, 상기 적어도 하나의 미디어 컴포넌트(비디오 데이터 및/또는 오디오 데이터)에 대한 정보를 포함하는 MPD 프래그먼트, 및 상기 S-TSID 프래그먼트 및 상기 MPD 프래그먼트를 연결하는 URI 값을 포함하는 USBD/USD 프래그먼트 중에서 적어도 하나를 포함할 수 있다.Also, the service layer signaling data may include the region of interest information. Specifically, the service layer signaling data includes an S-TSID fragment including information on a session in which at least one media components for the virtual reality service are transmitted, the at least one media component (video data and / or audio data) And a USBD / USD fragment including a URI value linking the S-TSID fragment and the MPD fragment.

또한, 상기 MPD 프래그먼트는 상기 가상 공간의 전체 영역 내에서 상기 적어도 하나의 관심 영역의 위치를 지시하는 관심 영역 정보를 포함할 수 있다.In addition, the MPD fragment may include region of interest information indicating the location of the at least one region of interest within the entire region of the virtual space.

또한, 상기 비트스트림은 상기 가상 공간의 전체 영역 내에서 상기 적어도 하나의 관심 영역의 위치를 지시하는 관심 영역 정보를 더 포함할 수 있다. 예를 들어, 상기 관심 영역 정보는 Supplemental Enhancement Information (SEI) 메시지, Video Usability Information (VUI) 메시지, 슬라이스 헤더, 및 상기 비디오 데이터를 서술하는 파일 중에서 적어도 하나를 통하여 전송 및/또는 수신될 수 있다.In addition, the bitstream may further comprise region of interest information indicating the location of the at least one region of interest within the entire region of the virtual space. For example, the region of interest information may be transmitted and / or received via at least one of a Supplemental Enhancement Information (SEI) message, a Video Usability Information (VUI) message, a slice header, and a file describing the video data.

또한, 상기 적어도 하나의 향상 계층 비디오 데이터는 상기 기본 계층 비디오 데이터 및 상기 관심 영역 정보를 기초로 생성(인코딩) 및/또는 디코딩될 수 있다.Also, the at least one enhancement layer video data may be generated (encoded) and / or decoded based on the base layer video data and the region of interest information.

또한, 상기 관심 영역 정보는 픽쳐 별로 상기 관심 영역을 표현하는 정보의 모드를 지시하는 정보 모드 필드 및 상기 관심 영역에 해당하는 적어도 하나의 타일의 번호를 포함하는 타일 번호 리스트 필드 중에서 적어도 하나를 포함할 수 있다. 예를 들어, 정보 모드 필드는 전술한 info_mode 정보일 수 있고, 타일 번호 리스트 필드는 전술한 tile_id_list 정보일 수 있다.In addition, the ROI information may include at least one of an information mode field indicating a mode of information representing the ROI, and a tile number list field including a number of at least one tile corresponding to the ROI . For example, the information mode field may be the info_mode information described above, and the tile number list field may be the tile_id_list information described above.

예를 들어, 상기 타일 번호 리스트 필드는 상기 정보 모드 필드를 기초로 상기 관심 영역에 해당하는 모든 타일의 번호, 연속된 타일의 시작 번호 및 끝 번호, 및 상기 관심 영역의 좌상단 및 우하단 타일의 번호 중에서 하나의 방식으로 상기 적어도 하나의 타일의 번호를 포함할 수 있다. For example, the tile number list field may include a number of all tiles corresponding to the area of interest, a start number and an end number of consecutive tiles, and a number of the upper left and lower right tiles of the area of interest, Lt; RTI ID = 0.0 > tile < / RTI >

또한, 상기 관심 영역 정보는 상기 관심 영역을 지시하는 코딩 유닛 번호 리스트 필드를 더 포함할 수 있다. 예를 들어, 코딩 유닛 번호 리스트 필드는 전술한 cu_id_list 정보일 수 있다.In addition, the ROI information may further include a coding unit number list field indicating the ROI. For example, the coding unit number list field may be cu_id_list information described above.

예를 들어, 상기 코딩 유닛 번호 리스트 필드는 상기 정보 모드 필드를 기초로 상기 관심 영역에 해당하는 타일의 번호 및 상기 타일에 포함되는 코딩 유닛의 번호를 지시할 수 있다.For example, the coding unit number list field may indicate a number of a tile corresponding to the region of interest and a number of a coding unit included in the tile based on the information mode field.

도 (b)를 참조하면, 클라이언트 디바이스(B2800)는 영상 입력부, 오디오 입력부, 센서부, 영상 출력부, 오디오 출력부, 통신부(B2810), 및/또는 제어부(B2820) 중에서 적어도 하나를 포함할 수 있다. 예를 들어, 클라이언트 디바이스(B2800)에 대한 구체적인 내용은 전술한 클라이언트 디바이스(A2800)의 내용을 모두 포함할 수 있다.The client device B2800 may include at least one of an image input unit, an audio input unit, a sensor unit, a video output unit, an audio output unit, a communication unit B2810, and / or a control unit B2820. have. For example, the specific contents of the client device B 2806 may include all the contents of the client device A 2800 described above.

추가적으로, 제어부(B2820)는 제1 프로세서(B2821) 및/또는 제2 제어부(B2825) 중에서 적어도 하나를 포함할 수 있다.In addition, the control unit B2820 may include at least one of the first processor B2821 and / or the second control unit B2825.

제1 프로세서(B2821)는 기본 계층 비디오 데이터를 디코딩할 수 있다. 예를 들어, 제1 프로세서(B2821)는 비디오 처리 유닛(VPU, Video Processing Unit) 및/또는 DSP(Digital Signal Processor)일 수 있다.The first processor B2821 may decode the base layer video data. For example, the first processor B2821 may be a video processing unit (VPU) and / or a digital signal processor (DSP).

제2 프로세서(B2825)는 상기 제1 프로세서와 전기적으로 연결되어, 상기 기본 계층 비디오 데이터를 기초로 상기 적어도 하나의 향상 계층 비디오 데이터를 디코딩할 수 있다. 예를 들어, 제2 프로세서(B2825)는 중앙처리장치(CPU, Central Processing Unit) 및/또는 그래픽 처리 장치(GPU, Grapic Processing Unit)일 수 있다.The second processor B2825 may be electrically coupled to the first processor to decode the at least one enhancement layer video data based on the base layer video data. For example, the second processor B2825 may be a central processing unit (CPU) and / or a graphics processing unit (GPU).

도 29는 서버 디바이스의 예시적인 구성을 나타낸 도면이다.29 is a diagram showing an exemplary configuration of a server device.

클라이언트 디바이스 사이에서만 통신을 수행하는 경우, 적어도 하나의 클라이언트 디바이스(또는 HMD, 영상 수신 장치)가 서버 디바이스(또는 영상 전송 장치)의 동작을 모두 수행할 수도 있다. 이하에서는 서버 디바이스가 존재하는 경우를 중심으로 설명하지만, 본 명세서의 내용이 이에 한정되는 것은 아니다.When performing communication only between client devices, at least one client device (or HMD, image receiving apparatus) may perform all operations of the server device (or image transmitting apparatus). Hereinafter, the case where a server device exists will be mainly described, but the contents of the present specification are not limited thereto.

도 (a)를 참조하면, 서버 디바이스(A2900, 송신기, 영상 전송 장치)는 제어부(A2910) 및/또는 통신부(A2920)을 포함할 수 있다. 제어부(A2920)는 시그널링 데이터 추출부, 영상 생성부, 관심 영역 판단부, 시그널링 데이터 생성부, 및/또는 인코더 중에서 적어도 하나를 포함할 수 있다. 서버 디바이스(A2900)에 대한 구체적인 내용은 전술한 서버 디바이스의 내용을 모두 포함할 수 있다.Referring to FIG. 2A, a server device (A 2900, a transmitter, an image transmission apparatus) may include a control unit A 2910 and / or a communication unit A 2920. The controller A2920 may include at least one of a signaling data extracting unit, an image generating unit, a region of interest determining unit, a signaling data generating unit, and / or an encoder. The specific contents of the server device A 2900 may include all the contents of the server device described above.

도면을 참조하면, 서버 디바이스(A2900)의 제어부(A2910)는 기본 계층 인코더(A2911) 및/또는 향상 계층 인코더(A2915)를 포함할 수 있다.Referring to the drawings, a controller A2910 of the server device A2900 may include a base layer encoder A 2911 and / or an enhancement layer encoder A 2915.

기본 계층 인코더(A2911)는 기본 계층 비디오 데이터를 생성할 수 있다.The base layer encoder A 2911 can generate base layer video data.

향상 계층 인코더(A2915)는 상기 기본 계층 비디오 데이터를 기초로 적어도 하나의 향상 계층 비디오 데이터를 생성할 수 있다.The enhancement layer encoder A 2915 may generate at least one enhancement layer video data based on the base layer video data.

통신부(A2920)는 가상 현실 서비스를 위한 비디오 데이터를 포함하는 비트스트림을 전송할 수 있다. 통신부(A2920)는 방송망 및/또는 브로드밴드를 통하여 비트스트림을 전송할 수 있다.The communication unit A2920 can transmit a bit stream including video data for a virtual reality service. The communication unit A2920 can transmit the bit stream through the broadcasting network and / or the broadband.

또한, 상기 비디오 데이터는 기본 계층을 위한 상기 기본 계층 비디오 데이터 및 상기 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 상기 적어도 하나의 향상 계층 비디오 데이터를 포함할 수 있다.Also, the video data may include the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.

또한, 상기 적어도 하나의 향상 계층 비디오 데이터는 가상 공간 내에서 적어도 하나의 관심 영역을 위한 비디오 데이터일 수 있다.Also, the at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.

또한, 통신부(A2920)는 제1 시그널링 데이터를 더 수신할 수 있다. 예를 들어, 상기 제1 시그널링 데이터는 영상 구성 정보를 포함할 수 있다.Further, the communication section A2920 can further receive the first signaling data. For example, the first signaling data may include image configuration information.

제어부(A2910)의 관심 영역 판단부는 상기 적어도 하나의 관심 영역에 상기 시선 정보에 대응되는 시선 영역을 포함시킬 수 있다.The region of interest determination unit of the control unit A2910 may include a visual region corresponding to the visual information in the at least one region of interest.

또한, 제어부(A2910)의 시그널링 데이터 생성부는 제2 시그널링 데이터를 생성할 수 있다.Also, the signaling data generation unit of the control unit A2910 may generate the second signaling data.

또한, 상기 통신부(A2920)는 상기 제2 시그널링 데이터를 기초로 상기 기본 계층 비디오 데이터 및 상기 적어도 하나의 향상 계층 비디오 데이터를 복수의 세션들을 통해서 독립적으로 전송할 수 있다.In addition, the communication unit A2920 may independently transmit the base layer video data and the at least one enhancement layer video data through a plurality of sessions based on the second signaling data.

또한, 제2 시그널링 데이터 및/또는 관심 영역 정보는 전술한 내용을 모두 포함할 수 있다.In addition, the second signaling data and / or the region of interest information may include all of the above.

도 (b)를 참조하면, 서버 디바이스(B2900, 송신기, 영상 전송 장치)는 제어부(B2910), 및/또는 통신부(B2920) 중에서 적어도 하나를 포함할 수 있다. 제어부(B2920)는 시그널링 데이터 추출부, 영상 생성부, 관심 영역 판단부, 시그널링 데이터 생성부, 및/또는 인코더 중에서 적어도 하나를 포함할 수 있다. 서버 디바이스(B2900)에 대한 구체적인 내용은 전술한 서버 디바이스의 내용을 모두 포함할 수 있다.Referring to FIG. 5B, the server device (B2900, transmitter, image transmission apparatus) may include at least one of a control unit (B2910) and / or a communication unit (B2920). The control unit B2920 may include at least one of a signaling data extracting unit, an image generating unit, a region of interest determining unit, a signaling data generating unit, and / or an encoder. The specific contents of the server device B 2900 may include all the contents of the server device described above.

서버 디바이스(B2900)의 제어부(B2910)는 제1 프로세서(B2911) 및/또는 제2 프로세서(B2915)를 포함할 수 있다.The control unit B2910 of the server device B2900 may include a first processor B2911 and / or a second processor B2915.

제1 프로세서(B2911)는 기본 계층 비디오 데이터를 생성하는 기본 계층 인코더를 포함할 수 있다.The first processor B2911 may include a base layer encoder for generating base layer video data.

제2 프로세서(B2915)는 상기 제1 프로세서와 전기적으로 연결되어, 상기 기본 계층 비디오 데이터를 기초로 상기 적어도 하나의 향상 계층 비디오 데이터를 생성(또는 인코딩)할 수 있다.The second processor B 2915 may be electrically coupled to the first processor to generate (or encode) the at least one enhancement layer video data based on the base layer video data.

도 30은 클라이언트 디바이스의 예시적인 동작을 나타낸 도면이다.30 is a diagram illustrating an exemplary operation of a client device.

클라이언트 디바이스(또는 수신기, 영상 수신 장치)는 통신부, 및/또는 제어부를 포함할 수 있다. 제어부는 기본 계층 디코더 및/또는 향상 계층 디코더를 포함할 수 있다. 또한, 제어부는 제1 프로세서 및/또는 제2 프로세서를 포함할 수 있다.The client device (or receiver, video receiving apparatus) may include a communication unit and / or a control unit. The control unit may include a base layer decoder and / or an enhancement layer decoder. Further, the control unit may include a first processor and / or a second processor.

클라이언트 디바이스는, 통신부를 이용하여, 가상 현실 서비스를 위한 비디오 데이터를 포함하는 비트스트림을 수신할 수 있다(3010).The client device can receive the bitstream including the video data for the virtual reality service using the communication unit (3010).

예를 들어, 상기 비디오 데이터는 기본 계층을 위한 기본 계층 비디오 데이터 및 상기 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 적어도 하나의 향상 계층 비디오 데이터를 포함할 수 있다.For example, the video data may include base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.

그리고 나서, 클라이언트 디바이스는, 기본 계층 디코더 및/또는 제1 프로세서를 이용하여, 상기 기본 계층 비디오 데이터를 디코딩할 수 있다(3020).The client device may then decode the base layer video data using a base layer decoder and / or a first processor (3020).

그리고 나서, 클라이언트 디바이스는, 향상 계층 디코더 및/또는 제2 프로세서를 이용하여, 상기 기본 계층 비디오 데이터를 기초로 상기 적어도 하나의 향상 계층 비디오 데이터를 디코딩할 수 있다(3030).The client device may then decode (3030) the at least one enhancement layer video data based on the base layer video data using an enhancement layer decoder and / or a second processor.

예를 들어, 상기 적어도 하나의 향상 계층 비디오 데이터는 가상 공간 내에서 적어도 하나의 관심 영역을 위한 비디오 데이터일 수 있다.For example, the at least one enhancement layer video data may be video data for at least one region of interest within the virtual space.

클라이언트 디바이스의 동작에 관련된 내용은 전술한 클라이언트 디바이스의 내용을 모두 포함할 수 있다.The content related to the operation of the client device may include the contents of the client device described above.

도 31은 서버 디바이스의 예시적인 동작을 나타낸 도면이다.31 is a diagram showing an exemplary operation of the server device.

서버 디바이스는 제어부 및/또는 통신부를 포함할 수 있다. 제어부는 기본 계층 인코더 및/또는 향상 계층 인코더를 포함할 수 있다. 또한, 제어부는 제1 프로세서 및/또는 제2 프로세서를 포함할 수 있다.The server device may include a control unit and / or a communication unit. The control unit may include a base layer encoder and / or an enhancement layer encoder. Further, the control unit may include a first processor and / or a second processor.

서버 디바이스는, 기본 계층 인코더 및/또는 제1 프로세서를 이용하여, 기본 계층 비디오 데이터를 생성할 수 있다(3110).The server device may generate 3110 base layer video data using a base layer encoder and / or a first processor.

그리고 나서, 서버 디바이스는, 향상 계층 인코더 및/또는 제2 프로세서를 이용하여, 상기 기본 계층 비디오 데이터를 기초로 적어도 하나의 향상 계층 비디오 데이터를 생성할 수 있다(3120).The server device may then generate 3120 at least one enhancement layer video data based on the base layer video data using an enhancement layer encoder and / or a second processor.

그리고 나서, 서버 디바이스는, 통신부를 이용하여, 가상 현실 서비스를 위한 비디오 데이터를 포함하는 비트스트림을 전송할 수 있다.Then, the server device can transmit the bit stream including the video data for the virtual reality service using the communication unit.

예를 들어, 상기 비디오 데이터는 기본 계층을 위한 상기 기본 계층 비디오 데이터 및 상기 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 상기 적어도 하나의 향상 계층 비디오 데이터를 포함할 수 있다.For example, the video data may include the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer.

서버 디바이스의 동작에 관련된 내용은 전술한 서버 디바이스의 내용을 모두 포함할 수 있다.The content related to the operation of the server device may include all the contents of the server device described above.

또한, 본 명세서에 개시된 실시 예에 의하면, 전술한 방법은, 프로그램이 기록된 매체에 프로세서가 읽을 수 있는 코드로서 구현할 수 있다. 프로세서가 읽을 수 있는 매체의 예로는, ROM, RAM, CD-ROM, 자기 테이프, 플로피 디스크, 광 데이터 저장장치 등이 있으며, 다운로드 가능한 파일의 형태로 구현되는 것도 포함한다.Further, according to the embodiments disclosed herein, the above-described method can be implemented as a code that can be read by a processor on a medium on which the program is recorded. Examples of the medium that can be read by the processor include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

상기와 같이 설명된 전자 장치는 상기 설명된 실시 예들의 구성과 방법이 한정되게 적용될 수 있는 것이 아니라, 상기 실시 예들은 다양한 변형이 이루어질 수 있도록 각 실시 예들의 전부 또는 일부가 선택적으로 조합되어 구성될 수도 있다.The above-described electronic device can be applied to a configuration and a method of the embodiments described above in a limited manner, but the embodiments can be configured such that all or some of the embodiments are selectively combined so that various modifications can be made It is possible.

이상에서 본 명세서의 기술에 대한 바람직한 실시 예가 첨부된 도면들을 참조하여 설명되었다. 여기서, 본 명세서 및 청구 범위에 사용된 용어나 단어는 통상적이거나 사전적인 의미로 한정해서 해석되어서는 아니되며, 본 기술의 기술적 사상에 부합하는 의미와 개념으로 해석되어야 한다.In the foregoing, preferred embodiments of the present invention have been described with reference to the accompanying drawings. Herein, terms and words used in the present specification and claims should not be construed as limited to ordinary or dictionary terms, and should be construed as meaning and concept consistent with the technical idea of the present technology.

본 기술의 범위는 본 명세서에 개시된 실시 예들로 한정되지 아니하고, 본 기술은 본 기술명의 사상 및 특허청구범위에 기재된 범주 내에서 다양한 형태로 수정, 변경, 또는 개선될 수 있다.The scope of the present technology is not limited to the embodiments disclosed in the present specification, and the present invention may be modified, changed, or improved in various forms within the scope of the present invention.

A2821 : 기본 계층 디코더 A2825 : 향상 계층 디코더
A2810 : 통신부
A2911 : 기본 계층 인코더 A2915 : 향상 계층 인코더
A2920 : 통신부A2821: Base layer decoder A2825: Enhancement layer decoder
A2810:
A2911: Base layer encoder A2915: Enhancement layer encoder
A2920:

Claims

Receiving a bitstream including video data for a virtual reality service,
Wherein the video data comprises base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer;
Decoding the base layer video data; And
And decoding the at least one enhancement layer video data based on the base layer video data,
Wherein the at least one enhancement layer video data is video data for at least one region of interest within a virtual space.

The method according to claim 1,
And generating first signaling data,
Wherein the first signaling data includes visual information indicating a direction of a user's gaze in the virtual space.

3. The method of claim 2,
Determining whether a gaze region corresponding to the gaze information is included in the at least one region of interest; And
And transmitting the first signaling data if the viewing area is included in an area other than the at least one area of interest,
Wherein the gaze region is added to the at least one region of interest.

The method according to claim 1,
Wherein the bitstream comprises interest region information indicating the location of the at least one ROI within the entire region of the virtual space,
Wherein the at least one enhancement layer video data is decoded based on the base layer video data and the region of interest information.

5. The method of claim 4,
Wherein the ROI information includes a tile number list field including a number of at least one tile corresponding to the ROI.

6. The method of claim 5,
Wherein the tile number list field comprises a number of tiles of the at least one tile in one of a number of all tiles corresponding to the area of interest, a start number and an end number of consecutive tiles, / RTI >

5. The method of claim 4,
Wherein the ROI information is received through at least one of a Supplemental Enhancement Information (SEI) message, a Video Usability Information (VUI) message, a slice header, and a file describing the video data.

5. The method of claim 4,
Wherein the bitstream comprises second signaling data,
Wherein the receiving the bitstream comprises:
And receiving the base layer video data and the at least one enhancement layer video data independently through a plurality of sessions based on the second signaling data.

9. The method of claim 8,
Wherein the second signaling data includes a service list signal including information for acquiring the video data and information for acquiring the service layer signaling data.

10. The method of claim 9,
Wherein the service layer signaling data comprises the region of interest information.

Generating base layer video data;
Generating at least one enhancement layer video data based on the base layer video data; And
Transmitting a bitstream including video data for a virtual reality service,
Wherein the video data comprises the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer,
Wherein the at least one enhancement layer video data is video data for at least one region of interest within a virtual space.

12. The method of claim 11,
Receiving first signaling data,
Wherein the first signaling data includes line-of-sight information indicating a line-of-sight direction of the user in the virtual space,
Wherein the first signaling data is received when a line of sight corresponding to the line of sight information is included in an area other than the at least one area of interest.

13. The method of claim 12,
Wherein the gaze region is added to the at least one region of interest.

12. The method of claim 11,
Wherein the bitstream comprises interest region information indicating the location of the at least one ROI within the entire region of the virtual space,
Wherein the at least one enhancement layer video data is encoded based on the base layer video data and the region of interest information.

15. The method of claim 14,
Wherein the ROI information includes a tile number list field including a number of at least one tile corresponding to the ROI.

16. The method of claim 15,
Wherein the tile number list field comprises a number of tiles of the at least one tile in one of a number of all tiles corresponding to the area of interest, a start number and an end number of consecutive tiles, A video transmission method comprising a number.

15. The method of claim 14,
Wherein the ROI information is transmitted through at least one of a Supplemental Enhancement Information (SEI) message, a Video Usability Information (VUI) message, a slice header, and a file describing the video data.

15. The method of claim 14,
Further comprising generating second signaling data,
Wherein the step of transmitting the bitstream comprises:
And the base layer video data and the at least one enhancement layer video data are independently transmitted through a plurality of sessions based on the second signaling data.

19. The method of claim 18,
Wherein the second signaling data comprises a service list signal including service layer signaling data including information for acquisition of the video data and information for acquiring the service layer signaling data.

20. The method of claim 19,
Wherein the service layer signaling data includes the region of interest information.

A communication unit for receiving a bit stream including video data for a virtual reality service,
Wherein the video data comprises base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer;
A base layer decoder for decoding the base layer video data; And
And an enhancement layer decoder for decoding the at least one enhancement layer video data based on the base layer video data,
Wherein the at least one enhancement layer video data is video data for at least one region of interest within a virtual space.

A communication unit for receiving base layer video data for a base layer and at least one enhancement layer video data for at least one enhancement layer predicted from the base layer;
A first processor for decoding the base layer video data; And
And a second processor, electrically coupled to the first processor, for decoding the at least one enhancement layer video data based on the base layer video data,
Wherein the at least one enhancement layer video data is video data for at least one region of interest within a virtual space.

A base layer encoder for generating base layer video data;
An enhancement layer encoder for generating at least one enhancement layer video data based on the base layer video data; And
And a communication unit for transmitting a bitstream including video data for a virtual reality service,
Wherein the video data comprises the base layer video data for a base layer and the at least one enhancement layer video data for at least one enhancement layer predicted from the base layer,
Wherein the at least one enhancement layer video data is video data for at least one region of interest within a virtual space.