KR20200076529A

KR20200076529A - Indexing of tiles for region of interest in virtual reality video streaming

Info

Publication number: KR20200076529A
Application number: KR1020180165680A
Authority: KR
Inventors: 류은석; 장동민
Original assignee: 가천대학교 산학협력단
Priority date: 2018-12-19
Filing date: 2018-12-19
Publication date: 2020-06-29
Also published as: KR102183895B1

Abstract

An image transmission method of an image transmission device disclosed in the present specification is performed in an image transmission device including a processor. The method includes the steps of: receiving first signaling data including sphere coordinates for a user's region of interest within a 360-degree virtual reality space; mapping the region of interest to a 2D plane based on the sphere coordinates and preset projection information; calculating indexing information of the region-of-interest tile corresponding to the mapped region of interest; and transmitting video data and second signaling data for the 360-degree virtual reality space based on the indexing information.

Description

INDEXING OF TILES FOR REGION OF INTEREST IN VIRTUAL REALITY VIDEO STREAMING

This specification relates to streaming virtual reality video.

With the recent development of virtual reality technology and equipment, wearable devices such as the head-mounted display (HMD) have been introduced. Typical service scenarios that combine virtual reality technology with an HMD include movie viewing and gaming, as well as video conferencing and telesurgery.

An HMD is worn on the user's head, plays video directly in front of the eyes, and must render a spherical 360-degree screen. For the user to experience virtual reality without awkwardness, service scenarios built on virtual reality technology and HMDs therefore require ultra-high-definition video of UHD (Ultra High-Definition) class or higher.

Playing ultra-high-definition 360-degree video requires high computing power, and transmitting it requires high bandwidth, but currently available HMDs and virtual reality technology are limited in their ability to transmit and play such video. Accordingly, there is a growing need for video processing technology that transmits ultra-high-definition 360-degree video efficiently and increases playback speed.

Representative efficient transmission approaches are user gaze-based methods and non-gaze-based methods. A non-gaze-based method, instead of transmitting the data for the entire 360-degree video to the HMD, lowers the transmission bandwidth by adjusting the QP (quantization parameter) value of each major region of the video to be transmitted or by downsampling it. A gaze-based method takes into account that the user looks at only part of the 360-degree video and transmits only the tiles corresponding to the viewed region in high quality. Because video data is transmitted selectively, the gaze-based method saves considerable transmission bandwidth and keeps the computational complexity of the decoder and encoder low. Regions the user is not looking at are transmitted in low quality, which reduces latency.

A gaze-based method requires an accurate determination of the region the user is looking at. Two problems stand in the way: the direction the HMD faces can differ from the direction of the user's gaze, and the tile indices of the user's region of interest differ depending on the projection type. Accordingly, there is a growing need for an efficient transmission technique that takes both user gaze and projection into account.

This specification presents an image transmission method of an image transmission device. The image transmission method is performed in an image transmission device including a processor and may include: receiving first signaling data including sphere coordinates of a user's region of interest in a 360-degree virtual reality space and region-of-interest type information based on the type of the region of interest; mapping the region of interest to a two-dimensional plane based on the sphere coordinates and preset projection information; calculating indexing information of a region-of-interest tile corresponding to the mapped region of interest; and transmitting video data and second signaling data for the 360-degree virtual reality space based on the indexing information.

The above method and other embodiments may include the following features.

The second signaling data may include the projection information and information on the region of interest as mapped according to the projection information.

In addition, the region-of-interest type information may indicate either a viewport based on the head-mounted display or a viewport based on the user's gaze point.

In addition, the method may further include preparing base-quality video data for at least part of the 360-degree virtual reality space and preparing high-quality video data related to the region-of-interest tile, and the video data for the 360-degree virtual reality space may include the base-quality video data and the high-quality video data.

In addition, the base-quality video data may include base layer video data for the 360-degree virtual reality space, and the high-quality video data may include base layer video data and enhancement layer video data for the region-of-interest tile.
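As a non-limiting illustration, the split between base-quality and high-quality video data can be sketched as a per-tile layer plan: every tile carries the base layer, and only region-of-interest tiles additionally carry the enhancement layer. The function name `plan_tile_layers`, the layer labels, and the dictionary layout are hypothetical and are not defined in this specification.

```python
def plan_tile_layers(num_tiles, roi_tiles):
    """Map each tile index to the list of layers to send for that tile.

    Hypothetical helper: every tile gets the base layer, and tiles listed
    in roi_tiles additionally get the enhancement layer.
    """
    roi = set(roi_tiles)
    plan = {}
    for tile in range(num_tiles):
        layers = ["base"]
        if tile in roi:
            layers.append("enhancement")
        plan[tile] = layers
    return plan
```

For a frame of four tiles with tiles 1 and 2 in the region of interest, tiles 0 and 3 would be planned with the base layer only.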

In addition, the sphere coordinates may be expressed as the yaw and pitch coordinates of a second center point, together with horizontal length information and vertical length information measured relative to that second center point. The second center point is the intersection of the line segments that connect the first center points (the midpoints) of each pair of opposite sides among the four sides of the region of interest.
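As a non-limiting illustration, the following sketch maps a region of interest given by a center yaw, a center pitch, a horizontal extent, and a vertical extent onto an equirectangular (ERP) frame and lists the tiles it overlaps. It rests on several assumptions that are not taken from this specification: the frame is split into a uniform grid of `tile_cols` by `tile_rows` tiles indexed in raster-scan order, the region is approximated as a yaw/pitch rectangle (which ignores the viewport distortion that ERP introduces near the poles), and all names are hypothetical. Other projections such as CMP would need a different mapping.

```python
import math


def roi_tile_indices(center_yaw, center_pitch, hor_range, ver_range,
                     tile_cols, tile_rows):
    """Raster-scan indices of the ERP tiles overlapped by a region of interest.

    Angles are in degrees. center_yaw lies in [-180, 180) and center_pitch in
    [-90, 90]; hor_range and ver_range are the full width and height of the
    region measured around its center point.
    """
    tile_w = 360.0 / tile_cols   # yaw span of one tile column
    tile_h = 180.0 / tile_rows   # pitch span of one tile row

    # Angular bounds of the region; pitch is clamped at the poles.
    left = center_yaw - hor_range / 2.0
    right = center_yaw + hor_range / 2.0
    top = min(90.0, center_pitch + ver_range / 2.0)
    bottom = max(-90.0, center_pitch - ver_range / 2.0)

    # ERP rows: pitch +90 maps to the top edge of the frame.
    row_first = min(tile_rows - 1, int((90.0 - top) // tile_h))
    row_last = min(tile_rows - 1, math.ceil((90.0 - bottom) / tile_h) - 1)
    row_last = max(row_first, row_last)

    # ERP columns: yaw -180 maps to the left edge; wrap around the seam.
    col_first = math.floor((left + 180.0) / tile_w)
    col_last = max(col_first, math.ceil((right + 180.0) / tile_w) - 1)
    cols = {c % tile_cols for c in range(col_first, col_last + 1)}

    return sorted(r * tile_cols + c
                  for r in range(row_first, row_last + 1) for c in cols)
```

With an 8 x 4 tile grid, `roi_tile_indices(0, 0, 90, 90, 8, 4)` returns `[11, 12, 19, 20]`, and a region centered at yaw 170 that crosses the left/right seam, `roi_tile_indices(170, 0, 40, 40, 8, 4)`, returns `[8, 15, 16, 23]`.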

In addition, the first signaling data and the second signaling data may be generated based on image configuration information, and the image configuration information may include gaze information indicating the viewports in the 360-degree virtual reality space and zoom area information indicating the user's viewing angle.

In addition, the first signaling data and the second signaling data may be transmitted through at least one of a high-level syntax protocol that carries session information, SEI (Supplemental Enhancement Information), VUI (Video Usability Information), a slice header, and a file describing the video data.

In addition, the projection scheme may be any one of Equi-Rectangular Projection (ERP), Cube Map Projection (CMP), Adjusted Equal-Area Projection (AEP), Octahedron Projection (OHP), Icosahedron Projection (ISP), Truncated Square Pyramid (TSP), Segmented Sphere Projection (SSP), Adjusted Cube Map Projection (ACP), Rotated Sphere Projection (RSP), Equatorial Cylindrical Projection (ECP), and Equi-Angular Cube Map (EAC).

Meanwhile, this specification presents an image transmission device. The image transmission device may include: a communication unit that receives first signaling data including sphere coordinates of a user's region of interest in a 360-degree virtual reality space and region-of-interest type information based on the type of the region of interest; a control unit that maps the region of interest to a two-dimensional plane based on the sphere coordinates and preset projection information; a signaling data generation unit that calculates indexing information of a region-of-interest tile corresponding to the mapped region of interest; and a transmission unit that transmits video data and second signaling data for the 360-degree virtual reality space based on the indexing information.

The above device and other embodiments may include the following features.

The device may further include an encoder that generates base-quality video data for at least part of the 360-degree virtual reality space and high-quality video data for the region-of-interest tile.

In addition, the base-quality video data may include base layer video data for at least part of the 360-degree virtual reality space, and the high-quality video data may include base layer video data and enhancement layer video data for the region-of-interest tile.

Meanwhile, this specification also presents an image playback method of an image playback device. The image playback method is performed in an image playback device including a processor and may include: transmitting first signaling data including sphere coordinates of a user's region of interest in a 360-degree virtual reality space and region-of-interest type information based on the type of the region of interest; receiving second signaling data and video data for the 360-degree virtual reality space; and playing video based on the video data for the 360-degree virtual reality space and the second signaling data. The second signaling data may include indexing information of a region-of-interest tile corresponding to the region of interest as mapped to a two-dimensional plane based on the sphere coordinates, the region-of-interest type information, and preset projection information. The video data for the 360-degree virtual reality space may include base-quality video data for at least part of the 360-degree virtual reality space and high-quality video data for the region-of-interest tile.

The operation of playing video based on the video data for the 360-degree virtual reality space and the second signaling data may render and display, based on that video data and the second signaling data, the image for the user viewport that contains the region of interest.

Further, this specification presents a computer program for image transmission stored on a medium. The computer program may cause a computing device to execute: receiving first signaling data including sphere coordinates of a user's region of interest in a 360-degree virtual reality space and region-of-interest type information based on the type of the region of interest; mapping the region of interest to a two-dimensional plane based on the sphere coordinates and preset projection information; calculating indexing information of a region-of-interest tile corresponding to the mapped region of interest; and transmitting video data and second signaling data for the 360-degree virtual reality space based on the indexing information.

According to the embodiments disclosed herein, the user's region of interest in the 360-degree virtual reality space is classified as either the viewport of the head-mounted display or a viewport based on the user's gaze point, and the region of interest is mapped according to the projection type so that accurate region-of-interest information is conveyed. High-quality video can therefore be transmitted only for the user's region of interest, which helps secure transmission bandwidth.

In addition, according to the embodiments disclosed herein, the video data for the 360-degree virtual reality space includes high-quality video data only for the user's region of interest and base-quality video data for the remaining regions, which reduces playback delay in the image playback device.

In addition, according to the embodiments disclosed herein, information about the user's region of interest, information about the projection type, mapping information of the user's region of interest according to the projection type, and the like are conveyed through signaling data, which helps secure transmission bandwidth and improve service speed in a virtual reality service system.

In addition, according to the embodiments disclosed herein, information on the region of interest toward which the user's gaze is actually directed, together with the position the head-mounted display is facing, is expressed as metadata and transmitted with the projection in the 360-degree virtual reality space taken into account. This conveys accurate region-of-interest information and, through adaptive video transmission according to the network environment, achieves bandwidth-optimized video transmission.

FIG. 1 shows an exemplary virtual reality system that provides virtual reality video.
FIG. 2 is a diagram illustrating an exemplary scalable video coding service.
FIG. 3 is a block diagram showing an exemplary configuration of a server device.
FIG. 4 is a block diagram showing an exemplary configuration of an encoder of the server device.
FIG. 5 is a diagram illustrating an exemplary method of signaling a region of interest.
FIG. 6 is a block diagram showing an exemplary configuration of a client device.
FIG. 7 is a block diagram showing an exemplary configuration of a control unit of the client device.
FIG. 8 is a block diagram showing an exemplary configuration of a decoder of the client device.
FIG. 9 is a diagram illustrating an exemplary tile indexing method for a user's region of interest according to projection information.
FIG. 10 is a block diagram of an exemplary image transmission device for a video streaming service.
FIG. 11 is a flowchart of an exemplary image transmission method for a video streaming service.
FIG. 12 is a flowchart of an exemplary image playback method for a video streaming service.
FIG. 13 is a diagram illustrating exemplary OMAF syntax in the international video standard for signaling the tile indexing of a user's region of interest according to projection.
FIG. 14 is a diagram illustrating an example of region-of-interest tile indexing information syntax expressed in XML.

The technology disclosed herein can be applied to a 360-degree virtual reality system that provides video streaming between an image transmission device and an image playback device. However, the disclosed technology is not limited to this and may be applied to any electronic device and method to which its technical idea is applicable.

Note that the technical terms used in this specification serve only to describe specific embodiments and are not intended to limit the spirit of the disclosed technology. Unless specifically defined otherwise herein, technical terms should be interpreted as they are generally understood by a person of ordinary skill in the art to which the disclosed technology belongs, and should be interpreted neither in an excessively broad sense nor in an excessively narrow sense. If a technical term used herein is an incorrect term that fails to express the spirit of the disclosed technology accurately, it should be replaced with, and understood as, a technical term that a person of ordinary skill in the art can correctly understand. General terms used herein should be interpreted as defined in a dictionary or according to the surrounding context, and should not be interpreted in an excessively narrow sense.

Terms including ordinal numbers, such as first and second, may be used herein to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component.

Hereinafter, the embodiments disclosed herein are described in detail with reference to the accompanying drawings. Identical or similar components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions of them are omitted.

In addition, in describing the disclosed technology, a detailed description of related known technology is omitted when it is judged that it could obscure the gist of the disclosed technology. Note also that the accompanying drawings are provided only to make the spirit of the disclosed technology easy to understand and should not be construed as limiting that spirit.

FIG. 1 shows an exemplary virtual reality system that provides virtual reality video.

The virtual reality system may be configured to include a virtual reality image generating apparatus that generates virtual reality video, a server device that encodes and transmits the input virtual reality video, and one or more client devices that decode the transmitted virtual reality video and output it to a user.

Referring to FIG. 1, the exemplary virtual reality system 100 includes a virtual reality image generating apparatus 110, a server device 120, and one or more client devices 130. The number of each component shown in FIG. 1 is only an example and is not limiting. The virtual reality system 100 may also be referred to as a 360-degree video providing system.

The virtual reality image generating apparatus 110 may include one or more camera modules and may generate a spatial image by capturing the space in which the apparatus itself is located.

The server device 120 may generate a 360-degree video by stitching, projecting, and mapping the images of the virtual reality space that are generated by and input from the virtual reality image generating apparatus 110, and may then adjust the generated 360-degree video to video data of a desired quality and encode it.

In addition, the server device 120 may transmit a bitstream, which includes the video data for the encoded 360-degree video and signaling data for that video data, to the client device 130 over a network.

The client device 130 may decode the received bitstream and output the 360-degree video to the user wearing the client device 130. The client device 130 may be a near-eye display device such as a head-mounted display (HMD).

Meanwhile, the virtual reality image generating apparatus 110 may be implemented as a computer system that generates images of a virtual 360-degree space rendered with computer graphics. The virtual reality image generating apparatus 110 may also be a provider of virtual reality content such as virtual reality games.

The client device 130 may obtain user data from the user of that client device 130. The user data may include the user's image data, audio data, viewport data (gaze data), region-of-interest data, and additional data.

For example, the client device 130 may include at least one of a 2D/3D camera and an immersive camera for acquiring the user's image data. The 2D/3D camera can capture images with a viewing angle of up to 180 degrees. The immersive camera can capture images with a viewing angle of up to 360 degrees.

For example, the client device 130 may include at least one of a first client device 131 that obtains user data of a first user located at a first place, a second client device 133 that obtains user data of a second user located at a second place, and a third client device 135 that obtains user data of a third user located at a third place.

Each client device 130 may transmit the user data obtained from its user to the server device 120 over the network.

The server device 120 may receive at least one item of user data from the client devices 130. Based on the received user data, the server device 120 may generate an entire image of the virtual reality space. The entire image generated by the server device 120 may be an immersive image that provides video in all 360 degrees of direction within the virtual reality space. The server device 120 may generate the entire image by mapping the image data included in the user data onto the virtual reality space.

The server device 120 may transmit the generated entire image to each user.

Each client device 130 may receive the entire image and render and/or display, in the virtual reality space, only the region that its user is looking at.

FIG. 2 is a diagram illustrating an exemplary scalable video coding service.

A scalable video coding service is a video compression method for providing a variety of services in a scalable (layered) manner, in temporal, spatial, and quality terms, according to diverse user environments such as network conditions or terminal resolution in various multimedia environments. A scalable video coding service generally provides scalability in terms of spatial resolution, quality, and time.

Spatial scalability provides service by encoding the same video at a different resolution in each layer. Using spatial scalability, video content can be delivered adaptively to devices with various resolutions, such as digital TVs, laptops, and smartphones.

Referring to the drawing, a scalable video coding service can simultaneously support one or more TVs with different characteristics from a video service provider (VSP) through a home gateway in the household. For example, the scalable video coding service can simultaneously support HDTV (High-Definition TV), SDTV (Standard-Definition TV), and LDTV (Low-Definition TV), which have different resolutions.

Temporal scalability adaptively adjusts the frame rate of the video in consideration of the network environment over which the content is transmitted or the performance of the terminal. For example, over a local area network the service is provided at a high frame rate of 60 FPS (frames per second), whereas over a wireless broadband network that is slower than the local area network, such as a 3G mobile network, content is provided at a low frame rate of 16 FPS so that the user receives the video without interruption. Over a high-speed wireless broadband network such as a 5G mobile network, however, the service can be provided at the high frame rate of 60 FPS.
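As a non-limiting illustration, the frame-rate adaptation described above can be sketched as a simple lookup from network type to target frame rate. The frame rates are those quoted in the text; the network labels, the fallback value, and the function name are hypothetical.

```python
# Frame rates quoted in the text; the network labels are hypothetical keys.
FRAME_RATE_BY_NETWORK = {"lan": 60, "3g": 16, "5g": 60}


def select_frame_rate(network, default=16):
    """Return the target frame rate (FPS) for a network label.

    Unknown labels fall back to the low frame rate, an assumption made
    here so that playback stays uninterrupted on unclassified links.
    """
    return FRAME_RATE_BY_NETWORK.get(network.lower(), default)
```

A real system would more likely measure throughput continuously than classify the link by name; the table form only mirrors the example in the text.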

품질 계층성(Quality scalability) 또한 네트워크 환경이나 단말기의 성능에 따라 다양한 화질의 콘텐츠를 서비스함으로써, 사용자가 영상 콘텐츠를 안정적으로 재생할 수 있도록 한다.Quality scalability also provides contents of various image quality according to network environment or terminal performance, so that users can stably play video contents.

스케일러블 비디오 코딩 서비스는 각각 기본 계층(Base layer)과 하나 이상의 향상 계층(Enhancement layer(s))을 포함할 수 있다. 수신기는 기본 계층만 받았을 때는 일반 화질의 영상을 제공하고, 기본 계층 및 향상 계층을 함께 받으면 고화질을 제공할 수 있다. 즉, 기본 계층과 하나 이상의 향상 계층이 있을 때, 기본 계층을 받은 상태에서 향상 계층(예: enhancement layer 1, enhancement layer 2, ... , enhancement layer n)을 더 받으면 받을수록 화질이나 제공하는 영상의 품질이 좋아진다.Each scalable video coding service may include a base layer and one or more enhancement layers. When the receiver receives only the base layer, it can provide an image of normal quality, and when it receives the base layer together with an enhancement layer, it can provide high quality. That is, when there is a base layer and one or more enhancement layers, the more enhancement layers (e.g., enhancement layer 1, enhancement layer 2, ..., enhancement layer n) are received on top of the base layer, the better the quality of the provided image becomes.

이와 같이, 스케일러블 비디오 코딩 서비스의 영상은 복수 개의 계층으로 구성되어 있으므로, 수신기는 적은 용량의 기본 계층 데이터를 빨리 전송 받아 일반적 화질의 영상을 빨리 처리하여 재생하고, 필요 시 향상 계층 영상 데이터까지 추가로 받아서 서비스의 품질을 높일 수 있다.As described above, since the video of the scalable video coding service is composed of a plurality of layers, the receiver can quickly receive the small amount of base layer data, quickly process and reproduce an image of normal quality, and, if necessary, additionally receive the enhancement layer image data to raise the quality of the service.

도 3은 서버 디바이스의 예시적인 구성을 나타낸 도면이다.FIG. 3 is a diagram showing an exemplary configuration of a server device.

서버 디바이스(300)는 제어부(310) 및/또는 통신부(320)를 포함할 수 있다.The server device 300 may include a control unit 310 and/or a communication unit 320.

제어부(310)는 가상 현실 공간에 대한 전체 영상을 생성하고, 생성된 전체 영상을 인코딩할 수 있다. 또한, 제어부(310)는 서버 디바이스(300)의 모든 동작을 제어할 수 있다. 구체적인 내용은 이하에서 설명한다.The control unit 310 may generate the entire image for the virtual reality space and encode the generated whole image. Also, the control unit 310 may control all operations of the server device 300. Details will be described below.

통신부(320)는 외부 장치 및/또는 클라이언트 디바이스로 데이터를 전송 및/또는 수신할 수 있다. 예를 들어, 통신부(320)는 적어도 하나의 클라이언트 디바이스로부터 사용자 데이터 및/또는 시그널링 데이터를 수신할 수 있다. 또한, 통신부(320)는 가상 현실 공간에 대한 전체 영상 및/또는 일부의 영역에 대한 영상을 클라이언트 디바이스로 전송할 수 있다.The communication unit 320 may transmit and/or receive data to an external device and/or a client device. For example, the communication unit 320 may receive user data and/or signaling data from at least one client device. Also, the communication unit 320 may transmit an entire image of the virtual reality space and/or an image of a partial region to the client device.

제어부(310)는 시그널링 데이터 추출부(311), 영상 생성부(313), 관심영역 판단부(315), 시그널링 데이터 생성부(317), 및/또는 인코더(319) 중에서 적어도 하나를 포함할 수 있다.The control unit 310 may include at least one of a signaling data extraction unit 311, an image generation unit 313, a region of interest determination unit 315, a signaling data generation unit 317, and/or an encoder 319.

시그널링 데이터 추출부(311)는 클라이언트 디바이스로부터 전송 받은 데이터로부터 시그널링 데이터를 추출할 수 있다. 예를 들어, 시그널링 데이터는 영상 구성 정보를 포함할 수 있다. 상기 영상 구성 정보는 가상 현실 공간 내에서 사용자의 시선 방향을 지시하는 시선 정보 및 사용자의 시야각을 지시하는 줌 영역 정보를 포함할 수 있다. 또한, 상기 영상 구성 정보는 가상 현실 공간 내에서 사용자의 뷰포트 정보를 포함할 수 있다.The signaling data extraction unit 311 may extract signaling data from data transmitted from the client device. For example, the signaling data may include image configuration information. The image configuration information may include gaze information indicating a user's gaze direction in a virtual reality space and zoom area information indicating a user's viewing angle. In addition, the image configuration information may include user's viewport information in a virtual reality space.

영상 생성부(313)는 가상 현실 공간에 대한 전체 영상 및 가상 현실 공간 내의 특정 영역에 대한 영상을 생성할 수 있다.The image generating unit 313 may generate an entire image for the virtual reality space and an image for a specific area in the virtual reality space.

관심영역 판단부(315)는 가상 현실 공간의 전체 영역 내에서 사용자의 시선 방향에 대응되는 관심영역을 판단할 수 있다. 또한, 가상 현실 공간의 전체 영역 내에서 사용자의 뷰포트를 판단할 수 있다. 예를 들어, 관심영역 판단부(315)는 시선 정보 및/또는 줌 영역 정보를 기초로 관심영역을 판단할 수 있다. 예를 들어, 관심영역은 사용자가 보게 될 가상의 공간에서 중요 오브젝트가 위치할 타일의 위치(예를 들어, 게임 등에서 새로운 적이 등장하는 위치, 가상 현실 공간에서의 화자의 위치), 및/또는 사용자의 시선이 바라보는 곳일 수 있다. 또한, 관심영역 판단부(315)는 가상 현실 공간의 전체 영역 내에서 사용자의 시선 방향에 대응되는 관심영역을 지시하는 관심영역 정보와 사용자의 뷰포트에 대한 정보를 생성할 수 있다.The region of interest determination unit 315 may determine the region of interest corresponding to the user's gaze direction within the entire region of the virtual reality space. It may also determine the user's viewport within the entire region of the virtual reality space. For example, the region of interest determination unit 315 may determine the region of interest based on gaze information and/or zoom area information. For example, the region of interest may be the location of a tile in which an important object will be placed in the virtual space that the user will see (e.g., a location where a new enemy appears in a game, or the speaker's location in the virtual reality space), and/or the place the user's gaze is directed at. In addition, the region of interest determination unit 315 may generate region of interest information indicating the region of interest corresponding to the user's gaze direction within the entire region of the virtual reality space, and information about the user's viewport.
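
As a rough illustration of determining a region of interest from gaze information and zoom area information, the sketch below treats the gaze as a yaw/pitch direction in degrees and the zoom area as a horizontal and vertical viewing angle. These representations, and the names `RegionOfInterest`, `wrap_yaw`, and `determine_roi`, are assumptions made for this example; the specification does not fix them here.

```python
from dataclasses import dataclass


@dataclass
class RegionOfInterest:
    yaw_min: float    # degrees, wrapped into [-180, 180)
    yaw_max: float    # degrees, wrapped into [-180, 180)
    pitch_min: float  # degrees, clamped to [-90, 90]
    pitch_max: float  # degrees, clamped to [-90, 90]


def wrap_yaw(deg: float) -> float:
    """Wrap a yaw angle into the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def determine_roi(gaze_yaw: float, gaze_pitch: float,
                  fov_h: float, fov_v: float) -> RegionOfInterest:
    """Centre an angular window of size fov_h x fov_v on the gaze direction."""
    return RegionOfInterest(
        yaw_min=wrap_yaw(gaze_yaw - fov_h / 2.0),
        yaw_max=wrap_yaw(gaze_yaw + fov_h / 2.0),
        pitch_min=max(-90.0, gaze_pitch - fov_v / 2.0),
        pitch_max=min(90.0, gaze_pitch + fov_v / 2.0),
    )
```

In this sketch a result with `yaw_min` greater than `yaw_max` means the region crosses the ±180 degree seam of the sphere, which a later tile-mapping step would have to handle as two spans.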

시그널링 데이터 생성부(317)는 전체 영상을 처리하기 위한 시그널링 데이터를 생성할 수 있다. 예를 들어, 시그널링 데이터는 관심영역 정보 및/또는 뷰포트 정보를 전송할 수 있다. 시그널링 데이터는 SEI (Supplement Enhancement Information), VUI (video usability information), 슬라이스 헤더(Slice Header), 및 비디오 데이터를 서술하는 파일 중에서 적어도 하나를 통하여 전송될 수 있다.The signaling data generation unit 317 may generate signaling data for processing the entire image. For example, the signaling data may carry region of interest information and/or viewport information. The signaling data may be transmitted through at least one of SEI (Supplemental Enhancement Information), VUI (Video Usability Information), a slice header, and a file describing the video data.

인코더(319)는 시그널링 데이터를 기초로 전체 영상을 인코딩할 수 있다. 예를 들어, 인코더(319)는 각 사용자의 시선 방향을 기초로 각 사용자에게 커스터마이즈된 방식으로 전체 영상을 인코딩할 수 있다. 예를 들어, 가상 현실 공간 내에서 사용자가 특정 지점을 바라보는 경우, 인코더는 가상 현실 공간 내의 사용자 시선을 기초로 특정 지점에 해당하는 영상은 고화질로 인코딩하고, 상기 특정 지점 이외에 해당하는 영상은 저화질로 인코딩할 수 있다. 실시예에 따라서, 인코더(319)는 시그널링 데이터 추출부(311), 영상 생성부(313), 관심영역 판단부(315), 및/또는 시그널링 데이터 생성부(317) 중에서 적어도 하나를 포함할 수 있다.The encoder 319 may encode the entire image based on the signaling data. For example, the encoder 319 may encode the entire image in a manner customized to each user based on each user's gaze direction. For example, when a user looks at a specific point in the virtual reality space, the encoder may, based on the user's gaze in the virtual reality space, encode the image corresponding to the specific point in high quality and encode the image corresponding to everything other than the specific point in low quality. According to an embodiment, the encoder 319 may include at least one of the signaling data extraction unit 311, the image generation unit 313, the region of interest determination unit 315, and/or the signaling data generation unit 317.

또한, 제어부(310)는 시그널링 데이터 추출부(311), 영상 생성부(313), 관심영역 판단부(315), 시그널링 데이터 생성부(317), 및 인코더(319) 이 외에 프로세서(도시하지 않음), 메모리(도시하지 않음), 및 입출력 인터페이스(도시하지 않음)를 포함할 수 있다.In addition to the signaling data extraction unit 311, the image generation unit 313, the region of interest determination unit 315, the signaling data generation unit 317, and the encoder 319, the control unit 310 may include a processor (not shown), a memory (not shown), and an input/output interface (not shown).

상기 프로세서는 중앙처리장치(Central Processing Unit; CPU), 어플리케이션 프로세서(Application Processor; AP), 또는 커뮤니케이션 프로세서(Communication Processor; CP) 중 하나 또는 그 이상을 포함할 수 있다. 상기 프로세서는, 예를 들어, 상기 제어부(310)의 적어도 하나의 다른 구성요소들의 제어 및/또는 통신에 관한 연산이나 데이터 처리를 실행할 수 있다.The processor may include one or more of a central processing unit (CPU), an application processor (AP), or a communication processor (CP). The processor may execute, for example, calculation or data processing related to control and/or communication of at least one other component of the control unit 310.

또한, 상기 프로세서는, 예를 들어, SoC(system on chip)로 구현될 수 있다. 일 실시예에 따르면, 상기 프로세서는 GPU(graphic processing unit) 및/또는 이미지 신호 프로세서(image signal processor)를 더 포함할 수 있다.Also, the processor may be implemented with, for example, a System on Chip (SoC). According to an embodiment, the processor may further include a graphic processing unit (GPU) and/or an image signal processor.

또한, 상기 프로세서는, 예를 들어, 운영 체제 또는 응용 프로그램을 구동하여 상기 프로세서에 연결된 다수의 하드웨어 또는 소프트웨어 구성요소들을 제어할 수 있고, 각종 데이터 처리 및 연산을 수행할 수 있다.In addition, the processor may control, for example, a plurality of hardware or software components connected to the processor by driving an operating system or an application program, and may perform various data processing and operations.

또한, 상기 프로세서는 다른 구성요소들(예: 비휘발성 메모리) 중 적어도 하나로부터 수신된 명령 또는 데이터를 휘발성 메모리에 로드(load)하여 처리하고, 다양한 데이터를 비휘발성 메모리에 저장(store)할 수 있다.In addition, the processor may load instructions or data received from at least one of the other components (e.g., a non-volatile memory) into a volatile memory for processing, and may store various data in the non-volatile memory.

상기 메모리는 휘발성(volatile) 및/또는 비휘발성(non-volatile) 메모리를 포함할 수 있다. 상기 메모리는, 예를 들어, 상기 제어부(310)의 적어도 하나의 다른 구성요소에 관계된 명령 또는 데이터를 저장할 수 있다. 한 실시예에 따르면, 상기 메모리는 소프트웨어 및/또는 프로그램을 저장할 수 있다.The memory may include volatile and/or non-volatile memory. The memory may store, for example, commands or data related to at least one other component of the control unit 310. According to one embodiment, the memory may store software and/or programs.

상기 입출력 인터페이스는, 예를 들어, 사용자 또는 다른 외부 기기로부터 입력된 명령 또는 데이터를 상기 제어부(310)의 다른 구성요소(들)에 전달할 수 있는 인터페이스의 역할을 할 수 있다. 또한, 상기 입출력 인터페이스는 상기 제어부(310)의 다른 구성요소(들)로부터 수신된 명령 또는 데이터를 사용자 또는 다른 외부 기기로 출력할 수 있다.The input/output interface, for example, may serve as an interface that can transmit commands or data input from a user or other external device to other component(s) of the controller 310. In addition, the input/output interface may output commands or data received from other component(s) of the control unit 310 to a user or another external device.

이하에서는 관심영역을 이용한 예시적인 영상 전송 방법을 설명한다.Hereinafter, an exemplary image transmission method using a region of interest will be described.

서버 디바이스는, 통신부를 이용하여, 적어도 하나의 클라이언트 디바이스로부터 비디오 데이터 및 시그널링 데이터를 수신할 수 있다. 또한, 서버 디바이스는, 시그널링 데이터 추출부를 이용하여, 시그널링 데이터를 추출할 수 있다. 예를 들어, 시그널링 데이터는 시점 정보 및 줌 영역 정보를 포함할 수 있다.The server device may receive video data and signaling data from at least one client device using a communication unit. Further, the server device may extract signaling data using the signaling data extraction unit. For example, the signaling data may include viewpoint information and zoom area information.

시선 정보는 사용자가 가상 현실 공간 내에서 어느 영역(지점)을 바라보는지 여부를 지시할 수 있다. 가상 현실 공간 내에서 사용자가 특정 영역을 바라보면, 시선 정보는 사용자에서 상기 특정 영역으로 향하는 방향을 지시할 수 있다.The gaze information may indicate which area (point) the user views in the virtual reality space. When the user views a specific area in the virtual reality space, gaze information may indicate a direction from the user to the specific area.

줌 영역 정보는 사용자의 시선 방향에 해당하는 비디오 데이터의 확대 범위 및/또는 축소 범위를 지시할 수 있다. 또한, 줌 영역 정보는 사용자의 시야각을 지시할 수 있다. 줌 영역 정보의 값을 기초로 비디오 데이터가 확대되면, 사용자는 특정 영역만을 볼 수 있다. 줌 영역 정보의 값을 기초로 비디오 데이터가 축소되면, 사용자는 특정 영역뿐만 아니라 상기 특정 영역 이외의 영역 일부 및/또는 전체를 볼 수 있다.The zoom area information may indicate an enlargement range and/or a reduction range of video data corresponding to a user's gaze direction. Also, the zoom area information may indicate a user's viewing angle. When the video data is enlarged based on the value of the zoom area information, the user can see only a specific area. When the video data is reduced based on the value of the zoom area information, the user can view not only a specific area, but also part and/or all of the areas other than the specific area.

서버 디바이스는, 영상 생성부를 이용하여, 가상 현실 공간에 대한 전체 영상을 생성할 수 있다.The server device may generate an entire image of the virtual reality space using the image generation unit.

서버 디바이스는, 관심영역 판단부를 이용하여, 시그널링 데이터를 기초로 가상 현실 공간 내에서 각 사용자가 바라보는 시점 및 줌(zoom) 영역에 대한 영상 구성 정보를 파악할 수 있다. 상기 관심영역 판단부는 영상 구성 정보를 기초로 사용자의 관심영역을 결정할 수 있다.The server device may use the region of interest determination unit to grasp image configuration information for a viewpoint and a zoom region viewed by each user in the virtual reality space based on the signaling data. The region of interest determining unit may determine the region of interest of the user based on the image configuration information.

시그널링 데이터(예를 들어, 시점 정보 및 줌 영역 정보 중에서 적어도 하나)가 변경될 경우, 서버 디바이스는 새로운 시그널링 데이터를 수신할 수 있다. 이 경우, 서버 디바이스는 새로운 시그널링 데이터를 기초로 새로운 관심영역을 결정할 수 있다.When signaling data (eg, at least one of viewpoint information and zoom area information) is changed, the server device may receive new signaling data. In this case, the server device may determine a new region of interest based on the new signaling data.

서버 디바이스의 제어부는, 시그널링 데이터를 기초로 현재 처리하는 데이터가 관심영역에 해당하는 데이터인지의 여부를 판단할 수 있다.The control unit of the server device may determine whether the data currently being processed is data corresponding to the ROI based on the signaling data.

시그널링 데이터가 변경되는 경우, 서버 디바이스는 새로운 시그널링 데이터를 기초로 현재 처리하는 데이터가 관심영역에 해당하는 데이터인지 아닌지 여부를 판단할 수 있다. 상기 현재 처리하는 데이터가 관심영역에 해당하는 데이터일 경우, 서버 디바이스는, 인코더를 이용하여, 사용자의 시점에 해당하는 비디오 데이터(예를 들어, 관심영역)는 고화질로 인코딩할 수 있다. 예를 들어, 서버 디바이스는 상기 사용자의 시점에 해당하는 비디오 데이터에 대하여 기본 계층 비디오 데이터 및 향상 계층 비디오 데이터를 생성하여 이들을 전송할 수 있다.When the signaling data is changed, the server device may determine, based on the new signaling data, whether the data currently being processed corresponds to the region of interest. When the data currently being processed corresponds to the region of interest, the server device may use the encoder to encode the video data corresponding to the user's viewpoint (e.g., the region of interest) in high quality. For example, the server device may generate base layer video data and enhancement layer video data for the video data corresponding to the user's viewpoint and transmit them.

또한, 시그널링 데이터가 변경되는 경우, 서버 디바이스는 새로운 시점에 해당하는 비디오 데이터(새로운 관심영역)는 고화질의 영상으로 전송할 수 있다. 기존에 서버 디바이스가 기본 화질의 영상을 전송하고 있었으나 시그널링 데이터가 변경되어 서버 디바이스가 고화질의 영상을 전송해야 하는 경우, 서버 디바이스는 향상 계층 비디오 데이터를 추가로 생성 및/또는 전송할 수 있다.In addition, when signaling data is changed, the server device may transmit video data (a new region of interest) corresponding to a new viewpoint as a high-quality image. In the past, when the server device was transmitting the image of the basic quality, but the signaling data is changed and the server device needs to transmit the image of the high quality, the server device may additionally generate and/or transmit enhancement layer video data.

새로운 시그널링 데이터를 기초로 현재 처리하는 데이터가 관심영역에 해당하지 않는 데이터일 경우, 서버 디바이스는 사용자의 시점에 해당하지 않는 비디오 데이터(예를 들어, 비-관심영역)은 기본 화질로 인코딩할 수 있다. 예를 들어, 서버 디바이스는 사용자의 시점에 해당하지 않는 비디오 데이터에 대하여 기본 계층 비디오 데이터만 생성하고, 이들을 전송할 수 있다.When, based on the new signaling data, the data currently being processed does not correspond to the region of interest, the server device may encode the video data that does not correspond to the user's viewpoint (e.g., the non-region of interest) at basic quality. For example, the server device may generate and transmit only base layer video data for the video data that does not correspond to the user's viewpoint.

시그널링 데이터가 변경되는 경우, 서버 디바이스는 새로운 사용자의 시점에 해당하지 않는 비디오 데이터(새로운 비-관심영역)는 기본 화질의 영상으로 전송할 수 있다. 기존에 서버 디바이스가 고화질의 영상을 전송하고 있었으나 시그널링 데이터가 변경되어 서버 디바이스가 기본 화질의 영상을 전송해야 하는 경우, 서버 디바이스는 더 이상 적어도 하나의 향상 계층 비디오 데이터를 생성 및/또는 전송하지 않고, 기본 계층 비디오 데이터만을 생성 및/또는 전송할 수 있다.When the signaling data is changed, the server device may transmit the video data that does not correspond to the user's new viewpoint (the new non-region of interest) as an image of basic quality. If the server device was previously transmitting a high-quality image but, because the signaling data has changed, must now transmit an image of basic quality, the server device may stop generating and/or transmitting the at least one enhancement layer video data and generate and/or transmit only the base layer video data.

즉, 기본 계층 비디오 데이터를 수신했을 때의 비디오 데이터의 화질은 향상 계층 비디오 데이터까지 받았을 때의 비디오 데이터의 화질보다는 낮으므로, 클라이언트 디바이스는 사용자가 고개를 돌린 정보를 센서 등으로부터 얻는 순간에, 사용자의 시선 방향에 해당하는 비디오 데이터(예를 들어, 관심영역에 대한 비디오 데이터)에 대한 향상 계층 비디오 데이터를 수신할 수 있다. 그리고, 클라이언트 디바이스는 짧은 시간 내에 고화질의 비디오 데이터를 사용자에게 제공할 수 있다.That is, since the quality of the video data when only the base layer video data is received is lower than the quality when the enhancement layer video data is also received, the client device may receive enhancement layer video data for the video data corresponding to the user's gaze direction (e.g., the video data for the region of interest) at the moment it obtains, from a sensor or the like, information that the user has turned his or her head. The client device can then provide high-quality video data to the user within a short time.
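
The layer selection described in the preceding paragraphs can be sketched as set operations over region identifiers. The functions `plan_layers` and `on_signaling_change`, and the use of integer region identifiers, are hypothetical and serve only to illustrate the rule "region of interest gets base plus enhancement layer, everything else gets base layer only".

```python
def plan_layers(all_regions, roi_regions):
    """Map each region to the layers that would be generated and transmitted."""
    roi = set(roi_regions)
    return {
        region: ("base", "enhancement") if region in roi else ("base",)
        for region in all_regions
    }


def on_signaling_change(old_roi, new_roi):
    """Return (regions that now need enhancement data, regions that no longer do)."""
    old_roi, new_roi = set(old_roi), set(new_roi)
    start_enhancement = sorted(new_roi - old_roi)  # newly inside the ROI
    stop_enhancement = sorted(old_roi - new_roi)   # newly outside the ROI
    return start_enhancement, stop_enhancement
```

Under this sketch, regions that stay inside the region of interest across a signaling change appear in neither list, so their enhancement layer data simply continues to be sent.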

도 4는 서버 디바이스의 인코더의 예시적인 구성을 나타낸 도면이다.FIG. 4 is a diagram showing an exemplary configuration of an encoder of a server device.

인코더(400, 영상 부호화 장치)는 기본 계층 인코더(410), 적어도 하나의 향상 계층 인코더(420), 및 다중화기(430) 중에서 적어도 하나를 포함할 수 있다.The encoder 400 (image encoding apparatus) may include at least one of a base layer encoder 410, at least one enhancement layer encoder 420, and a multiplexer 430.

인코더(400)는 스케일러블 비디오 코딩 방법을 사용하여 전체 영상을 인코딩할 수 있다. 스케일러블 비디오 코딩 방법은 SVC(Scalable Video Coding) 및/또는 SHVC(Scalable High Efficiency Video Coding)를 포함할 수 있다.The encoder 400 may encode the entire image using a scalable video coding method. The scalable video coding method may include SVC (Scalable Video Coding) and/or SHVC (Scalable High Efficiency Video Coding).

스케일러블 비디오 코딩 방법은 다양한 멀티미디어 환경에서 네트워크의 상황 혹은 단말기의 해상도 등과 같은 다양한 사용자 환경에 따라서 시간적, 공간적, 및 화질 관점에서 계층적(Scalable)으로 다양한 서비스를 제공하기 위한 영상 압축 방법이다. 예를 들어, 인코더(400)는 동일한 비디오 데이터에 대하여 두 가지 이상의 다른 품질(또는 해상도, 프레임 레이트)의 영상들을 인코딩하여 비트스트림을 생성할 수 있다.The scalable video coding method is an image compression method for providing various services in a hierarchical (Scalable) view in terms of time, space, and image quality according to various user environments such as network conditions or terminal resolution in various multimedia environments. For example, the encoder 400 may generate a bitstream by encoding two or more different quality (or resolution, frame rate) images of the same video data.

예를 들어, 인코더(400)는 비디오 데이터의 압축 성능을 높이기 위해서 계층 간 중복성을 이용한 인코딩 방법인 계층간 예측 툴(Inter-layer prediction tools)을 사용할 수 있다. 계층 간 예측 툴은 계층 간에 존재하는 영상의 중복성을 제거하여 향상 계층(Enhancement Layer; EL)에서의 압축 효율을 높이는 기술이다.For example, the encoder 400 may use inter-layer prediction tools, which is an encoding method using inter-layer redundancy, to increase compression performance of video data. The inter-layer prediction tool is a technique for improving compression efficiency in an enhancement layer (EL) by removing redundancy of images existing between layers.

향상 계층은 계층 간 예측 툴을 이용하여 참조 계층(Reference Layer)의 정보를 참조하여 인코딩될 수 있다. 참조 계층이란 향상 계층 인코딩 시 참조되는 하위 계층을 말한다. 여기서, 계층 간 예측 툴을 사용함으로써 계층 사이에 의존성(Dependency)이 존재하기 때문에, 최상위 계층의 영상을 디코딩하기 위해서는 참조되는 모든 하위 계층의 비트스트림이 필요하다. 중간 계층에서는 디코딩 대상이 되는 계층과 그 하위 계층들의 비트스트림 만을 획득하여 디코딩을 수행할 수 있다. 최하위 계층의 비트스트림은 기본 계층(Base Layer; BL)으로써, H.264/AVC, HEVC 등의 인코더로 인코딩될 수 있다.The enhancement layer may be encoded by referring to information of a reference layer using an inter-layer prediction tool. The reference layer refers to a lower layer referenced when encoding an enhancement layer. Here, since a dependency exists between layers by using the inter-layer prediction tool, bitstreams of all the referenced lower layers are required to decode the image of the highest layer. In the middle layer, decoding can be performed by acquiring only the bitstream of the layer to be decoded and its lower layers. The bitstream of the lowest layer is a base layer (BL), and may be encoded by encoders such as H.264/AVC and HEVC.
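
The decoding dependency between layers can be illustrated with a small sketch. It assumes, for simplicity, that layer 0 is the base layer and that each enhancement layer k references layer k-1, so a layer is decodable only if every layer below it has been received; `highest_decodable_layer` is a hypothetical helper, not part of any codec API.

```python
def highest_decodable_layer(received_layers):
    """Return the highest decodable layer index, or -1 if even the base layer is missing.

    Assumes layer 0 is the base layer and layer k depends on layers 0..k-1.
    """
    received = set(received_layers)
    layer = -1
    # Walk upwards while the next layer in the dependency chain is available.
    while layer + 1 in received:
        layer += 1
    return layer
```

For instance, a receiver holding layers 0 and 2 but not layer 1 can decode only the base layer in this model, because layer 2 is missing its reference layer.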

기본 계층 인코더(410)는 전체 영상을 인코딩하여 기본 계층을 위한 기본 계층 비디오 데이터(또는 기본 계층 비트스트림)를 생성할 수 있다. 예를 들어, 기본 계층 비디오 데이터는 사용자가 가상 현실 공간 내에서 바라보는 전체 영역을 위한 비디오 데이터를 포함할 수 있다. 기본 계층의 영상은 가장 낮은 화질의 영상일 수 있다.The base layer encoder 410 may encode the entire image to generate base layer video data (or a base layer bitstream) for the base layer. For example, the base layer video data may include video data for the entire area viewed by the user in the virtual reality space. The base layer image may be the lowest quality image.

향상 계층 인코더(420)는, 시그널링 데이터(예를 들어, 관심영역 정보) 및 기본 계층 비디오 데이터를 기초로, 전체 영상을 인코딩하여 기본 계층으로부터 예측되는 적어도 하나의 향상 계층을 위한 적어도 하나의 향상 계층 비디오 데이터(또는 향상 계층 비트스트림)를 생성할 수 있다. 향상 계층 비디오 데이터는 전체 영역 내에서 관심영역을 위한 비디오 데이터를 포함할 수 있다.The enhancement layer encoder 420 may encode at least one enhancement layer for at least one enhancement layer predicted from the base layer by encoding the entire image based on signaling data (eg, region of interest information) and base layer video data. Video data (or enhancement layer bitstream) can be generated. The enhancement layer video data may include video data for a region of interest within the entire region.

다중화기(430)는 기본 계층 비디오 데이터, 적어도 하나의 향상 계층 비디오 데이터, 및/또는 시그널링 데이터를 멀티플렉싱하고, 전체 영상에 해당하는 하나의 비트스트림을 생성할 수 있다.The multiplexer 430 may multiplex base layer video data, at least one enhancement layer video data, and/or signaling data, and generate one bitstream corresponding to the entire image.

도 5는 관심영역을 시그널링하는 방법을 예시적으로 나타낸 도면으로, 스케일러블 비디오 코딩에서 관심영역을 시그널링하는 방법을 나타낸다.FIG. 5 exemplarily shows a method of signaling a region of interest, specifically a method of signaling a region of interest in scalable video coding.

도 5를 참조하면, 서버 디바이스(또는 인코더)는 기본 계층(BL)과 적어도 하나의 향상 계층(EL)으로 구성되는 스케일러블 비디오 데이터(500)에서 향상 계층으로 구성된 하나의 비디오 데이터(또는 픽처)를 직사각형 모양을 갖는 여러 타일(Tile)들(510)로 분할할 수 있다. 예를 들어, 비디오 데이터는 Coding Tree Unit(CTU) 단위를 경계로 분할될 수 있다. 예를 들어, 하나의 CTU는 Y CTB, Cb CTB, 및 Cr CTB를 포함할 수 있다.Referring to FIG. 5, the server device (or encoder) includes one video data (or picture) composed of enhancement layers in scalable video data 500 composed of a base layer BL and at least one enhancement layer EL. Can be divided into several tiles (510) having a rectangular shape. For example, video data may be divided into Coding Tree Unit (CTU) units as boundaries. For example, one CTU may include Y CTB, Cb CTB, and Cr CTB.

서버 디바이스는 빠른 사용자 응답을 위해서 기본 계층(BL)의 비디오 데이터는 타일로 분할하지 않고 전체적으로 인코딩할 수 있다.For quick user response, the server device may encode the video data of the base layer BL without dividing it into tiles.

서버 디바이스는 하나 이상의 향상 계층들의 비디오 데이터는 필요에 따라서 일부 또는 전체를 여러 타일들로 분할하여 인코딩할 수 있다. 즉, 서버 디바이스는 향상 계층의 비디오 데이터는 적어도 하나의 타일로 분할하고, 관심영역(520, ROI, Region of Interest)에 해당하는 타일들을 인코딩할 수 있다.The server device may encode video data of one or more enhancement layers by dividing some or all of them into several tiles as necessary. That is, the server device may divide the video data of the enhancement layer into at least one tile and encode tiles corresponding to a region of interest (ROI) 520.
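
A minimal sketch of the tile handling described above: the picture is split into rectangular tiles whose edges fall on CTU boundaries, and the tiles that overlap a pixel-space region of interest are selected. The even spacing in CTU units, the raster-order tile numbering, and the names `tile_edges` and `roi_tile_indices` are assumptions made for this example.

```python
import math


def tile_edges(length_px, ctu_size, num_tiles):
    """Split one picture dimension into tiles whose edges lie on CTU boundaries."""
    ctus = math.ceil(length_px / ctu_size)
    # Spread the CTUs as evenly as possible over the tiles.
    edges = [((i * ctus) // num_tiles) * ctu_size for i in range(num_tiles + 1)]
    edges[-1] = length_px  # the last tile ends at the picture border
    return edges


def roi_tile_indices(roi, col_edges, row_edges):
    """Return raster-order indices of tiles overlapping roi = (x0, y0, x1, y1).

    x1 and y1 are exclusive pixel coordinates.
    """
    x0, y0, x1, y1 = roi
    cols = len(col_edges) - 1
    hits = []
    for r in range(len(row_edges) - 1):
        for c in range(cols):
            overlaps_x = col_edges[c] < x1 and x0 < col_edges[c + 1]
            overlaps_y = row_edges[r] < y1 and y0 < row_edges[r + 1]
            if overlaps_x and overlaps_y:
                hits.append(r * cols + c)
    return hits
```

With a 3840x1920 picture, 64-pixel CTUs, and a 4x3 tile grid, a region of interest spanning pixels (1000, 700) to (2000, 1000) overlaps two tiles in the middle row, and only those would need enhancement layer encoding under this sketch.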

이 때, 관심영역(520)은 가상 현실 공간에서 사용자가 보게 될 중요 오브젝트(Object)가 위치할 타일들의 위치(예를 들어, 게임에서 새로운 적이 등장하는 위치, 화상 통신에서 가상공간에서의 화자의 위치), 및/또는 사용자의 시선이 바라보는 곳에 해당할 수 있다.In this case, the region of interest 520 may correspond to the locations of the tiles in which an important object to be seen by the user in the virtual reality space will be placed (e.g., a location where a new enemy appears in a game, or the speaker's location in the virtual space in video communication), and/or to the place the user's gaze is directed at.

또한, 서버 디바이스는 관심영역에 포함되는 적어도 하나의 타일을 식별하는 타일 정보를 포함하는 관심영역 정보를 생성할 수 있다. 예를 들어, 관심영역 정보는 서버 디바이스에 포함된 관심영역 판단부, 시그널링 데이터 생성부, 및/또는 인코더에 의해서 생성될 수 있다.In addition, the server device may generate region of interest information including tile information for identifying at least one tile included in the region of interest. For example, the region of interest information may be generated by the region of interest determination unit included in the server device, the signaling data generation unit, and/or the encoder.

관심영역(520)의 타일 정보는 연속적이므로 모든 타일의 번호를 다 갖지 않더라도 효과적으로 압축될 수 있다. 예를 들어, 타일 정보는 관심영역에 해당하는 모든 타일의 번호들뿐만 아니라 타일의 시작 번호와 끝 번호, 좌표점 정보, CU (Coding Unit) 번호 리스트, 수식으로 표현된 타일 번호를 포함할 수 있다.Since the tile information of the region of interest 520 is contiguous, it can be compressed effectively without carrying the numbers of all the tiles. For example, the tile information may include not only the numbers of all tiles corresponding to the region of interest but also the start and end numbers of the tiles, coordinate point information, a CU (Coding Unit) number list, and tile numbers expressed by a formula.
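
One way to exploit the contiguity of tile numbers is to signal runs as (start, end) pairs instead of listing every number. The sketch below illustrates that single option; the function names `tiles_to_ranges` and `ranges_to_tiles` are hypothetical, and the other representations mentioned above (coordinate points, CU number lists, formulas) are not covered.

```python
def tiles_to_ranges(tile_numbers):
    """Collapse tile numbers into inclusive (start, end) runs of consecutive values."""
    ranges = []
    for n in sorted(set(tile_numbers)):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)  # extend the current run
        else:
            ranges.append((n, n))            # start a new run
    return ranges


def ranges_to_tiles(ranges):
    """Expand inclusive (start, end) runs back into the full list of tile numbers."""
    return [n for start, end in ranges for n in range(start, end + 1)]
```

A region of interest covering tiles 5 to 7 and 13 to 15 is then carried as two pairs rather than six numbers, and the receiver can expand the pairs back into the original list.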

또한, 관심영역(520)은 사용자의 현재 뷰포트 일 수 있다.Also, the region of interest 520 may be a user's current viewport.

비-관심영역의 타일 정보는 인코더가 제공하는 Entropy coding을 거친 후 다른 클라이언트 디바이스, 영상 프로세싱 컴퓨팅 장비, 및/또는 서버로 전송될 수 있다.The tile information of the non-interested region may be transmitted to other client devices, image processing computing equipment, and/or servers after undergoing entropy coding provided by the encoder.

관심영역 정보는 세션 정보를 실어 나르는 고수준 구문 프로토콜(High-Level Syntax Protocol)을 통해 전해질 수 있다. 또한, 관심영역 정보는 비디오 표준의 SEI (Supplement Enhancement Information), VUI (video usability information), 슬라이스 헤더 (Slice Header) 등의 패킷 단위에서 전해질 수 있다. 또한, 관심영역 정보는 비디오 파일을 서술하는 별도의 파일, 예를 들어, DASH의 MPD로 전달될 수 있다.The region of interest information may be conveyed through a high-level syntax protocol that carries session information. The region of interest information may also be conveyed in packet-level units of the video standard, such as SEI (Supplemental Enhancement Information), VUI (Video Usability Information), or the slice header. In addition, the region of interest information may be delivered in a separate file describing the video file, for example, the MPD of DASH.

이하에서는, 단일 화면 비디오에서의 관심영역을 시그널링하는 방법을 나타낸다.Hereinafter, a method of signaling a region of interest in a single screen video is shown.

본 명세서의 예시적인 기술은 스케일러블 비디오가 아닌 단일 화면 영상에서는 일반적으로 관심영역(ROI)이 아닌 영역을 다운스케일링(downscaling)(다운샘플링(downsampling))하는 방식으로 화질을 떨어뜨리는 기법을 사용할 수 있다.In a single screen image that is not scalable video, the exemplary technique of the present specification may generally lower image quality by downscaling (downsampling) the area other than the region of interest (ROI).

종래 기술은 서비스를 이용하는 단말 간에 다운스케일링(downscaling)을 위해 쓴 필터(filter) 정보를 공유하지 않고, 처음부터 한가지 기술로 약속을 하거나 인코더만 필터 정보를 알고 있다.The prior art does not share the filter information used for downscaling between the terminals using a service; either a single technique is agreed upon from the start, or only the encoder knows the filter information.

하지만, 본 명세서의 서버 디바이스는, 인코딩된 영상을 전달 받는 클라이언트 디바이스(또는 HMD 단말)에서 다운스케일링(downscaling)된 관심영역 외 영역의 화질을 조금이라도 향상 시키기 위해, 인코딩 시에 사용된 필터 정보를 클라이언트 디바이스로 전달할 수 있다. 이 기술은 실제로 영상 처리 시간을 상당히 줄일 수 있으며, 화질 향상을 제공할 수 있다.However, the server device of the present specification may deliver the filter information used at encoding time to the client device, so that the client device (or HMD terminal) receiving the encoded image can improve, even slightly, the image quality of the downscaled area outside the region of interest. This technique can in practice significantly reduce image processing time and can provide improved image quality.

전술한 바와 같이, 서버 디바이스는 관심영역 정보를 생성할 수 있다. 예를 들어, 관심영역 정보는 타일 정보뿐만 아니라 필터 정보를 더 포함할 수 있다. 예를 들어, 필터 정보는 약속된 필터 후보들의 번호, 필터에 사용된 값들을 포함할 수 있다.As described above, the server device can generate the region of interest information. For example, the region of interest information may further include filter information as well as tile information. For example, the filter information may include the number of promised filter candidates and values used in the filter.
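
As a loose illustration of region of interest information that carries both tile information and filter information, the sketch below models the filter information as an index into a pre-agreed candidate table plus optional filter values. The table contents, the field names, and `resolve_filter` are all hypothetical; the specification only states that the filter information may include the number of an agreed filter candidate and the values used in the filter.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical table of filter candidates agreed between server and client.
FILTER_CANDIDATES = {0: "nearest", 1: "bilinear", 2: "bicubic"}


@dataclass
class FilterInfo:
    candidate_index: int                                # number of the agreed candidate
    values: List[float] = field(default_factory=list)   # e.g. filter coefficients


@dataclass
class RoiInfo:
    tile_numbers: List[int]   # tiles that make up the region of interest
    filter_info: FilterInfo   # filter used to downscale the remaining area


def resolve_filter(info: RoiInfo) -> str:
    """Look up, on the client side, which downscaling filter the server signalled."""
    index = info.filter_info.candidate_index
    if index not in FILTER_CANDIDATES:
        raise ValueError(f"unknown filter candidate {index}")
    return FILTER_CANDIDATES[index]
```

Knowing the signalled filter, a client could choose a matching upscaling step for the area outside the region of interest instead of guessing how that area was downscaled.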

도 6은 클라이언트 디바이스의 예시적인 구성을 나타낸 도면이다.FIG. 6 is a diagram showing an exemplary configuration of a client device.

클라이언트 디바이스(600)는 영상 입력부(610), 오디오 입력부(620), 센서부(630), 영상 출력부(640), 오디오 출력부(650), 통신부(660), 및/또는 제어부(670) 중에서 적어도 하나를 포함할 수 있다. 예를 들어, 클라이언트 디바이스(600)는 HMD(Head-Mounted Display)일 수 있다. 또한, 클라이언트 디바이스(600)의 제어부(670)는 클라이언트 디바이스(600)에 포함될 수도 있고, VR 디바이스의 스마트폰처럼 별도의 장치로 존재할 수도 있다.The client device 600 may include at least one of an image input unit 610, an audio input unit 620, a sensor unit 630, an image output unit 640, an audio output unit 650, a communication unit 660, and/or a control unit 670. For example, the client device 600 may be a head-mounted display (HMD). In addition, the control unit 670 of the client device 600 may be included in the client device 600, or may exist as a separate device, like the smartphone of a VR device.

영상 입력부(610)는 비디오 데이터를 촬영할 수 있다. 영상 입력부(610)는 사용자의 영상을 획득하는 2D/3D 카메라 및/또는 Immersive 카메라 중에서 적어도 하나를 포함할 수 있다. 2D/3D 카메라는 180도 이하의 시야각을 가지는 영상을 촬영할 수 있다. Immersive 카메라는 360도 이하의 시야각을 가지는 영상을 촬영할 수 있다.The image input unit 610 may capture video data. The image input unit 610 may include at least one of a 2D/3D camera and/or an Immersive camera that acquires a user's image. 2D/3D cameras can take images with a viewing angle of 180 degrees or less. The Immersive camera can shoot images with a viewing angle of 360 degrees or less.

오디오 입력부(620)는 사용자의 음성을 녹음할 수 있다. 예를 들어, 오디오 입력부(620)는 마이크를 포함할 수 있다.The audio input unit 620 may record a user's voice. For example, the audio input unit 620 may include a microphone.

센서부(630)는 사용자 시선의 움직임에 대한 정보를 획득할 수 있다. 예를 들어, 센서부(630)는 물체의 방위 변화를 감지하는 자이로 센서, 이동하는 물체의 가속도나 충격의 세기를 측정하는 가속도 센서, 및 사용자의 시선 방향을 감지하는 외부 센서를 포함할 수 있다. 실시예에 따라서, 센서부(630)는 영상 입력부(610) 및 오디오 입력부(620)를 포함할 수도 있다.The sensor unit 630 may acquire information about the movement of the user's gaze. For example, the sensor unit 630 may include a gyro sensor that detects a change in the orientation of an object, an acceleration sensor that measures the acceleration or impact intensity of a moving object, and an external sensor that senses the user's gaze direction. According to an embodiment, the sensor unit 630 may include the image input unit 610 and the audio input unit 620.

영상 출력부(640)는 통신부(660)로부터 수신되거나 메모리(미도시)에 저장된 영상 데이터를 출력할 수 있다.The image output unit 640 may output image data received from the communication unit 660 or stored in a memory (not shown).

오디오 출력부(650)는 통신부(660)로부터 수신되거나 메모리에 저장된 오디오 데이터를 출력할 수 있다.The audio output unit 650 may output audio data received from the communication unit 660 or stored in a memory.

통신부(660)는 방송망, 무선통신망 및/또는 브로드밴드를 통해서 외부의 클라이언트 디바이스 및/또는 서버 디바이스와 통신할 수 있다. 통신부(660)는 데이터를 전송하는 전송부(미도시) 및/또는 데이터를 수신하는 수신부(미도시)를 더 포함할 수 있다.The communication unit 660 may communicate with an external client device and/or server device through a broadcast network, a wireless communication network, and/or broadband. The communication unit 660 may further include a transmission unit (not shown) for transmitting data and/or a reception unit (not shown) for receiving data.

제어부(670)는 클라이언트 디바이스(600)의 모든 동작을 제어할 수 있다. 제어부(670)는 서버 디바이스로부터 수신한 비디오 데이터 및 시그널링 데이터를 처리할 수 있다. 제어부(670)에 대한 구체적인 내용은 이하의 도 7에서 상세히 설명한다.The control unit 670 may control all operations of the client device 600. The control unit 670 may process video data and signaling data received from the server device. Details of the control unit 670 will be described in detail in FIG. 7 below.

도 7은 클라이언트 디바이스의 제어부의 예시적인 구성을 나타낸 블록도이다.FIG. 7 is a block diagram showing an exemplary configuration of a control unit of a client device.

제어부(700)는 시그널링 데이터 및/또는 비디오 데이터를 처리할 수 있다. 제어부(700)는 시그널링 데이터 추출부(710), 디코더(720), 시선 판단부(730), 및/또는 시그널링 데이터 생성부(740) 중에서 적어도 하나를 포함할 수 있다.The control unit 700 may process signaling data and/or video data. The control unit 700 may include at least one of the signaling data extraction unit 710, the decoder 720, the gaze determination unit 730, and/or the signaling data generation unit 740.

시그널링 데이터 추출부(710)는 서버 디바이스 및/또는 다른 클라이언트 디바이스로부터 전송 받은 데이터로부터 시그널링 데이터를 추출할 수 있다. 예를 들어, 시그널링 데이터는 관심영역 정보를 포함할 수 있다.The signaling data extraction unit 710 may extract signaling data from data transmitted from a server device and/or another client device. For example, signaling data may include information of interest.

The decoder 720 may decode video data based on the signaling data. For example, the decoder 720 may decode the entire image in a manner customized to each user based on that user's gaze direction. For example, when a user looks at a specific area in the virtual reality space, the decoder 720 may, based on the user's gaze in the virtual reality space, decode the image corresponding to the specific area in high quality and decode the image outside the specific area in low quality. Depending on the embodiment, the decoder 720 may include at least one of the signaling data extraction unit 710, the gaze determination unit 730, and/or the signaling data generation unit 740.

The gaze determination unit 730 may determine the user's gaze in the virtual reality space and generate image configuration information. For example, the image configuration information may include gaze information indicating the gaze direction and/or zoom area information indicating the user's viewing angle.

The signaling data generation unit 740 may generate signaling data for transmission to a server device and/or another client device. For example, the signaling data may carry the image configuration information. In addition, the signaling data may be delivered through a high-level syntax protocol that carries session information, or may be transmitted through at least one of SEI (Supplemental Enhancement Information), VUI (Video Usability Information), a slice header, and a file describing the video data.

FIG. 8 is a diagram showing an exemplary configuration of the decoder of a client device.

The decoder 800 may include at least one of an extractor 810, a base layer decoder 820, and/or at least one enhancement layer decoder 830.

The decoder 800 may decode a bitstream containing video data by applying the inverse of the encoding process of the scalable video coding method.

The extractor 810 may receive a bitstream (video data) containing video data and signaling data, and may selectively extract parts of the bitstream according to the quality of the image to be played. For example, the bitstream (video data) may include a base layer bitstream (base layer video data) for the base layer and at least one enhancement layer bitstream (enhancement layer video data) for at least one enhancement layer predicted from the base layer. The base layer bitstream (base layer video data) may include video data for the entire area of the virtual reality space. The at least one enhancement layer bitstream (enhancement layer video data) may include video data for the region of interest within the entire area.

In addition, the signaling data may include region-of-interest information indicating the region of interest that corresponds to the user's gaze direction within the entire area of the virtual reality space for a video conferencing service.

The base layer decoder 820 may decode the base layer bitstream (or base layer video data) for a low-quality image.

The enhancement layer decoder 830 may decode at least one enhancement layer bitstream (or enhancement layer video data) for a high-quality image based on the signaling data and/or the base layer bitstream (or base layer video data).

Meanwhile, when the video data in the bitstream has not been encoded with scalable video coding, the decoder 800 may extract from the video stream basic-quality video data for the entire area of the virtual reality space and high-quality video data for the region of interest within that entire area.

The following describes a method of generating image configuration information for responding in real time to movement of the user's gaze.

The image configuration information may include at least one of gaze information indicating the user's gaze direction and/or zoom area information indicating the user's viewing angle. The user's gaze means the direction in which the user is looking within the virtual reality space, not in real space. In addition, the gaze information may include not only information indicating the user's current gaze direction but also information indicating the user's future gaze direction (for example, information about a gaze point that is expected to attract attention).

The client device may sense and process the user's action of looking at a specific area located in the virtual reality space around the user.

The client device may receive sensing information from the sensor unit using the control unit and/or the gaze determination unit. The sensing information may be an image captured by a camera or a voice recorded by a microphone. The sensing information may also be data detected by a gyro sensor, an acceleration sensor, and an external sensor.

In addition, the client device may use the control unit and/or the gaze determination unit to detect movement of the user's gaze based on the sensing information. For example, the client device may detect movement of the user's gaze based on changes in the values of the sensing information.

In addition, the client device may use the control unit and/or the gaze determination unit to generate image configuration information for the virtual reality space. For example, when the client device physically moves or the user's gaze moves, the client device may calculate the user's gaze information and/or zoom area information in the virtual reality space based on the sensing information.

In addition, the client device may use the communication unit to transmit the image configuration information to the server device and/or another client device. The client device may also pass the image configuration information to its own other components.

The above describes how the client device generates the image configuration information. However, this is not limiting: the server device may instead receive the sensing information from the client device and generate the image configuration information.

In addition, an external computing device connected to the client device may generate the image configuration information, and the computing device may deliver the image configuration information to its own client device, another client device, and/or the server device.

The following describes how the client device signals the image configuration information.

How the image configuration information (including viewpoint information and/or zoom area information) is signaled is very important. If the image configuration information is signaled too frequently, it may burden the client device, the server device, and/or the entire network.

Accordingly, the client device may signal the image configuration information only when the user's image configuration information (or gaze information and/or zoom area information) changes. That is, the client device may transmit the user's gaze information to another client device and/or the server device only when that gaze information has changed.
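The change-triggered signaling described above can be sketched as follows. This is a minimal illustration, not the method of the specification: the class and field names (`ViewConfig`, `GazeSignaler`) and the 1-degree change threshold are assumptions introduced here, since the text only states that signaling occurs when the information changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ViewConfig:
    """Hypothetical image configuration: gaze direction plus zoom area."""
    yaw_deg: float
    pitch_deg: float
    fov_deg: float  # zoom area, expressed here as a viewing angle


class GazeSignaler:
    """Decides whether a new ViewConfig should be signaled."""

    def __init__(self, threshold_deg: float = 1.0) -> None:
        # Assumed tolerance: smaller changes are treated as "unchanged".
        self.threshold_deg = threshold_deg
        self.last_sent: Optional[ViewConfig] = None

    def should_send(self, cfg: ViewConfig) -> bool:
        if self.last_sent is None:
            changed = True
        else:
            prev = self.last_sent
            # Yaw wraps around at 360 degrees, so use the shorter arc.
            dyaw = abs(cfg.yaw_deg - prev.yaw_deg) % 360.0
            dyaw = min(dyaw, 360.0 - dyaw)
            changed = (
                dyaw > self.threshold_deg
                or abs(cfg.pitch_deg - prev.pitch_deg) > self.threshold_deg
                or abs(cfg.fov_deg - prev.fov_deg) > self.threshold_deg
            )
        if changed:
            self.last_sent = cfg
        return changed
```

A caller would invoke `should_send` on every sensor update and transmit the configuration only when it returns `True`, which keeps signaling traffic proportional to actual gaze movement.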

The above description focuses on the client device generating and/or transmitting the image configuration information, but the server device may instead receive sensing information from the client device, generate the image configuration information based on the sensing information, and transmit the image configuration information to at least one client device.

The signaling mentioned above may be signaling among a server device, a client device, and/or an external computing device (if present). It may also be signaling between a client device and/or an external computing device (if present).

The following describes exemplary methods of transmitting high-quality and low-quality video.

Methods of transmitting high-quality or low-quality video based on the user's gaze information may include: switching layers of a scalable codec; for a single bitstream with real-time encoding, rate control using the QP (Quantization Parameter) or similar; for a single bitstream such as DASH, switching in units of chunks; downscaling/upscaling; and/or, in the case of rendering, high-quality rendering that uses more resources.

Although the exemplary technique above is described in terms of differentiated transmission using scalable video, the same advantages, such as lowering the overall bandwidth and responding quickly to movement of the user's gaze, can be obtained with a general single-layer video coding technique by adjusting the quantization parameter or the degree of downscaling/upscaling. In addition, when files that have been transcoded in advance into bitstreams with several bitrates are used, the exemplary technique of this specification can switch between high-quality and low-quality video in units of chunks.
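As a rough illustration of the single-layer case, the sketch below assigns a lower quantization parameter (finer quantization, higher quality) to tiles inside the region of interest and a higher one elsewhere. The function name `tile_qp_map` and the QP values 22 and 37 are assumptions chosen for illustration; the specification does not give concrete values.

```python
from typing import Iterable, List


def tile_qp_map(
    num_tiles: int,
    roi_tiles: Iterable[int],
    roi_qp: int = 22,          # assumed QP for region-of-interest tiles
    background_qp: int = 37,   # assumed QP for all other tiles
) -> List[int]:
    """Return one QP per tile index; a lower QP means higher quality."""
    roi = set(roi_tiles)
    return [roi_qp if t in roi else background_qp for t in range(num_tiles)]


# Example: six tiles, with tiles 1, 2, 4, and 5 covering the region of interest.
qps = tile_qp_map(6, [1, 2, 4, 5])
```

An encoder wrapper could feed such a per-tile map into its rate control, although how a QP map is passed to a particular encoder depends on that encoder's interface.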

In addition, although this specification uses a virtual reality system as its example, it applies equally to VR (Virtual Reality) games, AR (Augmented Reality) games, and the like that use an HMD. That is, both techniques apply in the same way as in the virtual reality system example: providing the area corresponding to the user's gaze as high-quality video, and signaling only when the user looks somewhere other than the area or object the user is expected to look at.

A technique that receives the entire image as a single compressed video bitstream, decodes it, and renders the area the user is looking at in a virtual space must receive the entire image (for example, a 360-degree immersive image) as a bitstream. Because this video bitstream combines images that are each high resolution, its total bandwidth is inevitably very large. To keep the total bandwidth from becoming excessive, scalable video technologies from the international video standards may be used, such as SVC and Scalable High Efficiency Video Coding, the scalable extension of HEVC.

The following describes, with reference to FIGS. 9 to 11, an example of a method of indexing the tiles that correspond to the user's region of interest according to the projection type.

FIG. 9 is a diagram illustrating a method of indexing the tiles of a user's region of interest according to projection information.

Referring to FIG. 9, when a user wearing a client device 910 (a head-mounted display) looks in a specific direction in the virtual reality space, the user's region of interest at the client device 910 may be classified as either a client-device-based region of interest 911 or a user-gaze-based region of interest 913.

The client device 910 may calculate the sphere coordinates of the region of interest selected by the user and transmit the calculated sphere coordinates, together with type information for the region of interest, to a server device (not shown).

For example, when the gaze-based region of interest 913 is selected as the user's region of interest, the sphere coordinates of the region of interest 920 may be expressed by the yaw and pitch coordinates of its center point 929 together with the horizontal (yaw) and vertical (pitch) lengths passing through the center point 929. Here, the center point 929 is determined as the point where two lines cross: the four midpoints 921, 923, 925, and 927 of the four sides of the region of interest 920, for example cAzimuth1 (921), cAzimuth2 (923), cElevation1 (925), and cElevation2 (927), are connected in pairs, one pair horizontally along yaw and the other vertically along pitch, and the intersection of these two connecting lines is the center point 929.
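The center-and-extent representation above can be approximated with a short calculation. The sketch below assumes that cAzimuth1 and cAzimuth2 are the yaw angles of the left and right side midpoints and that cElevation1 and cElevation2 are the pitch angles of the top and bottom side midpoints, all in degrees; that interpretation, and the function name `roi_sphere_coords`, are assumptions and are not stated in the specification.

```python
from typing import Tuple


def roi_sphere_coords(
    c_azimuth1: float,    # assumed: yaw of the left-side midpoint, degrees
    c_azimuth2: float,    # assumed: yaw of the right-side midpoint, degrees
    c_elevation1: float,  # assumed: pitch of the top-side midpoint, degrees
    c_elevation2: float,  # assumed: pitch of the bottom-side midpoint, degrees
) -> Tuple[float, float, float, float]:
    """Return (center_yaw, center_pitch, yaw_range, pitch_range) in degrees."""
    # Horizontal extent, measured from left to right across the yaw wrap-around.
    yaw_range = (c_azimuth2 - c_azimuth1) % 360.0
    # Center yaw, normalized into [-180, 180).
    center_yaw = (c_azimuth1 + yaw_range / 2.0 + 180.0) % 360.0 - 180.0
    pitch_range = abs(c_elevation1 - c_elevation2)
    center_pitch = (c_elevation1 + c_elevation2) / 2.0
    return center_yaw, center_pitch, yaw_range, pitch_range
```

The four returned values correspond to the yaw and pitch of the center point and the horizontal and vertical lengths through it, which is the form the client is described as sending to the server.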

When the server device receives from the client device 910 the sphere coordinates of the user's region of interest and the type information for the region of interest (in the drawing, a user-gaze-based region of interest), it may use the received sphere coordinates to map the user's region of interest onto a two-dimensional plane according to the projection type (in FIG. 9, the projection type is equirectangular projection). Because the mapped region of interest 931 is a rectangular 2D image, its coordinates can be expressed by the x,y coordinates of the rectangle's upper-left corner and the x,y coordinates of its lower-right corner.

In the tiled image 930 of the virtual reality space, the server device may obtain the tile indices corresponding to the region of interest 931 mapped by the projection, use the obtained tile index information to generate metadata for the video data to be transmitted, and include the generated metadata in the signaling data transmitted to the client device. In the example, the tiles containing the mapped region of interest are Tile1, Tile2, Tile4, and Tile5.
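The mapping and tile lookup described above can be sketched for the equirectangular case. The code below is an illustration under stated assumptions, not the procedure of the specification: yaw runs from -180 to 180 degrees left to right, pitch runs from +90 at the top to -90 at the bottom, the region does not cross the yaw wrap-around, tiles form a uniform grid numbered from 0 in raster order, and the function names `erp_rect` and `tiles_covering` are hypothetical.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def erp_rect(center_yaw: float, center_pitch: float,
             yaw_range: float, pitch_range: float,
             width: int, height: int) -> Rect:
    """Map a sphere-coordinate region to an upper-left/lower-right pixel rect."""
    def to_x(yaw: float) -> float:
        return (yaw + 180.0) / 360.0 * width

    def to_y(pitch: float) -> float:
        return (90.0 - pitch) / 180.0 * height

    x0 = to_x(center_yaw - yaw_range / 2.0)
    x1 = to_x(center_yaw + yaw_range / 2.0)
    y0 = to_y(center_pitch + pitch_range / 2.0)  # top edge
    y1 = to_y(center_pitch - pitch_range / 2.0)  # bottom edge
    return x0, y0, x1, y1


def tiles_covering(rect: Rect, width: int, height: int,
                   cols: int, rows: int) -> List[int]:
    """Return raster-order indices of the grid tiles that overlap the rect."""
    x0, y0, x1, y1 = rect
    tile_w = width / cols
    tile_h = height / rows
    c0 = max(0, int(x0 // tile_w))
    r0 = max(0, int(y0 // tile_h))
    # A right/bottom edge lying exactly on a tile boundary does not
    # pull in the next tile.
    c1 = max(c0, min(cols - 1, math.ceil(x1 / tile_w) - 1))
    r1 = max(r0, min(rows - 1, math.ceil(y1 / tile_h) - 1))
    return [r * cols + c for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


# Example: 3840x1920 frame split into a 3x3 grid (assumed layout).
rect = erp_rect(30.0, 30.0, 90.0, 60.0, 3840, 1920)
roi_tiles = tiles_covering(rect, 3840, 1920, cols=3, rows=3)
```

With this assumed grid and region, the covered tiles are 1, 2, 4, and 5. Other projection types would need their own mapping function in place of `erp_rect`.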

FIG. 10 is a block diagram of an exemplary video transmission device for a video streaming service, and FIG. 11 is a flowchart of an exemplary video transmission method for a video streaming service.

Referring to FIGS. 10 and 11, the video transmission device (server device) 1000 may include a communication unit 1010, a control unit 1020, a signaling data generation unit 1030, an encoder 1040, and/or a multiplexer 1050.

The communication unit 1010 may receive, from a client device (not shown), first signaling data that includes the sphere coordinates of the user's region of interest within the 360-degree virtual reality space and region-of-interest type information based on the type of the region of interest (S1101).

The region-of-interest type information may include information indicating either a viewport based on the client device (for example, a head-mounted display) or a viewport based on the user's viewpoint.

The control unit 1020 may map the user's region of interest onto a two-dimensional plane based on the sphere coordinates of the user's region of interest received through the communication unit 1010, the type information for the user's region of interest, and/or projection information for projecting the 360-degree image onto a two-dimensional plane (S1103).

To encode video of the 360-degree virtual reality space and transmit it to the client device, the spherical video image must first be projected onto a two-dimensional plane, because the encoding device can encode only rectangular video.

Accordingly, the control unit 1020 may map the user's region of interest onto the two-dimensional plane based on projection information indicating the projection type.

In addition, the control unit 1020 may calculate and generate indexing information for the region-of-interest tiles that correspond to the region of interest mapped onto the two-dimensional plane. That is, the control unit 1020 may generate identification information for the tiles that contain the mapped region of interest (S1105).

The signaling data generation unit 1030 may generate second signaling data based at least in part on the indexing information of the region-of-interest tiles that correspond to the mapped region of interest in the virtual reality space.

The second signaling data may include projection information for the 360-degree virtual reality space and mapped region-of-interest information based on that projection information. The mapped region-of-interest information may be identification information for the tiles that contain the mapped region of interest.

The encoder 1040 may generate video data for a video streaming service for the 360-degree virtual reality space. The video data may be encoded as basic-quality video data for at least part of the 360-degree virtual reality space and high-quality video data for the area, within the entire virtual reality space, that corresponds to the user's region of interest.

The area corresponding to the user's region of interest may be divided into at least one tile, so the high-quality video data may be high-quality video of the at least one tile.

The video data for the 360-degree virtual reality space may include basic-quality video data and high-quality video data. The video data may be encoded so that the basic-quality video data includes base layer video data for the 360-degree virtual reality space, and the high-quality video data includes base layer video data and at least one enhancement layer video data for the region-of-interest tiles.

In addition, the encoder 1040 may include a base layer encoder and an enhancement layer encoder. In that case, the basic-quality video data may be base layer video data for the entire 360-degree virtual reality space, and the high-quality video data may be base layer video data and at least one enhancement layer video data for the tiles included in the viewport the user is looking at, or in the viewport of the client device, within the 360-degree virtual reality space.

For fast user response time, the base layer may be encoded as a whole without tiling. One or more enhancement layers may be encoded with part or all of the layer divided into multiple tiles as needed.

The multiplexer 1050 may generate a bitstream that includes the video data for the 360-degree virtual reality space and the second signaling data for that video data.

In addition, the communication unit 1010 may transmit the bitstream generated by the multiplexer 1050, which includes the video data and the second signaling data, to the client device in accordance with a command from the control unit 1020 (S1107).

Here, because tile numbers within a viewport or region of interest are assigned consecutively, the client device and the server device can compress the tile information effectively when including it in the first and second signaling data, without sending every tile number. For example, the devices can compress the tile number information by using the start and end tile numbers, tile coordinate point information, a list of coding unit (CU) numbers within a tile, or a formula that expresses the tile numbers.
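One of the options listed above, sending start and end tile numbers instead of every number, can be sketched as a run-length style encoding. The function names `compress_tiles` and `expand_tiles` are hypothetical, and the specification does not prescribe this exact representation.

```python
from typing import Iterable, List, Tuple


def compress_tiles(tile_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse tile numbers into inclusive (start, end) runs."""
    runs: List[Tuple[int, int]] = []
    for t in sorted(set(tile_ids)):
        if runs and t == runs[-1][1] + 1:
            # Extend the current run by one tile.
            runs[-1] = (runs[-1][0], t)
        else:
            runs.append((t, t))
    return runs


def expand_tiles(runs: Iterable[Tuple[int, int]]) -> List[int]:
    """Recover the full list of tile numbers from (start, end) runs."""
    return [t for start, end in runs for t in range(start, end + 1)]


runs = compress_tiles([1, 2, 4, 5])
```

For a viewport that spans many consecutive tiles in a row, each row costs two numbers regardless of its width, which is where the saving comes from.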

The first signaling data and the second signaling data are generated based on image configuration information for the 360-degree virtual reality space, and the image configuration information may include gaze information indicating viewports within the 360-degree virtual reality space and zoom area information indicating the user's viewing angle.

This signaling data may be delivered through a high-level syntax protocol that carries session information, may be carried in packet units of the video standard such as SEI, VUI, or slice headers, or may be included in a separate file that describes the video file (for example, a DASH MPD).

Projection types for mapping the 360-degree virtual reality space onto a two-dimensional plane include Equi-Rectangular Projection (ERP), Cube Map Projection (CMP), Adjusted Equal-Area Projection (AEP), Octahedron Projection (OHP), Icosahedron Projection (ISP), Truncated Square Pyramid (TSP), Segmented Sphere Projection (SSP), Adjusted Cube Map Projection (ACP), Rotated Sphere Projection (RSP), Equatorial Cylindrical Projection (ECP), and Equi-Angular Cube Map (EAC).

Because projection methods differ, the position and shape of the video for a specific area of the 360-degree virtual reality space differ from the original video depending on the projection type. Therefore, if the region of interest is mapped according to the projection type and the projected image is then encoded and transmitted to the video playback device together with information about the mapped region of interest and the projection type, the video playback device can correctly decode the video of the 360-degree virtual space. Moreover, the mapping information for the region of interest allows high-quality video data to be transmitted only for the tiles corresponding to the region of interest, which saves transmission bandwidth and speeds up playback.

In addition, the projection coordinates and tile index information obtained from the sphere coordinates of the user's region of interest can be computed on either side, by the video transmission device or by the video playback device, as suits conditions such as each device's performance and the state of the network.

The following describes, with reference to FIG. 12, how a video playback device plays received video using the method of indexing the tiles that correspond to the user's region of interest.

FIG. 12 is a flowchart of an exemplary video playback method for a video streaming service.

Referring to FIG. 12, the exemplary video playback method may be performed by a processor included in the video playback device through the following steps.

First, the processor of the video playback device may transmit first signaling data that includes the sphere coordinates of the user's region of interest within the 360-degree virtual reality space and region-of-interest type information based on the type of the region of interest (S1201).

Next, the processor of the video playback device may receive the second signaling data and the video data for the 360-degree virtual reality space from the video transmission device (S1203).

The second signaling data is generated by the video transmission device and may include the sphere coordinates of the user's region of interest within the 360-degree virtual reality space, the type information of the region of interest, the projection information of the projection performed by the video transmission device, and the indexing information of the region-of-interest tiles that correspond to the region of interest in the 360-degree virtual reality space as mapped onto the two-dimensional plane based on the projection information.
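To make the contents of the second signaling data concrete, the sketch below models the four items listed above as a single record. Everything about the representation is an assumption for illustration: the class name `SecondSignalingData`, the field names, the string values for the region-of-interest type and projection, and the use of JSON. The specification describes carriage through SEI, VUI, slice headers, or a descriptive file such as a DASH MPD rather than any particular serialization.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SecondSignalingData:
    # Sphere coordinates of the region of interest, in degrees.
    center_yaw_deg: float
    center_pitch_deg: float
    yaw_range_deg: float
    pitch_range_deg: float
    # Assumed labels: "device_viewport" or "gaze_viewport".
    roi_type: str
    # Assumed label for the projection type, for example "ERP".
    projection: str
    # Indexing information: tiles that contain the mapped region of interest.
    roi_tile_ids: List[int]

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "SecondSignalingData":
        return cls(**json.loads(text))


msg = SecondSignalingData(30.0, 30.0, 90.0, 60.0, "gaze_viewport", "ERP", [1, 2, 4, 5])
decoded = SecondSignalingData.from_json(msg.to_json())
```

On the playback side, `decoded.roi_tile_ids` would tell the decoder which tiles carry high-quality data, and `decoded.projection` would tell the renderer how to map the 2D frame back onto the sphere.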

다음으로, 영상 재생 장치의 프로세서는 360도 가상 현실 공간에 대한 비디오 데이터 및 제2 시그널링 데이터에 기반하여 수신된 영상을 재생할 수 있다(S1205).Next, the processor of the image reproducing apparatus may reproduce the received image based on the video data and the second signaling data for the 360-degree virtual reality space (S1205).

영상 재생 장치는 수신된 360도 가상 현실 공간에 대한 비디오 데이터와 수신된 제2 시그널링 데이터에 기초하여 상기 관심영역을 포함하는 사용자 뷰포트에 대한 비디오 데이터를 디코딩하고, 상기 디코딩된 영상을 렌더링하여, 이를 표시 장치로 출력할 수 있다.The video reproducing apparatus decodes video data for a user viewport including the region of interest based on the received video data for the 360-degree virtual reality space and the received second signaling data, and renders the decoded video, thereby It can be output to a display device.

또한, 상기 360도 가상 현실 공간에 대한 비디오 데이터는 상기 360도 가상 현실 공간의 적어도 일부에 대한 기본 화질 비디오 데이터 및 상기 관심영역 타일에 대한 고화질 비디오 데이터를 적어도 일부 포함할 수 있다.Further, the video data for the 360-degree virtual reality space may include at least a portion of the basic-definition video data for at least a portion of the 360-degree virtual reality space and the high-definition video data for the tile of interest.
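The playback step above can be illustrated with a minimal sketch of how a client might turn the tile indices carried in the second signaling data into a per-tile decoding plan. The function name `select_tile_layers`, the layer labels, and the 0-based contiguous tile numbering are assumptions made for illustration; the specification does not define them, and it only requires basic-quality data for at least a portion of the space.

```python
from typing import Dict, Iterable, List


def select_tile_layers(num_tiles: int, roi_tile_ids: Iterable[int]) -> Dict[int, List[str]]:
    """Map each tile index to the layers a client would decode for it.

    Assumptions (not from the specification): tiles are numbered
    0..num_tiles-1 and basic-quality data exists for every tile.
    """
    roi = set(roi_tile_ids)
    unknown = sorted(t for t in roi if not 0 <= t < num_tiles)
    if unknown:
        raise ValueError(f"tile ids outside the tile grid: {unknown}")

    plan: Dict[int, List[str]] = {}
    for tile in range(num_tiles):
        if tile in roi:
            # Region-of-interest tiles: base layer plus enhancement layer.
            plan[tile] = ["base", "enhancement"]
        else:
            # Everything else is shown at basic quality only.
            plan[tile] = ["base"]
    return plan
```

With six tiles and region-of-interest tiles 1, 2, 4, and 5, only tiles 0 and 3 would be decoded at basic quality alone.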

Hereinafter, the signaling used for tile indexing of the user's region of interest according to the projection will be described with reference to FIGS. 13 and 14.

FIG. 13 is a diagram illustrating exemplary OMAF syntax in an international video standard for signaling the tile indexing of the user's region of interest according to the projection.

Referring to FIG. 13, the exemplary OMAF (Omnidirectional Media Application Format) syntax shows syntax used in international video standards such as H.264 AVC or H.265 HEVC.

The syntax elements at reference numerals 1301, 1303, and 1305 in the drawing are to be newly added according to the embodiments of this specification; all other syntax elements are existing standard syntax. Each syntax element is described in detail below.

viewport_id indicates the type of the user's region of interest. For example, a value of "0" indicates a region of interest based on a head-mounted display, and a value of "1" indicates a region of interest based on the user's gaze.

center_azimuth indicates the yaw value of the center point of the user's region of interest in spherical coordinates.

center_elevation indicates the pitch value of the center point of the user's region of interest in spherical coordinates.

azimuth_range indicates the horizontal (yaw) range passing through the center point of the user's region of interest in spherical coordinates.

elevation_range indicates the vertical (pitch) range passing through the center point of the user's region of interest in spherical coordinates.

projection_id indicates the type of 2D projection. For example, a value of "0" indicates that the 2D projection is ERP (Equi-Rectangular Projection), and a value of "1" indicates that it is CMP (Cube Map Projection).

left_top_x indicates the x coordinate of the upper-left corner of the region of interest in the 2D projection.

left_top_y indicates the y coordinate of the upper-left corner of the region of interest in the 2D projection.

right_bottom_x indicates the x coordinate of the lower-right corner of the region of interest in the 2D projection.

right_bottom_y indicates the y coordinate of the lower-right corner of the region of interest in the 2D projection.

tile_id[] indicates the array of indices of the tiles corresponding to the projected region of interest.
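The syntax elements above can be pictured as a single record, and tile_id[] as the set of tiles that the projected rectangle overlaps. The following is a sketch under stated assumptions, not the method of the specification: the class name `RoiTileIndexing`, the helper `tiles_covering_rect`, the uniform tile grid, the 0-based raster-scan tile numbering, and the picture and grid sizes in the usage lines are all hypothetical. A region that wraps around the left and right edges of an ERP picture is not handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoiTileIndexing:
    """Record mirroring the syntax elements described above."""
    viewport_id: int          # 0: head-mounted-display based, 1: gaze based
    center_azimuth: float     # yaw of the region-of-interest center
    center_elevation: float   # pitch of the region-of-interest center
    azimuth_range: float      # horizontal (yaw) extent
    elevation_range: float    # vertical (pitch) extent
    projection_id: int        # 0: ERP, 1: CMP
    left_top_x: int
    left_top_y: int
    right_bottom_x: int
    right_bottom_y: int
    tile_id: List[int] = field(default_factory=list)


def tiles_covering_rect(left: int, top: int, right: int, bottom: int,
                        pic_w: int, pic_h: int, cols: int, rows: int) -> List[int]:
    """Return raster-scan indices of the tiles overlapped by a rectangle.

    Assumptions: a uniform cols x rows tile grid, 0-based indices numbered
    row by row, and inclusive pixel coordinates for both corners.
    """
    if not (0 <= left <= right < pic_w and 0 <= top <= bottom < pic_h):
        raise ValueError("rectangle must lie inside the picture")
    # Integer arithmetic avoids rounding problems with fractional tile sizes.
    first_col, last_col = left * cols // pic_w, right * cols // pic_w
    first_row, last_row = top * rows // pic_h, bottom * rows // pic_h
    return [r * cols + c
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# Hypothetical geometry: a 1200x600 picture split into 3 columns and 2 rows.
example_tiles = tiles_covering_rect(450, 100, 1000, 500, 1200, 600, 3, 2)
# example_tiles == [1, 2, 4, 5]
```

Numbering tiles in raster-scan order keeps tile_id[] compact, but the actual numbering depends on how the encoder lays out its tiles.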

Accordingly, the OMAF syntax presented in this specification allows the user's region-of-interest type information, the tile information of the region of interest according to the projection scheme, and similar information to be included in signaling data and exchanged between the client device and the server device. This makes it possible to reduce transmission bandwidth and to improve playback speed on the client device.

The syntax and semantics defined in the detailed description of FIG. 13 above may each also be expressed in XML form in HTTP-based video communication such as MPEG DASH.

FIG. 14 is a diagram illustrating an example of region-of-interest tile indexing information syntax expressed in XML form. The illustrated XML may include the type of the user's region of interest, the spherical coordinates of the region of interest, the type of projection, the projection coordinates of the region of interest, and the array of indices of the tiles corresponding to the region of interest.

Referring to FIG. 14, the XML example may include information indicating that the type of the user's region of interest is a region of interest based on the user's gaze; that the spherical coordinates of the region of interest are a horizontal angle of 45 and a vertical angle of 127; that the type of projection is ERP; that the projection coordinates of the region of interest are (312, 523) for the upper-left corner and (408, 845) for the lower-right corner; and that the tiles corresponding to the region of interest are tile numbers 1, 2, 4, and 5.
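As a rough illustration of such an XML representation, the sketch below serializes the values listed for FIG. 14 and reads them back using Python's standard `xml.etree.ElementTree` module. The root element name `RoiTileIndexing`, the use of one child element per syntax element, and the space-separated encoding of tile_id are assumptions, since the actual layout of FIG. 14 is not reproduced in the text. Mapping the horizontal and vertical angles to center_azimuth and center_elevation is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Scalar syntax elements carried in the example; all are integers here.
SCALAR_FIELDS = (
    "viewport_id", "center_azimuth", "center_elevation", "projection_id",
    "left_top_x", "left_top_y", "right_bottom_x", "right_bottom_y",
)

# Values taken from the description of FIG. 14.
FIG14_EXAMPLE = {
    "viewport_id": 1,         # region of interest based on the user's gaze
    "center_azimuth": 45,     # horizontal angle (assumed to be the center yaw)
    "center_elevation": 127,  # vertical angle (assumed to be the center pitch)
    "projection_id": 0,       # ERP
    "left_top_x": 312, "left_top_y": 523,
    "right_bottom_x": 408, "right_bottom_y": 845,
    "tile_id": [1, 2, 4, 5],
}


def roi_to_xml(info: dict) -> str:
    """Serialize region-of-interest tile indexing information to XML text."""
    root = ET.Element("RoiTileIndexing")  # hypothetical element name
    for name in SCALAR_FIELDS:
        ET.SubElement(root, name).text = str(info[name])
    # The tile index array is written as a space-separated list (assumption).
    ET.SubElement(root, "tile_id").text = " ".join(str(t) for t in info["tile_id"])
    return ET.tostring(root, encoding="unicode")


def roi_from_xml(text: str) -> dict:
    """Parse XML text produced by roi_to_xml back into a dictionary."""
    root = ET.fromstring(text)
    info = {name: int(root.findtext(name)) for name in SCALAR_FIELDS}
    info["tile_id"] = [int(t) for t in root.findtext("tile_id").split()]
    return info


xml_text = roi_to_xml(FIG14_EXAMPLE)
```

Parsing `xml_text` with `roi_from_xml` returns the same dictionary, so a client and a server could exchange the record in this form.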

The virtual reality system according to the embodiments disclosed in this specification may be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording devices that store data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. In addition, the computer-readable recording medium may be distributed over computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers in the technical field to which the technology of this specification belongs.

Preferred embodiments of the technology of this specification have been described above with reference to the accompanying drawings. The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings, but should be construed with meanings and concepts consistent with the technical idea of the present invention.

The scope of the present invention is not limited to the embodiments disclosed in this specification, and the present invention may be modified, changed, or improved in various forms within the spirit of the present invention and the scope set forth in the claims.

1000: server device
1010: communication unit
1020: control unit
1030: signaling data generation unit
1040: encoder
1050: multiplexer

Claims

A video transmission method performed by a video transmission device including a processor, the method comprising:
receiving first signaling data including spherical coordinates of a user's region of interest within a 360-degree virtual reality space and region-of-interest type information based on a type of the region of interest;
mapping the region of interest onto a two-dimensional plane based on the spherical coordinates and preset projection information;
calculating indexing information of a region-of-interest tile corresponding to the mapped region of interest; and
transmitting video data for the 360-degree virtual reality space and second signaling data based on the indexing information.

The video transmission method of claim 1,
wherein the second signaling data includes the projection information and information on the region of interest mapped based on the projection information.

The video transmission method of claim 1,
wherein the region-of-interest type information indicates either a viewport based on a head-mounted display or a viewport based on the user's viewpoint.

The video transmission method of claim 1,
further comprising preparing basic-quality video data for at least a portion of the 360-degree virtual reality space and high-quality video data for the region-of-interest tile,
wherein the video data for the 360-degree virtual reality space includes the basic-quality video data and the high-quality video data.

The video transmission method of claim 4,
wherein the basic-quality video data includes base layer video data for the 360-degree virtual reality space, and
the high-quality video data includes base layer video data and enhancement layer video data for the region-of-interest tile.

The video transmission method of claim 1,
wherein the spherical coordinates are represented by the yaw coordinate and pitch coordinate of a second center point, at which the line segments connecting the first center points of opposite sides, among the center points of the four sides of the region of interest, intersect, and by horizontal length information and vertical length information relative to the second center point.

The video transmission method of claim 3,
wherein the first signaling data and the second signaling data are generated based on image configuration information, and
the image configuration information includes gaze information indicating the viewport within the 360-degree virtual reality space and zoom area information indicating the user's viewing angle.

The video transmission method of claim 7,
wherein the first signaling data and the second signaling data are transmitted through at least one of a high-level syntax protocol carrying session information, supplemental enhancement information (SEI), video usability information (VUI), a slice header, and a file describing the video data.

The video transmission method of claim 1,
wherein the preset projection information indicates one of Equi-Rectangular Projection (ERP), Cube Map Projection (CMP), Adjusted Equal-Area Projection (AEP), Octahedron Projection (OHP), Icosahedron Projection (ISP), Truncated Square Pyramid (TSP), Segmented Sphere Projection (SSP), Adjusted Cube Map Projection (ACP), Rotated Sphere Projection (RSP), Equatorial Cylindrical Projection (ECP), and Equi-Angular Cube Map (EAC).

A video transmission device comprising:
a communication unit that receives first signaling data including spherical coordinates of a user's region of interest within a 360-degree virtual reality space and region-of-interest type information based on a type of the region of interest, and that transmits a generated bitstream;
a controller that maps the region of interest onto a two-dimensional plane based on the spherical coordinates and preset projection information;
a signaling data generation unit that generates second signaling data including indexing information of a region-of-interest tile corresponding to the mapped region of interest; and
a multiplexer that generates the bitstream including video data for the 360-degree virtual reality space and the second signaling data.

The video transmission device of claim 10,
further comprising an encoder that generates basic-quality video data for at least a portion of the 360-degree virtual reality space and high-quality video data for the region-of-interest tile.

The video transmission device of claim 11,
wherein the basic-quality video data includes base layer video data for at least a portion of the 360-degree virtual reality space, and
the high-quality video data includes base layer video data and enhancement layer video data for the region-of-interest tile.

The video transmission device of claim 10,
wherein the region-of-interest type information indicates either a viewport based on a head-mounted display or a viewport based on the user's viewpoint.

The video transmission device of claim 10,
wherein the spherical coordinates are represented by the yaw coordinate and pitch coordinate of a second center point, at which the line segments connecting the first center points of opposite sides, among the center points of the four sides of the region of interest, intersect, and by horizontal length information and vertical length information relative to the second center point.

A video playback method performed by a video playback device including a processor, the method comprising:
transmitting first signaling data including spherical coordinates of a user's region of interest within a 360-degree virtual reality space and region-of-interest type information based on a type of the region of interest;
receiving second signaling data and video data for the 360-degree virtual reality space; and
playing back video based on the video data for the 360-degree virtual reality space and the second signaling data,
wherein the second signaling data includes indexing information of a region-of-interest tile corresponding to the region of interest mapped onto a two-dimensional plane based on the spherical coordinates, the region-of-interest type information, and preset projection information, and
the video data for the 360-degree virtual reality space includes basic-quality video data for at least a portion of the 360-degree virtual reality space and high-quality video data for the region-of-interest tile.

The video playback method of claim 15,
wherein playing back video based on the video data for the 360-degree virtual reality space and the second signaling data comprises
rendering and displaying video for a user viewport including the region of interest based on the video data for the 360-degree virtual reality space and the second signaling data.

A computer program stored in a medium to cause a computing device to execute operations of:
receiving first signaling data including spherical coordinates of a user's region of interest within a 360-degree virtual reality space and region-of-interest type information based on a type of the region of interest;
mapping the region of interest onto a two-dimensional plane based on the spherical coordinates and preset projection information;
calculating indexing information of a region-of-interest tile corresponding to the mapped region of interest; and
transmitting video data for the 360-degree virtual reality space and second signaling data based on the indexing information.