WO2018131803A1

WO2018131803A1 - Method and apparatus for transmitting stereoscopic video content

Info

Publication number: WO2018131803A1
Application number: PCT/KR2017/014742
Authority: WO
Inventors: Byeong-Doo Choi (최병두)
Original assignee: Samsung Electronics Co., Ltd. (삼성전자 주식회사)
Priority date: 2017-01-10
Filing date: 2017-12-14
Publication date: 2018-07-19

Abstract

A method for transmitting stereoscopic video content according to the present disclosure comprises the steps of: generating, on the basis of data of a stereoscopic video which includes a plurality of omnidirectional videos having parallax, a first frame comprising a plurality of first views projected from the plurality of omnidirectional videos; generating a second frame comprising a plurality of second views by packing, on the basis of region-wise packing information, a plurality of first regions included in the plurality of first views; and transmitting data on the generated second frame, wherein the plurality of second views include a plurality of second regions corresponding to the plurality of first regions, and the region-wise packing information includes information on shape, orientation, or transformation for each of the plurality of second regions.

Description

Method and apparatus for transmitting stereoscopic video content

The present disclosure relates to a method and apparatus for packing data of stereoscopic omni-directional video.

The Internet is evolving from a human-centered network in which humans create and consume information into an Internet of Things (IoT) network in which distributed components such as things exchange and process information. Internet of Everything (IoE) technology is one example, in which big data processing technology based on connection with a cloud server is combined with IoT technology.

Implementing IoT requires technical elements such as sensing technology, wired/wireless communication and network infrastructure, service interface technology, and security technology. Recently, technologies for connecting things, such as sensor networks, machine-to-machine (M2M) communication, and machine type communication (MTC), have been studied.

In an IoT environment, intelligent Internet technology (IT) services that create new value in human life by collecting and analyzing data generated from connected things can be provided. Through the convergence and combination of existing IT and various industries, IoT can be applied to fields such as smart homes, smart buildings, smart cities, smart cars or connected cars, smart grids, health care, smart home appliances, and advanced medical services. Meanwhile, content for implementing IoT is also evolving. That is, as content continues to evolve from black-and-white content through high definition (HD) and ultra high definition (UHD) television to the recent standardization of high dynamic range (HDR) content, research is in progress on virtual reality (VR) content that can be played on VR devices such as Oculus and Samsung Gear VR. The fundamental basis of a VR system is to monitor the user so that the user can provide feedback input to a content display device or processing unit using any kind of controller, and the device or unit processes that input and adjusts the content accordingly, which enables interaction.

Basic components of the VR ecosystem include, for example, head mounted displays (HMDs), wireless and mobile VR, TVs, cave automatic virtual environments (CAVEs), peripherals and haptics [other controllers for providing input to VR], content capture [cameras, video stitching], content studios [games, live, film, news, and documentary], industrial applications [education, healthcare, real estate, construction, travel], production tools and services [3D engines, processing power], and app stores [for VR media content].

However, capturing, encoding, and transmitting 360-degree video content, which is performed to construct VR content, faces many challenges without the implementation of the next-generation high efficiency video coding (HEVC) codec, which can be designed specifically for 3D, 360-degree content.

Accordingly, there is a need for a method of more efficiently constructing and consuming VR content.

The present disclosure proposes a method and apparatus for packing data of stereo omni-directional video.

The present disclosure also proposes a trapezoid-based region-wise packing method.

In addition, the present disclosure proposes a packing method of an omnidirectional fisheye image.

A method for packing stereoscopic video content according to an aspect of the present disclosure includes: projecting, based on stereoscopic image data including a plurality of monoscopic images having parallax, a first frame comprising a plurality of first views corresponding to the plurality of monoscopic images; signaling region-wise packing information; sampling, based on the region-wise packing information, a plurality of second regions included in a plurality of second views from a plurality of first regions included in the plurality of first views; and packing, based on the region-wise packing information, a second frame including the plurality of second views, wherein each of the plurality of first views includes a 360 degree image or a portion of a 360 degree image.

A method for transmitting stereoscopic video content according to the present disclosure includes: generating, based on data of a stereoscopic image including a plurality of omnidirectional images having parallax, a first frame comprising a plurality of first views projected from the plurality of omnidirectional images; generating a second frame including a plurality of second views by packing, based on region-wise packing information, a plurality of first regions included in the plurality of first views; and transmitting data on the generated second frame, wherein the plurality of second views include a plurality of second regions corresponding to the plurality of first regions, and the region-wise packing information includes information on the shape, orientation, or transformation of each of the plurality of second regions.

An apparatus for transmitting stereoscopic video content according to the present disclosure includes: a memory; a transceiver; and at least one processor coupled to the memory and the transceiver, wherein the at least one processor is configured to generate, based on data of a stereoscopic image including a plurality of omnidirectional images having parallax, a first frame including a plurality of first views projected from the plurality of omnidirectional images, to generate a second frame including a plurality of second views by packing, based on region-wise packing information, a plurality of first regions included in the plurality of first views, and to transmit data on the generated second frame, wherein the plurality of second views include a plurality of second regions corresponding to the plurality of first regions, and the region-wise packing information includes information on the shape, orientation, or transformation of each of the plurality of second regions.

FIG. 1 is an exemplary view for explaining the configuration of a computer system that implements a stereo omnidirectional image packing method according to the present invention.

FIG. 2 illustrates a left-right stereoscopic 360 format according to the present disclosure, and FIG. 3 illustrates a top-bottom stereoscopic 360 format.

FIG. 4 illustrates image stitching, projection, and region-wise packing of a single acquisition time instance.

FIG. 5 is an exemplary view for explaining a no region-wise packing method according to the present disclosure.

FIG. 6 is an exemplary diagram for explaining a separate and independent packing method according to the present disclosure.

FIG. 7 is an exemplary view for explaining a separate and mirroring packing method according to the present disclosure.

FIG. 8 is an exemplary diagram for explaining a mixed and independent packing method according to the present disclosure.

FIG. 9 is an exemplary view for explaining a mixed and pair-wise packing method according to the present disclosure.

FIG. 10 is an exemplary view for explaining a packing method for a regular polyhedral projection image according to the present disclosure.

FIG. 11 is an exemplary view for explaining a region-wise packing method using a triangular patch according to the present disclosure.

FIG. 12 is an exemplary view for explaining the layout of the left and right regions used in the no region-wise packing method according to the present disclosure.

FIG. 13 is an exemplary view for explaining the layout of the top and bottom regions used in the no region-wise packing method according to the present disclosure.

FIG. 14 shows the shape of a patch according to patch_shape of the present disclosure.

FIG. 15 is an exemplary diagram for explaining a region-wise packing method of resizing and rearranging regions according to latitude in an equirectangular projection (ERP) according to the present disclosure.

FIG. 16 is an exemplary diagram for explaining region-wise packing for a cube projection for viewport dependent streaming according to the present disclosure.

FIG. 17 is an exemplary diagram for explaining an embodiment of a method of packing an ERP image according to the present disclosure.

FIG. 18 is an exemplary diagram for describing a method of packing an ERP image according to the present disclosure.

FIG. 19 is an exemplary diagram for explaining a method of converting an equirectangular projection according to the present disclosure into a cube-like layout.

FIG. 20 is an exemplary diagram for explaining another embodiment of converting an equirectangular projection according to the present disclosure into a cube-like layout.

FIG. 21 is an exemplary diagram for describing a method of converting an ERP image into a cube-like ERP according to the present disclosure.

FIG. 22 is an exemplary view for explaining a TSP packing method according to the present disclosure.

FIG. 23 is an exemplary view for explaining an embodiment of a TSP packing method according to the present disclosure.

FIG. 24 is an exemplary view for explaining another embodiment of a TSP packing method according to the present disclosure.

FIG. 25 is an illustration of a typical fisheye video comprising two circular images in accordance with the present disclosure.

FIG. 26A is an exemplary diagram of stereoscopic fisheye video in a top-bottom stereo format according to the present disclosure.

FIG. 26B is an illustration of stereoscopic fisheye video in a left-right stereo format according to the present disclosure.

FIG. 27 is an exemplary diagram of stereoscopic fisheye video having a pair-wise format for multiview according to the present disclosure.

FIG. 28 is an exemplary diagram of stereoscopic fisheye video having a group-wise format for multiview according to the present disclosure.

FIG. 29 is an exemplary diagram for describing a fisheye camera according to the present disclosure.

FIG. 30 shows a displayed FOV for two fisheye images, in a fisheye camera according to the present disclosure.

FIG. 31 illustrates a displayed FOV and an overlapped FOV for multiple fisheye images, in a fisheye camera according to the present disclosure.

FIG. 32 is an exemplary view for explaining the center of a fisheye camera according to the present disclosure.

FIG. 33 is an exemplary diagram for describing parameters regarding a local field of view according to the present disclosure.

FIG. 34 is an illustration of a local viewing angle in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION

Hereinafter, the operating principle of a preferred embodiment of the present disclosure will be described in detail with reference to the accompanying drawings. Like reference numerals are used to designate like elements even though they are shown in different drawings, and detailed descriptions of related well-known functions or configurations will be omitted if it is determined that they would obscure the subject matter of the present disclosure. In addition, the terms described below are defined in consideration of their functions in the present disclosure and may vary according to the intention or custom of a user or operator. Therefore, their definitions should be based on the contents of the entire specification.

The present disclosure may be variously modified and may have various embodiments, and specific embodiments will be described in detail with reference to the drawings. However, this is not intended to limit the present disclosure to the specific embodiments, and it should be understood to include all changes, equivalents, and substitutes included in the spirit and scope of the present disclosure.

In addition, it is to be understood that the singular forms “a” and “an” include plural expressions unless the context clearly indicates otherwise. Thus, as an example, a “component surface” includes one or more component surfaces.

In addition, terms including ordinal numbers such as first and second may be used to describe various components, but the components are not limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present disclosure, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component. The term “and/or” includes a combination of a plurality of related items or any one of a plurality of related items.

Also, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. Singular expressions include plural expressions unless the context clearly indicates otherwise. As used herein, the terms "comprise" or "have" are intended to indicate the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, unless otherwise defined in the embodiments of the present disclosure, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by those skilled in the art to which the present disclosure belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the embodiments of the present disclosure.

According to various embodiments of the present disclosure, an electronic device may include a communication function. For example, the electronic device may be a smartphone, a tablet personal computer (PC), a mobile phone, a video phone, an e-book reader, a desktop PC, a laptop PC, a netbook PC, a personal digital assistant (PDA), a portable multimedia player (PMP), an MP3 player, a mobile medical device, a camera, or a wearable device (e.g., a head-mounted device (HMD), electronic clothing, an electronic bracelet, an electronic necklace, an electronic accessory, an electronic tattoo, or a smart watch).

According to various embodiments of the present disclosure, the electronic device may be a smart home appliance having a communication function. For example, the smart home appliance may be a television, a digital video disk (DVD) player, an audio system, a refrigerator, an air conditioner, a vacuum cleaner, an oven, a microwave oven, a washer, a dryer, an air purifier, a set-top box, a TV box (e.g., Samsung HomeSync™, Apple TV™, or Google TV™), a gaming console, an electronic dictionary, a camcorder, an electronic photo frame, or the like.

According to various embodiments of the present disclosure, an electronic device may be a medical device (e.g., a magnetic resonance angiography (MRA) device, a magnetic resonance imaging (MRI) device, a computed tomography (CT) device, an imaging device, or an ultrasound device), a navigation device, a global positioning system (GPS) receiver, an event data recorder (EDR), a flight data recorder (FDR), an automotive infotainment device, a marine electronic device (e.g., a marine navigation device, gyroscope, or compass), avionics, a security device, an industrial or consumer robot, or the like.

According to various embodiments of the present disclosure, an electronic device may be furniture, part of a building/structure, an electronic board, an electronic signature receiving device, a projector, or any of various measurement devices (e.g., water, electricity, gas, or electromagnetic wave measuring devices) that include a communication function.

According to various embodiments of the present disclosure, the electronic device may be a combination of devices as described above. In addition, it will be apparent to those skilled in the art that the electronic device according to the preferred embodiments of the present disclosure is not limited to the device as described above.

According to various embodiments of the present disclosure, a device for transmitting and receiving VR content may be, for example, an electronic device.

Hereinafter, terms used in the embodiments of the present disclosure are defined as follows. An image may be a video, a still image, or the like, and image content may include various multimedia content, including video and still images as well as related audio, subtitles, and the like. VR content includes image content that provides the image as a 360 degree image, a 3D image, or the like. A media file format may be a format according to one of various media-related standards, such as the International Organization for Standardization (ISO) base media file format (ISOBMFF). Projection refers to a process in which a spherical image for representing a 360 degree image or the like is projected onto a planar surface, or to the image frame resulting from that process. Mapping refers to a process in which image data on the plane according to the projection is mapped onto a 2D plane, or to the image frame resulting from that process. Omnidirectional media includes, for example, images or video, and/or related audio, that can be rendered according to the direction of the user's head movement when the user uses an HMD, or according to the user's viewport. The viewport may be referred to as a field of view (FOV) and refers to the region of an image that is displayed to the user at a specific point in time, where the region of the image may be a region of the spherical image.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Meanwhile, the packing method of the stereo omnidirectional image according to the embodiment of the present invention may be implemented in a computer system or recorded on a recording medium. As shown in FIG. 1, a computer system may include at least one processor 110 and a memory 120.

The processor 110 may be a central processing unit (CPU) or a semiconductor device that processes instructions stored in the memory 120.

The processor 110 may be a controller that controls all operations of the computer system 100. The controller may carry out the operations of the computer system 100 by reading and executing the program code stored in the memory 120.

Computer system 100 may include a user input device 150, a data communication bus 130, a user output device 160, and a storage 140. Each of the above components may be in data communication via the data communication bus 130.

The computer system can further include a network interface 170 coupled to the network 180.

Memory 120 and storage 140 may include various types of volatile or nonvolatile storage media. For example, the memory 120 may include a ROM 123 and a RAM 126. Storage 140 may include non-volatile memory such as magnetic tape, a hard disk drive (HDD), a solid state drive (SSD), an optical data device, and flash memory.

Therefore, the packing method of the stereo omnidirectional image according to the embodiment of the present invention may be implemented by a computer executable method. When a method of packing stereo omnidirectional images according to an embodiment of the present invention is performed in a computer device, computer readable instructions may perform the operating method according to the present invention.

Meanwhile, the above-described packing method of stereo omnidirectional image according to the present invention may be implemented as computer readable codes on a computer readable recording medium. Computer-readable recording media include all kinds of recording media that store data readable by a computer system. Examples include a read only memory (ROM), a random access memory (RAM), a magnetic tape, a magnetic disk, a flash memory, and an optical data storage device. The computer-readable recording medium can also be distributed over computer systems connected by a computer communication network, so that the code is stored and executed in a distributed fashion.

In the present disclosure, a region-based packing method for a stereoscopic 360 image is proposed.

In addition, a generalized region packing method using a plurality of patches is proposed. Many researchers and practitioners are working on the various layouts for each projection. Depending on the type of layout, it has been found that coding efficiency can be significantly improved. Each region indicated by a particular patch can be resampled and relocated from the projected frame to the packed frame. Thus, the patch specifies the area of the image data to be packed.

Three parameters are proposed so that regions corresponding to the various faces of three-dimensional geometries (e.g., hexahedron, octahedron, icosahedron) can be specified. The three parameters are patch_shape, patch_orientation, and patch_transform. patch_shape represents the shape of a patch, that is, a rectangle, an isosceles triangle, a right triangle, or the like. Herein, a patch may mean each region included in each view of the packed frame, or each region included in each view of the projected frame. patch_orientation indicates the rotation and flip of the patch shape, that is, the orientation of the shape. patch_transform indicates the rotation and flip of the image data specified by the patch. In addition, a region-wise packing method is proposed.

In international standardization meetings on the omnidirectional media application format (OMAF), monoscopic and stereoscopic representations have been agreed upon. Many VR players and service platforms can play and deliver stereoscopic 360 video. Depending on the stereoscopic format, left-right stereoscopic 360 video and top-bottom stereoscopic 360 video can be supported.

FIG. 2 illustrates a left-right stereoscopic 360 format according to the present disclosure, and FIG. 3 illustrates a top-bottom stereoscopic 360 format.

A region-wise packing method according to the present disclosure is proposed. The region-wise packing method can flexibly subdivide the projected frame into a plurality of regions. Each region can be resized and relocated in the packed frame. Hereinafter, region-wise packing methods for both monoscopic 360 video and stereoscopic 360 video will be described.
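As a minimal sketch of how the three patch parameters might be held together and how a rotation-and-flip could act on the image data of a patch, consider the following. The coded values of patch_shape, patch_orientation, and patch_transform are not given in this passage, so the enum labels, the `RotationFlip` type, the 90-degree rotation step, and the flip-before-rotate order are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class PatchShape(Enum):
    # Hypothetical labels; the actual coded values are not stated here.
    RECTANGLE = "rectangle"
    ISOSCELES_TRIANGLE = "isosceles_triangle"
    RIGHT_TRIANGLE = "right_triangle"


@dataclass(frozen=True)
class RotationFlip:
    quarter_turns: int = 0  # clockwise 90-degree steps (assumed granularity)
    flip: bool = False      # horizontal flip, applied before rotating (assumed order)


@dataclass(frozen=True)
class Patch:
    patch_shape: PatchShape
    patch_orientation: RotationFlip  # rotation/flip of the shape itself
    patch_transform: RotationFlip    # rotation/flip of the image data in the patch


def apply_transform(block, t):
    """Return a rotated/flipped copy of a 2D list of samples."""
    out = [list(row) for row in block]
    if t.flip:
        out = [row[::-1] for row in out]
    for _ in range(t.quarter_turns % 4):
        # One clockwise quarter turn: reverse the rows, then transpose.
        out = [list(col) for col in zip(*out[::-1])]
    return out


patch = Patch(PatchShape.RECTANGLE, RotationFlip(), RotationFlip(quarter_turns=1))
rotated = apply_transform([[1, 2], [3, 4]], patch.patch_transform)
```

Keeping patch_orientation and patch_transform as separate fields mirrors the distinction drawn in the text between orienting the shape and transforming the image data it specifies.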

In the case of monoscopic 360 degree video, the input images of one time instance are stitched to produce a projected frame representing one view. In the case of stereoscopic 360 degree video, the input images of one time instance are stitched to produce a projected frame representing two views (one for each eye). Both views are mapped to the same packed frame and encoded by a conventional 2D (2 dimensional) video encoder.

Optionally, each view of the projected frame may be mapped to its own packed frame. The sequence of packed frames of the left view or the right view may be coded independently or, when a multiview video encoder is used, may be predicted from the other view.

Although the region-wise packing method and the stereo 360 video format have been agreed upon, specific parameters defining the layout of the stereo 360 video format have not yet been proposed or adopted. This disclosure proposes several types for defining the layout of stereoscopic 360 video in packed frames. Each type has its own advantages. For example, the mixed and independent packing method can achieve good coding efficiency for the left view and the right view, whereas for tile-based delivery in viewport dependent streaming, it is appropriate to pack the left and right views in pairs. The syntax and semantics of region-wise packing will be described later.

Referring to FIG. 4, the images (B_i) of the same time instance are stitched, projected, and mapped to the packed frame (D). FIG. 4 is a schematic diagram of the image stitching, projection, and region-wise packing process. The input images (B_i) are stitched and projected onto a three-dimensional projection structure such as a sphere or a cube. The image data on the projection structure is further arranged on the two-dimensional projected frame (C). The format of the two-dimensional projected frame is indicated by a projection format indicator defined in coding independent media description code points (CICP) or the omnidirectional media application format (OMAF).

Optional region-wise packing is applied to map the two-dimensional projected frame (C) onto one or more packed frames (D). If no region-wise packing is applied, the packed frame is identical to the projected frame. Otherwise, the regions of the projected frame are mapped onto the one or more packed frames (D) by indicating the location, shape, and size of each region in the one or more packed frames (D). In practice, the input images are converted into packed frames in a single process without intermediate steps.
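The mapping from projected frame to packed frame can be sketched for rectangular regions as follows. This is only an illustration: the `(x, y, width, height)` rectangle encoding, the function names, and the nearest-neighbour resampling are assumptions, since the text does not name a resampling filter.

```python
def pack_region(projected, packed, src, dst):
    """Resample rectangle src of the projected frame into rectangle dst of the packed frame.

    src and dst are (x, y, width, height) tuples; frames are 2D lists of samples.
    """
    sx, sy, sw, sh = src
    dx, dy, dw, dh = dst
    for j in range(dh):
        for i in range(dw):
            # Nearest-neighbour resampling (assumed, for simplicity).
            u = sx + (i * sw) // dw
            v = sy + (j * sh) // dh
            packed[dy + j][dx + i] = projected[v][u]


def pack_frame(projected, packed_size, regions):
    """Build a packed frame from (src, dst) region pairs."""
    width, height = packed_size
    packed = [[None] * width for _ in range(height)]
    for src, dst in regions:
        pack_region(projected, packed, src, dst)
    return packed


projected_frame = [[0, 1, 2, 3],
                   [4, 5, 6, 7]]
# Keep the left half as it is and shrink the right half to one column.
regions = [((0, 0, 2, 2), (0, 0, 2, 2)),
           ((2, 0, 2, 2), (2, 0, 1, 2))]
packed_frame = pack_frame(projected_frame, (3, 2), regions)
```

With a single region covering the whole frame at its original size, the packed frame equals the projected frame, which corresponds to the case where no region-wise packing is applied.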

Various layouts for stereoscopic 360 video packing are described below.

In stereoscopic 360 video packing, both the left view and the right view may be packed in the same packed frame. When the stereoscopic formats of the left view and the right view are the same, each view in its native layout may be placed in the left or right region. If region-wise packing is applied to each view or to both views, various layouts are possible for each embodiment. In order to define each type of stereoscopic layout, two parameters are employed: stereo_format and stereo_packing_type. The stereo_format parameter is an indicator that specifies a stereoscopic format, such as side-by-side or top-bottom.

stereo_packing_type defines the layout type for stereoscopic region-wise packing. The layout type relates to whether the positions of the regions belonging to the left view or the right view are separated, mixed, independent, or correspond to each other.

Each stereo_packing_type has advantages in terms of coding efficiency and functionality. The following figures assume the left-right stereoscopic 360 format.
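The two parameters and the distinctions they draw can be summarized in a small sketch. The enum labels below are hypothetical names for the formats and the five layout types described in this document; the coded values of stereo_format and stereo_packing_type are not given here, and the two helper functions are illustrative only.

```python
from enum import Enum


class StereoFormat(Enum):
    # Hypothetical labels; the coded values of stereo_format are not stated here.
    LEFT_RIGHT = "left_right"
    TOP_BOTTOM = "top_bottom"


class StereoPackingType(Enum):
    # Hypothetical labels for the five layout types described in the text.
    NO_REGION_WISE = "no_region_wise"
    SEPARATE_INDEPENDENT = "separate_independent"
    SEPARATE_MIRRORING = "separate_mirroring"
    MIXED_INDEPENDENT = "mixed_independent"
    MIXED_PAIR_WISE = "mixed_pair_wise"


def views_are_separated(packing_type):
    """True when each view stays within its own half of the packed frame."""
    return packing_type in (
        StereoPackingType.NO_REGION_WISE,
        StereoPackingType.SEPARATE_INDEPENDENT,
        StereoPackingType.SEPARATE_MIRRORING,
    )


def second_view_params_signaled(packing_type):
    """True only when the second view carries its own region-wise packing parameters."""
    return packing_type in (
        StereoPackingType.SEPARATE_INDEPENDENT,
        StereoPackingType.MIXED_INDEPENDENT,
    )
```

The second helper reflects the bit savings described for the mirroring and pair-wise types, where the parameters of one view can be derived from those of the other.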

No region-wise packing

It is possible to use the native layout without applying region-wise packing.

If stereo_packing_type corresponds to no region-wise packing, each projected frame using the native layout is placed in the left and right regions without shuffling. Packing with the native layout is the simplest layout and an efficient way to quickly extract and render each view. Since the projected frame and the packed frame are the same, the data structure of the image data is not changed.
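A minimal sketch of this case, assuming frames are 2D lists of samples and using hypothetical function names and format strings, shows why extraction is fast: each view is recovered by simple slicing.

```python
def pack_native(left, right, stereo_format="left_right"):
    """Place two views in one frame without any region-wise packing."""
    if stereo_format == "left_right":
        return [list(l) + list(r) for l, r in zip(left, right)]
    if stereo_format == "top_bottom":
        return [list(row) for row in left] + [list(row) for row in right]
    raise ValueError("unknown stereo format: " + stereo_format)


def extract_views(packed, stereo_format="left_right"):
    """Recover the two views by slicing the packed frame in half."""
    if stereo_format == "left_right":
        half = len(packed[0]) // 2
        return [row[:half] for row in packed], [row[half:] for row in packed]
    if stereo_format == "top_bottom":
        half = len(packed) // 2
        return packed[:half], packed[half:]
    raise ValueError("unknown stereo format: " + stereo_format)


left_view = [[1, 2], [3, 4]]
right_view = [[5, 6], [7, 8]]
side_by_side = pack_native(left_view, right_view, "left_right")
stacked = pack_native(left_view, right_view, "top_bottom")
```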

Separate and independent packing method

If stereo_packing_type is separate and independent packing, each projected frame having the native layout of the projection may be placed in the left and right regions.

Then, each half frame corresponding to each view is internally reorganized by region-wise packing. Each view is separated, but the local regions included in each view are resampled and placed in the half packed frame corresponding to the same view. The separate and independent packing layout is effective for fast extraction and coding efficiency. However, each view has to be recognized for rendering after being decoded.

Separate and mirroring packing

If stereo_packing_type is separate and mirroring packing, each projected frame having the native layout of the projection may be placed in the left and right regions.

Then, each half frame corresponding to each view is internally reorganized by region-wise packing. Thus, each view is separated, but the local regions included in each view are resampled and placed in the half packed frame corresponding to the same view. The difference from separate and independent packing is that the region-wise packing of one view is identical to the region-wise packing of the other view. Compared with separate and independent packing, bits can be saved: since the region-wise packing parameters of one view are the same as those of the other view, the parameters of one of the views do not need to be signaled.
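The bit saving can be illustrated with a sketch in which only the left-view regions are given and the right-view regions are derived from them. It assumes a left-right format, `(x, y, width, height)` rectangles, and hypothetical parameter names for the width of one view in the projected frame and in the packed frame.

```python
def mirror_regions(left_regions, projected_view_width, packed_view_width):
    """Derive right-view (src, dst) rectangles from the left-view ones.

    The same packing is reused; only the horizontal offsets into the right
    half of the projected frame and of the packed frame are added.
    """
    mirrored = []
    for (sx, sy, sw, sh), (dx, dy, dw, dh) in left_regions:
        src = (sx + projected_view_width, sy, sw, sh)
        dst = (dx + packed_view_width, dy, dw, dh)
        mirrored.append((src, dst))
    return mirrored


# One left-view region: a 4x2 area downsampled to 2x2 in the packed frame.
left_regions = [((0, 0, 4, 2), (0, 0, 2, 2))]
right_regions = mirror_regions(left_regions, projected_view_width=4, packed_view_width=2)
```

Under these assumptions a receiver needs only `left_regions` plus the two view widths to reconstruct the full layout.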

Mixed and independent packing method

If stereo_packing_type is mixed and independent packing, each region of the projected frame of one view is resampled and placed at a particular location of the packed frame. There is no restriction on reorganizing the left and right frames projected onto the same packed frame. The advantage of the mixed and independent packing method is coding efficiency: an optimum layout with full flexibility in terms of compression can be found. However, extracting a view from the packed frame is complicated, and the view must be recognized for rendering.

Mixed and pair-wise packing

If stereo_packing_type is mixed and pair-wise packing, each region of the projected frame of the left view is resampled and placed at a specific position of the packed frame. The corresponding region (same location, same size) of the projected frame of the right view is then sampled identically to the left view and is located to the right of the packed region of the left view. (When top-bottom stereoscopic is used, the right view region can be located below the packed region of the left view.) The main advantage of pair-wise packing is that the left and right regions of the projected frames are always located in pairs. Thus, it is suitable for tile based delivery and rendering. A region packed pair-wise may be a tile. When specific tiles that depend on the current viewport are delivered, the stereoscopic views can always be displayed because each tile includes a left view and a right view. Bits representing the region-wise packing parameters for the right view are saved as well.
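The placement rule for pair-wise packing can be sketched as follows, assuming `(x, y, width, height)` rectangles and a hypothetical dictionary layout for each pair. The caller is assumed to space the left-view destinations so that the paired regions do not overlap.

```python
def pair_wise_regions(left_regions, top_bottom=False):
    """Place each right-view region next to its left-view counterpart.

    The source rectangle is shared, because the right-view region has the same
    location and size within its own projected view.
    """
    pairs = []
    for src, (dx, dy, dw, dh) in left_regions:
        if top_bottom:
            right_dst = (dx, dy + dh, dw, dh)  # directly below the left region
        else:
            right_dst = (dx + dw, dy, dw, dh)  # directly to the right
        pairs.append({"src": src, "left_dst": (dx, dy, dw, dh), "right_dst": right_dst})
    return pairs


# Two left-view regions, spaced so each pair forms a 4-sample-wide tile.
left_regions = [((0, 0, 4, 4), (0, 0, 2, 2)),
                ((4, 0, 4, 4), (4, 0, 2, 2))]
tiles = pair_wise_regions(left_regions)
```

Each resulting pair covers a contiguous rectangle, which is what allows a pair to be delivered as a single tile containing both views.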

This disclosure will present multiple layouts of each projection to find the best layout in terms of coding efficiency and memory usage. By observing that the packed projection performs better, several methods for packing to remove projection redundancy can be compared to the native unfolding or unrolling method.

For icosahedron based projection (ISP), differences in compression efficiency between the native layout and the packed compact projection layout of 9.5% (all intra, AI), 4.9% (random access, RA), 3.0% (low delay B picture, LD-B), and 3.3% (low delay P picture, LD-P) were reported. For cube based projection (CMP), experimental results show that, in terms of coding efficiency, the 4x3 native layout surpasses the 3x2 compact layout on average by 1.4% (AI), 1.3% (RA), 1.7% (LD-B), and 1.7% (LD-P). No significant RD gain was found. Based on these results, triangle based packing for ISP is expected to be more efficient than square based packing for CMP.

General Area Packing

In order to determine what kind of packing method is required, it should be determined in advance which projection methods have been adopted in OMAF. However, in the present disclosure, within the scope of the pack verification experiments (PACK-VE), a generalized region-based packing method using a plurality of patches is proposed to enable a triangle-based packing method. It is assumed that some projection methods, including those based on triangle-faced polyhedra (octahedrons, icosahedrons), can be used in OMAF through the basic projection method, an optional projection method, or other extension mechanisms enabled by uniform resource indicators (URIs). A generalized packing that improves coding efficiency and reduces memory usage for cube based projection (CMP), octahedron based projection (OHP), icosahedron based projection (ISP), segmented sphere projection (SSP), and truncated square pyramid (TSP) would be preferred.

In the proposed region-specific packing method according to the present disclosure, each region indicated by a specific patch can be resampled and relocated from the projected frame to the packed frame. Thus, the patch is shaped to specify the image data to be packed. Three parameters (patch_shape, patch_orientation, patch_transform) are proposed so that regions corresponding to the faces of various three-dimensional geometries (e.g., cubes, octahedrons, icosahedrons) can be specified. patch_shape represents the patch shape (rectangle, isosceles triangle, right triangle, etc.), patch_orientation represents the rotation and flip of the patch shape that yield its various orientations, and patch_transform represents the rotation and flip of the image data specified by the patch.

FIG. 11 (a) is an exemplary diagram for describing the parameters of a triangular patch of a projected frame, and shows the top-left coordinates (proj_region_top_left_x, proj_region_top_left_y), width (proj_region_width), and height (proj_region_height) of a region included in the projected frame, together with the patch type (patch_type, patch_shape) and the patch orientation (patch_orientation). A patch type of 2 means that the patch is an isosceles triangle. A patch orientation of 2 means that the region is generated by rotating the region of the input image 90 degrees counterclockwise.

FIG. 11 (b) is an exemplary diagram for describing the parameters of a triangular patch of a packed frame, and shows the top-left coordinates (pack_region_top_left_x, pack_region_top_left_y), width (pack_region_width), and height (pack_region_height) of a region included in the packed frame, together with the patch transformation (patch_transform). A patch type of 2 means that the patch is an isosceles triangle. A patch transformation of 6 means that the region of the packed frame is generated by rotating the region of the projected frame 270 degrees counterclockwise.

5. Syntax

Table 1 is a syntax illustrating a data structure used to perform a stereoscopic region-specific packing method according to the present disclosure.

6. Semantics

Table 2 shows setting values of stereo_format for specifying a stereoscopic 360 video format.

value	stereo_format
0x00	Reserved
0x01	Left-right stereoscopic 360 format
0x02	Top-bottom stereoscopic 360 format
0x03-0xFF	Reserved

Table 3 shows setting values of stereo_packing_type for specifying a region-specific packing type for stereoscopic 360 video.

value	stereo_packing_type
0x00	Reserved
0x01	no region-wise packing (native)
0x02	separate and independent packing
0x03	separate and mirroring packing
0x04	mixed and independent packing
0x05	mixed and mirroring packing
0x06-0xFF	Reserved
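As an illustration of how a receiver might interpret the codes in Tables 2 and 3, the following Python sketch maps them to enumerations. The class names, member names, and the `parse_code` helper are hypothetical; treating reserved values as `None` is an assumption of this sketch, not something the disclosure specifies.

```python
from enum import IntEnum


class StereoFormat(IntEnum):
    """Values of stereo_format (Table 2); other values are reserved."""
    LEFT_RIGHT = 0x01
    TOP_BOTTOM = 0x02


class StereoPackingType(IntEnum):
    """Values of stereo_packing_type (Table 3); other values are reserved."""
    NATIVE = 0x01
    SEPARATE_INDEPENDENT = 0x02
    SEPARATE_MIRRORING = 0x03
    MIXED_INDEPENDENT = 0x04
    MIXED_MIRRORING = 0x05


def parse_code(enum_cls, value):
    """Return the enum member for a code, or None if the code is reserved."""
    try:
        return enum_cls(value)
    except ValueError:
        return None


print(parse_code(StereoPackingType, 0x04))
print(parse_code(StereoFormat, 0x00))
```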

If stereo_packing_type is 1, this specifies a projected frame having a basic layout of projections located in the left and right regions (or top and bottom regions) without shuffling.

If stereo_packing_type is 2, each projected frame with a basic layout is located in the left or right region. Then, each half frame corresponding to each view is internally reorganized by region-specific packing. Each view is separated, but the local regions included in each view are resampled and placed in the half packed frame corresponding to the same view. The separate-independent packing layout is effective for fast extraction and coding efficiency. However, each view has to be reorganized for rendering after being decoded.

If stereo_packing_type is 3, each projected frame having a basic layout of projection can be placed in the left or right region. Then, each half frame corresponding to each view is internally reorganized by region-specific packing. Thus, each view is separated, but the local regions included in each view are resampled and placed in the half packed frame corresponding to the same view. The difference from separate-independent packing is that the region-specific packing method of one view is identical to that of the other view.

If stereo_packing_type is 4, each region of the projected frame of one view is resampled and placed at a specific location of the packed frame. There is no restriction on how the left and right projected frames are reorganized within the same packed frame.

If stereo_packing_type is 5, each region of the projected frame of the left view is resampled and placed at a specific position of the packed frame. The corresponding region (same location, same size) of the projected frame of the right view is then sampled identically to the left view and is located to the right of the packed region of the left view. (When top-bottom stereoscopic is used, the right view region may be located below the packed region of the left view.)

FIG. 12 is an exemplary diagram for describing the layout of the left and right regions used in the region-wise packing method according to the present disclosure, and shows the projected frames and the layouts of the left and right regions of the packed frame when stereo_packing_type is no region-wise packing (native), separate and independent packing, separate and mirroring packing, mixed and independent packing, and mixed and mirroring packing.

FIG. 13 is an exemplary diagram for describing the layout of the upper and lower regions used in the region-wise packing method according to the present disclosure, and shows the projected frames and the layouts of the upper and lower regions of the packed frame when stereo_packing_type is no region-wise packing (native) (0x01), separate and independent packing (0x02), separate and mirroring packing (0x03), mixed and independent packing (0x04), and mixed and mirroring packing (0x05).

width_proj_frame is the width of the projected frame.

height_proj_frame means the height of the projected frame.

num_of_regions means the number of packed regions specified by the patch.

If uniform_region_size is 1, the projected frame is divided into regions of the same size specified by uniform_region_width and uniform_region_height. If uniform_region_size is 0, the size of the i-th region of the projected frame (i is an integer from 0 to num_of_regions-1) is specified by proj_region_width [i] and proj_region_height [i].

uniform_region_width and uniform_region_height specify each region of the projected frame with the same width and height.
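The case uniform_region_size equal to 1 can be sketched as follows. This is a minimal illustration, assuming that regions are enumerated in raster order and that the frame dimensions are exact multiples of the region dimensions; neither assumption is stated in the disclosure, and the function name is hypothetical.

```python
def uniform_regions(width_proj_frame, height_proj_frame,
                    uniform_region_width, uniform_region_height):
    """List (x, y, width, height) of equally sized regions in raster order."""
    if (width_proj_frame % uniform_region_width
            or height_proj_frame % uniform_region_height):
        # Assumption of this sketch: no partial regions at the frame edges.
        raise ValueError("frame size must be a multiple of the region size")
    return [(x, y, uniform_region_width, uniform_region_height)
            for y in range(0, height_proj_frame, uniform_region_height)
            for x in range(0, width_proj_frame, uniform_region_width)]


regions = uniform_regions(4096, 2048, 1024, 1024)
print(len(regions))   # number of regions, i.e. a candidate num_of_regions
print(regions[5])     # second region of the second row
```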

proj_region_width [i] and proj_region_height [i] specify the width and height of the i-th region of the projected frame.

patch_shape [i] specifies the shape of the i-th region to be rearranged into the packed frame.

Table 4 shows the shape of each area of the projected frame according to patch_shape.

value	patch_shape
0x00	Reserved
0x01	Rectangle
0x02	Isosceles triangle
0x03	Right-angled triangle
0x04-0xFF	Reserved

FIG. 14A shows the case where patch_shape is 0x01 (rectangle), FIG. 14B shows the case where patch_shape is 0x02 (isosceles triangle), and FIG. 14C shows the case where patch_shape is 0x03 (right-angled triangle).

patch_orientation [i] specifies the shape of the patch that has been rotated and flipped from the original patch shape (i-th area of the projected frame) indicated by patch_shape [i].

Table 5 shows the meaning of the rotation or flip according to patch_orientation [i].

value	meaning
0x00	Reserved
0x01	no rotation or flip
0x02	90 degrees rotation (counter-clockwise)
0x03	90 degrees rotation (counter-clockwise) after horizontal flip
0x04	180 degrees rotation (counter-clockwise)
0x05	180 degrees rotation (counter-clockwise) after horizontal flip
0x06	270 degrees rotation (counter-clockwise)
0x07	270 degrees rotation (counter-clockwise) after horizontal flip
0x08-0xFF	Reserved
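To make the code meanings of Table 5 concrete, the following Python sketch applies each code to a small two-dimensional array stored as a list of rows. The function names are hypothetical, and applying the operations to sample data (rather than to a patch outline) is done purely for illustration.

```python
def rot90_ccw(m):
    """Rotate a list-of-rows matrix 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*m)][::-1]


def hflip(m):
    """Flip a list-of-rows matrix horizontally (mirror left-right)."""
    return [list(row[::-1]) for row in m]


# code -> (horizontal flip first?, number of counter-clockwise quarter turns)
ORIENTATION = {
    0x01: (False, 0), 0x02: (False, 1), 0x03: (True, 1), 0x04: (False, 2),
    0x05: (True, 2), 0x06: (False, 3), 0x07: (True, 3),
}


def apply_orientation(m, code):
    """Apply the rotation/flip that a Table 5 code denotes."""
    if code not in ORIENTATION:
        raise ValueError("reserved code: %#04x" % code)
    flip, quarter_turns = ORIENTATION[code]
    out = hflip(m) if flip else [list(row) for row in m]
    for _ in range(quarter_turns):
        out = rot90_ccw(out)
    return out


print(apply_orientation([[1, 2], [3, 4]], 0x02))
```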

patch_transform [i] specifies the rotation and flip of the image data specified by patch_orientation [i] and patch_shape [i] to be rearranged into the packed frame.

Table 6 shows the meaning of rotation or flip according to patch_transform [i].

packed_region_width [i] and packed_region_height [i] specify the width and height of the packed region of the packed frame corresponding to the i-th region of the projected frame.

packed_region_top_left_x [i] and packed_region_top_left_y [i] specify the horizontal and vertical coordinates of the top-left corner of the packed region of the packed frame corresponding to the i-th region of the projected frame.

OMAF incorporates a region-wise packing method that removes redundant regions, thereby improving the coding efficiency of the projection. For example, the equirectangular projection (ERP) stretches each parallel of the sphere, transforming the sphere into a planar rectangular region. The amount of stretching increases sharply toward the poles.

Referring to FIG. 15, the coding efficiency of the projected frame may be improved by reducing the area of the polar regions.

For example, in the ERP, the first and fifth regions corresponding to the high-latitude regions (above 60 degrees or below -60 degrees) are sampled at a 1:3 ratio, the second and fourth regions corresponding to the mid-latitude regions (between 30 and 60 degrees, or between -60 and -30 degrees) are sampled at a 2:3 ratio, and the third region corresponding to the low-latitude region (between -30 and 30 degrees) is sampled at a 1:1 ratio. The packed frame may then be obtained by rearranging the sampled regions as shown in FIG. 15C.
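The effect of this latitude-dependent sampling can be checked with a short calculation. The Python sketch below assumes that each sampling ratio is applied only to the horizontal direction and that the band heights are unchanged; both are assumptions made for illustration, and the names are hypothetical.

```python
from fractions import Fraction

# (upper latitude, lower latitude, horizontal sampling ratio) for the 5 regions
BANDS = [
    (90, 60, Fraction(1, 3)),
    (60, 30, Fraction(2, 3)),
    (30, -30, Fraction(1, 1)),
    (-30, -60, Fraction(2, 3)),
    (-60, -90, Fraction(1, 3)),
]


def packed_band_sizes(width, height):
    """(width, height) in samples of each band after horizontal resampling."""
    return [(int(width * ratio), height * (top - bottom) // 180)
            for top, bottom, ratio in BANDS]


def packed_area_fraction():
    """Packed sample count as a fraction of the original ERP sample count."""
    return sum(Fraction(top - bottom, 180) * ratio
               for top, bottom, ratio in BANDS)


print(packed_band_sizes(3840, 1920))
print(packed_area_fraction())
```

Under these assumptions the packed regions hold two thirds of the samples of the original ERP frame.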

In viewport dependent streaming, only the current viewport region is encoded with high quality, and other regions are encoded with low quality, in order to reduce the bitrate of the projected frame. FIG. 16 shows an example of region-wise packing for a cube map projected frame consisting of a front face and five faces (left, right, back, top, and bottom) downsampled to 1/5.

These cases can generally be handled by mapping a rectangle to a rectangle. However, because the sampling rate changes significantly, rectangle-based mapping can cause discontinuities between subregions at the boundaries. These discontinuities reduce coding efficiency and cause visible artifacts. For this reason, more flexible region-specific packing is required in order to improve the coding efficiency of the projected frame.

Trapezoid based region-wise packing

In order to improve the flexibility of packing by region, we propose a rectangle-to-trapezoid mapping. The rectangle-to-trapezoid mapping enables various and effective area-specific packing methods. If the short edge is 1 pixel, it becomes a triangle.
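One way to picture a rectangle-to-trapezoid mapping is to resample every row of the rectangle to a width that varies linearly between the shorter and the longer base. The Python sketch below follows that idea; the linear width schedule and the nearest-neighbour resampling are assumptions chosen for brevity (a practical implementation would likely use a proper resampling filter), and the function names are hypothetical.

```python
def trapezoid_row_widths(width, height, short_base):
    """Row widths when the top edge shrinks to short_base and the bottom stays at width."""
    if height == 1:
        return [width]
    return [round(short_base + (width - short_base) * r / (height - 1))
            for r in range(height)]


def downsample_row(row, new_width):
    """Nearest-neighbour resampling of one row to new_width samples."""
    n = len(row)
    return [row[min(n - 1, (2 * i + 1) * n // (2 * new_width))]
            for i in range(new_width)]


def rect_to_trapezoid(rect, short_base):
    """Resample each row of a list-of-rows rectangle to its trapezoid width."""
    widths = trapezoid_row_widths(len(rect[0]), len(rect), short_base)
    return [downsample_row(row, w) for row, w in zip(rect, widths)]


rect = [list(range(8)) for _ in range(4)]
for row in rect_to_trapezoid(rect, short_base=1):   # short base of 1 gives a triangle
    print(row)
```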

FIG. 17 is an exemplary diagram for explaining a method of packing an ERP image according to the present disclosure.

Square ERP

As mentioned above, ERP creates extremely stretched polar regions. As a result, redundant pixels in the polar regions unnecessarily reduce the coding efficiency of the video.

FIG. 17 illustrates a region-specific packing approach that reduces the sampling rate of the polar regions of an equirectangular panorama. The projected frame is first divided into eight rectangular sub-regions, and, using line-wise downsampling, each region is converted into a triangular shape and rearranged to form a rectangular format.

Referring to the center of FIG. 17, one embodiment of a method of packing an ERP image according to the present disclosure drastically reduces the number of pixels in the polar regions while relatively preserving the equatorial region. Furthermore, the packed frame is represented by a rectangular layout without discontinuities between sub-regions, and blank pixels do not contain scene information.

According to the method as shown in FIG. 17, since the continuity is maintained at the boundary of each region when the ERP image is packed, distortion at the boundary can be reduced when depacking. When packing according to the method of FIG. 18, there is an advantage in that continuity of an image can be maintained at the boundary of each region.

Cube-like ERP

By using a method of mapping a rectangle into a triangle, an equirectangular projected frame can be converted into a cube-like layout.

Referring to FIG. 19, the top region and the bottom region (i.e., the polar regions) are each divided into four subregions, and each subregion is converted into a triangular region and relocated into a cube-like layout.

FIG. 19 is an example of a 4x3 cube map layout, and FIG. 20 is an example of a 3x2 cube map layout.

FIG. 21 shows an ERP image, a 4x3 cube map layout according to FIG. 19, and a 3x2 cube map layout according to FIG. 20.

2.3. Square pyramid packing method (truncated square pyramid, TSP)

By the TSP packing method, the cube map frame can be converted to a TSP.

Referring to FIG. 22, for example, the front may be a square sampled at a 1:1 ratio, the back may be a square sampled at a 1:9 ratio, and the right, left, top, and bottom may be trapezoids sampled at a 2:9 ratio.
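The following Python sketch adds up these sampling ratios. It assumes, for illustration only, that each ratio expresses the packed area of a face relative to one full-resolution cube face, and that each trapezoid has a long base equal to the front edge, a short base equal to the back edge (one third of the front edge), and a height of one third of the front edge. These geometric details are assumptions of the sketch, not statements of the disclosure.

```python
from fractions import Fraction

# Packed area of each face, in units of one full-resolution cube face.
TSP_RATIOS = {
    "front": Fraction(1, 1),
    "back": Fraction(1, 9),
    "right": Fraction(2, 9),
    "left": Fraction(2, 9),
    "top": Fraction(2, 9),
    "bottom": Fraction(2, 9),
}


def trapezoid_area(long_base, short_base, height):
    """Area of a trapezoid from its two parallel sides and its height."""
    return (long_base + short_base) * height / 2


total = sum(TSP_RATIOS.values())
print(total)                   # packed area in cube-face units
print(total / 6)               # fraction of the six-face cube map area
# Consistency check of the assumed trapezoid geometry for a unit front edge.
print(trapezoid_area(Fraction(1), Fraction(1, 3), Fraction(1, 3)))
```

Under these assumptions the packed frame occupies the area of two cube faces, one third of the original cube map.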

According to the TSP packing method of FIG. 22, there is an effect of reducing distortion at a boundary.

The parameters proposed by the present disclosure are described.

In order to support the proposed rectangle-to-trapezoid mapping, we propose to include the following parameters. The rectangular region of the packed frame is defined by four parameters: the horizontal and vertical coordinates of the top-left vertex (pack_reg_left, pack_reg_top) and the width and height (pack_reg_width, pack_reg_height).

Then, inside the rectangular region, the trapezoidal region is defined by setting one side of the rectangle as the shorter side of the trapezoid, which is represented by the offset information (pack_sb_offset) 2320 indicating the position of the start point 2310 and the length (pack_sb_length) 2330.

Referring to FIG. 24, another parameter pack_sb_indicator is defined to indicate which side is a short side. For example, if pack_sb_indicator is 1, the upper side may be shorter, if pack_sb_indicator is 2, the lower side may be shorter, if pack_sb_indicator is 3, the left side may be shorter, and if pack_sb_indicator is 4, the right side may be shorter.

Syntax

Table 7 shows the syntax for implementing the TSP packing method.

Semantics

proj_frame_width specifies the width of the projected frame.

proj_frame_height specifies the height of the projected frame.

number_of_regions specifies the number of subregions of the projected frame.

proj_reg_left [n] and proj_reg_top [n] specify the x and y coordinates of the upper left corner of the n-th rectangular subregion of the projected frame, and proj_reg_width [n] and proj_reg_height [n] specify the width and height of the n-th rectangular subregion of the projected frame.

pack_reg_left [n] and pack_reg_top [n] specify the x and y coordinates of the upper left corner of the n-th rectangular subregion of the packed frame, and pack_reg_width [n] and pack_reg_height [n] specify the width and height of the n-th rectangular subregion of the packed frame.

pack_sb_offset [n] specifies the distance from the upper left vertex of the nth rectangular sub-region of the projected frame to the start of the shorter side.

pack_sb_length [n] specifies the length of the shorter side of the nth rectangular subregion of the projected frame.

pack_sb_indicators [n] specifies the location with the shorter side of the nth trapezoidal subregion of the packed frame that corresponds to the nth rectangular subregion of the projected frame. If pack_sb_indicators [n] is greater than zero, the nth rectangular subregion of the projected frame is trapezoidal, and if pack_sb_indicators [n] is zero, it is rectangular. Table 8 shows the positions of the shorter sides according to pack_sb_indicators [n].

value	pack_sb_indicators [n]
0	no shorter base (rectangular region)
1	top side
2	bottom side
3	left side
4	right side
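The trapezoid parameters can be turned into corner coordinates as in the following Python sketch. It assumes that pack_sb_offset is measured from the upper left vertex along the side named by the indicator (horizontally for the top and bottom sides, vertically for the left and right sides); this reading, the vertex order, and the function name are assumptions made for illustration.

```python
def trapezoid_vertices(pack_reg_left, pack_reg_top, pack_reg_width,
                       pack_reg_height, pack_sb_indicator,
                       pack_sb_offset=0, pack_sb_length=0):
    """Return the four corners (x, y), clockwise, of the packed region."""
    l, t = pack_reg_left, pack_reg_top
    r, b = l + pack_reg_width, t + pack_reg_height
    o, n = pack_sb_offset, pack_sb_length
    if pack_sb_indicator == 0:   # no shorter base: plain rectangle
        return [(l, t), (r, t), (r, b), (l, b)]
    if pack_sb_indicator == 1:   # top side is the shorter base
        return [(l + o, t), (l + o + n, t), (r, b), (l, b)]
    if pack_sb_indicator == 2:   # bottom side is the shorter base
        return [(l, t), (r, t), (l + o + n, b), (l + o, b)]
    if pack_sb_indicator == 3:   # left side is the shorter base
        return [(l, t + o), (r, t), (r, b), (l, t + o + n)]
    if pack_sb_indicator == 4:   # right side is the shorter base
        return [(l, t), (r, t + o), (r, t + o + n), (l, b)]
    raise ValueError("unknown pack_sb_indicator: %d" % pack_sb_indicator)


print(trapezoid_vertices(0, 0, 8, 4, 1, pack_sb_offset=3, pack_sb_length=2))
```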

proj_reg_rotation [n] specifies the clockwise rotation of the image data corresponding to the nth sub-region of the projected frame. Table 9 shows rotation angles according to proj_reg_rotation [n].

value	proj_reg_rotation [n]
0	no rotation
1	90 degrees rotation
2	180 degrees rotation
3	270 degrees rotation

In a VR 360 system using multiple fisheye cameras, the circular images taken by the fisheye cameras are directly encoded and transmitted. On the receiving side, the decoded image / video is rendered directly according to the viewport intended by the user. This method is useful for low latency live streaming or high quality 360 video delivery because the captured images are rendered directly, without an intermediate projection such as equirectangular or cube map projection.

In previous meetings, the concepts and indicators of monoscopic / stereoscopic formats, region-wise packing, and fisheye camera and lens parameters were agreed. Although monoscopic / stereoscopic packing arrangements have been considered for pre-stitched packing of 360 video, the packing of multiple stereoscopic fisheye images has not yet been addressed. Unlike pre-stitched packing of 360 degree video, where rectangular or triangular regions can be flexibly packed, typical fisheye cameras mainly produce circular video data.

Various layouts of stereoscopic packing for fisheye video

FIG. 26A is an exemplary diagram of stereoscopic fisheye video in a top-bottom stereo format according to the present disclosure. FIG. 26B is an exemplary diagram of stereoscopic fisheye video in a left-right stereo format according to the present disclosure.

Omnidirectional Fisheye Video

Without projection and region-specific packing processes, multiple circular images taken by fisheye cameras can be placed directly onto image frames. The image frame may comprise omnidirectional fisheye video. At the receiving side, the decoded omnidirectional fisheye video is stitched and rendered according to the user's intended viewport using the signaled fisheye video parameters. The fisheye video parameters include at least one of lens distortion correction (LDC) parameters with a local field of view (FOV), lens shading compensation parameters with red-green-blue (RGB) gains, displayed field of view information, and camera extrinsic parameters.

Syntax

Table 10 shows syntax for stereoscopic fisheye video for multiview.

FIG. 29 is an exemplary diagram for describing a fisheye camera according to the present disclosure. The meaning of each term is as follows.

Semantics

num_circular_images specifies the number of circular images in the coded picture of each sample. num_circular_images can be 2 or any other nonzero integer.

image_center_x is a fixed point 16.16 value indicating, in luma samples, the horizontal coordinate of the center of the circular image in the encoded picture of each sample to which the present syntax is applied.

image_center_y is a fixed point 16.16 value indicating, in luma samples, the vertical coordinate of the center of the circular image in the encoded picture of each sample to which the present syntax is applied.

full_radius is a fixed point 16.16 value that indicates, in luma samples, the radius from the center of the circular image to the edge of the full round image.

frame_radius is a fixed point 16.16 value that indicates, in luma samples, the radius from the center of the circular image to the edge of the nearest image boundary. The circular fisheye image can be cropped by the camera frame, and frame_radius is the radius of the circle indicating the pixels that are not available.

scene_radius is a fixed point 16.16 value that indicates, in luma samples, the radius from the center of the circular image to the edge of the nearest image area. The image area is an area that is free of obstructions from the camera body itself and in which the lens distortion is not too large for stitching.
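For readers unfamiliar with the 16.16 fixed point representation used by these fields, the Python sketch below converts between raw 32-bit integers and real values: the upper 16 bits carry the integer part and the lower 16 bits the fraction, so the raw value is the real value multiplied by 65536. Whether a given field is signed is an assumption left to the `signed` argument; the function names are hypothetical.

```python
def to_fixed_16_16(value):
    """Encode a real number as a raw 16.16 fixed point integer."""
    return int(round(value * 65536))


def from_fixed_16_16(raw, signed=False):
    """Decode a raw 32-bit 16.16 fixed point integer to a float."""
    if signed and raw >= 1 << 31:
        raw -= 1 << 32          # interpret as two's complement
    return raw / 65536.0


raw_center_x = to_fixed_16_16(960.5)
print(raw_center_x, from_fixed_16_16(raw_center_x))
print(from_fixed_16_16(0xFFFF0000, signed=True))
```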

image_rotation is a fixed point 16.16 value that indicates the amount of rotation of the circular image in degrees. Different video camera manufacturers use different coordinate systems or different layouts for each captured fisheye image. The rotation can range from -90 degrees to +90 degrees or from -180 degrees to +180 degrees.

image_flip indicates whether and how the image is flipped, and thus which reverse flip operation needs to be applied. If image_flip is 0, the image is not flipped. If image_flip is 1, the image is flipped vertically. If image_flip is 2, the image is flipped horizontally. If image_flip is 3, the image is flipped both horizontally and vertically.
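Since a vertical or horizontal flip is its own inverse, the reverse operation can be sketched by applying the same flip again. The Python sketch below does this for an image stored as a list of rows; the function name and the data layout are hypothetical.

```python
def undo_image_flip(image, image_flip):
    """Return a copy of the image with the signaled flip reversed."""
    if image_flip not in (0, 1, 2, 3):
        raise ValueError("unknown image_flip: %d" % image_flip)
    rows = [list(row) for row in image]
    if image_flip in (1, 3):              # undo the vertical flip
        rows = rows[::-1]
    if image_flip in (2, 3):              # undo the horizontal flip
        rows = [row[::-1] for row in rows]
    return rows


print(undo_image_flip([[1, 2], [3, 4]], 3))
```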

image_scale_axis_angle, image_scale_x, and image_scale_y are fixed point 16.16 values that indicate along which axis the image is scaled and how scaled. By indicating the value of image_scale_axis_angle in angle units, the axis is defined by a single angle. An angle of zero (image_scale_axis_angle) means that the horizontal vector is completely horizontal and the vertical vector is completely vertical. The values of image_scale_x and image_scale_y indicate the scaling ratios of the directions parallel and perpendicular to the axis, respectively.

field_of_view is a fixed point 16.16 value indicating the FOV of the fisheye lens in angle units. The typical value (field_of_view) of the hemispherical fisheye lens is 180 degrees.

num_angle_for_displaying_fov indicates the number of angles. If num_angle_for_displaying_fov is 12, the fisheye image is divided into 12 sectors, and the angle of each sector is 30 degrees. The values of the displayed FOV and the overlapped FOV are defined clockwise.
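The sector arithmetic can be sketched as follows. The sketch assumes that sector 0 starts at an angle of 0 degrees and that angles are already measured clockwise; the reference direction is not given here, so both points are assumptions, and the function names are hypothetical.

```python
def sector_angle(num_angle_for_displaying_fov):
    """Angular width in degrees of each sector of the fisheye image."""
    return 360.0 / num_angle_for_displaying_fov


def sector_index(angle_deg, num_angle_for_displaying_fov):
    """Index of the sector that contains a clockwise angle in degrees."""
    width = sector_angle(num_angle_for_displaying_fov)
    return int((angle_deg % 360.0) // width)


print(sector_angle(12))        # 30.0 degrees per sector
print(sector_index(45.0, 12))  # falls in the second sector (index 1)
```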

displayed_fov indicates the rendered and displayed FOV and the corresponding image area of each fisheye camera image.

overlapped_fov indicates overlapped regions in terms of FOV between multiple circular images.

The parameters indicate a relationship between fisheye images. On the other hand, scene_radius represents the relationship between the fisheye lens and the camera body.

If the value of num_circular_images is 2, the default value of displayed_fov is 180 degrees.

However, the values may vary depending on the characteristics of the lens and the content.

Referring to FIG. 31, for example, if the stitching quality obtained with displayed_fov values of 170 degrees for the left camera and 190 degrees for the right camera is better than that obtained with the default value (180 degrees), the values of displayed_fov may be updated.

However, for multiple fisheye images, a single displayed_fov value may not account for the exact region of each fisheye image.

Referring to FIG. 31, displayed_fov (dark portion) varies depending on the direction. In order to explain displayed_fov along the direction, num_angle_for_displaying_fov is introduced, and displayed_fov and overlapped_fov are defined in the clockwise direction.

num_polynomial_coefficients is an integer specifying the number of coefficients present in the polynomial. Each entry polynomial_coefficient_K in the list of polynomial coefficients is a fixed point 16.16 value representing a coefficient of the polynomial describing the transformation of the fisheye space into an undistorted plane image. An explanation of the polynomial can be found in "Omnidirectional Camera Calibration" by Scaramuzza et al.
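Evaluating such a polynomial from its signaled coefficients can be sketched in Python as below. The sketch assumes that the coefficients are listed from the lowest degree to the highest and that each raw value is a 16.16 fixed point integer; the ordering, the meaning of the input variable (for example a radial distance from the image center), and the function name are assumptions made for illustration.

```python
def eval_polynomial(raw_coefficients, rho):
    """Evaluate a0 + a1*rho + a2*rho**2 + ... using Horner's method."""
    coefficients = [raw / 65536.0 for raw in raw_coefficients]  # 16.16 -> float
    result = 0.0
    for a in reversed(coefficients):
        result = result * rho + a
    return result


# 1 + 2*rho + 0.5*rho**2, encoded as 16.16 fixed point raw values
raw = [65536, 131072, 32768]
print(eval_polynomial(raw, 2.0))
```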

num_local_fov_region indicates the number of local fitting regions having different field of view (FOV).

start_radius, end_radius, start_angle, and end_angle indicate an area for local fitting / warping that changes the actual FOV for locally displaying.

radius_delta indicates a delta value for indicating a different FOV for each radius.

angle_delta indicates a delta value for indicating a different FOV for each angle.

local_fov_weight indicates a weight value for the FOV of the position specified by start_radius, end_radius, start_angle, end_angle, the angle index i and the radius index j.

num_polynomial_coefficients_lsc may be an order of polynomial approximation of the lens shading curve.

polynomial_coefficient_K_lsc_R may be a polynomial coefficient approximating the lens shading curve for the red color component in the fixed point 16.16 format.

polynomial_coefficient_K_lsc_G may be a polynomial coefficient that approximates the lens shading curve for the green color component in the fixed point 16.16 format.

polynomial_coefficient_K_lsc_B may be a polynomial coefficient that approximates the lens shading curve for the blue color component in the fixed point 16.16 format.

num_deadzones is an integer indicating the number of dead zones in the coded picture of each sample applied by this syntax.

deadzone_left_horizontal_offset, deadzone_top_vertical_offset, deadzone_width, and deadzone_height are integer values indicating the position and size of the rectangular dead zone area. Pixels in the dead zone are not usable.

deadzone_left_horizontal_offset and deadzone_top_vertical_offset indicate, in luma samples, the horizontal and vertical coordinates of the upper left corner of the dead zone in the encoded picture, respectively.

deadzone_width and deadzone_height indicate the width and height of the dead zone in luma samples, respectively. In order to save bits for representing the video, all the pixels in the dead zone are set to the same pixel value (e.g., all black).
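Setting every pixel of a dead zone to one value can be sketched as follows, with the picture stored as a list of rows of luma samples. The tuple layout (left offset, top offset, width, height), the clipping at the picture boundary, and the function name are assumptions of this sketch.

```python
def blank_dead_zones(picture, dead_zones, fill=0):
    """Set all samples inside each dead zone rectangle to the fill value."""
    for left, top, width, height in dead_zones:
        for y in range(top, min(top + height, len(picture))):
            row = picture[y]
            for x in range(left, min(left + width, len(row))):
                row[x] = fill
    return picture


picture = [[1] * 4 for _ in range(3)]
print(blank_dead_zones(picture, [(1, 0, 2, 2)]))
```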

A method for transmitting stereoscopic video content according to the present disclosure comprises: generating, based on data of a stereoscopic image including a plurality of omnidirectional images having parallax, a first frame comprising a plurality of first views projected from the plurality of omnidirectional images; generating a second frame including a plurality of second views by packing a plurality of first regions included in the plurality of first views based on region-wise packing information; and transmitting data relating to the generated second frame, wherein the plurality of second views includes a plurality of second regions corresponding to the plurality of first regions, and the region-wise packing information includes information about the shape, orientation, or transformation of each of the plurality of second regions.

The packing information for each region may further include information indicating whether the stereoscopic video has a left and right stereoscopic 360 format or a vertical stereoscopic 360 format.

Further, the packing information for each region may further include a stereoscopic packing type indicating one of no region-wise packing, separate-independent region-wise packing, separate-mirroring region-wise packing, mixed-independent region-wise packing, and mixed-pair region-wise packing.

The information on the shape of each of the plurality of second regions indicates one of the plurality of shapes as the shape of each of the plurality of second regions, and the plurality of shapes may include a trapezoid.

In addition, the method for transmitting stereoscopic video content according to the present disclosure further includes generating one omnidirectional image among the plurality of omnidirectional images based on images acquired by a plurality of fisheye lenses, and information about the one omnidirectional image may include: information indicating the number of divided regions obtained by dividing the image acquired by each of the plurality of fisheye lenses according to a specific angle about its center; information indicating a region corresponding to a field of view (FOV) in each of the divided regions; and information indicating a region overlapping an image acquired by another fisheye lens in each of the divided regions.

In addition, each of the plurality of first views may be a spherical projection image, an equirectangular projection image (ERP image), or a regular polyhedral projection image, and the regular polyhedral projection image may be a tetrahedron projection image, a cube projection image, an octahedron projection image, a dodecahedron projection image, or an icosahedron projection image.

The packing information for each area may further include location information and size information of the plurality of first areas and location information and size information of the plurality of second areas.

In addition, when the stereoscopic packing type indicates that region-wise packing is not applied, the position information and the size information of each of the plurality of first regions may be the same as the position information and the size information of the corresponding second region among the plurality of second regions.

In addition, when the stereoscopic packing type indicates packing for each separation-independent area, the plurality of second views may be separated and packed independently.

In addition, when the stereoscopic packing type indicates packing per separation-mirror region, the plurality of second views may be separated and packed in the same manner.

In addition, when the stereoscopic packing type indicates packing for each mixed-independent area, the plurality of second views may be mixed with each other, and the plurality of second views may be independently packed.

When the stereoscopic packing type indicates packing per region by mixed-pair, the plurality of second views may be mixed with each other, paired, and packed.

The plurality of first views may be a cube projection images including a front surface, a rear surface, a left surface, a right surface, an upper surface, and a lower surface, and the plurality of second regions may be the front surface, the rear surface, the left surface, and the right surface. Each of the areas corresponding to the left side, the right side, the top side, and the bottom side of the plurality of second regions may have a trapezoidal shape. The size of the region corresponding to the front surface of the plurality of second regions may be larger than the size of the region corresponding to the rear surface.

The configuration of the present invention has been described above in detail with reference to the accompanying drawings. These are merely examples, and those skilled in the art to which the present invention pertains may of course make various modifications and changes within the scope of the technical idea of the present invention. Therefore, the scope of protection of the present invention should not be limited to the above-described embodiments but should be defined by the claims below.

Claims

A method for transmitting stereoscopic video content, the method comprising:

generating, based on data of a stereoscopic video including a plurality of omnidirectional images having parallax, a first frame including a plurality of first views projected from the plurality of omnidirectional images;

generating a second frame including a plurality of second views by packing a plurality of first regions included in the plurality of first views based on region-wise packing information; and

transmitting data regarding the generated second frame,

wherein the plurality of second views include a plurality of second regions corresponding to the plurality of first regions, and

the region-wise packing information includes information about a shape, orientation, or transformation of each of the plurality of second regions.
The method of claim 1, wherein the region-wise packing information further includes information indicating whether the stereoscopic video has a left-right stereoscopic 360 format or a top-bottom stereoscopic 360 format.
The method of claim 1, wherein the region-wise packing information further includes a stereoscopic packing type indicating one of no region-wise packing, separate-independent region-wise packing, separate-mirroring region-wise packing, mixed-independent region-wise packing, and mixed-pair region-wise packing.
The method of claim 1, wherein the information about the shape of each of the plurality of second regions indicates one of a plurality of shapes as the shape of that second region, and the plurality of shapes includes a trapezoid.
The method of claim 1, further comprising generating one omnidirectional image among the plurality of omnidirectional images based on images acquired by a plurality of fisheye lenses,

wherein information about the one omnidirectional image includes:

information indicating the number of divided regions into which the image acquired by each of the plurality of fisheye lenses is divided according to specific angles with respect to a center;

information indicating a region corresponding to a field of view (FOV) in each of the divided regions; and

information indicating, in each of the divided regions, a region overlapping with an image acquired by another fisheye lens.
The method of claim 1, wherein each of the plurality of first views is a spherical projection image, an equirectangular projection image (ERP image), or a regular polyhedral projection image, and the regular polyhedral projection image is a tetrahedron projection image, a cube projection image, an octahedron projection image, a dodecahedron projection image, or an icosahedron projection image.
The method of claim 1, wherein the region-wise packing information further includes:

location information and size information of the plurality of first regions; and

location information and size information of the plurality of second regions.
The method of claim 3, wherein:

when the stereoscopic packing type indicates that region-wise packing is not applied, the position information and the size information of each of the plurality of first regions are the same as the position information and the size information of the corresponding second region among the plurality of second regions;

when the stereoscopic packing type indicates separate-independent region-wise packing, the plurality of second views are separated from each other and packed independently;

when the stereoscopic packing type indicates separate-mirroring region-wise packing, the plurality of second views are separated from each other and packed in the same manner;

when the stereoscopic packing type indicates mixed-independent region-wise packing, the plurality of second views are mixed with each other and packed independently; and

when the stereoscopic packing type indicates mixed-pair region-wise packing, the plurality of second views are mixed with each other, paired, and packed.
The method of claim 1, wherein:

the plurality of first views are cube projection images each including a front surface, a rear surface, a left surface, a right surface, an upper surface, and a lower surface;

the plurality of second regions correspond to the front surface, the rear surface, the left surface, the right surface, the upper surface, and the lower surface, respectively;

each of the regions corresponding to the left surface, the right surface, the upper surface, and the lower surface among the plurality of second regions has a trapezoidal shape; and

the region corresponding to the front surface among the plurality of second regions is larger than the region corresponding to the rear surface.
An apparatus for transmitting stereoscopic video content, the apparatus comprising:

a memory;

a transceiver; and

at least one processor coupled to the memory and the transceiver,

wherein the at least one processor is configured to:

generate, based on data of a stereoscopic video including a plurality of omnidirectional images having parallax, a first frame including a plurality of first views projected from the plurality of omnidirectional images;

generate a second frame including a plurality of second views by packing a plurality of first regions included in the plurality of first views based on region-wise packing information; and

transmit data regarding the generated second frame,

wherein the plurality of second views include a plurality of second regions corresponding to the plurality of first regions, and

the region-wise packing information includes information about a shape, orientation, or transformation of each of the plurality of second regions.
The apparatus of claim 10, wherein the region-wise packing information further includes information indicating whether the stereoscopic video has a left-right stereoscopic 360 format or a top-bottom stereoscopic 360 format.
The apparatus of claim 10, wherein the region-wise packing information further includes a stereoscopic packing type indicating one of no region-wise packing, separate-independent region-wise packing, separate-mirroring region-wise packing, mixed-independent region-wise packing, and mixed-pair region-wise packing.
The apparatus of claim 10, wherein the information about the shape of each of the plurality of second regions indicates one of a plurality of shapes as the shape of that second region, and the plurality of shapes includes a trapezoid.
The apparatus of claim 10, wherein the at least one processor is further configured to generate one omnidirectional image among the plurality of omnidirectional images based on images acquired by a plurality of fisheye lenses,

wherein information about the one omnidirectional image includes:

information indicating the number of divided regions into which the image acquired by each of the plurality of fisheye lenses is divided according to specific angles with respect to a center;

information indicating a region corresponding to a field of view (FOV) in each of the divided regions; and

information indicating, in each of the divided regions, a region overlapping with an image acquired by another fisheye lens.
The apparatus of claim 10, wherein the at least one processor is configured to perform the method according to any one of claims 6 to 9.