WO2018233662A1

WO2018233662A1 - Method and apparatus of motion vector derivations in immersive video coding

Info

Publication number: WO2018233662A1
Application number: PCT/CN2018/092143
Authority: WO
Inventors: Cheng-Hsuan Shih; Jian-Liang Lin
Original assignee: Mediatek Inc.
Priority date: 2017-06-23
Filing date: 2018-06-21
Publication date: 2018-12-27
Also published as: CN109429561B; TW201911867A; CN109691104B; TWI686079B; CN109691104A; WO2018233661A1; TW201911861A; TWI690193B; CN109429561A

Abstract

Methods and apparatus of processing 360-degree virtual reality images are disclosed. According to one method, a system that maps a motion vector in a 2D frame into a 3D space and then applies rotation of sphere to adjust motion vector in the 3D space. The rotated motion vector is then mapped back to the 2D frame for processing 360-degree virtual reality images. According to another method, a system that derives motion vector from displacement of viewpoint is disclosed. According to yet another method, a system that applies scaling to adjust motion vector is disclosed. The motion vector scaling can be performed in the 3D space by projecting a motion vector from a 2D frame to a 3D sphere. After motion vector scaling, the scaled motion vector is projected back to the 2D frame. Alternatively, the forward projection, motion vector scaling and inverse projection can be combined into a single function.

Description

METHOD AND APPARATUS OF MOTION VECTOR DERIVATIONS IN IMMERSIVE VIDEO CODING

CROSS REFERENCE TO RELATED PATENT APPLICATION (S)

The present invention claims priority to U.S. Provisional Patent Application, Serial No. 62/523,883, filed on June 23, 2017 and U.S. Provisional Patent Application, Serial No. 62/523,885, filed on June 23, 2017. The U.S. Provisional Patent Applications are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

The present invention relates to image/video processing or coding for 360-degree virtual reality (VR) images/sequences. In particular, the present invention relates to deriving motion vectors for three-dimensional (3D) contents in various projection formats.

BACKGROUND

The 360-degree video, also known as immersive video is an emerging technology, which can provide “feeling as sensation of present” . The sense of immersion is achieved by surrounding a user with wrap-around scene covering a panoramic view, in particular, 360-degree field of view. The “feeling as sensation of present” can be further improved by stereographic rendering. Accordingly, the panoramic video is being widely used in Virtual Reality (VR) applications.

Immersive video involves the capturing a scene using multiple cameras to cover a panoramic view, such as 360-degree field of view. The immersive camera usually uses a panoramic camera or a set of cameras arranged to capture 360-degree field of view. Typically, two or more cameras are used for the immersive camera. All videos must be taken simultaneously and separate fragments (also called separate perspectives) of the scene are recorded. Furthermore, the set of cameras are often arranged to capture views horizontally, while other arrangements of the cameras are possible.

The 360-degree virtual reality (VR) images may be captured using a 360-degree spherical panoramic camera or multiple images arranged to cover all filed of views around 360 degrees. The three-dimensional (3D) spherical image is difficult to process or store using the conventional image/video processing devices. Therefore, the 360-degree VR images are often converted to a two-dimensional (2D) format using a 3D-to-2D projection method. For example, equirectangular projection (ERP) and cubemap projection (CMP) have been commonly used projection methods. Accordingly, a 360-degree image can be stored in an equirectangular projected format. The equirectangular projection maps the entire surface of a sphere onto a flat image. The vertical axis is latitude and the horizontal axis is longitude. Fig. 1 illustrates an example of projecting a sphere 110 into a rectangular image 120 according to equirectangular projection (ERP) , where each longitude line is mapped to a vertical line of the ERP picture. For the ERP projection, the areas in the north and south poles of the sphere are stretched more severely (i.e., from a single point to a line) than areas near the equator. Furthermore, due to distortions introduced by the stretching, especially near the two poles, predictive coding tools often fail to make good prediction, causing reduction in coding efficiency. Fig. 2 illustrates a cube 210 with six faces, where a 360-degree virtual reality (VR) image can be projected to the six faces on the cube according to cubemap projection (CMP) . There are various ways to lift the six faces off the cube and repack them into a rectangular picture. The example shown in Fig. 2 divides the six faces into two parts (220a and 220b) , where each part consists of three connected faces. The two parts can be unfolded into two strips (230a and 230b) , where each strip corresponds to a continuous-face picture. The two strips can be combined into a compact rectangular frame according to a selected layout format.

Both ERP and CMP formats have been included in the projection format conversion being considered for the next generation video coding as described in JVET-F1003 (Y. Ye, et al., “Algorithm descriptions of projection format conversion and video quality metrics in 360Lib” , Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 6th Meeting: Hobart, AU, 31 March –7 April 2017, Document: JVET-F1003) . Besides the ERP and CMP formats, there are various other VR projection formats, such as Adjusted Cubemap Projection (ACP) , Equal-Area Projection (EAP) , Octahedron Projection (OHP) , Icosahedron Projection (ISP) , Segmented Sphere Projection (SSP) and Rotated Sphere Projection (RSP) that are widely used in the field.

Fig. 3 illustrates an example of octahedron projection (OHP) , where a sphere is projected onto faces of an 8-face octahedron 310. The eight faces 320 lifted from the octahedron 310 can be converted to an intermediate format 330 by cutting open the face edge between

faces

1 and 5 and rotating

faces

1 and 5 to connect to faces 2 and 6 respectively, and applying a similar process to faces 3 and 7. The intermediate format can be packed into a rectangular picture 340.

Fig. 4 illustrates an example of icosahedron projection (ISP) , where a sphere is projected onto faces of a 20-face icosahedron 410. The twenty faces 420 from the icosahedron 410 can be packed into a rectangular picture 430 (referred as a projection layout) .

Segmented sphere projection (SSP) has been disclosed in JVET-E0025 (Zhang et al., “AHG8: Segmented Sphere Projection for 360-degree video” , Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 5th Meeting: Geneva, CH, 12–20 January 2017, Document: JVET-E0025) as a method to convert a spherical image into an SSP format. Fig. 5 illustrates an example of segmented sphere projection, where a spherical image 500 is mapped into a North Pole image 510, a South Pole image 520 and an equatorial segment image 530. The boundaries of 3 segments correspond to latitudes 45°N (502) and 45°S (504) , where 0° corresponds to the equator (506) . The North and South Poles are mapped into 2 circular areas (i.e., 510 and 520) , and the projection of the equatorial segment can be the same as ERP or equal-area projection (EAP) . The diameter of the circle is equal to the width of the equatorial segments because both Pole segments and equatorial segment have a 90° latitude span. The North Pole image 510, South Pole image 520 and the equatorial segment image 530 can be packed into a rectangular image.

Fig. 6 illustrates an example of rotated sphere projection (RSP) , where the sphere 610 is partitioned into a middle 270°x90° region 620, and a residual part 622. Each part of RSP can be further stretched on the top side and the bottom side to generate a deformed part having an oval shape. The two over-shaped parts can be fitted into a rectangular frame 630 as shown in Fig. 6.

The Adjusted Cubemap Projection format (ACP) is based on the CMP. If the two-dimensional coordinate (u’, v’) for CMP is determined, the two-dimensional coordinate (u, v) for ACP can be calculated by adjusting (u’, v’) according to a set of equations:

The 3D coordinates (X, Y, Z) can be derived using a table given the position (u, v) and the face index f. For 3D-to-2D coordinate conversion, given (X, Y, Z) , the (u’, v’) and face index f can be calculated according to a table for CMP. The 2D coordinates for ACP can be calculated according to a set of equations.

Similar to ERP, the EAP also maps a sphere surface to one face. In the (u, v) plane, u and v are in the range [0, 1]. For 2D-to-3D coordinate conversion, given the sampling position (m, n) , 2D coordinates (u, v) are first calculated in the same way as ERP. Then, the longitude and latitude (φ, θ) on the sphere can be calculated from (u, v) as:

φ = (u-0.5) * (2*π) (3)

θ = sin ^-1 (1.0-2*v) (4)

Finally, (X, Y, Z) can be calculated using the same equations as used for ERP:

X = cos (θ) cos (φ) (5)

Y = sin (θ) (6)

Z = -cos (θ) sin (φ) (7)

Inversely, the longitude and latitude (φ, θ) can be evaluated from (X, Y, Z) coordinates using:

φ = tan-1 (-Z/X) (8)

θ = sin-1 (Y/ (X2+Y2+Z2) 1/2) (9)

Since the images or video associated with virtual reality may take a lot of space to store or a lot of bandwidth to transmit, therefore image/video compression is often used to reduce the required storage space or transmission bandwidth. Inter prediction has been a powerful coding tool to explore the inter-frame redundancy using motion estimation/compensation. If conventional Inter prediction is applied to the 2D frames converted from a 3D space, the using motion estimation/compensation techniques may not work properly since an object in the 3D space may become distorted or deformed in the 2D frames due to object movement or relative motion between an object and a camera. In order to improve Inter prediction for 2D frames converted from a 3D space, various Inter prediction techniques are developed to improve the accuracy of Inter prediction for 2D frames converted from a 3D space.

SUMMARY

Methods and apparatus of processing 360-degree virtual reality images are disclosed. According to one method, input data for a current block in a 2D (two-dimensional) frame are received, where the 2D frame is projected from a 3D (three-dimensional) sphere. A first motion vector associated with a neighbouring block in the 2D frame is determined, where the first motion vector points from a first start location in the neighbouring block to a first end location in the 2D frame. The first motion vector is projected onto the 3D sphere according to a target projection. The first motion vector is rotated in the 3D sphere along a rotation circle on a surface of the 3D sphere around a rotation axis to generate a second motion vector in the 3D sphere. The second motion vector in the 3D sphere is mapped back to the 2D frame according to an inverse target projection. The current block in the 2D frame is encoded or decoded using the second motion vector. The second motion vector may be included as a candidate in a Merge candidate list or an AMVP (Advanced Motion Vector Prediction) candidate list for encoding or decoding of the current block.

In one embodiment, the rotation circle corresponds to a largest circle on the surface of the 3D sphere. In another embodiment, the rotation circle is smaller than a largest circle on the surface of the 3D sphere. The target projection may correspond to Equirectangular Projection (ERP) and Cubemap Projection (CMP) , Adjusted Cubemap Projection (ACP) , Equal-Area Projection (EAP) , Octahedron Projection (OHP) , Icosahedron Projection (ISP) , Segmented Sphere Projection (SSP) , Rotated Sphere Projection (RSP) , or Cylindrical Projection (CLP) .

In one embodiment, said projecting the first motion vector onto the 3D sphere comprises projecting the first start location, the first end location and a second start location in the 2D frame onto the 3D sphere according to the target projection, where the second start location is at a corresponding location in the current block corresponding to the first start location in the neighbouring block. Said rotating the first motion vector in the 3D sphere along the rotation circle comprises determining a target rotation for rotating from the first start location to the second start location in the 3D sphere along a rotation circle on a surface of the 3D sphere around a rotation axis and rotating the first end location to a second end location on the 3D sphere using the target rotation. Said mapping the second motion vector in the 3D sphere back to the 2D frame comprises mapping the second end location on the 3D sphere back to the 2D frame according to the inverse target projection and determining the second motion vector in the 2D frame based on the second start location and the second end location in the 2D frame.

According to another method, two 2D frames are received, where said two 2D frames are projected, using a target projection, from a 3D sphere corresponding to two different viewpoints, and a current block and a neighbouring block are located in said two 2D frames. A forward point of camera is determined based on said two 2D frames. Moving flows in said two 2D frames are determined. Displacement of camera based on a first motion vector associated with the neighbouring block is determined. A second motion vector associated with the current block is determined based on the displacement of camera. The current block in the 2D frame is then encoded or decoded using the second motion vector.

For the above method, the moving flows in said two 2D frames can be calculated from a tangent direction at each pixel in said two 2D frames. The second motion vector can be included as a candidate in a Merge candidate list or an AMVP (Advanced Motion Vector Prediction) candidate list for encoding or decoding of the current block. The target projection may correspond to Equirectangular Projection (ERP) and Cubemap Projection (CMP) , Adjusted Cubemap Projection (ACP) , Equal-Area Projection (EAP) , Octahedron Projection (OHP) , Icosahedron Projection (ISP) , Segmented Sphere Projection (SSP) , Rotated Sphere Projection (RSP) , or Cylindrical Projection (CLP) .

According to yet another method, input data for a current block in a 2D frame are received, wherein the 2D frame is projected from a 3D sphere according to a target projection. A first motion vector associated with a neighbouring block in the 2D frame, where the first motion vector points from a first start location in the neighbouring block to a first end location in the 2D frame. The first motion vector is scaled to generate a second motion vector. The current block in the 2D frame is then encoded or decoded using the second motion vector.

In one embodiment of the above method, said scaling the first motion vector to generate the second motion vector comprises projecting the first start location, the first end location and a second start location in the 2D frame onto the 3D sphere according to the target projection, where the second start location is at a corresponding location in the current block corresponding to the first start location in the neighbouring block. Said scaling the first motion vector to generate the second motion vector further comprises scaling a longitude component of the first motion vector to generate a scaled longitude component of the first motion vector; scaling a latitude component of the first motion vector to generate a scaled latitude component of the first motion vector; and determining a second end location corresponding to the second start location based on the scaled longitude component of the first motion vector and the scaled latitude component of the first motion vector. Said scaling the first motion vector to generate the second motion vector further comprises mapping the second end location on the 3D sphere back to the 2D frame according to an inverse target projection; and determining the second motion vector in the 2D frame based on the second start location and the second end location in the 2D frame.

In another embodiment, said scaling the first motion vector to generate the second motion vector comprises applying a first combined function to generate an x-component of the second motion vector and applying a second combined function to generate a y-component of the second motion vector; where the first combined function and the second combined function are dependent on the first start location, a second start location in the current block associated with corresponding first start location, the first motion vector and a target projection; and wherein the first combined function and the second combined function combine the target projection for projecting first data in the 2D frame into second data in the 3D sphere, scaling a selected motion vector in the 3D sphere into a scaled motion vector in the 3D sphere, and inverse target projection for projecting the scaled motion vector into the 2D frame. In one embodiment, when the target projection corresponds to Equirectangular Projection (ERP) , the first combined function corresponds to

and the second combined function corresponds to an identity function, wherein

corresponds to a first latitude associated with the first starting location and

corresponds to a second latitude associated with the second starting location.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 illustrates an example of projecting a sphere into a rectangular image according to equirectangular projection, where each longitude line is mapped to a vertical line of the ERP picture.

Fig. 2 illustrates a cube with six faces, where a 360-degree virtual reality (VR) image can be projected to the six faces on the cube according to cubemap projection (CMP) .

Fig. 3 illustrates an example of octahedron projection (OHP) , where a sphere is projected onto faces of an 8-face octahedron.

Fig. 4 illustrates an example of icosahedron projection (ISP) , where a sphere is projected onto faces of a 20-face icosahedron.

Fig. 5 illustrates an example of segmented sphere projection (SSP) , where a spherical image is mapped into a North Pole image, a South Pole image and an equatorial segment image.

Fig. 6 illustrates an example of rotated sphere projection (RSP) , where the sphere is partitioned into a middle 270°x90° region and a residual part. These two parts of RSP can be further stretched on the top side and the bottom side to generate deformed parts having oval-shaped boundary on the top part and bottom part.

Fig. 7A and Fig. 7B illustrate examples of deformation of 2D image due to the rotation of sphere.

Fig. 8 illustrates an example of rotation of the sphere from point a to point b on a big circle on the sphere, where the big circle corresponds to the largest circle on the surface of the sphere.

Fig. 9 illustrates an example of rotation of the from point a to point b on a small circle on the sphere, where the big circle corresponds to a circle smaller than the largest circle on the surface of the sphere.

Fig. 10 illustrates an example of deriving the motion vector for 2D projected pictures using the rotation of the sphere model.

Fig. 11 illustrates an example of using the MV derived based on rotation of sphere as a Merge or AMVP candidate.

Fig. 12 illustrates an example of an object (i.e., a tree) projected onto the surface of a sphere at different camera locations.

Fig. 13 illustrates an example of an ERP frame overlaid with the pattern of moving flows, where the flow of background (i.e., static object) can be determined if the camera forward point is known.

Fig. 14 illustrates an exemplary procedure of MV derivation based on displacement of viewpoint.

Fig. 15 illustrates exemplary MV derivation based on displacement of viewpoint for various projection methods.

Fig. 16 illustrates an example of deformation associated with motion in the ERP (Equirectangular Projection) frame.

Fig. 17 illustrates an exemplary procedure for the MV scaling technique in 3D sphere.

Fig. 18 illustrates an exemplary procedure of MV scaling in a 2D frame.

Fig. 19 illustrates an exemplary procedure of MV scaling in the ERP (Equirectangular Projection) frame.

Fig. 20 illustrates an exemplary procedure of MV scaling in an Equirectangular Projection (ERP) frame, where a current block and a neighbouring block in an ERP picture are shown.

Fig. 21 illustrates an exemplary procedure of MV scaling in a Cubemap Projection (CMP) frame, where a current block and a neighbouring block in a CMP picture are shown.

Fig. 22 illustrates an exemplary procedure of MV scaling in a Segmented Sphere Projection (SSP) frame, where a current block and a neighbouring block in an SSP picture are shown.

Fig. 23 illustrates an exemplary procedure of MV scaling in an Octahedron Projection (OHP) frame, where a current block and a neighbouring block in an OHP picture are shown.

Fig. 24 illustrates an exemplary procedure of MV scaling in an Icosahedron Projection (ISP) frame, where a current block and a neighbouring block in an ISP picture are shown.

Fig. 25 illustrates an exemplary procedure of MV scaling in an Equal-Area Projection (EAP) frame, where a current block and a neighbouring block in an EAP picture are shown.

Fig. 26 illustrates an exemplary procedure of MV scaling in an Adjusted Cubemap Projection (ACP) frame, where a current block and a neighbouring block in an ACP picture are shown.

Fig. 27 illustrates an exemplary procedure of MV scaling in a Rotated Sphere Projection (RSP) frame, where a current block and a neighbouring block in a RSP picture are shown.

Fig. 28 illustrates an exemplary procedure of MV scaling in a Cylindrical Projection (CLP) frame, where a current block and a neighbouring block in an CLP picture are shown.

Fig. 29 illustrates an exemplary flowchart of a system that applies rotation of sphere to adjust motion vector for processing 360-degree virtual reality images according to an embodiment of the present invention.

Fig. 30 illustrates an exemplary flowchart of a system that derives motion vector from displacement of viewpoint for processing 360-degree virtual reality images according to an embodiment of the present invention.

Fig. 31 illustrates an exemplary flowchart of a system that applies scaling to adjust motion vector for processing 360-degree virtual reality images according to an embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED IMPLEMENTATIONS

The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.

For conventional Inter prediction in video coding, motion estimation/compensation is widely used to explore correlation in video data in order to reduce transmitted information. The conventional video contents correspond to 2D video data and the motion estimation and compensation techniques often assume a translational motion. In the next generation video coding, more advanced motion models, such as affine model, are considered. Nevertheless, these techniques derive a 3D motion model based on the 2D images.

In the present invention, the motion information is derived in the 3D domain so that more accurate motion information may be derived. According to one method, the rotation of sphere is assumed for the cause of block deformation in the 2D video contents. Fig. 7A and Fig. 7B illustrate examples of deformation of 2D image due to the rotation of sphere. In Fig. 7A, block 701 on a 3D sphere 700 is moved to become block 702. When block 701 and moved block 702 are mapped to an ERP frame 703, the two corresponding blocks become

blocks

704 and 705 in the ERP frame. While the blocks (i.e., 701 and 702) on the 3D sphere correspond to a same block, the two blocks (i.e., 704 and 705) have different shapes in the ERP frame. In other words, object move or rotate in the 3D space may cause object distortion in the 2D frame (i.e., ERP frame in this example) . In Fig. 7B, an area 711 is on the surface of a sphere 710. The area 711 is moved to area 714 caused by the rotation of the sphere along path 713. A motion vector 712 associated with area 711 is mapped to motion vector 715 due to the rotation of sphere. The correspondences of parts 711-715 in the 2D domain 720 are shown as parts 721-725. In another example, the parts corresponding to 713-725 for trajectory 733 on the surface of a sphere 730 are shown as parts 733-735 respectively, and the correspondences of parts 733-735 in the 2D domain 740 are shown as parts 743-745. In yet another example, the parts corresponding to 713-725 for trajectory 753 on the surface of a sphere 750 are shown as parts 753-755 respectively, and the correspondences of parts 753-755 in the 2D domain 760 are shown as parts 763-765.

Fig. 8 illustrates an example of rotation of the sphere 800 from point a 830 to point b 840 on a big circle 820 on the sphere 810, where the big circle 820 corresponds to the largest circle on the surface of the sphere 810. The axis of rotation 850 is shown as an arrow in Fig. 8. The angle of rotation is θ _a.

Fig. 9 illustrates an example of rotation of the sphere 900 from point a to point b on a small circle 910 on the sphere 920, where the small circle 910 corresponds to a circle smaller than the largest circle (e.g. circle 930) on the surface of the sphere 920. The centre point of rotation is shown as a dot 912 in Fig. 9. Another example of rotation of the sphere 950 from point a to point b on a small circle 960 on the sphere 970. A big circle 980 (i.e., the largest circle) is indicated in Fig. 9. The axis of rotation 990 is shown as an arrow in Fig. 9.

Fig. 10 illustrates an example of deriving the motion vector for 2D projected pictures using the rotation of the sphere model. In Fig. 10, illustration 1010 depicts an example of deriving the motion vector for 2D projected pictures using the rotation of the sphere model according to an embodiment of the present invention. Locations a and b are two locations in a 2D projected picture. Motion vector mv _a points from a to a′. The goal is to find the motion vector mv _b for location b. The present invention projects the 2D projected picture onto a 3D sphere 1020 using 2D-to-3D projection. Rotation around a small circle or big circle is applied to rotate location a to location b. Location a’is rotated to location b’according to the same selected rotation of sphere. An inverse projection is then applied to the 3D sphere to convert the location b’to the 2D frame 1030. The motion vector for location b can be calculated according to mv _b = b’–b. Motion vectors mv _a and mv _b are two-dimensional vectors in the (x, y) domain. Locations a, a’, b and b’are two-dimensionalcoordinates in the (x, y) domain. The notations

and

are three-dimensional coordinates in the (θ, φ) domain.

The derived motion vector can be used as a candidate for video coding using Merge or AMVP (advanced motion vector prediction) mode, where the Merge mode and the AMVP modes are techniques to code a block or motion information of the block predictively as disclosed in HEVC (High Efficiency Video Coding) . When a block is coded in the Merge mode, the current block uses the motion information of a block indicated by a Merge index indicating to a selected candidate in a Merge candidate list. When a block is coded in the AMVP mode, the motion vector of the current block predictively by a predictor as indicated by an AMVP index pointing to a selected candidate in the AMVP candidate list. Fig. 11 illustrates an example of using the MV derived based on rotation of sphere as a Merge or AMVP candidate. In Fig. 11, layout 1110 of neighbouring blocks that are used to derive Merge candidate is shown for block 1112 where the neighbouring blocks include spatial neighbouring blocks A ₀, A ₁, B ₀, B ₁ and B ₂ and temporal neighbouring blocks Col ₀ or Col ₁. For a current block 1120 in a 2D frame, the motion vectors from block A ₀ and B ₀ can be used to derive motion candidate for the current block. As disclosed above, the motion vector mv _a of neighbouring block A ₀ can be used to derive a corresponding motion vector mv _a’for the current block according to the rotation of sphere. Similarly, the motion vector mv _b of neighbouring block B ₀ can be used to derive a corresponding motion vector mv _b’at the current block according to the rotation of sphere. Motion vectors mv _a’and mv _b’can be included in the Merge or AMVP candidate list.

In this present invention, motion vector derivation based on displacement of viewpoint is disclosed. Fig. 12 illustrates an example of an object (i.e., a tree) projected onto the surface of a sphere at different camera locations. At camera position A, the tree is projected onto sphere 1210 to form an image 1240 of the tree. At a forward position B, the tree is projected onto sphere 1220 to form an image 1250 of the tree. The image 1241 of the tree corresponding to camera position A is also shown on sphere 1220 for comparison. At a further forward position C, the tree is projected onto sphere 1230 to form an image 1260 of the tree. Image 1242 of the tree corresponding to camera position A and image 1251 of the tree corresponding to camera position B are also shown on sphere 1230 for comparison. In Fig. 12, for a video captured by a camera moving in a straight line, the direction of movement of the camera in 3D space (as indicated by

arrows

1212, 1222 and 1232 for three different camera locations in Fig. 12) can be represented by a latitude and longitude

coordinates, where

corresponds to the intersection of the motion vector and the 3D sphere. The point of

is projected to the 2D target projection plane and the point is referred as a forward point.

Fig. 13 illustrates an example of an ERP frame overlaid with the pattern of moving flows, where the flow of background (i.e., static object) can be determined if the camera forward point is known. The flows are indicated by arrows. The camera forward point 1310 and camera backward point 1320 are shown. Moving flows correspond to the moving direction of video content based on a camera moving in a direction. The movement of the camera causes the relative movement of the static background objects, and the moving direction of background object on the 2D frame captured by camera can be represented as moving flows. Fig. 14 illustrates an exemplary procedure of MV derivation based on displacement of viewpoint. In Fig. 14, an object 1410 is projected to the sphere 1420 to form an image 1422 on the surface of the sphere corresponding to camera location 1424. At camera location 1434, the object 1410 is projected to the sphere 1430 to form an image 1432 on the surface of the sphere. The corresponding location of the object is also shown on the surface of the sphere 1430 at location 1423. The procedure to derive the motion vector based on displacement of viewpoint as shown as follows:

·Find out the forward point 1440 of camera.

·Calculate moving flows 1450 in the 2D frame (i.e., tangent direction at each pixel) .

·Determine the motion vector of current block by using MV of neighbouring block:

a. The MV of neighbouring block can be used to determine the displacement of camera, as indicated by arrow 1460;

b. The displacement of camera can be used to determine the MV of current block, as indicated by arrow 1470.

The MV derivation based on displacement of viewpoint can be applied to various projection methods. The moving flow in the 2D frame can be mapped to 3D sphere. The moving flow on the 3D sphere 1510 is shown in Fig. 15, where the forward point and two different lines of moving flow (1512 and 1514) are shown. The moving flow on 3D sphere associated with ERP 1520 is shown in Fig. 15, where the moving flows are shown for an ERP frame 1526. The moving flow on 3D sphere associated with CMP 1530 is shown in Fig. 15, where the moving flows are shown for six faces of a CMP frame 1536 in a 2x3 layout format. The moving flow on 3D sphere associated with OHP 1540 is shown in Fig. 15, where the moving flows are shown for eight faces of an OHP frame 1546. The moving flow on 3D sphere associated with ISP 1550 is shown in Fig. 15, where the moving flows are shown for twenty faces of an ISP frame 1556. The moving flow on 3D sphere associated with SSP 1560 is shown in Fig. 15, where the moving flows are shown for segmented faces of an SSP frame 1566.

As mentioned before, there are various mappings, such as ERP, CMP, SSP, OHP and ISP, to project from a 3D sphere to a 2D frame. When an object in the 3D sphere moves, the corresponding object in a 2D frame may become deformed. The deformation is projection-type dependent. The length, angle, and area may not remain uniform at all mapping positions in the 2D frame. An MV scaling technique on 3D sphere is disclosed to perform deformation by scaling MV on a 3D sphere to minimize effects of projection types on deformation. An MV scaling technique on 2D frame is also disclosed, which can be directly applied to 2D frames by combining projection, scaling MV and inverse projection into a single function.

Fig. 16 illustrates an example of deformation associated with motion in the ERP frame. In Fig. 16, three motion vectors (1612, 1614 and 1616) along three latitude lines are shown on the surface of 3D sphere 1610, where the three motion vectors have about the same length. If the surface of the 3D sphere is unfolded into a 2D plane 1620, the three motion vectors (1622, 1624 and 1626) maintains their equal length property. For the ERP frame, the unfolded image needs to be stretched, where more stretching is needed in higher latitudes. Therefore, the three motion vectors (1632, 1634 and 1636) have different lengths as shown in Fig. 16. If a neighbouring block having a motion vector 1634 is used for a current block (i.e., a motion vector at location 1632) , motion vector 1634 has to be properly scaled before used for coding, such as a merge or AMVP candidate. The example in Fig. 16 illustrates the need for MV scaling.

An exemplary procedure for the MV scaling technique in 3D sphere is disclosed as follows. Suppose the starting point of mv _a is a, and the ending point of mv _a is a’. We can predict the motion vector mv _b at point b by following procedures:

1. Map a, a’, and b from 2D frame 1710 to 3D sphere 1720 as shown in Fig. 17.

Let the mapping function be

= Pprojection type (x, y) which maps the pixel at (x, y) in a 2D frame to

2. Calculate Δθ and

3. Apply scaling functions scaleθ () and

to Δθ and

4. Calculate

according to the following formula:

5. Map

from the 3D sphere 1720 to the 2D frame 1730 to produce b’.

Let the inverse function be (x, y) = IP _projection _type

which maps the pixel at

on the 3D sphere to (x, y) in the 2D frame.

6. MV mv _b can be determined according to mv _b = b’–b.

In the above procedure, scaling functions for Δθ and

are scaleθ

and

respectively. Some examples of scaling functions are shown as follows.

Example 1:

Therefore, Δθ'= Δθ;

Example 2:

Example 3: Generic form of scaling function–projection independent

Example 4: Generic form of scaling function–projection dependent

The MV scaling in the 3D domain can also be performed in a 2D domain by combining projection function, scaling function, and inverse projection function into a single function f (a, b, mv _a, projection type) . Therefore, the mapping between 3D and 2D can be skipped.

mv _b = (x _mvb, y _mvb) = f (a, b, mv _a, projection type)

x _mvb = f _x (a, b, mv _a, projection type)

y _mvb = f _y (a, b, mv _a, projection type)

In the above equation, f _x (a, b, mv _a, projection type) is the single function to generate x _mvb and f _y (a, b, mv _a, projection type) is the single function to generate y _mvb. Fig. 18 illustrates an exemplary procedure of MV scaling in a 2D frame. Two blocks (A and B) in a 2D frame 1810 are shown. The motion vector for location a in block A is marked. The end position for location a associated with mv _a is denoted as a’in Fig. 18. Location b in block B is marked. The end position for location b associated with mv _b is denoted as b’in Fig. 18. The steps of projection function (i.e., forward projection 1830) , scaling function in the 3D sphere 1820, and inverse projection function (i.e., inverse projection 1840) can be combined into a single function according to the above equation.

An example of MV scaling in an ERP frame is shown in Fig. 19. In Fig. 19, the surface of 3D sphere 1910 is unfolded into a 2D plane 1920. For the ERP frame 1930, the unfolded image needs to be stretched so that all latitude lines have the same length. Based on the features of ERP, the length of horizontal line is enlarged more when it is closer to North or South Pole. In the ERP frame, a neighbouring block 1932 has a motion vector mv ₁. A motion vector mv ₂ needs to be derived based on mv ₁ for the current block 1934. The derived motion vector can be used for coding the current block. For example, the derived block can be used as a Merge or AMVP candidate. Since the neighbouring block is located at a different location on the 3D sphere, the motion vector associated with the neighbouring block needs to be scaled before it is used for coding the current block. In particular, the motion vector of the neighbouring block is stretched more in the x-direction than a motion vector for the current block since the neighbouring block is in higher latitude. Therefore, the x-component of mv ₁ needs to be scaled down before it is used for the current block. The MV scaling process 1940 is shown in Fig. 19, where the motion vector mv ₁ at location a of a neighbouring block is scaled to mv ₂ and used for coding the current block. In one embodiment of the present invention, a scaling function to preserve the motion distance is disclosed, where mv ₂ is a function of mv ₁, θ1,

θ2,

mv _y2 = mv _y1

In the above equation,

corresponds to the longitude and latitude of location a in the neighbouring block and

corresponds to the longitude and latitude of location b in the current block. Location a and location b may correspond to the centre of respective blocks or other location of respective blocks. The y-component of the derived motion vector is the same as the y-component of the mv ₁. In other words, the scaling function for the y-component is an identity function.

The MV scaling for other projections are also disclosed. In Fig. 20, the MV scaling for ERP is disclosed, where a current block 2012 and a neighbouring block 2014 in an ERP picture 2010 are shown. The neighbouring block has a motion vector mv ₁. The motion vector mv ₁ from the neighbouring block is used to derive a motion vector mv ₂ for the current block. According to the present invention, the motion vector mv ₁ from the neighbouring block needs to be scaled using a scaling function according to mv ₂ = f (mv ₁, x ₁, y ₁, x ₂, y ₂, ERP) , where (x ₁, y ₁) corresponds to the (x, y) coordinates of location a in the neighbouring block and (x ₂, y ₂) corresponds to the (x, y) coordinates of location b in the current block.

In Fig. 21, the MV scaling for CMP is disclosed, where a current block 2112 and a neighbouring block 2114 in a CMP picture 2110 are shown. The neighbouring block has a motion vector mv ₁. The motion vector mv ₁ from the neighbouring block is used to derive a motion vector mv ₂ for the current block. According to the present invention, the motion vector mv ₁ from the neighbouring block needs to be scaled using a scaling function according to mv ₂ = f (mv ₁, x ₁, y ₁, x ₂, y ₂, CMP) .

In Fig. 22, the MV scaling for SSP is disclosed, where a current block 2212 and a neighbouring block 2214 in an SSP picture 2210 are shown. The neighbouring block has a motion vector mv ₁. The motion vector mv ₁ from the neighbouring block is used to derive a motion vector mv ₂ for the current block. According to the present invention, the motion vector mv ₁ from the neighbouring block needs to be scaled using a scaling function according to mv ₂ = f (mv ₁, x ₁, y ₁, x ₂, y ₂, SSP) .

In Fig. 23, the MV scaling for OHP is disclosed, where a current block 2312 and a neighbouring block 2314 in an OHP picture 2310 are shown. The neighbouring block has a motion vector mv ₁. The motion vector mv ₁ from the neighbouring block is used to derive a motion vector mv ₂ for the current block. According to the present invention, the motion vector mv ₁ from the neighbouring block needs to be scaled using a scaling function according to mv ₂ = f (mv ₁, x ₁, y ₁, x ₂, y ₂, OHP) .

In Fig. 24, the MV scaling for ISP is disclosed, where a current block 2412 and a neighbouring block 2414 in an ISP picture 2410 are shown. The neighbouring block has a motion vector mv ₁. The motion vector mv ₁ from the neighbouring block is used to derive a motion vector mv ₂ for the current block. According to the present invention, the motion vector mv ₁ from the neighbouring block needs to be scaled using a scaling function according to mv ₂ = f (mv ₁, x ₁, y ₁, x ₂, y ₂, ISP) .

In Fig. 25, the MV scaling for EAP is disclosed, where a current block 2512 and a neighbouring block 2514 in an EAP picture 2510 are shown. The neighbouring block has a motion vector mv ₁. The motion vector mv ₁ from the neighbouring block is used to derive a motion vector mv ₂ for the current block. According to the present invention, the motion vector mv ₁ from the neighbouring block needs to be scaled using a scaling function according to mv ₂ = f (mv ₁, x ₁, y ₁, x ₂, y ₂, EAP) .

In Fig. 26, the MV scaling for ACP is disclosed, where a current block 2612 and a neighbouring block 2614 in an ACP picture 2610 are shown. The neighbouring block has a motion vector mv ₁. The motion vector mv ₁ from the neighbouring block is used to derive a motion vector mv ₂ for the current block. According to the present invention, the motion vector mv ₁ from the neighbouring block needs to be scaled using a scaling function according to mv ₂ = f (mv ₁, x ₁, y ₁, x ₂, y ₂, ACP) .

In Fig. 27, the MV scaling for RSP is disclosed, where a current block 2712 and a neighbouring block 2714 in an RSP picture 2710 are shown. The neighbouring block has a motion vector mv ₁. The motion vector mv ₁ from the neighbouring block is used to derive a motion vector mv ₂ for the current block. According to the present invention, the motion vector mv ₁ from the neighbouring block needs to be scaled using a scaling function according to mv ₂ = f (mv ₁, x ₁, y ₁, x ₂, y ₂, RSP) .

Besides these projections mentioned above, cylindrical projection has also been used to project a 3D sphere into a 2D frame. Conceptually, cylindrical projections are created by wrapping a cylinder 2820 around a globe 2830 and projecting light through the globe onto the cylinder as shown in Fig. 28. Cylindrical projections represent meridians as straight, evenly-spaced, vertical lines and parallels as straight horizontal lines. Meridians and parallels intersect at right angles, as they do on the globe. Depending on the placement of the light source, various CLPs are generated. In Fig. 28, the MV scaling for Cylindrical Projection (CLP) is disclosed, where a current block 2812 and a neighbouring block 2814 in a CLP picture 2810 are shown. The neighbouring block has a motion vector mv ₁. The motion vector mv ₁ from the neighbouring block is used to derive a motion vector mv ₂ for the current block. According to the present invention, the motion vector mv ₁ from the neighbouring block needs to be scaled using a scaling function according to mv ₂ = f (mv ₁, x ₁, y ₁, x ₂, y ₂, CLP) .

Fig. 29 illustrates an exemplary flowchart of a system that applies rotation of sphere to adjust motion vector for processing 360-degree virtual reality images according to an embodiment of the present invention. The steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side. The steps shown in the flowchart may also be implemented based hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to this method, input data for a current block in a 2D (two-dimensional) frame are received in step 2910, where the 2D frame is projected from a 3D (three-dimensional) sphere. The input data may correspond to pixel data of a 2D frame to be encoded. A first motion vector associated with a neighbouring block in the 2D frame is determined in step 2920, where the first motion vector points from a first start location in the neighbouring block to a first end location in the 2D frame. The first motion vector is projected onto the 3D sphere according to a target projection in step 2930. The first motion vector in the 3D sphere is rotated along a rotation circle on a surface of the 3D sphere around a rotation axis to generate a second motion vector in the 3D sphere in step 2940. The second motion vector in the 3D sphere is mapped back to the 2D frame according to an inverse target projection. The current block in the 2D frame is encoded or decoded using the second motion vector in step 2960.

Fig. 30 illustrates an exemplary flowchart of a system that derives motion vector from displacement of viewpoint for processing 360-degree virtual reality images according to an embodiment of the present invention. According to another method, two 2D (two-dimensional) frames are received in step 3010, where said two 2D frames are projected, using a target projection, from a 3D (three-dimensional) sphere corresponding to two different viewpoints, and where a current block and a neighbouring block are located in said two 2D frames. A forward point of camera is determined based on said two 2D frames in step 3020. Moving flows in said two 2D frames are determined in step 3030. Displacement of camera is determined based on a first motion vector associated with the neighbouring block in step 3040. A second motion vector associated with the current block is derived based on the displacement of camera in step 3050. The current block in the 2D frame is encoded or decoded using the second motion vector in step 3060.

Fig. 31 illustrates an exemplary flowchart of a system that applies scaling to adjust motion vector for processing 360-degree virtual reality images according to an embodiment of the present invention. According to this method, input data for a current block in a 2D (two-dimensional) frame are received in step 3110, where the 2D frame is projected from a 3D (three-dimensional) sphere according to a target projection. A first motion vector associated with a neighbouring block in the 2D frame is determined in step 3120, where the first motion vector points from a first start location in the neighbouring block to a first end location in the 2D frame. The first motion vector is scaled to generate a second motion vector in step 3130. The current block in the 2D frame is encoded or decoded using the second motion vector in step 3140.

The flowcharts shown above are intended for serving as examples to illustrate embodiments of the present invention. A person skilled in the art may practice the present invention by modifying individual steps, splitting or combining steps with departing from the spirit of the present invention.

The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirement. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced.

Embodiment of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more electronic circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or field programmable gate array (FPGA) . These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.

The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

A method of processing 360-degree virtual reality images, the method comprising:

receiving input data for a current block in a 2D (two-dimensional) frame, wherein the 2D frame is projected from a 3D (three-dimensional) sphere;

determining a first motion vector associated with a neighbouring block in the 2D frame, wherein the first motion vector points from a first start location in the neighbouring block to a first end location in the 2D frame;

projecting the first motion vector onto the 3D sphere according to a target projection;

rotating the first motion vector in the 3D sphere along a rotation circle on a surface of the 3D sphere around a rotation axis to generate a second motion vector in the 3D sphere;

mapping the second motion vector in the 3D sphere back to the 2D frame according to an inverse target projection; and

encoding or decoding the current block in the 2D frame using the second motion vector.
The method of Claim 1, wherein the second motion vector is included as a candidate in a Merge candidate list or an AMVP (Advanced Motion Vector Prediction) candidate list for encoding or decoding of the current block.
The method of Claim 1, wherein the rotation circle corresponds to a largest circle on the surface of the 3D sphere.
The method of Claim 1, wherein the rotation circle is smaller than a largest circle on the surface of the 3D sphere.
The method of Claim 1, wherein the target projection corresponds to Equirectangular Projection (ERP) and Cubemap Projection (CMP) , Adjusted Cubemap Projection (ACP) , Equal-Area Projection (EAP) , Octahedron Projection (OHP) , Icosahedron Projection (ISP) , Segmented Sphere Projection (SSP) , Rotated Sphere Projection (RSP) , or Cylindrical Projection (CLP) .
The method of Claim 1, wherein said projecting the first motion vector onto the 3D sphere comprises projecting the first start location, the first end location and a second start location in the 2D frame onto the 3D sphere according to the target projection, wherein the second start location is at a corresponding location in the current block corresponding to the first start location in the neighbouring block.
The method of Claim 6, wherein said rotating the first motion vector in the 3D sphere along the rotation circle comprises determining a target rotation for rotating from the first start location to the second start location in the 3D sphere along a rotation circle on a surface of the 3D sphere around a rotation axis and rotating the first end location to a second end location on the 3D sphere using the target rotation.
The method of Claim 7, wherein said mapping the second motion vector in the 3D sphere back to the 2D frame comprises mapping the second end location on the 3D sphere back to the 2D frame according to the inverse target projection and determining the second motion vector in the 2D frame based on the second start location and the second end location in the 2D frame.
An apparatus for processing 360-degree virtual reality images, the apparatus comprising one or more electronic devices or processors configured to:

receive input data for a current block in a 2D (two-dimensional) frame, wherein the 2D frame is projected from a 3D (three-dimensional) sphere;

determine a first motion vector associated with a neighbouring block in the 2D frame, wherein the first motion vector points from a first start location in the neighbouring block to a first end location in the 2D frame;

project the first motion vector onto the 3D sphere according to a target projection;

rotate the first motion vector in the 3D sphere along a rotation circle on a surface of the 3D sphere around a rotation axis to generate a second motion vector in the 3D sphere;

map the second motion vector in the 3D sphere back to the 2D frame according to an inverse target projection; and

encode or decode the current block in the 2D frame using the second motion vector.
A method of processing 360-degree virtual reality images, the method comprising:

receiving two 2D (two-dimensional) frames, wherein said two 2D frames are projected, using a target projection, from a 3D (three-dimensional) sphere corresponding to two different viewpoints, and wherein a current block and a neighbouring block are located in said two 2D frames;

determining a forward point of camera based on said two 2D frames;

determining moving flows in said two 2D frames;

determining displacement of camera based on a first motion vector associated with the neighbouring block;

deriving a second motion vector associated with the current block based on the displacement of camera; and

encoding or decoding the current block in the 2D frame using the second motion vector.
The method of Claim 10, wherein the moving flows in said two 2D frames are calculated from a tangent direction at each pixel in said two 2D frames.
The method of Claim 10, wherein the second motion vector is included as a candidate in a Merge candidate list or an AMVP (Advanced Motion Vector Prediction) candidate list for encoding or decoding of the current block.
The method of Claim 10, wherein the target projection corresponds to Equirectangular Projection (ERP) and Cubemap Projection (CMP) , Adjusted Cubemap Projection (ACP) , Equal-Area Projection (EAP) , Octahedron Projection (OHP) , Icosahedron Projection (ISP) , Segmented Sphere Projection (SSP) , Rotated Sphere Projection (RSP) , or Cylindrical Projection (CLP) .
A method of processing 360-degree virtual reality images, the method comprising:

receiving input data for a current block in a 2D (two-dimensional) frame, wherein the 2D frame is projected from a 3D (three-dimensional) sphere according to a target projection;

determining a first motion vector associated with a neighbouring block in the 2D frame, wherein the first motion vector points from a first start location in the neighbouring block to a first end location in the 2D frame;

scaling the first motion vector to generate a second motion vector; and

encoding or decoding the current block in the 2D frame using the second motion vector.
The method of Claim 14, wherein said scaling the first motion vector to generate the second motion vector comprises projecting the first start location, the first end location and a second start location in the 2D frame onto the 3D sphere according to the target projection, wherein the second start location is at a corresponding location in the current block corresponding to the first start location in the neighbouring block.
The method of Claim 15, said scaling the first motion vector to generate the second motion vector further comprises scaling a longitude component of the first motion vector to generate a scaled longitude component of the first motion vector; scaling a latitude component of the first motion vector to generate a scaled latitude component of the first motion vector; and determining a second end location corresponding to the second start location based on the scaled longitude component of the first motion vector and the scaled latitude component of the first motion vector.
The method of Claim 16, wherein said scaling the first motion vector to generate the second motion vector further comprises mapping the second end location on the 3D sphere back to the 2D frame according to an inverse target projection; and determining the second motion vector in the 2D frame based on the second start location and the second end location in the 2D frame.
The method of Claim 14, wherein said scaling the first motion vector to generate the second motion vector comprises applying a first combined function to generate an x-component of the second motion vector and applying a second combined function to generate a y-component of the second motion vector; wherein the first combined function and the second combined function are dependent on the first start location, a second start location in the current block associated with corresponding first start location, the first motion vector and a target projection; and wherein the first combined function and the second combined function combine the target projection for projecting first data in the 2D frame into second data in the 3D sphere, scaling a selected motion vector in the 3D sphere into a scaled motion vector in the 3D sphere, and inverse target projection for projecting the scaled motion vector into the 2D frame.
The method of Claim 18, wherein the target projection corresponds to Equirectangular Projection (ERP) and the first combined function corresponds to
wherein
corresponds to a first latitude associated with the first starting location and
corresponds to a second latitude associated with the second starting location, and the second combined function corresponds to an identity function.
The method of Claim 14, wherein the second motion vector is included as a candidate in a Merge candidate list or an AMVP (Advanced Motion Vector Prediction) candidate list for encoding or decoding of the current block.
The method of Claim 14, wherein the target projection corresponds to Equirectangular Projection (ERP) and Cubemap Projection (CMP) , Adjusted Cubemap Projection (ACP) , Equal-Area Projection (EAP) , Octahedron Projection (OHP) , Icosahedron Projection (ISP) , Segmented Sphere Projection (SSP) , Rotated Sphere Projection (RSP) , or Cylindrical Projection (CLP) .