WO2018233661A1 - Method and apparatus of inter prediction for immersive video coding - Google Patents

Method and apparatus of inter prediction for immersive video coding

Info

Publication number
WO2018233661A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
sphere
frame
projection
rotation
Application number
PCT/CN2018/092142
Other languages
French (fr)
Inventor
Cheng-Hsuan Shih
Jian-Liang Lin
Original Assignee
Mediatek Inc.
Application filed by Mediatek Inc. filed Critical Mediatek Inc.
Priority to CN201880002044.4A priority Critical patent/CN109691104B/en
Priority to TW107121492A priority patent/TWI690193B/en
Publication of WO2018233661A1 publication Critical patent/WO2018233661A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/55 Motion estimation with spatial constraints, e.g. at image or region borders

Definitions

  • the present invention relates to image/video processing or coding for 360-degree virtual reality (VR) images/sequences.
  • the present invention relates to Inter prediction for three-dimensional (3D) contents in various projection formats.
  • the 360-degree video, also known as immersive video, is an emerging technology that can provide a “sense of presence” .
  • the sense of immersion is achieved by surrounding a user with a wrap-around scene covering a panoramic view, in particular, a 360-degree field of view.
  • the “sense of presence” can be further improved by stereographic rendering. Accordingly, panoramic video is being widely used in Virtual Reality (VR) applications.
  • Immersive video involves capturing a scene using multiple cameras to cover a panoramic view, such as a 360-degree field of view.
  • the immersive camera usually uses a panoramic camera or a set of cameras arranged to capture a 360-degree field of view. Typically, two or more cameras are used for the immersive camera. All videos must be taken simultaneously, and separate fragments (also called separate perspectives) of the scene are recorded. Furthermore, the set of cameras is often arranged to capture views horizontally, while other arrangements of the cameras are possible.
  • the 360-degree virtual reality (VR) images may be captured using a 360-degree spherical panoramic camera or multiple images arranged to cover the full 360-degree field of view.
  • the three-dimensional (3D) spherical image is difficult to process or store using the conventional image/video processing devices. Therefore, the 360-degree VR images are often converted to a two-dimensional (2D) format using a 3D-to-2D projection method.
  • equirectangular projection (ERP) and cubemap projection (CMP) are commonly used projection methods. Accordingly, a 360-degree image can be stored in an equirectangular projected format.
  • the equirectangular projection maps the entire surface of a sphere onto a flat image.
  • Fig. 1 illustrates an example of projecting a sphere 110 into a rectangular image 120 according to equirectangular projection (ERP) , where each longitude line is mapped to a vertical line of the ERP picture.
  • the areas near the north and south poles of the sphere are stretched more severely (i.e., from a single point to a line) than areas near the equator.
  • due to distortions introduced by the stretching, especially near the two poles, predictive coding tools often fail to make good predictions, causing a reduction in coding efficiency.
  • FIG. 2 illustrates a cube 210 with six faces, where a 360-degree virtual reality (VR) image can be projected to the six faces on the cube according to cubemap projection (CMP) .
  • the example shown in Fig. 2 divides the six faces into two parts (220a and 220b) , where each part consists of three connected faces.
  • the two parts can be unfolded into two strips (230a and 230b) , where each strip corresponds to a continuous-face picture.
  • the two strips can be combined into a compact rectangular frame according to a selected layout format.
  • both the ERP and CMP formats have been included in the projection format conversion considered for next-generation video coding, as described in JVET-F1003 (Y. Ye, et al., “Algorithm descriptions of projection format conversion and video quality metrics in 360Lib” , Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 6th Meeting: Hobart, AU, 31 March–7 April 2017, Document: JVET-F1003) .
  • besides the ERP and CMP formats, various other VR projection formats, such as Adjusted Cubemap Projection (ACP) , Equal-Area Projection (EAP) , Octahedron Projection (OHP) , Icosahedron Projection (ISP) , Segmented Sphere Projection (SSP) and Rotated Sphere Projection (RSP) , are widely used in the field.
  • Fig. 3 illustrates an example of octahedron projection (OHP) , where a sphere is projected onto faces of an 8-face octahedron 310.
  • the eight faces 320 lifted from the octahedron 310 can be converted to an intermediate format 330 by cutting open the face edge between faces 1 and 5 and rotating faces 1 and 5 to connect to faces 2 and 6 respectively, and applying a similar process to faces 3 and 7.
  • the intermediate format can be packed into a rectangular picture 340.
  • Fig. 4 illustrates an example of icosahedron projection (ISP) , where a sphere is projected onto faces of a 20-face icosahedron 410.
  • the twenty faces 420 from the icosahedron 410 can be packed into a rectangular picture 430 (referred to as a projection layout) .
  • Segmented sphere projection has been disclosed in JVET-E0025 (Zhang et al., “AHG8: Segmented Sphere Projection for 360-degree video” , Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 5th Meeting: Geneva, CH, 12–20 January 2017, Document: JVET-E0025) as a method to convert a spherical image into an SSP format.
  • Fig. 5 illustrates an example of segmented sphere projection, where a spherical image 500 is mapped into a North Pole image 510, a South Pole image 520 and an equatorial segment image 530.
  • the boundaries of 3 segments correspond to latitudes 45°N (502) and 45°S (504) , where 0° corresponds to the equator (506) .
  • the North and South Poles are mapped into 2 circular areas (i.e., 510 and 520) , and the projection of the equatorial segment can be the same as ERP or equal-area projection (EAP) .
  • the diameter of the circle is equal to the width of the equatorial segments because both Pole segments and equatorial segment have a 90° latitude span.
  • the North Pole image 510, South Pole image 520 and the equatorial segment image 530 can be packed into a rectangular image.
  • Fig. 6 illustrates an example of rotated sphere projection (RSP) , where the sphere 610 is partitioned into a middle 270°x90° region 620 and a residual part 622. Each part of RSP can be further stretched on the top side and the bottom side to generate a deformed part having an oval shape. The two oval-shaped parts can be fitted into a rectangular frame 630 as shown in Fig. 6.
  • the Adjusted Cubemap Projection (ACP) format is based on the CMP. If the two-dimensional coordinate (u’, v’) for CMP is determined, the two-dimensional coordinate (u, v) for ACP can be calculated by adjusting (u’, v’) according to a pair of adjustment equations.
  • the 3D coordinates (X, Y, Z) can be derived using a table given the position (u, v) and the face index f.
  • the (u’, v’) and face index f can be calculated according to a table for CMP.
  • the 2D coordinates for ACP can be calculated according to a set of equations.
  • similar to ERP, the EAP also maps a sphere surface to one face. In the (u, v) plane, u and v are in the range [0, 1] . For 2D-to-3D coordinate conversion, given the sampling position (m, n) , 2D coordinates (u, v) are first calculated in the same way as ERP. Then, the longitude and latitude (φ, θ) on the sphere can be calculated from (u, v) as φ = (u − 0.5) * 2π and θ = sin⁻¹ (1.0 − 2*v) .
  • inversely, the longitude and latitude (φ, θ) can be evaluated from the (X, Y, Z) coordinates using φ = tan⁻¹ (−Z/X) and θ = sin⁻¹ (Y / (X² + Y² + Z²)^(1/2)) .
  • Inter prediction has been a powerful coding tool that exploits inter-frame redundancy using motion estimation/compensation. If conventional Inter prediction is applied to the 2D frames converted from a 3D space, the motion estimation/compensation techniques may not work properly, since an object in the 3D space may become distorted or deformed in the 2D frames due to object movement or relative motion between an object and a camera. Accordingly, various Inter prediction techniques are developed to improve the accuracy of Inter prediction for 2D frames converted from a 3D space.
  • Methods and apparatus of processing 360-degree virtual reality images are disclosed.
  • input data for a current block in a 2D (two-dimensional) frame are received, where the 2D frame is projected from a 3D (three-dimensional) sphere.
  • a motion vector associated with a source block in the 2D frame is determined, where the motion vector points from a source location in the source block to a destination location in the 2D frame.
  • the source location, the destination location and the source block in the 2D frame are projected onto the 3D sphere according to a target projection.
  • the source block in the 3D sphere is rotated along a rotation circle on a surface of the 3D sphere around a rotation axis to generate a deformed reference block in the 3D sphere.
  • the deformed reference block in the 3D sphere is mapped back to the 2D frame according to an inverse target projection.
  • the current block in the 2D frame is then encoded or decoded using the deformed reference block in the 2D frame as a predictor.
  • the rotation circle corresponds to a largest circle on the surface of the 3D sphere. In another embodiment, the rotation circle is smaller than a largest circle on the surface of the 3D sphere.
  • the rotation circle on the surface of the 3D sphere around a rotation axis is determined according to the source location and the destination location on the 3D sphere.
  • a rotation axis a and a rotation angle θ_a associated with the rotation circle are derived according to θ_a = arccos (s_c · d_c) and a = (s_c × d_c) / |s_c × d_c| , wherein s_c and d_c correspond to the source location and the destination location on a surface of the 3D sphere respectively.
  • a rotation axis and a rotation angle associated with the rotation circle are derived based on motion vectors in a reference frame.
  • the rotation axis a and the rotation angle θ′ associated with the rotation circle are derived according to (a, θ′) = argmin_(a, θ′) Σ_i || mv (s_i) − mv_(a, θ′) (s_i) ||_F , wherein s_i corresponds to one source block in the reference frame, mv (s_i) corresponds to a motion vector of source block s_i, mv_(a, θ′) (s_i) corresponds to one motion vector caused by rotating one location in source block s_i by the rotation angle θ′ around the rotation axis a, and ||·||_F is the Frobenius norm.
  • the rotation axis and a rotation angle associated with the rotation circle are derived based on motion vectors in an already coded region in a current frame according to the same equation shown above.
  • the rotation axis associated with the rotation circle can be pre-defined or the rotation axis can be indicated in a bitstream for indicating a path of rotation.
  • the target projection corresponds to Equirectangular Projection (ERP) , Cubemap Projection (CMP) , Adjusted Cubemap Projection (ACP) , Equal-Area Projection (EAP) , Octahedron Projection (OHP) , Icosahedron Projection (ISP) , Segmented Sphere Projection (SSP) , Rotated Sphere Projection (RSP) , or Cylindrical Projection (CLP) .
  • deformation based on displacement of camera is applied to generate an Inter predictor.
  • two 2D (two-dimensional) frames corresponding to two different viewpoints are received, where said two 2D frames are projected, using a target projection, from a 3D (three-dimensional) sphere, and wherein a current block, a predicted block for the current block and a neighbouring block are located in said two 2D frames.
  • a forward point of camera is determined based on said two 2D frames and moving flows in said two 2D frames are determined.
  • One or more second motion vectors associated with the predicted block are derived either by referring to one or more first motion vectors of the neighbouring block based on the forward point of camera and the moving flows or according to velocity of camera and depth of background.
  • An Inter prediction block is derived based on the predicted block and said one or more second motion vectors.
  • the current block in the 2D frame is encoded or decoded using the Inter prediction block.
  • Said deriving one or more second motion vectors associated with the predicted block may comprise determining displacement of camera based on said one or more first motion vectors associated with the neighbouring block. Said one or more second motion vectors associated with the predicted block can be derived from said one or more first motion vectors based on the displacement of camera and the moving flows.
  • Fig. 1 illustrates an example of projecting a sphere into a rectangular image according to equirectangular projection, where each longitude line is mapped to a vertical line of the ERP picture.
  • Fig. 2 illustrates a cube with six faces, where a 360-degree virtual reality (VR) image can be projected to the six faces on the cube according to cubemap projection (CMP) .
  • Fig. 3 illustrates an example of octahedron projection (OHP) , where a sphere is projected onto faces of an 8-face octahedron.
  • Fig. 4 illustrates an example of icosahedron projection (ISP) , where a sphere is projected onto faces of a 20-face icosahedron.
  • Fig. 5 illustrates an example of segmented sphere projection (SSP) , where a spherical image is mapped into a North Pole image, a South Pole image and an equatorial segment image.
  • Fig. 6 illustrates an example of rotated sphere projection (RSP) , where the sphere is partitioned into a middle 270°x90° region and a residual part. These two parts of RSP can be further stretched on the top side and the bottom side to generate deformed parts having oval-shaped boundary on the top part and bottom part.
  • Fig. 7 illustrates a case that deformation occurs in an ERP frame due to movement, where the North Pole is mapped to a horizontal line on the top of the frame and the equator is mapped to a horizontal line in the middle of the frame.
  • Fig. 8 illustrates an example of deformation caused by movement in the 3D sphere for the ERP frame.
  • Fig. 9A to Fig. 9I illustrate examples of deformation in 2D frames projected using various projections.
  • Fig. 9A is for an ERP frame
  • Fig. 9B is for a CMP frame
  • Fig. 9C is for an SSP frame
  • Fig. 9D is for an OHP frame
  • Fig. 9E is for an ISP frame
  • Fig. 9F is for an EAP frame
  • Fig. 9G is for an ACP frame
  • Fig. 9H is for an RSP frame
  • Fig. 9I is for a Cylindrical Projection frame.
  • Fig. 10 illustrates the concept of Inter prediction that takes deformation based on rotation into account, according to a method of the present invention.
  • Fig. 11 illustrates examples in which a source block is moved to a destination location through different paths. Due to the different paths, the block at the destination location may have different orientations.
  • Fig. 12A illustrates an example to describe the 3D movement on a sphere by the yaw, pitch and roll rotations, where a source block is moved to a destination location as prescribed by the three axes.
  • Fig. 12B illustrates an example to describe the 3D movement on a sphere by rotating around a big circle, where a source block is moved to a destination location along a big circle.
  • Fig. 12C illustrates an example to describe the 3D movement on a sphere by rotating around a small circle, where a source block is moved to a destination location along a small circle.
  • Fig. 12D illustrates yet another way of describing object movement on the surface of the sphere, where a source block is first moved to the destination block by rotating along a big circle 1253 on the sphere around a rotation axis, and the destination block is then rotated around another axis.
  • Fig. 13 illustrates an exemplary procedure for Inter prediction that takes deformation based on rotation into account.
  • Fig. 14 illustrates the step of generating samples for a destination block by rotating samples in a source block by a rotation angle and axis of rotation after the rotation angle and axis of rotation are determined.
  • Fig. 15 compares an exemplary Inter prediction procedure according to a method of the present invention, which takes deformation based on rotation into account, with conventional Inter prediction.
  • Fig. 16 compares the two different deformation methods based on rotation, where the upper part corresponds to a case of rotation along a big circle and the lower one corresponds to rotation around a new rotation axis.
  • Fig. 17 illustrates a method to derive the axis of rotation using motion vectors associated with blocks of a reference picture.
  • Fig. 18 illustrates a method to derive the axis of rotation using motion vectors associated with processed blocks in the current picture.
  • Fig. 19A illustrates an exemplary process of the deformation based on displacement of camera, where an example of an object (i.e., a tree) is projected onto the surface of a sphere at different camera locations.
  • Fig. 19B illustrates the locations of an object projected onto a 2D frame for different camera locations.
  • Fig. 20 illustrates an example of an ERP frame overlaid with the pattern of moving flows, where the flow of background (i.e., static object) can be determined if the camera forward point is known.
  • Fig. 21 illustrates an example of the pattern of moving flows based on displacement of viewpoint for a CMP frame in the 2x3 layout format.
  • Fig. 22 illustrates an example of using deformation based on displacement of camera for Inter prediction.
  • Fig. 23 illustrates an exemplary procedure of using deformation based on displacement of camera for Inter prediction.
  • Fig. 24 illustrates exemplary moving flows in the 2D frame for various projection formats.
  • Fig. 25 illustrates an exemplary flowchart of a system that applies rotation of sphere to deform a reference block for Inter prediction according to a method of the present invention.
  • Fig. 26 illustrates an exemplary flowchart of a system that applies displacement of camera to deform a reference block for Inter prediction according to a method of the present invention.
  • motion estimation/compensation is widely used to explore correlation in video data in order to reduce transmitted information.
  • the conventional video contents correspond to 2D video data and the motion estimation and compensation techniques often assume a translational motion.
  • more advanced motion models such as affine model, are considered. Nevertheless, these techniques derive a 3D motion model based on the 2D images.
  • an ERP frame 720 is generated by projecting a 3D sphere 710 onto a rectangular frame as shown in Fig. 7, where the North Pole 712 is mapped to a horizontal line 722 on the top of the frame and the equator 714 is mapped to a horizontal line 724 in the middle of the frame.
  • Map 730 illustrates the effect of deformation in ERP, where a circle close to the North Pole or South Pole is mapped to an oval-shaped area, while a circle in the middle of the frame remains a circle. The horizontal stretch grows with latitude, as the short check below illustrates.
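The horizontal stretch factor in ERP is 1/cos (latitude) , since every latitude circle is mapped to the full frame width; a quick numeric check:

```python
import numpy as np

# ERP maps every latitude circle to the full frame width, so a feature's
# horizontal magnification at latitude `lat` is 1/cos(lat): 1x at the
# equator and unbounded toward the poles -- which is why circles near the
# poles in map 730 turn into oval shapes.
for lat_deg in (0, 45, 80):
    stretch = 1.0 / np.cos(np.radians(lat_deg))
    print(f"latitude {lat_deg:2d} deg: horizontal stretch = {stretch:.2f}x")
```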
  • Fig. 8 illustrates an example of deformation caused by movement in the 3D sphere for the ERP frame.
  • block 801 on a 3D sphere 800 is moved to become block 802.
  • the two corresponding blocks become blocks 804 and 805 in the ERP frame.
  • while the blocks (i.e., 801 and 802) on the 3D sphere correspond to the same block, the two blocks (i.e., 804 and 805) have different shapes in the ERP frame.
  • Fig. 9A to Fig. 9I illustrate examples of deformation in 2D frames projected using various projections.
  • Fig. 9A illustrates an example for an ERP frame 910, where a square block 912 becomes deformed when the block is moved to a location 914 closer to the North Pole.
  • Fig. 9B illustrates an example for a CMP frame 920, where a square block 922 becomes deformed when the block is moved to a location 924.
  • Fig. 9C illustrates an example for an SSP frame 930, where a square block 932 becomes deformed when the block is moved to a location 934.
  • Fig. 9D illustrates an example for an OHP frame 940, where a square block 942 becomes deformed when the block is moved to a location 944.
  • FIG. 9E illustrates an example for an ISP frame 950, where a square block 952 becomes deformed when the block is moved to a location 954.
  • Fig. 9F illustrates an example for an EAP frame 970, where a square block 972 becomes deformed when the block is moved to a location 974.
  • Fig. 9G illustrates an example for an ACP frame 980, where a square block 982 becomes deformed when the block is moved to a location 984.
  • Fig. 9H illustrates an example for an RSP frame 980, where a square block 982 becomes deformed when the block is moved to a location 984.
  • Fig. 9I illustrates an example for a Cylindrical Projection frame 990, where a square block 992 becomes deformed when the block is moved to a location 994.
  • object motion in a 3D space may cause object deformation in a 2D frame projected from the 3D sphere.
  • various methods of Inter prediction for VR Video Processing are disclosed.
  • a method to handle the deformation issue for Inter prediction is to project a block in the 2D frame back to the 3D sphere.
  • the block may be a corresponding block in a reference picture prior to motion compensation for a current block.
  • the corresponding block is moved in the 2D frame according to the motion vector to point to a reference block and the reference block is used as Inter predictor for the current block.
  • the block is moved to a designated location on the surface of the 3D sphere.
  • the block is moved to a new location by rotating the sphere.
  • the object moved on the surface of the 3D sphere is projected back to the 2D frame.
  • Fig. 10 illustrates the concept of Inter prediction that takes deformation based on rotation into account.
  • block 1013 corresponds to a source block in a 2D frame 1010.
  • Motion vector 1015 of the source block points from a location s_c 1012 in the source block 1013 to a destination location d_c 1014.
  • the data in the 2D frame are projected to the 3D sphere according to a corresponding projection type.
  • the 2D frame is generated from ERP
  • the ERP projection is used to project data in the 2D frame to the 3D sphere.
  • location s_c 1012 and location d_c 1014 in the 2D frame are projected to locations 1022 and 1024 in the 3D sphere 1020 respectively.
  • location 1022 is rotated to location 1024.
  • the same rotation is also applied to other locations of the source block 1013 to generate the destination block.
  • the data in the destination block in the 3D sphere are then projected back to the 2D frame using an inverse ERP projection.
  • Fig. 11 illustrates examples in which a source block is moved to a destination location through different paths. Due to the different paths, the block at the destination location may have different orientations.
  • the source block 1112 is moved to the destination location 1114 through path 1113 with slight right turn.
  • the source block 1122 is moved to the destination location 1124 through straight path 1123.
  • the source block 1132 is moved to the destination location 1134 through path 1133 with slight left turn.
  • a source block 1212 is moved to a destination location 1214.
  • the axes of yaw 1216, pitch 1217 and roll 1218 are shown.
  • Another way to describe the 3D movement on a sphere 1220 is rotation around a big circle, as shown in Fig. 12B, where a source block 1222 is moved along a big circle 1221 to a destination location 1224.
  • the rotation 1226 is shown.
  • the big circle 1221 corresponds to the largest circle on the surface of the sphere 1220.
  • Fig. 12C illustrates an example of rotation of the sphere 1230 from a source block 1232 to a destination block 1234 along a small circle 1233, where the small circle 1233 corresponds to a circle smaller than the largest circle (e.g. circle 1236) on the surface of the sphere 1235.
  • the centre point of rotation is shown as a dot 1237 in Fig. 12C.
  • the axis of rotation is shown as an arrow 1247 in Fig. 12C.
  • Fig. 12D illustrates yet another way of describing object movement on the surface of sphere 1250, where source block 1252 is first moved to destination block 1254 by rotating along a big circle 1253 on the sphere 1250 around axis-a 1256. After the destination block reaches the final location, the destination block is rotated around axis-b 1257, where axis-b points from the centre of the big circle 1258 to the centre of destination block 1254.
  • Fig. 13 illustrates an exemplary procedure for Inter prediction that takes deformation based on rotation into account.
  • block 1313 corresponds to a source block in a 2D frame 1310.
  • Motion vector 1315 of the source block points from a location s_c 1312 in the source block 1313 to a destination location d_c 1314.
  • the data in the 2D frame are projected to the 3D sphere according to a corresponding projection type.
  • the 2D frame is generated from ERP
  • the ERP projection is used to project data in the 2D frame to the 3D sphere.
  • location s_c 1312 and location d_c 1314 in the 2D frame are projected to locations 1322 and 1324 in the 3D sphere 1320 respectively.
  • location 1322 is rotated to location 1324 around big circle 1326.
  • the same rotation is also applied to other locations of the source block 1313 to generate the destination block.
  • the rotation angle θ from the projected source location s_c′ to the projected destination location d_c′ is calculated according to θ = arccos (s_c′ · d_c′) .
  • the axis of rotation is calculated as a = (s_c′ × d_c′) / |s_c′ × d_c′| , as sketched below.
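A minimal numpy sketch of this axis/angle derivation, assuming the projected locations are given as unit 3D vectors (the function name is illustrative, not from the patent):

```python
import numpy as np

def rotation_axis_angle(s, d):
    """Axis and angle of the rotation carrying unit vector s to unit
    vector d along the big (great) circle through both points."""
    s = np.asarray(s, dtype=float); s /= np.linalg.norm(s)
    d = np.asarray(d, dtype=float); d /= np.linalg.norm(d)
    axis = np.cross(s, d)
    n = np.linalg.norm(axis)
    if n < 1e-12:                       # coincident or antipodal points
        return None, 0.0
    angle = np.arccos(np.clip(np.dot(s, d), -1.0, 1.0))
    return axis / n, angle
```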
  • samples s_mn in a block 1410 of the 2D frame are identified as shown in Fig. 14.
  • samples s_mn are mapped to samples 1422 in the 3D sphere 1420.
  • the samples are rotated by θ around the a-axis to obtain samples 1424 at the destination location according to Rodrigues' rotation formula: v_rot = v cos θ + (a × v) sin θ + a (a · v) (1 − cos θ) .
  • if a second rotation is specified (e.g. around axis-b in Fig. 12D) , the samples will be further rotated; otherwise, these are the final rotated samples in the 3D sphere. A sketch of this rotation step follows.
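The Rodrigues step itself can be sketched as below for an N x 3 array of sample positions on the sphere (a sketch, not the patent's implementation):

```python
import numpy as np

def rodrigues_rotate(samples, axis, theta):
    """Rotate an N x 3 array of sphere points by angle theta around the
    unit vector axis: v' = v cos t + (a x v) sin t + a (a.v)(1 - cos t)."""
    a = np.asarray(axis, dtype=float)
    a /= np.linalg.norm(a)
    v = np.atleast_2d(np.asarray(samples, dtype=float))
    return (v * np.cos(theta)
            + np.cross(a, v) * np.sin(theta)
            + np.outer(v @ a, a) * (1.0 - np.cos(theta)))
```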
  • the rotated samples 1512 in 3D sphere 1510 are projected back to a deformed block 1514 in the 2D frame and used as a new Inter predictor for a source block 1516 as shown in Fig. 15.
  • the source block 1516 may be a corresponding block in a reference picture prior to motion compensation for a current block.
  • the destination block 1514 corresponds to a deformed reference block for Inter prediction according to the present invention.
  • the source block 1526 is moved in the 2D frame according to the motion vector to point to a reference block 1524 and the reference block is used as Inter predictor for the current block.
  • the Inter predictor 1522 for a source block 1526 in the 2D frame maintains its shape. In the 3D space 1520, the Inter predictor 1522 becomes deformed. Therefore, the conventional Inter predictor does not perform properly due to deformation caused by movement in 3D space.
  • in Method 1, the rotation axis is the normal vector of the big circle.
  • in Method 2, a new rotation axis is used.
  • Fig. 16 compares the two different deformation methods based on rotation. The upper part corresponds to the case of Method 1, where a source block 1612 in the 2D frame 1610 is mapped to block 1622 in 3D sphere 1620. A motion vector 1616 in the 2D frame is mapped to the 3D sphere 1620 for determining the location of the destination block 1624. The source block 1622 is then rotated along the big circle 1626 around its rotation axis to generate the destination block 1624. The destination block 1624 is then mapped back to the 2D frame 1610 to generate the deformed block 1614 as an Inter predictor for the source block 1612.
  • the lower part corresponds to the deformation based on rotation according to Method 2, where the source block 1632 in the 2D frame 1630 is mapped to block 1642 in 3D sphere 1640.
  • a motion path 1636 in the 2D frame is mapped to the 3D sphere 1640 for determining the location of the destination block 1644.
  • the source block 1642 is then rotated along the small circle 1646 with a new rotation axis to generate the destination block 1644.
  • the destination block 1644 is then mapped back to the 2D frame 1630 to generate the deformed block 1634 as an Inter predictor for the source block 1632.
  • compared to the example in the upper part of Fig. 16 (i.e., rotation along a big circle) , the method in the lower part of Fig. 16 requires determining the small circle, or equivalently the axis of rotation.
  • a method to derive the axis of rotation is shown in Fig. 17, where s_c is the block centre of a source block 1712 and the mv pointing from s_c to d_c for the block is already known. Both s_c and d_c are mapped to locations s_c′ and d_c′ in the 3D sphere 1720 respectively.
  • a motion vector from s_c′ to d_c′ can be determined in the 3D sphere.
  • the motion vector mapping to the 3D sphere can be applied to all source blocks of the 2D frame as shown in 3D sphere 1730.
  • An axis of rotation a and an angle of rotation θ′ that rotate each motion vector mv (s_i) to mv_(a, θ′) (s_i) are selected based on a performance criterion, i.e. (a, θ′) = argmin_(a, θ′) Σ_i || mv (s_i) − mv_(a, θ′) (s_i) ||_F .
  • the true mv of s_i is mv (s_i) , and mv_(a, θ′) (s_i) is the motion vector caused by rotating by θ′ around axis a at s_i. A brute-force sketch of this search follows.
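A brute-force sketch of this selection criterion. The search grid, the names, and the modelling of a motion vector as mv (s_i) = rotate (s_i) − s_i on the unit sphere are assumptions for illustration; an encoder could use a finer search or a closed-form fit:

```python
import numpy as np
from itertools import product
from scipy.spatial.transform import Rotation

def estimate_rotation(points, mvs, n_dir=16, n_ang=33):
    """Search for the (axis, angle) minimizing the Frobenius norm of
    mv(s_i) - mv_axis,angle(s_i), where the rotation-induced motion of
    point p is modelled as rotate(p) - p."""
    pts = np.asarray(points, dtype=float)
    mvs = np.asarray(mvs, dtype=float)
    best, best_cost = (None, 0.0), np.inf
    lats = np.linspace(-np.pi / 2, np.pi / 2, n_dir)
    lons = np.linspace(0.0, 2 * np.pi, n_dir, endpoint=False)
    angs = np.linspace(-np.pi / 8, np.pi / 8, n_ang)
    for lat, lon, ang in product(lats, lons, angs):
        axis = np.array([np.cos(lat) * np.cos(lon),
                         np.sin(lat),
                         -np.cos(lat) * np.sin(lon)])
        induced = Rotation.from_rotvec(ang * axis).apply(pts) - pts
        cost = np.linalg.norm(induced - mvs)    # Frobenius norm over blocks
        if cost < best_cost:
            best, best_cost = (axis, ang), cost
    return best
```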
  • Fig. 18 illustrates a similar method to derive the axis of rotation, using motion vectors associated with processed blocks in the current picture instead of motion vectors in a reference picture.
  • Fig. 19A illustrates an example of an object (i.e., a tree) projected onto the surface of a sphere at different camera locations.
  • the tree is projected onto sphere 1910 to form an image 1940 of the tree.
  • the tree is projected onto sphere 1920 to form an image 1950 of the tree.
  • the image 1941 of the tree corresponding to camera position A is also shown on sphere 1920 for comparison.
  • the tree is projected onto sphere 1930 to form an image 1960 of the tree.
  • Image 1942 of the tree corresponding to camera position A and image 1951 of the tree corresponding to camera position B are also shown on sphere 1930 for comparison.
  • the direction of movement of the camera in 3D space can be represented by latitude and longitude coordinates, which correspond to the intersection of the motion vector and the 3D sphere.
  • this intersection point is projected onto the 2D target projection plane, and the projected point is referred to as a forward point (e.g. 1934) , as sketched below.
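A sketch of locating the forward point for the ERP case, reusing the coordinate conventions of equations (8) and (9) in the description below (the pixel mapping is an assumed convention):

```python
import numpy as np

def forward_point_erp(motion_dir, width, height):
    """Intersect the camera motion direction with the unit sphere and map
    it to ERP pixel coordinates (longitude -> x, latitude -> y)."""
    x, y, z = np.asarray(motion_dir, dtype=float) / np.linalg.norm(motion_dir)
    theta = np.arcsin(y)            # latitude, in the style of eq. (9)
    phi = np.arctan2(-z, x)         # longitude, in the style of eq. (8)
    u = (phi / (2 * np.pi) + 0.5) * width
    v = (0.5 - theta / np.pi) * height
    return u, v
```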
  • Fig. 19B illustrates the locations of the tree projected onto a 2D frame 1970 for camera locations A and B in 3D space as shown in Fig. 19A, where the tree image at location 1972 corresponds to camera location A and the tree image at location 1974 corresponds to camera location B.
  • the locations of the tree projected onto a 2D frame 1980 for camera locations B and C in 3D space is also shown in Fig. 19B, where the tree image at location 1982 corresponds to camera location A, the tree image at location 1984 corresponds to camera location B and the tree image at location 1986 corresponds to camera location C.
  • Fig. 20 illustrates an example of an ERP frame overlaid with the pattern of moving flows, where the flow of background (i.e., static object) can be determined if the camera forward point is known.
  • the flows are indicated by arrows.
  • the camera forward point 2010 and camera backward point 2020 are shown.
  • Moving flows correspond to the moving direction of video content when the camera moves in a given direction.
  • the movement of the camera causes relative movement of the static background objects, and the moving direction of a background object in the 2D frame captured by the camera can be represented as moving flows, as sketched below.
  • Multiple frames 2030 may be used to derive the moving flows.
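Geometrically, for a camera translating toward a forward direction f, a static background point at unit direction p appears to drift away from the forward point along the great circle through p and f. A hedged sketch of this flow direction (not the patent's exact formula):

```python
import numpy as np

def background_flow_dir(p, f):
    """Unit tangent at sphere point p of the apparent motion of static
    background when the camera translates toward unit direction f."""
    p = np.asarray(p, dtype=float); p /= np.linalg.norm(p)
    f = np.asarray(f, dtype=float); f /= np.linalg.norm(f)
    t = -(f - np.dot(f, p) * p)     # tangential component of -f at p
    n = np.linalg.norm(t)
    # flow is undefined exactly at the forward and backward points
    return t / n if n > 1e-12 else np.zeros(3)
```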
  • the pattern of moving flows based on displacement of viewpoint can be applied to various projection methods.
  • the moving flow in the 2D frame 2110 is shown for a CMP frame in the 2x3 layout format in Fig. 21.
  • Fig. 22 illustrates an example of using deformation based on displacement of camera for Inter prediction.
  • the moving flows are indicated by arrows.
  • the deformation of a source block (e.g. blocks 2231-2235) can be determined using the moving flows of the background for background objects.
  • Fig. 23 illustrates an exemplary procedure of using deformation based on displacement of camera for Inter prediction. The steps of this exemplary procedure are as follows:
  • the MV of a neighbouring block can determine the camera displacement based on the forward point of the camera and the moving flow of the frame.
  • the camera displacement and moving flow can determine the MV of each pixel of the predicted block.
  • the images for two different camera locations can be captured as shown in arrangement 2310.
  • the moving flows (indicated by arrows) and the camera forward point 2322 in the 2D frame 2320 can be determined.
  • the camera displacement and moving flow can then be used to determine the MV of each pixel of the predicted block 2324. Accordingly, a deformed block 2326 is derived and used as the Inter predictor for the current block 2324, as sketched below.
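A heavily hedged sketch of this step. It assumes a static background at roughly constant depth, models the apparent angular speed at point p as proportional to the camera displacement times sin (angular distance from the forward point f) divided by depth, calibrates the displacement from one neighbouring MV, and then predicts each pixel's MV along its flow line. All names and the speed model are illustrative assumptions, not the patent's equations:

```python
import numpy as np

def predict_block_mvs(block_pts, neigh_pt, neigh_mv, f, depth=1.0):
    """Per-pixel MVs of a predicted block from one neighbouring block's MV."""
    f = np.asarray(f, dtype=float); f /= np.linalg.norm(f)

    def flow(p):                        # unit flow direction at p (away from f)
        t = -(f - np.dot(f, p) * p)
        n = np.linalg.norm(t)
        return t / n if n > 1e-12 else np.zeros(3)

    def speed(p, disp):                 # assumed: |mv| ~ disp * sin(dist to f) / depth
        return disp * np.linalg.norm(np.cross(p, f)) / depth

    # camera displacement that reproduces the neighbouring block's observed MV
    disp = np.linalg.norm(neigh_mv) / max(speed(neigh_pt, 1.0), 1e-12)
    return [speed(p, disp) * flow(p) for p in np.asarray(block_pts, dtype=float)]
```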
  • the deformation based on displacement of camera for Inter prediction can be applied to various projections.
  • the moving flow in the 2D frame can be mapped to 3D sphere.
  • the moving flow on the 3D sphere 2410 is shown in Fig. 24, where the forward point and two different lines of moving flow (2412 and 2414) are shown.
  • the moving flow on 3D sphere associated with ERP 2420 is shown in Fig. 24, where the moving flows are shown for an ERP frame 2426.
  • the moving flow on 3D sphere associated with CMP 2430 is shown in Fig. 24, where the moving flows are shown for six faces of a CMP frame 2436 in a 2x3 layout format.
  • the moving flow on 3D sphere associated with OHP 2440 is shown in Fig. 24, where the moving flows are shown for eight faces of an OHP frame 2446.
  • the moving flow on 3D sphere associated with ISP 2450 is shown in Fig. 24, where the moving flows are shown for twenty faces of an ISP frame 2456.
  • the moving flow on 3D sphere associated with SSP 2460 is shown in Fig. 24, where the moving flows are shown for segmented faces of an SSP frame 2466.
  • Fig. 25 illustrates an exemplary flowchart of a system that applies rotation of sphere to deform a reference block for Inter prediction according to a method of the present invention.
  • the steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side.
  • the steps shown in the flowchart may also be implemented based on hardware, such as one or more electronic devices or processors arranged to perform the steps in the flowchart.
  • input data for a current block in a 2D (two-dimensional) frame are received in step 2510, where the 2D frame is projected from a 3D (three-dimensional) sphere.
  • a motion vector associated with a source block in the 2D frame is determined in step 2520, where the motion vector points from a source location in the source block to a destination location in the 2D frame.
  • the source location, the destination location and the source block in the 2D frame are projected onto the 3D sphere according to a target projection in step 2530.
  • the source block in the 3D sphere is rotated along a rotation circle on a surface of the 3D sphere around a rotation axis to generate a deformed reference block in the 3D sphere in step 2540.
  • the deformed reference block in the 3D sphere is mapped back to the 2D frame according to an inverse target projection in step 2550.
  • the current block in the 2D frame is encoded or decoded using the deformed reference block in the 2D frame as an Inter predictor in step 2560.
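Putting steps 2510 to 2560 together for the ERP case, the sketch below generates the deformed reference block: project the block and the motion vector endpoints onto the sphere, rotate along the big circle, and map back to the 2D frame. Helper names are illustrative, and a real codec would additionally resample the rotated positions onto the pixel grid:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def erp_to_sphere(u, v, w, h):
    """ERP pixel -> unit vector (longitude and latitude linear in u, v)."""
    phi = (u / w - 0.5) * 2 * np.pi
    theta = (0.5 - v / h) * np.pi
    return np.array([np.cos(theta) * np.cos(phi),
                     np.sin(theta),
                     -np.cos(theta) * np.sin(phi)])

def sphere_to_erp(p, w, h):
    """Unit vector -> ERP pixel (inverse of erp_to_sphere)."""
    x, y, z = p / np.linalg.norm(p)
    phi, theta = np.arctan2(-z, x), np.arcsin(y)
    return (phi / (2 * np.pi) + 0.5) * w, (0.5 - theta / np.pi) * h

def deformed_reference(block_uv, src_uv, dst_uv, w, h):
    """Steps 2530-2550: project the block and the MV endpoints onto the
    sphere, rotate along the big circle, and map back to the ERP frame."""
    s = erp_to_sphere(*src_uv, w, h)
    d = erp_to_sphere(*dst_uv, w, h)
    axis = np.cross(s, d)
    axis /= np.linalg.norm(axis)                  # assumes s != d
    angle = np.arccos(np.clip(np.dot(s, d), -1.0, 1.0))
    rot = Rotation.from_rotvec(angle * axis)
    pts = np.array([erp_to_sphere(u, v, w, h) for u, v in block_uv])
    return [sphere_to_erp(p, w, h) for p in rot.apply(pts)]
```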
  • Fig. 26 illustrates an exemplary flowchart of a system that applies displacement of camera to deform a reference block for Inter prediction according to a method of the present invention.
  • Two 2D (two-dimensional) frames corresponding to two different viewpoints are received in step 2610, where said two 2D frames are projected, using a target projection, from a 3D (three-dimensional) sphere, and wherein a current block, a predicted block for the current block and a neighbouring block are located in said two 2D frames.
  • a forward point of camera is determined based on said two 2D frames in step 2620. Moving flows in said two 2D frames are determined in step 2630.
  • One or more second motion vectors associated with the predicted block are derived either by referring to one or more first motion vectors of the neighbouring block based on the forward point of camera and the moving flows or according to velocity of camera and depth of background in step 2640.
  • An Inter prediction block is derived based on the predicted block and said one or more second motion vectors in step 2650.
  • the current block in the 2D frame is encoded or decoded using the Inter prediction block in step 2660.
  • Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both.
  • an embodiment of the present invention can be one or more electronic circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein.
  • An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein.
  • DSP Digital Signal Processor
  • the invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or field programmable gate array (FPGA) .
  • These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention.
  • the software code or firmware code may be developed in different programming languages and different formats or styles.
  • the software code may also be compiled for different target platforms.
  • different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Methods and apparatus of processing 360-degree virtual reality images are disclosed. According to one method, deformation along a circle on the sphere is used for Inter prediction of a 2D frame that is projected from a 3D space. A source block in a 2D frame is projected onto a 3D sphere. The source block on the 3D sphere is then rotated to a destination block, which is projected back to the 2D frame and used as an Inter predictor. In one embodiment, the rotation axis can be derived based on motion vectors associated with samples or blocks in a reference picture. In another embodiment, the rotation axis can be derived based on motion vectors associated with processed samples or blocks in the current picture. According to another method, deformation is derived from viewpoint displacement.

Description

METHOD AND APPARATUS OF INTER PREDICTION FOR IMMERSIVE VIDEO CODING
CROSS REFERENCE TO RELATED APPLICATIONS
The present invention claims priority to U.S. Provisional Patent Application, Serial No. 62/523,883, filed on June 23, 2017 and U.S. Provisional Patent Application, Serial No. 62/523,885, filed on June 23, 2017. The U.S. Provisional Patent Applications are hereby incorporated by reference in their entireties.
FIELD OF THE INVENTION
The present invention relates to image/video processing or coding for 360-degree virtual reality (VR) images/sequences. In particular, the present invention relates to Inter prediction for three-dimensional (3D) contents in various projection formats.
BACKGROUND AND RELATED ART
The 360-degree video, also known as immersive video, is an emerging technology that can provide a “sense of presence” . The sense of immersion is achieved by surrounding a user with a wrap-around scene covering a panoramic view, in particular, a 360-degree field of view. The “sense of presence” can be further improved by stereographic rendering. Accordingly, panoramic video is being widely used in Virtual Reality (VR) applications.
Immersive video involves capturing a scene using multiple cameras to cover a panoramic view, such as a 360-degree field of view. The immersive camera usually uses a panoramic camera or a set of cameras arranged to capture a 360-degree field of view. Typically, two or more cameras are used for the immersive camera. All videos must be taken simultaneously, and separate fragments (also called separate perspectives) of the scene are recorded. Furthermore, the set of cameras is often arranged to capture views horizontally, while other arrangements of the cameras are possible.
The 360-degree virtual reality (VR) images may be captured using a 360-degree spherical panoramic camera or multiple images arranged to cover the full 360-degree field of view. The three-dimensional (3D) spherical image is difficult to process or store using conventional image/video processing devices. Therefore, the 360-degree VR images are often converted to a two-dimensional (2D) format using a 3D-to-2D projection method. For example, equirectangular projection (ERP) and cubemap projection (CMP) are commonly used projection methods. Accordingly, a 360-degree image can be stored in an equirectangular projected format. The equirectangular projection maps the entire surface of a sphere onto a flat image. The vertical axis is latitude and the horizontal axis is longitude. Fig. 1 illustrates an example of projecting a sphere 110 into a rectangular image 120 according to equirectangular projection (ERP) , where each longitude line is mapped to a vertical line of the ERP picture. For the ERP projection, the areas near the north and south poles of the sphere are stretched more severely (i.e., from a single point to a line) than areas near the equator. Furthermore, due to distortions introduced by the stretching, especially near the two poles, predictive coding tools often fail to make good predictions, causing a reduction in coding efficiency. Fig. 2 illustrates a cube 210 with six faces, where a 360-degree virtual reality (VR) image can be projected to the six faces on the cube according to cubemap projection (CMP) . There are various ways to lift the six faces off the cube and repack them into a rectangular picture. The example shown in Fig. 2 divides the six faces into two parts (220a and 220b) , where each part consists of three connected faces. The two parts can be unfolded into two strips (230a and 230b) , where each strip corresponds to a continuous-face picture. The two strips can be combined into a compact rectangular frame according to a selected layout format.
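As an illustration of the cube mapping, the sketch below selects the CMP face and in-face coordinates for a viewing direction; the face labels and (u, v) sign conventions are assumptions for illustration, while the normative CMP tables are given in JVET-F1003:

```python
import numpy as np

def cmp_face(p):
    """Map a unit direction p to a cube face label and in-face (u, v)
    in [-1, 1]; the dominant coordinate axis selects the face."""
    x, y, z = np.asarray(p, dtype=float) / np.linalg.norm(p)
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return ('+X' if x > 0 else '-X', -z / x, y / ax)
    if ay >= ax and ay >= az:
        return ('+Y' if y > 0 else '-Y', x / ay, -z / y)
    return ('+Z' if z > 0 else '-Z', x / az, y / az)
```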
Both ERP and CMP formats have been included in the projection format conversion being considered for the next generation video coding as described in JVET-F1003 (Y. Ye, et al., “Algorithm descriptions of projection format conversion and video quality metrics in 360Lib” , Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 6th Meeting: Hobart, AU, 31 March–7 April 2017, Document: JVET-F1003) . Besides the ERP and CMP formats, there are various other VR projection formats, such as Adjusted Cubemap Projection (ACP) , Equal-Area Projection (EAP) , Octahedron Projection (OHP) , Icosahedron Projection (ISP) , Segmented Sphere Projection (SSP) and Rotated Sphere Projection (RSP) that are widely used in the field.
Fig. 3 illustrates an example of octahedron projection (OHP) , where a sphere is projected onto faces of an 8-face octahedron 310. The eight faces 320 lifted from the octahedron 310 can be converted to an intermediate format 330 by cutting open the face edge between faces 1 and 5 and rotating faces 1 and 5 to connect to faces 2 and 6 respectively, and applying a similar process to faces 3 and 7. The intermediate format can be packed into a rectangular picture 340.
Fig. 4 illustrates an example of icosahedron projection (ISP) , where a sphere is projected onto faces of a 20-face icosahedron 410. The twenty faces 420 from the icosahedron 410 can be packed into a rectangular picture 430 (referred to as a projection layout) .
Segmented sphere projection (SSP) has been disclosed in JVET-E0025 (Zhang et al., “AHG8: Segmented Sphere Projection for 360-degree video” , Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 5th Meeting: Geneva, CH, 12–20 January 2017, Document: JVET-E0025) as a method to convert a spherical image into an SSP format. Fig. 5 illustrates an example of segmented sphere projection, where a spherical image 500 is mapped into a North Pole image 510, a South Pole image 520 and an equatorial segment image 530. The boundaries of 3 segments correspond to latitudes 45°N (502) and 45°S (504) , where 0° corresponds to the equator (506) . The North and South Poles are mapped into 2 circular areas (i.e., 510 and 520) , and the projection of the equatorial segment can be the same as ERP or equal-area projection (EAP) . The diameter of the circle is equal to the width of the equatorial segments because both Pole segments and equatorial segment have a 90° latitude span. The North Pole image 510, South Pole image 520 and the equatorial segment image 530 can be packed into a rectangular image.
Fig. 6 illustrates an example of rotated sphere projection (RSP) , where the sphere 610 is partitioned into a middle 270°x90° region 620 and a residual part 622. Each part of RSP can be further stretched on the top side and the bottom side to generate a deformed part having an oval shape. The two oval-shaped parts can be fitted into a rectangular frame 630 as shown in Fig. 6.
The Adjusted Cubemap Projection format (ACP) is based on the CMP. If the two-dimensional coordinate (u’, v’) for CMP is determined, the two-dimensional coordinate (u, v) for ACP can be calculated by adjusting (u’, v’) according to a set of equations:
[The ACP adjustment equations (1) and (2) appear as images in the original filing and are not reproduced here.]
The 3D coordinates (X, Y, Z) can be derived using a table given the position (u, v) and the face index f. For 3D-to-2D coordinate conversion, given (X, Y, Z) , the (u’, v’) and face index f can be calculated according to a table for CMP. The 2D coordinates for ACP can be calculated according to a set of equations.
Similar to ERP, the EAP also maps a sphere surface to one face. In the (u, v) plane, u and v are in the range [0, 1] . For 2D-to-3D coordinate conversion, given the sampling position (m, n) , 2D coordinates (u, v) are first calculated in the same way as ERP. Then, the longitude and latitude (φ, θ) on the sphere can be calculated from (u, v) as:
φ = (u − 0.5) * (2*π)                           (3)
θ = sin⁻¹ (1.0 − 2*v)                            (4)
Finally, (X, Y, Z) can be calculated using the same equations as used for ERP:
X = cos (θ) cos (φ)                              (5)
Y = sin (θ)                                      (6)
Z = −cos (θ) sin (φ)                             (7)
Inversely, the longitude and latitude (φ, θ) can be evaluated from the (X, Y, Z) coordinates using:
φ = tan⁻¹ (−Z/X)                                 (8)
θ = sin⁻¹ (Y / (X² + Y² + Z²)^(1/2))             (9)
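A short sketch of the EAP 2D-to-3D conversion implementing equations (3) - (7) ; the sampling convention u = (m + 0.5) /W is an assumption in line with common practice:

```python
import numpy as np

def eap_to_sphere(m, n, w, h):
    """EAP sampling position (m, n) -> (X, Y, Z) via equations (3)-(7)."""
    u = (m + 0.5) / w
    v = (n + 0.5) / h
    phi = (u - 0.5) * 2 * np.pi              # eq. (3)
    theta = np.arcsin(1.0 - 2.0 * v)         # eq. (4)
    return np.array([np.cos(theta) * np.cos(phi),    # eq. (5)
                     np.sin(theta),                  # eq. (6)
                     -np.cos(theta) * np.sin(phi)])  # eq. (7)
```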
Since the images or video associated with virtual reality may take a lot of space to store or a lot of bandwidth to transmit, image/video compression is often used to reduce the required storage space or transmission bandwidth. Inter prediction has been a powerful coding tool that exploits inter-frame redundancy using motion estimation/compensation. If conventional Inter prediction is applied to the 2D frames converted from a 3D space, the motion estimation/compensation techniques may not work properly, since an object in the 3D space may become distorted or deformed in the 2D frames due to object movement or relative motion between an object and a camera. Accordingly, various Inter prediction techniques are developed to improve the accuracy of Inter prediction for 2D frames converted from a 3D space.
BRIEF SUMMARY OF THE INVENTION
Methods and apparatus of processing 360-degree virtual reality images are disclosed. According to one method, input data for a current block in a 2D (two-dimensional) frame are received, where the 2D frame is projected from a 3D (three-dimensional) sphere. A motion vector associated with a source block in the 2D frame is determined, where the motion vector points from a source location in the source block to a destination location in the 2D frame. The source location, the destination location and the source block in the 2D frame are projected onto the 3D sphere according to a target projection. The source block in the 3D sphere is rotated along a rotation circle on a surface of the 3D sphere around a rotation axis to generate a deformed reference block in the 3D sphere. The deformed reference block in the 3D sphere is mapped back to the 2D frame according to an inverse target projection. The current block in the 2D frame is then encoded or decoded using the deformed reference  block in the 2D frame as a predictor.
In one embodiment, the rotation circle corresponds to a largest circle on the surface of the 3D sphere. In another embodiment, the rotation circle is smaller than a largest circle on the surface of the 3D sphere.
In one embodiment, the rotation circle on the surface of the 3D sphere around a rotation axis is determined according to the source location and the destination location on the 3D sphere. For example, a rotation axis a and a rotation angle θ_a associated with the rotation circle are derived according to θ_a = arccos (s_c · d_c) and a = (s_c × d_c) / |s_c × d_c| , wherein s_c and d_c correspond to the source location and the destination location on a surface of the 3D sphere respectively. In another embodiment, a rotation axis and a rotation angle associated with the rotation circle are derived based on motion vectors in a reference frame. For example, the rotation axis a and the rotation angle θ’ associated with the rotation circle are derived according to (a, θ’) = argmin_(a, θ’) Σ_i || mv (s_i) − mv_(a, θ’) (s_i) ||_F , wherein s_i corresponds to one source block in the reference frame, mv (s_i) corresponds to a motion vector of source block s_i, mv_(a, θ’) (s_i) corresponds to one motion vector caused by rotating one location in source block s_i by the rotation angle θ’ around the rotation axis a, and ||·||_F is the Frobenius norm. In yet another embodiment, the rotation axis and a rotation angle associated with the rotation circle are derived based on motion vectors in an already coded region in a current frame according to the same equation shown above.
The rotation axis associated with the rotation circle can be pre-defined or the rotation axis can be indicated in a bitstream for indicating a path of rotation.
The target projection corresponds to Equirectangular Projection (ERP) , Cubemap Projection (CMP) , Adjusted Cubemap Projection (ACP) , Equal-Area Projection (EAP) , Octahedron Projection (OHP) , Icosahedron Projection (ISP) , Segmented Sphere Projection (SSP) , Rotated Sphere Projection (RSP) , or Cylindrical Projection (CLP) .
In another method of the present invention, deformation based on displacement of camera is applied to generate an Inter predictor. According to this method, two 2D (two-dimensional) frames corresponding to two different viewpoints are received, where said two 2D frames are projected, using a target projection, from a 3D (three-dimensional) sphere, and wherein a current block, a predicted block for the current block and a neighbouring block are located in said two 2D frames. A forward point of camera is determined based on said two 2D frames and moving flows in said two 2D frames are determined. One or more second motion vectors associated with the predicted block are derived either by referring to one or more first motion vectors of the neighbouring block based on the forward point of camera and the moving flows or according to velocity of camera and depth of background. An Inter prediction block is derived based on the predicted block and said one or more second motion vectors. The current block in the 2D frame is encoded or decoded using the Inter prediction block.
Said deriving one or more second motion vectors associated with the predicted block may comprise determining displacement of camera based on said one or more first motion vectors associated with the neighbouring block. Said one or more second motion vectors associated with the predicted block can be derived from said one or more first motion vectors based on the displacement of camera and the moving flows.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 illustrates an example of projecting a sphere into a rectangular image according to equirectangular projection, where each longitude line is mapped to a vertical line of the ERP picture.
Fig. 2 illustrates a cube with six faces, where a 360-degree virtual reality (VR) image can be projected to the six faces on the cube according to cubemap projection (CMP) .
Fig. 3 illustrates an example of octahedron projection (OHP) , where a sphere is projected onto faces of an 8-face octahedron.
Fig. 4 illustrates an example of icosahedron projection (ISP) , where a sphere is projected onto faces of a 20-face icosahedron.
Fig. 5 illustrates an example of segmented sphere projection (SSP) , where a spherical image is mapped into a North Pole image, a South Pole image and an equatorial segment image.
Fig. 6 illustrates an example of rotated sphere projection (RSP) , where the sphere is partitioned into a middle 270°x90° region and a residual part. These two parts of RSP can be further stretched on the top side and the bottom side to generate deformed parts having oval-shaped boundary on the top part and bottom part.
Fig. 7 illustrates a case that deformation occurs in an ERP frame due to movement, where the North Pole is mapped to a horizontal line on the top of the frame and the equator is mapped to a horizontal line in the middle of the frame.
Fig. 8 illustrates an example of deformation caused by movement in the 3D sphere for the ERP frame.
Fig. 9A to Fig. 9I illustrate examples of deformation in 2D frames projected using various projections, where Fig. 9A is for an ERP frame, Fig. 9B is for a CMP frame, Fig. 9C is for an SSP frame, Fig. 9D is for an OHP frame, Fig. 9E is for an ISP frame, Fig. 9F is for an EAP frame, Fig. 9G is for an ACP frame, Fig. 9H is for an RSP frame, and Fig. 9I is for a Cylindrical Projection frame.
Fig. 10 illustrates the concept of Inter prediction taking into account deformation based on rotation according to a method of the present invention.
Fig. 11 illustrates examples in which a source block is moved to a destination location through different paths. Due to the different paths, the block at the destination location may have different orientations.
Fig. 12A illustrates an example to describe the 3D movement on a sphere by the yaw, pitch and roll rotations, where a source block is moved to a destination location as prescribed by the three axes.
Fig. 12B illustrates an example to describe the 3D movement on a sphere by rotating around a big circle, where a source block is moved to a destination location along a big circle.
Fig. 12C illustrates an example to describe the 3D movement on a sphere by rotating around a small circle, where a source block is moved to a destination location along a small circle.
Fig. 12D illustrates yet another way of describing object movement on the surface of a sphere, where a source block is first moved to a destination block by rotating on a big circle 1253 on the sphere around a rotation axis, and the destination block is then rotated around another axis.
Fig. 13 illustrates an exemplary procedure for Inter prediction taking into account deformation based on rotation.
Fig. 14 illustrates the step of generating samples for a destination block by rotating samples in a source block by a rotation angle and axis of rotation after the rotation angle and axis of rotation are determined.
Fig. 15 compares an exemplary procedure for Inter prediction according to a method of the present invention, taking into account deformation based on rotation, with conventional Inter prediction.
Fig. 16 compares the two different deformation methods based on rotation, where the upper part corresponds to a case of rotation along a big circle and the lower one corresponds to rotation around a new rotation axis.
Fig. 17 illustrates a method to derive the axis of rotation using motion vectors associated with blocks of a reference picture.
Fig. 18 illustrates a method to derive the axis of rotation using motion vectors associated with processed blocks in the current picture.
Fig. 19A illustrates an exemplary process of the deformation based on displacement of camera, where an example of an object (i.e., a tree) is projected onto the surface of a sphere at different camera locations.
Fig. 19B illustrates the locations of an object projected onto a 2D frame for different camera locations.
Fig. 20 illustrates an example of an ERP frame overlaid with the pattern of moving flows, where the flow of background (i.e., static object) can be determined if the camera forward point is known.
Fig. 21 illustrates an example of the pattern of moving flows based on displacement of viewpoint for a CMP frame in the 2x3 layout format.
Fig. 22 illustrates an example of using deformation based on displacement of camera for Inter prediction.
Fig. 23 illustrates an exemplary procedure of using deformation based on displacement of camera for Inter prediction.
Fig. 24 illustrates exemplary moving flows in the 2D frame for various projection formats.
Fig. 25 illustrates an exemplary flowchart of a system that applies rotation of sphere to deform a reference block for Inter prediction according to a method of the present invention.
Fig. 26 illustrates an exemplary flowchart of a system that applies displacement of camera to deform a reference block for Inter prediction according to a method of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.
For conventional Inter prediction in video coding, motion estimation/compensation is widely used to exploit correlation in video data in order to reduce the transmitted information. Conventional video contents correspond to 2D video data, and the motion estimation and compensation techniques often assume a translational motion. In next-generation video coding, more advanced motion models, such as the affine model, are considered. Nevertheless, these techniques derive a 3D motion model based on the 2D images.
When contents are projected from a 3D space onto a 2D frame, deformation occurs for various projection types. For example, an ERP frame 720 is generated by projecting a 3D sphere 710 onto a rectangular frame as shown in Fig. 7, where the North Pole 712 is mapped to a horizontal line 722 on the top of the frame and the equator 714 is mapped to a horizontal line 724 in the middle of the frame. As shown in Fig. 7, a single point at the North Pole and the entire equator are both mapped to horizontal lines of the same length. Therefore, the ERP projection substantially stretches an object in the longitudinal direction when the object is close to the North Pole. Map 730 illustrates the effect of deformation in ERP, where a circle close to the North Pole or South Pole is mapped to an oval-shaped area, while a circle in the middle of the frame remains a circle.
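To make the stretch concrete: every row of an ERP frame has the same pixel width, but a circle of latitude has circumference proportional to cos(latitude), so content is stretched horizontally by 1/cos(latitude). The following is a minimal Python sketch, illustrative only; the frame size W×H and the pixel-to-angle mapping conventions are our assumptions, not part of the disclosure:

```python
import numpy as np

def erp_pixel_to_lonlat(u, v, W, H):
    """Map ERP pixel (u, v) to longitude/latitude, assuming longitude spans
    [-pi, pi) across the width and latitude spans [pi/2, -pi/2] top to bottom."""
    lon = (u + 0.5) / W * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / H * np.pi
    return lon, lat

def horizontal_stretch(lat):
    """ERP stretch factor in the longitudinal direction at a given latitude:
    a latitude circle has circumference 2*pi*cos(lat) on the unit sphere but
    is mapped to a full-width row, so content is stretched by 1/cos(lat)."""
    return 1.0 / np.cos(lat)

# Content at 60 degrees north is stretched about 2x; near the pole it diverges.
print(horizontal_stretch(np.deg2rad(60)))   # ~2.0
print(horizontal_stretch(np.deg2rad(89)))   # ~57.3
```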
Fig. 8 illustrates an example of deformation caused by movement in the 3D sphere for the ERP frame. In Fig. 8, block 801 on a 3D sphere 800 is moved to become block 802. When block 801 and the moved block 802 are mapped to an ERP frame 803, the two corresponding blocks become blocks 804 and 805 in the ERP frame. While the blocks (i.e., 801 and 802) on the 3D sphere correspond to the same block, the two blocks (i.e., 804 and 805) have different shapes in the ERP frame.
Fig. 9A to Fig. 9I illustrate examples of deformation in 2D frames projected using various projections. Fig. 9A illustrates an example for an ERP frame 910, where a square block 912 becomes deformed when the block is moved to a location 914 closer to the North Pole. Fig. 9B illustrates an example for a CMP frame 920, where a square block 922 becomes deformed when the block is moved to a location 924. Fig. 9C illustrates an example for an SSP frame 930, where a square block 932 becomes deformed when the block is moved to a location 934. Fig. 9D illustrates an example for an OHP frame 940, where a square block 942 becomes deformed when the block is moved to a location 944. Fig. 9E illustrates an example for an ISP frame 950, where a square block 952 becomes deformed when the block is moved to a location 954. Fig. 9F illustrates an example for an EAP frame 970, where a square block 972 becomes deformed when the block is moved to a location 974. Fig. 9G illustrates an example for an ACP frame 980, where a square block 982 becomes deformed when the block is moved to a location 984. Fig. 9H illustrates an example for an RSP frame 980, where a square block 982 becomes deformed when the block is moved to a location 984. Fig. 9I illustrates an example for a Cylindrical Projection frame 990, where a square block 992 becomes deformed when the block is moved to a location 994.
As described above, object motion in a 3D space may cause object deformation in a 2D frame projected from the 3D sphere. In order to overcome the issue associated with deformation, various methods of Inter prediction for VR Video Processing are disclosed.
Method 1 - Deformation based on Rotation
A method to handle the deformation issue for Inter prediction is to project a block in the 2D frame back to the 3D sphere. The block may be a corresponding block in a reference picture prior to motion compensation for a current block. In conventional Inter prediction, the corresponding block is moved in the 2D frame according to the motion vector to point to a reference block, and the reference block is used as the Inter predictor for the current block. According to this method of the present invention, the block is moved to a designated location on the surface of the 3D sphere. In particular, the block is moved to a new location by rotating the sphere. Finally, the object moved on the surface of the 3D sphere is projected back to the 2D frame. Fig. 10 illustrates the concept of Inter prediction taking into account deformation based on rotation. In Fig. 10, block 1013 corresponds to a source block in a 2D frame 1010. Motion vector 1015 of the source block points from a location $s_c$ 1012 in the source block 1013 to a destination location $d_c$ 1014. According to the present method, the data in the 2D frame are projected to the 3D sphere according to a corresponding projection type. For example, if the 2D frame is generated from ERP, the ERP projection is used to project data in the 2D frame to the 3D sphere. Accordingly, location $s_c$ 1012 and location $d_c$ 1014 in the 2D frame are projected to locations $\vec{s}_c$ 1022 and $\vec{d}_c$ 1024 in the 3D sphere 1020 respectively. In the 3D space, location $\vec{s}_c$ 1022 is rotated to location $\vec{d}_c$ 1024. The same rotation is also applied to the other locations of the source block 1013 to generate the destination block. The data in the destination block in the 3D sphere are then projected back to the 2D frame using an inverse ERP projection.
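The 2D-to-sphere projection and its inverse used in this procedure can be sketched concretely for ERP; the unit-sphere axis convention below (z toward the North Pole) is our assumption, as the disclosure does not fix one:

```python
import numpy as np

def erp_pixel_to_vec(u, v, W, H):
    """Project an ERP pixel to a unit vector on the 3D sphere
    (z axis toward the North Pole, assumed convention)."""
    lon = (u + 0.5) / W * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / H * np.pi
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def vec_to_erp_pixel(p, W, H):
    """Inverse ERP projection: unit vector on the sphere back to a pixel."""
    lon = np.arctan2(p[1], p[0])
    lat = np.arcsin(np.clip(p[2], -1.0, 1.0))
    u = (lon + np.pi) / (2.0 * np.pi) * W - 0.5
    v = (np.pi / 2.0 - lat) / np.pi * H - 0.5
    return u, v

# Round trip for one pixel of a 4096x2048 ERP frame:
p = erp_pixel_to_vec(1024, 512, 4096, 2048)
print(vec_to_erp_pixel(p, 4096, 2048))  # ~(1024.0, 512.0)
```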
Fig. 11 illustrates examples in which a source block is moved to a destination location through different paths. Due to the different paths, the block at the destination location may have different orientations. In 3D sphere 1110, the source block 1112 is moved to the destination location 1114 through path 1113 with a slight right turn. In 3D sphere 1120, the source block 1122 is moved to the destination location 1124 through the straight path 1123. In 3D sphere 1130, the source block 1132 is moved to the destination location 1134 through path 1133 with a slight left turn.
One way to describe the 3D movement on a sphere is by the yaw, pitch and roll rotations 1210 as shown in Fig. 12A, where a source block 1212 is moved to a destination location 1214. The axes of yaw 1216, pitch 1217 and roll 1218 are shown. Another way to describe the 3D movement on a sphere 1220 is rotation around a big circle as shown in Fig. 12B, where a source block 1222 is moved to a destination location 1224 along a big circle 1221. The rotation 1226 is shown. The big circle 1221 corresponds to the largest circle on the surface of the sphere 1220. Fig. 12C illustrates an example of rotation of the sphere 1230 from source block 1232 to destination block 1234 on a small circle 1233 on the sphere 1235, where the small circle 1233 corresponds to a circle smaller than the largest circle (e.g. circle 1236) on the surface of the sphere 1235. The centre point of rotation is shown as a dot 1237 in Fig. 12C. Another perspective of the rotation of the sphere 1240 from source block 1242 to destination block 1244 on a small circle 1243 on the sphere 1245 is also shown in Fig. 12C, where the small circle 1243 corresponds to a circle smaller than the largest circle (e.g. circle 1246) on the surface of the sphere 1245; the axis of rotation is shown as an arrow 1247. Fig. 12D illustrates yet another way of describing object movement on the surface of sphere 1250, where source block 1252 is first moved to destination block 1254 by rotating on a big circle 1253 on the sphere 1250 around axis-a 1256. After the destination block reaches the final location, the destination block is rotated around axis-b 1257, where axis-b points from the centre of the big circle 1258 to the centre of destination block 1254.
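By Euler's rotation theorem, any yaw/pitch/roll composition (Fig. 12A) is equivalent to a single rotation around one axis (Fig. 12B or Fig. 12C). A short numpy sketch of that equivalence follows; the axis conventions and angle values are our illustration only:

```python
import numpy as np

def rot_z(a):  # yaw around z
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(a):  # pitch around y
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_x(a):  # roll around x
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

# Compose a yaw/pitch/roll movement ...
R = rot_z(np.deg2rad(30)) @ rot_y(np.deg2rad(10)) @ rot_x(np.deg2rad(5))

# ... and recover the single equivalent axis (eigenvector of R for eigenvalue 1,
# up to sign) and angle (from the trace of R).
w, V = np.linalg.eig(R)
axis = np.real(V[:, np.argmin(np.abs(w - 1.0))])
angle = np.arccos((np.trace(R) - 1.0) / 2.0)
print(axis, np.rad2deg(angle))
```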
Fig. 13 illustrates an exemplary procedure for Inter prediction taking into account deformation based on rotation. In Fig. 13, block 1313 corresponds to a source block in a 2D frame 1310. Motion vector 1315 of the source block points from a location $s_c$ 1312 in the source block 1313 to a destination location $d_c$ 1314. According to the present method, the data in the 2D frame are projected to the 3D sphere according to a corresponding projection type. For example, if the 2D frame is generated from ERP, the ERP projection is used to project data in the 2D frame to the 3D sphere. Accordingly, location $s_c$ 1312 and location $d_c$ 1314 in the 2D frame are projected to locations $\vec{s}_c$ 1322 and $\vec{d}_c$ 1324 in the 3D sphere 1320 respectively. In the 3D space, location $\vec{s}_c$ 1322 is rotated to location $\vec{d}_c$ 1324 around the big circle 1326. The same rotation is also applied to the other locations of the source block 1313 to generate the destination block. In Fig. 13, the rotation angle $\theta$ from $\vec{s}_c$ to $\vec{d}_c$ is calculated according to:

$\theta = \cos^{-1}\left(\frac{\vec{s}_c \cdot \vec{d}_c}{\|\vec{s}_c\|\,\|\vec{d}_c\|}\right)$
The axis of rotation $\vec{a}$ is calculated as:

$\vec{a} = \frac{\vec{s}_c \times \vec{d}_c}{\|\vec{s}_c \times \vec{d}_c\|}$
After the rotation angle $\theta$ and the axis of rotation $\vec{a}$ are determined, samples $s_{mn}$ in a block 1410 of the 2D frame are identified as shown in Fig. 14. Samples $s_{mn}$ are mapped to $\vec{s}_{mn}$ 1422 in the 3D sphere 1420. Samples $\vec{s}_{mn}$ are rotated by $\theta$ around the $\vec{a}$-axis to obtain $\vec{d}_{mn}$ 1424 at the destination location according to Rodrigues' rotation formula:

$\vec{d}_{mn} = \vec{s}_{mn}\cos\theta + (\vec{a}\times\vec{s}_{mn})\sin\theta + \vec{a}\,(\vec{a}\cdot\vec{s}_{mn})(1-\cos\theta)$

If the block at the destination location is further rotated around axis-b (i.e., $\varphi \neq 0$) as shown in Fig. 12D, samples $\vec{d}_{mn}$ will be further rotated. Otherwise, the samples $\vec{d}_{mn}$ are the final rotated samples in the 3D sphere.
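The angle, axis and Rodrigues formulas above can be exercised with a minimal numeric sketch, assuming unit-sphere points; the function names are ours:

```python
import numpy as np

def rotation_from_vectors(s_c, d_c):
    """Rotation angle and unit axis that move s_c to d_c along the big circle.
    Degenerate when s_c and d_c are parallel (no unique axis)."""
    s_c = s_c / np.linalg.norm(s_c)
    d_c = d_c / np.linalg.norm(d_c)
    theta = np.arccos(np.clip(np.dot(s_c, d_c), -1.0, 1.0))
    axis = np.cross(s_c, d_c)
    return theta, axis / np.linalg.norm(axis)

def rodrigues(p, axis, theta):
    """Rotate point p by theta around the unit axis (Rodrigues' formula)."""
    return (p * np.cos(theta)
            + np.cross(axis, p) * np.sin(theta)
            + axis * np.dot(axis, p) * (1.0 - np.cos(theta)))

# Rotating (1,0,0) onto (0,1,0): axis is z, angle is 90 degrees.
s = np.array([1.0, 0.0, 0.0])
d = np.array([0.0, 1.0, 0.0])
theta, axis = rotation_from_vectors(s, d)
print(rodrigues(s, axis, theta))  # ~[0, 1, 0]
```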
According to the present method, the rotated samples $\vec{d}_{mn}$ 1512 in the 3D sphere 1510 are projected back to a deformed block 1514 in the 2D frame and used as a new Inter predictor for a source block 1516 as shown in Fig. 15. The source block 1516 may be a corresponding block in a reference picture prior to motion compensation for a current block. The destination block 1514 corresponds to a deformed reference block for Inter prediction according to the present invention. In conventional Inter prediction, the source block 1526 is moved in the 2D frame according to the motion vector to point to a reference block 1524, and the reference block is used as the Inter predictor for the current block. Compared to the conventional approach, the Inter predictor 1522 for a source block 1526 in the 2D frame maintains its shape; in the 3D space 1520, the Inter predictor 1522 becomes deformed. Therefore, the conventional Inter predictor does not perform properly due to deformation caused by movement in 3D space.
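Putting the pieces together, one possible, non-normative sketch of building the deformed predictor for one block: project each sample to the sphere, rotate it, project back, and fetch from the reference frame. This reuses erp_pixel_to_vec, vec_to_erp_pixel and rodrigues from the sketches above and uses a simple clamped bilinear fetch; a real ERP implementation would also wrap horizontally:

```python
import numpy as np

def bilinear_sample(frame, u, v):
    """Bilinear fetch at a fractional location, clamped at frame borders."""
    h, w = frame.shape
    x0, y0 = int(np.floor(u)), int(np.floor(v))
    fx, fy = u - x0, v - y0
    x0c, x1c = np.clip(x0, 0, w - 1), np.clip(x0 + 1, 0, w - 1)
    y0c, y1c = np.clip(y0, 0, h - 1), np.clip(y0 + 1, 0, h - 1)
    top = (1 - fx) * frame[y0c, x0c] + fx * frame[y0c, x1c]
    bot = (1 - fx) * frame[y1c, x0c] + fx * frame[y1c, x1c]
    return (1 - fy) * top + fy * bot

def deformed_predictor(ref_frame, x0, y0, bw, bh, axis, theta):
    """Deformed Inter predictor for a bw x bh block at (x0, y0):
    2D -> sphere -> rotate -> sphere -> 2D, one sample at a time."""
    H, W = ref_frame.shape
    pred = np.zeros((bh, bw))
    for m in range(bh):
        for n in range(bw):
            s = erp_pixel_to_vec(x0 + n, y0 + m, W, H)  # sample on the sphere
            r = rodrigues(s, axis, theta)               # rotated sample
            u, v = vec_to_erp_pixel(r, W, H)            # back to the 2D frame
            pred[m, n] = bilinear_sample(ref_frame, u, v)
    return pred
```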
Method 2 - Deformation based on Rotation
In this disclosure, another deformation based on rotation is proposed. In Method 1, the rotation axis is the normal vector of the big circle (i.e., $\vec{a} = \vec{s}_c \times \vec{d}_c$). However, in Method 2, a new rotation axis $\vec{a}'$ is used. Fig. 16 compares the two different deformation methods based on rotation. The upper part corresponds to the case of Method 1, where a source block 1612 in the 2D frame 1610 is mapped to block 1622 in the 3D sphere 1620. A motion vector 1616 in the 2D frame is mapped to the 3D sphere 1620 for determining the location of the destination block 1624. The source block 1622 is then rotated along the big circle 1626 around the rotation axis $\vec{a}$ to generate the destination block 1624. The destination block 1624 is then mapped back to the 2D frame 1610 to generate the deformed block 1614 as an Inter predictor for the source block 1612.
In Fig. 16, the lower part corresponds to the deformation based on rotation according to Method 2, where the source block 1632 in the 2D frame 1630 is mapped to block 1642 in the 3D sphere 1640. A motion path 1636 in the 2D frame is mapped to the 3D sphere 1640 for determining the location of the destination block 1644. The source block 1642 is then rotated along the small circle 1646 around a new rotation axis $\vec{a}'$ to generate the destination block 1644. The destination block 1644 is then mapped back to the 2D frame 1630 to generate the deformed block 1634 as an Inter predictor for the source block 1632. It is observed that the example in the upper part of Fig. 16 (i.e., rotation along a big circle) is a special case of the example in the lower part of Fig. 16.
In Fig. 16, the small circle, or equivalently the axis of rotation, needs to be determined. A method to derive the axis of rotation is shown in Fig. 17, where $s_c$ is the block centre of a source block 1712 and the mv pointing from $s_c$ to $d_c$ is already known. Both $s_c$ and $d_c$ are mapped to $\vec{s}_c$ and $\vec{d}_c$ in the 3D sphere 1720 respectively. A motion vector from $\vec{s}_c$ to $\vec{d}_c$ can then be determined in the 3D sphere. The same motion-vector mapping to the 3D sphere can be applied to all source blocks of the 2D frame, as shown in 3D sphere 1730. An axis of rotation that rotates motion vector $mv(s_i)$ to $mv_{\vec{a},\theta}(s_i)$ is selected based on a performance criterion, where $\vec{a}'$ corresponds to the axis of rotation and $\theta'$ corresponds to the angle of rotation. For block centres $s_i$, i = 1, 2, …, n, the true mv of $s_i$ is $mv(s_i)$, and $mv_{\vec{a},\theta}(s_i)$ is the motion vector obtained by rotating by $\theta$ around axis $\vec{a}$ at $s_i$. Solve:

$(\vec{a}', \theta') = \arg\min_{\vec{a},\theta} \sum_{i=1}^{n} \left\| mv(s_i) - mv_{\vec{a},\theta}(s_i) \right\|_F$

where $\|\cdot\|_F$ is the F-norm (Frobenius norm).
The equation provides a way to select an axis of rotation and a rotation angle that achieve the best match between the set of mapped motion vectors and the set of rotated motion vectors. In practice, different ways can be applied to find the rotation axis. Fig. 18 illustrates ways to derive the axis of rotation, including using motion vectors associated with processed blocks in the current picture, as follows (a coarse search implementing the last two options is sketched after the list):
● Pre-define a rotation (e.g. yaw = 0°/pitch = -90°, or any value encoded in a bitstream), as shown in the 3D space 1810,
● Find the best-matching rotation axis based on motion vectors in the reference frame, as shown in the 3D space 1820, or
● Find the best-matching rotation axis based on motion vectors of already compressed blocks in the current frame, as shown in the 3D space 1830.
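A coarse grid search over candidate axes and angles is one way to approximate the arg-min above. This sketch reuses rodrigues from the Method 1 sketch, and models each mv(s_i) as a 3D displacement of the block centre on the sphere, which is our assumption rather than a representation fixed by the disclosure:

```python
import numpy as np

def best_rotation(centres, mvs, cand_axes, cand_angles):
    """Pick the candidate (axis, angle) minimizing
    sum_i || mv(s_i) - mv_{a,theta}(s_i) ||, where the induced vector
    mv_{a,theta}(s_i) is rodrigues(s_i, a, theta) - s_i."""
    best_axis, best_angle, best_cost = None, None, np.inf
    for a in cand_axes:
        a = a / np.linalg.norm(a)
        for th in cand_angles:
            cost = sum(np.linalg.norm(mv - (rodrigues(s, a, th) - s))
                       for s, mv in zip(centres, mvs))
            if cost < best_cost:
                best_axis, best_angle, best_cost = a, th, cost
    return best_axis, best_angle
```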
Method 3 - Deformation Based on Displacement of Camera
According to another method, deformation based on displacement of camera is disclosed. Fig. 19A illustrates an example of an object (i.e., a tree) projected onto the surface of a sphere at different camera locations. At camera position A, the tree is projected onto sphere 1910 to form an image 1940 of the tree. At a forward position B, the tree is projected onto sphere 1920 to form an image 1950 of the tree. The image 1941 of the tree corresponding to camera position A is also shown on sphere 1920 for comparison. At a further forward position C, the tree is projected onto sphere 1930 to form an image 1960 of the tree. Image 1942 of the tree corresponding to camera position A and image 1951 of the tree corresponding to camera position B are also shown on sphere 1930 for comparison. In Fig. 19A, for a video captured by a camera moving in a straight line, the direction of movement of the camera in 3D space (as indicated by arrows 1912, 1922 and 1932 for three different camera locations in Fig. 19A) can be represented by latitude and longitude coordinates $(\theta, \varphi)$, where $(\theta, \varphi)$ corresponds to the intersection of the motion vector and the 3D sphere. The point $(\theta, \varphi)$ is projected to the 2D target projection plane, and the projected point is referred to as the forward point (e.g. 1934).
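A small sketch of the forward-point computation for ERP, assuming the camera moving direction is given as a 3D vector and using the same axis conventions as in the earlier sketches:

```python
import numpy as np

def forward_point_erp(cam_dir, W, H):
    """Project the camera's 3D moving direction to its ERP forward point.
    The intersection of the motion vector with the unit sphere is simply the
    normalized direction; its longitude/latitude locate the forward point."""
    d = cam_dir / np.linalg.norm(cam_dir)
    lon = np.arctan2(d[1], d[0])
    lat = np.arcsin(d[2])
    u = (lon + np.pi) / (2.0 * np.pi) * W
    v = (np.pi / 2.0 - lat) / np.pi * H
    return u, v

# A camera moving along +x maps to the centre of a 4096x2048 ERP frame.
print(forward_point_erp(np.array([1.0, 0.0, 0.0]), 4096, 2048))  # (2048, 1024)
```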
Fig. 19B illustrates the locations of the tree projected onto a 2D frame 1970 for camera locations A and B in 3D space as shown in Fig. 19A, where the tree image at location 1972 corresponds to camera location A and the tree image at location 1974 corresponds to camera location B. The locations of the tree projected onto a 2D frame 1980 for camera locations B and C in 3D space are also shown in Fig. 19B, where the tree image at location 1982 corresponds to camera location A, the tree image at location 1984 corresponds to camera location B and the tree image at location 1986 corresponds to camera location C.
Fig. 20 illustrates an example of an ERP frame overlaid with the pattern of moving flows, where the flow of the background (i.e., static objects) can be determined if the camera forward point is known. The flows are indicated by arrows. The camera forward point 2010 and the camera backward point 2020 are shown. Moving flows correspond to the moving direction of the video content when the camera moves in a given direction: the movement of the camera causes an apparent relative movement of the static background objects, and the moving direction of a background object in the 2D frame captured by the camera can be represented as a moving flow. Multiple frames 2030 may be used to derive the moving flows.
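On the sphere, the background flow at a point follows the great circle through the forward point and that point, directed away from the forward point. This geometric reading (ours, not a formula given in the disclosure) can be sketched as:

```python
import numpy as np

def background_flow(p, f, eps=1e-9):
    """Unit 3D flow direction of the static background at sphere point p,
    for unit camera moving direction f (the forward point). The flow is the
    tangent of the great circle through f and p, pointing away from f, i.e.
    minus the component of f perpendicular to p; it vanishes exactly at the
    forward and backward points."""
    f_perp = f - np.dot(f, p) * p
    n = np.linalg.norm(f_perp)
    if n < eps:
        return np.zeros(3)
    return -f_perp / n
```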
The pattern of moving flows based on displacement of viewpoint can be applied to various projection methods. The moving flow in the 2D frame 2110 is shown for a CMP frame in the 2x3 layout format in Fig. 21.
Fig. 22 illustrates an example of using deformation based on displacement of camera for Inter prediction. In the 2D frame 2210, the moving flows are indicated by arrows. For each source block (e.g. blocks 2221-2225), the deformation of the source block (e.g. blocks 2231-2235) can be determined using the moving flows of the background for background objects.
Fig. 23 illustrates an exemplary procedure of using deformation based on displacement of camera for Inter prediction. The steps of this exemplary procedure are as follows:
1. Find the forward point of camera. An example to derive the forward point has been described in Fig. 19A and the associated text.
2. Calculate the moving flow of the frame, which corresponds to the tangent direction at each pixel.
3. Determine the motion vector of each pixel by referring to the motion vector of a neighbouring block, or according to the velocity of camera and the depth of background:
■ The MV of a neighbouring block can determine the camera displacement based on the forward point of camera and the moving flow of the frame.
■ The camera displacement and the moving flow can then determine the MV of each pixel of the predicted block (a sketch follows the example below).
For example, the images for two different camera locations can be captured as shown in arrangement 2310. The moving flows (indicated by arrows) and the camera forward point 2322 in the 2D frame 2320 can be determined. The camera displacement and moving flow can then be used to determine the MV of each pixel of the predicted block 2324. Accordingly, a deformed block 2326 is derived and used as the Inter predictor for the current block 2324.
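One plausible, non-normative reading of step 3 in code: project the neighbouring block's MV onto the local flow direction to estimate a scalar camera displacement, then reapply that scalar along the unit flow direction at each pixel of the predicted block. Every name below is our own illustration:

```python
import numpy as np

def block_pixel_mvs(mv_neighbour, flow_at_neighbour, flow_at_pixels):
    """Estimate per-pixel MVs of the predicted block from one neighbour.
    flow_at_pixels has shape (bh, bw, 2): one 2D flow direction per pixel."""
    f = flow_at_neighbour / np.linalg.norm(flow_at_neighbour)
    displacement = np.dot(mv_neighbour, f)       # scalar along the flow
    unit = flow_at_pixels / np.linalg.norm(flow_at_pixels, axis=-1, keepdims=True)
    return displacement * unit                   # one MV per pixel
```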
The deformation based on displacement of camera for Inter prediction can be applied to various projections. The moving flow in the 2D frame can be mapped to the 3D sphere. The moving flow on the 3D sphere 2410 is shown in Fig. 24, where the forward point and two different lines of moving flow (2412 and 2414) are shown. The moving flow on the 3D sphere associated with ERP 2420 is shown in Fig. 24, where the moving flows are shown for an ERP frame 2426. The moving flow on the 3D sphere associated with CMP 2430 is shown in Fig. 24, where the moving flows are shown for six faces of a CMP frame 2436 in a 2x3 layout format. The moving flow on the 3D sphere associated with OHP 2440 is shown in Fig. 24, where the moving flows are shown for eight faces of an OHP frame 2446. The moving flow on the 3D sphere associated with ISP 2450 is shown in Fig. 24, where the moving flows are shown for twenty faces of an ISP frame 2456. The moving flow on the 3D sphere associated with SSP 2460 is shown in Fig. 24, where the moving flows are shown for segmented faces of an SSP frame 2466.
Fig. 25 illustrates an exemplary flowchart of a system that applies rotation of the sphere to deform a reference block for Inter prediction according to a method of the present invention. The steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side. The steps shown in the flowchart may also be implemented in hardware, such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to this method, input data for a current block in a 2D (two-dimensional) frame are received in step 2510, where the 2D frame is projected from a 3D (three-dimensional) sphere. A motion vector associated with a source block in the 2D frame is determined in step 2520, where the motion vector points from a source location in the source block to a destination location in the 2D frame. The source location, the destination location and the source block in the 2D frame are projected onto the 3D sphere according to a target projection in step 2530. The source block in the 3D sphere is rotated along a rotation circle on a surface of the 3D sphere around a rotation axis to generate a deformed reference block in the 3D sphere in step 2540. The deformed reference block in the 3D sphere is mapped back to the 2D frame according to an inverse target projection in step 2550. The current block in the 2D frame is encoded or decoded using the deformed reference block in the 2D frame as an Inter predictor in step 2560.
Fig. 26 illustrates an exemplary flowchart of a system that applies displacement of camera to deform a reference block for Inter prediction according to a method of the present invention. Two 2D (two-dimensional) frames corresponding to two different viewpoints are received in step 2610, where said two 2D frames are projected, using a target projection, from a 3D (three-dimensional) sphere, and wherein a current block, a predicted block for the current block and a neighbouring block are located in said two 2D frames. A forward point of camera is determined based on said two 2D frames in step 2620. Moving flows in said two 2D frames are determined in step 2630. One or more second motion vectors associated with the predicted block are derived either by referring to one or more first motion vectors of the neighbouring block based on the forward point of camera and the moving flows or according to velocity of camera and depth of background in step 2640. An Inter prediction block is derived based on the predicted block and said one or more second motion vectors in step 2650. The current block in the 2D frame is encoded or decoded using the Inter prediction block in step 2660.
The flowcharts shown above are intended to serve as examples illustrating embodiments of the present invention. A person skilled in the art may practice the present invention by modifying individual steps or by splitting or combining steps without departing from the spirit of the present invention.
The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirements. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced without such specific details.
Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more electronic circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (16)

  1. A method of processing 360-degree virtual reality images, the method comprising:
    receiving input data for a current block in a 2D (two-dimensional) frame, wherein the 2D frame is projected from a 3D (three-dimensional) sphere;
    determining a motion vector associated with a source block in the 2D frame, wherein the motion vector points from a source location in the source block to a destination location in the 2D frame;
    projecting the source location, the destination location and the source block in the 2D frame onto the 3D sphere according to a target projection;
    rotating the source block in the 3D sphere along a rotation circle on a surface of the 3D sphere around a rotation axis to generate a deformed reference block in the 3D sphere;
    mapping the deformed reference block in the 3D sphere back to the 2D frame according to an inverse target projection; and
    encoding or decoding the current block in the 2D frame using the deformed reference block in the 2D frame as an Inter predictor.
  2. The method of Claim 1, wherein the rotation circle corresponds to a largest circle on the surface of the 3D sphere.
  3. The method of Claim 1, wherein the rotation circle is smaller than a largest circle on the surface of the 3D sphere.
  4. The method of Claim 1, wherein the rotation circle on the surface of the 3D sphere around a rotation axis is determined according to the source location and the destination location on the 3D sphere.
  5. The method of Claim 4, wherein a rotation axis $\vec{a}$ and a rotation angle $\theta_a$ associated with the rotation circle are derived according to $\theta_a = \cos^{-1}(\vec{s}_c \cdot \vec{d}_c)$ and $\vec{a} = \vec{s}_c \times \vec{d}_c$, and wherein $\vec{s}_c$ and $\vec{d}_c$ correspond to the source location and the destination location on a surface of the 3D sphere respectively.
  6. The method of Claim 1, wherein a rotation axis and a rotation angle associated with the rotation circle are derived based on motion vectors in a reference frame.
  7. The method of Claim 6, wherein the rotation axis $\vec{a}'$ and the rotation angle $\theta'$ associated with the rotation circle are derived according to:

    $(\vec{a}', \theta') = \arg\min_{\vec{a},\theta} \sum_i \left\| mv(s_i) - mv_{\vec{a},\theta}(s_i) \right\|_F$

    and wherein $s_i$ corresponds to one source block in the reference frame, $mv(s_i)$ corresponds to a motion vector of source block $s_i$, $mv_{\vec{a},\theta}(s_i)$ corresponds to one motion vector caused by rotating one location in source block $s_i$ by the rotation angle $\theta'$ around the rotation axis $\vec{a}'$, and $\|\cdot\|_F$ is the F-norm.
  8. The method of Claim 1, wherein a rotation axis and a rotation angle associated with the rotation circle are derived based on motion vectors in an already coded region in a current frame.
  9. The method of Claim 8, wherein the rotation axis $\vec{a}'$ and the rotation angle $\theta'$ associated with the rotation circle are derived according to:

    $(\vec{a}', \theta') = \arg\min_{\vec{a},\theta} \sum_i \left\| mv(s_i) - mv_{\vec{a},\theta}(s_i) \right\|_F$

    and wherein $s_i$ corresponds to one source block in the already coded region in the current frame, $mv(s_i)$ corresponds to a motion vector of source block $s_i$, and $mv_{\vec{a},\theta}(s_i)$ corresponds to one motion vector caused by rotating one location in source block $s_i$ by the rotation angle $\theta'$ around the rotation axis $\vec{a}'$.
  10. The method of Claim 1, wherein a rotation axis associated with the rotation circle is pre-defined or the rotation axis is indicated in a bitstream for indicating a path of rotation.
  11. The method of Claim 1, wherein the target projection corresponds to Equirectangular Projection (ERP), Cubemap Projection (CMP), Adjusted Cubemap Projection (ACP), Equal-Area Projection (EAP), Octahedron Projection (OHP), Icosahedron Projection (ISP), Segmented Sphere Projection (SSP), Rotated Sphere Projection (RSP), or Cylindrical Projection (CLP).
  12. An apparatus for processing 360-degree virtual reality images, the apparatus comprising one or more electronic devices or processors configured to:
    receive input data for a current block in a 2D (two-dimensional) frame, wherein the 2D frame is projected from a 3D (three-dimensional) sphere;
    determine a motion vector associated with a source block in the 2D frame, wherein the motion vector points from a source location in the source block to a destination location in the 2D frame;
    project the source location, the destination location and the source block in the 2D frame onto the 3D sphere according to a target projection;
    rotate the source block in the 3D sphere along a rotation circle on a surface of the 3D sphere around a rotation axis to generate a deformed reference block in the 3D sphere;
    map the deformed reference block in the 3D sphere back to the 2D frame according to an inverse target projection; and
    encode or decode the current block in the 2D frame using the deformed reference block in the 2D frame as an Inter predictor.
  13. A method of processing 360-degree virtual reality images, the method comprising:
    receiving two 2D (two-dimensional) frames corresponding to two different viewpoints, wherein said two 2D frames are projected, using a target projection, from a 3D (three-dimensional) sphere, and wherein a current block, a predicted block for the current block and a neighbouring block are located in said two 2D frames;
    determining a forward point of camera based on said two 2D frames;
    determining moving flows in said two 2D frames;
    deriving one or more second motion vectors associated with the predicted block either by referring to one or more first motion vectors of the neighbouring block based on the forward point of camera and the moving flows or according to velocity of camera and depth of background;
    deriving an Inter prediction block based on the predicted block and said one or more second motion vectors; and
    encoding or decoding the current block in the 2D frame using the Inter prediction block.
  14. The method of Claim 13, wherein said deriving one or more second motion vectors associated with the predicted block comprises determining displacement of camera based on said one or more first motion vectors associated with the neighbouring block; and wherein said one or more second motion vectors associated with the predicted block are derived from said one or more first motion vectors based on the displacement of camera and the moving flows.
  15. The method of Claim 13, wherein the target projection corresponds to Equirectangular Projection (ERP), Cubemap Projection (CMP), Adjusted Cubemap Projection (ACP), Equal-Area Projection (EAP), Octahedron Projection (OHP), Icosahedron Projection (ISP), Segmented Sphere Projection (SSP), Rotated Sphere Projection (RSP), or Cylindrical Projection (CLP).
  16. An apparatus for processing 360-degree virtual reality images, the apparatus comprising one or more electronic devices or processors configured to:
    receive two 2D (two-dimensional) frames corresponding to two different viewpoints, wherein said two 2D frames are projected, using a target projection, from a 3D (three-dimensional) sphere, and wherein a current block, a predicted block for the current block and a neighbouring block are located in said two 2D frames;
    determine a forward point of camera based on said two 2D frames;
    determine moving flows in said two 2D frames;
    derive one or more second motion vectors associated with the predicted block either by referring to one or more first motion vectors of the neighbouring block based on the forward point of camera and the moving flows or according to velocity of camera and depth of background;
    derive an Inter prediction block based on the predicted block and said one or more second motion vectors; and
    encode or decode the current block in the 2D frame using the Inter prediction block.