CN109429561B - Method and device for processing 360-degree virtual reality image

Publication number: CN109429561B (published application: CN109429561A)
Application number: CN201880001715.5A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Active (granted)
Inventors: 施正轩, 林建良
Assignee: MediaTek Inc

Classifications

    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 ... using predictive coding
    • H04N 19/597 ... using predictive coding specially adapted for multi-view video sequence encoding
    • H04N 19/503 ... using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/55 Motion estimation with spatial constraints, e.g. at image or region borders

Abstract

The invention discloses a method and a device for processing a 360-degree virtual reality image. According to one method, the system maps motion vectors in the 2D frame into 3D space, applies a rotation of the sphere to adjust the motion vectors in the 3D space, and then maps the rotated motion vectors back to the 2D frame for processing the 360-degree virtual reality image. According to another method, a system for deriving a motion vector from a translation of the viewpoint is disclosed. According to yet another method, a system that applies scaling to adjust a motion vector is disclosed. The motion vector scaling may be performed in 3D space by projecting the motion vector from the 2D frame onto the 3D sphere. After the motion vector scaling, the scaled motion vector is projected back into the 2D frame. Alternatively, the forward projection, the motion vector scaling and the back-projection may be combined into a single function.

Description

Method and device for processing 360-degree virtual reality image
Priority declaration
This application claims priority to U.S. provisional patent application No. 62/523,883, filed on June 23, 2017, and U.S. provisional patent application No. 62/523,885, filed on June 23, 2017. The above-mentioned U.S. provisional patent applications are incorporated herein by reference in their entirety.
Technical Field
The present invention relates to image/video processing or encoding and decoding of 360-degree Virtual Reality (VR) images/sequences. In particular, the present invention relates to deriving motion vectors for three-dimensional (3D) content in different projection formats.
Background
360-degree video, also known as immersive video, is an emerging technology that can provide an "immersive sensation". The immersive sensation is achieved by surrounding the user with a scene that covers a panoramic view, in particular a 360-degree field of view. The "feeling of being personally on the scene" can be further improved by stereoscopic rendering. Accordingly, panoramic video is widely used in Virtual Reality (VR) applications.
Immersive video involves capturing a scene using multiple cameras to cover a panoramic view, e.g., a 360-degree field of view. An immersive camera typically uses a panoramic camera or a set of cameras to capture the 360-degree field of view. Typically, two or more cameras are used for the immersive camera. All videos must be captured simultaneously, and each camera records a single segment of the scene (also referred to as a single view). Furthermore, the camera set is typically arranged to capture views horizontally, although other camera designs are possible.
A 360-degree VR image may be captured using a 360-degree spherical panoramic camera or multiple images covering all fields of view around 360 degrees. The 3D spherical image is difficult to process or store using conventional image/video processing devices. Therefore, a 360-degree VR image is typically converted to a 2D format using a 3D-to-2D projection method. For example, equirectangular projection (ERP) and cube projection (CMP) are commonly adopted projection methods. Accordingly, a 360-degree image may be stored in the equirectangular projection format. The equirectangular projection projects the surface of the entire sphere onto a planar image, where the vertical axis is latitude and the horizontal axis is longitude. Fig. 1 shows an example of projecting a sphere 110 onto a rectangular image 120 according to ERP, where each longitude line is mapped to a vertical line of the ERP image. For ERP, the regions near the north and south poles of the sphere are stretched more severely (i.e., from a single point to a line) than the regions near the equator. Furthermore, due to the distortion caused by the stretching, especially near the two poles, predictive coding tools are often unable to make good predictions, so that the coding efficiency is reduced. Fig. 2 shows a cube 210 with 6 faces, where a 360-degree VR image can be projected onto the 6 faces of the cube according to CMP. There are different ways to lift the 6 faces off the cube and pack them into a rectangular image. The example in fig. 2 divides the 6 faces into two parts (i.e., 220a and 220b), where each part includes 3 connected faces. The two parts can be unfolded into two strips (i.e., strips 230a and 230b), where each strip corresponds to a continuous-plane image. The two strips can be combined into a compact rectangular frame, depending on the selected layout format.
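As a concrete illustration of the ERP mapping just described, the following Python sketch converts between a pixel position in a WxH ERP frame and longitude/latitude on the sphere. It is an illustrative sketch only, not taken from the patent; the sampling convention (pixel centers at half-integer offsets) and the function names are assumptions.

    import math

    def erp_pixel_to_lonlat(x, y, width, height):
        # Map ERP pixel (x, y) to (longitude, latitude) in radians.
        # Longitude spans [-pi, pi) left to right, latitude [+pi/2, -pi/2] top to bottom.
        u = (x + 0.5) / width
        v = (y + 0.5) / height
        return (u - 0.5) * 2.0 * math.pi, (0.5 - v) * math.pi

    def lonlat_to_erp_pixel(lon, lat, width, height):
        # Inverse mapping: (longitude, latitude) back to ERP pixel coordinates.
        return (lon / (2.0 * math.pi) + 0.5) * width - 0.5, (0.5 - lat / math.pi) * height - 0.5

    # The top row of the ERP image all maps to latitudes near +pi/2: this is the
    # stretching of the pole region from a single point on the sphere to a full line.
    print(erp_pixel_to_lonlat(0, 0, 4096, 2048))
    print(lonlat_to_erp_pixel(0.0, 0.0, 4096, 2048))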
Both the ERP format and the CMP format are included in the projection format conversion being considered for the next-generation video codec, as described in JVET-F1003 (Y. Ye, et al., "Algorithm descriptions of projection format conversion and video quality metrics in 360Lib," Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 6th Meeting: Hobart, AU, 31 March - 7 April 2017, Document: JVET-F1003). In addition to the ERP format and the CMP format, there are various other VR projection formats, such as Adjusted Cube Projection (ACP), Equal-Area Projection (EAP), Octahedron Projection (OHP), Icosahedron Projection (ISP), Segmented Sphere Projection (SSP), and Rotated Sphere Projection (RSP), which are widely used in the field.
Fig. 3 shows an example of OHP, in which the sphere is projected onto the 8 faces of an octahedron 310. The 8 faces 320 lifted off the octahedron 310 can be converted to an intermediate format 330 by cutting the face edge between face 1 and face 5 and rotating face 1 and face 5 to connect to face 2 and face 6, respectively, and applying a similar procedure to face 3 and face 7. The intermediate format can be packed into a rectangular image 340.
Fig. 4 shows an example of an ISP, where a sphere is projected onto 20 faces of an icosahedron 410. The 20 faces 420 from the icosahedron 410 may be packed into a rectangular image 430 (referred to as a projection layout).
SSP has been disclosed in JVET-E0025 (Zhang et al., "AHG8: Segmented Sphere Projection for 360-degree video", Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 5th Meeting: Geneva, CH, 12-20 January 2017, Document: JVET-E0025) as a method to convert a spherical image into the SSP format. Fig. 5 shows an example of segmented sphere projection, where a spherical image 500 is mapped into a north pole image 510, a south pole image 520, and an equatorial segment image 530. The boundaries of the 3 segments correspond to latitude 45°N (i.e., 502) and latitude 45°S (i.e., 504), where 0° corresponds to the equator (i.e., 506). The north and south poles are mapped into 2 circular areas (i.e., 510 and 520), and the projection of the equatorial segment can be the same as ERP or EAP. The diameter of each circle is equal to the width of the equatorial segment because both the polar segments and the equatorial segment have a 90° latitude span. The north pole image 510, the south pole image 520, and the equatorial segment image 530 can be packed into a rectangular image.
Fig. 6 shows an example of RSP, where the sphere 610 is split into a middle 270°x90° region 620 and a remaining part 622. Each RSP part can further be stretched at the top and bottom sides to produce a deformed part with an elliptical shape. As shown in fig. 6, the two oval-shaped parts can be fitted into a rectangular format 630.
ACP is based on CMP. If the two-dimensional coordinates (u ', v') of the CMP are determined, the two-dimensional coordinates (u, v) of the ACP may be calculated by adjusting (u ', v') according to the following set of equations:
(Equations (1) and (2), which specify the adjustment of (u', v'), appear as formula images in the original publication and are not reproduced here.)
Given the position (u, v) and the face index f, the 3D coordinates (X, Y, Z) can be derived using the table for CMP. For 3D-to-2D coordinate conversion, given (X, Y, Z), (u', v') and the face index f are first calculated from the table of CMP, and the 2D coordinates of the ACP can then be calculated according to the above set of equations.
Similar to ERP, EAP also maps the surface of the sphere onto a single plane. In the (u, v) plane, u and v are both in the range [0, 1]. For 2D-to-3D coordinate conversion, given a sampling position (m, n), the 2D coordinates (u, v) are first calculated in the same manner as for ERP. The longitude and latitude (φ, θ) on the sphere can then be calculated from (u, v) as:
φ=(u-0.5)*(2*π) (3)
θ=sin⁻¹(1.0-2*v) (4)
Finally, (X, Y, Z) can be calculated using the same equations as for ERP:
X=cos(θ)cos(φ) (5)
Y=sin(θ) (6)
Z=-cos(θ)sin(φ) (7)
Conversely, the longitude and latitude (φ, θ) can be estimated from the (X, Y, Z) coordinates using:
φ=tan⁻¹(-Z/X) (8)
θ=sin⁻¹(Y/(X²+Y²+Z²)^(1/2)) (9)
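The following Python sketch implements equations (3)-(9) for the EAP 2D-to-3D and 3D-to-2D conversions. It is an illustrative sketch only; the use of atan2 in place of a plain arctangent (to keep the correct quadrant) is an assumption made for robustness.

    import math

    def eap_2d_to_3d(u, v):
        # Equations (3)-(7): normalized EAP coordinates (u, v) in [0, 1] -> (X, Y, Z).
        phi = (u - 0.5) * (2.0 * math.pi)          # longitude, equation (3)
        theta = math.asin(1.0 - 2.0 * v)           # latitude, equation (4)
        return (math.cos(theta) * math.cos(phi),   # equation (5)
                math.sin(theta),                   # equation (6)
                -math.cos(theta) * math.sin(phi))  # equation (7)

    def eap_3d_to_2d(x, y, z):
        # Equations (8)-(9) followed by inverting (3)-(4): (X, Y, Z) -> (u, v).
        phi = math.atan2(-z, x)                                   # equation (8)
        theta = math.asin(y / math.sqrt(x * x + y * y + z * z))   # equation (9)
        return phi / (2.0 * math.pi) + 0.5, (1.0 - math.sin(theta)) / 2.0

    # Round-trip check of the two conversions.
    print(eap_3d_to_2d(*eap_2d_to_3d(0.25, 0.4)))   # approximately (0.25, 0.4)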
since images or video associated with virtual reality may occupy a large space for storage or a large bandwidth for transmission, image/video compression is often used to reduce the required storage space or transmission bandwidth. Inter-frame prediction has become a powerful codec tool to exploit inter-frame redundancy using motion estimation/motion compensation. If conventional inter-frame prediction is applied to a 2D frame converted from a 3D space, the use of motion estimation/motion compensation techniques cannot function properly because objects in the 3D space may become distorted or distorted in the 2D frame due to object motion or relative motion between the object and the camera. To improve inter prediction of 2D frames converted from 3D space, various inter prediction techniques have been developed to improve the accuracy of inter prediction of 2D frames converted from 3D space.
Disclosure of Invention
The invention discloses a method and a device for processing a 360-degree virtual reality image. According to one method, input data for a current block in a 2D frame is received, wherein the 2D frame is projected from a 3D sphere. A first motion vector associated with a neighboring block in the 2D frame is determined, wherein the first motion vector points from a first starting position in the neighboring block to a first ending position in the 2D frame. The first motion vector is projected onto the 3D sphere according to a target projection. The first motion vector in the 3D sphere is rotated around a rotation axis along a rotation circle on the surface of the 3D sphere to generate a second motion vector in the 3D sphere. The second motion vector in the 3D sphere is mapped back to the 2D frame according to an inverse target projection. The current block in the 2D frame is encoded or decoded using the second motion vector. The second motion vector may be included as a candidate in a merge candidate list or an AMVP candidate list for encoding or decoding the current block.
In an embodiment, the rotation circle corresponds to the largest circle on the surface of the 3D sphere. In another embodiment, the rotation circle is smaller than the largest circle on the surface of the 3D sphere. The target projection may correspond to ERP, CMP, ACP, EAP, OHP, ISP, SSP, RSP, or CLP.
In an embodiment, projecting the first motion vector onto the 3D sphere comprises: projecting, according to the target projection, the first starting position, the first ending position and a second starting position in the 2D frame onto the 3D sphere, wherein the second starting position is at a corresponding position in the current block that corresponds to the first starting position in the neighboring block. Rotating the first motion vector in the 3D sphere along the rotation circle comprises: determining a target rotation that rotates around the rotation axis, along the rotation circle on the surface of the 3D sphere, from the first starting position to the second starting position in the 3D sphere; and rotating the first ending position to a second ending position on the 3D sphere using the target rotation. Mapping the second motion vector in the 3D sphere back to the 2D frame comprises: mapping the second ending position on the 3D sphere back to the 2D frame according to the inverse target projection; and determining the second motion vector in the 2D frame based on the second starting position and the second ending position in the 2D frame.
According to another method, two 2D frames are received, wherein the two frames are projected, using a target projection, from 3D spheres corresponding to two different viewpoints, and the current block and the neighboring block are located in the two 2D frames. Based on the two 2D frames, a forward point of the camera is determined. A plurality of motion streams is determined in the two 2D frames. Based on the first motion vector associated with the neighboring block, a translation of the camera is determined. A second motion vector associated with the current block is derived based on the translation of the camera. Subsequently, the current block in the 2D frame is encoded or decoded using the second motion vector.
With the above method, a plurality of motion streams in two 2D frames may be calculated from a tangential direction of each pixel in the two 2D frames. The second motion vector may be included as a candidate in the merge candidate list or the AMVP candidate list for encoding or decoding the current block. The target projection may correspond to ERP, CMP, ACP, EAP, OHP, ISP, SSP, RSP, or CLP.
According to yet another method, input data for a current block in a 2D frame is received, wherein the 2D frame is projected from a 3D sphere according to a target projection. A first motion vector associated with a neighboring block in the 2D frame is determined, wherein the first motion vector points from a first starting position in the neighboring block to a first ending position in the 2D frame. The first motion vector is scaled to generate a second motion vector. Subsequently, the current block in the 2D frame is encoded or decoded using the second motion vector.
In an embodiment, the step of scaling the first motion vector to generate the second motion vector comprises: and projecting a first starting position, a first ending position and a second starting position in the 2D frame onto the 3D spherical surface according to the target projection, wherein the second starting position is located at a corresponding position in the current block corresponding to the first starting position in the adjacent block. The step of scaling the first motion vector to generate the second motion vector further comprises: scaling the longitude component of the first motion vector to generate a scaled longitude component of the first motion vector; scaling the latitude component of the first motion vector to generate a scaled latitude component of the first motion vector; and determining a second end position corresponding to the second start position based on the scaled longitude component of the first motion vector and the scaled latitude component of the first motion vector. The step of scaling the first motion vector to generate the second motion vector further comprises: mapping a second termination location on the 3D sphere back to the 2D frame according to the inverse target projection; and determining a second motion vector for the 2D frame based on the second start position and the second end position in the 2D frame.
In another embodiment, scaling the first motion vector to generate the second motion vector comprises: applying a first combining function to generate an x-component of the second motion vector and applying a second combining function to generate a y-component of the second motion vector, wherein the first combining function and the second combining function are both based on the first starting position, a second starting position in the current block corresponding to the first starting position, the first motion vector and the target projection; and wherein the first combining function and the second combining function each combine a target projection, the scaling and an inverse target projection, the target projection projecting first data in the 2D frame to second data on the 3D sphere, the scaling converting a selected motion vector on the 3D sphere into a scaled motion vector on the 3D sphere, and the inverse target projection projecting the scaled motion vector back to the 2D frame. In one embodiment, when the target projection corresponds to an equirectangular projection, the first combining function scales the x-component by the factor cos(θ1)/cos(θ2) and the second combining function corresponds to an identity function, where θ1 corresponds to a first latitude associated with the first starting position and θ2 corresponds to a second latitude associated with the second starting position.
Drawings
Fig. 1 is an example of projecting a sphere onto a rectangular image according to an equirectangular projection (ERP), where each longitude line is mapped to a vertical line of the ERP image.
Fig. 2 is a cube with 6 faces, where a 360 degree VR image may be projected to the 6 faces of the cube according to a cube projection.
Fig. 3 is an example of an octahedral projection, where a spherical surface is projected onto the face of an octahedron of 8 faces.
Fig. 4 is an example of an icosahedron projection, in which a spherical surface is projected onto the face of a 20-face icosahedron.
Fig. 5 is an example of a Segmented Sphere Projection (SSP), in which a spherical image is mapped to a north pole image, a south pole image, and an equatorial segment image.
Fig. 6 is an example of a Rotated Sphere Projection (RSP), in which the sphere is divided into a middle 270°x90° region and a remaining part. The two parts of the RSP may further be stretched at the top and bottom ends to produce deformed parts with oval-shaped boundaries at the top and bottom ends.
Fig. 7A and 7B are examples of deformation of a 2D image due to rotation of a spherical surface.
Fig. 8 is an example of rotation of a sphere from point a to point b on the larger circle of the sphere, where the larger circle corresponds to the largest circle on the surface of the sphere.
FIG. 9 is an example of a rotation of a sphere from point a to point b on a smaller circle of the sphere, where the smaller circle is a circle that is smaller than the largest circle on the surface of the sphere.
Fig. 10 is an example of deriving motion vectors for a 2D projection image using a spherical rotation model.
Fig. 11 is an example of using a Motion Vector (MV) derived based on rotation of a sphere as a merge or Advanced Motion Vector Prediction (AMVP) candidate.
Fig. 12 is an example of an object (i.e., a tree) being projected onto the surface of a sphere at different camera positions.
Fig. 13 is an example of an ERP frame overlaid with a model of a mobile stream, where a stream of background (i.e., static objects) can be determined if a camera front point (forward point) is known.
Fig. 14 is an exemplary flow of MV derivation based on viewpoint translation.
Fig. 15 is an exemplary MV derivation based on viewpoint translation for different projection methods.
Fig. 16 is an example of a deformation associated with motion in an ERP frame.
Fig. 17 is an exemplary flow for MV scaling techniques in 3D spheres.
Fig. 18 is an exemplary flow of MV scaling techniques in 2D frames.
FIG. 19 is an exemplary flow of MV scaling techniques in an ERP frame.
FIG. 20 is an exemplary flow of an MV scaling technique in an ERP frame, wherein a current block and neighboring blocks are shown in an ERP image.
FIG. 21 is an exemplary flow of an MV scaling technique in a CMP frame, wherein a current block and a neighboring block are shown in a CMP image.
FIG. 22 is an exemplary flow of the MV scaling technique in an SSP frame, where a current block and a neighboring block are shown in an SSP image.
Fig. 23 is an exemplary flow of the MV scaling technique in an octahedral Projection (OHP) frame, where a current block and neighboring blocks are shown in an OHP image.
FIG. 24 is an exemplary flow of the MV scaling technique in an Icosahedron Projection (ISP) frame, where the current block and neighboring blocks are shown in an ISP image.
Fig. 25 is an exemplary flow of the MV scaling technique in an Equal-Area Projection (EAP) frame, where a current block and a neighboring block are shown in an EAP image.
FIG. 26 is an exemplary flow of an MV scaling technique in an Adjusted Cube Projection (ACP) frame, where a current block and neighboring blocks are shown in an ACP image.
FIG. 27 is an exemplary flow of the MV scaling technique in a Rotated Sphere Projection (RSP) frame, where a current block and neighboring blocks are shown in an RSP picture.
FIG. 28 is an exemplary flow of the MV scaling technique in a Cylindrical Projection (CLP) frame, where a current block and neighboring blocks are shown in a CLP image.
FIG. 29 is an exemplary flow diagram of a system for processing a 360 degree VR image using rotation of a sphere to adjust motion vectors in accordance with embodiments of the present invention.
Fig. 30 is an exemplary flow diagram of a system for deriving motion vectors from translation of a viewpoint to process a 360 degree VR image in accordance with an embodiment of the present invention.
FIG. 31 is an exemplary flow diagram of a system for processing a 360 degree VR image with scaling applied to adjust motion vectors in accordance with an embodiment of the present invention.
Detailed Description
The following description is of the preferred mode for carrying out the invention. The description is made for the purpose of illustrating the general principles of the invention and is not to be taken in a limiting sense. The protection scope of the present invention should be determined by the claims.
For conventional inter prediction in video coding, motion estimation/motion compensation is widely used to exploit the correlation in video data in order to reduce the transmitted information. Conventional video content corresponds to 2D video data, and motion estimation and motion compensation techniques typically assume translational motion. In next-generation video codecs, higher-level motion models, e.g., affine models, are considered. However, these techniques derive the motion model based on the 2D images.
In the present invention, motion information is derived in the 3D domain, so that more accurate motion information can be derived. According to one approach, block deformation in the 2D video content is modeled as a rotation of the sphere. Fig. 7A and 7B illustrate examples of deformation of a 2D image due to rotation of the sphere. In fig. 7A, block 701 on the 3D sphere 700 is moved to become block 702. When block 701 and the moved block 702 are mapped to the ERP frame 703, the two corresponding blocks become block 704 and block 705 in the ERP frame. Although the blocks on the 3D sphere (i.e., 701 and 702) correspond to the same block, the two blocks (i.e., 704 and 705) have different shapes in the ERP frame. In other words, object movement or rotation in 3D space may cause object deformation in the 2D frame (i.e., the ERP frame in this example). In fig. 7B, region 711 is located on surface 710 of the sphere. Region 711 is moved to region 714 by a rotation of the sphere along path 713. Due to the rotation of the sphere, the motion vector 712 associated with region 711 is mapped to motion vector 715. The portions in the 2D domain 720 that correspond to portions 711-715 are shown as portions 721-725. In another example, the portion corresponding to path 713 is shown as trajectory 733. In yet another example, the portions on surface 750 that correspond to portions 713-715 are shown as portions 753-755, respectively, and the portions in the 2D domain 760 that correspond to portions 753-755 are shown as portions 763-765.
Fig. 8 shows an example of a spherical rotation 800 from point a 830 to point b 840 on a larger circle 820 of the sphere 810, where the larger circle 820 corresponds to the largest circle on the surface of the sphere 810. The rotation axis 850 is shown by the arrow in fig. 8, and the rotation angle is θa.
Fig. 9 shows an example of a spherical rotation 900 from point a to point b on a smaller circle 910 on a spherical surface 920, where the smaller circle 910 corresponds to a circle that is smaller than the largest circle (i.e., circle 930) on the surface of the spherical surface 920. The midpoint of the rotation is shown as point 912 in fig. 9. Another example is a spherical rotation 950 from point a to point b on a smaller circle 960 of the sphere 970. The larger circle 980 (i.e., the largest circle) is shown in fig. 9. The axis of rotation 990 is shown as an arrow in fig. 9.
Fig. 10 illustrates the use of a spherical rotation model to derive motion vectors for a 2D projection image. In fig. 10, diagram 1010 depicts the use of a spherical rotation model to derive motion vectors for a 2D projection image in accordance with an embodiment of the present invention. Position a and position b are two positions in the 2D projection image. Motion vector mva points from a to a'. The goal is to find the motion vector mvb for position b. According to the present invention, the 2D projection image is projected onto a sphere 1020 using a 2D-to-3D projection. A rotation around a smaller circle or a larger circle is applied to rotate from position a to position b, and position a' is rotated to position b' according to the same selected spherical rotation. Then, the inverse projection is applied to the 3D sphere to convert position b' back to the 2D frame 1030. The motion vector for position b can then be calculated according to mvb = b' - b. Motion vector mva and motion vector mvb are two-dimensional vectors in the (x, y) domain, and position a, position a', position b and position b' are two-dimensional coordinates in the (x, y) domain. The corresponding positions of a, a', b and b' on the sphere are three-dimensional coordinates in the (θ, φ) domain.
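The following Python sketch illustrates the procedure of fig. 10 for an ERP frame: project a, a' and b onto the sphere, rotate along the great circle that carries a to b, apply the same rotation to a', and project back to obtain mvb. It is an illustrative sketch, not the patent's implementation; the ERP sampling convention, the use of Rodrigues' rotation formula, and the choice of the great-circle (largest-circle) rotation are assumptions.

    import math

    def erp_to_sphere(p, width, height):
        # ERP pixel (x, y) -> unit vector on the sphere (assumed convention).
        lon = ((p[0] + 0.5) / width - 0.5) * 2.0 * math.pi
        lat = (0.5 - (p[1] + 0.5) / height) * math.pi
        return (math.cos(lat) * math.cos(lon), math.sin(lat), -math.cos(lat) * math.sin(lon))

    def sphere_to_erp(v, width, height):
        # Unit vector on the sphere -> ERP pixel (x, y).
        x, y, z = v
        lon = math.atan2(-z, x)
        lat = math.asin(max(-1.0, min(1.0, y)))
        return ((lon / (2.0 * math.pi) + 0.5) * width - 0.5,
                (0.5 - lat / math.pi) * height - 0.5)

    def rotate(v, axis, angle):
        # Rodrigues' rotation of vector v around a unit axis by 'angle' radians.
        ax, ay, az = axis
        vx, vy, vz = v
        c, s = math.cos(angle), math.sin(angle)
        dot = ax * vx + ay * vy + az * vz
        cross = (ay * vz - az * vy, az * vx - ax * vz, ax * vy - ay * vx)
        return tuple(v_i * c + cr_i * s + a_i * dot * (1.0 - c)
                     for v_i, cr_i, a_i in zip((vx, vy, vz), cross, (ax, ay, az)))

    def derive_mv_by_sphere_rotation(a, mv_a, b, width, height):
        # Derive mv_b at position b from mv_a at position a (great-circle rotation).
        a_3d = erp_to_sphere(a, width, height)
        a1_3d = erp_to_sphere((a[0] + mv_a[0], a[1] + mv_a[1]), width, height)
        b_3d = erp_to_sphere(b, width, height)
        # Rotation that carries a to b along the great circle through a and b.
        axis = (a_3d[1] * b_3d[2] - a_3d[2] * b_3d[1],
                a_3d[2] * b_3d[0] - a_3d[0] * b_3d[2],
                a_3d[0] * b_3d[1] - a_3d[1] * b_3d[0])
        norm = math.sqrt(sum(c * c for c in axis)) or 1.0
        axis = tuple(c / norm for c in axis)
        angle = math.acos(max(-1.0, min(1.0, sum(p * q for p, q in zip(a_3d, b_3d)))))
        b1_3d = rotate(a1_3d, axis, angle)         # apply the same rotation to a'
        b1 = sphere_to_erp(b1_3d, width, height)   # back-project b' to the 2D frame
        return (b1[0] - b[0], b1[1] - b[1])        # mv_b = b' - b

    print(derive_mv_by_sphere_rotation((1000, 400), (12, 4), (1100, 900), 4096, 2048))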
The derived MV can be used as a candidate for video coding in merge mode or AMVP mode, where merge mode and AMVP mode are techniques to predictively code the motion information of a block as disclosed in High Efficiency Video Coding (HEVC). When a block is coded in merge mode, the current block reuses the motion information of the block indicated by the merge index, which points to the selected candidate in the merge candidate list. When a block is coded in AMVP mode, the motion vector of the current block is predicted by the predictor indicated by the AMVP index, which points to the selected candidate in the AMVP candidate list. Fig. 11 shows the use of MVs derived based on spherical rotation as merge candidates or AMVP candidates. In fig. 11, a layout 1110 of neighboring blocks used for deriving merge candidates is shown for block 1112, where the neighboring blocks include spatial neighboring blocks A0, A1, B0, B1 and B2, and temporal neighboring block Col0 or Col1. For a current block 1120 in a 2D frame, the motion vectors from block A0 and block B0 can be used to derive motion candidates for the current block. As disclosed above, based on the rotation of the sphere, the motion vector mva of neighboring block A0 can be used to derive a corresponding motion vector mva' for the current block. Similarly, based on the rotation of the sphere, the motion vector mvb of neighboring block B0 can be used to derive a corresponding motion vector mvb' for the current block. Motion vector mva' and motion vector mvb' can be included in the merge candidate list or the AMVP candidate list.
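As a simple illustration of how the rotation-derived MVs can populate a candidate list, the sketch below adjusts the MV of each available spatial neighbor with a derivation function (for example, the sphere-rotation derivation sketched earlier) and appends it to a merge/AMVP-style candidate list with duplicate pruning. The candidate order, pruning rule and function names are assumptions for illustration, not the exact construction of HEVC or of the patent.

    def build_candidate_list(neighbors, current_pos, derive_mv, max_candidates=5):
        # neighbors:   list of (position, mv) pairs for available neighboring blocks
        # current_pos: (x, y) position of the current block
        # derive_mv:   function (neighbor_pos, neighbor_mv, current_pos) -> adjusted mv
        candidates = []
        for pos, mv in neighbors:
            adjusted = derive_mv(pos, mv, current_pos)
            # Prune near-duplicate candidates (tolerance is an illustrative choice).
            if all(abs(adjusted[0] - c[0]) + abs(adjusted[1] - c[1]) > 0.5
                   for c in candidates):
                candidates.append(adjusted)
            if len(candidates) == max_candidates:
                break
        return candidates

    # Example usage with the ERP sphere-rotation derivation sketched above:
    # derive = lambda p, mv, cur: derive_mv_by_sphere_rotation(p, mv, cur, 4096, 2048)
    # merge_list = build_candidate_list([((1000, 400), (12, 4))], (1100, 900), derive)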
In the present invention, motion vector derivation based on translation of the viewpoint is also disclosed. Fig. 12 shows an example where an object (i.e., a tree) is projected onto the surface of the sphere at different camera positions. At camera position A, the tree is projected onto sphere 1210 to form image 1240 of the tree. At a forward position B, the tree is projected onto sphere 1220 to form image 1250 of the tree. Image 1241 of the tree corresponding to camera position A is also shown on sphere 1220 for comparison. At a further forward position C, the tree is projected onto sphere 1230 to form image 1260 of the tree. Image 1242 of the tree corresponding to camera position A and image 1251 of the tree corresponding to camera position B are also shown on sphere 1230 for comparison. In fig. 12, for video captured by a camera moving along a straight line, the direction of movement of the camera in 3D space (represented by arrows 1212, 1222 and 1232 for the three camera positions in fig. 12) can be represented by a longitude and latitude coordinate pair corresponding to the intersection of the motion direction with the 3D sphere. This point is projected onto the 2D target projection plane, and the projected point becomes the forward point.
Fig. 13 is an example of an ERP frame overlaid with a moving-flow model, where the flow of the background (i.e., static objects) can be determined if the camera forward point is known. These flows are shown by arrows. The moving flow corresponds to the direction of motion of the video content when the camera moves in one direction: the camera motion causes relative motion of the static background objects, and the direction of motion of the background objects in the 2D frames captured by the camera can be represented as a moving flow. Fig. 14 shows an exemplary flow of MV derivation based on viewpoint translation. In fig. 14, an object 1410 is projected onto sphere 1420 to form image 1422 on the surface of the sphere corresponding to camera position 1424. At camera position 1434, object 1410 is projected onto sphere 1430 to form image 1432 on the surface of the sphere. On the surface of sphere 1430, the location of the object image corresponding to the previous camera position is shown as location 1423. The procedure for deriving a motion vector based on viewpoint translation is as follows:
1. Find the forward point 1440 of the camera.
2. Compute the moving flow 1450 in the 2D frame, i.e., the tangential flow direction at each pixel (see the sketch after this list).
3. Determine the motion vector of the current block by using the MVs of the neighboring blocks:
a. As indicated by arrow 1460, the MVs of the neighboring blocks are used to determine the translation of the camera.
b. As indicated by arrow 1470, the translation of the camera is used to determine the MV of the current block.
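A minimal Python sketch of steps 1-2 follows, computing the background moving-flow direction at an ERP pixel given the camera forward point. The flow is taken along the great circle through the forward point and the pixel, pointing away from the forward point (so the background appears to expand away from the forward point as the camera moves toward it); this geometric model and the ERP convention are assumptions for illustration, not the patent's exact formulation.

    import math

    def erp_to_unit(x, y, width, height):
        # ERP pixel -> unit vector on the sphere (assumed sampling convention).
        lon = ((x + 0.5) / width - 0.5) * 2.0 * math.pi
        lat = (0.5 - (y + 0.5) / height) * math.pi
        return (math.cos(lat) * math.cos(lon), math.sin(lat), -math.cos(lat) * math.sin(lon))

    def unit_to_erp(v, width, height):
        # Unit vector on the sphere -> ERP pixel (x, y).
        x, y, z = v
        lon = math.atan2(-z, x)
        lat = math.asin(max(-1.0, min(1.0, y)))
        return ((lon / (2.0 * math.pi) + 0.5) * width - 0.5,
                (0.5 - lat / math.pi) * height - 0.5)

    def flow_tangent_3d(pixel, forward_point, width, height):
        # 3D tangent of the background moving flow at 'pixel': along the great
        # circle through the forward point and the pixel, pointing away from
        # the forward point (assumed flow model for a forward-moving camera).
        p = erp_to_unit(pixel[0], pixel[1], width, height)
        f = erp_to_unit(forward_point[0], forward_point[1], width, height)
        dot = sum(pi * fi for pi, fi in zip(p, f))
        t = tuple(dot * pi - fi for pi, fi in zip(p, f))   # -(f - (f.p)p)
        norm = math.sqrt(sum(ti * ti for ti in t)) or 1.0
        return tuple(ti / norm for ti in t)

    def flow_direction_2d(pixel, forward_point, width, height, step=1e-3):
        # Approximate 2D flow direction at an ERP pixel by stepping a small
        # distance along the 3D tangent and back-projecting (finite difference).
        p = erp_to_unit(pixel[0], pixel[1], width, height)
        t = flow_tangent_3d(pixel, forward_point, width, height)
        q = tuple(pi + step * ti for pi, ti in zip(p, t))
        qn = math.sqrt(sum(c * c for c in q))
        q = tuple(c / qn for c in q)
        x2, y2 = unit_to_erp(q, width, height)
        return x2 - pixel[0], y2 - pixel[1]

    # Forward point at the frame center: a pixel to its right flows further right.
    print(flow_direction_2d((2500, 1024), (2048, 1024), 4096, 2048))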
The viewpoint-translation-based MV derivation can be applied to different projection methods. The moving flow in the 2D frame can be mapped to the 3D sphere. Fig. 15 shows the moving flow on a 3D sphere 1510, where the forward point and two different moving-flow lines (i.e., 1512 and 1514) are shown. The moving flow on the 3D sphere related to ERP 1520 is shown in fig. 15, where the moving flow is also shown for the ERP frame 1526. The moving flow on the 3D sphere related to CMP 1530 is shown in fig. 15, where the moving flow is shown for the six faces of a CMP frame 1536 in a 2x3 layout format. The moving flow on the 3D sphere related to OHP 1540 is shown in fig. 15, where the moving flow is shown for the eight faces of an OHP frame 1546. The moving flow on the 3D sphere related to ISP 1550 is shown in fig. 15, where the moving flow is shown for the twenty faces of an ISP frame 1556. The moving flow on the 3D sphere related to SSP 1560 is shown in fig. 15, where the moving flow is shown for the segment faces of an SSP frame 1566.
As previously described, there are different mappings, e.g., ERP, CMP, SSP, OHP and ISP, to project from the 3D sphere to a 2D frame. When an object on the 3D sphere moves, the corresponding object in the 2D frame may be deformed, and the deformation depends on the type of projection. Length, angle and area may no longer remain consistent across the mapped positions in the 2D frame. An MV scaling technique on the 3D sphere is therefore disclosed, which handles the deformation by scaling the MV on the 3D sphere so as to minimize the impact of the projection type. An MV scaling technique on the 2D frame is also disclosed, which can be applied directly in the 2D frame by combining the forward projection, the MV scaling and the back-projection into a single function.
Fig. 16 illustrates an example of the deformation associated with motion in an ERP frame. In fig. 16, three motion vectors (i.e., 1612, 1614 and 1616) along three latitude lines are shown on the surface of the 3D sphere 1610, where the three motion vectors have about the same length. If the surface of the 3D sphere is unfolded into the 2D plane 1620, the three motion vectors (1622, 1624 and 1626) keep their equal lengths. For an ERP frame, however, the unfolded image needs to be stretched, and more stretching is needed at higher latitudes. Therefore, as shown in fig. 16, the three motion vectors (1632, 1634 and 1636) have different lengths. If a neighboring block with motion vector 1634 is used to predict the current block (i.e., the block whose motion vector is located at position 1632), e.g., through a merge index or an AMVP index, motion vector 1634 must be appropriately scaled before being used for coding. The example in fig. 16 illustrates the need for MV scaling.
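As a worked example of this stretching (illustrative numbers, not taken from the patent): in ERP the horizontal stretch factor at latitude θ is 1/cos(θ). A block at latitude 60° has 1/cos(60°) = 2, while a block at the equator has 1/cos(0°) = 1. A surface motion that appears as an 8-pixel horizontal MV at the equator therefore appears as a 16-pixel horizontal MV at latitude 60°, and the neighboring MV must be scaled by cos(60°)/cos(0°) = 0.5 before it can be reused at the equator.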
An exemplary flow of the MV scaling technique on the 3D sphere is described below. Suppose the starting point of mva is a and the ending point of mva is a'. The motion vector mvb at point b can be predicted by the following procedure:
1. Map a, a' and b from the 2D frame 1710 to the 3D sphere 1720, as shown in fig. 17. Let the mapping function map a pixel located at (x, y) in the 2D frame to a point (θ, φ) on the 3D sphere, so that a, a' and b are mapped to (θa, φa), (θa', φa') and (θb, φb), respectively.
2. Calculate Δθ and Δφ: Δθ = θa' - θa and Δφ = φa' - φa.
3. Apply the scaling functions scaleθ() and scaleφ() to Δθ and Δφ: Δθ' = scaleθ(Δθ) and Δφ' = scaleφ(Δφ).
4. Calculate (θb', φb') according to: θb' = θb + Δθ' and φb' = φb + Δφ'.
5. Map (θb', φb') from the 3D sphere 1720 back to the 2D frame 1730 to produce b'. Let the inverse mapping function map a point (θ, φ) on the 3D sphere to a pixel (x, y) in the 2D frame.
6. The MV mvb can then be determined according to mvb = b' - b.
In the above flow, Δθ and Δφ are scaled by the scaling functions scaleθ() and scaleφ(), respectively. Some examples of scaling functions follow; the scaling functions of Examples 2-4 are given as formula images in the original publication and are not reproduced here.
Example 1: identity scaling, i.e., Δθ' = Δθ and Δφ' = Δφ.
Example 2: (formula images not reproduced).
Example 3: a general, projection-based form of the scaling function (formula images not reproduced).
Example 4: another general, projection-based form of the scaling function (formula images not reproduced).
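The following Python sketch walks through steps 1-6 above for an ERP frame, using a latitude-compensating scaling function as the example scaling. The specific scaling function, like the ERP sampling convention and the function names, is an assumption chosen to match the ERP scaling described later; it is not a reconstruction of the omitted formula images.

    import math

    def erp_pixel_to_latlon(x, y, width, height):
        # Step 1 mapping: ERP pixel (x, y) -> (latitude, longitude) in radians.
        lon = ((x + 0.5) / width - 0.5) * 2.0 * math.pi
        lat = (0.5 - (y + 0.5) / height) * math.pi
        return lat, lon

    def latlon_to_erp_pixel(lat, lon, width, height):
        # Step 5 inverse mapping: (latitude, longitude) -> ERP pixel (x, y).
        return ((lon / (2.0 * math.pi) + 0.5) * width - 0.5,
                (0.5 - lat / math.pi) * height - 0.5)

    def scale_mv_on_sphere(a, mv_a, b, width, height):
        # Predict mv_b at position b from mv_a at position a (steps 1-6).
        a_prime = (a[0] + mv_a[0], a[1] + mv_a[1])
        lat_a, lon_a = erp_pixel_to_latlon(a[0], a[1], width, height)       # step 1
        lat_a1, lon_a1 = erp_pixel_to_latlon(a_prime[0], a_prime[1], width, height)
        lat_b, lon_b = erp_pixel_to_latlon(b[0], b[1], width, height)
        d_lat, d_lon = lat_a1 - lat_a, lon_a1 - lon_a                       # step 2
        # Step 3: example scaling that preserves the surface distance of the
        # longitude displacement (assumed scaling function).
        d_lat_s = d_lat
        d_lon_s = d_lon * math.cos(lat_a) / math.cos(lat_b)
        lat_b1, lon_b1 = lat_b + d_lat_s, lon_b + d_lon_s                   # step 4
        b_prime = latlon_to_erp_pixel(lat_b1, lon_b1, width, height)        # step 5
        return b_prime[0] - b[0], b_prime[1] - b[1]                         # step 6

    # Neighbor at a high latitude, current block near the equator: the x-component
    # of the neighboring MV is scaled down before reuse.
    print(scale_mv_on_sphere((1000, 300), (16, 0), (1020, 1000), 4096, 2048))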
By combining the projection function, the scaling function and the inverse projection function into a single function f(a, b, mva, projection type), the MV scaling in the 3D domain can also be performed directly in the 2D domain, so that the mapping between 3D and 2D can be skipped:
mvb = (xmvb, ymvb) = f(a, b, mva, projection type)
xmvb = fx(a, b, mva, projection type)
ymvb = fy(a, b, mva, projection type)
In the above equations, fx(a, b, mva, projection type) is a single function to generate xmvb and fy(a, b, mva, projection type) is a single function to generate ymvb. Fig. 18 shows an exemplary flow of MV scaling in a 2D frame. Two blocks (i.e., A and B) in the 2D frame 1810 are shown. The motion vector at position a in block A is denoted mva, and the ending position associated with position a and mva is labeled a'. The motion vector at position b in block B is denoted mvb, and the ending position associated with position b and mvb is labeled b'. The steps of the projection function (i.e., the forward projection 1830), the scaling function on the 3D sphere 1820 and the back-projection function (i.e., the back-projection 1840) can be combined into a single function according to the above equations.
Fig. 19 shows an example of MV scaling in an ERP frame. In fig. 19, the surface of the 3D sphere 1910 is unfolded into a 2D plane 1920. For the ERP frame 1930, the unfolded image needs to be stretched so that all latitude lines have the same length. Because of this ERP characteristic, a horizontal line is expanded more the closer it is to the north or south pole. In the ERP frame, the neighboring block 1932 has a motion vector mv1. A motion vector mv2 is derived from mv1 for the current block 1934, and the derived motion vector can be used to code the current block, for example as a merge candidate or an AMVP candidate. Since the neighboring block and the current block are located at different positions on the 3D sphere, the motion vector associated with the neighboring block needs to be scaled before it is used to code the current block. Specifically, since the neighboring block is located at a higher latitude, the motion vector of the neighboring block is stretched more in the x direction than the motion vector of the current block. Thus, the x-component of mv1 needs to be scaled down before it is used for the current block. The MV scaling process 1940 is shown in fig. 19, in which the motion vector located at position a of the neighboring block is scaled to mv2 and used to code the current block. In one embodiment of the present invention, a scaling function that preserves the motion distance is disclosed, where mv2 is a function of mv1, the longitude and latitude (φa, θa) of position a and the longitude and latitude (φb, θb) of position b:
mvx2 = mvx1 * cos(θa) / cos(θb)
mvy2 = mvy1
In the above equations, (φa, θa) correspond to the longitude and latitude of position a in the neighboring block, and (φb, θb) correspond to the longitude and latitude of position b in the current block. Position a and position b may correspond to the centers of the respective blocks, or to other positions of the respective blocks. The derived y-component is the same as the y-component of mv1; in other words, the scaling function for the y-component is an identity function.
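A minimal sketch of the combined single-function approach for ERP, f(a, b, mv1, ERP), follows. It folds the forward projection, the cosine-of-latitude scaling and the back-projection into one closed form, so only the latitudes of positions a and b are needed; the pixel-to-latitude convention and the function names are assumptions.

    import math

    def erp_latitude(y, height):
        # Latitude (radians) of ERP row y (assumed sampling convention).
        return (0.5 - (y + 0.5) / height) * math.pi

    def scale_mv_erp(mv1, pos_a, pos_b, height):
        # Combined function f(a, b, mv1, ERP): scale mv1 at position a for reuse
        # at position b without an explicit round trip through the 3D sphere.
        lat_a = erp_latitude(pos_a[1], height)
        lat_b = erp_latitude(pos_b[1], height)
        mvx2 = mv1[0] * math.cos(lat_a) / math.cos(lat_b)   # x-component scaled
        mvy2 = mv1[1]                                       # y-component unchanged
        return mvx2, mvy2

    # Neighbor at a higher latitude than the current block: the x-component shrinks.
    print(scale_mv_erp((16, 2), (1000, 300), (1020, 1000), 2048))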
MV scaling of other projections is also disclosed. In fig. 20, MV scaling of ERP is disclosed, where a current block 2012 and an adjacent block 2014 in an ERP image 2010 are shown. The adjacent blocks having motion vectors mv1. Mv from neighboring blocks1Is used to derive a motion vector mv for a current block2. According to the invention, according to mv2=f(mv1,x1,y1,x2,y2ERP), using a scaling function, from motion vectors mv of neighboring blocks1Needs to be scaled, where (x)1,y1) Coordinates (x, y), (x) corresponding to position a in the adjacent block2,y2) Corresponding to the coordinates (x, y) of the position b in the current block.
In fig. 21, MV scaling of CMP is disclosed, where a current block 2112 and a neighboring block 2114 in a CMP image 2110 are shown. The adjacent blocks having motion vectors mv1. Mv from neighboring blocks1Is used to derive a motion vector mv for a current block2. According to the invention, according to mv2=f(mv1,x1,y1,x2,y2CMP), using a scaling function, the motion vector mv from the neighboring block1Needs to be scaled.
In fig. 22, MV scaling of SSP is disclosed, where a current block 2212 and a neighboring block 2214 in an SSP image 2210 are shown. The adjacent blocks having motion vectors mv1. Mv from neighboring blocks1Is used to derive a motion vector mv for a current block2. According to the invention, according to mv2=f(mv1,x1,y1,x2,y2SSP), using a scaling function, from the motion vectors mv of the neighboring blocks1Needs to be scaled.
In fig. 23, MV scaling of OHP is disclosed, where a current block 2312 and a neighboring block 2314 in an OHP picture 2310 are shown. The adjacent blocks having motion vectors mv1. Mv from neighboring blocks1Is used to derive a motion vector mv for a current block2. According to the invention, according to mv2=f(mv1,x1,y1,x2,y2OHP), using a scaling function, from neighboring blocksMotion vector mv1Needs to be scaled.
In fig. 24, the MV scaling of the ISP is disclosed, wherein a current block 2412 and a neighboring block 2414 in an ISP image 2410 are shown. The adjacent blocks having motion vectors mv1. Mv from neighboring blocks1Is used to derive a motion vector mv for a current block2. According to the invention, according to mv2=f(mv1,x1,y1,x2,y2ISP) using a scaling function, motion vectors mv from neighboring blocks1Needs to be scaled.
In fig. 25, MV scaling of EAP is disclosed, where a current block 2512 and a neighboring block 2514 in an EAP image 2510 are shown. The adjacent blocks having motion vectors mv1. Mv from neighboring blocks1Is used to derive a motion vector mv for a current block2. According to the invention, according to mv2=f(mv1,x1,y1,x2,y2EAP), using a scaling function, from the motion vectors mv of the neighboring blocks1Needs to be scaled.
In fig. 26, MV scaling of ACP is disclosed, where a current block 2612 and an adjacent block 2614 in an ACP image 2610 are shown. The adjacent blocks having motion vectors mv1. Mv from neighboring blocks1Is used to derive a motion vector mv for a current block2. According to the invention, according to mv2=f(mv1,x1,y1,x2,y2ACP), using a scaling function, from the motion vectors mv of the neighboring blocks1Needs to be scaled.
In fig. 27, MV scaling of RSP is disclosed, where a current block 2712 and a neighboring block 2714 in an RSP image 2710 are shown. The adjacent blocks having motion vectors mv1. Mv from neighboring blocks1Is used to derive a motion vector mv for a current block2. According to the invention, according to mv2=f(mv1,x1,y1,x2,y2RSP), using a scaling function, from the motion vectors mv of the neighboring blocks1Needs to be scaled.
In addition to the projections described aboveCylindrical projection is also used to project 3D spheres into 2D frames. Conceptually, as shown in fig. 28, a cylinder projection is created by wrapping a cylinder 2820 around a ball 2830 and projecting light rays through the ball onto the cylinder. The cylinder projection represents meridians (meridians) as straight, uniformly spaced vertical lines and parallel lines as straight horizontal lines. As it is on the sphere, meridian and parallel lines cross at right angles. Depending on the position of the light source, different Cylindrical projections (CLPs) are generated. In fig. 28, MV scaling of CLPs is disclosed, where a current block 2812 and a neighboring block 2814 in a CLP picture 2810 are shown. The adjacent blocks having motion vectors mv1. Mv from neighboring blocks1Is used to derive a motion vector mv for a current block2. According to the invention, according to mv2=f(mv1,x1,y1,x2,y2CLP), using a scaling function, from the motion vectors mv of the neighboring blocks1Needs to be scaled.
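A thin dispatch wrapper expressing the common interface mv2 = f(mv1, x1, y1, x2, y2, projection) might look like the following sketch; only the ERP branch is filled in (reusing the closed form above), and the other branches are left as placeholders since the text does not give their closed forms. The function and parameter names are illustrative assumptions.

    import math

    def scale_mv(mv1, x1, y1, x2, y2, projection, width, height):
        # mv2 = f(mv1, x1, y1, x2, y2, projection type): scale a neighboring MV
        # for reuse at the current block position under the given projection.
        if projection == "ERP":
            lat1 = (0.5 - (y1 + 0.5) / height) * math.pi
            lat2 = (0.5 - (y2 + 0.5) / height) * math.pi
            return mv1[0] * math.cos(lat1) / math.cos(lat2), mv1[1]
        elif projection in ("CMP", "SSP", "OHP", "ISP", "EAP", "ACP", "RSP", "CLP"):
            # Placeholder: each projection needs its own combined function, built
            # from its forward projection, the sphere-domain scaling and its
            # inverse projection (not specified in closed form in the text).
            raise NotImplementedError(projection)
        raise ValueError("unknown projection type: " + projection)

    print(scale_mv((16, 2), 1000, 300, 1020, 1000, "ERP", 4096, 2048))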
FIG. 29 illustrates an exemplary flow diagram of a system that applies rotation of a sphere to adjust motion vectors for processing a 360 degree virtual reality image in accordance with an embodiment of the present invention. The steps illustrated in the flowchart may be implemented as program code executable on one or more processors (e.g., one or more CPUs) at the encoder side. The steps shown in the flowchart may also be implemented on a hardware basis, for example, one or more electronic devices or processors arranged to execute the steps in the flowchart. According to this method, in step 2910, input data for a current block in a 2D frame is received, wherein the 2D frame is projected from a 3D sphere. The input data may correspond to pixel data of the 2D frame to be encoded. In step 2920, a first motion vector associated with a neighboring block in the 2D frame is determined, where the first motion vector points from a first starting position in the neighboring block to a first ending position in the 2D frame. In step 2930, the first motion vector is projected onto the 3D sphere according to a target projection. In step 2940, the first motion vector in the 3D sphere is rotated around a rotation axis along a rotation circle on the surface of the 3D sphere to generate a second motion vector in the 3D sphere. In step 2950, the second motion vector in the 3D sphere is mapped back to the 2D frame according to an inverse target projection. In step 2960, the current block in the 2D frame is encoded or decoded using the second motion vector.
FIG. 30 illustrates an exemplary flow diagram of a system that derives motion vectors from translation of the viewpoint for processing a 360 degree virtual reality image in accordance with an embodiment of the present invention. According to this method, in step 3010, two 2D frames are received, wherein the two frames are projected, using a target projection, from 3D spheres corresponding to two different viewpoints, and the current block and the neighboring block are located in the two 2D frames. In step 3020, based on the two 2D frames, a forward point of the camera is determined. In step 3030, a plurality of motion streams is determined in the two 2D frames. In step 3040, a translation of the camera is determined based on the first motion vector associated with the neighboring block. In step 3050, a second motion vector associated with the current block is derived based on the translation of the camera. In step 3060, the current block in the 2D frame is encoded or decoded using the second motion vector.
FIG. 31 illustrates an exemplary flow diagram of a system for applying scaling to adjust motion vectors for processing a 360 degree virtual reality image in accordance with an embodiment of the present invention. According to the method, in step 3110, input data for a current block in a 2D frame is received, wherein the 2D frame is projected from a 3D sphere according to a target projection. In step 3120, a first motion vector associated with a neighboring block in the 2D frame is determined, wherein the first motion vector points from a first starting position in the neighboring block to a first ending position in the 2D frame. In step 3130, the first motion vector is scaled to generate a second motion vector. In step 3140, the current block in the 2D frame is encoded or decoded using the second motion vector.
The flowcharts shown in the present invention are intended to illustrate examples of video coding according to the present invention. One skilled in the art may modify individual steps, recombine the steps, split one step into several, or combine steps to practice the present invention without departing from the spirit of the present invention.
The previous description is presented to enable any person skilled in the art to practice the invention in the context of a particular application and its requirements. Various modifications to the described embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments. Thus, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. In the foregoing detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. Nevertheless, it will be understood by those skilled in the art that the present invention can be practiced without these specific details.
The embodiments of the invention described above may be implemented in various hardware, software code, or combinations of both. For example, an embodiment of the invention may be circuitry integrated within a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the invention may also be program code executing on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also include functions performed by a computer processor, digital signal processor, microprocessor, or Field Programmable Gate Array (FPGA). According to the invention, the processors may be configured to perform specific tasks by executing machine-readable software code or firmware code that defines the specific methods implemented by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software code, and other forms of configuration code to perform the tasks of the invention, do not depart from the spirit and scope of the invention.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (20)

1. A method of processing a 360 degree virtual reality image, the method comprising:
receiving input data for a current block in a two-dimensional frame, wherein the two-dimensional frame is projected from a three-dimensional sphere;
determining a first motion vector associated with a neighboring block in the two-dimensional frame, wherein the first motion vector points from a first starting location in the neighboring block to a first ending location in the two-dimensional frame;
projecting the first motion vector onto the three-dimensional spherical surface according to target projection;
rotating the first motion vector in the three-dimensional sphere around a rotation axis along a rotation circle on a surface of the three-dimensional sphere to generate a second motion vector in the three-dimensional sphere;
mapping the second motion vector in the three-dimensional sphere back to the two-dimensional frame according to an inverse target projection; and
encoding or decoding the current block in the two-dimensional frame using the second motion vector.
2. The method of processing a 360 degree virtual reality image as recited in claim 1, wherein the second motion vector is included as a candidate in a merge candidate list or an advanced motion vector prediction candidate list for encoding or decoding the current block.
3. The method of processing a 360 degree virtual reality image of claim 1, wherein the circle of rotation corresponds to a largest circle on a surface of the three dimensional sphere.
4. The method of processing a 360 degree virtual reality image of claim 1, wherein the circle of rotation is less than a maximum circle on the surface of the three dimensional sphere.
5. A method of processing a 360 degree virtual reality image as claimed in claim 1, wherein the target projection corresponds to an equirectangular projection, a cubic projection, an adjusted cubic projection, an equal-area projection, an octahedral projection, an icosahedral projection, a segmented spherical projection, a rotated spherical projection or a cylindrical projection.
6. The method of processing a 360 degree virtual reality image of claim 1, wherein projecting the first motion vector onto the three dimensional sphere comprises:
projecting the first starting location, a first ending location, and a second starting location in the two-dimensional frame onto the three-dimensional sphere according to the target projection, wherein the second starting location is at a respective location in the current block that corresponds to the first starting location in the neighboring block.
7. The method of processing a 360 degree virtual reality image of claim 6, wherein rotating the first motion vector in the three dimensional sphere along the circle of rotation comprises:
determining a target rotation for rotation along a rotation circle on a surface of the three-dimensional sphere about a rotation axis from the first starting position to the second starting position in the three-dimensional sphere; and
rotating the first end position to a second end position on the three-dimensional sphere using the target rotation.
8. The method of processing a 360 degree virtual reality image of claim 7, wherein mapping the second motion vector in the three-dimensional sphere to the two-dimensional frame comprises:
mapping the second end position on the three-dimensional sphere back to the two-dimensional frame according to the inverse target projection; and
and determining the second motion vector in the two-dimensional frame according to the second starting position and the second ending position in the two-dimensional frame.
9. An apparatus for processing a 360 degree virtual reality image, the apparatus comprising one or more electronic devices or processors configured to:
receiving input data for a current block in a two-dimensional frame, wherein the two-dimensional frame is projected from a three-dimensional sphere;
determining a first motion vector associated with a neighboring block in the two-dimensional frame, wherein the first motion vector points from a first starting location in the neighboring block to a first ending location in the two-dimensional frame;
projecting the first motion vector onto the three-dimensional spherical surface according to target projection;
rotating the first motion vector in the three-dimensional sphere around a rotation axis along a rotation circle on a surface of the three-dimensional sphere to generate a second motion vector in the three-dimensional sphere;
mapping the second motion vector in the three-dimensional sphere back to the two-dimensional frame according to an inverse target projection; and
encoding or decoding the current block in the two-dimensional frame using the second motion vector.
10. A method of processing a 360 degree virtual reality image, the method comprising:
receiving two two-dimensional frames, wherein the two two-dimensional frames are projected, according to a target projection, from three-dimensional spheres corresponding to two different viewpoints, and a current block and a neighboring block are located in the two-dimensional frames;
determining a front point of a camera based on the two-dimensional frames;
determining a plurality of motion streams in the two-dimensional frames;
determining a translation of the camera based on a first motion vector associated with the neighboring block;
deriving a second motion vector associated with the current block based on the translation of the camera; and
encoding or decoding the current block in the two-dimensional frames using the second motion vector.
11. The method of processing a 360 degree virtual reality image of claim 10, wherein the plurality of motion streams in the two-dimensional frames are computed from a tangential direction of each pixel in the two-dimensional frames.
12. The method of processing a 360 degree virtual reality image of claim 10, wherein the second motion vector is included as a candidate in a merge candidate list or an advanced motion vector prediction candidate list for encoding or decoding the current block.
13. The method of processing a 360 degree virtual reality image of claim 10, wherein the target projection corresponds to an equirectangular projection, a cubemap projection, an adjusted cubemap projection, an equal-area projection, an octahedron projection, an icosahedron projection, a segmented sphere projection, a rotated sphere projection, or a cylindrical projection.
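Illustrative note (not part of the claims): claims 10 to 13 do not specify how the front point, the motion streams, or the translation are computed. The sketch below uses a deliberately simplified planar model in which the motion streams radiate from the front point within the 2D frame, the translation amount is read off the neighbouring block's motion vector, and the same amount is reused along the current block's stream direction; both the radial-flow model and all names are hypothetical assumptions, not the claimed derivation.

import numpy as np

def stream_direction(pixel, front_point):
    # Toy motion-stream direction: flow radiating away from the front point in the frame.
    d = np.asarray(pixel, dtype=float) - np.asarray(front_point, dtype=float)
    n = np.linalg.norm(d)
    return np.zeros(2) if n < 1e-9 else d / n

def derive_mv_from_translation(neighbor_center, neighbor_mv, current_center, front_point):
    # Project the neighbouring motion vector onto its motion stream to estimate how far
    # the camera translation moved the content, then reuse that amount along the
    # current block's own stream direction.
    d_nb = stream_direction(neighbor_center, front_point)
    d_cur = stream_direction(current_center, front_point)
    translation_amount = float(np.dot(np.asarray(neighbor_mv, dtype=float), d_nb))
    second_mv = translation_amount * d_cur
    return float(second_mv[0]), float(second_mv[1])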
14. A method of processing a 360 degree virtual reality image, the method comprising:
receiving input data for a current block in a two-dimensional frame, wherein the two-dimensional frame is projected from a three-dimensional sphere according to a target projection;
determining a first motion vector associated with a neighboring block in the two-dimensional frame, wherein the first motion vector points from a first starting location in the neighboring block to a first ending location in the two-dimensional frame;
scaling the first motion vector to generate a second motion vector; and
encoding or decoding the current block in the two-dimensional frame using the second motion vector,
wherein the step of scaling the first motion vector to generate a second motion vector comprises: projecting the first starting location, the first ending location, and a second starting location in the two-dimensional frame onto the three-dimensional sphere according to the target projection, wherein the second starting location is at a location in the current block that corresponds to the first starting location in the neighboring block.
15. The method of processing a 360 degree virtual reality image of claim 14, wherein the step of scaling the first motion vector to generate a second motion vector further comprises:
scaling a longitude component of the first motion vector to generate a scaled longitude component of the first motion vector;
scaling a latitude component of the first motion vector to generate a scaled latitude component of the first motion vector; and
determining a second ending location corresponding to the second starting location based on the scaled longitude component of the first motion vector and the scaled latitude component of the first motion vector.
16. The method of processing a 360 degree virtual reality image of claim 15, wherein the step of scaling the first motion vector to generate a second motion vector further comprises:
mapping the second ending location on the three-dimensional sphere back to the two-dimensional frame according to an inverse target projection; and
determining the second motion vector in the two-dimensional frame based on the second starting location and the second ending location in the two-dimensional frame.
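Illustrative note (not part of the claims): a minimal sketch of the scaling pipeline of claims 14 to 16, assuming an equirectangular target projection; the cos-latitude factor on the longitude component and the unit factor on the latitude component are example scale choices, and the names are hypothetical.

import numpy as np

def erp_pixel_to_lonlat(x, y, width, height):
    # Equirectangular pixel -> (longitude, latitude) in radians.
    lon = (x / width) * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (y / height) * np.pi
    return lon, lat

def lonlat_to_erp_pixel(lon, lat, width, height):
    # (longitude, latitude) in radians -> equirectangular pixel (wrap-around ignored).
    return (lon + np.pi) / (2.0 * np.pi) * width, (np.pi / 2.0 - lat) / np.pi * height

def scale_motion_vector(start1, end1, start2, width, height):
    # Project the three locations onto the sphere (expressed here as longitude/latitude),
    # scale the longitude/latitude components of the neighbouring motion vector, and map
    # the resulting ending location back to the 2D frame.
    lon_s1, lat_s1 = erp_pixel_to_lonlat(*start1, width, height)
    lon_e1, lat_e1 = erp_pixel_to_lonlat(*end1, width, height)
    lon_s2, lat_s2 = erp_pixel_to_lonlat(*start2, width, height)
    d_lon, d_lat = lon_e1 - lon_s1, lat_e1 - lat_s1    # longitude/latitude components
    # Example scale factors: preserve on-sphere arc length in the longitude direction.
    lon_scale = np.cos(lat_s1) / max(np.cos(lat_s2), 1e-9)
    lat_scale = 1.0
    lon_e2 = lon_s2 + lon_scale * d_lon                # second ending location (longitude)
    lat_e2 = lat_s2 + lat_scale * d_lat                # second ending location (latitude)
    end2_x, end2_y = lonlat_to_erp_pixel(lon_e2, lat_e2, width, height)
    return end2_x - start2[0], end2_y - start2[1]      # second motion vector in the 2D frame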
17. The method of processing a 360 degree virtual reality image of claim 14, wherein the step of scaling the first motion vector to generate a second motion vector comprises:
applying a first combined function to generate an x-component of the second motion vector and applying a second combined function to generate a y-component of the second motion vector;
wherein the first combined function and the second combined function are each based on the first starting location, a second starting location in the current block corresponding to the first starting location, the first motion vector, and the target projection; and
wherein the first combined function and the second combined function each combine the target projection, the scaling, and an inverse target projection, wherein the target projection is used to project first data in the two-dimensional frame into second data in the three-dimensional sphere, the scaling is used to scale a selected motion vector in the three-dimensional sphere into a scaled motion vector in the three-dimensional sphere, and the inverse target projection is used to project the scaled motion vector back into the two-dimensional frame.
18. The method of processing a 360 degree virtual reality image of claim 17, wherein the target projection corresponds to an equirectangular projection;
the first combined function corresponds to
MVx' = MVx · cos(φ1) / cos(φ2),
wherein MVx is the x-component of the first motion vector, MVx' is the x-component of the second motion vector, φ1 corresponds to a first latitude associated with the first starting location, and φ2 corresponds to a second latitude associated with the second starting location; and
the second combined function corresponds to an identity function.
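Illustrative note (not part of the claims): taking the first combined function to be the cos-latitude ratio given above for an equirectangular frame, the projection, scaling, and inverse projection of claim 17 collapse into a few lines; the pole guard on the denominator and the function name are added assumptions.

import numpy as np

def combined_scale_mv_erp(first_mv, first_start_y, second_start_y, height):
    # Latitudes of the two starting rows in an equirectangular frame of the given height.
    lat1 = np.pi / 2.0 - (first_start_y / height) * np.pi
    lat2 = np.pi / 2.0 - (second_start_y / height) * np.pi
    mvx, mvy = first_mv
    # First combined function: cos-latitude ratio on the x-component (guard near the poles).
    mvx_scaled = mvx * np.cos(lat1) / max(np.cos(lat2), 1e-9)
    # Second combined function: identity on the y-component.
    return mvx_scaled, mvy

The ratio reflects that, in an equirectangular frame, the same on-sphere horizontal displacement spans more pixels the closer a row lies to a pole, so the motion vector grows when the current block sits at a higher latitude than its neighbour and shrinks otherwise.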
19. The method of processing a 360 degree virtual reality image of claim 14, wherein the second motion vector is included as a candidate in a merge candidate list or an advanced motion vector prediction candidate list for encoding or decoding the current block.
20. The method of processing a 360 degree virtual reality image of claim 14, wherein the target projection corresponds to an equirectangular projection, a cubemap projection, an adjusted cubemap projection, an equal-area projection, an octahedron projection, an icosahedron projection, a segmented sphere projection, a rotated sphere projection, or a cylindrical projection.
CN201880001715.5A 2017-06-23 2018-06-21 Method and device for processing 360-degree virtual reality image Active CN109429561B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201762523883P 2017-06-23 2017-06-23
US201762523885P 2017-06-23 2017-06-23
US62/523,885 2017-06-23
US62/523,883 2017-06-23
PCT/CN2018/092143 WO2018233662A1 (en) 2017-06-23 2018-06-21 Method and apparatus of motion vector derivations in immersive video coding

Publications (2)

Publication Number Publication Date
CN109429561A CN109429561A (en) 2019-03-05
CN109429561B true CN109429561B (en) 2022-01-21

Family

ID=64735503

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201880002044.4A Active CN109691104B (en) 2017-06-23 2018-06-21 Method and device for processing 360-degree virtual reality image
CN201880001715.5A Active CN109429561B (en) 2017-06-23 2018-06-21 Method and device for processing 360-degree virtual reality image

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201880002044.4A Active CN109691104B (en) 2017-06-23 2018-06-21 Method and device for processing 360-degree virtual reality image

Country Status (3)

Country Link
CN (2) CN109691104B (en)
TW (2) TWI690193B (en)
WO (2) WO2018233662A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10904558B2 (en) * 2019-04-26 2021-01-26 Tencent America LLC Method and apparatus for motion compensation for 360 video coding
CN110248212B (en) * 2019-05-27 2020-06-02 上海交通大学 Multi-user 360-degree video stream server-side code rate self-adaptive transmission method and system
US11095912B2 (en) 2019-10-28 2021-08-17 Mediatek Inc. Video decoding method for decoding part of bitstream to generate projection-based frame with constrained guard band size, constrained projection face size, and/or constrained picture size
US11263722B2 (en) 2020-06-10 2022-03-01 Mediatek Inc. Video processing method for remapping sample locations in projection-based frame with hemisphere cubemap projection layout to locations on sphere and associated video processing apparatus
CN115423812B (en) * 2022-11-05 2023-04-18 松立控股集团股份有限公司 Panoramic monitoring planarization display method
CN116540872A (en) * 2023-04-28 2023-08-04 中广电广播电影电视设计研究院有限公司 VR data processing method, device, equipment, medium and product

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103039075B (en) * 2010-05-21 2015-11-25 Jvc建伍株式会社 Picture coding device, method for encoding images and picture decoding apparatus, picture decoding method
CN102333221B (en) * 2011-10-21 2013-09-04 北京大学 Panoramic background prediction video coding and decoding method
CN104063843B (en) * 2014-06-18 2017-07-28 长春理工大学 A kind of method of the integrated three-dimensional imaging element image generation based on central projection
US9277122B1 (en) * 2015-08-13 2016-03-01 Legend3D, Inc. System and method for removing camera rotation from a panoramic video
KR102267922B1 (en) * 2015-09-23 2021-06-22 노키아 테크놀로지스 오와이 How to code 360 degree panoramic video, device and computer program product

Also Published As

Publication number Publication date
WO2018233662A1 (en) 2018-12-27
WO2018233661A1 (en) 2018-12-27
CN109691104A (en) 2019-04-26
TW201911861A (en) 2019-03-16
TWI686079B (en) 2020-02-21
CN109691104B (en) 2021-02-23
CN109429561A (en) 2019-03-05
TW201911867A (en) 2019-03-16
TWI690193B (en) 2020-04-01

Similar Documents

Publication Publication Date Title
CN109429561B (en) Method and device for processing 360-degree virtual reality image
US10264282B2 (en) Method and apparatus of inter coding for VR video using virtual reference frames
WO2017125030A1 (en) Apparatus of inter prediction for spherical images and cubic images
US8983175B2 (en) Video processing method and device for depth extraction
CN100581250C (en) System and method for compression of 3D computer graphics
US20170230668A1 (en) Method and Apparatus of Mode Information Reference for 360-Degree VR Video
WO2018196682A1 (en) Method and apparatus for mapping virtual-reality image to a segmented sphere projection format
CN108012155A (en) Method for video coding, video encoding/decoding method and the relevant device of pre-splicing image
US20200145695A1 (en) Apparatus and method for decoding a panoramic video
TWI652934B (en) Method and apparatus for adaptive video decoding
WO2018060052A1 (en) Method and apparatus for rectified motion compensation for omnidirectional videos
WO2017220012A1 (en) Method and apparatus of face independent coding structure for vr video
TWI702835B (en) Method and apparatus of motion vector derivation for vr360 video coding
JP2007166381A (en) Compression coding method and decoding method of multi-viewpoint image
US10827159B2 (en) Method and apparatus of signalling syntax for immersive video coding
US20180338160A1 (en) Method and Apparatus for Reduction of Artifacts in Coded Virtual-Reality Images
EP3656128A1 (en) Method and apparatus of signalling syntax for immersive video coding
ES2928951T3 (en) Apparatus and method for generating an image signal
KR101946715B1 (en) Adaptive search ragne determination method for motion estimation of 360 degree video
Tzovaras et al. 3D object articulation and motion estimation for efficient multiview image sequence coding
Wang Multiview/stereoscopic video analysis, compression, and virtual viewpoint synthesis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant