CN107959844B - 360degree video capture and playback - Google Patents

360-degree video capture and playback

Info

Publication number
CN107959844B
CN107959844B (application CN201710952982.8A)
Authority
CN
China
Prior art keywords
video
360degree
projection
coordinate system
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710952982.8A
Other languages
Chinese (zh)
Other versions
CN107959844A
Inventor
周敏华
陈学敏
布赖恩·A·亨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Avago Technologies International Sales Pte Ltd
Original Assignee
Avago Technologies General IP Singapore Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 15/599,447 (external priority, US11019257B2)
Application filed by Avago Technologies General IP Singapore Pte Ltd filed Critical Avago Technologies General IP Singapore Pte Ltd
Publication of CN107959844A
Application granted
Publication of CN107959844B


Classifications

    • G06T 3/047
    • G06T 3/12
    • G06T 3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06T 2200/32 Indexing scheme for image data processing or generation, in general, involving image mosaicing
    • G06T 2207/10016 Image acquisition modality: video; image sequence
    • H04N 7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N 13/111 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • H04N 13/139 Format conversion, e.g. of frame-rate or size
    • H04N 13/161 Encoding, multiplexing or demultiplexing different image signal components
    • H04N 13/189 Recording image signals; reproducing recorded image signals
    • H04N 13/194 Transmission of image signals
    • H04N 13/243 Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • H04N 19/39 Hierarchical coding techniques involving multiple description coding [MDC], i.e. with separate layers being structured as independently decodable descriptions of input picture data
    • H04N 19/46 Embedding additional information in the video signal during the compression process
    • H04N 19/597 Predictive coding specially adapted for multi-view video sequence encoding
    • H04N 23/698 Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture

Abstract

In a 360-degree video capture and playback system, 360-degree video may be captured, stitched, encoded, decoded, rendered, and played back. In one or more implementations, a stitching device may be configured to stitch the 360-degree video using an intermediate coordinate system between the input picture coordinate system and the capture coordinate system. In one or more implementations, the stitching device may be configured to use a projection format decision to stitch the 360-degree video into at least two different projection formats, and the encoding device may be configured to encode the stitched 360-degree video with signaling indicating the at least two different projection formats. In one or more implementations, the stitching device may be configured to stitch the 360-degree video with a plurality of views, and the rendering device may be configured to render the decoded bitstream using one or more suggested views.

Description

360-degree video capture and playback
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application claims the benefit of priority under 35 U.S.C. § 119 from the following U.S. provisional patent applications: U.S. Provisional Patent Application No. 62/339,040, entitled "360 DEGREE VIDEO CAPTURE AND PLAYBACK," filed on May 19, 2016; U.S. Provisional Patent Application No. 62/408,652, entitled "VIDEO CODING WITH ADAPTIVE PROJECTION FORMAT," filed on October 14, 2016; and U.S. Provisional Patent Application No. 62/418,066, entitled "SUGGESTED VIEWS WITHIN 360 DEGREE VIDEO," filed on November 4, 2016, the disclosures of which are hereby incorporated by reference in their entirety for all purposes.
Technical Field
The present disclosure relates to video capture and playback, and more particularly, to 360-degree video capture, processing, and playback.
Background
360-degree video, also known as immersive video and/or spherical video, is a video recording of a real-world panorama in which the view in every direction is recorded at the same time, shot using an omnidirectional camera or a collection of cameras. During playback, the viewer is able to control the field of view (FOV) angle and the viewing direction (a form of virtual reality).
Disclosure of Invention
In one aspect, the invention relates to a system comprising: a video capture device configured to capture 360-degree video; a stitching device configured to stitch the captured 360-degree video using an intermediate coordinate system between an input picture coordinate system and a 360-degree video capture coordinate system; and an encoding device configured to encode the stitched 360-degree video into a 360-degree video bitstream and to prepare the 360-degree video bitstream for transmission, storage, and playback.
In another aspect, the invention relates to a system comprising: a video capture device configured to capture 360-degree video; a stitching device configured to use a projection format decision to stitch the captured 360-degree video into at least two different projection formats; and an encoding device configured to encode the stitched 360-degree video into a 360-degree video bitstream, the 360-degree video bitstream including signaling indicating the at least two different projection formats, and to prepare the 360-degree video bitstream for transmission and playback.
In another aspect, the invention relates to a system comprising: a video capture device configured to capture 360-degree video; a stitching device configured to stitch the captured 360-degree video with a plurality of views; an encoding device configured to encode the stitched 360-degree video into a 360-degree video bitstream; a decoding device configured to decode the 360-degree video bitstream associated with the plurality of views; and a rendering device configured to render the decoded bitstream using one or more suggested views of the plurality of views.
Drawings
Certain features of the technology are set forth in the appended claims. However, for purposes of explanation, one or more implementations of the present technology are set forth in the accompanying drawings.
Fig. 1 illustrates an example network environment in which 360degree video capture and playback may be implemented, in accordance with one or more implementations.
Figure 2 conceptually illustrates an example of an equidistant columnar projection format.
FIG. 3 conceptually illustrates an example of an equidistant columnar projection and a map of the Earth.
Figure 4 conceptually illustrates an example of a 360degree video with equidistant columnar projections.
Figure 5 conceptually illustrates an example of a 360degree image in an equidistant columnar projection layout.
FIG. 6 conceptually illustrates an example definition of a six-sided cube.
FIG. 7 conceptually illustrates an example of a cube projection format.
FIG. 8 conceptually illustrates an example of a 360degree image in a cube projection layout.
FIG. 9 conceptually illustrates an example of a normalized projection plane size determined by the angle of field of view.
Fig. 10 conceptually illustrates an example of viewing direction angles.
Fig. 11 illustrates a schematic diagram of coordinate mapping between an output presentation picture and an input 360degree video picture.
Figure 12 conceptually illustrates an example of mapping points in a normalized presentation coordinate system to a normalized projection coordinate system using an equidistant columnar projection format.
FIG. 13 conceptually illustrates an example of mapping points in a normalized presentation coordinate system to a normalized projection coordinate system using a cube projection format.
Figure 14 conceptually illustrates an example of a two-dimensional layout 1400 of samples of input 360-degree video pictures projected for 360-degree video presentation.
Figure 15 conceptually illustrates an example of a global rotation angle between a capture coordinate system and a 360degree video projection coordinate system.
Figure 16 conceptually illustrates an example of an alternative 360degree view projection in an equidistant columnar format.
Fig. 17 illustrates a schematic diagram of an example of a coordinate mapping process modified with a 360degree video projection coordinate system.
Fig. 18 illustrates a schematic diagram of an example of a 360degree video capture and playback system using a six view layout format.
Fig. 19 illustrates a schematic diagram of an example of a 360degree video capture and playback system using a two-view layout format.
Fig. 20 illustrates a schematic diagram of an example of a 360degree video capture and playback system using a two-view layout format in which one view sequence is used for presentation.
FIG. 21 conceptually illustrates an example of a multiple cube projection format layout.
Fig. 22 conceptually illustrates an example of unrestricted motion compensation.
Fig. 23 conceptually illustrates an example of a multiple 360degree video projection format layout.
Fig. 24 conceptually illustrates an example of extended unconstrained motion compensation.
Fig. 25 illustrates a schematic diagram of another example of a 360degree video capture and playback system.
FIG. 26 conceptually illustrates an example of a cube projection format.
Fig. 27 conceptually illustrates an example of a fisheye projection format.
FIG. 28 conceptually illustrates an example of an icosahedron projection format.
Fig. 29 illustrates a schematic diagram of an example of 360degree video in multiple projection formats.
Fig. 30 illustrates a schematic diagram of an example of a 360degree video capture and playback system with an adaptive projection format.
Fig. 31 illustrates a schematic diagram of another example of a 360degree video capture and playback system with an adaptive projection format.
Fig. 32 illustrates a schematic diagram of an example of projection format determination.
Fig. 33 conceptually illustrates an example of projection format conversion that excludes inter-prediction across projection format conversion boundaries.
Fig. 34 conceptually illustrates an example of projection format transitions with inter-prediction across projection format transition boundaries.
Fig. 35 illustrates an example network environment 3500 in which suggested views within a 360degree video may be implemented in accordance with one or more implementations.
FIG. 36 conceptually illustrates an example of equidistant columnar and cubic projections.
Fig. 37 conceptually illustrates an example of a 360degree video presentation.
FIG. 38 illustrates a schematic diagram of extraction and rendering with suggested perspectives.
FIG. 39 conceptually illustrates an electronic system with which one or more implementations of the present technology may be implemented.
The accompanying drawings, which are included to provide a further understanding of the technology and are incorporated in and constitute a part of this specification, illustrate aspects of the technology and together with the description serve to explain the principles of the technology.
Detailed Description
The detailed description set forth below is intended as a description of various configurations of the present technology and is not intended to represent the only configurations in which the present technology may be practiced. The accompanying drawings are incorporated herein and constitute a part of the detailed description. The detailed description includes specific details for the purpose of providing a thorough understanding of the present technology. However, it should be clear and apparent to one skilled in the art that the present techniques are not limited to the specific details set forth herein and may be practiced using one or more implementations. In one or more instances, well-known structures and components are shown in block diagram form in order to avoid obscuring the concepts of the present technology.
In a 360-degree video capture and playback system, 360-degree video may be captured, stitched, encoded, stored or transmitted, decoded, presented, and played back. In one or more implementations, a stitching device may be configured to stitch the 360-degree video using an intermediate coordinate system between the input picture coordinate system and the capture coordinate system. In one or more implementations, the stitching device may be configured to use a projection format decision to stitch the 360-degree video into at least two different projection formats, and the encoding device may be configured to encode the stitched 360-degree video with signaling indicating the at least two different projection formats. In one or more implementations, the stitching device may be configured to stitch the 360-degree video with a plurality of views, and the rendering device may be configured to render the decoded bitstream using one or more suggested views.
In the present system, a system message with a default (recommended) viewing direction angle, FOV angle, and/or presentation picture size may be signaled with the 360-degree video content. A 360-degree video playback device (not shown) may keep the presentation picture size as is, but purposefully reduce the active presentation area to reduce presentation complexity and memory bandwidth requirements. The 360-degree video playback device may store the 360-degree video presentation settings (e.g., FOV angle, viewing direction angle, presentation picture size, etc.) just prior to playback termination or switching to another program channel, so that the stored presentation settings may be used when playback of the same channel resumes. A 360-degree video playback device may provide a preview mode in which the viewing angle is automatically changed every N frames to help the viewer select a desired viewing direction. The 360-degree video capture and playback device may calculate the projection map in real time (e.g., block by block) to save memory bandwidth; in this example, the projection map need not be loaded from off-chip memory. In the present system, different view fidelity information may be assigned to different views.
Fig. 1 illustrates an example network environment 100 in which 360degree video capture and playback may be implemented, in accordance with one or more implementations. However, not all of the depicted components may be used, and one or more implementations may include additional components not shown in the figures. Changes may be made in the arrangement and type of the components without departing from the spirit or scope of the claims as set forth herein. Additional components, different components, or fewer components may be provided.
The example network environment 100 includes a 360-degree video capture device 102, a 360-degree video stitching device 104, a video encoding device 106, a transmission link or storage medium, a video decoding device 108, and a 360-degree video presentation device 110. In one or more implementations, one or more of the devices 102, 104, 106, 108, 110 may be combined into the same physical device. For example, the 360-degree video capture device 102, the 360-degree video stitching device 104, and the video encoding device 106 may be combined into a single device, and the video decoding device 108 and the 360-degree video presentation device 110 may be combined into a single device. In some aspects, the network environment 100 may include a storage device 114 that stores the encoded 360-degree video (e.g., on a DVD, Blu-ray, the cloud, or a digital video recording at a gateway/set-top box, etc.) for later playback on a display device (e.g., 112).
Network environment 100 may further include a 360-degree video projection format conversion device (not shown) that may perform 360-degree video projection format conversion prior to video encoding by video encoding device 106 and/or after video decoding by video decoding device 108. The network environment 100 may also include a 360-degree video projection format conversion device (not shown) interposed between the video decoding device 108 and the 360-degree video presentation device 110. In one or more implementations, video encoding device 106 may be communicatively coupled to video decoding device 108 via a transmission link (e.g., over a network).
In the present system, the 360degree video stitching device 104 may utilize an additional coordinate system that provides more freedom on the 360degree video capture side when projecting the captured 360degree video to the 2D input picture coordinate system for storage or transmission. The 360-degree video stitching device 104 may also support multiple projection formats for 360-degree video storage, compression, transmission, decoding, presentation, and so on. For example, the video stitching device 104 may remove the overlapping region captured by the camera equipment and output, for example, six sequences of views each covering a 90 ° × 90 ° viewport. A 360-degree video projection format conversion device (not shown) may convert an input 360-degree video projection format (e.g., a cube projection format) to an output 360-degree video projection format (e.g., an equidistant columnar format).
Video encoding device 106 may minimize the spatial discontinuities (i.e., the number of face boundaries) in the composite picture for better spatial prediction, thereby optimizing compression efficiency in video compression. For example, for cube projection, the preferred layout should have a minimized number of face boundaries within the synthesized 360-degree video picture, e.g., 4. Video encoding device 106 may implement unrestricted motion compensation (UMC) to optimize compression efficiency.
In the present system, the 360-degree video presentation device 110 may derive the chroma projection map from the luma projection map. The 360-degree video presentation device 110 may also select the presentation picture size to maximize the displayed video quality. The 360-degree video presentation device 110 may also jointly select the horizontal FOV angle α and the vertical FOV angle β to minimize presentation distortion. The 360-degree video presentation device 110 may also control the FOV angles to enable real-time 360-degree video presentation subject to the available memory bandwidth budget.
In fig. 1, 360degree video is captured by camera equipment and stitched together into an equidistant columnar format. The video may then be compressed into any suitable video compression format (e.g., MPEG/ITU-T AVC/h.264, HEVC/h.265, VP9, etc.) and transmitted over a transmission link (e.g., cable, satellite, terrestrial, internet streaming, etc.). On the receiver side, the video is decoded (e.g., 108) and stored in an equidistant columnar format, then rendered (e.g., 110) according to a viewing direction angle and a field of view (FOV) angle, and displayed (e.g., 112). In the present system, the end user is able to control the FOV angle and viewing direction angle in order to view the video at a desired viewing angle.
Coordinate system
There are a number of coordinate systems suitable for use with the present techniques, including but not limited to:
● (x, y, z) - 3D 360-degree video capture (camera) coordinate system
● (x', y', z') - 3D 360-degree video viewing coordinate system
● (xp, yp) - 2D normalized projection coordinate system, where xp ∈ [0.0:1.0] and yp ∈ [0.0:1.0]
● (Xp, Yp) - 2D input picture coordinate system, where Xp ∈ [0:inputPicWidth-1] and Yp ∈ [0:inputPicHeight-1], and inputPicWidth × inputPicHeight is the input picture size of a color component (e.g., Y, U, or V)
● (xc, yc) - 2D normalized rendering coordinate system, where xc ∈ [0.0:1.0] and yc ∈ [0.0:1.0]
● (Xc, Yc) - 2D output presentation picture coordinate system, where Xc ∈ [0:renderingPicWidth-1] and Yc ∈ [0:renderingPicHeight-1], and renderingPicWidth × renderingPicHeight is the output presentation picture size of a color component (e.g., Y, U, or V)
● (xr, yr, zr) - 3D 360-degree video projection coordinate system
Fig. 2 conceptually illustrates an example of an equidistant columnar projection format 200. The equidistant columnar projection format 200 represents a standard way of mapping a sphere in computer graphics. It may also be referred to as equirectangular projection, equidistant cylindrical projection, geographic projection, plate carrée, or la carte parallélogrammatique. As shown in fig. 2, to project a sphere surface point p(x, y, z) (e.g., 202) to a sample p'(xp, yp) in the normalized projection coordinate system (e.g., 204), the longitude ω and latitude φ of p(x, y, z) are calculated according to equation 1:

ω = arctan2(x, z), φ = arcsin(y / √(x² + y² + z²))    (equation 1)

where ω ∈ [-π:π], φ ∈ [-π/2:π/2], and π is the ratio of a circle's circumference to its diameter, approximately 3.1415926.

The equidistant columnar projection format 200 may then be defined as in equation 2:

xp = ω/(2π) + 0.5, yp = 0.5 - φ/π    (equation 2)

where xp ∈ [0.0:1.0] and yp ∈ [0.0:1.0]. (xp, yp) are the coordinates in the normalized projection coordinate system.
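By way of illustration, the following C sketch (not part of the original disclosure) applies equations 1 and 2 to map a sphere surface point to normalized equidistant columnar coordinates; the axis convention (longitude measured from the +z axis toward +x, latitude positive toward +y) and the helper name sphere_to_erp are assumptions.

#include <math.h>

/* Map a point p(x, y, z) on the capture sphere (nonzero radius assumed) to
 * normalized equidistant columnar coordinates (xp, yp) per equations 1-2.
 * The axis convention is an assumed one, since the original drawing is not
 * reproduced in this text. */
static void sphere_to_erp(double x, double y, double z, double *xp, double *yp)
{
    const double PI = 3.14159265358979323846;
    double r   = sqrt(x * x + y * y + z * z);  /* radius of p */
    double w   = atan2(x, z);                  /* longitude in [-PI, PI]    (equation 1) */
    double phi = asin(y / r);                  /* latitude in [-PI/2, PI/2] (equation 1) */

    *xp = w / (2.0 * PI) + 0.5;                /* equation 2: xp in [0.0, 1.0] */
    *yp = 0.5 - phi / PI;                      /* equation 2: yp in [0.0, 1.0] */
}

Under this convention, the point (0, 0, 1) on the +z axis maps to the picture center (xp, yp) = (0.5, 0.5).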
Fig. 3 conceptually illustrates an example of an equidistant columnar projection layout 300 and a map of the earth. In the equidistant columnar projection layout 300, the picture has a 1:1 mapping only along the equator and is stretched elsewhere. The maximum mapping distortion occurs at the north and south poles of the sphere (e.g., 302), where a single point is mapped to an entire line of samples in the equidistant columnar projection picture (e.g., 304), resulting in a large amount of redundant data in the synthesized 360-degree video using the equidistant columnar projection layout 300.
Figure 4 conceptually illustrates an example of a 360-degree video in an equidistant columnar projection layout 400. To take advantage of existing infrastructure in which single-layer video codecs are employed for video transmission, the 360-degree video segments (e.g., 402) captured by multiple cameras at different angles are typically stitched and synthesized into a single video sequence stored in an equidistant columnar projection layout. As shown in fig. 4, in the equidistant columnar projection layout 400, the left, front, and right video segments of the 360-degree video are projected in the middle of the picture, the back video segment is split equally and placed on the left and right sides of the picture, and the up and down video segments are placed at the top and bottom of the picture (e.g., 404), respectively. All video segments are stretched, with the top and bottom video segments stretched the most. Figure 5 conceptually illustrates an example of a 360-degree image in an equidistant columnar projection layout.
Cubic projection
FIG. 6 conceptually illustrates an example definition of a six-sided cube 600. Another common projection format for storing 360-degree video is to project the video segments onto the faces of a cube. As shown in fig. 6, the six faces of the cube are named front, back, left, right, up, and down.
FIG. 7 conceptually illustrates an example of a cube projection format 700. In fig. 7, the cube projection format 700 maps a sphere surface point p(x, y, z) to one of the six cube faces (e.g., 702), where the cube face ID and the coordinates (xp, yp) in the normalized cube projection coordinate system are calculated (e.g., 704).
Fig. 8 conceptually illustrates an example of a 360degree video image in a cube projection layout 800. The projection rules for cube projection are described in table 1, where pseudo-code for mapping sphere surface points p (x, y, z) to cube faces is provided.
Table 1: pseudo code for cube projection mapping
if(z>0&&(-z≤y≤z)&&(-z≤x≤z))
Figure BDA0001433263990000071
else if(z<0&&(z≤y≤-z)&&(z≤x≤-z))
Figure BDA0001433263990000072
else if(x>0&&(-x≤y≤x)&&(-x≤z≤x))
Figure BDA0001433263990000073
else if(x<0&&(x≤y≤-x)&&(x≤z≤-x))
Figure BDA0001433263990000081
else if(y>0&&(-y≤x≤y)&&(-y≤z≤y))
Figure BDA0001433263990000082
else if(y<0&&(y≤x≤-y)&&(y≤z≤-y))
Figure BDA0001433263990000083
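The branch structure of Table 1 can be sketched in C as follows; the face naming, the enum values, and the in-face orientation of (xp, yp) are assumed conventions rather than the exact per-face formulas of the original table.

#include <math.h>

enum CubeFace { FACE_FRONT, FACE_BACK, FACE_RIGHT, FACE_LEFT, FACE_UP, FACE_DOWN };

/* Map a sphere surface point p(x, y, z) (nonzero) to a cube face ID and
 * normalized face coordinates (xp, yp) in [0.0, 1.0].  The dominant-axis
 * test is equivalent to the conditions in Table 1; the in-face orientation
 * below is an assumed convention. */
static int cube_project(double x, double y, double z, double *xp, double *yp)
{
    double ax = fabs(x), ay = fabs(y), az = fabs(z);
    int face;
    double u, v, m;

    if (az >= ax && az >= ay) {          /* +z or -z face dominates */
        face = (z > 0) ? FACE_FRONT : FACE_BACK;
        m = az;  u = (z > 0) ? x : -x;  v = y;
    } else if (ax >= ay) {               /* +x or -x face dominates */
        face = (x > 0) ? FACE_RIGHT : FACE_LEFT;
        m = ax;  u = (x > 0) ? -z : z;  v = y;
    } else {                             /* +y or -y face dominates */
        face = (y > 0) ? FACE_UP : FACE_DOWN;
        m = ay;  u = x;  v = (y > 0) ? -z : z;
    }
    *xp = 0.5 * (u / m + 1.0);           /* normalize from [-1, 1] to [0, 1] */
    *yp = 0.5 * (v / m + 1.0);
    return face;
}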
Field of view (FOV) and viewing direction angle
To display 360-degree video, portions of each 360-degree video picture need to be projected and presented. The field of view (FOV) angle defines how large a portion of the 360-degree video picture is displayed, while the viewing direction angle defines which portion of the 360-degree video picture is shown.
To display the 360-degree video, imagine that the video is mapped onto a single sphere surface and that a viewer sitting at the center point of the sphere views a rectangular screen whose four corners are positioned on the sphere surface. Here, (x', y', z') is referred to as the 360-degree video viewing coordinate system, and (xc, yc) is referred to as the normalized presentation coordinate system.
Fig. 9 conceptually illustrates an example of a normalized projection plane size 900 determined by the field of view angle. As shown in fig. 9, in the viewing coordinate system (x', y', z'), the center point of the projection plane (i.e., the rectangular screen) is positioned on the z' axis, and the plane is parallel to the x'y' plane. Thus, the projection plane size w × h and its distance d to the center of the sphere can be calculated by equation 3:

w = 2·d·tan(α/2), h = 2·d·tan(β/2)    (equation 3)

where d = 1 / √(1 + tan²(α/2) + tan²(β/2)), α ∈ (0:π] is the horizontal FOV angle, and β ∈ (0:π] is the vertical FOV angle.
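A minimal C helper for equation 3, assuming a unit-radius sphere (implied by the corners-on-the-sphere construction); the function name fov_to_plane is illustrative.

#include <math.h>

/* Equation 3: size w x h of the projection plane and its distance d to the
 * sphere center, for horizontal FOV alpha and vertical FOV beta (radians),
 * assuming a unit-radius sphere with the plane's four corners on the sphere. */
static void fov_to_plane(double alpha, double beta, double *w, double *h, double *d)
{
    double ta = tan(alpha * 0.5);
    double tb = tan(beta  * 0.5);

    *d = 1.0 / sqrt(1.0 + ta * ta + tb * tb);  /* corners satisfy (w/2)^2 + (h/2)^2 + d^2 = 1 */
    *w = 2.0 * (*d) * ta;
    *h = 2.0 * (*d) * tb;
}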
Fig. 10 conceptually illustrates an example of viewing direction angles 1000. The viewing direction is defined by the rotation angles of the 3D viewing coordinate system (x', y', z') with respect to the 3D capture coordinate system (x, y, z). As shown in fig. 10, the viewing direction is specified by a clockwise rotation angle θ (e.g., 1002, yaw) about the y-axis, a counterclockwise rotation angle γ (e.g., 1004, pitch) about the x-axis, and a counterclockwise rotation angle ε (e.g., 1006, roll) about the z-axis.
The coordinate mapping between the (x, y, z) and (x', y', z') coordinate systems is defined as:

(x, y, z)^T = R(ε, θ, γ) · (x', y', z')^T    (equation 4)

where R(ε, θ, γ) is the 3 × 3 rotation matrix obtained by composing the three viewing direction rotations defined above (ε about the z-axis, γ about the x-axis, and θ about the y-axis).
Fig. 11 illustrates a schematic diagram of a coordinate mapping 1100 between an output presentation picture and an input picture. Using the FOV and viewing direction angles defined above, a coordinate mapping can be established between the output presentation picture coordinate system (Xc, Yc) (i.e., the presentation picture for display) and the input picture coordinate system (Xp, Yp) (i.e., the input 360-degree video picture). As shown in fig. 11, for a given sample point (Xc, Yc) in the presentation picture, the coordinates of the corresponding sample point (Xp, Yp) in the input picture can be derived by the following steps:
● Calculate the normalized projection plane size and its distance to the center of the sphere based on the FOV angles (α, β) (i.e., equation 3); compute the coordinate transformation matrix between the viewing and capture coordinate systems based on the viewing direction angles (ε, θ, γ) (i.e., equation 4).
● Normalize (Xc, Yc) based on the presentation picture size and the normalized projection plane size.
● Map the normalized coordinates (xc, yc) in the presentation coordinate system to the 3D viewing coordinate system (x', y', z').
● Transform the coordinates to the 3D capture coordinate system (x, y, z).
● Derive the coordinates (xp, yp) in the normalized projection coordinate system.
● Convert the derived coordinates to integer positions in the input picture based on the input picture size and the projection layout format.
FIG. 12 conceptually illustrates an example of mapping a point p(xc, yc) in the normalized presentation coordinate system to a point p'(xp, yp) in the normalized projection coordinate system using an equidistant columnar projection format 1200.
In one or more implementations, projection from an equidistant columnar input format is performed. For example, if the input picture is in the equidistant columnar projection format, the following steps may be applied to map a sample point (Xc, Yc) in the picture to be presented to a sample point (Xp, Yp) in the input picture.
● Calculate the normalized display projection plane size based on the FOV angles (equation 3):

w = 2·d·tan(α/2), h = 2·d·tan(β/2), where d = 1 / √(1 + tan²(α/2) + tan²(β/2)).

● Map (Xc, Yc) into the normalized presentation coordinate system:

xc = Xc / renderingPicWidth, yc = Yc / renderingPicHeight.

● Calculate the coordinates of p(xc, yc) in the (x', y', z') viewing coordinate system:

x' = (xc - 0.5)·w, y' = (0.5 - yc)·h, z' = d.

● Convert the coordinates (x', y', z') into the (x, y, z) capture coordinate system based on the viewing direction angles (equation 4).

● Project p(x, y, z) onto the normalized projection coordinate system p'(xp, yp) using equations 1 and 2.

● Map p'(xp, yp) onto the input picture (equidistant columnar) coordinate system (Xp, Yp):

Xp = min(⌊xp · inputPicWidth⌋, inputPicWidth - 1), Yp = min(⌊yp · inputPicHeight⌋, inputPicHeight - 1).

Wherein:
● α, β are the FOV angles, and ε, θ, γ are the viewing direction angles.
● renderingPicWidth × renderingPicHeight is the presentation picture size.
● inputPicWidth × inputPicHeight is the input picture size (in the equidistant columnar projection format).
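The six steps above can be combined into a single mapping routine. The following C sketch assumes the rotation matrix R has already been built from the viewing direction angles per equation 4; the sub-formulas it uses are the reconstructions given above, not the exact expressions of the original drawings, and the function name render_to_erp is illustrative.

#include <math.h>

/* Map a presentation-picture sample (Xc, Yc) to the nearest input-picture
 * sample (Xp, Yp) for an equidistant columnar input.  R[3][3] rotates
 * viewing coordinates (x', y', z') into capture coordinates (x, y, z);
 * building it from (epsilon, theta, gamma) per equation 4 is not shown. */
static void render_to_erp(double Xc, double Yc,
                          int renderingPicWidth, int renderingPicHeight,
                          int inputPicWidth, int inputPicHeight,
                          double alpha, double beta, const double R[3][3],
                          int *Xp, int *Yp)
{
    const double PI = 3.14159265358979323846;

    /* Step 1: normalized projection plane size and distance (equation 3). */
    double ta = tan(alpha * 0.5), tb = tan(beta * 0.5);
    double d  = 1.0 / sqrt(1.0 + ta * ta + tb * tb);
    double w  = 2.0 * d * ta, h = 2.0 * d * tb;

    /* Step 2: normalize (Xc, Yc) into the presentation coordinate system. */
    double xc = Xc / renderingPicWidth;
    double yc = Yc / renderingPicHeight;

    /* Step 3: point on the projection plane in viewing coordinates. */
    double vx = (xc - 0.5) * w;
    double vy = (0.5 - yc) * h;
    double vz = d;

    /* Step 4: rotate into the capture coordinate system (equation 4). */
    double x = R[0][0] * vx + R[0][1] * vy + R[0][2] * vz;
    double y = R[1][0] * vx + R[1][1] * vy + R[1][2] * vz;
    double z = R[2][0] * vx + R[2][1] * vy + R[2][2] * vz;

    /* Step 5: project to normalized ERP coordinates (equations 1 and 2). */
    double r   = sqrt(x * x + y * y + z * z);
    double lon = atan2(x, z);
    double lat = asin(y / r);
    double xp  = lon / (2.0 * PI) + 0.5;
    double yp  = 0.5 - lat / PI;

    /* Step 6: denormalize to integer input-picture coordinates. */
    *Xp = (int)(xp * inputPicWidth);
    *Yp = (int)(yp * inputPicHeight);
    if (*Xp > inputPicWidth  - 1) *Xp = inputPicWidth  - 1;
    if (*Yp > inputPicHeight - 1) *Yp = inputPicHeight - 1;
}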
FIG. 13 conceptually illustrates an example of mapping a point p(xc, yc) in the normalized presentation coordinate system to a point p'(xp, yp) in the normalized projection coordinate system using a cube projection format 1300.
In one or more implementations, projection from a cube input format is performed. For example, if the input picture is in the cube projection format, the following similar steps apply to map a sample point (Xc, Yc) in the picture to be presented to a sample point (Xp, Yp) in the input picture.

● Calculate the normalized display projection plane size based on the FOV angles (equation 3), as in the equidistant columnar case.

● Map (Xc, Yc) into the normalized presentation coordinate system:

xc = Xc / renderingPicWidth, yc = Yc / renderingPicHeight.

● Calculate the coordinates of p(xc, yc) in the (x', y', z') viewing coordinate system:

x' = (xc - 0.5)·w, y' = (0.5 - yc)·h, z' = d.

● Convert the coordinates (x', y', z') into the (x, y, z) capture coordinate system based on the viewing direction angles (equation 4).

● Project p(x, y, z) onto a cube face, obtaining the face ID and the normalized cube coordinates p'(xp, yp), based on the pseudo code defined in Table 1.

● Map p'(xp, yp) onto the input cube coordinate system (Xp, Yp) (assuming all cube faces have the same resolution faceWidth × faceHeight):

Xp = Xoffset[faceID] + min(⌊xp · faceWidth⌋, faceWidth - 1), Yp = Yoffset[faceID] + min(⌊yp · faceHeight⌋, faceHeight - 1).

Wherein:
● α, β are the FOV angles, and ε, θ, γ are the viewing direction angles.
● renderingPicWidth × renderingPicHeight is the presentation picture size.
● inputPicWidth × inputPicHeight is the input picture size (in the cube projection format).
● { (Xoffset[faceID], Yoffset[faceID]) | faceID = front, back, left, right, up, down } are the coordinate offsets of the cube faces in the input cube projection coordinate system.

For the cube projection layout depicted in FIG. 13, the face IDs access the coordinate offset arrays in the following order: front, back, left, right, up, followed by down.
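A small C helper for the last step above, applying the per-face coordinate offsets; the offsets depend on the signaled layout, so they are passed in rather than hard-coded, and the function and parameter names are illustrative.

/* Map a cube-face sample (faceID, xp, yp), with xp, yp in [0.0, 1.0], to
 * input-picture coordinates (Xp, Yp) using per-face pixel offsets taken
 * from the layout.  All faces are assumed to share the resolution
 * faceW x faceH. */
static void face_to_input(int faceID, double xp, double yp,
                          int faceW, int faceH,
                          const int Xoffset[6], const int Yoffset[6],
                          int *Xp, int *Yp)
{
    int fx = (int)(xp * faceW);
    int fy = (int)(yp * faceH);
    if (fx > faceW - 1) fx = faceW - 1;
    if (fy > faceH - 1) fy = faceH - 1;

    *Xp = Xoffset[faceID] + fx;   /* offsets indexed in face-ID order (front, back, left, right, up, down) */
    *Yp = Yoffset[faceID] + fy;
}

For example, with hypothetical 512 × 512 faces arranged in a 3 × 2 grid, Xoffset = {0, 512, 1024, 0, 512, 1024} and Yoffset = {0, 0, 0, 512, 512, 512} would be one possible offset table.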
Sample presentation for display
In a 360-degree video projection display, multiple samples in the input 360-degree video picture (e.g., in the equidistant columnar or cube projection format) may be projected to the same integer position (Xc, Yc) in the presentation picture. For a smooth presentation, not only the integer pixel positions but also their sub-pixel positions in the presentation picture are projected to find the corresponding samples in the input picture.
Figure 14 conceptually illustrates an example of a two-dimensional layout 1400 of samples of the input 360-degree video picture projected for 360-degree video presentation. As shown in fig. 14, if the projection accuracy is 1/n sub-pixel in the horizontal direction and 1/m sub-pixel in the vertical direction, then the sample value of the presentation picture at location (Xc, Yc) may be rendered by averaging the n × m sub-pixel projections:

renderingImg[Xc, Yc] = ( Σ_{i=0..m-1} Σ_{j=0..n-1} inputImg[mapping_func(Xc + j/n, Yc + i/m)] ) / (m·n)

where:
● (Xp, Yp) = mapping_func(Xc, Yc) is the coordinate mapping function from the presentation picture to the input 360-degree video picture (e.g., with the equidistant columnar or cube projection format) as defined in the sections above.
● inputImg[Xp, Yp] is the sample value at position (Xp, Yp) in the input picture.
● renderingImg[Xc, Yc] is the sample value at position (Xc, Yc) in the output presentation picture.
Instead of computing the coordinate mapping between the output presentation coordinate system (Xc, Yc) and the input picture coordinate system (Xp, Yp) in real time, the mapping can also be pre-computed and stored as a projection map for the whole presentation picture. Since the viewing direction and FOV angles may not change from picture to picture, the pre-computed projection map may be shared when rendering multiple pictures.

Suppose projectMap[n*Xc + j, m*Yc + i] is a pre-computed projection map, where Xc = 0, 1, …, renderingPicWidth - 1, Yc = 0, 1, …, renderingPicHeight - 1, j = 0, 1, …, n - 1, and i = 0, 1, …, m - 1. For the sub-pixel position (Xc + j/n, Yc + i/m) in the presentation picture, each entry of the projection map stores the pre-computed coordinate value (Xp, Yp) in the input picture coordinate system. The presentation can then be written as:

renderingImg[Xc, Yc] = ( Σ_{i=0..m-1} Σ_{j=0..n-1} inputImg[projectMap[n*Xc + j, m*Yc + i]] ) / (m·n)
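The rendering equation above can be sketched as a straightforward C loop; the 8-bit single-component frame layout and the mapping_func callback signature are assumptions made for illustration.

#include <stdint.h>

/* Coordinate-mapping callback: presentation-picture position (possibly at
 * sub-pixel accuracy) -> input-picture integer position. */
typedef void (*mapping_func_t)(double Xc, double Yc, int *Xp, int *Yp);

/* Render one color component by averaging the n x m sub-pixel projections
 * of every output position, as in the rendering equation above.
 * Frames are assumed to be 8-bit, one sample per position, row-major. */
static void render_component(const uint8_t *inputImg, int inputPicWidth,
                             uint8_t *renderingImg, int renderingPicWidth,
                             int renderingPicHeight,
                             int n, int m, mapping_func_t mapping_func)
{
    for (int Yc = 0; Yc < renderingPicHeight; Yc++) {
        for (int Xc = 0; Xc < renderingPicWidth; Xc++) {
            int sum = 0;
            for (int i = 0; i < m; i++) {          /* vertical sub-pixel index */
                for (int j = 0; j < n; j++) {      /* horizontal sub-pixel index */
                    int Xp, Yp;
                    mapping_func(Xc + (double)j / n, Yc + (double)i / m, &Xp, &Yp);
                    sum += inputImg[Yp * inputPicWidth + Xp];
                }
            }
            /* rounded average over the n x m sub-pixel samples */
            renderingImg[Yc * renderingPicWidth + Xc] = (uint8_t)((sum + (n * m) / 2) / (n * m));
        }
    }
}

When the viewing direction and FOV angles are unchanged, the call to mapping_func can be replaced by a lookup into the pre-computed projectMap described above.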
pictures may have multiple color components, such as YUV, YCbCr, RGB. The above rendering process may be applied independently to the color components.
Figure 15 conceptually illustrates an example of global rotation angles 1500 between the capture coordinate system and the 360-degree video projection coordinate system. Instead of mapping the 360-degree video directly from the 3D 360-degree video capture coordinate system (x, y, z) to the input picture coordinate system (Xp, Yp) during 360-degree video capture and stitching, an additional coordinate system named the 3D 360-degree video projection coordinate system (xr, yr, zr) is introduced. The relationship between (xr, yr, zr) and (x, y, z) is specified by counterclockwise global rotation angles about the z, y, and x axes, as shown in fig. 15. The coordinate transformation between the two systems is defined in equation 8 as the composition of these three axis rotations. In one or more implementations, equation 8 may be rewritten as equation 9, which expresses (xr, yr, zr) as the product of a single 3 × 3 rotation matrix and (x, y, z).
Figure 16 conceptually illustrates an example of an alternative 360-degree video projection 1600 in an equidistant columnar projection format. When projecting the captured 360-degree video to the 2D input picture coordinate system for storage or transmission, the additional coordinate system (xr, yr, zr) provides more degrees of freedom on the 360-degree video capture side. Taking the equidistant columnar projection format as an example, it may sometimes be desirable to project the front and back views to the south and north poles, as shown in fig. 16, rather than the top and bottom views as shown in fig. 4. Because data around the south and north poles of the sphere is better preserved in the equidistant columnar projection, such alternative layouts (such as the layout shown in fig. 16) can potentially provide benefits, such as better compression efficiency, under certain circumstances.
Since the 360-degree video data is now converted to the input picture coordinate system from the 3D coordinate system (xr, yr, zr) rather than from (x, y, z), an additional step of converting the coordinates (x, y, z) to (xr, yr, zr) based on equation 9 is needed in fig. 2 and fig. 7 before projecting the 360-degree video data to the input picture coordinate system. Accordingly, (xr, yr, zr) is applied in place of (x, y, z) in equation 1, equation 5, and table 1.
Fig. 17 illustrates a schematic diagram of an example of the coordinate mapping process modified with the 360-degree video projection coordinate system. The coordinate mapping from the output presentation picture coordinate system (Xc, Yc) to the input picture coordinate system (Xp, Yp) is modified accordingly: the coordinate transformation matrix is calculated by considering both the viewing direction angles (ε, θ, γ) and the global rotation angles. By cascading equation 9 with equation 4, the coordinates can be converted directly from the viewing coordinate system (x', y', z') to the 3D 360-degree video projection coordinate system (xr, yr, zr):

(xr, yr, zr)^T = R(global) · R(ε, θ, γ) · (x', y', z')^T

where R(global) is the rotation matrix of equation 9 and R(ε, θ, γ) is the rotation matrix of equation 4. In one or more implementations, this 3 × 3 coordinate transformation matrix may be pre-computed using the equations above.
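Since the cascade of equation 9 with equation 4 is just a product of two 3 × 3 rotation matrices, it can be pre-computed once per picture. A minimal C sketch, assuming the two matrices have already been built from the global rotation angles and the viewing direction angles (the function name combine_rotations is illustrative):

/* Pre-compute the combined 3x3 transform that takes viewing coordinates
 * (x', y', z') directly to 360-degree video projection coordinates
 * (xr, yr, zr): Rcombined = Rglobal * Rview.  Rglobal comes from the global
 * rotation angles (equation 9) and Rview from the viewing direction angles
 * (equation 4); building those matrices from the angles is not shown here. */
static void combine_rotations(const double Rglobal[3][3],
                              const double Rview[3][3],
                              double Rcombined[3][3])
{
    for (int r = 0; r < 3; r++) {
        for (int c = 0; c < 3; c++) {
            Rcombined[r][c] = Rglobal[r][0] * Rview[0][c]
                            + Rglobal[r][1] * Rview[1][c]
                            + Rglobal[r][2] * Rview[2][c];
        }
    }
}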
In fig. 17, the global rotation angles may be signaled in the system by any suitable means, for example in SEI (supplemental enhancement information) messages carried in the video elementary bitstream, such as AVC SEI or HEVC SEI messages. Further, a default viewing direction may also be specified and carried.
Fig. 18 illustrates a schematic diagram of an example of a 360-degree video capture and playback system 1800 using a six-view layout format. Instead of compositing the 360-degree video into a single view sequence (such as the equidistant columnar format used in fig. 1), a 360-degree video capture and playback system may support multiple view layout formats for 360-degree video storage, compression, transmission, decoding, presentation, and so on. As shown in fig. 18, the 360-degree video stitcher (e.g., 104) may simply remove the overlapping regions captured by the camera rig and output, for example, six view sequences (i.e., up, down, front, back, left, right), each covering a 90° × 90° viewport. The sequences may or may not have the same resolution, but are separately compressed (e.g., 1802) and decompressed (e.g., 1804). A 360-degree video presentation engine (e.g., 110) may take the multiple sequences as input and present them for display with the help of the presentation controls.
Fig. 19 illustrates a schematic diagram of an example of a 360-degree video capture and playback system using a two-view layout format. In fig. 19, the 360-degree video stitcher (e.g., 104) may generate, for example, two view sequences, one covering, for example, the upper and lower 360° × 30° of data, and the other covering the front, back, left, and right 360° × 120° of data. The two sequences may have different resolutions and different projection layout formats. For example, the front + back + left + right view may use the equidistant columnar projection format, while the up + down view may use another projection format. In this particular example, two encoders (e.g., 1902) and decoders (e.g., 1904) are used. A 360-degree video presentation engine (e.g., 110) may take the two sequences as input and present them for display.
Fig. 20 illustrates a schematic diagram of an example of a 360-degree video capture and playback system 2000 using a two-view layout format, with one view sequence used for presentation. The multi-view layout format may also provide scalable presentation functionality in a 360-degree video presentation. As shown in fig. 20, the 360-degree video presentation engine may choose not to present the upper or lower 30° portions of the video (e.g., 2002, 2004), due to limited capabilities of the presentation engine (110), or if bitstream packets of the up + down view are lost during transmission.
Even if the 360-degree video stitcher (e.g., 104) generates multiple view sequences, a single video encoder and decoder may still be used in the 360-degree video capture and playback system 2000. For example, if the output of the 360-degree video stitcher (e.g., 104) is six 720p@30 view sequences (i.e., 720p sequences at 30 frames/sec), the output can be synthesized into a single 720p@180 sequence (i.e., a 720p sequence at 180 frames/sec) and compressed/decompressed using a single codec. Alternatively, for example, the six separate sequences may be compressed/decompressed by a single video encoder and decoder instance by time-sharing the available processing resources of that instance, without the need to synthesize them into a single combined sequence.
FIG. 21 conceptually illustrates an example of multiple cube projection format layouts: (a) the original cube projection layout; (b) an undesirable cube projection layout; and (c) an example of a preferred cube projection layout. Conventional video compression techniques exploit spatial correlation within a picture. For cube projection, for example, if the projected sequences of the up, down, front, back, left, and right cube faces are synthesized into a single view sequence, different cube face layouts may result in different compression efficiencies, even for the same 360-degree video data. In one or more implementations, the original cube layout (a) does not create discontinuous face boundaries within the composite picture, but it has a larger picture size than the other two and carries dummy data in the gray regions. In one or more implementations, layout (b) is one of the undesirable layouts because it has the largest number of discontinuous face boundaries (i.e., 7) within the composite picture. In one or more implementations, layout (c) is one of the preferred cube projection layouts, as it has the smallest number of discontinuous face boundaries (i.e., 4). The face boundaries in layout (c) of fig. 21 are the horizontal face boundaries between the left and up faces and between the right and back faces, and the vertical face boundaries between the up and down faces and between the back and down faces. In some aspects, experimental results reveal that layout (c) outperforms both layouts (a) and (b) by approximately 3% on average.
Thus, for cube projection or other multi-face projection formats, the layout should in general minimize the spatial discontinuities (i.e., the number of discontinuous face boundaries) in the composite picture for better spatial prediction and, thus, optimized compression efficiency in video compression. For cube projection, for example, the preferred layout has a minimized number of discontinuous face boundaries, e.g., 4, within the composite 360-degree video picture. For the cube projection format, further minimization of spatial discontinuities can be achieved by introducing face rotations (i.e., by 90, 180, or 270 degrees) in the layout. In one or more aspects, the layout information of the incoming 360-degree video is signaled in a high-level video system message for compression and presentation.
Fig. 22 conceptually illustrates an example of unrestricted motion compensation 2200. Unrestricted motion compensation (UMC) is a technique commonly used in video compression standards to optimize compression efficiency. As shown in fig. 22, in UMC the reference block of a prediction unit is allowed to exceed the picture boundary. For a reference pixel ref[Xp, Yp] outside the reference picture, the nearest picture boundary pixel is used. Here, the reference pixel coordinates (Xp, Yp) are determined by the location of the current prediction unit (PU) and the motion vector, and may exceed the picture boundary.
Suppose {refPic[Xcp, Ycp], Xcp = 0, 1, ..., inputPicWidth - 1; Ycp = 0, 1, ..., inputPicHeight - 1} is the reference picture sample matrix. Then UMC (reference block loading) is defined as:

refBlk[Xp, Yp] = refPic[clip3(0, inputPicWidth - 1, Xp), clip3(0, inputPicHeight - 1, Yp)]    (equation 10)

where the clipping function clip3(0, a, x) is defined as:

int clip3(0, a, x) { if (x < 0) return 0; else if (x > a) return a; else return x; }    (equation 11)
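Equations 10 and 11 can be sketched in C as follows; the 8-bit, row-major reference picture layout is an assumption, and the constant lower bound 0 of clip3(0, a, x) is folded into the helper.

#include <stdint.h>

/* Equation 11 with the lower bound fixed at 0: clip x into [0, a]. */
static int clip3(int a, int x)
{
    if (x < 0) return 0;
    if (x > a) return a;
    return x;
}

/* Equation 10: unrestricted motion compensation reference fetch.  When the
 * reference position falls outside the picture, the nearest picture
 * boundary pixel is used.  refPic is an 8-bit row-major sample matrix. */
static uint8_t umc_fetch(const uint8_t *refPic, int inputPicWidth,
                         int inputPicHeight, int Xp, int Yp)
{
    int Xcp = clip3(inputPicWidth  - 1, Xp);
    int Ycp = clip3(inputPicHeight - 1, Yp);
    return refPic[Ycp * inputPicWidth + Xcp];
}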
Fig. 23 conceptually illustrates an example of a multiple 360degree video projection format layout 2300. In some aspects, the 360degree video projection format layout 2300 includes (but is not limited to): (a) equidistant columnar projection layout; and (b) cube projection original layout. However, in 360degree video, the left and right picture boundaries may belong to the same camera view. Likewise, the upper and lower picture boundary lines may be close to each other in physical space, although they are far apart in a 360degree video picture layout. Fig. 23 depicts two examples. In layout example (a), both left and right picture boundaries belong to the back view. In layout example (b), although the left and right picture boundaries belong to different views (i.e., left and back), those two picture boundary columns are actually physically close to each other during video capture. Thus, to optimize the compression efficiency of 360degree video, it makes sense to allow reference blocks to be loaded in a "wrap-around" fashion when the reference pixels are outside the picture boundaries rather than being filled with the nearest picture boundary pixels defined in the current UMC.
In one or more implementations, the following high level syntax may enable extended UMC.
Table 2: Extended UMC syntax

[The syntax table is not reproduced here; per the description below, it signals whether reference block wrap-around is enabled and the picture size difference (Δw, Δh) between the captured picture and the coded picture.]
It should be noted that the picture size difference (Δw, Δh) needs to be signaled because the coded picture size inputPicWidth × inputPicHeight typically has to be a multiple of 8 or 16 in both directions, while the captured picture size may not be, which may cause the picture size to differ between the captured picture and the coded picture. Reference block wrap-around is performed along the picture boundaries of the captured picture rather than the coded picture.
In one or more implementations, the high level semantics of table 2 may be signaled in the sequence header or the picture header, or both, depending on the implementation.
Using the semantics defined above, extended UMC can be defined as:

refBlk[Xp, Yp] = refPic[Xcp, Ycp]    (equation 12)

where (Xcp, Ycp) is calculated as:

Xcp = wraparound3(0, inputPicWidth - Δw, Xp) when horizontal wrap-around is enabled, and Xcp = clip3(0, inputPicWidth - 1, Xp) otherwise;
Ycp = wraparound3(0, inputPicHeight - Δh, Yp) when vertical wrap-around is enabled, and Ycp = clip3(0, inputPicHeight - 1, Yp) otherwise;
where clip3() is defined in equation 11, and wraparound3(0, a, x) is defined as:

int wraparound3(0, a, x) { while (x < 0) x += a; while (x > a) x -= a; return x; }    (equation 13)
In current video compression standards, motion vectors may exceed picture boundaries by large margins; hence the "while" loops in equation 13. To avoid the "while" loops, next-generation video compression standards may limit how far a reference pixel (for motion compensation) may exceed a picture boundary (e.g., up to 64, 128, or 256 pixels, depending on the maximum coding block size defined in the next-generation standard). After imposing this constraint, wraparound3(0, a, x) can be simplified as:

int wraparound3(0, a, x) { if (x < 0) x += a; if (x > a) x -= a; return x; }    (equation 14)
Fig. 24 conceptually illustrates an example of extended unconstrained motion compensation 2400. In fig. 24, reference block wrap-around is enabled in the horizontal direction. Instead of filling the portion of the reference block outside the left picture boundary with picture boundary pixels, the "wrapped-around" portion along the right (captured) picture boundary is loaded.
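A hedged C sketch of the extended UMC fetch of equations 12 to 14, assuming horizontal wrap-around along the captured picture width (inputPicWidth - Δw) while the vertical direction keeps boundary padding; the enable flag and Δw correspond to the Table 2 signaling, and wrap_coord() plays the role of wraparound3() with the wrap period equal to the captured width. All names are illustrative.

#include <stdint.h>

/* Wrap x into [0, period-1]; corresponds to wraparound3() in equations 13/14
 * under the assumption that the overshoot is at most one picture width. */
static int wrap_coord(int period, int x)
{
    if (x < 0)       x += period;
    if (x >= period) x -= period;
    return x;
}

/* clip3 as in equation 11 (lower bound 0 folded in). */
static int clip3u(int a, int x)
{
    if (x < 0) return 0;
    if (x > a) return a;
    return x;
}

/* Extended UMC reference fetch (equation 12), sketched with horizontal
 * wrap-around along the captured picture width and vertical boundary
 * padding.  refPic is an 8-bit row-major sample matrix of the coded
 * picture; dw is the signaled width difference between coded and captured
 * pictures. */
static uint8_t extended_umc_fetch(const uint8_t *refPic,
                                  int inputPicWidth, int inputPicHeight,
                                  int dw, int hor_wrap_enabled,
                                  int Xp, int Yp)
{
    int capturedWidth = inputPicWidth - dw;
    int Xcp = hor_wrap_enabled ? wrap_coord(capturedWidth, Xp)
                               : clip3u(inputPicWidth - 1, Xp);
    int Ycp = clip3u(inputPicHeight - 1, Yp);
    return refPic[Ycp * inputPicWidth + Xcp];
}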
Adaptive projection format
Fig. 25 illustrates an example network environment 2500 in which 360degree video capture and playback may be implemented in accordance with one or more implementations. However, not all of the depicted components may be used, and one or more implementations may include additional components not shown in the figures. Changes may be made in the arrangement and type of the components without departing from the spirit or scope of the claims as set forth herein. Additional components, different components, or fewer components may be provided.
Example network environment 2500 includes a 360-degree video capture device 2502, a 360-degree video stitching device 2504, a video encoding device 2506, a video decoding device 2508, and a 360-degree video presentation device 2510. In one or more implementations, one or more of the devices 2502, 2504, 2506, 2508, 2510 may be combined into the same physical device. For example, the 360-degree video capture device 2502, the 360-degree video stitching device 2504, and the video encoding device 2506 may be combined into a single device, and the video decoding device 2508 and the 360-degree video presentation device 2510 may be combined into a single device.
Network environment 2500 may further include a 360-degree video projection format decision device (not shown) that may perform projection format selection before video stitching by the video stitching device 2504 and/or after video encoding by the video encoding device 2506. Network environment 2500 may also include a 360-degree video playback device (not shown) that plays back the presented 360-degree video content. In one or more implementations, video encoding device 2506 may be communicatively coupled to video decoding device 2508 via a transmission link (e.g., over a network).
In the present system, the 360-degree video stitching device 2504 may utilize a 360-degree video projection format decision device (not shown) on the 360-degree video capture/compression side to decide which projection format (e.g., ERP (equidistant columnar projection), CMP (cube projection), ISP (icosahedron projection), etc.) best fits the current video segment (i.e., group of pictures) or current picture to achieve the best possible compression efficiency. The decision may be made based on encoding statistics provided by video encoding device 2506 (e.g., the distribution of bit rates and intra/inter modes across segments or pictures, video quality measurements, etc.) and/or raw data statistics obtained from the 360-degree video capture device 2502 regarding the raw 360-degree video camera data (e.g., the distribution of raw data spatial activity, etc.). Once the projection format is selected for the current segment or picture, the 360-degree video stitching device 2504 stitches the video into the selected projection format and transmits the stitched 360-degree video to the video encoding device 2506 for compression.
In the present system, the selected projection format and associated projection format parameters (e.g., projection format ID, number of faces in the projection layout, face size, face coordinate offsets, face rotation angles, etc.) may be signaled from video encoding device 2506 in an appropriate manner in the compressed 360-degree video bitstream, such as in a supplemental enhancement information (SEI) message, in a sequence header, in a picture header, and so forth. Rather than stitching the 360-degree video into a single, fixed projection format (e.g., ERP), the 360-degree video stitching device 2504 may stitch the 360-degree video into different projection formats selected by the 360-degree video projection format decision device.
In the present system, the 360-degree video stitching device 2504 may utilize an additional coordinate system that provides more freedom on the 360-degree video capture side when projecting the captured 360-degree video to the 2D input picture coordinate system for storage or transmission. The 360-degree video stitching device 2504 may also support multiple projection formats for 360-degree video storage, compression, transmission, decoding, presentation, and so on. For example, video stitching device 2504 may remove the overlapping regions captured by the camera rig and output, for example, six view sequences, each covering a 90° × 90° viewport.
Video encoding device 2506 may minimize spatial discontinuities (i.e., the number of discontinuous face boundaries) in the composite picture for better spatial prediction, optimizing compression efficiency in video compression. For cube map projection, for example, the preferred layout should have a minimized number of discontinuous face boundaries within the synthesized 360degree video picture, e.g., four. Video encoding device 2506 may implement unrestricted motion compensation (UMC) to optimize compression efficiency.
On the 360-degree video playback side, a 360-degree video playback device (not shown) may receive the compressed 360-degree video bitstream and decompress the 360-degree video bitstream. Rather than presenting video in a single and fixed projection format (e.g., ERP), the 360-degree video playback device may present 360-degree video in different projection formats signaled in a 360-degree bitstream. In this regard, the 360degree video presentation is controlled not only by the viewing direction and FOV angle, but also by the projection format information decoded from the compressed 360degree video bitstream.
In the present system, a system message carrying a default (recommended) viewing direction (i.e., viewing direction angles), FOV angle, and/or presentation picture size may be signaled with the 360degree video content. A 360degree video playback device may keep the presentation picture size as is, but purposefully reduce the active presentation area to reduce presentation complexity and memory bandwidth requirements. The 360degree video playback device may store the 360degree video presentation settings (e.g., FOV angle, viewing direction angle, presentation picture size, etc.) just prior to playback termination or switching to another program channel so that the stored presentation settings may be used when playback of the same channel resumes. A 360degree video playback device may provide a preview mode in which the viewing angle is automatically changed every N frames to help the viewer select a desired viewing direction. The 360degree video capture and playback device may compute the projection mapping in real time (e.g., block by block) to conserve memory bandwidth. In this example, the projection mapping may not need to be loaded from off-chip memory. In the present system, different view fidelity information may be assigned to different views.
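By way of illustration, the per-channel persistence of presentation settings described above could be realized with a simple key-value store. The following is only a sketch of how such a store might look; the class names, field names, and JSON file are hypothetical and are not taken from this disclosure.

```python
# Hypothetical sketch of per-channel presentation-setting persistence; names
# and the storage format are illustrative assumptions, not from the patent.
import json
from dataclasses import dataclass, asdict

@dataclass
class PresentationSettings:
    fov_deg: float = 90.0          # field-of-view angle
    yaw_deg: float = 0.0           # viewing direction (horizontal)
    pitch_deg: float = 0.0         # viewing direction (vertical)
    roll_deg: float = 0.0          # viewing direction (about the z-axis)
    picture_width: int = 1920      # presentation picture size
    picture_height: int = 1080

class SettingsStore:
    """Keeps the last presentation settings per channel so that playback of
    the same channel can resume with the previously used view."""
    def __init__(self, path="view_settings.json"):
        self.path = path
        try:
            with open(path) as f:
                self._db = json.load(f)
        except FileNotFoundError:
            self._db = {}

    def save(self, channel_id, settings):
        self._db[str(channel_id)] = asdict(settings)
        with open(self.path, "w") as f:
            json.dump(self._db, f)

    def load(self, channel_id):
        stored = self._db.get(str(channel_id))
        return PresentationSettings(**stored) if stored else PresentationSettings()
```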
In fig. 25, the video is captured by the camera rig and stitched together into an equirectangular format. The video may then be compressed into any suitable video compression format (e.g., MPEG/ITU-T AVC/h.264, HEVC/h.265, VP9, etc.) and transmitted over a transmission link (e.g., cable, satellite, terrestrial, internet streaming, etc.). On the receive side, the video is decoded and stored in the equirectangular format, then rendered according to a viewing direction angle and a field of view (FOV) angle, and displayed. In this system, the end user is able to control the FOV and viewing direction angle in order to view a desired portion of the 360degree video at a desired viewing angle.
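The rendering step just described maps each output viewport pixel back onto the equirectangular picture. The following is a minimal sketch of that mapping, assuming a pinhole viewport and nearest-neighbor sampling; the function name, the yaw/pitch rotation order, and the ERP pixel conventions are illustrative assumptions rather than the method defined by this disclosure.

```python
# A minimal sketch (not the patent's implementation) of rendering a viewport
# from an equirectangular (ERP) picture given a viewing direction and FOV.
# Nearest-neighbor sampling is used for brevity; a real renderer interpolates.
import numpy as np

def render_erp_viewport(erp, yaw_deg, pitch_deg, fov_deg, out_w, out_h):
    src_h, src_w = erp.shape[:2]
    # Normalized projection plane: tan(FOV/2) spans half of the plane.
    half = np.tan(np.radians(fov_deg) / 2.0)
    xs = np.linspace(-half, half, out_w)
    ys = np.linspace(-half * out_h / out_w, half * out_h / out_w, out_h)
    px, py = np.meshgrid(xs, ys)
    # Rays in the viewing coordinate system (camera looks along +z).
    dirs = np.stack([px, -py, np.ones_like(px)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Rotate rays by pitch (about x) then yaw (about y) into the capture frame.
    cp, sp = np.cos(np.radians(pitch_deg)), np.sin(np.radians(pitch_deg))
    cy, sy = np.cos(np.radians(yaw_deg)), np.sin(np.radians(yaw_deg))
    rot_x = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    dirs = dirs @ (rot_y @ rot_x).T
    # Convert rays to longitude/latitude, then to ERP pixel coordinates.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])          # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))         # [-pi/2, pi/2]
    u = ((lon / (2 * np.pi) + 0.5) * src_w).astype(int) % src_w
    v = np.clip(((0.5 - lat / np.pi) * src_h).astype(int), 0, src_h - 1)
    return erp[v, u]

# Example: a 90-degree FOV viewport looking 30 degrees to the right.
erp_frame = np.zeros((1024, 2048, 3), dtype=np.uint8)
view = render_erp_viewport(erp_frame, yaw_deg=30, pitch_deg=0, fov_deg=90,
                           out_w=640, out_h=480)
```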
Referring back to fig. 2, to utilize the existing infrastructure in which single-layer video codecs are used for video transmission, 360-degree video segments captured by multiple cameras at different angles are typically stitched and synthesized into a single video sequence stored in some projection format, such as the widely used equirectangular projection format.
In addition to the equirectangular projection (ERP) format, there are many other projection formats that can represent a 360degree video frame on a 2D rectangular image, such as cube map projection (CMP), icosahedron projection (ISP), and fisheye projection.
FIG. 26 conceptually illustrates an example of a cube map projection format. As shown in fig. 26, in CMP, a sphere surface (e.g., 2602) is projected onto six cube faces (i.e., upper, front, right, back, left, and lower) (e.g., 2604), each face covering a 90×90 degree field of view, and the six cube faces are composited into a single image. Fig. 27 conceptually illustrates an example of a fisheye projection format. In fisheye projection, a spherical surface (e.g., 2702) is projected onto two circles (e.g., 2704, 2706); each circle covers a 180×180 degree field of view. FIG. 28 conceptually illustrates an example of an icosahedron projection format. In the ISP, the sphere surface is mapped to a total of 20 triangles that are composited into a single image. Fig. 29 illustrates a schematic diagram of an example of a 360degree video 2900 in multiple projection formats. For example, layout (a) of fig. 29 depicts an ERP projection format (e.g., 2902), layout (b) of fig. 29 depicts a CMP projection format (e.g., 2904), layout (c) of fig. 29 depicts an ISP projection format (e.g., 2906), and layout (d) of fig. 29 depicts a fisheye projection format (e.g., 2908).
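To make the CMP mapping concrete, the following sketch classifies a unit direction on the sphere into one of the six cube faces and a normalized position within that face. The face naming and sign conventions used here are assumptions chosen for illustration; actual CMP layouts may orient the faces differently.

```python
# Illustrative sketch of how a direction vector on the sphere maps to one of
# the six CMP faces and a 2D coordinate within that face (assumed conventions).
def direction_to_cmp(d):
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:                 # right/left faces
        face = "right" if x > 0 else "left"
        u, v, m = (-z, -y, ax) if x > 0 else (z, -y, ax)
    elif ay >= ax and ay >= az:               # top/bottom faces
        face = "top" if y > 0 else "bottom"
        u, v, m = (x, z, ay) if y > 0 else (x, -z, ay)
    else:                                     # front/back faces
        face = "front" if z > 0 else "back"
        u, v, m = (x, -y, az) if z > 0 else (-x, -y, az)
    # Normalize to [0, 1] within the 90x90-degree face.
    return face, (u / m + 1) / 2, (v / m + 1) / 2

face, u, v = direction_to_cmp((0.5, 0.1, 0.8))   # a ray mostly toward "front"
```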
For the same 360degree video content, different projection formats may result in different compression efficiencies after the video is compressed with, for example, the MPEG/ITU AVC/h.264 or MPEG/ITU HEVC/h.265 video compression standards. Table 3 provides BD-rate differences between ERP and CMP for twelve 4K 360degree video test sequences. The PSNR-based BD-rate difference is calculated in the CMP domain, where negative numbers mean better compression efficiency using ERP and positive numbers mean better compression efficiency using CMP. Table 3 illustrates experimental results on the compression efficiency of ERP relative to CMP obtained with, for example, the MPEG/ITU HEVC/h.265 reference software HM16.9.
Table 3: ERP versus CMP compression efficiency
(The Table 3 data is provided as an image in the original publication.)
As shown in Table 3, while using ERP yields better compression efficiency overall, there are other cases (e.g., GT_Shriff-left, Timelap_base jump) where using CMP yields better results. Accordingly, it may be desirable to be able to change the projection format from time to time to best fit the characteristics of the 360degree video content in order to achieve the best possible compression efficiency.
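For reference, BD-rate numbers such as those in Table 3 follow the standard Bjontegaard calculation: a cubic fit of log bit rate against PSNR for each format, integrated over the overlapping PSNR range. The sketch below shows that standard calculation; it is not code from this disclosure, and the rate/PSNR points in the example are made-up values.

```python
# Compact sketch of the Bjontegaard delta-rate (BD-rate) metric. A negative
# output means the "test" format needs fewer bits than the "anchor" format
# for the same PSNR.
import numpy as np

def bd_rate(anchor_rates, anchor_psnrs, test_rates, test_psnrs):
    la, lt = np.log(anchor_rates), np.log(test_rates)
    pa = np.polyfit(anchor_psnrs, la, 3)      # log-rate as a cubic in PSNR
    pt = np.polyfit(test_psnrs, lt, 3)
    lo = max(min(anchor_psnrs), min(test_psnrs))
    hi = min(max(anchor_psnrs), max(test_psnrs))
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    it = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    avg_diff = (it - ia) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100.0     # percent rate difference

# Hypothetical rate (kbps) / PSNR (dB) points for four QPs of one sequence.
print(bd_rate([1000, 2000, 4000, 8000], [34.0, 36.5, 39.0, 41.5],
              [950, 1900, 3900, 7900], [34.1, 36.6, 39.1, 41.6]))
```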
Fig. 30 illustrates a schematic diagram of an example of a 360degree video capture and playback system 3000 with an adaptive projection format. To maximize the compression efficiency of 360degree video, an adaptive approach may be implemented that can compress 360degree video into a hybrid projection format. As shown in fig. 30, a projection format decision block (e.g., 3002) is employed on the 360degree video capture/compression side to decide which projection format (e.g., ERP, CMP, ISP, etc.) best suits the current video segment (i.e., group of pictures) or current picture to achieve the best possible compression efficiency. The decision may be made based on encoding statistics (e.g., distribution of bit rates, intra/inter modes across segments or pictures, video quality measurements, etc.) provided by a video encoder (e.g., 2506) and/or raw data statistics obtained about the raw 360degree video camera data (e.g., distribution of raw data spatial activity, etc.). Once the projection format of the current segment or picture is selected, a 360degree video stitching block (e.g., 2504) stitches the video into the selected projection format and transmits the stitched 360degree video to a video encoder (e.g., 2506) for compression.
The selected projection format and associated projection format parameters (e.g., projection format ID, number of faces in the projection layout, face size, face coordinate offset, face rotation angle, etc.) are signaled in the compressed bitstream in any suitable manner, such as in a supplemental enhancement information (SEI) message, in a sequence header, in a picture header, and so forth. Unlike the system illustrated in fig. 25, the 360degree video stitching block (e.g., 2504) is capable of stitching 360degree video into the different projection formats selected by the projection format decision block (e.g., 3002), rather than stitching video into a single and fixed projection format (e.g., ERP).
On the 360degree video playback side, a receiver (e.g., 2508) receives the compressed 360degree video bitstream and decompresses the video stream. Unlike the system illustrated in fig. 25, the 360-degree video rendering block (e.g., 2510) is capable of rendering 360-degree video in different projection formats signaled in the bitstream, rather than rendering video in a single and fixed projection format (e.g., ERP). That is, the 360degree video presentation is controlled not only by the viewing direction and FOV angle, but also by the projection format information decoded from the bitstream.
Fig. 31 illustrates a schematic diagram of another example of a 360degree video capture and playback system with adaptive projection formats across multiple channels from multiple sources. In FIG. 31, the 360degree video capture and playback system 3100 is capable of, for example, four-channel decoding and parallel rendering of 360degree video; the 360degree video inputs may come from different sources (e.g., 3102-1, 3102-2, 3102-3, 3102-4) in different projection formats (e.g., 3104-1, 3104-2, 3104-3, 3104-4) and compression formats (e.g., 3106-1, 3106-2, 3106-3, 3106-4), and at different fidelities (e.g., picture resolution, frame rate, bit rate, etc.).
In one or more implementations, channel 0 may be live video in an adaptive projection format and compressed with HEVC. In one or more implementations, channel 1 may be live video in a fixed ERP format and compressed with MPEG-2. In one or more implementations, channel 2 may be 360degree video content in an adaptive projection format and compressed with VP9 but pre-stored on a server for streaming. In one or more implementations, channel 3 may be 360degree video content in a fixed ERP format and compressed with h.264 but pre-stored on a server for streaming.
For video decoding, decoders (e.g., 3108-1, 3108-2, 3108-3, 3108-4) are capable of decoding video in different compression formats, and the decoders may be implemented in Hardware (HW), or with a programmable processor (SW), or in a hybrid HW/SW.
For 360-degree video presentation, the presentation engine (e.g., 3110-1, 3110-2, 3110-3, 3110-4) can perform viewport presentation from input 360-degree video in different projection formats (e.g., ERP, CMP, etc.) based on viewing direction angle, FOV angle, input 360-degree video picture size, output view port picture size, global rotation angle, and so on.
The rendering engine may be implemented in Hardware (HW), or with a programmable processor (e.g., GPU), or in a hybrid of HW/SW. In some aspects, the same video decoder output may be fed into multiple rendering engines such that multiple viewports are rendered from the same 360 video input for display.
Fig. 32 illustrates an example of projection format determination 3200. While 360degree video playback (decompression plus presentation) is relatively fixed, there are a number of ways to generate a 360degree compressed video stream with maximum compression efficiency by adaptively changing the projection format from time to time. Fig. 32 provides an example of how a projection format decision (e.g., 3002) may be generated. In this example, 360degree video content (e.g., 3202) in multiple projection formats (e.g., ERP, CMP, and ISP) is provided, the video in the different formats is compressed (e.g., 3204) with the same type of video encoder (e.g., MPEG/ITU-T HEVC/h.265), and the compressed bitstreams are stored (e.g., 3206). After encoding segments or pictures in the different projection formats, the rate-distortion cost (e.g., PSNR at the same bit rate, bit rate at the same quality, or a combined metric) of the segments or pictures in those projection formats may be measured. After a projection format decision (e.g., 3002) is made based on the rate-distortion cost of the current segment or picture, the corresponding bitstream in the selected format for the segment or picture (e.g., 3208) is selected and spliced into the bitstream in the hybrid projection format (e.g., 3210). It should be noted that in such a system, the projection format may change from video segment (group of pictures) to video segment, or from picture to picture.
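A minimal sketch of that decision step follows, assuming the per-format encoding results (bits and distortion) have already been collected for the current segment. The dictionary layout, the lambda value, and the sample numbers are illustrative assumptions, not values from this disclosure.

```python
# Hedged sketch of the rate-distortion based projection format decision of
# Fig. 32: the cheapest format's bitstream would be spliced into the hybrid
# stream, with the selected format signaled e.g. in an SEI message.
def choose_projection_format(results, lam=0.0002):
    """results maps format name -> (bitstream_bytes, bits_used, distortion),
    produced by encoding the same segment in each candidate format.
    Returns the format minimizing the rate-distortion cost D + lambda * R."""
    costs = {fmt: d + lam * r for fmt, (_, r, d) in results.items()}
    best = min(costs, key=costs.get)
    return best, results[best][0]

# Hypothetical per-segment numbers (bits, distortion) for three formats.
segment_results = {
    "ERP": (b"\x00" * 10, 120_000, 31.5),
    "CMP": (b"\x01" * 10, 115_000, 32.2),
    "ISP": (b"\x02" * 10, 130_000, 30.9),
}
fmt, bitstream = choose_projection_format(segment_results)
print(fmt)   # format whose bitstream gets spliced into the hybrid stream
```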
Fig. 33 conceptually illustrates an example of projection format transitions 3300 that exclude inter-prediction across projection format transition boundaries. To support 360degree video in hybrid projection formats with existing video compression standards, such as MPEG/ITU AVC/h.264, MPEG/ITU HEVC/h.265, and Google VP9, limitations must be imposed on how often the projection format may be changed. As shown in fig. 33, a projection format transition 3300, such as a transition from ERP to CMP or from CMP to ERP in this example, may occur only at a Random Access Point (RAP), such that inter-prediction need not cross projection format transition boundaries. A RAP may be provided by an Instantaneous Decoding Refresh (IDR) picture or another type of picture that provides random access functionality. In fig. 33, the projection format can only be changed from video segment to video segment, and not from picture to picture, unless the segment itself consists of a single picture.
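The constraint can be stated very simply in code: a requested format change is deferred until the next random access point. The helper below is only an illustration of that rule; the function and flag names are hypothetical.

```python
# Illustrative check (not from the patent) that a projection format change is
# only applied at a random access point (e.g., an IDR picture), so that
# inter-prediction never crosses a projection format transition boundary.
def apply_format_change(picture_is_rap, current_fmt, requested_fmt):
    if requested_fmt != current_fmt and picture_is_rap:
        return requested_fmt            # the switch takes effect at the RAP
    return current_fmt                  # otherwise keep the current format

fmt = "ERP"
for idx, is_rap in enumerate([True, False, False, True, False]):
    fmt = apply_format_change(is_rap, fmt, "CMP" if idx >= 2 else "ERP")
    # fmt stays "ERP" until the RAP at index 3, where it becomes "CMP".
```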
Fig. 34 conceptually illustrates an example of a projection format transition 3400 with inter-prediction across projection format transition boundaries. The projection format may also change from picture to picture if inter-prediction is allowed to cross the projection format transition boundary. For the same 360degree video content, different projection formats may result not only in different content in the picture, but also in different picture resolutions after 360degree video stitching. As shown in fig. 34, projection format conversion (e.g., 3400) of reference pictures in the DPB (decoded picture buffer) may be used to support inter-prediction across projection format boundaries. Projection format conversion may convert 360degree video reference pictures in the DPB from one format (e.g., ERP) to the projection format (e.g., CMP) and picture resolution of the current picture. The conversion may be implemented on a picture basis, where a reference picture in the projection format of the current picture is converted and pre-stored, or on a block-by-block basis in real time based on the size and position of the block in the current picture and the motion data of the current prediction block. In fig. 34, the projection format can be changed from picture to picture, but reference picture projection format conversion is required, which is a tool that may be supported by future video compression standards.
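As an illustration of the block-based variant, the sketch below fills one block of a CMP-face reference from an ERP reference picture by mapping each block pixel through the sphere. The face-axis table and the ERP sampling conventions are assumptions made for the example and are not specified by this disclosure.

```python
# Simplified sketch of converting one block of a reference picture from ERP to
# a CMP face, on the fly (per block). Face orientations are assumed.
import numpy as np

FACE_AXES = {   # face -> (right axis, down axis, outward axis)
    "front":  ((1, 0, 0),  (0, -1, 0), (0, 0, 1)),
    "back":   ((-1, 0, 0), (0, -1, 0), (0, 0, -1)),
    "right":  ((0, 0, -1), (0, -1, 0), (1, 0, 0)),
    "left":   ((0, 0, 1),  (0, -1, 0), (-1, 0, 0)),
    "top":    ((1, 0, 0),  (0, 0, 1),  (0, 1, 0)),
    "bottom": ((1, 0, 0),  (0, 0, -1), (0, -1, 0)),
}

def convert_block_erp_to_cmp(erp_ref, face, face_size, bx, by, bw, bh):
    """Fill a bw x bh block at (bx, by) of the given cube face by sampling
    the ERP reference picture erp_ref (H x W [x C])."""
    h, w = erp_ref.shape[:2]
    out = np.empty((bh, bw) + erp_ref.shape[2:], dtype=erp_ref.dtype)
    right, down, outward = (np.array(a, dtype=float) for a in FACE_AXES[face])
    for j in range(bh):
        for i in range(bw):
            # Face-plane coordinates in [-1, 1] for this block pixel.
            u = 2.0 * (bx + i + 0.5) / face_size - 1.0
            v = 2.0 * (by + j + 0.5) / face_size - 1.0
            ray = outward + u * right + v * down
            ray /= np.linalg.norm(ray)
            lon = np.arctan2(ray[0], ray[2])
            lat = np.arcsin(ray[1])
            x = int((lon / (2 * np.pi) + 0.5) * w) % w
            y = min(h - 1, max(0, int((0.5 - lat / np.pi) * h)))
            out[j, i] = erp_ref[y, x]
    return out
```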
Suggested views
Several services (e.g., YouTube, Facebook, etc.) have recently begun to provide 360° video sequences. These services allow users to look around in all directions while the video is playing. Users can rotate the scene to view anything they are interested in at a given time.
There are several formats for 360° video, but each involves projecting a 3D surface (sphere, cube, octahedron, icosahedron, etc.) onto a 2D plane. The 2D projection is then encoded/decoded like any normal video sequence. At the decoder, a portion of that 360° view is rendered and displayed depending on the user's view at that time.
The end result is to give users the freedom to look all around them, which greatly increases the sense of being present in the scene, making them feel as if they were actually there. In combination with spatial audio effects (rotating the surround audio to match the video), the effect can be quite engaging.
Fig. 35 illustrates an example network environment 3500 in which suggested views within a 360degree video may be implemented in accordance with one or more implementations. However, not all of the depicted components may be used, and one or more implementations may include additional components not shown in the figures. Changes may be made in the arrangement and type of the components without departing from the spirit or scope of the claims as set forth herein. Additional components, different components, or fewer components may be provided.
Example network environment 3500 includes a 360degree video capture device 3502, a 360degree video stitching device 3504, a video encoding device 3506, a video decoding device 3508, and a 360degree video presentation device 3510. In one or more implementations, one or more of the devices 3502, 3504, 3506, 3508, 3510 may be combined into the same physical device. For example, the 360degree video capture device 3502, the 360degree video stitching device 3504, and the video encoding device 3506 may be combined into a single device, and the video decoding device 3508 and the 360degree video presentation device 3510 may be combined into a single device. In some embodiments, video decoding device 3508 may include an audio decoding device (not shown), or in other embodiments, video decoding device 3508 may be communicatively coupled to a separate audio decoding device for processing an incoming or stored compressed 360 video bitstream.
On the 360 video playback side, network environment 3500 may further include a demultiplexer device (not shown) that may demultiplex an incoming compressed 360 video bitstream and provide the demultiplexed streams to video decoding device 3508, an audio decoding device, and a view extraction device (not shown), respectively. In some aspects, the demultiplexer device may be configured to decompress the 360 video bitstream. Network environment 3500 may further include a 360 video projection format conversion device (not shown) that may perform 360 video projection format conversion before video encoding by video encoding device 3506 and/or after video decoding by video decoding device 3508. Network environment 3500 may also include a 360 video playback device (not shown) that plays back the rendered 360 video content. In one or more implementations, video encoding device 3506 may be communicatively coupled to video decoding device 3508 via a transmission link (e.g., over a network).
The 360 video playback device may store the 360 video presentation settings (e.g., FOV angle, viewing direction angle, presentation picture size, etc.) just prior to playback termination or switching to another program channel so that the stored presentation settings may be used when resuming playback of the same channel. A 360 video playback device may provide a preview mode in which the viewing angle is automatically changed every N frames to help the viewer select a desired viewing direction. The 360 video capture and playback device may compute the projection mapping in real time (e.g., block by block) to conserve memory bandwidth. In this example, the projection mapping may not need to be loaded from off-chip memory. In the present system, different view fidelity information may be assigned to different views.
In the present system, a content provider may provide a "suggested view" of a given 360degree video. This suggested view may be a specific set of viewing angles for each frame in the 360degree video to provide a recommended experience to the user. In the event that the user is not particularly interested in controlling the view at any given point in time, the user may view (or play back) the suggested view and experience the view recommended by the content provider.
In the present system, to efficiently store the data needed to record/play back a decompressed 360 video bitstream in the particular view that a user viewed during a particular viewing, the "yaw", "pitch", and "roll" angles (perspective data) for each frame may be saved in a storage device. In combination with the full 360degree view data that was originally recorded, the previously saved view may be recreated.
Depending on the way the view is stored, the corresponding view extraction process may be initialized. In one or more implementations where views are stored within a compressed 360 video bitstream, a view extraction process may be used. For example, video decoding device 3508 and/or audio decoding device may extract views from a compressed 360 video bitstream (e.g., from SEI messages within an HEVC bitstream). In this regard, the views extracted by video decoding device 3508 may then be provided to video presentation device 3510. If the views are stored in a separate data stream (e.g., an MPEG-2TS PID), the demultiplexer device can extract this information and send it to the video presentation device 3510 as a suggested view. In some examples, the demultiplexer feeds to a separate view extraction device (not shown) to extract suggested views. In this regard, the present system should have the ability to switch between the suggested view and the manually selected user view at any time.
Switching between the suggested view and the manual view may include prompting the user to provide a user selection (e.g., pressing a button) to enter/exit this mode. In one or more implementations, the present system may perform the switching automatically. For example, if the user manually moves the view (with a mouse, remote control, gesture, game controller, etc.), the view is updated to follow the user's input. If the user stops making manual adjustments for a set amount of time, the view may drift back to the suggested view.
In one or more implementations, multiple suggested views may be provided and/or more than one suggested view may be presented at a time, where appropriate. For example, for a football game, one view may track the quarterback and another view may track the wide receiver. Using the football example above, the user may view a split screen with four views at a time. Alternatively, different views may be used to track a particular car during a NASCAR racing event. The user can choose from among these suggested views to customize their experience without having to have full control over the view at all times.
If a suggested view is not available or appropriate for the entire scene, then hints (or recommendations) may be given to try to ensure that the viewer does not miss important action. A suggested view (or preview) may be provided at the beginning of a new scene. The view may then drift toward the suggested view so as to focus on the primary action of the scene. In one or more implementations, if a less direct (or less intrusive) approach is desired, an on-screen graphical arrow may be used to indicate that the user may be facing the wrong way and missing something interesting.
Two of the most common types of projection are the equirectangular projection and the cubic projection. These projections map the video from a sphere (equirectangular) or a cube (cubic) onto a flat 2D plane. Examples are shown in fig. 36, which illustrates an equirectangular projection (e.g., 3602) and a cubic projection (e.g., 3604).
Fig. 37 conceptually illustrates an example of a 360degree video presentation 3700. At present, most 360degree video content from streaming services (YouTube, Facebook, etc.) is viewed on a computer or smartphone. However, it is expected that 360degree video will be played on standard cable/satellite networks in the near future. Sporting events, travel shows, extreme sports, and many other types of programs may be shown in 360° video to increase interest and participation.
While 360degree video applications can be an interesting way of immersing oneself in a scene, for longer programs the requirement to always manually control the view to track the primary target of interest can become tedious. For example, looking all around from time to time during a sporting event may be interesting, but for most of the game the user only wants to see the center of the action.
For this purpose, the content provider may provide a "suggested view". This suggested view should be a specific set of viewing angles for each frame to provide the user with a recommended experience. In the event that the user is not particularly interested in controlling the view at any given point in time, the user may simply view the suggested view and experience the view recommended by the content provider.
The user's view at any given time can be expressed as three different angles in 3D space. These angles are the so-called Euler angles. In flight dynamics, these three angles are referred to as "yaw", "pitch", and "roll". Referring back to fig. 10, as a means of suggesting a view to the viewer, the "yaw", "pitch", and "roll" angles for each frame may be encoded. The decoder (e.g., 3508) may extract and use these suggested views at any time when the user is not interested in controlling the view themselves.
This perspective data may be stored in any number of ways. For example, the angles may be inserted into the video stream as picture user data (an AVC/HEVC supplemental enhancement information (SEI) message), or they may be carried as a separate data stream alongside the video sequence (a different MPEG-2 TS PID or MP4 data stream).
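A hedged sketch of carrying a suggested view as per-frame angle data is shown below. The byte layout is illustrative only; the actual container (SEI user data, an MPEG-2 TS PID, an MP4 data stream) and payload syntax are left open by this disclosure.

```python
# Illustrative serialization of one frame's suggested viewing angles, e.g.,
# for insertion as SEI user data or into a separate data PID. The "<Ifff"
# layout (frame number + three float angles) is an assumption for the example.
import struct

def pack_suggested_view(frame_number, yaw_deg, pitch_deg, roll_deg):
    return struct.pack("<Ifff", frame_number, yaw_deg, pitch_deg, roll_deg)

def unpack_suggested_view(payload):
    frame_number, yaw, pitch, roll = struct.unpack("<Ifff", payload)
    return {"frame": frame_number, "yaw": yaw, "pitch": pitch, "roll": roll}

record = pack_suggested_view(120, yaw_deg=35.0, pitch_deg=-5.0, roll_deg=0.0)
print(unpack_suggested_view(record))
```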
Depending on the way the views are stored, a corresponding view extraction mechanism may be required. For example, if the angles are stored in the video bitstream as SEI messages, the video decoder may extract the views. If the angles are stored in a separate MPEG-2 TS PID, the demultiplexer can extract this information and send it to the rendering process. A 360degree video presentation system (e.g., 3500) would have the ability to switch between a suggested view and a manually selected user view at any time.
FIG. 38 illustrates a schematic diagram of extraction and rendering with a suggested perspective. Switching between the suggested view and the manual view may be as simple as pressing a button to enter/exit this mode. In other aspects, the switching may be performed automatically. For example, if the user manually moves the view (with a mouse, remote control, gesture, game controller, etc.), the view is updated to follow the user's input. If the user stops making manual adjustments for a set amount of time, the view may drift back to the suggested view.
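The drift-back behavior can be captured with a small controller that blends the current view toward the suggested view after a period without manual input. The sketch below is a minimal illustration under assumed parameter names (idle_frames, blend_rate); it is not an implementation defined by this disclosure.

```python
# Minimal sketch of automatic switching between manual and suggested views:
# the view follows manual input while the user is active, then drifts back
# toward the suggested view after a period of inactivity.
class ViewController:
    def __init__(self, idle_frames=150, blend_rate=0.05):
        self.idle_frames = idle_frames      # frames without input before drifting back
        self.blend_rate = blend_rate        # fraction moved toward the suggestion per frame
        self.frames_since_input = 10**9
        self.view = {"yaw": 0.0, "pitch": 0.0}

    def on_manual_input(self, yaw, pitch):
        self.view = {"yaw": yaw, "pitch": pitch}
        self.frames_since_input = 0

    def update(self, suggested):
        """Call once per frame with the suggested view for that frame."""
        self.frames_since_input += 1
        if self.frames_since_input > self.idle_frames:
            for k in self.view:                       # drift toward the suggestion
                self.view[k] += self.blend_rate * (suggested[k] - self.view[k])
        return self.view

controller = ViewController()
controller.on_manual_input(yaw=40.0, pitch=10.0)
for _ in range(300):                                  # no further input: view drifts back
    current = controller.update({"yaw": 0.0, "pitch": 0.0})
```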
Multiple suggested views may be provided where appropriate. For example, for a football game, one view may track the quarterback while another view tracks the wide receiver. In other aspects, different views may be used to track a particular car during a NASCAR racing event. The user can choose from among these suggested views to customize their experience without having to have full control over the view at all times.
More than one suggested view may be presented at a time. Using the football example above, a user can view a split screen with four views at a time, one for the quarterback, one for the wide receiver, while they manually control the other two, and so on.
If a suggested view is not available or appropriate for the entire scene, a prompt may be given to try to ensure that the viewer does not miss an important action. A suggested view may be provided at the beginning of a new scene. The view may then drift toward the suggested view so as to focus on the primary action of the scene. In other aspects, if a less direct approach is desired, something like an on-screen graphical arrow may be used to indicate that the user may be facing the wrong way and missing something interesting.
Most 360degree video applications do not allow the user to adjust the "roll" angle. The camera is typically fixed in a vertical orientation. The view can be rotated up/down and left/right, but not tilted sideways. In this regard, a system that suggests only two viewing angles would be sufficient for most uses.
It should be noted that not all "360 video" streams actually cover a full 360°×180° field of view. Some sequences may only allow viewing in the forward direction (180°×180°). In some instances, some may have limitations as to how far up or down one can look. All of these are intended to be encompassed by the same concepts discussed herein.
In one or more implementations, a system message carrying a default (recommended) viewing direction (i.e., viewing direction angles), FOV angle, and/or presentation picture size may be signaled with the 360degree video content.
In one or more implementations, a 360degree video playback system supports a scanning mode in which the viewing angle can be automatically changed every N frames to help the viewer select a desired viewing direction. For example, in auto-scan mode, the vertical viewing angle γ and the viewing angle ε along the z-axis are first fixed to 0 degrees while the horizontal viewing angle θ changes by one degree every N frames; after the viewer selects the horizontal viewing angle θ, the horizontal viewing angle is fixed to the selected angle and the viewing angle ε along the z-axis remains fixed to 0 degrees, while the vertical viewing angle starts to change by one degree every N frames until the viewer selects the vertical viewing angle γ; after both the horizontal and vertical viewing angles θ and γ have been selected, both angles are fixed to the selected values and the viewing angle ε along the z-axis begins to change by one degree every N frames until the viewer selects the viewing angle ε. In some implementations, the scan mode may scan different views in parallel, or in other embodiments, the scan mode may scan different views sequentially. In some aspects, the viewing angles may be limited (or constrained) by the user profile or the type of user (e.g., child, adult). In this example, the 360degree video content managed by parental control settings may be limited to a subset of the viewing angles as indicated by the settings. In some aspects, the 360degree video bitstream includes metadata indicating a frame track for the scan mode.
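The sequential variant of the scan mode could be driven by a loop like the one sketched below, which sweeps one axis at a time until the viewer locks it. The generator name and the user_selected() hook are hypothetical stand-ins for real UI events; the one-degree step per N frames mirrors the description above, but the code itself is only an illustrative assumption.

```python
# Illustrative sketch of sequential auto-scan: theta is swept first, then
# gamma, then epsilon, each advancing one degree every n_frames frames until
# the viewer locks that axis via the (hypothetical) user_selected() callback.
def auto_scan(n_frames, user_selected):
    """Yield (theta, gamma, epsilon) once per frame."""
    theta = gamma = epsilon = 0.0
    for axis in ("theta", "gamma", "epsilon"):
        frame = 0
        while not user_selected(axis):
            if frame % n_frames == 0 and frame > 0:
                if axis == "theta":
                    theta = (theta + 1.0) % 360.0
                elif axis == "gamma":
                    gamma = (gamma + 1.0) % 360.0
                else:
                    epsilon = (epsilon + 1.0) % 360.0
            yield theta, gamma, epsilon
            frame += 1
    while True:                       # all three angles locked
        yield theta, gamma, epsilon
```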
In one or more implementations, the multiview layout format may also preserve viewpoints of interest by indicating the importance of the views in the sequence. Views may be allocated different view fidelities (resolution, frame rate, bit rate, FOV angle size, etc.). View fidelity information may be signaled in an application system message.
Fig. 39 conceptually illustrates an electronic system 3900 with which one or more implementations of the present technology may be implemented. Electronic system 3900 can be, for example, a network device, a media converter, a desktop computer, a laptop computer, a tablet computer, a server, a switch, a router, a base station, a receiver, a telephone, or, in general, any electronic device that transmits signals over a network. Such an electronic system includes various types of computer-readable media and interfaces for various other types of computer-readable media. In one or more implementations, the electronic system 3900 may be or may include one or more of the devices 102, 104, 106, 108, 110, the 360degree video projection format conversion device, and/or the 360degree video playback device. Electronic system 3900 includes bus 3908, one or more processing units 3912, system memory 3904, read-only memory (ROM) 3910, persistent storage 3902, input device interface 3914, output device interface 3906, and network interface 3916, or subsets and variations thereof.
Bus 3908 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of electronic system 3900. In one or more implementations, the bus 3908 communicatively connects the one or more processing units 3912 with the ROM 3910, the system memory 3904, and the persistent storage device 3902. From these various memory units, the one or more processing units 3912 retrieve instructions to execute and data to process in order to perform the processes of the present disclosure. In different implementations, the one or more processing units 3912 may be a single processor or a multi-core processor.
The ROM 3910 stores static data and instructions required by the one or more processing units 3912 and other modules of the electronic system. Persistent storage device 3902, on the other hand, is a read-and-write memory device. Persistent storage device 3902 is a non-volatile memory unit that stores instructions and data even when electronic system 3900 is powered down. One or more implementations of the invention use a mass storage device (e.g., a magnetic or optical disk and its corresponding disk drive) as persistent storage device 3902.
Other implementations use removable storage devices (such as floppy disks or flash drives and their corresponding disk drives) as persistent storage device 3902. Like persistent storage device 3902, system memory 3904 is a read-and-write memory device. Unlike persistent storage device 3902, however, system memory 3904 is a volatile read-and-write memory, such as random access memory. The system memory 3904 stores some of the instructions and data that the one or more processing units 3912 need at runtime. In one or more implementations, the processes of the present disclosure are stored in system memory 3904, persistent storage device 3902, and/or ROM 3910. From these various memory units, the one or more processing units 3912 retrieve instructions to execute and data to process in order to perform the processes of one or more implementations.
The bus 3908 is also connected to an input device interface 3914 and an output device interface 3906. The input device interface 3914 enables a user to communicate information and select commands to the electronic system. Input devices used with the input device interface 3914 include, for example, alphanumeric keyboards and pointing devices (also referred to as "cursor control devices"). The output device interface 3906 enables, for example, the display of images generated by the electronic system 3900. Output devices used with output device interface 3906 include, for example, printers and display devices such as liquid crystal displays (LCDs), light emitting diode (LED) displays, organic light emitting diode (OLED) displays, flexible displays, flat panel displays, solid state displays, projectors, or any other device for outputting information. One or more implementations may include a device, such as a touchscreen, that serves as both an input and an output device. In these implementations, the feedback provided to the user can be any form of sensory feedback, such as visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
Finally, as shown in fig. 39, bus 3908 also couples electronic system 3900 to one or more networks (not shown) through one or more network interfaces 3916. In this manner, the computer may be part of one or more networks of computers (e.g., a local area network ("LAN"), a wide area network ("WAN"), or an intranet), or a network of networks (e.g., the Internet). Any or all of the components of electronic system 3900 may be used in conjunction with the present invention.
Implementations within the scope of the present invention may be partially or fully realized using tangible computer-readable storage media (or multiple tangible computer-readable storage media of one or more types) encoding one or more instructions. The tangible computer-readable storage media may also be non-transitory in nature.
A computer-readable storage medium may be any storage medium that can be read, written, or otherwise accessed by a general purpose or special purpose computing device, including any processing electronics and/or processing circuitry capable of executing instructions. By way of example, and not limitation, the computer-readable medium may comprise any volatile semiconductor memory, such as RAM, DRAM, SRAM, T-RAM, Z-RAM, and TTRAM. The computer readable medium may also include any non-volatile semiconductor memory, such as ROM, PROM, EPROM, EEPROM, NVRAM, flash memory, nvSRAM, FeRAM, FeTRAM, MRAM, PRAM, CBRAM, SONOS, RRAM, NRAM, racetrack memory, FJG, and Millipede memory.
Further, the computer-readable storage medium may include any non-semiconductor memory, such as optical disk storage, magnetic tape, other magnetic storage devices, or any other medium capable of storing one or more instructions. In some implementations, the tangible computer-readable storage medium may be directly coupled to the computing device, while in other implementations the tangible computer-readable storage medium may be indirectly coupled to the computing device, e.g., via one or more wired connections, one or more wireless connections, or any combination thereof.
The instructions may be directly executable or may be used to develop executable instructions. For example, the instructions may be implemented as executable or non-executable machine code or as high-level language instructions that may be compiled to produce executable or non-executable machine code. Further, instructions may also be implemented as or may contain data. Computer-executable instructions may also be organized in any format including routines, subroutines, programs, data structures, objects, modules, applications, applets, functions, and the like. As will be recognized by those of skill in the art, details including, but not limited to, number, structure, sequence, and organization of instructions may vary considerably without changing the underlying logic, functions, processing, and outputs.
Although discussed above primarily with reference to a microprocessor or multi-core processor executing software, one or more implementations are performed by one or more integrated circuits, such as an Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA). In one or more implementations, such integrated circuits execute instructions stored on their circuitry.
Those of skill in the art will appreciate that the various illustrative blocks, modules, elements, components, methods, and algorithms described herein may be implemented as electronic hardware, computer software, or combinations of both. To illustrate this interchangeability of hardware and software, various illustrative blocks, modules, elements, components, methods, and algorithms have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application. The various components and blocks may be arranged differently (e.g., arranged in a different order or divided in a different manner) all without departing from the scope of the present technology.
It should be understood that any particular order or hierarchy of blocks in the processes disclosed is illustrative of example approaches. Based upon design preferences, it is understood that the specific order or hierarchy of blocks in the processes may be rearranged or that all blocks illustrated may be performed. Any of the blocks may be performed simultaneously. In one or more implementations, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
As used in this specification and any claims of this application, the terms "base station," "receiver," "computer," "server," "processor," and "memory" all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of this specification, the term "display" or "displaying" means displaying on an electronic device.
As used herein, the phrase "at least one of" the aforementioned series of items, and the terms "and" or "separating any of the items, as a whole, modifies the list rather than each member (e.g., each item) in the list. The phrase "at least one of …" need not select at least one of each item listed; rather, the phrase allows for the inclusion of the meaning of at least one of any of the items and/or at least one of any combination of the items and/or at least one of each of the items. By way of example, the phrases "at least one of A, B and C" or "at least one of A, B or C" each refer to: only a, only B, or only C; A. any combination of B and C; and/or A, B and C.
The predicates "configured to", "operable to", and "programmed to" do not imply any particular tangible or intangible modification of the object, but rather, are intended to be used interchangeably. In one or more implementations, a processor configured to monitor and control operations or components may also mean a processor programmed to monitor and control operations or a processor operable to monitor and control operations. Likewise, a processor configured to execute code may be constructed as a processor programmed to execute code or operable to execute code.
Phrases such as an aspect, the aspect, another aspect, some aspect, one or more aspects, an implementation, the implementation, another implementation, some implementations, one or more implementations, an embodiment, the embodiment, another embodiment, some embodiments, one or more embodiments, a configuration, the configuration, another configuration, some configurations, one or more configurations, the present technology, disclosure, present disclosure, other variations thereof, and the like are for convenience and do not imply that disclosure related to such phrase(s) is essential to the present technology or that this disclosure applies to all configurations of the present technology. Disclosure relating to such phrase(s) may apply to all configurations or one or more configurations. Disclosure relating to such phrase(s) may provide one or more examples. For example, a phrase of one aspect or some aspects may refer to one or more aspects and vice versa, and this similarly applies to other preceding phrases.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration. Any embodiment described herein as "exemplary" or "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. Furthermore, to the extent that the terms "includes," "has," or the like are used in either the description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim.
All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. No claim element is to be construed under the provisions of 35 U.S.C. § 112, sixth paragraph, unless the element is expressly recited using the phrase "means for" or, in the case of a method claim, the element is recited using the phrase "step for".
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean "one and only one" unless specifically so stated, but rather "one or more." Unless specifically stated otherwise, the term "some" refers to one or more. Pronouns in the masculine (e.g., he) include the feminine and neuter (e.g., her and it), and vice versa. Headings and subheadings, if any, are used for convenience only and do not limit the invention.

Claims (12)

1. A system, comprising:
a video capture device configured to capture 360degree video;
a splicing device configured to:
stitching the captured 360degree video using an intermediate coordinate system between an input picture coordinate system and a 360degree video capture coordinate system, wherein the intermediate coordinate system is transformed from the video capture coordinate system by rotating along respective coordinate axes of the video capture coordinate system by respective angles; and
an encoding device configured to:
encoding the spliced 360-degree video into a 360-degree video bitstream; and
preparing the 360-degree video bitstream for transmission and storage for playback.
2. The system of claim 1, wherein the splicing device is configured to:
calculating the size of the normalized projection plane by using the view angle;
calculating coordinates in a normalized presentation coordinate system from an output presentation picture coordinate system using the normalized projection plane size;
mapping the coordinates to a viewing coordinate system using the normalized projection plane size;
converting the coordinates from the viewing coordinate system to a capture coordinate system using a coordinate transformation matrix;
converting the coordinates from the capture coordinate system to the intermediate coordinate system using the coordinate transformation matrix;
converting the coordinates from the intermediate coordinate system to a normalized projection system; and
mapping the coordinates from the normalized projection system to the input picture coordinate system.
3. The system of claim 2, wherein the coordinate transformation matrix is pre-computed using a viewing direction angle and the respective angle.
4. The system of claim 3, wherein the respective angle is signaled in a message included in the 360degree video bitstream.
5. The system of claim 1, wherein the encoding device is further configured to:
encoding the stitched 360-degree video into a plurality of view sequences, each of the plurality of view sequences corresponding to a different view region in the 360-degree video.
6. The system of claim 5, wherein at least two of the plurality of view sequences are encoded with different projection layout formats.
7. The system of claim 5, further comprising:
a rendering device configured to receive the plurality of sequences of views as input and render each of the plurality of sequences of views using a rendering control input.
8. The system of claim 7, wherein the rendering device is further configured to select at least one of the plurality of sequences of views for rendering and exclude at least one of the plurality of sequences of views from the rendering.
9. The system of claim 1, wherein the encoding device is further configured to include unrestricted motion compensation signaling in the 360degree video bitstream to indicate one or more pixels in a view that exceed a picture boundary of the view.
10. The system of claim 9, wherein the unrestricted motion compensation signaling is located in a sequence header or a picture header of the 360degree video bitstream.
11. The system of claim 1, wherein a relationship between the intermediate coordinate system and the input picture coordinate system is defined by a counterclockwise rotation angle along one or more axes.
12. The system of claim 1, wherein the splicing device is further configured to:
converting an input projection format of the 360degree video bitstream to an output projection format different from the input projection format.
CN201710952982.8A 2016-10-14 2017-10-13 360degree video capture and playback Active CN107959844B (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201662408652P 2016-10-14 2016-10-14
US62/408,652 2016-10-14
US201662418066P 2016-11-04 2016-11-04
US62/418,066 2016-11-04
US15/599,447 2017-05-18
US15/599,447 US11019257B2 (en) 2016-05-19 2017-05-18 360 degree video capture and playback

Publications (2)

Publication Number Publication Date
CN107959844A CN107959844A (en) 2018-04-24
CN107959844B true CN107959844B (en) 2021-09-17

Family

ID=61765451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710952982.8A Active CN107959844B (en) 2016-10-14 2017-10-13 360degree video capture and playback

Country Status (2)

Country Link
CN (1) CN107959844B (en)
DE (1) DE102017009145A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11528538B2 (en) * 2018-12-21 2022-12-13 Koninklijke Kpn N.V. Streaming volumetric and non-volumetric video
CN109829851B (en) * 2019-01-17 2020-09-18 厦门大学 Panoramic image splicing method based on spherical alignment estimation and storage device
CN112423108B (en) * 2019-08-20 2023-06-30 中兴通讯股份有限公司 Method and device for processing code stream, first terminal, second terminal and storage medium
CN111355966A (en) * 2020-03-05 2020-06-30 上海乐杉信息技术有限公司 Surrounding free visual angle live broadcast method and system
US11259055B2 (en) * 2020-07-10 2022-02-22 Tencent America LLC Extended maximum coding unit size
CN113206992A (en) * 2021-04-20 2021-08-03 聚好看科技股份有限公司 Method for converting projection format of panoramic video and display equipment
DE102021119951A1 (en) 2021-08-02 2023-02-02 Dr. Ing. H.C. F. Porsche Aktiengesellschaft Method, system and computer program product for detecting the surroundings of a motor vehicle
CN117745980A (en) * 2022-09-13 2024-03-22 影石创新科技股份有限公司 Image transmission method, image display device and computer equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1922544A (en) * 2004-02-19 2007-02-28 创新科技有限公司 Method and apparatus for providing a combined image
CN102508565A (en) * 2011-11-17 2012-06-20 Tcl集团股份有限公司 Remote control cursor positioning method and device, remote control and cursor positioning system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102025922A (en) * 2009-09-18 2011-04-20 鸿富锦精密工业(深圳)有限公司 Image matching system and method
US20130089301A1 (en) * 2011-10-06 2013-04-11 Chi-cheng Ju Method and apparatus for processing video frames image with image registration information involved therein
CN103873758B (en) * 2012-12-17 2018-09-21 北京三星通信技术研究有限公司 The method, apparatus and equipment that panorama sketch generates in real time
CN105872353A (en) * 2015-12-15 2016-08-17 乐视网信息技术(北京)股份有限公司 System and method for implementing playback of panoramic video on mobile device
CN106023070B (en) * 2016-06-14 2017-10-03 北京岚锋创视网络科技有限公司 Real time panoramic joining method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1922544A (en) * 2004-02-19 2007-02-28 创新科技有限公司 Method and apparatus for providing a combined image
CN102508565A (en) * 2011-11-17 2012-06-20 Tcl集团股份有限公司 Remote control cursor positioning method and device, remote control and cursor positioning system

Also Published As

Publication number Publication date
DE102017009145A1 (en) 2018-04-19
CN107959844A (en) 2018-04-24

Similar Documents

Publication Publication Date Title
US11019257B2 (en) 360 degree video capture and playback
CN107959844B (en) 360degree video capture and playback
US10848668B2 (en) 360 degree video recording and playback with object tracking
US11120837B2 (en) System and method for use in playing back panorama video content
US10277914B2 (en) Measuring spherical image quality metrics based on user field of view
JP6501904B2 (en) Spherical video streaming
US10887572B2 (en) Suggested viewport indication for panoramic video
CN108024094B (en) 360degree video recording and playback with object tracking
EP3249930B1 (en) Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices
US20180192001A1 (en) Rectilinear viewport extraction from a region of a wide field of view using messaging in video transmission
US20170339391A1 (en) 360 degree video system with coordinate compression
US10904508B2 (en) 360 degree video with combined projection format
US20210266613A1 (en) Generating composite video stream for display in vr
JP7177034B2 (en) Method, apparatus and stream for formatting immersive video for legacy and immersive rendering devices
US20110063298A1 (en) Method and system for rendering 3d graphics based on 3d display capabilities
EP4066488A1 (en) A method and apparatus for decoding a 3d video
US20230215080A1 (en) A method and apparatus for encoding and decoding volumetric video
US20230319250A1 (en) Display-optimized light field representations
KR20220054430A (en) Methods and apparatuses for delivering volumetric video content
JP2020043559A (en) Video streaming method, video streaming system, video streaming device, and program

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20181022

Address after: Singapore Singapore

Applicant after: Annwa high tech Limited by Share Ltd

Address before: Singapore Singapore

Applicant before: Avago Technologies Fiber IP Singapore Pte. Ltd.

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant