WO2017127816A1 - Encoding and streaming of omnidirectional video - Google Patents

Encoding and streaming of omnidirectional video

Info

Publication number
WO2017127816A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
segments
joining
spherical
encoding
Prior art date
Application number
PCT/US2017/014588
Other languages
English (en)
Inventor
Ziyu Wen
Yikai ZHAO
Jisheng Li
Bichuan GUO
Jiangtao Wen
Sihan Li
Yao LU
Original Assignee
Ziyu Wen
Zhao Yikai
Jisheng Li
Guo Bichuan
Jiangtao Wen
Sihan Li
Lu Yao
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ziyu Wen, Zhao Yikai, Jisheng Li, Guo Bichuan, Jiangtao Wen, Sihan Li, Lu Yao
Priority to CN201780007828.1A (CN109121466B)
Publication of WO2017127816A1


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 - Quantisation
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 - Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 - Control of cameras or camera modules
    • H04N23/698 - Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 - Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/90 - Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/08 - Systems for the simultaneous or sequential transmission of more than one television signal, e.g. additional information signals, the signals occupying wholly or partially the same frequency band, e.g. by time division
    • H04N7/0806 - Systems for the simultaneous or sequential transmission of more than one television signal, e.g. additional information signals, the signals occupying wholly or partially the same frequency band, e.g. by time division, the signals being two or more video signals

Definitions

  • The current disclosure relates to encoding and streaming of video, and in particular to encoding and streaming omnidirectional video.
  • Omnidirectional video provides a 360° view of an environment.
  • Omnidirectional video allows a viewer to view any desired portion of the 360° environment.
  • Encoding omnidirectional video may use existing encoding techniques for 2-dimensional (2D) video, by projecting the omnidirectional video from a sphere to one or more rectangles.
  • FIG. 1 depicts projecting the omnidirectional video from a sphere 100 onto one or more rectangles 102, 104a, 104b, 104c, 104d, 104e, 104f using equirectangular projection and cubic projection. In both cases, the resulting 2D projections have wasted pixels.
  • The area of the omnidirectional video is that of the sphere 100.
  • If the omnidirectional video's sphere has a radius of r, the omnidirectional video covers an area of 4πr².
  • In equirectangular projection, the sphere's area is projected onto a rectangle having an area of 2π²r², which is 157% of the area of the sphere.
  • In cubic projection, the sphere's area is projected onto six squares having a combined area of 6πr², which is 150% of the area of the sphere. Accordingly, both projection techniques result in a relatively large amount of unnecessary information being encoded, as the worked ratios below illustrate.
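  • For illustration only, a short sketch (not part of the original disclosure) reproducing the area ratios given above; the radius r cancels out of both ratios:

```python
import math

r = 1.0  # unit sphere; the ratios are independent of the radius

sphere = 4 * math.pi * r**2       # surface area of the sphere
equirect = 2 * math.pi**2 * r**2  # equirectangular target rectangle
cubic = 6 * math.pi * r**2        # six squares, as given in the text

print(f"equirectangular / sphere = {equirect / sphere:.0%}")  # 157%
print(f"cubic / sphere           = {cubic / sphere:.0%}")     # 150%
```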
  • The present disclosure provides a new encoding method that uses a nearly equal-area projection.
  • The encoding may also use ROI-targeted encoding to provide the encoded omnidirectional videos.
  • The present disclosure provides adaptive streaming techniques for omnidirectional videos.
  • The present disclosure further provides video capture devices and stitching techniques for capturing panoramic and omnidirectional video.
  • In accordance with the present disclosure, there is provided a method of encoding omnidirectional video comprising: receiving spherical omnidirectional video data; segmenting a frame of the spherical omnidirectional video data into a north pole segment formed by mapping a top spherical dome portion of the frame of the spherical omnidirectional video data onto a circle, a south pole segment formed by mapping a bottom spherical dome portion of the frame of the spherical omnidirectional video data onto a circle, and at least one joining segment formed by mapping a spherical joining portion of the frame of the spherical omnidirectional video data joining the top spherical dome portion and the bottom spherical dome portion onto at least one rectangle; stacking the north pole segment, the south pole segment and the at least one joining segment together to form a 2-dimensional (2D) frame corresponding to the frame of the spherical omnidirectional video data; and encoding the 2D frame.
  • The method further comprises segmenting a plurality of frames of the spherical omnidirectional video data into a plurality of north pole segments, south pole segments and joining segments; stacking the plurality of north pole segments, south pole segments and joining segments into a plurality of 2D frames; and encoding the plurality of 2D frames.
  • The at least one joining segment comprises a plurality of joining segments, each mapping a portion of the spherical joining portion onto a respective rectangle.
  • The circular poles are placed within squares.
  • Each of the north pole segment, the south pole segment and the at least one joining segment comprises overlapping pixel data.
  • The at least one joining segment comprises between 2 and 4 segments.
  • The method further comprises tracking one or more regions of interest (ROI) before encoding the 2D frames.
  • Encoding the 2D frame comprises: encoding one or more view points into a first stream; and, for each view point, encoding additional streams comprising an intra-coded frame stream, a predictive frame stream, and a bi-predictive frame stream.
  • The method further comprises streaming at least one of the encoded view points.
  • In accordance with the present disclosure, there is also provided a system for encoding omnidirectional video comprising: a processor for executing instructions; and a memory for storing instructions which, when executed by the processor, configure the system to provide a method comprising: receiving spherical omnidirectional video data; segmenting a frame of the spherical omnidirectional video data into a north pole segment formed by mapping a top spherical dome portion of the frame of the spherical omnidirectional video data onto a circle, a south pole segment formed by mapping a bottom spherical dome portion of the frame of the spherical omnidirectional video data onto a circle, and at least one joining segment formed by mapping a spherical joining portion of the frame of the spherical omnidirectional video data joining the top spherical dome portion and the bottom spherical dome portion onto at least one rectangle; stacking the north pole segment, the south pole segment and the at least one joining segment together to form a 2-dimensional (2D) frame corresponding to the frame of the spherical omnidirectional video data; and encoding the 2D frame.
  • The instructions further configure the system to: segment a plurality of frames of the spherical omnidirectional video data into a plurality of north pole segments, south pole segments and joining segments; stack the plurality of north pole segments, south pole segments and joining segments into a plurality of 2D frames; and encode the plurality of 2D frames.
  • The at least one joining segment comprises a plurality of joining segments, each mapping a portion of the spherical joining portion onto a respective rectangle.
  • Each of the north pole segment, the south pole segment and the at least one joining segment comprises overlapping pixel data.
  • The at least one joining segment comprises between 2 and 4 segments.
  • The instructions further configure the system to track one or more regions of interest (ROI) before encoding the 2D frames.
  • Encoding the 2D frame comprises: encoding one or more view points into a first stream; and, for each view point, encoding additional streams comprising an intra-coded frame stream, a predictive frame stream, and a bi-predictive frame stream.
  • The instructions further configure the system to stream at least one of the encoded view points.
  • In accordance with the present disclosure, there is also provided a device for use in capturing panoramic video comprising: a frame for holding a mobile device; a first fisheye lens mounted on the frame and arranged to be located over a front-facing camera of the mobile device when the mobile device is held by the frame; and a second fisheye lens mounted on the frame and arranged to be located over a rear-facing camera of the mobile device when the mobile device is held by the frame.
  • There is further provided a method of stitching multiple videos captured from one or more mobile devices comprising: generating a stitching template for each camera capturing the videos; synchronizing frames of the captured video using timestamps of the frames; remapping the multiple videos onto a sphere using the stitching template; and blending the remapped images to provide a panoramic video.
  • FIG. 1 depicts equirectangular projection and cubic projection of a sphere
  • FIGs. 2A and 2B depict segmented sphere projection (SSP) of a sphere
  • FIG. 3 depicts stacking of segments from a segmented sphere projection
  • FIG. 4 is a graph of a ratio of segmented area to the spherical area based on the number of segments, for both circular pole and square pole segments;
  • FIG. 5 is a graph of a ratio of segmented area to the spherical area based on the number of segments and different amounts of segment overlap, for circular pole segments;
  • FIG. 6 is a graph of a ratio of segmented area to the spherical area based on the number of segments and different amounts of segment overlap, for square pole segments;
  • FIG. 7 depicts segmented sphere projection using a single equatorial tile segment and square poles;
  • FIG. 8 depicts segmented sphere projection using multiple equatorial tile segments and square poles
  • FIG. 9 depicts segmented sphere projection using equally sized equatorial tile segments and square poles
  • FIG. 10 depicts stacking of overlapping tile segments
  • FIG. 11 depicts further stacking of overlapping tile segments
  • FIG. 12 depicts further stacking of overlapping tile segments
  • FIG. 13 depicts stacking of overlapping tile segments for stereoscopic omnidirectional video
  • FIG. 14 depicts region of interest (ROI) encoding
  • FIG. 15 depicts further ROI encoding
  • FIG. 16 depicts an ROI heat map
  • FIG. 17 depicts ROI temporal encoding
  • FIG. 18 depicts view point encoding of omnidirectional video
  • FIG. 19 depicts view point encoding of a view of omnidirectional video for adaptive view point streaming
  • FIG. 20 depicts adaptive view point streaming of omnidirectional video
  • FIG. 21 depicts a system for encoding and streaming omnidirectional video
  • FIGs. 22A, 22B and 22C depict devices for capturing panoramic and/or omnidirectional video
  • FIGs. 23A, 23B, and 23C depict stitching video together; and
  • FIGs. 24A and 24B depict brightness mapping.

DETAILED DESCRIPTION
  • Omnidirectional video can be encoded using regular encoding techniques by first projecting the video from a sphere to 2-dimensional (2D) tiles.
  • Segmented sphere projection segments the spherical video into a top dome or cap portion of the sphere, a bottom dome or cap portion of the sphere, and a middle equatorial portion of the sphere joining the top and bottom cap portions.
  • The top and bottom cap segments may be mapped to circular tiles or to a circular portion of a square tile.
  • The equatorial portion of the sphere may be mapped to one or more rectangular tiles.
  • The tiles may then be stacked together into a single frame for subsequent encoding.
  • The total area of tiles resulting from SSP may be smaller than the total area resulting from either equirectangular projection or cubic projection.
  • The tile area for SSP is close to the area of the sphere of the omnidirectional video.
  • The encoding efficiency of omnidirectional video may be further improved by encoding particular region of interest (ROI) portions of the omnidirectional video at a higher bitrate while encoding non-ROI portions of the omnidirectional video at a lower bitrate.
  • FIGs. 2A and 2B depict segmented sphere projection (SSP) of a sphere.
  • The sphere is segmented and mapped to tiles using an improved projection based on sinusoidal projection.
  • The sphere 200 is cut along its latitude into several segments, including a north pole segment 202a, a south pole segment 202b (referred to collectively as pole segments 202) and one or more equatorial joining segments 204a-f (referred to collectively as joining segments 204) between the two poles.
  • The segments may then be mapped to tiles; in particular, the pole segments may be mapped to circular tiles 206a, 206b and the joining segments 204 may be mapped to rectangular tiles 208a-208f.
  • The number of joining segments can vary.
  • The sphere 200 may be cut into two pole segments 210a, 210b and 3 equatorial joining segments 212a, 212b, 212c.
  • Each of the two pole segments 210a, 210b is mapped to a respective circle contained by a square 214a, 214b, and the joining segments 212a-c are mapped to respective rectangles 216a-c.
  • The individual tiles may overlap with each other by a certain amount in order to maintain video quality during further processing. Once segmented into tiles, the individual tiles may be stacked together to form a frame that may be encoded using various encoding techniques.
  • FIG. 3 depicts stacking of segments from a segmented sphere projection.
  • Individual joining segment tiles 304a-c may be stacked together with the square pole tiles 302a, 302b and arranged in a rectangular frame 300.
  • The rectangular frame 300 may then be encoded using, for example, an H.264 encoder, although other encoding techniques may be used.
  • FIG. 4 is a graph of a ratio of segmented area to the spherical area based on the number of segments for both circular pole and square pole segments. As depicted in the graph of FIG. 4, as the number of equatorial joining segments increases, the total area of the segmented tiles approaches the area of the sphere. As can be seen from FIG. 4, the segmented tile area is greater when using square poles compared to circular poles. As the number of segmented tiles increases, the tile-pole segmentation latitude, that is, the latitude where the sphere is cut to form the segments, will be pushed toward the poles.
  • FIG. 5 is a graph of a ratio of segmented area to the spherical area based on the number of segments and different amounts of segment overlap for circular pole segments.
  • FIG. 6 is a graph of a ratio of segmented area to the spherical area based on the number of segments and different amounts of segment overlap for square pole segments.
  • Table 1 shows segmentation latitudes for varying amounts of segment overlap, varying number of joining segments, and the use of circular or square pole tiles.
  • The optimal latitude for segmenting each hemisphere of the sphere into 1 joining segment and 1 pole may be described by minimizing the total tile area over the segmentation latitude, i.e. min_θ [ πr²(π/2 − θ)² + 2πr²θ ] for circular pole tiles, where the first term is the area of the pole circle, the second term is the area of the ERP-mapped joining segment, and θ is the tile-pole segmentation latitude.
  • When a hemisphere is segmented into multiple joining segments and a pole, θ1 is the tile-tile segmentation latitude and θ2 is the tile-pole segmentation latitude. A numeric check of the minimization is sketched below.
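  • The following is an illustrative numeric check, not part of the original text: the area expressions assume the circle-radius convention described for FIG. 9 below (pole circle radius equal to the cap's arc length), and the tile_area helper is hypothetical. It recovers the 45° default cut for square pole tiles:

```python
import math

def tile_area(theta, pole="circle"):
    """Per-hemisphere tile area on a unit sphere for a cut at latitude theta (radians).

    The joining segment maps via ERP to a 2*pi x theta rectangle; the pole cap
    maps to a circle (or its enclosing square) whose radius equals the cap's
    latitude span, following the projection described in this document.
    """
    cap = math.pi / 2 - theta                 # latitude span of the pole cap
    join = 2 * math.pi * theta                # ERP rectangle for the joining segment
    pole_area = math.pi * cap**2 if pole == "circle" else (2 * cap)**2
    return join + pole_area

# coarse numeric minimization over the segmentation latitude
for pole in ("circle", "square"):
    best = min((tile_area(math.radians(d / 10), pole), d / 10) for d in range(1, 900))
    print(pole, "-> optimal latitude ~", best[1], "deg")
# circle -> ~32.7 deg ; square -> 45.0 deg (matching the 45 degree default below)
```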
  • Table 2 provides examples of coding levels and the corresponding HEVC-supported resolution, the equivalent equirectangular resolution, and the equivalent resolution displayed for a single eye (FOV 90°) when using different tiles.
  • FIG. 7 depicts segmented sphere projection using a single equatorial tile segment and square poles.
  • A sphere 700 may be segmented into a north pole 702 and a south pole 704 joined by a joining segment 706.
  • The poles 702, 704 are mapped to circles within squares 708, 710 and the joining segment 706 is mapped to a rectangle 712.
  • The pole tiles 708, 710 and the equator tile 712 can be vertically stacked to form a frame for encoding.
  • FIG. 8 depicts segmented sphere projection using multiple equatorial tile segments and square poles. As depicted, a sphere 800 may be segmented into north and south pole segments, mapped to circles within square tiles, and a plurality of equatorial joining segments, each mapped to a respective rectangular tile.
  • The layout of the tiles may be vertically arranged when forming a frame, as shown in FIGs. 7 and 8.
  • The formulas for the SSP are described below.
  • Equation (3) indicates how to map a point (θ′, φ) on a cap into a point (x′, y′) in the corresponding circle. It should be noted that the sign differs between the north and south poles.
  • Equation (4) indicates how to map the equator to the middle rectangle. It uses the same equation as equirectangular projection (ERP) to convert the equator area into the rectangle.
  • Equations (5) and (6) indicate how to map the rest of the segments to rectangles, also using the same ERP equation. A sketch of these mappings is given below.
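  • Equations (3) to (6) themselves are not reproduced in this text, so the following is only a plausible reconstruction from the surrounding description (the pole maps to the circle centre, the cut latitude to the rim, with the sign flipped for the south pole; the joining segments use ERP). Function names and the normalisation are assumptions:

```python
import math

def cap_to_circle(theta_p, phi, theta_seg, south=False):
    """Map a cap point (theta_p, phi) to (x', y') in the unit pole circle.

    theta_p is latitude in radians (pi/2 at the pole), phi is longitude,
    theta_seg is the tile-pole segmentation latitude. The pole maps to the
    centre and the segmentation boundary to the rim; the sign of y' is
    flipped for the south pole, mirroring equation (3)'s sign difference.
    """
    rho = (math.pi / 2 - abs(theta_p)) / (math.pi / 2 - theta_seg)
    sign = -1.0 if south else 1.0
    return rho * math.cos(phi), sign * rho * math.sin(phi)

def join_to_rect(theta_p, phi, theta_lo, theta_hi):
    """ERP-style mapping of a joining segment onto a unit square, per (4)-(6)."""
    x = phi / (2 * math.pi)                           # longitude -> full width
    y = (theta_p - theta_lo) / (theta_hi - theta_lo)  # latitude span -> height
    return x, y
```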
  • FIG. 9 depicts segmented sphere projection using equally sized equatorial tile segments and square poles.
  • The projection depicted in FIG. 9 may be similar to that depicted in FIG. 7; however, rather than mapping the single joining segment to a single rectangle, the projection depicted in FIG. 9 breaks the single rectangle into 4 squares. That is, the sphere 900 is segmented into two poles 902, 904 and a joining segment 906, and mapped to circles on squares 908, 910 and squares 912a-d.
  • The segmented sphere projection (SSP) of FIG. 9 segments the sphere into 3 segments: north pole, equator and south pole. The boundaries of the 3 segments are at 45°N and 45°S.
  • The north and south poles are mapped into 2 circles, and the projection of the equatorial segment is the same as ERP.
  • The diameter of each circle is equal to the height of the equatorial segment, since both the pole segments and the equatorial segment have a 90° latitude span.
  • The segment tiles may be packed together to form a frame for encoding.
  • The packing process attempts to put each region of the SSP segments into one 2D image with the least wasted area.
  • In the first packing type, depicted in FIG. 10, the two circles on squares 1002 are placed vertically on top of the rectangles 1004. The circles are centered horizontally on the center of the equator rectangle, and all the other rectangles are centered vertically on the center of the equator rectangle.
  • In the second type, depicted in FIG. 11, two circles 1102 are placed horizontally on top of the rectangles 1104. The circles are likewise centered horizontally on the center of the equator rectangle, and all the other rectangles are likewise centered vertically on the center of the equator rectangle.
  • In the third type, depicted in FIG. 12, two circles 1202 are placed on the left side and the right side of the equator rectangle 1204. The highest point of each circle is level with the top edge of the equator rectangle, and all the other rectangles are placed so that the bottom edges of all rectangles are at the same height.
  • FIG. 13 depicts stacking of overlapping tile segments for stereoscopic omnidirectional video.
  • In stereoscopic video there are two views.
  • The segmented tiles of each of the views 1302, 1304 are packed side by side.
  • FIG. 13 shows a layout of a 1-tile, 1-pole SSP that supports stereoscopic video.
  • The SSP Video Information box is defined with the fields described below, including an unsigned int(8) geometry_type.
  • is_stereoscopic indicates whether stereoscopic media rendering is used or not.
  • The value of this field is equal to 1 to indicate that the video in the referenced track is divided into two parts to provide different texture data for the left eye and right eye separately, according to the composition type specified by stereoscopic_type.
  • geometry_type indicates the type of geometry for rendering of omnidirectional media. It may be GEOMETRY_ERP indicating an equirectangular projection, GEOMETRY_CMP indicating a cube map projection, or GEOMETRY_SSP indicating a segmented sphere projection.
  • stereoscopic_type indicates the type of composition for the stereoscopic video in the referenced track.
  • ssp_theta_num indicates how many θ values are used. The total number of SSP segments, including the north pole and south pole, is then 2 * ssp_theta_num + 1; the default value is 1.
  • ssp_theta_id indicates the identifier of the theta.
  • ssp_theta contains θ values in degrees, ranging from 0 to 180. The default value is 45.
  • ssp_overlap_width indicates the width, in pixels, of the overlap.
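  • The full box syntax is not reproduced above, so the following is only an illustrative model of the described fields (field order, enum values and widths are assumptions, not the disclosed definition):

```python
from dataclasses import dataclass, field
from typing import List

# Assumed enum values; the text names the constants but not their numbers.
GEOMETRY_ERP, GEOMETRY_CMP, GEOMETRY_SSP = 0, 1, 2

@dataclass
class SspTheta:
    ssp_theta_id: int        # identifier of this theta
    ssp_theta: float = 45.0  # segmentation latitude in degrees (0-180)

@dataclass
class SspVideoInformationBox:
    is_stereoscopic: bool       # 1: track carries left/right eye textures
    stereoscopic_type: int      # composition type for the stereoscopic video
    geometry_type: int          # GEOMETRY_ERP / GEOMETRY_CMP / GEOMETRY_SSP
    ssp_theta_num: int = 1      # number of theta values
    thetas: List[SspTheta] = field(default_factory=list)
    ssp_overlap_width: int = 0  # overlap between adjacent tiles, in pixels

    def num_segments(self) -> int:
        # Total SSP segments including the two poles, as stated above.
        return 2 * self.ssp_theta_num + 1
```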
  • FIG. 14 depicts region of interest (ROI) encoding.
  • An ROI target encoding process 1400 uses ROI information 1406, which may comprise a mask 1408 specifying the ROI portion of the raw video 1402 being encoded.
  • The raw video 1402 is depicted as a video frame 1404 having a person and a tree, with the person being the ROI.
  • The raw video 1402 and the ROI information 1406 may be used by the encoder 1410 to lower the encoding quality of the non-ROI areas of the raw video.
  • The reduced quality of the non-ROI areas allows an optimized bitrate allocation, in order to achieve the highest-quality encoding of the ROI areas at a constant bitrate.
  • The encoder 1410 provides an ROI-optimized video 1412. One possible realisation of the bitrate split is sketched below.
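  • As one way to realise such a bitrate split, a minimal sketch (the offset values and block size are illustrative assumptions; the patent does not specify them) turns an ROI mask into per-block quantisation offsets for an encoder that supports them:

```python
import numpy as np

def qp_offsets_from_roi(mask: np.ndarray, block: int = 16,
                        roi_offset: int = -4, bg_offset: int = 6) -> np.ndarray:
    """Turn a per-pixel ROI mask into a per-block QP offset map.

    A block containing any ROI pixel gets a negative offset (higher quality);
    pure background blocks get a positive offset (lower quality, fewer bits).
    """
    h, w = mask.shape
    bh, bw = h // block, w // block
    # Crop to whole blocks, then view as (rows, block, cols, block).
    blocks = mask[:bh * block, :bw * block].reshape(bh, block, bw, block)
    has_roi = blocks.any(axis=(1, 3))
    return np.where(has_roi, roi_offset, bg_offset).astype(np.int8)
```

The resulting map could then be fed to any encoder exposing per-block quantisation control; how the patent's encoder 1410 consumes the mask is not specified.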
  • FIG. 15 depicts further ROI encoding.
  • The process 1500 is similar to that described above with reference to FIG. 14; however, the process tracks ROIs across the raw video.
  • The raw video 1402 is provided to an ROI analysis and tracking functionality 1506.
  • For ROI tracking, the user may point out objects in the first frame, or any subsequent frame, on which the ROIs are based.
  • The tracking scheme uses an image segmentation algorithm to estimate an ROI corresponding to the selected objects.
  • The image segmentation algorithm is tuned specifically for omnidirectional videos such that it automatically adjusts the area allocation to achieve better efficiency when the resulting ROI is applied to the projected video.
  • An optic flow tracking algorithm is used to generate the ROIs for successive frames based on prior frames.
  • The number of feature points, the fineness of the optic flow vector field and other parameters are chosen by the algorithm to maximize its efficiency for the projection scheme. Users can pause the optic flow tracking algorithm at any point and manually define the ROI for a specific frame with the same image segmentation algorithm.
  • The optic flow tracking algorithm will use the newest manually specified mask as its reference once it is resumed. A sketch of the propagation step follows.
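  • A minimal sketch of the propagation step, assuming OpenCV's pyramidal Lucas-Kanade tracker (the per-region tuning of point counts and flow fineness described above is not shown):

```python
import cv2
import numpy as np

def track_roi(prev_gray, next_gray, roi_mask):
    """Propagate an 8-bit ROI mask from one frame to the next with sparse optic flow.

    Feature points are sampled inside the current ROI, tracked with pyramidal
    Lucas-Kanade, and the mask is shifted by the median motion vector.
    """
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01,
                                  minDistance=7, mask=roi_mask)
    if pts is None:
        return roi_mask
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    good = status.ravel() == 1
    if not good.any():
        return roi_mask
    shift = np.median((nxt[good] - pts[good]).reshape(-1, 2), axis=0)
    m = np.float32([[1, 0, shift[0]], [0, 1, shift[1]]])
    return cv2.warpAffine(roi_mask, m, roi_mask.shape[::-1])
```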
  • FIG. 16 depicts an ROI heat map.
  • The heat map 1600 depicts the most common locations of ROIs, indicated by brightness, with the most common area depicted in white 1602.
  • The heat map 1600 provides information on the most common locations within the pole tiles 1604, 1606 and the equatorial tiles 1608.
  • In the low-frequency regions, the ROI expansion size margin is relatively low and the ROI border is sharp; the segmentation iteration count is high and the number of feature points is small; and the fineness of the optic flow field is low. In the low-frequency regions there is therefore a smaller ROI region with a sharp transition, tuned for static video.
  • In the high-frequency regions, the ROI expansion size margin is relatively high and the ROI border is smooth; the segmentation iteration count is low and the number of feature points is large; and the fineness of the optic flow field is high. In the high-frequency regions there is therefore a larger ROI region with a smooth transition that is motion-sensitive.
  • Two encoder adaptations may be used. The first is adjusting the QP utilization. Regular video encoders treat every block (CU) in the video stream equally; however, in omnidirectional video, and with the information on ROIs, it is possible to tune this parameter to give ROI areas higher quality.
  • The second is resolution utilization. As described above, an omnidirectional video will be cut and reshaped, and some of the tiles may not contain any ROI area. There is therefore no need to keep the same resolution for those tiles, and tiles which do not contain an ROI area can be downscaled to a certain resolution and encoded with tuned QP parameters in order to save bitrate.
  • The resolution 1704, 1708 of the ROI areas will be enhanced.
  • The extra pixel information will be stored in even frames 1706 while the original frames become the odd frames 1702.
  • Abrupt changes of resolution may be uncomfortable for the viewer; as such, the resolution may be adjusted slowly while there is limited motion in the video.
  • FIG. 18 depicts view point encoding of omnidirectional video.
  • The omnidirectional video 1800 is encoded to provide a plurality of different view points 1802, 1804, 1806.
  • Each view point stream is encoded into different time blocks 1808.
  • Streaming can be switched between the different view points at the clip starting time blocks.
  • The encoded time blocks form a 2D caching scheme that allows different time blocks for different view points to be cached.
  • The view points are encoded to include additional streams of I-frames, P-frames and B-frames that allow a smart assembler to quickly recover the decoded stream when switching between the view points.
  • FIG. 19 depicts view point encoding of a view of omnidirectional video for adaptive view point streaming.
  • An original view point stream 1902 is further encoded into additional streams 1904, 1906, 1908 for the different time clips.
  • The additional streams 1904, 1906, 1908 encode different frames into I-frames, P-frames and B-frames.
  • The original stream and the additional streams 1910 are transmitted to allow quick view point switching at any time point.
  • As shown in FIG. 19, after encoding one whole video stream, several additional streams are encoded. All frames after the I-frame in a GOP of the original stream 1902 are encoded as I-frames, forming Stream 0 1904.
  • This view point encoding can shorten the average waiting time during temporal random access. Combined with the spatial division feature of encoding the different view points described above, it is possible to achieve high spatial and temporal random-access ability during omnidirectional video streaming; a sketch of the switching logic follows.
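  • A toy sketch of how such an assembler might pick frames around a switch point (the GOP length, stream names and the P-frame side stream are illustrative assumptions; only the all-I-frame Stream 0 is explicitly described above):

```python
GOP = 8  # frames per GOP in the original stream (illustrative)

def frames_for_switch(t: int):
    """Pick which stream each frame comes from when switching views at frame t.

    The first frame after a switch is taken from the all-I-frame side stream
    (Stream 0), the remainder of the GOP from a P-frame side stream, and
    playback rejoins the original stream at the next GOP boundary.
    """
    plan = [(t, "stream0-I")]                      # instant random access point
    next_gop = ((t // GOP) + 1) * GOP
    plan += [(f, "streamP") for f in range(t + 1, next_gop)]
    plan.append((next_gop, "original"))
    return plan

print(frames_for_switch(11))
# [(11, 'stream0-I'), (12, 'streamP'), ..., (15, 'streamP'), (16, 'original')]
```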
  • FIG. 20 depicts adaptive view point streaming of omnidirectional video.
  • Each view point is encoded with additional streams of data, which provides improved adaptive streaming for omnidirectional video.
  • Previous adaptive streaming techniques encoded a video into a plurality of different quality streams 2002.
  • The different quality streams allowed the streaming of a video to adapt to network conditions.
  • The adaptive streaming of omnidirectional view points allows the adaptive streaming of multiple view points.
  • The additional streams allow quick switching between view points.
  • FIG. 21 depicts a system for encoding and streaming omnidirectional video.
  • The system is depicted as a server 2100 for processing omnidirectional video that may be provided to the server 2100 from a video system 2102 that captures 360° video.
  • The server 2100 comprises a processing unit 2104 for executing instructions.
  • An input/output (I/O) interface 2106 allows additional components to be operatively coupled to the processing unit 2104.
  • The server 2100 further comprises a non-volatile (NV) memory 2108 for storing data and instructions, and a memory unit 2110, such as RAM, for storing instructions for execution by the processing unit 2104.
  • The instructions stored in the memory unit 2110, when executed by the processing unit 2104, configure the server 2100 to provide an omnidirectional video encoding functionality 2112 in accordance with the functionality described above.
  • The encoding functionality 2112 comprises functionality for segmenting and mapping 2114 spherical omnidirectional video data to a number of pole and equatorial joining segments.
  • Tile stacking functionality 2116 arranges the segments into a single frame for subsequent encoding.
  • The functionality further comprises ROI tracking functionality 2118 that tracks ROIs across frames of the omnidirectional video. The stacked images and ROI information are then used by encoding functionality 2120 to encode the video data.
  • FIGs. 22A, 22B and 22C depict devices for capturing panoramic and/or omnidirectional video.
  • The devices depicted in FIGs. 22A, 22B, 22C each comprise fisheye lenses that are mounted over cameras of a mobile device.
  • The fisheye devices may be used with a single mobile phone or with additional mobile phones.
  • FIG. 22A depicts a single phone 2200a that has a front-facing camera and a rear-facing camera.
  • A panoramic video capture device 2202a fits over the phone 2200a and places a first fisheye lens 2204a over the front-facing camera and a second fisheye lens 2206a over the back-facing camera.
  • FIG. 22B depicts a similar panoramic video capture device 2202b; however, rather than placing fisheye lenses over the front and back cameras of a single device, the device 2202b is designed for holding two mobile devices 2200b-1, 2200b-2 back-to-back and places the fisheye lenses 2204b, 2206b over the front-facing cameras of the mobile devices.
  • FIG. 22C depicts a further device 2202c that is designed for holding three mobile devices 2200c-1, 2200c-2, 2200c-3.
  • The device 2202c holds the three mobile devices and arranges fisheye lenses 2204c, 2206c, 2208c over the front-facing cameras of the devices.
  • The devices 2202a, 2202b, 2202c allow panoramic video to be captured using common mobile devices.
  • Each of the fisheye lenses may provide a 180° field of view.
  • Two or more fisheye video streams are captured simultaneously.
  • One of the capture devices acts as the master capture device and may make connections with the other devices to receive their video streams, stitch the videos together and output the panoramic video.
  • Alternatively, the two video streams captured by the front and back cameras of a single device can be stitched together by the mobile device, which then streams out the resulting video.
  • Stitching can also be done in a player, which is suitable for low-power capture devices. In that case, all capture devices stream video directly to the player.
  • The devices depicted in FIGs. 22A, 22B, 22C may be used to stream panoramic video in a video chat system.
  • The video streaming process, both between capture devices and from capture device to player, may use the Real-time Transport Protocol to transfer real-time video, together with a session protocol for stream setup.
  • Timestamps for each frame may be added to the stream for synchronization, as sketched below.
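  • One simple way to use those timestamps, as an illustrative sketch (the tolerance value and the (timestamp, frame) layout are assumptions):

```python
def synchronize(streams, tolerance_ms=20):
    """Group frames from several capture devices by nearest timestamp.

    Each stream is a list of (timestamp_ms, frame) pairs sorted by time. The
    first stream acts as the reference; for every reference frame, the closest
    frame within tolerance_ms is taken from each other stream.
    """
    ref, others = streams[0], streams[1:]
    for ts, frame in ref:
        group = [frame]
        for s in others:
            best = min(s, key=lambda p: abs(p[0] - ts))
            if abs(best[0] - ts) <= tolerance_ms:
                group.append(best[1])
        if len(group) == len(streams):   # only yield fully matched groups
            yield ts, group
```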
  • The stitching process may be performed as follows:
  • Generate template: static photos are captured for every camera and key points are extracted using algorithms like SIFT. After matching the key points from the different cameras, each camera's parameters and rotation can be generated. More details about stitching are described below.
  • Remap: each frame from the different cameras can be remapped onto a sphere using the stitching template.
  • Blend: a linear or multi-band blending algorithm is used to blend the remapped frames from the different cameras to produce a 360 degree panorama frame, which is usually projected into a rectangular image as described above.
  • Generating the stitching template is illustrated in FIGs. 23A, 23B, 23C.
  • Using key points directly extracted from fisheye images to perform matching may produce bad results, as depicted in FIG. 23A.
  • The mismatch between the fisheye videos 2302a, 2302b may result from the distortion effects of the fisheye lens, which make objects far from the image center hard to recognize by algorithms like SIFT, and from the fact that most parts of the images do not overlap.
  • Generating the stitching template therefore uses predefined approximate camera parameters to remap the fisheye images to flattened images 2304a, 2304b before extracting key points, as depicted in FIG. 23B. Based on these parameters, key points in certain areas 2306a, 2306b can be ignored safely. After a correct match is found in the remapped images 2306a, 2306b, those key points are then un-mapped into the original fisheye images 2308a, 2308b to compute the final camera parameters and rotation, providing a proper matching between the captured videos. A condensed sketch of this pipeline follows.
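  • A condensed sketch of that flatten-match pipeline, assuming OpenCV; the remap tables map_a/map_b stand in for the predefined approximate camera parameters, and the final un-mapping step is only indicated in a comment:

```python
import cv2
import numpy as np

def stitching_template(fisheye_a, fisheye_b, map_a, map_b):
    """Match key points on flattened images rather than raw fisheye frames.

    map_a/map_b are (map1, map2) remap tables built from approximate camera
    parameters; matching on the flattened images avoids the fisheye
    distortion that defeats SIFT near the image borders.
    """
    flat_a = cv2.remap(fisheye_a, map_a[0], map_a[1], cv2.INTER_LINEAR)
    flat_b = cv2.remap(fisheye_b, map_b[0], map_b[1], cv2.INTER_LINEAR)

    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(flat_a, None)
    kp_b, des_b = sift.detectAndCompute(flat_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_a, des_b, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # ratio test

    # The matched points would then be un-mapped into the original fisheye
    # coordinates (via the inverse of map_a/map_b) to solve for the final
    # camera parameters and rotation; that inverse step is omitted here.
    src = np.float32([kp_a[m.queryIdx].pt for m in good])
    dst = np.float32([kp_b[m.trainIdx].pt for m in good])
    return src, dst
```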
  • Another important step in generating the template is calculating a brightness map, as depicted in FIGs. 24A and 24B.
  • As a fisheye lens is used, the brightness of each pixel varies greatly near the border, as depicted in FIG. 24A.
  • The brightness map, which provides brightness values for each pixel in an image, can be calculated as depicted in FIG. 24B and used later to correct image brightness before blending; one possible calculation is sketched below.
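  • The text does not spell out the calculation; a radial estimate from a flat-field capture is one plausible sketch (the averaging-by-radius approach and the helper name are assumptions):

```python
import numpy as np

def brightness_map(flat_field: np.ndarray) -> np.ndarray:
    """Estimate a per-pixel brightness (vignetting) map from a flat-field shot.

    flat_field is a grayscale image of a uniformly lit scene. Averaging the
    intensity at each integer radius from the image centre captures the
    radial falloff typical of a fisheye lens; normalising by the centre
    value yields a gain map usable to correct frames before blending.
    """
    h, w = flat_field.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(xx - w / 2, yy - h / 2).astype(int)
    radial = np.bincount(r.ravel(), weights=flat_field.ravel().astype(float))
    counts = np.bincount(r.ravel())
    profile = radial / np.maximum(counts, 1)  # mean brightness per radius
    gain = profile / profile[0]               # normalise to the centre
    return gain[r]                            # expand back to a full image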
  • The stitching process may also involve audio reconstruction. Audio streams captured by several devices at different positions can be combined to reconstruct audio for the panoramic video.
  • The player (usually a smart device or headset) receives the stitched panorama video (or multiple original video streams, in which case it does the stitching) and displays it.
  • A user can look at different angles using rotation sensors in the player, such as gyroscopes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Systems, devices and methods of capturing, encoding and streaming 360° video are provided. The devices allow a fisheye lens to be placed over the cameras of mobile devices, allowing two or more cameras to capture complete 360° video coverage. Omnidirectional video may be segmented into a plurality of poles and one or more equatorial tiles. The segmented tiles may be stacked together as a frame to be encoded. Multiple view points may be encoded in order to provide view-point adaptive streaming.
PCT/US2017/014588 2016-01-22 2017-01-23 Encoding and streaming of omnidirectional video WO2017127816A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201780007828.1A CN109121466B (zh) 2016-01-22 2017-01-23 Omnidirectional video encoding and streaming

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201662286252P 2016-01-22 2016-01-22
US62/286,252 2016-01-22
US201662286516P 2016-01-25 2016-01-25
US62/286,516 2016-01-25

Publications (1)

Publication Number Publication Date
WO2017127816A1 (fr) 2017-07-27

Family

ID=59362365

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/014588 WO2017127816A1 (fr) 2016-01-22 2017-01-23 Encoding and streaming of omnidirectional video

Country Status (2)

Country Link
CN (1) CN109121466B (fr)
WO (1) WO2017127816A1 (fr)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018002425A3 (fr) * 2016-06-30 2018-02-08 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
US20180349705A1 (en) * 2017-06-02 2018-12-06 Apple Inc. Object Tracking in Multi-View Video
WO2019038433A1 (fr) * 2017-08-24 2019-02-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Signalling of characteristics for omnidirectional content
WO2019120638A1 (fr) * 2017-12-22 2019-06-27 Huawei Technologies Co., Ltd. Scalable FOV+ for delivery of 360° virtual reality (VR) video to remote end users
EP3518087A1 (fr) * 2018-01-29 2019-07-31 Thomson Licensing Method and network equipment for tiling a sphere representing spherical multimedia content
EP3531703A1 (fr) * 2018-02-26 2019-08-28 Thomson Licensing Method and network equipment for encoding an immersive video spatially tiled with a set of tiles
WO2019190197A1 (fr) * 2018-03-27 2019-10-03 주식회사 케이티 Method and apparatus for processing a video signal
US10484621B2 (en) * 2016-02-29 2019-11-19 Gopro, Inc. Systems and methods for compressing video content
EP3618442A1 (fr) * 2018-08-27 2020-03-04 Axis AB An image capturing device, a method and a computer program product for forming an encoded image
US10666863B2 (en) 2018-05-25 2020-05-26 Microsoft Technology Licensing, Llc Adaptive panoramic video streaming using overlapping partitioned sections
US10754242B2 (en) 2017-06-30 2020-08-25 Apple Inc. Adaptive resolution and projection format in multi-direction video
US10764494B2 (en) 2018-05-25 2020-09-01 Microsoft Technology Licensing, Llc Adaptive panoramic video streaming using composite pictures
WO2020235034A1 (fr) * 2019-05-22 2020-11-26 日本電信電話株式会社 Video distribution device, video distribution method, and program
US10924747B2 (en) 2017-02-27 2021-02-16 Apple Inc. Video coding techniques for multi-view video
US10999602B2 (en) 2016-12-23 2021-05-04 Apple Inc. Sphere projected motion estimation/compensation and mode decision
US11259046B2 (en) 2017-02-15 2022-02-22 Apple Inc. Processing of equirectangular object data to compensate for distortion by spherical projections

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060034067A1 (en) * 2004-08-11 2006-02-16 Weiliang Kui Illumination pen
US20140003523A1 (en) * 2012-06-30 2014-01-02 Divx, Llc Systems and methods for encoding video using higher rate video sequences
US20140132598A1 (en) * 2007-01-04 2014-05-15 Hajime Narukawa Method of mapping image information from one face onto another continous face of different geometry
WO2014162324A1 (fr) * 2013-04-04 2014-10-09 Virtualmind Di Davide Angelelli Spherical omnidirectional system for shooting video

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003141562A (ja) * 2001-10-29 2003-05-16 Sony Corp Image processing apparatus and image processing method for non-planar images, storage medium, and computer program
US7011625B1 (en) * 2003-06-13 2006-03-14 Albert Shar Method and system for accurate visualization and measurement of endoscopic images
CN103247020A (zh) * 2012-02-03 2013-08-14 苏州科泽数字技术有限公司 Fisheye image unfolding method based on radial features

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060034067A1 (en) * 2004-08-11 2006-02-16 Weiliang Kui Illumination pen
US20140132598A1 (en) * 2007-01-04 2014-05-15 Hajime Narukawa Method of mapping image information from one face onto another continous face of different geometry
US20140003523A1 (en) * 2012-06-30 2014-01-02 Divx, Llc Systems and methods for encoding video using higher rate video sequences
WO2014162324A1 (fr) * 2013-04-04 2014-10-09 Virtualmind Di Davide Angelelli Spherical omnidirectional system for shooting video

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10484621B2 (en) * 2016-02-29 2019-11-19 Gopro, Inc. Systems and methods for compressing video content
US10979727B2 (en) 2016-06-30 2021-04-13 Nokia Technologies Oy Apparatus, a method and a computer program for video coding and decoding
US20190297339A1 (en) * 2016-06-30 2019-09-26 Nokia Technologies Oy An Apparatus, A Method and A Computer Program for Video Coding and Decoding
WO2018002425A3 (fr) * 2016-06-30 2018-02-08 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
US11818394B2 (en) 2016-12-23 2023-11-14 Apple Inc. Sphere projected motion estimation/compensation and mode decision
US10999602B2 (en) 2016-12-23 2021-05-04 Apple Inc. Sphere projected motion estimation/compensation and mode decision
US11259046B2 (en) 2017-02-15 2022-02-22 Apple Inc. Processing of equirectangular object data to compensate for distortion by spherical projections
US10924747B2 (en) 2017-02-27 2021-02-16 Apple Inc. Video coding techniques for multi-view video
US20180349705A1 (en) * 2017-06-02 2018-12-06 Apple Inc. Object Tracking in Multi-View Video
US11093752B2 (en) * 2017-06-02 2021-08-17 Apple Inc. Object tracking in multi-view video
US10754242B2 (en) 2017-06-30 2020-08-25 Apple Inc. Adaptive resolution and projection format in multi-direction video
WO2019038433A1 (fr) * 2017-08-24 2019-02-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Signalling of characteristics for omnidirectional content
WO2019120638A1 (fr) * 2017-12-22 2019-06-27 Huawei Technologies Co., Ltd. Scalable FOV+ for delivery of 360° virtual reality (VR) video to remote end users
US11706274B2 (en) 2017-12-22 2023-07-18 Huawei Technologies Co., Ltd. Scalable FOV+ for VR 360 video delivery to remote end users
US11546397B2 (en) 2017-12-22 2023-01-03 Huawei Technologies Co., Ltd. VR 360 video for remote end users
CN111567052B (zh) * 2022-01-14 Scalable FOV+ for delivering VR 360 video to remote end users
CN111567052A (zh) 2020-08-21 Scalable FOV+ for delivering VR 360 video to remote end users
EP3518087A1 (fr) * 2018-01-29 2019-07-31 Thomson Licensing Method and network equipment for tiling a sphere representing spherical multimedia content
CN112088352A (zh) 2020-12-15 Method and network device for tiling a sphere representing spherical multimedia content
WO2019145296A1 (fr) * 2018-01-29 2019-08-01 Interdigital Ce Patent Holdings Method and network equipment for tiling a sphere representing spherical multimedia content
EP3531704A1 (fr) * 2018-02-26 2019-08-28 InterDigital CE Patent Holdings Method and network equipment for encoding an immersive video spatially tiled with a set of tiles
US11076162B2 (en) 2018-02-26 2021-07-27 Interdigital Ce Patent Holdings Method and network equipment for encoding an immersive video spatially tiled with a set of tiles
EP3531703A1 (fr) * 2018-02-26 2019-08-28 Thomson Licensing Method and network equipment for encoding an immersive video spatially tiled with a set of tiles
WO2019190197A1 (fr) * 2018-03-27 2019-10-03 주식회사 케이티 Method and apparatus for processing a video signal
US10764494B2 (en) 2018-05-25 2020-09-01 Microsoft Technology Licensing, Llc Adaptive panoramic video streaming using composite pictures
US10666863B2 (en) 2018-05-25 2020-05-26 Microsoft Technology Licensing, Llc Adaptive panoramic video streaming using overlapping partitioned sections
US10972659B2 (en) 2018-08-27 2021-04-06 Axis Ab Image capturing device, a method and a computer program product for forming an encoded image
EP3618442A1 (fr) * 2018-08-27 2020-03-04 An image capturing device, a method and a computer program product for forming an encoded image
KR102172276B1 (ko) 2020-10-30 Image capture device, method and computer program product for forming an encoded image
TWI716960B (zh) 2021-01-21 Image capturing device, method of forming an encoded image, and computer program product
KR20200024095A (ko) 2020-03-06 Image capture device, method and computer program product for forming an encoded image
JP7259947B2 (ja) 2023-04-18 Video distribution device, video distribution method and program
WO2020235034A1 (fr) * 2019-05-22 2020-11-26 日本電信電話株式会社 Video distribution device, video distribution method, and program
JPWO2020235034A1 (fr) * 2019-05-22 2020-11-26

Also Published As

Publication number Publication date
CN109121466B (zh) 2022-09-02
CN109121466A (zh) 2019-01-01

Similar Documents

Publication Publication Date Title
CN109121466B (zh) 2022-09-02 Omnidirectional video encoding and streaming
US11228749B2 (en) Systems, methods and apparatus for compressing video content
US10652558B2 (en) Apparatus and methods for video compression using multi-resolution scalable coding
US11166047B2 (en) Apparatus and methods for video compression
KR102191875B1 (ko) Method for transmitting 360 video, method for receiving 360 video, 360 video transmission device, and 360 video reception device
US20190373245A1 (en) 360 video transmission method, 360 video reception method, 360 video transmission device, and 360 video reception device
US20180310010A1 (en) Method and apparatus for delivery of streamed panoramic images
WO2018175493A1 (fr) Projection adaptative de carte cubique perturbée
CN115150618A (zh) Decoding device and system
WO2018132317A1 (fr) Réglage de champ de vision de projection de pyramide carrée tronquée pour une vidéo à 360 degrés
KR20190095430A (ko) 360 video processing method and apparatus therefor
EP3434021B1 (fr) Method, apparatus and stream for formatting an immersive video for legacy and immersive rendering devices
US11948268B2 (en) Immersive video bitstream processing
Hu et al. Mobile edge assisted live streaming system for omnidirectional video
WO2019034803A1 (fr) Procédé et appareil de traitement d'informations vidéo
WO2017220851A1 (fr) Procédé de compression d'images et équipement technique pour ce procédé

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17742114

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: Public notification in the EP bulletin as the address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.12.2018)

122 EP: PCT application non-entry into the European phase

Ref document number: 17742114

Country of ref document: EP

Kind code of ref document: A1