WO2019034803A1 - Method and apparatus for processing video information - Google Patents

Method and apparatus for processing video information

Info

Publication number
WO2019034803A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
roi
regions
viewport
content
Prior art date
Application number
PCT/FI2018/050433
Other languages
English (en)
Inventor
Mika Pesonen
Ari Hourunranta
Esin GULDOGAN
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of WO2019034803A1

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/127Prioritisation of hardware or computational resources
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/37Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability with arrangements for assigning different transmission priorities to video input data or to video coded data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234345Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g 3D video

Definitions

  • The present invention relates to a method for processing video information, an apparatus for processing video information, and a computer program for processing video information.
  • A recent trend in streaming, aimed at reducing the streaming bitrate of 360-degree video or virtual reality (VR) video bitstreams, is viewport-adaptive streaming (a.k.a. viewport-dependent delivery), wherein a subset of the 360-degree video content covering the primary viewport (i.e., the current view orientation) is transmitted at the best quality/resolution, while the remainder of the 360-degree video is transmitted at a lower quality/resolution.
  • When the viewing orientation changes, e.g. when the user turns his/her head while viewing the content with a head-mounted display, another version of the content needs to be streamed, matching the new viewing orientation.
  • The motion-to-photon latency, i.e. the time it takes to view the new content, consists of two parts: the time to download the new content, and the time to tune in to the new content in the playback apparatus.
  • The user may constantly change viewing direction, for example turn the viewing direction by 180 degrees and immediately turn back 180 degrees, or repeat any similar change of viewing pattern. This causes downloading, and possibly cancelling of downloading, and decoding of content that is never displayed to the user, since the user is already looking in a different viewing direction by the time the downloaded content would be ready to be displayed after the expiry of the motion-to-photon latency.
  • a method comprising: providing a user with a viewport to 360 degree video content, wherein the 360 degree video content is provided in a base layer bitstream and at least one enhancement layer bitstream and frames of the 360 degree video content are divided into sub-regions assigned with a region-of-interest (ROI) value; in response to the viewport comprising at least one sub-region with a high ROI value indicating important content, starting downloading and/or decoding of said at least one sub-region from the enhancement layer bitstream; and in response to the viewport comprising one or more sub-regions with a low ROI value indicating less important content, starting downloading and/or decoding of said one or more sub-regions from the base layer bitstream and delaying downloading and/or decoding of said one or more sub-regions from the enhancement layer bitstream for a predetermined delay.
  • ROI region-of-interest
  • the method further comprises starting downloading and/or decoding of said one or more sub-regions with the low ROI value from the enhancement layer bitstream after expiry of the predetermined delay.
  • a ROI map comprising one ROI value from a plurality of ROI values assigned to each of the sub-regions is applied to the content.
  • the method further comprises indicating ROI areas comprising the important content within the frame by a geometrical shape, and assigning each sub-region intersecting said geometrical shape the same ROI value as the geometrical shape has been assigned.
  • the method further comprises recalculating, for the one or more sub-regions with a low ROI value indicating less important content, said predetermined delay to be linearly changing for each of said sub-regions based on a distance between two or more ROI areas.
  • the ROI value assigned to a sub-region is frame-specific, segment-of-bitstream-specific or bitstream-specific.
  • the sub-region is a tile of a frame.
  • the viewport is provided to the user in response to the user changing a view in a head-mounted display (HMD) apparatus.
  • HMD head-mounted display
  • Figure 1a shows an example of a multi-camera system as a simplified block diagram, in accordance with an embodiment
  • Figure 1b shows a perspective view of a multi-camera system, in accordance with an embodiment
  • Figure 2 shows an example of a video playback apparatus as a simplified block diagram, in accordance with an embodiment
  • Figure 3 illustrates the basic pipeline for delivering 360 degree video content
  • Figure 4 shows a flowchart of a method for providing 360-degree video content to a user in accordance with an embodiment
  • Figures 5a-5d illustrate an example of assigning ROI values to content and sub-regions appearing on a frame in accordance with an embodiment
  • Figure 6 shows a flowchart of a method for decision-making about downloading a tile in accordance with an embodiment
  • Figure 7 shows a schematic block diagram of an exemplary apparatus or electronic device, in accordance with an embodiment
  • Figure 8 shows an apparatus according to an example embodiment
  • Figure 9 shows an example of an arrangement for wireless communication comprising a plurality of apparatuses, networks and network elements.
  • FIG. 1a illustrates an example of a multi-camera system 100, which may be able to capture and produce 360 degree stereo panorama video.
  • the multi-camera system 100 comprises two or more camera units 102.
  • the number of camera units 102 is eight, but may also be less than eight or more than eight.
  • Each camera unit 102 is located at a different location in the multi- camera system and may have a different orientation with respect to other camera units 102 so that they may capture a part of the 360 degree scene from different viewpoints substantially simultaneously.
  • a pair of camera units 102 of the multi-camera system 100 may correspond with left and right eye viewpoints at a time.
  • the camera units 102 may have an omnidirectional constellation so that the multi-camera system 100 has a 360° viewing angle in 3D space.
  • such multi-camera system 100 may be able to see each direction of a scene so that each spot of the scene around the multi-camera system 100 can be viewed by at least one camera unit 102 or a pair of camera units 102.
  • any two camera units 102 of the multi-camera system 100 may be regarded as a pair of camera units 102.
  • a multi-camera system of two cameras has only one pair of camera units
  • a multi-camera system of three cameras has three pairs of camera units
  • a multi-camera system of four cameras has six pairs of camera units, etc.
  • a multi-camera system 100 comprising N camera units 102, where N is an integer greater than one, has N(N-1)/2 pairs of camera units 102. For example, the eight-camera system mentioned above has 8·7/2 = 28 such pairs. Accordingly, images captured by the camera units 102 at a certain time may be considered as N(N-1)/2 pairs of captured images.
  • the multi-camera system 100 of Figure 1a may also comprise a processor 104 for controlling the operations of the multi-camera system 100.
  • the camera units 102 may be connected, for example, via a camera interface 103 to the processor 104. There may also be a memory 106 for storing data and computer code to be executed by the processor 104, and a transceiver 108 for communicating with, for example, a communication network and/or other devices in a wireless and/or wired manner.
  • the user device 100 may further comprise a user interface (UI) 110 for displaying information to the user, for generating audible signals and/or for receiving user input.
  • UI user interface
  • the multi-camera system 100 does not need to comprise each feature mentioned above, or may comprise other features as well. For example, there may be electric and/or mechanical elements for adjusting and/or controlling optics of the camera units 102 (not shown).
  • Figure la also illustrates some operational elements which may be implemented, for example, as a computer code in the software of the processor, in hardware, or both.
  • An optical flow estimation element 114 may perform optical flow estimation on pairs of images from different camera units 102.
  • the transform vectors or other information indicative of the amount of interpolation/extrapolation to be applied to different parts of a viewport may have been stored into a memory or they may be calculated e.g. as a function of the location of a pixel in question. It should be noted here that the transform vector does not need to be defined for each pixel of a viewport but may be defined, for example, so that for pixels on the same horizontal location but in different vertical location the same interpolation/extrapolation factor may be used.
  • the multi-camera system 100 may also comprise intrinsic parameters 120 and extrinsic parameters 122 for camera units 102.
  • the parameters may be stored, for example, in the memory 106. It should be noted that there may also be other operational elements in the multi-camera system 100 than those depicted in Figure 1a.
  • Figure 1b shows as a perspective view an example of an apparatus comprising the multi-camera system 100.
  • the multi-camera system 100 may comprise even more camera units which are not visible from this perspective.
  • Figure 1b also shows two microphones 112a, 112b, but the apparatus may also comprise one or more than two microphones.
  • the multi-camera system 100 may be controlled by another device (not shown), wherein the multi-camera system 100 and the other device may communicate with each other and a user may use a user interface of the other device for entering commands, parameters, etc. and the user may be provided information from the multi-camera system 100 via the user interface of the other device.
  • FIG. 2 shows an example of a video playback apparatus 200 as a simplified block diagram, in accordance with an embodiment.
  • a non-limiting example of video playback apparatus 200 includes an immersive display unit.
  • An example of the immersive display unit includes, but is not limited to a head mounted display (HMD).
  • the video playback apparatus 200 may comprise, for example, one or two displays 202 for video playback. When two displays are used a first display 202a may display images for a left eye and a second display 202b may display images for a right eye, in accordance with an embodiment. In case of only one display 202, that display 202 may be used to display images for the left eye on the left side of the display 202 and to display images for the right eye on the right side of the display 202.
  • the video playback apparatus 200 may be provided with encoded data streams via a communication interface 204 and a processor 206 may perform control operations for the video playback apparatus 200 and may also perform operations to reconstruct video streams for displaying on the basis of received encoded data streams.
  • the video playback apparatus 200 may further comprise a processor 206 for reconstructing video streams for displaying on the basis of received encoded data streams.
  • the video playback apparatus 200 may further comprise a direction detector 216 to detect the direction in which the video playback apparatus 200 is pointed. This direction may then be used to determine the part of a scene the user of the video playback apparatus 200 is looking at.
  • the direction detector 216 may comprise, for example, an electronic compass, a gyroscope and/or an accelerometer.
  • the video playback device 200 may comprise a warping element 214 which may perform image warping on the basis of optical enhancement information received e.g. from an encoding device, from a file or from another source, and transform vectors. It should be noted that the video playback device 200 does not need to comprise each of the above elements or may also comprise other elements.
  • the decoding element 208 may be a separate device wherein that device may perform decoding operations and provide decoded data stream to the video playback device 200 for further processing and displaying decoded video streams.
  • An application on an HMD renders a portion of the 360-degree video. This portion may be defined as a viewport.
  • a viewport is a window on the 360-degree world represented in the omnidirectional video displayed via a rendering display.
  • a viewport is characterized by horizontal and vertical FoVs (VHFoV, VVFoV).
  • a viewport size may correspond to the HMD FoV or may have a smaller size, depending on the application.
  • the part of the 360-degree space viewed by a user at any given point of time may be defined as the primary viewport.
  • Dynamic adaptive streaming over HTTP (DASH) has turned out to be a promising protocol for multimedia streaming applications, especially for 360-degree video or virtual reality (VR) video bitstreams.
  • a recent trend in streaming in order to reduce the streaming bitrate of VR video is viewport-adaptive streaming (a.k.a. viewport-dependent delivery), wherein a subset of 360-degree video content covering the primary viewport (i.e., the current view orientation) is transmitted at the best quality/resolution, while the remainder of the 360-degree video is transmitted at a lower quality/resolution.
  • An approach of viewport-adaptive streaming is viewport-specific encoding and streaming, a.k.a. viewport-dependent encoding and streaming, a.k.a. asymmetric projection, a.k.a. packed VR video, where 360-degree image content is packed into the same frame with an emphasis on the primary viewport, and the packed VR frames are encoded into a single bitstream.
  • a picture can be partitioned into tiles, which are rectangular and contain an integer number of largest coding units (LCUs).
  • LCUs largest coding units
  • the partitioning into tiles forms a regular grid, where the heights and widths of tiles differ from each other by at most one LCU, as sketched below.
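  • As an illustrative sketch of such a regular tile grid (Python; the 64-sample LCU size and the example picture dimensions are assumptions, not taken from the application), the following splits a picture into tiles whose widths and heights differ by at most one LCU:

        def tile_grid(pic_width, pic_height, tile_cols, tile_rows, lcu=64):
            # Picture size in LCUs, rounded up (the last tiles may cover padding).
            lcus_x = -(-pic_width // lcu)
            lcus_y = -(-pic_height // lcu)
            # Spread LCU columns/rows evenly so tile sizes differ by at most one LCU.
            col_bounds = [i * lcus_x // tile_cols for i in range(tile_cols + 1)]
            row_bounds = [j * lcus_y // tile_rows for j in range(tile_rows + 1)]
            tiles = []
            for j in range(tile_rows):
                for i in range(tile_cols):
                    tiles.append((col_bounds[i] * lcu,                          # x
                                  row_bounds[j] * lcu,                          # y
                                  (col_bounds[i + 1] - col_bounds[i]) * lcu,    # width
                                  (row_bounds[j + 1] - row_bounds[j]) * lcu))   # height
            return tiles

        # Example: a 3840x1920 equirectangular picture split into a 4x2 tile grid.
        print(tile_grid(3840, 1920, 4, 2))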
  • An approach of tile-based encoding and streaming, which may be referred to as tile rectangle based encoding and streaming, may be used with any video codec.
  • the source content is split into tile rectangle sequences before encoding.
  • Each tile rectangle sequence covers a subset of the spatial area of the source content, such as full panorama content, which may e.g. be of equirectangular projection format.
  • Each tile rectangle sequence is then encoded independently from each other as a single-layer bitstream.
  • Several bitstreams may be encoded from the same tile rectangle sequence, e.g. for different bitrates.
  • a first bitstream may be encoded as a base layer (lower quality/resolution) bitstream of the tile rectangle sequence and a second bitstream may be encoded as an enhancement layer (higher quality/resolution) bitstream of said tile rectangle sequence.
  • Each tile rectangle bitstream may be encapsulated in a file as its own track (or alike) and made available for streaming.
  • the tracks to be streamed may be selected based on the viewing orientation.
  • the client may receive tracks covering the entire omnidirectional content. Better quality or higher resolution tracks may be received for the current viewport compared to the quality or resolution covering the remaining, currently non-visible viewports.
  • each track may be decoded with a separate decoder instance.
  • The aim is to reduce the bitrate e.g. such that the primary viewport (i.e., the current viewing orientation) is transmitted at the best quality/resolution, while the remainder of the 360-degree video is transmitted at a lower quality/resolution.
  • When the viewing orientation changes, e.g. when the user turns his/her head while viewing the content with a head-mounted display, another version of the content needs to be streamed, matching the new viewing orientation.
  • Figure 3 illustrates the basic pipeline for delivering 360 degree video content from a camera unit to a display of the playback apparatus.
  • 360-degree image or video content can be captured by a set of cameras or a camera device with multiple lenses and sensors, for example using an apparatus shown in Figures la and lb.
  • the acquisition results in a set of digital image/video signals.
  • the cameras/lenses typically cover all directions around the center point of the camera set or camera device.
  • the images of the same time instance may be stitched, projected, and mapped onto a packed virtual reality (VR) frame.
  • Input images may be stitched and projected onto a three-dimensional projection structure, such as a sphere or a cube.
  • VR virtual reality
  • the projection structure may be considered to comprise one or more surfaces, such as plane(s) or part(s) thereof.
  • a projection structure may be defined as a three-dimensional structure consisting of one or more surfaces on which the captured VR image/video content is projected, and from which a respective projected frame can be formed.
  • the image data on the projection structure is further arranged onto a two-dimensional projected frame.
  • projection may be defined as a process by which a set of input images are projected onto a projected frame.
  • Region-wise mapping may be applied to map the projected frame onto one or more packed VR frames.
  • region-wise mapping may be understood to be equivalent to extracting two or more regions from the projected frame, optionally applying a geometric transformation (such as rotating, mirroring, and/or resampling) to the regions, and placing the transformed regions in spatially non-overlapping areas, a.k.a. constituent frame partitions, within the packed VR frame. If the region-wise mapping is not applied, the packed VR frame is identical to the projected frame. Otherwise, regions of the projected frame are mapped onto a packed VR frame by indicating the location, shape, and size of each region in the packed VR frame.
  • the term mapping may be defined as a process by which a projected frame is mapped to a packed VR frame.
  • packed VR frame may be defined as a frame that results from a mapping of a projected frame. In practice, the input images may be converted to a packed VR frame in one process without intermediate steps.
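  • A minimal sketch of such region-wise mapping is given below (Python with numpy; the (x, y, width, height) rectangle convention and nearest-neighbour resampling are assumptions chosen for illustration):

        import numpy as np

        def map_region(projected, packed, src, dst):
            # Copy one region of the projected frame into the packed VR frame,
            # resampling (nearest neighbour) if source and destination sizes differ.
            sx, sy, sw, sh = src
            dx, dy, dw, dh = dst
            region = projected[sy:sy + sh, sx:sx + sw]
            rows = np.arange(dh) * sh // dh
            cols = np.arange(dw) * sw // dw
            packed[dy:dy + dh, dx:dx + dw] = region[rows][:, cols]

        # Example: keep a front region at full resolution and pack a back region
        # at half resolution into a smaller packed VR frame.
        projected = np.zeros((1920, 3840, 3), dtype=np.uint8)
        packed = np.zeros((960, 2400, 3), dtype=np.uint8)
        map_region(projected, packed, src=(960, 480, 1920, 960), dst=(0, 0, 1920, 960))
        map_region(projected, packed, src=(2880, 480, 960, 960), dst=(1920, 0, 480, 480))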
  • 360-degree panoramic content, i.e. images and video, covers horizontally the full 360-degree field-of-view around the capturing position. The vertical field-of-view may vary and can be e.g. 180 degrees.
  • Panoramic image covering 360-degree field-of-view horizontally and 180-degree field-of-view vertically can be represented by a sphere that can be mapped to a bounding cylinder that can be cut vertically to form a 2D picture (this type of projection is known as equirectangular projection).
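  • For illustration, the spherical-to-2D relationship of the equirectangular projection can be written as a small helper mapping a viewing direction to panorama pixel coordinates (a sketch; the yaw/pitch conventions are an assumption, not taken from the application):

        import math

        def direction_to_equirect(yaw, pitch, width, height):
            # yaw in [-pi, pi) around the vertical axis, pitch in [-pi/2, pi/2];
            # the 360x180-degree sphere maps onto a width x height picture.
            u = (yaw / (2.0 * math.pi) + 0.5) * width
            v = (0.5 - pitch / math.pi) * height
            return int(u) % width, min(int(v), height - 1)

        # Looking straight ahead lands at the centre of a 3840x1920 panorama.
        print(direction_to_equirect(0.0, 0.0, 3840, 1920))  # (1920, 960)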
  • the resulting video file is delivered to the playback apparatus.
  • the delivery may comprise, for example, transmitting the video file to the playback device directly via a network, transferring the video file to a storage comprising any type of mass memory to store the encoded video bitstream, or streaming the encoded video bitstream e.g. from a streaming server to the playback device.
  • the delivery path may comprise any number of servers, gateways, recording storages and/or network segments.
  • Upon receiving the encoded video bitstream, it may be processed by a decoder, which outputs one or more uncompressed media streams, such as a first video stream representing a left-eye view of a scene and a second video stream representing a right-eye view of the scene. Finally, a renderer may reproduce the uncompressed media streams with a loudspeaker or a display, for example.
  • the receiver, recording storage, decoder, and renderer may reside in the same physical device or they may be included in separate devices.
  • the viewing orientation changes e.g. when the user turns his/her head when viewing the content with a head-mounted display, another version of the content matching the new viewing orientation needs to be streamed.
  • the motion-to-photon latency i.e. the time it takes to view the new content
  • the latency consists of two parts: the time to download new content, and the time to tune-in to the new content in the playback apparatus.
  • the tune-in time is a result of the nature of video compression: if the starting position is not a random access picture, the playback apparatus needs to decode additional pictures before the target picture.
  • a method for providing 360-degree video content to a user is presented hereinafter.
  • a user is provided (400) with a viewport to 360 degree video content, wherein the 360 degree video content is provided in a base layer bitstream and at least one enhancement layer bitstream and frames of the 360 degree video content are divided into sub-regions assigned with a region-of-interest (ROI) value.
  • ROI region-of-interest
  • In response to the viewport comprising at least one sub-region with a high ROI value indicating important content, downloading and/or decoding of said at least one sub-region is started (402) from the enhancement layer bitstream; and in response to the viewport comprising one or more sub-regions with a low ROI value indicating less important content, downloading and/or decoding of said one or more sub-regions is started (404) from the base layer bitstream and downloading and/or decoding of said one or more sub-regions from the enhancement layer bitstream is delayed (406) for a predetermined delay.
  • the above method allows for distinguishing between the presumably important parts of the 360-degree panoramic frame and the non-important parts of the 360-degree panoramic frame, by assigning a ROI value to the sub-regions of the 360-degree panoramic frames.
  • the ROI values may be defined according to the presumed importance of the content, such as an object, appearing in the frame, and the sub-regions of the frame are assigned the ROI value according to ROI values of the content appearing within the sub-region.
  • If the user watches content that is presumably important, i.e. the viewport comprises at least one sub-region with a high ROI value, the downloading and/or decoding of said sub-regions at high quality/resolution is started from the enhancement layer bitstream without any delay, i.e. a delay of 0 ms. It is noted that the content may have been downloaded previously to the playback device, and thereby only decoding of the content may be necessary. However, if the user watches content that is presumably non-important, i.e. the viewport comprises one or more sub-regions that have content with a low ROI value, the downloading and/or decoding of said sub-regions at lower quality/resolution is started from the base layer bitstream.
  • the method further comprises starting downloading and/or decoding of said one or more sub-regions with the low ROI value from the enhancement layer bitstream after expiry of the predetermined delay.
  • the predetermined delay may be, for example, 500 ms.
  • the predetermined delay may also be referred to as fetching latency.
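  • A minimal sketch of this scheduling rule is shown below (Python; the downloader interface and the 500 ms example latency are illustrative assumptions, not a defined API):

        import time

        LOW_ROI_DELAY_S = 0.5   # example fetching latency for less important content

        def schedule_sub_region(sub_region, roi_value, downloader, now=None):
            # downloader.fetch(sub_region, layer, at) is a hypothetical call that
            # starts downloading/decoding the given layer no earlier than time 'at'.
            now = time.monotonic() if now is None else now
            if roi_value == 'high':
                # Important content: fetch the enhancement layer immediately (0 ms delay).
                downloader.fetch(sub_region, layer='enhancement', at=now)
            else:
                # Less important content: base layer now, enhancement layer only
                # after the predetermined fetching latency has expired.
                downloader.fetch(sub_region, layer='base', at=now)
                downloader.fetch(sub_region, layer='enhancement', at=now + LOW_ROI_DELAY_S)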
  • a ROI map comprising one ROI value from a plurality of ROI values assigned to each of the sub-regions is applied to the content.
  • more than 2 levels of ROI values may be provided, such as 3, 4 or 5 levels.
  • the ROI values of the sub-regions may be provided with a ROI map assigning the ROI values to each of the sub-regions.
  • a specific ROI map metadata can be defined and transmitted along with the video bitstream.
  • the method further comprises indicating ROI areas comprising the important content within the frame by a geometrical shape, and assigning each sub-region intersecting said geometrical shape the same ROI value as the geometrical shape has been assigned.
  • the ROI map metadata can be specified according to geometrical shapes, such as rectangles or circles, where each shape defines a ROI area, i.e. an area of the frame where certain fetching latency applies.
  • the location of each shape within the frame may be specified e.g. in equirectangular coordinates. For areas intersecting with presumably important content, a minimum fetching latency may be determined.
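  • A sketch of deriving such a per-sub-region map from rectangular ROI areas is given below (Python; the (x, y, width, height) rectangle convention and the 0 ms / 500 ms latency values mirror the example used later, and are assumptions for illustration):

        def rects_intersect(a, b):
            ax, ay, aw, ah = a
            bx, by, bw, bh = b
            return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

        def build_roi_map(tiles, roi_areas, default_latency_ms=500):
            # Each tile gets the smallest fetching latency of any ROI area it
            # intersects; tiles touching no ROI area keep the default latency.
            roi_map = {}
            for index, tile in enumerate(tiles):
                latency = default_latency_ms
                for area, area_latency in roi_areas.items():
                    if rects_intersect(tile, area):
                        latency = min(latency, area_latency)
                roi_map[index] = latency
            return roi_map

        # Eight equal-width tiles and two "person" rectangles marked as 0 ms ROI areas.
        tiles = [(x, 240, 480, 1440) for x in range(0, 3840, 480)]
        roi_areas = {(600, 700, 300, 500): 0, (2500, 650, 300, 550): 0}
        print(build_roi_map(tiles, roi_areas))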
  • said predetermined delay is recalculated, for the one or more sub-regions with a low ROI value indicating less important content, to be linearly changing for each of said sub-regions based on a distance between two or more ROI areas.
  • smooth transition of the fetching latency between the ROI areas may be enabled by calculating the distances between the ROI areas.
  • a distance-dependent multiplier may be used for applying the linearly changing effect to the areas outside the ROI areas.
  • For sub-regions intersecting with the ROI areas, the fetching latency value, i.e. the ROI value assigned to the ROI area, is used.
  • the sub-regions not intersecting with the ROI areas may have an initial fetching latency value.
  • This initial fetching latency value may be changed by applying the distance-dependent multiplier such that for sub-regions close to the ROI areas, the fetching latency value is changed (increased or decreased) to be closer to the fetching latency value of the ROI area.
  • The farther away a sub-region is located from the ROI areas, the smaller the effect of the distance-dependent multiplier becomes.
  • At a certain distance from the ROI areas, the distance-dependent multiplier obtains the value of 0, and the initial fetching latency value is used for the sub-region.
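  • One way to realise this smooth transition is sketched below (Python; the Euclidean distance metric and the fall-off distance are assumptions made for the example):

        def blended_latency(tile_center, roi_areas, initial_latency_ms=500, falloff_px=800):
            # roi_areas maps (x, y) centre points of ROI areas to their latency values.
            # The distance-dependent multiplier falls linearly from 1 at the ROI area
            # to 0 at falloff_px, where the initial fetching latency applies again.
            tx, ty = tile_center
            latency = initial_latency_ms
            for (rx, ry), roi_latency in roi_areas.items():
                distance = ((tx - rx) ** 2 + (ty - ry) ** 2) ** 0.5
                multiplier = max(0.0, 1.0 - distance / falloff_px)
                candidate = initial_latency_ms + (roi_latency - initial_latency_ms) * multiplier
                latency = min(latency, candidate)
            return latency

        # A tile 400 px from a 0 ms ROI area gets 250 ms, halfway towards the ROI value.
        print(blended_latency((1000, 900), {(1000, 500): 0}))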
  • the ROI value assigned to a sub-region is frame-specific, segment-of-bitstream-specific or bitstream-specific.
  • the ROI map metadata may include parameters for each sub-region indicating the duration for how long the specific ROI value applies to a particular sub-region.
  • the duration may be determined, for example, as a number of frames, as one or more segments of a track or the like in a bitstream, or as one or more bitstreams, e.g. bitstreams on a plurality of scalability layers.
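  • For illustration, one ROI map metadata record per sub-region could look like the following sketch (the field names and the container are assumptions, not a defined syntax):

        from dataclasses import dataclass

        @dataclass
        class RoiMapEntry:
            sub_region_id: int          # e.g. a tile index within the frame
            fetching_latency_ms: int    # the ROI value expressed as a fetching latency
            duration_frames: int = 0    # frames the value applies to (0 = whole bitstream)
            duration_segments: int = 0  # alternatively, a number of track segments

        roi_metadata = [
            RoiMapEntry(sub_region_id=3, fetching_latency_ms=0, duration_frames=250),
            RoiMapEntry(sub_region_id=7, fetching_latency_ms=500, duration_segments=2),
        ]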
  • the sub-region is a tile of a frame.
  • Tile-based encoding provides an established framework for existing sub-regions of a frame, which can be independently downloaded and are thus easily subjected to tile-specific fetching latencies. It is nevertheless noted that any other sub-region of a frame could be utilised in the embodiments.
  • the ROI value for content may be determined using supervised or unsupervised methods.
  • the supervised methods include at least a manual selection of the ROI indicated by a user (a.k.a. a director) in connection with the encoding method.
  • the unsupervised methods include at least automated selection of the ROI obtained e.g. from external sensors or on the basis of image-based object detection methods or face detection methods.
  • ROI map metadata may be defined automatically with various computer vision algorithms by detecting interesting objects or persons.
  • ROI map metadata may also be defined manually by the video editor, or in a live case, by the director of the live event. Combination of automatic and manual definition may also be used.
  • In a live case, it is also possible to use crowdsourcing for determining interesting ROI areas via an analytics service, where a plurality of users send their viewing directions to the service, and viewing heatmaps may be created on the basis of the plurality of viewing directions.
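  • Such a viewing heatmap could be accumulated roughly as follows (Python sketch; the reporting protocol and the coarse grid resolution are assumptions):

        import math

        def accumulate_heatmap(viewing_directions, cols=36, rows=18):
            # Count reported (yaw, pitch) viewing directions, in radians, on a
            # coarse equirectangular grid; hot cells suggest important ROI areas.
            heatmap = [[0] * cols for _ in range(rows)]
            for yaw, pitch in viewing_directions:
                col = int((yaw / (2.0 * math.pi) + 0.5) * cols) % cols
                row = min(int((0.5 - pitch / math.pi) * rows), rows - 1)
                heatmap[row][col] += 1
            return heatmap

        heatmap = accumulate_heatmap([(0.0, 0.0), (0.1, 0.05), (math.pi / 2, 0.0)])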
  • Figures 5a-5d show an example of assigning ROI values to content appearing on a frame and further to sub-regions according to the content appearing on each sub-region.
  • Figure 5a illustrates a video frame in 360 equirectangular format, where two persons appear on the frame.
  • Figure 5b illustrates the two persons defined as ROI areas e.g. using one of the supervised or unsupervised methods mentioned above. The ROI areas are indicated by rectangles. The areas covering the persons are assigned a ROI value corresponding to 0 ms fetching latency, whereas the rest of the video frame is assigned a ROI value corresponding to 500 ms fetching latency.
  • Figure 5c illustrates how the frame is divided into tiles.
  • the frame is divided into ten equirectangular video tiles (view directions) where the two tiles covering the top/bottom areas of the frame are frame-wide in terms of width.
  • the frame area between the top and bottom tiles is divided into eight equal-size tiles.
  • Each tile is assigned a ROI value based on the ROI values of the areas in Figure 5b intersecting with the tile, wherein the ROI value corresponds to a fetching latency value of either 0 ms or 500 ms.
  • Figure 5d illustrates an example of the current viewing direction selected by the user and the associated viewport a.k.a. viewing frustum (bold rectangle).
  • Figure 6 shows a flow chart illustrating various embodiments relating to decision-making about downloading a tile.
  • the sequence shown in Figure 6 may be repeated periodically at the same rate as head tracking is performed, for example.
  • a viewing direction of the user is obtained (600), e.g. based on head tracking.
  • For each tile it is determined (602) whether the tile is within the user's viewing frustum. For a particular tile residing within the viewing frustum of the user, it may be checked (604) if the tile is already downloading or in decoding stage; if yes, step 602 is repeated for the next tile.
  • Each tile has its own duration for presenting video data.
  • a timer may be used for each tile residing within the viewing frustum of the user to determine the remaining viewing time of the tile, wherein the timer should be started upon the tile entering the viewing frustum. For tiles no longer residing within viewing frustum, for example when noticed in step 602, the timer may be stopped. For tiles residing within viewing frustum, it is checked (606) if the timer has started, and if not, the timer is started. Regardless of whether the timer has been started earlier or only at step 606, it is calculated (608) if the remaining viewing time of the tile exceeds the specified fetching latency for said tile. If yes, the downloading of the tile can be started (610).
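  • The per-tile decision of Figure 6 might look roughly like the sketch below (Python; the tile attributes, frustum test and downloader interface are placeholders invented for illustration):

        import time

        def update_tiles(tiles, viewing_direction, frustum_test, downloader, now=None):
            # One pass of the Figure 6 loop, repeated at the head-tracking rate.
            # Each tile is assumed to carry: roi_latency_s (its fetching latency),
            # remaining_s (remaining presentation time), entered_frustum_at (timer)
            # and in_progress (already downloading or decoding).
            now = time.monotonic() if now is None else now
            for tile in tiles:
                if not frustum_test(tile, viewing_direction):    # step 602
                    tile.entered_frustum_at = None               # stop the per-tile timer
                    continue
                if tile.in_progress:                             # step 604
                    continue
                if tile.entered_frustum_at is None:              # step 606: start timer
                    tile.entered_frustum_at = now
                viewed_for = now - tile.entered_frustum_at
                remaining = tile.remaining_s - viewed_for
                if remaining > tile.roi_latency_s:               # step 608
                    downloader.start(tile)                       # step 610
                    tile.in_progress = True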
  • Figure 7 shows a schematic block diagram of an exemplary apparatus or electronic device 50 depicted in Figure 8, which may incorporate a transmitter according to an embodiment of the invention.
  • the electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require transmission of radio frequency signals.
  • the apparatus 50 may comprise a housing 30 for incorporating and protecting the device.
  • the apparatus 50 further may comprise a display 32 in the form of a liquid crystal display.
  • the display may be any suitable display technology suitable to display an image or video.
  • the apparatus 50 may further comprise a keypad 34.
  • any suitable data or user interface mechanism may be employed.
  • the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
  • the apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
  • the apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection.
  • the apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator).
  • the term battery discussed in connection with the embodiments may also be one of these mobile energy devices.
  • the apparatus 50 may comprise a combination of different kinds of energy devices, for example a rechargeable battery and a solar cell.
  • the apparatus may further comprise an infrared port 41 for short range line of sight communication to other devices.
  • the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/FireWire wired connection.
  • the apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50.
  • the controller 56 may be connected to memory 58 which in embodiments of the invention may store both data and/or may also store instructions for implementation on the controller 56.
  • the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller 56.
  • the apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a universal integrated circuit card (UICC) reader and a universal integrated circuit card for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
  • UICC universal integrated circuit card
  • the apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network.
  • the apparatus 50 may further comprise an antenna 60 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).
  • the apparatus 50 comprises a camera 42 capable of recording or detecting imaging.
  • With respect to FIG. 9, an example of a system within which embodiments of the present invention can be utilized is shown.
  • the system 10 comprises multiple communication devices which can communicate through one or more networks.
  • the system 10 may comprise any combination of wired and/or wireless networks including, but not limited to a wireless cellular telephone network (such as a global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), long term evolution (LTE) based network, code division multiple access (CDMA) network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
  • GSM global systems for mobile communications
  • UMTS universal mobile telecommunications system
  • LTE long term evolution
  • CDMA code division multiple access
  • the system shown in Figure 9 shows a mobile telephone network 11 and a representation of the internet 28.
  • Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
  • the example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22, a tablet computer.
  • PDA personal digital assistant
  • IMD integrated messaging device
  • the apparatus 50 may be stationary or mobile when carried by an individual who is moving.
  • the apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle or any similar suitable mode of transport.
  • Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24.
  • the base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28.
  • the system may include additional communication devices and communication devices of various types.
  • the communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11, Long Term Evolution wireless communication technique (LTE) and any similar wireless communication technology.
  • CDMA code division multiple access
  • GSM global systems for mobile communications
  • UMTS universal mobile telecommunications system
  • TDMA time divisional multiple access
  • FDMA frequency division multiple access
  • TCP-IP transmission control protocol-internet protocol
  • SMS short messaging service
  • MMS multimedia messaging service
  • IMS instant messaging service
  • Bluetooth Bluetooth
  • IEEE 802.11
  • LTE Long Term Evolution wireless communication technique
  • Although the above examples describe embodiments of the invention operating within a wireless communication device, the invention as described above may be implemented as a part of any apparatus comprising circuitry in which radio frequency signals are transmitted and received.
  • embodiments of the invention may be implemented in a mobile phone, in a base station, in a computer such as a desktop computer or a tablet computer comprising radio frequency communication means (e.g. wireless local area network, cellular radio, etc.).
  • radio frequency communication means e.g. wireless local area network, cellular radio, etc.
  • the various embodiments of the invention may be implemented in hardware or special purpose circuits or any combination thereof. While various aspects of the invention may be illustrated and described as block diagrams or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof. Embodiments of the invention may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate. Programs, such as those provided by Synopsys, Inc.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
  • a standardized electronic format e.g., Opus, GDSII, or the like

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention relates to a method comprising: providing a user with a viewport to 360-degree video content, wherein the 360-degree video content is provided in a base layer bitstream and at least one enhancement layer bitstream and frames of the 360-degree video content are divided into sub-regions assigned a region-of-interest (ROI) value; in response to the viewport comprising at least one sub-region with a high ROI value indicating important content, starting downloading and/or decoding of said at least one sub-region from the enhancement layer bitstream; and in response to the viewport comprising one or more sub-regions with a low ROI value indicating less important content, starting downloading and/or decoding of said one or more sub-regions from the base layer bitstream and delaying downloading and/or decoding of said one or more sub-regions from the enhancement layer bitstream for a predetermined delay.
PCT/FI2018/050433 2017-08-14 2018-06-11 Procédé et appareil de traitement d'informations vidéo WO2019034803A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20175724 2017-08-14
FI20175724 2017-08-14

Publications (1)

Publication Number Publication Date
WO2019034803A1 true WO2019034803A1 (fr) 2019-02-21

Family

ID=65361921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2018/050433 WO2019034803A1 (fr) 2017-08-14 2018-06-11 Procédé et appareil de traitement d'informations vidéo

Country Status (1)

Country Link
WO (1) WO2019034803A1 (fr)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150208095A1 (en) * 2012-06-29 2015-07-23 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Video data stream concept
WO2017060423A1 (fr) * 2015-10-08 2017-04-13 Koninklijke Kpn N.V. Amélioration d'une région digne d'intérêt dans des trames vidéo d'un flux vidéo

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CIUBOTARU, B. ET AL.: "Objective Assessment of Region of Interest-Aware Adaptive Multimedia Streaming Quality", IEEE TRANSACTIONS ON BROADCASTING, vol. 55, no. 2, 5 May 2009 (2009-05-05), pages 202-212, XP011343506, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/document/4908970> [retrieved on 20181217], DOI: 10.1109/TBC.2009.2020448 *
NGUYEN, A. ET AL.: "Gaze-J2K: gaze-influenced image coding using eye trackers and JPEG 2000", JOURNAL OF TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY, January 2006 (2006-01-01), pages 3 - 10, XP055575389, Retrieved from the Internet <URL:https://www.researchgate.net/publication/27480883_Gaze-J2K_Gaze-lnfluenced_lmage_Coding_Using_Eye_Trackers_and_JPEG_2000> [retrieved on 20181217] *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11805303B2 (en) 2019-01-04 2023-10-31 Nokia Technologies Oy Method and apparatus for storage and signaling of media segment sizes and priority ranks

Similar Documents

Publication Publication Date Title
US20220174252A1 (en) Selective culling of multi-dimensional data sets
US10560660B2 (en) Rectilinear viewport extraction from a region of a wide field of view using messaging in video transmission
KR102371099B1 (ko) 광시야 비디오를 인코딩하기 위한 구면 회전 기법
WO2019076503A1 (fr) Appareil, procédé et programme informatique pour coder une vidéo volumétrique
US11430156B2 (en) Apparatus, a method and a computer program for volumetric video
WO2019073117A1 (fr) Appareil, procédé et programme informatique pour vidéo volumétrique
US10616548B2 (en) Method and apparatus for processing video information
WO2019008222A1 (fr) Procédé et appareil de codage de contenu multimédia
  • WO2019034803A1 (fr) Procédé et appareil de traitement d'informations vidéo
CN112567737B (zh) 用于观看体积视频的体积信令的装置、方法和计算机程序
EP3494691B1 (fr) Procédé de prédiction temporelle entre vues et équipement technique pour cela
  • WO2019008233A1 (fr) Méthode et appareil d'encodage de contenu multimédia
  • WO2018211171A1 (fr) Appareil, procédé et programme d'ordinateur pour le codage et le décodage vidéo
US10783609B2 (en) Method and apparatus for processing video information
CN115457158A (zh) 图像处理方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18846420

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18846420

Country of ref document: EP

Kind code of ref document: A1