EP3782368A1 - Processing of video patches for three-dimensional content - Google Patents
Processing of video patches for three-dimensional content
- Publication number
- EP3782368A1 (application EP19789175.7A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- patch
- patches
- visible
- client device
- visibility
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000012545 processing Methods 0.000 title claims abstract description 74
- 238000000034 method Methods 0.000 claims abstract description 26
- 230000001419 dependent effect Effects 0.000 claims abstract description 12
- 238000009877 rendering Methods 0.000 claims description 36
- 239000013598 vector Substances 0.000 claims description 24
- 230000008569 process Effects 0.000 claims description 5
- 230000002123 temporal effect Effects 0.000 claims description 4
- 230000000153 supplemental effect Effects 0.000 claims description 3
- 230000006978 adaptation Effects 0.000 description 15
- 230000033001 locomotion Effects 0.000 description 13
- 230000005540 biological transmission Effects 0.000 description 8
- 238000005516 engineering process Methods 0.000 description 8
- 238000012856 packing Methods 0.000 description 8
- 230000000007 visual effect Effects 0.000 description 8
- 238000010586 diagram Methods 0.000 description 7
- 230000005236 sound signal Effects 0.000 description 7
- 230000003190 augmentative effect Effects 0.000 description 5
- 239000012092 media component Substances 0.000 description 5
- 230000008859 change Effects 0.000 description 4
- 210000003128 head Anatomy 0.000 description 4
- 230000000670 limiting effect Effects 0.000 description 4
- 230000003044 adaptive effect Effects 0.000 description 2
- 238000003491 array Methods 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 230000006835 compression Effects 0.000 description 2
- 238000007906 compression Methods 0.000 description 2
- 230000006837 decompression Effects 0.000 description 2
- 210000003414 extremity Anatomy 0.000 description 2
- 239000012634 fragment Substances 0.000 description 2
- 230000004886 head movement Effects 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000003139 buffering effect Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 239000002131 composite material Substances 0.000 description 1
- 238000004590 computer program Methods 0.000 description 1
- 238000013499 data model Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000007726 management method Methods 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000013139 quantization Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 230000011664 signaling Effects 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 230000007480 spreading Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 210000003813 thumb Anatomy 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/111—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/04—Texture mapping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
- G06T15/40—Hidden part removal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/172—Processing image signals image signals comprising non-image signal components, e.g. headers or format information
- H04N13/178—Metadata, e.g. disparity information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/275—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/30—Image reproducers
- H04N13/388—Volumetric displays, i.e. systems where the image is built up from picture elements distributed through a volume
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/21805—Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/262—Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
- H04N21/26258—Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists for generating a list of items to be played back in a given order, e.g. playlist, or scheduling item distribution according to such list
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/2662—Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/462—Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end, controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
- H04N21/4621—Controlling the complexity of the content stream or additional data, e.g. lowering the resolution or bit-rate of the video stream for a mobile client with a small screen
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/482—End-user interface for program selection
- H04N21/4825—End-user interface for program selection using a list of items to be played back in a given order, e.g. playlists
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/816—Monomedia components thereof involving special video data, e.g 3D video
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/85406—Content authoring involving a specific file format, e.g. MP4 format
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/167—Position within a video image, e.g. region of interest [ROI]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
Definitions
- Embodiments relate to an apparatus and method for processing video patches for three-dimensional content.
- Three-dimensional (3D) video content comprises data which may be rendered for output to a display to provide an image or sequence of images representing one or more three-dimensional objects or scenes for user consumption.
- VR virtual reality
- the user device may be provided with a live or stored feed from a content source, the feed representing a virtual space for immersive output through the user device.
- Position and/or movement of the user device can enhance the immersive experience.
- 3DoF three degrees of freedom
- An enhancement is a six degrees-of-freedom (6DoF) virtual reality system, where the user is able to freely move in Euclidean space and rotate their head in the yaw, pitch and roll axes.
- Six degrees-of-freedom virtual reality systems enable the provision and consumption of volumetric content.
- Volumetric content comprises data representing spaces and/or objects in three-dimensions from all angles, enabling the user to move fully around the spaces and/or objects to view them from any angle.
- VR virtual reality
- AR augmented reality
- MR mixed reality
- An embodiment provides an apparatus comprising: means for providing a plurality of patches representing at least part of a volumetric scene; means for providing, for each patch, patch visibility information indicative of a set of directions from which a forward surface of the patch is visible; means for providing one or more viewing positions associated with a client device; means for processing one or more of the patches dependent on whether the patch visibility information indicates that the forward surface of the one or more patches is visible from the one or more viewing positions.
- Providing may comprise receiving, transmitting and/or generating.
- the apparatus may be a server of data representing part of the volumetric scene. Alternatively, the apparatus may be a client device for receiving the data from a server.
- the means for providing the plurality of patches may be configured to estimate surface normals from points of the volumetric scene, and group together points having similar surface normals to provide a given patch.
- the processing means may be configured to transmit one or more visible patches to the client device and not to transmit one or more non-visible patches to the client device.
- the processing means may be configured to transmit identifiers of one or more visible patches to the client device and not to transmit identifiers of non-visible patches to the client device, the identifiers being usable by the client device to retrieve the patches from a patch server.
- the processing means may be configured to generate a texture atlas comprising the plurality of patches, to cull from the texture atlas one or more non-visible patches, and to transmit the texture atlas to the client device for decoding and/or rendering thereat.
- the processing means may be configured to generate first and second texture atlases by means of producing, for each patch, a colour image and a depth image, to provide the colour image for a given patch to a respective part of the first texture atlas and the depth image for the given patch to the respective part of the second texture atlas, to cull the colour and depth images corresponding to non-visible patches, and to transmit the culled first and second texture atlases to the client device for decoding and/or rendering thereat.
- the processing means may be configured to project each patch to a two-dimensional geometry to provide the colour image and the depth image.
- the visible patches / texture atlases may be transmitted to the client device as video frames.
- the apparatus may further comprise means for transmitting metadata to the client device, the metadata being indicative of the patch visibility information for patches.
- the metadata may be further indicative of decoding parameters and/or requirements of the patches.
- the metadata may be transmitted to the client device using one or more of:
- SEI Supplemental Enhancement Information
- the metadata may be quantized.
- the metadata may indicate patch visibility information by means of a normal vector and an angle.
- the patch visibility information for a patch may be determined by projecting an image of the patch onto a near plane and a far plane using depth information for the patch, and wherein the processing means may be configured to identify the patch as visible or non-visible based on which side of the near and far planes the one or more viewing positions are located.
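As a rough illustration of the plane-side test described above, the sketch below (Python with NumPy) treats the near and far planes as parallel planes sharing the patch's forward normal and classifies a viewing position by its signed distance to each plane. The function names and the convention that the normal points towards potential viewers are assumptions made for the sketch, not details taken from the embodiment.

```python
import numpy as np

def plane_side(point, plane_point, plane_normal):
    """Signed distance from `point` to the plane through `plane_point` with
    normal `plane_normal`; positive means the point lies on the side the
    normal points towards."""
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)
    return float(np.dot(np.asarray(point, dtype=float) - np.asarray(plane_point, dtype=float), n))

def visible_by_plane_test(view_pos, near_point, far_point, forward_normal):
    """Assumed convention: both planes share the patch's forward normal.
    The forward surface is taken as visible only if the viewing position
    lies in front of both planes."""
    return (plane_side(view_pos, near_point, forward_normal) > 0.0 and
            plane_side(view_pos, far_point, forward_normal) > 0.0)

# Example: a patch facing +z; a viewer at z = 2 is in front of both planes.
print(visible_by_plane_test([0, 0, 2], near_point=[0, 0, 0.5],
                            far_point=[0, 0, -0.5], forward_normal=[0, 0, 1]))
```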
- the patch visibility information for a patch may be determined by determining a visibility cone derived from a normal cone direction vector and opening angle that includes approximately the minimum cone from which the forward surface of the patch can be seen and increasing the opening angle by a predetermined amount.
- the visibility cone may be determined by increasing the opening angle by approximately 180 degrees.
- the patch visibility information may be determined over a plurality of temporal frames of the volumetric scene.
- the apparatus may be a server for transmitting patches to one or more client devices.
- the means for providing one or more viewing positions may further provide one or more predicted viewing positions associated with the client device, and wherein the processing means is configured to process the one or more patches dependent on whether the patch visibility information indicates that the forward surface of the one or more patches will be visible from the one or more predicted viewing positions.
- the apparatus may be a client device, whereby the means for providing the plurality of patches is configured to receive the patches from a server, whereby the means for providing the patch visibility information may be configured to receive the patch visibility information from the, or a different, server, and wherein the processing means may be configured to decode and/or render one or more patches dependent on whether the received patch visibility information indicates that the forward surface of the one or more patches is visible from the one or more viewing positions.
- Another embodiment provides a method, comprising: providing a plurality of patches representing part of a volumetric scene; providing, for each patch, patch visibility information indicative of a set of directions from which a forward surface of the patch is visible; providing one or more viewing positions associated with a client device; and processing one or more of the patches dependent on whether the patch visibility information indicates that the forward surface of the one or more patches is visible from the one or more viewing positions.
- Another embodiment provides a computer program comprising computer-readable instructions which, when executed by at least one processor, cause the at least one processor to perform the method of any preceding method definition.
- Another embodiment provides a non-transitory computer-readable medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method, comprising: providing a plurality of patches representing part of a volumetric scene; providing, for each patch, patch visibility information indicative of a set of directions from which a forward surface of the patch is visible; providing one or more viewing positions associated with a client device; processing one or more of the patches dependent on whether the patch visibility information indicates that the forward surface of the one or more patches is visible from the one or more viewing positions.
- Another embodiment provides an apparatus, the apparatus having at least one processor and at least one memory having computer-readable code stored thereon which, when executed, controls the at least one processor: to provide a plurality of patches representing part of a volumetric scene; to provide, for each patch, patch visibility information indicative of a set of directions from which a forward surface of the patch is visible; to provide one or more viewing positions associated with a client device; and to process one or more of the patches dependent on whether the patch visibility information indicates that the forward surface of the one or more patches is visible from the one or more viewing positions.
- FIG. 1 is an example of a capture system which may be used to capture video and/or audio signals for processing in accordance with various examples described herein;
- FIG. 2 is a schematic diagram of a virtual reality processing apparatus in relation to one or more user devices and a communications network in accordance with various examples described herein;
- FIG. 3 is a block diagram of an example pipeline comprising server and user devices in accordance with various examples described herein;
- FIG. 4A is a perspective view of part of a point cloud relative to an underlying surface
- FIG. 4B is a perspective view of surface normals for respective points of the FIG. 4A point cloud, determined in accordance with various examples described herein;
- FIG. 4C is a perspective view of a visibility cone, determined in accordance with various examples described herein;
- FIG. 5 is a schematic representation of first and second atlases, determined in accordance with various examples described herein;
- FIG. 6 is a schematic representation of first and second two-dimensional patches, relative to a view point, determined in accordance with various examples described herein;
- FIG. 7 is a schematic representation indicating a culling operation performed on a patch, in accordance with various examples described herein;
- FIG. 8 is a flow diagram showing processing operations of a method, in accordance with various examples described herein.
- FIG. 9 is a schematic diagram of components for implementing one or more of the FIG. 8 processing operations, in accordance with various examples described herein.
- Embodiments herein relate to the processing of video patches for three-dimensional content (hereafter “content”).
- the content may represent one or more objects or scenes.
- the content may represent content captured using one or more cameras, for example using a multi-camera module such as Nokia’s OZO camera, a laser scanner and/or a combination of video and dedicated depth sensors.
- the content may be generated using three-dimensional models, e.g. using computer-generated imagery (CGI). A combination of captured and CGI content is possible.
- the content may be panoramic.
- the content may or may not be accompanied by audio.
- Embodiments may relate to coding and transport of content. Coding may comprise compression.
- Embodiments may also relate to decoding and rendering of content. Decoding may comprise decompression.
- VR virtual reality
- Virtual reality is a rapidly developing area of technology in which video content, sometimes accompanied by audio, is provided to a user device such as a user headset.
- the user device may be provided with a live or stored feed from a content source, the feed representing a virtual space for immersive output through the user device.
- the position of the user device can enhance the immersive experience.
- a change of position, i.e. movement, can also enhance the immersive experience.
- 3DoF three degrees of freedom
- An enhancement is a six degrees-of-freedom (6DoF) virtual reality system, where the user may freely move in Euclidean space as well as rotate their head in the yaw, pitch and roll axes.
- Six degrees-of-freedom virtual reality systems enable the provision and consumption of volumetric content.
- Volumetric content comprises data representing spaces and/or objects in three-dimensions from all angles, enabling the user to move fully around the space and/or objects to view them from any angle.
- Such content may be defined by data describing the geometry (e.g. shape, size, position in a three-dimensional space) and attributes such as colour, opacity and reflectance.
- the data may also define temporal changes in the geometry and attributes at given time instances, similar to frames in two-dimensional video.
- Typical representation formats for volumetric content include triangle meshes, point clouds and voxels.
- Temporal information about the content may comprise individual capture instances, i.e. frames or the position of objects as a function of time.
- The representation format for volumetric content may depend on how the data is to be used. For example, dense voxel arrays may be used to represent volumetric medical images. In three-dimensional graphics, polygon meshes are extensively used. Point clouds, on the other hand, are well suited to applications such as capturing real-world scenes where the topology of the scene is not necessarily a two-dimensional surface or manifold. Another method is to code three-dimensional data to a set of texture and depth maps. Closely related to this is the use of elevation and multi-level surface maps. For the avoidance of doubt, embodiments herein are applicable to any of the above technologies.
- Embodiments herein relate to the processing of video patches representing volumetric three- dimensional content.
- Volumetric content may comprise data representing spaces and/or objects in three-dimensions such that they can be viewed from different angles. Examples particularly relate to virtual reality content, e.g. for transmission to a virtual reality user device where the user may explore the volumetric content by movement when consuming the content.
- virtual reality content e.g. for transmission to a virtual reality user device where the user may explore the volumetric content by movement when consuming the content.
- references to virtual reality are also intended to cover related technologies such as augmented reality (AR) and mixed reality (MR).
- AR augmented reality
- MR mixed reality
- a video patch may refer to a group of volumetric pixels (voxels) or points of a point cloud or another kind of surface representation such as a polygon mesh or Bezier patch.
- a patch may be determined by grouping together voxels or points having similar surface normals, e.g. within a predetermined range of angles. These voxels or points may be adjacent or relatively close together. Visibility of the voxels or points may be a grouping criterion, e.g. linked to surface normals.
- multiple patches are determined based on respective surface normal similarities, and processing of said patches may depend on the visibility of said patches.
- FIG. 1 is an example of a capture system 1 which may be used to capture video (and audio) signals for processing in accordance with various examples described herein. Although the capture of audio signals is described for completeness, embodiments relate mainly to video capture.
- the capture system 1 comprises a spatial capture apparatus 10 configured to capture video and a spatial audio signal.
- the capture system 1 may also comprise one or more additional audio capture devices 12A, 12B, 12C.
- the spatial capture apparatus 10 is configured to capture video content by way of a plurality of visual content capture devices 102A-G (e.g. cameras).
- the plurality of visual content capture devices 102A-G may be configured to capture visual content from different directions around the apparatus, thereby to provide volumetric content for consumption by users.
- the spatial capture apparatus 10 is a presence-capture device, such as
- the spatial capture apparatus 10 may be another type of device and/or may be made up of plural physically separate devices.
- the content captured may be suitable for provision as immersive content, it may also be provided in a regular non-VR format for instance via a smart phone or tablet computer.
- the spatial capture apparatus 10 may also comprise a plurality of audio capture devices 101A, B (e.g. directional or non-directional microphones) which are arranged to capture audio signals which may subsequently be spatially rendered into an audio stream in such a way that the reproduced sound is perceived by a listener as originating from at least one virtual spatial position.
- the sound captured by the spatial audio capture apparatus 10 is derived from plural different sound sources which may be at one or more different locations relative to the spatial capture apparatus 10.
- the captured spatial audio signal includes
- the spatial capture apparatus 10 may comprise more than two audio capture devices 101A, B.
- the spatial capture apparatus 10 may comprise eight audio capture devices.
- the spatial capture system 1 may further comprise one or more additional audio capture devices 12A-C.
- Each of the additional audio capture devices 12A-C may comprise at least one microphone and, in the example of FIG. 1, the additional audio capture devices 12A-C are lavalier microphones configured for capture of audio signals derived from an associated user 13A-C.
- each of the additional audio capture devices 12A-C is associated with a different user by being affixed to the user in some way.
- the locations of the additional audio capture devices 12A-C and/or the spatial capture apparatus 10 within the audio capture environment may be known by, or may be determinable by, the capture system 1 (for instance, a virtual reality processing apparatus 14).
- the devices may include a location determination component for enabling the location of the devices to be determined.
- a radio frequency location determination system such as Nokia’s High
- the additional audio capture devices 12A-C (and in some examples the spatial capture apparatus 10) transmit messages for enabling a location server to determine the location of the additional audio capture devices within the audio capture environment.
- the locations may be pre-stored by an entity which forms part of the capture system 1 (for instance, the virtual reality processing apparatus 14).
- the spatial capture system 1 may not include additional audio capture devices 12A-C.
- the capture system 1 further comprises the virtual reality processing apparatus 14.
- the virtual reality processing apparatus 14 may be a server, or it may be associated with another server. In embodiments herein, it is assumed that the virtual reality processing apparatus 14 also encodes and serves the content to one or more user devices, but this may be performed at a separate server system (not shown). This serving, or
- the virtual reality processing apparatus 14 may be configured to receive and store signals captured by the spatial capture apparatus 10 and/or the one or more additional audio capture devices 12A-C. The signals may be received at the virtual reality processing apparatus 14 in real-time during capture of the audio and video signals or may be received subsequently, for instance via an intermediary storage device. In such examples, the virtual reality processing apparatus 14 may be local to the audio capture environment or may be geographically remote from the audio capture environment in which the capture apparatus 10 and devices 12A-C are provided. In some examples, the virtual reality processing apparatus 14 may even form part of the spatial capture apparatus 10.
- FIG. 2 is a schematic view of the virtual reality processing apparatus 14 in relation to a network 16, which may be an Internet Protocol (IP) network such as the Internet, or any other form of data network, and a plurality of remote users 20A - 20C having respective user headsets 22A - 22C for consuming the content when rendered.
- IP Internet Protocol
- the virtual reality processing apparatus 14 may stream the content over multiple transmission channels via the network 16.
- the remote users 20A - 20C may be co-located or located in separate real-world locations, possibly in different countries. What each remote user 20A - 20C sees through the video screens and/or headphones of their respective headsets 22A - 22C is part of a virtual space or scene.
- a virtual space or scene is any computer-generated version of a space, for example the volumetric real world space captured using the capture system 1 shown in FIG. 1, in which one or more users 20A - 20C can be immersed.
- the virtual space may be entirely computer-generated, e.g. CGI.
- the headsets 22A - 22C may be of any suitable type.
- the headsets 22A - 22C may be configured to provide virtual reality video and/or audio content to the respective users 20A - 20C. As such, the users may be immersed in virtual space.
- the headsets 22A - 22C may receive the content directly from the virtual reality processing apparatus 14, or, in some embodiments, from a separate media player 24 to which the headset is connected.
- the media player 24 may include a games console, or a personal computer (PC) configured to receive visual and/or audio data from the virtual reality processing apparatus, via the network 16, and communicate this to the headset 22A shown in FIG. 2.
- the media player 24 may form part of the headset 22A.
- the media player 24 may comprise a mobile phone, smartphone or tablet computer configured to play content through its display.
- the headsets 22A - 22C may include means for determining the spatial position of the respective users 20A - 20C and/or orientation of the respective user’s head. In some embodiments, therefore, the headsets 22A - 22C may track movement using six degrees-of-freedom. Over successive time frames, a measure of movement may therefore be calculated and stored.
- the headsets 22A - 22C may incorporate motion tracking sensors which may include one or more of gyroscopes, accelerometers and structured light systems. These sensors generate position data from which a current position and visual field-of-view (FOV), in other words a viewport, is determined and updated as the one or more user headsets 22A - 22C change position and/or orientation.
- FOV visual field-of-view
- the headsets 22A - 22C may comprise gaze tracking means used to determine a direction of the user’s gaze, which can be used to determine an object of interest the user is looking at.
- the headsets 22A - 22C may comprise, or be associated with, other limb tracking means to determine the position or orientation of a limb of the user.
- the headsets 22A - 22C may typically comprise two digital screens for displaying
- the headsets 22A - 22C may comprise one or more cameras. Images from the one or more cameras may be presented to the user through the screens of the headsets 22A - 22C, such that the real world environment is displayed to the user in a “see-through mode”, or an augmented reality mode.
- the example embodiments herein, which primarily relate to the delivery of virtual reality content, are not limited to a particular type of virtual reality headset 22A - 22C. Any form of user display device may be used.
- the headsets 22A - 22C may determine the spatial position and/or orientation of the respective users 20A - 20C within the virtual space. These may include measurements of pitch, roll and yaw and also translational movement in Euclidean space along side-to-side, front-to-back and up-and-down axes (i.e. six degrees-of-freedom).
- the headsets 22A - 22C, or the media player 24, may be configured to display content based on the spatial position and/or the orientation of the respective headset.
- a detected change in spatial position and/or orientation, i.e. a form of movement, may result in a corresponding change in the visual and/or audio data to reflect a position or orientation transformation of the user 20A - 20C with reference to the space into which the visual and/or audio data is projected.
- This allows virtual reality content data to be consumed with the user 20A - 20C experiencing a three-dimensional (3D) virtual reality environment.
- In the context of volumetric virtual reality spaces, this means that the user’s position can be detected relative to content provided within the volumetric virtual reality content, e.g. so that the user can move freely within a given virtual reality space, around individual objects or groups of objects, and can view the objects from different angles depending on the movement (e.g. rotation and location) of their head in the real world.
- the user may also view and explore a plurality of different virtual reality spaces and move from one virtual reality space to another one.
- the angular extent of the environment observable or hearable through the respective headsets 22A - 22C is called the visual field of view (FOV).
- FOV visual field of view
- the actual FOV observed or heard by a user depends on the inter-pupillary distance and on the distance between the lenses of the virtual reality headset 22A - 22C and the user’s eyes, but the FOV can be considered to be approximately the same for all users of a given display device when the virtual reality headset is being worn by the user.
- the portion of virtual reality content that is visible at a given time instant may be called a viewport. When viewing volumetric content from a single viewpoint, a portion (often half) of the content may not be seen because it is facing away from the user. This portion is sometimes called “back facing content”.
- Given the limited processing power of user devices, such as the headsets 22A - 22C and/or the media player 24, limiting the amount of data required to be processed, whether in terms of the virtual reality processing apparatus 14 (or other server) encoding and transmitting the data and/or the headsets 22A - 22C and/or media player 24 rendering the data, offers technical advantages. This may be particularly important where, for example, the headsets 22A - 22C are using processing power to run cameras (e.g. in the case of augmented reality) or other applications. For example, in a city street scene, there is generally no need to encode, transmit and/or render objects on the sides of the buildings facing away from the field of view, because they are completely occluded.
- Embodiments may comprise providing a plurality of patches representing part of a volumetric scene, and providing, for each patch, patch visibility information indicative of a set of directions from which a forward surface of the patch is visible.
- Embodiments may further comprise providing one or more viewing positions associated with a client device, and processing one or more of the patches dependent on whether the patch visibility information indicates that the forward surface of the one or more patches is visible from the one or more viewing positions.
- Advantages of such embodiments include increased processing efficiency, particularly at client devices, such as at the headsets 22A - 22C and/or the media player 24. This is because the amount of data that is encoded, transmitted, decoded and/or rendered is limited based on what parts of objects and/or scenes are visible and what are back facing.
- the patches may be provided by estimating surface normals from points of the volumetric scene, and grouping together points having similar surface normals to provide a given patch.
- a surface normal may be considered a vector which is perpendicular to the surface of the volumetric scene, or a part thereof. The vector may be pointing outside of the surface.
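The embodiments do not prescribe a particular normal-estimation method. As a minimal sketch, assuming a point-cloud input and NumPy only, the snippet below estimates a per-point unit normal by fitting a local plane (PCA over the k nearest neighbours) and orients each normal away from the cloud centroid; the function name, the choice of k and the orientation heuristic are illustrative assumptions.

```python
import numpy as np

def estimate_normals(points, k=16):
    """Per-point unit normals via local plane fitting: the eigenvector of the
    smallest eigenvalue of the neighbourhood covariance is taken as the
    normal, flipped to point away from the cloud centroid as a crude
    consistent orientation."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    normals = np.zeros_like(points)
    for i, p in enumerate(points):
        dists = np.linalg.norm(points - p, axis=1)
        nbrs = points[np.argsort(dists)[:k]]
        cov = np.cov((nbrs - nbrs.mean(axis=0)).T)
        _, eigvecs = np.linalg.eigh(cov)   # columns sorted by ascending eigenvalue
        n = eigvecs[:, 0]
        if np.dot(n, p - centroid) < 0.0:  # orient outwards
            n = -n
        normals[i] = n
    return normals
```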
- FIG. 3 is a block diagram of an example volumetric video processing pipeline (hereafter “pipeline”) 30 from content processing to content rendering.
- the pipeline 30 comprises a plurality of modules, some of which represent storage and some of which represent processing modules.
- some modules are provided at the virtual reality processing apparatus 14 (or other server) and other modules are provided at a client end, such as the second headset 22B, although any client end device is applicable.
- a client rendering module 38 is provided for receiving, decoding and rendering content to the display screens of the headset.
- a client view tracking module 39 is provided for transmitting position information indicative of the current or a predicted field-of-view of the second headset 22B which may be fed back to the virtual reality processing apparatus 14. The position information may be generated using previously-described systems and methods for spatial position determination.
- the patch transcoding and packing module 36 and the patch culling module 37 may be provided in a network edge node that performs real-time processing for content streaming.
- the remaining modules 31, 32, 33, 34 of the virtual processing apparatus 14 may provide off-line processing of content data.
- the patch transcoding and packing module 36 may be performed in the client end, e.g. based on a prescription as described later.
- the patch culling may be performed by the client end.
- the client end may resolve addresses or identifiers, such as Uniform Resource Locators (URLs), for patches to be received and request the patches using those addresses or identifiers.
- URLs Uniform Resource Locators
- Fewer or a greater number of modules may be provided, and it will be appreciated that such modules may be provided in hardware or software form, or a combination thereof.
- data representing volumetric video content 31 is provided, all or some of which may be encoded and transmitted to one or more of the headsets 22A - 22C and/or the media player 24 for decoding and rendering.
- the content 31 may represent any three-dimensional object or scene.
- the pipeline 30 may comprise a normal generation module 32 for determining or estimating surface normals from at least some of the volumetric content.
- Surface normals represent vectors perpendicular to a surface.
- FIG. 4A shows part of a point cloud 40 comprised of a plurality of points 41.
- Methods and systems for generating the point cloud 40 from a captured or CGI object or scene are known. Further information is provided at http://pointclouds.org.
- the underlying object 42 which the point cloud 40 represents is shown in dotted lines for reference. In other embodiments, the points 41 may represent voxels.
- FIG. 4B shows estimated or determined normals 50 (particularly, surface normals) for each point 41 of the point cloud 40, i.e. vectors perpendicular to the surface of the object 42.
- the surface normals 50 vary in angle or orientation because the object 42 is non-planar.
- the pipeline 30 may also comprise a patch optimisation module 33, which is configured to determine patches based on one or more rules.
- a patch may comprise all points 41 having surface normals 50 in the same direction.
- a patch may comprise all points 41 having an orientation or opening angle within a particular range, e.g. 15 degrees, which may be defined by a view cone 51. Other opening angles may be used.
- the first and second surface normals 44, 45 shown may be determined to be similar, whereas the surface normal 46 may be considered non-similar, because it is outside of the view cone 51.
- An example patch 47 is shown in shaded representation. The patch 47 is a volumetric patch in that the points within it represent three-dimensional content.
- patches 47, 48, 49 may be determined in this manner until all points 41 of the point cloud 40 are allocated to a patch. Partial views of second and third patches 48, 49 are shown for illustration. It follows that each patch 47, 48, 49 may be of a different size and/or shape.
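A minimal sketch of the grouping step just described, assuming unit normals and a 15-degree view cone, and deliberately ignoring the spatial-adjacency refinement mentioned earlier: points are greedily assigned to the patch of the first unassigned seed whose normal they match within the cone opening angle. Function and parameter names are illustrative, not taken from the embodiment.

```python
import numpy as np

def group_into_patches(points, normals, opening_angle_deg=15.0):
    """Greedy grouping by normal similarity only: take the first unassigned
    point as a seed and collect every unassigned point whose normal deviates
    from the seed normal by less than the opening angle."""
    normals = np.asarray(normals, dtype=float)
    cos_thresh = np.cos(np.radians(opening_angle_deg))
    unassigned = np.ones(len(points), dtype=bool)
    patches = []
    while unassigned.any():
        seed = int(np.flatnonzero(unassigned)[0])
        similar = normals @ normals[seed] >= cos_thresh
        members = np.flatnonzero(similar & unassigned)
        patches.append(members)
        unassigned[members] = False
    return patches  # list of point-index arrays, one per patch
```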
- the pipeline 30 may therefore provide and store a set of volumetric patches in a volumetric patches module 34 (see FIG. 3). Another operation provided by the patch optimization module 33 is to determine patch visibility information 35. An example embodiment for determining patch visibility information 35 is described, but it should be appreciated that other methods are possible. Patch visibility information is data indicative of where in the volumetric space the forward surface of the patch can be seen.
- Patch visibility information 35 for each patch 47, 48, 49 may be calculated from patch normals of each respective patch.
- a visibility cone may be determined, as shown in FIG. 4C, for the first patch 47, which comprises a visibility cone direction vector (X, Y, Z) and an opening angle (A).
- the opening angle (A) defines a set of spatial angles from which the forward surface of the first patch 47 can be seen.
- the visibility cone can be determined using the minimal cone that encloses all normal vector directions for the given patch 47.
- a normal cone angle is determined, for example by determining the largest angle between all normals for the patch and by minimizing the dot product between all normals in the patch.
- the normal cone direction vector can be calculated halfway between them, e.g. by summing these two vectors together and normalizing the result.
- the visibility cone may be determined from the normal cone by using the same direction vector, and adding 180 degrees to the opening angle to account for visibility of forward surfaces.
- the visibility cone can be additionally optimized by considering self-shadowing of the patch, by applying ray-tracing or other methods to detect the visibility.
- each patch 47, 48, 49 will have a normal vector (X, Y, Z) and angle (A) as additional parameters, which may be represented as a visibility cone for the patch.
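The normal-cone and visibility-cone computation described in the preceding paragraphs could look roughly as follows. The pairwise search for the smallest dot product, the fallback for degenerate (near-opposite) normals and the return format of an axis plus an opening angle in degrees are assumptions made for this sketch.

```python
import numpy as np
from itertools import combinations

def visibility_cone(normals):
    """Returns (axis, opening_angle_deg) for a patch. The normal cone axis is
    the normalised sum of the two patch normals with the smallest dot product
    (largest mutual angle); the visibility cone keeps that axis and widens
    the normal cone opening by 180 degrees to cover the directions from which
    the forward surface can be seen."""
    normals = np.asarray(normals, dtype=float)
    min_dot, a, b = 1.0, 0, 0
    for i, j in combinations(range(len(normals)), 2):
        d = float(np.dot(normals[i], normals[j]))
        if d < min_dot:
            min_dot, a, b = d, i, j
    axis = normals[a] + normals[b]
    if np.linalg.norm(axis) < 1e-9:        # degenerate: near-opposite normals
        axis = normals.mean(axis=0)
    axis = axis / np.linalg.norm(axis)
    normal_cone_deg = float(np.degrees(np.arccos(np.clip(min_dot, -1.0, 1.0))))
    return axis, normal_cone_deg + 180.0
```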
- the data (X, Y, Z) and (A) may describe patch visibility over a group of frames rather than per frame.
- the patch visibility information may be provided as patch visibility metadata for each respective patch 47, 48, 49.
- the patch visibility metadata may be used for encoding the video bit stream containing each patch 47, 48, 49.
- the containing structures for patch visibility metadata may include but are not limited to the following: supplemental enhancement information (SEI) messages, e.g. as defined in H.264/AVC or HEVC (a.k.a. H.265); structures of a containing file format used in encapsulating and/or providing metadata for the video bit stream; for example, in files conforming to the ISO base media file format (ISOBMFF, ISO/IEC 14496-12), timed metadata structures, such as a timed metadata track, a sample group, sample auxiliary information, or SubSampleInformationBox may be used;
- ISO base media file format ISO/IEC 14496-12
- timed metadata structures such as a timed metadata track, a sample group, sample auxiliary information, or SubSampleInformationBox
- a descriptor element could contain patch visibility metadata in DASH MPD.
- the patch visibility metadata may comprise the visibility cone direction vector (X, Y, Z) and the angle (A).
- the visibility cone direction vector (X, Y, Z) and angle value (A) may be heavily quantized (e.g. 6-bits per X, Y, Z and A) and represented only using small numbers. Therefore, the additional per-patch storage in the video bit stream may be minimal. Such quantization may be performed conservatively so that the quantized cone encloses the original non-quantized cone.
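One possible conservative quantization of the (X, Y, Z, A) values, assuming 6 bits per component: the direction components are quantized symmetrically in [-1, 1], while the opening angle is always rounded up and padded by a small guard angle so that the reconstructed cone encloses the original one despite the axis shifting slightly under quantization. The 5-degree guard and the code layout are illustrative assumptions, not values taken from the embodiment.

```python
import numpy as np

def quantize_cone(axis, opening_angle_deg, bits=6, guard_deg=5.0):
    """Map axis components in [-1, 1] and the opening angle in [0, 360]
    degrees to `bits`-bit integer codes, rounding the angle upwards so the
    quantized cone stays conservative."""
    levels = (1 << bits) - 1
    axis_codes = np.round((np.asarray(axis, dtype=float) + 1.0) / 2.0 * levels).astype(int)
    angle_step = 360.0 / levels
    padded = min(opening_angle_deg + guard_deg, 360.0)
    angle_code = min(int(np.ceil(padded / angle_step)), levels)
    return axis_codes, angle_code

def dequantize_cone(axis_codes, angle_code, bits=6):
    levels = (1 << bits) - 1
    axis = np.asarray(axis_codes, dtype=float) / levels * 2.0 - 1.0
    axis = axis / np.linalg.norm(axis)
    return axis, angle_code * (360.0 / levels)
```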
- the patch visibility metadata may comprise a definition of a bounding sphere surface and sphere region metadata, identical or similar to that specified by the omnidirectional media format (OMAF) standard (ISO/IEC 23090-2).
- the bounding sphere surface may for example be defined by a three-dimensional location of the centre of the sphere, and the radius of the sphere.
- the patch may be considered visible within the indicated sphere region.
- This embodiment may for example suit complex scenes with distinct objects that can be surrounded by a bounding sphere surface.
- the geometry of the bounding surface may also be something other than a sphere, such as cylinder, cube, or cuboid.
- Multiple sets of patch visibility metadata may be defined for the same three-dimensional location of the centre of the bounding surface, but with different radii (or information indicative of the distance of the bounding surface from the three-dimensional location). Indicating several pieces of patch visibility metadata may be beneficial to handle occlusions.
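As an illustration of this kind of bounding-sphere metadata, the sketch below models a sphere region loosely in the style of OMAF sphere regions (a centre direction plus azimuth/elevation ranges on a bounding sphere) and tests whether the direction towards a viewing position falls inside the region. The field names and the containment test are assumptions for the sketch and do not reproduce the OMAF syntax; how the radius (or several entries with different radii, as described above) further constrains visibility is left out.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SphereRegionVisibility:
    centre: np.ndarray        # (3,) centre of the bounding sphere
    radius: float             # radius of the bounding sphere (stored only)
    centre_azimuth: float     # degrees
    centre_elevation: float   # degrees
    azimuth_range: float      # full width, degrees
    elevation_range: float    # full height, degrees

    def visible_from(self, view_pos) -> bool:
        """The patch is treated as visible if the direction from the sphere
        centre towards the viewing position falls inside the sphere region."""
        d = np.asarray(view_pos, dtype=float) - self.centre
        r = float(np.linalg.norm(d))
        if r == 0.0:
            return True
        azimuth = float(np.degrees(np.arctan2(d[1], d[0])))
        elevation = float(np.degrees(np.arcsin(d[2] / r)))
        d_az = (azimuth - self.centre_azimuth + 180.0) % 360.0 - 180.0
        return (abs(d_az) <= self.azimuth_range / 2.0 and
                abs(elevation - self.centre_elevation) <= self.elevation_range / 2.0)
```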
- the metadata structure may describe several visibility cones.
- patch visibility metadata may be approximate spatially and/or temporally.
- the patch visibility metadata may describe a visibility cone that may contain locations from where the patch is not visible but encloses all locations from where the patch is visible.
- the patch visibility metadata may describe a visibility cone that guarantees the visibility of the patch regardless of the location within the cone, while there may be additional locations outside the visibility cone from where the patch is also visible.
- the type of the patch visibility metadata (e.g. accuracy spatially and/or temporally) may be pre-defined, e.g. in a coding standard, or it may be indicated in or along the bit stream by an encoder, and/or it may be decoded from or along the bit stream by a decoder.
- the patch visibility metadata may be indicative of the absolute and/or relative picture quality of a texture patch.
- quality ranking metadata similar to the OMAF and/or DASH standards may be provided with the patch visibility metadata.
- the patch visibility metadata for each patch may be provided to a patch culling module 37.
- the patch culling module 37 may be configured to determine which patches 47, 48, 49 are transmitted to a user device, for example the client rendering module 38 of the headset 22B.
- the client rendering module 38 may be hardware, software or a combination thereof for receiving, decoding and rendering received patch data to the screens of the headset 22B based on view parameters generated by the client view tracking module 39.
- the patch culling module 37 is configured to cull (i.e. suppress or remove) patches 47, 48, 49 which are not visible to the user as determined based on the view parameters and the patch visibility metadata. That is, if it is determined that one or more patches 47, 48, 49 cannot be seen based on the current field-of-view, or will not be seen based on a predicted future field-of-view, then those one or more patches are not encoded or transmitted.
- all patches 47, 48, 49 may be encoded and transmitted, together with the patch visibility metadata, and the client rendering module 38 determines, based on the locally determined view parameters and the received patch visibility metadata, whether or not to decode and render one or more patches. That is, if it is determined by the client rendering module 38 that one or more patches 47, 48, 49 cannot be seen based on the current field-of-view, or will not be seen based on a predicted future field-of-view, then those one or more patches are not decoded and/or rendered.
- culling may happen in both stages, so that patch culling module 37 does initial culling based on a predicted field-of-view, and the client rendering module 38 does more fine-grained culling based on the final view parameters in effect during rendering.
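Both culling stages reduce to the same geometric test: does a current or predicted viewing position fall inside a patch's visibility cone? A minimal sketch, assuming each patch record carries a reference point, a cone axis and an opening angle as computed earlier; these dictionary keys are illustrative, not a defined format.

```python
import numpy as np

def cull_patches(patches, view_positions):
    """Keep a patch only if at least one viewing position lies inside its
    visibility cone."""
    kept = []
    for patch in patches:
        axis = np.asarray(patch["cone_axis"], dtype=float)
        axis = axis / np.linalg.norm(axis)
        half_angle = np.radians(patch["opening_angle_deg"]) / 2.0
        for vp in view_positions:
            to_viewer = np.asarray(vp, dtype=float) - np.asarray(patch["centre"], dtype=float)
            dist = float(np.linalg.norm(to_viewer))
            if dist == 0.0:
                kept.append(patch)
                break
            angle = np.arccos(np.clip(np.dot(to_viewer / dist, axis), -1.0, 1.0))
            if angle <= half_angle:
                kept.append(patch)
                break
    return kept
```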
- a patch transcoding and packing module 36 may be configured to receive each patch 47, 48, 49 stored in the volumetric patch module 34 and to encode and/or transmit patches based on feedback from the patch culling module 37.
- Each patch 47, 48, 49 may be projected to a two-dimensional colour (or other form of texture) image and to a corresponding depth image, also known as a depth map. This conversion enables each patch 47, 48, 49 to be converted back to volumetric form at a client rendering module 38 of the headset 22B using both images. It should be understood that embodiments are not limited to colour and depth patches, but can be realized additionally or alternatively for other types of patches, such as reflectance, opacity or transparency (e.g. alpha channel patches), surface normal, albedo, and/or other material or surface attribute patches.
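The projection of a patch to a colour image and a corresponding depth map can be pictured with the sketch below, which orthographically projects a small point set onto the XY plane; the resolution, projection axis and array layout are arbitrary illustrative choices rather than the method of any particular codec.

```python
import numpy as np

def project_patch(points: np.ndarray, colours: np.ndarray, resolution: int = 16):
    """Orthographic projection of a patch onto the XY plane.

    points  : (N, 3) array of 3D positions belonging to the patch
    colours : (N, 3) array of RGB values, one per point
    Returns a (resolution, resolution, 3) colour image and a matching depth map.
    """
    colour_img = np.zeros((resolution, resolution, 3), dtype=np.uint8)
    depth_img = np.full((resolution, resolution), np.inf)

    # Normalise X/Y coordinates into pixel coordinates of the patch image.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    scale = (resolution - 1) / np.maximum(maxs[:2] - mins[:2], 1e-9)

    for (x, y, z), rgb in zip(points, colours):
        u = int(round((x - mins[0]) * scale[0]))
        v = int(round((y - mins[1]) * scale[1]))
        if z < depth_img[v, u]:          # keep the nearest point per pixel
            depth_img[v, u] = z
            colour_img[v, u] = rgb
    return colour_img, depth_img
```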
- each patch 47, 48, 49 may be packed into one or more atlases.
- Texture atlases are known in the art: an atlas is an image consisting of sub-images, treated as a single unit by graphics hardware, which can be compressed and transmitted as a single image for subsequent identification and decompression.
- first and second atlases 52, 53 are provided, one for the colour or texture images and one for the depth images.
- the first atlas 52 stores a two-dimensional colour image 54 of the first patch 47 as a first sub-image and the second atlas 53 stores a corresponding depth image 55 of the first patch 47 as a sub-image at the corresponding location.
- Images derived from the other patches 48, 49 may be stored at different respective regions of the first and second atlases 52, 53.
- the first and second atlases 52, 53 may then be encoded, e.g. compressed, as video frames using any known method and transmitted, e.g. streamed, to the headset 22B for decoding and rendering.
- the sub-image layout in the first and second atlases 52, 53 for the patches 47, 48, 49 may be optimized by placing patches that have similar (e.g. in terms of direction and/or angle) view cones 51 adjacent to one another.
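One possible form of such a layout step is sketched below: patches are sorted by the direction of their visibility-cone axes so that similarly oriented patches land in adjacent atlas cells. The grid layout and the assumed 'cone_axis' field are illustrative only.

```python
import math
from typing import List, Tuple

def sort_by_cone_direction(patches: List[dict]) -> List[dict]:
    """Sort patches so that those with similar visibility-cone axes are adjacent.
    Each patch dict is assumed to carry a unit-length 'cone_axis' = (x, y, z)."""
    def spherical_key(p):
        x, y, z = p["cone_axis"]
        # Order by azimuth, then by polar angle of the cone axis.
        return (math.atan2(y, x), math.acos(max(-1.0, min(1.0, z))))
    return sorted(patches, key=spherical_key)

def assign_atlas_cells(patches: List[dict], cols: int, cell: int) -> List[Tuple[int, int]]:
    """Assign each (sorted) patch a top-left pixel position in a simple grid atlas."""
    return [((i % cols) * cell, (i // cols) * cell) for i in range(len(patches))]
```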
- combined patch visibility metadata for the patches 47, 48, 49 may be generated additionally or alternatively to individual patch visibility metadata which applies to respective ones of the patches.
- the sub-image layout in the first and second atlases 52, 53 may also be organized such that it is possible to encode a patch 47, 48, 49 or a set of patches having similar visibility cones 51 into spatiotemporal units that can be decoded independently of other spatiotemporal units.
- an example of such an independently decodable spatiotemporal unit is a tile grid as understood in the context of High Efficiency Video Coding (HEVC).
- the first and second atlases 52, 53 may be organized in a manner such that a patch 47, 48, 49 or a group of patches having similar visibility cones can be encoded as a motion-constrained tile set (MCTS), as understood in the context of HEVC.
- An MCTS is such that the inter prediction process is constrained in encoding such that no sample value outside the motion-constrained tile set, and no sample value at a fractional sample position that is derived using one or more sample values outside the motion-constrained tile set, is used for inter prediction of any sample within the motion-constrained tile set. Additionally, the encoding of an MCTS is constrained in a manner that motion vector candidates are not derived from blocks outside the MCTS.
- an MCTS may be defined to be a tile set that is independent of any sample values and coded data, such as motion vectors, that are outside the MCTS. In some cases, an MCTS may be required to form a rectangular area.
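The constraint can be pictured with the following sketch, which checks whether a motion vector keeps the whole reference block, plus a margin for fractional-sample interpolation, inside the tile-set rectangle; the block geometry, margin value and coordinate conventions are simplified assumptions rather than the HEVC specification.

```python
def mv_stays_inside_mcts(block_x: int, block_y: int,
                         block_w: int, block_h: int,
                         mv_x: int, mv_y: int,
                         tile_set: tuple,
                         interp_margin: int = 3) -> bool:
    """Return True if the motion-compensated reference block lies entirely inside the
    motion-constrained tile set (left, top, right, bottom), including the extra samples
    needed for fractional-sample interpolation."""
    left, top, right, bottom = tile_set
    ref_left = block_x + mv_x - interp_margin
    ref_top = block_y + mv_y - interp_margin
    ref_right = block_x + block_w - 1 + mv_x + interp_margin
    ref_bottom = block_y + block_h - 1 + mv_y + interp_margin
    return (ref_left >= left and ref_top >= top and
            ref_right <= right and ref_bottom <= bottom)
```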
- an MCTS may refer to the tile set within a picture or to the respective tile set in a sequence of pictures.
- the respective tile set may be, but in general need not be, collocated in the sequence of pictures. It needs to be understood that even though some examples and embodiments are described with respect to MCTSs, they could be similarly realized with other similar concepts of independently decodable spatiotemporal units.
- one or more (but not the entire set of) spatiotemporal units may be provided and stored as a track, as is understood in the context of the ISO base media file format, or as any similar container file format structure.
- a track may be referred to as a patch track.
- Patch tracks may for example be sub-picture tracks, as understood in the context of OMAF, or tile tracks, as understood in the context of ISO/IEC 14496-15.
- the entire coded video bit stream (containing the entire set of spatiotemporal units) may be stored as a track, as understood in the context of the ISO base media file format, or as any similar container file format structure.
- the atlas layouts may be organized in such a way that a patch or a set of patches 47, 48, 49 having similar visibility cones forms a spatiotemporal unit that can be extracted and encoded individually as a video bit stream. This may typically mean that the width and height of the spatiotemporal unit stays unchanged over time, at least for several successive frames.
- Several sub-picture bit streams of the same atlas may be encoded, for example one per patch 47, 48, 49 or each set of patches having similar visibility cones.
- Encoding may be constrained in a manner such that merging of an encoded sub-picture bit stream to another bit stream that can be decoded with a standard-conforming decoder is enabled.
- sub-picture bit streams may be encoded in a way such that dependencies on samples outside of the decoded picture boundaries are avoided in the encoding by selecting motion vectors in a manner that sample locations outside the picture are not referred to in the inter-prediction process.
- Sub-picture bit streams may be stored as patch tracks, as discussed above.
- several versions of the one or more atlases may be encoded. Different versions may include, but are not limited to, one or more of the following:
  - different bitrate versions of the one or more atlases at the same resolution;
  - different versions for different random access intervals; these may include one or more intra-coded atlases (where every picture can be randomly accessed).
- combinations of patches 47, 48, 49 from different versions of the texture atlas may be prescribed and described as metadata, such as extractor tracks, as will be understood in the context of OMAF and/or ISO/IEC 14496-15.
- the prescriptions may be authored on the basis of one or more of the following but are not limited to such. When the total sample count of a texture atlas and, in some cases, of the respective geometry pictures and/or other auxiliary pictures (if any) exceeds a limit, such as a level limit of a video codec, a prescription may be authored in a manner so that the limit is obeyed.
- patches may be selected from a lower-resolution texture atlas according to subjective importance. The selection may be performed in a manner that is not related to the viewing position.
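Such a prescription could, for example, be authored with a greedy selection like the sketch below, which adds patches in order of an assumed subjective-importance score until a total sample-count budget (e.g. derived from a codec level limit) would be exceeded; the field names and the budget value in the example are illustrative.

```python
from typing import List

def author_prescription(patches: List[dict], max_samples: int) -> List[dict]:
    """Select patches for a prescription without exceeding a total sample budget.

    Each patch dict is assumed to carry:
      'importance' : higher means subjectively more important
      'samples'    : luma sample count of its texture (and geometry) sub-images
    """
    selected, used = [], 0
    for patch in sorted(patches, key=lambda p: p["importance"], reverse=True):
        if used + patch["samples"] <= max_samples:
            selected.append(patch)
            used += patch["samples"]
    return selected

# Example with an arbitrary budget loosely inspired by a codec level limit:
patches = [{"importance": 0.9, "samples": 4_000_000},
           {"importance": 0.5, "samples": 3_000_000},
           {"importance": 0.2, "samples": 2_500_000}]
print([p["importance"] for p in author_prescription(patches, max_samples=8_000_000)])
```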
- the prescription may be accompanied by metadata characterizing the obeyed limit(s), e.g. the codec Level that is obeyed.
- a prescription may be made specific to a visibility cone and hence may exclude the patches 47, 48, 49 that are not visible within that visibility cone.
- the selection of visibility cones for which the prescriptions are generated may be limited to a reasonable number, such that switching from one prescription to another is not expected to occur frequently.
- the visibility cones of prescriptions may overlap to avoid switching back and forth between two prescriptions.
- the prescription may be accompanied by metadata indicative of the visibility cone.
- a prescription may use a specific grid or pattern of independent spatiotemporal units.
- a prescription may use a certain tile grid, wherein tile boundaries are also MCTS boundaries.
- the prescription may be accompanied by metadata indicating potential sources (e.g. track groups, tracks, or representations) that are suitable as spatiotemporal units.
- Patches 47, 48, 49 may be selectively transmitted by the patch transcoding and packing module 36 based on visibility. For example, for a group of frames, the metadata for all patches 47, 48, 49 can be accessed first, and only the patches visible based on the current or predicted viewing parameters received from the client view tracking module 39 may be streamed.
- selective transmission may be controlled by the patch transcoding and packing module 36, i.e. at the server end.
- selective streaming may also be controlled by the client end, i.e. hardware and/or software in the headset 22B. This may be accomplished by having separate streams mapped to individual URIs, e.g. based on a manifest or description of media content, or by a cloud component (e.g. transcoding/packing in an edge cloud node), or by a streaming server.
- the multimedia content may be stored on an HTTP server and may be delivered using HTTP.
- the content may be stored on the server in two parts: Media Presentation Description (MPD), which describes a manifest of the available content, its various alternatives, their URL addresses, and other characteristics; and segments, which contain the actual multimedia bit streams in the form of chunks, in a single or multiple files.
- the MPD provides the necessary information for clients to establish dynamic adaptive streaming over HTTP.
- the MPD contains information describing the media presentation, such as an HTTP uniform resource locator (URL) of each Segment for making GET Segment requests.
- the DASH client may obtain the MPD e.g. by using HTTP, email, thumb drive, broadcast, or other transport methods.
- the DASH client may become aware of the program timing, media-content availability, media types, resolutions, minimum and maximum bandwidths, and the existence of various encoded alternatives of multimedia components, accessibility features and required digital rights management (DRM), media-component locations on the network, and other content characteristics. Using this information, the DASH client may select the appropriate encoded alternative and start streaming the content by fetching the segments using e.g. HTTP GET requests. After appropriate buffering to allow for network throughput variations, the client may continue fetching the subsequent segments and also monitor the network bandwidth fluctuations. The client may decide how to adapt to the available bandwidth by fetching segments of different alternatives (with lower or higher bitrates) to maintain an adequate buffer.
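Very schematically, the adaptation behaviour described above might be sketched as follows; the bitrate ladder, the 0.8 safety factor and the `download` callback (assumed to return the measured throughput in bits per second) are placeholders and do not correspond to any particular DASH client implementation.

```python
from typing import Callable, List

def adapt_and_fetch(segment_urls_per_bitrate: List[List[str]],
                    bitrates: List[int],
                    download: Callable[[str], float]) -> None:
    """Very simplified DASH-style adaptation: after each segment download, pick the
    highest bitrate comfortably below the measured throughput.
    'bitrates' is assumed to be in ascending order and aligned with the URL lists."""
    level = 0                                   # start at the lowest bitrate
    for i in range(len(segment_urls_per_bitrate[0])):
        throughput_bps = download(segment_urls_per_bitrate[level][i])  # measured bits/s
        candidates = [j for j, b in enumerate(bitrates) if b < 0.8 * throughput_bps]
        level = max(candidates) if candidates else 0
```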
- a media content component or a media component may be defined as one continuous component of the media content with an assigned media component type that can be encoded individually into a media stream.
- Media content may be defined as one media content period or a contiguous sequence of media content periods.
- Media content component type may be defined as a single type of media content such as audio, video, or text.
- a media stream may be defined as an encoded version of a media content component.
- a hierarchical data model is used to structure media presentation as follows.
- a media presentation consists of a sequence of one or more Periods; each Period contains one or more Groups; each Group contains one or more Adaptation Sets; each Adaptation Set contains one or more Representations; and each Representation consists of one or more Segments.
- a Group may be defined as a collection of Adaptation Sets that are not expected to be presented simultaneously.
- An Adaptation Set may be defined as a set of interchangeable encoded versions of one or several media content components.
- a Representation is one of the alternative choices of the media content or a subset thereof typically differing by the encoding choice, e.g. by bitrate, resolution, language, codec, etc.
- a Segment contains a certain duration of media data, and metadata to decode and present the included media content.
- a Segment is identified by a URI and can typically be requested by an HTTP GET request.
- a Segment may be defined as a unit of data associated with an HTTP-URL and optionally a byte range that are specified by an MPD.
- the DASH MPD complies with Extensible Markup Language (XML) and is therefore specified through elements and attributes as defined in XML. Attributes in an XML document may be identified by a lower-case first letter and may be preceded by a '@'-sign, e.g. @attribute. To point to a specific attribute @attribute contained in an element Element, one may write Element@attribute. In DASH, all descriptor elements are structured in the same way, namely they contain a @schemeIdUri attribute that provides a URI to identify the scheme, an optional attribute @value, and an optional attribute @id.
- the semantics of the element are specific to the scheme employed.
- the URI identifying the scheme may be a URN or a URL.
- the MPD does not provide any specific information on how to use descriptor elements. It is up to the application or specification that employs DASH formats to instantiate the description elements with appropriate scheme information. Applications or specifications that use one of these elements define a Scheme Identifier in the form of a URI and the value space for the element when that Scheme Identifier is used. The Scheme Identifier appears in the @schemeIdUri attribute. In the case that a simple set of enumerated values are required, a text string may be defined for each value and this string may be included in the @value attribute. If structured data is required then any extension element or attribute may be defined in a separate namespace.
- the @id value may be used to refer to a unique descriptor or to a group of descriptors. In the latter case, descriptors with identical values for the attribute @id may be required to be synonymous, i.e. the processing of one of the descriptors with an identical value for @id is sufficient.
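As an illustration of this hierarchy and of descriptor elements, the following sketch walks a hand-written, simplified MPD with Python's standard XML parser and prints the Adaptation Sets, Representations and any SupplementalProperty descriptors it finds; the patch-visibility scheme URI used in the example is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <SupplementalProperty schemeIdUri="urn:example:patch:visibility" value="cone=0,0,1,30"/>
      <Representation id="patch1_hq" bandwidth="5000000"/>
      <Representation id="patch1_lq" bandwidth="1000000"/>
    </AdaptationSet>
  </Period>
</MPD>"""

root = ET.fromstring(MPD_XML)
for period in root.findall("mpd:Period", NS):
    for aset in period.findall("mpd:AdaptationSet", NS):
        print("AdaptationSet", aset.get("mimeType"))
        for desc in aset.findall("mpd:SupplementalProperty", NS):
            # Descriptor: scheme identified by @schemeIdUri, payload carried in @value.
            print("  descriptor", desc.get("schemeIdUri"), desc.get("value"))
        for rep in aset.findall("mpd:Representation", NS):
            print("  Representation", rep.get("id"), "bandwidth:", rep.get("bandwidth"))
```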
- An Initialization Segment may be defined as a Segment containing metadata that is necessary to present the media streams encapsulated in Media Segments.
- an Initialization Segment may comprise the Movie Box ('moov') which might not include metadata for any samples, i.e. any metadata for samples is provided in Movie Fragment ('moof') boxes.
- a Media Segment contains a certain duration of media data for playback at normal speed; this duration is referred to as the Media Segment duration or Segment duration.
- the content producer or service provider may select the Segment duration according to the desired characteristics of the service. For example, a relatively short Segment duration may be used in a live service to achieve a short end-to-end latency.
- Segment duration is typically a lower bound on the end-to-end latency perceived by a DASH client since a Segment is a discrete unit of generating media data for DASH.
- Content generation is typically done in such a manner that a whole Segment of media data is made available to a server.
- a Segment can be requested by a DASH client only when the whole duration of Media Segment is available as well as encoded and encapsulated into a Segment.
- different strategies of selecting Segment duration may be used.
- a Segment may be further partitioned into Subsegments e.g. to enable downloading segments in multiple parts.
- Subsegments may be required to contain complete access units.
- Subsegments may be indexed by Segment Index box, which contains information to map presentation time range and byte range for each Subsegment.
- the Segment Index box may also describe subsegments and stream access points in the segment by signaling their durations and byte offsets.
- a DASH client may use the information obtained from Segment Index box(es) to make an HTTP GET request for a specific Subsegment using a byte-range HTTP request. If a relatively long Segment duration is used, then Subsegments may be used to keep the size of HTTP responses reasonable and flexible for bitrate adaptation.
- the indexing information of a segment may be put in the single box at the beginning of that segment, or spread among many indexing boxes in the segment. Different methods of spreading are possible, such as hierarchical, daisy chain, and hybrid. This technique may avoid adding a large box at the beginning of the segment and therefore may prevent a possible initial download delay.
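Fetching an individual Subsegment then amounts to an HTTP GET with a byte range taken from the Segment Index ('sidx') box, along the lines of the sketch below; the URL and byte offsets are placeholders.

```python
import urllib.request

def fetch_subsegment(url: str, first_byte: int, last_byte: int) -> bytes:
    """Request one Subsegment of a Media Segment via an HTTP byte-range GET.
    The byte offsets would normally come from the Segment Index ('sidx') box."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})
    with urllib.request.urlopen(request) as response:
        return response.read()

# Hypothetical usage with placeholder values:
# data = fetch_subsegment("https://example.com/media/segment1.m4s", 0, 65535)
```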
- a DASH Preselection defines a subset of media components of an MPD that are expected to be consumed jointly by a single decoder instance, wherein consuming may comprise decoding and rendering.
- the Adaptation Set that contains the main media component for a Preselection is referred to as main Adaptation Set.
- each Preselection may include one or multiple partial Adaptation Sets. Partial Adaptation Sets may need to be processed in combination with the main Adaptation Set.
- a main Adaptation Set and partial Adaptation Sets may be indicated by one of the two means: a preselection descriptor or a Preselection element.
- a patch track forms a Representation in the context of DASH.
- the Representation element in DASH MPD may provide metadata on the patch, such as patch visibility metadata, related to the patch track.
- the client rendering module 38 obtains the patch visibility metadata. Such “obtaining” may include but is not limited to one or more of the following.
- the client rendering module 38 may obtain the metadata together with other necessary metadata to request patches 47, 48, 49 from the patch transcoding and packing module 36.
- the tracks may be announced as timed metadata representation(s) in a manifest and may be indicated as associated with related media representation(s).
- the @associationId attribute may be used in DASH MPD to indicate an association between metadata and media.
- the patch visibility metadata may be provided within an Initialization Segment of the media representation(s).
- the SampleGroupDescriptionBox may contain a possible set of patch visibility metadata, which may be considered a superset of dynamic time-varying patch visibility metadata.
- the client rendering module 38 may obtain the patch visibility metadata as a part of the Initialization Segment of the media Representation(s) that are considered as sources for fetching Subsegments.
- the patch visibility metadata may be provided in an initial part of each patch 47, 48, 49, i.e. as a part of the MovieFragmentBox.
- the SampleToGroupBox may indicate which entries of the respective SampleGroupDescriptionBox apply in the movie fragment.
- the client rendering module 38 may issue one or more HTTP GET requests to obtain an initial part of a Subsegment to obtain the patch visibility metadata.
- the client rendering module 38 may obtain information on available prescriptions of patches 47, 48, 49, wherein the information may include but is not limited to one or more of the following:
  - metadata characterizing the limit(s) obeyed by the prescription, e.g. the codec level that is obeyed by the prescription;
  - metadata indicative of the visibility cone provided by the prescription;
  - information on potential sources (e.g. track groups, tracks, or representations) that are suitable as spatiotemporal units to be included, by reference, into the prescription.
- the client rendering module 38 may use the information for selecting a suitable prescription, e.g. on the basis of one or more of the following (see the sketch after this list):
  - whether the client rendering module’s decoding capacity meets or exceeds that required by the prescription;
  - whether the viewing position and orientation are within the visibility cone provided by the prescription.
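A compact sketch of such a selection step is given below; the prescription fields and the use of the viewing direction against the cone axis are simplifying assumptions chosen to mirror the criteria listed above.

```python
import math
from typing import List, Optional, Sequence

def angle_between(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

def select_prescription(prescriptions: List[dict],
                        decoder_level: float,
                        view_direction: Sequence[float]) -> Optional[dict]:
    """Pick a prescription whose codec level the client can decode and whose
    visibility cone contains the current viewing direction."""
    for pres in prescriptions:
        if pres["required_level"] > decoder_level:
            continue
        if angle_between(view_direction, pres["cone_axis"]) <= pres["cone_half_angle"]:
            return pres
    return None
```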
- a prescription may be provided within a main Adaptation Set of a DASH preselection.
- the main Adaptation Set may contain a Representation that comprises Segments of an extractor track.
- Partial Adaptation Sets of a DASH preselection may comprise Representations, wherein a Representation may comprise segments of a patch track.
- the client rendering module 38 may select between several versions of equivalent patches 47, 48, 49.
- the client rendering module’s operation may for example comprise one or more of the following steps.
- where absolute or relative picture quality information is provided (e.g. as part of the patch visibility metadata, as discussed above), the client may obtain such absolute or relative picture quality information. Otherwise, the client rendering module 38 may conclude such absolute or relative picture quality information e.g. based on the bitrate indicated for patches 47, 48, 49 e.g. within the DASH MPD.
- the client rendering module 38 may select the highest quality or bitrate patches 47, 48, 49 that are determined to be visible, and in some cases (e.g. when the network throughput is not high enough) in the centre of the viewport. Patches 47, 48, 49 that are not visible but which may become visible if the viewing position and/or orientation changes moderately may be selected to be streamed at a lower quality. The client rendering module 38 may determine that patches 47, 48, 49 which are not visible and not expected to become visible within a reasonable time period are not requested to be streamed. In some cases video decoding by the client rendering module 38 may be optimized so that only a partial image block of the texture needs to be decoded.
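One way to picture this selection logic is to classify each patch as visible, soon-to-be-visible or hidden and map each class to a quality tier, as in the sketch below; the classification labels, the 'in_centre' flag and the quality tiers are illustrative assumptions.

```python
from typing import Dict, List, Optional

def choose_patch_qualities(patches: List[dict]) -> Dict[str, Optional[str]]:
    """Map each patch to a requested quality tier based on its visibility class.

    Each patch dict is assumed to carry:
      'id'         : patch identifier
      'visibility' : one of 'visible', 'soon_visible', 'hidden'
      'in_centre'  : True if the patch falls in the centre of the viewport
    """
    choice: Dict[str, Optional[str]] = {}
    for p in patches:
        if p["visibility"] == "visible":
            choice[p["id"]] = "high" if p["in_centre"] else "medium"
        elif p["visibility"] == "soon_visible":
            choice[p["id"]] = "low"        # may become visible on moderate head movement
        else:
            choice[p["id"]] = None         # not requested at all
    return choice
```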
- FIG. 6 illustrates a viewing point 60, a first patch 61 and a second patch 62.
- a decision on whether or not one or both of the first and second patches 61, 62 are culled may be performed by comparing view vectors 63, 64 (vectors from the viewing point 60 to a patch point on the respective patches) to view cones.
- a first point on each of the first and second patches 61, 62 is needed in order to calculate the respective view vectors 63, 64.
- patch bounding box corners (comprising eight points) can be used as patch reference points. Each bounding box corner point is tested to determine whether the respective view vector 63, 64 is inside or outside the resulting view cone 61A, 61B.
- a patch 61, 62 may be rendered only if at least one of the view vectors is forward-facing. If all view vectors are rear-facing, the patch 61, 62 may not be rendered.
- FIG. 7 illustrates the above-mentioned visibility check.
- a patch 70 is defined by the black lines describing its surface. Normals 71 are illustrated for the surface. Normals 71 that have the most angular difference are illustrated by a normal cone 73. For example, the uppermost and lowermost normal 71 have the most angular difference and hence the normal cone 73 is derived therefrom.
- a tight bounding box 74 is illustrated by the rectangle, and normal cones 73 are placed at each bounding box corner for testing visibility.
- An example viewing position 78 is indicated to the right-hand side. Before testing visibility with the normal cones 73, some early checks can be made.
- the patch 70 is projected to a first plane 76 of the bounding box 74, which is the near plane in terms of the viewing position 78.
- the depth values at the first plane 76 will be approximately zero.
- the projection on a second plane 77 (the far plane in this case) will have depth values, e.g. below 255 if 8-bit video is used. If the viewing position 78 is to the right-hand side of the far plane 77, nothing may be culled because the user may be inside the content. If the viewing position 78 is to the right-hand side of the near plane 76, then nothing may be culled as the forward surface will be seen.
- each bounding box corner is tested with the normal cone 73.
- the patch 70 can be culled if the viewing position is in a space represented by the black area 75 as the patch cannot be seen from these viewing locations.
- the black area 75 may be determined as follows, using in this case the two normal cones 73 located on the far plane 77 for ease of illustration. Each normal cone 73 is “opened” by 180 (90 + 90) degrees and straight lines drawn to define two culling zones 81, 82, referred to respectively as Acull and Bcull. The intersection of these two zones is the black zone 75 which may be culled.
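The visibility check of FIG. 6 and FIG. 7 can be summarised by the sketch below: a patch is kept if, for at least one bounding-box corner, the view vector from that corner to the viewing position lies within the normal cone opened by 90 degrees (i.e. the angle between the view vector and the cone axis is smaller than 90 degrees plus the cone half-angle); otherwise the viewing position lies in the culled zone. The vector arithmetic is standard; the data layout is an assumption.

```python
import math
from typing import Sequence

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def patch_visible(corners: Sequence[Sequence[float]],
                  cone_axis: Sequence[float],
                  cone_half_angle: float,
                  viewing_position: Sequence[float]) -> bool:
    """Return True unless the viewing position lies in the culling zone of every
    bounding-box corner, i.e. the patch cannot be front-facing from that position."""
    axis = _normalize(cone_axis)
    # The patch may be visible from a corner if the angle between the view vector
    # and the cone axis is below 90 degrees + half-angle ("opened" cone).
    limit = math.cos(math.pi / 2.0 + cone_half_angle)   # equals -sin(half_angle)
    for corner in corners:
        view_vec = _normalize(tuple(v - c for v, c in zip(viewing_position, corner)))
        if sum(a * b for a, b in zip(view_vec, axis)) > limit:
            return True      # at least one forward-facing view vector -> do not cull
    return False             # inside the culled (black) zone for all corners

# Example: unit-cube corners, patch normals pointing roughly along +X
corners = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
print(patch_visible(corners, (1, 0, 0), math.radians(20), (3, 0.5, 0.5)))   # True  (in front)
print(patch_visible(corners, (1, 0, 0), math.radians(20), (-3, 0.5, 0.5)))  # False (behind)
```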
- FIG. 8 is a flow diagram showing processing operations that may be performed by one or more modules shown in FIG. 3 or by individual modules shown in FIG. 3.
- the processing operations in FIG. 8 may be performed by hardware, software or a combination thereof at a system such as the virtual reality processing apparatus 14.
- the order of operations is not necessarily indicative of processing order.
- An operation 8.1 comprises providing a plurality of patches. Providing may mean receiving or generating the patches.
- Another operation 8.2 comprises providing patch visibility information.
- Providing may be receiving or generating the patch visibility information.
- the patch visibility information may be based on a determined visibility cone or bounding sphere or other bounding volume.
- the patch visibility information may be provided as metadata which may be quantised.
- Another operation 8.3 comprises providing one or more viewing positions of a client device.
- Providing may be receiving or generating the one or more viewing positions.
- the one or more viewing positions may be based on field-of-view or predicted field-of-view information received from a user device such as one or more of the headsets 22A - 22C shown in FIG. 2.
- the one or more viewing positions may form a contiguous bounded viewing space.
- a predicted field-of-view may be determined using known methods, such as based on historical movement data for the user, machine-learning techniques, the nature of the scene, etc.
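For example, a very simple prediction could linearly extrapolate the most recent head-orientation samples, as in the sketch below; real systems would use more robust filtering, and the sample format here is an assumption.

```python
from typing import List, Tuple

def predict_orientation(samples: List[Tuple[float, Tuple[float, float, float]]],
                        lookahead_s: float) -> Tuple[float, float, float]:
    """Linearly extrapolate (yaw, pitch, roll) in radians from the last two
    timestamped samples of the form (timestamp_s, (yaw, pitch, roll))."""
    (t0, o0), (t1, o1) = samples[-2], samples[-1]
    dt = max(t1 - t0, 1e-6)
    velocity = tuple((b - a) / dt for a, b in zip(o0, o1))
    return tuple(o + v * lookahead_s for o, v in zip(o1, velocity))

# Hypothetical usage: head turning right at ~0.2 rad per 0.1 s, predicted 0.5 s ahead
print(predict_orientation([(0.0, (0.0, 0.0, 0.0)), (0.1, (0.2, 0.0, 0.0))], 0.5))
```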
- Another operation 8.4 comprises processing the patches dependent on whether the patch visibility information indicates that the forward surface of the patch is visible.
- Said processing may comprise one or more of selectively encoding, transmitting, requesting for transmission, parsing from a container (such as a container file), decoding and rendering.
- FIG. 9 is a schematic diagram of components of either of the virtual reality processing apparatus 14, or the client rendering module 38 of a client end device, such as any of the headsets 22A - 22C and/or the media player 24 shown in FIG. 2.
- the components are those in the virtual reality processing apparatus 14, but it will be appreciated that the following is applicable to the client rendering module 38.
- the virtual reality processing apparatus 14 may have a processor 100, a memory 104 closely coupled to the processor and comprised of a RAM 102 and ROM 103, and, optionally, hardware keys 106 and a display 108.
- the server 40 may comprise one or more network interfaces 110 for connection to a network, e.g. a modem which may be wired or wireless.
- the processor 100 is connected to each of the other components in order to control operation thereof.
- the memory 104 may comprise a non-volatile memory, a hard disk drive (HDD) or a solid state drive (SSD).
- the ROM 103 of the memory 104 stores, amongst other things, an operating system 112 and may store software applications 114.
- the RAM 102 of the memory 104 may be used by the processor 100 for the temporary storage of data.
- the operating system 112 may contain code which, when executed by the processor 100, controls operation of the hardware components of the server 40.
- the processor 100 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors, and it may comprise processor circuitry.
- the virtual reality processing apparatus 14 may be a standalone computer, a server, a console, or a network thereof.
- the virtual reality processing apparatus 14 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications.
- the virtual reality processing apparatus 14 may be in communication with the remote server device in order to utilize the software application stored there.
- Patch creation may be performed based on surface normal similarity, for example using a normal visibility cone.
- Patch visibility information, or metadata, may be provided in or along the bitstream containing the coded patches.
- Patches having similar visibility information or metadata may be grouped, for example as determined using visibility cones.
- One or more texture atlases may be generated, with sub-images being arranged based on the grouping of patches.
- Encoding may be performed using independently decodable spatiotemporal units, such as motion-constrained tile sets, each enclosing one set of grouped patches. There may be provided an operation of prescribing, e.g. in extractor tracks of ISO/IEC 14496-15, how sets of independently decodable spatiotemporal units are merged in the encoded domain as a decodable bit stream.
- embodiments include selecting, for delivery and/or decoding, texture and geometry patches that are determined, based on the patch visibility information, to be visible under current or potential viewing conditions. On the basis of said selecting, one or more of the following may be performed. For example, delivery and/or decoding of the independently decodable spatiotemporal units may be performed on enclosed groups of patches determined to be visible. For example, independently decodable spatiotemporal units may be merged in the coded domain into a decodable bit stream. The merging may be performed using the prescriptions, such as extractor tracks of ISO/IEC 14496-15. For example, selected patches may be transcoded into a decodable bit stream.
- References to “providing” may include receiving, transmitting and/or generating.
- References to “means” may comprise hardware, software, firmware or a combination thereof.
- the means may be a computer, computer controller or processor, or microcontroller which may operate in association with software code.
- References to “viewing position” may be extended to a predicted viewing position associated with the client device, i.e. a future viewing position.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1806370.1A GB2572996A (en) | 2018-04-19 | 2018-04-19 | Processing video patches for three-dimensional content |
PCT/FI2019/050297 WO2019202207A1 (en) | 2018-04-19 | 2019-04-12 | Processing video patches for three-dimensional content |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3782368A1 true EP3782368A1 (de) | 2021-02-24 |
EP3782368A4 EP3782368A4 (de) | 2021-12-08 |
Family
ID=62236244
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP19789175.7A Pending EP3782368A4 (de) | 2018-04-19 | 2019-04-12 | Verarbeitung von videopatches für dreidimensionale inhalte |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3782368A4 (de) |
GB (1) | GB2572996A (de) |
WO (1) | WO2019202207A1 (de) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3709659A1 (de) * | 2019-03-11 | 2020-09-16 | InterDigital VC Holdings, Inc. | Verfahren und vorrichtung zur codierung und decodierung eines volumetrischen videos |
EP3829166A1 (de) * | 2019-11-29 | 2021-06-02 | InterDigital CE Patent Holdings | Verfahren und vorrichtung zur dekodierung eines 3d-videos |
JP2023506832A (ja) * | 2019-12-19 | 2023-02-20 | インターデジタル ブイシー ホールディングス フランス | 補助パッチを有する容積ビデオ |
CN115023739A (zh) | 2019-12-20 | 2022-09-06 | 交互数字Vc控股法国公司 | 用于对具有视图驱动的镜面反射的体积视频进行编码和解码的方法和装置 |
US12041250B2 (en) * | 2020-01-10 | 2024-07-16 | Intel Corporation | Multi-dimensional video transcoding |
CN115053531A (zh) * | 2020-01-10 | 2022-09-13 | 诺基亚技术有限公司 | 在isobmff中存储来自一个v-pcc基本流的多个图集 |
EP4115624A4 (de) | 2020-03-03 | 2024-03-27 | Nokia Technologies Oy | Effiziente auslese von volumetrischen videoatlasbitströmen |
WO2021191495A1 (en) * | 2020-03-25 | 2021-09-30 | Nokia Technologies Oy | A method, an apparatus and a computer program product for video encoding and video decoding |
US11974026B2 (en) | 2020-03-26 | 2024-04-30 | Nokia Technologies Oy | Apparatus, a method and a computer program for volumetric video |
TW202143721A (zh) * | 2020-05-06 | 2021-11-16 | 法商內數位Ce專利控股公司 | 具有阿爾法層的3d場景傳輸 |
US20230215080A1 (en) * | 2020-06-09 | 2023-07-06 | Interdigital Ce Patent Holdings, Sas | A method and apparatus for encoding and decoding volumetric video |
CN115004716A (zh) | 2020-06-24 | 2022-09-02 | 中兴通讯股份有限公司 | 容积媒体处理方法和装置 |
EP4218237A1 (de) * | 2020-09-22 | 2023-08-02 | InterDigital CE Patent Holdings, SAS | Verfahren und vorrichtung zur codierung von mpi-basiertem volumetrischem video |
WO2022073796A1 (en) * | 2020-10-08 | 2022-04-14 | Interdigital Ce Patent Holdings, Sas | A method and apparatus for adapting a volumetric video to client devices |
US20220329857A1 (en) * | 2021-04-13 | 2022-10-13 | Samsung Electronics Co., Ltd. | Mpeg media transport (mmt) signaling of visual volumetric video-based coding (v3c) content |
US11659043B1 (en) * | 2022-01-27 | 2023-05-23 | Meta Platforms Technologies, Llc | Systems and methods for predictively downloading volumetric data |
CN116156157B (zh) * | 2023-04-24 | 2023-08-18 | 长沙海信智能系统研究院有限公司 | 一种摄像头遮挡异常的检测方法及电子设备 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5357600A (en) * | 1992-10-15 | 1994-10-18 | Sun Microsystems, Inc. | Method and apparatus for the rendering of curved surfaces using a cone of normals |
US6583787B1 (en) * | 2000-02-28 | 2003-06-24 | Mitsubishi Electric Research Laboratories, Inc. | Rendering pipeline for surface elements |
JP3900839B2 (ja) * | 2001-02-19 | 2007-04-04 | 日産自動車株式会社 | 図形表裏設定装置および図形表裏設定方法 |
JP3961525B2 (ja) * | 2004-09-22 | 2007-08-22 | 株式会社コナミデジタルエンタテインメント | 画像処理装置、画像処理方法、ならびに、プログラム |
TWI450215B (zh) * | 2010-12-14 | 2014-08-21 | Via Tech Inc | 影像物件之隱藏面移除的預先揀選方法、系統以及電腦可記錄媒體 |
US20150199383A1 (en) * | 2014-01-16 | 2015-07-16 | Nokia Corporation | Systems and Methods for Indexing and Retrieving Images |
KR20160051155A (ko) * | 2014-10-31 | 2016-05-11 | 삼성전자주식회사 | 렌더링 장치 및 방법 |
US9412034B1 (en) * | 2015-01-29 | 2016-08-09 | Qualcomm Incorporated | Occlusion handling for computer vision |
FI20165114A (fi) * | 2016-02-17 | 2017-08-18 | Nokia Technologies Oy | Laitteisto, menetelmä ja tietokoneohjelma videokoodausta ja videokoodauksen purkua varten |
GB2547689A (en) * | 2016-02-26 | 2017-08-30 | Nokia Technologies Oy | A multi-camera device and a calibration method |
EP3429210A1 (de) * | 2017-07-13 | 2019-01-16 | Thomson Licensing | Verfahren, vorrichtung und stream zur verschlüsselung und entschlüsselung von volumetrischem video |
2018
- 2018-04-19 GB GB1806370.1A patent/GB2572996A/en not_active Withdrawn
2019
- 2019-04-12 EP EP19789175.7A patent/EP3782368A4/de active Pending
- 2019-04-12 WO PCT/FI2019/050297 patent/WO2019202207A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2019202207A1 (en) | 2019-10-24 |
EP3782368A4 (de) | 2021-12-08 |
GB2572996A (en) | 2019-10-23 |
GB201806370D0 (en) | 2018-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019202207A1 (en) | Processing video patches for three-dimensional content | |
KR102246002B1 (ko) | 가상 현실 미디어 콘텐트의 스트리밍을 개선하는 방법, 디바이스, 및 컴퓨터 프로그램 | |
US11653065B2 (en) | Content based stream splitting of video data | |
CN109565610B (zh) | 处理全向视频的方法、装置以及存储介质 | |
EP3466093A1 (de) | Verfahren, vorrichtungen und computerprogramm zum adaptiven streaming von medieninhalten der virtuellen realität | |
US11539983B2 (en) | Virtual reality video transmission method, client device and server | |
WO2019229293A1 (en) | An apparatus, a method and a computer program for volumetric video | |
US20230026014A1 (en) | Video processing device and manifest file for video streaming | |
US20230319328A1 (en) | Reference of neural network model for adaptation of 2d video for streaming to heterogeneous client end-points | |
US20240292041A1 (en) | Adaptation of 2d video for streaming to heterogenous client end-points | |
US20230034937A1 (en) | Media file encapsulating method, media file decapsulating method, and related devices | |
US20240179203A1 (en) | Reference of neural network model by immersive media for adaptation of media for streaming to heterogenous client end-points | |
WO2023284487A1 (zh) | 容积媒体的数据处理方法、装置、设备以及存储介质 | |
US20240080501A1 (en) | Processing of multi-view video | |
JP2024512629A (ja) | ライトフィールド/ホログラフィック媒体のアセット再利用性 | |
CN116848840A (zh) | 多视图视频流式传输 | |
EP4391550A1 (de) | Verarbeitung von inhalten für anwendungen der erweiterten realität |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20201119 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
A4 | Supplementary search report drawn up and despatched |
Effective date: 20211110 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: H04N 21/482 20110101ALI20211104BHEP Ipc: H04N 21/262 20110101ALI20211104BHEP Ipc: H04N 21/462 20110101ALI20211104BHEP Ipc: H04N 21/4402 20110101ALI20211104BHEP Ipc: H04N 21/2662 20110101ALI20211104BHEP Ipc: H04N 21/2343 20110101ALI20211104BHEP Ipc: H04N 21/854 20110101ALI20211104BHEP Ipc: H04N 21/845 20110101ALI20211104BHEP Ipc: H04N 21/81 20110101ALI20211104BHEP Ipc: G06T 19/00 20110101ALI20211104BHEP Ipc: G06T 3/00 20060101ALI20211104BHEP Ipc: G06T 15/20 20110101ALI20211104BHEP Ipc: G06T 15/40 20110101ALI20211104BHEP Ipc: H04N 21/218 20110101ALI20211104BHEP Ipc: H04N 13/178 20180101ALI20211104BHEP Ipc: H04N 19/17 20140101ALI20211104BHEP Ipc: H04N 19/167 20140101ALI20211104BHEP Ipc: H04N 19/597 20140101AFI20211104BHEP |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20240903 |