WO2022224112A1 - Inherited geometry patches - Google Patents

Inherited geometry patches

Info

Publication number
WO2022224112A1
WO2022224112A1 (PCT/IB2022/053575)
Authority
WO
WIPO (PCT)
Prior art keywords
patch
geometry
inherited
patches
information
Prior art date
Application number
PCT/IB2022/053575
Other languages
English (en)
Inventor
Patrice Rondao Alface
Deepa NAIK
Vinod Kumar Malamal Vadakital
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of WO2022224112A1 (patent/WO2022224112A1/fr)

Links

Classifications

    • H04N 19/597: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding, specially adapted for multi-view video sequence encoding (under H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION)
    • G06T 9/001: Image coding, model-based coding, e.g. wire frame (under G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL)
    • H04N 19/46: Embedding additional information in the video signal during the compression process
    • H04N 19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding

Definitions

  • JU Joint Undertaking
  • The example and non-limiting embodiments relate generally to volumetric video, and specifically to enabling a reduction in redundant signaling of geometry/depth information.
  • an apparatus comprising: at least one processor; and at least one non-transitory memory and computer program code, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: determine that at least two of a plurality of patches relate to a same region of a surface of a volumetric content, wherein the at least two patches comprise at least partially different texture information; select one of the at least two patches to be a reference patch; select at least one other of the at least two patches to be an inherited geometry patch; encode the reference patch, wherein the encoded reference patch comprises, at least, geometry information; encode the inherited geometry patch, wherein the encoded inherited geometry patch does not comprise geometry information, wherein the encoded inherited geometry patch comprises an indication of the reference patch; and transmit at least the encoded reference patch, the encoded inherited geometry patch, and an indication that patch inheritance is supported.
  • a method comprising: determining that at least two of a plurality of patches relate to a same region of a surface of a volumetric content, wherein the at least two patches comprise at least partially different texture information; selecting one of the at least two patches to be a reference patch; selecting at least one other of the at least two patches to be an inherited geometry patch; encoding the reference patch, wherein the encoded reference patch comprises, at least, geometry information; encoding the inherited geometry patch, wherein the encoded inherited geometry patch does not comprise geometry information, wherein the encoded inherited geometry patch comprises an indication of the reference patch; and transmitting at least the encoded reference patch, the encoded inherited geometry patch, and an indication that patch inheritance is supported.
  • an apparatus comprising means for performing: determining that at least two of a plurality of patches relate to a same region of a surface of a volumetric content, wherein the at least two patches comprise at least partially different texture information; selecting one of the at least two patches to be a reference patch; selecting at least one other of the at least two patches to be an inherited geometry patch; encoding the reference patch, wherein the encoded reference patch comprises, at least, geometry information; encoding the inherited geometry patch, wherein the encoded inherited geometry patch does not comprise geometry information, wherein the encoded inherited geometry patch comprises an indication of the reference patch; and transmitting at least the encoded reference patch, the encoded inherited geometry patch, and an indication that patch inheritance is supported.
  • a non-transitory computer-readable medium comprising program instructions stored thereon which, when executed with at least one processor, cause the at least one processor to: determine that at least two of a plurality of patches relate to a same region of a surface of a volumetric content, wherein the at least two patches comprise at least partially different texture information; select one of the at least two patches to be a reference patch; select at least one other of the at least two patches to be an inherited geometry patch; encode the reference patch, wherein the encoded reference patch comprises, at least, geometry information; encode the inherited geometry patch, wherein the encoded inherited geometry patch does not comprise geometry information, wherein the encoded inherited geometry patch comprises an indication of the reference patch; and transmit at least the encoded reference patch, the encoded inherited geometry patch, and an indication that patch inheritance is supported.
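  • As a purely illustrative, non-normative sketch of the encoder-side flow set out above, the following Python fragment groups patches by surface region, keeps geometry only for one reference patch per region, and signals the inheritance relation for the others; the Patch structure, field names, and reference-selection rule are assumptions for illustration and are not taken from the V3C/MIV specification.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Patch:
    patch_id: int
    region_id: int             # surface region of the volumetric content
    view_id: int               # projection camera view
    texture: bytes             # attribute samples (placeholder)
    geometry: Optional[bytes]  # depth samples, or None

def encode_with_inheritance(patches):
    """Keep geometry only for one reference patch per surface region and
    signal the inheritance relation for the other co-located patches."""
    by_region = {}
    for p in patches:
        by_region.setdefault(p.region_id, []).append(p)
    encoded = []
    for group in by_region.values():
        reference = group[0]  # e.g. pick the first view as the reference patch
        encoded.append({"patch_id": reference.patch_id,
                        "view_id": reference.view_id,
                        "texture": reference.texture,
                        "geometry": reference.geometry})
        for p in group[1:]:
            encoded.append({"patch_id": p.patch_id,
                            "view_id": p.view_id,
                            "texture": p.texture,
                            # no geometry; signal the reference patch instead
                            "inherited_from": reference.patch_id})
    # bitstream-level indication that patch inheritance is supported
    return {"patch_inheritance_enabled": True, "patches": encoded}
```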
  • an apparatus comprising: at least one processor; and at least one non-transitory memory and computer program code, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: determine a reference patch and an inherited geometry patch, wherein the reference patch comprises geometry information, wherein the inherited geometry patch does not comprise geometry information, wherein the inherited geometry patch is associated with the reference patch; and determine a derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch.
  • a method comprising: determining a reference patch and an inherited geometry patch, wherein the reference patch comprises geometry information, wherein the inherited geometry patch does not comprise geometry information, wherein the inherited geometry patch is associated with the reference patch; and determining a derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch.
  • an apparatus comprising means for performing: determining a reference patch and an inherited geometry patch, wherein the reference patch comprises geometry information, wherein the inherited geometry patch does not comprise geometry information, wherein the inherited geometry patch is associated with the reference patch; and determining a derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch.
  • a non-transitory computer-readable medium comprising program instructions stored thereon which, when executed with at least one processor, cause the at least one processor to: determine a reference patch and an inherited geometry patch, wherein the reference patch comprises geometry information, wherein the inherited geometry patch does not comprise geometry information, wherein the inherited geometry patch is associated with the reference patch; and determine a derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch.
  • FIG. 1 is a block diagram of one possible and non limiting exemplary system in which the exemplary embodiments may be practiced;
  • FIG. 2 is a block diagram of one possible and non limiting exemplary system in which the exemplary embodiments may be practiced;
  • FIG. 3 is a diagram illustrating features as described herein;
  • FIG. 4 is a diagram illustrating features as described herein;
  • FIG. 5 is a diagram illustrating features as described herein;
  • FIG. 6 is a diagram illustrating features as described herein;
  • FIG. 7 is a diagram illustrating features as described herein;
  • FIG. 8 is a diagram illustrating features as described herein;
  • FIG. 9 is a diagram illustrating features as described herein;
  • FIG. 10a is a diagram illustrating features as described herein;
  • FIG. 10b is a diagram illustrating features as described herein;
  • FIG. 11a is a diagram illustrating features as described herein;
  • FIG. 11b is a diagram illustrating features as described herein;
  • FIG. 12 is a flowchart illustrating steps as described herein.
  • FIG. 13 is a flowchart illustrating steps as described herein.
  • DSP: digital signal processor
  • eNB or eNodeB: evolved Node B (e.g., an LTE base station)
  • EN-DC: E-UTRA-NR dual connectivity
  • en-gNB or En-gNB: node providing NR user plane and control plane protocol terminations towards the UE, and acting as secondary node in EN-DC
  • E-UTRA: evolved universal terrestrial radio access, e.g., the LTE radio access technology
  • FDMA: frequency division multiple access
  • gNB (or gNodeB): base station for 5G/NR, e.g., a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC
  • UE: user equipment, e.g., a wireless, typically mobile device
  • FIG. 1 shows an example block diagram of an apparatus 50.
  • the apparatus may be configured to perform various functions such as, for example, gathering information by one or more sensors, encoding and/or decoding information, receiving and/or transmitting information, analyzing information gathered or received by the apparatus, or the like.
  • a device configured to encode a video scene may (optionally) comprise one or more microphones for capturing the scene and/or one or more sensors, such as cameras, for capturing information about the physical environment in which the scene is captured.
  • a device configured to encode a video scene may be configured to receive information about an environment in which a scene is captured and/or a simulated environment.
  • a device configured to decode and/or render the video scene may be configured to receive a Moving Picture Experts Group immersive codec family (MPEG-I) bitstream comprising the encoded video scene.
  • a device configured to decode and/or render the video scene may comprise one or more speakers/audio transducers and/or displays, and/or may be configured to transmit a decoded scene or signals to a device comprising one or more speakers/audio transducers and/or displays.
  • a device configured to decode and/or render the video scene may comprise a user equipment, a head-mounted display, or another device capable of rendering to a user an AR, VR and/or MR experience.
  • the electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. Alternatively, the electronic device may be a computer or part of a computer that is not mobile. It should be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may process data.
  • the electronic device 50 may comprise a device that can access a network and/or cloud through a wired or wireless connection.
  • the electronic device 50 may comprise one or more processors 56, one or more memories 58, and one or more transceivers 52 interconnected through one or more buses.
  • the one or more processors 56 may comprise a central processing unit (CPU) and/or a graphical processing unit (GPU).
  • Each of the one or more transceivers 52 includes a receiver and a transmitter.
  • the one or more buses may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like.
  • the one or more transceivers may be connected to one or more antennas 44.
  • the one or more memories 58 may include computer program code. The one or more memories 58 and the computer program code may be configured to, with the one or more processors 56, cause the electronic device 50 to perform one or more of the operations as described herein.
  • the electronic device 50 may connect to a node of a network.
  • the network node may comprise one or more processors, one or more memories, and one or more transceivers interconnected through one or more buses.
  • Each of the one or more transceivers includes a receiver and a transmitter.
  • the one or more buses may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like.
  • the one or more transceivers may be connected to one or more antennas.
  • the one or more memories may include computer program code.
  • the one or more memories and the computer program code may be configured to, with the one or more processors, cause the network node to perform one or more of the operations as described herein.
  • the electronic device 50 may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
  • the electronic device 50 may further comprise an audio output device 38 which in embodiments of the invention may be any one of: an earpiece, speaker, or an analogue audio or digital audio output connection.
  • the electronic device 50 may also comprise a battery (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell, or clockwork generator).
  • the electronic device 50 may further comprise a camera 42 or other sensor capable of recording or capturing images and/or video. Additionally or alternatively, the electronic device 50 may further comprise a depth sensor.
  • the electronic device 50 may further comprise a display 32.
  • the electronic device 50 may further comprise an infrared port for short range line of sight communication to other devices.
  • the apparatus 50 may further comprise any suitable short-range communication solution such as for example a Bluetooth™ wireless connection or a USB/FireWire wired connection.
  • an electronic device 50 configured to perform example embodiments of the present disclosure may have fewer and/or additional components, which may correspond to what processes the electronic device 50 is configured to perform.
  • an apparatus configured to encode a video might not comprise a speaker or audio transducer and may comprise a microphone, while an apparatus configured to render the decoded video might not comprise a microphone and may comprise a speaker or audio transducer.
  • the electronic device 50 may comprise a controller 56, processor or processor circuitry for controlling the apparatus 50.
  • the controller 56 may be connected to memory 58 which in embodiments of the invention may store both data in the form of image and audio data and/or may also store instructions for implementation on the controller 56.
  • the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and/or decoding of audio and/or video data or assisting in coding and/or decoding carried out by the controller.
  • the electronic device 50 may further comprise a card reader 48 and a smart card 46, for example a UICC and UICC reader, for providing user information and being suitable for providing authentication information for authentication and authorization of the user/electronic device 50 at a network.
  • the electronic device 50 may further comprise an input device 34, such as a keypad, one or more input buttons, or a touch screen input device, for providing information to the controller 56.
  • the electronic device 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system, or a wireless local area network.
  • the apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and/or for receiving radio frequency signals from other apparatus(es).
  • the electronic device 50 may comprise a microphone 38, camera 42, and/or other sensors capable of recording or detecting audio signals, image/video signals, and/or other information about the local/virtual environment, which are then passed to the codec 54 or the controller 56 for processing.
  • the electronic device 50 may receive the audio/image/video signals and/or information about the local/virtual environment for processing from another device prior to transmission and/or storage.
  • the electronic device 50 may also receive either wirelessly or by a wired connection the audio/image/video signals and/or information about the local/virtual environment for encoding/decoding.
  • the structural elements of electronic device 50 described above represent examples of means for performing a corresponding function.
  • the memory 58 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the memory 58 may be a non-transitory memory.
  • the memory 58 may be means for performing storage functions.
  • the controller 56 may be or comprise one or more processors, which may be of any type suitable to the local technical environment, and may include one or more of general-purpose computers, special-purpose computers, microprocessors, digital signal processors (DSPs) and processors based on a multi-core processor architecture, as non-limiting examples.
  • the controller 56 may be means for performing functions.
  • the electronic device 50 may be configured to perform capture of a volumetric scene according to example embodiments of the present disclosure.
  • the electronic device 50 may comprise a camera 42 or other sensor capable of recording or capturing images and/or video.
  • the electronic device 50 may also comprise one or more transceivers 52 to enable transmission of captured content for processing at another device.
  • Such an electronic device 50 may or may not include all the modules illustrated in FIG. 1.
  • the electronic device 50 may be configured to perform processing of volumetric video content according to example embodiments of the present disclosure.
  • the electronic device 50 may comprise a controller 56 for processing images to produce volumetric video content, a controller 56 for processing volumetric video content to project 3D information into 2D information, patches, and auxiliary information, and/or a codec 54 for encoding 2D information, patches, and auxiliary information into a bitstream for transmission to another device with radio interface 52.
  • Such an electronic device 50 may or may not include all the modules illustrated in FIG. 1.
  • the electronic device 50 may be configured to perform encoding or decoding of 2D information representative of volumetric video content according to example embodiments of the present disclosure.
  • the electronic device 50 may comprise a codec 54 for encoding or decoding 2D information representative of volumetric video content.
  • Such an electronic device 50 may or may not include all the modules illustrated in FIG. 1.
  • the electronic device 50 may be configured to perform rendering of decoded 3D volumetric video according to example embodiments of the present disclosure.
  • the electronic device 50 may comprise a controller for projecting 2D information to reconstruct 3D volumetric video, and/or a display 32 for rendering decoded 3D volumetric video.
  • Such an electronic device 50 may or may not include all the modules illustrated in FIG. 1.
  • the system 10 comprises multiple communication devices which can communicate through one or more networks.
  • the system 10 may comprise any combination of wired or wireless networks including, but not limited to a wireless cellular telephone network (such as a GSM, UMTS, E-UTRA, LTE, CDMA, 4G, 5G network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and/or the Internet.
  • the system 10 may include both wired and wireless communication devices and/or electronic devices suitable for implementing embodiments of the invention.
  • the system shown in FIG. 2 comprises a mobile telephone network 11 and a representation of the internet 28.
  • Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
  • the example communication devices shown in the system 10 may include, but are not limited to, an apparatus 15, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22, and a head-mounted display (HMD) 17.
  • the electronic device 50 may comprise any of those example communication devices. In an example embodiment of the present disclosure, more than one of these devices, or a plurality of one or more of these devices, may perform the disclosed process(es). These devices may connect to the internet 28 through a wireless connection 2.
  • the embodiments may also be implemented in a set-top box; e.g. a digital TV receiver, which may/may not have a display or wireless capabilities, in tablets or (laptop) personal computers (PC), which have hardware and/or software to process neural network data, in various operating systems, and in chipsets, processors, DSPs and/or embedded systems offering hardware/software based coding.
  • the embodiments may also be implemented in cellular telephones such as smart phones, tablets, personal digital assistants (PDAs) having wireless communication capabilities, portable computers having wireless communication capabilities, image capture devices such as digital cameras having wireless communication capabilities, gaming devices having wireless communication capabilities, music storage and playback appliances having wireless communication capabilities, Internet appliances permitting wireless Internet access and browsing, tablets with wireless communication capabilities, as well as portable units or terminals that incorporate combinations of such functions.
  • Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24, which may be, for example, an eNB, gNB, etc.
  • the base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28.
  • the system may include additional communication devices and communication devices of various types.
  • the communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11, 3GPP Narrowband IoT and any similar wireless communication technology.
  • a communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not
  • a channel may refer either to a physical channel or to a logical channel.
  • a physical channel may refer to a physical transmission medium such as a wire
  • a logical channel may refer to a logical connection over a multiplexed medium, capable of conveying several logical channels.
  • a channel may be used for conveying an information signal, for example a bitstream, which may be an MPEG-I bitstream, from one or several senders (or transmitters) to one or several receivers.
  • VR: virtual reality
  • AR: augmented reality
  • MR: mixed reality
  • VR is an area of technology in which video content may be provided, e.g. streamed, to a VR display system.
  • the VR display system may be provided with a live or stored feed from a video content source, the feed representing a VR space or world for immersive output through the display system.
  • a virtual space or virtual world is any computer-generated version of a space, for example a captured real-world space, in which a user can be immersed through a display system such as a VR headset.
  • a VR headset may be configured to provide VR video and audio content to the user, e.g. through the use of a pair of video screens and headphones incorporated within the headset.
  • Augmented reality is similar to VR in that video content may be provided, as above, which may be overlaid over or combined with aspects of a real-world environment in which the AR content is being consumed.
  • a user of AR content may therefore experience a version of the real-world environment that is "augmented" with additional virtual features, such as virtual visual and/or audio objects.
  • a device may provide AR video and audio content overlaid over a visible or recorded version of the real-world visual and audio elements.
  • the encoding, decoding, and/or rendering of the content may take place at a single device or at two or more separate devices.
  • the encoding of the content may take place at a user equipment, a server, or another electronic device capable of performing the processes herein described.
  • the encoded content may then be transmitted to another device, which may then store, decode, and/or render the content. Transmission of the encoded content may, for example, occur over a network connection, such as an LTE, 5G, and/or NR network.
  • the encoding of the content may take place at a server.
  • the encoded content may then be stored on a suitable file server, which may then be transmitted to another device, which may then store, decode, and/or render the content.
  • Volumetric video data may represent a three-dimensional scene or object and may be used as input for AR, VR, and MR applications. Because volumetric video describes a 3D scene (or object), such data can be viewed from any viewpoint. Therefore, volumetric video is an important format for any AR, VR, or MR applications, especially for providing six degrees of freedom (6DoF) viewing capabilities.
  • Such data may describe geometry (shape, size, position in 3D-space, etc.) and respective attributes (e.g. color, opacity, reflectance, etc.), plus any possible temporal changes of the geometry and attributes at given time instances.
  • Temporal information about the scene may be included in the form of individual capture instances, e.g. "frames" in 2D video, or other means, e.g. position of an object as a function of time.
  • Volumetric video may be generated from 3D models, e.g. computer-generated imagery (CGI); captured from real-world scenes using a variety of capture solutions, e.g. multi-camera, laser scan, combination of video and dedicated depth sensors, etc.; or generated from a combination of generated data and real-world data.
  • Increasing computational resources and advances in 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes. Representation of the 3D data depends on how the 3D data is used. Infrared, lasers, time-of-flight, and structured-light sensors are examples of devices that may be used to capture such 3D data.
  • Dense voxel arrays have been used to represent volumetric medical data.
  • In 3D graphics, polygonal meshes are extensively used.
  • Point clouds are well suited for applications such as capturing real world 3D scenes where the topology is not necessarily a 2D manifold.
  • Another way to represent 3D data is coding this 3D data as a set of texture and depth maps, as is the case in the multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps, and multi-level surface maps.
  • the reconstructed 3D scene may contain tens or even hundreds of millions of points. If such representations are to be stored or interchanged between entities, then efficient compression becomes essential.
  • Standard volumetric video representation formats such as point clouds, meshes, voxels, etc. suffer from poor temporal compression performance. Identifying correspondences for motion-compensation in 3D-space is an ill-defined problem, as both geometry and respective attributes may change. For example, successive temporal "frames" do not necessarily have the same number of meshes, points, or voxels. Therefore, compression of dynamic 3D scenes may be inefficient. 2D-video based approaches for compressing volumetric data, e.g. multiview+depth, have much better compression efficiency, but rarely cover the full scene. Therefore, they may provide only limited 6DoF capabilities.
  • a 3D scene may be projected onto one or more geometries. These geometries may be "unfolded" into 2D planes (e.g. two planes per geometry: one for texture, one for depth), which are then encoded using standard 2D video compression technologies. Relevant projection geometry information is transmitted alongside the encoded video files to the decoder. The decoder decodes the video and performs the inverse projection to regenerate the 3D scene in any desired representation format (which might not necessarily be the starting format).
  • Projecting volumetric models onto 2D planes allows for using standard 2D video coding tools with highly efficient temporal compression.
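  • As a rough, non-normative illustration of this projection idea, the sketch below orthographically projects a colored point set onto a depth plane and a texture plane along one axis and then inverts the projection; the axis choice, plane resolution, and nearest-point rule are assumptions for illustration only.

```python
import numpy as np

def project_to_planes(points, colors, res=(64, 64)):
    """Orthographically project 3D points onto a depth plane and a texture
    plane along the +Z axis (nearest point wins)."""
    depth = np.full(res, np.inf)
    texture = np.zeros(res + (3,), dtype=np.uint8)
    for (x, y, z), c in zip(points, colors):
        u, v = int(x), int(y)            # points assumed already in pixel units
        if 0 <= u < res[0] and 0 <= v < res[1] and z < depth[u, v]:
            depth[u, v] = z
            texture[u, v] = c
    return depth, texture

def unproject(depth, texture):
    """Inverse projection: regenerate a point set from the two planes."""
    points, colors = [], []
    for u in range(depth.shape[0]):
        for v in range(depth.shape[1]):
            if np.isfinite(depth[u, v]):
                points.append((u, v, depth[u, v]))
                colors.append(tuple(texture[u, v]))
    return points, colors
```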
  • coding efficiency may be greatly increased.
  • Using geometry-projections instead of prior-art 2D-video based approaches, e.g. multiview+depth, may provide better coverage of a 3D scene or object.
  • 6DoF capabilities may be improved.
  • Using several geometries for individual objects may further improve the coverage of a scene.
  • standard video encoding hardware may be utilized for real-time compression/decompression of the projected planes. The projection and reverse projection steps are of low complexity.
  • V-PCC video-based point cloud compression
  • V-PCC compression and decompression are further described in MPEG N19092.
  • V-PCC compression may take place during the encoding stage.
  • V-PCC is now visual volumetric video-based coding (V3C), which relates to a core part shared between ISO/IEC 23090-5 (formerly V-PCC (Video-based Point Cloud Compression)) and ISO/IEC 23090-12 (formerly MIV (MPEG Immersive Video)).
  • V3C will not be issued as a separate document, but as part of ISO/IEC 23090-5 (expected to include clauses 1-8 of the current V-PCC text).
  • ISO/IEC 23090-12 will refer to this common part.
  • ISO/IEC 23090-5 will be renamed to V3C PCC, ISO/IEC 23090-12 renamed to V3C MIV.
  • MIV MPEG-I Immersive Video
  • MIV relates to the compression of immersive video content, also known as volumetric video, in which a real or virtual 3D scene is captured by multiple real or virtual cameras.
  • MIV enables storage and distribution of immersive video content over existing and future networks, for playback with 6 degrees of freedom (6DoF) of view position and orientation within a limited viewing space and with different fields of view depending on the capture setup.
  • 6DoF 6 degrees of freedom
  • Example embodiments of the present disclosure may be implemented with respect to MIV, V-PCC, or any immersive/volumetric video content compression procedure known to a person of ordinary skill in the art.
  • FIGs. 3-6 illustrate examples related to V-PCC.
  • FIGs. 7-9 illustrate examples related to MIV.
  • a person of ordinary skill in the art may understand how example embodiments of the present disclosure, while described with respect to MIV, may be adapted to V-PCC or other volumetric video compression procedures/standards.
  • a point cloud frame may be processed.
  • the volumetric data may be represented as a set of 3D projections in different components; in other words, the input point cloud data/frame may be projected onto one or more geometries.
  • the input point cloud frame may be used to generate one or more 3D patches.
  • the 3D image may be decomposed into far and near components for geometry and corresponding attribute components.
  • the 2D projection may be composed of independent patches based on geometry characteristics of the input point cloud frame.
  • the patch information and the input point cloud frame may be used to generate one or more attribute images describing the attributes associated with the patches, at 320.
  • the patch information may be used to perform patch packing, at 310.
  • an occupancy map 2D image may be created to indicate parts of an image that may be used.
  • the input point cloud frame, the patch information, and the occupancy map produced via the patch packing process may be used to generate one or more geometry images describing the patches, at 330.
  • the packed patches/occupancy map may be compressed at 335, resulting in an occupancy sub-stream sent to the multiplexer 360.
  • Image padding may be applied to the one or more geometry images at 345, and the padded geometry images may be compressed at 355, resulting in a geometry sub-stream sent to the multiplexer 360.
  • the image padding may be based on an occupancy map reconstructed from the compressed patches, at 345.
  • Smoothing of the attribute image may be based on a geometry image reconstructed from the compressed geometry image and an occupancy map reconstructed from the compressed patches/occupancy map, at 325.
  • the reconstructed geometry information may be smoothed outside the encoding loop as a post-processing step. Additional smoothing parameters that were used for the smoothing process may be transferred as supplemental information for the decoding process.
  • the generation of the attribute image may be based on the smoothed geometry and an occupancy map reconstructed from the compressed patches/occupancy map, at 320.
  • Image padding may be applied to the one or more attribute images at 340, and the padded attribute images may be compressed at 350, resulting in an attribute sub-stream sent to the multiplexer 360.
  • the image padding may be based on an occupancy map reconstructed from the compressed patches/occupancy map, at 340.
  • the sequence of the generated patches may be compressed at 315, resulting in a patch sub-stream sent to the multiplexer 360.
  • This patch sub-stream may be considered as comprising compressed auxiliary information.
  • the multiplexer 360 may multiplex the patch sub-stream, the attribute sub-stream, the geometry sub-stream, and the occupancy sub-stream to produce a compressed bitstream that may be transmitted to a decoder, for example a decoder implementing the decompression process illustrated at FIG. 4.
  • Referring to FIG. 4, illustrated is an overview of the V-PCC decompression process.
  • a compressed bitstream may be received by the demultiplexer 410.
  • the demultiplexer 410 may demultiplex the compressed bitstream into a sequence parameter set (SPS) sub-stream, a patch sub-stream, an occupancy sub-stream, a geometry sub-stream, and an attribute sub-stream.
  • the SPS may be parsed at 420.
  • the SPS may be considered auxiliary information, which may have been entropy coded.
  • the patch sequence may be decompressed at 430, resulting in patch information.
  • the decompression of the patch sequence may be based, at least partially, on auxiliary information.
  • the occupancy sub-stream may be decompressed at 440, resulting in an occupancy map.
  • the occupancy map may have been compressed using video compression, and may have to be upscaled to the nominal resolution.
  • the nearest neighbor method may be applied for upscaling.
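  • The nearest-neighbor upscaling mentioned above can be expressed in a few lines; the following sketch assumes the decoded occupancy map is a binary NumPy array and that the scaling factor is an integer (assumptions for illustration).

```python
import numpy as np

def upscale_occupancy(occupancy, factor):
    """Nearest-neighbor upscaling of a sub-sampled occupancy map back to the
    nominal atlas resolution (each value is simply repeated in both axes)."""
    return np.repeat(np.repeat(occupancy, factor, axis=0), factor, axis=1)

# example: a 2x2 occupancy map scaled by 4 becomes 8x8
occ = np.array([[1, 0], [0, 1]], dtype=np.uint8)
print(upscale_occupancy(occ, 4).shape)  # (8, 8)
```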
  • the decompression of the occupancy sub-stream may be based, at least partially, on auxiliary information.
  • the geometry sub-stream may be decompressed at 450, resulting in one or more geometry images.
  • the decompression of the geometry sub-stream may be based, at least partially, on auxiliary information.
  • the attribute sub-stream may be decompressed at 460, resulting in one or more attribute images.
  • the decompression of the attribute sub-stream may be based, at least partially, on auxiliary information.
  • the geometry and attributes may be reconstructed, at 470.
  • Geometry post-processing, such as smoothing, may be applied to reconstruct point cloud geometry information, at 480.
  • the geometry post-processing may be based, at least partially, on auxiliary information.
  • the attributes of the point cloud may be reconstructed based on the decoded attribute video stream and reconstructed information for smoothed geometry and, if present, occupancy map and auxiliary information.
  • an additional attribute smoothing method may be used for point cloud refinement, at 490.
  • the attribute transfer and smoothing may be based, at least partially, on auxiliary information and/or reconstructed geometry/attributes.
  • the compressed bitstream generated may result in a network abstraction layer (NAL) sample bitstream.
  • Referring to FIG. 5, illustrated is an example network abstraction layer (NAL) sample bitstream.
  • the NAL sample stream may be used for coded representation of dynamic point clouds.
  • the flexible coding structure may be implemented by using the picture order count (POC) concept, as well as a class to manage the list of parameters in the NAL sample stream and V-PCC sample stream.
  • Atlas 0 (610) and Atlas 1 (620) may each be associated with a NAL bitstream having its own sample stream header specifying a precision.
  • Referring to FIG. 7, illustrated is an example of an encoding process in the MIV extension of V3C.
  • the encoding process may comprise preparation of source material (720, 722, 724, 726); per-group encoding (730); bitstream formatting (752, 754, 756); and video encoding (770).
  • the source material may comprise source views (710), including view parameters (712), geometry component (714), attribute components (716), and/or, optionally, entity map (718).
  • the source material may be processed/prepared according to one or more of the following steps: geometry quality assessment (720); split source in groups (722); synthesize inpainted background (724); and/or view labelling (726).
  • the result of preparing the source material may be per-group encoded (730).
  • the groups may be encoded, as further described with reference to FIG. 8.
  • Bitstream formatting may be performed on parameter set (742), view parameters list (744), and/or atlas data (746) resulting from the per-group encoding (730).
  • Video sub-bitstream encoding (770) may be performed based on raw geometry (762), attribute video data (764), and/or occupancy video data (766), which, after encoding (770), may be packed (754) and multiplexed (756) with a formatted bitstream (752) which may be generated based on SEI messages (750), parameter set (742), view parameters list (744), and/or atlas data (746).
  • the formatted bitstream (752) may comprise a V3C sample stream with MIV extensions.
  • a bitstream (780), which may be a single file, may be the result of the multiplexing (756).
  • the process (730) may include steps directed to preparing source material (810, 812); pruning processes (820, 822); atlas processes (830, 832, 834, 836, 838); and/or video processes (840, 842, 844, 846).
  • the source views (710) may be used to generate a view parameters list (742).
  • the process (730) may comprise automatic parameter selection (810); and, optionally, separation to entity layers (812).
  • a parameter set (744) may be generated based on the automatic parameter selection (810).
  • the encoding may comprise pixel pruning (820) and aggregating pruning masks (822).
  • active pixels may be clustered (830); the clusters may be split (832); patches may be packed (834); patch attribute average value(s) may be modified (836); and/or color correction (838) may be, optionally, performed.
  • Atlas data (746) may be generated as a result of these atlas processes.
  • video data may be generated (840); geometry may be quantized (842) and scaled (844).
  • occupancy may be scaled (846).
  • the rendering (930) may be based on decoded access unit (910) (e.g. all conformance points).
  • the decoded access unit (910) may comprise parameter sets (912), which may correspond to the parameter set (742) of FIG. 7.
  • the decoded access unit (910) may comprise view parameters list (914), which may correspond to the view parameters list (744) of FIG. 7.
  • the decoded access unit (910) may comprise per atlas data (920), which may correspond to the atlas data (746) of FIG. 7.
  • the rendering (930) may include block to patch map filtering (942, 944); reconstruction processes (952, 954, 956); geometry processes (982, 984, 986); view synthesis (962, 964); and/or viewport filtering (972, 974).
  • the entity filtering (942) may, optionally, be performed, followed by patch culling (944).
  • Reconstruction processes may comprise occupancy reconstruction (952) and optional attribute average value restoration (956), which, together with the output of patch culling (944) may be reconstructed as pruned view (954).
  • Geometry processes may comprise optional geometry scaling (982); optional depth value decoding (984); and optional depth estimation (986).
  • reconstructed pruned views may be unprojected to global coordinate system (962) and may be reprojected and merged into a viewport (964).
  • inpainting (972) and view space handling (974) may be performed to generate the viewport (990).
  • the rendering (930) and, ultimately, the viewport generation (990) may be based, at least partially, on viewport parameters (992).
  • regions of a 3D scene which may include a set of geometry (e.g. depth) and attribute (e.g. texture, occupancy, normals, transparency, etc.) information captured or estimated from a set of sparse input camera(s) may be projected into 2D patches, which may be organized into atlases.
  • attribute e.g. texture, occupancy, normals, transparency, etc.
  • metadata may be provided to a decoder/renderer along with coded video bitstreams, which may enable a client to synthesize dense views in a so-called "viewing volume".
  • a viewer of the synthesized view(s) may observe the surrounding scene from any position and angle within the viewing volume.
  • Specular surfaces are common in natural captured content and in high quality synthetic content. The light reflected by such surfaces is view-dependent, e.g., it changes based on the observer's viewing direction (e.g., light reflections on metallic surfaces, or reflections from a mirror).
  • In MIV, it may be possible to encode multiple patches that include the same surface geometry but exhibit different colors or textures. Each of these patches may correspond to one input camera view that may be signaled in the patch data unit, which may be identified with a Tile Patch Projection Id. A client/decoder/renderer may then decode these patches, select one or more of these patches, and blend their respective colors and textures to synthesize a virtual view.
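  • To make the blending step concrete, the sketch below mixes the textures of co-located decoded patches using weights derived from how well each patch's projection camera direction aligns with the virtual viewing direction; this weighting rule is an assumption for illustration and is not the normative MIV view-weighting synthesizer.

```python
import numpy as np

def blend_view_dependent_patches(textures, patch_view_dirs, target_view_dir):
    """Blend co-located patch textures for a virtual view.

    textures        : list of HxWx3 arrays, one per decoded patch
    patch_view_dirs : unit direction vectors of the patch projection cameras
    target_view_dir : unit direction vector of the synthesized viewing direction
    """
    weights = []
    for d in patch_view_dirs:
        # higher weight for cameras aligned with the target viewing direction
        weights.append(max(np.dot(d, target_view_dir), 1e-3))
    weights = np.array(weights) / np.sum(weights)
    blended = np.zeros_like(textures[0], dtype=np.float32)
    for w, tex in zip(weights, textures):
        blended += w * tex.astype(np.float32)
    return blended.astype(np.uint8)
```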
  • Encoding multiple patches that contain the same geometry may not only result where a same surface geometry exhibits different colors and/or light intensities from different camera views (e.g. specular surfaces); in multi-sensor use cases, input camera views observing the same surface region may capture the region using different modalities, such as visible light, infrared, ultra-violet, multi-spectral images, hyper-spectral images, etc. Numerous applications use such intermodal information for crowd, scene, object, and/or surface analysis. Such multi-sensorial data using multiple views and multiple modalities may also be encoded using MIV. However, encoding redundant information, for example geometry, for each modality and view may be considered suboptimal.
  • Example embodiments of the present disclosure may relate to encoding, signaling, and decoding of "inherited geometry patches".
  • For an inherited geometry patch, no geometry (or depth) is encoded; such information may be derived from the geometry of a reference patch that may be signaled in the patch data unit (PDU) of the inherited geometry patch.
  • an "inherited geometry patch” is a patch that refers to a reference patch rather than including geometry and/or depth information, as the geometry and/or depth information applicable to the inherited geometry patch may be determined based on the geometry and/or depth information of the reference patch.
  • the client/decoder/renderer may decode the depth of the corresponding reference patch and warp it from the reference patch projection camera view towards/according to the inherited geometry patch projection camera view.
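  • A minimal sketch of that warping step follows, assuming perspective (pinhole) cameras described by intrinsic matrices K and world-from-camera poses (R, t); a real renderer would also account for occupancy, geometry scaling, and non-perspective projections, which are omitted here.

```python
import numpy as np

def warp_depth(depth_ref, K_ref, R_ref, t_ref, K_tgt, R_tgt, t_tgt, tgt_shape):
    """Unproject the reference-patch depth map to 3D and reproject it into
    the inherited patch's camera view to obtain its derived geometry."""
    h, w = depth_ref.shape
    depth_tgt = np.full(tgt_shape, np.inf)
    K_ref_inv = np.linalg.inv(K_ref)
    for v in range(h):
        for u in range(w):
            z = depth_ref[v, u]
            if not np.isfinite(z) or z <= 0:
                continue
            # pixel -> reference camera coordinates -> world coordinates
            p_cam = z * (K_ref_inv @ np.array([u, v, 1.0]))
            p_world = R_ref @ p_cam + t_ref
            # world -> target camera coordinates -> target pixel
            p_tgt = R_tgt.T @ (p_world - t_tgt)
            if p_tgt[2] <= 0:
                continue
            uv = K_tgt @ (p_tgt / p_tgt[2])
            ut, vt = int(round(uv[0])), int(round(uv[1]))
            if 0 <= vt < tgt_shape[0] and 0 <= ut < tgt_shape[1]:
                depth_tgt[vt, ut] = min(depth_tgt[vt, ut], p_tgt[2])  # z-buffer
    return depth_tgt
```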
  • multiple textures for a single portion of surface may be encoded while (drastically) reducing the amount of geometry encoded.
  • Referring to FIGs. 10a and 10b, illustrated is an example in which the use of an inherited geometry patch may reduce the amount of geometry encoded.
  • Referring to FIG. 10a, illustrated is an example in which a surface 1010 is observed by multiple (here two) cameras 1020, 1025.
  • the appearance (texture) of the surface 1010 seen by cameras c1 (1020) and c2 (1025) (and their corresponding patches 1030, 1040) may be different because the surface is specular, or because cameras c1 (1020) and c2 (1025) capture different modalities.
  • the geometry of the 3D surface 1010 is the same (e.g. the same surface 1010 is captured by each camera 1020, 1025), and encoding geometry information for each patch 1030, 1040 may result in the encoding of redundant/derivable information.
  • Referring to FIG. 10b, illustrated is an example, similar to the example of FIG. 10a, but including the use of an inherited geometry patch. The same features are identified with the same labels, and duplicative description is not included.
  • depth information is only encoded for the patch 1030 corresponding to the surface 1010 in view v1, corresponding to camera c1 (1020) (e.g. both texture bits 1033 and depth bits 1035 are encoded for reference patch 1030). It may be noted that the view of/captured by camera c1 (1020) is different from the view of/captured by camera c2 (1025).
  • the depth is not encoded for the patch 1050 in view v2 corresponding to camera c2 (1025), but it is inherited from the reference patch 1030 in v1, observed by camera c1 (1020) (e.g. texture bits 1053 are encoded, but depth bits 1055 are crossed out/replaced by a reference to reference patch 1030).
  • the MIV encoder may receive intrinsic and extrinsic view parameters for all the source views, including texture and depth information.
  • the MIV encoder may first label source views as either "basic" or "additional" views.
  • Basic views may be encoded as full pictures, while additional views may be pruned into patches.
  • Basic views and pruned additional view patches may then be projected on atlases, which may then be encoded as video streams with codecs such as versatile video coding (VVC) or high efficiency video coding (HEVC), and all the metadata required to decode such atlases and reconstruct source views (or new virtual views within a specified viewing volume surrounding and including the source views), may be encoded into a MIV bitstream.
  • a source comprised of multiview content may have inter-view redundancy when the multiview cameras capture the same surfaces from multiple orientations.
  • the MIV encoder may include a pruner that, on a per frame basis, may select which areas can be pruned (e.g. will not be projected on the atlas and therefore not be encoded), by creating masks per frame. All basic views may be preserved, and additional views may be either pruned or preserved.
  • the MIV encoder may focus on depth information to decide whether pixels in additional views are pruned or not.
  • the MIV encoder may rely on a working assumption that the (additional views) content is diffuse, and that only one view is sufficient to represent the same surface.
  • the MIV encoder may determine a hierarchical "pruning graph," which may determine the inter-view pruning and coding dependency that may be included in the MIV metadata bitstream.
  • the pruning graph information may be read by the decoder to identify which sequence of views to decode first in order to reconstruct a specific view; to reconstruct a leaf view, all parent views in the pruning graph must be reconstructed first from the view root towards the desired view.
  • the pruning graphs may be created, initially, by including basic views into the pruning graph as root views. All pixels of basic view may be projected on to each additional view, and a pruning mask may be generated. The additional view with a maximum number of preserved pixels may be retained and inserted in the pruning graph as a child view. The process may be stopped if all the additional nodes are assigned. Otherwise, all pixels of the last inserted view in the graph may be projected to the remaining additional views. By doing so for each additional view, only pixels that encode new depth information that cannot be reconstructed from parent views in the pruning graph may be preserved. Pruning masks may be obtained for each view.
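  • The greedy construction described above can be sketched, in simplified form, as follows; count_preserved stands in for the projection-and-masking step and is a hypothetical helper, and the parent-assignment rule is a simplification of the Test Model behaviour, shown only to illustrate the ordering of the graph.

```python
def build_pruning_graph(basic_views, additional_views, count_preserved):
    """Greedy pruning-graph construction: basic views are roots; each step
    attaches the additional view with the most preserved (new) pixels.

    count_preserved(parent_views, view) -> number of pixels in `view` that
    cannot be reconstructed from `parent_views` (hypothetical helper).
    """
    graph = {v: None for v in basic_views}        # view -> parent (None = root)
    remaining = list(additional_views)
    inserted = list(basic_views)
    while remaining:
        best = max(remaining, key=lambda v: count_preserved(inserted, v))
        graph[best] = inserted[-1]                # child of the last inserted view
        remaining.remove(best)
        inserted.append(best)
    return graph
```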
  • the number of basic views may be kept low, and basic views may be mapped into separate clusters. Each cluster may be assigned with different additional views that are closer to (e.g. that have the largest overlap with) the cluster basic views. This may reduce the number of dependencies between views.
  • Once pruning masks are created for each additional view, they may be aggregated over the specified intra-period. The process may be carried over, frame by frame, starting at the beginning of the first frame and ending at the last frame of the intra-period.
  • the aggregated mask may finally be clustered based on connectivity (region growing) with at least one other pixel within the eight-pixel neighborhood. Aggregated masks may then be defined as patches, and associated with the x and y position, width, and height of the bounding box. Patches may be merged and split to efficiently pack them in the atlases for better compression efficiency.
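  • A compact sketch of the clustering and bounding-box step might look as follows, using 8-connected region growing over a binary aggregated mask; the patch dictionary layout is an assumption for illustration.

```python
import numpy as np
from collections import deque

def mask_to_patches(mask):
    """Cluster an aggregated pruning mask into patches by 8-connected region
    growing and return each cluster's bounding box."""
    visited = np.zeros_like(mask, dtype=bool)
    patches = []
    h, w = mask.shape
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not visited[y, x]:
                queue, pixels = deque([(y, x)]), []
                visited[y, x] = True
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                                visited[ny, nx] = True
                                queue.append((ny, nx))
                ys, xs = zip(*pixels)
                patches.append({"x": min(xs), "y": min(ys),
                                "width": max(xs) - min(xs) + 1,
                                "height": max(ys) - min(ys) + 1})
    return patches
```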
  • the fact that the pruner takes mostly depth into account may cause the encoder to prune pixels that would bring important information for higher quality reconstruction.
  • while the Test Model may include some adaptive texture consistency thresholds, mostly to compensate for low-quality depth, these might not be sufficient to preserve view-dependent features, and pixels may still be pruned.
  • pixels may be kept in additional views, but their corresponding depth might be encoded multiple times per additional view. This multiple encoding of the same geometry may reduce the compression efficiency and increase the decoding complexity of the representation.
  • all the (non-culled) patches from atlases may be copied to images which correspond to each source view.
  • Each patch may be copied from the atlas and tile to its corresponding view based on the position it belongs to.
  • the basic view may be complete, and most parts of the reconstructed additional views may be empty.
  • the input to the reconstruction process may include a viewld, an atlas frame width and height, a 2D array indicating mapping between atlas block to patch map, a size of the patch, and other metadata required for reconstruction of the patches from an atlas.
  • the reconstruction procedure may include evaluating whether a patch belongs to one of the source views and checking whether its occupancy is set. If the above conditions are satisfied, then the pixel U, V positions may be evaluated, and the patch may be assigned to a corresponding pruned view.
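  • A simplified sketch of copying a single patch from an atlas back into its pruned source view is shown below; the patch metadata layout (u0, v0, x0, y0, width, height) is hypothetical and chosen only to illustrate the occupancy check and pixel mapping.

```python
import numpy as np

def patch_to_view(atlas, occupancy, patch, view_canvas):
    """Copy one patch from the atlas into its pruned source view.

    patch: dict with atlas position (u0, v0), view position (x0, y0),
           width and height (hypothetical metadata layout).
    """
    for dv in range(patch["height"]):
        for du in range(patch["width"]):
            au, av = patch["u0"] + du, patch["v0"] + dv
            if occupancy[av, au]:                       # only occupied samples
                view_canvas[patch["y0"] + dv, patch["x0"] + du] = atlas[av, au]
    return view_canvas
```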
  • the missing parts of the additional views may finally be reconstructed using the view weighting synthesizer (non-normative), which reads the pruning graph and associated view weights to recover pruned pixels.
  • MIV also includes the concept of entities that correspond to different objects of the scene, for which corresponding pixels in all source views may be processed and encoded independently (from other entities) within MIV, and that can be filtered out from the bitstream.
  • Multi-stereo may be achieved for Multiview content with one or several visual modalities and with or without depth sensing.
  • views or cameras may include one or more of: perspective cameras, orthographic cameras, fisheye cameras, and/or spherical projections (half equirectangular, equirectangular projection cameras, cube maps, multiple stitched camera views, etc.) that can be projected onto a camera plane, such as the camera focal plane for the perspective cameras, or a panorama plane for fisheye or spherical cameras.
  • processing on views may be interpreted as processing pixels on such a general camera plane.
  • whether the same surface is observed by multiple different cameras from different poses may be determined; the determination may depend on (a) whether the cameras capture the same modality or not, and/or (b) whether the capture includes depth sensors or not.
  • identifying patches that have the same geometry, but different variation of attributes may be done by aligning views based on depth information.
  • a first step may be to calibrate depth sensors so that depth can be estimated at each input view, which may then be considered as view+depth.
  • Another (optional) step may be to scale the resolution of all views such that they correspond to each other.
  • One way to identify such patches in one input view V may be to use un-projection of all views to the 3D scene, and then re-projection to the input view V.
  • This may be achieved by 3D warping views and rasterizing the obtained result to the input view V.
  • Other approaches to identifying patches capturing/describing the same surface may be used, including using homographies, or correspondence-based matching techniques, which may enable alignment of Multiview+depth content.
  • the result may be that, for each pixel of input view V, either a vector with the corresponding pixel values for the other views, or a value specifying that the pixel is invalid due to occlusion, corresponding to another depth (another object) or not inside the viewing frustum of the other camera(s), may be obtained.
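  • The following sketch illustrates one way such per-pixel correspondence vectors could be collected via un-projection and re-projection. The helpers unproject and project (wrapping the camera parameters), the depth tolerance, and the NaN marking of invalid pixels are assumptions made for the example, not MIV-defined operations.

```python
import numpy as np

def correspondence_vectors(view_id_v, tex_v, depth_v, other_views,
                           unproject, project, depth_tol=0.05):
    """Collect, for each pixel of input view V, the corresponding pixel values
    in the other views (NaN where the correspondence is invalid).

    tex_v: (H, W, C) texture of V; depth_v: (H, W) depth of V.
    other_views: dict {view_id: (texture, depth)}.
    unproject(view_id, pixel, depth) -> 3D point;
    project(view_id, point) -> ((u, v), depth) or None if outside the frustum.
    """
    h, w = depth_v.shape
    vectors = np.full((h, w, len(other_views), tex_v.shape[-1]), np.nan)
    for y in range(h):
        for x in range(w):
            point = unproject(view_id_v, (x, y), depth_v[y, x])
            for k, (_vid, (tex, dep)) in enumerate(other_views.items()):
                hit = project(_vid, point)
                if hit is None:                      # outside the viewing frustum
                    continue
                (u, v), z = hit
                if abs(dep[v, u] - z) > depth_tol:   # occlusion / other object
                    continue
                vectors[y, x, k] = tex[v, u]         # valid corresponding pixel
    return vectors                                   # all-NaN rows mark invalid pixels
```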
  • when depth is not sensed, but all views capture the same modality, the geometry may be estimated by Multiview stereo and stereo matching.
  • Several known approaches may be leveraged for geometry estimation, such as using block matching approaches or, more recently, using convolutional neural networks (CNN) to identify stereo pairs in the context of 3D reconstruction, using either 2D CNNs when comparing pairs of views or 3D CNNs when considering the scene volume cost optimized over multiple pairs of views.
  • observed pixels may be classified as view-dependent or not.
  • view-dependent pixels may be aggregated into patches that correspond to the same surface viewed by different cameras (e.g. view-dependent patches).
  • the analysis of the texture variations within each modality or across modalities may be performed to identify whether a surface point or patch relates to a non-Lambertian surface (e.g. view-dependent patches).
  • a vector of pixels corresponding to a pixel of an input view V may be analyzed.
  • pixels may be classified as view-dependent or not as follows. If the vector includes other modalities, then the pixel may be considered as view-dependent. If the vector only contains the same modality, but with a non-zero variance (or larger than a threshold) of the appearance values in the vector, the pixel may also be considered as view-dependent. Otherwise, it may be considered as view independent (diffuse).
  • the view-dependent pixels may then be aggregated on the view V. This may be performed using region-growing approaches, watershed, super-pixels, or segmentation approaches based on (a) the depth value of the pixels and/or (b) the values of the pixel vectors, or some statistics of these vectors such as the mean, the max to min difference, the variance, standard deviation, the median, etc.
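  • The classification and aggregation steps above might be prototyped as follows; the array layout of the correspondence vectors, the variance threshold, and the use of simple 8-connectivity labelling in place of a full region-growing or segmentation stage are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def view_dependent_mask(vectors, modality_ids, v_modality, var_threshold=1e-3):
    """Per-pixel view-dependence test following the rule described above.

    vectors: (H, W, K, C) correspondence values for the K other views
    (NaN where invalid); modality_ids: length-K modality labels of those views;
    v_modality: modality of the input view V. Layout and names are assumptions.
    """
    valid = ~np.all(np.isnan(vectors), axis=3)                     # (H, W, K)
    other_modality = np.array([m != v_modality for m in modality_ids])
    cross_modal = np.any(valid & other_modality[None, None, :], axis=2)
    variance = np.nanvar(np.where(valid[..., None], vectors, np.nan), axis=(2, 3))
    return cross_modal | (np.nan_to_num(variance) > var_threshold)

def aggregate_view_dependent_pixels(mask):
    """Group view-dependent pixels into candidate patches by 8-connectivity,
    a simple stand-in for the region-growing / segmentation options above."""
    labels, count = ndimage.label(mask, structure=np.ones((3, 3)))
    return labels, count
```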
  • FIGs. 11a and 11b illustrate examples of view-dependent patch construction.
  • FIG. 11a illustrates an example of in-view region-growing for direct 2D patch aggregation (with reprojection to other views).
  • the surface 1110 may be captured with camera c1 (1120) having view v1 (1125).
  • the same surface 1110 may also be captured with camera c2 (1130) having view v2 (1135).
  • Based on analysis of the pixels captured by camera c1 (1120) having view v1 (1125), a patch may be generated via region-growing.
  • the patch associated with camera c2 (1130) having view v2 (1135) may be determined based on the patch associated with camera c1 (1120) having view v1 (1125).
  • a patch 1115 may be generated via region-growing. This patch 1115 may be 2D projected for each of the cameras to generate patches, e.g. patch 1140 for camera c1 (1120) having view v1 and patch 1150 for camera c2 (1130) having view v2.
  • [00103] Once the patches are obtained for the view V, the corresponding patches in other views may be obtained by using the geometry information already collected or by back-projecting the patch to the other views.
  • the area of these patches in 2D may differ due to view warping and due to possible occlusions or overlapping of the patch with regions outside the target view frustum. This process may result in a set of corresponding patches that are view-dependent across all, or a part, of the input views. The above operations may be applied to all input views.
  • a reference patch may be selected inside/within/from each set of corresponding view-dependent patches; this patch may be the only patch in a set for which the geometry will be encoded.
  • Several choices for selecting a reference patch may be possible, based on the type of capture.
  • the patch where the depth has the highest projection resolution (the largest 2D area) and that has no occlusions may be selected as the reference patch.
  • reference patches may be selected that have the same projection view across different view-dependent patch sets, so as to favor the encoding of a full view (for example, basic views in MIV).
  • Other approaches for selecting/determining the reference patch may be possible based on the target application and associated constraints.
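  • As an illustration only, one of the selection rules above (largest 2D projected area without occlusions) might look like the following; the patch dictionary keys are assumptions introduced for the example, not MIV data structures.

```python
def select_reference_patch(patch_set):
    """Pick the patch with the largest 2D projected area and no occlusions.

    patch_set: list of dicts with illustrative keys 'area' (pixel count)
    and 'occluded' (bool).
    """
    candidates = [p for p in patch_set if not p["occluded"]] or patch_set
    return max(candidates, key=lambda p: p["area"])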
  • patches may be derived by performing, first, a 3D reconstruction from estimated or captured depth.
  • This 3D reconstruction may lead to a 3D description of the scene in surfaces that may be represented as a mesh, or as a point cloud that can be tessellated as a mesh, or as a volumetric representation that may also be tessellated as a mesh with marching cubes.
  • the corresponding 3D points in the 3D reconstructed scene may be aggregated by region growing in 3D on the surface using surface (mesh) segmentation techniques, where points may be aggregated by inspecting their respective connected neighborhood and the values and/or statistics of the corresponding view-dependent pixel vector(s).
  • Such aggregation may lead to 3D surface patches. Patches may then be obtained in input views by projecting obtained 3D patches on the input views.
  • patch shapes may be unreasonably small for compression purposes.
  • the minimal size of a patch may be equal to or larger than the smallest block size used by the video codec that will be used to encode the atlases.
  • 2D patch shapes may contain invalid pixels.
  • invalid pixels can be inpainted from neighboring pixels inside the patch, using simple copy or more advanced inpainting techniques that may take into account the local gradient or satisfy the same probability distribution in the case of textures.
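  • A simple-copy variant of such inpainting is sketched below, assuming each invalid pixel takes the value of its nearest valid neighbor; more advanced gradient- or distribution-aware techniques would replace this step.

```python
import numpy as np
from scipy import ndimage

def inpaint_invalid(patch, valid_mask):
    """Simple-copy inpainting sketch for invalid pixels inside a 2D patch.

    patch: (H, W[, C]) array; valid_mask: (H, W) boolean array.
    Assumes at least one valid pixel exists in the patch.
    """
    # for every position, index of the nearest valid pixel
    _, (iy, ix) = ndimage.distance_transform_edt(~valid_mask, return_indices=True)
    return patch[iy, ix]
```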
  • such sets may be encoded within MIV by choosing one patch as a reference patch for which texture and geometry will be encoded, while the remaining patches may be encoded as "inherited geometry patches" by coding their texture and their reference patch index, but not their geometry/depth information.
  • In MIV there is currently no mechanism that allows a patch or a set of patches to inherit the geometry from another patch.
  • Geometry is either always encoded for every patch (in MIV Main and MIV Extended Profiles), or never encoded for any patch (in MIV Extended Restricted Geometry and MIV Geometry Absent Profile).
  • In the MIV Extended Restricted Geometry Profile, no geometry is encoded, but a constant depth is encoded to represent Multi-Plane Images (MPIs).
  • In the MIV Geometry Absent Profile, the geometry is not encoded.
  • Example embodiments of the present disclosure may involve including high-quality encoded geometry within the reference patches and the use of simple reprojection techniques to reconstruct the depth in inherited geometry patches.
  • encoding may start with the detection of patches in the different source(s) or virtual view(s) that relate to the same surface geometry, but exhibit a different texture. This difference in texture may occur either because the surface is non-diffuse, or because the different views are captured/estimated by different sensors specialized in different modalities (e.g. IR, UV, multispectral, hyperspectral, etc.) as described above.
  • For each detected set of view-dependent patches, a reference patch may be identified, from which the patch geometry may be estimated via projection, unprojection, and rasterization operations for all other patches within the same set of view-dependent patches.
  • the reference patch encoding might not be modified compared to MIV; it may be a part of the entirety of a basic view, or a patch in an additional view.
  • the reference patch might not be specifically signaled.
  • the other patches that may inherit their geometry from that reference patch may be signaled as inherited geometry patches and may include the patch index of their reference patch.
  • the reference patch may be chosen such that its projection_view_id is a basic view.
  • Corresponding inherited geometry patches may then be encoded for each child node of that basic view in its corresponding pruning graph.
  • the pruning graph may be parsed according to the informative MIV subclause H.7.1.
  • one basic view may be preferred, for example the one where the patch occupancy is maximal compared to others, but other heuristics are possible. It may also be an advantage to select the same basic view for most sets of view-dependent patches, or to find the best trade-off between maximizing occupancy in the chosen reference patch and the number of times the same basic view is chosen as reference for all sets of view-dependent patches.
  • a different entity_id may be set for reference patches and inherited geometry patches within a set of view-dependent patches.
  • not all patches in a set of view-dependent patches may be encoded in their respective pruning tree node.
  • an inherited geometry patch might only be encoded if its texture is significantly different from the textures of the other patches in the set of view-dependent patches. "Significantly different" may mean, for example, that a difference in one or more pixel values of the patch varies by more than a threshold amount.
  • a patch of a set of view-dependent patches may be encoded if the surface is diffuse inside one modality (e.g. infrared/heat) but view-dependent in another one (e.g. color).
  • a patch of a set of view-dependent patches may be encoded if the orientation of the surface with respect to the cameras' orientations and positions leads to a determination that most cameras observe the same side of the surface, while a few cameras observe other sides of the surface. In that case it may be interesting/useful to encode the latter view-dependent patches as inherited geometry patches, as they bring more information.
  • the pixel texture differences from each view to all other views may be aggregated to determine which ones are most significantly different (e.g. salient) from the average within a view-dependent set of patches. For example, a view may be determined to be salient if its aggregated texture difference to all other views exceeds the average by more than a threshold value or a predetermined percentage/ratio.
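  • One possible, non-normative formulation of this saliency heuristic is sketched below; the dictionary of per-view patch textures (already resampled to a common grid) and the ratio threshold are assumptions made for the example.

```python
import numpy as np

def salient_views(patch_textures, ratio=1.5):
    """Flag views whose aggregated texture difference to all other views
    exceeds the set average by a given ratio.

    patch_textures: dict {view_id: (H, W, C) patch texture on a common grid}.
    """
    ids = list(patch_textures)
    diffs = {}
    for v in ids:
        others = [np.mean(np.abs(patch_textures[v] - patch_textures[u]))
                  for u in ids if u != v]
        diffs[v] = float(np.mean(others))          # aggregated difference of view v
    avg = np.mean(list(diffs.values()))
    return [v for v in ids if diffs[v] > ratio * avg]
```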
  • the encoding of these salient patches may be considered to maximize the information transmitted within the bitstream.
  • information may be maximized within each modality or across modalities by carefully selecting which view-dependent patches will be encoded as inherited geometry patches, and which patches will be skipped.
  • signaling, semantics, and decoding process extensions for MIV may be implemented in order to enable the use of inherited geometry patches.
  • a flag may be added in the atlas sequence parameter set MIV extension syntax: the asme_inherited_geometry_flag. If set, this flag may indicate that the sequence can include inherited geometry patches.
  • Table 1 below illustrates an example of atlas sequence parameter set MIV extension syntax including the asme_inherited_geometry_flag:
  • the MIV extension syntax of the PatchDataUnit may be extended by including the pdu_inherited_geometry_patch_flag. If set, this flag may indicate that the current patch inherits its geometry information from another patch. The index of the other patch may then be provided as the syntax element - pdu_reference_patch_id[Tile][p].
  • Table 2 below illustrates an example of patch data unit MIV extension syntax including the asme_inherited_geometry_flag, the pdu_inherited_geometry_patch_flag, and pdu_reference_patch_id[Tile][p]:
  • pdu_inherited_geometry_patch_flag[tileID][p] may specify that the current patch is an inherited geometry patch. If that flag is equal to one, then the patch id of the reference patch, from which the geometry has to be derived, may be encoded in pdu_reference_patch_id[tileID][p].
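  • Purely as an illustration of the parsing order these flags imply, a decoder-side sketch might look like the following; the reader interface (read_flag, read_uvlc) and the returned structure are assumptions rather than the normative MIV syntax of Tables 1 and 2.

```python
def parse_patch_data_unit_miv_ext(reader, asme_inherited_geometry_flag):
    """Sketch of how the inherited-geometry signaling might be read per patch."""
    pdu = {"inherited_geometry_patch_flag": 0, "reference_patch_id": None}
    if asme_inherited_geometry_flag:
        # flag indicating that the current patch inherits its geometry
        pdu["inherited_geometry_patch_flag"] = reader.read_flag()
        if pdu["inherited_geometry_patch_flag"]:
            # index of the reference patch whose geometry will be reused
            pdu["reference_patch_id"] = reader.read_uvlc()
    return pdu
```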
  • the decoding of the patch related Texture attribute data for an inherited geometry patch may follow the V3C and MIV extension specifications.
  • the geometry reconstruction process may be as follows. First, pdu_reference_patch_id[tileID][p] may be read in the patch data unit.
  • the geometry reconstruction process may further include setting the following values for the patch with index refIdx:
  • the conversion to the 3D coordinate system may be performed by first transforming (x, y) to a local patch coordinate pair (u, v), as follows (according to equation (49) of ISO/IEC DIS 23090-5(2E):2021):
  • GeoFrame may refer to the decoded or unpacked depth atlas frames.
  • OccFrame may refer to the decoded or unpacked occupancy atlas frames.
  • recrefGeoFrame may refer to the reconstructed reference view depth frame that corresponds to the patch with index refIdx, in case it was not already decoded.
  • the pixel (i, j) may be a pixel with valid depth in recrefGeoFrame that belongs to patch refIdx in viewID.
  • the function UnProject(v, p) may return the 3D point P which projects onto pixel p in the v-th view, as specified in MIV Annex H subclause H.2.4.
  • the function Project(v, P) may return the pixel coordinates of the projection of 3D point P in the v-th view and may include rasterization operations (as reprojection may not fall directly on the pixel grid of the v-th view).
  • the function IsInViewport(v, p) may return "true" if pixel p is inside the viewport of the v-th view.
  • the function IsOccupied(v, p) may return "false" if the depth of pixel p in the v-th reconstructed view is invalid.
  • the conversion of the patch to the 3D coordinate system may be performed as follows where occupancy data is available in inherited_geometry_patch_view_id:
  • the conversion of the patch to the 3D coordinate system may be performed as follows where occupancy data is not available in inherited_geometry_patch_view_id:
  • x, y may be optionally constrained such that AtlasBlockToPatchMap[y / n][x / n] is equal to refIdx so that only this reference patch depth is reconstructed.
  • In MIV, pdu_projection_id[ tileID ][ p ] corresponds to the view ID of the patch with index equal to p, in the tile with ID equal to tileID, as described above, for a patch with patch index equal to patchIdx. If the inherited_geometry_patch_flag is equal to one, the reference_patch_id may be first decoded.
  • the patch geometry of the patch with an index equal to reference_patch_index, noted P[refIdx], may be first decoded from the corresponding geometry atlas tile. Then, in order to obtain the reconstructed depth of the inherited geometry patch, the geometry of P[refIdx] may be reprojected from its camera viewID to the inherited geometry patch camera view noted/indicated by inherited_geometry_patch_view_id. Occupancy and visibility (IsInViewport) may be checked before writing the reprojected reference depth into the reconstructed inherited geometry depth.
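  • A compact sketch of this reprojection-based reconstruction is shown below; unproject, project, and is_in_viewport stand in for the UnProject, Project, and IsInViewport operations referenced above, and their Python signatures are assumptions made for the example.

```python
import numpy as np

def reconstruct_inherited_geometry(ref_depth, ref_valid, ref_view_id,
                                   inh_view_id, inh_shape,
                                   unproject, project, is_in_viewport):
    """Reproject the reference patch depth into the inherited patch's view.

    ref_depth: (H, W) reference-view depth; ref_valid: (H, W) boolean mask of
    pixels with valid depth belonging to the reference patch; inh_shape: shape
    of the inherited-view depth frame to fill.
    """
    inh_depth = np.zeros(inh_shape, dtype=ref_depth.dtype)
    inh_occupancy = np.zeros(inh_shape, dtype=bool)
    ys, xs = np.nonzero(ref_valid)                 # valid reference-patch pixels
    for j, i in zip(ys, xs):
        point = unproject(ref_view_id, (i, j), ref_depth[j, i])
        hit = project(inh_view_id, point)          # includes rasterization
        if hit is None:
            continue
        (u, v), z = hit
        if not is_in_viewport(inh_view_id, (u, v)):
            continue
        inh_depth[v, u] = z                        # reprojected reference depth
        inh_occupancy[v, u] = True
    return inh_depth, inh_occupancy
```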
  • entities may be leveraged to decode inherited geometry patches.
  • asme_max_entity_id may be set to a value larger than zero.
  • all inherited geometry patches may be assigned a specific entity_id or a set of specific entity_id's such that these inherited geometry patches may be filtered out or not decoded in case the client computational capacity is not large enough to handle specular content at higher frame rates, or in order to save bandwidth by filtering out these entities in the network to optimize transmission quality and client processing speed.
  • the decoding process may be the same as above, with the additional constraint that entities corresponding to reference patches might not be filtered out if the inherited geometry patches' corresponding entities have not been filtered out first. If that condition is not met, the bitstream may still be decoded and rendered, but the geometry of the view-dependent patches might not be correct. In that case, the non-normative renderer may then, for example, filter out the corresponding inherited patches, apply inpainting techniques, etc.
  • the decoded inherited geometry patches may be blended, for example based on MIV subclause H.7.2 or with blending weights that may be application specific and externally defined. In the case of multiple modalities, weights may be set/determined based on proximity for the same modality to the desired one, and weights may be set to zero for other modalities, etc.
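  • As a simple illustration of such blending, a weighted average over the decoded patches of one view-dependent set (already resampled to the target view) might be computed as follows; the weight choice itself remains application specific, as noted above.

```python
import numpy as np

def blend_view_dependent_patches(textures, weights):
    """Normalized weighted blending of the decoded patches of one set.

    textures: (N, H, W, C) array-like of resampled patch textures;
    weights: length-N array-like (e.g. zero for unwanted modalities).
    """
    textures = np.asarray(textures, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total <= 0:
        return textures[0]                         # fallback: first patch as-is
    return np.tensordot(weights / total, textures, axes=(0, 0))
```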
  • FIG. 12 illustrates the potential steps of an example method 1200.
  • the example method 1200 may include: determining that at least two of a plurality of patches relate to a same region of a surface of a volumetric content, wherein the at least two patches comprise at least partially different texture information, 1210; selecting one of the at least two patches to be a reference patch, 1220; selecting at least one other of the at least two patches to be an inherited geometry patch, 1230; encoding the reference patch, wherein the encoded reference patch comprises, at least, geometry information, 1240; encoding the inherited geometry patch, wherein the encoded inherited geometry patch does not comprise geometry information, wherein the encoded inherited geometry patch comprises an indication of the reference patch, 1250; and transmitting at least the encoded reference patch, the encoded inherited geometry patch, and an indication that patch inheritance is supported, 1260.
  • FIG. 13 illustrates the potential steps of an example method 1300.
  • the example method 1300 may include: determining a reference patch and an inherited geometry patch, wherein the reference patch comprises geometry information, wherein the inherited geometry patch does not comprise geometry information, wherein the inherited geometry patch is associated with the reference patch, 1310; and determining a derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch, 1320.
  • an apparatus may comprise: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: determine that at least two of a plurality of patches relate to a same region of a surface of a volumetric content, wherein the at least two patches may comprise at least partially different texture information; select one of the at least two patches to be a reference patch; select at least one other of the at least two patches to be an inherited geometry patch; encode the reference patch, wherein the encoded reference patch may comprise, at least, geometry information; encode the inherited geometry patch, wherein the encoded inherited geometry patch does not comprise geometry information, wherein the encoded inherited geometry patch may comprise an indication of the reference patch; and transmit at least the encoded reference patch, the encoded inherited geometry patch, and an indication that patch inheritance is supported.
  • the at least two patches may comprise patches corresponding to one of: different camera views of the same region of the surface, or different modalities for capturing the same region of the surface.
  • the different modalities may comprise one of: visible light detection, infrared light detection, ultraviolet light detection, multi-spectral image detection, or hyper-spectral image detection.
  • the at least two patches may correspond to a same geometry of the volumetric content.
  • Determining that the at least two of the plurality of patches relate to the same region of the surface may comprise the example apparatus being configured to: determine that depth information of the volumetric content is captured; align at least two of a plurality of camera views of the volumetric content based on the depth information of the volumetric content; determine pixels associated with the at least two aligned camera views that are view-dependent; and determine the at least two of the plurality of patches based on the view-dependent pixels of the at least two aligned camera views.
  • Determining the pixels associated with the at least two aligned camera views are view-dependent may be based, at least partially, on one of: a determination that the pixels of one of the at least two aligned camera views are associated with a different modality than pixels of another of the at least two aligned camera views, or a determination that values of pixels of one of the at least two aligned camera views vary greater than a threshold value compared to values of pixels of another of the at least two aligned camera views.
  • Determining that the at least two of the plurality of patches relate to the same region of the surface may comprise the example apparatus being configured to: align at least two of a plurality of camera views of the volumetric content based on stereo matching; determine pixels associated with the at least two aligned camera views that are view-dependent; and determine the at least two of the plurality of patches based on the view-dependent pixels of the at least two aligned camera views.
  • Selecting the reference patch may comprise the example apparatus being configured to: select a patch of the plurality of patches with a depth with a highest projection resolution and no occlusions; select a patch of the plurality of patches with a same projection view as more than one other of the plurality of patches; or select a patch of the plurality of patches based, at least partially, on an application constraint.
  • the indication of the reference patch may comprise a patch index of the reference patch.
  • the indication that patch inheritance is supported may comprise a flag included in a set of atlas sequence parameters.
  • the example apparatus may be further configured to: transmit, in a patch data unit, a flag configured to indicate that the inherited geometry patch comprises an inherited geometry patch.
  • an example method comprising: determining that at least two of a plurality of patches relate to a same region of a surface of a volumetric content, wherein the at least two patches may comprise at least partially different texture information; selecting one of the at least two patches to be a reference patch; selecting at least one other of the at least two patches to be an inherited geometry patch; encoding the reference patch, wherein the encoded reference patch comprises, at least, geometry information; encoding the inherited geometry patch, wherein the encoded inherited geometry patch does not comprise geometry information, wherein the encoded inherited geometry patch may comprise an indication of the reference patch; and transmitting at least the encoded reference patch, the encoded inherited geometry patch, and an indication that patch inheritance is supported.
  • an apparatus may comprise: circuitry configured to perform: determine that at least two of a plurality of patches relate to a same region of a surface of a volumetric content, wherein the at least two patches may comprise at least partially different texture information; select one of the at least two patches to be a reference patch; select at least one other of the at least two patches to be an inherited geometry patch; encode the reference patch, wherein the encoded reference patch may comprise, at least, geometry information; encode the inherited geometry patch, wherein the encoded inherited geometry patch does not comprise geometry information, wherein the encoded inherited geometry patch may comprise an indication of the reference patch; and transmit at least the encoded reference patch, the encoded inherited geometry patch, and an indication that patch inheritance is supported.
  • an apparatus may comprise: processing circuitry; memory circuitry including computer program code, the memory circuitry and the computer program code configured to, with the processing circuitry, enable the apparatus to: determine that at least two of a plurality of patches relate to a same region of a surface of a volumetric content, wherein the at least two patches may comprise at least partially different texture information; select one of the at least two patches to be a reference patch; select at least one other of the at least two patches to be an inherited geometry patch; encode the reference patch, wherein the encoded reference patch may comprise, at least, geometry information; encode the inherited geometry patch, wherein the encoded inherited geometry patch does not comprise geometry information, wherein the encoded inherited geometry patch may comprise an indication of the reference patch; and transmit at least the encoded reference patch, the encoded inherited geometry patch, and an indication that patch inheritance is supported.
  • circuitry may refer to one or more or all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and (b) combinations of hardware circuits and software, such as (as applicable): (i) a combination of analog and/or digital hardware circuit(s) with software/firmware and (ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions) and (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation.
  • This definition of circuitry applies to all uses of this term in this application, including in any claims.
  • circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware.
  • circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in server, a cellular network device, or other computing or network device.
  • an apparatus may comprise means for performing: determining that at least two of a plurality of patches relate to a same region of a surface of a volumetric content, wherein the at least two patches may comprise at least partially different texture information; selecting one of the at least two patches to be a reference patch; selecting at least one other of the at least two patches to be an inherited geometry patch; encoding the reference patch, wherein the encoded reference patch may comprise, at least, geometry information; encoding the inherited geometry patch, wherein the encoded inherited geometry patch does not comprise geometry information, wherein the encoded inherited geometry patch may comprise an indication of the reference patch; and transmitting at least the encoded reference patch, the encoded inherited geometry patch, and an indication that patch inheritance is supported.
  • a non-transitory computer-readable medium comprising program instructions stored thereon which, when executed with at least one processor, cause the at least one processor to: determine that at least two of a plurality of patches relate to a same region of a surface of a volumetric content, wherein the at least two patches may comprise at least partially different texture information; select one of the at least two patches to be a reference patch; select at least one other of the at least two patches to be an inherited geometry patch; encode the reference patch, wherein the encoded reference patch may comprise, at least, geometry information; encode the inherited geometry patch, wherein the encoded inherited geometry patch does not comprise geometry information, wherein the encoded inherited geometry patch may comprise an indication of the reference patch; and transmit at least the encoded reference patch, the encoded inherited geometry patch, and an indication that patch inheritance is supported.
  • a non-transitory program storage device readable by a machine may be provided, tangibly embodying a program of instructions executable by the machine for performing operations, the operations comprising: determine that at least two of a plurality of patches relate to a same region of a surface of a volumetric content, wherein the at least two patches may comprise at least partially different texture information; select one of the at least two patches to be a reference patch; select at least one other of the at least two patches to be an inherited geometry patch; encode the reference patch, wherein the encoded reference patch may comprise, at least, geometry information; encode the inherited geometry patch, wherein the encoded inherited geometry patch does not comprise geometry information, wherein the encoded inherited geometry patch may comprise an indication of the reference patch; and transmit at least the encoded reference patch, the encoded inherited geometry patch, and an indication that patch inheritance is supported.
  • an apparatus may comprise: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: determine a reference patch and an inherited geometry patch, wherein the reference patch may comprise geometry information, wherein the inherited geometry patch does not comprise geometry information, wherein the inherited geometry patch may be associated with the reference patch; and determine a derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch.
  • the reference patch and the inherited geometry patch may comprise patches corresponding to one of: different camera views of a same region of a surface of a volumetric content, or different modalities for capturing the same region of the surface.
  • the different modalities may comprise one of: visible light detection, infrared light detection, ultraviolet light detection, multi-spectral image detection, or hyper-spectral image detection.
  • the reference patch and the inherited geometry patch may correspond to a same geometry of a volumetric content.
  • the example apparatus may be further configured to: project the inherited geometry patch to reconstruct volumetric content based, at least partially, on the derived geometry and a texture associated with the inherited geometry patch.
  • the example apparatus may be further configured to: receive an indication that patch inheritance is supported.
  • the indication that patch inheritance is supported may comprise a flag included in a set of atlas sequence parameters.
  • Determining the derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch may comprise the example apparatus being configured to: reproject the geometry information of the reference patch from a projection camera view associated with the reference patch to the projection camera view associated with the inherited geometry patch.
  • an example method comprising: determining a reference patch and an inherited geometry patch, wherein the reference patch may comprise geometry information, wherein the inherited geometry patch does not comprise geometry information, wherein the inherited geometry patch may be associated with the reference patch; and determining a derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch.
  • an apparatus may comprise: circuitry configured to perform: determine a reference patch and an inherited geometry patch, wherein the reference patch may comprise geometry information, wherein the inherited geometry patch does not comprise geometry information, wherein the inherited geometry patch may be associated with the reference patch; and determine a derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch.
  • an apparatus may comprise: processing circuitry; memory circuitry including computer program code, the memory circuitry and the computer program code configured to, with the processing circuitry, enable the apparatus to: determine a reference patch and an inherited geometry patch, wherein the reference patch may comprise geometry information, wherein the inherited geometry patch does not comprise geometry information, wherein the inherited geometry patch may be associated with the reference patch; and determine a derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch.
  • an apparatus may comprise means for performing: determining a reference patch and an inherited geometry patch, wherein the reference patch may comprise geometry information, wherein the inherited geometry patch does not comprise geometry information, wherein the inherited geometry patch may be associated with the reference patch; and determining a derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch.
  • a non-transitory computer-readable medium comprising program instructions stored thereon which, when executed with at least one processor, cause the at least one processor to: determine a reference patch and an inherited geometry patch, wherein the reference patch may comprise geometry information, wherein the inherited geometry patch does not comprise geometry information, wherein the inherited geometry patch may be associated with the reference patch; and determine a derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch.
  • a non-transitory program storage device readable by a machine may be provided, tangibly embodying a program of instructions executable by the machine for performing operations, the operations comprising: determine a reference patch and an inherited geometry patch, wherein the reference patch may comprise geometry information, wherein the inherited geometry patch does not comprise geometry information, wherein the inherited geometry patch may be associated with the reference patch; and determine a derived geometry for the inherited geometry patch based, at least partially, on the geometry information of the reference patch and a projection camera view associated with the inherited geometry patch.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Among a plurality of patches describing a volumetric source in a volumetric scene, more than one may relate to a same region of a surface of the volumetric source, and may therefore be associated with a same geometry but with a different texture. The different textures may result from the use of different camera views for each patch or from different capture modalities. A reference patch may be encoded that comprises geometry and/or depth information. One or more inherited geometry patches may be encoded that do not comprise geometry and/or depth information. In order to reconstruct the volumetric source, a geometry may be derived for the inherited geometry patches based on the geometry information of a reference patch associated with the same surface of the volumetric source. An atlas sequence parameter set and/or a patch data unit MIV extension syntax may be modified to enable the use of inherited geometry patches.
PCT/IB2022/053575 2021-04-23 2022-04-15 Pièces à géométrie héritée WO2022224112A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163178572P 2021-04-23 2021-04-23
US63/178,572 2021-04-23

Publications (1)

Publication Number Publication Date
WO2022224112A1 true WO2022224112A1 (fr) 2022-10-27

Family

ID=81387123

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2022/053575 WO2022224112A1 (fr) 2021-04-23 2022-04-15 Pièces à géométrie héritée

Country Status (1)

Country Link
WO (1) WO2022224112A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0783162A1 (fr) 1995-12-28 1997-07-09 M.I.B. Elettronica S.R.L. Dispositif et méthode pour accepter et distribuer des billets de banque
US20210217203A1 (en) * 2020-01-08 2021-07-15 Apple Inc. Video-Based Point Cloud Compression with Predicted Patches
WO2021191495A1 (fr) * 2020-03-25 2021-09-30 Nokia Technologies Oy Procédé, appareil et produit-programme d'ordinateur pour codage vidéo et décodage vidéo

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0783162A1 (fr) 1995-12-28 1997-07-09 M.I.B. Elettronica S.R.L. Dispositif et méthode pour accepter et distribuer des billets de banque
US20210217203A1 (en) * 2020-01-08 2021-07-15 Apple Inc. Video-Based Point Cloud Compression with Predicted Patches
WO2021191495A1 (fr) * 2020-03-25 2021-09-30 Nokia Technologies Oy Procédé, appareil et produit-programme d'ordinateur pour codage vidéo et décodage vidéo

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
"Test Model 8 for MPEG Immersive Video", no. n20002, 30 January 2021 (2021-01-30), XP030293031, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/133_OnLine/wg11/MDS20002_WG04_N00050-v2.zip WG04N0050_tmiv8.pdf> [retrieved on 20210130] *
BASEL SALAHIEHJOEL JUNGADRIAN DZIEMBOWSKICHRISTOPH BACHHUBER: "Test Model 8 for MPEG Immersive Video", ISO/IEC JTC, vol. 4, 29 January 2021 (2021-01-29), pages N0050
BOYCE JILL M ET AL: "MPEG Immersive Video Coding Standard", PROCEEDINGS OF THE IEEE, IEEE. NEW YORK, US, vol. 109, no. 9, 10 March 2021 (2021-03-10), pages 1521 - 1536, XP011873492, ISSN: 0018-9219, [retrieved on 20210818], DOI: 10.1109/JPROC.2021.3062590 *
PATRICE RONDAO ALFACE ET AL: "MIV Exploration Experiment EE3 software description", no. m57075, 21 June 2021 (2021-06-21), XP030296584, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/135_OnLine/wg11/m57075-v1-m57075-v1.zip m57075-v1.docx> [retrieved on 20210621] *
PATRICE RONDAO ALFACE ET AL: "MIV Exploration Experiment EE-4: multiple texture patches per geometry patch: software description and preliminary results", no. m56338, 12 March 2021 (2021-03-12), XP030293673, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/134_OnLine/wg11/m56338-v1-m56338_MIV_EE4.zip m56338_MIV_EE4/m56338_MIV_EE4_input.docx> [retrieved on 20210312] *
PATRICE RONDAO ALFACE ET AL: "Multiple Texture Patches Per Geometry Patch", no. m55977, 12 January 2021 (2021-01-12), XP030290853, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/133_OnLine/wg11/m55977-v3-m55977_Multiple_Texture_Patches_Per_Geometry_Patch_revised.zip m55977_Multiple_Texture_Patches_Per_Geometry_Patch_revised.docx> [retrieved on 20210112] *
TREIBLE, W. ET AL.: "Cats: A color and thermal stereo benchmark", PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2017, pages 2961 - 2969

Similar Documents

Publication Publication Date Title
US10600233B2 (en) Parameterizing 3D scenes for volumetric viewing
US11509933B2 (en) Method, an apparatus and a computer program product for volumetric video
JP6630891B2 (ja) 明視野画像ファイルを符号化および復号するためのシステムおよび方法
JP6939883B2 (ja) 自由視点映像ストリーミング用の復号器を中心とするuvコーデック
US20230290006A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
JP7344988B2 (ja) ボリュメトリック映像の符号化および復号化のための方法、装置、およびコンピュータプログラム製品
WO2021191495A1 (fr) Procédé, appareil et produit-programme d'ordinateur pour codage vidéo et décodage vidéo
WO2021260266A1 (fr) Procédé, appareil et produit-programme informatique pour codage vidéo volumétrique
US20220191544A1 (en) Radiative Transfer Signalling For Immersive Video
WO2022224112A1 (fr) Pièces à géométrie héritée
US20220377302A1 (en) A method and apparatus for coding and decoding volumetric video with view-driven specularity
EP4133719A1 (fr) Procédé, appareil et produit-programme informatique pour codage vidéo volumétrique
CN116508318A (zh) 点云数据发送装置、点云数据发送方法、点云数据接收装置及点云数据接收方法
EP4038880A1 (fr) Procédé et appareil pour coder, transmettre et décoder une vidéo volumétrique
WO2019185983A1 (fr) Procédé, appareil et produit-programme d'ordinateur destinés au codage et au décodage de vidéo volumétrique numérique
US20220345681A1 (en) Method and apparatus for encoding, transmitting and decoding volumetric video
US20230300336A1 (en) V3C Patch Remeshing For Dynamic Mesh Coding
US20230298217A1 (en) Hierarchical V3C Patch Remeshing For Dynamic Mesh Coding
US20230362409A1 (en) A method and apparatus for signaling depth of multi-plane images-based volumetric video
US20240179347A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US20240177355A1 (en) Sub-mesh zippering
US20220353531A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
WO2022219230A1 (fr) Procédé, appareil et produit-programme d'ordinateur de codage vidéo et de décodage vidéo
WO2019211519A1 (fr) Procédé et appareil de codage et de décodage de vidéo volumétrique

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22719035

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22719035

Country of ref document: EP

Kind code of ref document: A1