WO2024100028A1 - Signalling for real-time 3D model generation


Info

Publication number
WO2024100028A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional model
user equipment
description
real
indication
Application number
PCT/EP2023/080971
Other languages
French (fr)
Inventor
Saba AHSAN
Lauri Aleksi ILOLA
Sujeet Shyamsundar Mate
Igor Danilo Diego Curcio
Kashyap KAMMACHI SREEDHAR
Original Assignee
Nokia Technologies Oy
Application filed by Nokia Technologies Oy
Publication of WO2024100028A1


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/147: Systems for two-way working between two video terminals, e.g. videophone; communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H04N 21/21805: Servers for selective content distribution; source of audio or video content enabling multiple viewpoints, e.g. using a plurality of cameras
    • H04N 21/23412: Processing of video elementary streams for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
    • H04N 21/8146: Monomedia components of content involving graphical data, e.g. 3D object, 2D graphics
    • H04N 7/157: Conference systems defining a virtual conference space and using avatars or agents

Definitions

  • the example and non-limiting embodiments relate generally to 3D model generation and, more particularly, to updating of 3D models.
  • an apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: obtain at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmit, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtain the plurality of images with respect to, at least, the real-world object; transmit, to the network entity, the plurality of images; obtain, from the network entity, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; and transmit at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
  • a method comprising: obtaining, with a user equipment, at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtaining the plurality of images with respect to, at least, the real-world object; transmitting, to the network entity, the plurality of images; obtaining, from the network entity, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; and transmitting at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
  • an apparatus comprising means for: obtaining at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtaining the plurality of images with respect to, at least, the real-world object; transmitting, to the network entity, the plurality of images; obtaining, from the network entity, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; and transmitting at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
  • a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing obtaining of at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing obtaining of the plurality of images with respect to, at least, the real-world object; causing transmitting, to the network entity, of the plurality of images; causing obtaining, from the network entity, of at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; and causing transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
  • an apparatus comprising: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive an invitation to join a call with a user equipment; transmit an acceptance of the invitation; receive, from a network entity, a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object; and receive an update for the three-dimensional model.
  • a method comprising: receiving, with a receiver user equipment, an invitation to join a call with a user equipment; transmitting an acceptance of the invitation; receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object; and receiving an update for the three-dimensional model.
  • an apparatus comprising means for: receiving an invitation to join a call with a user equipment; transmitting an acceptance of the invitation; receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object; and receiving an update for the three-dimensional model.
  • a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing receiving of an invitation to join a call with a user equipment; causing transmitting of an acceptance of the invitation; causing receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object; and causing receiving of an update for the three-dimensional model.
  • an apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: transmit, to a user equipment, at least one first description, wherein the at least one first description comprises an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; receive, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receive, from the user equipment, the plurality of images with respect to, at least, the real-world object; generate the three-dimensional model based, at least partially, on the plurality of images; transmit, to the user equipment, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; transmit, to a further user equipment, the generated three-dimensional model; and receive, from the user equipment, an update for the three-dimensional model.
  • a method comprising: transmitting, with a network entity to a user equipment, at least one first description, wherein the at least one first description comprises an indication that the network entity is capable of generating a three-dimensional model of a real-world object; receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; transmitting, to the user equipment, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; transmitting, to a further user equipment, the generated three-dimensional model; and receiving, from the user equipment, an update for the three-dimensional model.
  • an apparatus comprising means for: transmitting, to a user equipment, at least one first description, wherein the at least one first description comprises an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; transmitting, to the user equipment, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; transmitting, to a further user equipment, the generated three-dimensional model; and receiving, from the user equipment, an update for the three-dimensional model.
  • a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description comprises an indication that the network entity is capable of generating a three-dimensional model of a real-world object; causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model.
  • FIG. 1 is a block diagram of one possible and nonlimiting example system in which the example embodiments may be practiced;
  • FIG. 2 is a block diagram of one possible and nonlimiting exemplary system in which the example embodiments may be practiced;
  • FIG. 3 is a diagram illustrating features as described herein;
  • FIG. 4 is a diagram illustrating features as described herein;
  • FIG. 5a is a flowchart illustrating steps as described herein;
  • FIG. 5b is a flowchart illustrating steps as described herein;
  • FIG. 6 is a flowchart illustrating steps as described herein;
  • FIG. 7 is a flowchart illustrating steps as described herein; and
  • FIG. 8 is a flowchart illustrating steps as described herein.
  • DCS: data channel server
  • eNB or eNodeB: evolved Node B (e.g., an LTE base station)
  • EN-DC: E-UTRA-NR dual connectivity
  • en-gNB or En-gNB: node providing NR user plane and control plane protocol terminations towards the UE, and acting as secondary node in EN-DC
  • E-UTRA: evolved universal terrestrial radio access, i.e., the LTE radio access technology
  • GLB: binary glTF
  • glTF (or GLTF): GL transmission format
  • GL: graphics language
  • gNB (or gNodeB): base station for 5G/NR, i.e., a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC
  • UE: user equipment, e.g., a wireless, typically mobile, device
  • FIG. 1 shows an example block diagram of an apparatus 50.
  • the apparatus may be configured to perform various functions such as, for example, gathering information by one or more sensors, encoding and/or decoding information, receiving and/or transmitting information, analyzing information gathered or received by the apparatus, or the like.
  • a device configured to encode a video scene may (optionally) comprise one or more microphones for capturing the scene and/or one or more sensors, such as cameras, for capturing information about the physical environment in which the scene is captured.
  • a device configured to encode a video scene may be configured to receive information about an environment in which a scene is captured and/or a simulated environment.
  • a device configured to decode and/or render the video scene may be configured to receive a Moving Picture Experts Group immersive codec family (MPEG-I) bitstream comprising the encoded video scene.
  • a device configured to decode and/or render the video scene may comprise one or more speakers/audio transducers and/or displays, and/or may be configured to transmit a decoded scene or signals to a device comprising one or more speakers/audio transducers and/or displays.
  • a device configured to decode and/or render the video scene may comprise a user equipment, a head-mounted display, or another device capable of rendering to a user an AR, VR and/or MR experience.
  • the electronic device 50 may, for example, be a mobile terminal or user equipment of a wireless communication system. Alternatively, the electronic device may be a computer or part of a computer that is not mobile. It should be appreciated that example embodiments may be implemented within any electronic device or apparatus which may process data.
  • the electronic device 50 may comprise a device that can access a network and/or cloud through a wired or wireless connection.
  • the electronic device 50 may comprise one or more processors 56, one or more memories 58, and one or more transceivers 52 interconnected through one or more buses.
  • the one or more processors 56 may comprise a central processing unit (CPU) and/or a graphical processing unit (GPU).
  • Each of the one or more transceivers 52 includes a receiver and a transmitter.
  • the one or more buses may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like.
  • a "circuit" may include dedicated hardware or hardware in association with software executable thereon.
  • the one or more transceivers may be connected to one or more antennas 44.
  • the one or more memories 58 may include computer program code.
  • the one or more memories 58 and the computer program code may be configured to, with the one or more processors 56, cause the electronic device 50 to perform one or more of the operations as described herein.
  • the electronic device 50 may connect to a node of a network.
  • the network node may comprise one or more processors, one or more memories, and one or more transceivers interconnected through one or more buses.
  • Each of the one or more transceivers includes a receiver and a transmitter.
  • the one or more buses may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like.
  • the one or more transceivers may be connected to one or more antennas.
  • the one or more memories may include computer program code.
  • the one or more memories and the computer program code may be configured to, with the one or more processors, cause the network node to perform one or more of the operations as described herein.
  • the electronic device 50 may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
  • the electronic device 50 may further comprise an audio output device 38 which in example embodiments may be any one of: an earpiece, speaker, or an analogue audio or digital audio output connection.
  • the electronic device 50 may also comprise a battery (or in other example embodiments the device may be powered by any suitable mobile energy device such as a solar cell, fuel cell, or clockwork generator).
  • the electronic device 50 may further comprise a camera 42 or other sensor capable of recording or capturing images and/or video. Additionally or alternatively, the electronic device 50 may further comprise a depth sensor.
  • the electronic device 50 may further comprise a display 32.
  • the electronic device 50 may further comprise an infrared port for short range line of sight communication to other devices.
  • the apparatus 50 may further comprise any suitable short-range communication solution such as, for example, a BLUETOOTH™ wireless connection or a USB/FireWire wired connection.
  • an electronic device 50 configured to perform example embodiments of the present disclosure may have fewer and/or additional components, which may correspond to what processes the electronic device 50 is configured to perform.
  • an apparatus configured to encode a video might not comprise a speaker or audio transducer and may comprise a microphone.
  • an apparatus configured to render the decoded video might not comprise a microphone and may comprise a speaker or audio transducer.
  • the electronic device 50 may comprise a controller 56, processor or processor circuitry for controlling the apparatus 50.
  • the controller 56 may be connected to memory 58 which in example embodiments may store both data in the form of image and audio data and/or may also store instructions for implementation on the controller 56.
  • the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and/or decoding of audio and/or video data or assisting in coding and/or decoding carried out by the controller.
  • the electronic device 50 may further comprise a card reader 48 and a smart card 46, for example a UICC and UICC reader, for providing user information and being suitable for providing authentication information for authentication and authorization of the user/electronic device 50 at a network.
  • the electronic device 50 may further comprise an input device 34, such as a keypad, one or more input buttons, or a touch screen input device, for providing information to the controller 56.
  • the electronic device 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals, for example for communication with a cellular communications network, a wireless communications system, or a wireless local area network.
  • the apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and/or for receiving radio frequency signals from other apparatus(es).
  • the electronic device 50 may comprise a microphone 36, camera 42, and/or other sensors capable of recording or detecting audio signals, image/video signals, and/or other information about the local/virtual environment, which are then passed to the codec 54 or the controller 56 for processing.
  • the electronic device 50 may receive the audio/image/video signals and/or information about the local/virtual environment for processing from another device prior to transmission and/or storage.
  • the electronic device 50 may also receive, either wirelessly or by a wired connection, the audio/image/video signals and/or information about the local/virtual environment for encoding/decoding.
  • the structural elements of electronic device 50 described above represent examples of means for performing a corresponding function.
  • the memory 58 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory, and removable memory.
  • the memory 58 may be a non-transitory memory.
  • the memory 58 may be means for performing storage functions.
  • the controller 56 may be or comprise one or more processors, which may be of any type suitable to the local technical environment, and may include one or more of general-purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), and processors based on a multi-core processor architecture, as non-limiting examples.
  • the controller 56 may be means for performing functions.
  • the electronic device 50 may be configured to perform capture of a volumetric scene according to example embodiments of the present disclosure.
  • the electronic device 50 may comprise one or more cameras 42 or one or more other sensors capable of recording or capturing images and/or video.
  • the electronic device 50 may also comprise one or more transceivers 52 to enable transmission of captured content for processing at another device.
  • Such an electronic device 50 may or may not include all the modules illustrated in FIG. 1.
  • the electronic device 50 may be configured to perform processing of volumetric video content according to example embodiments of the present disclosure.
  • the electronic device 50 may comprise a controller 56 for processing images to produce volumetric video content, a controller 56 for processing volumetric video content to project 3D information into 2D information, patches, and auxiliary information, and/or a codec 54 for encoding 2D information, patches, and auxiliary information into a bitstream for transmission to another device with radio interface 52.
  • Such an electronic device 50 may or may not include all the modules illustrated in FIG. 1.
  • the electronic device 50 may be configured to perform encoding or decoding of 2D information representative of volumetric video content according to example embodiments of the present disclosure.
  • the electronic device 50 may comprise a codec 54 for encoding or decoding 2D information representative of volumetric video content.
  • Such an electronic device 50 may or may not include all the modules illustrated in FIG. 1.
  • the electronic device 50 may be configured to perform rendering of decoded 3D volumetric video according to example embodiments of the present disclosure.
  • the electronic device 50 may comprise a controller for projecting 2D information to reconstruct 3D volumetric video, and/or a display 32 for rendering decoded 3D volumetric video.
  • Such an electronic device 50 may or may not include all the modules illustrated in FIG. 1.
  • the system 10 comprises multiple communication devices which can communicate through one or more networks.
  • the system 10 may comprise any combination of wired or wireless networks including, but not limited to, a wireless cellular telephone network (such as a GSM, UMTS, E-UTRA, LTE, CDMA, 4G, 5G network, etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a BLUETOOTH™ personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and/or the Internet.
  • a wireless network may implement network virtualization, which is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, a virtual network.
  • Network virtualization involves platform virtualization, often combined with resource virtualization.
  • Network virtualization is categorized as either external, combining many networks, or parts of networks, into a virtual unit, or internal, providing network-like functionality to software containers on a single system.
  • a network may be deployed in a telco cloud, with virtualized network functions (VNF), such as network core functions and/or radio access network(s) (e.g. CloudRAN, O-RAN, edge cloud), running on, for example, data center servers.
  • the system 10 may include both wired and wireless communication devices and/or electronic devices suitable for implementing example embodiments.
  • the system shown in FIG. 2 includes a mobile telephone network 11 and a representation of the internet 28.
  • Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
  • the example communication devices shown in the system 10 may include, but are not limited to, an apparatus 15, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22, and a head-mounted display (HMD) 17.
  • the electronic device 50 may comprise any of those example communication devices. In an example embodiment of the present disclosure, more than one of these devices, or a plurality of one or more of these devices, may perform the disclosed process(es). These devices may connect to the internet 28 through a wireless connection 2.
  • the example embodiments may also be implemented in a set-top box, i.e. a digital TV receiver, which may or may not have a display or wireless capabilities; in tablets or (laptop) personal computers (PC), which have hardware and/or software to process neural network data; in various operating systems; and in chipsets, processors, DSPs and/or embedded systems offering hardware/software based coding.
  • the example embodiments may also be implemented in cellular telephones such as smart phones, tablets, personal digital assistants (PDAs) having wireless communication capabilities, portable computers having wireless communication capabilities, image capture devices such as digital cameras having wireless communication capabilities, gaming devices having wireless communication capabilities, music storage and playback appliances having wireless communication capabilities, Internet appliances permitting wireless Internet access and browsing, tablets with wireless communication capabilities, as well as portable units or terminals that incorporate combinations of such functions.
  • Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24, which may be, for example, an eNB, gNB, access point, access node, other node, etc.
  • the base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28.
  • the system may include additional communication devices and communication devices of various types.
  • the communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time division multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), BLUETOOTH™, IEEE 802.11, 3GPP Narrowband IoT, and any similar wireless communication technology.
  • a channel may refer either to a physical channel or to a logical channel.
  • a physical channel may refer to a physical transmission medium such as a wire
  • a logical channel may refer to a logical connection over a multiplexed medium, capable of conveying several logical channels.
  • a channel may be used for conveying an information signal, for example a bitstream, which may be an MPEG-I bitstream, from one or several senders (or transmitters) to one or several receivers.
  • features as described herein generally relate to volumetric or 3D video, images, and/or objects.
  • features as described herein may relate to a six degrees of freedom representation.
  • an immersive six degrees of freedom (6DoF) representation enables a larger viewing-space, wherein viewers have both translational and rotational freedom of movement.
  • in contrast, with three degrees of freedom (3DoF), content is presented to viewers as if they were positioned at the center of a scene, looking outwards, with all parts of the content positioned at a constant distance.
  • 6DoF experiences allow viewers to move freely in the scene and experience the content from various viewpoints.
  • 6DoF videos enable perception of motion parallax, where the change in relative geometry between objects is reflected with the pose of the viewer.
  • a volumetric object may be represented as a point cloud.
  • a point cloud is a set of unstructured points in 3D space, where each point is characterized by its position in a 3D coordinate system (e.g. Euclidean), and some corresponding attributes (e.g. color information provided as an RGBA value, or normal vectors).
  • a volumetric object may be represented as images, with or without depth, captured from multiple viewpoints in 3D space.
  • it may be represented by one or more view frames (where a view is a projection of a volumetric scene onto a plane (the camera plane) using a real or virtual camera with known/computed extrinsics and intrinsics).
  • Each view may be represented by a number of components (e.g. geometry, color, transparency, and occupancy picture), which may be part of the geometry picture or represented separately.
  • a volumetric object may be represented as a mesh.
  • a mesh is a collection of points, called vertices, and connectivity information between vertices, called edges. Vertices along with edges form faces. The combination of vertices, edges, and faces can uniquely approximate the shapes of objects.
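  • For illustration only (not text from the publication), a single-triangle mesh may be sketched in JSON-style form, with a flat array of vertex positions and triangle indices referencing them:

        {
          "positions": [ 0.0, 0.0, 0.0,   1.0, 0.0, 0.0,   0.0, 1.0, 0.0 ],
          "indices": [ 0, 1, 2 ]
        }

    The three index values select the three vertices that form one face; the edges (0,1), (1,2), and (2,0) are implied by the face.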
  • a volumetric object may provide viewers the ability to navigate a scene with six degrees of freedom, for example with both translational and rotational movement of their viewing pose (which includes yaw, pitch, and roll) .
  • the data to be coded for a volumetric frame may be significant, as a volumetric frame may contain many objects, and the positioning and movement of these objects in the scene may result in many dis-occluded regions.
  • the interaction of light and materials in objects and surfaces in a volumetric frame can generate complex light fields that can produce texture variations for even a slight change of pose.
  • a 3D model, as used in this document, may refer to the volumetric representation of an object.
  • a 3D model may be used in the provision of virtual reality (VR), augmented reality (AR), and/or mixed reality (MR). It should be understood that example embodiments described with regard to one of VR, AR, or MR (collectively, XR) may be implemented with respect to any of these technology areas.
  • Virtual reality (VR) is an area of technology in which video content may be provided, e.g. streamed, to a VR display system.
  • the VR display system may be provided with a live or stored feed from a video content source, the feed representing a VR space or world for immersive output through the display system.
  • a virtual space or virtual world is any computer-generated version of a space, for example a captured real-world space, in which a user can be immersed through a display system such as a VR headset.
  • a VR headset may be configured to provide VR video and audio content to the user, e.g. through the use of a pair of video screens and headphones incorporated within the headset.
  • Augmented reality (AR) is similar to VR in that video content may be provided, as above, which may be overlaid over or combined with aspects of a real-world environment in which the AR content is being consumed.
  • a user of AR content may therefore experience a version of the real-world environment that is "augmented" with additional virtual features, such as virtual visual and/or audio objects.
  • a device may provide AR video and audio content overlaid over a visible, see-through, or recorded version of the real-world visual and audio elements.
  • 3D models may be provided via a graphics language transmission format (GLTF/gLTF) .
  • the GL Transmission Format (glTF) is a JavaScript Object Notation (JSON) based, rendering application programming interface (API) agnostic, runtime asset delivery format.
  • glTF assets are JSON files plus supporting external data.
  • a glTF asset may be represented by at least one of the following files.
  • a JSON-formatted file (.gltf) containing a full scene description (node hierarchy, materials, cameras, as well as descriptor information for meshes, animations, and other constructs) may be used.
  • binary files (.bin) containing geometry and animation data, and other buffer-based data may be used.
  • image files (.jpg, .png) for textures may be used.
  • the JSON-formatted file contains information about the binary files and describes how they may be used, when uploaded to the GPU, with minimal processing. This makes glTF particularly suitable for runtime delivery, as the assets may be directly copied into GPU memory for the rendering pipeline.
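  • As a rough, abridged illustration (not reproduced from the publication), a minimal .gltf file referencing an external binary buffer and an external texture might look as follows; the file names model.bin and texture.png are hypothetical, and the accessors/bufferViews that the mesh indices refer to are omitted for brevity:

        {
          "asset": { "version": "2.0" },
          "scene": 0,
          "scenes": [ { "nodes": [ 0 ] } ],
          "nodes": [ { "mesh": 0 } ],
          "meshes": [ { "primitives": [ { "attributes": { "POSITION": 0 }, "indices": 1 } ] } ],
          "buffers": [ { "uri": "model.bin", "byteLength": 44 } ],
          "images": [ { "uri": "texture.png" } ]
        }

    Because the geometry in model.bin is already in a GPU-ready layout, a renderer may upload it with minimal processing, which is the runtime-delivery property described above.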
  • Assets defined in other formats, such as images, may be stored in external files referenced via uniform resource identifier (URI), stored side-by-side in a graphics language transmission format binary (GLB) container, or embedded directly into the JSON using data URIs.
  • glTF has been designed to allow extensibility. While the initial base specification supports a rich feature set, there will be many opportunities for growth and improvement.
  • glTF defines a mechanism that allows the addition of both general-purpose and vendor-specific extensions.
  • MPEG defines an extension that allows signaling that the model can be updated in the future, and that provides information regarding the channel through which the update can potentially come.
  • a glTF object may be updated using a JSON patch as defined in RFC 6902.
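  • For illustration (not from the publication), an RFC 6902 JSON patch updating such a glTF object might replace a node's translation and add a scale, assuming the node already carries a translation property:

        [
          { "op": "replace", "path": "/nodes/0/translation", "value": [ 0.0, 1.5, 0.0 ] },
          { "op": "add", "path": "/nodes/0/scale", "value": [ 1.0, 1.0, 1.0 ] }
        ]

    Each operation names a JSON Pointer path into the glTF document, so only the changed properties, rather than the full asset, need to be delivered in an update.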
  • Skinning or rigging
  • a skin is created when a 3D model is bound to a skeleton.
  • the skeleton is a hierarchical representation of joints.
  • Each joint is bound to a portion of the mesh/point cloud, and a skin weight is assigned to the corresponding vertices in a process known as skinning.
  • the weight determines the influence of the skeletal joint transformation on each vertex (in case of mesh) or point (in case of point cloud) when the joints move.
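  • As a sketch of how such skinning data may be carried, using glTF 2.0 as an example format (abridged; accessor definitions omitted), a skin binds joint nodes to a mesh whose vertices carry per-vertex joint indices and skin weights:

        {
          "skins": [ { "inverseBindMatrices": 4, "skeleton": 0, "joints": [ 0, 1 ] } ],
          "meshes": [ { "primitives": [ {
            "attributes": { "POSITION": 0, "JOINTS_0": 2, "WEIGHTS_0": 3 }
          } ] } ]
        }

    The JOINTS_0 and WEIGHTS_0 accessors hold, for each vertex, the influencing joints and the corresponding weights described above.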
  • Other techniques for animating a 3D object are also possible with example embodiments of the present disclosure.
  • 3GPP specification TS 26.114, Multimedia Telephony Service for IP Multimedia Subsystem (MTSI), defines a standardized method for establishing a multimedia call between terminals using the IP Multimedia Subsystem (IMS).
  • the specification uses SDP for session signaling and the Real-time Transport Protocol (RTP) for the delivery of real-time media.
  • a Media Resource Function (MRF) is a network element that may be inserted in the media path between sender and receiver for media processing.
  • the MTSI specification supports speech, video, images, text, and unidirectional 360-degree video.
  • the data channel may be used by MTSI clients for transporting objects and data. Generally, the data channel is used for the transmission of non-real-time data.
  • consider, as an example, an Augmented Reality (AR) conversational call.
  • the user may capture their own 3D model using one or more cameras.
  • the model generation, skinning, and weighting may be done by the (capture) device, by a cloud entity (e.g. Multipoint Control Unit (MCU) or MRF) , or at an edge entity.
  • the model may then be delivered to the other party using glTF, and subsequent updates may be done using a JSON patch.
  • the 3D object may be animated using motion data captured by the sender and delivered to the receiver/viewer.
  • each user may use a device to capture their own 3D model for transmission to other users in the conference.
  • an MRF or MCU, which may be part of a wireless network 100, may act as a middleman for media processing/delivery.
  • the terms MCU and MRF may be used interchangeably. Use of one or the other of these words should not be interpreted as limiting the disclosure. It should also be noted that where an MCU/MRF is described as performing an action, a sender UE may appropriately perform the action instead, and vice versa.
  • a network/edge entity, such as an MRF, MCU, or other appropriate server/endpoint, may provide the capability to create a 3D model to be used in an XR experience.
  • the 3D model may be created from images captured by one or multiple cameras at the source.
  • the source may provide virtual elements to include in the 3D model e.g., a texture, in addition to the captured images.
  • the virtual elements and the captured images may be collected from different sources.
  • the virtual elements may come from a library of predetermined virtual elements, or may be provided via user input.
  • the session description may be carried using session description protocol (SDP) signaling.
  • a 3D model update may be required, for example, when: a new camera becomes available; more of the object is captured than before (e.g. a person may move away from the camera, so that their torso, in addition to head and shoulders, is also captured); the angle of the object captured changes, e.g. a person turns around; lighting conditions have changed (e.g., a window is opened); etc.
  • SDP does not define a way to signal that the network/edge/server/endpoint has this capability, or that the 3D model requires an update such that new images from the source are needed (e.g., due to a change in the capture situation at the source).
  • the benefit of knowing when new images are required is to avoid redundant delivery of images to describe an already complete model of the object.
  • Example embodiments of the present disclosure may relate to a real-time model generator.
  • a model generator is a device/process/application/server that generates a 3D model from a set of images for real-time consumption. When new content from a source is available, the model generator may generate an update of the 3D model. The frequency of receiving new content from the source may be different from the frequency of obtaining an updated 3D model from the model generator.
  • Example embodiments of the present disclosure may relate to a source.
  • a source is a device/process/application that provides the content for creating and updating the 3D model.
  • referring to FIG. 3, illustrated is an example of signaling between a source (305) and a 3D model generator (310).
  • Each of the signals may be optional, may occur in a different order, and/or may occur simultaneously. It may also be noted that, while not illustrated, the signaling between the source and the 3D model generator need not be direct; the signaling may be forwarded by an intermediate node/device/function, for example an MRF or MCU.
  • the 3D model generator (310) may transmit a signal that indicates the capability to process captured content and generate a 3D model (320) .
  • the 3D model generator (310) may transmit a signal indicating the media properties of the input required to generate the 3D model (330) .
  • the 3D model generator (310) may indicate that encoded RGB streams, RGB-D streams, and/or V3C format may be used.
  • the 3D model generator (310) may transmit a signal indicating the maximum and/or minimum number of each type of input bitstreams it can support (340) .
  • the 3D model generator (310) may transmit a signal indicating the process for updating the 3D model (350) . All or some of the following may be indicated: that the update may be triggered by the source of the input bitstreams; that the update may be triggered by the model generator; and/or that periodic update may be used, with an indication of the period.
  • the source (305) may transmit a signal indicating that it (a system/process/device) has the capability to act as a source for 3D model generation (315).
  • the source (305) may transmit a signal indicating the media properties of the input (325) (i.e. input to the 3D model generator 310) .
  • the source (305) may transmit a signal indicating the maximum and/or minimum number of each type of input bitstreams it can support (335) .
  • the source (305) may transmit a signal indicating the process for updating the 3D model (345) . All or some of the following may be indicated: that the update may be triggered by the source of the input bitstreams; that the update may be triggered by the model generator; and/or that periodic update may be used, with an indication of the period.
  • a receiver may subscribe to 3D model updates from either the source or the 3D model generator, or both. For example, if the receiver is subscribing or selecting the option for only the 3D model generator updates, the updates may be made available to the receiver only when the 3D model generator updates the 3D model. This may be used to regulate the visibility of updates to the receiver.
  • referring to FIG. 4, illustrated is an example of a conversational AR use case consisting of a source UE (405), a receiver UE (435), and a network function provided by, for example, an MRF (425).
  • the source UE (405) may be a mobile device with one or more cameras (410). It may also be a mobile device paired with a camera rig with multiple cameras (410) for multipoint capture. While not illustrated in FIG. 4, the source UE (405) and the MRF (425) may exchange information regarding the capabilities of the source (405), the capabilities of the MRF (425), bitstream capabilities, 3D model update capabilities, etc. (see FIG. 3).
  • This information may be described as one or more descriptions; one or each of the source UE (405) and the MRF (425) may transmit and/or receive a description to be used for the conversational AR service. It may be noted that description(s) from the source and/or MRF may be transmitted/received together/concurrently/simultaneously, or in any order relative to each other.
  • the source UE (405) may provide the captured images (420) of, for example, the user (415) to the MRF (425) for processing.
  • the source UE (405) may also provide optional configuration information (e.g., camera configuration, encoding information, etc.) to the model generator (425) . It may be noted that the captured images and the configuration information may be transmitted together/concurrently/simultaneously, or in any order relative to each other.
  • the MRF (425) may create a 3D model using the images, and may deliver the model (430) to a receiver UE (435) and, optionally, back to the source UE (405) .
  • the MRF (425) may also assist in rigging and skinning the 3D model.
  • the source UE (405) may generate motion signals for joints.
  • the source UE (405) may send motion signals (450) directly to the receiver (435) for animation (not illustrated), where the generated model (430) may be transformed according to the received motion signals (450).
  • the motion signals may be used for animating a 3D model.
  • the motion signals may be derived via a skinning process.
  • the source (UE) may continue to send images to the 3D model generator during the 3D model creation phase, as well as sending information for animating the subsequent motion.
  • the 3D model generator may generate motion information for signaling to the receiver (UE) . This may enable less complex devices to perform the functionality of a source UE 405 (e.g. devices that cannot perform skinning/rigging) .
  • the 3D model generator may subsequently deliver 3D models to multiple receivers in a suitable format (e.g., motion signals to the receivers that support them (450), a 3D visual model (430) to those that only render the received 3D model but are not capable of performing animation based on motion signals, etc.).
  • One way to create the motion signals (450) is to transmit the generated 3D model (430) back to the source UE (405) , where the source UE (405) may compare the views of the input cameras (410) to the views rendered from the 3D model (430) using the same camera configuration.
  • the rendered 2D views overlaid with the actual views may display a transformation that may be encoded as motion signals (450) .
  • An alternative for creating the motion signals (450) may be to perform real-time human segmentation on one or more input cameras (410) .
  • the KINECT® Azure supports this out of the box.
  • the segmentation may provide joint transformation in 2D space, which may be converted into a 3D signal, when the camera configuration is known.
  • the 3D joint transformation may be combined between different views to achieve better 3D motion tracking. As an example, averaging of motion vectors per joint could be performed to improve the quality of the joint motion signal.
  • the outcome (450) may be subsequently sent to the receiver (435) .
  • referring to FIGs. 5a-b, illustrated is an example call flow of an AR conversational call with avatars, which may be provided with UE1 (502) and UE2 (512).
  • UE1 (502) may send an invite for an AR call (516) with UE2 (512) to the application server (AS) (510) via P/S/I-CSCF (504).
  • the CSCF (504) may forward the invite to the AS (510), which may set up the data channel and MRF resources for the AR application.
  • the AS (510) may select the proper data channel server (DCS) that will provide the AR application data channel resources (e.g. 506) .
  • the AS (510) may instruct the DCS (506) to setup the data channel resources for the AR application.
  • the AS (510) may set up the MRF (508) for the AR call.
  • the AS (510) may forward the invite to UE2 via the CSCF (504) .
  • the CSCF (504) may forward the invite to UE2 (512) .
  • the UE2 (512) may accept the invite, indicate the acceptance to the CSCF (504) , and the session may start.
  • the MRF (508) may prepare the scene description based on media descriptions and assets for the call (534) . Some assets may be available on the DCS (506) .
  • the MRF (508) may send the scene description to the DCS (506) .
  • the DCS (506) may deliver the scene update to the UEs (e.g. UE1 (502) and UE2 (512) ) .
  • a UE may trigger a scene update (e.g. when a new object is added/removed in the scene, or a spatial information update is sent) (542) .
  • in this example, the update is triggered by UE1 (502), but it may be triggered by either UE (e.g. UE2 (512)).
  • the MRF (508) may process the new information and create a scene description update.
  • the MRF (508) may send a scene description update to DCS (506) .
  • the DCS may distribute the scene description update to all UEs (e.g. UE1 (502) , UE2 (512) ) .
  • the updates may be for images for generating the 3D model by the model generator.
  • UE1 (502) may provide source images (e.g. RTP video) from one or more cameras to the MRF (552).
  • the initial SDP exchange between UE1 (502) and the MRF (508) may include SDP attributes for the 3D-model indicating that the UE1 (502) is the source and the MRF (508) is the 3D-model generator.
  • the MRF (508) may process the images to create a 3D model and skinning.
  • the MRF (508) may deliver the 3D model to the UE2 (512) .
  • the UE1 (502) may provide source images (e.g. RTP video) to the MRF (508) . These source images may be used for animation. It may be noted that the stream at 554 and at 558 may be the same video stream.
  • the MRF (508) may create motion signals (e.g. transformation(s) to be applied to key points of the 3D model) based on the input from UE1 (502).
  • the MRF (508) may deliver the motion signals to UE2 (512) .
  • the motion signals may describe movements of the 3D model.
  • the UE2 (512) may apply the transformation(s) to animate the 3D model.
  • in phase D of FIG. 5b, illustrated is a case in which the UE1 (502) performs skinning rather than the MRF (508).
  • the phases A (514) , B (532) , and C (540) may be the same as in FIG. 5a.
  • the UE1 (502) may provide source images from one or more cameras to the MRF (508) .
  • the initial SDP exchange between UE1 (502) and MRF (508) may include SDP attributes for the 3D-model that may indicate that UE1 (502) is the source and the MRF (508) is the 3D-model generator.
  • the MRF (508) may process the images to create the 3D-model.
  • the MRF (508) may deliver the 3D-model to UE2 (512) .
  • the MRF (508) may deliver the 3D model to UE1 (502) .
  • UE1 (502) may have included the appropriate parameter to receive the 3D model.
  • the UE1 (502) may skin the received model and generate motion signals.
  • the UE1 (502) may send the motion signals to UE2 (512) for animating the 3D model.
  • the UE2 (512) may then animate the 3D model.
  • some of the steps of FIGs. 5a-b may be performed concurrently with each other, or in a different order than is illustrated; the examples of FIGs. 5a-b are not limiting.
  • the motion signals may be transported as an RTP header extension (HE) consisting of translation (float[3]), rotation (float[4]), scale vector, etc.
  • the RTP HE may be associated with the RTP stream carrying the source data from UE1 to MRF; or it may be carried using the data channel. If an RTP HE is used, the source may send dummy data when source data is paused or not required but motion signals still need to be sent.
  • the rigged model may include in its representation of the 3D model a hierarchical joint structure such that each joint has at least a name or number attached to it for identification. The joint transformation (rotation, translation, scale, etc.) may then be sent along with the appropriate joint ID.
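  • For readability only, the content of one such motion signal may be sketched as follows; the field names are hypothetical, the rotation is a float[4] quaternion, and, as described above, the actual transport may be a binary RTP header extension or the data channel rather than JSON:

        {
          "jointId": 7,
          "translation": [ 0.0, 0.02, 0.0 ],
          "rotation": [ 0.0, 0.0, 0.087, 0.996 ],
          "scale": [ 1.0, 1.0, 1.0 ]
        }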
  • the type of transformation matrices to be included in the signal may be negotiated in SDP.
  • the type of transformation matrices may be identified with an ID, such as a 4-bit number, such that the IDs are known or advertised amongst participants in session signaling.
  • the same level of information may be included for all joints, such that 0 values indicate no transformation.
  • the full set of transformations may be transmitted at a set frequency to cater to detection of missing motion signals due to packet losses.
  • a receiver may use interpolation between two motion signals to recover lost information or to compensate for low-frequency updates.
  • an SDP group or group of groups is used to advertise the relationship between motion data, source data and 3D model.
  • the attribute may be used at the session level to indicate the capability to provide the source or processing capability for generating a 3D model.
  • An endpoint that provides source media for creating a 3D model may use the parameter "source" in the SDP message it sends.
  • An endpoint that generates a 3D model may use the parameter "process" in the SDP message it sends.
  • the parameter motiondata is a flag. If the flag is set to 0, the 3D model generator will not generate the motion data, nor transmit it to the receiver; the source may still generate motion/transformation data and deliver it to the receiver for animating the 3D model.
• the parameter type indicates the type of 3D-model, e.g., humanoid. To ensure interoperability, some known types may be registered. In an embodiment, the SDP offer may include a list of types supported, and the answer may contain the specific one that should be used.
• the parameter sendrecv may indicate the direction of the data flow, as shown in TABLE 1.
• the parameter updatemode may indicate how the 3D model updates will be handled, as shown in TABLE 2.
• the updates from the source may be triggered based on, for example, a user input, detection of a change in lighting conditions by the source, detection of a new camera, detection of capture angle change, detection of change in type of model (e.g., instead of head and shoulders, the full body can be captured), etc.
  • the update from the model generator may be triggered due to, for example, application level functions.
  • the period parameter may be optional, and may be added when the updatemode value is set to 3.
  • the period may be expressed as a unit of time, for example, milliseconds or microseconds.
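• combining the parameters above, an illustrative 3d-model attribute line (the concrete syntax is an assumption for illustration only) might read:

    a=3d-model:process sendrecv type=humanoid motiondata=1 updatemode=3 period=3000

indicating a 3D-model generator that both sends and receives, provides motion data (assuming motiondata=1 is the value enabling it), generates a humanoid model, and updates the model periodically every 3000 ms.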
• the following example shows an SDP offer from a 3D model generator that can support receiving two high efficiency video coding (HEVC) bitstreams, as indicated by the synchronization source (SSRC) multiplexed offered bitstreams, and processing them into a 3D model.
  • the parameters 'process' and 'sendrecv' indicate that the model generator may provide the 3D-model.
  • the update may be periodic (updatemode value is 3) with a period of 3000ms.
• an RTCP feedback message for pause/resume may be used for updating.
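• an illustrative form of such an offer (a sketch only; the payload type, SSRC values, and 3d-model attribute syntax are assumptions for illustration) is:

    m=video 49170 RTP/AVPF 96
    a=rtpmap:96 H265/90000
    a=ssrc:1111 cname:camera1@example.com
    a=ssrc:2222 cname:camera2@example.com
    a=rtcp-fb:96 ccm pause
    a=3d-model:process sendrecv type=humanoid updatemode=3 period=3000

where the two a=ssrc lines advertise the two SSRC-multiplexed HEVC bitstreams on the same media line, and the a=rtcp-fb line (RTP stream pause and resume, RFC 7728) enables the pause/resume feedback used for updating.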
• below is an example of an answer from a capture source that sends multiple video streams that may be used for 3D model generation.
  • the parameters 'source' and 'sendrecv' indicate that the source also wants to receive the 3D model.
  • the model update may be triggered by the model generator.
  • An RTCP feedback for pause/resume may be used for pausing and resuming the media path to allow model updates.
  • Additional information configured to help the 3D generator to perform 3D modelling may be included in the source answer. This may, for example, include information about the camera configuration used to capture the RGB or RGB-D bitstreams. This information may be signaled as part of SDP as attributes or parameters. Alternatively, the V3C format may be utilized to provide information about the capture configuration to the modeler, as it supports signaling of different camera models.
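• an illustrative form of such an answer (again a sketch; the values are assumptions, and updatemode=2 is assumed here to denote updates triggered by the model generator, the actual mapping being given in TABLE 2) is:

    m=video 49172 RTP/AVPF 96
    a=rtpmap:96 H265/90000
    a=ssrc:3333 cname:frontcam@example.com
    a=ssrc:4444 cname:sidecam@example.com
    a=rtcp-fb:96 ccm pause
    a=3d-model:source sendrecv type=humanoid updatemode=2

with any camera configuration information appended as further attributes or, alternatively, conveyed in V3C.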
• the send, recv, and sendrecv parameters may be used as a property of the parameters process and source of the attribute 3d-model.
• the order may indicate the origin of the session negotiation in SIP (offer->answer).
• sendrecv may not be required, unless the sending and receiving entity is both the "source" and the "3d-model generator".
  • TABLE 3 shows how the parameters of the 3d-model attribute may be used in this case where the UE is the source and MRF is the 3D-model generator.
• the MRF may list process:send/none before source:recv/none to indicate that it is a 3D model generator.
• the UE may list source:send/none before process:recv/none to indicate that it is a source.
• the syntax for this is shown below, in which each 3d-model attribute lists the "source" and "process" parameters in the appropriate order.
• process:recv indicates that the processed 3d-model should be delivered to the source as well.
• source:send indicates that the source can send images for 3D model processing.
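• an illustrative exchange following this ordering rule (syntax assumed for illustration) is:

    MRF:  a=3d-model:process:send source:recv
    UE:   a=3d-model:source:send process:recv

the MRF lists process:send first because it generates and sends the model; the UE lists source:send first because it supplies the images, and its process:recv requests delivery of the generated model back to it.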
  • the generated 3D model may be delivered in another stream, for example over the data channel as glTF.
  • another parameter 'ID' may be added to the 3d-model attribute.
• the ID may be used to map the delivery path of the 3d-model. If the 3D model is delivered over RTP, the ID may be set to the mid associated with the media description of that RTP stream or data channel.
  • the parameters may include a list of mid values associated with the media that may be used for creating the model, and also for delivering the model.
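• an illustrative mapping using mid values (the group semantics tag 3DMODEL and the id parameter value are hypothetical) is:

    a=group:3DMODEL 0 1 2
    m=video 49170 RTP/AVPF 96
    a=mid:0
    m=video 49172 RTP/AVPF 97
    a=mid:1
    m=application 50000 UDP/DTLS/SCTP webrtc-datachannel
    a=mid:2
    a=3d-model:process sendrecv id=2

where mids 0 and 1 identify the source streams used for creating the model, and id=2 points at the data channel over which the generated model (e.g. as glTF) is delivered.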
• an additional parameter may indicate one or both of the maximum and the minimum number of input media streams that the model-generator or the source can support.
  • the SDP offer may contain only one media description.
  • the answerer may then add additional media descriptions such that the total number is equal to or more than the minimum, and equal to or less than the maximum.
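• as an illustrative sketch (the parameter names min-streams and max-streams are hypothetical), a model generator's offer with a single media description could carry:

    m=video 49170 RTP/AVPF 96
    a=3d-model:process recv type=humanoid min-streams=2 max-streams=4

the answerer may then add media descriptions so that the total number of input streams lies between two and four, inclusive.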
  • an endpoint may indicate that the model is of a head and shoulders type.
• metadata related to the rigging, e.g. joints and skinned weights, may be delivered with the 3D model.
  • an endpoint may indicate if rigging is required with the 3D modelling.
  • an endpoint may indicate if rigging data is delivered with the 3D model.
  • the requirement for rigging may be determined implicitly when the session signaling includes a media/data path for motion data.
  • SDP and RTP are provided as examples in the present disclosure; however, other protocols, e.g., SCTP, QUIC, HTTP and WebRTC datachannel, may be used for exchanging the defined signaling according to an example embodiment.
  • media from the source may be delivered over a protocol other than RTP (e.g. over a WebRTC data channel) .
  • the 3D model may be delivered over a protocol other than RTP (e.g. over a WebRTC data channel) .
  • a technical effect of example embodiments of the present disclosure may be to enable generation of 3D models in real-time, which may have the technical effect of allowing users to share their appearance in the present instead of relying on pre-generated models.
  • a technical effect of example embodiments of the present disclosure may be to enable the images to be handled by the conversational service provider, and not stored on a third-party server.
  • the source may choose to download and store the model itself.
  • FIG. 6 illustrates the potential steps of an example method 600.
• the example method 600 may include: obtaining at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object, 610; transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images, 620; obtaining the plurality of images with respect to, at least, the real-world object, 630; transmitting, to the network entity, the plurality of images, 640; obtaining, from the network entity, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model, 650; and transmitting at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description, 660.
  • the example method 600 may be performed, for example, with a UE, for example a source UE, and/or a capture device.
  • the network entity may be an edge entity, a MRF, a MCU, an end node, an endpoint, a network element, a server, etc.
  • FIG. 7 illustrates the potential steps of an example method 700.
• the example method 700 may include: transmitting, to a user equipment, at least one first description, wherein the at least one first description comprises an indication that the apparatus is capable of generating a three-dimensional model of a real-world object, 710; receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images, 720; receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object, 730; generating the three-dimensional model based, at least partially, on the plurality of images, 740; transmitting, to the user equipment, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model, 750; transmitting, to a further user equipment, the generated three-dimensional model, 760; receiving, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description, 770; and transmitting, to the further user equipment, the update for the three-dimensional model, 780.
  • FIG. 8 illustrates the potential steps of an example method 800.
• the example method 800 may include: receiving an invitation to join a call with a user equipment, 810; transmitting an acceptance of the invitation, 820; receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object, 830; and receiving an update for the three-dimensional model, 840.
• the example method 800 may be performed, for example, with a UE, for example a receiver UE.
• an apparatus may comprise: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: obtain at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmit, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtain the plurality of images with respect to, at least, the real-world object; transmit, to the network entity, the plurality of images; obtain, from the network entity, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and transmit at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
• the at least one transmitted indication may comprise an indication of at least one of: a capability of the apparatus to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the apparatus, a maximum number of a second type of bitstream supported by the apparatus, or a process for updating the three-dimensional model.
• the at least one transmitted indication may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the apparatus, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
• the example apparatus may be further configured to: trigger update of the three-dimensional model in response to at least one of: a user input, a determination that a new camera has become available to the apparatus, a determination that an amount of the real-world object that can be captured has increased, a determination that an angle of capture of the real-world object has changed, or a determination that a lighting condition of the real-world object has changed, wherein the at least one transmitted indication may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the apparatus.
• the network entity may comprise one of: a media resource function, or a multipoint control unit.
• the at least one first description may further comprise an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the network entity, a maximum number of a second type of bitstream supported by the network entity, or a process for updating the three-dimensional model.
• the at least one first description may comprise the indication of the at least one required media property, wherein the at least one required media property may comprise at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
• the at least one first description may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the apparatus, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
• the example apparatus may be further configured to: transmit the at least one of: the three-dimensional model, or the update for the three-dimensional model in response to a trigger received from the network entity, wherein the at least one first description may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the network entity.
• the at least one first description may comprise at least one session description protocol offer.
• transmitting the at least one indication may comprise the example apparatus being configured to: transmit, to the network entity, a session description protocol answer, wherein the session description protocol answer may comprise, at least, the at least one indication.
  • the example apparatus may comprise a source user equipment associated with a plurality of cameras configured to capture the plurality of images, wherein the source user equipment may be at least partially different from a receiver user equipment.
  • the at least one second description may comprise at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
  • the example apparatus may be further configured to: compare information of the at least one second description with one or more camera views of the real-world object; and determine the update for the real-world object based, at least partially, on the comparison.
  • the example apparatus may be further configured to: transmit, to the network entity, configuration information associated with the plurality of images.
  • the configuration information may comprise at least one of: a camera configuration, or encoding information.
  • the example apparatus may be further configured to: determine one or more joints of the real-world object; generate one or more motion signals for the one or more joints; and transmit, to a user equipment, the one or more motion signals.
  • the example apparatus may be further configured to: perform a rigging operation or a skinning operation based, at least partially, on the plurality of images.
  • the example apparatus may be further configured to: perform real-time segmentation of the real-world object; determine one or more motion signals based, at least partially, on the real-time segmentation of the real-world object; and transmit, to a user equipment, the one or more motion signals.
  • the example apparatus may be further configured to: transmit, to a user equipment, an invitation to join an augmented reality call; and transmit the plurality of images as part of the augmented reality call.
• an example method comprising: obtaining, with a user equipment, at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtaining the plurality of images with respect to, at least, the real-world object; transmitting, to the network entity, the plurality of images; obtaining, from the network entity, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and transmitting at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
• the at least one transmitted indication may comprise an indication of at least one of: a capability of the user equipment to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the user equipment, a maximum number of a second type of bitstream supported by the user equipment, or a process for updating the three-dimensional model.
• the at least one transmitted indication may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
• the example method may further comprise: triggering update of the three-dimensional model in response to at least one of: a user input, a determination that a new camera has become available to the user equipment, a determination that an amount of the real-world object that can be captured has increased, a determination that an angle of capture of the real-world object has changed, or a determination that a lighting condition of the real-world object has changed, wherein the at least one transmitted indication may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the user equipment.
• the network entity may comprise one of: a media resource function, or a multipoint control unit.
• the at least one first description may further comprise an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the network entity, a maximum number of a second type of bitstream supported by the network entity, or a process for updating the three-dimensional model.
• the at least one first description may comprise an indication of the at least one required media property, wherein the at least one required media property may comprise at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
• the at least one first description may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
  • the example method may further comprise: transmitting the at least one of: the three-dimensional model, or the update for the three-dimensional model in response to a trigger received from the network entity, wherein the at least one first description may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the network entity.
  • the at least one first description may comprise at least one session description protocol offer.
• the transmitting of the at least one indication may comprise: transmitting, to the network entity, a session description protocol answer, wherein the session description protocol answer may comprise, at least, the at least one indication.
  • the user equipment may comprise a source user equipment associated with a plurality of cameras configured to capture the plurality of images, wherein the source user equipment may be at least partially different from a receiver user equipment.
  • the at least one second description may comprise at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
• the example method may further comprise: comparing information of the at least one second description with one or more camera views of the real-world object; and determining the update for the real-world object based, at least partially, on the comparison.
  • the example method may further comprise: transmitting, to the network entity, configuration information associated with the plurality of images.
  • the configuration information may comprise at least one of: a camera configuration, or encoding information.
• the example method may further comprise: determining one or more joints of the real-world object; generating one or more motion signals for the one or more joints; and transmitting, to a user equipment, the one or more motion signals.
• the example method may further comprise: performing a rigging operation or a skinning operation based, at least partially, on the plurality of images.
• the example method may further comprise: performing real-time segmentation of the real-world object; determining one or more motion signals based, at least partially, on the real-time segmentation of the real-world object; and transmitting, to a user equipment, the one or more motion signals.
• the example method may further comprise: transmitting, to a user equipment, an invitation to join an augmented reality call; and transmitting the plurality of images as part of the augmented reality call.
• an apparatus may comprise: circuitry configured to perform: obtaining, with a user equipment, at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; circuitry configured to perform: transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; circuitry configured to perform: obtaining the plurality of images with respect to, at least, the real-world object; circuitry configured to perform: transmitting, to the network entity, the plurality of images; circuitry configured to perform: obtaining, from the network entity, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and circuitry configured to perform: transmitting at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
• an apparatus may comprise: processing circuitry; memory circuitry including computer program code, the memory circuitry and the computer program code configured to, with the processing circuitry, enable the apparatus to: obtain at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmit, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtain the plurality of images with respect to, at least, the real-world object; transmit, to the network entity, the plurality of images; obtain, from the network entity, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and transmit at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
• the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object.
• circuitry may refer to one or more or all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and (b) combinations of hardware circuits and software, such as (as applicable): (i) a combination of analog and/or digital hardware circuit(s) with software/firmware and (ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions) and (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation.
• circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware.
• circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.
• an apparatus may comprise means for: obtaining at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtaining the plurality of images with respect to, at least, the real-world object; transmitting, to the network entity, the plurality of images; obtaining, from the network entity, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and transmitting at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
• the at least one transmitted indication may comprise an indication of at least one of: a capability of the apparatus to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the apparatus, a maximum number of a second type of bitstream supported by the apparatus, or a process for updating the three-dimensional model.
• the at least one transmitted indication may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the apparatus, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
• the means may be further configured for: triggering update of the three-dimensional model in response to at least one of: a user input, a determination that a new camera has become available to the apparatus, a determination that an amount of the real-world object that can be captured has increased, a determination that an angle of capture of the real-world object has changed, or a determination that a lighting condition of the real-world object has changed, wherein the at least one transmitted indication may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the apparatus.
• the network entity may comprise one of: a media resource function, or a multipoint control unit.
• the at least one first description may further comprise an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the network entity, a maximum number of a second type of bitstream supported by the network entity, or a process for updating the three-dimensional model.
• the at least one first description may comprise an indication of the at least one required media property, wherein the at least one required media property may comprise at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
• the at least one first description may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the apparatus, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
  • the means may be further configured for: transmitting the at least one of: the three-dimensional model, or the update for the three-dimensional model in response to a trigger received from the network entity, wherein the at least one first description may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the network entity.
  • the at least one first description may comprise at least one session description protocol offer.
  • the means configured for transmitting the at least one indication may comprise means configured for: transmitting, to the network entity, a session description protocol answer, wherein the session description protocol answer may comprise, at least, the at least one indication.
  • the example apparatus may comprise a source user equipment associated with a plurality of cameras configured to capture the plurality of images, wherein the source user equipment may be at least partially different from a receiver user equipment.
  • the at least one second description may comprise at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
• the means may be further configured for: comparing information of the at least one second description with one or more camera views of the real-world object; and determining the update for the real-world object based, at least partially, on the comparison.
  • the means may be further configured for: transmitting, to the network entity, configuration information associated with the plurality of images.
  • the configuration information may comprise at least one of: a camera configuration, or encoding information.
• the means may be further configured for: determining one or more joints of the real-world object; generating one or more motion signals for the one or more joints; and transmitting, to a user equipment, the one or more motion signals.
• the means may be further configured for: performing a rigging operation or a skinning operation based, at least partially, on the plurality of images.
• the means may be further configured for: performing real-time segmentation of the real-world object; determining one or more motion signals based, at least partially, on the real-time segmentation of the real-world object; and transmitting, to a user equipment, the one or more motion signals.
  • the means may be further configured for: transmitting, to a user equipment, an invitation to join an augmented reality call; and transmitting the plurality of images as part of the augmented reality call.
• a processor, memory, and/or example algorithms (which may be encoded as instructions, program, or code) may be provided as example means for providing or causing performance of an operation.
• a non-transitory computer-readable medium comprising instructions stored thereon which, when executed with at least one processor, cause the at least one processor to: cause obtaining of at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; cause transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; cause obtaining of the plurality of images with respect to, at least, the real-world object; cause transmitting, to the network entity, of the plurality of images; cause obtaining, from the network entity, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and cause transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
• a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing obtaining of at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing obtaining of the plurality of images with respect to, at least, the real-world object; causing transmitting, to the network entity, of the plurality of images; causing obtaining, from the network entity, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and causing transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
• a non-transitory program storage device readable by a machine may be provided, tangibly embodying instructions executable by the machine for performing operations, the operations comprising: causing obtaining of at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing obtaining of the plurality of images with respect to, at least, the real-world object; causing transmitting, to the network entity, of the plurality of images; causing obtaining, from the network entity, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and causing transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
• a non-transitory computer-readable medium comprising instructions that, when executed by an apparatus, cause the apparatus to perform at least the following: causing obtaining of at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing obtaining of the plurality of images with respect to, at least, the real-world object; causing transmitting, to the network entity, of the plurality of images; causing obtaining, from the network entity, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and causing transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
• a computer implemented system comprising: at least one processor and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: causing obtaining of at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing obtaining of the plurality of images with respect to, at least, the real-world object; causing transmitting, to the network entity, of the plurality of images; causing obtaining, from the network entity, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and causing transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
• a computer implemented system comprising: means for causing obtaining of at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; means for causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; means for causing obtaining of the plurality of images with respect to, at least, the real-world object; means for causing transmitting, to the network entity, of the plurality of images; means for causing obtaining, from the network entity, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and means for causing transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
• an apparatus may comprise: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive an invitation to join a call with a user equipment; transmit an acceptance of the invitation; receive, from a network entity, a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and receive an update for the three-dimensional model.
• the example apparatus may be further configured to: receive at least one motion signal for the three-dimensional model; and transform the three-dimensional model based, at least partially, on the at least one motion signal.
  • the call may comprise an augmented reality call.
• an example method comprising: receiving, with a receiver user equipment, an invitation to join a call with a further user equipment; transmitting an acceptance of the invitation; receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and receiving an update for the three-dimensional model.
  • the example method may further comprise: receiving at least one motion signal for the three-dimensional model; and transforming the three-dimensional model based, at least partially, on the at least one motion signal.
  • the call may comprise an augmented reality call.
  • an apparatus may comprise: circuitry configured to perform: receiving, with a receiver user equipment, an invitation to join a call with a further user equipment; circuitry configured to perform: transmitting an acceptance of the invitation; circuitry configured to perform: receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and circuitry configured to perform: receiving an update for the three-dimensional model.
  • an apparatus may comprise: processing circuitry; memory circuitry including computer program code, the memory circuitry and the computer program code configured to, with the processing circuitry, enable the apparatus to: receive an invitation to join a call with a further user equipment; transmit an acceptance of the invitation; receive, from a network entity, a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three- dimensional model of the real-world object; and receive an update for the three-dimensional model.
  • an apparatus may comprise means for: receiving an invitation to join a call with a further user equipment; transmitting an acceptance of the invitation; receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three- dimensional model of the real-world object; and receiving an update for the three-dimensional model.
  • the means may be further configured for: receiving at least one motion signal for the three-dimensional model; and transforming the three-dimensional model based, at least partially, on the at least one motion signal.
  • the call may comprise an augmented reality call.
• a non-transitory computer-readable medium comprising instructions stored thereon which, when executed with at least one processor, cause the at least one processor to: cause receiving of an invitation to join a call with a further user equipment; cause transmitting of an acceptance of the invitation; cause receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and cause receiving of an update for the three-dimensional model.
• a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing receiving of an invitation to join a call with a further user equipment; causing transmitting of an acceptance of the invitation; causing receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and causing receiving of an update for the three-dimensional model.
• a non-transitory program storage device readable by a machine may be provided, tangibly embodying instructions executable by the machine for performing operations, the operations comprising: causing receiving of an invitation to join a call with a further user equipment; causing transmitting of an acceptance of the invitation; causing receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and causing receiving of an update for the three-dimensional model.
  • a non- transitory computer-readable medium comprising instructions that, when executed by an apparatus, cause the apparatus to perform at least the following: causing receiving of an invitation to join a call with a further user equipment; causing transmitting of an acceptance of the invitation; causing receiving, from a network entity, of a three- dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and causing receiving of an update for the three-dimensional model.
  • a computer implemented system comprising: at least one processor and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: causing receiving of an invitation to join a call with a further user equipment; causing transmitting of an acceptance of the invitation; causing receiving, from a network entity, of a three- dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and causing receiving of an update for the three-dimensional model.
• a computer implemented system comprising: means for causing receiving of an invitation to join a call with a further user equipment; means for causing transmitting of an acceptance of the invitation; means for causing receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and means for causing receiving of an update for the three-dimensional model.
• an apparatus may comprise: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: transmit, to a user equipment, at least one first description, wherein the at least one first description may comprise an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; receive, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receive, from the user equipment, the plurality of images with respect to, at least, the real-world object; generate the three-dimensional model based, at least partially, on the plurality of images; transmit, to the user equipment, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; transmit, to a further user equipment, the generated three-dimensional model; receive, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and transmit, to the further user equipment, the update for the three-dimensional model.
• the at least one first description may comprise an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the apparatus, a maximum number of a second type of bitstream supported by the apparatus, or a process for updating the three-dimensional model.
• the at least one first description may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the apparatus, triggering update of the three-dimensional model with the user equipment, or periodic update of the three-dimensional model with an indicated period.
• the example apparatus may be further configured to: transmit, to the user equipment, a trigger to transmit the update of the three-dimensional model, wherein the trigger may be transmitted in response to at least one application level function, wherein the at least one first description may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the apparatus.
• the at least one first description may comprise an indication of the at least one required media property, wherein the at least one required media property may comprise at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
• the example apparatus may comprise one of: an entity configured to perform a media resource function, or a multipoint control unit.
• the at least one received indication may comprise an indication of at least one of: a capability of the user equipment to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the user equipment, a maximum number of a second type of bitstream supported by the user equipment, or a process for updating the three-dimensional model.
• the at least one received indication may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the apparatus, or periodic update of the three-dimensional model with an indicated period.
• receiving the at least one indication may comprise the example apparatus being configured to: receive, from the user equipment, a session description protocol answer, wherein the session description protocol answer may comprise, at least, the at least one indication.
• the at least one first description may comprise at least one session description protocol offer.
  • the at least one second description may comprise at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
  • the example apparatus may be further configured to: receive, from the user equipment, configuration information associated with the plurality of images.
  • the configuration information may comprise at least one of: a camera configuration, or encoding information.
  • the example apparatus may be further configured to: receive, from the user equipment, one or more motion signals for the three-dimensional model; and transmit, to the further user equipment, the one or more motion signals.
  • the example apparatus may be further configured to: transmit, to the further user equipment, an indication of a delivery path of the three-dimensional model.
  • the example apparatus may be further configured to: generate one or more motion signals based, at least partially, on the three-dimensional model.
  • the example apparatus may be further configured to: transmit, to the further user equipment, the one or more motion signals .
  • the example apparatus may be further configured to: transmit, to the user equipment, the one or more motion signals .
• an example method comprising: transmitting, with a network entity to a user equipment, at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; transmitting, to the user equipment, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; transmitting, to a further user equipment, the generated three-dimensional model; receiving, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and transmitting, to the further user equipment, the update for the three-dimensional model.
• the at least one first description may comprise an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the network entity, a maximum number of a second type of bitstream supported by the network entity, or a process for updating the three-dimensional model.
• the at least one first description may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the network entity, triggering update of the three-dimensional model with the user equipment, or periodic update of the three-dimensional model with an indicated period.
• the example method may further comprise: transmitting, to the user equipment, a trigger to transmit the update of the three-dimensional model, wherein the trigger may be transmitted in response to at least one application level function, wherein the at least one first description may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the network entity.
• the at least one first description may comprise an indication of the at least one required media property, wherein the at least one required media property may comprise at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
• the network entity may comprise one of: an entity configured to perform a media resource function, or a multipoint control unit.
• the at least one received indication may comprise an indication of at least one of: a capability of the user equipment to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the user equipment, a maximum number of a second type of bitstream supported by the user equipment, or a process for updating the three-dimensional model.
• the at least one received indication may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
• the receiving of the at least one indication may comprise: receiving, from the user equipment, a session description protocol answer, wherein the session description protocol answer may comprise, at least, the at least one indication.
  • the at least one first description may comprise at least one session description protocol offer.
  • the at least one second description may comprise at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
  • the example method may further comprise: receiving, from the user equipment, configuration information associated with the plurality of images.
  • the configuration information may comprise at least one of: a camera configuration, or encoding information.
  • the example method may further comprise: receiving, from the user equipment, one or more motion signals for the three-dimensional model; and transmitting, to the further user equipment, the one or more motion signals.
  • the example method may further comprise: transmitting, to the further user equipment, an indication of a delivery path of the three-dimensional model.
  • the example method may further comprise: generating one or more motion signals based, at least partially, on the three-dimensional model.
  • the example method may further comprise: transmitting, to the further user equipment, the one or more motion signals.
  • the example method may further comprise: transmitting, to the user equipment, the one or more motion signals.
• an apparatus may comprise: circuitry configured to perform: transmitting, to a user equipment, at least one first description, wherein the at least one first description may comprise an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; circuitry configured to perform: receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; circuitry configured to perform: receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object; circuitry configured to perform: generating the three-dimensional model based, at least partially, on the plurality of images; circuitry configured to perform: transmitting, to the user equipment, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; circuitry configured to perform: transmitting, to a further user equipment, the generated three-dimensional model; circuitry configured to perform: receiving, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and circuitry configured to perform: transmitting, to the further user equipment, the update for the three-dimensional model.
• an apparatus may comprise: processing circuitry; memory circuitry including computer program code, the memory circuitry and the computer program code configured to, with the processing circuitry, enable the apparatus to: transmit, to a user equipment, at least one first description, wherein the at least one first description may comprise an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; receive, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receive, from the user equipment, the plurality of images with respect to, at least, the real-world object; generate the three-dimensional model based, at least partially, on the plurality of images; transmit, to the user equipment, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; transmit, to a further user equipment, the generated three-dimensional model; receive, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and transmit, to the further user equipment, the update for the three-dimensional model.
  • an apparatus may comprise means for: transmitting, to a user equipment, at least one first description, wherein the at least one first description may comprise an indication that the apparatus is capable of generating a three-dimensional model of a real-world obj ect ; receiving, from the user equipment , at least one indication with respect to generation of the three- dimensional model of the real-world obj ect based, at least partially, on a plurality of images ; receiving, from the user equipment , the plurality of images with respect to , at least , the real-world obj ect ; generating the three-dimensional model based, at least partially, on the plurality of images ; transmitting, to the user equipment , at least one second description, wherein the at least one second description may be configured to indicate at least part of the three- dimensional model ; transmitting, to a further user equipment , the generated three-dimensional model ; receiving, from the user equipment , an
  • the at least one first description may comprise an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the apparatus, a maximum number of a second type of bitstream supported by the apparatus, or a process for updating the three-dimensional model.
  • the at least one first description may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the apparatus, or periodic update of the three-dimensional model with an indicated period.
  • the means may be further configured for: transmitting, to the user equipment, a trigger to transmit the update of the three-dimensional model, wherein the trigger may be transmitted in response to at least one application level function, wherein the at least one first description may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the apparatus.
  • the at least one first description may comprise an indication of the at least one required media property, wherein the at least one required media property may comprise at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
  • the example apparatus may comprise one of: an entity configured to perform a media resource function, or a multipoint control unit.
  • the at least one received indication may comprise an indication of at least one of: a capability of the user equipment to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the user equipment, a maximum number of a second type of bitstream supported by the user equipment, or a process for updating the three-dimensional model.
  • the at least one received indication may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the apparatus, or periodic update of the three-dimensional model with an indicated period.
  • the means configured for receiving the at least one indication may comprise means configured for: receiving, from the user equipment, a session description protocol answer, wherein the session description protocol answer may comprise, at least, the at least one indication.
  • the at least one first description may comprise at least one session description protocol offer.
  • the at least one second description may comprise at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
  • the means may be further configured for: receiving, from the user equipment, configuration information associated with the plurality of images.
  • the configuration information may comprise at least one of: a camera configuration, or encoding information.
  • the means may be further configured for: receiving, from the user equipment, one or more motion signals for the three-dimensional model; and transmitting, to the further user equipment, the one or more motion signals.
  • the means may be further configured for: transmitting, to the further user equipment, an indication of a delivery path of the three-dimensional model.
  • the means may be further configured for: generating one or more motion signals based, at least partially, on the three-dimensional model.
  • the means may be further configured for: transmitting, to the further user equipment, the one or more motion signals.
  • the means may be further configured for: transmitting, to the user equipment, the one or more motion signals.
  • a non-transitory computer-readable medium comprising instructions stored thereon which, when executed with at least one processor, cause the at least one processor to: cause transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; cause receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; cause receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generate the three-dimensional model based, at least partially, on the plurality of images; cause transmitting, to the user equipment, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; cause transmitting, to a further user equipment, of the generated three-dimensional model; cause receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and cause transmitting, to the further user equipment, of the update for the three-dimensional model.
  • a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; causing transmitting, to a further user equipment, of the generated three-dimensional model; causing receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and causing transmitting, to the further user equipment, of the update for the three-dimensional model.
  • a non-transitory program storage device readable by a machine may be provided, tangibly embodying instructions executable by the machine for performing operations, the operations comprising: causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; causing transmitting, to a further user equipment, of the generated three-dimensional model; causing receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and causing transmitting, to the further user equipment, of the update for the three-dimensional model.
  • a non-transitory computer-readable medium comprising instructions that, when executed by an apparatus, cause the apparatus to perform at least the following: causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; causing transmitting, to a further user equipment, of the generated three-dimensional model; causing receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and causing transmitting, to the further user equipment, of the update for the three-dimensional model.
  • a computer implemented system comprising: at least one processor and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; causing transmitting, to a further user equipment, of the generated three-dimensional model; causing receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and causing transmitting, to the further user equipment, of the update for the three-dimensional model.
  • a computer implemented system comprising: means for causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; means for causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; means for causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; means for generating the three-dimensional model based, at least partially, on the plurality of images; means for causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; means for causing transmitting, to a further user equipment, of the generated three-dimensional model; means for causing receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and means for causing transmitting, to the further user equipment, of the update for the three-dimensional model.
  • non-transitory is a limitation of the medium itself (i.e. tangible, not a signal) as opposed to a limitation on data storage persistency (e.g., RAM vs. ROM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Databases & Information Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An apparatus may be configured to: obtain at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmit, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtain the plurality of images with respect to, at least, the real-world object; transmit, to the network entity, the plurality of images; obtain, from the network entity, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; and transmit at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.

Description

SIGNALLING FOR REAL-TIME 3D MODEL GENERATION
TECHNICAL FIELD
[0001] The example and non-limiting embodiments relate generally to 3D model generation and, more particularly, to updating of 3D models.
BACKGROUND
[0002] It is known, in volumetric video, to generate a 3D model from a set of images.
SUMMARY
[0003] The following summary is merely intended to be illustrative. The summary is not intended to limit the scope of the claims.
[0004] In accordance with one aspect, an apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: obtain at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmit, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtain the plurality of images with respect to, at least, the real-world object; transmit, to the network entity, the plurality of images; obtain, from the network entity, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; and transmit at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[0005] In accordance with one aspect, a method comprising: obtaining, with a user equipment, at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtaining the plurality of images with respect to, at least, the real-world object; transmitting, to the network entity, the plurality of images; obtaining, from the network entity, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; and transmitting at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[0006] In accordance with one aspect, an apparatus comprising means for: obtaining at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtaining the plurality of images with respect to, at least, the real-world object; transmitting, to the network entity, the plurality of images; obtaining, from the network entity, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; and transmitting at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[0007] In accordance with one aspect, a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing obtaining of at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing obtaining of the plurality of images with respect to, at least, the real-world object; causing transmitting, to the network entity, of the plurality of images; causing obtaining, from the network entity, of at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; and causing transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[0008] In accordance with one aspect, an apparatus comprising: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive an invitation to join a call with a user equipment; transmit an acceptance of the invitation; receive, from a network entity, a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object; and receive an update for the three-dimensional model.
[0009] In accordance with one aspect, a method comprising: receiving, with a receiver user equipment, an invitation to join a call with a user equipment; transmitting an acceptance of the invitation; receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object; and receiving an update for the three-dimensional model.
[0010] In accordance with one aspect, an apparatus comprising means for: receiving an invitation to join a call with a user equipment; transmitting an acceptance of the invitation; receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object; and receiving an update for the three-dimensional model.
[0011] In accordance with one aspect, a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing receiving of an invitation to join a call with a user equipment; causing transmitting of an acceptance of the invitation; causing receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object; and causing receiving of an update for the three-dimensional model.
[0012] In accordance with one aspect, an apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: transmit, to a user equipment, at least one first description, wherein the at least one first description comprises an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; receive, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receive, from the user equipment, the plurality of images with respect to, at least, the real-world object; generate the three-dimensional model based, at least partially, on the plurality of images; transmit, to the user equipment, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; transmit, to a further user equipment, the generated three-dimensional model; receive, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and transmit, to the further user equipment, the update for the three-dimensional model.
[0013] In accordance with one aspect, a method comprising: transmitting, with a network entity to a user equipment, at least one first description, wherein the at least one first description comprises an indication that the network entity is capable of generating a three-dimensional model of a real-world object; receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; transmitting, to the user equipment, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; transmitting, to a further user equipment, the generated three-dimensional model; receiving, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and transmitting, to the further user equipment, the update for the three-dimensional model.
[0014] In accordance with one aspect, an apparatus comprising means for: transmitting, to a user equipment, at least one first description, wherein the at least one first description comprises an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; transmitting, to the user equipment, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; transmitting, to a further user equipment, the generated three-dimensional model; receiving, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and transmitting, to the further user equipment, the update for the three-dimensional model.
[0015] In accordance with one aspect, a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description comprises an indication that the network entity is capable of generating a three-dimensional model of a real-world object; causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; causing transmitting, to a further user equipment, of the generated three-dimensional model; causing receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and causing transmitting, to the further user equipment, of the update for the three-dimensional model.
[0016] According to some aspects, there is provided the subject matter of the independent claims. Some further aspects are defined in the dependent claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] The foregoing aspects and other features are explained in the following description, taken in connection with the accompanying drawings, wherein:
[0018] FIG. 1 is a block diagram of one possible and non-limiting example system in which the example embodiments may be practiced;
[0019] FIG. 2 is a block diagram of one possible and non-limiting exemplary system in which the example embodiments may be practiced;
[0020] FIG. 3 is a diagram illustrating features as described herein;
[0021] FIG. 4 is a diagram illustrating features as described herein;
[0022] FIG. 5a is a flowchart illustrating steps as described herein;
[0023] FIG. 5b is a flowchart illustrating steps as described herein;
[0024] FIG. 6 is a flowchart illustrating steps as described herein;
[0025] FIG. 7 is a flowchart illustrating steps as described herein; and
[0026] FIG. 8 is a flowchart illustrating steps as described herein.
DETAILED DESCRIPTION OF EMBODIMENTS
[0027] The following abbreviations that may be found in the specification and/or the drawing figures are defined as follows:
3DoF three degrees of freedom
3GPP third generation partnership project
4G fourth generation
5G fifth generation
5GC 5G core network
6DoF six degrees of freedom
API application programming interface
AR augmented reality
AS application server
CDMA code division multiple access
CPU central processing unit
cRAN cloud radio access network
CSCF call session control function
DCS data channel server
eNB (or eNodeB) evolved Node B (e.g., an LTE base station)
EN-DC E-UTRA-NR dual connectivity
en-gNB or En-gNB node providing NR user plane and control plane protocol terminations towards the UE, and acting as secondary node in EN-DC
E-UTRA evolved universal terrestrial radio access, i.e., the LTE radio access technology
FDMA frequency division multiple access
GLB GL transmission format binary
gLTF (or GLTF) GL transmission format
GL graphics language
gNB (or gNodeB) base station for 5G/NR, i.e., a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC
GPU graphical processing unit
GSM global systems for mobile communications
HE header extension
HEVC high efficiency video coding
HMD head-mounted display
IEEE Institute of Electrical and Electronics Engineers
IMD integrated messaging device
IMS instant messaging service
IMS IP multimedia subsystem
IoT Internet of Things
IP internet protocol
JSON JavaScript Object Notation
LTE long term evolution
MCU multipoint control unit
MMS multimedia messaging service
MPEG-I Moving Picture Experts Group immersive codec family
MR mixed reality
MRF media resource function
MTSI multimedia telephony service for IP multimedia subsystem
ng or NG new generation
ng-eNB or NG-eNB new generation eNB
NR new radio
N/W or NW network
0-RAN open radio access network
PC personal computer
PDA personal digital assistant
RTCP RTP control protocol
RTP real-time transport protocol
SDP session description protocol
SIP session initiation protocol
SMS short messaging service
SSRC synchronization source
TCP-IP transmission control protocol-internet protocol
TDMA time division multiple access
UE user equipment (e.g., a wireless, typically mobile device)
UMTS universal mobile telecommunications system
URI uniform resource identifier
USB universal serial bus
V3C visual volumetric video-based coding
VR virtual reality
VNF virtualized network function
WLAN wireless local area network
XR AR, MR, VR
[0028] The following describes suitable apparatus and possible mechanisms for practicing example embodiments of the present disclosure. Accordingly, reference is first made to FIG. 1, which shows an example block diagram of an apparatus 50. The apparatus may be configured to perform various functions such as, for example, gathering information by one or more sensors, encoding and/or decoding information, receiving and/or transmitting information, analyzing information gathered or received by the apparatus, or the like. A device configured to encode a video scene may (optionally) comprise one or more microphones for capturing the scene and/or one or more sensors, such as cameras, for capturing information about the physical environment in which the scene is captured. Alternatively, a device configured to encode a video scene may be configured to receive information about an environment in which a scene is captured and/or a simulated environment. A device configured to decode and/or render the video scene may be configured to receive a Moving Picture Experts Group immersive codec family (MPEG-I) bitstream comprising the encoded video scene. A device configured to decode and/or render the video scene may comprise one or more speakers/audio transducers and/or displays, and/or may be configured to transmit a decoded scene or signals to a device comprising one or more speakers/audio transducers and/or displays. A device configured to decode and/or render the video scene may comprise a user equipment, a head-mounted display, or another device capable of rendering to a user an AR, VR and/or MR experience.
[0029] The electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. Alternatively, the electronic device may be a computer or part of a computer that is not mobile. It should be appreciated that example embodiments may be implemented within any electronic device or apparatus which may process data. The electronic device 50 may comprise a device that can access a network and/or cloud through a wired or wireless connection. The electronic device 50 may comprise one or more processors 56, one or more memories 58, and one or more transceivers 52 interconnected through one or more buses. The one or more processors 56 may comprise a central processing unit (CPU) and/or a graphical processing unit (GPU). Each of the one or more transceivers 52 includes a receiver and a transmitter. The one or more buses may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like. A "circuit" may include dedicated hardware or hardware in association with software executable thereon. The one or more transceivers may be connected to one or more antennas 44. The one or more memories 58 may include computer program code. The one or more memories 58 and the computer program code may be configured to, with the one or more processors 56, cause the electronic device 50 to perform one or more of the operations as described herein.
[0030] The electronic device 50 may connect to a node of a network. The network node may comprise one or more processors, one or more memories, and one or more transceivers interconnected through one or more buses. Each of the one or more transceivers includes a receiver and a transmitter. The one or more buses may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like. The one or more transceivers may be connected to one or more antennas. The one or more memories may include computer program code. The one or more memories and the computer program code may be configured to, with the one or more processors, cause the network node to perform one or more of the operations as described herein.
[0031] The electronic device 50 may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input. The electronic device 50 may further comprise an audio output device 38 which in example embodiments may be any one of: an earpiece, speaker, or an analogue audio or digital audio output connection. The electronic device 50 may also comprise a battery (or in other example embodiments the device may be powered by any suitable mobile energy device such as solar cell, fuel cell, or clockwork generator). The electronic device 50 may further comprise a camera 42 or other sensor capable of recording or capturing images and/or video. Additionally or alternatively, the electronic device 50 may further comprise a depth sensor. The electronic device 50 may further comprise a display 32. The electronic device 50 may further comprise an infrared port for short range line of sight communication to other devices. In other example embodiments the apparatus 50 may further comprise any suitable short-range communication solution such as for example a BLUETOOTH™ wireless connection or a USB/firewire wired connection.
[0032] It should be understood that an electronic device 50 configured to perform example embodiments of the present disclosure may have fewer and/or additional components, which may correspond to what processes the electronic device 50 is configured to perform. For example, an apparatus configured to encode a video might not comprise a speaker or audio transducer and may comprise a microphone, while an apparatus configured to render the decoded video might not comprise a microphone and may comprise a speaker or audio transducer.
[0033] Referring now to FIG. 1, the electronic device 50 may comprise a controller 56, processor or processor circuitry for controlling the apparatus 50. The controller 56 may be connected to memory 58 which in example embodiments may store both data in the form of image and audio data and/or may also store instructions for implementation on the controller 56. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and/or decoding of audio and/or video data or assisting in coding and/or decoding carried out by the controller.
[0034] The electronic device 50 may further comprise a card reader 48 and a smart card 46, for example a UICC and UICC reader, for providing user information and being suitable for providing authentication information for authentication and authorization of the user/electronic device 50 at a network. The electronic device 50 may further comprise an input device 34, such as a keypad, one or more input buttons, or a touch screen input device, for providing information to the controller 56.
[0035] The electronic device 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system, or a wireless local area network. The apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and/or for receiving radio frequency signals from other apparatus(es).
[0036] The electronic device 50 may comprise a microphone 36, camera 42, and/or other sensors capable of recording or detecting audio signals, image/video signals, and/or other information about the local/virtual environment, which are then passed to the codec 54 or the controller 56 for processing. The electronic device 50 may receive the audio/image/video signals and/or information about the local/virtual environment for processing from another device prior to transmission and/or storage. The electronic device 50 may also receive either wirelessly or by a wired connection the audio/image/video signals and/or information about the local/virtual environment for encoding/decoding. The structural elements of electronic device 50 described above represent examples of means for performing a corresponding function.
[0037] The memory 58 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The memory 58 may be a non-transitory memory. The memory 58 may be means for performing storage functions. The controller 56 may be or comprise one or more processors, which may be of any type suitable to the local technical environment, and may include one or more of general-purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on a multi-core processor architecture, as non-limiting examples. The controller 56 may be means for performing functions.
[0038] The electronic device 50 may be configured to perform capture of a volumetric scene according to example embodiments of the present disclosure. For example, the electronic device 50 may comprise one or more cameras 42 or one or more other sensors capable of recording or capturing images and/or video. The electronic device 50 may also comprise one or more transceivers 52 to enable transmission of captured content for processing at another device. Such an electronic device 50 may or may not include all the modules illustrated in FIG. 1.
[0039] The electronic device 50 may be configured to perform processing of volumetric video content according to example embodiments of the present disclosure. For example, the electronic device 50 may comprise a controller 56 for processing images to produce volumetric video content, a controller 56 for processing volumetric video content to project 3D information into 2D information, patches, and auxiliary information, and/or a codec 54 for encoding 2D information, patches, and auxiliary information into a bitstream for transmission to another device with radio interface 52. Such an electronic device 50 may or may not include all the modules illustrated in FIG. 1.
[0040] The electronic device 50 may be configured to perform encoding or decoding of 2D information representative of volumetric video content according to example embodiments of the present disclosure. For example, the electronic device 50 may comprise a codec 54 for encoding or decoding 2D information representative of volumetric video content. Such an electronic device 50 may or may not include all the modules illustrated in FIG. 1.
[0041] The electronic device 50 may be configured to perform rendering of decoded 3D volumetric video according to example embodiments of the present disclosure. For example, the electronic device 50 may comprise a controller for projecting 2D information to reconstruct 3D volumetric video, and/or a display 32 for rendering decoded 3D volumetric video. Such an electronic device 50 may or may not include all the modules illustrated in FIG. 1.
[0042] With respect to FIG. 2, an example of a system within which example embodiments of the present disclosure can be utilized is shown. The system 10 comprises multiple communication devices which can communicate through one or more networks. The system 10 may comprise any combination of wired or wireless networks including, but not limited to, a wireless cellular telephone network (such as a GSM, UMTS, E-UTRA, LTE, CDMA, 4G, 5G network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a BLUETOOTH™ personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and/or the Internet. A wireless network may implement network virtualization, which is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, a virtual network. Network virtualization involves platform virtualization, often combined with resource virtualization. Network virtualization is categorized as either external, combining many networks, or parts of networks, into a virtual unit, or internal, providing network-like functionality to software containers on a single system. For example, a network may be deployed in a tele cloud, with virtualized network functions (VNF) running on, for example, data center servers. For example, network core functions and/or radio access network(s) (e.g. CloudRAN, O-RAN, edge cloud) may be virtualized. Note that the virtualized entities that result from the network virtualization are still implemented, at some level, using hardware such as processors and memories, and also such virtualized entities create technical effects.
[0043] It may also be noted that operations of example embodiments of the present disclosure may be carried out by a plurality of cooperating devices (e.g. cRAN).
[0044] The system 10 may include both wired and wireless communication devices and/or electronic devices suitable for implementing example embodiments.
[0045] For example, the system shown in FIG. 2 includes a mobile telephone network 11 and a representation of the internet 28. Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
[0046] The example communication devices shown in the system 10 may include, but are not limited to, an apparatus 15, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22, and a head-mounted display (HMD) 17. The electronic device 50 may comprise any of those example communication devices. In an example embodiment of the present disclosure, more than one of these devices, or a plurality of one or more of these devices, may perform the disclosed process(es). These devices may connect to the internet 28 through a wireless connection 2.
[0047] The example embodiments may also be implemented in a set-top box; i.e. a digital TV receiver, which may/may not have a display or wireless capabilities, in tablets or (laptop) personal computers (PC), which have hardware and/or software to process neural network data, in various operating systems, and in chipsets, processors, DSPs and/or embedded systems offering hardware/software based coding. The example embodiments may also be implemented in cellular telephones such as smart phones, tablets, personal digital assistants (PDAs) having wireless communication capabilities, portable computers having wireless communication capabilities, image capture devices such as digital cameras having wireless communication capabilities, gaming devices having wireless communication capabilities, music storage and playback appliances having wireless communication capabilities, Internet appliances permitting wireless Internet access and browsing, tablets with wireless communication capabilities, as well as portable units or terminals that incorporate combinations of such functions.
[0048] Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24, which may be, for example, an eNB, gNB, access point, access node, other node, etc. The base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28. The system may include additional communication devices and communication devices of various types.
[0049] The communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), BLUETOOTH™, IEEE 802.11, 3GPP Narrowband IoT and any similar wireless communication technology. A communications device involved in implementing various example embodiments of the present disclosure may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection.
[0050] In telecommunications and data networks, a channel may refer either to a physical channel or to a logical channel. A physical channel may refer to a physical transmission medium such as a wire, whereas a logical channel may refer to a logical connection over a multiplexed medium, capable of conveying several logical channels. A channel may be used for conveying an information signal, for example a bitstream, which may be an MPEG-I bitstream, from one or several senders (or transmitters) to one or several receivers.
[0051] Having thus introduced one suitable but non-limiting technical context for the practice of the example embodiments of the present disclosure, example embodiments will now be described with greater specificity.
[0052] Features as described herein generally relate to volumetric or 3D video, images, and/or objects. For example, features as described herein may relate to a six degrees of freedom representation. Unlike a three degrees of freedom (3DoF) experience, an immersive six degrees of freedom (6DoF) representation enables a larger viewing space, wherein viewers have both translational and rotational freedom of movement. In a 3DoF visual experience, content is presented to viewers as if they were positioned at the center of a scene, looking outwards, with all parts of the content positioned at a constant distance. 6DoF experiences allow viewers to move freely in the scene and experience the content from various viewpoints. In contrast to 3DoF, 6DoF videos enable perception of motion parallax, where the change in relative geometry between objects is reflected with the pose of the viewer.
[0053] There are many ways to capture and represent a volumetric/3D scene or object. The format used to capture and represent it may depend on the processing to be performed on it, and/or the target application using it. Some exemplary representations are discussed below.
[0054] For example, a volumetric object may be represented as a point cloud. A point cloud is a set of unstructured points in 3D space, where each point is characterized by its position in a 3D coordinate system (e.g. Euclidean), and some corresponding attributes (e.g. color information provided as RGBA value, or normal vectors).
[0055] For example, a volumetric object may be represented as images, with or without depth, captured from multiple viewpoints in 3D space. In other words, it may be represented by one or more view frames (where a view is a projection of a volumetric scene onto a plane (the camera plane) using a real or virtual camera with known/computed extrinsics and intrinsics). Each view may be represented by a number of components (e.g. geometry, color, transparency, and occupancy picture), which may be part of the geometry picture or represented separately.
[0056] For example, a volumetric object may be represented as a mesh. A mesh is a collection of points, called vertices, and connectivity information between vertices, called edges. Vertices along with edges form faces. The combination of vertices, edges and faces can uniquely approximate shapes of objects.
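To make the contrast between these representations concrete, the following minimal sketch (illustrative values only, not a standard file format) shows the same three points once as a point cloud and once as a single-triangle mesh:

    # A point cloud: unstructured points, each carrying a position and
    # an attribute (here, an RGBA color).
    point_cloud = [
        {"position": (0.0, 0.0, 0.0), "rgba": (255, 0, 0, 255)},
        {"position": (1.0, 0.0, 0.0), "rgba": (0, 255, 0, 255)},
        {"position": (0.0, 1.0, 0.0), "rgba": (0, 0, 255, 255)},
    ]

    # A mesh: the same points as vertices, plus connectivity. The single
    # triangular face implies the edges (0-1, 1-2, 2-0).
    vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    faces = [(0, 1, 2)]

The point cloud carries no connectivity information, whereas the faces of the mesh allow a surface to be approximated.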
[0057] Depending on the capture, a volumetric object may provide viewers the ability to navigate a scene with six degrees of freedom, for example with both translational and rotational movement of their viewing pose (which includes yaw, pitch, and roll). The data to be coded for a volumetric frame may be significant, as a volumetric frame may contain many objects, and the positioning and movement of these objects in the scene may result in many dis-occluded regions. Furthermore, the interaction of light and materials in objects and surfaces in a volumetric frame can generate complex light fields that can produce texture variations for even a slight change of pose. A 3D model, as used in this document, may refer to the volumetric representation of an object.
[0058] A 3D model may be used in the provision of virtual reality (VR), augmented reality (AR), and/or mixed reality (MR). It should be understood that example embodiments described with regard to one of VR, AR, or MR (collectively, XR) may be implemented with respect to any of these technology areas. Virtual reality (VR) is an area of technology in which video content may be provided, e.g. streamed, to a VR display system. The VR display system may be provided with a live or stored feed from a video content source, the feed representing a VR space or world for immersive output through the display system. A virtual space or virtual world is any computer-generated version of a space, for example a captured real-world space, in which a user can be immersed through a display system such as a VR headset. A VR headset may be configured to provide VR video and audio content to the user, e.g. through the use of a pair of video screens and headphones incorporated within the headset. Augmented reality (AR) is similar to VR in that video content may be provided, as above, which may be overlaid over or combined with aspects of a real-world environment in which the AR content is being consumed. A user of AR content may therefore experience a version of the real-world environment that is "augmented" with additional virtual features, such as virtual visual and/or audio objects. A device may provide AR video and audio content overlaid over a visible, see-through, or recorded version of the real-world visual and audio elements.
[0059] 3D models may be provided via a graphics language transmission format (GLTF/gLTF). The GL Transmission Format (glTF) is a JavaScript Object Notation (JSON) based, rendering application programming interface (API) agnostic runtime asset delivery format. glTF bridges the gap between 3D content creation tools and modern 3D applications by providing an efficient, extensible, interoperable format for the transmission and loading of 3D content.
[0060] glTF assets are JSON files plus supporting external data. Specifically, a glTF asset may be represented by at least one of the following files. For example, a JSON-formatted file (.gltf) containing a full scene description: node hierarchy, materials, cameras, as well as descriptor information for meshes, animations, and other constructs may be used. For example, binary files (.bin) containing geometry and animation data, and other buffer-based data may be used. For example, image files (.jpg, .png) for textures may be used.
[0061] The JSON-formatted file contains information about the binary files, describing how they may be used when uploaded to the GPU with minimal processing. This makes glTF particularly suitable for runtime delivery, as the assets may be directly copied into GPU memory for the rendering pipeline.
[0062] Assets defined in other formats, such as images, may be stored in external files referenced via uniform resource identifier (URI), stored side-by-side in a graphics language transmission format binary (GLB) container, or embedded directly into the JSON using data URIs.
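Purely for illustration, a minimal .gltf JSON file for a single-triangle mesh might look as follows (a sketch only; the buffer file name and byte lengths are illustrative, and componentType 5126 denotes 32-bit floats):

    {
      "asset": { "version": "2.0" },
      "scene": 0,
      "scenes": [ { "nodes": [ 0 ] } ],
      "nodes": [ { "mesh": 0 } ],
      "meshes": [ { "primitives": [ { "attributes": { "POSITION": 0 } } ] } ],
      "accessors": [ { "bufferView": 0, "componentType": 5126, "count": 3,
                       "type": "VEC3", "min": [0.0, 0.0, 0.0], "max": [1.0, 1.0, 0.0] } ],
      "bufferViews": [ { "buffer": 0, "byteOffset": 0, "byteLength": 36 } ],
      "buffers": [ { "uri": "geometry.bin", "byteLength": 36 } ]
    }

The JSON carries the scene description, while the vertex data itself lives in the external binary file geometry.bin.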
[0063] glTF has been designed to allow extensibility. While the initial base specification supports a rich feature set, there will be many opportunities for growth and improvement. glTF defines a mechanism that allows the addition of both general-purpose and vendor-specific extensions. MPEG defines an extension that allows signaling that the model can be updated in the future, and provides information regarding the channel through which the update can potentially come. A glTF object may be updated using a JSON patch as defined in RFC 6902.
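For example, assuming the glTF document sketched above, a JSON patch (RFC 6902) updating the model might look as follows (paths and values are illustrative):

    [
      { "op": "add", "path": "/nodes/0/translation", "value": [0.0, 1.2, 0.0] },
      { "op": "replace", "path": "/buffers/0/uri", "value": "geometry_v2.bin" }
    ]

Each operation targets a location in the glTF JSON by path, so only the changed parts of the model description need to be delivered.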
[0064] An example of an update to be made to a 3D model or object is animation. Skinning, or rigging, is a technique used for animating a 3D object. A skin is created when a 3D model is bound to a skeleton. The skeleton is a hierarchical representation of joints. Each joint is bound to a portion of the mesh/point cloud, and a skin weight is assigned to the corresponding vertices in a process known as skinning. The weight determines the influence of the skeletal joint transformation on each vertex (in case of mesh) or point (in case of point cloud) when the joints move. Other techniques for animating a 3D object are also possible with example embodiments of the present disclosure.
[0065] Features as described herein may relate to 3GPP MTSI. 3GPP specification TS 26.114, Multimedia Telephony Service for IP Multimedia subsystem (MTSI), defines a standardized method for establishing a multimedia call between terminals using the IP Multimedia Subsystem (IMS). The specification uses SDP for session signaling and the Real-time Transport Protocol (RTP) for the delivery of real-time media. A Media Resource Function (MRF) is a network element that may be inserted in the media path between sender and receiver for media processing. The MTSI specification supports speech, video, images, text, and unidirectional 360-degree video. In addition to the media path, the data channel may be used by MTSI clients for transporting objects and data. Generally, the data channel is used for the transmission of non-real-time data.
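As a rough illustration of such session signaling, a fragment of an SDP offer for a basic MTSI audio/video session might look like the following (addresses, ports, codecs, and payload type numbers are arbitrary examples):

    v=0
    o=alice 2890844526 2890844526 IN IP4 ue1.example.com
    s=-
    c=IN IP4 192.0.2.1
    t=0 0
    m=audio 49152 RTP/AVPF 97
    a=rtpmap:97 EVS/16000
    m=video 49154 RTP/AVPF 98
    a=rtpmap:98 H265/90000

The media described by the m= lines is carried over RTP, while objects and non-real-time data may use the data channel.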
[0066] Features as described herein may relate to conversational AR. In an Augmented Reality (AR) conversational call, the user may capture their own 3D model using one or more cameras. The model generation, skinning, and weighting may be done by the (capture) device, by a cloud entity (e.g. Multipoint Control Unit (MCU) or MRF), or at an edge entity. The model may then be delivered to the other party using glTF, and subsequent updates may be done using a JSON patch. The 3D object may be animated using motion data captured by the sender and delivered to the receiver/viewer.
[0067] For example, in a conference with multiple users, each user may use a device to capture their own 3D model for transmission to other users in the conference. An MRF or MCU, which may be part of a wireless network 100, may act as a middleman for media processing/delivery. In another example use case, there may be a conversational audio-visual session between two MTSI UEs, where one device is capturing omnidirectional content and the other person is consuming the content with a 2D display or a head-mounted display (HMD).
[0068] In the following description, it should be noted that the phrases "MCU" and "MRF" may be used interchangeably. Use of one or the other of these words should not be interpreted as limiting the disclosure. It should also be noted that where an MCU/MRF is described as performing an action, a sender UE may appropriately perform the action instead, and vice versa.
[0069] A network/edge entity, such as an MRF, MCU, or other appropriate server/endpoint, may provide the capability to create a 3D model to be used in an XR experience. The 3D model may be created from images captured by one or multiple cameras at the source. In an example embodiment, the source may provide virtual elements to include in the 3D model, e.g., a texture, in addition to the captured images. In a different example embodiment, the virtual elements and the captured images may be collected from different sources. For example, the virtual elements may come from a library of predetermined virtual elements, or may be provided via user input. For a conversational XR experience, the session description may be carried using session description protocol (SDP) signaling. During the call, a 3D model update may be required, for example when: a new camera becomes available; more of the object is captured than before (e.g. a person may move away from the camera so that their torso, in addition to head and shoulders, is also captured); the angle of the object captured changes, e.g. a person turns around; lighting conditions have changed (e.g., a window is opened); etc. Currently, SDP does not define a way to signal that the network/edge/server/endpoint has this capability. Furthermore, there is no way to signal that the 3D model requires an update such that new images from the source are needed (e.g., due to a change in the capture situation at the source). The benefit of knowing when new images are required is to avoid redundant delivery of images to describe an already complete model of the object.
[0070] Example embodiments of the present disclosure may relate to a real-time model generator. A model generator is a device/process/application/server that generates a 3D model from a set of images for real-time consumption. When new content from a source is available, the model generator may generate an update of the 3D model. The frequency of receiving new content from the source may be different from the frequency of obtaining an updated 3D model from the model generator.
[0071] Example embodiments of the present disclosure may relate to a source. A source is a device/process/application that provides the content for creating and updating the 3D model.
[0072] Referring now to FIG. 3, illustrated is an example of signaling between a source (305) and a 3D model generator (310). Each of the signals may be optional, may occur in a different order, and/or may occur simultaneously. It may also be noted that, while not illustrated, the signaling between the source and the 3D model generator may be indirect rather than direct; the signaling may be forwarded by an intermediate node/device/function, for example an MRF or MCU.
[0073] In an example embodiment, the 3D model generator (310) may transmit a signal that indicates the capability to process captured content and generate a 3D model (320). In an example embodiment, the 3D model generator (310) may transmit a signal indicating the media properties of the input required to generate the 3D model (330). For example, the 3D model generator (310) may indicate that encoded RGB streams, RGB-D streams, and/or the V3C format may be used. In an example embodiment, the 3D model generator (310) may transmit a signal indicating the maximum and/or minimum number of each type of input bitstream it can support (340). In an example embodiment, the 3D model generator (310) may transmit a signal indicating the process for updating the 3D model (350). All or some of the following may be indicated: that the update may be triggered by the source of the input bitstreams; that the update may be triggered by the model generator; and/or that periodic update may be used, with an indication of the period. A sketch of these capabilities as a data structure is shown below.
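The capability, media-property, stream-count, and update-mode signals (320-350) could be carried in many concrete encodings; the following dataclass is only a hypothetical in-memory representation of what the 3D model generator advertises, with field names invented for illustration.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelGeneratorCapabilities:
    can_generate_3d_model: bool = True                      # signal 320
    accepted_inputs: list = field(                          # signal 330
        default_factory=lambda: ["RGB", "RGB-D", "V3C"])
    min_input_streams: dict = field(                        # signal 340
        default_factory=lambda: {"RGB": 1, "RGB-D": 1})
    max_input_streams: dict = field(
        default_factory=lambda: {"RGB": 4, "RGB-D": 2})
    update_triggers: tuple = ("source", "generator", "periodic")  # signal 350
    update_period_ms: Optional[int] = 3000                  # only if periodic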
[0074] In an example embodiment, the source (305) may transmit a signal indicating that it has the capability to act as a source for 3D model generation (315). In an example embodiment, the source (305) may transmit a signal indicating the media properties of the input (325) (i.e. input to the 3D model generator 310). In an example embodiment, the source (305) may transmit a signal indicating the maximum and/or minimum number of each type of input bitstream it can support (335). In an example embodiment, the source (305) may transmit a signal indicating the process for updating the 3D model (345). All or some of the following may be indicated: that the update may be triggered by the source of the input bitstreams; that the update may be triggered by the model generator; and/or that periodic update may be used, with an indication of the period.
[0075] In an example embodiment, a receiver may subscribe to 3D model updates triggered by either the source or the 3D model generator, or both. For example, if the receiver is subscribing to or selecting the option for only the 3D model generator updates, the updates may be made available to the receiver only when the 3D model generator updates the 3D model. This may be used to regulate the visibility of updates to the receiver.
[0076] Referring now to FIG. 4, illustrated is an example of a conversational AR use case consisting of a source UE (405), a receiver UE (435), and a network function provided by, for example, an MRF (425). The source UE (405) may be a mobile device with one or more cameras (410). It may also be a mobile device paired with a camera rig with multiple cameras (410) for multipoint capture. While not illustrated in FIG. 4, the source UE (405) and the MRF (425) may exchange information regarding the capabilities of the source (405), the capabilities of the MRF (425), bitstream capabilities, 3D model update capabilities, etc. (see FIG. 3). This information may be described as one or more descriptions; one or each of the source UE (405) and the MRF (425) may transmit and/or receive a description to be used for the conversational AR service. It may be noted that description(s) from the source and/or MRF may be transmitted/received together/concurrently/simultaneously, or in any order relative to each other.
[0077] The source UE (405) may provide the captured images (420) of, for example, the user (415) to the MRF (425) for processing. The source UE (405) may also provide optional configuration information (e.g. camera configuration, encoding information, etc.) to the model generator (425). It may be noted that the captured images and the configuration information may be transmitted together/concurrently/simultaneously, or in any order relative to each other. The MRF (425) may create a 3D model using the images, and may deliver the model (430) to a receiver UE (435) and, optionally, back to the source UE (405). The MRF (425) may also assist in rigging and skinning the 3D model.
[0078] After the model generation, the source UE (405) may generate motion signals for joints. The source UE (405) may send motion signals (450) directly to the receiver (435) for animation (not illustrated), where the generated model (430) may be transformed according to the received motion signals (450). In an example embodiment, the motion signals may be used for animating a 3D model. In an example embodiment, the motion signals may be derived via a skinning process.
[0079] In yet another embodiment, the source (UE) may continue to send images to the 3D model generator during the 3D model creation phase, as well as sending information for animating subsequent motion. The 3D model generator may generate motion information for signaling to the receiver (UE). This may enable less complex devices to perform the functionality of a source UE 405 (e.g. devices that cannot perform skinning/rigging). The 3D model generator may subsequently deliver 3D models to multiple receivers in a suitable format (e.g. motion signals to the receivers that support it (450), the 3D visual model (430) to those that only render the received 3D model but are not capable of performing animation based on motion signals, etc.).
[0080] It may be noted that, while the format of the motion signals may be generally known, how to create the motion signals for a rigged 3D model from multiple cameras depends on the specific implementation.
[0081] One way to create the motion signals (450) is to transmit the generated 3D model (430) back to the source UE (405), where the source UE (405) may compare the views of the input cameras (410) to the views rendered from the 3D model (430) using the same camera configuration. The rendered 2D views overlaid with the actual views may display a transformation that may be encoded as motion signals (450). [0082] An alternative for creating the motion signals (450) may be to perform real-time human segmentation on one or more input cameras (410). For example, the Azure KINECT® supports this out of the box. The segmentation may provide joint transformation in 2D space, which may be converted into a 3D signal when the camera configuration is known. If more than one camera performs the joint tracking, the 3D joint transformation may be combined between different views to achieve better 3D motion tracking. As an example, averaging of motion vectors per joint could be performed to improve the quality of the joint motion signal. The outcome (450) may be subsequently sent to the receiver (435). A sketch of this averaging follows.
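A minimal sketch of the per-joint averaging mentioned in paragraph [0082], assuming each camera has already produced a 3D motion vector per tracked joint in a common coordinate frame; a production system would additionally weight by tracking confidence. The function and joint names are invented for illustration.

import numpy as np

def fuse_joint_motion(per_camera_motion):
    # per_camera_motion: dict mapping joint ID -> list of 3D motion vectors,
    # one vector per camera that tracked the joint in this frame.
    fused = {}
    for joint_id, vectors in per_camera_motion.items():
        fused[joint_id] = np.mean(np.asarray(vectors, dtype=float), axis=0)
    return fused

# Two cameras disagree slightly about the left elbow; the fused signal (450)
# sent to the receiver is their mean.
fused = fuse_joint_motion({"elbow_l": [[0.02, 0.10, 0.0], [0.04, 0.08, 0.0]]})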
[0083] Referring now to FIGs. 5a-b, illustrated is an example call flow of an AR conversational call with avatars, which may be provided with UE1 (502) and UE2 (512). In a call setup phase A (514), UE1 (502) may send an invite for an AR call (516) with UE2 (512) to the application server (AS) (510) via the P/S/I-CSCF (504). At 518, the CSCF (504) may forward the invite to the AS (510), which may set up the data channel and MRF resources for the AR application. At 520, the AS (510) may select the proper data channel server (DCS) that will provide the AR application data channel resources (e.g. 506). At 522, the AS (510) may instruct the DCS (506) to set up the data channel resources for the AR application. At 524, the AS (510) may set up the MRF (508) for the AR call. At 526, the AS (510) may forward the invite to UE2 via the CSCF (504). At 528, the CSCF (504) may forward the invite to UE2 (512). At 530, the UE2 (512) may accept the invite, indicate the acceptance to the CSCF (504), and the session may start.
[0084] In a scene description/entry point retrieval phase B (532), the MRF (508) may prepare the scene description based on media descriptions and assets for the call (534). Some assets may be available on the DCS (506). At 536, the MRF (508) may send the scene description to the DCS (506). At 538, the DCS (506) may deliver the scene update to the UEs (e.g. UE1 (502) and UE2 (512)).
[0085] In a scene description update phase C (540), a UE may trigger a scene update (e.g. when a new object is added/removed in the scene, or a spatial information update is sent) (542). In the example of FIG. 5a, the update is triggered by UE1 (502), but it may be triggered by either UE (e.g. UE2 (512)).
At 544, the MRF (508) may process the new information and create a scene description update. At 546, the MRF (508) may send the scene description update to the DCS (506). At 548, the DCS may distribute the scene description update to all UEs (e.g. UE1 (502), UE2 (512)). The updates may relate to images used for generating the 3D model by the model generator.
[0086] In an AR media and metadata exchange phase D (550), UE1 (502) may provide source images (e.g. RTP video) from one or more cameras to the MRF (552). The initial SDP exchange between UE1 (502) and the MRF (508) may include SDP attributes for the 3D-model indicating that the UE1 (502) is the source and the MRF (508) is the 3D-model generator. At 554, the MRF
(508) may process the images to create a 3D model and skinning. At 556, the MRF (508) may deliver the 3D model to the UE2 (512). At 558, the UE1 (502) may provide source images (e.g. RTP video) to the MRF (508). These source images may be used for animation. It may be noted that the stream at 554 and at 558 may be the same video stream. At 560, the MRF (508) may create motion signals (e.g. transformation(s) to be applied to key points of the 3D model) based on the input from UE1 (502). At 562, the MRF (508) may deliver the motion signals to UE2 (512). The motion signals may describe movements of the 3D model. At 564, the UE2 (512) may apply the transformation(s) to animate the 3D model.
[0087] Referring now to FIG. 5b, illustrated is a case in which the UE1 (502) performs skinning rather than the MRF (508). The phases A (514), B (532), and C (540) may be the same as in FIG. 5a. In phase D of FIG. 5b, at 566 the UE1 (502) may provide source images from one or more cameras to the MRF (508). The initial SDP exchange between UE1 (502) and the MRF (508) may include SDP attributes for the 3D-model that may indicate that UE1 (502) is the source and the MRF (508) is the 3D-model generator. At 568, the MRF (508) may process the images to create the 3D-model. At 570, the MRF (508) may deliver the 3D-model to UE2 (512). At 572, the MRF (508) may deliver the 3D model to UE1 (502). During the SDP exchange, UE1 (502) may have included the appropriate parameter to receive the 3D model. At 574, the UE1 (502) may skin the received model and generate motion signals. At 576, the UE1 (502) may send the motion signals to UE2 (512) for animating the 3D model. The UE2 (512) may then animate the 3D model. [0088] It may be noted that some of the steps of FIGs. 5a-b may be performed concurrently with each other, or in a different order than is illustrated; the examples of FIGs. 5a-b are not limiting.
[0089] In an example embodiment, the motion signals may be transported as an RTP header extension (HE) consisting of a translation (float[3]), a rotation (float[4]), a scale vector, etc. The RTP HE may be associated with the RTP stream carrying the source data from UE1 to the MRF; or it may be carried using the data channel. If an RTP HE is used, the source may send dummy data when source data is paused or not required but motion signals still need to be sent. The rigged model may include in its representation of the 3D model a hierarchical joint structure such that each joint has at least a name or number attached to it for identification. The joint transformation (rotation, translation, scale, etc.) may then be sent along with the appropriate joint ID. In an example embodiment, the type of transformation matrices to be included in the signal may be negotiated in SDP. In another example embodiment, the type of transformation matrices may be identified with an ID, such as a 4-bit number, such that the IDs are known or advertised amongst participants in session signaling. In another example embodiment, the same level of information may be included for all joints, such that 0 values indicate no transformation. The full set of transformations may be transmitted at a set frequency to cater to detection of missing motion signals due to packet losses. A receiver may use interpolation between two motion signals to recover lost information or to compensate for low-frequency updates. In an example embodiment, an SDP group or group of groups is used to advertise the relationship between motion data, source data, and the 3D model.
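The exact byte layout of the header extension described in paragraph [0089] is not specified above, so the following pack/unpack sketch in Python is purely illustrative: one joint's signal as a 16-bit joint ID followed by a translation float[3], a rotation quaternion float[4], and a uniform scale, in network byte order.

import struct

MOTION_FMT = "!H3f4ff"  # uint16 joint ID + 3+4+1 floats, no padding

def pack_motion(joint_id, translation, rotation, scale=1.0):
    return struct.pack(MOTION_FMT, joint_id, *translation, *rotation, scale)

def unpack_motion(payload):
    joint_id, tx, ty, tz, qx, qy, qz, qw, s = struct.unpack(MOTION_FMT, payload)
    return joint_id, (tx, ty, tz), (qx, qy, qz, qw), s

# Zero translation and a unit quaternion signal "no transformation" for a
# joint, matching the 0-values convention described above.
blob = pack_motion(7, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0))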
[0090] In an example embodiment, a new SDP attribute to indicate 3D model generation capability may be defined as follows:

a=3d-model <source|process> <send|recv|sendrecv> <motiondata> <type> <updatemode> [period]
[0091] The attribute may be used at the session level to indicate the capability to provide the source or processing capability for generating a 3D model. An endpoint that provides source media for creating a 3D model may use the parameter "source" in the SDP message it sends. An endpoint that generates a 3D model may use the parameter "process" in the SDP message it sends.
[0092] The parameter motiondata is a flag. If the flag is set to 0, the 3D model generator will not generate the motion data, nor transmit it to the receiver; the source may still generate motion/transformation data and deliver it to the receiver for animating the 3D model. The parameter type indicates the type of 3D-model, e.g., humanoid. To ensure interoperability, some known types may be registered. In an embodiment, the SDP offer may include a list of types supported, and the answer may contain the specific one that should be used. [0093] The parameter <send | recv | sendrecv> may indicate the direction of the data flow as shown in TABLE 1:
[TABLE 1: direction of data flow for the <send|recv|sendrecv> parameter; the table is rendered as an image in the original publication]
[0094] The parameter updatemode may indicate how the 3D model updates will be handled, as shown in TABLE 2:
[TABLE 2: values of the updatemode parameter; the table is rendered as an image in the original publication]
[0095] The updates from the source may be triggered based on, for example, a user input, detection of a change in lighting conditions by the source, detection of a new camera, detection of a capture angle change, detection of a change in the type of model (e.g. instead of head and shoulders, the full body can be captured), etc. The update from the model generator may be triggered due to, for example, application level functions. The period parameter may be optional, and may be added when the updatemode value is set to 3. The period may be expressed as a unit of time, for example, milliseconds or microseconds. An illustrative parser for the 3d-model attribute is sketched below.
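A minimal parser for the attribute as defined in paragraph [0090] might look as follows. Note that the SDP examples later in this description use an abbreviated form of the attribute, so a real implementation would have to match the exact negotiated syntax; all names here are illustrative.

def parse_3d_model_attr(line):
    # Grammar per [0090]: a=3d-model <source|process> <send|recv|sendrecv>
    #                     <motiondata> <type> <updatemode> [period]
    tokens = line.removeprefix("a=3d-model ").split()
    attr = {
        "role": tokens[0],               # "source" or "process"
        "direction": tokens[1],          # send | recv | sendrecv
        "motiondata": tokens[2] == "1",  # 0: generator sends no motion data
        "type": tokens[3],               # e.g. "humanoid"
        "updatemode": int(tokens[4]),
    }
    if attr["updatemode"] == 3 and len(tokens) > 5:
        attr["period"] = int(tokens[5])  # unit of time, e.g. milliseconds
    return attr

attr = parse_3d_model_attr("a=3d-model process sendrecv 1 humanoid 3 3000")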
[0096] The following example shows an SDP offer from a 3D model generator that can support receiving two high efficiency video coding (HEVC) bitstreams, as indicated by the synchronization source (SSRC) multiplexed offered bitstreams, and process them into a 3D model. The parameters 'process' and 'sendrecv' indicate that the model generator may provide the 3D-model. The update may be periodic (the updatemode value is 3) with a period of 3000 ms. An RTCP feedback for pause/resume is used for updating.

m=video 40004 RTP/AVP 98
a=rtpmap:98 H265/90000
a=ssrc:1111 fmtp:98 [format specific parameters]
a=ssrc:2222 fmtp:98 [format specific parameters]
a=3d-model process sendrecv 3 3000
a=imageattr:98 recv [x=1280,y=720] [x=320,y=240]
a=rtcp-fb:* ccm pause nowait
[0097] Below is an example of an answer from a capture source that sends multiple video streams that may be used for 3D model generation. The parameters 'source' and 'sendrecv' indicate that the source also wants to receive the 3D model. The model update may be triggered by the model generator. An RTCP feedback for pause/resume may be used for pausing and resuming the media path to allow model updates.

m=video 40004 RTP/AVP 98
a=rtpmap:98 H265/90000
a=ssrc:1111 fmtp:98 [format specific parameters]
a=ssrc:2222 fmtp:98 [format specific parameters]
a=3d-model source sendrecv 1
a=imageattr:98 send [x=1280,y=720] [x=320,y=240]
a=rtcp-fb:* ccm pause nowait
[0098] Additional information that helps the 3D model generator perform 3D modelling may be included in the source answer. This may, for example, include information about the camera configuration used to capture the RGB or RGB-D bitstreams. This information may be signaled as part of SDP as attributes or parameters. Alternatively, the V3C format may be utilized to provide information about the capture configuration to the modeler, as it supports signaling of different camera models.
[0099] In an example embodiment, the send, recv, and sendrecv parameters may be used as a property of the parameters process and source of the attribute 3d-model. The order may indicate the origin of the session negotiation in SIP (offer->answer). In such a case, sendrecv may not be required, unless the sending and receiving entity are both "source" and "3d-model generator". TABLE 3 shows how the parameters of the 3d-model attribute may be used in this case where the UE is the source and the MRF is the 3D-model generator. In SDP, the MRF may list process:send/none before source:recv/none to indicate that it is a 3D model generator. The UE may list source:send/none before process:recv/none to indicate that it is a source. The syntax for this is shown below:

a=3d-model 2<"source"|"process" : "send"|"recv"|"none"> <motiondata> <type> [updatemode] [period]
[TABLE 3: parameters of the 3d-model attribute when the UE is the source and the MRF is the 3D-model generator; the table is rendered as an image in the original publication]

[00100] Below is an example offer from a 3d-modeler that may process input images into a 3D model and deliver the model. The offer may also indicate that it can receive source images.

m=video 40004 RTP/AVP 98
a=rtpmap:98 H265/90000
a=ssrc:1111 fmtp:98 [format specific parameters]
a=ssrc:2222 fmtp:98 [format specific parameters]
a=3d-model process:send source:recv 1 3000
a=imageattr:98 recv [x=1280,y=720] [x=320,y=240]
a=rtcp-fb:* ccm pause nowait
[00101] An example answer from the source is shown below. process:recv indicates that the processed 3d-model should be delivered to the source as well. source:send indicates that the source can send images for 3D model processing.

m=video 40004 RTP/AVP 98
a=rtpmap:98 H265/90000
a=ssrc:1111 fmtp:98 [format specific parameters]
a=ssrc:2222 fmtp:98 [format specific parameters]
a=3d-model source:send process:recv 3
a=imageattr:98 recv [x=1280,y=720] [x=320,y=240] send [x=1280,y=720] [x=320,y=240]
a=rtcp-fb:* ccm pause nowait
[00102] The generated 3D model may be delivered in another stream, for example over the data channel as glTF. In this case, another parameter 'ID' may be added to the 3d-model attribute. The ID may be used to map the delivery path of the 3d-model. If the 3D model is delivered over RTP, the ID may be set to the mid associated with the media description of that RTP stream or data channel.
[00103] In an example embodiment where 3d-model is used as a session-level attribute, the parameters may include a list of mid values associated with the media that may be used for creating the model, and also for delivering the model. In another example embodiment, an additional parameter may indicate one or both of the maximum and the minimum number of input media streams that the model generator or the source can support.
[00104] In an example embodiment, where the media description of the different input bitstreams is identical, the SDP offer may contain only one media description. The answerer may then add additional media descriptions such that the total number is equal to or more than the minimum, and equal to or less than the maximum.
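As a sketch of the rule in paragraph [00104], the answerer's stream count can be computed by clamping the offered count into the advertised [minimum, maximum] range; the helper name is invented for illustration.

def answer_stream_count(offered, minimum, maximum):
    # Add media descriptions up to the minimum, never exceed the maximum.
    return min(max(offered, minimum), maximum)

assert answer_stream_count(offered=1, minimum=2, maximum=4) == 2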
[00105] In an example embodiment, additional parameters about the model may also be signaled. For example, an endpoint may indicate that the model is of a head and shoulders type. In another example embodiment, metadata related to the rigging (e.g. joints and skinned weights) may be signaled as byte encoded data along with the model. In an example embodiment, an endpoint may indicate if rigging is required with the 3D modelling. In another example embodiment, an endpoint may indicate if rigging data is delivered with the 3D model. In an example embodiment, the requirement for rigging may be determined implicitly when the session signaling includes a media/data path for motion data.
[00106] SDP and RTP are provided as examples in the present disclosure; however, other protocols, e.g., SCTP, QUIC, HTTP and WebRTC datachannel, may be used for exchanging the defined signaling according to an example embodiment.
[00107] In an example embodiment, media from the source may be delivered over a protocol other than RTP (e.g. over a WebRTC data channel) . In an example embodiment, the 3D model may be delivered over a protocol other than RTP (e.g. over a WebRTC data channel) .
[00108] A technical effect of example embodiments of the present disclosure may be to enable generation of 3D models in real-time, which may have the technical effect of allowing users to share their appearance in the present instead of relying on pre-generated models.
[00109] A technical effect of example embodiments of the present disclosure may be to enable the images to be handled by the conversational service provider, and not stored on a third-party server. In an example embodiment, the source may choose to download and store the model itself.
[00110] FIG. 6 illustrates the potential steps of an example method 600. The example method 600 may include: obtaining at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object, 610; transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images, 620; obtaining the plurality of images with respect to, at least, the real-world object, 630; transmitting, to the network entity, the plurality of images, 640; obtaining, from the network entity, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model, 650; and transmitting at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description, 660. The example method 600 may be performed, for example, with a UE, for example a source UE, and/or a capture device. The network entity may be an edge entity, an MRF, an MCU, an end node, an endpoint, a network element, a server, etc.
[00111] FIG. 7 illustrates the potential steps of an example method 700. The example method 700 may include: transmitting, to a user equipment, at least one first description, wherein the at least one first description comprises an indication that the apparatus is capable of generating a three-dimensional model of a real-world object, 710; receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images, 720; receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object, 730; generating the three-dimensional model based, at least partially, on the plurality of images, 740; transmitting, to the user equipment, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model, 750; transmitting, to a further user equipment, the generated three-dimensional model, 760; receiving, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description, 770; and transmitting, to the further user equipment, the update for the three-dimensional model, 780. The example method 700 may be performed, for example, with a 3D model generator, an end node, a network entity, an edge entity, an MRF, an MCU, an endpoint, a network element, a server, etc.
[00112] FIG. 8 illustrates the potential steps of an example method 800. The example method 800 may include: receiving an invitation to join a call with a user equipment, 810; transmitting an acceptance of the invitation, 820; receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object, 830; and receiving an update for the three-dimensional model, 840. The example method 800 may be performed, for example, with a UE, for example a receiver UE.
[00113] In accordance with one example embodiment, an apparatus may comprise: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: obtain at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmit, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtain the plurality of images with respect to, at least, the real-world object; transmit, to the network entity, the plurality of images; obtain, from the network entity, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and transmit at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[00114] The at least one transmitted indication may comprise an indication of at least one of: a capability of the apparatus to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the apparatus, a maximum number of a second type of bitstream supported by the apparatus, or a process for updating the three-dimensional model.
[00115] The at least one transmitted indication may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the apparatus, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
[00116] The example apparatus may be further configured to: trigger update of the three-dimensional model in response to at least one of: a user input, a determination that a new camera has become available to the apparatus, a determination that an amount of the real-world object that can be captured has increased, a determination that an angle of capture of the real-world object has changed, or a determination that a lighting condition of the real-world object has changed, wherein the at least one transmitted indication may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the apparatus.
[00117] The network entity may comprise one of: a media resource function, or a multipoint control unit.
[00118] The at least one first description may further comprise an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the network entity, a maximum number of a second type of bitstream supported by the network entity, or a process for updating the three-dimensional model. [00119] The at least one first description may comprise the indication of the at least one required media property, wherein the at least one required media property may comprise at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
[00120] The at least one first description may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the apparatus, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
[00121] The example apparatus may be further configured to: transmit the at least one of: the three-dimensional model, or the update for the three-dimensional model in response to a trigger received from the network entity, wherein the at least one first description may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the network entity.
[00122] The at least one first description may comprise at least one session description protocol offer.
[00123] Transmitting the at least one indication may comprise the example apparatus being configured to: transmit, to the network entity, a session description protocol answer, wherein the session description protocol answer may comprise, at least, the at least one indication.
[00124] The example apparatus may comprise a source user equipment associated with a plurality of cameras configured to capture the plurality of images, wherein the source user equipment may be at least partially different from a receiver user equipment.
[00125] The at least one second description may comprise at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
[00126] The example apparatus may be further configured to: compare information of the at least one second description with one or more camera views of the real-world object; and determine the update for the real-world object based, at least partially, on the comparison.
[00127] The example apparatus may be further configured to: transmit, to the network entity, configuration information associated with the plurality of images.
[00128] The configuration information may comprise at least one of: a camera configuration, or encoding information.
[00129] The example apparatus may be further configured to: determine one or more joints of the real-world object; generate one or more motion signals for the one or more joints; and transmit, to a user equipment, the one or more motion signals. [00130] The example apparatus may be further configured to: perform a rigging operation or a skinning operation based, at least partially, on the plurality of images.
[00131] The example apparatus may be further configured to: perform real-time segmentation of the real-world object; determine one or more motion signals based, at least partially, on the real-time segmentation of the real-world object; and transmit, to a user equipment, the one or more motion signals.
[00132] The example apparatus may be further configured to: transmit, to a user equipment, an invitation to join an augmented reality call; and transmit the plurality of images as part of the augmented reality call.
[00133] In accordance with one aspect, an example method may be provided comprising: obtaining, with a user equipment, at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtaining the plurality of images with respect to, at least, the real-world object; transmitting, to the network entity, the plurality of images; obtaining, from the network entity, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and transmitting at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description. [00134] The at least one transmitted indication may comprise an indication of at least one of: a capability of the user equipment to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the user equipment, a maximum number of a second type of bitstream supported by the user equipment, or a process for updating the three-dimensional model.
[00135] The at least one transmitted indication may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
[00136] The example method may further comprise: triggering update of the three-dimensional model in response to at least one of: a user input, a determination that a new camera has become available to the user equipment, a determination that an amount of the real-world object that can be captured has increased, a determination that an angle of capture of the real-world object has changed, or a determination that a lighting condition of the real-world object has changed, wherein the at least one transmitted indication may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the user equipment.
[00137] The network entity may comprise one of: a media resource function, or a multipoint control unit.
[00138] The at least one first description may further comprise an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the network entity, a maximum number of a second type of bitstream supported by the network entity, or a process for updating the three-dimensional model.
[00139] The at least one first description may comprise an indication of the at least one required media property, wherein the at least one required media property may comprise at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
[00140] The at least one first description may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period. [00141] The example method may further comprise: transmitting the at least one of: the three-dimensional model, or the update for the three-dimensional model in response to a trigger received from the network entity, wherein the at least one first description may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the network entity.
[00142] The at least one first description may comprise at least one session description protocol offer.
[00143] The transmitting of the at least one indication may comprise: transmitting, to the network entity, a session description protocol answer, wherein the session description protocol answer may comprise, at least, the at least one indication.
[00144] The user equipment may comprise a source user equipment associated with a plurality of cameras configured to capture the plurality of images, wherein the source user equipment may be at least partially different from a receiver user equipment.
[00145] The at least one second description may comprise at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
[00146] The example method may further comprise: comparing information of the at least one second description with one or more camera views of the real-world object; and determining of the update for the real-world object based, at least partially, on the comparison.
[00147] The example method may further comprise: transmitting, to the network entity, configuration information associated with the plurality of images.
[00148] The configuration information may comprise at least one of: a camera configuration, or encoding information.
[00149] The example method may further comprise: determining one or more joints of the real-world object; generating one or more motion signals for the one or more joints; and transmitting, to a user equipment, the one or more motion signals .
[00150] The example method may further comprise: performing a rigging operation or a skinning operation based, at least partially, on the plurality of images.
[00151] The example method may further comprise: performing real-time segmentation of the real-world object; determining one or more motion signals based, at least partially, on the real-time segmentation of the real-world object; and transmitting, to a user equipment, the one or more motion signals .
[00152] The example method may further comprise: transmitting, to a user equipment, an invitation to join an augmented reality call ; and transmitting the plurality of images as part of the augmented reality call .
[ 00153 ] In accordance with one example embodiment , an apparatus may comprise : circuitry configured to perform : obtaining, with a user equipment , at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world obj ect ; circuitry configured to perform : transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world obj ect based, at least partially, on a plurality of images ; circuitry configured to perform : obtaining the plurality of images with respect to, at least , the real-world obj ect ; circuitry configured to perform : transmitting, to the network entity, the plurality o f images ; circuitry configured to perform : obtaining, from the network entity, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model ; and circuitry configured to perform : transmitting at least one of : the three- dimensional model , or an update for the three-dimensional model based, at least partially, on the at least one second description .
[00154] In accordance with one example embodiment, an apparatus may comprise: processing circuitry; memory circuitry including computer program code, the memory circuitry and the computer program code configured to, with the processing circuitry, enable the apparatus to: obtain at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmit, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtain the plurality of images with respect to, at least, the real-world object; transmit, to the network entity, the plurality of images; obtain, from the network entity, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and transmit at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[00155] As used in this application, the term "circuitry" may refer to one or more or all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry); (b) combinations of hardware circuits and software, such as (as applicable): (i) a combination of analog and/or digital hardware circuit(s) with software/firmware and (ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions; and (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g. firmware) for operation, but the software may not be present when it is not needed for operation. This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or a portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.
[00156] In accordance with one example embodiment, an apparatus may comprise means for: obtaining at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtaining the plurality of images with respect to, at least, the real-world object; transmitting, to the network entity, the plurality of images; obtaining, from the network entity, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and transmitting at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[00157] The at least one transmitted indication may comprise an indication of at least one of: a capability of the apparatus to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the apparatus, a maximum number of a second type of bitstream supported by the apparatus, or a process for updating the three-dimensional model.
[00158] The at least one transmitted indication may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the apparatus, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
[00159] The means may be further configured for: triggering update of the three-dimensional model in response to at least one of: a user input, a determination that a new camera has become available to the apparatus, a determination that an amount of the real-world object that can be captured has increased, a determination that an angle of capture of the real-world object has changed, or a determination that a lighting condition of the real-world object has changed, wherein the at least one transmitted indication may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the apparatus.
[00160] The network entity may comprise one of: a media resource function, or a multipoint control unit.
[00161] The at least one first description may further comprise an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the network entity, a maximum number of a second type of bitstream supported by the network entity, or a process for updating the three-dimensional model.
[00162] The at least one first description may comprise an indication of the at least one required media property, wherein the at least one required media property may comprise at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
[00163] The at least one first description may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the apparatus, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period. [00164] The means may be further configured for: transmitting the at least one of: the three-dimensional model, or the update for the three-dimensional model in response to a trigger received from the network entity, wherein the at least one first description may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the network entity.
[00165] The at least one first description may comprise at least one session description protocol offer.
[00166] The means configured for transmitting the at least one indication may comprise means configured for: transmitting, to the network entity, a session description protocol answer, wherein the session description protocol answer may comprise, at least, the at least one indication.
[00167] The example apparatus may comprise a source user equipment associated with a plurality of cameras configured to capture the plurality of images, wherein the source user equipment may be at least partially different from a receiver user equipment.
[00168] The at least one second description may comprise at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
[00169] The means may be further configured for: comparing information of the at least one second description with one or more camera views of the real-world object; and determining of the update for the real-world object based, at least partially, on the comparison.
[00170] The means may be further configured for: transmitting, to the network entity, configuration information associated with the plurality of images.
[00171] The configuration information may comprise at least one of: a camera configuration, or encoding information.
[00172] The means may be further configured for: determining one or more joints of the real-world object; generating one or more motion signals for the one or more joints; and transmitting, to a user equipment, the one or more motion signals .
[00173] The means may be further configured for: performing a rigging operation or a skinning operation based, at least partially, on the plurality of images.
[00174] The means may be further configured for: performing real-time segmentation of the real-world object; determining one or more motion signals based, at least partially, on the real-time segmentation of the real-world object; and transmitting, to a user equipment, the one or more motion signals.
[00175] The means may be further configured for: transmitting, to a user equipment, an invitation to join an augmented reality call; and transmitting the plurality of images as part of the augmented reality call. [00176] A processor, memory, and/or example algorithms (which may be encoded as instructions, program, or code) may be provided as example means for providing or causing performance of an operation.
[00177] In accordance with one example embodiment, a non-transitory computer-readable medium comprising instructions stored thereon which, when executed with at least one processor, cause the at least one processor to: cause obtaining of at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; cause transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; cause obtaining of the plurality of images with respect to, at least, the real-world object; cause transmitting, to the network entity, of the plurality of images; cause obtaining, from the network entity, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and cause transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[00178] In accordance with one example embodiment, a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing obtaining of at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing obtaining of the plurality of images with respect to, at least, the real-world object; causing transmitting, to the network entity, of the plurality of images; causing obtaining, from the network entity, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and causing transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[00179] In accordance with another example embodiment, a non-transitory program storage device readable by a machine may be provided, tangibly embodying instructions executable by the machine for performing operations, the operations comprising: causing obtaining of at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing obtaining of the plurality of images with respect to, at least, the real-world object; causing transmitting, to the network entity, of the plurality of images; causing obtaining, from the network entity, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and causing transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[00180] In accordance with another example embodiment, a non-transitory computer-readable medium comprising instructions that, when executed by an apparatus, cause the apparatus to perform at least the following: causing obtaining of at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing obtaining of the plurality of images with respect to, at least, the real-world object; causing transmitting, to the network entity, of the plurality of images; causing obtaining, from the network entity, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and causing transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[00181] A computer implemented system comprising: at least one processor and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: causing obtaining of at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing obtaining of the plurality of images with respect to, at least, the real-world object; causing transmitting, to the network entity, of the plurality of images; causing obtaining, from the network entity, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and causing transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[00182] A computer implemented system comprising: means for causing obtaining of at least one first description, wherein the at least one first description may comprise an indication that a network entity is capable of generating a three-dimensional model of a real-world object; means for causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; means for causing obtaining of the plurality of images with respect to, at least, the real-world object; means for causing transmitting, to the network entity, of the plurality of images; means for causing obtaining, from the network entity, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; and means for causing transmitting of at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
[00183] In accordance with one example embodiment, an apparatus may comprise: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive an invitation to join a call with a user equipment; transmit an acceptance of the invitation; receive, from a network entity, a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and receive an update for the three-dimensional model.
[00184] The example apparatus may be further configured to: receive at least one motion signal for the three-dimensional model; and transform the three-dimensional model based, at least partially, on the at least one motion signal.
[00185] The call may comprise an augmented reality call.
[00186] In accordance with one aspect, an example method may be provided comprising: receiving, with a receiver user equipment, an invitation to join a call with a further user equipment; transmitting an acceptance of the invitation; receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and receiving an update for the three-dimensional model.
[00187] The example method may further comprise: receiving at least one motion signal for the three-dimensional model; and transforming the three-dimensional model based, at least partially, on the at least one motion signal.
[00188] The call may comprise an augmented reality call.
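By way of illustration only, the following Python sketch shows one way a receiver user equipment might apply a received motion signal to transform the three-dimensional model, as in the embodiments above; the motion-signal format (a rotation about one axis plus a translation) and all names are assumptions made for this example, not part of the described signalling.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def apply_motion_signal(vertices: List[Vec3],
                        rotation_z_rad: float,
                        translation: Vec3) -> List[Vec3]:
    """Transform model vertices using a simplified motion signal:
    a rotation about the z-axis followed by a translation."""
    c, s = math.cos(rotation_z_rad), math.sin(rotation_z_rad)
    tx, ty, tz = translation
    transformed = []
    for x, y, z in vertices:
        rx, ry = c * x - s * y, s * x + c * y        # rotate about z
        transformed.append((rx + tx, ry + ty, z + tz))  # then translate
    return transformed

# Example: rotate one vertex by 90 degrees, then shift it along x.
print(apply_motion_signal([(1.0, 0.0, 0.0)], math.pi / 2, (0.5, 0.0, 0.0)))
```

In practice a motion signal could instead address individual joints of a skeleton (see the second-description contents discussed below); this sketch only illustrates the transform step.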
[00189] In accordance with one example embodiment, an apparatus may comprise: circuitry configured to perform: receiving, with a receiver user equipment, an invitation to join a call with a further user equipment; circuitry configured to perform: transmitting an acceptance of the invitation; circuitry configured to perform: receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and circuitry configured to perform: receiving an update for the three-dimensional model.
[00190] In accordance with one example embodiment, an apparatus may comprise: processing circuitry; memory circuitry including computer program code, the memory circuitry and the computer program code configured to, with the processing circuitry, enable the apparatus to: receive an invitation to join a call with a further user equipment; transmit an acceptance of the invitation; receive, from a network entity, a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and receive an update for the three-dimensional model.
[00191] In accordance with one example embodiment, an apparatus may comprise means for: receiving an invitation to join a call with a further user equipment; transmitting an acceptance of the invitation; receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and receiving an update for the three-dimensional model.
[00192] The means may be further configured for: receiving at least one motion signal for the three-dimensional model; and transforming the three-dimensional model based, at least partially, on the at least one motion signal.
[00193] The call may comprise an augmented reality call.
[00194] In accordance with one example embodiment, a non-transitory computer-readable medium comprising instructions stored thereon which, when executed with at least one processor, cause the at least one processor to: cause receiving of an invitation to join a call with a further user equipment; cause transmitting of an acceptance of the invitation; cause receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and cause receiving of an update for the three-dimensional model.
[00195] In accordance with one example embodiment, a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing receiving of an invitation to join a call with a further user equipment; causing transmitting of an acceptance of the invitation; causing receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and causing receiving of an update for the three-dimensional model.
[00196] In accordance with another example embodiment, a non-transitory program storage device readable by a machine may be provided, tangibly embodying instructions executable by the machine for performing operations, the operations comprising: causing receiving of an invitation to join a call with a further user equipment; causing transmitting of an acceptance of the invitation; causing receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and causing receiving of an update for the three-dimensional model.
[00197] In accordance with another example embodiment, a non-transitory computer-readable medium comprising instructions that, when executed by an apparatus, cause the apparatus to perform at least the following: causing receiving of an invitation to join a call with a further user equipment; causing transmitting of an acceptance of the invitation; causing receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and causing receiving of an update for the three-dimensional model.
[00198] A computer implemented system comprising: at least one processor and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: causing receiving of an invitation to join a call with a further user equipment; causing transmitting of an acceptance of the invitation; causing receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and causing receiving of an update for the three-dimensional model.
[00199] A computer implemented system comprising: means for causing receiving of an invitation to join a call with a further user equipment; means for causing transmitting of an acceptance of the invitation; means for causing receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity may be capable of generating the three-dimensional model of the real-world object; and means for causing receiving of an update for the three-dimensional model.
[00200] In accordance with one example embodiment, an apparatus may comprise: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: transmit, to a user equipment, at least one first description, wherein the at least one first description may comprise an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; receive, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receive, from the user equipment, the plurality of images with respect to, at least, the real-world object; generate the three-dimensional model based, at least partially, on the plurality of images; transmit, to the user equipment, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; transmit, to a further user equipment, the generated three-dimensional model; receive, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and transmit, to the further user equipment, the update for the three-dimensional model.
[00201] The at least one first description may comprise an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the apparatus, a maximum number of a second type of bitstream supported by the apparatus, or a process for updating the three-dimensional model.
[00202] The at least one first description may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the apparatus, triggering update of the three-dimensional model with the user equipment, or periodic update of the three-dimensional model with an indicated period.
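Purely as a sketch, the following Python fragment shows how a sender might act on whichever of the three update processes above is indicated in the first description. All names here are invented for this illustration; the callables `send_update`, `trigger_received`, and `local_change_detected` are assumed to be supplied by the application.

```python
import time
from enum import Enum

class UpdateProcess(Enum):
    """The three update processes described above (names are illustrative)."""
    APPARATUS_TRIGGERED = "apparatus"  # network-side apparatus triggers the update
    UE_TRIGGERED = "ue"                # the user equipment itself triggers the update
    PERIODIC = "periodic"              # updates are sent with an indicated period

def run_update_loop(process, send_update, trigger_received=None,
                    local_change_detected=None, period_s=1.0):
    """Drive three-dimensional model updates per the negotiated process.

    send_update: callable that transmits an update for the 3D model.
    trigger_received: callable, True when a trigger from the peer arrives.
    local_change_detected: callable, True when e.g. a new camera, a changed
    capture angle, or a lighting change makes an update worthwhile.
    Loops until the session ends (simplified here to loop forever).
    """
    while True:
        if process is UpdateProcess.PERIODIC:
            time.sleep(period_s)            # wait for the indicated period
            send_update()
        elif process is UpdateProcess.APPARATUS_TRIGGERED:
            if trigger_received():          # remote trigger from the apparatus
                send_update()
        elif process is UpdateProcess.UE_TRIGGERED:
            if local_change_detected():     # local decision at the UE
                send_update()
```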
[00203] The example apparatus may be further configured to: transmit, to the user equipment, a trigger to transmit the update of the three-dimensional model, wherein the trigger may be transmitted in response to at least one application level function, wherein the at least one first description may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the apparatus.
[00204] The at least one first description may comprise an indication of the at least one required media property, wherein the at least one required media property may comprise at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
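As a minimal, non-normative illustration, the required media properties listed above could be advertised as simple data; the structure and field names below are assumptions made for this sketch (V3C here abbreviates the visual volumetric video-based coding format).

```python
# Hypothetical representation of the required media properties; the field
# names and values are assumptions made for this illustration only.
REQUIRED_MEDIA_PROPERTIES = {
    "accepted_formats": ["RGB", "RGB-D", "V3C"],  # encoded RGB, encoded RGB-D,
                                                  # or visual volumetric coding
    "min_streams_type1": 1,   # minimum number of a first type of bitstream
    "max_streams_type2": 4,   # maximum number of a second type of bitstream
}

def format_supported(candidate: str) -> bool:
    """True when a candidate bitstream format satisfies the requirement."""
    return candidate in REQUIRED_MEDIA_PROPERTIES["accepted_formats"]

assert format_supported("RGB-D")
```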
[00205] The example apparatus may comprise one of: an entity configured to perform a media resource function, or a multipoint control unit.
[00206] The at least one received indication may comprise an indication of at least one of: a capability of the user equipment to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the user equipment, a maximum number of a second type of bitstream supported by the user equipment, or a process for updating the three-dimensional model.
[00207] The at least one received indication may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the apparatus, or periodic update of the three-dimensional model with an indicated period.
[00208] Receiving the at least one indication may comprise the example apparatus being configured to: receive, from the user equipment, a session description protocol answer, wherein the session description protocol answer may comprise, at least, the at least one indication.
[00209] The at least one first description may comprise at least one session description protocol offer.
[00210] The at least one second description may comprise at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
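For concreteness, here is a hedged Python sketch of a container for the second description's alternatives (the model itself, one or more joints, or an associated skeleton); every field name in it is an assumption, not part of the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Joint:
    """One joint of the three-dimensional model (fields are illustrative)."""
    name: str
    position: Tuple[float, float, float]
    parent: Optional[str] = None   # parent joint name; None for the root

@dataclass
class SecondDescription:
    """Possible contents of the second description: the model itself,
    one or more joints, and/or a skeleton (here, parent-child bone pairs)."""
    model_bytes: Optional[bytes] = None                 # e.g. an encoded mesh
    joints: List[Joint] = field(default_factory=list)
    skeleton: List[Tuple[str, str]] = field(default_factory=list)

# Example: a two-joint arm fragment with a single bone.
desc = SecondDescription(
    joints=[Joint("shoulder", (0.0, 1.4, 0.0)),
            Joint("elbow", (0.0, 1.1, 0.0), parent="shoulder")],
    skeleton=[("shoulder", "elbow")],
)
```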
[00211] The example apparatus may be further configured to: receive, from the user equipment, configuration information associated with the plurality of images.
[00212] The configuration information may comprise at least one of: a camera configuration, or encoding information.
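The configuration information above might, for example, be serialized as JSON before transmission; this sketch is illustrative only, and all field names and values are assumptions.

```python
import json

# Hypothetical configuration information covering both alternatives named
# above: a camera configuration and encoding information.
configuration_information = {
    "camera_configuration": {
        "num_cameras": 2,
        "intrinsics": [{"fx": 1000.0, "fy": 1000.0, "cx": 640.0, "cy": 360.0}],
        "poses": [{"rotation_deg": [0, 0, 0], "translation_m": [0.0, 0.0, 0.0]}],
    },
    "encoding_information": {
        "codec": "HEVC",
        "resolution": [1280, 720],
        "framerate_fps": 30,
    },
}

payload = json.dumps(configuration_information)  # ready to transmit
```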
[00213] The example apparatus may be further configured to: receive, from the user equipment, one or more motion signals for the three-dimensional model; and transmit, to the further user equipment, the one or more motion signals.
[00214] The example apparatus may be further configured to: transmit, to the further user equipment, an indication of a delivery path of the three-dimensional model.
[00215] The example apparatus may be further configured to: generate one or more motion signals based, at least partially, on the three-dimensional model.
[00216] The example apparatus may be further configured to: transmit, to the further user equipment, the one or more motion signals.
[00217] The example apparatus may be further configured to: transmit, to the user equipment, the one or more motion signals.
[00218] In accordance with one aspect, an example method may be provided comprising: transmitting, with a network entity to a user equipment, at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; transmitting, to the user equipment, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; transmitting, to a further user equipment, the generated three-dimensional model; receiving, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and transmitting, to the further user equipment, the update for the three-dimensional model.
[00219] The at least one first description may comprise an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the network entity, a maximum number of a second type of bitstream supported by the network entity, or a process for updating the three-dimensional model.
[00220] The at least one first description may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the network entity, triggering update of the three-dimensional model with the user equipment, or periodic update of the three-dimensional model with an indicated period.
[00221] The example method may further comprise: transmitting, to the user equipment, a trigger to transmit the update of the three-dimensional model, wherein the trigger may be transmitted in response to at least one application level function, wherein the at least one first description may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the network entity.
[00222] The at least one first description may comprise an indication of the at least one required media property, wherein the at least one required media property may comprise at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
[00223] The network entity may comprise one of: an entity configured to perform a media resource function, or a multipoint control unit.
[00224] The at least one received indication may comprise an indication of at least one of: a capability of the user equipment to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the user equipment, a maximum number of a second type of bitstream supported by the user equipment, or a process for updating the three-dimensional model.
[00225] The at least one received indication may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
[00226] The receiving of the at least one indication may comprise: receiving, from the user equipment, a session description protocol answer, wherein the session description protocol answer may comprise, at least, the at least one indication.
[00227] The at least one first description may comprise at least one session description protocol offer.
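To make the offer/answer exchange concrete, below is a hedged sketch of what such a session description protocol exchange could look like. The `a=3dmodelgen` attribute and its parameters are invented for this illustration; they are not defined by SDP (RFC 8866) or by this description.

```python
# Hypothetical SDP offer from the network entity, advertising the
# (invented) 3dmodelgen capability, and a matching answer from the UE.
SDP_OFFER = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=AR call with 3D model generation
m=video 49170 RTP/AVP 96
a=rtpmap:96 H265/90000
a=3dmodelgen:capable update=network,ue,periodic max-rgbd-streams=4
"""

SDP_ANSWER = """v=0
o=- 0 0 IN IP4 192.0.2.2
s=AR call with 3D model generation
m=video 51372 RTP/AVP 96
a=rtpmap:96 H265/90000
a=3dmodelgen:source update=ue num-rgb-streams=2 num-rgbd-streams=1
"""

def has_3dmodelgen(sdp: str) -> bool:
    """True when a description carries the hypothetical capability attribute."""
    return any(line.startswith("a=3dmodelgen") for line in sdp.splitlines())

assert has_3dmodelgen(SDP_OFFER) and has_3dmodelgen(SDP_ANSWER)
```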
[00228] The at least one second description may comprise at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
[00229] The example method may further comprise: receiving, from the user equipment, configuration information associated with the plurality of images.
[00230] The configuration information may comprise at least one of: a camera configuration, or encoding information.
[00231] The example method may further comprise: receiving, from the user equipment, one or more motion signals for the three-dimensional model; and transmitting, to the further user equipment, the one or more motion signals.
[00232] The example method may further comprise: transmitting, to the further user equipment, an indication of a delivery path of the three-dimensional model.
[00233] The example method may further comprise: generating one or more motion signals based, at least partially, on the three-dimensional model.
[00234] The example method may further comprise: transmitting, to the further user equipment, the one or more motion signals.
[00235] The example method may further comprise: transmitting, to the user equipment, the one or more motion signals.
[00236] In accordance with one example embodiment, an apparatus may comprise: circuitry configured to perform: transmitting, to a user equipment, at least one first description, wherein the at least one first description may comprise an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; circuitry configured to perform: receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; circuitry configured to perform: receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object; circuitry configured to perform: generating the three-dimensional model based, at least partially, on the plurality of images; circuitry configured to perform: transmitting, to the user equipment, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; circuitry configured to perform: transmitting, to a further user equipment, the generated three-dimensional model; circuitry configured to perform: receiving, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and circuitry configured to perform: transmitting, to the further user equipment, the update for the three-dimensional model.
[00237] In accordance with one example embodiment, an apparatus may comprise: processing circuitry; memory circuitry including computer program code, the memory circuitry and the computer program code configured to, with the processing circuitry, enable the apparatus to: transmit, to a user equipment, at least one first description, wherein the at least one first description may comprise an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; receive, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receive, from the user equipment, the plurality of images with respect to, at least, the real-world object; generate the three-dimensional model based, at least partially, on the plurality of images; transmit, to the user equipment, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; transmit, to a further user equipment, the generated three-dimensional model; receive, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and transmit, to the further user equipment, the update for the three-dimensional model.
[00238] In accordance with one example embodiment, an apparatus may comprise means for: transmitting, to a user equipment, at least one first description, wherein the at least one first description may comprise an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; transmitting, to the user equipment, at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; transmitting, to a further user equipment, the generated three-dimensional model; receiving, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and transmitting, to the further user equipment, the update for the three-dimensional model.
[00239] The at least one first description may comprise an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the apparatus, a maximum number of a second type of bitstream supported by the apparatus, or a process for updating the three-dimensional model.
[00240] The at least one first description may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the apparatus, or periodic update of the three-dimensional model with an indicated period.
[00241] The means may be further configured for: transmitting, to the user equipment, a trigger to transmit the update of the three-dimensional model, wherein the trigger may be transmitted in response to at least one application level function, wherein the at least one first description may comprise the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the apparatus.
[00242] The at least one first description may comprise an indication of the at least one required media property, wherein the at least one required media property may comprise at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
[00243] The example apparatus may comprise one of: an entity configured to perform a media resource function, or a multipoint control unit.
[00244] The at least one received indication may comprise an indication of at least one of: a capability of the user equipment to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the user equipment, a maximum number of a second type of bitstream supported by the user equipment, or a process for updating the three-dimensional model.
[00245] The at least one received indication may comprise the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model may comprise at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the apparatus, or periodic update of the three-dimensional model with an indicated period.
[00246] The means configured for receiving the at least one indication may comprise means configured for: receiving, from the user equipment, a session description protocol answer, wherein the session description protocol answer may comprise, at least, the at least one indication.
[00247] The at least one first description may comprise at least one session description protocol offer.
[00248] The at least one second description may comprise at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
[00249] The means may be further configured for: receiving, from the user equipment, configuration information associated with the plurality of images.
[00250] The configuration information may comprise at least one of: a camera configuration, or encoding information.
[00251] The means may be further configured for: receiving, from the user equipment, one or more motion signals for the three-dimensional model; and transmitting, to the further user equipment, the one or more motion signals.
[00252] The means may be further configured for: transmitting, to the further user equipment, an indication of a delivery path of the three-dimensional model.
[00253] The means may be further configured for: generating one or more motion signals based, at least partially, on the three-dimensional model.
[00254] The means may be further configured for: transmitting, to the further user equipment, the one or more motion signals.
[00255] The means may be further configured for: transmitting, to the user equipment, the one or more motion signals.
[00256] In accordance with one example embodiment, a non-transitory computer-readable medium comprising instructions stored thereon which, when executed with at least one processor, cause the at least one processor to: cause transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; cause receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; cause receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generate the three-dimensional model based, at least partially, on the plurality of images; cause transmitting, to the user equipment, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; cause transmitting, to a further user equipment, of the generated three-dimensional model; cause receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and cause transmitting, to the further user equipment, of the update for the three-dimensional model.
[00257] In accordance with one example embodiment, a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; causing transmitting, to a further user equipment, of the generated three-dimensional model; causing receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and causing transmitting, to the further user equipment, of the update for the three-dimensional model.
[00258] In accordance with another example embodiment, a non-transitory program storage device readable by a machine may be provided, tangibly embodying instructions executable by the machine for performing operations, the operations comprising: causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; causing transmitting, to a further user equipment, of the generated three-dimensional model; causing receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and causing transmitting, to the further user equipment, of the update for the three-dimensional model.
[00259] In accordance with another example embodiment, a non-transitory computer-readable medium comprising instructions that, when executed by an apparatus, cause the apparatus to perform at least the following: causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; causing transmitting, to a further user equipment, of the generated three-dimensional model; causing receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and causing transmitting, to the further user equipment, of the update for the three-dimensional model.
[00260] A computer implemented system comprising: at least one processor and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; causing transmitting, to a further user equipment, of the generated three-dimensional model; causing receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and causing transmitting, to the further user equipment, of the update for the three-dimensional model.
[00261] A computer implemented system comprising: means for causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description may comprise an indication that the network entity is capable of generating a three-dimensional model of a real-world object; means for causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; means for causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; means for generating the three-dimensional model based, at least partially, on the plurality of images; means for causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description may be configured to indicate at least part of the three-dimensional model; means for causing transmitting, to a further user equipment, of the generated three-dimensional model; means for causing receiving, from the user equipment, of an update for the three-dimensional model based, at least partially, on the at least one second description; and means for causing transmitting, to the further user equipment, of the update for the three-dimensional model.
[00262] The term "non-transitory," as used herein, is a limitation of the medium itself (i.e. tangible, not a signal) as opposed to a limitation on data storage persistency (e.g., RAM vs. ROM).
[00263] It should be understood that the foregoing description is only illustrative. Various alternatives and modifications can be devised by those skilled in the art. For example, features recited in the various dependent claims could be combined with each other in any suitable combination(s). In addition, features from different embodiments described above could be selectively combined into a new embodiment. Accordingly, the description is intended to embrace all such alternatives, modifications, and variances which fall within the scope of the appended claims.

Claims

What is claimed is:
1. An apparatus comprising means for: obtaining at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtaining the plurality of images with respect to, at least, the real-world object; transmitting, to the network entity, the plurality of images; and obtaining, from the network entity, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model.
2. The apparatus of claim 1, wherein the at least one transmitted indication comprises an indication of at least one of: a capability of the apparatus to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the apparatus, a maximum number of a second type of bitstream supported by the apparatus, or a process for updating the three-dimensional model.
3. The apparatus of claim 2, wherein the at least one transmitted indication comprises the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model comprises at least one of: triggering update of the three-dimensional model with the apparatus, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
4. The apparatus of claim 3, wherein the means are further configured for: triggering update of the three-dimensional model in response to at least one of: a user input, a determination that a new camera has become available to the apparatus, a determination that an amount of the real-world object that can be captured has increased, a determination that an angle of capture of the real-world object has changed, or a determination that a lighting condition of the real-world object has changed, wherein the at least one transmitted indication comprises the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the apparatus.
5. The apparatus of any of claims 1 through 4, wherein the network entity comprises one of: a media resource function, or a multipoint control unit.
6. The apparatus of any of claims 1 through 5, wherein the at least one first description further comprises an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the network entity, a maximum number of a second type of bitstream supported by the network entity, or a process for updating the three-dimensional model.
7. The apparatus of claim 6, wherein the at least one first description comprises the indication of the at least one required media property, wherein the at least one required media property comprises at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
8. The apparatus of claim 6, wherein the at least one first description comprises the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model comprises at least one of: triggering update of the three-dimensional model with the apparatus, triggering update of the three-dimensional model with the network entity, or periodic update of the three-dimensional model with an indicated period.
9. The apparatus of claim 8, wherein the means are further configured for: transmitting the at least one of: the three-dimensional model, or the update for the three-dimensional model in response to a trigger received from the network entity, wherein the at least one first description comprises the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the network entity.
10. The apparatus of any of claims 1 through 9, wherein the at least one first description comprises at least one session description protocol offer.
11. The apparatus of any of claims 1 through 10, wherein the means configured for transmitting the at least one indication comprises means configured for: transmitting, to the network entity, a session description protocol answer, wherein the session description protocol answer comprises, at least, the at least one indication.
12. The apparatus of any of claims 1 through 11, wherein the apparatus comprises a source user equipment associated with a plurality of cameras configured to capture the plurality of images, wherein the source user equipment is at least partially different from a receiver user equipment.
13. The apparatus of any of claims 1 through 12, wherein the at least one second description comprises at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
14. The apparatus of any of claims 1 through 13, wherein the means are further configured for: comparing information of the at least one second description with one or more camera views of the real-world object; and determining the update for the real-world object based, at least partially, on the comparison.
15. The apparatus of any of claims 1 through 14, wherein the means are further configured for: transmitting, to the network entity, configuration information associated with the plurality of images.
16. The apparatus of claim 15, wherein the configuration information comprises at least one of: a camera configuration, or encoding information.
17. The apparatus of any of claims 1 through 16, wherein the means are further configured for: determining one or more joints of the real-world object; generating one or more motion signals for the one or more joints; and transmitting, to a user equipment, the one or more motion signals.
18. The apparatus of any of claims 1 through 17, wherein the means are further configured for: a rigging operation or a skinning operation based, at least partially, on the plurality of images.
19. The apparatus of any of claims 1 through 18, wherein the means are further configured for: real-time segmentation of the real-world object; determining one or more motion signals based, at least partially, on the real-time segmentation of the real-world object; and transmitting, to a user equipment, the one or more motion signals.
20. The apparatus of any of claims 1 through 19, wherein the means are further configured for: transmitting, to a user equipment, an invitation to join an augmented reality call; and transmitting the plurality of images as part of the augmented reality call.
21. The apparatus of any of claims 1 through 20, wherein the means are further configured for: transmitting at least one of: the three-dimensional model, or an update for the three-dimensional model based, at least partially, on the at least one second description.
22. An apparatus comprising: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: obtain at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmit, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtain the plurality of images with respect to, at least, the real-world object; transmit, to the network entity, the plurality of images; obtain, from the network entity, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model.
23. A method comprising: obtaining, with a user equipment, at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; transmitting, to the network entity, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; obtaining the plurality of images with respect to, at least, the real-world object; transmitting, to the network entity, the plurality of images; obtaining, from the network entity, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model.
24. A non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing obtaining of at least one first description, wherein the at least one first description comprises an indication that a network entity is capable of generating a three-dimensional model of a real-world object; causing transmitting, to the network entity, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing obtaining of the plurality of images with respect to, at least, the real-world object; causing transmitting, to the network entity, of the plurality of images; causing obtaining, from the network entity, of at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model.
25. An apparatus comprising means for: receiving an invitation to join a call with a user equipment; transmitting an acceptance of the invitation; receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object.
26. The apparatus of claim 25, wherein the means are further configured for: receiving at least one motion signal for the three-dimensional model; and transforming the three-dimensional model based, at least partially, on the at least one motion signal.
27. The apparatus of claim 25 or 26, wherein the call comprises an augmented reality call.
28. The apparatus of any one of claims 25-27, wherein the means are further configured for: receiving an update for the three-dimensional model.
29. An apparatus comprising: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive an invitation to join a call with a user equipment; transmit an acceptance of the invitation; receive, from a network entity, a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object.
30. A method comprising: receiving, with a user equipment, an invitation to join a call with a further user equipment; transmitting an acceptance of the invitation; receiving, from a network entity, a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object.
31. A non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing receiving of an invitation to join a call with a further user equipment; causing transmitting of an acceptance of the invitation; causing receiving, from a network entity, of a three-dimensional model of a real-world object, wherein the network entity is capable of generating the three-dimensional model of the real-world object.
32. An apparatus comprising means for: transmitting, to a user equipment, at least one first description, wherein the at least one first description comprises an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; transmitting, to the user equipment, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; transmitting, to a further user equipment, the generated three-dimensional model.
33. The apparatus of claim 32, wherein the at least one first description comprises an indication of at least one of: at least one required media property for the plurality of images to generate the three-dimensional model, a minimum number of a first type of bitstream supported by the apparatus, a maximum number of a second type of bitstream supported by the apparatus, or a process for updating the three-dimensional model.
34. The apparatus of claim 33, wherein the at least one first description comprises the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model comprises at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the apparatus, or periodic update of the three-dimensional model with an indicated period.
35. The apparatus of claim 34, wherein the means are further configured for: transmitting, to the user equipment, a trigger to transmit the update of the three-dimensional model, wherein the trigger is transmitted in response to at least one application level function, wherein the at least one first description comprises the indication of the process for updating the three-dimensional model, comprising triggering update of the three-dimensional model with the apparatus.
36. The apparatus of claim 33, wherein the at least one first description comprises an indication of the at least one required media property, wherein the at least one required media property comprises at least one of: an encoded RGB stream, an encoded RGB-D stream, or a visual volumetric video-based coding format.
37. The apparatus of any of claims 32 through 36, wherein the apparatus comprises one of: an entity configured to perform a media resource function, or a multipoint control unit.
38. The apparatus of any of claims 32 through 37, wherein the at least one received indication comprises an indication of at least one of: a capability of the user equipment to act as a source of the plurality of images, media properties of the plurality of images, a minimum number of a first type of bitstream supported by the user equipment, a maximum number of a second type of bitstream supported by the user equipment, or a process for updating the three-dimensional model.
39. The apparatus of claim 38, wherein the at least one received indication comprises the indication of the process for updating the three-dimensional model, wherein the process for updating the three-dimensional model comprises at least one of: triggering update of the three-dimensional model with the user equipment, triggering update of the three-dimensional model with the apparatus, or periodic update of the three-dimensional model with an indicated period.
40. The apparatus of any of claims 32 through 39, wherein the means configured for receiving the at least one indication comprises means configured for: receiving, from the user equipment, a session description protocol answer, wherein the session description protocol answer comprises, at least, the at least one indication.
41. The apparatus of any of claims 32 through 39, wherein the at least one first description comprises at least one session description protocol offer.
42. The apparatus of any of claims 32 through 41, wherein the at least one second description comprises at least one of: the three-dimensional model, one or more joints of the three-dimensional model, or a skeleton associated with the three-dimensional model.
43. The apparatus of any of claims 32 through 42, wherein the means are further configured for: receiving, from the user equipment, configuration information associated with the plurality of images.
44. The apparatus of claim 43, wherein the configuration information comprises at least one of: a camera configuration, or encoding information.
45. The apparatus of any of claims 32 through 44, wherein the means are further configured for: receiving, from the user equipment, one or more motion signals for the three-dimensional model; and transmitting, to the further user equipment, the one or more motion signals.
46. The apparatus of any of claims 32 through 45, wherein the means are further configured for: transmitting, to the further user equipment, an indication of a delivery path of the three-dimensional model.
47. The apparatus of any of claims 32 through 46, wherein the means are further configured for: generating one or more motion signals based, at least partially, on the three-dimensional model.
48. The apparatus of claim 47, wherein the means are further configured for: transmitting, to the further user equipment, the one or more motion signals.
49. The apparatus of claim 47, wherein the means are further configured for: transmitting, to the user equipment, the one or more motion signals.
50. The apparatus of any one of claims 32 through 49, wherein the means are further configured for: receiving, from the user equipment, an update for the three-dimensional model based, at least partially, on the at least one second description; and transmitting, to the further user equipment, the update for the three-dimensional model.
51. An apparatus comprising: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: transmit, to a user equipment, at least one first description, wherein the at least one first description comprises an indication that the apparatus is capable of generating a three-dimensional model of a real-world object; receive, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receive, from the user equipment, the plurality of images with respect to, at least, the real-world object; generate the three-dimensional model based, at least partially, on the plurality of images; transmit, to the user equipment, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; transmit, to a further user equipment, the generated three-dimensional model.
52. A method comprising: transmitting, with a network entity to a user equipment, at least one first description, wherein the at least one first description comprises an indication that the network entity is capable of generating a three-dimensional model of a real-world object; receiving, from the user equipment, at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; receiving, from the user equipment, the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; transmitting, to the user equipment, at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; transmitting, to a further user equipment, the generated three-dimensional model.
53. A non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: causing transmitting, with a network entity to a user equipment, of at least one first description, wherein the at least one first description comprises an indication that the network entity is capable of generating a three-dimensional model of a real-world object; causing receiving, from the user equipment, of at least one indication with respect to generation of the three-dimensional model of the real-world object based, at least partially, on a plurality of images; causing receiving, from the user equipment, of the plurality of images with respect to, at least, the real-world object; generating the three-dimensional model based, at least partially, on the plurality of images; causing transmitting, to the user equipment, of at least one second description, wherein the at least one second description is configured to indicate at least part of the three-dimensional model; causing transmitting, to a further user equipment, of the generated three-dimensional model.
PCT/EP2023/080971 2022-11-08 2023-11-07 Signalling for real-time 3d model generation WO2024100028A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263423534P 2022-11-08 2022-11-08
US63/423,534 2022-11-08

Publications (1)

Publication Number Publication Date
WO2024100028A1 2024-05-16

Family

ID=88757433

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/080971 WO2024100028A1 (en) 2022-11-08 2023-11-07 Signalling for real-time 3d model generation

Country Status (1)

Country Link
WO (1) WO2024100028A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160205346A1 (en) * 2013-08-09 2016-07-14 Samsung Electronics Co., Ltd. Hybrid visual communication
US20190122442A1 (en) * 2015-08-20 2019-04-25 Microsoft Technology Licensing, Llc Augmented Reality
US20180101966A1 (en) * 2016-10-07 2018-04-12 Vangogh Imaging, Inc. Real-time remote collaboration and virtual presence using simultaneous localization and mapping to construct a 3d model and update a scene based on sparse data
US20220166955A1 (en) * 2020-05-12 2022-05-26 True Meeting Inc. Generating an avatar of a participant of a three dimensional (3d) video conference

Similar Documents

Publication Publication Date Title
KR102246002B1 (en) Method, device, and computer program to improve streaming of virtual reality media content
US9351028B2 (en) Wireless 3D streaming server
US8876601B2 (en) Method and apparatus for providing a multi-screen based multi-dimension game service
CN106576158A (en) Immersive video
JP2015521442A (en) Panorama-based 3D video coding
US11159823B2 (en) Multi-viewport transcoding for volumetric video streaming
CN114830636A (en) Parameters for overlay processing of immersive teleconferencing and telepresence of remote terminals
US20230119757A1 (en) Session Description for Communication Session
JP2023516305A (en) Reference Neural Network Model for Adaptation of 2D Video for Streaming to Heterogeneous Client Endpoints
KR20220110787A (en) Adaptation of 2D video for streaming to heterogeneous client endpoints
CN103096018A (en) Information transmitting method and terminal
US11838488B2 (en) Method and apparatus for providing conversational services in mobile communication system
JP2024519747A (en) Split rendering of extended reality data over 5G networks
WO2024100028A1 (en) Signalling for real-time 3d model generation
EP4209003A1 (en) Orchestrating a multidevice video session
EP4375947A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US20230239453A1 (en) Method, an apparatus and a computer program product for spatial computing service session description for volumetric extended reality conversation
US20230007361A1 (en) Bidirectional presentation datastream using control and data plane channels
EP4375923A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
CN113473180B (en) Wireless-based Cloud XR data transmission method and device, storage medium and electronic device
US20230007067A1 (en) Bidirectional presentation datastream
US20220369000A1 (en) Split rendering of extended reality data over 5g networks
EP4383735A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
Lee et al. Overview of 3GPP standardization for 5G AR/MR experiences using glasses-type devices
WO2022207962A1 (en) A method, an apparatus and a computer program product for processing media data

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23804629

Country of ref document: EP

Kind code of ref document: A1