WO2020139766A2 - System and method for optimizing spatial content distribution using multiple data systems - Google Patents


Info

Publication number
WO2020139766A2
Authority
WO
WIPO (PCT)
Prior art keywords
scene
volume
content
client
occlusion
Prior art date
Application number
PCT/US2019/067898
Other languages
French (fr)
Other versions
WO2020139766A3 (en)
Inventor
Tatu V. J. HARVIAINEN
Original Assignee
Pcms Holdings, Inc.
Priority date
Filing date
Publication date
Application filed by Pcms Holdings, Inc.
Publication of WO2020139766A2 publication Critical patent/WO2020139766A2/en
Publication of WO2020139766A3 publication Critical patent/WO2020139766A3/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/10 - Geometric effects
    • G06T15/40 - Hidden part removal
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 - Indexing scheme for image generation or computer graphics
    • G06T2210/12 - Bounding box

Definitions

  • a method (which may be performed by a client device) includes receiving, for at least a first virtual object in a scene, information defining a first bounding volume that substantially encloses the first virtual object. Information is also received that defines at least one occlusion volume in the scene. For at least a first viewpoint relative to the scene, a determination is made of an amount by which the first bounding volume is occluded by one or more of the occlusion volumes. A determination of whether to retrieve first object rendering data for the first virtual object is made based on the amount by which the first bounding volume is occluded.
  • a method (which may be performed by a client device) includes receiving, for at least a first virtual object in a scene, information defining a first bounding volume that substantially encloses the first virtual object. Information is also received that defines at least one occlusion volume in the scene. For at least a first viewpoint relative to the scene, a determination is made of whether the first bounding volume is fully occluded by one or more of the occlusion volumes. In response to a determination that the first bounding volume is not fully occluded, first object rendering data may be retrieved for the first virtual object.
  • Some embodiments of an example method further include determining an amount by which the first virtual object is occluded by the at least one occlusion volume, where a resolution level of the retrieved first object rendering data is determined based on the amount by which the bounding volume is occluded.
  • Some embodiments further include, for at least a second virtual object in a scene, receiving information defining a second bounding volume that substantially encloses the second virtual object. A determination is made of whether the second bounding volume is fully occluded by at least one of the occlusion volumes. In response to a determination that the second bounding volume is fully occluded by at least one of the occlusion volumes, a determination may be made not to retrieve object rendering data for the second virtual object.
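The occlusion determination summarized in the items above can be approximated with simplified geometry on the client. The sketch below assumes axis-aligned boxes for both the bounding volume and the occlusion volumes and estimates occlusion by sampling the bounding box corners; all names are illustrative, not part of the described system.

```python
import numpy as np

def segment_intersects_aabb(p0, p1, box_min, box_max, eps=1e-9):
    """Slab test: does the segment from p0 to p1 pass through the axis-aligned box?"""
    d = p1 - p0
    t_min, t_max = 0.0, 1.0
    for axis in range(3):
        if abs(d[axis]) < eps:
            # Segment runs parallel to this slab; reject if it lies outside the slab.
            if p0[axis] < box_min[axis] or p0[axis] > box_max[axis]:
                return False
        else:
            t0 = (box_min[axis] - p0[axis]) / d[axis]
            t1 = (box_max[axis] - p0[axis]) / d[axis]
            if t0 > t1:
                t0, t1 = t1, t0
            t_min, t_max = max(t_min, t0), min(t_max, t1)
            if t_min > t_max:
                return False
    return True

def aabb_corners(box_min, box_max):
    """Corners of an axis-aligned box, used here as coarse visibility samples."""
    (x0, x1), (y0, y1), (z0, z1) = zip(box_min, box_max)
    return [np.array([x, y, z]) for x in (x0, x1) for y in (y0, y1) for z in (z0, z1)]

def occlusion_fraction(viewpoint, bounding_box, occlusion_boxes):
    """Approximate fraction of a bounding volume hidden behind any occlusion volume."""
    viewpoint = np.asarray(viewpoint, dtype=float)
    samples = aabb_corners(*bounding_box)
    hidden = sum(
        1 for s in samples
        if any(segment_intersects_aabb(viewpoint, s, o_min, o_max)
               for o_min, o_max in occlusion_boxes))
    return hidden / len(samples)
```

A returned fraction of 1.0 plays the role of "fully occluded" in the items above; a finer-grained implementation might sample points across the bounding surface rather than only its corners.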
  • At least one of the occlusion volumes is enclosed within a third virtual object in the scene.
  • a data size of the object rendering data for the first virtual object is greater than a data size of the information defining the bounding volume of the first virtual object.
  • the first object rendering data includes a first number of vertex points, and the information defining the first bounding volume comprises a second number of vertex points, the second number being less than the first number.
  • the first object rendering data includes a first number of polygons, and the information defining the first bounding volume comprises a second number of polygons, the second number being less than the first number.
  • the first viewpoint is within a defined navigation volume
  • the method includes, for a plurality of viewpoints within the defined navigation volume, determining whether the first bounding volume is fully occluded by at least one of the occlusion volumes.
  • the first object rendering data may then be retrieved in response to a determination that the first bounding volume is not fully occluded for at least one of the viewpoints in the defined navigation volume.
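For the navigation-volume variant in the preceding items, a client might retrieve the object only if at least one candidate viewpoint leaves part of the bounding volume unoccluded. A minimal sketch, reusing occlusion_fraction from the earlier snippet (candidate viewpoint generation is sketched later in this document):

```python
def needed_for_navigation_volume(candidate_viewpoints, bounding_box, occlusion_boxes):
    """Retrieve rendering data only if the bounding volume is not fully occluded
    from at least one viewpoint inside the predicted navigation volume."""
    return any(occlusion_fraction(vp, bounding_box, occlusion_boxes) < 1.0
               for vp in candidate_viewpoints)
```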
  • the first viewpoint may correspond in some embodiments to a virtual position of a user with respect to the scene or to a predicted future position of a user with respect to the scene.
  • the method further includes rendering a view of the scene based at least on the first object rendering data.
  • the method may further include generating a signal representing the rendered view. Such a signal may be transmitted to a display device or display component.
  • the method may further include displaying the rendered view.
  • an apparatus is provided with a processor configured to perform any of the methods described herein.
  • the processor may be configured using instructions for performing the methods.
  • the apparatus may include a computer-readable medium (e.g. a non-transitory computer-readable medium) storing the instructions.
  • Spatial data content distribution uses a large amount of bandwidth, which in many cases may be a bottleneck for content delivery. Viewing conditions varying from session to session, such as the number and context of viewers, display device configuration, user preferences, and content navigation may cause at least a portion of the spatial content to not be visible to a viewer. Content distribution may be improved by not delivering visual data that is not visible to a viewer.
  • a content server may improve content distribution by providing clients with spatial content overview information in addition to spatial data used by the viewing client for immersive rendering.
  • a content server may produce content overview information and may analyze the content by separating the content into individual segments, for which classification, viewing volume, and bounding volumes may be determined.
  • a viewing client may align the content, may determine a predicted viewing volume for the viewing client, and may determine which objects are visible for the viewing volume and which are invisible and may be omitted.
  • a viewing client may communicate information indicating visible objects to a content server. This communication may be a continual process.
  • a content server may use the visible object information to improve spatial data distribution by reducing delivered content data and by dividing distribution to several streams to reduce the amount of data received by each client.
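On the server side, the visibility information received from a client could drive which per-object streams are sent, roughly as sketched below. The report format (an element identifier mapped to an occlusion fraction) and the representation keys are assumptions made for illustration only.

```python
def select_streams(element_representations, visibility_report, reduce_threshold=0.5):
    """Choose which representation (if any) of each scene element to stream to a client.

    element_representations: {element_id: {"full": stream, "reduced": stream}}  (assumed layout)
    visibility_report:       {element_id: occlusion fraction in [0.0, 1.0]}      (assumed format)
    """
    streams = {}
    for element_id, occlusion in visibility_report.items():
        if occlusion >= 1.0:
            continue                                   # fully occluded: send nothing
        level = "full" if occlusion < reduce_threshold else "reduced"
        streams[element_id] = element_representations[element_id][level]
    return streams
```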
  • FIG. 1A is a system diagram of an example system illustrating an example communications system according to some embodiments.
  • FIG. 1B is a system diagram of an example system illustrating an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 1A according to some embodiments.
  • FIG. 2 is a system diagram illustrating an example set of interfaces for a content provider, a content server, and a viewing client according to some embodiments.
  • FIGs. 3A-3E are schematic illustrations of bounding volumes and occlusion volumes for virtual objects in a scene.
  • FIGs. 4A-4C are schematic illustrations of occlusion detection for a virtual scene using bounding volumes, occlusion volumes, and navigation volumes.
  • FIG. 5 is a message sequencing diagram illustrating an example process for determining visible objects and rendering visible objects according to some embodiments.
  • FIG. 6 is a message sequencing diagram illustrating an example process for determining visibility of scene elements based on session characteristics and user preferences for an example server push model according to some embodiments.
  • FIG. 7 is a message sequencing diagram illustrating an example process for determining visibility of scene elements based on session characteristics and user preferences for an example client pull model according to some embodiments.
  • FIG. 8 is a flowchart illustrating an example process for analyzing content segments to determine bounding volumes according to some embodiments.
  • FIG. 9 is a flowchart illustrating an example process that may be executed by a content server for streaming spatial data content to a client for an example server push model according to some embodiments.
  • FIG. 10 is a flowchart illustrating an example process executed by a content server for streaming spatial data content to a client for an example client pull model according to some embodiments.
  • FIG. 11 is a flowchart illustrating an example process executed by a viewing client for retrieving and rendering visible scene elements for an example server push model according to some embodiments.
  • FIG. 12 is a flowchart illustrating an example process executed by a viewing client for retrieving and rendering visible scene elements for an example client pull model according to some embodiments.
  • a wireless transmit/receive unit may be used as a client, a viewing device, or a viewer client in embodiments described herein.
  • FIG. 1A is a diagram illustrating an example communications system 100 in which one or more disclosed embodiments may be implemented.
  • the communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users.
  • the communications system 100 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth.
  • the communications systems 100 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), zero-tail unique-word DFT-Spread OFDM (ZT UW DTS-s OFDM), unique word OFDM (UW-OFDM), resource block-filtered OFDM, filter bank multicarrier (FBMC), and the like.
  • the communications system 100 may include wireless transmit/receive units (WTRUs) 102a, 102b, 102c, 102d, a RAN 104/113, a CN 106/115, a public switched telephone network (PSTN) 108, the Internet 110, and other networks 112, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements.
  • WTRUs 102a, 102b, 102c, 102d may be any type of device configured to operate and/or communicate in a wireless environment.
  • the WTRUs 102a, 102b, 102c, 102d may be configured to transmit and/or receive wireless signals and may include a user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a subscription-based unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, a hotspot or Mi-Fi device, an Internet of Things (IoT) device, a watch or other wearable, a head-mounted display (HMD), a vehicle, a drone, a medical device and applications (e.g., remote surgery), an industrial device and applications (e.g., a robot and/or other wireless devices operating in industrial and/or automated processing chain contexts), a consumer electronics device, a device operating on commercial and/or industrial wireless networks, and the like.
  • the communications systems 100 may also include a base station 114a and/or a base station 114b.
  • Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the CN 106/115, the Internet 110, and/or the other networks 112.
  • the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a gNB, a NR NodeB, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.
  • the base station 114a may be part of the RAN 104/113, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc.
  • the base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals on one or more carrier frequencies, which may be referred to as a cell (not shown). These frequencies may be in licensed spectrum, unlicensed spectrum, or a combination of licensed and unlicensed spectrum.
  • a cell may provide coverage for a wireless service to a specific geographical area that may be relatively fixed or that may change over time. The cell may further be divided into cell sectors.
  • the cell associated with the base station 114a may be divided into three sectors.
  • the base station 114a may include three transceivers, i.e., one for each sector of the cell.
  • the base station 114a may employ multiple-input multiple-output (MIMO) technology and may utilize multiple transceivers for each sector of the cell.
  • beamforming may be used to transmit and/or receive signals in desired spatial directions.
  • the base stations 114a, 114b may communicate with one or more of the WTRUs 102a, 102b, 102c, 102d over an air interface 116, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, centimeter wave, micrometer wave, infrared (IR), ultraviolet (UV), visible light, etc.).
  • the air interface 116 may be established using any suitable radio access technology (RAT).
  • the communications system 100 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like.
  • the base station 114a in the RAN 104/113 and the WTRUs 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA).
  • WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+).
  • HSPA may include High-Speed Downlink (DL) Packet Access (HSDPA) and/or High-Speed UL Packet Access (HSUPA).
  • the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 116 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A) and/or LTE-Advanced Pro (LTE-A Pro).
  • the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as NR Radio Access, which may establish the air interface 116 using New Radio (NR).
  • the base station 114a and the WTRUs 102a, 102b, 102c may implement multiple radio access technologies.
  • the base station 114a and the WTRUs 102a, 102b, 102c may implement LTE radio access and NR radio access together, for instance using dual connectivity (DC) principles.
  • the air interface utilized by WTRUs 102a, 102b, 102c may be characterized by multiple types of radio access technologies and/or transmissions sent to/from multiple types of base stations (e.g., an eNB and a gNB).
  • the base station 114a and the WTRUs 102a, 102b, 102c may implement radio technologies such as IEEE 802.11 (i.e., Wireless Fidelity (WiFi)), IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
  • the base station 114b in FIG. 1A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, an industrial facility, an air corridor (e.g., for use by drones), a roadway, and the like.
  • the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN).
  • the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN).
  • the base station 114b and the WTRUs 102c, 102d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, LTE-A Pro, NR, etc.) to establish a picocell or femtocell.
  • the base station 114b may have a direct connection to the Internet 110.
  • the base station 114b may not be required to access the Internet 110 via the CN 106/115.
  • the RAN 104/113 may be in communication with the CN 106/115, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 102a, 102b, 102c, 102d.
  • the data may have varying quality of service (QoS) requirements, such as differing throughput requirements, latency requirements, error tolerance requirements, reliability requirements, data throughput requirements, mobility requirements, and the like.
  • the CN 106/115 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication.
  • the RAN 104/113 and/or the CN 106/115 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 104/113 or a different RAT.
  • the CN 106/115 may also be in communication with another RAN (not shown) employing a GSM, UMTS, CDMA 2000, WiMAX, E-UTRA, or WiFi radio technology.
  • the CN 106/115 may also serve as a gateway for the WTRUs 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or the other networks 112.
  • the PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS).
  • the Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and/or the internet protocol (IP) in the TCP/IP internet protocol suite.
  • the networks 112 may include wired and/or wireless communications networks owned and/or operated by other service providers.
  • the networks 112 may include another CN connected to one or more RANs, which may employ the same RAT as the RAN 104/113 or a different RAT.
  • Some or all of the WTRUs 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities (e.g., the WTRUs 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wireless networks over different wireless links).
  • the WTRU 102c shown in FIG. 1A may be configured to communicate with the base station 114a, which may employ a cellular-based radio technology, and with the base station 114b, which may employ an IEEE 802 radio technology.
  • FIG. 1B is a system diagram illustrating an example WTRU 102.
  • the WTRU 102 may include a processor 118, a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, non-removable memory 130, removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and/or other peripherals 138, among others.
  • the processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like.
  • the processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment.
  • the processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 1B depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.
  • the transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 116.
  • the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals.
  • the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example.
  • the transmit/receive element 122 may be configured to transmit and/or receive both RF and light signals. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wireless signals.
  • the WTRU 102 may include any number of transmit/receive elements 122. More specifically, the WTRU 102 may employ MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 116.
  • the transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122.
  • the WTRU 102 may have multi-mode capabilities.
  • the transceiver 120 may include multiple transceivers for enabling the WTRU 102 to communicate via multiple RATs, such as NR and IEEE 802.11, for example.
  • the processor 118 of the WTRU 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit).
  • the processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128.
  • the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132.
  • the non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device.
  • the removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like.
  • the processor 118 may access information from, and store data in, memory that is not physically located on the WTRU 102, such as on a server or a home computer (not shown).
  • the processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the WTRU 102.
  • the power source 134 may be any suitable device for powering the WTRU 102.
  • the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium- ion (Li-ion), etc.), solar cells, fuel cells, and the like.
  • the processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 102.
  • the WTRU 102 may receive location information over the air interface 116 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
  • the processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
  • the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, a Virtual Reality and/or Augmented Reality (VR/AR) device, an activity tracker, and the like.
  • the peripherals 138 may include one or more sensors, the sensors may be one or more of a gyroscope, an accelerometer, a hall effect sensor, a magnetometer, an orientation sensor, a proximity sensor, a temperature sensor, a time sensor; a geolocation sensor; an altimeter, a light sensor, a touch sensor, a magnetometer, a barometer, a gesture sensor, a biometric sensor, and/or a humidity sensor.
  • the WTRU 102 may include a full duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for both the UL (e.g., for transmission) and downlink (e.g., for reception)) may be concurrent and/or simultaneous.
  • the full duplex radio may include an interference management unit to reduce and/or substantially eliminate self-interference via either hardware (e.g., a choke) or signal processing via a processor (e.g., a separate processor (not shown) or via processor 118).
  • the WTRU 102 may include a half-duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for either the UL (e.g., for transmission) or the downlink (e.g., for reception)).
  • one or more, or all, of the functions described herein with regard to one or more of: WTRU 102a-d, Base Station 114a-b, and/or any other device(s) described herein may be performed by one or more emulation devices (not shown).
  • the emulation devices may be one or more devices configured to emulate one or more, or all, of the functions described herein.
  • the emulation devices may be used to test other devices and/or to simulate network and/or WTRU functions.
  • the emulation devices may be designed to implement one or more tests of other devices in a lab environment and/or in an operator network environment.
  • the one or more emulation devices may perform the one or more, or all, functions while being fully or partially implemented and/or deployed as part of a wired and/or wireless communication network in order to test other devices within the communication network.
  • the one or more emulation devices may perform the one or more, or all, functions while being temporarily implemented/deployed as part of a wired and/or wireless communication network.
  • the emulation device may be directly coupled to another device for purposes of testing and/or may perform testing using over-the-air wireless communications.
  • the one or more emulation devices may perform the one or more, including all, functions while not being implemented/deployed as part of a wired and/or wireless communication network.
  • the emulation devices may be utilized in a testing scenario in a testing laboratory and/or a non-deployed (e.g., testing) wired and/or wireless communication network in order to implement testing of one or more components.
  • the one or more emulation devices may be test equipment. Direct RF coupling and/or wireless communications via RF circuitry (e.g., which may include one or more antennas) may be used by the emulation devices to transmit and/or receive data.
  • Visually rich spatial data may contain elements in multiple data formats, which may affect how freely the viewpoint to the scene may be adjusted. Also, characteristics of the viewing session, such as the physical environment, the number and context of users, and the display device setup may affect how much of the full spatial content is used by the viewing client and how the spatial content is adjusted to provide a high-quality experience for all viewers.
  • the content distribution may be adjusted to enable delivery of complex spatial data with high quality. Redundant (or unused) content data which is not contributing to viewed content may not be transmitted between a content server and a viewing client. Such a process may use scene, content, and viewing data to determine which elements are contributing to (or changing) a user’s viewpoint. Processing and contextual data may be communicated between a content server and a viewing client.
  • Environments that may be bandwidth constrained include wireless Wi-Fi and cellular 5G networks. Spatial data content distribution uses a large amount of bandwidth, which in many cases may be a severe bottleneck for content delivery. Content distribution may be improved by not delivering visual data not visible to the viewer.
  • a client navigating an unstructured point cloud may receive data delivered over a bandwidth-constrained network.
  • a three-degrees-of-freedom-plus (3DoF+) or six-degrees-of-freedom (6DoF) application for a VR, AR, or MR environment may have content delivery bandwidth limitations.
  • FIG. 2 is a system diagram illustrating an example set of interfaces for a content provider, a content server, and a viewing client according to some embodiments.
  • FIG. 2 shows an example embodiment for allocation of data and analysis for a system.
  • a content provider may be a source feed of content, such as a live camera, a previously recorded video, or an AR/VR/MR content source.
  • a content server may include storage locations for spatial data and scene overview. Content received from a content provider may be analyzed by a content analysis process to determine objects from spatial data and to store scene overview information. Spatial data and scene overview information may be sent to a viewing client for analysis of visible objects and rendering of objects for a user. Operations of each of these elements are described in more detail below.
  • Not transmitting visual data for objects that are not visible may be one method of improving content delivery efficiency.
  • a content server and a viewing client may exchange information about content characteristics and session characteristics to improve content delivery and to transmit spatial data only for visible objects for some embodiments.
  • Some embodiments of a system use a content server to analyze the spatial data to extract overview information for a spatial scene.
  • the overview information may be sent to a viewing client.
  • the viewing client may combine content layout information with viewing session information to determine which elements of a scene are visible for a navigation volume.
  • Element visibility information and navigation volume information may be communicated by a viewing client to a content server to improve efficiency of content distribution.
  • Information about element visibility may be used by a content server to toggle on and off the processing of content segments.
  • the navigation volume predicted to be used by a viewing client may be used in determining which segment content data may be removed.
  • the navigation volume predicted to be used may be smaller than the full viewing volume available (or enabled) by spatial content data.
  • Systems and methods disclosed herein may be implemented using client pull and server push models.
  • a content server isolates individual scene elements so that the client selectively chooses which elements to receive based on local per session criteria.
  • a content server divides full spatial data into individual streams based on client signaling indicating how a client is using scene data, splitting the full stream into sub-streams if there are multiple clients streaming the data concurrently with varying per scene element uses.
  • An example process that may be executed by a content server may include receiving spatial data for a spatial scene.
  • Scene overview data may be generated for a spatial scene. Pre-processing may be performed to generate scene overview data if the spatial data is available before a user requests the content.
  • Scene overview data may be generated at run-time if the spatial data is a live stream.
  • Analysis of spatial data may include: segmenting and identifying elements within the spatial data, determining bounding volumes for segments, determining occlusion volumes for segments, and determining a viewing volume from which a processed segment may be viewed.
  • analysis of spatial data includes isolating spatial data into individual streams that may be selectively requested by clients.
  • a content server may send scene overview data to the viewing client, which may include a hierarchically structured scene graph describing elements, locations, and element types, as well as bounding volumes, occlusion volumes, and viewing volumes for each element.
  • content distribution includes a viewing client sending element visibility and navigation volume information to the content server, and the content server may adjust content data to be sent to the viewing client based on element visibility and navigation volume information.
  • the adjusted content data may be streamed to the viewing client as requested by the viewing client.
  • content distribution includes streaming segmented spatial data elements to the clients based on client requests. The content server process may repeat continually (starting with sending scene overview data) until a session termination is communicated.
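One possible, non-normative shape for the scene overview data exchanged in the steps above is sketched below; the field names are illustrative assumptions rather than a format taken from the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class SimpleVolume:
    """Simplified geometry (box or sphere) used for occlusion and visibility tests."""
    shape: str                                 # "box" or "sphere"
    min_corner: Optional[Vec3] = None          # box only
    max_corner: Optional[Vec3] = None          # box only
    center: Optional[Vec3] = None              # sphere only
    radius: Optional[float] = None             # sphere only

@dataclass
class SceneElement:
    element_id: str
    element_type: str                          # e.g. "point_cloud" or "mesh"
    location: Vec3
    bounding_volume: SimpleVolume
    occlusion_volumes: List[SimpleVolume] = field(default_factory=list)
    viewing_volume: Optional[SimpleVolume] = None
    representations: List[str] = field(default_factory=list)   # available level-of-detail ids

@dataclass
class SceneOverview:
    """Hierarchically structured scene overview: elements plus their per-element volumes."""
    elements: List[SceneElement] = field(default_factory=list)
```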
  • An example process that may be executed by a viewing client may include requesting content from a content server and receiving scene overview data from the content server.
  • Content may be adjusted for a display setup configuration, location of viewers, and other local criteria.
  • the viewing client may determine which elements are visible and may determine a navigation volume for the scene for the current time step.
  • requesting content from the server includes sending the navigation volume and element visibility information to the content server.
  • Element visibility information may be an indication of estimated changes to visibility for some embodiments.
  • requesting content from the server includes requesting individual scene element files and streams from the server based on local estimation of elements to be used.
  • the viewing client may receive content data from the content server and may display the content to one or more viewers.
  • the viewing client process (starting with receiving scene overview data) may repeat continually until a session termination is communicated.
  • Systems and methods disclosed herein may adjust scene data communicated between a content server and a viewing client to improve content distribution efficiency and to improve a viewing experience. These improvements may use descriptions of a scene overview to determine which scene elements may be used by a viewing client for a navigation volume.
  • the viewing client may request, from a content server, content for a set of scene elements, and the content may be adapted for viewing session characteristics.
  • Content distribution efficiency may be improved by a content server sending only content data for scene elements that are visible for a viewing volume for some embodiments.
  • information signaled from a viewing client back to a content server enables a content server to improve content distribution by decreasing the amount of data streamed to the client.
  • FIG. 3A is a schematic illustration of two objects 302, 304 in a virtual scene 306 available from a content server.
  • Object 302 is a virtual model of a car
  • object 304 is a virtual model of a building.
  • the virtual models of the car and the building may be represented using various computer graphics modeling techniques. For example, they may be represented as point clouds or as polygon meshes with associated texture map information, among other possibilities.
  • Full-resolution versions of the objects 302 and 304 may call for a large amount of data to be transmitted from a content server to a viewing client. The transmission of such data may be inefficient in cases where one or more of the objects is occluded by another object because a fully occluded object will not be visible to a user.
  • An example of complete occlusion of one virtual object by another is illustrated in the schematic plan view of FIG. 3B. From the virtual viewpoint of the user 308 on the virtual scene, the virtual car 302 is completely occluded by the virtual building 304. In this configuration, transmission of full rendering information for the virtual car 302 likely amounts to an unnecessary use of network bandwidth.
  • the bounding volume of a virtual object is, in some embodiments, a volume that substantially encloses (e.g. fully encloses) the respective virtual object.
  • Information defining the bounding volume may be, for example, a polygon mesh or information used for conveying volumetric information.
  • a bounding volume may be a sphere (even if the virtual object is not spherical), and information defining the bounding volume may include coordinates of the center of a sphere and a value indicating the radius of the sphere.
  • a bounding volume may have a box shape (e.g. a cube), and the information defining the bounding volume may include, for example, coordinates of one or more corners of the box.
  • the information defining the bounding volume may include information (e.g. coordinates) defining the position of the bounding volume within the scene. It is not necessary for all virtual objects in a scene to have a bounding volume. For example, some virtual objects may be sufficiently large (e.g. a virtual mountain) or sufficiently close to the user (e.g. a virtual representation of a user’s hand or handheld tools) that they are not likely to be substantially or fully occluded, and occlusion testing using bounding volumes may be skipped for such objects.
  • FIG. 3C schematically illustrates an example of a box-shaped bounding volume 310 associated with virtual object 302. While the virtual object 302 itself may be represented using dozens or hundreds of vertices or polygons, bounding volume 310 may be defined with eight or fewer vertices (six or fewer polygons) in this example.
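To illustrate the data-size contrast between an object and its bounding volume, the following sketch derives two compact encodings from an object's vertex array: an axis-aligned box (two corner points) and a simple bounding sphere (center plus radius). Tighter, oriented volumes are possible; this is only an illustration.

```python
import numpy as np

def bounding_box(vertices):
    """Axis-aligned bounding box: two corner points, regardless of vertex count."""
    v = np.asarray(vertices, dtype=float)
    return v.min(axis=0), v.max(axis=0)

def bounding_sphere(vertices):
    """Simple bounding sphere: centroid plus maximum distance (valid, though not minimal)."""
    v = np.asarray(vertices, dtype=float)
    center = v.mean(axis=0)
    radius = float(np.linalg.norm(v - center, axis=1).max())
    return center, radius
```

Either encoding amounts to a handful of values, while the enclosed object may carry thousands of vertices, which is the asymmetry the bounding-volume approach relies on.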
  • an occlusion volume is enclosed within an opaque region of a virtual object.
  • FIG. 3D schematically illustrates an occlusion volume 312 enclosed within virtual object 304. While the virtual object 304 itself may be represented using dozens or hundreds of vertices, occlusion volume 312 may be defined with eight or fewer vertices in this example. More generally, information defining an occlusion volume may be, for example, a polygon mesh or information used for conveying volumetric information.
  • an occlusion volume may be a sphere, and information defining the occlusion volume may include coordinates of the center of a sphere and a value indicating the radius of the sphere.
  • An occlusion volume may have a box shape (e.g. a cube), and the information defining the occlusion volume may include, for example, coordinates of one or more corners of the box.
  • the information defining the occlusion volume may include information (e.g. coordinates) defining the position of the occlusion volume within the scene.
  • Some virtual objects may have more than one associated occlusion volume.
  • virtual object 302 may enclose three occlusion volumes 314, 316, 318.
  • no occlusion volumes are positioned in a partly transparent portion of virtual object 302 that corresponds to car windows.
  • While three box-shaped occlusion volumes are used in this example for virtual object 302, other examples could use different numbers and different shapes of occlusion volumes. More occlusion volumes (and/or occlusion volumes with more complicated shapes) may more accurately conform to the shape of the virtual object at the expense of imposing greater data requirements.
  • an object may be associated with more than one occlusion volume.
  • objects may share one or more occlusion volumes.
  • a virtual car may be represented not by a single virtual object, but by a collection of a plurality of virtual objects, such as a body and four individual wheels, but the entire virtual car may be associated with a smaller number of occlusion volumes, such as the three volumes shown in FIG. 3E.
  • An occlusion volume need not be associated with any particular virtual object.
  • a virtual object may have no associated occlusion volume. For example, it may be desirable not to define any occlusion volume for a fully or partly transparent object, such as a virtual window or shrub, or for a small object that is unlikely to substantially occlude other objects.
  • FIGs. 4A-4C illustrate example uses of bounding volumes, occlusion volumes, and navigation volumes.
  • a client device receives, from a content server, information defining the bounding volume 310 of the virtual car and the occlusion volume 312 of the virtual building in a virtual scene. This may occur before the client has received full rendering information for the virtual car and virtual building.
  • the client device determines whether the bounding volume 310 of the car is fully occluded by any individual occlusion volume or combination of occlusion volumes in the scene.
  • the bounding volume 310 is not fully occluded by the occlusion volume 312 of the building.
  • the client device retrieves object rendering data for the virtual car. (A similar process may be performed to determine whether a bounding volume of the virtual building is fully occluded by one or more occlusion volumes; for simplicity, this process is not illustrated in FIGs. 4A-4C.)
  • the client device may retrieve a full-resolution version of the virtual car 302 in response to the determination that the bounding volume 310 of the virtual car is not fully occluded.
  • the client device determines an amount by which the first virtual object is occluded by the occlusion volumes, and the resolution level of the retrieved rendering data is determined based on the amount by which the bounding volume is occluded.
  • the amount of occlusion may be determined as, for example, a percentage occlusion from the perspective of the viewpoint 402.
  • a threshold may be set, such that, for example, a full-resolution version of the virtual car 302 is retrieved if the occlusion level is less than the threshold, while a reduced-resolution version of the virtual car may be retrieved if the occlusion level is greater than the threshold.
  • Reduced-resolution versions of an object may have, for example, fewer polygons or vertex points than full-resolution versions. In some embodiments, if the amount of occlusion is greater than a threshold a determination may be made not to retrieve any rendering data for the object.
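A threshold scheme of this kind might map the measured occlusion fraction onto the available representations, for example as below; the cut-off values are arbitrary placeholders rather than values taken from the description.

```python
def choose_representation(occlusion, full_threshold=0.25, skip_threshold=0.95):
    """Map an occlusion estimate to a representation request; thresholds are placeholders."""
    if occlusion >= skip_threshold:
        return None        # effectively occluded: retrieve no rendering data
    if occlusion <= full_threshold:
        return "full"      # mostly visible: request full-resolution rendering data
    return "reduced"       # partially occluded: request fewer polygons/vertex points
```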
  • a virtual object may only be a portion of what might commonly be considered a single physical object.
  • the front half and rear half of a virtual car 302 could be represented as separate virtual objects, albeit separate objects that are constrained to remain adjoined to one another.
  • Each half of the car may then have an associated bounding volume.
  • the client device retrieves full rendering information for the object representing the unoccluded front half of the virtual car and retrieves no rendering information or low-resolution rendering information for the object representing the rear half of the virtual car.
  • the virtual scene includes both the original virtual car 302 and a second virtual car 404.
  • the client device receives information defining a bounding volume 406 associated with virtual car 404.
  • the client device makes a determination of whether bounding volume 406 is fully occluded, from the perspective of viewpoint 402, by any individual occlusion volume or combination of occlusion volumes in the scene.
  • bounding volume 406 is fully occluded by occlusion volume 312.
  • the client device makes a determination not to retrieve object rendering data for the second virtual car 404.
  • the client device may identify to the content server those objects that are determined not to be occluded, so that rendering information for those objects can be retrieved. In some embodiments, the client device may identify to the content server those objects that are determined to be occluded, so the server can omit rendering information for the occluded objects from the object rendering data retrieved by the client. In some embodiments, the client may indicate to the server an amount of occlusion for each of the objects in the virtual scene, with the server determining for each object whether to send rendering information and at what resolution.
  • bounding volumes and occlusion volumes in general can be defined with less data than would otherwise be required to provide full rendering information for virtual objects with which they are associated.
  • occlusion volumes and/or bounding volumes may be represented by surfaces (e.g. mesh surfaces) or collections of points (e.g. a point cloud) that do not necessarily enclose a particular volume of space in a strict geometric sense.
  • a bounding volume or occlusion volume may be represented as a concave surface in which the virtual object is situated, analogous to a box with an open lid.
  • occlusion of bounding volumes is determined not for a single viewpoint, but rather for a plurality of viewpoints relative to a virtual scene.
  • the client device determines a navigation volume 410.
  • a navigation volume may be determined in various ways.
  • the navigation volume 410 represents a permissible range of motion of a viewpoint of a user (e.g. user 308).
  • navigation volume 410 represents a likely range of motion of the viewpoint of the user, for example the navigation volume 410 may have a predetermined or reconfigurable size and shape surrounding the current user viewpoint.
  • the navigation volume may be periodically updated in view of motion of the user viewpoint or other events.
  • the client device operates to determine the occlusion of bounding volumes from various viewpoints (e.g. viewpoints 412a-c, among others) within the navigation volume 410.
  • Viewpoints within a navigation volume may be selected in various ways. For example, they may be selected to be substantially evenly distributed through the volume, they may be selected to be substantially evenly distributed over an outer surface of the volume, or they may be selected to be at corners of the volume, among other options.
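As one concrete, purely illustrative option for the selection strategies just listed, candidate viewpoints can be placed on a regular grid inside a box-shaped navigation volume, with the coarsest setting reducing to the corners:

```python
import numpy as np

def navigation_viewpoints(box_min, box_max, samples_per_axis=3):
    """Evenly spaced candidate viewpoints inside an axis-aligned navigation volume.

    samples_per_axis=2 yields only the corners; 3 adds midpoints and the center."""
    axes = [np.linspace(lo, hi, samples_per_axis) for lo, hi in zip(box_min, box_max)]
    return [np.array([x, y, z]) for x in axes[0] for y in axes[1] for z in axes[2]]
```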
  • occlusion volume 312 is present along with bounding volumes 414, 416, 418 associated with different virtual objects.
  • the client device may determine the levels of occlusion of the bounding volumes from the plurality of viewpoints within the defined navigation volume 410. For example, the client device may determine that bounding volume 414 is fully occluded from all of a plurality of viewpoints within the navigation volume. In response, the client may make a determination not to retrieve full rendering information for the virtual object associated with bounding volume 414. The client device may determine that bounding volume 418 is not occluded from any of the viewpoints within the navigation volume. In response, the client may make a determination to retrieve full rendering information for the virtual object associated with bounding volume 418. The client device may determine that bounding volume 416 is fully occluded from some viewpoints within the navigation volume but not from other viewpoints.
  • the client device retrieves object rendering data for the virtual object associated with bounding volume 416.
  • the client device may retrieve a lower-resolution version of the object rendering data in view of the partial occlusion of bounding volume 416.
  • the viewpoint or viewpoints that are used in testing for occlusion of bounding volumes in a scene may correspond to current or predicted virtual positions of one or more users with respect to the virtual scene. Viewpoints may be set or adjusted based on user input.
  • User input may include user movement, e.g. where the client device is associated with a head-mounted display or other moveable display device with motion tracking capability.
  • Alternative user inputs include user input with a joystick, arrow key, touchpad, and the like.
  • the client device may operate to render a view of the virtual objects using the retrieved object rendering data.
  • a signal representing the rendered view may then be provided to a display device for display to the user.
  • the display device may be a component of the client device, or it may be a separate component.
  • occlusion testing of bounding volumes may be performed for each of a plurality of viewpoints without defining any specific navigation volume.
  • a user may be provided with an option of viewing a virtual scene from any one of a plurality of viewpoints (e.g. one of a plurality of seats that can be selected in a virtual stadium), and the occlusion of bounding volumes may be determined for those different viewpoints.
  • the client device may retrieve rendering information for more than one user (e.g. for two or more users playing an immersive game). In such embodiments, the client device may retrieve rendering information only for virtual objects whose bounding volumes are not fully occluded from a current or predicted viewpoint of at least one of the users.
  • a client device may make a determination not to retrieve rendering data for particular virtual objects for reasons other than occlusion of the bounding volumes of those virtual objects.
  • a virtual object may have an associated viewing volume within which the virtual object is deemed to be visible (providing it is not occluded, etc.).
  • the client device may make a determination not to retrieve rendering data for a virtual object if no current or predicted viewpoint of the client device is within the viewing volume of the virtual object.
  • the client device may take into consideration a user’s direction of view relative to the virtual scene. A current or predicted direction of view may be used to define a view frustum, and the client device may make a determination not to retrieve rendering information for virtual objects whose bounding volumes fall completely outside a current or predicted view frustum.
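View-frustum culling of bounding volumes can complement the occlusion test described above. A minimal sketch for a bounding sphere tested against frustum planes is shown below; extracting the planes from a projection matrix is outside the scope of the sketch, and the plane convention is an assumption.

```python
import numpy as np

def sphere_in_frustum(center, radius, frustum_planes):
    """Keep an object unless its bounding sphere lies entirely outside some frustum plane.

    frustum_planes: iterable of (normal, d) with inward-pointing unit normals, so that
    points x inside the frustum satisfy dot(normal, x) + d >= 0.
    """
    c = np.asarray(center, dtype=float)
    for normal, d in frustum_planes:
        if np.dot(normal, c) + d < -radius:
            return False        # wholly on the outside of this plane: cull the object
    return True
```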
  • Some embodiments described herein allow a client device to reduce the amount of rendering data retrieved for a virtual scene by not retrieving rendering data for virtual objects whose bounding volumes are fully occluded by occlusion volumes.
  • the additional bandwidth used to retrieve bounding volume information and occlusion volume information is expected to be less than the bandwidth saved through not retrieving rendering data for occluded virtual objects.
  • the bounding volumes and occlusion volumes do not necessarily conform precisely to the shapes defined by the full rendering data of those objects. Because the bounding volumes and occlusion volumes do not conform exactly to their respective virtual objects, some error in occlusion testing may be expected. In some embodiments, the bounding volumes and occlusion volumes may be selected such that an error is more likely to result in retrieving unneeded rendering information rather than failing to retrieve rendering information that would otherwise be visible in a scene. For example, it may be preferable for the bounding volume of a virtual object to be generally larger than the object and for an occlusion volume to be generally smaller than the represented occlusion.
  • the client device may find that a bounding volume is only partly occluded, even though the associated object itself is ultimately fully occluded in the rendered scene. In that case, the rendering information for the occluded object may be retrieved unnecessarily, but this is likely to be more acceptable than the effects of failing to retrieve rendering information for objects that should be visible in the rendered scene.
  • a virtual model of a car may include a radio antenna, and a bounding volume for the car may be selected that encloses the body of the car but not the antenna.
  • the antenna may fail to be rendered when it would otherwise be the only visible portion of the car. But this may be an acceptable tradeoff because it avoids the need to retrieve the complete rendering information for the car merely to render the antenna.
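One simple way to bias errors toward retrieving too much rather than too little, as suggested above, is to pad bounding volumes outward and shrink occlusion volumes inward by a margin. The helper below is illustrative and the margin value is arbitrary.

```python
import numpy as np

def pad_box(box_min, box_max, margin):
    """Grow a box (positive margin) or shrink it (negative margin) along every axis."""
    box_min = np.asarray(box_min, dtype=float)
    box_max = np.asarray(box_max, dtype=float)
    return box_min - margin, box_max + margin

# Conservative setup: bounding volumes slightly too large, occlusion volumes slightly too small,
# so that estimation errors tend to fetch extra data rather than drop visible objects.
# padded_bounds   = pad_box(object_min, object_max, margin=+0.05)
# shrunk_occluder = pad_box(occluder_min, occluder_max, margin=-0.05)
```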
  • a content server may process spatial data to segment and identify individual objects from the spatial data and to organize raw point cloud data into object clusters for a scene graph. For each segmented and identified object, a content server may determine a bounding volume (e.g. a box or sphere fully containing the object), an occlusion volume (e.g. one or more boxes or spheres fully contained in the object), and a viewing volume (which is a 3D shape that indicates portions of a space from where the object may be viewed).
  • a bounding volume is a volume in which the segment is entirely contained.
  • an occlusion volume is the area in which the segment creates full occlusion.
  • Transparencies or holes in objects may be processed as zero or multiple occluding elementary volumes.
  • a viewing volume indicates an area for which object data supports navigation.
  • These volumes may be represented using 3D shapes (or primitives), such as boxes and spheres, or using more complex shapes, such as polygons, formed from combinations of 3D primitives (such as cubes, cylinders, spheres, cones, and pyramids).
  • Volumes are generated to form a simplified geometry for the original spatial data segments that enables efficient occlusion and collision calculations.
  • the bounding volume and the occlusion volume of an object are preferably configured such that they can be signaled using less data than is used to signal the object itself. For example, the number of vertices used to signal the bounding volume and the occlusion volume may be less than the number of vertices or points in the object.
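  • As a rough sketch of how compact this metadata can be, the example below derives an axis-aligned bounding box for a point-cloud segment and a candidate occlusion box by shrinking that bounding box toward its centre. The shrink factor is a placeholder heuristic only; it does not guarantee that the occlusion box lies inside the actual surface, so a real content server would construct or validate occlusion volumes against the reconstructed geometry.

```python
def bounding_box(points):
    """Smallest axis-aligned box containing every point of the segment."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def inscribed_box(points, shrink=0.5):
    """Candidate occlusion volume: the bounding box shrunk about its centre.
    Placeholder heuristic: containment inside the object is NOT guaranteed."""
    (lx, ly, lz), (hx, hy, hz) = bounding_box(points)
    cx, cy, cz = (lx + hx) / 2, (ly + hy) / 2, (lz + hz) / 2
    hwx, hwy, hwz = (hx - lx) * shrink / 2, (hy - ly) * shrink / 2, (hz - lz) * shrink / 2
    return (cx - hwx, cy - hwy, cz - hwz), (cx + hwx, cy + hwy, cz + hwz)

# Stand-in segment: thousands of points collapse to a few corner coordinates.
segment = [(0.001 * i, 0.002 * i, 0.5) for i in range(10000)]
print("points:", len(segment))
print("bounding volume:", bounding_box(segment))
print("candidate occlusion volume:", inscribed_box(segment))
```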
  • content may be provided in raw point cloud format as well as in reconstructed 3D mesh format.
  • multiple levels of detail may be available for a representation.
  • the available representation versions are described in the scene overview, and the viewing client (for a client pull model) may choose the representation based on local criteria.
  • the server may choose the version of the content elements to be streamed to the client.
  • low resolution representations may be selected based on distance or significance of the occlusion (for example, far objects or barely visible parts may use a representation with a lower level of detail), as in the sketch below.
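  • A selection rule along these lines might look like the following; the level-of-detail names, distance bounds, and visibility thresholds are all assumptions chosen for illustration.

```python
def pick_representation(available, visible_fraction, distance, near=2.0, far=20.0):
    """available: representation names ordered from highest to lowest detail."""
    if visible_fraction <= 0.0:
        return None                          # fully occluded: nothing is requested
    if distance > far or visible_fraction < 0.25:
        return available[-1]                 # distant or mostly hidden: lowest detail
    if distance < near and visible_fraction > 0.75:
        return available[0]                  # close and mostly visible: full detail
    return available[len(available) // 2]    # otherwise a middle level of detail

lods = ["mesh_high", "mesh_medium", "pointcloud_low"]
print(pick_representation(lods, visible_fraction=0.4, distance=12.0))  # mesh_medium
print(pick_representation(lods, visible_fraction=0.1, distance=3.0))   # pointcloud_low
```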
  • the content server performs pre-processing for all time steps or periodically between some number of time steps, processing the whole content.
  • the content server tracks elements between temporal steps and compiles generated segments and per segment meta-data together.
  • FIG. 5 is a message sequencing diagram illustrating an example process for determining visible objects and rendering visible objects according to some embodiments.
  • a server 502 (which may be a content server) may analyze content and isolate object streams (box 506) for some embodiments of an example process.
  • a client 504 may send a content request 508 to the server.
  • the server may send to the client communications indicating a scene graph 510 (which may include locations of identified objects), object bounding volumes 512, occlusion volumes 514, and viewing volumes 516.
  • the client may select an initial viewpoint (box 518), record sensor data to track a viewer (box 520), and compute visible objects within the scene (box 522).
  • the client may send a request 524 for objects from the server.
  • the request may be for objects determined to be visible.
  • Some embodiments may include a request for partially visible objects.
  • the server may send 526 object representations to the client.
  • the object representations may correspond to an object request sent by the client such that the same objects are referenced in both communications.
  • the client may render and display (box 528) the objects represented in the object representations. The process shown in FIG. 5 may be repeated continuously for some embodiments.
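  • One possible in-memory shape for the scene information sent in communications 510-516 (scene graph, bounding volumes, occlusion volumes, and viewing volumes) is sketched below. The field names and types are assumptions for illustration; the embodiments do not prescribe a particular serialization.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Volume:
    shape: str      # e.g. "box" or "sphere", or a composite of primitives
    params: Dict    # e.g. {"lo": ..., "hi": ...} or {"center": ..., "radius": ...}

@dataclass
class SceneObject:
    object_id: str
    classification: str                  # segment type inferred by the content server
    location: Vec3                       # position in unified scene coordinates
    bounding_volume: Volume
    occlusion_volumes: List[Volume] = field(default_factory=list)
    viewing_volume: Optional[Volume] = None
    representations: List[str] = field(default_factory=list)  # available detail levels

@dataclass
class SceneOverview:
    objects: List[SceneObject]           # scene graph entries (communication 510)
    scene_bounds: Optional[Volume] = None
```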
  • FIG. 6 is a message sequencing diagram illustrating an example process for determining visibility of scene elements based on session characteristics and user preferences for an example server push model according to some embodiments.
  • Some embodiments of an example process may include content analysis and content streaming.
  • a content provider may send spatial data to a content server (box 606).
  • spatial data may include one or more of the following items: a scene graph (which may include locations of identified objects), object bounding volumes, occlusion volumes, and viewing volumes.
  • a content server may segment and classify (content) data (box 608) and generate bounding, occlusion, and viewing volumes for segments (box 610).
  • a user may send a content request to a viewer client.
  • the user may interface with a viewer client UI to generate a content request.
  • the viewer client may send a content request (box 612) to a content server.
  • the viewer client may collect sensor and configuration data (box 614), such as location of the user and viewpoint of the user in relation to real-world objects.
  • the viewer client may record and track movements of the user by recording and tracking movements of the viewer client device.
  • the content server may send to the viewer client communications indicating a scene overview.
  • the scene overview may include a scene graph, descriptions of objects identified within content data, and bounding, occlusion, and viewing volumes of identified objects (box 616).
  • the scene overview also may include locations of objects within a scene.
  • the viewer client may estimate a navigation volume within which the user may navigate as well as visibility of scene elements based on session characteristics and user preferences (box 618).
  • the viewer client may send the estimated navigation volume and visibility of elements (box 620) to the content server.
  • the content server may process the content (box 622) to remove objects that would not be visible (e.g. objects that are fully occluded from the user’s viewpoint) and, in some embodiments, to generate lower-resolution versions of objects that are at least partly occluded.
  • the content server may send the processed content streams (box 624) to the viewer client.
  • the content server may send (box 624) content streams containing only objects visible by the user for the viewing volume.
  • the viewer client may render (box 626) and display (box 628) the content for viewing by the user.
  • FIG. 7 is a message sequencing diagram illustrating an example process for determining visibility of scene elements based on session characteristics and user preferences for an example client pull model according to some embodiments.
  • a content server 704 obtains spatial data (box 706), e.g. from a content provider.
  • the content server segments and classifies the data (box 708) and generates bounding, occlusion, and viewing volumes for the segments (box 710).
  • the content server may isolate segments into individual streams (box 711).
  • a user initiates a content request to a viewer client 702.
  • the viewer client sends a content request (box 712) to the content server.
  • the viewer client collects sensor and configuration data 714.
  • the content server sends to the viewer client scene overview data, which includes a scene graph and bounding, occlusion, and viewing volumes (box 716).
  • the viewer client selects and updates the viewpoint (box 718), such as if the user moves around.
  • the viewer client selects the segments (box 719) to be used and/or available for display.
  • the viewer client sends a request to the content server for segment streams (box 720).
  • the content server responds to the viewer client with the requested segments (box 724).
  • the content is rendered (box 726) and displayed (box 728) to the user.
  • FIG. 8 is a flowchart illustrating an example process for analyzing content segments to determine bounding volumes according to some embodiments.
  • FIG. 8 shows an example process that may be executed by a content server for extraction of scene overview data from content data.
  • content may be received or otherwise obtained (e.g. locally generated) by a content server or another server performing content analysis (box 802).
  • the received spatial data is segmented (box 804). Segments are isolated as individual data blocks and classified to infer segment types (box 806).
  • the content server may analyze spatial content received from a content provider and extract a scene overview.
  • Spatial data (such as identity of objects, location of objects, and spatial environment boundary locations) and content segments may be extracted from received content.
  • Content segmentation may be performed based on the spatial data extracted.
  • Content segments may be classified.
  • Content segmentation and content classification may be iterative processes that interface with extraction of spatial data.
  • Example processes that may be used for content segmentation and classification for some embodiments are described in Qi, Charles, et al., Volumetric and Multi-view CNNs for Object Classification on 3D Data, PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 5648-5656 (2016) and Qi, Charles, et al., PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, PROCEEDINGS OF THE 2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) 652-660, IEEE (2017).
  • Bounding volumes (box 808), occlusion volumes (box 810), and viewing volumes (box 812) may be calculated for segmented and classified content. Bounding, occlusion, and viewing volumes may be calculated based on spatial data extracted from received content. Outputs of the bounding, occlusion, and viewing volume calculation processes, as well as the segment classification process may be stored as scene overview data. Scene overview data may include bounding, occlusion, and viewing volumes for segmented content elements, and classifications of the content elements together with element locations in unified scene coordinates.
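  • The analysis pass of FIG. 8 could be organized as in the skeleton below, with the segmentation and classification stages stubbed out (a deployed server might use learned models such as those cited above). Only the data flow is meant to be indicative.

```python
def segment_spatial_data(points):
    """Placeholder segmentation (box 804): treat the whole cloud as one segment."""
    return [points]

def classify_segment(segment):
    """Placeholder classification (box 806); a real server would run a trained model."""
    return "unclassified"

def bounding_box(points):
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def analyze_content(points):
    """Produce scene overview entries: class plus bounding/occlusion/viewing volumes."""
    overview = []
    for seg in segment_spatial_data(points):
        overview.append({
            "class": classify_segment(seg),
            "bounding_volume": bounding_box(seg),   # box 808
            "occlusion_volumes": [],                # box 810 (omitted in this sketch)
            "viewing_volume": None,                 # box 812 (omitted in this sketch)
        })
    return overview

print(analyze_content([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (2.0, 1.0, 0.5)]))
```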
  • FIG. 9 is a flowchart illustrating an example process that may be executed by a content server for streaming spatial data content to a client for an example server push model according to some embodiments.
  • a content server may wait (box 902) to receive a content request from a client.
  • the content server may send (box 904) scene overview data to the client.
  • the scene overview data may be retrieved (or received for some embodiments) by the content server from a content provider.
  • the viewing client determines which content elements are visible within the navigation volume that it signals to the content server.
  • the content server may receive client navigation volume and element visibility data from the client (box 906).
  • the content server processes the spatial data and streams the spatial data to the viewing client (box 908).
  • the spatial data content may be processed according to the navigation volume.
  • For the duration of the spatial data content, or if a viewing client signals a continuation of a session, the content server continually streams (box 910) an up-to-date scene overview and processed spatial data to the viewing client. If an end to a session is requested, the process may determine if an end of processing is requested. If an end of processing is requested, the process may exit. Otherwise, the process may wait for a content request from a client.
  • [0109] Some embodiments adjust content distribution between a content server and a single client. Some embodiments of a content server may stream content to several clients (which may occur concurrently) and may split the streamed content into several streams to reduce the amount of redundant data sent to clients. If several clients share some parts of the content, those parts may be delivered as one stream, as sketched below. Portions of content that are used by fewer clients or a single client may be transmitted as an additional stream concurrently with a stream used by all clients (or more clients for some embodiments).
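  • A minimal sketch of that stream-splitting bookkeeping follows, assuming the server knows which segment identifiers each active client has requested; the data structures are illustrative only.

```python
from collections import defaultdict

def split_streams(requests):
    """requests: mapping client_id -> set of requested segment ids."""
    clients = set(requests)
    wanted_by = defaultdict(set)
    for client, segments in requests.items():
        for seg in segments:
            wanted_by[seg].add(client)
    shared = {seg for seg, who in wanted_by.items() if who == clients}
    per_client = {c: segs - shared for c, segs in requests.items()}
    return shared, per_client

shared, per_client = split_streams({"client_a": {1, 2, 3}, "client_b": {2, 3, 4}})
print(shared)      # {2, 3}: delivered once on a stream shared by both clients
print(per_client)  # {'client_a': {1}, 'client_b': {4}}: small per-client streams
```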
  • a content server may process the streamed spatial data based on per element visibility and a navigation volume signaled by the viewing client. For some embodiments of content processing, a content server first removes the elements (segments of data) that the viewing client has not requested or that the viewing client has indicated may be removed. For the remaining segments, the content server processes the data based on the navigation volume requested by the viewing client. For point cloud content and polygonal data viewed from user viewpoints for a navigation volume, per point and per vertex occlusion culling may be used to remove portions of the geometry that are not visible to the viewing client. In some embodiments, the removed portions of the geometry are not transmitted. For light field data, a reduced viewing volume may be used for cropping out parts of array images or hogel images (part of light field holograms) that are not displayed by the viewing client.
  • the content server waits for content requests from the viewing clients and transmits data accordingly. Viewing clients request a scene overview from the content server at the beginning of a session. Once the clients have initialized local execution of the scene, each client selects segment streams to be requested using local criteria. Each client requests individual segments from the content server, segment stream by segment stream. The server continuously waits for content requests and streams data based on requests until the content server is requested, via a signal or communication message, to terminate processing.
  • a content server may receive, from a client, a request for streaming content of a spatial scene.
  • the content server process may determine scene overview information (such as non-rendering descriptions and position information for a plurality of objects of the spatial scene).
  • the content server process may include determining spatial bounding volume information for the identified objects of the spatial scene.
  • the content server process may include determining occlusion volume information for at least one identified object of the spatial scene.
  • the content server process may include determining viewing volume information for at least one object of the plurality of objects of the spatial scene.
  • the content server process may include sending to the client the following items: scene overview information (such as non-rendering descriptions and position information for identified objects of the spatial scene), spatial bounding volume information for the identified objects, occlusion volume information for at least one identified object, and viewing volume information for at least one identified object.
  • the content server may receive from the client a rendering request indicating a set of visible objects of the spatial scene.
  • the content server may generate and send to the client rendering information for the set of visible objects.
  • Some embodiments of a content server process may include receiving a resolution adjustment request for at least one partially occluded object selected from the set of visible objects.
  • the content server process may adjust the resolution used in rendering information for corresponding partially occluded objects.
  • the resolution request may indicate a visibility percentage of the corresponding partially occluded object, wherein adjusting the resolution used in rendering information for the respective partially occluded object may be based on the visibility percentage of the corresponding partially occluded object.
  • At least one portion of the object occlusion sizing data (or object visibility data) (which may include spatial bounding volume information, occlusion volume information, and viewing volume information) may indicate changes with time. Some embodiments may divide identified objects into a plurality of sub-objects. For some embodiments of a content server process, spatial scene boundary information indicating spatial boundaries of a spatial scene may be sent to the client.
  • Some embodiments of a content server process may use a set of visible objects in which at least one of the visible objects is a partially visible object. At least one of the partially visible objects may include a plurality of sub-objects. For each partially visible object that includes a plurality of sub-objects, a respective set of visible sub-objects may include less than all of the plurality of sub-objects of the respective partially visible object. For each partially visible object that includes a plurality of sub-objects, the rendering information sent to the client may include rendering information only for the respective set of visible subobjects.
  • Some embodiments of a content server process may determine predicted object occlusion sizing data (which may include, for one or more objects of a spatial scene, predicted spatial bounding volume information, predicted occlusion volume information, and predicted viewing volume information) for a predicted user viewpoint at a future time t1.
  • the content server process may send to the client the predicted object occlusion sizing data.
  • a predicted user viewpoint at a future time t1 may be received from a client.
  • Some embodiments of the content server process may further include receiving user viewing position tracking information and determining the predicted user viewpoint relative to the spatial scene based in part on the user viewing position tracking information.
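  • As one way such a predicted user viewpoint for a future time t1 could be derived from viewing position tracking information, the sketch below extrapolates at constant velocity from the two most recent position samples; this is one plausible choice, not a requirement of the embodiments.

```python
def predict_viewpoint(history, t1):
    """history: list of (timestamp, (x, y, z)) samples, oldest first; needs >= 2 samples."""
    (t_a, p_a), (t_b, p_b) = history[-2], history[-1]
    dt = t_b - t_a
    if dt <= 0:
        return p_b
    velocity = tuple((b - a) / dt for a, b in zip(p_a, p_b))
    return tuple(p + v * (t1 - t_b) for p, v in zip(p_b, velocity))

samples = [(0.0, (0.0, 1.6, 0.0)), (0.1, (0.05, 1.6, 0.0))]
print(predict_viewpoint(samples, t1=0.5))  # ~(0.25, 1.6, 0.0)
```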
  • FIG. 10 is a flowchart illustrating an example process executed by a content server for streaming spatial data content to a client for an example client pull model according to some embodiments.
  • the content server waits to receive a content request from a client. If a content request is received, the content server determines the request type. If a new session type of content request is received, the content server retrieves scene overview data and sends the scene overview data to the client. If a segment stream type of content request is received, the content server retrieves spatial data and sends the requested segment stream to the client. For both new session and segment stream requests, the content server process determines if an end of processing request is received from a client. If no end of processing request is received, the content server process returns to waiting for a content request from a client. Otherwise, the content server process exits.
  • FIG. 11 is a flowchart illustrating an example process executed by a viewing client for retrieving and rendering visible scene elements for an example server push model according to some embodiments.
  • a viewing client may request content from a content server (box 1102).
  • a user may launch an application on a viewing client. The user may indicate the content to be viewed within the application, and a content request may be sent by the application.
  • Content may be a link to a scene overview stored on a content server.
  • the link to the content may be a uniform resource locator (URL) identifying a content server and specific content.
  • a viewing client application may be launched by an explicit command of the user or by the operating system automatically based on an identifying content type request and an application associated with the type of specific content.
  • a viewing client may be integrated with a web browser, a social media client, or an operating system for some embodiments.
  • a viewing client process may initialize sensor data and collect configuration data (box 1104).
  • Sensor data may include information related to the context of the spatial scene.
  • Configuration data, for example, may indicate the number of users, activities in which users are engaging, display device setup, and physical characteristics of the environment, including locations and poses of users.
  • For some embodiments, a viewing client process may execute a run-time process continually until an end of processing request is received.
  • a run-time process may receive scene overview data from a content server (box 1106).
  • the scene overview data may be used with sensor and configuration data to adjust the viewpoint to the content.
  • the viewpoint to content may be adjusted automatically and/or manually for some embodiments (box 1108).
  • Automatic viewpoint adjustment, for example, may set the viewpoint to the content based on the display setup (such as adjusting the viewpoint orientation and location depending on the display device orientation, such as tabletop or wall mounted) or the locations of the users.
  • content may be processed automatically based on user preferences or manually by the user. For example, user preferences may indicate a preference to display content that focuses on a specific content element type.
  • Scene overview data may include element classification information.
  • Element classification information may be used to adjust the viewpoint of content to focus on a content element.
  • a user’s navigation volume may be estimated (box 1110).
  • User preferences and the estimated navigation volume may be used to determine the visibility of elements (box 1112).
  • a viewing client process may automatically toggle visibility on or off for a particular element or a particular type of element.
  • a user, via a user interface, may indicate a preference to adjust a viewpoint to content to focus on a specific element content type based on a list of object (or element) classifications from scene overview data.
  • the navigation volume and element visibilities are sent by the client to the content server (box 1114), and a content stream is received from the content server (box 1116).
  • the viewpoint to the content may be set based on the content adjustment and content navigation controlled by the user input and potentially display device tracking in the case of HMDs (box 1118).
  • the viewing client determines the navigation volume.
  • the navigation volume may be calculated using the display device configuration, the number and placement of users, and the expected content update frequency.
  • the navigation volume may be a volume within the full viewing volume available for a scene, which may be indicated in scene overview data.
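  • A simple estimate of such a navigation volume is sketched below: a box around the user’s current position sized by an assumed maximum movement speed and the expected content update interval, then clamped to the scene’s full viewing volume. The specific figures are illustrative assumptions.

```python
def navigation_volume(position, max_speed, update_interval, viewing_volume):
    """All volumes are axis-aligned boxes given as (lo, hi) corner tuples."""
    reach = max_speed * update_interval
    lo = tuple(p - reach for p in position)
    hi = tuple(p + reach for p in position)
    vlo, vhi = viewing_volume
    lo = tuple(max(l, v) for l, v in zip(lo, vlo))   # clamp to the allowed volume
    hi = tuple(min(h, v) for h, v in zip(hi, vhi))
    return lo, hi

scene_viewing_volume = ((-5.0, 0.0, -5.0), (5.0, 2.5, 5.0))
print(navigation_volume((0.0, 1.6, 0.0), max_speed=1.5,
                        update_interval=0.5, viewing_volume=scene_viewing_volume))
```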
  • the viewing client may determine which scene (or content) elements are visible to one or more viewers. Visibility may be determined based on, for example, the viewing volume and the viewing environment layout.
  • Content processing, requesting, and receiving are implemented differently in server push and client pull embodiments.
  • the viewing client may communicate the navigation volume and the visibility of scene elements to the content server.
  • the client may predict which elements may be used in the near future (such as between the present time and a future time t1 ).
  • the client may communicate to a content server the elements predicted to be displayed in the near future.
  • the client may store information related to the predicted elements in a local memory or cache location. The client may use the same information used to determine the navigation volume to determine the predicted elements.
  • a viewing client may continually repeat a run-time process that includes: receiving scene overview data, adjusting a viewpoint to content, determining (or estimating) a navigation volume, determining (or testing) element visibility, sending navigation volume and element visibility data to the content server, and receiving and rendering the spatial data (1120).
  • the run-time process may be continually repeated until a content server indicates an end of the content or a user requests an end of a session.
  • the client may use the same information used for defining the navigation volume to predict which elements may be used in the near future and request to add them to a local cache. If the viewing client has determined the navigation volume and the visibility of scene elements, the viewing client signals this information to the content server. The viewing client receives content streamed from the content server, which may be processed on the server side based on the information the viewing client sent to the content server. If the client receives the content stream, the client updates the viewpoint to the content according to the latest user input and tracking result before rendering the content and sending the rendering data to the display.
  • an example viewing client process executed by a viewing client may include determining a set of one or more partially occluded objects selected from the set of visible objects; and requesting adjustment of resolution used in the rendering information for at least one of the partially occluded objects of the set of one or more of the partially occluded objects.
  • requesting adjustment of resolution used in rendering information may include: determining a visibility percentage of the respective partially occluded object; and requesting a decrease in the resolution used in the rendering information for the respective partially occluded object if the visibility percentage of the respective partially occluded object is less than a threshold.
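  • A sketch of that threshold rule follows. The corner-count proxy for the visibility percentage and the request payload are assumptions made for illustration, and `send` stands in for whatever client-to-server message an implementation actually uses.

```python
def visible_fraction(total_corner_count, occluded_corner_count):
    """Crude proxy for a visibility percentage; a real client might instead
    rasterize the bounding volume against the occlusion volumes."""
    return 1.0 - occluded_corner_count / total_corner_count

def maybe_request_lower_resolution(obj_id, fraction, threshold=0.5, send=print):
    if fraction < threshold:
        send({"object": obj_id, "action": "decrease_resolution",
              "visibility_percentage": round(100 * fraction)})

maybe_request_lower_resolution("statue_07", visible_fraction(8, 6))
# -> {'object': 'statue_07', 'action': 'decrease_resolution', 'visibility_percentage': 25}
```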
  • At least one of the following items may indicate a change with time: the spatial bounding volume information, the occlusion volume information, and the viewing volume information.
  • at least one of the objects of a spatial scene includes two or more sub-objects.
  • at least one of the visible objects may be a partially visible object.
  • At least one of the partially visible objects may include a plurality of sub-objects.
  • a respective set of visible sub-objects may include less than all of the plurality of sub-objects of the respective partially visible object.
  • the rendering information may include rendering information only for the respective set of visible sub-objects.
  • a viewing client process may include determining a predicted viewpoint of the user, wherein determining the set of visible objects may be based in part on the predicted viewpoint of the user.
  • a viewing client process may include determining, for a future time t1 , predicted spatial bounding volume information, predicted occlusion volume information, predicted viewing volume information, and a predicted viewpoint of the user. Determining the set of visible objects, for the future time t1 , may be based in part on at least one of the following items: the predicted spatial bounding volume information, the predicted occlusion volume information, the predicted viewing volume information, and the predicted viewpoint of the user.
  • Some embodiments of a viewing client process may include receiving user viewing position tracking information, wherein determining the viewpoint of the user relative to the spatial scene may be based in part on the user viewing position tracking information.
  • an example viewing client process executed by a viewing client may include receiving spatial scene boundary information indicating spatial boundaries of the spatial scene, wherein determining a set of visible objects may be based in part on the spatial scene boundary information indicating spatial boundaries of the spatial scene.
  • FIG. 12 is a flowchart illustrating an example process executed by a viewing client for retrieving and rendering visible scene elements for an example client pull model according to some embodiments.
  • the viewing client requests content from the content server (box 1202).
  • the viewing client initializes sensor data and collects configuration data (box 1204).
  • Scene overview data is received by the viewing client (box 1206).
  • the client continually adjusts the viewpoint to the data based on the tracking, user input, and scene overview (box 1208).
  • the viewing client may predict future viewpoint motion and use future viewpoint predictions to adjust location and size of the volume for which content navigation (navigation volume) may be enabled (box 1210).
  • the viewing client adjusts scene element visibilities (box 1212) as described above. If the scene adjustment has been done based on local criteria, the client may process the content to be streamed by inspecting the segment visibilities based on the per segment bounding, occlusion, and viewing volumes (box 1214). Content processing, for some embodiments, may use occlusion culling. For occlusion culling, each segment bounding volume’s visibility to the available and/or used navigation volume is evaluated by testing if the segment bounding volume is completely occluded by occlusion volumes of other segments and by determining if a per segment viewing volume has overlap with the current navigation volume.
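  • The per-segment culling of box 1214 might be organized as below, assuming (lo, hi) axis-aligned boxes for the per-segment viewing volumes and the navigation volume. The full-occlusion test is stubbed so the sketch stays self-contained; a corner-based test like the one sketched earlier could be plugged in.

```python
def boxes_overlap(a, b):
    """Axis-aligned overlap test for boxes given as (lo, hi) corner tuples."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))

def fully_occluded_from_volume(segment, navigation_volume):
    # Placeholder: plug in a test of the segment's bounding volume against other
    # segments' occlusion volumes for viewpoints within the navigation volume.
    return False

def select_segments(segments, navigation_volume):
    chosen = []
    for seg in segments:
        if not boxes_overlap(seg["viewing_volume"], navigation_volume):
            continue                  # cannot be seen from anywhere the user may move
        if fully_occluded_from_volume(seg, navigation_volume):
            continue                  # hidden behind other segments
        chosen.append(seg["id"])
    return chosen

segments = [{"id": "wall", "viewing_volume": ((-10, 0, -10), (10, 3, 10))},
            {"id": "cellar", "viewing_volume": ((-2, -3, -2), (2, -0.5, 2))}]
print(select_segments(segments, ((-1, 0, -1), (1, 2, 1))))  # ['wall']
```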
  • a viewing client requests visible streams (box 1216) from the content server. If the client receives the content stream, the client updates the viewpoint to the content according to the latest user input and tracking result (box 1218) before rendering the content (box 1220) and sending the rendering data to the display. If a request to end processing is received, the viewing client process exits. Otherwise, the viewing client process repeats with the receiving of scene overview data.
  • [0131] In an example process for determining visible objects and displaying visible objects according to some embodiments, a viewing client may receive non-rendering descriptions and position information for a plurality of objects of a spatial scene.
  • Non-rendering descriptions and position information may include a name and location data for each object or element of a spatial scene.
  • a viewing client may receive, from a content server, spatial bounding information, occlusion volume information, and viewing volume information for one or more of the objects of the spatial scene.
  • the example viewing client process may include determining a viewpoint of a user relative to the spatial scene.
  • the viewing client process may determine a set of visible objects for the spatial scene based on at least one of the spatial bounding volume information, the occlusion volume information, the viewing volume information, and the viewpoint of the user.
  • a rendering request may be sent to the content server.
  • the rendering request may include an indication of the set of visible objects.
  • the viewing client may receive rendering information describing the set of visible objects, and the viewing client may use the rendering information to display the set of visible objects from the viewpoint of the user.
  • an apparatus may include a processor and a non-transitory computer- readable medium storing instructions that are operative, when executed by the processor, to perform a method disclosed herein.
  • a viewing client may receive object occlusion sizing data (which may include one or more of the following items: spatial bounding information, occlusion volume information, and viewing volume information) for one or more objects of a spatial scene.
  • the viewing client may determine a set of visible objects based on the object occlusion sizing data and the viewpoint of the user.
  • the viewing client may retrieve rendering information for each of the visible objects (or partially visible objects).
  • the viewing client may display content for the visible objects using the rendering information.
  • a client device performs a method comprising: receiving non-rendering descriptions and position information for a plurality of objects of a spatial scene; receiving spatial bounding volume information for the plurality of objects of the spatial scene; receiving occlusion volume information for at least one object of the plurality of objects of the spatial scene; receiving viewing volume information for at least one object of the plurality of objects of the spatial scene; determining a viewpoint of a user relative to the spatial scene; determining a set of visible objects selected from the plurality of objects based on at least one of the spatial bounding volume information, the occlusion volume information, the viewing volume information, and the viewpoint of the user; sending rendering request information indicating the set of visible objects; receiving rendering information describing the set of visible objects; and displaying the set of visible objects from the viewpoint of the user.
  • the method further includes: determining a set of one or more partially occluded objects selected from the set of visible objects; and requesting adjustment of resolution used in the rendering information for at least one of the partially occluded objects of the set of one or more of the partially occluded objects.
  • Requesting adjustment of resolution used in the rendering information may include: determining a visibility percentage of the respective partially occluded object; and requesting a decrease in the resolution used in the rendering information for the respective partially occluded object if the visibility percentage of the respective partially occluded object is less than a threshold.
  • At least one of the spatial bounding volume information, the occlusion volume information, and the viewing volume information indicates changes with time.
  • at least one of the plurality of objects comprises a plurality of sub-objects.
  • At least one of the visible objects is a partially visible object; at least one of the partially visible objects comprises a plurality of sub-objects; for each partially visible object comprising a plurality of sub-objects, a respective set of visible sub-objects comprises less than all of the plurality of subobjects of the respective partially visible object, and; for each partially visible object comprising a plurality of sub-objects, the rendering information comprises rendering information only for the respective set of visible sub-objects.
  • the method further includes determining a predicted viewpoint of the user, wherein determining the set of visible objects is further based on the predicted viewpoint of the user.
  • the method further includes: determining a predicted spatial bounding volume information at a future time t1 ; determining a predicted occlusion volume information at the future time t1 ; determining a predicted viewing volume information at the future time t1 ; determining a predicted viewpoint of the user at the future time t1 , wherein determining the set of visible objects is further based on, for the future time t1 , at least one of the predicted spatial bounding volume information, the predicted occlusion volume information, the predicted viewing volume information, and the predicted viewpoint of the user.
  • the method further includes receiving user viewing position tracking information, wherein determining the viewpoint of the user relative to the spatial scene is based in part on the user viewing position tracking information.
  • the method further includes receiving spatial scene boundary information indicating spatial boundaries of the spatial scene, wherein determining the set of visible objects is further based on the spatial scene boundary information indicating spatial boundaries of the spatial scene.
  • a client device performs a method comprising: receiving spatial bounding volume information for a plurality of objects of a spatial scene; receiving occlusion volume information for at least one object of the plurality of objects of the spatial scene; receiving viewing volume information for at least one object of the plurality of objects of the spatial scene; determining a set of visible objects selected from the plurality of objects based on at least one of the spatial bounding volume information, the occlusion volume information, the viewing volume information, and the viewpoint of the user relative to the spatial scene; retrieving rendering information for the set of visible objects; and displaying the set of visible objects using the rendering information.
  • the method further includes determining a set of one or more partially occluded objects selected from the set of visible objects; and requesting adjustment of resolution used in the rendering information for at least one of the partially occluded objects of the set of one or more of the partially occluded objects.
  • requesting adjustment of resolution used in the rendering information comprises: determining a visibility percentage of the respective partially occluded object; and requesting a decrease in the resolution used in the rendering information for the respective partially occluded object if the visibility percentage of the respective partially occluded object is less than a threshold.
  • At least one of the spatial bounding volume information, the occlusion volume information, and the viewing volume information indicates changes with time.
  • at least one of the plurality of objects comprises a plurality of sub-objects.
  • At least one of the visible objects is a partially visible object; at least one of the partially visible objects comprises a plurality of sub-objects; for each partially visible object comprising a plurality of sub-objects, a respective set of visible sub-objects comprises less than all of the plurality of subobjects of the respective partially visible object; and for each partially visible object comprising a plurality of sub-objects, the rendering information comprises rendering information only for the respective set of visible sub-objects.
  • determining the set of visible objects is further based on a predicted viewpoint of the user.
  • determining the set of visible objects is further based on, for a future time, at least one of a predicted spatial bounding volume information, a predicted occlusion volume information, a predicted viewing volume information, and a predicted viewpoint of the user.
  • the method further includes receiving user viewing position tracking information, wherein determining the viewpoint of the user relative to the spatial scene is based in part on the user viewing position tracking information.
  • [0149] In some embodiments, the method further includes receiving spatial scene boundary information indicating spatial boundaries of the spatial scene, wherein determining the set of visible objects is further based on the spatial scene boundary information indicating spatial boundaries of the spatial scene.
  • an apparatus comprises a processor configured to perform any of the methods described above.
  • the processor is configured to perform such methods by providing a computer-readable medium (e.g. a non-transitory computer-readable medium) storing instructions that are operative, when executed by the processor, to perform such methods.
  • a content server performs a method comprising: receiving, from a client, a request for streaming content of a spatial scene; determining non-rendering descriptions and position information for a plurality of objects of the spatial scene; determining spatial bounding volume information for the plurality of objects of the spatial scene; determining occlusion volume information for at least one object of the plurality of objects of the spatial scene; determining viewing volume information for at least one object of the plurality of objects of the spatial scene; sending, to the client, non-rendering descriptions and position information for a plurality of objects of the spatial scene; sending, to the client, spatial bounding volume information for the plurality of objects of the spatial scene; sending, to the client, occlusion volume information for at least one object of the plurality of objects of the spatial scene; sending, to the client, viewing volume information for at least one object of the plurality of objects of the spatial scene; receiving, from the client, rendering request information indicating a set of visible objects selected from the plurality of objects; generating rendering information describing the set of visible objects; and sending, to the client, the rendering information describing the set of visible objects.
  • the method includes receiving a resolution adjustment request for at least one partially occluded object selected from the set of visible objects; and adjusting a resolution used in the rendering information for at least one respective partially occluded object.
  • In some embodiments, the resolution adjustment request indicates a visibility percentage of the respective partially occluded object, and adjusting the resolution used in the rendering information for the respective partially occluded object is based on that visibility percentage.
  • At least one of the spatial bounding volume information, the occlusion volume information, and the viewing volume information indicates changes with time.
  • at least one of the plurality of objects comprises a plurality of sub-objects.
  • At least one of the visible objects is a partially visible object; at least one of the partially visible objects comprises a plurality of sub-objects; for each partially visible object comprising a plurality of sub-objects, a respective set of visible sub-objects comprises less than all of the plurality of subobjects of the respective partially visible object; and for each partially visible object comprising a plurality of sub-objects, the rendering information comprises rendering information only for the respective set of visible sub-objects.
  • the method further includes: determining a predicted spatial bounding volume information for the plurality of objects of the spatial scene for a predicted user viewpoint at the future time t1 ; determining a predicted occlusion volume information for at least one object of the plurality of objects of the spatial scene for the predicted user viewpoint at the future time t1 ; determining a predicted viewing volume information for at least one object of the plurality of objects of the spatial scene for the predicted user viewpoint at the future time t1 ; sending, to the client, the predicted spatial bounding volume information for the plurality of objects of the spatial scene at the future time t1 ; sending, to the client, the predicted occlusion volume information for at least one object of the plurality of objects of the spatial scene at the future time t1 ; and sending, to the client, the predicted viewing volume information for at least one object of the plurality of objects of the spatial scene at the future time t1.
  • the method further includes receiving, from the client, the predicted user viewpoint at a future time t1.
  • the method further includes receiving user viewing position tracking information and determining the predicted user viewpoint relative to the spatial scene based in part on the user viewing position tracking information.
  • the method further includes sending, to the client, spatial scene boundary information indicating spatial boundaries of the spatial scene.
  • a method includes: receiving object occlusion sizing data for a plurality of objects of a spatial scene; determining a set of visible objects selected from the plurality of objects based on the object occlusion sizing data and the viewpoint of the user; retrieving rendering information for the set of visible objects; and displaying the set of visible objects from a viewpoint of the user using the rendering information, wherein the object occlusion sizing data comprises at least one of the spatial bounding volume information, the occlusion volume information, and the viewing volume information.
  • a method performed at a client device includes: for each of a plurality of virtual objects, receiving (i) information defining a bounding volume that encloses the respective object and (ii) information defining an occlusion volume enclosed within the respective object; for each of the objects, determining whether the object is visible from at least one viewpoint, where an object is determined to be visible if the bounding volume of the object is not obscured from the viewpoint by the occlusion volumes of the other objects; and, in response to the determination, retrieving rendering information only for the objects that are determined to be visible.
  • an object is determined to be visible if the bounding volume of the object is not entirely obscured from the viewpoint by the occlusion volumes of the other objects. In some embodiments, an object is determined to be visible if no more than a threshold amount of the bounding volume of the object is obscured from the viewpoint by the occlusion volumes of the other objects. In some embodiments, an object is determined to be visible if there is at least one viewpoint in a plurality of selected viewpoints at which the bounding volume of the object is not obscured by the occlusion volumes of the other objects. The selected viewpoints may be viewpoints within a defined navigation volume.
  • a method includes: for each of a plurality of virtual objects, determining (i) a bounding volume that encloses the respective object and (ii) an occlusion volume enclosed within the respective object; for each of the objects, determining whether the object is visible from at least one viewpoint, where an object is determined to be visible if the bounding volume of the object is not obscured from the viewpoint by the occlusion volumes of the other objects; and, in response to the determination, rendering only the objects that are determined to be visible.
  • modules include hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more application-specific integrated circuits (ASICs), one or more field programmable gate arrays (FPGAs), one or more memory devices) deemed suitable by those of skill in the relevant art for a given implementation.
  • Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the respective module, and it is noted that those instructions could take the form of or include hardware (i.e., hardwired) instructions, firmware instructions, software instructions, and/or the like, and may be stored in any suitable non-transitory computer-readable medium or media, such as those commonly referred to as RAM, ROM, etc.
  • Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs).
  • a processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

In an example embodiment, a client device is provided for rendering virtual objects. For at least a first virtual object in a scene, the device receives information defining a first bounding volume that substantially encloses the first virtual object. The client device also receives information defining at least one occlusion volume in the scene. For at least a first viewpoint relative to the scene, the client device determines whether the first bounding volume is fully occluded by one or more of the occlusion volumes. In response to a determination that the first bounding volume is not fully occluded, the client device retrieves first object rendering data for the first virtual object. The client device may then render the scene using the rendering data.

Description

SYSTEM AND METHOD FOR OPTIMIZING SPATIAL CONTENT DISTRIBUTION USING MULTIPLE
DATA SYSTEMS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application is a non-provisional filing of, and claims benefit under 35 U.S.C. § 119(e) from, U.S. Provisional Patent Application Serial No. 62/786,219, entitled “System and Method for Optimizing Spatial Content Distribution Using Multiple Data Systems,” filed December 28, 2018, which is hereby incorporated by reference in its entirety.
BACKGROUND
[0002] Experiences based on spatial data may be viewed in varying settings. The physical environment, amount and context of users, and the display device setup may change from session to session. Also, the content itself may have multiple formats, such as fully synthetic polygonal real-time data, point clouds and light-fields, which may impact how content may be viewed. Communication systems used for transmitting content to a viewing device may be bandwidth constrained.
SUMMARY
[0003] In some embodiments, a method (which may be performed by a client device) includes receiving, for at least a first virtual object in a scene, information defining a first bounding volume that substantially encloses the first virtual object. Information is also received that defines at least one occlusion volume in the scene. For at least a first viewpoint relative to the scene, a determination is made of an amount by which the first bounding volume is occluded by one or more of the occlusion volumes. A determination of whether to retrieve first object rendering data for the first virtual object is made based on the amount by which the first bounding volume is occluded.
[0004] In some embodiments, a method (which may be performed by a client device) includes receiving, for at least a first virtual object in a scene, information defining a first bounding volume that substantially encloses the first virtual object. Information is also received that defines at least one occlusion volume in the scene. For at least a first viewpoint relative to the scene, a determination is made of whether the first bounding volume is fully occluded by one or more of the occlusion volumes. In response to a determination that the first bounding volume is not fully occluded, first object rendering data may be retrieved for the first virtual object.
[0005] Some embodiments of an example method further include determining an amount by which the first virtual object is occluded by the at least one occlusion volume, where a resolution level of the retrieved first object rendering data is determined based on the amount by which the bounding volume is occluded.
[0006] Some embodiments further include, for at least a second virtual object in a scene, receiving information defining a second bounding volume that substantially encloses the second virtual object. A determination is made of whether the second bounding volume is fully occluded by at least one of the occlusion volumes. In response to a determination that the second bounding volume is fully occluded by at least one of the occlusion volumes, a determination may be made not to retrieve object rendering data for the second virtual object.
[0007] In some embodiments, at least one of the occlusion volumes is enclosed within a third virtual object in the scene.
[0008] In some embodiments, a data size of the object rendering data for the first virtual object is greater than a data size of the information defining the bounding volume of the first virtual object. In some embodiments, the first object rendering data includes a first number of vertex points, and the information defining the first bounding volume comprises a second number of vertex points, the second number being less than the first number. In some embodiments, the first object rendering data includes a first number of polygons, and the information defining the first bounding volume comprises a second number of polygons, the second number being less than the first number.
[0009] In some embodiments, the first viewpoint is within a defined navigation volume, and the method includes, for a plurality of viewpoints within the defined navigation volume, determining whether the first bounding volume is fully occluded by at least one of the occlusion volumes. The first object rendering data may then be retrieved in response to a determination that the first bounding volume is not fully occluded for at least one of the viewpoints in the defined navigation volume.
[0010] The first viewpoint may correspond in some embodiments to a virtual position of a user with respect to the scene or to a predicted future position of a user with respect to the scene.
[0011] In some embodiments, the method further includes rendering a view of the scene based at least on the first object rendering data. The method may further include generating a signal representing the rendered view. Such a signal may be transmitted to a display device or display component. The method may further include displaying the rendered view.
[0012] In some embodiments, an apparatus is provided with a processor configured to perform any of the methods described herein. The processor may be configured using instructions for performing the methods. The apparatus may include a computer-readable medium (e.g. a non-transitory computer-readable medium) storing the instructions.
[0013] Spatial data content distribution uses a large amount of bandwidth, which in many cases may be a bottleneck for content delivery. Viewing conditions varying from session to session, such as the number and context of viewers, display device configuration, user preferences, and content navigation may cause at least a portion of the spatial content to not be visible for a viewer. Content distribution may be improved by not delivering visual data that is not visible by a viewer.
[0014] A content server may improve content distribution by providing clients with spatial content overview information in addition to spatial data used by the viewing client for immersive rendering. A content server may produce content overview information and may analyze the content by separating the content into individual segments, for which classification, viewing volume, and bounding volumes may be determined.
[0015] Using the overview information, a viewing client may align the content, may determine a predicted viewing volume for the viewing client, and may determine which objects are visible and invisible for the viewing volume and may be omitted. A viewing client may communicate information indicating visible objects to a content server. This communication may be a continual process. A content server may use the visible object information to improve spatial data distribution by reducing delivered content data and by dividing distribution to several streams to reduce the amount of data received by each client.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] FIG. 1A is a system diagram of an example system illustrating an example communications system according to some embodiments.
[0017] FIG. 1B is a system diagram of an example system illustrating an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 1A according to some embodiments.
[0018] FIG. 2 is a system diagram illustrating an example set of interfaces for a content provider, a content server, and a viewing client according to some embodiments.
[0019] FIGs. 3A-3E are schematic illustrations of bounding volumes and occlusion volumes for virtual objects in a scene.
[0020] FIGs. 4A-4C are schematic illustrations of occlusion detection for a virtual scene using bounding volumes, occlusion volumes, and navigation volumes.

[0021] FIG. 5 is a message sequencing diagram illustrating an example process for determining visible objects and rendering visible objects according to some embodiments.
[0022] FIG. 6 is a message sequencing diagram illustrating an example process for determining visibility of scene elements based on session characteristics and user preferences for an example server push model according to some embodiments.
[0023] FIG. 7 is a message sequencing diagram illustrating an example process for determining visibility of scene elements based on session characteristics and user preferences for an example client pull model according to some embodiments.
[0024] FIG. 8 is a flowchart illustrating an example process for analyzing content segments to determine bounding volumes according to some embodiments.
[0025] FIG. 9 is a flowchart illustrating an example process that may be executed by a content server for streaming spatial data content to a client for an example server push model according to some embodiments.
[0026] FIG. 10 is a flowchart illustrating an example process executed by a content server for streaming spatial data content to a client for an example client pull model according to some embodiments.
[0027] FIG. 11 is a flowchart illustrating an example process executed by a viewing client for retrieving and rendering visible scene elements for an example server push model according to some embodiments.
[0028] FIG. 12 is a flowchart illustrating an example process executed by a viewing client for retrieving and rendering visible scene elements for an example client pull model according to some embodiments.
[0029] The entities, connections, arrangements, and the like that are depicted in, and described in connection with, the various figures are presented by way of example and not by way of limitation. As such, any and all statements or other indications as to what a particular figure “depicts,” what a particular element or entity in a particular figure “is” or “has,” and any and all similar statements, which may in isolation and out of context be read as absolute and therefore limiting, may only properly be read as being constructively preceded by a clause such as “In at least one embodiment, ....”
EXAMPLE NETWORKS FOR IMPLEMENTATION OF THE EMBODIMENTS
[0030] A wireless transmit/receive unit (WTRU) may be used as a client, a viewing device, or a viewer client in embodiments described herein.
[0031] FIG. 1A is a diagram illustrating an example communications system 100 in which one or more disclosed embodiments may be implemented. The communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users. The communications system 100 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth. For example, the communications systems 100 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), zero-tail unique-word DFT-Spread OFDM (ZT UW DTS-s OFDM), unique word OFDM (UW-OFDM), resource block-filtered OFDM, filter bank multicarrier (FBMC), and the like.
[0032] As shown in FIG. 1A, the communications system 100 may include wireless transmit/receive units (WTRUs) 102a, 102b, 102c, 102d, a RAN 104/113, a CN 106/115, a public switched telephone network (PSTN) 108, the Internet 110, and other networks 112, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements. Each of the WTRUs 102a, 102b, 102c, 102d may be any type of device configured to operate and/or communicate in a wireless environment. By way of example, the WTRUs 102a, 102b, 102c, 102d, any of which may be referred to as a “station” and/or a “STA”, may be configured to transmit and/or receive wireless signals and may include a user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a subscription-based unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, a hotspot or Mi-Fi device, an Internet of Things (IoT) device, a watch or other wearable, a head-mounted display (HMD), a vehicle, a drone, a medical device and applications (e.g., remote surgery), an industrial device and applications (e.g., a robot and/or other wireless devices operating in industrial and/or automated processing chain contexts), a consumer electronics device, a device operating on commercial and/or industrial wireless networks, and the like. Any of the WTRUs 102a, 102b, 102c and 102d may be interchangeably referred to as a UE.
[0033] The communications system 100 may also include a base station 114a and/or a base station 114b. Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the CN 106/115, the Internet 110, and/or the other networks 112. By way of example, the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a gNB, a NR NodeB, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.
[0034] The base station 114a may be part of the RAN 104/113, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc. The base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals on one or more carrier frequencies, which may be referred to as a cell (not shown). These frequencies may be in licensed spectrum, unlicensed spectrum, or a combination of licensed and unlicensed spectrum. A cell may provide coverage for a wireless service to a specific geographical area that may be relatively fixed or that may change over time. The cell may further be divided into cell sectors. For example, the cell associated with the base station 114a may be divided into three sectors. Thus, in one embodiment, the base station 114a may include three transceivers, i.e., one for each sector of the cell. In an embodiment, the base station 114a may employ multiple-input multiple-output (MIMO) technology and may utilize multiple transceivers for each sector of the cell. For example, beamforming may be used to transmit and/or receive signals in desired spatial directions.
[0035] The base stations 114a, 114b may communicate with one or more of the WTRUs 102a, 102b, 102c, 102d over an air interface 116, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, centimeter wave, micrometer wave, infrared (IR), ultraviolet (UV), visible light, etc.). The air interface 116 may be established using any suitable radio access technology (RAT).
[0036] More specifically, as noted above, the communications system 100 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 114a in the RAN 104/113 and the WTRUs 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink (DL) Packet Access (HSDPA) and/or High-Speed UL Packet Access (HSUPA).
[0037] In an embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 116 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A) and/or LTE-Advanced Pro (LTE-A Pro).
[0038] In an embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as NR Radio Access, which may establish the air interface 116 using New Radio (NR).
[0039] In an embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement multiple radio access technologies. For example, the base station 114a and the WTRUs 102a, 102b, 102c may implement LTE radio access and NR radio access together, for instance using dual connectivity (DC) principles. Thus, the air interface utilized by WTRUs 102a, 102b, 102c may be characterized by multiple types of radio access technologies and/or transmissions sent to/from multiple types of base stations (e.g., an eNB and a gNB).
[0040] In other embodiments, the base station 114a and the WTRUs 102a, 102b, 102c may implement radio technologies such as IEEE 802.11 (i.e., Wireless Fidelity (WiFi)), IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
[0041] The base station 114b in FIG. 1A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, an industrial facility, an air corridor (e.g., for use by drones), a roadway, and the like. In one embodiment, the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN). In an embodiment, the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN). In yet another embodiment, the base station 114b and the WTRUs 102c, 102d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, LTE-A Pro, NR etc.) to establish a picocell or femtocell. As shown in FIG. 1A, the base station 114b may have a direct connection to the Internet 110. Thus, the base station 114b may not be required to access the Internet 110 via the CN 106/115.
[0042] The RAN 104/113 may be in communication with the CN 106/115, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 102a, 102b, 102c, 102d. The data may have varying quality of service (QoS) requirements, such as differing throughput requirements, latency requirements, error tolerance requirements, reliability requirements, data throughput requirements, mobility requirements, and the like. The CN 106/115 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication. Although not shown in FIG. 1A, it will be appreciated that the RAN 104/113 and/or the CN 106/115 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 104/113 or a different RAT. For example, in addition to being connected to the RAN 104/113, which may be utilizing a NR radio technology, the CN 106/115 may also be in communication with another RAN (not shown) employing a GSM, UMTS, CDMA 2000, WiMAX, E-UTRA, or WiFi radio technology.
[0043] The CN 106/115 may also serve as a gateway for the WTRUs 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or the other networks 112. The PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS). The Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and/or the internet protocol (IP) in the TCP/IP internet protocol suite. The networks 112 may include wired and/or wireless communications networks owned and/or operated by other service providers. For example, the networks 112 may include another CN connected to one or more RANs, which may employ the same RAT as the RAN 104/113 or a different RAT.
[0044] Some or all of the WTRUs 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities (e.g., the WTRUs 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wireless networks over different wireless links). For example, the WTRU 102c shown in FIG. 1A may be configured to communicate with the base station 114a, which may employ a cellular-based radio technology, and with the base station 114b, which may employ an IEEE 802 radio technology.
[0045] FIG. 1B is a system diagram illustrating an example WTRU 102. As shown in FIG. 1B, the WTRU 102 may include a processor 118, a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, non-removable memory 130, removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and/or other peripherals 138, among others. It will be appreciated that the WTRU 102 may include any sub-combination of the foregoing elements while remaining consistent with an embodiment.
[0046] The processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment. The processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 1B depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.
[0047] The transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 116. For example, in one embodiment, the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals. In an embodiment, the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. In yet another embodiment, the transmit/receive element 122 may be configured to transmit and/or receive both RF and light signals. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wireless signals.
[0048] Although the transmit/receive element 122 is depicted in FIG. 1B as a single element, the WTRU 102 may include any number of transmit/receive elements 122. More specifically, the WTRU 102 may employ MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 116.
[0049] The transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122. As noted above, the WTRU 102 may have multi-mode capabilities. Thus, the transceiver 120 may include multiple transceivers for enabling the WTRU 102 to communicate via multiple RATs, such as NR and IEEE 802.11, for example.
[0050] The processor 118 of the WTRU 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128. In addition, the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132. The non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 118 may access information from, and store data in, memory that is not physically located on the WTRU 102, such as on a server or a home computer (not shown).
[0051] The processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the WTRU 102. The power source 134 may be any suitable device for powering the WTRU 102. For example, the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
[0052] The processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 102. In addition to, or in lieu of, the information from the GPS chipset 136, the WTRU 102 may receive location information over the air interface 116 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
[0053] The processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands-free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, a Virtual Reality and/or Augmented Reality (VR/AR) device, an activity tracker, and the like. The peripherals 138 may include one or more sensors; the sensors may be one or more of a gyroscope, an accelerometer, a Hall effect sensor, a magnetometer, an orientation sensor, a proximity sensor, a temperature sensor, a time sensor, a geolocation sensor, an altimeter, a light sensor, a touch sensor, a magnetometer, a barometer, a gesture sensor, a biometric sensor, and/or a humidity sensor.
[0054] The WTRU 102 may include a full duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for both the UL (e.g., for transmission) and downlink (e.g., for reception)) may be concurrent and/or simultaneous. The full duplex radio may include an interference management unit to reduce and/or substantially eliminate self-interference via either hardware (e.g., a choke) or signal processing via a processor (e.g., a separate processor (not shown) or via processor 118). In an embodiment, the WTRU 102 may include a half-duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for either the UL (e.g., for transmission) or the downlink (e.g., for reception)).
[0055] In view of Figures 1A-1B, and the corresponding description of Figures 1A-1B, one or more, or all, of the functions described herein with regard to one or more of: WTRU 102a-d, Base Station 114a-b, and/or any other device(s) described herein, may be performed by one or more emulation devices (not shown). The emulation devices may be one or more devices configured to emulate one or more, or all, of the functions described herein. For example, the emulation devices may be used to test other devices and/or to simulate network and/or WTRU functions.
[0056] The emulation devices may be designed to implement one or more tests of other devices in a lab environment and/or in an operator network environment. For example, the one or more emulation devices may perform the one or more, or all, functions while being fully or partially implemented and/or deployed as part of a wired and/or wireless communication network in order to test other devices within the communication network. The one or more emulation devices may perform the one or more, or all, functions while being temporarily implemented/deployed as part of a wired and/or wireless communication network. The emulation device may be directly coupled to another device for purposes of testing and/or may perform testing using over-the-air wireless communications.
[0057] The one or more emulation devices may perform the one or more, including all, functions while not being implemented/deployed as part of a wired and/or wireless communication network. For example, the emulation devices may be utilized in a testing scenario in a testing laboratory and/or a non-deployed (e.g., testing) wired and/or wireless communication network in order to implement testing of one or more components. The one or more emulation devices may be test equipment. Direct RF coupling and/or wireless communications via RF circuitry (e.g., which may include one or more antennas) may be used by the emulation devices to transmit and/or receive data.
DETAILED DESCRIPTION
[0058] Visually rich spatial data may contain elements in multiple data formats, which may affect how freely the viewpoint to the scene may be adjusted. Also, characteristics of the viewing session, such as the physical environment, the number and context of users, and the display device setup, may affect how much of the full spatial content is used by the viewing client and how the spatial content is adjusted to provide a high-quality experience for all viewers.
[0059] Content distribution may be adjusted to enable delivery of complex spatial data with high quality. Redundant (or unused) content data that is not contributing to the viewed content may not be transmitted between a content server and a viewing client. Such a process may use scene, content, and viewing data to determine which elements are contributing to (or changing) a user’s viewpoint. Processing and contextual data may be communicated between a content server and a viewing client.
[0060] Recent developments in machine learning approaches applied to spatial data analysis have enabled adjustment of content delivery for different contexts. Deep neural networks, especially convolutional neural networks, have seen rapid adoption in a wide range of image analysis and computer vision tasks. Methods developed for analyzing 2D image data may be adapted for the processing of 3D spatial data.
[0061] Environments that may be bandwidth constrained include wireless Wi-Fi and cellular 5G networks. Spatial data content distribution uses a large amount of bandwidth, which in many cases may be a severe bottleneck for content delivery. Content distribution may be improved by not delivering visual data that is not visible to the viewer. A client navigating an unstructured point cloud may receive data delivered over a bandwidth-constrained network. A three-degrees-of-freedom-plus (3DoF+) or six-degrees-of-freedom (6DoF) application for a VR, AR, or MR environment may have content delivery bandwidth limitations.
Overview
[0062] FIG. 2 is a system diagram illustrating an example set of interfaces for a content provider, a content server, and a viewing client according to some embodiments. FIG. 2 shows an example embodiment for allocation of data and analysis for a system. A content provider may be a source feed of content, such as a live camera, a previously recorded video, or an AR/VR/MR content source. A content server may include storage locations for spatial data and scene overview. Content received from a content provider may be analyzed by a content analysis process to determine objects from spatial data and to store scene overview information. Spatial data and scene overview information may be sent to a viewing client for analysis of visible objects and rendering of objects for a user. Operations of each of these elements are described in more detail below.
[0063] Not transmitting visual data for objects not visible may be one method to improve content delivery efficiency. A content server and a viewing client may exchange information about content characteristics and session characteristics to improve content delivery and to transmit spatial data only for visible objects for some embodiments.
[0064] Some embodiments of a system use a content server to analyze the spatial data to extract overview information for a spatial scene. The overview information may be sent to a viewing client. The viewing client may combine content layout information with viewing session information to determine which elements of a scene are visible for a navigation volume. Element visibility information and navigation volume information may be communicated by a viewing client to a content server to improve efficiency of content distribution.
[0065] Information about element visibility may be used by a content server to toggle on and off the processing of content segments. The navigation volume predicted to be used by a viewing client may be used in determining which segment content data may be removed. For example, the navigation volume predicted to be used may be smaller than the full viewing volume available (or enabled) by spatial content data.
[0066] Systems and methods disclosed herein may be implemented using client pull and server push models. In a client pull model, a content server isolates individual scene elements so that the client selectively chooses which elements to receive based on local per-session criteria. In a server push model, a content server divides full spatial data into individual streams based on client signaling indicating how a client is using scene data, splitting the full stream into sub-streams if there are multiple clients streaming the data concurrently with varying per-scene-element uses.
[0067] An example process that may be executed by a content server may include receiving spatial data for a spatial scene. Scene overview data may be generated for a spatial scene. Pre-processing may be performed to generate scene overview data if the spatial data is available before a user requests the content. Scene overview data may be generated at run-time if the spatial data is a live stream. Analysis of spatial data may include: segmenting and identifying elements within the spatial data, determining bounding volumes for segments, determining occlusion volumes for segments, and determining a viewing volume from which a processed segment may be viewed. In a client pull model, analysis of spatial data includes isolating spatial data into individual streams that may be selectively requested by clients. Upon receiving a content request from a viewing client, a content server may send scene overview data to the viewing client, which may include a hierarchically structured scene graph describing elements, locations, and element types, as well as bounding volumes, occlusion volumes, and viewing volumes for each element. In a server push model, content distribution includes a viewing client sending element visibility and navigation volume information to the content server, and the content server may adjust content data to be sent to the viewing client based on element visibility and navigation volume information. The adjusted content data may be streamed to the viewing client as requested by the viewing client. In a client pull model, content distribution includes streaming segmented spatial data elements to the clients based on client requests. The content server process may repeat continually (starting with sending scene overview data) until a session termination is communicated.
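As an illustration of the server push adjustment step described above, the following Python sketch filters a scene's element representations down to what a single client has reported as visible. The data shapes and names here (scene_elements, client_visibility, and the "full"/"reduced" labels) are assumptions made for illustration, not a format defined by this disclosure.

```python
def adjust_stream_for_client(scene_elements, client_visibility):
    """Server-push sketch: keep only the elements the client reported as
    visible, at the level of detail the client indicated.

    scene_elements maps element ids to available representations, e.g.
    {"car-302": {"full": b"...", "reduced": b"..."}}; client_visibility maps
    element ids to "full", "reduced", or None (fully occluded)."""
    stream = {}
    for element_id, representations in scene_elements.items():
        wanted = client_visibility.get(element_id)
        if wanted is None:
            continue  # element not visible for the client's navigation volume: omit it
        stream[element_id] = representations[wanted]
    return stream
```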
[0068] An example process that may be executed by a viewing client may include requesting content from a content server and receiving scene overview data from the content server. Content may be adjusted for a display setup configuration, location of viewers, and other local criteria. The viewing client may determine which elements are visible and may determine a navigation volume for the scene for the current time step. In a server push model, requesting content from the server includes sending the navigation volume and element visibility information to the content server. Element visibility information may be an indication of estimated changes to visibility for some embodiments. In a client pull model, requesting content from the server includes requesting individual scene element files and streams from the server based on local estimation of elements to be used. The viewing client may receive content data from the content server and may display the content to one or more viewers. The viewing client process (starting with receiving scene overview data) may repeat continually until a session termination is communicated.
[0069] Systems and methods disclosed herein may adjust scene data communicated between a content server and a viewing client to improve content distribution efficiency and to improve a viewing experience. These improvements may use descriptions of a scene overview to determine which scene elements may be used by a viewing client for a navigation volume. The viewing client may request, from a content server, content for a set of scene elements, and the content may be adapted for viewing session characteristics. Content distribution efficiency may be improved by a content server sending only content data for scene elements that are visible for a viewing volume for some embodiments. In a server push model, information signaled from a viewing client back to a content server enables a content server to improve content distribution by decreasing the amount of data streamed to the client.
[0070] FIG. 3A is a schematic illustration of two objects 302, 304 in a virtual scene 306 available from a content server. Object 302 is a virtual model of a car, and object 304 is a virtual model of a building. The virtual models of the car and the building may be represented using various computer graphics modeling techniques. For example, they may be represented as point clouds or as polygon meshes with associated texture map information, among other possibilities. Full-resolution versions of the objects 302 and 304 may call for a large amount of data to be transmitted from a content server to a viewing client. The transmission of such data may be inefficient in cases where one or more of the objects is occluded by another object because a fully occluded object will not be visible to a user.
[0071] An example of complete occlusion of one virtual object by another is illustrated in the schematic plan view of FIG. 3B. From the virtual viewpoint of the user 308 on the virtual scene, the virtual car 302 is completely occluded by the virtual building 304. In this configuration, transmission of full rendering information for the virtual car 302 likely amounts to an unnecessary use of network bandwidth.
[0072] To limit unnecessary use of network bandwidth, some embodiments make use of a bounding volume for each of a plurality of virtual objects in a scene. The bounding volume of a virtual object is, in some embodiments, a volume that substantially encloses (e.g. fully encloses) the respective virtual object. Information defining the bounding volume may be, for example, a polygon mesh or information used for conveying volumetric information. For example, a bounding volume may be a sphere (even if the virtual object is not spherical), and information defining the bounding volume may include coordinates of the center of a sphere and a value indicating the radius of the sphere. A bounding volume may have a box shape (e.g. a cube), and the information defining the bounding volume may include, for example, coordinates of one or more corners of the box. The information defining the bounding volume may include information (e.g. coordinates) defining the position of the bounding volume within the scene. It is not necessary for all virtual objects in a scene to have a bounding volume. For example, some virtual objects may be sufficiently large (e.g. a virtual mountain) or sufficiently close to the user (e.g. a virtual representation of a user’s hand or handheld tools) that they are not likely to be substantially or fully occluded, and occlusion testing using bounding volumes may be skipped for such objects.

[0073] FIG. 3C schematically illustrates an example of a box-shaped bounding volume 310 associated with virtual object 302. While the virtual object 302 itself may be represented using dozens or hundreds of vertices or polygons, bounding volume 310 may be defined with eight or fewer vertices (six or fewer polygons) in this example.
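A minimal Python sketch of the sphere and box primitives described above is shown below. The class names and fields are illustrative only; the same primitives can also describe the occlusion and navigation volumes discussed later.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]

@dataclass
class SphereVolume:
    """Volume signaled as a center point and a radius."""
    center: Point
    radius: float

@dataclass
class BoxVolume:
    """Axis-aligned box volume signaled as its minimum and maximum corners
    in scene coordinates."""
    min_corner: Point
    max_corner: Point

    def corners(self) -> List[Point]:
        """The eight corner points of the box."""
        (x0, y0, z0), (x1, y1, z1) = self.min_corner, self.max_corner
        return [(x, y, z) for x in (x0, x1) for y in (y0, y1) for z in (z0, z1)]

# A box-shaped bounding volume for a car-sized object: two corner points
# (eight vertices in total) describe it, far less data than the object's
# full mesh or point cloud.
car_bounds = BoxVolume(min_corner=(-2.3, 0.0, -1.0), max_corner=(2.3, 1.5, 1.0))
```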
[0074] In addition to bounding volumes, some embodiments further make use of occlusion volumes. In some examples, an occlusion volume is enclosed within an opaque region of a virtual object. FIG. 3D schematically illustrates an occlusion volume 312 enclosed within virtual object 304. While the virtual object 304 itself may be represented using dozens or hundreds of vertices, occlusion volume 312 may be defined with eight or fewer vertices in this example. More generally, information defining an occlusion volume may be, for example, a polygon mesh or information used for conveying volumetric information. For example, an occlusion volume may be a sphere, and information defining the occlusion volume may include coordinates of the center of a sphere and a value indicating the radius of the sphere. An occlusion volume may have a box shape (e.g. a cube), and the information defining the occlusion volume may include, for example, coordinates of one or more corners of the box. The information defining the occlusion volume may include information (e.g. coordinates) defining the position of the occlusion volume within the scene.
[0075] Some virtual objects may have more than one associated occlusion volume. For example, as illustrated in FIG. 3E, virtual object 302 may enclose three occlusion volumes 314, 316, 318. In this example, no occlusion volumes are positioned in a partly transparent portion of virtual object 302 that corresponds to car windows. While three box-shaped occlusion volumes are used in this example for virtual object 302, other examples could use different numbers and different shapes of occlusion volumes. More occlusion volumes (and/or occlusion volumes with more complicated shapes) may more accurately conform to the shape of the virtual object at the expense of imposing greater data requirements.
[0076] As seen from FIG. 3E, an object may be associated with more than one occlusion volume. Conversely, objects may share one or more occlusion volumes. For example, a virtual car may be represented not by a single virtual object, but by a collection of a plurality of virtual objects, such as a body and four individual wheels, but the entire virtual car may be associated with a smaller number of occlusion volumes, such as the three volumes shown in FIG. 3E. An occlusion volume need not be associated with any particular virtual object. In some cases, a virtual object may have no associated occlusion volume. For example, it may be desirable not to define any occlusion volume for a fully or partly transparent object, such as a virtual window or shrub, or for a small object that is unlikely to substantially occlude other objects.
[0077] FIGs. 4A-4C illustrate example uses of bounding volumes, occlusion volumes, and navigation volumes.

[0078] In the example of FIG. 4A, a client device receives, from a content server, information defining the bounding volume 310 of the virtual car and the occlusion volume 312 of the virtual building in a virtual scene. This may occur before the client has received full rendering information for the virtual car and virtual building. For a viewpoint 402 relative to the virtual scene, the client device determines whether the bounding volume 310 of the car is fully occluded by any individual occlusion volume or combination of occlusion volumes in the scene. In the example of FIG. 4A, the bounding volume 310 is not fully occluded by the occlusion volume 312 of the building. In response to the determination that the bounding volume 310 of the virtual car is not fully occluded, the client device retrieves object rendering data for the virtual car. (A similar process may be performed to determine whether a bounding volume of the virtual building is fully occluded by one or more occlusion volumes; for simplicity, this process is not illustrated in FIGs. 4A-4C.)
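One way a client might implement the full-occlusion test just described is sketched below, assuming the BoxVolume helper from the earlier sketch. It conservatively samples sight lines from the viewpoint to the corners of the bounding box and treats the volume as fully occluded only if every sight line is blocked by some occlusion box. This is an illustrative approximation rather than a method required by the disclosure; a production implementation might sample more densely or use an exact projection test.

```python
def segment_hits_box(p0, p1, box):
    """True if the straight segment from p0 to p1 passes through the
    axis-aligned box (slab test over the segment parameter t in [0, 1])."""
    t_min, t_max = 0.0, 1.0
    for axis in range(3):
        d = p1[axis] - p0[axis]
        lo, hi = box.min_corner[axis], box.max_corner[axis]
        if abs(d) < 1e-12:
            if p0[axis] < lo or p0[axis] > hi:
                return False  # parallel to this slab and outside it
        else:
            t0, t1 = (lo - p0[axis]) / d, (hi - p0[axis]) / d
            if t0 > t1:
                t0, t1 = t1, t0
            t_min, t_max = max(t_min, t0), min(t_max, t1)
            if t_min > t_max:
                return False
    return True

def bounding_volume_fully_occluded(viewpoint, bounding_box, occlusion_boxes):
    """Approximate test: report full occlusion only if the sight line from the
    viewpoint to every corner of the bounding box is blocked by at least one
    occlusion box."""
    for corner in bounding_box.corners():
        if not any(segment_hits_box(viewpoint, corner, occ) for occ in occlusion_boxes):
            return False  # at least one corner may be visible
    return True
```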
[0079] In some embodiments, the client device may retrieve a full-resolution version of the virtual car 302 in response to the determination that the bounding volume 310 of the virtual car is not fully occluded. In other embodiments, the client device determines an amount by which the first virtual object is occluded by the occlusion volumes, and the resolution level of the retrieved rendering data is determined based on the amount by which the bounding volume is occluded. The amount of occlusion may be determined as, for example, a percentage occlusion from the perspective of the viewpoint 402. A threshold may be set, such that, for example, a full-resolution version of the virtual car 302 is retrieved if the occlusion level is less than the threshold, while a reduced-resolution version of the virtual car may be retrieved if the occlusion level is greater than the threshold. Reduced-resolution versions of an object may have, for example, fewer polygons or vertex points than full-resolution versions. In some embodiments, if the amount of occlusion is greater than a threshold, a determination may be made not to retrieve any rendering data for the object.
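The threshold logic described in this paragraph could be expressed as simply as the following sketch. The 50% and 95% thresholds are arbitrary illustrative values, not values specified by this disclosure.

```python
def select_representation(occluded_fraction, reduced_threshold=0.5, skip_threshold=0.95):
    """Map the estimated occluded fraction of a bounding volume (0.0-1.0)
    to a representation choice for the associated object."""
    if occluded_fraction >= skip_threshold:
        return None        # effectively invisible: do not retrieve rendering data
    if occluded_fraction >= reduced_threshold:
        return "reduced"   # retrieve a lower-resolution version
    return "full"          # retrieve the full-resolution version

# Example: an object whose bounding volume is 70% occluded from the current
# viewpoint would be requested at reduced resolution.
assert select_representation(0.7) == "reduced"
```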
[0080] It may be noted that a virtual object may only be a portion of what might commonly be considered a single physical object. For example, the front half and rear half of a virtual car 302 could be represented as separate virtual objects, albeit separate objects that are constrained to remain adjoined to one another. Each half of the car may then have an associated bounding volume. In the configuration of FIG. 4A, it then may be the case that the client device retrieves full rendering information for the object representing the unoccluded front half of the virtual car and retrieves no rendering information or low-resolution rendering information for the object representing the rear half of the virtual car.
[0081] In the example of FIG. 4B, the virtual scene includes both the original virtual car 302 and a second virtual car 404. The client device receives information defining a bounding volume 406 associated with virtual car 404. The client device makes a determination of whether bounding volume 406 is fully occluded, from the perspective of viewpoint 402, by any individual occlusion volume or combination of occlusion volumes in the scene. In the example of FIG. 4B, bounding volume 406 is fully occluded by occlusion volume 312. In response to the determination that the bounding volume of the second virtual object is fully occluded by at least one of the occlusion volumes, the client device makes a determination not to retrieve object rendering data for the second virtual car 404. In some embodiments, the client device may identify to the content server those objects that are determined not to be occluded, so that rendering information for those objects can be retrieved. In some embodiments, the client device may identify to the content server those objects that are determined to be occluded, so the server can omit rendering information for the occluded objects from the object rendering data retrieved by the client. In some embodiments, the client may indicate to the server an amount of occlusion for each of the objects in the virtual scene, with the server determining for each object whether to send rendering information and at what resolution.
[0082] Although there may be exceptions for particular objects, bounding volumes and occlusion volumes in general can be defined with less data than would otherwise be required to provide full rendering information for virtual objects with which they are associated.
[0083] In some embodiments, occlusion volumes and/or bounding volumes may be represented by surfaces (e.g. mesh surfaces) or collections of points (e.g. a point cloud) that do not necessarily enclose a particular volume of space in a strict geometric sense. For example, a bounding volume or occlusion volume may be represented as a concave surface in which the virtual object is situated, analogous to a box with an open lid.
[0084] In some embodiments, occlusion of bounding volumes is determined not for a single viewpoint, but rather for a plurality of viewpoints relative to a virtual scene. One such embodiment is illustrated in FIG. 4C. In the example of FIG. 4C, the client device determines a navigation volume 410. A navigation volume may be determined in various ways. In some embodiments, the navigation volume 410 represents a permissible range of motion of a viewpoint of a user (e.g. user 308). In some embodiments, navigation volume 410 represents a likely range of motion of the viewpoint of the user, for example the navigation volume 410 may have a predetermined or reconfigurable size and shape surrounding the current user viewpoint. The navigation volume may be periodically updated in view of motion of the user viewpoint or other events.
[0085] In the example of FIG. 4C, the client device operates to determine the occlusion of bounding volumes from various viewpoints (e.g. viewpoints 412a-c, among others) within the navigation volume 410. Viewpoints within a navigation volume may be selected in various ways. For example, they may be selected to be substantially evenly distributed through the volume, they may be selected to be substantially evenly distributed over an outer surface of the volume, or they may be selected to be at corners of the volume, among other options.

[0086] In the virtual scene shown in FIG. 4C, occlusion volume 312 is present along with bounding volumes 414, 416, 418 associated with different virtual objects. Before retrieving full rendering information for the virtual objects, the client device may determine the levels of occlusion of the bounding volumes from the plurality of viewpoints within the defined navigation volume 410. For example, the client device may determine that bounding volume 414 is fully occluded from all of a plurality of viewpoints within the navigation volume. In response, the client may make a determination not to retrieve full rendering information for the virtual object associated with bounding volume 414. The client device may determine that bounding volume 418 is not occluded from any of the viewpoints within the navigation volume. In response, the client may make a determination to retrieve full rendering information for the virtual object associated with bounding volume 418. The client device may determine that bounding volume 416 is fully occluded from some viewpoints (e.g. viewpoint 412a), not occluded at all from some viewpoints (e.g. viewpoint 412c), and partly occluded for some viewpoints (e.g. viewpoint 412b). Because bounding volume 416 is not fully occluded for at least one of the viewpoints in the defined navigation volume, the client device retrieves object rendering data for the virtual object associated with bounding volume 416. In some embodiments, the client device may retrieve a lower-resolution version of the object rendering data in view of the partial occlusion of bounding volume 416.
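Combining the pieces above, a client could sample a handful of viewpoints inside a box-shaped navigation volume and retrieve only the objects that are visible from at least one of them, as in the sketch below. It reuses BoxVolume and bounding_volume_fully_occluded from the earlier sketches; the regular-grid sampling strategy and the three-step resolution are illustrative choices only.

```python
import itertools

def sample_viewpoints(nav_volume, steps=3):
    """Sample candidate viewpoints on a regular grid inside a box-shaped
    navigation volume (one of the sampling strategies mentioned above)."""
    (x0, y0, z0), (x1, y1, z1) = nav_volume.min_corner, nav_volume.max_corner

    def ticks(a, b):
        return [a + (b - a) * i / (steps - 1) for i in range(steps)]

    return list(itertools.product(ticks(x0, x1), ticks(y0, y1), ticks(z0, z1)))

def objects_to_retrieve(object_bounds, nav_volume, occlusion_boxes):
    """Return ids of objects whose bounding volume is not fully occluded from
    at least one sampled viewpoint; object_bounds maps ids to BoxVolume."""
    viewpoints = sample_viewpoints(nav_volume)
    return [obj_id for obj_id, bounds in object_bounds.items()
            if any(not bounding_volume_fully_occluded(vp, bounds, occlusion_boxes)
                   for vp in viewpoints)]
```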
[0087] The viewpoint or viewpoints that are used in testing for occlusion of bounding volumes in a scene may correspond to current or predicted virtual positions of one or more users with respect to the virtual scene. Viewpoints may be set or adjusted based on user input. User input may include user movement, e.g. where the client device is associated with a head-mounted display or other moveable display device with motion tracking capability. Alternative user inputs include user input with a joystick, arrow key, touchpad, and the like.
[0088] The client device may operate to render a view of the virtual objects using the retrieved object rendering data. A signal representing the rendered view may then be provided to a display device for display to the user. The display device may be a component of the client device, or it may be a separate component.
[0089] In some embodiments, occlusion testing of bounding volumes may be performed for each of a plurality of viewpoints without defining any specific navigation volume. A user may be provided with an option of viewing a virtual scene from any one of a plurality of viewpoints (e.g. one of a plurality of seats that can be selected in a virtual stadium), and the occlusion of bounding volumes may be determined for those different viewpoints. In some embodiments, the client device may retrieve rendering information for more than one user (e.g. for two or more users playing an immersive game). In such embodiments, the client device may retrieve rendering information only for virtual objects whose bounding volumes are not fully occluded from a current or predicted viewpoint of at least one of the users.

[0090] In some embodiments, a client device may make a determination not to retrieve rendering data for particular virtual objects for reasons other than occlusion of the bounding volumes of those virtual objects. As one example, a virtual object may have an associated viewing volume within which the virtual object is deemed to be visible (providing it is not occluded, etc.). The client device may make a determination not to retrieve rendering data for a virtual object if no current or predicted viewpoint of the client device is within the viewing volume of the virtual object. As another example, the client device may take into consideration a user’s direction of view relative to the virtual scene. A current or predicted direction of view may be used to define a view frustum, and the client device may make a determination not to retrieve rendering information for virtual objects whose bounding volumes fall completely outside a current or predicted view frustum.
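The two further inexpensive pre-checks mentioned above, whether the viewpoint lies inside an object's viewing volume and whether the object's bounding box falls entirely outside the view frustum, might look like the following sketch. It again assumes box-shaped volumes, and the plane-based frustum representation is an assumption made for illustration.

```python
def viewpoint_in_viewing_volume(viewpoint, viewing_volume):
    """True if the current or predicted viewpoint lies inside the object's
    box-shaped viewing volume; otherwise the object need not be retrieved."""
    return all(viewing_volume.min_corner[i] <= viewpoint[i] <= viewing_volume.max_corner[i]
               for i in range(3))

def box_outside_frustum(box, frustum_planes):
    """Conservative frustum test: the box is culled only when all eight of its
    corners lie on the outside of a single frustum plane. Each plane is given
    as (a, b, c, d), with the inside defined by a*x + b*y + c*z + d >= 0."""
    for a, b, c, d in frustum_planes:
        if all(a * x + b * y + c * z + d < 0 for (x, y, z) in box.corners()):
            return True
    return False
```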
[0091] Some embodiments described herein allow a client device to reduce the amount of rendering data retrieved for a virtual scene by not retrieving rendering data for virtual objects whose bounding volumes are fully occluded by occlusion volumes. The additional bandwidth used to retrieve bounding volume information and occlusion volume information is expected to be less than the bandwidth saved through not retrieving rendering data for occluded virtual objects.
[0092] To save on the bandwidth used for bounding volume information and occlusion volume information, the bounding volumes and occlusion volumes do not necessarily conform precisely to the shapes defined by the full rendering data of those objects. Because the bounding volumes and occlusion volumes do not conform exactly to their respective virtual objects, some error in occlusion testing may be expected. In some embodiments, the bounding volumes and occlusion volumes may be selected such that an error is more likely to result in retrieving unneeded rendering information rather than failing to retrieve rendering information that would otherwise be visible in a scene. For example, it may be preferable for the bounding volume of a virtual object to be generally larger than the object and for an occlusion volume to be generally smaller than the represented occlusion. In this way, if the client device detects that a bounding volume is fully occluded, then the associated virtual object is very likely to be fully occluded in the final rendered scene. Conversely, the client device may find that a bounding volume is only partly occluded, even though the associated object itself is ultimately fully occluded in the rendered scene. In that case, the rendering information for the occluded object may be retrieved unnecessarily, but this is likely to be more acceptable than the effects of failing to retrieve rendering information for objects that should be visible in the rendered scene.
[0093] In some embodiments, it may be considered acceptable for some objects to be omitted from a rendered scene even though they would otherwise be at least partly visible; in such embodiments, it may be acceptable for portions of the virtual object to extend beyond the bounding volume. For example, a virtual model of a car may include a radio antenna, and a bounding volume for the car may be selected that encloses the body of the car but not the antenna. In such an embodiment, the antenna may fail to be rendered when it would otherwise be the only visible portion of the car. But this may be an acceptable tradeoff because it avoids the need to retrieve the complete rendering information for the car merely to render the antenna.
[0094] For some embodiments, a content server may process spatial data to segment and identify individual objects from the spatial data and to organize raw point cloud data into object clusters for a scene graph. For each segmented and identified object, a content server may determine a bounding volume (e.g. a box or sphere fully containing the object), an occlusion volume (e.g. one or more boxes or spheres fully contained in the object), and a viewing volume (which is a 3D shape that indicates portions of a space from where the object may be viewed). In some embodiments, a bounding volume is a volume in which the segment is entirely contained. In some embodiments, an occlusion volume is the area in which the segment creates full occlusion. Transparencies or holes in objects may be processed as zero or multiple occluding elementary volumes. A viewing volume indicates an area for which object data supports navigation. These volumes may be represented using 3D shapes (or primitives), such as boxes and spheres, or using more complex shapes, such as polygons, formed from combinations of 3D primitives (such as cubes, cylinders, spheres, cones, and pyramids). Volumes are generated to form a simplified geometry for the original spatial data segments that enable efficient occlusion and collision calculations. The bounding volume and the occlusion volume of an object are preferably configured such that they can be signaled using less data than is used to signal the object itself. For example, the number of vertices used to signal the bounding volume and the occlusion volume may be less than the number of vertices or points in the object.
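On the server side, the bounding and occlusion volumes for a segmented object could be derived along the lines of the sketch below (reusing the BoxVolume helper from the earlier sketch). The outward padding and inward shrink follow the "bounding volume larger, occlusion volume smaller" preference discussed earlier; the shrink heuristic is only a placeholder that would suit solid, roughly convex segments (transparent or hollow segments would get no occlusion volume, as noted above), and the specific margin values are arbitrary.

```python
def bounding_box_for_segment(points, margin=0.05):
    """Axis-aligned box that fully contains a segmented point cloud, padded
    outward so occlusion errors favour over-fetching rather than dropping
    geometry that should be visible."""
    xs, ys, zs = zip(*points)
    return BoxVolume(min_corner=(min(xs) - margin, min(ys) - margin, min(zs) - margin),
                     max_corner=(max(xs) + margin, max(ys) + margin, max(zs) + margin))

def occlusion_box_for_segment(bounding_box, shrink=0.25):
    """Crude occlusion volume: the bounding box shrunk toward its center so
    that it stays inside the opaque object it stands in for."""
    lo, hi = bounding_box.min_corner, bounding_box.max_corner
    return BoxVolume(min_corner=tuple(a + (b - a) * shrink for a, b in zip(lo, hi)),
                     max_corner=tuple(b - (b - a) * shrink for a, b in zip(lo, hi)))
```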
[0095] In some embodiments, there may be multiple representations for scene elements. For example, content may be provided in raw point cloud format as well as in reconstructed 3D mesh format. For each content element, multiple levels of detail may be available for a representation. In these embodiments, the available representation versions are described in the scene overview, and the viewing client (for a client pull model) may choose the representation based on local criteria. In a server push model, the server may choose the version of the content elements to be streamed to the client. For example, low resolution representations may be selected based on distance or significance of the occlusion (far objects or tiny parts may use a low resolution of a representation with a lower level of detail).
[0096] In embodiments with spatial data containing temporal sequences, the content server performs pre-processing for all time steps or periodically between some number of time steps, processing the whole content. For temporal sequences, the content server tracks elements between temporal steps and compiles generated segments and per-segment metadata together.

Communications
[0097] FIG. 5 is a message sequencing diagram illustrating an example process for determining visible objects and rendering visible objects according to some embodiments. A server 502 (which may be a content server) may analyze content and isolate object streams (box 506) for some embodiments of an example process. A client 504 may send a content request 508 to the server. The server may send to the client communications indicating a scene graph 510 (which may include locations of identified objects), object bounding volumes 512, occlusion volumes 514, and viewing volumes 516. The client may select an initial viewpoint (box 518), record sensor data to track a viewer (box 520), and compute visible objects within the scene (box 522). The client may send a request 524 for objects from the server. The request may be for objects determined to be visible. Some embodiments may include a request for partially visible objects. The server may send 526 object representations to the client. The object representations may correspond to an object request sent by the client such that the same objects are referenced in both communications. The client may render and display (box 528) the objects represented in the object representations. The process shown in FIG. 5 may be repeated continuously for some embodiments.
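The exchange in FIG. 5 could be carried over any transport; purely as an illustration, the two client-to-server messages might be encoded as JSON payloads like those below. All field names and identifiers here are hypothetical, not a wire format defined by this disclosure.

```python
import json

# Initial content request (cf. content request 508 in FIG. 5).
content_request = {"type": "content_request", "content_id": "scene-001"}

# After computing visibility locally, the client requests only the objects it
# determined to be visible, optionally with a preferred level of detail
# (cf. object request 524 in FIG. 5).
object_request = {
    "type": "object_request",
    "content_id": "scene-001",
    "objects": [
        {"id": "car-302", "lod": "full"},
        {"id": "building-304", "lod": "reduced"},
    ],
}
print(json.dumps(object_request))
```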
[0098] FIG. 6 is a message sequencing diagram illustrating an example process for determining visibility of scene elements based on session characteristics and user preferences for an example server push model according to some embodiments. Some embodiments of an example process may include content analysis and content streaming. For content analysis, a content provider may send spatial data to a content server (box 606). For some embodiments, spatial data may include one or more of the following items: a scene graph (which may include locations of identified objects), object bounding volumes, occlusion volumes, and viewing volumes. A content server may segment and classify (content) data (box 608) and generate bounding, occlusion, and viewing volumes for segments (box 610).
[0099] For content streaming, a user may send a content request to a viewer client. The user may interface with a viewer client UI to generate a content request. The viewer client may send a content request (box 612) to a content server. The viewer client may collect sensor and configuration data (box 614), such as location of the user and viewpoint of the user in relation to real-world objects. The viewer client may record and track movements of the user by recording and tracking movements of the viewer client device. The content server may send to the viewer client communications indicating a scene overview. The scene overview may include a scene graph, descriptions of objects identified within content data, and bounding, occlusion, and viewing volumes of identified objects (box 616). The scene overview also may include locations of objects within a scene. The viewer client may estimate a navigation volume within which the user may navigate as well as visibility of scene elements based on session characteristics and user preferences (box 618). The viewer client may send the estimated navigation volume and visibility of elements (box 620) to the content server. The content server may process the content (box 622) to remove objects that would not be visible (e.g. objects that are fully occluded from the user’s viewpoint) and, in some embodiments, to generate lower-resolution versions of objects that are at least partly occluded. The content server may send the processed content streams (box 622) to the viewer client. For some embodiments, the content server may send (box 624) content streams containing only objects visible to the user for the viewing volume. The viewer client may render (box 626) and display (box 628) the content for viewing by the user.
[0100] FIG. 7 is a message sequencing diagram illustrating an example process for determining visibility of scene elements based on session characteristics and user preferences for an example client pull model according to some embodiments. For some embodiments of a content analysis process for a client pull model, a content server 704 obtains spatial data (box 706), e.g. from a content provider. The content server segments and classifies the data (box 708) and generates bounding, occlusion, and viewing volumes for the segments (box 710). The content server may isolate segments into individual streams (box 711).
[0101] For some embodiments of a content streaming process for a client pull model, a user initiates a content request to a viewer client 702. The viewer client sends a content request (box 712) to the content server. The viewer client collects sensor and configuration data (box 714). The content server sends to the viewer client scene overview data, which includes a scene graph and bounding, occlusion, and viewing volumes (box 716). The viewer client selects and updates the viewpoint (box 718), such as if the user moves around. The viewer client selects the segments (box 719) to be used and/or available for displaying. The viewer client sends a request to the content server for segment streams (box 720). The content server responds to the viewer client with the requested segments (box 724). The content is rendered (box 726) and displayed (box 728) to the user.
Content Analysis
[0102] FIG. 8 is a flowchart illustrating an example process for analyzing content segments to determine bounding volumes according to some embodiments. FIG. 8 shows an example process that may be executed by a content server for extraction of scene overview data from content data. For some embodiments, content may be received or otherwise obtained (e.g. locally generated) by a content server or another server performing content analysis (box 802). In some embodiments of a process executed by a content server, the received spatial data is segmented (box 804). Segments are isolated as individual data blocks and classified to infer segment types (box 806). For content analysis, the content server may analyze spatial content received from a content provider and extract a scene overview.
[0103] Spatial data (such as identity of objects, location of objects, and spatial environment boundary locations) and content segments may be extracted from received content. Content segmentation may be performed based on the spatial data extracted. Content segments may be classified. Content segmentation and content classification may be iterative processes that interface with extraction of spatial data.
[0104] Example processes that may be used for content segmentation and classification for some embodiments are described in Qi, Charles, et al., Volumetric and Multi-view CNNs for Object Classification on 3D Data, PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 5648-5656 (2016) and Qi, Charles, et al., PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, PROCEEDINGS OF THE 2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) 652-660, IEEE (2017).
[0105] Bounding volumes (box 808), occlusion volumes (box 810), and viewing volumes (box 812) may be calculated for segmented and classified content. Bounding, occlusion, and viewing volumes may be calculated based on spatial data extracted from received content. Outputs of the bounding, occlusion, and viewing volume calculation processes, as well as the segment classification process may be stored as scene overview data. Scene overview data may include bounding, occlusion, and viewing volumes for segmented content elements, and classifications of the content elements together with element locations in unified scene coordinates.
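As an illustration of the volume calculation steps (boxes 808-810), the following is a minimal sketch in Python, assuming each classified segment is available as an N x 3 array of points; the function names and the crude shrink-based occlusion volume are assumptions made for illustration only, not part of the described system.

    import numpy as np

    def bounding_box(points: np.ndarray):
        # Axis-aligned bounding volume that encloses the whole segment.
        return points.min(axis=0), points.max(axis=0)

    def inscribed_occlusion_box(points: np.ndarray, shrink: float = 0.5):
        # Conservative occlusion volume: the bounding box shrunk toward its
        # centre so that it stays inside the (assumed opaque) segment. A real
        # implementation would fit the occluder to the segment's interior.
        lo, hi = bounding_box(points)
        centre, half = (lo + hi) / 2.0, (hi - lo) / 2.0
        return centre - shrink * half, centre + shrink * half

    if __name__ == "__main__":
        segment = np.random.rand(1000, 3) * 4.0   # stand-in for one classified segment
        print("bounding volume:", bounding_box(segment))
        print("occlusion volume:", inscribed_occlusion_box(segment))

A viewing volume for the segment could be derived along the same lines, for example by expanding the bounding box to cover the region from which the segment is intended to be viewed.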
Content Server
[0106] FIG. 9 is a flowchart illustrating an example process that may be executed by a content server for streaming spatial data content to a client for an example server push model according to some embodiments. In an example process, a content server may wait (box 902) to receive a content request from a client. The content server may send (box 904) scene overview data to the client. The scene overview data may be retrieved (or received for some embodiments) by the content server from a content provider. Based on the scene (or content) overview, the client determines which content elements are visible for a navigation volume signaled by the viewing client. The content server may receive client navigation volume and element visibility data from the client (box 906).
[0107] Based on per element visibility and the navigation volume communicated (or signaled) by the viewing client, the content server processes the spatial data and streams spatial data to the viewing client (box 908). The spatial data content may be processed according to the navigation volume.
[0108] For the duration of the spatial data content or if a viewing client signals a continuation of a session, the content server continually streams (box 910) an up-to-date scene overview and processed spatial data to the viewing client. If an end to a session is requested, the process may determine if an end of processing is requested. If an end of processing is requested, the process may exit. Otherwise, the process may wait for a content request from a client.

[0109] Some embodiments adjust content distribution between a content server and a single client. Some embodiments of a content server may stream content to several clients (which may occur concurrently) and may split the streamed content into several streams to reduce the amount of redundant data sent to clients. If several clients share some parts of the content, those parts may be delivered as one stream. Portions of content that are used by fewer clients or by a single client may be transmitted as an additional stream concurrently with a stream used by all clients (or more clients for some embodiments).
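As an illustration of grouping shared content into common streams, the following is a minimal sketch, assuming each client has already reported the set of element identifiers it needs; the dict-based grouping and the names used are assumptions for illustration, not a prescribed wire format.

    from collections import defaultdict

    def group_into_streams(needed_by_client):
        # needed_by_client: dict of client id -> set of element ids.
        # Elements needed by the same set of clients end up in one stream,
        # so shared data is sent once rather than duplicated per client.
        clients_per_element = defaultdict(set)
        for client, elements in needed_by_client.items():
            for element in elements:
                clients_per_element[element].add(client)
        streams = defaultdict(list)
        for element, clients in clients_per_element.items():
            streams[frozenset(clients)].append(element)
        return dict(streams)

    print(group_into_streams({"client_a": {"statue", "rug"},
                              "client_b": {"statue", "vase"}}))

In this example the shared element ("statue") forms one stream delivered to both clients, while the remaining elements form additional per-client streams.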
Content Processing
[0110] During content streaming, a content server may process the streamed spatial data based on per element visibility and a navigation volume signaled by the viewing client. For some embodiments of content processing, a content server first removes the elements (segments of data) that the viewing client has not requested or that the viewing client has indicated may be removed. For the remaining segments, the content server processes the data based on the navigation volume requested by the viewing client. For point cloud content and polygonal data viewed from user viewpoints for a navigation volume, per point and per vertex occlusion culling may be used to remove portions of the geometry that are not visible to the viewing client. In some embodiments, the removed portions of the geometry are not transmitted. For light field data, a reduced viewing volume may be used for cropping out parts of array images or hogel images (part of light field holograms) that are not displayed by the viewing client.
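As an illustration of per point occlusion culling, the following is a minimal sketch, assuming the navigation volume is represented by a handful of sampled viewpoints and the occlusion volumes by axis-aligned boxes given as (lo, hi) corner pairs; the sampling strategy and box representation are assumptions, and a production implementation would use an acceleration structure rather than this brute-force loop.

    import numpy as np

    def segment_hits_box(p0, p1, lo, hi, eps=1e-9):
        # Standard slab test: True if the line segment p0 -> p1 intersects the
        # axis-aligned box (lo, hi).
        d = p1 - p0
        tmin, tmax = 0.0, 1.0
        for axis in range(3):
            if abs(d[axis]) < eps:
                if p0[axis] < lo[axis] or p0[axis] > hi[axis]:
                    return False
            else:
                t1 = (lo[axis] - p0[axis]) / d[axis]
                t2 = (hi[axis] - p0[axis]) / d[axis]
                t1, t2 = min(t1, t2), max(t1, t2)
                tmin, tmax = max(tmin, t1), min(tmax, t2)
                if tmin > tmax:
                    return False
        return True

    def cull_points(points, sampled_viewpoints, occluders):
        # Keep a point if the sight line from at least one sampled viewpoint to
        # the point misses every occlusion box of the other segments.
        kept = []
        for p in points:
            visible = any(
                not any(segment_hits_box(v, p, lo, hi) for lo, hi in occluders)
                for v in sampled_viewpoints
            )
            if visible:
                kept.append(p)
        return np.array(kept)

    if __name__ == "__main__":
        cloud = np.random.rand(500, 3) * 10.0
        viewpoints = [np.array([0.0, 1.7, 0.0])]
        occluders = [(np.array([2.0, 0.0, 2.0]), np.array([4.0, 3.0, 4.0]))]
        print(len(cull_points(cloud, viewpoints, occluders)), "points kept")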
[0111] In a client pull model, the content server waits for content requests from the viewing clients and transmits data accordingly. Viewing clients request a scene overview from the content server at the beginning of a session. Once the clients have initialized local execution of the scene, each client selects segment streams to be requested using local criteria. Each client requests individual segments from the content server, segment stream by segment stream. A server continuously waits for content requests and streams data based on requests until the content server is requested, via a signal or communication message, to terminate processing.
[0112] For some embodiments of a content server process, a content server may receive, from a client, a request for streaming content of a spatial scene. The content server process may determine scene overview information (such as non-rendering descriptions and position information for a plurality of objects of the spatial scene). The content server process may include determining spatial bounding volume information for the identified objects of the spatial scene. The content server process may include determining occlusion volume information for at least one identified object of the spatial scene. The content server process may include determining viewing volume information for at least one object of the plurality of objects of the spatial scene. The content server process may include sending to the client the following items: scene overview information (such as non-rendering descriptions and position information for identified objects of the spatial scene), spatial bounding volume information for the identified objects, occlusion volume information for at least one identified object, and viewing volume information for at least one identified object. The content server may receive from the client a rendering request indicating a set of visible objects of the spatial scene. The content server may generate and send to the client rendering information for the set of visible objects.
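As an illustration of the final steps of this server process, the following is a minimal sketch, assuming rendering data is held in an in-memory mapping keyed by object identifier and that the rendering request is a simple dict listing visible object identifiers; the transport, encoding, and storage shown here are assumptions, not part of the described process.

    def handle_rendering_request(request, rendering_store):
        # Return rendering information only for the objects the client reported
        # as visible; fully occluded objects are simply never looked up.
        visible_ids = request.get("visible_objects", [])
        return {obj_id: rendering_store[obj_id]
                for obj_id in visible_ids
                if obj_id in rendering_store}

    store = {"statue": b"<statue mesh>", "vase": b"<vase mesh>", "rug": b"<rug mesh>"}
    response = handle_rendering_request({"visible_objects": ["statue", "rug"]}, store)
    print(sorted(response))   # ['rug', 'statue']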
[0113] Some embodiments of a content server process may include receiving a resolution adjustment request for at least one partially occluded object selected from the set of visible objects. The content server process may adjust the resolution used in rendering information for corresponding partially occluded objects. For some embodiments of a content server process, the resolution request may indicate a visibility percentage of the corresponding partially occluded object, wherein adjusting the resolution used in rendering information for the respective partially occluded object may be based on the visibility percentage of the corresponding partially occluded object.
[0114] For some embodiments of a content server process, at least one portion of the object occlusion sizing data (or object visibility data) (which may include spatial bounding volume information, occlusion volume information, and viewing volume information) may indicate changes with time. Some embodiments may divide identified objects into a plurality of sub-objects. For some embodiments of a content server process, spatial scene boundary information indicating spatial boundaries of a spatial scene may be sent to the client.
[0115] Some embodiments of a content server process may use a set of visible objects in which at least one of the visible objects is a partially visible object. At least one of the partially visible objects may include a plurality of sub-objects. For each partially visible object that includes a plurality of sub-objects, a respective set of visible sub-objects may include less than all of the plurality of sub-objects of the respective partially visible object. For each partially visible object that includes a plurality of sub-objects, the rendering information sent to the client may include rendering information only for the respective set of visible sub-objects.
[0116] Some embodiments of a content server process may determine predicted object occlusion sizing data (which may include, for one or more objects of a spatial scene, predicted spatial bounding volume information, predicted occlusion volume information, and predicted viewing volume information) for a predicted user viewpoint at a future time t1. The content server process may send to the client the predicted object occlusion sizing data. For some embodiments of a content server process, a predicted user viewpoint at a future time t1 may be received from a client. Some embodiments of the content server process may further include receiving user viewing position tracking information and determining the predicted user viewpoint relative to the spatial scene based in part on the user viewing position tracking information.

[0117] FIG. 10 is a flowchart illustrating an example process executed by a content server for streaming spatial data content to a client for an example client pull model according to some embodiments. For some embodiments of a content server process for a client pull model, the content server waits to receive a content request from a client. If a content request is received, the content server determines the request type. If a new session type of content request is received, the content server retrieves scene overview data and sends the scene overview data to the client. If a segment stream type of content request is received, the content server retrieves spatial data and sends the requested segment stream to the client. For both new session and segment stream requests, the content server process determines if an end of processing request is received from a client. If no end of processing request is received, the content server process returns to waiting for a content request from a client. Otherwise, the content server process exits.
Viewing Client
[0118] FIG. 11 is a flowchart illustrating an example process executed by a viewing client for retrieving and rendering visible scene elements for an example server push model according to some embodiments. For some embodiments of a viewing client process under both a server push model (FIG. 11) and a client pull model (FIG. 12), a viewing client may request content from a content server (box 1102). A user may launch an application on a viewing client. The user may indicate the content to be viewed within the application, and a content request may be sent by the application. Content may be a link to a scene overview stored on a content server. The link to the content may be a uniform resource locator (URL) identifying a content server and specific content. A viewing client application may be launched by an explicit command of the user or automatically by the operating system based on identifying the content type of the request and an application associated with that content type. In addition to being a stand-alone application, a viewing client may be integrated with a web browser, a social media client, or an operating system for some embodiments.
[0119] A viewing client process may initialize sensor data and collect configuration data (box 1104). Sensor data may include information related to the context of the spatial scene. Configuration data, for example, may indicate the number of users, activities in which users are engaging, display device setup, and physical characteristics of the environment, including locations and poses of users. Upon initialization of sensors and collection of configuration data, a viewing client process may execute a run-time process continually until an end of processing request is received for some embodiments.
[0120] For some embodiments of a viewing client process, a run-time process may receive scene overview data from a content server (box 1106). The scene overview data may be used with sensor and configuration data to adjust the viewpoint to the content. The viewpoint to content may be adjusted automatically and/or manually for some embodiments (box 1108). Automatic content adjustment of viewpoint, for example, may set the viewpoint to the content based on the display setup (such as adjusting the viewpoint orientation and location depending on the display device orientation (such as tabletop or wall mounted)) or locations of the users. Additionally, content may be processed automatically based on user preferences or manually by the user. For example, user preferences may indicate a preference to display content that focuses on a specific content element type. Scene overview data may include element classification information. Element classification information may be used to adjust the viewpoint of content to focus on a content element. A user’s navigation volume may be estimated (box 1110). User preferences and the estimated navigation volume may be used to determine the visibility of elements (box 1112). A viewing client process may automatically toggle visibility on or off for a particular element or a particular type of element. A user may indicate a preference to adjust a viewpoint to content to focus on a specific element content type based on a list of object (or element) classifications from scene overview data. A user (via a user interface) may request an expanded view that includes selection of elements included in the scene overview data. The navigation volume and element visibilities are sent by the client to the content server (box 1114), and a content stream is received from the content server (box 1116).
[0121] The viewpoint to the content may be set based on the content adjustment and content navigation controlled by the user input and potentially display device tracking in the case of HMDs (box 1118). The viewing client determines the navigation volume. The navigation volume may be calculated using the display device configuration, the number and placement of users, and the expected content update frequency. The navigation volume may be a volume within the full viewing volume available for a scene, which may be indicated in scene overview data. The viewing client may determine which scene (or content) elements are visible to one or more viewers. Visibility may be determined based on, for example, the viewing volume and the viewing environment layout. Content processing, requesting, and receiving are implemented differently for embodiments of a server push model or a client pull model.
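As an illustration of one way the navigation volume might be calculated, the following is a minimal sketch, assuming the volume is an axis-aligned box centred on the current viewpoint, sized by how far the viewer could move before the next content update and clamped to the scene's full viewing volume; the speed and update-interval figures are illustrative assumptions only.

    import numpy as np

    def estimate_navigation_volume(viewpoint, full_viewing_volume,
                                   max_speed_m_s=1.5, update_interval_s=2.0):
        # Half-extent = distance the viewer could plausibly cover before the
        # next scene update; the result is clamped to the scene's viewing volume.
        reach = max_speed_m_s * update_interval_s
        lo = np.asarray(viewpoint, dtype=float) - reach
        hi = np.asarray(viewpoint, dtype=float) + reach
        scene_lo, scene_hi = full_viewing_volume
        return np.maximum(lo, scene_lo), np.minimum(hi, scene_hi)

    if __name__ == "__main__":
        scene_volume = (np.array([-10.0, 0.0, -10.0]), np.array([10.0, 3.0, 10.0]))
        print(estimate_navigation_volume([1.0, 1.7, 2.0], scene_volume))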
[0122] The viewing client may communicate the navigation volume and the visibility of scene elements to the content server. In addition to signaling the scene elements that are visible (or that are to be displayed for some embodiments), the client may predict which elements may be used in the near future (such as between the present time and a future time t1). The client may communicate to a content server the elements predicted to be displayed in the near future. The client may store information related to the predicted elements in a local memory or cache location. The client may use the same information used to determine the navigation volume to determine the predicted elements.
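As an illustration of predicting which elements may be needed by time t1, the following is a minimal sketch of constant-velocity viewpoint extrapolation, assuming only the two most recent tracked positions are available; a real client might use a richer motion model, and the time horizon is an illustrative parameter.

    import numpy as np

    def predict_viewpoint(prev_pos, curr_pos, dt_s, horizon_s):
        # Linear extrapolation of the tracked viewer position horizon_s ahead.
        prev_pos, curr_pos = np.asarray(prev_pos, float), np.asarray(curr_pos, float)
        velocity = (curr_pos - prev_pos) / dt_s
        return curr_pos + velocity * horizon_s

    print(predict_viewpoint([0.0, 1.7, 0.0], [0.2, 1.7, 0.1], dt_s=0.1, horizon_s=1.0))

The predicted position can then be fed through the same visibility test as the current viewpoint, and any additionally visible elements requested into the local cache.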
[0123] A viewing client may continually repeat a run-time process that includes: receiving scene overview data, adjusting a viewpoint to content, determining (or estimating) a navigation volume, determining (or testing) element visibility, sending navigation volume and element visibility data to the content server, and receiving and rendering the spatial data (box 1120). The run-time process may be continually repeated until a content server indicates an end of the content or a user requests an end of a session.
[0124] In addition to determining the scene elements used and/or available to display, the client may use the same information used for defining the navigation volume to predict which elements may be used in the near future and request to add them to a local cache. If the viewing client has determined the navigation volume and the visibility of scene elements, the viewing client signals this information to the content server. The viewing client receives content streamed from the content server, which may be processed on the server side based on the information the viewing client sent to the content server. If the client receives the content stream, the client updates the viewpoint to the content according to the latest user input and tracking result before rendering the content and sending the rendering data to the display.
[0125] For some embodiments, an example viewing client process executed by a viewing client may include determining a set of one or more partially occluded objects selected from the set of visible objects; and requesting adjustment of resolution used in the rendering information for at least one of the partially occluded objects of the set of one or more of the partially occluded objects. With some embodiments of a viewing client process, requesting adjustment of resolution used in rendering information may include: determining a visibility percentage of the respective partially occluded object; and requesting a decrease in the resolution used in the rendering information for the respective partially occluded object if the visibility percentage of the respective partially occluded object is less than a threshold.
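As an illustration of the threshold test described above, the following is a minimal sketch, assuming a per-object visibility percentage has already been computed (for example, from the occluded fraction of the object's bounding volume) and that requests are plain dicts; the 50 percent threshold is an illustrative assumption.

    def resolution_requests(visible_objects, visibility_pct, threshold_pct=50.0):
        # Build resolution-decrease requests for objects that are visible but
        # occluded enough to fall below the threshold.
        requests = []
        for obj_id in visible_objects:
            pct = visibility_pct.get(obj_id, 100.0)
            if pct < threshold_pct:
                requests.append({"object_id": obj_id,
                                 "visibility_pct": pct,
                                 "action": "decrease_resolution"})
        return requests

    print(resolution_requests(["chair", "table"], {"chair": 30.0, "table": 95.0}))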
[0126] For some embodiments of a viewing client process, at least one of the following items may indicate a change with time: the spatial bounding volume information, the occlusion volume information, and the viewing volume information. In some embodiments of a viewing client process, at least one of the objects of a spatial scene includes two or more sub-objects. With some embodiments of the viewing client process, at least one of the visible objects may be a partially visible object. At least one of the partially visible objects may include a plurality of sub-objects. For each partially visible object that includes a plurality of sub-objects, a respective set of visible sub-objects may include less than all of the plurality of sub-objects of the respective partially visible object. For each partially visible object that includes a plurality of sub-objects, the rendering information may include rendering information only for the respective set of visible sub-objects.
[0127] Some embodiments of a viewing client process may include determining a predicted viewpoint of the user, wherein determining the set of visible objects may be based in part on the predicted viewpoint of the user. For some embodiments, a viewing client process may include determining, for a future time t1, predicted spatial bounding volume information, predicted occlusion volume information, predicted viewing volume information, and a predicted viewpoint of the user. Determining the set of visible objects, for the future time t1, may be based in part on at least one of the following items: the predicted spatial bounding volume information, the predicted occlusion volume information, the predicted viewing volume information, and the predicted viewpoint of the user.
[0128] Some embodiments of a viewing client process may include receiving user viewing position tracking information, wherein determining the viewpoint of the user relative to the spatial scene may be based in part on the user viewing position tracking information. For some embodiments, an example viewing client process executed by a viewing client may include receiving spatial scene boundary information indicating spatial boundaries of the spatial scene, wherein determining a set of visible objects may be based in part on the spatial scene boundary information indicating spatial boundaries of the spatial scene.
[0129] For some embodiments, an example viewing client process executed by a viewing client may include determining a set of one or more partially occluded objects selected from the set of visible objects; and requesting adjustment of resolution used in the rendering information for at least one of the partially occluded objects of the set of one or more of the partially occluded objects.
[0130] FIG. 12 is a flowchart illustrating an example process executed by a viewing client for retrieving and rendering visible scene elements for an example client pull model according to some embodiments. In some embodiments of a client pull model, the viewing client requests content from the content server (box 1202). The viewing client initializes sensor data and collects configuration data (box 1204). Scene overview data is received by the viewing client (box 1206). The client continually adjusts the viewpoint to the data based on the tracking, user input, and scene overview (box 1208). In addition to setting the current viewpoint, the viewing client may predict future viewpoint motion and use future viewpoint predictions to adjust the location and size of the volume for which content navigation (navigation volume) may be enabled (box 1210). If the navigation volume is determined, the viewing client adjusts scene element visibilities (box 1212) as described above. If the scene adjustment has been done based on local criteria, the client may process the content to be streamed by inspecting the segment visibilities based on the per segment bounding, occlusion, and viewing volumes (box 1214). Content processing, for some embodiments, may use occlusion culling. For occlusion culling, each segment bounding volume’s visibility from the available and/or used navigation volume is evaluated by testing if the segment bounding volume is completely occluded by occlusion volumes of other segments and by determining if a per segment viewing volume has overlap with the current navigation volume. If segment streams have been selected and non-visible streams have been omitted in a processing step, the viewing client requests visible streams (box 1216) from the content server. If the client receives the content stream, the client updates the viewpoint to the content according to the latest user input and tracking result (box 1218) before rendering the content (box 1220) and sending the rendering data to the display. If a request to end processing is received, the viewing client process exits. Otherwise, the viewing client process repeats with the receiving of scene overview data.

[0131] In an example process for determining visible objects and displaying visible objects according to some embodiments, a viewing client may receive non-rendering descriptions and position information for a plurality of objects of a spatial scene. Non-rendering descriptions and position information may include a name and location data for each object or element of a spatial scene. A viewing client may receive, from a content server, spatial bounding information, occlusion volume information, and viewing volume information for one or more of the objects of the spatial scene. The example viewing client process may include determining a viewpoint of a user relative to the spatial scene. The viewing client process may determine a set of visible objects for the spatial scene based on at least one of the spatial bounding volume information, the occlusion volume information, the viewing volume information, and the viewpoint of the user. A rendering request may be sent to the content server. The rendering request may include an indication of the set of visible objects. The viewer client may receive rendering information describing the set of visible objects, and the viewing client may use the rendering information to display the set of visible objects from the viewpoint of the user.
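As an illustration of the per segment culling step described for box 1214 above, the following is a minimal sketch, assuming each segment carries an axis-aligned bounding volume and viewing volume as (lo, hi) corner pairs, and that the full-occlusion check is supplied as a predicate (for example, built on a ray/box test such as the one sketched under Content Processing); the structure and names are assumptions for illustration only.

    import numpy as np

    def boxes_overlap(a, b):
        # True if two axis-aligned boxes given as (lo, hi) pairs intersect.
        return bool(np.all(a[0] <= b[1]) and np.all(b[0] <= a[1]))

    def select_visible_segments(segments, navigation_volume, fully_occluded):
        # segments: dict of segment id -> {"bounding": (lo, hi), "viewing": (lo, hi)}
        # fully_occluded: callable(bounding_box) -> True if the bounding volume is
        # hidden by the occlusion volumes of other segments from every viewpoint
        # in the navigation volume.
        visible = []
        for seg_id, seg in segments.items():
            if not boxes_overlap(seg["viewing"], navigation_volume):
                continue   # viewing volume never overlaps where the user can navigate
            if fully_occluded(seg["bounding"]):
                continue   # hidden behind other segments for the whole navigation volume
            visible.append(seg_id)
        return visible

    if __name__ == "__main__":
        nav = (np.zeros(3), np.ones(3) * 2.0)
        segs = {
            "statue": {"bounding": (np.array([1.0, 0.0, 1.0]), np.array([1.5, 1.0, 1.5])),
                       "viewing": (np.zeros(3), np.ones(3) * 5.0)},
            "back_room_art": {"bounding": (np.array([8.0, 0.0, 8.0]), np.array([9.0, 2.0, 9.0])),
                              "viewing": (np.array([7.0, 0.0, 7.0]), np.array([9.0, 3.0, 9.0]))},
        }
        print(select_visible_segments(segs, nav, fully_occluded=lambda box: False))

Only the segments returned by this selection would then be requested as streams from the content server.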
[0132] For some embodiments, an apparatus may include a processor and a non-transitory computer- readable medium storing instructions that are operative, when executed by the processor, to perform a method disclosed herein.
[0133] For some embodiments, a viewing client may receive object occlusion sizing data (which may include one or more of the following items: spatial bounding information, occlusion volume information, and viewing volume information) for one or more objects of a spatial scene. The viewing client may determine a set of visible objects based on the object occlusion sizing data and the viewpoint of the user. The viewing client may retrieve rendering information for each of the visible objects (or partially visible objects). The viewing client may display content for the visible objects using the rendering information.
Further Embodiments
[0134] In some embodiments, a client device performs a method comprising: receiving non-rendering descriptions and position information for a plurality of objects of a spatial scene; receiving spatial bounding volume information for the plurality of objects of the spatial scene; receiving occlusion volume information for at least one object of the plurality of objects of the spatial scene; receiving viewing volume information for at least one object of the plurality of objects of the spatial scene; determining a viewpoint of a user relative to the spatial scene; determining a set of visible objects selected from the plurality of objects based on at least one of the spatial bounding volume information, the occlusion volume information, the viewing volume information, and the viewpoint of the user; sending rendering request information indicating the set of visible objects; receiving rendering information describing the set of visible objects; and displaying the set of visible objects from the viewpoint of the user.

[0135] In some embodiments, the method further includes: determining a set of one or more partially occluded objects selected from the set of visible objects; and requesting adjustment of resolution used in the rendering information for at least one of the partially occluded objects of the set of one or more of the partially occluded objects. Requesting adjustment of resolution used in the rendering information may include: determining a visibility percentage of the respective partially occluded object; and requesting a decrease in the resolution used in the rendering information for the respective partially occluded object if the visibility percentage of the respective partially occluded object is less than a threshold.
[0136] In some embodiments, at least one of the spatial bounding volume information, the occlusion volume information, and the viewing volume information indicates changes with time. In some embodiments, at least one of the plurality of objects comprises a plurality of sub-objects.
[0137] In some embodiments: at least one of the visible objects is a partially visible object; at least one of the partially visible objects comprises a plurality of sub-objects; for each partially visible object comprising a plurality of sub-objects, a respective set of visible sub-objects comprises less than all of the plurality of sub-objects of the respective partially visible object; and for each partially visible object comprising a plurality of sub-objects, the rendering information comprises rendering information only for the respective set of visible sub-objects.
[0138] In some embodiments, the method further includes determining a predicted viewpoint of the user, wherein determining the set of visible objects is further based on the predicted viewpoint of the user.
[0139] In some embodiments, the method further includes: determining a predicted spatial bounding volume information at a future time t1; determining a predicted occlusion volume information at the future time t1; determining a predicted viewing volume information at the future time t1; determining a predicted viewpoint of the user at the future time t1, wherein determining the set of visible objects is further based on, for the future time t1, at least one of the predicted spatial bounding volume information, the predicted occlusion volume information, the predicted viewing volume information, and the predicted viewpoint of the user.
[0140] In some embodiments, the method further includes receiving user viewing position tracking information, wherein determining the viewpoint of the user relative to the spatial scene is based in part on the user viewing position tracking information.
[0141] In some embodiments, the method further includes receiving spatial scene boundary information indicating spatial boundaries of the spatial scene, wherein determining the set of visible objects is further based on the spatial scene boundary information indicating spatial boundaries of the spatial scene.

[0142] In some embodiments, a client device performs a method comprising: receiving spatial bounding volume information for a plurality of objects of a spatial scene; receiving occlusion volume information for at least one object of the plurality of objects of the spatial scene; receiving viewing volume information for at least one object of the plurality of objects of the spatial scene; determining a set of visible objects selected from the plurality of objects based on at least one of the spatial bounding volume information, the occlusion volume information, the viewing volume information, and the viewpoint of the user relative to the spatial scene; retrieving rendering information for the set of visible objects; and displaying the set of visible objects using the rendering information.
[0143] In some embodiments, the method further includes determining a set of one or more partially occluded objects selected from the set of visible objects; and requesting adjustment of resolution used in the rendering information for at least one of the partially occluded objects of the set of one or more of the partially occluded objects. In some embodiments, requesting adjustment of resolution used in the rendering information comprises: determining a visibility percentage of the respective partially occluded object; and requesting a decrease in the resolution used in the rendering information for the respective partially occluded object if the visibility percentage of the respective partially occluded object is less than a threshold.
[0144] In some embodiments, at least one of the spatial bounding volume information, the occlusion volume information, and the viewing volume information indicates changes with time. In some embodiments, at least one of the plurality of objects comprises a plurality of sub-objects.
[0145] In some embodiments: at least one of the visible objects is a partially visible object; at least one of the partially visible objects comprises a plurality of sub-objects; for each partially visible object comprising a plurality of sub-objects, a respective set of visible sub-objects comprises less than all of the plurality of sub-objects of the respective partially visible object; and for each partially visible object comprising a plurality of sub-objects, the rendering information comprises rendering information only for the respective set of visible sub-objects.
[0146] In some embodiments, determining the set of visible objects is further based on a predicted viewpoint of the user.
[0147] In some embodiments, determining the set of visible objects is further based on, for a future time, at least one of a predicted spatial bounding volume information, a predicted occlusion volume information, a predicted viewing volume information, and a predicted viewpoint of the user.
[0148] In some embodiments, the method further includes receiving user viewing position tracking information, wherein determining the viewpoint of the user relative to the spatial scene is based in part on the user viewing position tracking information.

[0149] In some embodiments, the method further includes receiving spatial scene boundary information indicating spatial boundaries of the spatial scene, wherein determining the set of visible objects is further based on the spatial scene boundary information indicating spatial boundaries of the spatial scene.
[0150] In some embodiments, an apparatus comprises a processor configured to perform any of the methods described above. In some embodiments, the processor is configured to perform such methods by providing a computer-readable medium (e.g. a non-transitory computer-readable medium) storing instructions that are operative, when executed by the processor, to perform such methods.
[0151] In some embodiments, a content server performs a method comprising: receiving, from a client, a request for streaming content of a spatial scene; determining non-rendering descriptions and position information for a plurality of objects of the spatial scene; determining spatial bounding volume information for the plurality of objects of the spatial scene; determining occlusion volume information for at least one object of the plurality of objects of the spatial scene; determining viewing volume information for at least one object of the plurality of objects of the spatial scene; sending, to the client, non-rendering descriptions and position information for a plurality of objects of the spatial scene; sending, to the client, spatial bounding volume information for the plurality of objects of the spatial scene; sending, to the client, occlusion volume information for at least one object of the plurality of objects of the spatial scene; sending, to the client, viewing volume information for at least one object of the plurality of objects of the spatial scene; receiving, from the client, rendering request information indicating a set of visible objects selected from the plurality of objects; generating rendering information for the set of visible objects; and sending, to the client, the rendering information for the set of visible objects.
[0152] In some embodiments, the method includes receiving a resolution adjustment request for at least one partially occluded object selected from the set of visible objects; and adjusting a resolution used in the rendering information for at least one respective partially occluded object.
[0153] In some embodiments, the resolution adjustment request indicates a visibility percentage of the respective partially occluded object, and adjusting the resolution used in rendering information for the respective partially occluded object is based on the visibility percentage of the respective partially occluded object.
[0154] In some embodiments, at least one of the spatial bounding volume information, the occlusion volume information, and the viewing volume information indicates changes with time. In some embodiments, at least one of the plurality of objects comprises a plurality of sub-objects.
[0155] In some embodiments: at least one of the visible objects is a partially visible object; at least one of the partially visible objects comprises a plurality of sub-objects; for each partially visible object comprising a plurality of sub-objects, a respective set of visible sub-objects comprises less than all of the plurality of sub-objects of the respective partially visible object; and for each partially visible object comprising a plurality of sub-objects, the rendering information comprises rendering information only for the respective set of visible sub-objects.
[0156] In some embodiments, the method further includes: determining a predicted spatial bounding volume information for the plurality of objects of the spatial scene for a predicted user viewpoint at a future time t1; determining a predicted occlusion volume information for at least one object of the plurality of objects of the spatial scene for the predicted user viewpoint at the future time t1; determining a predicted viewing volume information for at least one object of the plurality of objects of the spatial scene for the predicted user viewpoint at the future time t1; sending, to the client, the predicted spatial bounding volume information for the plurality of objects of the spatial scene at the future time t1; sending, to the client, the predicted occlusion volume information for at least one object of the plurality of objects of the spatial scene at the future time t1; and sending, to the client, the predicted viewing volume information for at least one object of the plurality of objects of the spatial scene at the future time t1.
[0157] In some embodiments, the method further includes receiving, from the client, the predicted user viewpoint at a future time t1.
[0158] In some embodiments, the method further includes receiving user viewing position tracking information and determining the predicted user viewpoint relative to the spatial scene based in part on the user viewing position tracking information.
[0159] In some embodiments, the method further includes sending, to the client, spatial scene boundary information indicating spatial boundaries of the spatial scene.
[0160] In some embodiments, a method includes: receiving object occlusion sizing data for a plurality of objects of a spatial scene; determining a set of visible objects selected from the plurality of objects based on the object occlusion sizing data and the viewpoint of the user; retrieving rendering information for the set of visible objects; and displaying the set of visible objects from a viewpoint of the user using the rendering information, wherein the object occlusion sizing data comprises at least one of the spatial bounding volume information, the occlusion volume information, and the viewing volume information.
[0161] In some embodiments, a method performed at a client device includes: for each of a plurality of virtual objects, receiving (i) information defining a bounding volume that encloses the respective object and (ii) information defining an occlusion volume enclosed within the respective object; for each of the objects, determining whether the object is visible from at least one viewpoint, where an object is determined to be visible if the bounding volume of the object is not obscured from the viewpoint by the occlusion volumes of the other objects; and, in response to the determination, retrieving rendering information only for the objects that are determined to be visible.
[0162] In some embodiments, an object is determined to be visible if the bounding volume of the object is not entirely obscured from the viewpoint by the occlusion volumes of the other objects. In some embodiments, an object is determined to be visible if no more than a threshold amount of the bounding volume of the object is obscured from the viewpoint by the occlusion volumes of the other objects. In some embodiments, an object is determined to be visible if there is at least one viewpoint in a plurality of selected viewpoints at which the bounding volume of the object is not obscured by the occlusion volumes of the other objects. The selected viewpoints may be viewpoints within a defined navigation volume.
[0163] In some embodiments, a method includes: for each of a plurality of virtual objects, determining (i) a bounding volume that encloses the respective object and (ii) an occlusion volume enclosed within the respective object; for each of the objects, determining whether the object is visible from at least one viewpoint, where an object is determined to be visible if the bounding volume of the object is not obscured from the viewpoint by the occlusion volumes of the other objects; and, in response to the determination, rendering only the objects that are determined to be visible.
[0164] While the methods and systems in accordance with some embodiments are discussed in the context of augmented reality (AR), some embodiments may be applied to mixed reality (MR) / virtual reality (VR) contexts as well. Also, although the term “head mounted display (HMD)” is used herein, some embodiments may be applied to a wearable device (which may or may not be attached to the head) capable of, e.g., VR, AR, and/or MR for some embodiments.
[0165] Note that various hardware elements of one or more of the described embodiments are referred to as "modules" that carry out (i.e., perform, execute, and the like) various functions that are described herein in connection with the respective modules. As used herein, a module includes hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more application-specific integrated circuits (ASICs), one or more field programmable gate arrays (FPGAs), one or more memory devices) deemed suitable by those of skill in the relevant art for a given implementation. Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the respective module, and it is noted that those instructions could take the form of or include hardware (i.e., hardwired) instructions, firmware instructions, software instructions, and/or the like, and may be stored in any suitable non-transitory computer-readable medium or media, such as commonly referred to as RAM, ROM, etc.

[0166] Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Claims

CLAIMS What is Claimed:
1. A method comprising:
for at least a first virtual object in a scene, receiving information defining a first bounding volume that substantially encloses the first virtual object;
receiving information defining at least one occlusion volume in the scene;
for at least a first viewpoint relative to the scene, determining whether the first bounding volume is fully occluded by one or more of the occlusion volumes; and
in response to a determination that the first bounding volume is not fully occluded, retrieving first object rendering data for the first virtual object.
2. The method of claim 1, further comprising determining an amount by which the first virtual object is occluded by the at least one occlusion volume, where a resolution level of the retrieved first object rendering data is determined based on the amount by which the bounding volume is occluded.
3. The method of claim 1 or 2, further comprising:
for at least a second virtual object in a scene, receiving information defining a second bounding volume that substantially encloses the second virtual object;
determining whether the second bounding volume is fully occluded by at least one of the occlusion volumes; and
in response to a determination that the second bounding volume is fully occluded by at least one of the occlusion volumes, making a determination not to retrieve object rendering data for the second virtual object.
4. The method of any of claims 1-3, wherein at least one of the occlusion volumes is enclosed within a third virtual object in the scene.
5. The method of any of claims 1-4, wherein a data size of the object rendering data for the first virtual object is greater than a data size of the information defining the bounding volume of the first virtual object.
6. The method of any of claims 1-5, wherein the first viewpoint is within a defined navigation volume, the method further comprising, for a plurality of viewpoints within the defined navigation volume, determining whether the first bounding volume is fully occluded by at least one of the occlusion volumes; wherein the first object rendering data is retrieved in response to a determination that the first bounding volume is not fully occluded for at least one of the viewpoints in the defined navigation volume.
7. The method of any of claims 1-6, wherein the first viewpoint corresponds to a virtual position of a user with respect to the scene.
8. The method of any of claims 1-6, wherein the first viewpoint corresponds to a predicted future position of a user with respect to the scene.
9. The method of any of claims 1-8, wherein the first object rendering data includes a first number of vertex points, and the information defining the first bounding volume comprises a second number of vertex points, the second number being less than the first number.
10. The method of any of claims 1-9, wherein the first object rendering data includes a first number of polygons, and the information defining the first bounding volume comprises a second number of polygons, the second number being less than the first number.
11. The method of any of claims 1-10, further comprising rendering a view of the scene based at least on the first object rendering data.
12. The method of claim 11, further comprising generating a signal representing the rendered view.
13. An apparatus comprising:
a processor configured to perform at least:
for at least a first virtual object in a scene, receiving information defining a first bounding volume that substantially encloses the first virtual object;
receiving information defining at least one occlusion volume in the scene;
for at least a first viewpoint relative to the scene, determining whether the first bounding volume is fully occluded by one or more of the occlusion volumes; and
in response to a determination that the first bounding volume is not fully occluded, retrieving first object rendering data for the first virtual object.
14. The apparatus of claim 13, wherein the processor is further configured to perform:
determining an amount by which the first virtual object is occluded by the at least one occlusion volume, where a resolution level of the retrieved first object rendering data is determined based on the amount by which the bounding volume is occluded.
15. The apparatus of claim 13 or 14, wherein the processor is further configured to perform:
for at least a second virtual object in a scene, receiving information defining a second bounding volume that substantially encloses the second virtual object;
determining whether the second bounding volume is fully occluded by at least one of the occlusion volumes; and
in response to a determination that the second bounding volume is fully occluded by at least one of the occlusion volumes, making a determination not to retrieve object rendering data for the second virtual object.
PCT/US2019/067898 2018-12-28 2019-12-20 System and method for optimizing spatial content distribution using multiple data systems WO2020139766A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862786219P 2018-12-28 2018-12-28
US62/786,219 2018-12-28

Publications (2)

Publication Number Publication Date
WO2020139766A2 true WO2020139766A2 (en) 2020-07-02
WO2020139766A3 WO2020139766A3 (en) 2020-08-06

Family

ID=69185722

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/067898 WO2020139766A2 (en) 2018-12-28 2019-12-20 System and method for optimizing spatial content distribution using multiple data systems

Country Status (1)

Country Link
WO (1) WO2020139766A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115883792A (en) * 2023-02-15 2023-03-31 深圳市完美显示科技有限公司 Cross-space real-scene user experience system using 5G and 8K technologies

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QI, CHARLES ET AL.: "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", PROCEEDINGS OF THE 2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 2017, pages 652-660
QI, CHARLES ET AL.: "Volumetric and Multi-view CNNs for Object Classification on 3D Data", PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2016, pages 5648-5656, XP033021762, DOI: 10.1109/CVPR.2016.609

Also Published As

Publication number Publication date
WO2020139766A3 (en) 2020-08-06

Similar Documents

Publication Publication Date Title
US20220309689A1 (en) System and method for optimizing dynamic point clouds based on prioritized transformations
US11816786B2 (en) System and method for dynamically adjusting level of details of point clouds
CN113424549B (en) System and method for adaptive spatial content streaming with multiple levels of detail and degrees of freedom
US11202051B2 (en) System and method for distributing and rendering content as spherical video and 3D asset combination
US11961264B2 (en) System and method for procedurally colorizing spatial data
US20240087217A1 (en) System and method for hybrid format spatial data distribution and rendering
US11954789B2 (en) System and method for sparse distributed rendering
US20220264080A1 (en) System and method for adaptive lenslet light field transmission and rendering
KR20220004961A (en) Systems and methods for multiplexed rendering of light fields
WO2020139766A2 (en) System and method for optimizing spatial content distribution using multiple data systems
US20220358728A1 (en) Method and device for providing augmented reality (ar) service based on viewer environment
US20240212220A1 (en) System and method for procedurally colorizing spatial data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19839748

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19839748

Country of ref document: EP

Kind code of ref document: A2