WO2025019438A1 - Providing object-based multi-sensory experiences - Google Patents
- Publication number
- WO2025019438A1 (PCT/US2024/038069)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sensory
- environment
- light
- data
- examples
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/4104—Peripherals receiving signals from specially adapted client devices
- H04N21/4131—Peripherals receiving signals from specially adapted client devices home appliance, e.g. lighting, air conditioning system, metering devices
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
Definitions
- the present disclosure relates to providing multi-sensory experiences, some of which include light-based experiences.
- Luminaires, for example, are used extensively as an expression of art and function for concerts. However, each installation is designed specifically for a unique set of luminaires. Delivering a lighting design beyond the set of fixtures the system was designed for is generally not feasible. Other systems that attempt to deliver light experiences more broadly do so simply by algorithmically extending the screen visuals; such experiences are not specifically authored.
- Haptics content is designed for a specific haptics apparatus. If another device, such as a game controller, mobile phone or even a different brand of haptics device is used, there has been no way to translate the creative intent of content to the different actuators.
- At least some aspects of the present disclosure may be implemented via methods, such as audio processing methods.
- the methods may be implemented, at least in part, by a control system such as those disclosed herein.
- Some methods may involve receiving, by a control system, a content bitstream including encoded object-based sensory data.
- the encoded object-based sensory data may include one or more sensory objects and corresponding sensory metadata.
- the encoded object-based sensory data may correspond to sensory effects to be provided by one or more sensory actuators in an environment.
- the sensory effects may include lighting, haptics, airflow, one or more positional actuators, or combinations thereof.
- Some methods may involve extracting, by the control system, the object-based sensory data from the content bitstream and providing, by the control system, the object-based sensory data to a sensory renderer.
- the object-based sensory metadata may include sensory spatial metadata indicating at least a spatial position for rendering the object-based sensory data within the environment, an area for rendering the object-based sensory data within the environment, or combinations thereof. According to some examples, the object-based sensory data may not correspond to particular sensory actuators in the environment. In some examples, the object-based sensory data may include abstracted sensory reproduction information allowing the sensory renderer to reproduce one or more authored sensory effects via one or more sensory actuator types, via various numbers of sensory actuators and from various sensory actuator positions in the environment.
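- As a point of reference only, the following Python sketch illustrates one hypothetical way object-based sensory data and its spatial metadata could be represented in code; the class and field names (e.g., SensoryObject, position, effect_type) are illustrative assumptions, not part of the disclosed bitstream format.

```python
from dataclasses import dataclass, field

@dataclass
class SensorySpatialMetadata:
    # Normalized position within the environment, e.g., each axis in [-1, 1].
    position: tuple = (0.0, 0.0, 0.0)
    # Optional size describing an area/volume for rendering the effect.
    size: tuple = (0.0, 0.0, 0.0)

@dataclass
class SensoryObject:
    # e.g., "light", "haptic", "airflow", "positional"
    effect_type: str
    # Time stamp (seconds) used to synchronize with audio/video.
    timestamp: float
    spatial: SensorySpatialMetadata = field(default_factory=SensorySpatialMetadata)
    # Additional effect-specific properties (color, intensity, priority, layer, ...).
    properties: dict = field(default_factory=dict)

# Example: a red light effect to be rendered near the front-left of the environment.
flash = SensoryObject(
    effect_type="light",
    timestamp=12.5,
    spatial=SensorySpatialMetadata(position=(-0.8, 1.0, 0.2), size=(0.4, 0.4, 0.4)),
    properties={"color": (1.0, 0.1, 0.1), "intensity": 0.9, "priority": 1},
)
print(flash)
```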
- Some methods may involve receiving, by the sensory renderer, the object-based sensory data and environment descriptor data corresponding to one or more locations of sensory actuators in the environment. Some methods may involve receiving, by the sensory renderer, actuator descriptor data corresponding to properties of the sensory actuators in the environment. Some methods may involve providing, by the sensory renderer, one or more actuator control signals for controlling the sensory actuators in the environment to produce one or more sensory effects indicated by the object-based sensory data. Some methods may involve providing, by the one or more sensory actuators in the environment, the one or more sensory effects.
- the content bitstream also may include one or more encoded audio objects synchronized with the encoded object-based sensory data.
- the audio objects may include one or more audio signals and corresponding audio object metadata. Some such methods may involve extracting, by the control system, audio objects from the content bitstream and providing, by the control system, the audio objects to an audio renderer.
- the audio object metadata may include at least audio object spatial metadata indicating an audio object spatial position for rendering the one or more audio signals within the environment.
- Some methods may involve receiving, by the audio renderer, the one or more audio objects, receiving, by the audio renderer, loudspeaker data corresponding to one or more loudspeakers in the environment and providing, by the audio renderer, one or more loudspeaker control signals for controlling the one or more loudspeakers in the environment to play back audio corresponding to the one or more audio objects and synchronized with the one or more sensory effects.
- Some methods may involve playing back, by the one or more loudspeakers in the environment, the audio corresponding to the one or more audio objects.
- the content bitstream may include encoded video data synchronized with the encoded audio objects and the encoded object-based sensory data.
- Some methods may involve extracting, by the control system, video data from the content bitstream and providing, by the control system, the video data to a video renderer.
- Some methods may involve receiving, by the video renderer, the video data and providing, by the video renderer, one or more video control signals for controlling one or more display devices in the environment to present one or more images corresponding to the one or more video control signals and synchronized with the one or more audio objects and the one or more sensory effects.
- Some methods may involve presenting, by the one or more display devices in the environment, the one or more images corresponding to the one or more video control signals.
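- The sketch below illustrates, under assumed packet and renderer interfaces that are not part of this disclosure, how a control system might route decoded sensory, audio and video elements of a content bitstream to their respective renderers while preserving shared time stamps.

```python
# Hypothetical sketch of routing decoded bitstream elements to renderers.
# The packet format and renderer interfaces are illustrative assumptions.

def dispatch_packets(packets, sensory_renderer, audio_renderer, video_renderer):
    """Route each decoded packet to its renderer, preserving time stamps."""
    for packet in packets:
        kind = packet["kind"]          # "sensory", "audio" or "video"
        payload = packet["payload"]    # decoded objects/frames
        ts = packet["timestamp"]       # shared clock for synchronization
        if kind == "sensory":
            sensory_renderer(payload, ts)
        elif kind == "audio":
            audio_renderer(payload, ts)
        elif kind == "video":
            video_renderer(payload, ts)

# Example usage with stub renderers that just log what they receive.
if __name__ == "__main__":
    log = lambda name: (lambda payload, ts: print(f"{name} @ {ts}s: {payload}"))
    packets = [
        {"kind": "audio", "timestamp": 1.0, "payload": "audio object A"},
        {"kind": "sensory", "timestamp": 1.0, "payload": "light object L1"},
        {"kind": "video", "timestamp": 1.0, "payload": "video frame 30"},
    ]
    dispatch_packets(packets, log("sensory"), log("audio"), log("video"))
```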
- the environment may be a virtual environment.
- the environment may be a physical, real-world environment.
- the environment may be a room environment or a vehicle environment.
- Some or all of the operations, functions and/or methods described herein may be performed by one or more devices according to instructions (e.g., software) stored on one or more computer-readable non-transitory media.
- Such non-transitory media may include one or more memory devices such as those described herein, including but not limited to one or more random access memory (RAM) devices, read-only memory (ROM) devices, etc.
- an apparatus may include an interface system and a control system.
- the control system may include one or more general purpose single- or multi-chip processors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable logic devices, discrete gates or transistor logic, discrete hardware components, or combinations thereof.
- the control system may be configured to perform some or all of the disclosed methods.
- Figure 1 is a block diagram that shows examples of components of an apparatus capable of implementing various aspects of this disclosure.
- Figure 2 shows example elements of an endpoint.
- Figure 3 shows examples of actuator elements.
- Figure 4 shows example elements of a system for the creation and playback of multi-sensory (MS) experiences.
- Figure 5 shows example elements of another system for the creation and playback of MS experiences.
- Figure 6 shows an example of a graphical user interface (GUI) that may be presented by a display device of the lightscape creation tool of Figure 5.
- FIG. 7A shows another example of a graphical user interface (GUI) that may be presented by a display device of the lightscape creation tool of Figure 5.
- Figure 7B is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein.
- Figures 8A, 8B and 8C show three examples of projecting lighting of viewing environments onto a two-dimensional (2D) plane.
- Figure 9A shows example elements of a lightscape renderer.
- Figure 9B is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein.
- Figure 10 shows another example of a GUI that may be presented by a display device of a lightscape creation tool.
- Figure 11 is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein.
- Figure 12 shows example elements of a system for the creation and playback of multi-sensory (MS) experiences.
- Figure 13 is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein.
- A and B may mean at least the following: “both A and B”, “at least both A and B”.
- A or B may mean at least the following: “at least A”, “at least B”, “both A and B”, “at least both A and B”.
- A and/or B may mean at least the following: “A and B”, “A or B”.
- This application describes methods for extending the creative palette for content creators, allowing spatial, MS experiences to be created and delivered at scale.
- Some such methods involve the introduction of new layers of abstraction, in order to allow authored MS experiences to be delivered to different endpoints, with different types of fixtures or actuators.
- the term “endpoint” is synonymous with “playback environment” or simply “environment,” meaning an environment that includes one or more actuators that may be used to provide an MS experience.
- Such endpoints may include a room, such as the living room of a home, a car, a cinema, a night club or other venue, etc.
- Some disclosed methods involve the creation, delivery and/or rendering of object-based sensory data, which may include sensory objects and corresponding sensory metadata.
- An MS experience provided via object-based sensory data may be referred to herein as a “flexibly-scaled MS experience.”
- Figure 1 is a block diagram that shows examples of components of an apparatus capable of implementing various aspects of this disclosure.
- the types and numbers of elements shown in Figure 1 are merely provided by way of example. Other implementations may include more, fewer and/or different types and numbers of elements.
- the apparatus 101 may be, or may include, a device that is configured for performing at least some of the methods disclosed herein, such as a smart audio device, a laptop computer, a cellular telephone, a tablet device, a smart home hub, etc.
- the apparatus 101 may be, or may include, a server that is configured for performing at least some of the methods disclosed herein.
- the apparatus 101 includes at least an interface system 105 and a control system 110.
- the control system 110 may be configured for performing, at least in part, the methods disclosed herein.
- the control system 110 may, in some implementations, be configured for receiving, via the interface system 105, a content bitstream including encoded object-based sensory metadata.
- the encoded object-based sensory metadata may correspond to sensory effects such as lighting, haptics, airflow, one or more positional actuators, or combinations thereof, to be provided by a plurality of sensory actuators in an environment.
- the control system 110 may be configured for extracting object-based sensory metadata from the content bitstream and for providing the object-based sensory metadata to a sensory renderer.
- the object-based sensory metadata may include sensory spatial metadata indicating at least a spatial position for rendering the object-based sensory metadata within the environment, an area for rendering the object-based sensory metadata within the environment, or combinations thereof.
- the object-based sensory metadata does not correspond to any particular sensory actuator in the environment.
- the object-based sensory metadata may include abstracted sensory reproduction information allowing the sensory renderer to reproduce authored sensory effects, which also may be referred to herein as intended sensory effects, via various sensory actuator types, via various numbers of sensory actuators and from various sensory actuator positions in the environment.
- the content bitstream also may include encoded audio objects synchronized with the encoded object-based sensory metadata.
- the audio objects may include audio signals and corresponding audio object metadata.
- the control system 110 may be configured for extracting audio objects from the content bitstream and for providing the audio objects to an audio renderer.
- the audio object metadata may include at least audio object spatial metadata indicating an audio object spatial position for rendering the audio signals within the environment.
- the interface system 105 may include one or more network interfaces and/or one or more external device interfaces (such as one or more universal serial bus (USB) interfaces).
- the interface system 105 may include one or more wireless interfaces.
- the interface system 105 may include one or more devices for implementing a user interface, such as one or more microphones, one or more speakers, a display system, a touch sensor system and/or a gesture sensor system.
- the interface system 105 may include one or more interfaces between the control system 110 and a memory system, such as the optional memory system 115 shown in Figure 1.
- the control system 110 may include a memory system in some instances.
- the control system 110 may, for example, include a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, and/or discrete hardware components.
- control system 110 may reside in more than one device.
- a portion of the control system 110 may reside in a device within an environment (such as a laptop computer, a tablet computer, a smart audio device, etc.) and another portion of the control system 110 may reside in a device that is outside the environment, such as a server.
- a portion of the control system 110 may reside in a device within an environment and another portion of the control system 110 may reside in one or more other devices of the environment.
- Non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc.
- the one or more non-transitory media may, for example, reside in the optional memory system 115 shown in Figure 1 and/or in the control system 110.
- the software may, for example, include instructions for controlling at least one device to process audio data.
- the software may, for example, be executable by one or more components of a control system such as the control system 110 of Figure 1.
- the apparatus 101 may include the optional microphone system 120 shown in Figure 1.
- the optional microphone system 120 may include one or more microphones.
- one or more of the microphones may be part of, or associated with, another device, such as a speaker of the speaker system, a smart audio device, etc.
- the apparatus 101 may include the optional actuator system 125 shown in Figure 1.
- the optional actuator system 125 may include one or more loudspeakers, one or more haptic devices, one or more light fixtures, also referred to herein as luminaires, one or more fans or other air-moving devices, one or more display devices, including but not limited to one or more televisions, one or more positional actuators, one or more other types of devices for providing an MS experience, or combinations thereof.
- the term “light fixture” as used herein refers generally to various types of light sources, including individual light sources, groups of light sources, light strips, etc.
- a “light fixture” may be moveable, and therefore the word “fixture” in this context does not mean that a light fixture is necessarily in a fixed position in space.
- the term “positional actuators” as used herein refers generally to devices that are configured to change a position or orientation of a person or object, such as motion simulator seats. Loudspeakers may sometimes be referred to herein as “speakers.”
- the optional actuator system 125 may include a display system including one or more displays, such as one or more light-emitting diode (LED) displays, one or more organic light-emitting diode (OLED) displays, etc.
- the optional sensor system 130 may include a touch sensor system and/or a gesture sensor system proximate one or more displays of the display system.
- the control system 110 may be configured for controlling the display system to present a graphical user interface (GUI), such as a GUI related to implementing one of the methods disclosed herein.
- the apparatus 101 may include the optional sensor system 130 shown in Figure 1.
- the optional sensor system 130 may include a touch sensor system, a gesture sensor system, one or more cameras, etc.
- This application describes methods for creating and delivering a flexibly-scaled multi-sensory (MS) immersive experience (MSIE) to different playback environments, which also may be referred to herein as endpoints.
- endpoints may include a room, such as the living room of a home, a car, a cinema, a night club or other venue, an AR/VR headset, a PC, a mobile device, etc.
- Figure 2 shows example elements of an endpoint.
- the endpoint is a living room 1001 containing multiple actuators 008, some furniture 1010 and a person 1000 — also referred to herein as a user — who will consume a flexibly-scaled MS experience.
- Actuators 008 are devices capable of altering the environment 1001 that the user 1000 is in. Actuators 008 may include one or more televisions or other display devices, one or more luminaires — also referred to herein as light fixtures — one or more loudspeakers, etc.
- the number of actuators 008, the arrangement of actuators 008 and the capabilities of actuators 008 in the space 1001 may vary significantly between different endpoint types.
- the number, arrangement and capabilities of actuators 008 in a car will generally be different from the number, arrangement and capabilities of actuators 008 in a living room, a night club, etc.
- the number, arrangement and/or capabilities of actuators 008 may vary significantly between different instances of the same type, e.g., between a small living room with 2 actuators 008 and a large living room with 16 actuators 008.
- the present disclosure describes various methods for creating and delivering flexibly-scaled MSIEs to these non-homogeneous endpoints.
- FIG. 3 shows examples of actuator elements.
- the actuator is a luminaire 1100, which includes a network module 1101, a control module 1102 and a light emitter 1103.
- the light emitter 1103 includes one or more light-emitting devices, such as light-emitting diodes, which are configured to emit light into an environment in which the luminaire 1100 resides.
- the network module 1101 is configured to provide network connectivity to one or more other devices in the space, such as a device that sends commands to control the emission of light by the luminaire 1100.
- the control module 1102 is configured to receive signals via the network module 1101 and to control the light emitter 1103 accordingly.
- actuators also may include a network module 1101 and a control module 1102, but may include other types of actuating elements. Some such actuators may include one or more loudspeakers, one or more haptic devices, one or more fans or other air-moving devices, one or more positional actuators, one or more display devices, etc.
- Figure 4 shows example elements of a system for the creation and playback of multi-sensory (MS) experiences.
- system 300 may be, or may include, one or more devices configured for performing at least some of the methods disclosed herein.
- system 300 may include one or more instances of the control system 110 of Figure 1 that are configured for performing at least some of the methods disclosed herein.
- creating and providing an object-based MS Immersive Experience (MSIE) involves the application of a suite of technologies for creation, delivery and rendering of object-based sensory data, which may include sensory objects and corresponding sensory metadata, to the actuators 008.
- Object-Based Representation: In various disclosed implementations, multi-sensory (MS) effects are represented using what may be referred to herein as “sensory objects.” According to some such implementations, properties such as layer-type and priority may be assigned to and associated with each sensory object, enabling content creators’ intent to be represented in the rendered experiences. Detailed examples of sensory object properties are described below.
- system 300 includes a content creation tool 000 that is configured for designing multi-sensory (MS) immersive content and for outputting object-based sensory data 005, either separately or in conjunction with corresponding audio data 011 and/or video data 012, depending on the particular implementation.
- the object-based sensory data 005 may include time stamp information, as well as information indicating the type of sensory object, the sensory object properties, etc.
- the object-based sensory data 005 is not “channel-based” data that corresponds to one or more particular sensory actuators in a playback environment, but instead is generalized for a wide range of playback environments with a wide range of actuator types, numbers of actuators, etc.
- the object-based sensory data 005 may include object-based light data, object-based haptic data, object-based air flow data, object-based positional actuator data, object-based olfactory data, object-based smoke data, object-based data for one or more other types of sensory effects, or combinations thereof.
- the object-based sensory data 005 may include sensory objects and corresponding sensory metadata.
- the object-based light data may include light object position metadata, light object color metadata, light object size metadata, light object intensity metadata, light object shape metadata, light object diffusion metadata, light object gradient metadata, light object priority metadata, light object layer metadata, or combinations thereof.
- the content creation tool 000 is shown providing a stream of object-based sensory data 005 to the experience player 002 in this example; in alternative examples, the content creation tool 000 may produce object-based sensory data 005 that is stored for subsequent use. Examples of graphical user interfaces for a light-object-based content creation tool are described below.
- MS Object Renderer: Various disclosed implementations provide a renderer that is configured to render MS effects to actuators in a playback environment.
- system 300 includes an MS renderer 001 that is configured to render object-based sensory data 005 to actuator control signals 310, based at least in part on environment and actuator data 004.
- the MS renderer 001 is configured to output the actuator control signals 310 to MS controllers 003, which are configured to control the actuators 008.
- the MS renderer 001 may be configured to receive light objects and object-based lighting metadata indicating an intended lighting environment, as well as lighting information regarding a local lighting environment.
- the lighting information is one general type of environment and actuator data 004, and may include one or more characteristics of one or more controllable light sources in the local lighting environment.
- the MS renderer 001 may be configured to determine a drive level for each of the one or more controllable light sources that approximates the intended lighting environment.
- Some alternative examples may include a separate renderer for each type of actuator 008, such as one renderer for light fixtures, another renderer for haptic devices, another renderer for air flow devices, etc.
- the MS renderer 001 (or one of the MS controllers 003) may be configured to output the drive level to at least one of the controllable light sources.
- the MS renderer 001 may be configured to adapt to changing conditions.
- the environment and actuator data 004 may include what are referred to herein as “room descriptors” that describe actuator locations (e.g., according to an x,y,z coordinate system or a spherical coordinate system).
- the environment and actuator data 004 may indicate actuator orientation and/or placement properties (e.g., directional and north-facing, omnidirectional, occlusion information, etc.).
- the environment and actuator data 004 may indicate actuator orientation and/or placement properties according to a 3x3 matrix, in which three elements (for example, the elements of the first row) represent spatial position (x,y,z), three other elements (for example, the elements of the second row) represent orientation (roll, pitch, yaw), and three other elements (for example, the elements of the third row) indicate a scale or size (sx, sy, sz).
- the environment and actuator data 004 may include device descriptors that describe the actuator properties relevant to the MS renderer 001, such as intensity range and color gamut of a light fixture, the air flow speed range and direction(s) for an air-moving device, etc.
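- The following sketch shows one hypothetical encoding of such environment and actuator data, using the 3x3 placement matrix convention described above (position; roll, pitch, yaw; scale); the dictionary layout and descriptor field names are assumptions for illustration.

```python
import numpy as np

# Hypothetical sketch of "environment and actuator data". Row 0 of the
# placement matrix holds position (x, y, z), row 1 orientation (roll, pitch,
# yaw) and row 2 scale (sx, sy, sz), as in the example in the text.

def actuator_placement(position, orientation_rpy, scale):
    """Pack position, orientation and scale into the 3x3 placement matrix."""
    return np.array([position, orientation_rpy, scale], dtype=float)

light_strip = {
    "id": "light_strip_tv",
    "type": "light_fixture",
    "placement": actuator_placement(
        position=(0.0, 2.4, 1.1),          # meters, relative to room origin
        orientation_rpy=(0.0, 0.0, 0.0),   # radians
        scale=(1.8, 0.02, 0.02),           # physical extent of the strip
    ),
    # Device descriptor: properties relevant to the renderer.
    "intensity_range": (0.0, 800.0),       # lumens
    "color_gamut": "sRGB",
}

fan = {
    "id": "ceiling_fan",
    "type": "airflow",
    "placement": actuator_placement((1.5, 1.5, 2.6), (0.0, 0.0, 0.0), (0.6, 0.6, 0.1)),
    "flow_speed_range": (0.0, 3.0),        # m/s
    "directions": ["down"],
}

environment_and_actuator_data = {"room_size_m": (3.0, 5.0, 2.7),
                                 "actuators": [light_strip, fan]}
print(environment_and_actuator_data["actuators"][0]["placement"])
```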
- system 300 includes an experience player 002 that is configured to receive object-based sensory data 005’, audio data 011’ and video data 012’, to provide the object-based sensory data 005 to the MS renderer 001, to provide the audio data 011 to the audio renderer 006 and to provide the video data 012 to the video renderer 007.
- the reference numbers for the object-based sensory data 005’, audio data 011’ and video data 012’ received by the experience player 002 include primes (‘), in order to suggest that the data may in some instances be encoded.
- the object-based sensory data 005, audio data 011 and video data 012 output by the experience player 002 do not include primes, in order to suggest that the data may in some instances have been decoded by the experience player 002.
- the experience player 002 may be a media player, a game engine, a personal computer or mobile device, or a component integrated in a television, DVD player, sound bar, set-top box, or a service provider media device such as a Chromecast, Apple TV device, or Amazon Fire TV.
- the experience player 002 may be configured to receive encoded object-based sensory data 005’ along with encoded audio data 011’ and/or encoded video data 012’.
- the encoded object-based sensory data 005’ may be received as part of the same bitstream with the encoded audio data 011’ and/or the encoded video data 012’.
- the experience player 002 may be configured to extract the object-based sensory data 005’ from the content bitstream and to provide decoded object-based sensory data 005 to the MS renderer 001, to provide decoded audio data 011 to the audio renderer 006 and to provide decoded video data 012 to the video renderer 007.
- time stamp information in the object-based sensory data 005’ may be used — for example, by the experience player 002, the MS renderer 001, the audio renderer 006, the video renderer 007, or all of them — to synchronize effects relating to the object-based sensory data 005’ with the audio data 011’ and/or the video data 012’, which may also include time stamp information.
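- A minimal sketch of time-stamp-based synchronization is shown below; the scheduler class and its interface are assumptions for illustration, not the disclosed mechanism.

```python
import heapq

# Sketch of synchronizing object-based sensory effects to a shared media
# clock using their time stamps (illustrative assumption only).

class SensoryScheduler:
    def __init__(self):
        self._queue = []  # min-heap ordered by effect timestamp

    def add(self, timestamp, effect):
        heapq.heappush(self._queue, (timestamp, effect))

    def due_effects(self, media_clock_s):
        """Return all effects whose time stamp has been reached by the clock."""
        due = []
        while self._queue and self._queue[0][0] <= media_clock_s:
            due.append(heapq.heappop(self._queue)[1])
        return due

sched = SensoryScheduler()
sched.add(10.0, "light object: blue wash")
sched.add(10.0, "haptic object: rumble")
sched.add(12.0, "airflow object: gust")
print(sched.due_effects(10.0))  # both 10.0 s effects fire with the audio/video at 10.0 s
```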
- system 300 includes MS controllers 003 that are configured to communicate with a variety of actuator types using application program interfaces (APIs) or one or more similar interfaces. Generally speaking, each actuator will require a specific type of control signal to produce the desired output from the renderer.
- the MS controllers 003 are configured to map outputs from the MS renderer 001 to control signals for each actuator.
- a Philips Hue™ light bulb receives control information in a particular format to turn the light on, with a particular saturation, brightness and hue, and a digital representation of the desired drive level.
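- The sketch below illustrates the general kind of mapping an MS controller might perform, converting a renderer output (an RGB color and a drive level) into a hue/saturation/brightness payload of the style used by some smart bulbs; the exact message format, value ranges and transport are vendor-specific, and the ranges shown here are assumptions for illustration only.

```python
import colorsys

# Illustrative sketch of a controller-side mapping from renderer output to a
# generic hue/sat/bri payload. The 16-bit hue and 0-254 ranges are assumptions,
# not a statement of any particular vendor's API.

def rgb_level_to_hsb_payload(rgb, level):
    """Map renderer output (linear RGB in [0,1] plus drive level) to a payload."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)            # h, s, v in [0, 1]
    return {
        "on": level > 0.0,
        "hue": int(h * 65535),                     # assumed 16-bit hue range
        "sat": int(s * 254),                       # assumed 0-254 range
        "bri": int(max(0.0, min(1.0, v * level)) * 254),
    }

print(rgb_level_to_hsb_payload((1.0, 0.2, 0.1), level=0.8))
```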
- room descriptors also may describe the size and orientation of the playback environment itself, to establish a relative or absolute coordinate system to which all objects are positioned.
- a display screen may be regarded as the front, in some instances the front and center, and the floor and ceiling may be regarded as the vertical bounds.
- the room descriptors may also indicate bounds corresponding with the left, right, front and rear walls relative to the front position.
- the room descriptor also may be provided in terms of a matrix, such as a 3x3 matrix. This room descriptor information is useful in describing the physical dimensions of the playback environment, for example in physical units of distance such as meters.
- sensory object locations, sensory object sizes, and sensory object orientations may be described in units that are relative to the room size, for example in a range from -1 to 1.
- Room descriptors may also describe a preferred viewing position, in some instances according to a matrix.
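- Assuming the convention that each axis of a room-relative position runs from -1 to 1 between opposite room bounds, a conversion to physical coordinates might look like the following sketch; the axis convention is an illustrative assumption.

```python
# Minimal sketch of converting a sensory object position expressed in
# room-relative units (each axis in [-1, 1], as described above) into physical
# coordinates, given a room descriptor in meters.

def relative_to_physical(rel_pos, room_size_m):
    """Map each axis from [-1, 1] to [0, room dimension] in meters."""
    return tuple((r + 1.0) / 2.0 * dim for r, dim in zip(rel_pos, room_size_m))

room_size_m = (4.0, 6.0, 2.5)              # width, depth, height
print(relative_to_physical((-1.0, 1.0, 0.0), room_size_m))  # (0.0, 6.0, 1.25)
```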
- actuators 008 may include lights and/or light strips (also referred to herein as “luminaires”), vibrational motors, air flow generators, positional actuators, or combinations thereof.
- audio data 011 and video data 012 are rendered by the audio Tenderer 006 and the video Tenderer 007 to the loudspeakers 009 and display devices 010, respectively.
- the system 300 may include one or more instances of the control system 110 of Figure 1 that are configured for performing at least some of the methods disclosed herein.
- one instance of the control system 110 may implement the content creation tool 000 and another instance of the control system 110 may implement the experience player 002.
- one instance of the control system 110 may implement the audio Tenderer 006, the video Tenderer 007, the multi-sensory Tenderer 001, or combinations thereof.
- an instance of the control system 110 that is configured to implement the experience player 002 may also be configured to implement the audio Tenderer 006, the video Tenderer 007, the multi-sensory Tenderer 001, or combinations thereof.
- Object-based MS rendering involves different modalities being rendered flexibly to the endpoint/playback environment. Endpoints have differing capabilities according to various factors, including but not limited to the following:
- the kinds of actuators present, e.g., light fixture vs. air flow control device vs. haptic device; and
- the types of those actuators, e.g., a white smart light vs. an RGB smart light, or a haptic vest vs. a haptic seat cushion.
- Object-based haptics content conveys sensory aspects of the scene through an abstract sensory representation rather than through a channel-based scheme only. For example, instead of defining haptics content only as a single-channel, time-dependent amplitude signal that is in turn played out of a particular haptics actuator, such as a vibro-tactile motor in a vest the user wears, object-based haptics content may be defined by the sensations that it is intended to convey. More specifically, in one example, we may have a haptic object representing a collision haptic sensory effect. Associated with this object is:
- the haptic object’s spatial location
- a haptic object of this type may be created automatically in an interactive experience such as a video game, e.g. in a car racing game when another car hits a player’s car from behind.
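- The following is a hypothetical sketch of how a game engine might emit such a collision haptic object; the class and field names, and the normalization of impact speed to intensity, are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical sketch of a game emitting a collision haptic object when
# another car hits the player's car from behind, as in the example above.

@dataclass
class HapticObject:
    effect: str            # e.g., "collision"
    position: tuple        # location of the impact relative to the player/scene
    intensity: float       # 0..1, derived from the physics of the collision
    timestamp: float       # media/game clock time in seconds

def on_car_collision(impact_speed_mps, impact_position, game_time_s):
    """Create a collision haptic object scaled by the impact speed."""
    intensity = min(1.0, impact_speed_mps / 20.0)  # assumed normalization
    return HapticObject("collision", impact_position, intensity, game_time_s)

# A rear impact at 12 m/s, behind the player (negative y in this sketch).
print(on_car_collision(12.0, (0.0, -1.0, 0.0), game_time_s=95.2))
```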
- the MS renderer will determine how to render the spatial modality of this effect to the set of haptic actuators in the endpoint. In some examples, the renderer does this according to information about the following:
- the type of each haptic device, e.g., haptic vest vs. haptic glove vs. haptic seat cushion vs. haptic controller;
- the location of each haptic device with respect to the user(s) (some haptic devices may not be coupled to the user(s), e.g., a floor- or seat-mounted shaker);
- the kind of feedback each haptic device provides, e.g., kinesthetic vs. vibro-tactile;
- the number and placement of actuators within each haptic device, e.g., a haptic vest may have dozens of addressable haptics actuators distributed over the user’s torso;
- the spatial and temporal frequency spectra of the shockwave effect are authored according to the type of material the virtual cars are intended to be made of, amongst other virtual world properties.
- the renderer then renders this shockwave through the set of haptics devices in the endpoint, according to the shockwave vector and the physical location of the haptics devices relative to the user.
- the signals sent to each specific actuator are preferably provided so that the sensory effect is congruent across all of the (potentially heterogenous) actuators available.
- the renderer may not render very high frequencies to just one of the haptic actuators (e.g., the haptic arm band) due to capabilities lacking in other actuators. Otherwise, as the shockwave moves through the player’s body, because the haptic vest and haptic gloves the user is wearing do not have the capability to render such high frequencies, there would be a degradation of the haptic effect perceived by the user as the wave moves through the vest, into the arm band and finally into the gloves.
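- The sketch below illustrates, under assumed device descriptors, two of the ideas described above: deriving per-device onset times from the shockwave vector and each haptic device's position, and limiting the rendered bandwidth to the least-capable device so the effect remains congruent across heterogeneous actuators. The device list, positions and frequency limits are illustrative assumptions.

```python
import math

# Sketch of planning a congruent shockwave across heterogeneous haptic devices.

devices = [
    {"name": "vest",     "position": (0.0, 0.0, 1.2), "max_freq_hz": 250.0},
    {"name": "arm_band", "position": (0.3, 0.0, 1.3), "max_freq_hz": 1000.0},
    {"name": "gloves",   "position": (0.5, 0.2, 1.1), "max_freq_hz": 300.0},
]

def shockwave_plan(origin, direction, speed_mps, devices):
    """Return per-device onset delay plus a common low-pass cutoff."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = tuple(d / norm for d in direction)
    cutoff = min(dev["max_freq_hz"] for dev in devices)   # congruence across devices
    plan = []
    for dev in devices:
        # Distance the wavefront travels along its direction to reach the device.
        rel = tuple(p - o for p, o in zip(dev["position"], origin))
        along = sum(r * u for r, u in zip(rel, unit))
        plan.append({"device": dev["name"],
                     "onset_delay_s": max(0.0, along / speed_mps),
                     "lowpass_cutoff_hz": cutoff})
    return plan

for entry in shockwave_plan((0.0, -1.0, 1.2), (0.0, 1.0, 0.0), speed_mps=5.0, devices=devices):
    print(entry)
```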
- Some types of abstract haptic effects include:
- Barrier effects, such as haptic effects which are used to represent spatial limitations of a virtual world, for example in a video game. If there are kinesthetic actuators on input devices (e.g., force feedback on a steering wheel or joystick), either active or resistive, then rendering of such an effect can be done through the resistive force applied to the user’s input. If no such actuators are available in the endpoint then in some examples vibro-tactile feedback may be rendered that is congruent with the collision of the in-game avatar with a barrier;
- Presence, for example to indicate the presence of a large object approaching the scene, such as a train.
- This type of haptic effect may be rendered using a low time-frequency rumbling of some haptic devices’ actuators.
- This type of haptic effect may also be rendered through contact spatial feedback applied as pressure from air-cuffs;
- this type of haptic effect may be rendered to the closest actuator on the body of the user that performed the click, for example haptic gloves that the user is wearing.
- this type of haptic effect may also be rendered to a shaker coupled to the chair in which the user is sitting.
- This type of haptic effect may, for example, be defined using time-dependent amplitude signals. However, such signals may be altered (modulated, frequency-shifted, etc.) in order to best suit the haptic device(s) that will be providing the haptic effect;
- Motion effects: these haptic effects are designed so that the user perceives some form of motion.
- These haptic effects may be rendered by an actuator that actually moves the user, e.g. a moving platform/seat.
- an actuator may provide a secondary modality (via video, for example) to enhance the motion being rendered;
- Triggered sequences: These haptic effects are characterized mainly by their time-dependent amplitude signals. Such signals may be rendered to multiple actuators and may be augmented when doing so. Such augmentations may include splitting a signal in either time or frequency across multiple actuators. Some examples may involve augmenting the signal itself so that the sum of the haptic actuator outputs does not match the original signal.
- Spatial effects are those which are constructed in a way that convey some spatial information of the multi-sensory scene being rendered. For example, if the playback environment is a room, a shockwave moving through the room would be rendered differently to each haptic device given its location within the room, according to the position and size of one or more haptic objects being rendered at a particular time.
- Non-spatial effects may, in some examples, target particular locations on the user regardless of the user’s location or orientation.
- a haptic device that provides a swelling vibration on the user’s back to indicate immediate danger.
- a haptic device that provides a sharp vibration to indicate an injury to a particular body area.
- Some effects may be non-diegetic effects. Such effects are typically associated with user interface feedback, such as a haptic sensation to indicate the user completed a level or has clicked a button on a menu item. Non-diegetic effects may be either spatial or non-spatial.
- Haptic Device Type
- haptics device data indicating that the user is wearing both haptic gloves and a vibro-tactile vest — or at least local haptics device data indicating that haptic gloves and a vibro-tactile vest are present in the playback environment — allows the renderer to render a congruent recoil effect across the two devices when a user shoots a gun in a virtual world.
- the actual actuator control signals sent to the haptic devices may be different than in the situation where only a single device is available. For example, if the user is only wearing a vest, the actuator control signals used to actuate the vest may differ with regard to the timing of the onset, the maximum amplitude, frequency and decay time of the actuator control signals, or combinations thereof.
- Haptic devices can provide a range of different actuations and thus perceived sensations. These are typically classed in two basic categories:
- Kinesthetic, e.g., resistive or active force feedback; and
- Vibro-tactile, e.g., vibration-based feedback.
- Either category of actuations may be static or dynamic, where dynamic effects are altered in real time according to some sensor input. Examples include a touch screen rendering a texture using a vibro-tactile actuator and a position sensor measuring the user’s finger position(s). Moreover, the physical construction of such actuators varies widely and affects many other attributes of the device. An example of this is the onset delay or time-frequency response, which varies significantly across different haptic device types.
- the renderer should be configured to account for the onset delay of a particular haptics device type when rendering signals to be actuated by the haptics devices in the endpoint.
- the onset delay of the haptic device refers to the delay between the time that an actuator control signal is sent to the device and the device’s physical response.
- the off-set delay refers to the delay between the time that an actuator control signal is sent to zero the output of the device and the time the device stops actuating.
- the time-frequency response refers to the frequency range of the signal amplitude as a function of time that the haptic device can actuate at steady state.
- the spatial-frequency response refers to the frequency range of the signal amplitude as a function of the spacing of actuators of a haptic device. Devices with closely-spaced actuators have higher spatial-frequency responses.
- Dynamic range refers to the differences between the minimum and maximum amplitude of the physical actuation.
- Some dynamic effects use sensors to update the actuation signal as a function of some observed state.
- the sampling frequencies, both temporal and spatial along with the noise characteristics will limit the capability of the control loop updating the actuator providing the dynamic effect.
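- One simple way a renderer could use an onset-delay descriptor is sketched below: the actuator control signal is scheduled earlier by the device's onset delay so that the perceived effect lands at the intended time. The descriptor fields and values are illustrative assumptions.

```python
# Minimal sketch of onset-delay compensation based on a per-device descriptor.

device_descriptors = {
    "seat_shaker":  {"onset_delay_s": 0.040, "dynamic_range_db": 30.0},
    "haptic_glove": {"onset_delay_s": 0.005, "dynamic_range_db": 20.0},
}

def schedule_command(device_name, intended_effect_time_s):
    """Return the time at which the control signal should be sent."""
    delay = device_descriptors[device_name]["onset_delay_s"]
    return intended_effect_time_s - delay

for name in device_descriptors:
    print(name, "-> send at", schedule_command(name, intended_effect_time_s=10.0), "s")
```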
- Airflow: Another modality that some multi-sensory immersive experiences (MSIE) may use is airflow.
- the airflow may, for example, be rendered congruently with one or more other modalities such as audio, video, light-effects and/or haptics.
- some airflow effects may be provided at other endpoints that may typically include airflow, such as a car or a living room.
- the airflow sensory effects may be represented as an airflow object that may include properties such as:
- Some examples of air flow objects may be used to represent the movement of a bird flying past.
- the MS renderer 001 may be provided with information regarding:
- the types of airflow devices, e.g., fan, air conditioning, heating;
- the capabilities of each airflow device, e.g., the airflow device’s ability to control direction, airflow and temperature;
- the object-based metadata can be used to create experiences such as:
- the MS renderer 001 may cause an increasing air temperature as a player enters a “lava level” or other hot area during a game.
- Some examples may include other elements, such as confetti in the air vents to celebrate an event, such as a goal scored by the user’s favorite football team.
- airflow may be synchronized to the breathing rhythm of a guided meditation in one example.
- airflow may be synchronized to the intensity of a workout, with increased airflow or decreased temperature as intensity increases.
- there may be relatively less control over spatial aspects during rendering. For example, many existing airflow actuators are optimized for heating and/or air conditioning rather than for providing spatially diverse sensory actuation.
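- The following sketch shows one possible, simplified mapping of a spatial airflow object onto an array of individually controllable vents (such as the addressable air-flow bar described later), weighting each vent by its distance from the object's position; it is an illustration, not the disclosed renderer, and the vent layout and falloff are assumptions.

```python
import math

# Sketch of rendering a spatial airflow object to an array of vents.

VENT_POSITIONS_X = [-0.75, -0.25, 0.25, 0.75]   # normalized positions along the bar

def render_airflow(object_x, intensity, spread=0.3):
    """Return a fan level in [0, 1] for each vent using a Gaussian falloff."""
    levels = []
    for vx in VENT_POSITIONS_X:
        weight = math.exp(-((vx - object_x) ** 2) / (2.0 * spread ** 2))
        levels.append(round(min(1.0, intensity * weight), 3))
    return levels

# A gust passing the left side of the user at 80% intensity.
print(render_airflow(object_x=-0.6, intensity=0.8))
```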
- the modalities supported by these actuators include the following:
- Haptics, including:
  - Steering wheel: tactile vibration feedback;
  - Dash touchscreen: tactile vibration feedback and texture rendering; and
  - Seats: tactile vibrations and movement.
- a live music stream is being rendered to four users sitting in the front seats.
- the MS renderer 001 attempts to optimize the experience for multiple viewing positions.
- the content contains:
- the light content contains ambient light objects that are moving slowly around the scene. These may be rendered using one of the ambient layer methods disclosed herein, for example such that there is no spatial priority given to any user’s perspective.
- the haptic content may be spatially concentrated in the lower time-frequency spectrum and may be rendered only by the vibro-tactile motors in the floor mats.
- pyrotechnic events during the music stream correspond to multi-sensory content including:
- the MS renderer 001 renders both the light objects and the haptic objects spatially.
- Light objects may, for example, be rendered in the car such that each person in the car perceives the light objects to come from the left if the pyrotechnics content is located at the left of the scene. In this example, only lights on the left of the car are actuated.
- Haptics may be rendered across both the seats and floor mats in a way that conveys directionality to each user individually.
- the pyrotechnics are present in the audio content and both pyrotechnics and confetti are present in the video content.
- the effect of the confetti firing may be rendered using the airflow modality.
- the individually controllable air flow vents of the HVAC system may be pulsed.
- a haptics vest that the user — also referred to as a player — is wearing;
- An addressable air-flow bar which includes an array of individually controllable fans directed to the user (similar to HVAC vents in the front dashboard of a car).
- the user is playing a first person shooter game and the game contains a scene in which a destructive hurricane moves through the level.
- in-game objects are thrown around and some hit the player.
- Haptics objects rendered by the MS renderer 001 cause a shockwave effect to be provided through all of the haptics devices that the user can perceive.
- the actuator control signals sent to each device may be optimized according to the intensity of the impact of the in-game objects, the direction(s) of the impact and the capabilities and location of each actuator (as described earlier).
- the multi-sensory content contains a haptic object corresponding to a non-spatial rumble, one or more airflow objects corresponding to directional airflow; and one or more light objects corresponding to lightning.
- the MS renderer 001 renders the non-spatial rumble to the haptics devices.
- the actuator control signals sent to each haptics device may be rendered such that the ensemble of actuator control signals across the haptics array is congruent in perceived onset time, intensity and frequency.
- the frequency content of the actuator control signals sent to the smart watch may be low-pass filtered, so that they are congruent with the frequency-limited capability of the vest, which is proximate to the watch.
- the MS renderer 001 may render the one or more airflow objects to actuator control signals for the AFB such that the air flow in the room is congruent with the location and look direction of the player in the game, as well as the hurricane direction itself.
- Lightning may be rendered across all modalities as (1) a white flash across lights that are located in suitable locations, e.g., in or on the ceiling; and (2) an impulsive rumble in the user’s wearable haptics and seat shaker.
- a directional shockwave may be rendered to the haptics devices.
- a corresponding airflow impulse may be rendered.
- a damage-taken effect, indicating the amount of damage caused to the player by being struck by the in-game object, may be rendered by the lights.
- signals may be rendered spatially to the haptics devices such that a perceived shockwave moves across the player’s body and the room.
- the MS renderer 001 may provide such effects according to actuator location information indicating the haptics devices’ locations relative to one another.
- the MS renderer 001 may provide the shockwave vector and position according to the actuator location information in addition to actuator capability information.
- a non-directional air flow impulse may be rendered, e.g., all the air vents of the AFB may be turned up briefly to reinforce the haptic modality.
- a red vignette may be rendered to the light strip surrounding the TV, indicating to the player that the player took damage in the game.
- Figure 5 shows example elements of another system for the creation and playback of MS experiences.
- system 500 may be, or may include, one or more devices configured for performing at least some of the methods disclosed herein.
- system 500 may include one or more instances of the control system 110 of Figure 1 that are configured for performing at least some of the methods disclosed herein.
- the system shown in Figure 5 is an instance of the system shown in Figure 4.
- the system shown in Figure 5 is a “lightscape” embodiment in which vision (video), audio and light effects are combined to create the MS experience.
- system 500 includes a lightscape creation tool 100, which is an instance of the content creation tool 000 that is described with reference to Figure 4.
- the lightscape creation tool 100 is configured for designing and outputting object-based light data 505’, either separately or in conjunction with corresponding audio data 111’ and/or video data 112’, depending on the particular implementation.
- the object-based light data 505’ may include time stamp information, as well as information indicating light object properties, etc.
- the time stamp information may be used to synchronize effects relating to the object-based light data 505’ with the audio data 111’ and/or the video data 112’, which also may include time stamp information.
- the object-based light data 505’ includes light objects and corresponding light metadata.
- the object-based light data may include light object position metadata, light object color metadata, light object size metadata, light object intensity metadata, light object shape metadata, light object diffusion metadata, light object gradient metadata, light object priority metadata, light object layer metadata, or combinations thereof.
- the content creation tool 100 is shown providing a stream of object-based light data 505’ to the experience player 102 in this example; in alternative examples, the content creation tool 100 may produce object-based light data 505’ that is stored for subsequent use. Examples of graphical user interfaces for a light-object-based content creation tool are described below.
- system 500 includes an experience player 102 that is configured to receive object-based light data 505’, audio data 111’ and video data 112’, to provide the object-based light data 505 to the lightscape renderer 501, to provide the audio data 111 to the audio renderer 106 and to provide the video data 112 to the video renderer 107.
- the experience player 102 may be a media player, a game engine, a personal computer or mobile device, or a component integrated in a television, DVD player, sound bar, set-top box, or a service provider media device such as a Chromecast, Apple TV device, or Amazon Fire TV.
- the experience player 102 may be configured to receive encoded object-based light data 505’ along with encoded audio data 111’ and/or encoded video data 112’. In some such examples, the encoded object-based light data 505’ may be received as part of the same bitstream with the encoded audio data 111’ and/or the encoded video data 112’. Some examples are described in more detail below. According to some examples, the experience player 102 may be configured to extract the object-based light data 505’ from the content bitstream and to provide decoded object-based light data 505 to the lightscape renderer 501, to provide decoded audio data 111 to the audio renderer 106 and to provide decoded video data 112 to the video renderer 107. In some examples, the experience player 102 may be configured to allow control of configurable parameters in the lightscape renderer 501, such as immersion intensity. Some examples are described below.
- room descriptors of the environment and light fixture data 104 may describe the size and orientation of the playback environment itself, to establish a relative or absolute coordinate system to which all objects are positioned.
- a display screen may be regarded as the front, in some instances the front and center, and the floor and ceiling may be regarded as the vertical bounds.
- the room descriptors may also indicate bounds corresponding with the left, right, front and rear walls relative to the front position.
- the room descriptor also may be provided in terms of a matrix, such as a 3x3 matrix.
- This room descriptor information is useful in describing the physical dimensions of the playback environment, for example in physical units of distance such as meters.
- sensory object locations, sensory object sizes, and sensory object orientations may be described in units that are relative to the room size, for example in a range from -1 to 1.
- Room descriptors may also describe a preferred viewing position, in some instances according to a matrix.
- system 500 includes a lightscape renderer 501 that is configured to render object-based light data 505 to light fixture control signals 515, based at least in part on environment and light fixture data 104.
- the lightscape renderer 501 is configured to output the light fixture control signals 515 to light controllers 103, which are configured to control the light fixtures 108.
- the light fixtures 108 may include individual controllable light sources, groups of controllable light sources (such as controllable light strips), or combinations thereof.
- the lightscape renderer 501 may be configured to manage various types of light object metadata layers, examples of which are provided herein.
- the lightscape renderer 501 may be configured to render actuator signals for light fixtures based, at least in part, on the perspective of a viewer. If the viewer is in a living room that includes a television (TV) screen, the lightscape renderer 501 may, in some examples, be configured to render the actuator signals relative to the TV screen. However, in virtual reality (VR) use cases, the lightscape renderer 501 may be configured to render the actuator signals relative to the position and orientation of the user’s head. In some examples, the lightscape renderer 501 may receive input from the playback environment — such as light sensor data corresponding to ambient light, camera data corresponding to a person’s location or orientation, etc. — to augment the render.
- the lightscape renderer 501 is configured to receive object-based light data 505 that includes light objects and object-based lighting metadata indicating an intended lighting environment, as well as environment and light fixture data 104 corresponding to light fixtures 108 and other features of a local playback environment, which may include, but are not limited to, reflective surfaces, windows, non-controllable light sources, light-occluding features, etc.
- the local playback environment includes one or more loudspeakers 109 and one or more display devices 510.
- the lightscape renderer 501 is configured to calculate how to excite various controllable light fixtures 108 based at least in part on the object-based light data 505 and the environment and light fixture data 104.
- the environment and light fixture data 104 may, for example, indicate the geometric locations of the light fixtures 108 in the environment, light fixture type information, etc.
- the lightscape Tenderer 501 may configured to determine which light fixtures will be actuated based, at least in part, on the position metadata and size metadata associated with each light object, e.g., by determining which light fixtures are within a volume of a playback environment corresponding to the light object’s position and size at a particular time indicated by light object time stamp information.
- the lightscape renderer 501 is configured to send light fixture control signals 515 to the light controller 103 based on the environment and light fixture data 104 and the object-based light data 505.
- the light fixture control signals 515 may be sent via one or more of various transmission mechanisms, application program interfaces (APIs) and protocols.
- the protocols may, for example, include Hue API, LIFX API, DMX, Wi-Fi, Zigbee, Matter, Thread, Bluetooth Mesh, or other protocols.
- the lightscape renderer 501 may be configured to determine a drive level for each of the one or more controllable light sources that approximates a lighting environment intended by the author(s) of the object-based light data 505. According to some examples, the lightscape renderer 501 may be configured to output the drive level to at least one of the controllable light sources.
- the lightscape renderer 501 may be configured to collapse one or more parts of the lighting fixture map according to the content metadata, user input (choosing a mode), limitations and/or configuration of the light fixtures, other factors, or combinations thereof.
- the lightscape renderer 501 may be configured to render the same control signals to two or more different lights of a playback environment.
- two or more lights may be located close to one another.
- two or more lights may be different lights of the same actuator, e.g., may be different bulbs within the same lamp.
- the lightscape renderer 501 may be configured to reduce the computational overhead, increase rendering speed, etc., by rendering the same control signals to two or more different, but closely-spaced, lights.
- the lightscape renderer 501 may be configured to spatially upmix the object-based light data 505. For example, if the object-based light data 505 was produced for a single plane, such as a horizontal plane, in some instances the lightscape renderer 501 may be configured to project light objects of the object-based light data 505 onto an upper hemispherical surface (e.g., above an actual or expected position of the user’s head) in order to enhance the experience.
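- A minimal sketch of one such planar-to-hemisphere projection is shown below, assuming room-relative (x, y) coordinates centered on the viewer; the specific mapping (preserving the horizontal offset and lifting the object onto the hemisphere above the head position) is an illustrative choice, not the upmixing rule defined in this disclosure.

```python
import numpy as np

def upmix_to_hemisphere(xy, radius=1.0):
    """Project a planar (x, y) light-object position onto the upper hemisphere
    of the given radius centered on the assumed head position.  The horizontal
    offset is preserved and the object is lifted vertically onto the
    hemisphere; points near the center end up near the zenith."""
    x, y = xy
    r = min(np.hypot(x, y), radius)            # clamp to the hemisphere footprint
    azimuth = np.arctan2(y, x)
    elevation = np.arccos(r / radius)          # r == 0 maps to the zenith
    return np.array([radius * np.cos(elevation) * np.cos(azimuth),
                     radius * np.cos(elevation) * np.sin(azimuth),
                     radius * np.sin(elevation)])

print(upmix_to_hemisphere((0.5, 0.0)))   # lifted above and in front of the viewer
```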
- the lightscape renderer 501 may be configured to apply one or more thresholds, such as one or more spatial thresholds, one or more luminosity thresholds, etc., when rendering actuator control signals to light actuators of a playback environment.
- thresholds may, in some instances, prevent some light objects from causing the activation of some light fixtures.
- Light objects may be used for various purposes, such as to set the ambience of the room, to give spatial information about characters or objects, to enhance special effects, to create a greater sense of interaction and immersion, to shift viewer attention, to punctuate the content, etc. Some such purposes may be expressed, at least in part, by a content creator according to sensory object metadata types and/or properties that are generally applicable to various types of sensory objects — such as object metadata indicating a sensory object’s location and size.
- the priority of sensory objects including but not limited to light objects, may be indicated by sensory object priority metadata.
- sensory object priority metadata is taken into account when multiple sensory objects map to the same fixture(s) in a playback environment at the same time.
- priority may be indicated by light priority metadata.
- priority may not need to be indicated via metadata.
- the MS renderer 001 may give priority to sensory objects — including but not limited to light objects — that are moving over sensory objects that are stationary.
- a light object may — depending on its location and size and the locations of light fixtures within a playback environment — potentially cause the excitation of multiple lights.
- the renderer may apply one or more thresholds — such as one or more spatial thresholds or one or more luminosity thresholds — to gate objects from activating some encompassed lights.
- a lighting map, which is an instance of the actuator map (AM) that includes a description of lighting in a playback environment, may be provided to the lightscape renderer 501.
- the environment and light fixture data shown in Figure 5 may include the lighting map.
- the lighting map may be allocentric, e.g., indicating absolute spatial coordinate-based light fall-off, whereas in other examples the lighting map may be egocentric, e.g., a light projection mapped onto a sphere at an intended viewing position and orientation.
- the lighting map may, in some examples, be projected onto a two-dimensional (2D) surface, e.g., in order to utilize 2D image textures in processing.
- the lighting map should indicate the capabilities and the lighting setup of the playback environment, such as a room.
- the lighting map may not directly relate to physical room characteristics, for example if certain user preference-based adjustments have been made.
- the intensity of light indicated by the light map may be inversely correlated to the distance to the center of the light, or may be approximately (e.g., within plus or minus 5%, within plus or minus 10%, within plus or minus 15%, within plus or minus 20%, etc.) inversely correlated to the distance to the center of the light.
- the intensity values of the light map may indicate the strength or impact of the light object onto the light fixture. For example, as a light object approaches a lightbulb, the lightscape renderer 501 may be configured to determine that the lightbulb intensity will increase as the distance between the light object and the lightbulb decreases. The lightscape renderer 501 may be configured to determine the rate of this transition based, at least in part, on the intensity of light indicated by the light map.
- the lightscape renderer 501 may be configured to use a dot product multiplication between a light object and the light map for each light to compute a light activation metric, e.g., as follows: Y = LM · Obj, where Y represents the light activation metric, LM represents the lighting map and Obj represents the map of a light object.
- the light activation metric indicates the relative light intensity for the actuator control signal output by the lightscape renderer 501 based on the overlap between the light object and the spread of light from the light fixture.
- the lightscape renderer 501 may use the maximum or closest distance, or other geometric metrics, from the light object to the light fixture as part of the determination of light intensity.
- the lightscape renderer 501 may refer to a look-up-table to determine the light activation metric.
- the lightscape renderer 501 may repeat one of the foregoing procedures for determining the light activation metric for all light objects and all controllable lights of the playback environment. Thresholding for light objects that produce a very low impact on light fixtures may be helpful to reduce complexity. For example, if the effect of a light object would cause an activation of less than a threshold percent of light fixture activation — such as less than 10%, less than 5%, etc. — the lightscape renderer 501 may disregard the effect of that light object.
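- A minimal sketch of this per-object, per-light dot-product computation with low-impact thresholding is shown below; representing the maps as NumPy arrays, the function name, and expressing the threshold relative to the largest activation are illustrative assumptions.

```python
import numpy as np

def light_activation_matrix(object_maps, lighting_maps, threshold=0.05):
    """Compute Y[i, j] = dot(Obj_i, LM_j) for every light object i and
    controllable light j, zeroing entries below the activation threshold.
    Both inputs are assumed to be lists of equally sized 2D maps."""
    objs = np.stack([m.ravel() for m in object_maps])    # (num_objects, pixels)
    lms = np.stack([m.ravel() for m in lighting_maps])   # (num_lights, pixels)
    y = objs @ lms.T                                     # (num_objects, num_lights)
    y[y < threshold * y.max()] = 0.0                     # drop low-impact pairs
    return y
```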
- the lightscape renderer 501 may then use the resultant light activation matrix Y, along with various other properties such as the chosen panning law (either indicated by light object metadata or renderer configuration) or the priority of the light object, to determine which objects get rendered by which lights and how. Rendering light objects into light fixture control signals may involve:
- the rendering of light objects can be a function of the settings or parameters of the lightscape renderer 501 itself. These may include:
- Velocity priority: when this parameter is set, light objects that are moving are given a higher priority than those which are not. Having the velocity priority parameter set enhances the dynamism of the rendered scene;
- Activation threshold: the minimum light activation, Y, that must be achieved in order to activate a light fixture.
- the lightscape renderer 501 may, in some implementations, be configured according to different modes.
- mode is different from “parameter” in the sense that modes may, for example, involve completely different signal paths, whereas parameters may simply parameterize these signal paths.
- one mode may involve the projection of all light objects onto a lighting map before determining how/what to render to the light-fixtures, while another mode may only snap the highest-priority lights to the nearest light fixtures.
- Modes may include:
- Modes to support low light-fixture count: in these modes, the rendering parameters and the light object metadata are utilized in order to determine which subset of light objects are to be rendered and in what manner.
- the “manner” refers to the trade-off between the spatial, color and temporal fidelity of the most prominent light objects in the scene;
- the system 500 may include one or more instances of the control system 110 of Figure 1 that are configured for performing at least some of the methods disclosed herein.
- one instance of the control system 110 may implement the lightscape creation tool 100 and another instance of the control system 110 may implement the experience player 002.
- one instance of the control system 110 may implement the audio renderer 006, the video renderer 007, the lightscape renderer 501, or combinations thereof.
- an instance of the control system 110 that is configured to implement the experience player 002 may also be configured to implement the audio renderer 006, the video renderer 007, the lightscape renderer 501, or combinations thereof.
- Figure 6 shows an example of a graphical user interface (GUI) that may be presented by a display device of the lightscape creation tool of Figure 5.
- the types and numbers of elements shown in Figure 6 are merely provided by way of example.
- Other GUIs presented by a lightscape creation tool may include more, fewer and/or different types and numbers of elements.
- the GUI 600 may be presented on a display device according to commands from an instance of the control system 110 of Figure 1 that is configured for implementing the lightscape creation tool 100 of Figure 5.
- a user may interact with the GUI 600 in order to create light objects and to assign light object properties, which may be associated with the light object as metadata.
- a user is selecting properties of the light object 630.
- the GUI 600 shows the light object 630 in a three-dimensional space 631, the latter of which represents a playback environment.
- Element 634 shows a coordinate system of the three-dimensional space 631. Accordingly, in this example the light object 630 and the three-dimensional space 631 are being viewed from the upper left.
- a user may interact with the GUI 600 in order to select a position and a size of the light object 630.
- a user may select a position of the light object 630 by dragging the light object 630 to a desired position within the three-dimensional space 631, for example by touching a touch screen, using a cursor, etc.
- a user may select a size of the light object 630 by selecting the size of the circle (or other shape) that is shown on the GUI 600 to indicate the outline of the light object 630.
- a user may decrease the size of the light object 630 via a two-fingered pinch of the outline of the light object 630, may increase the size of the light object 630 via a two-fingered expansion, etc.
- the GUI 600 allows a content creator to specify the position and size of the light object 630 within the three-dimensional space 631, thereby allowing the content creator to generalize the position and extent of the corresponding light effects without prior knowledge of the size of any particular playback environment in which the light effects will be provided, or of the number, type and positions of light fixtures within that playback environment.
- the light fixtures that will potentially be actuated responsive to the presence of the light object 630 at a particular time will be those within a volume of the playback environment corresponding to the position and size/extent of the light object 630.
- a user may interact with the color circle 635 of the GUI 600 in order to select the hue and color saturation of the current light object and may interact with the slider 636 in order to select the brightness of the current light object.
- These and other selectable properties of the light object 630 are displayed in area 632 of the GUI 600.
- the properties of the light object 630 that may be selected via the GUI 600 also include intensity, diffusivity, “feathering,” whether or not the light object is hidden, saturation, priority and layer.
- Light object layers and priority will be described in more detail below. Generally speaking, light object layers may be used to group light objects into categories such as “ambient,” “dynamic,” etc. Light object priority may be assigned by a content creator and used by a renderer to determine, for example, which light object(s) will be presented when two or more light objects are simultaneously active and are simultaneously encompassing an area that includes the same light fixture.
- Area 640 of the GUI 600 indicates time information corresponding to each of a plurality of light objects that are being created via the lightscape creation tool.
- light objects are listed on the left side of the area 640, along a vertical axis, and time is shown along a horizontal axis.
- four-second time intervals are delineated by vertical lines.
- time information for each light object is shown as isolated or connected diamond symbols or lines along a series of horizontal rows, each of which corresponds to one of the light objects indicated on the left side of the area 640.
- the line 633 indicates that light object 3 will be displayed starting between 39 and 40 seconds and will be continuously displayed until almost 1 minute and 6 seconds.
- the diamond symbols to the right of the line 633 indicate that light object 3 will be displayed discontinuously for the next few seconds.
- Figure 7A shows another example of a graphical user interface (GUI) that may be presented by a display device of the lightscape creation tool of Figure 5.
- the types and numbers of elements shown in Figure 7A are merely provided by way of example.
- Other GUIs presented by a lightscape creation tool may include more, fewer and/or different types and numbers of elements.
- the GUI 700 may be presented on a display device according to commands from an instance of the control system 110 of Figure 1 that is configured for implementing the lightscape creation tool 100 of Figure 5.
- the GUI 700 represents an instant in time during which light fixtures in an actual playback environment are being controlled according to light objects that have been created by an implementation of the lightscape creation tool 100 of Figure 5.
- An image of the playback environment is shown in area 705 of the GUI 700.
- Various light fixtures 708 and a television 715 are shown in the playback environment of area 705.
- the particular instant in time is shown by vertical line 742 of area 740.
- the vertical line 742 intersects with horizontal lines 744a, 744b, 744c, and 744d, indicating that the corresponding light objects 1, 4, 5 and 7 are being played back.
- Area 732 indicates light object properties.
- video data and audio data are also being played back in the playback environment, and the playback of rendered light objects is being synchronized with playback of the video data and audio data.
- an image of the played-back video is shown in area 710 of the GUI 700.
- the video may, for example, be played back by the television 715.
- a user may be able to interact with the GUI 700 in order to adjust light object properties, add or delete light objects, etc. For example, a user may cause the playback to be paused in order to adjust light object properties. In some alternative examples, a user may need to revert to a GUI such as the GUI 600 of Figure 6 in order to adjust light object properties, add or delete light objects, etc.
- the example described with reference to Figure 7A may also involve at least some of the “downstream” rendering and playback functionality that can be provided by other blocks of Figure 5, including but not limited to that of the lightscape renderer 501, the light controller APIs 103 — which may in some instances be implemented by the same device that implements the lightscape renderer 501 — the light fixtures 108, the audio renderer 106, the loudspeakers 109, the video renderer 107 and the display device(s) 510.
- the processes described with reference to Figure 7A also may involve functionality of the experience player 102 of Figure 5.
- the example described with reference to Figure 7A may also involve at least some of the “downstream” rendering and playback functionality that can be provided by other blocks of Figure 4, including but not limited to that of the MS renderer 001, the MS controller APIs 003 — which may in some instances be implemented by the same device that implements the MS renderer 001 — the light fixtures 008, the audio renderer 006, the loudspeakers 009, the video renderer 007 and the display device(s) 010.
- the processes described with reference to Figure 7A also may involve functionality of the experience player 002 of Figure 4.
- Figure 7B is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein.
- the blocks of method 750, like other methods described herein, are not necessarily performed in the order indicated. In some implementations, one or more of the blocks of method 750 may be performed concurrently. Moreover, some implementations of method 750 may include more or fewer blocks than shown and/or described.
- the blocks of method 750 may be performed by one or more devices, which may be (or may include) one or more instances of a control system such as the control system 110 that is shown in Figure 1 and described above. For example, at least some aspects of method 750 may be performed by an instance of the control system 110 that is configured to implement the experience player 002 of Figure 4. Some other aspects of method 750 may be performed by an instance of the control system 110 that is configured to implement the MS renderer 003 of Figure 4.
- block 755 involves receiving, by a control system, a content bitstream including encoded object-based sensory data.
- the object-based sensory data includes sensory objects and corresponding sensory metadata and corresponds to sensory effects to be provided by a plurality of sensory actuators in an environment.
- the environment may be an actual, real-world environment, such as a room environment or an automobile environment.
- the sensory effects may include lighting, haptics, airflow, one or more positional actuators, or combinations thereof, to be provided by the plurality of sensory actuators in the environment.
- the environment may be, or may include, a virtual environment.
- method 750 may involve providing a virtual environment, such as a gaming environment, while also providing corresponding sensory effects in a real-world environment.
- the object-based sensory metadata may include sensory spatial metadata indicating at least a spatial position for rendering the object-based sensory data within the environment, indicating an area for rendering the object-based sensory data within the environment, or both.
- the object-based sensory data does not correspond to particular sensory actuators in the environment.
- the sensory objects of the object-based sensory data may correspond with a portion of a three-dimensional area that represents a playback environment. The actual playback environment in which the sensory objects will be rendered does not need to be known, and generally will not be known, at the time that the sensory objects are authored.
- the object-based sensory data includes abstracted sensory reproduction information — in this example, the sensory objects and corresponding sensory metadata — allowing the sensory renderer to reproduce authored sensory effects via various sensory actuator types, via various numbers of sensory actuators and from various sensory actuator positions in the environment.
- block 760 involves extracting, by the control system, the object-based sensory data from the content bitstream.
- the content bitstream also may include encoded audio objects that are synchronized with the encoded object-based sensory data.
- the audio objects may include audio signals and corresponding audio object metadata.
- method 750 also may involve extracting, by the control system, audio objects from the content bitstream.
- method 750 also may involve providing, by the control system, the audio objects to an audio renderer.
- the audio object metadata may include at least audio object spatial metadata indicating an audio object spatial position for rendering the audio signals within the environment.
- block 765 involves providing, by the control system, the object-based sensory data to a sensory renderer.
- method 750 also may involve receiving, by the sensory renderer, the object-based sensory data and receiving, by the sensory renderer, environment descriptor data corresponding to locations of sensory actuators in the environment.
- method 750 also may involve receiving, by the sensory renderer, actuator descriptor data corresponding to properties of the sensory actuators in the environment.
- the environment descriptor data and the actuator descriptor data may be, or may be included in, the environment and actuator data 004 that is described with reference to Figure 4.
- method 750 also may involve providing, by the sensory renderer, actuator control signals for controlling the sensory actuators in the environment to produce sensory effects indicated by the object-based sensory data.
- the MS renderer 001 may provide actuator control signals 310 to the MS controller APIs 003 and the MS controller APIs 003 may provide actuator-specific control signals to the actuators 008.
- the MS controller APIs 003 that are shown in Figure 4 may be implemented via the MS renderer 001 and actuator-specific signals may be provided to the actuators 008 by the MS renderer 001.
- method 750 also may involve providing, by the sensory actuators in the environment, the sensory effects.
- method 750 also may involve receiving, by the audio renderer, the audio objects and receiving, by the audio renderer, loudspeaker data corresponding to loudspeakers in the environment.
- method 750 also may involve providing, by the audio renderer, loudspeaker control signals for controlling the loudspeakers in the environment to play back audio corresponding to the audio objects and synchronized with the sensory effects.
- the synchronization may, for example, be based on time information that is included in or with the sensory objects and the audio object, such as time stamps.
- method 750 also may involve playing back, by the loudspeakers in the environment, the audio corresponding to the audio objects.
- the content bitstream includes encoded video data synchronized with the encoded audio objects and the encoded object-based sensory data.
- method 750 also may involve extracting, by the control system, video data from the content bitstream and providing, by the control system, the video data to a video renderer.
- method 750 also may involve receiving, by the video renderer, the video data and providing, by the video renderer, video control signals for controlling one or more display devices in the environment to present images corresponding to the video control signals and synchronized with the audio objects and the sensory effects.
- method 750 also may involve presenting, by the one or more display devices in the environment, the images. The images may correspond to the video control signals.
- Some disclosed examples involve the inclusion of lighting metadata with a video and/or an audio track, or for use as a standalone lighting-based sensory experience.
- This lighting metadata describes the intended lighting environment to be reproduced at playback.
- There are various ways of representing the lighting metadata, several of which are described in detail in this disclosure.
- the intended lighting environment may be transmitted as one or more Image Based Lighting (IBL) objects.
- IBL is a technique that has previously been used to capture an environment and lighting. IBL may be described as the process of illuminating scenes and objects (real or synthetic) with images of light from the real world. IBL evolved from previously-disclosed reflection-mapping techniques, in which panoramic images are used as texture maps on computer graphics models to show shiny objects reflecting real or synthetic environments. Some aspects of IBL are analogous to image-based modeling, in which a three-dimensional scene’s geometric structure may be derived from images. Other aspects of IBL are analogous to image-based rendering, in which the rendered appearance of a scene may be produced from the scene’s appearance in images.
- IBL objects have been used when rendering computer graphics to create realistic reflections and lighting effects, for example effects in which a rendered object itself seems to reflect some part of a real-world environment.
- IBL involves the following processes:
- Capturing omnidirectional images: There are various methods for capturing omnidirectional images.
- One way is to use a camera to photograph a reflective sphere placed in an environment.
- Another method of obtaining omnidirectional images is to obtain a mosaic based on many camera images obtained from different directions/viewing perspectives and combining the images using an image stitching program. In some such examples, images may be obtained using a fisheye lens, which can cover the full field of view in as few as two images.
- Another method of obtaining omnidirectional images is to use a scanning panoramic camera, such as a “rotating line camera,” which is configured to assemble a digital image as the camera rotates, to scan across a 360-degree field of view. Further details of previously-disclosed IBL methods are disclosed in Debevec, Paul, “Image-Based Lighting” (IEEE, March/April 2002, pp. 26-34), which is hereby incorporated by reference.
- Some implementations of the present disclosure build upon previously-disclosed methods involving IBL objects for a new purpose, which is to reproduce the intended environment using dynamically controllable surround lighting.
- Both the intended lighting environment and the end-point environment may change over time.
- the walls of the end-point environment may be painted, actuators may be moved, furniture may be moved or replaced, new furniture, shelving and/or cabinetry may be added, etc.
- Several types of mappings, or projections, may be used to map environment lighting from a sphere surrounding the intended viewing position onto a two-dimensional (2D) plane. Projection onto a 2D plane facilitates — and reduces the computational overhead required for — compressing lighting objects using 2D image or video codecs for more efficient transmission.
- Figures 8 A, 8B and 8C show three examples of projecting lighting of viewing environments onto a two-dimensional (2D) plane.
- a spherical projection was used.
- Other representations are also possible.
- the lighting of the viewing environments is shown from an intended viewing position, but over 360 degrees of viewing angles.
- the x axis represents the horizontal angles from the viewer’s perspective (left to right) and the vertical axis represents the vertical angles from the viewer’s perspective (up and down).
- the middles of the projections 805, 810 and 815 correspond with the directions in front of a viewer.
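- One common way to realize such a sphere-to-plane mapping is an equirectangular projection, sketched below; the coordinate convention (x forward, y left, z up), the function name and the image dimensions are assumptions for illustration and are not prescribed by this disclosure.

```python
import numpy as np

def direction_to_equirectangular(direction, width, height):
    """Map a unit direction vector (x forward, y left, z up -- an assumed
    convention) to pixel coordinates of an equirectangular lighting image.
    The image center corresponds to the direction straight in front of the
    viewer, matching the layout described for Figures 8A-8C."""
    x, y, z = direction / np.linalg.norm(direction)
    azimuth = np.arctan2(y, x)              # -pi..pi, 0 is straight ahead
    elevation = np.arcsin(z)                # -pi/2..pi/2
    u = (0.5 - azimuth / (2 * np.pi)) * width
    v = (0.5 - elevation / np.pi) * height
    return u, v

# The forward direction lands at the center of a 360 x 180 image.
print(direction_to_equirectangular(np.array([1.0, 0.0, 0.0]), 360, 180))
```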
- Projection 805 represents studio-type lighting in a mostly dark and monochromatic environment.
- Projection 810 includes a dark blue floor 812 and a predominant blue area 814 directly in front of the viewer.
- Projection 815 includes a bright red light 819 in front of and slightly to the left of the viewer.
- lighting metadata may be used to produce a video showing an environment from an intended viewing position.
- lighting metadata may include multiple metadata units, each of which may include a header, describing how the metadata is represented to facilitate playback.
- lighting metadata may include the following information, which may be provided in different metadata units:
- This information enables decoders to correctly align the viewing position with the reference viewing position.
- Position X1, Y1, Z1 has EnvironmentMetadataPayload EMP1
- Position X2, Y2, Z2 has EMP2
- a user position X’,Y’,Z’ is determined;
- This simple example illustrates one method of linear interpolation between two points. Additional clamping may be desirable if the viewer position is beyond either of the reference points. A triangular interpolation may be more suitable in some instances;
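- A minimal sketch of this linear interpolation, including the clamping mentioned above, is shown below; it assumes the environment metadata payloads can be represented as numeric arrays that may be blended element-wise, which is an illustrative simplification, and the function name is hypothetical.

```python
import numpy as np

def interpolate_payload(p1, emp1, p2, emp2, user_pos):
    """Linearly interpolate between two environment metadata payloads based on
    where the user position falls along the line between the two reference
    positions, clamping when the user is beyond either reference point."""
    p1, p2, user_pos = (np.asarray(v, dtype=float) for v in (p1, p2, user_pos))
    axis = p2 - p1
    denom = float(axis @ axis)
    t = 0.0 if denom == 0.0 else float((user_pos - p1) @ axis) / denom
    t = min(max(t, 0.0), 1.0)                 # clamp outside the reference segment
    return (1.0 - t) * np.asarray(emp1, dtype=float) + t * np.asarray(emp2, dtype=float)

# Example: a user halfway between the two reference positions.
print(interpolate_payload([0, 0, 0], [1.0], [2, 0, 0], [3.0], [1, 0, 0]))  # -> [2.0]
```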
- the IBL representation may be augmented with depth information indicating the relative distance of the environmental light sources from the reference viewing position.
- This depth information can be used to adapt the position of the light sources as a viewer moves around an environment.
- Depth information may, for example, be obtained directly from RGB images using trained neural networks, e.g., via monocular depth estimation.
- many consumer devices, such as iPhones, are capable of measuring depth directly using infrared imaging techniques such as lidar and structured light.
- the spatial resolution of the IBL techniques may be variable depending on the use case requirements, the available bit rate and the required compression quality.
- the IBL may be compressed using known image-based compression methods — such as Joint Photographic Experts Group (JPG), JPG2000, Portable Network Graphic (PNG), etc. — or known video-based compression methods, such as Advanced Video Coding (AVC), also known as H.264, H.265, Versatile Video Coding (VVC), AOMedia Video 1 (AV1), etc.
- the methods disclosed herein are not limited to IBL-based examples.
- Some other examples of representing the intended lighting environment treat each light source as a unique light source object, with a defined position and size.
- other information may be used to define or describe each light source, including but not limited to the directionality of the light emitted from each light source object.
- Some methods also may include information regarding reflectivity of one or more surfaces, one or more intended room dimensions, etc. Such information may be used to support implementations in which the viewer is free to move to new locations.
- the light source objects may be defined directly using a computer program written specifically for this task, or they may be inferred by analyzing video images of light sources in an environment. Potential advantages of a light source object-based approach include a smaller metadata payload size for relatively simple lighting scenarios.
- Figure 9A shows example elements of a lightscape renderer.
- the types and numbers of elements shown in Figure 9A are merely provided by way of example. Other implementations may include more, fewer and/or different types and numbers of elements.
- the lightscape renderer 501 is an instance of the lightscape renderer 501 that is shown in Figure 5.
- the lightscape renderer 501 is implemented by an instance of the control system 110 of Figure 1.
- the lightscape renderer 501 includes a scale factor computation block 925 and a digital drive value computation block 930.
- the lightscape renderer 501 receives as input object-based light data 505, including the intended environment lighting metadata.
- the object-based light data 505 may, for example, have been produced by the lightscape creation tool 100 of Figure 5.
- the rendering engine also receives environment and light fixture data 104 and computes appropriate light fixture control signals 515 for the controllable light fixtures 108 (not shown) in the playback environment.
- the environment and light fixture data 104 is shown as including environment and light fixture data 104a, which includes information regarding light fixtures of the playback environment and their capabilities, and environment and light fixture data 104b, which includes information regarding ambient light and/or non-controllable light fixtures of the playback environment.
- the lightscape renderer 501 may be configured to output the light fixture control signals 515 to light controllers 103, which are configured to control the light fixtures 108.
- the light fixtures 108 in the playback environment also may be referred to herein as “endpoint dynamic lighting elements.”
- the environment and light fixture data 104a may include information regarding a set of N dynamic lighting elements, where N represents the total number of dynamic lighting elements that are controllable in the playback environment.
- the environment and light fixture data 104a includes: (1) an IBL map 905 of the environmental lighting produced by the dynamic (controllable) light fixtures at maximum intensity; and (2) a mapping 910 of light intensity produced by the controllable light fixtures to digital drive signals provided to the controllable light fixtures.
- the environment and light fixture data 104b includes ambient light information 915 and ambient light level information 920.
- the ambient light information 915 may include a base IBL map of the base environment lighting that is not controllable by the system.
- the ambient light information 915 may include information regarding other light sources, such as windows in the playback environment, the directions that the windows face, the amount of outdoor light received in the environment through the windows at various times of day, information regarding controllable window shades, if any, etc.
- the ambient light level information 920 may include a scale value of the base environmental lighting, obtained, for example, by an optical light sensor.
- a lightscape renderer — such as the lightscape renderer 501 of Figure 5 — may perform the following operations: 1. Receive as input — here, as part of the object-based light data 505 — information regarding an intended environment lighting IBLref;
- Compute the base lighting IBLbase, for example according to a constant estimated value, or by scaling the base IBL map with the estimated ambient light;
- a non-linear function may be used to encode each of the linear values prior to computing the difference.
- the non-linear function may correspond to the sensitivity of human vision to color and intensity of light;
- computing the scale factors may involve subtracting the ambient light IBLbase from the intended lighting IBLref, and then computing the scale factors Scalen using deconvolution from the dynamic lighting IBLn. Conventional methods of deconvolution can be used, including regularization as needed to improve robustness.
- the process of computing the scale factors may be performed iteratively, for example by initializing the scale factors to an initial value, then adjusting the scale values in turn while evaluating the output against the reference (the intended lighting IBLref).
- conventional methods of gradient descent and function minimization can be used during the process of computing the scale factors.
- a lookup table — such as the dynamic light LUT 910 of Figure 9A — may be used to determine the correct digital drive value required to achieve a desired light intensity from a particular light source.
- a functional scale factor may be derived from measurement data.
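- As one illustrative alternative to the deconvolution and gradient-descent approaches described above, the scale factors may be estimated with an ordinary least-squares solve, sketched below under the assumption that all IBL maps are linear-light arrays of the same shape and that fixture output is limited to the range 0 to 1; the function name and the clipping convention are assumptions, not the method defined in this disclosure.

```python
import numpy as np

def compute_scale_factors(ibl_ref, ibl_base, ibl_dynamic):
    """Solve for per-fixture scale factors so that
    sum_n(Scale_n * IBL_n) + IBL_base approximates IBL_ref in a
    least-squares sense.  Clipping to [0, 1] stands in for the physical
    limits of the fixtures."""
    target = (np.asarray(ibl_ref, float) - np.asarray(ibl_base, float)).ravel()
    basis = np.stack([np.asarray(m, float).ravel() for m in ibl_dynamic], axis=1)
    scales, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return np.clip(scales, 0.0, 1.0)
```

The clipped scale factors could then be converted to drive values via a lookup table such as the dynamic light LUT described above.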
- One important aspect of this application is the generation of the dynamic lighting IBL data for a given playback environment.
- the process is as follows:
- Commonly-used techniques are a camera imaging a shiny sphere, a camera with a fisheye lens, a camera that pans around the scene, or a fixture of multiple cameras which capture the scene from all directions (a 360 degree camera);
- Linear light refers to a space where a linear change in value is perceived as a linear change by a human. Most devices do not have linear responses in this respect. In other words, if the codeword (drive value) to the actuator is doubled, the change in perceived brightness is not doubled. Working in a linear light space is convenient and can ensure that, when applicable, a finite amount of resolution is spread equally across the human perceptual response. After operations are performed in a linear light space and corresponding values are derived, these values may be converted into the drive values/codewords for controlling the physical devices/light.
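- The conversion from linear-light values back to device drive codewords can be sketched as an inversion of a measured drive-versus-intensity curve, as shown below; the measured data points and the gamma-like example response are assumptions used only for illustration.

```python
import numpy as np

def intensity_to_drive(target_intensity, measured_drive, measured_intensity):
    """Convert a desired linear-light intensity into a device drive codeword by
    inverting a measured drive-vs-intensity curve (for example, a lookup table
    such as the dynamic light LUT described with reference to Figure 9A).
    The measurement arrays are assumed to be monotonically increasing."""
    return float(np.interp(target_intensity, measured_intensity, measured_drive))

# Example with an assumed gamma-like response measured at a few drive levels.
drive = np.array([0, 64, 128, 192, 255])
intensity = (drive / 255.0) ** 2.2
print(round(intensity_to_drive(0.5, drive, intensity)))
```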
- the above-described process is applicable when the base environment light is relatively fixed, except for an overall increase or decrease. For example, there may be a window in one corner of the playback environment that causes an overall increase or decrease in ambient light, depending on the weather and the time of day. In many viewing environments the lighting that cannot be controlled may be more dynamic, for example light fixtures that can be controlled manually (not part of the dynamic setup), or an automotive environment. In some examples, multiple base lighting scenarios may be captured, and during playback a measurement of the ambient light in the environment may be used to estimate the captured base lighting scenario that most closely matches the actual, current base lighting conditions. The captured IBLn, IBLbase, and drive values relationships may be stored in a configuration file that is accessible during rendering.
- the output of the content authoring/mastering step is the intended lighting environment map, or IBLref.
- the intended lighting environment map may be measured directly using the measurement approach described in the calibration section above.
- the intended lighting environment map may be rendered from computer graphics software.
- Figure 9B is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein.
- the blocks of method 950, like other methods described herein, are not necessarily performed in the order indicated. In some implementations, one or more of the blocks of method 950 may be performed concurrently. Moreover, some implementations of method 950 may include more or fewer blocks than shown and/or described.
- the blocks of method 950 may be performed by one or more devices, which may be (or may include) one or more instances of control system such as the control system 110 that is shown in Figure 1 and described above.
- method 950 may be performed by an instance of the control system 110 that is configured to implement at least the lightscape renderer 501 of Figure 5 or the lightscape renderer 501 of Figure 9A. In some examples, method 950 may be performed by one or more instances of the control system 110 configured to implement the lightscape renderer 501 and the light controller APIs of Figure 5.
- block 955 involves receiving, by a control system configured to implement a lighting environment renderer, object-based light data indicating the intended lighting environment.
- the object-based light data includes light objects and lighting metadata.
- the environment may be an actual, real-world environment, such as a room environment or an automobile environment.
- the environment may be, or may include, a virtual environment.
- method 950 may involve providing a virtual environment, such as a gaming environment, while also providing corresponding lighting effects in a real-world environment.
- block 960 involves receiving, by the control system, lighting information regarding a local lighting environment.
- the lighting information includes one or more characteristics of one or more controllable light sources in the local lighting environment.
- Block 960 may, for example, involve receiving the environment and light fixture data 104 that is described herein with reference to Figure 5 or Figure 9A.
- block 965 involves determining, by the control system, a drive level for each of the one or more controllable light sources that approximates the intended lighting environment.
- block 970 involves outputting, by the control system, the drive level to at least one of the controllable light sources.
- the object-based lighting metadata includes time information.
- block 965 may involve determining one or more drive levels for one or more time intervals corresponding to the time information.
- method 950 may involve receiving viewing position information.
- block 965 may involve determining one or more drive levels corresponding to the viewing position information.
- the lighting information may include one or more characteristics of one or more base light sources in the local lighting environment, the base light sources not being controllable by the control system.
- block 965 may involve determining one or more drive levels based, at least in part, on the one or more characteristics of one or more non-controllable light sources.
- the object-based lighting metadata may include at least lighting object position information and lighting object color information.
- the intended lighting environment may be transmitted as one or more Image Based Lighting (IBL) objects.
- the determining process of block 965 may be based, at least in part, on an IBL map of environmental lighting produced by each of n controllable light sources (IBLn) in the local lighting environment at maximum intensity.
- method 950 may involve receiving a base IBL map of the base environment lighting (IBLbase) that is not controllable by the control system.
- the determining process of block 965 may be based, at least in part, on the base IBL map.
- the determining process of block 965 may be based, at least in part, on a linear scaling (Scalen) for each dynamic lighting element IBLn such that a sum of light from each scaled IBLn plus the base lighting IBLbase most closely matches the intended lighting environment IBLref.
- the determining process of block 965 may be based, at least in part, on minimizing a difference between IBLref and the sum of light from each scaled IBLn plus the base lighting IBLbase.
- the “effect” of an MS object is a synonym for the type of MS object.
- An “effect” is, or indicates, the sensory effect that the MS object is providing. If an MS object is a light object, its effect will involve providing direct or indirect light. If an MS object is a haptic object, its effect will involve providing some type of haptic feedback. If an MS object is an air flow object, its effect will involve providing some type of air flow. As described in more detail below, some examples involve other “effect” categories.
- Some MS objects may contain a persistence property in their metadata. For example, as a moveable MS object moves around in a scene, the moveable MS object may persist for some period of time at locations that the moveable MS object passes through. That period of time may be indicated by persistence metadata.
- the MS renderer is responsible for constructing and maintaining the persistence state.
- individual MS objects may be assigned to “layers,” in which MS objects are grouped together according to one or more shared characteristics.
- layers may group MS objects together according to their intended effect or type, which may include but are not limited to the following:
- layers may be used to group MS objects together according to shared properties, which may include but are not limited to the following:
- MS objects may have a priority property that enables the renderer to determine which object(s) should take priority in an environment in which MS objects are contending for limited actuators. For example, if multiple light objects overlap with a single light fixture at a time during which all of the light objects are scheduled to be rendered, a renderer may refer to the priority of each light object in order to determine which light object(s) will be rendered.
- priority may be defined between layers or within layers. According to some examples, priority may be linked to specific properties such as intensity. In some examples, priority may be defined temporally: for example, the most recent MS object to be rendered may take precedence over MS objects that have been rendered earlier. According to some examples, priority may be used to specify MS objects or layers that should be rendered regardless of the limitations of a particular actuator system in a playback environment.
- Spatial panning laws may define an MS object’s movement across a space, how an MS object affects actuators as it moves between them, etc.
- mixing mode may specify how multiple objects are multiplexed onto a single actuator.
- mixing modes may include one or more of the following:
- Max mode: select the MS object which activates an actuator the most;
- Mix mode: mix in some or all of the objects according to a rule set, for example by summing activation levels, taking the average of activation levels, mixing color according to activation level or priority level, etc.;
- MaxNmix mode: mix in the top N MS objects (by activation level), according to a rule set.
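- The following sketch shows one possible realization of these mixing modes; the candidate representation as (activation level, value) pairs and the specific rule sets for the mix and MaxNmix cases are illustrative assumptions rather than rules defined by this disclosure.

```python
def mix_for_actuator(candidates, mode="max", top_n=2):
    """Resolve multiple MS objects contending for a single actuator.
    Each candidate is assumed to be an (activation_level, value) pair; the
    rule sets here (activation-weighted summing for 'mix', averaging the
    top N for 'maxNmix') are illustrative placeholders."""
    if not candidates:
        return 0.0
    if mode == "max":                       # object that activates the actuator most
        return max(candidates)[1]
    if mode == "mix":                       # combine all objects by activation weight
        return sum(level * value for level, value in candidates)
    if mode == "maxNmix":                   # combine only the top N objects
        top = sorted(candidates, reverse=True)[:top_n]
        return sum(level * value for level, value in top) / len(top)
    raise ValueError(f"unknown mixing mode: {mode}")

print(mix_for_actuator([(0.8, 0.5), (0.3, 1.0)], mode="max"))   # -> 0.5
```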
- MS content files may include metadata such as trim passes or mastering environment.
- trim controls may act as guidance on how to modulate the default rendering algorithm for specific environments or conditions at the endpoint.
- Trim controls may specify ranges and/or default values for various properties, including saturation, tone detail, gamma, etc.
- there may be automotive trim controls which provide specific defaults and/or rule sets for rendering in automotive environments, for example guidance that includes only objects of a certain priority or layer.
- Other examples may provide trim controls for environments with limited, complex or sparse multisensory actuators.
- a single piece of multisensory content may include metadata on the properties of the mastering environment such as room size, reflectivity and ambient bias lighting level. The specific properties may differ depending on the desired endpoint actuators. Mastering environment information can aid in providing reference points for rendering in a playback environment.
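- An illustrative, purely hypothetical sketch of how such trim-pass and mastering-environment metadata could be organized is shown below; the key names, values and ranges are assumptions, not a schema defined by this disclosure.

```python
# Hypothetical metadata fragment mirroring the trim-control and
# mastering-environment properties described above.
ms_content_metadata = {
    "trim_passes": {
        "default": {"saturation": 1.0, "tone_detail": 0.5, "gamma": 2.2},
        "automotive": {
            "saturation": 0.8,
            "min_rendered_priority": 2,       # render only higher-priority objects
            "allowed_layers": ["ambient", "overlay"],
        },
    },
    "mastering_environment": {
        "room_size_m": [6.0, 5.0, 3.0],
        "surface_reflectivity": 0.4,
        "ambient_bias_lighting_nits": 5.0,
    },
}
```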
- direct lighting and indirect lighting may be assigned to different lighting metadata layers.
- Direct light objects in a direct light object layer are light objects representing light that is directly visible to the content creator or end user.
- Examples of direct light objects could include light fixtures in the scene, the sun, the moon, headlights from a car approaching, lightning during a storm, traffic lights, etc.
- Direct light objects also may be used to represent light sources that are part of a scene but are typically, or temporarily, not visible in the associated video content, for example because they are outside the video frame or because they move outside the video frame.
- direct light objects may be used to enhance or augment auditory events, such as explosions, visually guiding moving object trajectories outside the video frame, etc.
- the use of direct light objects is typically of a dynamic nature. For example, the associated metadata such as intensity, color, saturation, and position will often change as a function of time within a scene of the media content.
- Indirect light objects are light objects representing the effect of indirect light.
- indirect light objects may be used to represent the effect of light radiated by fixtures that is observed when the light is reflected by one or more surfaces.
- Some examples of using indirect light objects include changing the observed color of the walls, ceiling or floor of the environment into a color that matches the content, such as green colors for a forest scene, or blue colors for sky or water.
- Indirect light objects also may be used to set the scene and the mood of the environment in a similar way as is achieved by color grading video content, but in a more immersive way.
- Some examples involve a further abstraction of the direct and indirect light object layers into layers that include aspects of both.
- these layers may include one or more ambient layers, one or more dynamic layers, one or more custom layers, one or more overlay layers, or combinations thereof. These layers may, in some examples, be used for, or may correspond with, linear or event-based triggers in content.
- an ambient layer can be used to set the mood and tone in a space through washes of color on surfaces in the playback environment.
- An ambient layer may be used as a base layer on which to build lighting scenes.
- an ambient layer may be represented via light objects that cover relatively large areas.
- an ambient layer may be represented via light objects that cover relatively small areas, for example with one or more images.
- an ambient layer may be divided into zones. In some such examples, particular light effects always occupy a certain region of space. For example, the walls, ceiling and floor in an authoring or playback environment each may be considered separate ambient layer zones.
Dynamic Layer(s)
- a dynamic layer may be used to represent spatial and temporal variation of MS objects, such as light objects.
- MS objects may also have priority so that, for example, one light object may have preference over another light object in being presented via a light fixture.
- individual MS objects may, in some examples, be linked to other objects, such as to audio objects (from spatial audio) or to 3D world MS objects.
- a custom layer can be used to design light sequences that can be freely assigned to light fixtures for functional purposes. These sequences may not be spatial in nature, but instead may provide further information to the user. For example, in a game a light strip may be assigned to show the player’s remaining life.
- an overlay layer can be used to present persistent lights that have continuous priority.
- An overlay layer may, for example, be used to create a “watermark” over all other elements in a lighting scene.
- direct light objects may be authored by determining or setting light source position, intensity, hue, saturation and spatial extent as a function of time for one or more light objects.
- this authoring process may create corresponding metadata that can be distributed, with the direct light objects, alongside audio and/or video content of a content presentation.
- direct light objects are rendered to direct light sources.
- Indirect light effects may, in some examples, be authored as a dedicated group or class within the lightscape metadata content, focusing more on overall color and ambiance rather than dynamic effects. Indirect light effects may also be defined by intensity, hue, saturation, or combinations thereof as a function of time, but would typically be associated with a substantial area of the lightscape rendering environment. Indirect light effects are ideally (but not necessarily) rendered to indirect light sources, when available.
- Figure 10 shows another example of a GUI that may be presented by a display device of a lightscape creation tool. As with other figures provided herein, the types and numbers of elements shown in Figure 10 are merely provided by way of example. Other GUIs presented by a lightscape creation tool may include more, fewer and/or different types and numbers of elements. According to some examples, the GUI 1000 may be presented on a display device according to commands from an instance of the control system 110 of Figure 1 that is configured for implementing the lightscape creation tool 100 of Figure 5.
- a user may interact with the GUI 1000 in order to create light objects and to assign light object properties, which may be associated with the light object as metadata.
- the GUI 1000 includes a direct light object metadata editor section 1005, with which a user can interact in order to define properties of metadata for direct light objects, as well as an indirect light object metadata editor section 1010, with which a user can interact in order to define properties of metadata for indirect light objects.
- a user may interact with the direct light object metadata editor section 1005 in order to select a position, a size and other properties of a direct light object.
- the direct light object metadata editor section 1005 includes a hue-saturation-lightness (HSL) color wheel 1035a, with which a user may interact to select the HSL attributes of a selected direct light object.
- the direct light object metadata editor section 1005 represents direct light objects A, B and C within a three-dimensional space 1031, which represents a playback environment. In this example, the direct light objects A, B and C are being viewed from the top of the three-dimensional space 1031, along the z axis.
- a user has selected, and is currently selecting properties of, the direct light object A. Because the user has selected the direct light object A, the corresponding time automation lanes for direct light object A’s coordinates (X, Y, Z), object extent (E) and HSL values across time have become visible in the area 1025 and can be edited.
- the time interval corresponding to the time automation lanes shown in Figure 10 may be on the order of 1 or more seconds, for example 1 second, 2 seconds, 3 seconds, 4 seconds, 5 seconds, etc.
- the x and y dimensions of the three-dimensional space 1031 are shown, but the area 1025 of the direct light object metadata editor section 1005 nonetheless allows a user to indicate x, y and z coordinates.
- the indirect light object metadata editor section 1010 includes an HSL color wheel 1035b, with which a user may interact to select the HSL attributes of a selected indirect light object.
- the indirect light object metadata editor section 1010 also includes an intensity control 1030, with which a user may interact to select the intensity of a selected indirect light object.
- a lightscape creation tool or an MS content creation tool may allow a content creator to link indirect light effects to video scene boundaries in order to set the lightscape indirect light effects for a specific scene. Alternatively, or additionally, the content creator may choose to modify intensity, hue, saturation, etc., as a function of time.
- a lightscape creation tool or an MS content creation tool may allow a content creator to use the video color overlay information used during video content creation to determine the indirect light effect metadata.
- the indirect light settings of a lightscape creation tool or an MS content creation tool may be used as a color/hue/saturation/intensity overlay to the direct light object metadata such that the direct light objects will follow the indirect light properties more closely.
- layers may be authored in a lightscape creation tool or an MS content creation tool configured for the creation of linear based content, where individual “objects” may be assigned to a layer with properties such as color, intensity, shape and position.
- position may only be specified for dynamic objects whilst sub zones may be used for ambient layer objects.
- MS content such as light-based content may be created for 3D worlds.
- Some such examples allow for event-based triggers, for example triggers linking events to existing light metadata as well as to the creation of new scenes.
- a lightscape creation tool or an MS content creation tool may allow mixing between layers or individual objects. For example, it may be desired that layers/objects with the same priority be additively mixed, whilst a higher-priority object should occlude all other objects.
- the lightscape creation tool or the MS content creation tool may allow a content creator to define mixing rules corresponding to the content creator’s intentions.
- the layer attributes and their metadata may be rendered by means of a lightscape renderer, such as the lightscape renderer 501 of Figure 5.
- the lightscape renderer may be configured to send light fixture control signals 515 to light fixtures of a playback environment.
- the lightscape renderer 501 may be configured to output the light fixture control signals 515 to light controllers 103, which are configured to control the light fixtures 108.
- the lightscape renderer uses the environment and light fixture data 104 to determine the capabilities and spatial positions of each light fixture.
- the priority of layers may be a determining factor in what is ultimately rendered to a light fixture.
- the environment and light fixture data 104 received by a lightscape renderer include data regarding whether a fixture is directly visible from a viewing position or is an indirect light source. In some examples, if no indirect light fixtures are available, the indirect light data may be sent to direct light fixtures instead, potentially with a reduced brightness.
- Direct light objects are preferably rendered to visible light fixtures such as ceiling downlights, lights fixed to a wall, table lamps, etc.
- Indirect light metadata ideally targets light fixtures that are not directly visible, such as LED strips that light up walls, ceilings, shelves or furniture, and spotlights that light up walls or ceilings. If no such indirect lights are available, the indirect light metadata can be used to control direct lights instead.
- a lightscape renderer may cause direct light object metadata and indirect light object metadata to be superimposed when rendering to light fixtures that function as both indirect and direct light sources.
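- The sketch below illustrates, under assumed data structures, one way a renderer might route direct and indirect light data to fixtures based on their visibility, superimpose both on fixtures that act as direct and indirect sources, and fall back to sending indirect data to direct fixtures at reduced brightness when no indirect fixtures are available. The "visibility" field and the 0.5 fallback gain are illustrative, not prescribed values.

```python
def route_light_data(fixtures, direct_cmds, indirect_rgb, fallback_gain=0.5):
    """fixtures: [{"id": ..., "visibility": "direct" | "indirect" | "both"}, ...]
    direct_cmds: {fixture_id: (r, g, b)} from rendered direct light objects.
    indirect_rgb: a single (r, g, b) derived from indirect light metadata."""
    out = {}
    has_indirect = any(f["visibility"] in ("indirect", "both") for f in fixtures)
    for f in fixtures:
        if f["visibility"] in ("direct", "both"):
            rgb = list(direct_cmds.get(f["id"], (0.0, 0.0, 0.0)))
        else:
            rgb = [0.0, 0.0, 0.0]
        if f["visibility"] in ("indirect", "both"):
            # Superimpose indirect light data on fixtures that are (also) indirect.
            rgb = [a + b for a, b in zip(rgb, indirect_rgb)]
        elif not has_indirect:
            # No indirect fixtures at all: send the indirect light data to
            # direct fixtures instead, at reduced brightness.
            rgb = [a + fallback_gain * b for a, b in zip(rgb, indirect_rgb)]
        out[f["id"]] = tuple(min(1.0, channel) for channel in rgb)
    return out

fixtures = [{"id": "downlight_1", "visibility": "direct"},
            {"id": "led_strip_1", "visibility": "indirect"}]
print(route_light_data(fixtures, {"downlight_1": (0.6, 0.2, 0.1)}, (0.1, 0.1, 0.3)))
```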
- Figure 11 is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein.
- the blocks of method 1100, like those of other methods described herein, are not necessarily performed in the order indicated. In some implementations, one or more of the blocks of method 1100 may be performed concurrently. Moreover, some implementations of method 1100 may include more or fewer blocks than shown and/or described.
- the blocks of method 1100 may be performed by one or more devices, which may be (or may include) one or more instances of a control system, such as the control system 110 that is shown in Figure 1 and described above.
- block 1105 involves receiving, by a control system configured to implement a sensory renderer, one or more sensory objects and corresponding sensory object metadata indicating intended sensory effects to be provided in a sensory actuator playback environment.
- the environment may be an actual, real-world environment, such as a room environment or an automobile environment.
- the environment may be, or may include, a virtual environment.
- method 1100 may involve providing a virtual environment, such as a gaming environment, while also providing corresponding lighting effects in a real-world environment.
- block 1110 involves receiving, by the control system, playback environment information.
- the playback environment information includes sensory actuator location information and sensory actuator characteristic information regarding one or more controllable sensory actuators in the sensory actuator playback environment.
- Block 1110 may, for example, involve receiving the environment and actuator data 004 that is described herein with reference to Figure 4 or receiving the environment and light fixture data 104 that is described herein with reference to Figure 5 or Figure 9A.
- block 1115 involves determining, by the control system and based on the playback environment information, the sensory objects and the sensory object metadata, sensory actuator control commands or sensory actuator control signals for controlling one or more controllable sensory actuators in the sensory actuator playback environment.
- block 1120 involves outputting, by the control system, the sensory actuator control commands or sensory actuator control signals for at least one of the controllable sensory actuators in the sensory actuator playback environment.
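- A non-normative skeleton of blocks 1105-1120 is sketched below; the SensoryRenderer class, its toy matching policy and the field names are placeholders invented for illustration rather than a published API.

```python
class SensoryRenderer:
    """Placeholder renderer illustrating the flow of blocks 1105-1120."""

    def __init__(self):
        self.objects, self.metadata, self.environment = [], [], {}

    def receive_objects(self, objects, metadata):       # block 1105
        self.objects, self.metadata = objects, metadata

    def receive_environment(self, playback_env_info):   # block 1110
        self.environment = playback_env_info

    def determine_commands(self):                        # block 1115
        # Toy policy: drive every actuator whose type matches an object's
        # metadata at that object's intensity. A real renderer would also use
        # the spatial and characteristic information described above.
        commands = {}
        for meta in self.metadata:
            for actuator in self.environment.get("actuators", []):
                if actuator["type"] == meta["actuator_type"]:
                    commands[actuator["id"]] = meta.get("intensity", 1.0)
        return commands

    def output(self, commands):                          # block 1120
        for actuator_id, level in commands.items():
            print(f"actuator {actuator_id}: drive level {level}")

renderer = SensoryRenderer()
renderer.receive_objects(["gust"], [{"actuator_type": "fan", "intensity": 0.7}])
renderer.receive_environment({"actuators": [{"id": "fan_left", "type": "fan"}]})
renderer.output(renderer.determine_commands())  # actuator fan_left: drive level 0.7
```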
- the one or more sensory objects may include one or more light objects, one or more haptic objects, one or more air flow objects, one or more positional actuator objects or combinations thereof.
- in some examples, the sensory object metadata may include lighting metadata; the lighting metadata may include direct-light object metadata, indirect light metadata, or combinations thereof.
- the lighting metadata may be organized into one or more layers, which may include one or more ambient layers, one or more dynamic layers, one or more custom layers, one or more overlay layers, or combinations thereof.
- the sensory object metadata includes time information.
- block 1115 may involve determining one or more drive levels for one or more time intervals corresponding to the time information.
- method 1100 may involve receiving viewing position information.
- block 1115 may involve determining one or more drive levels corresponding to the viewing position information.
- the lighting information may include one or more characteristics of one or more base light sources in the local lighting environment, the base light sources not being controllable by the control system.
- block 1115 may involve determining one or more drive levels based, at least in part, on the one or more characteristics of one or more non-controllable light sources.
- the sensory object metadata may include sensory object size information.
- each of the sensory objects may have a sensory object effect property indicating a type of effect that the sensory object is providing.
- one or more of the sensory objects may have a persistence property indicating a period of time that the sensory object will persist in the sensory actuator playback environment.
- one or more of the sensory objects may be assigned to one or more layers in which sensory objects are grouped according to shared sensory object characteristics.
- the one or more layers may group sensory objects according to mood, ambience, information, attention, color, intensity, size, shape, position, region in space, or combinations thereof.
- At least some of the sensory objects may have a priority property indicating a relative importance of each sensory object.
- one or more of the sensory objects may have a spatial panning property indicating how a sensory object can move within the sensory actuator playback environment, how a sensory object will affect the controllable sensory actuators in the sensory actuator playback environment, or combinations thereof.
- at least some of the sensory objects may have a mixing mode property indicating how multiple sensory objects can be reproduced by a single controllable sensory actuator.
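- The properties listed above might be grouped, purely for illustration, into a structure such as the following; the field names, types and default values are assumptions and do not define a normative metadata format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SensoryObjectMetadata:
    effect: str                                # type of effect, e.g. "light", "haptic"
    position: Optional[Tuple[float, float, float]] = None  # location information
    size: Optional[float] = None               # sensory object size/extent
    persistence_s: Optional[float] = None      # how long the object persists
    layers: List[str] = field(default_factory=list)        # e.g. ["dynamic", "mood"]
    priority: int = 0                          # relative importance
    panning: Optional[str] = None              # how the object may move in the space
    mixing_mode: str = "additive"              # how objects share one actuator

wind_gust = SensoryObjectMetadata(
    effect="air_flow", position=(0.0, 1.0, 0.5), size=0.4,
    persistence_s=2.0, layers=["dynamic"], priority=2, panning="free",
)
print(wind_gust)
```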
- Some examples of method 1100 may involve providing and/or processing more general metadata for an entire multi-sensory content file, instead of (or in addition to) per-object metadata.
- method 1100 may involve receiving, by the control system, overarching sensory object metadata comprising trim control information, mastering environment information, or a combination thereof.
- block 1115 may involve determining the sensory actuator control commands or the sensory actuator control signals based, at least in part, on the trim control information, on the mastering environment information, or on a combination thereof.
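- As a hedged illustration, the sketch below treats trim control information as a simple per-modality gain applied to previously determined drive levels; the field names and the multiplicative interpretation are assumptions made for the example.

```python
def apply_trim(commands, overarching_metadata):
    """commands: {actuator_id: {"modality": str, "level": float}}."""
    trims = overarching_metadata.get("trim", {})   # e.g. {"light": 0.5}
    out = {}
    for actuator_id, cmd in commands.items():
        gain = trims.get(cmd["modality"], 1.0)
        out[actuator_id] = {**cmd, "level": min(1.0, cmd["level"] * gain)}
    return out

print(apply_trim({"lamp_1": {"modality": "light", "level": 0.9}},
                 {"trim": {"light": 0.5}}))
# -> {'lamp_1': {'modality': 'light', 'level': 0.45}}
```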
- This section discloses various types of coded bitstreams to carry object-based multi-sensory data for rendering on an arbitrary plurality of actuators.
- Such bitstreams may be referred to herein as including encoded object-based sensory data or as including an encoded object-based sensory data stream.
- Some encoded object-based sensory data streams may be delivered along with and/or as part of other media content.
- an object-based sensory data stream may be interleaved or multiplexed with the audio and/or video bitstreams.
- an object-based sensory data stream may be arranged in an International Standards Organization (ISO) base media file format (ISOBMFF), so that the encoded object-based sensory data can be provided with corresponding audio data, video media, or both, in an encoded ISOBMFF bitstream.
- an encoded bitstream may, in some examples, include an encoded object-based sensory data stream and an encoded audio data stream and/or an encoded video data stream.
- the encoded object-based sensory data stream and other associated data stream(s) include associated synchronization data, such as time stamps, to allow synchronization between different types of content. For example, if the encoded bitstream includes encoded audio data, encoded video data and encoded object-based sensory data, the encoded audio data, encoded video data and the encoded object-based sensory data may all include associated synchronization data.
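- One simple way to use such synchronization data, sketched below under assumed frame structures, is to align each video presentation timestamp with the most recent sensory frame whose timestamp does not exceed it; the 0.5-second sensory frame spacing is illustrative only.

```python
from bisect import bisect_right

def sensory_frame_for(video_pts, sensory_frames):
    """sensory_frames: list of (pts_seconds, payload) sorted by timestamp."""
    times = [pts for pts, _ in sensory_frames]
    i = bisect_right(times, video_pts)
    return sensory_frames[i - 1][1] if i else None

sensory_frames = [(0.0, "scene lighting A"), (0.5, "scene lighting B")]
print(sensory_frame_for(0.73, sensory_frames))  # -> scene lighting B
```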
- an MS renderer, such as the MS renderer 001 of Figure 4, will be configured to control a particular set of actuators of a particular playback environment based on (a) the general instructions found in the object-based sensory data 005 and (b) information in the environment and actuator data that is particular to that playback environment.
- the object-based sensory data 005 may be provided to the MS renderer 001 by the experience player 002, which may include a bitstream decoder configured to extract the object-based sensory data 005 from a bitstream that also includes encoded audio data and/or video data.
- Linear audio/video media content is conventionally packed into a container format in frames, wherein each frame contains the necessary information to render the media during a particular time duration of the content (for example, the 600 ms period from 10 min 3.2 seconds to 10 min 3.8 seconds relative to the start of the content).
- Some container formats may include a separate elemental stream for each modality, for example one elemental stream containing video information encoded using High Efficiency Video Coding (HEVC), also known as H.265, one elemental stream containing audio information encoded using Advanced Audio Coding (AAC) or Dolby AC4, and a third elemental stream containing closed-captioning (subtitle) information.
- the present disclosure extends and generalizes previously-existing bitstream encoding and decoding methods to include a plurality of elemental streams that convey non-channel-based (such as object-based or spherical harmonic-based) multi-sensory information suitable for demultiplexing, frame reassembly and presentation using a plurality of actuators, in some instances in synchrony with audio and/or video modalities of a media stream.
- Figure 12 shows example elements of a system for the creation and playback of multi-sensory (MS) experiences.
- system 1200 may be, or may include, one or more devices configured for performing at least some of the methods disclosed herein.
- system 1200 may include one or more instances of the control system 110 of Figure 1 that are configured for performing at least some of the methods disclosed herein.
- system 1200 includes instances of some elements that are described with reference to Figure 4.
- the system 1200 includes the following elements: 1200: A system configured to receive and process an encoded bitstream that includes a plurality of data frames, the data frames including encoded audio data, encoded video data and encoded object-based sensory data;
- the encoded bitstream may be arranged in an International Standards Organization (ISO) base media file format (ISOBMFF) “container” and/or encoded according to a Moving Picture Experts Group (MPEG) standard;
- the audio data stream may be encoded according to an Advanced Audio Coding (AAC), Dolby AC3, Dolby EC3, Dolby AC4 or Dolby Atmos codec;
- the video data stream may be encoded according to an Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC) or AOMedia Video 1 (AV1) codec;
- an encoded sensory data stream, which in this example is an object-based sensory data stream.
- the encoded object-based sensory data includes sensory objects and corresponding sensory object metadata indicating intended sensory effects to be provided via sensory actuators in a playback environment.
- there may be one object-based sensory data stream (which also may be referred to as an “elemental stream”) for each modality (for example, an object-based lighting data stream, an object-based temperature data stream, an object-based airflow data stream, etc.).
- these modalities may be combined into a single encoded sensory data stream.
- the encoded audio data stream, the encoded video data stream and the encoded object-based sensory data stream all include associated synchronization data, such as time stamps;
- Multi-sensory renderer 001 - uses the environment and actuator data 004 and the decoded multi-sensory stream to produce control signals for driving a plurality of actuators 008.
- the MS renderer 001 includes the functionality of the MS controller APIs 003 described with reference to Figure 4. According to this example, the MS renderer 001 can determine how to synchronize playback of the audio, video and sensory data according to the synchronization data in the encoded bitstream 1201;
- Environment and actuator data 004 - contains information on where actuators are physically located within the environment, and visibility/zone-of-effect information describing how each actuator is perceived by the audience (for example, whether and how each light fixture is visible to the viewer(s));
- 008C: A smart RGB light-emitting diode (LED) strip; and 008D: Other actuators in the playback environment.
- Embodiment 1 One Elemental Stream Per Modality
- the multi-sensory data stream may be transported in a plurality of elemental streams (for example, one elemental stream for lighting information, one elemental stream for airflow information, one elemental stream for haptic information, one elemental stream for temperature information, etc.).
- multiple versions of one or more multi-sensory data streams may be present in the encoded bitstream 1201 to allow selection based on user preference or user requirements (for example, a default or standard lighting data stream for typical viewers and a separate lighting data stream that contains more subtle lighting information that is intended to be safe for photosensitive viewers).
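- A minimal sketch of such a selection step is shown below; the stream labels (for example "lighting_photosensitive_safe") are hypothetical and are not defined by this disclosure.

```python
def select_lighting_stream(available_streams, photosensitive_viewer=False):
    """available_streams: {label: stream object (opaque here)}."""
    if photosensitive_viewer and "lighting_photosensitive_safe" in available_streams:
        return available_streams["lighting_photosensitive_safe"]
    return available_streams.get("lighting_default")

streams = {"lighting_default": "standard lighting stream",
           "lighting_photosensitive_safe": "reduced-flash lighting stream"}
print(select_lighting_stream(streams, photosensitive_viewer=True))
# -> reduced-flash lighting stream
```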
- Embodiment 2 Combined Multi-Sensory Elemental Stream
- all the multi-sensory modalities may be combined into a unified multi-sensory data stream.
- Embodiment 3 Multi-Sensory Information Combined with an Existing Elemental Stream or Arranged in an Existing Container Format
- the multi-sensory data stream may be embedded within one of the existing elemental streams.
- some audio stream formats may include a facility to encapsulate stream-synchronous metadata within them.
- the multi-sensory data stream may be embedded within an existing audio metadata transport mechanism, for example in a field or sequence of fields reserved for audio metadata.
- an existing audio, video or container format could be modified or adapted to allow the inclusion of synchronous multi-sensory information.
- the data frames of the encoded bitstream may be arranged in an International Standards Organization (ISO) base media file format (ISOBMFF) file or “container.”
- the encoded object-based sensory data may reside in a timed metadata track, for example as defined in Section 12.9 of the International Standards Organization/International Electrotechnical Commission (ISO/IEC) 14496-12:2022 standard, which is hereby incorporated by reference.
- encoded audio data may reside in an audio track and/or encoded video data may reside in a video track of an ISOBMFF file.
- Such examples have various potential advantages, which include but are not limited to the following:
- the same timed metadata track may be associated with more than one track.
- a timed metadata track corresponding to the encoded object-based sensory data may be independent of the content of the associated audio/video tracks;
- Figure 13 is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein.
- the blocks of method 1300, like those of other methods described herein, are not necessarily performed in the order indicated. In some implementations, one or more of the blocks of method 1300 may be performed concurrently. Moreover, some implementations of method 1300 may include more or fewer blocks than shown and/or described.
- the blocks of method 1300 may be performed by one or more devices, which may be (or may include) one or more instances of a control system, such as the control system 110 that is shown in Figure 1 and described above. For example, at least some aspects of method 1300 may be performed by an instance of the control system 110 that is configured to implement the experience player 002 of Figure 4 or Figure 12.
- block 1305 involves receiving, by a control system configured to implement a demultiplexing module, an encoded bitstream that includes a plurality of data frames.
- the demultiplexing module may, for example, be an instance of the demultiplexer 1206 of Figure 12.
- the data frames include encoded audio data, encoded video data and encoded object-based sensory data.
- the encoded object-based sensory data includes sensory objects and corresponding sensory object metadata indicating intended sensory effects to be provided in a sensory actuator playback environment.
- the encoded audio data stream, the encoded video data stream and the encoded object-based sensory data stream all include associated synchronization data.
- block 1310 involves extracting, by the control system and from the encoded bitstream, an encoded audio data stream, an encoded video data stream and an encoded object-based sensory data stream.
- Block 1310 may, for example, involve parsing and/or demultiplexing the encoded bitstream 1201 that is described with reference to Figure 12.
- block 1315 involves providing, by the control system, the encoded audio data stream to an audio decoder.
- Block 1315 may, for example, involve the demultiplexer 1206 of Figure 12 providing the encoded audio data stream to the audio decoder 1207.
- block 1320 involves providing, by the control system, the encoded video data stream to a video decoder.
- Block 1320 may, for example, involve the demultiplexer 1206 providing the encoded video data stream to the video decoder 1208.
- block 1325 involves providing, by the control system, the encoded object-based sensory data stream to a sensory data decoder.
- Block 1325 may, for example, involve the demultiplexer 1206 providing the object-based sensory data stream to the multi-sensory data stream decoder 1209.
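- The sketch below is a non-normative outline of blocks 1305-1325: the demultiplexer splits the encoded bitstream into elemental streams and hands each one to its decoder. The per-frame dictionary layout and the decoder callables are assumptions made for the example.

```python
def demultiplex(encoded_bitstream, audio_decoder, video_decoder, sensory_decoder):
    audio_stream, video_stream, sensory_stream = [], [], []
    for frame in encoded_bitstream:              # block 1305: receive the bitstream
        audio_stream.append(frame["audio"])      # block 1310: extract the streams
        video_stream.append(frame["video"])
        sensory_stream.append(frame["sensory"])
    audio_decoder(audio_stream)                  # block 1315: to the audio decoder
    video_decoder(video_stream)                  # block 1320: to the video decoder
    sensory_decoder(sensory_stream)              # block 1325: to the sensory decoder

demultiplex(
    encoded_bitstream=[{"audio": b"a0", "video": b"v0", "sensory": b"s0"}],
    audio_decoder=lambda s: print("audio frames:", len(s)),
    video_decoder=lambda s: print("video frames:", len(s)),
    sensory_decoder=lambda s: print("sensory frames:", len(s)),
)
```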
- the sensory object metadata may include sensory object location information, sensory object size information, or both.
- each of the sensory objects may have a sensory object effect property indicating a type of effect that the sensory object is providing.
- one or more of the sensory objects may have a persistence property indicating a period of time that the sensory object will persist in the sensory actuator playback environment.
- one or more of the sensory objects may be assigned to one or more layers in which sensory objects are grouped according to common sensory object characteristics.
- the one or more layers may group sensory objects according to one or more of mood, ambience, information, attention, color, intensity, size, shape, position, region in space, or combinations thereof.
- at least some of the sensory objects may have a priority property indicating a relative importance of each sensory object.
- the object-based sensory data stream may include light objects, haptic objects, air flow objects, positional actuator objects or combinations thereof.
- in some examples, the sensory object metadata may include lighting metadata; the lighting metadata may include direct-light object metadata, indirect light metadata, or combinations thereof.
- the lighting metadata may be organized into one or more layers, which may include one or more ambient layers, one or more dynamic layers, one or more custom layers, one or more overlay layers, or combinations thereof.
- method 1300 may involve decoding the encoded object-based sensory data stream and providing a decoded object-based sensory data stream, including associated sensory synchronization data, to a sensory data renderer.
- method 1300 may involve receiving, by the sensory data renderer, the decoded object-based sensory data stream and receiving, by the sensory data renderer, playback environment information.
- the playback environment information may be an instance of the environment and actuator data 004 that is described herein. Accordingly, the playback environment information may include sensory actuator location information and sensory actuator characteristic information regarding one or more controllable sensory actuators in the sensory actuator playback environment.
- method 1300 may involve determining, by the sensory data renderer and based at least in part on (a) the sensory objects, the sensory object metadata and the associated sensory synchronization data from the decoded object-based sensory data stream and (b) the playback environment information, sensory actuator control commands or sensory actuator control signals for controlling one or more controllable sensory actuators in the sensory actuator playback environment.
- method 1300 may involve outputting, by the sensory data renderer, the sensory actuator control commands or sensory actuator control signals for at least one of the controllable sensory actuators in the sensory actuator playback environment.
- the sensory actuator control commands may, for example, be provided to the MS controller APIs 003 described with reference to Figure 4.
- the sensory actuator control signals may, for example, be provided to the actuators 008 of a playback environment.
- each data frame of the plurality of data frames may include an encoded audio data subframe, an encoded video data subframe and an encoded object-based sensory data subframe.
- the data frames of the bitstream may be encoded according to a Moving Picture Experts Group (MPEG) standard.
- the data frames of the bitstream may be arranged in the International Standards Organization (ISO) base media file format (ISOBMFF).
- the encoded object-based sensory data may reside in a timed metadata track.
- the audio data stream may be encoded according to an Advanced Audio Coding (AAC), Dolby AC3, Dolby EC3, Dolby AC4 or Dolby Atmos codec.
- the video data stream may be encoded according to an Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC) or AOMedia Video 1 (AV1) codec.
- Some aspects of the present disclosure may be appreciated from the following enumerated example embodiments (EEEs).
- EEE 1. A method for rendering an intended lighting environment, comprising: receiving, by a control system configured to implement a lighting environment renderer, object-based light data indicating the intended lighting environment, the object-based light data including light objects and lighting metadata; receiving, by the control system, lighting information regarding a local lighting environment, wherein the lighting information includes one or more characteristics of one or more controllable light sources in the local lighting environment; determining, by the control system, a drive level for each of the one or more controllable light sources that approximates the intended lighting environment; and outputting, by the control system, the drive level to at least one of the controllable light sources.
- EEE 2 The method of EEE 1, wherein: the object-based lighting metadata includes time information; and the determining involves determining one or more drive levels for one or more time intervals corresponding to the time information.
- EEE 3 The method of EEE 1 or EEE 2, further comprising receiving viewing position information, wherein the determining involves determining one or more drive levels corresponding to the viewing position information.
- EEE 4 The method of any one of EEEs 1-3, wherein: the lighting information includes one or more characteristics of one or more base light sources in the local lighting environment, the base light sources not being controllable by the control system; and the determining involves determining one or more drive levels based, at least in part, on the one or more characteristics of one or more non-controllable light sources.
- EEE 5. The method of any one of EEEs 1-4, wherein the object-based lighting metadata indicates at least lighting object position information and lighting object color information.
- EEE 6 The method of any one of EEEs 1-5, wherein the intended lighting environment is transmitted as one or more Image Based Lighting (IBL) objects.
- EEE 7 The method of EEE 6, wherein the determining is based, at least in part, on an IBL map of environmental lighting produced by each of n controllable light sources (IBLn) in the local lighting environment at maximum intensity.
- EEE 8 The method of EEE 7, wherein the determining is based, at least in part, on a base IBL map of the base environment lighting (IBLbase) that is not controllable by the control system.
- EEE 9 The method of EEE 8, wherein the determining is based, at least in part, on a linear scaling (Scalen) for each dynamic lighting element IBLn such that a sum of light from each scaled IBLn plus the base lighting IBLbase most closely matches the intended lighting environment IBLref.
- EEE 10 The method of EEE 9, wherein the determining is based, at least in part, on minimizing a difference between IBLref and the sum of light from each scaled IBLn plus the base lighting IBLbase.
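- EEEs 7-10 can be read as a least-squares problem; a minimal sketch, assuming each IBL map is flattened to a vector of pixel values, is shown below. Solving unconstrained and then clipping the scalings to [0, 1] is a simplification; a bounded solver could be used instead.

```python
import numpy as np

def solve_drive_levels(ibl_maps, ibl_base, ibl_ref):
    """ibl_maps: (num_lights, num_pixels) array of IBLn maps at maximum intensity.
    ibl_base, ibl_ref: (num_pixels,) arrays for IBLbase and IBLref."""
    A = np.asarray(ibl_maps, dtype=float).T            # pixels x lights
    b = np.asarray(ibl_ref, dtype=float) - np.asarray(ibl_base, dtype=float)
    scales, *_ = np.linalg.lstsq(A, b, rcond=None)     # least-squares Scalen
    return np.clip(scales, 0.0, 1.0)                   # valid drive levels

# Toy example with two controllable lights and three "pixels".
ibl_maps = [[1.0, 0.0, 0.2],    # light 1 mostly illuminates pixel 0
            [0.0, 1.0, 0.2]]    # light 2 mostly illuminates pixel 1
ibl_base = [0.1, 0.1, 0.1]
ibl_ref = [0.6, 0.3, 0.2]
print(solve_drive_levels(ibl_maps, ibl_base, ibl_ref))
```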
- EEE 11 An apparatus configured to perform the method of any one of EEEs 1-10.
- EEE 12 A system configured to perform the method of any one of EEEs 1-10.
- EEE 13 One or more non-transitory computer-readable media having instructions stored thereon for controlling one or more devices to perform the method of any one of EEEs 1-10.
- EEE 14. A method for providing an intended sensory experience, comprising: receiving, by a control system configured to implement a sensory renderer, one or more sensory objects and corresponding sensory object metadata indicating intended sensory effects to be provided in a sensory actuator playback environment; receiving, by the control system, playback environment information, wherein the playback environment information includes sensory actuator location information and sensory actuator characteristic information regarding one or more controllable sensory actuators in the sensory actuator playback environment; determining, by the control system and based on the playback environment information, the sensory objects and the sensory object metadata, sensory actuator control commands or sensory actuator control signals for controlling one or more controllable sensory actuators in the sensory actuator playback environment; and outputting, by the control system, the sensory actuator control commands or sensory actuator control signals for at least one of the controllable sensory actuators in the sensory actuator playback environment.
- EEE 15 The method of EEE 14, wherein the one or more sensory objects include one or more light objects, one or more haptic objects, one or more air flow objects, one or more positional actuator objects or combinations thereof.
- EEE 16 The method of EEE 14 or EEE 15, wherein the sensory object metadata includes light objects and lighting metadata, and wherein the lighting metadata includes direct-light object metadata, indirect light metadata, or combinations thereof.
- EEE 17 The method of EEE 16, wherein the lighting metadata is organized into one or more layers including one or more ambient layers, one or more dynamic layers, one or more custom layers, one or more overlay layers, or combinations thereof.
- EEE 18 The method of any one of EEEs 14-17, wherein the sensory object metadata includes sensory object location information.
- EEE 19 The method of any one of EEEs 14-18, wherein the sensory object metadata includes sensory object size information.
- EEE 20 The method of any one of EEEs 14-19, wherein each of the sensory objects has a sensory object effect property indicating a type of effect that the sensory object is providing.
- EEE 21 The method of any one of EEEs 14-20, wherein one or more of the sensory objects has a persistence property indicating a period of time that the sensory object will persist in the sensory actuator playback environment.
- EEE 22 The method of any one of EEEs 14-21, wherein one or more of the sensory objects are assigned to one or more layers in which sensory objects are grouped according to shared sensory object characteristics.
- EEE 23 The method of EEE 22, wherein the one or more layers group sensory objects according to mood, ambience, information, attention, color, intensity, size, shape, position, region in space, or combinations thereof.
- EEE 24 The method of any one of EEEs 14-23, wherein at least some of the sensory objects have a priority property indicating a relative importance of each sensory object.
- EEE 25 The method of any one of EEEs 14-24, wherein one or more of the sensory objects has a spatial panning property indicating how a sensory object can move within the sensory actuator playback environment, how a sensory object will affect the controllable sensory actuators in the sensory actuator playback environment, or combinations thereof.
- EEE 26 The method of any one of EEEs 14-25, wherein at least some of the sensory objects have a mixing mode property indicating how multiple sensory objects can be reproduced by a single controllable sensory actuator.
- EEE 27 The method of any one of EEEs 14-26, further comprising receiving, by the control system, overarching sensory object metadata comprising trim control information, mastering environment information, or a combination thereof.
- EEE 28 An apparatus configured to perform the method of any one of EEEs 14-27.
- EEE 29 A system configured to perform the method of any one of EEEs 14-27.
- EEE 30 One or more non-transitory computer-readable media having instructions stored thereon for controlling one or more devices to perform the method of any one of EEEs 14-27.
- EEE 31. A method for decoding a bitstream, comprising: receiving, by a control system configured to implement a demultiplexing module, an encoded bitstream that includes a plurality of data frames, the data frames including encoded audio data, encoded video data and encoded object-based sensory data, the encoded object-based sensory data including sensory objects and corresponding sensory object metadata indicating intended sensory effects to be provided in a sensory actuator playback environment, the encoded audio data stream, the encoded video data stream and the encoded object-based sensory data stream all including associated synchronization data; extracting, by the control system and from the encoded bitstream, an encoded audio data stream, an encoded video data stream and an encoded object-based sensory data stream; providing, by the control system, the encoded audio data stream to an audio decoder; providing, by the control system, the encoded video data stream to a video decoder; and providing, by the control system, the encoded object-based sensory data stream to a sensory data decoder.
- EEE 32 The method of EEE 31, further comprising decoding the encoded object-based sensory data stream and providing a decoded object-based sensory data stream, including associated sensory synchronization data, to a sensory data renderer.
- EEE 33 The method of EEE 32, further comprising: receiving, by the sensory data renderer, the decoded object-based sensory data stream; receiving, by the sensory data renderer, playback environment information, wherein the playback environment information includes sensory actuator location information and sensory actuator characteristic information regarding one or more controllable sensory actuators in the sensory actuator playback environment; determining, by the sensory data renderer and based at least in part on (a) the sensory objects, the sensory object metadata and the associated sensory synchronization data from the decoded object-based sensory data stream and (b) the playback environment information, sensory actuator control commands or sensory actuator control signals for controlling one or more controllable sensory actuators in the sensory actuator playback environment; and outputting, by the sensory data renderer, the sensory actuator control commands or sensory actuator control signals for at least one of the controllable sensory actuators in the sensory actuator playback environment.
- EEE 34 The method of any one of EEEs 31-33, wherein each data frame of the plurality of data frames includes an encoded audio data subframe, an encoded video data subframe and an encoded object-based sensory data subframe.
- EEE 35 The method of any one of EEEs 31-34, wherein the data frames of the bitstream are encoded according to a Moving Picture Experts Group (MPEG) standard.
- EEE 36 The method of any one of EEEs 31-35, wherein the data frames of the bitstream are arranged in an International Standards Organization (ISO) base media file format (ISOBMFF).
- EEE 37 The method of EEE 35 or EEE 36, wherein the encoded object-based sensory data resides in a timed metadata track.
- EEE 38 The method of any one of EEEs 31-37, wherein the sensory objects include one or more lighting objects, one or more haptic objects, one or more air flow objects, one or more positional actuator objects or combinations thereof.
- EEE 39 The method of any one of EEEs 31-38, wherein the sensory object metadata includes lighting metadata and wherein the lighting metadata includes direct-light object metadata, indirect light metadata, or combinations thereof.
- EEE 40 The method of EEE 39, wherein the lighting metadata is organized into one or more layers including one or more ambient layers, one or more dynamic layers, one or more custom layers, one or more overlay layers, or combinations thereof.
- EEE 41 The method of any one of EEEs 31-40, wherein the sensory object metadata includes sensory object location information, sensory object size information, or both.
- EEE 42 The method of any one of EEEs 31-41, wherein each of the sensory objects has a sensory object effect property indicating a type of effect that the sensory object is providing.
- EEE 43 The method of any one of EEEs 31-42, wherein one or more of the sensory objects has a persistence property indicating a period of time that the sensory object will persist in the sensory actuator playback environment.
- EEE 44 The method of any one of EEEs 31-43, wherein one or more of the sensory objects are assigned to one or more layers in which sensory objects are grouped according to common sensory object characteristics.
- EEE 45 The method of EEE 44, wherein the one or more layers group sensory objects according to one or more of mood, ambience, information, attention, color, intensity, size, shape, position, region in space, or combinations thereof.
- EEE 46 The method of any one of EEEs 31-45, wherein at least some of the sensory objects have a priority property indicating a relative importance of each sensory object.
- EEE 47 The method of any one of EEEs 31-46, wherein the audio data stream is encoded according to an Advanced Audio Coding (AAC), Dolby AC3, Dolby EC3, Dolby AC4 or Dolby Atmos codec.
- EEE 48 The method of any one of EEEs 31-46, wherein the video data stream is encoded according to an Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC) or AOMedia Video 1 (AV1) codec.
- EEE 49 An apparatus configured to perform the method of any one of EEEs 31-48.
- EEE 50 A system configured to perform the method of any one of EEEs 31-48.
- EEE 51 One or more non-transitory computer-readable media having instructions stored thereon for controlling one or more devices to perform the method of any one of EEEs 31-48.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Automation & Control Theory (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Some disclosed examples involve receiving, by a control system, a content bitstream including encoded object-based sensory data, the encoded object-based sensory data including one or more sensory objects and corresponding sensory metadata, the encoded object-based sensory data corresponding to sensory effects including lighting, haptics, airflow, one or more positional actuators, or combinations thereof, to be provided by one or more sensory actuators in an environment. Some disclosed examples involve extracting, by the control system, object-based sensory metadata from the content bitstream and providing, by the control system, the object-based sensory metadata to a sensory renderer. In some examples, the content bitstream also may include one or more encoded audio objects and/or encoded video data synchronized with the encoded object-based sensory metadata.
Description
PROVIDING OBJECT-BASED MULTI-SENSORY EXPERIENCES
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of priority from U.S. Provisional Application No. 63/669,232, filed July 10, 2024, U.S. Provisional Application No. 63/514,107, filed July 17, 2023, U.S. Provisional Application No. 63/514,096, filed July 17, 2023, and U.S. Provisional Application No. 63/514,094, filed July 17, 2023, each of which is incorporated by reference herein in its entirety.
TECHNICAL FIELD
[0002] The present disclosure relates to providing multi-sensory experiences, some of which include light-based experiences.
BACKGROUND
[0003] Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted as prior art by inclusion in this section.
[0004] Media content delivery has generally focused on audio and video experiences. There has been limited delivery of multi-sensory content due to the bespoke nature of actuation. Luminaires, for example, are used extensively as an expression of art and function for concerts. However, each installation is designed specifically for a unique set of luminaires. Delivering a lighting design beyond the set of fixtures the system was designed for is generally not feasible. Other systems that attempt to deliver light experiences more broadly simply do so by extending the screen visuals algorithmically, but are not specifically authored. Haptics content is designed for a specific haptics apparatus. If another device, such as a game controller, mobile phone or even a different brand of haptics device is used, there has been no way to translate the creative intent of content to the different actuators.
SUMMARY
[0005] At least some aspects of the present disclosure may be implemented via methods, such as audio processing methods. In some instances, the methods may be implemented, at least in part, by a control system such as those disclosed herein. Some methods may involve receiving, by a control system, a content bitstream including encoded object-based sensory data. The encoded object-based sensory data may include one or more sensory objects and
corresponding sensory metadata. The encoded object-based sensory data may correspond to sensory effects to be provided by one or more sensory actuators in an environment. The sensory effects may include lighting, haptics, airflow, one or more positional actuators, or combinations thereof. Some methods may involve extracting, by the control system, the object-based sensory data from the content bitstream and providing, by the control system, the object-based sensory data to a sensory renderer.
[0006] In some examples, the object-based sensory metadata may include sensory spatial metadata indicating at least a spatial position for rendering the object-based sensory data within the environment, an area for rendering the object-based sensory data within the environment, or combinations thereof. According to some examples, the object-based sensory data may not correspond to particular sensory actuators in the environment. In some examples, the object-based sensory data may include abstracted sensory reproduction information allowing the sensory renderer to reproduce one or more authored sensory effects via one or more sensory actuator types, via various numbers of sensory actuators and from various sensory actuator positions in the environment.
[0007] Some methods may involve receiving, by the sensory renderer, the object-based sensory data and environment descriptor data corresponding to one or more locations of sensory actuators in the environment. Some methods may involve receiving, by the sensory renderer, actuator descriptor data corresponding to properties of the sensory actuators in the environment. Some methods may involve providing, by the sensory renderer, one or more actuator control signals for controlling the sensory actuators in the environment to produce one or more sensory effects indicated by the object-based sensory data. Some methods may involve providing, by the one or more sensory actuators in the environment, the one or more sensory effects.
[0008] According to some examples, the content bitstream also may include one or more encoded audio objects synchronized with the encoded object-based sensory data. The audio objects may include one or more audio signals and corresponding audio object metadata. Some such methods may involve extracting, by the control system, audio objects from the content bitstream and providing, by the control system, the audio objects to an audio renderer. In some examples, the audio object metadata may include at least audio object spatial metadata indicating an audio object spatial position for rendering the one or more audio signals within the environment. Some methods may involve receiving, by the audio renderer, the one or more audio objects, receiving, by the audio renderer, loudspeaker data corresponding to one or more loudspeakers in the environment and providing, by the audio
renderer, one or more loudspeaker control signals for controlling the one or more loudspeakers in the environment to play back audio corresponding to the one or more audio objects and synchronized with the one or more sensory effects. Some methods may involve playing back, by the one or more loudspeakers in the environment, the audio corresponding to the one or more audio objects.
[0009] According to some examples, the content bitstream may include encoded video data synchronized with the encoded audio objects and the encoded object-based sensory data. Some methods may involve extracting, by the control system, video data from the content bitstream and providing, by the control system, the video data to a video renderer. Some methods may involve receiving, by the video renderer, the video data and providing, by the video renderer, one or more video control signals for controlling one or more display devices in the environment to present one or more images corresponding to the one or more video control signals and synchronized with the one or more audio objects and the one or more sensory effects. Some methods may involve presenting, by the one or more display devices in the environment, the one or more images corresponding to the one or more video control signals.
[0010] In some examples, the environment may be a virtual environment. According to some examples, the environment may be a physical, real-world environment. For example, the environment may be a room environment or a vehicle environment.
[0011] Some or all of the operations, functions and/or methods described herein may be performed by one or more devices according to instructions (e.g., software) stored on one or more computer-readable non-transitory media. Such non-transitory media may include one or more memory devices such as those described herein, including but not limited to one or more random access memory (RAM) devices, read-only memory (ROM) devices, etc. Accordingly, some innovative aspects of the subject matter described in this disclosure can be implemented in one or more computer-readable non-transitory media having software stored thereon.
[0012] At least some aspects of the present disclosure may be implemented via apparatus. For example, one or more devices may be capable of performing, at least in part, the methods disclosed herein. In some implementations, an apparatus may include an interface system and a control system. The control system may include one or more general purpose single- or multi-chip processors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable logic devices, discrete gates or transistor logic, discrete hardware components, or combinations
thereof. The control system may be configured to perform some or all of the disclosed methods.
[0013] Details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims. Note that the relative dimensions of the following figures may not be drawn to scale.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Disclosed embodiments will now be described, by way of example only, with reference to the accompanying drawings.
[0015] Figure 1 is a block diagram that shows examples of components of an apparatus capable of implementing various aspects of this disclosure.
[0016] Figure 2 shows example elements of an endpoint.
[0017] Figure 3 shows examples of actuator elements.
[0018] Figure 4 shows example elements of a system for the creation and playback of multi-sensory (MS) experiences.
[0019] Figure 5 shows example elements of another system for the creation and playback of MS experiences.
[0020] Figure 6 shows an example of a graphical user interface (GUI) that may be presented by a display device of the lightscape creation tool of Figure 5.
[0021] Figure 7A shows another example of a graphical user interface (GUI) that may be presented by a display device of the lightscape creation tool of Figure 5.
[0022] Figure 7B is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein.
[0023] Figures 8A, 8B and 8C show three examples of projecting lighting of viewing environments onto a two-dimensional (2D) plane.
[0024] Figure 9A shows example elements of a lightscape renderer.
[0025] Figure 9B is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein.
[0026] Figure 10 shows another example of a GUI that may be presented by a display device of a lightscape creation tool.
[0027] Figure 11 is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein.
[0028] Figure 12 shows example elements of a system for the creation and playback of multi-sensory (MS) experiences.
[0029] Figure 13 is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein.
DETAILED DESCRIPTION
[0030] Described herein are techniques related to providing multi-sensory media content. In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be evident, however, to one skilled in the art that the present disclosure as defined by the claims may include some or all of the features in these examples alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.
[0031] In the following description, various methods, processes and procedures are detailed. Although particular steps may be described in a certain order, such order is mainly for convenience and clarity. A particular step may be repeated more than once, may occur before or after other steps (even if those steps are otherwise described in another order), and may occur in parallel with other steps. A second step is required to follow a first step only when the first step must be completed before the second step is begun. Such a situation will be specifically pointed out when not clear from the context.
[0032] In this document, the terms “and”, “or” and “and/or” are used. Such terms are to be read as having an inclusive meaning. For example, “A and B” may mean at least the following: “both A and B”, “at least both A and B”. As another example, “A or B” may mean at least the following: “at least A”, “at least B”, “both A and B”, “at least both A and B”. As another example, “A and/or B” may mean at least the following: “A and B”, “A or B”. When an exclusive-or is intended, such will be specifically noted (e.g., “either A or B”, “at most one of A and B”).
[0033] This document describes various processing functions that are associated with structures such as blocks, elements, components, circuits, etc. In general, these structures may be implemented by one or more processors controlled by one or more computer programs.
[0034] As noted above, media content delivery has generally been focused on audio and video experiences. There has been limited delivery of multi-sensory (MS) content due to the customized nature of actuation.
[0035] This application describes methods for extending the creative palette for content creators, allowing spatial MS experiences to be created and delivered at scale. Some such methods involve the introduction of new layers of abstraction, in order to allow authored MS experiences to be delivered to different endpoints, with different types of fixtures or actuators. As used herein, the term “endpoint” is synonymous with “playback environment” or simply “environment,” meaning an environment that includes one or more actuators that may be used to provide an MS experience. Such endpoints may include a room, such as the living room of a home, a car, a cinema, a night club or other venue, etc. Some disclosed methods involve the creation, delivery and/or rendering of object-based sensory data, which may include sensory objects and corresponding sensory metadata. This abstraction allows creative intent to be implemented in an object-based format that does not require prior knowledge of the specific controller actuation, thereby enabling greater flexibility and scalability of fixtures and actuators across endpoints. An MS experience provided via object-based sensory data may be referred to herein as a “flexibly-scaled MS experience.”
[0036] Acronyms
MS - multisensory
MSIE - MS Immersive Experience
AR - Augmented Reality
VR - Virtual Reality
PC - personal computer
[0037] Figure l is a block diagram that shows examples of components of an apparatus capable of implementing various aspects of this disclosure. As with other figures provided herein, the types and numbers of elements shown in Figure 1 are merely provided by way of example. Other implementations may include more, fewer and/or different types and numbers of elements. According to some examples, the apparatus 101 may be, or may include, a device that is configured for performing at least some of the methods disclosed herein, such as a smart audio device, a laptop computer, a cellular telephone, a tablet device, a smart home hub, etc. In some such implementations the apparatus 101 may be, or may
include, a server that is configured for performing at least some of the methods disclosed herein.
[0038] In this example, the apparatus 101 includes at least an interface system 105 and a control system 110. In some implementations, the control system 110 may be configured for performing, at least in part, the methods disclosed herein. The control system 110 may, in some implementations, be configured for receiving, via the interface system 105, a content bitstream including encoded object-based sensory metadata. The encoded object-based sensory metadata may correspond to sensory effects such as lighting, haptics, airflow, one or more positional actuators, or combinations thereof, to be provided by a plurality of sensory actuators in an environment. In some implementations, the control system 110 may be configured for extracting object-based sensory metadata from the content bitstream and for providing the object-based sensory metadata to a sensory renderer.
[0039] According to some examples, the object-based sensory metadata may include sensory spatial metadata indicating at least a spatial position for rendering the object-based sensory metadata within the environment, an area for rendering the object-based sensory metadata within the environment, or combinations thereof. In some implementations, the object-based sensory metadata does not correspond to any particular sensory actuator in the environment. In some examples, the object-based sensory metadata may include abstracted sensory reproduction information allowing the sensory renderer to reproduce authored sensory effects, which also may be referred to herein as intended sensory effects, via various sensory actuator types, via various numbers of sensory actuators and from various sensory actuator positions in the environment.
[0040] In some examples, the content bitstream also may include encoded audio objects synchronized with the encoded object-based sensory metadata. The audio objects may include audio signals and corresponding audio object metadata. In some such implementations, the control system 110 may be configured for extracting audio objects from the content bitstream and for providing the audio objects to an audio renderer. The audio object metadata may include at least audio object spatial metadata indicating an audio object spatial position for rendering the audio signals within the environment.
[0041] The interface system 105 may include one or more network interfaces and/or one or more external device interfaces (such as one or more universal serial bus (USB) interfaces). According to some implementations, the interface system 105 may include one or more wireless interfaces. The interface system 105 may include one or more devices for implementing a user interface, such as one or more microphones, one or more speakers, a display system, a touch sensor system and/or a gesture sensor system. In some examples, the interface system 105 may include one or more interfaces between the control system 110 and a memory system, such as the optional memory system 115 shown in Figure 1. However, the control system 110 may include a memory system in some instances.
[0042] The control system 110 may, for example, include a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, and/or discrete hardware components.
[0043] In some implementations, the control system 110 may reside in more than one device. For example, a portion of the control system 110 may reside in a device within an environment (such as a laptop computer, a tablet computer, a smart audio device, etc.) and another portion of the control system 110 may reside in a device that is outside the environment, such as a server. In other examples, a portion of the control system 110 may reside in a device within an environment and another portion of the control system 110 may reside in one or more other devices of the environment.
[0044] Some or all of the methods described herein may be performed by one or more devices according to instructions (e.g., software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc. The one or more non-transitory media may, for example, reside in the optional memory system 115 shown in Figure 1 and/or in the control system 110.
Accordingly, various innovative aspects of the subject matter described in this disclosure can be implemented in one or more non-transitory media having software stored thereon. The software may, for example, include instructions for controlling at least one device to process audio data. The software may, for example, be executable by one or more components of a control system such as the control system 110 of Figure 1.
[0045] In some examples, the apparatus 101 may include the optional microphone system 120 shown in Figure 1. The optional microphone system 120 may include one or more microphones. In some implementations, one or more of the microphones may be part of, or associated with, another device, such as a speaker of the speaker system, a smart audio device, etc.
[0046] According to some implementations, the apparatus 101 may include the optional actuator system 125 shown in Figure 1. The optional actuator system 125 may include one or more loudspeakers, one or more haptic devices, one or more light fixtures, also referred to herein as luminaires, one or more fans or other air-moving devices, one or more display devices, including but not limited to one or more televisions, one or more positional actuators, one or more other types of devices for providing an MS experience, or combinations thereof. The term “light fixture” as used herein refers generally to various types of light sources, including individual light sources, groups of light sources, light strips, etc. A “light fixture” may be moveable, and therefore the word “fixture” in this context does not mean that a light fixture is necessarily in a fixed position in space. The term “positional actuators” as used herein refers generally to devices that are configured to change a position or orientation of a person or object, such as motion simulator seats. Loudspeakers may sometimes be referred to herein as “speakers.” In some implementations, the optional actuator system 125 may include a display system including one or more displays, such as one or more light-emitting diode (LED) displays, one or more organic light-emitting diode (OLED) displays, etc. In some examples wherein the apparatus 101 includes a display system, the optional sensor system 130 may include a touch sensor system and/or a gesture sensor system proximate one or more displays of the display system. According to some such implementations, the control system 110 may be configured for controlling the display system to present a graphical user interface (GUI), such as a GUI related to implementing one of the methods disclosed herein.
[0047] In some implementations, the apparatus 101 may include the optional sensor system 130 shown in Figure 1. The optional sensor system 130 may include a touch sensor system, a gesture sensor system, one or more cameras, etc.
[0048] This application describes methods for creating and delivering a flexibly scaled multi-sensory (MS) immersive experience (MSIE) to different playback environments, which also may be referred to herein as endpoints. Such endpoints may include a room, such as the living room of a home, a car, a cinema, a night club or other venue, an AR/VR headset, a PC, a mobile device, etc.
[0049] Figure 2 shows example elements of an endpoint. In this example, the endpoint is a living room 1001 containing multiple actuators 008, some furniture 1010 and a person 1000 — also referred to herein as a user — who will consume a flexibly-scaled MS experience. Actuators 008 are devices capable of altering the environment 1001 that the user 1000 is in. Actuators 008 may include one or more televisions or other display devices, one or more luminaires — also referred to herein as light fixtures — one or more loudspeakers, etc.
[0050] The number of actuators 008, the arrangement of actuators 008 and the capabilities of actuators 008 in the space 1001 may vary significantly between different endpoint types. For example, the number, arrangement and capabilities of actuators 008 in a car will generally be different from the number, arrangement and capabilities of actuators 008 in a living room, a night club, etc. In many implementations, the number, arrangement and/or capabilities of actuators 008 may vary significantly between different instances of the same type, e.g., between a small living room with 2 actuators 008 and a large living room with 16 actuators 008. The present disclosure describes various methods for creating and delivering flexibly-scaled MSIEs to these non-homogeneous endpoints.
[0051] Figure 3 shows examples of actuator elements. In this example, the actuator is a luminaire 1100, which includes a network module 1101, a control module 1102 and a light emitter 1103. According to this example, the light emitter 1103 includes one or more light-emitting devices, such as light-emitting diodes, which are configured to emit light into an environment in which the luminaire 1100 resides. In this example, the network module 1101 is configured to provide network connectivity to one or more other devices in the space, such as a device that sends commands to control the emission of light by the luminaire 1100. According to this example, the control module 1102 is configured to receive signals via the network module 1101 and to control the light emitter 1103 accordingly.
[0052] Other examples of actuators also may include a network module 1101 and a control module 1102, but may include other types of actuating elements. Some such actuators may include one or more loudspeakers, one or more haptic devices, one or more fans or other air-moving devices, one or more positional actuators, one or more display devices, etc.
[0053] Figure 4 shows example elements of a system for the creation and playback of multi-sensory (MS) experiences. As with other figures provided herein, the types and numbers of elements shown in Figure 4 are merely provided by way of example. Other implementations may include more, fewer and/or different types and numbers of elements. According to some examples, system 300 may be, or may include, one or more devices configured for performing at least some of the methods disclosed herein. In some examples, system 300 may include one or more instances of the control system 110 of Figure 1 that are configured for performing at least some of the methods disclosed herein.
[0054] According to the examples in the present disclosure, creating and providing an object-based MS immersive experience (MSIE) involves the application of a suite of technologies for creation, delivery and rendering of object-based sensory data, which may include sensory objects and corresponding sensory metadata, to the actuators 008. Some examples are described in the following paragraphs.
[0055] Object-Based Representation: In various disclosed implementations, multi-sensory (MS) effects are represented using what may be referred to herein as “sensory objects.” According to some such implementations, properties such as layer-type and priority may be assigned to and associated with each sensory object, enabling content creators’ intent to be represented in the rendered experiences. Detailed examples of sensory object properties are described below.
[0056] In this example, system 300 includes a content creation tool 000 that is configured for designing multi-sensory (MS) immersive content and for outputting object-based sensory data 005, either separately or in conjunction with corresponding audio data 011 and/or video data 012, depending on the particular implementation. The object-based sensory data 005 may include time stamp information, as well as information indicating the type of sensory object, the sensory object properties, etc. In this example, the object-based sensory data 005 is not “channel-based” data that corresponds to one or more particular sensory actuators in a playback environment, but instead is generalized for a wide range of playback environments with a wide range of actuator types, numbers of actuators, etc. In some examples, the object-based sensory data 005 may include object-based light data, object-based haptic data, object-based air flow data, object-based positional actuator data, object-based olfactory data, object-based smoke data, object-based data for one or more other types of sensory effects, or combinations thereof. According to some examples, the object-based sensory data 005 may include sensory objects and corresponding sensory metadata. For example, if the object-based sensory data 005 includes object-based light data, the object-based light data may include light object position metadata, light object color metadata, light object size metadata,
light object intensity metadata, light object shape metadata, light object diffusion metadata, light object gradient metadata, light object priority metadata, light object layer metadata, or combinations thereof. Although the content creation tool 000 is shown providing a stream of object-based sensory data 005 to the experience player 002 in this example, in alternative examples the content creation tool 000 may produce object-based sensory data 005 that is stored for subsequent use. Examples of graphical user interfaces for a light-object-based content creation tool are described below.
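The metadata fields listed in the preceding paragraph could be grouped into a simple data structure. The sketch below is an illustrative assumption about how such a light object might be represented in code; the field names, units and defaults are not defined by this disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class LightObject:
    """One object-based light effect; field names and units are assumptions."""
    timestamp_ms: int                     # time stamp for synchronization
    position: Tuple[float, float, float]  # room-relative coordinates, e.g., in [-1, 1]
    size: float                           # radius of influence, room-relative units
    color: Tuple[float, float, float]     # linear RGB, each component in [0, 1]
    intensity: float                      # normalized luminance in [0, 1]
    shape: str = "sphere"                 # light object shape metadata
    diffusion: float = 0.0                # 0 = hard edge, 1 = fully diffuse
    gradient: Optional[float] = None      # optional gradient metadata
    priority: int = 0                     # higher values win conflicts
    layer: str = "ambient"                # light object layer metadata
```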
[0057] MS Object Renderer: Various disclosed implementations provide a renderer that is configured to render MS effects to actuators in a playback environment. According to this example, system 300 includes an MS renderer 001 that is configured to render object-based sensory data 005 to actuator control signals 310, based at least in part on environment and actuator data 004. In this example, the MS renderer 001 is configured to output the actuator control signals 310 to MS controllers 003, which are configured to control the actuators 008. In some examples, the MS renderer 001 may be configured to receive light objects and object-based lighting metadata indicating an intended lighting environment, as well as lighting information regarding a local lighting environment. The lighting information is one general type of environment and actuator data 004, and may include one or more characteristics of one or more controllable light sources in the local lighting environment. In some examples, the MS renderer 001 may be configured to determine a drive level for each of the one or more controllable light sources that approximates the intended lighting environment. Some alternative examples may include a separate renderer for each type of actuator 008, such as one renderer for light fixtures, another renderer for haptic devices, another renderer for air flow devices, etc. According to some examples, the MS renderer 001 (or one of the MS controllers 003) may be configured to output the drive level to at least one of the controllable light sources. In some implementations, the MS renderer 001 may be configured to adapt to changing conditions. Some examples of MS renderer 001 implementations are described in more detail below.
[0058] The environment and actuator data 004 may include what are referred to herein as “room descriptors” that describe actuator locations (e.g., according to an x,y,z coordinate system or a spherical coordinate system). In some examples, the environment and actuator data 004 may indicate actuator orientation and/or placement properties (e.g., directional and north-facing, omnidirectional, occlusion information, etc.). According to some examples, the environment and actuator data 004 may indicate actuator orientation and/or placement
properties according to a 3x3 matrix, in which three elements (for example, the elements of the first row) represent spatial position (x,y,z), three other elements (for example, the elements of the second row) represent orientation (roll, pitch, yaw), and three other elements (for example, the elements of the third row) indicate a scale or size (sx, sy, sz). In some examples, the environment and actuator data 004 may include device descriptors that describe the actuator properties relevant to the MS renderer 001, such as intensity range and color gamut of a light fixture, the air flow speed range and direction(s) for an air-moving device, etc.
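For illustration only, a descriptor following the 3x3 layout described above might be populated as shown below; the specific values, units and the light_strip_descriptor name are assumptions.

```python
import numpy as np

# Hypothetical actuator descriptor: row 0 = position (x, y, z) in meters,
# row 1 = orientation (roll, pitch, yaw) in radians, row 2 = scale (sx, sy, sz).
light_strip_descriptor = np.array([
    [1.5, 0.0, 2.4],          # position: near the right wall, close to the ceiling
    [0.0, 0.0, np.pi / 2.0],  # orientation: yawed 90 degrees
    [1.0, 0.05, 0.05],        # scale: a one-meter strip
])

position, orientation, scale = light_strip_descriptor  # unpack the three rows
```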
[0059] In this example, system 300 includes an experience player 002 that is configured to receive object-based sensory data 005’, audio data 011’ and video data 012’, to provide the object-based sensory data 005 to the MS renderer 001, to provide the audio data 011 to the audio renderer 006 and to provide the video data 012 to the video renderer 007. In this example, the reference numbers for the object-based sensory data 005’, audio data 011’ and video data 012’ received by the experience player 002 include primes (’), in order to suggest that the data may in some instances be encoded. Likewise, the object-based sensory data 005, audio data 011 and video data 012 output by the experience player 002 do not include primes, in order to suggest that the data may in some instances have been decoded by the experience player 002. According to some examples, the experience player 002 may be a media player, a game engine or personal computer or mobile device, or a component integrated in a television, DVD player, sound bar, set top box, or a service provider media device such as a Chromecast, Apple TV device, or Amazon Fire TV. In some examples, the experience player 002 may be configured to receive encoded object-based sensory data 005’ along with encoded audio data 011’ and/or encoded video data 012’. In some such examples, the encoded object-based sensory data 005’ may be received as part of the same bitstream with the encoded audio data 011’ and/or the encoded video data 012’. Some examples are described in more detail below. According to some examples, the experience player 002 may be configured to extract the object-based sensory data 005’ from the content bitstream and to provide decoded object-based sensory data 005 to the MS renderer 001, to provide decoded audio data 011 to the audio renderer 006 and to provide decoded video data 012 to the video renderer 007. In some examples, time stamp information in the object-based sensory data 005’ may be used (for example, by the experience player 002, the MS renderer 001, the audio renderer 006, the video renderer 007, or all of them) to synchronize effects relating to the object-based sensory data 005’ with the audio data 011’ and/or the video data 012’, which may also include time stamp information.
[0060] According to this example, system 300 includes MS controllers 003 that are configured to communicate with a variety of actuator types using application program interfaces (APIs) or one or more similar interfaces. Generally speaking, each actuator will require a specific type of control signal to produce the desired output from the renderer. According to this example, the MS controllers 003 are configured to map outputs from the MS renderer 001 to control signals for each actuator. For example, a Philips Hue™ light bulb receives control information in a particular format to turn the light on with a particular saturation, brightness and hue, and a digital representation of the desired drive level.
[0061] In some examples, room descriptors also may describe the size and orientation of the playback environment itself, to establish a relative or absolute coordinate system to which all objects are positioned. For example, in a living room a display screen may be regarded as the front, in some instances the front and center, and the floor and ceiling may be regarded as the vertical bounds. In some such examples, the room descriptors may also indicate bounds corresponding with the left, right, front, and rear walls relative to the front position. According to some examples, the room descriptor also may be provided in terms of a matrix, such as a 3x3 matrix. This room descriptor information is useful in describing the physical dimensions of the playback environment, for example in physical units of distance such as meters. In some such examples, sensory object locations, sensory object sizes, and sensory object orientations may be described in units that are relative to the room size, for example in a range from -1 to 1. Room descriptors may also describe a preferred viewing position, in some instances according to a matrix.
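As a small worked example of the room-relative convention described above, the hypothetical helper below maps a position given in units of -1 to 1 to physical meters for a room of known dimensions, assuming the origin is at the room center; the function and argument names are illustrative.

```python
def to_room_coordinates(relative_xyz, room_dimensions_m):
    """Map room-relative units ([-1, 1] per axis) to meters from the room center."""
    return tuple(0.5 * r * d for r, d in zip(relative_xyz, room_dimensions_m))

# A sensory object toward the front-right of a 5 m x 4 m x 2.5 m living room:
print(to_room_coordinates((0.8, 1.0, 0.2), (5.0, 4.0, 2.5)))  # (2.0, 2.0, 0.25)
```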
[0062] The types, numbers and arrangements of the actuators 008 will generally vary according to the particular implementation. In some examples, actuators 008 may include lights and/or light strips (also referred to herein as “luminaires”), vibrational motors, air flow generators, positional actuators, or combinations thereof.
[0063] Similarly, the types, numbers and arrangements of the loudspeakers 009 and the display devices 010 will generally vary according to the particular implementation. In the examples shown in Figure 4, audio data 011 and video data 012 are rendered by the audio renderer 006 and the video renderer 007 to the loudspeakers 009 and display devices 010, respectively.
[0064] As noted above, according to some implementations the system 300 may include one or more instances of the control system 110 of Figure 1 that are configured for performing at
least some of the methods disclosed herein. In some such examples, one instance of the control system 110 may implement the content creation tool 000 and another instance of the control system 110 may implement the experience player 002. In some examples, one instance of the control system 110 may implement the audio renderer 006, the video renderer 007, the multi-sensory renderer 001, or combinations thereof. According to some examples, an instance of the control system 110 that is configured to implement the experience player 002 may also be configured to implement the audio renderer 006, the video renderer 007, the multi-sensory renderer 001, or combinations thereof.
Multi-Sensory Rendering Synchronization
[0065] Object-based MS rendering involves different modalities being rendered flexibly to the endpoint/playback environment. Endpoints have differing capabilities according to various factors, including but not limited to the following:
• The number of actuators;
• The modalities of those actuators (e.g., light fixture vs. air flow control device vs. haptic device);
• The types of those actuators (e.g., a white smart light vs. an RGB smart light, or a haptic vest vs. a haptic seat cushion); and
• The location/layout of those actuators.
[0066] In order to render object-based sensory content to any endpoint, some processing of the object signals, e.g. intensities, colors, patterns etc., will generally need to be done. The processing of each modality’s signal path should not alter the relative phase of certain features within the object signals. For example, suppose that a lightning strike is presented in both the haptics and lightscape modalities. The signal processing chain for the corresponding actuator control signals should not result in a time delay of either type of sensory object signal — haptic or light — sufficient to alter the perceived synchronization of the two modalities. The level of required synchronization may depend on various factors, such as whether the experience is interactive and what other modalities are involved in the experience. Maximum time difference values may, for example, range from approximately 10ms to 100ms, depending on the particular context.
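A minimal sketch of such a synchronization check is shown below, assuming the per-modality pipeline delays are known in milliseconds; the tolerance value and the function name are illustrative, and a practical renderer would compensate for skew rather than merely detect it.

```python
def modalities_in_sync(delays_ms, max_skew_ms=50.0):
    """Return True if the worst-case cross-modal skew is within tolerance."""
    return max(delays_ms.values()) - min(delays_ms.values()) <= max_skew_ms

print(modalities_in_sync({"light": 20.0, "haptic": 35.0}))   # True
print(modalities_in_sync({"light": 20.0, "haptic": 140.0}))  # False
```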
HAPTICS
Rendering of object-based haptics content
[0067] Object-based haptics content conveys sensory aspects of the scene through an abstract sensory representation rather than a channel-based scheme only. For example, instead of defining haptics content only as a single-channel time-dependent amplitude signal that is in turn played out of a particular haptics actuator, such as a vibro-tactile motor in a vest the user wears, object-based haptics content may be defined by the sensations that it is intended to convey. More specifically, in one example, we may have a haptic object representing a collision haptic sensory effect. Associated with this object are (see the sketch following this list):
• The haptic object’s spatial location;
• The spatial direction/vector of the haptic effect;
• The intensity of the haptic effect;
• Haptic spatial and temporal frequency data; and
• A time-dependent amplitude signal.
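The sketch below groups the attributes listed above into one illustrative data structure; the field names, units and the default sample rate are assumptions rather than a defined format.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass
class HapticObject:
    """Collision-type haptic object with the attributes listed above."""
    location: Tuple[float, float, float]    # spatial location, room-relative units
    direction: Tuple[float, float, float]   # unit vector of the haptic effect
    intensity: float                        # normalized intensity, 0..1
    temporal_freq_hz: Tuple[float, float]   # (min, max) temporal-frequency band
    spatial_freq: float                     # spatial frequency across the body/room
    amplitude: Sequence[float]              # time-dependent amplitude signal
    sample_rate_hz: float = 1000.0          # sampling rate of the amplitude signal
```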
[0068] According to some examples, a haptic object of this type may be created automatically in an interactive experience such as a video game, e.g. in a car racing game when another car hits a player’s car from behind. In this example, the MS renderer will determine how to render the spatial modality of this effect to the set of haptic actuators in the endpoint. In some examples, the renderer does this according to information about the following (a device descriptor sketch follows this list):
• The type(s) of haptic devices available, e.g., haptic vest vs. haptic glove vs. haptic seat cushion vs. haptic controller;
• The locale of each haptic device with respect to the user(s) (some haptic devices may not be coupled to the user(s), e.g., a floor- or seat-mounted shaker);
• The type of actuation each haptic device provides, e.g. kinesthetic vs. vibro-tactile;
• The on- and off-set delay of each haptic device (in other words, how fast each haptic device can turn on and off);
• The dynamic response of each haptic device (how much the amplitude can vary);
• The time-frequency response of each haptic device (what time-frequencies the haptic device can provide);
• The spatial distribution of addressable actuators within each haptic device: for example, a haptic vest may have dozens of addressable haptics actuators distributed over the user’s torso; and
• The time-response of any haptic sensors used to render closed-loop haptic effects (e.g., an active force-feedback kinesthetic haptic device).
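The endpoint information listed above might be captured per device in a descriptor such as the sketch below; the field names and example semantics are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class HapticDeviceDescriptor:
    """Endpoint-side description of one haptic device, mirroring the list above."""
    device_type: str                       # e.g., "vest", "glove", "seat_cushion"
    worn_by_user: bool                     # False for floor- or seat-mounted shakers
    actuation: str                         # "vibrotactile" or "kinesthetic"
    onset_delay_ms: float                  # how fast the device can turn on
    offset_delay_ms: float                 # how fast the device can turn off
    dynamic_range_db: float                # how much the amplitude can vary
    freq_response_hz: Tuple[float, float]  # usable temporal-frequency band
    actuator_grid: Tuple[int, int]         # addressable actuators (rows, cols)
    sensor_latency_ms: float = 0.0         # for closed-loop effects, if any
```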
[0069] These attributes of the haptics modality of the endpoint will inform the renderer how best to render a particular haptic effect. Consider the car crash effect example again. In this example, a player is wearing a haptic vest, a haptic arm band and haptic gloves. According to this example, a haptic shockwave effect is spatially located at the place where the car has collided into the player. The shockwave vector is dictated by the relative velocity of the player’s car and the car that has hit the player. The spatial and temporal frequency spectra of the shockwave effect are authored according to the type of material the virtual cars are intended to be made of, amongst other virtual world properties. The renderer then renders this shockwave through the set of haptics devices in the endpoint, according to the shockwave vector and the physical location of the haptics devices relative to the user.
[0070] The signals sent to each specific actuator are preferably provided so that the sensory effect is congruent across all of the (potentially heterogeneous) actuators available. For example, the renderer may not render very high frequencies to just one of the haptic actuators (e.g., the haptic arm band) due to capabilities lacking in other actuators. Otherwise, as the shockwave moves through the player’s body, there would be a degradation of the haptic effect perceived by the user as the wave moves through the vest, into the arm band and finally into the gloves, because the haptic vest and haptic gloves the user is wearing do not have the capability to render such high frequencies.
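One simple way to achieve this kind of congruence is to band-limit the shared effect to the least-capable device before rendering, as in the sketch below. This is only one illustrative strategy, assuming each device reports a maximum usable temporal frequency; the filter order and the example signal are arbitrary.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def congruent_band_limit(signal, fs_hz, device_max_freqs_hz):
    """Low-pass a shared haptic signal so no device is asked to exceed the
    weakest device's frequency capability."""
    cutoff = min(device_max_freqs_hz)                # weakest link sets the ceiling
    b, a = butter(4, cutoff, btype="low", fs=fs_hz)  # 4th-order Butterworth low-pass
    return filtfilt(b, a, signal)

fs = 1000.0
t = np.arange(0, 0.5, 1.0 / fs)
shockwave = np.exp(-8 * t) * np.sin(2 * np.pi * 180 * t)         # decaying burst
out = congruent_band_limit(shockwave, fs, [250.0, 120.0, 90.0])  # vest, band, gloves
```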
[0071] Some types of abstract haptic effects include:
• Shockwave effects, such as described above;
• Barrier effects, such as haptic effects which are used to represent spatial limitations of a virtual world, for example in a video game. If there are kinesthetic actuators on input devices (e.g., force feedback on a steering wheel or joystick), either active or resistive, then rendering of such an effect can be done through the resistive force applied to the user’s input. If no such actuators are available in the endpoint, then in some examples vibro-tactile feedback may be rendered that is congruent with the collision of the in-game avatar with a barrier;
• Presence, for example to indicate the presence of a large object approaching the scene, such as a train. This type of haptic effect may be rendered using a low time-frequency rumbling of some haptic devices’ actuators. This type of haptic effect may also be rendered through contact spatial feedback applied as pressure from air-cuffs;
• User interface feedback, such as clicks from a virtual button. For example, this type of haptic effect may be rendered to the closest actuator on the body of the user that
performed the click, for example haptic gloves that the user is wearing. Alternatively, or additionally, this type of haptic effect may also be rendered to a shaker coupled to the chair in which the user is sitting. This type of haptic effect may, for example, be defined using time-dependent amplitude signals. However, such signals may be altered (modulated, frequency-shifted, etc.) in order to best suit the haptic device(s) that will be providing the haptic effect;
• Movement. These haptic effects are designed so that the user perceives some form of motion. These haptic effects may be rendered by an actuator that actually moves the user, e.g. a moving platform/seat. In some examples, an actuator may provide a secondary modality (via video, for example) to enhance the motion being rendered; and
• Triggered sequences. These haptic effects are characterized mainly by their time-dependent amplitude signals. Such signals may be rendered to multiple actuators and may be augmented when doing so. Such augmentations may include splitting a signal in either time or frequency across multiple actuators. Some examples may involve augmenting the signal itself so that the sum of the haptic actuator outputs does not match the original signal.
Spatial and Non-Spatial Effects
[0072] Spatial effects are those which are constructed in a way that convey some spatial information of the multi-sensory scene being rendered. For example, if the playback environment is a room, a shockwave moving through the room would be rendered differently to each haptic device given its location within the room, according to the position and size of one or more haptic objects being rendered at a particular time.
[0073] Non-spatial effects may, in some examples, target particular locations on the user regardless of the user’s location or orientation. One example is a haptic device that provides a swelling vibration on the user’s back to indicate immediate danger. Another example is a haptic device that provides a sharp vibration to indicate an injury to a particular body area.
[0074] Some effects may be non-diegetic effects. Such effects are typically associated with user interface feedback, such as a haptic sensation to indicate that the user has completed a level or has clicked a button on a menu item. Non-diegetic effects may be either spatial or non-spatial.
Haptic Device Type
[0075] Receiving information regarding the different types of haptics devices available at the endpoint enables the renderer to determine what kinds of perceived effects and rendering strategies are available to it. For example, local haptics device data indicating that the user is wearing both haptic gloves and a vibro-tactile vest (or at least local haptics device data indicating that haptic gloves and a vibro-tactile vest are present in the playback environment) allows the renderer to render a congruent recoil effect across the two devices when a user shoots a gun in a virtual world. The actual actuator control signals sent to the haptic devices may be different than in the situation where only a single device is available. For example, if the user is only wearing a vest, the actuator control signals used to actuate the vest may differ with regard to the timing of the onset, the maximum amplitude, frequency and decay time of the actuator control signals, or combinations thereof.
Location of the Devices
[0076] Knowledge of the location of the haptics devices across the endpoint enables the renderer to render spatial effects congruently. For example, knowledge of the location of the shaker motors in a lounge enables the renderer to produce actuator control signals to each of the shaker motors in the lounge in a way that conveys spatial effects such as a shockwave propagating through the room. Additionally, knowledge of where wearable haptics devices are located, whilst implicit in their type (e.g., a glove is on the user’s hand), may also be used by the renderer to convey spatial effects in addition to non-spatial effects.
Types of Actuation Provided by Haptic Devices
[0077] Haptic devices can provide a range of different actuations and thus perceived sensations. These are typically classed in two basic categories:
1. vibro-tactile, e.g. vibrations; or
2. Kinesthetic, e.g., resistive or active force feedback.
[0078] Either category of actuations may be static or dynamic, where dynamic effects are altered in real time according to some sensor input. Examples include a touch screen rendering a texture using a vibro-tactile actuator and a position sensor measuring the user’s finger position(s).
[0079] Moreover, the physical construction of such actuators varies widely and affects many other attributes of the device. An example of this is the onset delay or time-frequency response that varies significantly across the following haptic device types:
• Eccentric rotating mass;
• Linear resonant actuator;
• Piezoelectric actuator; and
• Linear magnetic ram.
[0080] The renderer should be configured to account for the onset delay of a particular haptics device type when rendering signals to be actuated by the haptics devices in the endpoint.
The On- and Off-Set Delays of the Haptic Devices
[0081] The onset delay of the haptic device refers to the delay between the time that an actuator control signal is sent to the device and the device’s physical response. The off-set delay refers to the delay between the time that an actuator control signal is sent to zero the output of the device and the time the device stops actuating.
The Time-Frequency Response
[0082] The time-frequency response refers to the frequency range of the signal amplitude as a function of time that the haptic device can actuate at steady state.
The Spatial-Frequency Response
[0083] The spatial-frequency response refers to the frequency range of the signal amplitude as a function of the spacing of actuators of a haptic device. Devices with closely-spaced actuators have higher spatial-frequency responses.
Dynamic Range
[0084] Dynamic range refers to the differences between the minimum and maximum amplitude of the physical actuation.
Characteristics of Sensors in Closed-Loop Haptics Devices
[0085] Some dynamic effects use sensors to update the actuation signal as a function of some observed state. The sampling frequencies, both temporal and spatial, along with the noise characteristics, will limit the capability of the control loop updating the actuator that provides the dynamic effect.
AIRFLOW
[0086] Another modality that some multi-sensory immersive experiences (MSIE) may use is airflow. The airflow may, for example, be rendered congruently with one or more other modalities such as audio, video, light-effects and/or haptics. Rather than being limited to specialized (e.g., channel-based) setups for 4D experiences in cinemas, which may include “wind effects,” some airflow effects may be provided at other endpoints that typically include airflow devices, such as a car or a living room. Rather than using a channel-based system, the airflow sensory effects may be represented as an airflow object that may include properties such as the following (a data-structure sketch follows this list):
• Spatial location;
• Direction of the intended airflow effect;
• Intensity/airflow speed; and/or
• Air temperature.
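The listed properties might be grouped as in the following sketch; the field names and units are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class AirflowObject:
    """Object-based airflow effect with the properties listed above."""
    location: Tuple[float, float, float]   # room-relative position of the effect
    direction: Tuple[float, float, float]  # intended airflow direction (unit vector)
    speed: float                           # normalized intensity / airflow speed, 0..1
    temperature_c: Optional[float] = None  # target air temperature, if controllable
```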
[0087] An air flow object may, for example, be used to represent the movement of a bird flying past. To render to the airflow actuators at the endpoint, the MS renderer 001 may be provided with information regarding:
• The type of airflow devices, e.g., fan, air conditioning, heating;
• The position of each airflow device relative to the user’s location, or relative to an expected location of the user;
• The capabilities of the airflow device, e.g., the airflow device’s ability to control direction, airflow and temperature;
• The level of control of each actuator, e.g., airflow speed, temperature range; and
• The response time of each actuator, e.g., how long it takes to reach a chosen speed.
Some Examples of Airflow Use in Different Endpoints
[0088] In a vehicle such as a car, the object-based metadata can be used to create experiences such as:
• Mimicking “chills down your spine” during a horror movie or game content by directing airflow down the chair;
• Simulating the movement of a bird flying past; and/or
• Creating a gentle breeze in a seascape.
[0089] In the small enclosed space of a typical vehicle, temperature changes may be achievable over relatively short periods of time, as compared to temperature changes in a larger environment, such as a living room. In one example, the MS renderer 001 may cause an increasing air temperature as a player enters a “lava level” or other hot area during a game. Some examples may include other elements, such as confetti in the air vents to celebrate an event, such as a goal scored by the user’s favorite football team.
[0090] In a living space or other room, airflow may be synchronized to the breathing rhythm of a guided meditation in one example. In another example, airflow may be synchronized to the intensity of a workout, with increased airflow or decreased temperature as intensity increases. In some examples, there may be relatively less control over spatial aspects during rendering. For example, many existing airflow actuators are optimized for heating and/or air conditioning rather than for providing spatially diverse sensory actuation.
Combinations of Lights, Airflow and Haptics
Car Examples
[0091] The following examples are described with reference to a car, but are also applicable to other vehicles, such as trucks, vans, etc. In some examples, there may be a user interface on the steering wheel or on a touchscreen near or in the dashboard. According to some examples, the following actuators may be present in the car:
1. Individually addressable lights, spatially distributed around the car as follows:
o on the dashboard;
o under the footwells;
o on the doors; and
o in the center console.
2. Individually controllable air conditioning/heating outlets distributed around the car as follows:
o In the front dashboard;
o Under the footwells;
o In the center console facing the rear seats;
o On the side pillars;
o In the seats; and
o Directed to the windscreens (for defogging).
3. Individually controllable seats with vibro-tactile haptics; and
4. Individually controllable floor mats with vibro-tactile haptics.
[0092] In this example, the modalities supported by these actuators include the following:
• Lights across the individually addressable LEDs in the car, plus the indicator lights on the dash and steering wheel;
• Air flow via the controllable air conditioning vents;
• Haptics, including:
o Steering wheel: tactile vibration feedback;
o Dash touchscreen: tactile vibration feedback and texture rendering; and
o Seats: tactile vibrations and movement.
[0093] In one example, a live music stream is being rendered to four users sitting in the front seats. In this example, the MS renderer 001 attempts to optimize the experience for multiple viewing positions. During the build-up, after the previous acts have finished and before the artist has taken the stage, the content contains:
• Interlude music;
• Low intensity lighting; and
• Haptic content representing the moshing of the crowd.
[0094] In addition to the rendered audio and video stream, the light content contains ambient light objects that are moving slowly around the scene. These may be rendered using one of the ambient layer methods disclosed herein, for example such that there is no spatial priority given to any user’s perspective. In some examples, the haptic content may be spatially concentrated in the lower time-frequency spectrum and may be rendered only by the vibro-tactile motors in the floor mats.
[0095] According to this example, pyrotechnic events during the music stream correspond to multi-sensory content including:
• Light objects that spatially correspond to the location of the pyrotechnics at the event; and
• Haptic objects to reinforce the dynamism of the pyrotechnics via a shockwave effect.
[0096] In this example, the MS renderer 001 renders both the light objects and the haptic objects spatially. Light objects may, for example, be rendered in the car such that each person in the car perceives the light objects to come from the left if the pyrotechnics content
is located at the left of the scene. In this example, only lights on the left of the car are actuated. Haptics may be rendered across both the seats and floor mats in a way that conveys directionality to each user individually.
[0097] At the end of the concert the pyrotechnics are present in the audio content and both pyrotechnics and confetti are present in the video content. In addition to rendering light objects and haptic objects corresponding to the pyrotechnics as above, the effect of the confetti firing may be rendered using the airflow modality. For example, the individually controllable air flow vents of the HVAC system may be pulsed.
Living Room Examples
[0098] In this implementation, in addition to an audio/visual (AV) system that includes multiple loudspeakers and a television, the following actuators and related controls are available in the living room:
• A haptics vest that the user — also referred to as a player — is wearing;
• Haptics shakers mounted to the seat in which the player is sitting;
• A (haptics) controllable smart watch;
• Smart lights spatially distributed around the room;
• A wireless controller; and
• An addressable air-flow bar (AFB), which includes an array of individually controllable fans directed to the user (similar to HVAC vents in the front dashboard of a car).
[0099] In this example, the user is playing a first-person shooter game and the game contains a scene in which a destructive hurricane moves through the level. As it does so, in-game objects are thrown around and some hit the player. Haptics objects rendered by the MS renderer 001 cause a shockwave effect to be provided through all of the haptics devices that the user can perceive. The actuator control signals sent to each device may be optimized according to the intensity of the impact of the in-game objects, the direction(s) of the impact and the capabilities and location of each actuator (as described earlier).
[0100] At a time before the user is struck by an in-game object, the multi-sensory content contains a haptic object corresponding to a non-spatial rumble, one or more airflow objects corresponding to directional airflow, and one or more light objects corresponding to lightning. The MS renderer 001 renders the non-spatial rumble to the haptics devices. The actuator control signals sent to each haptics device may be rendered such that the ensemble of actuator control signals across the haptics array is congruent in perceived onset time, intensity and frequency. In some examples, the frequency content of the actuator control signals sent to the smart watch may be low-pass filtered, so that it is congruent with the frequency-limited capability of the vest, which is proximate to the watch. The MS renderer 001 may render the one or more airflow objects to actuator control signals for the AFB such that the air flow in the room is congruent with the location and look direction of the player in the game, as well as the hurricane direction itself. Lightning may be rendered across all modalities as (1) a white flash across lights that are located in suitable locations, e.g., in or on the ceiling; and (2) an impulsive rumble in the user’s wearable haptics and seat shaker.
[0101] When the user is struck by an in-game object, a directional shockwave may be rendered to the haptics devices. In some examples, a corresponding airflow impulse may be rendered. According to some examples, a damage-taken effect, indicating the amount of damage caused to the player by being struck by the in-game object, may be rendered by the lights.
[0102] In some such examples, signals may be rendered spatially to the haptics devices such that a perceived shockwave moves across the player’s body and the room. The MS renderer 001 may provide such effects according to actuator location information indicating the haptics devices’ locations relative to one another. The MS renderer 001 may provide the shockwave vector and position according to the actuator location information in addition to actuator capability information. According to some examples, a non-directional air flow impulse may be rendered, e.g., all the air vents of the AFB may be turned up briefly to reinforce the haptic modality. In some examples, at the same time, a red vignette may be rendered to the light strip surrounding the TV, indicating to the player that the player took damage in the game.
[0103] Figure 5 shows example elements of another system for the creation and playback of MS experiences. As with other figures provided herein, the types and numbers of elements shown in Figure 5 are merely provided by way of example. Other implementations may include more, fewer and/or different types and numbers of elements. According to some examples, system 500 may be, or may include, one or more devices configured for performing at least some of the methods disclosed herein. In some examples, system 500 may include one or more instances of the control system 110 of Figure 1 that are configured for performing at least some of the methods disclosed herein.
[0104] According to this example, the system shown in Figure 5 is an instance of the system shown in Figure 4. In this example, the system shown in Figure 5 is a “lightscape” embodiment in which vision (video), audio and light effects are combined to create the MS experience.
[0105] In this example, system 500 includes a lightscape creation tool 100, which is an instance of the content creation tool 000 that is described with reference to Figure 4. The lightscape creation tool 100 is configured for designing and outputting object-based light data 505’, either separately or in conjunction with corresponding audio data 111’ and/or video data 112’, depending on the particular implementation. The object-based light data 505’ may include time stamp information, as well as information indicating light object properties, etc. In some instances, the time stamp information may be used to synchronize effects relating to the object-based light data 505’ with the audio data 111’ and/or the video data 112’, which also may include time stamp information.
[0106] In this example, the object-based light data 505’ includes light objects and corresponding light metadata. For example, the object-based light data may include light object position metadata, light object color metadata, light object size metadata, light object intensity metadata, light object shape metadata, light object diffusion metadata, light object gradient metadata, light object priority metadata, light object layer metadata, or combinations thereof. Although the content creation tool 100 is shown providing a stream of object-based light data 505’ to the experience player 102 in this example, in alternative examples the content creation tool 100 may produce object-based light data 505’ that is stored for subsequent use. Examples of graphical user interfaces for a light-object-based content creation tool are described below.
[0107] In this example, system 500 includes an experience player 102 that is configured to receive object-based light data 505’, audio data 111’ and video data 112’, to provide the object-based light data 505 to the lightscape renderer 501, to provide the audio data 111 to the audio renderer 106 and to provide the video data 112 to the video renderer 107. According to some examples, the experience player 102 may be a media player, a game engine or personal computer or mobile device, or a component integrated in a television, DVD player, sound bar, set top box, or a service provider media device such as a Chromecast, Apple TV device, or Amazon Fire TV. In some examples, the experience player 102 may be configured to receive encoded object-based light data 505’ along with encoded audio data 111’ and/or encoded video data 112’. In some such examples, the encoded object-based light data 505’ may be received as part of the same bitstream with the encoded audio data 111’ and/or the encoded video data 112’. Some examples are described in more detail below. According to some examples, the experience player 102 may be configured to extract the object-based light data 505’ from the content bitstream and to provide decoded object-based light data 505 to the lightscape renderer 501, to provide decoded audio data 111 to the audio renderer 106 and to provide decoded video data 112 to the video renderer 107. In some examples, the experience player 102 may be configured to allow control of configurable parameters in the lightscape renderer 501, such as immersion intensity. Some examples are described below.
[0108] In some examples, room descriptors of the environment and light fixture data 104 may describe the size and orientation of the playback environment itself, to establish a relative or absolute coordinate system to which all objects are positioned. For example, in a living room a display screen may be regarded as the front, in some instances the front and center, and the floor and ceiling may be regarded as the vertical bounds. In some such examples, the room descriptors may also indicate bounds corresponding with the left, right, front, and rear walls relative to the front position. According to some examples, the room descriptor also may be provided in terms of a matrix, such as a 3x3 matrix. This room descriptor information is useful in describing the physical dimensions of the playback environment, for example in physical units of distance such as meters. In some such examples, sensory object locations, sensory object sizes, and sensory object orientations may be described in units that are relative to the room size, for example in a range from -1 to 1. Room descriptors may also describe a preferred viewing position, in some instances according to a matrix.
[0109] According to this example, system 500 includes a lightscape renderer 501 that is configured to render object-based light data 505 to light fixture control signals 515, based at least in part on environment and actuator data 104. In this example, the lightscape renderer 501 is configured to output the light fixture control signals 515 to light controllers 103, which are configured to control the light fixtures 108. The light fixtures 108 may include individual controllable light sources, groups of controllable light sources (such as controllable light strips), or combinations thereof. In some examples, the lightscape renderer 501 may be configured to manage various types of light object metadata layers, examples of which are provided herein. According to some examples, the lightscape renderer 501 may be configured to render actuator signals for light fixtures based, at least in part, on the perspective of a viewer. If the viewer is in a living room that includes a television (TV) screen, the lightscape renderer 501 may, in some examples, be configured to render the actuator signals relative to the TV screen. However, in virtual reality (VR) use cases, the lightscape renderer 501 may be configured to render the actuator signals relative to the position and orientation of the user’s head. In some examples, the lightscape renderer 501 may receive input from the playback environment (such as light sensor data corresponding to ambient light, camera data corresponding to a person’s location or orientation, etc.) to augment the render.
[0110] In some examples, the lightscape renderer 501 is configured to receive object-based light data 505 that includes light objects and object-based lighting metadata indicating an intended lighting environment, as well as environment and light fixture data 104 corresponding to light fixtures 108 and other features of a local playback environment, which may include, but are not limited to, reflective surfaces, windows, non-controllable light sources, light-occluding features, etc. In this example, the local playback environment includes one or more loudspeakers 109 and one or more display devices 510.
[0111] According to some examples, the lightscape renderer 501 is configured to calculate how to excite various controllable light fixtures 108 based at least in part on the object-based light data 505 and the environment and light fixture data 104. The environment and light fixture data 104 may, for example, indicate the geometric locations of the light fixtures 108 in the environment, light fixture type information, etc. In some examples, the lightscape renderer 501 may be configured to determine which light fixtures will be actuated based, at least in part, on the position metadata and size metadata associated with each light object, e.g., by determining which light fixtures are within a volume of a playback environment corresponding to the light object’s position and size at a particular time indicated by light object time stamp information. In this example, the lightscape renderer 501 is configured to send light fixture control signals 515 to the light controller 103 based on the environment and light fixture data 104 and the object-based light data 505. The light fixture control signals 515 may be sent via one or more of various transmission mechanisms, application program interfaces (APIs) and protocols. The protocols may, for example, include Hue API, LIFX API, DMX, Wi-Fi, Zigbee, Matter, Thread, Bluetooth Mesh, or other protocols.
[0112] In some examples, the lightscape renderer 501 may be configured to determine a drive level for each of the one or more controllable light sources that approximates a lighting environment intended by the author(s) of the object-based light data 505. According to some examples, the lightscape renderer 501 may be configured to output the drive level to at least one of the controllable light sources.
[0113] According to some examples, the lightscape renderer 501 may be configured to collapse one or more parts of the lighting fixture map according to the content metadata, user input (choosing a mode), limitations and/or configuration of the light fixtures, other factors, or combinations thereof. For example, the lightscape renderer 501 may be configured to render the same control signals to two or more different lights of a playback environment. In some such examples, two or more lights may be located close to one another. For example, two or more lights may be different lights of the same actuator, e.g., may be different bulbs within the same lamp. Rather than compute a very slightly different control signal for each light bulb, the lightscape renderer 501 may be configured to reduce the computational overhead, increase rendering speed, etc., by rendering the same control signals to two or more different, but closely-spaced, lights.
[0114] In some examples, the lightscape renderer 501 may be configured to spatially upmix the object-based light data 505. For example, if the object-based light data 505 was produced for a single plane, such as a horizontal plane, in some instances the lightscape renderer 501 may be configured to project light objects of the object-based light data 505 onto an upper hemispherical surface (e.g., above an actual or expected position of the user’s head) in order to enhance the experience.
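One plausible projection rule for such an up-mix is sketched below, assuming light object positions were authored in the range -1 to 1 on a horizontal plane centered on the listener; this is an illustrative choice, not the only mapping the renderer might use.

```python
import math

def project_to_upper_hemisphere(x, y):
    """Lift a plane position (x, y in [-1, 1]) onto a unit upper hemisphere."""
    r2 = min(x * x + y * y, 1.0)
    z = math.sqrt(1.0 - r2)   # points near the center rise toward the zenith
    return (x, y, z)

print(project_to_upper_hemisphere(0.0, 0.0))  # (0.0, 0.0, 1.0) -> directly overhead
print(project_to_upper_hemisphere(1.0, 0.0))  # (1.0, 0.0, 0.0) -> at the horizon
```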
[0115] According to some examples, the lightscape renderer 501 may be configured to apply one or more thresholds, such as one or more spatial thresholds, one or more luminosity thresholds, etc., when rendering actuator control signals to light actuators of a playback environment. Such thresholds may, in some instances, prevent some light objects from causing the activation of some light fixtures.
[0116] Light objects may be used for various purposes, such as to set the ambience of the room, to give spatial information about characters or objects, to enhance special effects, to create a greater sense of interaction and immersion, to shift viewer attention, to punctuate the content, etc. Some such purposes may be expressed, at least in part, by a content creator according to sensory object metadata types and/or properties that are generally applicable to various types of sensory objects — such as object metadata indicating a sensory object’s location and size.
[0117] For example, the priority of sensory objects, including but not limited to light objects, may be indicated by sensory object priority metadata. In some such examples, sensory object priority metadata is taken into account when multiple sensory objects map to the same fixture(s) in a playback environment at the same time. Such priority may be indicated by light priority metadata. In some examples, priority may not need to be indicated via metadata. For example, the MS renderer 001 may give priority to sensory objects (including but not limited to light objects) that are moving over sensory objects that are stationary.
[0118] A light object may, depending on its location and size and the locations of light fixtures within a playback environment, potentially cause the excitation of multiple lights. In some examples, when the size of a light object encompasses multiple lights, the renderer may apply one or more thresholds (such as one or more spatial thresholds or one or more luminosity thresholds) to gate objects from activating some encompassed lights.
Examples of Using a Lighting Map
[0119] In some implementations a lighting map, which is an instance of the actuator map (AM) that includes a description of lighting in a playback environment, may be provided to the lightscape renderer 501. In some such examples, the environment and light fixture data shown in Figure 5 may include the lighting map. According to some examples, the lighting map may be allocentric, e.g., indicating absolute spatial coordinate-based light fall-off, whereas in other examples the lighting map may be egocentric, e.g., a light projection mapped onto a sphere at an intended viewing position and orientation. In the case of a sphere, the lighting map may, in some examples, be projected onto a two-dimensional (2D) surface, e.g., in order to utilize 2D image textures in processing. In any case, the lighting map should indicate the capabilities and the lighting setup of the playback environment, such as a room. In some embodiments the lighting map may not directly relate to physical room characteristics, for example if certain user preference-based adjustments have been made.
[0120] In some examples, there may be one lighting map per light fixture, or per light, in a playback environment. According to some examples, the intensity of light indicated by the light map may be inversely correlated to the distance to the center of the light, or may be approximately (e.g., within plus or minus 5%, within plus or minus 10%, within plus or minus 15%, within plus or minus 20%, etc.) inversely correlated to the distance to the center of the light. The intensity values of the light map may indicate the strength or impact of the light object onto the light fixture. For example, as a light object approaches a lightbulb, the
lightscape renderer 501 may be configured to determine that the lightbulb intensity will increase as the distance between the light object and the lightbulb decreases. The lightscape renderer 501 may be configured to determine the rate of this transition based, at least in part, on the intensity of light indicated by the light map.
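For illustration, a minimal sketch of one possible inverse-distance falloff for a per-light lighting map value is shown below, in Python; the function name, the clamping to [0, 1] and the epsilon term are assumptions for illustration rather than requirements of this disclosure.

```python
# Illustrative inverse-distance falloff for a lighting-map value.
# The epsilon and the clamp to [0, 1] are assumptions, not requirements.
def light_map_value(distance_to_light_center: float, falloff_scale: float = 1.0) -> float:
    """Return a value that increases as a light object approaches the light's center."""
    eps = 1e-6  # avoids division by zero at the exact center of the light
    return min(falloff_scale / (distance_to_light_center + eps), 1.0)
```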
[0121] Inside this common rendering space, in some examples the lightscape renderer 501 may be configured to use a dot product multiplication between a light object and the light map for each light to compute a light activation metric, e.g., as follows:

Y = LM · Obj
[0122] In the foregoing equation, Y represents the light activation metric, LM represents the lighting map and Obj represents the map of a light object. The light activation metric indicates the relative light intensity for the actuator control signal output by the lightscape renderer 501, based on the overlap between the light object and the spread of light from the light fixture. In some examples, the lightscape renderer 501 may use the maximum or closest distance, or other geometric metrics, from the light object to the light fixture as part of the determination of light intensity. In some implementations, instead of computing the light activation metric, the lightscape renderer 501 may refer to a look-up table to determine the light activation metric.
[0123] The lightscape renderer 501 may repeat one of the foregoing procedures for determining the light activation metric for all light objects and all controllable lights of the playback environment. Thresholding for light objects that produce a very low impact on light fixtures may be helpful to reduce complexity. For example, if the effect of a light object would cause less than a threshold percentage of light fixture activation — such as less than 10%, less than 5%, etc. — the lightscape renderer 501 may disregard the effect of that light object.
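As one possible illustration of the dot-product computation and thresholding described above, the following Python sketch treats each lighting map and each light object map as a flattened array over a common rendering space; the array shapes, the function name and the example threshold value are assumptions, not part of this disclosure.

```python
import numpy as np

def light_activation_matrix(light_maps: np.ndarray,
                            object_maps: np.ndarray,
                            activation_threshold: float = 0.05) -> np.ndarray:
    """Compute Y[i, j] as the dot product of object map i and lighting map j.

    light_maps:  (num_lights, num_cells) lighting map per controllable light.
    object_maps: (num_objects, num_cells) map of each light object in the same space.
    Activations below the threshold are zeroed, so low-impact light objects are disregarded.
    """
    y = object_maps @ light_maps.T       # dot product for every object/light pair
    y[y < activation_threshold] = 0.0    # thresholding to reduce complexity
    return y
```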
[0124] The lightscape renderer 501 may then use the resultant light activation matrix Y, along with various other properties such as the chosen panning law (either indicated by light object metadata or renderer configuration) or the priority of the light object, to determine which objects get rendered by which lights and how. Rendering light objects into light fixture control signals may involve:
• Altering the luminance of a light object as a function of its distance from the light fixture;
• Mixing the colors of multiple light objects that are simultaneously (multiplexed) rendered by a single light fixture; or
• Altering either of the above based on the light object priority.
RENDERING PARAMETERS
[0125] In addition to the information carried by the light object metadata, the rendering of light objects can be a function of the settings or parameters of the lightscape renderer 501 itself. These may include:
• Velocity priority - when this parameter is set, light objects that are moving are given a higher priority than those which are not. Having the velocity priority parameter set enhances the dynamism of the rendered scene;
• Color priority - light-objects with higher saturation values will take priority;
• Activation threshold - the minimum light activation, Y, that must be achieved in order to activate a light-fixture;
• Accessibility - certain colors may be chosen over others to best represent the experience for colorblind users. Certain flash rates may be avoided for those with photosensitivities.
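For illustration, the rendering parameters listed above could be gathered into a simple configuration object passed to the renderer; the field names and default values in the following Python sketch are assumptions, not part of this disclosure.

```python
from dataclasses import dataclass

@dataclass
class LightscapeRendererParams:
    velocity_priority: bool = False        # moving light objects outrank stationary ones
    color_priority: bool = False           # higher-saturation light objects take priority
    activation_threshold: float = 0.05     # minimum activation Y needed to drive a fixture
    colorblind_safe_palette: bool = False  # accessibility: prefer distinguishable colors
    max_flash_rate_hz: float = 3.0         # accessibility: cap flash rates for photosensitivity
```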
RENDERING CONFIGURATION (MODES)
[0126] In addition to the information carried by the light object metadata, the lightscape renderer 501 may, in some implementations, be configured according to different modes. As used herein, the term “mode” is different from “parameter” in the sense that modes may, for example, involve completely different signal paths, whereas parameters may simply parameterize these signal paths. For example, one mode may involve the projection of all light objects onto a lighting map before determining how/what to render to the light fixtures, while another mode may only snap the highest-priority lights to the nearest light fixtures. Modes may include:
• Modes to support a low light-fixture count. In these modes, the rendering parameters and the light object metadata are utilized in order to determine which subset of light objects are to be rendered and in what manner. Here, the “manner” refers to the trade-off between the spatial, color and temporal fidelity of the most prominent light objects in the scene;
• Modes to support different content types, such as music vs. gaming;
• Modes in which multiple light objects may be rendered by a single light fixture (or a single light) with color mixing;
• Modes in which only a single light object can be rendered by a single light fixture (or a single light);
• Modes in which the luminance of the light object is altered as a function of the geometric (or other) distance between the light object and the light fixture.
[0127] As noted above, according to some implementations the system 500 may include one or more instances of the control system 110 of Figure 1 that are configured for performing at least some of the methods disclosed herein. In some such examples, one instance of the control system 110 may implement the lightscape creation tool 100 and another instance of the control system 110 may implement the experience player 002. In some examples, one instance of the control system 110 may implement the audio renderer 006, the video renderer 007, the lightscape renderer 501, or combinations thereof. According to some examples, an instance of the control system 110 that is configured to implement the experience player 002 may also be configured to implement the audio renderer 006, the video renderer 007, the lightscape renderer 501, or combinations thereof.
[0128] Figure 6 shows an example of a graphical user interface (GUI) that may be presented by a display device of the lightscape creation tool of Figure 5. As with other figures provided herein, the types and numbers of elements shown in Figure 6 are merely provided by way of example. Other GUIs presented by a lightscape creation tool may include more, fewer and/or different types and numbers of elements. According to some examples, the GUI 600 may be presented on a display device according to commands from an instance of the control system 110 of Figure 1 that is configured for implementing the lightscape creation tool 100 of Figure 5.
[0129] In this example, a user may interact with the GUI 600 in order to create light objects and to assign light object properties, which may be associated with the light object as metadata. According to this example, a user is selecting properties of the light object 630. In this example, the GUI 600 shows the light object 630 in a three-dimensional space 631, the latter of which represents a playback environment. Element 634 shows a coordinate system of the three-dimensional space 631. Accordingly, in this example the light object 630 and the three-dimensional space 631 are being viewed from the upper left.
[0130] A user may interact with the GUI 600 in order to select a position and a size of the light object 630. In some examples, a user may select a position of the light object 630 by dragging the light object 630 to a desired position within the three-dimensional space 631, for example by touching a touch screen, using a cursor, etc. According to some examples, a user may select a size of the light object 630 by selecting the size of the circle (or other shape) that is shown on the GUI 600 to indicate the outline of the light object 630. In some such examples, a user may decrease the size of the light object 630 via a two-fingered pinch of the outline of the light object 630, may increase the size of the light object 630 via a two-fingered expansion, etc.
[0131] Specifying the position and size of an MS object within an abstracted three-dimensional space, such as the three-dimensional space 631 of GUI 600, allows a content creator to generalize the position and extent of the corresponding MS effects without prior knowledge of the particular playback environment in which the MS effects will be provided. This is an advantage of the MS object-oriented approach of various disclosed implementations. For example, the GUI 600 allows a content creator to specify the position and size of the light object 630 within the three-dimensional space 631, and thereby to generalize the position and extent of the corresponding light effects, without prior knowledge of the size of any particular playback environment or of the number, type and positions of the light fixtures within it. The light fixtures that will potentially be actuated responsive to the presence of the light object 630 at a particular time will be those within a volume of the playback environment corresponding to the position and size/extent of the light object 630.
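For illustration, the following Python sketch shows one way a renderer could determine which fixtures fall within the volume corresponding to a light object's position and size; the spherical extent model, the function name and the coordinate conventions are assumptions rather than part of this disclosure.

```python
import numpy as np

def fixtures_within_object(object_position: np.ndarray,
                           object_radius: float,
                           fixture_positions: np.ndarray) -> np.ndarray:
    """Return indices of fixtures inside a spherical light object's extent.

    object_position:   (3,) position of the light object in the abstracted space.
    fixture_positions: (num_fixtures, 3) fixture positions mapped into the same space.
    """
    distances = np.linalg.norm(fixture_positions - object_position, axis=1)
    return np.nonzero(distances <= object_radius)[0]
```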
[0132] According to this example, a user may interact with the color circle 635 of the GUI 600 in order to select the hue and color saturation of the current light object and may interact with the slider 636 in order to select the brightness of the current light object. These and other selectable properties of the light object 630 are displayed in area 632 of the GUI 600. According to this example, the properties of the light object 630 that may be selected via the GUI 600 also include intensity, diffusivity, “feathering,” whether or not the light object is hidden, saturation, priority and layer. Light object layers and priority will be described in more detail below. Generally speaking, light object layers may be used to group light objects into categories such as “ambient,” “dynamic,” etc. Light object priority may be assigned by a content creator and used by a renderer to determine, for example, which light object(s) will
be presented when two or more light objects are simultaneously active and are simultaneously encompassing an area that includes the same light fixture.
[0133] Area 640 of the GUI 600 indicates time information corresponding to each of a plurality of light objects that are being created via the lightscape creation tool. In this example, light objects are listed on the left side of the area 640, along a vertical axis, and time is shown along a horizontal axis. In this example, four-second time intervals are delineated by vertical lines. Here, time information for each light object is shown as isolated or connected diamond symbols or lines along a series of horizontal rows, each of which corresponds to one of the light objects indicated on the left side of the area 640. The line 633, for example, indicates that light object 3 will be displayed starting between 39 and 40 seconds and will be continuously displayed until almost 1 minute and 6 seconds. The diamond symbols to the right of the line 633 indicate that light object 3 will be displayed discontinuously for the next few seconds.
[0134] Figure 7A shows another example of a graphical user interface (GUI) that may be presented by a display device of the lightscape creation tool of Figure 5. As with other figures provided herein, the types and numbers of elements shown in Figure 7A are merely provided by way of example. Other GUIs presented by a lightscape creation tool may include more, fewer and/or different types and numbers of elements. According to some examples, the GUI 700 may be presented on a display device according to commands from an instance of the control system 110 of Figure 1 that is configured for implementing the lightscape creation tool 100 of Figure 5.
[0135] In this example, the GUI 700 represents an instant in time during which light fixtures in an actual playback environment are being controlled according to light objects that have been created by an implementation of the lightscape creation tool 100 of Figure 5. An image of the playback environment is shown in area 705 of the GUI 700. Various light fixtures 708 and a television 715 are shown in the playback environment of area 705. The particular instant in time is shown by vertical line 742 of area 740. At this time, the vertical line 742 intersects with horizontal lines 744a, 744b, 744c, and 744d, indicating that the corresponding light objects 1, 4, 5 and 7 are being played back. Area 732 indicates light object properties.
[0136] One may observe that at the instant in time that is depicted in Figure 7A, the left side of the playback environment shown in area 705 is being illuminated by blue light. This
corresponds, at least in part, to the effect of the light object 730 shown within the three-dimensional space 731.
[0137] According to this example, video data and audio data are also being played back in the audio environment, and the playback of rendered light objects is being synchronized with playback of the video data and audio data. In this example, an image of the played-back video is shown in area 710 of the GUI 700. The video may, for example, be played back by the television 715.
[0138] In some examples, a user may be able to interact with the GUI 700 in order to adjust light object properties, add or delete light objects, etc. For example, a user may cause the playback to be paused in order to adjust light object properties. In some alternative examples, a user may need to revert to a GUI such as the GUI 600 of Figure 6 in order to adjust light object properties, add or delete light objects, etc.
[0139] In the example described with reference to Figure 7A, although the GUI 700 was being presented on a display device corresponding to the lightscape creation tool 100 of Figure 5, light objects, audio and video were being rendered in an actual, real-world environment. Accordingly, in some implementations the example described with reference to Figure 7A may also involve at least some of the “downstream” rendering and playback functionality that can be provided by other blocks of Figure 5, including but not limited to that of the lightscape renderer 501, the light controller APIs 103 — which may in some instances be implemented by the same device that implements the lightscape renderer 501 — the light fixtures 108, the audio renderer 106, the loudspeakers 109, the video renderer 107 and the display device(s) 510. In some such examples, the processes described with reference to Figure 7A also may involve functionality of the experience player 102 of Figure 5.
[0140] In some alternative implementations, the example described with reference to Figure 7A may also involve at least some of the “downstream” rendering and playback functionality that can be provided by other blocks of Figure 4, including but not limited to that of the MS renderer 001, the MS controller APIs 003 — which may in some instances be implemented by the same device that implements the MS renderer 001 — the light fixtures 008, the audio renderer 006, the loudspeakers 009, the video renderer 007 and the display device(s) 010. In some such examples, the processes described with reference to Figure 7A also may involve functionality of the experience player 002 of Figure 4.
[0141] Figure 7B is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein. The blocks of method 750, like other methods described herein, are not necessarily performed in the order indicated. In some implementations, one or more of the blocks of method 750 may be performed concurrently. Moreover, some implementations of method 750 may include more or fewer blocks than shown and/or described. The blocks of method 750 may be performed by one or more devices, which may be (or may include) one or more instances of a control system such as the control system 110 that is shown in Figure 1 and described above. For example, at least some aspects of method 750 may be performed by an instance of the control system 110 that is configured to implement the experience player 002 of Figure 4. Some other aspects of method 750 may be performed by an instance of the control system 110 that is configured to implement the MS renderer 001 of Figure 4.
[0142] In this example, block 755 involves receiving, by a control system, a content bitstream including encoded object-based sensory data. In this instance, the object-based sensory data includes sensory objects and corresponding sensory metadata and corresponds to sensory effects to be provided by a plurality of sensory actuators in an environment. In some examples, the environment may be an actual, real-world environment, such as a room environment or an automobile environment. The sensory effects may include lighting, haptics, airflow, one or more positional actuators, or combinations thereof, to be provided by the plurality of sensory actuators in the environment. According to some examples, the environment may be, or may include, a virtual environment. In some such examples, method 750 may involve providing a virtual environment, such as a gaming environment, while also providing corresponding sensory effects in a real-world environment.
[0143] In some examples, the object-based sensory metadata may include sensory spatial metadata indicating at least a spatial position for rendering the object-based sensory data within the environment, indicating an area for rendering the object-based sensory data within the environment, or both. According to some examples, the object-based sensory data does not correspond to particular sensory actuators in the environment. For example, as described with reference to Figures 6 and 7A, the sensory objects of the object-based sensory data may correspond with a portion of a three-dimensional area that represents a playback environment. The actual playback environment in which the sensory objects will be rendered does not need to be known, and generally will not be known, at the time that the sensory objects are authored. Accordingly, the object-based sensory data includes abstracted sensory
reproduction information — in this example, the sensory objects and corresponding sensory metadata — allowing the sensory renderer to reproduce authored sensory effects via various sensory actuator types, via various numbers of sensory actuators and from various sensory actuator positions in the environment.
[0144] According to this example, block 760 involves extracting, by the control system, the object-based sensory data from the content bitstream. In some such examples, the content bitstream also may include encoded audio objects that are synchronized with the encoded object-based sensory data. According to some such examples, the audio objects may include audio signals and corresponding audio object metadata. In some such examples, method 750 also may involve extracting, by the control system, audio objects from the content bitstream. According to some such examples, method 750 also may involve providing, by the control system, the audio objects to an audio renderer. In some examples, the audio object metadata may include at least audio object spatial metadata indicating an audio object spatial position for rendering the audio signals within the environment.
[0145] In this example, block 765 involves providing, by the control system, the object-based sensory data to a sensory renderer. According to some such examples, method 750 also may involve receiving, by the sensory renderer, the object-based sensory data and receiving, by the sensory renderer, environment descriptor data corresponding to locations of sensory actuators in the environment. In some such examples, method 750 also may involve receiving, by the sensory renderer, actuator descriptor data corresponding to properties of the sensory actuators in the environment. In some examples, the environment descriptor data and the actuator descriptor data may be, or may be included in, the environment and actuator data 004 that is described with reference to Figure 4. According to some examples, method 750 also may involve providing, by the sensory renderer, actuator control signals for controlling the sensory actuators in the environment to produce sensory effects indicated by the object-based sensory data. In some such examples, the MS renderer 001 may provide actuator control signals 310 to the MS controller APIs 003 and the MS controller APIs 003 may provide actuator-specific control signals to the actuators 008. In some alternative examples, the MS controller APIs 003 that are shown in Figure 4 may be implemented via the MS renderer 001 and actuator-specific signals may be provided to the actuators 008 by the MS renderer 001. In some examples, method 750 also may involve providing, by the sensory actuators in the environment, the sensory effects.
[0146] In some examples, method 750 also may involve receiving, by the audio renderer, the audio objects and receiving, by the audio renderer, loudspeaker data corresponding to loudspeakers in the environment. According to some such examples, method 750 also may involve providing, by the audio renderer, loudspeaker control signals for controlling the loudspeakers in the environment to play back audio corresponding to the audio objects and synchronized with the sensory effects. The synchronization may, for example, be based on time information that is included in or with the sensory objects and the audio objects, such as time stamps. In some examples, method 750 also may involve playing back, by the loudspeakers in the environment, the audio corresponding to the audio objects.
[0147] According to some examples, the content bitstream includes encoded video data synchronized with the encoded audio objects and the encoded object-based sensory data. In some such examples, method 750 also may involve extracting, by the control system, video data from the content bitstream and providing, by the control system, the video data to a video renderer. According to some such examples, method 750 also may involve receiving, by the video renderer, the video data and providing, by the video renderer, video control signals for controlling one or more display devices in the environment to present images corresponding to the video control signals and synchronized with the audio objects and the sensory effects. In some examples, method 750 also may involve presenting, by the one or more display devices in the environment, the images corresponding to the video control signals.
Intended Lighting Environment Metadata
[0148] Some disclosed examples involve the inclusion of lighting metadata with a video and/or an audio track, or for use as a standalone lighting-based sensory experience. This lighting metadata describes the intended lighting environment to be reproduced at playback. There are various ways of representing the lighting metadata, several of which are described in detail in this disclosure.
[0149] In some examples, the intended lighting environment may be transmitted as one or more Image Based Lighting (IBL) objects. IBL is a technique that has previously been used to capture an environment and lighting. IBL may be described as the process of illuminating scenes and objects (real or synthetic) with images of light from the real world. IBL evolved from previously-disclosed reflection-mapping techniques, in which panoramic images are used as texture maps on computer graphics models to show shiny objects reflecting real
or synthetic environments. Some aspects of IBL are analogous to image-based modeling, in which a three-dimensional scene’s geometric structure may be derived from images. Other aspects of IBL are analogous to image-based rendering, in which the rendered appearance of a scene may be produced from the scene’s appearance in images.
[0150] Previously, IBL objects have been used when rendering computer graphics to create realistic reflections and lighting effects, for example effects in which a rendered object itself seems to reflect some part of a real-world environment. In some previously-disclosed virtual world/computer-generated examples, IBL involves the following processes:
• capturing real-world illumination as an omnidirectional image;
• mapping the illumination onto a representation of an environment;
• placing a computer-generated 3D object inside the environment; and
• simulating the light from the environment illuminating the computer graphics object.
[0151] There are various methods for capturing omnidirectional images. One way is to use a camera to photograph a reflective sphere placed in an environment. Another method of obtaining omnidirectional images is to combine many camera images, obtained from different directions/viewing perspectives, into a mosaic using an image stitching program. In some such examples, images may be obtained using a fisheye lens, which can cover the full field of view in as few as two images. Another method of obtaining omnidirectional images is to use a scanning panoramic camera, such as a “rotating line camera,” which is configured to assemble a digital image as the camera rotates, to scan across a 360-degree field of view. Further details of previously-disclosed IBL methods are disclosed in Debevec, Paul, “Image-Based Lighting” (IEEE, March/April 2002, pp. 26-34), which is hereby incorporated by reference.
[0152] Some implementations of the present disclosure build upon previously-disclosed methods involving IBL objects for a new purpose, which is to reproduce the intended environment using dynamically controllable surround lighting. Both the intended lighting environment and the end-point environment may change over time. For example, the walls of the end-point environment may be painted, actuators may be moved, furniture may be moved or replaced, new furniture, shelving and/or cabinetry may be added, etc.
[0153] Several types of mappings, or projections, may be used to map environment lighting from a sphere surrounding the intended viewing position onto a two-dimensional (2D) plane. Projection onto a 2D plane facilitates — and reduces the computational overhead required for — compressing lighting objects using 2D image or video codecs for more efficient transmission.
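For illustration, one common spherical-to-planar mapping (an equirectangular projection) is sketched below in Python; other projections may be used, and the axis conventions, function name and normalization to [0, 1] are assumptions rather than part of this disclosure.

```python
import math

def direction_to_equirectangular(x: float, y: float, z: float) -> tuple:
    """Map a unit direction from the viewing position to (u, v) coordinates in [0, 1].

    u spans 360 degrees of horizontal angle and wraps left/right;
    v spans 180 degrees of vertical angle from straight down to straight up.
    """
    azimuth = math.atan2(x, z)                     # angle left/right of "straight ahead"
    elevation = math.asin(max(-1.0, min(1.0, y)))  # angle up/down
    u = 0.5 + azimuth / (2.0 * math.pi)
    v = 0.5 + elevation / math.pi
    return (u, v)
```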
[0154] Figures 8A, 8B and 8C show three examples of projecting lighting of viewing environments onto a two-dimensional (2D) plane. In these examples, a spherical projection was used. Other representations are also possible. In these examples, the lighting of the viewing environments is shown from an intended viewing position, but over 360 degrees of viewing angles. In these examples, the x axis represents the horizontal angles from the viewer’s perspective (left to right) and the vertical axis represents the vertical angles from the viewer’s perspective (up and down). In these examples, the middles of the projections 805, 810 and 815 correspond with the directions in front of a viewer. The left-most sides of the projections 805, 810 and 815 correspond with the right-most sides of the projections 805, 810 and 815 because the projections “wrap around” the viewing position. Projections 805, 810 and 815 may, for example, correspond to three different pieces of content, or may correspond to three different times for the same content. Projection 805 represents studio-type lighting in a mostly dark and monochromatic environment. Projection 810 includes a dark blue floor 812 and a predominant blue area 814 directly in front of the viewer. Projection 815 includes a bright red light 819 in front of and slightly to the left of the viewer.
[0155] As noted above, in some disclosed implementations the lighting metadata provided with light objects may be used to produce a video showing an environment from an intended viewing position. In some examples, lighting metadata may include multiple metadata units, each of which may include a header, describing how the metadata is represented to facilitate playback. In some such examples, lighting metadata may include the following information, which may be provided in different metadata units:
• Environment Metadata Version: This information enables decoders to correctly interpret multiple methods of representing the lighting, and choosing the one most appropriate for the capabilities of light fixtures of a particular playback environment;
• Environment Metadata Mapping Type: This information enables decoders to correctly convert the metadata into real-world coordinates;
• Environment Metadata Time Code: This information enables decoders to correctly synchronize the metadata to audio or video tracks;
• Environment Metadata Position Code: This information enables decoders to correctly align the viewing position with the reference viewing position. In some examples, there may be multiple sets of metadata corresponding to different viewing positions, to allow a viewer to experience the content from multiple viewing positions, with the lighting adapting accordingly by choosing the nearest appropriate position or interpolating between nearby positions. For example:
o Position X1,Y1,Z1 has EnvironmentMetadataPayload EMP1, while Position X2,Y2,Z2 has EMP2;
o A user position X’,Y’,Z’ is determined;
o The geometric distance from the user position to each reference position is computed as D1 = sqrt( (X1-X’)^2 + (Y1-Y’)^2 + (Z1-Z’)^2 ) and D2 = sqrt( (X2-X’)^2 + (Y2-Y’)^2 + (Z2-Z’)^2 );
o The ratio of the distances is computed to determine an “alpha” value: alpha = D1 / (D1+D2);
o The EnvironmentMetadataPayload suitable for the user position, EMP’, is interpolated between the two reference viewing positions using alpha, for example by EMP’ = EMP1 * (1-alpha) + EMP2 * alpha;
This simple example illustrates one method of linear interpolation between two points. Additional clamping may be desirable if the viewer position is beyond either of the reference points. A triangular interpolation may be more suitable in some instances (a sketch of the linear case appears after this list);
• Environment Metadata Compression Method;
• Environment Metadata Payload Size; and
• Environment Metadata Payload.
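The position interpolation described in the Environment Metadata Position Code item above could be implemented roughly as in the following Python sketch; the payloads are assumed to be numeric sequences of equal length, and the clamping behavior is one option mentioned above rather than a required choice.

```python
import math

def interpolate_payload(p1, emp1, p2, emp2, viewer):
    """Linearly interpolate environment metadata payloads between two reference positions.

    p1, p2, viewer: (x, y, z) coordinates; emp1, emp2: numeric payloads of equal length.
    """
    d1 = math.dist(p1, viewer)
    d2 = math.dist(p2, viewer)
    alpha = d1 / (d1 + d2) if (d1 + d2) > 0 else 0.0
    alpha = min(max(alpha, 0.0), 1.0)  # clamp when the viewer is beyond a reference point
    return [a * (1.0 - alpha) + b * alpha for a, b in zip(emp1, emp2)]
```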
[0156] In some examples, the IBL representation may be augmented with depth information indicating the relative distance of the environmental light sources from the reference viewing position. This depth information can be used to adapt the position of the light sources as a viewer moves around an environment. Depth information may, for example, be obtained
directly from RGB images using trained neural networks, e.g., via monocular depth estimation. Alternatively, or additionally, many consumer devices, such as iPhones, are capable of measuring depth directly using infrared imaging techniques such as lidar and structured light.
[0157] The spatial resolution of the IBL techniques may be variable depending on the use case requirements, the available bit rate and the required compression quality. The IBL may be compressed using known image-based compression methods — such as Joint Photographic Experts Group (JPG), JPG2000, Portable Network Graphics (PNG), etc. — or known video-based compression methods, such as Advanced Video Coding (AVC), also known as H.264, H.265, Versatile Video Coding (VVC), AOMedia Video 1 (AV1), etc.
[0158] However, the methods disclosed herein are not limited to IBL-based examples. Some other examples of representing the intended lighting environment treat each light source as a unique light source object, with a defined position and size. In some such methods, other information may be used to define or describe each light source, including but not limited to the directionality of the light emitted from each light source object. Some methods also may include information regarding reflectivity of one or more surfaces, one or more intended room dimensions, etc. Such information may be used to support implementations in which the viewer is free to move to new locations. According to some examples, the light source objects may be defined directly using a computer program written specifically for this task, or they may be inferred by analyzing video images of light sources in an environment. Potential advantages of a light source object-based approach include a smaller metadata payload size for relatively simple lighting scenarios.
Playback Lighting Environment Rendering
[0159] Figure 9A shows example elements of a lightscape renderer. As with other figures provided herein, the types and numbers of elements shown in Figure 9A are merely provided by way of example. Other implementations may include more, fewer and/or different types and numbers of elements. According to this example, the lightscape renderer 501 is an instance of the lightscape renderer 501 that is shown in Figure 5. In this example, the lightscape renderer 501 is implemented by an instance of the control system 110 of Figure 1. According to this example, the lightscape renderer 501 includes a scale factor computation block 925 and a digital drive value computation block 930.
[0160] According to this example, at playback, the lightscape renderer 501 receives as input object-based light data 505, including the intended environment lighting metadata. The object-based light data 505 may, for example, have been produced by the lightscape creation tool 100 of Figure 5. In this example, the rendering engine also receives environment and light fixture data 104 and computes appropriate light fixture control signals 515 for the controllable light fixtures 108 (not shown) in the playback environment. In this example, the environment and light fixture data 104 is shown as including environment and light fixture data 104a, which includes information regarding light fixtures of the playback environment and their capabilities, and environment and light fixture data 104b, which includes information regarding ambient light and/or non-controllable light fixtures of the playback environment. In some examples, the lightscape renderer 501 may be configured to output the light fixture control signals 515 to light controllers 103, which are configured to control the light fixtures 108. The light fixtures 108 in the playback environment also may be referred to herein as “endpoint dynamic lighting elements.”
[0161] In some examples, the environment and light fixture data 104a may include information regarding a set of N dynamic lighting elements, where N represents the total number of dynamic lighting elements that are controllable in the playback environment. According to this example, for each dynamic lighting element n, the environment and light fixture data 104a includes: (1) an IBL map 905 of the environmental lighting produced by the dynamic (controllable) light fixtures at maximum intensity; and (2) a mapping 910 of light intensity produced by the controllable light fixtures to digital drive signals provided to the controllable light fixtures. In this example, the environment and light fixture data 104b includes ambient light information 915 and ambient light level information 920. In some examples, the ambient light information 915 may include a base IBL map of the base environment lighting that is not controllable by the system. According to some examples, the ambient light information 915 may include information regarding other light sources, such as windows in the playback environment, the directions that the windows face, the amount of outdoor light received in the environment through the windows at various times of day, information regarding controllable window shades, if any, etc. In some examples, the ambient light level information 920 may include a scale value of the base environmental lighting, obtained, for example, by an optical light sensor.
[0162] In some implementations, a lightscape renderer — such as the lightscape renderer 501 of Figure 5 — may perform the following operations:
1. Receive as input — here, as part of the object-based light data 505 — information regarding an intended environment lighting IBLref;
2. Compute the base lighting IBLbase, for example according to a constant estimated value, or by scaling the base IBL map with the estimated ambient light;
3. Compute the linear scaling Scalen for each dynamic lighting element IBLn such that the sum of each scaled IBLn plus the base lighting most closely matches the intended environmental lighting, for example as follows:
Minimize abs( IBLref - ( IBLbase + sum( Scalen * IBLn ) ) )
A non-linear function may be used to encode each of the linear values prior to computing the difference. The non-linear function may correspond to the sensitivity of human vision to color and intensity of light;
4. Compute digital drive signals Driven — represented in Figure 9A as element 515, because they are instances of the light fixture control signals 515 of Figure 5 — for each dynamic lighting element from the scaled value Scalen; and
5. Transmit the digital drive values to the dynamic lighting elements, or to the light controller APIs 103 of Figure 5.
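For illustration, step 3 above could be approximated by a non-negative least-squares fit over flattened IBL maps, as in the following Python sketch; the use of scipy.optimize.nnls, the non-negativity constraint, the clipping to [0, 1] and the omission of the perceptual non-linearity are simplifying assumptions rather than part of this disclosure.

```python
import numpy as np
from scipy.optimize import nnls

def compute_scale_factors(ibl_ref: np.ndarray,
                          ibl_base: np.ndarray,
                          ibl_dynamic: np.ndarray) -> np.ndarray:
    """Solve for Scalen so that IBLbase + sum(Scalen * IBLn) approximates IBLref.

    ibl_ref, ibl_base: (num_pixels,) flattened IBL maps.
    ibl_dynamic:       (num_fixtures, num_pixels) per-fixture IBL maps at maximum intensity.
    """
    target = ibl_ref - ibl_base
    scales, _residual = nnls(ibl_dynamic.T, target)  # fixtures cannot emit negative light
    return np.clip(scales, 0.0, 1.0)                 # cannot exceed the maximum-intensity capture
```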
[0163] In some examples, computing the scale factors may involve subtracting the ambient light IBLbase from the intended lighting IBLref, and then computing the scale factors Scalen using deconvolution from the dynamic lighting IBLn. Conventional methods of deconvolution can be used, including regularization as needed to improve robustness.
[0164] In some alternate examples, the process of computing the scale factors may be performed iteratively, for example by initializing the scale factors to an initial value, then adjusting the scale values in turn while evaluating the output against the reference (the intended lighting IBLref). According to some such examples, conventional methods of gradient descent and function minimization can be used during the process of computing the scale factors.
[0165] In some examples of computing the digital drive values from the scale factors, a lookup table (LUT) — such as the dynamic light LUT 910 of Figure 9A — may be used to determine the correct digital drive value required to achieve a desired light intensity from a particular light source. Alternatively, a functional scale factor may be derived from measurement data.
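For illustration, a look-up of the digital drive value from a desired linear-light level could use simple interpolation of the calibration measurements, as sketched below in Python; np.interp and the assumption of monotonically increasing measurements are illustrative choices, not requirements of this disclosure.

```python
import numpy as np

def drive_value_from_scale(scale: float,
                           measured_linear_light: np.ndarray,
                           measured_drive_values: np.ndarray) -> int:
    """Return the drive codeword expected to produce the desired linear-light level.

    measured_linear_light: increasing linear-light measurements, normalized to [0, 1].
    measured_drive_values: corresponding drive codewords captured during calibration.
    """
    drive = np.interp(scale, measured_linear_light, measured_drive_values)
    return int(round(float(drive)))
```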
Calibration and System Configuration
[0166] One important aspect of this application is the generation of the dynamic lighting IBL data for a given playback environment. In one embodiment the process is as follows:
1) Establish a connection between a host and the dynamic light fixtures (also referred to herein as controllable light fixtures);
2) Install an IBL measurement device at a preferred viewing position. Commonly-used techniques are a camera imaging a shiny sphere, a camera with a fisheye lens, a camera that pans around the scene, or a fixture of multiple cameras which capture the scene from all directions (a 360 degree camera);
3) Capture the base environment light (IBLbase) and light sensor readings;
4) For each dynamic light fixture, do the following:
a. Set the drive value to a maximum level;
b. Capture the IBLn;
c. Repeat for several levels (drive values);
d. Construct a representative IBLn that is suitable for multiple drive levels; and
e. Construct the relationship between linear light and drive value. “Linear light” refers to a space where a linear change in value is perceived as a linear change by a human. Most devices do not have linear responses in this respect. In other words, if the codeword (drive value) to the actuator is doubled, the change in perceived brightness is not doubled. Working in a linear light space is convenient and can ensure that, when applicable, a finite amount of resolution is spread equally across the human perceptual response. After operations are performed in a linear light space and corresponding values are derived, these values may be converted into the drive values/codewords for controlling the physical devices/lights.
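For illustration, step 4 above could produce a representative IBLn and a drive-to-light relationship roughly as in the following Python sketch; using the maximum-drive capture as the representative map and using normalized mean map intensity in place of a true perceptually linear light measure are simplifying assumptions.

```python
import numpy as np

def build_drive_to_light_curve(drive_levels, captured_ibls):
    """Summarize per-level IBL captures into one representative map and a drive curve.

    drive_levels:  drive codewords used during calibration, in increasing order.
    captured_ibls: one flattened IBL map per drive level.
    """
    captured = np.asarray(captured_ibls, dtype=float)
    representative_ibl = captured[-1]        # capture at the maximum drive value
    mean_intensity = captured.mean(axis=1)   # stand-in for measured light output per level
    normalized_light = mean_intensity / mean_intensity[-1]
    return representative_ibl, dict(zip(drive_levels, normalized_light))
```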
[0167] The above-described process is applicable when the base environment light is relatively fixed, except for an overall increase or decrease. For example, there may be a window in one corner of the playback environment that causes an overall increase or decrease in ambient light, depending on the weather and the time of day. In many viewing environments the lighting that cannot be controlled may be more dynamic, for example light fixtures that can be controlled manually (not part of the dynamic setup), or an automotive environment. In some examples, multiple base lighting scenarios may be captured, and during playback a measurement of the ambient light in the environment may be used to
estimate the captured base lighting scenario that most closely matches the actual, current base lighting conditions. The captured IBLn, IBLbase, and drive values relationships may be stored in a configuration file that is accessible during rendering.
Content Authoring/Mastering
[0168] The output of the content authoring/mastering step is the intended lighting environment map, or IBLref. In some examples, the intended lighting environment map may be measured directly using the measurement approach described in the calibration section above. In other examples, the intended lighting environment map may be rendered from computer graphics software. Some disclosed examples involve dynamically changing the reference lighting to create a video sequence of the intended reference lighting environment.
[0169] Figure 9B is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein. The blocks of method 950, like other methods described herein, are not necessarily performed in the order indicated. In some implementations, one or more of the blocks of method 950 may be performed concurrently. Moreover, some implementations of method 950 may include more or fewer blocks than shown and/or described. The blocks of method 950 may be performed by one or more devices, which may be (or may include) one or more instances of a control system such as the control system 110 that is shown in Figure 1 and described above. For example, at least some aspects of method 950 may be performed by an instance of the control system 110 that is configured to implement at least the lightscape renderer 501 of Figure 5 or the lightscape renderer 501 of Figure 9A. In some examples, method 950 may be performed by one or more instances of the control system 110 configured to implement the lightscape renderer 501 and the light controller APIs of Figure 5.
[0170] In this example, block 955 involves receiving, by a control system configured to implement a lighting environment renderer, object-based light data indicating the intended lighting environment. In this instance, the object-based light data includes light objects and lighting metadata. In some examples, the environment may be an actual, real-world environment, such as a room environment or an automobile environment. According to some examples, the environment may be, or may include, a virtual environment. In some such examples, method 950 may involve providing a virtual environment, such as a gaming environment, while also providing corresponding lighting effects in a real-world environment.
[0171] According to this example, block 960 involves receiving, by the control system, lighting information regarding a local lighting environment. In this example, the lighting information includes one or more characteristics of one or more controllable light sources in the local lighting environment. Block 960 may, for example, involve receiving the environment and light fixture data 104 that is described herein with reference to Figure 5 or Figure 9A.
[0172] In this example, block 965 involves determining, by the control system, a drive level for each of the one or more controllable light sources that approximates the intended lighting environment. Here, block 970 involves outputting, by the control system, the drive level to at least one of the controllable light sources.
[0173] In some examples, the object-based lighting metadata includes time information. In some such examples, block 965 may involve determining one or more drive levels for one or more time intervals corresponding to the time information.
[0174] According to some examples, method 950 may involve receiving viewing position information. In some such examples, block 965 may involve determining one or more drive levels corresponding to the viewing position information.
[0175] In some examples, the lighting information may include one or more characteristics of one or more base light sources in the local lighting environment, the base light sources not being controllable by the control system. In some such examples, block 965 may involve determining one or more drive levels based, at least in part, on the one or more characteristics of one or more non-controllable light sources.
[0176] According to some examples, the object-based lighting metadata may include at least lighting object position information and lighting object color information. In some examples, the intended lighting environment may be transmitted as one or more Image Based Lighting (IBL) objects. In some such examples, the determining process of block 965 may be based, at least in part, on an IBL map of environmental lighting produced by each of n controllable light sources (IBLn) in the local lighting environment at maximum intensity.
[0177] In some examples, method 950 may involve receiving a base IBL map of the base environment lighting (IBLbase) that is not controllable by the control system. In some such examples, the determining process of block 965 may be based, at least in part, on the base IBL map.
[0178] In some examples, the determining process of block 965 may be based, at least in part, on a linear scaling (Scalen) for each dynamic lighting element IBLn such that a sum of light from each scaled IBLn plus the base lighting IBLbase most closely matches the intended lighting environment IBLref. In some such examples, the determining process of block 965 may be based, at least in part, on minimizing a difference between IBLref and the sum of light from each scaled IBLn plus the base lighting IBLbase.
Examples of MS Object Properties
[0179] Following is a non-exhaustive list of possible properties of MS objects:
• Priority;
• Layer;
• Mixing Mode;
• Persistence;
• Effect; and
• Spatial Panning Law.
[0180] Effect
As used herein, the “effect” of an MS object is a synonym for the type of MS object. An “effect” is, or indicates, the sensory effect that the MS object is providing. If an MS object is a light object, its effect will involve providing direct or indirect light. If an MS object is a haptic object, its effect will involve providing some type of haptic feedback. If an MS object is an air flow object, its effect will involve providing some type of air flow. As described in more detail below, some examples involve other “effect” categories.
[0181] Persistence
Some MS objects may contain a persistence property in their metadata. For example, as a moveable MS object moves around in a scene, the moveable MS object may persist for some period of time at locations that the moveable MS object passes through. That period of time may be indicated by persistence metadata. In some implementations, the MS renderer is responsible for constructing and maintaining the persistence state.
[0182] Layers
According to some examples, individual MS objects may be assigned to “layers,” in which MS objects are grouped together according to one or more shared characteristics. For
example, layers may group MS objects together according to their intended effect or type, which may include but are not limited to the following:
Mood/Ambience
Informational
Punctuational/Attention
Alternatively, or additionally, in some examples, layers may be used to group MS objects together according to shared properties, which may include but are not limited to the following:
Color;
Intensity;
Size;
Shape;
Position; and
Region in space.
[0183] Priority
In some examples, MS objects may have a priority property that enables the renderer to determine which object(s) should take priority in an environment in which MS objects are contending for limited actuators. For example, if multiple light objects overlap with a single light fixture at a time during which all of the light objects are scheduled to be rendered, a renderer may refer to the priority of each light object in order to determine which light object(s) will be rendered. In some examples, priority may be defined between layers or within layers. According to some examples, priority may be linked to specific properties such as intensity. In some examples, priority may be defined temporally: for example, the most recent MS object to be rendered may take precedence over MS objects that have been rendered earlier. According to some examples, priority may be used to specify MS objects or layers that should be rendered regardless of the limitations of a particular actuator system in a playback environment.
[0184] Spatial Panning Laws
Spatial panning laws may define an MS object’s movement across a space, how an MS object affects actuators as it moves between them, etc.
[0185] Mixing Mode
The mixing mode may specify how multiple objects are multiplexed onto a single actuator. In some examples, mixing modes may include one or more of the following:
Max mode: select the MS object which activates an actuator the most;
Mix mode: mix in some or all the objects according to a rule set, for example by summing activation levels, taking the average of activation levels, mixing color according to activation level or priority level, etc.;
MaxNmix: mix in the top N MS objects (by activation level), according to a rule set.
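For illustration, the mixing modes listed above could be applied per actuator roughly as in the following Python sketch; the rule sets shown for the mix and MaxNmix modes (summing and capping activation levels) are example choices, not requirements of this disclosure.

```python
def mix_for_actuator(activations, mode: str = "max", top_n: int = 2) -> float:
    """Multiplex several MS objects onto a single actuator according to a mixing mode.

    activations: list of (object_id, activation_level) pairs mapped to the actuator.
    """
    if not activations:
        return 0.0
    levels = sorted((level for _, level in activations), reverse=True)
    if mode == "max":        # select the object that activates the actuator the most
        return levels[0]
    if mode == "mix":        # example rule set: sum all activation levels, capped at 1.0
        return min(sum(levels), 1.0)
    if mode == "maxNmix":    # mix only the top N objects by activation level
        return min(sum(levels[:top_n]), 1.0)
    raise ValueError(f"unknown mixing mode: {mode}")
```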
[0186] According to some examples, more general metadata for an entire multi-sensory content file, instead of (or in addition to) per-object metadata may be defined. For example, MS content files may include metadata such as trim passes or mastering environment.
[0187] Trim Controls
[0188] What are referred to in the context of Dolby Vision™ as “trim controls” may act as guidance on how to modulate the default rendering algorithm for specific environments or conditions at the endpoint. Trim controls may specify ranges and/or default values for various properties, including saturation, tone detail, gamma, etc. For example, there may be automotive trim controls, which provide specific defaults and/or rule sets for rendering in automotive environments, for example guidance that includes only objects of a certain priority or layer. Other examples may provide trim controls for environments with limited, complex or sparse multisensory actuators.
[0189] Mastering Environment
A single piece of multisensory content may include metadata on the properties of the mastering environment such as room size, reflectivity and ambient bias lighting level. The specific properties may differ depending on the desired endpoint actuators. Mastering environment information can aid in providing reference points for rendering in a playback environment.
Lightscape Metadata Layers
[0190] When authoring a lightscape for media content, it can be useful to identify at least two different methods (or layers) to author for. These layers may be used during the process of rendering the authored MS objects according to the available and controllable light fixtures in a playback environment. These layers can aid in capturing artistic intent and can allow for flexibility when a playback environment imposes restrictions, for example due to the number of light fixtures or light occlusions, so that the primary intent of the author(s) may still be rendered but can be scaled or otherwise modified.
[0191] In some examples, direct lighting and indirect lighting may be assigned to different lighting metadata layers.
[0192] Direct light objects
[0193] Light objects in a direct light object layer, also referred to herein as “direct light objects,” are light objects representing light that is directly visible to the content creator or end user. Examples of direct light objects could include light fixtures in the scene, the sun, the moon, headlights from a car approaching, lightning during a storm, traffic lights, etc. Direct light objects also may be used to represent light sources that are part of a scene but are typically, or temporarily, not visible in the associated video content, for example because they are outside the video frame or because they move outside the video frame. In some examples, direct light objects may be used to enhance or augment auditory events, such as explosions, visually guiding moving object trajectories outside the video frame, etc. The use of direct light objects is typically of a dynamic nature. For example, the associated metadata such as intensity, color, saturation, and position will often change as a function of time within a scene of the media content.
[0194] Indirect light objects
[0195] Light objects in an indirect light object layer, also referred to herein as “indirect light objects,” are light objects representing the effect of indirect light. For example, indirect light objects may be used to represent the effect of light radiated by fixtures that is observed when the light is reflected by one or more surfaces. Some examples of using indirect light objects include changing the observed color of the walls, ceiling or floor of the environment into a color that matches the content, such as green colors for a forest scene, or blue colors for sky or water. Indirect light objects also may be used to set the scene and the mood of the environment in a similar way as is achieved by color grading video content, but in a more immersive way. For example, science-fiction movies often use very specific (blue or greenish) video color grading palettes to reinforce the sense of being in outer space. Flashback scenes often use reduced saturation, muted colors or sepia color overlays in the video content to intensify the effect of a change in the time line. All these effects can be replicated, or approximated, outside the video frame by adjusting the light control signals accordingly. Light effects corresponding to indirect light objects are often more stationary within a scene, and are typically less localized and less dynamic, than light effects corresponding to direct light objects.
Layer Abstraction
[0196] Some examples involve a further abstraction of the direct and indirect light object layers into layers that include aspects of both. In some such examples, these layers may include one or more ambient layers, one or more dynamic layers, one or more custom layers, one or more overlay layers, or combinations thereof. These layers may, in some examples, be used for, or may correspond with, linear or event-based triggers in content.
Ambient Layer(s)
[0197] Similar to indirect lighting, an ambient layer can be used to set the mood and tone in a space through washes of color on surfaces in the playback environment. An ambient layer may be used as a base layer on which to build lighting scenes. In some examples, an ambient layer may be represented via light objects that cover relatively large areas. In other examples, an ambient layer may be represented via light objects that cover relatively small areas, for example with one or more images. According to some examples, an ambient layer may be divided into zones. In some such examples, particular light effects always occupy a certain region of space. For example, the walls, ceiling and floor in an authoring or playback environment each may be considered separate ambient layer zones.
Dynamic Layer(s)
[0198] In some implementations, a dynamic layer may be used to represent spatial and temporal variation of MS objects, such as light objects. Within a dynamic layer, individual MS objects may also have priority so that, for example, one light object may have preference over another light object in being presented via a light fixture. Within a dynamic layer, individual MS objects may, in some examples, be linked to other objects, such as to audio objects (from spatial audio) or to 3D world MS objects.
Custom Layer(s)
[0199] In some examples, a custom layer can be used to design light sequences that can be freely assigned to light fixtures for functional purposes. These sequences may not be spatial in nature, but instead may provide further information to the user. For example, in a game a light strip may be assigned to show the player’s remaining life.
Overlay Layer(s)
[0200] According to some examples, an overlay layer can be used to present persistent lights that have continuous priority. An overlay layer may, for example, be used to create a “watermark” over all other elements in a lighting scene.
Authoring and distribution of lightscape layer data
Authoring of Direct and Indirect Light Objects
[0201] In some examples, direct light objects may be authored by determining or setting light source position, intensity, hue, saturation and spatial extent as a function of time for one or more light objects. In some such examples, this authoring process may create corresponding metadata that can be distributed, with the direct light objects, alongside audio and/or video content of a content presentation. Ideally, direct light objects are rendered to direct light sources.
[0202] Indirect light effects may, in some examples, be authored as a dedicated group or class within the lightscape metadata content, focusing more on overall color and ambiance rather than dynamic effects. Indirect light effects may also be defined by intensity, hue, saturation, or combinations thereof as a function of time, but would typically be associated with a substantial area of the lightscape rendering environment. Indirect light effects are ideally (but not necessarily) rendered to indirect light sources, when available.
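By way of illustration, the following is a minimal sketch of how authored direct and indirect light object metadata might be represented in code. The field names, the keyframe structure and the Python representation are assumptions made for this example; they are not a format defined by this disclosure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LightKeyframe:
    """One authored sample of a light object's properties at a point in time."""
    time_s: float          # presentation time, in seconds
    x: float = 0.0         # position within the authoring environment (direct objects)
    y: float = 0.0
    z: float = 0.0
    extent: float = 0.0    # spatial extent of the object
    hue: float = 0.0       # HSL color, hue in degrees
    saturation: float = 1.0
    lightness: float = 0.5
    intensity: float = 1.0

@dataclass
class LightObject:
    """An authored direct or indirect light object plus its time-varying metadata."""
    object_id: str
    kind: str                                        # "direct" or "indirect"
    keyframes: List[LightKeyframe] = field(default_factory=list)

# Example: a direct light object whose position and hue are authored over time.
obj_a = LightObject(
    object_id="A",
    kind="direct",
    keyframes=[
        LightKeyframe(time_s=0.0, x=0.2, y=0.8, z=1.0, extent=0.1, hue=30.0),
        LightKeyframe(time_s=2.0, x=0.6, y=0.5, z=1.0, extent=0.2, hue=200.0),
    ],
)
```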
[0203] Figure 10 shows another example of a GUI that may be presented by a display device of a lightscape creation tool. As with other figures provided herein, the types and numbers of elements shown in Figure 10 are merely provided by way of example. Other GUIs presented by a lightscape creation tool may include more, fewer and/or different types and numbers of elements. According to some examples, the GUI 1000 may be presented on a display device according to commands from an instance of the control system 110 of Figure 1 that is configured for implementing the lightscape creation tool 100 of Figure 5.
[0204] In this example, a user may interact with the GUI 1000 in order to create light objects and to assign light object properties, which may be associated with the light object as metadata. According to this example, the GUI 1000 includes a direct light object metadata editor section 1005, with which a user can interact in order to define properties of metadata for direct light objects, as well as an indirect light object metadata editor section 1010, with which a user can interact in order to define properties of metadata for indirect light objects.
[0205] A user may interact with the direct light object metadata editor section 1005 in order to select a position, a size and other properties of a direct light object. In this example, the direct light object metadata editor section 1005 includes a hue-saturation-lightness (HSL) color wheel 1035a, with which a user may interact to select the HSL attributes of a selected direct light object. According to this example, the direct light object metadata editor section 1005 represents direct light objects A, B and C within a three-dimensional space 1031, which represents a playback environment. In this example, the direct light objects A, B and C are being viewed from the top of the three-dimensional space 1031, along the z axis.
[0206] According to this example, a user has selected, and is currently selecting properties of, the direct light object A. Because the user has selected the direct light object A, the corresponding time automation lanes for direct light object A’s coordinates (X, Y, Z), object extent (E) and HSL values across time have become visible in the area 1025 and can be edited. In some examples, the time interval corresponding to the time automation lanes shown in Figure 10 may be on the order of 1 or more seconds, for example 1 second, 2 seconds, 3 seconds, 4 seconds, 5 seconds, etc. In this example, only the x and y dimensions of the three-dimensional space 1031 are shown, but the area 1025 of the direct light object metadata editor section 1005 nonetheless allows a user to indicate x, y and z coordinates.
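By way of illustration, the following sketch shows how a single time automation lane, such as the X lane edited in the area 1025, might be sampled at playback time. The keyframe representation and the use of linear interpolation are assumptions for this example only.

```python
from bisect import bisect_left

def sample_lane(keyframes, t):
    """Linearly interpolate a single automation lane at time t.

    `keyframes` is a list of (time_s, value) pairs sorted by time; this mirrors
    the per-property lanes (X, Y, Z, E, H, S, L), but the interpolation scheme
    itself is an assumption for illustration.
    """
    times = [k[0] for k in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = bisect_left(times, t)
    (t0, v0), (t1, v1) = keyframes[i - 1], keyframes[i]
    w = (t - t0) / (t1 - t0)
    return v0 + w * (v1 - v0)

# Example: sample the X lane of direct light object A at t = 1.0 s.
x_lane = [(0.0, 0.2), (2.0, 0.6)]
print(sample_lane(x_lane, 1.0))  # approximately 0.4
```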
[0207] In this example, the indirect light object metadata editor section 1010 includes an HSL color wheel 1035b, with which a user may interact to select the HSL attributes of a
selected indirect light object. According to this example, the indirect light object metadata editor section 1010 also includes an intensity control 1030, with which a user may interact to select the intensity of a selected indirect light object.
[0208] According to some examples, a lightscape creation tool or an MS content creation tool may allow a content creator to link indirect light effects to video scene boundaries, so that the lightscape's indirect light effects are set for a specific scene. Alternatively, or additionally, the content creator may choose to modify intensity, hue, saturation, etc., as a function of time. In some examples, a lightscape creation tool or an MS content creation tool may allow a content creator to use the video color overlay information used during video content creation to determine the indirect light effect metadata. According to some examples, the indirect light settings of a lightscape creation tool or an MS content creation tool may be used as a color/hue/saturation/intensity overlay to the direct light object metadata, such that the direct light objects will follow the indirect light properties more closely.
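By way of illustration, the following sketch shows one way such an indirect overlay could be applied to a direct light object's HSL values. The linear blend (including the naive handling of hue) and the `amount` parameter are assumptions for this example.

```python
def apply_indirect_overlay(direct_hsl, overlay_hsl, amount=0.5):
    """Pull a direct light object's HSL values toward an indirect overlay.

    `amount` controls how closely direct objects follow the indirect light
    properties (0 = unchanged, 1 = fully overlaid). The simple linear blend,
    including the naive hue blend without wrap-around, is illustrative only.
    """
    return tuple(
        (1.0 - amount) * d + amount * o
        for d, o in zip(direct_hsl, overlay_hsl)
    )

# Example: a warm direct object nudged toward a cool "outer space" overlay.
print(apply_indirect_overlay((30.0, 0.9, 0.6), (220.0, 0.6, 0.4), amount=0.3))
```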
Authoring of Layer Abstraction
[0209] In some examples, layers may be authored in a lightscape creation tool or an MS content creation tool configured for the creation of linear-based content, where individual “objects” may be assigned to a layer with properties such as color, intensity, shape and position. In some such examples, position may be specified only for dynamic objects, whilst sub-zones may be used for ambient layer objects.
[0210] According to some examples, MS content such as light-based content may be created for 3D worlds. Some such examples allow for event-based triggers, for example triggers linking events to existing light metadata as well as to the creation of new scenes.
[0211] According to some examples, a lightscape creation tool or an MS content creation tool may allow mixing between layers or individual objects. For example, it may be desired that layers or objects with the same priority be additively mixed, whilst a higher-priority object occludes all other objects. The lightscape creation tool or the MS content creation tool may allow a content creator to define mixing rules corresponding to the content creator’s intentions.
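By way of illustration, the following sketch applies one possible set of mixing rules, in which contributions sharing the highest priority are additively mixed and lower-priority contributions are occluded. The RGB representation and the clipping behaviour are assumptions for this example.

```python
def mix_objects(contributions):
    """Mix per-object RGB contributions for one fixture according to priority.

    `contributions` is a list of (priority, (r, g, b)) tuples. Objects sharing
    the highest priority are additively mixed (clipped to 1.0); lower-priority
    objects are occluded. This rule set is one example of content-creator
    intent, not a mandated behaviour.
    """
    if not contributions:
        return (0.0, 0.0, 0.0)
    top = max(p for p, _ in contributions)
    mixed = [0.0, 0.0, 0.0]
    for priority, rgb in contributions:
        if priority == top:
            mixed = [min(1.0, m + c) for m, c in zip(mixed, rgb)]
    return tuple(mixed)

# Two objects at priority 1 mix additively; the priority-0 object is occluded.
print(mix_objects([(1, (0.5, 0.0, 0.0)), (1, (0.0, 0.4, 0.0)), (0, (0.0, 0.0, 1.0))]))
```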
Rendering of lightscape layer data
[0212] The layer attributes and their metadata may be rendered by means of a lightscape renderer, such as the lightscape renderer 501 of Figure 5. In some examples, the lightscape renderer may be configured to send light fixture control signals 515 to light fixtures of a playback environment. In some examples, the lightscape renderer 501 may be configured to output the light fixture control signals 515 to light controllers 103, which are configured to control the light fixtures 108. According to some examples, the lightscape renderer uses the environment and light fixture data 104 to determine the capabilities and spatial positions of each light fixture. In some examples, the priority of layers may be a determining factor in what is ultimately rendered to a light fixture.
Rendering Direct and Indirect Light Objects
[0213] In some implementations, the environment and light fixture data 104 received by a lightscape renderer include data regarding whether each fixture is directly visible from a viewing position or is an indirect light source. In some examples, if no indirect light fixtures are available, the indirect light data may be sent to direct light fixtures instead, potentially with a reduced brightness.
[0214] Direct light objects are preferably rendered to visible light fixtures such as ceiling downlights, lights fixed to a wall, table lamps, etc. Indirect light metadata ideally targets light fixtures that are not directly visible, such as LED strips that light up walls, ceilings, shelves or furniture, and spotlights that light up walls or ceilings. If no such indirect lights are available, the indirect light metadata can be used to control direct lights instead. In some such examples, a lightscape renderer may cause direct light object metadata and indirect light object metadata to be superimposed when rendering to light fixtures that function as both indirect and direct light sources.
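By way of illustration, the following sketch routes direct and indirect light data to fixtures according to their visibility, superimposing both on fixtures that act as direct and indirect sources and folding indirect content into direct fixtures, at reduced brightness, when no indirect fixtures exist. The fixture classification keys and the fallback gain are assumptions for this example.

```python
def route_light_data(fixtures, direct_rgb, indirect_rgb, fallback_gain=0.5):
    """Decide what each fixture reproduces, given its visibility classification.

    `fixtures` maps fixture ids to "direct", "indirect" or "both" (information
    of the kind carried by the environment and light fixture data). The
    fallback gain applied when no indirect fixtures exist is illustrative.
    """
    has_indirect = any(kind in ("indirect", "both") for kind in fixtures.values())
    commands = {}
    for fid, kind in fixtures.items():
        if kind == "direct":
            rgb = direct_rgb
            if not has_indirect:  # fold indirect content into direct fixtures
                rgb = tuple(min(1.0, d + fallback_gain * i)
                            for d, i in zip(direct_rgb, indirect_rgb))
        elif kind == "indirect":
            rgb = indirect_rgb
        else:  # "both": superimpose direct and indirect contributions
            rgb = tuple(min(1.0, d + i) for d, i in zip(direct_rgb, indirect_rgb))
        commands[fid] = rgb
    return commands

print(route_light_data({"downlight": "direct", "led_strip": "indirect"},
                       (0.8, 0.7, 0.6), (0.0, 0.2, 0.5)))
```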
[0215] Figure 11 is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein. The blocks of method 1100, like other methods described herein, are not necessarily performed in the order indicated. In some implementations, one or more of the blocks of method 1100 may be performed concurrently. Moreover, some implementations of method 1100 may include more or fewer blocks than shown and/or described. The blocks of method 1100 may be performed by one or more devices, which may be (or may include) one or more instances of a control system such as the control system 110 that is shown in Figure 1 and described above. For example, at least some aspects of method 1100 may be performed by an instance of the control system 110 that is configured to implement the MS renderer 001 of Figure 4, the lightscape renderer 501 of Figure 5 and/or the lightscape renderer 501 of Figure 9A.
[0216] In this example, block 1105 involves receiving, by a control system configured to implement a sensory renderer, one or more sensory objects and corresponding sensory object metadata indicating intended sensory effects to be provided in a sensory actuator playback environment. In some examples, the environment may be an actual, real-world environment, such as a room environment or an automobile environment. According to some examples, the environment may be, or may include, a virtual environment. In some such examples, method 1100 may involve providing a virtual environment, such as a gaming environment, while also providing corresponding lighting effects in a real-world environment.
[0217] According to this example, block 1110 involves receiving, by the control system, playback environment information. In this example, the playback environment information includes sensory actuator location information and sensory actuator characteristic information regarding one or more controllable sensory actuators in the sensory actuator playback environment. Block 1110 may, for example, involve receiving the environment and actuator data 004 that is described herein with reference to Figure 4 or receiving the environment and light fixture data 104 that is described herein with reference to Figure 5 or Figure 9A.
[0218] In this example, block 1115 involves determining, by the control system and based on the playback environment information, the sensory objects and the sensory object metadata, sensory actuator control commands or sensory actuator control signals for controlling one or more controllable sensory actuators in the sensory actuator playback environment. Here, block 1120 involves outputting, by the control system, the sensory actuator control commands or sensory actuator control signals for at least one of the controllable sensory actuators in the sensory actuator playback environment.
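By way of illustration, the following skeleton traces the blocks of method 1100 for a generic sensory renderer. The data layout of the sensory objects and playback environment information, and the nearest-capable-actuator matching rule, are placeholder assumptions; an actual renderer may use considerably more elaborate mapping logic.

```python
def render_sensory_objects(sensory_objects, playback_env, send):
    """Skeleton of method 1100: receive objects and environment data, determine
    actuator control commands, and output them.

    `playback_env` is assumed to map actuator ids to records with "position"
    and "capabilities" entries; `send` delivers one command to one actuator.
    """
    for obj in sensory_objects:                          # block 1105: received objects
        candidates = [
            (aid, info) for aid, info in playback_env.items()   # block 1110 data
            if obj["modality"] in info["capabilities"]
        ]
        if not candidates:
            continue                                     # no actuator can realise this effect
        aid, info = min(
            candidates,
            key=lambda item: sum((a - b) ** 2
                                 for a, b in zip(item[1]["position"], obj["position"])),
        )                                                # block 1115: determine commands
        send(aid, {"level": obj["intensity"], "modality": obj["modality"]})  # block 1120

# Minimal usage example with one light-capable actuator.
env = {"lamp1": {"position": (0.0, 0.0, 2.0), "capabilities": {"light"}}}
render_sensory_objects(
    [{"modality": "light", "position": (0.1, 0.0, 2.0), "intensity": 0.8}],
    env,
    send=lambda aid, cmd: print(aid, cmd),
)
```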
[0219] In some examples, the one or more sensory objects may include one or more light objects, one or more haptic objects, one or more air flow objects, one or more positional actuator objects or combinations thereof. According to some examples wherein the sensory object metadata includes light objects and lighting metadata, the lighting metadata may include direct-light object metadata, indirect light metadata, or combinations thereof. Alternatively, or additionally, the lighting metadata may be organized into one or more layers, which may include one or more ambient layers, one or more dynamic layers, one or more custom layers, one or more overlay layers, or combinations thereof.
[0220] According to some examples, the sensory object metadata includes time information.
In some such examples, block 1115 may involve determining one or more drive levels for one or more time intervals corresponding to the time information.
[0221] In some examples, method 1100 may involve receiving viewing position information. In some such examples, block 1115 may involve determining one or more drive levels corresponding to the viewing position information.
[0222] In some examples, the lighting information may include one or more characteristics of one or more base light sources in the local lighting environment, the base light sources not being controllable by the control system. In some such examples, block 1115 may involve determining one or more drive levels based, at least in part, on the one or more characteristics of one or more non-controllable light sources.
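By way of illustration, the following sketch determines drive levels that best approximate an intended light distribution after accounting for non-controllable base light, in the spirit of the IBL-based scaling described in the enumerated example embodiments below. The sampled light-map representation, the unconstrained least-squares solve and the clamp to [0, 1] are simplifying assumptions for this example.

```python
import numpy as np

def solve_drive_levels(fixture_contrib, base_light, target):
    """Find per-fixture drive levels so that base light plus scaled fixture
    contributions best matches the intended lighting.

    `fixture_contrib` is an (m, n) array whose columns hold the light each of
    n controllable fixtures produces at full drive, sampled at m points (for
    example, pixels of an environment light map); `base_light` and `target`
    are length-m vectors. Solving by unconstrained least squares and clamping
    is a simplification; a non-negative least-squares solver could be used.
    """
    residual_target = target - base_light
    scales, *_ = np.linalg.lstsq(fixture_contrib, residual_target, rcond=None)
    return np.clip(scales, 0.0, 1.0)

# Two fixtures, three sample points: solve for drive levels given ambient base light.
A = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
print(solve_drive_levels(A, base_light=np.array([0.1, 0.1, 0.1]),
                         target=np.array([0.6, 0.5, 0.4])))
```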
[0223] According to some examples, the sensory object metadata may include sensory object size information. In some examples, each of the sensory objects may have a sensory object effect property indicating a type of effect that the sensory object is providing. According to some examples, one or more of the sensory objects may have a persistence property indicating a period of time that the sensory object will persist in the sensory actuator playback environment.
[0224] In some examples, one or more of the sensory objects may be assigned to one or more layers in which sensory objects are grouped according to shared sensory object characteristics. In some such examples, the one or more layers may group sensory objects according to mood, ambience, information, attention, color, intensity, size, shape, position, region in space, or combinations thereof.
[0225] According to some examples, at least some of the sensory objects may have a priority property indicating a relative importance of each sensory object. In some examples, one or more of the sensory objects may have a spatial panning property indicating how a sensory object can move within the sensory actuator playback environment, how a sensory object will affect the controllable sensory actuators in the sensory actuator playback environment, or combinations thereof. According to some examples, at least some of the sensory objects may have a mixing mode property indicating how multiple sensory objects can be reproduced by a single controllable sensory actuator.
[0226] Some examples of method 1100 may involve providing and/or processing more general metadata for an entire multi-sensory content file, instead of (or in addition to) per-object metadata. This more general metadata may be referred to as “overarching sensory object metadata.” Some examples of method 1100 may involve receiving, by the control system, overarching sensory object metadata comprising trim control information, mastering environment information, or a combination thereof. In some such examples, block 1115 may involve determining the sensory actuator control commands or the sensory actuator control signals based, at least in part, on the trim control information, on the mastering environment information, or on a combination thereof.
Bitstreams Including Multi-Sensory Objects
[0227] This section discloses various types of coded bitstreams to carry object-based multi-sensory data for rendering on an arbitrary plurality of actuators. Such bitstreams may be referred to herein as including encoded object-based sensory data or as including an encoded object-based sensory data stream. Some encoded object-based sensory data streams may be delivered along with and/or as part of other media content. In some such examples, an object-based sensory data stream may be interleaved or multiplexed with the audio and/or video bitstreams. According to some examples, an object-based sensory data stream may be arranged in an International Standards Organization (ISO) base media file format (ISOBMFF), so that the encoded object-based sensory data can be provided with corresponding audio data, video media, or both, in an encoded ISOBMFF bitstream. Accordingly, an encoded bitstream may, in some examples, include an encoded object-based sensory data stream and an encoded audio data stream and/or an encoded video data stream. In some examples, the encoded object-based sensory data stream and other associated data stream(s) include associated synchronization data, such as time stamps, to allow synchronization between different types of content. For example, if the encoded bitstream includes encoded audio data, encoded video data and encoded object-based sensory data, the encoded audio data, encoded video data and the encoded object-based sensory data may all include associated synchronization data.
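By way of illustration, the following sketch shows one possible in-memory view of such a multiplexed bitstream, with shared time stamps carried alongside the audio, video and object-based sensory payloads. The frame fields shown are assumptions for this example and do not reflect the actual packing of any particular container format.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EncodedFrame:
    """One multiplexed frame of an encoded bitstream (illustrative only; a real
    deployment would follow a container format such as ISOBMFF)."""
    timestamp_s: float                        # shared synchronization data
    audio_payload: Optional[bytes] = None
    video_payload: Optional[bytes] = None
    sensory_payload: Optional[bytes] = None   # encoded object-based sensory data

@dataclass
class EncodedBitstream:
    frames: List[EncodedFrame]

# Frames carrying audio, video and object-based sensory data for the same instant.
stream = EncodedBitstream(frames=[
    EncodedFrame(timestamp_s=0.0, audio_payload=b"audio", video_payload=b"video",
                 sensory_payload=b"sensory"),
])
```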
[0228] Just as there exist channel-based surround sound formats such as Dolby Digital (AC3) and Dolby Digital Plus and object-based surround sound formats such as Dolby Atmos (DD+ AJOC and AC4-JOC), we here introduce an object-based sensory data format. These concepts are summarised in the following table:
[0229] Following are some examples of artistic intent that can be communicated in an encoded object-based sensory data stream:
1. Make all lights in the room bright white now.
2. Switch off all lights in the room at presentation timestamp 1:15.
3. Slowly transition lights at the rear of the room (behind the audience) to red over the next 10 seconds.
4. Simulate a helicopter with searchlight flying over the audience by turning on and then off any available overhead lights in the room in sequence from front to back over the next 15 seconds.
5. Produce orange light in the front-left corner of the room now.
6. Create a 5 knot airflow from the front-right corner of the room.
[0230] The reader will note that these examples do not require knowledge at content-authoring time of the particular set of actuators present in the playback environment or the locations of these actuators. Instead, an MS renderer — such as the MS renderer 001 of Figure 4 — will be configured to control a particular set of actuators of a particular playback environment based on (a) the general instructions found in the object-based sensory data 005 and (b) information in the environment and actuator data particular to that playback environment. In some examples — as shown in Figure 4 — the object-based sensory data 005 may be provided to the MS renderer 001 by the experience player 002, which may include a bitstream decoder configured to extract the object-based sensory data 005 from a bitstream that also includes encoded audio data and/or video data.
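By way of illustration, example 3 above might be expressed as actuator-independent object metadata along the following lines. The field names and zone vocabulary are assumptions for this example, not a schema defined by this disclosure.

```python
# Example 3 ("slowly transition lights at the rear of the room to red over the
# next 10 seconds") expressed without reference to any particular fixture.
light_object = {
    "object_id": "rear_wash_red",
    "modality": "light",
    "zone": "rear",                        # region of the room, not a specific fixture
    "effect": "transition",
    "target_color_hsl": [0.0, 1.0, 0.5],   # red
    "start_time_s": 0.0,
    "duration_s": 10.0,
    "priority": 1,
}
```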
[0231] Linear audio/video media content is conventionally packed into a container format in frames wherein each frame contains the necessary information to render the media during a particular time duration of the content (for example, the 600 ms period from 10 min 3.2 seconds to 10 min 3.8 seconds relative to the start of the content).
[0232] Some stream formats may include a separate elemental stream for each modality, for example one elemental stream containing video information encoded using High Efficiency Video Coding (HEVC), also known as H.265, one elemental stream containing audio information encoded using Advanced Audio Coding (AAC) or Dolby AC4, and a third elemental stream containing closed-captioning (subtitle) information. In some instances there may be multiple elemental streams which may be chosen or combined at rendering time, such as audio tracks featuring different languages, director’s commentary which may be optionally mixed with one or more other audio tracks during playback, closed captions in multiple languages, etc.
[0233] The present disclosure extends and generalizes previously-existing bitstream encoding and decoding methods to include a plurality of elemental streams that convey non-channel-based (such as object-based or spherical harmonic-based) multi-sensory information suitable for demultiplexing, frame reassembly and presentation using a plurality of actuators, in some instances in synchrony with audio and/or video modalities of a media stream.
[0234] Figure 12 shows example elements of a system for the creation and playback of multi- sensory (MS) experiences. As with other figures provided herein, the types and numbers of elements shown in Figure 12 are merely provided by way of example. Other implementations may include more, fewer and/or different types and numbers of elements. According to some examples, system 1200 may be, or may include, one or more devices configured for performing at least some of the methods disclosed herein. In some examples, system 1200 may include one or more instances of the control system 110 of Figure 1 that are configured for performing at least some of the methods disclosed herein. In this example, system 1200 includes instances of some elements that are described with reference to Figure 4.
[0235] In this example, the system 1200 includes the following elements: 1200: A system configured to receive and process an encoded bitstream that includes a plurality of data frames, the data frames including encoded audio data, encoded video data and encoded object-based sensory data;
1201: The encoded bitstream. In some examples, the data frames of the encoded bitstream may be arranged in an International Standards Organization (ISO) base media file format (ISOBMFF) “container” and/or encoded according to a Moving Picture Experts Group (MPEG) standard;
1201A-C: A multiplexed sequence of packets/parts/frames of the encoded bitstream 1201;
1202A-B: Elements of an audio data stream. In some examples, the audio data stream may be encoded according to an Advanced Audio Coding (AAC), Dolby AC3, Dolby EC3, Dolby AC4 or Dolby Atmos codec;
1203A-B: Elements of a video data stream. In some examples, the video data stream may be encoded according to an Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC) or AOMedia Video 1 (AV1) codec;
1204A-B: Elements of an encoded sensory data stream, which in this example is an object-based sensory data stream. In this example, the encoded object-based sensory data includes sensory objects and corresponding sensory object metadata indicating intended sensory effects to be provided via sensory actuators in a playback environment. In some implementations, there may be one object-based sensory data stream — which also may be referred to as an “elemental stream” — for each modality (for example, an object-based lighting data stream, an object-based temperature data stream, an object-based airflow data stream, etc.). In other implementations, these modalities may be combined into a single encoded sensory data stream. In this example, the encoded audio data stream, the encoded video data stream and the encoded object-based sensory data stream all include associated synchronization data, such as time stamps;
002: An instance of the experience player 002 of Figure 4, including:
1206: A bitstream demultiplexer;
1207: An audio stream decoder;
1208: A video stream decoder; and
1209: A multi-sensory data stream decoder;
001: Multi-sensory renderer - uses environment and actuator data 004 and decoded multi-sensory stream to produce control signals for driving a plurality of actuators 008. In this example, the MS renderer 001 includes the functionality of the MS controller APIs 003 described with reference to Figure 4. According to this example, MS renderer 001 can determine how to synchronize playback of the audio, video and sensory data according to the synchronization data in the encoded bitstream 1201;
004: Environment and actuator data - contains information on where actuators are physically located within the environment, and visibility/zone-of-effect information describing how each actuator is perceived by the audience (for example, whether and how each light fixture is visible to the viewer(s));
1213A, 1213B, 1213C ...: A plurality of individual control streams for each actuator 008; 008: A plurality of actuators under the control of the multi-sensory renderer 001;
008A and 008B: Smart lamps;
008C: A smart RGB light-emitting diode (LED) strip; and 008D: Other actuators in the playback environment.
Embodiment 1: One Elemental Stream Per Modality
[0236] In some examples, the multi-sensory data stream may be transported in a plurality of elemental streams (for example, one elemental stream for lighting information, one elemental stream for airflow information, one elemental stream for haptic information, one elemental stream for temperature information, etc.). According to some implementations, multiple versions of one or more multi-sensory data streams may be present in the encoded bitstream 1201 to allow selection based on user preference or user requirements (for example, a default or standard lighting data stream for typical viewers and a separate lighting data stream that contains more subtle lighting information that is intended to be safe for photosensitive viewers).
Embodiment 2: Combined Multi-Sensory Elemental Stream
[0237] In some alternative examples, all the multi-sensory modalities (for example, lighting, airflow, haptics, temperature) may be combined into a unified multi-sensory data stream.
Embodiment 3: Multi-Sensory Information Combined with an Existing Elemental Stream or Arranged in an Existing Container Format
[0238] In another alternative embodiment, the multi-sensory data stream may be embedded within one of the existing elemental streams. For example, some audio stream formats may include a facility to encapsulate stream synchronous metadata within them. In some examples, the multi-sensory data stream may be embedded within an existing audio metadata transport mechanism, for example in a field or sequence of fields reserved for audio metadata. In some alternative examples, an existing audio, video or container format could be modified or adapted to allow the inclusion of synchronous multi-sensory information.
[0239] As noted above, in some examples, the data frames of the encoded bitstream may be arranged in an International Standards Organization (ISO) base media file format (ISOBMFF) file or “container.” According to some implementations, the encoded object-based sensory data may reside in a timed metadata track, for example as defined in Section 12.9 of the International Standards Organization/International Electrotechnical Commission (ISO/IEC) 14496-12:2022 standard, which is hereby incorporated by reference. According to
some such examples, encoded audio data may reside in an audio track and/or encoded video data may reside in a video track of an ISOBMFF file. Such examples have various potential advantages, which include but are not limited to the following:
• The same timed metadata track may be associated with more than one track. In other words, a timed metadata track corresponding to the encoded object-based sensory data may be independent of the content of the associated audio/video tracks;
• It may be easier to append a file with a timed metadata track; and
• The duration of timed metadata samples need not match the duration of associated audio and/or video data.
[0240] Figure 13 is a flow diagram that outlines one example of a method that may be performed by an apparatus or system such as those disclosed herein. The blocks of method 1300, like other methods described herein, are not necessarily performed in the order indicated. In some implementations, one or more of the blocks of method 1300 may be performed concurrently. Moreover, some implementations of method 1300 may include more or fewer blocks than shown and/or described. The blocks of method 1300 may be performed by one or more devices, which may be (or may include) one or more instances of a control system such as the control system 110 that is shown in Figure 1 and described above. For example, at least some aspects of method 1300 may be performed by an instance of the control system 110 that is configured to implement the experience player 002 of Figure 4 or Figure 12.
[0241] In this example, block 1305 involves receiving, by a control system configured to implement a demultiplexing module, an encoded bitstream that includes a plurality of data frames. The demultiplexing module may, for example, be an instance of the demultiplexer 1206 of Figure 12. In this example, the data frames include encoded audio data, encoded video data and encoded object-based sensory data. According to this example, the encoded object-based sensory data includes sensory objects and corresponding sensory object metadata indicating intended sensory effects to be provided in a sensory actuator playback environment. In this example, the encoded audio data stream, the encoded video data stream and the encoded object-based sensory data stream all include associated synchronization data.
[0242] According to this example, block 1310 involves extracting, by the control system and from the encoded bitstream, an encoded audio data stream, an encoded video data stream and
an encoded object-based sensory data stream. Block 1310 may, for example, involve parsing and/or demultiplexing the encoded bitstream 1201 that is described with reference to Figure 12.
[0243] In this example, block 1315 involves providing, by the control system, the encoded audio data stream to an audio decoder. Block 1315 may, for example, involve the demultiplexer 1206 of Figure 12 providing the encoded audio data stream to the audio decoder 1207.
[0244] Here, block 1320 involves providing, by the control system, the encoded video data stream to a video decoder. Block 1320 may, for example, involve the demultiplexer 1206 providing the encoded video data stream to the video decoder 1208.
[0245] According to this example, block 1325 involves providing, by the control system, the encoded object-based sensory data stream to a sensory data decoder. Block 1325 may, for example, involve the demultiplexer 1206 providing the object-based sensory data stream to the multi-sensory data stream decoder 1209.
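By way of illustration, the following skeleton traces blocks 1305 through 1325 of method 1300, routing each extracted elemental stream to its decoder. The frame attributes and decoder interface assumed here are placeholders for this example.

```python
def demultiplex(frames, audio_decoder, video_decoder, sensory_decoder):
    """Skeleton of method 1300: split an encoded bitstream into its elemental
    streams and hand each to the matching decoder.

    `frames` is assumed to be an iterable of objects with `stream_type`,
    `timestamp_s` and `payload` attributes; each decoder is assumed to expose
    a `decode(payload, timestamp_s)` method. Both are illustrative.
    """
    decoders = {
        "audio": audio_decoder,      # block 1315
        "video": video_decoder,      # block 1320
        "sensory": sensory_decoder,  # block 1325
    }
    for frame in frames:             # blocks 1305 and 1310: receive and extract
        decoder = decoders.get(frame.stream_type)
        if decoder is not None:
            decoder.decode(frame.payload, frame.timestamp_s)
```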
[0246] In some examples, the sensory object metadata may include sensory object location information, sensory object size information, or both. According to some examples, each of the sensory objects may have a sensory object effect property indicating a type of effect that the sensory object is providing. In some examples, one or more of the sensory objects may have a persistence property indicating a period of time that the sensory object will persist in the sensory actuator playback environment. According to some examples, one or more of the sensory objects may be assigned to one or more layers in which sensory objects are grouped according to common sensory object characteristics. In some examples, the one or more layers may group sensory objects according to one or more of mood, ambience, information, attention, color, intensity, size, shape, position, region in space, or combinations thereof. According to some examples, at least some of the sensory objects may have a priority property indicating a relative importance of each sensory object.
[0247] According to some examples, the object-based sensory data stream may include light objects, haptic objects, air flow objects, positional actuator objects or combinations thereof. According to some examples wherein the sensory object metadata includes light objects and lighting metadata, the lighting metadata may include direct-light object metadata, indirect light metadata, or combinations thereof. Alternatively, or additionally, the lighting metadata may be organized into one or more layers, which may include one or more ambient layers,
one or more dynamic layers, one or more custom layers, one or more overlay layers, or combinations thereof.
[0248] In some examples, method 1300 may involve decoding the encoded object-based sensory data stream and providing a decoded object-based sensory data stream, including associated sensory synchronization data, to a sensory data renderer. In some such examples, method 1300 may involve receiving, by the sensory data renderer, the decoded object-based sensory data stream and receiving, by the sensory data renderer, playback environment information. The playback environment information may be an instance of the environment and actuator data 004 that is described herein. Accordingly, the playback environment information may include sensory actuator location information and sensory actuator characteristic information regarding one or more controllable sensory actuators in the sensory actuator playback environment.
[0249] According to some such examples, method 1300 may involve determining, by the sensory data renderer and based at least in part on (a) the sensory objects, the sensory object metadata and the associated sensory synchronization data from the decoded object-based sensory data stream and (b) the playback environment information, sensory actuator control commands or sensory actuator control signals for controlling one or more controllable sensory actuators in the sensory actuator playback environment. According to some such examples, method 1300 may involve outputting, by the sensory data renderer, the sensory actuator control commands or sensory actuator control signals for at least one of the controllable sensory actuators in the sensory actuator playback environment. The sensory actuator control commands may, for example, be provided to the MS controller APIs 003 described with reference to Figure 4. The sensory actuator control signals may, for example, be provided to the actuators 008 of a playback environment.
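By way of illustration, the following sketch shows one way the sensory data renderer could use the associated synchronization data, emitting actuator commands only once the shared playback clock reaches each command's time stamp. The queue structure and scheduling strategy are assumptions for this example.

```python
import heapq

def schedule_actuator_commands(decoded_objects, now_s, emit):
    """Emit sensory actuator commands whose synchronization timestamps have
    been reached, keeping the rest queued.

    `decoded_objects` is a heap of (timestamp_s, command) pairs built from a
    decoded object-based sensory data stream; `now_s` is the shared playback
    clock also used for audio and video. This scheduling strategy is
    illustrative only.
    """
    while decoded_objects and decoded_objects[0][0] <= now_s:
        _, command = heapq.heappop(decoded_objects)
        emit(command)

queue = [(0.0, {"actuator": "lamp1", "level": 0.8}),
         (1.5, {"actuator": "fan1", "airflow": 0.3})]
heapq.heapify(queue)
schedule_actuator_commands(queue, now_s=1.0, emit=print)  # emits only the lamp command
```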
[0250] In some examples, each data frame of the plurality of data frames may include an encoded audio data subframe, an encoded video data subframe and an encoded object-based sensory data subframe. According to some examples, the data frames of the bitstream may be encoded according to a Moving Picture Experts Group (MPEG) standard. In some examples, the data frames of the bitstream may be arranged in the International Standards Organization (ISO) base media file format (ISOBMFF). According to some examples, the encoded object-based sensory data may reside in a timed metadata track.
[0251] According to some examples, the audio data stream may be encoded according to an Advanced Audio Coding (AAC), Dolby AC3, Dolby EC3, Dolby AC4 or Dolby Atmos codec. In some examples, the video data stream may be encoded according to an Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC) or AOMedia Video 1 (AV1) codec.
[0252] Various features and aspects will be appreciated from the following enumerated example embodiments (“EEEs”):
EEE 1. A method for rendering an intended lighting environment, comprising: receiving, by a control system configured to implement a lighting environment renderer, object-based light data indicating the intended lighting environment, the object-based light data including light objects and lighting metadata; receiving, by the control system, lighting information regarding a local lighting environment, wherein the lighting information includes one or more characteristics of one or more controllable light sources in the local lighting environment; determining, by the control system, a drive level for each of the one or more controllable light sources that approximates the intended lighting environment; and outputting, by the control system, the drive level to at least one of the controllable light sources.
EEE 2. The method of EEE 1, wherein: the object-based lighting metadata includes time information; and the determining involves determining one or more drive levels for one or more time intervals corresponding to the time information.
EEE 3. The method of EEE 1 or EEE 2, further comprising receiving viewing position information, wherein the determining involves determining one or more drive levels corresponding to the viewing position information.
EEE 4. The method of any one of EEEs 1-3, wherein: the lighting information includes one or more characteristics of one or more base light sources in the local lighting environment, the base light sources not being controllable by the control system; and the determining involves determining one or more drive levels based, at least in part, on the one or more characteristics of one or more non-controllable light sources.
EEE 5. The method of any one of EEEs 1-4, wherein the object-based lighting metadata indicates at least lighting object position information and lighting object color information.
EEE 6. The method of any one of EEEs 1-5, wherein the intended lighting environment is transmitted as one or more Image Based Lighting (IBL) objects.
EEE 7. The method of EEE 6, wherein the determining is based, at least in part, on an IBL map of environmental lighting produced by each of n controllable light sources (IBLn) in the local lighting environment at maximum intensity.
EEE 8. The method of EEE 7, wherein the determining is based, at least in part, on a base IBL map of the base environment lighting (IBLbase) that is not controllable by the control system.
EEE 9. The method of EEE 8, wherein the determining is based, at least in part, on a linear scaling (Scalen) for each dynamic lighting element IBLn such that a sum of light from each scaled IBLn plus the base lighting IBLbase most closely matches the intended lighting environment IBLref.
EEE 10. The method of EEE 9, wherein the determining is based, at least in part, on minimizing a difference between IBLref and the sum of light from each scaled IBLn plus the base lighting IBLbase.
EEE 11. An apparatus configured to perform the method of any one of EEEs 1-10.
EEE 12. A system configured to perform the method of any one of EEEs 1-10.
EEE 13. One or more non-transitory computer-readable media having instructions stored thereon for controlling one or more devices to perform the method of any one of EEEs 1-10.
EEE 14. A method for providing an intended sensory experience, the method comprising: receiving, by a control system configured to implement a sensory renderer, one or more sensory objects and corresponding sensory object metadata indicating intended sensory effects to be provided in a sensory actuator playback environment; receiving, by the control system, playback environment information, wherein the playback environment information includes sensory actuator location information and
sensory actuator characteristic information regarding one or more controllable sensory actuators in the sensory actuator playback environment; determining, by the control system and based on the playback environment information, the sensory objects and the sensory object metadata, sensory actuator control commands or sensory actuator control signals for controlling one or more controllable sensory actuators in the sensory actuator playback environment; and outputting, by the control system, the sensory actuator control commands or sensory actuator control signals for at least one of the controllable sensory actuators in the sensory actuator playback environment.
EEE 15. The method of EEE 14, wherein the one or more sensory objects include one or more light objects, one or more haptic objects, one or more air flow objects, one or more positional actuator objects or combinations thereof.
EEE 16. The method of EEE 14 or EEE 15, wherein the sensory object metadata includes light objects and lighting metadata, and wherein the lighting metadata includes direct-light object metadata, indirect light metadata, or combinations thereof.
EEE 17. The method of EEE 16, wherein the lighting metadata is organized into one or more layers including one or more ambient layers, one or more dynamic layers, one or more custom layers, one or more overlay layers, or combinations thereof.
EEE 18. The method of any one of EEEs 14-17, wherein the sensory object metadata includes sensory object location information.
EEE 19. The method of any one of EEEs 14-18, wherein the sensory object metadata includes sensory object size information.
EEE 20. The method of any one of EEEs 14-19, wherein each of the sensory objects has a sensory object effect property indicating a type of effect that the sensory object is providing.
EEE 21. The method of any one of EEEs 14-20, wherein one or more of the sensory objects has a persistence property indicating a period of time that the sensory object will persist in the sensory actuator playback environment.
EEE 22. The method of any one of EEEs 14-21, wherein one or more of the sensory objects are assigned to one or more layers in which sensory objects are grouped according to shared sensory object characteristics.
EEE 23. The method of EEE 22, wherein the one or more layers group sensory objects according to mood, ambience, information, attention, color, intensity, size, shape, position, region in space, or combinations thereof.
EEE 24. The method of any one of EEEs 14-23, wherein at least some of the sensory objects have a priority property indicating a relative importance of each sensory object.
EEE 25. The method of any one of EEEs 14-24, wherein one or more of the sensory objects has a spatial panning property indicating how a sensory object can move within the sensory actuator playback environment, how a sensory object will affect the controllable sensory actuators in the sensory actuator playback environment, or combinations thereof.
EEE 26. The method of any one of EEEs 14-25, wherein at least some of the sensory objects have a mixing mode property indicating how multiple sensory objects can be reproduced by a single controllable sensory actuator.
EEE 27. The method of any one of EEEs 14-26, further comprising receiving, by the control system, overarching sensory object metadata comprising trim control information, mastering environment information, or a combination thereof.
EEE 28. An apparatus configured to perform the method of any one of EEEs 14-27.
EEE 29. A system configured to perform the method of any one of EEEs 14-27.
EEE 30. One or more non-transitory computer-readable media having instructions stored thereon for controlling one or more devices to perform the method of any one of EEEs 14-27.
EEE 31. A method for decoding a bitstream, the method comprising: receiving, by a control system configured to implement a demultiplexing module, an encoded bitstream that includes a plurality of data frames, the data frames including encoded audio data, encoded video data and encoded object-based sensory data, the encoded object-based sensory data including sensory objects and corresponding sensory object metadata indicating intended sensory effects to be provided in a sensory actuator playback
environment, the encoded audio data stream, the encoded video data stream and the encoded object-based sensory data stream all including associated synchronization data; extracting, by the control system and from the encoded bitstream, an encoded audio data stream, an encoded video data stream and an encoded object-based sensory data stream; providing, by the control system, the encoded audio data stream to an audio decoder; providing, by the control system, the encoded video data stream to a video decoder; and providing, by the control system, the encoded object-based sensory data stream to a sensory data decoder.
EEE 32. The method of EEE 31, further comprising decoding the encoded object-based sensory data stream and providing a decoded object-based sensory data stream, including associated sensory synchronization data, to a sensory data renderer.
EEE 33. The method of EEE 32, further comprising: receiving, by the sensory data renderer, the decoded object-based sensory data stream; receiving, by the sensory data renderer, playback environment information, wherein the playback environment information includes sensory actuator location information and sensory actuator characteristic information regarding one or more controllable sensory actuators in the sensory actuator playback environment; determining, by the sensory data renderer and based at least in part on (a) the sensory objects, the sensory object metadata and the associated sensory synchronization data from the decoded object-based sensory data stream and (b) the playback environment information, sensory actuator control commands or sensory actuator control signals for controlling one or more controllable sensory actuators in the sensory actuator playback environment; and outputting, by the sensory data renderer, the sensory actuator control commands or sensory actuator control signals for at least one of the controllable sensory actuators in the sensory actuator playback environment.
EEE 34. The method of any one of EEEs 31-33, wherein each data frame of the plurality of data frames includes an encoded audio data subframe, an encoded video data subframe and an encoded object-based sensory data subframe.
EEE 35. The method of any one of EEEs 31-34, wherein the data frames of the bitstream are encoded according to a Moving Picture Experts Group (MPEG) standard.
EEE 36. The method of any one of EEEs 31-35, wherein the data frames of the bitstream are arranged in an International Standards Organization (ISO) base media file format (ISOBMFF).
EEE 37. The method of EEE 35 or EEE 36, wherein the encoded object-based sensory data resides in a timed metadata track.
EEE 38. The method of any one of EEEs 31-37, wherein the sensory objects include one or more lighting objects, one or more haptic objects, one or more air flow objects, one or more positional actuator objects or combinations thereof.
EEE 39. The method of any one of EEEs 31-38, wherein the sensory object metadata includes lighting metadata and wherein the lighting metadata includes direct-light object metadata, indirect light metadata, or combinations thereof.
EEE 40. The method of EEE 39, wherein the lighting metadata is organized into one or more layers including one or more ambient layers, one or more dynamic layers, one or more custom layers, one or more overlay layers, or combinations thereof.
EEE 41. The method of any one of EEEs 31-40, wherein the sensory object metadata includes sensory object location information, sensory object size information, or both.
EEE 42. The method of any one of EEEs 31-41, wherein each of the sensory objects has a sensory object effect property indicating a type of effect that the sensory object is providing.
EEE 43. The method of any one of EEEs 31-42, wherein one or more of the sensory objects has a persistence property indicating a period of time that the sensory object will persist in the sensory actuator playback environment.
EEE 44. The method of any one of EEEs 31-43, wherein one or more of the sensory objects are assigned to one or more layers in which sensory objects are grouped according to common sensory object characteristics.
EEE 45. The method of EEE 44, wherein the one or more layers group sensory objects according to one or more of mood, ambience, information, attention, color, intensity, size, shape, position, region in space, or combinations thereof.
EEE 46. The method of any one of EEEs 31-45, wherein at least some of the sensory objects have a priority property indicating a relative importance of each sensory object.
EEE 47. The method of any one of EEEs 31-46, wherein the audio data stream is encoded according to an Advanced Audio Coding (AAC), Dolby AC3, Dolby EC3, Dolby AC4 or Dolby Atmos codec.
EEE 48. The method of any one of EEEs 31-46, wherein the video data stream is encoded according to an Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC) or AOMedia Video 1 (AV1) codec.
EEE 49. An apparatus configured to perform the method of any one of EEEs 31-48.
EEE 50. A system configured to perform the method of any one of EEEs 31-48.
EEE 51. One or more non-transitory computer-readable media having instructions stored thereon for controlling one or more devices to perform the method of any one of EEEs 31-48.
[0253] The above description illustrates various embodiments of the present disclosure along with examples of how aspects of the present disclosure may be implemented. The above examples and embodiments should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of the present disclosure as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents will be evident to those skilled in the art and may be employed without departing from the spirit and scope of the disclosure as defined by the claims.
Claims
1. A method, comprising: receiving, by a control system, a content bitstream including encoded object-based sensory data, the encoded object-based sensory data including one or more sensory objects and corresponding sensory metadata, the encoded object-based sensory data corresponding to sensory effects including lighting, haptics, airflow, one or more positional actuators, or combinations thereof, the sensory effects to be provided by one or more sensory actuators in an environment; extracting, by the control system, the object-based sensory data from the content bitstream; and providing, by the control system, the object-based sensory data to a sensory renderer.
2. The method of claim 1, wherein the object-based sensory metadata includes sensory spatial metadata indicating at least a spatial position for rendering the object-based sensory data within the environment, an area for rendering the object-based sensory data within the environment, or combinations thereof.
3. The method of claim 1 or claim 2, wherein the object-based sensory data does not correspond to particular sensory actuators in the environment.
4. The method of any one of claims 1-3, wherein the object-based sensory data includes abstracted sensory reproduction information allowing the sensory renderer to reproduce one or more authored sensory effects via one or more sensory actuator types, via various numbers of sensory actuators and from various sensory actuator positions in the environment.
5. The method of any one of claims 1-4, further comprising: receiving, by the sensory renderer, the object-based sensory data; receiving, by the sensory renderer, environment descriptor data corresponding to one or more locations of sensory actuators in the environment; receiving, by the sensory renderer, actuator descriptor data corresponding to properties of the sensory actuators in the environment; and
providing, by the sensory renderer, one or more actuator control signals for controlling the sensory actuators in the environment to produce one or more sensory effects indicated by the object-based sensory data.
6. The method of claim 5, further comprising providing, by the one or more sensory actuators in the environment, the one or more sensory effects.
7. The method of any one of claims 1-6, wherein the content bitstream also includes one or more encoded audio objects synchronized with the encoded object-based sensory data, the audio objects including one or more audio signals and corresponding audio object metadata, further comprising: extracting, by the control system, audio objects from the content bitstream; and providing, by the control system, the audio objects to an audio renderer.
8. The method of claim 7, wherein the audio object metadata includes at least audio object spatial metadata indicating an audio object spatial position for rendering the one or more audio signals within the environment.
9. The method of claim 7 or claim 8, further comprising: receiving, by the audio renderer, the one or more audio objects; receiving, by the audio renderer, loudspeaker data corresponding to one or more loudspeakers in the environment; and providing, by the audio renderer, one or more loudspeaker control signals for controlling the one or more loudspeakers in the environment to play back audio corresponding to the one or more audio objects and synchronized with the one or more sensory effects.
10. The method of claim 9, further comprising playing back, by the one or more loudspeakers in the environment, the audio corresponding to the one or more audio objects.
11. The method of claim 9 or claim 10, wherein the content bitstream includes encoded video data synchronized with the encoded audio objects and the encoded object-based sensory data and wherein the method further comprises: extracting, by the control system, video data from the content bitstream; providing, by the control system, the video data to a video renderer; receiving, by the video renderer, the video data; and
providing, by the video renderer, one or more video control signals for controlling one or more display devices in the environment to present one or more images corresponding to the one or more video control signals and synchronized with the one or more audio objects and the one or more sensory effects.
12. The method of claim 11, further comprising presenting, by the one or more display devices in the environment, the one or more images corresponding to the one or more video control signals.
13. The method of any one of claims 1-12, wherein the environment is a virtual environment.
14. The method of any one of claims 1-11, wherein the environment is a physical, real-world environment.
15. The method of claim 14, wherein the environment is a room environment or a vehicle environment.
16. An apparatus configured to perform the method of any one of claims 1-15.
17. A system configured to perform the method of any one of claims 1-15.
18. One or more non-transitory computer-readable media having instructions stored thereon for controlling one or more devices to perform the method of any one of claims 1-15.
Applications Claiming Priority (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202363514107P | 2023-07-17 | 2023-07-17 | |
US202363514094P | 2023-07-17 | 2023-07-17 | |
US202363514096P | 2023-07-17 | 2023-07-17 | |
US63/514,096 | 2023-07-17 | ||
US63/514,107 | 2023-07-17 | ||
US63/514,094 | 2023-07-17 | ||
US202463669232P | 2024-07-10 | 2024-07-10 | |
US63/669,232 | 2024-07-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2025019438A1 (en) | 2025-01-23 |
Family
ID=92141873
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2024/038069 WO2025019438A1 (en) | Providing object-based multi-sensory experiences | 2023-07-17 | 2024-07-15 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2025019438A1 (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110125789A1 (en) * | 2008-07-16 | 2011-05-26 | Sanghyun Joo | Method and apparatus for representing sensory effects and computer readable recording medium storing sensory device command metadata |
US20160192105A1 (en) * | 2013-07-31 | 2016-06-30 | Dolby International Ab | Processing Spatially Diffuse or Large Audio Objects |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24749120; Country of ref document: EP; Kind code of ref document: A1 |
| DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | |