EP4104457A1 - Delayed audio following - Google Patents

Delayed audio following

Info

Publication number
EP4104457A1
Authority
EP
European Patent Office
Prior art keywords
user
origin
determining
audio signal
head
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP21754163.0A
Other languages
German (de)
English (en)
Other versions
EP4104457A4 (fr)
Inventor
Anastasia Andreyevna Tajik
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Magic Leap Inc
Original Assignee
Magic Leap Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Magic Leap Inc filed Critical Magic Leap Inc
Publication of EP4104457A1 publication Critical patent/EP4104457A1/fr
Publication of EP4104457A4 publication Critical patent/EP4104457A4/fr
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S7/00Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30Control circuits for electronic adaptation of the sound field
    • H04S7/302Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303Tracking of listener position or orientation
    • H04S7/304For headphones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2400/00Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2400/00Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11Positioning of individual sound objects, e.g. moving airplane, within a sound field

Definitions

  • This disclosure relates in general to systems and methods for presenting audio to a user, and in particular to systems and methods for presenting audio to a user in a mixed reality environment.
  • Virtual environments are ubiquitous in computing environments, finding use in video games (in which a virtual environment may represent a game world); maps (in which a virtual environment may represent terrain to be navigated); simulations (in which a virtual environment may simulate a real environment); digital storytelling (in which virtual characters may interact with each other in a virtual environment); and many other applications.
  • Modern computer users are generally comfortable perceiving, and interacting with, virtual environments.
  • users' experiences with virtual environments can be limited by the technology for presenting virtual environments. For example, conventional displays (e.g., 2D display screens) and audio systems (e.g., fixed speakers) may be unable to realize a virtual environment in ways that create a compelling, realistic, and immersive experience.
  • a user of an XR system in a large concert hall will expect the virtual sounds of the XR system to have large, cavernous sonic qualities; conversely, a user in a small apartment will expect the sounds to be more dampened, close, and immediate.
  • realism is further enhanced by spatializing virtual sounds. For example, a virtual object may visually fly past a user from behind, and the user may expect the corresponding virtual sound to similarly reflect the spatial movement of the virtual object with respect to the user.
  • an immersive audio experience can be equally as important, if not more important, than an immersive visual experience.
  • FIG. 4 illustrates an example functional block diagram for an example mixed reality system, according to some embodiments.
  • a virtual object in the virtual environment may generate a sound originating from a location coordinate of the object (e.g., a virtual character may speak or cause a sound effect); or the virtual environment may be associated with musical cues or ambient sounds that may or may not be associated with a particular location.
  • a processor can determine an audio signal corresponding to a “listener” coordinate — for instance, an audio signal corresponding to a composite of sounds in the virtual environment, mixed and processed to simulate an audio signal that would be heard by a listener at the listener coordinate — and present the audio signal to a user via one or more speakers.
  • because a virtual environment exists only as a computational structure, a user cannot directly perceive a virtual environment using their ordinary senses. Instead, a user can perceive a virtual environment only indirectly, as presented to the user, for example by a display, speakers, haptic output devices, etc.
  • a user cannot directly touch, manipulate, or otherwise interact with a virtual environment; but can provide input data, via input devices or sensors, to a processor that can use the device or sensor data to update the virtual environment.
  • a camera sensor can provide optical data indicating that a user is trying to move an object in a virtual environment, and a processor can use that data to cause the object to respond accordingly in the virtual environment.
  • a corresponding virtual object may include a cylinder of roughly the same height and radius as the real lamp post (reflecting that lamp posts may be roughly cylindrical in shape). Simplifying virtual objects in this manner can allow computational efficiencies, and can simplify calculations to be performed on such virtual objects. Further, in some examples of a MRE, not all real objects in a real environment may be associated with a corresponding virtual object; likewise, in some examples of a MRE, not all virtual objects in a virtual environment may be associated with a corresponding real object. That is, some virtual objects may exist solely in a virtual environment of a MRE, without any real-world counterpart.
  • virtual objects may have characteristics that differ, sometimes drastically, from those of corresponding real objects.
  • a real environment in a MRE may include a green, two-armed cactus — a prickly inanimate object
  • a corresponding virtual object in the MRE may have the characteristics of a green, two-armed virtual character with human facial features and a surly demeanor.
  • the virtual object resembles its corresponding real object in certain characteristics (color, number of arms); but differs from the real object in other characteristics (facial features, personality).
  • virtual objects have the potential to represent real objects in a creative, abstract, exaggerated, or fanciful manner; or to impart behaviors (e.g., human personalities) to otherwise inanimate real objects.
  • virtual objects may be purely fanciful creations with no real-world counterpart (e.g., a virtual monster in a virtual environment, perhaps at a location corresponding to an empty space in a real environment).
  • an environment/world coordinate system 108 (comprising an x-axis 108X, a y-axis 108Y, and a z-axis 108Z) with its origin at point 106 (a world coordinate), can define a coordinate space for real environment 100.
  • the origin point 106 of the environment/world coordinate system 108 may correspond to where the mixed reality system 112 was powered on.
  • the origin point 106 of the environment/world coordinate system 108 may be reset during operation.
  • user 110 may be considered a real object in real environment 100; similarly, user 110's body parts (e.g., hands, feet) may be considered real objects in real environment 100.
  • a matrix (which may include a translation matrix and a Quaternion matrix or other rotation matrix), or other suitable representation can characterize a transformation between the user/listener/head coordinate system 114 space and the environment/world coordinate system 108 space.
  • a left ear coordinate 116 and a right ear coordinate 117 may be defined relative to the origin point 115 of the user/listener/head coordinate system 114.
  • a matrix (which may include a translation matrix and a Quaternion matrix or other rotation matrix), or other suitable representation can characterize a transformation between the left ear coordinate 116 and the right ear coordinate 117, and user/listener/head coordinate system 114 space.
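  • As an illustration of the transforms described above, the following is a minimal sketch (not part of the patent; all function names, conventions, and values are illustrative assumptions) that builds a 4x4 homogeneous transform from a translation and a quaternion and uses it to express an ear coordinate, defined relative to the head origin 115, in the environment/world coordinate system 108 space.

    import numpy as np

    def quaternion_to_matrix(q):
        """Convert a unit quaternion (w, x, y, z) into a 3x3 rotation matrix."""
        w, x, y, z = q
        return np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])

    def head_to_world(translation, quaternion):
        """Build a 4x4 homogeneous transform from head space into world space."""
        m = np.eye(4)
        m[:3, :3] = quaternion_to_matrix(quaternion)
        m[:3, 3] = translation
        return m

    # Example: a left-ear coordinate defined roughly 9 cm to the left of the head
    # origin, expressed in world coordinates given an assumed head pose (45 degree yaw).
    head_pose = head_to_world(translation=[1.0, 1.6, -2.0],
                              quaternion=[0.9239, 0.0, 0.3827, 0.0])
    left_ear_head = np.array([-0.09, 0.0, 0.0, 1.0])
    left_ear_world = head_pose @ left_ear_head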
  • FIG. 1B illustrates an example virtual environment 130 that corresponds to real environment 100.
  • the virtual environment 130 shown includes a virtual rectangular room 104B corresponding to real rectangular room 104A; a virtual object 122B corresponding to real object 122A; a virtual object 124B corresponding to real object 124A; and a virtual object 126B corresponding to real object 126A.
  • Metadata associated with the virtual objects 122B, 124B, 126B can include information derived from the corresponding real objects 122A, 124A, 126A.
  • Virtual environment 130 additionally includes a virtual monster 132, which does not correspond to any real object in real environment 100. Real object 128A in real environment 100 does not correspond to any virtual object in virtual environment 130.
  • each of the virtual objects 122B, 124B, 126B, and 132 may have their own persistent coordinate point relative to the origin point 134 of the persistent coordinate system 133.
  • environment/world coordinate system 108 defines a shared coordinate space for both real environment 100 and virtual environment 130.
  • the coordinate space has its origin at point 106.
  • the coordinate space is defined by the same three orthogonal axes (108X, 108Y, 108Z). Accordingly, a first location in real environment 100, and a second, corresponding location in virtual environment 130, can be described with respect to the same coordinate space. This simplifies identifying and displaying corresponding locations in real and virtual environments, because the same coordinates can be used to identify both locations. However, in some examples, corresponding real and virtual environments need not use a shared coordinate space.
  • Example mixed reality system 112 can include a wearable head device (e.g., a wearable augmented reality or mixed reality head device) comprising a display (which may include left and right transmissive displays, which may be near-eye displays, and associated components for coupling light from the displays to the user's eyes); left and right speakers (e.g., positioned adjacent to the user's left and right ears, respectively); an inertial measurement unit (IMU) (e.g., mounted to a temple arm of the head device); an orthogonal coil electromagnetic receiver (e.g., mounted to the left temple piece); left and right cameras (e.g., depth (time-of-flight) cameras) oriented away from the user; and left and right eye cameras oriented toward the user (e.g., for detecting the user's eye movements).
  • a wearable head device (e.g., a wearable augmented reality or mixed reality head device)
  • a display, which may include left and right transmissive displays, which may be near-eye displays
  • FIGs. 2A-2D illustrate components of an example mixed reality system 200
  • the left eyepiece 2108 can include a left incoupling grating set 2112, a left orthogonal pupil expansion (OPE) grating set 2120, and a left exit (output) pupil expansion (EPE) grating set 2122.
  • the right eyepiece 2110 can include a right incoupling grating set 2118, a right OPE grating set 2114, and a right EPE grating set 2116. Imagewise modulated light can be transferred to a user's eye via the incoupling gratings 2112 and 2118, OPEs 2114 and 2120, and EPEs 2116 and 2122.
  • the eyepieces 2108 and 2110 can include other arrangements of gratings and/or refractive and reflective features for controlling the coupling of imagewise modulated light to the user's eyes.
  • the OPE grating sets 2114, 2120 incrementally deflect light propagating by TIR down toward the EPE grating sets 2116, 2122.
  • the EPE grating sets 2116, 2122 incrementally couple light toward the user's face, including the pupils of the user's eyes.
  • the 6DOF totem subsystem 404A and the 6DOF subsystem 404B cooperate to determine six coordinates (e.g., offsets in three translation directions and rotation along three axes) of the handheld controller 400B relative to the wearable head device 400A.
  • the six degrees of freedom may be expressed relative to a coordinate system of the wearable head device 400A
  • the three translation offsets may be expressed as X, Y, and Z offsets in such a coordinate system, as a translation matrix, or as some other representation.
  • the rotation degrees of freedom may be expressed as a sequence of yaw, pitch, and roll rotations, as a rotation matrix, as a quaternion, or as some other representation.
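  • The following is a small illustrative sketch (conventions and names are assumptions, not taken from the patent) showing the equivalent rotation representations mentioned above: a yaw/pitch/roll sequence, a rotation matrix, and a quaternion, alongside the three translation offsets.

    import numpy as np

    def ypr_to_matrix(yaw, pitch, roll):
        """Compose a rotation matrix from a yaw (Z), pitch (Y), roll (X) sequence."""
        cy, sy = np.cos(yaw), np.sin(yaw)
        cp, sp = np.cos(pitch), np.sin(pitch)
        cr, sr = np.cos(roll), np.sin(roll)
        rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
        ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
        rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
        return rz @ ry @ rx

    def matrix_to_quaternion(m):
        """Convert a rotation matrix to a unit quaternion (w, x, y, z); assumes w != 0."""
        w = np.sqrt(max(0.0, 1.0 + m[0, 0] + m[1, 1] + m[2, 2])) / 2.0
        x = (m[2, 1] - m[1, 2]) / (4.0 * w)
        y = (m[0, 2] - m[2, 0]) / (4.0 * w)
        z = (m[1, 0] - m[0, 1]) / (4.0 * w)
        return np.array([w, x, y, z])

    # A 6DOF pose could then be carried as three translation offsets plus either rotation form.
    rotation = ypr_to_matrix(yaw=0.3, pitch=0.1, roll=0.0)
    quaternion = matrix_to_quaternion(rotation)
    translation = np.array([0.2, -0.4, 0.5])   # X, Y, Z offsets (illustrative values)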
  • Some audio systems may suffer limitations in their ability to provide immersive spatialized audio.
  • some headphone systems may present sound in a stereo field by separately presenting left and right audio channels to a user's left and right ears; but without knowledge of the location (e.g., position and/or orientation) of the user's head, the sound may be heard to be statically fixed in relation to the user's head.
  • a sound presented to a user's left ear through a left channel may continue to be presented to the user's left ear regardless of whether the user turns their head, moves forward, backward, side to side, etc.
  • This static behavior may be undesirable for MR systems because it may be inconsistent with a user's expectations for how sounds dynamically behave in a real environment.
  • a listener will expect sounds emitted by that source, and heard by the listener's left and right ears, to become louder or softer, or to exhibit other dynamic audio characteristics (e.g., Doppler effects), in accordance with how the user moves and rotates with respect to that sound source's position. For example, if a static sound source is initially located on a user's left side, the sounds emitted by that sound source may predominate in the user's left ear as compared to the user's right ear.
  • other dynamic audio characteristics (e.g., Doppler effects)
  • the user will expect the sounds to predominate in the user's right ear.
  • the sound source may continually appear to be changing location relative to the user (e.g., minute positional changes may result in minute, but perceptible, changes in detected volume at each ear).
  • users can take advantage of realistic audio cues to identify and place a sound source within the environment.
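  • As a rough illustration of this behavior, the sketch below (a crude constant-power panning model rather than an HRTF, with assumed conventions: x to the user's right, y up, yaw about y; all names are illustrative) shows how per-ear gains for a static source change as the user turns their head.

    import numpy as np

    def per_ear_gains(source_world, head_position, head_yaw):
        """Return (left_gain, right_gain) for a static source given the head pose."""
        d = np.asarray(source_world, float) - np.asarray(head_position, float)
        # Lateral component of the source direction in head space (yaw only, for brevity).
        x_head = np.cos(head_yaw) * d[0] - np.sin(head_yaw) * d[2]
        lateral = x_head / (np.linalg.norm(d) + 1e-9)   # -1 full left .. +1 full right
        theta = (lateral + 1.0) * np.pi / 4.0            # constant-power pan
        return np.cos(theta), np.sin(theta)

    # A source on the user's left predominates in the left ear...
    print(per_ear_gains([-2.0, 0.0, 0.0], [0.0, 0.0, 0.0], head_yaw=0.0))
    # ...and after the user turns around, the same source predominates in the right ear.
    print(per_ear_gains([-2.0, 0.0, 0.0], [0.0, 0.0, 0.0], head_yaw=np.pi))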
  • a user (e.g., user 502)
  • a sound source (e.g., virtual object 504b)
  • the overpowering sound of a virtual cello may drown out sounds from virtual violins.
  • a sound source origin can be configured to always remain at least a minimum distance from the user; for instance, if the magnitude of an offset between the sound source origin and the user's head falls below a minimum threshold, the origin can be relocated to an alternate position that is at least a minimum distance from the user's head.
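  • A minimal sketch of that minimum-distance rule, under assumed names and an assumed 0.5 m threshold, might look like the following.

    import numpy as np

    MIN_DISTANCE = 0.5   # metres; an assumed value for illustration

    def enforce_min_distance(origin, head_position, min_distance=MIN_DISTANCE):
        """Relocate the sound source origin if it comes closer than min_distance to the head."""
        origin = np.asarray(origin, float)
        head_position = np.asarray(head_position, float)
        offset = origin - head_position
        magnitude = np.linalg.norm(offset)
        if magnitude >= min_distance:
            return origin
        if magnitude < 1e-9:
            # Degenerate case: the origin coincides with the head; pick an arbitrary direction.
            offset, magnitude = np.array([0.0, 0.0, -1.0]), 1.0
        # Push the origin back out along the same direction to the minimum distance.
        return head_position + offset * (min_distance / magnitude)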
  • one or more microphones, for example around a user's ear (e.g., one or more microphones of an MR system), can be used to determine one or more user-specific HRTFs.
  • a distance between a user and a virtual sound source may be simulated using suitable methods (e.g., loudness attenuation, high frequency attenuation, a mix of direct and reverberant sounds, motion parallax, etc.).
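  • Two of those distance cues can be sketched as follows: an inverse-distance loudness law and a crude one-pole low-pass whose cutoff falls with distance. The constants and names are assumptions chosen only for illustration.

    import numpy as np

    def distance_gain(distance, reference=1.0):
        """Inverse-distance loudness attenuation, unity gain inside the reference distance."""
        return reference / max(distance, reference)

    def high_frequency_rolloff(signal, distance, sample_rate=48000):
        """Crude air-absorption model: a one-pole low-pass whose cutoff drops with distance."""
        signal = np.asarray(signal, float)
        cutoff = 20000.0 / (1.0 + 0.5 * distance)             # Hz, falls as the source recedes
        alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff / sample_rate)
        out = np.empty_like(signal)
        y = 0.0
        for i, x in enumerate(signal):
            y += alpha * (x - y)
            out[i] = y
        return out

    def apply_distance_cues(signal, distance):
        return distance_gain(distance) * high_frequency_rolloff(signal, distance)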
  • virtual objects 604a and/or 604b may be configured to radiate sound as a point source.
  • virtual objects 604a and/or 604b may include a physical three-dimensional model of a sound source, and a sound may be generated by modelling interactions with the sound source.
  • virtual object 604a may include a virtual guitar including a wood body, strings, tuning pegs, etc.
  • a sound may be generated by modelling plucking one or more strings and how the action interacts with other components of the virtual guitar.
  • virtual objects 604a and/or 604b may be tied to one or more objects (e.g., center 602 and/or vector 606).
  • virtual object 604a may be assigned to designated position 608a.
  • designated position 608a can be a fixed point relative to vector 606 and/or center 602.
  • virtual object 604b may be assigned to designated position 608b.
  • designated position 608b can be a fixed point relative to vector 606 and/or center 602.
  • Center 602 can be a point and/or a three-dimensional object.
  • the spatializing and/or rendering engine may process the inputs and produce an output that may include a spatialized sound configured so that the user perceives the sound as originating from the location of virtual object 604a.
  • the spatializing and/or rendering engine may use any suitable techniques to render spatialized sound, including but not limited to head-related transfer functions and/or distance attenuation techniques.
  • a spatializing and/or rendering engine may receive a data structure to render delayed follow spatialized sound.
  • a delayed follow data structure may include a data format with parameters and/or metadata regarding position relative to headpose and/or delayed follow parameters.
  • an application running on an MR system may send one or more delayed follow data structures to a spatializing and/or rendering engine to render delayed follow spatialized sound.
  • each virtual object and/or sound source may have its own, separate parameters.
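  • One hypothetical shape for such a delayed follow data structure is sketched below; the field names, defaults, and per-source parameters are assumptions, not taken from the patent.

    from dataclasses import dataclass, field
    from typing import Tuple

    @dataclass
    class DelayedFollowParams:
        follow_delay_s: float = 0.5           # how long the origin lags behind head motion
        follow_speed_mps: float = 1.0         # how quickly the origin catches up
        rotation_threshold_deg: float = 15.0  # head rotation that triggers a follow
        min_distance_m: float = 0.5           # the origin never comes closer than this

    @dataclass
    class DelayedFollowSource:
        source_id: str
        offset_from_head: Tuple[float, float, float]   # designated position relative to headpose
        params: DelayedFollowParams = field(default_factory=DelayedFollowParams)
        metadata: dict = field(default_factory=dict)

    # An application might register one structure per virtual object / sound source:
    cello = DelayedFollowSource("cello", offset_from_head=(-0.8, 0.0, -2.0))
    violin = DelayedFollowSource("violin", offset_from_head=(0.8, 0.0, -2.0),
                                 params=DelayedFollowParams(follow_delay_s=0.2))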
  • although a center point/object and a vector are used here to position virtual objects, any appropriate coordinate system (e.g., Cartesian, spherical, etc.) may be used.
  • determining the origin of the audio signal further comprises: in accordance with a determination that the rate of change exceeds a threshold, determining that the origin comprises a first origin; and in accordance with a determination that the rate of change does not exceed the threshold, determining that the origin comprises a second origin different from the first origin. In some examples, determining the origin of the audio signal further comprises: in accordance with a determination that a magnitude of the offset is below a threshold, determining that the origin comprises a first origin; and in accordance with a determination that the magnitude of the offset is not below the threshold, determining that the origin comprises a second origin different from the first origin.
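  • In code, those two threshold rules might look like the following minimal sketch (function names and threshold values are illustrative assumptions).

    import numpy as np

    def origin_by_rate(rate_of_change, first_origin, second_origin, threshold=1.0):
        """Use the first origin if the rate of change exceeds the threshold, else the second."""
        return first_origin if rate_of_change > threshold else second_origin

    def origin_by_offset(offset, first_origin, second_origin, threshold=0.5):
        """Use the first origin if the offset magnitude is below the threshold, else the second."""
        return first_origin if np.linalg.norm(offset) < threshold else second_origin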
  • a method of presenting audio to a user of a wearable head device comprises: determining, based on one or more sensors of the wearable head device, a first position of the user's head at a first time; determining, based on the one or more sensors, a second position of the user's head at a second time later than the first time; determining, based on a difference between the first position and the second position, an audio signal; and presenting the audio signal to the user via a speaker of the wearable head device, wherein: determining the audio signal comprises determining an origin of the audio signal in a virtual environment; presenting the audio signal to the user comprises presenting the audio signal as if originating from the determined origin; and determining the origin of the audio signal comprises applying an offset to a position of the user's head.
  • determining the audio signal further comprises determining a velocity in the virtual environment; and presenting the audio signal to the user further comprises presenting the audio signal as if the origin is in motion with the determined velocity.
  • determining the velocity comprises determining the velocity based on a difference between the first position of the user's head and the second position of the user's head.
  • the offset is determined based on the first position of the user's head.
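  • Putting the pieces of the method together, the following end-to-end sketch is illustrative only: the spatializer call and all names are hypothetical placeholders, and the lag is modelled here with simple exponential smoothing. It samples the head position at two times, derives the origin by applying an offset to the head position so that the origin follows the head with a delay, and derives a velocity from the difference between the two head positions.

    import numpy as np

    class DelayedFollowTracker:
        def __init__(self, offset=(0.0, 0.0, -1.5), smoothing=0.9):
            self.offset = np.asarray(offset, float)  # designated position relative to the head
            self.smoothing = smoothing               # 0 = snap to the target, ~1 = heavy lag
            self.origin = None
            self.last_position = None
            self.last_time = None

        def update(self, head_position, t):
            """Return (origin, velocity) for the audio signal given the head position at time t."""
            head_position = np.asarray(head_position, float)
            target = head_position + self.offset     # offset applied to the head position
            if self.origin is None:
                self.origin, self.last_position, self.last_time = target, head_position, t
                return self.origin, np.zeros(3)
            # The origin follows the target with a delay (per-update exponential smoothing).
            self.origin = self.smoothing * self.origin + (1.0 - self.smoothing) * target
            # Velocity from the difference between the first and second head positions.
            dt = max(t - self.last_time, 1e-6)
            velocity = (head_position - self.last_position) / dt
            self.last_position, self.last_time = head_position, t
            return self.origin, velocity

    # Per audio frame, the origin and velocity would feed a spatializing/rendering engine
    # (hypothetical API) that presents the signal as if it originated from that origin:
    #   origin, velocity = tracker.update(sensor_head_position(), now())
    #   spatializer.render(audio_signal, source_position=origin, source_velocity=velocity)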

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Processing Or Creating Images (AREA)
  • Stereophonic System (AREA)

Abstract

Systems and methods for presenting mixed reality audio signals are disclosed. In an example method, an audio signal is presented to a user wearing a head-mounted device. A first position of the user's head at a first time is determined based on one or more sensors of the head-mounted device. A second position of the user's head at a second time later than the first time is determined based on the one or more sensors. An audio signal is determined based on a difference between the first position and the second position. The audio signal is presented to the user via a speaker of the head-mounted device. Determining the audio signal comprises determining an origin of the audio signal in a virtual environment. Presenting the audio signal to the user comprises presenting the audio signal as if originating from the determined origin. Determining the origin of the audio signal comprises applying an offset to a position of the user's head.
EP21754163.0A 2020-02-14 2021-02-12 Delayed audio following Pending EP4104457A4 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202062976986P 2020-02-14 2020-02-14
PCT/US2021/017971 WO2021163573A1 (fr) 2020-02-14 2021-02-12 Delayed audio following

Publications (2)

Publication Number Publication Date
EP4104457A1 true EP4104457A1 (fr) 2022-12-21
EP4104457A4 EP4104457A4 (fr) 2023-07-19

Family

ID=77273537

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21754163.0A Pending EP4104457A4 (fr) Delayed audio following

Country Status (4)

Country Link
EP (1) EP4104457A4 (fr)
JP (1) JP2023514571A (fr)
CN (1) CN115398935A (fr)
WO (1) WO2021163573A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117956373A (zh) * 2022-10-27 2024-04-30 安克创新科技股份有限公司 Audio processing method, audio playback device, and computer-readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10015620B2 (en) * 2009-02-13 2018-07-03 Koninklijke Philips N.V. Head tracking
US10595147B2 (en) * 2014-12-23 2020-03-17 Ray Latypov Method of providing to user 3D sound in virtual environment
US10123147B2 (en) * 2016-01-27 2018-11-06 Mediatek Inc. Enhanced audio effect realization for virtual reality
EP3264801B1 (fr) * 2016-06-30 2019-10-02 Nokia Technologies Oy Providing audio signals in a virtual environment
US10278003B2 (en) * 2016-09-23 2019-04-30 Apple Inc. Coordinated tracking for binaural audio rendering
US10375506B1 (en) * 2018-02-28 2019-08-06 Google Llc Spatial audio to enable safe headphone use during exercise and commuting

Also Published As

Publication number Publication date
JP2023514571A (ja) 2023-04-06
EP4104457A4 (fr) 2023-07-19
CN115398935A (zh) 2022-11-25
WO2021163573A1 (fr) 2021-08-19

Similar Documents

Publication Publication Date Title
US11778398B2 (en) Reverberation fingerprint estimation
US11589182B2 (en) Dual listener positions for mixed reality
US11800174B2 (en) Mixed reality virtual reverberation
US11627428B2 (en) Immersive audio platform
US11477599B2 (en) Delayed audio following
US20230396948A1 (en) Delayed audio following
WO2021163573A1 (fr) Delayed audio following

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20220913

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20230621

RIC1 Information provided on ipc code assigned before grant

Ipc: H04S 7/00 20060101ALI20230615BHEP

Ipc: G06F 3/16 20060101ALI20230615BHEP

Ipc: G06F 3/01 20060101ALI20230615BHEP

Ipc: H04R 5/00 20060101AFI20230615BHEP