US20160241980A1 - Adaptive ambisonic binaural rendering - Google Patents
- Publication number
- US20160241980A1 (application US 14/988,589)
- Authority
- US
- United States
- Prior art keywords
- signals
- ambisonic
- cos
- sin
- ambisonic signals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/02—Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/012—Head tracking input arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/038—Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/0488—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/006—Mixed reality
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/038—Indexing scheme relating to G06F3/038
- G06F2203/0381—Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/11—Application of ambisonics in stereophonic audio systems
Definitions
- Ambisonics is an effective technique for encoding and reconstructing sound fields. The technique is based on the orthogonal decomposition of a sound field in spherical coordinates in 3D space, or a cylindrical decomposition in 2D space. In the decoding process, the ambisonic signals are decoded to produce speaker signals. The higher the ambisonic order, the finer the reconstruction of the sound field. Ambisonics provides significant flexibility to recreate 3D audio on any playback setup, such as a large number of loudspeakers or headphones. In particular, ambisonic rendering to headphones is of great interest for mobile applications and Head-Mounted Displays (HMD).
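The order-versus-resolution trade-off can be sketched numerically in the 2D (cylindrical) case: a plane wave is encoded into circular-harmonic channels, and the reconstructed directional pattern concentrates around the true source direction as the order grows. The unnormalized encoding convention below is an illustrative assumption, not the patent's own formulation:

```python
import numpy as np

def encode_2d(phi, order):
    # Unnormalized 2D ambisonic (circular-harmonic) encoding of a unit
    # plane wave from azimuth phi: [1, cos(phi), sin(phi), cos(2*phi), ...]
    b = [1.0]
    for m in range(1, order + 1):
        b += [np.cos(m * phi), np.sin(m * phi)]
    return np.array(b)

def beam(b, psi, order):
    # Directional pattern of a sampling-style reconstruction at look angle psi
    g = b[0]
    for m in range(1, order + 1):
        g += 2 * (b[2 * m - 1] * np.cos(m * psi) + b[2 * m] * np.sin(m * psi))
    return g

# The pattern's peak at the true direction equals 1 + 2*order, while its
# average over all look angles stays 1, so higher orders resolve the
# source direction more finely.
phi = 0.8
peaks = [beam(encode_2d(phi, order), phi, order) for order in (1, 2, 3)]
```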
- a user equipment includes a memory element and a processor.
- the memory element is configured to store a plurality of head-related transfer functions.
- the processor is configured to receive an audio signal.
- the audio signal includes a plurality of ambisonic signals.
- the processor is also configured to identify an orientation of the UE based on physical properties of the UE.
- the processor is also configured to rotate the plurality of ambisonic signals based on the orientation of the UE.
- the processor is also configured to filter the plurality of ambisonic signals using the plurality of head-related transfer functions to form speaker signals.
- the processor is also configured to output the speaker signals.
- a method for audio signal processing.
- the method includes receiving an audio signal.
- the audio signal includes a plurality of ambisonic signals.
- the method also includes identifying an orientation of the UE based on physical properties of the UE.
- the method also includes rotating the plurality of ambisonic signals based on the orientation of the UE.
- the method also includes filtering the plurality of ambisonic signals using a plurality of head-related transfer functions to form speaker signals.
- the method also includes outputting the speaker signals.
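The claimed chain — receive ambisonic signals, identify orientation, rotate, filter with fixed HRTFs, output — can be sketched at first order restricted to yaw. The (W, X, Y, Z) channel ordering, the projection decoder, and the single-tap placeholder HRTFs are illustrative assumptions, not the patent's actual matrices or filters:

```python
import numpy as np

def yaw_matrix(theta):
    # First-order ambisonic rotation about the vertical axis for channel
    # order (W, X, Y, Z): W and Z are invariant, (X, Y) rotate together.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0, 0, 0],
                     [0,   c, -s, 0],
                     [0,   s,  c, 0],
                     [0,   0,  0, 1.0]])

def render_binaural(b, yaw, hrtf_l, hrtf_r, speaker_az):
    # 1) rotate the sound field opposite to the head (UE) yaw,
    # 2) decode to virtual speakers at fixed horizontal azimuths,
    # 3) filter each feed with the fixed HRTF pair for its position,
    # 4) sum per ear to form the two output (ear) signals.
    rotated = yaw_matrix(-yaw) @ b                       # (4, n_samples)
    decoder = np.array([[1.0, np.cos(a), np.sin(a), 0.0] for a in speaker_az])
    feeds = decoder @ rotated                            # one row per speaker
    left = sum(np.convolve(f, h) for f, h in zip(feeds, hrtf_l))
    right = sum(np.convolve(f, h) for f, h in zip(feeds, hrtf_r))
    return left, right

# Example: a source straight ahead, four virtual speakers, trivial HRTFs
sig = np.ones(4)
b = np.outer([1.0, 1.0, 0.0, 0.0], sig)                  # source at azimuth 0
az = [0.0, np.pi / 2, np.pi, 3 * np.pi / 2]
impulse = [np.array([1.0])] * 4                          # placeholder HRTFs
left, right = render_binaural(b, 0.0, impulse, impulse, az)
```

Only the rotation matrix changes per head pose; the decoder and HRTF set stay fixed.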
- FIG. 1 illustrates an example HMD according to embodiments of the present disclosure and in which embodiments of the present disclosure may be implemented
- FIG. 2 illustrates an example view with content in an HMD according to an embodiment of this disclosure
- FIG. 3 illustrates an example Cartesian domain with respect to a user according to an embodiment of this disclosure
- FIG. 4 illustrates a block diagram for adaptive ambisonic binaural rendering according to an embodiment of this disclosure
- FIG. 1 illustrates an example HMD 100 according to embodiments of the present disclosure and in which embodiments of the present disclosure may be implemented.
- the embodiment of the HMD 100 illustrated in FIG. 1 is for illustration only; the HMD 100 comes in a wide variety of configurations, and FIG. 1 does not limit the scope of this disclosure to any particular implementation of a HMD.
- the processor 140 is also capable of executing other processes and programs resident in the memory 160 .
- the processor 140 can move data into or out of the memory 160 as required by an executing process.
- the processor 140 is configured to execute the applications 162 based on the OS 161 or in response to signals received from eNBs or an operator.
- the processor 140 is also coupled to the I/O interface 145 , which provides the HMD 100 with the ability to connect to other devices, such as laptop computers and handheld computers.
- the I/O interface 145 is the communication path between these accessories and the processor 140 .
- the memory 160 is coupled to the processor 140 .
- Part of the memory 160 could include a random access memory (RAM), and another part of the memory 160 could include a Flash memory or other read-only memory (ROM).
- the sensor(s) 165 can further include a control circuit for controlling at least one of the sensors included therein. As will be discussed in greater detail below, one or more of these sensor(s) 165 may be used to control audio rendering, determine the orientation and facing direction of the user for 3D content display identification, etc. Any of these sensor(s) 165 may be located within the HMD 100 , within a headset configured to hold the HMD 100 , or in both the headset and HMD 100 , for example, in embodiments where the HMD 100 includes a headset.
- the touchscreen 150 can include a touch panel, a (digital) pen sensor, a key, or an ultrasonic input device.
- the touchscreen 150 can recognize, for example, a touch input in at least one scheme among a capacitive scheme, a pressure sensitive scheme, an infrared scheme, or an ultrasonic scheme.
- the touchscreen 150 can also include a control circuit. In the capacitive scheme, the touchscreen 150 can recognize touch or proximity.
- the HMD 100 may include circuitry and applications for providing 3D audio for a HMD.
- FIG. 1 illustrates one example of HMD 100
- various changes may be made to FIG. 1 .
- various components in FIG. 1 could be combined, further subdivided, or omitted and additional components could be added according to particular needs.
- the processor 140 could be divided into multiple processors, such as one or more central processing units (CPUs) and one or more graphics processing units (GPUs).
- although FIG. 1 illustrates the HMD 100 configured as a mobile telephone, tablet, or smartphone, the HMD 100 could be configured to operate as other types of mobile or stationary devices.
- Embodiments of the present disclosure provide an adaptive ambisonic binaural rendering framework for stereoscopic 3D VR or AR applications on the HMD 100 .
- the user's head motion, i.e., the movement of the HMD 100, is tracked using sensor(s) 165 in the HMD 100 and used to control the binaural rendering.
- ambisonic signals are rotated according to the HMD 100 orientation and then mapped to virtual speakers located at fixed positions.
- the rotated ambisonic signals and a fixed set of Head-Related Transfer Functions (HRTF) are used to produce ear signals.
- ambisonic rendering can be adapted to the HMD 100 orientation to recreate an original sound field.
- Various embodiments of this disclosure provide a system for adaptive ambisonic binaural rendering to make audio scene independent from head movement. Binaural ambisonic rendering can be done through mapping of ambisonic signals to virtual speakers and then filtering each loudspeaker signal with a pair of Head-Related Transfer Functions (HRTF) corresponding to the position of the virtual speakers (relative to the head).
- the positions of virtual speakers remain unchanged, and for each new HMD orientation, the original ambisonic signals and a new set of HRTFs are used to produce ear signals.
- the positions of virtual speakers are changed according to HMD orientation to make the audio scene independent from head movement.
- the original ambisonic signals and a new set of HRTFs corresponding to the positions of the speakers are used to produce speaker signals.
- ambisonic signals are rotated according to the HMD (or head) orientation and then mapped to virtual speakers located at fixed positions. Then the rotated ambisonic signals and a fixed set of HRTFs are used to produce ear signals.
- This embodiment is advantageous as it needs only one set of HRTFs for binaural rendering, one HRTF for each headphone speaker (or ear).
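The two embodiments above rest on an equivalence: rotating the sound field one way produces the same speaker feed as moving the virtual speaker the other way. This can be checked numerically at first order in the horizontal plane (the projection decoder and sign conventions here are assumptions for illustration):

```python
import numpy as np

def planewave_b(phi):
    # First-order horizontal ambisonic signals (W, X, Y) of a unit plane wave
    return np.array([1.0, np.cos(phi), np.sin(phi)])

def rotate(b, theta):
    # Rotate the encoded sound field by theta about the vertical axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([b[0], c * b[1] - s * b[2], s * b[1] + c * b[2]])

def decode(b, speaker_az):
    # Projection ("sampling") decode at one virtual speaker azimuth
    return b[0] + b[1] * np.cos(speaker_az) + b[2] * np.sin(speaker_az)

yaw = 0.7                    # head (HMD) rotation
b = planewave_b(1.1)         # a source at azimuth 1.1 rad
# Embodiment: rotate the field the opposite way, keep the speaker fixed
feed_rotated_field = decode(rotate(b, -yaw), 0.3)
# Alternative: keep the field, move the virtual speaker instead
feed_moved_speaker = decode(b, 0.3 + yaw)
```

The two feeds agree, which is why the rotate-the-field embodiment can keep one fixed HRTF set.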
- FIG. 2 illustrates an example view 202 with content in an HMD 100 according to an embodiment of this disclosure.
- a user is wearing the HMD 100 and is seeing the view 202 .
- the view 202 includes a ninety-six degree viewing angle. In different embodiments, other viewing angles can be used.
- HMDs 204 with mega-sized screens and ninety-six degree viewing angles allow users to feel the world beyond peripheral vision.
- the user may desire to seamlessly switch between the VR world and the real world.
- a user is watching a movie in HMD 100 and wants to write an email.
- the user can draft the email in the VR environment without removing the HMD 100 .
- the mobile device can display the mobile device environment in the VR world.
- Various embodiments of the present disclosure provide content within an angular range that is wider than the user's current 3D view frustum 310 .
- the angular range 315 is defined, e.g., on the x-z plane, assuming a Cartesian coordinate system with the x direction generally denoting left/right or yaw, the y direction generally denoting forward/backward, and the z direction generally denoting up/down or pitch.
- the HMD 100 displays, either actually or virtually (i.e., not currently shown on the display 155, but shown when the HMD 100 is moved to a location where the element is virtually displayed), some UI elements 305 outside the current 3D view frustum 310. However, the HMD 100 places these UI elements 305 within the angular range 315 for the UI so that the user does not have to turn the head too far to the left or the right (i.e., yaw or x movement) to see all displayed UI elements 305.
- the HMD 100 places the elements within the user's current 3D view frustum, i.e., the portion of the total viewable 3D space that is currently viewable by the user as a result of the HMD's 100 current detected orientation and facing direction.
- the HMD 100 detects the user's head motions, i.e., the movement of the HMD 100 , using the sensor(s) 165 on the HMD 100 and/or headset, such as, for example, a gyroscope, an accelerometer, etc.
- the HMD 100 displays the UI elements 305 as well as other elements of the display (e.g., content) to respond to the head motions to simulate looking at and interacting with the real-world view and objects.
- Rotation matrices have been identified for up to second order ambisonics, while many ambisonic recordings are of third or higher order.
- Another issue would be real-time binaural rendering with no discontinuities (in time and space) while changing the ambisonic signals according to the head movement.
- FIG. 3 illustrates an example Cartesian domain 300 with respect to a user 305 according to an embodiment of this disclosure.
- a user 305 is shown without wearing the HMD 100, but could be wearing the HMD 100.
- the coordinates in the Cartesian domain 300 may also be considered with respect to the HMD 100 .
- the axes X, Y, and Z can be in positive and negative directions.
- the user 305 can also rotate within the Cartesian domain 300 .
- One or more embodiments of this disclosure provide different techniques for adaptive ambisonic rendering.
- the different techniques are based on the equivalence of a HMD rotation in one direction and sound field rotation in the opposite direction.
- One embodiment is based on changing the location of virtual speakers to make the reproduced sound field independent from head movement.
- positions of virtual speakers are changed sequentially for rotation around the three axes X, Y, and Z in the Cartesian domain 300.
- This embodiment can be used to do adaptive binaural rendering for any ambisonic order.
- a new rotation matrix for third order ambisonics is applied. This embodiment rotates ambisonics (for example, up to third order) in any direction in 3D space through simple matrix multiplication and can use only one set of HRTFs.
- fourth order, or higher, ambisonics can be used as well.
- One or more embodiments of this disclosure provide rotating ambisonic signals according to the orientation of an HMD.
- the orientation can be the orientation within a virtual reality defined by three axes in the Cartesian domain (i.e. X, Y, and Z) 300 .
- the orientation can also include a position or location within the virtual reality.
- the location and orientation of the HMD can be determined using sensor(s) 165 as shown in FIG. 1 .
- the head of the user 305 can be tracked instead of the HMD.
- the tracking can be performed by sensor(s) 165 as shown in FIG. 1 or by external camera systems.
- FIG. 4 illustrates a block diagram 400 for adaptive ambisonic binaural rendering according to an embodiment of this disclosure.
- the embodiment of the adaptive ambisonic binaural rendering illustrated in FIG. 4 is for illustration only.
- a rotation matrix can be applied to ambisonic signals.
- the rotation matrix is determined based on a head position or HMD orientation.
- Sensors in the HMD, or external systems such as camera systems or infrared detectors, can identify the orientation, and a processor can select a rotation matrix based on the orientation.
- the processor can perform ambisonic rendering by mapping the positions of the virtual speakers.
- a processor can perform binaural filtering by applying the HRTFs to the ambisonic signals to produce binaural signals, or speaker signals. The same HRTFs can be applied no matter the orientation of the HMD.
- One or more embodiments of this disclosure also provide adaptive binaural rendering by relocating virtual speakers.
- Algorithms for sound field rotation around the X, Y, and Z axes are provided by this disclosure.
- a sound field can be rotated sequentially for rotation around the X axis (i.e. roll), Y axis (i.e. pitch), and Z axis (i.e. yaw).
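Sequential rotation around the X, Y, and Z axes can be sketched at first order, where the directional channels (X, Y, Z) transform exactly like a Cartesian vector and W is invariant. The axis conventions and channel ordering below are assumptions for illustration; the patent's higher-order matrices are not reproduced here:

```python
import numpy as np

def _r3(axis, a):
    # Standard 3D rotation matrices about X (roll), Y (pitch), Z (yaw)
    c, s = np.cos(a), np.sin(a)
    if axis == 'x':
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == 'y':
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def first_order_rotation(roll, pitch, yaw):
    # Sequential composition: roll about X, then pitch about Y, then yaw
    # about Z. W (first channel) is invariant; (X, Y, Z) rotate as a vector.
    R = np.eye(4)
    R[1:, 1:] = _r3('z', yaw) @ _r3('y', pitch) @ _r3('x', roll)
    return R
```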
- φ and φ′ are the original and the modified azimuth, respectively.
- the new position of the virtual speaker is given by:
- the HOA signals are mapped to virtual speakers at the new locations and filtered with the corresponding HRTFs to find the binaural signals.
- the new position of the virtual speaker is given by:
- the HOA signals are mapped to virtual speakers at the new locations and filtered with the corresponding HRTFs to find the binaural signals.
- One or more embodiments of this disclosure provide another technique to rotate a sound field using a rotation matrix.
- This embodiment provides three new rotation matrices for rotating third order ambisonic signals around the three axes in 3D space.
- ambisonic signals can be modified through a matrix multiplication as follows:
- R is the rotation matrix around an axis
- B and B′ are the original and modified ambisonic signals, respectively.
- the positions of virtual speakers relative to the HMD remain unchanged and as such only one set of HRTFs is used for binaural rendering.
- the different embodiments of this disclosure provide adaptive HOA binaural rendering based on sound field rotation in 3D space. In contrast to channel-based methods, in one of the embodiments, only one set of HRTFs corresponding to a fixed playback setup can be used. In comparison to channel-based binaural rendering, HOA-based methods provide higher quality if a very large set of HRTFs is not available to the binaural renderer. Sound fields can be edited in the ambisonic domain for artistic purposes prior to rendering to headphones.
- An embodiment of this disclosure provides third order rotation matrices for N3D-encoded B-Format (used in MPEG audio material) for three axes, X, Y, and Z in the Cartesian domain. Rotation in a direction can be done by multiplying these rotation matrices.
- the third order rotation matrix can be a 16×16 matrix.
- the sixteen original ambisonic signals can be labeled as follows:
- B′ = [W′, X′, Y′, Z′, V′, T′, R′, S′, U′, Q′, O′, M′, K′, L′, N′, P′] (12)
- the third order rotation matrix can be a 16×16 matrix, where θ is the rotation angle around the Y axis; the modified signals are as follows:
- V′ = cos(2θ)V + sin(2θ)U (49)
- N′ = cos(2θ)N − sin(2θ)O (59)
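Entries of this form, e.g. V′ = cos(2θ)V + sin(2θ)U, mix channel pairs by a 2×2 rotation-like block. Assuming the companion rows complete each pair (the full 16×16 matrix is not reproduced in this excerpt), each block is orthogonal — so it preserves signal energy — and composes additively in the angle:

```python
import numpy as np

def pair_block(theta, m):
    # 2x2 block applied to a channel pair by a rotation of theta; the
    # quoted third-order entries use m = 2, i.e. mixing by cos(2*theta)
    # and sin(2*theta). The pairing itself is an assumption here.
    c, s = np.cos(m * theta), np.sin(m * theta)
    return np.array([[c, s], [-s, c]])
```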
- FIG. 5 illustrates process 500 for adaptive ambisonic binaural rendering according to this disclosure.
- the embodiment shown in FIG. 5 is for illustration only. Other embodiments could be used without departing from the scope of the present disclosure.
- a processor, such as processor 140 as shown in FIG. 1, can perform different steps of process 500.
- a processor is configured to receive an audio signal, the audio signal comprising a plurality of ambisonic signals.
- the processor is configured to identify an orientation of the UE based on the measured physical properties of the UE.
- Sensors can be configured to sense the physical properties of the UE.
- the physical properties could include, for example, a touch input on the headset or the HMD, camera information, gesture information, gyroscopic information, air pressure information, magnetic information, acceleration information, grip information, proximity information, color information, bio-physical information, temperature/humidity information, illumination information, UV information, electromyography (EMG) information, electroencephalogram (EEG) information, electrocardiogram (ECG) information, IR sensor information, ultrasound sensor information, iris sensor information, fingerprint sensor information, etc.
- the processor is configured to rotate the plurality of ambisonic signals based on the orientation of the UE.
- the processor can apply at least one rotation matrix to the plurality of ambisonic signals.
- the at least one rotation matrix comprises a rotation matrix for each axis of three axes. If the orientation includes a rotation in a direction, the processor can be configured to rotate the sound field of the plurality of ambisonic signals in the opposite direction.
- the processor can also be configured to map the plurality of ambisonic signals to one or more virtual speakers of a sound field.
- the processor is configured to filter the plurality of ambisonic signals using a plurality of head-related transfer functions to form speaker signals.
- the head related transfer functions could be stored in a memory element.
- the plurality of head-related transfer functions comprises two head-related transfer functions used for any rotation of the plurality of ambisonic signals.
- the processor is configured to output the speaker signals.
- One or more embodiments of this disclosure provide multichannel audio downmixing via ambisonic conversion.
- An embodiment provides a novel audio downmixing method based on ambisonic mapping. Input multichannel audio is mapped to spherical harmonics to generate an ambisonic representation of the sound field. The ambisonic signals are mapped to any playback system with a smaller number of speakers. A number of common downmix distortions are discussed and solutions are introduced to reduce some distortions such as signal coloration. Informal listening tests have demonstrated the merit of the proposed method compared to direct audio downmixing.
- one or more embodiments provide an active downmix method based on the conversion of input multichannel audio to HOA signals.
- the playback setup can be independent of the input audio channels configuration.
- the HOA signals can be mapped to any speaker setup (e.g. smaller number of speakers, asymmetric configuration, etc.).
- One or more of the embodiments reduce common distortions such as coloration (i.e., comb-filter effects) and preserve loudness to improve the audio quality of the downmixed audio.
- An embodiment of this disclosure provides HOA based audio downmixing.
- the input audio channels are decomposed in the spherical coordinates (i.e. mapped onto the spherical harmonic bases) to generate fourth order HOA signals as follows:
- S in is the matrix containing the input audio channels (except the low frequency effects (LFE) channels)
- B is the matrix of HOA signals.
- the order of the HOA signals can be increased to better represent the original sound field; a fourth order ambisonic representation is sufficient for many sound fields and reduces the computational load.
- the HOA signals can be mapped to any playback system using an HOA renderer as follows:
- D is the HOA renderer and S out is the output audio channels.
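The encode-then-render chain (B from the input channels, S out = D·B) can be sketched as a toy version at first order in the horizontal plane rather than the fourth order the text describes; the channel azimuths and the transpose-style decoder D are illustrative assumptions:

```python
import numpy as np

def sh_encode(azimuths):
    # First-order horizontal encoding matrix (one column per channel)
    return np.array([[1.0, np.cos(a), np.sin(a)] for a in azimuths]).T

in_az = np.radians([0, 30, -30, 110, -110])   # e.g. a 5.0 layout, no LFE
out_az = np.radians([30, -30])                # stereo playback
S_in = np.random.randn(5, 16)                 # input channel signals
Y = sh_encode(in_az)                          # (n_coeffs, n_in) = (3, 5)
B = Y @ S_in                                  # ambisonic signals
D = sh_encode(out_az).T / 2.0                 # simple transpose-style decoder
S_out = D @ B                                 # downmixed stereo, shape (2, 16)
```

Because the decode step only depends on the output layout, the same B could be rendered to any smaller or asymmetric speaker setup by swapping D.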
- the input LFE channels are used in the downmixed output audio. Some sound images may be distorted when a larger number of channels are downmixed to a smaller playback system.
- An example embodiment provides a sound field on a smaller playback system with the best possible audio quality.
- Audio downmixing results in some distortions. Issues can be caused by 3D to 2D conversion wherein the sound field envelopment (from above) and accuracy of sound source vertical localization are degraded. Some other issues that might be observed include coloration, loudness distortion, spectral distortion, direct to ambient sound ratio, auditory masking, etc.
- a processor provides a correlation-based technique to adjust the Inter-Channel Time Delay (ICTD) between highly correlated input channels to reduce coloration. Since sound fields might consist of many sound sources, the processor divides the input audio channels into subgroups based on their cross correlation. Channels with a cross correlation greater than 0.2 are placed in the same group and then time-aligned to the channel with the largest energy in that group. In one example embodiment, the maximum delay to be aligned is set to 10 msec. This maximum delay might not be caused by the distance between microphones in a microphone array, but might be caused by post-recording effects. One embodiment of this disclosure recognizes that there are large delays between channels in the MPEG 22.2-channel audio files, and sets the maximum delay at 10 msec. In one example embodiment, the processor does not align spectral components separately.
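The grouping-and-alignment step can be sketched as follows, under stated assumptions: circular shifts stand in for proper delay compensation, and the correlation search is a brute-force scan over ±10 ms of lags:

```python
import numpy as np

def _peak_lag(ref, sig, max_lag):
    # Circular lag within +/- max_lag samples maximizing correlation with ref
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda L: float(np.dot(ref, np.roll(sig, L))))

def align_channels(channels, fs, corr_thresh=0.2, max_delay_s=0.010):
    # Group channels whose normalized cross-correlation exceeds the 0.2
    # threshold, then align each group member to the group's
    # largest-energy channel, searching delays up to 10 ms only.
    max_lag = int(max_delay_s * fs)
    n = len(channels)
    norm = [np.linalg.norm(c) + 1e-12 for c in channels]
    assigned, groups = [False] * n, []
    for i in range(n):
        if assigned[i]:
            continue
        group = [i]
        assigned[i] = True
        for j in range(i + 1, n):
            if assigned[j]:
                continue
            L = _peak_lag(channels[i], channels[j], max_lag)
            c = float(np.dot(channels[i], np.roll(channels[j], L))) / (norm[i] * norm[j])
            if c > corr_thresh:
                group.append(j)
                assigned[j] = True
        groups.append(group)
    out = [c.copy() for c in channels]
    for g in groups:
        ref = max(g, key=lambda k: float(np.sum(channels[k] ** 2)))
        for k in g:
            out[k] = np.roll(channels[k], _peak_lag(channels[ref], channels[k], max_lag))
    return out
```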
- the processor can perform ambisonic conversion. Input multichannel audio is mapped to spherical harmonics to generate an ambisonic representation of the sound field.
- the processor can map the higher order ambisonics to virtual speakers; the ambisonic signals are mapped to any playback system with a smaller number of speakers.
- the energy of the downmixed audio can be equalized both in the spectral domain and in space.
- energy distribution in a downmixed sound field can be more easily controlled.
- the energy of the downmixed channels is adjusted in the octave bands to make it equal to the energy of the input sound field.
- the energy adjustment can also be done separately for the left and right channels to keep the energy ratio of the left and right channels the same as that in the input sound field.
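A broadband sketch of this per-side energy adjustment is shown below; the text applies it per octave band, which would add a filter-bank split around this same scaling step:

```python
import numpy as np

def equalize_lr_energy(down_l, down_r, in_energy_l, in_energy_r):
    # Scale the downmixed left/right channels so each matches the
    # corresponding input-side energy; this also keeps the output L/R
    # energy ratio equal to the input's.
    g_l = np.sqrt(in_energy_l / (np.sum(down_l ** 2) + 1e-12))
    g_r = np.sqrt(in_energy_r / (np.sum(down_r ** 2) + 1e-12))
    return g_l * down_l, g_r * down_r
```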
- Some sound sources might be partially masked by louder sounds in a downmixed sound field, caused by auditory masking effects in the frequency and/or time domain. Sound sources might be located at different locations in the input sound field and therefore remain audible. In a downmixed sound field, many sounds might come from the same direction, and therefore auditory masking (both temporal and simultaneous) can be more pronounced.
- One way to reduce masking effects in a downmixed sound field is to apply different gains to the input audio channels prior to downmixing to a smaller number of channels.
Abstract
A user equipment (UE) includes a memory element and a processor. The memory element is configured to store a plurality of head-related transfer functions. The processor is configured to receive an audio signal. The audio signal includes a plurality of ambisonic signals. The processor is also configured to identify an orientation of the UE based on physical properties of the UE. The processor is also configured to rotate the plurality of ambisonic signals based on the orientation of the UE. The processor is also configured to filter the plurality of ambisonic signals using the plurality of head-related transfer functions to form speaker signals. The processor is also configured to output the speaker signals.
Description
- The present application claims priority to U.S. Provisional Patent Application Ser. No. 62/108,774, filed Jan. 28, 2015, entitled “ADAPTIVE AMBISONIC BINAURAL RENDERING” and U.S. Provisional Patent Application Ser. No. 62/108,779, filed Jan. 28, 2015, entitled “AUDIO DOWNMIXING VIA AMBISONIC CONVERSION”. The contents of the above-identified patent documents are incorporated herein by reference.
- The present application relates generally to ambisonic signals and, more specifically, to an apparatus and method for adaptive ambisonic binaural rendering.
- Ambisonics is an effective technique for encoding and reconstructing sound fields. The technique is based on the orthogonal decomposition of a sound field in spherical coordinates in 3D space, or a cylindrical decomposition in 2D space. In the decoding process, the ambisonic signals are decoded to produce speaker signals. The higher the ambisonic order, the finer the reconstruction of the sound field. Ambisonics provides significant flexibility to recreate 3D audio on any playback setup, such as a large number of loudspeakers or headphones. In particular, ambisonic rendering to headphones is of great interest for mobile applications and Head-Mounted Displays (HMD).
- In an embodiment, a user equipment (UE) is provided that includes a memory element and a processor. The memory element is configured to store a plurality of head-related transfer functions. The processor is configured to receive an audio signal. The audio signal includes a plurality of ambisonic signals. The processor is also configured to identify an orientation of the UE based on physical properties of the UE. The processor is also configured to rotate the plurality of ambisonic signals based on the orientation of the UE. The processor is also configured to filter the plurality of ambisonic signals using the plurality of head-related transfer functions to form speaker signals. The processor is also configured to output the speaker signals.
- In another embodiment, a method is provided for audio signal processing. The method includes receiving an audio signal. The audio signal includes a plurality of ambisonic signals. The method also includes identifying an orientation of the UE based on physical properties of the UE. The method also includes rotating the plurality of ambisonic signals based on the orientation of the UE. The method also includes filtering the plurality of ambisonic signals using a plurality of head-related transfer functions to form speaker signals. The method also includes outputting the speaker signals.
- Before undertaking the DETAILED DESCRIPTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller” means any device, system or part thereof that controls at least one operation; such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document; those of ordinary skill in the art should understand that in many, if not most instances, such definitions apply to prior, as well as future uses of such defined words and phrases.
- For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which like reference numerals represent like parts:
-
FIG. 1 illustrates an example HMD according to embodiments of the present disclosure and in which embodiments of the present disclosure may be implemented; -
FIG. 2 illustrates an example view with content in an HMD according to an embodiment of this disclosure; -
FIG. 3 illustrates an example Cartesian domain with respect to a user according to an embodiment of this disclosure; -
FIG. 4 illustrates a block diagram for adaptive ambisonic binaural rendering according to an embodiment of this disclosure; -
FIG. 5 illustrates a process for adaptive ambisonic binaural rendering according to this disclosure; and -
FIG. 6 illustrates a block diagram for high order ambisonic downmixing according to an embodiment of this disclosure. -
FIGS. 1 through 6, discussed below, and the various embodiments used to describe the principles of the present disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of the present disclosure may be implemented in any suitably arranged apparatus or method. -
FIG. 1 illustrates an example HMD 100 according to embodiments of the present disclosure and in which embodiments of the present disclosure may be implemented. The embodiment of the HMD 100 illustrated in FIG. 1 is for illustration only; the HMD 100 comes in a wide variety of configurations, and FIG. 1 does not limit the scope of this disclosure to any particular implementation of a HMD. - In various embodiments, the HMD 100 may take different forms, and the present disclosure is not limited to any particular form. For example, the HMD 100 may be a mobile communication device, such as, for example, a user equipment, a mobile station, a subscriber station, a wireless terminal, a smart phone, a tablet, etc., that is mountable within a headset for VR and/or AR applications. In other examples, the HMD 100 may include the headset and take the form of a wearable electronic device, such as, for example, glasses, goggles, a helmet, etc., for the VR and/or AR applications.
- As shown in
FIG. 1, the HMD 100 includes an antenna 105, a radio frequency (RF) transceiver 110, transmit (TX) processing circuitry 115, a microphone 120, and receive (RX) processing circuitry 125. The HMD 100 also includes a speaker 130, a processor 140, an input/output (I/O) interface (IF) 145, a touchscreen 150, a display 155, a memory 160, and one or more sensors 165. The memory 160 includes an operating system (OS) 161 and one or more applications 162. - The
RF transceiver 110 receives, from the antenna 105, an incoming RF signal transmitted by an access point (e.g., base station, WiFi router, Bluetooth device) for a network (e.g., a WiFi, Bluetooth, cellular, 5G, LTE, LTE-A, WiMAX, or any other type of wireless network). The RF transceiver 110 down-converts the incoming RF signal to generate an intermediate frequency (IF) or baseband signal. The IF or baseband signal is sent to the RX processing circuitry 125, which generates a processed baseband signal by filtering, decoding, and/or digitizing the baseband or IF signal. The RX processing circuitry 125 transmits the processed baseband signal to the speaker 130 (such as for voice data) or to the processor 140 for further processing (such as for web browsing data). - The TX
processing circuitry 115 receives analog or digital voice data from the microphone 120 or other outgoing baseband data (such as web data, e-mail, or interactive video game data) from the processor 140. The TX processing circuitry 115 encodes, multiplexes, and/or digitizes the outgoing baseband data to generate a processed baseband or IF signal. The RF transceiver 110 receives the outgoing processed baseband or IF signal from the TX processing circuitry 115 and up-converts the baseband or IF signal to an RF signal that is transmitted via the antenna 105. - The
processor 140 can include one or more processors or other processing devices and execute the OS 161 stored in the memory 160 in order to control the overall operation of the HMD 100. For example, the processor 140 could control the reception of forward channel signals and the transmission of reverse channel signals by the RF transceiver 110, the RX processing circuitry 125, and the TX processing circuitry 115 in accordance with well-known principles. In some embodiments, the processor 140 includes at least one microprocessor or microcontroller. In another embodiment, the processor 140 could also be implemented as processing circuitry. The processor 140 can carry out the operations or instructions of any process disclosed herein. - The
processor 140 is also capable of executing other processes and programs resident in the memory 160. The processor 140 can move data into or out of the memory 160 as required by an executing process. In some embodiments, the processor 140 is configured to execute the applications 162 based on the OS 161 or in response to signals received from eNBs or an operator. The processor 140 is also coupled to the I/O interface 145, which provides the HMD 100 with the ability to connect to other devices, such as laptop computers and handheld computers. The I/O interface 145 is the communication path between these accessories and the processor 140. - The
processor 140 is also coupled to the touchscreen 150 and the display 155. The operator of the HMD 100 can use the touchscreen 150 to enter data and/or inputs into the HMD 100. The display 155 may be a liquid crystal display, light-emitting diode (LED) display, optical LED (OLED), active matrix OLED (AMOLED), or other display capable of rendering text and/or graphics, such as from web sites, videos, games, etc. - The
memory 160 is coupled to the processor 140. Part of the memory 160 could include a random access memory (RAM), and another part of the memory 160 could include a Flash memory or other read-only memory (ROM). -
HMD 100 further includes one or more sensor(s) 165 that can meter a physical quantity or detect an activation state of the HMD 100 and convert metered or detected information into an electrical signal. For example, sensor 165 may include one or more buttons for touch input, e.g., on the headset or the HMD 100, a camera, a gesture sensor, a gyroscope or gyro sensor, an air pressure sensor, a magnetic sensor or magnetometer, an acceleration sensor or accelerometer, a grip sensor, a proximity sensor, a color sensor 165H (e.g., a Red Green Blue (RGB) sensor), a bio-physical sensor, a temperature/humidity sensor, an illumination sensor 165K, an Ultraviolet (UV) sensor, an Electromyography (EMG) sensor, an Electroencephalogram (EEG) sensor, an Electrocardiogram (ECG) sensor, an IR sensor, an ultrasound sensor, an iris sensor, a fingerprint sensor, etc. The sensor(s) 165 can further include a control circuit for controlling at least one of the sensors included therein. As will be discussed in greater detail below, one or more of these sensor(s) 165 may be used to control audio rendering, determine the orientation and facing direction of the user for 3D content display identification, etc. Any of these sensor(s) 165 may be located within the HMD 100, within a headset configured to hold the HMD 100, or in both the headset and HMD 100, for example, in embodiments where the HMD 100 includes a headset. - The
touchscreen 150 can include a touch panel, a (digital) pen sensor, a key, or an ultrasonic input device. The touchscreen 150 can recognize, for example, a touch input in at least one scheme among a capacitive scheme, a pressure sensitive scheme, an infrared scheme, or an ultrasonic scheme. The touchscreen 150 can also include a control circuit. In the capacitive scheme, the touchscreen 150 can recognize touch or proximity. - As described in more detail below, the
HMD 100 may include circuitry and applications for providing 3D audio for a HMD. Although FIG. 1 illustrates one example of HMD 100, various changes may be made to FIG. 1. For example, various components in FIG. 1 could be combined, further subdivided, or omitted and additional components could be added according to particular needs. As a particular example, the processor 140 could be divided into multiple processors, such as one or more central processing units (CPUs) and one or more graphics processing units (GPUs). Also, while FIG. 1 illustrates the HMD 100 configured as a mobile telephone, tablet, or smartphone, the HMD 100 could be configured to operate as other types of mobile or stationary devices. - Embodiments of the present disclosure provide an adaptive ambisonic binaural rendering framework for stereoscopic 3D VR or AR applications on the
HMD 100. For VR experience using the HMD 100, the user's head motion, i.e., the movement of the HMD 100, is tracked using sensor(s) 165 in the HMD 100 and used to control the binaural rendering. In this disclosure, ambisonic signals are rotated according to the HMD 100 orientation and then mapped to virtual speakers located at fixed positions. The rotated ambisonic signals and a fixed set of Head-Related Transfer Functions (HRTF) are used to produce ear signals. - One or more embodiments of this disclosure recognize and take into account that ambisonic rendering can be adapted to the
HMD 100 orientation to recreate an original sound field. Various embodiments of this disclosure provide a system for adaptive ambisonic binaural rendering to make the audio scene independent from head movement. Binaural ambisonic rendering can be done through mapping of ambisonic signals to virtual speakers and then filtering each loudspeaker signal with a pair of Head-Related Transfer Functions (HRTF) corresponding to the position of the virtual speakers (relative to the head). - In an embodiment of this disclosure, for ambisonic rendering, the positions of virtual speakers remain unchanged, and for each new HMD orientation, the original ambisonic signals and a new set of HRTFs are used to produce ear signals. In another embodiment of this disclosure, the positions of virtual speakers are changed according to HMD orientation to make the audio scene independent from head movement. The original ambisonic signals and a new set of HRTFs corresponding to the positions of the speakers are used to produce speaker signals. In yet another embodiment of this disclosure, ambisonic signals are rotated according to the HMD (or head) orientation and then mapped to virtual speakers located at fixed positions. Then the rotated ambisonic signals and a fixed set of HRTFs are used to produce ear signals. This embodiment is advantageous as it needs only one set of HRTFs for binaural rendering, one HRTF for each headphone speaker (or ear).
-
FIG. 2 illustrates an example view 202 with content in an HMD 100 according to an embodiment of this disclosure. In FIG. 2, a user is wearing the HMD 100 and is seeing the view 202. The view 202 includes a ninety-six degree viewing angle. In different embodiments, other viewing angles can be used. - Various embodiments of this disclosure recognize and take into account that HMDs with mega-sized screens and ninety-six degree viewing angles allow users to feel the world beyond peripheral vision. There are applications on the
HMD 100 with a mobile device LCD as the screen. Users might want to use a mobile device without removing the HMD 100. The user may desire to seamlessly switch between the VR world and the real world. In an example, a user is watching a movie in the HMD 100 and wants to write an email. In this example, the user can draft the email in the VR environment without removing the HMD 100. The mobile device can display the mobile device environment in the VR world. - Various embodiments of the present disclosure provide content within an angular range that is wider than the user's current 3D view frustum 310. The angular range 315 (e.g., on the x-z plane assuming a Cartesian coordinate system with the x direction generally denoting left/right or yaw, the y direction generally denoting forward/backwards, and the z direction generally denoting up/down or pitch), within which the
UI elements 305 are to be placed is configured. In some examples (e.g., when more UI elements 305 exist than can fit), the HMD 100 displays, either actually or virtually (i.e., not actually displayed on the display 155 but actually displayed when the HMD 100 is moved to a location where the element is virtually displayed), some UI elements 305 outside the current 3D view frustum 310. However, the HMD 100 places these UI elements 305 within the angular range 315 for the UI so that the user would not have to turn the head too much to the left or the right (i.e., yaw or x movement) to see all displayed UI elements 305. Note, while certain examples are given in a Cartesian coordinate system, any suitable coordinate system may be used with any tuple serving as the default coordinate directions. The HMD 100 places the elements within the user's current 3D view frustum, i.e., the portion of the total viewable 3D space that is currently viewable by the user as a result of the HMD's 100 current detected orientation and facing direction. - As discussed above, the
HMD 100 detects the user's head motions, i.e., the movement of the HMD 100, using the sensor(s) 165 on the HMD 100 and/or headset, such as, for example, a gyroscope, an accelerometer, etc. The HMD 100 displays the UI elements 305 as well as other elements of the display (e.g., content) to respond to the head motions to simulate looking at and interacting with the real-world view and objects. - One or more embodiments of this disclosure recognize and take into account the difficulty in identifying a rotation matrix for any direction in 3D space. Rotation matrices for up to second order ambisonics (Fu-Ma format) are identified. Many ambisonic recordings are third and higher order. As such, there is a need to develop techniques for rotation of ambisonic signals with any order. Another issue would be real-time binaural rendering with no discontinuities (in time and space) while changing the ambisonic signals according to the head movement.
-
FIG. 3 illustrates an example Cartesian domain 300 with respect to a user 305 according to an embodiment of this disclosure. In FIG. 3, a user 305 is seen without wearing the HMD 100, but could be wearing the HMD 100. The coordinates in the Cartesian domain 300 may also be considered with respect to the HMD 100. The axes X, Y, and Z can be in positive and negative directions. The user 305 can also rotate within the Cartesian domain 300. - One or more embodiments of this disclosure provide different techniques for adaptive ambisonic rendering. The different techniques are based on the equivalence of an HMD rotation in one direction and sound field rotation in the opposite direction. One embodiment is based on changing the location of virtual speakers to make the reproduced sound field independent from head movement. In this embodiment, positions of virtual speakers are changed sequentially for rotation around the three axes X, Y, and Z in the Cartesian domain 300. This embodiment can be used to do adaptive binaural rendering for any ambisonic order. In another embodiment, a new rotation matrix for third order ambisonics is applied. This embodiment rotates ambisonics (for example, up to third order) in any direction in 3D space through simple matrix multiplication and can use only one set of HRTFs. In other examples, fourth order, or higher, ambisonics can be used as well.
- One or more embodiments of this disclosure provide rotating ambisonic signals according to the orientation of an HMD. The orientation can be the orientation within a virtual reality defined by three axes in the Cartesian domain (i.e. X, Y, and Z) 300. The orientation can also include a position or location within the virtual reality. The location and orientation of the HMD can be determined using sensor(s) 165 as shown in
FIG. 1. In different embodiments of this disclosure, the head of the user 305 can be tracked instead of the HMD. The tracking can be performed by sensor(s) 165 as shown in FIG. 1 or by external camera systems. -
FIG. 4 illustrates a block diagram 400 for adaptive ambisonic binaural rendering according to an embodiment of this disclosure. The embodiment of the adaptive ambisonic binaural rendering illustrated in FIG. 4 is for illustration only. - At
block 402, a rotation matrix can be applied to ambisonic signals. The rotation matrix is determined based on a head position or HMD orientation. Sensors in the HMD or external systems, such as camera systems or infrared detectors, can identify the orientation, and a processor can select a rotation matrix based on the orientation. At block 404, the processor can perform ambisonic rendering by mapping the positions of the virtual speakers. At block 406, a processor can perform binaural filtering by applying the HRTFs to the ambisonic signals to produce binaural signals, or speaker signals. The same HRTFs can be applied no matter the orientation of the HMD. - One or more embodiments of this disclosure also provide adaptive binaural rendering by relocating virtual speakers. Algorithms for sound field rotation around the X, Y, and Z axes are provided by this disclosure. In this embodiment, a sound field can be rotated sequentially for rotation around the X axis (i.e., roll), Y axis (i.e., pitch), and Z axis (i.e., yaw).
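The three blocks of FIG. 4 can be sketched end-to-end in numpy. This is a minimal illustration, not the claimed implementation: the array shapes, the use of time-domain head-related impulse responses (HRIRs), and the helper names are all assumptions.

```python
import numpy as np

def binaural_render(ambisonic, rotation, decode, hrtf_left, hrtf_right):
    """Sketch of the FIG. 4 pipeline: rotate the ambisonic signals,
    map them to fixed virtual speakers, then filter with one fixed
    set of HRTFs.

    ambisonic : (n_channels, n_samples) HOA signals B
    rotation  : (n_channels, n_channels) rotation matrix R
    decode    : (n_speakers, n_channels) ambisonic decoder
    hrtf_*    : (n_speakers, filt_len) assumed time-domain HRIRs,
                one pair per fixed virtual speaker
    """
    rotated = rotation @ ambisonic            # block 402: B' = R B
    feeds = decode @ rotated                  # block 404: virtual speaker feeds
    n_out = ambisonic.shape[1] + hrtf_left.shape[1] - 1
    left = np.zeros(n_out)
    right = np.zeros(n_out)
    for spk in range(feeds.shape[0]):         # block 406: binaural filtering
        left += np.convolve(feeds[spk], hrtf_left[spk])
        right += np.convolve(feeds[spk], hrtf_right[spk])
    return left, right
```

Because the virtual speakers stay fixed, the `decode`, `hrtf_left`, and `hrtf_right` arrays never change with head movement; only `rotation` is updated per head-tracker reading.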
- For rotation of virtual speakers around the Z axis only the azimuth may be changed. An azimuth of each virtual speaker can be shifted by γ, with γ being the rotation angle around the Z axis. The azimuth can be modified by:
-
θ′=θ+γ, (1) - where θ and θ′ are the original and the modified azimuth, respectively.
- For rotation of virtual speakers around the X axis (i.e. roll), both azimuth and elevation of virtual speakers are changed. The positions of virtual speakers are modified accordingly. The high order ambisonic (HOA) signals are mapped to virtual speakers at new locations and converted to binaural signals using a set of HRTFs corresponding to the positions of virtual speakers relative to the head. If a virtual speaker is located at θi and φi (azimuth and elevation respectively), then the new positions are θ′i and φ′i. Since rotation around the X axis does not change the projection to the X axis, the new position of each virtual speaker is given by the following procedure.
-
√3 cos(θ′i) cos(φ′i)=√3 cos(θi) cos(φi) (2)
-
y′=cos(α)y−sin(α)z
z′=sin(α)y+cos(α)z (3)
-
y=√3 sin(θi) cos(φi), z=√3 sin(φi)
y′=√3 sin(θ′i) cos(φ′i), z′=√3 sin(φ′i) (4)
-
φ′i=sin−1(z′/√3), θ′i=tan−1(y′/x) (5)
- For rotation of virtual speakers around the Y axis, the same procedure as described for the X axis is used. Since rotation around the Y axis does not change the projection to the Y axis, the procedure is as follows.
-
√3 sin(θ′i) cos(φ′i)=√3 sin(θi) cos(φi) (6)
-
x′=cos(β)x+sin(β)z
z′=−sin(β)x+cos(β)z (7)
-
x=√3 cos(θi) cos(φi), z=√3 sin(φi)
x′=√3 cos(θ′i) cos(φ′i), z′=√3 sin(φ′i) (8)
-
φ′i=sin−1(z′/√3), θ′i=tan−1(y/x′) (9)
- One or more embodiments of this disclosure provide another technique to rotate a sound field using a rotation matrix. This embodiment provides three new rotation matrices for rotating third order ambisonic signals around the three axes in 3D space. For any direction, ambisonic signals can be modified through a matrix multiplication as follows:
-
B′=RB (10) - where R is the rotation matrix around an axis, and B and B′ are the original and modified ambisonic signals, respectively. In this embodiment, the positions of virtual speakers relative to the HMD remain unchanged and as such only one set of HRTFs is used for binaural rendering.
- The different embodiments of this disclosure provide adaptive HOA binaural rendering based on sound field rotation in 3D space. Contrary to channel-based methods, in one of the embodiments, only one set of HRTFs corresponding to a fixed playback setup can be used. In comparison to channel-based binaural rendering, HOA-based methods provide a higher quality if there is not a very large set of HRTFs available to the binaural renderer. Sound fields can be edited in the ambisonic domain for artistic purposes prior to rendering to headphones.
- An embodiment of this disclosure provides third order rotation matrices for N3D-encoded B-Format (used in MPEG audio material) for three axes, X, Y, and Z in the Cartesian domain. Rotation in a direction can be done by multiplying these rotation matrices.
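Composing per-axis matrices and applying B′=RB can be illustrated at first order, where the X, Y, and Z channels rotate like a Cartesian direction vector and W is unchanged. First order is chosen only to keep the matrices small; the third order matrices in the text are 16×16. The sign conventions below follow the yaw equations (46)-(48).

```python
import numpy as np

def yaw_matrix_fo(gamma):
    """First-order (W, X, Y, Z) ambisonic rotation around Z (yaw)."""
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0,   c,  -s, 0.0],
                     [0.0,   s,   c, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def pitch_matrix_fo(beta):
    """First-order rotation around Y (pitch), cyclic sign convention."""
    c, s = np.cos(beta), np.sin(beta)
    return np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0,   c, 0.0,   s],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0,  -s, 0.0,   c]])

def rotate_ambisonics(B, gamma, beta):
    """Equation (10): compose per-axis matrices, then B' = R B."""
    R = pitch_matrix_fo(beta) @ yaw_matrix_fo(gamma)
    return R @ B
```

With this convention, a plane wave encoded from azimuth 0 (B = [1, √3, 0, 0] in N3D) yawed by 90 degrees lands on the +Y axis (B′ = [1, 0, √3, 0]), as expected.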
- For rotation around the X axis (i.e., roll), the third order rotation matrix can be a 16×16 matrix. The sixteen original ambisonic signals can be labeled as follows:
-
B=[W, X, Y, Z, V, T, R, S, U, Q, O, M, K, L, N, P]. (11)
-
B′=[W′, X′, Y′, Z′, V′, T′, R′, S′, U′, Q′, O′, M′, K′, L′, N′, P′] (12) - Assuming an all-zero rotation matrix and only non-zero values in the rotation matrix, where α is the rotation angle around the X axis, the modified signals are as follows:
-
- For rotation around the Y axis (i.e., pitch), the third order rotation matrix can be a 16×16 matrix, where β is the rotation angle around the Y axis, the modified signals are as follows:
-
- For rotation around the Z axis (i.e., yaw), the third order rotation matrix can be a 16×16 matrix, where γ is the rotation angle around the Z axis, the modified signals are as follows:
-
W′=W (45) -
X′=cos(γ)X−sin(γ)Y (46) -
Y′=sin(γ)X+cos(γ)Y (47) -
Z′=Z (48) -
V′=cos(2γ)V+sin(2γ)U (49) -
T′=cos(γ)T+sin(γ)S (50) -
R′=R (51) -
S′=cos(γ)S−sin(γ)T (52) -
U′=cos(2γ)U−sin(2γ)V (53) -
Q′=cos(3γ)Q+sin(3γ)P (54) -
O′=cos(2γ)O+sin(2γ)N (55) -
M′=cos(γ)M+sin(γ)L (56) -
K′=K (57) -
L′=cos(γ)L−sin(γ)M (58) -
N′=cos(2γ)N−sin(2γ)O (59) -
P′=cos(3γ)P−sin(3γ)Q (60) - In the different embodiments, a head tracker or sensors can be mounted on the headphones to determine the head orientation, which is used to rotate the ambisonic signals or in another method change the positions of virtual speakers. In one embodiment, positions of virtual speakers are changed based on the head movement, and in another embodiment, a rotation matrix is used to rotate the ambisonic signals. One embodiment can be used for any ambisonic order. The other embodiment uses only one set of HRTFs and may use less computation as there is no need to change the positions of virtual speakers. If the binaural signals are generated directly from HOA signals (without mapping HOA signals to virtual loudspeakers), the embodiment using only one set of HRTFs will further reduce computation overhead.
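Equations (45)-(60) can be applied directly as a channel-wise transform rather than a full 16×16 matrix multiply. A sketch follows, assuming the channel ordering of equations (11)-(12).

```python
import numpy as np

def yaw_rotate_3rd_order(B, gamma):
    """Equations (45)-(60): rotate third order N3D ambisonic signals
    around the Z axis by gamma radians.

    B : (16, n_samples) array in the order
        [W, X, Y, Z, V, T, R, S, U, Q, O, M, K, L, N, P].
    """
    c1, s1 = np.cos(gamma), np.sin(gamma)
    c2, s2 = np.cos(2 * gamma), np.sin(2 * gamma)
    c3, s3 = np.cos(3 * gamma), np.sin(3 * gamma)
    W, X, Y, Z, V, T, R, S, U, Q, O, M, K, L, N, P = B
    return np.stack([
        W,                 # (45)
        c1 * X - s1 * Y,   # (46)
        s1 * X + c1 * Y,   # (47)
        Z,                 # (48)
        c2 * V + s2 * U,   # (49)
        c1 * T + s1 * S,   # (50)
        R,                 # (51)
        c1 * S - s1 * T,   # (52)
        c2 * U - s2 * V,   # (53)
        c3 * Q + s3 * P,   # (54)
        c2 * O + s2 * N,   # (55)
        c1 * M + s1 * L,   # (56)
        K,                 # (57)
        c1 * L - s1 * M,   # (58)
        c2 * N - s2 * O,   # (59)
        c3 * P - s3 * Q,   # (60)
    ])
```

Note how each channel only mixes with its partner of the same order and degree, scaled by cos/sin of γ, 2γ, or 3γ; rotating by γ and then −γ recovers the original signals.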
-
FIG. 5 illustrates process 500 for adaptive ambisonic binaural rendering according to this disclosure. The embodiment shown in FIG. 5 is for illustration only. Other embodiments could be used without departing from the scope of the present disclosure. A processor, such as processor 140 as shown in FIG. 1, can perform different steps of process 500. - At
step 505, a processor is configured to receive an audio signal, the audio signal comprising a plurality of ambisonic signals. At step 510, the processor is configured to identify an orientation of the UE based on the measured physical properties of the UE. Sensors can be configured to sense the physical properties of the UE. The physical properties could include, for example, touch input, e.g., on the headset or the HMD, camera information, gesture information, gyroscopic information, air pressure information, magnetic information, acceleration information, grip information, proximity information, color information, bio-physical information, temperature/humidity information, illumination information, UV information, Electromyography (EMG) information, Electroencephalogram (EEG) information, Electrocardiogram (ECG) information, IR information, ultrasound information, iris information, fingerprint information, etc. - At
step 515, the processor is configured to rotate the plurality of ambisonic signals based on the orientation of the UE. The processor can apply at least one rotation matrix to the plurality of ambisonic signals. The at least one rotation matrix comprises a rotation matrix for each axis of three axes. If the orientation includes a rotation in a direction, the processor can be configured to rotate the sound field of the plurality of ambisonic signals opposite the direction. The processor can also be configured to map the plurality of ambisonic signals to one or more virtual speakers of a sound field. - At
step 520, the processor is configured to filter the plurality of ambisonic signals using a plurality of head-related transfer functions to form speaker signals. The head-related transfer functions could be stored in a memory element. The plurality of head-related transfer functions comprises two head-related transfer functions used for any rotation of the plurality of ambisonic signals. At step 525, the processor is configured to output the speaker signals.
- Different embodiments of this disclosure recognize and take into account that production of multichannel audio content keeps growing and becomes more widespread. Playback systems currently in use are capable to play back only small number of audio channels such as the legacy 5.1 format. Therefore there is a need for high quality methods to downmix large number of audio channels. Prior downmix methods fall into two passive and active categories. Passive methods use fixed coefficients to combine input channels into output channels. The passive methods sometimes produce unsatisfactory results and cause audio artifacts and spatial and timbral distortions. On the other hand, active downmix methods adapt the downmix procedure to the input audio and reduce distortions caused by passive methods.
- In this disclosure, one or more embodiments provide an active downmix method based on the conversion of input multichannel audio to HOA signals. The playback setup can be independent of the input audio channels configuration. The HOA signals can be mapped to any speaker setup (e.g. smaller number of speakers, asymmetric configuration, etc.). One or more of the embodiments reduce common distortions such as coloration (i.e. comb filter effects) and loudness preservation to improve audio quality of the downmixed audio.
- An embodiment of this disclosure provides HOA based audio downmixing. In this embodiment, the input audio channels are decomposed in the spherical coordinates (i.e. mapped onto the spherical harmonic bases) to generate fourth order HOA signals as follows:
-
B=YSin (61) - where Y is a matrix of the fourth order spherical harmonics in the direction of the input channels, Sin is the matrix containing the input audio channels (except the low frequency effects (LFE) channels), and B is the matrix of HOA signals. The order of HOA signals can be increased to better represent the original sound field. A fourth order ambisonic representation would be sufficient for many sound fields and would reduce the computational load. The HOA signals can be mapped to any playback system using an HOA renderer as follows:
-
Sout=DB (62) - where D is the HOA renderer and Sout is the output audio channels. The input LFE channels are used in the downmixed output audio. Some sound images may be distorted when a larger number of channels are downmixed to a smaller playback system. An example embodiment provides a sound field on a smaller playback system with the best possible audio quality.
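The encode/render mapping of equations (61) and (62) can be sketched end-to-end. First order spherical harmonics are used here to keep the sketch short, whereas the text uses fourth order; the pseudoinverse decoder is one common choice for D and is an assumption, since the text leaves the renderer unspecified.

```python
import numpy as np

def sh_first_order(azimuths, elevations):
    """First-order real spherical harmonics (N3D, channels W, X, Y, Z)
    for directions given in radians. Returns a (4, n_dirs) matrix."""
    az = np.asarray(azimuths, dtype=float)
    el = np.asarray(elevations, dtype=float)
    return np.vstack([
        np.ones_like(az),
        np.sqrt(3) * np.cos(az) * np.cos(el),
        np.sqrt(3) * np.sin(az) * np.cos(el),
        np.sqrt(3) * np.sin(el),
    ])

def downmix(s_in, in_az, in_el, out_az, out_el):
    """Equations (61)-(62): encode input channels to HOA, then render
    to the output layout with a pseudoinverse decoder (assumed)."""
    Y_in = sh_first_order(in_az, in_el)        # harmonics at input directions
    B = Y_in @ s_in                            # (61): B = Y S_in
    Y_out = sh_first_order(out_az, out_el)     # harmonics at output directions
    D = np.linalg.pinv(Y_out)                  # (n_out, 4) renderer D
    return D @ B                               # (62): S_out = D B
```

For a single input channel at azimuth 0 rendered to a quad layout, most of the gain lands on the front speaker, which matches the intuition that the downmix preserves source direction as far as the output layout allows.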
- Different embodiments of this disclosure recognize and take into account that audio downmixing results in some distortions. Issues can be caused by 3D to 2D conversion wherein the sound field envelopment (from above) and accuracy of sound source vertical localization are degraded. Some other issues that might be observed include coloration, loudness distortion, spectral distortion, direct to ambient sound ratio, auditory masking, etc.
- Different embodiments of this disclosure recognize and take into account that coloration (i.e. comb filter effect) is caused by the addition of correlated signals where some frequency components are amplified or cancelled. That distortion in downmixed audio is observed when height channels are correlated with the horizontal channels but are not time-aligned (delayed by some msec). This misalignment occurs when a spaced microphone array is used to make multi-channel recordings of a sound field.
-
FIG. 6 illustrates a block diagram 600 for high order ambisonic downmixing according to an embodiment of this disclosure. The embodiment of the high order ambisonic downmixing illustrated in FIG. 6 is for illustration only. - At
block 602, a processor provides a correlation-based technique to adjust the Inter-Channel Time Delay (ICTD) between highly correlated input channels to reduce coloration. Since sound fields might consist of many sound sources, the processor divides input audio channels into subgroups based on the cross correlation. Channels with cross correlation greater than 0.2 are placed in the same group and then time aligned to the channel with the largest energy in that group. In one example embodiment, the maximum delay to be aligned is set to 10 msec. This maximum delay might not be caused by the distance between microphones in a microphone array, but might be caused by post-recording effects. One embodiment of this disclosure recognizes that there are large delays between channels in the MPEG 22.2-channel audio files, and sets the maximum delay at 10 msec. In one example embodiment, the processor does not align spectral components differently. - At
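The ICTD adjustment of block 602 can be sketched with normalized cross correlation. The 0.2 threshold and 10 msec limit come from the text; the 48 kHz sample rate (480 samples) is an assumption, and `np.roll` wraps the signal ends, which is acceptable for a sketch but not for production alignment.

```python
import numpy as np

def best_lag(a, b, max_lag):
    """Lag (in samples) maximizing the normalized cross correlation
    of a and b, searched over [-max_lag, max_lag]."""
    lags = np.arange(-max_lag, max_lag + 1)
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12
    corrs = [np.dot(a, np.roll(b, int(k))) / denom for k in lags]
    i = int(np.argmax(corrs))
    return int(lags[i]), corrs[i]

def align_group(channels, threshold=0.2, max_lag=480):
    """Block 602 sketch: time-align channels whose peak normalized
    cross correlation with the highest-energy channel exceeds the
    threshold. 480 samples is 10 msec at an assumed 48 kHz rate."""
    ref = int(np.argmax([np.dot(c, c) for c in channels]))
    aligned = [c.copy() for c in channels]
    for i, c in enumerate(channels):
        if i == ref:
            continue
        lag, corr = best_lag(channels[ref], c, max_lag)
        if corr > threshold:
            aligned[i] = np.roll(c, lag)   # delay-compensate toward ref
    return aligned, ref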
block 604, the processor can perform ambisonic conversion. Input multichannel audio is mapped to spherical harmonics to generate an ambisonic representation of the sound field. Atblock 606, the processor can map the high order ambisonics to virtual speakers. The ambisonic signals are mapped to any playback system with a smaller number of speakers. A number of common downmix distortions are discussed and solutions are introduced to reduce some distortions such as signal coloration. Informal listening tests have demonstrated the merit of the proposed method compared to direct audio downmixing. - In one example embodiment, in order to preserve the energy in the downmixed sound field, at
block 608, the energy of downmixed audio can be equalized in both spectral domain and space. In this example, energy distribution in a downmixed sound field can be more easily controlled. In the spectral domain, the energy of the downmixed channels is adjusted in the octave bands to make it equal the energy of the input sound field. The energy adjustment can also be done separately for the left and right channels to keep the energy ratio of the left and right channels the same as that in the input sound field - Some sound sources might be partially masked by louder sounds in a downmixed sound field, caused by the auditory masking effects in the frequency and/or time domain. Sound sources might be located at different location in the input sound field and therefore are audible. In a downmixed sound field, many sounds might be coming from the same direction and therefore auditory masking (both temporal and simultaneous) can be more effective. One way to reduce masking effects in a downmixed sound field is to apply different gains to input audio channels prior to downmixing to smaller number of channels
- One or more embodiments of this disclosure recognize and take into account that whitening of broadband sounds is another distortion observed in some downmixed sound fields. An example embodiment avoids adding uncorrelated speaker signals that have almost identical spectra. This technique works well for independent, identically distributed (i.i.d.) sources; for other sources (e.g., localized sources), the spectral correlation of the horizontal and height channels would be low. This makes it useful to replace the speaker signals in the height speakers. If there is a mixture of ambient sounds and localized sources in the height channels, the height speaker signals have to be decomposed into localized and ambient sounds, and then only the ambient sounds can be replaced (with proper energy adjustment).
- An embodiment of this disclosure provides an audio downmixing method in which input audio channels are transformed to HOA signals that can be mapped to any playback setup. This embodiment includes a spectral correction component to equalize the energy of the downmixed sound field in the left and right channels. Highly correlated input channels can be time-aligned to reduce coloration. This embodiment can be used to downmix multichannel audio files with different configurations (e.g., 22.2, 14.0, 11.1, and 9.0) to a standard 5.1 configuration. 5.1 audio files can also be converted to an irregular 5.1 format in which the loudspeakers are placed at irregular locations. As an extension to this example embodiment, HRTFs can be used to find the ear signals for the input and output sound fields, and the downmixed sound field that results in the least difference between those ear signals can then be selected.
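Beyond downmixing, the claims below describe rotating ambisonic signals to follow device orientation before HRTF filtering. For first-order ambisonics, a yaw rotation mixes only the two horizontal components and leaves W (omnidirectional) and Z (vertical) untouched. Sign conventions differ between ambisonic formats, so treat the matrix below as one common choice for illustration, not the patent's exact formulation:

```python
import numpy as np

def yaw_rotate_foa(w, x, y, z, theta):
    """Rotate a first-order (B-format) sound field by `theta` radians about z.

    W and Z are invariant under yaw; only X and Y mix. Assumed convention:
    positive theta rotates the field counterclockwise seen from above.
    """
    c, s = np.cos(theta), np.sin(theta)
    x_rot = c * x - s * y
    y_rot = s * x + c * y
    return w, x_rot, y_rot, z

# Head-tracked playback: if the head turns by +theta, rotate the field by
# -theta so that virtual sources stay fixed relative to the room.
```

Rotating by theta and then by -theta recovers the original signals, and the horizontal energy x² + y² is invariant, so the rotation can safely be reapplied per audio block as the head-tracker updates.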
- Although the present disclosure has been described with an exemplary embodiment, various changes and modifications may be suggested to one skilled in the art. It is intended that the present disclosure encompass such changes and modifications as fall within the scope of the appended claims.
Claims (20)
1. A user equipment (UE), the UE comprising:
a memory element configured to store a plurality of head-related transfer functions;
a processor configured to:
receive an audio signal, the audio signal comprising a plurality of ambisonic signals;
identify an orientation of the UE based on physical properties of the UE;
rotate the plurality of ambisonic signals based on the orientation of the UE;
filter the plurality of ambisonic signals using the plurality of head-related transfer functions to form speaker signals; and
output the speaker signals.
2. The UE of claim 1 , wherein the processor configured to rotate the plurality of ambisonic signals based on the orientation of the UE comprises the processor configured to:
apply at least one rotation matrix to the plurality of ambisonic signals.
3. The UE of claim 1 , wherein the processor is further configured to:
map the plurality of ambisonic signals to one or more virtual speakers of a sound field.
4. The UE of claim 3 , wherein, in response to the orientation of the UE including a rotation in a direction, the processor configured to rotate the plurality of ambisonic signals comprises the processor configured to rotate the sound field of the plurality of ambisonic signals opposite the direction.
5. The UE of claim 3 , wherein a position of the virtual speakers with respect to the UE remains unchanged.
6. The UE of claim 2 , wherein the at least one rotation matrix comprises a rotation matrix for each axis of three axes.
7. The UE of claim 1 , wherein the plurality of head-related transfer functions comprises two head-related transfer functions used for any rotation of the plurality of ambisonic signals.
8. The UE of claim 6 , wherein the rotation matrix for each axis is for third order ambisonic signals.
9. The UE of claim 1 , further comprising:
at least one sensor configured to measure the physical properties of the UE.
10. The UE of claim 1 , wherein the processor is further configured to receive the physical properties of the UE from at least one external sensor.
11. A method for audio signal processing, the method comprising:
receiving an audio signal, the audio signal comprising a plurality of ambisonic signals;
identifying an orientation of a user equipment (UE) based on physical properties of the UE;
rotating the plurality of ambisonic signals based on the orientation of the UE;
filtering the plurality of ambisonic signals using a plurality of head-related transfer functions to form speaker signals; and
outputting the speaker signals.
12. The method of claim 11 , wherein rotating the plurality of ambisonic signals based on the orientation of the UE comprises:
applying at least one rotation matrix to the plurality of ambisonic signals.
13. The method of claim 11 , further comprising:
mapping the plurality of ambisonic signals to one or more virtual speakers of a sound field.
14. The method of claim 13 , wherein, in response to the orientation of the UE including a rotation in a direction, rotating the plurality of ambisonic signals comprises rotating the sound field of the plurality of ambisonic signals opposite the direction.
15. The method of claim 13 , wherein a position of the virtual speakers with respect to the UE remains unchanged.
16. The method of claim 12 , wherein the at least one rotation matrix comprises a rotation matrix for each axis of three axes.
17. The method of claim 11 , wherein the plurality of head-related transfer functions comprises two head-related transfer functions used for any rotation of the plurality of ambisonic signals.
18. The method of claim 16 , wherein the rotation matrix for each axis is for third order ambisonic signals.
19. The method of claim 11 , further comprising:
measuring the physical properties of the UE.
20. The method of claim 11 , further comprising:
receiving the physical properties of the UE from at least one external sensor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/988,589 US9767618B2 (en) | 2015-01-28 | 2016-01-05 | Adaptive ambisonic binaural rendering |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562108774P | 2015-01-28 | 2015-01-28 | |
US201562108779P | 2015-01-28 | 2015-01-28 | |
US14/988,589 US9767618B2 (en) | 2015-01-28 | 2016-01-05 | Adaptive ambisonic binaural rendering |
Publications (2)
Publication Number | Publication Date |
---|---|
US20160241980A1 true US20160241980A1 (en) | 2016-08-18 |
US9767618B2 US9767618B2 (en) | 2017-09-19 |
Family
ID=56621622
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/988,589 Active 2036-03-20 US9767618B2 (en) | 2015-01-28 | 2016-01-05 | Adaptive ambisonic binaural rendering |
Country Status (1)
Country | Link |
---|---|
US (1) | US9767618B2 (en) |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170094439A1 (en) * | 2015-09-24 | 2017-03-30 | Lenovo (Beijing) Limited | Information processing method and electronic device |
US20170105082A1 (en) * | 2015-10-08 | 2017-04-13 | Qualcomm Incorporated | Conversion from channel-based audio to hoa |
CN106657617A (en) * | 2016-11-30 | 2017-05-10 | 努比亚技术有限公司 | Method for controlling playing of loudspeakers and mobile terminal |
CN107741783A (en) * | 2017-10-01 | 2018-02-27 | 上海量科电子科技有限公司 | electronic transfer method and system |
US20180091919A1 (en) * | 2016-09-23 | 2018-03-29 | Gaudio Lab, Inc. | Method and device for processing binaural audio signal |
KR20180044077A (en) * | 2016-10-21 | 2018-05-02 | 삼성전자주식회사 | In multimedia communication between terminal devices, method for transmitting audio signal and outputting audio signal and terminal device performing thereof |
WO2018093193A1 (en) * | 2016-11-17 | 2018-05-24 | Samsung Electronics Co., Ltd. | System and method for producing audio data to head mount display device |
US9992602B1 (en) * | 2017-01-12 | 2018-06-05 | Google Llc | Decoupled binaural rendering |
US10009704B1 (en) | 2017-01-30 | 2018-06-26 | Google Llc | Symmetric spherical harmonic HRTF rendering |
US20180184225A1 (en) * | 2016-12-23 | 2018-06-28 | Nxp B.V. | Processing audio signals |
CN108268257A (en) * | 2016-12-29 | 2018-07-10 | 福建省天奕网络科技有限公司 | Lines track method for drafting and system applied to VR scenes |
CN108346432A (en) * | 2017-01-25 | 2018-07-31 | 北京三星通信技术研究有限公司 | The processing method and relevant device of Virtual Reality audio |
WO2018149774A1 (en) * | 2017-02-15 | 2018-08-23 | Sennheiser Electronic Gmbh & Co. Kg | Method and device for processing a digital audio signal for binaural reproduction |
US10158963B2 (en) | 2017-01-30 | 2018-12-18 | Google Llc | Ambisonic audio with non-head tracked stereo based on head position and time |
WO2018234624A1 (en) * | 2017-06-21 | 2018-12-27 | Nokia Technologies Oy | Recording and rendering audio signals |
WO2019009085A1 (en) * | 2017-07-05 | 2019-01-10 | ソニー株式会社 | Signal processing device and method, and program |
US10249312B2 (en) | 2015-10-08 | 2019-04-02 | Qualcomm Incorporated | Quantization of spatial vectors |
WO2019063877A1 (en) * | 2017-09-29 | 2019-04-04 | Nokia Technologies Oy | Recording and rendering spatial audio signals |
US20190116440A1 (en) * | 2017-10-12 | 2019-04-18 | Qualcomm Incorporated | Rendering for computer-mediated reality systems |
US20190239015A1 (en) * | 2018-02-01 | 2019-08-01 | Qualcomm Incorporated | Scalable unified audio renderer |
US10390166B2 (en) * | 2017-05-31 | 2019-08-20 | Qualcomm Incorporated | System and method for mixing and adjusting multi-input ambisonics |
US10405126B2 (en) | 2017-06-30 | 2019-09-03 | Qualcomm Incorporated | Mixed-order ambisonics (MOA) audio data for computer-mediated reality systems |
US20190313198A1 (en) * | 2017-12-19 | 2019-10-10 | Spotify Ab | Audio content format selection |
US10515645B2 (en) * | 2015-07-30 | 2019-12-24 | Dolby Laboratories Licensing Corporation | Method and apparatus for transforming an HOA signal representation |
US10595148B2 (en) * | 2016-01-08 | 2020-03-17 | Sony Corporation | Sound processing apparatus and method, and program |
US10657974B2 (en) * | 2017-12-21 | 2020-05-19 | Qualcomm Incorporated | Priority information for higher order ambisonic audio data |
US10659906B2 (en) | 2017-01-13 | 2020-05-19 | Qualcomm Incorporated | Audio parallax for virtual reality, augmented reality, and mixed reality |
WO2020102994A1 (en) * | 2018-11-20 | 2020-05-28 | 深圳市欢太科技有限公司 | 3d sound effect realization method and apparatus, and storage medium and electronic device |
WO2020159602A1 (en) * | 2019-01-28 | 2020-08-06 | Embody Vr, Inc | Spatial audio is received from an audio server over a first communication link. the spatial audio is converted by a cloud spatial audio processing system into binaural audio. the binauralized audio is streamed from the cloud spatial audio processing system to a mobile station over a second communication link to cause the mobile station to play the binaural audio on the personal audio delivery device |
EP3618462A4 (en) * | 2017-04-26 | 2021-01-13 | Shenzhen Skyworth-RGB Electronic Co., Ltd. | Method and apparatus for processing audio data in sound field |
WO2021102137A1 (en) * | 2019-11-22 | 2021-05-27 | Qualcomm Incorporated | Soundfield adaptation for virtual reality audio |
US11076257B1 (en) | 2019-06-14 | 2021-07-27 | EmbodyVR, Inc. | Converting ambisonic audio to binaural audio |
EP3833055A4 (en) * | 2018-08-20 | 2021-09-22 | Huawei Technologies Co., Ltd. | Audio processing method and apparatus |
US11270711B2 | 2017-12-21 | 2022-03-08 | Qualcomm Incorporated | Higher order ambisonic audio data |
US11425497B2 (en) * | 2020-12-18 | 2022-08-23 | Qualcomm Incorporated | Spatial audio zoom |
US11743670B2 (en) | 2020-12-18 | 2023-08-29 | Qualcomm Incorporated | Correlation-based rendering with multiple distributed streams accounting for an occlusion for six degree of freedom applications |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8587631B2 (en) * | 2010-06-29 | 2013-11-19 | Alcatel Lucent | Facilitating communications using a portable communication device and directed sound output |
- 2016-01-05: US application US14/988,589 granted as US9767618B2 (active)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8587631B2 (en) * | 2010-06-29 | 2013-11-19 | Alcatel Lucent | Facilitating communications using a portable communication device and directed sound output |
Cited By (70)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10515645B2 (en) * | 2015-07-30 | 2019-12-24 | Dolby Laboratories Licensing Corporation | Method and apparatus for transforming an HOA signal representation |
US11043224B2 (en) | 2015-07-30 | 2021-06-22 | Dolby Laboratories Licensing Corporation | Method and apparatus for encoding and decoding an HOA representation |
US20170094439A1 (en) * | 2015-09-24 | 2017-03-30 | Lenovo (Beijing) Limited | Information processing method and electronic device |
US9986362B2 (en) * | 2015-09-24 | 2018-05-29 | Lenovo (Beijing) Limited | Information processing method and electronic device |
US20170105082A1 (en) * | 2015-10-08 | 2017-04-13 | Qualcomm Incorporated | Conversion from channel-based audio to hoa |
US9961467B2 (en) * | 2015-10-08 | 2018-05-01 | Qualcomm Incorporated | Conversion from channel-based audio to HOA |
US10249312B2 (en) | 2015-10-08 | 2019-04-02 | Qualcomm Incorporated | Quantization of spatial vectors |
US10595148B2 (en) * | 2016-01-08 | 2020-03-17 | Sony Corporation | Sound processing apparatus and method, and program |
US10659904B2 (en) * | 2016-09-23 | 2020-05-19 | Gaudio Lab, Inc. | Method and device for processing binaural audio signal |
US20180091919A1 (en) * | 2016-09-23 | 2018-03-29 | Gaudio Lab, Inc. | Method and device for processing binaural audio signal |
US20190335287A1 (en) * | 2016-10-21 | 2019-10-31 | Samsung Electronics., Ltd. | Method for transmitting audio signal and outputting received audio signal in multimedia communication between terminal devices, and terminal device for performing same |
EP3531695A4 (en) * | 2016-10-21 | 2019-11-06 | Samsung Electronics Co., Ltd. | Method for transmitting audio signal and outputting received audio signal in multimedia communication between terminal devices, and terminal device for performing same |
US10972854B2 (en) * | 2016-10-21 | 2021-04-06 | Samsung Electronics Co., Ltd. | Method for transmitting audio signal and outputting received audio signal in multimedia communication between terminal devices, and terminal device for performing same |
KR102277438B1 (en) * | 2016-10-21 | 2021-07-14 | 삼성전자주식회사 | In multimedia communication between terminal devices, method for transmitting audio signal and outputting audio signal and terminal device performing thereof |
KR20180044077A (en) * | 2016-10-21 | 2018-05-02 | 삼성전자주식회사 | In multimedia communication between terminal devices, method for transmitting audio signal and outputting audio signal and terminal device performing thereof |
US11026024B2 (en) | 2016-11-17 | 2021-06-01 | Samsung Electronics Co., Ltd. | System and method for producing audio data to head mount display device |
WO2018093193A1 (en) * | 2016-11-17 | 2018-05-24 | Samsung Electronics Co., Ltd. | System and method for producing audio data to head mount display device |
CN106657617A (en) * | 2016-11-30 | 2017-05-10 | 努比亚技术有限公司 | Method for controlling playing of loudspeakers and mobile terminal |
US10602297B2 (en) * | 2016-12-23 | 2020-03-24 | Nxp B.V. | Processing audio signals |
US20180184225A1 (en) * | 2016-12-23 | 2018-06-28 | Nxp B.V. | Processing audio signals |
CN108268257A (en) * | 2016-12-29 | 2018-07-10 | 福建省天奕网络科技有限公司 | Lines track method for drafting and system applied to VR scenes |
US9992602B1 (en) * | 2017-01-12 | 2018-06-05 | Google Llc | Decoupled binaural rendering |
US10952009B2 (en) | 2017-01-13 | 2021-03-16 | Qualcomm Incorporated | Audio parallax for virtual reality, augmented reality, and mixed reality |
US10659906B2 (en) | 2017-01-13 | 2020-05-19 | Qualcomm Incorporated | Audio parallax for virtual reality, augmented reality, and mixed reality |
KR20200067981A (en) * | 2017-01-25 | 2020-06-15 | 삼성전자주식회사 | Method for processing vr audio and corresponding equipment |
EP3569001A4 (en) * | 2017-01-25 | 2020-07-22 | Samsung Electronics Co., Ltd. | Method for processing vr audio and corresponding equipment |
WO2018139884A1 (en) | 2017-01-25 | 2018-08-02 | Samsung Electronics Co., Ltd. | Method for processing vr audio and corresponding equipment |
KR102462067B1 (en) * | 2017-01-25 | 2022-11-02 | 삼성전자주식회사 | Method for processing vr audio and corresponding equipment |
US10750305B2 (en) | 2017-01-25 | 2020-08-18 | Samsung Electronics Co., Ltd. | Method for processing VR audio and corresponding equipment |
CN108346432A (en) * | 2017-01-25 | 2018-07-31 | 北京三星通信技术研究有限公司 | The processing method and relevant device of Virtual Reality audio |
US10009704B1 (en) | 2017-01-30 | 2018-06-26 | Google Llc | Symmetric spherical harmonic HRTF rendering |
US10158963B2 (en) | 2017-01-30 | 2018-12-18 | Google Llc | Ambisonic audio with non-head tracked stereo based on head position and time |
WO2018149774A1 (en) * | 2017-02-15 | 2018-08-23 | Sennheiser Electronic Gmbh & Co. Kg | Method and device for processing a digital audio signal for binaural reproduction |
EP3618462A4 (en) * | 2017-04-26 | 2021-01-13 | Shenzhen Skyworth-RGB Electronic Co., Ltd. | Method and apparatus for processing audio data in sound field |
US10966026B2 (en) | 2017-04-26 | 2021-03-30 | Shenzhen Skyworth-Rgb Electronic Co., Ltd. | Method and apparatus for processing audio data in sound field |
US10390166B2 (en) * | 2017-05-31 | 2019-08-20 | Qualcomm Incorporated | System and method for mixing and adjusting multi-input ambisonics |
WO2018234624A1 (en) * | 2017-06-21 | 2018-12-27 | Nokia Technologies Oy | Recording and rendering audio signals |
US11632643B2 (en) | 2017-06-21 | 2023-04-18 | Nokia Technologies Oy | Recording and rendering audio signals |
US10405126B2 (en) | 2017-06-30 | 2019-09-03 | Qualcomm Incorporated | Mixed-order ambisonics (MOA) audio data for computer-mediated reality systems |
JPWO2019009085A1 (en) * | 2017-07-05 | 2020-04-30 | ソニー株式会社 | Signal processing device and method, and program |
JP7115477B2 (en) | 2017-07-05 | 2022-08-09 | ソニーグループ株式会社 | SIGNAL PROCESSING APPARATUS AND METHOD, AND PROGRAM |
US11252524B2 (en) * | 2017-07-05 | 2022-02-15 | Sony Corporation | Synthesizing a headphone signal using a rotating head-related transfer function |
WO2019009085A1 (en) * | 2017-07-05 | 2019-01-10 | ソニー株式会社 | Signal processing device and method, and program |
US11606661B2 (en) | 2017-09-29 | 2023-03-14 | Nokia Technologies Oy | Recording and rendering spatial audio signals |
WO2019063877A1 (en) * | 2017-09-29 | 2019-04-04 | Nokia Technologies Oy | Recording and rendering spatial audio signals |
CN107741783A (en) * | 2017-10-01 | 2018-02-27 | 上海量科电子科技有限公司 | electronic transfer method and system |
US20190116440A1 (en) * | 2017-10-12 | 2019-04-18 | Qualcomm Incorporated | Rendering for computer-mediated reality systems |
US10469968B2 (en) * | 2017-10-12 | 2019-11-05 | Qualcomm Incorporated | Rendering for computer-mediated reality systems |
CN111183658A (en) * | 2017-10-12 | 2020-05-19 | 高通股份有限公司 | Rendering for computer-mediated reality systems |
US11044569B2 (en) * | 2017-12-19 | 2021-06-22 | Spotify Ab | Audio content format selection |
US20210345056A1 (en) * | 2017-12-19 | 2021-11-04 | Spotify Ab | Audio content format selection |
US11683654B2 (en) * | 2017-12-19 | 2023-06-20 | Spotify Ab | Audio content format selection |
US20190313198A1 (en) * | 2017-12-19 | 2019-10-10 | Spotify Ab | Audio content format selection |
US10657974B2 (en) * | 2017-12-21 | 2020-05-19 | Qualcomm Incorporated | Priority information for higher order ambisonic audio data |
US11270711B2 | 2017-12-21 | 2022-03-08 | Qualcomm Incorporated | Higher order ambisonic audio data |
WO2019152783A1 (en) * | 2018-02-01 | 2019-08-08 | Qualcomm Incorporated | Scalable unified audio renderer |
US20190239015A1 (en) * | 2018-02-01 | 2019-08-01 | Qualcomm Incorporated | Scalable unified audio renderer |
US11395083B2 (en) * | 2018-02-01 | 2022-07-19 | Qualcomm Incorporated | Scalable unified audio renderer |
CN111670583A (en) * | 2018-02-01 | 2020-09-15 | 高通股份有限公司 | Scalable unified audio renderer |
EP3833055A4 (en) * | 2018-08-20 | 2021-09-22 | Huawei Technologies Co., Ltd. | Audio processing method and apparatus |
US11910180B2 (en) | 2018-08-20 | 2024-02-20 | Huawei Technologies Co., Ltd. | Audio processing method and apparatus |
US11611841B2 (en) | 2018-08-20 | 2023-03-21 | Huawei Technologies Co., Ltd. | Audio processing method and apparatus |
WO2020102994A1 (en) * | 2018-11-20 | 2020-05-28 | 深圳市欢太科技有限公司 | 3d sound effect realization method and apparatus, and storage medium and electronic device |
WO2020159602A1 (en) * | 2019-01-28 | 2020-08-06 | Embody Vr, Inc | Spatial audio is received from an audio server over a first communication link. the spatial audio is converted by a cloud spatial audio processing system into binaural audio. the binauralized audio is streamed from the cloud spatial audio processing system to a mobile station over a second communication link to cause the mobile station to play the binaural audio on the personal audio delivery device |
US11617051B2 (en) | 2019-01-28 | 2023-03-28 | EmbodyVR, Inc. | Streaming binaural audio from a cloud spatial audio processing system to a mobile station for playback on a personal audio delivery device |
US11076257B1 (en) | 2019-06-14 | 2021-07-27 | EmbodyVR, Inc. | Converting ambisonic audio to binaural audio |
US11317236B2 (en) | 2019-11-22 | 2022-04-26 | Qualcomm Incorporated | Soundfield adaptation for virtual reality audio |
WO2021102137A1 (en) * | 2019-11-22 | 2021-05-27 | Qualcomm Incorporated | Soundfield adaptation for virtual reality audio |
US11425497B2 (en) * | 2020-12-18 | 2022-08-23 | Qualcomm Incorporated | Spatial audio zoom |
US11743670B2 (en) | 2020-12-18 | 2023-08-29 | Qualcomm Incorporated | Correlation-based rendering with multiple distributed streams accounting for an occlusion for six degree of freedom applications |
Also Published As
Publication number | Publication date |
---|---|
US9767618B2 (en) | 2017-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9767618B2 (en) | Adaptive ambisonic binaural rendering | |
US11765541B2 (en) | Audio spatialization | |
US10728683B2 (en) | Sweet spot adaptation for virtualized audio | |
US20190349705A9 (en) | Graphical user interface to adapt virtualizer sweet spot | |
US10819953B1 (en) | Systems and methods for processing mixed media streams | |
US11309947B2 (en) | Systems and methods for maintaining directional wireless links of motile devices | |
JP2020510341A (en) | Distributed audio virtualization system | |
TWI709131B (en) | Audio scene processing | |
JP2021535632A (en) | Methods and equipment for processing audio signals | |
US11696087B2 (en) | Emphasis for audio spatialization | |
CN114072792A (en) | Cryptographic-based authorization for audio rendering | |
JP2022547253A (en) | Discrepancy audiovisual acquisition system | |
CN110881157B (en) | Sound effect control method and sound effect output device for orthogonal base correction | |
TWI683582B (en) | Sound effect controlling method and sound outputting device with dynamic gain | |
US11445299B2 (en) | Rendering binaural audio over multiple near field transducers | |
WO2022151336A1 (en) | Techniques for around-the-ear transducers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NAJAF-ZADEH, HOSSEIN;WOODWARD, BARRY;REEL/FRAME:037414/0388 Effective date: 20160105 |
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |