US20160241980A1 - Adaptive ambisonic binaural rendering - Google Patents

Adaptive ambisonic binaural rendering

Info

Publication number
US20160241980A1
Authority
US
United States
Prior art keywords
signals
ambisonic
cos
sin
ambisonic signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US14/988,589
Other versions
US9767618B2 (en
Inventor
Hossein Najaf-Zadeh
Barry Woodward
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Priority to US14/988,589 priority Critical patent/US9767618B2/en
Assigned to SAMSUNG ELECTRONICS CO., LTD. reassignment SAMSUNG ELECTRONICS CO., LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: NAJAF-ZADEH, HOSSEIN, WOODWARD, BARRY
Publication of US20160241980A1 publication Critical patent/US20160241980A1/en
Application granted granted Critical
Publication of US9767618B2 publication Critical patent/US9767618B2/en
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S3/00: Systems employing more than two channels, e.g. quadraphonic
    • H04S3/02: Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012: Head tracking input arrangements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03: Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/033: Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/038: Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487: Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488: Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00: Manipulating 3D models or images for computer graphics
    • G06T19/006: Mixed reality
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H04S7/302: Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303: Tracking of listener position or orientation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00: Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/038: Indexing scheme relating to G06F3/038
    • G06F2203/0381: Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2420/00: Techniques used stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01: Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2420/00: Techniques used stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/11: Application of ambisonics in stereophonic audio systems

Definitions

  • Ambisonics is an effective technique to encode and reconstruct sound fields. The technique is based on the orthogonal decomposition of a sound field in spherical coordinates in 3D space, or a cylindrical decomposition in 2D space. In the decoding process, the ambisonic signals are decoded to produce speaker signals. The higher the ambisonic order, the finer the reconstruction of the sound field. Ambisonics provides significant flexibility to recreate 3D audio on any playback setup, such as a large number of loudspeakers or headphones. In mobile applications and Head-Mounted Displays (HMD) in particular, ambisonic rendering to headphones is of great interest.
  • HMD Head-Mounted Displays
  • a user equipment includes a memory element and a processor.
  • the memory element is configured to store a plurality of head-related transfer functions.
  • the processor is configured to receive an audio signal.
  • the audio signal includes a plurality of ambisonic signals.
  • the processor is also configured to identify an orientation of the UE based on physical properties of the UE.
  • the processor is also configured to rotate the plurality of ambisonic signals based on the orientation of the UE.
  • the processor is also configured to filter the plurality of ambisonic signals using the plurality of head-related transfer functions to form speaker signals.
  • the processor is also configured to output the speaker signals.
  • a method for audio signal processing.
  • the method includes receiving an audio signal.
  • the audio signal includes a plurality of ambisonic signals.
  • the method also includes identifying an orientation of the UE based on physical properties of the UE.
  • the method also includes rotating the plurality of ambisonic signals based on the orientation of the UE.
  • the method also includes filtering the plurality of ambisonic signals using a plurality of head-related transfer functions to form speaker signals.
  • the method also includes outputting the speaker signals.
  • FIG. 1 illustrates an example HMD according to embodiments of the present disclosure and in which embodiments of the present disclosure may be implemented
  • FIG. 2 illustrates an example view with content in an HMD according to an embodiment of this disclosure
  • FIG. 3 illustrates an example Cartesian domain with respect to a user according to an embodiment of this disclosure
  • FIG. 4 illustrates a block diagram for adaptive ambisonic binaural rendering according to an embodiment of this disclosure
  • FIG. 1 illustrates an example HMD 100 according to embodiments of the present disclosure and in which embodiments of the present disclosure may be implemented.
  • the embodiment of the HMD 100 illustrated in FIG. 1 is for illustration only; the HMD 100 comes in a wide variety of configurations, and FIG. 1 does not limit the scope of this disclosure to any particular implementation of an HMD.
  • the processor 140 is also capable of executing other processes and programs resident in the memory 160 .
  • the processor 140 can move data into or out of the memory 160 as required by an executing process.
  • the processor 140 is configured to execute the applications 162 based on the OS 161 or in response to signals received from eNBs or an operator.
  • the processor 140 is also coupled to the I/O interface 145 , which provides the HMD 100 with the ability to connect to other devices, such as laptop computers and handheld computers.
  • the I/O interface 145 is the communication path between these accessories and the processor 140 .
  • the memory 160 is coupled to the processor 140 .
  • Part of the memory 160 could include a random access memory (RAM), and another part of the memory 160 could include a Flash memory or other read-only memory (ROM).
  • RAM random access memory
  • ROM read-only memory
  • the sensor(s) 165 can further include a control circuit for controlling at least one of the sensors included therein. As will be discussed in greater detail below, one or more of these sensor(s) 165 may be used to control audio rendering, determine the orientation and facing direction of the user for 3D content display identification, etc. Any of these sensor(s) 165 may be located within the HMD 100 , within a headset configured to hold the HMD 100 , or in both the headset and HMD 100 , for example, in embodiments where the HMD 100 includes a headset.
  • the touchscreen 150 can include a touch panel, a (digital) pen sensor, a key, or an ultrasonic input device.
  • the touchscreen 150 can recognize, for example, a touch input in at least one scheme among a capacitive scheme, a pressure sensitive scheme, an infrared scheme, or an ultrasonic scheme.
  • the touchscreen 150 can also include a control circuit. In the capacitive scheme, the touchscreen 150 can recognize touch or proximity.
  • the HMD 100 may include circuitry and applications for providing 3D audio for an HMD.
  • FIG. 1 illustrates one example of HMD 100
  • various changes may be made to FIG. 1 .
  • various components in FIG. 1 could be combined, further subdivided, or omitted and additional components could be added according to particular needs.
  • the processor 140 could be divided into multiple processors, such as one or more central processing units (CPUs) and one or more graphics processing units (GPUs).
  • FIG. 1 illustrates the HMD 100 configured as a mobile telephone, tablet, or smartphone, the HMD 100 could be configured to operate as other types of mobile or stationary devices.
  • Embodiments of the present disclosure provide an adaptive ambisonic binaural rendering framework for stereoscopic 3D VR or AR applications on the HMD 100 .
  • the user's head motion, i.e., the movement of the HMD 100, is tracked using sensor(s) 165 in the HMD 100 and used to control the binaural rendering.
  • ambisonic signals are rotated according to the HMD 100 orientation and then mapped to virtual speakers located at fixed positions.
  • the rotated ambisonic signals and a fixed set of Head-Related Transfer Functions (HRTF) are used to produce ear signals.
  • HRTF Head-Related Transfer Functions
  • ambisonic rendering can be adapted to the HMD 100 orientation to recreate an original sound field.
  • Various embodiments of this disclosure provide a system for adaptive ambisonic binaural rendering to make audio scene independent from head movement. Binaural ambisonic rendering can be done through mapping of ambisonic signals to virtual speakers and then filtering each loudspeaker signal with a pair of Head-Related Transfer Functions (HRTF) corresponding to the position of the virtual speakers (relative to the head).
  • HRTF Head-Related Transfer Functions
  • the positions of virtual speakers remain unchanged, and for each new HMD orientation, the original ambisonic signals and a new set of HRTFs are used to produce ear signals.
  • the positions of virtual speakers are changed according to HMD orientation to make the audio scene independent from head movement.
  • the original ambisonic signals and a new set of HRTFs corresponding to the positions of the speakers are used to produce speaker signals.
  • ambisonic signals are rotated according to the HMD (or head) orientation and then mapped to virtual speakers located at fixed positions. Then the rotated ambisonic signals and a fixed set of HRTFs are used to produce ear signals.
  • This embodiment is advantageous as it needs only one set of HRTFs for binaural rendering, one HRTF for each headphone speaker (or ear).
  • FIG. 2 illustrates an example view 202 with content in an HMD 100 according to an embodiment of this disclosure.
  • a user is wearing the HMD 100 and is seeing the view 202 .
  • the view 202 includes a ninety-six degree viewing angle. In different embodiments, other viewing angles can be used.
  • an HMD 204 with a mega-sized screen and a ninety-six degree viewing angle allows users to feel the world beyond peripheral vision.
  • the user may desire to seamlessly switch between the VR world and the real world.
  • a user is watching a movie in HMD 100 and wants to write an email.
  • the user can draft the email in the VR environment without removing the HMD 100 .
  • the mobile device can display the mobile device environment in the VR world.
  • Various embodiments of the present disclosure provide content within an angular range that is wider than the user's current 3D view frustum 310 .
  • the angular range 315 e.g., on the x-z plane assuming a Cartesian coordinate system with the x direction generally denoting left/right or yaw, the y direction generally denoting forward/backwards, and the z direction generally denoting up/down or pitch
  • the HMD 100 displays, either actually or virtually (i.e., not actually displayed on the display 155 but actually displayed when the HMD 100 is moved to a location where the element is virtually displayed), some UI elements 305 outside the current 3D view frustum 310 . However, the HMD 100 places these UI elements 305 within the angular range 315 for the UI so that the user would not have to turn the head too much to the left or the right (i.e., yaw or x movement) to see all displayed UI elements 305 .
  • the HMD 100 places the elements within the user's current 3D view frustum, i.e., the portion of the total viewable 3D space that is currently viewable by the user as a result of the HMD's 100 current detected orientation and facing direction.
  • the HMD 100 detects the user's head motions, i.e., the movement of the HMD 100 , using the sensor(s) 165 on the HMD 100 and/or headset, such as, for example, a gyroscope, an accelerometer, etc.
  • the HMD 100 displays the UI elements 305 as well as other elements of the display (e.g., content) to respond to the head motions to simulate looking at and interacting with the real-world view and objects.
  • Rotation matrices for up to second order ambisonics are identified.
  • Many ambisonic recordings are third and higher order.
  • Another issue would be real-time binaural rendering with no discontinuities (in time and space) while changing the ambisonic signals according to the head movement.
  • FIG. 3 illustrates an example Cartesian domain 300 with respect to a user 305 according to an embodiment of this disclosure.
  • a user 305 is seen without wearing the HMD 100 , but could be wearing the HMD 100 .
  • the coordinates in the Cartesian domain 300 may also be considered with respect to the HMD 100 .
  • the axes X, Y, and Z can be in positive and negative directions.
  • the user 305 can also rotate within the Cartesian domain 300 .
  • One or more embodiments of this disclosure provide different techniques for adaptive ambisonic rendering.
  • the different techniques are based on the equivalence of an HMD rotation in one direction and a sound field rotation in the opposite direction.
  • One embodiment is based on changing the location of virtual speakers to make the reproduced sound field independent from head movement.
  • positions of virtual speakers are changed sequentially for rotation around the three axes X, Y, and Z in the Cartesian domain 320 .
  • This embodiment can be used to do adaptive binaural rendering for any ambisonic order.
  • a new rotation matrix for third order ambisonics is applied. This embodiment rotates ambisonics (for example, up to third order) in any direction in 3D space through simple matrix multiplication and can use only one set of HRTFs.
  • fourth order, or higher, ambisonics can be used as well.
  • One or more embodiments of this disclosure provide rotating ambisonic signals according to the orientation of an HMD.
  • the orientation can be the orientation within a virtual reality defined by three axes in the Cartesian domain (i.e. X, Y, and Z) 300 .
  • the orientation can also include a position or location within the virtual reality.
  • the location and orientation of the HMD can be determined using sensor(s) 165 as shown in FIG. 1 .
  • the head of the user 305 can be tracked instead of the HMD.
  • the tracking can be performed by sensor(s) 165 as shown in FIG. 1 or by external camera systems.
  • FIG. 4 illustrates a block diagram 400 for adaptive ambisonic binaural rendering according to an embodiment of this disclosure.
  • the embodiment of the adaptive ambisonic binaural rendering illustrated in FIG. 4 is for illustration only.
  • a rotation matrix can be applied to ambisonic signals.
  • the rotation matrix is determined based on a head position or HMD orientation.
  • Sensors in the HMD, or external systems such as camera systems or infrared detectors, can identify the orientation, and a processor can select a rotation matrix based on the orientation.
  • the processor can perform ambisonic rendering by mapping the positions of the virtual speakers.
  • a processor can perform binaural filtering by applying the HRTFs to the ambisonic signals to produce binaural signals, or speaker signals. The same HRTFs can be applied no matter the orientation of the HMD.
  • One or more embodiments of this disclosure also provide adaptive binaural rendering by relocating virtual speakers.
  • Algorithms for sound field rotation around the X, Y, and Z axes are provided by this disclosure.
  • a sound field can be rotated sequentially for rotation around the X axis (i.e. roll), Y axis (i.e. pitch), and Z axis (i.e. yaw).
  • θ and θ′ are the original and the modified azimuth, respectively.
  • the new position of the virtual speaker is given by:
  • the HOA signals are mapped to virtual speakers at the new locations and filtered with the corresponding HRTFs to find the binaural signals.
  • the new position of the virtual speaker is given by:
  • the HOA signals are mapped to virtual speakers at the new locations and filtered with the corresponding HRTFs to find the binaural signals.
  • One or more embodiments of this disclosure provide another technique to rotate a sound field using a rotation matrix.
  • This embodiment provides three new rotation matrices for rotating third order ambisonic signals around the three axes in 3D space.
  • ambisonic signals can be modified through a matrix multiplication as follows:
  • R is the rotation matrix around an axis
  • B and B′ are the original and modified ambisonic signals, respectively.
  • the positions of virtual speakers relative to the HMD remain unchanged and as such only one set of HRTFs is used for binaural rendering.
  • the different embodiments of this disclosure provide adaptive HOA binaural rendering based on sound field rotation in 3D space. Contrary to channel-based methods, in one of the embodiments, only one set of HRTFs corresponding to a fixed playback setup can be used. In comparison to channel-based binaural rendering, HOA-based methods provide a higher quality if there is not a very large set of HRTFs available to the binaural renderer. Sound fields can be edited in the ambisonic domain for artistic purposes prior to rendering to headphones.
  • An embodiment of this disclosure provides third order rotation matrices for N3D-encoded B-Format (used in MPEG audio material) for three axes, X, Y, and Z in the Cartesian domain. Rotation in a direction can be done by multiplying these rotation matrices.
  • the third order rotation matrix can be a 16 ⁇ 16 matrix.
  • the sixteen modified ambisonic signals can be labeled as follows:
  • B′=[W′, X′, Y′, Z′, V′, T′, R′, S′, U′, Q′, O′, M′, K′, L′, N′, P′]   (12)
  • For rotation around the Z axis (i.e., yaw), the third order rotation matrix can be a 16×16 matrix, where γ is the rotation angle around the Z axis; the modified signals include, for example:
  • V′=cos(2γ)V+sin(2γ)U   (49)
  • N′=cos(2γ)N−sin(2γ)O   (59)
  • FIG. 5 illustrates a process 500 for adaptive ambisonic binaural rendering according to this disclosure.
  • the embodiment shown in FIG. 5 is for illustration only. Other embodiments could be used without departing from the scope of the present disclosure.
  • a processor, such as processor 140 shown in FIG. 1, can perform different steps of process 500.
  • a processor is configured to receive an audio signal, the audio signal comprising a plurality of ambisonic signals.
  • the processor is configured to identify an orientation of the UE based on the measured physical properties of the UE.
  • Sensors can be configured to sense the physical properties of the UE.
  • the physical properties could include, for example, a touch input on the headset or the HMD, camera information, gesture information, gyroscopic information, air pressure information, magnetic information, acceleration information, grip information, proximity information, color information, bio-physical information, temperature/humidity information, illumination information, UV information, Electromyography (EMG) information, Electroencephalogram (EEG) information, Electrocardiogram (ECG) information, IR information, ultrasound information, iris information, fingerprint information, etc.
  • EMG Electromyography
  • EEG Electroencephalogram
  • ECG Electrocardiogram
  • the processor is configured to rotate the plurality of ambisonic signals based on the orientation of the UE.
  • the processor can apply at least one rotation matrix to the plurality of ambisonic signals.
  • the at least one rotation matrix comprises a rotation matrix for each axis of three axes. If the orientation includes a rotation in a direction, the processor can be configured to rotate the sound field of the plurality of ambisonic signals opposite the direction.
  • the processor can also be configured to map the plurality of ambisonic signals to one or more virtual speakers of a sound field.
  • the processor is configured to filter the plurality of ambisonic signals using a plurality of head-related transfer functions to form speaker signals.
  • the head related transfer functions could be stored in a memory element.
  • the plurality of head-related transfer functions comprises two head-related transfer functions used for any rotation of the plurality of ambisonic signals.
  • the processor is configured to output the speaker signals.
  • One or more embodiments of this disclosure provide multichannel audio downmixing via ambisonic conversion.
  • An embodiment provides a novel audio downmixing method based on ambisonic mapping. Input multichannel audio is mapped to spherical harmonics to generate an ambisonic representation of the sound field. The ambisonic signals are mapped to any playback system with a smaller number of speakers. A number of common downmix distortions are discussed and solutions are introduced to reduce some distortions such as signal coloration. Informal listening tests have demonstrated the merit of the proposed method compared to direct audio downmixing.
  • one or more embodiments provide an active downmix method based on the conversion of input multichannel audio to HOA signals.
  • the playback setup can be independent of the input audio channel configuration.
  • the HOA signals can be mapped to any speaker setup (e.g. smaller number of speakers, asymmetric configuration, etc.).
  • One or more of the embodiments reduce common distortions such as coloration (i.e. comb filter effects) and preserve loudness to improve the audio quality of the downmixed audio.
  • An embodiment of this disclosure provides HOA based audio downmixing.
  • the input audio channels are decomposed in the spherical coordinates (i.e. mapped onto the spherical harmonic bases) to generate fourth order HOA signals as follows:
  • HOA high order ambisonic
  • S_in is the matrix containing the input audio channels (except the low frequency effect (LFE) channels)
  • B is the matrix of HOA signals.
  • the order of the HOA signals can be increased to better represent the original sound field. A fourth order ambisonic representation would be sufficient for many sound fields and would reduce the computational load.
  • the HOA signals can be mapped to any playback system using an HOA renderer as follows:
  • D is the HOA renderer and S_out is the matrix of output audio channels.
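The extracted text above omits the two equations themselves; from the surrounding definitions they are presumably B = Y·S_in (encoding the input channels on the spherical harmonic bases) and S_out = D·B (rendering). A minimal numpy sketch under that assumption, where the caller supplies Y, the matrix of fourth order spherical harmonics evaluated at the input channel directions:

```python
import numpy as np

def hoa_downmix(S_in, Y, D):
    """Sketch of the HOA-based downmix described above.

    S_in: (n_in, n_samples) input audio channels (LFE channels excluded).
    Y:    (25, n_in) fourth order spherical harmonics at the input directions.
    D:    (n_out, 25) HOA renderer for the (smaller) target speaker layout.
    """
    B = Y @ S_in    # presumed encoding step: B = Y * S_in (HOA signals)
    return D @ B    # presumed rendering step: S_out = D * B
```

Because the HOA representation sits between input and output, the same B can be rendered to any layout simply by swapping D, which is the independence from the input channel configuration noted above.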
  • the input LFE channels are used in the downmixed output audio. Some sound images may be distorted when a larger number of channels are downmixed to a smaller playback system.
  • An example embodiment provides a sound field on a smaller playback system with the best possible audio quality.
  • Audio downmixing results in some distortions. Issues can be caused by 3D-to-2D conversion, wherein the sound field envelopment (from above) and the accuracy of vertical localization of sound sources are degraded. Other issues that might be observed include coloration, loudness distortion, spectral distortion, changes in the direct-to-ambient sound ratio, auditory masking, etc.
  • a processor provides a correlation-based technique to adjust the Inter-Channel Time Delay (ICTD) between highly correlated input channels to reduce coloration. Since sound fields might consist of many sound sources, the processor divides input audio channels into subgroups based on the cross correlation. Channels with cross correlation greater than 0.2 are placed in the same group and then time aligned to the channel with the largest energy in that group. In one example embodiment, the maximum delay to be aligned is set to 10 msec. This maximum delay might not be caused by the distance between microphones in a microphone array, but might be caused by post-recording effects. One embodiment of this disclosure recognizes that there are large delays between channels in the MPEG 22.2-channel audio files, and sets the maximum delay at 10 msec. In one example embodiment, the processor does not align spectral components differently.
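A rough numpy sketch of this grouping-and-alignment step (the pairwise grouping heuristic and the np.roll-based delay compensation are simplifications assumed here, not details given in the source):

```python
import numpy as np

def align_correlated_channels(chans, fs, corr_thresh=0.2, max_delay_s=0.010):
    """Group channels with normalized cross correlation > 0.2 and time-align
    each group to its largest-energy member, searching delays up to 10 ms."""
    n, max_lag = len(chans), int(max_delay_s * fs)
    groups, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, n):
            if j in assigned:
                continue
            xc = np.correlate(chans[i], chans[j], mode="full")
            norm = np.sqrt(np.dot(chans[i], chans[i]) * np.dot(chans[j], chans[j]))
            if norm > 0 and np.max(np.abs(xc)) / norm > corr_thresh:
                group.append(j)
                assigned.add(j)
        groups.append(group)
    out = [c.copy() for c in chans]
    for group in groups:
        ref = max(group, key=lambda k: np.dot(chans[k], chans[k]))  # largest energy
        for k in group:
            if k == ref:
                continue
            xc = np.correlate(chans[ref], chans[k], mode="full")
            lag = int(np.argmax(xc)) - (len(chans[k]) - 1)  # lag maximizing correlation
            out[k] = np.roll(chans[k], int(np.clip(lag, -max_lag, max_lag)))
    return out
```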
  • ICTD Inter-Channel Time Delay
  • the processor can perform ambisonic conversion. Input multichannel audio is mapped to spherical harmonics to generate an ambisonic representation of the sound field.
  • the processor can map the high order ambisonics to virtual speakers. The ambisonic signals are mapped to any playback system with a smaller number of speakers. A number of common downmix distortions are discussed and solutions are introduced to reduce some distortions such as signal coloration. Informal listening tests have demonstrated the merit of the proposed method compared to direct audio downmixing.
  • the energy of downmixed audio can be equalized in both spectral domain and space.
  • energy distribution in a downmixed sound field can be more easily controlled.
  • the energy of the downmixed channels is adjusted in the octave bands to make it equal the energy of the input sound field.
  • the energy adjustment can also be done separately for the left and right channels to keep the energy ratio of the left and right channels the same as that in the input sound field.
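A sketch of this octave-band energy adjustment (the band centers, filter design, and per-band reference energies ref_energy measured from the input sound field are illustrative assumptions):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

OCTAVE_CENTERS = [63, 125, 250, 500, 1000, 2000, 4000, 8000]  # Hz, assumed

def equalize_octave_bands(x, ref_energy, fs):
    """Scale each octave band of a downmixed channel x so its energy matches
    the input sound field's energy in that band; applied separately per
    channel (e.g., left and right) to preserve the left/right energy ratio."""
    out = np.zeros_like(x)
    for b, fc in enumerate(OCTAVE_CENTERS):
        lo, hi = fc / np.sqrt(2), min(fc * np.sqrt(2), 0.49 * fs)
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)              # zero-phase band split
        e = np.sum(band ** 2)
        gain = np.sqrt(ref_energy[b] / e) if e > 0 else 1.0
        out += gain * band                      # recombine equalized bands
    return out
```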
  • Some sound sources might be partially masked by louder sounds in a downmixed sound field, caused by auditory masking effects in the frequency and/or time domain. Sound sources might be located at different locations in the input sound field and therefore be audible. In a downmixed sound field, many sounds might come from the same direction, and therefore auditory masking (both temporal and simultaneous) can be more pronounced.
  • One way to reduce masking effects in a downmixed sound field is to apply different gains to the input audio channels prior to downmixing to a smaller number of channels.

Abstract

A user equipment (UE) includes a memory element and a processor. The memory element is configured to store a plurality of head-related transfer functions. The processor is configured to receive an audio signal. The audio signal includes a plurality of ambisonic signals. The processor is also configured to identify an orientation of the UE based on physical properties of the UE. The processor is also configured to rotate the plurality of ambisonic signals based on the orientation of the UE. The processor is also configured to filter the plurality of ambisonic signals using the plurality of head-related transfer functions to form speaker signals. The processor is also configured to output the speaker signals.

Description

    CROSS-REFERENCE TO RELATED APPLICATION(S) AND CLAIM OF PRIORITY
  • The present application claims priority to U.S. Provisional Patent Application Ser. No. 62/108,774, filed Jan. 28, 2015, entitled “ADAPTIVE AMBISONIC BINAURAL RENDERING” and U.S. Provisional Patent Application Ser. No. 62/108,779, filed Jan. 28, 2015, entitled “AUDIO DOWNMIXING VIA AMBISONIC CONVERSION”. The contents of the above-identified patent documents are incorporated herein by reference.
  • TECHNICAL FIELD
  • The present application relates generally to ambisonic signals and, more specifically, to an apparatus and method for adaptive ambisonic binaural rendering.
  • BACKGROUND
  • Ambisonics is an effective technique to encode and reconstruct sound fields. The technique is based on the orthogonal decomposition of a sound field in spherical coordinates in 3D space, or a cylindrical decomposition in 2D space. In the decoding process, the ambisonic signals are decoded to produce speaker signals. The higher the ambisonic order, the finer the reconstruction of the sound field. Ambisonics provides significant flexibility to recreate 3D audio on any playback setup, such as a large number of loudspeakers or headphones. In mobile applications and Head-Mounted Displays (HMD) in particular, ambisonic rendering to headphones is of great interest.
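For concreteness, here is a minimal sketch of first order encoding of a single plane-wave source, assuming the N3D normalization used later in this document (the first order gains carry a √3 factor, as in eqs. (2) through (8) below); the function name is illustrative:

```python
import numpy as np

def encode_first_order_n3d(s, theta, phi):
    """Encode signal s arriving from azimuth theta / elevation phi (radians)
    into first order N3D ambisonic signals [W, X, Y, Z]."""
    w = s                                            # omnidirectional component
    x = np.sqrt(3) * np.cos(theta) * np.cos(phi) * s
    y = np.sqrt(3) * np.sin(theta) * np.cos(phi) * s
    z = np.sqrt(3) * np.sin(phi) * s
    return np.stack([w, x, y, z])
```

Higher orders add further spherical harmonic components (sixteen signals at third order, as used below), which is what yields the finer sound field reconstruction.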
  • SUMMARY
  • In an embodiment, a user equipment (UE) is provided that includes a memory element and a processor. The memory element is configured to store a plurality of head-related transfer functions. The processor is configured to receive an audio signal. The audio signal includes a plurality of ambisonic signals. The processor is also configured to identify an orientation of the UE based on physical properties of the UE. The processor is also configured to rotate the plurality of ambisonic signals based on the orientation of the UE. The processor is also configured to filter the plurality of ambisonic signals using the plurality of head-related transfer functions to form speaker signals. The processor is also configured to output the speaker signals.
  • In another embodiment, a method is provided for audio signal processing. The method includes receiving an audio signal. The audio signal includes a plurality of ambisonic signals. The method also includes identifying an orientation of the UE based on physical properties of the UE. The method also includes rotating the plurality of ambisonic signals based on the orientation of the UE. The method also includes filtering the plurality of ambisonic signals using a plurality of head-related transfer functions to form speaker signals. The method also includes outputting the speaker signals.
  • Before undertaking the DETAILED DESCRIPTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or,” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller” means any device, system or part thereof that controls at least one operation, such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document, those of ordinary skill in the art should understand that in many, if not most instances, such definitions apply to prior, as well as future uses of such defined words and phrases.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which like reference numerals represent like parts:
  • FIG. 1 illustrates an example HMD according to embodiments of the present disclosure and in which embodiments of the present disclosure may be implemented;
  • FIG. 2 illustrates an example view with content in an HMD according to an embodiment of this disclosure;
  • FIG. 3 illustrates an example Cartesian domain with respect to a user according to an embodiment of this disclosure;
  • FIG. 4 illustrates a block diagram for adaptive ambisonic binaural rendering according to an embodiment of this disclosure;
  • FIG. 5 illustrates a process for adaptive ambisonic binaural rendering according to this disclosure; and
  • FIG. 6 illustrates a block diagram for high order ambisonic downmixing according to an embodiment of this disclosure.
  • DETAILED DESCRIPTION
  • FIGS. 1 through 6, discussed below, and the various embodiments used to describe the principles of the present disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of the present disclosure may be implemented in any suitably arranged apparatus or method.
  • FIG. 1 illustrates an example HMD 100 according to embodiments of the present disclosure and in which embodiments of the present disclosure may be implemented. The embodiment of the HMD 100 illustrated in FIG. 1 is for illustration only. The HMD 100 comes in a wide variety of configurations, and FIG. 1 does not limit the scope of this disclosure to any particular implementation of an HMD.
  • In various embodiments, the HMD 100 may take different forms, and the present disclosure is not limited to any particular form. For example, the HMD 100 may be a mobile communication device, such as, for example, a user equipment, a mobile station, a subscriber station, a wireless terminal, a smart phone, a tablet, etc., that is mountable within a headset for VR and/or AR applications. In other examples, the HMD 100 may include the headset and take the form of a wearable electronic device, such as, for example, glasses, goggles, a helmet, etc., for the VR and/or AR applications.
  • As shown in FIG. 1, the HMD 100 includes an antenna 105, a radio frequency (RF) transceiver 110, transmit (TX) processing circuitry 115, a microphone 120, and receive (RX) processing circuitry 125. The HMD 100 also includes a speaker 130, a processor 140, an input/output (I/O) interface (IF) 145, a touchscreen 150, a display 155, a memory 160, and one or more sensors 165. The memory 160 includes an operating system (OS) 161 and one or more applications 162.
  • The RF transceiver 110 receives, from the antenna 105, an incoming RF signal transmitted by an access point (e.g., base station, WiFi router, Bluetooth device) for a network (e.g., a WiFi, Bluetooth, cellular, 5G, LTE, LTE-A, WiMAX, or any other type of wireless network). The RF transceiver 110 down-converts the incoming RF signal to generate an intermediate frequency (IF) or baseband signal. The IF or baseband signal is sent to the RX processing circuitry 125, which generates a processed baseband signal by filtering, decoding, and/or digitizing the baseband or IF signal. The RX processing circuitry 125 transmits the processed baseband signal to the speaker 130 (such as for voice data) or to the processor 140 for further processing (such as for web browsing data).
  • The TX processing circuitry 115 receives analog or digital voice data from the microphone 120 or other outgoing baseband data (such as web data, e-mail, or interactive video game data) from the processor 140. The TX processing circuitry 115 encodes, multiplexes, and/or digitizes the outgoing baseband data to generate a processed baseband or IF signal. The RF transceiver 110 receives the outgoing processed baseband or IF signal from the TX processing circuitry 115 and up-converts the baseband or IF signal to an RF signal that is transmitted via the antenna 105.
  • The processor 140 can include one or more processors or other processing devices and execute the OS 161 stored in the memory 160 in order to control the overall operation of the HMD 100. For example, the processor 140 could control the reception of forward channel signals and the transmission of reverse channel signals by the RF transceiver 110, the RX processing circuitry 125, and the TX processing circuitry 115 in accordance with well-known principles. In some embodiments, the processor 140 includes at least one microprocessor or microcontroller. In another embodiment, the processor 140 could also be implemented as processing circuitry. The processor 140 can carry out the operations or instructions of any process disclosed herein.
  • The processor 140 is also capable of executing other processes and programs resident in the memory 160. The processor 140 can move data into or out of the memory 160 as required by an executing process. In some embodiments, the processor 140 is configured to execute the applications 162 based on the OS 161 or in response to signals received from eNBs or an operator. The processor 140 is also coupled to the I/O interface 145, which provides the HMD 100 with the ability to connect to other devices, such as laptop computers and handheld computers. The I/O interface 145 is the communication path between these accessories and the processor 140.
  • The processor 140 is also coupled to the touchscreen 150 and the display 155. The operator of the HMD 100 can use the touchscreen 150 to enter data and/or inputs into the HMD 100. The display 155 may be a liquid crystal display, light-emitting diode (LED) display, optical LED (OLED), active matrix OLED (AMOLED), or other display capable of rendering text and/or graphics, such as from web sites, videos, games, etc.
  • The memory 160 is coupled to the processor 140. Part of the memory 160 could include a random access memory (RAM), and another part of the memory 160 could include a Flash memory or other read-only memory (ROM).
  • HMD 100 further includes one or more sensor(s) 165 that can meter a physical quantity or detect an activation state of the HMD 100 and convert metered or detected information into an electrical signal. For example, sensor 165 may include one or more buttons for touch input, e.g., on the headset or the HMD 100, a camera, a gesture sensor, a gyroscope or gyro sensor, an air pressure sensor, a magnetic sensor or magnetometer, an acceleration sensor or accelerometer, a grip sensor, a proximity sensor, a color sensor 165H (e.g., a Red Green Blue (RGB) sensor), a bio-physical sensor, a temperature/humidity sensor, an illumination sensor 165K, an Ultraviolet (UV) sensor, an Electromyography (EMG) sensor, an Electroencephalogram (EEG) sensor, an Electrocardiogram (ECG) sensor, an IR sensor, an ultrasound sensor, an iris sensor, a fingerprint sensor, etc. The sensor(s) 165 can further include a control circuit for controlling at least one of the sensors included therein. As will be discussed in greater detail below, one or more of these sensor(s) 165 may be used to control audio rendering, determine the orientation and facing direction of the user for 3D content display identification, etc. Any of these sensor(s) 165 may be located within the HMD 100, within a headset configured to hold the HMD 100, or in both the headset and HMD 100, for example, in embodiments where the HMD 100 includes a headset.
  • The touchscreen 150 can include a touch panel, a (digital) pen sensor, a key, or an ultrasonic input device. The touchscreen 150 can recognize, for example, a touch input in at least one scheme among a capacitive scheme, a pressure sensitive scheme, an infrared scheme, or an ultrasonic scheme. The touchscreen 150 can also include a control circuit. In the capacitive scheme, the touchscreen 150 can recognize touch or proximity.
  • As described in more detail below, the HMD 100 may include circuitry and applications for providing 3D audio for an HMD. Although FIG. 1 illustrates one example of HMD 100, various changes may be made to FIG. 1. For example, various components in FIG. 1 could be combined, further subdivided, or omitted and additional components could be added according to particular needs. As a particular example, the processor 140 could be divided into multiple processors, such as one or more central processing units (CPUs) and one or more graphics processing units (GPUs). Also, while FIG. 1 illustrates the HMD 100 configured as a mobile telephone, tablet, or smartphone, the HMD 100 could be configured to operate as other types of mobile or stationary devices.
  • Embodiments of the present disclosure provide an adaptive ambisonic binaural rendering framework for stereoscopic 3D VR or AR applications on the HMD 100. For VR experience using the HMD 100, the user's head motion, i.e., the movement of the HMD 100, is tracked using sensor(s) 165 in the HMD 100 and used to control the binaural rendering. In this disclosure, ambisonic signals are rotated according to the HMD 100 orientation and then mapped to virtual speakers located at fixed positions. The rotated ambisonic signals and a fixed set of Head-Related Transfer Functions (HRTF) are used to produce ear signals.
  • One or more embodiments of this disclosure recognize and take into account that ambisonic rendering can be adapted to the HMD 100 orientation to recreate an original sound field. Various embodiments of this disclosure provide a system for adaptive ambisonic binaural rendering to make audio scene independent from head movement. Binaural ambisonic rendering can be done through mapping of ambisonic signals to virtual speakers and then filtering each loudspeaker signal with a pair of Head-Related Transfer Functions (HRTF) corresponding to the position of the virtual speakers (relative to the head).
  • In an embodiment of this disclosure, for ambisonic rendering, the positions of virtual speakers remain unchanged, and for each new HMD orientation, the original ambisonic signals and a new set of HRTFs are used to produce ear signals. In another embodiment of this disclosure, the positions of virtual speakers are changed according to HMD orientation to make the audio scene independent from head movement. The original ambisonic signals and a new set of HRTFs corresponding to the positions of the speakers are used to produce speaker signals. In yet another embodiment of this disclosure, ambisonic signals are rotated according to the HMD (or head) orientation and then mapped to virtual speakers located at fixed positions. Then the rotated ambisonic signals and a fixed set of HRTFs are used to produce ear signals. This embodiment is advantageous as it needs only one set of HRTFs for binaural rendering, one HRTF for each headphone speaker (or ear).
  • FIG. 2 illustrates an example view 202 with content in an HMD 100 according to an embodiment of this disclosure. In FIG. 2, a user is wearing the HMD 100 and is seeing the view 202. The view 202 includes a ninety-six degree viewing angle. In different embodiments, other viewing angles can be used.
  • Various embodiments of this disclosure recognize and take into account that an HMD 204 with a mega-sized screen and a ninety-six degree viewing angle allows users to feel the world beyond peripheral vision. There are applications on the HMD 100 with a mobile device LCD as the screen. Users might want to use a mobile device without removing the HMD 100. The user may desire to seamlessly switch between the VR world and the real world. In an example, a user is watching a movie in the HMD 100 and wants to write an email. In this example, the user can draft the email in the VR environment without removing the HMD 100. The mobile device can display the mobile device environment in the VR world.
  • Various embodiments of the present disclosure provide content within an angular range that is wider than the user's current 3D view frustum 310. The angular range 315 (e.g., on the x-z plane assuming a Cartesian coordinate system with the x direction generally denoting left/right or yaw, the y direction generally denoting forward/backwards, and the z direction generally denoting up/down or pitch), within which the UI elements 305 are to be placed, is configured. In some examples (e.g., when more UI elements 305 exist than can fit), the HMD 100 displays, either actually or virtually (i.e., not actually displayed on the display 155 but actually displayed when the HMD 100 is moved to a location where the element is virtually displayed), some UI elements 305 outside the current 3D view frustum 310. However, the HMD 100 places these UI elements 305 within the angular range 315 for the UI so that the user would not have to turn the head too much to the left or the right (i.e., yaw or x movement) to see all displayed UI elements 305. Note, while certain examples are given in a Cartesian coordinate system, any suitable coordinate system may be used with any tuple serving as the default coordinate directions. The HMD 100 places the elements within the user's current 3D view frustum, i.e., the portion of the total viewable 3D space that is currently viewable by the user as a result of the HMD's 100 current detected orientation and facing direction.
  • As discussed above, the HMD 100 detects the user's head motions, i.e., the movement of the HMD 100, using the sensor(s) 165 on the HMD 100 and/or headset, such as, for example, a gyroscope, an accelerometer, etc. The HMD 100 displays the UI elements 305 as well as other elements of the display (e.g., content) to respond to the head motions to simulate looking at and interacting with the real-world view and objects.
  • One or more embodiments of this disclosure recognize and take into account the difficulty in identifying a rotation matrix for any direction in 3D space. Rotation matrices for up to second order ambisonics (Fu-Ma format) are identified. Many ambisonic recordings are third and higher order. As such, there is a need to develop techniques for rotation of ambisonic signals with any order. Another issue would be real-time binaural rendering with no discontinuities (in time and space) while changing the ambisonic signals according to the head movement.
  • FIG. 3 illustrates an example Cartesian domain 300 with respect to a user 305 according to an embodiment of this disclosure. In FIG. 3, a user 305 is seen without wearing the HMD 100, but could be wearing the HMD 100. The coordinates in the Cartesian domain 300 may also be considered with respect to the HMD 100. The axes X, Y, and Z can be in positive and negative directions. The user 305 can also rotate within the Cartesian domain 300.
  • One or more embodiments of this disclosure provide different techniques for adaptive ambisonic rendering. The different techniques are based on the equivalence of an HMD rotation in one direction and a sound field rotation in the opposite direction. One embodiment is based on changing the location of virtual speakers to make the reproduced sound field independent from head movement. In this embodiment, positions of virtual speakers are changed sequentially for rotation around the three axes X, Y, and Z in the Cartesian domain 320. This embodiment can be used to do adaptive binaural rendering for any ambisonic order. In another embodiment, a new rotation matrix for third order ambisonics is applied. This embodiment rotates ambisonics (for example, up to third order) in any direction in 3D space through simple matrix multiplication and can use only one set of HRTFs. In other examples, fourth order, or higher, ambisonics can be used as well.
  • One or more embodiments of this disclosure provide rotating ambisonic signals according to the orientation of an HMD. The orientation can be the orientation within a virtual reality defined by three axes in the Cartesian domain (i.e. X, Y, and Z) 300. The orientation can also include a position or location within the virtual reality. The location and orientation of the HMD can be determined using sensor(s) 165 as shown in FIG. 1. In different embodiments of this disclosure, the head of the user 305 can be tracked instead of the HMD. The tracking can be performed by sensor(s) 165 as shown in FIG. 1 or by external camera systems.
  • FIG. 4 illustrates a block diagram 400 for adaptive ambisonic binaural rendering according to an embodiment of this disclosure. The embodiment of the adaptive ambisonic binaural rendering illustrated in FIG. 4 is for illustration only.
  • At block 402, a rotation matrix can be applied to ambisonic signals. The rotation matrix is determined based on a head position or HMD orientation. Sensors in the HMD, or external systems such as camera systems or infrared detectors, can identify the orientation, and a processor can select a rotation matrix based on the orientation. At block 404, the processor can perform ambisonic rendering by mapping the positions of the virtual speakers. At block 406, a processor can perform binaural filtering by applying the HRTFs to the ambisonic signals to produce binaural signals, or speaker signals. The same HRTFs can be applied no matter the orientation of the HMD.
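As hypothetical glue code for the three blocks (rotation_matrix_for(), the AMBI_DECODER matrix that maps HOA signals to the fixed virtual speakers, and the per-speaker HRIRs hrtf_left/hrtf_right are assumed inputs, not names from the source):

```python
import numpy as np

def binaural_render(ambi_frame, orientation, AMBI_DECODER, hrtf_left, hrtf_right):
    R = rotation_matrix_for(orientation)   # block 402: rotation from sensed orientation
    rotated = R @ ambi_frame               # rotate the ambisonic signals
    feeds = AMBI_DECODER @ rotated         # block 404: map to fixed virtual speakers
    # block 406: fixed HRTF pair per virtual speaker, regardless of orientation
    left = sum(np.convolve(f, h) for f, h in zip(feeds, hrtf_left))
    right = sum(np.convolve(f, h) for f, h in zip(feeds, hrtf_right))
    return left, right
```

Because the virtual speakers never move, the HRTF set can be precomputed once; only R changes per head-tracker update.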
  • One or more embodiments of this disclosure also provide adaptive binaural rendering by relocating virtual speakers. Algorithms for sound field rotation around the X, Y, and Z axes are provided by this disclosure. In this embodiment, a sound field can be rotated sequentially for rotation around the X axis (i.e. roll), Y axis (i.e. pitch), and Z axis (i.e. yaw).
  • For rotation of virtual speakers around the Z axis, only the azimuth may be changed. An azimuth of each virtual speaker can be shifted by γ, with γ being the rotation angle around the Z axis. The azimuth can be modified by:

  • θ′=θ+γ,   (1)
  • where θ and θ′ are the original and the modified azimuth, respectively.
  • For rotation of virtual speakers around the X axis (i.e. roll), both azimuth and elevation of virtual speakers are changed. The positions of virtual speakers are modified accordingly. The high order ambisonic (HOA) signals are mapped to virtual speakers at new locations and converted to binaural signals using a set of HRTFs corresponding to the positions of virtual speakers relative to the head. If a virtual speaker is located at θi and φi (azimuth and elevation respectively), then the new positions are θ′i and φ′i. Since rotation around the X axis does not change the projection to the X axis, the new position of each virtual speaker is given by the following procedure.

$$\sqrt{3}\,\cos(\theta'_i)\cos(\varphi'_i) = \sqrt{3}\,\cos(\theta_i)\cos(\varphi_i) \qquad (2)$$
  • The Y and Z axes are rotated as follows:
$$\begin{pmatrix} y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} \qquad (3)$$
  • where α is the rotation angle around the X axis. The values y, z, y′ and z′ (in the N3D ambisonic format) are given by:

$$y = \sqrt{3}\,\sin(\theta_i)\cos(\varphi_i), \qquad z = \sqrt{3}\,\sin(\varphi_i)$$
$$y' = \sqrt{3}\,\sin(\theta'_i)\cos(\varphi'_i), \qquad z' = \sqrt{3}\,\sin(\varphi'_i) \qquad (4)$$
  • The new position of the virtual speaker is given by:
$$\varphi'_i = \arcsin\!\left[\sin(\alpha)\sin(\theta_i)\cos(\varphi_i) + \cos(\alpha)\sin(\varphi_i)\right], \qquad \theta'_i = \arccos\!\left(\frac{\cos(\theta_i)\cos(\varphi_i)}{\cos(\varphi'_i)}\right) \qquad (5)$$
  • The HOA signals are mapped to virtual speakers at the new locations and filtered with the corresponding HRTFs to find the binaural signals.
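A numpy sketch of eq. (5) (function name illustrative): note that arccos only returns values in [0, π], so an implementation would still need to resolve the sign of the new azimuth, e.g. from the projections in eqs. (2) through (4); the text does not spell that step out.

```python
import numpy as np

def relocate_speaker_roll(theta_i, phi_i, alpha):
    """New azimuth/elevation of a virtual speaker after a roll (X-axis)
    rotation by alpha, per eq. (5). All angles in radians."""
    phi_new = np.arcsin(np.sin(alpha) * np.sin(theta_i) * np.cos(phi_i)
                        + np.cos(alpha) * np.sin(phi_i))
    theta_new = np.arccos(np.cos(theta_i) * np.cos(phi_i) / np.cos(phi_new))
    return theta_new, phi_new
```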
  • For rotation of virtual speakers around the Y axis, the same procedure as described for the X axis is used. Since rotation around the Y axis does not change the projection to the Y axis, the procedure is as follows.

$$\sqrt{3}\,\sin(\theta'_i)\cos(\varphi'_i) = \sqrt{3}\,\sin(\theta_i)\cos(\varphi_i) \qquad (6)$$
  • Also, the X and Z axes will be rotated as follows:
$$\begin{pmatrix} x' \\ z' \end{pmatrix} = \begin{pmatrix} \cos(\beta) & -\sin(\beta) \\ \sin(\beta) & \cos(\beta) \end{pmatrix} \begin{pmatrix} x \\ z \end{pmatrix} \qquad (7)$$
  • where β is the rotation angle around the Y axis, and the values x, z, x′ and z′ (in the N3D ambisonic format) are:

$$x = \sqrt{3}\,\cos(\theta_i)\cos(\varphi_i), \qquad z = \sqrt{3}\,\sin(\varphi_i)$$
$$x' = \sqrt{3}\,\cos(\theta'_i)\cos(\varphi'_i), \qquad z' = \sqrt{3}\,\sin(\varphi'_i) \qquad (8)$$
  • The new position of the virtual speaker is given by:
$$\varphi'_i = \arcsin\!\left[\sin(\beta)\cos(\theta_i)\cos(\varphi_i) + \cos(\beta)\sin(\varphi_i)\right], \qquad \theta'_i = \arcsin\!\left(\frac{\sin(\theta_i)\cos(\varphi_i)}{\cos(\varphi'_i)}\right) \qquad (9)$$
  • The HOA signals are mapped to virtual speakers at the new locations and filtered with the corresponding HRTFs to find the binaural signals.
  • One or more embodiments of this disclosure provide another technique to rotate a sound field using a rotation matrix. This embodiment provides three new rotation matrices for rotating third order ambisonic signals around the three axes in 3D space. For any direction, ambisonic signals can be modified through a matrix multiplication as follows:

  • B′=RB   (10)
  • where R is the rotation matrix around an axis, and B and B′ are the original and modified ambisonic signals, respectively. In this embodiment, the positions of virtual speakers relative to the HMD remain unchanged and as such only one set of HRTFs is used for binaural rendering.
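As the disclosure notes elsewhere, real-time rendering should avoid discontinuities while the rotation changes with head movement. A minimal sketch of eq. (10) applied blockwise, with a short crossfade between the previous and current rotation matrices (the crossfade strategy and fade_len are illustrative assumptions, not details from the source):

```python
import numpy as np

def rotate_block(B, R_prev, R_curr, fade_len=128):
    """Apply B' = R B per audio block (B: (16, n_samples)); crossfade from
    R_prev to R_curr over the first fade_len samples to avoid audible clicks."""
    out = R_curr @ B
    if R_prev is not None:
        prev = R_prev @ B[:, :fade_len]
        w = np.linspace(0.0, 1.0, fade_len)           # linear crossfade ramp
        out[:, :fade_len] = (1.0 - w) * prev + w * out[:, :fade_len]
    return out
```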
  • The different embodiments of this disclosure provide adaptive HOA binaural rendering based on sound field rotation in 3D space. Contrary to channel-based methods, in one of the embodiments, only one set of HRTFs corresponding to a fixed playback setup can be used. In comparison to channel-based binaural rendering, HOA-based methods provide a higher quality if there is not a very large set of HRTFs available to the binaural renderer. Sound fields can be edited in the ambisonic domain for artistic purposes prior to rendering to headphones.
  • An embodiment of this disclosure provides third order rotation matrices for N3D-encoded B-format (used in MPEG audio material) for the three Cartesian axes X, Y, and Z. Rotation in an arbitrary direction can be achieved by multiplying these rotation matrices together.
  • For rotation around the X axis (i.e., roll), the third order rotation matrix can be a 16×16 matrix. The sixteen original ambisonic signals can be labeled as follows:

  • B=[W, X, Y, Z, V, T, R, S, U, Q, O, M, K, L, N, P]   (11)
  • The sixteen modified ambisonic signals can be labeled as follows:

  • B′=[W′, X′, Y′, Z′, V′, T′, R′, S′, U′, Q′, O′, M′, K′, L′, N′, P′]  (12)
  • Starting from an all-zero rotation matrix and filling in only the non-zero entries, where α is the rotation angle around the X axis, the modified signals are as follows:
  • W′=W   (13)
  • X′=X   (14)
  • Y′=cos(α)Y−sin(α)Z   (15)
  • Z′=sin(α)Y+cos(α)Z   (16)
  • V′=cos(α)V−sin(α)S   (17)
  • T′=cos(2α)T−(√3/2)sin(2α)R−(1/2)sin(2α)U   (18)
  • R′=(√3/2)sin(2α)T+((3/4)cos(2α)+1/4)R+((√3/4)cos(2α)−√3/4)U   (19)
  • S′=sin(α)V+cos(α)S   (20)
  • U′=(1/2)sin(2α)T+((√3/4)cos(2α)−√3/4)R+((1/4)cos(2α)+3/4)U   (21)
  • Q′=((1/4)cos³(α)+(3/4)cos(α))Q+((√15/4)cos³(α)−(√15/4)cos(α))M+(√10/4)sin³(α)K+((√6/4)sin³(α)−(√6/2)sin(α))N   (22)
  • O′=cos(2α)O−(√10/4)sin(2α)L−(√6/4)sin(2α)P   (23)
  • M′=−(√15/4)cos(α)sin²(α)Q+(cos³(α)−(11/4)cos(α)sin²(α))M+((√6/4)sin³(α)−√6 sin(α)cos²(α))K+((√10/4)sin³(α)−(√10/2)sin(α)cos²(α))N   (24)
  • K′=−(√10/4)sin³(α)Q+(−(√6/4)sin³(α)+√6 sin(α)cos²(α))M+(cos³(α)−(3/2)cos(α)sin²(α))K−(√15/2)cos(α)sin²(α)N   (25)
  • L′=(√10/2)sin(α)cos(α)O+(cos²(α)−(1/4)sin²(α))L−(√15/4)sin²(α)P   (26)
  • N′=((√6/4)sin(α)+(√6/4)cos²(α)sin(α))Q+((√10/2)sin(α)−(3√10/4)sin³(α))M+((√15/2)cos³(α)−(√15/2)cos(α))K+((3/2)cos³(α)−(1/2)cos(α))N   (27)
  • P′=(√6/2)sin(α)cos(α)O−(√15/4)sin²(α)L+(1/4+(3/4)cos²(α))P   (28)
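  • For illustration, the sketch below (Python/NumPy; illustrative names) fills in the non-zero entries of the order 0-2 block of this matrix directly from equations (13)-(21); the third order rows (22)-(28) extend the same pattern to the full 16×16 matrix.

    import numpy as np

    def rot_x_up_to_order2(alpha):
        # 9x9 upper-left block (channels [W, X, Y, Z, V, T, R, S, U]) of the
        # 16x16 X-axis (roll) rotation matrix, from equations (13)-(21).
        c, s = np.cos(alpha), np.sin(alpha)
        c2, s2 = np.cos(2 * alpha), np.sin(2 * alpha)
        r3 = np.sqrt(3.0)
        R = np.zeros((9, 9))
        R[0, 0] = 1.0                              # W' = W  (13)
        R[1, 1] = 1.0                              # X' = X  (14)
        R[2, 2], R[2, 3] = c, -s                   # Y'  (15)
        R[3, 2], R[3, 3] = s, c                    # Z'  (16)
        R[4, 4], R[4, 7] = c, -s                   # V'  (17)
        R[5, 5], R[5, 6], R[5, 8] = c2, -(r3 / 2) * s2, -s2 / 2   # T'  (18)
        R[6, 5], R[6, 6], R[6, 8] = (r3 / 2) * s2, 0.75 * c2 + 0.25, (r3 / 4) * (c2 - 1)   # R'  (19)
        R[7, 4], R[7, 7] = s, c                    # S'  (20)
        R[8, 5], R[8, 6], R[8, 8] = s2 / 2, (r3 / 4) * (c2 - 1), 0.25 * c2 + 0.75          # U'  (21)
        return R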
  • For rotation around the Y axis (i.e., pitch), the third order rotation matrix can be a 16×16 matrix. With β as the rotation angle around the Y axis, the modified signals are as follows:
  • W′=W   (29)
  • X′=cos(β)X−sin(β)Z   (30)
  • Y′=Y   (31)
  • Z′=sin(β)X+cos(β)Z   (32)
  • V′=cos(β)V−sin(β)T   (33)
  • T′=sin(β)V+cos(β)T   (34)
  • R′=(√3/2)sin(2β)S+((3/4)cos(2β)+1/4)R−((√3/4)cos(2β)−√3/4)U   (35)
  • S′=cos(2β)S−(√3/2)sin(2β)R+(1/2)sin(2β)U   (36)
  • U′=−(1/2)sin(2β)S−((√3/4)cos(2β)−√3/4)R+((1/4)cos(2β)+3/4)U   (37)
  • Q′=−(√6/2)sin(β)cos(β)O+(√15/4)sin²(β)M+(1/4+(3/4)cos²(β))Q   (38)
  • O′=cos(2β)O−(√10/4)sin(2β)M+(√6/4)sin(2β)Q   (39)
  • M′=(√10/2)sin(β)cos(β)O+(cos²(β)−(1/4)sin²(β))M+(√15/4)sin²(β)Q   (40)
  • K′=(√10/4)sin³(β)P+(−(√6/4)sin³(β)+√6 sin(β)cos²(β))L+(cos³(β)−(3/2)cos(β)sin²(β))K+(√15/2)cos(β)sin²(β)N   (41)
  • L′=(√15/4)cos(β)sin²(β)P+(cos³(β)−(11/4)cos(β)sin²(β))L+((√6/4)sin³(β)−√6 sin(β)cos²(β))K−((√10/4)sin³(β)−(√10/2)sin(β)cos²(β))N   (42)
  • N′=((√6/4)sin(β)+(√6/4)cos²(β)sin(β))P−((√10/2)sin(β)−(3√10/4)sin³(β))L−((√15/2)cos³(β)−(√15/2)cos(β))K+((3/2)cos³(β)−(1/2)cos(β))N   (43)
  • P′=((1/4)cos³(β)+(3/4)cos(β))P−((√15/4)cos³(β)−(√15/4)cos(β))L−(√10/4)sin³(β)K+((√6/4)sin³(β)−(√6/2)sin(β))N   (44)
  • For rotation around the Z axis (i.e., yaw), the third order rotation matrix can be a 16×16 matrix. With γ as the rotation angle around the Z axis, the modified signals are as follows:

  • W′=W   (45)

  • X′=cos(γ)X−sin(γ)Y   (46)

  • Y′=sin(γ)X+cos(γ)Y   (47)

  • Z′=Z   (48)

  • V′=cos(2γ)V+sin(2γ)U   (49)

  • T′=cos(γ)T+sin(γ)S   (50)

  • R′=R   (51)

  • S′=cos(γ)S−sin(γ)T   (52)

  • U′=cos(2γ)U−sin(2γ)V   (53)

  • Q′=cos(3γ)Q+sin(3γ)P   (54)

  • O′=cos(2γ)O+sin(2γ)N   (55)

  • M′=cos(γ)M+sin(γ)L   (56)

  • K′=K   (57)

  • L′=cos(γ)L−sin(γ)M   (58)

  • N′=cos(2γ)N−sin(2γ)O   (59)

  • P′=cos(3γ)P−sin(3γ)Q   (60)
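  • Because yaw only mixes each pair of harmonics of equal |m|, the full 16×16 yaw matrix is straightforward to assemble. A minimal sketch (Python/NumPy; illustrative names) transcribing equations (45)-(60):

    import numpy as np

    def _rot2(c, s):
        # 2x2 block [[c, -s], [s, c]]
        return np.array([[c, -s], [s, c]])

    def yaw_matrix_third_order(gamma):
        # 16x16 Z-axis (yaw) rotation matrix for third order ambisonic
        # signals ordered [W, X, Y, Z, V, T, R, S, U, Q, O, M, K, L, N, P].
        c1, s1 = np.cos(gamma), np.sin(gamma)
        c2, s2 = np.cos(2 * gamma), np.sin(2 * gamma)
        c3, s3 = np.cos(3 * gamma), np.sin(3 * gamma)
        R = np.zeros((16, 16))
        for i in (0, 3, 6, 12):                         # W, Z, R, K unchanged
            R[i, i] = 1.0
        R[np.ix_([1, 2], [1, 2])] = _rot2(c1, s1)       # X', Y'  (46)-(47)
        R[np.ix_([4, 8], [4, 8])] = _rot2(c2, -s2)      # V', U'  (49), (53)
        R[np.ix_([5, 7], [5, 7])] = _rot2(c1, -s1)      # T', S'  (50), (52)
        R[np.ix_([9, 15], [9, 15])] = _rot2(c3, -s3)    # Q', P'  (54), (60)
        R[np.ix_([10, 14], [10, 14])] = _rot2(c2, -s2)  # O', N'  (55), (59)
        R[np.ix_([11, 13], [11, 13])] = _rot2(c1, -s1)  # M', L'  (56), (58)
        return R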
  • In the different embodiments, a head tracker or sensors can be mounted on the headphones to determine the head orientation, which is used either to rotate the ambisonic signals or to change the positions of the virtual speakers. In one embodiment, the positions of virtual speakers are changed based on the head movement; in another embodiment, a rotation matrix is used to rotate the ambisonic signals. The first approach can be used for any ambisonic order. The second uses only one set of HRTFs and may require less computation, as there is no need to change the positions of the virtual speakers. If the binaural signals are generated directly from the HOA signals (without mapping the HOA signals to virtual loudspeakers), the embodiment using only one set of HRTFs further reduces the computation overhead.
  • FIG. 5 illustrates process 500 for adaptive ambisonic binaural rendering according to this disclosure. The embodiment shown in FIG. 5 is for illustration only. Other embodiments could be used without departing from the scope of the present disclosure. A processor, such as processor 140, can perform the different steps of process 500.
  • At step 505, a processor is configured to receive an audio signal, the audio signal comprising a plurality of ambisonic signals. At step 510, the processor is configured to identify an orientation of the UE based on the measured physical properties of the UE. Sensors can be configured to sense the physical properties of the UE. The physical properties could include, for example, a touch input on the headset or the HMD, camera information, gesture information, gyroscope information, air pressure information, magnetic information, acceleration information, grip information, proximity information, color information, bio-physical information, temperature/humidity information, illumination information, UV information, electromyography (EMG) information, electroencephalogram (EEG) information, electrocardiogram (ECG) information, IR sensor information, ultrasound sensor information, iris sensor information, fingerprint sensor information, etc.
  • At step 515, the processor is configured to rotate the plurality of ambisonic signals based on the orientation of the UE. The processor can apply at least one rotation matrix to the plurality of ambisonic signals. The at least one rotation matrix comprises a rotation matrix for each axis of three axes. If the orientation includes a rotation in a direction, the processor can be configured to rotate the sound field of the plurality of ambisonic signals opposite the direction. The processor can also be configured to map the plurality of ambisonic signals to one or more virtual speakers of a sound field.
  • At step 520, the processor is configured to filter the plurality of ambisonic signals using a plurality of head-related transfer functions to form speaker signals. The head-related transfer functions could be stored in a memory element. The plurality of head-related transfer functions comprises two head-related transfer functions used for any rotation of the plurality of ambisonic signals. At step 525, the processor is configured to output the speaker signals.
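  • A minimal end-to-end sketch of process 500 for a yaw-only head rotation (Python/NumPy; all names, signal shapes, and the decoder matrix are illustrative assumptions, and yaw_matrix_third_order is the sketch given earlier; a full implementation would compose the yaw, pitch, and roll matrices):

    import numpy as np

    def render_binaural(ambisonic, yaw, decoder, hrtf_left, hrtf_right):
        # ambisonic: (16, n_samples); decoder: (n_speakers, 16);
        # hrtf_left/hrtf_right: (n_speakers, ir_len) impulse responses.
        # Step 515: rotate the sound field opposite the head rotation.
        rotated = yaw_matrix_third_order(-yaw) @ ambisonic
        speakers = decoder @ rotated       # map to fixed virtual speakers
        # Step 520: filter each speaker feed with its HRTF pair and sum.
        n = speakers.shape[1] + hrtf_left.shape[1] - 1
        left, right = np.zeros(n), np.zeros(n)
        for feed, hl, hr in zip(speakers, hrtf_left, hrtf_right):
            left += np.convolve(feed, hl)
            right += np.convolve(feed, hr)
        return left, right                 # step 525: binaural output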
  • One or more embodiments of this disclosure provide multichannel audio downmixing via ambisonic conversion. An embodiment provides a novel audio downmixing method based on ambisonic mapping. Input multichannel audio is mapped to spherical harmonics to generate an ambisonic representation of the sound field. The ambisonic signals are mapped to any playback system with a smaller number of speakers. A number of common downmix distortions are discussed and solutions are introduced to reduce some distortions such as signal coloration. Informal listening tests have demonstrated the merit of the proposed method compared to direct audio downmixing.
  • Different embodiments of this disclosure recognize and take into account that production of multichannel audio content keeps growing and becoming more widespread. Playback systems currently in use are capable of playing back only a small number of audio channels, such as the legacy 5.1 format. Therefore, there is a need for high quality methods to downmix a large number of audio channels. Prior downmix methods fall into two categories: passive and active. Passive methods use fixed coefficients to combine input channels into output channels. The passive methods sometimes produce unsatisfactory results and cause audio artifacts and spatial and timbral distortions. On the other hand, active downmix methods adapt the downmix procedure to the input audio and reduce distortions caused by passive methods.
  • In this disclosure, one or more embodiments provide an active downmix method based on the conversion of input multichannel audio to HOA signals. The playback setup can be independent of the input audio channel configuration. The HOA signals can be mapped to any speaker setup (e.g., a smaller number of speakers, an asymmetric configuration, etc.). One or more of the embodiments reduce common distortions such as coloration (i.e., comb filter effects) and preserve loudness to improve the audio quality of the downmixed audio.
  • An embodiment of this disclosure provides HOA based audio downmixing. In this embodiment, the input audio channels are decomposed in the spherical coordinates (i.e. mapped onto the spherical harmonic bases) to generate fourth order HOA signals as follows:

  • B=YSin   (61)
  • where Y is a matrix of the fourth order spherical harmonics in the direction of the input channels, Sin is the matrix containing the input audio channels (except the low frequency effect (LFE) channels), and B is the matrix of HOA signals. The order of the HOA signals can be increased to better represent the original sound field. A fourth order ambisonic representation is sufficient for many sound fields and reduces the computational load. The HOA signals can be mapped to any playback system using an HOA renderer as follows:

  • Sout=DB   (62)
  • where D is the HOA renderer and Sout is the output audio channels. The input LFE channels are used in the downmixed output audio. Some sound images may be distorted when a larger number of channels are downmixed to a smaller playback system. An example embodiment provides a sound field on a smaller playback system with the best possible audio quality.
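  • In code, equations (61) and (62) reduce to two matrix products. A minimal sketch (Python/NumPy), where the names are illustrative and a fourth order representation has (4+1)² = 25 HOA channels:

    import numpy as np

    def hoa_downmix(Y_mat, S_in, D):
        # Y_mat: (25, n_in) fourth order spherical harmonics evaluated at
        # the input channel directions; S_in: (n_in, n_samples) input
        # channels (LFE excluded); D: (n_out, 25) HOA renderer matrix.
        B = Y_mat @ S_in       # (61): encode input channels to HOA
        S_out = D @ B          # (62): render HOA to the smaller layout
        return S_out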
  • Different embodiments of this disclosure recognize and take into account that audio downmixing results in some distortions. Issues can be caused by 3D to 2D conversion wherein the sound field envelopment (from above) and accuracy of sound source vertical localization are degraded. Some other issues that might be observed include coloration, loudness distortion, spectral distortion, direct to ambient sound ratio, auditory masking, etc.
  • Different embodiments of this disclosure recognize and take into account that coloration (i.e., the comb filter effect) is caused by the addition of correlated signals, where some frequency components are amplified or cancelled. This distortion is observed in downmixed audio when height channels are correlated with the horizontal channels but are not time-aligned (delayed by a few milliseconds). This misalignment occurs when a spaced microphone array is used to make multichannel recordings of a sound field.
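  • For example, summing a signal with a copy of itself delayed by τ gives the magnitude response 2|cos(πfτ)|, which has nulls at f=(2k+1)/(2τ); a 1 msec misalignment therefore notches the downmix at 500 Hz, 1.5 kHz, 2.5 kHz, and so on.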
  • FIG. 6 illustrates block diagram 600 for high order ambisonic downmixing according to an embodiment of this disclosure. The embodiment of the high order ambisonic downmixing illustrated in FIG. 6 is for illustration only.
  • At block 602, a processor provides a correlation-based technique to adjust the Inter-Channel Time Delay (ICTD) between highly correlated input channels to reduce coloration. Since sound fields might consist of many sound sources, the processor divides the input audio channels into subgroups based on their cross correlation. Channels with cross correlation greater than 0.2 are placed in the same group and then time aligned to the channel with the largest energy in that group. In one example embodiment, the maximum delay to be aligned is set to 10 msec. A delay this large might not be caused by the distance between microphones in a microphone array, but might be caused by post-recording effects. One embodiment of this disclosure recognizes that there are large delays between channels in the MPEG 22.2-channel audio files, and sets the maximum delay at 10 msec. In one example embodiment, the processor applies the same alignment to all spectral components rather than aligning different spectral components differently.
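  • A minimal sketch of this alignment step (Python/NumPy; the function name, the normalization, and the use of a circular shift are illustrative simplifications):

    import numpy as np

    def align_channel(ref, ch, fs, max_delay_s=0.010, corr_threshold=0.2):
        # Shift `ch` so it is time aligned with the group reference `ref`
        # when their normalized cross correlation exceeds the threshold,
        # searching lags up to 10 msec (block 602).
        max_lag = int(max_delay_s * fs)

        def ncc(lag):
            a = ref[max(0, lag):len(ref) + min(0, lag)]
            b = ch[max(0, -lag):len(ch) + min(0, -lag)]
            denom = np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12
            return np.dot(a, b) / denom

        lags = list(range(-max_lag, max_lag + 1))
        scores = [ncc(lag) for lag in lags]
        best = int(np.argmax(scores))
        if scores[best] <= corr_threshold:
            return ch                      # leave uncorrelated channels alone
        return np.roll(ch, lags[best])     # align to the reference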
  • At block 604, the processor can perform ambisonic conversion. Input multichannel audio is mapped to spherical harmonics to generate an ambisonic representation of the sound field. At block 606, the processor can map the high order ambisonics to virtual speakers. The ambisonic signals are mapped to any playback system with a smaller number of speakers.
  • In one example embodiment, in order to preserve the energy in the downmixed sound field, at block 608, the energy of the downmixed audio can be equalized both in the spectral domain and in space. In this example, the energy distribution in a downmixed sound field can be more easily controlled. In the spectral domain, the energy of the downmixed channels is adjusted in the octave bands to make it equal to the energy of the input sound field. The energy adjustment can also be done separately for the left and right channels to keep the energy ratio of the left and right channels the same as that in the input sound field.
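  • A minimal sketch of the spectral-domain adjustment (Python/SciPy; the band centers, filter order, and mono input shapes are illustrative assumptions, with the top band assumed to lie below the Nyquist frequency):

    import numpy as np
    from scipy.signal import butter, sosfilt

    def octave_band_equalize(downmix, reference, fs,
                             centers=(63, 125, 250, 500, 1000, 2000, 4000, 8000)):
        # Scale each octave band of `downmix` so its energy matches the
        # corresponding band of the input sound field (block 608).
        out = np.zeros_like(downmix)
        for fc in centers:
            lo, hi = fc / np.sqrt(2), fc * np.sqrt(2)
            sos = butter(4, [lo / (fs / 2), hi / (fs / 2)],
                         btype='bandpass', output='sos')
            band_dm = sosfilt(sos, downmix)
            band_ref = sosfilt(sos, reference)
            gain = np.sqrt((np.sum(band_ref ** 2) + 1e-12) /
                           (np.sum(band_dm ** 2) + 1e-12))
            out += gain * band_dm
        return out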
  • Some sound sources might be partially masked by louder sounds in a downmixed sound field, caused by auditory masking effects in the frequency and/or time domain. Sound sources might be located at different locations in the input sound field and therefore be audible. In a downmixed sound field, many sounds might come from the same direction, and therefore auditory masking (both temporal and simultaneous) can be more effective. One way to reduce masking effects in a downmixed sound field is to apply different gains to the input audio channels prior to downmixing to a smaller number of channels.
  • One or more embodiments of this disclosure recognize and take into account that whitening of broadband sounds is another distortion observed in some downmixed sound fields. An example embodiment avoids adding uncorrelated speaker signals that have almost identical spectra. This technique works well for independent identically distributed (i.i.d.) sources. For other sources (e.g., localized sources), the spectral correlation of the horizontal and height channels would be low, which makes it useful to replace the signals in the height speakers. If there is a mixture of ambient sounds and localized sources in the height channels, the height speaker signals have to be decomposed into localized and ambient sounds, and then only the ambient sounds can be replaced (with proper energy adjustment).
  • An embodiment of this disclosure provides an audio downmixing method where input audio channels are transformed to HOA signals that can be mapped to any playback setup. This embodiment includes a spectral correction component to equalize the energy of the downmixed sound field in the left and right channels. Highly correlated input channels can be time aligned to reduce coloration. This embodiment can be used to downmix multichannel audio files with different configurations (e.g., 22.2, 14.0, 11.1, and 9.0) to a standard 5.1 configuration. Also, 5.1 audio files can be converted to an irregular 5.1 format where loudspeakers are placed in irregular locations. As an extension to this example embodiment, HRTFs can be used to find the ear signals for the input and output sound fields, and the downmixed sound field can be chosen to minimize the difference between the ear signals of the input and output sound fields.
  • Although the present disclosure has been described with an exemplary embodiment, various changes and modifications may be suggested to one skilled in the art. It is intended that the present disclosure encompass such changes and modifications as fall within the scope of the appended claims.

Claims (20)

What is claimed is:
1. A user equipment (UE), the UE comprising:
a memory element configured to store a plurality of head-related transfer functions;
a processor configured to:
receive an audio signal, the audio signal comprising a plurality of ambisonic signals;
identify an orientation of the UE based on physical properties of the UE;
rotate the plurality of ambisonic signals based on the orientation of the UE;
filter the plurality of ambisonic signals using the plurality of head-related transfer functions to form speaker signals; and
output the speaker signals.
2. The UE of claim 1, wherein the processor configured to rotate the plurality of ambisonic signals based on the orientation of the UE comprises the processor configured to:
apply at least one rotation matrix to the plurality of ambisonic signals.
3. The UE of claim 1, wherein the processor is further configured to:
map the plurality of ambisonic signals to one or more virtual speakers of a sound field.
4. The UE of claim 3, wherein, in response to the orientation of the UE including a rotation in a direction, the processor configured to rotate the plurality of ambisonic signals comprises the processor configured to rotate the sound field of the plurality of ambisonic signals opposite the direction.
5. The UE of claim 3, wherein a position of the virtual speakers with respect to the UE remains unchanged.
6. The UE of claim 1, wherein the at least one rotation matrix comprises a rotation matrix for each axis of three axes.
7. The UE of claim 1, wherein the plurality of head-related transfer functions comprises two head-related transfer functions used for any rotation of the plurality of ambisonic signals.
8. The UE of claim 6, wherein the rotation matrix for each axis is for third order ambisonic signals.
9. The UE of claim 1, further comprising:
at least one sensor configured to measure the physical properties of the UE.
10. The UE of claim 1, wherein the processor is further configured to receive the physical properties of the UE from an at least one external sensor.
11. A method for audio signal processing, the method comprising:
receiving an audio signal, the audio signal comprising a plurality of ambisonic signals;
identifying an orientation of the UE based on physical properties of the UE;
rotating the plurality of ambisonic signals based on the orientation of the UE;
filtering the plurality of ambisonic signals using a plurality of head-related transfer functions to form speaker signals; and
outputting the speaker signals.
12. The method of claim 11, wherein rotating the plurality of ambisonic signals based on the orientation of the UE comprises:
applying at least one rotation matrix to the plurality of ambisonic signals.
13. The method of claim 11, further comprising:
mapping the plurality of ambisonic signals to one or more virtual speakers of a sound field.
14. The method of claim 13, wherein, in response to the orientation of the UE including a rotation in a direction, rotating the plurality of ambisonic signals comprises rotating the sound field of the plurality of ambisonic signals opposite the direction.
15. The method of claim 13, wherein a position of the virtual speakers with respect to the UE remains unchanged.
16. The method of claim 11, wherein the at least one rotation matrix comprises a rotation matrix for each axis of three axes.
17. The method of claim 11, wherein the plurality of head-related transfer functions comprises two head-related transfer functions used for any rotation of the plurality of ambisonic signals.
18. The method of claim 16, wherein the rotation matrix for each axis is for third order ambisonic signals.
19. The method of claim 11, further comprising:
measuring the physical properties of the UE.
20. The method of claim 11, further comprising:
receiving the physical properties of the UE from an at least one external sensor.



