EP4393168A1 - Apparatus and method for ambisonic binaural audio rendering - Google Patents

Apparatus and method for ambisonic binaural audio rendering

Info

Publication number
EP4393168A1
EP4393168A1 EP21765935.8A EP21765935A EP4393168A1 EP 4393168 A1 EP4393168 A1 EP 4393168A1 EP 21765935 A EP21765935 A EP 21765935A EP 4393168 A1 EP4393168 A1 EP 4393168A1
Authority
EP
European Patent Office
Prior art keywords
ambisonic
ear
hrtf
right ear
left ear
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP21765935.8A
Other languages
English (en)
French (fr)
Inventor
Martin POLLOW
Lauren WARD
Gavin Kearney
Calum Armstrong
Thomas Mckenzie
Liyun PANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of EP4393168A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • H04S7/304 For headphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/11 Application of ambisonics in stereophonic audio systems

Definitions

  • the apparatus improves low order ITD rendering whilst remaining computationally efficient.
  • the HRTF preprocessing may be done offline, to prevent added computational complexity at runtime. Once the adjustment, i.e. calibration, is applied and the optimization routine has generated new HRTFs, the localization accuracy of sound source rendering may be improved significantly.
  • the apparatus may be used to improve Ambisonic rendering of measured HRTFs for an individual.
  • a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa.
  • a corresponding device may include one or a plurality of units, e.g. functional units, to perform the described one or plurality of method steps (e.g. one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures.
  • a specific apparatus is described based on one or a plurality of units, e.g.
  • Figure 1 is a schematic diagram illustrating an apparatus 100 for Ambisonic binaural rendering of an input signal.
  • the apparatus 100 comprises a left ear transducer 101a, e.g. a loudspeaker 101a, configured to generate a left ear audio signal based on a left ear transducer driver signal, and a right ear transducer 101b, e.g. a loudspeaker 101b, configured to generate a right ear audio signal based on a right ear transducer driver signal, for a user 103.
  • the apparatus 100 may be implemented in the form of headphones 100.
  • the apparatus 100 further comprises a processing circuitry 110.
  • the processing circuitry 110 may be implemented in hardware and/or software and may comprise digital circuitry, or both analog and digital circuitry.
  • Digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or general-purpose processors.
  • the apparatus 100 may further comprise a memory 105 configured to store executable program code which, when executed by the processing circuitry 110, causes the apparatus 100 to perform the functions and methods described herein.
  • the processing circuitry 110 of the binaural audio rendering apparatus 100 is configured to generate the left ear transducer driver signal and the right ear transducer driver signal using Ambisonic binaural rendering of the input signal based on a plurality of virtual loudspeakers with a virtual loudspeaker configuration defining the number and positions of the virtual loudspeakers.
  • Each virtual loudspeaker is associated with a left ear and a right ear reference head-related transfer function, HRTF, and a left ear and a right ear Ambisonic HRTF, which can be interpreted as a rendered, i.e. Ambisonic, representation of the HRTF.
  • Each virtual loudspeaker may be additionally associated with a virtual loudspeaker direction.
  • the processing circuitry 110 of the apparatus 100 illustrated in figure 1 is further configured to adjust the left ear and the right ear Ambisonic HRTF by adjusting an interaural time difference, ITD, of the left ear and the right ear HRTF for each virtual loudspeaker based on a comparison of the ITD of the resulting Ambisonic HRTF with a reference ITD of the left ear and the right ear reference HRTF.
  • the processing circuitry 110 is further configured to generate the left ear transducer driver signal and the right ear transducer driver signal based on the input signal and the plurality of adjusted left ear and right ear Ambisonic HRTFs of the plurality of virtual loudspeakers using Ambisonic binaural rendering.
  • Fig. 2 is a block diagram illustrating processing steps implemented by the processing circuitry 110 of the apparatus 100 in order to generate Ambisonic Interaural Time Difference Optimised (AITDO) HRTFs according to an embodiment.
  • the processing circuitry 110 of the apparatus 100 may generate personalized reference HRTFs 119 which are Ambisonic Interaural Time Difference (ITD) calibrated 117 based on phase removal and alignment 113 of HRTFs from a reference HRTF Database 111.
  • the calibration 117 may be based on using a head-size measurement 115.
  • the calibration 117 may alternatively be based on an ITD slider method.
  • the processing circuitry 110 of the apparatus 100 may choose a virtual loudspeaker configuration for an appropriate Ambisonic order, which may be for example an octahedron for a 1st order.
  • the virtual loudspeaker configuration may be chosen by the processing circuitry 110 based on the reference HRTF dataset 111.
  • an Ambisonic domain HRTF is generated. In an embodiment the generation may be based on using a delta function.
  • An Ambisonic ITD may then be estimated, i.e. calculated 121b, by the processing circuitry 110 for the Ambisonic HRTF, and a reference ITD may be estimated, i.e. calculated 121b, by the processing circuitry 110 for the original virtual loudspeaker HRTF, i.e. the personalized HRTF 119 or reference HRTF; subsequently, the difference in ITD between the two estimations is calculated 123.
  • the process may be repeated for all HRTF directions, creating an array of ITD difference values.
  • the iteration may continue until a difference between the respective reference ITD and the respective Ambisonic ITD is smaller than a threshold value or until a predefined number of iterations has been reached.
  • the augmented 127 virtual loudspeaker HRTFs with AITDO may then be combined with the original virtual loudspeaker HRTFs using a linear-phase crossover network and subsequently normalized 129.
  • the pre-processed HRTFs 131 are then switched into the binaural decoder 133, combined 137 with the Ambisonic HRTF generated based on the input signal and the process is repeated iteratively.
  • An array of delay values may be augmented at each iteration which keeps track of the cumulative delay for each HRTF. At each iteration, this array of delays may be used on the original HRTF set, ensuring that the final AITDO pre-processed HRTF dataset will be subject to the crossover filter only once, regardless of the number of iterations.
  • Fig. 3 is a block diagram illustrating processing steps implemented by the apparatus 100 for Ambisonic binaural rendering of the input signal according to a further embodiment using Ambisonic Interaural Time Difference Optimised preprocessed reference head-related transfer functions.
  • the apparatus 100 may be used for supporting spatial audio playback for virtual or augmented reality games on a mobile phone.
  • the AITDO algorithm implemented by the processing circuitry 110 of the apparatus 100 and further detailed above and below may optimize the HRTFs used for binaural based Ambisonic rendering of game objects.
  • the user 103 may be asked to undertake ITD calibration 117 prior to the game start.
  • the calibration routine may be as simple as getting the subject to measure their head size and then to extract ITDs based on a spherical head model.
  • a full test procedure may be employed where the user 103 is presented with sound sources with different cross-head delays and asked to judge when the sound source is perceived to move laterally.
  • a bitstream may come from a game engine and be transcoded 303 to Ambisonic.
  • the AITDO HRTFs 301 are loaded and a binaural based Ambisonic renderer 305 produces the final headphone mix.
  • no additional computational complexity is introduced at runtime.
  • Fig. 4 is a block diagram illustrating processing steps implemented by the processing circuitry 110 of the apparatus 100 for Ambisonic binaural rendering of the input signal according to a further embodiment.
  • an Ambisonics signal may be created that corresponds to an incoming plane wave from the direction of the virtual loudspeaker.
  • an initial Ambisonic Binaural decoder implemented by the processing circuitry 110 may be used and the Ambisonics signal may then be binaurally decoded by convolution with the Ambisonics representation of the used Ambisonics HRTF, which may result in a 2 channel impulse response for each loudspeaker.
  • the ITD may be estimated 403 for the binaural impulse response obtained in the previous step 402 for each speaker direction.
  • the ITD may be estimated in the same way for the original HRIR (head-related impulse response) corresponding to the loudspeaker directions. Then the difference in ITD between these two estimations may be calculated for all directions of the virtual loudspeaker array.
  • the ITD difference values of the current iteration which may include the single value of ITD for each virtual loudspeaker, may be stored in an array of the memory 105 which may contain all of the previous measured ITD differences.
  • the processing circuitry 110 may check whether the ITD difference is small enough or the maximum number of iterations has been reached. If so, the process stops in a further step 408; otherwise it continues with a further step 406. In the step 406, the processing circuitry 110 may augment the virtual loudspeaker HRTF, i.e.
  • in step 407, the augmented virtual loudspeaker HRTFs are used to re-compute the binaural decoder and the process is repeated iteratively, thus going back to step 402.
  • the total delay of the HRTFs for each virtual loudspeaker may be calculated 409 by the addition of all partial delays of all iterations, i.e. ITD offsets, and may be used by the processing circuitry 110 for computing 410 the final Ambisonics Binaural decoder.
  • the calculation in step 409 may comprise an augmentation, combination with the original virtual loudspeaker HRTFs using a linear-phase crossover network and a normalization as described above.
  • Figure 5 is a flow diagram illustrating a method 500 for Ambisonic binaural rendering of the input signal.
  • the method 500 comprises a first step of generating 501 a left ear transducer driver signal and a right ear transducer driver signal using Ambisonic binaural rendering of the input signal based on a plurality of virtual loudspeakers, each virtual loudspeaker being associated with a left ear and a right ear reference head-related transfer function, HRTF, and a left ear and a right ear Ambisonic HRTF.
  • the method 500 comprises a step of adjusting 503 for each virtual loudspeaker the left ear and the right ear HRTF by adjusting an interaural time difference, ITD, of the left ear and the right ear HRTF based on a comparison of the ITD of the resulting Ambisonic HRTF with a reference ITD of the left ear and the right ear reference HRTF.
  • ITD: interaural time difference
  • the method 500 further comprises a step of generating 505, based on the input signal and the plurality of adjusted left ear and right ear Ambisonic HRTFs of the plurality of virtual loudspeakers, a left ear transducer driver signal and a right ear transducer driver signal using Ambisonic binaural rendering for driving a left ear transducer 101a configured to generate a left ear audio signal and a right ear transducer 101b configured to generate a right ear audio signal.
  • the method 500 can be performed by the apparatus 100 according to an embodiment. Thus, further features of the method 500 result directly from the functionality of the apparatus 100 as well as its different embodiments described above and below.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the described embodiment of an apparatus is merely exemplary.
  • the unit division is merely logical function division and may be another division in an actual implementation.
  • a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces.
  • the indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
  • the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • functional units in the embodiments of the invention may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.
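
The iterative ITD adjustment described for Fig. 2 and Fig. 4 (estimate the Ambisonic ITD, compare it with the reference ITD, accumulate a per-loudspeaker delay, stop at a threshold or an iteration limit) can be sketched as follows. This is a minimal illustration and not the claimed implementation: the function names, the default threshold and iteration limit, and in particular `toy_renderer` (a stand-in that simply shrinks every ITD by a fixed factor instead of performing Ambisonic encoding and binaural decoding) are assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple


def optimise_itd_offsets(
    reference_itds: Sequence[float],
    render_itds: Callable[[List[float]], List[float]],
    threshold: float = 1e-6,
    max_iterations: int = 50,
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate one delay offset per virtual loudspeaker.

    reference_itds: reference ITD (seconds) for each virtual loudspeaker.
    render_itds:    hypothetical callback returning the ITD measured after
                    Ambisonic rendering of loudspeaker HRTFs with the given ITDs.
    Returns the cumulative offsets and the ITD differences of every iteration.
    """
    cumulative = [0.0] * len(reference_itds)
    history: List[List[float]] = []
    if not reference_itds:
        return cumulative, history
    for _ in range(max_iterations):
        # Apply the cumulative delay to the ORIGINAL ITDs on every pass, so
        # the original data set is only ever modified once.
        adjusted = [ref + off for ref, off in zip(reference_itds, cumulative)]
        rendered = render_itds(adjusted)
        # ITD difference between reference and Ambisonic rendering.
        diffs = [ref - amb for ref, amb in zip(reference_itds, rendered)]
        history.append(diffs)
        # Stop once every difference is below the threshold.
        if max(abs(d) for d in diffs) < threshold:
            break
        # Otherwise add this iteration's partial delays to the running total.
        cumulative = [off + d for off, d in zip(cumulative, diffs)]
    return cumulative, history


def toy_renderer(itds: List[float]) -> List[float]:
    """Assumed stand-in: rendering reproduces only 80 % of each ITD."""
    return [0.8 * itd for itd in itds]


reference = [6.0e-4, -6.0e-4, 0.0]  # illustrative ITDs in seconds
offsets, history = optimise_itd_offsets(reference, toy_renderer)
```

In this toy setting each pass reduces the remaining ITD error by a constant factor, so the loop ends on the threshold rather than on the iteration limit. A real renderer would replace `toy_renderer` with Ambisonic decoding followed by an ITD estimate.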

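The head-size based calibration mentioned for Fig. 3 extracts ITDs from a spherical head model. The text does not say which model is used; the sketch below assumes Woodworth's classic far-field formula, ITD = (a / c) * (theta + sin(theta)), with head radius a, speed of sound c and lateral angle theta. The function name, the default speed of sound of 343 m/s and the example head width are illustrative assumptions.

```python
import math


def spherical_head_itd(
    head_radius_m: float,
    azimuth_rad: float,
    speed_of_sound_m_s: float = 343.0,
) -> float:
    """ITD in seconds for a far-field source, Woodworth spherical head model.

    azimuth_rad is the lateral angle from straight ahead and must lie in
    [-pi/2, pi/2]; the sign of the result follows the sign of the angle.
    """
    if head_radius_m <= 0.0:
        raise ValueError("head radius must be positive")
    if abs(azimuth_rad) > math.pi / 2:
        raise ValueError("azimuth must lie within [-pi/2, pi/2]")
    return (head_radius_m / speed_of_sound_m_s) * (
        azimuth_rad + math.sin(azimuth_rad)
    )


# Example: a user-measured head width of 0.15 m gives a radius of 0.075 m.
itd_side = spherical_head_itd(0.075, math.pi / 2)
itd_front = spherical_head_itd(0.075, 0.0)
```

A source straight ahead yields zero ITD and the magnitude grows monotonically towards the side, which is the kind of per-direction reference value a calibration step could derive from a single head-size measurement.
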
Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)
EP21765935.8A 2021-08-25 2021-08-25 Apparatus and method for ambisonic binaural audio rendering Pending EP4393168A1 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2021/073440 WO2023025376A1 (en) 2021-08-25 2021-08-25 Apparatus and method for ambisonic binaural audio rendering

Publications (1)

Publication Number Publication Date
EP4393168A1 (de) 2024-07-03

Family

ID=77640694

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21765935.8A Pending EP4393168A1 (de) Apparatus and method for ambisonic binaural audio rendering

Country Status (2)

Country Link
EP (1) EP4393168A1 (de)
WO (1) WO2023025376A1 (de)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0815362D0 (en) * 2008-08-22 2008-10-01 Queen Mary & Westfield College Music collection navigation

Also Published As

Publication number Publication date
WO2023025376A1 (en) 2023-03-02

Similar Documents

Publication Publication Date Title
US10757529B2 (en) Binaural audio reproduction
US9918179B2 (en) Methods and devices for reproducing surround audio signals
CN107018460B (zh) 具有头部跟踪的双耳头戴式耳机呈现
US8488796B2 (en) 3D audio renderer
US7231054B1 (en) Method and apparatus for three-dimensional audio display
US8081762B2 (en) Controlling the decoding of binaural audio signals
AU2015234454B2 (en) Method and apparatus for rendering acoustic signal, and computer-readable recording medium
KR101567461B1 (ko) 다채널 사운드 신호 생성 장치
US8374365B2 (en) Spatial audio analysis and synthesis for binaural reproduction and format conversion
US9607622B2 (en) Audio-signal processing device, audio-signal processing method, program, and recording medium
JP2008522483A (ja) 多重チャンネルオーディオ入力信号を2チャンネル出力で再生するための装置及び方法と、これを行うためのプログラムが記録された記録媒体
Yao Headphone-based immersive audio for virtual reality headsets
WO2000019415A2 (en) Method and apparatus for three-dimensional audio display
WO2017063688A1 (en) Method and device for generating an elevated sound impression
EP1815716A1 (de) Apparatus and method for processing multi-channel audio input signals to generate at least two channel output signals therefrom, and computer-readable medium containing executable code for performing the method
EP3700233A1 (de) System and method for generating transfer functions
EP4393168A1 (de) Apparatus and method for ambisonic binaural audio rendering
US20230143857A1 (en) Spatial Audio Reproduction by Positioning at Least Part of a Sound Field
US11470435B2 (en) Method and device for processing audio signals using 2-channel stereo speaker
US20240007819A1 (en) Apparatus and method for personalized binaural audio rendering
KR20240145040A (ko) 머리 전달함수 압축을 위한 장치 및 방법
WO2023156631A1 (en) Apparatus and method for head-related transfer function compression

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20240325

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR