EP3912365A1 - Device and method for rendering a binaural audio signal - Google Patents

Device and method for rendering a binaural audio signal

Info

Publication number
EP3912365A1
Authority
EP
European Patent Office
Prior art keywords
component
audio signal
direct component
hrtf
diffuse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP19720591.7A
Other languages
German (de)
English (en)
Inventor
Christof Faller
Alexis Favrot
Mohammad TAGHIZADEH
Martin POLLOW
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of EP3912365A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30 Control circuits for electronic adaptation of the sound field
    • H04S 7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2420/11 Application of ambisonics in stereophonic audio systems

Definitions

  • the present invention relates to the technical field of three-dimensional (3D) sound, for instance, for virtual reality (VR) applications or surround sound.
  • the invention also relates to VR compatible audio formats, e.g. First Order Ambisonic (FOA) signals (also referred to as B-format).
  • the invention relates specifically to generating binaural sounds/signals from such audio formats.
  • the invention proposes to this end a device and a method for rendering a binaural signal.
  • 3D VR sounds are typically recorded and stored as FOA signals.
  • the rendering of these FOA signals over headphones is then done by converting them into binaural sounds.
  • the binaural sounds are obtained based on Head Related Transfer Functions (HRTFs), which model the acoustic paths from a point source to the ears, such that the impressions of immersion and externalisation are improved.
  • the binaural sounds are usually rendered by applying the HRTFs to decoded virtual loudspeaker signals.
  • a straightforward approach for obtaining the binaural sounds is to decode the FOA signals into specific loudspeaker setups with pre-defined positions, and then to apply the HRTFs relative to these positions.
  • direct and linear decoding of the FOA signals does not provide enough spatial resolution to cover the entire 3D space.
  • the performance is often restricted by a trade-off between the computational complexity of the system and the precision (length) of the HRTF models or measurements.
  • Non-linear decoding provides better localisation and spatialization, or better discrimination between direct and diffuse sounds for improved spaciousness.
  • parametric approaches in the time-frequency domain, based on non-linear decoding into N virtual loudspeaker signals, provide such better results.
  • Other approaches directly synthesise a two-channel binaural signal using target binaural cues, which are computed either based on an analysis or based on MPEG surround spatial cues.
  • embodiments of the invention aim to improve the current approaches for rendering binaural sounds.
  • An objective is to obtain a binaural audio signal with improved spatial resolution and with lower computational complexity.
  • the use of virtual loudspeaker signals may be avoided.
  • binaural cues should not be necessary for the HRTFs.
  • full compatibility with existing FOA signals is desired.
  • embodiments of the invention propose separating a direct component from a diffuse component of an audio signal, then modifying the direct component based on a HRTF, which is determined based on a Direction of Arrival (DoA) related to the audio signal, and then rendering the binaural audio signal from the diffuse component and the modified direct component.
  • a first aspect of the invention provides a device for rendering a binaural audio signal, wherein the device is configured to: obtain a direct component of the audio signal and a diffuse component and a DoA, from a plurality of audio channels of the audio signal, determine a HRTF according to the DoA, filter the direct component based on the HRTF to obtain a modified direct component, and generate the binaural audio signal based on the diffuse component and the modified direct component.
  • the device of the first aspect is able to render the binaural audio signal with high spatial resolution and low computational complexity. Thereby, the use of virtual loudspeaker signals is not necessary, nor do binaural cues need to be used.
  • the HRTF can be applied directly to the audio signal (direct component).
  • the device of the first aspect is fully compatible with existing audio signals, particularly FOA signals.
  • the plurality of audio channels are B-format channels.
  • the audio channels may be the channels of a FOA signal (i.e. W, X, Y and Z), and the audio signal may be the FOA signal.
  • the device is configured to: generate the binaural audio signal by combining the diffuse component with the modified direct component.
  • the device is configured to: generate a left diffuse component and a right diffuse component based on the diffuse component, generate a left modified direct component and a right modified direct component based on the modified direct component, and combine the left diffuse component with the left modified direct component, and combine the right diffuse component with the right modified direct component, wherein the binaural audio signal is generated based on the result of the combining.
  • the binaural audio signal can be rendered optimally, for instance, for a headphone device or VR device.
  • the device is configured to: generate the left modified direct component and the right modified direct component by applying the HRTF to the direct component.
  • the device is configured to: generate the left diffuse component and the right diffuse component by linear decoding of the diffuse component.
  • the device is configured to: estimate a diffuseness of the audio signal, and obtain the direct component and optionally the diffuse component based on the estimated diffuseness.
  • the device is configured to: estimate the diffuseness based on a short-time Fourier transform of the audio signal.
  • the device is configured to: estimate the DoA based on a short-time Fourier transform of the audio channels.
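  • For illustration, a minimal STFT front-end for the four B-format channels is sketched below, assuming a NumPy/SciPy environment; the frame length and the random test input are arbitrary choices, not taken from the description.

```python
import numpy as np
from scipy.signal import stft

def foa_stft(b_format, fs, n_fft=1024):
    """Short-time Fourier transform of the four B-format channels (W, X, Y, Z).

    b_format : ndarray of shape (4, n_samples)
    Returns a complex array of shape (4, n_bins, n_frames).
    """
    _, _, spectra = stft(b_format, fs=fs, nperseg=n_fft, axis=-1)
    return spectra

# Example: one second of placeholder 4-channel audio at 48 kHz
fs = 48000
foa = np.random.randn(4, fs)
spectra = foa_stft(foa, fs)
print(spectra.shape)  # (4, n_bins, n_frames)
```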
  • the HRTF includes a gain and a phase to be applied to each time-frequency tile of the direct component.
  • the device is configured to: smooth the HRTF over time to obtain a smoothed HRTF, and modify the direct component based on the smoothed HRTF.
  • the audio signal is an Ambisonic signal.
  • the audio signal may be a FOA signal.
  • a second aspect of the invention provides a headphone device, wherein the headphone device comprises a device for rendering a binaural audio signal according to the first aspect or any of its implementation forms.
  • the headphone device of the second aspect may particularly be for a 3D VR system. For instance, it may be included in such a system.
  • the headphone device of the second aspect enjoys all advantages of the device of the first aspect.
  • a third aspect of the invention provides a method for rendering a binaural audio signal, wherein the method comprises: obtaining a direct component of the audio signal and a diffuse component and a DoA from a plurality of audio channels of the audio signal, determining a HRTF according to the DoA, filtering the direct component based on the HRTF to obtain a modified direct component, and generating the binaural audio signal based on the diffuse component and the modified direct component.
  • the plurality of audio channels are B-format channels.
  • the method comprises: generating the binaural audio signal by combining the diffuse component with the modified direct component.
  • the method comprises: generating a left diffuse component and a right diffuse component based on the diffuse component, generating a left modified direct component and a right modified direct component based on the modified direct component, and combining the left diffuse component with the left modified direct component, and combining the right diffuse component with the right modified direct component, wherein the binaural audio signal is generated based on the result of the combining.
  • the method comprises: generating the left modified direct component and the right modified direct component by applying the HRTF to the direct component.
  • the method comprises: generating the left diffuse component and the right diffuse component by linear decoding of the diffuse component.
  • the method comprises: estimating a diffuseness of the audio signal, and obtaining the direct component and optionally the diffuse component based on the estimated diffuseness.
  • the method comprises: estimating the diffuseness based on a short-time Fourier transform of the audio signal.
  • the method comprises: estimating the DoA based on a short-time Fourier transform of the audio channels.
  • the HRTF includes a gain and a phase to be applied to each time-frequency tile of the direct component.
  • the method comprises: smoothing the HRTF over time to obtain a smoothed HRTF, and modifying the direct component based on the smoothed HRTF.
  • the audio signal is an Ambisonic signal.
  • the method of the third aspect and its implementation forms achieve all advantages of the device of the first aspect and its respective implementation forms.
  • a fourth aspect of the invention provides a computer program product comprising a program code for controlling a device according to the first aspect or any of its implementation forms, or for controlling a headphone device according to the second aspect, or for carrying out, when executed by a processor, the method according to the third aspect.
  • FIG. 1 shows a device according to an embodiment of the invention.
  • FIG. 2 shows a device according to an embodiment of the invention.
  • FIG. 3 shows a method according to an embodiment of the invention.
  • FIG. 1 shows a device 100 according to an embodiment of the invention.
  • the device 100 is configured to render a binaural audio signal 107, particularly from an audio signal 101 comprising multiple audio channels, e.g. an FOA signal.
  • the device 100 may be used in a headphone or VR device.
  • the device 100 may comprise processing circuitry (not shown) configured to perform, conduct or initiate the various operations of the device 100 described herein.
  • the processing circuitry may comprise hardware and software.
  • the hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry.
  • the digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors.
  • the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors.
  • the non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the device 100 to perform, conduct or initiate the operations or methods described herein.
  • the device 100 is specifically configured to obtain a direct component 102 of the audio signal 101 and a diffuse component 103 and a DoA 104 from a plurality of audio channels of the audio signal 101. That is, the device 100 may be configured to separate the audio signal 101 into the direct component 102 and the diffuse component 103, and to derive the DoA 104 related to the audio signal 101.
  • the audio signal 101 may be a FOA signal or a B-format signal.
  • the plurality of audio channels may be B-format channels.
  • the device 100 is further configured to determine a HRTF 105 according to the DoA 104. To this end, it may be configured to first estimate the DoA 104, for instance, based on a short-time Fourier transform of the audio channels of the audio signal 101. Then, it may use the DoA 104 to generate the HRTF 105.
  • the device 100 is further configured to filter the direct component 102 based on the HRTF 105 (e.g. applying the HRTF 105 to the direct component 102), in order to obtain a modified direct component 106, and then to generate the binaural audio signal 107 based on the diffuse component 103 and the modified direct component 106. In particular, the device 100 may generate the binaural audio signal 107 by combining the diffuse component 103 with the modified direct component 106.
  • FIG. 2 shows the device 100 of FIG. 1 according to an embodiment of the invention in more detail, particularly with further optional features.
  • the device 100 shown in FIG. 2 can generate a left diffuse component 203L and a right diffuse component 203R based on the diffuse component 103, and can generate a left modified direct component 106L and a right modified direct component 106R based on the modified direct component 106.
  • the device 100 can combine in particular the left diffuse component 203L with the left modified direct component 106L, and also combine the right diffuse component 203R with the right modified direct component 106R.
  • the binaural audio signal 107 is thus generated based on the result of the combining.
  • the device 100 may first estimate the DoA 104 of the FOA signal 101 (e.g. azimuth and elevation angles), as well as the directness or, respectively, the diffuseness 200 of the FOA signal 101.
  • the direct component 102 and the diffuse component 103 of the FOA signal 101 may be separated.
  • the diffuse component 103 may be decoded linearly into the left and right diffuse components 203L and 203R.
  • HRTF processing may be applied to the direct component 102, in order to obtain the left and right direct components 106L and 106R, respectively.
  • both the direct and diffuse components may be combined to obtain the binaural audio signal 107 (including the binaural audio channels, e.g. left and right).
  • the device 100 does not rely on numerous decoded virtual signals, but the HRTF 105 can be applied directly to the FOA signal 101. Moreover, the HRTF 105 can be integrated in the parametrical model and can take advantage of the FOA analysis, i.e. it can be fully adaptively computed from the DoA estimate. Thus, the device 100 and the algorithm it performs are compact and computationally efficient, since all operations may be done independently for each time-frequency tile.
  • An exemplary specific algorithm, which the device 100 may perform, is described in the following.
  • a DoA analysis can be performed according to equations (1) and (2).
  • a directness estimation can be performed based on the same spectra of the FOA signal 101, according to equation (3).
  • the directness estimate (3) can then be used to separate the direct component 102 from the diffuse component 103 of the FOA signal 101, according to equations (4) and (5).
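  • As an illustration only, the sketch below shows one common, DirAC-style way to estimate a per-tile DoA and directness from B-format spectra and to split each channel into direct and diffuse parts; these formulas are assumptions and need not match equations (1) to (5) of the description exactly.

```python
import numpy as np

def foa_analysis(W, X, Y, Z, eps=1e-12):
    """Per-tile DoA and directness analysis of B-format STFT spectra.

    W, X, Y, Z : complex arrays of shape (n_bins, n_frames).
    Returns azimuth, elevation (radians) and a directness measure in [0, 1].
    """
    # Pseudo-intensity vector per time-frequency tile
    Ix = np.real(np.conj(W) * X)
    Iy = np.real(np.conj(W) * Y)
    Iz = np.real(np.conj(W) * Z)

    azimuth = np.arctan2(Iy, Ix)
    elevation = np.arctan2(Iz, np.sqrt(Ix**2 + Iy**2) + eps)

    # Ratio of directional intensity to total energy as a directness measure
    energy = 0.5 * (np.abs(W)**2 + np.abs(X)**2 + np.abs(Y)**2 + np.abs(Z)**2)
    directness = np.minimum(1.0, np.sqrt(Ix**2 + Iy**2 + Iz**2) / (energy + eps))
    return azimuth, elevation, directness

def split_direct_diffuse(channel, directness):
    """Energy-preserving split of one B-format channel into direct and diffuse
    parts; applied in the same way to W, X, Y and Z."""
    direct = np.sqrt(directness) * channel
    diffuse = np.sqrt(1.0 - directness) * channel
    return direct, diffuse
```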
  • left and right diffuse components 203L and 203R may then be obtained by linear decoding of the FOA diffuse component 103 obtained in (5).
  • left and right channels can be rendered through two decoded cardioid signals with maximum angular separation.
  • the left and right rendered diffuse components 203L and 203R will benefit from the best possible de-correlation.
  • the decoding can possibly be made frequency dependent, in order to follow the physical shape of the cross-correlation coefficient between both ear signals.
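  • A minimal sketch of such a cardioid decoding is given below, assuming first-order cardioids pointing at +/-90 degrees azimuth; the normalisation convention and the decoding gains are assumptions rather than the exact coefficients of the description.

```python
import numpy as np

def cardioid(W, X, Y, phi):
    """First-order cardioid pointing at azimuth phi (radians); an SN3D-style
    normalisation is assumed, and Z does not contribute for a horizontally
    oriented cardioid."""
    return 0.5 * (W + np.cos(phi) * X + np.sin(phi) * Y)

def decode_diffuse(W_diff, X_diff, Y_diff):
    """Decode the diffuse B-format part into left/right cardioids at +/-90
    degrees, i.e. with maximum angular separation for strong decorrelation."""
    left = cardioid(W_diff, X_diff, Y_diff, +np.pi / 2)
    right = cardioid(W_diff, X_diff, Y_diff, -np.pi / 2)
    return left, right
```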
  • the left and right modified direct components 106L and 106R may be obtained by applying the HRTF processing based on the HRTF 105 to the FOA direct components 102 obtained in (4).
  • a simple HRTF 105 model can be directly derived from the DoA estimation given in (1) and (2), wherein the inter-aural level differences (ILDs) between both ears may be derived from a first-order filter.
  • Here, K is the diameter of the head in the HRTF model and c is the speed of sound.
  • the HRTF model may simply comprise or consist of a gain and a phase to be applied to each time-frequency tile of the direct sound contribution of the FOA signal 101.
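  • The sketch below illustrates one possible parametric per-tile HRTF of this kind, combining an ITD-derived phase with a first-order head-shadow gain; the filter form, the constants and the function names are illustrative assumptions, not the exact model of the description (elevation is ignored in this simplified horizontal-plane version).

```python
import numpy as np

def simple_hrtf(azimuth, freqs, K=0.18, c=343.0):
    """Per-tile complex filters (gain and phase) for the left and right ears.

    azimuth : DoA azimuth in radians (positive to the left), per tile.
    freqs   : bin frequencies in Hz, shaped to broadcast against azimuth
              (e.g. freqs[:, None] for an (n_bins, n_frames) azimuth map).
    K       : assumed head diameter in metres, c : speed of sound in m/s.
    """
    a = K / 2.0                  # head radius
    w = 2.0 * np.pi * freqs
    w0 = c / a                   # corner frequency of the shadow filter

    def ear(theta):
        # theta: angle between the DoA and the ear axis.
        # First-order head-shadow filter (illustrative): high frequencies are
        # boosted towards the ear facing the source, attenuated on the far side.
        alpha = 1.0 + np.cos(theta)
        shadow = (1.0 + 1j * alpha * w / (2.0 * w0)) / (1.0 + 1j * w / (2.0 * w0))
        # ITD phase term: the ear facing the source receives the wave earlier.
        tau = -(a / c) * np.cos(theta)
        return shadow * np.exp(-1j * w * tau)

    # Left and right ear axes are assumed at +90 and -90 degrees azimuth
    return ear(azimuth - np.pi / 2.0), ear(azimuth + np.pi / 2.0)
```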
  • the derived HRTF 105 parameters may be smoothed over time to reduce audible artefacts.
  • a smoothed HRTF may be determined by averaging with a time constant T_HRTF (in seconds) at the spectrum sampling frequency f_s; the remaining smoothed parameters may be obtained analogously.
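  • One common way to realise such smoothing is a first-order recursive average along the frame axis, as sketched below; the mapping from T_HRTF and f_s to the forgetting factor is an assumption.

```python
import numpy as np

def smooth_over_time(param, t_hrtf, fs, hop):
    """Exponentially smooth an HRTF parameter along the frame (time) axis.

    param  : array of shape (n_bins, n_frames), e.g. per-tile gains or phases.
    t_hrtf : averaging time constant in seconds.
    fs     : sampling frequency in Hz, hop : STFT hop size in samples.
    """
    frame_rate = fs / hop
    alpha = np.exp(-1.0 / (t_hrtf * frame_rate))   # forgetting factor
    smoothed = np.empty_like(param)
    smoothed[:, 0] = param[:, 0]
    for n in range(1, param.shape[1]):
        smoothed[:, n] = alpha * smoothed[:, n - 1] + (1.0 - alpha) * param[:, n]
    return smoothed
```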
  • the left and right direct components 106L and 106R, respectively, i.e. the direct sound contributions to the binaural audio signal 107, may be obtained by applying the previously derived filters to the omnidirectional direct signal.
  • the direct components 106L and 106R may also be obtained by applying the previously derived filters to combinations of B-format channels.
  • left- and right-oriented sub-cardioid B-format linear combinations may be used; a gain may be adjusted to compensate for a gain difference between the omnidirectional direct signal and the left and right sub-cardioid combinations.
  • the final binaural signal 107 may be reconstructed by adding the direct and diffuse sound contributions.
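  • Putting the pieces together, the following sketch reconstructs the two binaural channels from the HRTF-filtered direct component and the decoded diffuse cardioids and converts them back to the time domain; the sub-cardioid variant and the gain compensation mentioned above are omitted, and the function and variable names are illustrative.

```python
import numpy as np
from scipy.signal import istft

def render_binaural(W_direct, left_diffuse, right_diffuse, H_left, H_right,
                    fs, n_fft=1024):
    """Combine direct and diffuse contributions into a two-channel signal.

    The per-tile HRTF filters are applied to the omnidirectional direct
    component; the already decoded diffuse cardioids are added unchanged.
    All spectral inputs have shape (n_bins, n_frames).
    """
    left_spec = H_left * W_direct + left_diffuse
    right_spec = H_right * W_direct + right_diffuse
    _, left = istft(left_spec, fs=fs, nperseg=n_fft)
    _, right = istft(right_spec, fs=fs, nperseg=n_fft)
    return np.stack([left, right])   # shape (2, n_samples)
```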
  • FIG. 3 shows a method 300 according to an embodiment of the invention.
  • the method 300 is for rendering a binaural signal 107, and can be performed by the device 100 as shown in FIG. 1 or FIG. 2.
  • the method 300 comprises: a step 301 of obtaining a direct component 102 of the audio signal 101 and a diffuse component 103 and a DoA 104 from a plurality of audio channels of the audio signal 101; a step 302 of determining a HRTF 105 according to the DoA 104; a step 303 of filtering the direct component 102 based on the HRTF 105 to obtain a modified direct component 106; and a step 304 of generating the binaural audio signal 107 based on the diffuse component 103 and the modified direct component 106.
  • the device 100, method 300, and the detailed algorithm can have the following advantages:
  • the direct component 102 and the diffuse component 103 may be separately obtained from the individual FOA channels (i.e. the FOA signal 101).
  • the HRTF 105 may not require target binaural cues. Instead, the two binaural channels can be synthesized based on the HRTF 105 applied to the direct component 102 and on the linear decoding of the diffuse component 103.
  • the device 100, method 300, and algorithm may be compatible with existing FOA signals.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

The invention relates to the technical field of 3D sound, for example, for virtual reality (VR) applications or surround sound. The invention proposes in particular a device and a method for rendering a binaural audio signal. The device is configured to obtain a direct component of the audio signal, a diffuse component and a direction of arrival (DoA) from a plurality of audio channels of the audio signal. Furthermore, the device is configured to determine a head-related transfer function (HRTF) according to the DoA. Finally, the device is configured to filter the direct component based on the HRTF to obtain a modified direct component, and to generate the binaural audio signal based on the diffuse component and the modified direct component.
EP19720591.7A 2019-04-30 2019-04-30 Dispositif et procédé de restitution d'un signal audio binaural Withdrawn EP3912365A1 (fr)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2019/060996 WO2020221431A1 (fr) 2019-04-30 2019-04-30 Dispositif et procédé de restitution d'un signal audio binaural

Publications (1)

Publication Number Publication Date
EP3912365A1 true EP3912365A1 (fr) 2021-11-24

Family

ID=66334500

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19720591.7A Withdrawn EP3912365A1 (fr) 2019-04-30 2019-04-30 Dispositif et procédé de restitution d'un signal audio binaural

Country Status (2)

Country Link
EP (1) EP3912365A1 (fr)
WO (1) WO2020221431A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114584913B (zh) * 2020-11-30 2023-05-16 华为技术有限公司 Foa信号和双耳信号的获得方法、声场采集装置及处理装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030035553A1 (en) * 2001-08-10 2003-02-20 Frank Baumgarte Backwards-compatible perceptual coding of spatial cues
EP2249334A1 (fr) * 2009-05-08 2010-11-10 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Transcodeur de format audio
EP2942982A1 (fr) * 2014-05-05 2015-11-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Système, appareil et procédé de reproduction de scène acoustique constante sur la base d'un filtrage spatial informé

Also Published As

Publication number Publication date
WO2020221431A1 (fr) 2020-11-05

Similar Documents

Publication Publication Date Title
JP7459019B2 (ja) 高次アンビソニックス・オーディオ信号からステレオ・ラウドスピーカー信号を復号する方法および装置
CN106664485B (zh) 基于自适应函数的一致声学场景再现的系统、装置和方法
JP7564295B2 (ja) DirACベース空間オーディオコーディングに関する符号化、復号、シーン処理、および他の手順のための装置、方法、およびコンピュータプログラム
KR101627647B1 (ko) 바이노럴 렌더링을 위한 오디오 신호 처리 장치 및 방법
US9088858B2 (en) Immersive audio rendering system
KR102586089B1 (ko) 파라메트릭 바이너럴 출력 시스템 및 방법을 위한 머리추적
KR101651419B1 (ko) 머리 전달 함수들의 선형 믹싱에 의한 머리 전달 함수 생성을 위한 방법 및 시스템
EP3363212A1 (fr) Capture et mixage audio distribué
EP3114859A1 (fr) Modélisation structurale de la réponse impulsionnelle relative à la tête
US11232802B2 (en) Method for conversion, stereophonic encoding, decoding and transcoding of a three-dimensional audio signal
TWI751457B (zh) 使用直流分量補償用於編碼、解碼、場景處理及基於空間音訊編碼與DirAC有關的其他程序的裝置、方法及電腦程式
TW202329088A (zh) 用於將保真立體音響格式聲訊訊號描繪至二維度(2d)揚聲器設置之方法和裝置以及電腦可讀式儲存媒體
McCormack et al. Parametric first-order ambisonic decoding for headphones utilising the cross-pattern coherence algorithm
EP3753263B1 (fr) Dispositif et procédé de codage audio
JP2024023412A (ja) 音場関連のレンダリング
JP2017085362A (ja) 立体音再生装置およびプログラム
EP3912365A1 (fr) Dispositif et procédé de restitution d'un signal audio binaural
US10462598B1 (en) Transfer function generation system and method
Politis et al. Overview of Time–Frequency Domain Parametric Spatial Audio Techniques
KR20240142538A (ko) 공간 오디오의 렌더링을 가능하게 하기 위한 장치, 방법, 및 컴퓨터 프로그램
Li-hong et al. Robustness design using diagonal loading method in sound system rendered by multiple loudspeakers

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20210816

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20220329