WO2022108494A1 - Improved modeling and/or determination of binaural room impulse responses for audio applications - Google Patents

Improved modeling and/or determination of binaural room impulse responses for audio applications

Info

Publication number
WO2022108494A1
Authority
WO
WIPO (PCT)
Prior art keywords
filter
rir
sound
brir
binaural
Prior art date
Application number
PCT/SE2020/051098
Other languages
French (fr)
Inventor
Viktor GUNNARSSON
Original Assignee
Dirac Research Ab
Priority date
Filing date
Publication date
Application filed by Dirac Research Ab filed Critical Dirac Research Ab
Priority to PCT/SE2020/051098
Publication of WO2022108494A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • H04S7/304 For headphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2201/00 Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
    • H04R2201/40 Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
    • H04R2201/401 2D or 3D arrays of transducers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/11 Application of ambisonics in stereophonic audio systems

Definitions

  • the proposed technology generally relates to sound reproduction and audio processing, and more particularly to a method and system for determining an audio filter, an audio processing method and a method for tuning an audio system, an audio filter, an audio processing system and an overall audio system, as well as a corresponding computer program and computer-program product.
  • the auditory experience is different compared to when listening with loudspeakers.
  • the sound is perceived as originating from inside the head whereas when using loudspeakers the sound is perceived as externalized i.e. coming from outside the head and the subjective spatial impression of the sound field is determined to a large degree by the reverberant characteristics of the room in which the loudspeakers are placed.
  • a common application of signal processing for headphones is consequently to simulate ear signals (also called a binaural signal or a binaural signal pair) that would occur for a listener when listening to loudspeakers in a room, as schematically illustrated in FIG. 1. This can also be referred to as auralization of a sound system.
  • a Binaural Room Impulse Response is the impulse response(s) of a sound source in a certain acoustic environment - for example, a loudspeaker for music reproduction placed in a listening room - to both of the ears of a listener.
  • a BRIR may be seen as a collective pair of impulse responses, one for each ear.
  • look direction here refers to the head pose of the listener which can be described by pitch, yaw, and roll angles of the head.
  • This enables what is known in the literature as dynamic binaural synthesis [1], where the listener’s look direction is tracked by sensors and the BRIR processing is continuously adjusted so that the perceived virtual loudspeaker directions remain the same even as the listener “looks around” in the virtual acoustic space.
  • BRIRs are usually measured using an artificial head or on a real person with microphones at or in the ears.
  • one drawback is that BRIRs are personal so that a set of BRIRs measured on one person or artificial head may not work well for another listener. Only a small number of people can reasonably have their BRIRs measured.
  • Another drawback is that it is difficult and time consuming to measure BRIRs precisely for many head look directions.
  • It is a general object to provide improved modeling of Binaural Room Impulse Responses and/or an improved audio processing system for filtering audio signals.
  • It is a specific object to provide a method for determining an audio filter.
  • Another object is to provide a filter determination system configured to determine an audio filter.
  • Yet another object is to provide an audio processing method.
  • Still another object is to provide a method for tuning an audio system.
  • a method for determining an audio filter comprises:
  • providing at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment;
  • for each RIR time segment: providing a corresponding filter for binaural signal estimation; and combining the RIR time segment with the corresponding filter for binaural signal estimation to obtain a Binaural Room Impulse Response, BRIR, time segment; and
  • combining the BRIR time segments to obtain a resulting BRIR for said audio filter.
  • This improved filter design scheme enables higher degrees of freedom when it comes to filter adaptation, optimization and/or customization, and thereby allows for improved audio filters and, ultimately, enhanced audio listening experiences.
  • a filter determination system configured to determine an audio filter.
  • the filter determination system is configured to provide at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment.
  • the filter determination system is also configured to, for each RIR time segment: provide or implement a corresponding filter for binaural signal estimation;
  • and combine the RIR time segment with the corresponding filter for binaural signal estimation to obtain a Binaural Room Impulse Response, BRIR, time segment.
  • the filter determination system is configured to combine the BRIR time segments to obtain a resulting BRIR for said audio filter.
  • an audio processing method comprising a method for determining an audio filter according to the first aspect, and performing filtering of an audio signal based on the determined audio filter implementing the resulting BRIR.
  • the audio processing method may be performed for providing enhanced music or movie listening experience in headphones or for generating audio for augmented or virtual reality experiences.
  • a method for tuning a sound system comprising a method for determining an audio filter according to the first aspect.
  • the method for tuning a sound system may be performed for auralization of a computer model of a sound system in a virtual product development setting or for remote tuning of a sound system.
  • an audio filter determined by the method according to the first aspect.
  • an audio processing system comprising an audio filter of the fifth aspect.
  • an audio system comprising a sound generating system and an audio processing system according to the sixth aspect in the input signal path of the sound generating system.
  • the sound generating system may include headphones, also referred to as earphones.
  • a computer program comprising instructions, which when executed by a computer, cause the computer to perform the method of the first aspect.
  • a computer-program product comprising a computer-readable medium having stored thereon a computer program of the eighth aspect.
  • an audio processing system for determining a Binaural Room Impulse Response, BRIR.
  • the audio processing system comprises:
  • a first stage for implementing at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment;
  • a second stage comprising, for each RIR time segment, a corresponding filter for binaural signal estimation, wherein said first stage and said second stage collectively provide Binaural Room Impulse Response, BRIR, time segments; and
  • a third stage configured to combine the BRIR time segments to obtain a resulting BRIR.
  • the proposed technology concerns the modeling of BRIRs from microphone array measurements of room impulse responses.
  • the corresponding filter for binaural signal estimation may be determined and/or adapted based on Direction-of-Arrival (DoA) information, also referred to as direction-of-sound information.
  • this allows for adaptation of the binaural signal estimation filter(s) based on information of spatial sound power distribution, e.g. enabling one or more directions of dominant sound power to be taken into consideration when adapting the filter(s).
  • FIG. 1 is a schematic diagram illustrating the idea and concept of sound system auralization.
  • FIG. 2 is a schematic block diagram illustrating a simplified example of an audio system and an associated filter determination system.
  • FIG. 3 is a schematic diagram illustrating an example of BRIR filtering of an audio signal using a BRIR audio filter pair for producing a binaural signal, also denoted a binaural signal pair, defined by a left ear signal and a right ear signal.
  • FIG. 4 is a schematic diagram illustrating an example of BRIR filtering using head-tracking sensor information to dynamically select active BRIR filter pair depending on the listener’s head look direction.
  • FIG. 5A is a schematic flow diagram illustrating an example of a method for determining an audio filter.
  • FIG. 5B is a schematic flow diagram illustrating another example of a method for determining an audio filter.
  • FIG. 6 shows schematic diagrams illustrating an example of identification of direct and reflected sound components of a BRIR.
  • FIG. 7 is a schematic diagram illustrating an example of using a (multi-directional) microphone array for performing room impulse response measurements related to one or more loudspeakers in a listening environment.
  • FIG. 8 is a schematic diagram illustrating an example of a digital filter configured to produce a binaural output signal from M (microphone) input signals.
  • FIG. 9 is a schematic diagram illustrating an example of BRIR modeling based on different time segments, here exemplified by one time segment for direct sound and one for reflected sound, in combination with a Virtual Head filter for each time segment.
  • FIG. 10 is a schematic diagram illustrating a more detailed example of BRIR modeling based on different time segments.
  • FIG. 11 is a schematic diagram illustrating an example of a filter design problem formulation for designing a Virtual Head filter, or more generally a filter for binaural signal estimation based on one or more microphone signals.
  • FIG. 12 is a schematic diagram illustrating an example of an Ambisonics-based Virtual Head filter.
  • FIG. 13 is a schematic block diagram illustrating an example of a computer-implementation according to an embodiment.
  • FIG. 2 illustrates a simplified audio system 100 as well as an associated filter determination system 50.
  • the audio system 100 basically comprises an audio processing system 200 and a sound generating system 300.
  • the audio processing system 200 is configured to process one or more audio input signals which may relate to one or more audio channels.
  • the filtered audio signals are forwarded to the sound generating system 300 for producing sound.
  • the filter determination system 50 is configured to determine an audio filter based on sound measurements and/or corresponding impulse response measurements.
  • the determined audio filter with its filter coefficients and/or parameters may be stored in a filter database 75 or transferred for implementation into the audio processing system 200 to effectuate suitable audio filtering and/or processing for the sound generating system 300.
  • the measurements are typically not measurements based on the sound generated by the sound generating system 300, which may often be headphones (for which sound system auralization should be performed) but rather separate measurements based on the sound of a high-end loudspeaker sound system or other separate sound system to be emulated or auralized.
  • impulse response measurements may be used, not only for headphone applications, but also for reproducing binaural sound via ordinary loudspeakers, e.g. with the support of cross-talk cancellation. Other applications also exist, as will be described later on.
  • FIG. 3 is a schematic diagram illustrating an example of BRIR filtering of an audio signal using a BRIR audio filter pair for producing a binaural signal, also denoted a binaural signal pair, defined by a left ear signal and a right ear signal.
  • a BRIR measured for a sound source includes two impulse responses, one for the left ear and one for the right ear, which can be denoted BRIR_L_ear and BRIR_R_ear.
  • FIG. 3 illustrates the filtering of an audio signal, here involving a single audio channel, with a BRIR filter pair to produce a binaural signal consisting of a left ear signal and a right ear signal.
  • FIG. 4 is a schematic diagram illustrating an example of BRIR filtering using head-tracking sensor information to dynamically select active BRIR filter pair depending on the listener’s head look direction.
  • a BRIR has a head look direction associated with it which is the direction the head was facing during its measurement.
  • BRIRs may be stored in a database with BRIRs for many head look directions, enabling the implementation of dynamic binaural synthesis where the active BRIR filter is switched according to head-tracking sensor data representing the listener’s head look direction, as illustrated in FIG. 4.
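As an illustration of this kind of BRIR filtering and head-tracked filter selection, the following is a minimal Python sketch (not taken from the document); brir_db, render_binaural and select_brir are hypothetical names, the database here is filled with placeholder noise purely so the example runs, and a real system would interpolate between look directions and cover pitch and roll as well.

```python
# Minimal sketch (not from the document): applying a BRIR pair to a mono audio
# channel and switching the active pair based on head-tracker yaw.
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(audio, brir_pair):
    """Convolve a mono channel with a (left, right) BRIR pair."""
    brir_left, brir_right = brir_pair
    return np.stack([fftconvolve(audio, brir_left),
                     fftconvolve(audio, brir_right)])

def select_brir(brir_db, yaw_deg):
    """Pick the stored BRIR pair whose head look direction is closest to the
    tracked yaw angle (a real system would interpolate and track pitch/roll too)."""
    nearest = min(brir_db, key=lambda d: abs(((d - yaw_deg + 180) % 360) - 180))
    return brir_db[nearest]

fs = 48000
# brir_db maps look-direction yaw in degrees to a (left, right) impulse response pair.
brir_db = {d: (0.01 * np.random.randn(fs // 2), 0.01 * np.random.randn(fs // 2))
           for d in range(0, 360, 15)}
audio = np.random.randn(fs)      # one second of a mono source signal
yaw = 32.0                       # from a head-tracking sensor
binaural = render_binaural(audio, select_brir(brir_db, yaw))   # shape [2, N]
```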
  • a BRIR modeling method is described in reference [9], which uses BRIR measurements on real persons. As noted above, a set of BRIRs measured on one person or artificial head may not work well for another listener.
  • FIG. 5A is a schematic flow diagram illustrating an example of a method for determining an audio filter.
  • a method for determining an audio filter comprises:
  • providing, in step S1, at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment;
  • for each RIR time segment: providing, in step S2, a corresponding filter for binaural signal estimation; and combining, in step S3, the RIR time segment with the corresponding filter for binaural signal estimation to obtain a Binaural Room Impulse Response, BRIR, time segment; and
  • combining, in step S4, the BRIR time segments to obtain a resulting BRIR for said audio filter.
  • the step of providing at least two RIR time segments comprises dividing a measured or simulated RIR into said at least two RIR time segments, or individually measuring or simulating said at least two RIR time segments.
  • time segments may be (partially) overlapping or non-overlapping, depending on implementation and other circumstances.
  • the step of providing at least two RIR time segments comprises providing a RIR time segment related to direct sound of said at least one sound source in said listening environment and a RIR time segment related to sound reflections and/or reverberations.
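A minimal sketch of one way such a time segmentation could be done (an illustrative assumption, not the document's prescribed procedure): window a short interval around the first arrival as the direct-sound segment and keep the remainder as the reflected/reverberant segment; the 2.5 ms window length is an arbitrary choice, and a real implementation might crossfade overlapping segments instead.

```python
import numpy as np

def split_rir(rir, fs, direct_ms=2.5):
    onset = int(np.argmax(np.abs(rir)))                    # strongest (first) peak
    end = min(len(rir), onset + int(fs * direct_ms / 1000))
    rir_direct = np.zeros_like(rir)
    rir_direct[:end] = rir[:end]                           # direct sound up to shortly after the peak
    rir_reflected = rir.copy()
    rir_reflected[:end] = 0.0                              # reflections and reverberation
    return rir_direct, rir_reflected                       # note: rir_direct + rir_reflected == rir
```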
  • a filter for binaural signal estimation related to direct sound and a filter for binaural signal estimation related to sound reflections and/or reverberations may be established.
  • the microphone measurements may be obtained from or by one or more microphone systems, at least one of which comprises a collective microphone array having at least two microphones.
  • the filter for binaural signal estimation is provided as a Virtual Head filter.
  • the filter for binaural signal estimation is provided as an Ambisonics-based binaural decoding filter.
  • the filter for binaural signal estimation may be provided as a Virtual Head filter and/or an Ambisonics-based binaural decoding filter.
  • the filter for binaural signal estimation may be provided as a Virtual Head filter and/or an Ambisonics-based binaural decoding filter and, for another specific RIR time segment, the filter for binaural signal estimation may be provided as a Head-Related Transfer Function, HRTF, filter for object-based binaural signal reproduction.
  • a suitable filter for binaural signal estimation may be based on a HRTF response.
  • filter for binaural signal estimation is a generic expression that encompasses a Virtual Head filter and/or an Ambisonics-based binaural decoding filter as well as other types of binaural signal estimation filters, including HRTF-based filters for object-based binaural signal reproduction.
  • the filter(s) for binaural signal estimation may be selected from the following:
  • a Virtual Head filter; an Ambisonics-based binaural decoding filter, e.g. using microphone signals in Ambisonics format; and/or an HRTF-based filter for object-based binaural signal reproduction.
  • the corresponding filter for binaural signal estimation may be determined and/or adapted based on Direction-of-Arrival (DoA) information, also referred to as direction-of-sound information.
  • a Direction-of-Arrival (DoA) analysis may be performed to provide said DoA information, or said DoA information may be accessed as part of system information.
  • the DoA information may include information of a number of directions of sound incidence relative to a microphone array to provide information of spatial sound power distribution, and the corresponding filter for binaural signal estimation may be determined and/or adapted based on the information of spatial sound power distribution.
  • the corresponding filter for binaural signal estimation may be determined and/or adapted based on information of one or more individualized Head-Related Transfer Functions, HRTFs, thereby enabling determination of an individualized resulting BRIR.
  • the step of providing (S2) a corresponding filter for binaural signal estimation may involve determining the filter for binaural signal estimation based on a separate set of sound measurements, or accessing a predetermined filter for binaural signal estimation and adapting the predetermined filter into the filter for binaural signal estimation.
  • each RIR time segment may simulate microphone signals, for a respective time segment, that would occur when playing an audio signal using said at least one sound source, and each filter for binaural signal estimation may generate the binaural ear signals from the simulated microphone signals.
  • At least one RIR time segment may be adapted for simulating microphone signals in Ambisonics format and the corresponding filter for binaural signal estimation may be designed as an Ambisonics-based binaural decoding filter.
  • the resulting BRIR may thus correspond to or be used to produce binaural ear signals which, when listened to using headphones, give a listening experience that simulates listening to the actual sound source(s) in the listening environment.
  • the method may optionally include a pre-step, denoted S0, of performing the microphone measurements or simply accessing the measurement data corresponding to the microphone measurements, as schematically illustrated in FIG. 5B.
  • the method further comprises the step, denoted S5, of storing said resulting BRIR in a BRIR database, as schematically illustrated in FIG. 5B.
  • it is possible for the method to be performed for each of a number of different head look directions to produce a corresponding number of head-look-direction-specific BRIRs for storage in the BRIR database.
  • the proposed technology also relates to a corresponding filter determination system configured to determine an audio filter.
  • the filter determination system is configured to provide at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment.
  • the filter determination system is further configured to, for each RIR time segment: provide or implement a corresponding filter for binaural signal estimation; and combine the RIR time segment with the corresponding filter for binaural signal estimation to obtain a Binaural Room Impulse Response, BRIR, time segment.
  • the filter determination system is configured to combine the BRIR time segments to obtain a resulting BRIR for said audio filter.
  • the filter determination system may be configured to provide said at least two RIR time segments by dividing a measured or simulated RIR into said at least two RIR time segments, or by individually measuring or simulating said at least two RIR time segments.
  • the filter determination system may be configured to provide a RIR time segment related to direct sound of said at least one sound source in said listening environment and a RIR time segment related to sound reflections and/or reverberations.
  • the filter determination system is configured to establish a filter for binaural signal estimation related to direct sound and a filter for binaural signal estimation related to sound reflections and/or reverberations. It should be understood that the filter determination system may be configured to obtain the microphone measurements from or by one or more microphone systems, at least one of which comprises a collective microphone array having at least two microphones.
  • each filter for binaural signal estimation may be a Virtual Head filter.
  • the filter determination system may be configured to determine and/or adapt, for at least one RIR time segment, the corresponding filter for binaural signal estimation based on Direction-of-Arrival (DoA) information, also referred to as direction-of-sound information.
  • this allows for adaptation of the binaural signal estimation filter(s) based on information of spatial sound power distribution, e.g. enabling one or more directions of dominant sound power to be taken into consideration when adapting the filter(s).
  • an audio processing method comprising a method for determining an audio filter according to the first aspect, and performing filtering of an audio signal based on the determined audio filter implementing the resulting BRIR.
  • the audio processing method may be performed for providing enhanced music or movie listening experience in headphones or for generating audio for augmented or virtual reality experiences.
  • a method for tuning a sound system comprising a method for determining an audio filter according to the first aspect.
  • the method for tuning a sound system may be performed for auralization of a computer model of a sound system in a virtual product development setting or for remote tuning of a sound system.
  • an audio filter determined by the method according to the first aspect.
  • an audio processing system comprising an audio filter of the fifth aspect.
  • an audio system comprising a sound generating system and an audio processing system according to the sixth aspect in the input signal path of the sound generating system.
  • the sound generating system may include headphones, also referred to as earphones.
  • a computer program comprising instructions, which when executed by a computer, cause the computer to perform the method of the first aspect.
  • a computer-program product comprising a computer-readable medium having stored thereon a computer program of the eighth aspect.
  • an audio processing system for determining a Binaural Room Impulse Response, BRIR.
  • the audio processing system comprises:
  • a first stage for implementing at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment;
  • a second stage comprising, for each RIR time segment, a corresponding filter for binaural signal estimation, wherein said first stage and said second stage collectively provide Binaural Room Impulse Response, BRIR, time segments; and
  • a third stage configured to combine the BRIR time segments to obtain a resulting BRIR.
  • a microphone array may be used to measure a collection of Room Impulse Responses (RIR) from one or more loudspeakers at a desired listening position in a room.
  • the RIR measurements can be used to simulate the microphone signals that would occur when playing a sound signal using one of the loudspeakers.
  • a second filtering stage, here called a Virtual Head filter, would then estimate binaural ear signals from the simulated microphone signals.
  • the Virtual Head filter may be designed with a corresponding Virtual Head look direction, so that a Virtual Head filter can be obtained with low effort for any desired head look direction.
  • The design of a Virtual Artificial Head (VAH) filter according to reference [2] is known to involve compromises.
  • the quality of the obtained binaural signal depends, among other things, on the number of microphones in the microphone array. It is usual that a VAH filter is designed with the assumption that equal sound power is coming from all directions.
  • the inventor has realized that if it is known that sound energy is arriving at the microphone array only from a discrete number of known directions, then a substantially increased performance can be attained by using this information when designing the VAH filter (this also holds for a general Virtual Head filter) and optimizing performance for those directions.
  • it is possible to split the RIR measurements, by way of example, into two parts: one part containing the direct sound of the loudspeakers in the room, and another part containing remaining room reflections and reverberation.
  • the simulated microphone signals can then be split into a direct sound signal part and a reflected sound signal part.
  • the direct sound signal then only contains sound coming from a discrete number of directions corresponding to the loudspeaker locations. By way of example, these directions can be calculated from a Direction-Of-Arrival (DoA) analysis of the direct sound RIR part.
  • Separate Virtual Head filters may then be designed to be combined with the direct and reflected RIR measurement parts, respectively. The performance of the direct sound Virtual Head filter in particular can be substantially increased using the DoA information.
  • BRIRs may be modeled from a microphone array recording with a quality that approaches the quality of BRIRs obtained by direct measurement with an artificial head or on a real listener with microphones in the ears.
  • individualized BRIRs can also be obtained if individual HRTF data is used when designing the Virtual Head filter.
  • BRIRs for many head look directions can be computed with low effort.
  • a BRIR can typically be considered to include a direct sound component and a reflected sound component.
  • the direct sound component corresponds to an early time segment of the BRIR containing the first wave front arriving from the sound source
  • the reflected sound component corresponds to a late time segment of the BRIR containing room reflections.
  • the early time segment is often dominated by a single wave front if the BRIR is measured for a sound source that approximately behaves as a point source.
  • the direct sound component of the BRIR could contain a certain number of wavefronts.
  • the reflected sound component often corresponds to an approximately diffuse sound field, that is, in relation to the direct sound component it contains a much larger number of reflected wavefronts incident from many more directions.
  • FIG. 6 shows schematic diagrams illustrating an example of identification of direct and reflected sound components of a BRIR.
  • the inventor has conceptually understood that to model a BRIR, it may be desirable to have two pieces of information: firstly, the reflective properties of the sound field at the head position, which can be given by RIR measurements without a head present; and secondly, how individual sound waves add up at the ears of the listener, which can be given by a set of HRIRs, where HRIR stands for Head-Related Impulse Response, the time-domain counterpart of the HRTF.
  • a Room Impulse Response is the impulse response measured from a sound source situated in a certain acoustic environment to a microphone in the acoustic environment.
  • a RIR typically includes a direct sound component, and a reflected sound component.
  • a corresponding collection of RIRs may thus be denoted as a Multiple-Input Multiple-Output (MIMO) RIR.
  • a MIMO RIR could be measured in a real room, but it could also be obtained by some other method, for example it could be calculated for a virtual room and virtual microphone array using numerical acoustical methods.
  • a MIMO RIR system description corresponding to a desired listening position may be measured using a microphone array providing M ≥ 1 recording channels, the microphone array being placed at a desired listening position in a room or other acoustic environment containing L ≥ 1 sound sources.
  • An example microphone array may include several microphone capsules evenly distributed over a spherical surface, each capsule providing a discrete recording channel and a polar uptake pattern which differs from that of the other capsules.
  • the MIMO RIR system description can be used to simulate microphone signals that would occur when driving one of the L sound sources with an input signal.
  • a MIMO RIR system description may be divided into time segments. By way of example, it may be divided into a direct sound part denoted RIR_d and a reflected sound part denoted RIR_r.
  • the direct sound part, RIR_d then contains the initial wave fronts incident at the microphone array from the L sound sources.
  • a Direction of Arrival (DoA) to the microphone array can be defined for each of the L initial wave fronts, identifying the location of the L sound sources.
  • DoA-analysis can be performed on a MIMO RIR time segment, e.g. on RIR_d, to estimate the dominant DoAs in the time segment.
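One possible realization of such a DoA analysis is a steered-response-power search over a grid of candidate directions; the sketch below assumes a known steering-vector array B for the microphone array and is only illustrative, not the document's prescribed method.

```python
import numpy as np

def dominant_doas(seg_fft, B, num_sources=1):
    """seg_fft: [K x M] complex spectra of a RIR time segment for one source
    (K frequency bins, M microphones); B: [K x M x N] steering vectors for N
    candidate directions. Returns the indices of the highest-power directions."""
    K, M, N = B.shape
    power = np.zeros(N)
    for k in range(K):
        power += np.abs(B[k].conj().T @ seg_fft[k]) ** 2   # steered power per direction
    return np.argsort(power)[::-1][:num_sources]
```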
  • DoA information could also come from any other source; for example, if the MIMO RIR is based on a computational numerical acoustical model, DoA information may be explicitly available.
  • a schematic example of RIR measurement and DoA identification is illustrated in FIG. 7.
  • a Virtual Head filter, also generally referred to as a filter for binaural signal estimation (based on microphone signals), is here understood to be a filter having M input signals corresponding to a microphone array signal representation and two output signals. Such a filter is by its design configured to estimate the binaural ear signals that would be present at the ears of a listener if the listener's head had been at the position of the microphone array when the M-signal recording was taken.
  • FIG. 8 is a schematic diagram illustrating an example of such a digital filter configured to produce a binaural output signal from M (microphone) input signals.
  • a binaural signal is understood to be defined by two signals, one for the left and right ears respectively.
  • the filter thus has M ≥ 2 inputs and 2 outputs and can be classified as a multiple-input multiple-output (MIMO) filter.
  • the M input signals may come directly from a microphone array.
  • if the input signals are instead provided in Ambisonics format, the digital filter in FIG. 8 can be said to implement an Ambisonics binaural decoder.
  • the Ambisonics signal could be derived from a microphone array recording or any other possible source.
  • a DoA-analysis is performed on the MIMO RIR segments.
  • the DoA-analysis on RIR_d reveals the directions of dominant sound power incidence corresponding to the directions of the sound sources to be auralized, while a DoA-analysis of RIR_r may typically reveal that approximately equal sound energy is coming from all directions in the reflected sound field.
  • the MIMO RIR segments can equivalently be described in the frequency domain by complex frequency response matrices RIR_d(ω) and RIR_r(ω) of dimensions [M x L], where ω is a frequency variable.
  • F_d(ω) and F_r(ω) are Virtual Head filters designed for direct and reflected binaural sound modeling, respectively, and have dimensions [2 x M].
  • Combining F_d(ω) and RIR_d(ω) results in a direct sound BRIR time segment BRIR_d(ω) of dimensions [2 x L], according to equation 1: BRIR_d(ω) = F_d(ω) RIR_d(ω) (1)
  • combining F_r(ω) and RIR_r(ω) results in a reflected sound BRIR time segment BRIR_r(ω) of dimensions [2 x L], according to equation 2: BRIR_r(ω) = F_r(ω) RIR_r(ω) (2)
  • summing BRIR_r(ω) and BRIR_d(ω) results in the complete BRIR model BRIR(ω), which has dimensions [2 x L] and thus contains the desired BRIR responses from each sound source to each ear:
  • BRIR(ω) = BRIR_d(ω) + BRIR_r(ω) (3)
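A small numerical illustration of equations (1)-(3), under the assumption that the frequency-domain arrays are already available with the dimensions stated above (random data is used here purely as a stand-in; K is the number of frequency bins):

```python
import numpy as np

K, M, L = 512, 16, 2
rng = np.random.default_rng(0)
RIR_d = rng.standard_normal((K, M, L)) + 1j * rng.standard_normal((K, M, L))
RIR_r = rng.standard_normal((K, M, L)) + 1j * rng.standard_normal((K, M, L))
F_d = rng.standard_normal((K, 2, M)) + 1j * rng.standard_normal((K, 2, M))
F_r = rng.standard_normal((K, 2, M)) + 1j * rng.standard_normal((K, 2, M))

BRIR_d = F_d @ RIR_d          # eq. (1): [K, 2, L] direct-sound BRIR segment
BRIR_r = F_r @ RIR_r          # eq. (2): [K, 2, L] reflected-sound BRIR segment
BRIR = BRIR_d + BRIR_r        # eq. (3): complete BRIR model, ear x source per bin
```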
  • a major technical contribution is the separate modeling of different time segments of a BRIR based on different time segments of a MIMO RIR, designing a unique Virtual Head filter for each MIMO RIR time segment.
  • the inventor has realized that this enables more accurate BRIR modeling than what can be achieved when designing a single Virtual Head filter for the full time response of a MIMO RIR.
  • FIG. 9 is a schematic diagram illustrating an example of BRIR modeling based on different time segments, here exemplified by one time segment for direct sound and one for reflected sound, in combination with a Virtual Head filter for each time segment.
  • FIG. 9 shows an example representation of an operating scenario, showing L input signals, representing input signals to the sound sources that shall be auralized, being split up into a direct sound signal path and a reflected sound signal path.
  • the input signals are fed into Blocks 1 and 3, which simulate the recorded microphone signals corresponding to the direct and room sound components, respectively, reaching the microphone array from the L sound sources in the room to be auralized.
  • Blocks 2 and 4 are Virtual Head filters which estimate binaural signals from the microphone signals.
  • the dashed lines from blocks 1 and 3 to blocks 2 and 4 indicate that DoA-information may be extracted from the microphone array signal models in blocks 1 and 3 and used in the filter design for blocks 2 and 4.
  • while FIG. 9 exemplifies the modeling of a BRIR as a sum of two separate time segments, it should be understood that a BRIR can be modeled as a sum of any number of separate time segments greater than or equal to two. It is also possible to use the invention to model parts of a BRIR instead of the full time response of a BRIR.
  • FIG. 9 not only illustrates the suggested modeling procedure for BRIRs; it also describes a sound processing system where L input signals corresponding to sound source signals produce a binaural output signal defined by a left and right ear signal. Consequently, if an impulse is presented as input to one of the L input signals, the resulting output of the system is a BRIR which may for example be stored in a database. It should be understood, however, that explicitly calculating a BRIR is not necessary to realize the invention.
  • the blocks in FIG. 9 can be individually realized in different hardware or computational environments. Intermediary results may be stored for processing at different points in time. For example, the simulated microphone array signals may be stored for later processing by Virtual Head filters.
  • FIG. 10 is a schematic diagram illustrating a more detailed example of BRIR modeling based on different time segments.
  • MIMO RIRs are used to simulate microphone array signals and the Virtual Head filters are realized using MIMO filters, e.g. as will be described later on. It is also illustrated how a bank of Virtual Head filters can be used to implement head-tracking, if desired.
  • a generalization is also illustrated in FIG. 10, where different microphone arrays could be used for the different time segments, so that the direct signal path uses a microphone array that outputs M1 signals and the reflected signal path uses a microphone array that outputs M2 signals, where M1 and M2 may be different. It could, for example, be the case that a single-channel microphone is used for the direct sound signal path, in which case a suitable binaural signal estimation filter would simply involve an HRTF response; in this scenario, the direct sound signal path could be said to implement object-based binaural sound reproduction, as sketched below.
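A short sketch of this object-based alternative for the direct sound path: with a single-channel (M1 = 1) direct-sound representation, binaural signal estimation reduces to filtering with an HRIR pair chosen for the source DoA. Here hrir_db is a hypothetical lookup of (left, right) HRIRs per source azimuth, and the function name is illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_direct_path(direct_signal, source_azimuth_deg, hrir_db):
    hrir_l, hrir_r = hrir_db[source_azimuth_deg]       # HRIR pair for the source direction
    return np.stack([fftconvolve(direct_signal, hrir_l),
                     fftconvolve(direct_signal, hrir_r)])
```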
  • FIG. 11 is a schematic diagram illustrating an example of a filter design problem formulation for designing a Virtual Head filter, or more generally a filter for binaural signal estimation based on one or more microphone signals.
  • a generalized example filter design procedure for a Virtual Head filter configured to produce a binaural signal from M microphone signals is reviewed in the following to illustrate the functionality of a Virtual Head filter and motivate how, in a particular example, DoA information can improve the filter design.
  • a Virtual Head filter design problem formulation can be exemplified by assuming the existence of a large number N of sound sources, which are evenly distributed on a spherical surface around a microphone array with M output signals, with N >> M.
  • the response of each microphone channel to each of the N sound sources can be modelled by a complex frequency response matrix (also called a matrix of steering vectors) B(ω) of dimensions [M x N], where ω denotes frequency.
  • the effect of the Virtual Head filter can similarly be described by a complex frequency response matrix F(ω) of dimensions [2 x M].
  • the effect of each of the N sound sources on the binaural output signal produced by the filter can thus be modelled by the product F(ω)B(ω), which has dimensions [2 x N].
  • a goal in the filter design problem formulation may be to design a filter where the effect of each of the N sound sources on the binaural output signal is equal to a given Head-Related Transfer Function (HRTF) for that direction.
  • a HRTF for a specific direction can be defined as the two transfer functions from an ideal sound source in an anechoic environment to the left and right ears respectively, measured on a person or artificial head model [4]. HRTF measurements can be found, for example, in publicly available databases typically containing HRTFs for many directions [5]. In the time domain, a HRTF is referred to as a Head-Related Impulse Response (HRIR).
  • the virtual head look direction can be controlled by adjusting the coefficients in HRTF(ω).
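The optimization problem formulation itself is not reproduced above; a least-squares criterion consistent with the surrounding definitions (a reconstruction, and the exact norm and constraints of the document may differ) would be e(ω) = F(ω)B(ω) − HRTF(ω), with F(ω) chosen to minimize ||e(ω)||² over all N directions, where HRTF(ω) is the [2 x N] matrix of target ear responses for the N directions.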
  • This optimization problem formulation is usually extended with different constraints on the filter F(ω) and is simplified here for illustrative purposes.
  • an optimal filter F(ω) minimizes the contribution to the error e from all the N source directions. It can be understood by elementary linear algebra that the filter has in principle M degrees of freedom to minimize the error in N points, and since it is assumed that N >> M, there can potentially be a large residual error. Therefore, if it is known that the sound field only has significant power for a few of the N source directions, the error norm function can be modified to weight these directions higher and achieve a significant residual error reduction for these directions. This serves to illustrate how DoA information may be used in the Virtual Head filter design when designing Virtual Head filters for BRIR modeling. It also illustrates that, optionally, individualized BRIRs can be modeled if individualized HRTFs are used in the Virtual Head filter design.
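The following is a sketch of one possible per-frequency filter design that uses DoA information in the way described above, via weighted, regularized least squares; the document does not fix a particular solver, and doa_weight and the Tikhonov term lam are illustrative assumptions.

```python
import numpy as np

def design_virtual_head_filter(B, HRTF, doa_idx=None, doa_weight=10.0, lam=1e-3):
    """B: [M x N] steering matrix at one frequency; HRTF: [2 x N] target ear
    responses for the N directions; doa_idx: indices of dominant directions.
    Returns F: [2 x M] such that F @ B approximates HRTF, with the dominant
    directions weighted more heavily."""
    num_mics, N = B.shape
    w = np.ones(N)
    if doa_idx is not None:
        w[np.asarray(doa_idx)] = doa_weight           # emphasize dominant-power directions
    Bw = B * w                                        # weight the columns (directions)
    Hw = HRTF * w
    # Weighted, regularized least squares: F = Hw Bw^H (Bw Bw^H + lam I)^(-1)
    return Hw @ Bw.conj().T @ np.linalg.inv(Bw @ Bw.conj().T + lam * np.eye(num_mics))
```

In this formulation, increasing doa_weight concentrates the available M degrees of freedom on the directions where sound power is known to arrive, mirroring the reasoning above.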
  • FIG. 12 is a schematic diagram illustrating an example of an Ambisonics-based Virtual Head filter. For more information on filter methods within the field of Ambisonics theory, reference can be made to references [6,7].
  • FIG. 12 shows a first filter block which transforms microphone array signals to an Ambisonic signal (for example Ambisonics B-format). Knowledge of DoA information may optionally help in designing the filter.
  • an Ambisonic signal may be upmixed to a higher Ambisonics order using DoA information.
  • a third optional filter block applies sound field rotation to implement head-tracking using sensor data describing a listener head look direction.
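A small sketch of such a sound-field rotation for head tracking, using first-order Ambisonics B-format (W, X, Y, Z) and a yaw-only rotation; it assumes the common convention with X pointing forward and Y to the left, and real implementations also rotate higher orders and handle pitch and roll.

```python
import numpy as np

def rotate_bformat_yaw(W, X, Y, Z, head_yaw_rad):
    """Counter-rotate the sound field by the listener's head yaw so that virtual
    sources stay fixed in the room as the head turns."""
    theta = -head_yaw_rad                       # field rotates opposite to the head
    Xr = np.cos(theta) * X - np.sin(theta) * Y
    Yr = np.sin(theta) * X + np.cos(theta) * Y
    return W, Xr, Yr, Z                         # W (omni) and Z (height) are unchanged by yaw
```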
  • a fourth filter block implements an Ambisonics binaural decoder which outputs a binaural signal.
  • the microphone signals described in FIG. 10 and FIG. 11 could also be Ambisonic format microphone signals, or could be in any format which specifies the microphone array polar patterns in some predefined form.
  • For each RIR time segment:
    a) Perform a Direction-of-Arrival (DoA) analysis to find the dominant directions of sound incidence to the microphone array.
    b) Design or determine a Virtual Head filter which transforms microphone array signals into a binaural signal, taking into account the DoA information from step a) in the filter design to increase the accuracy of the binaural signal.
    c) Repeat step b) to determine Virtual Head filters for one or more Virtual Head look directions (optional).
    d) Combine the RIR time segment with the Virtual Head filter from step b) to obtain a BRIR time segment (or with the Virtual Head filters from step c) to obtain a BRIR time segment for multiple head look directions).
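Tying the steps together, the following skeleton sketches the per-segment procedure a), b) and d) for a single sound source, reusing the hypothetical helpers sketched earlier (split_rir, dominant_doas, design_virtual_head_filter); all shapes, names and the rfft-based frequency representation are illustrative assumptions rather than a reference implementation, and step c) (multiple look directions) is omitted for brevity.

```python
import numpy as np

def model_brir_for_source(rirs, fs, B, HRTF, nfft=4096):
    """rirs: [M x T] measured RIRs from one source to M microphones.
    B: [K x M x N] steering vectors and HRTF: [K x 2 x N] per-bin targets,
    with K = nfft // 2 + 1. Returns BRIR: [K x 2] left/right ear response."""
    K = nfft // 2 + 1
    BRIR = np.zeros((K, 2), dtype=complex)
    per_mic = [split_rir(ch, fs) for ch in rirs]            # time segmentation per microphone
    for s in range(2):                                      # 0: direct, 1: reflected
        seg = np.stack([parts[s] for parts in per_mic])     # [M x T] segment signals
        seg_fft = np.fft.rfft(seg, n=nfft, axis=1).T        # [K x M]
        doa_idx = dominant_doas(seg_fft, B)                 # step a): DoA analysis
        for k in range(K):
            F = design_virtual_head_filter(B[k], HRTF[k], doa_idx)   # step b)
            BRIR[k] += F @ seg_fft[k]                                # step d), cf. eq. (1)-(3)
    return BRIR
```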
  • a system configured to perform the method as described herein.
  • embodiments may be implemented in hardware, or in software for execution by suitable processing circuitry, or a combination thereof.
  • the described method may be translated into a discrete-time implementation for digital signal processing.
  • At least some of the steps, functions, procedures, modules and/or blocks described herein may be implemented in software such as a computer program for execution by suitable processing circuitry such as one or more processors or processing units.
  • processing circuitry includes, but is not limited to, one or more microprocessors, one or more Digital Signal Processors (DSPs), one or more Central Processing Units (CPUs), video acceleration hardware, and/or any suitable programmable logic circuitry such as one or more Field Programmable Gate Arrays (FPGAs), or one or more Programmable Logic Controllers (PLCs).
  • FIG. 13 is a schematic diagram illustrating an example of a computer-implementation according to an embodiment.
  • a computer program 425; 435 which is loaded into the memory 420 for execution by processing circuitry including one or more processors 410.
  • the processor(s) 410 and memory 420 are interconnected to each other to enable normal software execution.
  • An optional input/output device 440 may also be interconnected to the processor(s) 410 and/or the memory 420 to enable input and/or output of relevant data such as input parameter(s) and/or resulting output parameter(s).
  • processor should be interpreted in a general sense as any system or device capable of executing program code or computer program instructions to perform a particular processing, determining or computing task.
  • the processing circuitry including one or more processors 410 is thus configured to perform, when executing the computer program 425, well-defined processing tasks such as those described herein.
  • the processing circuitry does not have to be dedicated to only execute the above-described steps, functions, procedure and/or blocks, but may also execute other tasks.
  • the computer program 425; 435 comprises instructions, which when executed by the processor 410, cause the processor 410 or computer 400 to perform the tasks described herein.
  • the proposed technology also provides a carrier comprising the computer program, wherein the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.
  • the software or computer program 425; 435 may be realized as a computer program product, which is normally carried or stored on a non-transitory computer-readable medium 420; 430, in particular a non-volatile medium.
  • the computer-readable medium may include one or more removable or non-removable memory devices including, but not limited to a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc (CD), a Digital Versatile Disc (DVD), a Blu-ray disc, a Universal Serial Bus (USB) memory, a Hard Disk Drive (HDD) storage device, a flash memory, a magnetic tape, or any other conventional memory device.
  • the computer program may thus be loaded into the operating memory of a computer or equivalent processing device for execution by the processing circuitry thereof.
  • the procedural flows presented herein may be regarded as computer flows, when performed by one or more processors.
  • a corresponding apparatus may be defined as a group of function modules, where each step performed by the processor corresponds to a function module.
  • the function modules are implemented as a computer program running on the processor.
  • the computer program residing in memory may thus be organized as appropriate function modules configured to perform, when executed by the processor, at least part of the steps and/or tasks described herein.
  • it is also possible to realize the function modules predominantly by hardware modules, or alternatively by hardware, with suitable interconnections between relevant modules.
  • Particular examples include one or more suitably configured digital signal processors and other known electronic circuits, e.g. discrete logic gates interconnected to perform a specialized function, and/or Application Specific Integrated Circuits (ASICs) as previously mentioned.
  • Other examples of usable hardware include input/output (I/O) circuitry and/or circuitry for receiving and/or sending signals.

Abstract

There is provided a method and corresponding system for determining an audio filter. The method comprises providing (S1) at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment. The method also comprises, for each RIR time segment, providing (S2) a corresponding filter for binaural signal estimation, and combining (S3) the RIR time segment with the corresponding filter for binaural signal estimation to obtain a Binaural Room Impulse Response, BRIR, time segment. The method also comprises combining (S4) the BRIR time segments to obtain a resulting BRIR for said audio filter.

Description

IMPROVED MODELING AND/OR DETERMINATION OF BINAURAL ROOM IMPULSE RESPONSES FOR AUDIO APPLICATIONS
TECHNICAL FIELD
The proposed technology generally relates to sound reproduction and audio processing, and more particularly to a method and system for determining an audio filter, an audio processing method and a method for tuning an audio system, an audio filter, an audio processing system and an overall audio system, as well as a corresponding computer program and computer-program product.
BACKGROUND
When listening to music using headphones, the auditory experience is different compared to when listening with loudspeakers. In general, when using headphones the sound is perceived as originating from inside the head whereas when using loudspeakers the sound is perceived as externalized i.e. coming from outside the head and the subjective spatial impression of the sound field is determined to a large degree by the reverberant characteristics of the room in which the loudspeakers are placed. A common application of signal processing for headphones is consequently to simulate ear signals (also called a binaural signal or a binaural signal pair) that would occur for a listener when listening to loudspeakers in a room, as schematically illustrated in FIG. 1. This can also be referred to as auralization of a sound system.
A Binaural Room Impulse Response (BRIR) is the impulse response(s) of a sound source in a certain acoustic environment - for example, a loudspeaker for music reproduction placed in a listening room - to both of the ears of a listener. In other words, a BRIR may be seen as a collective pair of impulse responses, one for each ear. Using a BRIR, it is possible to simulate ear signals which, when listened to using headphones, give a listening experience similar to listening to the actual loudspeaker in the room [1].
A more natural listening experience can be obtained if BRIRs are available for multiple head look directions, where look direction here refers to the head pose of the listener which can be described by pitch, yaw, and roll angles of the head. This enables what is known in the literature as dynamic binaural synthesis [1], where the listener’s look direction is tracked by sensors and the BRIR processing is continuously adjusted so that the perceived virtual loudspeaker directions remain the same even as the listener “looks around” in the virtual acoustic space.
Although significant developments within the field of sound system auralization and BRIR processing and modeling have been made, there is still a demand for further improvements. Today, BRIRs are usually measured using an artificial head or on a real person with microphones at or in the ears. By way of example, one drawback is that BRIRs are personal so that a set of BRIRs measured on one person or artificial head may not work well for another listener. Only a small number of people can reasonably have their BRIRs measured. Another drawback is that it is difficult and time consuming to measure BRIRs precisely for many head look directions.
SUMMARY
It is a general object to provide improved modeling of Binaural Room Impulse Responses and/or an improved audio processing system for filtering audio signals.
It is a specific object to provide a method for determining an audio filter.
Another object is to provide a filter determination system configured to determine an audio filter.
Yet another object is to provide an audio processing method.
Still another object is to provide a method for tuning an audio system.
It is another object to provide an audio filter, an audio processing system as well as a corresponding overall audio system.
It is also an object to provide a computer program and computer-program product.
These and other objects are met by embodiments of the proposed technology as defined by the claims.
According to a first aspect, there is provided a method for determining an audio filter. Basically, the method comprises:
• providing at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment;
• for each RIR time segment: providing a corresponding filter for binaural signal estimation; and combining the RIR time segment with the corresponding filter for binaural signal estimation to obtain a Binaural Room Impulse Response, BRIR, time segment; and
• combining the BRIR time segments to obtain a resulting BRIR for said audio filter.
This improved filter design scheme enables higher degrees of freedom when it comes to filter adaptation, optimization and/or customization, and thereby allows for improved audio filters and, ultimately, enhanced audio listening experiences.
According to a second aspect, there is provided a filter determination system configured to determine an audio filter.
The filter determination system is configured to provide at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment.
The filter determination system is also configured to, for each RIR time segment: provide or implement a corresponding filter for binaural signal estimation; and
- combine the RIR time segment with the corresponding filter for binaural signal estimation to obtain a Binaural Room Impulse Response, BRIR, time segment.
Further, the filter determination system is configured to combine the BRIR time segments to obtain a resulting BRIR for said audio filter.
According to a third aspect, there is provided an audio processing method comprising a method for determining an audio filter according to the first aspect, and performing filtering of an audio signal based on the determined audio filter implementing the resulting BRIR. By way of example, the audio processing method may be performed for providing enhanced music or movie listening experience in headphones or for generating audio for augmented or virtual reality experiences.
According to a fourth aspect, there is provided a method for tuning a sound system comprising a method for determining an audio filter according to the first aspect.
For example, the method for tuning a sound system may be performed for auralization of a computer model of a sound system in a virtual product development setting or for remote tuning of a sound system.
According to a fifth aspect, there is provided an audio filter determined by the method according to the first aspect.
According to a sixth aspect, there is provided an audio processing system comprising an audio filter of the fifth aspect.
According to a seventh aspect, there is provided an audio system comprising a sound generating system and an audio processing system according to the sixth aspect in the input signal path of the sound generating system.
By way of example, the sound generating system may include headphones, also referred to as earphones.
According to an eighth aspect, there is provided a computer program comprising instructions, which when executed by a computer, cause the computer to perform the method of the first aspect.
According to a ninth aspect, there is provided a computer-program product comprising a computer-readable medium having stored thereon a computer program of the eighth aspect.
According to a tenth aspect, there is provided an audio processing system for determining a Binaural Room Impulse Response, BRIR. The audio processing system comprises:
- a first stage for implementing at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment;
- a second stage comprising, for each RIR time segment, a corresponding filter for binaural signal estimation, wherein said first stage and said second stage collectively provide Binaural Room Impulse Response, BRIR, time segments; and
- a third stage configured to combine the BRIR time segments to obtain a resulting BRIR.
In a particular example aspect, the proposed technology concerns the modeling of BRIRs from microphone array measurements of room impulse responses.
By way of example, it may be desirable to determine and/or adapt, for at least one RIR time segment, the corresponding filter for binaural signal estimation based on Direction-of-Arrival (DoA) information, also referred to as direction-of-sound information. For example, this allows for adaptation of the binaural signal estimation filter(s) based on information of spatial sound power distribution, e.g. enabling one or more directions of dominant sound power to be taken into consideration when adapting the filter(s). Other advantages will be appreciated when reading the following detailed description of non-limiting embodiments of the invention.
BRIEF DESCRIPTION OF DRAWINGS
The embodiments, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which:
FIG. 1 is a schematic diagram illustrating the idea and concept of sound system auralization.
FIG. 2 is a schematic block diagram illustrating a simplified example of an audio system and an associated filter determination system.
FIG. 3 is a schematic diagram illustrating an example of BRIR filtering of an audio signal using a BRIR audio filter pair for producing a binaural signal, also denoted a binaural signal pair, defined by a left ear signal and a right ear signal.
FIG. 4 is a schematic diagram illustrating an example of BRIR filtering using head-tracking sensor information to dynamically select active BRIR filter pair depending on the listener’s head look direction.
FIG. 5A is a schematic flow diagram illustrating an example of a method for determining an audio filter.
FIG. 5B is a schematic flow diagram illustrating another example of a method for determining an audio filter.
FIG. 6 shows schematic diagrams illustrating an example of identification of direct and reflected sound components of a BRIR.
FIG. 7 is a schematic diagram illustrating an example of using a (multi- directional) microphone array for performing room impulse response measurements related to one or more loudspeakers in a listening environment.
FIG. 8 is a schematic diagram illustrating an example of a digital filter configured to produce a binaural output signal from M (microphone) input signals.
FIG. 9 is a schematic diagram illustrating an example of BRIR modeling based on different time segments, here exemplified by one time segment for direct sound and one for reflected sound, in combination with a Virtual Head filter for each time segment.
FIG. 10 is a schematic diagram illustrating a more detailed example of BRIR modeling based on different time segments.
FIG. 11 is a schematic diagram illustrating an example of a filter design problem formulation for designing a Virtual Head filter, or more generally a filter for binaural signal estimation based on one or more microphone signals.
FIG. 12 is a schematic diagram illustrating an example of an Ambisonics- based Virtual Head filter.
FIG. 13 is a schematic block diagram illustrating an example of a computer-implementation according to an embodiment.
DETAILED DESCRIPTION
Throughout the drawings, the same reference designations are used for similar or corresponding elements.
It may be useful to start with an audio system overview with reference to FIG. 2, which illustrates a simplified audio system 100 as well as an associated filter determination system 50.
The audio system 100 basically comprises an audio processing system 200 and a sound generating system 300. In general, the audio processing system 200 is configured to process one or more audio input signals which may relate to one or more audio channels. The filtered audio signals are forwarded to the sound generating system 300 for producing sound.
Briefly, the filter determination system 50 is configured to determine an audio filter based on sound measurements and/or corresponding impulse response measurements. The determined audio filter with its filter coefficients and/or parameters may be stored in a filter database 75 or transferred for implementation into the audio processing system 200 to effectuate suitable audio filtering and/or processing for the sound generating system 300.
The measurements are typically not based on the sound generated by the sound generating system 300, which may often be headphones (for which sound system auralization should be performed), but rather are separate measurements based on the sound of a high-end loudspeaker sound system or other separate sound system to be emulated or auralized.
It should though be understood that impulse response measurements may be used, not only for headphone applications, but also for reproducing binaural sound via ordinary loudspeakers, e.g. with the support of cross-talk cancellation. Other applications also exist, as will be described later on.
FIG. 3 is a schematic diagram illustrating an example of BRIR filtering of an audio signal using a BRIR audio filter pair for producing a binaural signal, also denoted a binaural signal pair, defined by a left ear signal and a right ear signal.
A BRIR measured for a sound source includes two impulse responses, one for the left ear and one for the right ear, which can be denoted BRIR_L_ear and BRIR_R_ear. For example, one application for BRIRs is the generation of headphone signals for the production of virtual sound images. This is exemplified in FIG. 3, which illustrates the filtering of an audio signal, here involving a single audio channel, with a BRIR filter pair to produce a binaural signal consisting of a left ear signal and a right ear signal. When listening to the headphone signals, a virtual sound source is perceived.
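By way of illustration only, the BRIR filtering of FIG. 3 may be sketched as follows in Python. This is a minimal sketch under the assumption that the audio channel and the two BRIRs are available as one-dimensional arrays; the function name and array layout are chosen for illustration and are not part of the described technology.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(audio, brir_l_ear, brir_r_ear):
    """Filter a single audio channel with a BRIR pair (cf. FIG. 3).

    audio      : 1-D array, the input audio channel
    brir_l_ear : 1-D array, BRIR from the sound source to the left ear
    brir_r_ear : 1-D array, BRIR from the sound source to the right ear
    Returns a (2, n) array holding the left and right ear signals.
    """
    left = fftconvolve(audio, brir_l_ear)   # convolution with the left-ear BRIR
    right = fftconvolve(audio, brir_r_ear)  # convolution with the right-ear BRIR
    return np.stack([left, right])
```

When the two output signals are played back over headphones, a virtual sound source is perceived, as described above.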
FIG. 4 is a schematic diagram illustrating an example of BRIR filtering using head-tracking sensor information to dynamically select active BRIR filter pair depending on the listener’s head look direction.
Normally, a BRIR has a head look direction associated with it which is the direction the head was facing during its measurement. BRIRs may be stored in a database with BRIRs for many head look directions, enabling the implementation of dynamic binaural synthesis where the active BRIR filter is switched according to head-tracking sensor data representing the listener’s head look direction, as illustrated in FIG. 4.
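A minimal sketch of such dynamic BRIR selection is given below, assuming a BRIR database keyed by head yaw angle in degrees; the yaw-only database layout and the function name are illustrative assumptions, and a practical system would typically also crossfade between filter pairs when switching.

```python
import numpy as np

def select_brir_pair(brir_database, head_yaw_deg):
    """Pick the BRIR pair measured for the head look direction closest to
    the current head-tracking yaw angle (cf. FIG. 4).

    brir_database : dict mapping yaw angle in degrees -> (brir_l, brir_r)
    head_yaw_deg  : current listener head yaw from the head-tracking sensor
    """
    keys = sorted(brir_database.keys())
    angles = np.array(keys, dtype=float)
    # Wrap the angular difference to [-180, 180) before taking the minimum.
    diff = (angles - head_yaw_deg + 180.0) % 360.0 - 180.0
    return brir_database[keys[int(np.argmin(np.abs(diff)))]]
```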
A system for measuring BRIRs for many head look directions was described in reference [8]. The system uses an artificial head which is motorized to be able to effectively change the head pose in between measurements. A drawback with this arrangement is that it cannot produce individualized BRIRs and it uses bulky equipment.
A BRIR modeling method is described in reference [9], which uses BRIR measurements on real persons. However, a set of BRIRs measured on one person or artificial head may not work well for another listener.
FIG. 5A is a schematic flow diagram illustrating an example of a method for determining an audio filter.
According to a first aspect, there is provided a method for determining an audio filter. Basically, the method comprises:
• providing, in step S1, at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment;
• for each RIR time segment: providing, in step S2, a corresponding filter for binaural signal estimation; and combining, in step S3, the RIR time segment with the corresponding filter for binaural signal estimation to obtain a Binaural Room Impulse Response, BRIR, time segment; and
• combining, in step S4, the BRIR time segments to obtain a resulting BRIR for said audio filter.
As mentioned, this improved filter design scheme enables higher degrees of freedom when it comes to filter adaptation, optimization and/or customization, and thereby allows for improved audio filters and, ultimately, enhanced audio listening experiences.
By way of example, the step of providing at least two RIR time segments comprises dividing a measured or simulated RIR into said at least two RIR time segments, or individually measuring or simulating said at least two RIR time segments.
It should be understood that the time segments may be (partially) overlapping or non-overlapping, depending on implementation and other circumstances.
In a particular example, the step of providing at least two RIR time segments comprises providing a RIR time segment related to direct sound of said at least one sound source in said listening environment and a RIR time segment related to sound reflections and/or reverberations.
For example, a filter for binaural signal estimation related to direct sound and a filter for binaural signal estimation related to sound reflections and/or reverberations may be established.
As an example, the microphone measurements may be obtained from or by one or more microphone systems, at least one of which comprises a collective microphone array having at least two microphones.
In a particular example, for at least one RIR time segment, the filter for binaural signal estimation is provided as a Virtual Head filter.
In another example, for at least one RIR time segment, the filter for binaural signal estimation is provided as an Ambisonics-based binaural decoding filter.
By way of example, for each RIR time segment, the filter for binaural signal estimation may be provided as a Virtual Head filter and/or an Ambisonics-based binaural decoding filter. Alternatively, for a specific RIR time segment, the filter for binaural signal estimation may be provided as a Virtual Head filter and/or an Ambisonics-based binaural decoding filter and, for another specific RIR time segment, the filter for binaural signal estimation may be provided as a Head-Related Transfer Function, HRTF, filter for object-based binaural signal reproduction.
For example, if a single-channel microphone is used for the direct sound signal path, a suitable filter for binaural signal estimation may be based on a HRTF response.
It should be understood that the expression “filter for binaural signal estimation” is a generic expression that encompasses a Virtual Head filter and/or an Ambisonics-based binaural decoding filter as well as other types of binaural signal estimation filters, including HRTF-based filters for object-based binaural signal reproduction.
By way of example, the filter(s) for binaural signal estimation may be selected from the following:
• a Virtual Head filter
• an Ambisonics-based binaural decoding filter (e.g. using microphone signals in Ambisonics format)
• a Virtual Head filter internally implemented based on Ambisonics modules (e.g. using normal microphone signals)
• a Head-Related Transfer Function, HRTF, filter for object-based binaural signal reproduction.
Optionally, but with advantageous technical effects, for at least one RIR time segment, the corresponding filter for binaural signal estimation may be determined and/or adapted based on Direction-of-Arrival (DoA) information, also referred to as direction-of-sound information.
For example, for at least one RIR time segment, a Direction-of-Arrival (DoA) analysis may be performed to provide said DoA information, or said DoA information may be accessed as part of system information.
By way of example, the DoA information may include information of a number of directions of sound incidence relative to a microphone array to provide information of spatial sound power distribution, and the corresponding filter for binaural signal estimation may be determined and/or adapted based on the information of spatial sound power distribution.
In an optional embodiment, for at least one RIR time segment, the corresponding filter for binaural signal estimation may be determined and/or adapted based on information of one or more individualized Head-Related Transfer Functions, HRTFs, thereby enabling determination of an individualized resulting BRIR.
It should be understood that the step of providing (S2) a corresponding filter for binaural signal estimation may involve determining the filter for binaural signal estimation based on a separate set of sound measurements, or accessing a predetermined filter for binaural signal estimation and adapting the predetermined filter into the filter for binaural signal estimation.
It should also be understood that each RIR time segment may simulate microphone signals, for a respective time segment, that would occur when playing an audio signal using said at least one sound source, and each filter for binaural signal estimation may generate the binaural ear signals from the simulated microphone signals.
By way of example, at least one RIR time segment may be adapted for simulating microphone signals in Ambisonics format and the corresponding filter for binaural signal estimation may be designed as an Ambisonics-based binaural decoding filter.
In effect, the resulting BRIR may thus correspond to or be used to produce binaural ear signals which, when listened to using headphones, give a listening experience that simulates listening to the actual sound source(s) in the listening environment.
In a particular example, the method may optionally include a pre-step, denoted SO, of performing the microphone measurements or simply accessing the measurement data corresponding to the microphone measurements, as schematically illustrated in FIG. 5B.
Optionally, the method further comprises the step, denoted S5, of storing said resulting BRIR in a BRIR database, as schematically illustrated in FIG. 5B.
It is possible for the method to be performed for each of a number of different head look directions to produce a corresponding number of head-look- direction-specific BRIRs for storage in the BRIR database.
According to a second aspect, the proposed technology also relates to a corresponding filter determination system configured to determine an audio filter. Basically, the filter determination system is configured to provide at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment.
The filter determination system is further configured to, for each RIR time segment: provide or implement a corresponding filter for binaural signal estimation; and combine the RIR time segment with the corresponding filter for binaural signal estimation to obtain a Binaural Room Impulse Response, BRIR, time segment.
Further, the filter determination system is configured to combine the BRIR time segments to obtain a resulting BRIR for said audio filter.
By way of example, the filter determination system may be configured to provide said at least two RIR time segments by dividing a measured or simulated RIR into said at least two RIR time segments, or by individually measuring or simulating said at least two RIR time segments.
For example, the filter determination system may be configured to provide a RIR time segment related to direct sound of said at least one sound source in said listening environment and a RIR time segment related to sound reflections and/or reverberations.
In a particular example, the filter determination system is configured to establish a filter for binaural signal estimation related to direct sound and a filter for binaural signal estimation related to sound reflections and/or reverberations.
It should be understood that the filter determination system may be configured to obtain the microphone measurements from or by one or more microphone systems, at least one of which comprises a collective microphone array having at least two microphones.
As an example, each filter for binaural signal estimation may be a Virtual Head filter.
According to a particular example, the filter determination system may be configured to determine and/or adapt, for at least one RIR time segment, the corresponding filter for binaural signal estimation based on Direction-of-Arrival (DoA) information, also referred to as direction-of-sound information.
In this way, it is possible to adapt the binaural signal estimation filter(s) based on information of spatial sound power distribution, e.g. enabling one or more directions of dominant sound power to be taken into consideration when adapting the filter(s).
According to a third aspect, there is provided an audio processing method comprising a method for determining an audio filter according to the first aspect, and performing filtering of an audio signal based on the determined audio filter implementing the resulting BRIR.
By way of example, the audio processing method may be performed for providing enhanced music or movie listening experience in headphones or for generating audio for augmented or virtual reality experiences.
According to a fourth aspect, there is provided a method for tuning a sound system comprising a method for determining an audio filter according to the first aspect. For example, the method for tuning a sound system may be performed for auralization of a computer model of a sound system in a virtual product development setting or for remote tuning of a sound system.
According to a fifth aspect, there is provided an audio filter determined by the method according to the first aspect.
According to a sixth aspect, there is provided an audio processing system comprising an audio filter of the fifth aspect.
According to a seventh aspect, there is provided an audio system comprising a sound generating system and an audio processing system according to the sixth aspect in the input signal path of the sound generating system.
By way of example, the sound generating system may include headphones, also referred to as earphones.
According to an eighth aspect, there is provided a computer program comprising instructions, which when executed by a computer, cause the computer to perform the method of the first aspect.
According to a ninth aspect, there is provided a computer-program product comprising a computer-readable medium having stored thereon a computer program of the eighth aspect.
According to a tenth aspect, there is provided an audio processing system for determining a Binaural Room Impulse Response, BRIR. The audio processing system comprises:
- a first stage for implementing at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment;
- a second stage comprising, for each RIR time segment, a corresponding filter for binaural signal estimation, wherein said first stage and said second stage collectively provides Binaural Room Impulse Response, BRIR, time segments; and
- a third stage configured to combine the BRIR time segments to obtain a resulting BRIR.
For a better understanding, the proposed technology will now be described with reference to non-limiting examples.
In a sense, the proposed technology relates to a method and system for BRIR modeling as well as an associated sound processing system. In a typical example embodiment of the invention, a microphone array may be used to measure a collection of Room Impulse Responses (RIR) from one or more loudspeakers at a desired listening position in a room. The RIR measurements can be used to simulate the microphone signals that would occur when playing a sound signal using one of the loudspeakers. A second filtering stage would then estimate binaural ear signals from the simulated microphone signals, here called a Virtual Head filter.
In reference [9], a study is presented which infers a relation between the complexity of the microphone array used (i.e. the number of microphone channels) and the quality of the modeled BRIRs (and resulting perceptual fidelity of a binaural signal rendered using the modeled BRIRs).
By way of example, it is desirable to provide an improved method for modeling of BRIRs from RIR measurements taken with a microphone array with a given finite complexity. None of the mentioned prior art references describes how BRIRs can be modeled from microphone array RIR measurements with separate modeling of different time segments of the BRIR.
For more information on possible implementations of the second filtering stage, reference can be made to reference [2] describing a so-called Virtual Artificial Head (VAH) filter, which is a particular Virtual Head filter.
The Virtual Head filter may be designed with a corresponding Virtual Head look direction, so that a Virtual Head filter can be obtained with low effort for any desired head look direction.
The design of a VAH filter according to reference [2] is known to involve compromises. The quality of the obtained binaural signal depends, among other things, on the number of microphones in the microphone array. A VAH filter is usually designed under the assumption that equal sound power is coming from all directions.
However, in this particular context, the inventor has realized that if it is known that sound energy is arriving at the microphone array only from a discrete number of known directions, then substantially increased performance can be attained by using this information when designing the VAH filter (this applies equally to a general Virtual Head filter) and optimizing performance for those directions.
As mentioned, it is beneficial to provide at least two Room Impulse Response, RIR, time segments based on sound measurement data from or simulations of microphone measurements.
For example, it is possible to split the RIR measurements into two parts, one part containing the direct sound of the loudspeakers in the room, and another part containing remaining room reflections and reverberation. The simulated microphone signals can then be split into a direct sound signal part and a reflected sound signal part. The direct sound signal then only contains sound coming from a discrete number of directions corresponding to the loudspeaker locations. By way of example, these directions can be calculated from a Direction-Of-Arrival (DoA) analysis of the direct sound RIR part. Separate Virtual Head filters may then be designed to be combined with the direct and reflected RIR measurement parts, respectively. The performance of the direct sound Virtual Head filter in particular can be substantially increased using the DoA information.
Using the proposed method, BRIRs may be modeled from a microphone array recording with a quality that approaches the quality of BRIRs obtained using a direct measurement by an artificial head or real listener with microphones in the ears.
Optionally, individualized BRIRs can also be obtained if using individual HRTF data when designing the Virtual Head filter.
Furthermore, if desirable, BRIRs for many head look directions can be computed with low effort.
The inventor has realized that a BRIR can typically be considered to include a direct sound component and a reflected sound component. The direct sound component corresponds to an early time segment of the BRIR containing the first wave front arriving from the sound source, and the reflected sound component corresponds to a late time segment of the BRIR containing room reflections. The early time segment is often dominated by a single wave front if the BRIR is measured for a sound source that approximately behaves as a point source. For more complex sound sources consisting of, for example, multiple loudspeakers, the direct sound component of the BRIR could contain a certain number of wavefronts. The reflected sound component often corresponds to an approximately diffuse sound field, that is, in relation to the direct sound component it contains a much larger number of reflected wavefronts incident from many more directions.
FIG. 6 shows schematic diagrams illustrating an example of identification of direct and reflected sound components of a BRIR.
The inventor has conceptually understood that to model a BRIR, it may be desirable to have two pieces of information: firstly, the reflective properties of the sound field at the head position, which can be given by RIR measurements without a head present; and secondly, how individual sound waves add up at the ears of the listener, which can be given by a set of HRIRs, where HRIR stands for Head Related Impulse Response, i.e. the time-domain counterpart of the HRTF.
A Room Impulse Response (RIR) is the impulse response measured from a sound source situated in a certain acoustic environment to a microphone in the acoustic environment. Similarly to a BRIR, a RIR typically includes a direct sound component, and a reflected sound component.
In a particular example, when impulse responses are measured from several sound sources to several microphones, the collection of impulse responses can be said to constitute a Multiple-Input Multiple-Output (MIMO) system description. A corresponding collection of RIRs may thus be denoted as a MIMO RIR. It can be noted that a MIMO RIR could be measured in a real room, but it could also be obtained by some other method, for example it could be calculated for a virtual room and virtual microphone array using numerical acoustical methods. A MIMO RIR system description corresponding to a desired listening position may be measured using a microphone array providing M ≥ 1 recording channels, the microphone array being placed in a desired listening position in a room or other acoustic environment containing L ≥ 1 sound sources. An example microphone array may include several microphone capsules evenly distributed over a spherical surface, each capsule providing a discrete recording channel and a polar uptake pattern which differs from that of the other capsules. The MIMO RIR system description can be used to simulate microphone signals that would occur when driving one of the L sound sources with an input signal.
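By way of illustration, simulating the microphone signals from a MIMO RIR may be sketched as below, assuming the MIMO RIR is stored as an array of shape (M, L, number of taps); the storage layout and function name are assumptions made for the sketch only.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_mic_signals(mimo_rir, source_input, source_index):
    """Simulate the M microphone-array signals that would occur when
    driving one of the L sound sources with `source_input`.

    mimo_rir     : array (M, L, n_taps), one RIR per microphone/source pair
    source_input : 1-D array, the signal fed to the selected sound source
    source_index : index of the driven source, 0 <= source_index < L
    """
    M = mimo_rir.shape[0]
    return np.stack([
        fftconvolve(source_input, mimo_rir[m, source_index])  # per-mic convolution
        for m in range(M)
    ])
```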
A MIMO RIR system description may be divided into time segments. By way of example, it may be divided into a direct sound part denoted RIR_d and a reflected sound part denoted RIR_r. The direct sound part, RIR_d, then contains the initial wave fronts incident at the microphone array from the L sound sources. A Direction of Arrival (DoA) to the microphone array can be defined for each of the L initial wave fronts, identifying the location of the L sound sources. Using knowledge of the physical configuration of the microphone array, a DoA-analysis can be performed on a MIMO RIR time segment, e.g. on RIR_d, to estimate dominant DoAs in the time segment. This can be done using well known techniques, for example by conventional beamforming and finding the directions of maximum power [3]. DoA information could also come from any other source; for example, if the MIMO RIR is based on a computational numerical acoustical model, DoA information may be explicitly available. A schematic example of RIR measurement and DoA identification is illustrated in FIG. 7.
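As a hedged illustration of the beamforming-based DoA analysis mentioned above, the sketch below scans a grid of candidate directions with a simple delay-and-sum beamformer applied to the direct-sound segment and returns the direction of maximum output power. The free-field plane-wave steering model, the candidate grid and the function name are assumptions of the sketch; a practical analysis would use steering vectors matched to the actual array geometry (e.g. a rigid sphere) and may estimate one DoA per sound source.

```python
import numpy as np

def estimate_doa(rir_d, mic_positions, candidate_dirs, fs, c=343.0):
    """Estimate the dominant DoA in a direct-sound RIR segment by
    delay-and-sum beamforming over a grid of candidate directions.

    rir_d          : array (M, n_taps), direct-sound RIR segment per microphone
    mic_positions  : array (M, 3), microphone positions in metres
    candidate_dirs : array (D, 3), unit vectors pointing from the array
                     towards candidate source positions
    fs             : sampling rate in Hz
    c              : speed of sound in m/s
    """
    n_fft = rir_d.shape[1]
    spectra = np.fft.rfft(rir_d, n=n_fft, axis=1)        # (M, F)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)           # (F,)

    powers = []
    for direction in candidate_dirs:
        # A microphone closer to the source receives the wave front earlier;
        # delaying by (r_m . d)/c aligns all channels before summation.
        delays = mic_positions @ direction / c           # (M,)
        steering = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
        beam = np.sum(steering * spectra, axis=0)        # aligned sum over mics
        powers.append(np.sum(np.abs(beam) ** 2))         # broadband output power

    return candidate_dirs[int(np.argmax(powers))]
```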
A Virtual Head filter, also generally referred to as a filter for binaural signal estimation (based on microphone signals), is here understood to be a filter having M input signals corresponding to a microphone array signal representation and two output signals. Such a filter is by its design configured to estimate the binaural ear signals that would be present at the ears of a listener if said listener had the head at the position of the microphone array when the M-signal recording was taken.
FIG. 8 is a schematic diagram illustrating an example of such a digital filter configured to produce a binaural output signal from M (microphone) input signals.
As mentioned, a binaural signal is understood to be defined by two signals, one for the left ear and one for the right ear. The filter thus has M ≥ 2 inputs and 2 outputs and can be classified as a multiple-input multiple-output (MIMO) filter. The M input signals may come directly from a microphone array.
For example, if the M input signals represent an Ambisonics signal, the digital filter in FIG. 8 can be said to implement an Ambisonics binaural decoder. The Ambisonics signal could be derived from a microphone array recording or any other possible source.
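Purely as an illustration, applying such a filter in the time domain can be sketched as a bank of 2 x M FIR filters; the (2, M, taps) array layout and function name are assumptions of the sketch.

```python
import numpy as np
from scipy.signal import fftconvolve

def apply_virtual_head_filter(vh_filter, mic_signals):
    """Apply a [2 x M] MIMO FIR filter (cf. FIG. 8) to M microphone signals
    and return the estimated left/right ear signals.

    vh_filter   : array (2, M, n_taps), one FIR filter per (ear, microphone) pair
    mic_signals : array (M, n_samples)
    """
    M = mic_signals.shape[0]
    ears = []
    for ear in range(2):
        # Each ear signal is the sum over microphones of the microphone
        # signal convolved with the corresponding FIR filter.
        ears.append(sum(fftconvolve(mic_signals[m], vh_filter[ear, m])
                        for m in range(M)))
    return np.stack(ears)
```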
It may be illustrative to consider a non-limiting example of the BRIR modeling process, given in the following. Starting with a MIMO RIR, it is in a first step divided into direct and reflected time segments RIR_d and RIR_r by time-windowing each impulse response.
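The time-windowing step may, for instance, be sketched as below for a single impulse response. The use of the strongest peak to locate the direct sound, the 5 ms window length and the hard (non-crossfaded) split are illustrative assumptions only.

```python
import numpy as np

def split_rir(rir, fs, direct_ms=5.0):
    """Split one impulse response into a direct-sound segment (RIR_d)
    and a reflected-sound segment (RIR_r) by time-windowing.

    rir       : 1-D impulse response
    fs        : sampling rate in Hz
    direct_ms : assumed duration of the direct-sound window in milliseconds
    """
    onset = int(np.argmax(np.abs(rir)))        # strongest peak, normally the direct sound
    cut = onset + int(direct_ms * 1e-3 * fs)   # end of the direct-sound window

    rir_d = np.zeros_like(rir)
    rir_r = np.zeros_like(rir)
    rir_d[:cut] = rir[:cut]                    # direct sound (including any pre-delay)
    rir_r[cut:] = rir[cut:]                    # reflections and reverberation
    return rir_d, rir_r                        # note: rir_d + rir_r == rir
```

Because the two segments sum back to the original response, the later combination of the corresponding BRIR segments preserves the full-time structure of the modeled BRIR.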
A DoA-analysis is performed on the MIMO RIR segments. The DoA-analysis on RIR_d reveals the directions of dominant sound power incidence corresponding to the directions of the sound sources to be auralized, while a DoA-analysis of RIR_r may typically reveal that approximately equal sound energy is coming from all directions in the reflected sound field. The MIMO RIR segments can equivalently be described in the frequency domain by complex frequency response matrices RIR_d(ω) and RIR_r(ω) of dimensions [M x L], where ω is a frequency variable.
In the next step, Virtual Head filters are designed. Expressed in the frequency domain, F_d(ω) and F_r(ω) are Virtual Head filters designed for direct and reflected binaural sound modeling respectively and have dimensions [2 x M]. Combining F_d(ω) and RIR_d(ω) results in a direct sound BRIR time segment BRIR_d(ω) of dimensions [2 x L], according to equation 1. Likewise, combining F_r(ω) and RIR_r(ω) results in a reflected sound BRIR time segment BRIR_r(ω) of dimensions [2 x L], according to equation 2. And summing BRIR_r(ω) and BRIR_d(ω) results in the complete BRIR model BRIR(ω), which has dimensions [2 x L] and thus contains the desired BRIR responses from each sound source to each ear.
BRIR_d(ω) = F_d(ω) RIR_d(ω)    (1)
BRIR_r(ω) = F_r(ω) RIR_r(ω)    (2)
BRIR(ω) = BRIR_d(ω) + BRIR_r(ω)    (3)
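Assuming F_d, F_r and the RIR segments are available as per-frequency matrices, equations (1)-(3) translate directly into per-bin matrix products, as in the following sketch (the array shapes are an assumption of the sketch):

```python
import numpy as np

def model_brir(F_d, F_r, RIR_d, RIR_r):
    """Combine Virtual Head filters and MIMO RIR segments according to
    equations (1)-(3): BRIR(w) = F_d(w) RIR_d(w) + F_r(w) RIR_r(w).

    F_d, F_r     : arrays (n_freq, 2, M), Virtual Head filters per frequency bin
    RIR_d, RIR_r : arrays (n_freq, M, L), RIR segments per frequency bin
    Returns an array (n_freq, 2, L) of modeled BRIR frequency responses.
    """
    BRIR_d = F_d @ RIR_d        # equation (1), one [2 x L] product per bin
    BRIR_r = F_r @ RIR_r        # equation (2)
    return BRIR_d + BRIR_r      # equation (3)
```

Time-domain BRIRs can then be obtained, if desired, by an inverse FFT of each element of the resulting matrix.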
A major technical contribution is the separate modeling of different time segments of a BRIR based on different time segments of a MIMO RIR, designing a unique Virtual Head filter for each MIMO RIR time segment. The inventor has realized that this enables more accurate BRIR modeling than what can be achieved when designing a Virtual Head filter for a full-time MIMO RIR.
By way of example, a Virtual Head filter that is designed with knowledge of spatial sound power distribution of the sound field has higher performance than if such information is not available. This information can be made available, e.g. from a DoA-analysis of each MIMO RIR time segment.
FIG. 9 is a schematic diagram illustrating an example of BRIR modeling based on different time segments, here exemplified by one time segment for direct sound and one for reflected sound, in combination with a Virtual Head filter for each time segment.
FIG. 9 shows an example representation of an operating scenario, showing L input signals, representing input signals to the sound sources that shall be auralized, being split up into a direct sound signal path and a reflected sound signal path. The input signals are fed into Blocks 1 and 3, which simulate the recorded microphone signals corresponding to the direct and room sound components respectively reaching the microphone array from the L sound sources in the room to be auralized. Blocks 2 and 4 are Virtual Head filters which estimate binaural signals from the microphone signals. The dashed lines from blocks 1 and 3 to blocks 2 and 4 indicate that DoA-information may be extracted from the microphone array signal models in blocks 1 and 3 and used in the filter design for blocks 2 and 4.
While FIG. 9 exemplifies the modeling of a BRIR as a sum of two separate time segments, it should be understood that a BRIR can be modeled as a sum of any number of separate time segments greater than or equal to two. It is also possible to use the invention to model parts of a BRIR instead of the full time response of a BRIR.
In a sense, FIG. 9 illustrates the suggested modeling procedure for BRIRs as it also describes a sound processing system where L input signals corresponding to sound source signals produce a binaural output signal defined by a left and right ear signal. Consequently, if an impulse is presented as input to one of the L input signals, the resulting output of the system is a BRIR which may for example be stored in a database. It should be understood however that explicitly calculating a BRIR is not necessary to realize the invention. For example, the blocks in FIG. 9 can be individually realized in different hardware or computational environments. Intermediary results may be stored for processing at different points in time. For example, the simulated microphone array signals may be stored for later processing by Virtual Head filters.
FIG. 10 is a schematic diagram illustrating a more detailed example of BRIR modeling based on different time segments.
In this particular example, MIMO RIRs are used to simulate microphone array signals and the Virtual Head filters are realized using MIMO filters, e.g. as will be described later on. It is also illustrated how a bank of Virtual Head filters can be used to implement head-tracking, if desired.
A generalization is also illustrated in FIG. 10, where different microphone arrays could be used for the different time segments, so that the direct signal path uses a microphone array that outputs M1 signals and the reflected signal path uses a microphone array that outputs M2 signals, where M1 and M2 may be different. It could for example be the case that a single-channel microphone is used for the direct sound signal path, in which case a suitable binaural signal estimation filter would simply involve an HRTF response, and in this scenario the direct sound signal path could be said to implement object-based binaural sound reproduction.
By way of example, the filter blocks in FIG. 10 and all other filters discussed may be realized as digital FIR filters in the frequency or time domain. There are, however, no restrictions on the filter structure, and any filter structure may be used to realize the described functionality of the filters.
FIG. 11 is a schematic diagram illustrating an example of a filter design problem formulation for designing a Virtual Head filter, or more generally a filter for binaural signal estimation based on one or more microphone signals.
A generalized example filter design procedure for a Virtual Head filter configured to produce a binaural signal from M microphone signals is reviewed in the following to illustrate the functionality of a Virtual Head filter and motivate how, in a particular example, DoA information can improve the filter design.
A Virtual Head filter design problem formulation can be exemplified by assuming the existence of a large number N of sound sources, which are evenly distributed on a spherical surface around a microphone array with M output signals, with N >> M. The response of each microphone signal to each of the N sound sources can be modelled by a complex frequency response matrix (also called a matrix of steering vectors) B(ω) of dimensions [M x N], where ω denotes frequency.
The effect of the Virtual Head filter can similarly be described by a complex frequency response matrix F(ω) of dimensions [2 x M]. The effect of each of the N sound sources on the binaural output signal produced by the filter can thus be modelled by the product F(ω)B(ω), which has dimensions [2 x N].
For example, a goal in the filter design problem formulation may be to design a filter where the effect of each of the N sound sources on the binaural output signal is equal to a given Head-Related Transfer Function (HRTF) for that direction. A HRTF for a specific direction can be defined as the two transfer functions from an ideal sound source in an anechoic environment to the left and right ears respectively, measured on a person or artificial head model [4]. HRTF measurements can be found for example in publicly available databases typically containing HRTFs for many directions [5]. In the time domain, a HRTF is referred to as a Head Related Impulse Response (HRIR).
An optimization criterion for the filter, which minimizes a residual error e, may then be formulated as:
e(ω) = norm(F(ω)B(ω) - HRTF(ω))    (4)
where HRTF(ω) is a [2 x N] matrix of desired HRTF responses for the left and right ears for the N source positions and norm() represents some error norm, for example mean-square of error magnitude. The virtual head look direction can be controlled by adjusting the coefficients in HRTF(ω).
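A minimal sketch of solving criterion (4) for one frequency bin in a weighted least-squares sense is given below. The optional per-direction weights (e.g. emphasizing DoA directions, as discussed further below) and the Tikhonov regularization constant are illustrative assumptions; practical designs typically add further constraints, as noted next.

```python
import numpy as np

def design_virtual_head_filter(B, HRTF, weights=None, reg=1e-3):
    """Weighted least-squares Virtual Head filter design for one frequency
    bin, minimizing a weighted version of ||F B - HRTF||^2, cf. equation (4).

    B       : array (M, N), steering matrix from N directions to M microphones
    HRTF    : array (2, N), target left/right ear responses for the N directions
    weights : optional array (N,), e.g. larger for dominant DoA directions
    reg     : Tikhonov regularization constant (illustrative choice)
    Returns F of shape (2, M).
    """
    M, N = B.shape
    w = np.ones(N) if weights is None else np.asarray(weights, dtype=float)

    # Weighted normal equations: F (B W B^H + reg*I) = HRTF W B^H, with W = diag(w)
    BW = B * w[None, :]                           # scale each direction column
    A = BW @ B.conj().T + reg * np.eye(M)
    rhs = HRTF @ BW.conj().T
    return np.linalg.solve(A.T, rhs.T).T          # solve F A = rhs for F
```

In a full design this is repeated for every frequency bin, after which the frequency responses F(ω) may be converted to FIR filters; the head look direction is set through the HRTF targets, and DoA information enters through the weights.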
This optimization problem formulation is usually extended with different constraints on the filter F(ω) and is simplified here for illustrative purposes.
By way of example, an optimal filter F(ω) minimizes the contribution to the error e from all the N source directions. It can be understood by elementary linear algebra that the filter has in principle M degrees of freedom to minimize the error in N points, and since it is assumed that N >> M, there can potentially be a large residual error. Therefore, if it is known that the sound field only has significant power for a few of the N source directions, the error norm function can be modified to weight these directions higher and achieve a significant residual error reduction for these directions. This serves to illustrate how DoA-information may be used in the Virtual Head filter design when designing Virtual Head filters for BRIR modeling. It also illustrates that, optionally, individualized BRIRs can be modeled if individualized HRTFs are used in the Virtual Head filter design.
FIG. 12 is a schematic diagram illustrating an example of an Ambisonics-based Virtual Head filter. For more information on filter methods within the field of Ambisonics theory, reference can be made to references [6,7].
FIG. 12 shows a first filter block which transforms microphone array signals to an Ambisonic signal (for example Ambisonics B-format). Knowledge of DoA information may optionally help in designing the filter. In a second optional step, an Ambisonic signal may be upmixed to a higher Ambisonics order using DoA information. A third optional filter block applies sound field rotation to implement head-tracking using sensor data describing a listener head look direction. A fourth filter block implements an Ambisonics binaural decoder which outputs a binaural signal. For example, the microphone signals described in FIG. 10 and FIG. 11 could also be Ambisonic format microphone signals, or could be in any format which specifies microphone array polar patterns in some predefined form.
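As a simple illustration of the sound field rotation block for first-order Ambisonics only, a yaw rotation mixes the X and Y components as sketched below. The W, X, Y, Z channel ordering, the sign convention and the use of the negative head yaw for head-tracking compensation are assumptions that vary between Ambisonics formats.

```python
import numpy as np

def rotate_foa_yaw(ambi_wxyz, yaw_rad):
    """Rotate a first-order Ambisonics signal (assumed channel order
    W, X, Y, Z) about the vertical axis by `yaw_rad` radians.

    ambi_wxyz : array (4, n_samples)
    """
    W, X, Y, Z = ambi_wxyz
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    X_rot = c * X - s * Y      # first-order components transform like
    Y_rot = s * X + c * Y      # the (x, y) coordinates of a vector
    return np.stack([W, X_rot, Y_rot, Z])   # W and Z are unaffected by yaw
```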
In the following, a non-limiting example of a possible filter design work flow will be presented:
1. Select a room with a sound system where it is desired to obtain Binaural Room Impulse Responses corresponding to one or more loudspeakers in the room and for a specific listening position in the room.
2. Measure a MIMO Room Impulse Response (RIR) using a microphone array, including impulse responses from one or more loudspeakers to all microphones in the microphone array.
3. Split the RIR into at least two segments or otherwise obtain at least two RIR segments, a first segment containing the direct sound from the loudspeakers and a second segment containing the rest of or at least part of the RIR involving room reflections and/or reverberations.
4. For each RIR time segment:
a) Perform a Direction-of-Arrival (DoA) analysis to find the dominant directions of sound incidence to the microphone array.
b) Design or determine a Virtual Head filter which transforms microphone array signals into a binaural signal, taking into account the DoA information from step a) in the filter design to increase accuracy of the binaural signal.
c) Repeat step b) to determine Virtual Head filters for one or more Virtual Head look directions (optional).
d) Combine the RIR time segment with the Virtual Head filter from step b) to obtain a BRIR time segment (or with the Virtual Head filters from step c) to obtain a BRIR time segment for multiple head look directions).
5. Combine the BRIR time segments to obtain full-time BRIR responses for the one or more virtual head look directions.
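Purely as a non-limiting illustration, the work flow above may be tied together as in the following end-to-end sketch, which reuses the helper functions sketched earlier in this description (split_rir, estimate_doa, design_virtual_head_filter and model_brir); those names, the array layouts, the candidate-direction grid and the DoA weighting value are all illustrative assumptions rather than part of the described method.

```python
import numpy as np

def model_brirs_from_array_rir(mimo_rir, fs, mic_positions, candidate_dirs,
                               B, hrtf_targets):
    """End-to-end sketch of the work flow above for one head look direction.

    mimo_rir       : array (M, L, n_taps), measured MIMO RIR (steps 1-2)
    fs             : sampling rate in Hz
    mic_positions  : array (M, 3), microphone positions in metres
    candidate_dirs : array (D, 3), candidate DoA unit vectors
    B              : array (n_freq, M, D), steering matrices per frequency bin
    hrtf_targets   : array (n_freq, 2, D), HRTF targets per frequency bin
    Returns time-domain BRIRs of shape (n_fft, 2, L).
    """
    M, L, n_taps = mimo_rir.shape

    # Step 3: split every impulse response into direct and reflected segments.
    rir_d = np.zeros_like(mimo_rir)
    rir_r = np.zeros_like(mimo_rir)
    for m in range(M):
        for src in range(L):
            rir_d[m, src], rir_r[m, src] = split_rir(mimo_rir[m, src], fs)

    # Step 4a: DoA analysis of the direct-sound segment, one DoA per source.
    doas = [estimate_doa(rir_d[:, src, :], mic_positions, candidate_dirs, fs)
            for src in range(L)]

    # Step 4b: design Virtual Head filters per frequency bin; the direct-sound
    # filter weights the estimated DoA directions higher (illustrative weight).
    weights = np.ones(candidate_dirs.shape[0])
    for doa in doas:
        weights[int(np.argmax(candidate_dirs @ doa))] += 10.0
    n_freq = B.shape[0]
    F_d = np.stack([design_virtual_head_filter(B[k], hrtf_targets[k], weights)
                    for k in range(n_freq)])
    F_r = np.stack([design_virtual_head_filter(B[k], hrtf_targets[k])
                    for k in range(n_freq)])

    # Steps 4d and 5: combine filters and RIR segments in the frequency domain.
    n_fft = 2 * (n_freq - 1)
    RIR_d = np.fft.rfft(rir_d, n=n_fft, axis=2).transpose(2, 0, 1)
    RIR_r = np.fft.rfft(rir_r, n=n_fft, axis=2).transpose(2, 0, 1)
    BRIR = model_brir(F_d, F_r, RIR_d, RIR_r)      # (n_freq, 2, L)
    return np.fft.irfft(BRIR, n=n_fft, axis=0)     # time-domain BRIR responses
```

Repeating the filter-design steps with HRTF targets for other head look directions (step 4c) yields the corresponding head-look-direction-specific BRIRs.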
It will be appreciated that the methods and arrangements described herein can be implemented, combined and re-arranged in a variety of ways.
By way of example, there is provided a system configured to perform the method as described herein. For example, embodiments may be implemented in hardware, or in software for execution by suitable processing circuitry, or a combination thereof.
The steps, functions, procedures, modules and/or blocks described herein may be implemented in hardware using any conventional technology, such as discrete circuit or integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry.
For example, the described method may be translated into a discrete-time implementation for digital signal processing.
Alternatively, or as a complement, at least some of the steps, functions, procedures, modules and/or blocks described herein may be implemented in software such as a computer program for execution by suitable processing circuitry such as one or more processors or processing units.
Examples of processing circuitry includes, but is not limited to, one or more microprocessors, one or more Digital Signal Processors (DSPs), one or more Central Processing Units (CPUs), video acceleration hardware, and/or any suitable programmable logic circuitry such as one or more Field Programmable Gate Arrays (FPGAs), or one or more Programmable Logic Controllers (PLCs).
It should also be understood that it may be possible to re-use the general processing capabilities of any conventional device or unit in which the proposed technology is implemented. It may also be possible to re-use existing software, e.g. by reprogramming of the existing software or by adding new software components. It is also possible to provide a solution based on a combination of hardware and software. The actual hardware-software partitioning can be decided by a system designer based on a number of factors including processing speed, cost of implementation and other requirements.
FIG. 13 is a schematic diagram illustrating an example of a computer-implementation according to an embodiment. In this particular example, at least some of the steps, functions, procedures, modules and/or blocks described herein are implemented in a computer program 425; 435, which is loaded into the memory 420 for execution by processing circuitry including one or more processors 410. The processor(s) 410 and memory 420 are interconnected to each other to enable normal software execution. An optional input/output device 440 may also be interconnected to the processor(s) 410 and/or the memory 420 to enable input and/or output of relevant data such as input parameter(s) and/or resulting output parameter(s).
The term ‘processor’ should be interpreted in a general sense as any system or device capable of executing program code or computer program instructions to perform a particular processing, determining or computing task.
The processing circuitry including one or more processors 410 is thus configured to perform, when executing the computer program 425, well-defined processing tasks such as those described herein.
The processing circuitry does not have to be dedicated to only execute the above-described steps, functions, procedure and/or blocks, but may also execute other tasks. In a particular embodiment, the computer program 425; 435 comprises instructions, which when executed by the processor 410, cause the processor 410 or computer 400 to perform the tasks described herein.
The proposed technology also provides a carrier comprising the computer program, wherein the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.
By way of example, the software or computer program 425; 435 may be realized as a computer program product, which is normally carried or stored on a non-transitory computer-readable medium 420; 430, in particular a non-volatile medium. The computer-readable medium may include one or more removable or non-removable memory devices including, but not limited to, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc (CD), a Digital Versatile Disc (DVD), a Blu-ray disc, a Universal Serial Bus (USB) memory, a Hard Disk Drive (HDD) storage device, a flash memory, a magnetic tape, or any other conventional memory device. The computer program may thus be loaded into the operating memory of a computer or equivalent processing device for execution by the processing circuitry thereof.
The procedural flows presented herein may be regarded as computer flows, when performed by one or more processors. A corresponding apparatus may be defined as a group of function modules, where each step performed by the processor corresponds to a function module. In this case, the function modules are implemented as a computer program running on the processor.
The computer program residing in memory may thus be organized as appropriate function modules configured to perform, when executed by the processor, at least part of the steps and/or tasks described herein. Alternatively, it is possible to realize the function modules predominantly by hardware modules, or alternatively by hardware, with suitable interconnections between relevant modules. Particular examples include one or more suitably configured digital signal processors and other known electronic circuits, e.g. discrete logic gates interconnected to perform a specialized function, and/or Application Specific Integrated Circuits (ASICs) as previously mentioned. Other examples of usable hardware include input/output (I/O) circuitry and/or circuitry for receiving and/or sending signals. The extent of software versus hardware is purely implementation selection.
The embodiments described above are merely given as examples, and it should be understood that the proposed technology is not limited thereto. It will be understood by those skilled in the art that various modifications, combinations and changes may be made to the embodiments without departing from the present scope as defined by the appended claims. In particular, different part solutions in the different embodiments can be combined in other configurations, where technically possible.
REFERENCES
[1] A. Lindau, “Binaural Resynthesis of Acoustic Environments. Technology and Perceptual Evaluation”, Ph.D. thesis, Technical University of Berlin, 2014.
[2] E. Rasumow, “Synthetic reproduction of head-related transfer functions by using microphone arrays”, Ph.D. thesis, School of Medicine and Health Sciences, University of Oldenburg, 2015.
[3] P. Stoica and R. Moses, "Spectral Analysis of Signals", Prentice Hall, 2005.
[4] H. Møller, “Fundamentals of binaural technology”, Applied Acoustics, vol. 36, pp. 171-218, Dec. 1992.
[5] B. Bernschutz, “A spherical far field HRIR/HRTF compilation of the Neumann KU 100”, in Proceedings of the AIA-DAGA 2013 Conference on Acoustics, Merano, Italy, 2013, pp. 592-595.
[6] B. Bernshutz, “Microphone arrays and sound field decomposition for dynamic binaural recording”, Ph.D. thesis, University of Technology Berlin, 2016.
[7] F. Zotter, M. Frank, “Ambisonics”, Springer, 2019.
[8] A. Lindau, S. Weinzierl, and H. Maempel, “FABIAN - An Instrument for Software-Based Measurement of Binaural Room Impulse Responses in Multiple Degrees of Freedom”, in Proceedings of the 24th Tonmeistertagung - VDT International Convention, Leipzig, Germany, 2006.
[9] European patent application EP 3 466 117 A1.
[10] J. Ahrens and C. Andersson, “Perceptual evaluation of headphone auralization of rooms captured with spherical microphone arrays with respect to spaciousness and timbre”, The Journal of the Acoustical Society of America, vol. 145, pp. 2783-2794, 2019.

Claims

1. A method for determining an audio filter, said method comprising:
• providing (S1) at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source (310) in a listening environment;
• for each RIR time segment: providing (S2) a corresponding filter for binaural signal estimation; and combining (S3) the RIR time segment with the corresponding filter for binaural signal estimation to obtain a Binaural Room Impulse Response, BRIR, time segment; and
• combining (S4) the BRIR time segments to obtain a resulting BRIR for said audio filter.
2. The method of claim 1, wherein said step (S1) of providing at least two RIR time segments comprises dividing a measured or simulated RIR into said at least two RIR time segments, or individually measuring or simulating said at least two RIR time segments.
3. The method of claim 1 or 2, wherein said step (S1) of providing at least two RIR time segments comprises providing a RIR time segment related to direct sound of said at least one sound source in said listening environment and a RIR time segment related to sound reflections and/or reverberations.
4. The method of claim 3, wherein a filter for binaural signal estimation related to direct sound and a filter for binaural signal estimation related to sound reflections and/or reverberations are established.
5. The method of any of the claims 1 to 4, wherein said microphone measurements are obtained from or by one or more microphone systems (25), at least one of which comprises a collective microphone array having at least two microphones.
6. The method of any of the claims 1 to 5, wherein, for at least one RIR time segment, said filter for binaural signal estimation is provided as a Virtual Head filter.
7. The method of any of the claims 1 to 6, wherein, for at least one RIR time segment, said filter for binaural signal estimation is provided as an Ambisonics-based binaural decoding filter.
8. The method of claim 6 or 7, wherein, for each RIR time segment, said filter for binaural signal estimation is provided as a Virtual Head filter and/or an Ambisonics-based binaural decoding filter.
9. The method of claim 6 or 7, wherein, for a specific RIR time segment, said filter for binaural signal estimation is provided as a Virtual Head filter and/or an Ambisonics-based binaural decoding filter and, for another specific RIR time segment, said filter for binaural signal estimation is provided as a Head-Related Transfer Function, HRTF, filter for object-based binaural signal reproduction.
10. The method of any of the claims 1 to 9, wherein, for at least one RIR time segment, the corresponding filter for binaural signal estimation is determined and/or adapted based on Direction-of-Arrival (DoA) information, also referred to as direction-of-sound information.
11. The method of claim 10, wherein, for at least one RIR time segment, a Direction-of-Arrival (DoA) analysis is performed to provide said DoA information, or said DoA information is accessed as part of system information.
12. The method of claim 10 or 11, wherein said DoA information includes information of a number of directions of sound incidence relative to a microphone array to provide information of spatial sound power distribution, and the corresponding filter for binaural signal estimation is determined and/or adapted based on said information of spatial sound power distribution.
13. The method of any of the claims 1 to 12, wherein, for at least one RIR time segment, the corresponding filter for binaural signal estimation is determined and/or adapted based on information of one or more individualized Head-Related Transfer Functions, HRTFs, thereby enabling determination of an individualized resulting BRIR.
14. The method of any of the claims 1 to 13, wherein said step of providing (S2) a corresponding filter for binaural signal estimation comprises determining said filter for binaural signal estimation based on a separate set of sound measurements, or accessing a predetermined filter for binaural signal estimation and adapting the predetermined filter into said filter for binaural signal estimation.
15. The method of any of the claims 1 to 14, wherein each RIR time segment is adapted for simulating microphone signals, for a respective time segment, that would occur when playing an audio signal using said at least one sound source, and/or each filter for binaural signal estimation is adapted for generating binaural ear signals from the simulated microphone signals.
16. The method of claim 15, wherein at least one RIR time segment is adapted for simulating microphone signals in Ambisonics format and the corresponding filter for binaural signal estimation is designed as an Ambisonics-based binaural decoding filter.
17. The method of any of the claims 1 to 16, wherein the resulting BRIR corresponds to or is used to produce binaural ear signals which, when listened to using headphones, give a listening experience that simulates listening to the actual sound source(s) in the listening environment.
18. The method of any of the claims 1 to 17, wherein said method further comprises the step (SO) of performing said microphone measurements or accessing the measurement data corresponding to said microphone measurements.
19. The method of any of the claims 1 to 18, wherein said method further comprises the step (S5) of storing said resulting BRIR in a BRIR database (75).
20. The method of claim 19, wherein said method is performed for each of a number of different head look directions to produce a corresponding number of head-look-direction-specific BRIRs for storage in said BRIR database (75).
21. A filter determination system (50, 400) configured to determine an audio filter,
• wherein said filter determination system (50, 400) is configured to provide at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source (310) in a listening environment;
• wherein said filter determination system (50, 400) is configured to, for each RIR time segment: provide or implement a corresponding filter for binaural signal estimation; and combine the RIR time segment with the corresponding filter for binaural signal estimation to obtain a Binaural Room Impulse Response, BRIR, time segment; and
• wherein said filter determination system (50, 400) is configured to combine the BRIR time segments to obtain a resulting BRIR for said audio filter.
22. The filter determination system (50, 400) of claim 21, wherein said filter determination system (50, 400) is configured to provide said at least two RIR time segments by dividing a measured or simulated RIR into said at least two RIR time segments, or by individually measuring or simulating said at least two RIR time segments.
23. The filter determination system (50, 400) of claim 21 or 22, wherein said filter determination system (50, 400) is configured to provide a RIR time segment related to direct sound of said at least one sound source (310) in said listening environment and a RIR time segment related to sound reflections and/or reverberations.
24. The filter determination system (50, 400) of claim 23, wherein said filter determination system (50, 400) is configured to establish a filter for binaural signal estimation related to direct sound and a filter for binaural signal estimation related to sound reflections and/or reverberations.
25. The filter determination system (50, 400) of any of the claims 21 to 24, wherein said filter determination system (50, 400) is configured to obtain said microphone measurements from or by one or more microphone systems (25), at least one of which comprises a collective microphone array having at least two microphones.
26. The filter determination system (50, 400) of any of the claims 21 to 25, wherein each filter for binaural signal estimation is a Virtual Head filter.
27. The filter determination system (50, 400) of any of the claims 21 to 26, wherein said filter determination system (50, 400) is configured to determine and/or adapt, for at least one RIR time segment, the corresponding filter for binaural signal estimation based on Direction-of-Arrival (DoA) information, also referred to as direction-of-sound information.
28. An audio processing method comprising a method for determining an audio filter according to any of the claims 1 to 20, and performing filtering of an audio signal based on the determined audio filter implementing the resulting BRIR.
29. The audio processing method of claim 28, wherein said audio processing method is performed for providing enhanced music or movie listening experience in headphones.
30. The audio processing method of claim 29, wherein said audio processing method is performed for generating audio for augmented or virtual reality experiences.
31. A method for tuning a sound system comprising a method for determining an audio filter according to any of the claims 1 to 20.
32. The method for tuning a sound system of claim 31, wherein said method for tuning a sound system is performed for auralization of a computer model of said sound system in a virtual product development setting.
33. The method for tuning a sound system of claim 31, wherein said method for tuning a sound system is performed for remote tuning of said sound system.
34. An audio filter determined by the method according to any of the claims 1 to 20.
35. An audio processing system (200) comprising an audio filter of claim 34.
36. An audio system (100) comprising a sound generating system (300) and an audio processing system (200) according to claim 35 in the input signal path of the sound generating system (300).
37. The audio system (100) of claim 36, wherein the sound generating system (300) includes headphones, also referred to as earphones.
38. A computer program (425; 435) comprising instructions, which when executed by a computer (400), cause said computer (400) to perform the method of any of the claims 1 to 20.
39. A computer-program product comprising a computer-readable medium (420; 430) having stored thereon a computer program (425; 435) of claim 38.
40. An audio processing system for determining a Binaural Room Impulse Response, BRIR, wherein said audio processing system comprises:
- a first stage for implementing at least two Room Impulse Response, RIR, time segments based on measurement data from or simulations of microphone measurements of sound from at least one sound source in a listening environment;
- a second stage comprising, for each RIR time segment, a corresponding filter for binaural signal estimation, wherein said first stage and said second stage collectively provide Binaural Room Impulse Response, BRIR, time segments; and
- a third stage configured to combine the BRIR time segments to obtain a resulting BRIR.
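Purely as an illustrative aid, and not as part of the claimed subject-matter, the following Python sketch shows one way the segment-filter-combine structure recited in claims 21 and 40 could be prototyped: a measured or simulated RIR is split into a direct-sound segment and a reflections/reverberation segment (first stage), each segment is convolved with its own two-channel filter for binaural signal estimation (second stage), and the BRIR time segments are overlap-added into the resulting BRIR (third stage). The function name, the single-channel RIR, the split point and the random placeholder filters are assumptions made only for this sketch.

import numpy as np
from scipy.signal import fftconvolve

def brir_from_segments(rir, binaural_filters, split_sample):
    """Split a single-channel RIR into two time segments, convolve each
    segment with its own 2-channel binaural estimation filter, and
    overlap-add the resulting BRIR time segments into one BRIR."""
    # First stage: divide the RIR into a direct-sound segment and a
    # reflections/reverberation segment at the chosen split point.
    segments = [rir[:split_sample], rir[split_sample:]]

    # Allocate the output long enough for the longest per-segment convolution.
    max_filter_len = max(h.shape[1] for h in binaural_filters)
    brir = np.zeros((2, len(rir) + max_filter_len - 1))

    offset = 0
    for seg, h in zip(segments, binaural_filters):
        # Second stage: convolve the segment with the left- and right-ear
        # parts of its binaural estimation filter -> one BRIR time segment.
        seg_brir = np.stack([fftconvolve(seg, h[ch]) for ch in range(2)])
        # Third stage: overlap-add the BRIR time segments at their original
        # time offsets to obtain the resulting BRIR.
        brir[:, offset:offset + seg_brir.shape[1]] += seg_brir
        offset += len(seg)
    return brir

# Hypothetical usage with random placeholders for real measurements/filters.
rir = np.random.randn(48000) * np.exp(-np.arange(48000) / 8000.0)
filters = [np.random.randn(2, 256), np.random.randn(2, 1024)]  # direct, late
brir = brir_from_segments(rir, filters, split_sample=120)
print(brir.shape)  # (2, 49023)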
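In the same illustrative spirit, claim 27 lets the filter for binaural signal estimation be determined or adapted from Direction-of-Arrival (DoA) information. A minimal sketch, assuming a hypothetical table of direction-indexed two-channel filters and simple nearest-neighbour selection (neither of which is specified here), could look as follows.

import numpy as np

def select_binaural_filter(doa_deg, filter_angles_deg, filters):
    """Pick the binaural estimation filter whose stored azimuth is closest
    to the estimated Direction-of-Arrival (DoA) of an RIR time segment."""
    # Wrap angular differences into [-180, 180) degrees before comparing.
    diff = (np.asarray(filter_angles_deg, dtype=float) - doa_deg + 180.0) % 360.0 - 180.0
    idx = int(np.argmin(np.abs(diff)))
    return filters[idx]

# Hypothetical filter table: one 2-channel, 256-tap filter every 30 degrees.
angles = np.arange(0, 360, 30)
table = np.random.randn(len(angles), 2, 256)
direct_filter = select_binaural_filter(doa_deg=42.0,
                                        filter_angles_deg=angles,
                                        filters=table)
print(direct_filter.shape)  # (2, 256)

In practice the DoA would itself be estimated from the microphone-array measurement, for example around the direct-sound peak of the RIR segment; in this sketch it is simply passed in as a number.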
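Finally, the audio processing method of claim 28 amounts to filtering an audio signal with an audio filter implementing the resulting BRIR. A sketch of that step for a mono input is shown below; the peak normalization and the random test data are illustrative additions, not requirements of the claim.

import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono_signal, brir):
    """Filter a mono signal with a 2-channel BRIR (shape (2, brir_len)) to
    produce a left/right headphone signal."""
    out = np.stack([fftconvolve(mono_signal, brir[0]),
                    fftconvolve(mono_signal, brir[1])])
    # Toy peak normalization so the random example stays within [-1, 1].
    return out / max(np.max(np.abs(out)), 1e-12)

# Hypothetical usage: one second of noise at 48 kHz and a decaying random BRIR.
signal = np.random.randn(48000)
brir = np.random.randn(2, 24000) * np.exp(-np.arange(24000) / 6000.0)
headphone_signal = render_binaural(signal, brir)
print(headphone_signal.shape)  # (2, 71999)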

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SE2020/051098 WO2022108494A1 (en) 2020-11-17 2020-11-17 Improved modeling and/or determination of binaural room impulse responses for audio applications

Publications (1)

Publication Number Publication Date
WO2022108494A1 (en)

Family

ID=81709431

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2020/051098 WO2022108494A1 (en) 2020-11-17 2020-11-17 Improved modeling and/or determination of binaural room impulse responses for audio applications

Country Status (1)

Country Link
WO (1) WO2022108494A1 (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060045294A1 (en) * 2004-09-01 2006-03-02 Smyth Stephen M Personalized headphone virtualization
US20140355794A1 (en) * 2013-05-29 2014-12-04 Qualcomm Incorporated Binaural rendering of spherical harmonic coefficients
US20200162835A1 (en) * 2014-01-03 2020-05-21 Dolby Laboratories Licensing Corporation Methods and systems for designing and applying numerically optimized binaural room impulse responses
WO2017072118A1 (en) * 2015-10-26 2017-05-04 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for generating a filtered audio signal realizing elevation rendering
US20180091920A1 (en) * 2016-09-23 2018-03-29 Apple Inc. Producing Headphone Driver Signals in a Digital Audio Signal Processing Binaural Rendering Environment
WO2018084769A1 (en) * 2016-11-04 2018-05-11 Dirac Research Ab Constructing an audio filter database using head-tracking data
US20180249274A1 (en) * 2017-02-27 2018-08-30 Philip Scott Lyren Computer Performance of Executing Binaural Sound
US20180359294A1 (en) * 2017-06-13 2018-12-13 Apple Inc. Intelligent augmented audio conference calling using headphones
EP3509327A1 (en) * 2018-01-07 2019-07-10 Creative Technology Ltd. Method for generating customized spatial audio with head tracking
EP3595337A1 (en) * 2018-07-09 2020-01-15 Koninklijke Philips N.V. Audio apparatus and method of audio processing
EP3644628A1 (en) * 2018-10-25 2020-04-29 Creative Technology Ltd. Systems and methods for modifying room characteristics for spatial audio rendering over headphones
EP3664477A1 (en) * 2018-12-07 2020-06-10 Creative Technology Ltd. Spatial repositioning of multiple audio streams

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116095595A (en) * 2022-08-19 2023-05-09 荣耀终端有限公司 Audio processing method and device
CN116095595B (en) * 2022-08-19 2023-11-21 荣耀终端有限公司 Audio processing method and device
WO2024044113A3 (en) * 2022-08-24 2024-04-25 Dolby Laboratories Licensing Corporation Rendering audio captured with multiple devices

Similar Documents

Publication Publication Date Title
Cuevas-Rodríguez et al. 3D Tune-In Toolkit: An open-source library for real-time binaural spatialisation
US9918179B2 (en) Methods and devices for reproducing surround audio signals
Grimm et al. A toolbox for rendering virtual acoustic environments in the context of audiology
JP6607895B2 (en) Binaural audio generation in response to multi-channel audio using at least one feedback delay network
US7215782B2 (en) Apparatus and method for producing virtual acoustic sound
US9215544B2 (en) Optimization of binaural sound spatialization based on multichannel encoding
JP6215478B2 (en) Binaural audio generation in response to multi-channel audio using at least one feedback delay network
Sakamoto et al. Sound-space recording and binaural presentation system based on a 252-channel microphone array
WO2022108494A1 (en) Improved modeling and/or determination of binaural room impulse responses for audio applications
Pelzer et al. Auralization of a virtual orchestra using directivities of measured symphonic instruments
Otani et al. Binaural Ambisonics: Its optimization and applications for auralization
EP3920557B1 (en) Loudspeaker control
Nagel et al. Dynamic binaural cue adaptation
Vennerød Binaural reproduction of higher order ambisonics-a real-time implementation and perceptual improvements
JPH09191500A (en) Method for generating transfer function localizing virtual sound image, recording medium recording transfer function table and acoustic signal edit method using it
Vorländer Virtual acoustics: opportunities and limits of spatial sound reproduction
US20230370804A1 (en) Hrtf pre-processing for audio applications
Gunnarsson et al. Binaural auralization of microphone array room impulse responses using causal Wiener filtering
CN114586378A (en) Partial HRTF compensation or prediction for in-ear microphone arrays
Zea Binaural In-Ear Monitoring of acoustic instruments in live music performance
Hamdan et al. Weighted orthogonal vector rejection method for loudspeaker-based binaural audio reproduction
US11778408B2 (en) System and method to virtually mix and audition audio content for vehicles
Filipanits Design and implementation of an auralization system with a spectrum-based temporal processing optimization
JP7440174B2 (en) Sound equipment, sound processing method and program
US20240163630A1 (en) Systems and methods for a personalized audio system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 20962604
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 20962604
Country of ref document: EP
Kind code of ref document: A1