WO2013111038A1 - Generation of a binaural signal - Google Patents

Generation of a binaural signal

Info

Publication number
WO2013111038A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
transfer function
head related
category
binaural
Application number
PCT/IB2013/050441
Other languages
French (fr)
Inventor
Jeroen Gerardus Henricus KOPPENS
Arnoldus Werner Johannes Oomen
Erik Gosuinus Petrus Schuijers
Original Assignee
Koninklijke Philips N.V.
Application filed by Koninklijke Philips N.V.
Publication of WO2013111038A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H04S3/004 For headphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S1/00 Two-channel systems
    • H04S1/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Definitions

  • the invention relates to generation of a binaural signal and in particular, but not exclusively, to generation of a binaural signal for generating a virtual surround sound experience using headphones.
  • binaural audio signals which contain specific directional information to which the human ear is sensitive.
  • Such binaural signals may provide directional cues to the listener which correspond to positions outside the head, and thus a more natural sound scene can be rendered.
  • Binaural recordings are typically made using two microphones mounted in the ear canal of a dummy human head, so that the recorded sound corresponds to the sound captured by the human ear and includes any influences due to the shape of the head and the ears. Binaural recordings differ from stereo (that is, stereophonic) recordings in that the reproduction of a binaural recording is generally intended for a headset or headphones, whereas a stereo recording is generally made for reproduction by loudspeakers. While a binaural recording allows a reproduction of all spatial information using only two channels, a stereo recording would not provide the same spatial perception.
  • Regular dual channel (stereophonic) or multiple channel (e.g. 5.1) recordings may be transformed into binaural recordings by convolving each regular signal with a set of perceptual binaural transfer functions that correspond to the direction of the regular signal.
  • perceptual binaural transfer functions model the influence of the human head, and possibly other objects, on the signal.
  • a well-known type of spatial perceptual binaural transfer function is the so-called Head-Related Transfer Function (HRTF) or Head Related Impulse Response (HRIR).
  • An alternative type of spatial perceptual binaural transfer function which also takes into account reflections caused by the walls, ceiling and floor of a room, is the Binaural Room Impulse Response (BRIR).
  • Such a BRIR may thus in addition to the HRTF/HRIR component also include characteristics from the acoustic environment being emulated. BRIRs generally result in an improved out
  • the HRIR, BRIR or HRTFs can be determined. These HRIRs, BRIRs or HRTFs vary from person to person due to different acoustic properties of the head, ears and reflective surfaces such as the shoulders, corpus, etc.
  • the functions can be used to create a binaural recording simulating multiple sources at various locations. This can be realized by convolving each sound source with the pair of HRIRs, BRIRs or HRTFs that corresponds to the position of the sound source as illustrated in Fig. 1 for three audio sources.
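The convolve-and-sum step described above can be illustrated with a minimal sketch. It assumes time-domain impulse responses (HRIRs) held as plain Python lists; the function names `convolve` and `render_binaural` and the toy impulse responses are hypothetical and are not taken from the publication.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(sources):
    """Mix several sources into one (left, right) pair of channels.

    `sources` is a non-empty list of (signal, hrir_left, hrir_right) tuples,
    where the HRIR pair is the one that corresponds to the source position.
    """
    length = max(len(sig) + max(len(hl), len(hr)) - 1 for sig, hl, hr in sources)
    left = [0.0] * length
    right = [0.0] * length
    for sig, hl, hr in sources:
        # Each source contributes one filtered copy to each ear.
        for n, v in enumerate(convolve(sig, hl)):
            left[n] += v
        for n, v in enumerate(convolve(sig, hr)):
            right[n] += v
    return left, right


# Toy data (hypothetical): the right-ear response is delayed by one sample
# and attenuated, as a crude stand-in for a source on the listener's left.
source = [1.0, 0.5]
hrir_left = [1.0, 0.0]
hrir_right = [0.0, 0.6]
left, right = render_binaural([(source, hrir_left, hrir_right)])
```

A practical renderer would more likely use block-based FFT convolution; the direct form is used here only because it makes the per-source, per-ear structure easy to follow.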
  • BRIRs consist of an anechoic portion that only depends on the subject's anthropometric attributes (such as head size, ear shape, etc), followed by a reverberant portion that characterizes the combination of the room and the anthropometric properties.
  • the reverberant portion contains two temporal regions, usually overlapping.
  • the first region contains so-called early reflections, which are isolated reflections of the sound source on walls or obstacles inside the room before reaching the ear-drum (or measurement microphone). As the time lag increases, the number of reflections present in a fixed time interval increases and may also begin to include secondary reflections.
  • the second region in the reverberant portion is the part where these reflections are not isolated anymore. This region is called the diffuse or late reverberation tail.
  • the reverberant portion contains cues that give the auditory system information about distance of the source and size and acoustical properties of the room.
  • a virtual surround sound experience can be provided by rendering the sound such that audio sources appear to be originating from a specific direction, thereby creating the illusion that one is listening to a physical surround sound setup (e.g. 5.1 speakers) or environment (e.g. a concert).
  • the signals required at the eardrums for the listener to perceive sound from any direction can be calculated. These signals are then recreated at the eardrum using either headphones or e.g. a crosstalk cancellation method (suitable for rendering over closely spaced speakers).
  • HRTF, HRIR and BRIR filters may be a combination of the head related filters and further processing to e.g. pre-compensate for the (acoustic) filtering occurring during playback of the binaural signal.
  • An example is playback over closely spaced stereo speakers on a mobile device, where a crosstalk cancellation filter is required for a proper perception of the binaural signal.
  • This further processing may also contain an individualized component.
  • a disadvantage for binaural rendering systems is that it is very difficult to provide a convincing and high quality experience for a variety of listeners.
  • the characteristics of the HRIRs, BRIRs and HRTFs are specific to the individual and depend on the specific physical properties of the listener. For example, different sizes of various parameters of the ear may have impact on the HRIRs, BRIRs and HRTFs (henceforth referred to only by referring to the HRTF, i.e. an HRTF is a common term including also e.g. HRIRs and BRIRs). Therefore, the perceived spatial characteristics depend on the HRTF used and on the characteristics of the individual.
  • Personalized HRTFs can be determined by measuring the HRTF filters directly on the person, e.g. using microphones in the ear and playback of test signals at various spatial locations. This requires specific hardware, software and significant effort. By measuring the HRTF filters in a specific environment, personalized BRIRs are obtained.
  • personalized HRTFs can be (partially) determined by the use of metrics resulting from anthropometric measurements.
  • Anthropometry refers to the measurement of the human individual, and anthropometric data may thus specifically be properties and values that characterize one or more physical properties of a human.
  • an improved generation of a binaural signal would be advantageous and in particular an approach allowing increased flexibility, reduced complexity, increased user friendliness, facilitated operation, improved customization, improved spatial experience and/or improved performance would be advantageous.
  • the Invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.
  • an apparatus for generating a binaural signal comprising: a receiver for receiving a user characteristic indication indicative of a characteristic of a user; a classifier for classifying the user into a first user category out of a plurality of user categories in response to the user characteristic indication, each user category being associated with a set of anthropometric characteristics of humans belonging to the user category; a first circuit for determining a head related binaural transfer function in response to the first user category, the head related binaural transfer function being dependent on the anthropometric characteristic associated with the first user category; and a binaural circuit for generating a binaural signal in response to the head related binaural transfer function.
  • the invention may provide an improved generation of a binaural signal and/or facilitated operation and/or implementation.
  • the invention may allow a customization of binaural processing to a user without requiring specific data about the user's physical properties to be provided.
  • the customization of the binaural processing may for example be achieved through a very simple and user friendly interaction, or may indeed in many embodiments be performed without any involvement by a user.
  • the Inventors have realized that substantially improved performance can be achieved by performing customization based on a few suitable anthropometric characteristics, and that these anthropometric characteristics need not be determined specifically for the individual user, but rather that suitable values can be determined by a classification of users and the use of customization values determined for the appropriate category/classification.
  • the Inventors have realized that the variation of appropriate anthropometric characteristics for binaural customization can be approximately characterized by considering categories of users rather than individual user characteristics, and that this may provide a substantial benefit.
  • the approach may provide a highly desirable trade-off between customization of a head related binaural transfer function (HRTF/BRIR/HRIR) and binaural performance on one hand and user friendliness and ease of operation on the other hand.
  • the anthropometric characteristics reflect statistical properties for users belonging to the user category. For example, they may reflect the average, median or other statistical measures of the physical properties of humans belonging to the category.
  • the set of anthropometric characteristics may specifically comprise a pinna height and/or a distance between the ears.
  • the set of anthropometric characteristics may specifically comprise only a single anthropometric characteristic.
  • the anthropometric characteristics may correspond to measurable physical properties of a human.
  • the anthropometric characteristics may correspond to characteristics of humans that impact on the sense of hearing, such as specifically characteristics associated with an acoustic transfer function for the user.
  • Each user category may be associated with/linked to different values of the anthropometric characteristics. Specifically, each user category may be associated with different values for the pinna height and/or the distance between the ears.
  • the statistical properties of the anthropometric/physical characteristics for users belonging to each user category may be different. Specifically, at least some average anthropometric characteristics/physical properties may be different between the different user categories.
  • the binaural processing may filter one or more input audio signals using the head related binaural transfer function to generate the binaural signal.
  • the head related binaural transfer function may be a function of a desired position for the audio signal/sound component.
  • the binaural circuit may determine the head related binaural transfer function to apply to an audio signal in response to a desired position for the audio signal.
  • the binaural signal may comprise a plurality of sound sources corresponding to a plurality of input signals.
  • the binaural circuit may for each input signal determine the head related binaural transfer function in response to a desired position of the input signal, and may apply the determined head related binaural transfer function to each audio signal. The resulting signals can then be combined into the binaural signal.
  • the head related binaural transfer function may specifically be an HRTF or equivalent representations such as a HRIR or a BRIR.
  • a head related binaural transfer function may represent other characteristics of the acoustic transfer function than those determined by the characteristics of the listener, e.g. it may include characteristics of the audio environment (and especially the room) being rendered.
  • the binaural signal may specifically be a two-channel signal with one channel providing the signal for one ear and the other channel providing the signal for the other ear of the user.
  • the binaural signal may be rendered e.g. using a headphone.
  • the binaural signal may be rendered using two or more loudspeakers.
  • a crosstalk-cancellation technique may be employed to compensate for sound contributions at one ear from the loudspeaker(s) rendering the signal to the other ear.
  • the user characteristic indication and/or the anthropometric characteristics may be composite or multi-dimensional characteristics.
  • the characteristics may e.g. be indicated by a plurality of values, each value being indicative of one property.
  • the user characteristic indication may be indicative of both a gender and ethnicity.
  • an anthropometric characteristic may comprise a value for both a pinna dimension and an inter-ear distance.
  • first circuit comprises: a storage for storing at least one anthropometric property for each user category of the plurality of user categories; a retriever for retrieving a first anthropometric property corresponding to the first user category from the storage; and a generator for generating the head related binaural transfer function in response to the first anthropometric property.
  • This may facilitate implementation in many embodiments and may provide a particularly efficient approach for generating an adapted head related binaural transfer function. In particular, it may in many embodiments reduce storage requirements while maintaining a low computational requirement. In many embodiments and scenarios it may provide an improved spatial experience.
  • the first anthropometric property is a pinna dimension.
  • the Inventors have realized that basing an adaptation of a head related binaural transfer function on a pinna characteristic allows a classification based adaptation rather than requiring an individual optimization for the individual user.
  • the pinna height has been found to allow an improved category based customization, and may in particular allow a more customized adaptation, and may thus provide an improved spatial experience for the user.
  • the pinna dimension may specifically be a pinna height.
  • the first anthropometric property is an inter-ear distance.
  • the Inventors have realized that basing an adaptation of a head related binaural transfer function on an inter-ear distance characteristic allows a classification based adaptation rather than requiring an individual optimization for the individual user.
  • inter-ear distance has been found to allow an improved category based customization, and may in particular allow a more customized adaptation and may thus result in an improved spatial experience for the user.
  • the generator is arranged to generate the head related binaural transfer function by a frequency scaling of a reference head related binaural transfer function, the frequency scaling depending on the first anthropometric property.
  • This may provide a particularly efficient implementation and may reduce complexity and/or reduce computational load requirements while providing a binaural signal with high quality and spatial characteristics.
  • the approach may be particularly advantageous when the first anthropometric property is at least one of a pinna dimension and an inter-ear distance.
  • the first circuit comprises: a storage for storing a head related binaural transfer function for each user category of the plurality of user categories, the head related binaural transfer function reflecting the anthropometric characteristics associated with the user category; a generator for generating the head related binaural transfer function by retrieving a stored head related binaural transfer function for the first user category.
  • the storage may specifically provide a look-up table comprising a head related binaural transfer function for each category.
  • the storage may furthermore comprise head related binaural transfer function values for each category for a plurality of different positions, and the generator may be arranged to retrieve stored head related binaural transfer function values for the first user category corresponding to a desired position.
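The look-up described above can be sketched as a nested table keyed first by category and then by position. In this minimal sketch the category names, the azimuth grid, the filter coefficients and the nearest-position rule are all assumptions made for illustration; the text only states that values corresponding to a desired position are retrieved.

```python
# Hypothetical table: category -> {azimuth in degrees -> (hrir_left, hrir_right)}.
HRTF_TABLE = {
    "category_a": {
        -30: ([1.0, 0.2], [0.5, 0.1]),
        0: ([0.8, 0.1], [0.8, 0.1]),
        30: ([0.5, 0.1], [1.0, 0.2]),
    },
    "category_b": {
        -30: ([0.9, 0.3], [0.4, 0.2]),
        0: ([0.7, 0.2], [0.7, 0.2]),
        30: ([0.4, 0.2], [0.9, 0.3]),
    },
}


def retrieve_hrtf(category, azimuth_deg):
    """Return the stored filter pair for `category` closest to `azimuth_deg`.

    Assumption: the nearest stored azimuth is used, and angle wrap-around
    at +/-180 degrees is ignored to keep the sketch short.
    """
    positions = HRTF_TABLE[category]
    nearest = min(positions, key=lambda p: abs(p - azimuth_deg))
    return positions[nearest]


left_filter, right_filter = retrieve_hrtf("category_a", 25)
```

Storing complete filters per category trades memory for computation: nothing has to be generated at run time, but the table grows with the number of categories and positions.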
  • the receiver comprises a user interface for receiving the user characteristic indication from a user input.
  • the user characteristic indication comprises an ethnicity indication for the user.
  • This characteristic has been found to be particularly advantageous for a category based adaptation of a head related binaural transfer function.
  • the user characteristic indication comprises a gender indication for the user.
  • This characteristic has been found to be particularly advantageous for a category based adaptation of a head related binaural transfer function.
  • the user characteristic indication comprises a device characteristic for a user device comprising the apparatus.
  • the device characteristic can be a characteristic that correlates with a characteristic of one or more of the categories of users.
  • the device characteristic can be a characteristic associated with a statistical differentiation between different types of users.
  • the apparatus for generating the binaural signal may for example be implemented in a mobile phone or media player device, and the head related binaural transfer function may be adapted based on a characteristic of the mobile phone or media player device.
  • the device characteristic comprises a user setting. This may provide a particularly accurate and/or practical association with a user category.
  • in accordance with an optional feature of the invention, the device characteristic comprises a language setting.
  • This may provide a particularly accurate and/or practical association with a user category.
  • the first circuit is further arranged to modify the head related binaural transfer function in response to a user input.
  • the first circuit is arranged to generate user setting options for modifying the head related binaural transfer function based on the head related binaural transfer function.
  • This may allow an improved user experience and/or increased user friendliness in many embodiments.
  • it may allow an initial coarse category customization to be refined by an optional manual adaptation which is based on the coarse customization. This may substantially facilitate and aid a manual calibration or setting of the binaural processing, as the starting point for the optimization may already be close to the optimum values.
  • a method of generating a binaural signal comprising: receiving a user characteristic indication indicative of a characteristic of a user; classifying the user into a first user category out of a plurality of user categories in response to the user characteristic indication, each user category being associated with a set of anthropometric characteristics of humans belonging to the user category; determining a head related binaural transfer function in response to the first user category, the head related binaural transfer function being dependent on the anthropometric characteristic associated with the first user category; and generating a binaural signal in response to the head related binaural transfer function.
  • Fig. 1 illustrates an example of a generation of a binaural signal
  • Fig. 2 illustrates an example of elements of an apparatus for generating a binaural signal in accordance with some embodiments of the invention
  • Fig. 3 illustrates an example of distributions of inter-ear distances for different categories of people
  • Fig. 4 illustrates an example of distributions of pinna heights for different categories of people.
  • Figs. 5-7 illustrate examples of elements for generating a head related binaural transfer function in accordance with some embodiments of the invention.
  • Fig. 2 illustrates an example of elements of an apparatus for generating a binaural signal in accordance with some embodiments of the invention.
  • the apparatus may for example be included in a mobile phone, media player, portable music player, a tablet, a computer, or the like.
  • the apparatus receives five spatial channels of a surround sound multi-channel signal. Specifically, it may receive signals corresponding to the centre, front left and right, and rear left and right speaker positions. The apparatus then proceeds to generate a binaural signal which provides a virtual surround sound experience by rendering the individual channels such that, to a user wearing a pair of headphones, they appear to arrive from positions corresponding to the nominal speaker positions for a surround sound system. It will be appreciated that in other embodiments, other types of signals may be rendered. For example, for a gaming application a number of individual audio objects may be received and rendered such that they are perceived to be positioned at appropriate (virtual) positions.
  • the apparatus of Fig. 2 comprises a binaural renderer 201 which receives the five spatial channels and which generates a binaural signal wherein the five spatial channels are rendered from (virtual) positions corresponding to the nominal speaker positions for a five channel surround sound system.
  • the binaural renderer 201 specifically applies an HRTF for respectively the left and right output channels of the output binaural signal to each of the input channels.
  • the resulting signals are summed for respectively the left and right output channels of the output binaural signal.
  • the binaural renderer 201 may specifically employ an approach
  • the head related binaural transfer function represents the characteristics for the individual user. Indeed, it has been found that using the same HRTF for different users tends to result in significantly degraded spatial experiences compared to that which can be achieved if the HRTF is optimized for the individual user.
  • a problem in customizing HRTFs is that it requires specific information about anthropometric measurements of the user which are difficult or cumbersome to determine or obtain. Indeed, even if directly involving the user, inconvenient and cumbersome measurements are required.
  • a facilitated and more user friendly adaptation of the binaural processing is achieved by customizing the head related binaural transfer function based on a classification approach wherein the user is classified into a category and the head related binaural transfer function is then adapted to match the (average) anthropometric characteristics for that category of people.
  • the apparatus of Fig. 2 in particular exploits the Inventors' realization that a useful customization of a head related binaural transfer function can be achieved by an adaptation to an appropriate user class and without adaptation to the individual user's specific characteristics. It furthermore exploits the Inventors' realization that adaptation of a head related binaural transfer function is particularly significant if the adaption is based on the anthropometric properties of an inter-ear distance and/or a pinna dimension (and in particular a pinna-cavity dimension or a pinna height); as well as the realization that these parameters show a significant dependence on the ethnicity and gender of a person.
  • Figs. 3 and 4 illustrate respectively the head width (inter-ear distance) and ear height (pinna dimension) obtained by measurements of 150 subjects by the Applicant.
  • the data supports the Inventors' realization that the gender and ethnicity of a user provide some predictive properties on the anthropometric features, such as pinna height and head width, and thus that the gender and ethnicity of a user provide some predictive properties for the optimized head related binaural transfer function for the person.
  • the anthropometric data for this category can provide predictive properties for the optimized head related binaural transfer function.
  • a classification of a user into one of a plurality of categories is performed.
  • An HRTF is then determined which is particularly suitable for that category, e.g. by being optimized for the average anthropometric values of the group.
  • the apparatus comprises a receiver 203 which receives (or generates) a user characteristic indication indicative of a characteristic of a user.
  • the user characteristic provided by the receiver is not an anthropometric property or measure of the user but may specifically be an association with a specific type or category of humans that the user belongs to. Specifically, the user characteristics may be an indication of an ethnicity and/or gender to which the user belongs.
  • the receiver 203 may comprise a user interface which can interface to the user and which specifically can receive a user input.
  • the user characteristic may simply be generated based on a user input.
  • the receiver 203 is coupled to a classifier 205 which classifies the user into one (or possibly more) category(ies) out of a plurality of user categories based on the user characteristic indication.
  • the classifier 205 thus selects the category, out of the set of categories, to which the user belongs.
  • the classifier 205 may for example comprise a number of predetermined categories where each category exhibits different statistical distributions of one or more anthropometric properties. In particular, the average value of a physical property may be different for the different categories.
  • the classifier 205 may include a category for each of the categories illustrated in Figs. 3 and 4, i.e. one category may be provided to correspond to female Asian, another to male Asian, another to female
  • the classifier 205 thus selects which of the predetermined categories the user belongs to.
  • the user characteristic may directly reflect the parameters used to define the user categories, and in such cases the classification is directly given by the user characteristic and is thus straightforward to implement. For example, the user may input whether he/she is male or female, and which ethnic group he/she belongs to.
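When the user input directly names the parameters that define the categories, classification reduces to normalising the input and looking it up. The sketch below assumes exactly that; the function name `classify`, the fallback category and the normalisation rules are hypothetical, and only the two category pairs that are named in the text are listed.

```python
# Categories named in the text; a real table would list every supported pair.
KNOWN_CATEGORIES = {
    ("female", "asian"),
    ("male", "asian"),
}

# Assumption: inputs that match no category fall back to a default category,
# e.g. one served by a nominal reference HRTF.
DEFAULT_CATEGORY = ("unspecified", "unspecified")


def classify(gender, ethnicity):
    """Map a user characteristic indication to one of the known categories."""
    key = (gender.strip().lower(), ethnicity.strip().lower())
    return key if key in KNOWN_CATEGORIES else DEFAULT_CATEGORY


category = classify(" Female", "Asian")
```

Because the characteristic and the category definition share the same parameters, no statistical model is needed at this step; the statistics live in the per-category anthropometric values.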
  • the classifier 205 is coupled to a HRTF processor 207 which is further coupled to the binaural renderer 201.
  • the HRTF processor 207 is arranged to determine a head related binaural transfer function in response to the user category to which the user was found to belong.
  • the head related binaural transfer function is determined to match the anthropometric values which are associated with the category to which the user belongs. Specifically, the head related binaural transfer function may be selected/generated such that it is optimized for the specific average anthropometric values for the identified user category.
  • an HRTF is generated which is based on an inter-ear distance of 13.4 cm and a pinna height of 6 cm.
  • an HRTF is generated which is based on an inter-ear distance of 15.8 cm and a pinna height of 6.5 cm.
  • the determined HRTF is then fed to the binaural renderer 201 where it is used to generate the binaural output signal.
  • the approach provides an adaptation of the binaural processing to the user.
  • although the adaptation is not specific to the individual user, and the individual user may therefore have different characteristics than assumed, the difference is likely to be significantly smaller than if only a nominal or reference HRTF were used.
  • as the characteristics substantially follow a Gaussian distribution, the probability that the user has characteristics close to the average values is relatively high.
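The Gaussian argument can be made concrete with a small calculation. For a Gaussian property with standard deviation sigma, the probability of lying within delta of the mean is erf(delta / (sigma * sqrt(2))). The standard deviation used below is a hypothetical value chosen only for illustration; it is not measurement data from the publication.

```python
import math


def probability_within(delta, sigma):
    """P(|X - mean| <= delta) for a Gaussian X with standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.erf(delta / (sigma * math.sqrt(2.0)))


# Hypothetical spread: a 0.5 cm standard deviation of pinna height within
# one category, and a tolerance of 0.5 cm around the category average.
p = probability_within(delta=0.5, sigma=0.5)
```

With these assumed numbers roughly two thirds of the users in the category fall within the tolerance (the familiar one-sigma interval), which is the sense in which a category average is a reasonable stand-in for an individual measurement.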
  • the apparatus of Fig. 2 may thus provide an improved binaural rendering by using an HRTF that has been adapted to the user.
  • although the user adaptation may be approximate, it has been found that substantial improvements can be achieved.
  • the adaptation can be achieved based on more easily available or obtainable information such as in particular the ethnicity or gender of the user. This information can be provided e.g. simply by asking the user to make very simple and easy selections rather than requiring the user to make cumbersome or difficult estimates of anthropometric properties.
  • the generation and adaptation of the HRTF may be performed differently in different embodiments.
  • Fig. 5 illustrates an example of elements of the HRTF processor 207.
  • the HRTF processor 207 comprises a storage 501 which stores one or more anthropometric properties for each user category of the plurality of user categories.
  • the storage 501 may be implemented as a look-up table or memory which stores one or more anthropometric properties for each category.
  • the storage 501 stores an inter-ear distance and a pinna height for each possible category.
  • the HRTF processor 207 further comprises a retrieve processor 503 which is arranged to retrieve the values corresponding to the user category the user has been classified into by the classifier 205.
  • the retrieve processor 503 may receive an indication of the selected category and in return it may generate one or more addresses for the storage 501.
  • the stored values 501 correspond to the pinna height and inter-ear distance for the selected category.
  • the retrieve processor 503 may initiate a table look-up in a look-up-table stored in the storage 501.
  • the value(s) output by the storage 501 are fed to a generator 505 which is arranged to generate the head related binaural transfer function based on the value(s).
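The storage and retrieval steps of this example can be sketched as a small look-up table of anthropometric values. The two value pairs are the ones quoted earlier in the text (13.4 cm / 6 cm and 15.8 cm / 6.5 cm); the category labels `category_a` and `category_b` are hypothetical, since the text shown here does not say which categories those values belong to, and the class and function names are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anthropometrics:
    """Average anthropometric values stored for one user category."""
    inter_ear_distance_cm: float
    pinna_height_cm: float


# Plays the role of the storage: one entry per user category.
# Category labels are placeholders, not names from the publication.
STORAGE = {
    "category_a": Anthropometrics(inter_ear_distance_cm=13.4, pinna_height_cm=6.0),
    "category_b": Anthropometrics(inter_ear_distance_cm=15.8, pinna_height_cm=6.5),
}


def retrieve(category):
    """Plays the role of the retrieve processor: fetch the category's values."""
    return STORAGE[category]


values = retrieve("category_a")
```

Storing only a couple of numbers per category keeps the table small; the cost is that the generator has to compute the adapted transfer function from them.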
  • the generator 505 may comprise a reference HRTF which expresses a filter transfer function (e.g. a frequency response or an impulse response) as a function of the position.
  • the reference HRTF may further comprise one or more adaptable parameters which are determined by the generator 505 as a function of the retrieved values.
  • one or more of the parameters may be determined as a function of the anthropometric measures, and the parameters for the HRTF are thus determined from the values retrieved from the storage 501.
  • the anthropometric values of the storage 501 may be expressed by the value after the application of the function to determine the HRTF parameters, i.e. the storage may directly store the parameters for the HRTF function.
  • the generator 505 may generate the head related binaural transfer function by performing a frequency scaling of a reference head related binaural transfer function where the frequency scaling depends on the retrieved value(s).
  • the scaling parameter is related to the pinna height and head width. In the above referenced paper, the relation is given as:
  • pinna refers to the pinna height
  • head refers to the inter-ear distance
  • indexes B and A refer to the values for the reference HRTF and the retrieved values respectively.
  • the generator 505 uses this relation to determine the frequency scaling factor by converting the reference HRTF determined for a person with a certain pinna height and inter-ear distance, to an adapted HRTF for the user by using the average pinna height and inter-ear distance for the category to which the user has been classified.
  • the HRTF processor 207 comprises a storage 601 and a retriever 603 which operates similarly to the storage 501 and retriever 503 of the example of Fig. 5.
  • the storage 601 directly stores a head related binaural transfer function for each category.
  • the stored head related binaural transfer function is generated to reflect the specific anthropometric characteristics (e.g. pinna height and inter-ear distance) for the specific category.
  • the stored HRTF can be a measured HRTF on a person with anthropometric properties that are characteristic of the corresponding category.
  • the HRTFs may have been derived from one or more reference HRTFs through offline processing.
  • the HRTF processor 207 further comprises a generator 605.
  • the generator 605 need not derive the head related binaural transfer function as this is already provided by the storage 601.
  • the generator 605 may provide the appropriate filter coefficients for a specific desired position by determining these coefficients from the retrieved transfer function. This may be done for each desired position, i.e. for each of the input channels (it may equivalently be considered that the generator 605 is fully or partially part of the binaural renderer 201).
  • the storage 601 may for example store the parameter values for a function representing an HRTF as a function of position. In other embodiments, the storage 601 may simply comprise an identification of an HRTF to be used.
  • the storage 601 may store the HRTFs as individual filter coefficient values for a plurality of positions. For example, for each HRTF, a set of filter coefficients for a pair of HRTF filters may be stored for each of a plurality of positions (say, one filter coefficient set for each 5°). In such an embodiment, both the user adaptation and the position adaptation may be performed by a single table look-up, i.e. the storage 601 may directly provide e.g. the coefficient values for FIR HRTF filters that are to be applied to the input signal to generate the contribution of that signal to the binaural signal.
  • Fig. 7 illustrates the storage 701 comprising filter coefficient values for different user categories and different positions.
  • the retriever 703 may provide a selection between HRTF values for different categories, and a position input may provide a selection between different positions.
  • a look-up address may be formed by the retriever 703 generating the Most Significant Bits of the address with the Least Significant Bits being generated as a digital value representing the desired position.
  • the classification of the user was performed in response to a specific user input and specifically by asking the user very simple and easy to answer questions.
  • the classification may be performed in response to a device characteristic for a user device which includes the apparatus.
  • an indication of a (likely) characteristic of the user can be determined from a property of the mobile phone.
  • many user device characteristics are statistically correlated with specific user characteristics and such a correlation may be exploited by the apparatus of Fig. 2 to perform a classification of the user.
  • some mobile phone characteristics tend to be gender correlated.
  • the color of a mobile phone may be statistically correlated with the gender of the user, and this may be used by the apparatus to determine an appropriate category for the user. For example, if a mobile phone comprising the apparatus is pink or purple, it may be assumed that the user is female and this may be used in the classification of the user into a specific category.
  • the device characteristics may include a setting or operational parameter which is dependent on a user behavior, and through this on a user characteristic.
  • the classification may in such examples be based on the user setting that has been selected by the user. For example, when a user customizes a telephone by selecting a theme for the user interface (e.g. a user interface skin) the different options may typically exhibit correlations with the gender of the user. Indeed, many themes may be relatively feminine (e.g. by motif or color scheme) and will typically be selected by women. Other themes may be more masculine and are typically selected by men.
  • the apparatus may use such a theme selection as an indication of a user characteristic, namely as an indication of whether the user is male or female.
  • the language setting of a device such as a mobile phone may be used as an indication of a user characteristic and may accordingly be used by the apparatus to categorize the user.
  • mobile phones typically provide the option of selecting between a number of different languages.
  • the specific selection may in some cases indicate an ethnic origin of the user. For example, if the operating language is selected to be Chinese, it is likely that the user is Asian. However, if the language is selected as Dutch it is more likely that the user is Caucasian.
  • the categorization of the user may be based on the language setting that has been selected by the user.
  • the selection between the four different categories of the examples of Figs. 3 and 4 may simply be based on a language and theme selection made by the user.
  • the categorization of the user may be considered an estimated categorization and/or adaptation.
  • the approach may in general provide improved performance but may in some scenarios or for some users result in an incorrect or suboptimal categorization or adaptation.
  • a man may select a feminine theme or an Asian person may select English as the operational language.
  • the apparatus may in many embodiments include a user input which allows a user to manually cancel, override or modify the adaptation of the binaural processing.
  • the head related binaural transfer function can be modified in response to a user input thereby allowing the user to manually change one or more parameters of the applied HRTF. This may in particular be used to refine the HRTF to provide a more precise customization to the individual user.
  • the apparatus may include a user input that allows the user to manually refine the binaural processing such that it is specifically customized to the individual user rather than to the category to which the user belongs.
  • the classification based adaptation may thus be considered to be a first automatically generated estimate for the user customization of the HRTF.
  • This estimated HRTF may then be used as a starting point for an optional manual calibration of the binaural processing.
  • the manual calibration is likely to be faster, more accurate and more user friendly. It particularly avoids that the user is confronted with HRTFs that are furthest from the optimum HRTF.
  • the apparatus may specifically generate some user setting options that can be used by the user to adapt the HRTF. For example, the apparatus may generate a plurality of possible HRTF settings by offsetting the various parameters of the estimated HRTF. The user may then listen to each of the possible HRTFs and select the one he considers to provide the best spatial experience.
  • user setting options may be provided in the form of one or more manually adjustable controls (such as e.g. sliders on a display) where each control may adjust a property or one or more parameters of the estimated HRTF.
  • the user may introduce an offset to the HRTF.
  • the user may then adjust the controls to provide the optimum spatial effect and the corresponding HRTF may be stored for future use.
  • the first speaker can also render a signal component which is arranged to at least partly cancel a signal component reaching the left ear from the second speaker.
  • the second speaker can render a signal component which is arranged to at least partly cancel a signal component reaching the right ear from the first speaker.
  • any suitable algorithm for estimating such cancellation signal components may be used without detracting from the invention.
  • the crosstalk cancellation may be considered to be a post-processing applied to the binaural signal or may be incorporated as part of the head related binaural transfer function.
  • the invention can be implemented in any suitable form including hardware, software, firmware or any combination of these.
  • the invention may optionally be
  • an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units, circuits and processors.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

An apparatus for generating a binaural signal comprises a receiver (203) that receives a user characteristic indication indicative of a characteristic of a user. A classifier (205) classifies the user into a first user category out of a plurality of user categories in response to the user characteristic indication. Each of the user categories is associated with a set of anthropometric characteristics of humans belonging to the user category. A circuit (207) determines a head related binaural transfer function in response to the first user category with the head related binaural transfer function being dependent on the anthropometric characteristic associated with the first user category. A binaural circuit (201) generates a binaural signal in response to the head related transform. The approach may provide a user classification based customization of e.g. a Head Related Transfer Function. Improved adaptation without requiring difficult or cumbersome characterization of the user's physical characteristics can be achieved.

Description

Generation of a binaural signal
FIELD OF THE INVENTION
The invention relates to generation of a binaural signal and in particular, but not exclusively, to generation of a binaural signal for generating a virtual surround sound experience using headphones.
BACKGROUND OF THE INVENTION
In the last decade there has been a trend towards the use of multi-channel audio with more than two channels and towards spatial audio extending beyond conventional stereo signals. For example, traditional stereo recordings only comprise two channels whereas modern audio systems typically use five or six channels, as in the popular 5.1 surround sound systems. This provides a more involved and encapsulating listening experience where the user may be surrounded by sound sources.
It is desirable to be able to provide such enhanced spatial experiences using only headphones. For a conventional stereo headphone arrangement the sound is perceived to be originating from a position inside the user's head laterally between the ears. This is of course highly artificial and accordingly techniques have been developed for more advanced sound source positioning for headphone applications. For example, music playback and sound effects in mobile games and videos can add significant value to the consumer experience when positioned in a larger sound stage, effectively creating an 'out-of-head' effect.
In particular, techniques have been developed for recording and reproducing binaural audio signals which contain specific directional information to which the human ear is sensitive. Such binaural signals may provide directional cues to the listener which correspond to positions outside the head, and thus a more natural sound scene can be rendered.
Binaural recordings are typically made using two microphones mounted in the ear canal of a dummy human head, so that the recorded sound corresponds to the sound captured by the human ear and includes any influences due to the shape of the head and the ears. Binaural recordings differ from stereo (that is, stereophonic) recordings in that the reproduction of a binaural recording is generally intended for a headset or headphones, whereas a stereo recording is generally made for reproduction by loudspeakers. While a binaural recording allows a reproduction of all spatial information using only two channels, a stereo recording would not provide the same spatial perception.
Regular dual channel (stereophonic) or multiple channel (e.g. 5.1) recordings may be transformed into binaural recordings by convolving each regular signal with a set of perceptual binaural transfer functions that correspond to the direction of the regular signal. Such perceptual binaural transfer functions model the influence of the human head, and possibly other objects, on the signal. A well-known type of spatial perceptual binaural transfer function is the so-called Head-Related Transfer Function (HRTF) or Head Related Impulse Response (HRIR). An alternative type of spatial perceptual binaural transfer function, which also takes into account reflections caused by the walls, ceiling and floor of a room, is the Binaural Room Impulse Response (BRIR). Such a BRIR may thus in addition to the HRTF/HRIR component also include characteristics from the acoustic environment being emulated. BRIRs generally result in an improved out-of-head perception of the binaural signal.
By measuring the impulse responses from a sound source at a specific location in 2D or 3D space at microphones placed in or near the ears (of either a dummy head or a human head), the HRIR, BRIR or HRTFs can be determined. These HRIRs, BRIRs or HRTFs vary from person to person due to different acoustic properties of the head, ears and reflective surfaces such as the shoulders, corpus, etc. The functions can be used to create a binaural recording simulating multiple sources at various locations. This can be realized by convolving each sound source with the pair of HRIRs, BRIRs or HRTFs that corresponds to the position of the sound source as illustrated in Fig. 1 for three audio sources.
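The convolve-and-sum step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `convolve` and `render_binaural` are hypothetical, and the HRIRs in the usage lines are toy placeholders (a pure gain and a one-sample delay), not measured responses.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def render_binaural(sources):
    """Mix several positioned sources into one (left, right) signal pair.

    `sources` is a list of (signal, hrir_left, hrir_right) tuples, where the
    HRIR pair is the one that corresponds to that source's position.
    """
    length = max(len(s) + max(len(hl), len(hr)) - 1 for s, hl, hr in sources)
    left = [0.0] * length
    right = [0.0] * length
    for signal, hrir_left, hrir_right in sources:
        for i, v in enumerate(convolve(signal, hrir_left)):
            left[i] += v
        for i, v in enumerate(convolve(signal, hrir_right)):
            right[i] += v
    return left, right


# Toy placeholder data, not measurements.
left, right = render_binaural([
    ([1.0, 0.5], [1.0], [0.0, 0.5]),   # source nearer the left ear
    ([0.0, 2.0], [0.0, 0.5], [1.0]),   # source nearer the right ear
])
```

A practical renderer would typically use FFT-based convolution for long responses; the direct form is used here only to keep the sketch self-contained.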
As mentioned, binaural transfer functions that also include a room effect are typically referred to as BRIRs. BRIRs consist of an anechoic portion that only depends on the subject's anthropometric attributes (such as head size, ear shape, etc), followed by a reverberant portion that characterizes the combination of the room and the anthropometric properties.
The reverberant portion contains two temporal regions, usually overlapping.
The first region contains so-called early reflections, which are isolated reflections of the sound source on walls or obstacles inside the room before reaching the ear-drum (or measurement microphone). As the time lag increases, the number of reflections present in a fixed time interval increases and may also begin to include secondary reflections. The second region in the reverberant portion is the part where these reflections are not isolated anymore. This region is called the diffuse or late reverberation tail.
The reverberant portion contains cues that give the auditory system information about distance of the source and size and acoustical properties of the room.
Furthermore it is subject dependent due to the filtering of the reflections with the acoustic characteristics of the listener's ear etc.
Thus, a virtual surround sound experience can be provided by rendering the sound such that audio sources appear to be originating from a specific direction, thereby creating the illusion that one is listening to a physical surround sound setup (e.g. 5.1 speakers) or environment (e.g. a concert). With an appropriate binaural transfer function (e.g. HRTF), the signals required at the eardrums for the listener to perceive sound from any direction can be calculated. These signals are then recreated at the eardrum using either headphones or e.g. a crosstalk cancellation method (suitable for rendering over closely spaced speakers).
HRTF, HRIR and BRIR filters may be a combination of the head related filters and further processing to e.g. pre-compensate for the (acoustic) filtering occurring during playback of the binaural signal. An example is playback over closely spaced stereo speakers on a mobile device, where a crosstalk cancellation filter is required for a proper perception of the binaural signal. This further processing may also contain an individualized component.
However, a disadvantage of binaural rendering systems is that it is very difficult to provide a convincing and high quality experience for a variety of listeners. Indeed, the characteristics of the HRIRs, BRIRs and HRTFs (and crosstalk cancellation functions) are specific to the individual and depend on the specific physical properties of the listener. For example, the sizes of various parts of the ear may have an impact on the HRIRs, BRIRs and HRTFs (henceforth referred to only by referring to the HRTF, i.e. an HRTF is a common term including also e.g. HRIRs and BRIRs). Therefore, the perceived spatial characteristics depend on the HRTF used and on the characteristics of the individual. The use of the same HRTF will for different listeners result in differences in the perceived spatial sound stage. Typically, directionality, i.e., the direction where the virtual source originates from, and externalization, i.e., to what extent the virtual sound source is perceived as being outside the head, will decrease when using non-personalized HRTFs.
Personalized HRTFs can be determined by measuring the HRTF filters directly on the person, e.g. using microphones in the ear and playback of test signals at various spatial locations. This requires specific hardware, software and significant effort. By measuring the HRTF filters in a specific environment, personalized BRIRs are obtained.
Alternatively, personalized HRTFs can be (partially) determined by the use of metrics resulting from anthropometric measurements. Anthropometry refers to the measurement of the human individual, and anthropometric data may thus specifically be properties and values that characterize one or more physical properties of a human.
Typically, such data would be derived in the course of the calibration process of a user device such as a mobile phone. However, it is very inconvenient and cumbersome for a user to derive such data. Indeed, determining specific physical measurements typically involves non-straightforward procedures that a user has to follow, such as measuring his head size according to specific instructions.
Hence, an improved generation of a binaural signal would be advantageous and in particular an approach allowing increased flexibility, reduced complexity, increased user friendliness, facilitated operation, improved customization, improved spatial experience and/or improved performance would be advantageous.
SUMMARY OF THE INVENTION
Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.
According to an aspect of the invention there is provided an apparatus for generating a binaural signal, the apparatus comprising: a receiver for receiving a user characteristic indication indicative of a characteristic of a user; a classifier for classifying the user into a first user category out of a plurality of user categories in response to the user characteristic indication, each user category being associated with a set of anthropometric characteristics of humans belonging to the user category; a first circuit for determining a head related binaural transfer function in response to the first user category, the head related binaural transfer function being dependent on the anthropometric characteristic associated with the first user category; and a binaural circuit for generating a binaural signal in response to the head related transform.
The invention may provide an improved generation of a binaural signal and/or facilitated operation and/or implementation. In particular, the invention may allow a customization of binaural processing to a user without requiring specific data about the user's physical properties to be provided. The customization of the binaural processing may for example be achieved through a very simple and user friendly interaction, or may indeed in many embodiments be performed without any involvement by a user.
In particular, the Inventors have realized that substantially improved performance can be achieved by performing customization based on a few suitable anthropometric characteristics, and that these anthropometric characteristics need not be determined specifically for the individual user, but rather that suitable values can be determined by a classification of users and the use of customization values determined for the appropriate category/classification. The Inventors have realized that the variation of appropriate anthropometric characteristics for binaural customization can be approximately characterized by considering categories of users rather than individual user characteristics, and that this may provide a substantial benefit.
Thus, the approach may provide a highly desirable trade-off between customization of a head related binaural transfer function (HRTF/BRIR/HRIR) and binaural performance on one hand and user friendliness and ease of operation on the other hand.
The anthropometric characteristics reflect statistical properties for users belonging to the user category. For example, they may reflect average, median or other statistical measures for the physical properties of humans belonging to the category. The set of anthropometric characteristics may specifically comprise a pinna height and/or a distance between the ears. The set of anthropometric characteristics may specifically comprise only a single anthropometric characteristic. The anthropometric characteristics may correspond to measurable physical properties of a human. The anthropometric characteristics may correspond to characteristics of humans that impact on the sense of hearing, such as specifically characteristics associated with an acoustic transfer function for the user.
Each user category may be associated with/linked to different values of the anthropometric characteristics. Specifically, each user category may be associated with different values for the pinna height and/or a distance between the ears. The statistical properties of the anthropometric/physical characteristics for users belonging to each user category may be different. Specifically, at least some average anthropometric characteristics/physical properties may be different between the different user categories.
The binaural processing may filter one or more input audio signals using the head related binaural transfer function to generate the binaural signal. The head related binaural transfer function may be a function of a desired position for the audio signal/sound component. Thus, the binaural circuit may determine the head related binaural transfer function to apply to an audio signal in response to a desired position for the audio signal. The binaural signal may comprise a plurality of sound sources corresponding to a plurality of input signals. The binaural circuit may for each input signal determine the head related binaural transfer function in response to a desired position of the input signal, and may apply the determined head related binaural transfer function to each audio signal. The resulting signals can then be combined into the binaural signal.
The head related binaural transfer function may specifically be an HRTF or equivalent representations such as a HRIR or a BRIR. In particular, a head related binaural transfer function may represent other characteristics of the acoustic transfer function than those determined by the characteristics of the listener, e.g. it may include characteristics of the audio environment (and especially the room) being rendered.
The binaural signal may specifically be a two-channel signal with one channel providing the signal for one ear and the other channel providing the signal for the other ear of the user. The binaural signal may be rendered e.g. using a headphone. As another example, the binaural signal may be rendered using two or more loudspeakers. In such a case, a crosstalk-cancellation technique may be employed to compensate for sound contributions at one ear from the loudspeaker(s) rendering the signal to the other ear.
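As an illustration of the crosstalk-cancellation idea for two loudspeakers, the sketch below inverts the 2x2 matrix of acoustic paths for a single frequency bin so that the signals arriving at the ears equal the binaural signal. This is one generic textbook formulation, not necessarily the method intended by the text; the function name and the path gains in the usage lines are hypothetical.

```python
def crosstalk_cancel(b_left, b_right, c_ll, c_lr, c_rl, c_rr):
    """Compute loudspeaker signals for one frequency bin.

    c_xy is the (complex) transfer from loudspeaker y to ear x, so the ears
    receive: ear_left  = c_ll * s_left + c_lr * s_right
             ear_right = c_rl * s_left + c_rr * s_right
    The returned (s_left, s_right) make the ear signals equal
    (b_left, b_right) by applying the inverse of that 2x2 matrix.
    """
    det = c_ll * c_rr - c_lr * c_rl
    if abs(det) < 1e-12:
        raise ValueError("acoustic path matrix is ill-conditioned at this bin")
    s_left = (c_rr * b_left - c_lr * b_right) / det
    s_right = (-c_rl * b_left + c_ll * b_right) / det
    return s_left, s_right


# Hypothetical path gains: direct paths 1.0, cross paths 0.5.
s_left, s_right = crosstalk_cancel(1.0, 0.0, 1.0, 0.5, 0.5, 1.0)
ear_left = 1.0 * s_left + 0.5 * s_right    # should reproduce b_left
ear_right = 0.5 * s_left + 1.0 * s_right   # should reproduce b_right
```

In practice the inversion is usually regularized, since a near-singular path matrix at some frequencies would otherwise demand very large loudspeaker gains.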
The user characteristic indication and/or the anthropometric characteristics may be composite or multi-dimensional characteristics. Thus the characteristics may e.g. be indicated by a plurality of values, each value being indicative of one property. For example, the user characteristic indication may be indicative of both a gender and ethnicity. As another example, an anthropometric characteristic may comprise a value for both a pinna dimension and an inter-ear distance.
In accordance with an optional feature of the invention, the first circuit comprises: a storage for storing at least one anthropometric property for each user category of the plurality of user categories; a retriever for retrieving a first anthropometric property corresponding to the first user category from the storage; and a generator for generating the head related binaural transfer function in response to the first anthropometric property.
This may facilitate implementation in many embodiments and may provide a particularly efficient approach for generating an adapted head related binaural transfer function. In particular, it may in many embodiments reduce storage requirements while maintaining a low computational requirement. In many embodiments and scenarios it may provide an improved spatial experience.
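The storage, retriever and generator arrangement might be sketched as follows. Everything concrete here is an assumption made for illustration: the four abstract category indices, the two binary indications used by `classify`, the parameter names, and in particular the numeric values, which are placeholders and not measured anthropometric data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anthropometrics:
    pinna_height_mm: float
    inter_ear_distance_mm: float


# PLACEHOLDER values per category index; not measured data.
STORAGE = {
    0: Anthropometrics(60.0, 150.0),
    1: Anthropometrics(62.0, 152.0),
    2: Anthropometrics(64.0, 155.0),
    3: Anthropometrics(66.0, 158.0),
}

# Hypothetical properties of the person the reference HRTF was measured on.
REFERENCE = Anthropometrics(60.0, 150.0)


def classify(first_indication, second_indication):
    """Map two binary user characteristic indications to one of four categories."""
    return 2 * first_indication + second_indication


def retrieve(category):
    """Retriever: look up the anthropometric properties stored for a category."""
    return STORAGE[category]


def generate_parameters(props, reference):
    """Generator input: express the category values relative to the reference.

    How these ratios are turned into an actual transfer function is left open.
    """
    return {
        "pinna_ratio": props.pinna_height_mm / reference.pinna_height_mm,
        "head_ratio": props.inter_ear_distance_mm / reference.inter_ear_distance_mm,
    }


category = classify(1, 0)
params = generate_parameters(retrieve(category), REFERENCE)
```

Storing only a couple of numbers per category keeps the table small, which matches the reduced storage requirement noted above; the cost is that the generator must compute the transfer function at run time.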
In accordance with an optional feature of the invention, the first anthropometric property is a pinna dimension. The Inventors have realized that basing an adaptation of a head related binaural transfer function on a pinna characteristic allows a classification based adaptation rather than requiring an individual optimization for the individual user. Furthermore, the pinna height has been found to allow an improved category based customization, and may in particular allow a more customized adaptation, and may thus provide an improved spatial experience for the user.
The pinna dimension may specifically be a pinna height.
In accordance with an optional feature of the invention, the first anthropometric property is an inter-ear distance.
The Inventors have realized that basing an adaptation of a head related binaural transfer function on an inter-ear distance characteristic allows a classification based adaptation rather than requiring an individual optimization for the individual user.
Furthermore, the inter-ear distance has been found to allow an improved category based customization, and may in particular allow a more customized adaptation and may thus result in an improved spatial experience for the user.
In accordance with an optional feature of the invention, the generator is arranged to generate the head related binaural transfer function in response to a frequency scaling of a reference head related binaural transfer function in response to the first anthropometric property.
This may provide a particularly efficient implementation and may reduce complexity and/or reduce computational load requirements while providing a binaural signal with high quality and spatial characteristics. The approach may be particularly advantageous when the first anthropometric property is at least one of a pinna dimension and an inter-ear distance.
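A minimal sketch of frequency scaling is given below, assuming the reference response is available as magnitudes on a uniform frequency grid and that a scale factor has already been derived from the retrieved anthropometric property. The mapping from pinna dimension or inter-ear distance to the scale factor is deliberately not shown, and the function name and the linear interpolation are illustrative choices.

```python
def frequency_scale(reference_magnitudes, scale):
    """Return H(f) = H_ref(f / scale) sampled on the same uniform grid.

    A scale above 1 moves spectral features (peaks, notches) up in frequency;
    a scale below 1 moves them down. Values between bins are linearly
    interpolated, and the last bin is held for frequencies beyond the grid.
    """
    n = len(reference_magnitudes)
    scaled = []
    for k in range(n):
        src = k / scale          # position in the reference grid to read from
        lo = int(src)
        if lo >= n - 1:
            scaled.append(reference_magnitudes[-1])
            continue
        frac = src - lo
        scaled.append((1.0 - frac) * reference_magnitudes[lo]
                      + frac * reference_magnitudes[lo + 1])
    return scaled


# Toy reference response with a notch at bin 4 (placeholder data).
reference = [1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
adapted = frequency_scale(reference, 2.0)   # the notch moves to bin 8
```

Only one reference response needs to be stored with this approach; each category then contributes a single scale factor.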
In accordance with an optional feature of the invention, the first circuit comprises: a storage for storing a head related binaural transfer function for each user category of the plurality of user categories, the head related binaural transfer function reflecting the anthropometric characteristics associated with the user category; a generator for generating the head related binaural transfer function by retrieving a stored head related binaural transfer function for the first user category.
This may provide a low complexity and efficient operation in many embodiments. It may in particular provide easy customization with low complexity and computational resource usage. The storage may specifically provide a look-up table comprising a head related binaural transfer function for each category. In some embodiments, the storage may furthermore comprise head related binaural transfer function values for each category for a plurality of different positions, and the generator may be arranged to retrieve stored head related binaural transfer function values for the first user category corresponding to a desired position.
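A look-up table indexed by both category and position could be addressed as sketched below. The sketch assumes a 5° azimuth grid and an address whose upper bits carry the category and whose lower bits carry the position index; the names and the table contents are hypothetical placeholders.

```python
POSITION_STEP_DEG = 5
NUM_POSITIONS = 360 // POSITION_STEP_DEG              # 72 stored azimuths
POSITION_BITS = (NUM_POSITIONS - 1).bit_length()      # 7 bits cover 0..71


def lookup_address(category, azimuth_deg):
    """Form a table address: category in the upper bits, position in the lower bits."""
    position_index = round((azimuth_deg % 360) / POSITION_STEP_DEG) % NUM_POSITIONS
    return (category << POSITION_BITS) | position_index


# Placeholder table: each entry is a (left_coefficients, right_coefficients) pair.
# Real entries would hold the FIR coefficients of the HRTF filter pair.
TABLE = {
    lookup_address(c, p * POSITION_STEP_DEG): ([float(c)], [float(p)])
    for c in range(4)
    for p in range(NUM_POSITIONS)
}

left_coeffs, right_coeffs = TABLE[lookup_address(2, 30)]
```

With this layout both the user adaptation and the position adaptation reduce to one table read, at the cost of storing a full coefficient set for every category and position.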
In accordance with an optional feature of the invention, the receiver comprises a user interface for receiving the user characteristic indication from a user input.
This may provide a particularly practical and advantageous approach. In particular, a low complexity yet accurate and reliable determination of an appropriate category can be achieved while keeping any inconvenience to the user low.
In accordance with an optional feature of the invention, the user characteristic indication comprises an ethnicity indication for the user.
This characteristic has been found to be particularly advantageous for a category based adaptation of a head related binaural transfer function.
In accordance with an optional feature of the invention, the user characteristic indication comprises a gender indication for the user.
This characteristic has been found to be particularly advantageous for a category based adaptation of a head related binaural transfer function.
In accordance with an optional feature of the invention, the user characteristic indication comprises a device characteristic for a user device comprising the apparatus.
This may reduce inconvenience to a user and may indeed in many embodiments obviate the requirement for any user interaction. Indeed, a fully automated classification based adaptation of a head related binaural transfer function can be achieved. The device characteristic can be a characteristic associated with a correlation with a characteristic of one or more of the categories of the users. The device characteristic can be a characteristic associated with a statistical differentiation between different types of users.
The apparatus for generating the binaural signal may for example be implemented in a mobile phone or media player device, and the head related binaural transfer function may be adapted based on a characteristic of the mobile phone or media player device.
In accordance with an optional feature of the invention, the device characteristic comprises a user setting. This may provide a particularly accurate and/or practical association with a user category.
In accordance with an optional feature of the invention, the device characteristic comprises a language setting.
This may provide a particularly accurate and/or practical association with a user category.
In accordance with an optional feature of the invention, the first circuit is further arranged to modify the head related binaural transfer function in response to a user input.
This may allow an improved user experience in many embodiments. In particular, it may allow an initial coarse category customization to be refined by an optional manual adaption.
In accordance with an optional feature of the invention, the first circuit is arranged to generate user setting options for modifying the head related binaural transfer function based on the head related binaural transfer function.
This may allow an improved user experience and/or increased user friendliness in many embodiments. In particular, it may allow an initial coarse category customization to be refined by an optional manual adaption which is based on the coarse customization. This may substantially facilitate and aid a manual calibration or setting of the binaural processing as the starting point for the optimization may already be close to the optimum values.
According to an aspect of the invention there is provided a method of generating a binaural signal, the method comprising: receiving a user characteristic indication indicative of a characteristic of a user; classifying the user into a first user category out of a plurality of user categories in response to the user characteristic indication, each user category being associated with a set of anthropometric characteristics of humans belonging to the user category; determining a head related binaural transfer function in response to the first user category, the head related binaural transfer function being dependent on the anthropometric characteristic associated with the first user category; and generating a binaural signal in response to the head related binaural transfer function.
These and other aspects, features and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will be described, by way of example only, with reference to the drawings, in which
Fig. 1 illustrates an example of a generation of a binaural signal;
Fig. 2 illustrates an example of elements of an apparatus for generating a binaural signal in accordance with some embodiments of the invention;
Fig. 3 illustrates an example of distributions of inter-ear distances for different categories of people;
Fig. 4 illustrates an example of distributions of pinna heights for different categories of people; and
Figs. 5-7 illustrate examples of elements for generating a head related binaural transfer function in accordance with some embodiments of the invention.
DETAILED DESCRIPTION OF SOME EMBODIMENTS OF THE INVENTION
The following description focuses on embodiments of the invention applicable to a system for generating a virtual surround sound experience using a pair of headphones. The system generates a binaural signal based on HRTF filtering. However, it will be appreciated that the invention is not limited to this application but may be applied to many other applications and using other head related binaural transfer functions, such as for example a BRIR or HRIR. Thus, references to HRTFs are as appropriate equally references to HRIRs or BRIRs (which may be considered a form of HRTFs).
Fig. 2 illustrates an example of elements of an apparatus for generating a binaural signal in accordance with some embodiments of the invention. The apparatus may for example be included in a mobile phone, media player, portable music player, a tablet, a computer, or the like.
In the specific example described in the following, the apparatus receives five spatial channels of a surround sound multi-channel signal. Specifically, it may receive signals corresponding to the centre, front left and right, and rear left and right speaker positions. The apparatus then proceeds to generate a binaural signal which provides a virtual surround sound experience by rendering the individual channels such that they to a user wearing a pair of headphones appear to be arriving from positions corresponding to the nominal speaker positions for a surround sound system. It will be appreciated that in other embodiments, other types of signals may be rendered. For example, for a gaming application a number of individual audio objects may be received and rendered such that they are perceived to be positioned at appropriate (virtual) positions.
The apparatus of Fig. 2 comprises a binaural renderer 201 which receives the five spatial channels and which generates a binaural signal wherein the five spatial channels are rendered from (virtual) positions corresponding to the nominal speaker positions for a five channel surround sound system. Thus, when a user listens to the binaural signal using headphones, a spatial experience corresponding to a surround sound listening experience is generated.
The binaural renderer 201 specifically applies an HRTF for respectively the left and right output channels of the output binaural signal to each of the input channels. The resulting signals are summed for respectively the left and right output channels of the output binaural signal. The binaural renderer 201 may specifically employ an approach corresponding to that of Fig. 1.
It will be appreciated that the skilled person will be aware of many different approaches for processing an input signal or signals to generate an output binaural signal based on head related binaural transfer functions (such as an HRTF, HRIR or BRIR) and that any of these may be applied without detracting from the invention.
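By way of illustration only, the filter-and-sum operation of the binaural renderer 201 may be sketched as follows. The sketch assumes time-domain processing with one pair of impulse responses (HRIRs) per input channel; the function names `convolve`, `add_into` and `render_binaural` and the dictionary layout are hypothetical and are not taken from the description.

```python
def convolve(signal, impulse_response):
    """Full-length direct-form FIR convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def add_into(acc, contribution):
    """Sample-wise sum of two sequences, zero-padding the shorter one."""
    n = max(len(acc), len(contribution))
    return [
        (acc[i] if i < len(acc) else 0.0)
        + (contribution[i] if i < len(contribution) else 0.0)
        for i in range(n)
    ]


def render_binaural(channels, hrirs):
    """Filter each input channel with its left/right HRIR and sum per ear.

    channels: dict mapping a channel name to a list of samples
    hrirs:    dict mapping the same names to (left_hrir, right_hrir)
    """
    left, right = [], []
    for name, samples in channels.items():
        h_left, h_right = hrirs[name]
        left = add_into(left, convolve(samples, h_left))
        right = add_into(right, convolve(samples, h_right))
    return left, right
```

In a practical implementation the convolution would typically be carried out with an optimized or frequency-domain routine; the nested loops are used here only to keep the sketch self-contained.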
The key to providing a convincing spatial experience from a binaural signal is that the head related binaural transfer function represents the characteristics for the individual user. Indeed, it has been found that using the same HRTF for different users tends to result in significantly degraded spatial experiences compared to that which can be achieved if the HRTF is optimized for the individual user. However, a problem in customizing HRTFs is that it requires specific information about anthropometric measurements of the user which are difficult or cumbersome to determine or obtain. Indeed, even if directly involving the user, inconvenient and cumbersome measurements are required.
Accordingly, a less inconvenient and cumbersome adaptation of the HRTF to the individual user would be advantageous.
In the apparatus of Fig. 2, a facilitated and more user friendly adaptation of the binaural processing is achieved by customizing the head related binaural transfer function based on a classification approach wherein the user is classified into a category and the head related binaural transfer function is then adapted to match the (average) anthropometric characteristics for that category of people.
The apparatus of Fig. 2 in particular exploits the Inventors' realization that a useful customization of a head related binaural transfer function can be achieved by an adaptation to an appropriate user class and without adaptation to the individual user's specific characteristics. It furthermore exploits the Inventors' realization that adaptation of a head related binaural transfer function is particularly significant if the adaption is based on the anthropometric properties of an inter-ear distance and/or a pinna dimension (and in particular a pinna-cavity dimension or a pinna height); as well as the realization that these parameters show a significant dependence on the ethnicity and gender of a person.
This is illustrated in Figs. 3 and 4 which illustrate respectively the head width (inter-ear distance) and ear height (pinna dimension) obtained by measurements of 150 subjects by the Applicant. The data supports the Inventors' realization that the gender and ethnicity of a user provide some predictive properties on the anthropometric features, such as pinna height and head width, and thus that the gender and ethnicity of a user provide some predictive properties for the optimized head related binaural transfer function for the person. Thus, by classifying a user into a suitable category, the anthropometric data for this category can provide predictive properties for the optimized head related binaural transfer function.
In the apparatus of Fig. 2 a classification of a user into one of a plurality of categories is performed. An HRTF is then determined which is particularly suitable for that category, e.g. by being optimized for the average anthropometric values of the group.
The apparatus comprises a receiver 203 which receives (or generates) a user characteristic indication indicative of a characteristic of a user. The user characteristic provided by the receiver is not an anthropometric property or measure of the user but may specifically be an association with a specific type or category of humans that the user belongs to. Specifically, the user characteristics may be an indication of an ethnicity and/or gender to which the user belongs.
In some embodiments, the receiver 203 may comprise a user interface which can interface to the user and which specifically can receive a user input. Thus, in such embodiments, the user characteristic may simply be generated based on a user input.
The receiver 203 is coupled to a classifier 205 which classifies the user into one (or possibly more) category(ies) out of a plurality of user categories based on the user characteristic indication. The classifier 205 thus selects the category, out of the set of categories, to which the user belongs. The classifier 205 may for example comprise a number of predetermined categories where each category exhibits different statistical distributions of one or more anthropometric properties. In particular, the average value of a physical property may be different for the different categories. In the specific example, the classifier 205 may include a category for each of the categories illustrated in Figs. 3 and 4, i.e. one category may be provided to correspond to female Asian, another to male Asian, another to female Caucasian, and yet another to male Caucasian. It will be appreciated that more categories may be included.
The classifier 205 thus selects which of the predetermined categories the user belongs to. In some examples, the user characteristic may directly reflect the parameters used to define the user categories, and in such cases the classification is directly given by the user characteristic and is thus straightforward to implement. For example, the user may input whether he/she is male or female, and which ethnic group he/she belongs to. The corresponding category is thus simply the one having the combination of features that the user has indicated.
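As a minimal sketch of such a direct classification, assuming the four categories of Figs. 3 and 4 and a user input consisting of two text selections, the classifier 205 may reduce to a table look-up. The category labels and the function name `classify` are hypothetical.

```python
# The four example categories of Figs. 3 and 4; the labels are illustrative.
USER_CATEGORIES = {
    ("female", "asian"): "female_asian",
    ("male", "asian"): "male_asian",
    ("female", "caucasian"): "female_caucasian",
    ("male", "caucasian"): "male_caucasian",
}


def classify(gender, ethnicity):
    """Return the category matching the combination indicated by the user."""
    key = (gender.strip().lower(), ethnicity.strip().lower())
    if key not in USER_CATEGORIES:
        raise ValueError(f"no user category for {key!r}")
    return USER_CATEGORIES[key]
```

An embodiment with more categories would simply extend the table.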
The classifier 205 is coupled to an HRTF processor 207 which is further coupled to the binaural renderer 201. The HRTF processor 207 is arranged to determine a head related binaural transfer function in response to the user category to which the user was found to belong. The head related binaural transfer function is determined to match the anthropometric values which are associated with the category to which the user belongs. Specifically, the head related binaural transfer function may be selected/generated such that it is optimized for the specific average anthropometric values for the identified user category.
For example, if the user belongs to the category of Caucasian females, an HRTF is generated which is based on an inter-ear distance of 13.4 cm and a pinna height of 6 cm. However, if the user belongs to the category of Asian males, an HRTF is generated which is based on an inter-ear distance of 15.8 cm and a pinna height of 6.5 cm.
The determined HRTF is then fed to the binaural renderer 201 where it is used to generate the binaural output signal.
Thus, the approach provides an adaptation of the binaural processing to the user. Although the adaptation is not specific to the individual user, and therefore the individual user may have different characteristics than assumed, the difference is likely to be significantly less than if only a nominal or reference HRTF was used. Furthermore, as the characteristics substantially have a Gaussian distribution, the probability that the user has characteristics close to the average values is relatively high.
The apparatus of Fig. 2 may thus provide an improved binaural rendering by using an HRTF that has been adapted to the user. Although the user adaptation may be approximate, it has been found that substantial improvements can be achieved. Furthermore, rather than requesting or determining user metrics, such as inter-ear distance or pinna dimensions, the adaptation can be achieved based on more easily available or obtainable information such as in particular the ethnicity or gender of the user. This information can be provided e.g. simply by asking the user to make very simple and easy selections rather than requiring the user to make cumbersome or difficult estimates of anthropometric properties.
The generation and adaptation of the HRTF may be performed differently in different embodiments.
Fig. 5 illustrates an example of elements of the HRTF processor 207. In the example, the HRTF processor 207 comprises a storage 501 which stores one or more anthropometric properties for each user category of the plurality of user categories.
Specifically, the storage 501 may be implemented as a look-up table or memory which stores one or more anthropometric properties for each category. In the specific example, the storage 501 stores an inter-ear distance and a pinna height for each possible category.
The HRTF processor 207 further comprises a retrieve processor 503 which is arranged to retrieve the values corresponding to the user category the user has been classified into by the classifier 205. The retrieve processor 503 may receive an indication of the selected category and in return it may generate one or more addresses for the storage 501. The values stored at these addresses in the storage 501 correspond to the pinna height and inter-ear distance for the selected category. Thus, the retrieve processor 503 may initiate a table look-up in a look-up-table stored in the storage 501.
The value(s) output by the storage 501 are fed to a generator 505 which is arranged to generate the head related binaural transfer function based on the value(s).
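The look-up performed by the retrieve processor 503 may, purely as an illustrative sketch, be expressed as follows. Only the two categories for which values are given in the example above are populated; the table layout, the category labels and the function name `retrieve` are assumptions made for the sketch.

```python
# Storage 501 as a look-up table. Only the two categories whose average
# values are quoted in the description are filled in; entries for the other
# categories would be added in the same way.
ANTHROPOMETRIC_TABLE = {
    "female_caucasian": {"inter_ear_cm": 13.4, "pinna_height_cm": 6.0},
    "male_asian": {"inter_ear_cm": 15.8, "pinna_height_cm": 6.5},
}


def retrieve(category):
    """Return (pinna_height_cm, inter_ear_cm) for the selected category."""
    try:
        entry = ANTHROPOMETRIC_TABLE[category]
    except KeyError:
        raise KeyError(f"no stored values for category {category!r}") from None
    return entry["pinna_height_cm"], entry["inter_ear_cm"]
```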
For example, the generator 505 may comprise a reference HRTF which expresses a filter transfer function (e.g. a frequency response or an impulse response) as a function of the position. The reference HRTF may further comprise one or more adaptable parameters which are determined by the generator 505 as a function of the retrieved values. In particular, one or more of the parameters may be determined as a function of the anthropometric measures, and the parameters for the HRTF are thus determined from the values retrieved from the storage 501.
In some embodiments, the anthropometric values of the storage 501 may be expressed by the value after the application of the function to determine the HRTF parameters, i.e. the storage may directly store the parameters for the HRTF function.
In the specific example of Fig. 5, the generator 505 may generate the head related binaural transfer function by performing a frequency scaling of a reference head related binaural transfer function where the frequency scaling depends on the retrieved value(s).
Such an approach is particularly suitable for embodiments wherein the anthropometric characteristics considered are the pinna height and inter-ear distance. Indeed, as described in Middlebrooks, "Individual Differences In External-Ear Transfer Functions Reduced By Scaling In Frequency", Journal of Acoustical Society of America September 1999, Vol. 106(3), Pt. 1, pg. 1480-1492; frequency scaling can be used to (partially) personalize HRTFs. By frequency scaling, certain properties in an HRTF, such as peaks and notches, are moved to other frequencies to better match those of the individual.
The scaling parameter is related to the pinna height and head width. In the above referenced paper, the relation is given as:
$$\text{scale factor} = 2^{\,0.340\cdot\log_2\left(\mathrm{pinna}_A/\mathrm{pinna}_B\right)\;+\;0.527\cdot\log_2\left(\mathrm{head}_A/\mathrm{head}_B\right)}$$
where pinna refers to the pinna height, head refers to the inter-ear distance, and the indexes B and A refer to the values for the reference HRTF and the retrieved values respectively.
Thus, in the example of Fig. 5, the generator 505 uses this relation to determine the frequency scaling factor by converting the reference HRTF determined for a person with a certain pinna height and inter-ear distance, to an adapted HRTF for the user by using the average pinna height and inter-ear distance for the category to which the user has been classified.
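The determination of the frequency scaling factor may be sketched as below. The sketch assumes that the relation is read as a power of two whose exponent is the weighted sum of the two base-2 logarithms, with the weights 0.340 and 0.527 quoted above; the function name `frequency_scale_factor` and the reference dimensions used in the usage note are hypothetical.

```python
import math


def frequency_scale_factor(pinna_a, head_a, pinna_b, head_b):
    """Scale factor between a reference HRTF (index B) and the category (index A).

    pinna_*: pinna height, head_*: inter-ear distance. The A and B values
    must use the same unit, since only their ratios enter the relation.
    """
    exponent = (0.340 * math.log2(pinna_a / pinna_b)
                + 0.527 * math.log2(head_a / head_b))
    return 2.0 ** exponent
```

For example, with a purely hypothetical reference HRTF measured for a pinna height of 6.2 cm and an inter-ear distance of 14.5 cm, `frequency_scale_factor(6.5, 15.8, 6.2, 14.5)` would give the factor for the Asian male category of the example above. Whether the resulting factor is applied as a multiplication or a division of the frequency axis depends on the convention chosen for the reference and is not fixed by this sketch.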
Another example of the HRTF processor 207 is illustrated in Fig. 6. In this example, the HRTF processor 207 comprises a storage 601 and a retriever 603 which operate similarly to the storage 501 and retrieve processor 503 of the example of Fig. 5. However, in this example, the storage 601 directly stores a head related binaural transfer function for each category. The stored head related binaural transfer function is generated to reflect the specific anthropometric characteristics (e.g. pinna height and inter-ear distance) for the specific category. The stored HRTF can be an HRTF measured on a person with anthropometric properties that are characteristic of the corresponding category. Alternatively, the HRTFs may have been derived from one or more reference HRTFs through offline processing.
The HRTF processor 207 further comprises a generator 605. However, in contrast to the example of Fig. 5, the generator 605 need not derive the head related binaural transfer function as this is already provided by the storage 601. However, the generator 605 may provide the appropriate filter coefficients for a specific desired position by determining these coefficients from the retrieved transfer function. This may be done for each desired position, i.e. for each of the input channels (it may equivalently be considered that the generator 605 is fully or partially part of the binaural renderer 201).
In some embodiments, the storage 601 may for example store the parameter values for a function representing an HRTF as a function of position. In other embodiments, the storage 601 may simply comprise an identification of an HRTF to be used.
In some embodiments, the storage 601 may store the HRTFs as individual filter coefficient values for a plurality of positions. For example, for each HRTF, a set of filter coefficients for a pair of HRTF filters may be stored for each of a plurality of positions (say, one filter coefficient set for each 5°). In such an embodiment, both the user adaptation and the position adaptation may be performed by a single table look-up, i.e. the storage 601 may directly provide e.g. the coefficient values for FIR HRTF filters that are to be applied to the input signal to generate the contribution of that signal to the binaural signal.
Such an example is illustrated in Fig. 7 which illustrates the storage 701 comprising filter coefficient values for different user categories and different positions. The retriever 703 may provide a selection between HRTF values for different categories, and a position input may provide a selection between different positions. As an example, a look-up address may be formed by the retriever 703 generating the Most Significant Bits of the address with the Least Significant Bits being generated as a digital value representing the desired position.
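The address formation may be sketched as follows, assuming four user categories and positions stored as azimuth angles in 5° steps, so that 72 positions fit into 7 Least Significant Bits. The restriction to azimuth, the category numbering and all names in the sketch are assumptions.

```python
# Hypothetical numbering of the four example categories (Most Significant Bits).
CATEGORY_INDEX = {
    "female_asian": 0,
    "male_asian": 1,
    "female_caucasian": 2,
    "male_caucasian": 3,
}

POSITION_STEP_DEG = 5
NUM_POSITIONS = 360 // POSITION_STEP_DEG           # 72 stored positions
POSITION_BITS = (NUM_POSITIONS - 1).bit_length()   # 7 bits hold indices 0..71


def position_index(azimuth_deg):
    """Index of the stored position nearest to the desired azimuth."""
    return int((azimuth_deg % 360) / POSITION_STEP_DEG + 0.5) % NUM_POSITIONS


def lookup_address(category, azimuth_deg):
    """Category index in the upper bits, position index in the lower bits."""
    return (CATEGORY_INDEX[category] << POSITION_BITS) | position_index(azimuth_deg)
```

Each address would then select one stored set of filter coefficients, so that the user adaptation and the position adaptation are resolved by a single table access.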
Such an approach may reduce the processing load of the apparatus at the expense of memory requirements.
In the example above, the classification of the user was performed in response to a specific user input and specifically by asking the user very simple and easy to answer questions.
However, in other embodiments, other approaches may be used. Specifically, the classification may be performed in response to a device characteristic for a user device which includes the apparatus.
For example, if the apparatus is included in a mobile phone or a media player, an indication of a (likely) characteristic of the user can be determined from a property of the mobile phone. Indeed, many user device characteristics are statistically correlated with specific user characteristics and such a correlation may be exploited by the apparatus of Fig. 2 to perform a classification of the user. For example, some mobile phone characteristics tend to be gender correlated. E.g., the color of a mobile phone may be statistically correlated with the gender of the user, and this may be used by the apparatus to determine an appropriate category for the user. For example, if a mobile phone comprising the apparatus is pink or purple, it may be assumed that the user is female and this may be used in the classification of the user into a specific category.
In some embodiments, the device characteristics may include a setting or operational parameter which is dependent on a user behavior, and through this on a user characteristic. The classification may in such examples be based on the user setting that has been selected by the user. For example, when a user customizes a telephone by selecting a theme for the user interface (e.g. a user interface skin) the different options may typically exhibit correlations with the gender of the user. Indeed, many themes may be relatively feminine (e.g. by motif or color scheme) and will typically be selected by women. Other themes may be more masculine and are typically selected by men. In some embodiments, the apparatus may use such a theme selection as an indication of a user characteristic, namely as an indication of whether the user is male or female.
As another example, the language setting of a device such as a mobile phone may be used as an indication of a user characteristic and may accordingly be used by the apparatus to categorize the user. For example, mobile phones typically provide the option of selecting between a number of different languages. The specific selection may in some cases indicate an ethnic origin of the user. For example, if the operating language is selected to be Chinese, it is likely that the user is Asian. However, if the language is selected as Dutch it is more likely that the user is Caucasian. Thus, the categorization of the user may be based on the language setting that has been selected by the user.
In some embodiments and scenarios, the selection between the four different categories of the examples of Figs. 3 and 4 may simply be based on a language and theme selection made by the user.
It will be appreciated that the categorization of the user, and indeed the adaptation of the head related binaural transfer function, may be considered an estimated categorization and/or adaptation. Thus, the approach may in general provide improved performance but may in some scenarios or for some users result in an incorrect or suboptimal categorization or adaptation. For example, a man may select a feminine theme or an Asian person may select English as the operational language. Accordingly, the apparatus may in many embodiments include a user input which allows a user to manually cancel, override or modify the adaptation of the binaural processing.
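A minimal sketch of such an estimated categorization with a manual override is given below. The language codes, the theme labels and the function name `classify_from_device` are hypothetical, and the mappings merely mirror the examples given above; they represent statistical guesses rather than reliable determinations.

```python
# Illustrative mappings taken from the examples in the description.
LANGUAGE_TO_ETHNICITY = {"zh": "asian", "nl": "caucasian"}
THEME_TO_GENDER = {"feminine": "female", "masculine": "male"}


def classify_from_device(language, theme, override=None):
    """Estimate a user category from device settings.

    A category explicitly chosen by the user (override) always takes
    precedence. None is returned when the settings give no basis for an
    estimate, in which case a generic HRTF may be used instead.
    """
    if override is not None:
        return override
    ethnicity = LANGUAGE_TO_ETHNICITY.get(language)
    gender = THEME_TO_GENDER.get(theme)
    if ethnicity is None or gender is None:
        return None
    return f"{gender}_{ethnicity}"
```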
In some embodiments, the head related binaural transfer function can be modified in response to a user input thereby allowing the user to manually change one or more parameters of the applied HRTF. This may in particular be used to refine the HRTF to provide a more precise customization to the individual user. Specifically, the apparatus may include a user input that allows the user to manually refine the binaural processing such that it is specifically customized to the individual user rather than to the category to which the user belongs.
In such embodiments, the classification based adaptation may thus be considered to be a first automatically generated estimate for the user customization of the HRTF. This estimated HRTF may then be used as a starting point for an optional manual calibration of the binaural processing. As the estimated HRTF is likely to be closer to the optimum HRTF for the individual person than a generic HRTF, the manual calibration is likely to be faster, more accurate and more user friendly. It particularly avoids that the user is confronted with HRTFs that are furthest from the optimum HRTF.
In such approaches the apparatus may specifically generate some user setting options that can be used by the user to adapt the HRTF. For example, the apparatus may generate a plurality of possible HRTF settings by offsetting the various parameters of the estimated HRTF. The user may then listen to each of the possible HRTFs and select the one he considers to provide the best spatial experience.
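One possible way of generating such options, assuming the estimated HRTF is described by a small set of numeric parameters, is sketched below. The parameter names used in the usage note, the relative step size and the function name `candidate_settings` are hypothetical.

```python
def candidate_settings(estimated, relative_step=0.05, steps=1):
    """Return the estimated parameter set plus variants offset around it.

    Each variant changes exactly one parameter by a multiple of
    relative_step (as a fraction of its estimated value), so all options
    stay close to the category based estimate.
    """
    candidates = [dict(estimated)]
    for name, value in estimated.items():
        for k in range(1, steps + 1):
            for sign in (-1, 1):
                variant = dict(estimated)
                variant[name] = value * (1 + sign * k * relative_step)
                candidates.append(variant)
    return candidates
```

For instance, `candidate_settings({"scale": 1.0, "itd_ms": 0.6})` yields five options: the estimate itself and one lower and one higher variant for each of the two parameters. The user may then audition each option and keep the preferred one.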
As another example, user setting options may be provided in the form of one or more manually adjustable controls (such as e.g. sliders on a display) where each control may adjust a property or one or more parameters of the estimated HRTF. Thus, by adjusting the controls, the user may introduce an offset to the HRTF. The user may then adjust the controls to provide the optimum spatial effect and the corresponding HRTF may be stored for future use.
By basing such calibration algorithms on the HRTF determined from user classification rather than on a predetermined average HRTF, it is not only achieved that the starting point is likely to be closer to the optimum, but it may also allow the offsets introduced by the user settings to be relative to the estimated HRTF (rather than to a generic average HRTF). This may for example allow much smaller offsets to be considered, and thereby allow an easier and more accurate calibration to be performed by the user.
The previous description has focused on generation of a binaural signal that can be rendered directly via headphones. However, it will be appreciated that many other forms of rendering may be used. For example, the generated signal may be rendered using two or more loudspeakers. In such an example, crosstalk-cancellation techniques may be applied. For example, in addition to a first speaker rendering the signal for the left ear and a second speaker rendering the signal for the right ear, the first speaker can also render a signal component which is arranged to at least partly cancel a signal component reaching the left ear from the second speaker. Similarly, the second speaker can render a signal component which is arranged to at least partly cancel a signal component reaching the right ear from the first speaker. It will be appreciated that any suitable algorithm for estimating such cancellation signal components may be used without detracting from the invention. Furthermore, the crosstalk cancellation may be considered to be a post-processing applied to the binaural signal or may be incorporated as part of the head related binaural transfer function.
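As one example of a suitable algorithm, and not as a description of any particular embodiment, the cancellation components may be obtained at each frequency by inverting the 2x2 matrix of acoustic paths from the two speakers to the two ears. The sketch below assumes a symmetric set-up described by one ipsilateral and one contralateral complex transfer value per frequency; all names are hypothetical.

```python
def crosstalk_canceller(h_ipsi, h_contra):
    """Inverse of the symmetric acoustic matrix [[h_ipsi, h_contra],
    [h_contra, h_ipsi]] at a single frequency."""
    det = h_ipsi * h_ipsi - h_contra * h_contra
    if abs(det) < 1e-12:
        raise ValueError("acoustic matrix is (near) singular at this frequency")
    return ((h_ipsi / det, -h_contra / det),
            (-h_contra / det, h_ipsi / det))


def speaker_feeds(binaural_left, binaural_right, h_ipsi, h_contra):
    """Speaker signals that reproduce the binaural signal at the ears."""
    (c11, c12), (c21, c22) = crosstalk_canceller(h_ipsi, h_contra)
    return (c11 * binaural_left + c12 * binaural_right,
            c21 * binaural_left + c22 * binaural_right)


def at_ears(feed_left, feed_right, h_ipsi, h_contra):
    """Signals arriving at the left and right ear for given speaker feeds."""
    return (h_ipsi * feed_left + h_contra * feed_right,
            h_contra * feed_left + h_ipsi * feed_right)
```

Feeding the output of `speaker_feeds` through `at_ears` returns the original binaural pair, which shows that each speaker's contribution to the opposite ear has been cancelled under the stated assumptions. Practical systems usually regularize the inversion, since the matrix can be close to singular at some frequencies.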
It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional circuits, units and processors. However, it will be apparent that any suitable distribution of functionality between different functional circuits, units or processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controllers. Hence, references to specific functional units or circuits are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization.
The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units, circuits and processors.
Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term comprising does not exclude the presence of other elements or steps.
Furthermore, although individually listed, a plurality of means, elements, circuits or method steps may be implemented by e.g. a single circuit, unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also the inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate.
Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order. In addition, singular references do not exclude a plurality. Thus references to "a", "an", "first", "second" etc. do not preclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.

CLAIMS:
1. An apparatus for generating a binaural signal, the apparatus comprising:
a receiver (203) for receiving a user characteristic indication indicative of a characteristic of a user;
a classifier (205) for classifying the user into a first user category out of a plurality of user categories in response to the user characteristic indication, each user category being associated with a set of anthropometric characteristics of humans belonging to the user category;
a first circuit (207) for determining a head related binaural transfer function in response to the first user category, the head related binaural transfer function being dependent on the anthropometric characteristic associated with the first user category; and
a binaural circuit (201) for generating a binaural signal in response to the head related transform.
2. The apparatus of claim 1 wherein the first circuit comprises:
a storage (501) for storing at least one anthropometric property for each user category of the plurality of user categories;
a retriever (503) for retrieving a first anthropometric property corresponding to the first user category from the storage (501); and
a generator (505) for generating the head related binaural transfer function in response to the first anthropometric property.
3. The apparatus of claim 2 wherein the first anthropometric property is a pinna dimension.
4. The apparatus of claim 2 wherein the first anthropometric property is an inter-ear distance.
5. The apparatus of claim 2 wherein the generator (505) is arranged to generate the head related binaural transfer function in response to a frequency scaling of a reference head related binaural transfer function in response to the first anthropometric property.
6. The apparatus of claim 1 wherein the first circuit (207) comprises:
a storage (601) for storing a head related binaural transfer function for each user category of the plurality of user categories, the head related binaural transfer function reflecting the anthropometric characteristics associated with the user category;
a generator (603) for generating the head related binaural transfer function by retrieving a stored head related binaural transfer function for the first user category.
7. The apparatus of claim 1 wherein the receiver (203) comprises a user interface for receiving the user characteristic indication from a user input.
8. The apparatus of claim 1 wherein the user characteristic indication comprises an ethnicity indication for the user.
9. The apparatus of claim 1 wherein the user characteristic indication comprises a gender indication for the user.
10. The apparatus of claim 1 wherein the user characteristic indication comprises a device characteristic for a user device comprising the apparatus.
11. The apparatus of claim 10 wherein the device characteristic comprises a user setting.
12. The apparatus of claim 11 wherein the device characteristic comprises a language setting.
13. The apparatus of claim 1 wherein the first circuit (207) is further arranged to modify the head related binaural transfer function in response to a user input.
14. The apparatus of claim 13 wherein the first circuit (207) is arranged to generate user setting options for modifying the head related binaural transfer function based on the head related binaural transfer function.
15. A method of generating a binaural signal, the method comprising:
receiving a user characteristic indication indicative of a characteristic of a user;
classifying the user into a first user category out of a plurality of user categories in response to the user characteristic indication, each user category being associated with a set of anthropometric characteristics of humans belonging to the user category;
determining a head related binaural transfer function in response to the first user category, the head related binaural transfer function being dependent on the anthropometric characteristic associated with the first user category; and
generating a binaural signal in response to the head related binaural transfer function.
16. A computer program product comprising computer program code means to perform all the steps of claim 15 when said program is run on a computer.
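
The following minimal Python sketch illustrates one way the steps recited in claims 1, 2, 5 and 12 could fit together: a user characteristic indication is classified into a user category, an anthropometric property stored for that category is retrieved, and a reference head related transfer function magnitude response is scaled in frequency according to that property. It is an illustration under stated assumptions and not the applicant's implementation. The category names, the pinna dimensions, the classification rule based on a language setting, and all function names are hypothetical, and the final step of filtering audio with the resulting transfer function is omitted.

```python
from bisect import bisect_right

# Hypothetical storage: user category -> pinna dimension in mm.
# The numbers are invented for illustration and carry no anthropometric meaning.
CATEGORY_PINNA_MM = {"category_a": 60.0, "category_b": 66.0}
REFERENCE_PINNA_MM = 63.0  # assumed pinna dimension of the reference HRTF


def classify(user_indication):
    """Toy classifier: pick a category from a device language setting."""
    if user_indication.get("language") == "nl":
        return "category_b"
    return "category_a"


def interpolate(freqs, mags, f):
    """Linearly interpolate the magnitude at frequency f, clamped at the ends."""
    if f <= freqs[0]:
        return mags[0]
    if f >= freqs[-1]:
        return mags[-1]
    i = bisect_right(freqs, f)
    f0, f1 = freqs[i - 1], freqs[i]
    t = (f - f0) / (f1 - f0)
    return mags[i - 1] + t * (mags[i] - mags[i - 1])


def scale_hrtf(freqs, ref_mags, pinna_mm, ref_pinna_mm=REFERENCE_PINNA_MM):
    """Frequency-scale a reference response: H_user(f) = H_ref(f * s).

    With s = pinna_mm / ref_pinna_mm, a larger pinna (s > 1) moves spectral
    features such as notches towards lower frequencies.
    """
    s = pinna_mm / ref_pinna_mm
    return [interpolate(freqs, ref_mags, f * s) for f in freqs]


def hrtf_for_user(user_indication, freqs, ref_mags):
    """Classify the user, retrieve the stored property, scale the reference."""
    category = classify(user_indication)
    pinna_mm = CATEGORY_PINNA_MM[category]
    return scale_hrtf(freqs, ref_mags, pinna_mm)


if __name__ == "__main__":
    freqs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
    ref_mags = [1.0, 1.0, 0.0, 1.0, 1.0]  # reference response with a notch at 2 kHz
    print(hrtf_for_user({"language": "nl"}, freqs, ref_mags))
```

The alternative of claim 6 would replace `scale_hrtf` with a direct lookup of a stored transfer function per category, trading storage for the cost of generating the function on demand.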
PCT/IB2013/050441 2012-01-24 2013-01-17 Generation of a binaural signal WO2013111038A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261589915P 2012-01-24 2012-01-24
US61/589,915 2012-01-24

Publications (1)

Publication Number Publication Date
WO2013111038A1 true WO2013111038A1 (en) 2013-08-01

Family

ID=47915299

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2013/050441 WO2013111038A1 (en) 2012-01-24 2013-01-17 Generation of a binaural signal

Country Status (1)

Country Link
WO (1) WO2013111038A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9609436B2 (en) 2015-05-22 2017-03-28 Microsoft Technology Licensing, Llc Systems and methods for audio creation and delivery
US9900722B2 (en) 2014-04-29 2018-02-20 Microsoft Technology Licensing, Llc HRTF personalization based on anthropometric features
US10028070B1 (en) 2017-03-06 2018-07-17 Microsoft Technology Licensing, Llc Systems and methods for HRTF personalization
US10278002B2 (en) 2017-03-20 2019-04-30 Microsoft Technology Licensing, Llc Systems and methods for non-parametric processing of head geometry for HRTF personalization
US10382880B2 (en) 2014-01-03 2019-08-13 Dolby Laboratories Licensing Corporation Methods and systems for designing and applying numerically optimized binaural room impulse responses
US10425763B2 (en) 2014-01-03 2019-09-24 Dolby Laboratories Licensing Corporation Generating binaural audio in response to multi-channel audio using at least one feedback delay network
WO2021043248A1 (en) * 2019-09-05 2021-03-11 Harman International Industries, Incorporated Method and system for head-related transfer function adaptation
US11205443B2 (en) 2018-07-27 2021-12-21 Microsoft Technology Licensing, Llc Systems, methods, and computer-readable media for improved audio feature discovery using a neural network
US11212638B2 (en) 2014-01-03 2021-12-28 Dolby Laboratories Licensing Corporation Generating binaural audio in response to multi-channel audio using at least one feedback delay network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6996244B1 (en) * 1998-08-06 2006-02-07 Vulcan Patents Llc Estimation of head-related transfer functions for spatial sound representative
WO2007096808A1 (en) * 2006-02-21 2007-08-30 Koninklijke Philips Electronics N.V. Audio encoding and decoding
US20070270988A1 (en) * 2006-05-20 2007-11-22 Personics Holdings Inc. Method of Modifying Audio Content

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ALGAZI V RALPH ET AL: "Physical and Filter Pinna Models Based on Anthropometry", AES CONVENTION 122; MAY 2007, AES, 60 EAST 42ND STREET, ROOM 2520 NEW YORK 10165-2520, USA, 1 May 2007 (2007-05-01), XP040508170 *
MIDDLEBROOKS: "Individual Differences In External-Ear Transfer Functions Reduced By Scaling In Frequency", JOURNAL OF ACOUSTICAL SOCIETY OF AMERICA, vol. 106, no. 3, September 1999 (1999-09-01), pages 1480 - 1492

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10834519B2 (en) 2014-01-03 2020-11-10 Dolby Laboratories Licensing Corporation Methods and systems for designing and applying numerically optimized binaural room impulse responses
US10771914B2 (en) 2014-01-03 2020-09-08 Dolby Laboratories Licensing Corporation Generating binaural audio in response to multi-channel audio using at least one feedback delay network
US10425763B2 (en) 2014-01-03 2019-09-24 Dolby Laboratories Licensing Corporation Generating binaural audio in response to multi-channel audio using at least one feedback delay network
US10547963B2 (en) 2014-01-03 2020-01-28 Dolby Laboratories Licensing Corporation Methods and systems for designing and applying numerically optimized binaural room impulse responses
US11576004B2 (en) 2014-01-03 2023-02-07 Dolby Laboratories Licensing Corporation Methods and systems for designing and applying numerically optimized binaural room impulse responses
US11272311B2 (en) 2014-01-03 2022-03-08 Dolby Laboratories Licensing Corporation Methods and systems for designing and applying numerically optimized binaural room impulse responses
US11212638B2 (en) 2014-01-03 2021-12-28 Dolby Laboratories Licensing Corporation Generating binaural audio in response to multi-channel audio using at least one feedback delay network
US10382880B2 (en) 2014-01-03 2019-08-13 Dolby Laboratories Licensing Corporation Methods and systems for designing and applying numerically optimized binaural room impulse responses
US10555109B2 (en) 2014-01-03 2020-02-04 Dolby Laboratories Licensing Corporation Generating binaural audio in response to multi-channel audio using at least one feedback delay network
US11582574B2 (en) 2014-01-03 2023-02-14 Dolby Laboratories Licensing Corporation Generating binaural audio in response to multi-channel audio using at least one feedback delay network
US10284992B2 (en) 2014-04-29 2019-05-07 Microsoft Technology Licensing, Llc HRTF personalization based on anthropometric features
US10313818B2 (en) 2014-04-29 2019-06-04 Microsoft Technology Licensing, Llc HRTF personalization based on anthropometric features
US9900722B2 (en) 2014-04-29 2018-02-20 Microsoft Technology Licensing, Llc HRTF personalization based on anthropometric features
US9609436B2 (en) 2015-05-22 2017-03-28 Microsoft Technology Licensing, Llc Systems and methods for audio creation and delivery
US10129684B2 (en) 2015-05-22 2018-11-13 Microsoft Technology Licensing, Llc Systems and methods for audio creation and delivery
US10028070B1 (en) 2017-03-06 2018-07-17 Microsoft Technology Licensing, Llc Systems and methods for HRTF personalization
US10278002B2 (en) 2017-03-20 2019-04-30 Microsoft Technology Licensing, Llc Systems and methods for non-parametric processing of head geometry for HRTF personalization
US11205443B2 (en) 2018-07-27 2021-12-21 Microsoft Technology Licensing, Llc Systems, methods, and computer-readable media for improved audio feature discovery using a neural network
WO2021043248A1 (en) * 2019-09-05 2021-03-11 Harman International Industries, Incorporated Method and system for head-related transfer function adaptation

Similar Documents

Publication Publication Date Title
WO2013111038A1 (en) Generation of a binaural signal
US9426589B2 (en) Determination of individual HRTFs
JP5857071B2 (en) Audio system and operation method thereof
US9860666B2 (en) Binaural audio reproduction
EP1843635B1 (en) Method for automatically equalizing a sound system
CN102972047B (en) Method and apparatus for reproducing stereophonic sound
US20100329490A1 (en) Audio device and method of operation therefor
JP6995777B2 (en) Active monitoring headphones and their binaural method
US10757522B2 (en) Active monitoring headphone and a method for calibrating the same
US11611828B2 (en) Systems and methods for improving audio virtualization
KR20080060640A (en) Method and apparatus for reproducing a virtual sound of two channels based on individual auditory characteristic
CN101521843A (en) Head-related transfer function convolution method and head-related transfer function convolution device
JP6821699B2 (en) How to regularize active monitoring headphones and their inversion
US20200301653A1 (en) System and method for processing audio between multiple audio spaces
EP2484127A1 (en) An apparatus
EP2822301B1 (en) Determination of individual HRTFs
EP1843636B1 (en) Method for automatically equalizing a sound system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 13711470; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 13711470; Country of ref document: EP; Kind code of ref document: A1)