WO2020035180A1 - Method of operating an ear level audio system and an ear level audio system

Info

Publication number
WO2020035180A1
WO2020035180A1 PCT/EP2019/061993 EP2019061993W WO2020035180A1 WO 2020035180 A1 WO2020035180 A1 WO 2020035180A1 EP 2019061993 W EP2019061993 W EP 2019061993W WO 2020035180 A1 WO2020035180 A1 WO 2020035180A1
Authority
WO
WIPO (PCT)
Prior art keywords
mean
audio system
level audio
ear level
frequency dependent
Prior art date
Application number
PCT/EP2019/061993
Other languages
English (en)
Inventor
Pejman Mowlaee
Lars Dalskov Mosgaard
Thomas Bo Elmedyb
Michael Johannes Pihl
Georg Stiefenhofer
David PELEGRIN-GARCIA
Adam Westermann
Original Assignee
Widex A/S
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from DKPA201800462A external-priority patent/DK201800462A1/en
Application filed by Widex A/S filed Critical Widex A/S
Priority to EP19723398.4A priority Critical patent/EP3837862A1/fr
Priority to US17/268,148 priority patent/US11470429B2/en
Publication of WO2020035180A1 publication Critical patent/WO2020035180A1/fr

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00 Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/40 Arrangements for obtaining a desired directivity characteristic
    • H04R25/407 Circuits for combining signals of a plurality of transducers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2225/00 Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
    • H04R2225/41 Detection or adaptation of hearing aid parameters or programs to listening situation, e.g. pub, forest
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2225/00 Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
    • H04R2225/43 Signal processing in hearing aids to enhance the speech intelligibility
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2460/00 Details of hearing devices, i.e. of ear- or headphones covered by H04R1/10 or H04R5/033 but not provided for in any of their subgroups, or of hearing aids covered by H04R25/00 but not provided for in any of its subgroups
    • H04R2460/01 Hearing devices using active noise cancellation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00 Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/55 Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception using an external connection, either wireless or wired
    • H04R25/552 Binaural
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00 Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/55 Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception using an external connection, either wireless or wired
    • H04R25/554 Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception using an external connection, either wireless or wired using a wireless connection, e.g. between microphone and amplifier or using Tcoils
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00 Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/70 Adaptation of deaf aid to hearing loss, e.g. initial electronic fitting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S1/00 Two-channel systems
    • H04S1/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H04S1/005 For headphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Definitions

  • the present invention relates to a method of operating an ear level audio system.
  • the present invention also relates to an ear-level audio system adapted to carry out said method.
  • An ear level audio system may comprise one or two ear level audio devices.
  • an ear level audio device should be understood as a small, battery-powered, microelectronic device designed to be worn in or at an ear of a user.
  • the ear level audio device generally comprises an energy source such as a battery or a fuel cell, at least one microphone, an internal sound generator, a microelectronic circuit comprising a digital signal processor, and an acoustic output transducer.
  • the ear level audio device is enclosed in a casing suitable for fitting in or at (such as behind) a human ear.
  • devices such as e.g. hearables, headsets, headphones and ear pods may be considered ear level audio devices.
  • the ear level audio device furthermore is capable of amplifying an ambient sound signal in order to alleviate a hearing deficit
  • the ear level audio device may be considered a personal sound amplification product or a hearing aid.
  • an ear level audio device may resemble those of hearing aids and as such traditional hearing aid terminology may be used to describe various mechanical implementations of ear level audio devices that are not hearing aids.
  • BTE Behind-The-Ear
  • an electronics unit comprising a housing containing the major electronics parts thereof is worn behind the ear.
  • An earpiece for emitting sound to the hearing aid user is worn in the ear, e.g. in the concha or the ear canal.
  • a sound tube is used to convey sound from the output transducer, which in hearing aid terminology is normally referred to as the receiver, located in the housing of the electronics unit, to the ear canal.
  • a conducting member comprising electrical conductors conveys an electric signal from the housing to a receiver placed in the earpiece in the ear.
  • Such hearing aids are commonly referred to as Receiver-In-The-Ear (RITE) hearing aids.
  • RITE Receiver-In-The-Ear
  • RIC Receiver-In-Canal
  • ITE In-The-Ear
  • in a specific type of ITE hearing aid, the hearing aid is placed substantially inside the ear canal.
  • CIC Completely-In-Canal
  • IIC Invisible-In-Canal
  • own voice detection may also be advantageous in connection with voice command systems where the reliability of the own voice detection is the primary concern. It is therefore a feature of the present invention to provide an improved method of own voice detection in an ear level audio system.
  • the invention in a first aspect, provides a method of operating an ear level audio system according to claim 1.
  • This provides an improved method of providing own voice detection in an ear level audio system.
  • the invention in a second aspect, provides an ear level audio system according to claim 9.
  • This provides an ear level audio system with improved means for providing own voice detection.
  • Fig. 1 illustrates highly schematically an ear level audio device according to an embodiment of the invention.
  • Fig. 2 illustrates highly schematically a map of values of the unbiased mean phase as a function of frequency in order to provide a phase versus frequency plot.
  • Fig. 1 illustrates highly schematically part of an ear level audio system 100 according to an embodiment of the invention.
  • the ear level audio system 100 takes as input the digital output signals derived, at least, from the two acoustical-electrical input transducers 101a-b.
  • the acoustical-electrical input transducers 101a-b, which in the following may also be denoted microphones, provide analog output signals that are converted into digital output signals by analog-digital converters (ADC) and subsequently provided to a filter bank 102 adapted to transform the signals into the time-frequency domain.
  • ADC analog-digital converters
  • One specific advantage of transforming the input signals into the time-frequency domain is that both the amplitude and phase of the signals become directly available in the provided individual time-frequency bins.
  • a Fast Fourier Transform may be used for the transformation, and in variations other time-frequency domain transformations can be used, such as a Discrete Fourier Transform (DFT), a Short-Time Fourier Transform (STFT), polyphase filterbanks, Discrete Cosine Transformations and weighted overlap-add (WOLA) transformations.
  • DFT Discrete Fourier Transform
  • STFT Short-Time Fourier Transform
  • WOLA weighted overlap-add
  • the ADCs are not illustrated in Fig. 1.
  • the output signals from the filter bank 102 will primarily be denoted input signals because these signals represent the primary input signals to the digital signal processor 103 of the ear level audio system as well as to the own voice detector 104.
  • the term digital input signal may be used interchangeably with the term input signal.
  • all other signals referred to in the present disclosure may or may not be specifically denoted as digital signals.
  • input signal, digital input signal, frequency band input signal, sub-band signal and frequency band signal
  • input signals can generally be assumed to be frequency band signals independent of whether the filter bank 102 provides frequency band signals in the time domain or in the time-frequency domain.
  • the microphones 101a-b are omni-directional unless otherwise mentioned.
  • the input signals are not transformed into the time-frequency domain. Instead the input signals are first transformed into a number of frequency band signals by a time-domain filter bank comprising a multitude of time-domain bandpass filters, such as Finite Impulse Response bandpass filters and subsequently the frequency band signals may be compared using correlation analysis wherefrom the phase can be derived.
  • Both digital input signals are branched, whereby the input signals, in a first branch, are provided to the digital signal processor 103, and, in a second branch, are provided to the own voice detector 104.
  • the own voice detection is based on the spatial and acoustic properties of the user’s own voice.
  • the location of the own voice is fixed relative to the ear level audio system, and as a source the position of the own voice is very well defined, because the impact from reverberation, and especially from the early reflections, is limited due to the short distance between the ear level audio system and the user's mouth.
  • the inter-microphone phase difference (IPD) between the input signals from the acoustical-electrical input transducers 101a-b is estimated by considering the properties of periodic variables, which for mathematical convenience will be described as complex numbers.
  • An estimate of the IPD between said input signals may therefore be given as a complex number that in polar representation has an amplitude $A$ and a phase $\theta$.
  • the average of a multitude of IPD estimates may be given by:
  • $\langle \cdot \rangle$ is the average operator
  • n represents the number of IPD estimates used for the averaging
  • $R_A$ is an averaged amplitude that depends on the phase and that may assume values in the interval $[0, \langle A \rangle]$
  • $\hat{\theta}_A$ is the weighted mean phase. It can be seen that the amplitude $A_i$ of each individual sample weights the corresponding phase $\theta_i$ in the averaging. Therefore both the averaged amplitude $R_A$ and the weighted mean phase $\hat{\theta}_A$ are biased (i.e. each depends on the other).
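The averaging relation that these symbol definitions describe can be written out as below. This is a sketch consistent with the listed definitions only; the positive sign of the exponent and the sample index name $i$ are assumptions.

```latex
\langle A e^{i\theta} \rangle
  = \frac{1}{n} \sum_{i=1}^{n} A_i e^{i\theta_i}
  = R_A \, e^{i\hat{\theta}_A}
```

Because the magnitude of a sum never exceeds the sum of the magnitudes, this form gives an averaged amplitude between zero and the mean amplitude, in line with the interval stated above.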
  • the present invention is independent of the specific choice of statistical operator used to determine an average, and consequently within the present context the terms expectation operator, average, sample mean, expectation or mean may be used to represent the result of statistical functions or operators selected from a group comprising the Boxcar function. In the following these terms may therefore be used interchangeably.
  • the amplitude weighting providing the weighted mean phase $\hat{\theta}_A$ will generally result in the weighted mean phase $\hat{\theta}_A$ being different from the unbiased mean phase $\hat{\theta}$ that is defined by:
  • as in equation (1), $\langle \cdot \rangle$ is the average operator and n represents the number of inter-microphone phase difference samples used for the averaging.
  • inter-microphone phase difference samples may in the following simply be denoted inter-microphone phase differences.
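The definition of the unbiased mean phase follows from the amplitude-weighted average by giving every sample unit amplitude. A sketch under the same assumed sign convention and index name:

```latex
\langle e^{i\theta} \rangle
  = \frac{1}{n} \sum_{i=1}^{n} e^{i\theta_i}
  = R \, e^{i\hat{\theta}}
```

Here the magnitude of the averaged unit phasors is the mean resultant length and its angle is the unbiased mean phase, so neither quantity depends on the sample amplitudes.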
  • R is denoted the mean resultant length. It provides information on how closely the individual phase estimates are grouped together, and the circular variance V and the mean resultant length R are related by:
  • $V = 1 - R$ (eq. 3)
  • the inventors have found that discarding the amplitude relation, which is lost in the determination of the unbiased mean phase $\hat{\theta}$, the mean resultant length $R$ and the circular variance $V$, turns out to be advantageous because more direct access to the underlying phase probability distribution is provided.
  • the unbiased mean phase provides an improved estimate of the location of a given sound source such as the user’s mouth.
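The three circular statistics discussed above can be illustrated with a few lines of Python. This is a minimal sketch, not the patented implementation: the function name `circular_stats` and the sample values are hypothetical, and the phases are assumed to be given in radians.

```python
import cmath
import math

def circular_stats(phases):
    """Return (unbiased mean phase, mean resultant length R, circular variance V)."""
    # Average unit phasors so that sample amplitudes play no role.
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    r = abs(mean_vector)              # mean resultant length, in [0, 1]
    theta = cmath.phase(mean_vector)  # unbiased mean phase, in [-pi, pi]
    return theta, r, 1.0 - r          # circular variance V = 1 - R

# Hypothetical samples: tightly grouped phases versus phases spread over the circle.
tight = [0.50, 0.52, 0.48, 0.51, 0.49]
spread = [k * 2 * math.pi / 8 for k in range(8)]

theta_t, r_t, v_t = circular_stats(tight)
theta_s, r_s, v_s = circular_stats(spread)
```

Tightly grouped samples give a mean resultant length close to one (small circular variance), while samples spread evenly over the circle give a value close to zero.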
  • $X_a(k,l)$ and $X_b(k,l)$ represent the short-time Fourier transforms of the input signals at the two microphones as provided by the frequency domain filter bank 102. It is assumed that $\theta_{ab}(k,l)$ is a specific realization of a circular random variable $\Theta$, that the statistical properties of the IPDs are therefore governed by circular statistics, and that the mean of the IPD may be given by:
  • Instantaneous IPD is given as a function of the Fourier transformation frame $l$ and the frequency bin $k$.
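How an instantaneous IPD might be obtained from two time-frequency bins and then averaged over frames can be sketched as follows. The normalized cross-spectrum used here, including the choice of conjugating the second microphone signal, is an assumption made for illustration; the function names and the synthetic bin values are hypothetical.

```python
import cmath

def instantaneous_ipd(x_a, x_b):
    """Unit-magnitude phasor of the phase difference between two STFT bins."""
    cross = x_a * x_b.conjugate()
    if cross == 0:
        return 0j  # a silent bin carries no phase information
    return cross / abs(cross)

def mean_ipd(frames_a, frames_b):
    """Average the IPD phasors over frames l for one frequency bin k."""
    phasors = [instantaneous_ipd(a, b) for a, b in zip(frames_a, frames_b)]
    mean_vector = sum(phasors) / len(phasors)
    return cmath.phase(mean_vector), abs(mean_vector)

# Synthetic bin values: microphone b lags microphone a by 0.3 rad in every frame,
# while the amplitudes and absolute phases vary from frame to frame.
frames_a = [cmath.rect(amp, ph)
            for amp, ph in [(1.0, 0.1), (0.2, 1.4), (3.0, -2.0), (0.7, 2.9), (1.5, 0.0)]]
frames_b = [0.8 * a * cmath.exp(-0.3j) for a in frames_a]

theta_ab, r_ab = mean_ipd(frames_a, frames_b)
```

With a constant phase lag the mean phase recovers the lag and the mean resultant length is one, regardless of how strongly the frame amplitudes differ.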
  • the mean resultant length carries information about the directional statistics of the impinging signals at the ear level audio system, specifically about the spread of the IPD.
  • the unbiased mean phase may interchangeably be represented by $\hat{\theta}_{ab}$ or $\hat{\theta}$ and similarly the mean resultant length may interchangeably be represented by $R_{ab}$ or $R$.
  • own voice may be detected by using said first and second input signals in the time-frequency domain to determine a frequency dependent unbiased mean phase from a mean of an estimated inter-microphone phase difference.
  • the value of the frequency dependent unbiased mean phase can identify the situation where the user is speaking in response to a detection that said value is within a predetermined range.
  • the trigger criterion will be that the value of the unbiased mean phase, for a given frequency range, such as a frequency band, falls below a predetermined trigger value, because the value ideally will be zero as a consequence of the ear level audio devices being positioned at the same distance from the user's mouth.
  • each ear level audio device of a binaural ear level audio system comprises a set (i.e. a multitude) of microphones wherefrom signals may be derived (e.g. in the form of beamformed signals) that can be used to determine a frequency dependent unbiased mean phase and hereby identify the situation that a user of the ear level audio system is speaking.
  • both microphones are accommodated in a single ear level audio device, which obviously will require that the unbiased mean phase falls within a certain predetermined interval having values larger than zero.
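The range test described above can be sketched per frequency band. All numbers below (band phases, trigger value, interval limits) are hypothetical and only illustrate the two cases: a binaural pair where own voice is expected near zero phase, and a single-device pair where the expected interval lies above zero.

```python
def own_voice_bands(mean_phase, low, high):
    """Flag each band whose unbiased mean phase lies inside [low, high]."""
    return [lo <= th <= hi for th, lo, hi in zip(mean_phase, low, high)]

# Binaural pair: own voice is expected close to zero phase in every band.
binaural_phase = [0.02, -0.05, 0.60, 0.01]
trigger = 0.1  # hypothetical trigger value in radians
binaural_flags = own_voice_bands(binaural_phase, [-trigger] * 4, [trigger] * 4)

# Pair within one device: the expected interval lies above zero (limits assumed).
monaural_phase = [0.12, 0.05, 0.41, 0.58]
monaural_flags = own_voice_bands(
    monaural_phase,
    [0.08, 0.18, 0.30, 0.45],
    [0.20, 0.35, 0.55, 0.75],
)
```

Returning one flag per band keeps the detection frequency dependent, so later processing can act only on the bands that were flagged.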
  • the estimated frequency dependent unbiased mean phase is processed such that a pair of input signals representing a situation where the user of the ear level audio system is speaking will provide a processed unbiased mean phase estimate of zero for all considered frequencies, independent of the positioning of the microphones from which said pair of input signals is at least derived. More specifically, said processing is carried out by determining the difference between the estimated frequency dependent unbiased mean phase and a target unbiased mean phase obtained based on input signals representing a case where only the ear level audio system user is speaking.
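The subtraction of a target unbiased mean phase can be sketched as a wrapped phase difference per band. The function name `compensate` and the phase values are hypothetical; wrapping the difference back to the principal interval is an assumption that avoids spurious jumps near plus or minus pi.

```python
import cmath

def compensate(measured, target):
    """Per-band difference measured - target, wrapped to the principal interval."""
    return [cmath.phase(cmath.exp(1j * (m - t))) for m, t in zip(measured, target)]

target_phase = [0.10, 0.25, 0.45, 3.10]     # hypothetical own-voice-only recording
measured_phase = [0.11, 0.24, 0.47, -3.12]  # hypothetical current estimate

processed = compensate(measured_phase, target_phase)
own_voice_like = all(abs(p) < 0.1 for p in processed)
```

After compensation the same near-zero test applies to any microphone pair, which is what gives the different pairs a common basis.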
  • this processing is carried out using an associated computing device having a software application adapted to assist the user in carrying out the processing and adapted to interact with the ear level audio device such that the results of the processing are stored in the ear level audio system and used to improve the own voice detection further.
  • the processing is carried out as part of an initial hearing aid system programming (i.e. fitting) in case the ear level audio system is a hearing aid system.
  • the input signals i.e. the sound environment
  • the two main sources of dynamics are the temporal and spatial dynamics of the sound environment.
  • in speech, the duration of a short consonant may be as short as 5 milliseconds, while long vowels may have a duration of up to 200 milliseconds depending on the specific sound.
  • the spatial dynamics is a consequence of relative movement between the ear level audio system user and surrounding sound sources.
  • speech is considered quasi stationary for a duration in the range between say 20 and 40 milliseconds and this includes the impact from spatial dynamics.
  • it is advantageous that the duration of the involved time windows is as long as possible, but it is, on the other hand, detrimental if the duration is so long that it covers natural speech variations or spatial variations and therefore cannot be considered quasi-stationary.
  • a first time window is defined by the transformation of the digital input signals into the time-frequency domain and the longer the duration of the first time window the higher the frequency resolution in the time-frequency domain, which obviously is advantageous. Additionally, the present invention may require that the determination of an unbiased mean phase and a corresponding mean resultant length of an inter-microphone phase difference is based on a calculation of an expectation value and it has been found that the number of individual samples used for calculation of the expectation value preferably exceeds at least 5.
  • the combined effect of the first time window and the calculation of the expectation value provides an effective time window that is shorter than 40 milliseconds or in the range between 5 and 200 milliseconds such that the sound environment in most cases can be considered quasi-stationary.
  • improved accuracy of the unbiased mean phase and the mean resultant length may be provided by obtaining a multitude of successive samples of the unbiased mean phase and the mean resultant length, in the form of a complex number using the methods according to the present invention and subsequently adding these successive estimates (i.e. the complex numbers) and normalizing the result of the addition with the number of added estimates.
  • This embodiment is particularly advantageous in that the mean resultant length effectively weights the samples that have a high probability of comprising a target source, while estimates with a high probability of mainly comprising noise will have a negligible impact on the final value of the unbiased mean phase of the inter-microphone phase difference because the samples are characterized by having a low value of the mean resultant length.
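The accumulation of successive estimates can be sketched by adding the estimates as complex numbers and dividing by their count. The function name `combine` and the listed (phase, mean resultant length) pairs are hypothetical.

```python
import cmath

def combine(estimates):
    """Add successive (phase, R) estimates as complex numbers and normalize by their count."""
    total = sum(r * cmath.exp(1j * th) for th, r in estimates)
    mean_vector = total / len(estimates)
    return cmath.phase(mean_vector), abs(mean_vector)

# Three reliable estimates near 0.4 rad and two noise-dominated ones with small R.
estimates = [(0.40, 0.95), (0.42, 0.90), (0.38, 0.92), (-2.5, 0.05), (1.9, 0.08)]
phase, length = combine(estimates)
```

Because each estimate enters with its own mean resultant length as magnitude, the two noise-dominated estimates barely move the combined phase away from the reliable cluster.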
  • with this method it therefore becomes possible to achieve pseudo time windows with a duration of up to several seconds or even longer, and the improvements that follow therefrom, despite the fact that neither the temporal nor the spatial variations can be considered quasi-stationary.
  • At least one or at least not all of the successive complex numbers representing the unbiased mean phase and the mean resultant length are used for improving the estimation of the unbiased mean phase of the inter-microphone phase difference, wherein the selection of the complex numbers to be used is based on an evaluation of the corresponding mean resultant length (i.e. the variance) such that only complex numbers representing a high mean resultant length are considered.
  • the estimation of the unbiased mean phase of the inter-microphone phase difference is additionally based on an evaluation of the value of the individual samples of the unbiased mean phase such that only samples representing the same target source are combined.
  • the mean resultant length can be used to compare or weight information obtained from a multitude of microphone pairs, such as the multitude of microphone pairs that may be available in a binaural ear level audio system comprising two ear level audio devices each having two microphones.
  • the determination of whether the target source is the user’s mouth is provided by combining a monaurally determined unbiased mean phase with a binaurally determined unbiased mean phase, whereby the symmetry ambiguity that results when translating an estimated phase to a target direction may be resolved.
  • identification of a direct sound can be made if a value of the mean resultant length, for at least one frequency range, is above a mean resultant length direct sound trigger level
  • identification of a diffuse, random or incoherent noise field can be made if a value of the mean resultant length, for at least one frequency range, is below a mean resultant length noise trigger level
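The two trigger levels can be sketched as a simple per-band classification. The level values 0.8 and 0.3, the label names and the sample data are hypothetical.

```python
def classify_field(r_by_band, direct_level=0.8, noise_level=0.3):
    """Label each frequency band from its mean resultant length."""
    labels = []
    for r in r_by_band:
        if r > direct_level:
            labels.append("direct")        # above the direct sound trigger level
        elif r < noise_level:
            labels.append("diffuse")       # below the noise trigger level
        else:
            labels.append("undetermined")  # between the two levels
    return labels

labels = classify_field([0.95, 0.85, 0.50, 0.10])
has_direct_sound = "direct" in labels  # true if at least one band qualifies
```

Keeping a gap between the two levels leaves an undetermined zone, which avoids forcing a decision when the mean resultant length is inconclusive.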
  • the mean resultant length may be used to estimate the variance of a correspondingly determined unbiased mean phase from samples of inter-microphone phase differences and evaluate the validity of a determined unbiased mean phase based on the estimated variance for the determined unbiased mean phase.
  • Generally improved accuracy of the determined unbiased mean phase is achieved by at least one of averaging and fitting a multitude of determined unbiased mean phases across at least one of time and frequency by weighting the determined unbiased mean phases with the correspondingly determined mean resultant length.
  • the mean resultant length may be used to perform hypothesis testing of probability distributions for a correspondingly determined unbiased mean phase.
  • corresponding values, in time and frequency, of the unbiased mean phase and the mean resultant length can be used to identify and distinguish between at least two target sources, based on identification of direct sound comprising at least two different values of the unbiased mean phase.
  • corresponding values, in time and frequency, of the unbiased mean phase and the mean resultant length can be used to estimate whether a distance to a target source is increasing or decreasing based on whether the value of the mean resultant length is decreasing or increasing respectively.
  • the mapped mean resultant length $R_{ab}$ for diffuse noise approaches zero for all $k < k_u$ while for anechoic sources it approaches one as intended. Commonly used methods for estimating diffuse noise are only applicable for $k > k_u$.
  • the mapped mean resultant length $R_{ab}$ works best for $k < k_u$ and is particularly suitable for short microphone spacings typical for ear level audio devices.
  • a more correct weight may be applied to time-frequency frames with diffuse noise especially for low frequency IPD estimations based on small microphone arrays.
  • Fig. 2 illustrates highly schematically a map of values of the unbiased mean phase as a function of frequency in order to provide a phase versus frequency plot.
  • the phase versus frequency plot can be used to identify a direct sound if said mapping provides a straight line or at least a continuous curve in the phase versus frequency plot.
  • a direct sound will provide a straight line in the plot, but under real-world conditions a non-straight curve will result, which will primarily be determined by the head related transfer function of the user wearing the ear level audio system and the mechanical design of the ear level audio system itself.
  • the curve 201-A represents direct sound from a target positioned directly in front of the ear level audio system user assuming an ear level audio device having two microphones positioned along the direction perpendicular to the user’s ears.
  • the curve 201-B represents direct sound from a target directly behind the ear level audio system user.
  • the angular direction of the direct sound from a given target source may be determined from the fact that the slope of the interpolated straight line representing the direct sound is given as: $\frac{d\theta}{df} = \frac{2\pi d}{c}$
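Estimating the slope of the phase versus frequency line can be sketched with a weighted least-squares fit in which each band is weighted by a reliability value such as its mean resultant length. The fit through the origin assumes free-field propagation with no constant phase offset; the function name, band frequencies, weights and the 20 microsecond delay are hypothetical.

```python
import math

def weighted_slope(freqs, phases, weights):
    """Weighted least-squares slope of phase versus frequency for a line through the origin."""
    num = sum(w * f * p for f, p, w in zip(freqs, phases, weights))
    den = sum(w * f * f for f, w in zip(freqs, weights))
    return num / den

true_delay = 20e-6  # hypothetical inter-microphone delay in seconds
freqs = [250.0, 500.0, 1000.0, 2000.0, 4000.0]
phases = [2 * math.pi * f * true_delay for f in freqs]
weights = [0.9, 0.9, 0.9, 0.9, 0.9]

# One noise-dominated band: wrong phase but a low reliability weight.
phases[1] = 1.0
weights[1] = 0.01

slope = weighted_slope(freqs, phases, weights)  # radians per hertz
delay = slope / (2 * math.pi)                   # equivalent time delay in seconds
```

A plane wave travelling along the microphone axis has a delay equal to the spacing divided by the speed of sound, so comparing the fitted delay with that maximum indicates how far the source is from the axis.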
  • the so called coherent region 203 is defined as the area in the phase versus frequency plot that is bounded by the at least continuous curves defining direct sounds coming directly from the front and the back direction respectively and the curves defining a constant phase of $+\pi$ and $-\pi$ respectively. Any data points outside the coherent region, i.e. inside the incoherent regions 202-a and 202-b, will represent a random or incoherent noise field.
  • $R_{uw}$, i.e. the unwrapped mean resultant length
  • $M_1$ and $M_2$ represent the input signals (which can be any set) in the time-frequency domain representation at one particular frequency (or frequency band), c is the speed of sound and d is the inter-microphone spacing of the considered microphone set.
  • the unwrapped unbiased mean phase $\hat{\theta}_{uw}$ and its unwrapped mean resultant length $R_{uw}$ have a number of attractive features.
  • the unwrapped unbiased mean phase effectively maps the coherent region onto the full $2\pi$ support. Unwrapping therefore provides that all phase difference estimates are mapped onto the same support, independent of microphone spacing, and that the frequency dependence of the support is removed. This means that e.g. spatially-diffuse sound corresponds to a uniform distribution between $-\pi$ and $\pi$ and that averaging across frequency can be done without introducing errors.
  • the time difference between two microphones corresponds to a slope of the phase across frequency.
  • a time difference corresponds to an offset under free field assumptions.
  • the transformation maps a time difference of arrival (TDoA) so that it no longer corresponds to the slope of the mean inter-microphone phase difference but rather to a parallel offset of the mean of a transformed estimated inter-microphone phase difference across frequency, which can be estimated by fitting accordingly, again using a reliability measure as weighting in the fit.
  • This approach offers a particularly efficient TDoA estimation method, especially for signals impinging perpendicularly to a line connecting two binaurally positioned microphones of an ear level audio system.
  • one application of this TDoA estimation is binaural own voice detection, where the own voice generally has a binaural TDoA of zero.
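Why a time difference becomes a constant offset after the transformation can be illustrated with a simplified mapping. The sketch below assumes that the transformation scales each phase by c / (2 f d), so that the coherent region bounded by plus or minus 2*pi*f*d/c maps onto plus or minus pi; this scaling, the function names, the 10 mm spacing and the 10 microsecond delay are all assumptions and not taken from the source.

```python
import math

C = 343.0  # assumed speed of sound in m/s

def transformed_phase(phase, freq, spacing):
    """Scale a phase so that +-2*pi*f*d/c maps onto +-pi (assumed mapping)."""
    return phase * C / (2.0 * freq * spacing)

def weighted_offset(values, weights):
    """Reliability-weighted mean of the transformed phases across frequency."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

spacing = 0.01  # hypothetical microphone spacing in metres
tau = 10e-6     # hypothetical time difference of arrival in seconds
freqs = [500.0, 1000.0, 2000.0, 4000.0]

raw = [2 * math.pi * f * tau for f in freqs]  # phase grows linearly with frequency
mapped = [transformed_phase(p, f, spacing) for p, f in zip(raw, freqs)]
offset = weighted_offset(mapped, [0.9, 0.8, 0.9, 0.7])
tau_estimate = offset * spacing / (math.pi * C)
```

Under this assumed mapping the transformed phase equals pi * c * tau / d in every band, so a single weighted mean across frequency recovers the time difference, and a value near zero is what a binaural own voice would give.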
  • the mapped mean resultant length R ab may be given by other expressions than the one given in (eq.
  • indices $l$ and $k$ represent respectively the frame used to transform the input signals into the time-frequency domain and the frequency bin; wherein $E$ is an expectation operator; wherein $\theta_{ab}$ represents the inter-microphone phase difference between the first and the second microphone; wherein $p$ is a real variable; and wherein $f$ is an arbitrary function.
  • the variations of the mapped mean resultant length given by eq. 10 also provide additional reliability measures.
  • the reliability measure associated with an unbiased mean phase may be dependent on the sound environment such that e.g. the reliability measure is based on the mean resultant length as given in eq. 5, or the mapped mean resultant length as given in eq. 6 or eq. 10, if the sound environment is dominantly uncorrelated noise and is based on the unwrapped mean resultant length, i.e. as given in eq. 8, if diffuse noise dominates the sound environment.
  • the present method and its variations are particularly attractive for use in ear level audio systems, because these systems due to size requirements only offer limited processing resources, and the present invention provides a very precise own voice detection while only requiring relatively few processing resources. It follows from the disclosed embodiments and the many associated variations of the various features that the variants of one feature may be combined with the variants of other features, also from other embodiments, unless it is specifically noted that this is not possible. Thus as one example it is emphasized that generally all variations of the present invention may be combined with both the mean resultant length and the mapped mean resultant length.
  • the unbiased mean phases and the corresponding reliability measures are provided directly to machine learning methods, such as deep neural networks and Bayesian methods in order to provide the own voice detection.
  • the method of the invention is at least advantageous in using the frequency dependent unbiased mean phase to enable own voice detection in separate frequency bands, whereby own voice signal processing may be optimized, because only the frequency bands that in fact contain own voice need to be processed in a special manner whereby the resulting sound can sound more natural and can contain fewer sound artefacts.
  • special processing of own voice may be advantageous in order to alleviate the detrimental effects of occlusion and ampclusion as well as to improve the handling of the various dynamic aspects of speech when own voice is also considered.
  • special processing of own voice may comprise lowering the gain when own voice is detected.
  • the reliability of the own voice detection i.e. the identification that a user of the ear level audio system is speaking
  • the reliability of the own voice detection may be improved, in flexible manner that e.g.
  • the value of the mean resultant length for a sound source may increase with decreasing distance to the ear level audio system as a result of dereverberation and therefore may be used in an additional criterion adapted for improving the reliability of own voice detection.
  • the mean resultant length or a variance measure derived from the mean resultant length may also be used to estimate the probability that an own voice detection is correct and in response to the estimation adapt the applied own voice processing accordingly, e.g. by making smaller gain adjustments or by applying the changes to the processing slower if the probability that the own voice detection is correct is relatively low.
  • an ear level audio system comprising a left and a right ear level audio device a multitude of input signal pairs are available, e.g. two microphones
  • a fast and reliable own voice detection may be achieved due to the parallel processing carried out by the multitude of input signal pairs.
  • a frequency dependent own voice detection can be carried out in less than 100 milliseconds or even less than 50 milliseconds.
  • the especially advantageous method of the invention can be described by the additional steps of:
  • the difference between the determined (i.e. measured) unbiased mean phase and a target unbiased mean phase is used to provide a common basis.
  • the input signal pair specific target unbiased mean phases may be determined by a measurement as discussed above, but in a variation the input signal pair specific target unbiased mean phases can be predetermined without being personalized to the individual user.
  • the method of the invention can be described by the additional steps of: - modifying a frequency dependent parameter in frequency ranges wherein it has been identified that the user of the ear level audio system is speaking, wherein said frequency dependent parameters are selected from a group of parameters comprising frequency dependent amplification, noise estimation and directional system settings.
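
The points above rely on two circular statistics of the phase difference between the signals of an input signal pair: a mean phase and the mean resultant length. As a minimal sketch, assuming the standard circular-statistics definitions (an average of unit phasors, as in the Cabot and Kutil references cited for this publication) rather than the exact estimators of the patent, a per-band detector might compare the measured mean phase with a target mean phase and additionally require a sufficiently large mean resultant length. The function names, the thresholds `max_phase_error` and `min_length`, and the sample data are hypothetical.

```python
import cmath
import math


def circular_stats(phase_diffs):
    """Return (mean phase, mean resultant length) for angles in radians."""
    if not phase_diffs:
        raise ValueError("need at least one phase sample")
    # Average unit phasors so that signal amplitude does not weight the mean
    # (assumed here to be what makes the mean phase "unbiased").
    resultant = sum(cmath.exp(1j * p) for p in phase_diffs) / len(phase_diffs)
    return cmath.phase(resultant), abs(resultant)


def circular_distance(a, b):
    """Smallest absolute angular difference between a and b, in [0, pi]."""
    return abs(cmath.phase(cmath.exp(1j * (a - b))))


def detect_own_voice(band_phase_diffs, target_phases,
                     max_phase_error=0.3, min_length=0.8):
    """Return one boolean per frequency band.

    A band is flagged when its mean phase is close to the target mean phase
    and the mean resultant length indicates a reliable estimate.
    Both thresholds are illustrative assumptions, not values from the patent.
    """
    decisions = []
    for samples, target in zip(band_phase_diffs, target_phases):
        mean_phase, length = circular_stats(samples)
        close = circular_distance(mean_phase, target) <= max_phase_error
        reliable = length >= min_length
        decisions.append(close and reliable)
    return decisions


# Hypothetical phase differences (radians) for three frequency bands.
bands = [
    [0.48, 0.50, 0.52, 0.50],                  # tight cluster at the target
    [0.0, math.pi / 2, math.pi, -math.pi / 2],  # fully dispersed phases
    [2.00, 2.02, 1.98],                         # tight cluster, wrong phase
]
targets = [0.5, 0.5, 0.5]
decisions = detect_own_voice(bands, targets)
print(decisions)  # [True, False, False]
```

Only the first band passes both criteria: the second is rejected because its mean resultant length is near zero, the third because its mean phase is far from the target. A frequency dependent parameter such as amplification could then be modified only in the flagged bands.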

Landscapes

  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Neurosurgery (AREA)
  • Otolaryngology (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to a method of operating an ear level audio system in order to provide improved own voice detection, and to an ear level audio system (100) for carrying out the method.
PCT/EP2019/061993 2018-08-15 2019-05-09 Method of operating an ear level audio system and an ear level audio system WO2020035180A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19723398.4A EP3837862A1 (fr) 2018-08-15 2019-05-09 Method of operating an ear level audio system and an ear level audio system
US17/268,148 US11470429B2 (en) 2018-08-15 2019-05-09 Method of operating an ear level audio system and an ear level audio system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
DKPA201800465 2018-08-15
DKPA201800462A DK201800462A1 (en) 2017-10-31 2018-08-15 METHOD OF OPERATING A HEARING AID SYSTEM AND A HEARING AID SYSTEM

Publications (1)

Publication Number Publication Date
WO2020035180A1 true WO2020035180A1 (fr) 2020-02-20

Family

ID=64453468

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/EP2018/081502 WO2020035158A1 (fr) 2018-08-15 2018-11-16 Method of operating a hearing aid system and a hearing aid system
PCT/EP2019/061993 WO2020035180A1 (fr) 2018-08-15 2019-05-09 Method of operating an ear level audio system and an ear level audio system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/081502 WO2020035158A1 (fr) 2018-08-15 2018-11-16 Method of operating a hearing aid system and a hearing aid system

Country Status (1)

Country Link
WO (2) WO2020035158A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023110836A1 (fr) 2021-12-13 2023-06-22 Widex A/S Method of operating an audio device system and an audio device system
WO2023110845A1 (fr) 2021-12-13 2023-06-22 Widex A/S Method of operating an audio device system and an audio device system

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11617037B2 (en) 2021-04-29 2023-03-28 Gn Hearing A/S Hearing device with omnidirectional sensitivity

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110015924A1 (en) * 2007-10-19 2011-01-20 Banu Gunel Hacihabiboglu Acoustic source separation
US20150163602A1 (en) * 2013-12-06 2015-06-11 Oticon A/S Hearing aid device for hands free communication
US20150289064A1 (en) * 2014-04-04 2015-10-08 Oticon A/S Self-calibration of multi-microphone noise reduction system for hearing assistance devices using an auxiliary device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009034524A1 (fr) * 2007-09-13 2009-03-19 Koninklijke Philips Electronics N.V. Apparatus and method for audio beam forming
DK2088802T3 (da) * 2008-02-07 2013-10-14 Oticon As Method of estimating the weighting function of audio signals in a hearing aid

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CABOT: "AN INTRODUCTION TO CIRCULAR STATISTICS AND ITS APPLICATION TO SOUND LOCALIZATION EXPERIMENTS", AES, November 1977 (1977-11-01), XP002788240, Retrieved from the Internet <URL:http://www.aes.org/tmpFiles/elib/20190109/3062.pdf> [retrieved on 201901] *
KUTIL R: "Biased and unbiased estimation of the circular mean resultant length and its variance", INTERNET CITATION, 1 August 2012 (2012-08-01), pages 549 - 561, XP002788241, Retrieved from the Internet <URL:https://www.tandfonline.com/doi/pdf/10.1080/02331888.2010.543463?needAccess=true> [retrieved on 19000101] *

Also Published As

Publication number Publication date
WO2020035158A1 (fr) 2020-02-20

Similar Documents

Publication Publication Date Title
EP3704873B1 (fr) Method of operating a hearing aid system
CN104902418B (zh) Multi-microphone method for estimating target and noise spectral variances
CN109040932B (zh) Microphone system and hearing device comprising a microphone system
US10631105B2 (en) Hearing aid system and a method of operating a hearing aid system
Lockwood et al. Performance of time-and frequency-domain binaural beamformers based on recorded signals from real rooms
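CN113453134A (zh) Hearing device, method for operating the same, and corresponding data processing system
CN108235181B (zh) Method for noise reduction in an audio processing device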
US10397711B2 (en) Method of determining objective perceptual quantities of noisy speech signals
WO2020035180A1 (fr) Method of operating an ear level audio system and an ear level audio system
WO2019086435A1 (fr) Method of operating a hearing aid system and a hearing aid system
JP2019022213A (ja) Hearing device and method with non-intrusive speech intelligibility
US11470429B2 (en) Method of operating an ear level audio system and an ear level audio system
EP3833043A1 (fr) Hearing system comprising a personalized beamformer
US9992583B2 (en) Hearing aid system and a method of operating a hearing aid system
DK201800462A1 (en) METHOD OF OPERATING A HEARING AID SYSTEM AND A HEARING AID SYSTEM
Ohlenbusch et al. Multi-Microphone Noise Data Augmentation for DNN-based Own Voice Reconstruction for Hearables in Noisy Environments

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19723398

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019723398

Country of ref document: EP

Effective date: 20210315