EP2620940A1

EP2620940A1 - Method and hearing device for estimating a component of one's own voice

Info

Publication number: EP2620940A1
Application number: EP12196705.3A
Authority: EP
Inventors: Vaclav Bouse
Original assignee: Siemens Medical Instruments Pte Ltd
Current assignee: Sivantos Pte Ltd
Priority date: 2012-01-19
Filing date: 2012-12-12
Publication date: 2013-07-31
Also published as: DE102012200745B4; US20130188816A1; DE102012200745A1

Abstract

The method involves segmenting a time-frequency plane into a number of regions and determining, for each region, a region phase difference and a region level difference of the two microphone signals relative to each other. Those regions of the time-frequency plane whose region phase difference corresponds to an estimated phase difference and whose region level difference corresponds to an estimated level difference are grouped together. The signal components of the group serve as an estimate of a component of the wearer's own voice. An independent claim is also included for a hearing device.

Description

The present invention relates to a method for estimating a component of the own voice of a wearer of a hearing device. The present invention further relates to a hearing device in which a corresponding method is implemented, and to a hearing device having a filter that was created according to the above method. A hearing device is understood here to mean any device that can be worn in or on the ear and produces a sound stimulus, in particular a hearing aid, a headset, headphones and the like.

Hearing aids are wearable hearing devices that serve to assist people with impaired hearing. To meet numerous individual needs, different designs of hearing aids are provided, such as behind-the-ear (BTE) hearing aids, hearing aids with an external receiver (RIC: receiver in the canal) and in-the-ear (ITE) hearing aids, e.g. also concha hearing aids or canal hearing aids (ITE, CIC). The hearing aids listed by way of example are worn on the outer ear or in the ear canal. In addition, bone conduction hearing aids and implantable or vibrotactile hearing aids are also available on the market. In these, the damaged hearing is stimulated either mechanically or electrically.

The principal components of a hearing aid are an input transducer, an amplifier and an output transducer. The input transducer is usually a sound receiver, e.g. a microphone, and/or an electromagnetic receiver, e.g. an induction coil. The output transducer is usually implemented as an electroacoustic transducer, e.g. a miniature loudspeaker, or as an electromechanical transducer, e.g. a bone conduction receiver. The amplifier is usually integrated into a signal processing unit. This basic structure is shown in FIG. 1 using the example of a behind-the-ear hearing aid. One or more microphones 2 for picking up ambient sound are installed in a hearing aid housing 1 to be worn behind the ear. A signal processing unit 3, likewise integrated into the hearing aid housing 1, processes and amplifies the microphone signals. The output signal of the signal processing unit 3 is transmitted to a loudspeaker or receiver 4, which outputs an acoustic signal. Where applicable, the sound is transmitted to the eardrum of the wearer via a sound tube that is fixed in the ear canal with an earmold. The hearing aid, and in particular the signal processing unit 3, is powered by a battery 5 likewise integrated into the hearing aid housing 1.

In numerous hearing aid applications, it is necessary or desirable to be able to extract the own speech or voice of the wearer of the hearing aid or hearing device from the acoustic environment. One example application is the active reduction of occlusion effects. A beamformer can also be controlled on the basis of the own voice. It is furthermore conceivable to estimate the room impulse response on the basis of the own speech.

The own speech, or speech components, of the wearer of the hearing device can be estimated or extracted by various methods. A well-known method for this is CASA (Computational Auditory Scene Analysis). The CASA principle is based on a computational analysis of the current acoustic situation. CASA builds on the ASA principle, whose main achievements are summarized in Bregman, A. S. (1994): "Auditory Scene Analysis: The Perceptual Organization of Sound", Bradford Books. The state of development of CASA is presented in Wang, D., Brown, G. J. (2006): "Computational Auditory Scene Analysis: Principles, Algorithms and Applications", John Wiley & Sons, ISBN 978-0-471-74109-1.

Monaural CASA algorithms operate on a single signal channel and attempt to separate the sources, or at least to separate speech. In most cases, they rely on very strict assumptions about the sound sources; one of these relates, for example, to fundamental frequency estimation. Moreover, monaural CASA algorithms are in principle unable to exploit the spatial information of a signal.

Multichannel algorithms attempt to separate the signals on the basis of the spatial positions of the sources. In this approach, the microphone configuration is essential. In a binaural configuration, for example, i.e. with the microphones on both sides of the head, these algorithms cannot achieve reliable source separation.

The object of the present invention is to enable the own voice of a wearer of a hearing device to be recognized more reliably.

According to the invention, this object is achieved by a method for estimating a component of the own voice of a wearer of a hearing device by

Placing a first microphone of the hearing device at the exit of the ear canal of an ear of the wearer or outside the ear canal,
Placing a second microphone of the hearing device in the ear canal such that the second microphone is closer to the eardrum of the ear than the first microphone,
Estimating a phase difference and a level difference of virtual microphone signals of the two microphones relative to each other on the basis of a predetermined model,
Obtaining a time-domain microphone signal from each of the two microphones,
Transforming each of the two time-domain microphone signals into a t-f signal in the time-frequency plane,
Segmenting the time-frequency plane into a plurality of regions,
Determining, for each of the regions, a region phase difference and a region level difference of one of the two t-f signals relative to the other of the two t-f signals, and
Grouping into one group all those regions of the time-frequency plane whose region phase difference substantially matches the estimated phase difference and whose region level difference substantially matches the estimated level difference, the signal components of the group serving as the estimate of the component of the wearer's own voice.
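The grouping criterion of the claimed method can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in the claim: $M_1(t,f)$ and $M_2(t,f)$ are the t-f signals of the first and second microphone, $\mathcal{R}$ is one of the regions, $\widehat{\Delta\varphi}$ and $\widehat{\Delta L}$ are the model-based estimates, and $\varepsilon_\varphi$, $\varepsilon_L$ are tolerances. Computing the region values as energy-weighted averages is likewise only one possible choice.

$$
\Delta\varphi_{\mathcal{R}} = \arg\Bigl(\sum_{(t,f)\in\mathcal{R}} M_2(t,f)\,M_1^{*}(t,f)\Bigr),
\qquad
\Delta L_{\mathcal{R}} = 10\log_{10}\frac{\sum_{(t,f)\in\mathcal{R}} |M_2(t,f)|^2}{\sum_{(t,f)\in\mathcal{R}} |M_1(t,f)|^2}\ \text{dB}
$$

$$
\mathcal{G} = \Bigl\{\mathcal{R} \;:\; \bigl|\operatorname{wrap}\bigl(\Delta\varphi_{\mathcal{R}} - \widehat{\Delta\varphi}\bigr)\bigr| \le \varepsilon_\varphi \ \text{and}\ \bigl|\Delta L_{\mathcal{R}} - \widehat{\Delta L}\bigr| \le \varepsilon_L \Bigr\}
$$

Here $\operatorname{wrap}(\cdot)$ maps a phase into $[-\pi,\pi)$. The components of one of the t-f signals at the points of $\bigcup_{\mathcal{R}\in\mathcal{G}}\mathcal{R}$ then form the estimate $\tilde{\nu}$ of the own-voice component.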

Furthermore, according to the invention, a hearing device for carrying out the above method is provided, the hearing device having the two microphones and a signal processing device for transforming, segmenting and grouping.

Advantageously, the two microphones are thus placed in a very specific way. The second microphone is arranged in the ear canal, while the first microphone is arranged essentially at the ear canal exit or outside the ear canal (e.g. in the concha or on the pinna). The microphone in the ear canal can therefore pick up considerably more of the sound that enters the ear canal via bone conduction than the external microphone. This yields characteristic information based on the own voice. A CASA algorithm can then reliably estimate or extract the own voice, i.e. the voice of the wearer of the hearing device in which the CASA algorithm is running.

Preferably, at least one further feature, different from the phase difference and the level difference, is obtained for each of the microphone signals and used for the segmenting and/or grouping. Although grouping is in principle possible on the basis of the phase difference and the level difference alone, it is advantageous to use at least one additional feature for grouping. For segmenting, other features may in principle be better suited.

In particular, the further feature may relate to a change, or a rate of change, in the spectrum of the microphone signals. This has the advantage that, for example, rapid level increases (onsets) at certain frequencies can be detected reliably. Such signal edges are suitable for segmenting.
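As a rough illustration of such a change-based feature, the following Python sketch flags t-f points whose level rises by more than a threshold from one frame to the next (a simple per-bin spectral-flux criterion, assumed here as one way to detect onsets). It assumes the t-f signal is a complex NumPy array of shape (frames, bins); the function name `onset_map` and the 6 dB threshold are hypothetical choices, not values taken from the patent.

```python
import numpy as np


def onset_map(tf_signal: np.ndarray, threshold_db: float = 6.0) -> np.ndarray:
    """Mark t-f points whose level rises steeply from one frame to the next.

    tf_signal: complex short-time spectrum, shape (n_frames, n_bins).
    threshold_db: hypothetical minimum rise in dB per frame that counts as an onset.
    Returns a boolean array of the same shape; the first frame is never an onset.
    """
    eps = 1e-12  # avoids log10(0) in silent bins
    level_db = 20.0 * np.log10(np.abs(tf_signal) + eps)
    # Rate of change of the level along the time axis (dB per frame);
    # prepending the first frame makes its rise zero.
    rise = np.diff(level_db, axis=0, prepend=level_db[:1])
    return rise > threshold_db


# Usage: a tone that switches on in frequency bin 1 at frame 3.
spectrum = np.full((6, 4), 1e-3, dtype=complex)
spectrum[3:, 1] = 1.0
print(np.argwhere(onset_map(spectrum)))
```

Thresholding the frame-to-frame rise, rather than the absolute level, is what makes the map sensitive to edges; boundaries of such edges can then serve as candidate segment borders.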

The further feature may, however, also comprise the harmonicity (degree of acoustic periodicity) or the correlation of the two microphone signals. Harmonicity makes it easier to recognize speech components directly. The correlation has the advantage that the relationship between the externally audible speech and the speech transmitted via bone conduction can additionally be used to determine the own speech reliably.
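Harmonicity can be quantified in several ways. One common choice, assumed here purely for illustration, is the peak of the normalized autocorrelation of a signal frame over lags that correspond to plausible voice fundamental frequencies. The function name `harmonicity` and the 80 to 400 Hz search range are hypothetical parameters of this sketch.

```python
import numpy as np


def harmonicity(frame: np.ndarray, fs: float,
                f0_min: float = 80.0, f0_max: float = 400.0) -> float:
    """Periodicity of one frame as the peak of its normalized autocorrelation
    over lags corresponding to f0_min..f0_max. Close to 1 for strongly periodic
    (voiced) frames, close to 0 for noise-like frames."""
    x = frame - np.mean(frame)
    energy = np.dot(x, x)
    if energy == 0.0:
        return 0.0
    # Autocorrelation for non-negative lags, normalized so that lag 0 equals 1.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:] / energy
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), len(x) - 1)
    if lag_min > lag_max:
        raise ValueError("frame too short for the requested f0 range")
    return float(np.max(acf[lag_min:lag_max + 1]))


# Usage: a 200 Hz tone versus white noise, 1024 samples at 16 kHz.
fs = 16000.0
t = np.arange(1024) / fs
voiced = np.sin(2 * np.pi * 200.0 * t)  # period of exactly 80 samples
noise = np.random.default_rng(0).standard_normal(1024)
print(harmonicity(voiced, fs), harmonicity(noise, fs))
```

Because this autocorrelation estimate is biased (fewer overlapping samples at larger lags), even a perfect tone scores somewhat below 1; that is acceptable for a relative voiced/unvoiced cue.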

The hearing device designed to estimate a component of the own speech according to the above principles may comprise a filter that is controlled on the basis of the grouping, or of corresponding grouping information, from the signal processing device. The filter then uses the regions of the time-frequency plane defined by the grouping to extract or filter out the corresponding signal components, which presumably originate from the own voice. The segmenting and grouping procedure can be repeated as needed, e.g. each time the hearing aid is switched on. This has the advantage that the filter can be continuously adapted to the current conditions (e.g. the fit of the hearing aid in or on the ear).
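A minimal sketch of such a grouping-controlled filter, assuming the grouping step delivers an integer region label for every t-f point plus the set of region ids assigned to the own voice: the filter then reduces to a time-frequency mask. The names `mask_from_regions` and `apply_region_filter` and the optional `floor_gain` are illustrative assumptions; resynthesis of a time signal (e.g. by an inverse short-time transform with overlap-add) is omitted.

```python
import numpy as np


def mask_from_regions(labels: np.ndarray, own_voice_ids) -> np.ndarray:
    """Boolean t-f mask that is True inside every region assigned to the own voice."""
    # np.isin needs a list or array; a Python set would be treated as one object.
    return np.isin(labels, list(own_voice_ids))


def apply_region_filter(tf_signal: np.ndarray, own_voice_mask: np.ndarray,
                        floor_gain: float = 0.0) -> np.ndarray:
    """Pass t-f points inside the mask unchanged and scale all others by floor_gain."""
    if tf_signal.shape != own_voice_mask.shape:
        raise ValueError("t-f signal and mask must have the same shape")
    gains = np.where(own_voice_mask, 1.0, floor_gain)
    return tf_signal * gains


# Usage: keep region 1, remove everything else.
labels = np.array([[0, 1], [2, 1]])
spectrum = np.full((2, 2), 2.0 + 0.0j)
own_voice = apply_region_filter(spectrum, mask_from_regions(labels, {1}))
print(own_voice)
```

A `floor_gain` above zero attenuates rather than removes the remaining components; whether that is desirable is a design choice of this sketch, not something the text specifies. Inverting the mask would instead suppress the own voice.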

In addition, a hearing device may be provided having a filter that serves to extract the own voice of a wearer of the hearing device and filters out signal components falling into the group of regions obtained in advance by a method described above. The difference from the previous hearing device is therefore that the filter no longer needs to be variable and is thus cheaper to produce.

The hearing device may be designed as an in-the-ear hearing aid. Alternatively, the hearing device may be designed as a behind-the-ear hearing aid having a hearing aid housing to be worn behind the ear and either an external receiver to be worn in the ear canal or a sound tube for transmitting sound from the hearing aid housing into the ear canal, the second microphone being arranged on the external receiver or on the sound tube and the first microphone being arranged in the hearing aid housing. The most common types of hearing aids can thus benefit from the inventive way of estimating the own voice.

The present invention is explained in more detail with reference to the accompanying drawings, in which:

FIG. 1: the basic structure of a hearing aid according to the prior art;
FIG. 2: a cross-section through an ear canal with an inserted hearing aid according to the present invention;
FIG. 3: a block diagram of a CASA algorithm;
FIG. 4: the block diagram of FIG. 3 showing its internal structure;
FIG. 5: a time-frequency diagram with useful-signal regions; and
FIG. 6: a sectional view of an ear with a behind-the-ear hearing aid designed according to the invention.

The exemplary embodiments described in more detail below represent preferred embodiments of the present invention.

FIG. 2 schematically shows an ear canal 10 with an eardrum 11, an ITE hearing aid 12 being inserted into the ear canal 10. The outer ear 13, which is not drawn completely here, is located at the exit of the ear canal 10. When inserted into the ear canal 10, the hearing aid 12 has a side 14 facing the eardrum 11 and an outward-facing side 15 facing away from the eardrum.

The hearing aid 12 has a first microphone 16 on the outward-facing side 15. This microphone 16 is drawn outside the hearing aid 12 only symbolically; in practice, the microphone is normally located inside the hearing aid or at least on its surface.

The first microphone 16 supplies a microphone signal m₁. This first microphone signal m₁ is used for the CASA algorithm described below, but it is also provided to the conventional signal processing device 17 of the hearing aid 12, which often includes an amplifier. The output signal of the signal processing device 17 is forwarded to a loudspeaker or receiver 18 arranged on the side 14 of the hearing aid 12 facing the eardrum 11. The receiver, too, is drawn outside the hearing aid 12 only symbolically and is usually located in the hearing aid housing.

In addition to the first microphone 16, the hearing device or hearing aid 12 here has a second microphone 19. This second microphone 19 is located on the side 14 of the hearing aid 12 facing the eardrum 11. It therefore picks up the sound in the space bounded by the hearing aid 12, the eardrum 11 and the wall of the ear canal 10. In particular, the sound of the wearer's own voice is conducted into this often closed space via bone conduction. The second microphone 19 picks up, among other things, this sound and provides a second microphone signal m₂ in the hearing aid 12. This second microphone 19 can be referred to as an in-canal microphone.

A CASA system 20, shown symbolically in FIG. 3, which may be integrated into the hearing aid 12, estimates the own voice or speech, i.e. the speech of the hearing aid wearer. The CASA system 20 therefore supplies an estimate ν̃ of a component of the own speech.

FIG. 4 shows the CASA system 20 of FIG. 3 in detail. In the CASA system 20, the two microphone signals m₁ and m₂ are fed to an analysis unit 21, which examines each of the microphone signals m₁ and m₂ for specific features. For this purpose, the time-domain signals m₁ and m₂ are transformed into the time-frequency domain, yielding so-called "t-f signals", which can also be referred to as short-time spectra. The transformation can be performed by a high-resolution filter bank. The analysis unit 21 then extracts features for each frequency channel of each of the two microphone signals m₁ and m₂. These features are in particular the phase difference and the level difference between the two microphone signals m₁ and m₂, i.e. the phase and level difference at each point of the t-f plane of the t-f signals. In addition, the analysis unit 21 can extract further features from the microphone signals m₁ and m₂. One of these further features may relate to so-called "onsets", for example rapid changes in a spectrum such as typically occur at the beginning of a vowel. Such onsets usually appear as steep edges in a t-f diagram and are suitable for segmenting the t-f signals.
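The transformation and the per-point phase and level differences can be sketched in Python as follows. A Hann-windowed short-time Fourier transform stands in for the high-resolution filter bank; the frame length, hop size and function names are assumptions made for illustration. Differences are computed for the second (in-canal) t-f signal relative to the first.

```python
import numpy as np


def stft(x: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Hann-windowed short-time spectrum, shape (n_frames, frame_len // 2 + 1)."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def phase_and_level_difference(tf1: np.ndarray, tf2: np.ndarray, eps: float = 1e-12):
    """Phase difference (radians, in [-pi, pi]) and level difference (dB) of tf2
    relative to tf1 at every point of the t-f plane."""
    phase_diff = np.angle(tf2 * np.conj(tf1))
    level_diff = 20.0 * np.log10((np.abs(tf2) + eps) / (np.abs(tf1) + eps))
    return phase_diff, level_diff


# Usage: a toy "in-canal" signal that is simply the outer signal scaled by 2
# shows a level difference of about +6 dB and no phase difference.
rng = np.random.default_rng(0)
m1 = rng.standard_normal(2048)
m2 = 2.0 * m1
tf1, tf2 = stft(m1), stft(m2)
phase_diff, level_diff = phase_and_level_difference(tf1, tf2)
print(tf1.shape, float(np.median(level_diff)))
```

Using the cross-spectrum `tf2 * conj(tf1)` yields the phase difference directly and avoids subtracting two separately wrapped phases.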

A further feature extracted by the analysis unit 21 may be harmonicity, which is understood as the degree of acoustic periodicity. Harmonicity is frequently used to detect speech. As yet another feature, the analysis unit 21 can, for example, examine the correlation of the microphone signals m₁ and m₂. In particular, the correlation between the sound transmitted into the ear canal via bone conduction and the sound reaching the ear from outside can be analyzed. This, too, provides indications of the own speech.
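The correlation feature could, for example, be computed as the peak of the normalized cross-correlation between corresponding frames of m₁ and m₂ within a small lag range. This is one assumed realization; the function name, the lag range and the sign convention for the lag are choices made for this sketch, and the delayed copy in the usage example is a toy signal, not a model of bone conduction.

```python
import numpy as np


def max_normalized_xcorr(m1: np.ndarray, m2: np.ndarray, max_lag: int):
    """Peak of the normalized cross-correlation of two equal-length frames.

    Returns (value, lag): value lies in [-1, 1]; a positive lag means that m2
    is a delayed copy of m1 (m2 lags behind m1 by `lag` samples).
    """
    if len(m1) != len(m2) or not 0 <= max_lag < len(m1):
        raise ValueError("need equal-length frames and 0 <= max_lag < frame length")
    a = m1 - np.mean(m1)
    b = m2 - np.mean(m2)
    norm = np.sqrt(np.dot(a, a) * np.dot(b, b))
    if norm == 0.0:
        return 0.0, 0
    # np.correlate(b, a, "full")[k + len(a) - 1] = sum_n b[n + k] * a[n]
    full = np.correlate(b, a, mode="full") / norm
    lags = np.arange(-max_lag, max_lag + 1)
    values = full[lags + len(a) - 1]
    best = int(np.argmax(np.abs(values)))
    return float(values[best]), int(lags[best])


# Usage: the second frame is the first one delayed by 5 samples.
rng = np.random.default_rng(1)
outer = rng.standard_normal(1000)
in_canal = np.concatenate([np.zeros(5), outer[:-5]])
value, lag = max_normalized_xcorr(outer, in_canal, max_lag=20)
print(round(value, 3), lag)
```

A high peak indicates that both microphones carry strongly related components in that frame, which, together with the phase and level cues, may support assigning it to the own voice.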

The analysis unit 21 is followed on its output side by a segmentation unit 22, which segments the short-time spectra of the microphone signals m₁ and m₂. This means that the segmentation unit 22 calculates boundaries around signal components in the t-f plane such that regions 24 are defined, as shown in FIG. 5. Each region 24 contains t-f signal components of a single sound source. The regions 24 in the t-f plane for individual sources can be calculated in various known ways. Regions that can be assigned to a defined source accordingly contain a source sound component 25. Outside these regions 24 lie interfering sound components 26 that cannot be assigned to a specific source. At the time of segmentation, however, it is not yet known which region 24 belongs to which specific source. The regions 24 in the t-f plane shown in FIG. 5 are formed for both microphone signals m₁ and m₂.
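The patent leaves the segmentation method open ("various known ways"). As a deliberately simple stand-in, the following sketch marks t-f points whose level lies within a threshold of the strongest point and labels each 4-connected group of such points as one region; practical CASA segmenters typically also use cues such as the onsets described above. The function name and the -40 dB threshold are hypothetical.

```python
from collections import deque

import numpy as np


def segment_regions(tf_signal: np.ndarray, threshold_db: float = -40.0):
    """Label 4-connected groups of active t-f points as regions 1, 2, ...

    A point counts as active if its level is within `threshold_db` of the
    strongest point. Returns (labels, n_regions); label 0 means "no region".
    """
    magnitude = np.abs(tf_signal)
    labels = np.zeros(magnitude.shape, dtype=int)
    if not np.any(magnitude > 0):
        return labels, 0
    level_db = 20.0 * np.log10(magnitude + 1e-12)
    active = level_db - level_db.max() > threshold_db
    n_frames, n_bins = active.shape
    n_regions = 0
    for t0 in range(n_frames):
        for f0 in range(n_bins):
            if not active[t0, f0] or labels[t0, f0]:
                continue
            # Start a new region and flood-fill its neighbours (breadth-first).
            n_regions += 1
            labels[t0, f0] = n_regions
            queue = deque([(t0, f0)])
            while queue:
                t, f = queue.popleft()
                for nt, nf in ((t + 1, f), (t - 1, f), (t, f + 1), (t, f - 1)):
                    if (0 <= nt < n_frames and 0 <= nf < n_bins
                            and active[nt, nf] and not labels[nt, nf]):
                        labels[nt, nf] = n_regions
                        queue.append((nt, nf))
    return labels, n_regions


# Usage: two separate blobs of energy yield two regions.
spectrum = np.zeros((5, 6), dtype=complex)
spectrum[0:2, 0:2] = 1.0
spectrum[3:5, 4:6] = 1.0
labels, n_regions = segment_regions(spectrum)
print(n_regions)
print(labels)
```

Points left at label 0 play the role of the unassigned interfering components outside the regions.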

The segmentation device 22 is followed by a grouping device 23. In the grouping device 23 of a general CASA system, the segmented signal components, i.e. the signal components 25 in the regions 24, are organized into signal streams that are assigned to the different sound sources. In the present case, only those signal components that belong to the hearing aid wearer's own speech are synthesized into a signal stream. Any regions 24 of the t-f plane can be combined with one another during grouping.

Grouping uses the phase-difference and level-difference information. Before this information can decide whether a region belongs to the wearer's own voice, a model must first be used to estimate the phase difference and level difference between the two microphone signals. These estimates then determine whether a given segmented region belongs to the own voice. If the region's measured phase and level differences lie within a predetermined tolerance range around the estimated phase and level differences, the region is attributed to the own voice.
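The tolerance test can be written out as follows. Let $X_1(t,f)$ and $X_2(t,f)$ denote the t-f signals of $m_1$ and $m_2$, let $R$ be one segmented region 24, and let $G$ be the group of accepted regions. The following choices are assumptions, not taken from the source: aggregating each region through its energy-weighted cross-spectrum, a per-frequency phase estimate $\widehat{\Delta\varphi}(f)$, a single level estimate $\widehat{\Delta L}$, and tolerance half-widths $\varepsilon_\varphi$, $\varepsilon_L$. Under these choices, the criterion reads:

```latex
\begin{aligned}
\delta\varphi_R &= \arg\Big(\sum_{(t,f)\in R} X_1(t,f)\,X_2^{*}(t,f)\,e^{-j\widehat{\Delta\varphi}(f)}\Big),\\
\Delta L_R &= 10\log_{10}\frac{\sum_{(t,f)\in R}\lvert X_1(t,f)\rvert^{2}}{\sum_{(t,f)\in R}\lvert X_2(t,f)\rvert^{2}}\ \text{dB},\\
R \in G \;&\Longleftrightarrow\; \lvert\delta\varphi_R\rvert \le \varepsilon_\varphi \ \text{and}\ \lvert\Delta L_R - \widehat{\Delta L}\rvert \le \varepsilon_L .
\end{aligned}
```

Here $\delta\varphi_R$ is the region's phase deviation from the model. Because $\arg$ returns a value in $(-\pi, \pi]$, phase wrapping is handled automatically.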

Whether a region 24 is grouped with one or more other regions 24 thus depends on the phase difference and level difference between the two microphone signals m₁ and m₂. The additional features listed above can also be used for grouping. A group formed in this way represents all parts of a short-term spectrum that are to be combined. This extracts only the wearer's own speech, or own voice, from the multitude of sound components, while the remaining signal components in the short-term spectrum are suppressed.
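The sketch below shows the grouping step. It assumes that the regions arrive as an integer label map over the t-f plane (0 = unassigned). It also assumes a hypothetical model that supplies a per-bin phase-difference estimate `phi_hat` and a single level-difference estimate `level_hat_db`. The tolerances `tol_phase` and `tol_level_db` are illustrative defaults, because the source only requires "a predetermined tolerance range". The result is a boolean mask that marks the union of all accepted regions, i.e. the group.

```python
import numpy as np

def group_own_voice(X1, X2, labels, n_regions, phi_hat, level_hat_db,
                    tol_phase=0.3, tol_level_db=3.0):
    """Return a boolean t-f mask covering every region whose phase and level
    differences lie within the tolerances around the model estimates.

    X1, X2      : complex t-f signals (frames x bins) of m1 and m2
    phi_hat     : estimated phase difference per frequency bin in rad (assumed model)
    level_hat_db: estimated level difference of m1 relative to m2 in dB (assumed)
    """
    # Cross-spectrum with the model phase removed bin by bin.
    cross = X1 * np.conj(X2) * np.exp(-1j * phi_hat)
    mask = np.zeros(labels.shape, dtype=bool)
    for r in range(1, n_regions + 1):
        in_r = labels == r
        phase_dev = np.angle(cross[in_r].sum())  # wrapped to (-pi, pi]
        level_db = 10 * np.log10(np.sum(np.abs(X1[in_r]) ** 2)
                                 / np.sum(np.abs(X2[in_r]) ** 2))
        if abs(phase_dev) <= tol_phase and abs(level_db - level_hat_db) <= tol_level_db:
            mask |= in_r  # region joins the own-voice group
    return mask
```

Further features, such as harmonicity, could be added as extra conditions inside the loop. The sketch keeps only the two criteria named in the text.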

Once the regions 24 in the t-f plane that belong to the wearer's own speech have been identified, they can be used to perform t-f filtering. To do so, the grouping device 23 forwards the corresponding grouping information to a filter 27 in the CASA system 20, so the filter 27 is controlled, or parameterized, by the grouping information. The filter 27 receives the time-domain microphone signals m₁ and m₂ and filters both signals. From them it derives an estimate of the own voice, i.e. a component ν̃ of the own voice. To reconstruct the own voice, the filter can use the signal components of the regions 24 from the t-f signals of both microphones, or only those from the t-f signal of one microphone.
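The t-f filter 27 can be pictured as a binary mask applied to the short-term spectrum, followed by resynthesis. The sketch below assumes a square-root periodic Hann window at 50 % overlap for both analysis and synthesis. With this choice, an all-pass mask reconstructs the input exactly away from the signal edges. The sketch filters the signal of a single microphone, which corresponds to the one-microphone variant described above. Names and frame sizes are illustrative.

```python
import numpy as np

N_FFT, HOP = 256, 128
# Square root of a periodic Hann window, used for analysis AND synthesis:
# their product is a Hann window, which sums to exactly 1 at 50 % overlap.
WIN = np.sqrt(np.hanning(N_FFT + 1)[:-1])

def stft(x):
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, length):
    """Weighted overlap-add resynthesis."""
    frames = np.fft.irfft(X, n=N_FFT, axis=1) * WIN
    y = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * HOP:i * HOP + N_FFT] += frame
    return y

def tf_filter(m, mask):
    """Keep the t-f cells selected by the grouping mask, zero all others."""
    X = stft(m)
    return istft(np.where(mask, X, 0.0), len(m))
```

A hard 0/1 mask is the simplest choice. Soft masks with gains between 0 and 1 are a common alternative that reduces audible artifacts, but the source does not specify either.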

The two microphone signals m₁ and m₂ come from the specifically positioned microphones 16 and 19. Segmenting and grouping these signals yields a dedicated filter, or dedicated filter information, that can filter the own voice out of a listening situation with multiple sound sources. A dedicated signal model of the wearer's own speech is therefore unnecessary.

The system according to the invention typically has a processing delay of a few hundred milliseconds. This delay is needed to extract the features and to group the regions. In practice, however, it does not pose a problem.

FIG 5 shows a further embodiment of the hardware design of a hearing aid according to the invention. Here the hearing aid is a behind-the-ear (BTE) hearing aid whose main component 28 is worn behind the ear, in particular behind the pinna 29. This BTE hearing aid has a first microphone 30 on the main component 28. It also has a so-called external receiver 31, which is mounted in the ear canal 32. A second microphone 33 is mounted in the ear canal 32 together with the external receiver 31. The extraction or estimation of a component of the own voice according to the invention can therefore also be used in a BTE hearing aid.

In the hearing device according to the invention, the CASA principle can thus be used for the first time to register or extract the wearer's own voice. This is possible because the special placement of the microphones now provides sufficient spatial information in the signals. Corresponding grouping information can be derived from this spatial information, so complex speech models can ultimately be dispensed with.

Claims

A method for estimating a component of the own voice of a wearer of a hearing device (12), characterized by:

Placing a first microphone (16, 30) of the hearing device at the entrance of the ear canal (10) of one of the wearer's ears or outside the ear canal,

Placing a second microphone (19, 33) of the hearing device in the ear canal, so that the second microphone is closer to the eardrum (11) of the ear than the first microphone,

Estimating a phase difference and a level difference between virtual microphone signals of the two microphones using a predetermined model,

Obtaining a microphone signal (m₁, m₂) from each of the two microphones,

Transforming each of the two time-domain microphone signals into a t-f signal in the time-frequency plane,

Segmenting the time-frequency plane into several regions (24),

Determining, for each of the regions, a region phase difference and a region level difference of one of the two t-f signals relative to the other of the two t-f signals, and

Grouping into one group all those regions (24) of the time-frequency plane whose region phase difference substantially coincides with the estimated phase difference and whose region level difference substantially coincides with the estimated level difference, the signal components (25) of the group being taken as the estimate of the component of the wearer's own voice.

Method according to claim 1, wherein for each of the microphone signals (m₁, m₂) at least one further feature, different from the phase difference and the level difference, is obtained and used for the segmentation and/or grouping.

The method of claim 2, wherein the further feature relates to a change or a rate of change in the spectrum of the microphone signals (m₁, m₂).

The method of claim 2, wherein the further feature comprises the harmonicity or the correlation of the two microphone signals (m₁, m₂).

Hearing device (12) for carrying out the method according to any one of the preceding claims, wherein the hearing device comprises the two microphones (16, 19, 30, 33) and a signal processing device (20) for transforming, segmenting and grouping.

Hearing device according to claim 5, comprising a filter (27) that is controlled by the grouping performed by the signal processing device.

Hearing device according to claim 5 or 6, which is designed as an in-the-ear hearing aid.

Hearing device according to claim 5 or 6, which is designed as a behind-the-ear hearing aid. It has a hearing aid housing to be worn behind the ear, and either an external receiver (31) to be worn in the ear canal (32) or a sound tube for transmitting sound from the hearing aid housing into the ear canal. The second microphone (33) is arranged on the external receiver or the sound tube, and the first microphone (30) is arranged in the hearing aid housing.