DE10156954B4

DE10156954B4 - Image-based adaptive acoustics

Info

Publication number: DE10156954B4
Application number: DE2001156954
Authority: DE
Inventors: Martin Dipl.-Geophys. Fritzsche; Alfred Dr. Kaltenmeier; Klaus Dr.-Ing. Linhard; Otto Dipl.-Ing. Löhlein; Joachim Dipl.-Inform. Gloger; Tilo Dr. Schwarz
Original assignee: DaimlerChrysler AG
Current assignee: Mercedes Benz Group AG
Priority date: 2001-11-20
Filing date: 2001-11-20
Publication date: 2004-12-23
Anticipated expiration: 2021-11-21
Also published as: DE10156954A1; DE10156954B9

Abstract

Visual-acoustic arrangement for audio reproduction, voice input and communication between several participants,
in which microphone and/or loudspeaker arrangements are individually adapted for the participants,
and which includes a camera arrangement with image processing by which at least a part of the area in which participants can be located is detected,
characterized in that a means is provided for aligning the microphone and/or loudspeaker arrangements with at least one of the participants on the basis of the image processing data.

Description

The invention relates to an arrangement and a method for image-based adaptive acoustics according to the preambles of claims 1 and 9.

The invention is used in communication systems, in particular in vehicles.

In a vehicle, the seats are permanently installed, so the positions of possible occupants can be determined relatively well. Another special feature is that the driver plays a special role among the occupants. However, with conventional acoustic means it is not possible to detect a listener if the listener is passive, i.e. not speaking.

As a result, sound may be played to a seat even though no occupant is present. This leads to unnecessary acoustic disturbance.

US Pat. No. 5,901,978 specifies a method and an apparatus for detecting a child seat, in which, among other things, the seat occupancy is detected by pattern recognition methods and this information is used to control, for example, the seat adjustment, the airbag system and the entertainment system. Visual and acoustic methods are used. Targeted noise reduction is not provided in this system.

DE 69101527 T2 discloses a speech recognition device for use in a vehicle. The device serves to replace manual operations normally performed by a driver. In particular, it relates to a speech recognition apparatus that localizes commands coming from the driver's seat and the front passenger's seat and increases the speech recognition rate in a noisy vehicle compartment, thereby improving the reliability of the apparatus.

DE 19962218 A1 describes a system for authorizing voice commands. Voice commands are authorized by assigning predetermined voice commands to predetermined locations. A person speaking the commands must be at these locations for the corresponding voice command to be executed. The voice command is captured by a microphone. At the same time, a camera assigned to the predetermined location captures the mouth movement of the person speaking there, and the command is released for execution only if the mouth movement correlates with the audio signal captured by the microphone.
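The correlation test attributed to DE 19962218 A1 can be pictured with a minimal Python sketch. It assumes that the camera delivers one mouth-opening value per frame and the microphone one energy value per frame at the same rate; the function names and the 0.6 threshold are hypothetical and are not taken from the cited document.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # A constant sequence (e.g. a motionless mouth) carries no evidence.
        return 0.0
    return cov / math.sqrt(vx * vy)


def command_released(mouth_opening, audio_energy, threshold=0.6):
    """Release a voice command only if mouth movement tracks the audio envelope.

    mouth_opening: per-frame mouth-opening measure from the seat camera
    audio_energy:  per-frame energy of the microphone signal, same frame rate
    threshold:     hypothetical decision threshold
    """
    return pearson(mouth_opening, audio_energy) >= threshold
```

A talker whose mouth is still while the microphone picks up speech (for example, speech from another seat) yields a correlation near zero, so the command is not released.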

DE 199 48 546 A1 discloses a method and an apparatus for verifying authorization to use a communication system. The facial features of a communication participant are recognized and compared with a stored reference image. A speech sample is also stored with the reference image and is compared with an identification phrase. On the basis of these comparisons, it is determined whether a current participant is authorized to use the system.

The object of the invention is to provide a device and a method in which the detection of voice activity is simplified, the adaptation based on the acoustic signal is improved by the image, and recognition errors are avoided in the presence of strong and/or non-stationary noise, in particular in vehicles.

The invention relating to the arrangement is described in claim 1, and the invention relating to the method in claim 9. Advantageous embodiments and further developments are given in the dependent claims.

Microphone/loudspeaker arrangements according to the invention are suitable for better adapting multiple microphones or microphone arrays, so that the array is initialized to the speaker even before voice activity. If the acoustic information and the image information are fused, the speaker can be recognized and identified even when not speaking. This is advantageous for hands-free calling, for speech recognition and for occupant communication systems, in particular in vehicles.

The invention is described below with reference to exemplary embodiments.

Because the seats are predefined, distributed microphone arrays are used in addition to the usual acoustic pickup, e.g. a single microphone or a microphone array, for capturing the speakers' speech signals. For each possible occupant, a microphone is placed at his seat position near his mouth position, so several individual occupant microphones are used. To further increase the acoustic quality, each of these occupant microphones is replaced by a microphone array. This forms an arrangement of several individual occupant microphone arrays.

This has the advantage that the speaker is easy to detect: because each individual microphone is close to its occupant, the loudest microphone signal, or the loudest microphone array signal, is guaranteed to identify the active speaker.
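A minimal sketch of the loudest-microphone rule, assuming one block of samples per seat microphone (or per beamformed array output) is available at a time. The seat names, the RMS level measure and the silence threshold `min_level` are illustrative assumptions, not part of the patent.

```python
import math


def rms(frame):
    """Root-mean-square level of one block of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def active_speaker(seat_frames, min_level=0.01):
    """Return the seat whose microphone is loudest in this frame, or None.

    seat_frames: dict mapping a seat name to the samples of its occupant
                 microphone (or array output) for the current frame
    min_level:   hypothetical silence threshold; below it nobody is speaking
    """
    if not seat_frames:
        return None
    levels = {seat: rms(frame) for seat, frame in seat_frames.items()}
    seat, level = max(levels.items(), key=lambda item: item[1])
    return seat if level >= min_level else None
```

The threshold keeps the rule from naming a "speaker" when every seat is silent and only background noise is present.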

Another advantage of using individual occupant microphone arrays is that the angle of the speaker relative to the array can be determined, so that the speaker position is determined more accurately.
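The speaker angle relative to an array can be estimated, for example, from the time difference of arrival between two microphones of the array. The sketch below uses a brute-force cross-correlation search and the far-field relation sin(theta) = c * tau / d. This is one common technique chosen for illustration, not necessarily the one intended by the patent, and the 343 m/s speed of sound is an approximation.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at about 20 °C


def best_lag(a, b, max_lag):
    """Lag in samples by which b trails a, chosen to maximize cross-correlation."""
    def xcorr(lag):
        return sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=xcorr)


def arrival_angle_deg(mic_a, mic_b, spacing_m, sample_rate_hz):
    """Talker angle relative to the broadside of a two-microphone pair.

    0 degrees means straight ahead; positive angles mean the sound reached
    mic_a first, i.e. the talker is on mic_a's side.
    """
    # The physical delay can never exceed spacing / speed of sound.
    max_lag = int(math.ceil(spacing_m / SPEED_OF_SOUND_M_S * sample_rate_hz))
    tau_s = best_lag(mic_a, mic_b, max_lag) / sample_rate_hz
    # Clamp against rounding before taking arcsin: sin(theta) = c * tau / d.
    s = max(-1.0, min(1.0, SPEED_OF_SOUND_M_S * tau_s / spacing_m))
    return math.degrees(math.asin(s))
```

With integer lags the angular resolution is coarse (about 12 degrees per sample step near broadside at 0.2 m spacing and 8 kHz); sub-sample interpolation or a higher sample rate would refine it.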

It is further advantageous that, by linking the individual occupant microphones or arrays with speaker verification, the identity of the individual occupants and their seats can be recognized.
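One way to picture the link between seat microphones and speaker verification: each seat's voice is turned into an embedding vector by some speaker model (assumed here, not specified by the patent) and compared with enrolled reference vectors by cosine similarity. The function names, vectors and the 0.8 threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally long vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def identify_seat_occupants(seat_embeddings, enrolled, threshold=0.8):
    """Map each seat to the best-matching enrolled person, or None.

    seat_embeddings: seat -> voice embedding computed from that seat's microphone
    enrolled:        person name -> reference embedding stored at enrollment
    threshold:       hypothetical minimum similarity for a positive verification
    """
    result = {}
    for seat, embedding in seat_embeddings.items():
        best_name, best_score = None, threshold
        for name, reference in enrolled.items():
            score = cosine(embedding, reference)
            if score >= best_score:
                best_name, best_score = name, score
        result[seat] = best_name
    return result
```

Because each microphone belongs to one seat, a successful verification yields both the identity and the seat in one step.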

By linking the individual occupant microphones or arrays with a speech recognizer, the occupants can advantageously carry out voice-operated operations (telephone operation, radio operation and the like).

Because the seats are predefined, a distributed loudspeaker array is used in addition to the usual single loudspeaker system for all occupants. In it, a loudspeaker is placed for every possible occupant at his seat position near his ear position. This results in an arrangement of several individual occupant loudspeakers or arrays. Using occupant loudspeaker arrays increases the acoustic quality, and an arrangement of several individual occupant loudspeaker arrays is formed.

If recognition by image processing is introduced in addition to acoustic recognition, image recognition contributes the following advantages to occupant recognition:

a) Image processing recognizes how many occupants are present, i.e. which seats are occupied.
b) Image processing recognizes the head position of the occupants, their ears and their mouth.
c) Image processing recognizes the occupants themselves (occupant identification).

Fusing speech and image removes the drawback that speech alone can identify an occupant only while the occupant is speaking. With the additional identification via the image, the occupant can always be identified.
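The fusion argument can be sketched as a per-seat rule: the voice-based identity is used when the occupant speaks, and the face-based identity fills the gap otherwise. How conflicting results are resolved is not specified in the text; returning None on a conflict is an assumption made only for this illustration.

```python
def fuse_identity(voice_id, face_id):
    """Fuse speaker identification with face identification for one seat.

    voice_id is only available while the occupant speaks; face_id is available
    whenever the camera sees the occupant. Either may be None.
    """
    if voice_id is not None and face_id is not None and voice_id != face_id:
        # Hypothetical policy: contradictory evidence leaves the seat unresolved.
        return None
    return voice_id if voice_id is not None else face_id


def fuse_seat_identities(voice_ids, face_ids):
    """Apply fuse_identity to every seat reported by either modality."""
    seats = set(voice_ids) | set(face_ids)
    return {seat: fuse_identity(voice_ids.get(seat), face_ids.get(seat)) for seat in seats}
```

A silent passenger who is visible to the camera is still identified, which is exactly the gap the text says speech alone leaves open.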

In particular, voice input with microphone arrays requires the microphones to be adaptively aligned with the speaker, specifically with the speaker's mouth. Using the acoustic signal, this alignment takes place when the occupant speaks.
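Aligning a microphone array with the mouth can be illustrated with delay-and-sum beamforming, in which the mouth position supplied by image processing sets the steering delays, so the array can be pointed before any speech is heard. This minimal sketch assumes microphone and mouth coordinates in one vehicle frame in metres and integer-sample delays; delay-and-sum is a standard beamformer chosen for illustration, not a technique named in the patent.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at about 20 °C


def steering_delays(mic_positions, mouth_position, sample_rate_hz):
    """Integer sample delays that time-align sound from mouth_position across mics.

    Positions are (x, y, z) in metres in a common vehicle frame. The farthest
    microphone gets delay 0; nearer ones, which hear the talker earlier, are
    delayed so that all channels line up.
    """
    dists = [math.dist(m, mouth_position) for m in mic_positions]
    farthest = max(dists)
    return [round((farthest - d) / SPEED_OF_SOUND_M_S * sample_rate_hz) for d in dists]


def delay_and_sum(signals, delays):
    """Delay each channel by its integer delay and average the channels."""
    n = len(signals[0])
    out = [0.0] * n
    for sig, d in zip(signals, delays):
        for i in range(d, n):
            out[i] += sig[i - d]
    return [x / len(signals) for x in out]
```

Sound from the mouth adds coherently after alignment, while sound from other directions does not, which is what gives the array its directivity.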

Combining speech and image yields the following advantages:

- Initialization of the microphone array before the occupant speaks. As a result, good speech quality is available from the onset of speech. Voice activity detection is also simplified, because the speech detection algorithm of a microphone array is simpler than the image detection algorithm.
- During speech activity, the adaptation based on the acoustic signal is improved by the image.
- Individual microphones or microphone arrays for unoccupied seats are switched off, which eliminates false speech detections. With strong and/or non-stationary noise in the vehicle, switching off a microphone gives a clear advantage: there are fewer recognition errors caused by noise from the microphone system of an unoccupied seat being passed to the speech recognizer.
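Switching off the microphones of empty seats can be sketched as a gate in front of the speech recognizer, assuming image processing reports the set of occupied seats. Summing the remaining channels is a simplification chosen for this example; a real system might instead pass on only the active speaker's channel.

```python
def gate_microphones(seat_frames, occupied_seats):
    """Silence the microphones of unoccupied seats.

    seat_frames:    seat -> samples of that seat's microphone for this frame
    occupied_seats: set of seats the image processing reports as occupied
    """
    return {
        seat: list(frame) if seat in occupied_seats else [0.0] * len(frame)
        for seat, frame in seat_frames.items()
    }


def recognizer_feed(seat_frames, occupied_seats):
    """Sum of the gated channels, i.e. what is passed on to the speech recognizer."""
    gated = gate_microphones(seat_frames, occupied_seats)
    n = max((len(f) for f in gated.values()), default=0)
    return [sum(f[i] for f in gated.values() if i < len(f)) for i in range(n)]
```

Noise picked up at an empty rear seat, such as wind or road noise, never reaches the recognizer, regardless of how loud it is.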

Playback with individual loudspeaker systems requires alignment with the listener's ears. Using the acoustic signal, i.e. the microphone signal, the loudspeakers are aligned when the occupant speaks.

Combining speech and image yields the following advantages for audio output:

- If the listener is listening and not speaking, head/ear detection is performed by image processing alone.
- Unoccupied seats are not sounded. This avoids unnecessary acoustic disturbance of the other occupants.
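For audio output, the same image data can drive the loudspeakers. This minimal sketch assumes each seat has a small loudspeaker array and the camera supplies an ear position for each occupied seat; it computes playback delays so that all wavefronts reach the ear at the same time and simply omits seats without a detected head. The names, the coordinate frame and the integer-sample delays are assumptions for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at about 20 °C


def focus_delays(speaker_positions, ear_position, sample_rate_hz):
    """Per-loudspeaker playback delays (samples) so all wavefronts reach the ear together.

    The farthest loudspeaker plays first (delay 0); nearer ones wait.
    """
    dists = [math.dist(s, ear_position) for s in speaker_positions]
    farthest = max(dists)
    return [round((farthest - d) / SPEED_OF_SOUND_M_S * sample_rate_hz) for d in dists]


def playback_plan(seat_speakers, ear_positions, sample_rate_hz):
    """Delays for every seat whose head/ears the camera currently sees.

    Seats missing from ear_positions are treated as unoccupied and get no
    output at all, so they are not sounded.
    """
    plan = {}
    for seat, speakers in seat_speakers.items():
        ear = ear_positions.get(seat)
        if ear is None:
            continue
        plan[seat] = focus_delays(speakers, ear, sample_rate_hz)
    return plan
```

Since the ear position comes from the camera, the plan can be recomputed each video frame to follow head movement even while the listener is silent.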

With the arrangements according to the invention,

1) individual audio playback systems are given the information which seats are occupied. Unoccupied seats are not sounded. For occupied seats, the image information is used to follow the head/ears.
2) individual voice input systems are given the information which seats are occupied, so that the microphones of the unoccupied seats can be switched off.

The arrangement according to the invention can also be used in occupant communication systems with speech recognition and speaker recognition, in particular in vehicles. The system of speech and image processing recognizes the seat occupancy, i.e. the names of the persons sitting in the seats. The persons are recognized by speaker identification and/or face identification. Using speech recognition, an occupant might then say, for example: "I want to talk to Peter." The system recognizes from which seat this is spoken and also recognizes Peter's seat. Only the loudspeaker-microphone system between these two persons is then activated; the other persons are not connected and are therefore not disturbed.
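The "talk to Peter" example reduces to a lookup and a switching decision. The sketch assumes that identification has already produced a name-to-seat map and that the speech recognizer has extracted the requested name; both interfaces, and the function names, are hypothetical.

```python
def route_call(seat_of, requester_seat, target_name):
    """Return the pair of seats to connect, or None if the call cannot be routed.

    seat_of:        person name -> seat, from speaker and/or face identification
    requester_seat: seat from which the request was spoken
    target_name:    name extracted from the utterance by the speech recognizer
    """
    target_seat = seat_of.get(target_name)
    if target_seat is None or target_seat == requester_seat:
        return None
    return {requester_seat, target_seat}


def channel_states(all_seats, link):
    """Only the loudspeaker-microphone systems of the two linked seats are switched on."""
    return {seat: link is not None and seat in link for seat in all_seats}
```

Every seat outside the link stays switched off, so the other occupants are not disturbed by the conversation.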

If the vehicle is equipped with monitors at the individual seats, the speaker's face can be sent to the listener or listeners.

In a video conference with participants outside the vehicle, the image of the current speaker is also transmitted. If every seat is equipped with a monitor, the participants inside the vehicle each see the current speaker on their individual monitor.
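The monitor behaviour described above can be sketched as a routing table, assuming the active speaker is already known, either as an in-vehicle seat or as an identifier of an external participant. Leaving the speaker's own monitor unchanged is an illustrative choice, not something the text specifies.

```python
def monitor_feeds(monitor_seats, active_speaker):
    """Which participant's face each seat monitor should show.

    monitor_seats:  seats that have a monitor
    active_speaker: seat of the in-vehicle speaker, or an identifier of an
                    external video-conference participant (None if nobody speaks)
    The speaker's own monitor gets None, i.e. it keeps its current picture.
    """
    return {
        seat: None if seat == active_speaker else active_speaker
        for seat in monitor_seats
    }
```

An external participant never matches a seat, so every in-vehicle monitor shows that participant while they speak.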

The invention is not limited to the embodiments described; it can be used in conference systems of any kind.

Claims

A visual-acoustic arrangement for audio playback, voice input and multi-party communication, in which microphone and/or loudspeaker arrangements are individually adapted for the participants and which includes a camera arrangement with image processing through which at least a portion of the area in which participants are located can be detected, characterized in that a means is provided for aligning the microphone and/or loudspeaker arrangements with at least one of the participants on the basis of the image processing data.

Visual-acoustic arrangement according to claim 1, characterized in that the arrangement is installed in a vehicle with predefined seats.

Visual-acoustic arrangement according to claims 1 and 2, characterized in that, for recording the speech signals of the participants, distributed microphone arrays are mounted such that at least one microphone or microphone array is placed for every possible participant at his position near his mouth position.

Visual-acoustic arrangement according to claims 1 and 2, characterized in that a distributed loudspeaker array is built for all participants, in which a loudspeaker or loudspeaker array is placed for each potential participant at his position near his ear position.

Visual-acoustic arrangement according to one of the preceding claims, characterized in that the microphone arrangement is connected to a speech recognition and/or speaker identification system.

Visual-acoustic arrangement according to one of the preceding claims, characterized in that the microphone/loudspeaker arrays of each participant can be switched on and off individually.

Visual-acoustic arrangement according to one of the preceding claims, characterized in that, in addition to the acoustic detection of the participants, detection by image processing is carried out to determine the head position, the ears and the mouth and to identify the participants.

Visual-acoustic arrangement according to one of the preceding claims, characterized in that a monitor on which the participants are visible can be switched on for each participant.

Visual-acoustic method for audio playback, voice input and communication between several participants, in which microphone and/or loudspeaker arrangements are individually adaptable for the participants, and in which a camera arrangement with image processing detects at least part of the area in which participants are located, characterized in that, on the basis of the recognition of the participants, the microphone and/or loudspeaker arrangements are directed at at least one of the participants.

Method according to claim 9, characterized in that, by means of the distributed, individual arrangement, the active speaker is defined by the loudest microphone signal or the loudest microphone array signal.

Method according to claim 9, characterized in that, when individual occupant microphone arrays are used, the angle of the speaker relative to the array is determined and the speaker position is thereby determined more accurately.

Method according to one of the preceding claims, characterized in that the identity of the individual participants and their positions are recognized by linking individual occupant microphones (arrays) with speaker verification.

Method according to one of the preceding claims, characterized in that voice-operated operations are performed by the participants by linking individual occupant microphones or arrays with a speech recognizer.

Method according to one of the preceding claims, characterized in that, by combining the acoustic participant recognition with the image processing, the microphone array is initialized before the participant speaks.

Method according to one of the preceding claims, characterized in that, during speech activity, the adaptation based on the acoustic signal is improved by the image.

Method according to one of the preceding claims, characterized in that the loudspeakers or loudspeaker arrays are aligned by means of the acoustic signal when the participant speaks.

Method according to one of the preceding claims, characterized in that, when the listener is listening and not speaking, the head/ear detection is performed by image processing alone.

Method according to one of the preceding claims, characterized in that unoccupied seats are recognized by means of image processing, and in that the individual microphones or microphone arrays for unoccupied seats are switched off, thereby eliminating false speech detections.

Method according to one of the preceding claims, characterized in that the array is initialized to at least one participant even before that participant's voice activity.