DE4216455C2

DE4216455C2 - Voice control device

Info

Publication number: DE4216455C2
Application number: DE19924216455
Authority: DE
Inventors: Keiichi Miyamoto
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1991-05-20
Filing date: 1992-05-19
Publication date: 1994-02-10
Anticipated expiration: 2012-05-20
Also published as: DE4216455A1

Description

The invention relates to a voice control device according to the preamble of claim 1.

Such a voice control device is known from DE 32 38 853 A1. This known voice control device comprises a voice-operated system that can be operated by a voice command recognized by speech recognition, the system being provided with a number of manually operable control switches for triggering operations of the system. The known voice control device further comprises a feature extraction means for extracting a feature pattern from a voice input; a storage means for storing standard patterns, which are feature patterns extracted from a group of standard voice commands; a similarity checking means for calculating a degree of similarity of the extracted feature pattern in relation to each of the stored standard patterns; and a selection means for selecting the voice command recognized from the voice input, namely the voice command having the highest degree of similarity with respect to the stored standard patterns. Finally, at least one input switch is provided for entering a new voice command into the storage means.

A similar speech recognition system has also already been developed, one application of which is a voice input device used in personal computers, word processors, and other computer systems. In general, the voice inputs identified by the speech recognition system are discrete utterances spoken by a user into a microphone of the system, and most voice inputs represent either a control command for operating a voice-operated system or a control object identified by a computer system. As in speech recognition generally, recognition accuracy and recognition speed are important characteristics. As the number of vocabulary words that the speech recognition system must recognize increases, recognition accuracy decreases and the required recognition time increases. To obtain high recognition accuracy and speed, the speech recognition system should use the minimum number of vocabulary words necessary for the respective recognition stage that the system has entered.

For example, a voice control device for operating a car stereo unit in a motor vehicle contains the speech recognition system in order to recognize speech input by a user as a command for operating the unit, and thus to carry out operations of the car stereo unit automatically. The device likewise switches an air conditioning system in the motor vehicle on and off automatically, and automatically opens and closes the window panes in the vehicle doors.

For such an application, a minimum number of vocabulary words should be used, namely those that can be recognized in the respective recognition stage encountered by the voice control system.

Here, a command dictionary, in which the vocabulary words to be compared with voice inputs are stored, is divided by category into several groups (referred to below as sub-dictionaries). Each sub-dictionary is identified by an identification (ID) number. Such ID numbers are assigned to the vocabulary words in the dictionary, and the vocabulary words must therefore be entered into the dictionary of the voice control system together with their assigned ID numbers so that they are programmed. When a user speaks a command into the microphone of the voice control system, an ID number corresponding to the command is specified before speech recognition is started. The voice control system then preselects from the dictionary the subset of vocabulary words whose assigned ID number matches the specified ID number. The voice input is recognized by comparing it only with the vocabulary words contained in the sub-dictionary specified by the ID number. This allows the voice control system to use only a narrow region of the dictionary.

Fig. 7 shows a command dictionary of the aforementioned type, provided in a conventional voice control system for operating a car stereo unit. This dictionary is divided into three sub-dictionaries according to the operations of the car stereo unit, each of which is identified by an ID number. The car stereo unit has three operating modes, and the sub-dictionaries correspond to the individual operating modes. In Fig. 7, the same ID number is assigned to the vocabulary words within each sub-dictionary, and words with the same ID number are voice commands that can be used effectively only in the corresponding operating mode.

In accordance with the operating modes of the car stereo unit, the voice control system has three recognition stages: a "radio" stage, a "cassette" stage, and a "radio/cassette" stage. The spoken words that the voice control system can correctly accept in the "radio" stage are limited to "scan" and "cassette" in the dictionary, and ID=1 is assigned to these voice commands. The spoken words in the "cassette" stage are limited to "play", "fast forward", "rewind", and "radio" in the dictionary, and ID=2 is assigned to these commands. The spoken words of the "radio/cassette" stage are not restricted in this way: "volume up" and "volume down" can be recognized in any of the three recognition stages. ID=0 is assigned to such commands, that is, to the voice commands corresponding to the "radio/cassette" stage, and is referred to below as the global ID number.
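The preselection by ID number described for the conventional dictionary of Fig. 7 can be sketched as follows. This is only an illustrative model: the names `COMMAND_DICTIONARY` and `preselect` are hypothetical, and the dictionary is reduced to a word-to-ID mapping without any stored feature patterns.

```python
# Hypothetical model of the conventional command dictionary of Fig. 7.
# Each vocabulary word carries the ID number of its sub-dictionary.
GLOBAL_ID = 0    # "radio/cassette" stage: usable in every stage
RADIO_ID = 1     # "radio" stage
CASSETTE_ID = 2  # "cassette" stage

COMMAND_DICTIONARY = {
    "scan": RADIO_ID,
    "cassette": RADIO_ID,
    "play": CASSETTE_ID,
    "fast forward": CASSETTE_ID,
    "rewind": CASSETTE_ID,
    "radio": CASSETTE_ID,
    "volume up": GLOBAL_ID,
    "volume down": GLOBAL_ID,
}


def preselect(stage_id):
    """Return the words that may be compared with a voice input in a stage.

    A word qualifies if its ID equals the stage ID or the global ID,
    because global commands are recognizable in any stage.
    """
    allowed_ids = {GLOBAL_ID, stage_id}
    return [word for word, word_id in COMMAND_DICTIONARY.items()
            if word_id in allowed_ids]


radio_words = preselect(RADIO_ID)
cassette_words = preselect(CASSETTE_ID)
```

In this sketch the "radio" stage compares a voice input against four words instead of eight, which is the narrowing of the dictionary that the text describes.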

When a voice input is given in the "radio" stage, the voice control system specifies an ID number equal to either 0 or 1 and preselects from the command dictionary the subset of vocabulary words whose assigned ID number matches the specified ID number. The voice input is recognized by comparing it only with the words in the sub-dictionary specified by the ID number. In the "cassette" stage, the ID number is specified as either 0 or 2.

However, a difficulty arises in that a command entry process (or training mode of the speech recognition system) that requires ID numbering is burdensome for the user. If the ID number is fixed by the voice control system, the user cannot freely choose the voice command when the voice-operated system is working in a particular operating mode. In a speaker-independent speech recognition system, a sequence of necessary operations and a command dictionary are preprogrammed. A user can use such a speech recognition system without having to enter commands spoken by himself into the dictionary. However, the number and kind of words usable for identifying the operations performed by the system are limited and fixed, and the user cannot freely choose a command for operating the system.

In a speaker-dependent speech recognition system, a desired voice command can be entered into the dictionary. However, the process of entering voice commands into the dictionary is difficult to manage, and the user cannot readily carry out the command entry process. In speaker-dependent systems, voice commands are usually entered into the dictionary by specifying a word number that identifies each of the commands. This creates the further problem that the user must state such an identification number during the command entry process, which is inconvenient and burdensome for the user. In addition, the speech recognition system must be provided with a larger number of control switches, and its construction becomes more complicated, as the number of voice commands in the dictionary grows. A simple and inexpensive system should therefore be provided for the user.

The object underlying the invention is to provide a voice control device of the specified type that allows simpler and faster operation, in particular with regard to the functions for entering and deleting commands.

According to the invention, this object is achieved by the features set out in the characterizing part of claim 1.

Particularly advantageous embodiments and developments of the invention emerge from the subclaims.

The invention is explained in more detail below by way of exemplary embodiments with reference to the drawings, in which:

Fig. 1 is a block diagram of a first embodiment of a voice control device with features according to the invention;

Fig. 2 is a diagram for explaining the contents of a feature parameter dictionary;

Fig. 3 is a schematic representation of a control panel of a car stereo unit to which the invention is applied;

Fig. 4 is a diagram for explaining the contents of the feature parameter dictionary with an updated cluster data set;

Fig. 5 is a diagram showing the contents of a cluster data set corresponding to a specific voice command;

Fig. 6 is a schematic representation of another control panel of the car stereo unit, improved over the control panel of Fig. 3;

Fig. 7 is a diagram for explaining the contents of a conventional feature parameter dictionary;

Fig. 8 is a block diagram of a second embodiment of the voice control device with features according to the invention;

Fig. 9 is a schematic representation of a control panel of the car stereo unit used together with the voice control device of Fig. 8;

Fig. 10 is a schematic representation of yet another control panel of the car stereo unit, improved over the control panel of Fig. 9;

Fig. 11 is a block diagram of a third embodiment of the voice control device with features according to the invention;

Fig. 12 is a schematic representation of a control panel of the car stereo unit, improved over the control panel of Fig. 9; and

Fig. 13 is a diagram for explaining changes in the operating conditions of the car stereo unit in response to a voice input in the voice control device.

A first embodiment of a voice control device with features according to the invention will now be described with reference to Fig. 1. In Fig. 1, the voice control device has a feature extraction part 1 for extracting a feature parameter from a voice input received through a microphone, a feature parameter dictionary 2 for storing a group of standard feature parameters corresponding to a group of standard voice commands, a similarity checking part 3, an output part 4, a control part 5, a key switch part 6, a preselection part 7, and a cluster data part 8. The feature extraction part 1 extracts a feature parameter from the voice input. After the voice input has been received, the preselection part 7 selects only a subset of standard feature parameters from those stored in the dictionary 2. The similarity checking part 3 calculates the degree of similarity of the extracted parameter only in relation to the subset of standard feature parameters selected by the preselection part 7. Once the degrees of similarity have been calculated for all selected vocabulary words in the dictionary 2, the similarity checking part 3 recognizes the voice input by selecting, from the calculation results, the word corresponding to the feature parameter with the highest degree of similarity. Finally, the output part 4 outputs the recognized word to a display unit. Fig. 2 shows an initial state of the feature parameter dictionary 2. The cluster data part 8 holds initial cluster data indicating that the selected vocabulary words are candidate words for voice input in all operating modes.
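The recognition flow of Fig. 1 (preselect, score, pick the highest similarity) might be sketched as below. The text does not specify how the degree of similarity is computed, so cosine similarity over plain number lists is used here purely as a stand-in; `cosine_similarity`, `rank_candidates`, and the sample vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Stand-in similarity measure (an assumption, not the patent's method)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(extracted, dictionary, preselected_words):
    """Score only the preselected words and rank them best-first.

    extracted:          feature parameter taken from the voice input
    dictionary:         word -> stored standard feature parameter
    preselected_words:  subset chosen before similarity checking
    """
    scores = {word: cosine_similarity(extracted, dictionary[word])
              for word in preselected_words}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up two-dimensional feature parameters for illustration only.
dictionary = {"play": [1.0, 0.0], "rewind": [0.0, 1.0], "scan": [1.0, 1.0]}
ranked = rank_candidates([0.9, 0.1], dictionary, ["play", "rewind"])
recognized = ranked[0]  # word with the highest degree of similarity
```

Returning the full ranking rather than only the best word keeps the remaining candidates available in order of similarity.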

Fig. 3 shows a control panel of a car stereo unit to which the invention is applied. The control panel corresponds to the key switch part 6. An input switch 11, a start switch 12, and a confirmation switch 13 are provided on this control panel. The input switch 11 is pressed when a new voice command is to be entered into the voice control device, so that the feature parameter of that voice input is stored in the feature parameter dictionary 2. The start switch 12 is pressed to start speech recognition for a voice command entered by a user. The confirmation switch 13 is pressed by the user after the result of the speech recognition has been found to be correct. The other control switches on the control panel of the car stereo unit are known and therefore need not be described here.

The control panel of the car stereo unit shown in Fig. 3 also has a delete switch 14, an exclusion switch 15, and an exclusion-clear switch 16. If the recognition result for a voice input spoken by the user is found to differ from the desired result, either the delete switch 14 or the exclusion switch 15 is pressed. When the delete switch 14 is pressed, a secondary candidate word appears on a display of the control panel. When the switch 14 is pressed again, another candidate word appears in place of that secondary candidate. When the confirmation switch 13 is pressed once the correct word appears among the candidate words shown successively on the display, the desired operation is carried out. When, on the other hand, the exclusion switch 15 is pressed, the same change of candidate words takes place on the display and, in addition, the corresponding cluster data in the cluster data part 8 are updated to indicate that the deleted word is excluded from the voice command entries in the feature parameter dictionary 2 whenever the car stereo unit returns to the current operating mode.
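The stepping through candidate words with the delete switch 14 and the confirmation switch 13 can be illustrated with a small sketch. The class name `CandidateDisplay` is hypothetical, and the behavior once the last candidate has been reached is not stated in the text; staying on the last candidate is an assumption made here.

```python
class CandidateDisplay:
    """Hypothetical model of the display stepping through candidate words."""

    def __init__(self, ranked_candidates):
        # Candidates are expected best-first and must not be empty.
        self.candidates = list(ranked_candidates)
        self.index = 0

    @property
    def shown(self):
        """Word currently shown on the display."""
        return self.candidates[self.index]

    def press_delete(self):
        """Delete switch 14: replace the shown word by the next candidate.

        Assumption: when no further candidate exists, the last one stays.
        """
        if self.index < len(self.candidates) - 1:
            self.index += 1
        return self.shown

    def press_confirm(self):
        """Confirmation switch 13: accept the word currently shown."""
        return self.shown


display = CandidateDisplay(["play", "scan", "rewind"])
second = display.press_delete()    # secondary candidate appears
third = display.press_delete()     # another candidate replaces it
accepted = display.press_confirm()
```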

Fig. 5 shows a cluster data set in the cluster data part 8; these cluster data correspond to the command "play" in the feature parameter dictionary 2. In Fig. 5, each value in the "ID" row of the table represents an operating mode of the voice control device, and the value in the "OFF" row indicates whether the corresponding cluster data value is set to "1" (that is, whether it is an inactive cluster data value). When, for example, the car stereo unit is in the command entry mode described above, a cluster bit for the candidate words in the feature parameter dictionary 2 is activated (the bit corresponding to the ID number ID=0 is set to zero). As described above, the car stereo unit has three operating modes: "radio" mode (ID=1), "cassette" mode (ID=2), and "radio/cassette" mode (ID=0). If the exclusion switch 15 is pressed because the word "play" has incorrectly appeared on the display of the control panel while the stereo unit is in "radio" mode, a secondary candidate word appears on the display and the cluster data value corresponding to the word "play" is updated. In the updated cluster data set of Fig. 5 in the cluster data part 8, the OFF bits corresponding to the global mode and to the "radio" mode are both activated (set to one). This achieves the command exclusion function described above, so that the command "play" is excluded from the command entries in the dictionary 2 when the stereo unit is in either "radio/cassette" mode (ID=0) or "radio" mode (ID=1). After the command exclusion function has been carried out repeatedly, the contents of the feature parameter dictionary 2 are eventually changed to those shown in Fig. 4. The updated cluster data value is assigned to each of the voice command entries present in the dictionary 2. It should be noted that the cluster data update described above is not strictly necessary for voice commands that are seldom misrecognized.

The exclusion clear switch 16 on the control panel of Fig. 3 is pressed to cancel the command exclusion function and to reset the OFF bit of the cluster data (the bit corresponding to the global mode) to zero. If the switch 16 is pressed briefly in response to the recognition result for a voice input while the voice control device is in an operating state, the cluster data group (the full set of OFF bits) for the input word is initialized (reset to 0). If the switch 16 is held down, for example while the voice control device is in standby, the cluster data groups for all voice command entries in the dictionary 2 are initialized (all OFF bits are reset to zero).
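The OFF-bit bookkeeping described in the two preceding paragraphs can be illustrated with a minimal Python sketch. The dictionary layout and the function names (`new_cluster_data`, `exclude`, `is_candidate`, `clear_word`, `clear_all`) are assumptions made for illustration; only the mode IDs and the behavior of the switches 15 and 16 are taken from the description.

```python
GLOBAL, RADIO, CASSETTE = 0, 1, 2  # mode IDs as given in the description


def new_cluster_data():
    """One OFF bit per mode ID; 0 means the word is still a candidate."""
    return {GLOBAL: 0, RADIO: 0, CASSETTE: 0}


def exclude(dictionary, word, current_mode):
    """Exclusion switch 15: set the OFF bits for the global and the current mode."""
    dictionary[word][GLOBAL] = 1
    dictionary[word][current_mode] = 1


def is_candidate(dictionary, word, current_mode):
    """A word is matched only while its OFF bit for the current mode is 0."""
    return dictionary[word][current_mode] == 0


def clear_word(dictionary, word):
    """Short press of exclusion clear switch 16: reset one word's OFF bits."""
    dictionary[word] = new_cluster_data()


def clear_all(dictionary):
    """Long press of switch 16: reset the OFF bits of every entry."""
    for word in dictionary:
        dictionary[word] = new_cluster_data()


# Hypothetical two-entry dictionary; "Playback" is excluded while in Radio mode.
dictionary = {"Playback": new_cluster_data(), "Rewind": new_cluster_data()}
exclude(dictionary, "Playback", RADIO)
```

In this sketch, "Playback" remains a candidate in the "Cassette" mode (ID=2), which matches the statement that the exclusion applies only to ID=0 and ID=1.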

Fig. 6 shows another control panel of the car stereo unit, improved over the control panel of Fig. 3. On this control panel, a clear-2 switch 17 is provided as a combined key switch that is pressed to invoke each of the functions of the switches 14 to 16 described above, these functions relating to cluster data updating. If the switch 17 is pressed for less than a first prescribed period (e.g. one second), the function of the clear switch 14 described above is invoked. If the switch is pressed for longer than the first prescribed period and shorter than a second prescribed period (e.g. two seconds), the function of the exclusion switch 15 described above is invoked. If the switch 17 is pressed for longer than the second prescribed period, the function of the exclusion clear switch 16 described above is invoked. While the switch 17 is held down, a beep is preferably emitted from the loudspeaker of the stereo unit to indicate to the user which function will be entered by pressing the switch 17.
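The mapping from press duration to function for the combined switch 17 can be sketched as follows. The function name `dispatch_switch_17` and the returned labels are hypothetical, and the handling of presses lasting exactly one or two seconds is an assumption, since the description does not specify the boundary cases.

```python
def dispatch_switch_17(held_seconds, first_period=1.0, second_period=2.0):
    """Map how long switch 17 was held to the function of switch 14, 15 or 16.

    The default periods follow the examples in the text (one and two seconds).
    Treating a press of exactly `first_period` or `second_period` as belonging
    to the longer category is an assumption of this sketch.
    """
    if held_seconds < first_period:
        return "clear"            # function of clear switch 14
    if held_seconds < second_period:
        return "exclude"          # function of exclusion switch 15
    return "exclude_clear"        # function of exclusion clear switch 16
```

For example, `dispatch_switch_17(1.5)` selects the exclusion function, while `dispatch_switch_17(0.3)` selects the clear function.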

Next, a second embodiment with features according to the invention is described with reference to Figs. 8 and 9. In Fig. 8, this voice control device has a memory 22 for storing standard patterns that indicate a relationship between voice inputs and the corresponding command words. When a user presses a start switch 32 on the control panel of Fig. 9, a feature extracting part 21 and a similarity checking part 23 are put into a waiting state. This waiting state continues until a voice input is supplied to the feature extracting part 21 via a microphone.

After the voice input has been made, the feature extracting part 21 extracts a feature parameter from the voice input. The similarity checking part 23 calculates the degree of similarity of the extracted feature parameter in relation to the standard patterns in the memory 22. Once the degree of similarity has been calculated for all standard patterns, a selection part 24 selects the command word corresponding to the standard pattern that, among the calculation results, has the highest degree of similarity to the extracted feature pattern of the voice input. This command word is identified as the recognized word, i.e. as the candidate word with the highest degree of similarity to the voice input. The selection part 24 passes the recognized word to an output part 25 so that the recognition result is presented to the user in a perceptible form. The output part 25 in this embodiment is a display unit, and the data shown on the display unit is one of the candidate words with the higher degree of similarity. If, for example, the recognition result for the voice input "Playback" spoken by the user is correct, the corresponding output word "Playback" is shown on the display unit. If it is decided that the word shown on the output part 25 is correct, the user presses a confirmation switch 33 on the control panel of Fig. 9 so that the desired operation of the stereo unit is performed. If it is decided that the word shown on the output part 25 is wrong, a clear switch 34 is pressed so that no operation of the stereo unit is performed. As a modification, the selection part 24 can supply secondary candidate words with the second-highest degree of similarity to the output part 25 after the first candidate word shown on the output part 25 has been found to be wrong and the clear switch 34 has been pressed. In such a modification, it is desirable and convenient for the user that the number of secondary candidate words appearing in succession on the output part 25 is predetermined.
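The selection of a primary candidate and a bounded list of secondary candidates can be sketched as below. The description does not state how the degree of similarity is computed, so the negative Euclidean distance used here is purely an assumption, as are the names `similarity` and `rank_candidates` and the sample feature vectors.

```python
import math


def similarity(feature, pattern):
    """Assumed similarity measure: negative Euclidean distance (higher is closer)."""
    return -math.dist(feature, pattern)


def rank_candidates(feature, standard_patterns, max_candidates=3):
    """Return command words ordered from most to least similar.

    The first element plays the role of the recognized word; the remaining
    elements are the secondary candidates shown after the clear switch is
    pressed. `max_candidates` models the predetermined number of candidates.
    """
    scored = sorted(
        standard_patterns.items(),
        key=lambda item: similarity(feature, item[1]),
        reverse=True,
    )
    return [word for word, _ in scored[:max_candidates]]


# Hypothetical two-dimensional standard patterns, for illustration only.
patterns = {
    "Playback": (1.0, 0.0),
    "Rewind": (0.0, 1.0),
    "Stop": (5.0, 5.0),
}
candidates = rank_candidates((0.9, 0.1), patterns, max_candidates=2)
```

Here `candidates[0]` would be displayed first; if the user rejects it with the clear switch 34, `candidates[1]` would be offered next.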

When a command entry process is performed by the voice control device of Fig. 8, the user first presses an input switch 31 and then presses the control switch on the control panel that corresponds to the new command to be entered. After the input switch 31 has been pressed, the feature extracting part 21 and the similarity checking part 23 are put into a waiting state until a voice input spoken into the microphone by the user is supplied to the feature extracting part 21. When the voice input has been made, the feature extracting part 21 extracts a feature parameter from the voice input, and the extracted parameter is stored in the standard pattern memory 22. The parameter stored in the memory 22 includes relationship data that define a relationship between the stored standard pattern and the operation selected by pressing the control switch. This completes the command entry process. Because the input switch 31 is pressed only to start the command entry process, the operation of the stereo unit corresponding to the pressed control switch is not performed during the command entry process.
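The command entry sequence above (input switch, control switch, then voice input) can be sketched as a small state holder. The class name `VoiceControl`, its attributes and the sample feature values are hypothetical; the sketch only illustrates that the control switch names an operation without triggering it while entry is in progress.

```python
class VoiceControl:
    """Minimal sketch of the command entry process; not the patented circuit."""

    def __init__(self):
        self.memory = []               # stands in for standard pattern memory 22
        self.performed = []            # operations actually carried out
        self.entry_mode = False        # True after input switch 31 is pressed
        self.pending_operation = None  # operation named by the control switch

    def press_input_switch(self):
        self.entry_mode = True

    def press_control_switch(self, operation):
        if self.entry_mode:
            # During command entry the switch only names the operation.
            self.pending_operation = operation
        else:
            self.performed.append(operation)

    def voice_input(self, feature):
        if self.entry_mode and self.pending_operation is not None:
            # Store the pattern together with its relationship data.
            self.memory.append(
                {"pattern": tuple(feature), "operation": self.pending_operation}
            )
            self.entry_mode = False
            self.pending_operation = None


vc = VoiceControl()
vc.press_input_switch()
vc.press_control_switch("fast_forward")
vc.voice_input([0.2, 0.7])
```

After this sequence, `vc.memory` holds one new standard pattern and `vc.performed` is still empty.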

Fig. 10 shows another control panel of the car stereo unit, improved over the control panel of Fig. 9. On the control panel of Fig. 10, the input switch 31 and the clear switch 34 shown in Fig. 9 are omitted, and only the start switch 32 and the confirmation switch 33 are provided. The functions of the switches 31 to 34 described above are assigned to the two switches 32 and 33 on this control panel. Each of the two switches 32 and 33 of Fig. 10 is therefore a combined key switch to which two different functions are assigned. For example, the start switch 32 of Fig. 10 is pressed to invoke either the command entry function or the voice recognition start function. If the start switch 32 is pressed for less than one second, the voice recognition process is started, as described above for the start switch 32 of Fig. 9. If the switch 32 is pressed for longer than one second, the command entry process is started, as described above for the input switch 31 of Fig. 9. Likewise, if the confirmation switch 33 is pressed for less than one second, this confirms that the recognized voice command shown on the output part 25 is correct. If the confirmation switch 33 is pressed for longer than one second, an incorrectly recognized voice command shown on the output part 25 is cleared, as described above for the switch 34 of Fig. 9.

Next, a third embodiment with features according to the invention is described with reference to Figs. 9 and 11. In Fig. 11, the voice control device has a memory 22 in which standard patterns indicating a relationship between voice inputs and the corresponding command words are stored. A key switch part 27 of this voice control device corresponds to the control panel shown in Fig. 9. When the user presses the start switch 32 on the control panel of Fig. 9, a feature extracting part 21 and a similarity checking part 23 are put into a waiting state. This waiting state continues until a voice input is supplied to the feature extracting part 21.

After the voice input, the feature extracting part 21 extracts a feature parameter from the voice input. The similarity checking part 23 calculates the degree of similarity of the extracted feature parameter with respect to the standard patterns in the memory 22. In this embodiment, a preselection part 29 is instructed by a control part 26 to provisionally select, from the standard patterns stored in the memory 22, a subgroup of standard patterns according to the current operating mode of a controlled system 28 (e.g. the car stereo unit) that is controlled by the voice input. The degree of similarity of the extracted feature parameter is calculated only in relation to the subgroup of selected standard patterns specified by the preselection part 29. After the degree of similarity has been calculated in relation to the group of selected standard patterns, a selection part 24 selects the command word that, among the calculation results, has the highest degree of similarity to the extracted feature pattern of the voice input. The selection part 24 passes this recognized word to an output part 25 so that the result of the voice recognition is presented to the user in a perceptible form.
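The preselection step can be sketched as a filter applied before the similarity calculation. The entry layout (a `mode` field, with `None` marking a pattern valid in every mode), the function names and the Euclidean distance are assumptions of this sketch, not details given in the description.

```python
import math


def preselect(entries, current_mode):
    """Keep only standard patterns usable in the current operating mode.

    An entry whose "mode" is None is assumed to be valid in every mode.
    """
    return [e for e in entries if e["mode"] is None or e["mode"] == current_mode]


def recognize(feature, entries, current_mode):
    """Compare the feature only against the preselected subgroup."""
    subgroup = preselect(entries, current_mode)
    if not subgroup:
        return None
    # Assumed similarity: the smallest Euclidean distance wins.
    best = min(subgroup, key=lambda e: math.dist(feature, e["pattern"]))
    return best["word"]


# Hypothetical entries for illustration only.
entries = [
    {"word": "Scan", "pattern": (1.0, 0.0), "mode": "radio"},
    {"word": "Playback", "pattern": (0.9, 0.1), "mode": "cassette"},
    {"word": "Volume up", "pattern": (0.0, 1.0), "mode": None},
]
```

With these entries, the same utterance is matched to "Scan" in radio mode and to "Playback" in cassette mode, because the other pattern is never compared.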

The output part 25 in this embodiment is a display unit on the control panel of Fig. 9. If it is decided that the word on the output part 25 is correct, the user presses the confirmation switch 33 of Fig. 9 so that the desired operation of the controlled system 28, such as the stereo unit, is performed. If it is decided that the word on the output part 25 is wrong, the user presses the clear switch 34 so that no operation of the controlled system 28 is performed.

The modification described above for the second embodiment can also be made in the third embodiment. That is, the selection part 24 can supply secondary candidate words with a higher degree of similarity to the output part 25 after the first candidate word on the output part 25 has been cleared by pressing the clear switch 34. In such a modification, it is desirable and convenient for the user that the maximum number of secondary candidate words appearing in succession on the output part 25 is determined in advance.

A command entry process performed by the voice control device in the third embodiment is now described. When this command entry process is performed, the user first presses the input switch 31 on the control panel of Fig. 9 and then presses a control switch to select the desired operation of the car stereo unit that is to correspond to a new voice command. After the input switch 31 has been pressed, the feature extracting part 21 and the similarity checking part 23 are put into a waiting state until a voice input spoken into the microphone by the user is supplied to the feature extracting part 21. After the voice input has been made, the feature extracting part 21 extracts a feature parameter from the voice input, and the extracted parameter is stored in the memory 22 as a new standard pattern. The pattern stored in the memory 22 includes relationship data that define a relationship between the stored standard pattern and the operation selected by pressing the control switch. The command entry process is then complete. Because the input switch 31 is first pressed only to start the command entry process, the operation of the car stereo unit corresponding to the pressed switch is not performed during the command entry process.

In the third embodiment, when the input switch 31 is pressed for a relatively short time (e.g. for less than one second), a cluster data value is assigned to the corresponding standard pattern in the memory 22, and the standard pattern with the assigned cluster data value is stored in the memory 22 together with the relationship data mentioned above. This cluster data value is a mode-dependent data value which specifies that the voice command just entered is recognized by the voice control device only when the system 28 (the car stereo unit) is operating in a specific mode, namely the same mode as during the command entry process.

The cluster data value assigned to each of the standard patterns stored in the memory 22 is checked by the preselection part 29 to determine whether or not the standard pattern is selected in advance for the similarity calculation performed by the similarity checking part 23. When the input switch 31 is pressed for a relatively long time (e.g. for longer than one second), a different cluster data value is assigned to the corresponding standard pattern in the memory 22. This cluster data value is a global-mode data value which specifies that the voice command just entered is always recognized by the voice control device, regardless of the mode in which the system 28 is currently operating. In other words, the voice command is recognized not only when the system 28 is operating in the same mode as during the command entry process, but also when the system 28 is operating in any other mode.
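The two kinds of cluster data value can be sketched as a tag chosen from the press duration of the input switch 31 and later tested by the preselection part. The names `cluster_for_press`, `is_preselected` and the `GLOBAL` marker are hypothetical, and the one-second threshold follows the example in the text; the treatment of a press of exactly one second is an assumption.

```python
GLOBAL = "global"  # hypothetical marker for the global-mode cluster data value


def cluster_for_press(held_seconds, current_mode, threshold=1.0):
    """Short press: tag the new pattern with the current mode.
    Long press: tag it as global.

    A press of exactly `threshold` seconds is treated as long; the text
    does not specify this boundary case.
    """
    return current_mode if held_seconds < threshold else GLOBAL


def is_preselected(cluster, current_mode):
    """Preselection test: global patterns always pass, others only in their mode."""
    return cluster == GLOBAL or cluster == current_mode
```

For example, a pattern entered with a short press while the unit is in radio mode would be tagged `"radio"` and skipped during cassette operation, whereas a pattern entered with a long press would pass the test in both modes.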

Fig. 12 shows yet another control panel of the car stereo unit, improved over the control panel of Fig. 9. On the control panel of Fig. 12, a combined switch 35, to which the command entry function and the voice recognition start function are assigned, is provided together with the confirmation switch 33 and the clear switch 34. The switch 35 corresponds to the clear switch 17 of Fig. 6. For example, if the switch 35 is pressed for less than one second, the voice recognition process is started. If the switch is pressed for longer than one second and shorter than two seconds, the command entry process of the mode-dependent type described above is started. If the switch 35 is pressed for longer than two seconds, the command entry process of the global-mode type described above is started. While the switch 35 is held down, a beep is preferably emitted from a loudspeaker of the car stereo unit to indicate to the user which function will be entered by pressing the switch 35.

Fig. 13 shows changes in the operating state of the car stereo unit in response to voice input to the voice control device. In Fig. 13, a solid arrow indicates a change from one operating state to another in response to a voice command; a dashed arrow indicates a change that takes place automatically after an operation corresponding to a voice command has been performed.

As shown in Fig. 13, when a "Radio" command has been entered and the car stereo unit is in the "Radio" mode, the voice commands that are effectively recognized are "Volume up", "Volume down" and "Scan/Search". When a "Cassette" command has been entered and the stereo unit is operating in the "Cassette" mode, the voice commands that are effectively recognized are "Volume up", "Volume down", "Fast forward", "Rewind" and "Playback". When the mode-dependent entry process described above is performed, voice commands corresponding to particular operating states of the car stereo unit can be entered, and the standard patterns are stored in the memory 22 together with the relationship data and the cluster data. When the global-mode entry process mentioned above is performed, voice commands that are effective regardless of the operating state can be entered, and the standard patterns are stored in the memory 22 together with the relationship data and the cluster data.
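The mode-dependent command sets listed for Fig. 13 can be written down as a lookup table. The table contents come from the paragraph above; the names `EFFECTIVE_COMMANDS` and `is_effective` are hypothetical, and the state transitions drawn as arrows in Fig. 13 are not modeled here.

```python
# Commands that are effectively recognized in each mode, as listed in the text.
EFFECTIVE_COMMANDS = {
    "Radio": {"Volume up", "Volume down", "Scan/Search"},
    "Cassette": {"Volume up", "Volume down", "Fast forward", "Rewind", "Playback"},
}


def is_effective(mode, command):
    """Return True if the command is recognized while the unit is in `mode`."""
    return command in EFFECTIVE_COMMANDS.get(mode, set())


# Commands listed for both modes; these behave like global-mode entries.
common_commands = EFFECTIVE_COMMANDS["Radio"] & EFFECTIVE_COMMANDS["Cassette"]
```

In this table only the two volume commands appear in both modes, which is the kind of command the global-mode entry process is intended for.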

To prevent the system 28 from suffering serious damage, some erroneous operations (such as an "increase volume" command while the stereo unit is in rewind mode) must be suppressed automatically. In the global-mode input process, the entry of a new voice command that corresponds to an erroneous operation is prevented only when the controlled system is in a prescribed operating mode. In that case it is desirable to inform the user, by means of an appropriate display message, that the voice command is not accepted when the user tries to enter it.
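A minimal sketch of such a registration guard follows. The table `ERRONEOUS_IN_MODE`, the function name `register_global` and the message texts are hypothetical; only the rule itself (refuse the entry in a prescribed mode and tell the user) comes from the description.

```python
# Hypothetical table: prescribed operating mode -> operations treated as erroneous there.
ERRONEOUS_IN_MODE = {"rewind": {"volume up"}}


def register_global(memory, operation, features, current_mode):
    """Store a global-mode pattern unless the operation is blocked in current_mode.

    Returns the text that would be shown to the user on the display.
    """
    if operation in ERRONEOUS_IN_MODE.get(current_mode, set()):
        # Entry is refused only while the system is in the prescribed mode.
        return f"'{operation}' cannot be registered while in {current_mode} mode"
    memory.append({"operation": operation, "features": features, "mode": "global"})
    return f"'{operation}' registered"
```

Calling `register_global` for "volume up" while the unit is rewinding leaves the memory unchanged and returns the refusal message; the same call in any other mode stores the pattern.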

Claims

1. Voice control device for operating a voice-operated system by means of a voice command which has been recognized by voice recognition, the system being provided with a number of manually operated control switches (6, 27) for triggering operations of the system, comprising:
feature extracting means (1, 21) for extracting a feature pattern from a voice input;
storage means (2, 22) for storing standard patterns which are feature patterns extracted from a group of standard voice commands;
similarity checking means (3, 23) for calculating a degree of similarity of the extracted feature pattern in relation to each of the stored standard patterns;
a selection device (4, 24) for selecting a voice command which has been recognized from the voice input, this voice command having the highest degree of similarity with respect to the stored standard patterns, and
at least one input switch (11) for entering a new voice command into the storage means,
characterized by an exclusion switch (15) for deleting a standard pattern from the group of standard patterns,
wherein, after the input switch (11) is pressed, a feature pattern extracted from the new voice command is stored as a standard pattern in the storage means (2, 22) together with cluster data which specify an attribute of the standard pattern just entered and which indicate that the degree of similarity of the new voice command is calculated regardless of which of several operating modes the system is in when the input switch (11) is pressed, and
wherein, after the exclusion switch (15) is pressed in accordance with a recognized voice command so as to delete the recognized voice command, the cluster data indicate that a degree of similarity to the deleted voice command is not calculated when the system is operated in an operating mode which is the same as that in which the exclusion switch (15) was pressed.

2. Voice control device according to claim 1, characterized by a delete switch (14) for removing a standard pattern from the group of standard patterns in the storage means (2, 22), wherein, after the delete switch (14) is pressed in accordance with a recognized voice command, the cluster data corresponding to the deleted standard pattern are not stored in the storage means (2, 22), and a voice command having the second highest degree of similarity with respect to the standard patterns is selected from the calculated degrees of similarity by the selection device (4, 24), so that the voice command with the second highest degree of similarity is output.

3. Voice control device according to claim 1 or 2, characterized by an exclusion-delete switch (16) for deleting an exclusion command which has been given by means of the delete switch, wherein, after the exclusion switch (15) is pressed in accordance with a recognized voice command during a particular operating mode of the system, the cluster data indicate that a degree of similarity with respect to the standard pattern corresponding to the voice command is always calculated by the similarity checking means (3, 23), regardless of the operating mode of the system when the switch (16) is pressed.

4. Voice control device according to claim 2 or 3, characterized by an exclusion-delete switch (16) for initializing the cluster data which correspond to the group of standard patterns in the storage means (2, 22), wherein, when the exclusion-delete switch (16) is pressed for longer than a prescribed period of time in accordance with a recognized voice command during a particular operating mode of the system, the initialized cluster data indicate that a degree of similarity in relation to all standard patterns in the storage means (2, 22) is always calculated by the similarity checking means (3, 23), regardless of the operating mode of the system when the switch (16) is pressed.

5. Voice control device according to claim 2 or 3, characterized by a combined switch (17) for commanding one of the functions of the delete switch (14), the exclusion switch (15) and the exclusion-delete switch (16), wherein the function of the delete switch (14) is performed if the combined switch (17) is pressed for less than a prescribed first period of time, the function of the exclusion switch (15) is performed if the combined switch (17) is pressed for longer than the prescribed second period of time, and the function of the exclusion-delete switch (16) is performed if the combined switch (17) is pressed for longer than the prescribed second period of time.

6. Voice control device according to claim 1, characterized by a combined key switch (35) for either bringing the control device into a waiting state before a voice input is received or entering a new voice command into the storage means (2, 22), wherein, if the key switch (35) is pressed for less than a prescribed period of time, the control device is brought into the waiting state, and wherein, if the key switch (35) is pressed for longer than the prescribed period of time and one of the control switches (6, 27) is pressed, the control device is instructed to enter a new voice command and to store a corresponding standard pattern among the standard patterns of the storage means, the new voice command corresponding to the particular operation of the system which has been instructed by the pressed control switch.

7. Voice control device according to claim 1, characterized by a first switch (31) for confirming that a recognized voice command, which has been selected by the selection device (4, 24) in accordance with a voice input, is correct, so as to carry out a desired operation of the system, and by a second switch (34) for deleting a recognized voice command which has been incorrectly selected by the selection device (4, 24), so that the selection device selects another voice command having the second highest degree of similarity calculated by the similarity checking means (3, 23).

8. Voice control device according to claim 1, characterized by a combined key switch (33) for either confirming that a recognized voice command, which has been selected by the selection device (4, 24) in accordance with a voice input, is correct, or deleting a recognized voice command which has been incorrectly selected by the selection device, so that the selection device selects another voice command having the second highest degree of similarity calculated by the similarity checking means (3, 23), wherein, if the key switch (33) is pressed for less than a prescribed period of time, it is confirmed that the recognized voice command is correct, and, if the key switch (33) is pressed for longer than the prescribed period of time, the recognized voice command is deleted and another voice command having the second highest degree of similarity is selected by the selection device (4, 24).

9. Voice control device according to claim 1, characterized by a preselection device (7, 29) for selecting a subset of standard patterns from the group of standard patterns in the storage means (2, 22) according to the operating mode in which the system is operated when a voice input is received, in order to enable the similarity checking means (3, 23) to calculate the degree of similarity in relation to the subset of the standard patterns which has been selected by the preselection device.

10. Voice control device according to claim 9, characterized by a first switch (31) for entering an operating-mode-dependent voice command and a second switch for entering a global-mode voice command,
wherein, when the first switch and then one of the control switches is pressed, the control device is instructed to enter a new voice command and to store a corresponding standard pattern among the standard patterns of the storage means (2, 22), the new voice command corresponding to an operation of the system that has been instructed by the control switch, whereby the new command is entered and stored in the storage means (2, 22) together with mode-dependent data, which mode-dependent data indicate that a degree of similarity to the standard pattern is calculated by the similarity checking means (3, 23) only if the system is operated in an operating mode which is the same as that in which the first switch was pressed, and
wherein, when the second switch and then one of the control switches is pressed, the control device is instructed to enter a new voice command and to store a corresponding standard pattern among the standard patterns of the storage means, the new voice command corresponding to an operation of the system instructed by the control switch, whereby the new command is entered and stored in the storage means together with global-mode data, which global-mode data indicate that a degree of similarity to the standard pattern is always calculated by the similarity checking means, regardless of the operating mode when the second switch is pressed.

11. Voice control device according to claim 9, characterized by a combined key switch (31) for entering either an operating-mode-dependent voice command or a global-mode voice command,
wherein, if the key switch (31) is pressed for less than a prescribed period of time and then one of the control switches (27) is pressed, the control device is instructed to enter a new voice command and to store a corresponding standard pattern among the standard patterns of the storage means (2, 22), the new voice command corresponding to an operation of the system that has been instructed by the control switch, whereby the new command is entered and stored in the storage means together with mode-dependent data, which mode-dependent data indicate that a degree of similarity in relation to the standard pattern is calculated by the similarity checking means (3, 23) only when the system is operated in an operating mode which is the same as that in which the key switch was pressed, and
wherein, if the key switch (31) is pressed for longer than the prescribed period of time and then one of the control switches is pressed, the control device is instructed to enter a new voice command and to store a corresponding standard pattern among the standard patterns of the storage means (2, 22), the new voice command corresponding to an operation of the system which has been actuated by means of the control switch, whereby the new command is entered and stored in the storage means together with global-mode data, which global-mode data indicate that a degree of similarity in relation to the standard patterns is always calculated by the similarity checking means, regardless of the operating mode when the key switch is pressed.

12. Voice control device according to claim 9, characterized by a combined key switch (35) for bringing the control device into a waiting state, for entering an operating-mode-dependent voice command, or for entering a global-mode voice command,
wherein, if the key switch (35) is pressed for less than a prescribed first period of time, the control device is brought into a waiting state before a voice input is received,
wherein, if the key switch (35) is pressed for longer than the first period of time and shorter than a prescribed second period of time and then one of the control switches is pressed, the control device is instructed to enter a new voice command and to store a corresponding standard pattern among the standard patterns in the storage means (2, 22), the new voice command corresponding to an operation of the system instructed by the control switch, whereby the new command is entered and stored in the storage means together with mode-dependent data, which mode-dependent data indicate that a degree of similarity in relation to the standard patterns is calculated by the similarity checking means (3, 23) only if the system is operated in an operating mode which is the same as that in which the key switch (35) was pressed, and
wherein, if the key switch (35) is pressed for longer than the second period of time and then one of the control switches is pressed, the control device is instructed to enter a new voice command and to store a corresponding standard pattern among the standard patterns in the storage means (2, 22), the new voice command corresponding to an operation of the system that has been actuated by the control switch, whereby the new command is entered and stored in the storage means together with global-mode data, which global-mode data indicate that a degree of similarity in relation to the standard pattern is always calculated by the similarity checking means (3, 23), regardless of the operating mode when the key switch is pressed.

13. Voice control device according to one of claims 10 to 12, characterized in that the entry and storage in the storage means (2, 22) of a new voice command having the global-mode data is prevented only if the system is in a prescribed operating mode.
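For illustration only, the press-duration scheme of the combined key switch (35) in claim 12 can be sketched as a simple classifier. The threshold values, the function name `classify_key_press` and the returned labels are hypothetical; the claims do not give concrete durations or say how a press lasting exactly a threshold value is treated, so the handling of those boundary cases below is an assumption.

```python
FIRST_PERIOD_S = 0.5   # hypothetical value for the prescribed first period
SECOND_PERIOD_S = 2.0  # hypothetical value for the prescribed second period


def classify_key_press(duration_s, first=FIRST_PERIOD_S, second=SECOND_PERIOD_S):
    """Map how long the combined key switch was held to one of three functions."""
    if duration_s < first:
        # Short press: wait for a voice input to be recognized.
        return "wait_for_voice_input"
    if duration_s < second:
        # Medium press: register a command valid only in the current operating mode.
        return "enter_mode_dependent_command"
    # Long press: register a command valid in every operating mode.
    return "enter_global_command"
```

A single physical switch thus covers three functions, at the cost of requiring the user to judge how long to hold it.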