JP4770374B2

JP4770374B2 - Voice recognition device

Info

Publication number: JP4770374B2
Application number: JP2005291531A
Authority: JP
Inventors: 靖子大橋
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2005-10-04
Filing date: 2005-10-04
Publication date: 2011-09-14
Anticipated expiration: 2025-10-04
Also published as: JP2007101892A

Description

The present invention relates to a voice recognition device, and more particularly to a voice recognition device that recognizes voice commands for a plurality of control target devices.

Voice recognition devices improve the recognition rate by limiting the recognition vocabulary according to the operating state of the control target device. That is, a plurality of recognition dictionaries, each having a predetermined recognition vocabulary corresponding to an operating state of the control target device, are stored; the operating state of the control target device is detected; and the recognition dictionary to be used for voice recognition processing is selected from the stored recognition dictionaries.
Japanese Patent Laid-Open No. 10-7348

When a voice recognition device is configured to recognize voice commands for a plurality of control target devices, a plurality of recognition dictionaries may be selected for voice recognition processing. For example, in a voice recognition device for a vehicle whose control target devices include a navigation device and an audio device, when both devices are operating, recognition dictionaries based on the operating state of the navigation device (for example, a recognition dictionary for destination setting operations and a recognition dictionary for search operations) and a recognition dictionary based on the operating state of the audio device (for example, a recognition dictionary for song playback operations) are selected as the dictionaries used for voice recognition processing.

The recognition vocabularies of these recognition dictionaries contain common words, that is, words included in more than one of the dictionaries. Examples of common words are “previous” and “next”: the recognition dictionary for destination setting operations contains the recognition word “previous (next) destination”, and the recognition dictionary for song playback operations contains the recognition word “previous (next) song”.

When a plurality of dictionaries are selected as the recognition dictionaries used for voice recognition processing and the selected dictionaries each contain a common word, a different phrase containing that common word was sometimes recognized by mistake. For example, the user said “previous destination” but “previous song” was recognized. Likewise, when the user could not think of the voice command “previous destination” and simply said “previous”, the command the user intended was not recognized, and another command such as “previous song” was recognized instead.

It is also conceivable to provide a single recognition dictionary holding all recognition words for all control target devices instead of a plurality of recognition dictionaries. Even in that case, the misrecognition problem described above naturally arises when the user's utterance contains a common word.

The present invention has been made in view of these circumstances, and its object is to provide a voice recognition device that recognizes voice commands containing common words with high accuracy.

To achieve this object, the invention according to claim 1 is a voice recognition device for recognizing voice commands for a plurality of control target devices, comprising: a storage device storing a recognition dictionary in which each of a plurality of recognition words is associated with an operating state of one of the plurality of control target devices; voice command recognition means for recognizing a voice command spoken by a user based on the recognition dictionary stored in the storage device; and common word recognition means for determining whether the voice command spoken by the user contains a common word, that is, a word included in common in at least two recognition words associated with mutually different control target devices. When the common word recognition means determines that a common word is contained, the voice command recognition means recognizes the voice command spoken by the user based on those recognition words of the recognition dictionary that contain the common word and are associated with the control target device the user operated last.

According to the invention of claim 1, the common word recognition means determines whether the voice command spoken by the user contains a common word. When it determines that a common word is contained, the voice command recognition means recognizes the voice command based on those recognition words of the recognition dictionary that contain the common word and are associated with the control target device the user operated last. Consequently, when the user's utterance is judged to contain a common word, recognition words not associated with the last-operated control target device are never recognized, which improves the recognition accuracy for voice commands containing common words.
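The restriction described for claim 1 can be illustrated with a minimal sketch. It assumes recognition words are plain text strings tagged with a device name; the `RecognitionWord` type, the device names, and the substring test for a common word are illustrative assumptions and are not taken from the described device, which operates on speech rather than text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionWord:
    text: str    # e.g. "previous song" (hypothetical text form of a recognition word)
    device: str  # control target device the word is associated with


def restrict_to_last_device(words, common_word, last_device):
    """Keep only recognition words that contain the common word and are
    associated with the device the user operated last."""
    return [w for w in words
            if common_word in w.text and w.device == last_device]


words = [
    RecognitionWord("previous destination", "navigation"),
    RecognitionWord("previous song", "audio"),
    RecognitionWord("set destination", "navigation"),
]

# The user last operated the navigation device and said something containing "previous".
candidates = restrict_to_last_device(words, "previous", "navigation")
```

With the navigation device as the last-operated device, “previous song” is excluded from consideration even though it shares the common word.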

The storage device may store just one recognition dictionary holding all recognition words for all control target devices. To improve the recognition rate, however, the recognition dictionary is preferably divided into a plurality of recognition dictionaries by operating state of the control target devices. In that case, the recognition accuracy for voice commands containing common words is improved as set out in claim 2.

That is, the invention according to claim 2 is the voice recognition device according to claim 1, wherein the recognition dictionary stored in the storage device is divided into a plurality of recognition dictionaries by operating state of the control target devices; the device further comprises dictionary selection means for selecting at least one recognition dictionary from the plurality of recognition dictionaries stored in the storage device based on the operating states of the plurality of control target devices; and, when the common word recognition means determines that a common word is contained, the voice command recognition means uses, as the recognition dictionary for recognizing the voice command spoken by the user, the one recognition dictionary among those selected by the dictionary selection means that is determined by the control target device the user operated last.

According to the invention of claim 2, the common word recognition means determines whether the voice command spoken by the user contains a common word, and when it determines that a common word is contained, the voice command recognition means uses, as the recognition dictionary for recognizing the voice command, the one recognition dictionary determined by the control target device the user operated last. Therefore, even if a plurality of recognition dictionaries have been selected from the operating states of the control target devices, no recognition dictionary other than that one is used by the voice command recognition means, which improves the recognition accuracy for voice commands containing common words.

The voice command recognition means can determine which voice command the user spoke as set out in claim 3 or claim 4.

The invention according to claim 3 is the voice recognition device according to claim 2, wherein, when the voice command recognition means has narrowed the recognition dictionaries down to one based on the control target device the user operated last, it identifies, as the voice command spoken by the user, the recognition word in that one dictionary that contains the common word recognized by the common word recognition means.

The invention according to claim 4 is the voice recognition device according to claim 2, wherein, when the voice command recognition means has narrowed the recognition dictionaries down to one based on the control target device the user operated last, it recognizes the voice command spoken by the user by executing voice recognition processing using that one dictionary.

When, as in claim 3, the voice command spoken by the user is identified by comparing the common word recognized by the common word recognition means with the recognition vocabulary of the recognition dictionary, the common word recognition means is preferably executed regardless of the number of recognition dictionaries selected by the dictionary selection means, as set out in claim 5.

A user may speak only a common word, irrespective of the operating states of the control target devices. With the arrangement of claim 5, the common word recognition means is executed even if only one recognition dictionary has been selected from the operating states of the control target devices. Executing the common word recognition means allows a common word to be recognized accurately even when the user speaks only that word, and because the voice command is then identified by comparing the accurately recognized common word with the recognition vocabulary of the recognition dictionary, the accuracy of identifying the voice command improves.

Alternatively, as set out in claim 6, the common word recognition means may be executed when the dictionary selection means has selected a plurality of recognition dictionaries and at least two of the selected recognition dictionaries contain the common word.

With the arrangement of claim 6, depending on the conditions, the common word recognition means is not executed and the voice command spoken by the user is recognized directly by the voice command recognition means, so processing is faster.

The common word recognition means determines whether the voice command spoken by the user contains a common word either by using a common word recognition dictionary, as set out in claim 7, or by using a word spotting technique, as set out in claim 8.

That is, the invention according to claim 7 is the voice recognition device according to any one of claims 1 to 6, wherein a common word recognition dictionary having the common words as its recognition vocabulary is stored in the storage device, and the common word recognition means determines whether the voice command spoken by the user contains a common word by executing voice recognition processing using the common word recognition dictionary.

The invention according to claim 8 is the voice recognition device according to any one of claims 1 to 6, wherein the common word recognition means determines whether the voice command spoken by the user contains a common word by comparing the input speech representing the user's utterance with reference speech of the common word stored in advance.

In the inventions of claims 1 to 8 described above, it is first determined whether the voice command spoken by the user contains a common word, and the voice command is then recognized based on the recognition dictionary. As set out in claim 9, however, candidates for the voice command may first be selected using the recognition dictionary, and it may then be determined whether the selected candidates contain a common word.

That is, the invention according to claim 9 is a voice recognition device for recognizing voice commands for a plurality of control target devices, comprising: a storage device storing a recognition dictionary in which each of a plurality of recognition words is associated with an operating state of one of the plurality of control target devices; voice command recognition means for recognizing a voice command spoken by a user based on the recognition dictionary stored in the storage device; and a list storage device storing a list of common words, that is, words included in common in at least two recognition words associated with mutually different control target devices. The voice command recognition means includes candidate selection means for selecting, from the recognition dictionary, candidates for the voice command spoken by the user by voice recognition processing using the recognition dictionary. When the candidate selection means selects a plurality of candidates and at least two of the selected candidates contain a common word included in the list of common words, the voice command recognition means identifies the speech uttered by the user by selecting the candidate that contains a common word included in the list and that belongs to the recognition vocabulary of the recognition dictionary determined by the control target device the user operated last.

In this arrangement, the candidate selection means first selects candidates for the voice command by voice recognition processing using the recognition dictionary, and when at least two of the selected candidates contain a common word, the candidate is selected that contains a common word included in the list of common words and that belongs to the recognition vocabulary of the recognition dictionary determined by the last-operated control target device. In this case as well, no voice command other than one relating to the control target device the user operated last is selected, so the recognition accuracy for voice commands containing common words improves.
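The candidate-first approach of claim 9 can be sketched as follows. The sketch works on text strings and assumes the candidates arrive ordered best-first; the function name `resolve_candidates`, the whitespace tokenization, and the fallback to the first candidate when the common-word condition does not hold are assumptions made for illustration only.

```python
def resolve_candidates(candidates, common_words, last_device_vocabulary):
    """Pick one voice command from recognition candidates.

    candidates: recognition words, assumed to be ordered best-first.
    common_words: the stored list of common words.
    last_device_vocabulary: recognition vocabulary of the dictionary
        determined by the device the user operated last.
    """
    # Candidates that contain at least one listed common word.
    with_common = [c for c in candidates
                   if any(cw in c.split() for cw in common_words)]

    # Only when two or more candidates share a common word is the
    # last-operated device used to decide between them.
    if len(with_common) >= 2:
        for c in with_common:
            if c in last_device_vocabulary:
                return c

    # Assumption of this sketch: otherwise take the best candidate as is.
    return candidates[0] if candidates else None


navigation_vocabulary = {"previous destination", "next destination"}
result = resolve_candidates(
    ["previous song", "previous destination"],
    ["previous", "next"],
    navigation_vocabulary,
)
```

Here “previous destination” is chosen even though “previous song” is ranked first, because the navigation device is assumed to be the last-operated device.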

In the invention of claim 9 as well, to improve the recognition rate, the recognition dictionary is preferably divided into a plurality of recognition dictionaries by operating state of the control target devices, as set out in claim 10.

The invention according to claim 10 is the voice recognition device according to claim 9, wherein the recognition dictionary stored in the storage device is divided into a plurality of recognition dictionaries by operating state of the control target devices; the device further comprises dictionary selection means for selecting at least one recognition dictionary from the plurality of recognition dictionaries stored in the storage device based on the operating states of the plurality of control target devices; and the candidate selection means uses the recognition dictionary selected by the dictionary selection means for voice recognition processing.

Preferably, as set out in claim 11, a common word recognition dictionary having the common words as its recognition vocabulary is stored in the storage device, and the candidate selection means selects candidates for the voice command spoken by the user using the common word recognition dictionary in addition to the recognition dictionary selected by the dictionary selection means.

When the candidate selection means additionally uses the common word recognition dictionary in this way, the fact that the user has spoken a voice command containing a common word can be recognized accurately, and as a result the recognition accuracy for voice commands containing common words improves further.

Embodiments of the device of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an in-vehicle navigation device 10 that functions as the voice recognition device of the present invention.

The in-vehicle navigation device 10 includes a microphone 12, a speaker 14, an external device control unit 16, a display device 18, and an ECU 100.

In addition to functioning as a navigation device, the in-vehicle navigation device 10 also has a function of controlling external devices such as an audio system, an air conditioner, and a telephone. The external device control unit 16 serves to control those external devices.

The display device 18 is, for example, a liquid crystal display and is placed on the instrument panel, for instance midway between the driver's seat and the front passenger seat. In addition to the components shown in FIG. 1, the in-vehicle navigation device 10 has the components of an ordinary navigation device, for example, an operation switch group, a wireless communication device that communicates wirelessly with the outside of the vehicle, a map data input device that receives map data from a storage medium such as a DVD-ROM, a position detector that detects the current position of the vehicle using a geomagnetic sensor, a gyroscope, and the like, a remote controller, and a remote control sensor. Signals from these components are input to the ECU 100. Signals from various sensors mounted on the vehicle, such as a vehicle speed sensor and a temperature sensor, are also input to the ECU 100.

The ECU 100 is a computer including a CPU, a ROM, a RAM, and the like. It recognizes speech input from the microphone 12, causes the speaker 14 to output predetermined sounds (including speech), determines the control content for an external device based on externally input signals and outputs the determined control content to the external device control unit 16, and controls the display screen of the display device 18 based on externally input signals.

Next, the internal configuration of the ECU 100 is described. The user's voice detected by the microphone 12 is input to the voice output control unit 102. When a signal is input from the microphone 12, the voice output control unit 102 checks for a user utterance based on the input signal, and when an utterance is confirmed, it outputs the speech from the microphone 12 to the language analysis processing unit 104. The voice output control unit 102 can also exchange signals with the navigation interface unit 106, through which it is supplied with output sound command signals determined by the destination search unit 108, the guidance/search control unit 110, and the memory point control unit 112. When an output sound command signal is supplied, the voice output control unit 102 generates a predetermined output sound signal based on it and outputs that signal to the speaker 14, causing the speaker 14 to output a predetermined sound.

The language analysis processing unit 104 analyzes the utterance content confirmed by the voice output control unit 102 and inputs it to the voice recognition engine 114. The voice recognition operation control unit 116 has a storage device 117, which stores a plurality of recognition dictionaries corresponding to the operating states of the control target devices. The control target devices here are the in-vehicle navigation device 10 and the external devices mentioned above.

The plurality of recognition dictionaries include, for example, a recognition dictionary corresponding to the destination setting mode of the in-vehicle navigation device 10, a recognition dictionary corresponding to the song selection mode of the audio system, a recognition dictionary corresponding to the setting operation mode of the air conditioner, and a recognition dictionary corresponding to the sound mode of the speaker 14 (which indicates which sound source, such as FM radio, AM radio, or CD player, the sound output from the speaker 14 is coming from). For example, the recognition vocabulary of the recognition dictionary corresponding to the destination setting mode includes recognition words such as “previous destination” and “next destination”, and the recognition vocabulary of the recognition dictionary corresponding to the song selection mode of the audio system includes “next song” and “previous song”. Words such as “next” and “previous” are thus included in common in a plurality of recognition dictionaries; that is, they are common words. The storage device 117 also stores a common word recognition dictionary whose recognition vocabulary consists of these common words. The storage device 117 further functions as the list storage device and stores a list of the common words.
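The text does not say how the list of common words is prepared. As one possible illustration, the sketch below derives such a list from text-form recognition vocabularies by collecting tokens that occur in the vocabularies of at least two different devices. The function name `find_common_words`, the device names, and the whitespace tokenization are assumptions of this sketch.

```python
from collections import defaultdict


def find_common_words(dictionaries):
    """Return tokens that appear in recognition words of two or more devices.

    dictionaries: maps a device name to its list of recognition words
    (hypothetical text representation).
    """
    devices_by_token = defaultdict(set)
    for device, vocabulary in dictionaries.items():
        for word in vocabulary:
            for token in word.split():
                devices_by_token[token].add(device)
    return sorted(token for token, devices in devices_by_token.items()
                  if len(devices) >= 2)


dictionaries = {
    "navigation": ["previous destination", "next destination"],
    "audio": ["previous song", "next song"],
    "air_conditioner": ["raise temperature"],
}
common = find_common_words(dictionaries)
```

For the example vocabularies, only “next” and “previous” occur for more than one device, matching the common words named in the description.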

Signals from the destination search unit 108, the guidance/search control unit 110, the memory point control unit 112, the external device control unit 16, and the display device 18 are input to the voice recognition operation control unit 116 via the navigation interface unit 106. The last management information stored in the last mode management unit 118 is also input to the voice recognition operation control unit 116 via the navigation interface unit 106. The last management information is information that manages (stores) which control target device the user operated last.

Based on these input signals, the voice recognition operation control unit 116 determines which control target devices are currently operating and the operating state of each of them, and on that basis selects, from the plurality of recognition dictionaries stored in the storage device 117, the recognition dictionaries to be used for voice recognition processing in the voice recognition engine 114.
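A minimal sketch of this selection step, assuming the stored dictionaries are keyed by a (device, mode) pair and the current operating states are given as a mapping from each operating device to its mode. The key layout, the mode names, and the function `select_dictionaries` are hypothetical.

```python
def select_dictionaries(stored, operating_states):
    """Select the vocabularies whose (device, mode) key matches a device
    that is currently operating in that mode."""
    return [stored[(device, mode)]
            for device, mode in operating_states.items()
            if (device, mode) in stored]


stored = {
    ("navigation", "destination_setting"): ["previous destination", "next destination"],
    ("audio", "song_selection"): ["previous song", "next song"],
    ("air_conditioner", "settings"): ["raise temperature"],
}

# Navigation and audio are operating; the air conditioner is not.
selected = select_dictionaries(
    stored,
    {"navigation": "destination_setting", "audio": "song_selection"},
)
```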

Using the recognition dictionaries selected by the voice recognition operation control unit 116, the voice recognition engine 114 recognizes (determines) which voice command the utterance content analyzed by the language analysis processing unit 104 corresponds to.

The communication control unit 120 notifies the external device control unit 16 of voice commands recognized by the voice recognition engine 114 and of control commands input by operating the operation switch group (or remote controller), not shown. It also passes the operating states of the external devices, received from the external device control unit 16, to the navigation interface unit 106.

The display content determined by the destination search unit 108, the guidance/search control unit 110, and the memory point control unit 112 is supplied to the display control unit 122 via the navigation interface unit 106, and the display control unit 122 controls the display screen of the display device 18 according to the supplied content. The display device 18 also displays the recognition result of the voice recognition engine 114. Furthermore, the display device 18 can exchange signals with the external device control unit 16, and the state of the external devices is also displayed on its screen based on signals from the external device control unit 16.

FIG. 2 is a flowchart showing, among the processing performed by the ECU 100, the processing for recognizing a voice command spoken by the user. The processing shown in FIG. 2 is executed when a signal is input from the microphone 12 and the voice output control unit 102 has confirmed a user utterance.

In FIG. 2, step S10 first determines whether the recognition level is appropriate. This determination is negative when the input speech from the microphone 12 is too loud or too quiet. When the determination in step S10 is affirmative, step S20, which corresponds to the dictionary selection means, selects one or more recognition dictionaries from the plurality of recognition dictionaries stored in the storage device 117 based on the operating states of the control target devices.

The subsequent step S30 determines whether a plurality of recognition dictionaries were selected in step S20 and whether at least two of those recognition dictionaries contain the same common word.
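The two-part condition of step S30 can be written as a short predicate. This is a sketch over text-form vocabularies; the function name `needs_common_word_check` and the whitespace tokenization are assumptions.

```python
from collections import Counter


def needs_common_word_check(selected_dictionaries, common_words):
    """Return True when two or more dictionaries are selected and at least
    two of them contain the same common word."""
    if len(selected_dictionaries) < 2:
        return False
    counts = Counter()
    for vocabulary in selected_dictionaries:
        # Common words present in this dictionary (counted once per dictionary).
        present = {cw for cw in common_words
                   if any(cw in word.split() for word in vocabulary)}
        counts.update(present)
    return any(n >= 2 for n in counts.values())


navigation = ["previous destination", "next destination"]
audio = ["previous song", "next song"]
air_conditioner = ["raise temperature"]
common_words = ["previous", "next"]
```

When the predicate is false, recognition can proceed directly with the selected dictionaries, which is the faster path mentioned for claim 6.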

If the determination in step S30 is affirmative, step S40 further determines whether last management information exists, that is, whether last management information is stored in the last mode management unit 118. If this determination is also affirmative, step S50, which corresponds to the common word recognition means, executes speech recognition processing using the common word recognition dictionary stored in the storage device 117. This speech recognition processing calculates a likelihood by comparing the speech data of each recognition word included in the recognition dictionary as recognition vocabulary with the user's uttered speech. If a recognition word (here, a common word) has a likelihood equal to or greater than a predetermined value, that recognition word is determined to be the word that was spoken. If a plurality of recognition words have a likelihood equal to or greater than the predetermined value, the recognition word with the highest likelihood is determined to be the spoken word.
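The decision rule described above (threshold first, then highest likelihood) can be sketched as follows. The likelihood values are taken as given; how they are computed from speech data is outside this sketch, and the function name and the example numbers are assumptions made for illustration.

```python
from typing import Mapping, Optional


def pick_recognized_word(likelihoods: Mapping[str, float],
                         threshold: float) -> Optional[str]:
    """Return the word whose likelihood is highest among those at or above
    the threshold, or None when no word reaches the threshold."""
    passing = {word: score for word, score in likelihoods.items()
               if score >= threshold}
    if not passing:
        return None
    return max(passing, key=passing.get)
```

For example, with hypothetical likelihoods `{"next": 0.8, "previous": 0.6}` and a threshold of 0.5, both words pass and "next" is chosen. With a threshold of 0.9, no word is recognized.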

Step S60 then determines whether a common word was recognized as a result of step S50. If the determination in step S60 is negative, or if the determination in step S30 or S40 described above is negative, step S70 executes the same voice command recognition process as in the conventional art. That is, speech recognition processing is executed using the one or more recognition dictionaries selected based on the operation state of the control target devices. One voice command is thereby determined as the voice command spoken by the user.

On the other hand, if the determination in step S60 is affirmative, step S80 is executed. Step S80 narrows down the recognition dictionaries based on the last management information stored in the last mode management unit 118. That is, step S80 uses the last management information to select one recognition dictionary from among the recognition dictionaries selected in step S20.

The following step S90 executes a second speech recognition process using the one recognition dictionary selected in step S80, thereby determining which voice command the user spoke. In FIG. 2, steps S40 and S60 to S90 correspond to the voice command recognition means.

In the speech recognition processing of step S90, as in step S50, the voice command spoken by the user is determined using a likelihood obtained by comparing the speech data of the recognition words in the recognition dictionary with the user's uttered speech. Therefore, even if the words spoken by the user do not exactly match a recognition word in the recognition dictionary, a voice command is determined as long as the likelihood is equal to or greater than the predetermined value. For example, even when the user says only “previous”, “previous destination” may be determined as the spoken command, depending on the likelihood.

After step S70 or step S90 has determined the voice command spoken by the user, step S100 determines whether the process succeeded. This determination checks whether the voice command determined in step S70 or S90 can be executed: the determination in step S100 is affirmative if the voice command can be executed and negative if it cannot. For example, if a route change command is recognized while no route to a destination has been set, the determination in step S100 is negative.

If the determination in step S100 is affirmative, the following step S110 causes the relevant control target device to execute the function corresponding to the voice command determined in step S70 or S90.

On the other hand, if the determination in step S100 is negative, step S120 outputs a predetermined error notification from the display device 18 and the speaker 14. For example, if the determination in step S10 is negative, a message is output stating that the voice command cannot be recognized because the recognition level is inappropriate; if the determination in step S100 is negative, a message is output stating that the input voice command cannot be executed in the current operation state.

According to the embodiment described above, the common word recognition process in step S50 determines whether the voice command spoken by the user includes a common word. If it does, the recognition dictionaries are narrowed down to one based on the control target device last operated by the user (step S80), and speech recognition processing is performed using that dictionary (step S90). Therefore, even if two or more recognition dictionaries are selected in step S20 because, for example, audio is playing and a destination setting operation is in progress, when a common word such as “next” or “previous” is recognized in step S50, the recognition dictionaries are narrowed down to one based on the last management information in step S80 before the speech recognition processing of step S90 is executed. As a result, when the device the user last operated is the audio device, for example, “previous destination” is prevented from being recognized when the user says “previous song”. Moreover, even if the user cannot recall the command “previous song” and simply says “previous”, the unintended “previous destination” is not recognized and the intended “previous song” is recognized. The recognition accuracy for voice commands containing a common word thus improves.
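The flow of steps S30 to S90 in the first embodiment can be sketched as follows. This is an illustration under stated assumptions, not the device's actual processing: the word-overlap `likelihood` function stands in for acoustic matching, the 0.5 threshold is arbitrary, dictionaries are keyed by a hypothetical device name, and the last management information is reduced to the name of the last operated device.

```python
from typing import Dict, List, Optional


def likelihood(utterance: str, phrase: str) -> float:
    """Toy stand-in for acoustic likelihood: word overlap (Jaccard index)."""
    a, b = set(utterance.split()), set(phrase.split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def recognize(utterance: str, vocabulary: List[str],
              threshold: float = 0.5) -> Optional[str]:
    """Return the phrase with the highest likelihood at or above threshold."""
    best, best_score = None, 0.0
    for phrase in vocabulary:
        score = likelihood(utterance, phrase)
        if score >= threshold and score > best_score:
            best, best_score = phrase, score
    return best


def shares_common_word(selected: Dict[str, List[str]],
                       common_words: List[str]) -> bool:
    """S30: do at least two selected dictionaries contain the same common word?"""
    for word in common_words:
        holders = sum(any(word in p.split() for p in vocab)
                      for vocab in selected.values())
        if holders >= 2:
            return True
    return False


def recognize_command(utterance: str, selected: Dict[str, List[str]],
                      common_words: List[str],
                      last_device: Optional[str]) -> Optional[str]:
    if len(selected) > 1 and shares_common_word(selected, common_words):  # S30
        if last_device in selected:                                       # S40
            if recognize(utterance, common_words) is not None:            # S50, S60
                return recognize(utterance, selected[last_device])        # S80, S90
    merged = [p for vocab in selected.values() for p in vocab]
    return recognize(utterance, merged)                                   # S70


SELECTED = {
    "audio": ["previous song", "next song"],
    "navigation": ["previous destination", "next destination"],
}
COMMON = ["previous", "next"]
```

Under these assumptions, `recognize_command("previous", SELECTED, COMMON, "audio")` yields "previous song", while the same utterance with "navigation" as the last operated device yields "previous destination". Treating a last operated device that has no selected dictionary as a negative result in S40 is a simplification of this sketch.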

Next, a second embodiment of the present invention will be described. The second embodiment is an in-vehicle navigation device with the same configuration as in FIG. 1, but the processing of the ECU 100 for identifying the voice command spoken by the user differs from the first embodiment. FIG. 3 is a flowchart showing the processing executed by the ECU 100 in the second embodiment to identify the voice command spoken by the user.

The flowchart of FIG. 3 differs from that of FIG. 2 in three respects: step S30 is not executed, so step S40 is executed directly after step S20; step S55 is executed in place of step S50 of FIG. 2; and step S95 is executed in place of step S90 of FIG. 2. These differences are described below.

In the second embodiment, after step S20 selects one or more recognition dictionaries from the plurality of recognition dictionaries stored in the storage device 117 based on the operation state of the control target devices, step S40 is executed without determining whether more than one dictionary was selected. Therefore, step S40 and the subsequent steps are executed even if only one dictionary was selected.

If step S40 determines that last management information exists, step S55, which corresponds to the common word recognition means, is executed. Step S55 attempts to extract a common word from the voice command spoken by the user, based on the waveform of the speech input from the microphone 12 and prestored speech waveforms of the common words, using a known word spotting method such as one based on DP matching. If a common word is extracted in step S55, the determination in the following step S60 is affirmative; if no common word is extracted, the determination in step S60 is negative.
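As a rough illustration of word spotting by DP matching, the sketch below computes a subsequence dynamic-time-warping distance between a stored template and an input sequence, letting the template start and end anywhere in the input. The one-dimensional feature values, the absolute-difference cost, and the function name are assumptions made here; the description only says that a known word spotting method is used.

```python
from typing import Sequence


def spotting_distance(template: Sequence[float],
                      signal: Sequence[float]) -> float:
    """Smallest DP-matching (DTW) cost of aligning the whole template with
    any contiguous stretch of the signal. A small value suggests that the
    template word occurs somewhere in the signal."""
    inf = float("inf")
    n, m = len(template), len(signal)
    if n == 0 or m == 0:
        return inf
    # d[i][j]: best cost of matching template[:i] ending at signal[j-1].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        d[0][j] = 0.0  # the match may start at any position in the signal
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(template[i - 1] - signal[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # template advances
                                 d[i][j - 1],      # signal advances
                                 d[i - 1][j - 1])  # both advance
    return min(d[n][1:])  # the match may end at any position
```

A common word would then be treated as extracted when the distance for its template falls below some threshold, which would have to be tuned and is not specified in the description.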

In step S80, as in the first embodiment, the recognition dictionaries are narrowed down based on the last management information stored in the last mode management unit 118. In the second embodiment, step S80 is executed even when only one recognition dictionary was selected in step S20; in that case, step S80 simply selects that one recognition dictionary.

The following step S95 compares the common word extracted in step S55 with the recognition vocabulary of the recognition dictionary narrowed down in step S80, and identifies, from that vocabulary, the recognition word containing the extracted common word as the voice command spoken by the user. After step S95, step S100 and the subsequent steps described above are executed.
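Step S95 amounts to a lookup in the narrowed dictionary. The sketch below assumes that recognition words are space-separated phrases and returns the first entry containing the common word; the description does not say what happens when several entries match, so that choice and the function name are assumptions.

```python
from typing import List, Optional


def identify_command(common_word: str,
                     narrowed_vocabulary: List[str]) -> Optional[str]:
    """S95 (sketch): pick the recognition word that contains the common word."""
    for phrase in narrowed_vocabulary:
        if common_word in phrase.split():
            return phrase
    return None  # no entry of the narrowed dictionary contains the word
```

For instance, `identify_command("previous", ["next song", "previous song"])` returns "previous song".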

In the second embodiment described above, step S55 is executed to extract a common word from the user's utterance even when only one recognition dictionary was selected in step S20. Therefore, even when the user utters only a common word, it can be recognized accurately. Because step S95 identifies the voice command by comparing this accurately recognized common word with the recognition vocabulary of the recognition dictionary, the accuracy of identifying voice commands containing a common word improves.

Next, a third embodiment of the present invention will be described. The third embodiment is also an in-vehicle navigation device with the same configuration as in FIG. 1, but the processing of the ECU 100 for identifying the voice command spoken by the user differs from the first and second embodiments. FIG. 4 is a flowchart showing the processing executed by the ECU 100 in the third embodiment to recognize the voice command spoken by the user.

In FIG. 4, steps S10 and S20, which are the same as in FIG. 2, are executed first. After one or more recognition dictionaries are selected in step S20, step S200, which corresponds to the candidate selection means, selects voice command candidates using the recognition dictionaries selected in step S20. That is, speech recognition processing is executed using the recognition dictionaries selected in step S20, and the recognition words in their recognition vocabulary whose likelihood is equal to or greater than a predetermined value are selected as voice command candidates.

The following step S210 compares the candidates selected in step S200 with the list of common words stored in the storage device 117 to determine whether at least two of the selected candidates contain the same common word.

If the determination in step S210 is affirmative, step S220 is executed. Step S220 further determines whether last management information exists, that is, whether last management information is stored in the last mode management unit 118. If the determination in step S220 is negative, or if the determination in step S210 described above is negative, step S230 is executed. Step S230 determines that the candidate with the highest likelihood among the candidates selected in step S200 is the voice command spoken by the user.

On the other hand, if the determination in step S220 is affirmative, step S240 determines that, among the candidates selected in step S200, the candidate that contains the common word and belongs to the recognition vocabulary of the recognition dictionary determined from the last management information is the voice command spoken by the user. For example, if a plurality of candidates containing the common word “next”, such as “next song” and “next destination”, have been selected, and the recognition dictionary determined from the last management information is the recognition dictionary for destination setting, the voice command spoken by the user is determined to be “next destination”. In FIG. 4, steps S200 to S240 correspond to the voice command recognition means.
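The decision logic of steps S210 to S240 can be sketched as follows. The `Candidate` record, the device labels, and the likelihood numbers are assumptions for illustration; candidates are taken to have already passed the likelihood threshold of step S200. Falling back to the highest-likelihood candidate when no candidate belongs to the last operated device is also an assumption, since the description does not cover that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    phrase: str        # recognition word selected in S200
    device: str        # device whose dictionary contains the phrase
    likelihood: float  # score from the speech recognition processing


def decide_command(candidates: List[Candidate], common_words: List[str],
                   last_device: Optional[str]) -> Optional[str]:
    if not candidates:
        return None
    # S210: common words that appear in at least two candidates.
    shared = [w for w in common_words
              if sum(w in c.phrase.split() for c in candidates) >= 2]
    if shared and last_device is not None:                       # S220
        matches = [c for c in candidates
                   if c.device == last_device
                   and any(w in c.phrase.split() for w in shared)]
        if matches:                                              # S240
            return max(matches, key=lambda c: c.likelihood).phrase
    return max(candidates, key=lambda c: c.likelihood).phrase    # S230


CANDIDATES = [
    Candidate("next song", "audio", 0.9),
    Candidate("next destination", "navigation", 0.7),
]
```

With "navigation" as the last operated device, `decide_command(CANDIDATES, ["next", "previous"], "navigation")` returns "next destination" even though "next song" has the higher likelihood. Without last management information, the highest-likelihood candidate "next song" is returned.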

After step S230 or S240 has identified the voice command spoken by the user, step S100 and the subsequent steps are executed as in the first embodiment.

As in the third embodiment described above, voice command candidates may first be selected by speech recognition processing using the recognition dictionaries (S200); it is then determined whether at least two of the selected candidates contain a common word (S210); and, if they do, one candidate is selected based on the control target device last operated (S240). In this case as well, only a voice command relating to the control target device last operated by the user is selected, so the recognition accuracy for voice commands containing a common word improves.

Because the storage device 117 also stores the common word recognition dictionary as a recognition dictionary, when the user utters a voice command containing a common word, the candidates selected in step S200 also include a candidate consisting of the common word alone. As a result, when the user utters a voice command containing a common word, the determination in step S210 is more reliably affirmative, which further improves the recognition accuracy for voice commands containing a common word.

Although embodiments of the present invention have been described above, the present invention is not limited to those embodiments. The following embodiment is also included in the technical scope of the present invention, and various other modifications can be made without departing from the gist of the invention.

For example, in the embodiments described above, a plurality of recognition dictionaries are stored, one for each operation state of the control target devices. Instead of associating each recognition dictionary with an operation state, each recognition word in the recognition dictionary may be associated with an operation state of a control target device, so that the recognition dictionaries are combined into a single dictionary.
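A single combined dictionary of this kind could look like the sketch below, in which each recognition word carries the set of operation states in which it is valid. The state labels, the example words, and the function name are hypothetical.

```python
from typing import Dict, List, Set

# One dictionary; each recognition word is tagged with its operation states.
COMBINED_DICTIONARY: Dict[str, Set[str]] = {
    "next song": {"audio_playing"},
    "previous song": {"audio_playing"},
    "next destination": {"destination_setting"},
    "previous destination": {"destination_setting"},
}


def active_vocabulary(dictionary: Dict[str, Set[str]],
                      current_states: Set[str]) -> List[str]:
    """Return the recognition words valid in at least one current state."""
    return [word for word, states in dictionary.items()
            if states & current_states]
```

Filtering by the current operation states then plays the role that selecting among separate dictionaries plays in the embodiments above.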

FIG. 1 is a block diagram showing the configuration of the in-vehicle navigation device 10 having the function of the voice recognition device of the present invention.
FIG. 2 is a flowchart showing, among the processes performed by the ECU 100 of FIG. 1, the process for recognizing a voice command uttered by the user.
FIG. 3 is a flowchart showing the process executed by the ECU 100 in the second embodiment for recognizing a voice command uttered by the user.
FIG. 4 is a flowchart showing the process executed by the ECU 100 in the third embodiment for recognizing a voice command uttered by the user.

Explanation of symbols

10: In-vehicle navigation device (voice recognition device)
100: ECU
117: Storage device
S20: Dictionary selection means
S50: Common word recognition means
S55: Common word recognition means
S40, S60 to S90: Voice command recognition means
S200: Candidate selection means
S200 to S240: Voice command recognition means

Claims

A voice recognition device for recognizing voice commands for a plurality of control target devices,
A storage device storing a recognition dictionary in which a plurality of recognition words are respectively associated with operation states of any one of the plurality of control target devices;
Voice command recognition means for recognizing a voice command spoken by the user based on a recognition dictionary stored in the storage device;
Common word recognition means for determining whether a common word, which is included in common in at least two recognition words associated with different control target devices, is included in a voice command spoken by the user,
wherein, when the common word recognition means determines that the common word is included, the voice command recognition means recognizes the voice command spoken by the user based on, among the recognition words of the recognition dictionary that include the common word, a recognition word associated with the control target device last operated by the user.

The recognition dictionaries stored in the storage device are divided into a plurality of recognition dictionaries according to the operating state of the device to be controlled,
Further comprising a dictionary selection means for selecting at least one recognition dictionary from a plurality of recognition dictionaries stored in the storage device based on an operation state of the plurality of control target devices;
The speech recognition apparatus according to claim 1, wherein, when the common word recognition means determines that a common word is included, the voice command recognition means uses, as the recognition dictionary for recognizing the voice command spoken by the user, one recognition dictionary that is among the recognition dictionaries selected by the dictionary selection means and is determined based on the control target device last operated by the user.

When the voice command recognition means narrows down the recognition dictionary to one based on the device to be controlled last operated by the user, the common word recognition among the recognition words included in the narrowed one recognition dictionary The speech recognition apparatus according to claim 2, wherein a recognition word including a common word recognized by the means is specified as a voice command spoken by a user.

When the voice command recognizing means narrows down the recognition dictionary to one based on the device to be controlled last operated by the user, by executing the voice recognition process using the narrowed down recognition dictionary, The voice recognition apparatus according to claim 2, which recognizes a voice command spoken by a user.

The speech recognition apparatus according to claim 3, wherein the common word recognition means is executed regardless of the number of recognition dictionaries selected by the dictionary selection means.

The speech recognition apparatus according to claim 2, wherein the common word recognition means is executed when a plurality of recognition dictionaries are selected by the dictionary selection means and the common word is included in at least two of the selected recognition dictionaries.

A common word recognition dictionary having the common word as a recognition vocabulary is stored in the storage device;
The common word recognition means determines whether or not the common word is included in a voice command spoken by a user by executing voice recognition processing using the common word recognition dictionary. The voice recognition device according to claim 1.

The speech recognition apparatus according to claim 1, wherein the common word recognition means determines whether a common word is included in the voice command spoken by the user by comparing an input voice representing the user's utterance with a prestored reference voice of the common word.

A voice recognition device for recognizing voice commands for a plurality of control target devices,
A storage device storing a recognition dictionary in which a plurality of recognition words are respectively associated with operation states of any one of the plurality of control target devices;
Voice command recognition means for recognizing a voice command spoken by the user based on a recognition dictionary stored in the storage device;
A list storage device that stores a list of common words that are commonly included in at least two recognition words that are associated with different control target devices;
wherein the voice command recognition means includes candidate selection means for selecting, by voice recognition processing using the recognition dictionary, candidates for the voice command spoken by the user from the recognition dictionary, and, when a plurality of candidates are selected by the candidate selection means and at least two of the selected candidates include a common word contained in the common word list, the voice command recognition means identifies the voice command spoken by the user by selecting a candidate that includes that common word and that is in the recognition vocabulary of the recognition dictionary determined based on the control target device last operated by the user.

The recognition dictionaries stored in the storage device are divided into a plurality of recognition dictionaries according to the operating state of the device to be controlled,
Further comprising a dictionary selection means for selecting at least one recognition dictionary from a plurality of recognition dictionaries stored in the storage device based on an operation state of the plurality of control target devices;
The speech recognition apparatus according to claim 9, wherein the candidate selection means uses the recognition dictionary selected by the dictionary selection means in the speech recognition processing.

A common word recognition dictionary having the common word as a recognition vocabulary is stored in the storage device;
11. The candidate selection unit is configured to select a voice command candidate spoken by a user using the common word recognition dictionary in addition to the recognition dictionary selected by the dictionary selection unit. The speech recognition apparatus described in 1.