JP2004301875A

JP2004301875A - Speech recognition device

Info

Publication number: JP2004301875A
Application number: JP2003091311A
Authority: JP
Inventors: Hiroyuki Hoshino; 博之星野
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2003-03-28
Filing date: 2003-03-28
Publication date: 2004-10-28

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device that starts speech recognition on a specified word, that does not start recognition unintentionally when a specified word happens to appear in ordinary conversation, and that places little speaking burden on the speaker.

SOLUTION: A limited number of specified words are used, and they are taken from a language other than the speaker's mother tongue so that they rarely appear in ordinary conversation. In addition, some of the specified words double as voice operation commands, so that the corresponding operation is carried out at the same time speech recognition starts, which lightens the speaking burden.

COPYRIGHT: (C)2005, JPO & NCIPI

Description

[0001]
[Technical field of the invention]
The present invention relates to a speech recognition device, and more particularly to a speech recognition device that does not require a manual talk switch.
[0002]
[Prior art]
[Patent Document 1]
JP H11-359287 A
[Patent Document 2]
JP 2000-56790 A
[Patent Document 3]
JP 2000-194293 A
[Patent Document 4]
JP 2000-322078 A
[Patent Document 5]
JP 2000-194393 A
[0003]
Conventionally, a speaker using a speech recognition device has had to operate a talk switch before every utterance, which is bothersome. Operating the talk switch while driving is especially cumbersome when giving voice input to an in-vehicle navigation device. To solve this problem, various companies have devised alternatives to the talk switch: devices that judge from an image of the speaker whether he or she is speaking, and devices that start speech recognition when a means for recognizing a specific word detects that word.
[0004]
JP H11-359287 A, "Speech Recognition Device", photographs the speaker with a camera and determines from the image whether the speaker is speaking, which makes a talk switch unnecessary. JP 2000-56790 A, "Speech Transmission Method", JP 2000-194293 A, "Speech Recognition Control Device", and JP 2000-322078 A, "In-Vehicle Speech Recognition Device", each start speech recognition by providing a means for recognizing a specific word. JP 2000-194393 A describes a device that can also be used together with a talk switch.
[0005]
[Problems to be solved by the invention]
In a device that starts speech recognition by recognizing a specific word instead of using a talk switch, the speaker must utter a voice operation command after the specific word. This places a heavy speaking burden on the speaker and impairs usability. Moreover, if the specific word belongs to the speaker's native language, it can appear in ordinary conversation; when it is then recognized by mistake, speech recognition starts even though the speaker did not intend it.
[0006]
In view of this, an object of the present invention is to reduce the speaker's speaking burden and improve usability by using a small number of specific words to start speech recognition and giving some of them the additional role of voice operation commands.
[0007]
A further object is to provide a speech recognition device in which speech recognition unintended by the speaker is not started as a result of the "specific words" appearing in ordinary conversation and being recognized by mistake.
[0008]
It should be understood that no single invention described above is meant to achieve all of the above objects at once; rather, each individual invention achieves its own object.
[0009]
[Means for Solving the Problems]
Accordingly, the present invention is a speech recognition device that recognizes speech uttered by a speaker, comprising keyword recognition means for recognizing a plurality of keywords, which are a plurality of specific words, and operation command determination means for determining whether a keyword recognized by the keyword recognition means is an operation command, that is, a keyword that carries the meaning of an operation. The device further comprises keyword control means that executes the command operation when the operation command determination means determines that the keyword is an operation command and that, when the keyword is determined not to be an operation command, has speech recognition means perform speech recognition and executes an operation based on the recognition result.
[0010]
Further, in the invention of claim 2, the keywords are a plurality of specific words that appear infrequently in general conversation, so that unintended speech recognition is not started by a keyword appearing in ordinary conversation.
[0011]
Further, in the invention of claim 3, the keywords are words of a foreign language that is rarely used in conversation conducted in the language of the main users assumed for the device, again so that unintended speech recognition is not started by a keyword appearing in ordinary conversation. For example, when the device assumes Japanese users as its main users, the language of conversation is Japanese. In that case, words from a language other than Japanese, such as English or French, are used as keywords.
[0012]
[Actions and effects of the invention]
According to the invention of claim 1, a speech recognition device that recognizes speech uttered by a speaker is provided with keyword recognition means for recognizing a plurality of keywords, which are a plurality of specific words, so that a keyword uttered by the speaker can be recognized. The device also has operation command determination means for determining whether the recognized keyword is an operation command, that is, a keyword that carries the meaning of an operation, and keyword control means that executes the command operation when the keyword is determined to be an operation command and that otherwise has the speech recognition means perform speech recognition and executes an operation based on the recognition result. Consequently, when the keyword uttered by the speaker is an operation command, the command operation begins at the same time speech recognition starts, which reduces the speaker's speaking burden. When the keyword is not an operation command, ordinary speech recognition starts.
[0013]
Further, according to the invention of claim 2, the keywords are a plurality of specific words that appear infrequently in general conversation. The keywords therefore seldom occur in the speaker's ordinary conversation, so it becomes less likely that a keyword is recognized by mistake and that speech recognition unintended by the speaker is started.
[0014]
Further, according to the invention of claim 3, the keywords are taken from a foreign language that is rarely used in conversation conducted in the language of the main users assumed for the device. For example, in a speech recognition device for users who speak Japanese, using English keywords makes the keywords appear extremely rarely in ordinary Japanese conversation. Likewise, in a device for users who speak English, using Japanese keywords lowers the frequency with which the keywords appear in ordinary English conversation. Using a language other than that of the target users in this way reduces how often keywords appear in ordinary conversation and thus reduces unintended starts of speech recognition. In addition, a word that carries the meaning the keyword is actually supposed to express can be chosen as the keyword. In other words, it becomes possible to select a word that seldom appears in conversation yet is semantically appropriate as a keyword.
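As an illustration of the selection criterion in claims 2 and 3, the following Python sketch keeps only those candidate keywords that rarely occur in sample conversation. It is not part of the patent: the function name `rare_candidates`, the threshold `max_count`, the sample sentences, and the German candidate word are all assumptions made up for this example.

```python
import re
from collections import Counter


def rare_candidates(candidates, transcripts, max_count=0):
    """Keep single-word candidates that occur at most max_count times in the transcripts."""
    counts = Counter(
        word
        for text in transcripts
        for word in re.findall(r"[a-z']+", text.lower())
    )
    return [c for c in candidates if counts[c.lower()] <= max_count]


# Made-up everyday sentences of an English-speaking user, used only to exercise the function.
transcripts = [
    "What time do we reach the destination?",
    "Turn the music down, please.",
    "Is the destination far from here?",
]

# "zielort" (German) stands in for a foreign-language keyword candidate.
print(rare_candidates(["destination", "zielort", "music"], transcripts))
```

With these toy sentences only the foreign-language candidate survives, which mirrors the reasoning above: a word from the user's own language tends to show up in conversation, while a foreign word does not.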
[0015]
[Best mode for carrying out the invention]
Here, a specific example in which the present invention is applied to a voice recognition device used for an in-vehicle navigation device or the like will be described.
First, FIG. 1 is a block diagram showing the configuration of a navigation device 1 that uses a speech recognition device 10, which is one example of an embodiment of the present invention. The configuration and operation are outlined with reference to this figure. The speech recognition device 10 comprises a microphone 18 and a speech recognition ECU (Electronic Control Unit) 19. When the speaker utters a voice command (for example, a command for specifying a travel destination), the speech is input to the microphone 18, converted into an electrical signal, and sent to the speech recognition ECU 19. The speech recognition ECU 19 has a DSP (digital signal processor); it analyzes the speech data and recognizes what the speaker said. Any well-known recognition process may be used, such as the dynamic programming (DP) method or a probabilistic method based on hidden Markov models (HMMs). In outline, the input signal is subjected to, for example, window function processing and Fourier transform processing, and features such as the cepstrum of the speech data are obtained (acoustic processing). The keyword recognition means 11 then performs pattern matching between the acoustically processed signal and a keyword template 12 prepared in advance (data for the words to be recognized). The word with the best matching result is taken to be the word that was spoken. The recognition result is output to the operation command determination means 13, which determines whether it is an operation command.
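The acoustic processing mentioned above (window function, Fourier transform, cepstrum) can be sketched as follows. This is a minimal illustration in pure Python, not the processing actually run on the ECU's DSP: the Hamming window, the naive DFT, the frame length of 64 samples, and the synthetic test signal are all assumptions chosen for brevity.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for one short frame."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def real_cepstrum(frame, floor=1e-10):
    """Window the frame, take the log magnitude spectrum, and transform back."""
    window = hamming(len(frame))
    spectrum = dft([s * w for s, w in zip(frame, window)])
    log_mag = [math.log(max(abs(c), floor)) for c in spectrum]
    return [c.real for c in dft(log_mag, inverse=True)]


# Hypothetical 64-sample frame: a 4-cycle sine plus a weaker 12-cycle sine.
frame = [
    math.sin(2 * math.pi * 4 * t / 64) + 0.5 * math.sin(2 * math.pi * 12 * t / 64)
    for t in range(64)
]
ceps = real_cepstrum(frame)
```

In a recognizer, the low-order cepstral coefficients of successive frames would typically form the feature sequence that is compared with the templates; a production system would use an FFT rather than the naive transform shown here.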
[0016]
Here too, for example, an operation command template 14 is prepared and pattern matching is performed. The determination result is output to the keyword control means 15. If the keyword is determined to be an operation command, the operation command is output to the navigation means 30. If it is determined not to be an operation command, the speech recognition means 16 then determines what was uttered by pattern matching against the speech recognition word template 17. That result is output to the keyword control means 15 and passed to the navigation means 30 as a command or a reply. In accordance with the speaker's voice commands and replies, the navigation means 30 displays a map on the display device 40 and, through the speaker 20, prompts for further utterances or gives route guidance.
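The template pattern matching described here could, for instance, use the DP method named earlier. The sketch below implements DP matching (dynamic time warping) over feature sequences and picks the best-matching template. The function names, the one-dimensional toy features, and the rejection threshold are hypothetical; the patent does not specify any of them.

```python
def dtw_distance(a, b):
    """DP (dynamic time warping) distance between two feature sequences.

    Each sequence is a list of equal-length feature vectors (lists of floats).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between frame i of a and frame j of b.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_match(features, templates, threshold):
    """Return the closest template word, or None if nothing is within threshold."""
    word, dist = min(
        ((w, dtw_distance(features, t)) for w, t in templates.items()),
        key=lambda pair: pair[1],
    )
    return word if dist <= threshold else None


# Toy one-dimensional "features"; real templates would hold cepstral vectors.
templates = {
    "destination": [[0.0], [1.0], [2.0], [1.0]],
    "position": [[2.0], [2.0], [0.0], [0.0]],
}
utterance = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]  # a time-stretched "destination"
print(best_match(utterance, templates, threshold=0.5))
```

The threshold is what lets the device ignore speech that matches no keyword well, which corresponds to the "not a keyword" branch of the flow.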
[0017]
Next, an example of processing that uses keywords, the specific words shown in FIG. 2 that characterize the present invention, is outlined with reference to the flowchart of FIG. 3.
[0018]
First, when speech is input from the microphone 18, speech input processing is performed in step 100. Next, in step 102, the speech recognition ECU 19 analyzes the speech data.
[0019]
Next, in step 104, pattern matching against the keyword template (a) determines whether the analyzed speech is a keyword, that is, whether the utterance is "current location", "destination", or "navigation operation". If the utterance is judged not to be a keyword, speech recognition is not started; it remains off until a keyword is input or the talk switch is pressed. If the utterance is judged to be a keyword, speech recognition processing starts just as if the talk switch had been pressed. Of these keywords, "current location" and "destination" are command words that are used especially often in ordinary navigation devices.
[0020]
Because "current location" and "destination" serve not only to start speech recognition but also as voice operation commands, step 106 checks the keyword against the operation command template (c) by pattern matching. If the keyword is judged to be "current location" or "destination", the recognized command operation is processed in step 108: the keyword control means 15 notifies the navigation means 30 that the "current location" or "destination" command was input, and the navigation means 30 displays the current location or sets the destination accordingly. Where necessary, guidance prompting the next voice input is given through the speaker 20. If the utterance is "navigation operation", the keyword control means 15 puts the device in a state in which all voice commands can be input, just as if the talk switch for the speech recognition means had been pressed (step 110). Then, in step 112, the speech recognition means 16 performs speech recognition using the speech recognition word template 17, and if the recognition result is a command, the operation for that command is executed (step 114).
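The flow of steps 104 to 114 can be sketched as a small state machine. This is only an illustrative Python sketch: it works on already-recognized words rather than audio, the class and method names are hypothetical, and the entries "zoom in" and "zoom out" in the speech recognition word template are invented placeholders, since the contents of FIG. 2 are not reproduced in the text. The follow-up dialogue after a command (for example, asking for the destination name) is not modelled.

```python
KEYWORD_TEMPLATE = {"current location", "destination", "navigation operation"}  # template (a)
OPERATION_COMMAND_TEMPLATE = {"current location", "destination"}                 # template (c)
# Placeholder entries; the real speech recognition word template is not given in the text.
SPEECH_RECOGNITION_WORD_TEMPLATE = {"current location", "destination", "zoom in", "zoom out"}


class KeywordController:
    """Toy stand-in for the keyword control means 15."""

    def __init__(self):
        self.listening = False  # True once full speech recognition has been opened
        self.executed = []      # commands handed to the navigation means

    def press_talk_switch(self):
        self.listening = True

    def on_utterance(self, word):
        """Handle one analysed utterance (steps 100 and 102 are assumed done)."""
        if self.listening:                          # steps 112 and 114
            self.listening = False
            if word in SPEECH_RECOGNITION_WORD_TEMPLATE:
                self.executed.append(word)
                return "executed"
            return "rejected"
        if word not in KEYWORD_TEMPLATE:            # step 104: not a keyword
            return "ignored"
        if word in OPERATION_COMMAND_TEMPLATE:      # step 106: keyword is also a command
            self.executed.append(word)              # step 108
            return "executed"
        self.listening = True                       # step 110: accept any voice command next
        return "listening"


nav = KeywordController()
print(nav.on_utterance("zoom in"))               # ignored: recognition not started
print(nav.on_utterance("destination"))           # executed directly from the keyword
print(nav.on_utterance("navigation operation"))  # listening
print(nav.on_utterance("zoom in"))               # executed via full speech recognition
```

The point of the design is visible in the second call: a keyword that is also an operation command triggers its operation with a single utterance, whereas other commands need the opening keyword (or the talk switch) first.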
[0021]
In another embodiment, the specific words in the keyword template are not the Japanese words for "current location", "destination", and "navigation operation" but, as in keyword template (b), the English words "position", "destination", and "operation". Because these are words that Japanese speakers do not normally use in conversation, it becomes less likely that a "specific word" appears in ordinary conversation, is recognized by mistake, and starts speech recognition that the speaker did not intend.
[0022]
In yet another embodiment, either of the two embodiments above can of course be combined with a talk switch in various ways. For example, if only the English word "destination" is used both to start speech recognition and as a voice operation command, destination setting, one of the most frequently used navigation functions, can be completed quickly, without a talk switch and with few utterances. All other voice commands become available when the talk switch is pressed. This further prevents speech recognition from starting when the speaker does not intend it.
[0023]
Example of a conversation with the present invention
Speaker: "Destination"
Navi: "Please say your destination."
Speaker: "Nagoya Castle"
[0024]
Example of a conversation with a conventional device
Speaker: "Navi"
Navi: "Starting navigation speech recognition. Please give a command."
Speaker: "Destination"
Navi: "Please say your destination."
Speaker: "Nagoya Castle"
[0025]
As described above, the present invention reduces the speaker's speaking burden and the number of talk switch operations. Because operating the talk switch is a heavy burden when the speaker is the driver, usability is improved. Furthermore, because the operation time is shortened, improved safety can be expected when the in-vehicle navigation device is used while driving.
[0026]
In the above embodiments, "current location", "destination", and "navigation operation" were used as the Japanese keywords, and English was used as a representative foreign language, with the keywords "position", "destination", and "operation". Other words may of course be used, and more keywords may be added as long as the gist of the invention is not impaired.
[0027]
The above embodiments assume that the main target users speak Japanese. If the users speak English, keywords from Japanese, German, or another language may be used instead. The same applies to other languages: the keywords should be set in a language other than the one used by the assumed users.
[0028]
Furthermore, the words listed in the speech recognition word template are only examples. The number of words that can actually be recognized may be larger, and usability can be improved further by preparing several words for a single meaning.
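Preparing several words for one meaning amounts to a many-to-one lookup from recognized word to operation. The Python sketch below shows one way to build such a table. The meaning labels and the specific alternative words are hypothetical; the text only states that several words may share a meaning.

```python
# Hypothetical surface words grouped by the operation they stand for.
WORDS_BY_MEANING = {
    "SET_DESTINATION": ["destination", "goal"],
    "SHOW_CURRENT_LOCATION": ["position", "current location"],
}

# Invert into the lookup a recognizer would use: recognized word -> meaning.
MEANING_BY_WORD = {}
for meaning, words in WORDS_BY_MEANING.items():
    for word in words:
        if word in MEANING_BY_WORD:
            raise ValueError(f"{word!r} is assigned to two meanings")
        MEANING_BY_WORD[word] = meaning


def meaning_of(word):
    """Return the meaning for a recognized word, or None if it is not in the template."""
    return MEANING_BY_WORD.get(word.lower())


print(meaning_of("goal"))
print(meaning_of("hello"))
```

Rejecting a word that is assigned to two meanings keeps the table unambiguous, so adding synonyms cannot silently change which operation an existing word triggers.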
[0029]
In this embodiment, the vehicular navigation device is provided with the voice recognition device of the present invention. However, it goes without saying that the scope of application of the present invention is not limited to vehicle navigation devices. The present invention is similarly applicable to a recognition device provided in another vehicle-mounted device, and to a speech recognition device used in an environment other than a vehicle.
[0030]
The embodiment described above is an example of the present invention, and the present invention is not limited to the embodiment. Various modifications can be considered in light of the essence of the present invention.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a speech recognition device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an example of a template of a keyword, an operation command, and a speech recognition word used for speech recognition.
FIG. 3 is a flowchart showing the operation of the voice recognition device of the present invention.
[Explanation of symbols]
1 ... Navigation device
10 ... Speech recognition device
11 ... Keyword recognition means
12 ... Keyword template
13 ... Operation command determination means
14 ... Operation command template
15 ... Keyword control means
16 ... Speech recognition means
17 ... Speech recognition word template
18 ... Microphone
19 ... Speech recognition ECU
20 ... Speaker
30 ... Navigation means
40 ... Display device

Claims

1. A speech recognition device for recognizing speech uttered by a speaker, comprising:
keyword recognition means for recognizing a plurality of keywords, which are a plurality of specific words;
operation command determination means for determining whether a keyword recognized by the keyword recognition means is an operation command having the meaning of an operation; and
keyword control means for executing a command operation when the operation command determination means determines that the keyword is an operation command and, when the keyword is determined not to be an operation command, causing speech recognition means to perform speech recognition and executing an operation based on the recognition result.

2. The speech recognition device according to claim 1, wherein the keywords are a plurality of specific words that appear infrequently in general conversation.

3. The speech recognition device according to claim 1, wherein the keywords are words of a foreign language that is rarely used in conversation conducted in the language of the main users assumed for the speech recognition device.