JP6759058B2

JP6759058B2 - Voice recognition device and voice recognition method

Info

Publication number: JP6759058B2
Application number: JP2016213052A
Authority: JP
Inventors: 信範工藤; 諒助川
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2016-10-31
Filing date: 2016-10-31
Publication date: 2020-09-23
Anticipated expiration: 2036-10-31
Also published as: JP2018072599A

Description

The present invention relates to a voice recognition device and a voice recognition method. It is particularly suitable for a voice recognition device that calculates an index indicating the degree of similarity between the voice pattern of a word registered in a voice recognition dictionary and an uttered voice input from a microphone, and that recognizes the word corresponding to the uttered voice according to whether the calculated index satisfies a predetermined condition with respect to a threshold.

Vehicles are equipped with various electronic devices such as audio devices, air conditioners, and navigation devices. To avoid one-handed driving while these devices are being operated, systems that allow them to be operated by voice recognition are also provided. With this voice recognition technology, the driver can operate the various electronic devices without taking a hand off the steering wheel (that is, without manually operating a remote controller, an operation panel, or another operation unit).

A voice recognition device usually recognizes a specific word, a phrase, a simple command sentence, or the like uttered by the user (hereinafter collectively referred to as a "word") as an utterance command. The electronic device performs control according to the word (utterance command) recognized by the voice recognition device. In such a voice recognition device, an acoustic model that associates each recognition target word used as an utterance command with its voice pattern is registered in advance in a voice recognition dictionary. The feature amount calculated from the user's uttered voice is then compared with the feature amounts of the acoustic model to find the voice pattern with the highest similarity, and the word associated with that voice pattern is recognized as the word of the uttered voice.

A conventional voice recognition device enters a voice recognition mode when the user presses a dedicated utterance button, and then recognizes the user's uttered voice input from the microphone and executes a command. Devices that enter the voice recognition mode in response to a specific action such as clapping hands, instead of operation of the utterance button, are also known. Recently, voice recognition devices that require no trigger for voice recognition, such as operation of an utterance button or a specific action, have also been provided (hereinafter referred to as triggerless voice recognition devices).

A triggerless voice recognition device keeps the microphone on at all times, identifies the input voice, and determines whether it is a word corresponding to an utterance command. Specifically, a distance value is calculated as an index indicating the degree of closeness (similarity) between the voice pattern of each word registered in the voice recognition dictionary and the voice input from the microphone. When the calculated distance value is smaller than the threshold set for a word, the input voice is recognized as that word.

In a vehicle cabin, the sound input from the microphone contains, in addition to the uttered voice intended for voice recognition, various kinds of noise such as engine operating sound, road noise, audio sound, and conversation between passengers. In particular, a triggerless voice recognition device has no dedicated voice recognition mode, so sound that acts as noise is always being input to the microphone. Measures are therefore needed so that voice recognition can be performed correctly even in such an environment.

To raise the accuracy rate of voice recognition (that is, to suppress misrecognition), the threshold compared with the distance value must be set appropriately. In this regard, a voice recognition device is known that estimates the noise level in the vehicle cabin from driving parameters of the vehicle (engine speed, vehicle speed, blower fan strength of the in-vehicle air conditioner, output volume of the car stereo, and the like) and sets the voice recognition threshold according to the estimated noise level (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2001-75595

However, the technique described in Patent Document 1 merely varies the threshold according to the noise level and does not take the content of the noise into account. Even at the same noise level, the accuracy rate of voice recognition varies depending on the content of the noise. Because the technique of Patent Document 1 does not consider this, it has the problem that the threshold cannot be optimized.

The present invention has been made to solve this problem, and its object is to reduce the occurrence of misrecognition by enabling the threshold compared with the voice recognition index to be set more appropriately.

To solve the above problem, the present invention concerns a voice recognition device that calculates an index indicating the degree of similarity between the voice pattern of a word registered in a voice recognition dictionary and an uttered voice input from a microphone, and that recognizes the word corresponding to the uttered voice according to whether the calculated index satisfies a predetermined condition with respect to a threshold. In this device, the source type of the audio sound being played in the vehicle is determined, and the threshold is variably set according to the determined source type.

According to the present invention configured as described above, the threshold compared with the voice recognition index is set according to the source type of the audio sound that acts as noise when a recognition target word is recognized. The threshold can therefore be optimized according to the content of the noise, and the occurrence of misrecognition can be reduced.

FIG. 1 is a functional block diagram showing a configuration example of the voice recognition device according to the first embodiment.
FIG. 2 is a diagram showing an example of the table information referred to by the threshold setting unit.
FIG. 3 is a flowchart showing an operation example of the voice recognition device according to the first embodiment.
FIG. 4 is a flowchart showing an operation example of the voice recognition device according to the first embodiment.
FIG. 5 is a functional block diagram showing a configuration example of the voice recognition device according to the second embodiment.
FIG. 6 is a flowchart showing an operation example of the voice recognition device according to the second embodiment.
FIG. 7 is a functional block diagram showing a configuration example of the voice recognition device according to the third embodiment.
FIG. 8 is a flowchart showing an operation example of the voice recognition device according to the third embodiment.
FIG. 9 is a diagram showing another example of the table information referred to by the threshold setting unit.

(First Embodiment)
Hereinafter, a first embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a functional block diagram showing a configuration example of the voice recognition device 100 according to the first embodiment. The voice recognition device 100 of the present embodiment recognizes the user's uttered voice (a word such as a specific word, a phrase, or a simple command sentence) input from the microphone 200 as an utterance command, and executes the utterance command on the navigation device 300. Although the electronic device to be controlled is the navigation device 300 here, it may instead be the audio device 400, an air conditioner, or another electronic device.

As shown in FIG. 1, the voice recognition device 100 according to the first embodiment includes a recognition dictionary storage unit 11, a voice recognition unit 12, a confirmation unit 13, a source determination unit 14, and a threshold setting unit 15. Each of the functional blocks 12 to 15 can be implemented in hardware, in a DSP (Digital Signal Processor), or in software. When implemented in software, for example, the functional blocks 12 to 15 are actually built around the CPU, RAM, ROM, and the like of a computer, and are realized by running a program stored in a recording medium such as RAM, ROM, a hard disk, or a semiconductor memory.

The recognition dictionary storage unit 11 stores a voice recognition dictionary in which each recognition target word is associated with its voice pattern and in which a threshold to be compared with the voice recognition index is set. In the present embodiment, a distance value (for example, a value from 0 to 1000) indicating the degree of closeness (similarity) between the voice pattern of each word registered in the recognition dictionary storage unit 11 and the uttered voice input from the microphone 200 is used as an example of the index. A smaller distance value means a higher similarity. In the present embodiment, the threshold is variably set.

The voice recognition unit 12 calculates an index indicating the degree of similarity between the voice pattern of a word registered in the voice recognition dictionary of the recognition dictionary storage unit 11 and the uttered voice input from the microphone 200. When the calculated index satisfies a predetermined condition with respect to the threshold, the voice recognition unit 12 recognizes the uttered voice as the word that satisfies the condition. When a distance value is used as the index, as described above, the voice recognition unit 12 recognizes the uttered voice as a given word when the distance value calculated for the uttered voice is smaller than the threshold set for that word. Conversely, when an index whose value increases with similarity is used, the voice recognition unit 12 recognizes the uttered voice as a given word when the calculated index is larger than the threshold set for that word.
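The decision rule of the voice recognition unit 12 can be sketched as follows. This is only an illustration under stated assumptions: the function name `recognize`, its parameters, and the example words and values are hypothetical, the indices are assumed to have been computed elsewhere, and the choice of the closest match when several words satisfy the condition is an assumption that the text does not spell out.

```python
from typing import Dict, Optional


def recognize(indices: Dict[str, float],
              thresholds: Dict[str, float],
              smaller_is_closer: bool = True) -> Optional[str]:
    """Return the word whose index satisfies its per-word threshold, or None.

    indices:    word -> index computed for the current utterance (assumed given).
    thresholds: word -> threshold registered in the dictionary.
    smaller_is_closer: True for a distance value, False for an index that
                       grows with similarity.
    """
    best_word: Optional[str] = None
    best_index: Optional[float] = None
    for word, index in indices.items():
        threshold = thresholds[word]
        # Distance value: must fall below the threshold.
        # Similarity-style index: must exceed the threshold.
        satisfied = index < threshold if smaller_is_closer else index > threshold
        if not satisfied:
            continue
        # Assumption: among satisfying words, keep the closest match.
        if best_index is None:
            is_better = True
        elif smaller_is_closer:
            is_better = index < best_index
        else:
            is_better = index > best_index
        if is_better:
            best_word, best_index = word, index
    return best_word
```

With a distance-type index, a call such as `recognize({"go home": 120, "zoom in": 480}, {"go home": 300, "zoom in": 300})` selects "go home", because only that word falls below its threshold.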

The confirmation unit 13 presents the word recognized by the voice recognition unit 12 to the user so that the user can confirm whether it matches the word the user uttered. This confirmation is performed, for example, by talkback, in which the recognized word is output as synthesized speech. Alternatively, the recognized word may be displayed on a screen as text.

If the confirmation unit 13 presents a word different from the one the user uttered, the user can cancel the voice recognition by issuing a cancel instruction. The cancel instruction can be given by manual operation of a touch panel or by uttering the word "cancel". If the user does not issue a cancel instruction within a predetermined time after the confirmation unit 13 presents the recognition result, the confirmation unit 13 finalizes the word recognized by the voice recognition unit 12 and outputs it to the navigation device 300 as an utterance command.
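The confirmation behavior can be illustrated with a small sketch. The names `confirm_word`, `present`, and `cancelled_within` are hypothetical, and the 5-second default stands in for the "predetermined time", whose actual value the text does not give. How the word is presented (talkback or screen) and how a cancel instruction is detected are left to the injected callables.

```python
from typing import Callable, Optional


def confirm_word(word: str,
                 present: Callable[[str], None],
                 cancelled_within: Callable[[float], bool],
                 timeout_s: float = 5.0) -> Optional[str]:
    """Present a recognized word and finalize it unless the user cancels.

    present:          shows or speaks the word (talkback or on-screen text).
    cancelled_within: returns True if a cancel instruction arrived within
                      the given number of seconds.
    timeout_s:        hypothetical stand-in for the predetermined time.
    Returns the word to be output as an utterance command, or None if cancelled.
    """
    present(word)
    if cancelled_within(timeout_s):
        return None   # user cancelled: nothing is sent to the controlled device
    return word       # no cancel in time: the word is finalized
```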

The source determination unit 14 determines the source type of the audio sound being played by the audio device 400 in the vehicle. Audio sound acts as noise for voice recognition. Audio sound comes from a variety of sources, ranging from those likely to contain a lot of conversation (for example, news and drama audio) to those unlikely to contain much conversation (for example, music). The more conversation the audio contains, the more likely the voice recognition unit 12 is to misrecognize.

The source determination unit 14 therefore classifies source types according to how likely they are to contain a lot of conversation, and determines which of these categories the audio sound being played in the vehicle falls into. The source type of the audio sound being played by the audio device 400 can be determined by checking the source setting information of the audio device 400.

The threshold setting unit 15 variably sets the voice recognition threshold stored in the recognition dictionary storage unit 11 according to the source type determined by the source determination unit 14. FIG. 2 shows an example of table information that associates each source type category with a threshold adjustment value. The threshold setting unit 15 refers to this table information to variably set the voice recognition threshold.

In the example of FIG. 2, source types are divided into three categories according to how likely they are to contain a lot of conversation. The first category consists of music sources such as CD (Compact Disc), memory card, and USB (Universal Serial Bus) to which a portable sound source or the like is connected. The second category consists of video sources such as DVD (Digital Versatile Disk), HDMI (High-Definition Multimedia Interface), and AUX. The third category consists of news/drama sources such as DTV (Digital TeleVision) and Radio.
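The three-way classification described for FIG. 2 could be expressed as a simple lookup. This is a sketch only: the enum `SourceCategory`, the function `classify_source`, and the key strings are hypothetical, since the text does not say how the source setting information of the audio device 400 is encoded.

```python
from enum import Enum
from typing import Optional


class SourceCategory(Enum):
    MUSIC = 1        # first category: least likely to contain conversation
    VIDEO = 2        # second category
    NEWS_DRAMA = 3   # third category: most likely to contain conversation


# Hypothetical source-setting strings mapped to the categories named in the text.
SOURCE_TO_CATEGORY = {
    "CD": SourceCategory.MUSIC,
    "MEMORY_CARD": SourceCategory.MUSIC,
    "USB": SourceCategory.MUSIC,
    "DVD": SourceCategory.VIDEO,
    "HDMI": SourceCategory.VIDEO,
    "AUX": SourceCategory.VIDEO,
    "DTV": SourceCategory.NEWS_DRAMA,
    "RADIO": SourceCategory.NEWS_DRAMA,
}


def classify_source(source_setting: str) -> Optional[SourceCategory]:
    """Map a source setting string to its category, or None if unknown."""
    return SOURCE_TO_CATEGORY.get(source_setting.strip().upper())
```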

Here, the likelihood that the audio sound played by the audio device 400 contains a lot of conversation increases in the order first category < second category < third category. In the present embodiment, the table information is set so that the voice recognition threshold is lower for categories that are more likely to contain a lot of conversation. The smaller the threshold, the less likely the distance value calculated by the voice recognition unit 12 is to fall below it, so the occurrence of misrecognition can be reduced.

The numerical values shown in FIG. 2 are adjustments applied to a reference threshold. For a music source, 40 is added to the reference threshold. For a video source, the reference threshold is used as is. For a news/drama source, 20 is subtracted from the reference threshold.
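As a worked illustration of these adjustments, the following sketch applies the three values from the text (+40, 0, and -20) to a reference threshold. The category keys, the function name `adjusted_threshold`, and the reference value of 300 used in the comments are hypothetical.

```python
# Adjustment values taken from the text; the category keys are hypothetical labels.
THRESHOLD_ADJUSTMENT = {
    "music": 40,        # first category: threshold raised
    "video": 0,         # second category: reference threshold used as is
    "news_drama": -20,  # third category: threshold lowered
}


def adjusted_threshold(reference_threshold: int, category: str) -> int:
    """Return the threshold to use while the given source category is playing."""
    return reference_threshold + THRESHOLD_ADJUSTMENT[category]


# With a hypothetical reference threshold of 300 (distance scale 0 to 1000):
#   music      -> 340  (a distance of 330 is still accepted)
#   video      -> 300
#   news_drama -> 280  (a distance of 290 is now rejected)
```

Because the index here is a distance value, lowering the threshold for news/drama sources makes acceptance stricter exactly when background conversation is most likely.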

FIGS. 3 and 4 are flowcharts showing an operation example of the voice recognition device 100 according to the first embodiment. The flowchart of FIG. 3 starts when the voice recognition device 100 is powered on and runs continuously until it is powered off. The flowchart of FIG. 4 shows the specific processing of step S2 in FIG. 3. The operation example shown here is triggerless voice recognition, in which the microphone 200 is kept on at all times and the voice recognition unit 12 always performs voice recognition without any particular operation by the user.

In FIG. 3, the voice recognition unit 12 and the confirmation unit 13 first perform voice recognition processing (step S1). That is, the voice recognition unit 12 calculates a distance value indicating the degree of similarity between the voice pattern of each word registered in the voice recognition dictionary of the recognition dictionary storage unit 11 and the uttered voice input from the microphone 200, and recognizes a word whose calculated distance value is smaller than its threshold. The confirmation unit 13 then presents the recognized word to the user and, if no cancel instruction is given within a predetermined time, outputs the recognized word to the navigation device 300 as an utterance command.

Next, the source determination unit 14 and the threshold setting unit 15 execute threshold setting processing according to the source type (step S2). Specifically, in FIG. 4, the source determination unit 14 determines whether audio sound is being played by the audio device 400 (step S11). If no audio sound is being played, the processing of the flowchart of FIG. 4 ends.

If audio sound is being played, the source determination unit 14 determines the source type of that audio sound (step S12). The threshold setting unit 15 then refers to the table information shown in FIG. 2 and variably sets the voice recognition threshold stored in the recognition dictionary storage unit 11 according to the source type determined by the source determination unit 14 (step S13). The processing of the flowchart of FIG. 4 then ends.
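Steps S11 to S13 can be sketched as a single function. This is a minimal illustration, not the device's actual implementation: `AudioState`, `set_thresholds_for_source`, and the source strings are hypothetical, and leaving the thresholds unchanged for an unrecognized source is an assumption. As in the flowchart, nothing is changed when no audio is being played.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Adjustment per source, following the three categories and values in the text.
# The source strings themselves are hypothetical.
ADJUSTMENT_BY_SOURCE = {
    "CD": 40, "MEMORY_CARD": 40, "USB": 40,   # music sources
    "DVD": 0, "HDMI": 0, "AUX": 0,            # video sources
    "DTV": -20, "RADIO": -20,                 # news/drama sources
}


@dataclass
class AudioState:
    playing: bool
    source: Optional[str] = None


def set_thresholds_for_source(audio: AudioState,
                              reference: Dict[str, int],
                              current: Dict[str, int]) -> Dict[str, int]:
    """Return the per-word thresholds after one pass of steps S11 to S13."""
    # S11: if no audio is being played, the process ends without any change.
    if not audio.playing:
        return dict(current)
    # S12: determine the source type from the audio device's source setting.
    adjustment = ADJUSTMENT_BY_SOURCE.get(audio.source or "")
    if adjustment is None:
        return dict(current)  # assumption: unknown sources leave thresholds as is
    # S13: set each word's threshold to its reference value plus the adjustment.
    return {word: base + adjustment for word, base in reference.items()}
```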

As described in detail above, in the first embodiment, the source type of the audio sound being played in the vehicle is determined, and the voice recognition threshold is variably set according to the determined source type. According to the first embodiment configured in this way, the threshold compared with the distance value, which indicates the similarity between the voice pattern of a registered word and the uttered voice, is set according to the source type of the audio sound that acts as noise when a recognition target word is recognized. The threshold can therefore be optimized according to the content of the noise, and the occurrence of misrecognition can be reduced.

(Second Embodiment)
Next, a second embodiment of the present invention will be described with reference to the drawings. FIG. 5 is a functional block diagram showing a configuration example of the voice recognition device 100A according to the second embodiment. In FIG. 5, components given the same reference numerals as in FIG. 1 have the same functions, and duplicate description of them is omitted here.

As shown in FIG. 5, the voice recognition device 100A according to the second embodiment further includes a cancel count unit 16. The voice recognition device 100A also includes a threshold setting unit 15A in place of the threshold setting unit 15.

The cancel count unit 16 counts the number of times the user issues a cancel instruction within a predetermined time after the confirmation unit 13 presents to the user a word recognized by the voice recognition unit 12, that is, a word for which the distance value calculated for the uttered voice was smaller than the threshold (hereinafter referred to as the cancel count). The cancel count unit 16 stores this cancel count for each word.

The threshold setting unit 15A has the following function in addition to the functions of the threshold setting unit 15 described in the first embodiment. When the cancel count counted by the cancel count unit 16 reaches a predetermined number, the threshold setting unit 15A changes the threshold in the direction that makes it harder for the index calculated for an uttered voice to satisfy the predetermined condition.

When a distance value is used as the similarity index, the threshold setting unit 15A changes the threshold stored in the recognition dictionary storage unit 11 for a word whose cancel count has reached the predetermined number to a value smaller than the current one. For example, the threshold setting unit 15A sets, as the new threshold, the value obtained by subtracting a predetermined value from the current threshold. When an index whose value increases with similarity is used instead, the threshold setting unit 15A changes the threshold stored in the recognition dictionary storage unit 11 for such a word to a value larger than the current one by a predetermined value.
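The direction of the change depends on the polarity of the index, which the following sketch makes explicit. The function name `tighten_threshold` and the example step of 10 are hypothetical; the text only refers to "a predetermined value".

```python
def tighten_threshold(threshold: float, step: float,
                      smaller_is_closer: bool = True) -> float:
    """Move a threshold so that the recognition condition becomes harder to meet.

    Distance value (smaller is closer): recognition requires index < threshold,
    so the threshold is lowered.
    Similarity-style index (larger is closer): recognition requires
    index > threshold, so the threshold is raised.
    """
    return threshold - step if smaller_is_closer else threshold + step


# Example with a distance value and a hypothetical step of 10:
# a borderline distance of 295 passes a threshold of 300 (295 < 300),
# but no longer passes after tightening to 290 (295 < 290 is false).
```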

A cancel instruction from the user (for example, the driver) means that, although the driver did not utter a word corresponding to an utterance command, the voice recognition unit 12 recognized an utterance command word in the audio sound or in the conversation of other passengers, which triggered the confirmation operation of the confirmation unit 13. Moreover, the fact that this has happened a predetermined number of times suggests that the same situation may keep recurring. The driver would then have to issue a cancel instruction every time, which is bothersome. In the second embodiment, therefore, the threshold for a word whose cancel count has reached the predetermined number is reduced so that the voice recognition unit 12 is less likely to recognize that word.

FIG. 6 is a flowchart showing an operation example of the voice recognition device 100A according to the second embodiment. FIG. 6 shows a specific processing example of step S1 in FIG. 3.

First, the voice recognition unit 12 calculates a distance value indicating the degree of similarity between the voice pattern of each word registered in the voice recognition dictionary of the recognition dictionary storage unit 11 and the voice input from the microphone 200, and executes recognition processing for a word whose calculated distance value is smaller than its threshold (step S21). The confirmation unit 13 then determines whether the voice recognition unit 12 has detected, in the voice recognition dictionary, a word whose distance value is smaller than its threshold (step S22).

If it is determined that the voice recognition unit 12 has not detected a word whose distance value is smaller than its threshold, the processing of the flowchart of FIG. 6 ends and the process proceeds to step S2 of FIG. 3. If it is determined that the voice recognition unit 12 has detected such a word, the confirmation unit 13 presents the detected word and has the user confirm whether it matches the uttered voice (step S23).

The confirmation unit 13 then determines whether the user has issued a cancel instruction within a predetermined time (step S24). If a cancel instruction is issued within the predetermined time, the cancel count unit 16 increments the cancel count (step S25). The threshold setting unit 15A then determines whether the cancel count has reached the predetermined number (step S26).

If the cancel count has reached the predetermined number, the threshold setting unit 15A changes the threshold stored in the recognition dictionary storage unit 11 for the word concerned to a value smaller than the current one by a predetermined value (step S27). The processing of the flowchart of FIG. 6 then ends, and the process proceeds to step S2 of FIG. 3. If the cancel count has not yet reached the predetermined number, the processing of the flowchart of FIG. 6 ends without the threshold being changed, and the process proceeds to step S2 of FIG. 3.

If it is determined in step S24 that no cancel instruction was issued within the predetermined time, the cancellation count unit 16 clears the cancellation count to zero (step S28). The processing of the flowchart shown in FIG. 6 then ends, and the process proceeds to step S2 shown in FIG. 3. In this case, the confirmation unit 13 outputs the recognized word to the navigation device 300 as an utterance command.
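Steps S24 to S28 can be sketched for a single word as follows. This is a minimal illustration rather than the implementation described in the specification: the names `WordEntry`, `handle_confirmation`, `CANCEL_LIMIT`, and `THRESHOLD_STEP` and all numeric values are hypothetical. The text does not state whether the counter is reset after the threshold is changed, so the sketch leaves it untouched, which means further cancellations keep lowering the threshold.

```python
from dataclasses import dataclass

CANCEL_LIMIT = 3       # assumed "predetermined number" of cancellations
THRESHOLD_STEP = 5.0   # assumed "predetermined amount" of threshold change


@dataclass
class WordEntry:
    """Per-word state: distance threshold and cancellation counter."""
    threshold: float
    cancel_count: int = 0


def handle_confirmation(entry: WordEntry, cancelled: bool) -> bool:
    """Apply steps S24-S28; return True if the word is output as a command."""
    if cancelled:                                # S24: cancelled in time
        entry.cancel_count += 1                  # S25
        if entry.cancel_count >= CANCEL_LIMIT:   # S26
            # S27: a smaller distance threshold makes recognition stricter.
            entry.threshold -= THRESHOLD_STEP
        return False
    entry.cancel_count = 0                       # S28: no cancellation
    return True
```

With these assumed values, three consecutive cancellations of the same word lower its threshold from 100.0 to 95.0, and a single accepted recognition clears the counter.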

As described in detail above, according to the second embodiment, when the user's cancel operation for a word recognized by the voice recognition unit 12 has been repeated a predetermined number of times, the threshold value for that word is reduced so that the voice recognition unit 12 is less likely to recognize the word. This reduces the situations in which the voice recognition unit 12 recognizes a word against the user's intention, so that the user does not have to repeat the troublesome cancel operation many times.

(Third Embodiment)
Next, a third embodiment of the present invention will be described with reference to the drawings. FIG. 7 is a functional block diagram showing a configuration example of the voice recognition device 100B according to the third embodiment. In FIG. 7, components given the same reference numerals as in FIG. 5 have the same functions, and duplicate description is omitted here.

As shown in FIG. 7, the voice recognition device 100B according to the third embodiment further includes a proximity count unit 17. The voice recognition device 100B also includes a threshold setting unit 15B in place of the threshold setting unit 15A.

The proximity count unit 17 counts, for a word whose similarity index does not satisfy the predetermined condition with respect to the threshold value, the number of times the difference between the index and the threshold value is smaller than a predetermined value (hereinafter, the "proximity count"). When a distance value is used as the similarity index, the proximity count unit 17 counts as the proximity count the number of times a word is detected whose distance value is not smaller than the threshold value but whose difference from the threshold value is smaller than the predetermined value. The proximity count unit 17 stores this proximity count for each word.

If the difference between the distance value calculated for a word and the threshold value is smaller than the predetermined value, the distance value has not fallen below the threshold value and the word is not recognized, but the user is uttering something relatively similar, that is, close, to the registered word. For example, when the user utters a registered word corresponding to an utterance command but the distance value does not fall below the threshold value because of the state of the utterance (volume, intonation, speaking speed, and so on), the difference between the distance value and the threshold value is smaller than the predetermined value.

The threshold setting unit 15B has the following function in addition to the functions of the threshold setting unit 15A described in the second embodiment. When the proximity count counted by the proximity count unit 17 reaches a predetermined number, the threshold setting unit 15B changes the threshold value in a direction that makes the index calculated for the spoken voice more likely to satisfy the predetermined condition. In the third embodiment, this makes it easier for the voice recognition unit 12 to recognize the word.

When a distance value is used as the similarity index, the threshold setting unit 15B changes the threshold value stored in the recognition dictionary storage unit 11, for a word whose proximity count has reached the predetermined number, to a value larger than the current value by a predetermined amount. When an index whose value increases as the similarity increases is used instead, the threshold setting unit 15B changes the stored threshold value for such a word to a value smaller than the current value by a predetermined amount.
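The dependence of the adjustment direction on the type of index can be illustrated with a short sketch. The function names, the `smaller_is_closer` flag, and the numbers are hypothetical, and the recognition condition for a similarity-type index (index greater than the threshold) is an assumption made for illustration; the text only spells out the condition for a distance value.

```python
def is_recognized(index: float, threshold: float, smaller_is_closer: bool) -> bool:
    """Check the recognition condition for either kind of index."""
    if smaller_is_closer:
        return index < threshold   # distance value: must fall below threshold
    return index > threshold       # assumed condition for a similarity score


def relax_threshold(threshold: float, step: float, smaller_is_closer: bool) -> float:
    """Move the threshold so that the condition becomes easier to satisfy."""
    if smaller_is_closer:
        return threshold + step    # distance value: raise the threshold
    return threshold - step        # similarity score: lower the threshold
```

For a distance value of 102 against a threshold of 100, the word is not recognized; after relaxing the threshold by 5 the same value satisfies the condition. The mirror case holds for a similarity score of 0.78 against a threshold of 0.8 relaxed by 0.05.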

FIG. 8 is a flowchart showing an operation example of the voice recognition device 100B according to the third embodiment. FIG. 8 shows a specific processing example of step S1 in FIG. 3. In FIG. 8, steps given the same step numbers as in FIG. 6 perform the same processing, and duplicate description is omitted here.

If the confirmation unit 13 determines in step S22 of FIG. 8 that a word whose distance value is smaller than the threshold value has been detected in the voice recognition dictionary, the proximity count unit 17 clears the proximity count to zero (step S29). The process then proceeds to step S23.

If the confirmation unit 13 determines in step S22 that no word whose distance value is smaller than the threshold value has been detected in the voice recognition dictionary, the proximity count unit 17 increments the proximity count (step S31). The threshold setting unit 15B then determines whether the proximity count has reached a predetermined number (step S32).

If the proximity count has reached the predetermined number, the threshold setting unit 15B changes the threshold value stored in the recognition dictionary storage unit 11 for that word to a value larger than the current value by a predetermined amount (step S33). The processing of the flowchart shown in FIG. 8 then ends, and the process proceeds to step S2 shown in FIG. 3. If the proximity count has not yet reached the predetermined number, the processing of the flowchart shown in FIG. 8 ends without changing the threshold value, and the process proceeds to step S2 shown in FIG. 3.
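The proximity branch of FIG. 8 (steps S22, S29, and S31 to S33) can be sketched for one word as follows. This is an illustrative sketch under stated assumptions: `WordEntry`, `update_on_utterance`, and the three constants are hypothetical names and values. The sketch also assumes that the near-miss test from the description of the proximity count unit 17 (difference from the threshold smaller than a predetermined value) is applied before the counter is incremented; the flowchart text above does not spell out where that test occurs.

```python
from dataclasses import dataclass

PROXIMITY_MARGIN = 10.0  # assumed "predetermined value" for a near miss
PROXIMITY_LIMIT = 3      # assumed "predetermined number" of near misses
THRESHOLD_STEP = 5.0     # assumed "predetermined amount" of threshold change


@dataclass
class WordEntry:
    """Per-word state: distance threshold and proximity counter."""
    threshold: float
    proximity_count: int = 0


def update_on_utterance(entry: WordEntry, distance: float) -> bool:
    """Return True if the word is recognized (distance below threshold)."""
    if distance < entry.threshold:                       # S22: detected
        entry.proximity_count = 0                        # S29
        return True
    # Not recognized: count it only if it was a near miss (assumption).
    if distance - entry.threshold < PROXIMITY_MARGIN:
        entry.proximity_count += 1                       # S31
        if entry.proximity_count >= PROXIMITY_LIMIT:     # S32
            # S33: a larger distance threshold makes recognition easier.
            entry.threshold += THRESHOLD_STEP
    return False
```

With these assumed values, three near misses at distances 103, 104, and 102 against a threshold of 100 raise the threshold to 105, after which an utterance at distance 103 is recognized and the counter is cleared.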

As described in detail above, according to the third embodiment, when the situation in which the distance value calculated for a word does not fall below the threshold value but differs from it by less than the predetermined value has been repeated a predetermined number of times, the threshold value for that word is increased so that the voice recognition unit 12 is more likely to recognize the word. This eliminates the inconvenience of a word not being recognized, because of the state of the user's utterance (volume, intonation, speaking speed, and so on), even though the user is uttering a word corresponding to an utterance command.

In the first to third embodiments, examples in which the threshold value is variably set according to the source type of the audio have been described, but the present invention is not limited to this. For example, the threshold value may be variably set according to the combination of the source type and the volume of the audio. FIG. 9 shows an example of the table information referred to when the threshold value is variably set according to the combination of source type and volume. In the example of FIG. 9, because correct voice recognition becomes more difficult as the volume decreases, the table information is set so that the threshold value becomes larger as the volume becomes lower.
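A table lookup of this kind might look like the sketch below. The contents of FIG. 9 are not reproduced here: the source type labels, the volume levels, and every threshold value are hypothetical placeholders. The only property taken from the text is that, within a given source type, a lower volume maps to a larger threshold.

```python
# Hypothetical table: (source type, volume level) -> distance threshold.
# All keys and values are placeholders, not the values of FIG. 9.
THRESHOLD_TABLE = {
    ("talk_heavy", "low"): 90.0,
    ("talk_heavy", "mid"): 80.0,
    ("talk_heavy", "high"): 70.0,
    ("music_heavy", "low"): 110.0,
    ("music_heavy", "mid"): 100.0,
    ("music_heavy", "high"): 90.0,
}


def lookup_threshold(source_type: str, volume_level: str) -> float:
    """Return the threshold for a source type and volume level.

    Raises KeyError for a combination that is not in the table.
    """
    return THRESHOLD_TABLE[(source_type, volume_level)]
```

A per-word adjustment such as those of the second and third embodiments could then be applied on top of the value returned by the lookup, although how the two are combined is not specified in the text.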

In the third embodiment, a configuration including both the cancellation count unit 16 and the proximity count unit 17 has been described, but an embodiment including only the proximity count unit 17 is also possible.

The first to third embodiments are each merely examples of how the present invention may be embodied, and the technical scope of the present invention must not be interpreted restrictively on their basis. That is, the present invention can be implemented in various forms without departing from its gist or its main features.

11 Recognition dictionary storage unit
12 Voice recognition unit
13 Confirmation unit
14 Source determination unit
15, 15A, 15B Threshold setting unit
16 Cancellation count unit
17 Proximity count unit

Claims

A voice recognition device comprising: a recognition dictionary storage unit that stores a voice recognition dictionary in which each word to be recognized is associated with its voice pattern and in which a threshold value to be compared with a voice recognition index is set;
a voice recognition unit that calculates the index, which indicates the degree of similarity between the voice pattern of a word registered in the voice recognition dictionary and the spoken voice input from a microphone, and that, when the calculated index satisfies a predetermined condition with respect to the threshold value, recognizes the spoken voice as the word satisfying the predetermined condition;
a source determination unit that determines the source type of the audio being played in the vehicle;
a threshold setting unit that variably sets the threshold value according to the source type determined by the source determination unit; and
a cancellation count unit that counts the number of cancellations, which is the number of times the user has issued a cancel instruction within a predetermined time after a word whose index calculated for the spoken voice satisfies the predetermined condition with respect to the threshold value was presented to the user,
wherein, when the number of cancellations reaches a predetermined number, the threshold setting unit changes the threshold value in a direction that makes the index calculated for the spoken voice less likely to satisfy the predetermined condition.

A voice recognition device comprising: a recognition dictionary storage unit that stores a voice recognition dictionary in which each word to be recognized is associated with its voice pattern and in which a threshold value to be compared with a voice recognition index is set;
a voice recognition unit that calculates the index, which indicates the degree of similarity between the voice pattern of a word registered in the voice recognition dictionary and the spoken voice input from a microphone, and that, when the calculated index satisfies a predetermined condition with respect to the threshold value, recognizes the spoken voice as the word satisfying the predetermined condition;
a source determination unit that determines the source type of the audio being played in the vehicle;
a threshold setting unit that variably sets the threshold value according to the source type determined by the source determination unit; and
a proximity count unit that counts, for a word whose index does not satisfy the predetermined condition with respect to the threshold value, the number of proximity occurrences in which the difference between the index and the threshold value is smaller than a predetermined value,
wherein, when the number of proximity occurrences reaches a predetermined number, the threshold setting unit changes the threshold value in a direction that makes the index calculated for the spoken voice more likely to satisfy the predetermined condition.

The voice recognition device according to claim 1 or 2, wherein the source determination unit classifies the source types according to how likely they are to contain a large amount of conversation, and determines which of the classified source types the audio being played in the vehicle corresponds to.

A voice recognition method comprising: a first step in which a source determination unit of a voice recognition device determines the source type of the audio being played in a vehicle;
a second step in which a threshold setting unit of the voice recognition device variably sets a threshold value to be compared with a voice recognition index according to the source type determined by the source determination unit;
a third step in which a voice recognition unit of the voice recognition device, using the threshold value set by the threshold setting unit, calculates the index, which indicates the degree of similarity between the voice pattern of a word registered in a voice recognition dictionary and the spoken voice input from a microphone, and, when the calculated index satisfies a predetermined condition with respect to the threshold value, recognizes the spoken voice as the word satisfying the predetermined condition; and
a fourth step in which a cancellation count unit of the voice recognition device counts the number of cancellations, which is the number of times the user has issued a cancel instruction within a predetermined time after a word whose index calculated for the spoken voice satisfies the predetermined condition with respect to the threshold value was presented to the user,
wherein, in the second step, when the number of cancellations reaches a predetermined number, the threshold setting unit changes the threshold value in a direction that makes the index calculated for the spoken voice less likely to satisfy the predetermined condition.

A voice recognition method comprising: a first step in which a source determination unit of a voice recognition device determines the source type of the audio being played in a vehicle;
a second step in which a threshold setting unit of the voice recognition device variably sets a threshold value to be compared with a voice recognition index according to the source type determined by the source determination unit;
a third step in which a voice recognition unit of the voice recognition device, using the threshold value set by the threshold setting unit, calculates the index, which indicates the degree of similarity between the voice pattern of a word registered in a voice recognition dictionary and the spoken voice input from a microphone, and, when the calculated index satisfies a predetermined condition with respect to the threshold value, recognizes the spoken voice as the word satisfying the predetermined condition; and
a fourth step in which a proximity count unit of the voice recognition device counts, for a word whose index does not satisfy the predetermined condition with respect to the threshold value, the number of proximity occurrences in which the difference between the index and the threshold value is smaller than a predetermined value,
wherein, in the second step, when the number of proximity occurrences reaches a predetermined number, the threshold setting unit changes the threshold value in a direction that makes the index calculated for the spoken voice more likely to satisfy the predetermined condition.