JP2008309864A - Voice recognition device and voice recognition method

Info

Publication number: JP2008309864A
Application number: JP2007155212A
Authority: JP
Inventors: Shigefumi Kirino; 成史桐野
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 2007-06-12
Filing date: 2007-06-12
Publication date: 2008-12-25

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device and a voice recognition method, in which a speaker who speaks to an in-vehicle device for controlling the device is free from switch press operation, and which causes no malfunction by clearly recognizing whether or not the speaking is to the in-vehicle device.

SOLUTION: In the voice recognition device 10a, when a voice recognition processing result determination processing section 13b determines that an uttered vocabulary received by a voice recognition processing section 13a is included in a keyword dictionary 12a, the voice recognition processing section 13a converts it to a corresponding command by referring to a voice recognition dictionary 12b, and forwards the voice recognition result to a command conversion output processing section 13c for outputting to a car navigation device 20. When the voice recognition processing result determination processing section 13b determines that the uttered vocabulary received by the voice recognition processing section 13a is not included in the keyword dictionary 12a, the voice recognition processing section 13a does not forward the voice recognition result to the command conversion output processing section 13c.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a voice recognition device having voice recognition means for recognizing an utterance vocabulary spoken by a vehicle occupant and command conversion means for converting the utterance vocabulary recognized by the voice recognition means into a corresponding command and handing it over to an in-vehicle device, and to a voice recognition method performed by the voice recognition device. In particular, it relates to a voice recognition device and a voice recognition method that free a speaker who speaks to the in-vehicle device in order to control it from the bother of pressing a talk switch, and that clearly recognize whether or not the utterance is directed to the in-vehicle device so that no malfunction occurs.

In recent years, various techniques have been devised for recognizing a user's voice. If a user's voice can be recognized, the user can operate various devices by voice. For in-vehicle devices in particular, there is concern that manual operation by the driver affects driving, so practical voice operation technology is eagerly awaited.

At present, an in-vehicle device equipped with a voice recognition function is specified to recognize a specific command uttered after a talk switch has been pressed. Using the talk switch allows the in-vehicle device to recognize the specific command more accurately (see, for example, Patent Document 1).

Patent Document 1: JP-A-10-97281

However, in the conventional technique represented by Patent Document 1, pressing the talk switch is a burden on the speaker, and when the speaker is the driver in particular, it imposes a load other than the driving operation. For this reason, continuous voice recognition that does not use a talk switch is expected to become mainstream in the future.

However, even when a speaker in the vehicle speaks to something other than the in-vehicle device (for example, speaking to a fellow passenger or talking to oneself), the in-vehicle device may recognize the utterance as directed to itself and malfunction on the basis of that utterance.

The present invention has been made to solve the above problem, and an object thereof is to provide a voice recognition device and a voice recognition method that free a speaker who speaks to an in-vehicle device in order to control it from the bother of pressing a talk switch, and that clearly recognize whether or not the utterance is directed to the in-vehicle device so that no malfunction occurs.

To solve the above problem and achieve the object, the present invention is a voice recognition device having voice recognition means for recognizing an utterance vocabulary spoken by a vehicle occupant, and command conversion means for converting the utterance vocabulary recognized by the voice recognition means into a corresponding command and handing it over to an in-vehicle device. The voice recognition device further has voice recognition result determination means for determining whether or not the utterance vocabulary recognized by the voice recognition means is an utterance directed to the in-vehicle device, and the voice recognition means hands the recognized utterance vocabulary over to the command conversion means only when the voice recognition result determination means determines that the utterance vocabulary is an utterance directed to the in-vehicle device.

Further, in the above invention, when the voice recognition result determination means determines that the utterance vocabulary recognized by the voice recognition means is a specific vocabulary, the voice recognition means hands the utterance vocabulary recognized after the specific vocabulary over to the command conversion means.

Further, in the above invention, when the voice recognition result determination means determines that the utterance vocabulary recognized by the voice recognition means is a specific vocabulary, the voice recognition means hands the utterance vocabulary recognized before the specific vocabulary over to the command conversion means.

Further, in the above invention, when the voice recognition result determination means determines that the utterance vocabulary recognized by the voice recognition means is a first specific vocabulary, the voice recognition means starts handing the utterance vocabulary recognized after the first specific vocabulary over to the command conversion means. When the voice recognition result determination means determines that an utterance vocabulary recognized after the first specific vocabulary is a second specific vocabulary, the voice recognition means stops handing the utterance vocabulary recognized from the second specific vocabulary onward over to the command conversion means.
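For illustration only, the start and stop behavior of the first and second specific vocabularies might be sketched in Python as follows. The function name `gate_between_keywords`, the example words, and the treatment of each recognized utterance vocabulary as a plain string are assumptions made for this sketch and do not appear in the publication.

```python
from typing import Iterable, List


def gate_between_keywords(
    recognized: Iterable[str], start_word: str, stop_word: str
) -> List[str]:
    """Return the vocabulary that would be handed to command conversion.

    Hypothetical sketch: forwarding starts after `start_word` (the first
    specific vocabulary) and ends when `stop_word` (the second specific
    vocabulary) is recognized. Neither marker word is itself forwarded.
    """
    forwarding = False
    forwarded: List[str] = []
    for word in recognized:
        if not forwarding:
            # Outside a command span: only the start word has any effect.
            if word == start_word:
                forwarding = True
            continue
        if word == stop_word:
            # The second specific vocabulary closes the command span.
            forwarding = False
            continue
        forwarded.append(word)
    return forwarded
```

With the assumed words "start" and "stop", an input of `["hello", "start", "zoom in", "go home", "stop", "music"]` would forward only "zoom in" and "go home"; conversation before and after the span is ignored.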

Further, in the above invention, the voice recognition device further has buffering means for buffering a predetermined number of utterance vocabularies recognized by the voice recognition means, and vocabulary category storage means for storing each utterance vocabulary in association with the category to which it belongs. The voice recognition means hands the recognized utterance vocabulary over to the command conversion means only when the voice recognition result determination means determines, based on the categories of the utterance vocabularies buffered in the buffering means, that the recognized utterance vocabulary is an utterance directed to the in-vehicle device.

Further, in the above invention, when the voice recognition result determination means determines that the recognized utterance vocabulary is an utterance directed to the in-vehicle device because the appearance rate of a specific category among the utterance vocabularies buffered in the buffering means has reached a predetermined value or more, the voice recognition means hands the recognized utterance vocabulary over to the command conversion means.

Further, in the above invention, after the voice recognition result determination means has failed a predetermined number of consecutive times to determine, on the basis of the appearance rate of a specific category among the utterance vocabularies buffered in the buffering means reaching a predetermined value or more, that the recognized utterance vocabulary is an utterance directed to the in-vehicle device, the voice recognition means cancels handing the recognized utterance vocabulary over to the command conversion means.

Further, in the above invention, when the voice recognition result determination means determines that the recognized utterance vocabulary is an utterance directed to the in-vehicle device because a specific category has appeared a predetermined number of times in succession among the utterance vocabularies buffered in the buffering means, the voice recognition means hands the recognized utterance vocabulary over to the command conversion means.

Further, in the above invention, after the voice recognition result determination means has failed a predetermined number of consecutive times to determine, on the basis of a specific category appearing a predetermined number of times in succession among the utterance vocabularies buffered in the buffering means, that the recognized utterance vocabulary is an utterance directed to the in-vehicle device, the voice recognition means cancels handing the recognized utterance vocabulary over to the command conversion means.
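As a rough illustration of the appearance-rate variant, the sketch below buffers the categories of the most recent vocabularies and reports whether a target category reaches a threshold. The name `make_category_gate`, the example categories, and the choice to compute the rate against the full buffer size (rather than the number of items buffered so far) are assumptions of this sketch; the publication does not specify them. The cancellation after consecutive negative determinations is not modeled here.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional


def make_category_gate(
    categories: Dict[str, str],
    target_category: str,
    buffer_size: int,
    min_rate: float,
) -> Callable[[str], bool]:
    """Build a checker that decides, word by word, whether to forward.

    Hypothetical sketch: `categories` plays the role of the vocabulary
    category storage, and the deque plays the role of the buffering means.
    """
    buffer: Deque[Optional[str]] = deque(maxlen=buffer_size)

    def should_forward(word: str) -> bool:
        # Unknown words are buffered with category None.
        buffer.append(categories.get(word))
        hits = sum(1 for category in buffer if category == target_category)
        # Assumption: the rate is measured against the full buffer size.
        return hits / buffer_size >= min_rate

    return should_forward
```

With a buffer of four items and a threshold of 0.5, for example, the gate would open once two of the last four recognized vocabularies belong to the target category and close again when older matches fall out of the buffer.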

The present invention is also a voice recognition method including a voice recognition step of recognizing an utterance vocabulary spoken by a vehicle occupant, and a command conversion step of converting the utterance vocabulary recognized in the voice recognition step into a corresponding command and handing it over to an in-vehicle device. The method further includes a voice recognition result determination step of determining whether or not the utterance vocabulary recognized in the voice recognition step is an utterance directed to the in-vehicle device, and in the voice recognition step the recognized utterance vocabulary is handed over to the command conversion step only when the voice recognition result determination step determines that the utterance vocabulary is an utterance directed to the in-vehicle device.

According to the present invention, the voice recognition means hands the recognized utterance vocabulary over to the command conversion means only when the voice recognition result determination means determines that it is an utterance directed to the in-vehicle device. Because no talk switch is needed, the occupant is freed from the bother of pressing one, and voice recognition continuously distinguishes utterance vocabulary directed to the in-vehicle device from utterance vocabulary that is not, which prevents the in-vehicle device from malfunctioning due to a command based on erroneous voice recognition.

Further, according to the present invention, when the voice recognition result determination means determines that the utterance vocabulary recognized by the voice recognition means is a specific vocabulary, the utterance vocabulary recognized after the specific vocabulary is handed over to the command conversion means. The voice recognition means is thus made to recognize the start of an utterance clearly without requiring a talk switch to be pressed to begin an utterance for controlling the in-vehicle device, and the occupant is freed from the bother of pressing the talk switch.

Further, according to the present invention, when the voice recognition result determination means determines that the utterance vocabulary recognized by the voice recognition means is a specific vocabulary, the utterance vocabulary recognized after the specific vocabulary is handed over to the command conversion means. The voice recognition means is thus made to recognize clearly both the start and the end of an utterance for controlling the in-vehicle device without requiring a talk switch to be pressed to begin the utterance, and the occupant is freed from the bother of pressing the talk switch.

Further, according to the present invention, the voice recognition means hands the recognized utterance vocabulary over to the command conversion means only when the voice recognition result determination means determines, based on the categories of the utterance vocabularies buffered in the buffering means, that the recognized utterance vocabulary is an utterance directed to the in-vehicle device. Therefore, even with continuous voice recognition and without the occupant being conscious of starting an utterance for controlling the in-vehicle device, utterance vocabulary directed to the in-vehicle device is distinguished from utterance vocabulary that is not. This reduces the burden on the speaker and prevents the in-vehicle device from malfunctioning due to a command based on erroneous voice recognition.

Further, according to the present invention, when the voice recognition result determination means determines that the recognized utterance vocabulary is an utterance directed to the in-vehicle device because the appearance rate of a specific category among the utterance vocabularies buffered in the buffering means has reached a predetermined value or more, the voice recognition means hands the recognized utterance vocabulary over to the command conversion means. By recognizing that the utterance content shows a specific tendency, utterance vocabulary directed to the in-vehicle device is distinguished from utterance vocabulary that is not, even if the occupant is not conscious of speaking in order to control the in-vehicle device. This reduces the burden on the speaker and prevents the in-vehicle device from malfunctioning due to a command based on erroneous voice recognition.

Further, according to the present invention, after the voice recognition result determination means has failed a predetermined number of consecutive times to determine, on the basis of the appearance rate of a specific category among the buffered utterance vocabularies reaching a predetermined value or more, that the recognized utterance vocabulary is an utterance directed to the in-vehicle device, the voice recognition means cancels handing the recognized utterance vocabulary over to the command conversion means. By recognizing that the utterance content no longer shows the specific tendency, utterance vocabulary directed to the in-vehicle device is distinguished from utterance vocabulary that is not, even if the occupant is not conscious of ending the utterance for controlling the in-vehicle device. This reduces the burden on the speaker and prevents the in-vehicle device from malfunctioning due to a command based on erroneous voice recognition.

Further, according to the present invention, when the voice recognition result determination means determines that the recognized utterance vocabulary is an utterance directed to the in-vehicle device because a specific category has appeared a predetermined number of times in succession among the utterance vocabularies buffered in the buffering means, the recognized utterance vocabulary is handed over to the command conversion means. By recognizing that the utterance content strongly shows a specific tendency, even if only temporarily, utterance vocabulary directed to the in-vehicle device is distinguished from utterance vocabulary that is not, even if the occupant is not conscious of speaking in order to control the in-vehicle device. This reduces the burden on the speaker and prevents the in-vehicle device from malfunctioning due to a command based on erroneous voice recognition.

Further, according to the present invention, after the voice recognition result determination means has failed a predetermined number of consecutive times to determine, on the basis of a specific category appearing a predetermined number of times in succession among the buffered utterance vocabularies, that the recognized utterance vocabulary is an utterance directed to the in-vehicle device, the voice recognition means cancels handing the recognized utterance vocabulary over to the command conversion means. By recognizing that the utterance content no longer strongly shows the specific tendency, even temporarily, utterance vocabulary directed to the in-vehicle device is distinguished from utterance vocabulary that is not, even if the occupant is not conscious of ending the utterance for controlling the in-vehicle device. This reduces the burden on the speaker and prevents the in-vehicle device from malfunctioning due to a command based on erroneous voice recognition.

Embodiments of the voice recognition device and the voice recognition method according to the present invention are described in detail below with reference to the accompanying drawings.

Embodiment 1 of the present invention is described below with reference to FIGS. 1 to 3. In Embodiment 1, during continuous voice recognition of utterance vocabulary spoken by a vehicle occupant, when a specific preset keyword is recognized, the utterance vocabulary recognized immediately after the keyword is handed over to a predetermined command conversion section so that it can be converted into a command capable of controlling a car navigation device or the like.

First, the configuration of the voice recognition device according to Embodiment 1 is described. FIG. 1 is a functional block diagram showing the configuration of the voice recognition device according to Embodiment 1. As shown in the figure, in a vehicle 1, the voice recognition device 10a according to Embodiment 1 and a car navigation device 20, which is the target of control by control commands based on the recognized utterance content, are connected via a CAN (Controller Area Network) 2. In the following embodiments, a control command that is obtained by converting recognized utterance content and that controls an in-vehicle device such as the car navigation device 20 is simply called a "command".

The voice recognition device 10a has a display unit 11a, which is display means such as a display device having a predetermined display screen; a voice generation unit 11b, which is voice generation means such as a speaker device that emits sound; a storage unit 12, which is volatile or nonvolatile storage means; and a control unit 13. A microphone 14, which inputs externally detected voice data to the voice recognition device 10a, is connected to the voice recognition device 10a.

The storage unit 12 stores a keyword dictionary 12a and a voice recognition dictionary 12b, each held in the storage unit 12 as a predetermined table. The keyword dictionary 12a is a list of specific preset vocabulary. The voice recognition dictionary 12b is a list of the commands into which recognized utterance content is to be converted.

The control unit 13 governs overall control of the voice recognition device 10a. As the characteristic functional configuration particularly relevant to Embodiment 1, it has a voice recognition processing section 13a, a voice recognition processing result determination processing section 13b, and a command conversion output processing section 13c. Other functional components are omitted.

When the microphone 14 detects vocabulary spoken by an occupant of the vehicle 1, the voice recognition processing section 13a first accepts the detected utterance vocabulary and stores it temporarily. If the utterance vocabulary is included in the keyword dictionary 12a (the processing up to this point is called front-stage voice recognition), the section performs voice recognition processing on the vocabulary detected after that utterance vocabulary (recognition by this processing is called rear-stage voice recognition). The term "voice recognition" by itself refers to the case where the detected utterance vocabulary is determined to be included in the voice recognition dictionary 12b, whereas "voice recognition processing" is processing that attempts voice recognition.

The voice recognition processing result determination processing section 13b determines whether or not the utterance vocabulary accepted by the voice recognition processing section 13a is included in the keyword dictionary 12a. When the determination processing section 13b determines that the accepted utterance vocabulary is included in the keyword dictionary 12a, the voice recognition processing section 13a hands the voice recognition result over to the command conversion output processing section 13c. When the determination processing section 13b does not determine that the accepted utterance vocabulary is included in the keyword dictionary 12a, the voice recognition processing section 13a does not hand the voice recognition result over to the command conversion output processing section 13c.

The command conversion output processing section 13c converts the voice recognition result handed over from the voice recognition processing section 13a into the corresponding command by referring to the voice recognition dictionary 12b, and outputs the command to the car navigation device 20.
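To make the two tables concrete, the following Python sketch models the keyword dictionary 12a as a set and the voice recognition dictionary 12b as a mapping from vocabulary to command. The entries, the command strings, and the function name `convert_to_command` are invented for illustration; the publication does not list the actual dictionary contents or command formats.

```python
from typing import Dict, Optional, Set

# Hypothetical contents; the publication gives no concrete entries.
KEYWORD_DICTIONARY: Set[str] = {"navi"}
RECOGNITION_DICTIONARY: Dict[str, str] = {
    "zoom in": "CMD_ZOOM_IN",
    "go home": "CMD_ROUTE_HOME",
}


def convert_to_command(
    vocabulary: str, recognition_dictionary: Dict[str, str]
) -> Optional[str]:
    """Look up the command for a recognized vocabulary.

    Returns None when the vocabulary has no entry, which this sketch
    treats as "nothing to output to the in-vehicle device".
    """
    return recognition_dictionary.get(vocabulary)
```

In this sketch the lookup corresponds to the role of section 13c, while membership tests against `KEYWORD_DICTIONARY` would correspond to the determination made by section 13b.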

Next, voice recognition processing (part 1) of Embodiment 1 is described. This is the processing performed when the result of front-stage voice recognition is a keyword and the single vocabulary item recognized immediately after the keyword is converted into a command. FIG. 2 is a flowchart showing the voice recognition processing procedure (part 1) of Embodiment 1.

First, the voice recognition processing section 13a performs voice recognition processing on the utterance vocabulary input via the microphone 14 and outputs the result to the voice recognition processing result determination processing section 13b (step S101). Next, the determination processing section 13b determines whether or not a "command conversion flag" stored in a predetermined storage area is on (step S102). If the flag is determined to be on (Yes at step S102), the processing proceeds to step S103; if it is not (No at step S102), the processing proceeds to step S105.

In step S103, the voice recognition processing section 13a hands the voice recognition processing result for the input utterance vocabulary over to the command conversion output processing section 13c, based on the determination result from the determination processing section 13b. Next, the determination processing section 13b turns off the "command conversion flag" stored in the predetermined storage area (step S104).

In step S105, the determination processing section 13b refers to the keyword dictionary 12a and determines whether or not the voice recognition processing result input from the voice recognition processing section 13a is a keyword. If the result is determined to be a keyword (Yes at step S105), the processing proceeds to step S106; if it is not (No at step S105), the processing proceeds to step S107.

In step S106, the speech recognition processing result determination processing unit 13b turns on the "command conversion flag" stored in the predetermined storage area. In step S107, the speech recognition processing result determination processing unit 13b turns off the "command conversion flag" stored in the predetermined storage area. When either of these steps is complete, the process proceeds to step S108.

In step S108, the speech recognition processing unit 13a determines whether to end output of speech recognition results to the command conversion output processing unit 13c. If output is to end (Yes at step S108), speech recognition process (part 1) of the first embodiment ends; if output is not to end (No at step S108), the process returns to step S101.
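The flow of steps S101 to S108 can be sketched as a small flag-driven loop. This is a minimal illustration, not the patented implementation: the function name `gate_after_keyword`, the use of a Python set for the keyword dictionary 12a, and the `forward` callback standing in for the command conversion output processing unit 13c are all assumptions made for the example.

```python
from typing import Callable, Iterable, Set


def gate_after_keyword(words: Iterable[str],
                       keywords: Set[str],
                       forward: Callable[[str], None]) -> None:
    """Forward only the word recognized right after a keyword (steps S101-S108)."""
    command_conversion_flag = False            # the "command conversion flag"
    for word in words:                         # S101: one recognition result per pass
        if command_conversion_flag:            # S102: is the flag on?
            forward(word)                      # S103: hand over for command conversion
            command_conversion_flag = False    # S104: turn the flag off
        elif word in keywords:                 # S105: keyword dictionary lookup
            command_conversion_flag = True     # S106: turn the flag on
        else:
            command_conversion_flag = False    # S107: turn the flag off
    # S108: in this sketch the loop simply ends when the input is exhausted


sent = []
gate_after_keyword(["hello", "navi", "destination", "soba"], {"navi"}, sent.append)
print(sent)  # ['destination']
```

Only the single word following the hypothetical keyword "navi" reaches the callback; ordinary conversation before and after it is dropped.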

Next, speech recognition process (part 2) of the first embodiment is described. This process applies when the speech recognition result from the preceding stage is a keyword, in which case the single word recognized immediately before that keyword is converted into a command. FIG. 3 is a flowchart showing the speech recognition procedure (part 2) of the first embodiment.

First, the speech recognition processing unit 13a clears the "predetermined buffer" (step S111). This buffer is provided in volatile or non-volatile storage means.

Next, the speech recognition processing unit 13a performs speech recognition on the utterance vocabulary input via the microphone 14 and outputs the result to the speech recognition processing result determination processing unit 13b (step S112). The speech recognition processing unit 13a then determines whether the "predetermined buffer" is cleared (step S113). If the buffer is determined to be cleared (Yes at step S113), the process proceeds to step S117; if it is not (No at step S113), the process proceeds to step S114.

In step S114, the speech recognition processing result determination processing unit 13b refers to the keyword dictionary 12a and determines whether the speech recognition result input from the speech recognition processing unit 13a is a keyword. If the result is determined to be a keyword (Yes at step S114), the process proceeds to step S115; if it is not (No at step S114), the process proceeds to step S117.

In step S115, the speech recognition processing unit 13a, based on the determination result of the speech recognition processing result determination processing unit 13b, passes the speech recognition result for the input utterance vocabulary to the command conversion output processing unit 13c. The speech recognition processing result determination processing unit 13b then turns off the "command conversion flag" stored in the predetermined storage area (step S116).

Next, the speech recognition processing unit 13a stores the speech recognition result input in step S112 in the "predetermined buffer" (step S117). If a speech recognition result is already stored in the buffer at step S117, the old result is erased and the new result is stored. The speech recognition processing unit 13a then determines whether to end output of speech recognition results to the command conversion output processing unit 13c (step S118). If output is to end (Yes at step S118), speech recognition process (part 2) of the first embodiment ends; if output is not to end (No at step S118), the process returns to step S112.
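Steps S111 to S118 amount to keeping a one-word buffer and releasing its content when a keyword arrives. The sketch below is illustrative only. It assumes that the word handed over in step S115 is the buffered word (the one recognized immediately before the keyword), which is how the process is summarized in the text; the names `forward_word_before_keyword` and `forward` are hypothetical, and the flag handling of step S116 is omitted because it does not affect this flow.

```python
from typing import Callable, Iterable, Optional, Set


def forward_word_before_keyword(words: Iterable[str],
                                keywords: Set[str],
                                forward: Callable[[str], None]) -> None:
    """Forward the single word recognized just before each keyword (S111-S118)."""
    buffered: Optional[str] = None             # S111: clear the buffer
    for word in words:                         # S112: next recognition result
        # S113: skip the keyword check while the buffer is still cleared
        if buffered is not None and word in keywords:   # S114: keyword?
            forward(buffered)                  # S115: assumed to be the buffered word
        buffered = word                        # S117: overwrite the old result
    # S118: in this sketch the loop ends when the input is exhausted


sent = []
forward_word_before_keyword(["map", "navi", "soba", "tv", "navi"], {"navi"}, sent.append)
print(sent)  # ['map', 'tv']
```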

A second embodiment of the present invention is described below with reference to FIGS. 4 and 5. In the second embodiment, during continuous speech recognition of the vocabulary uttered by a vehicle occupant, once a preset specific start keyword is recognized, the utterance vocabulary recognized after that keyword is passed to a predetermined command conversion unit so that it can be converted into commands capable of controlling a car navigation device or the like. When a specific end keyword is recognized after the start keyword, passing of the recognized utterance vocabulary to the command conversion unit ends. Only the differences from the first embodiment are described below.

First, the configuration of the speech recognition device according to the second embodiment is described. FIG. 4 is a functional block diagram showing the configuration of the speech recognition device according to the second embodiment. Compared with the speech recognition device 10a of the first embodiment, the speech recognition device 10b of the second embodiment has a storage unit 12 that contains a start keyword dictionary 12c and an end keyword dictionary 12d in place of the keyword dictionary 12a. In all other respects the speech recognition device 10b of the second embodiment is identical to the speech recognition device 10a of the first embodiment, so further description is omitted.

The start keyword dictionary 12c and the end keyword dictionary 12d are stored in the storage unit 12 as predetermined tables. Each is a list of preset specific words. The start keyword dictionary 12c stores keywords that indicate the start of an utterance directed to an in-vehicle device such as the car navigation device 20, and the end keyword dictionary 12d stores keywords that indicate the end of such an utterance.

Next, the speech recognition process of the second embodiment is described. In this process, when the speech recognition result from the preceding stage is a start keyword, the words recognized after that keyword are converted into commands, and when an end keyword is subsequently recognized, command conversion of the recognized words ends. FIG. 5 is a flowchart showing the speech recognition procedure of the second embodiment.

First, the speech recognition processing unit 13a performs speech recognition on the utterance vocabulary input via the microphone 14 and outputs the result to the speech recognition processing result determination processing unit 13b (step S121). The speech recognition processing result determination processing unit 13b then determines whether the "command conversion flag" stored in a predetermined storage area is on (step S122). If the flag is determined to be on (Yes at step S122), the process proceeds to step S123; if it is not (No at step S122), the process proceeds to step S126.

In step S123, the speech recognition processing result determination processing unit 13b refers to the end keyword dictionary 12d and determines whether the speech recognition result input from the speech recognition processing unit 13a is an end keyword. If the result is determined to be an end keyword (Yes at step S123), the process proceeds to step S124; if it is not (No at step S123), the process proceeds to step S125.

In step S124, the speech recognition processing result determination processing unit 13b turns off the "command conversion flag" stored in the predetermined storage area. In step S125, the speech recognition processing unit 13a, based on the determination result of the speech recognition processing result determination processing unit 13b, passes the speech recognition result for the input utterance vocabulary to the command conversion output processing unit 13c.

In step S126, on the other hand, the speech recognition processing result determination processing unit 13b refers to the start keyword dictionary 12c and determines whether the speech recognition result input from the speech recognition processing unit 13a is a start keyword. If the result is determined to be a start keyword (Yes at step S126), the process proceeds to step S127; if it is not (No at step S126), the process proceeds to step S129.

In step S127, the speech recognition processing unit 13a, based on the determination result of the speech recognition processing result determination processing unit 13b, passes the speech recognition result for the input utterance vocabulary to the command conversion output processing unit 13c. The speech recognition processing result determination processing unit 13b then turns on the "command conversion flag" stored in the predetermined storage area (step S128). When this step is complete, the process proceeds to step S129.

In step S129, the speech recognition processing unit 13a determines whether to end output of speech recognition results to the command conversion output processing unit 13c. If output is to end (Yes at step S129), the speech recognition process of the second embodiment ends; if output is not to end (No at step S129), the process returns to step S121.
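The start/end keyword flow of steps S121 to S129 can be sketched in the same style. This is a hedged illustration: the function name `gate_between_keywords`, the Python sets standing in for dictionaries 12c and 12d, and the `forward` callback standing in for the command conversion output processing unit 13c are assumptions. Following step S127 as written, the start keyword itself is also handed over; how the command converter treats it is not specified in the text.

```python
from typing import Callable, Iterable, Set


def gate_between_keywords(words: Iterable[str],
                          start_keywords: Set[str],
                          end_keywords: Set[str],
                          forward: Callable[[str], None]) -> None:
    """Forward words from a start keyword until an end keyword (S121-S129)."""
    command_conversion_flag = False
    for word in words:                          # S121: next recognition result
        if command_conversion_flag:             # S122: is the flag on?
            if word in end_keywords:            # S123: end keyword?
                command_conversion_flag = False # S124: stop forwarding
            else:
                forward(word)                   # S125: hand over for conversion
        elif word in start_keywords:            # S126: start keyword?
            forward(word)                       # S127: handed over as described
            command_conversion_flag = True      # S128: start forwarding
    # S129: in this sketch the loop ends when the input is exhausted


sent = []
gate_between_keywords(["chat", "start", "map", "zoom", "end", "soba"],
                      {"start"}, {"end"}, sent.append)
print(sent)  # ['start', 'map', 'zoom']
```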

A third embodiment of the present invention is described below with reference to FIGS. 6 to 9. In the third embodiment, during continuous speech recognition of the vocabulary uttered by a vehicle occupant, when words of a specific category account for a predetermined proportion of the words recognized within a given period, or when words of a specific category are recognized a given number of times in succession, the utterance vocabulary recognized after the condition is satisfied is passed to a predetermined command conversion unit so that it can be converted into commands capable of controlling a car navigation device or the like. Only the differences from the first and second embodiments are described below.

First, the configuration of the speech recognition device according to the third embodiment is described. FIG. 6 is a functional block diagram showing the configuration of the speech recognition device according to the third embodiment. Compared with the speech recognition device 10a of the first embodiment, the speech recognition device 10c of the third embodiment has a storage unit 12 that contains a vocabulary category classification table 12e and a recognized vocabulary storage buffer 12f in place of the keyword dictionary 12a. In all other respects the speech recognition device 10c of the third embodiment is identical to the speech recognition device 10a of the first embodiment, so further description is omitted.

The vocabulary category classification table 12e is a table that stores each recognized utterance word in association with at least one category to which it belongs. In the example table shown in FIG. 7, the "word" "soba" is associated with "categories" such as "meal" and "Japanese food". The "word" "destination" is associated with "categories" such as "navigation" and "map". The "word" "xx TV station" is associated with "categories" such as "TV" and "audio".

In this way, for a word that the speech recognition processing result determination processing unit 13b determines to be contained in the vocabulary category classification table 12e, at least one category is obtained from that table.
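The lookup described above can be pictured as a simple mapping from a word to its list of categories. The sketch below uses only the three example rows given for FIG. 7; the names `VOCABULARY_CATEGORIES` and `categories_of`, and the choice to return an empty list for a word that is not in the table, are assumptions for illustration.

```python
from typing import Dict, List

# Example rows taken from the FIG. 7 description; a real table would be larger.
VOCABULARY_CATEGORIES: Dict[str, List[str]] = {
    "soba": ["meal", "Japanese food"],
    "destination": ["navigation", "map"],
    "xx TV station": ["TV", "audio"],
}


def categories_of(word: str) -> List[str]:
    """Return the categories of a recognized word, or [] if it is not in the table."""
    return VOCABULARY_CATEGORIES.get(word, [])


print(categories_of("destination"))  # ['navigation', 'map']
```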

The recognized vocabulary storage buffer 12f is a storage area that buffers a predetermined number (for example, 500) of the words recognized in succession by the speech recognition processing unit 13a. The words in the recognized vocabulary storage buffer 12f are managed on a first-in, first-out basis: when a newly recognized word would exceed the predetermined number, the oldest stored word is erased and the newly recognized word is stored.
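A bounded first-in, first-out buffer of this kind maps directly onto `collections.deque` with a `maxlen`. The following is a minimal sketch; the helper name `make_recognized_buffer` is hypothetical, and the small capacity is chosen only to keep the demonstration short (the text gives 500 as an example).

```python
from collections import deque
from typing import Deque


def make_recognized_buffer(capacity: int = 500) -> Deque[str]:
    """Create a FIFO buffer that silently drops the oldest word when full."""
    return deque(maxlen=capacity)


buffer_12f = make_recognized_buffer(capacity=3)
for word in ["soba", "destination", "map", "udon"]:
    buffer_12f.append(word)      # appending to a full deque evicts the oldest entry

print(list(buffer_12f))  # ['destination', 'map', 'udon']
```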

Next, speech recognition process (part 1) of the third embodiment is described. In this process, when it is determined that the proportion of words belonging to the same category among the recognized words buffered in the recognized vocabulary storage buffer 12f is equal to or greater than a predetermined threshold, the words recognized after that determination are converted into commands. FIG. 8 is a flowchart showing the speech recognition procedure (part 1) of the third embodiment.

First, the speech recognition processing unit 13a performs speech recognition on the utterance vocabulary input via the microphone 14 and outputs the result to the speech recognition processing result determination processing unit 13b (step S131). The speech recognition processing result determination processing unit 13b then refers to the vocabulary category classification table 12e and obtains the category of the word in the input speech recognition result (step S132).

Next, the speech recognition processing result determination processing unit 13b buffers the input speech recognition result in the recognized vocabulary storage buffer 12f, which holds a predetermined number of words (for example, 500) (step S133).

Next, the speech recognition processing result determination processing unit 13b determines whether the proportion of words belonging to the same category among the recognized words buffered in the recognized vocabulary storage buffer 12f is equal to or greater than a predetermined threshold (for example, 80%) (step S134). If the proportion is determined to be equal to or greater than the threshold (Yes at step S134), the process proceeds to step S135; if it is not (No at step S134), the process proceeds to step S137.
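The determination in step S134 can be sketched as a count of the most common category divided by the number of buffered words. The text does not say how a word with several categories is counted or what the denominator is, so this sketch makes two labeled assumptions: a word counts once toward each of its categories, and the denominator is the total number of buffered words. The function names and the example table are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def dominant_category_ratio(buffered_words: Iterable[str],
                            table: Dict[str, List[str]]) -> float:
    """Share of buffered words that belong to the single most common category."""
    words = list(buffered_words)
    if not words:
        return 0.0
    counts: Counter = Counter()
    for word in words:
        # Assumption: a word counts once for each category it belongs to.
        for category in set(table.get(word, [])):
            counts[category] += 1
    return max(counts.values(), default=0) / len(words)


def meets_threshold(buffered_words: Iterable[str],
                    table: Dict[str, List[str]],
                    threshold: float = 0.8) -> bool:
    """Step S134: is the dominant category's share at or above the threshold?"""
    return dominant_category_ratio(buffered_words, table) >= threshold


table = {"soba": ["meal", "Japanese food"], "udon": ["meal"], "map": ["navigation"]}
print(meets_threshold(["soba", "udon", "soba", "udon", "map"], table))  # True
```

In the usage line, four of the five buffered words fall under "meal", which gives a share of 0.8 and therefore meets the example threshold of 80%.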

In step S135, the speech recognition processing unit 13a, based on the determination result of the speech recognition processing result determination processing unit 13b, passes the speech recognition result for the input utterance vocabulary to the command conversion output processing unit 13c. The speech recognition processing result determination processing unit 13b then turns on the "command conversion flag" stored in the predetermined storage area (step S136). When this step is complete, the process proceeds to step S141.

In step S137, on the other hand, the speech recognition processing result determination processing unit 13b determines whether the "command conversion flag" stored in the predetermined storage area is on. If the flag is determined to be on (Yes at step S137), the process proceeds to step S138; if it is not (No at step S137), the process proceeds to step S139.

Next, the speech recognition processing result determination processing unit 13b determines whether the proportion of words belonging to the same category among the recognized words buffered in the recognized vocabulary storage buffer 12f has failed to reach the predetermined threshold a predetermined number of times (that is, whether No at step S134 has occurred a predetermined number of times in succession) (step S139). If No at step S134 has occurred the predetermined number of times in succession (Yes at step S139), the speech recognition processing result determination processing unit 13b turns off the "command conversion flag" stored in the predetermined storage area (step S140); if it has not (No at step S139), the process proceeds to step S141.

In step S141, the speech recognition processing unit 13a determines whether to end output of speech recognition results to the command conversion output processing unit 13c. If output is to end (Yes at step S141), the speech recognition process of the third embodiment ends; if output is not to end (No at step S141), the process returns to step S131.

Through the determination in step S134 described above, when a tendency in utterance content, as indicated by the category classification of the utterance vocabulary, appears to some degree within a certain period, the speech recognition result is passed to the command conversion processing unit and command conversion is performed. Through the determination in step S139, when that tendency no longer appears, passing of the speech recognition result to the command conversion processing unit is cancelled so that command conversion is not performed.
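The interplay of steps S134 and S139 behaves like a gate with hysteresis: the flag turns on as soon as the category condition holds and turns off only after the condition has failed a set number of times in a row. The sketch below illustrates just that flag logic. The class name `CategoryTrendGate`, its parameter name, and the decision to reset the miss counter whenever the condition holds are assumptions; step S138, which the text does not describe, is not modeled.

```python
class CategoryTrendGate:
    """Tracks the "command conversion flag" across repeated S134 determinations."""

    def __init__(self, max_consecutive_misses: int) -> None:
        self.max_consecutive_misses = max_consecutive_misses
        self.flag = False        # the "command conversion flag"
        self.misses = 0          # consecutive "No at step S134" results

    def update(self, condition_met: bool) -> bool:
        """Feed one S134 result and return the flag state afterwards."""
        if condition_met:                        # Yes at S134
            self.flag = True                     # S136: turn the flag on
            self.misses = 0                      # assumption: a hit resets the count
        else:                                    # No at S134
            self.misses += 1
            if self.misses >= self.max_consecutive_misses:   # Yes at S139
                self.flag = False                # S140: turn the flag off
        return self.flag


gate = CategoryTrendGate(max_consecutive_misses=2)
print([gate.update(hit) for hit in [True, False, False, True]])
# [True, True, False, True]
```

A single failed determination leaves the flag on, so a brief digression in the conversation does not immediately stop command conversion.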

As the determination condition of step S134, "whether the proportion of words belonging to the same category among the recognized words buffered in the recognized vocabulary storage buffer 12f is equal to or greater than a predetermined threshold" may be replaced with "whether words belonging to the same category occur a predetermined number of times in succession among the recognized words buffered in the recognized vocabulary storage buffer 12f". This replaces step S134 of FIG. 8 and is designated step S134a (see FIG. 9).
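The alternative condition of step S134a can be sketched as a scan for a run of consecutive words that share a category. The text does not state whether the run must be at the most recent end of the buffer, so this sketch assumes a run anywhere in the buffer satisfies the condition, and that a word with several categories can extend a run in any of them. The function name and example table are hypothetical.

```python
from typing import Dict, Iterable, List


def has_category_run(buffered_words: Iterable[str],
                     table: Dict[str, List[str]],
                     run_length: int) -> bool:
    """Step S134a: do run_length consecutive words share at least one category?"""
    runs: Dict[str, int] = {}    # category -> length of the run ending at this word
    for word in buffered_words:
        categories = set(table.get(word, []))
        # Categories absent from the current word have their runs broken.
        runs = {category: runs.get(category, 0) + 1 for category in categories}
        if any(length >= run_length for length in runs.values()):
            return True
    return False


table = {"soba": ["meal", "Japanese food"], "udon": ["meal"], "map": ["navigation"]}
print(has_category_run(["soba", "udon", "soba"], table, run_length=3))         # True
print(has_category_run(["soba", "map", "udon", "soba"], table, run_length=3))  # False
```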

Accordingly, the determination condition of step S139 in FIG. 8, "whether the proportion of words belonging to the same category among the recognized words buffered in the recognized vocabulary storage buffer 12f has failed to reach the predetermined threshold a predetermined number of times (that is, whether No at step S134 has occurred a predetermined number of times in succession)", is replaced with "whether words belonging to the same category among the recognized words buffered in the recognized vocabulary storage buffer 12f have failed to occur the predetermined number of times in succession a predetermined number of times (that is, whether No at step S134a has occurred a predetermined number of times in succession)". This replaces step S139 of FIG. 8 and is designated step S139a (see FIG. 9).

Through the determination in step S134a described above, when a temporary strong tendency appears in the utterance content, as indicated by the category classification of the utterance vocabulary, the speech recognition result is passed to the command conversion processing unit and command conversion is performed. Through the determination in step S139a, when that temporary strong tendency no longer appears, passing of the speech recognition result to the command conversion processing unit is cancelled so that command conversion is not performed.

Embodiments of the present invention have been described above, but the present invention is not limited to them and may be carried out in various other embodiments within the scope of the technical idea set forth in the claims. The effects described in the embodiments are likewise not limiting.

Of the processes described in the above embodiments, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above embodiments may be changed as desired unless otherwise specified.

The components of each illustrated device are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to what is illustrated; all or part of each device may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

Furthermore, all or any part of the processing functions performed by each device may be realized by a CPU (Central Processing Unit) (or a microcomputer such as an MPU (Micro Processing Unit) or MCU (Micro Controller Unit)) and a program analyzed and executed by that CPU (or MPU, MCU, or other microcomputer), or may be realized as hardware using wired logic.

The present invention is useful in a speech recognition device when it is desired to free a speaker who speaks to an in-vehicle device in order to control it from the trouble of pressing a talk switch, and to recognize clearly whether an utterance is directed to the in-vehicle device so that malfunctions do not occur.

FIG. 1 is a functional block diagram showing the configuration of the speech recognition device according to the first embodiment.
FIG. 2 is a flowchart showing the speech recognition procedure (part 1) of the first embodiment.
FIG. 3 is a flowchart showing the speech recognition procedure (part 2) of the first embodiment.
FIG. 4 is a functional block diagram showing the configuration of the speech recognition device according to the second embodiment.
FIG. 5 is a flowchart showing the speech recognition procedure of the second embodiment.
FIG. 6 is a functional block diagram showing the configuration of the speech recognition device according to the third embodiment.
FIG. 7 is a diagram showing an example of the vocabulary category classification table.
FIG. 8 is a flowchart showing the speech recognition procedure (part 1) of the third embodiment.
FIG. 9 is a flowchart showing the speech recognition procedure (part 2) of the third embodiment.

Explanation of symbols

1 Vehicle
10a Speech recognition device
10b Speech recognition device
10c Speech recognition device
11a Display unit
11b Speech generation unit
12 Storage unit
12a Keyword dictionary
12b Speech recognition dictionary
12c Start keyword dictionary
12d End keyword dictionary
12e Vocabulary category classification table
12f Recognized vocabulary storage buffer
13 Control unit
13a Speech recognition processing unit
13b Speech recognition processing result determination processing unit
13c Command conversion output processing unit
14 Microphone
20 Car navigation device

Claims

A speech recognition device comprising speech recognition means for recognizing an utterance vocabulary spoken by an occupant of a vehicle, and command conversion means for converting the utterance vocabulary recognized by the speech recognition means into a corresponding command and delivering the command to an in-vehicle device,
the speech recognition device further comprising speech recognition result determination means for determining whether or not the utterance vocabulary recognized by the speech recognition means is an utterance directed to the in-vehicle device,
wherein the speech recognition means passes the recognized utterance vocabulary to the command conversion means only when the speech recognition result determination means determines that the utterance vocabulary recognized by the speech recognition means is an utterance directed to the in-vehicle device.

The speech recognition device according to claim 1, wherein, when the speech recognition result determination means determines that the utterance vocabulary recognized by the speech recognition means is a specific vocabulary, the speech recognition means passes the utterance vocabulary recognized after the specific vocabulary to the command conversion means.

The speech recognition device according to claim 1, wherein, when the speech recognition result determination means determines that the utterance vocabulary recognized by the speech recognition means is a specific vocabulary, the speech recognition means passes the utterance vocabulary recognized before the specific vocabulary to the command conversion means.

The speech recognition device according to claim 1, wherein, when the speech recognition result determination means determines that the utterance vocabulary recognized by the speech recognition means is a first specific vocabulary, the speech recognition means starts passing the utterance vocabulary recognized after the first specific vocabulary to the command conversion means, and, when the speech recognition result determination means determines that an utterance vocabulary recognized by the speech recognition means after the first specific vocabulary is a second specific vocabulary, the speech recognition means stops passing the utterance vocabulary recognized after the second specific vocabulary to the command conversion means.
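The start and end keyword behavior recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `KeywordGate`, the keywords "navi" and "done", and the command identifiers are all hypothetical, and the three collections merely stand in for the start keyword dictionary 12c, the end keyword dictionary 12d, and the speech recognition dictionary 12b.

```python
from typing import Callable, Dict, Iterable


class KeywordGate:
    """Forward recognized words as commands only between a start and an end keyword."""

    def __init__(self, start_keywords: Iterable[str], end_keywords: Iterable[str],
                 command_table: Dict[str, str], send: Callable[[str], None]) -> None:
        self.start_keywords = set(start_keywords)  # stands in for dictionary 12c
        self.end_keywords = set(end_keywords)      # stands in for dictionary 12d
        self.command_table = dict(command_table)   # stands in for dictionary 12b
        self.send = send                           # delivery to the in-vehicle device
        self.open = False                          # True while forwarding is enabled

    def on_recognized(self, word: str) -> None:
        if not self.open:
            # Ordinary conversation is ignored until a start keyword is heard.
            if word in self.start_keywords:
                self.open = True
            return
        if word in self.end_keywords:
            # The end keyword closes the gate and is not itself forwarded.
            self.open = False
            return
        command = self.command_table.get(word)
        if command is not None:
            self.send(command)


# Hypothetical usage: only words spoken between "navi" and "done" become commands.
sent = []
gate = KeywordGate({"navi"}, {"done"},
                   {"zoom in": "CMD_ZOOM_IN", "go home": "CMD_ROUTE_HOME"},
                   sent.append)
for word in ["zoom in", "navi", "zoom in", "go home", "done", "zoom in"]:
    gate.on_recognized(word)
```

In this sketch the first and last "zoom in" are discarded because the gate is closed when they are recognized, which is the property that replaces the talk switch.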

The speech recognition device according to claim 1, further comprising:
buffering means for buffering a predetermined number of utterance vocabularies recognized by the speech recognition means; and
vocabulary category storage means for storing each utterance vocabulary in association with the category to which the utterance vocabulary belongs,
wherein the speech recognition means passes the recognized utterance vocabulary to the command conversion means only when the speech recognition result determination means determines, based on the categories of the utterance vocabularies buffered in the buffering means, that the recognized utterance vocabulary is an utterance directed to the in-vehicle device.

The speech recognition device according to claim 5, wherein the speech recognition means passes the recognized utterance vocabulary to the command conversion means when the speech recognition result determination means determines that the recognized utterance vocabulary is an utterance directed to the in-vehicle device on the ground that the appearance rate of a specific category among the utterance vocabularies buffered in the buffering means is equal to or greater than a predetermined value.

The speech recognition device according to claim 6, wherein the speech recognition means stops passing the recognized utterance vocabulary to the command conversion means after a predetermined number of consecutive occasions on which the speech recognition result determination means does not determine that the recognized utterance vocabulary is an utterance directed to the in-vehicle device on the ground that the appearance rate of the specific category among the utterance vocabularies buffered in the buffering means is equal to or greater than the predetermined value.
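The appearance-rate determination and its cancellation can be sketched as follows. This is an illustrative reading only: the class name `CategoryRateGate`, the category labels, the buffer size of 4, the threshold of 0.5, and the miss limit of 2 are assumptions chosen for the example, with the deque playing the role of the recognized vocabulary storage buffer 12f and the dictionary playing the role of the vocabulary category classification table 12e.

```python
from collections import deque


class CategoryRateGate:
    """Enable forwarding while a target category is frequent enough in the buffer."""

    def __init__(self, categories, target, buffer_size=4, min_rate=0.5, max_misses=2):
        self.categories = categories             # word -> category (assumed table)
        self.target = target                     # category treated as device-directed
        self.buffer = deque(maxlen=buffer_size)  # most recent recognized categories
        self.min_rate = min_rate                 # assumed "predetermined value"
        self.max_misses = max_misses             # assumed "predetermined number"
        self.misses = 0
        self.forwarding = False

    def on_recognized(self, word):
        """Return True if the word should be passed on to command conversion."""
        self.buffer.append(self.categories.get(word, "other"))
        rate = sum(c == self.target for c in self.buffer) / len(self.buffer)
        if rate >= self.min_rate:
            self.forwarding = True
            self.misses = 0
        else:
            # Forwarding is cancelled only after several consecutive misses.
            self.misses += 1
            if self.misses >= self.max_misses:
                self.forwarding = False
        return self.forwarding


# Hypothetical category table and utterance sequence.
table = {"zoom in": "navigation", "go home": "navigation",
         "hungry": "chat", "yesterday": "chat", "weather": "chat"}
rate_gate = CategoryRateGate(table, "navigation")
words = ["hungry", "zoom in", "go home", "yesterday", "weather", "hungry", "yesterday"]
decisions = [rate_gate.on_recognized(w) for w in words]
```

Requiring several consecutive misses before cancelling keeps a single off-topic word from closing the gate in the middle of a command sequence.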

The speech recognition device according to claim 5, wherein the speech recognition means passes the recognized utterance vocabulary to the command conversion means when the speech recognition result determination means determines that the recognized utterance vocabulary is an utterance directed to the in-vehicle device on the ground that a specific category appears a predetermined number of times in succession among the utterance vocabularies buffered in the buffering means.

The speech recognition device according to claim 8, wherein the speech recognition means stops passing the recognized utterance vocabulary to the command conversion means after a predetermined number of consecutive occasions on which the speech recognition result determination means does not determine that the recognized utterance vocabulary is an utterance directed to the in-vehicle device on the ground that the specific category appears the predetermined number of times in succession among the utterance vocabularies buffered in the buffering means.
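The consecutive-category variant differs from the appearance-rate variant only in its test, as the following minimal sketch suggests. The class name `CategoryRunGate`, the run length of 2, the miss limit of 2, and the category table are hypothetical values chosen for illustration, not figures taken from the specification.

```python
from collections import deque


class CategoryRunGate:
    """Enable forwarding when a target category occurs several times in a row."""

    def __init__(self, categories, target, run_length=2, max_misses=2):
        self.categories = categories            # word -> category (assumed table)
        self.target = target                    # category treated as device-directed
        self.recent = deque(maxlen=run_length)  # last run_length categories only
        self.max_misses = max_misses
        self.misses = 0
        self.forwarding = False

    def on_recognized(self, word):
        """Return True if the word should be passed on to command conversion."""
        self.recent.append(self.categories.get(word, "other"))
        run_found = (len(self.recent) == self.recent.maxlen
                     and all(c == self.target for c in self.recent))
        if run_found:
            self.forwarding = True
            self.misses = 0
        else:
            # Cancel forwarding after max_misses consecutive failed checks.
            self.misses += 1
            if self.misses >= self.max_misses:
                self.forwarding = False
        return self.forwarding


# Hypothetical category table and utterance sequence.
table = {"zoom in": "navigation", "go home": "navigation",
         "hungry": "chat", "weather": "chat"}
run_gate = CategoryRunGate(table, "navigation")
words = ["zoom in", "hungry", "zoom in", "go home", "hungry", "weather"]
decisions = [run_gate.on_recognized(w) for w in words]
```

Here an isolated "zoom in" in the middle of conversation does not open the gate; two navigation words in a row are needed.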

A speech recognition method including a speech recognition step of recognizing an utterance vocabulary spoken by an occupant of a vehicle, and a command conversion step of converting the utterance vocabulary recognized in the speech recognition step into a corresponding command and delivering the command to an in-vehicle device,
the speech recognition method further including a speech recognition result determination step of determining whether or not the utterance vocabulary recognized in the speech recognition step is an utterance directed to the in-vehicle device,
wherein, in the speech recognition step, the recognized utterance vocabulary is passed to the command conversion step only when it is determined in the speech recognition result determination step that the utterance vocabulary recognized in the speech recognition step is an utterance directed to the in-vehicle device.