JP2006071792A

JP2006071792A - Speech recognition device for vehicle

Info

Publication number: JP2006071792A
Application number: JP2004252785A
Authority: JP
Inventors: Shinichi Satomi; 真一里見
Original assignee: Fuji Heavy Industries Ltd
Current assignee: Subaru Corp
Priority date: 2004-08-31
Filing date: 2004-08-31
Publication date: 2006-03-16

Abstract

PROBLEM TO BE SOLVED: To precisely recognize an uttered speech without requiring the user to perform stepwise input operations.

SOLUTION: A dictionary retrieval part 12 searches a word dictionary, in which corresponding words are set in advance, for the speech extracted by a speech extraction part 11 and selects words as recognition candidates. A Mahalanobis distance arithmetic part 13 reads the vehicle information set for each recognition candidate, reads the current vehicle information, and calculates the Mahalanobis distance between the current vehicle information and the group of each recognition candidate. An uttered phrase determination output part 14 determines the group whose Mahalanobis distance is smallest as the group to which the current vehicle information belongs, takes the corresponding candidate as the recognition result for the driver's uttered phrase, and outputs it.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a vehicle speech recognition device that accurately recognizes speech uttered in the passenger compartment.

In recent years, various in-vehicle systems have been developed that, for the driver's convenience, do away with troublesome switch inputs and instead sense the driver's voice to operate the corresponding on-board device.

For example, JP 2000-20090 A discloses a vehicle speech recognition device that performs speech recognition by searching a word dictionary, in which a plurality of words are stored in advance, for the word spoken by the user. The device first estimates at least one request of the user as a primary request, estimates the user's current and future states from that primary request, and then estimates other requests from the estimated states.

Patent Document 1: JP 2000-20090 A

However, with the speech recognition device disclosed in Patent Document 1, the user must input personal information in addition to the utterance used to estimate the primary request, so setup is time-consuming and troublesome. Moreover, even if the user's request is estimated and the search range of the word dictionary is narrowed or its order is changed, the result is ultimately bound by the ranking of words set in the word dictionary, which limits the recognition accuracy that can be achieved. That is, the ranking in a word dictionary is often influenced by factors such as how frequently each word has been used up to the previous time. Even when the user's present situation is completely different, the ranking still reflects the earlier situations, and misrecognition may result.

The present invention has been made in view of the above circumstances, and its object is to provide a vehicle speech recognition device that can recognize a user's utterance phrase easily and accurately without requiring the user to perform stepwise input operations.

The present invention is characterized by comprising: speech input means for inputting speech; own-vehicle information input means for inputting own-vehicle information; word dictionary search means for searching a preset word dictionary for the input speech and selecting recognition candidates; Mahalanobis distance calculation means for, when a plurality of recognition candidates exist, reading the information group set in advance for each recognition candidate and calculating the Mahalanobis distance between the read information group and the current own-vehicle information; and utterance phrase determination means for determining the utterance phrase from among the recognition candidates based on the Mahalanobis distance calculated for each recognition candidate.

According to the vehicle speech recognition device of the present invention, a user's utterance phrase can be recognized easily and accurately without requiring the user to perform stepwise input operations.

Embodiments of the present invention are described below with reference to the drawings.
FIGS. 1 to 6 show an embodiment of the present invention. FIG. 1 is a functional block diagram of the vehicle speech recognition device. FIG. 2 is a flowchart of the speech recognition program. FIG. 3 is an explanatory diagram of a table of the information set for each utterance phrase and the values used to calculate the Mahalanobis distance. FIG. 4 is an explanatory diagram of the corresponding table with "atsui" ("thick") and "Atsugi" as example utterance phrases. FIG. 5 is an explanatory diagram of the own-vehicle position information in FIG. 4. FIG. 6 is an explanatory diagram showing the distribution of the own-vehicle driving information and own-vehicle position information used to obtain the Mahalanobis distance in FIG. 4.

In FIG. 1, reference numeral 1 denotes a vehicle speech recognition device. The speech recognition device 1 is connected to an in-vehicle CAN communication network 2 (communication conforming to Controller Area Network, an ISO standard) built into the vehicle, which serves as own-vehicle information input means. Various on-board control units (for example, an engine control unit, a transmission control unit, and a brake control unit) are linked to the in-vehicle CAN communication network 2, so that signals from sensors and switches provided in the vehicle, such as a cabin temperature sensor, a vehicle speed sensor, a wiper switch, and a brake switch, as well as data calculated by each control unit, can be shared.

A navigation device 3, which likewise serves as own-vehicle information input means, is also connected to the speech recognition device 1. The navigation device 3 comprises a GPS receiver, wheel speed sensors for the four wheels, and other necessary sensors and switches. It combines the vehicle travel information obtained from these with map information recorded on a CD-ROM or the like while performing calculations such as map matching. It then displays the vehicle's current position and a map of the surrounding area, the optimum route to the destination, the distance, the heading, and so on on a liquid crystal display, and provides spoken route guidance from a speaker.

Furthermore, a microphone 4 is connected to the speech recognition device 1 as speech input means for capturing speech from the driver (user).

As described later, the speech recognition device 1 recognizes the utterance phrase by performing recognition processing on the driver's speech input from the microphone 4, based on information from the in-vehicle CAN communication network 2 and the navigation device 3. It outputs the recognition result to, for example, the input system of the navigation device 3 or the input system of the in-vehicle air conditioner, to change the settings of these devices as required.

Specifically, the speech recognition device 1 mainly comprises a speech extraction unit 11, a dictionary search unit 12, a Mahalanobis distance calculation unit 13, and an utterance phrase determination output unit 14.

The speech extraction unit 11 removes noise and the like from the input of the microphone 4, extracts only the speech, and outputs it to the dictionary search unit 12.

The dictionary search unit 12 is provided as word dictionary search means. For the speech input from the speech extraction unit 11, it searches a word dictionary in which corresponding words are set in advance, selects the matching words as recognition candidates, and outputs them to the Mahalanobis distance calculation unit 13. The dictionary search unit 12 retrieves as recognition candidates not only homophones such as "atsui" ("thick") and "atsui" ("hot"), but also words such as "atsui" ("hot") and "Atsugi" that become confusable when the end of the word is unclear. It ranks all of these candidates according to, for example, their past frequency of use and outputs them.

The recognition result corresponding to the driver's utterance phrase V is also input to the dictionary search unit 12 from the utterance phrase determination output unit 14, and the order of recognition candidates in the preset word dictionary is updated by learning.
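The ranking and learning update described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class `WordDictionary`, the phonetic key `"atsu"`, and the use of a simple usage counter are all assumptions introduced here, since the source only states that candidates are ranked by factors such as past frequency of use.

```python
from collections import Counter


class WordDictionary:
    """Hypothetical word dictionary keyed by a coarse phonetic key (an assumption)."""

    def __init__(self, entries):
        # entries: mapping from phonetic key to the list of words stored for it
        self._entries = {key: list(words) for key, words in entries.items()}
        self._usage = Counter()

    def search(self, phonetic_key):
        """Return every candidate for the key, most frequently confirmed first."""
        words = self._entries.get(phonetic_key, [])
        # sorted() is stable, so equally used words keep their dictionary order.
        return sorted(words, key=lambda word: -self._usage[word])

    def learn(self, confirmed_word):
        """Feed the final recognition result back to update the ranking."""
        self._usage[confirmed_word] += 1


dictionary = WordDictionary({"atsu": ["atsui (hot)", "atsui (thick)", "Atsugi"]})
dictionary.learn("Atsugi")          # result fed back from the determination unit
ranked = dictionary.search("atsu")  # "Atsugi" now ranks first
```

All candidates are always returned; the ranking only decides the order, which matters later when two candidates tie.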

The Mahalanobis distance calculation unit 13 receives the recognition candidates from the dictionary search unit 12 and own-vehicle driving information, such as cabin temperature and vehicle speed, from the in-vehicle CAN communication network 2. From the navigation device 3 it receives own-vehicle position information, such as the angle between the direction of a recognition candidate and the vehicle's direction of travel, and destination information, such as the distance from the vehicle to a recognition candidate.

The processing performed by the Mahalanobis distance calculation unit 13 is described with reference to the table in FIG. 3.
Let V be the driver's actual utterance phrase, and let A, B, and C be the recognition candidates retrieved by the dictionary search unit 12 for V. A plurality of vehicle information items (Ai, Bi, Ci: i = 1, 2, ...) is set in advance for the recognition candidates A, B, and C. Each vehicle information item consists of own-vehicle driving information Pi (i = 1, 2, ...) from the in-vehicle CAN communication network 2, own-vehicle position information Qi (i = 1, 2, ...) from the navigation device 3, and destination information Ri (i = 1, 2, ...). That is, Ai = (P1, P2, ..., Q1, Q2, ..., R1, R2, ...), Bi = (P1, P2, ..., Q1, Q2, ..., R1, R2, ...), and Ci = (P1, P2, ..., Q1, Q2, ..., R1, R2, ...).

The vehicle information items of each candidate form a group: group GA consists of A1, A2, ..., group GB consists of B1, B2, ..., and group GC consists of C1, C2, ....
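One way to picture this layout is a record per vehicle information item and a list of records per candidate. The sketch below is only an illustration: the class name `VehicleInfo`, its field names, and the sample values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VehicleInfo:
    """One vehicle information item (an Ai, Bi, or Ci); names are hypothetical."""

    driving: Tuple[float, ...]           # P1, P2, ... from the in-vehicle CAN network
    position: Tuple[float, ...]          # Q1, Q2, ... from the navigation device
    destination: Tuple[float, ...] = ()  # R1, R2, ... (may be unused)

    def as_vector(self) -> Tuple[float, ...]:
        # Concatenate in the order (P..., Q..., R...) used in the text.
        return self.driving + self.position + self.destination


# Groups GA and GB: each candidate maps to its preset items (illustrative values).
groups = {
    "A": [VehicleInfo((30.0,), (80.0,)), VehicleInfo((32.0,), (100.0,))],
    "B": [VehicleInfo((22.0,), (5.0,)), VehicleInfo((24.0,), (15.0,))],
}
vectors_a = [item.as_vector() for item in groups["A"]]
```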

The mean MA is set for each information item in group GA: MA = (Pav1, Pav2, ..., Qav1, Qav2, ..., Rav1, Rav2, ...). Here, Pav1 and Pav2 are the means of the own-vehicle driving information P1 and P2, Qav1 and Qav2 are the means of the own-vehicle position information Q1 and Q2, and Rav1 and Rav2 are the means of the destination information R1 and R2, all taken over the vehicle information items Ai.

Similarly, the mean MB is set for each information item in group GB: MB = (Pav1, Pav2, ..., Qav1, Qav2, ..., Rav1, Rav2, ...), where each component is the mean of the corresponding item P1, P2, Q1, Q2, R1, R2 taken over the vehicle information items Bi.

Likewise, the mean MC is set for each information item in group GC: MC = (Pav1, Pav2, ..., Qav1, Qav2, ..., Rav1, Rav2, ...), where each component is the mean of the corresponding item P1, P2, Q1, Q2, R1, R2 taken over the vehicle information items Ci.
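As a small sketch of how such a mean vector could be computed, the function below averages each component over the items of one group. The function name `mean_vector` and the sample values (cabin temperature in degrees Celsius, angle in degrees) are assumptions for illustration only.

```python
def mean_vector(group):
    """Component-wise mean of a group of equal-length vehicle information vectors."""
    count = len(group)
    dims = len(group[0])
    return [sum(sample[k] for sample in group) / count for k in range(dims)]


# Hypothetical group GA with vectors (P1, Q1) = (cabin temperature, angle).
group_a = [(30.0, 80.0), (32.0, 100.0), (28.0, 120.0), (30.0, 140.0)]
mean_a = mean_vector(group_a)  # corresponds to MA = (Pav1, Qav1)
```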

The components of each variance-covariance matrix are the variances and covariances calculated from the components of the vehicle information in the corresponding group. σA is the matrix whose components are, for group GA, the variance of each Pi, the variance of each Qi, the variance of each Ri, the covariance of each pair of Pi, the covariance of each pair of Qi, and the covariance of each pair of Ri. σB and σC are the corresponding matrices for groups GB and GC. Concretely, σA has as its components the variance of P1, the variance of P2, ..., the variance of Q1, the variance of Q2, ..., the variance of R1, the variance of R2, ..., the covariance of P1 and P2, the covariance of P2 and P3, ..., the covariance of Q1 and Q2, the covariance of Q2 and Q3, ..., the covariance of R1 and R2, the covariance of R2 and R3, .... Since σB and σC are analogous to σA, a detailed description is omitted.
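A variance-covariance matrix of this kind can be sketched as below. Two points are assumptions made here rather than statements from the patent: the sketch fills in the full matrix, including covariances between different kinds of items such as P1 and Q1, and it divides by n - 1 (the unbiased sample estimate), since the source does not state the divisor.

```python
def covariance_matrix(group):
    """Sample variance-covariance matrix of a group of vectors (divisor n - 1, assumed)."""
    count = len(group)
    dims = len(group[0])
    mean = [sum(sample[k] for sample in group) / count for k in range(dims)]
    matrix = [[0.0] * dims for _ in range(dims)]
    for j in range(dims):
        for k in range(dims):
            # Diagonal entries (j == k) are variances, the rest are covariances.
            matrix[j][k] = sum(
                (sample[j] - mean[j]) * (sample[k] - mean[k]) for sample in group
            ) / (count - 1)
    return matrix


# Hypothetical group GA with vectors (cabin temperature, angle).
group_a = [(30.0, 80.0), (32.0, 100.0), (28.0, 120.0), (30.0, 140.0)]
sigma_a = covariance_matrix(group_a)
```

A group needs at least two items, and enough variation in every component, for the matrix to be invertible in the distance calculation.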

To decide which of the groups GA, GB, and GC the vehicle information In belongs to, where In is the vehicle information input from the in-vehicle CAN communication network 2 when the driver utters the phrase V, the Mahalanobis distance between In and each group is calculated. The Mahalanobis distance DA between the vehicle information In and group GA is calculated by equation (1):
DA · DA = XA · (σA)⁻¹ · YA ... (1)
Here, (σA)⁻¹ is the inverse of the variance-covariance matrix σA, XA is the difference between the vehicle information In and the mean MA written as a row vector, and YA is the same difference written as a column vector.

Similarly, the Mahalanobis distance DB between the vehicle information In and group GB is calculated by equation (2):
DB · DB = XB · (σB)⁻¹ · YB ... (2)
Here, (σB)⁻¹ is the inverse of the variance-covariance matrix σB, XB is the difference between the vehicle information In and the mean MB written as a row vector, and YB is the same difference written as a column vector.

The Mahalanobis distance DC between the vehicle information In and group GC is calculated by equation (3):
DC · DC = XC · (σC)⁻¹ · YC ... (3)
Here, (σC)⁻¹ is the inverse of the variance-covariance matrix σC, XC is the difference between the vehicle information In and the mean MC written as a row vector, and YC is the same difference written as a column vector.
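Equations (1) to (3) share one form, so a single function can serve all groups. The following is a minimal sketch under stated assumptions: the helper names `solve` and `mahalanobis` are introduced here, and instead of forming the inverse matrix explicitly the sketch solves the linear system σ · w = Y by Gauss-Jordan elimination, which gives the same value of X · σ⁻¹ · Y.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * w = rhs by Gauss-Jordan elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("variance-covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def mahalanobis(current, mean, sigma):
    """Distance D with D * D = X * sigma^-1 * Y, where X = Y = current - mean."""
    diff = [c - m for c, m in zip(current, mean)]
    weighted = solve(sigma, diff)  # equals sigma^-1 * Y
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))


# With an identity matrix the result reduces to the Euclidean distance.
distance = mahalanobis([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Raising an error for a singular matrix is a design choice of this sketch; the source does not say how such a case is handled.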

The Mahalanobis distances DA, DB, and DC calculated in this way for the groups GA, GB, and GC are output to the utterance phrase determination output unit 14. The Mahalanobis distance calculation unit 13 thus serves as the Mahalanobis distance calculation means.

The utterance phrase determination output unit 14 receives the Mahalanobis distances DA, DB, and DC calculated for the groups GA, GB, and GC from the Mahalanobis distance calculation unit 13. It determines the group with the smallest Mahalanobis distance to be the group to which the vehicle information In belongs, takes the recognition candidate of that group as the recognition result corresponding to the driver's utterance phrase V, and outputs it to the dictionary search unit 12 and to the outside of the speech recognition device 1. The utterance phrase determination output unit 14 thus serves as the utterance phrase determination means.

If the word dictionary search by the dictionary search unit 12 yields only one recognition candidate, the Mahalanobis distance calculation unit 13 does not calculate a Mahalanobis distance for it and passes it unchanged to the utterance phrase determination output unit 14, which determines this candidate as the recognition result corresponding to the driver's utterance phrase V and outputs it.

When groups with exactly the same Mahalanobis distance are input from the Mahalanobis distance calculation unit 13, the utterance phrase determination output unit 14 determines the highest-ranked candidate in the order retrieved by the dictionary search unit 12 as the recognition result corresponding to the driver's utterance phrase V and outputs it.
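The decision rules above (single candidate, smallest distance, tie broken by dictionary rank) can be sketched in a few lines. The function name `decide_phrase` and its argument layout are assumptions for illustration.

```python
def decide_phrase(ranked_candidates, distances):
    """Pick the recognition result.

    ranked_candidates: candidates in dictionary order, highest rank first.
    distances: mapping from candidate to its Mahalanobis distance
               (not consulted when there is only one candidate).
    """
    if len(ranked_candidates) == 1:
        return ranked_candidates[0]
    # min() returns the first minimal item it encounters, so on an exact tie
    # the candidate ranked higher by the dictionary search is chosen.
    return min(ranked_candidates, key=lambda candidate: distances[candidate])


result = decide_phrase(["Atsugi", "hot"], {"Atsugi": 2.0, "hot": 0.5})
```

Iterating over the ranked list, rather than over the distance mapping, is what makes the tie-break follow the dictionary order.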

Next, the speech recognition program executed by the speech recognition device 1 is described with reference to the flowchart in FIG. 2.
First, in step (hereinafter abbreviated "S") 101, the speech extraction unit 11 removes noise and the like from the input of the microphone 4 and extracts only the speech.

Next, in S102, the dictionary search unit 12 searches the word dictionary, in which corresponding words are set in advance, for the speech extracted in S101 and selects recognition candidates.

The process then proceeds to S103, where it is determined whether S102 produced exactly one recognition candidate. If so, the process proceeds to S104, where the utterance phrase determination output unit 14 determines that candidate as the utterance phrase, and then to S105, where the recognition result is output and the program ends.

If, on the other hand, it is determined in S103 that there is not exactly one recognition candidate, that is, there are several, the process proceeds to S106, where the Mahalanobis distance calculation unit 13 reads the vehicle information (Ai, Bi, Ci: i = 1, 2, ...) set for each of the recognition candidates A, B, and C.

The process then proceeds to S107, where the Mahalanobis distance calculation unit 13 reads the current vehicle information In, and to S108, where, as described above, it calculates the Mahalanobis distances DA, DB, and DC between the vehicle information In and the groups GA, GB, and GC of the recognition candidates A, B, and C.

Next, proceeding to S108, the utterance phrase determination output unit 14 determines the group with the smallest Mahalanobis distance among DA, DB, and DC to be the group to which the vehicle information In belongs, determines the corresponding candidate as the recognition result for the driver's utterance phrase V, outputs it, and the program ends.

If any Mahalanobis distances are exactly equal, the utterance phrase determination output unit 14 determines the highest-ranked candidate in the order retrieved by the dictionary search unit 12 as the recognition result corresponding to the driver's utterance phrase V, outputs it, and the program ends.
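The control flow of the flowchart can be summarized in a short sketch. Everything about the interfaces is an assumption: the function `recognize` and its callable parameters (`extract`, `search`, `load_groups`, `read_vehicle_info`, `distance`) are hypothetical stand-ins for the units described in the text, and the stub values in the demo are invented for illustration.

```python
def recognize(audio, extract, search, load_groups, read_vehicle_info, distance):
    """Flow of the speech recognition program, with each unit passed in as a callable."""
    speech = extract(audio)              # S101: remove noise, keep only the speech
    candidates = search(speech)          # S102: ranked recognition candidates
    if len(candidates) == 1:             # S103: exactly one candidate?
        return candidates[0]             # S104, S105: decide and output it
    groups = load_groups(candidates)     # S106: preset vehicle information per candidate
    current = read_vehicle_info()        # S107: current vehicle information In
    distances = {c: distance(current, groups[c]) for c in candidates}  # S108
    # Decision and output: smallest distance wins; min() keeps the
    # higher-ranked candidate when two distances are exactly equal.
    return min(candidates, key=lambda c: distances[c])


# Demo with stub units (all values hypothetical).
stub_distances = {"Atsugi": 7.6, "hot": 3.6}
result = recognize(
    audio="raw signal",
    extract=lambda signal: signal,
    search=lambda speech: ["Atsugi", "hot"],
    load_groups=lambda cands: {c: c for c in cands},
    read_vehicle_info=lambda: (25.0, 60.0),
    distance=lambda current, group: stub_distances[group],
)
```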

Next, the speech recognition performed by the speech recognition device 1 is described using a more concrete example, in which the driver says "atsui" ("hot"); that is, the driver's utterance phrase V is "hot".

Suppose that the dictionary search unit 12 searches the preset word dictionary and selects "atsui" ("hot") and "Atsugi" as recognition candidates. FIG. 4 shows the table, corresponding to FIG. 3, prepared for these two candidates. The vehicle information set in advance for the candidate "hot" is A1 to A4, and that set in advance for the candidate "Atsugi" is B1 to B4. Only P1, the cabin temperature, is set as own-vehicle driving information. Only Q1 is set as own-vehicle position information: it is the angle between the vehicle's current direction of travel and the direction of the candidate's place name, taken as the destination (see FIG. 5). No destination information is set.
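The angle used as Q1 could be derived from the vehicle heading and the bearing to the candidate place. The sketch below is an assumption-laden illustration: the patent does not say how the navigation device computes the angle, so the standard great-circle initial bearing formula is used here, the angle is treated as unsigned in the range 0 to 180 degrees, and both function names are hypothetical.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


def angle_to_candidate(heading_deg, bearing_deg):
    """Unsigned angle in [0, 180] between the travel direction and the candidate direction."""
    diff = abs(heading_deg - bearing_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


# A vehicle heading 30 degrees with the candidate place due east (bearing 90 degrees).
bearing = initial_bearing_deg(0.0, 0.0, 0.0, 1.0)
q1 = angle_to_candidate(30.0, bearing)
```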

FIG. 6 plots these values with cabin temperature on the horizontal axis and angle on the vertical axis. The distribution of the vehicle information A1 to A4 for the candidate "hot" is plotted with open circles, and the distribution of the vehicle information B1 to B4 for the candidate "Atsugi" is plotted with crosses. The point In in FIG. 6 (cabin temperature 25 °C, angle 60 degrees) represents the current vehicle information. Given these distributions, the Mahalanobis distance is calculated to judge whether the current vehicle information In belongs to group GA, the distribution of A1 to A4 for "hot", or to group GB, the distribution of B1 to B4 for "Atsugi".

If the Mahalanobis distance DA from the current vehicle information In to group GA is smaller than the Mahalanobis distance DB from In to group GB, In is judged to belong to group GA, and "hot" is output as the recognition result.
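The two-dimensional example can be worked through numerically. In the sketch below only the current point In = (25 °C, 60 degrees) comes from the text. The sample values for A1 to A4 and B1 to B4 are hypothetical, because the contents of FIG. 4 are not reproduced here, and the n - 1 divisor and the closed-form inverse of the 2 x 2 matrix are choices made for this illustration.

```python
import math


def group_stats(samples):
    """Mean and (variance, variance, covariance) of 2-D samples, divisor n - 1 (assumed)."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_a = sum(a for _, a in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples) / (n - 1)
    var_a = sum((a - mean_a) ** 2 for _, a in samples) / (n - 1)
    cov = sum((t - mean_t) * (a - mean_a) for t, a in samples) / (n - 1)
    return (mean_t, mean_a), (var_t, var_a, cov)


def mahalanobis_2d(point, samples):
    """Mahalanobis distance from a point to a group, using the closed-form 2 x 2 inverse."""
    (mean_t, mean_a), (var_t, var_a, cov) = group_stats(samples)
    dt, da = point[0] - mean_t, point[1] - mean_a
    det = var_t * var_a - cov * cov
    squared = (var_a * dt * dt - 2.0 * cov * dt * da + var_t * da * da) / det
    return math.sqrt(squared)


# Hypothetical (cabin temperature in deg C, angle in degrees) samples.
groups = {
    "hot": [(29.0, 40.0), (31.0, 90.0), (33.0, 60.0), (35.0, 110.0)],    # A1 to A4
    "Atsugi": [(22.0, 5.0), (24.0, 15.0), (26.0, 10.0), (24.0, 20.0)],   # B1 to B4
}
current = (25.0, 60.0)  # the point In from the text
distances = {word: mahalanobis_2d(current, samples) for word, samples in groups.items()}
result = min(distances, key=distances.get)
```

With these invented samples the point lies closer to the "hot" group, because the "Atsugi" samples cluster at small angles and the 60 degree angle is far from them relative to their spread.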

As described above, according to the embodiment of the present invention, the recognition candidate to which the driver's utterance phrase V belongs is judged by a statistical analysis based on the Mahalanobis distance, so an accurate recognition result that reflects the current situation can be obtained.

In addition, because recognition reflects the current situation, the driver does not need to perform any input operation other than speaking, which makes the device easy to use.

Although the embodiment has been described using "atsui" ("hot") and "Atsugi" as example recognition candidates, the invention is not limited to these.

FIG. 1: Functional block diagram of the vehicle speech recognition device
FIG. 2: Flowchart of the speech recognition program
FIG. 3: Explanatory diagram of a table of the information set for each utterance phrase and the values used to calculate the Mahalanobis distance
FIG. 4: Explanatory diagram of the corresponding table with "atsui" ("thick") and "Atsugi" as example utterance phrases
FIG. 5: Explanatory diagram of the own-vehicle position information in FIG. 4
FIG. 6: Explanatory diagram showing the distribution of the own-vehicle driving information and own-vehicle position information used to obtain the Mahalanobis distance in FIG. 4

Explanation of symbols

1 Speech recognition device
2 In-vehicle CAN communication network (own-vehicle information input means)
3 Navigation device (own-vehicle information input means)
4 Microphone (speech input means)
11 Speech extraction unit
12 Dictionary search unit (word dictionary search means)
13 Mahalanobis distance calculation unit (Mahalanobis distance calculation means)
14 Utterance phrase determination output unit (utterance phrase determination means)
Agent: Patent Attorney Susumu Ito

Claims

A speech recognition device for a vehicle, comprising:
speech input means for inputting speech;
own-vehicle information input means for inputting own-vehicle information;
word dictionary search means for searching a preset word dictionary for the input speech and selecting recognition candidates;
Mahalanobis distance calculation means for, when a plurality of recognition candidates exist, reading an information group set in advance for each recognition candidate and calculating a Mahalanobis distance between the read information group and the current own-vehicle information; and
utterance phrase determination means for determining an utterance phrase from among the recognition candidates based on the Mahalanobis distance calculated for each of the recognition candidates.

The speech recognition device for a vehicle according to claim 1, wherein the information group set in advance for each recognition candidate includes any one of own-vehicle driving information, own-vehicle position information, and destination position information.