JP2002149191A - Voice input device

Info

Publication number: JP2002149191A
Application number: JP2000341429A
Authority: JP
Inventors: Ryuta Terajima; 立太寺嶌; Toshihiro Wakita; 敏裕脇田
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2000-11-09
Filing date: 2000-11-09
Publication date: 2002-05-24

Abstract

(57) [Abstract]
[Problem] To provide an in-vehicle voice input device with a high recognition rate.
[Solution] The voice input device is provided with a sensor device 5 that detects the speaker's state and/or the running state of the vehicle, and a mental state estimation device 10 that estimates the speaker's mental state from the sensor output. A speech recognition dictionary group 40 is also provided, containing dictionaries 41, 42, 43, ... created from utterances made in the vehicle cabin under various levels of tension. At the time of voice input, a dictionary selection device 15 selects the optimal dictionary, for example dictionary 41, from the speech recognition dictionary group 40 according to the estimated mental state (for example, an estimated margin level). Because the speech recognition device 25 performs recognition with the dictionary best suited to the speaker's mental state, the recognition rate improves. This configuration is used for an in-vehicle voice input device.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a voice input device for controlling equipment by voice. In particular, it relates to a voice input device that estimates a person's mental state and selects a dictionary corresponding to that state, thereby improving the speech recognition rate. The invention can be applied, for example, to a voice input device that estimates the driver's mental state from the running state of the vehicle and performs accurate speech recognition according to that state.

[0002]

[Prior Art] Various voice input devices using speech recognition have been proposed. When such a device is applied to an automobile in particular, a further improvement in recognition rate is required for the sake of safety. One example aimed at improving the recognition rate is the speech recognition device and vehicle speech recognition device disclosed in Japanese Patent Application Laid-Open No. 11-126092, which raises the recognition rate by eliminating the causes of its decline. FIG. 7 shows a schematic configuration diagram. This conventional vehicle speech recognition device comprises a signal processing control unit 112, an acoustic processing unit 114, a microphone 116, a speech recognition unit 118, a speech synthesis unit 120, a speaker 122, a communication control unit 124, a trigger switch unit 126, a body ECU unit 130, and a navigation ECU unit 150. A power window 132, a sunroof 134, an air conditioner 136, an audio device 138, a door lock 140, and a speedometer 142 are connected to the body ECU 130.

[0003] In the above configuration, when the speech recognition unit 118 fails to recognize an utterance, the signal processing control unit 112 identifies the cause of the failure. For example, when the voice is buried in noise and the unit judges that the environment around the microphone 116 has deteriorated, the body ECU 130 improves the input environment of the microphone 116; if the power window 132 is open, for instance, it closes the window. Speech recognition is then performed again, with the aim of improving the recognition rate.

[0004] Another example is the speech recognition device disclosed in Japanese Patent Application Laid-Open No. 9-134193, which improves the recognition rate by selecting a speech recognition algorithm according to the situation. A central processing unit estimates the usage environment from information supplied by various sensors, then selects and executes, from among several speech recognition programs, the one suited to that environment. For example, it determines from the vehicle speed sensor whether the vehicle is moving and, if so, runs a speech recognition program that is highly robust against noise such as road noise and wind noise. The recognition rate is improved in this way.

[0005] There are also papers that suggest ways to improve the recognition rate, for example "MULTI-STYLE TRAINING FOR ROBUST ISOLATED-WORD SPEECH RECOGNITION" (Richard P. Lippmann, Edward A. Martin, Douglas B. Paul), ICASSP '87, and "A MACROSCOPIC ANALYSIS OF AN EMOTIONAL SPEECH CORPUS" (J.E.H. Noad, S.P. Whiteside and P.D. Green), EuroSpeech '97. These report that a speech recognition acoustic model built from speech recorded under various mental states achieves a higher recognition rate under those conditions than an acoustic model built only from speech recorded in a normal state.

[0006]

[Problem to be Solved by the Invention] However, the speech recognition device and vehicle speech recognition device disclosed in Japanese Patent Application Laid-Open No. 11-126092 raise the recognition rate by eliminating physical causes, inside and outside the vehicle cabin, that lower it. They therefore cannot improve the recognition rate when the cause of the decline lies in the driver's state. The speech recognition device disclosed in Japanese Patent Application Laid-Open No. 9-134193 likewise selects a speech recognition algorithm according to physical conditions inside and outside the cabin, such as noise, so it too cannot respond to changes in the driver's state.

[0007] The method in the paper "MULTI-STYLE TRAINING FOR ROBUST ISOLATED-WORD SPEECH RECOGNITION" (Richard P. Lippmann, Edward A. Martin, Douglas B. Paul), ICASSP '87, creates a single, averaged acoustic model from speech recorded under various mental states and uses it to cope with changes in voice quality. Under the stress of driving, however, a driver's speech changes not only in voice quality but also in wording, for example through the insertion of unnecessary words. The recognition rate could be raised further by also using a speech recognition dictionary that takes such unnecessary words into account. In other words, there is ample room for improvement.

[0008] The present invention has been made to solve the problems described above. Its object is to improve the recognition rate by providing the speech recognition dictionary with dictionaries corresponding to various mental states, estimating the speaker's mental state at the time of recognition, and using the dictionary that corresponds to that state. A further object is to apply this to a voice input device so that various devices can be controlled accurately.

[0009]

[Means for Solving the Problem] The voice input device according to claim 1 has voice input means and speech recognition means, and recognizes input speech to control various devices. It is characterized by comprising a sensor device; a mental state estimation device that estimates the speaker's mental state from the output of the sensor device; a speech recognition dictionary group having a plurality of dictionaries created on the basis of a plurality of acoustic models and a plurality of language models; and a dictionary selection device that selects an appropriate dictionary from the speech recognition dictionary group according to the output of the mental state estimation device. Here, "a plurality of dictionaries created on the basis of a plurality of acoustic models and a plurality of language models" means speech dictionaries created from speech recorded in various environments. Hereinafter, "dictionary" means such a speech dictionary.

[0010] The voice input device according to claim 2 is characterized in that the dictionary selection device has a correspondence table between the driver's mental state and the plurality of dictionaries in the speech recognition dictionary group. The voice input device according to claim 3 is characterized in that, when the speaker is inside the vehicle, the acoustic model is a vehicle interior acoustic model. The voice input device according to claim 4 is characterized in that the sensor device has an own-vehicle sensor that detects the running state of the vehicle. The voice input device according to claim 5 is characterized in that the sensor device has an obstacle sensor that detects obstacles outside the vehicle.

[0011]

[Operation and Effects] In the voice input device according to claim 1, the sensor device detects and outputs, for example, the speaker's state, such as heart rate, amount of perspiration, or amount of movement. The mental state estimation device estimates the speaker's mental state from that output. The mental state is, for example, the state of being under workload, or the speaker's degree of tension, margin, or emotion when the environment changes. Heart rate and palm perspiration, for instance, correlate with the speaker's degree of tension. If the sensor device is a camera, the speaker's emotion can also be estimated by image processing. Next, the dictionary selection device selects, from the speech recognition dictionary group consisting of a plurality of dictionaries, the dictionary corresponding to the estimated mental state and outputs it to the speech recognition device, which performs recognition using that dictionary. Because it uses a dictionary suited to the speaker's mental state, the speech recognition device can recognize speech more accurately; that is, the recognition rate improves. A voice input device built from these components can therefore operate various devices more accurately.

[0012] In the voice input device according to claim 2, the dictionary selection device has a correspondence table that associates the speaker's mental states with the dictionaries in the speech recognition dictionary group. It can therefore indicate the appropriate dictionary immediately in response to the output of the mental state estimation device. The result is a voice input device that performs the above speech recognition quickly to control various devices.
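The correspondence table can be pictured as a plain key-value lookup. The following is a minimal sketch only: the dictionary numbers 41, 42, and 43 and their pairing with small, medium, and large margin are taken from the first embodiment, while the names `Margin`, `DICTIONARY_TABLE`, and `select_dictionary` are hypothetical and introduced purely for illustration.

```python
from enum import Enum


class Margin(Enum):
    """Estimated margin level of the speaker (hypothetical type name)."""
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


# Sketch of a dictionary correspondence table: mental state -> dictionary number.
# The numbers follow the first embodiment; the structure is an assumption.
DICTIONARY_TABLE = {
    Margin.SMALL: 41,   # dictionary A
    Margin.MEDIUM: 42,  # dictionary B
    Margin.LARGE: 43,   # dictionary C
}


def select_dictionary(margin: Margin) -> int:
    """Return the dictionary number for the estimated margin."""
    return DICTIONARY_TABLE[margin]
```

Because the selection is a single table lookup rather than a computation, it adds essentially no delay between the estimate and the start of recognition, which is the point the paragraph makes.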

[0013] In the voice input device according to claim 3, a vehicle interior acoustic model is used as the acoustic model when the speaker is inside the vehicle. The vehicle interior acoustic model is an acoustic model that takes into account reflections of the speaker's voice inside the cabin, the vehicle's own engine noise, wind noise, and the like. Because this model is used, speech can be recognized accurately even inside the vehicle. The result is a voice input device that operates in-vehicle equipment accurately.

[0014] In the voice input device according to claim 4, the sensor device has an own-vehicle sensor that detects the state of the vehicle. When the speaker is the driver, the speaker's mental state changes with the running state: during acceleration or high-speed driving, for example, tension rises and the driving margin falls. An own-vehicle sensor such as a vehicle speed sensor can thus be used to estimate the speaker's mental state. It also reinforces the estimate obtained from speaker sensors such as heart rate or perspiration detection. For example, even if the estimate from the speaker sensor gives a medium driving margin, the margin is corrected to small when the running speed is at or above a predetermined value. The recognition rate in the cabin can therefore be improved further, giving a voice input device that controls in-vehicle equipment more accurately.

[0015] In the voice input device according to claim 5, the sensor device has an obstacle sensor that detects obstacles outside the vehicle. When the speaker is the driver, tension rises when an obstacle is noticed while driving. In this situation the obstacle sensor allows the speaker's (driver's) mental state to be estimated sooner than either the speaker sensor or the own-vehicle sensor can, because the response of a speaker sensor (for example, heart rate detection) or an own-vehicle sensor (for example, lane change detection) lags the onset of tension by a certain period.

[0016] Consequently, when an obstacle is found, the obstacle sensor allows the speech recognition dictionary to be changed earliest, or even before the tension arises. The recognition rate is therefore not lowered by a delay (that is, by use of an inappropriate dictionary), and a high recognition rate can be maintained even when an obstacle is found. The obstacle sensor is preferably used together with the speaker sensor and the own-vehicle sensor; using all three yields a voice input device that can respond to a wide variety of changes. The obstacle sensor is, for example, a camera device or a laser radar device. Obstacles include not only obstacles on the road in the usual sense but also vehicles ahead and to the side, as well as roadway structures such as guardrails.
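One way to picture the early dictionary change is a selector with a fast path for obstacle events and a slower path for estimates from the speaker and own-vehicle sensors. This is a sketch under stated assumptions, not the patented implementation: the class name `DictionarySelector`, its method names, the choice of the small-margin dictionary on obstacle detection, and the rule of holding that dictionary while the obstacle remains are all hypothetical. Only the dictionary numbers come from the embodiments.

```python
class DictionarySelector:
    """Tracks the currently selected dictionary number (hypothetical helper)."""

    # Margin level -> dictionary number, as in the embodiments.
    TABLE = {"small": 41, "medium": 42, "large": 43}

    def __init__(self) -> None:
        self.current = self.TABLE["large"]
        self.obstacle_present = False

    def on_margin_estimate(self, margin: str) -> None:
        # Slower path: estimate from speaker or own-vehicle sensors, which
        # lag the onset of tension. Assumption: while an obstacle is present,
        # the small-margin dictionary is kept regardless of this estimate.
        if not self.obstacle_present:
            self.current = self.TABLE[margin]

    def on_obstacle(self, present: bool) -> None:
        # Fast path: switch as soon as the obstacle sensor fires, without
        # waiting for heart rate or vehicle behaviour to react.
        self.obstacle_present = present
        if present:
            self.current = self.TABLE["small"]
```

For example, after `on_obstacle(True)` the selector reports dictionary 41 even if a stale estimate of "large" arrives afterwards; once `on_obstacle(False)` is received, later estimates take effect again.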

[0017]

[Embodiments of the Invention] (First Embodiment) Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a system configuration diagram showing one embodiment of the voice input device of the invention. The voice input device of this embodiment comprises a sensor device 5, a mental state estimation device 10, a dictionary selection device 15, a microphone 20 serving as the voice input means, a speech recognition device 25 serving as the speech recognition means, an output device 35, and a speech recognition dictionary group 40 consisting of a plurality of dictionaries. The dictionary correspondence table 15a is included in the dictionary selection device 15.

[0018] The sensor device 5 is a sensor that detects, for example, the speaker's heart rate or palm perspiration. The mental state estimation device 10 receives its output and estimates the speaker's mental state; for example, according to the output of the sensor device 5 (heart rate), it estimates the margin, one aspect of the mental state, as large, medium, or small. The dictionary correspondence table 15a maps margin levels to dictionaries: for example, dictionary 41 (dictionary A) corresponds to a small margin, dictionary 42 (dictionary B) to a medium margin, and dictionary 43 (dictionary C) to a large margin.
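The three-level estimate described above could be realized with simple thresholds on the sensor reading. The sketch below is illustrative only: the source gives no numeric heart-rate thresholds, so the values 85 and 110 beats per minute, the function name `estimate_margin`, and its parameter names are assumptions chosen for the example.

```python
def estimate_margin(heart_rate_bpm: float,
                    medium_from_bpm: float = 85.0,
                    small_from_bpm: float = 110.0) -> str:
    """Classify the speaker's margin as 'large', 'medium', or 'small'.

    The thresholds are hypothetical placeholders, not values from the source;
    a real system would calibrate them, possibly per speaker.
    """
    if heart_rate_bpm >= small_from_bpm:
        return "small"    # high heart rate: high tension, little margin
    if heart_rate_bpm >= medium_from_bpm:
        return "medium"
    return "large"        # calm speaker: ample margin
```

The returned level is then the key used to look up dictionary 41, 42, or 43 in the dictionary correspondence table 15a.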

[0019] The dictionary selection device 15 looks up the appropriate dictionary in the dictionary correspondence table 15a according to the estimate and reports it, for example as a dictionary number. The speech recognition dictionary group 40 contains dictionaries 41, 42, 43, ..., a plurality of speech dictionaries created under various conditions, that is, on the basis of a plurality of acoustic models and a plurality of language models. The speech recognition device 25 recognizes the speech from the microphone 20 using the dictionary with the specified number, for example dictionary 41, and outputs the result to the connected output device 35. The voice input device of this embodiment is implemented as a computer system comprising a CPU that executes computations, a ROM that stores programs, a RAM that stores data, external memory such as a hard disk as a peripheral device, interfaces, and so on. The mental state estimation device 10 and the dictionary selection device 15 are formed by the programs executed by the CPU together with the peripheral devices. The dictionary correspondence table 15a and the speech recognition dictionary group 40 are held in a peripheral device of the computer system, for example the external memory.

[0020] FIG. 2 is a flowchart showing the operation of the voice input device of this embodiment. Operation starts when a recognition switch (not shown) is turned on. First, in step S10, the mental state estimation device 10 is activated and reads the output of the sensor device 5, for example the speaker's heart rate. In step S11, the speaker's mental state is estimated from, for example, that heart rate; for instance, the speaker's margin is estimated as large, medium, or small. In step S12, the dictionary number, for example, corresponding to the estimate is selected from the dictionary correspondence table 15a; if the margin is small, dictionary 41 is selected.

[0021] Next, in step S13, the speaker's voice is acquired from the microphone 20. In step S14, that voice is recognized using the dictionary specified by the dictionary number; for example, HMM (hidden Markov model) based speech recognition is performed using dictionary 41. The result is output to the output device 35.
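Steps S10 to S14 can be summarized as a short control flow. The sketch below is an assumption-laden illustration: the function `run_voice_input` and all of its callable parameters are hypothetical stand-ins for the sensor device 5, the mental state estimation device 10, the microphone 20, the speech recognition device 25, and the output device 35, and no actual HMM recognizer is implemented.

```python
from typing import Callable

# Margin level -> dictionary number, following the dictionary correspondence table 15a.
TABLE = {"small": 41, "medium": 42, "large": 43}


def run_voice_input(read_sensor: Callable[[], float],
                    estimate: Callable[[float], str],
                    capture_audio: Callable[[], bytes],
                    recognize: Callable[[bytes, int], str],
                    output: Callable[[str], None]) -> int:
    """Run one recognition cycle and return the dictionary number used."""
    reading = read_sensor()                  # S10: read sensor output (e.g. heart rate)
    margin = estimate(reading)               # S11: estimate the mental state
    dictionary_no = TABLE[margin]            # S12: select dictionary from the table
    audio = capture_audio()                  # S13: acquire speech from the microphone
    text = recognize(audio, dictionary_no)   # S14: recognize with the chosen dictionary
    output(text)                             # pass the result to the output device
    return dictionary_no
```

Passing the components in as callables keeps the flow independent of any particular sensor or recognizer, so the same sequence could be exercised with stub functions during testing.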

[0022] The output value is, for example, the recognized sequence of character codes, and the output device is, for example, an audio device or an air conditioner. When the output device is an audio device, the recognition result is, for example, a command to switch on a TV set or the like, to select a channel, or to set the volume. The voice input device operates according to a flowchart such as this.

[0023] As described above, the voice input device of this embodiment detects the speaker's state (mental state) with the sensor device and performs speech recognition with a dictionary suited to that mental state. The recognition rate can therefore be improved over conventional speech recognition that uses an ordinary dictionary. And because the controlled device is driven by that recognition result, it can be controlled more accurately.

[0024] (Second Embodiment) The first embodiment was an example in which the invention is applied to a general voice input device. This embodiment applies it to an in-vehicle navigation system, shown in FIG. 3. It differs from the first embodiment in that the sensor device 5 is a vehicle speed sensor 5a, from whose output the driver's mental state is estimated, and in that the output device 35 is replaced by a navigation device 36 and a display device 37 that shows the result to the driver.

[0025] The navigation system of this embodiment comprises a vehicle speed sensor 5a serving as the sensor device, a mental state estimation device 10, a dictionary selection device 15, a microphone 20, a speech recognition device 25, a dictionary correspondence table 15a, a navigation device 36, a display device 37 that displays its results, and a speech recognition dictionary group 40 consisting of a plurality of dictionaries 41, 42, 43.

[0026] The vehicle speed sensor 5a is, for example, a sensor that detects the rotational speed of the axle. The mental state estimation device 10 estimates the driver's mental state, for example the driving margin, from the output of the vehicle speed sensor 5a. For example, the driving margin is estimated as large at a vehicle speed of 10 km/h or less, medium at 10 km/h to 50 km/h, and small at 50 km/h or more. The dictionary correspondence table 15a maps mental states to dictionaries: for example, dictionary 41 (dictionary A) corresponds to a small driving margin (high vehicle speed), dictionary 42 (dictionary B) to a medium driving margin (medium vehicle speed), and dictionary 43 (dictionary C) to a large driving margin (low vehicle speed) (FIG. 4).
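The speed bands and the table of FIG. 4 translate directly into a small classification function. In this sketch the thresholds of 10 km/h and 50 km/h and the A/B/C pairing come from the paragraph above; the names `driving_margin` and `FIG4_TABLE` are hypothetical, and the handling of the exact boundary values (10 km/h counted as large, 50 km/h counted as small) is an assumption based on the "or less" and "or more" wording.

```python
def driving_margin(speed_kmh: float) -> str:
    """Estimate the driving margin from vehicle speed in km/h."""
    if speed_kmh <= 10.0:
        return "large"    # 10 km/h or less
    if speed_kmh < 50.0:
        return "medium"   # between 10 km/h and 50 km/h
    return "small"        # 50 km/h or more


# Correspondence between driving margin and dictionary, as described for FIG. 4.
FIG4_TABLE = {
    "small": "A",   # dictionary 41, high vehicle speed
    "medium": "B",  # dictionary 42, medium vehicle speed
    "large": "C",   # dictionary 43, low vehicle speed
}
```

With these definitions, `FIG4_TABLE[driving_margin(80.0)]` selects dictionary A and `FIG4_TABLE[driving_margin(5.0)]` selects dictionary C.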

[0027] Dictionaries A, B, and C are each a vehicle interior acoustic model and language model trained on speech recorded in the cabin under the mental state associated with the respective vehicle speed, and are stored in the speech recognition dictionary group 40. The dictionary selection device 15 selects and reports dictionary 41, 42, or 43 corresponding to the mental state according to the dictionary correspondence table 15a. The speech recognition device 25 is equivalent to that of the first embodiment: it recognizes the speech from the microphone 20 using the specified dictionary, for example dictionary 41, and outputs the result to the connected navigation device 36. The navigation device 36 and its display device 37 are implemented as a computer system comprising a CPU, an external storage device, an input device, interfaces, ROM, RAM, a display such as a liquid crystal display, and so on.

[0028] FIG. 5 is a flowchart showing the operation of the navigation system of this embodiment. It is assumed here that a destination has already been set in the system. The example describes the case in which the driver, wishing to obtain the estimated time required to reach the destination, says "estimated time", the utterance is recognized, and the result is displayed. The navigation system starts operating when a recognition switch (not shown) is turned on. First, in step S20, the output of the vehicle speed sensor 5a is read and the driving margin, which represents the driver's mental state, is estimated (operation of the mental state estimation device). For example, the driving margin is estimated as large at a vehicle speed of 10 km/h or less, medium at 10 to 50 km/h, and small at 50 km/h or more.

[0029] Next, in step S21, the dictionary corresponding to the estimate is selected from the dictionary correspondence table 15a (FIG. 4) (operation of the dictionary selection device). For example, in accordance with the correspondence of FIG. 4, dictionary C is selected if the driving margin is large and dictionary A if it is small, and the number of the selected dictionary is reported. Next, in step S22, speech is acquired from the microphone 20, for example the utterance "estimated time". In step S23, that speech is recognized using the dictionary selected according to the driving margin, for example dictionary 41. This recognition is also performed by, for example, the HMM (hidden Markov model) method.

[0030] The output of the speech recognition device 25 is, for example, the recognized sequence of character codes, which is passed to the navigation device 36. In step S24, the navigation device 36 shows the result of the command "estimated time", that is, the remaining time to the destination, on the display device 37, and the process ends. Alternatively, the result is synthesized by a speech synthesizer (not shown) and announced to the driver before the process ends. A navigation system employing the voice input device of this embodiment operates according to a flowchart such as this.

[0031] As described above, the voice input device of this embodiment detects the running state of the own vehicle and estimates the driver's mental state from it. Speech recognition is then performed with a dictionary matched to that mental state. Moreover, the dictionary is built from a vehicle interior acoustic model and a vehicle interior language model. The recognition rate inside the vehicle can therefore be improved over the prior art, and the voice input device controls the navigation device more accurately than before.

[0032] (Modifications) The present invention can be implemented in various other forms. For example, in the first and second embodiments the mental state estimation device 10 and the dictionary selection device 15 are implemented by a computer device (not shown), its program, and its peripheral devices, but each may instead be formed as an independent unit. The actual form does not matter as long as the function is provided.

[0033] In the first embodiment, the sensor device 5 is a sensor that detects the speaker's state, from which the speaker's mental state is estimated directly. In the second embodiment, the sensor device is a sensor that detects the state of the own vehicle (the vehicle speed sensor 5a), and the mental state of the driver (speaker) is estimated indirectly from its output. The sensor device 5 may be given both functions. That is, the speaker (driver) state and the own-vehicle state may both be detected and the speaker's mental state estimated from the two together. For example, even if the driving margin is first judged to be large because the vehicle speed is 5 km/h or less, it may be corrected to medium or small if the heart rate and the amount of palm sweating are high, because the driver may be tense for other reasons. With this configuration, the mental state can be estimated more accurately, and the device can therefore be controlled more precisely.
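
The correction described above, in which speaker signals override a speed-based estimate, might look like the following sketch. The thresholds `hr_limit_bpm` and `sweat_limit`, their units, and the function name `corrected_margin` are hypothetical, since the text only says the values are high. The 5 km/h bound is taken from the example. Requiring both signals to be high is one reading of the text, and the choice between medium and small is left as a parameter because the text allows either.

```python
def corrected_margin(speed_kmh: float, heart_rate_bpm: float, palm_sweat: float,
                     hr_limit_bpm: float, sweat_limit: float,
                     downgrade_to: str = "medium") -> str:
    """Estimate the driving margin from speed, then correct it with speaker signals.

    Only the low-speed case from the text is modelled; other speeds
    return "unknown" rather than inventing thresholds.
    """
    if downgrade_to not in ("medium", "small"):
        raise ValueError("downgrade_to must be 'medium' or 'small'")
    if speed_kmh > 5.0:
        return "unknown"  # outside the example given in the text
    # Speed alone suggests a large margin...
    margin = "large"
    # ...but a high heart rate together with heavy palm sweating suggests
    # tension (assumption: both signals must exceed their limits).
    if heart_rate_bpm >= hr_limit_bpm and palm_sweat >= sweat_limit:
        margin = downgrade_to
    return margin
```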

[0034] Furthermore, an obstacle detection sensor may be added to the sensor device 5. When an obstacle or the like causes the driver to become tense, the obstacle sensor can detect the cause earlier than any of the other sensors. This is because a certain time elapses between the onset of tension in the speaker (driver) and its detection by the speaker sensor or the own-vehicle sensor. For example, heart rate detection and lane change detection lag the onset of tension because of response delays such as reflex latency. Therefore, to respond as early as possible to tension caused by obstacles and the like, an obstacle sensor such as a laser radar device, a camera device, or an ultrasonic device may be added.

[0035] When an obstacle sensor is added to the sensor device 5, step S20 in the flowchart of FIG. 5 is replaced by steps S20a and S20b shown in FIG. 6. That is, in step S20a the three outputs of the speaker sensor, the own-vehicle sensor, and the obstacle sensor are read. In step S20b, the driver's mental state is then estimated from these three outputs. For example, the sensors are prioritized in the order obstacle sensor, vehicle speed sensor, speaker sensor, and the mental state is estimated according to that priority. The estimated value is then passed to the following step S21. In this way, the speech recognition dictionary can be switched at the earliest possible moment, or even before the tension arises. Consequently, the recognition rate is not degraded by an inappropriate dictionary remaining in use during the speaker's reflex delay, and the voice input device maintains a high recognition rate even when an obstacle is encountered.
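
One way to read the priority scheme of step S20b is that the highest-priority sensor with something to report decides the estimate. The sketch below follows that reading, which is an interpretation rather than the method of the embodiment. The sensor keys, the convention that a sensor reports `None` when it has no estimate, the fallback value, and the name `fuse_by_priority` are all assumptions.

```python
from typing import Mapping, Optional

# Priority order given in the text: obstacle sensor first, then the
# vehicle speed sensor, then the speaker sensor.
PRIORITY = ("obstacle", "vehicle_speed", "speaker")


def fuse_by_priority(readings: Mapping[str, Optional[str]],
                     default: str = "large") -> str:
    """Return the margin from the highest-priority sensor that reports one.

    `readings` maps a sensor name to a margin label, or to None when that
    sensor has nothing to report (an assumed convention). `default` is an
    assumed fallback used when no sensor reports anything.
    """
    for sensor in PRIORITY:
        margin = readings.get(sensor)
        if margin is not None:
            return margin
    return default
```

Under this reading, an obstacle detection immediately forces the estimate without waiting for the slower heart rate or vehicle speed signals to react.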

[Brief description of the drawings]

FIG. 1 is a system configuration diagram of a voice input device according to a first embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the voice input device according to the first embodiment of the present invention.

FIG. 3 is a system configuration diagram of a navigation system according to a second embodiment of the present invention.

FIG. 4 is a correspondence table between mental states and dictionaries used in the second embodiment of the present invention.

FIG. 5 is a flowchart showing the operation of the navigation system according to the second embodiment of the present invention.

FIG. 6 shows the modified steps of the flowchart according to a modification of the second embodiment of the present invention.

FIG. 7 is a system configuration diagram of a speech recognition device, which is a conventional voice input device.

[Explanation of symbols]

5: sensor device
10: mental state estimation device
15: dictionary selection device
15a: dictionary correspondence table
20: microphone
25: speech recognition device
35: output device
40: speech recognition dictionary group

Continued from the front page

(51) Int.Cl.7, identification symbol, FI, theme code (reference): B60R 21/00 621 B60R 21/00 621E 622B 622 622C 622F 624B 624 624C 624E 624F G10L 3/00 571V G10L 15/06 521T 15/20 531Q 15/00 551J 15/28 551Q 15/24 561C 571Q

F-term (reference): 3D037 FA01 FA09 FA16 FA23 FB09 FB10; 5D015 AA01 AA06 BB02 HH23 KK01 KK04 LL05 LL06 LL12

Claims

[Claims]

1. A voice input device that has voice input means and voice recognition means and that recognizes input voice to control various devices, the voice input device comprising: a sensor device; a mental state estimating device for estimating the mental state of a speaker from the output of the sensor device; a speech recognition dictionary group having a plurality of dictionaries created based on a plurality of acoustic models and a plurality of language models; and a dictionary selection device for selecting an appropriate dictionary from the speech recognition dictionary group according to the output of the mental state estimating device.

2. The speech input device according to claim 1, wherein the dictionary selection device has a correspondence table between a mental state of a speaker and the plurality of dictionaries of the speech recognition dictionary group.

3. The voice input device according to claim 1, wherein when the speaker is in a vehicle, the acoustic model is a vehicle interior acoustic model.

4. The voice input device according to claim 3, wherein said sensor device has a vehicle sensor for detecting a traveling state of the vehicle.

5. The voice input device according to claim 3, wherein said sensor device has an obstacle sensor for detecting an obstacle outside the host vehicle.