JP2007057805A

JP2007057805A - Information processing apparatus for vehicle

Info

Publication number: JP2007057805A
Application number: JP2005242875A
Authority: JP
Inventors: Katsumi Ohashi; 克己大橋
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2005-08-24
Filing date: 2005-08-24
Publication date: 2007-03-08

Abstract

PROBLEM TO BE SOLVED: To make a vehicle information processing apparatus that performs information processing in response to voice react only to the driver's voice for processes that the driver should instruct, while reducing cost. SOLUTION: The driver's voice (acoustic feature values) is stored in advance. Then, only when a voice input from a microphone 35 instructs a process for the driver and matches the previously stored voice (acoustic feature values) of the driver is the process corresponding to the input voice executed. No additional hardware such as a CCD camera is needed to decide whether the voice input from the microphone 35 is the driver's voice, so cost does not increase. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a vehicle information processing apparatus that executes vehicle information processing in response to voice input from a user.

Description of the Related Art

Conventionally, there are vehicle information processing apparatuses that execute information processing by voice in place of manual operation of a remote control or switch, for example while driving (see Patent Document 1). However, such conventional apparatuses may malfunction when several occupants speak at the same time. Patent Document 2 therefore identifies the seat position of the person who spoke by using a plurality of microphones that pick up sound from a plurality of positions in the cabin and a CCD camera that determines whether each seat is occupied.

Patent Document 1: Japanese Patent Laid-Open No. 11-288296
Patent Document 2: JP 2004-354930 A

In a vehicle information processing apparatus that executes information processing by voice as described above, some processes, such as those related to vehicle travel, should be instructed by the driver. Other processes, such as those related to audio, cause no problem even if an occupant other than the driver instructs them. Because the technique of Patent Document 2 can determine whether an input voice is the driver's voice, it could be used to make the apparatus react only to the driver's voice for those voice-instructable processes that the driver should instruct.

However, Patent Document 2 requires a plurality of microphones and a CCD camera to identify the seat position of the person who spoke, which increases cost.

The present invention has been made in view of the above problems. Its object is to provide a vehicle information processing apparatus that executes information processing by voice and that, while reducing cost, reacts only to the driver's voice for processes that the driver should instruct.

To achieve the above object, the vehicle information processing apparatus of claim 1 comprises: voice input means for inputting voice; first determination means for determining whether the voice input by the voice input means is a voice instructing execution of a voice-instructable process, the voice-instructable processes including processes that the driver should instruct and processes that occupants other than the driver may also instruct; voiceprint storage means for storing the driver's voiceprint; second determination means for determining, based on the voiceprint stored in the voiceprint storage means, whether the voice input by the voice input means is the driver's voice; and execution means which, when the first determination means determines that the input voice instructs execution of a voice-instructable process, executes a process that the driver should instruct only in response to the driver's voice, based on the determination result of the second determination means.
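The decision made by the execution means of claim 1 can be sketched in Python as below. This is a minimal illustration under assumptions, not the claimed implementation: the command names, the `Category` table, and `should_execute` are hypothetical, and the `is_driver_voice` argument stands in for the result of the second determination means.

```python
from enum import Enum


class Category(Enum):
    DRIVER_ONLY = "driver_only"  # e.g. navigation processes
    GENERAL = "general"          # e.g. air conditioner, TV, audio


# Hypothetical table of voice-instructable processes and their categories.
VOICE_COMMANDS = {
    "set destination": Category.DRIVER_ONLY,
    "change map scale": Category.DRIVER_ONLY,
    "set air conditioner temperature": Category.GENERAL,
    "change audio volume": Category.GENERAL,
}


def should_execute(command: str, is_driver_voice: bool) -> bool:
    """Return True if the execution means should run `command`."""
    category = VOICE_COMMANDS.get(command)  # role of the first determination means
    if category is None:
        return False  # not a voice-instructable process
    if category is Category.DRIVER_ONLY:
        return is_driver_voice  # only the driver's voice may trigger it
    return True  # general processes accept any occupant's voice
```

A navigation command spoken by a passenger is therefore ignored, while a volume change from the same passenger is accepted.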

With this configuration, whether the input voice is the driver's voice can be determined without the plurality of microphones and the CCD camera required in Patent Document 2. Cost can therefore be reduced while processes that the driver should instruct still react only to the driver's voice.

In the vehicle information processing apparatus of claim 2, the voiceprint storage means stores the voice input by the voice input means at the start of driving as the driver's voiceprint. Because the voiceprint of whoever is driving is stored each time driving starts, this is particularly effective for a vehicle driven by several people.

The vehicle information processing apparatus of claim 3 comprises stop means that, when the driver's voiceprint is already stored in the voiceprint storage means, stops storing the driver's voiceprint at subsequent starts of driving. For example, when the vehicle is nearly always driven by the same person, there is no need to store the driver's voiceprint every time; on the contrary, storing it each time would likely add to the driver's operating burden. Claim 3 takes this into account.

In the vehicle information processing apparatus of claim 4, the voiceprint storage means includes a switch for storing the voice input by the voice input means as the driver's voiceprint. With this switch, the voiceprint can be stored whenever the driver wishes. This is particularly effective when drivers change after driving has started.
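A rough sketch of when the voiceprint would be (re)stored under claims 2 to 4 follows. The event names and the `keep_stored_print` flag (standing in for the stop means of claim 3) are hypothetical, and the sketch simply combines the three claims in one function for illustration.

```python
def should_enroll(event: str, voiceprint_stored: bool, keep_stored_print: bool) -> bool:
    """Decide whether to store the driver's voiceprint now.

    event: "drive_start" or "switch_pressed" (hypothetical event names).
    keep_stored_print: True when the stop means of claim 3 is in effect.
    """
    if event == "switch_pressed":
        return True  # claim 4: store whenever the driver operates the switch
    if event == "drive_start":
        if keep_stored_print and voiceprint_stored:
            return False  # claim 3: a print is already stored, skip storing
        return True  # claim 2: store at the start of driving
    return False  # other events never trigger storage
```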

Embodiments to which the present invention is applied are described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of a multimedia system according to an embodiment. The multimedia system is mounted in a vehicle and is configured to comprehensively control and process not only navigation, such as map display and route guidance using map data, but also image display using other media, audio devices, and the like.

Specifically, the system consists of a multimedia ECU 10, a display device 15 with a touch switch, an air conditioner ECU 51, a TV tuner 52, and an audio device 53, which are interconnected via a communication line 40.

A position detector 4, a map data input device 6, and an operation switch group 8 are connected to the multimedia ECU 10, which receives data from them and also exchanges data with a voice recognition device 30 and with the display device 15 with a touch switch described above. The multimedia ECU 10 includes a navigation control unit 10a and a multimedia control unit 10b, each configured as an ordinary computer with a well-known CPU, ROM, RAM, I/O, and a bus line connecting them. The navigation control unit 10a executes navigation-related processing, and the multimedia control unit 10b executes processing related to the other media, specifically the air conditioner ECU 51, the TV tuner 52, and the audio device 53 described above.

The position detector 4 includes a well-known gyroscope 18, a distance sensor 20, and a GPS receiver 22 for the GPS (Global Positioning System), which detects the vehicle position based on radio waves from satellites. Because each of the sensors 18, 20, and 22 has errors of a different nature, they are used so that they complement one another. Depending on the required accuracy, only some of them may be used, and a steering rotation sensor, wheel sensors on the individual road wheels, or the like may be added.
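The source does not say how the sensor outputs are combined. A minimal sketch, assuming a simple complementary scheme (dead reckoning from the gyroscope 18 and distance sensor 20, pulled toward the GPS receiver 22 fix by a fixed weight), might look like this; `gps_weight` and both function names are hypothetical.

```python
import math


def dead_reckon(x, y, heading_rad, distance, yaw_change_rad):
    """Advance the position using the distance sensor and gyroscope readings."""
    heading = heading_rad + yaw_change_rad
    return x + distance * math.cos(heading), y + distance * math.sin(heading), heading


def blend_with_gps(dr_xy, gps_xy, gps_weight=0.2):
    """Pull the dead-reckoned position toward the GPS fix.

    gps_weight is an assumed tuning constant in [0, 1]; the source gives no value.
    """
    if gps_xy is None:  # no GPS fix (e.g. in a tunnel): keep dead reckoning
        return dr_xy
    return tuple(d + gps_weight * (g - d) for d, g in zip(dr_xy, gps_xy))
```

Dead reckoning drifts over distance while GPS is noisy but unbiased, which is why blending the two is a common way to let each sensor offset the other's error.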

The map data input device 6 is a device for inputting various data, including so-called map matching data for improving position detection accuracy, map data, and landmark data. Because of the data volume, a CD-ROM or DVD is generally used as the medium, but another medium such as a memory card may be used.

The display device 15 with a touch switch includes a touch switch 12, a display ECU 13 that controls the entire display device, and an LCD monitor 14. In the touch switch 12, a predetermined number of infrared beams run vertically and horizontally across the screen; touching the screen with a finger interrupts some of them, and the pressed area is identified so that the screen functions as a switch. The LCD monitor 14 can display in color, and its screen can show the current vehicle position mark input from the position detector 4, the map data input from the map data input device 6, and additional data superimposed on the map, such as the guidance route and landmarks for set points described later. This is its use as a navigation device, but it can also display, for example, the television picture of a channel selected by the TV tuner 52.
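To illustrate how an infrared grid can locate a touch, the following sketch maps interrupted beam indices to a button cell. The beam indexing, the use of the centre of the interrupted beams, and the cell size are assumptions; the source only states that the pressed area is identified from the interrupted beams.

```python
def touched_cell(blocked_cols, blocked_rows, cell_w, cell_h):
    """Map interrupted infrared beams to a touched grid cell.

    blocked_cols / blocked_rows: indices of the vertical / horizontal beams
    that are interrupted. cell_w / cell_h: cell size measured in beams.
    Returns (column, row) of the touched cell, or None if no touch.
    """
    if not blocked_cols or not blocked_rows:
        return None  # a touch must interrupt beams in both directions
    # Use the centre of the interrupted beams as the touch point.
    cx = sum(blocked_cols) / len(blocked_cols)
    cy = sum(blocked_rows) / len(blocked_rows)
    return int(cx // cell_w), int(cy // cell_h)
```

For example, interrupting vertical beams 4 and 5 and horizontal beam 9 with cells four beams wide and tall selects cell (1, 2).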

In this embodiment, the operation switch group 8 consists of mechanical switches arranged around the display device 15 with a touch switch (or integrated into the display housing) and is used mainly to select the medium to use. Specifically, these switches select the desired medium from navigation, television, audio, air conditioner, CD, and so on.

Whereas the operation switch group 8 is used to specify a destination and the like by manual operation, the voice recognition device 30 allows the user to specify the destination and the like by voice input as well. The configuration of the voice recognition device 30 is further described with reference to FIG. 2.

The voice recognition device 30 includes a voice recognition unit 31, a dialogue control unit 32, a speech synthesis unit 33, a voice input unit 34, a microphone 35, a PTT (Push-To-Talk) switch 36, a speaker 37, and a PTT switch control unit 38.

On instruction from the dialogue control unit 32, the voice recognition unit 31 performs recognition processing on the voice data input from the voice input unit 34 and returns the recognition result to the dialogue control unit 32. That is, it collates the voice data acquired from the voice input unit 34 against stored dictionary data, compares it with a plurality of candidate comparison patterns, and outputs the top-ranked patterns with the highest degree of match to the dialogue control unit 32. To recognize the word sequence in the input voice, the voice data input from the voice input unit 34 is acoustically analyzed in sequence to extract acoustic feature values (for example, the cepstrum), yielding time-series acoustic feature data. This time series is then divided into sections by a well-known method such as DP matching, an HMM (hidden Markov model), or a neural network, and the word stored as dictionary data that corresponds to each section is determined. In addition, the voice recognition unit 31 outputs to the dialogue control unit 32 both the recognized word and the acoustic feature values corresponding to that word.
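Of the methods named above, DP matching is the simplest to show. The sketch below is a textbook dynamic time warping distance over frame sequences, used to rank whole-word templates. It assumes features have already been extracted (cepstrum computation is omitted), the template dictionary format is hypothetical rather than the device's actual dictionary data, and unlike the unit described it does not segment continuous speech into several words.

```python
import math


def dtw_distance(seq_a, seq_b):
    """DP matching (dynamic time warping) between two feature sequences.

    Each sequence is a list of feature vectors, e.g. one cepstral vector per frame.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # local frame distance
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(features, dictionary):
    """Return dictionary words ranked from best to worst match."""
    return sorted(dictionary, key=lambda w: dtw_distance(features, dictionary[w]))
```

The warping lets a word spoken slowly (repeated frames) still match its template at zero or low cost.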

Based on the recognition result and the internal state it manages, the dialogue control unit 32 instructs the speech synthesis unit 33 to generate a response voice and instructs the multimedia ECU 10, which executes the system's processing; for example, it notifies the destination needed for navigation so that the destination setting is executed. Of the processes it instructs the multimedia ECU 10 to execute, the dialogue control unit 32 issues certain predetermined ones only in response to the driver's voice. Specifically, navigation processes (for example, changing the map scale or setting a destination) are instructed only in response to the driver's voice, while other processes (for example, audio volume or air conditioner temperature setting) are also instructed in response to occupants other than the driver. Because this processing is a characteristic part of the present invention, it is described later using a flowchart.

To issue instructions only in response to the driver's voice, the dialogue control unit 32 also determines whether the voice data input from the voice input unit 34 is the driver's voice. Specifically, the dialogue control unit 32 can store the driver's voiceprint (acoustic feature values) and has a voiceprint determination unit 59 that collates the acoustic feature values input from the voice recognition unit 31 with the stored acoustic feature values of the driver to determine whether the voice data input from the voice input unit 34 is the driver's voice.
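The source does not specify how the voiceprint determination unit 59 compares feature values. As a stand-in only, the sketch below averages the per-frame features into one vector and compares it with the stored print by cosine similarity; the averaging, the similarity measure, and the 0.9 threshold are all assumptions, and practical speaker verification usually relies on statistical speaker models instead.

```python
import math


def mean_vector(frames):
    """Average a sequence of feature vectors into one voiceprint vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_driver(input_frames, driver_print, threshold=0.9):
    """Decide whether the input voice matches the stored driver voiceprint.

    threshold is an assumed value, not taken from the source.
    """
    return cosine_similarity(mean_vector(input_frames), driver_print) >= threshold
```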

The above is the processing performed after the recognition result is confirmed. As a result, the voice recognition device 30 makes it possible, for example, to specify a destination for navigation by voice input without operating the operation switch group 8.

The voice input unit 34 converts the surrounding sound captured by the microphone 35 into digital data and outputs it to the voice recognition unit 31. To keep noise picked up by the microphone 35 from reaching the voice recognition unit 31, the voice input unit 34 outputs only sound at or above a predetermined threshold. The PTT switch control unit 38 monitors whether the PTT switch 36 is pressed and, when it is, notifies the voice input unit 34 and the dialogue control unit 32. The dialogue control unit 32 then stores the acoustic feature values input from the voice recognition unit 31 at that time as the driver's acoustic feature values.
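The threshold gate of the voice input unit 34 can be pictured as a per-frame energy check. In this sketch the frame split, the mean-squared-amplitude measure, and the threshold value are assumptions; the source states only that sound at or above a predetermined threshold is passed on.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of PCM samples (0.0 if empty)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def gate_frames(frames, threshold):
    """Pass on only frames whose energy is at or above the threshold.

    This mirrors the voice input unit forwarding only sound at or above a
    predetermined level; the energy measure itself is an assumption.
    """
    return [f for f in frames if frame_energy(f) >= threshold]
```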

Returning to the configuration of FIG. 1, the air conditioner ECU 51 controls the operation of the air conditioner; for example, it controls the outlet air temperature and air volume so that the set temperature is reached.

The TV tuner 52 is a device for selecting and receiving television broadcast signals, and broadcast signals of predetermined frequencies can be assigned to a so-called preset memory. By specifying a preset memory number, an occupant can receive the television broadcast signal of the corresponding station. The received television picture can be displayed on the LCD monitor 14 of the display device 15.

The audio device 53 plays music media and also receives radio signals of a predetermined frequency and outputs them to a speaker. For radio reception, the occupant can either specify the frequency of the desired station each time or, as with the TV tuner 52 described above, assign the frequency of each radio station to the preset memory in advance.
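The preset memory shared by the TV tuner 52 and the audio device 53 amounts to a table from preset number to frequency. The class below is a hypothetical sketch; the slot count, the kHz unit, and the method names are assumptions.

```python
class PresetMemory:
    """Preset slots mapping a preset number to a broadcast frequency."""

    def __init__(self, slots):
        # Slots are numbered from 1; None means nothing is assigned yet.
        self._freqs = {n: None for n in range(1, slots + 1)}

    def assign(self, number, freq_khz):
        """Store a station frequency in an existing preset slot."""
        if number not in self._freqs:
            raise ValueError(f"no preset slot {number}")
        self._freqs[number] = freq_khz

    def tune(self, number):
        """Return the frequency stored for a preset, or None if unset/invalid."""
        return self._freqs.get(number)
```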

The air conditioner ECU 51, the TV tuner 52, and the audio device 53 each operate in response to manual operation by an occupant and, in this embodiment, also in response to an occupant's voice via the voice recognition device 30 described above. For example, when an occupant says “Set the temperature of the air conditioner to xx °C,” the voice recognition device 30 recognizes the voice, and based on the recognition result the multimedia control unit 10b instructs the air conditioner ECU 51 to set the temperature to xx °C.
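How a recognized temperature command might be turned into an instruction for the air conditioner ECU 51 is sketched below. The English phrasing of the pattern, the message dictionary, and the `send_to_aircon_ecu` callback are hypothetical; the actual recognizer vocabulary and ECU protocol are not given in the source.

```python
import re

# Hypothetical phrasing; the real recognizer vocabulary is not specified.
TEMP_PATTERN = re.compile(
    r"set the temperature of the air conditioner to (\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def parse_temperature_command(text):
    """Return the requested set temperature in deg C, or None if no match."""
    m = TEMP_PATTERN.search(text)
    return float(m.group(1)) if m else None


def dispatch(text, send_to_aircon_ecu):
    """Forward a recognized temperature command to the air conditioner ECU."""
    temp = parse_temperature_command(text)
    if temp is None:
        return False  # not a temperature command
    send_to_aircon_ecu({"set_temperature_c": temp})
    return True
```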

The navigation control unit 10a of the multimedia ECU 10 is described further. The navigation function is used, for example, when the driver selects route information display processing from a menu displayed on the LCD monitor 14 with the operation switch group 8 in order to display a guidance route on the LCD monitor 14, or when the driver speaks the desired menu item into the microphone 35 of the voice recognition device 30 so that the dialogue control unit 32 gives the navigation control unit 10a the same instruction. In either case the following processing is performed. When the driver inputs a destination by voice or with the operation switch group 8 while viewing the map on the LCD monitor 14, the current position of the vehicle is determined from satellite data obtained by the GPS receiver 22, costs between the current position and the destination are calculated by Dijkstra's method, and the shortest route from the current position to the destination is determined as the guidance route. The guidance route is then displayed superimposed on the road map on the LCD monitor 14, and the driver is guided along an appropriate route by means such as enlarged intersection views and voice guidance at intersections where a turn is required.
Such route calculation and guidance processing are generally well known. The voice guidance conditions and the display language (for example, Japanese or English) can be set freely. In addition, the user can register destinations and other frequently used points as desired.
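The route search relies on Dijkstra's method. A compact version using Python's standard `heapq` module is sketched below; the graph format and node names are assumptions, and link costs are taken as plain distances so that the result is the shortest route, as in the description.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra search over a road graph.

    graph: {node: [(neighbor, cost), ...]} with non-negative link costs.
    Returns (total_cost, [start, ..., goal]) or None if goal is unreachable.
    """
    queue = [(0.0, start, [start])]  # (cost so far, node, path to node)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path  # first time the goal is popped, cost is minimal
        if node in settled:
            continue  # a cheaper path to this node was already expanded
        settled.add(node)
        for nxt, link_cost in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + link_cost, nxt, path + [nxt]))
    return None
```

For example, with links A-B 2, A-C 5, B-C 1, B-D 7, and C-D 2, the route from A to D is A, B, C, D at cost 5.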

As described above, when voice is input from the microphone 35, the multimedia system of this embodiment reacts only to the driver's voice if the voice instructs a navigation-related process, and also reacts to the voices of occupants other than the driver if the voice instructs a process related to the air conditioner ECU 51, the TV tuner 52, or the audio device 53. This requires determining whether the voice input from the microphone 35 is the driver's voice, and for this purpose the driver's voice (acoustic feature values) is stored in advance. This storage processing is described with reference to the flowchart of FIG. 3. The processing is performed by the voice recognition device 30.

First, in step S10, it is determined whether the PTT switch 36 is turned on. This determination is made by the PTT switch control unit 38. If the PTT switch 36 is not on, the subsequent processing is not performed; that is, the driver's voice (acoustic feature values) is not stored.

If the PTT switch 36 is on, it is next determined in step S11 whether voice has been input from the microphone 35. Specifically, the PTT switch control unit 38 outputs a signal indicating that the PTT switch 36 is on to the dialogue control unit 32, and the dialogue control unit 32 prompts the driver, through the speaker 37 via the speech synthesis unit 33, to say a predetermined phrase. The phrase may be anything from which acoustic feature values can be obtained, such as the driver's own name. The PTT switch control unit 38 also outputs the signal indicating that the PTT switch 36 is on to the voice input unit 34, and based on this signal the voice input unit 34 determines whether voice has been input from the microphone 35. Alternatively, the PTT switch control unit 38 may make this determination, in which case it refers to the voice input unit 34. Voice input is judged to have occurred if it is detected within a predetermined time. If no voice has been input from the microphone 35, the subsequent processing is not performed.
Instead of leaving the flowchart immediately in this case, the apparatus may prompt the driver again to say the predetermined phrase, or may return to step S10 and repeat the determination of step S11 a predetermined number of times, leaving the flowchart only if there is still no voice input.

When voice has been input from the microphone 35, in step S12 the voice is stored as the driver's voice (acoustic feature values). Specifically, as described above, the dialogue control unit 32 stores the voice (acoustic feature values) input from the voice recognition unit 31, and the stored voice (acoustic feature values) becomes the driver's voice (acoustic feature values).

As described above, the driver's voice (acoustic feature values) is stored when the PTT switch 36 is turned on. This distinguishes it from voice used to instruct ordinary navigation processing and the like.
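Steps S10 to S12 of FIG. 3, including the optional re-prompting described for step S11, can be sketched as one function. The callables passed in are hypothetical stand-ins for the PTT switch control unit 38, the speech synthesis path to the speaker 37, the voice input unit 34 with its time limit, and the feature extraction of the voice recognition unit 31; the prompt text and `max_attempts` are assumptions.

```python
def enroll_driver(ptt_pressed, prompt, capture, extract_features, max_attempts=3):
    """Steps S10-S12 of FIG. 3 with the optional retry described for S11.

    ptt_pressed: callable -> bool, state of the PTT switch (S10)
    prompt: callable that asks the driver to speak through the speaker
    capture: callable -> audio, or None if nothing is heard within the time limit (S11)
    extract_features: callable -> acoustic features to store as the voiceprint (S12)
    max_attempts: assumed number of prompts before giving up
    """
    if not ptt_pressed():  # S10: PTT switch off, so nothing is stored
        return None
    for _ in range(max_attempts):
        prompt("Please say your name.")
        audio = capture()  # S11: was voice input within the time limit?
        if audio is not None:
            return extract_features(audio)  # S12: store as the driver's voiceprint
    return None  # still no voice input after repeated prompts
```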

Next, assuming that the driver's voice (acoustic feature values) has been stored, the processing for executing navigation-related and other processes by voice is described with reference to the flowchart of FIG. 4. This processing is performed by the voice recognition device 30 and the multimedia ECU 10.

First, in step S20, it is determined whether voice has been input from the microphone 35. If not, the subsequent processing is not performed. If voice has been input, in step S21 the voice recognition unit 31 recognizes the content of the input voice. Specifically, it collates the acquired voice data against the stored dictionary data and outputs the top-ranked comparison pattern determined by the collation to the dialogue control unit 32 as the recognition result. It also outputs the acoustic feature values to the dialogue control unit 32 so that it can be determined whether the voice input from the microphone 35 is the driver's voice.

Next, in step S22, it is determined whether the voice input from the microphone 35 is the driver's voice. Specifically, the voiceprint determination unit 59 in the dialogue control unit 32 collates the acoustic feature values input from the voice recognition unit 31 with the driver's acoustic feature values stored in the dialogue control unit 32. If the voice is determined to be the driver's voice, the processing proceeds to step S23.

In step S23, it is determined whether the voice input from the microphone 35 instructs a process that the driver may instruct (a driver process). In the present embodiment, every process that can be instructed by voice is treated as a driver process. That is, the driver can use voice to instruct all navigation-related processes and all processes for the air conditioner ECU 51, the TV tuner 52, and the audio device 53. Whether the input voice is for a driver process is determined from the signal received from the speech recognition unit 31. If it is, the process about to be executed is talked back in step S24. To do this, the dialogue control unit 32 announces the process through the speaker 37 via the speech synthesis unit 33, so the driver can confirm in advance that it matches the intended process. Then, in step S25, the process corresponding to the input voice is executed. The dialogue control unit 32 notifies the multimedia ECU 10 of the process to be executed, and the multimedia ECU 10 then instructs the corresponding device to carry it out based on that notification. If, on the other hand, the input voice is determined not to be for a driver process, no further processing is performed. This happens, for example, when the voice requests a process that cannot be instructed by voice. It also happens when the voice requests a process that can be instructed by voice but is not phrased as an appropriate command.
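Steps S24 and S25 form a two-stage hand-off: the dialogue control unit 32 announces the process, and the multimedia ECU 10 then routes the execution request to the target device. The sketch below models that hand-off only. The class names, the device keys, and the idea of keeping talkback prompts as logged strings are all hypothetical illustrations and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MultimediaECU:
    """Hypothetical stand-in for multimedia ECU 10: routes a notified process to a device."""
    devices: Dict[str, Callable[[str], str]]

    def execute(self, device: str, action: str) -> str:
        handler = self.devices.get(device)
        if handler is None:
            raise KeyError(f"unknown device: {device}")
        return handler(action)


@dataclass
class DialogueControl:
    """Hypothetical stand-in for dialogue control unit 32."""
    ecu: MultimediaECU
    spoken: List[str] = field(default_factory=list)  # talkback prompts, in order

    def talk_back(self, device: str, action: str) -> None:
        # Step S24: announce the process about to run (speech synthesis unit 33, speaker 37).
        self.spoken.append(f"{device}: {action}")

    def run(self, device: str, action: str) -> str:
        self.talk_back(device, action)
        # Step S25: notify the multimedia ECU, which instructs the target device.
        return self.ecu.execute(device, action)


# Usage with a fake audio device.
ecu = MultimediaECU({"audio": lambda a: "audio: " + a})
dc = DialogueControl(ecu)
print(dc.run("audio", "volume up"))  # audio: volume up
print(dc.spoken)                     # ['audio: volume up']
```

The patent describes the talkback only as a way for the driver to confirm the process in advance. It does not state that execution waits for an explicit confirmation, so the sketch announces the process and then executes it immediately.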

If it is determined in step S22 that the voice input from the microphone 35 is not the driver's voice, the process proceeds to step S26. In step S26, it is determined whether the input voice instructs a general process, that is, a process that occupants other than the driver may also instruct. A general process here means any process other than navigation-related processing. Specifically, it is a voice-executable process among those handled by the multimedia control unit 10b in the multimedia ECU 10 (the air conditioner ECU 51, the TV tuner 52, and the audio device 53). Examples include changing the air conditioner's set temperature, selecting a TV or radio station, and adjusting the volume. If the input voice is determined to be for a general process, the process to be executed is talked back in step S24 as described above and then executed in step S25. Otherwise, no further processing is performed. In addition to the cases described for a negative determination in step S23, this includes a voice instructing a driver-only process, that is, a navigation-related process.
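Taken together, steps S22, S23, and S26 amount to a small authorization table. The driver may instruct every voice-executable process, while other occupants may instruct only general processes. The sketch below encodes that table. The two categories come from the text. The entries in `COMMANDS` are a hypothetical vocabulary, because the patent names only the kinds of process and gives no actual command phrases.

```python
from enum import Enum
from typing import Dict


class Category(Enum):
    NAVIGATION = "navigation"  # driver-only processes
    GENERAL = "general"        # air conditioner, TV tuner, audio: any occupant may instruct


# Hypothetical command vocabulary; the patent names only the categories.
COMMANDS: Dict[str, Category] = {
    "set destination": Category.NAVIGATION,
    "show route": Category.NAVIGATION,
    "raise temperature": Category.GENERAL,
    "next station": Category.GENERAL,
    "volume up": Category.GENERAL,
}


def authorize(command: str, speaker_is_driver: bool) -> bool:
    """Return True if the recognized command should be talked back (S24) and executed (S25)."""
    category = COMMANDS.get(command)
    if category is None:
        return False  # not voice-executable, or not phrased as an appropriate command
    if speaker_is_driver:
        return True  # step S23: the driver may instruct every voice-executable process
    return category is Category.GENERAL  # step S26: others get general processes only


# Usage
print(authorize("set destination", speaker_is_driver=True))   # True
print(authorize("set destination", speaker_is_driver=False))  # False
print(authorize("volume up", speaker_is_driver=False))        # True
```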

As described above, the multimedia system of this embodiment lets occupants instruct a variety of processes by voice. Navigation-related processes, however, respond only to the driver's voice, because the driver's voice (acoustic feature values) is stored in advance. This lets the driver travel with peace of mind. Moreover, unlike Patent Document 2, the system needs no hardware such as multiple microphones or a CCD camera to determine whether a voice is the driver's, so cost does not increase. Furthermore, because the driver's voice (acoustic feature values) is stored using the PTT switch, the driver can store it at any desired time. This is particularly effective when drivers change after driving has started.

The vehicle information processing apparatus according to the present invention is not limited to the above embodiment and may be modified in various ways without departing from its spirit.
(Modification)

In the above embodiment, the voice input while the PTT switch is on is stored as the driver's voice (acoustic feature values) (see FIG. 3). As explained above, this distinguishes the enrollment voice from voice that instructs ordinary navigation processing and similar functions. The invention is not limited to this arrangement, however. Instead of using the PTT switch, the voice input from the microphone 35 at the start of driving may be stored as the driver's voice (acoustic feature values). For example, when the vehicle's engine is started, the driver is prompted to say a predetermined phrase such as his or her name, and the resulting voice input is stored as the driver's voice (acoustic feature values). Because the driver's voice is then stored at the start of every drive, this approach is particularly effective for a vehicle driven by several people.
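The patent does not say how the prompted utterance becomes a stored voiceprint. The sketch below averages the per-frame acoustic feature vectors of the reply into one template vector. That choice is an assumption made for illustration only. The callables `prompt` and `capture_frames` are hypothetical hooks for the speaker and the microphone/feature extractor.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one frame's acoustic feature vector (assumed fixed length)


def build_voiceprint(frames: List[Frame]) -> List[float]:
    """Average per-frame feature vectors into a single template (hypothetical scheme)."""
    if not frames:
        raise ValueError("no speech captured")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("inconsistent feature dimensions")
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def enroll_on_engine_start(prompt: Callable[[str], None],
                           capture_frames: Callable[[], List[Frame]]) -> List[float]:
    """Prompt the driver at engine start, capture the reply, and return it as the voiceprint."""
    prompt("Please say your name.")  # a predetermined phrase, as in the modification
    return build_voiceprint(capture_frames())


# Usage with fake I/O.
print(build_voiceprint([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```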

Some drivers may find it burdensome to have their voice (acoustic feature values) stored at the start of every drive. Therefore, when the driver's voice (acoustic feature values) is already stored, the storing process at subsequent driving starts may be made cancelable.
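The cancel option reduces to a single decision at engine start. If no voiceprint exists, enrollment always runs. If one exists, enrollment runs unless the driver has chosen to skip it. This is a minimal sketch. It assumes that the preference is stored as a boolean flag, here called `skip_when_stored`, which is a hypothetical name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceprintStore:
    """Enrolled driver voiceprint plus a hypothetical 'skip re-enrollment' preference."""
    voiceprint: Optional[List[float]] = None
    skip_when_stored: bool = False

    def should_enroll_at_start(self) -> bool:
        # Always enroll when nothing is stored; otherwise honor the skip preference.
        if self.voiceprint is None:
            return True
        return not self.skip_when_stored


# Usage
print(VoiceprintStore().should_enroll_at_start())             # True
print(VoiceprintStore([0.5, 0.1], True).should_enroll_at_start())  # False
```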

FIG. 1 is a block diagram showing the schematic configuration of the multimedia system of the embodiment.
FIG. 2 is a block diagram showing the schematic configuration of the speech recognition apparatus.
FIG. 3 is a flowchart showing the process of storing the driver's voice (acoustic feature values).
FIG. 4 is a flowchart showing the process of executing navigation-related and other processes by voice.

Explanation of symbols

4 Position detector
6 Map data input device
8 Operation switch group
10 Multimedia ECU
15 Display device
30 Speech recognition device
51 Air conditioner ECU
52 TV tuner
53 Audio device

Claims

1. A vehicle information processing apparatus comprising:
voice input means for inputting voice;
first determination means for determining whether the voice input by the voice input means instructs execution of a voice-executable process, the voice-executable processes including processes whose execution is to be instructed by the driver and processes whose execution can also be instructed by occupants other than the driver;
voiceprint storage means for storing the driver's voiceprint;
second determination means for determining, based on the voiceprint stored in the voiceprint storage means, whether the voice input by the voice input means is the driver's voice; and
execution means for executing the instructed process when the first determination means determines that the voice input by the voice input means instructs execution of a voice-executable process, wherein, for a process whose execution is to be instructed by the driver, the execution means executes the process only in response to the driver's voice, based on the determination result of the second determination means.

2. The vehicle information processing apparatus according to claim 1, wherein the voiceprint storage means stores the voice input by the voice input means at the start of driving as the driver's voiceprint.

3. The vehicle information processing apparatus according to claim 2, further comprising cancellation means for stopping the storing of the driver's voiceprint at the start of subsequent driving when the driver's voiceprint is already stored in the voiceprint storage means.

4. The vehicle information processing apparatus according to claim 1, wherein the voiceprint storage means includes a switch for storing the voice input by the voice input means as the driver's voiceprint.