JPH11338490A

JPH11338490A - Speech recognition unit for vehicle

Info

Publication number: JPH11338490A
Application number: JP10149103A
Authority: JP
Inventors: Fumio Umeda; 文雄梅田; Takao Yamamoto; 敬央山本
Original assignee: Tokai Rika Co Ltd
Current assignee: Tokai Rika Co Ltd
Priority date: 1998-05-29
Filing date: 1998-05-29
Publication date: 1999-12-10

Abstract

PROBLEM TO BE SOLVED: To eliminate the switch operation for starting speech recognition processing, and to improve the speech recognition rate even in a noisy cabin. SOLUTION: An image DSP 12 receives an image signal from a CCD camera 3 and calculates an actual lip pattern P. Based on the pattern P, the image DSP 12 outputs a start signal P1, corresponding to a start point t1, to a speech processing device 15. In response to the signal P1, a voice DSP 16 starts speech recognition processing of the sound signal from a microphone 6 from the start point t1.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a vehicle voice recognition device, and more particularly to the extraction of the start point, syllable ends, and end point in voice recognition processing.

[0002]

[Description of the Related Art] In recent years, with the spread of car navigation systems, vehicle voice recognition devices have come to be used for setting destinations. Vehicle voice recognition devices are also used for entering telephone numbers in hands-free telephones and the like. A further improvement in the voice recognition rate is therefore required.

[0003] In general, it is quite difficult to extract the start of the driver's voice from the noise in the cabin of a traveling vehicle so that voice recognition processing can be started at any time. Conventional vehicle voice recognition devices have therefore been provided with an operation member for marking the start of the voice, and this operation member has been operated to start voice recognition processing.

[0004]

[Problems to be Solved by the Invention] However, these voice recognition devices require a switch operation to start voice recognition processing. Moreover, even after the operation member is operated and voice recognition processing starts, it frequently happens that the voice is not input immediately, which makes the start of the voice difficult to detect. This has been a major cause of reduced voice recognition rates.

[0005] Furthermore, considerable noise is generated in the cabin of a traveling vehicle, so the voice recognition rate must be improved further even amid that noise. In addition, to realize a true voice recognition device, the task of operating an operation member to start voice recognition processing must also be eliminated.

[0006] An object of the present invention is to provide a vehicle voice recognition device that eliminates the switch operation for starting voice recognition processing and that can improve the voice recognition rate even in a noisy cabin.

[0007]

[Means for Solving the Problems] To solve the above problems, the invention according to claim 1 is a vehicle voice recognition device comprising a microphone provided in a vehicle and voice recognition means for recognizing the driver's voice detected by the microphone, the device further comprising: imaging means provided in the vehicle for imaging a face including the driver's lips; and image recognition means that receives an image signal from the imaging means, cuts out the image region of the driver's lips from the image signal, calculates the lip pattern at each moment, detects the start of the driver's speech based on the lip pattern, and outputs a start signal that causes the voice recognition means to take in the driver's voice from the microphone and start the voice recognition processing operation.

[0008] The invention according to claim 2 is the vehicle voice recognition device according to claim 1, wherein the image recognition means detects syllable ends from the start to the end of the driver's speech based on the lip pattern and outputs a syllable end signal, and the voice recognition means divides the driver's voice received after responding to the start signal into syllables based on the syllable end signal and performs voice recognition processing.

[0009] According to the invention of claim 1, the image recognition means calculates a lip pattern from the face imaged by the imaging means, detects the start of the driver's speech based on the lip pattern, and outputs a start signal to the voice recognition means. In response to this start signal, the voice recognition means takes in the driver's voice from the microphone and starts the voice recognition processing operation.

[0010] Accordingly, the start of the voice (the start of speech) can be identified reliably and accurately without operating an operation member for specifying it, and voice recognition can be improved.

[0011] According to the invention of claim 2, the voice recognition means divides the driver's voice into syllables based on the syllable end signal from the image recognition means and performs voice recognition processing on the short voice segment of each syllable, which makes the voice recognition processing easier. As a result, the voice recognition rate can be improved.

[0012]

[Embodiments of the Invention] An embodiment of a vehicle voice recognition device embodying the present invention is described below with reference to FIGS. 1 to 3.

[0013] As shown in FIG. 1, a CCD camera 3 serving as imaging means is mounted on the front pillar (A-pillar) 2 on the driver's seat side of the cabin 1, focused on the driver's face.

[0014] A microphone 6 is provided on the pad 5 of the steering wheel 4. The directivity of the microphone 6 is aimed at the driver's mouth. FIG. 2 shows the electrical configuration of the vehicle voice recognition device installed in the vehicle.

[0015] The CCD camera 3 outputs an image signal obtained by differentiating the video image signal of the face including the driver's lips. This image signal is input to an image processing device 11. The image processing device 11 comprises an image digital signal processor (image DSP) 12 serving as image recognition means, a read/write memory (image RAM) 13 for temporarily storing data, and a read-only memory (image ROM) 14. The image ROM 14 stores a control program that causes the image DSP 12 to perform high-speed digital computation. This control program is transferred to the image RAM 13 in the image DSP 12, and the desired image recognition processing is performed in the image DSP 12.

[0016] The image DSP 12 receives the image signal from the CCD camera 3 and calculates the contour of the driver's face from it. The image DSP 12 then calculates the center of the rectangle circumscribing the face contour and takes it as the nose position. Finally, the image DSP 12 adds a preset distance to the nose position and takes the result as the lip center position.
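The geometry in paragraph [0016] can be illustrated with a minimal Python sketch. It assumes the circumscribing rectangle is axis-aligned, that image y coordinates grow downward, and that the preset distance is a vertical offset; the function names and the `nose_to_lip` parameter are hypothetical and do not appear in the patent.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def nose_position(contour: Iterable[Point]) -> Point:
    """Center of the axis-aligned rectangle circumscribing the face contour."""
    xs, ys = zip(*contour)
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


def lip_center(contour: Iterable[Point], nose_to_lip: float) -> Point:
    """Nose position shifted by a preset distance.

    Assumption: the offset is applied downward along the image y axis.
    """
    nx, ny = nose_position(contour)
    return (nx, ny + nose_to_lip)


# Example with a made-up contour (pixel coordinates).
face = [(10, 20), (50, 20), (50, 100), (10, 100), (30, 110)]
print(nose_position(face))   # center of the bounding rectangle
print(lip_center(face, 25))  # lip center, 25 pixels below the nose
```

Using the bounding rectangle rather than a detected nose keeps the computation cheap, at the cost of assuming a roughly frontal, upright face.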

[0017] The image DSP 12 takes in the image signal within a preset region centered on the lip center position and calculates a lip pattern (actual lip pattern) P. The image DSP 12 calculates the actual lip pattern P at preset time intervals and compares each newly calculated actual lip pattern P with the one calculated immediately before it. When the successive actual lip patterns P have shown no difference for a preset time and the newly calculated actual lip pattern P then differs from the one calculated immediately before it, the image DSP 12 outputs a start signal P1, corresponding to the start point t1 of voice recognition processing, to a voice processing device 15.

[0018] Also, when the newly calculated actual lip pattern P differs from the one calculated immediately before it, the image DSP 12 outputs a syllable end signal P2, corresponding to a syllable end t2 of voice recognition processing, to the voice processing device 15.

[0019] Furthermore, when the calculated actual lip pattern P has shown no difference for a preset time, the image DSP 12 outputs an end signal P3, corresponding to the end point t3 of voice recognition processing, to the voice processing device 15.
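The three conditions in paragraphs [0017] to [0019] can be read as a small state machine over successive lip patterns. The Python sketch below is one possible literal reading, not the patented implementation: the difference measure (mean absolute difference against a tolerance `tol`) and the "preset time" expressed as `still_frames` consecutive unchanged frames are both assumptions, since the text gives no concrete values. Under this reading every changed frame during speech yields a P2 event.

```python
from typing import List, Sequence, Tuple


def pattern_differs(a: Sequence[float], b: Sequence[float], tol: float) -> bool:
    """Assumed difference test: mean absolute difference above tol."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a) > tol


def lip_events(patterns: Sequence[Sequence[float]],
               still_frames: int = 3,
               tol: float = 0.1) -> List[Tuple[int, str]]:
    """Return (frame index, signal name) pairs for P1, P2 and P3."""
    events: List[Tuple[int, str]] = []
    speaking = False
    still_run = 0  # consecutive frames with no change
    prev = None
    for i, cur in enumerate(patterns):
        if prev is not None:
            if pattern_differs(prev, cur, tol):
                if speaking:
                    events.append((i, "P2"))   # change during speech
                elif still_run >= still_frames:
                    events.append((i, "P1"))   # stillness, then a change
                    speaking = True
                still_run = 0
            else:
                still_run += 1
                if speaking and still_run >= still_frames:
                    events.append((i, "P3"))   # stillness during speech
                    speaking = False
        prev = cur
    return events


# Four still frames, one mouth opening, one closing, then stillness.
frames = [[0.0]] * 4 + [[1.0]] + [[0.0]] * 4
print(lip_events(frames))
```

A practical detector would likely need debouncing so that P2 marks syllable boundaries rather than every frame of motion; the patent does not describe how this is done.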

[0020] The microphone 6 outputs the driver's voice to the voice processing device 15. The voice processing device 15 comprises a voice DSP 16, a voice RAM 17, and a voice ROM 18. The voice ROM 18 stores a control program that causes the voice DSP 16 to perform high-speed digital computation. This control program is transferred to the voice RAM 17 in the voice DSP 16, and the desired voice recognition processing is performed in the voice DSP 16. The voice ROM 18 also stores a plurality of preset standard voice command patterns.

[0021] When the vehicle is started with the ignition switch, the voice DSP 16 loads the plurality of standard voice command patterns into the voice RAM 17 in the voice DSP 16.

[0022] As shown in FIG. 3, the voice DSP 16 remains in a standby state, in which it does not start voice recognition processing, until the start signal P1 is input. When the start signal P1 from the image DSP 12 is input, the voice DSP 16 judges that the driver has started speaking, that is, that the start point t1 of the voice to be recognized is arriving from the microphone 6. In response to the start signal P1, the voice DSP 16 starts voice recognition processing of the voice signal from the microphone 6 from the start point t1.

[0023] After starting voice recognition processing based on the start signal P1, the voice DSP 16, upon receiving a syllable end signal P2 from the image DSP 12, judges that one syllable of the driver's speech has ended (syllable end t2) and that the next syllable is beginning. In response to each syllable end signal P2, the voice DSP 16 divides the voice signal from the microphone 6 into sections each corresponding to one syllable, performs voice recognition processing on each section, and builds up the actual voice pattern of what the driver has said. In other words, the driver's voice is divided into syllables based on the syllable end signal P2, and voice recognition processing is performed on the short voice segment of each syllable. When the calculated actual voice pattern matches one of the preset standard voice command patterns, the voice DSP 16 outputs a control signal corresponding to the matched standard voice command pattern.

[0024] Furthermore, when the end signal P3 is input from the image DSP 12, the voice DSP 16 judges that the driver has finished speaking, that is, that the end point t3 of the voice to be recognized is arriving from the microphone 6. In response to the end signal P3, the voice DSP 16 takes in the voice signal from the microphone 6 up to the end point t3 and then enters the standby state, in which it neither takes in the voice signal from the microphone 6 nor performs voice recognition processing. The voice DSP 16 then waits for the next start signal P1.
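The standby behaviour of the voice DSP 16 described in paragraphs [0022] and [0024] amounts to gating microphone input between P1 and P3. A minimal Python sketch of that gating, with a hypothetical `VoiceGate` class and signals passed as plain strings, could look like this:

```python
from typing import List


class VoiceGate:
    """Accepts microphone samples only between a P1 and the following P3."""

    def __init__(self) -> None:
        self.listening = False            # False means standby
        self.buffer: List[float] = []     # samples of the current utterance
        self.utterances: List[List[float]] = []

    def on_signal(self, name: str) -> None:
        if name == "P1" and not self.listening:
            self.listening = True         # leave standby at start point t1
            self.buffer = []
        elif name == "P3" and self.listening:
            self.listening = False        # return to standby at end point t3
            self.utterances.append(self.buffer)
            self.buffer = []

    def on_sample(self, sample: float) -> None:
        if self.listening:                # samples in standby are discarded
            self.buffer.append(sample)


gate = VoiceGate()
gate.on_sample(1.0)      # ignored: still in standby
gate.on_signal("P1")
gate.on_sample(2.0)
gate.on_sample(3.0)
gate.on_signal("P3")
gate.on_sample(4.0)      # ignored again
print(gate.utterances)
```

Sound that arrives while the gate is in standby never reaches the recognizer, which is the property the embodiment relies on to avoid reacting to other sounds in the cabin.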

[0025] In this embodiment, the image DSP 12 recognizes the start of speech, each syllable end, and the end of speech, and outputs the start signal P1, syllable end signal P2, and end signal P3 to the voice DSP 16 slightly earlier than the corresponding points in the voice signal (the start of speech, syllable ends, and end of speech) reach the voice DSP 16 from the microphone 6. The voice DSP 16 can therefore take in the voice signal of the driver's voice and perform voice recognition processing without delay.

[0026] If the start signal P1, syllable end signal P2, and end signal P3 reach the voice DSP 16 so much earlier than the corresponding voice signal that voice recognition processing is impaired, the voice DSP 16 may begin the corresponding voice recognition operation a fixed time after receiving the start signal P1, syllable end signal P2, or end signal P3.

[0027] The vehicle voice recognition device of this embodiment provides the following features.

(1) In this embodiment, the image DSP 12 calculates a lip pattern from the face imaged by the CCD camera 3, detects the start of the driver's speech based on the lip pattern, and outputs a start signal P1 indicating that start to the voice DSP 16. In response to this start signal P1, the voice DSP 16 takes in the driver's voice from the microphone 6 and starts the voice recognition processing operation.

[0028] Accordingly, the start of the voice (the start of speech) can be identified reliably and accurately without operating an operation member for specifying it, and the voice recognition rate can be improved.

[0029] (2) In this embodiment, the image DSP 12 calculates a lip pattern from the face imaged by the CCD camera 3, detects each syllable end t2 in the driver's speech based on the lip pattern, and outputs a syllable end signal P2 indicating that syllable end t2 to the voice DSP 16. In response to this syllable end signal P2, the voice DSP 16 divides the driver's voice from the microphone 6 into syllables and performs voice recognition processing on the short voice segment of each syllable. The voice recognition rate can therefore be improved further.

[0030] (3) In this embodiment, the image DSP 12 calculates a lip pattern from the face imaged by the CCD camera 3, detects the end of the driver's speech, and outputs an end signal P3 indicating that end to the voice DSP 16. In response to this end signal P3, the voice DSP 16 performs voice recognition processing on the voice signal from the start point t1 to the end point t3 of the driver's voice. Because the voice DSP 16 recognizes only the voice between the start and end of the driver's speech, the voice recognition rate can be improved further.

[0031] (4) In this embodiment, the voice DSP 16 performs voice recognition processing from the input of the start signal P1 until the input of the end signal P3, and is otherwise in the standby state, in which it performs no voice recognition processing. The voice DSP 16 is therefore never triggered into voice recognition processing by a third party's voice, noise from outside the vehicle, or the like. As a result, erroneous recognition caused by third-party voices, noise from outside the vehicle, and the like can be prevented.

[0032] The embodiment of the present invention may be modified as follows.

○ The image DSP 12 of the image processing device 11 may be replaced with an image central processing unit (CPU).

[0033] ○ The voice DSP 16 of the voice processing device 15 may be replaced with a voice central processing unit (CPU).

○ An image pickup tube camera may be used in place of the CCD camera 3.

[0034] ○ A plurality of CCD cameras 3 may be used in place of the single CCD camera 3. In this case, the one CCD camera 3 that is capturing the driver's lips accurately is selected from among the plurality of CCD cameras 3, and the lip pattern is calculated using the image signal of the selected CCD camera 3.

[0035] ○ A plurality of microphones may be used in place of the single microphone 6. In this case, the sound from each microphone is input to the voice DSP 16 and combined.

[0036] ○ Although the CCD camera 3 is provided on the A-pillar 2, it may instead be provided on the inner mirror support 7.

○ Instead of inputting the image signal obtained by differentiating the video image signal of the CCD camera 3 to the image processing device 11, the video image signal itself may be input to the image processing device 11. In this case, the input video image signal is differentiated within the image processing device 11.
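For the last modification, differentiating the video image signal inside the image processing device, a discrete analogue can be sketched in Python. The patent does not say how the differentiation is carried out; the sketch assumes a first-order finite difference along each scan line of a digitized frame, and the function names are hypothetical.

```python
from typing import List, Sequence


def differentiate_scanline(line: Sequence[float]) -> List[float]:
    """First-order finite difference between neighbouring pixels."""
    return [b - a for a, b in zip(line, line[1:])]


def differentiate_image(image: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply the scan-line difference to every row of the frame."""
    return [differentiate_scanline(row) for row in image]


# Non-zero values mark brightness edges, such as a contour boundary.
frame = [[0, 0, 5, 5],
         [1, 3, 3, 0]]
print(differentiate_image(frame))
```

Differentiation of this kind emphasizes edges, which fits the later step of extracting the face contour from the image signal.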

[0037] Configurations according to each of the above modifications also provide substantially the same features as the embodiment described above. Next, technical ideas that can be understood from the embodiment and modifications, other than the inventions described in the claims, are set out below together with their effects.

[0038] (1) The vehicle voice recognition device according to claim 1, wherein the image recognition means (12) detects the end of the driver's speech based on the lip pattern and outputs an end signal (P3), and the voice recognition means (16) receives the end signal (P3) and performs voice recognition processing on the driver's voice from its response to the start signal (P1) until the end signal (P3).

[0039] Accordingly, the invention described in (1) can improve the voice recognition rate.

(2) The vehicle voice recognition device according to claim 1, wherein the voice recognition means (16) is in a standby state, in which it performs no voice recognition processing, except during the period from the start signal (P1) to the end signal (P3) of the image recognition means (12).

[0040] Accordingly, the invention described in (2) can prevent erroneous voice recognition.

(3) A voice recognition method for a vehicle voice recognition device comprising a microphone (6) provided in a vehicle and voice recognition means (16) for recognizing the driver's voice detected by the microphone (6), the method comprising: imaging a face including the driver's lips; cutting out the image region of the driver's lips from the resulting image signal; calculating the lip pattern at each moment; detecting the start of the driver's speech based on the lip pattern; and, based on that start time, causing the voice recognition means (16) to take in the driver's voice from the microphone (6) and perform the voice recognition processing operation.

[0041]

[Effects of the Invention] As described in detail above, according to the invention of claim 1, the voice recognition means can reliably capture the voice from the microphone and perform voice recognition processing without an operation member being operated to specify the start of the voice, so the voice recognition rate can be improved.

[0042] According to the invention of claim 2, voice recognition processing becomes easier. As a result, the voice recognition rate can be improved.

[Brief description of the drawings]

FIG. 1 is a schematic view of the cabin showing the arrangement of the CCD camera and microphone in this embodiment.

FIG. 2 is an electrical configuration diagram of the vehicle voice recognition device in this embodiment.

FIG. 3 is a timing chart of the voice waveform in this embodiment.

[Explanation of symbols]

P1: start signal; P2: syllable end signal; t2: syllable end; 3: CCD camera as imaging means; 6: microphone; 12: image DSP as image recognition means; 16: voice DSP as voice recognition means.

Continuation of the front page (51) Int. Cl.⁶, identification code, FI: G06T 1/00; G06F 15/62 380

Claims

[Claims]

1. A vehicle voice recognition device comprising a microphone (6) provided in a vehicle and voice recognition means (16) for recognizing the driver's voice detected by the microphone (6), the device further comprising: imaging means (3) provided in the vehicle for imaging a face including the driver's lips; and image recognition means (12) that receives an image signal from the imaging means (3), cuts out the image region of the driver's lips from the image signal, calculates the lip pattern at each moment, detects the start of the driver's speech based on the lip pattern, and outputs a start signal (P1) that causes the voice recognition means (16) to take in the driver's voice from the microphone (6) and start the voice recognition processing operation.

2. The vehicle voice recognition device according to claim 1, wherein the image recognition means detects syllable ends (t2) from the start to the end of the driver's speech based on the lip pattern and outputs a syllable end signal (P2), and the voice recognition means (16) divides the driver's voice received after responding to the start signal (P1) into syllables based on the syllable end signal (P2) and performs voice recognition processing.