JP2000322074A

JP2000322074A - Voice input section determination device, aural data extraction device, speech recognition device, vehicle navigation device and input microphone

Info

Publication number: JP2000322074A
Application number: JP11132822A
Authority: JP
Inventors: Kunio Yokoi; 邦雄横井; Ichiro Akahori; 一郎赤堀; Hiroshi Ono; 宏大野; Norihide Kitaoka; 教英北岡; Kazuhiro Higuchi; 和広樋口
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 1999-05-13
Filing date: 1999-05-13
Publication date: 2000-11-24

Abstract

PROBLEM TO BE SOLVED: To effectively remove a noise component from an input signal even in a noisy environment, to determine the voice input section reliably, and to improve the recognition rate for a speaker's speech. SOLUTION: A voice microphone 19 and a noise microphone 20 are arranged in the same microphone assembly so that their directivities differ by 180 degrees. A frame dividing part 29 generates voice signal data and noise signal data from the input signals of the voice microphone 19 and the noise microphone 20, and a determination part 39 of a voice section determination device 30 determines the voice input section in the voice signal data based on the difference in short-time power between the voice signal data and the noise signal data.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a voice input section determination device, a voice data extraction device, and a voice recognition device that improve the recognition rate of a voice signal by removing the noise signal component from an input signal in which the voice signal component to be recognized and a noise signal component are mixed, to a vehicle navigation device including the voice recognition device, and to an input microphone used in any of these devices.

[0002]

2. Description of the Related Art Voice recognition devices are used in, for example, car navigation devices, where they recognize speech uttered by a vehicle occupant, for example to set a destination, and perform the setting accordingly. In this case, the voice recognition device compares the input speech with a plurality of comparison target patterns stored in advance and takes the pattern with the highest degree of match as the recognition result.

[0003] For example, one conventional technique for determining the section in which speech is present in the signal input to an input microphone is, as shown in FIG. 16, to determine a section in which the power of the input signal exceeds a preset threshold to be the voice input section. However, inside a vehicle, which is the actual environment in which a car navigation device is used, noise components are unavoidably included when the speech uttered by the speaker is input, which makes the voice input section difficult to determine.
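The conventional threshold technique described above can be sketched as follows. This is a minimal illustration only: the frame length, the threshold value, and the function names `frame_power` and `sections_above_threshold` are assumptions made for the example and do not come from the publication.

```python
def frame_power(samples, frame_len):
    """Split samples into non-overlapping frames and return the mean power of each."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


def sections_above_threshold(powers, threshold):
    """Return (first_frame, last_frame) index pairs where power exceeds the threshold."""
    sections = []
    start = None
    for i, p in enumerate(powers):
        if p > threshold and start is None:
            start = i  # a section begins
        elif p <= threshold and start is not None:
            sections.append((start, i - 1))  # the section ended on the previous frame
            start = None
    if start is not None:
        sections.append((start, len(powers) - 1))
    return sections


if __name__ == "__main__":
    quiet = [0.01, -0.01] * 4   # hypothetical low-level background
    loud = [0.5, -0.5] * 4      # hypothetical speech-level signal
    signal = quiet + loud + loud + quiet
    print(sections_above_threshold(frame_power(signal, frame_len=8), threshold=0.01))
```

With loud background noise, the noise-only frames can also exceed the threshold, which is the weakness the paragraph points out.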

[0004] One method of removing the noise component from the input signal after the voice input section has been determined is the spectral subtraction method (for example, Steven F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. ASSP-27, No. 2, April 1979, pp. 113-120). The spectral subtraction method divides the input voice data into short frames and, for each frame, subtracts a noise spectrum estimated in advance from the spectrum of the input speech on which noise is superimposed, thereby estimating the original speech spectrum.
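The per-frame subtraction described above might look like the following simplified sketch. It is not a reproduction of the cited paper's algorithm: it subtracts magnitudes, clamps negative results to zero, and reuses the phase of the noisy frame, all of which are assumptions made for illustration. The naive `dft` and `idft` helpers are included only to keep the example self-contained.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtract(noisy_frame, noise_magnitude):
    """Subtract an estimated noise magnitude spectrum from one frame.

    Negative magnitudes are clamped to zero and the noisy phase is reused
    (both are simplifying assumptions of this sketch).
    """
    cleaned = []
    for value, noise_mag in zip(dft(noisy_frame), noise_magnitude):
        magnitude = max(abs(value) - noise_mag, 0.0)
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))
    return idft(cleaned)
```

Because `noise_magnitude` is estimated before the speech starts, any change in the noise during the utterance is not tracked, which is the limitation the next paragraph raises.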

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION However, because the noise is estimated in advance by referring to the input state immediately before the speech is input, it does not match the noise actually occurring during the section in which the speech to be recognized is input.

[0006] One approach, disclosed in Japanese Patent Application Laid-Open No. 4-245300, provides a voice input microphone and a separate noise input microphone, estimates the noise component from the data obtained from both, and subtracts it in real time. Even with this approach, however, if the two microphones are placed too close together, speech leaks into the noise input microphone. Conversely, if the two microphones are placed too far apart, the correlation between the voice signal and the noise signal is lost and the noise component can no longer be removed accurately.

[0007] The present invention has been made in view of the above circumstances. Its object is to provide a voice input section determination device that can effectively remove the noise component from an input signal and determine the voice input section well even in a noisy environment, a voice recognition device that can improve the recognition rate for a speaker's speech, a vehicle navigation device including that voice recognition device, and an input microphone used in any of these devices.

[0008]

MEANS FOR SOLVING THE PROBLEMS According to the voice input section determination device of claim 1, the voice signal data generating means generates voice signal data based on the input signal of a voice microphone whose directivity is set toward the speaker, and the noise data generating means generates noise signal data based on the input signal of a noise microphone whose directivity is set to differ from that of the voice microphone. The voice input section determining means then determines the voice input section in the voice signal data based on the difference between the voice signal data and the noise signal data.

[0009] That is, when the speaker speaks, the speaker's voice is input to the voice microphone together with noise, whereas almost only noise is input to the noise microphone, whose directivity differs. Taking the difference between the voice signal data and the noise signal data therefore removes the noise component contained in the voice signal data and leaves the speaker's voice component. The section of the voice signal data in which the speaker's voice is input can thus be determined reliably from that remaining voice component.

[0010] According to the voice input section determination device of claim 2 or 3, the directivity of the noise microphone is set to differ from that of the voice microphone by approximately 180 degrees (claim 2) or approximately 90 degrees (claim 3). This keeps interference between the voice microphone and the noise microphone as small as possible while preserving a certain degree of correlation between the signals obtained from the two microphones, so the accuracy of the voice input section determination can be improved.

[0011] According to the voice input section determination device of claim 4, before the voice input section is actually determined, the difference amount detecting means detects in advance a difference amount based on the noise signals input to the voice microphone and the noise microphone, and the directivity control means controls a directivity changing mechanism to change the relative directivity of the voice microphone and the noise microphone so that the difference amount is maximized.

[0012] Consider a simple addition and subtraction model in which S(f) is the voice signal component input to the voice microphone, N1(f) is the noise signal component input to the voice microphone, and N2(f) is the noise signal component input to the noise microphone. The difference amount D(f) used to determine the voice input section is then expressed as
D(f) = (S(f) + N1(f)) - α(f)·N2(f)
where f is the frequency parameter and α(f) is a coefficient adjusted so that D(f) = 0 when no voice signal component S(f) is input to the voice microphone.
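As a rough illustration of this model: setting S(f) = 0 and D(f) = 0 in the expression above gives α(f) = N1(f) / N2(f). The sketch below assumes that α(f) is obtained from spectra measured during a silent calibration interval. The publication does not specify how the adjustment is carried out, so this procedure and the function names `estimate_alpha` and `difference_amount` are hypothetical.

```python
def estimate_alpha(n1_noise_only, n2_noise_only, eps=1e-12):
    """Per-frequency coefficient alpha(f) chosen so that D(f) = 0 without voice.

    Both arguments are spectra (one value per frequency bin) measured while
    the speaker is silent, so the voice microphone sees only N1(f).
    The eps guard against division by zero is an assumption of this sketch.
    """
    return [n1 / max(n2, eps) for n1, n2 in zip(n1_noise_only, n2_noise_only)]


def difference_amount(voice_mic, noise_mic, alpha):
    """D(f) = (S(f) + N1(f)) - alpha(f) * N2(f), evaluated bin by bin."""
    return [x - a * n2 for x, a, n2 in zip(voice_mic, alpha, noise_mic)]


if __name__ == "__main__":
    n1 = [1.0, 2.0, 0.5]   # hypothetical noise spectrum at the voice microphone
    n2 = [2.0, 4.0, 2.0]   # hypothetical noise spectrum at the noise microphone
    alpha = estimate_alpha(n1, n2)
    speech = [3.0, 0.0, 1.0]
    voice_mic = [s + n for s, n in zip(speech, n1)]
    print(difference_amount(voice_mic, n2, alpha))
```

Under the idealized model, the printed difference equals the speech spectrum because the scaled noise-microphone signal cancels N1(f) exactly. Real signals would only approximate this.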

[0013] Accordingly, for the signal input to the noise microphone, it is preferable to pick up as little of S(f) as possible and to obtain N2(f), which is similar to N1(f), at a higher level than N1(f), so that information about N2(f) is obtained with high accuracy. That is, if the difference amount Dn(f) of the noise signals is defined as
Dn(f) = N2(f) - N1(f)
the state in which Dn(f) is largest is desirable. As a result, when the voice input section is determined, N1(f), which is input to the voice microphone together with S(f), can be suppressed effectively, the difference amount becomes D(f) = S(f), the voice signal S(f) can be extracted well, and the voice input section can be determined with high accuracy.

[0014] According to the voice input section determination device of claim 5, the voice input section determining means determines the voice input section based on the input signals of whichever pair, among a plurality of pairs of voice and noise microphones, gives the largest difference amount detected by the difference amount detecting means. As with claim 4, the voice input section can therefore be determined with high accuracy from the input signals of the voice and noise microphone pair for which the difference amount of the noise signals is largest in the actual use environment.
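Selecting the microphone pair with the largest noise difference could be sketched as below. The publication defines Dn(f) = N2(f) - N1(f) per frequency. Collapsing it to a single score by summing over frequency bins is an assumption made here, as are the function names.

```python
def noise_difference(n1, n2):
    """Sum of Dn(f) = N2(f) - N1(f) over all frequency bins (scoring rule assumed)."""
    return sum(b - a for a, b in zip(n1, n2))


def select_pair(pairs):
    """Return the index of the microphone pair with the largest noise difference.

    `pairs` is a list of (n1_spectrum, n2_spectrum) tuples measured while
    no speech is present.
    """
    return max(range(len(pairs)), key=lambda i: noise_difference(*pairs[i]))


if __name__ == "__main__":
    measured = [
        ([1.0, 1.0], [1.5, 1.2]),  # hypothetical pair 0
        ([1.0, 1.0], [3.0, 2.0]),  # hypothetical pair 1
        ([2.0, 2.0], [2.0, 2.0]),  # hypothetical pair 2
    ]
    print(select_pair(measured))
```

The same scoring function could in principle drive the directivity changing mechanism of claim 4 by evaluating candidate orientations instead of candidate pairs.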

[0015] According to the voice input section determination device of claim 6, the voice input section determining means determines the voice input section based on the difference in average power between the voice signal data and the noise signal data. When the voice signal data generating means and the noise signal data generating means each generate signal data from the input signal of their microphone, it is common practice to divide the input signal into frames of a predetermined length and handle it frame by frame.

[0016] If the average power of the voice signal data and of the noise signal data is calculated for each frame and the difference between the two is taken, the difference value becomes markedly large for frames that contain the speaker's voice. In addition, because power is obtained as the square of the input signal amplitude, the dynamic range is larger than when amplitude levels are compared directly. The voice input section can therefore be determined easily.
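The frame-by-frame comparison of average power can be sketched as follows. The frame length, the decision threshold, and the function names are assumptions for illustration. The publication states only that the power difference becomes markedly large in frames containing speech.

```python
def mean_power(frame):
    """Average power of one frame: the mean of the squared amplitudes."""
    return sum(x * x for x in frame) / len(frame)


def voice_frames(voice_data, noise_data, frame_len, threshold):
    """Return indices of frames whose voice-minus-noise power exceeds the threshold."""
    n_frames = min(len(voice_data), len(noise_data)) // frame_len
    detected = []
    for i in range(n_frames):
        lo, hi = i * frame_len, (i + 1) * frame_len
        diff = mean_power(voice_data[lo:hi]) - mean_power(noise_data[lo:hi])
        if diff > threshold:
            detected.append(i)
    return detected


if __name__ == "__main__":
    voice_mic = [0.1] * 4 + [1.0] * 4 + [0.1] * 4   # hypothetical: speech in the middle frame
    noise_mic = [0.1] * 12                          # hypothetical: steady background noise
    print(voice_frames(voice_mic, noise_mic, frame_len=4, threshold=0.5))
```

Because the background level appears in both channels, it largely cancels in the difference, so the threshold does not have to track the absolute noise level as it would in a single-microphone power test.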

[0017] According to the voice input section determination device of claim 7, the voice input section determining means determines the voice input section based on the difference between the power spectra of the voice signal data and the noise signal data. That is, when the power spectrum of each signal data is obtained, a spectral envelope characteristic of speech appears in frames that contain voice data. The voice input section can therefore be determined easily by detecting the portions that contain such a characteristic spectral envelope.

[0018] According to the voice data extraction device of claim 8, the voice signal data generating means generates voice signal data based on the input signal of a voice microphone whose directivity is set toward the speaker, and the noise data generating means generates noise signal data based on the input signal of a noise microphone whose directivity is set to differ from that of the voice microphone. The voice data extracting means then extracts voice data from the voice signal data based on the difference between the voice signal data and the noise signal data.

[0019] That is, when the speaker speaks, the speaker's voice is input to the voice microphone together with noise, whereas almost only noise is input to the noise microphone, whose directivity differs. Taking the difference between the voice signal data and the noise signal data therefore removes the noise component contained in the voice signal data and leaves the speaker's voice component. The speaker's voice data contained in the voice signal data can thus be extracted reliably.

[0020] According to the voice data extraction device of claim 9 or 10, the directivity of the noise microphone is set to differ from that of the voice microphone by approximately 180 degrees (claim 9) or approximately 90 degrees (claim 10). This keeps interference between the voice microphone and the noise microphone as small as possible while preserving a certain degree of correlation between the signals obtained from the two microphones, so the accuracy of the voice data extraction can be improved.

[0021] According to the voice data extraction device of claim 11, the directivity control means controls the directivity changing mechanism to change the relative directivity of the voice microphone and the noise microphone so that the difference amount detected by the difference amount detecting means is maximized. The voice data can therefore be extracted after adjusting to the state in which, for the actual use environment, the difference amount between the noise signals input to the voice microphone and the noise microphone is largest, so the voice data can be extracted with high accuracy.

[0022] According to claim 12, the voice data extracting means extracts the voice data based on the input signals of whichever pair, among a plurality of pairs of voice and noise microphones, gives the largest difference amount detected by the difference amount detecting means. As with claim 11, the voice data can therefore be extracted with high accuracy from the input signals of the voice and noise microphone pair for which the difference amount of the noise signals is largest in the actual use environment.

[0023] According to the voice data extraction device of claim 13, the voice data extracting means extracts the voice data based on the difference between the power spectra of the voice signal data and the noise signal data within a predetermined range. That is, when the power spectrum of each signal data is obtained, frames that contain voice data show the spectral envelope characteristic of speech in superimposed form. Subtracting the spectral envelope of the noise data from it therefore allows the voice data to be extracted accurately.

[0024] According to the voice data extraction device of claim 14, which includes the voice input section determination device of any one of claims 1 to 7, the voice data extracting means extracts the voice data contained in the voice input section determined by the voice input section determining means of that device, so the voice data can be extracted easily.

[0025] According to the voice data extraction device of claim 15, the voice microphone, the noise microphone, the noise data generating means, and the input signal data generating means are shared with the corresponding elements of the voice input section determination device, so the device as a whole can be made compact.

[0026] According to the voice recognition device of claim 16, which includes the voice input section determination device of any one of claims 1 to 7, the voice recognition means analyzes the voice data contained in the voice input section determined by the voice input section determining means and recognizes the voice data pattern. The voice data pattern can therefore be recognized easily on the basis of a voice input section determined with high accuracy.

[0027] According to the voice recognition device of claim 17, which includes the voice data extraction device of any one of claims 8 to 15, the voice recognition means analyzes the voice data extracted by the voice data extracting means of that device and recognizes the voice data pattern. The voice data pattern can therefore be recognized easily on the basis of voice data extracted with high accuracy.

[0028] The vehicle navigation device of claim 18 includes the voice recognition device of claim 16 or 17 and executes predetermined processing based on the voice data pattern recognized by the voice recognition means. Even in an environment with a relatively high noise level, such as while the vehicle is being driven, the influence of the noise is eliminated as far as possible and the data pattern of the speech uttered by the vehicle occupant who is the speaker can be recognized reliably. The occupant can therefore have a desired operation performed reliably by voice input without manually operating input keys or the like, which improves convenience and also helps ensure driving safety.

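[0029] According to the input microphone of claim 19, the voice microphone and the noise microphone are arranged in the same housing and integrated, and the housing is shaped so that the voice microphone and the noise microphone appear to point in the same direction.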
The voice microphone and the noise microphone are arranged in the same housing and integrally formed, and the shape of the housing is formed such that the voice microphone and the noise microphone are apparently directed in the same direction.

[0030] That is, in the voice input section determination device of any one of claims 1 to 7, the voice data extraction device of any one of claims 8 to 15, or the voice recognition device of claim 16 or 17, the voice microphone and the noise microphone must be arranged so that their directivities differ from each other.

[0031] In that case, considering the work of installing the microphones, it is desirable to arrange the voice microphone and the noise microphone in the same housing as a single unit. However, if the housing is shaped to follow the different directivities of the individual microphones, the installer may be unable to judge easily which direction (directivity) the microphone should be aligned with during installation.

[0032] Therefore, by shaping the housing so that the voice microphone and the noise microphone appear to point in the same direction, the installer need only mount the unit so that the voice microphone points toward the speaker; the directivity of the noise microphone is then automatically set to differ from that of the voice microphone. This makes the microphone easy to install and improves workability.

[0033]

DESCRIPTION OF THE PREFERRED EMBODIMENTS (First Embodiment) A first embodiment in which the present invention is applied to a car navigation device will be described below with reference to FIGS. 1 to 8. In FIG. 1, which shows the overall electrical configuration of the car navigation device (car navigation system) 1, the car navigation system 1 includes a position detector 2, a map data input device 3, an operation switch group 4, and a control circuit 5 to which these are connected. An external memory 6, a display device 7 such as a display, a remote control sensor 8, a voice recognition device 9, and the like are also connected to the control circuit 5. The control circuit 5 is configured as a microcomputer and contains, although not specifically shown, a well-known CPU, ROM, RAM, I/O, and so on, which are electrically connected to one another via a bus line.

[0034] The position detector 2 includes a well-known geomagnetic sensor 10, a gyroscope 11, a distance sensor 12, and a GPS receiver 13 that detects the position of the vehicle based on radio waves from GPS (Global Positioning System) satellites. Because each of these sensors 10 to 13 produces errors of a different nature, the detector is configured to use the information from the sensors 10 to 13 together, correcting each as necessary. Depending on the required accuracy, only some of the sensors 10 to 13 may be used, or a steering sensor, a wheel sensor, or the like may be added.

[0035] The map data input device 3 is a device for inputting various data, including map data, landmark data, and so-called map matching data for improving position detection accuracy. A CD-ROM is generally used as the storage medium for the data, but a DVD (Digital Versatile Disk) or a memory card may also be used.

[0036] The screen of the display device 7 displays the map data input from the map data input device 3, with a marker indicating the current position of the vehicle input from the position detector 2 and additional data such as the guidance route and marks for set points superimposed on the map data.

[0037] The car navigation system 1 also has a so-called route guidance function: when the position of a destination is input from the remote control sensor 8 via a remote control terminal (hereinafter, remote controller) 14 or via the operation switch group 4, it automatically selects the optimal travel route from the current position to that destination and displays it as the guidance route. Known techniques for automatically setting an optimal route in this way include, for example, Dijkstra's algorithm. The operation switch group 4 consists of, for example, touch switches formed integrally with the display device 7 or mechanical switches, and is used for various input operations.
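For reference, the Dijkstra method named above as one route-setting technique can be sketched as follows. The publication does not describe the navigation system's road data or cost model, so the dict-of-dicts graph, the node names, and the function name `shortest_route` are purely illustrative assumptions.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative edge costs.

    Returns (total_cost, [node, ...]), or (float("inf"), []) if goal is unreachable.
    """
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]


if __name__ == "__main__":
    roads = {  # hypothetical road network with travel costs
        "A": {"B": 1, "C": 4},
        "B": {"C": 2, "D": 5},
        "C": {"D": 1},
        "D": {},
    }
    print(shortest_route(roads, "A", "D"))
```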

[0038] Whereas the operation switch group 4 and the remote controller 14 are used to specify a destination and the like by manual operation, the voice recognition device 9 is provided so that the user can likewise specify a destination and the like by voice input.

[0039] The voice recognition device 9 consists of a voice recognition unit (voice recognition means) 15, a dialogue control unit 16, a voice synthesis unit 17, a voice extraction device 18, a voice microphone 19, a noise microphone 20, a PTT (Push To Talk) switch 21, a speaker 22, and a control unit 23.

[0040] On instruction from the dialogue control unit 16, the voice recognition unit 15 performs voice recognition processing on the voice data supplied from the voice extraction device 18 and outputs the recognition result to the dialogue control unit 16. Specifically, the voice recognition unit 15 compares the voice data obtained from the voice extraction device 18 against a plurality of comparison target pattern candidates registered in dictionary data stored in advance, and outputs the top-ranked comparison target patterns with a high degree of match to the dialogue control unit 16.

[0041] To recognize the word sequence in the input speech, the voice data is acoustically analyzed sequentially to extract acoustic features such as the LPC (Linear Prediction Coefficient) cepstrum, yielding time-series data. The resulting time-series data is divided into several sections, and a well-known technique such as DP matching, an HMM (Hidden Markov Model), or a neural network is used to determine which word stored in the dictionary data each section corresponds to.
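Of the matching techniques listed above, DP matching is the simplest to illustrate. The sketch below aligns a feature sequence against each dictionary template using a dynamic-time-warping style recurrence and picks the lowest-cost word. The squared Euclidean local cost, the unconstrained step pattern, and the toy one-dimensional features are assumptions made here. The publication names the technique but gives no details.

```python
def dp_distance(seq_a, seq_b):
    """Dynamic-programming alignment cost between two feature sequences.

    Each sequence is a list of equal-length feature vectors; the local cost
    is the squared Euclidean distance between vectors.
    """
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1]))
            # Extend the cheapest of: stretch seq_b, stretch seq_a, or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]


def best_match(features, dictionary):
    """Return the dictionary word whose template has the lowest alignment cost."""
    return min(dictionary, key=lambda word: dp_distance(features, dictionary[word]))


if __name__ == "__main__":
    templates = {  # hypothetical one-dimensional feature templates
        "up": [[0.0], [1.0], [2.0]],
        "down": [[2.0], [1.0], [0.0]],
    }
    spoken = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # a slower rendition of "up"
    print(best_match(spoken, templates))
```

The time warping is what lets a slowly spoken utterance still match a shorter stored template, which is why this family of methods was common for small-vocabulary command recognition.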

[0042] Based on the voice recognition result and the internal state that it manages, the dialogue control unit 16 instructs the voice synthesis unit 17 to utter a response, and instructs the control circuit 5, which has overall control of the car navigation system 1, to execute setting processing, for example by notifying it of the destination required for navigation processing. Thus, with the voice recognition device 9, the user can specify a destination and the like to the car navigation system 1 by voice input, without operating the operation switch group 4 or the remote controller 14.

【００４３】図２に示すように、音声用マイク１９は、
ユーザが発声した音声を入力するために設けられてお
り、その指向性の中心は、話者（ユーザ）２４の方向に
向けられている。また、雑音用マイク２０は、ユーザが
発声した音声の背景として存在する雑音を入力するため
に設けられており、その指向性の中心は、話者（ユー
ザ）２４の方向に対して反対方向（水平に１８０度異な
る方向）に向けられている。そして、これらの音声用マ
イク１９及び雑音用マイク２０は、１つのマイクアッシ
ー（筐体）２５の内部に配置されることで一体の入力用
マイク６０として構成されている。As shown in FIG. 2, the voice microphone 19 is provided for inputting the voice uttered by the user, and the center of its directivity points toward the speaker (user) 24. The noise microphone 20 is provided for inputting the noise present as the background of the user's voice, and the center of its directivity points in the direction opposite to the speaker (user) 24 (a direction differing horizontally by 180 degrees). The voice microphone 19 and the noise microphone 20 are arranged inside a single microphone assembly (housing) 25 and thereby constitute an integrated input microphone 60.

【００４４】ここで、図３は、マイクアッシー２５の外
観を示す図であり、（ａ）は正面図，（ｂ）は背面図，
（ｃ）は分解した状態を示す平面図である。マイクアッ
シー２５の左右両端には、音声用マイク１９と雑音用マ
イク２０とを夫々収納する矩形箱状態の収納部２５Ａ，
２５Ｂがあり、それらの収納部２５Ａ，２５Ｂを角柱状
の支持部２５Ｃによって支持する形状をなしている。FIG. 3 shows the appearance of the microphone assembly 25: (a) is a front view, (b) is a rear view, and (c) is a plan view showing the disassembled state. At the left and right ends of the microphone assembly 25 are rectangular box-shaped storage portions 25A and 25B, which house the voice microphone 19 and the noise microphone 20, respectively, and these storage portions 25A and 25B are supported by a prismatic support portion 25C.

【００４５】そして、図３（ｃ）及び図２に示すよう
に、マイクアッシー２５の内部において、音声用マイク
１９は話者２４側に向けて配置され、雑音用マイク２０
は話者２４側と反対側に向けて配置されている。しかし
ながら、マイクアッシー２５の外観は、何れのマイク１
９，２０も見かけ上は話者２４側（図３（ａ）に示す前
面側）を指向しているように形成されている。As shown in FIGS. 3(c) and 2, inside the microphone assembly 25 the voice microphone 19 faces the speaker 24, and the noise microphone 20 faces away from the speaker 24. The exterior of the microphone assembly 25, however, is shaped so that both microphones 19 and 20 appear to be directed toward the speaker 24 (the front side shown in FIG. 3(a)).

【００４６】また、音声用マイク１９が収納されている
収納部２５Ａの前面側には音響信号入力用の開孔２５Ｄ
が形成されており、雑音用マイク２０が収納されている
収納部２５Ｂの前面側設けられているのはダミーの開孔
である。そして、当該収納部２５Ｂについては、図３
（ｂ）に示す背面側に音響信号入力用のスリット２５Ｅ
が設けられており、マイクアッシー２５外部の雑音が入
力可能となっている。An opening 25D for acoustic signal input is formed on the front side of the storage portion 25A housing the voice microphone 19, whereas the opening provided on the front side of the storage portion 25B housing the noise microphone 20 is a dummy. Instead, the storage portion 25B has a slit 25E for acoustic signal input on its rear side, shown in FIG. 3(b), through which noise from outside the microphone assembly 25 can enter.

【００４７】このように構成されている入力用マイク６
０は、図４に示すように、自動車の運転席側に配置され
ているステアリングコラム２６の上に、前面側がユーザ
たる運転者の方向を向くようにして配置されている。即
ち、マイクアッシー２５の形状を図３に示すように形成
することで、作業者は、その取り付けを行う場合に内部
のマイク１９，２０の指向性を考慮することなく、単
に、マイクアッシー２５の前面側が運転者の方向を向く
ように入力用マイク６０を配置すれば良い。また、ステ
アリングコラム２６の左側方には、モーメンタリ動作す
るプッシュスイッチで構成されたＰＴＴスイッチ２１が
配置されている。As shown in FIG. 4, the input microphone 60 configured in this way is placed on the steering column 26 on the driver's seat side of the automobile, with its front side facing the driver, who is the user. Because the microphone assembly 25 is shaped as shown in FIG. 3, the installer need not consider the directivity of the internal microphones 19 and 20 and can simply position the input microphone 60 so that the front side of the microphone assembly 25 faces the driver. A PTT switch 21, which is a momentary push switch, is arranged on the left side of the steering column 26.

【００４８】再び図１を参照して、音声抽出装置１８
は、ＰＴＴスイッチ２１がユーザによりオン操作された
ことを制御部２３が検出すると、後述するように、音声
用マイク１９及び雑音用マイク２０に夫々入力された信
号に基づいて運転者が発声した音声の信号をデジタルデ
ータとして抽出し、音声認識部１５に出力するようにな
っている。尚、制御部２３は、車載オーディオ機器のア
ンプ２７に対して、オーディオ用スピーカ２８に出力し
ているオーディオ信号の音量を一時的に絞って略零にす
る（ミュート）指示を与えたり、また、そのオーディオ
信号の音量を元の状態に復帰させる（ミュート解除）指
示を与えることもできるように構成されている。Referring again to FIG. 1, when the control unit 23 detects that the user has turned on the PTT switch 21, the voice extraction device 18 extracts, as digital data, the signal of the voice uttered by the driver based on the signals input to the voice microphone 19 and the noise microphone 20, as described later, and outputs it to the voice recognition unit 15. The control unit 23 can also instruct the amplifier 27 of the in-vehicle audio equipment to temporarily reduce the volume of the audio signal output to the audio speaker 28 to substantially zero (mute) and to restore the volume to its original level (unmute).

【００４９】ここで、音声抽出装置１８の詳細な構成に
ついて図５をも参照して説明する。音声抽出装置１８
は、フレーム分割部（音声信号データ生成手段，雑音信
号データ生成手段）２９，音声区間判定装置（音声入力
区間判定装置）３０，音声用バッファ３１，雑音用バッ
ファ３２，音声用のフーリエ変換部３３，雑音用のフー
リエ変換部３４，サブストラクト部（音声データ抽出手
段）３５を備えて構成されている。The detailed configuration of the voice extraction device 18 will now be described with reference also to FIG. 5. The voice extraction device 18 comprises a frame dividing unit (voice signal data generation means, noise signal data generation means) 29, a voice section determination device (voice input section determination device) 30, a voice buffer 31, a noise buffer 32, a voice Fourier transform unit 33, a noise Fourier transform unit 34, and a subtraction unit (voice data extraction means) 35.

【００５０】フレーム分割部２９は、マイク１９，２０
より入力される音響信号を例えば１０数ＫＨｚ程度のレ
ートでサンプリングして夫々Ａ／Ｄ変換し、得られた離
散データ系列を数１０ｍｓ程度の時間幅に設定された区
間（フレーム）毎に切り出して分割し、切り出しの端部
によって発生する高調波成分を抑制するための窓関数
（例えば、ハニング窓）を乗じて音声区間判定装置３０
に出力するようになっている。The frame dividing unit 29 samples the acoustic signals input from the microphones 19 and 20 at a rate of, for example, somewhat over 10 kHz and A/D-converts each of them. It cuts the resulting discrete data sequences into sections (frames) with a time width of several tens of milliseconds, multiplies each frame by a window function (for example, a Hanning window) to suppress the harmonic components produced at the cut edges, and outputs the result to the voice section determination device 30.
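The frame division and windowing described above might look like the following minimal Python sketch. The function name `split_into_frames`, the `hop` parameter (the step between frame starts), and the use of overlapping frames are assumptions introduced for illustration; the specification states only the approximate sampling rate, the frame width, and the use of a window function.

```python
import math


def split_into_frames(samples, frame_len, hop):
    """Cut a sample sequence into fixed-length frames and apply a Hann window."""
    # Hann window: w[n] = 0.5 - 0.5*cos(2*pi*n/(N-1)); it tapers both frame ends to zero.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    # Only complete frames are kept; a trailing partial frame is discarded.
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

For instance, at an assumed sampling rate of 16 kHz, a 32 ms frame corresponds to `frame_len = 512` samples.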

【００５１】音声区間判定装置３０は、音声用マイク１
９に入力された音響信号の各フレームについて、話者２
４が発声した音声の信号が含まれているか否かを判定・
検出して、音声信号が含まれているフレームのデータに
ついては、音声用バッファ３１に出力して蓄積するよう
になっている。また、音声区間判定装置３０は、雑音用
マイク２０に入力された音響信号の各フレームデータの
内、音声信号が含まれているフレームに対応する（即
ち、同時刻）フレームのデータを、雑音用バッファ３２
に出力して蓄積するようになっている。For each frame of the acoustic signal input to the voice microphone 19, the voice section determination device 30 determines whether the frame contains the signal of the voice uttered by the speaker 24, and outputs the data of frames that contain the voice signal to the voice buffer 31 for accumulation. Of the frame data of the acoustic signal input to the noise microphone 20, the voice section determination device 30 also outputs the data of the frames corresponding to (that is, at the same time as) the frames containing the voice signal to the noise buffer 32 for accumulation.

【００５２】尚、以降において、フレーム分割部２９以
降における音声用マイク１９系統のデジタルデータを音
声信号データ，雑音用マイク２０系統のデジタルデータ
を雑音声信号データと称する。Hereinafter, the digital data of the voice microphone 19 channel downstream of the frame dividing unit 29 is referred to as voice signal data, and the digital data of the noise microphone 20 channel is referred to as noise signal data.

【００５３】フーリエ変換部３３，３４は、バッファ３
１，３２に夫々蓄積されているデータを高速フーリエ変
換(FFT:Fast Fourier Transform)して短時間スペクトラ
ムを生成し、サブトラクト部３５に出力するようになっ
ている。サブトラクト部３５は、スペクトラムサブトラ
クション方式のアルゴリズムに従い、音声用のフーリエ
変換部３３によって生成された短時間スペクトラム（音
声と雑音とを含むスペクトラム，入力スペクトラム）か
ら、雑音用のフーリエ変換部３４によって生成された短
時間スペクトラム（雑音のみ含むスペクトラム，雑音ス
ペクトラム）を差し引くことで雑音除去を行うものであ
る。その際、例えば１以上に設定したサブトラクション
係数を雑音スペクトラムに乗じて減算を行っても良い。
また、スペクトラムは、パワー，振幅の何れのスペクト
ラムであっても良い。そして、サブトラクト部３５にお
ける減算結果は、音声認識部１５に出力されるようにな
っている。The Fourier transform units 33 and 34 apply a fast Fourier transform (FFT) to the data accumulated in the buffers 31 and 32, respectively, to generate short-time spectra, and output them to the subtraction unit 35. Following the spectral subtraction algorithm, the subtraction unit 35 removes noise by subtracting the short-time spectrum generated by the noise Fourier transform unit 34 (a spectrum containing only noise; the noise spectrum) from the short-time spectrum generated by the voice Fourier transform unit 33 (a spectrum containing voice and noise; the input spectrum). The noise spectrum may be multiplied by a subtraction coefficient, set for example to 1 or more, before the subtraction. The spectra may be either power spectra or amplitude spectra. The subtraction result of the subtraction unit 35 is output to the voice recognition unit 15.
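A minimal Python sketch of the per-bin subtraction performed by the subtraction unit 35 is given below. The function name `spectral_subtract` and the `floor` parameter, which clamps negative results, are assumptions of this sketch; the specification mentions the subtraction coefficient but does not say how negative differences are handled.

```python
def spectral_subtract(input_spec, noise_spec, coeff=1.0, floor=0.0):
    """Subtract a scaled noise spectrum from the input spectrum, bin by bin."""
    if len(input_spec) != len(noise_spec):
        raise ValueError("spectra must have the same number of bins")
    # Clamp at `floor` (an assumption) so no bin ends up with negative power or amplitude.
    return [max(x - coeff * n, floor) for x, n in zip(input_spec, noise_spec)]
```

A coefficient above 1 removes more noise at the cost of also removing weak voice components, which is why the clamping behaviour matters in practice.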

【００５４】また、図６は、音声区間判定装置３０の詳
細な構成を示す機能ブロック図である。音声区間判定装
置３０は、音声用マイク１９側に対応するパワー算出部
（音声信号データ生成手段）３６及び雑音用マイク２０
側に対応するパワー算出部（雑音信号データ生成手段）
３７を備えている。これらのパワー算出部３６及び３７
は、夫々フレーム分割部２９において分割されたフレー
ム毎にパワー（短時間パワー）、即ち、音響信号の電力
を計算して減算器３８に出力するようになっている。FIG. 6 is a functional block diagram showing the detailed configuration of the voice section determination device 30. The voice section determination device 30 includes a power calculation unit (voice signal data generation means) 36 for the voice microphone 19 side and a power calculation unit (noise signal data generation means) 37 for the noise microphone 20 side. For each frame produced by the frame dividing unit 29, the power calculation units 36 and 37 each calculate the power (short-time power), that is, the electric power of the acoustic signal, and output it to the subtracter 38.

【００５５】すると、減算器３８では、パワー算出部３
６で算出されたパワーからパワー算出部３７で算出され
たパワーが減算され、その減算結果が判定部（音声入力
区間判定手段）３９に与えられる。そして、判定部３９
は、その減算結果が所定値以上である場合は、その時パ
ワー算出部３６に入力されているフレームが話者２４の
音声データを含んでいるフレーム（音声入力区間）であ
ると判定する。そして、ゲート（出力バッファ）４０及
び４１を開くことで、フレーム分割部２９よりパワー算
出部３６及び３７に与えられているフレームデータを音
声用バッファ３１及び雑音用バッファ３２に夫々出力す
るようになっている。The subtracter 38 then subtracts the power calculated by the power calculation unit 37 from the power calculated by the power calculation unit 36 and supplies the result to the determination unit (voice input section determination means) 39. If the subtraction result is equal to or greater than a predetermined value, the determination unit 39 determines that the frame currently input to the power calculation unit 36 is a frame containing the voice data of the speaker 24 (a voice input section). It then opens the gates (output buffers) 40 and 41 so that the frame data supplied from the frame dividing unit 29 to the power calculation units 36 and 37 is output to the voice buffer 31 and the noise buffer 32, respectively.
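The decision made by the power calculation units 36 and 37, the subtracter 38, and the determination unit 39 can be sketched in Python as follows. Taking the short-time power as the mean squared amplitude of a frame, and the names `short_time_power` and `select_voice_frames`, are assumptions for illustration; the threshold corresponds to the "predetermined value", whose magnitude the specification does not give.

```python
def short_time_power(frame):
    """Mean squared amplitude of one frame (assumed definition of short-time power)."""
    return sum(s * s for s in frame) / len(frame)


def select_voice_frames(voice_frames, noise_frames, threshold):
    """Keep the frame pairs whose power difference reaches the threshold."""
    voice_buffer, noise_buffer = [], []
    for v_frame, n_frame in zip(voice_frames, noise_frames):
        difference = short_time_power(v_frame) - short_time_power(n_frame)
        if difference >= threshold:
            # "Gates open": store the voice frame and the same-time noise frame.
            voice_buffer.append(v_frame)
            noise_buffer.append(n_frame)
    return voice_buffer, noise_buffer
```

Frames in which both microphones pick up roughly the same noise yield a difference near zero and are dropped; only frames in which the voice microphone is clearly louder are buffered.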

【００５６】次に、本実施例の作用について図７及び図
８をも参照して説明する。先ず、カーナビ１の動作の概
略について説明すると、カーナビ１に電源が投入される
と、表示装置７の画面上に操作メニューが表示される。
そして、運転者が、操作スイッチ群４またはリモコン１
４を操作して案内経路を表示装置７に表示させる処理を
選択したり、或いは、音声認識装置９を介して音声入力
することで対話制御部１６から制御回路５へ同様の選択
指示がなされると、以下のように処理が実行される。Next, the operation of this embodiment will be described with reference also to FIGS. 7 and 8. First, to outline the operation of the car navigation system 1: when the car navigation system 1 is powered on, an operation menu is displayed on the screen of the display device 7. When the driver selects the process of displaying a guidance route on the display device 7 by operating the operation switch group 4 or the remote controller 14, or when the dialog control unit 16 issues the same selection instruction to the control circuit 5 in response to voice input via the voice recognition device 9, the processing proceeds as follows.

【００５７】即ち、運転者が表示装置７に表示された地
図を参照して音声或いはリモコン１４の操作により目的
地を入力すると、ＧＰＳ受信機１３より得られる位置デ
ータに基づき自動車の現在位置が求められる。そして、
制御回路５は、ダイクストラ法を用いてコスト計算する
ことで、現在位置から目的地までの最短距離となる経路
を誘導経路として求める。そして、求めた誘導経路を表
示装置７の地図上に重畳して表示することで、運転者に
適切な経路を案内する。斯様にして誘導経路を計算する
処理や案内する処理は、一般的に良く知られたものであ
るから詳細な説明は省略する。When the driver, referring to the map displayed on the display device 7, inputs a destination by voice or by operating the remote controller 14, the current position of the automobile is obtained from the position data supplied by the GPS receiver 13. The control circuit 5 then performs a cost calculation using Dijkstra's algorithm to find the shortest-distance route from the current position to the destination as the guidance route. The guidance route is superimposed on the map on the display device 7 to guide the driver along an appropriate route. Because the processes of calculating a guidance route and providing guidance in this way are generally well known, a detailed description is omitted.
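For reference, a generic Python sketch of Dijkstra's algorithm is shown below. The road network representation (a dictionary mapping each node to its neighbors and link costs) and the function name `shortest_route` are assumptions of this sketch and do not reflect how the control circuit 5 actually stores map data.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm over a dict {node: {neighbor: cost}}."""
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for neighbor, edge_cost in graph.get(node, {}).items():
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        return None, float("inf")
    # Walk back from the goal to the start to recover the route.
    route = [goal]
    while route[-1] != start:
        route.append(previous[route[-1]])
    return route[::-1], best[goal]
```

Link costs must be non-negative for the result to be correct, which holds when the cost is a distance.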

【００５８】次に、音声認識装置９の動作について、上
述の経路案内のために目的地を音声入力する場合を例と
して説明する。図７は、音声認識装置９における制御部
２３の制御内容を示すフローチャートである。先ず、制
御部２３は、ＰＴＴスイッチ２１が押された（ＯＮ）か
否かを判断し（ステップＳ１）、押された場合は「ＹＥ
Ｓ」と判断し、アンプ２７に対してオーディオ用スピー
カ２８に出力しているオーディオ信号のミュート指示を
与える（ステップＳ２）。Next, the operation of the voice recognition device 9 will be described, taking as an example the voice input of a destination for the route guidance described above. FIG. 7 is a flowchart showing the control performed by the control unit 23 of the voice recognition device 9. First, the control unit 23 determines whether the PTT switch 21 has been pressed (ON) (step S1). If it has been pressed, the determination is "YES", and the control unit 23 instructs the amplifier 27 to mute the audio signal being output to the audio speaker 28 (step S2).

【００５９】その時点で、アンプ２７がオーディオ用ス
ピーカ２８にオーディオ信号を出力している場合は、図
８（ｂ），（ｃ）に示すように、オーディオ信号がミュ
ートされることで音声用マイク１９及び雑音用マイク２
０に入力されている音響信号のレベルが低下する。If the amplifier 27 is outputting an audio signal to the audio speaker 28 at that point, muting the audio signal lowers the level of the acoustic signals input to the voice microphone 19 and the noise microphone 20, as shown in FIGS. 8(b) and 8(c).

【００６０】それから、制御部２３は、音声抽出装置１
８に対して、音声用マイク１９及び雑音用マイク２０に
入力されている音響信号の取り込み処理及び蓄積処理の
開始指示を与える（ステップＳ３）。例えば、話者（運
転者）２４が目的地を「愛知県刈谷市昭和町」と発声し
て入力したものとする。すると、フレーム分割部２９
は、マイク１９，２０より入力される音響信号をサンプ
リングしてフレーム毎に分割し、音声信号データ及び雑
音信号データを音声区間判定装置３０に出力する。The control unit 23 then instructs the voice extraction device 18 to start capturing and accumulating the acoustic signals input to the voice microphone 19 and the noise microphone 20 (step S3). Suppose, for example, that the speaker (driver) 24 inputs the destination by uttering "Showa-cho, Kariya-shi, Aichi Prefecture". The frame dividing unit 29 samples the acoustic signals input from the microphones 19 and 20, divides them into frames, and outputs the voice signal data and the noise signal data to the voice section determination device 30.

【００６１】この時、話者２４の方向を指向している音
声用マイク１９には、話者２４の音声と共に雑音が入力
され、話者２４に対して逆（１８０度）方向を指向して
いる雑音用マイク２０には、話者２４の音声は殆ど入力
されず専ら雑音が入力されることになる。At this time, the voice microphone 19, which is directed toward the speaker 24, receives noise together with the voice of the speaker 24, whereas the noise microphone 20, which is directed in the opposite (180-degree) direction, receives almost none of the voice of the speaker 24 and receives essentially only noise.

【００６２】そして、音声区間判定装置３０は、パワー
算出部３６及び３７において、夫々音声信号データ及び
雑音信号データについて分割されたフレーム毎に短時間
パワーを計算すると、減算器３８により両短時間パワー
値の減算を行う。音声区間判定装置３０の判定部３９
は、減算結果が所定値以上となったフレームのデータを
音声用バッファ３１及び雑音用バッファ３２に夫々出力
する。In the voice section determination device 30, the power calculation units 36 and 37 calculate the short-time power of each frame of the voice signal data and the noise signal data, respectively, and the subtracter 38 takes the difference between the two short-time power values. The determination unit 39 of the voice section determination device 30 outputs the data of frames whose subtraction result is equal to or greater than the predetermined value to the voice buffer 31 and the noise buffer 32, respectively.

【００６３】即ち、図８に示すように、ＰＴＴスイッチ
２１がＯＮされた時点から、話者（運転者）２４が発
声を行う時点までの期間にあっては、音声用マイク１
９及び雑音用マイク２０には、共に略等しいレベルの雑
音（例えば、自動車のエンジン音や走行音等）が入力さ
れている。従って、この期間−に属するフレームの
音声信号データ及び雑音信号データについて夫々短時間
パワーを計算すると、両者の差は殆どなく、両者を減算
した結果は低い値となる。That is, as shown in FIG. 8, during the period from when the PTT switch 21 is turned ON until the speaker (driver) 24 starts to speak, noise of substantially the same level (for example, the engine sound and running noise of the automobile) is input to both the voice microphone 19 and the noise microphone 20. Accordingly, when the short-time power is calculated for the voice signal data and the noise signal data of frames belonging to this period, there is almost no difference between the two, and the subtraction result is a low value.

【００６４】そして、話者２４が「愛知県刈谷市昭和
町」と発声している期間−（音声区間）にあっては
その音声が音声用マイク１９に入力されるため、図８
（ｂ）に示すように入力信号の振幅レベルは上昇する
が、前記音声は逆方向を指向している雑音用マイク２０
には殆ど入力されることがなく、入力信号の振幅レベル
は殆ど変化しない。During the period in which the speaker 24 is uttering "Showa-cho, Kariya-shi, Aichi Prefecture" (the voice section), the voice is input to the voice microphone 19, so the amplitude level of its input signal rises as shown in FIG. 8(b). The voice hardly reaches the noise microphone 20, which is directed the opposite way, so the amplitude level of its input signal hardly changes.

【００６５】従って、期間−に雑音用マイク２０に
入力される音響信号は、音声用マイク１９に音声と同時
に入力されている雑音にほぼ等しい。故に、期間−
に属するフレームの音声信号データ及び雑音信号データ
について上記と同様に短時間パワーの計算を行うと、パ
ワー算出部３６の出力値はパワー算出部３７の出力値よ
りも著しく大きくなり、両者の減算結果はかなり高い値
となるので、判定部３９は、前記フレームを音声入力区
間と判定し、ゲート４０及び４１を介してフレームデー
タを音声用バッファ３１，雑音用バッファ３２に夫々出
力して蓄積させる。Accordingly, the acoustic signal input to the noise microphone 20 during the voice section is approximately equal to the noise that is input to the voice microphone 19 simultaneously with the voice. When the short-time power is calculated in the same way for the voice signal data and the noise signal data of frames belonging to the voice section, the output value of the power calculation unit 36 is markedly larger than that of the power calculation unit 37, and the subtraction result is a considerably high value. The determination unit 39 therefore determines that the frame is a voice input section and outputs the frame data through the gates 40 and 41 to the voice buffer 31 and the noise buffer 32, respectively, for accumulation.

【００６６】それから、制御部２３は、ＰＴＴスイッチ
２１がＯＦＦされるまで待機し（ステップＳ４）、ＯＦ
Ｆされると「ＹＥＳ」と判断して入力信号の取り込み中
止を音声抽出装置１８に指示すると（ステップＳ５）、
続いて、音声抽出装置１８に音声抽出処理の開始指示が
与える（ステップＳ６）。そして、アンプ２７に対して
オーディオ信号のミュート解除を指示すると（ステップ
Ｓ７）、ステップＳ１に移行する。The control unit 23 then waits until the PTT switch 21 is turned OFF (step S4). When it is turned OFF, the determination is "YES", and the control unit 23 instructs the voice extraction device 18 to stop capturing the input signals (step S5) and then instructs it to start the voice extraction processing (step S6). Finally, the control unit 23 instructs the amplifier 27 to unmute the audio signal (step S7) and returns to step S1.
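The sequence of steps S1 to S7 can be summarized by the following Python sketch. It is purely illustrative: the PTT switch is modeled as a sequence of sampled booleans, and the amplifier 27 and the voice extraction device 18 are modeled as plain lists that record the commands they receive. The function name `run_ptt_cycle` and the command strings are hypothetical.

```python
def run_ptt_cycle(ptt_states, amplifier, extractor):
    """Run steps S1-S7 once over a sequence of sampled PTT switch states."""
    states = iter(ptt_states)
    # S1: wait until the PTT switch is pressed.
    for pressed in states:
        if pressed:
            break
    else:
        return False  # the switch was never pressed
    amplifier.append("mute")              # S2: mute the car audio
    extractor.append("start_capture")     # S3: start capturing and buffering
    # S4: wait until the PTT switch is released
    # (in this sketch, running out of samples is treated as a release).
    for pressed in states:
        if not pressed:
            break
    extractor.append("stop_capture")      # S5: stop capturing
    extractor.append("start_extraction")  # S6: start voice extraction
    amplifier.append("unmute")            # S7: restore the audio volume
    return True
```

Muting before capture and unmuting only after extraction has been requested keeps the car audio out of both microphone signals for the whole utterance.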

【００６７】音声抽出装置１８は、制御部２３より音声
抽出処理の開始指示が与えられると、フーリエ変換部３
３，３４によって、バッファ３１，３２に夫々蓄積され
ている音声入力区間に対応するフレームデータを読み出
してＦＦＴにより短時間スペクトラムを生成し、サブト
ラクト部３５に出力する。When the control unit 23 instructs it to start the voice extraction processing, the voice extraction device 18 uses the Fourier transform units 33 and 34 to read the frame data corresponding to the voice input section accumulated in the buffers 31 and 32, respectively, generates short-time spectra by FFT, and outputs them to the subtraction unit 35.
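To make the notion of a short-time spectrum concrete, the sketch below computes the power spectrum of one frame with a naive discrete Fourier transform. This is not the FFT named in the text: an FFT yields the same values far more efficiently, and the direct form is used here only to keep the example dependency-free. The function name `power_spectrum` is an assumption.

```python
import cmath


def power_spectrum(frame):
    """Naive DFT power spectrum |X[k]|^2 for k = 0 .. N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        # X[k] = sum over t of x[t] * exp(-2*pi*j*k*t/N)
        x_k = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(x_k) ** 2)
    return spectrum
```

Applying this to the same-numbered frames in the voice buffer and the noise buffer gives the two spectra that are subsequently subtracted.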

【００６８】サブトラクト部３５は、スペクトラムサブ
トラクション方式によって、フーリエ変換部３４によっ
て生成された短時間スペクトラムＳＮ（ｆ，ｔ）から、
フーリエ変換部３３によって生成された前者と同じフレ
ーム番号の短時間スペクトラムＮ（ｆ，ｔ）を差し引い
て雑音成分の除去を行う。すると、その減算結果とし
て、話者２４が発生した音声データに対応するスペクト
ラム成分Ｓ（ｆ，ｔ）が残留する。即ち、Ｓ（ｆ，ｔ）＝ＳＮ（ｆ，ｔ）−Ｎ（ｆ，ｔ） …（１）となる。尚、ｆは周波数、ｔはフレーム番号即ち時間で
ある。Using the spectral subtraction method, the subtraction unit 35 removes the noise component by subtracting the short-time spectrum N(f, t) generated by the noise Fourier transform unit 34 from the short-time spectrum SN(f, t) of the same frame number generated by the voice Fourier transform unit 33. The spectral component S(f, t) corresponding to the voice data uttered by the speaker 24 remains as the subtraction result. That is, S(f, t) = SN(f, t) − N(f, t) … (1), where f is the frequency and t is the frame number, that is, time.

【００６９】そして、サブトラクト部３５より出力され
た音声データのスペクトラムＳ（ｆ，ｔ）は、音声認識
部１５に出力される。すると、前述のように、音声認識
部１５において、スペクトラムＳ（ｆ，ｔ）について音
響的特徴量が抽出された時系列データが得られ、その時
系列データが区間分けされて、適当な単語毎に辞書デー
タに登録されているデータパターンとの照合が行われ
る。The spectrum S(f, t) of the voice data produced by the subtraction unit 35 is output to the voice recognition unit 15. As described above, the voice recognition unit 15 extracts acoustic features from the spectrum S(f, t) to obtain time-series data, divides the time-series data into sections, and matches each section, as a suitable word unit, against the data patterns registered in the dictionary data.

【００７０】その認識結果は、音声認識部１５から対話
制御部１６へと与えられ、対話制御部１６は、与えられ
た認識結果に基づいて音声合成部１７への応答音声の発
声指示を与え、スピーカ２２を介して「愛知県刈谷市昭
和町」と合成音声を発声させる。また、音声認識部１５
の認識結果は、制御回路５にも与えられ、制御回路５
は、「愛知県刈谷市昭和町」を目的地として案内経路の
計算を開始する。The recognition result is passed from the voice recognition unit 15 to the dialog control unit 16, which, based on that result, instructs the voice synthesis unit 17 to utter a response voice, so that the synthesized voice "Showa-cho, Kariya-shi, Aichi Prefecture" is output through the speaker 22. The recognition result of the voice recognition unit 15 is also passed to the control circuit 5, which starts calculating a guidance route with "Showa-cho, Kariya-shi, Aichi Prefecture" as the destination.

【００７１】以上のように本実施例によれば、音声用マ
イク１９及び雑音用マイク２０を、同一のマイクアッシ
ー２５内に互いに指向性が１８０度異なるように配置し
た。そして、音声用マイク１９の指向性を話者２４の方
向に設定し、フレーム分割部２９は、音声用マイク１９
及び雑音用マイク２０の入力信号に基づく音声信号デー
タ及び雑音信号データを生成し、音声区間判定装置３０
の判定部３９は、音声信号データと雑音信号データとの
短時間パワーの差分に基づいて、音声信号データにおけ
る音声の入力区間を判定するようにした。As described above, according to this embodiment, the voice microphone 19 and the noise microphone 20 are arranged in the same microphone assembly 25 so that their directivities differ by 180 degrees. The directivity of the voice microphone 19 is set toward the speaker 24, the frame dividing unit 29 generates voice signal data and noise signal data based on the input signals of the voice microphone 19 and the noise microphone 20, and the determination unit 39 of the voice section determination device 30 determines the voice input section in the voice signal data based on the difference between the short-time powers of the voice signal data and the noise signal data.

【００７２】即ち、話者２４が音声を発すると、音声用
マイク１９には話者の音声と共に雑音が入力され、指向
性が１８０度異なる雑音用マイク２０には専ら雑音が入
力されるので、夫々の入力信号データの短時間パワーの
差をとることで、両入力信号データの差を顕著に検出す
ることが可能となり、音声信号データ中において話者２
４の音声が入力されている区間を確実に判定することが
できる。そして、両マイク１９，２０の指向性を１８０
度異なるように設定することで、音声用マイク１９と雑
音用マイク２０との干渉を極力小さくすると共に、両マ
イク１９，２０に入力される信号の相関もある程度維持
することができるので、音声入力区間の判定精度を向上
させることができる。That is, when the speaker 24 speaks, the voice microphone 19 receives noise together with the speaker's voice, while the noise microphone 20, whose directivity differs by 180 degrees, receives essentially only noise. Taking the difference between the short-time powers of the two sets of input signal data therefore makes the difference between them stand out clearly, so that the section of the voice signal data in which the voice of the speaker 24 is input can be determined reliably. Furthermore, setting the directivities of the microphones 19 and 20 to differ by 180 degrees minimizes interference between the voice microphone 19 and the noise microphone 20 while maintaining a certain degree of correlation between the signals input to the two microphones, which improves the accuracy of the voice input section determination.

【００７３】また、本実施例によれば、音声抽出装置１
８は、音声信号データと雑音信号データとの夫々対応す
るフレーム毎のパワースペクトラムの差分に基づいて音
声データを抽出するので、音声データを含んでいるフレ
ームから雑音データのスペクトラム包絡を差し引くこと
によって音声データを精度良く抽出することができる。
そして、音声区間判定装置３０による作用効果との相乗
によって、抽出精度を一層高めることができる。ひいて
は、音声認識装置９における音声の認識精度を向上させ
ることが可能となる。Further, according to this embodiment, the voice extraction device 18 extracts the voice data based on the difference between the power spectra of corresponding frames of the voice signal data and the noise signal data. Subtracting the spectral envelope of the noise data from the frames containing the voice data allows the voice data to be extracted accurately. Combined with the effect of the voice section determination device 30, this further increases the extraction accuracy and, in turn, improves the voice recognition accuracy of the voice recognition device 9.

【００７４】また、入力用マイク６０を構成するマイク
アッシー２５の形状を、見かけ上は音声用マイク１９と
雑音用マイク２０とが何れも話者２４の方向を指向して
いるように形成したので、作業者は、音声用マイク１９
が話者２４の方向を指向するように入力用マイク６０の
取り付けを行えば、雑音用マイク２０の指向性は自ずと
音声用マイク１９とは異なるように設定される。そし
て、両マイク１９，２０をマイクアッシー２５内に一体
に配置することも加えて、マイク１９，２０の設置作業
を容易に行うことができるようになり作業性が向上す
る。また、配線の引き回しなども容易に行うことができ
る。Moreover, because the microphone assembly 25 constituting the input microphone 60 is shaped so that both the voice microphone 19 and the noise microphone 20 appear to be directed toward the speaker 24, the installer need only mount the input microphone 60 so that the voice microphone 19 is directed toward the speaker 24, and the directivity of the noise microphone 20 is then automatically set to differ from that of the voice microphone 19. Together with the fact that the two microphones 19 and 20 are arranged integrally in the microphone assembly 25, this makes the microphones 19 and 20 easy to install and improves workability. Wiring can also be routed easily.

【００７５】更に、本実施例によれば、音声用マイク１
９，雑音用マイク２０，フレーム分割部２９を、音声抽
出装置１８と音声区間判定装置３０とで共通化したの
で、全体を小形に構成することができる。Furthermore, according to this embodiment, the voice microphone 19, the noise microphone 20, and the frame dividing unit 29 are shared by the voice extraction device 18 and the voice section determination device 30, so the overall configuration can be made compact.

【００７６】加えて、カーナビ１は、音声認識装置９に
よって認識された話者２４の音声データパターンに基づ
いて所定の処理を実行するので、自動車の運転中などに
おいて雑音レベルが比較的高い環境下であっても、その
雑音の影響を極力排除して話者２４が発した音声のデー
タパターンを確実に認識することが可能であるから、話
者２４が所望した処理を確実に実行することができる。
そして、話者２４たる運転者は、ステアリングコラム２
６に設置されたＰＴＴスイッチ２１を操作して発声する
だけで所望の操作を行うことができるので、自動車の運
転を安全に行うことができる。In addition, the car navigation system 1 executes predetermined processing based on the voice data pattern of the speaker 24 recognized by the voice recognition device 9. Even in an environment with a relatively high noise level, such as while driving, the influence of the noise is eliminated as far as possible and the data pattern of the voice uttered by the speaker 24 can be recognized reliably, so the processing desired by the speaker 24 can be executed reliably. Moreover, the driver, who is the speaker 24, can perform the desired operation simply by operating the PTT switch 21 installed on the steering column 26 and speaking, and can therefore drive the automobile safely.

【００７７】（第２実施例）図９は本発明の第２実施例
を示すものであり、第２実施例では、マイクアッシー
（筐体）２５ａが第１実施例におけるマイクアッシー２
５と若干異なっている。即ち、マイクアッシー２５ａの
内部において、雑音用マイク２０の指向性の中心は、話
者２４の方向に対して水平に略９０度異なる方向、即
ち、助手席側のドア（図示せず）方向に向けられてい
る。尚、マイクアッシー２５ａの外観は、図３に示すマ
イクアッシー２５と同様に形成されている。以上が入力
用マイク６０ａを構成している。(Second Embodiment) FIG. 9 shows a second embodiment of the present invention. In the second embodiment, the microphone assembly (housing) 25a differs slightly from the microphone assembly 25 of the first embodiment. Specifically, inside the microphone assembly 25a, the center of the directivity of the noise microphone 20 points in a direction differing horizontally by approximately 90 degrees from the direction of the speaker 24, that is, toward the passenger-side door (not shown). The exterior of the microphone assembly 25a is formed in the same way as that of the microphone assembly 25 shown in FIG. 3. These elements constitute the input microphone 60a.

【００７８】即ち、マイクアッシー２５ａをステアリン
グコラム２６に設置した場合には、運転席側のドア方向
から到来したすれ違う車両の走行音が音声用マイク１９
に混入することも考えられる。また、防音壁が設けられ
ている道路を走行する場合には、助手席側のドア方向か
ら運転中の自動車の走行音が反射して到来することもあ
る。従って、後者のような走行状態の頻度が比較的高い
と想定される場合は、図９に示すように、雑音用マイク
２０を助手席側のドア方向を指向するように配置する。
また、前者のような走行状態の頻度が比較的高いと想定
される場合には、雑音用マイク２０を運転席側のドア方
向を指向するように配置すると良い。When the microphone assembly 25a is installed on the steering column 26, the running noise of passing vehicles arriving from the direction of the driver-side door may enter the voice microphone 19. When traveling on a road lined with sound barriers, the running noise of the vehicle itself may also be reflected and arrive from the direction of the passenger-side door. Accordingly, if the latter driving condition is expected to occur relatively often, the noise microphone 20 is arranged so as to be directed toward the passenger-side door, as shown in FIG. 9. If the former driving condition is expected to occur relatively often, the noise microphone 20 is preferably arranged so as to be directed toward the driver-side door.

【００７９】ここで、音声用マイク１９に入力される音
声信号成分をＳ（ｆ），雑音信号成分をＮ１（ｆ）と
し、雑音用マイク２０に入力される雑音信号成分をＮ２
（ｆ）として単純な加減算モデルで考えると、音声入力
区間を判定する場合、または、音声データを抽出する場
合の差分量Ｄ（ｆ）は、Ｄ（ｆ）＝（Ｓ（ｆ）＋Ｎ１（ｆ））−α（ｆ）・Ｎ２（ｆ） …（２）と表すことができる。但し、係数α（ｆ）は、音声信号
成分Ｓ（ｆ）が音声用マイク１９に入力されていない状
態で、差分量Ｄ（ｆ）＝０（即ち、Ｎ１（ｆ）＝α
（ｆ）・Ｎ２（ｆ））となるように調整するための係数
である。Let S(f) be the voice signal component and N1(f) the noise signal component input to the voice microphone 19, and let N2(f) be the noise signal component input to the noise microphone 20. Under a simple additive model, the difference amount D(f) used when determining the voice input section or extracting the voice data can be expressed as D(f) = (S(f) + N1(f)) − α(f)·N2(f) … (2). Here, the coefficient α(f) is adjusted so that the difference amount D(f) = 0 (that is, N1(f) = α(f)·N2(f)) while no voice signal component S(f) is being input to the voice microphone 19.
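One way to picture equation (2) and the role of α(f) is the Python sketch below. It assumes that α(f) is obtained per frequency bin as the ratio N1(f) / N2(f) measured while nobody is speaking; the specification says only that α(f) is adjusted so that D(f) = 0 in that state, so this calibration procedure and the names `calibrate_alpha` and `difference_amount` are assumptions.

```python
def calibrate_alpha(n1_spectrum, n2_spectrum):
    """Per-bin alpha(f) = N1(f) / N2(f), measured while nobody is speaking."""
    # Bins where N2(f) is zero get alpha = 0.0 (an assumption to avoid division by zero).
    return [n1 / n2 if n2 > 0 else 0.0 for n1, n2 in zip(n1_spectrum, n2_spectrum)]


def difference_amount(voice_mic_spectrum, noise_mic_spectrum, alpha):
    """Equation (2): D(f) = (S(f) + N1(f)) - alpha(f) * N2(f)."""
    return [x - a * n for x, a, n in zip(voice_mic_spectrum, alpha, noise_mic_spectrum)]
```

With α(f) calibrated this way, D(f) is zero in silence and equals S(f) once the speaker's voice is added to the voice microphone signal, provided the noise itself has not changed.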

【００８０】実際に短時間パワーの差分量を演算する場
合には、事前に係数α（ｆ）を設定して（２）式のよう
に演算するため、雑音用マイク２０に入力する信号につ
いては、音声信号成分Ｓ（ｆ）を極力ゼロとして雑音信
号成分Ｎ２（ｆ）のレベルを高く得ること、即ち、Ｓ
（ｆ），Ｎ２（ｆ）間のダイナミックレンジがより大き
く（Ｓ（ｆ）＜＜Ｎ２（ｆ））なることが好ましい。When the difference in short-time power is actually computed, the coefficient α(f) is set in advance and equation (2) is evaluated. For the signal input to the noise microphone 20, it is therefore preferable that the voice signal component S(f) be as close to zero as possible and the level of the noise signal component N2(f) be high, that is, that the dynamic range between S(f) and N2(f) be as large as possible (S(f) << N2(f)).

【００８１】更に、換言すれば、雑音用マイク２０側に
おける前記ダイナミックレンジを最大にすることは、雑
音信号の差分量Ｄｎ（ｆ）を、Ｄｎ（ｆ）＝Ｎ２（ｆ）−Ｎ１（ｆ） …（３）と定義すると、Ｄｎ（ｆ）を最大にすることに等しい。
また、この場合、Ｎ２（ｆ）とＮ１（ｆ）との比をＲｎ
（ｆ）と定義して、Ｒｎ（ｆ）＝Ｎ２（ｆ）／Ｎ１（ｆ） …（４）Ｒｎ（ｆ）を最大にすることは、Ｄｎ（ｆ）を最大にす
ることと同義である。Put another way, if the difference amount Dn(f) of the noise signals is defined as Dn(f) = N2(f) − N1(f) … (3), maximizing this dynamic range on the noise microphone 20 side is equivalent to maximizing Dn(f). Likewise, if the ratio of N2(f) to N1(f) is defined as Rn(f) = N2(f) / N1(f) … (4), maximizing Rn(f) is synonymous with maximizing Dn(f).
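The link between equations (3) and (4) can be made explicit with one line of algebra. The step below assumes that N1(f) is positive and is treated as fixed while the directivity of the noise microphone 20 is varied; the specification does not state this assumption.

```latex
D_n(f) = N_2(f) - N_1(f)
       = N_1(f)\left(\frac{N_2(f)}{N_1(f)} - 1\right)
       = N_1(f)\,\bigl(R_n(f) - 1\bigr)
```

Under that assumption, Dn(f) increases monotonically with Rn(f), so the two maximization criteria select the same microphone orientation.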

【００８２】その結果、雑音用マイク２０側において
は、（Ｎ１（ｆ）と相似である）雑音信号成分Ｎ２
（ｆ）に関する情報を高精度で得ることができるため、
音声入力区間の判定を行う際に（２）式を演算すると、
音声用マイク１９側に音声信号成分Ｓ（ｆ）と共に入力
される雑音信号成分をＮ１（ｆ）を効果的に抑圧し得
て、差分量Ｄ（ｆ）として音声信号成分Ｓ（ｆ）を良好
に抽出することができるようになる。As a result, information on the noise signal component N2(f) (which is similar to N1(f)) can be obtained with high accuracy on the noise microphone 20 side. When equation (2) is evaluated to determine the voice input section, the noise signal component N1(f) that is input to the voice microphone 19 together with the voice signal component S(f) can therefore be suppressed effectively, and the voice signal component S(f) can be extracted well as the difference amount D(f).

【００８３】従って、以上のように構成された第２実施
例によれば、雑音用マイク２０を助手席側のドア方向を
指向するように配置することで、前記ドア方向から到来
する自動車の走行音などの雑音信号を、高いレベルで雑
音用マイク２０に入力することができるようになる。そ
して、音声区間判定装置３０の減算器３８または音声抽
出装置１８のサブトラクト部３５において、音声入力区
間の判定及び音声データの抽出を高精度で行うことがで
きる。According to the second embodiment configured as described above, arranging the noise microphone 20 so as to be directed toward the passenger-side door allows noise signals arriving from that direction, such as the running noise of the automobile, to be input to the noise microphone 20 at a high level. The subtracter 38 of the voice section determination device 30 and the subtraction unit 35 of the voice extraction device 18 can then determine the voice input section and extract the voice data with high accuracy.

【００８４】（第３実施例）図１０は本発明の第３実施
例を示すものであり、第３実施例では、マイクアッシー
（筐体）２５ｂが、第２実施例と同様、第１実施例にお
けるマイクアッシー２５と若干異なっている。即ち、マ
イクアッシー２５ｂの内部において、雑音用マイク２０
の指向性の中心は、話者２４の方向に対して垂直に略９
０度異なる方向、即ち、運転室の天井方向に向けられて
いる。尚、マイクアッシー２５ｂの外観は、図３に示す
マイクアッシー２５と同様に形成されている。以上が入
力用マイク６０ｂを構成している。(Third Embodiment) FIG. 10 shows a third embodiment of the present invention. In the third embodiment, as in the second embodiment, the microphone assembly (housing) 25b differs slightly from the microphone assembly 25 of the first embodiment. Specifically, inside the microphone assembly 25b, the center of directivity of the noise microphone 20 is oriented in a direction that differs by approximately 90 degrees from, and is perpendicular to, the direction of the speaker 24, that is, toward the ceiling of the cabin. The appearance of the microphone assembly 25b is the same as that of the microphone assembly 25 shown in FIG. 3. The above constitutes the input microphone 60b.

【００８５】即ち、雑音環境によっては、雑音用マイク
２０の指向性を斯様に設定することが望ましい場合も想
定される。従って、以上のように構成された第２実施例
によっても、第１または第２実施例と同様の効果を得る
ことができる。That is, depending on the noise environment, there may be cases in which it is desirable to set the directivity of the noise microphone 20 in this way. Therefore, the third embodiment configured as described above also provides the same effects as the first or second embodiment.

【００８６】（第４実施例）図１１及び図１２は本発明
の第４実施例を示すものであり、第１実施例と同一部分
には同一符号を付して説明を省略し、以下異なる部分に
ついてのみ説明する。第４実施例におけるマイクアッシ
ー（筐体）２５ｃの内部にはアクチュエータ（指向性変
化機構）４２が配置されており、音声用マイク１９及び
雑音用マイク２０は、アクチュエータ４２によって駆動
され水平方向に回動可能に構成されている。以上が入力
用マイク６０ｃを構成している。アクチュエータ４２
は、駆動制御部（指向性制御手段）４３により制御され
るようになっており、駆動制御部４３は、音声区間判定
装置３０における減算器（差分量検出手段）３８からの
減算結果に基づいてアクチュエータ４２の駆動制御を行
うようになっている。(Fourth Embodiment) FIGS. 11 and 12 show a fourth embodiment of the present invention. Parts identical to those of the first embodiment are denoted by the same reference numerals and their description is omitted; only the differences are described below. In the fourth embodiment, an actuator (directivity changing mechanism) 42 is arranged inside the microphone assembly (housing) 25c, and the voice microphone 19 and the noise microphone 20 are driven by the actuator 42 so as to be rotatable in the horizontal direction. The above constitutes the input microphone 60c. The actuator 42 is controlled by a drive control unit (directivity control means) 43, which controls the driving of the actuator 42 based on the subtraction result from the subtractor (difference amount detecting means) 38 in the voice section determination device 30.

【００８７】次に、第４実施例の作用について説明す
る。音声認識装置９の制御部２３は、ＰＴＴスイッチ２
１がＯＮされて音声用マイク１９に話者２４の音声が入
力される以前に、音声区間判定装置３０に指示を与え
て、（３）式において示した雑音信号成分Ｎ１（ｆ）及
びＮ２（ｆ）の短時間パワーを夫々パワー算出部３６及
び３７により計算して両者の差分量Ｄｎ（ｆ）を求めさ
せ、その差分量Ｄｎ（ｆ）を駆動制御部４３に出力させ
るようにする。Next, the operation of the fourth embodiment will be described. Before the PTT switch 21 is turned ON and the voice of the speaker 24 is input to the voice microphone 19, the control unit 23 of the voice recognition device 9 instructs the voice section determination device 30 to calculate the short-time powers of the noise signal components N1(f) and N2(f) shown in equation (3) with the power calculation units 36 and 37, respectively, to obtain their difference Dn(f), and to output that difference Dn(f) to the drive control unit 43.
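
The measurement performed before the PTT switch is pressed can be sketched as follows. This is only an illustration: short-time power is assumed here to be the mean squared amplitude of one frame, and the function names are hypothetical.

```python
def short_time_power(frame):
    """Mean squared amplitude of one frame of samples (assumed definition)."""
    return sum(x * x for x in frame) / len(frame)

def noise_difference(voice_mic_frame, noise_mic_frame):
    """Dn = N2 - N1: noise-microphone power minus voice-microphone power.

    Both frames are assumed to be captured while the speaker is silent,
    so that each contains noise only.
    """
    return short_time_power(noise_mic_frame) - short_time_power(voice_mic_frame)
```

Because both frames are taken while no speech is present, the result reflects only how differently the two microphones pick up the ambient noise.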

【００８８】そして、その処理を複数回繰り返して行う
ことで、駆動制御部４３は、差分量Ｄｎ（ｆ）の値が最
大となるように、アクチュエータ４２を介して音声用マ
イク１９及び雑音用マイク２０の指向性を相対的に変化
させる。その場合、例えば、音声用マイク１９の指向性
は話者２４の方向に固定させておき、雑音用マイク２０
を回動させてＤｎ（ｆ）の値が最大となる位置で停止さ
せる。次に、雑音用マイク２０の位置を固定して、音声
用マイク１９の指向性を話者２４方向を中心とする一定
範囲内で回動させてＤｎ（ｆ）の値が最大となる位置で
停止させるようにする。尚、後者の処理は省略しても良
い。The drive control unit 43 then repeats this process a plurality of times and, via the actuator 42, changes the directivities of the voice microphone 19 and the noise microphone 20 relative to each other so that the value of the difference Dn(f) is maximized. In doing so, for example, the directivity of the voice microphone 19 is first kept fixed toward the speaker 24 while the noise microphone 20 is rotated and stopped at the position where Dn(f) is maximized. Next, the position of the noise microphone 20 is fixed, and the directivity of the voice microphone 19 is rotated within a certain range centered on the direction of the speaker 24 and stopped at the position where Dn(f) is maximized. The latter step may be omitted.
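
The repeated adjustment can be pictured as a simple sweep over candidate angles. The sketch below is an assumption about how such a search could be organized, not the control law of the drive control unit 43; `measure_dn` is a hypothetical callback that rotates the microphone to the given angle and returns the measured Dn.

```python
def find_best_angle(measure_dn, candidate_angles):
    """Return (angle, Dn) for the candidate angle with the largest measured Dn."""
    best_angle, best_dn = None, float("-inf")
    for angle in candidate_angles:
        dn = measure_dn(angle)  # hypothetical: move the actuator, then measure Dn
        if dn > best_dn:
            best_angle, best_dn = angle, dn
    return best_angle, best_dn
```

For the two-stage procedure in the text, the same function would be called once for the noise microphone and then, optionally, once more for the voice microphone over a narrower range of angles.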

【００８９】以上のように第４実施例によれば、駆動制
御部４３は、減算器３８によって得られる雑音信号の差
分量Ｄｎ（ｆ）が最大となるように、アクチュエータ４
２によって音声用マイク１９と雑音用マイク２０との指
向性を相対的に変化させるように制御する。As described above, according to the fourth embodiment, the drive control unit 43 controls the actuator 42 to change the directivities of the voice microphone 19 and the noise microphone 20 relative to each other so that the noise signal difference Dn(f) obtained by the subtractor 38 is maximized.

【００９０】従って、実際の様々に異なる使用環境に応
じて、前記差分量Ｄｎ（ｆ）が最大となる状態に調整す
ることで、前述したように音声信号Ｓ（ｆ）を高精度で
抽出することが可能となり、音声区間判定装置３０は、
音声入力区間を高精度で判定することができる。また、
音声抽出装置１８における音声データの抽出精度も向上
させることが可能となる、加えて、音声認識装置９にお
ける音声認識精度を向上させることもできる。Accordingly, by adjusting to the state in which the difference Dn(f) is maximized for each of the various actual usage environments, the voice signal S(f) can be extracted with high accuracy as described above, and the voice section determination device 30 can determine the voice input section with high accuracy. In addition, the accuracy of voice data extraction in the voice extraction device 18 can be improved, as can the voice recognition accuracy of the voice recognition device 9.

【００９１】（第５実施例）図１３は本発明の第５実施
例を示すものであり、第１実施例と同一部分には同一符
号を付して説明を省略し、以下異なる部分についてのみ
説明する。第５実施例においては、複数個の入力用マイ
ク６０（１），６０（２），…，６０（ｎ）を用いる構
成であり、それら複数個の入力用マイク６０は、ステア
リングコラム２６を含む運転席の複数箇所に配置されて
いる。但し、何れも、音声用マイク１９の指向性は話者
２４の方向を向くように配置される。(Fifth Embodiment) FIG. 13 shows a fifth embodiment of the present invention. Parts identical to those of the first embodiment are denoted by the same reference numerals and their description is omitted; only the differences are described below. The fifth embodiment uses a plurality of input microphones 60(1), 60(2), ..., 60(n), which are arranged at a plurality of locations around the driver's seat, including the steering column 26. In every case, the voice microphone 19 is arranged so that its directivity faces the speaker 24.

【００９２】各入力用マイク６０に入力される音響信号
は、マイクセレクタ４４を介してフレーム分割部２９に
与えられるようになっており、選択制御部（音声入力区
間判定手段）４５は、音声区間判定装置３０における減
算器３８からの減算結果に基づいてマイクセレクタ４４
にセレクト信号を出力するようになっている。そして、
マイクセレクタ４４は、与えられたセレクト信号に対応
する入力用マイク６０に入力されている音響信号をフレ
ーム分割部２９に出力するようになっている。The acoustic signal input to each input microphone 60 is supplied to the frame division unit 29 via a microphone selector 44. A selection control unit (voice input section determining means) 45 outputs a select signal to the microphone selector 44 based on the subtraction result from the subtractor 38 in the voice section determination device 30. The microphone selector 44 then outputs to the frame division unit 29 the acoustic signal being input to the input microphone 60 that corresponds to the given select signal.

【００９３】次に、第５実施例の作用について説明す
る。音声認識装置９の制御部２３は、第４実施例と同様
に、ＰＴＴスイッチ２１が音声用マイク１９に話者２４
の音声が入力される前に、選択制御部４５に指示を与え
て複数個の入力用マイク６０（１），６０（２），…，
６０（ｎ）の夫々より入力されている音響信号をフレー
ム分割部２９に順次切換えて出力させる。Next, the operation of the fifth embodiment will be described. As in the fourth embodiment, before the PTT switch 21 is turned ON and the voice of the speaker 24 is input to the voice microphone 19, the control unit 23 of the voice recognition device 9 instructs the selection control unit 45 to switch sequentially among the acoustic signals input from the input microphones 60(1), 60(2), ..., 60(n) and output each to the frame division unit 29.

【００９４】すると、音声区間判定装置３０は、各入力
用マイク６０によって得られる雑音信号成分Ｎ１（ｆ）
及びＮ２（ｆ）の短時間パワーを夫々パワー算出部３６
及び３７により計算して両者の差分量Ｄｎ（ｆ）を求め
選択制御部４５に出力する。そして、選択制御部４５
は、その結果に基づいて差分量Ｄｎ（ｆ）の値が最大と
なる入力用マイク６０を判定し、以降の音声入力区間判
定及び音声データ抽出に用いるようにする。The voice section determination device 30 then calculates, with the power calculation units 36 and 37, the short-time powers of the noise signal components N1(f) and N2(f) obtained by each input microphone 60, obtains their difference Dn(f), and outputs it to the selection control unit 45. Based on these results, the selection control unit 45 determines the input microphone 60 for which the difference Dn(f) is largest and uses it for the subsequent voice input section determination and voice data extraction.
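
The selection step amounts to picking the pair with the largest measured difference. A minimal sketch, assuming the Dn values for the n input microphones have already been collected into a list (the function name and the list representation are hypothetical):

```python
def select_microphone_pair(dn_by_pair):
    """Return the index of the microphone pair with the largest Dn.

    dn_by_pair[i] is the noise difference Dn measured for input microphone i.
    If several pairs tie, the first one is returned.
    """
    if not dn_by_pair:
        raise ValueError("at least one microphone pair is required")
    return max(range(len(dn_by_pair)), key=lambda i: dn_by_pair[i])
```

The returned index would correspond to the select signal sent to the microphone selector 44.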

【００９５】以上のように第５実施例によれば、選択制
御部４５は、複数個の入力用マイク６０の内、減算器３
８によって検出される差分量Ｄｎ（ｆ）が最大となるも
のを選択するので、音声区間判定装置３０は、実際の使
用環境に応じて、雑音信号の差分量Ｄｎ（ｆ）が最大と
なる入力用マイク６０（即ち、音声用マイク１９と雑音
用マイク２０との対）より得られる入力信号に基づい
て、音声入力区間を高精度で判定することができる。ま
た、音声抽出装置１８及び音声認識装置９における音声
データの抽出及び音声認識の精度を向上させることもで
きる。As described above, according to the fifth embodiment, the selection control unit 45 selects, from among the plurality of input microphones 60, the one for which the difference Dn(f) detected by the subtractor 38 is largest. The voice section determination device 30 can therefore determine the voice input section with high accuracy, based on the input signal obtained from the input microphone 60 (that is, the pair of the voice microphone 19 and the noise microphone 20) for which the noise signal difference Dn(f) is largest in the actual usage environment. The accuracy of voice data extraction and voice recognition in the voice extraction device 18 and the voice recognition device 9 can also be improved.

【００９６】（第６実施例）図１４及び図１５は本発明
の第６実施例を示すものであり、第１実施例と同一部分
には同一符号を付して説明を省略し、以下異なる部分に
ついてのみ説明する。図１４に示すように、第６実施例
における音声抽出装置４６では、第１実施例のフレーム
分割部２９が音響分析部（音声信号データ生成手段，雑
音信号データ生成手段）４７に置き換えれらており、音
声区間判定装置３０が音声区間判定装置４８に置き換え
られている。(Sixth Embodiment) FIGS. 14 and 15 show a sixth embodiment of the present invention. Parts identical to those of the first embodiment are denoted by the same reference numerals and their description is omitted; only the differences are described below. As shown in FIG. 14, in the voice extraction device 46 of the sixth embodiment, the frame division unit 29 of the first embodiment is replaced by an acoustic analysis unit (voice signal data generating means, noise signal data generating means) 47, and the voice section determination device 30 is replaced by a voice section determination device 48.

【００９７】音響分析部４７は、フレーム分割部２９の
機能に加えて、窓関数がかけられた各フレームデータを
ＦＦＴする機能をも具備したものである。そして、バッ
ファ３１，３２及びフーリエ変換部３３，３４は削除さ
れており、音声区間判定装置（音声入力区間判定装置）
４８から出力されるデータがサブトラクト部３５に直接
入力されるようになっている。In addition to the function of the frame division unit 29, the acoustic analysis unit 47 also has a function of applying an FFT to each frame of data to which a window function has been applied. The buffers 31 and 32 and the Fourier transform units 33 and 34 are omitted, and the data output from the voice section determination device (voice input section determination device) 48 is input directly to the subtract unit 35.

【００９８】また、図１５に示すように、音声区間判定
装置４８は、音声区間判定装置３０のパワー算出部３６
及び３７を、パワースペクトラム算出部（音声信号デー
タ生成手段及び雑音信号データ生成手段）４９及び５０
に置き換えたものである。As shown in FIG. 15, the voice section determination device 48 is obtained by replacing the power calculation units 36 and 37 of the voice section determination device 30 with power spectrum calculation units (voice signal data generating means and noise signal data generating means) 49 and 50.

【００９９】次に、第６実施例の作用について説明す
る。第１実施例と同様に、話者２４が目的地を発声して
入力すると、音響分析部４７は、マイク１９，２０より
入力される音響信号をサンプリングしてフレーム毎に分
割し、窓関数を乗じてＦＦＴを行い、振幅スペクトラム
として得られた音声信号データ及び雑音信号データを音
声区間判定装置４８に出力する。Next, the operation of the sixth embodiment will be described. As in the first embodiment, when the speaker 24 utters a destination as input, the acoustic analysis unit 47 samples the acoustic signals input from the microphones 19 and 20, divides them into frames, multiplies each frame by a window function, performs an FFT, and outputs the voice signal data and noise signal data obtained as amplitude spectra to the voice section determination device 48.

【０１００】そして、音声区間判定装置４８は、パワー
スペクトラム算出部４９及び５０において、与えられた
音声信号データ及び雑音信号データについてフレーム毎
にパワースペクトラムを計算すると、減算器３８により
両短時間パワースペクトラムの差分をとる。音声区間判
定装置４８の判定部（音声入力区間判定手段）３９ａ
は、その差分の結果、音声に特有のスペクトラム包絡が
残留しているフレームの音声信号データ及び雑音信号デ
ータをサブトラクト部３５に出力する。The voice section determination device 48 then calculates, in the power spectrum calculation units 49 and 50, the power spectrum of the supplied voice signal data and noise signal data for each frame, and the subtractor 38 takes the difference between the two short-time power spectra. Based on that difference, the determination unit (voice input section determining means) 39a of the voice section determination device 48 outputs to the subtract unit 35 the voice signal data and noise signal data of those frames in which a spectral envelope characteristic of speech remains.

【０１０１】そして、サブトラクト部３５は、音声区間
判定装置４８を介して振幅スペクトラムとして与えられ
た音声信号データ及び雑音信号データについて、第１実
施例と同様にスペクトラムサブトラクション方式を用い
ることで、音声信号データから雑音成分の除去を行う。The subtract unit 35 then removes the noise component from the voice signal data by applying the spectral subtraction method, as in the first embodiment, to the voice signal data and noise signal data supplied as amplitude spectra via the voice section determination device 48.

【０１０２】以上のように第６実施例によれば、音声区
間判定装置４８は、音声信号データと雑音信号データと
の所定周波数範囲内におけるパワースペクトラムの差分
に基づいて音声の入力区間を判定するようにした。即
ち、音声データを含んでいるフレームについては音声デ
ータに特徴的なスペクトラム包絡が現れることから、そ
のような特徴的なスペクトラム包絡を含んでいる部分を
検出して音声入力区間を容易に判定することができる。
また、差分をとる場合には、音声データに特徴的なスペ
クトラム包絡が現れる周波数範囲（所定周波数範囲，例
えば、０〜６ｋＨｚ程度）で行えば良い。As described above, according to the sixth embodiment, the voice section determination device 48 determines the voice input section based on the difference between the power spectra of the voice signal data and the noise signal data within a predetermined frequency range. That is, because a spectral envelope characteristic of voice data appears in frames that contain voice data, the voice input section can be determined easily by detecting the portions that contain such a characteristic spectral envelope. The difference need only be taken over the frequency range in which the spectral envelope characteristic of voice data appears (the predetermined frequency range, for example about 0 to 6 kHz).
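
The band-limited power spectrum difference of the sixth embodiment can be sketched as below. This is a simplified illustration under several assumptions: a Hamming window stands in for the unspecified window function, a naive DFT replaces the FFT for brevity, and the per-bin differences are simply summed rather than tested for a speech-like spectral envelope. All function names and the default upper limit `f_max` are hypothetical.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n (an assumed choice of window function)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(frame):
    """Window one frame and return |X(k)|^2 for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def band_difference(voice_frame, noise_frame, sample_rate, f_max=6000.0):
    """Sum of (voice - noise) power spectrum differences for bins up to f_max Hz."""
    pv = power_spectrum(voice_frame)
    pn = power_spectrum(noise_frame)
    n = len(voice_frame)
    total = 0.0
    for k, (a, b) in enumerate(zip(pv, pn)):
        if k * sample_rate / n <= f_max:
            total += a - b
    return total
```

A frame could then be treated as a speech candidate when this value exceeds some threshold; the threshold and the exact decision rule are not specified in the text.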

【０１０３】本発明は上記し且つ図面に記載した実施例
にのみ限定されるものではなく、次のような変形または
拡張が可能である。第１実施例において、入力用マイク
６０の取り付け、ステアリングコラム２６に限らず、運
転席側のドアやダッシュボードの上などに取り付けても
良い。音声用マイク１９と雑音用マイク２０との指向性
の関係は、第１乃至第３実施例において示したものに限
ることなく、要は雑音用マイク２０の指向性が音声用マ
イク１９の指向性と異なるように配置すれば良い。
（２）式で表した差分量Ｄ（ｆ）について、係数αをス
カラ量として（５）式で表現する場合、Ｄ（ｆ）＝（Ｓ（ｆ）＋Ｎ１（ｆ））−α・Ｎ２（ｆ） …（５）係数αを、Ｓ（ｆ）＝０の場合に、Σ｜Ｄ（ｆ）｜を最
小とするように決定したり、或いは、ΣＤ（ｆ）^２を最
小とするように決定しても良い。The present invention is not limited to the embodiments described above and shown in the drawings, and the following modifications or extensions are possible. In the first embodiment, the mounting location of the input microphone 60 is not limited to the steering column 26; it may also be mounted on the driver's-side door, on top of the dashboard, or the like. The relationship between the directivities of the voice microphone 19 and the noise microphone 20 is not limited to those shown in the first to third embodiments; the point is simply that the noise microphone 20 be arranged so that its directivity differs from that of the voice microphone 19.
When the difference D(f) given by equation (2) is expressed by equation (5) with a scalar coefficient α,
D(f) = (S(f) + N1(f)) - α·N2(f) ... (5)
the coefficient α may be determined so as to minimize Σ|D(f)| when S(f) = 0, or so as to minimize ΣD(f)^2.
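
For the second criterion, the minimizing coefficient has a closed form. With S(f) = 0, equation (5) reduces to D(f) = N1(f) - α·N2(f), and setting the derivative of ΣD(f)^2 with respect to α to zero gives α = ΣN1(f)N2(f) / ΣN2(f)^2. The sketch below computes this value; the function name is hypothetical, and the text does not say how α is actually obtained.

```python
def least_squares_alpha(n1, n2):
    """Scalar alpha minimizing sum((n1[f] - alpha * n2[f]) ** 2) over all bins f.

    n1: noise spectrum at the voice microphone, measured with no speech present.
    n2: noise spectrum at the noise microphone for the same frame.
    """
    denom = sum(b * b for b in n2)
    if denom == 0.0:
        raise ValueError("noise microphone spectrum is all zeros")
    return sum(a * b for a, b in zip(n1, n2)) / denom
```

Minimizing Σ|D(f)| instead has no such closed form in general and would need a one-dimensional search over α.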

【０１０４】第３実施例において、アクチュエータ４２
は、マイクアッシー２５ｃの外部に配置しても良い。第
４実施例において、マイク１９，２０の指向性を相対的
に変化させる場合に垂直方向に変位させるようにしても
良いし、水平，垂直方向の変位を組み合わせるようにし
ても良い。また、第４実施例では、指向性の調整を駆動
制御部４３によって自動的に行うようにしたが、調整を
マニュアルで行うように構成しても良い。また、第４実
施例において、差分量検出手段としての減算器３８に代
えて、サブトラクト部（差分量検出手段）３５において
雑音信号の差分量Ｄｎ（ｆ）を求め、駆動制御部４３
は、サブトラクト部３５より得られる差分量Ｄｎ（ｆ）
に基づいて同様の制御を行っても良い。In the fourth embodiment, the actuator 42 may be arranged outside the microphone assembly 25c. Also in the fourth embodiment, when the directivities of the microphones 19 and 20 are changed relative to each other, they may be displaced in the vertical direction, or horizontal and vertical displacements may be combined. Although in the fourth embodiment the directivity is adjusted automatically by the drive control unit 43, the adjustment may instead be performed manually. Furthermore, in the fourth embodiment, the noise signal difference Dn(f) may be obtained by the subtract unit (difference amount detecting means) 35 instead of the subtractor 38 serving as the difference amount detecting means, and the drive control unit 43 may perform the same control based on the difference Dn(f) obtained from the subtract unit 35.

【０１０５】第４実施例と第５実施例とを組み合わせ
て、入力用マイク６０ｃを複数個配置して、差分量Ｄｎ
（ｆ）が最大となるものを用いても良い。第６実施例に
おいて、音声区間判定装置４８において、パワースペク
トラムの代わりに振幅スペクトラムの差分を求めて音声
区間の判定を行っても良い。その場合、音声抽出部４６
は、その差分の結果を直接音声データ抽出に利用しても
よい。音声区間判定装置３０，４８を音声認識装置９，
音声抽出装置１８と独立に構成しても良い。また、音声
抽出装置１８を音声認識装置９と独立に構成しても良
い。更に、音声区間判定装置３０，４８は、音声認識装
置９に用いるものに限ることなく、入力信号系列の何れ
の部分に音声データが含まれているかを判定するものに
適用できる。音声認識装置９としては、対話制御部１
６，音声合成部１７及びスピーカ２２は、必ずしも必要
ではない。カーナビゲーション装置１に限ることなく、
音声入力を行うものであれば適用が可能である。The fourth and fifth embodiments may be combined: a plurality of input microphones 60c may be arranged, and the one for which the difference Dn(f) is largest may be used. In the sixth embodiment, the voice section determination device 48 may determine the voice section from the difference between amplitude spectra instead of power spectra. In that case, the voice extraction device 46 may use the result of that difference directly for voice data extraction. The voice section determination devices 30 and 48 may be configured independently of the voice recognition device 9 and the voice extraction device 18. The voice extraction device 18 may also be configured independently of the voice recognition device 9. Furthermore, the voice section determination devices 30 and 48 are not limited to use in the voice recognition device 9 and can be applied to anything that determines which part of an input signal sequence contains voice data. In the voice recognition device 9, the dialogue control unit 16, the voice synthesis unit 17, and the speaker 22 are not necessarily required. The invention is not limited to the car navigation device 1 and can be applied to any device that accepts voice input.

[Brief description of the drawings]

【図１】本発明をカーナビゲーション装置に適用した第
１実施例を示すものであり、カーナビゲーション装置全
体の電気的構成を示す機能ブロック図FIG. 1 illustrates a first embodiment in which the present invention is applied to a car navigation device, and is a functional block diagram illustrating an electrical configuration of the entire car navigation device.

【図２】話者に対する音声用マイク及び雑音用マイクの
指向性の設定状態を示す平面図FIG. 2 is a plan view showing a setting state of directivity of a voice microphone and a noise microphone with respect to a speaker;

【図３】入力用マイクを構成するマイクアッシーの外観
を示す図であり、（ａ）は正面図，（ｂ）は背面図，
（ｃ）は分解した状態を示す平面図FIG. 3 shows the appearance of the microphone assembly constituting the input microphone, in which (a) is a front view, (b) is a rear view, and (c) is a plan view showing a disassembled state.

【図４】入力用マイクが運転席のステアリングコラムに
配置されている状態を示す斜視図FIG. 4 is a perspective view showing a state in which an input microphone is arranged on a steering column in a driver's seat.

【図５】音声抽出装置の詳細な構成を示す機能ブロック
図FIG. 5 is a functional block diagram showing a detailed configuration of a voice extraction device.

【図６】音声区間判定装置の詳細な構成を示す機能ブロ
ック図FIG. 6 is a functional block diagram showing a detailed configuration of a voice section determination device.

【図７】音声認識装置の制御部の制御内容を示すフロー
チャートFIG. 7 is a flowchart showing control contents of a control unit of the voice recognition device.

【図８】夫々波形図であり、（ａ）はＰＴＴスイッチの
ＯＮ，ＯＦＦ状態、（ｂ）は音声用マイクの入力信号、
（ｃ）は雑音用マイクの入力信号を示すFIG. 8 shows waveform diagrams, in which (a) shows the ON/OFF state of the PTT switch, (b) shows the input signal of the voice microphone, and (c) shows the input signal of the noise microphone.

【図９】本発明の第２実施例を示す図２相当図FIG. 9 is a view corresponding to FIG. 2, showing a second embodiment of the present invention;

【図１０】本発明の第３実施例を示す図２相当図FIG. 10 is a view corresponding to FIG. 2, showing a third embodiment of the present invention;

【図１１】本発明の第４実施例を示す図２相当図FIG. 11 is a view corresponding to FIG. 2, showing a fourth embodiment of the present invention;

【図１２】図６相当図FIG. 12 is a diagram corresponding to FIG. 6;

【図１３】本発明の第５実施例を示す図６相当図FIG. 13 is a view corresponding to FIG. 6, showing a fifth embodiment of the present invention.

【図１４】本発明の第６実施例を示す図５相当図FIG. 14 is a view corresponding to FIG. 5, showing a sixth embodiment of the present invention.

【図１５】図６相当図FIG. 15 is a diagram corresponding to FIG. 6;

【図１６】従来技術における、音声入力区間の判定原理
を説明する図FIG. 16 is a diagram for explaining a principle of determining a speech input section in the related art.

[Explanation of symbols]

１はカーナビゲーション装置、１５は音声認識部（音声
認識手段），１８は音声抽出装置、１９は音声用マイ
ク、２０は雑音用マイク、２５，２５ａ，２５ｂ及び２
５ｃはマイクアッシー（筐体）、２９はフレーム分割部
（音声信号データ生成手段，雑音信号データ生成手
段）、３０は音声区間判定装置、３５はサブトラクト部
（音声データ抽出手段，差分量検出手段）、３６及び３
７はパワー算出部（音声信号データ生成手段及び雑音信
号データ生成手段）、３８は減算器（差分量検出手
段）、３９及び３９ａは判定部（音声入力区間判定手
段）、４２はアクチュエータ（指向性変化機構）、４３
は駆動制御部（指向性制御手段）、４５は選択制御部
（音声入力区間判定手段）、４６は音声抽出装置、４７
は音響分析部（音声信号データ生成手段，雑音信号デー
タ生成手段）、４８は音声区間判定装置、４９及び５０
はパワースペクトラム算出部（音声信号データ生成手段
及び雑音信号データ生成手段）、６０，６０ａ，６０ｂ
及び６０ｃは入力用マイクを示す。1 denotes a car navigation device; 15, a voice recognition unit (voice recognition means); 18, a voice extraction device; 19, a voice microphone; 20, a noise microphone; 25, 25a, 25b, and 25c, microphone assemblies (housings); 29, a frame division unit (voice signal data generating means, noise signal data generating means); 30, a voice section determination device; 35, a subtract unit (voice data extracting means, difference amount detecting means); 36 and 37, power calculation units (voice signal data generating means and noise signal data generating means); 38, a subtractor (difference amount detecting means); 39 and 39a, determination units (voice input section determining means); 42, an actuator (directivity changing mechanism); 43, a drive control unit (directivity control means); 45, a selection control unit (voice input section determining means); 46, a voice extraction device; 47, an acoustic analysis unit (voice signal data generating means, noise signal data generating means); 48, a voice section determination device; 49 and 50, power spectrum calculation units (voice signal data generating means and noise signal data generating means); and 60, 60a, 60b, and 60c, input microphones.

フロントページの続き (51)Int.Cl.⁷ 識別記号ＦＩテーマコート゛(参考）Ｇ１０Ｌ 9/00 Ｄ (72)発明者大野宏愛知県刈谷市昭和町１丁目１番地株式会社デンソー内 (72)発明者北岡教英愛知県刈谷市昭和町１丁目１番地株式会社デンソー内 (72)発明者樋口和広愛知県刈谷市昭和町１丁目１番地株式会社デンソー内Ｆターム(参考） 2F029 AA02 AB01 AB07 AB09 AC01 AC02 AC04 AC18 5D015 CC01 DD02 DD03 EE05 KK01 9A001 HH16 HH17 JJ78 Continued from the front page (51) Int.Cl.⁷ Identification symbol FI Theme code (reference) G10L 9/00 D (72) Inventor Hiroshi Ono, c/o Denso Corporation, 1-1 Showa-cho, Kariya-shi, Aichi (72) Inventor Norihide Kitaoka, c/o Denso Corporation, 1-1 Showa-cho, Kariya-shi, Aichi (72) Inventor Kazuhiro Higuchi, c/o Denso Corporation, 1-1 Showa-cho, Kariya-shi, Aichi F-terms (reference) 2F029 AA02 AB01 AB07 AB09 AC01 AC02 AC04 AC18 5D015 CC01 DD02 DD03 EE05 KK01 9A001 HH16 HH17 JJ78

Claims

[Claims]

1. A voice input section determination device comprising: a voice microphone whose directivity is set toward a speaker and which inputs the voice of the speaker; a noise microphone whose directivity is set to differ from that of the voice microphone and which inputs noise; voice signal data generating means for generating voice signal data based on an input signal of the voice microphone; noise signal data generating means for generating noise signal data based on an input signal of the noise microphone; and voice input section determining means for determining a voice input section in the voice signal data based on a difference between the voice signal data generated by the voice signal data generating means and the noise signal data generated by the noise signal data generating means.

2. The voice input section determination device according to claim 1, wherein the directivity of the noise microphone is set to be substantially 180 degrees different from the directivity of the voice microphone.

3. The voice input section determination device according to claim 1, wherein the directivity of the noise microphone is set to be different from the directivity of the voice microphone by approximately 90 degrees.

4. The voice input section determination device according to any one of claims 1 to 3, further comprising: a directivity changing mechanism for changing the directivities of the voice microphone and the noise microphone relative to each other; difference amount detecting means for detecting a difference amount based on the noise signals input to the voice microphone and the noise microphone, respectively; and directivity control means for controlling the directivity changing mechanism to change the directivities of both microphones relative to each other so that the difference amount detected by the difference amount detecting means is maximized.

5. The voice input section determination device according to any one of claims 1 to 4, comprising: a plurality of pairs of the voice microphone and the noise microphone; and difference amount detecting means for detecting, for each pair, a difference amount based on the noise signals input to the voice microphone and the noise microphone, respectively, wherein the voice input section determining means determines the voice input section based on the input signal input to the pair for which the difference amount detected by the difference amount detecting means is largest.

6. The voice input section determination unit according to claim 1, wherein the voice input section determination unit determines the voice input section based on a difference in average power between the voice signal data and the noise signal data. 3. The voice input section determination device according to claim 1.

7. The voice input section determining unit determines a voice input section based on a difference in power spectrum between the voice signal data and the noise signal data. The voice input section determination device according to any one of the claims.

8. An audio data extraction device comprising: an audio microphone whose directivity is set toward a speaker and which inputs the voice of the speaker; a noise microphone whose directivity is set to differ from that of the audio microphone and which inputs noise; audio signal data generating means for generating audio signal data based on an input signal of the audio microphone; noise signal data generating means for generating noise signal data based on an input signal of the noise microphone; and audio data extracting means for extracting audio data from the audio signal data based on a difference between the audio signal data generated by the audio signal data generating means and the noise signal data generated by the noise signal data generating means.

9. The audio data extraction device according to claim 8, wherein the directivity of the noise microphone is set to be different from the directivity of the audio microphone by approximately 180 degrees.

10. The audio data extraction device according to claim 8, wherein the directivity of the noise microphone is set to be different from the directivity of the audio microphone by approximately 90 degrees.

11. The audio data extraction device according to claim 8, further comprising: a directivity changing mechanism for changing the directivities of the audio microphone and the noise microphone relative to each other; difference amount detecting means for detecting a difference amount based on the noise signals input to the audio microphone and the noise microphone, respectively; and directivity control means for controlling the directivity changing mechanism to change the directivities of both microphones relative to each other so that the difference amount detected by the difference amount detecting means is maximized.

12. A difference amount detecting means for detecting a difference amount based on a noise signal inputted to each of the sound microphone and the noise microphone for each pair, wherein a plurality of pairs of the sound microphone and the noise microphone are provided. The audio data extracting means extracts audio data based on an input signal input to a pair having a maximum difference amount detected by the difference amount detecting means. The audio data extraction device according to any one of the above.

13. The audio data extracting means according to claim 8, wherein the audio data extracting means extracts the audio data based on a difference in power spectrum between the audio signal data and the noise signal data. Audio data extraction device.

14. A voice input section determining apparatus according to claim 1, wherein said voice data extracting means includes a voice included in a voice input section determined by said voice input section determining means. 14. The audio data extraction device according to claim 8, wherein data is extracted.

15. The voice microphone, the noise microphone, the noise data generating means, and the input signal data generating means are commonly configured with each element of the voice input section determination device. The audio data extraction device according to the above.

16. The voice input section determination device according to claim 1, further comprising: voice data included in the voice input section determined by the voice input section determination unit of the voice input section determination device. A speech recognition device, comprising: speech recognition means for analyzing and recognizing a speech data pattern.

17. A voice recognition apparatus for recognizing a voice data pattern by analyzing the voice data extracted by the voice data extraction device according to claim 8 and voice data extraction means of the voice data extraction device. And a voice recognition device.

18. A vehicular navigation system comprising the voice recognition device according to claim 16 or 17, wherein a predetermined process is executed based on a voice data pattern recognized by voice recognition means of the voice recognition device. apparatus.

19. An input microphone used in any of the voice input section determination device according to any one of claims 1 to 7, the audio data extraction device according to any one of claims 8 to 15, and the voice recognition device according to claim 16 or 17, wherein the voice microphone and the noise microphone are arranged in the same housing so as to be integrally formed, and the housing is shaped so that the voice microphone and the noise microphone appear to face the same direction.