JP2002221987A

JP2002221987A - Method and device for voice recognition and navigation device

Info

Publication number: JP2002221987A
Application number: JP2001016611A
Authority: JP
Inventors: Satoshi Nakaya; 聡中屋
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2001-01-25
Filing date: 2001-01-25
Publication date: 2002-08-09
Anticipated expiration: 2021-01-25
Also published as: JP4190735B2

Abstract

PROBLEM TO BE SOLVED: To improve the recognition rate by computing noise components more accurately. SOLUTION: Before the user's voice is recognized, a fixed phrase reproduced from a speaker is collected by a voice recognition microphone. In step S4, the propagation attenuation, i.e. the difference between the collected sound and the original sound of the fixed phrase, is computed as an environmental coefficient Sc. In step S6, the original sound So of a word model in the dictionary word data is subtracted from the user's uttered voice Si, the coefficient Sc is further subtracted, and the result is taken as a noise template Sn. In step S7, the original sound So of the word model in the dictionary word data is compared with the result of subtracting the template Sn computed in step S6 from the user's uttered voice Si, and the word model that obtains the highest score in the voice recognition process is output as the recognition result.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a voice recognition method and apparatus, and to a navigation device using them.

[0002]

[Description of the related art] Conventionally, the voice recognition means of a navigation device is an operation means that replaces a remote control device. Because operations such as setting a recommended route to a destination or searching for an arbitrary location can be performed with the user's voice, the user is less distracted by operating the device, which improves both ease of use and, for an in-vehicle device, safety.

[0003] FIG. 4 is a system configuration diagram of a vehicle-mounted navigation device provided with a voice recognition means according to the prior art. In FIG. 4, a voice recognition microphone 21 inputs the user's voice. A noise detection microphone 22 is installed at a predetermined position in the vehicle cabin and inputs in-vehicle noise. A voice recognition means 23 analyzes the voice input from the voice recognition microphone 21 and performs voice recognition by searching the dictionary word data for the closest data. A noise detection means 24 detects the in-vehicle noise input from the noise detection microphone 22 and sends the noise amount to a CPU 33. A local facility search means 25 sets a route from the vehicle position to the destination and searches for facilities around the vehicle position. An input/output means 28 processes input signals from a direction sensor 26, a distance sensor 27, and the like, and communicates with and controls peripheral devices under the CPU 33. A DVD-ROM (or CD-ROM) 30 stores map data, voice data, dictionary word data used for voice recognition, and the like. A DVD-ROM (or CD-ROM) drive 31 reads the map data, voice data, or dictionary word data used for voice recognition from the DVD-ROM 30. A communication interface 29 passes signals and data between the CPU 33 and externally connected devices such as the DVD-ROM drive 31. A memory 32 stores a plurality of different noise amounts as templates; the noise amounts to be added to the word models of the voice recognition dictionary word data stored on the DVD-ROM 30 are held there as noise templates.

[0004] When the user's voice is input from the voice recognition microphone 21, the voice recognition means 23 calculates the comparison vocabulary from the dictionary word data registered in advance on the DVD-ROM 30. At the same time, the CPU 33 selects from the noise templates a noise amount equivalent to the noise detected by the noise detection means 24, compares the data obtained by adding that noise amount to the comparison vocabulary with the phrase uttered by the user, and takes the closest comparison vocabulary entry as the recognition result. In addition, by learning the noise coefficient or template that obtained the highest score in voice recognition, the noise amount that is actually optimal can be added, so the recognition rate can be increased.

[0005] FIG. 5 is a block diagram explaining a voice recognition method according to the prior art. In FIG. 5, dictionary word data D is a collection of word models for voice recognition recorded on a DVD-ROM or CD-ROM serving as the storage means. From a noise template T storing a plurality of different noise amounts, a noise coefficient corresponding to the noise is determined based on the actually detected noise amount. A comparing means C compares the result of combining that noise amount with the comparison vocabulary in the dictionary word data D by an adding means A against the user's voice input from a microphone M, thereby determining the matching word model. A learning means I combines, by the adding means A, the word model taken from the dictionary word data D as the comparison vocabulary with the optimum noise amount taken from the noise template T, and learns the noise coefficient or template that scores highest when compared with the phrase uttered by the user.
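
The prior-art flow of FIG. 5 can be illustrated with a minimal Python sketch. It is not taken from the publication: the feature vectors, the squared-distance comparison, and the names `nearest_template` and `recognize_prior_art` are all assumptions made for illustration. The sketch picks the stored template closest to the detected noise level, adds it to every word model, and returns the word whose noisy model is nearest to the utterance.

```python
def nearest_template(detected_level, templates):
    """Pick the stored noise template whose level is closest to the detected one."""
    return min(templates, key=lambda t: abs(t["level"] - detected_level))


def squared_distance(a, b):
    """Sum of squared differences between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize_prior_art(utterance, dictionary, templates, detected_level):
    """Add the selected noise template to each word model, then compare."""
    template = nearest_template(detected_level, templates)
    best_word, best_dist = None, float("inf")
    for word, model in dictionary.items():
        # Adding means A: word model plus noise amount (assumed additive features).
        noisy_model = [m + n for m, n in zip(model, template["spectrum"])]
        dist = squared_distance(utterance, noisy_model)  # comparing means C
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word, template


# Hypothetical data: two word models and two noise templates.
dictionary = {"home": [1.0, 0.0, 0.0], "station": [0.0, 1.0, 0.0]}
templates = [
    {"level": 0.1, "spectrum": [0.1, 0.1, 0.1]},
    {"level": 0.5, "spectrum": [0.5, 0.5, 0.5]},
]
word, used = recognize_prior_art([1.5, 0.5, 0.5], dictionary, templates, 0.45)
```

The returned template is the one a learning means could record as the best-scoring choice; how the publication's learning means I stores it is not specified in the text.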

[0006] Because the noise amount actually to be added varies with the installation locations and orientations of the voice recognition microphone and the noise detection microphone, the loudness of the user's voice, and so on, preparing in advance templates that store a plurality of different noise addition amounts makes it possible to realize voice recognition that copes with in-vehicle noise.

[0007] The prior art above describes a noise learning method that uses a noise detection microphone in addition to the voice recognition microphone. There is also a method in which the voice recognition microphone alone detects and analyzes the noise level and noise components immediately before the user speaks.

[0008]

[Problems to be solved by the invention] However, although the conventional voice recognition method described above is intended to let the user operate the navigation device easily and with peace of mind, its recognition rate has been a problem. In voice recognition on an in-vehicle device, the vehicle has various noise sources such as engine sound and tire running sound, and the noise components and noise level vary widely with driving conditions. External noise is superimposed on the voice actually uttered by the user, so comparison against the word models that form the comparison vocabulary of the dictionary word data stored in the storage means can become difficult. Furthermore, the noise components and noise level also differ with the installation location and orientation of the voice recognition microphone because of sound reflection and reverberation in the cabin, so even if dictionary word data with noise added in advance is prepared, it is difficult to secure a stable recognition rate under every kind of noise.

[0009] There is also a method that prepares a plurality of noise templates, predicts the applicable noise from the vehicle speed or from noise collected by a noise detection microphone, adds that noise amount to the original sound of the word models in the dictionary word data, and compares the result with the user's uttered voice. However, when the noise components and noise level are predicted from the vehicle speed, the noise amount does not necessarily grow in proportion to the vehicle speed, because of gear changes, wiper operation, turn signal operation, and the driving location. In addition, voice quality and voice level differ from user to user, and noise components differ with the type of vehicle, so it is difficult to separate the loudness and quality of the user's actual uttered voice from the vehicle noise.

[0010] The present invention solves the above problems and provides a voice recognition method and apparatus that can calculate noise components more accurately and thereby improve the recognition rate, and a navigation device using them.

[0011]

[Means for solving the problems] In the voice recognition method of the present invention, a fixed phrase reproduced before the user's voice is recognized is collected by a voice recognition microphone; the propagation attenuation, which is the difference between the collected sound and the original sound of the fixed phrase, is calculated as an environmental coefficient; the original sound of a comparison vocabulary entry selected from the dictionary word data is then subtracted from the voice uttered by the user; the environmental coefficient is further subtracted; and the noise amount is obtained from the result to create a noise template. With this method, noise components can be calculated more accurately and the recognition rate can be improved.

[0012] The voice recognition method of the present invention also compares the original sound of the word models in the dictionary word data with the result of subtracting the calculated noise amount from the user's uttered voice, and takes the closest word model as the recognition result. Noise components can thus be calculated more accurately, and the recognition rate can be improved.

[0013] The voice recognition apparatus of the present invention comprises: a storage means storing dictionary word data for voice recognition; a voice recognition microphone for inputting the user's uttered voice; a voice guidance speaker for outputting voice guidance; and a voice recognition means that creates a noise amount based on the result of comparing the sound obtained when the voice recognition microphone collects a fixed phrase reproduced from the speaker with the original sound of the fixed phrase, and performs voice recognition by a comparison operation between that noise amount and the original sound of the word models for voice utterance stored in the storage means. With this configuration, noise components can be calculated more accurately and the recognition rate can be improved.

[0014] The voice recognition apparatus of the present invention is further characterized by learning the calculated noise amount. Because noise templates corresponding to a variety of situations can be created, the recognition rate for the user's uttered voice can be improved.

[0015] The present invention is also a navigation device provided with the above voice recognition apparatus. The fixed phrase, that is, a voice prompt such as "Please say a voice word" or a signal tone prompting the start of speech that the navigation device reproduces after the user operates a voice recognition start button on a remote control or the like, has fixed, known values because its volume, frequency distribution, and so on are stored in the storage means of the navigation device. By comparing this fixed phrase with the sound data collected by the voice recognition microphone installed at a specific location, the distance between the voice recognition microphone and the voice guidance speaker can be calculated as the current environmental coefficient. This environmental coefficient varies with sound reflection and reverberation in the cabin, which depend on the installation position and orientation of the voice guidance speaker, and also with road conditions, weather, and so on, so the in-vehicle environment immediately before the user speaks can be calculated accurately. Furthermore, by learning the calculated environmental coefficient, noise templates corresponding to different vehicle types and driving situations can be created, so the recognition rate for the user's uttered voice can be improved.

[0016]

[Embodiments of the invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an in-vehicle navigation device that uses the voice recognition method and apparatus according to the embodiment. In FIG. 1, a voice recognition microphone 1 inputs the user's uttered voice. A voice recognition means 2 analyzes the voice input from the voice recognition microphone 1 and searches the dictionary word data for the closest data. A voice guidance speaker 3 outputs voice such as route guidance to the destination, driving guidance, and utterance prompts for voice recognition. A D/A converter 4 converts digital voice data into analog voice data and amplifies it so that the speaker 3 can output the voice. A direction sensor 5 detects the traveling direction of the vehicle. Various sensor signals 6 are signals such as the parking switch, turn signals, and transmission switch inputs. An input/output device 7 processes input signals from the direction sensor 5 and the various sensor signals 6, and communicates with and controls peripheral devices under the CPU 12. A DVD-ROM (or CD-ROM) 8 stores map data, voice data, and dictionary word data that collects the word models forming the comparison vocabulary used in voice recognition. A DVD-ROM (or CD-ROM) drive 9 reads the map data, voice data, dictionary word data, and the like from the DVD-ROM 8. A communication interface 10 passes signals and data between the DVD-ROM drive 9 and the CPU 12. A memory 11 stores, as templates, various data calculated from a plurality of different noise amounts and noise levels, together with other working data. The CPU 12 controls the entire device; in voice recognition it calculates the environmental coefficient, creates the noise template, and performs the comparison operation between the user's uttered voice and the word models of the dictionary word data stored on the DVD-ROM 8.

[0017] Next, the operation of this embodiment is described. When a voice recognition request from the user is detected through operation of a remote control or the like, the CPU 12 reads from the DVD-ROM 8, where the voice data is stored, the fixed phrase voice that prompts the user to start speaking, has the D/A converter 4 convert and amplify it, and reproduces it from the voice guidance speaker 3. The reproduced fixed phrase voice is fed back into the voice recognition microphone 1, and the propagation attenuation of the sound, which is the difference obtained by comparing it with the original sound of the fixed phrase voice stored on the DVD-ROM 8, is calculated as the environmental coefficient. Calculating the propagation attenuation of the fixed phrase voice makes it possible to calculate the distance between the voice guidance speaker 3 and the voice recognition microphone 1. After that, the noise template, which is the current noise amount, is created by an operation on the user's input uttered voice, the original sound of the word model in the dictionary word data, and the value calculated as the environmental coefficient. After the above processing, the voice recognition means 2 either compares the data obtained by adding the noise template to the original sound of the word model in the dictionary word data with the user's uttered voice, or compares the original sound of the word model with the data obtained by subtracting the noise template from the user's uttered voice. This secures a more accurate recognition rate in the current noise environment of the cabin.
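
As a rough illustration of the environmental coefficient described above, the following Python sketch computes a per-band difference between the stored fixed phrase and what the microphone captured. Several points are assumptions and not stated in the publication: the sounds are represented as per-band levels in dB, the coefficient is signed as captured minus original (so attenuation shows up as negative values), and the name `environment_coefficient` is hypothetical.

```python
def environment_coefficient(original_db, captured_db):
    """Return Sc per band: captured level minus original level, in dB.

    A negative entry means the band was attenuated on the way from the
    speaker to the microphone (assumed sign convention).
    """
    if len(original_db) != len(captured_db):
        raise ValueError("band counts differ")
    return [c - o for o, c in zip(original_db, captured_db)]


# Hypothetical three-band levels for the fixed phrase.
phrase_original = [60.0, 55.0, 50.0]  # original sound stored on the disc
phrase_captured = [54.0, 50.0, 47.0]  # sound picked up by the microphone
sc = environment_coefficient(phrase_original, phrase_captured)
```

With this sign convention an utterance can be modelled as Si = So + Sc + Sn, which is consistent with obtaining the noise template by subtracting first So and then Sc from Si.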

[0018] FIG. 2 is an operation flow chart of this embodiment. Step S1 is initialization, where Sc is the environmental coefficient (the propagation attenuation of the sound), Sn is the noise template, and Si is the user's uttered voice. When the user requests the start of voice recognition in step S2, the navigation device in step S3 reproduces a fixed phrase such as "Please say a voice word" and feeds the reproduced fixed phrase back into the voice recognition microphone 1 to capture the collected sound level. In step S4, an operation is performed on the original sound of the word model of the dictionary word data recorded on the DVD-ROM 8 or other storage means and the fixed phrase sound collected in step S3, and the propagation attenuation of the sound, which is the difference from the original sound of the actual word model, is calculated as the environmental coefficient Sc. In step S5 the device waits until the user's uttered voice is detected. In step S6 the original sound So of the word model in the dictionary word data is subtracted from the user's uttered voice Si, the environmental coefficient Sc calculated in step S4 is further subtracted, and the result is taken as the noise template Sn. Step S7 is the comparison processing performed by the voice recognition means 2: the original sound So of the word model in the dictionary word data is compared with the result of subtracting the noise template Sn calculated in step S6 from the user's uttered voice Si, and the word model that obtains the highest score in the voice recognition processing is output to the CPU 12 as the recognition result. Because the environmental coefficient, i.e. the propagation attenuation of the sound, is calculated using a fixed phrase voice whose volume, frequency distribution characteristics, and so on are fixed, the noise template, i.e. the noise amount, can be created more accurately, and as a result the recognition rate can be improved.
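
Steps S6 and S7 can be sketched in Python as follows. This is only an illustration under stated assumptions: sounds are short integer feature vectors, the score is a negative squared distance, Sc uses a captured-minus-original sign convention, and the noise template is estimated from an utterance whose word model is already known (the publication does not detail how the comparison vocabulary entry for step S6 is selected). The function names are hypothetical.

```python
def subtract(a, b):
    """Element-wise a - b for two equal-length feature vectors."""
    return [x - y for x, y in zip(a, b)]


def noise_template(si, so, sc):
    """Step S6: Sn = Si - So - Sc."""
    return subtract(subtract(si, so), sc)


def recognize(si, dictionary, sn):
    """Step S7: compare each So with Si - Sn and return the best word."""
    cleaned = subtract(si, sn)

    def score(word):
        # Higher is better: negative squared distance to the word model.
        return -sum((c - m) ** 2 for c, m in zip(cleaned, dictionary[word]))

    return max(dictionary, key=score)


# Hypothetical word models and environmental coefficient.
dictionary = {"home": [10, 2, 1], "station": [2, 10, 1]}
sc = [-1, -1, -1]

# Utterance of "home" as heard in the cabin, used to estimate Sn.
si_home = [12, 4, 3]
sn = noise_template(si_home, dictionary["home"], sc)

# A later utterance of "station" under the same conditions.
si_station = [4, 12, 3]
result = recognize(si_station, dictionary, sn)
```

In this toy data the estimated template is `[3, 3, 3]` and the later utterance is recognized as `"station"`. Note that Si - Sn still contains the Sc offset; the sketch follows the subtraction order given in the text and leaves any further compensation open.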

[0019] FIG. 3 is a diagram explaining the installation positions of the devices in the vehicle cabin in this embodiment. In FIG. 3(a), D is the driver, M is the voice recognition microphone, and SP is the voice guidance speaker built into the display means. FIG. 3(b) is similar, except that the voice guidance speaker SP can be installed at an arbitrary position. As shown in FIG. 3, the microphone M and the speaker SP are normally placed at fixed locations and can be assumed to be at a fixed distance from the driver D, but the installation position of the speaker SP changes how sound is reflected and reverberates in the cabin. For example, for the speaker SP built into the display means in FIG. 3(a), the propagation attenuation of the sound reaching the driver D differs between the case where the speaker faces the driver D and the case where it is mounted on the rear face of the display means. Therefore, by having the voice recognition means 2 learn the propagation attenuation under various conditions, noise templates corresponding to different vehicle types and driving situations can be created, and the recognition rate for the user's uttered voice can be improved.
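
The publication does not say how the propagation attenuation is learned. One simple possibility, shown here purely as an assumption, is to keep a running mean of the environmental coefficient for each labelled condition. The class name `CoefficientStore` and the condition labels are hypothetical.

```python
class CoefficientStore:
    """Keeps a running mean of the environmental coefficient per condition."""

    def __init__(self):
        self._mean = {}   # condition label -> mean coefficient vector
        self._count = {}  # condition label -> number of observations

    def learn(self, condition, sc):
        """Fold one observed coefficient vector into the mean for a condition."""
        n = self._count.get(condition, 0)
        if n == 0:
            self._mean[condition] = list(sc)
        else:
            old = self._mean[condition]
            # Incremental mean update: m + (x - m) / (n + 1).
            self._mean[condition] = [m + (x - m) / (n + 1) for m, x in zip(old, sc)]
        self._count[condition] = n + 1

    def lookup(self, condition):
        """Return the learned mean for a condition, or None if unseen."""
        return self._mean.get(condition)


store = CoefficientStore()
store.learn("highway", [-6.0, -4.0])
store.learn("highway", [-8.0, -6.0])
learned = store.lookup("highway")
```

A stored mean of this kind could serve as a starting value when the same condition recurs; whether the actual device keys its templates by condition in this way is not stated in the text.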

[0020] As described above, according to this embodiment, even when the voice recognition microphone and the voice guidance speaker are installed at positions that differ from vehicle to vehicle, the noise amount is calculated and the noise template is created using a fixed phrase whose volume, frequency distribution characteristics, and so on are determined in advance. Consequently, even when the in-vehicle noise environment differs with the type of vehicle, driving conditions, weather, or the operating sounds of vehicle equipment such as wipers, turn signals, or audio, the user's uttered voice can be recognized more accurately than with the prior art.

[0021] Although the above embodiment is described as an in-vehicle navigation device incorporating the voice recognition apparatus, the present invention can also be configured as a standalone voice recognition apparatus. Likewise, although the above embodiment is described as an in-vehicle navigation device, the present invention can also be configured as a portable navigation device that is simply brought into the vehicle or detachably mounted on it.

[0022]

[Effects of the invention] As is clear from the above embodiment, in the voice recognition method of the present invention a fixed phrase reproduced before the user's voice is recognized is collected by the voice recognition microphone; the propagation attenuation, which is the difference between the collected sound and the original sound of the fixed phrase, is calculated as an environmental coefficient; the original sound of the comparison vocabulary entry selected in the dictionary word data is then subtracted from the voice uttered by the user; the environmental coefficient is further subtracted; and the noise amount is obtained from the result to create a noise template. By using a fixed phrase whose reproduced volume and frequency distribution are fixed, noise components can be calculated more accurately. In addition, by comparing the original sound of the word models in the dictionary word data with the result of subtracting the noise amount from the user's uttered voice and taking the closest word model as the recognition result, the recognition rate can be improved further.

[0023] The voice recognition apparatus of the present invention comprises: a storage means storing dictionary word data for voice recognition; a voice recognition microphone for inputting the user's uttered voice; a voice guidance speaker for outputting voice guidance; and a voice recognition means that creates a noise amount based on the result of comparing the sound obtained when the voice recognition microphone collects a fixed phrase reproduced from the speaker with the original sound of the fixed phrase, and performs voice recognition by a comparison operation between that noise amount and the original sound of the word models for voice utterance stored in the storage means. Because noise components can be calculated more accurately, the recognition rate can be improved further.

[0024] Furthermore, in the navigation device provided with the voice recognition apparatus of the present invention, even when the noise components and noise level actually differ because of conditions outside the vehicle, such as driving speed, driving location, and weather, or conditions inside the vehicle, such as the installation location and orientation of the microphone or the volume and sound quality of the car audio, the fixed phrase that the navigation device itself reproduces immediately before voice recognition is input through the voice recognition microphone and compared, the distance between the voice recognition microphone and the voice guidance speaker is calculated, and the propagation attenuation of the uttered voice between the user and the voice recognition microphone is used as the environmental coefficient. This allows the current noise amount to be calculated accurately, and learning the environmental coefficient improves the recognition accuracy of voice recognition in a variety of in-vehicle environments. In addition, because the noise amount in the cabin immediately before the user speaks can be calculated, there is no need to monitor and detect the driving state of the vehicle through the vehicle speed signal or other sensor signals, and as the user becomes practiced in the dialogue the value of the environmental coefficient settles, so a more accurate recognition rate can be secured.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a navigation device according to an embodiment of the present invention.

FIG. 2 is a flow chart of voice recognition processing according to the embodiment of the present invention.

FIG. 3 is an explanatory diagram showing installation positions in the vehicle cabin according to the embodiment of the present invention.

FIG. 4 is a block diagram showing the configuration of a conventional navigation device.

FIG. 5 is a block diagram explaining a conventional voice recognition method.

[Explanation of symbols]

1 Voice recognition microphone
2 Voice recognition means
3 Voice guidance speaker
4 D/A converter
5 Direction sensor
6 Various sensor signals
7 Input/output means
8 DVD-ROM
9 DVD-ROM drive
10 Communication interface
11 Memory
12 CPU

Continuation of front page: (51) Int.Cl.⁷, identification symbol, FI, theme code (reference): G10L 15/00; G10L 3/00 551Q; 15/28

Claims

1. A voice recognition method comprising: collecting, with a voice recognition microphone, a fixed phrase reproduced before voice recognition of a user is performed; calculating, as an environmental coefficient, a propagation attenuation that is the difference obtained by comparing the collected sound with the original sound of the fixed phrase; then subtracting, from the voice uttered by the user, the original sound of a comparison vocabulary entry selected from dictionary word data, and further subtracting the environmental coefficient; and obtaining a noise amount from the result to create a noise template.

2. The voice recognition method according to claim 1, wherein the original sound of a word model in the dictionary word data is compared with the result of subtracting the calculated noise amount from the uttered voice of the user, and the closest word model is taken as the recognition result.

3. A voice recognition apparatus comprising: a storage means storing dictionary word data for voice recognition; a voice recognition microphone for inputting a user's uttered voice; a voice guidance speaker for outputting voice guidance; and a voice recognition means that creates a noise amount based on the result of comparing the sound obtained when the voice recognition microphone collects a fixed phrase reproduced from the speaker with the original sound of the fixed phrase, and performs voice recognition by a comparison operation between the noise amount and the original sound of a word model for voice utterance stored in the storage means.

4. The speech recognition apparatus according to claim 3, wherein said calculated noise amount is learned.

5. A navigation device comprising the voice recognition device according to claim 3.