JP2005227369A

JP2005227369A - Voice recognition device, and method, and vehicle mounted navigation system

Info

Publication number: JP2005227369A
Application number: JP2004033825A
Authority: JP
Inventors: Keiichi Takahashi; 恵一高橋
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2004-02-10
Filing date: 2004-02-10
Publication date: 2005-08-25

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device capable of recognizing speech spoken in a regional dialect.

SOLUTION: The voice recognition device collates the result of frequency analysis of an input spoken word against a word dictionary created using a plurality of recognition templates, and recognizes the speech by extracting a word with a high degree of similarity. A DVD-ROM 5 holds a standard template and a plurality of regional templates. A CPU 20 switches the recognition template to any template selected by the user. By selecting the regional template for the dialect that suits them, users can have their speech recognized even when they speak a regional dialect, which improves convenience and operability.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

本発明は、ユーザが入力した音声を認識するための音声認識装置および方法と音声認識装置を備えた車載ナビゲーション装置に関する。 The present invention relates to a speech recognition device and method for recognizing speech input by a user, and an in-vehicle navigation device including the speech recognition device.

Conventionally, in order to further improve convenience, techniques are known for operating a car air conditioner, car audio system, car navigation system, and other equipment mounted on a vehicle by means of a speech recognition device (see, for example, Patent Document 1). What matters in a speech recognition device is preventing misrecognition and raising the recognition rate. As one measure against misrecognition, a technique is known in which three recognition templates (for men, for women, and for mixed use) are prepared as the recognition templates used to match the short-time spectrum of the input word against standard patterns. Using the template suited to the speaker's gender gives a higher recognition rate than using a single recognition template.
Patent Document 1: JP 2002-169584 A

However, even when three recognition templates for men, women, and mixed use are prepared in this way, they all target the standard language. Misrecognition can still occur when the speaker's accent, intonation, or phrasing differs, or when the speaker has a regional accent, so the recognition rate does not improve much.

The present invention solves this conventional problem. Its object is to provide a speech recognition device and method that can improve the recognition rate even when the speaker's accent, intonation, or phrasing differs or the speaker has a regional accent, and an in-vehicle navigation device equipped with the speech recognition device.

The speech recognition device of the present invention collates the result of frequency analysis of the speech of an input word against a word dictionary created using a plurality of recognition templates, and recognizes the speech by extracting a word with a high degree of similarity. The device comprises a standard template and a plurality of regional templates as the recognition templates, and recognition control means capable of switching arbitrarily among the recognition templates.

この構成により、ユーザが、自分の発音は標準語とは少し異なると自覚している場合は、自分に適した地域言語の地域テンプレートを選択して音声認識を行うことにより、音声入力を容易に行えるとともに音声認識率を高めることができ、音声認識における入力回数を減らすことにより操作性を向上させることができる。 With this configuration, if the user is aware that his / her pronunciation is slightly different from the standard language, voice input can be facilitated by selecting a regional template in a regional language suitable for him / her and performing speech recognition. In addition, the voice recognition rate can be increased, and the operability can be improved by reducing the number of inputs in voice recognition.

また、本発明の音声認識装置は、前記認識テンプレートが、それぞれ男性用、女性用、混合用の複数のテンプレートを備えていることを特徴とする。 Moreover, the speech recognition apparatus of the present invention is characterized in that the recognition template includes a plurality of templates for men, women, and a mixture, respectively.

With this configuration, the user selects the male or female template according to his or her gender or voice quality, or selects the mixed recognition template when men and women are riding together, which further increases the recognition rate.

また、本発明の音声認識装置は、前記認識制御手段が、ユーザが指示した選択項目に基づいて前記認識テンプレートを切り替えることを特徴とする。 The speech recognition apparatus of the present invention is characterized in that the recognition control unit switches the recognition template based on a selection item designated by a user.

With this configuration, the user selects in advance, from among the selection items, which recognition template is to be used for speech recognition, and the recognition control means then selects the recognition template according to the chosen selection item.

また、本発明の音声認識装置は、前記選択項目は、表示手段に表示された地域コードから選択した地域データであることを特徴とする。 In the speech recognition apparatus of the present invention, the selection item is regional data selected from a regional code displayed on a display unit.

With this configuration, the user can select the regional template for a region by choosing one of the regional codes displayed on the screen of the display means, for example one of the regional codes assigned to each prefecture of Japan.

In the speech recognition device of the present invention, the selection item is registered-domicile data or address data read from a driver's license or an IC card by card reading means.

With this configuration, when the user inserts an IC-card-type driver's license or a credit-type or prepaid-type IC card into the card reading means, the card reading means reads the registered domicile or current address recorded on the license, or the current address recorded on the IC card, and converts it into the code data for that region, so that the regional template for that region can be selected.

また、本発明の音声認識装置は、前記選択項目は、現在位置を検出する現在位置検手段からの現在位置データであることを特徴とする。 In the speech recognition apparatus of the present invention, the selection item is current position data from a current position detecting unit that detects a current position.

With this configuration, when the user selects, as the selection item, the current position data from the current position detecting means, the regional template corresponding to the current position is selected automatically as the user moves.

また、本発明の音声認識装置は、前記選択項目は、地点登録手段により登録された自宅位置データであることを特徴とする。 In the voice recognition device of the present invention, the selection item is home position data registered by a point registration unit.

この構成により、ユーザが、選択項目として地点登録手段により登録された自宅位置データを選択することにより、自宅位置に対応する地域テンプレートを選択することができる。 With this configuration, the user can select an area template corresponding to the home position by selecting the home position data registered by the point registration unit as a selection item.

また、本発明の音声認識装置は、前記認識制御手段は、前回使用した認識率の高い認識テンプレートを記憶してそれを次回の認識テンプレートとして使用する学習機能を備えたことを特徴とする。 The speech recognition apparatus according to the present invention is characterized in that the recognition control means has a learning function for storing a recognition template having a high recognition rate used last time and using it as a next recognition template.

With this configuration, the device stores which selection item the user chose, which regional template was selected, and whether it was the male, female, or mixed template, and also stores which of these regional templates gave a high recognition rate. The next time the speech recognition device is started, the same regional template as was used last time can be used from the outset, which improves recognition efficiency and operability.

また、本発明の音声認識方法は、入力された単語の音声を周波数分析した結果を、複数の認識テンプレートを用いて作成した単語辞書と照合し類似度の高い単語を抽出して音声を認識する方法であって、前記認識テンプレートとして標準テンプレートと複数の地域テンプレートとを備え、前記認識テンプレートを任意に切り替えて音声認識を行うことを特徴とする。 Also, the speech recognition method of the present invention recognizes speech by comparing the result of frequency analysis of the speech of the input word with a word dictionary created using a plurality of recognition templates and extracting words with high similarity. A method is characterized in that a standard template and a plurality of regional templates are provided as the recognition templates, and voice recognition is performed by arbitrarily switching the recognition templates.

この構成により、ユーザが、自分の発音は標準語とは少し異なると自覚している場合は、自分に適した地域言語の地域テンプレートを使用して音声認識を行うことにより、音声入力を容易に行えるとともに音声認識率を高めることができ、音声認識における入力回数を減らすことにより操作性を向上させることができる。 With this configuration, if the user is aware that his / her pronunciation is a little different from the standard language, speech recognition is facilitated by using a regional template in a regional language suitable for him / her. In addition, the voice recognition rate can be increased, and the operability can be improved by reducing the number of inputs in voice recognition.

また、本発明の音声認識方法は、前記地域テンプレートは、それぞれ男性用、女性用、混合用の複数のテンプレートを備えていることを特徴とする。 The speech recognition method of the present invention is characterized in that each of the regional templates includes a plurality of templates for men, women, and a mixture.

この構成により、ユーザが、自分の性別に応じて男性用、女性用または混合用の認識テンプレートを選択することにより、より認識率を高めることができる。 With this configuration, the user can further increase the recognition rate by selecting a recognition template for men, women, or a mixture according to his / her gender.

また、本発明の音声認識方法は、前記認識テンプレートの切り替えは、ユーザの指示により行うことを特徴とする。 The speech recognition method of the present invention is characterized in that switching of the recognition template is performed according to a user instruction.

また、本発明の音声認識方法は、前回使用した認識率の高い認識テンプレートを記憶してそれを次回の認識テンプレートとして使用することを特徴とする。 The speech recognition method of the present invention is characterized in that a recognition template having a high recognition rate used last time is stored and used as the next recognition template.

With this configuration, the device stores which selection item the user chose, which regional template was selected, and whether it was the male, female, or mixed template, and also stores which of these regional templates gave a high recognition rate. From the next time onward, the same regional template as was used last time can be used as soon as the speech recognition device is started, which improves recognition efficiency and operability.

The present invention is also an in-vehicle navigation device comprising: current position detecting means for detecting a current position; display means for displaying a map based on road map data together with the current position detected by the current position detecting means; the speech recognition device according to any one of claims 1 to 8; and control means for performing route guidance based on the recognition result of the speech recognition device.

With this configuration, the user can perform input operations on the navigation device by voice in the regional dialect that suits them. Voice input becomes easier and the speech recognition rate rises, which further improves the convenience and operability of the in-vehicle navigation device.

The in-vehicle navigation device of the present invention further comprises storage means for storing a standard phrase and a plurality of regional phrases as voice guidance phrases, and selects the voice guidance phrase to match the recognition template used in the speech recognition device.

この構成により、交差点や高速道路の出入口等における音声案内を、音声認識に使用した地域テンプレートと同様な地域言語で行うことができ、ユーザに対する情報伝達を容易に行うことができる。 With this configuration, voice guidance at intersections, highway entrances and the like can be performed in the same local language as the regional template used for voice recognition, and information can be easily transmitted to the user.

本発明は、音声認識に使用する認識テンプレートとして標準テンプレートと複数の地域テンプレートとを備え、この認識テンプレートを任意に切り替え可能な認識制御手段を備えているので、ユーザが、自分に適した地域テンプレートを選択して音声認識を行うことにより、音声入力を容易に行えるとともに音声認識率を高め、利便性および操作性を向上させることができるという効果を有する。 The present invention includes a standard template and a plurality of regional templates as recognition templates used for speech recognition, and includes a recognition control unit that can arbitrarily switch the recognition templates. By selecting and performing voice recognition, voice input can be performed easily, the voice recognition rate can be increased, and convenience and operability can be improved.

Embodiments of the present invention will now be described with reference to the drawings. FIG. 1 shows the configuration of an in-vehicle navigation device according to an embodiment of the present invention. In FIG. 1, the direction sensor 1 uses a vibration gyro to detect the traveling direction of the vehicle. The vehicle speed sensor 2 generates vehicle speed pulses according to the rotation speed of the wheels of the vehicle on which the device is mounted. The various sensor signals 3 are signals from a reverse switch, a parking switch, a light switch, and the like. The sensor signal processing unit 4 calculates the traveling direction of the vehicle from the signal of the direction sensor 1, calculates the travel distance from the vehicle speed signal of the vehicle speed sensor 2, and generates the signals needed for control from the various sensor signals 3.

The DVD-ROM 5 is one of the storage means. It records map data, dictionary data for speech recognition corresponding to the standard language and a plurality of regional dialects (a function word dictionary and dictionaries of facility names, addresses, surnames, given names, and so on), voice phrase data, and the like. The DVD-ROM drive 6 reads the map data, speech recognition dictionary data, voice data, and so on from the DVD-ROM 5. The liquid crystal display 7 displays the map, the current vehicle position and heading, operation menus, and the like, and has an operation input unit such as a touch panel on its front surface. The GPS receiver 8 determines the current position (latitude and longitude) of the vehicle by receiving radio waves transmitted from a plurality of satellites through an antenna and performing calculations on them. The ETC on-board unit 9 pays toll road charges automatically by communicating wirelessly with roadside devices installed at the entrance and exit gates of toll roads, and includes a card reader so that charges can be settled with an IC card with a credit function or a prepaid IC card inserted into the unit. For security, this card reader is also configured to read the contents of an inserted IC-card-type driver's license so that the holder's identity can be verified. The DVD-ROM drive 6, liquid crystal display 7, GPS receiver 8, ETC on-board unit 9, and so on are arranged on the dashboard of the vehicle and connected through the in-vehicle LAN 10 to the communication interface 12 of the device main body 11. The device main body 11 is installed in the trunk of the vehicle, in the center console, under a seat, or in a similar location.

The microphone 13 is placed near the driver inside the vehicle and picks up words and phrases spoken by the user. The speaker 14 is used to output search results, speech recognition results, voice guidance such as intersection guidance, branch guidance, toll gate guidance, and exit guidance along the travel route, and spoken instructions about remote control operations. The speech analysis unit 15 performs frequency analysis on the words input from the microphone 13 and outputs a time series of short-time spectra. The image processor 16 forms display images from map data, current vehicle position data, building data, and the like. The storage unit 17 is one of the storage means; it includes a ROM that stores programs, a RAM that temporarily stores working data, a VRAM that stores image data, and so on, and also functions as the dictionary memory. The recognition template storage unit 18 stores three recognition templates (for men, for women, and for mixed use), each consisting of short-time spectra of phonemes. The voice processor 19 converts the phoneme symbol sequence output as a speech recognition result into a voice signal, and outputs to the speaker 14 voice signals representing search results, voice guidance along the travel route, and remote control operations. The CPU (central processing unit) 20 controls the entire device. To realize the functions of the navigation device, it executes software programs such as current position calculation means, route search means, search means, and point registration means. To realize the functions of the speech recognition device, it executes software programs such as the kana CV conversion unit 21, the dictionary generation unit 22, and the similarity determination unit 23.

Next, the operation of the present embodiment will be described, beginning with the operation of the navigation device. In FIG. 1, when the device is started by a predetermined operation, the current position detecting means in the CPU 20 calculates the accurate current position of the vehicle from the position information supplied by the GPS receiver 8 and from the data obtained when the sensor signal processing unit 4 processes the signals of the direction sensor 1 and the vehicle speed sensor 2. Based on this vehicle position information, the CPU 20 reads the corresponding map data from the DVD-ROM 5 through the DVD-ROM drive 6. The image processor 16 converts the map data into image data, which is stored temporarily in the VRAM of the storage unit 17, converted into color signals, and displayed together with the vehicle position on the screen of the liquid crystal display 7 via the communication interface 12. When the user speaks an address, such as that of a destination, into the microphone 13, the speech recognition function recognizes the address and the destination is set. The route search means in the CPU 20 calculates the optimum guidance route from the current position of the vehicle to the destination thus specified and displays it superimposed on the map on the liquid crystal display 7. As the driver drives along the guidance route shown on the liquid crystal display 7, the CPU 20 successively updates the vehicle position mark on the liquid crystal display 7 based on the current position information calculated by the current position detecting means and the road network data in the map data. When the vehicle approaches a branch point or similar location on the guidance route, the voice phrase guidance attached to the map data is output from the speaker 14. In addition, the point registration means can register any point the user wishes to store, including destinations and the home address. The search means can search for facilities and the like by genre, by Japanese syllabary order, and so on.

Next, the speech recognition device of the present embodiment will be described. Its main components are as follows. The speech analysis unit 15 performs frequency analysis on the speech of a word input from the microphone 13 and outputs a time series of short-time spectra (an LPC cepstrum coefficient sequence). The kana CV conversion unit 21 CV-converts the kana reading of each word. The recognition template storage unit 18 stores three recognition templates (for men, for women, and for mixed use), each consisting of short-time spectra of phonemes. The dictionary generation unit 22 uses a recognition template to convert the kana-CV-converted words into phoneme symbol sequences (LPC cepstrum coefficient sequences) and thereby creates the word dictionary. The similarity determination unit 23 calculates the similarity between the short-time spectrum time series of the word output from the speech analysis unit 15 and the phoneme symbol sequences of the words generated by the dictionary generation unit 22, and outputs as the recognition result the phoneme symbol sequence at the smallest distance, that is, with the highest similarity. The voice processor 19 outputs the phoneme symbol sequence of the recognition result as speech from the speaker 14. The CPU 20, which also serves as the recognition control means, has the user repeat the operation, up to a limit of three times, when misrecognition occurs, that is, when the spoken recognition result differs from the word that was input.

FIG. 2 shows an example of the recognition templates stored in the recognition template storage unit 18 in the present embodiment. The recognition templates consist of a standard template 31 corresponding to the standard language and regional templates 32 corresponding to regional dialects, and each of these provides five templates: "mixed", "male 1", "male 2", "female 1", and "female 2". "Mixed" gives the speech recognition enough latitude to work no matter who speaks when men and women are riding in the vehicle together. "Male 1" is for men with high voices, "male 2" for men with low voices, "female 1" for women with high voices, and "female 2" for women with low voices. The regional templates 32 are classified roughly by regional block, such as the Tohoku, Chubu, Kansai, Kyushu, and Okinawa dialects. Where a higher recognition rate is wanted, regional templates can instead be created for each prefecture.
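The template layout of FIG. 2 can be pictured as a table keyed by region and voice type. The following Python sketch is only an illustration under stated assumptions: the region identifiers, the `TemplateKey` type, and the file-name values are hypothetical and do not appear in the patent, which does not describe how the templates are stored.

```python
from dataclasses import dataclass

# Five voice types per template set, as listed for FIG. 2.
VOICE_TYPES = ("mixed", "male1", "male2", "female1", "female2")

# "standard" stands for standard template 31; the rest are the example
# regional blocks of regional templates 32 (identifiers are hypothetical).
REGIONS = ("standard", "tohoku", "chubu", "kansai", "kyushu", "okinawa")


@dataclass(frozen=True)
class TemplateKey:
    """Hypothetical lookup key: one region combined with one voice type."""
    region: str
    voice_type: str


def build_template_index():
    """Map every (region, voice type) pair to a placeholder template name."""
    return {
        TemplateKey(region, voice): f"{region}_{voice}.tpl"  # placeholder value
        for region in REGIONS
        for voice in VOICE_TYPES
    }


index = build_template_index()
print(len(index))  # 6 regions x 5 voice types
```

Moving to per-prefecture templates, as the text suggests, would only change the contents of `REGIONS` in this sketch.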

Next, speech recognition processing using the regional templates will be described with reference to FIGS. 3 to 7, beginning with the startup process. In FIG. 3, when "Voice Control" is selected from the menu screen displayed on the liquid crystal display 7, preparation for speech recognition starts (step S1). The speech recognition dictionary data is expanded from the DVD-ROM 5, sorted in Japanese syllabary (a-i-u-e-o) order, and stored temporarily in the dictionary memory of the storage unit 17 (step S2). When preparation is complete (step S3), the user presses the voice input button on the remote control and speaks into the microphone 13, the speech is captured (step S4), and processing moves to the regional template switching process of FIG. 4.

In the regional template switching process of FIG. 4, the device first checks whether the user has switched to a regional template (step S5) and then checks which method was selected (step S6). If the method is region code designation, the selected region is designated (step S7); for example, choosing one of the regional codes assigned to each prefecture of Japan selects the regional template for that area. If the method is current location tracking, the region of the current location is designated from the road map data read from the DVD-ROM 5, based on the current position information received by the GPS receiver 8 (step S8). If the method is domicile/address designation, the region is designated either from the registered domicile or address read from a driver's license or IC card inserted in the card reader of the ETC on-board unit 9 (step S9), or from the current address registered by the point registration means (step S10). Once the region has been designated, the CPU 20 switches the regional template to that of the designated region (step S11) and moves to the regional template selection process of FIG. 5. If step S5 finds that no switch to a regional template was made, the standard template is used (step S12).
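The branching of FIG. 4 (steps S5 to S12) might be sketched in Python as below. This is a minimal illustration, not the patented implementation: the function name, the method strings, and the parameters are hypothetical. Two behaviours are assumptions of this sketch rather than statements of the patent: preferring the card data (S9) over the registered home address (S10) when both exist, and falling back to the standard template when no region can be resolved.

```python
STANDARD = "standard"


def resolve_region(method=None, *, region_code=None, gps_region=None,
                   card_region=None, home_region=None):
    """Return the region whose template should be used (hypothetical helper)."""
    if method is None:
        # S5 "no": the user did not switch to a regional template -> S12.
        return STANDARD
    if method == "region_code":
        region = region_code            # S7: region chosen on screen
    elif method == "current_location":
        region = gps_region             # S8: region derived from GPS position
    elif method == "domicile_address":
        # S9 (license or IC card) or S10 (registered address).
        # Preferring the card data is an assumption of this sketch.
        region = card_region if card_region is not None else home_region
    else:
        raise ValueError(f"unknown switching method: {method!r}")
    # Assumed fallback: use the standard template if nothing was resolved.
    return region if region is not None else STANDARD


print(resolve_region("domicile_address", home_region="kyushu"))
```

Keeping the region resolution separate from the voice-type choice mirrors the split between FIG. 4 and FIG. 5.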

In the regional template selection process of FIG. 5, the device first checks whether the template used in the previous speech recognition process is the default template (step S13); if so, it selects the template set as the default (step S18). If not, it checks whether the priority template is "mixed" (step S14). The priority template is one that the user sets freely, judging from his or her own gender and voice quality. If the priority template is "mixed", "mixed" is selected as the template (step S19). If not, the device checks whether the priority template is "male 1" (step S15) and, if so, selects "male 1" (step S20). If not, it checks whether the priority template is "male 2" (step S16) and, if so, selects "male 2" (step S21). If not, it checks whether the priority template is "female 1" (step S17) and, if so, selects "female 1" (step S22). Otherwise, "female 2" is selected (step S23). Once the regional template has been selected in this way, processing moves to the determination process of FIG. 6.
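The cascade of checks in FIG. 5 (steps S13 to S23) can be written compactly. The Python sketch below is illustrative only; the function and parameter names are hypothetical, and the voice-type identifiers are shorthand for the five templates named in the text.

```python
def select_voice_template(last_used_default, default_template, priority_template):
    """Pick the voice-type template following the order of checks in FIG. 5."""
    if last_used_default:
        # S13 "yes" -> S18: reuse the template stored as the default.
        return default_template
    # S14 to S17: test the user's priority template against each type in turn.
    for candidate in ("mixed", "male1", "male2", "female1"):
        if priority_template == candidate:
            return candidate            # S19 to S22
    # S23: everything else falls through to "female 2".
    return "female2"


print(select_voice_template(False, "mixed", "male2"))
```

As in the flowchart, any priority value that matches none of the first four checks ends at "female 2", so the function always returns a template.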

In the determination process of FIG. 6, the speech data captured in step S4 of FIG. 3 is compared with the word data of the regional template selected in the selection process of FIG. 5 (step S24). That is, captured speech data for function words such as "facility search", "destination setting", "point registration", "current location", and "scroll", and for facility names, addresses, surnames, given names, and the like, is compared with the corresponding word data. The word data read from the DVD-ROM 5 is CV-converted by the kana CV conversion unit 21; the dictionary generation unit 22 then converts it into LPC cepstrum coefficient sequences using the selected regional template to generate a word dictionary, which is stored in the dictionary memory of the storage unit 17. A word uttered by the user is frequency-analyzed by the voice analysis unit 15, converted into an LPC cepstrum coefficient sequence, and output. The similarity determination unit 23 calculates the similarity between the LPC cepstrum coefficient sequence of the input speech and the LPC cepstrum coefficient sequences in the dictionary memory, selects the entry with the highest similarity from the dictionary memory, and outputs it as the recognition result. The speech recognition dictionary is searched in this way (step S25), and it is checked whether the input speech hit a word in the dictionary memory (step S26). On a hit, the LPC cepstrum coefficient sequence of the hit word is output to the voice processor 19, synthesized there into a VCV speech signal, and output from the speaker 14 as talkback (step S27). At the same time, the CPU 20 performs the corresponding control operation, and the process proceeds to the default setting process of FIG. 7. If there is no hit in step S26, retries are allowed up to three times (step S28). If the second try succeeds, the process returns to the startup process of FIG. 3 and waits for the next speech input; if the third try also fails, the speech recognition process ends as an error and a message to that effect is output from the speaker 14. When there is no hit in step S26, switching the recognition template to another one, for example from "male 1" to "male 2" or "mixed", raises the probability of a hit.
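The matching and retry behaviour of steps S25 to S28 can be illustrated with a small sketch. The specification does not state how similarity is computed or what counts as a hit, so the measure used here (negative mean Euclidean distance over aligned frames), the `threshold` parameter, the rule of moving to the next template after each miss, and all function names are assumptions for illustration only.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Frames = Sequence[Sequence[float]]  # one cepstrum vector per analysis frame


def similarity(a: Frames, b: Frames) -> float:
    """Assumed measure: negative mean Euclidean distance over aligned frames."""
    n = min(len(a), len(b))
    if n == 0:
        return float("-inf")
    return -sum(math.dist(a[i], b[i]) for i in range(n)) / n


def recognize(utterance: Frames, dictionary: Dict[str, Frames],
              threshold: float) -> Optional[str]:
    """Return the most similar word (S25), or None when it is not a hit (S26)."""
    best_word, best_score = None, float("-inf")
    for word, frames in dictionary.items():
        score = similarity(utterance, frames)
        if score > best_score:
            best_word, best_score = word, score
    return best_word if best_score >= threshold else None


def recognize_with_retries(utterances: List[Frames],
                           dictionaries: Dict[str, Dict[str, Frames]],
                           template_order: List[str],
                           threshold: float,
                           max_tries: int = 3) -> Optional[Tuple[str, str]]:
    """Try up to max_tries utterances, moving to the next template after a miss."""
    for attempt, utterance in enumerate(utterances[:max_tries]):
        template = template_order[attempt % len(template_order)]
        word = recognize(utterance, dictionaries[template], threshold)
        if word is not None:
            return word, template
    return None  # every try failed; the caller reports the error


# Toy dictionaries: one single-frame, two-coefficient entry per word.
dictionaries = {
    "male1": {"current location": [[0.0, 0.0]], "scroll": [[5.0, 5.0]]},
    "male2": {"current location": [[1.0, 1.0]], "scroll": [[6.0, 6.0]]},
}
spoken = [[1.0, 1.1]]
result = recognize_with_retries([spoken] * 3, dictionaries,
                                ["male1", "male2"], threshold=-0.5)
```

In this toy data the utterance misses under the "male1" dictionary and hits under "male2", which is the situation the text describes when it says that switching templates raises the chance of a hit. A practical recognizer would also need time alignment between sequences of different lengths, which this sketch omits.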

Next, the learning function of this speech recognition device is described. As noted above, raising the accuracy of speech recognition requires selecting a regional template that suits the user's voice quality. When speech recognition succeeds, the device therefore stores which regional template produced the hit and sets that template as the default for the next speech recognition, which improves the recognition rate.

In the default setting process of FIG. 7, a matching comparison process is first performed to check whether the selected regional template produced a hit (step S29). The "mixed gender" template is first compared with the other templates and graded on a five-level scale, for example "bad", "slightly bad", "normal", "slightly good", and "good"; the "male 1", "male 2", "female 1", and "female 2" templates are then compared and graded in turn (steps S30 to S34). The template with the highest score among all the grading results is set as the default template and is used from the next speech recognition onward (step S35).
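The grading and selection of steps S30 to S35 can be sketched as follows. The specification does not say how each grade is derived or how ties are resolved, so the numeric values assigned to the five grades, the tie rule (the template graded earlier wins), and the names used are assumptions for illustration.

```python
from typing import Dict

# Five-level scale from the text, mapped to hypothetical numeric scores.
GRADES = {"bad": 1, "slightly bad": 2, "normal": 3,
          "slightly good": 4, "good": 5}

# Grading order of steps S30 to S34.
GRADING_ORDER = ("mixed", "male1", "male2", "female1", "female2")


def choose_default(grades: Dict[str, str]) -> str:
    """S35: pick the template with the highest grade.

    max() returns the first maximal item, so a tie goes to the template
    graded earlier; that tie rule is an assumption, not from the text.
    """
    return max(GRADING_ORDER, key=lambda name: GRADES[grades[name]])


grades = {"mixed": "normal", "male1": "slightly good", "male2": "good",
          "female1": "bad", "female2": "slightly bad"}
default_template = choose_default(grades)
```

The stored result would then serve as the default template checked at the start of the next selection process.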

In the speech recognition process described above, when a regional template is selected, voice guidance is preferably given in the regional language that matches the selected template. To this end, voice guidance phrases in regional languages, in addition to the standard language, are stored on the DVD-ROM 5 in ADPCM form. When a regional template is selected during the speech recognition process, the voice guidance phrase in the matching regional language is read from the DVD-ROM 5, decoded into a speech signal by the voice processor 19, and output from the speaker 14. Voice guidance at intersections, expressway entrances and exits, and the like can thus be given in the same regional language as the regional template used for speech recognition, which puts the user at ease.
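Choosing the guidance phrase that matches the active template amounts to a keyed lookup, sketched below. The region key, the phrase identifiers, the file-name strings standing in for ADPCM data, and the fallback to the standard phrase when no regional recording exists are all assumptions made for the example.

```python
from typing import Dict

STANDARD = "standard"

# Hypothetical phrase store: region -> phrase id -> stand-in for ADPCM data.
PHRASES: Dict[str, Dict[str, str]] = {
    STANDARD: {"turn_right": "turn_right.standard.adpcm",
               "exit_ahead": "exit_ahead.standard.adpcm"},
    "region_a": {"turn_right": "turn_right.region_a.adpcm"},
}


def select_guidance_phrase(phrases: Dict[str, Dict[str, str]],
                           phrase_id: str, template_region: str) -> str:
    """Return the regional phrase for the active template, else the standard one."""
    regional = phrases.get(template_region, {})
    if phrase_id in regional:
        return regional[phrase_id]
    # Assumed fallback: use the standard-language recording.
    return phrases[STANDARD][phrase_id]


regional_phrase = select_guidance_phrase(PHRASES, "turn_right", "region_a")
fallback_phrase = select_guidance_phrase(PHRASES, "exit_ahead", "region_a")
```

Decoding the selected ADPCM data and sending it to the speaker would follow this lookup and is outside the scope of the sketch.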

As described above, according to the present embodiment, when speech is recognized by collating the frequency-analysis result of an input word's speech with a word dictionary created using a plurality of recognition templates and extracting a word with high similarity, a standard template and a plurality of regional templates are provided as recognition templates and the recognition template can be switched at will. The user can therefore select a suitable regional template for speech recognition, speech can be recognized even when a regional language is used, and convenience and operability are improved.

As described above, the speech recognition device and method and the in-vehicle navigation device according to the present invention improve the recognition rate by using a recognition template for the regional language suited to the speaker, even when the user's accent, intonation, or phrasing differs or the user speaks with a regional accent. They are useful as control devices operated by voice input, and can also be applied to mobile phones, personal computers, robots, home appliances, and the like.

FIG. 1 is a schematic block diagram showing the configuration of an in-vehicle navigation device provided with a speech recognition device according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing the recognition templates according to the embodiment of the present invention.
FIG. 3 is a flowchart showing the speech recognition startup process according to the embodiment of the present invention.
FIG. 4 is a flowchart showing the speech recognition regional template switching process according to the embodiment of the present invention.
FIG. 5 is a flowchart showing the speech recognition regional template selection process according to the embodiment of the present invention.
FIG. 6 is a flowchart showing the speech recognition determination process according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the speech recognition default setting process according to the embodiment of the present invention.

Explanation of symbols

1 Direction sensor
2 Vehicle speed sensor
3 Various sensor signals
4 Sensor signal processing unit
5 DVD-ROM
6 DVD-ROM drive
7 Liquid crystal display
8 GPS receiver
9 ETC in-vehicle unit
10 In-vehicle LAN
11 Device main body
12 Communication interface
13 Microphone
14 Speaker
15 Voice analysis unit
16 Image processor
17 Storage unit
18 Recognition template storage unit
19 Voice processor
20 CPU
21 Kana CV conversion unit
22 Dictionary generation unit
23 Similarity determination unit

Claims

A speech recognition apparatus that recognizes speech by collating the result of frequency analysis of the speech of an input word with a word dictionary created using a plurality of recognition templates and extracting a word with high similarity, the apparatus comprising a standard template and a plurality of regional templates as the recognition templates, and recognition control means capable of arbitrarily switching the recognition templates.

The speech recognition apparatus according to claim 1, wherein the recognition template includes a plurality of templates for men, for women, and for mixed gender.

The speech recognition apparatus according to claim 1, wherein the recognition control means switches the recognition template based on a selection item designated by a user.

The speech recognition apparatus according to claim 3, wherein the selection item is regional data selected from regional codes displayed on a display means.

The speech recognition apparatus according to claim 3, wherein the selection item is license or IC card data read by a card reading means.

The speech recognition apparatus according to claim 3, wherein the selection item is current position data from a current position detecting means for detecting a current position.

The speech recognition apparatus according to claim 3, wherein the selection item is home position data registered by a point registration means.

The speech recognition apparatus, wherein the recognition control means has a learning function that stores the recognition template with a high recognition rate used last time and uses it as the next recognition template.

A speech recognition method for recognizing speech by collating the result of frequency analysis of the speech of an input word with a word dictionary created using a plurality of recognition templates and extracting a word with high similarity, the method comprising providing a standard template and a plurality of regional templates as the recognition templates, and performing speech recognition by arbitrarily switching the recognition templates.

The speech recognition method according to claim 9, wherein the recognition template includes a plurality of templates for men, for women, and for mixed gender.

The speech recognition method according to claim 9 or 10, wherein switching of the recognition template is performed according to a user instruction.

The speech recognition method according to any one of claims 9 to 11, wherein a recognition template having a high recognition rate used last time is stored and used as a next recognition template.

An in-vehicle navigation device comprising: current position detecting means for detecting a current position; display means for displaying a map and the current position detected by the current position detecting means on the basis of road map data; the speech recognition apparatus according to any one of claims 1 to 8; and control means for performing route guidance based on a recognition result of the speech recognition apparatus.

The in-vehicle navigation device, further comprising storage means that stores a standard phrase and a plurality of regional phrases as voice guidance phrases, wherein the voice guidance phrase is selected according to the recognition template used in the speech recognition apparatus.