JP2000181485A

JP2000181485A - Device and method for voice recognition

Info

Publication number: JP2000181485A
Application number: JP10354995A
Authority: JP
Inventors: Masaaki Ichihara; 雅明市原
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 1998-12-14
Filing date: 1998-12-14
Publication date: 2000-06-30

Abstract

PROBLEM TO BE SOLVED: To reduce the speaking burden on a user who sets a navigation destination by voice. SOLUTION: The user inputs destination data for navigation through a microphone 10. A control part 14 stores the input spoken data in a spoken data storage part 22 and analyzes at least part of it using a voice database 18. Based on the analysis result, the control part 14 switches the portion of the voice database 18 used for analysis, then reads out the spoken data stored in the spoken data storage part 22 and analyzes it again. Because the stored spoken data is reused, the user needs to speak only once to set the destination with high precision.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a voice recognition device and method, and more particularly to voice recognition used when setting a destination in a navigation system.

[0002]

[Description of the Related Art] Techniques have been proposed for performing various navigation system operations by voice, for example destination setting. In such techniques, an important issue is how quickly and accurately the user's speech can be recognized. Voice recognition is normally performed by comparing the user's utterance data with voice data in a voice database prepared in advance, and the voice database is often organized hierarchically.

[0003] For example, Japanese Patent Application Laid-Open No. Hei 10-62199 discloses a technique in which the voice database is divided into three layers. Layer 1 stores facility names that have location information and facility genre names that do not. Layer 2 stores, for each genre name of layer 1, facility names that have location information and prefecture names that do not. Layer 3 stores, for each prefecture of layer 2, facility names that have location information. Speech is recognized while the layer is changed in sequence according to the user's utterance data.

[0004]

[Problems to Be Solved by the Invention] However, because the above prior art switches the layer of the voice database at each utterance of the user, a user who wants to set the navigation destination to, for example, the department store "** Department Store" has to speak repeatedly in sequence, "facility" → "department store" → "** Department Store", and cannot set the destination with a single natural utterance such as "I want to go to the department store ** Department Store."

[0005] In addition, some users want to set a destination relative to a landmark (a reference target), as in "a parking lot near ** Department Store," but the prior art cannot recognize a destination specified relative to such a landmark.

[0006] The present invention has been made in view of the above problems of the prior art, and its object is to provide a device and method that reduce the user's burden of speaking and allow desired data to be set by voice more easily.

[0007]

[Means for Solving the Problems] To achieve the above object, the first invention comprises: utterance data storage means for storing utterance data of a user; first voice analysis means for analyzing at least part of the utterance data by comparing the utterance data with voice data in a voice database; switching means for switching the voice database based on the analysis data obtained by the first voice analysis means; and second voice analysis means for re-analyzing the utterance data by reading out the utterance data stored in the utterance data storage means and comparing it with voice data in the voice database switched by the switching means. The utterance is analyzed by the first voice analysis means, and the stored utterance data is then read out and re-analyzed by the second voice analysis means (by which time the voice database has been switched and thereby optimized), so speech can be recognized reliably from a single utterance by the user. The first voice analysis means and the second voice analysis means need not exist separately; both functions can be achieved by the same means.

[0008] The second invention is the first invention in which the utterance data is destination data for navigation, further comprising means for processing predetermined data, when such data is obtained by the second voice analysis means, as a landmark (a reference target) for the destination. By treating predetermined data obtained through voice analysis as a landmark, even for utterance data such as "○○ near **", obtaining the predetermined data "near **" makes it possible to use "**" as the landmark and obtain the actual destination "○○".

[0009] The third invention comprises: a storage step of storing utterance data of a user; a first analysis step of analyzing at least part of the voice data by comparing the user's utterance data with voice data in a voice database; a switching step of switching the voice database based on the analysis data obtained in the first analysis step; and a second analysis step of reading out the utterance data stored in the storage step and re-analyzing it by comparing the read-out utterance data with voice data in the voice database switched in the switching step.

[0010] The fourth invention is the third invention in which the utterance data is destination data for navigation, further comprising a processing step of processing predetermined data, when such data is obtained in the second analysis step, as a landmark for the destination.

[0011]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings, taking destination setting in a navigation system as an example.

[0012] FIG. 1 is a block diagram of the present embodiment, showing the configuration of a navigation system having a voice recognition function.

[0013] The microphone 10 receives the utterance of a user (vehicle occupant) and supplies it to the control unit 14. The current position detection unit 12 comprises a GPS receiver, a vehicle speed sensor, a direction sensor, and the like; it detects the current position of the vehicle and supplies it to the control unit 14.

[0014] The control unit 14 is specifically a microcomputer. It executes the various controls required for navigation and also analyzes the user's utterance data input from the microphone 10 to set a destination. In this embodiment, the control unit 14 functions as the first voice analysis means and the second voice analysis means, and also as the switching means that switches which data within the voice database 18 is used for analysis.

[0015] The voice database 18 stores the voice data against which utterance data is compared when the control unit 14 analyzes the user's utterance data, and it has a hierarchical structure. The control unit 14 accesses the voice database 18 as needed to analyze utterance data. The voice database 18 is implemented on, for example, a CD-ROM or DVD.

[0016] The map data storage unit 20 stores the map data required for navigation (map data for display and map data for route search). The control unit 14 reads map data around the detected current position from the map data storage unit 20 and displays it on the display unit 24. It also uses the route-search map data to search for a route to the destination obtained by analyzing the utterance data, and displays that route on the display unit 24 as a recommended route. The recommended route may of course also be announced by voice from a speaker. The map data storage unit 20 is implemented on, for example, a CD-ROM or DVD.

[0017] The utterance data storage unit 22 stores the utterance data input from the microphone 10. By reading out the utterance data stored in the utterance data storage unit 22, the control unit 14 can analyze the utterance data multiple times without asking the user to speak again. The utterance data storage unit 22 can be implemented with, for example, a semiconductor memory.

[0018] The operation unit 16 is used for various input operations, such as scrolling the map data displayed on the display unit 24 and setting a destination manually rather than by voice.

[0019] FIG. 2 shows the hierarchical structure of the voice database 18. The voice database consists of three layers: a national-level recognition grammar dictionary, prefecture-level recognition grammar dictionaries, and municipality-level recognition grammar dictionaries. A "grammar dictionary" is the set of voice data used by the grammar method with which the control unit 14 analyzes utterance data; the grammar method is described later. The national-level recognition grammar dictionary stores data for major place names and names throughout Japan. The prefecture-level recognition grammar dictionaries are divided by prefecture and store data for the place names and names within each prefecture. The municipality-level recognition grammar dictionaries are divided by municipality and store data for the place names and names within each municipality.
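
The three-layer structure described above can be pictured as nested lookup tables. The following is only a minimal sketch under stated assumptions, not the implementation of the embodiment: the class name `VoiceDatabase`, its fields, the method `active_words`, and all sample entries are hypothetical, and romanized strings stand in for voice data.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceDatabase:
    """Hypothetical three-layer store of recognition grammar dictionaries."""
    # National level: major place names and names for the whole country.
    national: set[str] = field(default_factory=set)
    # Prefecture level: one dictionary per prefecture.
    prefectural: dict[str, set[str]] = field(default_factory=dict)
    # Municipality level: one dictionary per municipality.
    municipal: dict[str, set[str]] = field(default_factory=dict)

    def active_words(self, prefecture: str, municipalities: list[str]) -> set[str]:
        """Union of the national dictionary and the designated lower-level ones."""
        words = set(self.national)
        words |= self.prefectural.get(prefecture, set())
        for name in municipalities:
            words |= self.municipal.get(name, set())
        return words


# Illustrative entries only; "xx-shouten" stands for the "** shop" of the text.
db = VoiceDatabase(
    national={"shizuoka", "mishimashi", "mishimaeki"},
    prefectural={"shizuoka": {"susonoshi", "izu"}},
    municipal={"susonoshi": {"susonoeki"}, "mishimashi": {"xx-shouten"}},
)
```

With Susono designated, a shop name stored only in the Mishima dictionary is not among the active words; it becomes available once the Mishima dictionary is designated.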

[0020] Which data in which layer of the voice database 18 is read out and used is determined by the control unit 14 according to the detected current position and the analysis result of the utterance data. Specifically, when the current position of the vehicle is, for example, Susono City, Shizuoka Prefecture, the control unit 14 designates Shizuoka Prefecture as the prefecture-level recognition grammar dictionary in the voice database 18, and Susono City and its neighboring cities as the municipality-level recognition grammar dictionary. When the vehicle moves and its current position becomes Chiyoda Ward, Tokyo, the control unit 14 designates Tokyo as the prefecture-level recognition grammar dictionary, and Chiyoda Ward and its neighboring wards as the municipality-level recognition grammar dictionary. The advantage of designating the voice database according to the current position is that an utterance naming a destination near the current position can be analyzed and recognized quickly. The control unit 14 also switches the data used in the voice database 18 according to the analysis result of the utterance data. For example, if analysis of the utterance data shows that Mishima City is the subject, the municipality-level recognition grammar dictionary is switched to Mishima City and analysis continues.

[0021] FIG. 3 shows the processing flowchart of voice recognition in this embodiment. First, the user speaks to input a destination (S101). Examples of utterances are "mishimashi no ** shouten" (the ** shop in Mishima City) and "mishimaeki no chikaku no chuushajou ni ikitai" (I want to go to a parking lot near Mishima Station). The utterance data input from the microphone 10 is stored in the utterance data storage unit 22 (S102), and the control unit 14 analyzes the input utterance data using the grammar method (S103).

[0022] The grammar method is now described. The grammar method recognizes speech by defining in advance the sequences of words to be recognized. For example, the sentence pattern is defined as <a><b><c>, with "kyou wa" (today), "ashita wa" (tomorrow), or "asatte wa" (the day after tomorrow) as candidates for <a>; "tenki wa" or "tenki ga" (the weather) as candidates for <b>; and "ii" (good) or "warui" (bad) as candidates for <c>. Utterances such as "kyou wa tenki ga ii" (the weather is good today), "kyou wa tenki ga warui" (the weather is bad today), and "ashita wa tenki ga ii" (the weather will be good tomorrow) are then recognized. In this embodiment, the following word sequences (referred to as phrases) are used for destination recognition.

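The weather example of the grammar method can be illustrated by enumerating every word sequence the pattern <a><b><c> allows. This is a simplified sketch that assumes text input in romanized form; a real recognizer compares audio against voice data, and the names `SLOTS`, `ACCEPTED`, and `recognize` are hypothetical.

```python
from itertools import product

# Candidate words for each slot of the pattern <a><b><c>.
SLOTS = [
    ["kyou wa", "ashita wa", "asatte wa"],  # <a>
    ["tenki wa", "tenki ga"],               # <b>
    ["ii", "warui"],                        # <c>
]

# Every sentence the grammar accepts: one candidate per slot, in order.
ACCEPTED = {" ".join(words) for words in product(*SLOTS)}


def recognize(utterance: str) -> bool:
    """True if the utterance is one of the predefined word sequences."""
    return utterance in ACCEPTED
```

The grammar accepts 3 × 2 × 2 = 12 sentences, and anything outside that set, such as a reordered sentence, is rejected.
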
[0023]
Basic phrase 1 = <place name> <end>?
Basic phrase 2 = <NULL>? <name> <end>?
Basic phrase 3 = <place name> no <NULL>? <name> <end>?
Basic phrase 4 = <place name> no <name> no <NULL>? <name> <end>?
Basic phrase 5 = <name> no <NULL>? <name> <end>?
Basic phrase 6 = <place name> <direction> <end>?

Here, "no" is the Japanese particle の (roughly "of"). <place name> is a word representing an address or an area; addresses are, for example, "shizuoka" and "shizuoka-ken", and areas are, for example, "izu" and "bousou". <end> is a word marking the end of the sentence, such as "ni ikitai" (want to go to), "e ikitai" (want to go to), "ni tometai" (want to park at), "ni kaeru" (return to), "ni iku" (go to), "tanomu" (please), "made" (as far as), "made tanomi", and "e" (to). <NULL> is a word expressing range or degree, such as "chikaku no" (near), "shuuhen no" (around), "ichiban" (most), "ichiban chikaku no" (nearest), "chikai" (close), "yasui" (cheap), "umai" (tasty), "oishii" (delicious), "itsumo no" (usual), "soba no" (beside), and "kokora" (around here). The <NULL> data is also the data needed when setting a landmark. <name> is a word representing a name or facility, such as "** eki" (station), "** chuushajou" (parking lot), "** gorufujou" (golf course), "** kouen" (park), "** intaachenji" (interchange), "** byouin" (hospital), "** minato" (port), "** kawa" (river), "** kankouchi" (tourist spot), and "** onsen" (hot spring). <direction> is, for example, "** houmen" (toward **). A "?" after a <> slot indicates that the slot is optional and may be omitted. Basic phrase 1 therefore covers not only "shizuoka ni ikitai" (I want to go to Shizuoka) but also just "shizuoka". In the examples above, "mishimashi no ** shouten" (the ** shop in Mishima City) corresponds to basic phrase 3, and "mishimaeki no chikaku no chuushajou" (a parking lot near Mishima Station) corresponds to basic phrase 5.

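The six basic phrases, including their optional "?" slots, can be written as data and checked with a small matcher. The sketch below assumes the utterance has already been reduced to a sequence of slot labels; the labels, `BASIC_PHRASES`, `matches`, and `classify` are hypothetical names, and the matching strategy is an illustration rather than the method of the embodiment.

```python
# Each pattern is a list of (slot, optional) pairs; "no" is the literal particle.
BASIC_PHRASES = {
    1: [("place", False), ("end", True)],
    2: [("null", True), ("name", False), ("end", True)],
    3: [("place", False), ("no", False), ("null", True), ("name", False),
        ("end", True)],
    4: [("place", False), ("no", False), ("name", False), ("no", False),
        ("null", True), ("name", False), ("end", True)],
    5: [("name", False), ("no", False), ("null", True), ("name", False),
        ("end", True)],
    6: [("place", False), ("direction", False), ("end", True)],
}


def matches(pattern, slots):
    """True if the slot sequence fits the pattern; optional slots may be skipped."""
    if not pattern:
        return not slots
    (slot, optional), rest = pattern[0], pattern[1:]
    # Try consuming the next slot label with this pattern element.
    if slots and slots[0] == slot and matches(rest, slots[1:]):
        return True
    # Otherwise the element must be optional and is skipped.
    return optional and matches(rest, slots)


def classify(slots):
    """Return the numbers of all basic phrases that the slot sequence fits."""
    return [n for n, p in BASIC_PHRASES.items() if matches(p, slots)]
```

Under this labeling, "mishimashi no ** shouten" becomes `["place", "no", "name"]` and fits basic phrase 3, while "mishimaeki no chikaku no chuushajou" becomes `["name", "no", "null", "name"]` and fits basic phrase 5, because the <NULL> word "chikaku no" already contains its own particle.
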
[0024] When the user's utterance data is analyzed with this grammar method, at least part of it can be analyzed, but the rest (particularly the latter half of the utterance) sometimes cannot. Specifically, when the utterance data "mishimashi no ** shouten" (the ** shop in Mishima City) above is analyzed, "mishimashi" (Mishima City) exists in the national-level recognition grammar dictionary and can be analyzed, but the name "** shouten" (** shop) can be analyzed only with a municipality-level recognition grammar dictionary. Moreover, if the municipality-level recognition grammar dictionary is set to a municipality other than Mishima City (for example, when the current position of the vehicle is Susono City, the default municipality-level dictionary is Susono City), the utterance data cannot be analyzed. The grammar dictionary of the voice database 18 is therefore switched using the result obtained by the analysis (S104). In the above case, because "mishimashi" (Mishima City) has been obtained, the municipality-level recognition grammar dictionary is switched to the data for Mishima City.

[0025] After the voice database 18 has been switched, the utterance data stored in the utterance data storage unit 22 in S102 is read out and analyzed again (S105). Because the municipality-level recognition grammar dictionary now holds the data for Mishima City, the "** shouten" (** shop) part of the utterance data can be analyzed. When analysis of all the utterance data is complete, the control unit 14 uses the analysis result to search the map data for the destination (S106). In this example, the map data for Mishima City is read out and searched for the ** shop.
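
Steps S102 to S105 (store, analyze, switch, re-analyze) can be summarized in a short sketch. It assumes the utterance is already segmented into romanized tokens, which stand in for audio; `NATIONAL`, `MUNICIPAL`, `analyze`, and `recognize` are hypothetical names and the dictionary contents are illustrative.

```python
# Hypothetical dictionaries: word -> label. "xx-shouten" stands for "** shouten".
NATIONAL = {"mishimashi": "municipality", "no": "particle"}
MUNICIPAL = {
    "susonoshi": {"yy-shouten": "name"},
    "mishimashi": {"xx-shouten": "name"},
}


def analyze(tokens, municipality):
    """Label each token found in the active dictionaries; None if not found."""
    active = {**NATIONAL, **MUNICIPAL[municipality]}
    return [(token, active.get(token)) for token in tokens]


def recognize(tokens, default_municipality):
    """Two-pass recognition of one stored utterance."""
    stored = list(tokens)                          # S102: store the utterance
    first = analyze(stored, default_municipality)  # S103: first analysis
    municipality = default_municipality
    for token, label in first:                     # S104: switch the dictionary
        if label == "municipality":
            municipality = token
    return analyze(stored, municipality)           # S105: re-analyze stored data
```

With the default dictionary set to Susono, the first pass leaves the shop name unlabeled; the second pass, run on the same stored tokens after switching to Mishima, labels it as a name without any further input from the user.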

[0026] When the utterance data is "mishimaeki no chikaku no chuushajou ni ikitai" (I want to go to a parking lot near Mishima Station), analysis is likewise performed in S103; the active recognition grammar dictionary (in this case the national-level one) produces hits, and "mishimaeki" (Mishima Station), "chikaku no" (near), and "chuushajou" (parking lot) can be analyzed. The municipality-level recognition grammar dictionary is then switched to the data for Mishima City (S104), and the utterance data stored in the utterance data storage unit 22 is read out and analyzed again (S105). In this example all the utterance data can be analyzed in the first analysis, so the second analysis result is identical to the first. Of course, if the utterance data is "mishimaeki no chikaku no ** chuushajou ni ikitai" (I want to go to the ** parking lot near Mishima Station), the "** chuushajou" (** parking lot) part cannot be analyzed in the first analysis and is analyzed in the second analysis, after the voice database has been switched. Because the <NULL> data "chikaku no" (near) is present, the control unit 14 treats the analysis result for the <name> data preceding the <NULL> data as a landmark, and searches the map data for parking lots in order of proximity to the coordinates (X, Y) of this landmark (Mishima Station) (S106).
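
The landmark handling in S106 might look like the following sketch: find the <name> that precedes the <NULL> word, then rank candidate facilities by distance from that landmark's coordinates. The functions `find_landmark` and `nearest_first`, the slot labels, the use of straight-line distance, and the sample coordinates are all assumptions made for illustration; the text only says that parking lots are searched in order of proximity.

```python
import math


def find_landmark(labeled):
    """Return the closest <name> word that precedes a <NULL> word, or None."""
    for i, (_, slot) in enumerate(labeled):
        if slot == "null":
            for word, earlier_slot in reversed(labeled[:i]):
                if earlier_slot == "name":
                    return word
    return None


def nearest_first(landmark_xy, candidates):
    """Sort (name, x, y) candidates by straight-line distance from the landmark."""
    lx, ly = landmark_xy
    return sorted(candidates, key=lambda c: math.hypot(c[1] - lx, c[2] - ly))


# Illustrative data: a labeled utterance and made-up map coordinates.
labeled = [("mishimaeki", "name"), ("no", "no"),
           ("chikaku no", "null"), ("chuushajou", "name")]
parking_lots = [("lot A", 5.0, 5.0), ("lot B", 1.0, 1.0), ("lot C", 2.0, 3.0)]
ordered = nearest_first((0.0, 0.0), parking_lots)
```

Here `find_landmark(labeled)` yields "mishimaeki", and the parking lots are ordered from nearest to farthest relative to the landmark position.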

[0027] Thus, in this embodiment the user's utterance data is stored, and even when it cannot be fully analyzed in the first analysis, the voice database is switched automatically and analysis is performed again. This improves voice recognition accuracy and lets the user set a destination with a single utterance.

[0028] Also, in this embodiment, when <NULL> data is present, the data preceding it is regarded as a landmark and looked up in the map data, and the map data around that landmark is then searched to find the actual destination. The user can therefore easily set the desired destination with a natural utterance.

[0029] In this embodiment, when homonyms exist, it is preferable to ask the user for more information in order to improve the recognition rate. For example, when the user says "toyota", the speaker asks "toyota-shi desu ka, toyota-chou desu ka" (Is it Toyota City or Toyota Town?).

[0030] Furthermore, techniques such as attaching to each analyzed item an annotation indicating the type of data obtained, to make searching the map database easier, can naturally also be used in this embodiment. For example, prefecture names among place names are annotated with the number 11, city names with 13, areas with 42, and names with 32. In this case, it is preferable to annotate <NULL> data such as "chikai" (close) and "chikaku no" (near) with a number (for example, 91), while leaving <NULL> data such as "umai" (tasty) and "yasui" (cheap) without annotation, because the latter are unnecessary for setting the destination (searching the map data).
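
The annotation scheme can be sketched as a lookup from data type to number. The numeric codes are the examples given in the text; the names `ANNOTATION`, `NULL_ANNOTATION`, and `annotate`, the type labels, and the choice to drop unannotated words from the result are assumptions made for this illustration.

```python
# Example codes from the text: prefecture 11, city 13, area 42, name 32.
ANNOTATION = {"prefecture": 11, "city": 13, "area": 42, "name": 32}

# Only proximity-type <NULL> words get a code (91 is the example value);
# words such as "umai" or "yasui" are deliberately absent.
NULL_ANNOTATION = {"chikai": 91, "chikaku no": 91}


def annotate(labeled):
    """Attach a numeric annotation to each word that is useful for map search."""
    result = []
    for word, kind in labeled:
        if kind == "null":
            code = NULL_ANNOTATION.get(word)
        else:
            code = ANNOTATION.get(kind)
        if code is not None:  # assumption: unannotated words are simply skipped
            result.append((word, code))
    return result
```

For an utterance containing both "chikaku no" and "umai", only the former survives annotation, which matches the stated reason that words like "umai" are unnecessary for searching the map data.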

[0031]

[Effects of the Invention] As described above, the present invention reduces the user's burden of speaking and allows desired data, for example a navigation destination, to be set by voice more easily.

[Brief description of the drawings]

FIG. 1 is a configuration block diagram of the embodiment.

FIG. 2 is an explanatory diagram showing the structure of the voice database of the embodiment.

FIG. 3 is a processing flowchart of the embodiment.

[Explanation of symbols]

10: microphone, 12: current position detection unit, 14: control unit, 16: operation unit, 18: voice database, 20: map data storage unit, 22: utterance data storage unit, 24: display unit.

Continuation of front page
(51) Int. Cl.7: G09B 29/10; FI: G09B 29/10 Z
F-terms (reference): 2C032 HB06 HC16 HD16; 2F029 AA02 AB01 AB07 AB09 AC01 AC02 AC04 AC18; 5D015 HH13 HH16 KK02 LL10; 5H180 AA01 BB04 BB13 CC12 CC27 FF04 FF05 FF22 FF25 FF32 FF33

Claims

[Claims]

1. A voice recognition device comprising: utterance data storage means for storing utterance data of a user; first voice analysis means for analyzing at least part of the utterance data by comparing the utterance data with voice data in a voice database; switching means for switching the voice database based on the analysis data obtained by the first voice analysis means; and second voice analysis means for re-analyzing the utterance data by reading out the utterance data stored in the utterance data storage means and comparing it with voice data in the voice database switched by the switching means.

2. The voice recognition device according to claim 1, wherein the utterance data is destination data for navigation, the device further comprising means for processing predetermined data, when such data is obtained by the second voice analysis means, as a landmark for the destination.

3. A voice recognition method comprising: a storage step of storing utterance data of a user; a first analysis step of analyzing at least part of the voice data by comparing the utterance data of the user with voice data in a voice database; a switching step of switching the voice database based on the analysis data obtained in the first analysis step; and a second analysis step of reading out the utterance data stored in the storage step and re-analyzing it by comparing the read-out utterance data with voice data in the voice database switched in the switching step.

4. The voice recognition method according to claim 3, wherein the utterance data is destination data for navigation, the method further comprising a processing step of processing predetermined data, when such data is obtained in the second analysis step, as a landmark for the destination.