JP2003140682A - Voice recognition device and voice dictionary generation method

Info

Publication number: JP2003140682A
Application number: JP2001339117A
Authority: JP
Inventors: Toshiyuki Momomoto; 利行百本
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2001-11-05
Filing date: 2001-11-05
Publication date: 2003-05-16

Abstract

PROBLEM TO BE SOLVED: To automatically create a voice dictionary of the words that a user personally needs, and to make that voice dictionary reliable. SOLUTION: A voice dictionary database 52 is provided in which words and their voice patterns are registered in association with each other. A voice dictionary creation unit of a voice recognition engine 51 sends a voice pattern input from a microphone to an external voice recognition device 70 and registers, in the voice dictionary database 52, the correspondence between the word recognized by the external voice recognition device and the voice pattern of that word. A voice recognition unit of the voice recognition engine 51 recognizes the word of an input voice using the voice dictionary registered in the voice dictionary database 52.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a voice recognition device and a voice dictionary creation method, and more particularly to a voice recognition device that registers the voice patterns of a plurality of words in a voice dictionary database, compares an input voice pattern with the registered voice patterns, and recognizes the spoken word based on the most similar registered voice pattern, and to a voice dictionary creation method for such a device.

[0002]

2. Description of the Related Art When a destination is input, a navigation device searches for a route to the destination and guides the driver along that route. A destination can be input by (1) keying in the address of the destination (address input); (2) when the destination is an intersection, specifying two streets to input the intersection (intersection input); (3) keying in the name of the POI (Point of Interest) that is the destination (POI input); (4) entering a category and a city name to display a list of candidate POIs and selecting the destination POI from the list; (5) pointing directly at the destination on the map with a cursor; (6) registering addresses under given names in an address book beforehand, then displaying the address book and inputting the destination address from it; or (7) entering the telephone number of the POI.

[0003] FIGS. 9 and 10 are explanatory diagrams of a method of specifying a destination by address input. Addresses in the United States have the format state / city name / street name / house number, so a destination can be specified by entering the state, city name, street name, and house number. To specify a destination by address input on the navigation device, the user operates the menu key of the remote control to display the main menu on the screen and selects the menu item "Dest" from the main menu. When "Dest" is selected, the navigation device displays the "Find Destination by" screen for choosing the destination input method, as shown in FIG. 9(a). When the user selects the menu item "Address" and presses the enter key of the remote control, the navigation control unit displays the menu items "Change State", "Street Name", and "City Name", as shown in FIG. 9(b). To enter the destination address starting from the city name, the user selects the menu item "City Name"; to enter it starting from the street name, the user selects the menu item "Street Name". The most recently selected state is displayed as the default state, so to change the state the user selects "Change State" (FIG. 9(c)) and scrolls to select the desired state (FIG. 9(d)).

[0004] After the state is decided, when the user selects the menu item "City Name" and operates the enter key, the navigation control unit displays on the screen an alphanumeric keyboard for entering the city name, as shown in FIG. 9(e). When the user enters the first few letters of the city name and presses the list key "List" on the keyboard (FIG. 9(f)), a scrollable list of the California city names that begin with those letters is displayed, as shown in FIG. 10(g). The figure shows the case in which the letter "T" has been entered, and the city name "TORRANCE", which begins with "T", is displayed. The user then scrolls the city name list with the joystick key, selects the desired city name (TORRANCE), and operates the enter key.

[0005] When the city name has been specified in this way, the navigation device switches to the street name input screen, as shown in FIG. 10(h). The user enters the first few letters of the street name and presses the list key "List" on the keyboard (FIG. 10(i)). As shown in FIG. 10(j), a scrollable list of the streets in TORRANCE whose names begin with those letters is then displayed. The figure shows the case in which "CAR" has been entered, and the street name "CARSON", which begins with "CAR", is listed. The user scrolls the street name list with the joystick key, selects the desired street name (CARSON), and operates the enter key. When the street name has been specified, the navigation control unit switches to the house number input screen, as shown in FIG. 10(k). When the user enters the house number and operates the enter key, the destination address is displayed, as shown in FIG. 10(m). If the address is correct, the user selects "OK to Proceed" to confirm it as the destination. Once the destination address has been input, the navigation control unit obtains the latitude and longitude of the destination address from the address database and performs a route search using that position information.

[0006] FIG. 11 is an explanatory diagram of a method of specifying a destination by POI type (category) input. To specify a destination by POI type input on the navigation device, the user operates the menu key of the remote control to display the main menu on the screen and selects the menu item "Dest" from the main menu. When "Dest" is selected, the navigation control unit displays the "Find Destination by" screen for choosing the destination input method, as shown in FIG. 11(a). When the user selects the menu item "Point of Interest" and presses the enter key of the remote control, the navigation device displays the menu items "Place Name", "Place Type sort by distance", and "Place Type within a city", as shown in FIG. 11(b). The user selects, for example, the menu item "Place Type sort by distance" and presses the enter key. The navigation control unit then displays a scrollable list of main categories, as shown in FIG. 11(c); scrolling changes the displayed part of the list, as shown in FIG. 11(d). Some main categories have subcategories and others do not. RESTAURANT, SHOPPING, GAS STATION, and the like have subcategories; when one of these is selected, for example RESTAURANT, a scrollable subcategory list is displayed, as shown in FIG. 11(e).

[0007] When a POI type (category) has been specified, the navigation control unit displays a scrollable list of the facilities in that category in order of distance, as shown in FIG. 11(f). When the user scrolls the POI name list with the joystick key of the remote control, selects the desired POI (DEL AMO FASHION CENTER), and operates the enter key, the navigation device displays the address of the destination, as shown in FIG. 11(g). If the address is correct, the user selects "OK to Proceed" to confirm it as the destination. Once the destination address has been input, the navigation device obtains the latitude and longitude of the POI from the POI database and performs a route search using that position information.

[0008] Recently it has also become possible to specify a destination by voice input instead of key input. FIG. 12 is an explanatory diagram of entering a destination address by voice. When the user presses the talk switch of the remote control (step 101), the system emits a beep. The beep indicates that voice input is now possible (the same applies below), and the user says "Menu" (step 102). The navigation control unit recognizes the input voice, displays the "Find Destination by" screen for choosing the destination input method, as shown in FIG. 9(a), and outputs the voice prompt "How would you like to set your destination?". In response, the user says "Address" (step 103).

[0009] The navigation control unit recognizes the input voice, displays the "Find Address by" screen having the menu items "Change State", "Street Name", and "City Name", as shown in FIG. 9(b), and outputs the voice prompt "Select City or Street to find your address in California." The most recently selected state is displayed as the default state, so to change the state the user says "Change State" to display the "Select a State" screen, as shown in FIG. 9(d), and then says the state name. Entering a cancel command by key or by voice returns the display to the state shown in FIG. 9(b).

[0010] After the state is decided, the user says "City" in order to enter the destination address starting from the city name (step 104). The navigation control unit recognizes the input voice, displays the "Input City Name" screen having an alphanumeric keyboard for entering the city name, as shown in FIG. 9(e), and outputs the voice prompt "Which City in California?" (step 105). When the user says the city name "TORRANCE" (step 106), the navigation control unit recognizes the input voice, displays the "Input Street Name" screen having an alphanumeric keyboard for entering the street name, as shown in FIG. 10(h), and outputs the voice prompt "Which Street in Torrance?" (step 107).

[0011] When the user says the street name "Carson" (step 108), the navigation device recognizes the input voice, displays the "Input Address Number" screen having a numeric keypad for entering the house number, as shown in FIG. 10(k), and outputs the voice prompt "Which Street number on Carson?" (step 109). When the user says the house number "3525" (step 110), the navigation control unit displays the "Confirm Destination" screen showing the destination address, as shown in FIG. 10(m), and then outputs the voice prompt "Would you like to calculate a route to 3525 Carson ST Torrance, CA?". If the address is correct, the user says "OK" to confirm it as the destination. The navigation control unit then outputs "Calculating route to 3525 Carson ST Torrance, CA" by voice, obtains the latitude and longitude of the destination address from the address database, and performs a route search using that position information.

[0012] FIG. 13 is an explanatory diagram of a voice recognition device. The voice recognition device 1 is connected to, for example, a navigation device 2, recognizes a word spoken by the user, and inputs it to the navigation device 2. A voice recognition dictionary 1a stores words (Text) and phonetic symbol numbers (Annotation), and a synthesized speech table 1b stores the correspondence between phonetic symbol numbers (Annotation) and phonetic symbols. A voice recognition engine 1c refers to the voice recognition dictionary 1a and the synthesized speech table 1b, determines the word most similar (highest score) to the speaker's voice pattern input from a microphone 1d, and outputs it to the navigation device 2. That is, the engine pattern-matches the input voice pattern against the words registered in advance in the voice recognition dictionary 1a and takes the word with the highest score (probability) as the recognition result. The recognition result is output as the Text and its corresponding Annotation (ID).
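
The highest-score selection described in this paragraph can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the engine's actual method: the dictionary contents and phonetic strings are invented, and `difflib.SequenceMatcher` over phonetic symbols merely stands in for the acoustic pattern matching, which the text does not specify.

```python
from difflib import SequenceMatcher

# Hypothetical contents of voice recognition dictionary 1a:
# Text -> phonetic symbol string (illustrative values only).
DICTIONARY_1A = {
    "TORRANCE": "t ao r ah n s",
    "CARSON": "k aa r s ah n",
    "ADDRESS": "ae d r eh s",
}


def score(input_pattern: str, registered_pattern: str) -> float:
    """Stand-in similarity in [0, 1]; a real engine would score acoustic features."""
    return SequenceMatcher(
        None, input_pattern.split(), registered_pattern.split()
    ).ratio()


def recognize(input_pattern: str, dictionary: dict[str, str]) -> tuple[str, float]:
    """Return the registered word with the highest score, together with that score."""
    best_word = max(dictionary, key=lambda w: score(input_pattern, dictionary[w]))
    return best_word, score(input_pattern, dictionary[best_word])
```

Because every registered word is scored, the cost of one recognition grows with the size of the dictionary, which is the scaling concern the following paragraphs raise.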

[0013]

PROBLEMS TO BE SOLVED BY THE INVENTION To perform voice recognition, the words to be recognized must be registered in the voice recognition dictionary in advance. In a navigation device, entering a destination by address requires speaking the city, street, and so on, and entering it by POI (facility) requires speaking the POI name (facility name). However, the number of city, street, and facility names is very large, reaching several hundred thousand. Recognizing destinations correctly therefore requires an excessively large voice recognition dictionary, which increases the hardware scale and leads to a larger and more expensive device.

[0014] Moreover, commercially available voice recognition engines can handle at most 4,000 recognition target words within 2 seconds, so not all city names, street names, and facility names can be registered. The present applicant has therefore proposed, in Japanese Patent Application No. 2001-192737, a voice recognition device in which the user keys in beforehand the area names or facility names that the user truly needs, and the device internally creates a voice pattern for each keyed-in name, registers the voice pattern in a voice dictionary in association with the name, and performs voice recognition using that voice dictionary. However, the proposed device requires every recognition target word to be registered one at a time by key input, which makes creating the voice dictionary laborious. In addition, because the voice patterns are synthesized internally from the key input data, their reliability is low.

[0015] Accordingly, an object of the present invention is to make it possible to automatically create a voice dictionary of the words that a user personally needs, and to create a highly reliable voice dictionary for use in voice recognition.

[0016]

MEANS FOR SOLVING THE PROBLEMS According to the present invention, the above object is achieved by a voice recognition device comprising: a voice dictionary database in which correspondences between words and voice patterns are registered; a voice dictionary creation unit that sends the voice pattern of an input word to an external voice recognition device and registers, in the voice dictionary database, the correspondence between the word recognized by the external voice recognition device and the voice pattern of that word; and a voice recognition unit that recognizes the word of an input voice using the voice dictionary registered in the voice dictionary database. A highly reliable voice dictionary can thus be created.

[0017] Further, a dictionary section in which lower-level-concept words and their voice patterns are registered as pairs is provided for each higher-level concept, and when a lower-level-concept word is input by voice, the word obtained from the external voice recognition device and its voice pattern are registered in the dictionary section corresponding to the higher-level concept. Only the words registered in the relevant dictionary section then need to be searched, so the number of words to be pattern-matched is reduced and voice recognition becomes faster.

[0018] When a voice is input, word recognition is first attempted using the voice dictionary created so far. If the word is recognized, no inquiry is made to the external voice recognition device; only if the word cannot be recognized is the external voice recognition device queried, and the correspondence between the word it recognizes and the voice pattern of that word is registered in the voice dictionary database. In this way, a voice dictionary in which the voice patterns of the words the user needs are registered can be created automatically while voice recognition is being performed.
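
The local-first behaviour of this paragraph can be sketched as below. This is a hedged Python illustration only: `PersonalDictionary`, `recognize_with_fallback`, and `make_fake_server` are hypothetical names, an utterance is represented by a phonetic string, and exact string equality stands in for real pattern matching with a score threshold.

```python
from typing import Callable, Optional

# An external recognizer returns (word, voice pattern), or None if it finds nothing.
ExternalRecognizer = Callable[[str], Optional[tuple[str, str]]]


class PersonalDictionary:
    """Hypothetical stand-in for the voice dictionary database."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}  # word -> voice pattern

    def recognize(self, utterance: str) -> Optional[str]:
        # Assumption: exact equality replaces threshold-based pattern matching.
        for word, pattern in self.entries.items():
            if pattern == utterance:
                return word
        return None


def recognize_with_fallback(
    utterance: str, local: PersonalDictionary, external: ExternalRecognizer
) -> Optional[str]:
    """Try the local dictionary first; query the external recognizer only on a miss."""
    word = local.recognize(utterance)
    if word is not None:
        return word  # recognized locally, so no external inquiry is made
    result = external(utterance)
    if result is None:
        return None
    word, pattern = result
    local.entries[word] = pattern  # register the new word/pattern pair
    return word


def make_fake_server(big_dictionary: dict[str, str]):
    """Build a fake external recognizer that records every inquiry it receives."""
    calls: list[str] = []

    def server(utterance: str) -> Optional[tuple[str, str]]:
        calls.append(utterance)
        for word, pattern in big_dictionary.items():
            if pattern == utterance:
                return word, pattern
        return None

    return server, calls
```

With this arrangement, a word costs one external inquiry the first time it is spoken and none afterwards, so the local dictionary converges on the vocabulary the user actually uses.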

[0019]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS FIG. 1 is an overall configuration diagram of a system that implements the voice dictionary creation method of the present invention. Reference numeral 10 denotes a navigation device, 50 a voice recognition device, 70 an external server for voice recognition, and 80 a large-capacity dictionary database in which the voice patterns of all words are registered. The voice recognition device 50 has a voice recognition engine unit 51 that performs voice recognition processing and voice dictionary creation processing, a personal voice dictionary database 52 in which the user's personal voice dictionary is registered, and a wireless communication unit 53 that communicates with the external server 70 via a wireless communication network and the Internet. The voice recognition device 50 and the navigation device 10 are connected so that they can communicate with each other, allowing the navigation device to be operated by voice.

[0020] When a word is spoken on a given navigation screen, the voice recognition engine 51 sends the voice waveform data (PCM) of the word to the external server 70, has the external server 70 perform voice recognition, and obtains the recognition result (the recognized word and its phonetic symbol string) from the external server. The voice recognition engine 51 sends the obtained word to the navigation device 10 and registers the word and its voice pattern (phonetic symbol string) in the voice dictionary database 52. By repeating this operation, the voice dictionary that the user needs is created automatically in the voice dictionary database 52. A minimal set of words, such as commands, state or prefecture names, and main and sub categories, can be registered in the voice dictionary database 52 in advance. Words need not be registered only while an actual destination is being set: for the purpose of registration, the user can display the city name input screen on the screen of the navigation device and speak a city name to register it in the voice dictionary database 52. Similarly, the user can display the street name input screen and speak a street name to register it, or display the facility name input screen and speak a facility name to register it.

[0021] FIG. 2 is a configuration diagram of the navigation device 10. In the figure, 11 is a map storage medium that stores map information, for example a DVD (digital video disc); 12 is a DVD control unit that controls reading of map information from the DVD; 13 is a position measuring device that measures the current vehicle position; 14 is a map information memory that stores map information around the vehicle position read from the DVD; 15 is a remote control used to make settings and give instructions by menu selection and to perform operations such as entering point names and zooming in and out; and 16 is a remote control interface.

[0022] Reference numeral 17 is a CPU (navigation control unit) that controls the entire navigation device; 18 is a ROM that stores software (a loading program) for downloading control programs from the DVD, fixed data, and the like; and 19 is a RAM that stores the control programs downloaded from the DVD (destination setting program, route search program, and so on), the guidance route data found by the search, and other processing results. Reference numeral 21 is a storage unit that stores the address database read from the DVD, in which cities are listed for each state and streets are listed for each city, and which holds the latitude and longitude of the point indicated by each address (state / city / street / house number).

[0023] Reference numeral 22 is a storage unit that stores the POI database read from the DVD and holds, for each POI name, the address, category, POI position (latitude and longitude), and so on. Reference numeral 23 is a display controller that generates map images, guidance routes, and the like; 24 is a video RAM that stores the images generated by the display controller; 25 is a menu/list generation unit that generates menus and lists; 26 is an image synthesis unit that combines images and outputs the result; 27 is a display device (monitor) that displays the image output from the image synthesis unit; 28 is a voice guidance unit that outputs by voice the distance to an intersection, the direction of travel, and voice input guidance messages; and 29 is a communication interface that exchanges data with the voice recognition device 50.

[0024] FIG. 3 is a configuration diagram of the voice recognition device, in which parts identical to those in FIG. 1 are given the same reference numerals. The voice recognition engine 51 includes a voice recognition unit 54 and a voice dictionary creation unit 55. During voice recognition, the voice recognition unit 54 refers to the voice dictionary database 52, pattern-matches the speaker's voice pattern input from a microphone 56 against the registered patterns, determines the word with the highest similarity (highest score), and sends it to the navigation device 10. During word registration, the voice dictionary creation unit 55 sends the voice pattern data (voice waveform data) of the spoken word to the external server 70 for voice recognition via the wireless communication unit 53, and registers the word recognized by the external server, paired with its voice pattern, in the voice dictionary database 52. An operation unit 57 has a talk switch TKS that starts voice recognition, a registration mode switch VRS that selects the registration mode, and so on.

【００２５】FIG. 4 shows the voice dictionary creation processing flow of the first embodiment of the present invention. When the registration mode has been set by operating the registration mode switch VRS (step 201), the talk switch TKS is operated on a predetermined screen of the navigation device, and a voice input follows (step 202), the voice dictionary creating unit 55 sends the voice waveform data of the input voice to the external server 70 and requests voice recognition (step 203). The external server 70 performs pattern matching using the received voice waveform data and the phonetic symbols registered in the large-capacity voice dictionary database 80, searches for the word with the highest similarity, and returns the character string and the phonetic symbol string of that word to the voice dictionary creating unit 55. The voice dictionary creating unit 55 thereby acquires the character string and the phonetic symbol string (pronunciation pattern) of the recognized word (step 204), checks whether the word is already registered in the voice dictionary database 52 (step 205), registers it in the voice dictionary database 52 in alphabetical order if it is not (step 206), then sends the recognized word to the navigation device 10 (step 207), and ends the process.
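The registration flow of steps 203 to 207 can be sketched roughly as follows. This is only an illustration under stated assumptions: the waveform is represented as a string of phonetic symbols, `difflib.SequenceMatcher` stands in for the unspecified pattern matching of the external server 70, and the names `SERVER_DB`, `external_recognize`, and `register_word` as well as the phonetic strings are hypothetical.

```python
from bisect import insort
from difflib import SequenceMatcher

# Hypothetical stand-in for the large-capacity database 80: word -> phonetic string.
SERVER_DB = {
    "Torrance": "t ao r ax n s",
    "Carson": "k aa r s ax n",
    "Menu": "m eh n y uw",
}

def external_recognize(waveform):
    """Stand-in for external server 70: return the (word, phonetic) pair most similar to the input."""
    return max(
        SERVER_DB.items(),
        key=lambda item: SequenceMatcher(None, waveform, item[1]).ratio(),
    )

def register_word(local_dict, waveform):
    """Steps 203-207: request recognition, register the word if it is new, return the word."""
    word, phonetic = external_recognize(waveform)        # steps 203-204
    if all(known != word for known, _ in local_dict):    # step 205
        insort(local_dict, (word, phonetic))             # step 206: keeps alphabetical order
    return word                                          # step 207

local_dict = []
register_word(local_dict, "t ao r ax n s")
register_word(local_dict, "k aa r s ax n")
register_word(local_dict, "t ao r ax n s")  # already registered, so nothing is added
```

Here `local_dict` plays the role of the voice dictionary database 52 as a sorted list of (word, phonetic string) pairs.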

【００２６】In the first embodiment, the correspondences between words and voice patterns are simply registered in the voice dictionary database 52 in alphabetical order, but they may instead be registered separately for each navigation screen. In that case, when voice recognition is performed using the voice dictionary of the voice dictionary database 52, the number of words to be searched decreases, which enables fast and highly accurate voice recognition.

【００２７】As shown in FIG. 5(a), a command dictionary unit D1 is provided for all screens, an address dictionary unit D2 for the address input screens, and a POI dictionary unit D3 for the facility input screens. As shown in FIG. 5(b), the address dictionary unit D2 further includes a dictionary unit D21 that stores, in association with the state input screen, the correspondence between state names and their phonetic symbol strings (pronunciation patterns); a dictionary unit D22 that stores, for each state and in association with the city input screen, the correspondence between city names and their phonetic symbol strings; and a dictionary unit D23 that stores, for each city and in association with the street input screen, the correspondence between street names and their phonetic symbol strings. As shown in FIG. 5(c), the POI dictionary unit D3 further includes a dictionary unit D31 that stores, in association with the category input screen, the correspondence between main category names and their phonetic symbol strings; a dictionary unit D32 that stores, in association with the subcategory input screen, the correspondence between subcategory names and their phonetic symbol strings; and a dictionary unit D33 that stores, for each category and in association with the facility name input screen, the correspondence between facility names and their phonetic symbol strings.
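One way to picture the layered dictionary units of FIG. 5 is a map keyed by the dictionary unit and its superordinate entry (for example, the state to which a set of city names belongs). The sketch below is an assumption for illustration only: the class `VoiceDictionary`, its method names, and the phonetic strings are hypothetical and do not come from the specification.

```python
from collections import defaultdict

class VoiceDictionary:
    """Illustrative container for the dictionary units D21-D23 and D31-D33."""

    def __init__(self):
        # (dictionary unit, superordinate key) -> {word: phonetic symbol string}
        self._units = defaultdict(dict)

    def register(self, unit, parent, word, phonetic):
        """Store a word and its phonetic string under one dictionary unit."""
        self._units[(unit, parent)][word] = phonetic

    def candidates(self, unit, parent=None):
        """Return only the words that belong to the given unit and parent."""
        return dict(self._units.get((unit, parent), {}))

vd = VoiceDictionary()
vd.register("D21", None, "California", "k ae l ax f ao r n y ax")
vd.register("D22", "California", "Torrance", "t ao r ax n s")
vd.register("D22", "Nevada", "Reno", "r iy n ow")
vd.register("D23", "Torrance", "Carson", "k aa r s ax n")

# Only California city names are search candidates on the California city input screen.
california_cities = vd.candidates("D22", "California")
```

Keying each unit by its parent keeps the candidate set small, which is the property the specification relies on for faster matching.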

【００２８】In other words, the voice dictionary database 52 is provided, for each superordinate concept, with a dictionary unit for registering subordinate-concept words paired with their voice patterns. For addresses, the superordinate and subordinate concepts are the state input screen ID and the state name, the city input screen ID for each state and the city name, and the street input screen ID for each city and the street name. For POIs, they are the category input screen ID and the main category name, the subcategory input screen ID for each main category and the subcategory name, and the facility name input screen for each main category or subcategory and the facility name. When a subordinate-concept word is input by voice, the voice dictionary creating unit 55 registers the word obtained by requesting recognition from the external server 70, together with its voice pattern, in the dictionary unit corresponding to the superordinate concept.

【００２９】FIG. 6 shows the voice dictionary creation processing flow for the case where the voice dictionary database 52 is provided with the command dictionary unit D1, the address dictionary unit D2 corresponding to the address input screens, and the POI dictionary unit D3 corresponding to the facility name input screens. It is assumed that all commands are registered in the command dictionary unit D1 in advance. When the registration mode has been set by operating the registration mode switch VRS (step 301), the talk switch TKS is operated on a predetermined screen of the navigation device, and a voice input follows (step 302), the voice dictionary creating unit 55 acquires the screen ID from the navigation device 10 (step 303).

【００３０】Next, the voice recognition unit 54 performs voice recognition using the words registered in the command dictionary unit D1 and checks whether the input voice is a command (step 304). If it is a command, the command is input to the navigation device 10, which performs the processing corresponding to the command and creates and displays a predetermined screen (step 305). The process then returns to the beginning and repeats. If the input is not a command in step 304, the voice recognition unit 54 searches the dictionary corresponding to the screen ID (step 306) and checks whether a word was found (step 307). If a word was found, the recognized word is input to the navigation device 10, which performs the processing corresponding to the word and creates and displays a predetermined screen (step 308). The process then returns to the beginning and repeats.

【００３１】If no word was found in step 307, the voice dictionary creating unit 55 sends the voice waveform data of the input voice to the external server 70 and requests voice recognition (step 309). The external server 70 performs pattern matching using the received voice waveform data and the phonetic symbols registered in the large-capacity voice dictionary database 80, searches for the word with the highest similarity, and returns the character string and the phonetic symbol string of that word to the voice dictionary creating unit 55. The voice dictionary creating unit 55 thereby acquires the character string and the phonetic symbol string (pronunciation pattern) of the recognized word (step 310) and registers the correspondence between them in the dictionary unit corresponding to the screen ID (step 311). The voice dictionary creating unit 55 then inputs the recognized word to the navigation device 10, which performs the processing corresponding to the word and creates and displays a predetermined screen (step 308). The process then returns to the beginning and repeats.
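The decision order of steps 304 to 311 (command check, search of the dictionary unit for the current screen, then fallback to the external server with registration) might be sketched as below. The sketch makes simplifying assumptions: an exact match on a phonetic string stands in for similarity-based pattern matching, and `SERVER_WORDS`, `ask_server`, `handle_utterance`, and the screen ID string are hypothetical names.

```python
# Hypothetical stand-in for external server 70: phonetic pattern -> recognized word.
SERVER_WORDS = {"t ao r ax n s": "Torrance"}
server_calls = []

def ask_server(pattern):
    """Record the request and return (word, phonetic string), as in steps 309-310."""
    server_calls.append(pattern)
    return SERVER_WORDS[pattern], pattern

def handle_utterance(pattern, screen_id, commands, screen_dicts, ask=ask_server):
    """One pass through steps 304-311; returns (recognized word, where it was found)."""
    if pattern in commands:                       # step 304: is the input a command?
        return commands[pattern], "command"
    unit = screen_dicts.setdefault(screen_id, {})
    if pattern in unit:                           # steps 306-307: search by screen ID
        return unit[pattern], "local"
    word, phonetic = ask(pattern)                 # steps 309-310: ask the external server
    unit[phonetic] = word                         # step 311: register under the screen ID
    return word, "server"

commands = {"m eh n y uw": "Menu"}
screen_dicts = {}
first = handle_utterance("t ao r ax n s", "city:California", commands, screen_dicts)
second = handle_utterance("t ao r ax n s", "city:California", commands, screen_dicts)
```

In this sketch the first utterance is resolved by the server and the second by the local dictionary, so the server is contacted only once for the same word.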

【００３２】As described above, words can be registered in the voice dictionary database in association with screen IDs, and word registration and voice recognition processing can be performed in parallel. In addition, words may be registered in the large-capacity voice dictionary database 80 under the same scheme as in the voice dictionary database 52. In that case, when requesting voice recognition from the external server 70, the voice dictionary creating unit 55 sends the screen ID together with the voice waveform data, and the external server 70 searches only the words in the dictionary unit corresponding to that screen ID. Because this reduces the number of words to be searched, high-speed word recognition becomes possible.
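The effect of sending the screen ID along with the waveform can be illustrated with a small server-side sketch. It assumes, purely for illustration, that the waveform is a phonetic string, that `difflib.SequenceMatcher` stands in for the real matcher, and that `SERVER_UNITS`, `server_recognize`, the screen ID strings, and the phonetic strings are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical server-side dictionary units: screen ID -> {word: phonetic string}.
SERVER_UNITS = {
    "city:California": {"Torrance": "t ao r ax n s", "Carson": "k aa r s ax n"},
    "city:Nevada": {"Reno": "r iy n ow", "Carson City": "k aa r s ax n s ih t iy"},
}

def server_recognize(waveform, screen_id=None):
    """Return the best (word, phonetic) match and the number of candidates compared."""
    if screen_id is None:
        units = list(SERVER_UNITS.values())      # no screen ID: search every unit
    else:
        units = [SERVER_UNITS[screen_id]]        # screen ID given: search one unit only
    candidates = [(w, p) for unit in units for w, p in unit.items()]
    best = max(candidates, key=lambda c: SequenceMatcher(None, waveform, c[1]).ratio())
    return best, len(candidates)

restricted = server_recognize("r iy n ow", "city:Nevada")
unrestricted = server_recognize("r iy n ow")
```

Both calls find the same word, but the restricted call compares the input against fewer candidates, which is the source of the speed-up described above.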

【００３３】FIG. 7 shows a modification of FIG. 6 in which the voice recognition unit 54 performs only command identification; steps identical to those in FIG. 6 carry the same step numbers. The flow in FIG. 7 is the same as that in FIG. 6 up to step 305. If the input is not a command in step 304, the voice dictionary creating unit 55 sends the voice waveform data of the input voice to the external server 70 and requests voice recognition (step 401). The external server 70 performs pattern matching using the received voice waveform data and the phonetic symbols registered in the large-capacity voice dictionary database 80, searches for the word with the highest similarity, and returns the character string and the phonetic symbol string of that word to the voice dictionary creating unit 55. The voice dictionary creating unit 55 thereby acquires the character string and the phonetic symbol string (pronunciation pattern) of the recognized word (step 402). The voice dictionary creating unit 55 then checks whether the recognized word is already registered in the voice dictionary database 52 (step 403) and, if it is not, registers the correspondence between the character string and the phonetic symbol string of the recognized word in the dictionary unit corresponding to the screen ID (step 404). The voice dictionary creating unit 55 then inputs the recognized word to the navigation device 10, which performs the processing corresponding to the word and creates and displays a predetermined screen (step 405). The process then returns to the beginning and repeats.
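For comparison with FIG. 6, the FIG. 7 variant can be sketched as a flow in which only commands are identified locally and every other input goes to the external server, with registration skipped when the word is already present. As before, this is a sketch under assumptions: exact matching on a phonetic string replaces similarity matching, and `handle_utterance_fig7`, `fake_server`, and the screen ID string are hypothetical.

```python
def handle_utterance_fig7(pattern, screen_id, commands, screen_dicts, ask_server):
    """Steps 304 and 401-405: local command check, otherwise always ask the server."""
    if pattern in commands:                          # step 304
        return commands[pattern]
    word, phonetic = ask_server(pattern)             # steps 401-402
    unit = screen_dicts.setdefault(screen_id, {})
    if word not in unit:                             # step 403: already registered?
        unit[word] = phonetic                        # step 404
    return word                                      # step 405

calls = []

def fake_server(pattern):
    """Hypothetical stand-in for external server 70 that always recognizes 'Carson'."""
    calls.append(pattern)
    return "Carson", pattern

dicts = {}
handle_utterance_fig7("k aa r s ax n", "street:Torrance", {}, dicts, fake_server)
handle_utterance_fig7("k aa r s ax n", "street:Torrance", {}, dicts, fake_server)
```

Unlike the FIG. 6 sketch, the server is contacted for both utterances here, while the dictionary unit still ends up with a single entry.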

【００３４】In the processing flows of FIGS. 6 and 7 above, all command words are registered in advance in the command dictionary unit D1 of the voice dictionary database 52. Alternatively, the commands need not be registered in advance and may instead be registered based on the voice recognition results from the external server, in the same way as addresses and POIs.

【００３５】FIG. 8 shows the processing flow for performing voice recognition using a voice dictionary that has already been created in the voice dictionary database 52; no voice registration is performed. When the user presses the talk switch TKS on the remote control while the map display screen 501 is shown, the system emits a beep. The beep indicates that voice input is now possible (the same applies below), and the user says "Menu". Because the command dictionary corresponds to every screen, the voice recognition unit 54 performs voice recognition with reference to the command dictionary unit D1, identifies "Menu", and inputs it to the navigation device 10.

【００３６】In response to the input of "Menu", the navigation control unit displays on the screen the "Find Destination by" screen 502 for specifying the destination input method, and outputs the voice prompt "How would you like to set your destination?". The user then says "Address". Because the command dictionary D1 corresponds to the screen 502, the voice recognition unit 54 performs voice recognition with reference to the command dictionary unit D1, identifies "Address", and inputs it to the navigation device 10.

【００３７】In response to the input of "Address", the navigation control unit displays on the screen the "Find address by" screen 503, which has the menu items "Change State", "City Name", and "Street Name", and outputs the voice prompt "Select City or Street to find your address in California." Because the last selected state is displayed as the default state, a user who wants to change the state says "Change State" and then says the state name.

【００３８】After the state is determined, the user says "City" in order to enter the destination address starting with the city name. Because the command dictionary D1 corresponds to the screen 503, the voice recognition unit 54 performs voice recognition with reference to the command dictionary unit D1, identifies "City", and inputs it to the navigation device 10. In response to the input of "City", the navigation control unit displays on the screen the "Enter City Name" screen 504, which has an alphabetic/numeric keyboard for entering a city name, and outputs the voice prompt "Which City in California?". When the user inputs the city name "TORRANCE", the navigation control unit performs voice recognition with reference to the dictionary D22 corresponding to the California city input screen, identifies "TORRANCE", and inputs it to the navigation device 10.

【００３９】In response to the input of "TORRANCE", the navigation control unit displays on the screen the "Enter Street Name" screen 505, which has an alphabetic/numeric keyboard for entering a street name, and outputs the voice prompt "Which Street in Torrance?". When the user inputs the street name "Carson", the navigation control unit performs voice recognition with reference to the dictionary D23 corresponding to the TORRANCE street input screen, identifies "Carson", and inputs it to the navigation device 10. The rest of the address can be input by voice through similar operations.
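The FIG. 8 walk-through amounts to a small table of screen transitions plus a rule for which dictionary unit is consulted on each screen. The sketch below encodes only the transitions named in the text; the table layout, the names `TRANSITIONS`, `NAME_SCREENS`, and `step`, and the use of `None` for the screen that follows 505 (which the text does not specify) are assumptions.

```python
# Command-driven transitions named in the walk-through: (screen, command) -> next screen.
TRANSITIONS = {
    (501, "Menu"): 502,
    (502, "Address"): 503,
    (503, "City"): 504,
}

# Screens where a name is entered: screen -> (dictionary unit consulted, next screen).
# The screen after 505 is not given in the text, so it is left as None here.
NAME_SCREENS = {504: ("D22", 505), 505: ("D23", None)}

def step(screen, word):
    """Return (dictionary unit consulted, next screen) for one recognized word."""
    if screen in NAME_SCREENS:
        return NAME_SCREENS[screen]
    return "D1", TRANSITIONS[(screen, word)]

screen = 501
units_used = []
for spoken in ["Menu", "Address", "City", "TORRANCE", "Carson"]:
    unit, screen = step(screen, spoken)
    units_used.append(unit)
```

Running the loop consults D1 for the three commands and then D22 and D23 for the city and street names, mirroring the sequence of screens 501 to 505.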

【００４０】[0040]

[Effects of the Invention]

As described above, according to the present invention, an input voice pattern is sent to an external voice recognition device, the correspondence between the word recognized by the external voice recognition device and the voice pattern is registered in the voice dictionary database, and a voice-input word is recognized using the voice dictionary registered in the voice dictionary database. A highly reliable voice dictionary containing only the necessary words can therefore be created.

【００４１】Further, according to the present invention, a dictionary unit for registering subordinate-concept words paired with their voice patterns is provided for each superordinate concept, and when a subordinate-concept word is input by voice, the word obtained from the external voice recognition device and its voice pattern are registered in the dictionary unit corresponding to the superordinate concept. Because only the words registered in the relevant dictionary unit need to be searched, the number of words subjected to pattern matching is reduced and high-speed voice recognition becomes possible.

【００４２】Further, according to the present invention, when a voice is input, word recognition of the input voice is performed using the voice dictionary. If the word can be recognized, no inquiry is made to the external voice recognition device; only if the word cannot be recognized is an inquiry made to the external voice recognition device, and the correspondence between the word recognized by the external voice recognition device and the voice pattern of the word is registered in the voice dictionary database. A voice dictionary in which the voice patterns of the words the user needs are registered can therefore be created automatically while voice recognition is being performed.

[Brief description of drawings]

FIG. 1 is an overall configuration diagram of a system that implements the voice dictionary creating method of the present invention.

FIG. 2 is a configuration diagram of a navigation device.

FIG. 3 is a configuration diagram of a voice recognition device.

FIG. 4 is a voice dictionary creation processing flow according to the first embodiment of the present invention.

FIG. 5 is a storage example of the voice dictionary database.

FIG. 6 is a processing flow of a second embodiment of voice dictionary creation.

FIG. 7 is a voice dictionary creation processing flow for the case where the voice recognition unit performs only command identification.

FIG. 8 is a processing flow for setting a destination by inputting an address by voice.

FIG. 9 is an explanatory diagram of a conventional method of specifying a destination by address input.

FIG. 10 is another explanatory diagram of the conventional method of specifying a destination by address input.

FIG. 11 is an explanatory diagram of a conventional method of specifying a destination by POI type (category) input.

FIG. 12 is an explanatory diagram of the conventional case of inputting a destination address by voice.

FIG. 13 is an explanatory diagram of a voice recognition device.

[Explanation of symbols]

10 Navigation device
50 Voice recognition device
51 Voice recognition engine unit
52 Personal voice dictionary database
53 Wireless communication unit
70 External server for voice recognition
80 Large-capacity dictionary database

Claims

[Claims]

1. A voice recognition device that registers voice patterns of a plurality of words, compares an input voice pattern with the registered voice patterns, and recognizes a voice-input word based on the most similar registered voice pattern, the device comprising: a voice dictionary database for registering correspondences between words and voice patterns; a voice dictionary creating unit that sends an input voice pattern to an external voice recognition device and registers, in the voice dictionary database, the correspondence between a word recognized by the external voice recognition device and the voice pattern of the word; and a voice recognition unit that recognizes a voice-input word using the voice dictionary registered in the voice dictionary database.

2. The voice recognition device according to claim 1, wherein the voice dictionary database has, for each superordinate concept, a dictionary unit for registering a subordinate-concept word and its voice pattern as a pair, and when a subordinate-concept word is input by voice, the voice dictionary creating unit registers the word obtained from the external voice recognition device and its voice pattern in the dictionary unit corresponding to the superordinate concept.

3. The voice recognition device according to claim 2, wherein, when the voice recognition device is used for voice input to a navigation device, the superordinate and subordinate concepts are a state input screen ID and a state name, a city input screen ID for each state and a city name, and a street input screen for each city and a street name.

4. The voice recognition device according to claim 2, wherein, when the voice recognition device is used for voice input to a navigation device, the superordinate and subordinate concepts are a category input screen ID and a category name, and a facility input screen for each category and a facility name.

5. The voice recognition device according to claim 1 or 2, wherein, when a voice is input, the voice recognition unit performs word recognition of the input voice using the voice dictionary, and the voice dictionary creating unit does not send the input voice pattern to the external voice recognition device if the word can be recognized, and, if the word cannot be recognized, sends the input voice pattern to the external voice recognition device and registers, in the voice dictionary database, the correspondence between the word recognized by the external voice recognition device and the voice pattern of the word.

6. A voice dictionary creating method for a voice recognition device that registers voice patterns of a plurality of words, compares an input voice pattern with the registered voice patterns, and recognizes a voice-input word based on the most similar registered voice pattern, the method comprising: providing a voice dictionary database for registering correspondences between words and voice patterns; sending an input voice pattern to an external voice recognition device; and creating a voice dictionary by acquiring the correspondence between a word recognized by the external voice recognition device and the voice pattern of the word and registering the correspondence in the voice dictionary database.

7. The voice dictionary creating method according to claim 6, wherein, when a voice is input, word recognition of the input voice is performed using the voice dictionary; if the word can be recognized, the input voice pattern is not sent to the external voice recognition device; and if the word cannot be recognized, the input voice pattern is sent to the external voice recognition device and the correspondence between the word recognized by the external voice recognition device and the voice pattern of the word is registered in the voice dictionary database to create the voice dictionary.