JP2011203455A - Information terminal for vehicles, and program

Info

Publication number: JP2011203455A
Application number: JP2010070180A
Authority: JP
Inventors: Takamitsu Sakai; 孝光坂井; Kazuteru Yamanaka; 一輝山中; Miho Makimoto; 美保槇本
Original assignee: Aisin AW Co Ltd
Current assignee: Aisin AW Co Ltd
Priority date: 2010-03-25
Filing date: 2010-03-25
Publication date: 2011-10-13
Anticipated expiration: 2030-03-25
Also published as: JP5218459B2

Abstract

PROBLEM TO BE SOLVED: To provide a vehicle information terminal and a program that appropriately store voice information that could not be recognized in a voice recognition dictionary, based on an operation performed afterwards, so that the voice information can subsequently be recognized.

SOLUTION: When it is determined that no recognition result was obtained from recognition processing of voice information uttered by a speaker, the voice information is held as unrecognizable voice information. Based on an operation performed after that determination, an estimated point on a map that is estimated to correspond to the unrecognizable voice information is identified (S114). Estimated point information on the estimated point is stored in association with the unrecognizable voice information (S118). Because the unrecognizable voice information is stored in association with the estimated point information, voice information that could not be recognized can subsequently be recognized appropriately.

Description

The present invention relates to a vehicle information terminal and a program.

Voice recognition devices having a voice recognition function are conventionally known. A technique has been disclosed in which, when a speech recognition processing device cannot recognize what a speaker said, the article targeted by the speech is identified from the speaker's line of sight or pointing gesture at the time of the utterance, and the identified article is registered in a speech recognition dictionary in association with the word that could not be recognized (see, for example, Patent Document 1).

Patent Document 1: JP 2009-223172 A

However, Patent Document 1 requires a means for detecting the line of sight or a pointing gesture. Moreover, when the target indicated by the speaker's utterance cannot be seen from the place where voice recognition is performed, as with a destination in a navigation device, the target cannot be identified even if such a detection means is available.

The present invention has been made in view of the above problems. Its object is to provide a vehicle information terminal and a program that appropriately store voice information that could not be recognized in a voice recognition dictionary, based on a subsequent operation, so that the voice information can be recognized appropriately afterwards.

The vehicle information terminal according to claim 1 comprises: voice information acquisition means for acquiring voice information uttered by a speaker; recognition processing means for performing recognition processing on the voice information acquired by the voice information acquisition means; recognition result determination means for determining whether a recognition result has been obtained by the recognition processing means; voice information holding means for holding the voice information as unrecognizable voice information when the recognition result determination means determines that no recognition result has been obtained; point identifying means for identifying an estimated point on a map that is estimated to correspond to the unrecognizable voice information, based on an operation performed after the recognition result determination means determines that no recognition result has been obtained; and storage control means for storing estimated point information on the estimated point in association with the unrecognizable voice information. Because the unrecognizable voice information can thus be stored appropriately in association with estimated point information on the point it is estimated to correspond to, voice information that could not be recognized can subsequently be recognized appropriately.

The invention according to claim 2 further comprises new information acquisition determination means for determining whether information related to destination setting has been acquired after the recognition result determination means determines that no recognition result has been obtained. When the new information acquisition determination means determines that information related to destination setting has been acquired, the point identifying means identifies the estimated point based on that information. Because the estimated point is identified from the destination setting information, the point estimated to correspond to the unrecognizable voice information can be identified more appropriately.

In the invention according to claim 3, when the new information acquisition determination means determines that no information related to destination setting has been acquired, the point identifying means identifies the point where the vehicle ignition was turned off as the estimated point. In the invention according to claim 4, the point identifying means identifies, as the estimated point, the point where the vehicle ignition was turned off after the recognition result determination means determined that no recognition result was obtained. The point estimated to correspond to the unrecognizable voice information can thereby be identified easily.

The invention according to claim 5 comprises match determination means for determining, when prior information leading up to the utterance of the unrecognizable voice information has been acquired, whether attribute information included in the estimated point information matches the prior information. The storage control means stores the estimated point information in association with the unrecognizable voice information when the match determination means determines that the attribute information and the prior information match. The estimated point information and the unrecognizable voice information can thereby be associated and stored with high accuracy.

The invention has been described above as a vehicle information terminal, but it can also be realized as the following program.

That is, a program that causes a computer to function as: voice information acquisition means for acquiring voice information uttered by a speaker; recognition processing means for performing recognition processing on the voice information acquired by the voice information acquisition means; recognition result determination means for determining whether a recognition result has been obtained by the recognition processing means; voice information holding means for holding the voice information as unrecognizable voice information when the recognition result determination means determines that no recognition result has been obtained; point identifying means for identifying an estimated point on a map that is estimated to correspond to the unrecognizable voice information, based on an operation performed after the recognition result determination means determines that no recognition result has been obtained; and storage control means for storing estimated point information on the estimated point in association with the unrecognizable voice information. Executing such a program provides the same effects as the vehicle information terminal described above.

FIG. 1 is a block diagram showing the configuration of a vehicle information terminal according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of the data stored in the recognition dictionary according to the embodiment.
FIG. 3 is a flowchart of the voice recognition process according to the embodiment.
FIG. 4 is a flowchart of the recognition dictionary registration process according to the embodiment.

Hereinafter, a vehicle information terminal according to the present invention will be described with reference to the drawings.

(One embodiment)

FIG. 1 is a block diagram showing the overall configuration of a navigation device 1 serving as a vehicle information terminal according to an embodiment of the present invention. The navigation device 1 is built around a control unit 10 and includes, connected to the control unit 10, a position detector 20, a map data storage unit 30, a voice recognition information storage unit 40, an operation switch group 50, a voice input unit 60, a voice output unit 70, a drawing unit 80, and the like.

The control unit 10 is configured as an ordinary computer. It contains a CPU, a ROM, an I/O, a bus line connecting these components, and the like.

The position detector 20 includes a geomagnetic sensor 21, a gyroscope 22, a distance sensor 23, and a GPS (Global Positioning System) receiver 24 that receives radio waves from satellites, all of which are well known. Because the sensors 21 to 24 each have errors of a different nature, they are used so as to complement one another.

The map data storage unit 30 is a storage device realized as, for example, a hard disk drive (HDD). Although an HDD is used in this embodiment, other media such as a DVD-ROM or a memory card may be used instead. The map data storage unit 30 stores so-called map matching data for improving the accuracy of position detection, and map data for route search. The map data includes various kinds of data, one of which is facility information. Specifically, the facility information is POI (Point Of Interest) information stored in association with an ID that identifies each facility. The POI information includes the facility name, facility ID, position coordinates, information indicating the type (genre), and the like.

The voice recognition information storage unit 40 is implemented on the same HDD as the map data storage unit 30. Other media such as a memory card may of course be used. The voice recognition information storage unit 40 stores a recognition dictionary 41.

The recognition dictionary 41 stores speech waveform data in association with the corresponding words. When the word corresponding to a piece of speech waveform data is contained in the map data, the speech waveform data is stored in association with that map data. As shown in FIG. 2, for example, when the word corresponding to speech waveform data X is "okazaki shiyakusho", the facility name "Okazaki City Hall", the genre "City Hall", the address "Okazaki City, Aichi Prefecture, ...", and the position coordinates (x1, y1) are stored in association with speech waveform data X.
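As a rough illustration of the FIG. 2 layout, the dictionary can be pictured as a mapping from a waveform key to a record of map attributes. This is a minimal sketch only: the class name, the field names, the string key standing in for waveform data, and the placeholder coordinates are all assumptions, not part of the source.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class DictionaryEntry:
    """One row of the recognition dictionary (hypothetical field names)."""
    waveform_id: str                                 # stands in for stored speech waveform data
    reading: Optional[str] = None                    # word corresponding to the waveform
    facility_name: Optional[str] = None
    genre: Optional[str] = None
    address: Optional[str] = None
    position: Optional[Tuple[float, float]] = None   # (x, y) map coordinates


# Entry modeled on the FIG. 2 example. The source gives the coordinates only
# symbolically as (x1, y1) and elides the address, so both are placeholders.
recognition_dictionary: Dict[str, DictionaryEntry] = {
    "X": DictionaryEntry(
        waveform_id="X",
        reading="okazaki shiyakusho",
        facility_name="Okazaki City Hall",
        genre="City Hall",
        address="Okazaki City, Aichi Prefecture, ...",
        position=(1.0, 1.0),
    )
}
```

Optional fields are used because, as the later examples show, an entry may carry only an address and coordinates, or only a genre.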

Returning to FIG. 1, the operation switch group 50 consists of touch switches integrated with a display 81, mechanical switches, a remote control device, and the like, and is used for various inputs. The operation switch group 50 includes a talk switch 51, which is operated for voice input.

A microphone 61 for voice input is connected to the voice input unit 60. When the talk switch 51 is turned on, the voice uttered by the speaker is input via the microphone 61.

A speaker 71 for voice output is connected to the voice output unit 70.

A display 81 is connected to the drawing unit 80. The display 81 is a color display using a liquid crystal panel or a CRT, and information is displayed on it.

The voice recognition process will now be described with reference to the flowchart in FIG. 3. This process is performed when the talk switch 51 is turned on, and is described here for the case where a destination is set based on the speaker's voice.

In the first step S101 (hereinafter "step" is omitted and steps are denoted simply by the symbol "S"), it is detected that the talk switch 51 has been turned on.

In S102, the recognition dictionary 41 is set.

In S103, the voice information uttered by the speaker and input via the microphone 61 is acquired.

In S104, recognition processing is performed on the voice information acquired in S103. The voice information is A/D converted into waveform data that can be processed. The waveform data is then collated with the speech waveform data stored in the recognition dictionary 41 to identify a recognition candidate.

In S105, it is determined whether a recognition result was obtained in the recognition processing of S104. In this embodiment, a recognition result is deemed to have been obtained when a recognition candidate could be identified in S104. If no recognition result was obtained (S105: NO), the process proceeds to S107. If a recognition result was obtained (S105: YES), the process proceeds to S106.

In S106, the identified recognition candidate is output. Specifically, when the identified candidate is "Okazaki City Hall", a voice message such as "Setting Okazaki City Hall as the destination" is output via the speaker 71, and a map centered on Okazaki City Hall is shown on the display 81. Okazaki City Hall is then set as the destination, a route is searched for, and route guidance to Okazaki City Hall is provided. Destination setting, route search, and route guidance are performed in processes separate from this one.

In S107, reached when no recognition result was obtained (S105: NO), the waveform data of the voice information acquired in S103 is held as unrecognizable voice information in the RAM of the control unit 10, and an unrecognizable flag is set. A voice message such as "Could not be recognized" is also output via the speaker 71.
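The branch at S105 can be sketched as follows. This is an illustrative outline under stated assumptions: `RecognizerState`, `match_waveform`, and `recognize` are hypothetical names, a string key stands in for A/D-converted waveform data, and the collation of S104 is reduced to an exact dictionary lookup, which a real recognizer would not use.

```python
from typing import Dict, Optional


class RecognizerState:
    """Holds what S107 keeps in RAM (hypothetical structure)."""

    def __init__(self) -> None:
        self.unrecognized_waveform: Optional[str] = None
        self.unrecognizable_flag: bool = False


def match_waveform(waveform: str, dictionary: Dict[str, str]) -> Optional[str]:
    """Stand-in for the collation in S104: exact key lookup only."""
    return dictionary.get(waveform)


def recognize(waveform: str, dictionary: Dict[str, str], state: RecognizerState) -> str:
    """Return the message that would be spoken via the speaker."""
    candidate = match_waveform(waveform, dictionary)        # S104
    if candidate is not None:                               # S105: YES
        return f"Setting {candidate} as the destination"    # S106
    # S105: NO -> S107: hold the waveform and set the unrecognizable flag.
    state.unrecognized_waveform = waveform
    state.unrecognizable_flag = True
    return "Could not be recognized"
```

Setting the flag is what triggers the separate registration process described next, so the sketch keeps it as explicit state rather than a return value.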

Next, the recognition dictionary registration process, which registers voice information that could not be recognized in the recognition dictionary based on a subsequent operation, will be described with reference to the flowchart in FIG. 4. This process is performed when the unrecognizable flag has been set.

In the first step, S111, it is determined whether new information related to destination setting has been acquired. The new information may be information input via the operation switch group 50 or voice information input via the microphone 61. If it is determined that no new information has been acquired (S111: NO), the process proceeds to S113. If it is determined that new information has been acquired (S111: YES), the process proceeds to S112.

In S112, a destination is set based on the acquired new information.

In S113, reached when it is determined that no new information related to destination setting has been acquired (S111: NO), it is determined whether the ignition has been turned off. If the ignition has not been turned off (S113: NO), the process returns to S111. If the ignition has been turned off (S113: YES), the process proceeds to S114.

In S114, the estimated point is identified. When S114 is reached after new information was acquired (S111: YES) and a destination was set based on it (S112), the set destination is identified as the estimated point. When S114 is reached after it was determined that the ignition was turned off (S113: YES), the point where the ignition was turned off is identified as the estimated point.

In this embodiment, the operation of setting a destination after the unrecognizable voice information is held, or the operation of turning off the vehicle ignition after the unrecognizable voice information is held, corresponds to the "operation performed after the recognition result determination means determines that no recognition result has been obtained".
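The loop through S111, S113, and S114 amounts to waiting for whichever of the two operations happens first. The sketch below models that as a scan over a stream of events; the event names, the tuple layout, and the function name are assumptions made for illustration, and a real device would poll switches and the ignition signal rather than consume a list.

```python
from typing import Iterable, Optional, Tuple

Point = Tuple[float, float]
Event = Tuple[str, Point]  # (kind, position); kinds here are hypothetical labels


def identify_estimated_point(events: Iterable[Event]) -> Optional[Point]:
    """Return the estimated point for held unrecognizable voice information."""
    for kind, position in events:
        if kind == "destination_set":    # S111: YES -> S112 -> S114
            return position              # the set destination
        if kind == "ignition_off":       # S113: YES -> S114
            return position              # where the ignition was turned off
        # Any other event: S111 NO and S113 NO, so keep waiting.
    return None                          # neither operation has occurred yet
```

For example, `identify_estimated_point([("ignition_off", (3.0, 4.0))])` yields the ignition-off position, which corresponds to the case where the driver never set a destination.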

In S115, estimated point information on the identified estimated point is acquired from the map data storage unit 30. Here, the POI information corresponding to the estimated point is acquired.

In S116, it is determined whether a facility genre has been set in advance, that is, whether a facility genre was set before S107 in FIG. 3, in the series of operations leading up to S107. The "series of operations" refers, for example, to the operation of setting the genre "ramen shop" in the following situation: the genre "ramen shop" has been set by voice recognition or by operating the operation switch group 50, and, in reply to the question "Which ramen shop?", the user turns on the talk switch 51 and the processing shown in FIGS. 2 and 3 is performed. In other words, when the user is prompted for more detailed information on the premise that a genre has been set, the operation of setting that genre corresponds to the "series of operations". In this embodiment, information on the facility genre set in advance (hereinafter the "set facility genre") corresponds to the "prior information leading up to the utterance of the unrecognizable voice information". If no facility genre has been set in advance (S116: NO), the process proceeds to S118. If a facility genre has been set in advance (S116: YES), the process proceeds to S117.

In S117, it is determined whether the facility genre included in the estimated point information matches the set facility genre. In this embodiment, because the prior information is information on a facility genre, the facility genre information included in the estimated point information corresponds to the "attribute information". If the two genres do not match (S117: NO), the processing of S118 is not performed. If they match (S117: YES), the process proceeds to S118.

S118 is reached when no facility genre has been set in advance (S116: NO), or when a facility genre has been set in advance and the facility genre included in the estimated point information matches it (S116: YES, S117: YES). In S118, the unrecognizable voice information held in S107 of FIG. 3 is stored in the recognition dictionary 41 in association with the estimated point information. At this time, a voice message such as "The keyword that could not be recognized earlier will be registered as a word for this point" is output via the speaker 71. The unrecognizable flag set in S107 is then reset, and the recognition dictionary registration process ends.
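The gating in S116 to S118 can be summarized in a few lines. This is a sketch under assumptions: the function name, the dict-based POI record, and the `"genre"` key are hypothetical, and the voice announcement and flag reset of S118 are omitted.

```python
from typing import Any, Dict, Optional


def register_if_consistent(
    dictionary: Dict[str, Dict[str, Any]],
    waveform: str,
    poi: Dict[str, Any],
    preset_genre: Optional[str],
) -> bool:
    """Store waveform -> POI unless a preset genre contradicts the POI genre."""
    if preset_genre is not None:                 # S116: YES
        if poi.get("genre") != preset_genre:     # S117: NO
            return False                         # S118 is skipped
    dictionary[waveform] = poi                   # S118: associate and store
    return True
```

With no preset genre the entry is always stored; with a preset genre of "ramen shop", a POI whose genre is "City Hall" would be rejected, which is the accuracy safeguard the genre check is meant to provide.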

Specific examples of the recognition dictionary registration process are described below.

(1) Specific example 1

Specific examples 1 and 2 are cases in which the speaker says "Grandpa's house" when setting a destination. Specific example 1 describes the dictionary registration process when a destination is set after "Grandpa's house" could not be recognized.

When the talk switch 51 is turned on to set a destination (S101), the recognition dictionary 41 is set (S102). When the speaker says "Grandpa's house", the voice information is acquired (S103) and recognition processing is performed (S104). If no speech waveform data corresponding to waveform data A of the utterance "Grandpa's house" is stored in the recognition dictionary 41 and no recognition result is obtained (S105: NO), a voice message such as "Could not be recognized" is output and waveform data A is held as unrecognizable voice information (S107).

Suppose the speaker, having been notified that "Grandpa's house" could not be recognized, enters the address "A-cho 3-18, Midori-ku, Nagoya City" via the operation switch group 50 and sets it as the destination (S111: YES). The entered address is then highly likely to be the address of "Grandpa's house". In this embodiment, because the destination was set based on the newly acquired information (S111: YES, S112), the set destination "A-cho 3-18, Midori-ku, Nagoya City" is identified as the estimated point estimated to correspond to "Grandpa's house" (S114), and the estimated point information is acquired (S115).

Assuming that no facility genre was set before waveform data A of "Grandpa's house" was held as unrecognizable voice information (S116: NO), waveform data A is stored in the recognition dictionary 41 in association with information on the point "A-cho 3-18, Midori-ku, Nagoya City" identified as the estimated point. Specifically, as shown in FIG. 2, the address "A-cho 3-18, Midori-ku, Nagoya City" (FIG. 2 omits the part from "A-cho" onward) and the position coordinates (xa, ya) corresponding to the entered address are stored in the recognition dictionary 41 in association with waveform data A. A voice message such as "The keyword that could not be recognized earlier will be registered as a word for this point" is also output via the speaker 71 (S118).

(2) Specific example 2

Specific example 2 describes the dictionary registration process when no new information related to destination setting is acquired (S111: NO) after waveform data A of "Grandpa's house" is held as unrecognizable voice information (S107).

In this example the vehicle travels without a destination being set, so the point where the ignition is turned off after waveform data A is held (S107) is highly likely to be the point corresponding to "Grandpa's house", which the speaker tried to set as the destination. In this embodiment, therefore, when no destination is set (S111: NO) after waveform data A is held as unrecognizable voice information (S107), the point where the ignition is turned off is identified as the estimated point estimated to correspond to waveform data A of "Grandpa's house" (S113: YES, S114), and estimated point information on that point is acquired (S115).

Assuming that no facility genre was set before waveform data A of "Grandpa's house" was held as unrecognizable voice information (S116: NO), waveform data A is stored in the recognition dictionary 41 in association with the estimated point information, that is, information on the point where the ignition was turned off. A voice message such as "The keyword that could not be recognized earlier will be registered as a word for this point" is also output via the speaker 71 (S118).

In specific examples 1 and 2, the speaker does not need to perform a separate operation to register the address "A-cho 3-18, Midori-ku, Nagoya City" for waveform data A of "Grandpa's house". From the next time on, simply saying "Grandpa's house" makes the estimated point information stored in association with waveform data A available, which improves convenience.

(3) Specific example 3
Specific example 3 describes a case where the genre of the destination facility is set based on voice information uttered by the speaker.
When a voice prompt asking for a genre, such as “Please say a genre”, is output through the speaker 71 and the speaker turns on the talk switch 51 (S101), the recognition dictionary 41 for genres is set (S102). Suppose the speaker then utters “konbini” (コンビニ, the colloquial short form of “convenience store”). The utterance “konbini” is acquired as voice information (S103), and recognition processing is performed (S104). If no voice waveform data corresponding to the waveform data B of “konbini” is stored as a genre in the recognition dictionary 41 and no recognition result is obtained (S105: NO), a voice message such as “Could not be recognized” is output, and the waveform data B of “konbini” is held as unrecognizable voice information (S107).

After the waveform data B of “konbini” is held as unrecognizable voice information (S107), if the vehicle travels without a destination being set (S111: NO), the point where the ignition is turned off is specified as the estimated point estimated to correspond to “konbini” (S113: YES, S114), and estimated point information regarding the estimated point is acquired (S115). If the genre of the facility at the point where the ignition was turned off is “convenience store”, it is highly probable that the user said “konbini” to mean the facility genre “convenience store”. Therefore, as shown in FIG. 2, the waveform data B of “konbini” and the facility genre “convenience store” are stored in the recognition dictionary 41 in association with each other. In addition, a voice message such as “The keyword that could not be recognized earlier will be registered as a word indicating ‘convenience store’” is output through the speaker 71 (S118).

In specific example 3, the waveform data B of “konbini”, uttered in response to the prompt “Please say a genre”, can be said to be already identified as a word relating to a genre. When the unrecognizable voice information is identified in this way as information relating to a genre, the terminal may be configured to refer to the genre information in the estimated point information and store the corresponding genre (“convenience store” in specific example 3) in association with the unrecognizable voice information. In other words, when the attribute of the unrecognizable voice information has been identified, the storage control means refers to the estimated point information and stores the attribute information corresponding to the attribute of the unrecognizable voice information in association with the unrecognizable voice information.
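The attribute-specific registration described in specific example 3 can be illustrated with a minimal Python sketch. All names here (`EstimatedPoint`, `register_unrecognized`, the dictionary layout, the label `waveform_B`, and the sample facility) are hypothetical and do not come from the embodiment; the waveform data is stood in for by an opaque string label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EstimatedPoint:
    """Hypothetical container for estimated point information (S115)."""
    name: str
    genre: str
    address: str


def register_unrecognized(dictionary: dict, waveform: str,
                          point: EstimatedPoint,
                          attribute: Optional[str] = None) -> None:
    """Associate held waveform data with estimated point information (S118).

    If the attribute of the utterance is already known (for example "genre",
    because the terminal had asked for a genre), only that attribute of the
    estimated point is stored. Otherwise the whole record is stored.
    """
    if attribute is None:
        dictionary[waveform] = point
    else:
        dictionary[waveform] = {attribute: getattr(point, attribute)}


recognition_dictionary = {}
store = EstimatedPoint("Store X", "convenience store", "(address omitted)")

# Waveform data B answered a genre prompt, so its attribute is "genre".
register_unrecognized(recognition_dictionary, "waveform_B", store, attribute="genre")
print(recognition_dictionary["waveform_B"])
```

Storing only the matching attribute keeps a genre word from being bound to one particular store, which is the distinction specific example 3 draws from specific examples 1 and 2.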

(4) Specific example 4
Specific example 4 describes dictionary registration processing in a case where the genre of the destination facility has been set in advance.
Suppose that a voice prompt such as “Please say a genre” has been output through the speaker 71 to ask for a genre, and the speaker utters “ramen shop”. The waveform data C of “ramen shop” is acquired as voice information (S103), and recognition processing is performed (S104). If “ramen shop” is identified as a recognition candidate (S105: YES), the recognition candidate is output (S106). In this example, a voice prompt requesting further input, “Which ramen shop?”, is output through the speaker 71 (S106), and the voice recognition processing shown in FIG. 3 ends. At this point, “ramen shop” is set as the genre of the destination facility and stored internally. The processing up to this point corresponds to the series of operations leading to S107 in the voice recognition processing performed next, and “ramen shop” corresponds to the “prior information leading to the utterance of the unrecognizable voice information”.

When the talk switch is turned on in response to the question “Which ramen shop?” (S101), the recognition dictionary corresponding to the facility genre “ramen shop” is set (S102). Suppose the speaker then utters “KR苑”. The utterance “KR苑” is acquired as voice information (S103), and recognition processing is performed (S104). If the waveform data D corresponding to “KR苑” is not stored in the part of the recognition dictionary 41 that corresponds to “ramen shop” and no recognition result is obtained (S105: NO), a voice message such as “Could not be recognized” is output, and the waveform data D of “KR苑” is held as unrecognizable voice information (S107).

After the waveform data D of “KR苑” is held as unrecognizable voice information (S107), if the vehicle travels without a destination being set (S111: NO), the point where the ignition is turned off is specified as the estimated point estimated to correspond to the waveform data D of “KR苑” (S113: YES, S114), and estimated point information regarding the estimated point is acquired (S115).

In this example, the genre of the destination facility has been set to “ramen shop” in advance (S116: YES). It is therefore determined whether the genre of the facility included in the estimated point information matches the set facility genre, “ramen shop” (S117). If the genre of the facility included in the estimated point information is “ramen shop”, it matches the set facility genre (S117: YES), so the waveform data D of “KR苑” held as unrecognizable voice information and the estimated point information are stored in the recognition dictionary 41 in association with each other. Specifically, as shown in FIG. 2, the estimated point information, namely the name “KR苑”, the genre “ramen shop”, the address, and the position coordinates, is stored in the recognition dictionary 41 in association with the waveform data D. In addition, a voice message such as “The keyword that could not be recognized earlier will be registered as a word indicating this point” is output through the speaker 71 (S118).

On the other hand, if the genre of the facility included in the estimated point information is not “ramen shop” but, for example, “convenience store”, it does not match the set facility genre (S117: NO). In this case, the vehicle is considered to have stopped at a convenience store on the way to the ramen shop that is the destination, for example, and the point where the ignition was turned off is unlikely to be the destination. The processing of storing the waveform data D of “KR苑” held as unrecognizable voice information in the recognition dictionary 41 in association with the estimated point information (S118) is therefore not performed. At this time, the waveform data D of “KR苑” may be discarded, or the processing shown in FIG. 4 may be performed again when the ignition is next turned on.
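The decision made in S116 and S117 can be summarized in a short Python sketch. The function name `should_register` and the use of plain strings for genres are assumptions made for illustration only; the embodiment does not specify how genres are represented or compared.

```python
from typing import Optional


def should_register(point_genre: str, preset_genre: Optional[str]) -> bool:
    """Return True when the registration step S118 should run.

    S116: if no facility genre was set beforehand, register unconditionally.
    S117: otherwise register only when the genre of the estimated point
    matches the genre that was set beforehand.
    """
    if preset_genre is None:             # S116: NO
        return True
    return point_genre == preset_genre   # S117


print(should_register("ramen shop", "ramen shop"))         # True
print(should_register("convenience store", "ramen shop"))  # False
print(should_register("private house", None))              # True
```

The second call corresponds to the stopover case above: the ignition was turned off at a convenience store, so the held waveform data is not bound to that point.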

As described in detail above, voice information uttered by the speaker is acquired (S103), recognition processing is performed on the voice information (S104), and it is determined whether a recognition result has been obtained (S105). If it is determined that no recognition result has been obtained (S105: NO), the voice information is held as unrecognizable voice information (S107). Then, based on an operation performed after it is determined that no recognition result has been obtained, an estimated point on the map that is estimated to correspond to the unrecognizable voice information is specified (S114), and estimated point information regarding the estimated point is stored in association with the unrecognizable voice information (S118). Because the unrecognizable voice information for which no recognition result was obtained is appropriately stored in association with the estimated point information for the point it is estimated to correspond to, voice information that could not be recognized can subsequently be recognized appropriately. Furthermore, because the unrecognizable voice information and the estimated point information are stored in the recognition dictionary 41 in association with each other, the stored unrecognizable voice information can be used as recognizable information in subsequent voice recognition processing, which improves user convenience.
In the present embodiment, the estimated point information and the unrecognizable voice information are stored in association with each other without the user performing any registration operation, so the unrecognizable voice information is automatically stored as information that can be recognized by voice.

After it is determined that no recognition result has been obtained (S105: NO), it is determined whether information related to destination setting has been acquired (S111). If a destination is set based on the information related to destination setting (S111: YES, S112), that destination is specified as the estimated point (S114). Because the estimated point is specified based on the information related to destination setting, the estimated point estimated to correspond to the unrecognizable voice information can be specified more appropriately.

If it is determined that no new information related to destination setting has been acquired (S111: NO), the point where the ignition of the vehicle is turned off is specified as the estimated point (S113: YES, S114). This makes it easy to specify the estimated point estimated to correspond to the unrecognizable voice information.
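The choice of the estimated point in S111 to S114 can be sketched in Python as follows. The function name, the `Point` type, and the use of `None` to mean "not yet available" are assumptions for illustration; the embodiment does not prescribe a data representation.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]  # hypothetical (latitude, longitude) pair


def specify_estimated_point(destination: Optional[Point],
                            ignition_off_point: Optional[Point]) -> Optional[Point]:
    """Pick the estimated point for held unrecognizable voice information.

    A destination set after the failed recognition takes priority
    (S111: YES, S112). Otherwise the point where the ignition was turned
    off is used (S113: YES). None means neither is available yet.
    """
    if destination is not None:
        return destination          # S114, from the set destination
    if ignition_off_point is not None:
        return ignition_off_point   # S114, from the ignition-off point
    return None                     # S113: NO, nothing to specify yet


# Placeholder coordinates, chosen only to make the two cases distinguishable.
print(specify_estimated_point((35.0, 137.0), (35.1, 137.1)))  # (35.0, 137.0)
print(specify_estimated_point(None, (35.1, 137.1)))           # (35.1, 137.1)
```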

Furthermore, if the genre of the destination facility has been set in advance (S116: YES), it is determined whether the genre of the facility included in the estimated point information matches the set facility genre (S117). If they match (S117: YES), the estimated point information and the unrecognizable voice information are stored in association with each other. This allows the estimated point information and the unrecognizable voice information to be associated and stored with high accuracy.

In the present embodiment, the control unit 10 constitutes the “voice information acquisition means”, “recognition processing means”, “recognition result determination means”, “voice information holding means”, “point specifying means”, “storage control means”, “new information acquisition determination means”, and “match determination means”. In FIG. 3, S103 corresponds to processing as the function of the “voice information acquisition means”, S104 to processing as the function of the “recognition processing means”, S105 to processing as the function of the “recognition result determination means”, and S107 to processing as the function of the “voice information holding means”. In FIG. 4, S114 corresponds to processing as the function of the “point specifying means” and S118 to processing as the function of the “storage control means”. In addition, S111 corresponds to processing as the function of the “new information acquisition determination means”, and S117 to processing as the function of the “match determination means”.

The present invention is not limited in any way to the above embodiment and can be implemented in various forms without departing from the spirit of the invention.
(A) Point specifying means
In the above embodiment, when it is determined that no new information related to destination setting has been acquired (S111: NO), it is determined whether the ignition has been turned off (S113), and the point where the ignition was turned off is specified as the estimated point. In a modification, the determination of whether new information related to destination setting has been acquired may be omitted, and the point where the ignition is turned off may be specified as the estimated point. This makes it easy to specify the estimated point estimated to correspond to the unrecognizable voice information.

(B) Match determination means
In the above embodiment, it is determined whether the genre of the destination facility included in the estimated point information matches the set facility genre serving as prior information (S117), and if they match (S117: YES), the unrecognizable voice information and the estimated point information are stored in association with each other. The prior information is not limited to the genre of a facility and may be any information acquired in the series of operations leading to S107.

Specifically, the prior information may be information relating to an address. For example, suppose that, of the voice information uttered by the speaker, the part up to “Okazaki City, Aichi Prefecture” (愛知県岡崎市) was recognized but the rest was not. In this case, the waveform data E of the voice signal following “Okazaki City, Aichi Prefecture” is held as unrecognizable voice information, and “Okazaki City, Aichi Prefecture” is treated as the prior information. If the address of the specified estimated point is “Oka-cho, Okazaki City, Aichi Prefecture” (愛知県岡崎市岡町), the attribute information included in the estimated point information (the address in this example) matches the prior information “Okazaki City, Aichi Prefecture”, so the waveform data E held as unrecognizable voice information and “Oka-cho” (岡町), the part of the address that follows “Okazaki City, Aichi Prefecture”, are stored in the recognition dictionary 41 in association with each other. On the other hand, if the address of the specified estimated point is outside Okazaki City, Aichi Prefecture, the attribute information and the prior information do not match, so the processing of storing the unrecognizable voice information in association with the estimated point information is not performed.
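The address variant can be illustrated with a small Python sketch that treats the recognized part of the address as a prefix of the estimated point's address. This is only one possible reading: the function name is hypothetical, and the sketch assumes addresses are held as Japanese strings written from the largest unit to the smallest, so that the prior information appears at the start of the full address.

```python
from typing import Optional


def unrecognized_address_part(prior: str, point_address: str) -> Optional[str]:
    """Return the address remainder to associate with the held waveform.

    If the estimated point's address starts with the recognized prior
    information, the remainder is the part the speaker most likely uttered
    but the terminal could not recognize. None means the addresses do not
    match (or nothing remains), so no registration is performed.
    """
    if not point_address.startswith(prior):
        return None
    return point_address[len(prior):] or None


prior_info = "愛知県岡崎市"  # recognized part of the utterance
print(unrecognized_address_part(prior_info, "愛知県岡崎市岡町"))    # 岡町
print(unrecognized_address_part(prior_info, "愛知県名古屋市緑区"))  # None
```

In the first case the waveform data E would be stored in association with 岡町 (Oka-cho); in the second the held waveform data is left unregistered.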

(C) Recognition dictionary
In the above embodiment, the recognition dictionary stores A/D-converted waveform data as voice information. In a modification, the voice information in the recognition dictionary may be stored as phoneme strings. In that case, the unrecognizable voice information may also be held as a phoneme string instead of waveform data.

(D) Confirmation with the speaker
After the recognition result is output in S106 in FIG. 3, a step of determining whether the recognition result is what the speaker intended may be added. For example, the speaker is prompted to confirm by a voice message such as “Is this correct?” output through the speaker 71, voice information input through the microphone 61 or information input by operating the operation switch group 50 is acquired, and whether the recognition result is what the speaker intended is determined based on the acquired information. If it is determined that the recognition result is not what the speaker intended, the processing may proceed to S107, and the waveform data of the voice information acquired in S103 may be held as unrecognizable voice information.

In addition, if a plurality of recognition candidates are identified in the recognition processing in S104, the speaker may be prompted to select one.
Furthermore, processing that prompts the speaker to decide whether to store the unrecognizable voice information in association with the estimated point information may be added immediately before S118 in FIG. 4, and the processing of S118 may be skipped if information indicating that they should not be stored in association is acquired.

(E) Storage control means
In the above embodiment, the estimated point on the map that is estimated to correspond to the unrecognizable voice information is stored in association with the unrecognizable voice information. In a modification, the unrecognizable voice information may be stored in association with information relating to an operation of the vehicle information terminal. Specifically, for example, if the voice information “turn off the air conditioning” is acquired but cannot be recognized, the waveform data F of that voice data is held as unrecognizable voice information. If an operation to turn off the air conditioner is performed afterwards, the waveform data F of “turn off the air conditioning” and the operation information for turning off the air conditioner are stored in association with each other.
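The operation-based variant can be sketched as a small state holder in Python. The class and method names, and the string labels standing in for waveform data and operations, are hypothetical; the modification only states that the held waveform data is stored in association with the operation performed afterwards.

```python
from typing import Dict, Optional


class OperationLearner:
    """Hypothetical sketch: bind a held utterance to the next operation."""

    def __init__(self) -> None:
        self.dictionary: Dict[str, str] = {}  # waveform label -> operation
        self.pending: Optional[str] = None    # held unrecognizable voice info

    def on_unrecognized(self, waveform: str) -> None:
        """Hold waveform data for which no recognition result was obtained."""
        self.pending = waveform

    def on_operation(self, operation: str) -> None:
        """Associate the held waveform, if any, with the operation performed."""
        if self.pending is not None:
            self.dictionary[self.pending] = operation
            self.pending = None


learner = OperationLearner()
learner.on_unrecognized("waveform_F")       # "turn off the air conditioning"
learner.on_operation("air_conditioner_off")
print(learner.dictionary)
```

This sketch binds the held waveform to whichever operation happens next. A practical design would probably need a time limit or a confirmation step, which the modification does not discuss.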

1: navigation device (vehicle information terminal), 10: control unit (voice information acquisition means, recognition processing means, recognition result determination means, voice information holding means, point specifying means, storage control means, new information acquisition determination means, match determination means), 20: position detector, 21: geomagnetic sensor, 22: gyroscope, 23: distance sensor, 24: GPS receiver, 30: map data storage unit, 40: voice recognition information storage unit, 41: recognition dictionary, 50: operation switch group, 51: talk switch, 60: voice input unit, 61: microphone, 70: voice output unit, 71: speaker, 80: drawing unit, 81: display

Claims

A vehicle information terminal comprising:
voice information acquisition means for acquiring voice information uttered by a speaker;
recognition processing means for performing recognition processing on the voice information acquired by the voice information acquisition means;
recognition result determination means for determining whether a recognition result has been obtained by the recognition processing means;
voice information holding means for holding the voice information as unrecognizable voice information when the recognition result determination means determines that the recognition result cannot be obtained;
point specifying means for specifying an estimated point on a map that is estimated to correspond to the unrecognizable voice information, based on an operation performed after the recognition result determination means determines that the recognition result cannot be obtained; and
storage control means for storing estimated point information regarding the estimated point and the unrecognizable voice information in association with each other.

The vehicle information terminal according to claim 1, further comprising new information acquisition determination means for determining whether information related to destination setting has been acquired after the recognition result determination means determines that the recognition result cannot be obtained,
wherein the point specifying means specifies the estimated point based on the information related to the destination setting when the new information acquisition determination means determines that the information related to the destination setting has been acquired.

The vehicle information terminal according to claim 2, wherein the point specifying means specifies, as the estimated point, a point where the ignition of the vehicle is turned off when the new information acquisition determination means determines that the information related to the destination setting has not been acquired.

The vehicle information terminal according to claim 1, wherein the point specifying means specifies, as the estimated point, a point where the ignition of the vehicle is turned off after the recognition result determination means determines that the recognition result cannot be obtained.

The vehicle information terminal according to any one of claims 1 to 4, further comprising match determination means for determining, when prior information leading to the utterance of the unrecognizable voice information has been acquired, whether attribute information included in the estimated point information matches the prior information,
wherein the storage control means stores the estimated point information and the unrecognizable voice information in association with each other when the match determination means determines that the attribute information and the prior information match.

A program causing a computer to function as:
voice information acquisition means for acquiring voice information uttered by a speaker;
recognition processing means for performing recognition processing on the voice information acquired by the voice information acquisition means;
recognition result determination means for determining whether a recognition result has been obtained by the recognition processing means;
voice information holding means for holding the voice information as unrecognizable voice information when the recognition result determination means determines that the recognition result cannot be obtained;
point specifying means for specifying an estimated point on a map that is estimated to correspond to the unrecognizable voice information, based on an operation performed after the recognition result determination means determines that the recognition result cannot be obtained; and
storage control means for storing estimated point information regarding the estimated point and the unrecognizable voice information in association with each other.