JP2009230068A

JP2009230068A - Voice recognition device and navigation system

Info

Publication number: JP2009230068A
Application number: JP2008078686A
Authority: JP
Inventors: Ryuichi Suzuki; 竜一鈴木
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2008-03-25
Filing date: 2008-03-25
Publication date: 2009-10-08

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition technology that, in speech recognition using a very large number of comparison target patterns, suppresses degradation of recognition performance by reducing the number of comparison target patterns actually used.

SOLUTION: A dictionary section 312 holds a plurality of kinds of dictionaries classified on the basis of a plurality of utterance patterns. Using the degree of similarity between the user's utterance pattern, obtained from a recognition result, and each of the plurality of utterance patterns, a dictionary priority determination section 325 determines the priority of the dictionaries so that a higher degree of similarity gives a higher priority. When a voice recognition section 31 performs recognition using high-priority dictionaries preferentially, the final recognition result is likely to be obtained using, for example, only the dictionary of priority 1. Appropriate voice recognition is therefore likely to be achieved by collating with fewer comparison target patterns than when collating with all dictionaries.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a speech recognition technique that is effective for, for example, voice input of a telephone number or voice input of a destination in a navigation system.

Speech recognition apparatuses that compare input speech with a plurality of comparison target patterns (recognition dictionaries) stored in advance and take a pattern with a high degree of coincidence as the recognition result have already been put to practical use. They are used, for example, to let a user speak a place name or facility name to set a destination in a navigation system, or to speak a telephone number in a hands-free system (see Patent Document 1). Voice input is particularly effective when the driver operates an in-vehicle system, because it requires neither button operation nor looking at a screen and is therefore safe to use even while the vehicle is traveling.

In recent years, speech recognition technologies that enable natural dialogue between humans and machines have become more common. For a speech recognition apparatus to accept natural utterances, however, it must store an enormous number of comparison target patterns. To make natural utterances recognizable, the technique disclosed in Patent Document 1 performs word recognition at predetermined intervals and parses the resulting word candidates as keywords using a syntax analysis means, which enables recognition of natural utterances that contain meaningless words and variations in phrasing. The technique disclosed in Patent Document 2 uses, among other things, the confidence of intermediate results in an attempt to compensate for the low accuracy of spontaneous speech recognition with faster processing.

However, these methods are applied after speech recognition has been performed against an enormous set of comparison target patterns for natural utterances. If the accuracy of the speech recognition result is low, it is difficult to compensate in post-processing, and accurate recognition of natural utterances is unlikely to be fully achieved.
Patent Document 1: JP-A-5-197389
Patent Document 2: JP 2005-283972 A

Thus, many conventional speech recognition techniques for natural utterances first perform recognition against an enormous set of comparison target patterns and then apply further processing to compensate for the low accuracy of spontaneous speech recognition.

In speech recognition, however, recognition performance may degrade as the number of comparison target patterns increases. Consequently, even if some processing is applied to the result of recognition against an enormous set of comparison target patterns, the lower the accuracy of that result, the harder it is to compensate in post-processing, and natural utterances may not be recognized accurately.

An object of the present invention is therefore to solve this problem and provide a speech recognition technology that, in speech recognition using an enormous number of comparison target patterns, reduces the number of patterns actually used and thereby suppresses degradation of recognition performance.

According to the speech recognition apparatus of claim 1, the recognition means compares speech input through the voice input means with a plurality of comparison target patterns stored in advance in the dictionary means and takes a pattern with a high degree of coincidence as the recognition result. The dictionary means has a plurality of types of dictionaries classified on the basis of a plurality of predetermined utterance patterns, and their priorities are set as follows.

First, the dictionary priority determination means uses the degree of similarity between the user's utterance pattern, which is based on the recognition result of the recognition means, and each of the predetermined utterance patterns to determine the priority of the dictionaries so that a higher degree of similarity gives a higher rank. The dictionary control means then sets the priorities of the dictionaries on the basis of that determination.

Even when users say substantially the same thing, their utterance patterns may differ according to individual habit. For example, when setting a restaurant as the destination in a navigation apparatus, some users speak in the order “object word, action word” (in the Japanese word order), as in “I want to go to a restaurant” or “Set a restaurant as the destination”, while others omit the action word and prefix an unnecessary word, speaking in the order “unnecessary word, object word”, as in “Uh, restaurant” or “Um, restaurant”.

Therefore, to decide which of the dictionaries, classified in advance on the basis of the predetermined utterance patterns, is suited to an utterance pattern that reflects the user's habits, the priority of the dictionaries is determined so that a dictionary ranks higher the more similar its utterance pattern is to the user's utterance pattern.
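
The ranking step described above can be sketched as follows. This is a minimal illustration, not the method of the specification: the text does not define how the degree of similarity is computed, so the sketch assumes that an utterance pattern is a sequence of slot labels and uses the matching ratio from Python's `difflib.SequenceMatcher` as a stand-in. The names `DICTIONARY_PATTERNS`, `similarity`, and `rank_dictionaries` are hypothetical.

```python
from difflib import SequenceMatcher

# One predefined utterance pattern per dictionary (labels are illustrative).
DICTIONARY_PATTERNS = {
    "dict1": ("filler", "object", "action"),
    "dict2": ("filler", "object"),
    "dict3": ("object", "action"),
    "dict4": ("object",),
}


def similarity(pattern_a, pattern_b):
    """Assumed similarity measure: matching ratio of two label sequences."""
    return SequenceMatcher(None, pattern_a, pattern_b).ratio()


def rank_dictionaries(user_pattern, patterns=DICTIONARY_PATTERNS):
    """Return dictionary names ordered from most to least similar."""
    return sorted(
        patterns,
        key=lambda name: similarity(user_pattern, patterns[name]),
        reverse=True,
    )


# A user who habitually says "Um, restaurant" (filler + object).
ranking = rank_dictionaries(("filler", "object"))
print(ranking)
```

Under this assumed measure, the dictionary whose pattern equals the user's pattern is ranked first, and the remaining dictionaries follow in order of decreasing overlap.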

If the recognition means then obtains the recognition result by preferentially using the dictionaries with high priority, the number of comparison target patterns actually used can be reduced and degradation of recognition performance suppressed, even in a speech recognition apparatus whose dictionary means as a whole holds an enormous number of comparison target patterns.

Dictionaries are prioritized according to this priority order. The degree of priority may be fixed in advance or, as in claim 2, may be made specifiable by the user. In the latter case, a receiving means capable of receiving instructions from the user is provided, and the dictionary control means sets the degree of priority of the prioritized dictionaries on the basis of an instruction received through the receiving means. The recognition means then determines the degree of coincidence on the basis of the degree of priority set by the dictionary control means.

Even when the priority order of the dictionaries is fixed, it may be better in some cases to make the degree of priority relatively large and in others to make it small. The configuration of claim 2 allows such user intentions to be reflected.

To respond appropriately to utterance patterns that reflect the user's habits, it is also preferable, as in claim 3, to base the priority determination not only on the recognition result for the immediately preceding utterance but on the recognition results for a predetermined number of past utterances. In that case, the utterance history storage means stores the recognition results of the recognition means for the predetermined number of past utterances, and the dictionary priority determination means determines the priority of the dictionaries using the degree of similarity between the user's utterance pattern, based on those stored recognition results, and the predetermined utterance patterns.
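
A history-based determination of this kind might look like the sketch below. It assumes, purely for illustration, that each stored recognition result has already been reduced to the number of the utterance pattern it matched, and that "similarity" is approximated by how often each pattern occurs in the stored history. The class `UtteranceHistory`, its `size` parameter, and the frequency-based ordering are hypothetical and are not taken from the specification.

```python
from collections import Counter, deque


class UtteranceHistory:
    """Keeps the pattern ids of the last `size` recognized utterances."""

    def __init__(self, size=6):
        # deque(maxlen=...) silently discards the oldest entry when full.
        self._patterns = deque(maxlen=size)

    def record(self, pattern_id):
        self._patterns.append(pattern_id)

    def priority_order(self, all_pattern_ids):
        """Order pattern ids by frequency in the history (assumed proxy)."""
        counts = Counter(self._patterns)
        # sorted() is stable, so patterns with equal counts keep their order.
        return sorted(all_pattern_ids, key=lambda p: counts[p], reverse=True)


history = UtteranceHistory(size=6)
for pattern_id in [3, 2, 2, 2, 4, 4, 1]:  # the oldest entry (3) is evicted
    history.record(pattern_id)

order = history.priority_order([1, 2, 3, 4])
print(order)
```

Using a bounded history means that a change in the user's speaking habits eventually changes the priority order, because old utterances drop out of the window.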

In the speech recognition apparatus described above, the apparatus automatically determines and sets the priority of the dictionaries. As in claim 4, however, the priority may instead be set on the basis of a user instruction.

According to the speech recognition apparatus of claim 4, the recognition means compares speech input through the voice input means with a plurality of comparison target patterns stored in advance in the dictionary means and takes a pattern with a high degree of coincidence as the recognition result. The dictionary means has a plurality of types of dictionaries classified on the basis of a plurality of predetermined utterance patterns, and their priorities are set as follows.

Specifically, the priorities of the dictionaries are set on the basis of an instruction received through a receiving means capable of receiving instructions from the user. Setting the priorities from a user instruction in this way allows speech recognition to be performed using the dictionaries, classified in advance on the basis of the predetermined utterance patterns, in a priority order suited to the utterance pattern that reflects the user's habits.

In this case too, the degree of priority may be fixed in advance or, as in claim 5, may be made specifiable by the user. That is, the dictionary control means sets the degree of priority of the prioritized dictionaries on the basis of a user instruction received through the receiving means.

Such a speech recognition apparatus can be applied in various ways. One example, as in claim 6, is a navigation system in which the voice input means is used by the user to input, by voice, an instruction for predetermined place-name-related data that must be specified for the navigation apparatus to perform navigation processing. Examples of such navigation processing include map display and route guidance.

Embodiments to which the present invention is applied are described below with reference to the drawings. The present invention is not limited to the following embodiments and can take various forms as long as they fall within its technical scope.

[Description of configuration]
(Description of the entire navigation system)
FIG. 1 is a block diagram showing the schematic configuration of a navigation system 2 provided with a voice recognition function. The navigation system 2 is a so-called car navigation system mounted in a vehicle. It includes a position detector 4, a data input device 6, an operation switch group 8, a control circuit 10 connected to these, and, connected to the control circuit 10, an external memory 12, a display device 14, a remote control sensor 15, a communication device 16, and a voice recognition device 30. The control circuit 10 is configured as an ordinary computer and contains a well-known CPU, ROM, RAM, and I/O, together with bus lines connecting them.

The position detector 4 has a well-known gyroscope 18, a distance sensor 20, and a GPS receiver 22 for detecting the position of the vehicle from satellite radio waves. Because the sensors 18, 20, and 22 each have errors of a different nature, they are used together so that they complement one another. Depending on the required accuracy, only some of them may be used, and a steering rotation sensor, wheel sensors for the rolling wheels, and the like may additionally be used.

The data input device 6 is a device for inputting the dictionary data used by the voice recognition device 30 for recognition processing, in addition to various navigation data including so-called map matching data for improving position detection accuracy, map data, and landmark data. The storage medium may be a hard disk, a DVD, or another medium such as a CD-ROM. When a DVD is used as the data storage medium, the data input device 6 is a DVD player.

The display device 14 is a color display device. Its screen can display, superimposed on one another, the vehicle current position mark input from the position detector 4, the map data input from the map data input device 6, and additional data such as the guidance route and landmarks for set points shown on the map. It can also display a menu screen presenting a plurality of options and, when one of those options is selected, a command input screen presenting further options.

The communication device 16 communicates with the contact specified by the set contact communication information and is configured by, for example, a mobile communication device such as a mobile phone.
The navigation system 2 also has a so-called route guidance function: when the position of a destination is input from the remote control sensor 15 via a remote control terminal (hereinafter called the remote controller) 15a, or from the operation switch group 8, the system automatically selects the optimum route from the current position to the destination, and forms and displays a guidance route. Methods such as the Dijkstra method are known for automatically setting an optimum route in this way. The operation switch group 8 consists of, for example, touch switches integrated with the display device 14 or mechanical switches, and is used to input various commands.
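
For readers unfamiliar with the Dijkstra method mentioned above, the following is a textbook sketch of it using a priority queue. The specification only names the method; the graph `road_network`, its node names, and its edge costs are invented for illustration and do not come from the source.

```python
import heapq


def dijkstra(graph, start, goal):
    """Return (total_cost, path) for the cheapest route, or (inf, [])."""
    queue = [(0, start, [start])]  # (cost so far, node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical road network: edge weights stand for travel cost.
road_network = {
    "current": {"A": 4, "B": 1},
    "B": {"A": 2, "destination": 7},
    "A": {"destination": 3},
}

cost, route = dijkstra(road_network, "current", "destination")
print(cost, route)
```

In this small example the route through B and then A (cost 1 + 2 + 3) is cheaper than going directly through A (4 + 3) or directly from B to the destination (1 + 7).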

Whereas the operation switch group 8 and the remote controller 15a are used to input various commands by manual operation, the voice recognition device 30 allows the user to input the same commands by voice.

(Description of the voice recognition device 30)
The voice recognition device 30 includes a voice recognition unit 31, a dialogue control unit 32, a voice synthesis unit 33, a voice extraction unit 34, a microphone 35, a switch 36, a speaker 37, and a control unit 38.

The voice recognition unit 31 performs recognition processing on the voice data input from the voice extraction unit 34 in accordance with instructions from the dialogue control unit 32, and returns the recognition result to the dialogue control unit 32. That is, it collates the voice data acquired from the voice extraction unit 34 against the stored dictionary data, compares it with a plurality of comparison target pattern candidates, and outputs the top-ranked comparison target patterns with a high degree of coincidence to the dialogue control unit 32.

To recognize the word sequence in the input speech, the voice data input from the voice extraction unit 34 is acoustically analyzed in sequence using an acoustic model to extract acoustic features (for example, the cepstrum), yielding time-series data of acoustic features. A well-known technique such as an HMM (hidden Markov model), DP matching, or a neural network then divides this time-series data into sections and determines which word stored in the dictionary data each section corresponds to.
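
Of the matching techniques named above, DP matching is the simplest to illustrate. The sketch below is a generic dynamic time warping (DTW) distance, shown only to convey the idea of aligning a feature time series with stored templates. It makes strong simplifying assumptions: each frame is a single number rather than a cepstral vector, whole words are matched rather than sections of a longer utterance, and the `templates` values are invented.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic-programming (DTW) distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best accumulated distance aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch seq_b
                cost[i][j - 1],      # stretch seq_a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


# Hypothetical word templates and an observed, time-stretched input.
templates = {
    "restaurant": [1.0, 3.0, 4.0, 2.0],
    "station": [5.0, 5.0, 1.0, 0.0],
}
observed = [1.0, 1.0, 3.0, 4.0, 4.0, 2.0]

best_word = min(templates, key=lambda w: dtw_distance(observed, templates[w]))
print(best_word)
```

Because the alignment may repeat frames, an input spoken more slowly than the template can still match it with zero distance, which is the property that makes DP matching useful for speech.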

On the basis of the recognition result from the voice recognition unit 31 and instructions from the control unit 38, the dialogue control unit 32 either instructs the voice synthesis unit 33 to output a response voice or notifies the control circuit 10, which executes the processing of the navigation system itself, of, for example, the destination or command needed for navigation processing and instructs it to set the destination or execute the command. As a result, the voice recognition device 30 makes it possible to specify a destination and the like to the navigation system by voice input, without manually operating the operation switch group 8 or the remote controller 15a.

The voice synthesis unit 33 synthesizes speech in response to the response voice output instruction from the dialogue control unit 32, using speech waveforms stored in a waveform database. The synthesized speech is output from the speaker 37.

The voice extraction unit 34 converts the ambient sound captured by the microphone 35 into digital data and outputs it to the voice recognition unit 31. More specifically, to analyze the features of the input sound, it cuts out frame signals of, for example, several tens of milliseconds at regular intervals and determines whether each input signal is a speech section containing speech or a noise section containing none. This determination is needed because the signal input from the microphone 35 contains noise as well as the speech to be recognized. Many determination methods have been proposed. A commonly used one extracts the short-time power of the input signal at fixed intervals and classifies a section as speech or noise according to whether short-time power at or above a predetermined threshold has continued for at least a certain period. When a section is determined to be a speech section, the input signal is output to the voice recognition unit 31.
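
The short-time power method described above can be sketched as follows. The function name `detect_speech_frames` and its parameters (`frame_len`, `power_threshold`, `min_frames`) are hypothetical, and the toy signal stands in for digitized microphone samples; the specification gives no concrete values.

```python
def detect_speech_frames(samples, frame_len, power_threshold, min_frames):
    """Flag each frame as speech (True) or noise (False).

    A frame is speech only if it lies in a run of at least `min_frames`
    consecutive frames whose mean power is >= `power_threshold`.
    """
    n_frames = len(samples) // frame_len
    loud = []
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        power = sum(x * x for x in frame) / frame_len  # short-time power
        loud.append(power >= power_threshold)

    speech = [False] * n_frames
    run_start = None
    for k in range(n_frames + 1):  # one extra step closes a trailing run
        is_loud = k < n_frames and loud[k]
        if is_loud and run_start is None:
            run_start = k
        elif not is_loud and run_start is not None:
            if k - run_start >= min_frames:
                for j in range(run_start, k):
                    speech[j] = True
            run_start = None
    return speech


# Toy signal: a one-frame click followed by a three-frame burst.
signal = [0.0] * 4 + [1.0] * 2 + [0.0] * 2 + [1.0] * 6 + [0.0] * 2
flags = detect_speech_frames(signal, frame_len=2, power_threshold=0.5, min_frames=2)
print(flags)
```

The duration requirement is what separates speech from short noises: the isolated loud frame is rejected, while the sustained burst is accepted.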

In the present embodiment, the user inputs speech through the microphone 35 while pressing the switch 36. Specifically, the control unit 38 monitors when the switch 36 is pressed, when it is released, and how long it remains pressed. When the switch 36 is pressed, the control unit 38 instructs the voice extraction unit 34 and the voice recognition unit 31 to execute processing; when the switch 36 is not pressed, it prevents that processing from being executed. Consequently, the voice data input through the microphone 35 while the switch 36 is pressed is output to the voice recognition unit 31.

With this configuration, the navigation system 2 of the present embodiment can execute various kinds of processing, such as route setting, route guidance, facility search, and facility display, in response to commands input by the user.

(Description of the voice recognition unit 31 and the dialogue control unit 32)
The voice recognition unit 31 and the dialogue control unit 32 are now described further.
As shown in FIG. 2, the voice recognition unit 31 has a collation unit 311, a dictionary unit 312, and an extraction result storage unit 313, and the dialogue control unit 32 has a processing unit 321, an input unit 322, a dictionary control unit 323, an utterance history storage unit 324, and a dictionary priority determination unit 325.

In the voice recognition unit 31, the extraction result storage unit 313 stores the extraction result output from the voice extraction unit 34, and the collation unit 311 collates the stored extraction result against the dictionary data (hereinafter simply called dictionaries) stored in the dictionary unit 312. The top-ranked recognition results that the collation unit 311 finds to have a high degree of coincidence (likelihood) with the dictionaries are output to the processing unit 321 of the dialogue control unit 32, which in turn outputs them to the control circuit 10.

Meanwhile, the control circuit 10 sends the dialogue control unit 32 instructions on dictionary weighting (degree of priority). The control circuit 10 receives a user operation via the operation switch group 8 (see FIG. 1) and outputs an instruction based on that operation to the dialogue control unit 32. The input unit 322 of the dialogue control unit 32 receives the instruction and passes it to the dictionary control unit 323.

The utterance history storage unit 324 outputs the utterance history to the dictionary priority determination unit 325, which, on the basis of that history, outputs a dictionary priority instruction to the dictionary control unit 323.

On the basis of the instructions input from the dictionary priority determination unit 325 and from the input unit 322, the dictionary control unit 323 sets the priorities and weights of the dictionaries in the dictionary unit 312 of the voice recognition unit 31.

(Description of the dictionary unit 312)
The dictionary unit 312 has a first dictionary 312a, a second dictionary 312b, a third dictionary 312c, and a fourth dictionary 312d. These four dictionaries 312a, 312b, 312c, and 312d are classified in advance on the basis of four utterance patterns.

An example is described with reference to FIG. 3. Suppose that a restaurant is to be set as the destination, and that the user may produce the following seven utterances:
“Um, I want to go to a restaurant”
“Uh, set a restaurant as the destination”
“Um, restaurant”
“Uh, restaurant”
“I want to go to a restaurant”
“Set a restaurant as the destination”
“Restaurant”

These utterances can be classified by, for example, the combination of unnecessary word, object word, and action word they contain.
Because an object word is always required to set a destination, the utterances in this example fall into four groups of utterance patterns:
(1) unnecessary word, object word, action word
(2) unnecessary word, object word
(3) object word, action word
(4) object word
Applied to the example utterances above, the grouping is:
(1) “Um, I want to go to a restaurant”; “Uh, set a restaurant as the destination”
(2) “Um, restaurant”; “Uh, restaurant”
(3) “I want to go to a restaurant”; “Set a restaurant as the destination”
(4) “Restaurant”
FIG. 3(a) shows an example of pattern classification when the dictionary has a grammar structure, and FIG. 3(b) shows an example when the dictionary has a linear structure.
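
The four-way grouping above amounts to checking whether an utterance contains an unnecessary word and whether it contains an action word. The sketch below shows that decision, assuming each utterance has already been labeled with the word roles it contains. The function `classify_pattern` and the role labels `"filler"`, `"object"`, and `"action"` are hypothetical names chosen for this illustration.

```python
from collections import Counter


def classify_pattern(roles):
    """Map the word roles in an utterance to utterance pattern (1) to (4)."""
    if "object" not in roles:
        raise ValueError("an object word is required to set a destination")
    has_filler = "filler" in roles  # unnecessary word such as "um" or "uh"
    has_action = "action" in roles
    if has_filler and has_action:
        return 1
    if has_filler:
        return 2
    if has_action:
        return 3
    return 4


# The seven example utterances with assumed role labels.
EXAMPLES = [
    ("Um, I want to go to a restaurant", ["filler", "object", "action"]),
    ("Uh, set a restaurant as the destination", ["filler", "object", "action"]),
    ("Um, restaurant", ["filler", "object"]),
    ("Uh, restaurant", ["filler", "object"]),
    ("I want to go to a restaurant", ["object", "action"]),
    ("Set a restaurant as the destination", ["object", "action"]),
    ("Restaurant", ["object"]),
]

groups = Counter(classify_pattern(roles) for _, roles in EXAMPLES)
print(dict(groups))
```

Counting the results reproduces the grouping in the text: two utterances each in patterns (1) to (3) and one in pattern (4).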

A priority and a weight can be set for each of the four dictionaries 312a, 312b, 312c, and 312d. The dictionary control unit 323 of the dialogue control unit 32 sets these priority and weight values. Examples of priority and weight settings for the dictionaries 312a, 312b, 312c, and 312d are described later.

The schematic configuration of the navigation system 2 has been described above. The correspondence between the configuration of the navigation system 2 in the present embodiment and the configuration recited in the claims is as follows.

In the present embodiment, the microphone 35 corresponds to the voice input means, and the dictionary unit 312 in the voice recognition unit 31 corresponds to the dictionary means. The collation unit 311 corresponds to the recognition means, and the operation switch group 8 and the microphone 35 correspond to the receiving means. The dictionary priority determination unit 325 in the dialogue control unit 32 corresponds to the dictionary priority determination means, and the dictionary control unit 323 corresponds to the dictionary control means. The utterance history storage unit 324 corresponds to the utterance history storage means.

［音声認識処理の説明］
本実施形態のナビゲーションシステム２において実行される音声認識処理について、図４、図５のフローチャートを参照して説明する。これらのフローチャートは、音声認識部３１及び対話制御部３２にて実行される処理を示している。 [Description of voice recognition processing]
The speech recognition process executed in the navigation system 2 of the present embodiment will be described with reference to the flowcharts of FIGS. 4 and 5. These flowcharts show processing executed by the voice recognition unit 31 and the dialogue control unit 32.

最初のステップＳ１０で変数ｉ＝１に設定し、続くＳ２０にて音声が入力されると、Ｓ３０にて、照合部３１１により入力音声と優先順位（ｉ）の辞書との照合を行って認識処理を行う。Ｓ４０では、この認識結果を処理部３２１へ送る。 In the first step S10, the variable i is set to 1. When a voice is input in the subsequent S20, the collation unit 311 collates the input voice with the dictionary of priority (i) in S30 to perform recognition processing. In S40, the recognition result is sent to the processing unit 321.

Ｓ５０では、その認識結果の尤度が閾値以上であるかを判定し、閾値以上であれば（Ｓ５０：ＹＥＳ）、その認識結果で確定する（Ｓ８０）。そして、その認識結果を発話履歴記憶部３２４へ記憶させる（Ｓ１４０）。 In S50, it is determined whether the likelihood of the recognition result is equal to or greater than a threshold value. If the likelihood is equal to or greater than the threshold value (S50: YES), the recognition result is confirmed (S80). Then, the recognition result is stored in the utterance history storage unit 324 (S140).

一方、尤度が閾値以上でなければ（Ｓ５０：ＮＯ）、変数ｉがｎ−１未満か否か判定する。このｎは優先順位の最大値である。ｉ＜ｎ−１の場合は（Ｓ６０：ＹＥＳ）、変数ｉをインクリメント（ｉ＝ｉ＋１）する（Ｓ７０）。そして、Ｓ３０へ戻り、そのインクリメントした優先順位（ｉ）の辞書と入力音声との照合を行って認識処理を行う。 On the other hand, if the likelihood is not greater than or equal to the threshold (S50: NO), it is determined whether or not the variable i is less than n-1. This n is the maximum priority. If i <n−1 (S60: YES), the variable i is incremented (i = i + 1) (S70). Then, the process returns to S30, where the incremented priority (i) dictionary and the input speech are collated to perform recognition processing.

ｉ≧ｎ−１の場合は（Ｓ６０：ＮＯ）、照合部３１１により入力音声と優先順位（ｎ）の辞書との照合を行って認識処理を行う（Ｓ９０）。Ｓ１００では、この認識結果を処理部３２１へ送る。 If i ≧ n−1 (S60: NO), the collation unit 311 collates the input speech with the dictionary of priority (n) to perform recognition processing (S90). In S100, the recognition result is sent to the processing unit 321.

Ｓ１１０では、優先順位（ｎ）の辞書での尤度が優先順位（１）〜（ｎ−１）の辞書での尤度以上か否か判定する。ここで、優先順位（ｎ）の辞書での尤度が優先順位（１）〜（ｎ−１）の辞書での尤度以上であれば（Ｓ１１０：ＹＥＳ）、優先順位（ｎ）の辞書での認識結果で確定する（Ｓ１２０）。一方、優先順位（ｎ）の辞書での尤度が優先順位（１）〜（ｎ−１）の辞書での尤度未満であれば（Ｓ１１０：ＮＯ）、優先順位（１）〜（ｎ−１）の辞書での尤度が最も高い認識結果で確定する（Ｓ１３０）。 In S110, it is determined whether or not the likelihood obtained with the priority (n) dictionary is equal to or higher than the likelihoods obtained with the dictionaries of priorities (1) to (n-1). If it is equal to or higher (S110: YES), the recognition result from the priority (n) dictionary is confirmed (S120). On the other hand, if the likelihood obtained with the priority (n) dictionary is lower than those obtained with the dictionaries of priorities (1) to (n-1) (S110: NO), the recognition result with the highest likelihood among the dictionaries of priorities (1) to (n-1) is confirmed (S130).
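
The flow of S10 to S130 can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function name recognize_by_priority, the recognize callback (assumed to return a pair of recognized text and likelihood for one dictionary), and the threshold argument are all hypothetical names introduced here.

```python
def recognize_by_priority(audio, dictionaries, recognize, threshold):
    """Try dictionaries in priority order (index 0 = priority 1, last = priority n).

    `recognize(audio, dictionary)` is a hypothetical callback that returns
    (text, likelihood) for a single dictionary.
    """
    earlier_results = []
    # S10-S70: dictionaries of priority 1 .. n-1, stop at the first confident result.
    for dictionary in dictionaries[:-1]:
        text, likelihood = recognize(audio, dictionary)  # S30
        if likelihood >= threshold:                      # S50
            return text, likelihood                      # S80
        earlier_results.append((text, likelihood))
    # S90: fall back to the dictionary of priority n.
    last_result = recognize(audio, dictionaries[-1])
    # S110: compare against the best likelihood seen so far.
    if not earlier_results or last_result[1] >= max(r[1] for r in earlier_results):
        return last_result                               # S120
    return max(earlier_results, key=lambda r: r[1])      # S130
```

In this sketch the result from the last dictionary is not compared with the threshold; as in the flowchart description, it is only compared with the likelihoods already obtained.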

Ｓ１２０又はＳ１３０において認識結果が確定された後は、その認識結果を発話履歴記憶部３２４へ記憶させる（Ｓ１４０）。
続くＳ１５０（図５参照）では、辞書優先順位判定部３２５によって、辞書の優先順位の変更があるか否か判定する。この判定は、発話履歴記憶部３２４に記憶された発話履歴をもとにして判定する。 After the recognition result is confirmed in S120 or S130, the recognition result is stored in the utterance history storage unit 324 (S140).
In subsequent S150 (see FIG. 5), the dictionary priority determination unit 325 determines whether there is a change in the dictionary priority. This determination is made based on the utterance history stored in the utterance history storage unit 324.

辞書の優先順位の変更がある場合（Ｓ１５０：ＹＥＳ）、辞書優先順位判定部３２５は辞書制御部３２３に対して優先辞書の変更を指示し、その指示に基づいて辞書制御部３２３が、辞書部３１２に対して辞書の優先順位の設定を行う（Ｓ１６０）。 If there is a change in the dictionary priority (S150: YES), the dictionary priority determination unit 325 instructs the dictionary control unit 323 to change the priority dictionary, and based on that instruction the dictionary control unit 323 sets the dictionary priorities in the dictionary unit 312 (S160).

辞書の優先順位については、例えば図２に示すように、第２辞書を優先順位１とし、第１辞書を優先順位２、第３辞書を優先順位３、第４辞書を優先順位４としているが、優先順位１の辞書は一つで、それ以外の三つの辞書は優先順位２とする、といったように、同じ優先順位の辞書が複数存在してもよい。例えば、第２辞書を優先順位１とし、第１辞書、第３辞書及び第４辞書を共に優先順位２とする、といったことである。もちろん、場合によっては優先順位１の辞書が複数存在してもよい。 As for the dictionary priorities, in the example shown in FIG. 2 the second dictionary has priority 1, the first dictionary has priority 2, the third dictionary has priority 3, and the fourth dictionary has priority 4. However, a plurality of dictionaries may share the same priority, for instance one dictionary with priority 1 and the other three with priority 2. For example, the second dictionary is given priority 1, and the first, third, and fourth dictionaries are all given priority 2. Of course, depending on circumstances, there may be a plurality of priority 1 dictionaries.

また、優先順位の判定に際しては、ユーザの癖を反映した発話パターンに適切に対応できるようにするためには、直前の発話に対応する認識結果だけでなく、過去所定回数の発話に対応する認識結果に基づくようにしてもよい。その場合は、発話履歴記憶部３２４に、過去所定回数（例えば１０回）の発話分記憶しておき、その過去所定回数の発話分に対応する認識結果に基づいて優先順位を判定することが考えられる。 In determining the priority order, so that utterance patterns reflecting the user's habits can be handled appropriately, the determination may be based not only on the recognition result for the immediately preceding utterance but also on the recognition results for a predetermined number of past utterances. In that case, it is conceivable to store a predetermined number of past utterances (for example, 10) in the utterance history storage unit 324 and determine the priority order based on the recognition results corresponding to those utterances.
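
A fixed-length history of this kind might be kept as in the sketch below. The class name UtteranceHistory and its methods are hypothetical and are introduced only for illustration; the source does not specify how the utterance history storage unit 324 holds its data.

```python
from collections import Counter, deque


class UtteranceHistory:
    """Keeps which dictionary (utterance pattern) matched the last N utterances."""

    def __init__(self, max_utterances=10):
        # A deque with maxlen silently drops the oldest entry when full.
        self._patterns = deque(maxlen=max_utterances)

    def record(self, dictionary_id):
        """Store the dictionary id of a confirmed recognition result (S140)."""
        self._patterns.append(dictionary_id)

    def counts(self):
        """Number of occurrences of each dictionary id in the stored history."""
        return Counter(self._patterns)
```

The counts returned here are the kind of input that a priority determination based on the past utterances would need.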

なお、辞書優先順位については、例えば次のようにして決定する。
発話履歴より発話パターンの出現回数を求め、その回数順に優先順位を付ける。例えば過去の発話パターン１０回分が第１辞書３回、第２辞書６回、第３辞書１回、第４辞書０回であったとき、第２辞書を優先順位１、第１辞書を優先順位２、第３辞書を優先順位３、第４辞書を優先辞書４と決定する。 The dictionary priority order is determined as follows, for example.
The number of appearances of each utterance pattern is obtained from the utterance history, and priorities are assigned in descending order of that number. For example, when the past 10 utterance patterns are the first dictionary 3 times, the second dictionary 6 times, the third dictionary 1 time, and the fourth dictionary 0 times, the second dictionary is determined as priority 1, the first dictionary as priority 2, the third dictionary as priority 3, and the fourth dictionary as priority 4.

なお、第１辞書３回、第２辞書３回、第３辞書３回、第４辞書１回であったときは、第１辞書と第２辞書と第３辞書を優先順位１とし、第４辞書を優先順位２と決定する。このように、同じ優先順位に複数の辞書を設定することも可能である。 When the counts are the first dictionary 3 times, the second dictionary 3 times, the third dictionary 3 times, and the fourth dictionary 1 time, the first, second, and third dictionaries are determined as priority 1 and the fourth dictionary as priority 2. In this way, a plurality of dictionaries can be set to the same priority.
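
The count-based determination, including the handling of ties, can be sketched as follows. The function name priorities_by_count is hypothetical; the two examples reproduce the counts given in the text.

```python
def priorities_by_count(counts):
    """Map dictionary id -> priority (1 = highest) from occurrence counts.

    Dictionaries with the same count share the same priority, and the next
    distinct count receives the next priority number.
    """
    distinct_counts = sorted(set(counts.values()), reverse=True)
    rank_of_count = {count: rank for rank, count in enumerate(distinct_counts, start=1)}
    return {dictionary_id: rank_of_count[count] for dictionary_id, count in counts.items()}


# First example in the text: 3, 6, 1, 0 occurrences.
print(priorities_by_count({1: 3, 2: 6, 3: 1, 4: 0}))  # {1: 2, 2: 1, 3: 3, 4: 4}
# Second example in the text: 3, 3, 3, 1 occurrences.
print(priorities_by_count({1: 3, 2: 3, 3: 3, 4: 1}))  # {1: 1, 2: 1, 3: 1, 4: 2}
```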

この辞書優先順位の決定方法としては、その他にもいくつか方法があり、例えば、発話履歴の割合の閾値を設定することが考えられる。例えば０．５以上，０．３以上，０．１以上、０．１未満のように設定し、過去の発話パターン１０回分が第１辞書３回、第２辞書４回、第３辞書２回、第４辞書１回であったとき、第１辞書と第２辞書を優先順位１、第３辞書と第４辞書を優先辞書２と決定するようにしてもよい。 There are several other methods for determining the dictionary priority order. For example, it is conceivable to set thresholds on the proportion of the utterance history. The bands may be set as 0.5 or more, 0.3 or more, 0.1 or more, and less than 0.1; then, when the past 10 utterance patterns are the first dictionary 3 times, the second dictionary 4 times, the third dictionary 2 times, and the fourth dictionary 1 time, the first and second dictionaries may be determined as priority 1 and the third and fourth dictionaries as priority 2.

これらの過去所定回数および発話履歴の割合の閾値については、１０回および０．５、０．３，０．１というように固定的に設定するようにしてもよいし、ユーザが指示可能に構成しても良い。その場合は、例えば操作スイッチ群８を介してユーザからの指示を受け付け、その受け付けた指示に基づき、辞書制御部３２３が過去所定回数および履歴の割合の閾値を設定する。 The predetermined number of past utterances and the thresholds on the proportion of the utterance history may be fixed, for example at 10 utterances and 0.5, 0.3, and 0.1, or may be made settable by the user. In the latter case, for example, an instruction from the user is accepted via the operation switch group 8, and based on the accepted instruction the dictionary control unit 323 sets the predetermined number of past utterances and the thresholds on the history proportion.
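
The threshold-based determination might look like the sketch below. The function name priorities_by_ratio is hypothetical, and one point is an assumption inferred from the single example in the text rather than stated there: bands that contain no dictionary are skipped, so the highest occupied band becomes priority 1.

```python
def priorities_by_ratio(counts, thresholds=(0.5, 0.3, 0.1)):
    """Map dictionary id -> priority using bands on the share of the history.

    Bands are: >= thresholds[0], >= thresholds[1], ..., below the last threshold.
    Assumption: empty bands are skipped when numbering priorities.
    """
    total = sum(counts.values())
    if total == 0:
        # No history yet: treat every dictionary as priority 1 (assumption).
        return {dictionary_id: 1 for dictionary_id in counts}

    def band(count):
        ratio = count / total
        for index, lower_bound in enumerate(thresholds):
            if ratio >= lower_bound:
                return index
        return len(thresholds)

    bands = {dictionary_id: band(count) for dictionary_id, count in counts.items()}
    occupied = sorted(set(bands.values()))
    rank_of_band = {b: rank for rank, b in enumerate(occupied, start=1)}
    return {dictionary_id: rank_of_band[b] for dictionary_id, b in bands.items()}


# Example in the text: 3, 4, 2, 1 occurrences out of 10.
print(priorities_by_ratio({1: 3, 2: 4, 3: 2, 4: 1}))  # {1: 1, 2: 1, 3: 2, 4: 2}
```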

続くＳ１７０では、辞書の重み付けの変更があるか否か判定する。この判定は、制御回路１０から辞書の重み付け（優先度合い）の指示があるか否かで判定する。
辞書の重み付けの変更がある場合（Ｓ１７０：ＹＥＳ）、その指示に基づいて辞書制御部３２３が、辞書部３１２に対して辞書の重み付けの設定を行う（Ｓ１８０）。 In subsequent S170, it is determined whether or not there is a change in the weighting of the dictionary. This determination is made based on whether there is an instruction for weighting (priority) of the dictionary from the control circuit 10.
If there is a change in the dictionary weight (S170: YES), the dictionary control unit 323 sets the dictionary weight to the dictionary unit 312 based on the instruction (S180).

なお、辞書の重み付けに関しては、例えば優先順位１，２，３，４の辞書に対して、それぞれ重み付けを１．０，０．８，０．７，０．６というように固定的に設定するようにしてもよいし、ユーザが指示可能に構成しても良い。その場合は、例えば操作スイッチ群８を介してユーザからの指示を受け付け、その受け付けた指示に基づき、辞書制御部３２３が重み付けを設定する。 Regarding the weighting of the dictionaries, the weights may be fixed, for example at 1.0, 0.8, 0.7, and 0.6 for the dictionaries of priority 1, 2, 3, and 4 respectively, or may be made settable by the user. In the latter case, for example, an instruction from the user is received via the operation switch group 8, and the dictionary control unit 323 sets the weighting based on the received instruction.

重み付けの設定をユーザが指示する場合としては、例えば現在の重み付けでは認識性能が悪いと感じた場合に、辞書の優先度合いを変更するために指示することが考えられる。
また、重み付けの指示の仕方としては、最終的な重み付けの値そのもの（例えば１．０，０．８，０．７，０．６といった値）を指示してもよいし、割合などで指示してもよい。例えば優先順位１の辞書と優先順位２の辞書という２種類の優先順位しかない場合に、割合６：４と指示すれば、優先順位１の辞書の重みを１．０と設定し、優先順位２の辞書の重みを０．６７と設定する。３種類以上の優先順位があっても同様である。 As a case where the user instructs the setting of the weighting, for example, when it is felt that the recognition performance is poor with the current weighting, it may be instructed to change the priority level of the dictionary.
In addition, as a method of instructing the weighting, the final weight values themselves (for example, 1.0, 0.8, 0.7, and 0.6) may be specified, or a ratio or the like may be specified. For example, when there are only two priority levels, a priority 1 dictionary and a priority 2 dictionary, and the ratio 6:4 is specified, the weight of the priority 1 dictionary is set to 1.0 and the weight of the priority 2 dictionary is set to 0.67. The same applies when there are three or more priority levels.
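
The conversion from a ratio to weights in the 6:4 example amounts to dividing each part by the first part. A minimal sketch follows; the function name weights_from_ratio and the rounding to two decimal places are assumptions made here to match the 0.67 given in the text. How the weights are then applied to the degree of matching is not detailed in this passage and is not modeled.

```python
def weights_from_ratio(ratio):
    """Convert a ratio such as (6, 4) into weights with the top priority at 1.0."""
    top = ratio[0]
    # Each weight is that priority's share relative to priority 1.
    return [round(part / top, 2) for part in ratio]


print(weights_from_ratio((6, 4)))  # [1.0, 0.67]
```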

Ｓ１９０にて発話の続きがあるか否か判定し、発話の続きがある場合は（Ｓ１９０：ＹＥＳ）、図４のＳ１０へ移行し、音声入力の待ち状態となる。
発話の続きがない場合は（Ｓ１９０：ＮＯ）、これで一旦音声認識処理が終了となるため、現在の辞書の優先順位及び重み付けを保存する（Ｓ２００）。そして、続くＳ２１０では認識結果の報知を行う。この報知は、処理部３２１が音声合成部３３及びスピーカ３７を介して音声にて報知してもよいし、処理部３２１からの指示に基づいて制御回路１０が表示装置１４に認識結果を表示することによって報知しても良い。 In S190, it is determined whether or not there is a continuation of the utterance. If there is a continuation of the utterance (S190: YES), the process proceeds to S10 in FIG. 4 and waits for voice input.
If there is no continuation of the utterance (S190: NO), the speech recognition process ends for the time being, so the current dictionary priorities and weights are saved (S200). In the subsequent S210, the recognition result is notified. This notification may be given by voice from the processing unit 321 via the voice synthesis unit 33 and the speaker 37, or the control circuit 10 may display the recognition result on the display device 14 based on an instruction from the processing unit 321.

そして、確定指示があるか否か判断する（Ｓ２２０）。この確定指示の有無は、例えばユーザによるマイク３５からの音声入力に基づいて判断する。例えば「はい」とか「確定」といった確定指示であると解釈してもよい内容を示す音声入力があれば確定指示ありと判断でき、また「いいえ」とか「違う」といった確定指示でないと解釈してもよい内容を示す音声入力があれば確定指示なしと判断できる。もちろん、ユーザによる確定指示は音声入力によって行う場合に限定されず、例えばスイッチ操作によって行っても良い。その場合には、操作スイッチ群８を介して認識結果の確定を指示するための操作がなされたか否かによって確定指示の有無を判断する。 Then, it is determined whether there is a confirmation instruction (S220). The presence or absence of this confirmation instruction is determined based on, for example, voice input by the user through the microphone 35. If there is a voice input whose content may be interpreted as a confirmation instruction, such as "Yes" or "Confirm", it can be determined that a confirmation instruction has been given; if there is a voice input whose content may be interpreted as not being a confirmation instruction, such as "No" or "Wrong", it can be determined that no confirmation instruction has been given. Of course, the confirmation instruction by the user is not limited to voice input and may be given, for example, by a switch operation. In that case, the presence or absence of a confirmation instruction is determined based on whether or not an operation for instructing confirmation of the recognition result has been performed via the operation switch group 8.

確定指示なしの場合には（Ｓ２２０：ＮＯ）、Ｓ１０へ戻って再度の音声入力に基づく音声認識処理を実行する。
一方、確定指示ありの場合には（Ｓ２２０：ＹＥＳ）、所定の確定後処理を実行する（Ｓ２３０）。この場合の確定後処理とは、処理部３２１が制御回路１０へ認識結果を出力すると共に、その認識結果が確定したものである旨も通知することである。この確定後処理に応じて、制御回路１０では、例えばナビゲーション機能を利用する場合の目的地設定や施設検索において目的地や施設を特定して入力する場合であれば、確定した目的地や施設に基づいて検索を行うこととなる。 If there is no confirmation instruction (S220: NO), the process returns to S10 and the voice recognition process based on the voice input again is executed.
On the other hand, if there is a confirmation instruction (S220: YES), predetermined post-confirmation processing is executed (S230). In this case, the post-confirmation processing means that the processing unit 321 outputs the recognition result to the control circuit 10 and also notifies it that the recognition result has been confirmed. In response, the control circuit 10 performs a search based on the confirmed destination or facility, for example when a destination or facility is specified and input for destination setting or facility search using the navigation function.

［効果］
例えばレストランを目的地に設定する場合に、図３にも示す下記の４種類の辞書を用いて音声認識処理を行う。
（１）不要語、目的語、動作語の発話パターン
えーと、レストランに行きたい
あのー、レストランを目的地にする
（２）不要語、目的語の発話パターン
えーと、レストラン
あのー、レストラン
（３）目的語、動作語の発話パターン
レストランに行きたい
レストランを目的地にする
（４）目的語の発話パターン
レストラン
ここで、ある発話者が「あのー、レストラン」「えーと、レストラン」というように、不要語、目的語という形で発話することが多いとすると、（２）の辞書、つまり「不要語、目的語」という発話パターンに対応する辞書の優先順位が１と設定される。レストラン以外の施設を目的地に設定する場合でも、同じ発話パターンにて発話する可能性が高いため、このように発話者の発話の癖を学習することで、発話者の頻繁に使用する発話パターンを判定し、それに対応する辞書の優先順位を高くする。 [effect]
For example, when setting a restaurant as a destination, voice recognition processing is performed using the following four types of dictionaries shown in FIG.
(1) Utterance pattern of unnecessary word, object word, action word
Uh, I want to go to a restaurant
Um, make a restaurant the destination
(2) Utterance pattern of unnecessary word, object word
Uh, restaurant
Um, restaurant
(3) Utterance pattern of object word, action word
I want to go to a restaurant
Make a restaurant the destination
(4) Utterance pattern of object word
Restaurant
Here, if a certain speaker often speaks in the form of an unnecessary word followed by an object word, such as "Um, restaurant" or "Uh, restaurant", the priority of dictionary (2), that is, the dictionary corresponding to the utterance pattern "unnecessary word, object word", is set to 1. Even when a facility other than a restaurant is set as the destination, the speaker is likely to use the same utterance pattern. By learning the speaker's utterance habits in this way, the utterance pattern the speaker uses frequently is determined, and the priority of the corresponding dictionary is raised.

そして、その優先順位の高い辞書を優先して用いて音声認識を実行すれば、例えば優先順位１の辞書を用いただけで最終的な認識結果を得られる可能性が高くなり、全ての辞書と照合する場合に比べて、少ない比較対象パターンとの照合を行うだけで、適切な音声認識を実現できる可能性が高くなる。 If speech recognition is performed by preferentially using a dictionary with a high priority, it becomes more likely that the final recognition result can be obtained using, for example, only the priority 1 dictionary. Compared with collating against all dictionaries, it is therefore more likely that appropriate speech recognition can be achieved by collating against only a small number of comparison target patterns.

また、たとえ優先順位１の辞書を用いるだけでは認識結果が確定できなくても、優先順位の高い辞書から順番に使用して認識していくことで、全ての辞書と照合しないでも最終的な認識結果を確定できる可能性がある。 In addition, even if the recognition result cannot be confirmed using the priority 1 dictionary alone, using the dictionaries in descending order of priority may allow the final recognition result to be confirmed without collating against all dictionaries.

その結果、辞書手段全体としてみた場合は膨大な比較対象パターンを持つ音声認識装置であっても、実際に使用する比較対象パターンを減らし、認識性能の低下を抑えることができる。 As a result, when the dictionary means is viewed as a whole, even a speech recognition apparatus having a large number of comparison target patterns can reduce the number of comparison target patterns that are actually used and suppress a decrease in recognition performance.

図３に示すレストランを目的地に設定する場合の辞書を例にとって説明する。（１）〜（４）の辞書全体の比較対象パターン数は７であるが、優先順位１の（２）の辞書だけであれば比較対象パターン数は２である。 The dictionaries shown in FIG. 3 for setting a restaurant as the destination are taken as an example. The dictionaries (1) to (4) together contain 7 comparison target patterns, whereas dictionary (2) alone, which has priority 1, contains only 2.
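
The pattern counts in this example can be checked with a small sketch. Representing each dictionary as a plain list of utterance strings is an assumption made only for illustration; the entries are the restaurant utterances listed above.

```python
# Hypothetical representation: dictionary number -> list of comparison target patterns.
dictionaries = {
    1: ["えーと、レストランに行きたい", "あのー、レストランを目的地にする"],
    2: ["えーと、レストラン", "あのー、レストラン"],
    3: ["レストランに行きたい", "レストランを目的地にする"],
    4: ["レストラン"],
}

# Patterns collated when every dictionary is used.
total_patterns = sum(len(patterns) for patterns in dictionaries.values())
# Patterns collated when only dictionary (2), the priority 1 dictionary, is used.
priority_1_patterns = len(dictionaries[2])

print(total_patterns, priority_1_patterns)  # 7 2
```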

なお、優先順位が低いだけで、他の辞書を用いた音声認識も可能であるため、発話者の発話パターンが変化した場合であっても、対応できる。
［その他］
（１）上記実施形態では辞書部３１２内に４つの辞書３１２ａ，３１２ｂ，３１２ｃ，３１２ｄがある場合について説明したが、２つの辞書しかなく、一方が優先順位１の辞書、他方が優先順位２の辞書である場合には、図４のフローチャートに示す処理に替えて図６のフローチャートに示す処理を実行することとなる。 Note that the other dictionaries merely have a lower priority and can still be used for voice recognition, so the system can cope even if the speaker's utterance pattern changes.
[Others]
(1) In the above embodiment, the case where there are four dictionaries 312a, 312b, 312c, and 312d in the dictionary unit 312 has been described. When there are only two dictionaries, one with priority 1 and the other with priority 2, the process shown in the flowchart of FIG. 6 is executed instead of the process shown in the flowchart of FIG. 4.

図６に示す処理について説明する。
最初のステップＳ５１０にて音声が入力されると、Ｓ５２０にて、照合部３１１により入力音声と優先順位１の辞書との照合を行って認識処理を行う。Ｓ５３０では、この認識結果を処理部３２１へ送る。 The process shown in FIG. 6 will be described.
When a voice is input in the first step S510, in S520 the collation unit 311 collates the input voice with the priority 1 dictionary to perform recognition processing. In S530, the recognition result is sent to the processing unit 321.

Ｓ５４０では、その認識結果の尤度が閾値以上であるかを判定し、閾値以上であれば（Ｓ５４０：ＹＥＳ）、その認識結果で確定する（Ｓ５５０）。そして、その認識結果を発話履歴記憶部３２４へ記憶させる（Ｓ６１０）。 In S540, it is determined whether the likelihood of the recognition result is equal to or greater than a threshold value. If the likelihood is equal to or greater than the threshold value (S540: YES), the recognition result is confirmed (S550). Then, the recognition result is stored in the utterance history storage unit 324 (S610).

一方、尤度が閾値以上でなければ（Ｓ５４０：ＮＯ）、入力音声と優先順位２の辞書との照合を行って認識処理を行う（Ｓ５６０）。Ｓ５７０では、この認識結果を処理部３２１へ送る。 On the other hand, if the likelihood is not equal to or greater than the threshold value (S540: NO), the input speech is compared with the dictionary of priority 2 to perform recognition processing (S560). In S570, the recognition result is sent to the processing unit 321.

Ｓ５８０では、優先順位２の辞書での尤度が優先順位１の辞書での尤度以上か否か判定する。ここで、優先順位２の辞書での尤度が優先順位１の辞書での尤度以上であれば（Ｓ５８０：ＹＥＳ）、優先順位２の辞書での認識結果で確定する（Ｓ５９０）。一方、優先順位２の辞書での尤度が優先順位１の辞書での尤度未満であれば（Ｓ５８０：ＮＯ）、優先順位１の辞書での尤度が最も高い認識結果で確定する（Ｓ６００）。 In S580, it is determined whether the likelihood in the priority 2 dictionary is greater than or equal to the likelihood in the priority 1 dictionary. If the likelihood in the priority 2 dictionary is equal to or higher than the likelihood in the priority 1 dictionary (S580: YES), the recognition result in the priority 2 dictionary is determined (S590). On the other hand, if the likelihood in the priority 2 dictionary is less than the likelihood in the priority 1 dictionary (S580: NO), the recognition result with the highest likelihood in the priority 1 dictionary is determined (S600). ).

Ｓ５９０又はＳ６００において認識結果が確定された後は、その認識結果を発話履歴記憶部３２４へ記憶させる（Ｓ６１０）。この後は、図５のＳ１５０へ移行する。
この場合も、上記実施形態の場合と同様、辞書手段全体としてみた場合は膨大な比較対象パターンを持つ音声認識装置であっても、実際に使用する比較対象パターンを減らし、認識性能の低下を抑えることができる。 After the recognition result is confirmed in S590 or S600, the recognition result is stored in the utterance history storage unit 324 (S610). Thereafter, the process proceeds to S150 of FIG.
In this case as well, as in the above embodiment, even in a speech recognition device whose dictionary means as a whole holds an enormous number of comparison target patterns, the comparison target patterns actually used can be reduced and deterioration of recognition performance can be suppressed.

（２）上記実施形態では、目的地設定の場合を例示したが、例えばナビゲーション装置における実行コマンド（例：地図を拡大する）で考えれば、例えば「えーと、地図を大きくして」、「あのー、５０ｍスケールにする」、「詳細を実行」という発話パターンが考えられる。そして、それらは、下記のように分類できる。 (2) The above embodiment illustrated the case of destination setting. Taking an execution command in the navigation device (e.g., enlarging the map) as an example, utterance patterns such as "Uh, make the map bigger", "Um, set it to the 50 m scale", and "Execute details" are conceivable. These can be classified as follows.

不要語：えーと、あのー
目的語：地図を大きく、５０ｍスケール、詳細
動作語：して、にする、を実行
したがって、これらに基づいて発話パターン毎の辞書を設定し、それぞれについて優先順位や重み付けを設定すればよい。
Unnecessary words: "Uh", "Um"
Object words: "the map bigger", "50 m scale", "details"
Action words: "make", "set to", "execute"
Therefore, a dictionary may be set for each utterance pattern based on these, and a priority and a weight may be set for each dictionary.

これら発話パターンについては種々の例が考えられるが、他の一例を挙げておく。
施設名検索に際して、「不要語、都道府県名、ジャンル名、名称、動作語」というパターンが考えられる。また、曲名検索に際して、「不要語、ジャンル名、トラックＮｏ、歌手名、曲名、動作語」というパターンが考えられる。 Various examples of these utterance patterns can be considered, but another example will be given.
When searching for facility names, a pattern of “unnecessary words, prefecture names, genre names, names, action words” can be considered. Further, when searching for a song name, a pattern of “unnecessary word, genre name, track number, singer name, song name, action word” can be considered.

また、発話パターンは、その他に「動作語、目的語」という「倒置」や、「目的語、不要語、動作語」「目的語、動作語、不要語」といった「不要語の位置の変更」といったバリエーションも考えられる。 Other variations of utterance patterns are also conceivable, such as "inversion" ("action word, object word") and "changing the position of the unnecessary word" ("object word, unnecessary word, action word" or "object word, action word, unnecessary word").

（３）上記実施形態では、辞書部３１２内の４つの辞書３１２ａ，３１２ｂ，３１２ｃ，３１２ｄの優先順位を、音声認識装置３０側が自動的に判断して設定するようにしたが、優先順位自体をユーザの指示に基づいて設定するようにしてもよい。 (3) In the above embodiment, the priorities of the four dictionaries 312a, 312b, 312c, and 312d in the dictionary unit 312 are automatically determined and set by the voice recognition device 30. However, the priorities themselves may be set based on a user's instruction.

その場合は、例えば操作スイッチ群８を介してユーザからの指示を受け付け、その受け付けた指示に基づき、辞書制御部３２３が優先順位を設定する。辞書毎の優先順位を設定する場合には、例えば表示装置１４へ発話パターンを例示し、その発話パターン毎にユーザが希望の順位を設定していくような手法が考えられる。 In that case, for example, an instruction from the user is accepted via the operation switch group 8, and the dictionary control unit 323 sets the priority order based on the accepted instruction. In the case of setting the priority order for each dictionary, for example, an utterance pattern is exemplified on the display device 14, and the user can set a desired order for each utterance pattern.

（４）複数のユーザが利用する場合には、ユーザ毎の識別情報（ユーザＩＤ）に対応させて辞書の優先順位及び重み付けを記憶させておき、ナビゲーションシステム２の使用開始時（あるいは音声認識装置３０の使用開始時）にユーザＩＤを入力することで、ユーザ毎の設定情報を用いた音声認識を実行することができるようにしておれば、対応可能である。 (4) Use by a plurality of users can be supported by storing the dictionary priorities and weights in association with identification information for each user (user ID) and having the user ID input at the start of use of the navigation system 2 (or at the start of use of the voice recognition device 30), so that voice recognition is executed using the settings for that user.
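
Per-user storage of this kind might be sketched as a simple mapping keyed by user ID. The class name UserSettingsStore, its methods, and the fallback to default settings for an unknown user ID are all assumptions made for illustration; the source does not describe the storage format.

```python
import copy


class UserSettingsStore:
    """Stores dictionary priorities and weights separately for each user ID."""

    def __init__(self, default_priorities, default_weights):
        self._defaults = {"priorities": default_priorities, "weights": default_weights}
        self._by_user = {}

    def save(self, user_id, priorities, weights):
        """Keep a copy of the current settings for this user (compare S200)."""
        self._by_user[user_id] = {
            "priorities": dict(priorities),
            "weights": dict(weights),
        }

    def load(self, user_id):
        """Return the settings for this user, or the defaults if none are stored."""
        return copy.deepcopy(self._by_user.get(user_id, self._defaults))


store = UserSettingsStore({1: 1, 2: 2, 3: 3, 4: 4}, {1: 1.0, 2: 0.8, 3: 0.7, 4: 0.6})
store.save("user-a", {1: 2, 2: 1, 3: 3, 4: 4}, {1: 1.0, 2: 0.8, 3: 0.7, 4: 0.6})
print(store.load("user-a")["priorities"])  # {1: 2, 2: 1, 3: 3, 4: 4}
```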

音声認識機能を持たせたナビゲーションシステム２の概略構成を示すブロック図である。 A block diagram showing the schematic configuration of the navigation system 2 provided with the speech recognition function.
音声認識装置３０における音声認識部３１と対話制御部３２の構成を示すブロック図である。 A block diagram showing the configuration of the voice recognition unit 31 and the dialogue control unit 32 in the voice recognition device 30.
辞書部３１２の辞書データの一例を示す説明図である。 An explanatory diagram showing an example of the dictionary data of the dictionary unit 312.
音声認識処理の前半を示すフローチャートである。 A flowchart showing the first half of the speech recognition process.
音声認識処理の後半を示すフローチャートである。 A flowchart showing the second half of the speech recognition process.
音声認識処理の別例を示すフローチャートである。 A flowchart showing another example of the speech recognition process.

Explanation of symbols

２…ナビゲーションシステム、４…位置検出器、６…データ入力器、６…地図データ入力器、８…操作スイッチ群、１０…制御回路、１２…外部メモリ、１４…表示装置、１５…リモコンセンサ、１５ａ…リモコン、１６…通信装置、１８…ジャイロスコープ、２２…ＧＰＳ受信機、３０…音声認識装置、３１…音声認識部、３２…対話制御部、３３…音声合成部、３４…音声抽出部、３５…マイク、３６…スイッチ、３７…スピーカ、３８…制御部、３１１…照合部、３１２…辞書部、３１３…抽出結果記憶部、３２１…処理部、３２２…入力部、３２３…辞書制御部、３２４…発話履歴記憶部、３２５…辞書優先順位判定部。 DESCRIPTION OF SYMBOLS 2 ... Navigation system, 4 ... Position detector, 6 ... Data input device, 6 ... Map data input device, 8 ... Operation switch group, 10 ... Control circuit, 12 ... External memory, 14 ... Display apparatus, 15 ... Remote control sensor, 15a ... remote control, 16 ... communication device, 18 ... gyroscope, 22 ... GPS receiver, 30 ... speech recognition device, 31 ... speech recognition unit, 32 ... dialogue control unit, 33 ... speech synthesis unit, 34 ... speech extraction unit, 35 ... microphone, 36 ... switch, 37 ... speaker, 38 ... control unit, 311 ... collation unit, 312 ... dictionary unit, 313 ... extraction result storage unit, 321 ... processing unit, 322 ... input unit, 323 ... dictionary control unit, 324 ... utterance history storage unit, 325 ... dictionary priority order determination unit.

Claims

Voice input means that can input voice in a row,
and recognition means for comparing speech input through the voice input means with a plurality of comparison target patterns stored in advance in dictionary means and taking a pattern with a high degree of matching as a recognition result, the speech recognition apparatus comprising these means, wherein
The dictionary means has a plurality of types of dictionaries classified based on a plurality of predetermined utterance patterns,
Using the degree of similarity between the user's utterance pattern based on the recognition result by the recognition means and the predetermined plurality of utterance patterns, the priority order of the plurality of types of dictionaries is determined so that the higher the degree of similarity, the higher the ranking. Dictionary priority order judging means,
Dictionary control means for setting priorities of the plurality of types of dictionaries based on the determination result by the dictionary priority order determination means;
With
The speech recognition apparatus characterized in that the recognition means preferentially uses a dictionary with a high priority set by the dictionary control means to obtain a recognition result.

The speech recognition apparatus according to claim 1,
It has a reception means that can receive instructions from users,
The dictionary control means sets a priority level of the dictionary in which the priority is set based on an instruction received through the receiving means,
The speech recognition apparatus characterized in that the recognition means determines the degree of matching based on the priority degree set by the dictionary control means.

The speech recognition apparatus according to claim 1 or 2,
Utterance history storage means for storing the recognition result by the recognition means for a predetermined number of past utterances,
The dictionary priority determination means uses the degree of similarity between the user's utterance pattern and the predetermined plurality of utterance patterns based on a recognition result corresponding to the past predetermined number of utterances stored in the utterance history storage means, A speech recognition apparatus characterized by determining a priority of a dictionary.

Voice input means that can input voice in a row,
A speech recognition apparatus comprising: recognition means for comparing speech inputted through the speech input means with a plurality of comparison target patterns stored in advance in the dictionary means and having a high degree of matching as a recognition result. There,
The dictionary means has a plurality of types of dictionaries classified based on a plurality of predetermined utterance patterns,
An accepting means capable of accepting an instruction from the user;
Dictionary control means for setting priorities of the plurality of types of dictionaries based on instructions received via the receiving means;
With
The speech recognition apparatus characterized in that the recognition means preferentially uses a dictionary with a high priority set by the dictionary control means to obtain a recognition result.

The speech recognition apparatus according to claim 4,
The dictionary control means sets a priority level of the dictionary in which the priority is set based on an instruction received through the receiving means,
The speech recognition apparatus, wherein the recognition unit determines the degree of matching based on a weight set by the dictionary control unit.

A navigation system comprising: the voice recognition device according to any one of claims 1 to 5; and a navigation device that executes predetermined processing based on a result recognized by the voice recognition device,
wherein the voice input means is used by a user to input by voice at least an instruction of predetermined place-name-related data that needs to be specified when the navigation device performs navigation processing.