JP2006208905A

JP2006208905A - Voice dialog device and voice dialog method

Info

Publication number: JP2006208905A
Application number: JP2005022704A
Authority: JP
Inventors: Keiko Katsuragawa; 景子桂川
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2005-01-31
Filing date: 2005-01-31
Publication date: 2006-08-10
Anticipated expiration: 2025-01-31
Also published as: JP4661239B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice dialogue apparatus and a voice dialogue method that understand a user's input speech with high efficiency and carry out a spoken dialogue. SOLUTION: When a user's utterance is input to a microphone 130 serving as voice input means, a language understanding unit 114 classifies all candidate words output by a speech recognition unit 112 into categories, calculates for each category a category score indicating how likely it is that a candidate word belonging to that category was uttered, and selects as candidate categories all or some of the categories whose category scores are equal to or higher than thresholds predetermined for those categories. The language understanding unit 114 then searches the candidate words for an understanding result candidate based on the candidate words belonging to the candidate categories and the word reliability of those candidate words. When no understanding result candidate is found, it selects a reduced set of candidate categories as the new candidate categories and searches for an understanding result candidate again. COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a voice dialogue apparatus and a voice dialogue method.

A conventional voice dialogue apparatus, as described for example in Patent Document 1 below, has voice input means and speech recognition means that recognizes the speech input through the input means and outputs a recognition result containing a plurality of candidates. The apparatus also uses word reliability calculation means to calculate a word reliability, which is the likelihood that a word recognized by the speech recognition means was actually uttered. To derive an understanding result from the recognized words and their reliabilities, the words are grouped into categories arranged in a semantic hierarchy, the sum of the word reliabilities in each category is taken as that category's category score, and the categories likely to have been uttered are identified by category score. Finally, within each category judged likely to have been uttered, the word most likely to have actually been uttered is identified, and an understanding result is generated.

Patent Document 1: JP 2004-251998 A

In the prior art, the category decision is made only once, from the sum of word reliabilities alone, without considering which words will finally be adopted as the understanding result or how reliable those words are. As a result, an optimal understanding result sometimes cannot be generated: no suitable word may be available for a selected category, or a word with low reliability may be selected even though a word with higher reliability exists.

The present invention has been made in view of the above problem, and its object is to provide a voice dialogue apparatus and a voice dialogue method that understand speech input by a user with high efficiency and carry out a spoken dialogue.

The invention provides a voice dialogue apparatus in which voice input means converts input speech into a speech signal, speech recognition means converts the speech signal into candidate words, word reliability calculation means calculates a word reliability indicating how likely it is that each candidate word was uttered, and a language understanding unit understands the spoken language input to the voice input means from the candidate words and their word reliabilities. When a user's utterance is input to the voice input means, the language understanding unit performs two processes. The first is a category selection process: it classifies all candidate words output by the speech recognition means into categories, calculates for each category, using the word reliabilities of the candidate words belonging to it, a category score indicating how likely it is that a candidate word of that category was uttered, and selects as candidate categories all or some of the categories whose category score is equal to or higher than a threshold predetermined for that category. The second is an understanding result candidate search process: it searches the candidate words for an understanding result candidate based on the candidate words belonging to the candidate categories and their word reliabilities. When the search finds no understanding result candidate, the language understanding unit performs the category selection process again, selecting as the new set of candidate categories a set obtained by removing one or more categories from the current set, and then repeats the understanding result candidate search process.

Implementing the present invention makes it possible to provide a voice dialogue apparatus and a voice dialogue method that understand speech input by a user with high efficiency and carry out a spoken dialogue.

FIG. 1 is a block diagram showing the configuration of a navigation device that is an embodiment of the voice dialogue apparatus according to the present invention. The navigation device 100 is mounted on a vehicle and comprises: a switch 120 with which the user instructs the start of speech recognition; a microphone 130 (labelled "mic" in the figure), which is voice input means that converts input sound, including the user's utterances, into a speech signal and outputs it; a memory 140; a disk 151 storing map data and audio data for guidance voice; a disk reader 150 that reads the disk 151; a monitor 160 that displays maps, menu screens, and the speech recognition results produced by a control device 110; a speaker 170 that outputs sound; and the control device 110, which, as described later, recognizes speech data input through the microphone 130.

The memory 140 stores a speech recognition dictionary/grammar 141 and an understanding result 142 of the utterances made so far. The speech recognition dictionary/grammar 141 is used during speech recognition and accepts the words and sentences used to operate the navigation device 100, namely operation commands, proper nouns such as place names, facility names, and road names, and sentences containing these words. The understanding result 142 is used, when the next utterance is input in the course of the dialogue, to interpret the current utterance together with the understanding results of past utterances. Utterance understanding is described in detail later.

Next, the speech recognition dictionary/grammar 141 used for speech recognition is described. In this embodiment, the main task is destination setting in a car navigation system. The speech recognition dictionary/grammar 141 is therefore configured to accept both single-word inputs that relate to a facility, such as "Kanagawa Prefecture" or "Yokohama Station", and sentence inputs that contain several keywords, such as "Yokohama Station in Kanagawa Prefecture" or "Yokohama Station on the Tokaido Line".

The control device 110 comprises an input control unit 111; a speech recognition unit 112, which is speech recognition means that converts the speech signal output by the microphone 130 (the voice input means) into candidate words and outputs them; a word reliability calculation unit 113, which is word reliability calculation means that calculates a word reliability indicating how likely it is that each candidate word was uttered; a language understanding unit 114 that understands the spoken language input to the voice input means from the candidate words and their word reliabilities; a response generation unit 115; a GUI (guidance) display control unit 116; and a speech synthesis unit 117. On receiving the speech recognition start signal from the switch 120, the input control unit 111 instructs the speech recognition unit 112 to start speech recognition.

Next, the operation of the control device 110 is described in detail. When the navigation device 100 is started, the control device 110 uses the disk reader 150 to load the speech recognition dictionary/grammar 141 from the disk 151 into the memory 140. When the switch 120 is pressed in this state, the input control unit 111 instructs the speech recognition unit 112 to start speech recognition, and the speech recognition unit 112 begins capturing the sound input from the microphone 130.

FIG. 2 shows the processing flow of this embodiment from the start of speech recognition to the output of a response sentence. With speech recognition enabled after the user has pressed the speech recognition start switch 120 (S0), the user speaks a sentence into the microphone 130 to operate the navigation device 100 (S1). The speech recognition unit 112 matches the speech data input through the microphone 130 against the standby sentences stored in the speech recognition dictionary/grammar 141. During matching, an acoustic likelihood, which is the acoustic closeness between the input speech data and each standby sentence, is calculated, and the sentences whose acoustic likelihood is equal to or higher than a fixed value become the recognition result candidates (S2).
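The candidate filtering in step S2 can be pictured with a minimal sketch. The function name, the sample sentences, the log-likelihood values, and the cut-off are all hypothetical; the text only states that sentences at or above a fixed acoustic likelihood become candidates.

```python
def recognition_candidates(likelihoods, min_likelihood):
    """Keep standby sentences whose acoustic likelihood reaches the cut-off."""
    kept = [(s, ll) for s, ll in likelihoods.items() if ll >= min_likelihood]
    # Highest likelihood first, as in an N-best list.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical log-likelihoods for three standby sentences.
sample = {
    "Yokohama Station in Kanagawa Prefecture": -10.0,
    "Yashima Station in Kagawa Prefecture": -12.5,
    "Tokushima Line": -30.0,
}
candidates = recognition_candidates(sample, min_likelihood=-15.0)
```

With these assumed values, the third sentence falls below the cut-off and is dropped before word reliabilities are computed.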

Next, the word reliability calculation unit 113 calculates, from the recognition result candidates and the likelihood of each candidate, a word reliability for every word contained in the recognition result candidates (these words are called candidate words) (S3). The word reliability expresses how likely it is that the word was uttered in the immediately preceding utterance. The reliability Conf(w) of a word w is obtained by the following equation.

Here, the N-best candidates are the recognition result candidates for a single recognition result, arranged as word strings in descending order of likelihood from rank 1 to rank N; L_i is the log-likelihood of the i-th N-best candidate, and α is a weighting coefficient. P_i represents the probability that the word w is contained in the i-th of the N-best candidates. The reliability calculation is described in detail in Patent Document 1.
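The equation itself is not reproduced in this text. Purely as an illustration of how L_i, α, and P_i could fit together, the sketch below assumes a common N-best posterior form, Conf(w) = Σ_i P_i·exp(α·L_i) / Σ_i exp(α·L_i), with P_i taken as 1 when w occurs in candidate i and 0 otherwise. This form, the function name, and the sample values are assumptions and may differ from the formula of Patent Document 1.

```python
import math

def word_reliability(word, nbest, alpha=1.0):
    """nbest: list of (words, log_likelihood) pairs, one per N-best candidate."""
    # Subtract the max log-likelihood for numerical stability; the ratio is unchanged.
    max_ll = max(ll for _, ll in nbest)
    weights = [math.exp(alpha * (ll - max_ll)) for _, ll in nbest]
    # P_i is assumed to be 1 when the word occurs in candidate i, else 0.
    hit = sum(w for (words, _), w in zip(nbest, weights) if word in words)
    return hit / sum(weights)

# Hypothetical two-entry N-best list.
nbest = [
    (["Kanagawa Prefecture", "Yokohama Station"], -10.0),
    (["Kagawa Prefecture", "Yashima Station"], -12.0),
]
conf = word_reliability("Kanagawa Prefecture", nbest)
```

Under this assumed form, the reliabilities of words that never co-occur in a candidate sum to 1, and a larger α concentrates the weight on the top-ranked candidate.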

Processing then moves to the language understanding unit 114. The language understanding unit 114 first corrects the word reliability of every word that may have been uttered in the dialogue so far (S4). The corrected word reliability is the value obtained in S3 raised or lowered according to whether the word has a semantic hierarchical (parent/child) relationship with, and is consistent with, the other words among the recognition result candidates. For example, if "Tokyo Station" is among the recognition result candidates of the first utterance and "Tokyo" is among those of the second utterance, a hierarchical relationship holds between "Tokyo" and "Tokyo Station", so their word reliabilities reinforce each other. If, in the same example, "Kyoto Station" is among the recognition result candidates of the first utterance, no hierarchical relationship holds between "Tokyo" and "Kyoto Station", so their word reliabilities weaken each other. This correction of the word reliability may be omitted.
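A minimal sketch of the correction in S4 follows. The text does not say by how much reliabilities are raised or lowered, so the multiplicative `boost` and `damp` factors, the lookup tables, and the sample reliabilities are all assumptions made for illustration.

```python
# Hypothetical parent/child table: facility -> prefecture that contains it.
PARENT = {"Tokyo Station": "Tokyo", "Kyoto Station": "Kyoto Prefecture"}
PREFECTURES = {"Tokyo", "Kyoto Prefecture"}

def is_hierarchically_related(a, b):
    return PARENT.get(a) == b or PARENT.get(b) == a

def correct_reliabilities(reliab, boost=1.2, damp=0.8):
    """Return a corrected copy; boost and damp are assumed factors."""
    corrected = dict(reliab)
    words = list(reliab)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            # Only prefecture/facility pairs are compared in this sketch.
            if (a in PREFECTURES) == (b in PREFECTURES):
                continue
            factor = boost if is_hierarchically_related(a, b) else damp
            corrected[a] = min(1.0, corrected[a] * factor)
            corrected[b] = min(1.0, corrected[b] * factor)
    return corrected

result = correct_reliabilities(
    {"Tokyo": 0.5, "Tokyo Station": 0.4, "Kyoto Station": 0.3}
)
```

Here "Tokyo Station" is reinforced by "Tokyo", while "Kyoto Station" is weakened because it has no hierarchical relationship with "Tokyo".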

After the word reliability correction (S4), all candidate words are classified into categories and a category score is calculated for each category (S5). A category is a semantic grouping of words, such as the "prefecture category", "municipality category", "route name category", and "facility name category". For example, prefecture names such as "Tokyo" and "Kanagawa Prefecture" belong to the prefecture category, and destination names that are the final target of destination setting, such as "Yokohama Station", "Yokohama Aoba Interchange", and "Chiba Country Club", belong to the facility name category. The category score is the sum of the word reliabilities of the candidate words classified into the same category, and it indicates how likely it is that a candidate word belonging to that category was uttered. Because the scores of the words recognized as candidates in the current utterance are added to the scores of the words uttered earlier, which are stored in the understanding result 142, the dialogue is understood from the past and current utterances together.
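The category score calculation in S5 can be sketched as a per-category sum over both the stored and the current word reliabilities. The data layout (plain dictionaries), the function name, and the sample values are assumptions; only the summation rule comes from the text.

```python
from collections import defaultdict

def category_scores(current, past, word_category):
    """Sum word reliabilities per category over past and current utterances."""
    scores = defaultdict(float)
    for source in (past, current):
        for word, reliab in source.items():
            scores[word_category[word]] += reliab
    return dict(scores)

# Hypothetical category table and reliabilities.
word_category = {
    "Tokyo": "prefecture",
    "Kanagawa Prefecture": "prefecture",
    "Tokyo Station": "facility name",
}
past = {"Tokyo Station": 0.5}                          # from an earlier utterance
current = {"Tokyo": 0.6, "Kanagawa Prefecture": 0.3}   # from the current utterance
scores = category_scores(current, past, word_category)
```

In this sketch the facility name category keeps a non-zero score even though no facility was recognized in the current utterance, which is how earlier turns continue to influence understanding.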

Next, the categories to be selected (called candidate categories) are chosen on the basis of the category scores obtained in S5 (S6, category selection process). In this process, all or some of the categories whose category score is equal to or higher than the threshold predetermined for that category are selected as candidate categories. The category selection process yields one or more sets of candidate categories. It is described in detail later.

Once the candidate categories are determined, a word or combination of words to be adopted as an understanding result candidate is searched for in these candidate categories (S7, understanding result candidate search process). One word is taken from each category, and the combinations that are semantically consistent become understanding result candidates. "Tokyo" and "Tokyo Station" form a consistent combination (Tokyo Station is in Tokyo), whereas "Tokyo" and "Yokohama Station" do not (Yokohama Station is not in Tokyo).

When there is only one candidate category, every candidate word in that category is semantically consistent on its own, so the candidate word with the highest word reliability is taken as the understanding result candidate.
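The search in S7 can be sketched as enumerating one word per candidate category and keeping the consistent combinations. The `LOCATED_IN` table, the function names, and the restriction of the consistency check to prefecture/facility pairs are assumptions made for this illustration.

```python
from itertools import product

# Hypothetical knowledge: which prefecture each facility is located in.
LOCATED_IN = {"Tokyo Station": "Tokyo", "Yokohama Station": "Kanagawa Prefecture"}

def is_consistent(combo):
    """combo maps category -> word; only prefecture/facility pairs are checked."""
    pref, fac = combo.get("prefecture"), combo.get("facility name")
    return pref is None or fac is None or LOCATED_IN.get(fac) == pref

def search_candidates(candidate_categories, words_by_category):
    """Pick one word per candidate category and keep consistent combinations."""
    cats = list(candidate_categories)
    combos = (dict(zip(cats, pick))
              for pick in product(*(words_by_category[c] for c in cats)))
    return [c for c in combos if is_consistent(c)]

words_by_category = {
    "prefecture": ["Tokyo"],
    "facility name": ["Tokyo Station", "Yokohama Station"],
}
found = search_candidates(["prefecture", "facility name"], words_by_category)
single = search_candidates(["facility name"], words_by_category)
```

With two categories only the "Tokyo" plus "Tokyo Station" pair survives; with a single category every word is kept, matching the single-category case described above.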

If no understanding result candidate is found at this point (the No branch of S8), the steps from category selection (S6) onward are performed again. Increasing the number of candidate categories might in some cases yield an understanding result candidate, but such a candidate is unlikely to have been uttered. The present invention therefore reduces the number of candidate categories until an understanding result candidate is determined. That is, it performs a category selection process that selects, as the new set of candidate categories, a set obtained by removing one or more categories from the current set. Because there are generally two or more ways of reducing the candidate categories, several sets may be selected as new sets of candidate categories. In that case, the understanding result candidate search process (S7) is performed for each new set.
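The several possible reduced sets can be enumerated as in the sketch below. The function name and the `remove` parameter are hypothetical; the text only requires that one or more categories be removed from the current set.

```python
from itertools import combinations

def reduced_category_sets(candidate_categories, remove=1):
    """All sets obtained by removing `remove` categories from the current set."""
    cats = sorted(candidate_categories)
    keep = len(cats) - remove
    if keep < 1:
        return []  # nothing left to search
    return [frozenset(c) for c in combinations(cats, keep)]

new_sets = reduced_category_sets({"prefecture", "route name", "facility name"})
```

For three candidate categories this yields three two-category sets, each of which would then go through the understanding result candidate search.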

This series of processes always finds an understanding result candidate (this is clear from the case where only one candidate category remains). From the understanding result candidates, the final one is then selected, for example the one with the largest total word reliability: the word reliability of the word if the candidate is a single word, or the sum of the word reliabilities of its words if it is a combination (S9).

Next, it is checked whether the word or word combination that is the final understanding result candidate is appropriate as an understanding result (S10); if it is not, processing restarts from category selection (S6).

Once an appropriate word or word combination has been determined as the understanding result, it is passed to the response generation unit 115, which generates a response sentence from the understanding result produced by the language understanding unit 114 (S11). If information needed to set the destination is missing, the response sentence prompts the user to supply it. If the score of a word selected for the understanding result is low and confirmation is judged necessary, the response sentence asks the user to confirm what was understood. If the destination has been fixed, the response sentence announces that a map to the destination will be retrieved and displayed.
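The three response cases in S11 can be sketched as a simple decision function. The required category, the confirmation threshold of 0.5, and the wording of the response sentences are all assumptions; the text only describes the three kinds of response.

```python
def choose_response(result, reliab, required=("facility name",), confirm_below=0.5):
    """result maps category -> chosen word; reliab maps word -> word reliability."""
    # Case 1: information needed for destination setting is missing.
    missing = [c for c in required if c not in result]
    if missing:
        return f"Please tell me the {missing[0]}."
    # Case 2: a chosen word has a low score and needs confirmation.
    low = [w for w in result.values() if reliab[w] < confirm_below]
    if low:
        return f"Did you say {low[0]}?"
    # Case 3: the destination is fixed.
    return f"Showing the map to {result['facility name']}."

r1 = choose_response({"prefecture": "Kanagawa Prefecture"}, {"Kanagawa Prefecture": 0.8})
r2 = choose_response({"facility name": "Yokohama Station"}, {"Yokohama Station": 0.3})
r3 = choose_response({"facility name": "Yokohama Station"}, {"Yokohama Station": 0.9})
```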

This response sentence is output as speech by the speech synthesis unit 117 (S12).

At the same time, the GUI display control unit 116 displays the content of the response on the monitor 160 and, if a map needs to be displayed, reads the map data from the disk 151 using the disk reader 150 and displays the map on the monitor 160, which completes the series of input processes (S13).

With the above configuration, the present invention can always find an understanding result candidate (as is clear from the case where there is only one candidate category). Implementing the invention therefore makes it possible to provide a voice dialogue apparatus and a voice dialogue method that understand speech input by a user with high efficiency and carry out a spoken dialogue.

Category selection is described in detail next. First, the category selection and understanding result selection method of the conventional example is explained using the sample data shown in FIG. 3 and the flowchart shown in FIG. 4.

FIG. 3(a) shows the recognition result candidates output by the speech recognition unit 112 and the likelihood of each candidate. From the start of voice input (S0) to the word reliability correction (S4), operation is the same as in the embodiment of the present invention described above. That is, with speech recognition enabled after the user has pressed the speech recognition start switch 120 (S0), the user speaks a sentence into the microphone 130 to operate the navigation device (S1), and the speech recognition unit 112 recognizes the input speech and outputs recognition result candidates and their likelihoods (S2). The word reliability calculation unit 113 then calculates word reliabilities from the recognition result candidates and their likelihoods (S3). FIG. 3(b) shows the result: every word contained in the recognition result candidates 3b is treated as a recognition result candidate word 3d, and a word reliability 3e is calculated for it.

This word reliability calculation result (FIG. 3(b)) is then passed to the language understanding unit 114. The language understanding unit 114 first corrects the word reliabilities as described above (S4), which raises or lowers the values 3e. Here it is assumed that the correction leaves the word reliability values unchanged. After the correction, the language understanding unit 114 calculates category scores from the recognition result candidate words 3d (S5).

A category is a set of words that are treated alike in the semantic classification, such as the prefecture category, route name category, or facility name category. In the example of FIG. 3, as shown in 3f, "Kanagawa Prefecture" and "Kagawa Prefecture" belong to the prefecture category, "Tokushima Line" to the route name category, and "Yokohama Station" and "Yashima Station" to the facility name category. The recognition result candidate words 3d are classified by category in this way, and the sum of the word reliabilities 3e of the words classified into the same category is the category score.

FIG. 3(c) shows the category scores calculated from FIG. 3(b). For example, the category score 3h of the prefecture category, which consists of "Kanagawa Prefecture" (word reliability 0.80) and "Kagawa Prefecture" (word reliability 0.20), is 1.00.

Next, the categories to be adopted for the understanding result are selected from these category scores (S6). A category threshold 3i, learned from data in advance, is set for each category, and a category whose score exceeds its threshold is selected for the understanding result. In this example, the category scores are compared with the category thresholds as follows, and the categories whose score is larger than the threshold are selected.

Prefecture category: category score 1.0 > category threshold 0.5, so the prefecture category is selected.
Route name category: category score 0.2 < category threshold 0.4, so the route name category is not selected.
Facility name category: category score 0.5 > category threshold 0.4, so the facility name category is selected.
The categories to be adopted for the understanding result are therefore the prefecture category and the facility name category.
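The comparison above can be reproduced in a few lines. The scores and thresholds are the values quoted in the text for FIG. 3; the dictionary layout and variable names are assumptions of this sketch.

```python
# Category scores and thresholds quoted in the text for FIG. 3.
scores = {"prefecture": 1.0, "route name": 0.2, "facility name": 0.5}
thresholds = {"prefecture": 0.5, "route name": 0.4, "facility name": 0.4}

# The conventional example keeps a category only when its score exceeds the threshold.
selected = [cat for cat, s in scores.items() if s > thresholds[cat]]
print(selected)  # ['prefecture', 'facility name']
```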

Next, the language understanding unit 114 searches the recognition result candidate words 3d for combinations of words that fit the selected categories (S7). These combinations must be semantically consistent. For a combination drawn from the prefecture category and the facility name category, the word taken from the facility name category must be the name of a facility located in the prefecture taken from the prefecture category. FIG. 3(d) shows the understanding result candidates found under this condition. The score 3l of an understanding result candidate is the sum of the word reliabilities 3e of the words it contains.

Finally, the understanding result candidate with the highest score 3l is selected as the understanding result with the optimal combination (S8), a response sentence is generated from it (S9) and output (S10), and the language understanding process ends.

This is the processing of the language understanding unit in the conventional example. With this method, however, no understanding result can be derived if no combination of words fitting the selected categories is found in S7 of FIG. 4. FIG. 5 shows an example of a speech recognition result for which the conventional example finds no suitable word combination and therefore obtains no understanding result.

FIG. 5(a) shows the recognition result candidates output by the speech recognition unit 112 and their likelihoods. FIG. 5(b) shows the word reliability of each word in the recognition result, calculated from it by the word reliability calculation described above. FIG. 5(c) shows the category scores obtained by classifying these words by category. The categories whose category score is equal to or higher than the category threshold are the prefecture category and the route name category, so these two are selected as candidate categories. However, the only word in the prefecture category is "Kanagawa Prefecture" and the only word in the route name category is "Iiyama Line", and the Iiyama Line does not run in Kanagawa Prefecture. No consistent word combination therefore exists for the prefecture category plus the route name category, and the conventional method could not obtain an understanding result.

Therefore, in the present invention, category selection makes use of the word reliability of the words selected as understanding result candidates and of the relationships between those words. In addition, when no understanding result can be obtained, the number of candidate categories is reduced and the search for an understanding result is performed again.

(Embodiment 1)
In the embodiment shown in FIG. 2, after category selection, the combination of words to adopt is examined, and if no suitable combination is found, category selection is performed again (S8 → S6). In the present embodiment, when no suitable word combination fits the N selected candidate categories (that is, no combination is semantically consistent), fewer than N candidate categories are selected. One way to select N−1 candidate categories from the N candidate categories is to remove the category with the lowest category score and take the remaining categories as the result of candidate category selection.

Specifically, for example, the understanding result candidate search process works as follows. If there is one candidate category, the word with the highest word reliability among the words belonging to that category is taken as the understanding result candidate. If there are two or more candidate categories, one word is taken from each candidate category to form word combinations, and the combination that is semantically consistent and has the highest total word reliability is taken as the understanding result candidate. If no understanding result candidate is found, the category with the lowest category score is removed from the candidate categories, the remaining categories are selected as the new candidate categories, and the understanding result candidate search process is performed again.

In the example of FIG. 5, the N selected candidate categories are the prefecture category and the route name category, and of these the route name category has the lowest category score. The route name category is therefore removed, leaving the prefecture category as the only candidate category. Since “Kanagawa Prefecture” is the only word that fits the prefecture category, the understanding result candidate is “Kanagawa Prefecture”.
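The retry loop of Embodiment 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the function names, the `consistent` callback, the route name category score, and the reliability of “Iiyama Line” (0.45) are hypothetical, since FIG. 5 is not reproduced here. Only the word “Kanagawa Prefecture”, its score of 0.55, and its inconsistency with “Iiyama Line” are taken from the text.

```python
from itertools import product

def best_candidate(categories, words, consistent):
    """Best (score, words) taking one word per category, or None if none fits."""
    best = None
    for combo in product(*(words[c].items() for c in categories)):
        names = tuple(word for word, _ in combo)
        # A single word is always acceptable; longer tuples must be consistent.
        if len(names) > 1 and not consistent(names):
            continue
        score = sum(reliability for _, reliability in combo)
        if best is None or score > best[0]:
            best = (score, names)
    return best

def understand_drop_lowest(category_scores, words, consistent):
    """Drop the lowest-scoring candidate category until a candidate is found."""
    candidates = sorted(category_scores, key=category_scores.get, reverse=True)
    while candidates:
        found = best_candidate(candidates, words, consistent)
        if found is not None:
            return candidates, found
        candidates = candidates[:-1]  # remove the category with the lowest score
    return [], None

# Illustrative data; the category scores and 0.45 are assumed values.
category_scores = {"prefecture": 0.55, "route_name": 0.45}
words = {
    "prefecture": {"Kanagawa Prefecture": 0.55},
    "route_name": {"Iiyama Line": 0.45},
}
valid_pairs = set()  # the Iiyama Line does not exist in Kanagawa Prefecture
consistent = lambda names: names in valid_pairs

categories, result = understand_drop_lowest(category_scores, words, consistent)
print(categories, result)
```

With this data the two-category search finds nothing, the route name category is dropped, and the search over the prefecture category alone returns “Kanagawa Prefecture”.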

As described above, when the number of categories is reduced in order to find a suitable category combination for the understanding result, categories are removed starting from the one with the lowest category score. Only categories with higher scores remain in the understanding result, so the accuracy of the understanding result can be improved.

(Embodiment 2)
When no suitable word combination fits the N selected categories, another way to select fewer than N categories is as follows. For each of the N combinations of N−1 categories (sets of candidate categories) obtained by removing one category from the N selected categories, the words suitable as an understanding result candidate are examined and the score of the understanding result is obtained. The combination whose understanding result has the highest score is then selected.

Specifically, for example, the understanding result candidate search process works as follows. If there is one candidate category, the word with the highest word reliability among the words belonging to that category is taken as the understanding result candidate. If there are two or more candidate categories, one word is taken from each candidate category to form word combinations, and the combination that is semantically consistent and has the highest total word reliability is taken as the understanding result candidate. If no understanding result candidate is found, a plurality of candidate category sets, each formed by excluding one of the candidate categories, are selected as the new candidate category sets, and the understanding result candidate search process is performed again.

In the example of FIG. 5, one of the prefecture category and the facility name category is removed. That is, the words that fit as understanding result candidates, and their scores (in this case, the word reliability), are obtained for the prefecture category alone and for the facility name category alone. As shown in FIG. 5(d), the understanding result “Kanagawa Prefecture” has an understanding result score of 0.55, whereas the understanding result “Kuwanagawa Station” has an understanding result score of 0.45. Because “Kanagawa Prefecture” has the higher score, the understanding result category is the prefecture category and the understanding result is “Kanagawa Prefecture”.
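The selection in Embodiment 2 can be sketched as follows: every set of categories that is one category smaller is searched, and the highest-scoring hit is kept. This is a hedged illustration; the function names and the `consistent` callback are hypothetical, and the loop that keeps shrinking the subset size when a whole level yields nothing is an assumed generalization. The words and scores (0.55 and 0.45) are those quoted above for FIG. 5(d).

```python
from itertools import combinations, product

def best_candidate(categories, words, consistent):
    """Best (score, words) taking one word per category, or None if none fits."""
    best = None
    for combo in product(*(words[c].items() for c in categories)):
        names = tuple(word for word, _ in combo)
        if len(names) > 1 and not consistent(names):
            continue
        score = sum(reliability for _, reliability in combo)
        if best is None or score > best[0]:
            best = (score, names)
    return best

def understand_best_subset(categories, words, consistent):
    """Search all subsets of each size, largest first; keep the best-scoring hit."""
    for size in range(len(categories), 0, -1):
        hits = []
        for subset in combinations(categories, size):
            found = best_candidate(subset, words, consistent)
            if found is not None:
                hits.append((found[0], found[1], subset))
        if hits:
            return max(hits, key=lambda hit: hit[0])
    return None

words = {
    "prefecture": {"Kanagawa Prefecture": 0.55},
    "facility_name": {"Kuwanagawa Station": 0.45},
}
# Assumption for the example: no prefecture/facility pair is consistent.
consistent = lambda names: False

result = understand_best_subset(["prefecture", "facility_name"], words, consistent)
print(result)
```

Compared with removing only the lowest-scoring category, this variant costs more searches per retry but cannot miss a higher-scoring candidate that happens to sit in a lower-scoring category.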

As described above, when the number of categories is reduced in order to find a suitable category combination for the understanding result, every category combination with one fewer category is considered, and the combination whose understanding result has the highest score is adopted. The accuracy of the understanding result can therefore be improved.

(Embodiment 3)
Next, a case is described in which a word combination that fits the selected categories is found, but it is not the best result.

Specifically, for example, when an understanding result candidate consists of a plurality of words and the total word reliability of those words (the score described below) does not exceed a predetermined threshold, the candidate is not adopted as the understanding result.

FIG. 6 shows an example of this. When the speech recognition unit 112 outputs the recognition result shown in FIG. 6(a), the word reliability calculation unit 113 calculates the word reliability of every word contained in the recognition result. The word reliabilities are as shown in FIG. 6(b). Calculating the category scores from this result gives 0.60 for the prefecture category and 0.45 for the facility name category, as shown in FIG. 6(c). Since the respective category thresholds are 0.50 and 0.40, the prefecture category and the facility name category are both judged to be categories that should be adopted as candidate categories.
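The category scoring and threshold test described here can be sketched as follows. The function names are hypothetical. The per-word reliabilities are reconstructed from the values quoted in this embodiment (0.55, 0.05, 0.40, 0.05), which sum to the stated category scores of 0.60 and 0.45. FIG. 6 itself is not reproduced, so the assumption that each category contains exactly these words is ours.

```python
def category_scores(word_reliability, word_category):
    """Sum the word reliabilities of all words that fall in the same category."""
    scores = {}
    for word, reliability in word_reliability.items():
        category = word_category[word]
        scores[category] = scores.get(category, 0.0) + reliability
    return scores

def select_candidates(scores, thresholds):
    """Keep the categories whose score is at or above their own threshold."""
    return [c for c, score in scores.items() if score >= thresholds[c]]

word_reliability = {
    "Kanagawa Prefecture": 0.55,
    "Nagano Prefecture": 0.05,
    "Kakegawa Station": 0.40,
    "Kuwanagawa Station": 0.05,
}
word_category = {
    "Kanagawa Prefecture": "prefecture",
    "Nagano Prefecture": "prefecture",
    "Kakegawa Station": "facility_name",
    "Kuwanagawa Station": "facility_name",
}
thresholds = {"prefecture": 0.50, "facility_name": 0.40}

scores = category_scores(word_reliability, word_category)
selected = select_candidates(scores, thresholds)
print({c: round(s, 2) for c, s in scores.items()}, selected)
```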

Next, the recognition result candidate words 6d are searched for consistent words that fit the combination of the prefecture category and the facility name category, and the combination “Nagano Prefecture” and “Kuwanagawa Station” is obtained. However, each word in this combination has a word score (word reliability) of only 0.05, which is low compared with words that have high word reliability such as “Kanagawa Prefecture”, and the score of the understanding result (the total word reliability) is also low at 0.10. This problem arises because the category is given priority over the scores of the words selected for the understanding result and over the score of the understanding result itself. Such a method can still output some understanding result, but its accuracy is low.

Therefore, in the present embodiment, a threshold is set for the score (for example, the total word reliability) of the understanding result candidate to be selected as the understanding result. That is, if the score of the words selected as an understanding result candidate is at or below a predetermined threshold (for example, 0.2), the candidate is not adopted as the understanding result. In other words, if an understanding result candidate consists of a plurality of words and the total word reliability of those words is at or below the predetermined threshold, the candidate is not adopted. The number of categories adopted as candidate categories is then reduced by one in order to look for other candidates. In the present embodiment, omitting the facility name category and searching in the prefecture category alone gives “Kanagawa Prefecture” with an understanding result score of 0.55, and omitting the prefecture category and searching in the facility name category alone gives “Kakegawa Station” with an understanding result score of 0.40. In both cases the score of the adopted word exceeds the threshold of 0.2, so the two are compared and “Kanagawa Prefecture”, which has the higher score, becomes the understanding result.
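The threshold check can be sketched as follows. This is a minimal illustration: `pick_result` and the `stages` structure (candidate lists ordered from the largest number of categories to the smallest) are hypothetical names. The words, the scores, and the threshold of 0.2 are the ones given in the text.

```python
def pick_result(stages, threshold=0.2):
    """Return the best candidate from the first stage with a score above threshold.

    `stages` lists candidates as (words, score) pairs, starting with the largest
    number of categories. A candidate at or below the threshold is rejected.
    """
    for candidates in stages:
        accepted = [(words, score) for words, score in candidates if score > threshold]
        if accepted:
            return max(accepted, key=lambda item: item[1])
    return None

stages = [
    # prefecture + facility name: total word reliability 0.05 + 0.05 = 0.10
    [(("Nagano Prefecture", "Kuwanagawa Station"), 0.10)],
    # one category fewer: prefecture only, facility name only
    [(("Kanagawa Prefecture",), 0.55), (("Kakegawa Station",), 0.40)],
]

result = pick_result(stages)
print(result)
```

The two-word candidate is rejected because 0.10 does not exceed 0.2, so the single-category candidates are compared and “Kanagawa Prefecture” is returned.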

It is also possible to set thresholds for the score of the understanding result candidate according to the number of categories, and to adopt as the understanding result only candidates that exceed the relevant threshold. If, after candidates have been accepted or rejected by threshold, no word with a score above the threshold and no understanding result with a score above the threshold is found, the single category with the highest score and the word that fits it are taken as the understanding result, and a response sentence requesting additional information for confirmation is generated.

Even when the understanding result candidate consists of a single word, the candidate may be rejected as the understanding result if the word reliability of that word is at or below a predetermined threshold.

As described above, only words whose word reliability exceeds the threshold are adopted as the understanding result, which prevents understanding errors caused by adopting words with extremely low word reliability.

In addition, a candidate whose understanding result score is at or below the threshold is not adopted even if its category score is high, so a result with a high understanding result score can be selected even if it covers fewer categories.

(Embodiment 4)
Another way to improve the accuracy of the understanding result is to take into account the relationship between the words selected for it. Specifically, when the understanding result candidate search process yields a plurality of understanding result candidates, all consisting of the same number of words, the score of each candidate is the total word reliability of its words multiplied by a number that grows larger the more likely the candidate's combination of categories is to be spoken in a single utterance. The candidate with the highest score is taken as the understanding result.

FIG. 7 shows an example of such a method. When the speech recognition unit 112 outputs the recognition result shown in FIG. 7(a), the word reliability calculation unit 113 calculates the word reliability of every word contained in the recognition result. The word reliabilities are as shown in FIG. 7(b). Calculating the category scores from this result gives 0.60 for the prefecture category, 0.40 for the route name category, and 0.40 for the facility name category, as shown in FIG. 7(c). Since the respective category thresholds are 0.50, 0.40, and 0.40, the prefecture category, the route name category, and the facility name category are all judged to be categories that should be adopted as candidate categories.

Next, a consistent set of words is sought that fits the combination of the prefecture category, the route name category, and the facility name category, but no consistent word combination exists for this combination.

The number of categories selected for the understanding result is therefore reduced by one, and consistent combinations are sought for each of prefecture category + route name category, prefecture category + facility name category, and route name category + facility name category. This yields “Tokyo + Toyoko Line” for prefecture category + route name category and “Tokyo + Shinagawa Station” for prefecture category + facility name category. The scores of these understanding results (total word reliability) are then compared, but both are the same at 0.90.

Here, FIG. 8 is used. It is a table showing examples of a number (weight) that grows larger the more likely two categories are to occur together in one utterance. FIG. 8 expresses the relationship between the category shown under category 1 (8a) and the category shown under category 2 (8b) by means of a weight (8c). The relationship between categories is the probability that the two categories occur in one utterance, and it is expressed by the weight. The weight is generated by learning from data such as corpus data and the user's utterance history. It corresponds to the number described above that grows larger the more likely the candidate's combination of categories is to be spoken in a single utterance. In this example, the weight of the prefecture category + route name category combination is 0.8, whereas the weight of the prefecture category + facility name category combination is 1.0, so prefecture category + route name category is the weaker relationship. Using these values, the score of 0.90 for “Tokyo + Toyoko Line” is multiplied by the prefecture category + route name category weight of 0.8 to give 0.72, and the score of 0.90 for “Tokyo + Shinagawa Station” is multiplied by the prefecture category + facility name category weight of 1.0 to give 0.90.
Comparing these values, “Tokyo + Shinagawa Station”, which is prefecture category + facility name category, has the higher score and is selected as the understanding result.
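The weighting step can be reproduced with a short sketch. The table `PAIR_WEIGHT` stands in for FIG. 8 and contains only the two weights quoted in the text; the data layout and the function name are hypothetical.

```python
# Category co-occurrence weights (only the two values quoted in the text).
PAIR_WEIGHT = {
    frozenset({"prefecture", "route_name"}): 0.8,
    frozenset({"prefecture", "facility_name"}): 1.0,
}

def weighted_score(candidate):
    """Total word reliability multiplied by the weight of the category pair."""
    _words, categories, reliability_sum = candidate
    return reliability_sum * PAIR_WEIGHT[frozenset(categories)]

candidates = [
    (("Tokyo", "Toyoko Line"), ("prefecture", "route_name"), 0.90),
    (("Tokyo", "Shinagawa Station"), ("prefecture", "facility_name"), 0.90),
]

best = max(candidates, key=weighted_score)
for candidate in candidates:
    print(candidate[0], round(weighted_score(candidate), 2))
print("selected:", best[0])
```

Using a `frozenset` as the key makes the lookup independent of the order in which the two categories are listed.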

As described above, the score of the understanding result is determined with the combination of words adopted for it taken into account, and the final understanding result is selected on the basis of that score. Understanding results made up of word combinations that are likely to be spoken together are therefore more readily adopted.

(Embodiment 5)
In a case such as the example of FIG. 7, it is also possible to use information about which recognition result contained each word. Specifically, suppose the understanding result candidate search process yields a plurality of understanding result candidates, all consisting of the same number of words. For each candidate, if it contains a combination of words that were recognized within the same candidate of the same recognition result, the word reliability of those words is multiplied by a number greater than 1. The total of the word reliabilities after multiplication is the score of the candidate, and the candidate with the highest score is adopted as the understanding result. In other words, by raising the score of combinations of words that were contained in the same recognition result, the word co-occurrence probability calculated by the recognition unit 112 can be exploited.

The word reliability calculation adopted here uses the formula of [Equation 1]. In that formula, P_i, which appears as an intermediate result, is the reliability of each candidate sentence of the recognition result. The present embodiment also uses this reliability of the sentence as a whole. For the recognition result of FIG. 7, the sentence reliabilities obtained in the course of the word reliability calculation are as shown at 9d in FIG. 9(a).

Next, to obtain the word reliability, the reliabilities of the sentences that contained each word are added together ([Equation 1]). At this point, it is checked which recognition results contained each word. The result is shown in FIG. 9(b), where the second recognition candidate of the first utterance is written as “1-2”. For example, if FIG. 9(a) is the recognition result for the user's first utterance, the word “Tokyo” appears in the second, third, and fourth candidates of that recognition result. Normally, the sum of these sentence reliabilities is taken as the word reliability, after which the understanding result is obtained and its score calculated. In the present embodiment, the score of the understanding result is calculated directly from the sentence reliabilities. The word reliability is weighted according to the combination of words adopted for the understanding result and the sentence reliabilities. The optimum value of this weighting variable is obtained in advance by learning from data. The present embodiment uses 1.2. This weighting variable corresponds to the number greater than 1 described above. The normal understanding result score for the understanding result “Tokyo + Shinagawa Station” is obtained as follows:
Word reliability of Tokyo + word reliability of Shinagawa Station
= (0.30 + 0.20 + 0.10) + 0.30 = 0.90
(Here, (0.30 + 0.20 + 0.10) is the word reliability of Tokyo and 0.30 is the word reliability of Shinagawa Station.)
In the present embodiment, by contrast, the score is obtained as shown below.

Word reliability of Tokyo + word reliability of Shinagawa Station
= (0.30 × 1.2 + 0.20 + 0.10) + 0.30 × 1.2
= 1.02
(Here, the two 0.30 × 1.2 terms are both word reliability values for words recognized in the second recognition result of the first utterance.)
These are weighted because, in the combination adopted as the understanding result, they are words recognized in the same recognition result. That is, “Tokyo” and “Shinagawa Station”, the words recognized within the same candidate of the same recognition result, are each multiplied (as a weight) by a number greater than 1, although their resulting values differ. The understanding result “Tokyo + Toyoko Line”, on the other hand, has no words recognized in the same recognition result, so its understanding result score remains 0.90 as in the earlier example. The score of “Tokyo + Shinagawa Station”, whose words were recognized in the same recognition result, is therefore higher than that of “Tokyo + Toyoko Line”, whose words were not, and “Tokyo + Shinagawa Station” is adopted as the understanding result.
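The arithmetic above can be checked with a short sketch. The function name and the data layout are hypothetical. The sentence reliabilities of candidates 1-2, 1-3, and 1-4 and the factor 1.2 come from the text; the value 0.30 for candidate 1-1 is inferred from the stated score of 0.90 for “Tokyo + Toyoko Line” (0.90 − 0.60), since FIG. 9 is not reproduced here.

```python
BOOST = 1.2  # weighting variable used in the present embodiment

def boosted_score(words, occurrences, sentence_reliability, boost=BOOST):
    """Score a candidate of two or more words from sentence reliabilities.

    Sentence reliabilities of recognition candidates that contain every word
    of the understanding result candidate are multiplied by `boost`.
    """
    shared = set.intersection(*(set(occurrences[word]) for word in words))
    total = 0.0
    for word in words:
        for candidate_id in occurrences[word]:
            factor = boost if candidate_id in shared else 1.0
            total += sentence_reliability[candidate_id] * factor
    return total

# "1-2" means the second recognition candidate of the first utterance.
sentence_reliability = {"1-1": 0.30, "1-2": 0.30, "1-3": 0.20, "1-4": 0.10}
occurrences = {
    "Tokyo": ["1-2", "1-3", "1-4"],
    "Shinagawa Station": ["1-2"],
    "Toyoko Line": ["1-1"],
}

shinagawa = boosted_score(("Tokyo", "Shinagawa Station"), occurrences, sentence_reliability)
toyoko = boosted_score(("Tokyo", "Toyoko Line"), occurrences, sentence_reliability)
print(round(shinagawa, 2), round(toyoko, 2))
```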

As described above, the likelihood of simultaneous recognition of multiple words calculated by the speech recognition engine is used so that combinations of words recognized within the same candidate of the same recognition result are more readily adopted as the understanding result. Understanding results made up of words that are likely to have been spoken together are therefore more readily adopted.

(Embodiment 6)
In a similar way, the score of an understanding result can be lowered when the positions at which the speech recognition unit 112 recognized its words on the speech waveform overlap. Specifically, suppose the understanding result candidate search process yields a plurality of understanding result candidates, all consisting of the same number of words. For each candidate, if it contains words whose recognition intervals in the speech waveform overlap within the same recognition result, the total word reliability is multiplied by a number that falls further below 1 the larger the overlap. The resulting value is the score of the candidate, and the candidate with the highest score is adopted as the understanding result.

FIG. 10 shows, as a time series, the relationship between the speech waveform and the word recognition positions when the speech recognition result of FIG. 7 was output. In the present embodiment, the voice input detection start time is T0 and the voice input detection end time is T5. The recognition start position of the first word in the first recognition result candidate 201, the second recognition result candidate 202, the third recognition result candidate 203, and the fourth recognition result candidate 204 is T1, and the recognition start position of the first word in the fifth recognition result candidate 205 is T2. In FIG. 7(d), the understanding results “Tokyo + Toyoko Line” and “Tokyo + Shinagawa Station” are both output with a score of 0.90. The recognition start and end times of “Toyoko Line” and “Tokyo” are therefore examined.
“Toyoko Line” is a word that appears in the first recognition result candidate 201, and “Tokyo” appears in the second recognition result candidate 202, the third recognition result candidate 203, and the fourth recognition result candidate 204, so the relationship between the first recognition result candidate 201 and the second, third, and fourth recognition result candidates 202, 203, and 204 is examined. “Tokyo” is recognized between time T1 and time T3 in each of the second, third, and fourth recognition result candidates, and “Toyoko Line” is recognized between time T1 and time T5 in the first recognition result candidate 201. The recognition positions of “Tokyo” and “Toyoko Line” therefore overlap between time T1 and time T3. In an actual utterance, however, two different words cannot be spoken at the same time, so when recognition positions overlap, the score is lowered according to the size of the overlap. In the present embodiment, the proportion of the overlap within the recognition interval of the word with the shorter recognition time is obtained first. Here, the interval from T1 to T3 was 1000 milliseconds. Of “Tokyo” and “Toyoko Line”, “Tokyo” has the shorter recognition time, and the overlap occupies 100% of the 1000 milliseconds from T1 to T3 in which “Tokyo” was recognized. The understanding result score is then discounted according to this proportion. The discount rate is the proportion of the overlap multiplied by 30%. The coefficient applied to the discount rate (here 30%) is obtained in advance by learning from data. The value obtained by subtracting the discount rate from 1 corresponds to the number described above that falls further below 1 the larger the overlap.
In this case the discount rate is 1.0 × 0.3 = 0.3, and the understanding result score (total word reliability) of 0.90 is discounted by 30% to 0.63. By contrast, “Tokyo + Shinagawa Station” has no overlapping recognition positions and its understanding result score remains 0.90. The understanding result “Tokyo + Shinagawa Station” therefore has a higher score than the understanding result “Tokyo + Toyoko Line” and is selected as the final understanding result.
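The overlap discount can be sketched as follows. The function names are hypothetical, and so are the absolute times: the text gives only the 1000 millisecond length of the interval from T1 to T3, so T1 = 200, T3 = 1200, and T5 = 2500 are placeholders, and the interval assumed for “Shinagawa Station” (T3 to T5) is an assumption chosen so that it does not overlap “Tokyo”. The rate of 0.3 and the score of 0.90 come from the text.

```python
RATE = 0.3  # coefficient applied to the overlap proportion (30% in the text)

def overlap_ratio(a, b):
    """Share of the shorter recognition interval that overlaps the other one."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    shorter = min(a[1] - a[0], b[1] - b[0])
    return overlap / shorter

def discounted_score(score, a, b, rate=RATE):
    """Multiply the score by (1 - overlap proportion * rate)."""
    return score * (1.0 - overlap_ratio(a, b) * rate)

# Times in milliseconds; only T3 - T1 = 1000 is taken from the text.
T1, T3, T5 = 200, 1200, 2500
tokyo = (T1, T3)
toyoko_line = (T1, T5)
shinagawa_station = (T3, T5)  # assumed to follow "Tokyo" without overlap

toyoko_score = discounted_score(0.90, tokyo, toyoko_line)
shinagawa_score = discounted_score(0.90, tokyo, shinagawa_station)
print(round(toyoko_score, 2), round(shinagawa_score, 2))
```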

As described above, combinations of words whose recognition positions in the speech waveform overlap are given a lower score, so such combinations are less likely to be adopted as the understanding result.

(Embodiment 7)
The methods described so far can handle any number of categories. FIG. 11 shows an example in which many categories are selected in category selection.

Here, the user said “Shinagawa Station” in the first utterance, but the recognition result was as shown in FIG. 11(a), the understanding result was “Sendai Station”, and language understanding failed. The user therefore said “Shinagawa Station in Tokyo” in the second utterance, and the recognition result was as shown in FIG. 11(b). The example shows how the understanding result is derived in this case.

After the second utterance, word reliability is calculated for each utterance, and the word reliabilities of the same word are then added together. FIG. 11(c) shows the result of combining the first and second utterances. Calculating the score of each category from this gives FIG. 11(d), in which five categories (the prefecture category, municipality category, route name category, road name category, and facility name category) exceed their respective category thresholds. A combination of words matching these categories is therefore sought. However, because a route name and a road name are never specified together in destination setting, the search is performed over (1) “prefecture category + municipality category + route name category + facility name category” or (2) “prefecture category + municipality category + road name category + facility name category”. In neither combination can a fully consistent combination of words be found from FIG. 11(c), so the number of categories is reduced by one more.
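The accumulation of word reliabilities across utterances and the threshold-based category selection can be sketched as follows. This is only an illustration under stated assumptions: the reliability values, thresholds, and function names are hypothetical and are not the values of FIG. 11, and the category score is assumed here to be the sum of the word reliabilities in the category.

```python
from collections import defaultdict


def merge_reliabilities(utterances):
    """Add up the word reliability of the same word over all utterances."""
    total = defaultdict(float)
    for words in utterances:
        for word, reliability in words.items():
            total[word] += reliability
    return dict(total)


def category_scores(word_reliability, word_category):
    """Category score, assumed to be the sum of word reliabilities in the category."""
    scores = defaultdict(float)
    for word, reliability in word_reliability.items():
        scores[word_category[word]] += reliability
    return dict(scores)


def candidate_categories(scores, thresholds):
    """Keep the categories whose score reaches the per-category threshold."""
    return {c for c, s in scores.items() if s >= thresholds[c]}


# Hypothetical values for illustration only (not taken from FIG. 11).
first = {"Sendai Station": 0.30, "Shinagawa Station": 0.15}
second = {"Tokyo": 0.50, "Shinagawa Station": 0.25}
word_category = {
    "Sendai Station": "facility",
    "Shinagawa Station": "facility",
    "Tokyo": "prefecture",
}
thresholds = {"facility": 0.5, "prefecture": 0.3}

merged = merge_reliabilities([first, second])
selected = candidate_categories(category_scores(merged, word_category), thresholds)
```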

This time, a method that considers all combinations of categories is used. The possible combinations are then the following, each obtained by omitting one category from combination (1) or (2).

(3) “prefecture category + municipality category + route name category”, (4) “prefecture category + municipality category + facility name category”, (5) “prefecture category + route name category + facility name category”, (6) “prefecture category + municipality category + road name category”, (7) “prefecture category + road name category + facility name category”.

However, no consistent combination of words is obtained with these combinations either, so the number of categories is reduced by one more. The combinations of categories are then (8) “prefecture category + municipality category”, (9) “prefecture category + route name category”, (10) “prefecture category + road name category”, (11) “prefecture category + facility name category”, (12) “municipality category + route name category”, (13) “municipality category + road name category”, (14) “municipality category + facility name category”, (15) “route name category + facility name category”, and (16) “road name category + facility name category”. Of these, matching words were found for (8), (9), (10), (11), and (15). The matching combinations of words are shown in FIG. 12(e), which is a continuation of FIG. 11.

Furthermore, when the scores of these combinations are obtained, they are as shown at 11r. The understanding result “Tokyo + Shinagawa Station” has the highest score, so it is taken as the understanding result.
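The stepwise reduction of the number of categories walked through above can be sketched as follows. The names `category_sets`, `search_with_backoff`, and `find_consistent` are hypothetical, and the consistency check is replaced by a stub lookup table with illustrative scores. The sketch enumerates every subset of a given size that does not contain both a route name and a road name.

```python
from itertools import combinations

CATEGORIES = ["prefecture", "municipality", "route", "road", "facility"]
# Pairs that are never specified together in destination setting.
EXCLUSIVE = [{"route", "road"}]


def category_sets(categories, size):
    """Yield category combinations of the given size that contain no exclusive pair."""
    for combo in combinations(categories, size):
        if not any(pair <= set(combo) for pair in EXCLUSIVE):
            yield combo


def search_with_backoff(categories, find_consistent):
    """Try the largest category sets first and shrink by one until a result is found.

    find_consistent(combo) returns a list of (words, score) tuples that are
    semantically consistent for that combination (empty if none exist).
    """
    for size in range(len(categories), 0, -1):
        found = []
        for combo in category_sets(categories, size):
            found.extend(find_consistent(combo))
        if found:
            return max(found, key=lambda result: result[1])
    return None


# Stub with illustrative entries only; a real check would consult the word data.
STUB = {
    ("prefecture", "facility"): [(("Tokyo", "Shinagawa Station"), 0.90)],
    ("prefecture", "route"): [(("Tokyo", "Toyoko Line"), 0.63)],
}
best = search_with_backoff(CATEGORIES, lambda combo: STUB.get(combo, []))
```

With the five categories of the example, size 4 yields the two combinations (1) and (2), and size 2 yields the nine combinations (8) to (16). For size 3, this plain enumeration also produces the two municipality-based combinations that include the facility name category, which are not among (3) to (7) in the text.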

It is also possible to compare the understanding results that match N categories with those that match N−1 categories and to select the one with the highest score as the understanding result. However, N words are selected for N categories and N−1 words for N−1 categories, so the understanding result scores must be normalized. In this normalization, understanding results with more categories are given more weight. For example, when one category is selected as the understanding result, the score of the selected word is used as it is, whereas the score is divided by 1.6 when there are two categories and by 2.2 when there are three, and the resulting values are used for the comparison. The optimum values of these normalization variables for each number of categories are obtained in advance by data learning. Let M be the maximum number of categories in an understanding result. The highest-scoring understanding result matching M categories is compared with the highest-scoring understanding result matching M−1 categories, and the one with the higher score is then compared with the highest-scoring understanding result matching M−2 categories. By repeating this, the optimum result over category counts from 1 to M can be obtained.

In the example of FIG. 11, consider the case where the understanding results with one category are also examined. The understanding results with one category are shown in FIG. 11(e) together with those with two categories. Here, the highest-scoring understanding result with two categories is compared with the highest-scoring understanding result with one category.
1. “Tokyo + Shinagawa Station”: score 0.90
2. “Tokyo”: score 0.50
When all the words included in an understanding result N are also included in an understanding result M, the score of the understanding result N is always lower than the score of the understanding result M.

To compare these two understanding results, the score of the understanding result M is divided by 1.6, the normalization variable for two categories. The normalization variables are obtained in advance by data learning using a corpus. The normalized scores are shown in 11e. For understanding results 1 and 2, they are as follows.
1. “Tokyo + Shinagawa Station”: normalized score 0.56
2. “Tokyo”: normalized score 0.50
The understanding result “Tokyo + Shinagawa Station” therefore has the higher normalized score, so it is judged to be reliable and is adopted as the understanding result.
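The normalized comparison above can be reproduced with a short sketch. The divisors 1.0, 1.6, and 2.2 and the scores 0.90 and 0.50 are taken from the text. The names `NORMALIZER` and `normalized` are hypothetical, and the number of categories is assumed to equal the number of words in the understanding result.

```python
# Divisor per number of categories, as given in the text (learned from data there).
NORMALIZER = {1: 1.0, 2: 1.6, 3: 2.2}


def normalized(score, n_categories):
    """Divide the understanding result score by the divisor for its category count."""
    return score / NORMALIZER[n_categories]


candidates = [
    (("Tokyo", "Shinagawa Station"), 0.90),  # best result with two categories
    (("Tokyo",), 0.50),                      # best result with one category
]

# 0.90 / 1.6 = 0.5625 (0.56 when rounded), which is higher than 0.50 / 1.0 = 0.50.
best = max(candidates, key=lambda c: normalized(c[1], len(c[0])))
```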

The understanding result selection means described so far can also be used in combination. Even when they are combined, each means individually contributes to improving the understanding rate.

FIG. 1 is a functional block diagram of an embodiment of the present invention.
FIG. 2 is a basic operation flow of the control device of the embodiment of the present invention.
FIG. 3 is a sample of a recognition result and an understanding result for which the conventional example can derive an understanding result.
FIG. 4 is a basic operation flow of the control device in the conventional example.
FIG. 5 is a sample of a recognition result and an understanding result for which the conventional example cannot find a consistent combination of words in the categories selected by category selection.
FIG. 6 is a sample of a recognition result and an understanding result for which the score of the understanding result becomes remarkably low with the combination of words in the categories selected by category selection.
FIG. 7 is a sample of a recognition result and an understanding result in which there is no difference in score among a plurality of understanding results.
FIG. 8 shows the weights assigned to combinations of categories.
FIG. 9 shows the sentences containing the recognition result candidate words in the recognition result of the example of FIG. 7, together with their reliabilities.
FIG. 10 is a diagram showing the position at which each word was recognized on the speech waveform from which the recognition result shown in FIG. 7 was obtained.
FIG. 11 is a sample of a recognition result and an understanding result when a large number of categories appear.
FIG. 12 is a continuation of FIG. 11.

Explanation of symbols

100: navigation device, 110: control device, 111: input control unit, 112: speech recognition unit, 113: word reliability calculation unit, 114: language understanding unit, 115: response generation unit, 116: GUI display control unit, 117: speech synthesis unit, 120: switch, 130: microphone, 140: memory, 141: dictionary and grammar for speech recognition, 142: understanding result, 150: disk reader, 151: disk, 160: monitor, 170: speaker, 201: word recognition position when the first recognition result candidate is output, 202: word recognition position when the second recognition result candidate is output, 203: word recognition position when the third recognition result candidate is output, 204: word recognition position when the fourth recognition result candidate is output, 205: word recognition position when the fifth recognition result candidate is output.

Claims

A speech dialogue apparatus comprising: speech input means for converting input speech into a speech signal and outputting the speech signal; speech recognition means for converting the speech signal into candidate words and outputting the candidate words; word reliability calculation means for obtaining a word reliability indicating the likelihood that each candidate word was uttered; and a language understanding unit for understanding the spoken language input to the speech input means from the candidate words and the word reliabilities,
wherein, when a user's utterance is input to the speech input means, the language understanding unit performs: a category selection process of classifying all candidate words output by the speech recognition means into categories, obtaining for each category, by using the word reliabilities of the candidate words belonging to the category, a category score indicating the likelihood that a candidate word belonging to the category was uttered, and selecting as candidate categories all or some of the categories whose category score is equal to or higher than a threshold predetermined for the category; and an understanding result candidate search process of searching the candidate words belonging to the candidate categories for an understanding result candidate, which is a word or a combination of words that is semantically consistent; and wherein, when no understanding result candidate is found by the understanding result candidate search process, the language understanding unit performs a category selection process of selecting, as a new set of candidate categories, a set obtained by excluding one or more categories from the set of candidate categories, and then performs the understanding result candidate search process again.

The spoken dialogue apparatus according to claim 1, wherein, when no understanding result candidate is found in the understanding result candidate search process, a category selection process is performed that selects, as a new set of candidate categories, the set obtained by excluding the one candidate category having the lowest category score from the set of candidate categories, and the understanding result candidate search process is then performed again.

The spoken dialogue apparatus according to claim 1, wherein, when no understanding result candidate is found in the understanding result candidate search process, a category selection process is performed that selects, as new sets of candidate categories, a plurality of sets each obtained by excluding one candidate category from the set of candidate categories, and the understanding result candidate search process is then performed again for each of the new sets of candidate categories.

4. The understanding result candidate is not adopted as an understanding result if the understanding result candidate consists of one word and the word reliability of the word is equal to or less than a predetermined threshold. Voice interaction device.

The understanding result candidate is not adopted as an understanding result if the understanding result candidate is composed of a plurality of words and the total word reliability of the words is equal to or less than a predetermined threshold value. 3. The voice interaction device according to 3.

The spoken dialogue apparatus according to claim 1, wherein, when a plurality of understanding result candidates are obtained by the understanding result candidate search process and all of the understanding result candidates are composed of the same plurality of words, for each of the understanding result candidates, a value obtained by multiplying the total word reliability by a number that becomes larger as the probability that the combination of categories in the understanding result candidate is uttered in one utterance becomes higher is used as the score of the understanding result candidate, and the understanding result candidate having the highest score among the understanding result candidates is adopted as an understanding result.

The spoken dialogue apparatus according to claim 1, 2, or 3, wherein, when a plurality of understanding result candidates are obtained by the understanding result candidate search process and all of the understanding result candidates are composed of the same plurality of words, for each of the understanding result candidates, when a combination of words recognized within the same candidate of the same recognition result is present in the understanding result candidate, the word reliability of each of those words is multiplied by a number greater than 1, the sum of the word reliabilities after the multiplication is used as the score of the understanding result candidate, and the understanding result candidate having the highest score among the understanding result candidates is adopted as an understanding result.

The spoken dialogue apparatus according to claim 1, 2, or 3, wherein, when a plurality of understanding result candidates are obtained by the understanding result candidate search process and all of the understanding result candidates are composed of the same plurality of words, for each of the understanding result candidates, when words whose word recognition sections in the speech waveform overlap within the same recognition result are present in the understanding result candidate, a value obtained by multiplying the total word reliability by a number that becomes smaller than 1 as the overlap becomes larger is used as the score of the understanding result candidate, and the understanding result candidate having the highest score among the understanding result candidates is adopted as an understanding result.

A speech dialogue method using: speech input means for converting input speech into a speech signal and outputting the speech signal; speech recognition means for converting the speech signal into candidate words and outputting the candidate words; word reliability calculation means for obtaining a word reliability indicating the likelihood that each candidate word was uttered; and a language understanding unit that understands the spoken language input to the speech input means from the candidate words and the word reliabilities,
wherein, when a user's utterance is input to the speech input means, the language understanding unit performs: a category selection process of classifying all candidate words output by the speech recognition means into categories, obtaining for each category, by using the word reliabilities of the candidate words belonging to the category, a category score indicating the likelihood that a candidate word belonging to the category was uttered, and selecting as candidate categories all or some of the categories whose category score is equal to or higher than a threshold predetermined for the category; and an understanding result candidate search process of searching the candidate words belonging to the candidate categories for an understanding result candidate, which is a word or a combination of words that is semantically consistent; and wherein, when no understanding result candidate is found by the understanding result candidate search process, the language understanding unit performs a category selection process of selecting, as a new set of candidate categories, a set obtained by excluding one or more categories from the set of candidate categories, and then performs the understanding result candidate search process again.

The spoken dialogue method according to claim 9, wherein, when no understanding result candidate is found in the understanding result candidate search process, a category selection process is performed that selects, as a new set of candidate categories, the set obtained by excluding the one candidate category having the lowest category score from the set of candidate categories, and the understanding result candidate search process is then performed again.

The spoken dialogue method according to claim 9, wherein, when no understanding result candidate is found in the understanding result candidate search process, a category selection process is performed that selects, as new sets of candidate categories, a plurality of sets each obtained by excluding one candidate category from the set of candidate categories, and the understanding result candidate search process is then performed again for each of the new sets of candidate categories.

12. The understanding result candidate is not adopted as an understanding result if the understanding result candidate consists of one word and the word reliability of the word is not more than a predetermined threshold value. Voice interaction method.

The understanding result candidate is not adopted as an understanding result if the understanding result candidate includes a plurality of words and the total word reliability of the word is equal to or less than a predetermined threshold. 11. The voice interaction method according to 11.

The voice interaction method according to claim 9, wherein, when a plurality of understanding result candidates are obtained by the understanding result candidate search process and all of the understanding result candidates are composed of the same plurality of words, for each of the understanding result candidates, a value obtained by multiplying the total word reliability by a number that becomes larger as the probability that the combination of categories in the understanding result candidate is uttered in one utterance becomes higher is used as the score of the understanding result candidate, and the understanding result candidate having the highest score among the understanding result candidates is adopted as an understanding result.

The speech dialogue method according to claim 9, 10, or 11, wherein, when a plurality of understanding result candidates are obtained by the understanding result candidate search process and all of the understanding result candidates are composed of the same plurality of words, for each of the understanding result candidates, when a combination of words recognized within the same candidate of the same recognition result is present in the understanding result candidate, the word reliability of each of those words is multiplied by a number greater than 1, the sum of the word reliabilities after the multiplication is used as the score of the understanding result candidate, and the understanding result candidate having the highest score among the understanding result candidates is adopted as an understanding result.

The spoken dialogue method according to claim 9, 10, or 11, wherein, when a plurality of understanding result candidates are obtained by the understanding result candidate search process and all of the understanding result candidates are composed of the same plurality of words, for each of the understanding result candidates, when words whose word recognition sections in the speech waveform overlap within the same recognition result are present in the understanding result candidate, a value obtained by multiplying the total word reliability by a number that becomes smaller than 1 as the overlap becomes larger is used as the score of the understanding result candidate, and the understanding result candidate having the highest score among the understanding result candidates is adopted as an understanding result.