JP2007004052A - Voice interactive device and voice understood result generation method

Info

Publication number: JP2007004052A
Application number: JP2005186892A
Authority: JP
Inventors: Keiko Katsuragawa; 景子桂川
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2005-06-27
Filing date: 2005-06-27
Publication date: 2007-01-11
Anticipated expiration: 2025-06-27
Also published as: JP4639990B2

Abstract

PROBLEM TO BE SOLVED: To reduce the chance that the correct word is omitted from the recognition results selected by voice recognition. SOLUTION: Voice input via a microphone 20 is recognized on the basis of recognition target words. On the basis of the words included in the resulting recognition result candidates, similar words are retrieved from a disk 40, which stores each recognition target word in association with similar words that are likely to be misrecognized as it. The retrieved similar words are added to the recognition result candidates, and the understanding result that forms the basis of the response to the utterance is generated from the recognition result candidates to which the similar words have been added. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a voice dialogue apparatus that conducts a dialogue in response to spoken voice, and more particularly to a voice dialogue apparatus and a voice understanding result generation method that improve the voice recognition rate.

Voice dialogue apparatuses have been devised that receive a voice uttered by a user and conduct a dialogue with the user by returning a system response according to the voice recognition result of the input voice. Such apparatuses use various speech recognition methods to improve the recognition rate of the input speech.

For example, a method has been disclosed (Patent Document 1) in which a plurality of recognition result candidates obtained by recognizing the input speech are output as the speech recognition result, a reliability is obtained for this speech recognition result, and the newly obtained reliability is added to the reliability of the speech recognition results for the speech uttered so far, thereby generating an understanding result of the utterance content up to the present.

With this method, the input speech is not merely recognized word by word. The relationship between each recognized word and its context is taken into account and the more plausible word can finally be selected, so that speech recognition accuracy can be improved efficiently.

Patent Document 1: JP 2004-251998 A

However, the method disclosed in Patent Document 1 presupposes that the word uttered by the user is always present in one of the recognition result candidates output as the speech recognition result, so the correct speech recognition result may in fact have been left out of the recognition result candidates.

If the correct speech recognition result has been excluded from the recognition result candidates in this way, the understanding result that is generated is naturally not reliable either.

The present invention has been proposed in view of these circumstances. Its object is to provide a voice dialogue apparatus and a voice understanding result generation method that improve the voice recognition rate by reducing the cases in which the correct word or phrase uttered by the user is excluded from the final recognition result candidates and thus fails to be selected.

The voice dialogue apparatus of the present invention solves the above problem by comprising: input means for inputting uttered speech; storage means for storing recognition target words of the speech in association with similar words that are easily misrecognized as those recognition target words; speech recognition means for recognizing the speech input by the input means on the basis of the recognition target words; detection means for detecting the similar words from the storage means on the basis of the words included in the recognition result candidates, which are the recognition results of the speech recognition means; and understanding result generation means for adding the similar words detected by the detection means to the recognition result candidates and generating, from the recognition result candidates to which the similar words have been added, an understanding result that serves as the response to the uttered speech.

The voice understanding result generation method of the present invention solves the above problem by comprising: an input step of inputting uttered speech; a speech recognition step of recognizing the speech input in the input step on the basis of recognition target words; a detection step of detecting, on the basis of the words included in the recognition result candidates that are the recognition results of the speech recognition step, similar words from storage means that stores the recognition target words of the speech in association with similar words that are easily misrecognized as those recognition target words; and an understanding result generation step of adding the similar words detected in the detection step to the recognition result candidates and generating, from the recognition result candidates to which the similar words have been added, an understanding result that serves as the response to the uttered speech.

On the basis of the words included in the recognition result candidates produced by the speech recognition means, the voice dialogue apparatus of the present invention detects from the storage means similar words that are easily misrecognized as the recognition target words, adds them to the recognition result candidates, and generates from the augmented candidates an understanding result that serves as the response to the uttered speech.

As a result, the final recognition result candidates include not only the candidates obtained by the speech recognition means but also similar words that were dropped by the speech recognition process yet are acoustically close to, and easily confused with, the words in those candidates.

Therefore, even when a legitimate utterance by the user has been excluded from the recognition result candidates by the speech recognition process, it can still be selected as the understanding result, which makes it possible to improve the recognition rate.

Likewise, on the basis of the words included in the recognition result candidates produced by speech recognition, the voice understanding result generation method of the present invention detects from the storage means similar words that are easily misrecognized as the recognition target words, adds them to the recognition result candidates, and generates from the augmented candidates an understanding result that serves as the response to the uttered speech.

As a result, the final recognition result candidates include not only the candidates obtained by speech recognition but also similar words that were dropped by the speech recognition process yet are acoustically close to, and easily confused with, the words in those candidates.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

First, the configuration of the voice dialogue apparatus shown as an embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 shows the configuration of the voice dialogue apparatus when it is applied to a navigation device mounted on a moving body such as a vehicle. When mounted on a vehicle, for example, the navigation device detects the current position of the vehicle and guides the route to a desired destination while displaying, from the map data, a map corresponding to that current position.

When this voice dialogue apparatus is applied to a navigation device, the various functions required of the navigation device can be operated interactively through dialogue between the user and the system.

As shown in FIG. 1, the voice dialogue apparatus comprises a switch 10; a microphone 20; a memory 30; a disk 40 that stores map data used for route guidance, voice data for guidance voice, and the like; a disk reading device 41 that reads the various data stored on the disk 40; a control device 50 that recognizes the voice input via the microphone 20, understands the content of the voice recognition result, and generates a system response; a monitor 60, such as a liquid crystal display, that displays a map showing the route search result, menu screens, the voice recognition result of the control device 50, and the like; and a speaker 70 that outputs guidance voice and the system response voice used in dialogue with the user (hereinafter simply called the system response).

When pressed by the user, the switch 10 instructs the input control unit 51 of the control device 50, described later, to start the speech recognition process on the speech uttered by the user and input via the microphone 20.

The microphone 20 inputs the speech uttered by the user to the speech recognition unit 52 of the control device 50, described later. For example, the user utters words and sentences used to operate the navigation device, that is, operation commands, proper nouns such as place names, facility names, and road names, and sentences containing these words, and inputs that speech through the microphone 20.

The memory 30 is a RAM (Random Access Memory) or the like. It has a storage area 31 that holds the speech recognition dictionary and grammar read from the disk 40 by the disk reading device 41 when the speech recognition process is executed, and a storage area 32 that holds the understanding result of the utterances up to the present, output from the language understanding unit 54 of the control device 50, described later.

The understanding result of the utterances up to the present, held in the storage area 32 of the memory 30, is used when the next utterance is input, so that the newly input utterance is understood together with the past utterance understanding results.

The disk 40 is a storage medium that stores the speech recognition dictionary and grammar used for speech recognition, a map database, a proper noun database, a general database, a reading database, and the like.

In general, a system that performs speech recognition using a speech recognition dictionary and grammar can accept as speech recognition results only input sentences that use the recognition target words and grammar described in that dictionary and grammar. Information about these recognition target words is also managed in the proper noun database, the general database, and the reading database.

For example, if the main task of the navigation device is setting a destination for route search, the input sentences that the user enters through the microphone 20 can be expected to include both single words related to facilities, such as "Kanagawa Prefecture" or "Yokohama Station", and sentences that combine several keywords, such as "Yokohama Station in Kanagawa Prefecture" or "Yokohama Station on the Tokaido Line".

The speech recognition dictionary and grammar stored on the disk 40 are therefore configured to handle both single-word input and sentences containing several keywords.

The map database contains the map data used for map display and route search.

The proper noun database stores proper nouns that denote facilities, such as "Oppama Station" and "Tokyo ○× Park", and proper nouns that denote places, such as "Saitama Prefecture" and "Tama City", each associated with detailed information about that proper noun.

The general database stores general nouns that denote facility types, such as "station" and "gas station"; general nouns that denote administrative divisions, such as "prefecture", "city", and "town"; affirmative and negative words such as "yes" and "no"; verbs such as "go", "search", and "return"; numbers such as "1" ... "1000"; particles such as "sō"; auxiliary verbs such as "desu"; and so on, each associated with its detailed information.

The detailed information in the proper noun database and the general database is, for example, the kana-kanji notation used for display together with attribute information such as the location of the facility and whether it is a restaurant, a gas station, and so on when the word is a facility, and attribute information indicating the meaning of the word in other cases.

FIG. 2 shows an example of the proper noun database. As shown in FIG. 2, the proper noun database manages each proper noun, such as "Oppama Station", "Tokyo ○× Park", "Saitama Prefecture", and "Tama City", under a word ID that uniquely identifies it. For each word ID it holds a "notation" giving the written form of the proper noun, a "type" giving its kind, an "address" locating it at the prefecture and municipality level, and a "reading list" of reading IDs that link to the separately prepared reading database, which stores how each proper noun is read.

As FIG. 2 also shows, proper nouns whose "type" is station are additionally managed with railway line information under "route". Although not shown, proper nouns whose "type" is interchange are likewise given road information under "route".
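The column layout described above can be sketched as a small record type. This is only an illustration: the field names, the word ID "1001", the reading IDs, and the address and route values are hypothetical placeholders (FIG. 2 is not reproduced in this text); only the word ID 5201 for "Tokyo ○× Park" comes from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProperNoun:
    word_id: str                    # unique key of the entry
    notation: str                   # written form shown to the user
    kind: str                       # "type" column: station, facility, ...
    address: str                    # prefecture / municipality
    reading_ids: Tuple[str, ...]    # links into the reading database
    route: Optional[str] = None     # only for stations and interchanges


# Hypothetical sample rows; values are placeholders, not taken from FIG. 2.
PROPER_NOUNS = {
    "1001": ProperNoun("1001", "Oppama Station", "station",
                       "Kanagawa Prefecture", ("r100",), route="(some line)"),
    "5201": ProperNoun("5201", "Tokyo ○× Park", "facility",
                       "Tokyo", ("r200", "r201")),
}


def find_by_kind(kind: str):
    """Return all entries whose "type" column equals `kind`."""
    return [n for n in PROPER_NOUNS.values() if n.kind == kind]
```

Keeping only reading IDs in the record, rather than the readings themselves, mirrors the description: several readings can point at one word, and the readings live in their own database.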

FIG. 3 shows an example of the general database. As shown in FIG. 3, the general database manages words other than proper nouns (general nouns, verbs, particles, auxiliary verbs, and so on, such as "station", "prefecture", "city", "town", "yes", "no", "sō", "desu", "go", "search", "return", and "1" ... "1000") under a word ID that uniquely identifies each of them. For each word ID it holds a "notation" giving the written form of the word, a "category" that classifies the word broadly, a "type" that is a subordinate concept of the "category", and a "reading list" of reading IDs that link to the separately prepared reading database, which stores how each word is read.

The reading database is linked to every word in the proper noun database and the general database and stores how each word is read.

Words stored in the proper noun database and the general database can differ in notation and reading even when they have the same meaning. For example, "Tokyo ○× Park", registered with word ID 5201 in the proper noun database of FIG. 2, is often written in the shortened form "○× Park" instead of the formal "Tokyo ○× Park", and is often called simply "Marubatsu Park". Users recognize "Tokyo ○× Park" and "○× Park" as the same facility, but the notation and the reading differ.

Similarly, the word written "町" (town), registered with word ID 0704 in the general database of FIG. 3, is read as either "machi" or "chō" depending on the case.

A separate reading database is therefore provided alongside the proper noun database and the general database to handle such alternative forms of a word.

FIG. 4 shows the reading database that corresponds to the proper noun database of FIG. 2 and the general database of FIG. 3.

As shown in FIG. 4, the reading database assigns a reading ID to every reading of every word stored in the proper noun database and the general database. For each reading ID it holds the "reading" itself and a "meaning list" containing the notation and word ID, in the proper noun database or general database, of the words that are read this way. In addition, as a "similar word reading list", each entry holds in list form the reading IDs of words whose readings are easily misrecognized as this reading, the words identified by those reading IDs, and a similarity that quantifies how easily each is misrecognized.
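A reading database entry with its meaning list and similar word reading list might be represented as follows. This is a sketch under assumptions: the class and field names, all reading IDs, and the similarity value 0.6 are hypothetical, and the similar-reading pair is made up for illustration. Only the readings of "町" (word ID 0704) follow the description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SimilarReading:
    reading_id: str     # reading that is easily confused with the owner
    word: str           # word identified by that reading ID
    similarity: float   # how easily the two are confused


@dataclass(frozen=True)
class ReadingEntry:
    reading_id: str
    reading: str
    meanings: Tuple[Tuple[str, str], ...]       # (notation, word ID) pairs
    similar: Tuple[SimilarReading, ...] = ()    # similar word reading list


# Hypothetical contents; IDs and the similarity value are placeholders.
READINGS = {
    "r001": ReadingEntry("r001", "machi", (("町", "0704"),),
                         (SimilarReading("r002", "町", 0.6),)),
    "r002": ReadingEntry("r002", "chō", (("町", "0704"),)),
}


def readings_of(word_id: str):
    """All readings whose meaning list contains the given word ID."""
    return [e.reading for e in READINGS.values()
            if any(wid == word_id for _, wid in e.meanings)]
```

Because the key is the reading ID rather than the word ID, two readings of the same word remain distinct entries, which is what later allows a reliability to be computed per reading.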

Next, the control device 50 will be described. The control device 50 comprises an input control unit 51, a speech recognition unit 52, a word reliability calculation unit 53, a language understanding unit 54, a response generation unit 55, a GUI display control unit 56, and a speech synthesis unit 57. It performs speech recognition on the speech input via the microphone 20 and returns a system response according to the speech recognition result.

When the user presses the switch 10, the input control unit 51 instructs the speech recognition unit 52 to start the speech recognition process.

In response to the instruction from the input control unit 51, the speech recognition unit 52 captures the speech signal that was uttered by the user, input from the microphone 20, and digitized by an A/D converter (not shown), and executes the speech recognition process.

The speech recognition unit 52 performs speech recognition by matching the captured digitized speech signal against standby sentences composed of the recognition target words held in the speech recognition dictionary and grammar built in the storage area 31 of the memory 30, and outputs the speech recognition result to the language understanding unit 54.

During the matching process, the speech recognition unit 52 calculates a likelihood, which is the acoustic closeness between the speech feature data and each standby sentence, and treats the sentences whose likelihood is at or above a certain value as recognition result candidates of the speech recognition result.

The speech recognition unit 52 outputs to the word reliability calculation unit 53 the top N recognition result candidates with the highest likelihood (hereinafter also called N-best candidates) together with their likelihoods. Each recognition result candidate is expressed as a word string of reading IDs, the identification codes assigned to each reading of each word in the reading database of FIG. 4 stored on the disk 40, and the candidates are arranged in descending order of likelihood.
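The candidate selection just described (keep sentences at or above a likelihood threshold, then take the top N) can be sketched as below. The function name, the tuple layout, and the sample values are hypothetical; how the likelihoods themselves are computed is outside this sketch.

```python
def n_best(scored, threshold, n):
    """Select N-best candidates.

    scored: list of (reading_id_sequence, likelihood) pairs, one per
            standby sentence that was matched against the input.
    Returns at most n pairs with likelihood >= threshold, best first.
    """
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:n]


# Hypothetical matcher output: reading-ID strings with log likelihoods.
scored = [
    (("r10", "r20"), -12.0),
    (("r11", "r20"), -10.5),
    (("r12",), -30.0),       # below the threshold, dropped
    (("r13", "r21"), -11.0),
]
candidates = n_best(scored, threshold=-20.0, n=2)
```

Note that a sentence dropped here, whether by the threshold or by the cut at N, never reaches the later stages, which is exactly the omission the invention aims to compensate for.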

The word reliability calculation unit 53 calculates a word reliability for every word included in the recognition result candidates output by the speech recognition unit 52 as the speech recognition result, separately for each reading of the word. In other words, the word reliability calculation unit 53 treats words that have the same meaning but different readings as different words when calculating the word reliability.

The word reliability is therefore calculated per reading ID of the reading database of FIG. 4 stored on the disk 40. The word reliability is a value indicating how likely it is that a word was uttered with that reading in a single utterance. If the word reliability for the reading ID of a word W is written Conf(W) and the log likelihood of the i-th N-best candidate is written L_i, Conf(W) is obtained by equation (1).

The word reliabilities calculated for each reading ID by the word reliability calculation unit 53 are all stored in the memory 30 as a "reading ID list of the words included in the recognition result candidates". The calculation of the word reliability by the word reliability calculation unit 53 is disclosed in JP 2004-251998 A.
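Equation (1) is not reproduced in this text, so the following is not the patent's formula. It is a sketch of one common way to turn N-best log likelihoods L_i into a per-reading confidence: normalize exp(L_i) over the N-best list and sum the normalized weights of the candidates that contain the reading ID. The function name and data layout are assumptions.

```python
import math


def word_confidence(nbest):
    """Assumed posterior-style confidence per reading ID.

    nbest: list of (reading_id_sequence, log_likelihood) pairs.
    NOTE: this normalization is an assumption, not equation (1).
    """
    top = max(log_l for _, log_l in nbest)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(log_l - top) for _, log_l in nbest]
    total = sum(weights)
    conf = {}
    for (reading_ids, _), w in zip(nbest, weights):
        for rid in set(reading_ids):   # count each candidate once per ID
            conf[rid] = conf.get(rid, 0.0) + w / total
    return conf


# Two equally likely candidates that share the reading ID "a".
conf = word_confidence([(("a", "b"), -1.0), (("a", "c"), -1.0)])
```

Under this assumed scheme a reading that appears in every candidate gets confidence 1, and a reading that appears only in low-likelihood candidates gets a value near 0.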

The language understanding unit 54 first calculates a word reliability for similar words, that is, words whose readings resemble those of the recognition result candidates output by the speech recognition unit 52 and are therefore easily misrecognized, so that they can be handled as recognition result candidates in the same way as the words that the speech recognition unit 52 actually produced. The procedure for setting the word reliability of similar words is described in detail later.

The language understanding unit 54 then classifies the words included in the recognition result candidates, similar words included, into categories, which are semantic groupings, and calculates a category score by summing the word reliabilities of the words that belong to the same category.
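The category score is simply a per-category sum of word reliabilities, which can be sketched as below. The function name, the category labels, and the reliability values are hypothetical.

```python
from collections import defaultdict


def category_scores(confidence, category_of):
    """Sum the word reliabilities of all words in the same category.

    confidence:  word -> word reliability (similar words included)
    category_of: word -> category name
    """
    scores = defaultdict(float)
    for word, conf in confidence.items():
        scores[category_of[word]] += conf
    return dict(scores)


# Hypothetical reliabilities after similar words have been added.
confidence = {"Kanagawa Prefecture": 0.7, "Tokyo": 0.2, "Yokohama Station": 0.6}
category_of = {
    "Kanagawa Prefecture": "prefecture",
    "Tokyo": "prefecture",
    "Yokohama Station": "facility name",
}
scores = category_scores(confidence, category_of)
```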

If the main task of the navigation device is setting a destination for route search, the kinds of speech uttered by the user suggest categories such as a "prefecture category", a "municipality category", a "route name category", and a "facility name category". For example, prefecture names such as "Tokyo" and "Kanagawa Prefecture" are classified into the "prefecture category", and destination names that are the final goal of destination setting, such as "Yokohama Station" and "Yokohama Aoba Interchange", are classified into the "facility name category".

From each category whose category score exceeds a predetermined threshold, the language understanding unit 54 selects one word at a time to be output as part of the understanding result and examines every combination. It keeps only the semantically consistent combinations as understanding result candidates, sums the reliabilities of the words that make up each candidate, and applies a correction that depends on the number of words in the candidate to obtain the understanding result score.

The language understanding unit 54 outputs the understanding result candidate with the highest understanding result score to the response generation unit 55 as the understanding result.
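The combination search can be sketched as follows. The text does not specify the word-count correction or the consistency check, so both are passed in as functions; the default correction (dividing by the number of words) and the toy consistency table are assumptions, as are all names and values.

```python
from itertools import product


def best_understanding(words_by_category, confidence, category_score,
                       threshold, is_consistent,
                       correct=lambda total, n: total / n):
    """Pick the consistent combination with the highest corrected score."""
    pools = [words_by_category[c]
             for c, s in category_score.items() if s > threshold]
    best, best_score = None, float("-inf")
    if not pools:
        return best, best_score
    for combo in product(*pools):       # one word from each category
        if not is_consistent(combo):
            continue
        score = correct(sum(confidence[w] for w in combo), len(combo))
        if score > best_score:
            best, best_score = combo, score
    return best, best_score


# Hypothetical data: which prefecture each facility lies in.
LOCATED_IN = {"Yokohama Station": "Kanagawa Prefecture",
              "Tokyo Station": "Tokyo"}
confidence = {"Kanagawa Prefecture": 0.7, "Tokyo": 0.3,
              "Yokohama Station": 0.6, "Tokyo Station": 0.4}
words_by_category = {"prefecture": ["Kanagawa Prefecture", "Tokyo"],
                     "facility name": ["Yokohama Station", "Tokyo Station"]}
category_score = {"prefecture": 1.0, "facility name": 1.0}

best, score = best_understanding(
    words_by_category, confidence, category_score, threshold=0.5,
    is_consistent=lambda c: LOCATED_IN[c[1]] == c[0])
```

Exhaustive enumeration is practical here because each pool holds only the handful of words that survived recognition and similar-word expansion.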

The response generation unit 55 generates a response sentence based on the understanding result output from the language understanding unit 54 and outputs it to the GUI display control unit 56 and the speech synthesis unit 57.

As needed, the GUI display control unit 56 controls the disk reading device 41 to read the map data stored on the disk 40 and displays a map on the monitor 60, or displays on the monitor 60 response content that matches the response sentence generated by the response generation unit 55.

The speech synthesis unit 57 synthesizes a digital speech signal that matches the response sentence generated by the response generation unit 55 and outputs it to the speaker 70 via a D/A converter and an output amplifier (not shown) included in the speech synthesis unit 57.

Next, the processing performed by the control device 50 from the start of the speech recognition process until a response sentence is output will be described with reference to the flowchart of FIG. 5. In this flowchart, the main task of the navigation device is assumed to be setting a destination for route search.

First, in step S1, when the navigation device is started, the control device 50 controls the disk reading device 41 to read the speech recognition dictionary and grammar from the disk 40 and store them in the storage area 31 of the memory 30.

When the user then presses the switch 10, the input control unit 51 instructs the start of speech recognition and the speech recognition unit 52 becomes ready to recognize speech. The speech recognition unit 52 then starts capturing the speech signal that is uttered by the user, input via the microphone 20, and digitized by the A/D converter (not shown).

Until the switch 10 is pressed, the speech recognition unit 52 continuously calculates the average power of the digitized speech signal (hereinafter also simply called the digital signal). After the switch 10 is pressed, when the instantaneous power of the digital signal exceeds this average power by a predetermined value or more, the unit judges that the user has spoken and starts capturing the digitized speech signal.
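The utterance-start test can be sketched as a comparison between the background average power and the power of each incoming frame. Treating "instantaneous power" as the mean power of a short frame is an assumption, as are the function names, the frame layout, and the margin value.

```python
def mean_power(samples):
    """Mean squared amplitude of a block of samples."""
    return sum(x * x for x in samples) / len(samples)


def find_speech_start(background, frames, margin):
    """Index of the first frame judged to contain speech, or None.

    background: samples observed before the switch was pressed
    frames:     list of sample blocks received after the press
    margin:     predetermined value by which the frame power must
                exceed the background average
    """
    average = mean_power(background)
    for index, frame in enumerate(frames):
        if mean_power(frame) - average >= margin:
            return index
    return None


# Hypothetical samples: quiet background, then a loud second frame.
background = [0.1, -0.1, 0.1, -0.1]
frames = [[0.1, 0.1], [0.9, -0.8], [0.7, 0.6]]
start = find_speech_start(background, frames, margin=0.2)
```

Measuring the background before the button press gives the detector a noise floor for the current cabin conditions, so the same margin can be used on a quiet street and on a highway.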

In step S2, the speech recognition unit 52 executes the speech recognition process by comparing the captured digitized speech signal with the standby sentences held in the speech recognition dictionary and grammar built in the storage area 31 of the memory 30 and calculating the acoustic likelihood.

The speech recognition unit 52 outputs the top N recognition result candidates with the highest acoustic likelihood, together with their likelihoods, to the word reliability calculation unit 53 as the speech recognition result. Each recognition result candidate is expressed as a word string of reading IDs, the identification codes assigned to each reading of each word in the reading database of FIG. 4 stored on the disk 40, and the candidates are arranged in descending order of likelihood.

In step S3, the word reliability calculation unit 53 calculates, from the recognition result candidates and the likelihood of each candidate, the word reliability of every word included in the recognition result candidates, that is, of every reading ID.

In step S4, the language understanding unit 54 uses the reading IDs to retrieve, from the proper noun database and the general database stored on the disk 40, the detailed information of every word in the recognition result candidates for which the word reliability calculation unit 53 has calculated a word reliability.

また、言語理解部５４は、ディスク４０に格納された読みデータベースの読みＩＤ毎に設定された類似単語読みリストから、単語信頼度を算出した単語の読みと、誤認識されやすい読み方をする類似単語の読みＩＤと、その誤認識されやすい程度を示した類似度とを取り出す。 The language understanding unit 54 also extracts, from the similar word reading list set for each reading ID in the reading database stored on the disk 40, the reading IDs of similar words whose readings are easily confused with the reading of each word for which a word reliability was calculated, together with the similarity that indicates how easily each is misrecognized.

ステップＳ５において、言語理解部５４は、ディスク４０の読みデータベースから取り出した類似単語の読みＩＤと類似度とを用いて、類似単語に対する単語信頼度を算出し、各類似単語に設定をする。これにより、言語理解部５４は、類似単語を、音声認識部５２による音声認識処理の結果、認識結果候補とされた単語と同じように、認識結果候補として扱えるようにする。 In step S5, the language understanding unit 54 calculates the word reliability for the similar word using the reading ID and the similarity of the similar word extracted from the reading database of the disk 40, and sets each similar word. As a result, the language understanding unit 54 can handle similar words as recognition result candidates in the same manner as words that are recognized as recognition result candidates as a result of the speech recognition processing by the speech recognition unit 52.

なお、言語理解部５４による類似単語への単語信頼度の設定手順については後で詳細に説明をする。 The procedure for setting the word reliability for similar words by the language understanding unit 54 will be described in detail later.

ステップＳ６において、言語理解部５４は、この時点までの対話の中で発話された可能性のある全ての単語の単語信頼度を修正する。 In step S6, the language understanding unit 54 corrects the word reliability of all the words that may have been uttered in the dialogue up to this point.

単語信頼度の修正は、認識結果候補中の他の単語との意味上の上下関係の有無や整合性などに応じて、ステップＳ３で算出した単語信頼度を上下させることで実行される。 The correction raises or lowers the word reliability calculated in step S3 according to, for example, whether a semantic hierarchical relationship with other words in the recognition result candidates exists and whether the words are consistent with one another.

例えば、第１の発話の認識結果候補中に「東京駅（とうきょうえき）」という単語があり、第２の発話の認識結果候補中に「東京都（とうきょうと）」がある場合、「東京都」と「東京駅」の間には上下関係が成り立つため、お互いの単語信頼度が強められ、単語信頼度が上がる方向で修正される。また、第１の発話の認識結果候補中に「東京駅（とうきょうえき）」ではなく、「京都駅（きょうとえき）」があった場合は、「東京都」と「京都駅」の間には上下関係が成り立たないため、お互いの単語信頼度が弱められ、単語信頼度が下がる方向で修正される。 For example, if the recognition result candidates of a first utterance contain the word “Tokyo Station (Tokyo-eki)” and those of a second utterance contain “Tokyo Metropolis (Tokyo-to)”, a hierarchical relationship holds between “Tokyo Metropolis” and “Tokyo Station”, so the two words reinforce each other and their word reliabilities are corrected upward. If the first utterance's candidates instead contain “Kyoto Station (Kyoto-eki)”, no hierarchical relationship holds between “Tokyo Metropolis” and “Kyoto Station”, so the two words weaken each other and their word reliabilities are corrected downward.

ステップＳ７において、言語理解部５４は、類似単語も含めた認識結果候補に含まれる単語を意味上のまとまりであるカテゴリに分類し、同一カテゴリ内に属する単語の単語信頼度を足し合わることでカテゴリスコアを算出する。 In step S7, the language understanding unit 54 classifies words included in recognition result candidates including similar words into categories that are semantically grouped, and adds the word reliability of words belonging to the same category. A category score is calculated.

ステップＳ８において、言語理解部５４は、算出したカテゴリスコアが所定の閾値を超えているカテゴリを選択する。 In step S8, the language understanding unit 54 selects a category for which the calculated category score exceeds a predetermined threshold.
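
Steps S7 and S8 can be sketched as follows. This is an illustrative sketch only: the category names, the word-to-category mapping, and the threshold value are hypothetical and do not come from the description.

```python
def category_scores(confidences, category_of):
    """Step S7: sum the word reliabilities of the words in each category."""
    scores = {}
    for word, confidence in confidences.items():
        category = category_of[word]
        scores[category] = scores.get(category, 0.0) + confidence
    return scores


def select_categories(scores, threshold):
    """Step S8: keep the categories whose score exceeds the threshold."""
    return [category for category, score in scores.items() if score > threshold]


# Hypothetical word reliabilities (similar words included) and categories.
confidences = {"okutama": 1.0, "oppama": 0.5, "eki": 0.4}
category_of = {"okutama": "place_name", "oppama": "place_name", "eki": "facility_type"}

scores = category_scores(confidences, category_of)
selected = select_categories(scores, threshold=0.5)
```

With these assumed values the place-name category accumulates the reliabilities of both "okutama" and "oppama", so a similar word added in step S5 contributes to category selection in the same way as a recognized word.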

ステップＳ９において、言語理解部５４は、ステップＳ８において選択された各カテゴリから、理解結果となる単語を一つずつ選択して全ての組み合わせを検証し、理解結果の候補として出力する意味的に整合性のとれる組み合わせを探す。 In step S9, the language understanding unit 54 selects one word from each category selected in step S8, examines all the resulting combinations, and searches for semantically consistent combinations to output as understanding result candidates.

ステップＳ１０において、言語理解部５４は、理解結果候補となる意味的に整合性の取れる組み合わせが一つ以上見つかったかどうかを判定する。言語理解部５４は、意味的に整合性の取れる組み合わせが一つ以上見つかった場合は、ステップＳ１１へと進め、一つも見つからなかった場合は、ステップＳ８へと戻り、再度カテゴリ選択をやり直す。 In step S 10, the language understanding unit 54 determines whether one or more combinations that are semantically consistent as candidate understanding results have been found. If one or more combinations that can be semantically consistent are found, the language understanding unit 54 proceeds to step S11. If no combination is found, the language understanding unit 54 returns to step S8 and performs category selection again.

ステップＳ１１において、言語理解部５４は、各カテゴリから選択された単語の意味的に整合性の取れる組み合わせである各理解結果候補を構成する単語の単語信頼度を足し合わせ、足し合わせた結果に理解結果候補を構成する単語数に応じた補正をして理解結果スコアを算出する。そして、言語理解部５４は理解結果スコアが最大となる理解結果候補を、最良の理解結果として選択し応答生成部５５に出力する。 In step S11, the language understanding unit 54 adds the word reliability of the words constituting each understanding result candidate, which is a combination that can be semantically consistent with the words selected from each category, and understands the result of the addition. An understanding result score is calculated by performing correction according to the number of words constituting the result candidate. Then, the language understanding unit 54 selects an understanding result candidate having the maximum understanding result score as the best understanding result, and outputs it to the response generation unit 55.
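
Steps S9 through S11 can be sketched as an exhaustive search over one-word-per-category combinations. The consistency check and the word-count correction are not specified in this section, so both are passed in as hypothetical callables; the averaging correction used in the example is an assumption chosen only to make the sketch concrete.

```python
from itertools import product


def best_understanding(candidates_by_category, confidence, is_consistent, correction):
    """Pick the consistent combination with the highest corrected score.

    is_consistent(combo) -> bool and correction(total, word_count) -> float
    are placeholders for rules the description leaves to the cited publication.
    """
    best, best_score = None, float("-inf")
    for combo in product(*candidates_by_category.values()):  # step S9
        if not is_consistent(combo):                          # step S10
            continue
        total = sum(confidence[word] for word in combo)      # step S11
        score = correction(total, len(combo))
        if score > best_score:
            best, best_score = combo, score
    return best, best_score


# Hypothetical inputs.
candidates = {"place_name": ["okutama", "oppama"], "facility_type": ["eki"]}
confidence = {"okutama": 1.0, "oppama": 0.5, "eki": 0.4}

result, score = best_understanding(
    candidates,
    confidence,
    is_consistent=lambda combo: True,             # assume every pairing is valid
    correction=lambda total, count: total / count,  # assumed: average per word
)
```

If no combination passes the consistency check the function returns `None` for the combination, which corresponds to the branch in step S10 that goes back to category selection.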

なお、言語理解部５４によるステップＳ６から、ステップＳ１１までの処理内容は、特開２００４−２５１９９８号公報で開示されている。 The processing performed by the language understanding unit 54 in steps S6 through S11 is disclosed in Japanese Unexamined Patent Application Publication No. 2004-251998.

ステップＳ１２において、応答生成部５５は、言語理解部５４で生成された理解結果に基づいて応答表示内容及び応答文を生成し、それぞれＧＵＩ表示制御部５６、音声合成部５７に出力する。 In step S 12, the response generation unit 55 generates a response display content and a response sentence based on the understanding result generated by the language understanding unit 54, and outputs the response display content and the response sentence to the GUI display control unit 56 and the speech synthesis unit 57, respectively.

応答生成部５５は、理解結果に応じて、例えば、目的地設定のために必要な情報が不足していれば不足する情報の入力を促す応答文を生成し、選択された理解結果を構成する単語の単語信頼度が低く、確認が必要と判断される場合には、理解内容の確認のための応答文を生成する。また、応答生成部５５は、目的地が確定した際には、目的地までの地図を検索し、表示させる旨を伝える応答文を生成する。 In response to the understanding result, the response generation unit 55 generates a response sentence that prompts the user to input insufficient information if the information necessary for destination setting is insufficient, and configures the selected understanding result. If the word reliability of the word is low and it is determined that confirmation is necessary, a response sentence for confirming the understanding content is generated. In addition, when the destination is determined, the response generation unit 55 searches for a map up to the destination and generates a response sentence informing that the map is to be displayed.

ステップＳ１３において、音声合成部５７は、応答生成部５５によって生成される応答文に応じて、応答文に即したデジタル音声信号を合成し、当該音声合成部５７が備える図示しないＤ／Ａコンバータ、出力増幅器を介してスピーカ７０に出力する。 In step S13, the speech synthesizer 57 synthesizes a digital speech signal corresponding to the response sentence according to the response sentence generated by the response generator 55, and a D / A converter (not shown) provided in the speech synthesizer 57, It outputs to the speaker 70 via an output amplifier.

ステップＳ１４において、ＧＵＩ表示制御部５６は、応答生成部５５によって生成される応答表示内容をモニタ６０上に表示するとともに、地図表示が必要であればディスク読み取り装置４１を使ってディスク４０から地図データを読み出し、モニタ６０に地図を表示させて一連の入力処理を終える。 In step S14, the GUI display control unit 56 displays the response display content generated by the response generation unit 55 on the monitor 60 and, if a map display is required, reads map data from the disk 40 using the disk reading device 41 and displays the map on the monitor 60, which completes the series of input processes.

（類似単語Ｗｎの単語信頼度設定手順：単語Ｗの単語信頼度から求める場合） (Word reliability setting procedure for similar word Wn: when derived from the word reliability of word W)
次に、図６に示すフローチャートを用いて、図５に示したフローチャートのステップＳ５における類似単語に単語信頼度を設定する際の手順について説明をする。 Next, the procedure for setting the word reliability for the similar word in step S5 of the flowchart shown in FIG. 5 will be described using the flowchart shown in FIG. 6.

類似単語の単語信頼度を設定する手順について説明する前に、図５に示したフローチャートのステップＳ２において、音声認識部５２が音声認識処理を行った結果である認識結果候補リストＬｒ１の一例を図７に示す。図７に示すように、認識結果候補リストＬｒ１は、認識結果候補順位毎に、認識結果とその尤度とが記述されている。認識結果には、読みＩＤと単語の読み方とが示されている。 Before the procedure for setting the word reliability of similar words is described, FIG. 7 shows an example of the recognition result candidate list Lr1 produced by the speech recognition processing of the speech recognition unit 52 in step S2 of the flowchart shown in FIG. 5. As shown in FIG. 7, the recognition result candidate list Lr1 gives a recognition result and its likelihood for each candidate rank. Each recognition result shows the reading IDs and the word readings.

図７に示す例では、認識結果候補の第１位として、尤度が６０の認識結果「１１０４１（おくたま）」が選択されており、認識結果候補の第２位として、尤度が４０の認識結果「１１０４１（おくたま）＋１７０１１（えき）」が選択されている。 In the example illustrated in FIG. 7, the recognition result “11041 (Okutama)” with a likelihood of 60 is selected as the first recognition result candidate, and the likelihood is 40 as the second recognition result candidate. The recognition result “11041 (Okutama) +17011 (Eki)” is selected.

これに対して、図５に示したフローチャートのステップＳ３において、単語信頼度演算部５３が単語信頼度を計算した結果である単語信頼度リストＬｃの一例を図８に示す。図８に示すように、単語信頼度リストＬｃには、認識結果候補リストＬｒ１に記述された認識結果候補である単語の読みＩＤと単語の読み方と、計算された単語信頼度とが記述されている。 FIG. 8 shows an example of the word reliability list Lc produced when the word reliability calculation unit 53 calculates the word reliabilities in step S3 of the flowchart shown in FIG. 5. As shown in FIG. 8, the word reliability list Lc gives, for each word in the recognition result candidates of the list Lr1, the reading ID, the word reading, and the calculated word reliability.

図８に示す例では、認識結果候補である「１１０４１（おくたま）」の信頼度が１．０、「１７０１１（えき）」の信頼度が０．４となっている。 In the example shown in FIG. 8, the reliability of the recognition result candidate “11041 (Okutama)” is 1.0, and the reliability of “17011 (Eki)” is 0.4.
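
The relationship between the FIG. 7 likelihoods and the FIG. 8 word reliabilities can be illustrated with a small sketch. The formula actually used in step S3 is not given in this section; the normalization below (the share of total likelihood held by candidates containing the word) is an assumption chosen because it is consistent with the quoted example values.

```python
from collections import defaultdict


def word_confidences(nbest):
    """Assumed rule: a word's reliability is the sum of the likelihoods of the
    candidates that contain it, divided by the sum over all candidates."""
    total = sum(likelihood for _, likelihood in nbest)
    confidences = defaultdict(float)
    for reading_ids, likelihood in nbest:
        for reading_id in set(reading_ids):
            confidences[reading_id] += likelihood / total
    return dict(confidences)


# Recognition result candidate list Lr1 as described for FIG. 7:
# (reading IDs of the candidate, likelihood).
lr1 = [
    ((11041,), 60.0),        # "おくたま"
    ((11041, 17011), 40.0),  # "おくたま" + "えき"
]
lc = word_confidences(lr1)
```

Under this assumption the word appearing in both candidates receives the full normalized likelihood, while a word appearing only in the second candidate receives that candidate's share.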

図６に示すフローチャートでは、図５に示すフローチャートのステップＳ２において、図７に示すような認識結果候補リストＬｒ１が得られ、図８に示すような単語信頼度リストＬｃが得られたとして、ステップＳ５における類似単語の単語信頼度を設定する手順について説明をする。 The flowchart shown in FIG. 6 is described on the assumption that the recognition result candidate list Lr1 shown in FIG. 7 was obtained in step S2 of the flowchart shown in FIG. 5 and that the word reliability list Lc shown in FIG. 8 was obtained from it; on that basis, the procedure for setting the word reliability of similar words in step S5 is described.

まず、ステップＳ２１において、言語理解部５４は、単語信頼度演算部５３から出力された単語信頼度リストＬｃから、認識結果候補である単語を一つ取り出す。 First, in step S 21, the language understanding unit 54 extracts one word that is a recognition result candidate from the word reliability list Lc output from the word reliability calculation unit 53.

例えば、本ステップで取り出した単語が、図８に示す単語信頼度リストＬｃの「１１０４１（おくたま）」であったとし、これを単語Ｗとする。 For example, assume that the word extracted in this step is “11041 (Okutama)” from the word reliability list Lc shown in FIG. 8, and call it the word W.

ステップＳ２２において、言語理解部５４は、ディスク４０に格納されている読みデータベースを参照し、ステップＳ２１で取り出した認識結果候補である単語（単語Ｗ）の類似単語の読みＩＤと、類似度とをリストにした類似単語読みリストＬｗを取り出す。 In step S22, the language understanding unit 54 refers to the reading database stored in the disk 40, and obtains the reading ID and similarity of the similar word of the word (word W) that is the recognition result candidate extracted in step S21. A similar word reading list Lw is extracted.

ステップＳ２３において、言語理解部５４は、取り出した類似単語読みリストＬｗから、類似単語Ｗｎを一つ取り出す。 In step S23, the language understanding unit 54 extracts one similar word Wn from the extracted similar word reading list Lw.

図４に示す読みデータベースには、単語Ｗである「１１０４１（おくたま）」の類似単語リストとして、類似度が０．５の「１１０６１（おっぱま）」と、類似度が０．４の「１１０３１（たま）」が登録されているので、ここでは「１１０６１（おっぱま）」を類似単語Ｗｎとして取り出すことにする。 In the reading database shown in FIG. 4, the similar word list of the word W, “11041 (Okutama)”, contains “11061 (Oppama)” with a similarity of 0.5 and “11031 (Tama)” with a similarity of 0.4, so “11061 (Oppama)” is taken here as the similar word Wn.

ステップＳ２４において、言語理解部５４は、ステップＳ２３で取り出した類似単語Ｗｎが単語信頼度リストＬｃに登録されているかどうかを判断する。ステップＳ２３において、「１１０６１（おっぱま）」を類似単語Ｗｎとして取り出した場合、言語理解部５４は、単語信頼度リストＬｃを参照して、読みＩＤが１１０６１の単語が既に登録されているかどうかを判断する。 In step S24, the language understanding unit 54 determines whether the similar word Wn extracted in step S23 is registered in the word reliability list Lc. In step S23, when “11061 (Oppama)” is extracted as the similar word Wn, the language understanding unit 54 refers to the word reliability list Lc to determine whether or not the word having the reading ID 11061 is already registered. to decide.

言語理解部５４は、類似単語Ｗｎが単語信頼度リストＬｃに既に登録されている場合は、ステップＳ２２へと戻り、次の類似単語Ｗｎに関する処理を実行する。また、言語理解部５４は、類似単語Ｗｎが単語信頼度リストＬｃに登録されていない場合は、ステップＳ２５へと進める。 If the similar word Wn is already registered in the word reliability list Lc, the language understanding unit 54 returns to step S22 and processes the next similar word Wn. If the similar word Wn is not registered in the word reliability list Lc, the language understanding unit 54 proceeds to step S25.

このステップＳ２４において、類似単語Ｗｎが、単語信頼度リストＬｃに存在するかどうかを調べる目的は、認識結果候補リストＬｒ１の中に現れなかったけれど、認識結果中に現れた単語に誤認識されやすい単語の可能性を調べることにあるため、認識結果候補リストＬｒ１から得られた単語信頼度リストＬｃに既に登録されていれば、この類似単語Ｗｎに関しては調べる必要がない。そのため、単語信頼度リストＬｃの中に同じ読みＩＤの単語が登録されていれば、次の類似単語Ｗｎを調べる。 The procedure aims to examine words that did not appear in the recognition result candidate list Lr1 but are easily misrecognized as words that did appear in the recognition results. Step S24 therefore checks whether the similar word Wn is in the word reliability list Lc: a similar word Wn that is already registered in the list Lc, which was obtained from the list Lr1, needs no further examination. Accordingly, if a word with the same reading ID is registered in the word reliability list Lc, the next similar word Wn is examined.

ステップＳ２５において、言語理解部５４は、ステップＳ２３で取り出した類似単語Ｗｎが、単語リストＬｃに登録されていないことに応じて、今度は、この類似単語Ｗｎが、他の単語の類似単語Ｗｎとして既に類似単語ＷｎリストＬｎに登録されているかどうかを判断する。 In step S25, because the similar word Wn extracted in step S23 is not registered in the word reliability list Lc, the language understanding unit 54 next determines whether this similar word Wn has already been registered in the similar word list Ln as a similar word of another word.

言語理解部５４は、類似単語ＷｎリストＬｎに類似単語Ｗｎが登録されていない場合、ステップＳ２６へと進み、登録されている場合、ステップＳ２９へと進む。 If the similar word Wn is not registered in the similar word list Ln, the language understanding unit 54 proceeds to step S26; if it is registered, the unit proceeds to step S29.

ステップＳ２６において、言語理解部５４は、ステップＳ２３で取り出した類似単語Ｗｎが、類似単語リストＬｎに登録されていないことに応じて、ディスク４０の固有名詞データベース、一般データベースを参照し、この類似単語Ｗｎの詳細情報を取り出す。 In step S26, the language understanding unit 54 refers to the proper noun database and the general database on the disc 40 in response to the similarity word Wn extracted in step S23 being not registered in the similarity word list Ln, and this similarity word Detailed information of Wn is extracted.

ステップＳ２７において、言語理解部５４は、ディスク４０のデータベースから類似単語Ｗｎの詳細情報を取り出した後、この類似単語Ｗｎに対して単語信頼度を設定する。類似単語Ｗｎとして、読みＩＤが１１０６１の単語である「おっぱま」が選択されている場合、言語理解部５４は、この「おっぱま」に対して単語信頼度を設定することになる。 In step S27, the language understanding unit 54 retrieves the detailed information of the similar word Wn from the database of the disk 40, and then sets the word reliability for the similar word Wn. When “Oppama”, which is a word having a reading ID of 11061, is selected as the similar word Wn, the language understanding unit 54 sets the word reliability for this “Oppama”.

類似単語Ｗｎの単語信頼度は、図５に示すフローチャートのステップＳ３において求められた単語Ｗの単語信頼度を用い、さらに単語Ｗと類似単語Ｗｎの類似度をβとすると、以下に示す（２）式のように表すことができる。 The word reliability of the similar word Wn is expressed as follows when the word reliability of the word W obtained in step S3 of the flowchart shown in FIG. 5 is used and the similarity between the word W and the similar word Wn is β (2) ) Can be expressed as:

類似単語Ｗｎの単語信頼度＝単語Ｗの単語信頼度 × β ・・・（２） Word reliability of similar word Wn = word reliability of word W × β (2)

単語Ｗとして、読みＩＤが１１０４１の単語である「おくたま」が選択され、この単語Ｗと読み方が類似する類似単語Ｗｎとして、読みＩＤが１１０６１の「おっぱま」が選択され、単語信頼度演算部５３によって「おくたま」の単語信頼度が１．０と計算されたとする。このときの、類似単語Ｗｎである「おっぱま」の単語信頼度は、「おくたま」と「おっぱま」との類似度が、図４に示す読みデータベースより０．５であることから、（２）式を用いて、以下のように算出することができる。 “Okutama”, which is a word having a reading ID of 11041, is selected as the word W, and “Oppama” having a reading ID of 11061 is selected as a similar word Wn whose reading is similar to the word W, and a word reliability calculation is performed. It is assumed that the word reliability of “Okutama” is calculated as 1.0 by the unit 53. At this time, the word reliability of “Oppama”, which is the similar word Wn, is 0.5 because the similarity between “Okutama” and “Oppama” is 0.5 from the reading database shown in FIG. 2) Using the equation, it can be calculated as follows.

「おっぱま」の単語信頼度＝「おくたま」の単語信頼度（１．０）×「おくたま」と「おっぱま」の類似度（０．５）＝０．５ Word reliability of “Oppama” = Word reliability of “Okutama” (1.0) × Similarity between “Okutama” and “Oppama” (0.5) = 0.5

ステップＳ２８のおいて、言語理解部５４は、単語信頼度を算出した類似単語Ｗｎの読みＩＤを、ディスク４０から取得した詳細情報と単語信頼度と共に類似単語リストＬｎに追加する。 In step S28, the language understanding unit 54 adds the reading ID of the similar word Wn for which the word reliability has been calculated to the similar word list Ln together with the detailed information acquired from the disk 40 and the word reliability.

ステップＳ２９において、言語理解部５４は、類似単語リストＬｎに、既に他の単語の類似単語として、類似単語Ｗｎが登録されていたことに応じて、類似単語リストＬｎの中の類似単語Ｗｎの単語信頼度を更新する。 In step S29, the language understanding unit 54 determines whether the similar word Wn in the similar word list Ln is already registered as a similar word of another word in the similar word list Ln. Update confidence.

言語理解部５４は、類似単語リストＬｎに既に登録されている類似単語Ｗｎの更新前単語信頼度を、図５に示すフローチャートのステップＳ３において求められた単語Ｗの単語信頼度、さらに単語Ｗと類似単語Ｗｎの類似度βを用いて、以下に示す（３）式により更新することができる。 The language understanding unit 54 determines the word reliability before update of the similar word Wn already registered in the similar word list Ln, the word reliability of the word W obtained in step S3 of the flowchart shown in FIG. Using the similarity β of the similar word Wn, it can be updated by the following equation (3).

Ｗｎの更新後単語信頼度＝Ｗｎの更新前単語信頼度＋Ｗの単語信頼度 × β ・・・（３） Word reliability after update of Wn = Word reliability before update of Wn + Word reliability of W × β (3)

右辺におけるＷｎの更新前単語信頼度とは、類似単語リストＬｎに登録されている類似単語のもつ単語信頼度で、βは単語Ｗと類似単語Ｗｎの類似度である。 The pre-update word reliability of Wn on the right side is the word reliability of similar words registered in the similar word list Ln, and β is the similarity between the word W and the similar word Wn.

（３）式に示すように、類似単語Ｗｎが、既に他の単語の類似単語Ｗｎとして類似単語リストに登録されている場合、その単語信頼度に今回の類似度の分を加算することで、新たな単語信頼度、つまり更新後の単語信頼度を計算することができる。 As shown in the equation (3), when the similar word Wn is already registered in the similar word list as the similar word Wn of another word, by adding the current degree of similarity to the word reliability, A new word reliability, that is, an updated word reliability can be calculated.
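
Equations (2) and (3) translate directly into two small functions. The function and variable names are illustrative. The first example uses the values quoted in the text; the second word and its similarity in the update example are hypothetical.

```python
def initial_similar_confidence(conf_w: float, beta: float) -> float:
    """Equation (2): reliability of Wn when it is first added to list Ln."""
    return conf_w * beta


def updated_similar_confidence(previous: float, conf_w: float, beta: float) -> float:
    """Equation (3): reliability of Wn when it is already in list Ln."""
    return previous + conf_w * beta


# Values from the text: reliability of "おくたま" is 1.0, similarity to "おっぱま" is 0.5.
oppama = initial_similar_confidence(conf_w=1.0, beta=0.5)

# Hypothetical follow-up: another recognized word with reliability 0.4 also lists
# "おっぱま" as a similar word, with an assumed similarity of 0.3.
oppama_updated = updated_similar_confidence(oppama, conf_w=0.4, beta=0.3)
```

Because equation (3) accumulates, a similar word that is close to several recognized words ends up with a higher reliability than one that is close to only a single word.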

ステップＳ３０において、言語理解部５４は、類似単語Ｗｎに対して単語信頼度を設定した後、ステップＳ２２で取得された単語Ｗの類似単語読みリストＬｗを参照し、単語Ｗに対する全ての類似単語について単語信頼度を設定したかどうかを判定する。 In step S30, the language understanding unit 54 sets the word reliability for the similar word Wn, then refers to the similar word reading list Lw of the word W acquired in step S22, and all the similar words for the word W are determined. Determine whether word confidence is set.

言語理解部５４は、全ての類似単語Ｗｎの単語信頼度を設定した場合は、ステップＳ３１へと進み、まだ単語信頼度が設定されていない類似単語Ｗｎがある場合には、ステップＳ２２へと戻り、類似単語Ｗｎに対する単語信頼度の計算又は単語信頼度の更新処理を実行する。 The language understanding unit 54 proceeds to step S31 when the word reliability of all the similar words Wn is set, and returns to step S22 when there is a similar word Wn for which the word reliability is not yet set. The word reliability calculation or the word reliability update process for the similar word Wn is executed.

ステップＳ３１において、言語理解部５４は、単語信頼度リストＬｃに登録されている全ての単語Ｗに対する類似単語Ｗｎについて単語信頼度を設定したかどうかを判定する。 In step S31, the language understanding unit 54 determines whether or not the word reliability is set for the similar words Wn for all the words W registered in the word reliability list Lc.

言語理解部５４は、全ての単語Ｗに対する類似単語Ｗｎについて単語信頼度を設定した場合は、ステップＳ３２へと進む。また、言語理解部５４は、まだ類似単語Ｗｎに対する処理がなされていない単語Ｗが存在する場合には、ステップＳ２１へと戻り、ステップＳ２１〜ステップＳ３１までを繰り返す。 If the word reliability is set for the similar word Wn for all the words W, the language understanding unit 54 proceeds to step S32. If there is a word W that has not yet been processed for the similar word Wn, the language understanding unit 54 returns to step S21 and repeats steps S21 to S31.

ステップＳ３２において、言語理解部５４は、単語信頼度リストＬｃに登録されている全ての単語Ｗに対する類似単語Ｗｎについて単語信頼度の設定が終了すると、類似単語リストＬｎの中身を単語信頼度リストＬｃに追加する。 In step S32, the language understanding unit 54 completes the setting of the word reliability for the similar words Wn for all the words W registered in the word reliability list Lc, and the contents of the similar word list Ln are stored in the word reliability list Lc. Add to

図９に、図８で一例として示した単語信頼度リストＬｃに登録されている全ての単語Ｗに対応した類似単語の類似単語リストＬｎを示す。 FIG. 9 shows a similar word list Ln of similar words corresponding to all the words W registered in the word reliability list Lc shown as an example in FIG.

図８に示した単語信頼度リストＬｃにおいて、読みＩＤが１１０４１の単語である「おくたま」には、図４に示す読みデータベースから分かるように、読みＩＤが１１０６１の単語である「おっぱま」と、読みＩＤが１１０３１の単語である「たま」との２つの類似単語が存在している。 In the word reliability list Lc shown in FIG. 8, the word “Okutama” with reading ID 11041 has two similar words, as can be seen from the reading database shown in FIG. 4: “Oppama” with reading ID 11061 and “Tama” with reading ID 11031.

また、図８に示した単語信頼度リストＬｃにおいて、読みＩＤが１７０１１の単語である「えき」には、図４に示す読みデータベースから分かるように、読みＩＤが１１０５１の単語である「うえき」という一つの類似単語が存在している。 Likewise, in the word reliability list Lc shown in FIG. 8, the word “Eki” with reading ID 17011 has one similar word, as can be seen from the reading database shown in FIG. 4: “Ueki” with reading ID 11051.

これらは、図８に示す単語信頼度リストＬｃには登録されてない単語であるので、図９に示すように全て類似単語リストＬｎに追加されている。 Since these are words that are not registered in the word reliability list Lc shown in FIG. 8, they are all added to the similar word list Ln as shown in FIG.

図１０に、図９に示した類似単語リストＬｎの中身を、図８に示す単語信頼度リストＬｃに追加した例を示す。これにより、図５に示したフローチャートのステップＳ５よりも後の、ステップＳ６以降において、認識結果候補リストＬｒ１中の全ての単語Ｗに対する類似単語Ｗｎは、認識結果候補リストＬｒ１中の単語Ｗと同等に扱うことができるようになる。 FIG. 10 shows an example in which the contents of the similar word list Ln shown in FIG. 9 are added to the word reliability list Lc shown in FIG. Thus, in step S6 and subsequent steps after step S5 of the flowchart shown in FIG. 5, the similar word Wn for all the words W in the recognition result candidate list Lr1 is equivalent to the word W in the recognition result candidate list Lr1. Will be able to handle.
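
The whole loop of steps S21 through S32 can be sketched as follows. The dictionary layout and function name are assumptions, the lookup of detailed word information in step S26 is omitted, and the similarity between "えき" and "うえき" is a hypothetical value because this section does not state it.

```python
def expand_with_similar_words(lc, similar_db):
    """Return the word reliability list Lc extended with similar words.

    lc:         {reading_id: word reliability}
    similar_db: {reading_id: [(similar reading_id, similarity), ...]}
    """
    ln = {}                                              # similar word list Ln
    for w, conf_w in lc.items():                         # S21, S31
        for wn, beta in similar_db.get(w, []):           # S22, S23, S30
            if wn in lc:                                 # S24: already recognized
                continue
            if wn not in ln:                             # S25
                ln[wn] = conf_w * beta                   # S27, S28: equation (2)
            else:
                ln[wn] = ln[wn] + conf_w * beta          # S29: equation (3)
    merged = dict(lc)                                    # S32: append Ln to Lc
    merged.update(ln)
    return merged


lc = {11041: 1.0, 17011: 0.4}                 # "おくたま", "えき" (FIG. 8 values)
similar_db = {
    11041: [(11061, 0.5), (11031, 0.4)],      # "おっぱま", "たま" (values from the text)
    17011: [(11051, 0.5)],                    # "うえき"; similarity 0.5 is assumed
}
expanded = expand_with_similar_words(lc, similar_db)
```

The returned dictionary plays the role of the extended list in FIG. 10: recognized words keep their reliabilities, and each similar word carries a reliability scaled by its similarity.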

このように、音声認識結果として得られる単語Ｗだけではなく、音声認識部５２による音声認識処理では漏れてしまったが、単語Ｗとは誤認識されやすい読み方をする類似単語Ｗｎを最終的な認識結果候補として扱うことができる。したがって、ユーザによる正当な発話であるのにも関わらず、認識結果候補を選択する過程で排除されてしまった場合でも、理解結果として選択される可能性を残すことができる。 In this way, not only the words W obtained as the speech recognition result but also similar words Wn, which were missed by the speech recognition processing of the speech recognition unit 52 yet have readings easily misrecognized as a word W, can be treated as final recognition result candidates. Therefore, even when a word the user actually uttered has been excluded in the process of selecting recognition result candidates, it can still be selected as the understanding result.

また、認識対象語に、どの程度、誤認識されやすいかを数値化した類似度を用いて類似単語と、認識対象語の音響的な近さを示すことで、ユーザによって類似単語が発話された可能性を類似度に基づいて判断することができるため、正確な理解結果を生成することができる。 In addition, the similarity, which quantifies how easily a similar word is misrecognized as a recognition target word, expresses the acoustic closeness between the two. The possibility that the user uttered the similar word can therefore be judged from the similarity, so an accurate understanding result can be generated.

さらに、認識結果候補に含まれる単語の単語信頼度と、類似単語の単語信頼度をそれぞれ求めることで、同一の判断基準により各単語の発話可能性を判断することができるため、例えば、類似度が高い類似単語が認識結果候補に含まれていた場合に、その単語信頼度の高低に応じて発話された可能性を検証することができるため、より正確な理解結果を生成することができる。 Furthermore, since the word reliability of a word included in the recognition result candidate and the word reliability of a similar word are obtained, respectively, it is possible to determine the utterance possibility of each word based on the same determination criterion. When a similar word having a high is included in the recognition result candidate, the possibility of being uttered according to the level of word reliability can be verified, so that a more accurate understanding result can be generated.

（類似単語Ｗｎの単語信頼度設定手順：尤度から求める場合） (Word reliability setting procedure for similar word Wn: when derived from likelihood)
続いて、類似単語Ｗｎの単語信頼度を設定する際の別の手順について説明をする。上述した例では、認識結果候補リストＬｒ１中の単語Ｗとの類似度が高い類似単語Ｗｎの単語信頼度を、認識結果候補リストＬｒ１中の単語Ｗの単語信頼度から、（２）式又は（３）式を用いて算出していた。 Next, another procedure for setting the word reliability of the similar word Wn will be described. In the example above, the word reliability of a similar word Wn having a high similarity to a word W in the recognition result candidate list Lr1 was calculated from the word reliability of that word W using equation (2) or equation (3).

これに対し、ユーザによって発話されマイク２０を介して入力された音声に対して、再度音声認識処理を実行し、音響的な尤度を求め、この尤度から類似単語Ｗｎの単語信頼度を算出する手法について説明をする。 On the other hand, the speech recognition process is performed again on the speech uttered by the user and input via the microphone 20, the acoustic likelihood is obtained, and the word reliability of the similar word Wn is calculated from the likelihood. The method to do is explained.

図１１に示すフローチャートを用いて、図５に示したフローチャートのステップＳ５における類似単語Ｗｎの単語信頼度を設定する際の別な手順について説明をする。 With reference to the flowchart shown in FIG. 11, another procedure for setting the word reliability of the similar word Wn in step S5 of the flowchart shown in FIG. 5 will be described.

なお、図１１において、類似単語Ｗｎが、類似単語リストＬｎに登録されているかどうかを調べるステップＳ４１〜ステップＳ４５までのステップは、上述した図６に示すフローチャートにおけるステップＳ２１〜ステップＳ２５までと全く同じであるため説明を省略する。 In FIG. 11, steps S41 to S45 for checking whether or not the similar word Wn is registered in the similar word list Ln are exactly the same as steps S21 to S25 in the flowchart shown in FIG. Therefore, the description is omitted.

また、図１１に示したステップＳ４１〜ステップＳ５２までのステップの前段である図５に示すフローチャートのステップＳ１において、ユーザにより発話されマイク２０を介して入力されデジタル化された音声信号は、例えば、メモリ３０などに一時的にバッファリングされているものとする。 Further, in step S1 of the flowchart shown in FIG. 5 which is the preceding stage of steps S41 to S52 shown in FIG. 11, the voice signal uttered by the user and inputted through the microphone 20 is digitized, for example, It is assumed that the memory 30 is temporarily buffered.

ステップＳ４６において、言語理解部５４は、ステップＳ４３で取り出した類似単語Ｗｎが、類似単語リストＬｎに登録されていないことに応じて、ディスク４０の固有名詞データベース、一般データベースを参照し、この類似単語Ｗｎの詳細情報を取り出す。 In step S46, because the similar word Wn extracted in step S43 is not registered in the similar word list Ln, the language understanding unit 54 refers to the proper noun database and the general database on the disk 40 and retrieves the detailed information of this similar word Wn.

ステップＳ４７において、言語理解部５４は、この類似単語Ｗｎを類似単語リストＬｎに追加する。 In step S47, the language understanding unit 54 adds the similar word Wn to the similar word list Ln.

ステップＳ４８において、言語理解部５４は、ディスク読み取り装置４１を制御して、この類似単語リストＬｎに追加された類似単語Ｗｎを必ず認識することができる文法を、ディスク４０から読み取りメモリ３０の記憶領域３１に格納させる。 In step S48, the language understanding unit 54 reads the grammar that can recognize the similar word Wn added to the similar word list Ln from the disk 40 by controlling the disk reading device 41, and the storage area of the memory 30. 31 is stored.

これに応じて、音声認識部５２は、メモリ３０に一時的にバッファリングされているデジタル化された音声信号を読み出し、さらにメモリ３０の記憶領域３１に格納された類似単語Ｗｎを認識することができる文法と比較して、音響的な尤度を計算することで音声信号の認識処理を再度実行する。 In response to this, the voice recognition unit 52 reads the digitized voice signal temporarily buffered in the memory 30, and further recognizes the similar word Wn stored in the storage area 31 of the memory 30. The speech signal recognition process is executed again by calculating the acoustic likelihood in comparison with the possible grammar.

ステップＳ４９において、言語理解部５４は、音声認識部５２によりデジタル化された音声信号に対する２度目の音声認識処理によって認識結果が得られたことに応じて、認識結果を尤度と共に認識結果リストＬｒ２に追加する。 In step S49, the language understanding unit 54 recognizes the recognition result together with the likelihood in the recognition result list Lr2 in response to the recognition result obtained by the second speech recognition process for the speech signal digitized by the speech recognition unit 52. Add to

ステップＳ５０において、言語理解部５４は、ステップＳ４２で取得された単語Ｗの類似単語読みリストＬｗを参照し、単語Ｗに対する全ての類似単語Ｗｎについて単語信頼度を設定したかどうか判定をする。 In step S50, the language understanding unit 54 refers to the similar word reading list Lw of the word W acquired in step S42 and determines whether the word reliability has been set for all similar words Wn of the word W.

言語理解部５４は、全ての類似単語Ｗｎの単語信頼度を設定した場合は、ステップＳ５１へと進み、まだ単語信頼度が設定されていない類似単語Ｗｎがある場合には、ステップＳ４２へと戻る。 The language understanding unit 54 proceeds to step S51 when the word reliability of all the similar words Wn is set, and returns to step S42 when there is a similar word Wn for which the word reliability is not yet set. .

なお、ステップＳ４５において、言語理解部５４は、類似単語Ｗｎが類似単語リストＬｎに存在すれば、既に類似単語Ｗｎを必ず受理する文法を用いた音声認識部５２による音声認識処理は行なわれているので、本ステップＳ５０へと進むことになる。 In step S45, if the similar word Wn exists in the similar word list Ln, the language understanding unit 54 has already performed speech recognition processing by the speech recognition unit 52 using a grammar that always accepts the similar word Wn. Therefore, it will progress to this step S50.

ステップＳ５１において、言語理解部５４は、単語信頼度リストＬｃに登録されている全ての単語Ｗに対する類似単語Ｗｎについて単語信頼度を設定したかどうかを判定する。 In step S51, the language understanding unit 54 determines whether or not the word reliability is set for the similar words Wn for all the words W registered in the word reliability list Lc.

言語理解部５４は、全ての単語Ｗに対する類似単語Ｗｎについて単語信頼度を設定した場合は、ステップＳ５２へと進む。また、言語理解部５４は、まだ類似単語Ｗｎに対する処理がなされていない単語Ｗが存在する場合には、ステップＳ４１へと戻り、ステップＳ４１〜ステップＳ５１までを繰り返す。 If the word reliability is set for the similar word Wn for all the words W, the language understanding unit 54 proceeds to step S52. If there is a word W that has not yet been processed for the similar word Wn, the language understanding unit 54 returns to step S41 and repeats steps S41 to S51.

ステップＳ５２において、単語信頼度演算部５３は、最初に音声信号を音声認識部５２で音声認識処理した結果である認識結果候補リストＬｒ１と、同じ音声信号を２回目に音声認識処理した結果である認識結果候補リストＬｒ２とを合わせた認識結果候補列から単語信頼度を算出する。 In step S52, the word reliability calculation unit 53 calculates the word reliabilities from the combined recognition result candidates of the recognition result candidate list Lr1, which resulted from the first speech recognition processing of the speech signal by the speech recognition unit 52, and the recognition result candidate list Lr2, which resulted from the second speech recognition processing of the same speech signal.

In this way, the voice dialogue apparatus runs the speech recognition process again on the speech uttered by the user and input through the microphone 20, obtains an acoustic likelihood, and calculates the word reliability of the similar word Wn from that likelihood.

As a result, the voice dialogue apparatus can select the optimum understanding result while comparing the word reliability of the words in the recognition result candidates from the first recognition result with the word reliability of the similar words Wn.
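The combination of Lr1 and Lr2 in step S52 can be illustrated with a minimal sketch. This passage does not give the reliability formula, so the code below assumes a common N-best style estimate: each hypothesis carries a log-likelihood, the merged hypotheses are normalized into weights, and a word's reliability is the total weight of the hypotheses that contain it. The function name `word_reliability` and the `(words, log_likelihood)` data layout are hypothetical.

```python
import math


def word_reliability(lr1, lr2):
    """Estimate word reliability from two N-best lists.

    lr1, lr2: lists of (words, log_likelihood) hypotheses, standing in for
    the recognition result candidate lists Lr1 and Lr2.
    NOTE: the normalization used here is an assumption, not the formula
    of the original document.
    """
    # Merge both lists, keeping the best score for duplicate hypotheses.
    merged = {}
    for words, score in list(lr1) + list(lr2):
        key = tuple(words)
        merged[key] = max(merged.get(key, -math.inf), score)
    if not merged:
        return {}

    # Turn log-likelihoods into normalized weights (shifted for stability).
    best = max(merged.values())
    weights = {key: math.exp(score - best) for key, score in merged.items()}
    total = sum(weights.values())

    # A word's reliability is the share of weight held by hypotheses using it.
    reliability = {}
    for words, weight in weights.items():
        for word in set(words):
            reliability[word] = reliability.get(word, 0.0) + weight / total
    return reliability
```

With this layout, a word from the first pass and a similar word from the second pass end up on the same scale, which is what allows them to be compared when the understanding result is chosen.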

(Expansion of recognition result candidates)
As described above, the voice dialogue apparatus shown as the embodiment of the present invention can treat a similar word Wn, whose reading is easily misrecognized as the word W, as a final recognition result candidate. In addition to similar words Wn that correspond one-to-one with the word W, the apparatus can also use similar phrases, whose combined reading is easily misrecognized as the word W, to reduce the number of words and phrases that drop out of the final recognition result candidates.

FIG. 12 shows the reading database of FIG. 4 extended with a similar phrase reading list for the word of each reading ID. A similar phrase is a word string of two or more words whose combined reading is easily misrecognized as the reading of the target word and which therefore has a similarity to it.

For example, as shown in FIG. 12, the similar phrase for "ueki" (うえき), the word with reading ID 11051, is the word string "shingūeki" (しんぐうえき), formed by combining "shingū" (しんぐう), the word with reading ID 11071, and "eki" (えき), the word with reading ID 17011. As shown in FIG. 12, the similarity of this similar phrase is 0.8.

Consequently, when the recognition result "ueki" is obtained, "shingū" and "eki" may become final recognition result candidates. Their word reliability is calculated from the similarity, and the likelihood that they were actually uttered can be verified.

Therefore, even when a legitimate user utterance has been eliminated in the process of selecting recognition result candidates, it can still be selected as the understanding result.
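The similar phrase lookup described above can be sketched as follows. The reading IDs and the 0.8 similarity come from the FIG. 12 example; the in-memory table layout, the `SimilarPhrase` type, the function name, and the `min_similarity` threshold are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimilarPhrase:
    reading_ids: tuple  # reading IDs of the words forming the phrase
    similarity: float   # how readily the phrase is misrecognized as the word


# Hypothetical in-memory form of the extended reading database (FIG. 12).
READINGS = {11051: "ueki", 11071: "shingu", 17011: "eki"}
SIMILAR_PHRASES = {
    11051: [SimilarPhrase((11071, 17011), 0.8)],
}


def expand_with_similar_phrases(recognized_ids, min_similarity=0.5):
    """Return {reading_id: similarity} for words to add as extra candidates.

    min_similarity is an assumed cutoff; the source does not specify one.
    """
    extra = {}
    for rid in recognized_ids:
        for phrase in SIMILAR_PHRASES.get(rid, []):
            if phrase.similarity < min_similarity:
                continue
            for component in phrase.reading_ids:
                if component in recognized_ids:
                    continue  # already a candidate
                # Keep the highest similarity seen for each added word.
                extra[component] = max(extra.get(component, 0.0),
                                       phrase.similarity)
    return extra
```

Calling `expand_with_similar_phrases([11051])` adds reading IDs 11071 and 17011, each carrying the similarity 0.8 from which a word reliability could then be derived.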

Conversely, the number of words and phrases that drop out of the final recognition result candidates can also be reduced by using similar words that have a high similarity to a word string of two or more words included in the recognition result candidates.

For example, as shown in FIG. 13, "ueki" (うえき), the word with reading ID 11051, is registered as a highly similar word for the word string that combines "shingū" (しんぐう), the word with reading ID 11071, and "eki" (えき), the word with reading ID 17011. As shown in FIG. 13, the similarity of this similar word is 0.8.

Consequently, when the recognition results "shingū" and "eki" are obtained, "ueki", whose reading forms part of that word string, may become a final recognition result candidate. Its word reliability is calculated from the similarity, and the likelihood that it was actually uttered can be verified.

Therefore, in this case as well, even when a legitimate user utterance has been eliminated in the process of selecting recognition result candidates, it can still be selected as the understanding result.

Note that the similarity for word strings of two or more words, as shown in FIG. 13, is calculated in advance only for combinations of words that can actually be recognized.
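The reverse lookup, from a recognized word string to a similar single word, can be sketched in the same spirit. The table below holds only the FIG. 13 example and stands in for similarities precomputed for recognizable word combinations; the dictionary layout and the function name are hypothetical.

```python
# Hypothetical precomputed table (FIG. 13): a word string, given as a tuple
# of reading IDs, maps to (similar word reading ID, similarity) pairs.
SIMILAR_WORDS_FOR_STRING = {
    (11071, 17011): [(11051, 0.8)],  # "shingu" + "eki" -> "ueki"
}


def similar_words_for_sequence(recognized_ids):
    """Return {reading_id: similarity} for similar words to add.

    Every contiguous run of two or more recognized words is checked
    against the precomputed table.
    """
    found = {}
    n = len(recognized_ids)
    for start in range(n):
        for end in range(start + 2, n + 1):
            key = tuple(recognized_ids[start:end])
            for word_id, similarity in SIMILAR_WORDS_FOR_STRING.get(key, []):
                # Keep the highest similarity seen for each similar word.
                found[word_id] = max(found.get(word_id, 0.0), similarity)
    return found
```

Because only recognizable combinations are stored, word order matters here: the sequence `[11071, 17011]` yields reading ID 11051, while the reversed sequence yields nothing.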

The embodiment described above is one example of the present invention. The present invention is therefore not limited to that embodiment, and various modifications other than this embodiment can of course be made according to design and other factors, provided that they do not depart from the technical idea of the present invention.

FIG. 1 is a diagram for explaining the configuration of the voice dialogue apparatus shown as the embodiment of the present invention.
FIG. 2 is a diagram showing an example of the proper noun database.
FIG. 3 is a diagram showing an example of the general database.
FIG. 4 is a diagram showing an example of the reading database.
FIG. 5 is a flowchart for explaining the processing operation from the start of the speech recognition process until a response sentence is output.
FIG. 6 is a flowchart for explaining the procedure for setting the word reliability of similar words.
FIG. 7 is a diagram showing an example of the recognition result candidate list.
FIG. 8 is a diagram showing an example of the word reliability list.
FIG. 9 is a diagram showing an example of the similar word list.
FIG. 10 is a diagram showing the similar word list added to the word reliability list.
FIG. 11 is a flowchart for explaining the procedure for setting the word reliability of similar words.
FIG. 12 is a diagram showing the similar phrase reading list.
FIG. 13 is a diagram showing similar words and similarities for word strings composed of two or more words.

Explanation of symbols

10 Switch
20 Microphone
30 Memory
40 Disk
50 Control device
51 Input control unit
52 Speech recognition unit
53 Word reliability calculation unit
54 Language understanding unit
55 Response generation unit
56 GUI display control unit
57 Speech synthesis unit
60 Monitor
70 Speaker

Claims

A voice dialogue apparatus comprising:
input means for inputting spoken speech;
storage means for storing, in association with each other, a recognition target word for the speech and a similar word that is easily misrecognized as the recognition target word;
speech recognition means for recognizing the speech input by the input means on the basis of the recognition target word;
detection means for detecting the similar word from the storage means on the basis of a word included in a recognition result candidate that is a recognition result of the speech recognition means; and
understanding result generation means for adding the similar word detected by the detection means to the recognition result candidates and generating, from the recognition result candidates to which the similar word has been added, an understanding result that serves as a response to the spoken speech.

The voice dialogue apparatus according to claim 1, wherein the degree of misrecognition between the similar word and the recognition target word is indicated by a similarity that quantifies how readily the similar word is misrecognized as the recognition target word.

The voice dialogue apparatus according to claim 1, further comprising word reliability calculation means for calculating a word reliability indicating the likelihood that each of the words included in the recognition result candidates and the similar word detected by the detection means was uttered,
wherein the understanding result generation means generates, on the basis of the word reliability calculated by the word reliability calculation means, the understanding result that serves as a response to the spoken speech from the recognition result candidates to which the similar word has been added.

The voice dialogue apparatus according to claim 1, wherein the speech recognition means recognizes the speech using a first grammar that recognizes the recognition target word and also recognizes the speech using a second grammar that recognizes the similar word, and
the understanding result generation means generates the understanding result that serves as a response to the spoken speech from the recognition result candidates that are the recognition results obtained by the speech recognition means using the first grammar and the second grammar.

The voice dialogue apparatus according to claim 1, wherein the understanding result generation means adds, to the recognition result candidates, two or more recognition target words connected as a word string, in accordance with a similarity that quantifies the degree of misrecognition between a word included in the recognition result candidates and the word string formed by connecting the two or more recognition target words, or a part of that word string, and
generates the understanding result that serves as a response to the spoken speech from the recognition result candidates to which the two or more recognition target words have been added.

The voice dialogue apparatus according to claim 1, wherein the understanding result generation means adds the recognition target word to the recognition result candidates in accordance with a similarity that quantifies the degree of misrecognition between the recognition target word and a word string formed by connecting two or more words included in the recognition result candidates, or a part of that word string, and
generates the understanding result that serves as a response to the spoken speech from the recognition result candidates to which the recognition target word has been added.

A voice understanding result generation method comprising:
an input step of inputting spoken speech;
a speech recognition step of recognizing the speech input in the input step on the basis of a recognition target word;
a detection step of detecting, on the basis of a word included in a recognition result candidate that is a recognition result of the speech recognition step, a similar word from storage means that stores, in association with each other, the recognition target word for the speech and the similar word that is easily misrecognized as the recognition target word; and
an understanding result generation step of adding the similar word detected in the detection step to the recognition result candidates and generating, from the recognition result candidates to which the similar word has been added, an understanding result that serves as a response to the spoken speech.