JP5378907B2

JP5378907B2 - Spoken dialogue apparatus and spoken dialogue program

Info

Publication number: JP5378907B2
Application number: JP2009184946A
Authority: JP
Inventors: 錦一和田; 位好寺澤; 利行難波; 義博大栄; 邦雄横井; 直樹三浦; 收岩田
Original assignee: Aisin AW Co Ltd; Denso Corp; Toyota Motor Corp; Toyota Central R&D Labs Inc
Current assignee: Aisin AW Co Ltd; Denso Corp; Toyota Motor Corp; Toyota Central R&D Labs Inc
Priority date: 2009-08-07
Filing date: 2009-08-07
Publication date: 2013-12-25
Anticipated expiration: 2029-08-07
Also published as: JP2011039185A

Description

The present invention relates to spoken dialogue for retrieving information by voice, and in particular to a spoken dialogue apparatus and a spoken dialogue program with high search efficiency.

Conventionally, techniques are known that recognize speech uttered by a user and retrieve information based on the recognition result. They are used, for example, for facility search in car navigation devices.

As examples of such techniques, a facility search device (see, for example, Patent Document 1) and a navigation device (see, for example, Patent Document 2) have been proposed that expand a speech recognition keyword dictionary based on keywords registered in a facility identification information database and, when such a keyword is successfully recognized, use it as a search keyword. In the facility search device of Patent Document 1, the facility identification information comprises the name of a road associated with the facility, the name of the town where the facility is located, the building name of the facility, and the services the facility offers. In the navigation device of Patent Document 2, it comprises the facility name, major genre, minor genre, address, location, business days, features, user ratings, and the like.

Patent Document 1: JP 2006-139203 A
Patent Document 2: JP 2007-163226 A

However, with the facility search device of Patent Document 1 and the navigation device of Patent Document 2, the user cannot know in advance all the search keywords that the device accepts. Consequently, if a search keyword the user thinks of is not contained in the attribute information of any registered facility, the user's utterance is either misrecognized or not recognized at all. In such a case the user cannot tell whether the keyword would be recognized if spoken again or whether it is an invalid keyword in the first place, and ends up repeating the voice input many times, which lowers search efficiency.

Furthermore, when a keyword that could previously be spoken as a search keyword is removed from the facility information during an update, for example because the corresponding facility has closed, the user has no way of knowing that the set of search keywords accepted by the system has changed. The user therefore assumes the keyword can still be spoken and repeats the voice input many times.

The present invention has been made to solve the above problems, and its object is to provide a spoken dialogue apparatus and a spoken dialogue program that reduce misrecognition and non-recognition of user utterances and achieve high search efficiency.

To achieve the above object, the spoken dialogue apparatus according to claim 1 comprises: speech recognition dictionary storage means for storing a speech recognition dictionary that has a voice command list consisting of pairs of a word that is a voice command and a classification indicating that it is a voice command, a search keyword list consisting of pairs of a word that is a search keyword of a search target database and a classification indicating that it is a search keyword, and a non-target word list consisting of pairs of a word that is neither a voice command nor a search keyword but a non-search-target word and a classification indicating that it is a non-search-target word; speech recognition means for performing speech recognition on voice data input by a user and extracting each word included in the speech recognition dictionary stored in the speech recognition dictionary storage means, together with its classification; dialogue control means for determining whether the classification of each word extracted by the speech recognition means is the voice command, the search keyword, or the non-target word, and, when the classifications of the extracted words include no voice command but include both a search keyword and a non-search-target word, searching the search target database based on the words classified as search keywords and generating a response indicating that the words classified as non-search-target words are non-target words; and presentation means for presenting the search result obtained by the dialogue control means and the response generated by the dialogue control means.

According to the invention of claim 1, when a non-search-target word is input, the apparatus indicates that the word is not a search target, informing the user that a search cannot be performed with the word they spoke. When a search keyword is input, the apparatus searches the search target database based on that keyword and presents the search result.

The spoken dialogue apparatus according to claim 2 is the spoken dialogue apparatus according to claim 1, wherein the speech recognition dictionary storage means stores a speech recognition dictionary having the voice command list, the search keyword list, and the non-target word list, the non-target word list consisting of pairs of a word that is a non-search-target word, obtained by deleting the voice commands and the search keywords from an option word list made up of a plurality of option words (keywords likely to be used optionally when performing a search), and a classification indicating that it is a non-search-target word.

The spoken dialogue apparatus according to claim 3 is the spoken dialogue apparatus according to claim 2, further comprising speech recognition dictionary generation means that uses a voice command dictionary storing words that are voice commands, an option word dictionary storing the option words, and the search target database storing words that represent information on the element items to be searched. The generation means generates the voice command list by associating the words stored in the voice command dictionary with the classification indicating a voice command; generates the search keyword list by associating the words stored in the search target database with the classification indicating a search keyword; generates the non-target word list by associating the words that are stored in the option word dictionary but in neither the voice command dictionary nor the search target database with the classification indicating a non-search-target word; and generates the speech recognition dictionary by registering the generated voice command list, search keyword list, and non-target word list.

According to the invention of claim 3, the speech recognition dictionary can be generated based on the voice command dictionary, the search target database, and the option word dictionary.

The spoken dialogue apparatus according to claim 4 is the spoken dialogue apparatus according to any one of claims 1 to 3, wherein the dialogue control means executes the processing corresponding to a voice command when the classifications of the extracted words include that voice command.

According to the invention of claim 4, when a voice command is input, the processing corresponding to the command can be executed.

The spoken dialogue program according to claim 5 causes a computer to function as each of the means constituting the spoken dialogue apparatus according to any one of claims 1 to 4.

According to the invention of claim 5, when a non-search-target word is input, the program indicates that the word is not a search target, informing the user that a search cannot be performed with the word they spoke.

The spoken dialogue program according to claim 6 causes a computer to function as: speech recognition dictionary storage means for storing a speech recognition dictionary that has a voice command list consisting of pairs of a word that is a voice command and a classification indicating that it is a voice command, a search keyword list consisting of pairs of a word that is a search keyword of a search target database and a classification indicating that it is a search keyword, and a non-target word list consisting of pairs of a word that is neither a voice command nor a search keyword but a non-search-target word and a classification indicating that it is a non-search-target word; speech recognition means for performing speech recognition on voice data input by a user and extracting each word included in the speech recognition dictionary stored in the speech recognition dictionary storage means, together with its classification; dialogue control means for determining whether the classification of each word extracted by the speech recognition means is the voice command, the search keyword, or the non-target word, and, when the classifications of the extracted words include no voice command but include both a search keyword and a non-search-target word, searching the search target database based on the words classified as search keywords and generating a response indicating that the words classified as non-search-target words are non-target words; and presentation means for presenting the search result obtained by the dialogue control means and the response generated by the dialogue control means.

According to the invention of claim 6, when a non-search-target word is input, the program indicates that the word is not a search target, informing the user that a search cannot be performed with the word they spoke.

As described above, the present invention reduces misrecognition and non-recognition of user utterances and thereby improves the convenience of the spoken dialogue apparatus.

Embodiments of the present invention are described in detail below with reference to the drawings. The present embodiment describes dialogue control in the case where the spoken dialogue apparatus according to the present invention is used for facility search in an in-vehicle car navigation system with a speech recognition function (hereinafter, the "navigation system").

FIG. 1 is a block diagram showing the configuration of a spoken dialogue apparatus according to an embodiment of the present invention. As shown in the figure, the spoken dialogue apparatus includes a voice data input unit 11, a voice command dictionary 12, an option word dictionary 13, a search target database 14, a speech recognition dictionary generation unit 15, a speech recognition dictionary 16, a speech recognition unit 17, a dialogue control unit 18, an information search unit 19, and a presentation unit 20.

The voice data input unit 11 includes a microphone and accepts the user's voice data, which is input in order to execute a voice command or to search the search target database.

The voice command dictionary 12 is a dictionary that stores the names of commands that can be operated by voice, and is used to register voice commands in the speech recognition dictionary 16. FIG. 2 shows a configuration example of the voice command dictionary 12.

The option word dictionary 13 stores keywords that users are likely to use optionally when performing an information search. For example, a questionnaire may be given to a large number of general users asking "When searching for facilities, what keywords do you use as options besides area and genre?", and keywords relating to facility attributes may be extracted from the answers and stored in the option word dictionary. The option word dictionary 13 is used to supplement the speech recognition dictionary 16 so that a keyword can be recognized when the user utters it even if it is contained in neither the voice command dictionary 12 nor the search target database 14. FIG. 3 shows a configuration example of the option word dictionary 13.

The search target database 14 is a database that stores various information about the element items to be searched. In the present embodiment the spoken dialogue apparatus is applied to facility search in the navigation system, so the element items are facilities. Accordingly, the search target database 14 stores, in addition to basic information such as the "name", "area", and "genre" of each facility, a plurality of keywords relating to the "attributes" of the facility. Keywords relating to facility attributes may be extracted, for example, from self-promotion text collected by questionnaire from facility managers. Keywords may also be extracted from questionnaires given to third parties or from reputation and word-of-mouth information on the Internet, and users themselves may be allowed to register keywords freely. FIG. 4 shows a configuration example of the search target database 14.

The speech recognition dictionary generation unit 15 generates the speech recognition dictionary 16 from the voice command dictionary 12, the option word dictionary 13, and the search target database 14. The speech recognition dictionary generation unit 15 stores each keyword together with one of the classifications "voice command", "search-valid vocabulary", or "search-invalid vocabulary".

The speech recognition dictionary 16 is the dictionary referred to by the speech recognition unit 17 during speech recognition. It contains a plurality of keywords, each of which is assigned one of the classifications "voice command", "search-valid vocabulary", or "search-invalid vocabulary".

The speech recognition unit 17 performs speech recognition on the voice data input through the voice data input unit 11, using the speech recognition dictionary 16. Each recognized word is output with its classification: "voice command", "search-valid vocabulary", or "search-invalid vocabulary".

The dialogue control unit 18 carries out dialogue processing with the user based on the keywords recognized by the speech recognition unit 17. Specifically, the dialogue control unit 18 generates responses that convey the recognition result of the speech recognition unit 17, and controls the information search unit 19 so that it searches the search target database 14.

The information search unit 19 searches the search target database 14 using the keywords recognized by the speech recognition unit 17 as search conditions, and returns the search result to the dialogue control unit 18.

The presentation unit 20 includes a speaker and a display, and presents responses to the user by voice or text. Specifically, the presentation unit 20 presents the responses generated by the dialogue control unit 18 and the search results obtained by the information search unit 19.

The spoken dialogue apparatus configured as described above generates the speech recognition dictionary 16, recognizes the voice data input by the user based on the generated speech recognition dictionary 16, and performs processing such as information retrieval from the search target database 14. FIG. 5 is a flowchart showing the operation of the spoken dialogue apparatus when the speech recognition dictionary 16 is generated.

First, in step 100, the speech recognition dictionary generation unit 15 initializes the speech recognition dictionary 16.

In step 102, the speech recognition dictionary generation unit 15 creates a voice command vocabulary list by assigning the classification "voice command" to each keyword stored in the voice command dictionary 12 shown in FIG. 2.

In step 104, the speech recognition dictionary generation unit 15 registers the created voice command vocabulary list in the speech recognition dictionary 16. Here, either all the keywords stored in the voice command dictionary 12 may be registered, or the keywords may first be narrowed down to those the user can speak next, according to the screen and response message currently presented to the user. Registering only the narrowed-down keywords can improve speech recognition accuracy.

In step 106, the speech recognition dictionary generation unit 15 extracts the keywords that serve as search conditions from the search target database 14. When the search target database 14 stores area, genre, and attribute information as fields, as shown in FIG. 4, the speech recognition dictionary generation unit 15 extracts the keywords contained in the fields "Area: prefecture", "Area: municipality", "Genre", and "Attribute information (1) to (5)" without duplication.
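The duplicate-free extraction of step 106 can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the key names (`prefecture`, `city`, `genre`, `attributes`), and the sample values are hypothetical stand-ins for the fields of FIG. 4, which is not reproduced here.

```python
# Hypothetical records: the keys stand in for the fields named for FIG. 4,
# and the values are made up for illustration.
FACILITIES = [
    {"name": "Restaurant A", "prefecture": "Aichi", "city": "Nagoya",
     "genre": "restaurant", "attributes": ["cheap", "can host banquets"]},
    {"name": "Pub B", "prefecture": "Aichi", "city": "Toyota",
     "genre": "izakaya", "attributes": ["cheap", "hidden gem"]},
]


def extract_search_keywords(records):
    """Collect area, genre, and attribute keywords, keeping first occurrences only."""
    seen = set()
    keywords = []
    for record in records:
        values = [record["prefecture"], record["city"], record["genre"]]
        values.extend(record["attributes"])
        for value in values:
            if value not in seen:
                seen.add(value)
                keywords.append(value)
    return keywords


print(extract_search_keywords(FACILITIES))
```

The facility name is deliberately left out, since the text lists only the area, genre, and attribute fields as sources of search keywords.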

In step 108, the speech recognition dictionary generation unit 15 creates a search-valid vocabulary list by assigning the classification "search-valid vocabulary" to each keyword extracted from the search target database 14. At this point, the keyword "around the current location" may be added to the search-valid vocabulary list so that facilities around the current location can be searched. FIG. 6 shows a configuration example of the search-valid vocabulary list.

In step 110, the speech recognition dictionary generation unit 15 registers the created search-valid vocabulary list in the speech recognition dictionary 16.

In step 112, the speech recognition dictionary generation unit 15 removes from the keywords of the option word dictionary 13 shown in FIG. 3 those that are contained in the voice command vocabulary list or the search-valid vocabulary list, and then creates a search-invalid vocabulary list by assigning the classification "search-invalid vocabulary" to each remaining keyword. For example, when the option word dictionary 13 shown in FIG. 3 is used, the keywords "cheap", "can host banquets", and "hidden gem", which are contained in the search-valid vocabulary list of FIG. 6, are removed, leaving the keywords "uncrowded", "serves lunch", "has parking", and "delicious". The speech recognition dictionary generation unit 15 assigns the classification to these keywords to create the search-invalid vocabulary list.

FIG. 7 shows an example of the speech recognition dictionary 16 finally obtained in this way.
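The generation flow of steps 100 to 112 can be summarized in a short sketch. It is a minimal illustration, not the patented implementation: the function name, the flat keyword-to-classification mapping, and the sample command words are assumptions, and the rule that an earlier classification wins when a word appears in more than one source is a choice made here for simplicity.

```python
VOICE_COMMAND = "voice command"
SEARCH_VALID = "search-valid vocabulary"
SEARCH_INVALID = "search-invalid vocabulary"


def build_recognition_dictionary(command_words, db_keywords, option_words):
    """Return {keyword: classification}, following steps 100 to 112."""
    dictionary = {}                                  # step 100: initialize

    for word in command_words:                       # steps 102-104
        dictionary[word] = VOICE_COMMAND

    # Step 108: database keywords plus the extra "around the current location".
    valid_words = list(db_keywords) + ["around the current location"]
    for word in valid_words:                         # step 110
        # Assumption: a word already registered as a command keeps that class.
        dictionary.setdefault(word, SEARCH_VALID)

    for word in option_words:                        # step 112
        # Option words already present as commands or valid keywords are
        # skipped; only the remainder becomes search-invalid vocabulary.
        dictionary.setdefault(word, SEARCH_INVALID)

    return dictionary


# Option words and valid keywords follow the example in the text;
# the command words are hypothetical.
example = build_recognition_dictionary(
    command_words=["back", "set as destination"],
    db_keywords=["restaurant", "izakaya", "cheap", "can host banquets", "hidden gem"],
    option_words=["cheap", "can host banquets", "hidden gem",
                  "uncrowded", "serves lunch", "has parking", "delicious"],
)
```

With these inputs, the four option words that appear nowhere else ("uncrowded", "serves lunch", "has parking", "delicious") end up classified as search-invalid vocabulary, matching the example given for step 112.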

Next, information retrieval by spoken dialogue is described. FIG. 8 is a flowchart showing the operation of the spoken dialogue apparatus during information retrieval.

First, in step 200, the speech recognition dictionary generation unit 15 generates the speech recognition dictionary 16 as described above. In the present embodiment the speech recognition dictionary 16 is generated every time an information search is performed, but the timing of generation is not limited to this. For example, generation may be performed whenever the voice command dictionary 12, the option word dictionary 13, or the search target database 14 is updated. Alternatively, when the invention is applied to a system that can restrict the search range within the search target database 14 based on the vehicle position or the like, the speech recognition dictionary 16 may be generated whenever the search range changes.

In step 202, the voice data input unit 11 accepts the voice data uttered by the user for information retrieval with the navigation system.

In step 204, the speech recognition unit 17 performs speech recognition on the voice data input by the user, using the speech recognition dictionary 16. The speech recognition unit 17 looks up the entry in the speech recognition dictionary 16 that matches each keyword obtained by speech recognition and outputs the keyword together with the corresponding classification. For example, when the recognition result is "izakaya", it is output together with the classification "search-valid vocabulary" assigned to that keyword in the speech recognition dictionary of FIG. 7, in a form such as "izakaya (search-valid vocabulary)". When a plurality of keywords are recognized in a single user utterance, they are output as a result list.
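The lookup and output format described for step 204 might look like the sketch below. The dictionary excerpt and the function names `annotate` and `format_result` are hypothetical; only the "keyword (classification)" style of the output follows the example in the text.

```python
# Hypothetical excerpt of the speech recognition dictionary (keyword -> classification).
RECOGNITION_DICTIONARY = {
    "izakaya": "search-valid vocabulary",
    "serves lunch": "search-invalid vocabulary",
    "back": "voice command",
}


def annotate(recognized_keywords, dictionary=RECOGNITION_DICTIONARY):
    """Return a result list of (keyword, classification) pairs."""
    return [(keyword, dictionary[keyword]) for keyword in recognized_keywords]


def format_result(result):
    """Render each pair in the 'keyword (classification)' style."""
    return [f"{keyword} ({classification})" for keyword, classification in result]


print(format_result(annotate(["izakaya"])))
```

Keeping the result as a list of pairs, and formatting it only for display, lets the later dialogue control step branch on the classification without parsing strings.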

The speech recognition unit 17 may use any known speech recognition method; one example is as follows. First, a plurality of candidate phoneme lists are generated from the time series of speech features of the voice data by referring to an acoustic model, and from these candidates, those that can be expressed as a combination of words registered in the speech recognition dictionary 16 are extracted. Next, an acoustic likelihood is calculated from the time series of speech features and each phoneme list, and the top N candidates are extracted in descending order of likelihood. Each phoneme list is then replaced with the corresponding classified recognition words from the speech recognition dictionary 16 to form a list of recognition result candidates. From this list, only the candidates that are grammatically valid are extracted, and the highest-ranked of them is output as the recognition result. If no candidate is grammatically valid, there is no recognition result.
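The candidate selection part of this procedure can be sketched as below. The sketch makes strong simplifying assumptions: acoustic decoding is skipped entirely, so each candidate is given directly as a word sequence with a made-up likelihood rather than as a phoneme list, and the grammar check `command_alone` is a hypothetical rule invented for illustration.

```python
def select_recognition_result(candidates, dictionary, is_grammatical, n=5):
    """Pick the best grammatical candidate, or None when there is none.

    candidates: list of (word_sequence, likelihood) pairs.
    dictionary: {word: classification}.
    """
    # Keep only candidates expressible with words registered in the dictionary.
    expressible = [c for c in candidates if all(w in dictionary for w in c[0])]
    # Take the N candidates with the highest likelihood.
    top_n = sorted(expressible, key=lambda c: c[1], reverse=True)[:n]
    # Replace each word with its classified dictionary entry.
    classified = [[(w, dictionary[w]) for w in words] for words, _ in top_n]
    # Keep grammatical candidates and return the highest-ranked one.
    grammatical = [c for c in classified if is_grammatical(c)]
    return grammatical[0] if grammatical else None


def command_alone(candidate):
    """Hypothetical grammar: a voice command must not be mixed with other words."""
    has_command = any(cat == "voice command" for _, cat in candidate)
    return not has_command or len(candidate) == 1


DICTIONARY = {
    "back": "voice command",
    "restaurant": "search-valid vocabulary",
    "serves lunch": "search-invalid vocabulary",
}
CANDIDATES = [
    (["ramen"], 0.95),                       # not in the dictionary
    (["back", "restaurant"], 0.90),          # rejected by the grammar
    (["serves lunch", "restaurant"], 0.80),  # selected
]
result = select_recognition_result(CANDIDATES, DICTIONARY, command_alone)
```

Here the most likely candidate is dropped because "ramen" is not registered, the second is rejected by the grammar rule, and the third is returned with its classifications attached.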

In step 206, the dialogue control unit 18 determines whether the classification of the keyword recognized by the speech recognition unit 17 is a voice command. If it is a voice command, the process proceeds to step 208; otherwise, it proceeds to step 214.

In step 208, the dialogue control unit 18 executes the processing corresponding to the voice command recognized by the speech recognition unit 17, and the presentation unit 20 presents the execution result.

In step 210, the dialogue control unit 18 determines whether the voice command is one that ends the search. For example, if the voice command moves on to the operation that follows the search, such as "set as destination", the dialogue control unit 18 ends the information search. If the voice command continues the search operation, such as "back", the process proceeds to step 212.

In step 212, the dialogue control unit 18 generates a message prompting the next input, the presentation unit 20 presents the message, and the process returns to step 202.

In step 214, the dialogue control unit 18 determines whether the keywords recognized by the speech recognition unit 17 include at least one item of search-valid vocabulary. If they do, the process proceeds to step 216; if not, it proceeds to step 222.

In step 216, the dialogue control unit 18 generates a message that makes clear, for each recognized keyword, whether it is search-valid or search-invalid vocabulary, and the presentation unit 20 presents the message.

In step 218, the information search unit 19 searches the search target database 14 using, as search conditions, those recognized keywords that are search-valid vocabulary.

In step 220, the presentation unit 20 presents the search result obtained by the information search unit 19. The dialogue control unit 18 also generates a message prompting the next input, the presentation unit 20 presents the message, and the process returns to step 202.

Information retrieval by the information search unit 19 proceeds as follows. Suppose, for example, that from the user utterance "a restaurant around the current location that serves lunch", the speech recognition unit 17 recognizes the three keywords "around the current location (search-valid vocabulary)", "serves lunch (search-invalid vocabulary)", and "restaurant (search-valid vocabulary)". In this case, the dialogue control unit 18 generates a message indicating which keywords are valid and which are invalid, such as "'Serves lunch' is not a valid search keyword. Searching for restaurants around the current location.", and the presentation unit 20 presents it. Then, based on the result of searching the search target database 14 with the two conditions "around the current location" and "restaurant", the dialogue control unit 18 generates a message such as "Five results found.", and the presentation unit 20 presents it. In addition, the dialogue control unit 18 generates a message prompting the next input, such as "The nearest is Restaurant ○○. Set it as the destination, or show the next one?", and the presentation unit 20 presents it.

In step 222, the dialogue control unit 18 generates a message indicating that the recognized keyword is search-invalid vocabulary, together with a message prompting another input, such as "Please say it again." After the presentation unit 20 presents these messages, the process returns to step 202.

As described above, when generating the speech recognition dictionary 16, the spoken dialogue apparatus according to the present embodiment uses the option word dictionary 13 to supplement keywords contained in neither the voice command dictionary 12 nor the search target database 14, and classifies them in advance as valid or invalid as search keywords. This allows the apparatus to recognize keywords that users are likely to use when searching for information. It also allows the apparatus to tell the user immediately whether a recognized keyword is valid for the search, so the user no longer repeats an invalid keyword by voice over and over, and as a result information can be searched efficiently.
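The dictionary generation summarized above can be sketched as follows. This is an illustration under stated assumptions, not the method of the embodiment: the function name `build_recognition_dictionary`, the classification labels, and the flat word-to-classification mapping are hypothetical, and giving voice commands priority when a word appears in more than one source is an assumption the text does not address.

```python
VOICE_COMMAND = "voice command"
VALID = "search-effective"
INVALID = "search-invalid"


def build_recognition_dictionary(voice_commands, database_words, option_words):
    """Map every recognizable word to its classification.

    voice_commands: words from the voice command dictionary 12
    database_words: words from the search target database 14
    option_words:   words from the option word dictionary 13
    """
    dictionary = {}

    # Words that are voice commands.
    for word in voice_commands:
        dictionary[word] = VOICE_COMMAND

    # Words found in the search target database are usable as search keywords.
    # setdefault keeps an earlier classification (assumed priority order).
    for word in database_words:
        dictionary.setdefault(word, VALID)

    # Option words found in neither source are still recognized,
    # but are marked as invalid for search.
    for word in option_words:
        dictionary.setdefault(word, INVALID)

    return dictionary


if __name__ == "__main__":
    d = build_recognition_dictionary(
        voice_commands=["set destination", "next"],
        database_words=["restaurant", "around the current location"],
        option_words=["restaurant", "with lunch", "next"],
    )
    for word, classification in d.items():
        print(f"{word}: {classification}")
```

Because the classification is fixed when the dictionary is built, the dialogue control step can look up a recognized word's validity directly instead of querying the database for every utterance.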

The present invention is not limited to the embodiment described above and is also applicable to design modifications within the scope of the claims. For example, it can also be applied to music search.

A block diagram showing the configuration of the spoken dialogue apparatus according to an embodiment of the present invention.
A diagram showing a configuration example of the voice command dictionary.
A diagram showing a configuration example of the option word dictionary.
A diagram showing a configuration example of the search target database.
A flowchart showing the flow of operation of the spoken dialogue apparatus according to the present embodiment when generating the speech recognition dictionary.
A diagram showing a configuration example of the search-effective vocabulary list.
A diagram showing a configuration example of the speech recognition dictionary.
A flowchart showing the flow of operation of the spoken dialogue apparatus according to the present embodiment when searching for information.

DESCRIPTION OF SYMBOLS
11 Voice data input unit
12 Voice command dictionary
13 Option word dictionary
14 Search target database
15 Speech recognition dictionary generation unit
16 Speech recognition dictionary
17 Speech recognition unit
18 Dialogue control unit
19 Information search unit
20 Presentation unit

Claims

A spoken dialogue apparatus comprising:
speech recognition dictionary storage means storing a speech recognition dictionary having a voice command list consisting of pairs of a word that is a voice command and a classification indicating that it is a voice command, a search keyword list consisting of pairs of a word that serves as a search keyword in a search target database and a classification indicating that it is a search keyword, and a non-search-target word list consisting of pairs of a word that is a non-search-target word and a classification indicating that it is a non-search-target word;
speech recognition means for performing speech recognition on voice data input by a user and extracting each word contained in the speech recognition dictionary stored in the speech recognition dictionary storage means together with its classification;
dialogue control means for determining whether the classification of each word extracted by the speech recognition means is the voice command, the search keyword, or the non-search-target word and, when the classifications of the extracted words include no voice command but include both a search keyword and a non-search-target word, searching the search target database based on the words classified as search keywords and generating a response indicating that the words classified as non-search-target words are non-search-target words; and
presentation means for presenting the search results retrieved by the dialogue control means and the response generated by the dialogue control means.

The spoken dialogue apparatus according to claim 1, wherein the speech recognition dictionary storage means stores a speech recognition dictionary having the voice command list, the search keyword list, and a non-search-target word list consisting of pairs of a non-search-target word, obtained by removing the voice commands and the search keywords from an option word list containing a plurality of option words that are keywords likely to be used as options when performing a search, and a classification indicating that it is a non-search-target word.

The spoken dialogue apparatus according to claim 2, further comprising speech recognition dictionary generation means that, using a voice command dictionary storing words that are voice commands, an option word dictionary storing the option words, and the search target database, which stores words representing element items of the information to be searched, generates the voice command list by associating each word stored in the voice command dictionary with a classification indicating a voice command, generates the search keyword list by associating each word stored in the search target database with a classification indicating a search keyword, generates the non-search-target word list by associating each word that is stored in the option word dictionary but in neither the voice command dictionary nor the search target database with a classification indicating a non-search-target word, and generates the speech recognition dictionary by registering the generated voice command list, search keyword list, and non-search-target word list.

The spoken dialogue apparatus according to any one of claims 1 to 3, wherein the dialogue control means executes the process corresponding to a voice command when the classifications of the extracted words include a voice command.

A spoken dialogue program for causing a computer to function as each of the means constituting the spoken dialogue apparatus according to any one of claims 1 to 4.

A spoken dialogue program for causing a computer to function as:
speech recognition dictionary storage means storing a speech recognition dictionary having a voice command list consisting of pairs of a word that is a voice command and a classification indicating that it is a voice command, a search keyword list consisting of pairs of a word that serves as a search keyword in a search target database and a classification indicating that it is a search keyword, and a non-search-target word list consisting of pairs of a word that is a non-search-target word and a classification indicating that it is a non-search-target word;
speech recognition means for performing speech recognition on voice data input by a user and extracting each word contained in the speech recognition dictionary stored in the speech recognition dictionary storage means together with its classification;
dialogue control means for determining whether the classification of each word extracted by the speech recognition means is the voice command, the search keyword, or the non-search-target word and, when the classifications of the extracted words include no voice command but include both a search keyword and a non-search-target word, searching the search target database based on the words classified as search keywords and generating a response indicating that the words classified as non-search-target words are non-search-target words; and
presentation means for presenting the search results retrieved by the dialogue control means and the response generated by the dialogue control means.