JP2015148758A

JP2015148758A - Voice interactive system and voice interactive method

Info

Publication number: JP2015148758A
Application number: JP2014022385A
Authority: JP
Inventors: Sawa Higuchi; Seisho Watabe
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2014-02-07
Filing date: 2014-02-07
Publication date: 2015-08-20

Abstract

PROBLEM TO BE SOLVED: To make a reply that better suits the intention of a user when continuing a natural conversation with the user. SOLUTION: A voice interactive system 100, which repeats an interaction consisting of a speech from a user 200 and a reply to the speech a plurality of times, includes: a reply sentence database 170 which stores a plurality of reply sentences indicating reply candidates; an interaction recording section 160 which records the content of a predetermined number of past interactions with the user; a keyword extraction section 110 which extracts a keyword in the content of the interaction from the interaction recording section 160; a word extraction section 130 which extracts a word included in a speech input by the user; a selection section 140 which selects, from among the reply sentences stored in the reply sentence database, a reply sentence corresponding to the extracted word and the extracted keyword; and an output section 150 which outputs the selected reply sentence to the user by voice as a reply to the speech.

Description

The present invention relates to a voice dialogue system and a voice dialogue method, and more particularly to a voice dialogue system and a voice dialogue method for conducting a continuous dialogue with a user.

One example of a voice dialogue system in which a user and a robot converse by voice is the technique disclosed in Patent Document 1. That technique extracts keywords from the content of the user's utterance, searches a conversation database or the like using the extracted keywords, and outputs the matching content by voice as a response.

Patent Document 1: JP 2006-171719 A

When people exchange several utterances on a topic, each new utterance is usually made on the basis of words and other content that have already appeared in the conversation. Consequently, even a word that is important to the topic may be omitted from subsequent utterances.

Patent Document 1 analyzes only the expressions contained in the utterance currently being responded to, and searches using the words obtained from that analysis as keywords. As a result, the search may return content that differs from what the user intended.

For example, suppose the user asks, "What is Jupiter made of?", the robot replies, "Jupiter is made of gas and is the largest planet in the solar system.", and the user then says, "Tell me the diameter." The robot may then respond with something like "Saturn's diameter is about XX kilometers." The "diameter" in the user's second utterance was meant as "the diameter of Jupiter", but the words "of Jupiter" were omitted because the utterance continues the preceding dialogue. Because "Jupiter" was not among the search keywords, a response about some other diameter was retrieved from the database. Thus, in the technique of Patent Document 1, when an important keyword for a particular topic is omitted from the user's input, the omitted keyword is not taken into account, and the user's intention cannot be grasped accurately.

The present invention has been made to solve this problem, and an object thereof is to provide a voice dialogue system and a voice dialogue method that respond more accurately to the user's intention when a natural dialogue with the user is carried on continuously.

A voice dialogue system according to a first aspect of the present invention is a voice dialogue system that conducts, a plurality of times in succession, a dialogue consisting of an utterance from a user and a response to the utterance, and includes:
a response sentence database that stores in advance a plurality of response sentences indicating candidate responses;
a dialogue recording unit that records the content of a predetermined number of past dialogues with the user;
a keyword extraction unit that extracts keywords in the dialogue content from the dialogue recording unit;
a word extraction unit that, when a new utterance is input by the user, extracts a word included in the utterance;
a selection unit that selects, from among the plurality of response sentences stored in the response sentence database, a response sentence corresponding to the extracted word and the extracted keyword; and
an output unit that outputs the selected response sentence to the user by voice as a response to the utterance.

A voice dialogue method according to a second aspect of the present invention is a voice dialogue method using a voice dialogue system that conducts, a plurality of times in succession, a dialogue consisting of an utterance from a user and a response to the utterance. The voice dialogue system includes:
a response sentence database that stores in advance a plurality of response sentences indicating candidate responses; and
a dialogue recording unit that records the content of a predetermined number of past dialogues with the user.
The method comprises:
extracting keywords in the dialogue content from the dialogue recording unit;
when a new utterance is input by the user, extracting a word included in the utterance;
selecting, from among the plurality of response sentences stored in the response sentence database, a response sentence corresponding to the extracted word and the extracted keyword; and
outputting the selected response sentence to the user by voice as a response to the utterance.

As described above, in each aspect of the present invention, the selected response sentence corresponds not only to the words contained in the user's current utterance but also to the words (keywords) contained in the most recent dialogue content. Therefore, even when the user, relying on the preceding dialogue, makes an utterance that omits an important keyword, the user's intention can be grasped and a response sentence consistent with the preceding dialogue can be output.
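To make the difference concrete, here is a minimal Python sketch of the Jupiter example from the background section, under simplifying assumptions: the response sentence DB is a hypothetical three-sentence list, lowercase regex tokenization with a small stopword list stands in for morphological analysis, and a response "matches" only if it contains every search term. It contrasts searching with the utterance's words alone (as in Patent Document 1) against adding a keyword taken from the recent dialogue.

```python
import re
from collections import Counter

# Hypothetical response sentence DB (illustrative content only).
RESPONSES = [
    "Saturn's diameter is about XX kilometers.",
    "Jupiter's diameter is about XX kilometers.",
    "Jupiter is made of gas and is the largest planet in the solar system.",
]
# Hypothetical stopword list; a real system would use morphological analysis.
STOPWORDS = {"what", "is", "of", "the", "me", "tell", "made", "and", "in", "about", "s"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def search(terms, responses):
    """Return the first response containing every search term, or None."""
    for sentence in responses:
        if set(terms) <= set(tokenize(sentence)):
            return sentence
    return None


history = ["What is Jupiter made of?", RESPONSES[2]]
utterance = "Tell me the diameter"

# Utterance words only: "diameter" also matches the Saturn sentence, which comes first.
print(search(tokenize(utterance), RESPONSES))  # → Saturn's diameter is about XX kilometers.

# Utterance words plus the most frequent word in the recent dialogue ("jupiter").
keyword = Counter(w for text in history for w in tokenize(text)).most_common(1)[0][0]
print(search(tokenize(utterance) + [keyword], RESPONSES))  # → Jupiter's diameter is about XX kilometers.
```

The only change between the two searches is the extra history-derived term; the DB order that makes the first search pick Saturn is deliberate, to mirror the failure described in the background.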

The present invention can thus provide a voice dialogue system and a voice dialogue method that respond more accurately to the user's intention when a natural dialogue with the user is carried on continuously.

FIG. 1 is a diagram showing the configuration of a voice dialogue system according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart showing the flow of response processing according to Embodiment 1 of the present invention.
FIG. 3 is a diagram showing the configuration of a voice dialogue system according to Embodiment 2 of the present invention.
FIG. 4 is a flowchart showing the flow of response processing according to Embodiment 2 of the present invention.

Specific embodiments of the present invention, including the aspects described above, will now be described in detail with reference to the drawings. In the drawings, the same elements are denoted by the same reference signs, and duplicate descriptions are omitted as necessary for clarity.

<Embodiment 1 of the Invention>
FIG. 1 is a diagram showing the configuration of a voice dialogue system 100 according to Embodiment 1 of the present invention. The voice dialogue system 100 is an information system that conducts a dialogue with a user 200 a plurality of times in succession. Here, a dialogue refers to the combination of an utterance from the user and the response to that utterance. The voice dialogue system 100 may be, for example, an interactive robot. The voice dialogue system 100 includes a keyword extraction unit 110, an utterance reception unit 120, a word extraction unit 130, a selection unit 140, an output unit 150, a dialogue recording unit 160, and a response sentence database (DB) 170.

The dialogue recording unit 160 records the content of a predetermined number of past dialogues with the user 200. Here, the dialogue content includes the text data obtained by converting the utterances from the user 200 to the voice dialogue system 100, and the text data of the responses from the voice dialogue system 100 to the user 200. In other words, the dialogue recording unit 160 records and holds history data for the last several dialogues conducted in succession with the user 200 on a specific topic.
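The dialogue recording unit 160 behaves like a bounded history buffer. Below is a minimal Python sketch, assuming one entry per dialogue (utterance text plus response text) and a hypothetical limit `max_turns` for the "predetermined number"; the class and method names are illustrative and do not come from the patent. A `deque` with `maxlen` discards the oldest dialogue automatically when a new one is added.

```python
from collections import deque


class DialogueRecorder:
    """Holds the text of the last `max_turns` dialogues (hypothetical class)."""

    def __init__(self, max_turns=3):
        # Each entry is [utterance_text, response_text]; the oldest entry drops off.
        self.turns = deque(maxlen=max_turns)

    def record_utterance(self, text):
        # A new dialogue starts with the user's utterance; its response comes later.
        self.turns.append([text, None])

    def record_response(self, text):
        # Attach the system's response to the most recent dialogue.
        self.turns[-1][1] = text

    def texts(self):
        """All recorded utterance and response texts, oldest first."""
        return [t for turn in self.turns for t in turn if t is not None]


recorder = DialogueRecorder(max_turns=2)
for user, system in [("u1", "r1"), ("u2", "r2"), ("u3", "r3")]:
    recorder.record_utterance(user)
    recorder.record_response(system)
print(recorder.texts())  # → ['u2', 'r2', 'u3', 'r3']
```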

The response sentence DB 170 is a database that stores in advance a plurality of response sentences indicating candidate responses to utterances from the user 200. The dialogue recording unit 160 and the response sentence DB 170 are realized by a storage device (not shown) inside the robot or other device that embodies the voice dialogue system 100. However, the dialogue recording unit 160 and the response sentence DB 170 may instead be realized in a storage device external to the robot or other device.

The keyword extraction unit 110 extracts keywords in the past dialogue content from the dialogue recording unit 160. Here, from among the words contained in the past dialogue content recorded in the dialogue recording unit 160, the keyword extraction unit 110 takes as keywords the words that are representative of the topic of the series of dialogues. That is, the keyword extraction unit 110 preferably uses as keywords not all the words in the dialogue recording unit 160 but only some of them, specifically words that are characteristic of the ongoing dialogue. For example, words that appear frequently in the most recent few dialogues may be used as keywords for the next utterance. Alternatively, words in the past dialogue content that have high importance according to a predetermined criterion may be used as keywords.
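The first option above, taking words that appear frequently in the most recent few dialogues, can be sketched in Python as follows. The stopword list, the `top_n` and `min_count` parameters, and regex tokenization (standing in for morphological analysis) are assumptions for illustration, not details from the patent.

```python
import re
from collections import Counter

# Hypothetical function words that should never become keywords.
STOPWORDS = {"what", "is", "of", "the", "a", "and", "in", "made", "to", "me"}


def extract_keywords(history, top_n=2, min_count=2):
    """Return up to `top_n` words that occur at least `min_count` times in `history`.

    `history` is a list of recorded utterance/response texts, oldest first.
    """
    counts = Counter(
        w
        for text in history
        for w in re.findall(r"[a-z]+", text.lower())
        if w not in STOPWORDS
    )
    return [w for w, n in counts.most_common(top_n) if n >= min_count]


history = [
    "What is Jupiter made of?",
    "Jupiter is made of gas and is the largest planet in the solar system.",
]
print(extract_keywords(history))  # → ['jupiter']
```

The `min_count` floor keeps one-off words such as "gas" from being promoted to topic keywords after a single mention.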

The utterance reception unit 120 receives an utterance input from the user 200, converts the utterance into text data, and stores the text data in the dialogue recording unit 160. When a new utterance is input from the user 200, the word extraction unit 130 extracts words included in the utterance. Here, the word extraction unit 130 extracts all or some of the words included in the input utterance. When extracting only some of the words, the word extraction unit 130 may extract words of high importance according to a predetermined criterion. The selection unit 140 selects, from among the plurality of response sentences stored in the response sentence DB 170, a response sentence corresponding to the extracted word and the extracted keyword. Here, a response sentence corresponding to the extracted word and keyword may be a response sentence that contains the word and the keyword themselves, or a response sentence that, even if it does not contain the word or the keyword, contains another word related to that word or keyword. For example, the selection unit 140 may identify another word related to the word or keyword and select a response sentence containing the identified word.
The output unit 150 converts the selected response sentence into speech and outputs it to the user 200 as the response to the utterance.
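The selection rule above can be sketched in Python as a best-match search over the response sentence DB, where a search term (utterance word or keyword) counts as matched if the sentence contains the term itself or a related word. The patent leaves open how related words are found (the description mentions predicate-argument structure analysis as one option); here a hypothetical hand-written `RELATED` table stands in for that step, and scoring by the number of matched terms is an assumption.

```python
import re

# Hypothetical related-word table standing in for a real relatedness method.
RELATED = {"diameter": {"radius", "size"}}


def tokens(text):
    """Set of lowercase alphabetic tokens (stand-in for morphological analysis)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(term, sentence_tokens):
    """True if the term, or any word related to it, appears in the sentence."""
    return term in sentence_tokens or bool(RELATED.get(term, set()) & sentence_tokens)


def select_response(words, keywords, responses):
    """Return the response matching the most search terms; None if nothing matches.

    Ties go to the earliest response in `responses`.
    """
    terms = set(words) | set(keywords)
    best, best_score = None, 0
    for sentence in responses:
        toks = tokens(sentence)
        score = sum(matches(t, toks) for t in terms)
        if score > best_score:
            best, best_score = sentence, score
    return best


db = ["Saturn's radius is about XX km.", "Jupiter's radius is about XX km."]
print(select_response(["diameter"], ["jupiter"], db))  # → Jupiter's radius is about XX km.
```

Neither sentence contains "diameter", yet both match it through "radius"; the history keyword "jupiter" then decides between them.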

Another word related to a given word may be identified using, for example, predicate-argument structure analysis. In that case, the selection unit 140 selects a sentence whose analyzed structure is similar. As the predicate-argument structure analysis, for example, the technique disclosed in Koichiro Yoshino et al., "A spoken dialogue system that recommends information based on the similarity of predicate-argument structures", IPSJ SIG Technical Report, Vol. 2011-SLP-87, No. 11, can be applied. In that case, the word extraction unit 130 performs morphological analysis on the target text data and extracts a plurality of words contained in it. The selection unit 140 then first searches the response sentence DB 170 for a response sentence containing all of the search terms, namely the extracted words and the keywords. If no such sentence is found, the selection unit 140 selects some of the search terms and searches the response sentence DB 170 for a response sentence containing the selected terms. At this time, the selection unit 140 may, for example, calculate the degree of relevance between the elements for each search term by a predetermined method and select the search terms with higher relevance as the subset. This improves the accuracy of the selected response sentence. The same applies if the word extraction unit 130 in the above description is replaced by the keyword extraction unit 110. Techniques other than predicate-argument structure analysis may also be used.
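The all-terms-first search with fallback to a subset of terms can be sketched in Python as below. The patent only says relevance is computed "by a predetermined method"; purely for illustration, this sketch assumes that a term's relevance is how often it co-occurs with the other remaining search terms in DB sentences, and it drops the least relevant term one at a time until some sentence contains all remaining terms. It does not implement predicate-argument structure analysis.

```python
import re


def tokens(text):
    """Set of lowercase alphabetic tokens (stand-in for morphological analysis)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevance(term, terms, db_tokens):
    """Hypothetical relevance: co-occurrences of `term` with the other terms in DB sentences."""
    others = set(terms) - {term}
    return sum(len(others & toks) for toks in db_tokens if term in toks)


def search_with_fallback(terms, db):
    """Return (sentence, terms_used); first try all terms, then progressively fewer."""
    db_tokens = [tokens(s) for s in db]
    remaining = list(terms)
    while remaining:
        hits = [s for s, toks in zip(db, db_tokens) if set(remaining) <= toks]
        if hits:
            return hits[0], remaining
        # No sentence contains every remaining term: drop the least relevant one.
        weakest = min(remaining, key=lambda t: relevance(t, remaining, db_tokens))
        remaining.remove(weakest)
    return None, []


db = [
    "jupiter has a diameter of about xx km",
    "jupiter is a gas giant",
    "saturn has rings",
]
print(search_with_fallback(["jupiter", "diameter", "please"], db))
# → ('jupiter has a diameter of about xx km', ['jupiter', 'diameter'])
```

Here "please" never co-occurs with the other terms, so it is the first term dropped, and the remaining pair finds an exact match.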

FIG. 2 is a flowchart showing the flow of response processing according to Embodiment 1 of the present invention. It is assumed that records of a plurality of dialogues (utterances and responses) on a specific topic between the user 200 and the voice dialogue system 100 are already held in the dialogue recording unit 160.

First, the keyword extraction unit 110 refers to the dialogue recording unit 160 and extracts keywords from the past dialogue records (S11). The utterance reception unit 120 receives a new utterance and records the text data converted from the received utterance in the dialogue recording unit 160 (S12). Next, the word extraction unit 130 extracts words from the text data of the received utterance by morphological analysis or the like (S13). In Embodiment 1, step S11 may be performed either before or after steps S12 and S13.

Next, the selection unit 140 selects, from the response sentence DB 170, a response sentence corresponding to the keywords extracted in step S11 and the words extracted in step S13 (S14). The selection unit 140 then records the selected response sentence in the dialogue recording unit 160 (S15). The output unit 150 converts the selected response sentence into speech and outputs it to the user 200 (S16).
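Steps S11 to S16 can be strung together in a minimal Python sketch of one response cycle. The assumptions are the ones used for illustration elsewhere: regex tokenization with a stopword list stands in for morphological analysis, the keyword is simply the most frequent recorded word, the best response is the one sharing the most terms, and a `speak` callback (default `print`) stands in for speech synthesis. The function name `respond` is hypothetical.

```python
import re
from collections import Counter, deque

# Hypothetical stopword list; stands in for morphological analysis.
STOPWORDS = {"what", "is", "of", "the", "me", "tell", "and", "in", "about", "s", "made"}


def words_in(text):
    """Content words of a text, in order."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def respond(utterance, history, response_db, speak=print):
    """One pass of the FIG. 2 flow; `history` is the dialogue record (oldest first)."""
    # S11: keywords from past dialogue (here: the single most frequent word, an assumption).
    counts = Counter(w for text in history for w in words_in(text))
    keywords = [w for w, _ in counts.most_common(1)]
    # S12: receive the utterance (already converted to text here) and record it.
    history.append(utterance)
    # S13: extract words from the utterance.
    words = words_in(utterance)
    # S14: pick the response sharing the most terms with words + keywords (first on ties).
    terms = set(words) | set(keywords)
    response = max(response_db, key=lambda s: len(terms & set(words_in(s))))
    # S15: record the selected response.
    history.append(response)
    # S16: output; `speak` stands in for speech synthesis.
    speak(response)
    return response


db = [
    "Saturn's diameter is about XX kilometers.",
    "Jupiter's diameter is about XX kilometers.",
]
# Dialogue record limited to the last 3 dialogues (6 texts); the limit is hypothetical.
history = deque(
    ["What is Jupiter made of?",
     "Jupiter is made of gas and is the largest planet in the solar system."],
    maxlen=6,
)
respond("Tell me the diameter", history, db)  # prints: Jupiter's diameter is about XX kilometers.
```

Because S11 reads only the recorded history, it could equally run after S12 and S13, which matches the note that their order does not matter in Embodiment 1.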

Thus, in Embodiment 1 of the present invention, the selected response sentence corresponds not only to the words contained in the utterance received from the user 200 in step S12 but also to the words contained in the dialogue content recorded earlier in the dialogue recording unit 160. Therefore, even when the user, relying on the preceding dialogue, makes an utterance that omits an important keyword, the user's intention can be grasped accurately and a response sentence consistent with the preceding dialogue can be output.

<Embodiment 2 of the Invention>
Embodiment 2 of the present invention is an improvement on Embodiment 1 described above. In Embodiment 1, the response sentence DB 170 is searched to obtain a response sentence each time an utterance from the user 200 is received. That is, every time words are input, they must be matched against the response sentences in the response sentence DB 170. As a result, a certain amount of processing time is needed between receiving an utterance and responding. To keep the conversation with the user 200 more natural, it is desirable to shorten this response time.

The response sentence DB 170 in FIG. 1 stores a large number of response sentences so as to handle utterances from the user 200 on a wide variety of topics. However, when a dialogue continues on a specific topic, the response sentences actually selected are limited to those related to that topic. Therefore, in Embodiment 2 of the present invention, candidate response sentences are selected in advance from the response sentence DB 170, based on the history of dialogues already conducted, before a new utterance is received from the user 200. For example, candidate response sentences for the utterance expected next are searched for in parallel with the immediately preceding response processing and stored as a cache. This eliminates the need to match against the large response sentence DB 170 after a new utterance is received, and so shortens the response processing time.

FIG. 3 is a diagram showing the configuration of a voice dialogue system 100a according to Embodiment 2 of the present invention. The voice dialogue system 100a is an improvement on the voice dialogue system 100 described above; components identical to those of the voice dialogue system 100 are denoted by the same reference signs, and detailed descriptions of them are omitted.

The voice dialogue system 100a differs from the voice dialogue system 100 in that the keyword extraction unit 110 is replaced by a keyword extraction unit 110a, the selection unit 140 is replaced by a selection unit 140a, and a word importance DB 180 and a cache 190 are added.

The word importance DB 180 is a database in which an importance based on a predetermined criterion is defined for each of a plurality of words. The predetermined criterion is, for example, one defined for each of a plurality of topics or themes. The importance is calculated for each word contained in the set of documents on a given topic, on the basis of the word's frequency of occurrence and the like. For example, the importance can be calculated using a known technique such as tf-idf (term frequency-inverse document frequency). Alternatively, the importance of each word may be judged by a person for each topic and set in advance.
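One way to fill the word importance DB 180 with tf-idf scores is sketched below in Python. The patent does not fix the formulation, so this sketch assumes each topic's texts are pooled into a single "document": tf is the word's share of that topic's tokens and idf is ln(N / df), with N the number of topics and df the number of topics whose texts contain the word. Libraries such as scikit-learn offer smoothed variants; this plain version is only illustrative.

```python
import math
import re
from collections import Counter


def word_importance(topic_docs):
    """Map {topic: [text, ...]} to {topic: {word: tf-idf score}}.

    Assumed formulation: each topic's texts are pooled into one "document";
    tf = the word's share of that topic's tokens; idf = ln(N / df), where N is the
    number of topics and df the number of topics whose texts contain the word.
    """
    counts = {
        topic: Counter(w for text in texts for w in re.findall(r"[a-z]+", text.lower()))
        for topic, texts in topic_docs.items()
    }
    n_topics = len(counts)
    df = Counter(w for c in counts.values() for w in c)  # each topic counted once per word
    table = {}
    for topic, c in counts.items():
        total = sum(c.values())
        table[topic] = {w: (n / total) * math.log(n_topics / df[w]) for w, n in c.items()}
    return table


importance = word_importance({
    "planets": ["Jupiter is a planet", "Jupiter is large"],
    "food": ["Pizza is tasty"],
})
# "is" occurs in both topics, so its idf (and importance) is 0.
print(importance["planets"]["is"], round(importance["planets"]["jupiter"], 3))  # → 0.0 0.198
```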

The cache 190 is a partial storage unit that can store some of the plurality of response sentences. That is, the amount of data stored in the cache 190 is smaller than the amount of data stored in the response sentence DB 170. The cache 190 may also be realized by a storage device faster than the one that realizes the response sentence DB 170, for example a primary storage device.

In addition to having the functions of the keyword extraction unit 110, when a plurality of words are extracted from the dialogue recording unit 160 as keywords, the keyword extraction unit 110a refers to the word importance DB 180 and uses as keywords only those extracted words whose importance is equal to or greater than a predetermined value. The predetermined value can be set arbitrarily. The keyword extraction unit 110a runs before a new utterance is input from the user. For example, it runs in parallel with the processing of the selection unit 140a and the output unit 150 for the immediately preceding utterance. Alternatively, it may run when a pause of at least a certain length occurs before the user 200 makes the next utterance.
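The importance filter applied by the keyword extraction unit 110a can be sketched in Python as follows. The importance scores and the threshold value 0.5 are hypothetical; in the system they would come from the word importance DB 180 and the arbitrarily set predetermined value.

```python
import re


def extract_keywords(history, importance, threshold=0.5):
    """Words in the dialogue record whose importance is >= `threshold`.

    `importance` maps word -> score (e.g. loaded from a word importance DB);
    unknown words score 0. Order of first appearance is kept, duplicates dropped.
    """
    keywords = []
    for text in history:
        for w in re.findall(r"[a-z]+", text.lower()):
            if w not in keywords and importance.get(w, 0.0) >= threshold:
                keywords.append(w)
    return keywords


# Hypothetical importance scores and threshold.
importance = {"jupiter": 0.9, "gas": 0.4, "planet": 0.6}
history = ["What is Jupiter made of?", "Jupiter is made of gas and is the largest planet."]
print(extract_keywords(history, importance))  # → ['jupiter', 'planet']
```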

The selection unit 140a includes a related sentence selection unit 141 and a response sentence selection unit 142. Before a new utterance is input from the user, the related sentence selection unit 141 selects, from the response sentence DB 170, a plurality of response sentences (related sentences) containing the extracted keywords, and stores the selected response sentences in the cache 190. After a new utterance is input from the user 200, the response sentence selection unit 142 selects, from the cache 190, a response sentence corresponding to the words included in the utterance. As in Embodiment 1, the response sentence selection unit 142 may select a response sentence corresponding to a word by identifying another word related to that word and selecting a response sentence containing the identified word.
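The split between the related sentence selection unit 141 (fills the cache before the utterance) and the response sentence selection unit 142 (searches only the cache afterwards) can be sketched in Python as below. The class and method names are hypothetical, a plain list stands in for the cache 190, and the rule that a related sentence must contain at least one keyword is an assumption.

```python
import re


def tokens(text):
    """Set of lowercase alphabetic tokens (stand-in for morphological analysis)."""
    return set(re.findall(r"[a-z]+", text.lower()))


class RelatedSentenceCache:
    """Sketch of units 141/142 with cache 190 (hypothetical class and method names)."""

    def __init__(self, response_db):
        self.response_db = response_db
        self.cache = []

    def prefetch(self, keywords):
        # Unit 141, before the next utterance: keep DB sentences containing
        # at least one keyword (the "at least one" rule is an assumption).
        kw = set(keywords)
        self.cache = [s for s in self.response_db if kw & tokens(s)]

    def select(self, words):
        # Unit 142, after the utterance: scan only the small cache. Returns the
        # sentence sharing the most words (first on ties), or None if the cache is empty.
        return max(self.cache, key=lambda s: len(set(words) & tokens(s)), default=None)


db = [
    "Saturn's diameter is about XX km.",
    "Jupiter's diameter is about XX km.",
    "Jupiter is made of gas.",
]
cache = RelatedSentenceCache(db)
cache.prefetch(["jupiter"])        # cache now holds the two Jupiter sentences
print(cache.select(["diameter"]))  # → Jupiter's diameter is about XX km.
```

The expensive full-DB scan happens in `prefetch`, which can run while the previous response is being spoken; `select` then touches only the narrowed candidate list.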

In this embodiment, the keyword extraction unit 110a and the word importance DB 180 are not essential; the processing time can be shortened even without them. Using the keyword extraction unit 110a and the word importance DB 180, however, improves the accuracy with which response sentences are selected.

FIG. 4 is a flowchart showing the flow of response processing according to Embodiment 2 of the present invention. Descriptions of processing equivalent to that in FIG. 2 are omitted below. The keyword extraction unit 110a extracts keywords from the past dialogue records and the word importance (S11a). Specifically, during or after the response processing for the immediately preceding utterance, and before the next utterance is input, the keyword extraction unit 110a extracts a plurality of words from the dialogue recording unit 160. It then uses as keywords those extracted words whose importance is equal to or greater than the predetermined value.

Next, the related sentence selection unit 141 selects, from the response sentence DB 170, a plurality of response sentences containing the keywords as related sentences (S11b). The related sentence selection unit 141 selects only a subset of the response sentences in the response sentence DB 170. The related sentence selection unit 141 then stores the selected related sentences in the cache 190 (S11c).

Steps S12 and S13 are then executed as in FIG. 2. The response sentence selection unit 142 then selects, from the cache 190, a response sentence corresponding to the words extracted in step S13 (S14a). Steps S15 and S16 are then executed as in FIG. 2.
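The FIG. 4 flow, with S11a to S11c running concurrently with the output of the previous response, can be sketched in Python using a single worker thread from `concurrent.futures`. The class name, the stopword list, the importance threshold, and the fallback to the full DB when no cache is available yet (first utterance) or the cache is empty are all assumptions; the patent does not specify that fallback.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stopword list (stand-in for morphological analysis).
STOPWORDS = {"what", "is", "of", "the", "me", "tell", "and", "in", "about", "s"}


def words_in(text):
    """Set of content words in a text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


class Embodiment2Dialogue:
    """Hypothetical sketch of the FIG. 4 flow with prefetching in a worker thread."""

    def __init__(self, response_db, importance, threshold=0.5):
        self.response_db = response_db
        self.importance = importance
        self.threshold = threshold
        self.history = []
        self.pending = None  # Future that yields the cache for the next utterance
        self.executor = ThreadPoolExecutor(max_workers=1)

    def _prefetch(self, history):
        # S11a: keywords = recorded words whose importance >= threshold.
        keywords = {w for text in history for w in words_in(text)
                    if self.importance.get(w, 0.0) >= self.threshold}
        # S11b/S11c: related sentences containing a keyword become the cache.
        return [s for s in self.response_db if keywords & words_in(s)]

    def handle(self, utterance, speak=print):
        cache = self.pending.result() if self.pending else []
        # Assumption: fall back to the full DB when no usable cache exists.
        candidates = cache or self.response_db
        self.history.append(utterance)                                       # S12
        words = words_in(utterance)                                          # S13
        response = max(candidates, key=lambda s: len(words & words_in(s)))   # S14a
        self.history.append(response)                                        # S15
        # Start S11a-S11c for the next utterance while the response is output.
        self.pending = self.executor.submit(self._prefetch, list(self.history))
        speak(response)                                                      # S16
        return response

    def close(self):
        self.executor.shutdown()


db = [
    "Saturn's diameter is about XX kilometers.",
    "Jupiter's diameter is about XX kilometers.",
    "Jupiter is made of gas and is the largest planet in the solar system.",
]
dialogue = Embodiment2Dialogue(db, importance={"jupiter": 0.9})
dialogue.handle("What is Jupiter made of?")  # prints the "made of gas" sentence
dialogue.handle("Tell me the diameter")      # prints: Jupiter's diameter is about XX kilometers.
dialogue.close()
```

Passing a copy of the history to the worker keeps the background prefetch from racing with later appends; `Future.result()` blocks only if the next utterance arrives before the prefetch finishes.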

Thus, in this embodiment, candidate response sentences are listed in advance on the basis of past dialogue records, and when the next utterance is made, the response sentence is selected from among these pre-listed (narrowed-down) candidates, which shortens the processing time. Therefore, even when an utterance omits an important keyword in reliance on the preceding dialogue, the user's intention can be grasped accurately in a short time.

<Other Embodiments of the Invention>
The present invention is not limited to the above embodiments and can be modified as appropriate without departing from its spirit. For example, although the above embodiments describe the present invention as a hardware configuration, the present invention is not limited to this. Any of the processing of the present invention can also be realized by causing a CPU (Central Processing Unit) to execute a computer program.

In the above examples, the program can be stored using various types of non-transitory computer-readable media and supplied to a computer. Non-transitory computer-readable media include various types of tangible storage media. Examples of non-transitory computer-readable media include magnetic recording media (e.g., flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, DVD (Digital Versatile Disc), BD (Blu-ray (registered trademark) Disc), and semiconductor memories (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (Random Access Memory)). The program may also be supplied to a computer by various types of transitory computer-readable media. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves. Transitory computer-readable media can supply the program to a computer via a wired communication path, such as an electric wire or an optical fiber, or via a wireless communication path.

DESCRIPTION OF SYMBOLS
100 Voice dialogue system
100a Voice dialogue system
110 Keyword extraction unit
110a Keyword extraction unit
120 Utterance reception unit
130 Word extraction unit
140 Selection unit
140a Selection unit
141 Related sentence selection unit
142 Response sentence selection unit
150 Output unit
160 Dialogue recording unit
170 Response sentence DB
180 Word importance DB
190 Cache
200 User

Claims

1. A voice dialogue system that conducts, a plurality of times in succession, a dialogue consisting of an utterance from a user and a response to the utterance, the voice dialogue system comprising:
a response sentence database that stores in advance a plurality of response sentences indicating candidate responses;
a dialogue recording unit that records the content of a predetermined number of past dialogues with the user;
a keyword extraction unit that extracts keywords in the dialogue content from the dialogue recording unit;
a word extraction unit that, when a new utterance is input by the user, extracts a word included in the utterance;
a selection unit that selects, from among the plurality of response sentences stored in the response sentence database, a response sentence corresponding to the extracted word and the extracted keyword; and
an output unit that outputs the selected response sentence to the user by voice as a response to the utterance.

2. The voice dialogue system according to claim 1, further comprising a partial storage unit capable of storing some of the plurality of response sentences, wherein the selection unit:
selects, before a new utterance is input by the user, a plurality of response sentences containing the extracted keyword from the response sentence database;
stores the selected plurality of response sentences in the partial storage unit; and
selects, after a new utterance is input by the user, a response sentence containing a word included in the utterance from the partial storage unit.

3. The voice dialogue system according to claim 1, further comprising a word importance database in which an importance based on a predetermined criterion is defined for each of a plurality of words, wherein, when a plurality of words are extracted from the dialogue recording unit as the keywords, the keyword extraction unit refers to the word importance database and uses as the keywords those of the extracted words whose importance is equal to or greater than a predetermined value.

4. The voice dialogue system according to any one of claims 1 to 3, wherein the selection unit identifies another word related to the extracted word or the extracted keyword, and selects, as the corresponding response sentence, a response sentence from among the plurality of response sentences that lacks the extracted word or the extracted keyword but contains the identified other word.

5. A voice dialogue method using a voice dialogue system that conducts, a plurality of times in succession, a dialogue consisting of an utterance from a user and a response to the utterance, the voice dialogue system comprising:
a response sentence database that stores in advance a plurality of response sentences indicating candidate responses; and
a dialogue recording unit that records the content of a predetermined number of past dialogues with the user,
the voice dialogue method comprising:
extracting keywords in the dialogue content from the dialogue recording unit;
when a new utterance is input by the user, extracting a word included in the utterance;
selecting, from among the plurality of response sentences stored in the response sentence database, a response sentence corresponding to the extracted word and the extracted keyword; and
outputting the selected response sentence to the user by voice as a response to the utterance.