JP2013072887A

JP2013072887A - Interactive device

Info

Publication number: JP2013072887A
Application number: JP2011209504A
Authority: JP
Inventors: Yuka Kobayashi; 優佳小林; Daisuke Yamamoto; 大介山本; Miwako Doi; 美和子土井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2011-09-26
Filing date: 2011-09-26
Publication date: 2013-04-22

Abstract

PROBLEM TO BE SOLVED: To achieve natural dialogue. SOLUTION: A dialogue apparatus includes an utterance set storage unit, an utterance set acquisition unit, a first output unit, a detection unit, and a second output unit. The utterance set storage unit stores an utterance set including a first utterance and a second utterance, where the second utterance is a reply to the user utterance expected in response to the first utterance. The utterance set acquisition unit acquires an utterance set. The first output unit outputs the first utterance included in the acquired utterance set. The detection unit detects a user utterance made after the first utterance is output. When a user utterance is detected, the second output unit outputs the second utterance included in the acquired utterance set.

Description

Embodiments of the present invention relate to a dialogue apparatus.

Dialogue apparatuses that converse with a user are conventionally known. Dialogue using such an apparatus includes, for example, dialogue based on understanding text entered by the user and dialogue based on understanding the user's speech. To understand the user's input text or speech accurately and achieve more natural dialogue, such apparatuses may perform advanced language processing or speech recognition processing.

JP 2006-331343 A; JP 2006-31467 A; JP 2010-79574 A

However, the conventional technology has the problem that natural dialogue is difficult to achieve when the user's input text or speech cannot be understood accurately. Specifically, errors can occur in language processing and speech recognition processing, and such errors make natural dialogue difficult to achieve.

The problem to be solved by the present invention is to provide a dialogue apparatus capable of achieving natural dialogue.

The dialogue apparatus of an embodiment includes an utterance set storage unit, an utterance set acquisition unit, a first output unit, a detection unit, and a second output unit. The utterance set storage unit stores an utterance set including a first utterance and a second utterance, where the second utterance is a reply to the user utterance expected in response to the first utterance. The utterance set acquisition unit acquires an utterance set. The first output unit outputs the first utterance included in the acquired utterance set. The detection unit detects a user utterance made after the first utterance is output. The second output unit outputs the second utterance included in the acquired utterance set when a user utterance is detected.

FIG. 1 is a block diagram illustrating a configuration example of the dialogue apparatus according to the first embodiment.
FIG. 2 is a diagram illustrating an example of information stored in the utterance set storage unit according to the first embodiment.
FIG. 3 is a flowchart illustrating an example of the flow of dialogue processing according to the first embodiment.
FIG. 4 is a block diagram illustrating a configuration example of the dialogue apparatus according to the second embodiment.
FIG. 5A is a diagram illustrating an example of information stored in the utterance template storage unit according to the second embodiment.
FIG. 5B is a diagram illustrating an example of information stored in the utterance template storage unit according to the second embodiment.
FIG. 6 is a flowchart illustrating an example of the flow of utterance set generation processing according to the second embodiment.
FIG. 7 is a block diagram illustrating a configuration example of the dialogue apparatus according to the third embodiment.
FIG. 8 is a diagram illustrating an example of information stored in the unique word storage unit according to the third embodiment.
FIG. 9 is a flowchart illustrating an example of the flow of utterance set generation processing according to the third embodiment.
FIG. 10 is a block diagram illustrating a configuration example of the dialogue apparatus according to the fourth embodiment.
FIG. 11 is a flowchart illustrating an example of the flow of utterance set generation processing according to the fourth embodiment.
FIG. 12 is a block diagram illustrating a configuration example of the dialogue apparatus according to the fifth embodiment.
FIG. 13 is a diagram illustrating an example of information stored in the utterance template storage unit according to the fifth embodiment.
FIG. 14 is a diagram explaining the order determination processing according to the fifth embodiment.
FIG. 15 is a flowchart illustrating an example of the flow of dialogue processing according to the fifth embodiment.
FIG. 16 is a block diagram illustrating a configuration example of the dialogue apparatus according to the sixth embodiment.
FIG. 17 is a flowchart illustrating an example of the flow of dialogue processing according to the sixth embodiment.
FIG. 18 is a block diagram illustrating a configuration example of the dialogue apparatus according to the seventh embodiment.
FIG. 19 is a flowchart illustrating an example of the flow of dialogue processing according to the seventh embodiment.
FIG. 20 is a flowchart illustrating an example of the flow of dialogue processing when a repeat of an utterance is requested.
FIG. 21 is a diagram illustrating an example of information stored in an utterance set storage unit that includes a third utterance.
FIG. 22 is a flowchart illustrating an example of the flow of dialogue processing when no user utterance is detected within a fixed time.

(First Embodiment)
FIG. 1 is a block diagram illustrating a configuration example of the dialogue apparatus according to the first embodiment. For example, as illustrated in FIG. 1, the dialogue apparatus 100 includes an utterance set storage unit 101, an utterance set acquisition unit 102, an output unit 103, and a detection unit 104. The dialogue apparatus 100 carries out dialogue with the user by outputting text or speech to a predetermined display/output device equipped with, for example, a display or a speaker.

The utterance set storage unit 101 stores utterance sets, each including a first utterance and a second utterance, where the second utterance is a reply to the user utterance expected in response to the first utterance. The information stored in the utterance set storage unit 101 according to the first embodiment is described here with reference to FIG. 2, which illustrates an example of that information.

For example, as illustrated in FIG. 2, the utterance set storage unit 101 stores “utterance set 1”, which includes the first utterance “Do you like movies?”, the assumed user utterance “Yes, I do”, which represents the user utterance expected in response to the first utterance, and the second utterance “I see”. For convenience of explanation, FIG. 2 shows the utterance set storage unit 101 with the “assumed user utterance” included, but the “assumed user utterance” need not be stored in the utterance set storage unit 101. In other words, as FIG. 2 shows, the utterance set storage unit 101 stores utterance sets that yield natural dialogue whatever the user says in response to the first utterance.
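As a minimal sketch of the information in FIG. 2, an utterance set can be modeled as a small record. The names `UtteranceSet`, `first`, `second`, and `assumed_user_utterance` are illustrative assumptions, not identifiers from the publication; the assumed user utterance is optional because it need not be stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UtteranceSet:
    """One entry of the utterance set storage unit (hypothetical layout)."""
    first: str                                    # utterance output first
    second: str                                   # reply output after any user utterance
    assumed_user_utterance: Optional[str] = None  # explanatory only; never matched


# "Utterance set 1" from FIG. 2, stored without the assumed user utterance.
utterance_set_1 = UtteranceSet(first="Do you like movies?", second="I see")
```

Because the second utterance is chosen to fit whatever the user says, nothing in the record depends on the content of the user's reply.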

The utterance set acquisition unit 102 acquires an utterance set. For example, the utterance set acquisition unit 102 acquires an utterance set stored in the utterance set storage unit 101 at an arbitrary timing, such as when the user stands in front of the dialogue apparatus 100 or when the user is ready to use the dialogue apparatus 100. As a concrete example, the utterance set acquisition unit 102 acquires the first utterance “Do you like movies?” and the second utterance “I see”, which make up “utterance set 1” stored in the utterance set storage unit 101.

The output unit 103 outputs the utterance set acquired by the utterance set acquisition unit 102. The output unit 103 includes a first output unit 103a and a second output unit 103b. Of these, the first output unit 103a outputs the first utterance included in the utterance set acquired by the utterance set acquisition unit 102. As a concrete example, the first output unit 103a outputs the first utterance “Do you like movies?”, included in “utterance set 1” acquired by the utterance set acquisition unit 102, to the predetermined display/output device. The processing performed by the second output unit 103b is described later.

The detection unit 104 detects a user utterance made after the first utterance is output. For example, the detection unit 104 detects a user utterance after the first output unit 103a has output the first utterance. As a concrete example, after the first output unit 103a outputs the first utterance “Do you like movies?” to the predetermined display/output device, the detection unit 104 detects that the user has said “Yes, I do” or the like. The user utterance is not limited to this example and may be any utterance.

The second output unit 103b outputs the second utterance included in the utterance set acquired by the utterance set acquisition unit 102 when the detection unit 104 detects a user utterance. As a concrete example, when the detection unit 104 detects the user utterance “Yes, I do”, the second output unit 103b outputs the second utterance “I see”, included in “utterance set 1” acquired by the utterance set acquisition unit 102, to the predetermined display/output device. However, if the detection unit 104 detects no user utterance for a fixed time, the second output unit 103b does not output the second utterance, and the utterance set acquisition unit 102 is made to acquire the next utterance set.

Next, the flow of dialogue processing according to the first embodiment is described with reference to FIG. 3, which is a flowchart illustrating an example of that flow.

For example, as illustrated in FIG. 3, the utterance set acquisition unit 102 acquires an utterance set from the utterance set storage unit 101 (step S101). The first output unit 103a then outputs the first utterance included in the utterance set acquired by the utterance set acquisition unit 102 to the predetermined display/output device (step S102).

The detection unit 104 then determines whether a user utterance in response to the first utterance output by the first output unit 103a has been detected (step S103). If the detection unit 104 has detected a user utterance (Yes in step S103), the second output unit 103b outputs the second utterance included in the utterance set acquired by the utterance set acquisition unit 102 to the predetermined display/output device (step S104). After the second utterance is output, the utterance set acquisition unit 102 acquires the next utterance set from the utterance set storage unit 101 (step S101).

On the other hand, if no user utterance has been detected (No in step S103), the detection unit 104 determines whether a fixed time has elapsed (step S105). If the fixed time has not elapsed (No in step S105), the detection unit 104 again determines whether a user utterance has been detected (step S103). If the fixed time has elapsed (Yes in step S105), the utterance set acquisition unit 102 acquires the next utterance set from the utterance set storage unit 101 (step S101).
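The flow of steps S101 to S105 can be sketched as a simple loop. This is only an illustration under stated assumptions: `run_dialogue`, `output`, `user_spoke`, `timeout_s`, and `clock` are hypothetical names, the detector is modeled as a polling callback that reports whether any utterance occurred, and a real device would more likely block on an audio event than poll.

```python
import time
from typing import Callable, Iterable, Tuple


def run_dialogue(
    utterance_sets: Iterable[Tuple[str, str]],
    output: Callable[[str], None],
    user_spoke: Callable[[], bool],
    timeout_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Walk through steps S101-S105 for each (first, second) utterance pair."""
    for first, second in utterance_sets:   # S101: acquire an utterance set
        output(first)                      # S102: output the first utterance
        deadline = clock() + timeout_s
        while True:
            if user_spoke():               # S103: any utterance counts; content is ignored
                output(second)             # S104: output the second utterance
                break
            if clock() >= deadline:        # S105: fixed time elapsed, skip the reply
                break
```

Passing the clock in as a parameter is a design choice made here so the timeout branch can be exercised without waiting in real time.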

According to the present embodiment, the dialogue proceeds using an utterance that constrains what the user will say and an utterance that replies to the user's utterance, so natural dialogue can be achieved. In other words, natural dialogue can be achieved without recognizing the content of the user's utterance.

(Second Embodiment)
FIG. 4 is a block diagram illustrating a configuration example of the dialogue apparatus according to the second embodiment. In the second embodiment, functional units that perform the same processing as in the first embodiment are given the same reference numerals, and descriptions of the same processing may be omitted.

For example, as illustrated in FIG. 4, the dialogue apparatus 200 includes the utterance set storage unit 101, the utterance set acquisition unit 102, the output unit 103, the detection unit 104, a concept dictionary storage unit 205, an utterance template storage unit 206, a concept acquisition unit 207, an utterance template acquisition unit 208, and an utterance set generation unit 209. As in the first embodiment, the dialogue apparatus 200 carries out dialogue with the user by outputting text or speech to a predetermined display/output device equipped with, for example, a display or a speaker.

The concept dictionary storage unit 205 stores phrases and concepts in association with each other. For example, the words stored in the concept dictionary storage unit 205 are words in common use and do not include proper nouns, newly coined words, and the like. The utterance template storage unit 206 stores utterance set templates, each including a first utterance and a second utterance in which some phrases are expressed as concepts. The information stored in the utterance template storage unit 206 according to the second embodiment is described here with reference to FIGS. 5A and 5B, which illustrate examples of that information.

For example, as illustrated in FIG. 5A, the utterance template storage unit 206 stores the utterance set template “template 1”, which includes the first utterance “Do you like [food]?”, the assumed user utterance “I like it”, and the second utterance “[food] is delicious”. Here, “food” enclosed in “[]” is the concept of a word. That is, in an utterance set template stored in the utterance template storage unit 206, a word in an utterance is expressed as a concept.

As another example, as illustrated in FIG. 5B, the utterance template storage unit 206 stores the utterance set template “template 1”, which includes the first utterance “Do you like [food]?”, the assumed user utterance “I like it”, and the second utterance “[food] is [taste]”. As before, “food” and “taste” enclosed in “[]” are concepts of words. That is, in an utterance set template stored in the utterance template storage unit 206, a plurality of words in the utterances may be expressed as concepts. For convenience of explanation, FIGS. 5A and 5B show the utterance template storage unit 206 with the “assumed user utterance” included, but the “assumed user utterance” need not be stored in the utterance template storage unit 206.
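One way to picture the templates of FIGS. 5A and 5B is as strings with bracketed concept slots. The following sketch assumes a dictionary layout with `first` and `second` keys and a hypothetical helper `concepts_in`; the publication does not specify a storage format.

```python
import re

# A concept slot is any text enclosed in square brackets, e.g. "[food]".
CONCEPT_SLOT = re.compile(r"\[([^\[\]]+)\]")


def concepts_in(template: dict) -> set:
    """Return the set of concept names used anywhere in a template."""
    found = set()
    for utterance in (template["first"], template["second"]):
        found.update(CONCEPT_SLOT.findall(utterance))
    return found


template_5a = {"first": "Do you like [food]?", "second": "[food] is delicious"}
template_5b = {"first": "Do you like [food]?", "second": "[food] is [taste]"}
```

Listing the concepts a template uses makes it straightforward to decide later which template fits a given set of input words.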

When a phrase is input, the concept acquisition unit 207 acquires the concept corresponding to that phrase from the concept dictionary storage unit 205. As a concrete example, the concept acquisition unit 207 receives the word “chocolate” as input through the user's operation of the dialogue apparatus 200. The concept acquisition unit 207 then acquires the concept “food”, which corresponds to the received word “chocolate”, from the concept dictionary storage unit 205. As an example with a plurality of input words, the concept acquisition unit 207 receives the words “chocolate” and “sweet” as input through the user's operation of the dialogue apparatus 200. The concept acquisition unit 207 then acquires the concepts “food” and “taste”, which correspond to the received words “chocolate” and “sweet”, from the concept dictionary storage unit 205. The dialogue apparatus 200 ends the processing if the concept corresponding to a word is not registered in the concept dictionary storage unit 205.
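The lookup performed by the concept acquisition unit can be sketched as a dictionary query. The dictionary contents and the name `acquire_concepts` are assumptions for illustration; returning `None` stands in for "end the processing", and treating one unregistered word among several as a failure is likewise an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical contents of the concept dictionary storage unit 205.
CONCEPT_DICTIONARY: Dict[str, str] = {"chocolate": "food", "sweet": "taste"}


def acquire_concepts(words: List[str]) -> Optional[Dict[str, str]]:
    """Map each concept to its input word; None if any word is unregistered."""
    concept_to_word: Dict[str, str] = {}
    for word in words:
        concept = CONCEPT_DICTIONARY.get(word)
        if concept is None:
            return None  # no registered concept: processing ends
        concept_to_word[concept] = word
    return concept_to_word
```

Keying the result by concept rather than by word keeps the later substitution step simple, at the cost that two input words sharing a concept would overwrite each other.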

The utterance template acquisition unit 208 acquires, from the utterance template storage unit 206, an utterance set template that includes the concept acquired by the concept acquisition unit 207. As a concrete example, the utterance template acquisition unit 208 acquires the utterance set template “template 1”, which includes the concept “food” acquired by the concept acquisition unit 207, from the utterance template storage unit 206 (see FIG. 5A). As an example with a plurality of input words, the utterance template acquisition unit 208 acquires the utterance set template “template 1”, which includes the concepts “food” and “taste” acquired by the concept acquisition unit 207, from the utterance template storage unit 206 (see FIG. 5B). The dialogue apparatus 200 ends the processing if no utterance set template including the concept is registered in the utterance template storage unit 206.

The utterance set generation unit 209 generates a new utterance set by inserting the input phrase in place of the concept included in the utterance set template acquired by the utterance template acquisition unit 208. The utterance set generation unit 209 then stores the generated utterance set in the utterance set storage unit 101.

As a concrete example, the utterance set generation unit 209 generates a new utterance set by inserting the received word “chocolate” in place of the concept “food” included in the utterance set template “template 1” (see FIG. 5A) acquired by the utterance template acquisition unit 208. The utterance set generated in this example includes the first utterance “Do you like chocolate?”, the assumed user utterance “I like it”, and the second utterance “Chocolate is delicious”. The utterance set generation unit 209 then stores the generated utterance set in the utterance set storage unit 101.

As an example with a plurality of input words, the utterance set generation unit 209 generates a new utterance set by inserting the received words “chocolate” and “sweet” in place of the concepts “food” and “taste” included in the utterance set template “template 1” (see FIG. 5B) acquired by the utterance template acquisition unit 208. The utterance set generated in this example includes the first utterance “Do you like chocolate?”, the assumed user utterance “I like it”, and the second utterance “Chocolate is sweet”. The utterance set generation unit 209 then stores the generated utterance set in the utterance set storage unit 101.
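The substitution step can be sketched as a regular-expression replacement over bracketed slots. The function names and the dictionary layout are assumptions; capitalizing a word that lands at the start of an English sentence is left out of the sketch, so the generated text stays lowercase where the word is inserted.

```python
import re
from typing import Dict

SLOT = re.compile(r"\[([^\[\]]+)\]")


def fill(utterance: str, word_for_concept: Dict[str, str]) -> str:
    """Replace every [concept] slot with the word supplied for that concept."""
    # A KeyError here means the template uses a concept with no input word.
    return SLOT.sub(lambda m: word_for_concept[m.group(1)], utterance)


def generate_utterance_set(template: Dict[str, str],
                           word_for_concept: Dict[str, str]) -> Dict[str, str]:
    """Build a new utterance set from a template with 'first' and 'second' keys."""
    return {key: fill(template[key], word_for_concept) for key in ("first", "second")}


new_set = generate_utterance_set(
    {"first": "Do you like [food]?", "second": "[food] is [taste]"},
    {"food": "chocolate", "taste": "sweet"},
)
```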

Next, the flow of utterance set generation processing according to the second embodiment is described with reference to FIG. 6, which is a flowchart illustrating an example of that flow.

For example, as illustrated in FIG. 6, when the concept acquisition unit 207 receives a word as input (Yes in step S201), it searches the concept dictionary storage unit 205 for the concept corresponding to that word (step S202). When no word has been received (No in step S201), the concept acquisition unit 207 waits for a word to be input.

If the concept acquisition unit 207 has acquired the concept corresponding to the word from the concept dictionary storage unit 205 (Yes in step S203), the utterance template acquisition unit 208 searches the utterance template storage unit 206 for an utterance set template that includes the acquired concept (step S204). If the concept acquisition unit 207 has not acquired a concept corresponding to the word from the concept dictionary storage unit 205 (No in step S203), the dialogue apparatus 200 ends the processing.

In step S204, the utterance template acquisition unit 208 searches the utterance template storage unit 206 for an utterance set template that includes the concept acquired by the concept acquisition unit 207. If the utterance template acquisition unit 208 has acquired an utterance set template from the utterance template storage unit 206 (Yes in step S205), the utterance set generation unit 209 generates a new utterance set by inserting the input word in place of the concept included in the acquired template (step S206). The utterance set generation unit 209 then stores the generated utterance set in the utterance set storage unit 101. If the utterance template acquisition unit 208 has not acquired an utterance set template from the utterance template storage unit 206 (No in step S205), the dialogue apparatus 200 ends the processing.
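Steps S202 to S206 can be tied together in one function, shown here as a sketch under several assumptions: all names are hypothetical, a template is taken to match when its concept slots are exactly the acquired concepts (the publication only says the template "includes" them), `None` stands in for "end the processing", and sentence-initial capitalization is not handled.

```python
import re
from typing import Dict, List, Optional

SLOT = re.compile(r"\[([^\[\]]+)\]")


def generate_from_words(
    words: List[str],
    concept_dictionary: Dict[str, str],
    templates: List[Dict[str, str]],
    utterance_set_store: List[Dict[str, str]],
) -> Optional[Dict[str, str]]:
    """Run S202-S206 for words already received in S201."""
    # S202/S203: look up a concept for every input word.
    by_concept: Dict[str, str] = {}
    for word in words:
        concept = concept_dictionary.get(word)
        if concept is None:
            return None
        by_concept[concept] = word
    # S204/S205: search for a template whose slots are exactly the acquired concepts.
    for template in templates:
        slots = set(SLOT.findall(template["first"] + " " + template["second"]))
        if slots == set(by_concept):
            break
    else:
        return None
    # S206: insert the words into the slots, then store the new utterance set.
    new_set = {
        key: SLOT.sub(lambda m: by_concept[m.group(1)], template[key])
        for key in ("first", "second")
    }
    utterance_set_store.append(new_set)
    return new_set
```

Appending to a caller-supplied list mirrors storing the result in the utterance set storage unit 101, so the dialogue loop of the first embodiment can pick up generated sets without further changes.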

According to the present embodiment, a variety of dialogues can be achieved, using not only utterance sets registered in advance but also utterance sets based on arbitrary words.

(Third Embodiment)
FIG. 7 is a block diagram illustrating a configuration example of the dialogue apparatus according to the third embodiment. In the third embodiment, functional units that perform the same processing as in the first or second embodiment are given the same reference numerals, and descriptions of the same processing may be omitted.

For example, as illustrated in FIG. 7, the dialogue apparatus 300 includes the utterance set storage unit 101, the utterance set acquisition unit 102, the output unit 103, the detection unit 104, the utterance template storage unit 206, an utterance template acquisition unit 308, an utterance set generation unit 309, and a unique word storage unit 310. As in the first embodiment, the dialogue apparatus 300 carries out dialogue with the user by outputting text or speech to a predetermined display/output device equipped with, for example, a display or a speaker.

The unique word storage unit 310 stores unique phrases acquired from outside in association with the concepts of those phrases. The information stored in the unique word storage unit 310 according to the third embodiment is described here with reference to FIG. 8, which illustrates an example of that information.

For example, as illustrated in FIG. 8, the unique word storage unit 310 stores, in association with one another, the unique word “extreme heat”, the concept of the unique word “weather”, and the source “http://tenki_jouhou.jp”, which indicates where the unique word came from. A unique word is one example of a unique phrase. The information stored in the unique word storage unit 310 is updated as appropriate based on information that changes from day to day. Specifically, unique words related to weather are updated to words based on the weather in a pre-registered region (for example, “sunny”, “rain”, “extreme heat”, and “sudden torrential rain”). Similarly, unique words related to news are updated to words based on analyzed articles from a specific news site (for example, “Nadeshiko Japan”, “Great East Japan Earthquake”, and “Prime Minister XX”). Other unique words are likewise updated to words that appear frequently on microblogging services and the like (for example, “Typhoon No. XX”, “wholehearted sincerity”, and “girls' get-together”). Thus, the unique words stored in the unique word storage unit 310 differ in nature from the words registered in the concept dictionary storage unit 205 and elsewhere; for example, they include words not in common use. Registration of unique words in the unique word storage unit 310 may be performed within the dialogue apparatus 300, or a service that extracts unique words may be used.

発話テンプレート取得部３０８は、特有な語句の概念を含む発話セットのテンプレートを発話テンプレート記憶部２０６から取得する。例を挙げると、発話テンプレート取得部３０８は、特有単語記憶部３１０に記憶された特有単語「わさびソフト」の概念「食べ物」を取得する。そして、発話テンプレート取得部３０８は、取得した概念「食べ物」を含む発話セットのテンプレート「テンプレート１」を発話テンプレート記憶部２０６（図５Ａ参照）から取得する。なお、発話テンプレート取得部３０８は、特有単語記憶部３１０から特有単語の概念を取得する際、一つの様態として、特有単語記憶部３１０への登録が最新のものから取得する。 The utterance template acquisition unit 308 acquires, from the utterance template storage unit 206, an utterance set template that includes the concept of a unique phrase. For example, the utterance template acquisition unit 308 acquires the concept “food” of the unique word “wasabi soft-serve” stored in the unique word storage unit 310. It then acquires the utterance set template “template 1”, which includes the acquired concept “food”, from the utterance template storage unit 206 (see FIG. 5A). In one mode, when acquiring unique word concepts from the unique word storage unit 310, the utterance template acquisition unit 308 starts from the entries most recently registered in the unique word storage unit 310.

また、複数の特有単語の概念を取得する場合の例を挙げると、発話テンプレート取得部３０８は、特有単語記憶部３１０に記憶された特有単語「わさびソフト」、「涙が出るほど辛い」の概念「食べ物」、「味」を取得する。そして、発話テンプレート取得部３０８は、取得した概念「食べ物」、「味」を含む発話セットのテンプレート「テンプレート１」を発話テンプレート記憶部２０６（図５Ｂ参照）から取得する。 As an example in which the concepts of a plurality of unique words are acquired, the utterance template acquisition unit 308 acquires the concepts “food” and “taste” of the unique words “wasabi soft-serve” and “so spicy it brings tears” stored in the unique word storage unit 310. It then acquires the utterance set template “template 1”, which includes the acquired concepts “food” and “taste”, from the utterance template storage unit 206 (see FIG. 5B).

発話セット生成部３０９は、発話テンプレート取得部３０８によって取得された発話セットのテンプレートに含まれる概念に、特有な語句を挿入して新たな発話セットを生成する。そして、発話セット生成部３０９は、生成した新たな発話セットを発話セット記憶部１０１に格納する。 The utterance set generation unit 309 generates a new utterance set by inserting a unique word / phrase into the concept included in the utterance set template acquired by the utterance template acquisition unit 308. Then, the utterance set generation unit 309 stores the generated new utterance set in the utterance set storage unit 101.

例を挙げると、発話セット生成部３０９は、発話テンプレート取得部３０８によって取得された発話セットのテンプレート「テンプレート１」（図５Ａ参照）に含まれる概念「食べ物」に、特有単語「わさびソフト」を挿入して新たな発話セットを生成する。上記の例で生成される発話セットは、第１発話「わさびソフトは好き？」と、想定ユーザ発話「好きだよ」と、第２発話「わさびソフトはおいしいんだよ」とを含むものとなる。そして、発話セット生成部３０９は、生成した新たな発話セットを発話セット記憶部１０１に格納する。 For example, the utterance set generation unit 309 inserts the unique word “wasabi soft-serve” into the concept “food” included in the utterance set template “template 1” (see FIG. 5A) acquired by the utterance template acquisition unit 308, thereby generating a new utterance set. The utterance set generated in this example includes the first utterance “Do you like wasabi soft-serve?”, the assumed user utterance “I like it”, and the second utterance “Wasabi soft-serve is delicious”. The utterance set generation unit 309 then stores the generated new utterance set in the utterance set storage unit 101.

また、複数の特有単語の概念を取得した場合の例を挙げると、発話セット生成部３０９は、発話テンプレート取得部３０８によって取得された発話セットのテンプレート「テンプレート１」（図５Ｂ参照）に含まれる概念「食べ物」、「味」に、特有単語「わさびソフト」、「涙が出るほど辛い」を挿入して新たな発話セットを生成する。上記の例で生成される発話セットは、第１発話「わさびソフトは好き？」と、想定ユーザ発話「好きだよ」と、第２発話「わさびソフトは涙が出るほど辛いんだよ」とを含むものとなる。そして、発話セット生成部３０９は、生成した新たな発話セットを発話セット記憶部１０１に格納する。 As an example in which the concepts of a plurality of unique words have been acquired, the utterance set generation unit 309 inserts the unique words “wasabi soft-serve” and “so spicy it brings tears” into the concepts “food” and “taste” included in the utterance set template “template 1” (see FIG. 5B) acquired by the utterance template acquisition unit 308, thereby generating a new utterance set. The utterance set generated in this example includes the first utterance “Do you like wasabi soft-serve?”, the assumed user utterance “I like it”, and the second utterance “Wasabi soft-serve is so spicy it brings tears”. The utterance set generation unit 309 then stores the generated new utterance set in the utterance set storage unit 101.

なお、特有単語記憶部３１０から複数の特有単語の概念を取得する場合には、同じ素性を有する特有単語の概念を取得することが好ましい。なぜならば、何らかの関係を有する特有単語同士を利用して発話を生成することにより、より好適な発話を生成することができるからである。このことから、上記の例では、同じ素性「http：／／web．Analyze．cgi」を有する特有単語「わさびソフト」、「涙が出るほど辛い」の概念「食べ物」、「味」を取得する場合を例に挙げた。 When the concepts of a plurality of unique words are acquired from the unique word storage unit 310, it is preferable to acquire the concepts of unique words that have the same source. This is because generating an utterance from unique words that are related to each other in some way yields a more suitable utterance. For this reason, the above example acquires the concepts “food” and “taste” of the unique words “wasabi soft-serve” and “so spicy it brings tears”, which share the same source “http://web.Analyze.cgi”.
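The template filling described for the third embodiment can be sketched as follows. This is a minimal illustration and not the patented implementation: the `UniqueWord` class, the function names, the dictionary layout of a template, and the English renderings of the utterances (with わさびソフト rendered as "wasabi soft-serve") are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class UniqueWord:
    word: str      # the unique word itself
    concept: str   # concept the word belongs to
    source: str    # origin of the word (site or service it came from)


def words_from_source(words, source):
    """Keep only the unique words that share the given source."""
    return [w for w in words if w.source == source]


def fill_template(template, words):
    """Replace every [concept] slot with the unique word registered for it."""
    by_concept = {w.concept: w.word for w in words}

    def fill(text):
        # Slots with no matching unique word are left untouched.
        return re.sub(r"\[([^\]]+)\]",
                      lambda m: by_concept.get(m.group(1), m.group(0)),
                      text)

    return {key: fill(text) for key, text in template.items()}


# Hypothetical contents of the unique word store (cf. FIG. 8).
store = [
    UniqueWord("extreme heat", "weather", "http://tenki_jouhou.jp"),
    UniqueWord("wasabi soft-serve", "food", "http://web.Analyze.cgi"),
    UniqueWord("so spicy it brings tears", "taste", "http://web.Analyze.cgi"),
]

# Hypothetical English rendering of "template 1" (cf. FIG. 5B).
template_1 = {
    "first": "Do you like [food]?",
    "assumed_user": "I like it",
    "second": "[food] is [taste]",
}

same_source = words_from_source(store, "http://web.Analyze.cgi")
new_set = fill_template(template_1, same_source)
print(new_set["first"])
print(new_set["second"])
```

Restricting the words to a single source before filling mirrors the stated preference for unique words that are related to each other.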

次に、図９を用いて、第３の実施形態に係る発話セット生成処理の流れを説明する。図９は、第３の実施形態に係る発話セット生成処理の流れの例を示すフローチャートである。 Next, the flow of an utterance set generation process according to the third embodiment will be described with reference to FIG. FIG. 9 is a flowchart illustrating an example of the flow of an utterance set generation process according to the third embodiment.

例えば、図９に示すように、発話テンプレート取得部３０８は、特有単語の概念を特有単語記憶部３１０から取得する（ステップＳ３０１）。そして、発話テンプレート取得部３０８は、取得した特有単語の概念を含む発話セットのテンプレートを発話テンプレート記憶部２０６から検索する（ステップＳ３０２）。 For example, as shown in FIG. 9, the utterance template acquisition unit 308 acquires the concept of the specific word from the specific word storage unit 310 (step S301). Then, the utterance template acquisition unit 308 searches the utterance template storage unit 206 for an utterance set template including the acquired concept of the specific word (step S302).

このとき、発話テンプレート取得部３０８によって該当する発話セットのテンプレートが発話テンプレート記憶部２０６から取得された場合に（ステップＳ３０３肯定）、発話セット生成部３０９は、取得された発話セットのテンプレートに含まれる概念に、取得された特有単語を挿入して新たな発話セットを生成する（ステップＳ３０４）。 At this time, when the utterance template acquisition unit 308 acquires the template of the corresponding utterance set from the utterance template storage unit 206 (Yes in step S303), the utterance set generation unit 309 is included in the acquired utterance set template. A new utterance set is generated by inserting the acquired unique word into the concept (step S304).

その後、発話セット生成部３０９は、生成した発話セットを発話セット記憶部１０１に格納する。一方、対話装置３００は、発話テンプレート取得部３０８によって発話セットのテンプレートが発話テンプレート記憶部２０６から取得されなかった場合に（ステップＳ３０３否定）、処理を終了する。 Thereafter, the utterance set generation unit 309 stores the generated utterance set in the utterance set storage unit 101. On the other hand, when the utterance template acquisition unit 308 does not acquire the utterance set template from the utterance template storage unit 206 (No at Step S303), the interactive apparatus 300 ends the process.

本実施形態によれば、日々更新される最新のキーワードが含まれた発話セットを使用した対話を実現することができる。 According to the present embodiment, it is possible to realize a dialogue using an utterance set including the latest keyword updated daily.

（第４の実施形態） (Fourth embodiment)
図１０は、第４の実施形態に係る対話装置の構成例を示すブロック図である。第４の実施形態では、第１の実施形態又は第２の実施形態と同様の処理を実行する機能部については同一の符号を付し、同様の処理についてはその説明を省略する場合がある。 FIG. 10 is a block diagram illustrating a configuration example of the interactive apparatus according to the fourth embodiment. In the fourth embodiment, functional units that perform the same processes as those in the first embodiment or the second embodiment are denoted by the same reference numerals, and descriptions of the same processes may be omitted.

例えば、図１０に示すように、対話装置４００は、発話セット記憶部１０１と、発話セット取得部１０２と、出力部１０３と、検知部１０４と、概念辞書記憶部２０５と、発話テンプレート記憶部２０６と、概念取得部４０７と、発話テンプレート取得部２０８と、発話セット生成部４０９と、抽出部４１１とを有する。また、第１の実施形態と同様に、対話装置４００は、ディスプレイ又はスピーカ等を備えた所定の表示出力装置に、文字又は音声を出力することによりユーザとの対話を実現する。 For example, as shown in FIG. 10, the dialogue apparatus 400 includes an utterance set storage unit 101, an utterance set acquisition unit 102, an output unit 103, a detection unit 104, a concept dictionary storage unit 205, and an utterance template storage unit 206. A concept acquisition unit 407, an utterance template acquisition unit 208, an utterance set generation unit 409, and an extraction unit 411. Similarly to the first embodiment, the dialogue apparatus 400 realizes a dialogue with the user by outputting characters or voices to a predetermined display output device including a display or a speaker.

抽出部４１１は、概念辞書記憶部２０５に記憶された語句に合致する、ユーザによる発話に含まれる語句を抽出する。例えば、抽出部４１１は、検知部１０４によって検知されたユーザによる発話の音声認識を行なう。そして、抽出部４１１は、音声認識処理の結果残った単語を、概念辞書記憶部２０５に記憶された単語に合致する、ユーザによる発話に含まれる単語として抽出する。 The extraction unit 411 extracts words / phrases included in the user's utterance that match the words / phrases stored in the concept dictionary storage unit 205. For example, the extraction unit 411 performs speech recognition of an utterance made by the user detected by the detection unit 104. Then, the extraction unit 411 extracts words remaining as a result of the speech recognition processing as words included in the utterance by the user that match the words stored in the concept dictionary storage unit 205.

かかる音声認識処理では、一つの様態として、ユーザによる発話を形態素解析することで単語に分割され、各単語を概念辞書記憶部２０５から検索する処理が実行される。ここで、概念辞書記憶部２０５に記憶されていない単語は、助詞や助動詞等のため排除される。これらにより、音声認識処理の結果、残った単語が抽出される。ここでは、音声認識処理の結果残った単語が「カレー」である場合を例に挙げる。 In one mode of this speech recognition processing, the user's utterance is split into words by morphological analysis, and each word is looked up in the concept dictionary storage unit 205. Words that are not stored in the concept dictionary storage unit 205 are discarded, since they are particles, auxiliary verbs, and the like. The words that remain after this processing are extracted. Here, the case where the remaining word is “curry” is taken as an example.
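The filtering step above can be illustrated with a short sketch. It assumes the utterance has already been split into tokens by some morphological analyzer, which is outside the scope of the example; the function name, the sample utterance, and the dictionary contents are hypothetical.

```python
def extract_dictionary_words(tokens, concept_dict):
    """Keep only the tokens that have an entry in the concept dictionary."""
    return [token for token in tokens if token in concept_dict]


# Hypothetical concept dictionary: word -> concept.
concept_dict = {"カレー": "食べ物"}

# Hypothetical tokenization of the user utterance 「カレーだよ」.
tokens = ["カレー", "だ", "よ"]

remaining = extract_dictionary_words(tokens, concept_dict)
print(remaining)
```

Tokens such as 「だ」 and 「よ」 have no dictionary entry, so only 「カレー」 remains.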

概念取得部４０７は、抽出部４１１によって抽出された語句に対応する概念を概念辞書記憶部２０５から取得する。例を挙げると、概念取得部４０７は、抽出部４１１による音声認識処理の結果、残った単語「カレー」に対応する概念「食べ物」を概念辞書記憶部２０５から取得する。また、発話テンプレート取得部２０８は、第２の実施形態と同様に、概念取得部４０７によって取得された概念「食べ物」を含む発話セットのテンプレート「テンプレート１」を発話テンプレート記憶部２０６（図５Ａ参照）から取得する。 The concept acquisition unit 407 acquires a concept corresponding to the word extracted by the extraction unit 411 from the concept dictionary storage unit 205. For example, the concept acquisition unit 407 acquires the concept “food” corresponding to the remaining word “curry” from the concept dictionary storage unit 205 as a result of the speech recognition processing by the extraction unit 411. Similarly to the second embodiment, the utterance template acquisition unit 208 uses the utterance template storage unit 206 (see FIG. 5A) for the template “template 1” of the utterance set including the concept “food” acquired by the concept acquisition unit 407. )

発話セット生成部４０９は、発話テンプレート取得部２０８によって取得された発話セットのテンプレートに含まれる概念に、抽出部４１１によって抽出された語句を挿入して新たな発話セットを生成する。そして、発話セット生成部４０９は、生成した新たな発話セットを発話セット記憶部１０１に格納する。 The utterance set generation unit 409 generates a new utterance set by inserting the phrase extracted by the extraction unit 411 into the concept included in the utterance set template acquired by the utterance template acquisition unit 208. Then, the utterance set generation unit 409 stores the generated new utterance set in the utterance set storage unit 101.

例を挙げると、発話セット生成部４０９は、発話テンプレート取得部２０８によって取得された発話セットのテンプレート「テンプレート１」（図５Ａ参照）に含まれる概念「食べ物」に、抽出部４１１によって抽出された単語「カレー」を挿入して新たな発話セットを生成する。上記の例で生成される発話セットは、第１発話「カレーは好き？」と、想定ユーザ発話「好きだよ」と、第２発話「カレーはおいしいんだよ」とを含むものとなる。そして、発話セット生成部４０９は、生成した新たな発話セットを発話セット記憶部１０１に格納する。 For example, the utterance set generation unit 409 inserts the word “curry” extracted by the extraction unit 411 into the concept “food” included in the utterance set template “template 1” (see FIG. 5A) acquired by the utterance template acquisition unit 208, thereby generating a new utterance set. The utterance set generated in this example includes the first utterance “Do you like curry?”, the assumed user utterance “I like it”, and the second utterance “Curry is delicious”. The utterance set generation unit 409 then stores the generated new utterance set in the utterance set storage unit 101.

次に、図１１を用いて、第４の実施形態に係る発話セット生成処理の流れを説明する。図１１は、第４の実施形態に係る発話セット生成処理の流れの例を示すフローチャートである。 Next, the flow of an utterance set generation process according to the fourth embodiment will be described with reference to FIG. FIG. 11 is a flowchart illustrating an example of the flow of an utterance set generation process according to the fourth embodiment.

例えば、図１１に示すように、検知部１０４によってユーザによる発話が検知された場合に（ステップＳ４０１肯定）、抽出部４１１は、音声認識処理を実行することにより、ユーザによる発話に含まれる単語を抽出する（ステップＳ４０２）。また、抽出部４１１は、検知部１０４によってユーザによる発話が検知されていない場合に（ステップＳ４０１否定）、検知部１０４によるユーザによる発話の検知待ちの状態となる。 For example, as shown in FIG. 11, when the utterance by the user is detected by the detection unit 104 (Yes in step S401), the extraction unit 411 executes a speech recognition process to thereby extract a word included in the utterance by the user. Extract (step S402). In addition, when the detection unit 104 has not detected the user's utterance (No in step S401), the extraction unit 411 enters a state of waiting for detection of the user's utterance by the detection unit 104.

また、概念取得部４０７は、抽出部４１１によって抽出された単語に対応する概念を概念辞書記憶部２０５から検索する（ステップＳ４０３）。このとき、概念取得部４０７によって単語に対応する概念が概念辞書記憶部２０５から取得された場合に（ステップＳ４０４肯定）、発話テンプレート取得部２０８は、取得された概念を含む発話セットのテンプレートを発話テンプレート記憶部２０６から検索する（ステップＳ４０５）。一方、対話装置４００は、概念取得部４０７によって単語に対応する概念が概念辞書記憶部２０５から取得されなかった場合に（ステップＳ４０４否定）、処理を終了する。 The concept acquisition unit 407 searches the concept dictionary storage unit 205 for a concept corresponding to the word extracted by the extraction unit 411 (step S403). At this time, when the concept corresponding to the word is acquired from the concept dictionary storage unit 205 by the concept acquisition unit 407 (Yes in step S404), the utterance template acquisition unit 208 utters the template of the utterance set including the acquired concept. A search is made from the template storage unit 206 (step S405). On the other hand, when the concept corresponding to the word is not acquired from the concept dictionary storage unit 205 by the concept acquisition unit 407 (No at Step S404), the dialogue apparatus 400 ends the process.

また、発話テンプレート取得部２０８によって発話セットのテンプレートが発話テンプレート記憶部２０６から取得された場合に（ステップＳ４０６肯定）、発話セット生成部４０９は、取得された発話セットのテンプレートに含まれる概念に、抽出部４１１によって抽出された単語を挿入して新たな発話セットを生成する（ステップＳ４０７）。その後、発話セット生成部４０９は、生成した発話セットを発話セット記憶部１０１に格納する。一方、対話装置４００は、発話テンプレート取得部２０８によって発話セットのテンプレートが発話テンプレート記憶部２０６から取得されなかった場合に（ステップＳ４０６否定）、処理を終了する。 Further, when the utterance set template is acquired from the utterance template storage unit 206 by the utterance template acquisition unit 208 (Yes in step S406), the utterance set generation unit 409 adds the concept included in the acquired utterance set template to the concept. The words extracted by the extraction unit 411 are inserted to generate a new utterance set (step S407). Thereafter, the utterance set generation unit 409 stores the generated utterance set in the utterance set storage unit 101. On the other hand, when the utterance template acquisition unit 208 does not acquire the utterance set template from the utterance template storage unit 206 (No at step S406), the interactive apparatus 400 ends the process.
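Steps S403 to S407 of the flow above can be sketched as a single function. This is an illustrative reading of the flowchart, not the actual implementation: the function name, the template layout, and the English utterance text are assumptions, and utterance detection and speech recognition (S401, S402) are taken as already done.

```python
def generate_utterance_set(word, concept_dict, templates, utterance_sets):
    """Build and store a new utterance set for one extracted word.

    Returns the new utterance set, or None when processing ends early.
    """
    # S403/S404: look up the concept of the extracted word.
    concept = concept_dict.get(word)
    if concept is None:
        return None  # S404 No: no concept found, end processing

    # S405/S406: search for a template that contains the concept slot.
    slot = f"[{concept}]"
    template = next(
        (t for t in templates if any(slot in text for text in t.values())),
        None,
    )
    if template is None:
        return None  # S406 No: no template found, end processing

    # S407: insert the word into the concept slot and store the result.
    new_set = {key: text.replace(slot, word) for key, text in template.items()}
    utterance_sets.append(new_set)
    return new_set


# Hypothetical data (English rendering of "template 1", cf. FIG. 5A).
concept_dict = {"curry": "food"}
templates = [{
    "first": "Do you like [food]?",
    "assumed_user": "I like it",
    "second": "[food] is delicious",
}]
utterance_sets = []

result = generate_utterance_set("curry", concept_dict, templates, utterance_sets)
print(result)
```

Returning `None` at each negative branch corresponds to the points where the flowchart ends without generating anything.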

本実施形態によれば、ユーザ発話に基づいて生成された発話セットを使用して対話するので、自然な対話の流れを実現することができる。 According to the present embodiment, since the dialogue is performed using the utterance set generated based on the user utterance, a natural flow of dialogue can be realized.

（第５の実施形態） (Fifth embodiment)
図１２は、第５の実施形態に係る対話装置の構成例を示すブロック図である。第５の実施形態では、第１の実施形態又は第２の実施形態と同様の処理を実行する機能部については同一の符号を付し、同様の処理についてはその説明を省略する場合がある。 FIG. 12 is a block diagram illustrating a configuration example of the interactive apparatus according to the fifth embodiment. In the fifth embodiment, functional units that perform the same processes as those in the first embodiment or the second embodiment are denoted by the same reference numerals, and the description of the same processes may be omitted.

例えば、図１２に示すように、対話装置５００は、発話セット記憶部１０１と、発話セット取得部５０２と、出力部１０３と、検知部１０４と、概念辞書記憶部２０５と、発話テンプレート記憶部５０６と、概念取得部２０７と、発話テンプレート取得部２０８と、発話セット生成部２０９と、類似度算出部５１２と、共起辞書記憶部５１３と、決定部５１４とを有する。また、第１の実施形態と同様に、対話装置５００は、ディスプレイ又はスピーカ等を備えた所定の表示出力装置に、文字又は音声を出力することによりユーザとの対話を実現する。 For example, as shown in FIG. 12, the dialogue apparatus 500 includes an utterance set storage unit 101, an utterance set acquisition unit 502, an output unit 103, a detection unit 104, a concept dictionary storage unit 205, and an utterance template storage unit 506. A concept acquisition unit 207, an utterance template acquisition unit 208, an utterance set generation unit 209, a similarity calculation unit 512, a co-occurrence dictionary storage unit 513, and a determination unit 514. Similarly to the first embodiment, the dialogue apparatus 500 realizes a dialogue with the user by outputting characters or voices to a predetermined display output device including a display or a speaker.

発話テンプレート記憶部５０６は、第２の実施形態と同様に、一部の語句が概念で表現された第１発話と第２発話とを含んだ発話セットのテンプレートを記憶する。さらに、発話テンプレート記憶部５０６は、第１発話の種類と、第１発話に対するユーザによる発話から得られるユーザ情報と、発話セットのテンプレートの話題とを、第１発話及び第２発話に対応付けて記憶する。ここで、図１３を用いて、第５の実施形態に係る発話テンプレート記憶部５０６に記憶される情報について説明する。図１３は、第５の実施形態に係る発話テンプレート記憶部５０６に記憶される情報例を示す図である。 Similar to the second embodiment, the utterance template storage unit 506 stores a template of an utterance set including a first utterance and a second utterance in which some words are expressed as concepts. Furthermore, the utterance template storage unit 506 associates the type of the first utterance, the user information obtained from the utterance by the user with respect to the first utterance, and the topic of the template of the utterance set with the first utterance and the second utterance. Remember. Here, information stored in the utterance template storage unit 506 according to the fifth embodiment will be described with reference to FIG. FIG. 13 is a diagram illustrating an example of information stored in the utterance template storage unit 506 according to the fifth embodiment.

例えば、図１３に示すように、発話テンプレート記憶部５０６は、第１発話の種類「質問」と、第１発話「［食べ物］は好き？」と、想定ユーザ発話「好きだよ」と、ユーザ情報「好きな食べ物」と、第２発話「［食べ物］は［味］んだよ」と、話題「食べ物」とを含んだ発話セットのテンプレート「テンプレート１」を記憶する。ここで、“［］”で囲まれた「食べ物」、「味」は、単語の概念である。 For example, as illustrated in FIG. 13, the utterance template storage unit 506 stores the utterance set template “template 1”, which includes the first utterance type “question”, the first utterance “Do you like [food]?”, the assumed user utterance “I like it”, the user information “favorite food”, the second utterance “[food] is [taste]”, and the topic “food”. Here, “food” and “taste” enclosed in “[]” are word concepts.

また、発話テンプレート記憶部５０６は、第１発話の種類「申し出」と、第１発話「［食べ物］はいかが？」と、想定ユーザ発話「どうも」と、第２発話「［食べ物］おすすめだよ」と、話題「食べ物」とを含んだ発話セットのテンプレート「テンプレート７」を記憶する。同様に、“［］”で囲まれた「食べ物」は、単語の概念である。「テンプレート７」のように、ユーザ情報は、発話セットのテンプレートによっては得られない場合もあるため、得られない場合にはその情報は保持されない。なお、「想定ユーザ発話」は、発話テンプレート記憶部５０６に含まれていなくても良い。 The utterance template storage unit 506 also stores the utterance set template “template 7”, which includes the first utterance type “offer”, the first utterance “How about some [food]?”, the assumed user utterance “Thanks”, the second utterance “I recommend [food]”, and the topic “food”. Likewise, “food” enclosed in “[]” is a word concept. As with “template 7”, some utterance set templates yield no user information, in which case that information is not held. Note that the “assumed user utterance” need not be included in the utterance template storage unit 506.

類似度算出部５１２は、新たな発話セットを含む発話セット間、又は、発話セットのテンプレート間の類似度を算出する。例えば、類似度算出部５１２は、発話セットのテンプレート間で、「第１発話の種類」による類似度を算出する。詳細には、類似度算出部５１２は、発話セットのテンプレート「ｓ１」と、発話セットのテンプレート「ｓ２」との第１発話の種類による類似度「Ｒｔ（ｓ１，ｓ２）」を、第１発話の種類が同じであれば「１」、異なれば「０」とする。例を挙げると、類似度算出部５１２は、発話セットのテンプレート「テンプレート１」と、発話セットのテンプレート「テンプレート６」との第１発話の種類による類似度「Ｒｔ（ｓ１，ｓ２）」を、第１発話の種類が同じ「質問」であるので「１」とする。同様に、類似度算出部５１２は、発話セットのテンプレート「テンプレート１」と、発話セットのテンプレート「テンプレート７」との第１発話の種類による類似度「Ｒｔ（ｓ１，ｓ２）」を、第１発話の種類が異なる「質問」、「申し出」であるので「０」とする。第１の発話による類似度を算出する理由は、発話の種類が同じものを連続させてしまうことで、似たような対話ばかりになるのを防ぐためである。 The similarity calculation unit 512 calculates the similarity between utterance sets, including new utterance sets, or between utterance set templates. For example, the similarity calculation unit 512 calculates the similarity by “first utterance type” between utterance set templates. Specifically, it sets the similarity “Rt(s1, s2)” by first utterance type between the utterance set template “s1” and the utterance set template “s2” to “1” if the first utterance types are the same and to “0” if they differ. For example, the similarity “Rt(s1, s2)” between “template 1” and “template 6” is “1” because both have the first utterance type “question”. Similarly, the similarity “Rt(s1, s2)” between “template 1” and “template 7” is “0” because their first utterance types, “question” and “offer”, differ. The similarity by first utterance type is calculated so that utterances of the same type are not output in succession, which would make the dialogue repetitive.

また、例えば、類似度算出部５１２は、発話セットのテンプレート間で、「ユーザ情報」による類似度を算出する。詳細には、類似度算出部５１２は、発話セットのテンプレート「ｓ１」と、発話セットのテンプレート「ｓ２」とのユーザ情報による類似度「Ｒｕ（ｓ１，ｓ２）」を、ユーザ情報が同じであれば「１」、異なれば「０」とする。例を挙げると、類似度算出部５１２は、発話セットのテンプレート「テンプレート１」と、発話セットのテンプレート「テンプレート６」とのユーザ情報による類似度「Ｒｕ（ｓ１，ｓ２）」を、ユーザ情報が同じ「好きな食べ物」であるので「１」とする。同様に、類似度算出部５１２は、発話セットのテンプレート「テンプレート１」と、発話セットのテンプレート「テンプレート７」とのユーザ情報による類似度「Ｒｕ（ｓ１，ｓ２）」を、異なる「好きな食べ物」、「（空欄）」であるので「０」とする。ユーザ情報による類似度を算出する理由は、似たようなことを尋ねる対話を連続させてしまうことが好ましくないからである。 As another example, the similarity calculation unit 512 calculates the similarity by “user information” between utterance set templates. Specifically, it sets the similarity “Ru(s1, s2)” by user information between the utterance set template “s1” and the utterance set template “s2” to “1” if the user information is the same and to “0” if it differs. For example, the similarity “Ru(s1, s2)” between “template 1” and “template 6” is “1” because both have the user information “favorite food”. Similarly, the similarity “Ru(s1, s2)” between “template 1” and “template 7” is “0” because their user information, “favorite food” and “(blank)”, differs. The similarity by user information is calculated because it is undesirable to output dialogues that ask about similar things in succession.

また、例えば、類似度算出部５１２は、発話セットのテンプレート間で、「話題」による類似度を算出する。詳細には、類似度算出部５１２は、発話セットのテンプレート「ｓ１」と、発話セットのテンプレート「ｓ２」との話題による類似度「Ｒｄ（ｓ１，ｓ２）」を、話題が同じであれば「１」、異なれば「０」とする。例を挙げると、類似度算出部５１２は、発話セットのテンプレート「テンプレート１」と、発話セットのテンプレート「テンプレート６」との話題による類似度「Ｒｄ（ｓ１，ｓ２）」を、話題が同じ「食べ物」であるので「１」とする。同様に、類似度算出部５１２は、発話セットのテンプレート「テンプレート１」と、発話セットのテンプレート「テンプレート５」との話題による類似度「Ｒｄ（ｓ１，ｓ２）」を、異なる「食べ物」、「人」であるので「０」とする。話題による類似度を算出する理由は、同じ話題の発話を連続させてしまうことで、似たような対話ばかりになるのを防ぐためである。 As a further example, the similarity calculation unit 512 calculates the similarity by “topic” between utterance set templates. Specifically, it sets the similarity “Rd(s1, s2)” by topic between the utterance set template “s1” and the utterance set template “s2” to “1” if the topics are the same and to “0” if they differ. For example, the similarity “Rd(s1, s2)” between “template 1” and “template 6” is “1” because both have the topic “food”. Similarly, the similarity “Rd(s1, s2)” between “template 1” and “template 5” is “0” because their topics, “food” and “person”, differ. The similarity by topic is calculated so that utterances on the same topic are not output in succession, which would make the dialogue repetitive.
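The three template-level similarities Rt, Ru, and Rd are all simple equality checks, which the sketch below makes explicit. The `Template` class and the function names are assumptions for illustration; the field values for templates 1, 6, and 7 are taken from the examples in the text. How two blank user-information fields should compare is not stated, so treating them as equal here is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Template:
    name: str
    utterance_type: str        # type of the first utterance
    user_info: Optional[str]   # None when the template yields no user information
    topic: str


def _same(a, b):
    """Return 1 when the two values are equal, otherwise 0."""
    return 1 if a == b else 0


def r_t(s1, s2):
    """Similarity by first utterance type."""
    return _same(s1.utterance_type, s2.utterance_type)


def r_u(s1, s2):
    """Similarity by user information (two blank fields compare equal: an assumption)."""
    return _same(s1.user_info, s2.user_info)


def r_d(s1, s2):
    """Similarity by topic."""
    return _same(s1.topic, s2.topic)


template_1 = Template("template 1", "question", "favorite food", "food")
template_6 = Template("template 6", "question", "favorite food", "food")
template_7 = Template("template 7", "offer", None, "food")

print(r_t(template_1, template_6), r_t(template_1, template_7))
print(r_u(template_1, template_6), r_u(template_1, template_7))
print(r_d(template_1, template_7))
```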

これまで、発話セットのテンプレートの状態で類似度を算出する場合を説明したが、以下では、発話セットの状態で類似度を算出する場合を説明する。例えば、類似度算出部５１２は、発話セット間で、第１発話と第２発話との類似度を算出する。かかる類似度の算出の一つの様態として、編集距離を例に挙げる。編集距離とは、ある文章から別の文章に書き換えるときの、書き換えた単語の数を表す手数のことを指す。 So far, the case where similarity is calculated on utterance set templates has been described; the following describes the case where it is calculated on utterance sets. For example, the similarity calculation unit 512 calculates, between utterance sets, the similarity of the first utterances and of the second utterances. Edit distance is taken here as one way of calculating this similarity. Edit distance is the number of operations, counted as the number of words rewritten, needed to rewrite one sentence into another.

また、以下では、下記の２つの発話セットを例に挙げる。 In the following, the two utterance sets below are taken as an example.
発話セットｓ１ Utterance set s1
「第１発話：カレーは好き？第２発話：カレーって辛いよね」 "First utterance: Do you like curry? Second utterance: Curry is spicy, isn't it?"
発話セットｓ２ Utterance set s2
「第１発話：アイスクリームは好き？第２発話：アイスクリームは冷たいよね」 "First utterance: Do you like ice cream? Second utterance: Ice cream is cold, isn't it?"

詳細には、類似度算出部５１２は、発話セットｓ１と発話セットｓ２とに含まれる第１発話同士の編集距離を求める。 Specifically, the similarity calculation unit 512 obtains the edit distance between the first utterances included in the utterance set s1 and the utterance set s2.
発話セットｓ１の第１発話：カレー／は／好き？ First utterance of utterance set s1: curry / wa (particle) / like?
発話セットｓ２の第１発話：アイスクリーム／は／好き？ First utterance of utterance set s2: ice cream / wa (particle) / like?
上記の例では、発話セットｓ１の第１発話から、発話セットｓ２の第１発話へ書き換えるときに、「カレー」を「アイスクリーム」に書き換えるだけで良いので、手数は「１」となる。また、編集距離は、文章が長いほど大きくなることが多い。このため、文章の長さで正規化する。但し、助詞や助動詞は文章の内容に影響するものではないので、これら以外の単語に限定して正規化編集距離「手数÷文章の長さ」を求める。すなわち、類似度算出部５１２は、「は」以外の単語の正規化編集距離「１／２＝０．５」を求める。 In the above example, rewriting the first utterance of the utterance set s1 into the first utterance of the utterance set s2 only requires replacing “curry” with “ice cream”, so the number of operations is “1”. Edit distance also tends to grow as sentences get longer, so it is normalized by sentence length. However, because particles and auxiliary verbs do not affect the content of a sentence, the normalized edit distance “number of operations ÷ sentence length” is computed over the remaining words only. That is, the similarity calculation unit 512 obtains the normalized edit distance “1/2 = 0.5” over the words other than “wa”.

続いて、類似度算出部５１２は、発話セットｓ１と発話セットｓ２とに含まれる第２発話同士の編集距離を求める。 Subsequently, the similarity calculation unit 512 obtains the edit distance between the second utterances included in the utterance set s1 and the utterance set s2.
発話セットｓ１の第２発話：カレー／って／辛い／よね Second utterance of utterance set s1: curry / tte (particle) / spicy / yone (particle)
発話セットｓ２の第２発話：アイスクリーム／は／冷たい／よね Second utterance of utterance set s2: ice cream / wa (particle) / cold / yone (particle)
上記の例では、発話セットｓ１の第２発話から、発話セットｓ２の第２発話へ書き換えるときに、「カレー」を「アイスクリーム」に書き換えるとともに、「辛い」を「冷たい」に書き換えれば良いので、手数は「２」となる。すなわち、類似度算出部５１２は、正規化編集距離「２／２＝１」を求める。 In the above example, rewriting the second utterance of the utterance set s1 into the second utterance of the utterance set s2 requires replacing “curry” with “ice cream” and “spicy” with “cold”, so the number of operations is “2”. That is, the similarity calculation unit 512 obtains the normalized edit distance “2/2 = 1”.

その後、類似度算出部５１２は、発話セットｓ１と発話セットｓ２との間の編集距離「０．５＋１＝１．５」を求める。すなわち、発話セット間の編集距離は、第１発話同士の編集距離と、第２発話同士の編集距離との和で求められる。そして、類似度算出部５１２は、発話セットｓ１と発話セットｓ２との間の編集距離による類似度「Ｒｅ（ｓ１，ｓ２）」を、求めた和の逆数「１／１．５＝０．６７」として求める。 Thereafter, the similarity calculation unit 512 obtains the edit distance “0.5 + 1 = 1.5” between the utterance set s1 and the utterance set s2. That is, the edit distance between utterance sets is obtained by the sum of the edit distance between the first utterances and the edit distance between the second utterances. Then, the similarity calculation unit 512 calculates the similarity “Re (s1, s2)” based on the edit distance between the utterance set s1 and the utterance set s2, and the reciprocal of the obtained sum “1 / 1.5 = 0.67. "
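The edit-distance similarity Re worked through above can be reproduced with a short sketch. It uses an ordinary word-level Levenshtein distance over pre-tokenized utterances, which is one possible reading of "number of rewritten words". The particle list, the function names, and the handling of identical utterance sets (returning infinity) are assumptions, not something the text specifies.

```python
# Assumed stop list of particles / auxiliary verbs excluded from the distance.
PARTICLES = {"は", "って", "よね"}


def content_words(tokens):
    """Drop particles and auxiliary verbs."""
    return [t for t in tokens if t not in PARTICLES]


def word_edit_distance(a, b):
    """Word-level Levenshtein distance between two word lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete wa
                           cur[j - 1] + 1,             # insert wb
                           prev[j - 1] + (wa != wb)))  # substitute or keep
        prev = cur
    return prev[-1]


def normalized_distance(tokens1, tokens2):
    """Edit distance over content words, divided by sentence length."""
    a, b = content_words(tokens1), content_words(tokens2)
    length = max(len(a), len(b))
    return word_edit_distance(a, b) / length if length else 0.0


def edit_similarity(set1, set2):
    """Reciprocal of the summed distances of first and second utterances."""
    total = sum(normalized_distance(u1, u2) for u1, u2 in zip(set1, set2))
    return float("inf") if total == 0 else 1.0 / total


s1 = (["カレー", "は", "好き？"], ["カレー", "って", "辛い", "よね"])
s2 = (["アイスクリーム", "は", "好き？"], ["アイスクリーム", "は", "冷たい", "よね"])

print(normalized_distance(s1[0], s2[0]))  # first utterances
print(normalized_distance(s1[1], s2[1]))  # second utterances
print(round(edit_similarity(s1, s2), 2))
```

For the two example sets this yields 0.5 and 1.0 for the two distances and a similarity of about 0.67, matching the figures in the text.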

ところで、上記の例では、単語が一致するか否かを判定したが、単語間の関係性をさらに考慮しても良い。単語間の関係性の一例として、単語間の概念の違いが挙げられる。単語の概念は、概念辞書記憶部２０５に記憶された情報を利用する。例えば、「カレー」は、「生産物」、「食べ物」、「料理」という３階層の概念が付与されているものとする。また、「アイスクリーム」は、「生産物」、「食べ物」、「菓子」という３階層の概念が付与されているものとする。同様に、「辛い」は、「自然」、「自然」、「味」という３階層の概念が付与されているものとする。また、「冷たい」は、「関係」、「量」、「寒暖」という３階層の概念が付与されているものとする。概念の下位層まで一致する単語同士ほど、互いの意味が近い単語であると言える。そこで、概念の一致しなかった数で編集距離を求める。 In the above example, only whether words match was determined, but the relationship between words may also be taken into account. One example of such a relationship is the difference between the concepts of the words. The word concepts are taken from the information stored in the concept dictionary storage unit 205. For example, assume that “curry” is assigned the three-level concept “product”, “food”, “dish”, and that “ice cream” is assigned the three-level concept “product”, “food”, “confectionery”. Similarly, assume that “spicy” is assigned the three-level concept “nature”, “nature”, “taste”, and that “cold” is assigned the three-level concept “relation”, “quantity”, “temperature”. The further down the hierarchy two words' concepts match, the closer the words are in meaning. The edit distance is therefore obtained from the number of concept levels that do not match.

詳細には、類似度算出部５１２は、発話セットｓ１と発話セットｓ２とに含まれる第１発話同士の単語の概念の編集距離を求める。上記の例では、第１発話それぞれに含まれる単語「カレー」、「アイスクリーム」の概念は２階層まで一致しているため、手数は「３−２＝１」となる。この結果、第１発話同士の単語の概念の正規化編集距離は、「１／２＝０．５」となる。 Specifically, the similarity calculation unit 512 obtains the edit distance of the concept of the words between the first utterances included in the utterance set s1 and the utterance set s2. In the above example, the concept of the words “curry” and “ice cream” included in each of the first utterances matches up to two levels, so the number of moves is “3-2 = 1”. As a result, the normalized edit distance of the concept of the words between the first utterances is “1/2 = 0.5”.

続いて、類似度算出部５１２は、発話セットｓ１と発話セットｓ２とに含まれる第２発話同士の単語の概念の編集距離を求める。上記の例では、第２発話それぞれに含まれる単語「カレー」、「アイスクリーム」の概念は２階層まで一致しているため、手数は「３−２＝１」となる。加えて、第２発話それぞれに含まれる単語「辛い」、「冷たい」の概念は１階層も一致していないため、手数は「３−０＝３」となる。この結果、第２発話同士の単語の概念の正規化編集距離は、「（１＋３）／２＝２」となる。 Subsequently, the similarity calculation unit 512 obtains the edit distance of the concept of the words between the second utterances included in the utterance set s1 and the utterance set s2. In the above example, since the concepts of the words “curry” and “ice cream” included in each of the second utterances match up to two levels, the number of moves is “3-2 = 1”. In addition, since the concept of the words “spicy” and “cold” included in each of the second utterances does not match even one layer, the number of moves becomes “3-0 = 3”. As a result, the normalized editing distance of the concept of words between the second utterances is “(1 + 3) / 2 = 2”.

その後、類似度算出部５１２は、発話セットｓ１と発話セットｓ２との間の編集距離「０．５＋２＝２．５」を求める。そして、類似度算出部５１２は、発話セットｓ１と発話セットｓ２との間の編集距離による類似度「Ｒｅｃ（ｓ１，ｓ２）」を、求めた和の逆数「１／２．５＝０．４」として求める。 Thereafter, the similarity calculation unit 512 obtains the edit distance “0.5 + 2 = 2.5” between the utterance set s1 and the utterance set s2. Then, the similarity calculation unit 512 calculates the similarity “Rec (s1, s2)” based on the edit distance between the utterance set s1 and the utterance set s2, and the reciprocal of the obtained sum “1 / 2.5 = 0.4. "
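The concept-based variant Rec can be sketched in the same way. The sketch assumes that the content words of the two utterances are already aligned position by position and that every differing word has a concept path in the dictionary. The function names and the English labels for the concept levels are illustrative.

```python
def concept_mismatch(word1, word2, concepts):
    """Number of concept levels that do not match, counted from the top."""
    if word1 == word2:
        return 0
    path1, path2 = concepts[word1], concepts[word2]
    shared = 0
    for level1, level2 in zip(path1, path2):
        if level1 != level2:
            break
        shared += 1
    return len(path1) - shared


def concept_distance(words1, words2, concepts):
    """Summed concept mismatches, normalized by the number of content words."""
    total = sum(concept_mismatch(w1, w2, concepts)
                for w1, w2 in zip(words1, words2))
    return total / len(words1)


# Three-level concept paths as given in the text (English labels assumed).
concepts = {
    "curry": ("product", "food", "dish"),
    "ice cream": ("product", "food", "confectionery"),
    "spicy": ("nature", "nature", "taste"),
    "cold": ("relation", "quantity", "temperature"),
}

# Content words only; particles are assumed to have been removed already.
first = concept_distance(["curry", "like"], ["ice cream", "like"], concepts)
second = concept_distance(["curry", "spicy"], ["ice cream", "cold"], concepts)
rec = 1.0 / (first + second)
print(first, second, rec)
```

With the example words this gives distances of 0.5 and 2.0 and a similarity of 0.4, as in the text.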

In the above example, the difference in concept between words was used as one example of a relationship between words, but the co-occurrence of words may additionally be taken into account. For co-occurrence, the co-occurrence dictionary storage unit 513 is used. The co-occurrence dictionary storage unit 513 stores, for example, the co-occurrence rate of any two words. The co-occurrence rate indicates how often two words are used in the same document. One way of calculating such a co-occurrence rate is (Equation 1) below.

(Equation 1)

(Equation 1) uses the "cosine coefficient" as an example of the co-occurrence rate, and the "co-occurrence frequency of word A and word B" means the frequency with which word A and word B are used in the same document. In this way, the co-occurrence dictionary storage unit 513 stores co-occurrence rates of pairs of words calculated in advance from data containing a large number of documents.
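As an illustration only, the sketch below computes a co-occurrence rate using the commonly used definition of the cosine coefficient: the number of documents containing both words divided by the square root of the product of the numbers of documents containing each word. This definition is an assumption and may differ in detail from the form given in (Equation 1). The function name and the sample documents are hypothetical.

```python
import math


def cosine_cooccurrence(word_a, word_b, documents):
    """Cosine coefficient of two words over a collection of documents.

    Each document is represented as a set of words (an assumption of this sketch).
    """
    count_a = sum(1 for doc in documents if word_a in doc)
    count_b = sum(1 for doc in documents if word_b in doc)
    count_both = sum(1 for doc in documents if word_a in doc and word_b in doc)
    if count_a == 0 or count_b == 0:
        return 0.0  # a word that never appears co-occurs with nothing
    return count_both / math.sqrt(count_a * count_b)


# Hypothetical miniature corpus.
documents = [
    {"curry", "spicy", "rice"},
    {"curry", "ice cream"},
    {"ice cream", "cold"},
    {"curry", "spicy"},
]
print(cosine_cooccurrence("curry", "spicy", documents))
print(cosine_cooccurrence("curry", "cold", documents))
```

In practice such rates would be computed once over a large corpus and stored, which corresponds to the role of the co-occurrence dictionary storage unit 513.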

Specifically, the similarity calculation unit 512 obtains the edit distance based on the co-occurrence rate of the words in the first utterances of utterance set s1 and utterance set s2. In the above example, the co-occurrence rate of the words "curry" and "ice cream" contained in the respective first utterances is assumed to be "0.2". As a result, the normalized edit distance based on the word co-occurrence rate between the first utterances is "1 / 0.2 / 2 = 10".

Next, the similarity calculation unit 512 obtains the edit distance based on the co-occurrence rate of the words in the second utterances of utterance set s1 and utterance set s2. In the above example, the co-occurrence rate of the words "curry" and "ice cream" contained in the respective second utterances is assumed to be "0.2". In addition, the co-occurrence rate of the words "spicy" and "cold" contained in the respective second utterances is assumed to be "0.01". As a result, the normalized edit distance based on the word co-occurrence rate between the second utterances is "(1 / 0.2 + 1 / 0.01) / 2 = 52.5".

The similarity calculation unit 512 then obtains the edit distance between utterance set s1 and utterance set s2 as the sum "10 + 52.5 = 62.5". Finally, the similarity calculation unit 512 obtains the similarity "Res(s1, s2)", based on the edit distance using the co-occurrence rate between utterance set s1 and utterance set s2, as the reciprocal of that sum, "1 / 62.5 = 0.016".
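The co-occurrence-based calculation can be sketched in the same way. The function names are hypothetical, and the divisor 2 is again copied from the worked example. The second-utterance distance is computed from the stated rates, while the first-utterance distance is passed in as the value 10 given in the text.

```python
def cooccurrence_distance(rates, divisor=2):
    """Sum of the reciprocals of the co-occurrence rates, divided as in the example.

    Rates are assumed to be strictly positive.
    """
    return sum(1 / rate for rate in rates) / divisor


def res_similarity(first_distance, second_distance):
    """Reciprocal of the summed distances of the first and second utterances."""
    return 1 / (first_distance + second_distance)


# "curry"/"ice cream" = 0.2 and "spicy"/"cold" = 0.01, as in the text.
second_distance = cooccurrence_distance([0.2, 0.01])
print(second_distance)                      # 52.5
print(res_similarity(10, second_distance))  # 0.016
```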

Based on the above, the similarity calculation unit 512 calculates the similarity "R(s1, s2)" between utterance set s1 and utterance set s2 using (Equation 2) below.

(Equation 2)
R(s1, s2) = Wt * Rt(s1, s2) + Wu * Ru(s1, s2) + Wd * Rd(s1, s2) + We * Re(s1, s2) + Wec * Rec(s1, s2) + Wes * Res(s1, s2)

"Wt", "Wu", "Wd", "We", "Wec", and "Wes" in (Equation 2) are the weights for the respective similarities and take values between 0 and 1. The similarity "R(s1, s2)" may also be calculated using only some of the above similarities.
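A minimal sketch of the weighted sum in (Equation 2) follows. The function name and the dictionary representation are assumptions. The weights below are hypothetical, as are the values for Rt, Ru, Rd, and Re; Rec and Res use the values from the worked examples. A similarity with no weight entry is simply left out, which corresponds to using only some of the similarities.

```python
def combined_similarity(similarities, weights):
    """Weighted sum of the named similarities; weights must lie between 0 and 1."""
    total = 0.0
    for name, weight in weights.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {name} must be between 0 and 1")
        total += weight * similarities[name]
    return total


similarities = {"Rt": 0.3, "Ru": 0.5, "Rd": 0.2, "Re": 0.6, "Rec": 0.4, "Res": 0.016}

# Use all six terms with equal (hypothetical) weights.
all_weights = {name: 0.5 for name in similarities}
print(combined_similarity(similarities, all_weights))

# Use only the concept-based and co-occurrence-based terms.
print(combined_similarity(similarities, {"Rec": 0.5, "Res": 0.5}))
```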

The determination unit 514 determines the order of the utterance sets or utterance set templates so that the similarity between consecutive utterance sets or utterance set templates is as high as possible without exceeding a predetermined value. The order determination process according to the fifth embodiment is described here with reference to FIG. 14. FIG. 14 is a diagram illustrating the order determination process according to the fifth embodiment.

FIG. 14 shows the similarities, calculated by the similarity calculation unit 512, between each pair of "utterance set 1", "utterance set 2", "utterance set 3", and "utterance set 4". Here, the case where the predetermined value is "0.9" is taken as an example. For example, as shown in FIG. 14, the determination unit 514 determines the utterance set to be used after "utterance set 1" to be "utterance set 4", which has the highest similarity to "utterance set 1" without exceeding the predetermined value "0.9".

The determination unit 514 then determines the utterance set to be used after "utterance set 4" to be "utterance set 3", which has the highest similarity to "utterance set 4" without exceeding the predetermined value "0.9". Next, the determination unit 514 determines the utterance set to be used after "utterance set 3" to be "utterance set 2", which has the highest similarity to "utterance set 3" without exceeding the predetermined value "0.9". That is, given the similarities shown in FIG. 14 and the predetermined value "0.9", the determination unit 514 determines the order of the utterance sets to be "utterance set 1", "utterance set 4", "utterance set 3", "utterance set 2".

In this way, a predetermined value is set for the similarity between consecutive utterance sets, and utterance sets whose similarity is equal to or greater than the predetermined value are not used consecutively. However, using utterance sets with low similarity consecutively may cause the content of the dialogue to change abruptly, so the order of the utterance sets is determined so that the similarity between consecutive utterance sets is as high as possible without exceeding the predetermined value.
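The order determination described above amounts to a greedy selection, which can be sketched as follows. The similarity values are hypothetical (the values of FIG. 14 are not reproduced in the text) and were chosen so that the result matches the order given in the example. The fallback used when every remaining utterance set exceeds the predetermined value is an assumption, since the text does not cover that case.

```python
def order_utterance_sets(similarity, start, threshold):
    """Greedily order utterance sets by similarity to the previous one.

    similarity[a][b] is the similarity between utterance sets a and b.
    """
    order = [start]
    remaining = set(similarity) - {start}
    while remaining:
        current = order[-1]
        # Candidates whose similarity does not exceed the predetermined value.
        allowed = [s for s in sorted(remaining) if similarity[current][s] <= threshold]
        if allowed:
            chosen = max(allowed, key=lambda s: similarity[current][s])
        else:
            # Assumption: if nothing qualifies, take the least similar remaining set.
            chosen = min(sorted(remaining), key=lambda s: similarity[current][s])
        order.append(chosen)
        remaining.remove(chosen)
    return order


# Hypothetical similarities between utterance sets 1 to 4.
similarity = {
    1: {2: 0.3, 3: 0.95, 4: 0.8},
    2: {1: 0.3, 3: 0.6, 4: 0.4},
    3: {1: 0.95, 2: 0.6, 4: 0.7},
    4: {1: 0.8, 2: 0.4, 3: 0.7},
}
print(order_utterance_sets(similarity, start=1, threshold=0.9))  # [1, 4, 3, 2]
```

With these values, utterance set 3 is the most similar to utterance set 1 (0.95) but is skipped at the first step because it exceeds 0.9, so utterance set 4 (0.8) is chosen instead.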

The utterance set acquisition unit 502 acquires utterance sets in the determined order. For example, the utterance set acquisition unit 502 acquires utterance sets from the utterance set storage unit 101 in accordance with the order of utterance sets determined by the determination unit 514. The first output unit 103a outputs first utterances in the order in which the utterance sets are acquired. For example, the first output unit 103a outputs the first utterance included in each utterance set to a predetermined display output device in the order in which the utterance set acquisition unit 502 acquires the utterance sets. After the detection unit 104 detects a user utterance, the second output unit 103b outputs, to the predetermined display output device, the second utterance corresponding to the first utterance output by the first output unit 103a.

Next, the flow of dialogue processing according to the fifth embodiment is described with reference to FIG. 15. FIG. 15 is a flowchart illustrating an example of the flow of dialogue processing according to the fifth embodiment.

For example, as shown in FIG. 15, the similarity calculation unit 512 calculates the similarity between utterance sets or between utterance set templates (step S501). The determination unit 514 then determines the order of the utterance sets or utterance set templates so that the similarity between consecutive utterance sets or utterance set templates, as calculated by the similarity calculation unit 512, is as high as possible without exceeding a predetermined value (step S502).

The utterance set acquisition unit 502 acquires an utterance set stored in the utterance set storage unit 101 in accordance with the order determined by the determination unit 514 (step S503). The first output unit 103a outputs the first utterance included in the utterance set acquired by the utterance set acquisition unit 502 to the predetermined display output device (step S504).

The detection unit 104 then determines whether an utterance by the user in response to the first utterance output by the first output unit 103a has been detected (step S505). If the detection unit 104 detects an utterance by the user (Yes in step S505), the second output unit 103b outputs, to the predetermined display output device, the second utterance that corresponds to the first utterance output by the first output unit 103a and is included in the utterance set acquired by the utterance set acquisition unit 502 (step S506). After the second utterance is output, the utterance set acquisition unit 502 acquires the next utterance set from the utterance set storage unit 101 in accordance with the order determined by the determination unit 514 (step S503).

On the other hand, if no utterance by the user has been detected (No in step S505), the detection unit 104 determines whether a fixed period of time has elapsed (step S507). If the fixed period has not elapsed (No in step S507), the detection unit 104 again determines whether an utterance by the user has been detected (step S505). If the fixed period has elapsed (Yes in step S507), the utterance set acquisition unit 502 acquires the next utterance set from the utterance set storage unit 101 in accordance with the order determined by the determination unit 514 (step S503).
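The loop formed by steps S503 to S507 can be sketched as follows. This is a simplified illustration: the fixed waiting period is modeled as a maximum number of polls rather than wall-clock time, and `user_spoke` is a hypothetical callable standing in for the detection unit 104.

```python
def run_dialogue(ordered_sets, user_spoke, max_polls=3):
    """Output first utterances in order; output the second only if the user speaks.

    ordered_sets is a list of (first_utterance, second_utterance) pairs, already
    ordered as determined in step S502.
    """
    transcript = []
    for first, second in ordered_sets:   # S503: acquire the next utterance set
        transcript.append(first)         # S504: output the first utterance
        for _ in range(max_polls):       # S505/S507: poll until the period elapses
            if user_spoke():
                transcript.append(second)  # S506: output the second utterance
                break
        # If the period elapses with no user utterance, move on to the next set.
    return transcript


# Simulated detection results: the user answers the first set but not the second.
responses = iter([False, True, False, False, False])
sets = [
    ("Do you like curry?", "Curry is spicy, isn't it?"),
    ("Do you like ice cream?", "Ice cream is cold, isn't it?"),
]
print(run_dialogue(sets, lambda: next(responses)))
```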

According to this embodiment, the order of the utterance sets is determined based on the similarity between utterance sets, so a more natural flow of dialogue can be realized.

(Sixth embodiment)
FIG. 16 is a block diagram illustrating a configuration example of the dialogue apparatus according to the sixth embodiment. In the sixth embodiment, functional units that perform the same processing as in the first, second, or fifth embodiment are denoted by the same reference numerals, and description of the same processing may be omitted.

For example, as shown in FIG. 16, the dialogue apparatus 600 includes an utterance set storage unit 101, an utterance set acquisition unit 602, an output unit 103, a detection unit 104, a concept dictionary storage unit 205, an utterance template storage unit 506, a concept acquisition unit 207, an utterance template acquisition unit 208, an utterance set generation unit 209, a similarity calculation unit 612, a co-occurrence dictionary storage unit 513, and a determination unit 614. As in the first embodiment, the dialogue apparatus 600 realizes a dialogue with the user by outputting text or speech to a predetermined display output device equipped with a display, a speaker, or the like.

The similarity calculation unit 612 calculates the similarity between utterance sets, including any new utterance set, or between utterance set templates. The processing by the similarity calculation unit 612 is the same as that by the similarity calculation unit 512 according to the fifth embodiment, so a detailed description is omitted here.

The determination unit 614 groups utterance sets, or utterance set templates, whose similarity is equal to or greater than a predetermined value into the same group, and determines the order of consecutive utterance sets or utterance set templates by sequentially selecting from different groups that are closest in similarity. For example, the determination unit 614 groups the utterance sets under the rule that the similarity calculated by the similarity calculation unit 612 is equal to or greater than a predetermined value "X1".

The determination unit 614 then obtains the similarity between groups. The similarity between groups is, for example, the average, maximum, or minimum of the similarities between the utterance sets belonging to the same group. Next, the determination unit 614 avoids selecting consecutive utterance sets from the same group; that is, it sequentially selects consecutive utterance sets from different groups, and thereby determines the order of the utterance sets. When selecting consecutive utterance sets from different groups, the determination unit 614 selects each successive utterance set from the group that is closest in inter-group similarity.

When utterance sets are selected from groups, a group that has been selected once may be made unavailable for reselection until all groups have been selected. In addition, it is not necessary to switch, for every utterance set, to an utterance set belonging to a group whose similarity is less than the predetermined value; utterance sets may be selected from the same group N times in succession (where "N" is a natural number), and then selected N times from another group in the same way.
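One possible reading of the grouping and ordering described above is sketched below. Several points are assumptions rather than details from the text: groups are formed as connected components of the "similarity is at least X1" relation, the inter-group similarity is taken as the average over pairs drawn from the two groups, and one utterance set is taken per group before switching. All similarity values are hypothetical.

```python
from itertools import combinations


def group_sets(ids, sim, x1):
    """Put utterance sets linked by similarity >= x1 into the same group."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for a, b in combinations(ids, 2):
        if sim(a, b) >= x1:
            parent[find(a)] = find(b)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def group_similarity(group_a, group_b, sim):
    """Average similarity over all pairs drawn from the two groups (assumed)."""
    values = [sim(a, b) for a in group_a for b in group_b]
    return sum(values) / len(values)


def order_by_groups(groups, sim):
    """Alternate between groups, moving to the most similar other group each time.

    Assumes at least one non-empty group.
    """
    pools = [list(group) for group in groups]
    current = 0
    order = [pools[current].pop(0)]
    while any(pools):
        others = [i for i, pool in enumerate(pools) if pool and i != current]
        if others:  # otherwise only the current group still has utterance sets
            current = max(
                others,
                key=lambda i: group_similarity(groups[current], groups[i], sim),
            )
        order.append(pools[current].pop(0))
    return order


# Hypothetical pairwise similarities between utterance sets 1 to 5.
TABLE = {
    frozenset(pair): value
    for pair, value in {
        (1, 2): 0.9, (3, 4): 0.85, (1, 3): 0.5, (1, 4): 0.4, (2, 3): 0.6,
        (2, 4): 0.5, (1, 5): 0.1, (2, 5): 0.2, (3, 5): 0.3, (4, 5): 0.3,
    }.items()
}


def sim(a, b):
    return TABLE[frozenset((a, b))]


groups = group_sets([1, 2, 3, 4, 5], sim, x1=0.8)
print(groups)                        # [[1, 2], [3, 4], [5]]
print(order_by_groups(groups, sim))  # [1, 3, 2, 4, 5]
```

The optional rules in the text, such as blocking reselection of a group until all groups have been used or taking N utterance sets from one group in succession, are not implemented in this sketch.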

The utterance set acquisition unit 602 acquires utterance sets in the determined order. For example, the utterance set acquisition unit 602 acquires utterance sets from the utterance set storage unit 101 in accordance with the order of utterance sets determined by the determination unit 614. The first output unit 103a outputs first utterances in the order in which the utterance sets are acquired. For example, the first output unit 103a outputs the first utterance included in each utterance set to a predetermined display output device in the order in which the utterance set acquisition unit 602 acquires the utterance sets. After the detection unit 104 detects a user utterance, the second output unit 103b outputs, to the predetermined display output device, the second utterance corresponding to the first utterance output by the first output unit 103a.

Next, the flow of dialogue processing according to the sixth embodiment is described with reference to FIG. 17. FIG. 17 is a flowchart illustrating an example of the flow of dialogue processing according to the sixth embodiment.

For example, as shown in FIG. 17, the similarity calculation unit 612 calculates the similarity between utterance sets or between utterance set templates (step S601). The determination unit 614 groups utterance sets whose similarity, as calculated by the similarity calculation unit 612, is equal to or greater than a predetermined value into the same group (step S602). The determination unit 614 then selects consecutive utterance sets from the groups based on the similarity between groups, and determines the order of the utterance sets (step S603).

The utterance set acquisition unit 602 acquires an utterance set stored in the utterance set storage unit 101 in accordance with the order determined by the determination unit 614 (step S604). The first output unit 103a outputs the first utterance included in the utterance set acquired by the utterance set acquisition unit 602 to the predetermined display output device (step S605).

The detection unit 104 then determines whether an utterance by the user in response to the first utterance output by the first output unit 103a has been detected (step S606). If the detection unit 104 detects an utterance by the user (Yes in step S606), the second output unit 103b outputs, to the predetermined display output device, the second utterance that corresponds to the first utterance output by the first output unit 103a and is included in the utterance set acquired by the utterance set acquisition unit 602 (step S607). After the second utterance is output, the utterance set acquisition unit 602 acquires the next utterance set from the utterance set storage unit 101 in accordance with the order determined by the determination unit 614 (step S604).

On the other hand, if no utterance by the user has been detected (No in step S606), the detection unit 104 determines whether a fixed period of time has elapsed (step S608). If the fixed period has not elapsed (No in step S608), the detection unit 104 again determines whether an utterance by the user has been detected (step S606). If the fixed period has elapsed (Yes in step S608), the utterance set acquisition unit 602 acquires the next utterance set from the utterance set storage unit 101 in accordance with the order determined by the determination unit 614 (step S604).

According to this embodiment, utterance sets whose mutual similarity is equal to or greater than a predetermined value are grouped, and the order of the utterance sets is determined based on the similarity between groups, so a more natural dialogue can be realized.

(Seventh embodiment)
FIG. 18 is a block diagram illustrating a configuration example of the dialogue apparatus according to the seventh embodiment. In the seventh embodiment, functional units that perform the same processing as in the first, second, fourth, or fifth embodiment are denoted by the same reference numerals, and description of the same processing may be omitted.

For example, as shown in FIG. 18, the dialogue apparatus 700 includes an utterance set storage unit 101, an utterance set acquisition unit 702, an output unit 103, a detection unit 104, a concept dictionary storage unit 205, an utterance template storage unit 206, a concept acquisition unit 407, an utterance template acquisition unit 208, an utterance set generation unit 409, an extraction unit 411, a similarity calculation unit 712, a co-occurrence dictionary storage unit 513, and a determination unit 714. As in the first embodiment, the dialogue apparatus 700 realizes a dialogue with the user by outputting text or speech to a predetermined display output device equipped with a display, a speaker, or the like.

The similarity calculation unit 712 calculates the similarity between an extracted word or phrase and each utterance set, including any new utterance set, as well as the similarity between utterance sets. For example, the similarity calculation unit 712 calculates the similarity between a word extracted by the speech recognition processing of the extraction unit 411 and each utterance set, and also calculates the similarity between utterance sets. The calculation of the similarity between utterance sets is the same as in the embodiments described above, so its description is omitted; the calculation of the similarity between an extracted word and an utterance set is described here.

In the following, the word and utterance set below are used as an example.
Word k1: "curry"
Utterance set s1
"First utterance: Do you like ice cream? Second utterance: Ice cream is cold, isn't it?"

Specifically, the similarity calculation unit 712 calculates the concept similarity "Rec(k1, s1) = 1" for the words "curry" and "ice cream" contained in word k1 and utterance set s1. If the utterance set contains a plurality of words, the average, maximum, or minimum value may be used as the similarity.

The similarity calculation unit 712 also acquires, from the co-occurrence dictionary storage unit 513, the co-occurrence rate "0.2" of the words "curry" and "ice cream", the co-occurrence rate "0.01" of the words "curry" and "cold", and the co-occurrence rate "0.05" of the words "curry" and "like". The similarity calculation unit 712 then obtains the similarity using the co-occurrence rate as "Res(k1, s1) = (0.2 + 0.01 + 0.05) / 3 = 0.087". The average is used in this example, but the maximum or minimum value may be used as the similarity instead.

Based on the above, the similarity calculation unit 712 calculates the similarity "R(k1, s1)" between word k1 and utterance set s1 using (Equation 3) below.

(Equation 3)
R(k1, s1) = Wec * Rec(k1, s1) + Wes * Res(k1, s1)

"Wec" and "Wes" in (Equation 3) are the weights for the respective similarities and take values between 0 and 1. The similarity "R(k1, s1)" may also be calculated using only one of the above similarities.
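The word-to-utterance-set similarity can be sketched as follows, using the co-occurrence rates and the value Rec(k1, s1) = 1 from the example. The function names are hypothetical, and the weights of 0.5 are placeholders, since the text only states that they lie between 0 and 1.

```python
def res_word_to_set(rates, mode="mean"):
    """Reduce the co-occurrence rates between the word and each word of the set."""
    if mode == "mean":
        return sum(rates) / len(rates)
    if mode == "max":
        return max(rates)
    if mode == "min":
        return min(rates)
    raise ValueError(f"unknown mode: {mode}")


def word_set_similarity(rec, res, w_ec, w_es):
    """(Equation 3): weighted sum of the concept and co-occurrence similarities."""
    return w_ec * rec + w_es * res


# Rates of "curry" with "ice cream", "cold", and "like", as in the text.
res = res_word_to_set([0.2, 0.01, 0.05])
print(round(res, 3))  # 0.087
print(word_set_similarity(1.0, res, w_ec=0.5, w_es=0.5))
```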

The determination unit 714 takes the utterance set with the highest similarity as the first utterance set and, starting from that first utterance set, determines the order of the utterance sets so that the similarity between consecutive utterance sets is as high as possible without exceeding a predetermined value. For example, the determination unit 714 takes as the first utterance set the utterance set with the highest of the similarities, calculated by the similarity calculation unit 712, between word k1 and each utterance set.

Then, based on the similarities between utterance sets calculated by the similarity calculation unit 712, the determination unit 714 determines the order of the utterance sets, starting from the first utterance set, so that the similarity between consecutive utterance sets is as high as possible without exceeding the predetermined value. That is, after the first utterance set has been determined, a predetermined value is set for the similarity between consecutive utterance sets, and utterance sets whose similarity is equal to or greater than the predetermined value are not used consecutively. However, using utterance sets with low similarity consecutively may cause the content of the dialogue to change abruptly, so the order of the utterance sets is determined so that the similarity between consecutive utterance sets is as high as possible without exceeding the predetermined value.
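The two-stage decision described above (seed with the utterance set most similar to the extracted word, then order the rest greedily) can be sketched as follows. All similarity values are hypothetical, and the fallback used when every remaining utterance set exceeds the predetermined value is an assumption not covered by the text.

```python
def order_from_word(word_similarity, set_similarity, threshold):
    """Order utterance sets starting from the one most similar to the word.

    word_similarity maps each utterance set to R(k1, s); set_similarity maps a
    frozenset of two utterance sets to their similarity.
    """
    first = max(sorted(word_similarity), key=word_similarity.get)
    order = [first]
    remaining = set(word_similarity) - {first}
    while remaining:
        current = order[-1]

        def score(candidate):
            return set_similarity[frozenset((current, candidate))]

        allowed = [s for s in sorted(remaining) if score(s) <= threshold]
        if allowed:
            chosen = max(allowed, key=score)
        else:
            # Assumption: if nothing qualifies, take the least similar remaining set.
            chosen = min(sorted(remaining), key=score)
        order.append(chosen)
        remaining.remove(chosen)
    return order


# Hypothetical similarities between word k1 and each utterance set.
word_similarity = {"s1": 0.54, "s2": 0.2, "s3": 0.7}
# Hypothetical similarities between utterance sets.
set_similarity = {
    frozenset(("s1", "s2")): 0.5,
    frozenset(("s1", "s3")): 0.95,
    frozenset(("s2", "s3")): 0.6,
}
print(order_from_word(word_similarity, set_similarity, threshold=0.9))
# ['s3', 's2', 's1']
```

When a new word is extracted from a later user utterance, the same function would simply be called again with updated word similarities.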

The utterance set acquisition unit 702 acquires utterance sets in the determined order. For example, the utterance set acquisition unit 702 acquires utterance sets from the utterance set storage unit 101 in accordance with the order of utterance sets determined by the determination unit 714. The first output unit 103a outputs first utterances in the order in which the utterance sets are acquired. For example, the first output unit 103a outputs the first utterance included in each utterance set to a predetermined display output device in the order in which the utterance set acquisition unit 702 acquires the utterance sets. After the detection unit 104 detects a user utterance, the second output unit 103b outputs, to the predetermined display output device, the second utterance corresponding to the first utterance output by the first output unit 103a. When the extraction unit 411 extracts a new word, the above processing is executed again and the order of the utterance sets is updated.

Next, the flow of dialogue processing according to the seventh embodiment is described with reference to FIG. 19. FIG. 19 is a flowchart illustrating an example of the flow of dialogue processing according to the seventh embodiment.

For example, as shown in FIG. 19, when the detection unit 104 detects an utterance by the user (Yes in step S701), the extraction unit 411 extracts the words contained in the user's utterance by executing speech recognition processing (step S702). When the detection unit 104 has not detected an utterance by the user (No in step S701), the extraction unit 411 waits for the detection unit 104 to detect an utterance by the user.

The similarity calculation unit 712 calculates the similarity between the word extracted by the extraction unit 411 and each utterance set, as well as the similarity between utterance sets (step S703). The determination unit 714 determines the first utterance set based on the similarity between the word and the utterance sets calculated by the similarity calculation unit 712, and determines the order of the utterance sets that follow the first utterance set based on the similarity between utterance sets calculated by the similarity calculation unit 712 (step S704).

The utterance set acquisition unit 702 acquires an utterance set stored in the utterance set storage unit 101 in accordance with the order determined by the determination unit 714 (step S705). The first output unit 103a outputs the first utterance included in the utterance set acquired by the utterance set acquisition unit 702 to the predetermined display output device (step S706).

The detection unit 104 then determines whether an utterance by the user in response to the first utterance output by the first output unit 103a has been detected (step S707). If the detection unit 104 detects an utterance by the user (Yes in step S707), the second output unit 103b outputs, to the predetermined display output device, the second utterance that corresponds to the first utterance output by the first output unit 103a and is included in the utterance set acquired by the utterance set acquisition unit 702 (step S708). After the second utterance is output, the utterance set acquisition unit 702 acquires the next utterance set from the utterance set storage unit 101 in accordance with the order determined by the determination unit 714 (step S705).

On the other hand, when no utterance by the user has been detected (No in step S707), the detection unit 104 determines whether a predetermined time has elapsed (step S709). If the predetermined time has not elapsed (No in step S709), the detection unit 104 again determines whether an utterance by the user has been detected (step S707). If the predetermined time has elapsed (Yes in step S709), the utterance set acquisition unit 702 acquires the next utterance set from the utterance set storage unit 101 in the order determined by the determination unit 714 (step S705).

According to the present embodiment, utterance sets containing the word extracted by speech recognition are used, so a more natural dialogue can be realized.

(Embodiments other than the above)
The above embodiments describe a dialogue that is realized without recognizing the content of the user's utterance. Even so, it is preferable that the dialogue not hinder its own continuation. For example, when the user's utterance is a predetermined utterance such as a request to repeat, outputting the next utterance as-is may result in a dialogue that ignores the user. To avoid this, when the user's utterance is a predetermined utterance such as a request to repeat, the immediately preceding utterance can be output again.

FIG. 20 is a flowchart showing an example of the flow of dialogue processing when re-utterance is requested. For example, as shown in FIG. 20, the dialogue apparatus 100 acquires an utterance set from the utterance set storage unit 101 (step S801) and outputs the first utterance included in the acquired utterance set to a predetermined display output device (step S802).

When the dialogue apparatus 100 detects an utterance by the user (Yes in step S803), it determines whether the detected utterance is a request for re-utterance (step S804). To make this determination, predetermined utterances such as "say that one more time" are held in advance, and the apparatus determines whether the detected utterance corresponds to one of them. Pressing a predetermined button may also be regarded as a request for re-utterance. When the dialogue apparatus 100 determines that the utterance is a request for re-utterance (Yes in step S804), it outputs the first utterance again (step S802). When re-outputting the first utterance, the output may be changed so that the user can understand it more easily, for example by raising the volume or slowing the speech rate further.

On the other hand, when the dialogue apparatus 100 determines that the utterance is not a request for re-utterance (No in step S804), it outputs the second utterance (step S805). When no utterance by the user has been detected (No in step S803), the dialogue apparatus 100 determines whether a predetermined time has elapsed (step S806). If it determines that the predetermined time has not elapsed (No in step S806), it executes the processing of step S803. If it determines that the predetermined time has elapsed (Yes in step S806), it executes the processing of step S801. The re-utterance request determination may also be performed after the second utterance is output.
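The flow of steps S802 to S806 for a single utterance set can be simulated with the minimal sketch below. Detection is replaced by a list of polling results, and the phrases in `REPEAT_REQUESTS`, the `timeout_polls` parameter, and the decision to restart the wait after a re-output are all assumptions made for illustration.

```python
# Hypothetical predetermined utterances treated as re-utterance requests.
REPEAT_REQUESTS = {"say that again", "one more time"}


def run_utterance_set(first, second, polls, timeout_polls=3):
    """Simulate S802-S806 for one utterance set.

    `polls` holds one detection result per polling cycle: None when no user
    utterance was detected, otherwise the detected text. Returns the list of
    utterances the apparatus output.
    """
    outputs = [first]                       # S802: output the first utterance
    waited = 0
    for heard in polls:
        if heard is None:                   # S803 negative
            waited += 1
            if waited >= timeout_polls:     # S806 affirmative: go to next set
                break
            continue                        # S806 negative: poll again
        if heard.strip().lower() in REPEAT_REQUESTS:   # S804 affirmative
            outputs.append(first)           # S802: output the first utterance again
            waited = 0                      # assumed: the wait restarts
            continue
        outputs.append(second)              # S804 negative: S805
        break
    return outputs


repeated = run_utterance_set("Do you like movies?", "Movies are great.",
                             [None, "One more time", "Yes"])
silent = run_utterance_set("Do you like movies?", "Movies are great.",
                           [None, None, None, "too late"])
```

In the first call the first utterance is output twice before the second utterance; in the second call the wait expires and only the first utterance is output.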

According to the present embodiment, when the user's utterance is a predetermined utterance, the immediately preceding utterance is output again, so the dialogue can be kept from becoming one-sided.

The above embodiments describe the case where, if no utterance by the user is detected for a predetermined time, the apparatus proceeds to an utterance using the next utterance set. It is also possible to make an appropriate utterance even when the user says nothing. Here, as an example, the "second utterance", which represents the response to the user's utterance, includes a "third utterance", which represents the utterance to be output when the user says nothing.

FIG. 21 is a diagram showing an example of information stored in the utterance set storage unit 101 when a third utterance is included. For example, as shown in FIG. 21, the utterance set storage unit 101 stores "utterance set 1", which includes the first utterance "Do you like movies?", the assumed user utterances "I like them" and "(no response)", and the second utterances "Movies are great, aren't they?" and "I like movies myself (third utterance)". As in the above embodiments, the "assumed user utterance" need not be included in the utterance set storage unit 101.

FIG. 22 is a flowchart showing an example of the flow of dialogue processing when no user utterance is detected within a predetermined time. For example, as shown in FIG. 22, the dialogue apparatus 100 acquires an utterance set from the utterance set storage unit 101 (step S901) and outputs the first utterance included in the acquired utterance set to a predetermined display output device (step S902).

When the dialogue apparatus 100 detects an utterance by the user (Yes in step S903), it determines whether the detected utterance is a request for re-utterance (step S904). When it determines that the utterance is a request for re-utterance (Yes in step S904), it outputs the first utterance again (step S902). When it determines that the utterance is not a request for re-utterance (No in step S904), it outputs the second utterance (step S905).

When no utterance by the user has been detected (No in step S903), the dialogue apparatus 100 determines whether a predetermined time has elapsed (step S906). If it determines that the predetermined time has not elapsed (No in step S906), it executes the processing of step S903. If it determines that the predetermined time has elapsed (Yes in step S906), it outputs the third utterance (step S907).
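The branching in steps S903 to S907 can be expressed as a single decision function over an utterance set that carries a third utterance. The sketch below is illustrative only: the `UtteranceSet` class, the `respond` function, and the `is_repeat_request` callback are hypothetical names, and the example strings follow the FIG. 21 example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class UtteranceSet:
    first: str    # first utterance
    second: str   # response output when the user says something
    third: str    # response output when the user stays silent


def respond(us: UtteranceSet, heard: Optional[str], timed_out: bool,
            is_repeat_request: Callable[[str], bool] = lambda text: False,
            ) -> Optional[str]:
    """Choose the next output for one pass through S903-S907.

    Returns None when the apparatus should keep waiting.
    """
    if heard is not None:                 # S903 affirmative
        if is_repeat_request(heard):      # S904 affirmative
            return us.first               # S902: output the first utterance again
        return us.second                  # S905
    if timed_out:                         # S906 affirmative
        return us.third                   # S907
    return None                           # S906 negative: poll again (S903)


us = UtteranceSet("Do you like movies?",
                  "Movies are great, aren't they?",
                  "I like movies myself.")
on_reply = respond(us, "I like them", timed_out=False)
on_silence = respond(us, None, timed_out=True)
```

Keeping the decision separate from timing and audio detection makes each branch of the flowchart easy to check in isolation.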

According to the present embodiment, even when no user utterance is detected within the predetermined time, a corresponding utterance is output, so an appropriate dialogue can be realized.

In the second embodiment described above, the concept dictionary storage unit 205 does not contain proper nouns, new words, and the like, but unknown words can be added to match the concepts written in the utterance set templates. When an unknown word is added, a concept for the word is acquired from the concept dictionary storage unit 205 and assigned to it, the utterance set templates that include that concept are retrieved, and utterance sets are generated. This makes it possible to add not only commonly used words but also trending keywords, recommended product names, and the like. For example, the unknown word "Chocolate Crunch NEW", a product name, is assigned the concept "food", and the phrase "delicious enough to be addictive", a feature of the product, is assigned the concept "taste". As a result, utterance sets such as "Do you like Chocolate Crunch NEW?" and "Chocolate Crunch NEW is delicious enough to be addictive" can be generated. That is, the present embodiment can be applied not only to everyday conversation but also to product sales promotion and the like.
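A rough sketch of this template filling follows. The `<concept>` slot syntax, the `CONCEPTS` and `TEMPLATES` tables, and the rule of skipping templates with an unfilled slot are assumptions made for illustration; the passage does not specify how templates are encoded.

```python
import re

# Hypothetical registrations: unknown phrase -> assigned concept.
CONCEPTS = {
    "Chocolate Crunch NEW": "food",
    "delicious enough to be addictive": "taste",
}

# Hypothetical templates as (first utterance, second utterance); <x> marks a concept slot.
TEMPLATES = [
    ("Do you like <food>?", "<food> is <taste>."),
    ("Where shall we go by <vehicle>?", "<vehicle> is fun."),
]

SLOT = re.compile(r"<(\w+)>")


def generate_utterance_sets(concepts, templates):
    """Fill every template whose slots are all covered by registered concepts."""
    by_concept = {concept: phrase for phrase, concept in concepts.items()}

    def fill(text):
        # Replace each <concept> slot with the phrase registered for it.
        return SLOT.sub(lambda m: by_concept[m.group(1)], text)

    generated = []
    for template in templates:
        slots = {name for text in template for name in SLOT.findall(text)}
        if not slots.issubset(by_concept):
            continue  # a slot has no registered phrase; skip this template
        generated.append(tuple(fill(text) for text in template))
    return generated


new_sets = generate_utterance_sets(CONCEPTS, TEMPLATES)
```

Only the first template is filled here, because no phrase is registered for the hypothetical "vehicle" concept.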

The above embodiments describe calculating the similarity from the edit distance between the first utterances of two utterance sets and between their second utterances. The similarity can also be calculated from the edit distance between the second utterance of one utterance set and the first utterance of the next utterance set. Because the user perceives the first utterance and then the second utterance in that order, it is undesirable for the second utterance and the first utterance of the next utterance set to be similar; the similarity is therefore calculated from the edit distance between them.
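As an illustration, the edit distance between a second utterance and the next set's first utterance can be computed with the standard Levenshtein dynamic programme. The normalisation in `transition_similarity` is a hypothetical choice; the passage states only that the similarity is derived from the edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rows of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def transition_similarity(second: str, next_first: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for maximal distance."""
    longest = max(len(second), len(next_first))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(second, next_first) / longest


score = transition_similarity("Movies are great.", "Do you like sports?")
```

A determination unit could then avoid placing two utterance sets back to back when this score is high, so that the user does not hear near-identical sentences in succession.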

The embodiments described above are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS
100 Dialogue apparatus
101 Utterance set storage unit
102 Utterance set acquisition unit
103 Output unit
103a First output unit
103b Second output unit
104 Detection unit

Claims

A dialogue apparatus comprising:
an utterance set storage unit that stores an utterance set including a first utterance and a second utterance, the second utterance representing a response to an utterance by a user that is assumed as a response to the first utterance;
an utterance set acquisition unit that acquires the utterance set;
a first output unit that outputs the first utterance included in the acquired utterance set;
a detection unit that detects an utterance by the user after the first utterance is output; and
a second output unit that outputs the second utterance included in the acquired utterance set when an utterance by the user is detected.

The dialogue apparatus according to claim 1, further comprising:
an utterance template storage unit that stores a template of the utterance set in which some phrases are expressed as concepts;
a concept dictionary storage unit that stores each phrase and its concept in association with each other;
a concept acquisition unit that, when a phrase is input, acquires the concept corresponding to the phrase from the concept dictionary storage unit;
an utterance template acquisition unit that acquires a template of the utterance set including the acquired concept from the utterance template storage unit; and
an utterance set generation unit that generates a new utterance set by inserting the input phrase into the concept included in the acquired template of the utterance set, and stores the generated new utterance set in the utterance set storage unit.

The dialogue apparatus according to claim 2, further comprising an extraction unit that extracts, from the utterance by the user, a phrase that matches a phrase stored in the concept dictionary storage unit, wherein
the concept acquisition unit acquires the concept corresponding to the extracted phrase from the concept dictionary storage unit, and
the utterance set generation unit generates a new utterance set by inserting the extracted phrase into the concept included in the acquired template of the utterance set.

The dialogue apparatus according to claim 1, further comprising:
an utterance template storage unit that stores a template of the utterance set in which some phrases are expressed as concepts;
a unique phrase storage unit that stores a unique phrase acquired from outside and the concept of the unique phrase in association with each other;
an utterance template acquisition unit that acquires a template of the utterance set including the concept of the unique phrase from the utterance template storage unit; and
an utterance set generation unit that generates a new utterance set by inserting the unique phrase into the concept included in the acquired template of the utterance set, and stores the generated new utterance set in the utterance set storage unit.

The dialogue apparatus according to claim 2, further comprising:
a similarity calculation unit that calculates a similarity between the utterance sets including the new utterance set, or between templates of the utterance set; and
a determination unit that determines the order of the utterance sets or of the templates of the utterance set so that the similarity between consecutive utterance sets or between consecutive templates is the highest within a range in which the similarity does not exceed a predetermined value, wherein
the utterance set acquisition unit acquires the utterance sets according to the order, and
the first output unit outputs the first utterance in the order in which the utterance sets are acquired.

The dialogue apparatus according to claim 2, further comprising:
a similarity calculation unit that calculates a similarity between the utterance sets including the new utterance set, or between templates of the utterance set; and
a determination unit that groups the utterance sets or the templates of the utterance set whose similarity is equal to or greater than a predetermined value into the same group, sequentially selects different groups having the closest similarity, and determines the order of consecutive utterance sets or of consecutive templates of the utterance set, wherein
the utterance set acquisition unit acquires the utterance sets according to the order, and
the first output unit outputs the first utterance in the order in which the utterance sets are acquired.

The dialogue apparatus according to claim 3, further comprising:
a similarity calculation unit that calculates a similarity between the extracted phrase and each of the utterance sets including the new utterance set, and a similarity between the utterance sets; and
a determination unit that sets the utterance set having the highest similarity to the extracted phrase as the first utterance set, and determines the order of the utterance sets, starting from the first utterance set, so that the similarity between consecutive utterance sets is the highest within a range in which the similarity does not exceed a predetermined value, wherein
the utterance set acquisition unit acquires the utterance sets according to the order, and
the first output unit outputs the first utterance in the order in which the utterance sets are acquired.

The dialogue apparatus according to claim 1, wherein
the detection unit further detects a predetermined utterance by the user, and
the first output unit outputs the first utterance when the predetermined utterance by the user is detected.

The dialogue apparatus according to claim 1, wherein
the utterance set storage unit stores the utterance set further including a third utterance representing a response for the case where the user does not speak, and
the second output unit outputs the third utterance included in the acquired utterance set when no utterance by the user is detected.