JPWO2016104193A1

JPWO2016104193A1 - Correspondence determining device, voice dialogue system, control method of correspondence determining device, and voice dialogue device

Info

Publication number: JPWO2016104193A1
Application number: JP2016566114A
Authority: JP
Inventors: 彰則横濱; 誠悟伊藤; 宏明田中
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2014-12-26
Filing date: 2015-12-11
Publication date: 2017-05-25
Also published as: WO2016104193A1

Abstract

For utterances expressed in diverse ways, a response that matches the speaker's intention is identified quickly. The voice dialogue device (1) includes an intention searcher generation unit (25) that analyzes a user's utterance and generates an intention searcher indicating the user's intention, and a correspondence descriptor search unit (29) that refers to a correspondence descriptor search table (42), in which intention searchers are associated with correspondence descriptors, and identifies the correspondence descriptor corresponding to the intention searcher generated by the intention searcher generation unit (25).

Description

The present invention relates to a voice dialogue device that converses with a user by voice and, more specifically, to a correspondence determination device and the like that determine how the voice dialogue device responds to a user's utterance.

With recent advances in speech recognition technology, voice dialogue systems are used not only in information terminals such as smartphones but also in a wide range of electronic devices. For example, an ELIZA-type dialogue system treats predetermined words and phrases as keywords and stores each keyword in association with its response content, which allows the system to respond to an utterance containing that keyword. In addition, Patent Documents 1 and 2 below, for example, describe dialogue devices that respond according to the state of the conversation.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2011-65582 (published March 31, 2011)
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2001-357053 (published December 26, 2001)

To achieve smooth interaction with a user, the device must understand the user's intention correctly and must carry out a response based on that intention at about the typical pace of human-to-human communication (within several hundred milliseconds).

However, an ELIZA-type dialogue system basically determines its response content without considering the user's intention, so its response often fails to match that intention. Moreover, because an ELIZA-type dialogue system cannot respond to anything other than pre-registered keywords, a large number of keywords must be stored in order to respond to utterances expressed in diverse ways, and this slows the response.

The techniques of Patent Documents 1 and 2, on the other hand, make it possible to respond according to the user's intention, but the processing for determining the response is complicated, and it is difficult to achieve dialogue with comfortable timing unless a CPU (Central Processing Unit) with high processing capability is used.

The present invention has been made in view of the above problems, and its object is to provide a correspondence determination device and the like that can quickly identify, for utterances expressed in diverse ways, a response that matches the speaker's intention.

To solve the above problem, a correspondence determination device according to one aspect of the present invention determines the response that a voice dialogue device, which conducts voice dialogue with a user, makes to the user's utterance. The correspondence determination device includes: an intention searcher acquisition unit that acquires an intention searcher, generated by analyzing the utterance, that indicates the user's intention; and a correspondence descriptor search unit that refers to correspondence descriptor search information, in which the intention searcher is associated with a correspondence descriptor indicating a response of the voice dialogue device, and identifies the correspondence descriptor corresponding to the intention searcher acquired by the intention searcher acquisition unit.

To solve the above problem, a voice dialogue system according to one aspect of the present invention is a system in which a voice dialogue device conducts voice dialogue with a user. The system includes a correspondence determination device that refers to correspondence descriptor search information, in which an intention searcher indicating the user's intention is associated with a correspondence descriptor indicating a response of the voice dialogue device, and identifies the correspondence descriptor corresponding to the intention searcher generated by analyzing the user's utterance. The voice dialogue device executes, in reply to the user's utterance, the response indicated by the correspondence descriptor that the correspondence determination device identified.

To solve the above problem, a control method according to one aspect of the present invention is a method for controlling a correspondence determination device that determines the response that a voice dialogue device, which conducts voice dialogue with a user, makes to the user's utterance. The method includes: an intention searcher acquisition step of acquiring an intention searcher, generated by analyzing the utterance, that indicates the user's intention; and a correspondence descriptor search step of referring to correspondence descriptor search information, in which the intention searcher is associated with a correspondence descriptor indicating a response of the voice dialogue device, and identifying the correspondence descriptor corresponding to the intention searcher acquired in the intention searcher acquisition step.

To solve the above problem, a voice dialogue device according to one aspect of the present invention is a voice dialogue device that conducts voice dialogue with a user. The device includes: a correspondence descriptor acquisition unit that acquires, from an external device, the correspondence descriptor corresponding to the intention searcher generated by analyzing the user's utterance, the descriptor having been identified by referring to correspondence descriptor search information in which an intention searcher indicating the user's intention is associated with a correspondence descriptor indicating a response of the voice dialogue device; and a correspondence control unit that executes the response indicated by the correspondence descriptor acquired by the correspondence descriptor acquisition unit.

According to each of the above aspects of the present invention, a response that matches the speaker's intention can be identified quickly for utterances expressed in diverse ways.

FIG. 1 is a block diagram showing an example of the main configuration of a voice dialogue device according to an embodiment of the present invention.
FIG. 2 is a diagram schematically showing a voice dialogue system according to an embodiment of the present invention.
FIG. 3 is a table showing an example of the processing of the switching unit included in the voice dialogue device.
FIG. 4 is a block diagram showing an example of the main configuration of the information acquisition unit included in the voice dialogue device.
FIG. 5 is a diagram showing an example of the intention table included in the voice dialogue device.
FIG. 6 is a diagram showing an example of the correspondence descriptor search table included in the voice dialogue device.
FIG. 7 is a diagram showing an example of the adjacent pair table included in the voice dialogue device.
FIG. 8 is a sequence diagram showing the flow of intention searcher generation processing by the voice dialogue device.
FIG. 9 is a flowchart showing an example of processing that executes the process corresponding to an intention searcher.
FIG. 10 is a flowchart showing an example of correspondence descriptor acquisition processing.
FIG. 11 is a diagram showing an example of switching between intention dialogue and adjacent pair dialogue.
FIG. 12 is a flowchart showing an example of execution control processing for the response indicated by a correspondence descriptor.
FIG. 13 is a flowchart showing an example of processing that detects the occurrence of the event "user detected at the entrance" and generates an intention searcher.
FIG. 14 is a diagram schematically showing a voice dialogue system according to Embodiment 3.

[Embodiment 1]
An embodiment of the present invention is described below with reference to FIGS. 1 to 12.

(Overview of the voice dialogue device) FIG. 1 is a block diagram showing an example of the main configuration of a voice dialogue device 1 according to an embodiment of the present invention. The voice dialogue device 1 conducts voice dialogue with a user and also serves as a correspondence determination device that determines the response to make to the user's utterance. As illustrated, the voice dialogue device 1 includes a sound collection unit 10, a communication unit 11, an imaging unit 12, a battery 13, a control unit 14, a storage unit 15, a sound wave output unit 16, and a drive unit 17.

The sound collection unit 10 collects the user's voice, converts it into electronic waveform data (voice data), and sends the voice data to the speech recognition unit 20 of the control unit 14. The communication unit 11 allows the voice dialogue device 1 to communicate with external devices. The imaging unit 12 is an imaging device that captures images of the surroundings of the voice dialogue device 1, and the captured image data is sent to the information acquisition unit 21. The battery 13 is a storage battery that supplies power to the voice dialogue device 1, which can operate on the power supplied from the battery 13.

The control unit 14 performs overall control of each part of the voice dialogue device 1; its details are described later. The storage unit 15 is a storage device that stores the various data used by the voice dialogue device 1. Specifically, the storage unit 15 stores an adjacent pair table (link information) 40, an intention table 41, and a correspondence descriptor search table (correspondence descriptor search information) 42. Details of each table are described later with reference to FIGS. 5 to 7.

The sound wave output unit 16 is an output device that outputs sound waves and may be, for example, a speaker. The drive unit 17 is a drive device that drives the voice dialogue device 1 and may include, for example, a stepping motor.

(Main configuration of the control unit 14) Next, the control unit 14 is described in detail. As shown in FIG. 1, the control unit 14 includes a speech recognition unit 20, an information acquisition unit (event detection unit) 21, a switching unit 22, an adjacent pair dialogue unit, an intention dialogue unit, a correspondence control unit (timing control unit) 30, a correspondence sentence output control unit 31, a speech synthesis unit 32, and a correspondence action control unit 33.

The speech recognition unit 20 converts the voice data sent from the sound collection unit 10 into text data and sends the text data to the switching unit 22. An ASR (Automatic Speech Recognition) device, for example, can be used as the speech recognition unit 20.

The information acquisition unit 21 acquires various kinds of information from the communication unit 11, the imaging unit 12, and the battery 13, detects from the acquired information that a predetermined event has occurred, and notifies the switching unit 22 of the occurrence of that event. Details of the information acquisition unit 21 are described in Embodiment 2.

The switching unit 22 switches the dialogue that the voice dialogue device 1 conducts with the user between dialogue using the adjacent pair dialogue unit (hereinafter, adjacent pair dialogue) and dialogue using the intention dialogue unit (hereinafter, intention dialogue). Details of the processing by the switching unit 22 are described later with reference to FIG. 3.

The correspondence control unit 30 controls the response of the voice dialogue device 1 according to notifications from the adjacent pair dialogue unit, the intention dialogue unit, and the switching unit 22. For example, when the information notified by the intention dialogue unit is a correspondence sentence, that is, text to be spoken, the correspondence control unit 30 passes the correspondence sentence to the correspondence sentence output control unit 31. When the notified information is a correspondence action, that is, an action to be taken toward the user, the correspondence control unit 30 passes the correspondence action to the correspondence action control unit 33. When utterances are made in succession, the correspondence control unit 30 stops or cancels the response to the earlier utterance. The correspondence control unit 30 also functions as a timing control unit that controls when the voice dialogue device 1 executes a response.
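The dispatching behavior described above can be illustrated with a minimal Python sketch. The class name, the "sentence" and "action" labels, and the way a pending response is tracked are all assumptions made for illustration; the source does not specify how the correspondence control unit 30 is implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Notification = Tuple[str, str]  # (kind, payload); a hypothetical representation


@dataclass
class CorrespondenceControl:
    """Illustrative stand-in for the correspondence control unit 30."""
    sentences: List[str] = field(default_factory=list)   # passed on to unit 31
    actions: List[str] = field(default_factory=list)     # passed on to unit 33
    cancelled: List[Notification] = field(default_factory=list)
    pending: Optional[Notification] = None

    def notify(self, kind: str, payload: str) -> None:
        if kind not in ("sentence", "action"):
            raise ValueError(f"unknown notification kind: {kind}")
        # A new notification that arrives while an earlier response is still
        # pending cancels that earlier response.
        if self.pending is not None:
            self.cancelled.append(self.pending)
        self.pending = (kind, payload)
        if kind == "sentence":
            self.sentences.append(payload)
        else:
            self.actions.append(payload)

    def finished(self) -> None:
        # Called once the current response has been carried out.
        self.pending = None


control = CorrespondenceControl()
control.notify("sentence", "Good morning.")
control.finished()
control.notify("action", "turn toward the user")
control.notify("sentence", "Sorry, go ahead.")  # cancels the pending action
```

In this sketch cancellation is only recorded in a list; a real device would also have to interrupt speech output or motor movement already under way.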

The correspondence sentence output control unit 31 sends the correspondence sentence notified by the correspondence control unit 30 to the speech synthesis unit 32 to have it converted into voice data, and has the sound wave output unit 16 output the resulting voice data. When voice data for the correspondence sentence is already available, the correspondence sentence output control unit 31 may instead play that voice data with a playback unit (not shown) and output it from the sound wave output unit 16. Such voice data may be stored in the storage unit 15 of the voice dialogue device 1 or acquired from an external device.

As described above, the speech synthesis unit 32 converts the input correspondence sentence (text data) into voice data (for example, PCM: Pulse Code Modulation data). A TTS (Text To Speech) device, for example, can be used as the speech synthesis unit 32.

The correspondence action control unit 33 drives the drive unit 17 according to instructions from the correspondence control unit 30 and causes the voice dialogue device 1 to execute the correspondence action. Depending on the correspondence action, the correspondence action control unit 33 may control components other than the drive unit 17. For example, when the correspondence action includes voice output, it may control the correspondence sentence output control unit 31 to produce the output, or it may control the speech synthesis unit 32 or the sound wave output unit 16 directly, bypassing the correspondence sentence output control unit 31.

(Intention dialogue unit) The intention dialogue unit determines a response that matches the user's intention. As shown in FIG. 1, it includes an intention searcher generation unit (intention searcher acquisition unit) 25, a morphological analysis unit 26, a dependency analysis unit 27, a correspondence descriptor analysis unit 28, and a correspondence descriptor search unit (correspondence descriptor acquisition unit) 29.

The intention searcher generation unit 25 receives, via the switching unit 22, the character string (text data) generated by the speech recognition unit 20 and generates from it an intention searcher indicating the user's intention. Specifically, the intention searcher generation unit 25 outputs the received text data to the morphological analysis unit 26 for morphological analysis and acquires the resulting morphological analysis information. It then outputs this information to the dependency analysis unit 27 for dependency analysis and acquires the resulting segment information and dependency information. Finally, the intention searcher generation unit 25 identifies the user's intention from the intention table 41 and the segment information, identifies the phrase that is the target of the intention from the dependency information, and generates an intention searcher containing information that indicates the intention and its target. The intention searcher generation unit 25 can also acquire an intention searcher from an external device.

The morphological analysis unit 26 breaks the text data input from the intention searcher generation unit 25 into morphemes and assigns a part of speech to each. It then outputs morpheme information, which indicates the morphemes and their parts of speech, to the intention searcher generation unit 25.

The dependency analysis unit 27 analyzes what segments (combinations of a predicate and a sentence-ending expression) are formed by the morphemes indicated by the morpheme information input from the intention searcher generation unit 25, and also analyzes the dependencies between segments. As the result of this analysis, the dependency analysis unit 27 outputs to the intention searcher generation unit 25 segment information, which indicates the segments, and dependency information, which indicates the segments that stand in a dependency relation. An example of the series of processes involving the intention searcher generation unit 25, the morphological analysis unit 26, and the dependency analysis unit 27 is described later with reference to FIG. 8.
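The three-stage flow described above (morphological analysis, dependency analysis, then lookup in the intention table) can be sketched as follows. This is a toy illustration only: the whitespace tokenizer, the "first verb" dependency rule, the table contents, and all function names are assumptions, and a real system for Japanese would rely on a proper morphological analyzer and dependency parser.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class IntentionSearcher:
    intention: str  # the user's intention
    target: str     # the phrase the intention applies to


# Hypothetical intention table: maps a predicate to an intention label.
INTENTION_TABLE = {"want": "desire", "like": "preference"}


def morphological_analysis(text: str) -> List[Tuple[str, str]]:
    # Toy stand-in: split on whitespace and tag known predicates as verbs.
    return [(w, "verb" if w in INTENTION_TABLE else "other")
            for w in text.lower().split()]


def dependency_analysis(morphemes: List[Tuple[str, str]]) -> Tuple[Optional[str], str]:
    # Toy stand-in: the first verb is the predicate, and the words that
    # follow it are treated as the phrase depending on it.
    for i, (word, pos) in enumerate(morphemes):
        if pos == "verb":
            return word, " ".join(w for w, _ in morphemes[i + 1:])
    return None, ""


def generate_intention_searcher(text: str) -> Optional[IntentionSearcher]:
    morphemes = morphological_analysis(text)
    predicate, target = dependency_analysis(morphemes)
    if predicate is None:
        return None  # no intention could be identified
    return IntentionSearcher(INTENTION_TABLE[predicate], target)


searcher = generate_intention_searcher("I want a photo")
```

The point of the sketch is the shape of the result: whatever the wording of the utterance, it is reduced to a small structured key (intention plus target) before any response is looked up.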

The correspondence descriptor search unit 29 refers to the correspondence descriptor search table 42 and searches for and identifies the correspondence descriptor associated with the intention searcher input from the intention searcher generation unit 25. A correspondence descriptor is information indicating the response that the voice dialogue device 1 executes. The correspondence descriptor analysis unit 28 analyzes the correspondence descriptor notified by the correspondence descriptor search unit 29 and outputs the analysis result to the correspondence control unit 30. The detailed processing of the correspondence descriptor analysis unit 28 and the correspondence descriptor search unit 29 is described later with reference to FIG. 9.
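Because the search step is a table lookup keyed by the intention searcher, it can be sketched with a dictionary. The table contents, the key layout, and the "kind:payload" descriptor syntax below are assumptions for illustration; the actual format of the correspondence descriptor search table 42 is shown in FIG. 6, which is not reproduced here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical search table: (intention, target) -> correspondence descriptor.
DESCRIPTOR_TABLE: Dict[Tuple[str, str], str] = {
    ("desire", "a photo"): "action:send_photo",
    ("greeting", ""): "sentence:Hello!",
}


def search_descriptor(intention: str, target: str) -> Optional[str]:
    """Role of the correspondence descriptor search unit 29: a single lookup."""
    return DESCRIPTOR_TABLE.get((intention, target))


def analyze_descriptor(descriptor: str) -> Tuple[str, str]:
    """Role of the correspondence descriptor analysis unit 28: split the
    descriptor into what kind of response it is and its content."""
    kind, _, payload = descriptor.partition(":")
    return kind, payload


descriptor = search_descriptor("desire", "a photo")
if descriptor is not None:
    kind, payload = analyze_descriptor(descriptor)
```

A hash-table lookup takes roughly constant time regardless of how many entries are registered, which is one plausible way to meet the response-time goal stated earlier; the source does not say which data structure is actually used.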

(Adjacent pair dialogue unit) The adjacent pair dialogue unit determines the response to the user in adjacent pair dialogue, that is, dialogue with the user that refers to the adjacent pair table 40. It includes a topic management unit (link response unit) 23 and a topic acquisition unit (link response unit) 24.

The topic management unit 23 determines the response content of the voice dialogue device 1 in adjacent pair dialogue. Specifically, the topic management unit 23 acquires from the topic acquisition unit 24 a correspondence sentence that matches the notification from the switching unit 22 and returns it to the switching unit 22. The voice dialogue device 1 then outputs that correspondence sentence as speech.

The topic acquisition unit 24 acquires from the adjacent pair table 40 a correspondence sentence that matches the request from the topic management unit 23 and returns it to the topic management unit 23. Details of adjacent pair dialogue are described later with reference to FIGS. 7 and 11.

(Overview of the voice dialogue system 100) The voice dialogue device 1 can converse with a user on its own, but its functions can be extended by communicating with various servers. A voice dialogue system 100 that includes the voice dialogue device 1 and these servers is described here with reference to FIG. 2.

FIG. 2 is a diagram schematically showing the voice dialogue system 100. The voice dialogue system 100 includes the voice dialogue device 1, a speech recognition device 2, an intention searcher generation device 3, a correspondence descriptor search device (external device, correspondence determination device) 4, a topic acquisition device 5, a voice data providing device 6, a correspondence action information providing device 7, and an information providing device 8.

Like the speech recognition unit 20 of the voice dialogue device 1, the speech recognition device 2 can convert voice data into text data, and it can also communicate with an external device (here, the voice dialogue device 1). Therefore, when the speech recognition unit 20 of the voice dialogue device 1 fails to recognize speech, it can send the voice data to the speech recognition device 2 via the communication unit 11, have it perform the recognition, and receive the resulting text data.

Like the intention searcher generation unit 25 of the voice dialogue device 1, the intention searcher generation device 3 can generate an intention searcher from text data, and it can also communicate with an external device (here, the voice dialogue device 1). Therefore, when the intention searcher generation unit 25 of the voice dialogue device 1 encounters text data for which it cannot generate an intention searcher, it can send the text data to the intention searcher generation device 3 via the communication unit 11, have it generate the intention searcher, and receive the result.

Like the correspondence descriptor search unit 29 of the voice dialogue device 1, the correspondence descriptor search device 4 can identify the correspondence descriptor associated with an intention searcher, and it can also communicate with an external device (here, the voice dialogue device 1). Therefore, when the correspondence descriptor search unit 29 of the voice dialogue device 1 encounters an intention searcher for which it cannot find a correspondence descriptor, it can send the intention searcher to the correspondence descriptor search device 4 via the communication unit 11, have it find the corresponding descriptor, and receive the result.

Like the topic acquisition unit 24 of the voice dialogue device 1, the topic acquisition device 5 can acquire a correspondence sentence for adjacent pair dialogue, and it can also communicate with an external device (here, the voice dialogue device 1). Therefore, when the topic acquisition unit 24 of the voice dialogue device 1 cannot find the correspondence sentence requested by the topic management unit 23, it can send the request to the topic acquisition device 5 via the communication unit 11, have it find a correspondence sentence that matches the request, and receive the result.
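The speech recognition, intention searcher generation, correspondence descriptor search, and topic acquisition paths above all follow one pattern: try locally first, and ask the external device only when the local attempt fails. A minimal sketch of that pattern follows; the function names and the callable standing in for the network exchange through the communication unit 11 are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve(request: str,
            local_table: Dict[str, str],
            ask_external_device: Callable[[str], Optional[str]]
            ) -> Tuple[Optional[str], str]:
    """Return (result, origin), where origin is "local" or "external"."""
    result = local_table.get(request)
    if result is not None:
        return result, "local"          # no network round trip needed
    # Local lookup failed: delegate to the external device.
    return ask_external_device(request), "external"


# Stand-in for a server such as the correspondence descriptor search device 4.
external_calls = []


def fake_external_device(request: str) -> Optional[str]:
    external_calls.append(request)
    return "sentence:Let me look that up."


local_table = {"greeting": "sentence:Hello!"}
print(resolve("greeting", local_table, fake_external_device))
print(resolve("weather", local_table, fake_external_device))
```

Keeping the common requests on the device avoids network latency for them, while the external device covers the cases the on-device tables do not.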

Like the speech synthesis unit 32 of the voice dialogue device 1, the voice data providing device 6 can convert text data into voice data, and it can also communicate with an external device (here, the voice dialogue device 1). Therefore, instead of having the speech synthesis unit 32 generate voice data, the correspondence sentence output control unit 31 of the voice dialogue device 1 can send the text data to the voice data providing device 6 via the communication unit 11, have it converted into voice data, receive that data, and have the sound wave output unit 16 output it. In this case, the voice dialogue device 1 only needs a playback unit that plays the received voice data (for example, data in WAV format) in place of the speech synthesis unit 32.

The correspondence action information providing device 7 sends information to the voice dialogue device 1 in response to requests from the correspondence action control unit 33 of the voice dialogue device 1. For example, if the correspondence action to be executed is to obtain an image of a solar eclipse, the correspondence action control unit 33 requests the correspondence action information providing device 7 to send such an image. In response, the correspondence action information providing device 7 searches a network such as the Internet for an image of a solar eclipse, obtains it, and sends it to the correspondence action control unit 33, which then sends it to the user.

The information providing device 8 communicates with the information acquisition unit 21 of the voice interaction device 1 and transmits predetermined information concerning a network such as the Internet (network information) to the information acquisition unit 21. Details are given in Embodiment 2; as one example, the information providing device 8 acquires a predetermined web page and, if its content has been updated since the previous acquisition, notifies the information acquisition unit 21 to that effect.

(Details of the switching unit) Next, the switching unit 22 is described in detail with reference to FIG. 3. FIG. 3 is a table showing an example of the processing performed by the switching unit 22. As shown there, the switching unit 22 performs processing that depends on the input source that supplied information to it and on the situation immediately before that input (the previous situation).

Specifically, when a character string (the text data of a speech recognition result) is input from the speech recognition unit 20 and no voice dialogue took place immediately beforehand, the switching unit 22 outputs the input text data to the intention dialogue unit. If, on the other hand, an adjacent pair dialogue took place immediately beforehand, the text data is output to the topic management unit 23.

When the input source is the information acquisition unit 21, the switching unit 22 outputs the input from the information acquisition unit 21 to the intention searcher generation unit 25, regardless of the immediately preceding situation and the content of the input.

When the input source is the topic management unit 23, the immediately preceding situation is not considered, but the processing depends on the content of the input. Specifically, when the topic management unit 23 inputs a notice that the adjacent pair dialogue is to end (END), the switching unit 22 produces no output to other processing units. END is input from the topic management unit 23 when, for example, the dialogue with the user has broken off. Although not shown in FIG. 3, the switching unit 22 switches to the intention dialogue in this case.

On the other hand, when the topic management unit 23 inputs a notice that the text contains characters not covered by the adjacent pair dialogue, the switching unit 22 outputs the input text data (the text data containing those characters) to the intention searcher generation unit 25 of the intention dialogue unit. This happens, for example, when a greeting exchange (such as "Good morning") is followed by an utterance that belongs to a different frame, that is, one outside the scope of the adjacent pair dialogue, such as "Tell me the result of yesterday's baseball game." Although not shown in FIG. 3, the switching unit 22 switches to the intention dialogue in this case as well.

Further, when the switching unit 22 receives from the topic management unit 23 error information indicating that "an adjacent pair dialogue cannot be started", and an utterance was made in an adjacent pair dialogue immediately beforehand, the switching unit 22 instructs the topic management unit 23 of the adjacent pair dialogue unit to make the same utterance again. The topic management unit 23 sends this error information to the switching unit 22 when the user's reply to an utterance made in the adjacent pair dialogue only partially matches an assumed response contained in the adjacent pair table 40.

As a result, if, for example, the voice interaction device 1 says "It's really hot today, isn't it?" (「今日はほんとうに暑いですね」) in an adjacent pair dialogue and the user's subsequent utterance is "That's not ..." (「そんなこと・・・」), the device can be made to say "How are you?" (「元気ですか？」) again. Here, the "..." portion is a part that was not recognized by speech recognition or was not captured by the sound collection unit 10.

Next, when the correspondence control unit 30 inputs an instruction to use an adjacent pair, the switching unit 22 outputs to the topic management unit 23 of the adjacent pair dialogue unit a command to execute an adjacent pair dialogue, regardless of the immediately preceding situation. This processing is performed, for example, when the correspondence descriptor associated with the generated intention searcher contains an adjacent pair ID.

The correspondence control unit 30 may also notify the switching unit 22 when an intention dialogue is performed (that is, when it receives from the correspondence descriptor analysis unit 28 a correspondence descriptor that contains no adjacent pair ID), and the switching unit 22 may store the fact that an intention dialogue took place. In this case, the next time the switching unit 22 receives text data from the speech recognition unit 20, it outputs that text data to the intention searcher generation unit 25 of the intention dialogue unit.
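The routing rules described for the switching unit 22 can be sketched as a small state machine. This is only an illustrative sketch: the class, the enum members, the destination strings, and the `kind` values ("END", "OUT_OF_PAIR", "ERROR", "USE_ADJACENT_PAIR", "INTENTION_DIALOGUE") are hypothetical names introduced here and do not appear in the specification or in FIG. 3.

```python
from enum import Enum, auto


class Source(Enum):
    SPEECH_RECOGNITION = auto()      # speech recognition unit 20
    INFO_ACQUISITION = auto()        # information acquisition unit 21
    TOPIC_MANAGEMENT = auto()        # topic management unit 23
    CORRESPONDENCE_CONTROL = auto()  # correspondence control unit 30


class Mode(Enum):
    NONE = auto()           # no dialogue immediately before
    INTENTION = auto()      # intention dialogue took place
    ADJACENT_PAIR = auto()  # adjacent pair dialogue took place


# Hypothetical destination labels.
TO_INTENTION = "intention_searcher_generation_unit_25"
TO_TOPIC = "topic_management_unit_23"


class SwitchingUnit:
    """Illustrative model of the switching unit 22 (names are assumptions)."""

    def __init__(self):
        self.previous = Mode.NONE

    def route(self, source, payload, kind=None):
        """Return (destination, payload), or None when nothing is forwarded."""
        if source is Source.SPEECH_RECOGNITION:
            # Text goes to the adjacent pair side only if that dialogue
            # was the one held immediately before.
            if self.previous is Mode.ADJACENT_PAIR:
                return (TO_TOPIC, payload)
            return (TO_INTENTION, payload)
        if source is Source.INFO_ACQUISITION:
            # Forwarded regardless of the previous situation.
            return (TO_INTENTION, payload)
        if source is Source.TOPIC_MANAGEMENT:
            if kind == "END":
                self.previous = Mode.INTENTION  # switch to intention dialogue
                return None
            if kind == "OUT_OF_PAIR":
                self.previous = Mode.INTENTION
                return (TO_INTENTION, payload)
            if kind == "ERROR" and self.previous is Mode.ADJACENT_PAIR:
                return (TO_TOPIC, "REPEAT_LAST_UTTERANCE")
            return None
        if source is Source.CORRESPONDENCE_CONTROL:
            if kind == "USE_ADJACENT_PAIR":
                self.previous = Mode.ADJACENT_PAIR
                return (TO_TOPIC, payload)  # payload: adjacent pair ID
            if kind == "INTENTION_DIALOGUE":
                self.previous = Mode.INTENTION
            return None
        raise ValueError(f"unknown source: {source}")
```

Keeping the previous situation as a single stored value mirrors the description that the switching unit 22 remembers which kind of dialogue took place last.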

(Intention table) Next, the intention table 41, which is used to identify the user's intention from text data representing the content of the user's utterance, is described in detail with reference to FIG. 5. FIG. 5 shows an example of the intention table 41.

As shown in FIG. 5, the intention table 41 is information in which a combination of a verb, adjective, adnominal, or noun (including inflected forms for verbs and adjectives) and its ending (including inflected forms for auxiliary verbs) is associated with information indicating an intention. By referring to the intention table 41, an intention can therefore be identified from the combination of a predicate and a sentence-final expression. In the intention table 41, the general (grammatical) meaning of the ending may be used as the intention. For example, if the "verb + ending" combination is 食べ (tabe, continuative form of the verb "eat") + たい (tai, base form of the auxiliary verb), the intention table 41 shown in FIG. 5 identifies the intention as "desire" (希望).
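A lookup of this kind can be sketched as a dictionary keyed by the predicate's part of speech, its inflected form, and the ending. The key layout and the function name are assumptions made for illustration; the two rows are the combinations mentioned in this description, not a reproduction of FIG. 5.

```python
from typing import Optional

# Hypothetical encoding of intention table 41:
# (part of speech of predicate, inflected form, ending) -> intention
INTENTION_TABLE = {
    ("verb", "continuative", "たい"): "希望",           # desire
    ("adjective", "continuative", "た"): "事実、過去",  # fact, past
}


def lookup_intention(pos: str, form: str, ending: str) -> Optional[str]:
    """Return the intention for a predicate/ending pair, or None if absent."""
    return INTENTION_TABLE.get((pos, form, ending))
```

For example, `lookup_intention("verb", "continuative", "たい")` corresponds to the 食べ + たい case described above.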

(Correspondence descriptor search table) Next, the correspondence descriptor search table 42, which is used to identify the correspondence descriptor corresponding to an intention searcher, is described in detail with reference to FIG. 6. FIG. 6 shows an example of the correspondence descriptor search table 42. As shown there, the correspondence descriptor search table 42 is information in which intention searchers are associated with correspondence descriptors. An intention searcher has three elements, namely surface, intention, and target, and a correspondence descriptor has three elements, namely corresponding sentence, corresponding action, and adjacent pair ID.

As shown in #1 of FIG. 6, the intention searcher "surface: 食べる (eat), intention: 希望 (desire)" is associated with a correspondence descriptor whose element is the corresponding sentence "speak: もうちょっと我慢して" ("Hold on a little longer"). Accordingly, when the intention searcher generated from the user's utterance is "surface: 食べる, intention: 希望", the voice interaction device 1 utters the corresponding sentence "もうちょっと我慢して".

As in example #3 of FIG. 6, information indicating an access destination, such as a URL (Uniform Resource Locator), may be written as the corresponding sentence. In this case, accessing the destination indicated by this information makes it possible to have the voice interaction device 1 utter predetermined content. The information stored at the access destination may be a voice data file, as in the illustrated example, or information such as text data representing the utterance content; however, it is preferable to obtain voice data, which has the larger data volume, from the network. This makes it possible to respond to the user with a wide variety of voice data even when the storage capacity of the voice interaction device 1 is relatively small.

The intention searcher of #2 in FIG. 6 contains a "target" element in addition to the surface and the intention. A word or phrase related to the intention of the intention searcher is recorded as this "target", and such an intention searcher is associated with a correspondence descriptor suited to that "target". This realizes a response specialized for the "target".

For example, when the user says "何か食べたいな" ("I want to eat something"), the intention searcher "surface: 食べる, intention: 希望" is generated, so referring to the correspondence descriptor search table 42 of FIG. 6 yields the utterance "もうちょっと我慢して" ("Hold on a little longer"), which is not tied to any particular target. In contrast, when the user says "カレーが食べたい" ("I want to eat curry"), the intention searcher "surface: 食べる, intention: 希望, target: カレー" is generated. Referring to the correspondence descriptor search table 42 of FIG. 6 then yields the utterance "カレーでもナンでも食べればいいじゃない" ("Curry, naan, whatever, why not just eat it?"), which is specialized for curry as the target.

A response in line with the user's intention is possible even without considering the "target". Therefore, even when the generated intention searcher contains a "target", if the correspondence descriptor search table 42 has no entry with that "target" but does have an intention searcher whose "surface" and "intention" match, the correspondence descriptor associated with that intention searcher may be acquired.
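The target-specific lookup with a fallback to a target-free entry can be sketched as follows. The `IntentionSearcher` dataclass, the dictionary layout, and the function name are assumptions for illustration; the two entries reuse the example sentences given in this description and are not a copy of FIG. 6.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IntentionSearcher:
    surface: str
    intention: str
    target: Optional[str] = None


# Hypothetical excerpt of correspondence descriptor search table 42.
TABLE: Dict[IntentionSearcher, dict] = {
    IntentionSearcher("食べる", "希望"): {"speak": "もうちょっと我慢して"},
    IntentionSearcher("食べる", "希望", "カレー"): {
        "speak": "カレーでもナンでも食べればいいじゃない"
    },
}


def find_descriptor(searcher: IntentionSearcher,
                    table: Dict[IntentionSearcher, dict] = TABLE) -> Optional[dict]:
    """Try an exact match first, then retry while ignoring the target."""
    hit = table.get(searcher)
    if hit is None and searcher.target is not None:
        hit = table.get(IntentionSearcher(searcher.surface, searcher.intention))
    return hit
```

Trying the exact key first keeps target-specific responses ahead of generic ones, which matches the order of preference the description implies.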

The correspondence descriptor of #4 in FIG. 6 contains a corresponding action as an element in addition to a corresponding sentence. When such a correspondence descriptor is to be executed, the voice interaction device 1 utters the corresponding sentence and also executes the corresponding action.

A corresponding action need only be described in such a way that a predetermined action is executed, and the manner of description is not particularly limited. For example, the description may include a label name that identifies the corresponding action, the content of the action, and the procedure of the action. In the illustrated example, the corresponding action labeled "greeting" (挨拶) is defined as first rotating the stepping motor by 30 degrees, then waiting in that state for 10 seconds, and finally rotating the stepping motor by 30 degrees in the reverse direction.

This corresponding action assumes that the voice interaction device 1 has a humanoid appearance and that the drive unit 17 is a stepping motor provided at the waist of the voice interaction device 1. That is, when the above corresponding action is executed, the 30-degree rotation of the stepping motor tilts the upper body of the upright voice interaction device 1 forward, and the subsequent 30-degree reverse rotation returns the device to the upright state. To the user, the voice interaction device 1 appears to be bowing.

By referring to such a correspondence descriptor, when the intention searcher is "surface: 帰宅 (returning home), intention: 現在 (present), target: 利用者 (user)", the voice interaction device 1 can be made to say "ご主人様おかえりなさい" ("Welcome home, master") and also to execute the corresponding action (greeting).

Similarly, the correspondence descriptor of #10 also contains a corresponding action (label name: image acquisition). "Image acquisition" differs from the above "greeting" in that the corresponding action includes an utterance and in that the action branches depending on how the processing progresses.

Specifically, in "image acquisition", an image search is first performed using the "target" of the intention searcher ("solar eclipse" (日食) in the example of #10) as the keyword. The image search may be executed by the corresponding action control unit 33, or an external device (for example, the corresponding action information providing device 7 of FIG. 2) may be made to execute it.

If an image can be acquired within a predetermined time (2000 ms in the illustrated example), the voice interaction device 1 is made to say "画像が取得出来ました" ("The image has been acquired"), and the acquired image is sent to the user by e-mail. The destination address may be registered in advance. After that, the stepping motor is rotated by 30 degrees, the device waits for 10 seconds, and the stepping motor is rotated by 30 degrees in the reverse direction, which completes "image acquisition".

On the other hand, if an image cannot be acquired within the predetermined time (2000 ms in the illustrated example), the voice interaction device 1 is made to say "画像が取得出来ませんでした" ("The image could not be acquired"), after which the stepping motor is rotated by 30 degrees, the device waits for 10 seconds, and the stepping motor is rotated by 30 degrees in the reverse direction, which completes "image acquisition".
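The "greeting" and "image acquisition" actions described above can be sketched as step sequences. To stay hardware-independent, this sketch only records the steps in a list instead of driving a motor, speaker, or mailer; the function names, the step strings, and the `search` callback signature are assumptions, and the actual description format of FIG. 6 is not reproduced here.

```python
from typing import Callable, List, Optional

TIMEOUT_S = 2.0  # 2000 ms, the predetermined time in the illustrated example


def bow(steps: List[str]) -> None:
    """Append the three motor steps that make the device appear to bow."""
    steps.append("rotate +30")
    steps.append("wait 10 s")
    steps.append("rotate -30")


def greeting() -> List[str]:
    steps: List[str] = []
    bow(steps)
    return steps


def image_acquisition(target: str,
                      search: Callable[[str, float], Optional[str]]) -> List[str]:
    """search(keyword, timeout_s) is assumed to return an image reference,
    or None when nothing was obtained within the timeout."""
    steps: List[str] = []
    image = search(target, TIMEOUT_S)
    if image is not None:
        steps.append("speak: 画像が取得出来ました")
        steps.append(f"mail: {image}")  # sent to a pre-registered address
    else:
        steps.append("speak: 画像が取得出来ませんでした")
    bow(steps)  # both branches end with the same bow
    return steps
```

Passing the search as a callback reflects the statement that the search may be run either by the corresponding action control unit 33 or by an external device.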

The correspondence descriptor of #9 in FIG. 6 contains an adjacent pair ID as an element. When the correspondence descriptor to be executed contains an adjacent pair ID in this way, the dialogue sentence indicated by that adjacent pair ID is uttered and an adjacent pair dialogue is carried out. Details of the adjacent pair ID and the adjacent pair dialogue are described later.

(Adjacent pair table) Next, the adjacent pair table 40 and the adjacent pair dialogue are described with reference to FIG. 7. FIG. 7 shows an example of the adjacent pair table 40. The illustrated adjacent pair table 40 associates the utterance content of the voice interaction device 1, the assumed responses expected from the user in reply to that utterance, and the utterance content of the voice interaction device 1 in reply to each assumed response (specifically, an adjacent pair ID). Each utterance content of the voice interaction device 1 is assigned a unique adjacent pair ID.

When a correspondence descriptor containing an adjacent pair ID is executed, the adjacent pair table 40 is referred to in order to identify the utterance content for that adjacent pair ID. For example, the correspondence descriptor of #9 in FIG. 6 contains adjacent pair ID = 1, so referring to the adjacent pair table 40 of FIG. 7 determines that "今日はほんとうに暑いですね" ("It's really hot today, isn't it?") is to be uttered.

As described above, in the adjacent pair table 40, assumed responses are associated with each utterance content, and an adjacent pair ID is associated with each assumed response. Therefore, when the user gives an assumed response after an utterance based on the adjacent pair table 40, the next utterance content can be identified quickly by referring to the adjacent pair table 40.

For example, if the voice interaction device 1 is made to say "今日はほんとうに暑いですね" ("It's really hot today, isn't it?") and the user then says "そんなことないぞ" ("That's not true"), the utterance content of adjacent pair ID 2, namely "でも２５度超えてますよ" ("But it's over 25 degrees"), is identified. In this way, by referring to the adjacent pair table 40, the device can respond quickly to the user's utterance as long as that utterance falls within the range of the assumed responses.

The number of assumed responses associated with one utterance content is not particularly limited; it may be one, or it may be three or more. Each assumed response may also include variations in expression. For example, in addition to "そんなことないぞ" ("That's not true"), other negative replies to the utterance "今日はほんとうに暑いですね", such as "そうでもない" ("Not really") and "暑くない" ("It's not hot"), may be included in the assumed response.
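The adjacent pair lookup described above can be sketched as a dictionary from adjacent pair ID to an utterance and its assumed responses. The data layout and function name are assumptions for illustration, only the two utterances quoted in this description are included, and matching is simplified to exact string equality against the listed variations.

```python
from typing import Optional, Tuple

# Hypothetical encoding of adjacent pair table 40:
# adjacent pair ID -> utterance and {variations of an assumed response: next ID}
ADJACENT_PAIR_TABLE = {
    1: {
        "utterance": "今日はほんとうに暑いですね",
        "responses": {("そんなことないぞ", "そうでもない", "暑くない"): 2},
    },
    2: {"utterance": "でも２５度超えてますよ", "responses": {}},
}


def next_utterance(current_id: int, user_text: str) -> Optional[Tuple[int, str]]:
    """Return (next adjacent pair ID, next utterance) if the user's reply is
    one of the assumed responses for the current utterance, else None."""
    for variations, next_id in ADJACENT_PAIR_TABLE[current_id]["responses"].items():
        if user_text in variations:
            return next_id, ADJACENT_PAIR_TABLE[next_id]["utterance"]
    return None
```

A `None` result stands for a reply outside the assumed responses, which in the description is handed back to the switching unit 22.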

(Intention searcher generation processing) FIG. 8 is a sequence diagram showing the flow of the intention searcher generation processing performed by the voice interaction device 1 shown in FIG. 1. As shown in FIG. 8, the speech recognition unit 20 converts the input voice data into text data and outputs the text data to the switching unit 22.

When text data is input from the speech recognition unit 20, the switching unit 22 checks whether a dialogue took place immediately beforehand. If no dialogue took place immediately beforehand, the switching unit 22 outputs the text data input from the speech recognition unit 20 to the intention searcher generation unit 25. Although not shown in FIG. 8, if an adjacent pair dialogue took place immediately beforehand, the switching unit 22 has stored that fact and, based on it, outputs the text data to the adjacent pair dialogue unit (see the column of the table in FIG. 3 whose "input source" is the speech recognition unit).

Next, the intention searcher generation unit 25 outputs the character string of the text data input from the switching unit 22 to the morpheme analysis unit 26. If the input text data consists of a plurality of sentences, the intention searcher generation unit 25 outputs the character string of the last sentence to the morpheme analysis unit 26. For example, if the text data from the switching unit 22 is "いやぁ、さっき起きたばかりだよ。今日、ご飯が食べたい" ("Well, I only just woke up. I want to eat rice today"), it determines that the sentence boundary lies between "だよ。" and "今日" and outputs "今日、ご飯が食べたい".
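Extracting the last sentence can be sketched as below. The specification does not state how the sentence boundary is detected; splitting on sentence-final punctuation is an assumption made here, and the function name is hypothetical.

```python
import re

# Assumed sentence terminators (full-width and half-width).
_TERMINATORS = re.compile(r"[。．！？!?]")


def last_sentence(text: str) -> str:
    """Return the final non-empty sentence of the text, or "" if there is none."""
    parts = [p.strip() for p in _TERMINATORS.split(text) if p.strip()]
    return parts[-1] if parts else ""
```

With the example above, the text is split after "だよ。" and the remaining "今日、ご飯が食べたい" is what would be passed on for morphological analysis.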

The morpheme analysis unit 26 then breaks the character string input from the intention searcher generation unit 25 down into morphemes and tags each with its part of speech. For example, if the input character string is "今日、ご飯が食べたい", it is broken down and tagged as "今日 (noun) / 、 (symbol) / ご飯 (noun) / が (particle) / 食べ (verb) / たい (auxiliary verb)". The morpheme analysis unit 26 outputs morpheme information indicating the resulting morphemes and their parts of speech to the intention searcher generation unit 25, which in turn outputs this morpheme information to the dependency analysis unit 27.

Next, the dependency analysis unit 27 divides the morphemes indicated by the morpheme information input from the intention searcher generation unit 25 into segments. The dependency analysis unit 27 then outputs, to the intention searcher generation unit 25, segment information indicating the combination of the predicate and the sentence-final expression of the segment at the end of the sentence. In the above example, segment information indicating the combination of the predicate "食べ (verb)" and the sentence-final expression "たい (auxiliary verb)" is generated.

When an intention searcher that includes identification of a "target" (see #2 in FIG. 6) is to be generated, the dependency analysis unit 27 also generates dependency information indicating the dependencies between segments. In the above example, dependency information indicating the combination of "ご飯が" and "食べたい" is generated. The intention searcher generation unit 25 then acquires the segment information (the combination of predicate and sentence-final expression) and the dependency information input from the dependency analysis unit 27.

Next, the intention searcher generation unit 25 refers to the intention table 41 and identifies the intention associated with the acquired predicate and sentence-final expression. For example, for the combination of "食べ (verb)" and "たい (auxiliary verb)", the intention table 41 of FIG. 5 identifies the intention as "desire" (希望).

The intention searcher generation unit 25 then generates an intention searcher that contains the surface, which is the predicate converted to its base form (terminal form), and the intention (intention searcher acquisition step). In the above example, an intention searcher whose surface is "食べる" and whose intention is "希望" (desire) is generated. The intention searcher generation unit 25 outputs the generated intention searcher to the correspondence descriptor search unit 29. If dependency information has also been acquired, the intention searcher generation unit 25 may refer to it, identify as the "target" the segment that depends on the segment used to identify the intention (or a noun contained in that segment), and include this target in the intention searcher.
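The generation steps above can be sketched for input that has already been morphologically analyzed. Everything here is a simplified assumption: the `Morpheme` fields, the one-row intention table, and in particular the target heuristic (the nearest preceding noun), which is only a crude stand-in for the dependency analysis performed by the dependency analysis unit 27.

```python
from typing import List, NamedTuple, Optional, Tuple


class Morpheme(NamedTuple):
    text: str  # form as it appears in the utterance
    pos: str   # part of speech
    form: str  # inflected form ("" when not applicable)
    base: str  # base (terminal) form


# Hypothetical one-row excerpt of intention table 41.
INTENTION_TABLE = {("verb", "continuative", "たい"): "希望"}  # desire


def build_searcher(morphemes: List[Morpheme]) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (surface, intention, target) or None if no intention is found.

    Assumes the sentence ends with a predicate followed by one ending.
    """
    if len(morphemes) < 2:
        return None
    predicate, ending = morphemes[-2], morphemes[-1]
    intention = INTENTION_TABLE.get((predicate.pos, predicate.form, ending.base))
    if intention is None:
        return None
    # Crude stand-in for dependency analysis: nearest noun before the predicate.
    target = next(
        (m.text for m in reversed(morphemes[:-2]) if m.pos == "noun"), None
    )
    return (predicate.base, intention, target)
```

The surface is taken from the predicate's base form, so 食べ becomes 食べる, as in the example above.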

In this way, an intention searcher indicating the user's intention is generated from text data representing the content of the user's utterance. For example, if the user says "カレーは辛かった" ("The curry was spicy"), the text data is broken down into morphemes and tagged with parts of speech as {カレー (noun) / は (particle) / 辛かっ (adjective, continuative form) / た (auxiliary verb, base form)}. From the intention table 41 shown in FIG. 5, the combination of an adjective in continuative form and the auxiliary verb "た" in base form (terminal form) is identified as "fact, past", so the intention searcher generated in this case is [辛い | fact, past]. The target may also be included, giving [辛い | fact, past ‖ カレー].

As another example, if the user says "年収は同じだ" ("The annual income is the same"), the utterance is broken down into morphemes and tagged with parts of speech as {年収 (noun) / は (particle) / 同じ (adnominal) / だ (auxiliary verb)}. The intention is then identified from the intention table 41 shown in FIG. 5, and the intention searcher [同じ | fact] is generated. Here too, the target may be included, giving [同じ | fact ‖ 年収].

As a further example, if the user says "それはオッケーだね" ("That's OK, isn't it?"), the utterance is broken down into morphemes and tagged with parts of speech as {それ (pronoun) / は (particle) / オッケー (noun) / だ (auxiliary verb) / ね (particle)}. The intention is then identified from the intention table 41 shown in FIG. 5, and the intention searcher [オッケー | fact, confirmation] is generated. Here too, the target may be included, giving [オッケー | fact, confirmation ‖ それ].

As yet another example, if the user says "街は静かだ" ("The town is quiet"), the utterance is broken down into morphemes and tagged with parts of speech as {街 (noun) / は (particle) / 静か (adjectival verb) / だ (auxiliary verb)}. The intention is then identified from the intention table 41 shown in FIG. 5, and the intention searcher [静か | fact] is generated. Here too, the target may be included, giving [静か | fact ‖ 街].

(Obtaining a correspondence descriptor locally) FIG. 9 is a flowchart showing an example of the processing executed in accordance with an intention searcher. The correspondence descriptor search unit 29 refers to the correspondence descriptor search table 42 (FIG. 6) stored in the storage unit 15 and searches for the correspondence descriptor associated with the intention searcher input from the intention searcher generation unit 25 (S1, correspondence descriptor search step).

If a correspondence descriptor associated with the intention searcher input from the intention searcher generation unit 25 is found (YES in S2), the correspondence descriptor search unit 29 notifies the correspondence descriptor analysis unit 28 of the found correspondence descriptor. If no correspondence descriptor is found (NO in S2), processing to acquire a correspondence descriptor from an external device (specifically, the correspondence descriptor search device 4 of FIG. 2) is executed (S3). Details of S3 are described later with reference to FIG. 10.

Next, the correspondence descriptor analysis unit 28 determines whether the correspondence descriptor notified by the correspondence descriptor search unit 29 contains a "corresponding sentence" (S4). If it determines that no "corresponding sentence" is contained (NO in S4), the processing proceeds to S6. If it determines that a "corresponding sentence" is contained (YES in S4), the correspondence descriptor analysis unit 28 notifies the correspondence control unit 30 of the corresponding sentence. The correspondence control unit 30 then passes the corresponding sentence to the corresponding sentence output control unit 31 and commands it to output the sentence. In accordance with this command, the corresponding sentence output control unit 31 has the voice synthesis unit 32 convert the corresponding sentence into voice data and causes the sound wave output unit 16 to output this voice data (S5).

In S6, the correspondence descriptor analysis unit 28 determines whether the correspondence descriptor contains a "corresponding action". If it determines that no "corresponding action" is contained (NO in S6), the processing proceeds to S8. If it determines that a "corresponding action" is contained (YES in S6), the correspondence descriptor analysis unit 28 notifies the correspondence control unit 30 of the corresponding action. The correspondence control unit 30 then passes the corresponding action to the corresponding action control unit 33 and commands it to execute the action. In accordance with this command, the corresponding action control unit 33 controls the drive unit 17 and causes the voice interaction device 1 to execute the corresponding action (S7).

In S8, the correspondence descriptor analysis unit 28 determines whether the correspondence descriptor contains an "adjacent pair ID". If it determines that no "adjacent pair ID" is contained (NO in S8), the correspondence descriptor analysis unit 28 ends the process. If it determines that an "adjacent pair ID" is contained (YES in S8), the correspondence descriptor analysis unit 28 notifies the correspondence control unit 30 of that adjacent pair ID. The correspondence control unit 30 then passes the adjacent pair ID to the switching unit 22 and instructs it to use the ID. Following this instruction, the switching unit 22 notifies the topic management unit 23 of the adjacent pair ID and has it register the ID (S9). This starts an adjacent pair dialogue. The details of the adjacent pair dialogue are described later with reference to FIG. 11.
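The checks in S4 to S9 can be sketched as a simple dispatch over three optional fields. This is only an illustrative sketch: the class `CorrespondenceDescriptor`, the function `handle_descriptor`, and the three callbacks are hypothetical names, and the sketch assumes that S5 continues to S6 and S7 continues to S8, which the text implies but does not state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CorrespondenceDescriptor:
    # Hypothetical container; each field may be absent (None).
    sentence: Optional[str] = None
    action: Optional[str] = None
    adjacent_pair_id: Optional[int] = None


def handle_descriptor(
    descriptor: CorrespondenceDescriptor,
    speak: Callable[[str], None],
    act: Callable[[str], None],
    register_pair: Callable[[int], None],
) -> List[str]:
    """Run the S4/S6/S8 checks in order and return the steps that fired."""
    steps = []
    if descriptor.sentence is not None:          # S4: sentence present?
        speak(descriptor.sentence)               # S5: output the sentence
        steps.append("S5")
    if descriptor.action is not None:            # S6: action present?
        act(descriptor.action)                   # S7: perform the action
        steps.append("S7")
    if descriptor.adjacent_pair_id is not None:  # S8: adjacent pair ID present?
        register_pair(descriptor.adjacent_pair_id)  # S9: register the ID
        steps.append("S9")
    return steps
```

In the device described here, the callbacks would correspond to the paths through the correspondence control unit 30 to units 31, 33, and 22.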

In the above example, the correspondence descriptor acquisition process (S3) is performed, and a correspondence descriptor is acquired from the correspondence descriptor search device 4, only when no correspondence descriptor is detected (NO in S2). However, the acquisition process (S3) may instead be performed in parallel with the search process of S1. In this case, when an intention searcher is input from the intention searcher generation unit 25, the correspondence descriptor search unit 29 transmits the intention searcher to the correspondence descriptor search device 4 via the communication unit 11 and, at the same time, searches for a correspondence descriptor by referring to the correspondence descriptor search table 42 stored in the storage unit 15. The correspondence descriptor search unit 29 may then use the correspondence descriptor received from the correspondence descriptor search device 4 if one is received, and otherwise use the correspondence descriptor detected in the correspondence descriptor search table 42.

This makes it possible to quickly acquire a correspondence descriptor that is not registered in the correspondence descriptor search table 42. If the wait for a correspondence descriptor from the correspondence descriptor search device 4 becomes too long, the response to the user is delayed. Therefore, if no correspondence descriptor is received within a predetermined time (for example, 800 ms), the correspondence descriptor detected in the correspondence descriptor search table 42 may be used.

For example, consider a case where the correspondence descriptor search table 42 contains an intention searcher whose [predicate | intention] is [eat | wish] but does not contain one whose [predicate | intention ‖ object] is [eat | wish ‖ curry]. If the user says "I want to eat curry" and a correspondence descriptor comes back from the correspondence descriptor search device 4 within the predetermined time, the response based on that descriptor is made (for example, the utterance "Why not just eat curry, or naan, or whatever?"). If no correspondence descriptor comes back from the correspondence descriptor search device 4 within the predetermined time, the response based on the correspondence descriptor search table 42 is made (for example, the utterance "Hold on a little longer").
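The parallel lookup with a timeout described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the device's implementation: the names `lookup_local` and `find_descriptor` are hypothetical, an intention searcher is modelled as a `(predicate, intention, object)` tuple, a descriptor is modelled as a plain string, and the local table is assumed to fall back to the entry without an object, as the curry example suggests.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def lookup_local(table, searcher):
    """Look up the local table; fall back to the entry without an object.

    The fallback is an assumption made for this sketch.
    """
    predicate, intention, obj = searcher
    return table.get((predicate, intention, obj)) or table.get(
        (predicate, intention, None)
    )


def find_descriptor(searcher, table, remote_lookup, timeout_s=0.8):
    """Query the remote search device and the local table in parallel.

    Prefer the remote descriptor if it arrives within timeout_s seconds,
    otherwise use the descriptor found locally.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(remote_lookup, searcher)  # remote request (S3)
    local = lookup_local(table, searcher)          # local search (S1)
    try:
        remote = future.result(timeout=timeout_s)
    except FutureTimeout:
        remote = None                              # too slow: ignore it
    finally:
        pool.shutdown(wait=False)                  # do not block on a slow reply
    return remote if remote is not None else local
```

With a table that holds only `("eat", "wish", None)`, a fast remote reply wins, while a slow or empty one leaves the local "Hold on a little longer" response in place.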

(When there is no correspondence descriptor locally, one is acquired from the server) The correspondence descriptor acquisition process performed in S3 of FIG. 9 is described with reference to FIG. 10. FIG. 10 is a flowchart showing an example of the correspondence descriptor acquisition process. When the correspondence descriptor search unit 29 has failed to detect a correspondence descriptor in the correspondence descriptor search table 42 stored in the storage unit 15, it accesses the correspondence descriptor search device 4 (see FIG. 2) via the communication unit 11 (S20). Specifically, the correspondence descriptor search unit 29 transmits the intention searcher to the correspondence descriptor search device 4 and requests it to report whether a correspondence descriptor corresponding to that intention searcher exists.

The correspondence descriptor search unit 29 then waits for a response from the correspondence descriptor search device 4 (S21). If it receives a response indicating that there is no correspondence descriptor, or if a predetermined time elapses without any response, the correspondence descriptor search unit 29 determines that the correspondence descriptor search device 4 has no correspondence descriptor either (NO in S21). In this case, the correspondence descriptor search unit 29 cancels the response (S22) and ends the correspondence descriptor acquisition process. If it receives a response indicating that there is a correspondence descriptor (YES in S21), the correspondence descriptor search unit 29 requests the correspondence descriptor search device 4 to report whether an adjacent pair exists and waits for the response (S23).

If the correspondence descriptor search unit 29 receives a response indicating that there is no adjacent pair, or if a predetermined time elapses without any response, it acquires the correspondence descriptor from the correspondence descriptor search device 4 (S24) and ends the correspondence descriptor acquisition process. If it receives a response indicating that there is an adjacent pair (YES in S23), the correspondence descriptor search unit 29 acquires the adjacent pair from the correspondence descriptor search device 4 (S25).

The adjacent pair acquired here is the information needed to conduct an adjacent pair dialogue whose content matches the intention indicated by the intention searcher transmitted to the correspondence descriptor search device 4. It only needs to contain at least one correspondence sentence for the adjacent pair dialogue. With only one correspondence sentence, however, the correspondence descriptor search device 4 must be accessed again to determine the correspondence sentence for the user's next utterance. For this reason, it is preferable to transmit, as the adjacent pair, information that contains at least a set consisting of a correspondence sentence, an expected response, and the correspondence sentence for that expected response. If the voice interaction device 1 has enough storage capacity, the entire adjacent pair table shown in FIG. 7 may be transmitted as the adjacent pair.

The correspondence descriptor search unit 29 then sends the acquired adjacent pair to the topic acquisition unit 24 via the correspondence descriptor analysis unit 28, the correspondence control unit 30, the switching unit 22, and the topic management unit 23, has it stored in the storage unit 15 (S26), and thereby ends the correspondence descriptor acquisition process.
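The branching of S20 to S26 can be summarized in a short sketch. Everything named here is hypothetical: `StubSearchDevice` stands in for the correspondence descriptor search device 4, a timeout is modelled simply as a falsy answer, and `save_adjacent_pair` stands in for the path that stores the adjacent pair in the storage unit 15.

```python
class StubSearchDevice:
    """Hypothetical stand-in for the correspondence descriptor search device 4."""

    def __init__(self, descriptors, adjacent_pairs):
        self.descriptors = descriptors        # intention searcher -> descriptor
        self.adjacent_pairs = adjacent_pairs  # intention searcher -> adjacent pair

    def has_descriptor(self, searcher):
        return searcher in self.descriptors

    def has_adjacent_pair(self, searcher):
        return searcher in self.adjacent_pairs


def acquire(device, searcher, save_adjacent_pair):
    """Return (outcome, payload) following the S20 to S26 branches."""
    # S20/S21: ask whether a descriptor exists; "no" or a timeout means NO.
    if not device.has_descriptor(searcher):
        return ("cancelled", None)                           # S22
    # S23: ask whether an adjacent pair exists; "no" or a timeout means NO.
    if not device.has_adjacent_pair(searcher):
        return ("descriptor", device.descriptors[searcher])  # S24
    pair = device.adjacent_pairs[searcher]                   # S25
    save_adjacent_pair(pair)                                 # S26
    return ("adjacent_pair", pair)
```

The three outcomes correspond to cancelling the response, acquiring a plain correspondence descriptor, and acquiring and storing an adjacent pair.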

(Switching between intention dialogue and adjacent pair dialogue) Next, switching between the intention dialogue and the adjacent pair dialogue is described with reference to FIG. 11. FIG. 11 shows an example of the process of switching between the intention dialogue and the adjacent pair dialogue. When the switching unit 22 receives notification of an adjacent pair ID from the correspondence control unit 30, it notifies the topic management unit 23 of the adjacent pair ID and has it register the ID (S9 in FIG. 9), and also switches to the adjacent pair dialogue (S40).

The topic management unit 23 then notifies the topic acquisition unit 24 of the registered adjacent pair ID, and the topic acquisition unit 24 refers to the adjacent pair table 40 to identify the utterance content corresponding to the notified adjacent pair ID (S41). For example, when adjacent pair ID = 1 is notified, referring to the adjacent pair table 40 of FIG. 7 identifies entry #1, "It's really hot today, isn't it?", as the utterance content.

The topic acquisition unit 24 may store the adjacent pair table 40 that contains the adjacent pair ID in a temporary storage unit such as a RAM (Random Access Memory). In that case, while the adjacent pair dialogue using that adjacent pair table 40 continues, the topic management unit 23 can determine the response content quickly by referring to the temporary storage unit, without going through the topic acquisition unit 24.

Next, the topic acquisition unit 24 notifies the topic management unit 23 of the identified utterance content, and the topic management unit 23 passes it to the correspondence control unit 30. The correspondence control unit 30 then passes the utterance content to the correspondence sentence output control unit 31 and instructs it to output the content (S42). As a result, the correspondence sentence output control unit 31 and the voice synthesis unit 32 cause the sound wave output unit 16 to output voice data of the utterance content.

Having switched to the adjacent pair dialogue, the switching unit 22 waits for the user's response, specifically for text data that the voice recognition unit 20 obtains by recognizing the user's response utterance (S43). If, while switched to the adjacent pair dialogue, it determines that the user has responded (that is, text data has been received from the voice recognition unit 20) (YES in S43), the switching unit 22 transfers the received text data to the topic management unit 23.

Next, the topic management unit 23 determines whether the adjacent pair has a correspondence sentence for the text data (S44). Specifically, the topic management unit 23 forwards the text data to the topic acquisition unit 24 and instructs it to identify the correspondence sentence that matches the text data. If the topic acquisition unit 24 reports a correspondence sentence in reply, the topic management unit 23 determines that the adjacent pair has a correspondence sentence; if none is reported, it determines that the adjacent pair has no correspondence sentence. For example, when the adjacent pair table 40 of FIG. 7 is used and the user answers "No, it isn't" to the voice interaction device 1's "It's really hot today, isn't it?", it is determined that a correspondence sentence exists, namely "But it's over 25 degrees" for adjacent pair ID = 2.

When the adjacent pair table 40 is stored in the temporary storage unit, the topic management unit 23 may itself analyze the text data and determine whether the adjacent pair table 40 contains a correspondence sentence that matches the text data.

If it is determined that the adjacent pair has a correspondence sentence (YES in S44), the process returns to S41, and the topic management unit 23 identifies that correspondence sentence as the utterance content for the user. In other words, as long as the content of the user's response is registered in the adjacent pair table 40, the adjacent pair dialogue continues. If it is determined that the adjacent pair has no correspondence sentence (NO in S44), the topic management unit 23 notifies the switching unit 22 to that effect and returns the text data to the switching unit 22.

On receiving this notification, the switching unit 22 switches to the intention dialogue (S45), and the process ends. After switching to the intention dialogue, the switching unit 22 sends the text data to the intention searcher generation unit 25, and the response that matches the user's intention is executed through the processes shown in FIG. 8 and FIG. 9.
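The loop of S40 to S45 can be sketched as below. The layout of the adjacent pair table in FIG. 7 is not reproduced in this section, so the dictionary structure (`sentence`, `expected`), the function name `run_adjacent_pair_dialogue`, and the handling of a missing user reply are all assumptions made for illustration.

```python
# Hypothetical table: adjacent pair ID -> sentence and expected user replies.
ADJACENT_PAIR_TABLE = {
    1: {
        "sentence": "It's really hot today, isn't it?",
        "expected": {"No, it isn't": 2},
    },
    2: {"sentence": "But it's over 25 degrees", "expected": {}},
}


def run_adjacent_pair_dialogue(table, pair_id, user_replies, speak):
    """Continue the adjacent pair dialogue while replies match the table.

    Returns the first reply that has no correspondence sentence, which would
    be handed to the intention dialogue (S45), or None if replies run out.
    """
    replies = iter(user_replies)
    while True:
        entry = table[pair_id]                  # S41: identify utterance content
        speak(entry["sentence"])                # S42: output it
        reply = next(replies, None)             # S43: wait for the user's reply
        if reply is None:
            return None                         # no reply (not covered in the text)
        next_id = entry["expected"].get(reply)  # S44: correspondence sentence?
        if next_id is None:
            return reply                        # S45: back to intention dialogue
        pair_id = next_id                       # YES in S44: return to S41
```

A reply that is not registered for the current entry ends the loop, and its text is what the switching unit 22 would pass on to the intention searcher generation unit 25.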

As described above, switching to the adjacent pair dialogue according to the state of the conversation reduces the amount of information processing compared with treating every exchange as an intention dialogue, which makes well-timed voice interaction possible. This switching is particularly effective in context-based voice interaction, where it is desirable to respond to the user's utterances with good timing so that the user can use the device with confidence. In addition, switching to the intention dialogue according to the state of the conversation makes it possible to handle frame changes in the dialogue.

(Processing of the correspondence control unit 30) The flowchart of FIG. 9 shows an example in which, once a correspondence descriptor is detected, the correspondence sentence is output and the correspondence action is executed immediately. For more natural interaction with the user, however, the execution of these responses may be controlled. This is described with reference to FIG. 12. FIG. 12 is a flowchart showing an example of the process by which the voice interaction device 1 controls execution of the response indicated by a correspondence descriptor. The process of this flowchart is performed after YES is determined in S2 of FIG. 9 and before the process of S4 is executed.

This control has two main features. First, if the user makes a new utterance before the voice interaction device 1 executes the response to the user's previous utterance, that response is suspended or cancelled. Second, when a pause occurs in the user's speech, execution of the response indicated by the correspondence descriptor is put on hold.

To enable the second feature, the intention searcher generation unit 25 generates an intention searcher whose predicate and intention elements are empty (hereinafter simply called an empty intention searcher) when a "pause" occurs in the user's speech. For example, suppose the user speaks with a "pause" (...), as in "Wow, it's really cold... hmm, I want to eat something warm". In this case, a time interval occurs between reception of the text data for "Wow, it's really cold" and reception of the text data for "hmm, I want to eat something warm". The intention searcher generation unit 25 therefore generates an empty intention searcher when it determines that, after text data was received, a period in which no further text data is received has continued for a predetermined time or longer.

Alternatively, when the next text data is received after earlier text data, the intention searcher generation unit 25 may generate an empty intention searcher if the interval between the two reception timings is a predetermined time or longer. An empty intention searcher only needs to be generated when, in the dialogue between the voice interaction device 1 and the user, the input does not contain what the user wants to say (the intention). Its generation is not limited to detection of a period without user speech and may be triggered by other events. For example, an empty intention searcher may also be generated when an interjection such as "wow" or "hmm" is uttered, or when an utterance is too unclear to be recognized.
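The variant in which the gap is checked when the next text data arrives can be sketched as follows. This is an illustrative sketch only: the function name `searchers_with_pauses`, the dictionary form of an intention searcher, and the caller-supplied `analyze` function are assumptions, and the sketch does not cover the interjection or unclear-speech triggers.

```python
def searchers_with_pauses(events, analyze, gap_s):
    """Turn timed recognition results into a stream of intention searchers.

    events:  list of (timestamp_in_seconds, text) in arrival order.
    analyze: function mapping text to an intention searcher (assumed to exist).
    gap_s:   the predetermined time; a gap this long or longer between two
             texts inserts an empty intention searcher between them.
    """
    result = []
    previous_time = None
    for timestamp, text in events:
        if previous_time is not None and timestamp - previous_time >= gap_s:
            # Empty intention searcher: predicate and intention are both empty.
            result.append({"predicate": None, "intention": None})
        result.append(analyze(text))
        previous_time = timestamp
    return result
```

For the example utterance, the two recognized phrases would be separated by one empty intention searcher whenever the pause between them reaches `gap_s`.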

An empty intention searcher is sent to the correspondence descriptor search unit 29 in the same way as a normal intention searcher. When the correspondence descriptor search unit 29 receives an empty intention searcher, it generates a correspondence descriptor whose response content element is empty (hereinafter simply called an empty correspondence descriptor) and sends it to the correspondence control unit 30 via the correspondence descriptor analysis unit 28.

Because a correspondence descriptor can thus be either an empty correspondence descriptor or a normal correspondence descriptor (one that contains a response content element), the correspondence control unit 30 first determines whether the correspondence descriptor received from the correspondence descriptor analysis unit 28 is empty (S60). If it determines that the descriptor is empty (YES in S60), the correspondence control unit 30 determines whether there is a correspondence descriptor that was previously scheduled for execution (S61).

If there is a previously scheduled correspondence descriptor (YES in S61), the correspondence control unit 30 delays its execution timing (S62). For example, when the execution timing of a correspondence descriptor is managed by a timer, the correspondence control unit 30 may extend the timer's timeout (the execution timing of the correspondence descriptor) by a predetermined time (for example, 500 ms). The process then proceeds to S4 of FIG. 9, and the response indicated by the previously scheduled correspondence descriptor is executed at the delayed timing.

When the correspondence control unit 30 determines in S60 that the correspondence descriptor is not empty (NO in S60), it likewise determines whether there is a previously scheduled correspondence descriptor (S63). If it determines that there is none (NO in S63), the correspondence control unit 30 proceeds to S65. If it determines that there is one (YES in S63), the correspondence control unit 30 sends an instruction to cancel execution of the previous correspondence descriptor to the correspondence sentence output control unit 31 and the correspondence action control unit 33 (S64), and then proceeds to S65.

In S65, the correspondence control unit 30 schedules for execution the correspondence descriptor that was determined in S60 not to be empty. The correspondence control unit 30 also delays the execution timing of the correspondence descriptor scheduled in S65 (S66). For example, the correspondence control unit 30 may set the timer's timeout (the execution timing of the correspondence descriptor) to a predetermined time (for example, 500 ms). The correspondence control unit 30 then waits for the execution timing of the response indicated by the correspondence descriptor scheduled in S65 (S67).

When it subsequently determines that the execution timing has arrived (YES in S67), the correspondence control unit 30 checks whether a cancel instruction has been issued for the response whose execution timing has arrived (S68). If a cancel instruction has been issued (YES in S68), the correspondence control unit 30 cancels execution of that response (S69) and ends the process. If no cancel instruction has been issued (NO in S68), the process proceeds to S4 of FIG. 9 and the response is executed.

For example, when the user says "Wow, it's really cold... hmm, I want to eat something warm", a correspondence descriptor corresponding to "Wow, it's really cold" is generated first. Next, an empty correspondence descriptor corresponding to "..." is generated, and then a correspondence descriptor corresponding to "hmm, I want to eat something warm" is generated.

Accordingly, after the correspondence descriptor corresponding to "Wow, it's really cold" is scheduled for execution (S65), its execution timing is delayed by the empty correspondence descriptor (S62). If the correspondence descriptor corresponding to "hmm, I want to eat something warm" is acquired before the delayed execution timing, the response for the correspondence descriptor corresponding to "Wow, it's really cold" is cancelled (S64). The response for the correspondence descriptor corresponding to "hmm, I want to eat something warm" is then scheduled for execution (S65).
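The scheduling behaviour of S60 to S69 can be sketched with a small class that uses explicit millisecond timestamps instead of a real timer. This is a simplified sketch, not the flowchart itself: the class name `ResponseScheduler`, its methods, and the use of `None` to represent an empty correspondence descriptor are assumptions, and cancellation is modelled by replacing the pending descriptor rather than by sending cancel instructions to units 31 and 33.

```python
class ResponseScheduler:
    """Delay-and-cancel control of pending responses (hypothetical sketch)."""

    def __init__(self, delay_ms=500):
        self.delay_ms = delay_ms
        self.pending = None      # descriptor currently scheduled for execution
        self.deadline_ms = None  # its execution timing
        self.cancelled = []      # descriptors whose responses were cancelled

    def on_descriptor(self, descriptor, now_ms):
        """Handle a new descriptor; None stands for an empty descriptor."""
        if descriptor is None:                        # S60: empty descriptor
            if self.pending is not None:              # S61: something scheduled?
                self.deadline_ms += self.delay_ms     # S62: delay its execution
            return
        if self.pending is not None:                  # S63: something scheduled?
            self.cancelled.append(self.pending)       # S64: cancel the earlier one
        self.pending = descriptor                     # S65: schedule the new one
        self.deadline_ms = now_ms + self.delay_ms     # S66: set its timing

    def poll(self, now_ms):
        """Return the descriptor to execute now, if its timing has arrived."""
        if self.pending is not None and now_ms >= self.deadline_ms:  # S67
            due, self.pending, self.deadline_ms = self.pending, None, None
            return due
        return None
```

Feeding it the "cold" descriptor at 0 ms, an empty descriptor at 400 ms, and the "warm food" descriptor at 800 ms leaves only the last response to be executed, which matches the worked example above.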

By delaying execution of the response to match the "pauses" in the user's speech in this way, the voice interaction device 1 can be made to respond at a natural timing without interfering with the user's speech. Moreover, because only the last of the user's consecutive utterances receives a response, the device is prevented from interrupting the user partway through.

Among the user's multiple utterances, the response to an earlier utterance may be executed instead, or whether to execute a response may be decided according to its content. For example, a correspondence descriptor that contains an adjacent pair ID may be given priority. When multiple utterances are made in succession, the device may also utter a message that prompts the user to speak again, such as "Please speak a little more slowly", or a message that conveys that responding is difficult, such as "I can't answer if you keep on talking".

In the above example, the execution timing of the response to the user's utterance is delayed by generating an empty intention searcher when a "pause" occurs in the user's speech, but the invention is not limited to this example. For instance, an empty correspondence descriptor may be generated without generating an empty intention searcher, or the execution timing of the response may be controlled without generating either. When an empty correspondence descriptor is generated without an empty intention searcher, the correspondence descriptor search unit 29 may generate the empty correspondence descriptor if the period from receiving one intention searcher from the intention searcher generation unit 25 to receiving the next is a predetermined time or longer. When neither is generated, the correspondence control unit 30 may stop execution of the earlier instruction if the period from receiving an execution instruction for a response from the correspondence descriptor analysis unit 28 to receiving the next instruction is a predetermined time or longer.

[Embodiment 2]
Another embodiment of the present invention is described below with reference to FIG. 4 and FIG. 13. For convenience of explanation, members that have the same functions as members described in the preceding embodiment are given the same reference numerals, and their description is omitted.

Embodiment 2 describes an example in which the voice interaction device 1 actively responds to the user when the occurrence of a predetermined event other than the user's utterance is detected. The predetermined event is not particularly limited as long as it relates to the user. The examples described here are the event that the remaining battery level of the voice interaction device 1 is low, the event that a predetermined web page has been updated, and the event that the user has been detected at the entrance.

(Main configuration of the information acquisition unit 21) FIG. 4 is a block diagram showing an example of the main configuration of the information acquisition unit 21 included in the voice interaction device 1 shown in FIG. 1. As shown in FIG. 4, the information acquisition unit 21 includes a person image determination unit 50, a user determination unit 51, a position information acquisition unit 52, an external information multiplexer 53, a net information acquisition unit 54, a net information multiplexer 55, a remaining amount detection unit 56, and an internal information multiplexer 57.

The person image determination unit 50 determines that a person appears in an image captured by the imaging unit 12. The user determination unit 51 determines that the person identified by the person image determination unit 50 is a predetermined user. The position information acquisition unit 52 acquires information indicating the position of the person identified by the person image determination unit 50. Based on the information notified from the user determination unit 51 and the position information acquisition unit 52, the external information multiplexer 53 detects that the event of the user being detected at the entrance (a predetermined event) has occurred, and notifies the switching unit 22 to that effect.

The net information acquisition unit 54 acquires a predetermined web page via the communication unit 11. The web page to be acquired is one whose update status the user wants to track, such as a web page the user has registered in advance. When the web page acquired by the net information acquisition unit 54 has changed since it was last acquired, the net information multiplexer 55 determines that the event of the web page being updated (a predetermined event) has occurred and notifies the switching unit 22 accordingly.
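
The update check performed by the net information multiplexer 55 can be sketched as follows. This is only an illustrative sketch: the source says that the page is compared with the previously acquired one but does not say how, so comparing SHA-256 digests is an assumption, and the class name `PageUpdateDetector` is hypothetical. Fetching the page is left to the caller.

```python
import hashlib


class PageUpdateDetector:
    """Reports an update when fetched content differs from the previous fetch.

    Hypothetical helper; digest comparison is an assumed comparison method.
    """

    def __init__(self):
        self._last_digest = {}  # url -> digest of the content seen last time

    def check(self, url: str, content: bytes) -> bool:
        digest = hashlib.sha256(content).hexdigest()
        previous = self._last_digest.get(url)
        self._last_digest[url] = digest
        # The first fetch has nothing to compare against, so it is not an update.
        return previous is not None and previous != digest
```

A caller would invoke `check` each time the page is acquired and notify the switching unit 22 only when it returns `True`.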

The remaining amount detection unit 56 detects the remaining level of the battery 13 and reports it to the internal information multiplexer 57. If the reported level is at or below a predetermined value, the internal information multiplexer 57 starts measuring time with a timer (not shown; it may be one that outputs clock information from a crystal oscillator) and uses that timer to detect the event (a predetermined event) that the remaining level has stayed at or below the predetermined value for a predetermined time or longer. When it detects this event, it notifies the switching unit 22 accordingly.
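
The timer-based check in the internal information multiplexer 57 might look like the sketch below. The class name, the parameter names, and the choice to reset the timer whenever the level rises above the threshold are assumptions made for illustration; the source specifies only that the low state must continue for a predetermined time. Timestamps are passed in explicitly so that the logic does not depend on a particular clock.

```python
class LowBatteryMonitor:
    """Detects a battery level that stays at or below a threshold for a hold time."""

    def __init__(self, threshold: float, hold_seconds: float):
        self.threshold = threshold
        self.hold_seconds = hold_seconds
        self._low_since = None  # time at which the level first became low

    def update(self, level: float, now: float) -> bool:
        if level > self.threshold:
            # Level recovered: discard the running timer (assumed behavior).
            self._low_since = None
            return False
        if self._low_since is None:
            self._low_since = now  # start timing the low state
        return now - self._low_since >= self.hold_seconds
```

As written, `update` keeps returning `True` for as long as the low state persists; whether the device should notify once or repeatedly is not stated in the source.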

(Example of multiplexer processing) The process of detecting that the user has been detected at the entrance and generating an intention searcher is described below with reference to FIG. 13. FIG. 13 is a flowchart showing an example of this process.

When the person image determination unit 50 determines that a captured image acquired from the imaging unit 12 contains a person (S80), it sends the captured image to the user determination unit 51. The user determination unit 51 determines whether the person in the received image is a predetermined user (S81) and, if so (YES in S81), notifies the external information multiplexer 53 accordingly.

On receiving this notification, the external information multiplexer 53 determines whether the position acquired by the position information acquisition unit 52 is a predetermined position (the entrance in this example) (S82). If it is (YES in S82), the external information multiplexer 53 notifies the switching unit 22 that the event of detecting the user at the entrance has occurred. If the person is determined not to be the predetermined user (NO in S81), or the position is determined not to be the predetermined position (NO in S82), the process ends without proceeding to S83.

Next, the switching unit 22, having received this notification, notifies the intention searcher generation unit 25 that the event has occurred. The intention searcher generation unit 25 then generates the predetermined intention searcher corresponding to the event (S83) and sends it to the correspondence descriptor search unit 29. Specifically, it generates and sends an intention searcher whose surface form is "帰宅" (returning home), whose intention is "現在" (present), and whose target is "利用者" (user). As a result, the corresponding sentence "ご主人様おかえりなさい" ("Welcome home, Master") and the corresponding action "挨拶" (greeting) are identified from the correspondence descriptor search table 42 of FIG. 6, and the voice dialogue device 1 performs the greeting action while saying "Welcome home, Master" aloud.

Intention searchers are generated in the same way when other events are detected. For example, when notified that a web page has been updated, the intention searcher generation unit 25 generates an intention searcher whose surface form is "変わった" (changed), whose intention is "事実" (fact), and whose target is "ホームページ" (home page). When notified of the event that the remaining level of the battery 13 has stayed at or below the predetermined value for the predetermined time or longer, the intention searcher generation unit 25 generates an intention searcher whose surface form is "なくなる" (run out), whose intention is "事実、未来" (fact, future), and whose target is "電池" (battery).
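
The mapping from detected events to predetermined intention searchers, followed by the table lookup, can be sketched as below. The three searchers and the one table entry use the values given in the text; the event keys, the `IntentionSearcher` type, and the dictionary layout are hypothetical and do not reproduce the actual structure of the correspondence descriptor search table 42.

```python
from typing import NamedTuple, Optional


class IntentionSearcher(NamedTuple):
    surface: str    # surface form (表層)
    intention: str  # intention (意図)
    target: str     # target (対象)


# Event keys are illustrative names, not taken from the source.
EVENT_SEARCHERS = {
    "user_at_entrance": IntentionSearcher("帰宅", "現在", "利用者"),
    "web_page_updated": IntentionSearcher("変わった", "事実", "ホームページ"),
    "battery_low": IntentionSearcher("なくなる", "事実、未来", "電池"),
}

# Only the entry described in the text is filled in here.
DESCRIPTOR_TABLE = {
    IntentionSearcher("帰宅", "現在", "利用者"): {
        "sentence": "ご主人様おかえりなさい",
        "action": "挨拶",
    },
}


def respond_to_event(event: str) -> Optional[dict]:
    """Return the descriptor for an event, or None if nothing is registered."""
    searcher = EVENT_SEARCHERS.get(event)
    if searcher is None:
        return None
    return DESCRIPTOR_TABLE.get(searcher)
```

Because an event yields the same kind of searcher as an analyzed utterance, the lookup path after generation is shared between spontaneous and reactive responses.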

Of course, the events for which intention searchers are generated are not limited to the above examples; any event that can prompt the device to speak to the user may be used. For example, a predetermined event detected from an image of something other than the user captured by the imaging unit 12, or from a sound other than the user's utterance picked up by the sound collection unit 10, may be set. A predetermined event detected from information obtained from the Internet, radio, television, or the like may also be set. If the voice dialogue device 1 or another device has a sensor (an acceleration sensor, angular acceleration sensor, temperature and humidity sensor, or the like), a predetermined event detected from information obtained from that sensor may be set. Furthermore, several kinds of such information may be acquired, and the occurrence of a predetermined event may be detected from a value calculated by performing a predetermined operation on them. This enables the voice dialogue device 1 to speak or act spontaneously according to the user's own situation, the user's surroundings, information the user is interested in, and so on.

[Embodiment 3]
Another embodiment of the present invention is described below with reference to FIG. 14. For convenience of explanation, members having the same functions as those described in the preceding embodiments are given the same reference numerals, and their description is omitted. This embodiment describes an example in which functions equivalent to those of the voice dialogue device 1 of the preceding embodiments are realized using a correspondence determination device 80 on a network.

(Overview of the voice dialogue system 200) FIG. 14 schematically shows the voice dialogue system 200 according to this embodiment. The voice dialogue system 200 includes a voice dialogue device (for example, a robot) 70 and a correspondence determination device 80. The voice dialogue device 70 differs from the voice dialogue device 1 in that it does not include the voice recognition unit 20, the intention searcher generation unit 25, or the correspondence descriptor search unit 29, and in the processing performed by the switching unit 22.

The correspondence determination device 80 determines the response to a user's utterance. It includes a control unit 81 that centrally controls each part of the correspondence determination device 80, and a communication unit 82 through which the correspondence determination device 80 communicates with an external device (here, the voice dialogue device 70). The control unit 81 includes the voice recognition unit 20, the intention searcher generation unit 25, the correspondence descriptor search unit 29, the topic acquisition unit 24, and a response control unit 83. Although omitted from FIG. 14, a morphological analysis unit 26 and a dependency analysis unit 27 are connected to the intention searcher generation unit 25, and a storage unit (not shown) stores the adjacent pair table 40, the intention table 41, and the correspondence descriptor search table 42.

The response control unit 83 receives voice data of the user's utterance and determines the response to that utterance. Its processing is described in detail below.

(Processing of the response control unit 83) As shown in FIG. 14, the voice dialogue device 70 transmits voice data containing the user's utterance, and the response control unit 83 receives it via the communication unit 82. The response control unit 83 then sends the received voice data to the voice recognition unit 20 for recognition and obtains the recognition result as text data. Next, it sends the text data to the intention searcher generation unit 25, has an intention searcher generated from the text data, and obtains the generated intention searcher. Finally, it sends the intention searcher to the correspondence descriptor search unit 29, has the correspondence descriptor corresponding to the intention searcher identified, and obtains the identified correspondence descriptor.
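
The three-stage flow coordinated by the response control unit 83 can be summarized in a short sketch. The function name `decide_response` and the idea of passing each stage in as a callable are assumptions made so that the example is self-contained; the stages stand in for the voice recognition unit 20, the intention searcher generation unit 25, and the correspondence descriptor search unit 29, whose internals are not modeled here.

```python
from typing import Callable, Optional


def decide_response(
    audio: bytes,
    recognize: Callable[[bytes], str],
    make_searcher: Callable[[str], tuple],
    find_descriptor: Callable[[tuple], Optional[dict]],
) -> Optional[dict]:
    """Run recognition, searcher generation, and descriptor lookup in order."""
    text = recognize(audio)            # voice data -> text data
    searcher = make_searcher(text)     # text data -> intention searcher
    return find_descriptor(searcher)   # intention searcher -> descriptor
```

Keeping the stages as separate callables mirrors the way the patent lets individual functions be placed on either the server or the device.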

If the obtained correspondence descriptor contains an adjacent pair ID, the response control unit 83 sends that adjacent pair ID to the topic acquisition unit 24, has the adjacent pair corresponding to the ID identified, and obtains the identified adjacent pair. The adjacent pair obtained need only contain at least the corresponding sentence for that adjacent pair ID, but it is preferable to obtain the entire adjacent pair table 40, including every item of information linked to that adjacent pair ID.

The response control unit 83 then sends the obtained correspondence descriptor or adjacent pair to the voice dialogue device 70 via the communication unit 82. When it has sent an adjacent pair, the response control unit 83 should preferably remember that it did so, along with the contents of the adjacent pair sent, so that the corresponding sentence of the adjacent pair can be returned promptly in reply to the user's next utterance.

When the user subsequently speaks again, voice data is received from the voice dialogue device 70 as before, and the response control unit 83 sends this voice data to the voice recognition unit 20 to obtain text data. If the response control unit 83 has no record of having sent an adjacent pair, processing continues as described above. If it does have such a record, it sends the text data to the topic acquisition unit 24 to check whether there is a corresponding sentence for that text data.

If a corresponding sentence is found, the response control unit 83 sends the adjacent pair ID of that corresponding sentence to the voice dialogue device 70. If the adjacent pair sent earlier does not contain the corresponding sentence for that adjacent pair ID, the corresponding sentence is sent as well. For example, if the data of #1 to #3 in the adjacent pair table of FIG. 7 have already been sent as the adjacent pair and the device is to utter ID=4 or ID=5, the data of #4 or #5 is also sent. Of course, the data of both #4 and #5 may be sent.

If no corresponding sentence is found, the response control unit 83 sends the text data to the intention searcher generation unit 25. From this point, as described above, an intention searcher is generated and an intention-based dialogue takes place.
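
The branch taken on a follow-up utterance can be sketched as follows. All names are hypothetical, and matching the recognized text against expected replies by exact string equality is an assumption; the source does not describe how the topic acquisition unit 24 matches text to a corresponding sentence.

```python
class ResponseController:
    """Chooses between the adjacent-pair path and the intention-searcher path."""

    def __init__(self, expected_replies, intent_fallback):
        self.expected_replies = expected_replies  # reply text -> adjacent pair ID
        self.intent_fallback = intent_fallback    # text -> correspondence descriptor
        self.pair_sent = False                    # record that a pair was sent

    def mark_pair_sent(self):
        self.pair_sent = True

    def handle_text(self, text):
        if self.pair_sent:
            pair_id = self.expected_replies.get(text)
            if pair_id is not None:
                # A corresponding sentence exists: reply with its ID only.
                return ("adjacent_pair_id", pair_id)
        # No record of a sent pair, or no match: use the intention searcher path.
        return ("descriptor", self.intent_fallback(text))
```

The lookup is tried first because it avoids generating an intention searcher at all, which is the point of keeping a record of the adjacent pair that was sent.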

(Processing of the switching unit 22) Next, the switching unit 22 of the voice dialogue device 70 is described. As noted above, the voice dialogue device 70 does not perform voice recognition itself. Therefore, when the voice dialogue device 70 acquires voice data through the sound collection unit 10, it sends the voice data to the correspondence determination device 80 and receives a correspondence descriptor or an adjacent pair in reply.

When a correspondence descriptor is received, the switching unit 22 sends it to the correspondence descriptor analysis unit 28 (see FIG. 1). In other words, the switching unit 22 of this embodiment functions as a correspondence descriptor acquisition unit that acquires, from an external device (the correspondence determination device 80), the correspondence descriptor corresponding to the intention searcher generated by analyzing the user's utterance. Thereafter, as described in Embodiment 1, the correspondence control unit 30 executes the response indicated by the correspondence descriptor.

When an adjacent pair is received, the switching unit 22 sends it to the topic management unit 23 (see FIG. 1). Thereafter, as described in Embodiment 1, a response is made according to the adjacent pair. If only an adjacent pair ID was received, the corresponding sentence is identified from the adjacent pair table 40 stored in the storage unit 15 of the voice dialogue device 70. If an adjacent pair table (all or part of it) was received, the corresponding sentence is identified using the received table.
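
On the device side, the routing performed by the switching unit 22 can be sketched as below. The message layout (a `kind` field plus a payload) is invented for illustration, since the source does not define a wire format; the two handler callables stand in for the correspondence descriptor analysis unit 28 and the topic management unit 23.

```python
def dispatch(message, local_table, to_descriptor_analysis, to_topic_management):
    """Route a server reply to the matching unit on the device."""
    kind = message["kind"]
    if kind == "descriptor":
        to_descriptor_analysis(message["descriptor"])
    elif kind == "adjacent_pair":
        # Use the received table if one was sent; otherwise use the local copy.
        table = message.get("table", local_table)
        pair_id = message["pair_id"]
        to_topic_management(pair_id, table[pair_id])
    else:
        raise ValueError(f"unknown message kind: {kind}")
```

For example, a reply carrying only `pair_id` resolves the sentence from the device's stored adjacent pair table 40, while a reply that also carries `table` resolves it from the received data.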

(Variations of the system configuration) The above describes the voice dialogue system 200, in which some functions of the voice dialogue device 1 are given to a server (the correspondence determination device 80), but the functions given to the server are not limited to this example. For example, a voice dialogue system in which the voice recognition unit 20 remains in the voice dialogue device and the function of the intention searcher generation unit 25 is given to the server also falls within the scope of the present invention. In this configuration, the voice dialogue device acquires the intention searcher from the server instead of generating it, so it need only include an intention searcher acquisition unit in place of the intention searcher generation unit 25.

The functions of the correspondence control unit 30, the corresponding sentence output control unit 31, and the corresponding action control unit 33 may also be given to the server. Alternatively, the functions of the voice recognition unit 20 and the correspondence descriptor search unit 29 may be given to the server while the functions of the intention searcher generation unit 25, the correspondence control unit 30, the corresponding sentence output control unit 31, and the corresponding action control unit 33 remain in the voice dialogue device. In this way, a voice dialogue system in which the functions are divided as appropriate between the server and the voice dialogue device can still realize the same functions as the voice dialogue device 1. A separate server may be provided for each function, or several functions may be placed on one server.

[Example of software implementation]
The control blocks of the voice dialogue device 1 and the correspondence determination device 80 (in particular, the blocks of the control units 14 and 81) may be realized by logic circuits (hardware) formed on an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).

In the latter case, the voice dialogue device 1 and the correspondence determination device 80 each include a CPU that executes the instructions of a program, which is software realizing each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or CPU); and a RAM (Random Access Memory) into which the program is loaded. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. A "non-transitory tangible medium" such as a tape, disk, card, semiconductor memory, or programmable logic circuit can be used as the recording medium. The program may also be supplied to the computer via any transmission medium capable of transmitting it (a communication network, broadcast waves, or the like). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

[Summary]
The correspondence determination device (voice dialogue device 1, correspondence determination device 80) according to Aspect 1 of the present invention is a correspondence determination device that determines the response that a voice dialogue device (1), which conducts a voice dialogue with a user, makes to an utterance of the user. It includes: an intention searcher acquisition unit (intention searcher generation unit 25) that acquires an intention searcher, generated by analyzing the utterance, indicating the user's intention; and a correspondence descriptor search unit (correspondence descriptor search unit 29) that refers to correspondence descriptor search information (correspondence descriptor search table 42), in which intention searchers are associated with correspondence descriptors indicating responses of the voice dialogue device, and identifies the correspondence descriptor corresponding to the intention searcher acquired by the intention searcher acquisition unit.

With this configuration, an intention searcher indicating the user's intention is acquired and the correspondence descriptor corresponding to it is identified, so the voice dialogue device can be made to respond in accordance with the user's intention. Moreover, because the intention searcher is a search key that represents the user's intention, it covers the user's varied expressions more easily than using the user's spoken wording itself as the search key. For example, as long as the user's intention is the same, utterances can be represented by a single intention searcher even if their wording changes through honorifics, dialect, and the like. The amount of processing needed to search for a correspondence descriptor can therefore be reduced compared with using a database that associates the wording of user utterances with responses.

Accordingly, with this configuration, the response that matches the intention behind utterances in varied expressions can be identified quickly. The intention searcher may be generated by the correspondence determination device or acquired from an external device. The response indicated by the correspondence descriptor may be anything that the voice dialogue device performs for the user; it may be an utterance directed at the user or some other action.

In the correspondence determination device according to Aspect 2 of the present invention, in Aspect 1, the intention searchers in the correspondence descriptor search information may include a word or phrase (target) related to the intention that the intention searcher indicates, and when the intention searcher acquired by the intention searcher acquisition unit contains a word or phrase extracted from the user's utterance, the correspondence descriptor search unit may identify the correspondence descriptor associated with the intention searcher containing that word or phrase.

With this configuration, when the user utters a specific word or phrase, a correspondence descriptor is identified that matches both the user's intention and that word or phrase. The voice dialogue device can therefore be made to respond in a way that matches both the user's intention and the specific word or phrase.

The correspondence determination device according to Aspect 3 of the present invention, in Aspect 1 or 2, may include an event detection unit (information acquisition unit 21) that detects the occurrence of a predetermined event, and the intention searcher acquisition unit may acquire an intention searcher corresponding to the event when the event detection unit detects that the predetermined event has occurred.

With this configuration, a response can be determined that matches the user's intention at the time the predetermined event occurs. Thus, even when the user has not spoken, active dialogue becomes possible, such as the voice dialogue device addressing the user on its own initiative. The predetermined event may be anything suitable as a trigger for the voice dialogue device to respond. For example, if the voice dialogue device is battery-powered, a low remaining battery level may be detected as the occurrence of the predetermined event.

The correspondence determination device according to Aspect 4 of the present invention, in any of Aspects 1 to 3, may include a correspondence descriptor acquisition unit (correspondence descriptor search unit 29) that acquires, from an external device, the correspondence descriptor corresponding to the intention searcher acquired by the intention searcher acquisition unit.

With this configuration, the correspondence descriptor corresponding to the acquired intention searcher is acquired from an external device, so even when no suitable correspondence descriptor can be found within the correspondence determination device, the voice dialogue device can be made to respond using the correspondence descriptor acquired from the external device. Even when a correspondence descriptor is found within the correspondence determination device, if a more suitable one can be acquired from the external device, that descriptor can be used to make the voice dialogue device respond more appropriately.

The correspondence determination device according to Aspect 5 of the present invention, in any of Aspects 1 to 4, may include a correspondence control unit (30) that suspends or cancels execution of the response indicated by the correspondence descriptor identified by the correspondence descriptor search unit when the user makes a new utterance before that response is executed.

With this configuration, if the user makes a new utterance before the voice dialogue device executes its response to the user's earlier utterance, execution of that response is suspended or cancelled. Thus, when the user speaks continuously, the device's response is prevented from interrupting the user's speech, and the device is prevented from responding unnaturally.

The correspondence determination device according to Aspect 6 of the present invention, in any of Aspects 1 to 5, may include a timing control unit (correspondence control unit 30) that delays the execution timing of the response indicated by the correspondence descriptor identified by the correspondence descriptor search unit when, after the utterance, the user has not made an utterance whose content is subject to intention searcher generation.

With this configuration, when the user, after speaking, has not made an utterance whose content is subject to intention searcher generation, execution of the response is delayed, so the voice dialogue device can reply at a natural moment without interrupting the user's next utterance.

The case in which the user "has not made an utterance whose content is subject to intention searcher generation" includes not only the case in which the user says nothing at all, but also the case in which the user makes an utterance that does not reflect any specific intention, such as an interjection like "um" or "hmm". The above configuration can therefore also be described as one that delays the timing of the response to an utterance when a pause occurs after an utterance that carries the user's intention.
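
One way to combine the cancellation of Aspect 5 with the delay of Aspect 6 is sketched below. Treating a filler utterance as a reason to push the due time back, and an utterance that carries an intention as a reason to cancel, is an interpretation made for this sketch; the source describes the two aspects separately, and it also permits suspending rather than cancelling. The class and method names and the single fixed wait time are hypothetical.

```python
class ResponseScheduler:
    """Holds one pending response and decides when it may be executed."""

    def __init__(self, wait: float):
        self.wait = wait
        self._pending = None  # (descriptor, due_time) or None

    def schedule(self, descriptor, now: float):
        # Delay execution so a following utterance is not interrupted.
        self._pending = (descriptor, now + self.wait)

    def on_utterance(self, has_intent: bool, now: float):
        if self._pending is None:
            return
        if has_intent:
            self._pending = None  # new meaningful utterance: cancel the response
        else:
            descriptor, _ = self._pending
            self._pending = (descriptor, now + self.wait)  # filler: postpone

    def poll(self, now: float):
        """Return the descriptor once it is due, otherwise None."""
        if self._pending is not None and now >= self._pending[1]:
            descriptor, _ = self._pending
            self._pending = None
            return descriptor
        return None
```

Passing timestamps in explicitly keeps the sketch deterministic; a real device would presumably drive `poll` from its own clock.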

The correspondence determination device according to aspect 7 of the present invention may, in any one of aspects 1 to 6 above, include a link response unit (topic management unit 23, topic acquisition unit 24) that determines the response of the voice dialogue device after the voice dialogue device has made a predetermined utterance, by referring to link information (adjacent pair table 40) in which utterance content expected as the user's reply to the predetermined utterance of the voice dialogue device (an assumed utterance) is associated with the content of the voice dialogue device's response to that utterance content.

According to the above configuration, when the voice dialogue device has made the predetermined utterance, the response can be determined even more quickly by using the link information, without generating an intention searcher. Using response determination based on the intention searcher together with response determination based on the link information in this way makes it possible to respond in line with the user's intention while, depending on the situation, determining the response from the link information and thereby reducing the processing load required for response determination.
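The combined use of link information and intention searchers can be sketched as a two-stage lookup. Everything concrete here is hypothetical: the table contents, the function name `decide_response`, and the callback `search_by_intention`, which stands in for the intention-searcher path. The point shown is only that the link table is consulted first and the costlier path runs solely on a miss.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical link information:
# (device's predetermined utterance, expected user reply) -> device response.
LINK_INFO: Dict[Tuple[str, str], str] = {
    ("Shall I turn on the light?", "yes"): "turn_on_light",
    ("Shall I turn on the light?", "no"): "acknowledge",
}


def decide_response(last_device_utterance: Optional[str],
                    user_utterance: str,
                    search_by_intention: Callable[[str], str]) -> Tuple[str, str]:
    """Return (response, route), where route is 'link' or 'intention'."""
    if last_device_utterance is not None:
        key = (last_device_utterance, user_utterance.strip().lower())
        if key in LINK_INFO:
            # Fast path: no intention searcher is generated.
            return LINK_INFO[key], "link"
    # Fallback: analyze the utterance and search by intention.
    return search_by_intention(user_utterance), "intention"
```

Returning the route alongside the response is only for demonstration; it makes it easy to confirm that the fallback was skipped when the link table matched.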

A voice dialogue system (100, 200) according to aspect 8 of the present invention is a voice dialogue system in which a voice dialogue device (1, 70) conducts a voice dialogue with a user. The system includes a correspondence determination device (voice dialogue device 1, correspondence determination device 80) that specifies the correspondence descriptor corresponding to an intention searcher generated by analyzing the user's utterance, by referring to correspondence descriptor search information (correspondence descriptor search table 42) in which intention searchers indicating the user's intention are associated with correspondence descriptors indicating responses of the voice dialogue device. The voice dialogue device executes, in response to the user's utterance, the response indicated by the correspondence descriptor specified by the correspondence determination device. The system therefore provides the same effects as aspect 1.

A control method for a correspondence determination device according to aspect 9 of the present invention is a method for controlling a correspondence determination device that determines a response that a voice dialogue device (1), which conducts a voice dialogue with a user, performs in accordance with the user's utterance. The method includes: an intention searcher acquisition step of acquiring an intention searcher that is generated by analyzing the utterance and indicates the user's intention; and a correspondence descriptor search step of specifying the correspondence descriptor corresponding to the intention searcher acquired in the intention searcher acquisition step, by referring to correspondence descriptor search information (correspondence descriptor search table 42) in which intention searchers are associated with correspondence descriptors indicating responses of the voice dialogue device. The method therefore provides the same effects as aspect 1.
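The two steps of the control method can be sketched as two small functions. The shape of an intention searcher used here, an (intention label, phrase) pair, the table contents, and the keyword-based "analysis" are assumptions made purely for illustration; the actual analysis of an utterance would be far more involved.

```python
from typing import Dict, Optional, Tuple

# Hypothetical shape of an intention searcher: (intention label, phrase).
IntentionSearcher = Tuple[str, str]

# Hypothetical contents for the correspondence descriptor search table (42).
SEARCH_TABLE: Dict[IntentionSearcher, str] = {
    ("request", "light"): "turn_on_light",
    ("question", "weather"): "report_weather",
}


def acquire_intention_searcher(utterance: str) -> Optional[IntentionSearcher]:
    """Step 1: toy analysis that stands in for real language analysis."""
    text = utterance.lower().rstrip()
    label = "question" if text.endswith("?") else "request"
    for _, phrase in SEARCH_TABLE:
        if phrase in text:
            return (label, phrase)
    return None


def search_descriptor(searcher: Optional[IntentionSearcher]) -> Optional[str]:
    """Step 2: look up the descriptor associated with the searcher."""
    if searcher is None:
        return None
    return SEARCH_TABLE.get(searcher)
```

Because step 2 is a plain table lookup, differently worded utterances that reduce to the same searcher resolve to the same descriptor without any further matching.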

A voice dialogue device (1) according to aspect 10 of the present invention includes the correspondence determination device described above and executes, in response to a user's utterance, the response determined by the correspondence determination device. It therefore provides the same effects as aspect 1.

A voice dialogue device (70) according to aspect 11 of the present invention is a voice dialogue device that conducts a voice dialogue with a user, and includes: a correspondence descriptor acquisition unit that acquires, from an external device, the correspondence descriptor corresponding to an intention searcher generated by analyzing the user's utterance, the correspondence descriptor having been specified by referring to correspondence descriptor search information (correspondence descriptor search table 42) in which intention searchers indicating the user's intention are associated with correspondence descriptors indicating responses of the voice dialogue device; and a correspondence control unit that executes the response indicated by the correspondence descriptor acquired by the correspondence descriptor acquisition unit. It therefore provides the same effects as aspect 1.
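The split between a device that only acquires and executes descriptors and an external device that determines them can be sketched as follows. The names `DescriptorSource`, `FakeExternalDevice`, and `DialogueDevice` are hypothetical, and the sketch assumes the device forwards the utterance text and receives a descriptor back; the source does not fix what is exchanged or how.

```python
from typing import Callable, Dict, Optional, Protocol


class DescriptorSource(Protocol):
    """Anything that can return a correspondence descriptor for an utterance."""

    def lookup(self, utterance: str) -> Optional[str]: ...


class FakeExternalDevice:
    """Hypothetical in-memory stand-in for the external device."""

    def __init__(self, table: Dict[str, str]) -> None:
        self._table = table

    def lookup(self, utterance: str) -> Optional[str]:
        # Assumption: the external side analyzes the utterance and searches
        # its table; here that is reduced to a keyword match.
        text = utterance.lower()
        for keyword, descriptor in self._table.items():
            if keyword in text:
                return descriptor
        return None


class DialogueDevice:
    """Acquires a descriptor from the source, then executes its action."""

    def __init__(self, source: DescriptorSource,
                 actions: Dict[str, Callable[[], str]]) -> None:
        self._source = source
        self._actions = actions

    def handle(self, utterance: str) -> Optional[str]:
        descriptor = self._source.lookup(utterance)  # descriptor acquisition
        if descriptor is None or descriptor not in self._actions:
            return None
        return self._actions[descriptor]()  # response execution
```

Keeping the search table on the external side means the dialogue device needs only the mapping from descriptors to actions, which is the division of roles this aspect describes.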

The correspondence determination device according to each aspect of the present invention may be realized by a computer. In that case, a control program for the correspondence determination device that realizes the correspondence determination device on the computer by causing the computer to operate as each unit (software element) of the correspondence determination device, and a computer-readable recording medium on which that program is recorded, also fall within the scope of the present invention.

The present invention is not limited to the embodiments described above; various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

The present invention can be used in a voice dialogue device that conducts a voice dialogue with a user.

1 Voice dialogue device (correspondence determination device)
4 Correspondence descriptor search device (external device, correspondence determination device)
70 Voice dialogue device
80 Correspondence determination device
21 Information acquisition unit (event detection unit)
23 Topic management unit (link response unit)
24 Topic acquisition unit (link response unit)
25 Intention searcher generation unit (intention searcher acquisition unit)
29 Correspondence descriptor search unit (correspondence descriptor acquisition unit)
30 Correspondence control unit (timing control unit)
40 Adjacent pair table (link information)
42 Correspondence descriptor search table (correspondence descriptor search information)
100, 200 Voice dialogue system

Claims

A correspondence determination device that determines a response that a voice dialogue device, which conducts a voice dialogue with a user, performs in accordance with the user's utterance, the correspondence determination device comprising:
an intention searcher acquisition unit that acquires an intention searcher that is generated by analyzing the utterance and indicates the user's intention; and
a correspondence descriptor search unit that specifies the correspondence descriptor corresponding to the intention searcher acquired by the intention searcher acquisition unit, by referring to correspondence descriptor search information in which intention searchers are associated with correspondence descriptors indicating responses of the voice dialogue device.

The correspondence determination device according to claim 1, wherein each intention searcher in the correspondence descriptor search information includes a phrase related to the intention indicated by that intention searcher, and
when the intention searcher acquired by the intention searcher acquisition unit includes a phrase extracted from the user's utterance, the correspondence descriptor search unit specifies the correspondence descriptor associated with the intention searcher that includes that phrase.

The correspondence determination device according to claim 1, further comprising an event detection unit that detects occurrence of a predetermined event,
wherein, when the event detection unit detects occurrence of the predetermined event, the intention searcher acquisition unit acquires an intention searcher corresponding to that event.

The correspondence determination device according to any one of claims 1 to 3, further comprising a correspondence descriptor acquisition unit that acquires, from an external device, the correspondence descriptor corresponding to the intention searcher acquired by the intention searcher acquisition unit.

The correspondence determination device according to claim 1, further comprising a correspondence control unit that suspends or cancels execution of the response indicated by the correspondence descriptor specified by the correspondence descriptor search unit when the user makes a new utterance before that response is executed.

The correspondence determination device according to claim 1, further comprising a timing control unit that delays the execution timing of the response indicated by the correspondence descriptor specified by the correspondence descriptor search unit when, after the utterance, the user has not made an utterance whose content is subject to intention searcher generation.

The correspondence determination device according to claim 1, further comprising a link response unit that determines the response of the voice dialogue device after the voice dialogue device has made a predetermined utterance, by referring to link information in which utterance content expected as the user's reply to the predetermined utterance of the voice dialogue device is associated with the content of the voice dialogue device's response to that utterance content.

A voice dialogue system in which a voice dialogue device conducts a voice dialogue with a user, the voice dialogue system comprising:
a correspondence determination device that specifies the correspondence descriptor corresponding to an intention searcher generated by analyzing the user's utterance, by referring to correspondence descriptor search information in which intention searchers indicating the user's intention are associated with correspondence descriptors indicating responses of the voice dialogue device,
wherein the voice dialogue device executes, in response to the user's utterance, the response indicated by the correspondence descriptor specified by the correspondence determination device.

A control method for a correspondence determination device that determines a response that a voice dialogue device, which conducts a voice dialogue with a user, performs in accordance with the user's utterance, the control method comprising:
an intention searcher acquisition step of acquiring an intention searcher that is generated by analyzing the utterance and indicates the user's intention; and
a correspondence descriptor search step of specifying the correspondence descriptor corresponding to the intention searcher acquired in the intention searcher acquisition step, by referring to correspondence descriptor search information in which intention searchers are associated with correspondence descriptors indicating responses of the voice dialogue device.

A voice dialogue device comprising the correspondence determination device according to claim 1, the voice dialogue device executing, in response to a user's utterance, the response determined by the correspondence determination device.

A voice dialogue device that conducts a voice dialogue with a user, the voice dialogue device comprising:
a correspondence descriptor acquisition unit that acquires, from an external device, the correspondence descriptor corresponding to an intention searcher generated by analyzing the user's utterance, the correspondence descriptor having been specified by referring to correspondence descriptor search information in which intention searchers indicating the user's intention are associated with correspondence descriptors indicating responses of the voice dialogue device; and
a correspondence control unit that executes the response indicated by the correspondence descriptor acquired by the correspondence descriptor acquisition unit.