JP2007079397A

JP2007079397A - Interaction method, interaction device, interaction program, and recording medium

Info

Publication number: JP2007079397A
Application number: JP2005269912A
Authority: JP
Inventors: Noboru Miyazaki (宮崎 昇); Tetsuo Amakasu (甘粕 哲郎); Teruo Hagino (萩野 輝雄)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-09-16
Filing date: 2005-09-16
Publication date: 2007-03-29
Anticipated expiration: 2025-09-16
Also published as: JP4783608B2

Abstract

PROBLEM TO BE SOLVED: To provide an interaction method and an interaction device that hold continuity of an answer format at the time of scenario transition from a focus interaction scenario to an auxiliary interaction scenario.

SOLUTION: In the interaction device which can communicate with a plurality of interaction systems storing interaction scenarios and answer generation models and stores auxiliary interaction scenarios in a storage means, an input processing means generates indication information indicating an input conversion result and a scenario to be used for interaction processing from an interaction input and outputs them, and a focus interaction system setting means receives the interaction scenario indicated by the indication information and its answer generation model from an interaction system and sets them as a focus interaction model and a focus answer generation model; and an interaction scenario execution means generates answer contents for the input conversion result by using the focus interaction scenario or auxiliary interaction scenario and outputs them, and an answer generating means generates an interaction output from the answer contents by using the focus answer generation model and outputs the interaction output.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to dialogue technology that outputs an appropriate response to an input based on a dialogue scenario. More specifically, it relates to dialogue technology that handles a wide range of topics by automatically switching among a plurality of dialogue systems, each of which handles a different topic.

Conventional dialogue systems that accept input such as speech or text and output an appropriate response can handle dialogue limited to a specific topic, but they have difficulty outputting appropriate responses to input that relates to a wide range of topics. This is because the description of the dialogue scenario that determines the response to an input becomes more complex as the range of topics widens. A dialogue system that can handle only a narrow range of topics also has problems: it can respond only inadequately to the user's requests, and the user must understand the system's capabilities well in advance. Attempts have therefore been made to build a dialogue system that handles a pseudo-wide range of topics by constructing a plurality of dialogue systems, each handling a narrow range of topics and therefore relatively easy to build, and conducting the dialogue while switching among them appropriately.

Among such attempts there is, as shown for example in Non-Patent Document 1, a multi-dialogue system that provides, in addition to a plurality of dialogue systems, a neutral dialogue state (a dialogue state that corresponds to none of the dialogue states of the plurality of dialogue systems). With such a multi-dialogue system, one conceivable approach is to prepare an auxiliary dialogue scenario corresponding to the neutral dialogue state and, when the currently operating dialogue system cannot process an input, to use the auxiliary dialogue scenario to conduct the dialogue needed to switch to another dialogue system. Cases in which "an input cannot be processed" include, for example, a case in which the user makes an input outside the range assumed in the design of the currently operating dialogue system, and a case in which the procedure for processing the input involves pattern recognition, the pattern recognition fails, and subsequent processing cannot be executed.

Hereinafter, the dialogue system that is currently operating (or is to be operated) among the plurality of dialogue systems is referred to as the "focus dialogue system", and the dialogue scenario that determines the response content of the focus dialogue system is referred to as the "focus dialogue scenario".
(Non-Patent Document 1) Toshihiro Isobe and five others, "Examination of domain switching measures in a spoken dialogue system that selectively uses multiple models", 47th meeting of the Special Interest Group on Spoken Language Processing (SIG-SLP), Information Processing Society of Japan, July 19, 2003, pp. 41-46

In the multi-dialogue system described above, one of the plurality of dialogue systems operates as the focus dialogue system. The dialogue proceeds while transitioning between the focus dialogue scenario corresponding to the focus dialogue system and the auxiliary dialogue scenario, and the focus dialogue system is switched as the auxiliary dialogue scenario progresses. In such a multi-dialogue system, it has been technically difficult to keep the dialogue natural when the scenario in use transitions between the focus dialogue scenario and the auxiliary dialogue scenario.

In a multi-dialogue system, for example, when the user intends to enter a request to the focus dialogue system but the input is one that the focus dialogue system cannot process, the scenario in use transitions to the auxiliary dialogue scenario. In this case, the auxiliary dialogue scenario can carry out processing that keeps the dialogue from breaking down, such as telling the user that the current focus dialogue system cannot handle the input and confirming whether to switch to another dialogue system.

Now consider the case where the response style differs among the individual dialogue systems. The "response style" is, for example, speaker characteristics such as volume, speaking rate, and voice quality when the response is given by speech, or the type of agent character when an agent character is displayed on a screen and the response combines gestures with text or speech. In such a case, when the scenario in use transitions from the focus dialogue scenario to the auxiliary dialogue scenario, the agent character or the speaker characteristics of the voice change abruptly. From the user's standpoint, the response style appears to change suddenly even though the user intended the input for the focus dialogue system. Users generally carry on a dialogue assuming that a single dialogue system keeps a single response style, so a dialogue system whose response style appears to change suddenly confuses the user. In other words, conventional multi-dialogue systems have the problem that a discontinuity in response style arises when the scenario in use transitions from an individual focus dialogue scenario to the auxiliary dialogue scenario.

In view of the above problem, an object of the present invention is to provide a dialogue method, a dialogue device, a dialogue program, and a recording medium that maintain continuity of response style when a scenario transition occurs from the focus dialogue scenario to the auxiliary dialogue scenario.

To solve the above problem, the present invention provides a dialogue device that can communicate with a plurality of dialogue systems, each of which stores at least a dialogue scenario and a response generation model and can execute dialogue processing, and whose storage means stores at least an auxiliary dialogue scenario capable of carrying out the dialogue processing involved in switching dialogue systems. In this dialogue device, the input processing means generates, from the user's dialogue input, an input conversion result, in which the dialogue input is converted into a form on which dialogue processing can be performed, and instruction information that designates the dialogue scenario or the auxiliary dialogue scenario to be used for the dialogue processing, and outputs both. The focus dialogue system setting means receives, from the dialogue system that holds the dialogue scenario designated by the instruction information, that system's dialogue scenario and response generation model, and sets them as the focus dialogue model and the focus response generation model for executing the dialogue processing. The dialogue scenario execution means uses the focus dialogue scenario or the auxiliary dialogue scenario to generate and output response content for the input conversion result. The response generation means uses the focus response generation model to generate, from the response content, the dialogue output presented to the user, and outputs it.

A dialogue program that implements the dialogue device on a computer allows the computer to operate as the dialogue device. A computer-readable program recording medium on which this dialogue program is recorded makes it possible to have another computer function as the dialogue device and to distribute the dialogue program.

According to the present invention, even when a scenario transition occurs from the focus dialogue scenario to the auxiliary dialogue scenario, the dialogue output is generated from the response content using the focus response generation model of the current dialogue system, so the continuity of response style is maintained across the transition.
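The effect stated above can be illustrated with a minimal Python sketch. The names `ResponseModel` and `generate_output`, and the style fields `voice` and `rate`, are hypothetical and do not come from the patent; the sketch only assumes that a response generation model can be reduced to a small set of style parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseModel:
    """Hypothetical stand-in for a response generation model (the response style)."""
    voice: str
    rate: float


def generate_output(response_content: str, focus_model: ResponseModel) -> dict:
    """Render response content in the style given by the focus response generation model."""
    return {"text": response_content, "voice": focus_model.voice, "rate": focus_model.rate}


# Assumed style of the current focus dialogue system.
focus_model = ResponseModel(voice="tour-guide", rate=1.0)

# Response content produced by the focus scenario and by the auxiliary scenario
# is rendered with the same focus model, so the style does not change.
from_focus = generate_output("The nearest station is Roppongi.", focus_model)
from_auxiliary = generate_output("I cannot answer that here. Shall I switch systems?", focus_model)
```

Both outputs carry the same `voice` and `rate`, which is the continuity of response style that the paragraph above describes.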

Two embodiments are described below.
In the first embodiment, the dialogue system to be used is determined for each input from the user, and dialogue processing is performed according to the determined dialogue system.
In the second embodiment, the technique of the present invention is applied to the dialogue system disclosed in Reference Document 1.
(Reference Document 1) Japanese Patent Application No. 2005-232215
<< First Embodiment >>
The first embodiment of the present invention is described below with reference to FIGS. 1 to 15.

One of the best modes of the present invention is a form in which a computer functions as the dialogue device, mainly by having the computer execute the dialogue program of the present invention.
FIG. 1 is a block diagram illustrating the hardware configuration of the dialogue device (A) according to the first embodiment.

As illustrated in FIG. 1, the dialogue device (A) includes an input unit (11) to which a microphone, a keyboard, and the like can be connected; an output unit (12) to which a speaker, a liquid crystal display, and the like can be connected; a communication unit (13) to which a communication device (for example, a modem) capable of communicating with the outside of the dialogue device (A) can be connected; an MPU (Micro Processing Unit) (14), which may include a cache memory and the like; a RAM (Random Access Memory) (15) serving as memory; a ROM (Read Only Memory) (16); an external storage device (17) such as a hard disk; and a bus (18) that connects the input unit (11), output unit (12), communication unit (13), MPU (14), RAM (15), ROM (16), and external storage device (17) so that data can be exchanged among them. If necessary, the dialogue device (A) may also be provided with a device (drive) that can read from and write to a storage medium such as a CD-ROM.

The external storage device (17) of the dialogue device (A) stores at least an auxiliary dialogue scenario and an auxiliary response generation model corresponding to the neutral dialogue state.

The ROM (16) of the dialogue device (A) stores a program for enabling dialogue processing and the data required for the processing of this program. Data obtained through the processing of these programs is stored in the RAM (15) or elsewhere as appropriate.

More specifically, the ROM (16) stores a program that processes information input to the dialogue device (A) so that dialogue processing can be executed on it, a program that generates response content and the like according to the focus dialogue scenario and the like, a program that sets and changes the focus dialogue system, and a program that generates and outputs response information from the response content. A control program for controlling the processing based on these programs is also stored as appropriate.

In the dialogue device (A) according to the first embodiment, each program stored in the ROM (16) and the data required for its processing are read into the RAM (15) as needed and are interpreted, executed, and processed by the MPU (14). As a result, the MPU (14) implements the prescribed functions (input processing unit, dialogue scenario execution unit, focus dialogue system setting unit, response generation unit, and control unit), and dialogue processing is thereby realized.

Next, the dialogue processing in the first embodiment is described step by step with a concrete example, with reference to FIGS. 2 to 15.

The dialogue device (A) and a plurality (n) of dialogue systems, namely the first dialogue system (1041), the second dialogue system (1042), ..., and the n-th dialogue system (104n), are connected via a network (1) so that they can communicate with one another (see FIG. 18). Each dialogue system is assumed to be a known dialogue system that can itself execute dialogue processing. Using existing dialogue systems in this way makes it possible to build, at low cost, a multi-dialogue system that can handle a variety of topics.
Each dialogue system stores at least a dialogue scenario and a response generation model in its storage means. That is, the first dialogue system (1041) stores the first dialogue scenario (1041b) and the first response generation model (1041a) in its storage means, the second dialogue system (1042) stores the second dialogue scenario (1042b) and the second response generation model (1042a) in its storage means, and the n-th dialogue system (104n) stores the n-th dialogue scenario (104nb) and the n-th response generation model (104na) in its storage means. A dialogue scenario is data that describes, for a given topic, the processing instructions for constructing a dialogue, the responses to inputs, and the like (it is written, for example, in the form of a program). A response generation model is data that describes the information for determining the response style. Here, for convenience, the first dialogue system (1041) is a Tokyo sightseeing guidance system and the n-th dialogue system (104n) is a Tokyo administrative services guidance system.
The dialogue device (A) and the plurality of dialogue systems do not necessarily have to be connected via the network (1). What matters is that the dialogue device (A) and each dialogue system are connected so that they can communicate with each other, so that at least the dialogue scenario and the response generation model of the dialogue system can be transmitted to the dialogue device (A).

The auxiliary dialogue scenario (1012) and the auxiliary response generation model (1032) stored in the external storage device (17) of the dialogue device (A) are loaded into a predetermined storage area of the RAM (15) under the control of the control unit (80).

Furthermore, the control unit (80) generates processing scenario information (1003) and stores it in a predetermined storage area of the RAM (15). The processing scenario information (1003) generated here is given, for example, a Null value as its initial value.
Hereinafter, the statement "read XX from the RAM (15)" means "read XX from the predetermined storage area of the RAM (15) in which XX is stored".

Inputs to the dialogue device (A) may be, for example, speech uttered by the user of the dialogue device (A), text input from a keyboard, mouse input, touch panel input, button operations, gesture input, or a combination of several of these. The first embodiment assumes spoken dialogue processing as an example, so the input is speech.

The speech uttered by the user is picked up by the microphone (30) of the dialogue device (A) (step S1). The user's speech picked up by the microphone (30) is input, as a picked-up sound signal, to the input processing unit (100) of the dialogue device (A).

The input processing unit (100) of the dialogue device (A) processes the picked-up sound signal so that the dialogue device (A) can execute dialogue processing on it.
As a concrete example, the input processing unit (100) converts the picked-up sound signal into a discrete signal by A/D conversion or the like, and performs speech analysis on this discrete signal, such as voice activity detection to detect speech segments and transformation of the detected speech segments into the frequency domain. The input processing unit (100) then applies a suitable acoustic model (for example, a probabilistic model that gives the relationship between the pronunciation of a word and speech features as a probability), a language model (for example, a probabilistic model that gives the co-occurrence relationship between words as a probability), and the like to the processed discrete signal to obtain text (a speech recognition result) corresponding to the picked-up sound signal (the speech uttered by the user). In addition, the input processing unit (100) extracts characteristic keywords, the text type (for example, question form or reply form), and the like from the obtained text and, using a keyword-to-attribute-value correspondence table or the like, outputs them as an input conversion result (1001) in the form of attribute-value pairs. The input processing unit (100) also generates and outputs processing scenario instruction information (1002), which designates the processing scenario, from the extracted keywords and the like, using for example the technique disclosed in Non-Patent Document 1 (step S2).
Because the input processing unit (100) of the dialogue device (A) can be realized with known techniques (see, for example, Reference Document 2 and Non-Patent Document 1 above), a detailed description of its configuration and functions is omitted.
(Reference Document 2) "NTT Technical Journal", The Telecommunications Association, January 2004 issue

For example, when the user says "Please tell me the nearest station to Roppongi Hills", the input processing unit (100) of the dialogue device (A) outputs an input conversion result (1001) in attribute-value pair form in which the attribute "intent type" has the value "question", the attribute "subject" has the value "nearest station", and the attribute "area" has the value "Roppongi Hills" (see FIG. 3). Based on keywords such as "nearest station" and "Roppongi Hills", the input processing unit (100) also outputs the processing scenario instruction information (1002) as scenario="first dialogue scenario" (see FIG. 3). The input conversion result (1001) and the processing scenario instruction information (1002) output by the input processing unit (100) are stored in predetermined storage areas of the RAM (15).
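The conversion in this example can be sketched in Python as follows. This is only an illustration that starts from already recognized text: the table `KEYWORD_TABLE`, the function `process_input`, the crude intent test, the majority vote over keywords, and the fallback to the auxiliary dialogue scenario when no keyword matches are all assumptions, not the technique of Non-Patent Document 1.

```python
from collections import Counter

# Hypothetical keyword table: keyword -> (attribute, value, scenario the keyword suggests).
KEYWORD_TABLE = {
    "nearest station": ("subject", "nearest station", "first dialogue scenario"),
    "roppongi hills": ("area", "Roppongi Hills", "first dialogue scenario"),
}


def process_input(text: str) -> tuple:
    """Return (attribute-value pairs, scenario instruction) for recognized text."""
    lowered = text.lower()
    # Assumed, very crude text-type detection.
    is_question = "tell me" in lowered or lowered.rstrip().endswith("?")
    pairs = {"intent type": "question" if is_question else "reply"}

    votes = Counter()
    for keyword, (attribute, value, scenario) in KEYWORD_TABLE.items():
        if keyword in lowered:
            pairs[attribute] = value
            votes[scenario] += 1

    # Assumption: with no matching keyword, fall back to the auxiliary dialogue scenario.
    instruction = votes.most_common(1)[0][0] if votes else "auxiliary dialogue scenario"
    return pairs, instruction


pairs, instruction = process_input("Please tell me the nearest station to Roppongi Hills")
```

With the sample utterance, `pairs` holds the three attribute-value pairs named in the text and `instruction` is "first dialogue scenario".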

The dialogue scenario execution unit (101) of the dialogue device (A) reads the processing scenario information (1003) and the processing scenario instruction information (1002) from the RAM (15) and determines whether the two match (step S3).
The control unit (80) performs control so that step S9 is executed if the two match and step S4 is executed if they do not.
At this stage, the processing scenario information (1003) is the Null value and the processing scenario instruction information (1002) is "first dialogue scenario"; they do not match, so step S4 is executed.

Under the control of the control unit (80), the dialogue scenario execution unit (101) of the dialogue device (A) determines whether the processing scenario instruction information (1002) read from the RAM (15) is "auxiliary dialogue scenario" (step S4).
The control unit (80) performs control so that step S12 is executed if the processing scenario instruction information (1002) is "auxiliary dialogue scenario" and step S5 is executed if it is not.
At this stage, the processing scenario instruction information (1002) is "first dialogue scenario", so step S5 is executed.

Under the control of the control unit (80), the dialogue scenario execution unit (101) of the dialogue device (A) determines whether the processing scenario information (1003) read from the RAM (15) is the Null value (step S5).
The control unit (80) performs control so that step S6 is executed if the processing scenario information (1003) is the Null value and step S24 is executed if it is not.
At this stage, the processing scenario information (1003) is the Null value, so step S6 is executed.

The control unit (80) stores information with the same content as the processing scenario instruction information (1002) in a predetermined storage area of the RAM (15) as the processing scenario information (1003) (step S6). In other words, the processing scenario instruction information (1002) is copied to the processing scenario information (1003). At this stage, the processing scenario information (1003) has changed from the Null value to "first dialogue scenario", the value of the processing scenario instruction information (1002).
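The branching in steps S3 to S5 and the copy in step S6 can be summarized in a short Python sketch. The function name `next_step` and the use of `None` for the Null value are illustrative assumptions. The step labels are those given in the text, and what steps S12 and S24 do is not modeled here.

```python
from typing import Optional

AUXILIARY = "auxiliary dialogue scenario"


def next_step(processing_scenario: Optional[str], instruction: str) -> str:
    """Return the label of the step that follows the checks in steps S3 to S5."""
    if processing_scenario == instruction:   # step S3: the two pieces of information match
        return "S9"
    if instruction == AUXILIARY:             # step S4: the auxiliary scenario is designated
        return "S12"
    if processing_scenario is None:          # step S5: no scenario has been set yet (Null)
        return "S6"
    return "S24"


# State at this point of the walkthrough: Null value vs. "first dialogue scenario".
processing_scenario = None
instruction = "first dialogue scenario"
step = next_step(processing_scenario, instruction)
if step == "S6":
    # Step S6: copy the instruction information into the processing scenario information.
    processing_scenario = instruction
```

After this runs, `step` is "S6" and `processing_scenario` holds "first dialogue scenario", matching the state described in the text.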

Next, under the control of the control unit (80), the dialogue scenario execution unit (101) generates focus dialogue system setting information (1021) from the processing scenario information (1003) and stores it in a predetermined storage area of the RAM (15) (step S7). The focus dialogue system setting information (1021) is the information for setting or changing the dialogue system that is needed in order to set up the dialogue scenario and the like corresponding to the processing scenario information (1003).

Next, under the control of the control unit (80), the focus dialogue system setting unit (102) of the dialogue device (A) reads the focus dialogue system setting information (1021) from the RAM (15). The focus dialogue system setting unit (102) interprets the focus dialogue system setting information (1021) and selects the dialogue system that corresponds to the dialogue scenario designated by the processing scenario information (1003). The focus dialogue system setting unit (102) then reads the dialogue scenario and the response generation model from the storage means of the selected dialogue system via the network (1) and stores them in predetermined storage areas of the RAM (15) as the focus dialogue scenario (1011) and the focus response generation model (1031), respectively (step S8).
At this stage, the processing scenario information (1003) is "first dialogue scenario", so the focus dialogue system setting unit (102) reads the first dialogue scenario (1041b) and the first response generation model (1041a) from the storage means of the first dialogue system (1041) via the network (1) and stores them in predetermined storage areas of the RAM (15) as the focus dialogue scenario (1011) and the focus response generation model (1031), respectively.
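Step S8 can be pictured with the following Python sketch. The classes `DialogueSystem` and `DeviceMemory`, the function `set_focus`, and the dictionary contents are hypothetical. A deep copy stands in for receiving the data over the network (1), and the error raised when no system matches is an assumption that the text does not describe.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class DialogueSystem:
    """Hypothetical remote dialogue system holding a scenario and a response model."""
    scenario_name: str
    scenario: dict
    response_model: dict


@dataclass
class DeviceMemory:
    """Hypothetical stand-in for the storage areas in the RAM (15)."""
    focus_scenario: Optional[dict] = None
    focus_response_model: Optional[dict] = None


def set_focus(systems: list, processing_scenario: str, memory: DeviceMemory) -> None:
    """Select the system that offers the designated scenario and copy its data into memory."""
    for system in systems:
        if system.scenario_name == processing_scenario:
            # deepcopy stands in for receiving the data over the network (1).
            memory.focus_scenario = copy.deepcopy(system.scenario)
            memory.focus_response_model = copy.deepcopy(system.response_model)
            return
    raise LookupError(f"no dialogue system offers {processing_scenario!r}")


systems = [
    DialogueSystem("first dialogue scenario", {"topic": "Tokyo sightseeing"}, {"voice": "tour-guide"}),
    DialogueSystem("n-th dialogue scenario", {"topic": "Tokyo administrative services"}, {"voice": "clerk"}),
]
memory = DeviceMemory()
set_focus(systems, "first dialogue scenario", memory)
```

After the call, `memory` holds copies of the first system's scenario and response model as the focus dialogue scenario and the focus response generation model.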

続いて、制御部（８０）の制御の下、対話装置（Ａ）の対話シナリオ実行部（１０１）は、ＲＡＭ（１５）から焦点対話シナリオ（１０１１）および入力変換結果（１００１）を読み込み、応答内容（１０３３）を生成し、この応答内容（１０３３）をＲＡＭ（１５）の所定の格納領域に格納する（ステップＳ９）。この応答内容（１０３３）としては、例えば、テキストやエージェントキャラクタの描画コマンドなどが考えられる。第１実施形態では、音声による対話処理を実行するとしているので、応答内容（１０３３）はテキスト形式であるとする。具体的な一例として、対話シナリオ実行部（１０１）は、応答内容（１０３３）を、text="最寄り駅は六本木になります"として出力する（図４参照。）。 Subsequently, under the control of the control unit (80), the dialogue scenario execution unit (101) of the dialogue device (A) reads the focal dialogue scenario (1011) and the input conversion result (1001) from the RAM (15), generates the response content (1033), and stores this response content (1033) in a predetermined storage area of the RAM (15) (step S9). The response content (1033) may be, for example, text or a drawing command for an agent character. Because the first embodiment performs dialogue processing by voice, the response content (1033) is assumed to be in text format. As a specific example, the dialogue scenario execution unit (101) outputs the response content (1033) as text="The nearest station is Roppongi" (see FIG. 4).

続いて、制御部（８０）の制御の下、対話装置（Ａ）の応答生成部（１０３）は、ＲＡＭ（１５）から焦点応答生成モデル（１０３１）および応答内容（１０３３）を読み込み、応答情報（対話出力）を生成して出力する（ステップＳ１０）。ここで対話処理におけるユーザへの応答としては、例えば、ディスプレイに表示されるテキスト、画像、エージェントキャラクタのジェスチャなどのアニメーション、スピーカから出力される合成音声もしくはこれらのいくつかを組み合わせたものなどが考えられる。第１実施形態では、音声による対話処理を実行するとしているので、応答は合成音声であるとする。そこで、応答生成部（１０３）は、波形接続方式のテキスト音声合成手段であるとし、応答生成部（１０３）の出力である応答情報は、音声波形データが連なる合成音声信号であるとする。但し、応答生成部（１０３）を、波形接続方式のテキスト音声合成手段に限定する趣旨ではなく、その他の方式の音声合成手段でもよい。 Subsequently, under the control of the control unit (80), the response generation unit (103) of the dialogue device (A) reads the focus response generation model (1031) and the response content (1033) from the RAM (15), and generates and outputs response information (the dialogue output) (step S10). The response to the user in dialogue processing may be, for example, text or images shown on a display, animation such as gestures of an agent character, synthesized speech output from a speaker, or a combination of several of these. Because the first embodiment performs dialogue processing by voice, the response is assumed to be synthesized speech. Accordingly, the response generation unit (103) is assumed to be a text-to-speech synthesis means of the waveform concatenation type, and the response information it outputs is a synthesized speech signal made up of concatenated speech waveform data. However, this is not intended to limit the response generation unit (103) to waveform concatenation text-to-speech synthesis; a speech synthesis means of another type may also be used.

応答生成部（１０３）の具体的な一例を説明する。応答生成部（１０３）は、テキスト解析部、韻律生成部、音声波形選択部、音声合成部から構成される。応答生成部（１０３）は、焦点応答生成モデルおよび応答内容であるテキストを入力とし、合成音声信号を出力する。また、外部記憶装置（１７）には、図示しない音声波形データベースおよび音声情報データベースが保存記憶されている。音声波形データベースは、単語や文章を読み上げた音声データに対して公知のＡ／Ｄ変換を行い、合成音声を組み立てる上で適切な合成単位（例えば音素）で切出したもの（音声波形素片としての音声波形データ）の集合である。 A specific example of the response generation unit (103) will now be described. The response generation unit (103) consists of a text analysis unit, a prosody generation unit, a speech waveform selection unit, and a speech synthesis unit. It takes the focus response generation model and the text of the response content as input and outputs a synthesized speech signal. The external storage device (17) stores a speech waveform database and a speech information database (neither shown). The speech waveform database is a collection of speech waveform data (speech waveform segments) obtained by applying well-known A/D conversion to speech data of words and sentences read aloud and cutting the result into synthesis units (for example, phonemes) suitable for assembling synthesized speech.

音声情報データベースは、合成音声を組み立てる上で適切な単位（合成単位）を音素として、これに諸情報が対応付けられたエントリーからなるデータ構造（テーブル）となっている。音声情報データベースの各エントリーは、音声波形素片の通し番号である音声波形素片番号、発声内容を示す音素ラベル情報、音素の発声時間長を示す音素継続時間情報、音素区間の平均パワーを正規化して得たパワー情報、音素の音高の時間推移を表したＦ_０パターン情報、音声波形データベースの中での音声波形データの位置を示す情報、例えば男女の別などの話者性を示すインデックス（以下、音声波形データ位置情報という。）などから構成される。
音声情報データベースのエントリーと音声波形データベースにおける（音声波形素片としての）各音声波形データとは、音声情報データベースにおける音声波形データ位置情報によって対応付けられる。 The speech information database is a data structure (table) of entries in which various items of information are associated with each phoneme, the phoneme being the unit (synthesis unit) suitable for assembling synthesized speech. Each entry consists of a speech waveform segment number, which is the serial number of the speech waveform segment; phoneme label information indicating what is uttered; phoneme duration information indicating how long the phoneme is uttered; power information obtained by normalizing the average power of the phoneme interval; F₀ pattern information representing the time course of the phoneme's pitch; information indicating the position of the speech waveform data in the speech waveform database (hereinafter, speech waveform data position information); an index indicating speaker characteristics such as gender; and so on.
The speech information database entry and each speech waveform data (as a speech waveform segment) in the speech waveform database are associated with each other by speech waveform data position information in the speech information database.
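The relationship between the two databases can be pictured with a small Python sketch. The field names, the millisecond and Hz units, and the (offset, length) encoding of the position information are assumptions made for illustration; the description only lists which items an entry holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechInfoEntry:
    """One entry of the speech information database (field names are hypothetical)."""
    segment_number: int       # serial number of the speech waveform segment
    phoneme_label: str        # what is uttered
    duration_ms: float        # phoneme duration (unit assumed)
    power: float              # normalized average power of the phoneme interval
    f0_pattern: tuple         # time course of pitch in Hz (unit assumed)
    waveform_position: tuple  # (offset, length) into the waveform database (assumed encoding)
    speaker: str              # speaker characteristic index, e.g. "female"


# Toy speech waveform database: one flat list of samples.
WAVEFORM_DB = [0.0, 0.1, 0.2, 0.1, 0.0, -0.1, -0.2, -0.1]


def fetch_waveform(entry: SpeechInfoEntry) -> list:
    """Follow the position information from an entry to its waveform data."""
    offset, length = entry.waveform_position
    return WAVEFORM_DB[offset:offset + length]


entry = SpeechInfoEntry(0, "a", 80.0, 1.0, (200.0, 205.0), (2, 3), "female")
segment = fetch_waveform(entry)
```

Keeping the descriptive features in one table and the raw samples in another lets the selection stage compare candidates without touching waveform data until synthesis.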

テキスト解析部は、入力されたテキストを形態素解析し、入力されたテキストに対応した音素列とアクセント型を出力する。 The text analysis unit performs morphological analysis on the input text and outputs a phoneme string and an accent type corresponding to the input text.

韻律生成部は、テキスト解析部が出力した情報および焦点応答生成モデルを入力として、音素ごとの音声のＦ_０パターン(基本周波数パターン)、音素継続時間長(音素の発声の長さ)、パワー情報(音声の大きさ)などを推定し、これを出力する。焦点応答生成モデル（第ｉ応答生成モデル）には、合成音声の話速や声の高さや話者性などを指定するテキスト音声合成のパラメータが記述されている。この段階では第１応答生成モデル（１０４１ａ）であり、例えば、pitch="200Hz"として平均的な声の高さを２００Ｈｚ、speed="fast"として口調の速さを速め、power="normal"として通常の声の大きさを指定するものとなっている（図５参照。）。 The prosody generation unit takes the information output by the text analysis unit and the focus response generation model as input, estimates for each phoneme the F₀ pattern (fundamental frequency pattern), the phoneme duration (how long the phoneme is uttered), the power information (loudness), and so on, and outputs them. The focus response generation model (the i-th response generation model) describes text-to-speech synthesis parameters that specify the speaking rate, pitch, speaker characteristics, and so on of the synthesized speech. At this stage it is the first response generation model (1041a), which specifies, for example, an average pitch of 200 Hz with pitch="200Hz", a fast speaking rate with speed="fast", and normal loudness with power="normal" (see FIG. 5).

音声波形選択部は、焦点応答生成モデルおよびテキスト解析部が出力した音素列の並びに従い、韻律生成部で出力した、音素ごとの音声のＦ_０パターン、音素継続時間長、パワー情報、応答生成モデルで指定される話者性（この段階では第１応答生成モデル（１０４１ａ）であり、例えば、voicetype="female"として話者を女性と指定している。）などをターゲットとして、これらターゲットとの歪みが小さく、また、音声波形素片を接続した際の音声波形素片同士での接続歪みが最小になるような音声波形素片の組み合わせ（最適音声波形素片列）を、音声情報データベースから選択して、最適音声波形素片列の各音声波形素片番号（テキスト解析部が出力した音素列の並びに対応している。）を出力する。最適音声波形素片列の決定には動的計画法などを用いる。 The speech waveform selection unit follows the focus response generation model and the order of the phoneme string output by the text analysis unit. Taking as targets the per-phoneme F₀ pattern, phoneme duration, and power information output by the prosody generation unit, together with the speaker characteristics specified by the response generation model (at this stage the first response generation model (1041a), which specifies a female speaker with, for example, voicetype="female"), it selects from the speech information database the combination of speech waveform segments (the optimal speech waveform segment sequence) whose distortion from these targets is small and whose concatenation distortion between adjacent segments is minimal. It then outputs the speech waveform segment numbers of the optimal sequence, which correspond to the order of the phoneme string output by the text analysis unit. Dynamic programming or a similar method is used to determine the optimal speech waveform segment sequence.
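The selection described above can be sketched as a Viterbi-style dynamic program. This is a minimal sketch under simplifying assumptions, not the method of the cited reference: each candidate segment is reduced to a hypothetical tuple `(segment_number, f0_hz, duration_ms)`, the target cost is the absolute F₀ and duration mismatch, and the concatenation cost is the F₀ jump between adjacent segments.

```python
def select_units(targets, candidates):
    """Return segment numbers minimizing target cost plus concatenation cost.

    targets:    list of (f0_hz, duration_ms), one per phoneme
    candidates: per phoneme, a list of (segment_number, f0_hz, duration_ms)
    """
    def target_cost(t, c):
        return abs(t[0] - c[1]) + abs(t[1] - c[2])

    def concat_cost(prev, cur):
        return abs(prev[1] - cur[1])

    # best[i][j] = (cumulative cost, back-pointer) for candidate j at phoneme i
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + concat_cost(p, c), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], c), back))
        best.append(row)

    # Trace back from the cheapest candidate at the last phoneme.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j][0])
        j = best[i][j][1]
    return path[::-1]


targets = [(200, 80), (200, 60)]
candidates = [
    [(1, 200, 80), (2, 150, 80)],
    [(3, 120, 60), (4, 195, 60)],
]
chosen = select_units(targets, candidates)
```

With these toy values the program picks segments 1 and 4, the pair that stays close to the 200 Hz target while keeping the pitch jump at the join small.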

音声合成部は、音声波形選択部で選択された最適音声波形素片列の各音声波形素片番号を入力として、この最適音声波形素片列の各音声波形素片番号に対応した音声波形データを（音声波形データ位置情報を参照して）音声波形データベースから読み込み、それら音声波形データを順次接続して連続した合成音声信号として出力する。 The speech synthesizer receives each speech waveform segment number of the optimum speech waveform segment sequence selected by the speech waveform selection unit as input, and speech waveform data corresponding to each speech waveform segment number of this optimum speech waveform segment sequence Are read from the speech waveform database (with reference to the speech waveform data position information), and the speech waveform data are sequentially connected and output as a continuous synthesized speech signal.

なお、対話装置（Ａ）の応答生成部（１０３）は、公知技術（例えば参考文献３などを参照。）によって達成されるから、応答生成部（１０３）の詳細な構成・機能についての説明は略する。
（参考文献３）特許２７６１５５２号公報 Because the response generation unit (103) of the dialogue device (A) can be realized with known technology (see, for example, Reference 3), a detailed description of its configuration and functions is omitted.
(Reference 3) Japanese Patent No. 2761552

応答生成部（１０３）によって出力された合成音声信号（応答情報）は、対話装置（Ａ）のスピーカ（４０）から合成音声として出力される（ステップＳ１１）。つまり、スピーカ（４０）からは、女性の声で平均的な声の高さが２００Ｈｚになる程度の、やや早めの口調で通常の大きさの合成音声で「最寄り駅は六本木になります」と出力される。ユーザは、この合成音声を対話処理の応答として知覚する。 The synthesized speech signal (response information) output by the response generation unit (103) is output as synthesized speech from the speaker (40) of the dialogue device (A) (step S11). That is, the speaker (40) outputs "The nearest station is Roppongi" as synthesized speech in a female voice with an average pitch of about 200 Hz, at a slightly fast speaking rate and normal volume. The user perceives this synthesized speech as the response of the dialogue processing.

ユーザは、この合成音声を聴いて満足し（この例で云えば、ユーザは、最寄り駅を知るだけで満足した。）、対話処理を終了するかもしれないし、あるいは、さらなる情報などを求めて対話処理を続行するかもしれない。続行する場合、ユーザは、従前の対話に関連した内容の言葉を発するかもしれないし、従前の対話に関連しない内容の言葉を発するかもしれない。さらに、従前のユーザから突然、別のユーザが割り込みないし変更し、従前の対話に関連した内容の言葉を発するかもしれないし、従前の対話に関連しない内容の言葉を発するかもしれない。このように、対話処理においては様々な場合が考えられる。
しかしながら、本発明は、このような様々な場合においても対応可能なものであるから、ユーザがさらなる情報などを求めて対話処理を続行する場合を例として、さらに説明を加えることにする。 The user may be satisfied after hearing this synthesized speech (in this example, satisfied simply to learn the nearest station) and end the dialogue, or may continue the dialogue to obtain further information. When continuing, the user may say something related to the preceding dialogue or something unrelated to it. Moreover, a different user may suddenly cut in or take over from the previous user, and may likewise say something related or unrelated to the preceding dialogue. Various cases are thus possible in dialogue processing.
However, since the present invention can deal with such various cases, further explanation will be given by taking as an example a case where the user continues the dialogue process for further information.

ユーザが、合成音声の出力を受けて、さらなる情報などを求めて、ある言葉を発したとする。この言葉（音声）は、マイクロフォン（３０）によって収音され、上記ステップＳ１およびステップＳ２の処理が実行される。その結果、処理シナリオ指示情報（１００２）が、"第１対話シナリオ"である場合と、"第１対話シナリオ"ではない場合がありえる。例えば、ユーザが「六本木ヒルズ周辺の有名な公園を教えてほしい」と発声すると、処理シナリオ指示情報（１００２）は、東京観光案内システムである第１対話システムの"第１対話シナリオ"となる。また、ユーザが「六本木駅の近くの区役所を教えてください」という発声を行うと、処理シナリオ指示情報（１００２）は、東京行政サービス案内システムである第ｎ対話システムの"第ｎ対話シナリオ"となる（図６参照。）。 It is assumed that the user receives a synthesized speech output and utters a certain word for further information. This word (speech) is picked up by the microphone (30), and the processes of steps S1 and S2 are executed. As a result, the processing scenario instruction information (1002) may be “first dialog scenario” or may not be “first dialog scenario”. For example, when the user utters “I want you to tell me a famous park around Roppongi Hills”, the processing scenario instruction information (1002) becomes the “first dialog scenario” of the first dialog system, which is the Tokyo sightseeing information system. When the user utters “Please tell me the ward office near Roppongi Station”, the processing scenario instruction information (1002) is “nth dialogue scenario” of the nth dialogue system which is the Tokyo administrative service guidance system. (See FIG. 6).

処理シナリオ指示情報（１００２）が"第１対話シナリオ"である場合、ステップＳ３の処理において、対話シナリオ実行部（１０１）は、ＲＡＭ（１５）から処理シナリオ情報（１００３）および処理シナリオ指示情報（１００２）を読み込み、各情報が一致するか否かを判定する。この段階では、処理シナリオ情報（１００３）は"第１対話シナリオ"であり、処理シナリオ指示情報（１００２）は"第１対話シナリオ"であるから、各情報は一致する。
そこで、制御部（８０）は、次のステップＳ３０の処理を実行するように制御する。 When the process scenario instruction information (1002) is “first dialog scenario”, in the process of step S3, the dialog scenario execution unit (101) reads the process scenario information (1003) and process scenario instruction information (from the RAM (15)). 1002) is read and it is determined whether or not the pieces of information match. At this stage, the processing scenario information (1003) is the “first dialog scenario” and the processing scenario instruction information (1002) is the “first dialog scenario”, so the information matches.
Therefore, the control unit (80) controls to execute the process of the next step S30.

対話シナリオ実行部（１０１）は、ＲＡＭ（１５）から処理シナリオ情報（１００３）および焦点対話システム設定情報（１０２１）を読み込み、焦点対話システム設定情報（１０２１）が、現在の処理シナリオ情報（１００３）に対応した対話シナリオ等を設定するために必要な対話システムの設定・変更のための情報であるか否かを判定する（ステップＳ３０）。
制御部（８０）は、判定結果が、焦点対話システム設定情報（１０２１）が、現在の処理シナリオ情報（１００３）に対応した対話シナリオ等を設定するために必要な対話システムの設定・変更のための情報である場合にはステップＳ９〜ステップＳ１１の処理を、現在の処理シナリオ情報（１００３）に対応した対話シナリオ等を設定するために必要な対話システムの設定・変更のための情報ではない場合にはステップＳ３１の処理を実行するように制御する。
この段階では、焦点対話システム設定情報（１０２１）は、"第１対話シナリオ"等を設定するために必要な対話システムの設定・変更のための情報であり、現在の処理シナリオ情報（１００３）は"第１対話シナリオ"であるから、判定が成立し、ステップＳ９〜ステップＳ１１の処理が実行される。 The dialogue scenario execution unit (101) reads the processing scenario information (1003) and the focal dialogue system setting information (1021) from the RAM (15), and determines whether the focal dialogue system setting information (1021) is the information for setting or changing the dialogue system that is needed to set up the dialogue scenario and related data corresponding to the current processing scenario information (1003) (step S30).
The control unit (80) performs control so that steps S9 to S11 are executed if the determination finds that the focal dialogue system setting information (1021) is such information for the current processing scenario information (1003), and step S31 is executed if it is not.
At this stage, the focus dialogue system setting information (1021) is information for setting / changing the dialogue system necessary for setting the “first dialogue scenario” and the like, and the current processing scenario information (1003) is Since it is the “first dialog scenario”, the determination is established, and the processing of steps S9 to S11 is executed.

ここで、従前の対話処理において、第１対話システム（１０４１）における対話シナリオ（１０４１ｂ）および応答生成モデル（１０４１ａ）それぞれが、焦点対話シナリオ（１０１１）および焦点応答生成モデル（１０３１）として、ＲＡＭ（１５）の所定の格納領域に格納されているので、続くステップＳ９〜ステップＳ１１の処理は、対話シナリオ（１０４１ｂ）および応答生成モデル（１０４１ａ）に基づいて実行されることに留意しなければならない。 Here, in the conventional dialogue processing, the dialogue scenario (1041b) and the response generation model (1041a) in the first dialogue system (1041) are respectively referred to as the focus dialogue scenario (1011) and the focus response generation model (1031). Since it is stored in the predetermined storage area of 15), it should be noted that the processing of the subsequent steps S9 to S11 is executed based on the dialogue scenario (1041b) and the response generation model (1041a).

処理シナリオ指示情報（１００２）が"第１対話シナリオ"ではない場合として、上記のように処理シナリオ指示情報（１００２）が"第ｎ対話シナリオ"である場合を考える。この場合、ステップＳ３の処理において、対話シナリオ実行部（１０１）は、ＲＡＭ（１５）から処理シナリオ情報（１００３）および処理シナリオ指示情報（１００２）を読み込み、各情報が一致するか否かを判定する。この段階では、処理シナリオ情報（１００３）は"第１対話シナリオ"であり、処理シナリオ指示情報（１００２）は"第ｎ対話シナリオ"であるから、各情報は一致しない。 Assuming that the process scenario instruction information (1002) is not the “first dialog scenario”, consider the case where the process scenario instruction information (1002) is the “nth dialog scenario” as described above. In this case, in the process of step S3, the dialogue scenario execution unit (101) reads the processing scenario information (1003) and the processing scenario instruction information (1002) from the RAM (15), and determines whether or not each information matches. To do. At this stage, the processing scenario information (1003) is the “first dialog scenario” and the processing scenario instruction information (1002) is the “nth dialog scenario”, so the information does not match.

そこで、ステップＳ４の処理において、対話シナリオ実行部（１０１）は、ＲＡＭ（１５）から読み込んだ処理シナリオ指示情報（１００２）が"補助対話シナリオ"であるか否かを判定する。
この段階では、処理シナリオ指示情報（１００２）は"第ｎ対話シナリオ"であるから、ステップＳ５の処理が実行される。 Therefore, in the process of step S4, the dialogue scenario execution unit (101) determines whether or not the processing scenario instruction information (1002) read from the RAM (15) is the "auxiliary dialogue scenario".
At this stage, the processing scenario instruction information (1002) is the "nth dialogue scenario", so the process of step S5 is executed.

ステップＳ５の処理において、対話シナリオ実行部（１０１）は、ＲＡＭ（１５）から読み込んだ処理シナリオ情報（１００３）がＮｕｌｌ値であるか否かを判定する。
この段階では、処理シナリオ情報（１００３）はＮｕｌｌ値ではないから、制御部（８０）の制御の下、ステップＳ２４の処理が実行される。 In the process of step S5, the dialogue scenario execution unit (101) determines whether or not the process scenario information (1003) read from the RAM (15) is a Null value.
At this stage, since the process scenario information (1003) is not a Null value, the process of step S24 is executed under the control of the control unit (80).

対話シナリオ実行部（１０１）は、ＲＡＭ（１５）から、予めＲＡＭ（１５）に読み込まれている補助対話シナリオ（１０１２）および入力変換結果（１００１）を読み込み、応答内容（１０３３）を生成し、この応答内容（１０３３）をＲＡＭ（１５）の所定の格納領域に格納する（ステップＳ２４）。この応答内容（１０３３）は上記と同様にテキスト形式であるとする。具体的な一例として、対話シナリオ実行部（１０１）は、応答内容（１０３３）を、text="これから行政サービス案内のシステムがご案内いたしますがよろしいでしょうか"として出力する（図７参照。）。 The dialogue scenario execution unit (101) reads from the RAM (15) the auxiliary dialogue scenario (1012), which has been loaded into the RAM (15) in advance, and the input conversion result (1001), generates the response content (1033), and stores this response content (1033) in a predetermined storage area of the RAM (15) (step S24). As above, the response content (1033) is assumed to be in text format. As a specific example, the dialogue scenario execution unit (101) outputs the response content (1033) as text="The administrative service guidance system will assist you from here. Is that all right?" (see FIG. 7).

続いて、制御部（８０）の制御の下、応答生成部（１０３）は、ＲＡＭ（１５）から焦点応答生成モデル（１０３１）および応答内容（１０３３）を読み込み、応答情報を生成して出力する（ステップＳ２５）。このステップＳ２５の処理はステップＳ１０の処理と同様であるから説明を略する。
なお、この段階では、焦点応答生成モデル（１０３１）は、第１応答生成モデル（１０４１ａ）であることに留意すること。 Subsequently, under the control of the control unit (80), the response generation unit (103) reads the focus response generation model (1031) and the response content (1033) from the RAM (15), and generates and outputs response information. (Step S25). Since the process of step S25 is the same as the process of step S10, description thereof is omitted.
Note that at this stage, the focus response generation model (1031) is the first response generation model (1041a).

続いて、制御部（８０）は、処理シナリオ指示情報（１００２）と同一内容の情報を処理シナリオ情報（１００３）として、ＲＡＭ（１５）の所定の格納領域に格納する（ステップＳ２６）。このステップＳ２６の処理はステップＳ６の処理と同様である。
この段階で、処理シナリオ情報（１００３）は、"第１対話シナリオ"から処理シナリオ指示情報（１００２）である"第ｎ対話シナリオ"に変更されたことになる。 Subsequently, the control unit (80) stores information having the same contents as the processing scenario instruction information (1002) as processing scenario information (1003) in a predetermined storage area of the RAM (15) (step S26). The process of step S26 is the same as the process of step S6.
At this stage, the processing scenario information (1003) has been changed from the "first dialogue scenario" to the "nth dialogue scenario" given by the processing scenario instruction information (1002).

ステップＳ２５において応答生成部（１０３）によって出力された合成音声信号（応答情報）は、スピーカ（４０）から合成音声として出力される（ステップＳ２７）。このステップＳ２７の処理はステップＳ１１の処理と同様である。
既述のとおり、ステップＳ２５の処理において用いられる焦点応答生成モデル（１０３１）は、従前の第１応答生成モデル（１０４１ａ）のままであるため、スピーカ（４０）からは、女性の声で平均的な声の高さが２００Ｈｚになる程度の、やや早めの口調で通常の大きさの合成音声で「これから行政サービス案内のシステムがご案内いたしますがよろしいでしょうか」と出力される。従って、対話シナリオが補助対話シナリオに変更しても、応答様式が従前の応答様式（この場合は第１対話システムにおける応答様式である。）と同じになるので、ユーザを困惑させるようなことにはならない。 The synthesized speech signal (response information) output by the response generation unit (103) in step S25 is output as synthesized speech from the speaker (40) (step S27). The process in step S27 is the same as the process in step S11.
As already noted, the focus response generation model (1031) used in step S25 is still the previous first response generation model (1041a), so the speaker (40) outputs "The administrative service guidance system will assist you from here. Is that all right?" as synthesized speech in a female voice with an average pitch of about 200 Hz, at a slightly fast speaking rate and normal volume. Therefore, even though the dialogue scenario has changed to the auxiliary dialogue scenario, the response style remains the same as the previous one (here, the response style of the first dialogue system), and the user is not confused.

ユーザが、この合成音声を知覚して、例えば了承の返事である「はい」を発声したとする。この音声はマイクロフォン（３０）によって収音され、上記ステップＳ１およびステップＳ２の処理が実行される。その結果、属性−値ペア形式の入力変換結果（１００１）は、属性である「意図タイプ」の値が「返事」、属性である「主題」の値が「了承」になり、処理シナリオ指示情報（１００２）は"補助対話シナリオ"になる（図８参照。）。 It is assumed that the user perceives this synthesized speech and utters “Yes”, which is a reply of approval, for example. This sound is picked up by the microphone (30), and the processes in steps S1 and S2 are executed. As a result, in the attribute-value pair format input conversion result (1001), the value of the “intention type” attribute is “reply” and the value of the “subject” attribute is “acknowledgement”, and the processing scenario instruction information (1002) becomes an “auxiliary dialogue scenario” (see FIG. 8).

この場合、ステップＳ３の処理において、対話シナリオ実行部（１０１）は、ＲＡＭ（１５）から処理シナリオ情報（１００３）および処理シナリオ指示情報（１００２）を読み込み、各情報が一致するか否かを判定する。この段階では、処理シナリオ情報（１００３）は"第ｎ対話シナリオ"であり、処理シナリオ指示情報（１００２）は"補助対話シナリオ"であるから、各情報は一致しない。 In this case, in the process of step S3, the dialogue scenario execution unit (101) reads the processing scenario information (1003) and the processing scenario instruction information (1002) from the RAM (15) and determines whether they match. At this stage, the processing scenario information (1003) is the "nth dialogue scenario" and the processing scenario instruction information (1002) is the "auxiliary dialogue scenario", so they do not match.

そこで、ステップＳ４の処理において、対話シナリオ実行部（１０１）は、ＲＡＭ（１５）から読み込んだ処理シナリオ指示情報（１００２）が"補助対話シナリオ"であるか否かを判定する。
この段階では、処理シナリオ指示情報（１００２）は"補助対話シナリオ"であるから、制御部（８０）の制御の下、ステップＳ１２の処理が実行される。 Therefore, in the process of step S4, the dialogue scenario execution unit (101) determines whether or not the processing scenario instruction information (1002) read from the RAM (15) is an “auxiliary dialogue scenario”.
At this stage, since the process scenario instruction information (1002) is an “auxiliary dialogue scenario”, the process of step S12 is executed under the control of the control unit (80).

対話シナリオ実行部（１０１）は、予めＲＡＭ（１５）に読み込まれている補助対話シナリオ（１０１２）および入力変換結果（１００１）を読み込み、入力変換結果（１００１）に対する補助対話シナリオ（１０１２）の指令が、焦点対話システム変更指令であるか否かを判定する（ステップＳ１２）。
ここで「指令」とは、補助対話シナリオ（１０１２）に記述されている、実行処理の内容などを指定する命令のことである。また、「焦点対話システム変更指令」とは、焦点対話システムの変更を実行処理する内容の命令のことである。
制御部（８０）は、判定結果が、入力変換結果（１００１）に対する補助対話シナリオ（１０１２）の指令が、焦点対話システム変更指令である場合にはステップＳ１９の処理を、入力変換結果（１００１）に対する補助対話シナリオ（１０１２）の指令が、焦点対話システム変更指令ではない場合にはステップＳ１３の処理を実行するように制御する。ステップＳ１３の処理については後述する。
この段階では、入力変換結果（１００１）は、属性である「意図タイプ」の値は「返事」、属性である「主題」の値は「了承」であり、この入力変換結果（１００１）に対する補助対話シナリオ（１０１２）の指令は焦点対話システム変更指令となっている。そこで、制御部（８０）は、ステップＳ１９の処理を実行するように制御する。 The dialogue scenario execution unit (101) reads the auxiliary dialogue scenario (1012) and the input conversion result (1001) previously read into the RAM (15), and commands the auxiliary dialogue scenario (1012) for the input conversion result (1001). Is a focal dialogue system change command (step S12).
Here, the “command” is a command that specifies the contents of the execution process described in the auxiliary dialogue scenario (1012). The “focused dialogue system change command” is a command having a content for executing a change of the focused dialogue system.
When the command of the auxiliary dialogue scenario (1012) with respect to the input conversion result (1001) is the focal dialogue system change command, the control unit (80) performs the process of step S19 and the input conversion result (1001). When the command of the auxiliary dialogue scenario (1012) is not the focal dialogue system change command, control is performed to execute the process of step S13. The process of step S13 will be described later.
At this stage, the input conversion result (1001) has the value "reply" for the attribute "intention type" and the value "acknowledgement" for the attribute "subject", and the command of the auxiliary dialogue scenario (1012) for this input conversion result (1001) is a focal dialogue system change command. Therefore, the control unit (80) performs control so as to execute the process of step S19.
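The branching across the checks in steps S3, S4, S5, S12 and S30 can be condensed into a small dispatcher. The function name, argument names and the command string below are hypothetical, and the Null branch of step S5 is assumed to lead to the initial setup at step S6, which this passage does not spell out.

```python
AUXILIARY = "auxiliary dialogue scenario"


def next_step(processing, instruction, setting_matches=True, aux_command=None):
    """Return the label of the step reached after the checks S3, S4, S5, S12 and S30.

    processing:      processing scenario information (1003); None models the Null value
    instruction:     processing scenario instruction information (1002)
    setting_matches: outcome of the S30 check on the setting information (1021)
    aux_command:     command the auxiliary scenario yields for the input (S12)
    """
    if processing == instruction:                  # S3: same scenario as before
        return "S9" if setting_matches else "S31"  # S30
    if instruction == AUXILIARY:                   # S4: input belongs to the auxiliary scenario
        return "S19" if aux_command == "change focal system" else "S13"  # S12
    if processing is None:                         # S5: no scenario selected yet
        return "S6"   # assumption: initial setup path, not described in this passage
    return "S24"      # announce the switch through the auxiliary scenario


steps = [
    next_step("first dialogue scenario", "first dialogue scenario"),
    next_step("first dialogue scenario", "nth dialogue scenario"),
    next_step("nth dialogue scenario", AUXILIARY, aux_command="change focal system"),
]
```

The three calls mirror the walkthrough: a follow-up question in the same scenario, a question that belongs to another system, and the user's "yes" that confirms the switch.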

対話シナリオ実行部（１０１）は、処理シナリオ情報（１００３）から焦点対話システム設定情報（１０２１）を生成して、ＲＡＭ（１５）の所定の格納領域に格納する（ステップＳ１９）。このステップＳ１９の処理はステップＳ７の処理と同様である。 The dialogue scenario execution unit (101) generates the focal dialogue system setting information (1021) from the processing scenario information (1003) and stores it in a predetermined storage area of the RAM (15) (step S19). The process in step S19 is the same as the process in step S7.

続いて、制御部（８０）の制御の下、焦点対話システム設定部（１０２）は、ＲＡＭ（１５）から焦点対話システム設定情報（１０２１）を読み込む。そして、焦点対話システム設定部（１０２）は、焦点対話システム設定情報（１０２１）を解釈し、処理シナリオ情報（１００３）で指示される対話シナリオに対応する対話システムを選択する。さらに、焦点対話システム設定部（１０２）は、ネットワーク（１）を介して、この選択した対話システムの記憶手段から、対話シナリオおよび応答生成モデルをそれぞれ読み込み、焦点対話シナリオ（１０１１）および焦点応答生成モデル（１０３１）として、ＲＡＭ（１５）の所定の格納領域に格納する（ステップＳ２０）。このステップＳ２０の処理はステップＳ８の処理と同様である。
この段階では、処理シナリオ情報（１００３）は"第ｎ対話シナリオ"であるから、焦点対話システム設定部（１０２）は、ネットワーク（１）を介して、第ｎ対話システム（１０４ｎ）の記憶手段から第ｎ対話シナリオ（１０４ｎｂ）および第ｎ応答生成モデル（１０４ｎａ）をそれぞれ読み込み、焦点対話シナリオ（１０１１）および焦点応答生成モデル（１０３１）として、ＲＡＭ（１５）の所定の格納領域に格納する。
なお、第ｎ応答生成モデル（１０４ｎａ）は、例えば、voicetype="male"として話者を男声、pitch="95Hz"として平均的な声の高さを９５Ｈｚ、speed="slow"として口調の速さを遅め、power="normal"として通常の声の大きさを指定するものとなっている（図９参照。）。 Subsequently, under the control of the control unit (80), the focal dialog system setting unit (102) reads the focal dialog system setting information (1021) from the RAM (15). Then, the focal dialogue system setting unit (102) interprets the focal dialogue system setting information (1021) and selects a dialogue system corresponding to the dialogue scenario indicated by the processing scenario information (1003). Further, the focal dialogue system setting unit (102) reads the dialogue scenario and the response generation model from the storage unit of the selected dialogue system via the network (1), and generates the focal dialogue scenario (1011) and the focal response generation. The model (1031) is stored in a predetermined storage area of the RAM (15) (step S20). The process in step S20 is the same as the process in step S8.
At this stage, the processing scenario information (1003) is the "nth dialogue scenario", so the focal dialogue system setting unit (102) reads the nth dialogue scenario (104nb) and the nth response generation model (104na) from the storage means of the nth dialogue system (104n) via the network (1), and stores them as the focal dialogue scenario (1011) and the focus response generation model (1031) in a predetermined storage area of the RAM (15).
The nth response generation model (104na) specifies, for example, a male speaker with voicetype="male", an average pitch of 95 Hz with pitch="95Hz", a slow speaking rate with speed="slow", and normal loudness with power="normal" (see FIG. 9).

続いて、制御部（８０）の制御の下、対話シナリオ実行部（１０１）は、ＲＡＭ（１５）から焦点対話シナリオ（１０１１）を読み込み、対話システムの初期メッセージである応答内容（１０３３）を生成し、この応答内容（１０３３）をＲＡＭ（１５）の所定の格納領域に格納する（ステップＳ２１）。このステップＳ２１の処理はステップＳ９の処理と同様である。具体的な一例として、対話シナリオ実行部（１０１）は、応答内容（１０３３）を、text="これから行政サービス案内のシステムでご案内いたします"として出力する（図１０参照。）。 Subsequently, under the control of the control unit (80), the dialogue scenario execution unit (101) reads the focused dialogue scenario (1011) from the RAM (15) and generates a response content (1033) that is an initial message of the dialogue system. The response content (1033) is stored in a predetermined storage area of the RAM (15) (step S21). The process in step S21 is the same as the process in step S9. As a specific example, the dialogue scenario execution unit (101) outputs the response content (1033) as text = "I will guide you through the administrative service guidance system from now on" (see FIG. 10).

続いて、制御部（８０）の制御の下、応答生成部（１０３）は、ＲＡＭ（１５）から焦点応答生成モデル（１０３１）および応答内容（１０３３）を読み込み、応答情報を生成して出力する（ステップＳ２２）。このステップＳ２２の処理はステップＳ１０の処理と同様である。
なお、この段階では、焦点応答生成モデル（１０３１）は、第ｎ応答生成モデル（１０４ｎａ）であることに留意すること。 Subsequently, under the control of the control unit (80), the response generation unit (103) reads the focus response generation model (1031) and the response content (1033) from the RAM (15), and generates and outputs response information. (Step S22). The process in step S22 is the same as the process in step S10.
It should be noted that at this stage, the focus response generation model (1031) is the nth response generation model (104na).

ステップＳ２２において応答生成部（１０３）によって出力された合成音声信号（応答情報）は、スピーカ（４０）から合成音声として出力される（ステップＳ２３）。このステップＳ２３の処理はステップＳ１１の処理と同様である。既述のとおり、ステップＳ２２の処理において用いられる焦点応答生成モデル（１０３１）は、第ｎ応答生成モデル（１０４ｎａ）であるため、スピーカ（４０）からは、男性の声で平均的な声の高さが９５Ｈｚになる程度の、やや遅めの口調で通常の大きさの合成音声で「これから行政サービス案内のシステムでご案内いたします」と出力される。 The synthesized speech signal (response information) output by the response generation unit (103) in step S22 is output as synthesized speech from the speaker (40) (step S23). The process in step S23 is the same as the process in step S11. As already noted, the focus response generation model (1031) used in step S22 is the nth response generation model (104na), so the speaker (40) outputs "I will guide you through the administrative service guidance system from now on" as synthesized speech in a male voice with an average pitch of about 95 Hz, at a slightly slow speaking rate and normal volume.

このように、対話シナリオは、第１対話シナリオ（１０４１ｂ）→補助対話シナリオ（１０１２）→第ｎ対話シナリオ（１０４ｎｂ）と遷移したにも係わらず、応答様式は、第１対話システムにおける応答様式→第ｎ対話システムにおける応答様式と遷移したことになる。つまり、補助対話シナリオに対応する応答様式の応答が介入しないので、ユーザは、第１対話システムから第ｎ対話システムに移行したと受け止めることとなり、ユーザに無用な混乱・当惑などを生じせしめない。 Thus, although the dialogue scenario transitioned from the first dialogue scenario (1041b) to the auxiliary dialogue scenario (1012) and then to the nth dialogue scenario (104nb), the response style transitioned directly from that of the first dialogue system to that of the nth dialogue system. In other words, because no response in a style belonging to the auxiliary dialogue scenario intervenes, the user perceives a move from the first dialogue system to the nth dialogue system, and no needless confusion or bewilderment is caused.
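The continuity of the response style across the three scenarios can be reproduced with a toy model in Python. The class `DialogueDevice`, its method names, and the reduction of a response generation model to a single `voicetype` field are assumptions made for illustration; scenarios are modeled as plain functions that return reply text.

```python
class DialogueDevice:
    """Toy device: the auxiliary scenario has no response model of its own."""

    def __init__(self, systems, auxiliary_scenario):
        self.systems = systems               # name -> (scenario function, response model)
        self.auxiliary = auxiliary_scenario  # held locally by the device
        self.focal_scenario = None
        self.focal_model = None

    def set_focal(self, name):
        """Load a system's scenario and model as the focal pair."""
        self.focal_scenario, self.focal_model = self.systems[name]

    def respond(self, user_input, use_auxiliary=False):
        scenario = self.auxiliary if use_auxiliary else self.focal_scenario
        text = scenario(user_input)
        # The focal model styles every reply, including auxiliary ones.
        return {"text": text, "voicetype": self.focal_model["voicetype"]}


systems = {
    "first": (lambda _: "The nearest station is Roppongi.", {"voicetype": "female"}),
    "nth": (lambda _: "Administrative service guidance starts now.", {"voicetype": "male"}),
}
device = DialogueDevice(systems, lambda _: "Switch to administrative service guidance?")

device.set_focal("first")
voices = [device.respond("nearest station?")["voicetype"]]
voices.append(device.respond("ward office?", use_auxiliary=True)["voicetype"])
device.set_focal("nth")
voices.append(device.respond("yes")["voicetype"])
```

The recorded voices change only once, when the focal system itself changes; the auxiliary reply in between borrows the voice of the system that was focal at the time.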

＜補足説明１＞
さて、次に、第１実施形態における対話処理の補足説明をする。この補足説明は、処理シナリオ情報（１００３）がＮｕｌｌ値の状態において、入力音声の処理シナリオ指示情報（１００２）が"補助対話シナリオ"であった場合、あるいは、ステップＳ２７の処理の後、ステップＳ１２の処理において、入力変換結果（１００１）に対する補助対話シナリオ（１０１２）の指令が、焦点対話システム変更指令ではない場合などにおいて、対話処理が破綻してしまうことを防止するための処理についてのものである。
ここでは、処理シナリオ情報（１００３）がＮｕｌｌ値の状態において、入力音声の処理シナリオ指示情報（１００２）が"補助対話シナリオ"であった場合を例として、補足説明する。 <Supplementary explanation 1>
Next, a supplementary explanation of the dialogue processing in the first embodiment is given. It concerns processing that prevents the dialogue from breaking down in cases such as the following: the processing scenario instruction information (1002) for the input speech is the "auxiliary dialogue scenario" while the processing scenario information (1003) is a Null value; or, after the process of step S27, the command of the auxiliary dialogue scenario (1012) for the input conversion result (1001) in the process of step S12 is not a focal dialogue system change command.
Here, the case where the processing scenario instruction information (1002) for the input speech is the "auxiliary dialogue scenario" while the processing scenario information (1003) is a Null value is used as the example.

Suppose the user utters some words in order to start dialogue processing with the dialogue device (A). The utterance (speech) is picked up by the microphone (30), and the processing of steps S1 and S2 described above is executed. For example, when the user says "Hello", the input conversion result (1001) in attribute-value pair format has the value "greeting" for the attribute "intention type" and the value "unknown" for the attribute "subject", and the processing scenario instruction information (1002) becomes "auxiliary dialogue scenario" (see FIG. 11).
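As a rough illustration, the attribute-value pair format can be modeled as a plain mapping. The following Python sketch is not taken from the specification: the class name `InputResult`, its field names, and the English attribute keys are assumptions chosen for readability, and the values simply mirror the "Hello" example above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InputResult:
    """Hypothetical container for what steps S1 and S2 produce."""
    # Input conversion result (1001) as attribute-value pairs.
    conversion: Dict[str, str] = field(default_factory=dict)
    # Processing scenario instruction information (1002).
    scenario_instruction: str = ""


# The greeting example: intention type "greeting", subject "unknown".
hello = InputResult(
    conversion={"intention type": "greeting", "subject": "unknown"},
    scenario_instruction="auxiliary dialogue scenario",
)
```

Keeping the two outputs in one record reflects that the text stores both in the RAM (15) after step S2; how they are actually laid out in memory is not specified.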

In this case, in the processing of step S3, the dialogue scenario execution unit (101) reads the processing scenario information (1003) and the processing scenario instruction information (1002) from the RAM (15) and determines whether the two match. At this stage the processing scenario information (1003) is a Null value and the processing scenario instruction information (1002) is "auxiliary dialogue scenario", so the two do not match.

In the subsequent processing of step S12, the dialogue scenario execution unit (101) reads the auxiliary dialogue scenario (1012), which has been loaded into the RAM (15) in advance, and the input conversion result (1001), and determines whether the command of the auxiliary dialogue scenario (1012) for the input conversion result (1001) is a focus dialogue system change command.
At this stage the dialogue processing has only just started, so the command of the auxiliary dialogue scenario (1012) for the input conversion result (1001) is normally not a focus dialogue system change command. Accordingly, the processing of step S13 is executed under the control of the control unit (80).

The dialogue scenario execution unit (101) reads from the RAM (15) the auxiliary dialogue scenario (1012) loaded there in advance, generates response content (1033) that is the initial message of the dialogue system, and stores this response content (1033) in a predetermined storage area of the RAM (15) (step S13). As above, the response content (1033) is assumed to be in text format. As a specific example, the dialogue scenario execution unit (101) outputs the response content (1033) as text="What would you like to know?". The processing of step S13 is the same as that of step S24.

Next, under the control of the control unit (80), the dialogue scenario execution unit (101) determines whether the processing scenario information (1003) read from the RAM (15) is a Null value (step S14).
The control unit (80) performs control so that the processing of step S17 is executed when the processing scenario information (1003) is a Null value, and the processing of step S15 is executed when it is not.
At this stage the processing scenario information (1003) is a Null value, so the processing of step S17 is executed.

Under the control of the control unit (80), the response generation unit (103) reads from the RAM (15) the auxiliary response generation model (1032), loaded there in advance, and the response content (1033) obtained in step S13, and generates and outputs response information (step S17). The auxiliary response generation model (1032) is used here because, at this stage, no focus response generation model (1031) has been loaded into the RAM (15).

The synthesized speech signal (response information) output by the response generation unit (103) in step S17 is output from the speaker (40) as synthesized speech (step S18). The processing of step S18 is the same as that of step S11.

If, in the processing of step S14, the processing scenario information (1003) is not a Null value, the processing of step S15 is executed. One such case, in the example above, is where the user gives a reply other than "Yes" after the processing of step S27 and the processing scenario information has become "auxiliary scenario information".

In this case some focus response generation model (1031) has already been loaded into the RAM (15). Therefore, under the control of the control unit (80), the response generation unit (103) reads the focus response generation model (1031) and the response content (1033) obtained in step S13 from the RAM (15), and generates and outputs response information (step S15). The processing of step S15 is the same as that of step S10 and the like.

The synthesized speech signal (response information) output by the response generation unit (103) in step S15 is output from the speaker (40) as synthesized speech (step S16). The processing of step S16 is the same as that of step S11 and the like.

In this way, performing the processing of steps S13, S14, S15, S16, S17, and S18 prevents the dialogue processing from breaking down. In particular, when the processing of steps S15 and S16 is performed, the response style is the same as the preceding response style, so the user is not confused.
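The branch at step S14 can be sketched as a small selection function. This is only an illustrative reading of the steps above, not the claimed implementation: the function name, the use of `None` to stand for the Null value, and the representation of models as strings are all assumptions.

```python
from typing import Optional


def select_response_model(processing_scenario: Optional[str],
                          focus_model: Optional[str],
                          auxiliary_model: str) -> str:
    """Pick the model that renders the response content (1033).

    A Null processing scenario (None here) means no focus response
    generation model has been loaded yet, so the auxiliary model is
    used (step S17). Otherwise the current focus model is used
    (step S15), which keeps the response style unchanged.
    """
    if processing_scenario is None:
        return auxiliary_model
    if focus_model is None:
        # Defensive fallback: an assumption, not part of the described flow.
        return auxiliary_model
    return focus_model
```

For the "Hello" example, `select_response_model(None, None, "auxiliary model")` selects the auxiliary model; once a focus model is loaded, the same auxiliary response content is rendered with that focus model instead.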

<Supplementary explanation 2>
In the example above, consider the case where the user does not reply "Yes" after the processing of step S27. Suppose, for example, that after step S27 the user repeats the same words, "Please tell me the ward office near Roppongi Station". In this case the processing scenario instruction information (1002) becomes "n-th dialogue scenario"; because the processing scenario information (1003) was changed to "n-th dialogue scenario" in the processing of step S26, the determination processing of step S3 finds that the processing scenario instruction information (1002) and the processing scenario information (1003) match. The determination processing of step S30 is then executed. At this stage, however, the focus dialogue system setting information (1021) is the information for setting or changing the dialogue system needed to set the "first dialogue scenario" and the like, whereas the current processing scenario information (1003) is "n-th dialogue scenario". The determination therefore fails, and the control unit (80) performs control so that the processing of step S31 is executed.

The dialogue scenario execution unit (101) generates the focus dialogue system setting information (1021) from the current processing scenario information (1003) and stores it in a predetermined storage area of the RAM (15) (step S31). The processing of step S31 is the same as that of step S19.
That is, at this stage the focus dialogue system setting information (1021) becomes the information for setting or changing the dialogue system needed to set the "n-th dialogue scenario" and the like.

Next, under the control of the control unit (80), the focus dialogue system setting unit (102) reads the focus dialogue system setting information (1021) from the RAM (15). The focus dialogue system setting unit (102) then interprets the focus dialogue system setting information (1021) and selects the dialogue system corresponding to the dialogue scenario indicated by the processing scenario information (1003). Further, the focus dialogue system setting unit (102) reads the dialogue scenario and the response generation model from the storage means of the selected dialogue system via the network (1) and stores them in a predetermined storage area of the RAM (15) as the focus dialogue scenario (1011) and the focus response generation model (1031), respectively (step S32). The processing of step S32 is the same as that of step S20.
At this stage the processing scenario information (1003) is "n-th dialogue scenario", so the focus dialogue system setting unit (102) reads the n-th dialogue scenario (104nb) and the n-th response generation model (104na) from the storage means of the n-th dialogue system (104n) via the network (1) and stores them in a predetermined storage area of the RAM (15) as the focus dialogue scenario (1011) and the focus response generation model (1031), respectively.

Next, under the control of the control unit (80), the dialogue scenario execution unit (101) reads the focus dialogue scenario (1011) from the RAM (15), generates response content (1033) that is the initial message of the dialogue system, and stores this response content (1033) in a predetermined storage area of the RAM (15) (step S33). The processing of step S33 is the same as that of step S21.

Next, under the control of the control unit (80), the response generation unit (103) reads the focus response generation model (1031) and the response content (1033) from the RAM (15), and generates and outputs response information (step S34). The processing of step S34 is the same as that of step S22.

The synthesized speech signal (response information) output by the response generation unit (103) in step S34 is output from the speaker (40) as synthesized speech (step S35). The processing of step S35 is the same as that of step S23.

In this way, performing the processing of steps S30, S31, S32, S33, S34, and S35 prevents the previous dialogue system from continuing to be used when the same processing scenario instruction information is obtained again even though the dialogue system in use has not yet been switched.
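The consistency check of steps S30 to S32 can be sketched as follows. This is a minimal sketch under stated assumptions: `FocusState`, `ensure_focus`, and the `load` callback (which stands in for fetching a scenario and a response generation model over the network (1)) are hypothetical names, and the comparison in step S30 is modeled as simple equality between the setting information (1021) and the processing scenario information (1003).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class FocusState:
    setting: Optional[str] = None   # focus dialogue system setting information (1021)
    scenario: Optional[str] = None  # focus dialogue scenario (1011)
    model: Optional[str] = None     # focus response generation model (1031)


def ensure_focus(state: FocusState,
                 processing_scenario: str,
                 load: Callable[[str], Tuple[str, str]]) -> bool:
    """Reload the focus scenario and model if they are stale.

    Returns True when a reload was performed.
    """
    # Step S30: does the setting information already match (1003)?
    if state.setting == processing_scenario:
        return False
    # Step S31: regenerate the setting information from (1003).
    state.setting = processing_scenario
    # Step S32: fetch the scenario and model of the selected system.
    state.scenario, state.model = load(processing_scenario)
    return True
```

In the repeated-question example, the state still describes the first dialogue system while the processing scenario information is "n-th dialogue scenario", so the check fails once, the n-th scenario and model are loaded, and later calls with the same value do nothing.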

The first embodiment described so far merely illustrates one embodiment for explaining how the technique of the present invention is applied; the technique is also applicable to other embodiments of dialogue methods and devices.

<< Second Embodiment >>
The second embodiment of the present invention is described below.
As already stated, the second embodiment is the case where the technique of the present invention is applied to the dialogue system of Reference 1 above.
The second embodiment is described below with reference to FIGS. 16 and 17; parts corresponding to those in the first embodiment and its figures are given the same reference numerals, and duplicate description is omitted. The hardware configuration example of the dialogue device (B) and the example network configuration with the dialogue systems in the second embodiment are the same as in the first embodiment, so their description is omitted. Like the first embodiment, the second embodiment assumes dialogue processing by speech.

In the first embodiment described above, the dialogue system to be used is determined for each user input, and dialogue processing is performed in accordance with the dialogue system so determined.
In the second embodiment, by contrast, the input processing unit (100B), which corresponds to the input processing unit (100) of the first embodiment, has the function of determining whether the user's input relates to the focus dialogue scenario, which corresponds to the focus dialogue system, or to the auxiliary dialogue scenario, which handles the dialogue that arises when the dialogue system is switched.
The main difference between the two is this: the input processing unit (100) of the first embodiment specifies, for example, "first dialogue scenario" or "n-th dialogue scenario" as the processing scenario instruction information, whereas the input processing unit (100B) of the second embodiment specifies, for example, "focus dialogue scenario" or "auxiliary dialogue scenario".
For the dialogue system on which the second embodiment is premised, refer to Reference 1 above; its detailed description is omitted here.

For convenience of explanation, the current focus dialogue system is assumed to be the first dialogue system (1041). Seen another way, this is equivalent to taking the first dialogue system (1041) as the dialogue system in the initial state.

For example, as in the first embodiment, suppose that the current focus dialogue system is the first dialogue system (1041) and the user says "Please tell me the ward office near Roppongi Station". This utterance is picked up by the microphone (30) (step S1).

Next, the input processing unit (100B) determines that, for this user input, it is appropriate to switch the dialogue system from the first dialogue system (1041), which is the focus dialogue system, to the n-th dialogue system (104n), which is a Tokyo administrative service guidance system, and specifies "auxiliary dialogue scenario" as the processing scenario instruction information (1002) (step S2B). The input processing unit (100B) outputs the input conversion result (1001) together with the processing scenario instruction information (1002), and both are stored in a predetermined storage area of the RAM (15).

Next, under the control of the control unit (80), the dialogue scenario execution unit (101) reads the processing scenario instruction information (1002) from the RAM (15) and determines whether the processing scenario instruction information (1002) is "auxiliary dialogue scenario" (step S3B).
The control unit (80) performs control so that the processing of steps S9 to S11 is executed when the processing scenario instruction information (1002) is not "auxiliary dialogue scenario" (that is, when it is "focus dialogue scenario"), and the processing of step S12 is executed when it is "auxiliary dialogue scenario".
At this stage the processing scenario instruction information (1002) is "auxiliary dialogue scenario", so the processing of step S12 is executed under the control of the control unit (80).

The dialogue scenario execution unit (101) reads the auxiliary dialogue scenario (1012), which has been loaded into the RAM (15) in advance, and the input conversion result (1001), and determines whether the command of the auxiliary dialogue scenario (1012) for the input conversion result (1001) is a focus dialogue system change command (step S12).
The control unit (80) performs control so that the processing of step S19B is executed when the command of the auxiliary dialogue scenario (1012) for the input conversion result (1001) is a focus dialogue system change command, and the processing of step S24 is executed when it is not.
At this stage the input conversion result (1001) in attribute-value pair format has the value "question" for the attribute "intention type", the value "ward office" for the attribute "subject", and the value "Roppongi" for the attribute "area" (see FIG. 6). The command of the auxiliary dialogue scenario (1012) for this input conversion result (1001) is therefore not a focus dialogue system change command, and the control unit (80) performs control so that the processing of step S24 is executed.

The dialogue scenario execution unit (101) reads from the RAM (15) the auxiliary dialogue scenario (1012), loaded there in advance, and the input conversion result (1001), generates response content (1033), and stores this response content (1033) in a predetermined storage area of the RAM (15) (step S24). As in the first embodiment, the response content (1033) is assumed to be in text format. As a specific example, the dialogue scenario execution unit (101) outputs the response content (1033) as text="The administrative service guidance system will now assist you. Is that all right?" (see FIG. 7).

Next, under the control of the control unit (80), the response generation unit (103) reads the focus response generation model (1031) and the response content (1033) from the RAM (15), and generates and outputs response information (a synthesized speech signal) (step S10).
Note that at this stage the focus response generation model (1031) is the first response generation model (1041a).

The synthesized speech signal (response information) output by the response generation unit (103) in step S10 is output from the speaker (40) as synthesized speech (step S11).

Because the focus response generation model (1031) used in the processing of step S10 is still the previous first response generation model (1041a), the speaker (40) outputs "The administrative service guidance system will now assist you. Is that all right?" in a synthesized female voice with an average pitch of about 200 Hz, a slightly fast speaking rate, and normal volume. Thus, even though the dialogue scenario has changed to the auxiliary dialogue scenario, the response style remains the same as the preceding response style (here, the response style of the first dialogue system), so the user is not confused.

Suppose the user hears this synthesized speech and says, for example, "Yes" as a reply of acceptance. This speech is picked up by the microphone (30), and the processing of steps S1 and S2B described above is executed. As a result, the input conversion result (1001) in attribute-value pair format has the value "reply" for the attribute "intention type" and the value "accepted" for the attribute "subject". The processing scenario instruction information (1002) becomes "auxiliary dialogue scenario", on the basis of the dialogue history up to that point (see Reference 1) and the like, so that the dialogue processing involved in switching the dialogue system is handled (see FIG. 8).

Next, in the processing of step S3B, the dialogue scenario execution unit (101) reads the processing scenario instruction information (1002) from the RAM (15) and determines whether it is "auxiliary dialogue scenario". At this stage the processing scenario instruction information (1002) is "auxiliary dialogue scenario", so the processing of step S12 is executed under the control of the control unit (80).

Next, in the processing of step S12, the dialogue scenario execution unit (101) reads the auxiliary dialogue scenario (1012), which has been loaded into the RAM (15) in advance, and the input conversion result (1001), and determines whether the command of the auxiliary dialogue scenario (1012) for the input conversion result (1001) is a focus dialogue system change command.
At this stage the input conversion result (1001) in attribute-value pair format has the value "reply" for the attribute "intention type" and the value "accepted" for the attribute "subject". The command of the auxiliary dialogue scenario (1012) for this input conversion result (1001) is therefore a focus dialogue system change command, and the control unit (80) performs control so that the processing of step S19B is executed.

The dialogue scenario execution unit (101) generates the focus dialogue system setting information (1021) from the focus dialogue system change command and stores it in a predetermined storage area of the RAM (15) (step S19B).

Next, under the control of the control unit (80), the focus dialogue system setting unit (102) reads the focus dialogue system setting information (1021) from the RAM (15). The focus dialogue system setting unit (102) then interprets the focus dialogue system setting information (1021) and selects the dialogue system indicated by the focus dialogue system change command. Further, the focus dialogue system setting unit (102) reads the dialogue scenario and the response generation model from the storage means of the selected dialogue system via the network (1) and stores them in a predetermined storage area of the RAM (15) as the focus dialogue scenario (1011) and the focus response generation model (1031), respectively (step S20B).
At this stage the focus dialogue system change command designates the n-th dialogue system (104n), so the focus dialogue system setting unit (102) reads the n-th dialogue scenario (104nb) and the n-th response generation model (104na) of the n-th dialogue system (104n) and stores them in a predetermined storage area of the RAM (15) as the focus dialogue scenario (1011) and the focus response generation model (1031), respectively.

Next, under the control of the control unit (80), the dialogue scenario execution unit (101) reads the focus dialogue scenario (1011) from the RAM (15) (and, if necessary, the input conversion result (1001) as well), generates response content (1033), and stores this response content (1033) in a predetermined storage area of the RAM (15) (step S9). As a specific example, the dialogue scenario execution unit (101) outputs the response content (1033) as text="The administrative service guidance system will now assist you" (see FIG. 10).

Next, under the control of the control unit (80), the response generation unit (103) reads the focus response generation model (1031) and the response content (1033) from the RAM (15), and generates and outputs response information (a synthesized speech signal) (step S10).
Note that at this stage the focus response generation model (1031) is the n-th response generation model (104na).

The synthesized speech signal (response information) output by the response generation unit (103) in step S10 is output from the speaker (40) as synthesized speech (step S11). As already noted, the focus response generation model (1031) used in the processing of step S10 is the n-th response generation model (104na), so the speaker (40) outputs "The administrative service guidance system will now assist you" in a synthesized male voice with an average pitch of about 95 Hz, a slightly slow speaking rate, and normal volume.

Thus, although the dialogue scenario has transitioned from the first dialogue scenario (1041b) through the auxiliary dialogue scenario (1012) to the n-th dialogue scenario (104nb), the response style has transitioned directly from the response style of the first dialogue system to the response style of the n-th dialogue system. In other words, in the second embodiment as well, no response in a response style corresponding to the auxiliary dialogue scenario intervenes, so the user perceives the dialogue as having moved from the first dialogue system to the n-th dialogue system, and is spared unnecessary confusion or bewilderment.
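The two-turn exchange of the second embodiment can be traced with a toy model. The sketch below is an assumption-laden simplification, not the patented implementation: `DialogueSystem`, `DialogueDevice`, the `pending` field, and the `target` argument are hypothetical, a response generation model is reduced to a voice label, and the auxiliary dialogue scenario is reduced to "ask for confirmation, then switch on acceptance".

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

AUX = "auxiliary dialogue scenario"


@dataclass
class DialogueSystem:
    name: str
    voice: str     # stands in for the response generation model
    greeting: str  # initial message produced by the dialogue scenario


class DialogueDevice:
    def __init__(self, systems: Dict[str, DialogueSystem], initial: str):
        self.systems = systems
        self.focus = systems[initial]        # current focus dialogue system
        self.pending: Optional[str] = None   # switch awaiting confirmation

    def turn(self, instruction: str, conversion: Dict[str, str],
             target: Optional[str] = None) -> Tuple[str, str]:
        """Handle one input; return (voice used, response text)."""
        if instruction != AUX:                           # step S3B
            text = "answer from " + self.focus.name      # step S9 (placeholder)
        elif (conversion.get("subject") == "accepted"
              and self.pending is not None):             # step S12: change command
            self.focus = self.systems[self.pending]      # steps S19B, S20B
            self.pending = None
            text = self.focus.greeting                   # step S9
        else:
            self.pending = target                        # remember the proposal
            text = "May the other system assist you?"    # step S24
        # Step S10: the focus model renders every response,
        # including those generated by the auxiliary scenario.
        return (self.focus.voice, text)
```

Run on the Roppongi example, the confirmation question is spoken in the first system's voice and the following announcement in the n-th system's voice; at no point is a third, auxiliary voice heard, which is the continuity property described above.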

The dialogue device and method of the present invention are not limited to the embodiments described above and may be modified as appropriate without departing from the gist of the present invention. Moreover, the processing described in the above embodiments need not be executed only in time series in the order described; it may also be executed in parallel or individually, depending on the processing capability of the device that executes the processing or as needed.

When the processing functions of the dialogue device described in the above embodiments are implemented by a computer, the processing content of the functions that the dialogue device should have is described by a program. Executing this program on the computer then implements the processing functions of the dialogue device on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, a hard disk device, a flexible disk, magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable)/RW (ReWritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEPROM (Electrically Erasable Programmable Read-Only Memory) or the like as the semiconductor memory.

The program is distributed by, for example, selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer temporarily in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it. The computer may also execute processing according to the received program each time the program is transferred to it from the server computer. Alternatively, the processing described above may be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are implemented solely through execution instructions and the retrieval of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (for example, data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the dialogue device is configured by executing a predetermined program on a computer, but at least part of the processing content may instead be implemented in hardware.

The present invention is useful for multi-dialogue systems that combine a plurality of dialogue systems to handle a wider range of topics.

Block diagram illustrating the hardware configuration of the dialogue device (A) according to the first embodiment.
Diagram showing an example of the functional configuration of the dialogue device (A).
Diagram showing an example of an input conversion result and processing scenario instruction information.
Diagram showing an example of response content.
Diagram showing an example of a response generation model.
Diagram showing an example of an input conversion result and processing scenario instruction information.
Diagram showing an example of response content.
Diagram showing an example of an input conversion result and processing scenario instruction information.
Diagram showing an example of a response generation model.
Diagram showing an example of response content.
Diagram showing an example of an input conversion result and processing scenario instruction information.
Diagram showing the processing flow in the dialogue device (A) (part 1).
Diagram showing the processing flow in the dialogue device (A) (part 2).
Diagram showing the processing flow in the dialogue device (A) (part 3).
Diagram showing the processing flow in the dialogue device (A) (part 4).
Diagram showing an example of the functional configuration of the dialogue device (B) according to the second embodiment.
Diagram showing the processing flow in the dialogue device (B).
Diagram showing the overall configuration of the dialogue device and the dialogue systems.

Explanation of symbols

1 Network
30 Microphone
40 Speaker
100 Input processing unit
101 Dialogue scenario execution unit
102 Focus dialogue system setting unit
103 Response generation unit
1001 Input conversion result
1002 Processing scenario instruction information
1003 Processing scenario information
1011 Focus dialogue scenario
1012 Auxiliary dialogue scenario
1021 Focus dialogue system setting information
1031 Focus response generation model
1032 Auxiliary response generation model
1033 Response content
1041 First dialogue system
1041a First response generation model
1041b First dialogue scenario
104n n-th dialogue system
104na n-th response generation model
104nb n-th dialogue scenario

Claims

A dialogue method in a dialogue device that is capable of communicating with a plurality of dialogue systems, each of which can execute dialogue processing by storing at least a dialogue scenario and a response generation model, and in which an auxiliary dialogue scenario is stored, the method comprising:
an input processing step in which input processing means of the dialogue device generates, from a user's dialogue input, an input conversion result obtained by converting the dialogue input into a format that can be processed, and instruction information indicating the dialogue scenario or auxiliary dialogue scenario with which the dialogue processing is to be executed, and outputs these;
a focus dialogue system setting step in which focus dialogue system setting means receives, from the dialogue system having the dialogue scenario indicated by the instruction information, that dialogue scenario and its response generation model, and sets them as a focus dialogue model and a focus response generation model, respectively, for executing the dialogue processing;
a dialogue scenario execution step in which dialogue scenario execution means generates response content for the input conversion result using the focus dialogue scenario or the auxiliary dialogue scenario, and outputs the response content; and
a response generation step in which response generation means generates, from the response content, a dialogue output to be presented to the user using the focus response generation model, and outputs the dialogue output.

A dialogue device capable of communicating with a plurality of dialogue systems, each of which can execute dialogue processing by storing at least a dialogue scenario and a response generation model, the dialogue device comprising:
storage means for storing an auxiliary dialogue scenario capable of executing dialogue processing at least during dialogue system switching;
input processing means for generating, from a user's dialogue input, an input conversion result obtained by converting the dialogue input into a format that can be processed, and instruction information indicating the dialogue scenario or auxiliary dialogue scenario with which the dialogue processing is to be executed, and for outputting these;
focus dialogue system setting means for receiving, from the dialogue system having the dialogue scenario indicated by the instruction information, that dialogue scenario and its response generation model, and for setting them as a focus dialogue model and a focus response generation model, respectively, for executing the dialogue processing;
dialogue scenario execution means for generating response content for the input conversion result using the focus dialogue scenario or the auxiliary dialogue scenario, and for outputting the response content; and
response generation means for generating, from the response content, a dialogue output to be presented to the user using the focus response generation model, and for outputting the dialogue output.

A dialogue program for causing a computer to function as the dialogue device according to claim 2.

A computer-readable recording medium on which the dialogue program according to claim 3 is recorded.