JP2021110768A - Information processing device, information processing system, information processing method, and program

Info

Publication number: JP2021110768A
Application number: JP2020000682A
Authority: JP
Inventors: Kana Nishikawa (西川 加奈)
Original assignee: Sony Group Corp
Current assignee: Sony Group Corp
Priority date: 2020-01-07
Filing date: 2020-01-07
Publication date: 2021-08-02
Also published as: WO2021140816A1

Abstract

To provide a device and a method that allow processing according to a user's utterance to be executed on a device other than the device to which the user's utterance was input.

SOLUTION: When a data processing unit causes an external second information processing device to execute processing corresponding to a user's utterance, the data processing unit analyzes the user's utterance to generate user utterance interpretation data, converts the generated user utterance interpretation data into conversion data that the second information processing device can understand, and transmits the conversion data to the second information processing device. The user utterance interpretation data has an intent corresponding to the intention of the user's utterance and a slot corresponding to element information included in the user's utterance. The data processing unit converts the user utterance interpretation data including the intent and the slot into data including an intent and a slot that the second information processing device can understand.

SELECTED DRAWING: Figure 4

Description

The present disclosure relates to an information processing device, an information processing system, an information processing method, and a program. More specifically, it relates to an information processing device, an information processing system, an information processing method, and a program that execute processing and responses according to a user utterance.

In recent years, the use of voice dialogue systems that perform speech recognition on user utterances and carry out various processing and responses based on the recognition results has been increasing.
Such a voice dialogue system analyzes a user utterance input via a microphone and performs processing according to the analysis result.

For example, when the user says "Tell me tomorrow's weather," the system acquires weather information from a weather information providing server, generates a system response based on the acquired information, and outputs the generated response from a speaker. Specifically, it outputs a system utterance such as the following.
System utterance = "Tomorrow's weather will be sunny. However, there may be thunderstorms in the evening."

An information processing device that interacts with a user in this way is called an agent device or a smart speaker.
Prior art disclosing agent devices that interact with users includes, for example, Patent Document 1 (Japanese Unexamined Patent Publication No. 2008-090545) and Patent Document 2 (Japanese Unexamined Patent Publication No. 2018-081444).

In recent years, various manufacturers have offered inexpensive agent devices, and many users have several agent devices at home.
However, an individual agent device cannot execute every kind of processing a user requests; each device is good at only a limited set of tasks. For example, there is agent device A, which controls the television and air conditioner in the house in response to user requests; agent device B, which is good at providing information such as news and weather; and agent device C, which is good at restaurant search and at placing requests with meal delivery services.

For example, suppose a user who owns agent devices A, B, and C with these different specialties wants to request a meal delivery service and says:
"Order a delicious pizza"
In this case, the user has to speak to agent device C, which is good at restaurant search and meal delivery requests.
The problem is that the user's request is not executed if the same utterance is made to the other agent devices A and B.

Patent Document 1: Japanese Unexamined Patent Publication No. 2008-090545
Patent Document 2: Japanese Unexamined Patent Publication No. 2018-081444

The present disclosure has been made in view of, for example, the above problem. Its object is to provide an information processing device, an information processing system, an information processing method, and a program that reduce the burden on the user by allowing processing for an utterance that the user made to one agent device to be executed by another agent device.

A first aspect of the present disclosure is an information processing device having
a voice input unit that inputs a user utterance, and
a data processing unit that analyzes the user utterance and generates user utterance interpretation data,
in which, when causing an external second information processing device to execute processing corresponding to the user utterance, the data processing unit
converts the user utterance interpretation data to generate conversion data that the second information processing device can understand, and transmits the conversion data to the second information processing device.

Further, a second aspect of the present disclosure is an information processing system having a plurality of information processing devices, in which
a first information processing device has
a voice input unit that inputs a user utterance, and
a data processing unit that analyzes the user utterance and generates user utterance interpretation data,
when causing an external second information processing device to execute processing corresponding to the user utterance, the data processing unit
converts the user utterance interpretation data to generate conversion data that the second information processing device can understand, and transmits the conversion data to the second information processing device, and
the second information processing device
executes processing corresponding to the user utterance based on the conversion data received from the first information processing device.

Further, a third aspect of the present disclosure is an information processing method executed in an information processing device, in which
a voice input unit inputs a user utterance,
a data processing unit executes data processing that analyzes the user utterance and generates user utterance interpretation data, and
when causing an external second information processing device to execute processing corresponding to the user utterance, the data processing unit
converts the user utterance interpretation data to generate conversion data that the second information processing device can understand, and transmits the conversion data to the second information processing device.

Further, a fourth aspect of the present disclosure is an information processing method executed in an information processing system having a plurality of information processing devices, in which
a first information processing device,
when causing an external second information processing device to execute processing corresponding to an input user utterance,
analyzes the user utterance to generate user utterance interpretation data,
converts the user utterance interpretation data to generate conversion data that the second information processing device can understand, and transmits the conversion data to the second information processing device, and
the second information processing device
executes processing corresponding to the user utterance based on the conversion data received from the first information processing device.

Further, a fifth aspect of the present disclosure is a program that causes an information processing device to execute information processing, in which
the program causes a data processing unit to
analyze a user utterance and generate user utterance interpretation data, and,
when causing an external second information processing device to execute processing corresponding to the user utterance,
convert the user utterance interpretation data to generate conversion data that the second information processing device can understand, and transmit the conversion data to the second information processing device.

The program of the present disclosure is, for example, a program that can be provided, via a storage medium or a communication medium in a computer-readable format, to an information processing device or a computer system capable of executing various program codes. Providing such a program in a computer-readable format realizes processing according to the program on the information processing device or computer system.

Still other objects, features, and advantages of the present disclosure will become apparent from the more detailed description based on the embodiments of the present disclosure described later and the accompanying drawings. In this specification, a system is a logical set of a plurality of devices, and the constituent devices are not limited to being in the same housing.

According to the configuration of one embodiment of the present disclosure, a device and a method are realized that allow processing according to a user utterance to be executed on a device other than the device to which the user utterance was input.
Specifically, for example, when an external second information processing device is to execute processing corresponding to a user utterance, the user utterance is analyzed to generate user utterance interpretation data, the generated data is converted into conversion data that the second information processing device can understand, and the conversion data is transmitted to the second information processing device. The user utterance interpretation data has an intent corresponding to the intention of the user utterance and slots corresponding to element information included in the user utterance. The data processing unit converts the user utterance interpretation data including the intent and slots into data including an intent and slots that the second information processing device can understand.
With this configuration, a device and a method are realized that allow processing according to a user utterance to be executed on a device other than the device to which the user utterance was input.
The effects described in this specification are merely examples and are not limiting, and there may be additional effects.

FIG. 1: A diagram illustrating an example of an agent device, which is an information processing device that performs responses and processing based on user utterances.
FIG. 2: A diagram illustrating a usage example and problems of a plurality of agent devices.
FIG. 3: A diagram illustrating a usage example and problems of a plurality of agent devices.
FIG. 4: A diagram illustrating a specific example of processing executed by the agent device of the present disclosure.
FIG. 5: A diagram illustrating an example of mapping data.
FIG. 6: A diagram illustrating a sequence of processing using a plurality of agent devices.
FIG. 7: A diagram illustrating a sequence of processing using a plurality of agent devices.
FIG. 8: A diagram illustrating a specific example of an agent device list.
FIG. 9: A diagram showing an example of correspondence data between user utterance interpretation data A (intent, slot) generated by one agent device A and user utterance interpretation data X (intent, slot) required for various agent devices X other than agent device A to execute processing.
FIG. 10: A diagram illustrating a specific processing example of agent device A (10) corresponding to the example shown in FIG. 9 (5).
FIG. 11: A flowchart illustrating a processing sequence executed by the agent device.
FIG. 12: A flowchart illustrating a processing sequence executed by the agent device.
FIG. 13: A flowchart illustrating a processing sequence executed by the agent device.
FIG. 14: A flowchart illustrating a processing sequence executed by the agent device.
FIG. 15: A diagram showing a data example of dialogue history information (context) recorded in a storage unit.
FIG. 16: A diagram showing a data example of dialogue history information (context) recorded in a storage unit.
FIG. 17: A diagram illustrating a system configuration example using three or more agent devices.
FIG. 18: A diagram illustrating a configuration example of an agent device, which is the information processing device of the present disclosure.
FIG. 19: A diagram illustrating a configuration example and a usage example of an agent device, which is the information processing device of the present disclosure.
FIG. 20: A diagram illustrating a hardware configuration example of an agent device or the like, which is the information processing device of the present disclosure.

Hereinafter, the details of the information processing device, the information processing system, the information processing method, and the program of the present disclosure will be described with reference to the drawings. The description proceeds in the following order.
1. Overview of the agent device
2. Usage examples and problems of multiple agent devices
3. Processing executed by the information processing device (agent device) of the present disclosure
4. Sequence of processing using multiple agent devices
5. Examples of user utterance interpretation data that differ for each agent device
6. Processing flow executed by the agent device
7. Processing example when data required to execute processing is insufficient
8. Example of management processing for multiple user utterances
9. Other embodiments
10. Configuration example of the agent device (information processing device)
11. Hardware configuration example of the agent device (information processing device)
12. Summary of the configuration of the present disclosure

[1. Overview of the agent device]
First, an overview of the agent device, which corresponds to the information processing device of the present disclosure, will be described with reference to FIG. 1 and subsequent figures.

FIG. 1 is a diagram showing a processing example of an agent device 10 that recognizes a user utterance made by a user 1 and responds to it.
The agent device 10 executes speech recognition processing on a user utterance such as the following.
User utterance = "Tell me the weather in Osaka tomorrow afternoon"

Further, the agent device 10 executes processing based on the speech recognition result of the user utterance.
In the example shown in FIG. 1, the agent device 10 acquires data for responding to the user utterance "Tell me the weather in Osaka tomorrow afternoon," generates a response based on the acquired data, and outputs the generated response via the speaker 14.
In the example shown in FIG. 1, the agent device 10 makes the following system response.
System response = "The weather in Osaka tomorrow afternoon will be sunny, but there may be showers in the evening."
The agent device 10 executes speech synthesis processing (TTS: Text To Speech) to generate and output the above system response.

The agent device 10 generates and outputs a response using knowledge data acquired from a storage unit in the device or knowledge data acquired via a network.
The agent device 10 shown in FIG. 1 has a camera 11, a microphone 12, a display unit 13, and a speaker 14, and is configured to be capable of audio input/output and image input/output.

The speech recognition processing and semantic analysis processing for the user utterance may be performed in the agent device 10, or may be executed on a server on the cloud side.

In addition to recognizing the utterance of the user 1 and responding to it, the agent device 10 also executes various processing according to the user utterance, such as controlling external devices in the house, for example a television or an air conditioner.
For example, when the user utterance is a request such as "Change the TV channel to 1" or "Set the air conditioner temperature to 20 degrees," the agent device 10 outputs a control signal (Wi-Fi, infrared light, etc.) to the external device based on the speech recognition result of the user utterance, and executes control according to the user utterance.
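
As a rough illustration of this kind of external device control, the following sketch turns an already interpreted request into a control signal. It is only a sketch under stated assumptions: the `ControlSignal` structure, the `TRANSPORT` table, and the command names are hypothetical and are not taken from the disclosure, which only states that a control signal (Wi-Fi, infrared light, etc.) is output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSignal:
    """Hypothetical control signal sent to an external device."""
    device: str     # target appliance, e.g. "tv"
    command: str    # operation to perform
    value: int      # operand of the operation
    transport: str  # how the signal is delivered


# Assumed table of which transport reaches each appliance.
TRANSPORT = {"tv": "infrared", "air_conditioner": "wifi"}


def build_control_signal(device: str, command: str, value: int) -> ControlSignal:
    """Build the signal for a request the agent has already interpreted."""
    if device not in TRANSPORT:
        raise ValueError(f"unknown device: {device}")
    return ControlSignal(device, command, value, TRANSPORT[device])


# "Change the TV channel to 1" / "Set the air conditioner temperature to 20 degrees"
print(build_control_signal("tv", "set_channel", 1))
print(build_control_signal("air_conditioner", "set_temperature", 20))
```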

[2. Usage examples and problems of multiple agent devices]
Next, usage examples and problems of multiple agent devices will be described.

In recent years, various manufacturers have offered inexpensive agent devices, and many users have several agent devices at home.
However, as described above, an individual agent device cannot execute every kind of processing a user requests; each device is good at only a limited set of tasks. For example, there is agent device A, which controls the television and air conditioner in the house in response to user requests; agent device B, which is good at providing information such as news and weather; and agent device C, which is good at restaurant search and at placing requests with meal delivery services.

With reference to FIG. 2 and subsequent figures, usage examples and problems of agent devices for a user who owns multiple agent devices will be described.

As shown in FIG. 2, the user 1 is on the second floor of a house. Agent device A (10) is placed on the second floor, where the user is, and another agent device B (20) is placed on the first floor.
The main functions that agent device A (10) on the second floor can execute are various kinds of information provision, such as news, weather information, and traffic information.
In contrast, the main functions that agent device B (20) on the first floor can execute are control of electric appliances in the house (television, air conditioner, etc.), restaurant search, requests for home delivery services, and the like.

Two agent devices with such different characteristics are placed on the first and second floors, respectively.
Here, as shown in FIG. 3, suppose that the user 1 on the second floor makes the following user utterance to agent device A (10) in front of the user.
"I'd like to order a pizza. Delivery at 12:00, a Margherita please."

This user utterance is input to agent device A (10) via the microphone of agent device A (10) in front of the user 1.
Agent device A (10) analyzes the speech of the user utterance and understands the user's request, but it has no function for requesting a home delivery service. As a result, as shown in FIG. 3, agent device A (10) makes, for example, the following system utterance to the user 1.
"Sorry, I can't do that."

In this way, even if a pizza order request is uttered to agent device A (10), which has no function for requesting a home delivery service, no processing is performed.
As a result, the user 1 has to move to the first floor, where agent device B (20) with the delivery request function is located, and make the same user utterance again to agent device B (20).
The information processing device (agent device) of the present disclosure eliminates this burden on the user.

[3. Processing executed by the information processing device (agent device) of the present disclosure]
Next, the processing executed by the information processing device (agent device) of the present disclosure will be described.

An overview of the processing executed by the agent device of the present disclosure will be described with reference to FIG. 4.
FIG. 4 shows the same setting as described above with reference to FIGS. 2 and 3. That is, the user 1 is on the second floor of the house, agent device A (10) is on the same second floor, and another agent device B (20) is placed on the first floor.
The main functions that agent device A (10) on the second floor can execute are various kinds of information provision, such as news, weather information, and traffic information.
In contrast, the main functions that agent device B (20) on the first floor can execute are control of electric appliances in the house (television, air conditioner, etc.), restaurant search, requests for home delivery services, and the like.
Two agent devices with such different characteristics are placed on the first and second floors, respectively.

Here, as shown in FIG. 4, suppose that the user 1 on the second floor makes the following user utterance to agent device A (10) in front of the user.
"I'd like to order a pizza. Delivery at 12:00, a Margherita please."

This user utterance is input to agent device A (10) via the microphone of agent device A (10) in front of the user 1.
Agent device A (10) analyzes the speech of the user utterance and understands the user's request.
This processing is the processing of (step S01) shown in FIG. 4.

In (step S01) shown in FIG. 4, agent device A (10) interprets the user utterance by performing speech recognition processing, semantic analysis processing, and dialogue state estimation processing on it.
As the interpretation result, agent device A (10) estimates the intention of the user utterance (intent) and the element information (slots), which are the meaningful elements (significant elements) included in the utterance.
This intent and slot estimation is executed according to a user utterance interpretation algorithm specific to agent device A (10).

In (step S01), agent device A (10) shown in FIG. 4 generates, for example, the following user utterance interpretation data.
(Intent a) Delivery
(Slot a1) Delivery time = 12:00
(Slot a2) Food = Pizza
(Slot a3) Type = Margherita
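
As a rough illustration, user utterance interpretation data of this kind could be held in a small structure with one intent and a set of named slots. The class and field names below (`InterpretationData`, `intent`, `slots`) and the lowercase slot keys are assumptions made for this sketch; the disclosure does not specify a concrete data format here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InterpretationData:
    """User utterance interpretation data: one intent plus named slots."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


# Interpretation data for the pizza utterance of step S01 (names are illustrative).
data_a = InterpretationData(
    intent="delivery",
    slots={"delivery_time": "12:00", "food": "pizza", "type": "margherita"},
)

print(data_a.intent, data_a.slots["delivery_time"])
```

Keeping the slots in a dictionary rather than fixed fields is a choice made for this sketch, so that utterances with different numbers of slots fit the same structure.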

Based on the above user utterance interpretation data, agent device A (10) understands that the user 1 is requesting delivery of a pizza (Margherita).
However, agent device A (10) does not have a pizza delivery request function.

In this case, agent device A (10) of the present disclosure transfers the user utterance interpretation data to another agent device that has a pizza delivery request function.
In the example shown in FIG. 4, the user utterance interpretation data is transferred to agent device B (20) on the first floor.
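
One way to picture this decision is a lookup of which agent device offers the required function. The sketch below is an illustration under assumptions: the capability names, the `AGENTS` table, and `choose_executor` are hypothetical, and this passage does not state how agent device A (10) learns the functions of other devices.

```python
from typing import Dict, Optional, Set

# Hypothetical capability table based on the two devices described for FIG. 2.
AGENTS: Dict[str, Set[str]] = {
    "agent_A": {"news", "weather", "traffic"},
    "agent_B": {"appliance_control", "restaurant_search", "delivery"},
}


def choose_executor(intent: str, local: str = "agent_A") -> Optional[str]:
    """Return the device that should handle the intent, preferring the local one."""
    if intent in AGENTS[local]:
        return local
    for name, functions in AGENTS.items():
        if name != local and intent in functions:
            return name
    return None  # no known device can execute the request


print(choose_executor("weather"))   # prints agent_A
print(choose_executor("delivery"))  # prints agent_B
```

When `choose_executor` returns a device other than the local one, the interpretation data would be forwarded to that device instead of answering "Sorry, I can't do that."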

However, even if agent device A (10) transfers the user utterance interpretation data (intent a, slot a) generated in (step S01) to agent device B (20) as is, agent device B (20) may be unable to execute processing based on this data, that is, the pizza order processing.

The reason is that the user utterance interpretation algorithm A executed by agent device A (10) may differ from the user utterance interpretation algorithm B executed by agent device B (20).

エージェント装置Ａ，１０は、エージェント装置Ａ，１０のユーザ発話解釈処理アルゴリズムＡに従ってユーザ発話解釈データＡ（インテントａ、スロットａ）を生成し、ユーザ発話解釈処理アルゴリズムＡに従ったユーザ発話解釈データＡ（インテントａ、スロットａ）に従って処理（例えばユーザに対する応答等）を実行する。 The agent devices A and 10 generate the user utterance interpretation data A (intent a, slot a) according to the user utterance interpretation processing algorithm A of the agent devices A and 10, and the user utterance interpretation data according to the user utterance interpretation processing algorithm A. Processing (for example, a response to a user) is executed according to A (intent a, slot a).

The agent device B 20, on the other hand, generates user utterance interpretation data B (intent b, slot b) according to its own user utterance interpretation processing algorithm B, and executes processing according to that data.

Therefore, even if the user utterance interpretation data A (intent a, slot a) that the agent device A 10 generated according to algorithm A is transmitted to the agent device B 20, the agent device B 20 may be unable to execute correct processing (processing according to the user request) based on that data.

The agent device A 10 therefore executes the processing of (step S02) shown in FIG. 4.
In (step S02), the agent device A 10 executes a process (mapping process) that converts the user utterance interpretation data A (intent a, slot a) generated in (step S01) into user utterance interpretation data B (intent b, slot b) that the agent device B 20 can understand.

The user utterance interpretation data B (intent b, slot b) corresponds to the user utterance interpretation data that would be generated according to the user utterance interpretation processing algorithm B executed by the agent device B 20.

The agent device A 10 holds the mapping data required for this data conversion in its storage unit. By referring to this mapping data, it executes the process (mapping process) of converting the user utterance interpretation data A (intent a, slot a) generated in (step S01) into the user utterance interpretation data B (intent b, slot b) that the agent device B 20 can understand.

Specifically, for example, the agent device A 10 generates the following conversion data (user utterance interpretation data B (intent b, slot b)) shown in FIG. 4 (step S02).
(Intent b) Pizza order
(Slot b1) Delivery time = 12:00
(Slot b2) Type = Margherita

This data conversion is executed with reference to the mapping data stored in the storage unit of the agent device A 10, specifically, mapping data such as that shown in FIG. 5.

Note that the above conversion data (user utterance interpretation data B (intent b, slot b)) corresponds to the user utterance interpretation data B (intent b, slot b) that the agent device B 20 would generate according to its user utterance interpretation processing algorithm B if it directly received the following user utterance:
"I'd like to order a pizza. Delivery at 12:00, a Margherita please."

FIG. 5 shows an example of the mapping data stored in the storage unit of the agent device A 10.
As shown in FIG. 5, the mapping data registers the following data in association with each other.
(A) User utterance interpretation data A (intent a, slot a) generated according to the user utterance interpretation processing algorithm A executed by the agent device A 10.
(B) User utterance interpretation data B (intent b, slot b) required for the agent device B 20 to execute processing.
The data (B) corresponds to data that can cause the agent device B to execute processing when supplied through the API (Application Programming Interface) of the agent device B 20.
For example, it is data that the API provided by the agent device B 20, or by a management server of the agent device B 20 or the like, accepts as input data.
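The mapping process of (step S02) can be illustrated with a minimal Python sketch. The table layout, the key names (`a_intent`, `a_required`, `b_intent`, `slot_map`), and the intent and slot identifiers are assumptions made for illustration only; the disclosure specifies only that data (A) and data (B) are registered in association with each other.

```python
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """User utterance interpretation data: one intent plus named slots."""
    intent: str
    slots: dict = field(default_factory=dict)


# Hypothetical mapping data in the spirit of FIG. 5: each entry pairs a
# pattern of interpretation data A with the intent and slot names that
# agent device B accepts.
MAPPING_DATA = [
    {
        "a_intent": "delivery",
        "a_required": {"food": "pizza"},  # slot values that select this entry
        "b_intent": "pizza_order",
        "slot_map": {"delivery_time": "delivery_time", "type": "type"},
    },
]


def convert_a_to_b(data_a, mapping_data=MAPPING_DATA):
    """Return interpretation data B for data A, or None if no entry matches."""
    for entry in mapping_data:
        if entry["a_intent"] != data_a.intent:
            continue
        if any(data_a.slots.get(k) != v for k, v in entry["a_required"].items()):
            continue
        # Copy only the slots that device B understands, under B's slot names.
        slots_b = {
            b_name: data_a.slots[a_name]
            for a_name, b_name in entry["slot_map"].items()
            if a_name in data_a.slots
        }
        return Interpretation(entry["b_intent"], slots_b)
    return None


data_a = Interpretation(
    "delivery",
    {"delivery_time": "12:00", "food": "pizza", "type": "Margherita"},
)
data_b = convert_a_to_b(data_a)
```

In this sketch the slot "food = pizza" of data A is absorbed into the intent "pizza order" of data B, which mirrors the three-slot to two-slot conversion in the pizza example.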

Next, in (step S03), the agent device A 10 transmits the conversion data generated in (step S02), namely
(Intent b) Pizza order
(Slot b1) Delivery time = 12:00
(Slot b2) Type = Margherita
to the agent device B 20 on the first floor via the communication unit.
For example, the agent device A 10 can input the conversion data (user utterance interpretation data B (intent b, slot b)) generated in step S02 to the agent device B 20 by processing that uses the API provided by the agent device B 20, or by a management server of the agent device B 20 or the like (an API call).
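As a rough illustration of the API call in (step S03), the sketch below serializes the conversion data and hands it to a caller-supplied function. The JSON layout and the `call_api` callback are hypothetical; the disclosure does not define the API of the agent device B 20, so the actual transport and schema would depend on that API.

```python
import json


def build_api_request(intent, slots):
    """Serialize conversion data into a JSON request body (assumed schema)."""
    return json.dumps(
        {"intent": intent, "slots": slots}, ensure_ascii=False, sort_keys=True
    )


def send_conversion_data(intent, slots, call_api):
    """Pass the serialized conversion data to agent device B.

    `call_api` stands in for whatever API call agent device B, or its
    management server, actually exposes.
    """
    return call_api(build_api_request(intent, slots))
```

Keeping the transport behind a callback lets the same conversion data be sent either to the device itself or to its management server.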

The agent device B 20 on the first floor receives the above conversion data from the agent device A 10 on the second floor.
In (step S04), the agent device B 20 executes processing based on the conversion data received from the agent device A 10.
That is, it orders the pizza according to the user's request.

The conversion data received from the agent device A 10 is the same as the user utterance interpretation data B (intent b, slot b) that the agent device B 20 would generate according to its user utterance interpretation processing algorithm B if it directly received the following user utterance:
"I'd like to order a pizza. Delivery at 12:00, a Margherita please."
Therefore, the agent device B 20 can accurately understand the intention of the user 1 and execute accurate processing, just as if the user utterance had been input directly from the user 1.

Furthermore, the agent device B 20 does not need to perform user utterance interpretation processing on the user utterance, such as voice recognition, semantic analysis, and dialogue state estimation, so it can carry out the processing smoothly and without delay.

[4. Processing sequence using multiple agent devices]
Next, a processing sequence using a plurality of agent devices will be described.

A processing sequence using a plurality of agent devices will be described with reference to FIGS. 6 and 7.
FIGS. 6 and 7 show, from the left,
(1) the user who makes the user utterance,
(2) the agent device A 10, and
(3) the agent device B 20.
Processing is executed in the order of steps S21 to S27 shown in FIGS. 6 and 7.
The processing of each step is described below in order.

(Step S21)
First, in step S21, the user 1 speaks to the agent device A 10, and dialogue processing is executed between the user 1 and the agent device A 10.

(Step S22)
Next, in step S22, the agent device A 10 executes user utterance interpretation processing, consisting of voice recognition, semantic analysis, and dialogue state estimation, on the user utterance acquired in the dialogue processing of step S21.

This processing is performed according to the user utterance interpretation algorithm A executed by the data processing unit (voice analysis unit) of the agent device A 10.
As the result of the user utterance interpretation processing, the agent device A 10 generates user utterance interpretation data A consisting of the intention of the user utterance (intent) and element information (slots), which are the meaningful elements (significant elements) contained in the utterance.

Specifically, for example, it generates the following user utterance interpretation data A described earlier with reference to FIG. 4.
(Intent a) Delivery
(Slot a1) Delivery time = 12:00
(Slot a2) Food = Pizza
(Slot a3) Type = Margherita

(Steps S23 to S24)
Based on the user utterance interpretation data acquired in step S22, the agent device A 10 determines whether or not it can process the request of the user 1, and executes the processing if it can.
This sequence diagram, however, shows the sequence for the case where the agent device A 10 determines that it cannot process the request of the user 1.

For example, based on the user utterance interpretation data acquired in step S22, the agent device A 10 understands that the user 1 is requesting delivery of a pizza (Margherita), but the agent device A 10 does not have a pizza delivery request function.

In this case, the agent device A 10 outputs the following system utterance in step S23.
"I cannot process that."
In response to this system utterance, the user 1 makes the following user utterance.
"Transfer it to agent device B."
The agent device A 10 interprets this user utterance and, in accordance with it, performs processing for transferring the user request to the agent device B.
That is, it performs the processing from step S25 onward shown in FIG. 7.

Note that the sequence diagram in FIG. 6 shows an example in which, in steps S23 to S24, the agent device A 10 informs the user 1 that it cannot process the user request, and the user 1 responds by requesting transfer to the agent device B.

Alternatively, without performing such dialogue with the user, the agent device A 10 may itself search for another communicable agent device that can process the user request, and transfer the user request to the agent device found.

In that case, however, the storage unit of the agent device A 10 needs to hold, for example, an agent device list such as that shown in FIG. 8. The agent device to which the user request is transferred is selected by referring to this list.

The agent device list shown in FIG. 8 records, in association with each other, the identifiers of other communicable agent devices, the functions of those agent devices, and their communication addresses.
The agent device A 10 may be configured to select the transfer destination for the user request by referring to the agent device list of FIG. 8 stored in its storage unit.
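Selecting a transfer destination from such a list might look like the following sketch. The field names, device identifiers, function names, and addresses are hypothetical (the addresses are taken from the 192.0.2.0/24 documentation range); the disclosure states only that the list associates an identifier, functions, and a communication address.

```python
# Hypothetical agent device list in the spirit of FIG. 8.
AGENT_DEVICE_LIST = [
    {
        "id": "agent_b",
        "functions": {"pizza_order", "restaurant_search"},
        "address": "192.0.2.20",
    },
    {
        "id": "agent_c",
        "functions": {"washing_machine_timer"},
        "address": "192.0.2.30",
    },
]


def select_transfer_destination(required_function, device_list=AGENT_DEVICE_LIST):
    """Return the first listed device offering the function, or None."""
    for device in device_list:
        if required_function in device["functions"]:
            return device
    return None


destination = select_transfer_destination("pizza_order")
```

Returning `None` when no device matches leaves room for the dialogue-based fallback of steps S23 to S24, in which the user names the destination.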

(Step S25)
When the agent device to which the user request is to be transferred has been determined, the agent device A 10 executes the processing of step S25.
Here, the transfer destination is assumed to be the agent device B 20.

In step S25, the agent device A 10 converts (maps) the "user utterance interpretation data A" generated in step S22 into "user utterance interpretation data B" that the agent device B 20 can understand.
That is, it converts (maps) the data into the user utterance interpretation data B (intent, slot) that the agent device B 20 requires in order to execute processing.

As described earlier with reference to FIG. 4, the agent device B 20 generates user utterance interpretation data B (intent b, slot b) according to its own user utterance interpretation processing algorithm B, and executes processing according to the generated data.
Therefore, even if the user utterance interpretation data A (intent a, slot a) that the agent device A 10 generated according to algorithm A is transmitted to the agent device B 20, the agent device B 20 may be unable to execute correct processing (processing according to the user request).

To avoid this situation, in step S25 the agent device A 10 converts (maps) the "user utterance interpretation data A" generated in step S22 into "user utterance interpretation data B" that the agent device B 20 can understand.
That is, it generates the user utterance interpretation data B (intent, slot) that the agent device B 20 requires in order to execute processing.

The agent device A 10 stores, in its storage unit, mapping correspondence data to be applied to this mapping processing, and converts the "user utterance interpretation data A" into the "user utterance interpretation data B" by referring to it.
The mapping correspondence data stored in the storage unit consists of data that associates
"the intents and slots that make up the user utterance interpretation data A" with
"the intents and slots that make up the user utterance interpretation data B".

In step S25, the agent device A 10 refers to the mapping correspondence data stored in the storage unit and converts (maps) the "user utterance interpretation data A" generated in step S22 into "user utterance interpretation data B" that the agent device B 20 can understand.

Specifically, for example, as described earlier with reference to FIG. 4, it generates the following conversion data (user utterance interpretation data B (intent b, slot b)) shown in FIG. 4 (step S02).
(Intent b) Pizza order
(Slot b1) Delivery time = 12:00
(Slot b2) Type = Margherita

The above conversion data (user utterance interpretation data B (intent b, slot b)) corresponds to the user utterance interpretation data B (intent b, slot b) that the agent device B 20 would generate according to its user utterance interpretation processing algorithm B if it directly received the following user utterance:
"I'd like to order a pizza. Delivery at 12:00, a Margherita please."

(Step S26)
Next, in step S26, the agent device A 10 transmits the conversion data (user utterance interpretation data B (intent b, slot b)) generated in step S25 to the agent device B 20.

Note that the conversion data (user utterance interpretation data B (intent b, slot b)) generated by the agent device A 10 in step S25 is, for example, data that the API (Application Programming Interface) provided by the agent device B 20, or by a management server of the agent device B 20 or the like, accepts as input data.
The agent device A 10 can input the conversion data generated in step S25 to the agent device B 20 by processing that uses this API (an API call).

(Step S27)
Having received the conversion data (user utterance interpretation data B (intent b, slot b)) from the agent device A 10 in step S26, the agent device B 20 then executes, in step S27, processing based on the received conversion data.

In the case of the embodiment described earlier with reference to FIG. 4, for example, the conversion data received from the agent device A 10 is the same as the user utterance interpretation data B (intent b, slot b) that the agent device B 20 would generate according to its user utterance interpretation processing algorithm B if it directly received the following user utterance:
"I'd like to order a pizza. Delivery at 12:00, a Margherita please."
Therefore, the agent device B 20 can accurately understand the intention of the user 1 and execute accurate processing, just as if the user utterance had been input directly from the user 1.

[5. Examples of user utterance interpretation data that differs between agent devices]
Next, examples of user utterance interpretation data that differs from one agent device to another will be described.

As described above, the user utterance interpretation processing algorithm executed by each agent device often differs depending on the type of agent device.

For example, even if the same user utterance is input to three agent devices A to C, the user utterance interpretation data (intent, slot) generated by each of the agent devices A to C may differ if the user utterance interpretation processing algorithms they execute differ.

FIG. 9 shows, for various user utterances, examples of the correspondence between the user utterance interpretation data A (intent, slot) generated by one agent device A and the user utterance interpretation data X (intent, slot) required for various agent devices X other than the agent device A to execute processing.
Note that the agent device X is not a single agent device but denotes various agent devices different from the agent device A.

Entry (1) in FIG. 9 shows, for the user utterance
User utterance = "Set the washing machine timer to 12:00"
the correspondence between the user utterance interpretation data A (intent, slot) generated by the agent device A and the user utterance interpretation data X (intent, slot) required for the agent device X to execute processing.
In this example (1), the intent and slot are identical for the two agent devices A and X, as follows.
(Intent) Washing machine timer setting
(Slot) Start time = 12:00

In the case of example (1), the agent device A can make the agent device X execute correct processing according to the user utterance by inputting the user utterance interpretation data A (intent, slot), generated by applying the user utterance interpretation algorithm A, to the agent device X as is, without conversion, via the API of the agent device X.

Entries (2) and (3) in FIG. 9 show, for the same user utterance as in (1),
User utterance = "Set the washing machine timer to 12:00"
the correspondence between the user utterance interpretation data A (intent, slot) generated by the agent device A and the user utterance interpretation data X (intent, slot) required for the agent device X to execute processing.
Example (2) is a case in which the intents and slots for the agent devices A and X do not match exactly but are similar.
Example (3) is a case in which the intents for the agent devices A and X match but the number of slots does not.

In these cases, if the agent device A inputs the user utterance interpretation data A (intent, slot), generated by applying the user utterance interpretation algorithm A, to the agent device X as is, without conversion, via the API of the agent device X, the agent device X may be unable to execute correct processing according to the user utterance.

In such cases, the agent device A needs to convert the user utterance interpretation data A (intent, slot) generated by applying the user utterance interpretation algorithm A, generate conversion data corresponding to the user utterance interpretation data X (intent, slot) of entries (2) and (3) in FIG. 9, and input it to the agent device X via the API of the agent device X.
Generating and inputting the conversion data in this way makes it possible to have the agent device X execute correct processing according to the user utterance.

Entries (4) and (5) in FIG. 9 are examples in which the user utterance interpretation data A (intent, slot) that the agent device A generates for a given user utterance differs from the user utterance interpretation data X (intent, slot) required for the agent device X to execute processing.
In these examples, the intent generated by the agent device A is a lower-level (more specific) intent than the intent required for the agent device X to execute processing.

Entries (6) and (7) in FIG. 9 are likewise examples in which the user utterance interpretation data A (intent, slot) that the agent device A generates for a given user utterance differs from the user utterance interpretation data X (intent, slot) required for the agent device X to execute processing.
In these examples, the intent generated by the agent device A is a higher-level (more general) intent than the intent required for the agent device X to execute processing.

In examples (4) to (7) as well, the agent device A needs to convert the user utterance interpretation data A (intent, slot) generated by applying the user utterance interpretation algorithm A, generate conversion data corresponding to the user utterance interpretation data X (intent, slot) of entries (4) to (7) in FIG. 9, and input it to the agent device X via the API of the agent device X.
Generating and inputting the conversion data in this way makes it possible to have the agent device X execute correct processing according to the user utterance.

Note that in the examples shown in FIG. 9 (6) and (7), time information that is not contained in the user utterance is set in a slot required for the agent device X to execute processing.
When a slot required for the agent device X to execute processing calls for information that is not contained in the user utterance in this way, the agent device A performs processing such as asking the user a question in order to obtain the information for that slot.
An example of this processing will be described later.
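The handling of slots that the user utterance does not cover can be sketched as follows. The function names, the slot names, and the `ask_user` callback are hypothetical; in the device, the question would be put to the user through the dialogue processing rather than through a Python callable.

```python
def missing_slots(required_slots, slots):
    """List the slots device X needs that the utterance did not supply."""
    return [name for name in required_slots if name not in slots]


def fill_missing_slots(required_slots, slots, ask_user):
    """Return a copy of `slots` completed by asking the user for each gap.

    `ask_user` is a stand-in for the dialogue with the user: it receives
    the slot name and returns the value the user answered.
    """
    filled = dict(slots)
    for name in missing_slots(required_slots, filled):
        filled[name] = ask_user(name)
    return filled
```

For example, if device X required both a location and a time and the utterance supplied only the location, `fill_missing_slots` would ask once, for the time, and leave the supplied slot untouched.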

FIG. 10 illustrates a specific processing example of the agent device A 10 corresponding to the example shown in FIG. 9 (5).

Steps S21 to S27 shown in FIG. 10 correspond to steps S21 to S27 shown in the sequence diagrams of FIGS. 6 and 7 described earlier.

First, in step S21, dialogue processing is executed between the user 1 and the agent device A 10, and the user 1 makes the following user utterance.
User utterance = "Search for Japanese restaurants, ones in Osaki that are open from 17:00"

Next, in step S22, the agent device A 10 executes user utterance interpretation processing, consisting of voice recognition, semantic analysis, and dialogue state estimation, on the user utterance acquired in the dialogue processing of step S21.

This processing is performed according to the user utterance interpretation algorithm A executed by the data processing unit (voice analysis unit) of the agent device A 10.
As the result of the user utterance interpretation processing, the agent device A 10 generates the following user utterance interpretation data A.
(Intent) Japanese restaurant search
(Slot 1) Opening time = 17:00
(Slot 2) Location = Osaki

Next, in step S25, the agent device A 10 converts (maps) the "user utterance interpretation data A" generated in step S22 into "user utterance interpretation data B" that the agent device B 20 can understand.
That is, it generates the user utterance interpretation data B (intent, slot) that the agent device B 20 requires in order to execute processing.

In step S25, the agent device A 10 generates the following conversion data (user utterance interpretation data B (intent, slot)).
(Intent) Restaurant search
(Slot 1) Opening time = 17:00
(Slot 2) Location = Osaki
(Slot 3) Genre = Japanese food

Next, in step S26, the agent device A, 10 transmits the conversion data (user utterance interpretation data B (intent, slots)) generated in step S25 to the agent device B, 20.
For example, the data is input to the agent device B, 20 by calling an API provided by the agent device B, 20 or by its management server or the like.

Having received the conversion data (user utterance interpretation data B (intent, slots)) from the agent device A, 10 in step S26, the agent device B, 20 then executes, in step S27, processing based on the received conversion data.

Specifically, it searches for Japanese restaurants in Osaki that are open from 17:00.
The processing result is transmitted from the agent device B, 20 to the agent device A, 10 and presented to the user 1 via the output unit (display unit, speaker) of the agent device A, 10.

[6. Processing flow executed by the agent device]
Next, the processing flow executed by the agent device will be described.

The processing sequence executed by the agent device that interacts directly with the user will be described with reference to the flowcharts in FIG. 11 and subsequent figures.
Most of the processing shown in these flowcharts is executed by the agent device that interacts directly with the user, that is, the agent device A, 10 in the examples described with reference to FIGS. 4 to 10.
Part of it is executed by an agent device that does not interact directly with the user, that is, the agent device B, 20 in the examples described with reference to FIGS. 4 to 10.

In the flows shown in FIG. 11 and subsequent figures, processing executed by the agent device A (corresponding to the agent device A, 10), which interacts directly with the user, is drawn with solid lines, and processing executed by the agent device B (corresponding to the agent device B, 20), which does not interact directly with the user, is drawn with dotted lines.

The processing according to the flowcharts in FIG. 11 and subsequent figures is executed according to a program stored in the storage unit of the agent device A. For example, it can be executed as program execution processing by a processor such as a CPU having a program execution function.

First, the processing of each step of the flow shown in FIG. 11 will be described.

(Step S101)
First, in step S101, the agent device A, which interacts directly with the user, receives a user utterance as input.

(Step S102)
Next, in step S102, the agent device A executes user utterance interpretation processing on the user utterance input in step S101, using speech recognition, semantic analysis, and dialogue state estimation.

This processing is performed according to the user utterance interpretation algorithm A executed by the data processing unit (voice analysis unit) of the agent device A.
The user utterance interpretation processing refers to the context (dialogue history information) stored in the storage unit of the agent device A. The storage unit holds, for example, the past context of each user, that is, the dialogue history information. Referring to the context yields data such as "user (U1) often talks about cooking" or "user (U2) often talks about cars and travel", and this data makes it possible to analyze the user's intent with high accuracy.

(Step S103)
Next, in step S103, as the result of the user utterance interpretation processing in step S102, the agent device A generates user utterance interpretation data A consisting of the intention of the user utterance (intent) and the element information (slots), that is, the meaningful elements (significant elements) contained in the utterance.

(Steps S104 to S105)
Next, in steps S104 to S105, the agent device A determines, based on the user utterance interpretation data acquired in step S103, whether or not the agent device A itself can execute the processing corresponding to the user utterance.
If it determines that it can execute the processing itself, the process proceeds to step S106.
If it determines that it cannot, the process proceeds to step S201.

(Step S106)
If it is determined in step S105 that the agent device A itself can execute the processing corresponding to the user utterance, the process proceeds to step S106.
In this case, the agent device A executes the processing corresponding to the user utterance in step S106.

(Step S107)
After executing the processing corresponding to the user utterance in step S106, the agent device A stores context, such as the dialogue history information between the agent device A and the user, in the storage unit in step S107.

Next, the processing performed when it is determined in step S105 that the agent device A itself cannot execute the processing corresponding to the user utterance will be described with reference to the flowchart shown in FIG. 12.

(Step S201)
If it is determined in step S105 that the agent device A itself cannot execute the processing corresponding to the user utterance, the agent device A searches, in step S201, for another communicable agent device that can process the user request.

The storage unit of the agent device A holds, for example, an agent device list such as the one shown in FIG. 8 described above, and the agent device A refers to this list to select the agent device to which the user request is to be transferred.

As in steps S23 to S24 of the sequences described above with reference to FIGS. 6 and 7, the request transfer processing may be started on the condition that the agent device A has notified the user that it cannot execute the processing and the user has asked it to transfer the request to another agent device.

(Step S202)
Next, in step S202, the agent device A determines whether or not the search in step S201 has detected another communicable agent device that can process the user request.
If one has been detected, the process proceeds to step S204.
If none has been detected, the process proceeds to step S203.

(Step S203)
If it is determined in step S202 that no other communicable agent device that can process the user request has been detected, the agent device A generates, in step S203, a system utterance telling the user that the processing based on the user utterance cannot be executed, and outputs it to the user.

(Step S204)
On the other hand, if it is determined in step S202 that another communicable agent device (agent device B) that can process the user request has been detected, the agent device A executes the following processing in step S204.

The agent device A converts (maps) the "user utterance interpretation data A" generated in step S103 into "user utterance interpretation data B" that the communicable agent device B, which can process the user request, is able to understand.
That is, it converts (maps) the data into the user utterance interpretation data B (intent, slots) that the agent device B, 20 needs in order to execute the processing.

As described above, the agent device A stores mapping correspondence data for this mapping processing in the storage unit and refers to it to convert the "user utterance interpretation data A" into the "user utterance interpretation data B".
The mapping correspondence data stored in the storage unit consists of correspondences between
"the intents and slots that make up the user utterance interpretation data A" and
"the intents and slots that make up the user utterance interpretation data B".

In step S204, the agent device A refers to the mapping correspondence data stored in the storage unit and converts (maps) the "user utterance interpretation data A" generated in step S103 into "user utterance interpretation data B" that the agent device B can understand.
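As an illustration of how such mapping correspondence data could drive the conversion, the following sketch uses the restaurant example from the sequence described earlier. The table layout (`intent_b`, `slot_names`, `fixed_slots`) and the names `Interpretation` and `map_a_to_b` are assumptions introduced here; the disclosure only states that intents and slots of data A are associated with intents and slots of data B.

```python
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """User utterance interpretation data: one intent plus named slots."""
    intent: str
    slots: dict = field(default_factory=dict)


# Hypothetical mapping correspondence data held in agent device A's storage.
MAPPING = {
    "Japanese restaurant search": {
        "intent_b": "Restaurant search",
        # slot name in data A -> slot name in data B
        "slot_names": {"Opening time": "Opening time", "Location": "Location"},
        # slots of data B that are implied by the intent of data A
        "fixed_slots": {"Genre": "Japanese food"},
    },
}


def map_a_to_b(data_a: Interpretation, mapping: dict = MAPPING) -> Interpretation:
    """S204: convert interpretation data A into interpretation data B."""
    entry = mapping[data_a.intent]
    slots_b = {
        entry["slot_names"][name]: value
        for name, value in data_a.slots.items()
        if name in entry["slot_names"]
    }
    slots_b.update(entry["fixed_slots"])
    return Interpretation(entry["intent_b"], slots_b)


data_a = Interpretation(
    "Japanese restaurant search",
    {"Opening time": "17:00", "Location": "Osaki"},
)
data_b = map_a_to_b(data_a)
```

In this sketch the genre that agent device A encodes in its intent ("Japanese restaurant search") becomes an explicit slot in data B, mirroring the example in which data B gains the slot "Genre = Japanese food".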

(Step S205)
Next, in step S205, the agent device A transmits the conversion data (user utterance interpretation data B (intent b, slot b)) generated in step S204 to the agent device B.

The conversion data (user utterance interpretation data B (intent b, slot b)) that the agent device A generates in step S204 is, for example, data that is accepted as input by an API (Application Programming Interface) provided by the agent device B or by its management server or the like.
The agent device A can input the conversion data generated in step S204 to the agent device B by calling this API.

(Step S206)
The processing of step S206 is executed by the agent device B.
Having received the conversion data (user utterance interpretation data B (intent b, slot b)) from the agent device A in step S205, the agent device B executes, in step S206, processing based on the received conversion data.

The conversion data (user utterance interpretation data B (intent b, slot b)) received from the agent device A corresponds to the user utterance interpretation data B (intent b, slot b) that the agent device B would generate according to its own user utterance interpretation algorithm B if it had received the user utterance of step S101 directly.
Therefore, the agent device B can understand the user's intention accurately and execute the correct processing, just as if it had received the user utterance directly from the user.

In addition, the agent device B does not need to perform user utterance interpretation processing such as speech recognition, semantic analysis, and dialogue state estimation on the user utterance, so it can carry out the processing smoothly and without delay.

(Step S207)
Next, in step S207, the agent device A receives a processing completion notification from the agent device B.

(Step S208)
Finally, in step S208, the agent device A notifies the user that the processing has been completed. For example, it generates a system utterance such as "The processing is complete" or "I have ordered the pizza" to inform the user that the processing corresponding to the user utterance has been completed, and outputs it to the user.

In this way, by performing the processing according to the flowcharts shown in FIGS. 11 and 12, even when the agent device that the user speaks to cannot execute the processing corresponding to the user utterance, another agent device can be made to execute it.
As a result, the user no longer has to move to where the other agent device is located and repeat the same utterance, which reduces the burden on the user.
That is, by making a single utterance to a single agent device, the user can have processing performed that draws on all the functions of multiple agent devices.

[7. Processing example when the data required to execute the processing is insufficient]
Next, a processing example for the case where the data required to execute the processing is insufficient will be described.

As described with reference to the flowcharts shown in FIGS. 11 and 12, when the agent device A, which receives the user utterance directly, cannot execute the processing corresponding to the user utterance, it requests another communicable agent device B to perform the processing.

In this case, the agent device A converts the user utterance interpretation data A (intent a, slot a) generated from the user utterance and generates user utterance interpretation data B (intent b, slot b) that the agent device B, which executes the processing, can understand.

However, the agent device B may be unable to execute the processing using only the data contained in the user utterance interpretation data B (intent b, slot b) generated by converting the user utterance interpretation data A (intent a, slot a).

Examples are entry (3) and entries (6) and (7) of the data shown in FIG. 9 described above, that is, the example correspondence data between the user utterance interpretation data A (intent, slots) that one agent device A generates for various user utterances and the user utterance interpretation data X (intent, slots) that various agent devices X other than the agent device A need in order to execute processing.

For example, FIG. 9(3) shows, for the user utterance
User utterance = "Set the washing machine timer to 12:00"
the correspondence between the user utterance interpretation data A (intent, slots) generated by the agent device A and the user utterance interpretation data X (intent, slots) that the agent device X needs in order to execute the processing.

Here, the user utterance interpretation data X (intent, slots) that the agent device X of entry (3) needs in order to execute the processing is as follows.
(Intent) Washing machine timer setting
(Slot 1) Start time = 12:00
(Slot 2) Start mode = ?

Of this intent and slots 1 and 2, the intent and slot 1 are obtained from the user utterance "Set the washing machine timer to 12:00", but
(Slot 2) Start mode
cannot be obtained from that utterance.
For the agent device X to perform the processing correctly (setting the washing machine timer), it needs
(Slot 2) Start mode
and this slot data cannot be obtained from the user utterance "Set the washing machine timer to 12:00".

Similarly, FIGS. 9(6) and 9(7) show, for the respective user utterances
User utterance = "Search for Italian food, look in Osaki"
User utterance = "Search for a Japanese restaurant, look in Osaki"
the correspondence between the user utterance interpretation data A (intent, slots) generated by the agent device A and the user utterance interpretation data X (intent, slots) that the agent device X needs in order to execute the processing.

In the data shown in FIG. 9, the user utterance interpretation data X (intent, slots) that these agent devices X need in order to execute the processing includes
(Slot 1) Opening hours = 17:00-23:00
but this slot cannot be obtained from the user utterances above.
For the agent device X to perform the processing correctly (restaurant search), it needs
(Slot 1) Opening hours = 17:00-23:00
and this slot data cannot be obtained from the user utterance.

Thus, even if the agent device A, which receives the user utterance, converts the user utterance interpretation data A (intent a, slot a) generated from the utterance into user utterance interpretation data B (intent b, slot b) and transmits it to the agent device B, the agent device B may be unable to execute the processing.

In such a case, the agent device B asks the agent device A to acquire the information needed to execute the processing.
In response to this information acquisition request, the agent device A asks the user a question, obtains the user's response, and transmits data based on the response to the agent device B.
This additional information transmission allows the agent device B to execute the processing corresponding to the user utterance correctly.

This processing sequence will be described with reference to the flowchart shown in FIG. 13.
The flowchart shown in FIG. 13 shows the processing executed after step S205 of the flow described above with reference to FIGS. 11 and 12.
In the flow shown in FIG. 13, the steps drawn with solid lines are executed by the agent device A, which receives the user utterance directly, and the steps drawn with dotted lines are executed by the agent device B, which communicates with the agent device A and executes the processing corresponding to the user utterance.

The processing of each step of the flow shown in FIG. 13 will be described.
(Step S205)
The processing of step S205 is the processing described above with reference to FIG. 12. That is, in step S205, the agent device A transmits the conversion data (user utterance interpretation data B (intent b, slot b)) generated in step S204 to the agent device B.

(Step S211)
The processing of steps S211 to S213 is executed by the agent device B.
First, the agent device B verifies the conversion data (user utterance interpretation data B (intent b, slot b)) that it received from the agent device A in step S205.

Specifically, it determines whether or not all the data required for processing in the agent device B is contained in the data received from the agent device A. That is, it determines whether or not any information is missing from the user utterance interpretation data B (intent b, slot b), which is the conversion data received from the agent device A.
If it determines that information is missing, the process proceeds to step S212.

On the other hand, if it determines that no information is missing and the processing can be executed, the process proceeds to step S206. In this case, the processing of steps S206 to S208 described above with reference to FIG. 12 is executed.

(Step S212)
If the agent device B determines in step S211 that information is missing from the user utterance interpretation data B (intent b, slot b), which is the conversion data received from the agent device A, the agent device B executes the processing of step S212.

In step S212, the agent device B requests the agent device A to ask the user a question in order to acquire the missing information.

In response to the request from the agent device B, the agent device A generates a question for acquiring the missing information and outputs it to the user. That is, it asks the question and then processes the user utterance given in response.
The processing executed for the new user utterance is the same as the processing of steps S102 to S103 and steps S204 to S205 described above with reference to FIGS. 11 and 12.

(Step S213)
Next, in step S213, the agent device B acquires the additional information from the agent device A, then returns to step S211 and determines whether or not any information is still missing.

If information is still missing even after the additional information has been received, the processing of steps S211 to S213 is repeated.
When it is determined in step S211 that no information is missing, the process proceeds to step S206 and the processing is executed.

In this way, when the agent device B determines that the information required to execute the processing is insufficient, it asks the agent device A to acquire the missing information and send it, gathers all the data required for the processing, and then executes the processing.
This additional information transmission allows the agent device B to execute the processing corresponding to the user utterance correctly.

In the processing described with reference to the flowchart shown in FIG. 13, the agent device B, which executes the processing corresponding to the user utterance, determines whether or not any information required for executing the processing is missing.
Besides this mode, a configuration is also possible in which the agent device A side determines whether or not the information required for processing in the agent device B is insufficient.
An example of the processing sequence in this case will be described with reference to the flowchart shown in FIG. 14.

Step S204 and steps S205 to S208 in the flow shown in FIG. 14 are the same as step S204 and steps S205 to S208 of the flow described above with reference to FIG. 12.
Steps S221 to S224 are newly added. The agent device A executes the processing of steps S221 to S224 following the processing of step S204.
The processing of step S204 and steps S221 to S224 will be described.

(Step S204)
If it is determined in step S202, described above with reference to FIG. 12, that another communicable agent device (agent device B) that can process the user request has been detected, the agent device A executes the following processing in step S204.

As described above, the agent device A stores mapping correspondence data for the mapping processing in the storage unit and refers to it to convert the "user utterance interpretation data A" into the "user utterance interpretation data B".

(Step S221)
Next, in step S221, the agent device A determines whether the conversion data (user utterance interpretation data B (intent b, slot b)) generated in step S204 is sufficient as the information required for processing in the agent device B or whether something is missing.

Whether or not something is missing can be determined, for example, by referring to the mapping data stored in the storage unit described above.
If it is determined that information is missing, the process proceeds to step S222.

(Step S222)
If it is determined in step S221 that the conversion data (user utterance interpretation data B (intent b, slot b)) generated in step S204 is insufficient as the information required for processing in the agent device B, the agent device A executes the processing of step S222.

In step S222, the agent device A generates a question for acquiring the missing information, outputs it to the user, and receives from the user a new user utterance as the response to the question.

(Steps S223 to S224)
Next, the agent device A executes the processing of steps S223 to S224 on the new user utterance given in response to the question.

This processing is the same as the processing of steps S102 to S103 described above with reference to FIGS. 11 and 12.
That is, in step S223, the agent device A executes user utterance interpretation processing, using speech recognition, semantic analysis, and dialogue state estimation, on the new user utterance received in step S222 in response to the question.
Further, in step S224, it generates user utterance interpretation data A (intent, slots) consisting of the intention of the user utterance (intent) and the element information (slots), that is, the meaningful elements (significant elements) contained in the utterance.

これらの処理の後、ステップＳ２０４に戻り、ステップＳ２２４で追加生成したユーザ発話解釈データＡ（インテント、スロット）を、先のユーザ発話に基づいて生成済みのユーザ発話解釈データＡ（インテント、スロット）に追加し、追加後の新たなユーザ発話解釈データＡ（インテント、スロット）に基づいて、「ユーザ発話解釈データＢ」を生成（マッピング）する処理を行なう。 After these processes, the process returns to step S204. The user utterance interpretation data A (intent, slot) additionally generated in step S224 is added to the user utterance interpretation data A (intent, slot) already generated from the previous user utterance, and the "user utterance interpretation data B" is generated (mapped) from the new, merged user utterance interpretation data A (intent, slot).

さらに、ステップＳ２２１で、新たに生成した「ユーザ発話解釈データＢ」が、エージェント装置Ｂでの処理に必要な情報として十分であるか否かを判定する。不足がある場合は、ステップＳ２２２〜Ｓ２２４，Ｓ２０４〜Ｓ２２１の処理を繰り返す。 Further, in step S221, it is determined whether or not the newly generated "user utterance interpretation data B" is sufficient as information necessary for processing in the agent device B. If there is a shortage, the processes of steps S222 to S224 and S204 to S221 are repeated.

ステップＳ２２１で新たに生成した「ユーザ発話解釈データＢ」が、エージェント装置Ｂでの処理に必要な情報として十分であると判定された場合は、ステップＳ２０５に進み、変換データ（ユーザ発話解釈データＢ）をエージェント装置Ｂに送信する。 If it is determined in step S221 that the newly generated "user utterance interpretation data B" is sufficient as the information necessary for processing in the agent device B, the process proceeds to step S205, and the conversion data (user utterance interpretation data B) is transmitted to the agent device B.

その後、ステップＳ２０６においてエージェント装置Ｂにおいて送信データに基づく処理が実行される。
エージェント装置Ｂは、処理の実行に必要となる情報をすべて取得したうえで、処理を実行することが可能となり、ユーザ発話に応じた処理を正確に実行することが可能となる。 After that, in step S206, the agent device B executes a process based on the transmitted data.
The agent device B can execute the process after acquiring all the information necessary for executing the process, and can accurately execute the process according to the user's utterance.
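The loop over steps S204, S221, and S222 to S224 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the layout of `MAPPING_DATA`, the intent and slot names, the `ask_user` callback, and the `max_turns` limit are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Tuple

Slots = Dict[str, str]

# Hypothetical mapping data: for each intent of agent device A, the matching
# intent of agent device B, the slot-name mapping, and the slots B requires.
MAPPING_DATA = {
    "order_pizza": {
        "intent_b": "PizzaOrder",
        "slot_map": {"time": "deliveryTime", "menu": "item"},
        "required_b": ["deliveryTime", "item"],
    },
}


def convert(intent_a: str, slots_a: Slots) -> Tuple[str, Slots]:
    """Step S204: map interpretation data A to interpretation data B."""
    entry = MAPPING_DATA[intent_a]
    slot_map = entry["slot_map"]
    slots_b = {slot_map[k]: v for k, v in slots_a.items() if k in slot_map}
    return entry["intent_b"], slots_b


def missing_slots(intent_a: str, slots_b: Slots) -> List[str]:
    """Step S221: list the slots B needs that the conversion data lacks."""
    required = MAPPING_DATA[intent_a]["required_b"]
    return [name for name in required if name not in slots_b]


def build_request(intent_a: str, slots_a: Slots,
                  ask_user: Callable[[List[str]], Slots],
                  max_turns: int = 5) -> Tuple[str, Slots]:
    """Repeat S204 -> S221 -> S222..S224 until nothing is missing."""
    merged = dict(slots_a)
    for _ in range(max_turns):
        intent_b, slots_b = convert(intent_a, merged)      # S204
        missing = missing_slots(intent_a, slots_b)         # S221
        if not missing:
            return intent_b, slots_b                       # ready for S205
        # S222 to S224: ask a question, interpret the reply as A-side slots,
        # and add them to the interpretation data A collected so far.
        merged.update(ask_user(missing))
    raise RuntimeError("required information is still missing")
```

Here `ask_user` stands in for the question output and the interpretation of the reply; it receives the missing slot names and returns new slots in agent device A's naming.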

［８．複数のユーザ発話の管理処理例について］
次に、複数のユーザ発話の管理処理例について説明する。 [8. About management processing example of multiple user utterances]
Next, an example of management processing of a plurality of user utterances will be described.

ユーザ発話を直接、入力するエージェント装置Ａは、連続して同一ユーザ、あるいは異なる複数のユーザからユーザ発話を入力する場合がある。 The agent device A that directly inputs the user utterance may continuously input the user utterance from the same user or a plurality of different users.

エージェント装置Ａは、このように入力する複数のユーザ発話の各々について、１つの共通タスク（処理）についての発話であるか、別のタスク（処理）についての発話であるかを区別して管理しないと、誤った処理を行なってしまう可能性がある。 Unless the agent device A manages each of the user utterances input in this way by distinguishing whether it concerns one common task (process) or a different task (process), it may perform erroneous processing.

このような誤った処理の発生を防止するため、エージェント装置は、ユーザ発話の各々にタスクＩＤを設定した対話履歴情報、すなわちコンテキストを記憶部に記録する。
図１５は、記憶部に記録される対話履歴情報（コンテキスト）のデータ例を示す図である。 In order to prevent the occurrence of such an erroneous process, the agent device records the dialogue history information in which the task ID is set for each user utterance, that is, the context in the storage unit.
FIG. 15 is a diagram showing a data example of dialogue history information (context) recorded in the storage unit.

図１５には、時間（ｔ１），（ｔ２）に入力した以下の２つのユーザ発話に関する登録データの例を示している。
時間ｔ１のユーザ発話＝ピザ注文したいです。１２：００に配達で
時間ｔ２のユーザ発話＝マルゲリータ１つで
これら２つのユーザ発話に関する対話履歴情報（コンテキスト）の登録データである。 FIG. 15 shows an example of registration data for the following two user utterances input at times (t1) and (t2).
User utterance at time t1 = I want to order a pizza. Delivery at 12:00.
User utterance at time t2 = One Margherita.
This is the registration data of the dialogue history information (context) for these two user utterances.

これらのユーザ発話を入力するエージェント装置Ａのデータ処理部は、図１５に示す対話履歴情報（コンテキスト）の登録データを生成して記憶部に格納する。
すなわち各ユーザ発話について、以下の対応データを生成して記録する。
（ａ）タスクＩＤ
（ｂ）ユーザ発話（テキストデータ）
（ｃ）発話ユーザ識別子
（ｄ）ユーザ発話解釈データ（インテント，スロット）
（ｅ）処理実行エージェント識別子 The data processing unit of the agent device A for inputting these user utterances generates the registration data of the dialogue history information (context) shown in FIG. 15 and stores it in the storage unit.
That is, the following corresponding data is generated and recorded for each user's utterance.
(a) Task ID
(b) User utterance (text data)
(c) Speaking user identifier
(d) User utterance interpretation data (intent, slot)
(e) Processing execution agent identifier
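As a rough illustration, one entry of the dialogue history information (context) could be held in a record like the one below. The field names, identifier formats, and slot values are assumptions made for this sketch and do not reproduce the actual contents of FIG. 15.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DialogueHistoryRecord:
    task_id: str                 # (a) task ID
    utterance_text: str          # (b) user utterance (text data)
    user_id: str                 # (c) speaking user identifier
    intent: str                  # (d) interpretation data: intent
    slots: Dict[str, str]        # (d) interpretation data: slots
    executing_agent_id: str      # (e) processing execution agent identifier


# Two utterances judged to belong to one task share the same task ID.
# All concrete values here are hypothetical.
context: List[DialogueHistoryRecord] = [
    DialogueHistoryRecord(
        task_id="task-001",
        utterance_text="I want to order a pizza. Delivery at 12:00.",
        user_id="user-1",
        intent="order_pizza",
        slots={"delivery_time": "12:00"},
        executing_agent_id="agent-B",
    ),
    DialogueHistoryRecord(
        task_id="task-001",
        utterance_text="One Margherita.",
        user_id="user-1",
        intent="order_pizza",
        slots={"item": "Margherita", "quantity": "1"},
        executing_agent_id="agent-B",
    ),
]
```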

なお、タスクＩＤは、１つのタスクに関する発話である場合は、同一のＩＤを設定する。
図１５に示す例は、
時間ｔ１のユーザ発話＝ピザ注文したいです。１２：００に配達で
時間ｔ２のユーザ発話＝マルゲリータ１つで
これら２つのユーザ発話をエージェント装置Ａのデータ処理部が同一タスクと判定し、同一のタスクＩＤを設定した例である。 The same task ID is set for utterances that relate to one task.
The example shown in FIG. 15 is one in which the data processing unit of the agent device A has determined that the following two user utterances belong to the same task and has set the same task ID for them.
User utterance at time t1 = I want to order a pizza. Delivery at 12:00.
User utterance at time t2 = One Margherita.

同一のタスクＩＤを設定するか否かの判定基準としては、例えば各発話の時間間隔や、発話解釈処理において推定するユーザ発話対応のドメインの類似性が利用される。 As a criterion for determining whether or not to set the same task ID, for example, the time interval of each utterance and the similarity of the domain corresponding to the user utterance estimated in the utterance interpretation process are used.

具体的には、複数のユーザ発話の時間間隔が予め規定したしきい値時間以下である場合、これらのユーザ発話は同一タスクと判定し、同一のタスクＩＤを設定する。
また、各ユーザ発話に対して実行する発話解釈処理において推定されたユーザ発話対応のドメインが類似している場合は、これらのユーザ発話は同一タスクと判定し、同一のタスクＩＤを設定する。 Specifically, when the time interval between a plurality of user utterances is equal to or less than a predetermined threshold time, these user utterances are determined to be the same task, and the same task ID is set.
Further, when the domains corresponding to the user utterances estimated in the utterance interpretation process executed for each user utterance are similar, it is determined that these user utterances are the same task, and the same task ID is set.

なお、ドメインとは、ユーザ発話の意味領域であり、先に説明したインテントの上位概念である。
例えば、
ユーザ発話＝洗濯機のタイマーを１２：００に設定して
このユーザ発話の発話解釈データとして得られるデータの一例は、
（インテント）洗濯機タイマー設定
（スロット）開始時刻＝１２：００
これらのデータであるが、さらに、（インテント）の上位概念の意味領域を示す（ドメイン）として、
（ドメイン）家電制御
このようなドメインについても推定される。 The domain is a semantic area of user utterance, and is a superordinate concept of the intent described above.
For example, consider the following user utterance.
User utterance = Set the washing machine timer to 12:00
An example of the data obtained as utterance interpretation data for this user utterance is:
(Intent) Washing machine timer setting
(Slot) Start time = 12:00
In addition, the following domain, which indicates the semantic area that is the superordinate concept of the (intent), is also estimated:
(Domain) Home appliance control

すなわちエージェント装置は、ユーザ発話に対する発話解釈処理において、ドメイン、インテンント、スロットに対応する発話解釈データを決定する。 That is, in the utterance interpretation process for a user utterance, the agent device determines utterance interpretation data corresponding to the domain, the intent, and the slot.
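The washing machine example can be written down as a small data structure. This is only a sketch: the class name and field names are assumptions, while the domain, intent, and slot values are the ones given in the example above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class UtteranceInterpretation:
    domain: str              # semantic area, superordinate concept of the intent
    intent: str              # intention of the user utterance
    slots: Dict[str, str]    # meaningful elements contained in the utterance


# "Set the washing machine timer to 12:00"
washing_machine = UtteranceInterpretation(
    domain="home appliance control",
    intent="washing machine timer setting",
    slots={"start time": "12:00"},
)
```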

エージェント装置は、入力したユーザ発話に対して、ユーザ発話解釈処理を実行し、複数のユーザ発話間隔が、しきい値時間以内であり、かつそのドメインが類似する場合には、同一タスクに関する発話であると判定し、同一のタスクＩＤを設定して記録する。
図１５に示す２つのユーザ発話は、このような発話に関する対話履歴情報（コンテキスト）のデータ例である。 The agent device executes the user utterance interpretation process on the input user utterances, and if the interval between the user utterances is within the threshold time and their domains are similar, it determines that the utterances relate to the same task, and sets and records the same task ID.
The two user utterances shown in FIG. 15 are data examples of dialogue history information (context) related to such utterances.

図１６は、図１５と異なる発話に関する対話履歴情報（コンテキスト）のデータ例である。
図１６には、時間（ｔ１），（ｔ２）に入力した以下の２つのユーザ発話に関する登録データの例を示している。
時間ｔ１のユーザ発話＝トマトののったピザ注文したいです。１２：００に配達で
時間ｔ２のユーザ発話＝明日の朝８：００にタイマー設定して
これら２つのユーザ発話に関する対話履歴情報（コンテキスト）の登録データである。 FIG. 16 is a data example of dialogue history information (context) for utterances different from those of FIG. 15.
FIG. 16 shows an example of registration data for the following two user utterances input at times (t1) and (t2).
User utterance at time t1 = I want to order a pizza with tomatoes on it. Delivery at 12:00.
User utterance at time t2 = Set a timer for 8:00 tomorrow morning.
This is the registration data of the dialogue history information (context) for these two user utterances.

これらのユーザ発話が、エージェント装置に連続して入力された場合、エージェント装置は、入力したユーザ発話に対して、ユーザ発話解釈処理を実行する。
複数のユーザ発話間隔が、しきい値時間以内であり、かつそのドメインが類似する場合には、同一タスクに関する発話であると判定し、同一のタスクＩＤを設定して記録する。 When these user utterances are continuously input to the agent device, the agent device executes the user utterance interpretation process for the input user utterances.
If the utterance intervals of a plurality of users are within the threshold time and the domains are similar, it is determined that the utterances are related to the same task, and the same task ID is set and recorded.

しかし、図１６に示す例の場合、
時間ｔ１のユーザ発話＝トマトののったピザ注文したいです。１２：００に配達で
時間ｔ２のユーザ発話＝明日の朝８：００にタイマー設定して
これら２つのユーザ発話に対する発話解釈処理の結果として得られる２つの発話対応のドメインは異なるドメインとなる。
例えば、
時間ｔ１のユーザ発話＝トマトののったピザ注文したいです。１２：００に配達で
このユーザ発話のドメインは例えば、
（ドメイン）デリバリ処理
である。 However, in the case of the example shown in FIG. 16, the two domains obtained as a result of the utterance interpretation process for the following two user utterances are different.
User utterance at time t1 = I want to order a pizza with tomatoes on it. Delivery at 12:00.
User utterance at time t2 = Set a timer for 8:00 tomorrow morning.
For example, the domain of the user utterance at time t1 is:
(Domain) Delivery processing

一方、
時間ｔ２のユーザ発話＝明日の朝８：００にタイマー設定して
このユーザ発話のドメインは例えば、
（ドメイン）タイマー設定
である。 On the other hand, the domain of the user utterance at time t2 (Set a timer for 8:00 tomorrow morning) is, for example:
(Domain) Timer setting

従って、エージェント装置のデータ処理部は、これら２つのユーザ発話は同一タスクに関する発話ではないと判定し、これら２つの発話に対して異なるタスクＩＤを設定する。 Therefore, the data processing unit of the agent device determines that these two user utterances are not utterances related to the same task, and sets different task IDs for these two utterances.
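The task ID decision for the FIG. 15 and FIG. 16 cases can be sketched as below. Several points are assumptions made for illustration: the threshold value, the use of seconds for time, the reduction of "similar domains" to string equality, and the comparison of each utterance only with the one immediately before it.

```python
from dataclasses import dataclass
from typing import List, Optional

THRESHOLD_SECONDS = 30.0  # hypothetical threshold time


@dataclass
class Utterance:
    time: float                    # input time in seconds
    text: str
    domain: str                    # domain estimated by utterance interpretation
    task_id: Optional[int] = None


def assign_task_ids(utterances: List[Utterance],
                    threshold: float = THRESHOLD_SECONDS) -> List[Utterance]:
    """Give consecutive utterances the same task ID when they are close in
    time and their domains match; otherwise start a new task ID."""
    next_id = 1
    prev: Optional[Utterance] = None
    for u in utterances:
        same_task = (
            prev is not None
            and u.time - prev.time <= threshold
            and u.domain == prev.domain   # simplification of "similar"
        )
        if same_task:
            u.task_id = prev.task_id
        else:
            u.task_id = next_id
            next_id += 1
        prev = u
    return utterances


# FIG. 15 case: both utterances are assumed to fall in a delivery domain.
fig15 = assign_task_ids([
    Utterance(0.0, "I want to order a pizza. Delivery at 12:00.", "delivery processing"),
    Utterance(10.0, "One Margherita.", "delivery processing"),
])

# FIG. 16 case: the second utterance belongs to the timer setting domain.
fig16 = assign_task_ids([
    Utterance(0.0, "I want to order a pizza with tomatoes on it. Delivery at 12:00.", "delivery processing"),
    Utterance(10.0, "Set a timer for 8:00 tomorrow morning.", "timer setting"),
])
```

With these inputs the two FIG. 15 utterances share one task ID, while the two FIG. 16 utterances receive different task IDs.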

なお、同一タスクＩＤが設定された複数の発話に対する処理は、基本的には１つのエージェント装置が実行する。異なるタスクＩＤが設定された複数の発話に対する処理は、それぞれのタスクＩＤ単位で処理を実行するエージェント装置がことなってもよい。 Processing for a plurality of utterances for which the same task ID is set is basically executed by one agent device. For utterances with different task IDs, the agent device that executes the processing may differ for each task ID.

［９．その他の実施例について］
次に、その他の実施例について説明する。 [9. About other examples]
Next, other examples will be described.

上述した実施例では、ユーザ発話を直接、入力するエージェント装置Ａと、ユーザ発話を直接、入力せず、エージェント装置Ａと通信を行なってユーザ発話に基づく処理を実行するエージェント装置Ｂの２台のエージェント装置を利用したシステムに基づく実施例について説明した。 In the above-described embodiment, there are two units, an agent device A that directly inputs the user utterance and an agent device B that communicates with the agent device A and executes processing based on the user utterance without directly inputting the user utterance. An example based on a system using an agent device has been described.

この他、例えば、図１７に示すように３台以上のエージェント装置を利用したシステム構成例としてもよい。
図１７に示すシステムは、ユーザ発話を直接、入力するエージェント装置Ａ，１０と、ユーザ発話を直接、入力せず、エージェント装置Ａ，１０と通信を行なってユーザ発話に基づく処理を実行するエージェント装置Ｂ，２０、エージェント装置Ｃ，３０の３台のエージェント装置を利用したシステムである。 In addition, for example, as shown in FIG. 17, a system configuration example using three or more agent devices may be used.
The system shown in FIG. 17 uses three agent devices: an agent device A, 10 that directly inputs user utterances, and an agent device B, 20 and an agent device C, 30 that do not directly input user utterances but communicate with the agent device A, 10 and execute processing based on the user utterances.

なお、各エージェント装置Ａ〜Ｃの記憶部には、先に図８を参照して説明したようなエージェント装置リストが記録されている。例えば、エージェント装置Ａ，１０は、このリストを参照して、ユーザ要求転送先のエージェント装置を選択する。 The storage unit of each of the agent devices A to C holds an agent device list such as the one described above with reference to FIG. 8. For example, the agent device A, 10 refers to this list to select the agent device to which the user request is forwarded.

先に説明したように、図８に示すエージェント装置リストは、通信可能な他のエージェント装置の識別子、エージェント装置の機能、通信用アドレスを対応付けて記録したエージェント装置リストである。
エージェント装置Ａ，１０は、エージェント装置Ａ，１０の記憶部に格納された図８に示すエージェント装置リストを参照してユーザ要求転送先のエージェント装置を選択する処理を行なうことが可能となる。 As described above, the agent device list shown in FIG. 8 is a list of agent devices recorded in association with identifiers of other agent devices capable of communication, functions of the agent devices, and communication addresses.
The agent device A, 10 can select the agent device to which a user request is transferred by referring to the agent device list shown in FIG. 8, which is stored in the storage unit of the agent device A, 10.
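A lookup against such an agent device list might look like the sketch below. The entry layout follows the three items named above (identifier, functions, communication address); the concrete identifiers, function names, and addresses are hypothetical, and picking the first matching entry is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AgentEntry:
    agent_id: str                  # identifier of a communicable agent device
    functions: Tuple[str, ...]     # functions the agent device provides
    address: str                   # communication address


# Hypothetical list held in the storage unit of agent device A.
AGENT_LIST: List[AgentEntry] = [
    AgentEntry("agent-B", ("pizza_order", "taxi_booking"), "192.0.2.20"),
    AgentEntry("agent-C", ("timer", "home_appliance_control"), "192.0.2.30"),
]


def select_agent(agent_list: List[AgentEntry],
                 required_function: str) -> Optional[AgentEntry]:
    """Return the first listed agent device that offers the function."""
    for entry in agent_list:
        if required_function in entry.functions:
            return entry
    return None
```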

さらに、上述した実施例では、ユーザ発話を直接、入力するエージェント装置Ａ，１０が、ユーザ発話を直接、入力せず、エージェント装置Ａ，１０と通信を行なってユーザ発話に基づく処理を実行するエージェント装置Ｂ，２０に対して、エージェント装置Ｂ，２０が理解可能な「ユーザ発話解釈データＢ」を送信する実施例を説明した。 Further, the above-described embodiment described an example in which the agent device A, 10, which directly inputs user utterances, transmits "user utterance interpretation data B" that the agent device B, 20 can understand to the agent device B, 20, which does not directly input user utterances but communicates with the agent device A, 10 and executes processing based on the user utterances.

エージェント装置Ａ，１０が実行する処理は、ユーザ発話解釈処理アルゴリズムＡに従って生成したユーザ発話解釈データＡ（インテントａ、スロットａ）を、エージェント装置Ｂ，２０が理解可能なユーザ発話解釈データＢ（インテントｂ、スロットｂ）に変換する処理（マッピング処理）である。
エージェント装置Ａ，１０は、このエージェント装置Ｂ，２０が理解可能なユーザ発話解釈データＢ（インテントｂ、スロットｂ）をエージェント装置Ｂ，２０に送信していた。 The processing executed by the agent device A, 10 is a process (mapping process) of converting the user utterance interpretation data A (intent a, slot a), generated according to the user utterance interpretation processing algorithm A, into user utterance interpretation data B (intent b, slot b) that the agent device B, 20 can understand.
The agent device A, 10 transmitted this user utterance interpretation data B (intent b, slot b), which the agent device B, 20 can understand, to the agent device B, 20.

エージェント装置Ａ，１０が生成する変換データ（ユーザ発話解釈データＢ（インテントｂ、スロットｂ））は、例えば、エージェント装置Ｂ，２０、またはエージェント装置Ｂ，２０の管理サーバ等が提供するＡＰＩ（ＡｐｐｌｉｃａｔｉｏｎＰｒｏｇｒａｍｍｉｎｇＩｎｔｅｒｆａｃｅ）が入力データとして許容したデータである。
エージェント装置Ａ，１０は、生成した変換データ（ユーザ発話解釈データＢ（インテントｂ、スロットｂ））をこのＡＰＩを利用した処理（ＡＰＩのＣａｌｌ処理）により、エージェント装置Ｂ，２０に入力することができる。 The conversion data (user utterance interpretation data B (intent b, slot b)) generated by the agent device A, 10 is, for example, data accepted as input by an API (Application Programming Interface) provided by the agent device B, 20 or by a management server or the like of the agent device B, 20.
The agent device A, 10 can input the generated conversion data (user utterance interpretation data B (intent b, slot b)) to the agent device B, 20 by a process using this API (an API call).

例えば、このＡＰＩが、変換データ（ユーザ発話解釈データＢ（インテントｂ、スロットｂ））のみならず、ユーザ発話データそのもの（テキストデータ）を入力データとして許容している場合は、エージェント装置Ａ，１０は、変換データ（ユーザ発話解釈データＢ（インテントｂ、スロットｂ））を生成することなく、ユーザ発話データそのもの（テキストデータ）をエージェント装置Ｂ，２０に送信してもよい。 For example, if this API allows not only the conversion data (user utterance interpretation data B (intent b, slot b)) but also the user utterance data itself (text data) as input data, the agent device A, 10 may transmit the user utterance data itself (text data) to the agent devices B and 20 without generating the conversion data (user utterance interpretation data B (intent b, slot b)).

エージェント装置Ａ，１０からエージェント装置Ｂ，２０に対して入力するデータ形式は、データ入力に利用するＡＰＩの許容データであればよく、例えば、ＡＰＩの設定に応じて以下のようなデータ形式が利用可能となる可能性がある。
１．ユーザ発話文そのもの（テキストデータ）
２．ユーザ発話文の正規化データ（例えば、ひらがなを漢字に変換したデータ、助詞抜けや表記ゆれの修正を行ったデータ等）
４．送信先のエージェント装置の特性に応じた解釈しやすい文に変換したデータ The format of the data input from the agent device A, 10 to the agent device B, 20 may be any format accepted by the API used for the data input. For example, depending on the API settings, the following data formats may be usable.
1. The user utterance itself (text data)
2. Normalized data of user utterances (for example, data obtained by converting hiragana into kanji, data obtained by correcting particle omissions and notational fluctuations, etc.)
4. Data converted into easy-to-interpret sentences according to the characteristics of the destination agent device

なお、さらに、ユーザ発話文に基づくデータのみならず、ユーザの顔画像や、その他のセンサーによるセンサー検出情報等も併せて送信する構成としてもよい。 Further, not only the data based on the user's utterance but also the user's face image, the sensor detection information by other sensors, and the like may be transmitted together.
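The choice between sending conversion data and sending the utterance text can be sketched as follows. The format labels `"interpretation"` and `"text"`, the payload layout, and the `prefer_text` flag are assumptions for this illustration; the destination API's real input specification is not given in the description.

```python
from typing import Callable, Dict, Set, Tuple

Slots = Dict[str, str]


def build_payload(accepted_formats: Set[str],
                  utterance_text: str,
                  convert: Callable[[], Tuple[str, Slots]],
                  prefer_text: bool = False) -> dict:
    """Build the data passed to the destination API.

    When the API accepts the utterance text itself, the conversion
    (mapping) step can be skipped and the text sent as it is.
    """
    text_ok = "text" in accepted_formats
    interp_ok = "interpretation" in accepted_formats
    if text_ok and (prefer_text or not interp_ok):
        return {"format": "text", "text": utterance_text}
    if interp_ok:
        intent_b, slots_b = convert()   # generate conversion data only if needed
        return {"format": "interpretation", "intent": intent_b, "slots": slots_b}
    raise ValueError("the destination API accepts none of the known formats")
```

Passing `convert` as a callable keeps the mapping process from running at all when the text path is taken.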

また、上述した実施例では、、ユーザ発話を直接、入力するエージェント装置Ａ，１０が、ユーザ発話に応じた処理を実行できない場合に、エージェント装置Ｂ，２０に処理を依頼するという実施例について説明した。 Further, the above-described embodiment described an example in which the agent device A, 10, which directly inputs user utterances, requests the agent device B, 20 to perform the processing when it cannot itself execute the processing corresponding to a user utterance.

この他、例えば、各エージェント装置の自然言語処理（ＮＬＰ：ＮａｔｕｒａｌＬａｎｇｕａｇｅＰｒｏｃｅｓｓｉｎｇ）の能力等に応じて、ユーザ発話に対する処理を振り分ける構成としてもよい。 In addition, for example, the processing for the user's utterance may be distributed according to the ability of each agent device for natural language processing (NLP: Natural Language Processing).

例えばユーザが利用可能なエージェント装置が以下の３台あるとする。
エージェント装置Ａ
エージェント装置Ｂ
エージェント装置Ｃ
これらの各エージェント装置は、それぞれ自然言語処理（ＮＬＰ：ＮａｔｕｒａｌＬａｎｇｕａｇｅＰｒｏｃｅｓｓｉｎｇ）の能力、音声認識処理（ＡＳＲ：ＡｕｔｏｍａｔｉｃＳｐｅｅｃｈＲｅｃｏｇｎｉｔｉｏｎ）の能力、音声合成処理（ＴＴＳ：ＴｅｘｔＴｏＳｐｅｅｃｈ）の能力や、装着したセンサーの種類が異なっている。 For example, assume that there are the following three agent devices that can be used by the user.
Agent device A
Agent device B
Agent device C
These agent devices differ in their natural language processing (NLP: Natural Language Processing) capability, speech recognition (ASR: Automatic Speech Recognition) capability, speech synthesis (TTS: Text To Speech) capability, and the types of sensors they are equipped with.

このような設定の場合、ユーザ発話の内容等に応じて、ユーザ発話に対応する処理を実行する最適なエージェント装置を選択し、選択されたエージェント装置が処理を実行する。
このような構成とすることで、ユーザ発話に対応した処理を、より最適なエージェント装置において実行することができる。 In the case of such a setting, the optimum agent device that executes the process corresponding to the user utterance is selected according to the content of the user utterance and the like, and the selected agent device executes the process.
With such a configuration, processing corresponding to user utterance can be executed in a more optimal agent device.

なお、この処理を実現するためには、各エージェント装置の記憶部に先に説明した図８に示すエージェントリストをさらに拡張させたリストを記録しておくことが必要である。すなわち、各エージェント装置の自然言語処理（ＮＬＰ）の能力、音声認識処理（ＡＳＲ）の能力、音声合成処理（ＴＴＳ）の能力や、装着したセンサーの種類等の情報を記録したリストを記憶部に格納し、このリストを参照してユーザ発話に対応する処理を実行するエージェント装置を選択する。 To realize this process, the storage unit of each agent device needs to hold a list that further extends the agent list shown in FIG. 8 described above. That is, a list recording information such as each agent device's natural language processing (NLP) capability, speech recognition (ASR) capability, speech synthesis (TTS) capability, and the types of sensors it is equipped with is stored in the storage unit, and the agent device that executes the processing corresponding to the user utterance is selected by referring to this list.
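One way to picture the extended list and the selection is the sketch below. The description does not say how capability is represented, so the integer scores, the sensor names, and the rule "highest score among agents that have the required sensors" are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class AgentCapabilities:
    agent_id: str
    nlp: int                   # hypothetical capability scores
    asr: int
    tts: int
    sensors: FrozenSet[str]    # types of sensors the device is equipped with


EXTENDED_LIST: List[AgentCapabilities] = [
    AgentCapabilities("agent-A", nlp=2, asr=3, tts=1, sensors=frozenset({"microphone"})),
    AgentCapabilities("agent-B", nlp=3, asr=1, tts=2, sensors=frozenset({"microphone", "camera"})),
    AgentCapabilities("agent-C", nlp=1, asr=2, tts=3, sensors=frozenset({"microphone", "thermometer"})),
]


def select_best_agent(agents: List[AgentCapabilities],
                      capability: str,
                      required_sensors: FrozenSet[str] = frozenset()
                      ) -> Optional[AgentCapabilities]:
    """Pick the agent with the highest score for `capability` among those
    equipped with every required sensor."""
    candidates = [a for a in agents if required_sensors <= a.sensors]
    if not candidates:
        return None
    return max(candidates, key=lambda a: getattr(a, capability))
```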

［１０．エージェント装置（情報処理装置）の構成例について］
次に、本開示の情報処理装置、すなわちエージェント装置の構成例について説明する。 [10. About the configuration example of the agent device (information processing device)]
Next, a configuration example of the information processing device of the present disclosure, that is, the agent device will be described.

図１８は、ユーザ発話を認識して、ユーザ発話に対応する処理や応答を行うエージェント装置１００（情報処理装置）の一構成例を示す図である。 FIG. 18 is a diagram showing a configuration example of an agent device 100 (information processing device) that recognizes a user utterance and performs a process or a response corresponding to the user utterance.

図１８に示すように、エージェント装置１００は、入力部１１０、出力部１２０、データ処理部１３０、通信部１４０、記憶部１５０を有する。 As shown in FIG. 18, the agent device 100 includes an input unit 110, an output unit 120, a data processing unit 130, a communication unit 140, and a storage unit 150.

入力部１１０は、音声入力部（マイク）１１１、画像入力部（カメラ）１１２、センサー１１３を有する。
出力部１２０は、音声出力部（スピーカー）１２１、画像出力部（表示部）１２２を有する。 The input unit 110 includes a voice input unit (microphone) 111, an image input unit (camera) 112, and a sensor 113.
The output unit 120 includes an audio output unit (speaker) 121 and an image output unit (display unit) 122.

なお、音声入力部（マイク）１１１は、例えば図１に示すエージェント装置１０のマイク１２に対応する。
画像入力部（カメラ）１１２は、図１に示すエージェント装置１０のカメラ１１に対応する。
音声出力部（スピーカー）１２１は、図１に示すエージェント装置１０のスピーカー１４に対応する。
画像出力部（表示部）１２２は、図１に示すエージェント装置１０の表示部１３に対応する。
なお、画像出力部（表示部）１２２は、例えば、プロジェクタ等によって構成することも可能であり、また外部装置のテレビの表示部を利用した構成とすることも可能である。 The voice input unit (microphone) 111 corresponds to, for example, the microphone 12 of the agent device 10 shown in FIG.
The image input unit (camera) 112 corresponds to the camera 11 of the agent device 10 shown in FIG.
The audio output unit (speaker) 121 corresponds to the speaker 14 of the agent device 10 shown in FIG.
The image output unit (display unit) 122 corresponds to the display unit 13 of the agent device 10 shown in FIG.
The image output unit (display unit) 122 can be configured by, for example, a projector or the like, or can be configured by using the display unit of a television of an external device.

データ処理部１３０は、入力データ解析部１６０、応答処理実行部１７０を有する。
入力データ解析部１６０は、音声解析部１６１、画像解析部１６２、センサー情報解析部１６３、ユーザ発話対応処理実行制御部１６４、データ変換部１６５を有する。
データ処理実行部１７０は、出力音声生成部１７１、表示情報生成部１７２、ユーザ発話応答処理実行部１７３を有する。 The data processing unit 130 includes an input data analysis unit 160 and a response processing execution unit 170.
The input data analysis unit 160 includes a voice analysis unit 161, an image analysis unit 162, a sensor information analysis unit 163, a user utterance correspondence processing execution control unit 164, and a data conversion unit 165.
The response processing execution unit 170 includes an output voice generation unit 171, a display information generation unit 172, and a user utterance response processing execution unit 173.

ユーザの発話音声はマイクなどの音声入力部１１１に入力される。
音声入力部（マイク）１１１は、入力したユーザ発話音声を音声解析部１６１に入力する。
音声解析部１６１は、例えばＡＳＲ（ＡｕｔｏｍａｔｉｃＳｐｅｅｃｈＲｅｃｏｇｎｉｔｉｏｎ）機能を有し、音声データを複数の単語から構成されるテキストデータに変換する。
さらに、テキストデータに対する発話意味解析処理を実行する。
音声解析部１６１は、例えば、ＮＬＵ（ＮａｔｕｒａｌＬａｎｇｕａｇｅＵｎｄｅｒｓｔａｎｄｉｎｇ）等の自然言語理解機能を有し、テキストデータからユーザ発話の意図（インテント：Ｉｎｔｅｎｔ）や、発話に含まれる意味のある要素（有意要素）である（スロット：Ｓｌｏｔ）を推定する。すなわち、ユーザ発話解釈処理を実行してユーザ発話解釈データ（インテント、スロット）を生成する。 The user's spoken voice is input to a voice input unit 111 such as a microphone.
The voice input unit (microphone) 111 inputs the input user-spoken voice to the voice analysis unit 161.
The voice analysis unit 161 has, for example, an ASR (Automatic Speech Recognition) function, and converts voice data into text data composed of a plurality of words.
Further, the utterance semantic analysis process for the text data is executed.
The voice analysis unit 161 has, for example, a natural language understanding function such as NLU (Natural Language Understanding), and estimates from the text data the intention of the user utterance (intent) and the meaningful elements (significant elements) included in the utterance (slots). That is, it executes the user utterance interpretation process and generates the user utterance interpretation data (intent, slot).

ユーザ発話から、意図（インテント）と、要素（スロット）を正確に推定、取得することができれば、エージェント装置１００は、ユーザ発話に対する正確な処理を行うことができる。 If the intention (intent) and the element (slot) can be accurately estimated and acquired from the user utterance, the agent device 100 can perform accurate processing for the user utterance.

音声解析部１６１によって取得されたユーザ発話解析情報は、記憶部１５０に格納されるとともに、ユーザ発話対応処理実行制御部１６４に出力される。 The user utterance analysis information acquired by the voice analysis unit 161 is stored in the storage unit 150 and output to the user utterance correspondence processing execution control unit 164.

画像入力部１１２は、発話ユーザおよびその周囲の画像を撮影して、画像解析部１６２に入力する。
画像解析部１６２は、発話ユーザの顔の表情やユーザの行動、視線情報、発話ユーザの周囲情報等の解析を行い、この解析結果を記憶部１５０に格納するとともに、ユーザ発話対応処理実行制御部１６４に出力する。 The image input unit 112 captures an image of the speaking user and its surroundings and inputs the image to the image analysis unit 162.
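The image analysis unit 162 analyzes the speaking user's facial expression, behavior, and line-of-sight information, the surroundings of the speaking user, and so on, stores the analysis result in the storage unit 150, and outputs it to the user utterance correspondence processing execution control unit 164.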

センサー１１３は、例えば気温、気圧、ユーザの視線等を解析するために必要となるデータを取得するセンサーによって構成される。センサーの取得情報は、センサー情報解析部１６３に入力される。
センサー情報解析部１６３は、センサー取得情報に基づいて、例えば気温、気圧、ユーザの視線等のデータを取得して、この解析結果を記憶部１５０に格納するとともに、ユーザ発話対応処理実行制御部１６４に出力する。 The sensor 113 is composed of a sensor that acquires data necessary for analyzing, for example, temperature, atmospheric pressure, user's line of sight, and the like. The sensor acquisition information is input to the sensor information analysis unit 163.
The sensor information analysis unit 163 acquires data such as temperature, atmospheric pressure, and the user's line of sight based on the sensor acquisition information, stores the analysis result in the storage unit 150, and outputs it to the user utterance correspondence processing execution control unit 164.

ユーザ発話対応処理実行制御部１６４は、音声解析部１６１の生成したユーザ発話解釈データ（インテント、スロット）や、画像解析部１６２の生成した画像解析情報や、センサー情報解析部１６３の生成したセンサー解析情報を入力して、ユーザ発話に対応する処理を自装置で実行するか、他の通信可能なエージェント装置で実行させるかを決定する。 The user utterance correspondence processing execution control unit 164 receives the user utterance interpretation data (intent, slot) generated by the voice analysis unit 161, the image analysis information generated by the image analysis unit 162, and the sensor analysis information generated by the sensor information analysis unit 163, and determines whether the processing corresponding to the user utterance is to be executed by its own device or by another communicable agent device.

ユーザ発話対応処理実行制御部１６４がユーザ発話に対応する処理を他のエージェント装置で実行させることを決定した場合、データ変換部１６５は、音声解析部１６１が生成したユーザ発話解釈データＡ（インテントａ、スロットａ）を、ユーザ発話対応処理を実行する他のエージェント装置Ｂ，２０が理解可能なユーザ発話解釈データＢ（インテントｂ、スロットｂ）に変換する処理（マッピング処理）を実行する。
データ変換部１６５の生成した変換データは、通信部１４０を介して他のエージェント装置Ｂに送信される。 When the user utterance correspondence processing execution control unit 164 decides to have another agent device execute the processing corresponding to the user utterance, the data conversion unit 165 executes a process (mapping process) of converting the user utterance interpretation data A (intent a, slot a) generated by the voice analysis unit 161 into user utterance interpretation data B (intent b, slot b) that the other agent device B, 20, which executes the user utterance correspondence processing, can understand.
The converted data generated by the data conversion unit 165 is transmitted to another agent device B via the communication unit 140.

記憶部１５０には、ユーザ発話の内容や、ユーザ発話に基づく学習データや、ユーザとの対話履歴情報等のコンテキスト情報、画像出力部（表示部）１２２に出力する表示用データ等が格納される。 The storage unit 150 stores the content of the user's utterance, learning data based on the user's utterance, context information such as dialogue history information with the user, display data to be output to the image output unit (display unit) 122, and the like. ..

記憶部１５０には、さらに、音声解析部１６１が生成したユーザ発話解釈データＡ（インテントａ、スロットａ）を他のエージェント装置が理解可能なユーザ発話解釈データＢ（インテントｂ、スロットｂ）に変換する処理（マッピング処理）を実行するためのマッピングデータが格納されている。
また、記憶部１５０には、さらに先に図８を参照して説明した通信可能な他のエージェント装置の機能情報やアクセス情報（通信アドレス）等を記録したエージェント装置リストが格納されている。 The storage unit 150 further stores mapping data for executing the process (mapping process) of converting the user utterance interpretation data A (intent a, slot a) generated by the voice analysis unit 161 into user utterance interpretation data B (intent b, slot b) that another agent device can understand.
Further, the storage unit 150 stores a list of agent devices in which functional information, access information (communication address), and the like of other communicable agent devices described above with reference to FIG. 8 are recorded.

応答処理実行部１７０は、出力音声生成部１７１、表示情報生成部１７２、ユーザ発話応答処理実行部１７３を有する。
出力音声生成部１７１は、音声解析部１６１の解析結果であるユーザ発話解析データに基づいて、ユーザに対するシステム発話を生成する。
出力音声生成部１７１の生成した応答音声情報は、スピーカー等の音声出力部１２１を介して出力される。 The response processing execution unit 170 includes an output voice generation unit 171, a display information generation unit 172, and a user utterance response processing execution unit 173.
The output voice generation unit 171 generates system utterances for the user based on the user utterance analysis data which is the analysis result of the voice analysis unit 161.
The response voice information generated by the output voice generation unit 171 is output via the voice output unit 121 such as a speaker.

表示情報生成部１７２は、ユーザに対するシステム発話のテキスト情報や、その他の提示情報を表示する。
例えばユーザが世界地図を見せてというユーザ発話を行った場合、世界地図を表示する。
世界地図は、例えばサービス提供サーバから取得可能である。 The display information generation unit 172 displays the text information of the system utterance to the user and other presentation information.
For example, when the user makes a user utterance to show the world map, the world map is displayed.
The world map can be obtained from, for example, a service providing server.

ユーザ発話応答処理実行部１７３は、ユーザ発話に対する処理を実行する。
入力データ解析部１６０のユーザ発話対応処理実行制御部１６４がユーザ発話に対応する処理を自装置で実行することを決定した場合、ユーザ発話応答処理実行部１７３は、ユーザ発話に対する処理を実行する。 The user utterance response processing execution unit 173 executes the processing for the user utterance.
When the user utterance correspondence processing execution control unit 164 of the input data analysis unit 160 decides to execute the processing corresponding to the user utterance in its own device, the user utterance response processing execution unit 173 executes the processing for the user utterance.

例えば、
ユーザ発話＝音楽を再生して
ユーザ発話＝面白い動画を見せて
このような発話である場合、ユーザ発話対応処理実行部１７３は、ユーザ発話に対する処理、すなわち音楽再生処理や、動画再生処理を行う。 For example, consider the following utterances.
User utterance = Play music
User utterance = Show me an interesting video
For such utterances, the user utterance response processing execution unit 173 performs the processing for the user utterance, that is, music playback processing or video playback processing.

ただし、入力データ解析部１６０のユーザ発話対応処理実行制御部１６４がユーザ発話に対応する処理を他のエージェント装置で実行させることを決定した場合は、前述したように、データ変換部１６５が生成した変換データが通信部１４０を介して他のエージェント装置Ｂに送信され、他のエージェント装置Ｂにおいてユーザ発話に対応する処理が実行されることになる。 However, when the user utterance correspondence processing execution control unit 164 of the input data analysis unit 160 decides to execute the processing corresponding to the user utterance by another agent device, the data conversion unit 165 generates the data as described above. The converted data is transmitted to another agent device B via the communication unit 140, and the other agent device B executes a process corresponding to the user's utterance.
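The split between local execution and forwarding can be sketched as a small dispatcher. This is a simplified illustration: deciding by whether a local handler exists for the intent is an assumption (the control unit 164 also receives image and sensor analysis results), and `convert` and `send` are placeholder callables standing in for the data conversion unit 165 and the communication unit 140.

```python
from typing import Callable, Dict, Tuple

Slots = Dict[str, str]
Handler = Callable[[Slots], str]


def handle_utterance(intent_a: str,
                     slots_a: Slots,
                     local_handlers: Dict[str, Handler],
                     convert: Callable[[str, Slots], Tuple[str, Slots]],
                     send: Callable[[Tuple[str, Slots]], str]) -> str:
    """Execute locally when possible, otherwise convert and forward."""
    handler = local_handlers.get(intent_a)
    if handler is not None:
        # Own device executes the process (role of execution unit 173).
        return handler(slots_a)
    # Another agent device executes it: map A -> B, then transmit.
    conversion_data = convert(intent_a, slots_a)
    return send(conversion_data)


# Hypothetical wiring for a quick check.
local = {"play_music": lambda slots: "playing music"}
to_b = lambda intent, slots: ("PizzaOrder", {"item": slots["menu"]})
transmit = lambda data: f"sent {data[0]} to agent-B"
```

For example, `handle_utterance("play_music", {}, local, to_b, transmit)` runs the local handler, while an intent with no local handler goes through `to_b` and `transmit`.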

通信部１４０は、他のエージェント装置との通信処理や、外部サーバ、例えばニュース情報や天気情報、音楽情報等の様々な情報を提供するサーバ、さらに音声解析処理を実行するサーバ等との通信処理に利用される。
図１８に示すデータ処理部１３０の入力データ解析部１６０の実行する処理の一部、例えば音声解析処理や意味解析処理等は、外部サーバを利用して実行する構成としてもよい。 The communication unit 140 performs communication processing with other agent devices, communication processing with an external server, for example, a server that provides various information such as news information, weather information, music information, and a server that executes voice analysis processing. It is used for.
A part of the processing executed by the input data analysis unit 160 of the data processing unit 130 shown in FIG. 18, such as voice analysis processing and semantic analysis processing, may be configured to be executed by using an external server.

なお、本開示のエージェント装置（情報処理装置）１００は、図１９に示すように、いわゆるスマートスピーカー型のエージェント装置に限らず、スマホやＰＣ等のような様々な装置形態とすることが可能である。 As shown in FIG. 19, the agent device (information processing device) 100 of the present disclosure is not limited to a so-called smart speaker type agent device, and can take various device forms such as a smartphone or a PC.

エージェント装置（情報処理装置）１００は、ユーザ１の発話を認識して、ユーザ発話に基づく応答を行う他、例えば、ユーザ発話に応じてテレビ、エアコン等の外部機器２５０の制御も実行する。
例えばユーザ発話が「テレビのチャンネルを１に変えて」、あるいは「エアコンの設定温度を２０度にして」といった要求である場合、エージェント装置（情報処理装置）１００は、このユーザ発話の音声認識結果に基づいて、外部機器２５０に対して制御信号（Ｗｉ−Ｆｉ、赤外光など）を出力して、ユーザ発話に従った制御を実行する。 The agent device (information processing device) 100 recognizes the utterance of the user 1 and performs a response based on the utterance of the user. For example, the agent device (information processing device) 100 also controls an external device 250 such as a television or an air conditioner in response to the utterance of the user.
For example, when the user utterance is a request such as "change the TV channel to 1" or "set the air conditioner temperature to 20 degrees", the agent device (information processing device) 100 outputs a control signal (Wi-Fi, infrared light, etc.) to the external device 250 based on the voice recognition result of the user utterance, and executes control according to the user utterance.

また、エージェント装置（情報処理装置）１００は、ネットワークを介して様々なデータ処理や情報提供を行なうサーバ２００と接続されている。エージェント装置（情報処理装置）１００は、サーバ２００から、ユーザ発話に対する応答を生成するために必要となる情報を取得することが可能である。また、音声認識処理や意味解析処理をサーバに行わせる構成としてもよい。 Further, the agent device (information processing device) 100 is connected to a server 200 that performs various data processing and information provision via a network. The agent device (information processing device) 100 can acquire information necessary for generating a response to a user's utterance from the server 200. Further, the server may be configured to perform voice recognition processing and semantic analysis processing.

［１１．エージェント装置（情報処理装置）のハードウェア構成例について］
次に、図２０を参照して、エージェント装置（情報処理装置）のハードウェア構成例について説明する。
図２０を参照して説明するハードウェアは、先に図１８を参照して説明したエージェント装置（情報処理装置）１００の１つの具体的なハードウェア構成例であり、また、図１９を参照して説明したサーバ２００を構成する情報処理装置のハードウェア構成の一例でもある。 [11. About hardware configuration example of agent device (information processing device)]
Next, a hardware configuration example of the agent device (information processing device) will be described with reference to FIG. 20.
The hardware described with reference to FIG. 20 is one specific hardware configuration example of the agent device (information processing device) 100 described above with reference to FIG. 18, and is also an example of the hardware configuration of the information processing device constituting the server 200 described with reference to FIG. 19.

ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）３０１は、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）３０２、または記憶部３０８に記憶されているプログラムに従って各種の処理を実行する制御部やデータ処理部として機能する。例えば、上述した実施例において説明したシーケンスに従った処理を実行する。ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）３０３には、ＣＰＵ３０１が実行するプログラムやデータなどが記憶される。これらのＣＰＵ３０１、ＲＯＭ３０２、およびＲＡＭ３０３は、バス３０４により相互に接続されている。 The CPU (Central Processing Unit) 301 functions as a control unit or a data processing unit that executes various processes according to a program stored in the ROM (Read Only Memory) 302 or the storage unit 308. For example, the process according to the sequence described in the above-described embodiment is executed. The RAM (Random Access Memory) 303 stores programs and data executed by the CPU 301. These CPU 301, ROM 302, and RAM 303 are connected to each other by a bus 304.

ＣＰＵ３０１はバス３０４を介して入出力インタフェース３０５に接続され、入出力インタフェース３０５には、各種スイッチ、キーボード、マウス、マイクロホン、センサーなどよりなる入力部３０６、ディスプレイ、スピーカーなどよりなる出力部３０７が接続されている。ＣＰＵ３０１は、入力部３０６から入力される指令に対応して各種の処理を実行し、処理結果を例えば出力部３０７に出力する。 The CPU 301 is connected to the input/output interface 305 via the bus 304. An input unit 306 consisting of various switches, a keyboard, a mouse, a microphone, sensors, and the like, and an output unit 307 consisting of a display, a speaker, and the like are connected to the input/output interface 305. The CPU 301 executes various processes in response to commands input from the input unit 306, and outputs the processing results to, for example, the output unit 307.

入出力インタフェース３０５に接続されている記憶部３０８は、例えばハードディスク等からなり、ＣＰＵ３０１が実行するプログラムや各種のデータを記憶する。通信部３０９は、Ｗｉ−Ｆｉ通信、ブルートゥース（登録商標）（ＢＴ）通信、その他インターネットやローカルエリアネットワークなどのネットワークを介したデータ通信の送受信部として機能し、外部の装置と通信する。 The storage unit 308 connected to the input / output interface 305 is composed of, for example, a hard disk or the like, and stores a program executed by the CPU 301 and various data. The communication unit 309 functions as a transmission / reception unit for Wi-Fi communication, Bluetooth (registered trademark) (BT) communication, and other data communication via a network such as the Internet or a local area network, and communicates with an external device.

入出力インタフェース３０５に接続されているドライブ３１０は、磁気ディスク、光ディスク、光磁気ディスク、あるいはメモリカード等の半導体メモリなどのリムーバブルメディア３１１を駆動し、データの記録あるいは読み取りを実行する。 The drive 310 connected to the input / output interface 305 drives a removable medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory such as a memory card, and records or reads data.

［１２．本開示の構成のまとめ］
以上、特定の実施例を参照しながら、本開示の実施例について詳解してきた。しかしながら、本開示の要旨を逸脱しない範囲で当業者が実施例の修正や代用を成し得ることは自明である。すなわち、例示という形態で本発明を開示してきたのであり、限定的に解釈されるべきではない。本開示の要旨を判断するためには、特許請求の範囲の欄を参酌すべきである。 [12. Summary of the structure of this disclosure]
As described above, the examples of the present disclosure have been described in detail with reference to the specific examples. However, it is self-evident that one of ordinary skill in the art can modify or substitute the examples without departing from the gist of the present disclosure. That is, the present invention has been disclosed in the form of an example, and should not be construed in a limited manner. In order to judge the gist of this disclosure, the column of claims should be taken into consideration.

なお、本明細書において開示した技術は、以下のような構成をとることができる。
（１）ユーザ発話を入力する音声入力部と、
前記ユーザ発話の解析を実行してユーザ発話解釈データを生成するデータ処理部を有し、
前記データ処理部は、
前記ユーザ発話に対応した処理を外部の第２情報処理装置に実行させる場合、
前記ユーザ発話解釈データを変換して、前記第２情報処理装置が理解可能な変換データを生成し、前記第２情報処理装置に送信する情報処理装置。 The technology disclosed in the present specification can have the following configuration.
(1) An information processing device comprising:
a voice input unit that inputs a user utterance; and
a data processing unit that analyzes the user utterance and generates user utterance interpretation data,
wherein, when causing an external second information processing device to execute a process corresponding to the user utterance,
the data processing unit converts the user utterance interpretation data to generate conversion data that the second information processing device can understand, and transmits the conversion data to the second information processing device.

（２）前記データ処理部が生成するユーザ発話解釈データは、
ユーザ発話の意図に相当するインテントと、ユーザ発話に含まれる要素情報に相当するスロットを有するデータである（１）に記載の情報処理装置。 (2) The user utterance interpretation data generated by the data processing unit is
The information processing device according to (1), which is data having an intent corresponding to the intention of the user's utterance and a slot corresponding to the element information included in the user's utterance.

（３）前記データ処理部は、
前記データ処理部が生成したインテントとスロットを含むユーザ発話解釈データを、前記第２情報処理装置が理解可能なインテントとスロットを含む変換データに変換する処理を実行する（２）に記載の情報処理装置。 (3) The data processing unit
The information processing device according to (2), which executes a process of converting the user utterance interpretation data including the intent and the slot generated by the data processing unit into conversion data including an intent and a slot that the second information processing device can understand.

（４）前記データ処理部は、
前記データ処理部の実行するユーザ発話解釈処理アルゴリズムに従って生成されるユーザ発話解釈データに含まれるインテントおよびスロットと、前記第２情報処理装置が理解可能なインテントとスロットを対応付けたマッピンクデータを参照して、前記変換データの生成処理を実行する（２）または（３）に記載の情報処理装置。 (4) The data processing unit
The information processing device according to (2) or (3), which executes the conversion data generation process by referring to mapping data that associates the intents and slots included in the user utterance interpretation data, generated according to the user utterance interpretation processing algorithm executed by the data processing unit, with intents and slots that the second information processing device can understand.

（５）前記情報処理装置は、
前記マッピングデータを格納した記憶部を有する（４）に記載の情報処理装置。 (5) The information processing device is
The information processing apparatus according to (4), which has a storage unit that stores the mapping data.

（６）前記データ処理部は、
前記第２情報処理装置が理解可能なデータの入力を許容するＡＰＩ（ＡｐｐｌｉｃａｔｉｏｎＰｒｏｇｒａｍｉｎｇＩｎｔｅｒｆａｃｅ）に対して前記変換データを入力する（１）〜（５）いずれかに記載の情報処理装置。 (6) The data processing unit
The information processing device according to any one of (1) to (5), which inputs the conversion data to an API (Application Programming Interface) that accepts input of data the second information processing device can understand.

（７）前記データ処理部は、
前記ユーザ発話に対応した処理を自装置で実行できるか否かを判定し、実行できないと判定した場合に、前記変換データを生成して前記第２情報処理装置に送信する（１）〜（６）いずれかに記載の情報処理装置。 (7) The data processing unit
The information processing device according to any one of (1) to (6), which determines whether or not the process corresponding to the user utterance can be executed by the own device and, when it determines that the process cannot be executed, generates the conversion data and transmits it to the second information processing device.

（８）前記データ処理部は、
ユーザから、前記ユーザ発話に対応した処理を他装置で実行させることの要求発話を入力した場合、
前記変換データを生成して前記第２情報処理装置に送信する（１）〜（７）いずれかに記載の情報処理装置。 (8) The data processing unit
When the user inputs a request utterance to execute the process corresponding to the user utterance on another device,
The information processing apparatus according to any one of (1) to (7), which generates the converted data and transmits the converted data to the second information processing apparatus.

（９）前記データ処理部は、
通信可能な他の情報処理装置各々の機能情報とアクセス情報を登録したリストを参照して前記ユーザ発話に対応した処理を実行させる外部装置を決定する（１）〜（８）いずれかに記載の情報処理装置。 (9) The data processing unit
The information processing device according to any one of (1) to (8), which determines the external device that is to execute the process corresponding to the user utterance by referring to a list in which function information and access information of each of the other communicable information processing devices are registered.

（１０）前記データ処理部は、
生成した前記変換データが、前記第２情報処理装置においてユーザ発話対応の処理を実行するための必要情報の少なくとも一部を含まず、不足情報がある場合、
ユーザに対して、前記不足情報を取得するための質問を実行する（１）〜（９）いずれかに記載の情報処理装置。 (10) The data processing unit is
When the generated conversion data lacks at least a part of the information necessary for the second information processing device to execute the process corresponding to the user utterance, so that there is missing information,
The information processing device according to any one of (1) to (9), which asks the user a question for acquiring the missing information.

（１１）前記データ処理部は、
ユーザ発話に対応するタスクＩＤを設定してユーザ発話の管理処理を実行する（１）〜（１０）いずれかに記載の情報処理装置。 (11) The data processing unit is
The information processing device according to any one of (1) to (10), wherein a task ID corresponding to a user utterance is set and a user utterance management process is executed.

（１２）前記データ処理部は、
複数のユーザ発話を、しきい値時間以内の時間間隔で入力し、かつ、前記複数のユーザ発話のドメインが類似する場合、前記複数のユーザ発話に立てして同一のタスクＩＤを設定する（１１）に記載の情報処理装置。 (12) The data processing unit is
The information processing device according to (11), which sets the same task ID for a plurality of user utterances when the plurality of user utterances are input at time intervals within a threshold time and the domains of the plurality of user utterances are similar.

（１３）複数の情報処理装置を有する情報処理システムであり、
第１情報処理装置は、
ユーザ発話を入力する音声入力部と、
前記ユーザ発話の解析を実行してユーザ発話解釈データを生成するデータ処理部を有し、
前記データ処理部は、
前記ユーザ発話に対応した処理を外部の第２情報処理装置に実行させる場合、
前記ユーザ発話解釈データを変換して、前記第２情報処理装置が理解可能な変換データを生成し、前記第２情報処理装置に送信し、
前記第２情報処理装置は、
前記第１情報処理装置から受信する前記変換データに基づいて、前記ユーザ発話に対応した処理を実行する情報処理システム。 (13) An information processing system having a plurality of information processing devices.
The first information processing device is
A voice input unit for inputting user utterances,
It has a data processing unit that executes the analysis of the user utterance and generates the user utterance interpretation data.
The data processing unit
When the external second information processing device executes the process corresponding to the user's utterance,
The user utterance interpretation data is converted to generate conversion data that can be understood by the second information processing device, and the data is transmitted to the second information processing device.
The second information processing device is
An information processing system that executes processing corresponding to the user's utterance based on the converted data received from the first information processing apparatus.

（１４）前記第１情報処理装置は、
前記ユーザ発話の意図に相当するインテントと、ユーザ発話に含まれる要素情報に相当するスロットを有するユーザ発話解釈データを生成し、
前記第２情報処理装置が理解可能なインテントとスロットを有する変換データを生成して、前記第２情報処理装置に送信する（１３）に記載の情報処理システム。 (14) The first information processing device is
User utterance interpretation data having an intent corresponding to the intention of the user utterance and a slot corresponding to the element information included in the user utterance is generated.
The information processing system according to (13), wherein the conversion data having an intent and a slot that the second information processing apparatus can understand is generated and transmitted to the second information processing apparatus.

（１５）情報処理装置において実行する情報処理方法であり、
音声入力部が、ユーザ発話を入力し、
データ処理部が、
前記ユーザ発話の解析を実行してユーザ発話解釈データを生成するデータ処理を実行し、
前記データ処理部は、
前記ユーザ発話に対応した処理を外部の第２情報処理装置に実行させる場合、
前記ユーザ発話解釈データを変換して、前記第２情報処理装置が理解可能な変換データを生成し、前記第２情報処理装置に送信する情報処理方法。 (15) An information processing method executed in an information processing device.
The voice input unit inputs a user utterance,
The data processing unit
executes data processing that analyzes the user utterance and generates user utterance interpretation data.
The data processing unit
When the external second information processing device executes the process corresponding to the user's utterance,
An information processing method that converts the user utterance interpretation data to generate conversion data that can be understood by the second information processing device, and transmits the converted data to the second information processing device.

（１６）複数の情報処理装置を有する情報処理システムにおいて実行する情報処理方法であり、
第１情報処理装置が、
入力したユーザ発話に対応した処理を外部の第２情報処理装置に実行させる場合、
前記ユーザ発話の解析を実行してユーザ発話解釈データを生成し、
前記ユーザ発話解釈データを変換して、前記第２情報処理装置が理解可能な変換データを生成し、前記第２情報処理装置に送信する処理を実行し、
前記第２情報処理装置が、
前記第１情報処理装置から受信する前記変換データに基づいて、前記ユーザ発話に対応した処理を実行する情報処理方法。 (16) An information processing method executed in an information processing system having a plurality of information processing devices.
The first information processing device
When causing an external second information processing device to execute the process corresponding to an input user utterance,
analyzes the user utterance to generate user utterance interpretation data,
converts the user utterance interpretation data to generate conversion data that the second information processing device can understand, and executes a process of transmitting the conversion data to the second information processing device.
The second information processing device
An information processing method that executes processing corresponding to the user's utterance based on the converted data received from the first information processing apparatus.

（１７）情報処理装置において情報処理を実行させるプログラムであり、
前記プログラムは、データ処理部に、
ユーザ発話の解析を実行してユーザ発話解釈データを生成させ、
前記ユーザ発話に対応した処理を外部の第２情報処理装置に実行させる場合、
前記ユーザ発話解釈データを変換して、前記第２情報処理装置が理解可能な変換データを生成させ、前記第２情報処理装置に送信させるプログラム。 (17) A program that executes information processing in an information processing device.
The program causes the data processing unit to
analyze a user utterance to generate user utterance interpretation data, and,
when causing an external second information processing device to execute the process corresponding to the user utterance,
convert the user utterance interpretation data to generate conversion data that the second information processing device can understand, and transmit the conversion data to the second information processing device.
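
As an illustration of configurations (11) and (12) above, the following sketch assigns task IDs to successive user utterances. It is only one possible reading: the class `TaskManager`, the 30-second default threshold, and the reduction of "similar domains" to simple equality in `domains_similar` are assumptions made for this example and are not specified in the disclosure.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    domain: str       # domain label from utterance analysis (hypothetical values)
    timestamp: float  # seconds


def domains_similar(a: str, b: str) -> bool:
    """Assumption: two domains count as similar only when they are equal."""
    return a == b


class TaskManager:
    """Assigns a task ID to each utterance, reusing the previous ID when
    the utterance follows closely in time and stays in a similar domain."""

    def __init__(self, threshold_seconds: float = 30.0):
        self.threshold_seconds = threshold_seconds
        self._next_id = itertools.count(1)
        self._last = None  # (previous utterance, its task ID)

    def assign(self, utterance: Utterance) -> int:
        if self._last is not None:
            previous, previous_id = self._last
            within_threshold = (
                utterance.timestamp - previous.timestamp <= self.threshold_seconds
            )
            if within_threshold and domains_similar(previous.domain, utterance.domain):
                self._last = (utterance, previous_id)
                return previous_id  # same task continues
        task_id = next(self._next_id)  # otherwise start a new task
        self._last = (utterance, task_id)
        return task_id


manager = TaskManager()
first = manager.assign(Utterance("order a pizza", "food", 0.0))
second = manager.assign(Utterance("make it a large one", "food", 10.0))
third = manager.assign(Utterance("what is the weather", "weather", 15.0))
print(first, second, third)
```

Comparing each utterance only with the immediately preceding one keeps the sketch short; a fuller design might track several open tasks at once.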

また、明細書中において説明した一連の処理はハードウェア、またはソフトウェア、あるいは両者の複合構成によって実行することが可能である。ソフトウェアによる処理を実行する場合は、処理シーケンスを記録したプログラムを、専用のハードウェアに組み込まれたコンピュータ内のメモリにインストールして実行させるか、あるいは、各種処理が実行可能な汎用コンピュータにプログラムをインストールして実行させることが可能である。例えば、プログラムは記録媒体に予め記録しておくことができる。記録媒体からコンピュータにインストールする他、ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）、インターネットといったネットワークを介してプログラムを受信し、内蔵するハードディスク等の記録媒体にインストールすることができる。 In addition, the series of processes described in the specification can be executed by hardware, software, or a composite configuration of both. When executing processing by software, install the program that records the processing sequence in the memory in the computer built in the dedicated hardware and execute it, or execute the program on a general-purpose computer that can execute various processing. It can be installed and run. For example, the program can be pre-recorded on a recording medium. In addition to installing on a computer from a recording medium, the program can be received via a network such as LAN (Local Area Network) or the Internet and installed on a recording medium such as a built-in hard disk.

なお、明細書に記載された各種の処理は、記載に従って時系列に実行されるのみならず、処理を実行する装置の処理能力あるいは必要に応じて並列的にあるいは個別に実行されてもよい。また、本明細書においてシステムとは、複数の装置の論理的集合構成であり、各構成の装置が同一筐体内にあるものには限らない。 The various processes described in the specification are not only executed in chronological order according to the description, but may also be executed in parallel or individually as required by the processing capacity of the device that executes the processes. Further, in the present specification, the system is a logical set configuration of a plurality of devices, and the devices having each configuration are not limited to those in the same housing.

以上、説明したように、本開示の一実施例の構成によれば、ユーザ発話を入力した装置以外の装置においてユーザ発話に応じた処理を実行させることを可能とした装置、方法が実現される。
具体的には、例えば、ユーザ発話に対応した処理を外部の第２情報処理装置に実行させる場合、ユーザ発話の解析を実行してユーザ発話解釈データを生成し、生成したユーザ発話解釈データを変換して、第２情報処理装置が理解可能な変換データを生成して第２情報処理装置に送信する。ユーザ発話解釈データは、ユーザ発話の意図に相当するインテントと、ユーザ発話に含まれる要素情報に相当するスロットを有し、データ処理部はインテントとスロットを含むユーザ発話解釈データを、第２情報処理装置が理解可能なインテントとスロットを含むデータに変換する。
本構成により、ユーザ発話を入力した装置以外の装置においてユーザ発話に応じた処理を実行させることを可能とした装置、方法が実現される。 As described above, according to the configuration of the embodiment of the present disclosure, a device and a method capable of executing a process according to the user's utterance in a device other than the device in which the user's utterance is input are realized. ..
Specifically, for example, when an external second information processing device is made to execute the process corresponding to a user utterance, the user utterance is analyzed to generate user utterance interpretation data, and the generated user utterance interpretation data is converted to generate conversion data that the second information processing device can understand, which is then transmitted to the second information processing device. The user utterance interpretation data has an intent corresponding to the intention of the user utterance and a slot corresponding to element information included in the user utterance, and the data processing unit converts the user utterance interpretation data including the intent and the slot into data including an intent and a slot that the second information processing device can understand.
With this configuration, a device and a method that enable a device other than the device that input the user utterance to execute the process according to the user utterance are realized.
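
The intent and slot conversion summarized above can be sketched as a table lookup. This is a hedged example only: the contents of `MAPPING_DATA`, the intent and slot names, the `required` field, and the function `convert` are hypothetical and chosen for illustration; the disclosure states that mapping data associates intents and slots of the two devices but does not prescribe this layout.

```python
# Hypothetical mapping data: source intent -> target intent, slot renames,
# and the slots the second device is assumed to require.
MAPPING_DATA = {
    "OrderFood": {
        "intent": "PlaceOrder",
        "slots": {"dish": "item", "count": "quantity"},
        "required": ["item", "quantity"],
    },
}


def convert(interpretation, mapping_data):
    """Convert interpretation data and report required slots that are missing."""
    rule = mapping_data[interpretation["intent"]]  # KeyError if no mapping exists
    slots = {
        rule["slots"][name]: value
        for name, value in interpretation["slots"].items()
        if name in rule["slots"]  # slots without a mapping are dropped
    }
    missing = [name for name in rule["required"] if name not in slots]
    return {"intent": rule["intent"], "slots": slots}, missing


converted, missing = convert(
    {"intent": "OrderFood", "slots": {"dish": "pizza"}}, MAPPING_DATA
)
if missing:
    # The first device would ask the user for these before transmitting.
    print("Need to ask the user for:", ", ".join(missing))
```

Returning the missing slot names alongside the converted data gives the caller a natural place to ask the user a follow-up question before the conversion data is sent to the second device.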

１０，２０，３０エージェント装置
１１カメラ
１２マイク
１３表示部
１４スピーカー
１１０入力部
１１１音声入力部
１１２画像入力部
１１３センサー
１２０出力部
１２１音声出力部
１２２画像出力部
１３０データ処理部
１４０通信部
１５０記憶部
１６０入力データ解析部
１６１音声解析部
１６２画像解析部
１６３センサー情報解析部
１６４ユーザ発話対応処理実行制御部
１６５データ変換部
１７０応答処理実行部
１７１出力音声生成部
１７２表示情報生成部
１７３ユーザ発話応答処理実行部
３０１ＣＰＵ
３０２ＲＯＭ
３０３ＲＡＭ
３０４バス
３０５入出力インタフェース
３０６入力部
３０７出力部
３０８記憶部
３０９通信部
３１０ドライブ
３１１リムーバブルメディア
10, 20, 30 Agent device
11 Camera
12 Microphone
13 Display unit
14 Speaker
110 Input unit
111 Voice input unit
112 Image input unit
113 Sensor
120 Output unit
121 Voice output unit
122 Image output unit
130 Data processing unit
140 Communication unit
150 Storage unit
160 Input data analysis unit
161 Voice analysis unit
162 Image analysis unit
163 Sensor information analysis unit
164 User utterance correspondence processing execution control unit
165 Data conversion unit
170 Response processing execution unit
171 Output voice generation unit
172 Display information generation unit
173 User utterance response processing execution unit
301 CPU
302 ROM
303 RAM
304 Bus
305 Input/output interface
306 Input unit
307 Output unit
308 Storage unit
309 Communication unit
310 Drive
311 Removable media

Claims

An information processing device comprising:
a voice input unit that inputs a user utterance; and
a data processing unit that analyzes the user utterance and generates user utterance interpretation data,
wherein, when causing an external second information processing device to execute a process corresponding to the user utterance,
the data processing unit converts the user utterance interpretation data to generate conversion data that the second information processing device can understand, and transmits the conversion data to the second information processing device.

The user utterance interpretation data generated by the data processing unit is
The information processing device according to claim 1, which is data having an intent corresponding to the intention of the user's utterance and a slot corresponding to the element information included in the user's utterance.

The data processing unit
The information processing device according to claim 2, which executes a process of converting the user utterance interpretation data including the intent and the slot generated by the data processing unit into conversion data including an intent and a slot that the second information processing device can understand.

The data processing unit
The information processing device according to claim 2, which executes the conversion data generation process by referring to mapping data that associates the intents and slots included in the user utterance interpretation data, generated according to the user utterance interpretation processing algorithm executed by the data processing unit, with intents and slots that the second information processing device can understand.

The information processing device
The information processing device according to claim 4, further comprising a storage unit that stores the mapping data.

The data processing unit
The information processing device according to claim 1, which inputs the conversion data to an API (Application Programming Interface) that accepts input of data the second information processing device can understand.

The data processing unit
The information processing device according to claim 1, which determines whether or not the process corresponding to the user utterance can be executed by the own device and, when it determines that the process cannot be executed, generates the conversion data and transmits it to the second information processing device.

The data processing unit
When the user inputs a request utterance to execute the process corresponding to the user utterance on another device,
The information processing device according to claim 1, wherein the converted data is generated and transmitted to the second information processing device.

The data processing unit
The information processing device according to claim 1, wherein an external device that executes a process corresponding to the user's utterance is determined by referring to a list in which functional information and access information of each of the other communicable information processing devices are registered.

The data processing unit
When the generated conversion data lacks at least a part of the information necessary for the second information processing device to execute the process corresponding to the user utterance, so that there is missing information,
The information processing device according to claim 1, which asks the user a question for acquiring the missing information.

The data processing unit
The information processing device according to claim 1, wherein a task ID corresponding to the user utterance is set and the user utterance management process is executed.

The data processing unit
The information processing device according to claim 11, which sets the same task ID for a plurality of user utterances when the plurality of user utterances are input at time intervals within a threshold time and the domains of the plurality of user utterances are similar.

It is an information processing system that has multiple information processing devices.
The first information processing device is
A voice input unit for inputting user utterances,
It has a data processing unit that executes the analysis of the user utterance and generates the user utterance interpretation data.
The data processing unit
When the external second information processing device executes the process corresponding to the user's utterance,
The user utterance interpretation data is converted to generate conversion data that can be understood by the second information processing device, and the data is transmitted to the second information processing device.
The second information processing device is
An information processing system that executes processing corresponding to the user's utterance based on the converted data received from the first information processing apparatus.

The first information processing device is
User utterance interpretation data having an intent corresponding to the intention of the user utterance and a slot corresponding to the element information included in the user utterance is generated.
The information processing system according to claim 13, wherein the conversion data having an intent and a slot that the second information processing apparatus can understand is generated and transmitted to the second information processing apparatus.

It is an information processing method executed in an information processing device.
The voice input unit inputs a user utterance,
The data processing unit
executes data processing that analyzes the user utterance and generates user utterance interpretation data.
The data processing unit
When the external second information processing device executes the process corresponding to the user's utterance,
An information processing method that converts the user utterance interpretation data to generate conversion data that can be understood by the second information processing device, and transmits the converted data to the second information processing device.

It is an information processing method executed in an information processing system having a plurality of information processing devices.
The first information processing device
When causing an external second information processing device to execute the process corresponding to an input user utterance,
analyzes the user utterance to generate user utterance interpretation data,
converts the user utterance interpretation data to generate conversion data that the second information processing device can understand, and executes a process of transmitting the conversion data to the second information processing device.
The second information processing device
An information processing method that executes processing corresponding to the user's utterance based on the converted data received from the first information processing apparatus.

A program that executes information processing in an information processing device.
The program causes the data processing unit to
analyze a user utterance to generate user utterance interpretation data, and,
when causing an external second information processing device to execute the process corresponding to the user utterance,
convert the user utterance interpretation data to generate conversion data that the second information processing device can understand, and transmit the conversion data to the second information processing device.