JP6795028B2

JP6795028B2 - Information processing system and information processing method

Info

Publication number: JP6795028B2
Application number: JP2018506772A
Authority: JP
Inventors: 井原 圭吾 (Keigo Ihara)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2016-03-22
Filing date: 2016-12-19
Publication date: 2020-12-02
Anticipated expiration: 2036-12-19
Also published as: JP7070638B2; WO2017163509A1; JP2021039370A; JPWO2017163509A1

Description

The present disclosure relates to an information processing system and an information processing method.

In recent years, advances in communication technology have made exchanging messages over networks commonplace. Using an information processing terminal such as a smartphone, mobile phone terminal, or tablet terminal, a user can read messages sent from other terminals and send messages.

Agent systems that automatically respond to a user's messages on an information processing terminal have also been proposed. For example, Patent Document 1 below describes an agent creation device that lets the user create an agent by freely combining agent-creation data such as clothing, hairstyle, belongings, and personality.

Patent Document 2 below describes a device that interprets the user's cultural background, including hobbies and preferences, from user information and causes the agent to act in a manner corresponding to that background. Patent Document 3 below describes an emotion generation device for a voice interface agent. This device learns unpredictable incidental conditions that characteristically accompany situations evoking a given emotion. It can then evoke that emotion in new situations that satisfy the learned conditions.

Patent Document 1: Japanese Unexamined Patent Publication No. 2003-186589
Patent Document 2: Japanese Unexamined Patent Publication No. 2003-106846
Patent Document 3: Japanese Unexamined Patent Publication No. H11-265239

However variously an agent's character can be configured, the agent exists only as the user's conversation partner. The user cannot have the experience of becoming the agent character he or she likes.

The present disclosure therefore proposes an information processing system and an information processing method that further enhance the entertainment value of an agent system by letting the user personally experience the agent's character through the agent.

According to the present disclosure, there is proposed an information processing system including the following units:
an agent storage unit that stores a phoneme database and an utterance phrase database for each of a plurality of types of characters;
a communication unit that receives, via a user's client terminal, a selection signal selecting a specific character, and transmits an utterance phrase from that character's utterance phrase database;
a control unit that, based on the user's message received via the communication unit, generates a converted message in which the message is converted into the voice of the specific character using that character's phoneme database, generates an utterance phrase of the specific character corresponding to the user's message using the utterance phrase database, and performs control to return the generated converted message and utterance phrase to the client terminal.

According to the present disclosure, there is also proposed an information processing method that includes the following steps performed by a processor:
storing, in an agent storage unit, a phoneme database and an utterance phrase database for each of a plurality of types of characters;
receiving, via a user's client terminal, a selection signal selecting a specific character, and transmitting, by a communication unit, an utterance phrase from that character's utterance phrase database;
controlling, by a control unit, generation of a converted message in which the user's message received via the communication unit is converted into the voice of the specific character using that character's phoneme database, generation of an utterance phrase of the specific character corresponding to the user's message using the utterance phrase database, and return of the generated converted message and utterance phrase to the client terminal.
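The following minimal Python sketch makes the claimed division of labor concrete. It assumes an in-memory agent storage unit keyed by character name. The names `CharacterData`, `AgentServer`, and `convert_voice` are hypothetical. Voice conversion is stubbed out; a real system would synthesize audio from the phoneme DB. The phrase lookup uses simple keyword containment, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CharacterData:
    """One character's entry in the agent storage unit (hypothetical layout)."""
    name: str
    phoneme_db: dict      # stand-in for phoneme pieces + prosody model
    phrase_db: dict       # keyword -> utterance phrase
    greeting: str = ""    # phrase returned when the character is selected


def convert_voice(text: str, phoneme_db: dict) -> str:
    # Placeholder for synthesis with the character's phoneme DB: it only
    # tags the text with the voice it would be rendered in.
    return f"[{phoneme_db.get('voice', 'default')}] {text}"


class AgentServer:
    def __init__(self, characters):
        self.storage = {c.name: c for c in characters}  # agent storage unit
        self.selected = {}                              # user_id -> character name

    def select_character(self, user_id: str, name: str) -> str:
        """Handle a selection signal; return an utterance phrase of that character."""
        if name not in self.storage:
            raise KeyError(f"unknown character: {name}")
        self.selected[user_id] = name
        return self.storage[name].greeting

    def handle_message(self, user_id: str, message: str) -> tuple:
        """Return (converted message, utterance phrase or None) for the client terminal."""
        char = self.storage[self.selected[user_id]]
        converted = convert_voice(message, char.phoneme_db)
        phrase: Optional[str] = next(
            (p for keyword, p in char.phrase_db.items() if keyword in message), None
        )
        return converted, phrase
```

Take a single hero character whose phrase DB maps "transform" to "Justice has arrived!". Selecting the hero returns its greeting. The message "let's transform" then yields the converted message `[hero] let's transform` together with that phrase.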

As described above, the present disclosure makes it possible to further enhance the entertainment value of an agent system by letting the user personally experience the agent's character through the agent.

Note that the above effects are not necessarily limiting. Together with or instead of them, any of the effects described in this specification, or other effects that can be understood from it, may be achieved.

Diagram illustrating an outline of the information processing system according to an embodiment of the present disclosure.
Diagram showing the overall configuration of the communication control system according to the present embodiment.
Block diagram showing an example configuration of the voice agent server according to the present embodiment.
Diagram showing an example configuration of the dialogue processing unit according to the present embodiment.
Flowchart showing conversation DB generation processing according to the present embodiment.
Flowchart showing phoneme DB generation processing according to the present embodiment.
Flowchart showing dialogue control processing according to the present embodiment.
Diagram illustrating an example data structure of the conversation DB according to the present embodiment.
Flowchart showing conversation DB update processing according to the present embodiment.
Flowchart showing processing for migrating conversation data from the personalization layer to the common layer according to the present embodiment.
Diagram illustrating migration of conversation data to the basic-dialogue conversation DB according to the present embodiment.
Flowchart showing processing for migrating conversation data to the basic-dialogue DB according to the present embodiment.
Diagram showing an example of advertisement information registered in the advertisement DB according to the present embodiment.
Flowchart showing advertisement content insertion processing according to the present embodiment.
Diagram showing an example configuration of the dialogue processing unit according to the present embodiment.
Diagram showing an example configuration of the user management unit according to the present embodiment.
Diagram showing an example configuration of the automatic utterance control unit according to the present embodiment.
Diagram showing an example configuration of the scenario management unit according to the present embodiment.
Sequence diagram showing agent application purchase processing according to the present embodiment.
Diagram showing an example display screen when purchasing an agent application according to the present embodiment.
Diagram showing an example account registration screen according to the present embodiment.
Diagram showing an example main screen according to the present embodiment.
Diagram illustrating voice conversion processing according to the present embodiment.
Sequence diagram showing voice conversion processing according to the present embodiment.
Diagram illustrating becoming a character through AR transformation according to the present embodiment.
Sequence diagram showing location-dependent automatic utterance processing according to the present embodiment.
Sequence diagram showing automatic utterance processing that depends on a person's attributes and facial expression according to the present embodiment.
Sequence diagram showing automatic utterance processing that depends on user behavior according to the present embodiment.
Sequence diagram showing automatic utterance processing that depends on the user's psychological state according to the present embodiment.
Flowchart showing scenario acquisition processing according to the present embodiment.
Diagram showing example screens displayed on the client terminal up to scenario purchase according to the present embodiment.
Diagram showing example screens displayed on the client terminal up to scenario purchase according to the present embodiment.
Sequence diagram showing scenario participation registration processing according to the present embodiment.
Diagram illustrating the structure of a scenario according to the present embodiment.
Diagram showing an example scenario participation screen according to the present embodiment.
Diagram showing an example notification that a scenario is starting when the agent app according to the present embodiment is running in the foreground.
Diagram showing an example notification that a scenario is starting when the agent app according to the present embodiment is not running.
Sequence diagram showing scenario execution processing according to the present embodiment.
Diagram showing an example display screen when an event occurs according to the present embodiment.
Sequence diagram showing execution of an event triggered by the user's position according to the present embodiment.
Sequence diagram showing execution of an event triggered by the positions of a plurality of users according to the present embodiment.
Diagram showing an example display screen prompting the user to hold up the camera according to the present embodiment.
Diagram illustrating overlay display of another character according to the present embodiment.
Sequence diagram showing execution of an event triggered by output from each sensor according to the present embodiment.
Sequence diagram showing processing for determining scenario completion according to the present embodiment.
Diagram showing an example notification screen upon scenario completion according to the present embodiment.

Preferred embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. In this specification and the drawings, components with substantially the same functional configuration are given the same reference numerals, and duplicate description is omitted.

The description proceeds in the following order.
1. Outline of the information processing system according to an embodiment of the present disclosure
2. Configuration
2-1. System configuration
2-2. Server configuration
3. System operation processing
3-1. Conversation data registration processing
3-2. Phoneme DB generation processing
3-3. Dialogue control processing
3-4. Conversation DB update processing
3-5. Advertisement insertion processing
4. Voice output control processing
4-1. Configuration
4-2. Operation processing
(4-2-1. Agent purchase processing)
(4-2-2. Voice conversion processing)
(4-2-3. Automatic utterance processing)
(4-2-4. Scenario acquisition processing)
(4-2-5. Scenario execution processing)
5. Summary

<< 1. Outline of the information processing system according to an embodiment of the present disclosure >>
The information processing system according to an embodiment of the present disclosure further enhances the entertainment value of an agent system by letting the user personally experience the agent's character through the agent. An outline of the information processing system according to the present embodiment is described below with reference to FIG. 1.

FIG. 1 is a diagram illustrating an outline of the information processing system according to an embodiment of the present disclosure. Dialogue with the agent takes place via a client terminal 1, such as a smartphone owned by the user. The client terminal 1 has a microphone and a speaker and supports spoken dialogue with the user.

As noted above, however variously an agent's character can be configured, the agent exists only as the user's conversation partner. The user cannot have the experience of becoming the agent character he or she likes.

Accordingly, in the present embodiment, the agent converses with the user automatically by voice. In addition, the system lets the user personally experience the agent's character through the agent, which further enhances the entertainment value of the agent system.

For example, while the agent program is running and the user speaks, the information processing system according to the present embodiment converts the user's speech W₁ into speech W₂ in the voice of the agent character 10. It then plays W₂ through the user's earphones or the like, as shown in FIG. 1 (voice conversion processing). Because the user hears his or her own words in the voice of the agent character 10 (for example, a hero), the user gets the experience of becoming the agent character 10. Following the converted speech W₂, the system may also output, in the voice of the same agent character 10, a predetermined phrase corresponding to the user's speech W₁ (speech W₃ in FIG. 1; automatic utterance processing). The predetermined phrase is, for example, a phrase registered in association with a keyword (or phrase) contained in the user's speech W₁. Having the character automatically utter such a phrase right after the user's converted speech further deepens the experience of becoming that character. The system may also play sound effects corresponding to the keyword (or phrase) contained in the speech W₁ or to the associated predetermined phrase.
The automatic utterance processing according to the present embodiment is not limited to responding to the user's speech. It may also play phrases and sound effects corresponding to the user's actions, location, facial expression, the date and time, and the like.
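Keyword-triggered automatic utterance can be sketched as building a playback queue. The queue holds the user's words in the character's voice (W₂) first, followed by any registered phrase (W₃) and its sound effect. This is a minimal sketch that assumes a hypothetical `PHRASE_TABLE` and case-insensitive substring matching on the recognized text. The source does not specify the matching method, the data format, or the sound-effect files.

```python
# Hypothetical phrase table: keyword -> (character phrase, sound-effect file)
PHRASE_TABLE = {
    "transform": ("Power, activate!", "se_transform.wav"),
    "attack": ("Take this!", "se_punch.wav"),
}


def build_playback(recognized_text: str, table=PHRASE_TABLE) -> list:
    """Return the playback queue for one user utterance.

    The converted voice (W2) always comes first; each matched keyword then
    appends its character phrase (W3) followed by its sound effect.
    """
    queue = [f"voice:{recognized_text}"]  # W2: the user's words in the character's voice
    text = recognized_text.lower()
    for keyword, (phrase, sound_effect) in table.items():
        if keyword in text:
            queue.append(f"phrase:{phrase}")  # W3: automatic utterance
            queue.append(f"se:{sound_effect}")
    return queue
```

Keeping W₂ first and appending matches afterwards mirrors the order described above, in which the character's phrase follows the converted speech.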

The information processing system according to the present embodiment can also let the user take part in a scenario as the agent character 10. For example, suppose the user has purchased a favorite agent and a scenario in advance. The scenario then unfolds according to the user's real-world situation, based on information detected by the positioning unit (such as GPS) and the various sensors (acceleration sensor, gyro sensor, geomagnetic sensor, microphone, camera, and the like) mounted on the client terminal 1. Specifically, various events occur as the scenario progresses, and the user experiences each event by acting or speaking in response to it.

In the present embodiment, it is also possible to hold an event in which the user meets a person playing another agent character in the same scenario. For example, a scenario program may contain a scene in which characters meet at a specific time and place. While it is running, a special event occurs when the people playing those characters go to that place at that time. Specifically, the information processing system may cause each client terminal 1 to convert the other party's speech into the voice of that party's agent character and play it through earphones or the like. The users can then enjoy talking to each other as characters in the scenario. In addition, when the user holds the client terminal 1 up toward the other party and its camera captures that person, an image of the other party's agent character is superimposed on them. This conveys the other user's transformation into his or her agent character intuitively, both audibly and visually.
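The source does not state how "a specific place at a specific time" is checked. The sketch below is one plausible approach under stated assumptions. Each participant supplies GPS coordinates from the positioning unit. The meeting spot has a circular geofence of hypothetical radius `radius_m` and a time window. Distance is measured with the haversine great-circle formula. The function names are illustrative only.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def meeting_event_fires(positions, spot, start: datetime, end: datetime,
                        now: datetime, radius_m: float = 50.0) -> bool:
    """True if `now` is within [start, end] and every participant is inside the geofence."""
    if not (start <= now <= end):
        return False
    return all(haversine_m(lat, lon, *spot) <= radius_m for lat, lon in positions)
```

A fixed radius is the simplest choice. A real system would probably widen it to allow for GPS error and might debounce the check over several position updates.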

Note that the information processing system (agent system) according to the present embodiment is not limited to a voice agent that responds by voice. It may instead be a text-based agent that responds with text on the client terminal 1.

The system configuration of the information processing system according to the present embodiment, and the basic configuration and operation of each device, are described in detail below. Hereinafter, the information processing system according to the present embodiment is referred to as a communication control system.

<< 2. Configuration >>
<2-1. System configuration>
Next, the overall configuration of the communication control system according to the present embodiment is described with reference to FIG. 2. FIG. 2 is a diagram showing the overall configuration of the communication control system according to the present embodiment.

As shown in FIG. 2, the communication control system according to the present embodiment includes a client terminal 1 and an agent server 2.

The agent server 2 connects to the client terminal 1 via a network 3 and exchanges data with it. Specifically, the client terminal 1 picks up and transmits the user's speech, and the agent server 2 generates a response voice to that speech and transmits it back to the client terminal 1. The agent server 2 has a phoneme DB (database) for each of one or more agents and can generate the response voice in the voice of a specific agent. An agent may be a character from a manga, anime, game, TV drama, movie, or the like, or an entertainer, a celebrity, a historical figure, or the like. It need not be a specific individual; it may be, for example, an average person of a given generation. An agent may also be an animal or an anthropomorphized character. It may also be a person reflecting the personality of the user himself or herself, or of the user's friends, family, acquaintances, and so on.

The agent server 2 can also generate response content that reflects each agent's personality. Through the agent, the agent server 2 can provide various services by way of dialogue with the user, such as schedule management, sending and receiving messages, and providing information.

The client terminal 1 is not limited to a smartphone as shown in FIG. 2. It may be, for example, a mobile phone terminal, a tablet terminal, a PC (personal computer), a game console, or a wearable terminal (smart eyeglasses, smart band, smart watch, smart neckband, or the like). The client terminal 1 may also be a robot.

This concludes the outline of the communication control system according to the present embodiment. Next, the configuration of its agent server 2 is described in detail with reference to FIG. 3.

<2-2. Agent server 2>
FIG. 3 is a block diagram showing an example configuration of the agent server 2 according to the present embodiment. As shown in FIG. 3, the agent server 2 has the following components:
a voice agent I/F (interface) 20;
a dialogue processing unit 30;
a phoneme storage unit 40;
a conversation DB generation unit 50;
a phoneme DB generation unit 60;
an advertisement insertion processing unit 70;
an advertisement DB 72;
a feedback acquisition processing unit 80.

The voice agent I/F 20 functions as an input/output unit for voice data, a voice recognition unit, and a voice generation unit. The input/output unit is assumed to be a communication unit that exchanges data with the client terminal 1 via the network 3. The voice agent I/F 20 can receive the user's speech from the client terminal 1 and convert it into text by voice recognition. It also takes the agent's answer sentence data (text) output from the dialogue processing unit 30 and vocalizes it using the phoneme data for that agent. It then transmits the generated response voice of the agent to the client terminal 1.

The dialogue processing unit 30 functions as an arithmetic processing device and a control device, and controls overall operation within the agent server 2 according to various programs. It is realized by an electronic circuit such as a CPU (Central Processing Unit) or a microprocessor. The dialogue processing unit 30 according to the present embodiment functions as a basic dialogue processing unit 31, a character A dialogue processing unit 32, a person B dialogue processing unit 33, and a person C dialogue processing unit 34.

The character A dialogue processing unit 32, the person B dialogue processing unit 33, and the person C dialogue processing unit 34 each provide dialogue specialized for a particular agent. "Character A", "person B", and "person C" are used here as example agents. The present embodiment is not limited to them and may include dialogue processing units specialized for many more agents. The basic dialogue processing unit 31 provides general-purpose dialogue that is not specialized for any agent.
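One way to picture agent-specific and general-purpose dialogue units coexisting is a dispatcher keyed by agent name. This sketch rests on several assumptions. `make_unit` and `DialogueProcessor` are hypothetical names. Each unit is modeled as a lookup in its own conversation DB. Falling back to the basic unit when the agent-specific unit has no answer is an assumed policy that this section does not describe.

```python
from typing import Callable, Dict, Optional

# A dialogue unit maps a question to an answer, or None if it has no answer.
DialogueUnit = Callable[[str], Optional[str]]


def make_unit(conversation_db: Dict[str, str]) -> DialogueUnit:
    """Build a unit backed by its own conversation DB (question -> answer)."""
    return lambda question: conversation_db.get(question)


class DialogueProcessor:
    def __init__(self, basic: DialogueUnit, per_agent: Dict[str, DialogueUnit]):
        self.basic = basic          # general-purpose unit (cf. basic dialogue unit 31)
        self.per_agent = per_agent  # agent-specific units (cf. units 32 to 34)

    def respond(self, agent: str, question: str) -> Optional[str]:
        unit = self.per_agent.get(agent)
        answer = unit(question) if unit else None
        # Assumed fallback: ask the general-purpose unit if the agent unit has nothing.
        return answer if answer is not None else self.basic(question)
```

Adding another agent then means registering one more entry in `per_agent`. This matches the text's point that units for many more agents may exist.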

The basic configuration common to the basic dialogue processing unit 31, the character A dialogue processing unit 32, the person B dialogue processing unit 33, and the person C dialogue processing unit 34 is described here with reference to FIG. 4.

FIG. 4 is a diagram showing an example configuration of a dialogue processing unit 300 according to the present embodiment. As shown in FIG. 4, the dialogue processing unit 300 has a question sentence search unit 310, an answer sentence generation unit 320, a phoneme data acquisition unit 340, and a conversation DB 330. The conversation DB 330 stores conversation data in which question sentence data and answer sentence data are paired. In a dialogue processing unit specialized for an agent, the conversation DB 330 stores conversation data specialized for that agent. In the general-purpose dialogue processing unit, it stores general-purpose conversation data not specialized for any agent (that is, basic conversation data).

The question sentence search unit 310 receives from the voice agent I/F 20 a question sentence, which is the text obtained by recognizing the user's question voice (an example of uttered speech). It searches the conversation DB 330 for question sentence data matching that sentence. The answer sentence generation unit 320 extracts from the conversation DB 330 the answer sentence data stored with the question sentence data found by the search unit 310, and generates answer sentence data from it. The phoneme data acquisition unit 340 acquires, from the phoneme storage unit 40 of the corresponding agent, phoneme data for vocalizing the answer sentence generated by the answer sentence generation unit 320. For example, the character A dialogue processing unit 32 acquires from the character A phoneme DB 42 the phoneme data for reproducing the answer sentence data in character A's voice. The dialogue processing unit 300 then outputs the generated answer sentence data and the acquired phoneme data to the voice agent I/F 20.
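The search, answer-generation, and phoneme-acquisition steps can be sketched as follows. The sketch assumes the conversation DB is a list of (question, answer) pairs. It treats "matching" as equality after a light normalization: NFKC folding of full-width characters, lower-casing, and trimming trailing punctuation and spaces. The normalization and the names `normalize` and `answer_question` are assumptions. The phoneme DB is passed through as an opaque object.

```python
import unicodedata


def normalize(text: str) -> str:
    # Assumed normalization: NFKC folds full-width forms (e.g. "！" -> "!"),
    # then lower-case and trim surrounding spaces and trailing punctuation.
    return unicodedata.normalize("NFKC", text).strip().lower().rstrip("?!. ")


def answer_question(question_text: str, conversation_db: list, phoneme_db: dict):
    """Question search -> answer generation -> phoneme data acquisition."""
    key = normalize(question_text)
    # Units 310/320: find the stored question that matches, take its paired answer.
    answer = next((a for q, a in conversation_db if normalize(q) == key), None)
    if answer is None:
        return None
    # Unit 340: hand back the phoneme data needed to voice the answer.
    return answer, phoneme_db
```

Exact matching after normalization keeps the sketch faithful to the text's "matching" search. Fuzzier retrieval would be a separate design decision.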

The phoneme storage unit 40 stores a phoneme database for generating each agent's voice. It can be realized by ROM (Read Only Memory) and RAM (Random Access Memory). In the example shown in FIG. 3, it stores a basic phoneme DB 41, a character A phoneme DB 42, a person B phoneme DB 43, and a person C phoneme DB 44. Each phoneme DB stores phoneme data, for example phoneme pieces together with a prosody model that serves as their control information.

The conversation DB generation unit 50 has a function of generating the conversation DB 330 of the dialogue processing unit 300. For example, the conversation DB generation unit 50 collects anticipated question sentence data, collects the answer sentence data corresponding to each question, and then stores the question sentence data and answer sentence data as pairs. When a predetermined number of conversation data items (pairs of question sentence data and answer sentence data, for example, 100 pairs) have been collected, the conversation DB generation unit 50 registers them in the conversation DB 330 as the agent's conversation data set.

The phoneme DB generation unit 60 has a function of generating the phoneme DBs stored in the phoneme storage unit 40. For example, the phoneme DB generation unit 60 analyzes voice information recorded while a predetermined text is read aloud, decomposes it into phoneme segments and a prosody model that serves as their control information, and, once a predetermined amount or more of voice information has been collected, registers the result in the phoneme DB as phoneme data.

The advertisement insertion processing unit 70 has a function of inserting advertisement information into the agent's dialogue. The advertisement information to be inserted can be extracted from the advertisement DB 72. The advertisement DB 72 holds advertisement information requested by providers such as companies (vendors, suppliers), for example, advertisement content such as text, images, and audio, together with the advertiser, the advertisement period, and the advertisement target audience.

The feedback acquisition processing unit 80 has a function of inserting questions for acquiring feedback into the agent's dialogue and obtaining feedback from the user.

The configuration of the agent server 2 according to the present embodiment has been specifically described above. Note that the configuration of the agent server 2 is not limited to the example shown in FIG. 3. For example, each component of the agent server 2 may be implemented by another server on the network.

Next, the basic operation processing of the communication control system according to the present embodiment will be described with reference to FIGS. 5 to 14.

<<3. System operation processing>>
<3-1. Conversation data registration process>
FIG. 5 is a flowchart showing the generation process of the conversation DB 330 according to the present embodiment. As shown in FIG. 5, first, the conversation DB generation unit 50 stores an anticipated question sentence (step S103).

Next, the conversation DB generation unit 50 stores the answer sentence paired with (corresponding to) the question sentence (step S106).

Next, the conversation DB generation unit 50 determines whether a predetermined number of pairs of question sentences and answer sentences (also referred to as conversation data) have been collected (step S109).

When a predetermined number of pairs of question sentences and answer sentences have been collected (step S109/Yes), the conversation DB generation unit 50 registers a data set consisting of these pairs in the conversation DB 330 (step S112). Examples of question and answer pairs include the following.

Example question and answer pairs
Pair 1
Question: Good morning.
Answer: How are you doing today?
Pair 2
Question: What's the weather today?
Answer: Today's weather is XX.

Such pairs can be registered in the conversation DB 330 as conversation data.
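A minimal sketch of the registration flow of FIG. 5 (steps S103 to S112), assuming the conversation DB 330 is a plain dict mapping question text to answer text. The class name `ConversationDBGenerator` is hypothetical; the default threshold of 100 pairs follows the example given for the conversation DB generation unit 50.

```python
class ConversationDBGenerator:
    """Hypothetical sketch: buffer Q/A pairs, register them once enough exist."""

    def __init__(self, required_pairs: int = 100):
        self.required_pairs = required_pairs
        self.pending = []  # list of (question, answer) tuples

    def add_pair(self, question: str, answer: str, conversation_db: dict) -> bool:
        # S103 / S106: store the anticipated question and its paired answer.
        self.pending.append((question, answer))
        # S109: not enough pairs yet, so keep collecting.
        if len(self.pending) < self.required_pairs:
            return False
        # S112: register the whole data set in the conversation DB at once.
        conversation_db.update(self.pending)
        self.pending.clear()
        return True
```

Registering in batches, rather than pair by pair, matches the flowchart's "predetermined number" check; the buffer is cleared so the next batch starts fresh.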

<3-2. Phoneme DB generation process>
FIG. 6 is a flowchart showing the phoneme DB generation process according to the present embodiment. As shown in FIG. 6, first, the phoneme DB generation unit 60 displays example sentences (step S113). For example, the example sentences needed to generate phoneme data are displayed on the display of an information processing terminal (not shown).

Next, the phoneme DB generation unit 60 records the voice of the example sentences being read aloud (step S116) and analyzes the recorded voice (step S119). For example, voice information read aloud by the person who voices the agent is collected by the microphone of the information processing terminal, and the phoneme DB generation unit 60 receives and stores this voice information and then performs voice analysis on it.

Next, the phoneme DB generation unit 60 generates a prosody model based on the voice information (step S122). A prosody model extracts prosodic parameters that indicate the prosodic features of speech (for example, pitch, loudness, and speech rate), and it differs from person to person.
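To make the three example prosodic parameters concrete, here is a deliberately crude sketch that derives pitch, loudness, and speech rate from raw audio samples. It is an assumption-laden illustration, not the patent's method: pitch is estimated from the zero-crossing rate (real systems use F0 estimators such as autocorrelation), loudness is the RMS amplitude, and the mora count is assumed to be supplied by the caller from the known example sentence.

```python
import math
from dataclasses import dataclass


@dataclass
class ProsodyModel:
    pitch_hz: float      # crude fundamental-frequency estimate (pitch)
    loudness_rms: float  # root-mean-square amplitude (loudness)
    speech_rate: float   # morae per second (speech rate)


def extract_prosody(samples, sample_rate, mora_count):
    """Hypothetical helper: crude prosodic parameters for one recording."""
    duration = len(samples) / sample_rate
    # A periodic signal crosses zero about twice per cycle, so
    # crossings / (2 * duration) approximates its frequency.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    pitch = crossings / (2 * duration)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return ProsodyModel(pitch, rms, mora_count / duration)
```

Because such parameters vary per speaker, they are stored per agent, which is why each agent has its own phoneme DB.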

Next, the phoneme DB generation unit 60 generates phoneme segments (phoneme data) based on the voice information (step S125).

Next, the phoneme DB generation unit 60 stores the prosody model and the phoneme segments (step S128).

Subsequently, the phoneme DB generation unit 60 determines whether a predetermined number of prosody models and phoneme segments have been collected (step S131).

When a predetermined number of prosody models and phoneme segments have been collected (step S131/Yes), the phoneme DB generation unit 60 registers them in the phoneme storage unit 40 as the phoneme database for a given agent (step S134).

<3-3. Dialogue control process>
FIG. 7 is a flowchart showing the dialogue control process according to the present embodiment. As shown in FIG. 7, first, the voice agent I/F 20 checks whether the user's question voice and an agent ID have been acquired (step S143). The agent ID is identification information indicating a specific agent such as character A, person B, or person C. The user can purchase phoneme data for each agent; for example, the ID of the agent purchased during the purchase process is stored in the client terminal 1.

When the user's question voice and agent ID have been acquired (step S146/Yes), the voice agent I/F 20 performs speech recognition on the question voice and converts it into text (step S149). The voice agent I/F 20 outputs the transcribed question sentence to the dialogue processing unit of the specific agent designated by the agent ID. For example, for "agent ID: character A", the voice agent I/F 20 outputs the transcribed question sentence to the character A dialogue processing unit 32.

Next, the dialogue processing unit 30 searches the conversation DB of the specific agent designated by the agent ID for a question sentence that matches the transcribed question sentence (step S152).

When a matching question is found (step S155/Yes), the character A dialogue processing unit 32 acquires the answer sentence data corresponding to the question (stored as its pair) from the conversation DB of the specific agent (step S158).

When no matching question is found (step S155/No), the conversation DB of the basic dialogue processing unit 31 is searched for a question sentence that matches the transcribed question sentence (step S161).

When a matching question sentence is found (step S164/Yes), the basic dialogue processing unit 31 acquires the answer sentence data corresponding to the question (stored as its pair) from its own conversation DB (step S167).

When no matching question sentence is found (step S164/No), the basic dialogue processing unit 31 acquires the answer sentence data prepared for the case where no question matches (for example, an answer such as "I don't understand the question") (step S170).

Next, the character A dialogue processing unit 32 refers to the phoneme DB of the specific agent designated by the agent ID (here, the character A phoneme DB 42) and acquires character A's phoneme data for generating the voice of the answer sentence data (step S173).

Next, the acquired phoneme data and answer sentence data are output to the voice agent I/F 20 (step S176).

The voice agent I/F 20 then converts the answer sentence data (text) into speech using the phoneme data (speech synthesis) and transmits it to the client terminal 1 (step S179). The client terminal 1 plays back the answer sentence in character A's voice.
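The lookup order of FIG. 7 (agent-specific DB, then the basic dialogue DB, then a fixed fallback answer) can be sketched as below. The function and argument names are hypothetical, the DBs are plain dicts with exact-match lookup, and speech synthesis itself (step S179) is left out; the sketch only returns what would be handed to the voice agent I/F 20.

```python
# Hypothetical names throughout; exact-match lookup as described in the text.
FALLBACK_ANSWER = "I don't understand the question"


def respond(question, agent_id, agent_dbs, basic_db, phoneme_dbs):
    """Return (answer text, phoneme data) for a transcribed question (FIG. 7)."""
    # S152-S158: search the designated agent's conversation DB first.
    answer = agent_dbs.get(agent_id, {}).get(question)
    if answer is None:
        # S161-S170: fall back to the basic dialogue DB, then to a fixed reply.
        answer = basic_db.get(question, FALLBACK_ANSWER)
    # S173: phoneme data always comes from the designated agent's phoneme DB,
    # so even a fallback answer is spoken in that agent's voice.
    phonemes = phoneme_dbs[agent_id]
    # S176: both are handed to the voice agent I/F for synthesis (S179).
    return answer, phonemes
```

Keeping the phoneme lookup separate from the answer lookup is what lets a generic answer from the basic dialogue DB still be voiced as character A.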

<3-4. Conversation DB update process>
Next, the update process of the conversation DB 330 of each dialogue processing unit 300 will be described. In the present embodiment, the conversation DB 330 can be grown through conversations with the user.

First, the data structure of the conversation DB 330 will be explained in more detail with reference to FIG. 8. FIG. 8 is a diagram illustrating an example data structure of the conversation DB 330 according to the present embodiment. As shown in FIG. 8, each conversation DB 330 has two layers: a personalized layer 331 and a common layer 332. For example, in the conversation DB 330A for character A, the common layer 332A holds conversation data reflecting character A's personality and traits, whereas the personalized layer 331A holds conversation data customized for a particular user through conversations with that user. In other words, when the character A phoneme DB 42 and the character A dialogue processing unit 32 are provided (sold) to users as a set, users X and Y initially converse with the same character A (using the conversation data held in the common layer 332A); as the conversations continue, conversation data customized for each user accumulates in that user's personalized layer 331A. This makes it possible to offer each of users X and Y conversations with character A that suit their individual preferences.

Conversation data can also be customized for the user when the agent "person B" is an average person of a given generation who, unlike character A, has no particular personality. For example, if "person B" is "a person in their twenties", the common layer 332B holds average conversation data for people in their twenties, and as the dialogue with the user continues, customized conversation data is held in each user's personalized layer 331B. The user can also select and purchase preferred phoneme data for person B's voice, such as "male", "female", "high voice", or "low voice", from the person B phoneme DB 43.
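The two-layer structure of FIG. 8 can be sketched as one shared common layer plus a per-user personalized layer. This is a hypothetical illustration: the class name is invented, and consulting the personalized layer before the common layer is an assumption that follows from the text's statement that customized data gradually replaces the shared behavior.

```python
class LayeredConversationDB:
    """Hypothetical sketch of one agent's conversation DB 330 (FIG. 8)."""

    def __init__(self, common: dict):
        self.common = common   # common layer 332: shared by every user of the agent
        self.personal = {}     # personalized layer 331: user_id -> {question: answer}

    def lookup(self, user_id, question):
        # Assumption: a user's customized answer takes precedence over the shared one.
        layer = self.personal.get(user_id, {})
        if question in layer:
            return layer[question]
        return self.common.get(question)

    def customize(self, user_id, question, answer):
        # Customization only touches this user's layer; others are unaffected.
        self.personal.setdefault(user_id, {})[question] = answer
```

For example, user X can customize a greeting while user Y, talking to the same character A, still receives the common-layer answer.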

The specific processing for customizing the conversation DB 330 in this way will be described with reference to FIG. 9. FIG. 9 is a flowchart showing the update process of the conversation DB 330 according to the present embodiment.

As shown in FIG. 9, first, the voice agent I/F 20 acquires (receives) the user's question voice from the client terminal 1 and converts it into text by speech recognition (step S183). The transcribed data (question sentence data) is output to the dialogue processing unit of the specific agent designated by the agent ID (here, for example, the character A dialogue processing unit 32).

Next, the character A dialogue processing unit 32 determines whether the question sentence data is a predetermined command (step S186).

If it is a predetermined command (step S186/Yes), the character A dialogue processing unit 32 registers the user-specified answer sentence data in the personalized layer 331A of the conversation DB 330A, paired with the question sentence data (step S189). The predetermined command may be, for example, a word such as "NG" or "setting". For example, the conversation DB of character A can be customized through a conversation such as the following.

User: "Good morning"
Character A: "Good morning"
User: "NG. Answer 'Stay well and do your best'"
Character A: "Stay well and do your best"

In this conversation, "NG" is the predetermined command. After the user says "NG", the character A dialogue processing unit 32 registers the user-specified answer sentence data "Stay well and do your best" in the personalized layer 331A of the conversation DB 330A, paired with the question sentence data "Good morning".

If it is not a predetermined command (step S186/No), the character A dialogue processing unit 32 searches the conversation DB 330A for character A for answer sentence data held as a pair with the question sentence data. If no such answer sentence data is held in the conversation DB 330A, that is, if the user's question has no answer sentence (step S192/Yes), the character A dialogue processing unit 32 registers user-specified answer sentence data in the personalized layer 331A, paired with the question sentence (step S195). For example, the conversation DB of character A can be customized through a conversation such as the following.

User: "How are you?"
Character A: "I don't understand the question" (an example of the answer data used when there is no corresponding answer)
User: "When I ask 'How are you?', answer 'I'm fine today too'"
Character A: "I'm fine today too"

In this conversation, because no answer sentence data is held as a pair with "How are you?", the character A dialogue processing unit 32 acquires "I don't understand the question", the example answer data for when there is no corresponding answer; this is output to the voice agent I/F 20 together with character A's corresponding phoneme data and played back on the client terminal 1. When the user then inputs the user-specified answer sentence "I'm fine today too", the character A dialogue processing unit 32 registers it in the personalized layer 331A, paired with the question sentence data "How are you?".

If the question has an answer sentence (step S192/No), the character A dialogue processing unit 32 acquires that answer sentence data and outputs it to the voice agent I/F 20 together with character A's corresponding phoneme data, and the client terminal 1 plays back the answer sentence in character A's voice (step S198).
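The update flow of FIG. 9 can be sketched for a single user as follows. Two simplifications are assumptions, not taken from the source: the command is written as "NG <answer>" and the text after "NG" is used directly as the new answer (the example above phrases it as a natural-language instruction, which would need parsing), and after a fallback reply the user's next utterance is taken verbatim as the answer to the unanswered question. All names are hypothetical.

```python
FALLBACK = "I don't understand the question"
COMMAND = "NG"  # example command word from the text


class CustomizingDialogue:
    """Hypothetical sketch of the FIG. 9 update flow for one user."""

    def __init__(self, common_layer):
        self.common = common_layer     # common layer 332A (read-only here)
        self.personal = {}             # personalized layer 331A
        self.last_question = None
        self.pending_question = None   # question that received the fallback

    def _lookup(self, question):
        return self.personal.get(question, self.common.get(question))

    def handle(self, utterance):
        # S186/Yes: "NG <answer>" replaces the answer to the previous question.
        if utterance.startswith(COMMAND) and self.last_question is not None:
            answer = utterance[len(COMMAND):].strip(" .")
            self.personal[self.last_question] = answer          # S189
            return answer
        # Simplification: after a fallback, the next utterance is the answer.
        if self.pending_question is not None:
            self.personal[self.pending_question] = utterance   # S195
            self.pending_question = None
            return utterance
        self.last_question = utterance
        answer = self._lookup(utterance)
        if answer is None:                                      # S192/Yes
            self.pending_question = utterance
            return FALLBACK
        return answer                                           # S198
```

Both customization paths write only to the personalized layer, so the common layer shared with other users is never modified directly.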

Next, the migration of conversation data from the personalized layer to the common layer will be described with reference to FIG. 10. FIG. 10 is a flowchart showing the process of migrating conversation data from the personalized layer to the common layer according to the present embodiment. Here, as an example, the migration from the personalized layer 331A to the common layer 332A of the character A dialogue processing unit 32 will be described.

As shown in FIG. 10, first, the character A dialogue processing unit 32 periodically searches each user's personalized layer 331A (step S203) and extracts conversation pairs (pairs of question sentence data and answer sentence data) with substantially the same content (step S206). For example, the pair of the casual question "How are you?" (元気？) and the answer "I'm fine today too!" and the pair of the polite question "How are you?" (元気ですか？) and the same answer differ only in whether the question is phrased politely, and can therefore be judged to be conversation pairs with substantially the same content.

Next, when a predetermined number or more of such conversation pairs have been extracted from the users' personalized layers 331A (step S209/Yes), the character A dialogue processing unit 32 registers those conversation pairs in the common layer 332A (of each user) (step S212).

In this way, migrating conversation pairs with substantially the same content in each user's personalized layer 331 to the common layer 332 makes it possible to grow the common layer 332 (expand its conversation pairs).

Further, in the present embodiment, the conversation DB for basic dialogue can also be grown by migrating conversation data from the conversation DB of a specific agent (specifically, its common layer) to the conversation DB for basic dialogue. FIG. 11 is a diagram illustrating the migration of conversation data to the basic dialogue conversation DB 330F according to the present embodiment. For example, when users X and Y have each selected (purchased) the agent "character A" and user Z has selected (purchased) the agent "person B", the dialogue processing unit 30 may contain, as shown in FIG. 11, user X's conversation DB 330A-X for character A, user Y's conversation DB 330A-Y for character A, and user Z's conversation DB 330B-Z for person B. In this case, unique (customized) conversation pairs are registered in the personalized layers 331A-X, 331A-Y, and 331B-Z according to the dialogues with users X, Y, and Z, respectively (see FIG. 9). Then, when a predetermined number of substantially identical conversation pairs exist in the personalized layers 331A-X and 331A-Y of the same agent, they are registered in the per-user common layers 332A-X and 332A-Y, respectively (see FIG. 10).

When a predetermined number or more of substantially identical conversation pairs are extracted from the common layers 332A-X, 332A-Y, and 332B-Z of multiple agents (which may include different agents), the dialogue processing unit 30 migrates those conversation pairs to the higher-level basic dialogue conversation DB 330F. The basic dialogue conversation DB 330F is the conversation DB of the basic dialogue processing unit 31. This makes it possible to grow the basic dialogue conversation DB 330F (expand its conversation pairs). This data migration process will be described in detail with reference to FIG. 12. FIG. 12 is a flowchart showing the process of migrating conversation data to the basic dialogue DB 330F according to the present embodiment.

As shown in FIG. 12, first, the dialogue processing unit 30 periodically searches the multiple common layers 332 of the conversation DBs 330 (step S223) and extracts substantially identical conversation pairs (step S226).

Next, when a predetermined number or more of substantially identical conversation pairs have been extracted from the multiple common layers 332 (step S229/Yes), the dialogue processing unit 30 registers those conversation pairs in the basic dialogue conversation DB 330F (step S232).

In this way, migrating conversation pairs with substantially the same content in the common layers 332 of the conversation DBs 330 of multiple agents to the basic dialogue conversation DB 330F makes it possible to grow the basic dialogue conversation DB 330F (expand its conversation pairs).
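Both migrations above (personalized layers to common layer in FIG. 10, common layers to the basic dialogue DB in FIG. 12) follow the same pattern: count how many source layers contain a substantially identical pair and promote pairs that reach a threshold. The sketch below assumes each layer is a dict of question to answer and counts each layer at most once per pair. The `normalize` function is a hypothetical stand-in for "substantially the same" that only ignores case and punctuation; recognizing the Japanese casual/polite variants in the example would need morphological analysis.

```python
from collections import Counter


def normalize(text):
    # Hypothetical "substantially the same" test: ignore case and punctuation.
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()


def migrate(source_layers, target, threshold, key=normalize):
    """Promote pairs found in at least `threshold` source layers into `target`."""
    counts = Counter()
    first_seen = {}
    for layer in source_layers:
        keys_in_layer = set()
        for question, answer in layer.items():
            k = (key(question), key(answer))
            if k not in keys_in_layer:          # count each layer at most once
                keys_in_layer.add(k)
                counts[k] += 1
                first_seen.setdefault(k, (question, answer))
    promoted = [first_seen[k] for k, n in counts.items() if n >= threshold]
    for question, answer in promoted:
        target.setdefault(question, answer)     # keep existing entries untouched
    return promoted
```

Calling `migrate` with per-user personalized layers and a common layer as `target` corresponds to FIG. 10; calling it with several common layers and the basic dialogue DB as `target` corresponds to FIG. 12.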

<3-5. Advertisement output process>
Next, the process of inserting advertisement information by the advertisement insertion processing unit 70 will be described with reference to FIGS. 13 and 14. In the present embodiment, the advertisement insertion processing unit 70 can insert advertisement information stored in the advertisement DB 72 into the agent's utterances. Advertisement information can be registered in the advertisement DB 72 in advance. FIG. 13 is a diagram showing an example of advertisement information registered in the advertisement DB 72 according to the present embodiment.

As shown in FIG. 13, the advertisement information 621 includes, for example, an agent ID, a question sentence, advertisement content, a condition, and a probability. The agent ID designates the agent that speaks the advertisement content; the question sentence designates the user question that triggers insertion of the advertisement content; and the advertisement content is the advertisement text to be inserted into the agent's dialogue. The condition is the condition under which the advertisement content is inserted, and the probability indicates how likely the advertisement content is to be inserted. For example, in the first row of FIG. 13, in a dialogue with the agent "character A", when a question sentence from a user aged 30 or younger contains the word "chocolate", advertisement content such as "The new chocolate just released by BB Co. has lots of milk in it and is delicious" is inserted into the answer sentence. Because the user may find it annoying if the advertisement content is inserted every time the triggering question sentence is uttered, the present embodiment may set a probability of inserting the advertisement. This probability may be determined according to the advertising fee; for example, the higher the advertising fee, the higher the probability.
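One row of the advertisement information in FIG. 13 and the eligibility check it implies can be sketched as follows. The field names and the `should_insert` helper are hypothetical; the condition is modeled as a callable over a user-profile dict, and the random source is injectable so the probability draw can be tested. The probability value used in any example is illustrative only, since FIG. 13's actual values are not given in the text.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class AdInfo:
    """Hypothetical record for one row of advertisement information (FIG. 13)."""
    agent_id: str                      # agent that speaks the advertisement
    trigger: str                       # text in the user's question that triggers it
    content: str                       # advertisement text to insert
    condition: Callable[[dict], bool]  # e.g. lambda user: user["age"] <= 30
    probability: float                 # 0.0 to 1.0, e.g. scaled with the ad fee


def should_insert(ad, agent_id, question, user, rng=random.random):
    # All of agent, trigger, and condition must match before the random draw.
    return (ad.agent_id == agent_id
            and ad.trigger in question
            and ad.condition(user)
            and rng() < ad.probability)
```

Drawing against the probability last means the random number is consumed only for eligible dialogues, which keeps the effective insertion rate equal to the configured probability among matching questions.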

This advertisement content insertion process will be described in detail with reference to FIG. 14. FIG. 14 is a flowchart showing the advertisement content insertion process according to the present embodiment.

As shown in FIG. 14, first, the advertisement insertion processing unit 70 monitors the dialogue between the user and the agent (specifically, the dialogue processing performed by the dialogue processing unit 30) (step S243).

Next, the advertisement insertion processing unit 70 determines whether a question sentence with the same content as a question sentence registered in the advertisement DB 72 has appeared in the dialogue between the user and the agent (step S246).

When a question sentence with the same content has appeared (step S246/Yes), the advertisement insertion processing unit 70 checks the advertisement insertion condition and probability associated with that question sentence (step S249).

Subsequently, the advertisement insertion processing unit 70 determines, based on the condition and probability, whether an advertisement can currently be presented (step S252).

When an advertisement can be presented (step S252/Yes), the advertisement insertion processing unit 70 temporarily suspends the dialogue processing by the dialogue processing unit 30 (step S255) and inserts the advertisement content into the dialogue (step S258). Specifically, for example, the advertisement content is inserted into the agent's answer to the user's question sentence.

The dialogue (conversation sentence data) including the advertisement content is then output from the dialogue processing unit 30 to the voice agent I/F 20, transmitted from the voice agent I/F 20 to the client terminal 1, and played back in the agent's voice (step S261). Specifically, the advertisement content can be presented to the user as an utterance of character A through a conversation such as the following.

User: "Good morning"
Character A: "Good morning! How are you doing today?"
User: "I'm fine. I want to eat something delicious."
Character A: "I hear the grilled meat at the CC restaurant is delicious."

In this conversation, first, in response to the user's question sentence "Good morning", the corresponding answer sentence "Good morning! How are you doing today?" retrieved from character A's conversation DB is output as speech. Next, because the user's question sentence "I'm fine. I want to eat something delicious" contains the question sentence "I want to eat something delicious", which triggers advertisement insertion (see the second row of FIG. 13), the advertisement insertion processing unit 70 performs the advertisement insertion process, and an answer sentence carrying the advertisement content, such as "I hear the grilled meat at the CC restaurant is delicious", is output in character A's voice.
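The insertion flow of FIG. 14 (steps S246 to S258) can be sketched as a function that takes the agent's normal answer and possibly inserts advertisement content. This is a hypothetical illustration: ads are plain dicts whose keys are invented here, the suspension of dialogue processing (step S255) is not modeled, and "inserting" is interpreted as using the ad text as the answer when none exists or appending it to an existing answer, which is only one possible reading of the text.

```python
import random


def insert_ad(question, agent_id, user, answer, ads, rng=random.random):
    """Hypothetical sketch of S246-S258; `answer` may be None if no reply exists."""
    for ad in ads:
        # S246: does a registered trigger question appear in the user's utterance?
        if ad["agent_id"] != agent_id or ad["trigger"] not in question:
            continue
        # S249/S252: check the condition and draw against the probability.
        if ad["condition"](user) and rng() < ad["probability"]:
            # S258: insert the ad content into the agent's answer.
            return ad["content"] if answer is None else f"{answer} {ad['content']}"
    return answer
```

With a trigger such as "I want to eat something delicious", the user's utterance from the conversation above matches by substring, so the agent's reply becomes the advertisement text whenever the probability draw succeeds.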

以上、本実施形態による通信制御システムの基本的な動作処理として、会話データ登録処理、音素ＤＢ生成処理、対話制御処理、会話ＤＢ更新処理、および広告挿入処理について説明した。 As described above, conversation data registration processing, phoneme DB generation processing, dialogue control processing, conversation DB update processing, and advertisement insertion processing have been described as basic operation processing of the communication control system according to the present embodiment.

さらに、本実施形態による通信制御システムの対話処理部３０は、エージェントの音声発話機能を用いて、エージェントのキャラクターにユーザ自身がなりきる体験を提供し、エージェントシステムの楽しさを高めることを可能とする。このような本実施形態による対話処理部３０の音声出力制御処理について、図１５〜図４２を参照して具体的に説明する。 Further, by using the agent's voice utterance function, the dialogue processing unit 30 of the communication control system according to the present embodiment can give users the experience of becoming the agent character themselves, making the agent system more enjoyable. The voice output control processing of the dialogue processing unit 30 according to the present embodiment will be described in detail with reference to FIGS. 15 to 42.

＜＜４．音声出力制御処理＞＞ << 4. Voice output control processing >>
＜４−１．構成＞ <4-1. Configuration>
まず、本実施形態による音声出力制御処理を行う対話処理部３０ａの構成について、図１５を参照して説明する。 First, the configuration of the dialogue processing unit 30a that performs the voice output control processing according to the present embodiment will be described with reference to FIG. 15.

図１５は、本実施形態による対話処理部３０ａの構成例を示す図である。図１５に示すように、対話処理部３０ａは、基本対話処理部３１、キャラクターＡ対話処理部３２、人物Ｂ対話処理部３３、人物Ｃ対話処理部３４、ユーザ管理部３５、自動発話制御部３６、およびシナリオ管理部３７を有する。 FIG. 15 is a diagram showing a configuration example of the dialogue processing unit 30a according to the present embodiment. As shown in FIG. 15, the dialogue processing unit 30a includes a basic dialogue processing unit 31, a character A dialogue processing unit 32, a person B dialogue processing unit 33, a person C dialogue processing unit 34, a user management unit 35, an automatic utterance control unit 36, and a scenario management unit 37.

基本対話処理部３１、キャラクターＡ対話処理部３２、人物Ｂ対話処理部３３、および人物Ｃ対話処理部３４は、図３および図４を参照して説明したように、ユーザの発話に対応するエージェントの応答を生成する機能を有する。基本対話処理部３１は、エージェントに特化しない汎用の応答を生成し、キャラクターＡ対話処理部３２、人物Ｂ対話処理部３３、および人物Ｃ対話処理部３４は、各エージェントキャラクター（キャラクターＡ、人物Ｂ、人物Ｃ）にそれぞれ特化した応答を生成する。 As described with reference to FIGS. 3 and 4, the basic dialogue processing unit 31, the character A dialogue processing unit 32, the person B dialogue processing unit 33, and the person C dialogue processing unit 34 have a function of generating the agent's response to the user's utterance. The basic dialogue processing unit 31 generates general-purpose responses that are not specific to any agent, while the character A dialogue processing unit 32, the person B dialogue processing unit 33, and the person C dialogue processing unit 34 generate responses specialized for their respective agent characters (character A, person B, and person C).
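One way to picture the split between the basic and the character-specific units is a lookup that tries the selected character's conversation DB first and falls back to the general-purpose one. The Python sketch below assumes exact question matching against plain dictionaries and this particular fallback order; the class and function names are hypothetical.

```python
from typing import Dict, Optional


class DialogueProcessor:
    """Hypothetical dialogue processing unit backed by a question -> answer dict."""

    def __init__(self, conversation_db: Dict[str, str]) -> None:
        self.conversation_db = conversation_db

    def respond(self, question: str) -> Optional[str]:
        return self.conversation_db.get(question)


def generate_response(question: str, agent_id: str,
                      agent_units: Dict[str, DialogueProcessor],
                      basic_unit: DialogueProcessor) -> Optional[str]:
    """Prefer the character-specific answer; fall back to the general-purpose one."""
    unit = agent_units.get(agent_id)
    if unit is not None:
        answer = unit.respond(question)
        if answer is not None:
            return answer
    return basic_unit.respond(question)  # None if neither DB has an answer
```

Keeping the general-purpose unit as a fallback means a new character only needs answers for the questions where its personality should show.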

（ユーザ管理部３５） (User management unit 35)
ユーザ管理部３５は、ユーザ情報の管理（登録、変更、更新、削除）を行う。図１６に、本実施形態によるユーザ管理部３５の構成例を示す。図１６に示すように、ユーザ管理部３５は、ログイン管理部３５１、ユーザ情報ＤＢ３５２、顔情報登録部３５３、およびユーザ位置情報登録部３５４を有する。 The user management unit 35 manages (registers, changes, updates, and deletes) user information. FIG. 16 shows a configuration example of the user management unit 35 according to the present embodiment. As shown in FIG. 16, the user management unit 35 includes a login management unit 351, a user information DB 352, a face information registration unit 353, and a user location information registration unit 354.

ログイン管理部３５１は、クライアント端末１からの要求に応じて、ユーザのログイン認証を行う。具体的には、例えばログイン管理部３５１は、ユーザによりクライアント端末１で入力されたアカウント情報（アカウント名、パスワード）をユーザ情報ＤＢ３５２と参照し、ログイン認証を行う。ユーザ情報ＤＢ３５２は、ユーザＩＤ、アカウント情報、ユーザ属性情報（誕生日、性別、郵便番号等）、顔情報、購入した（ユーザ所有の）エージェントＩＤおよびシナリオＩＤ等を含むユーザ情報を記憶する。これらのユーザ情報は、例えばエージェントサーバ２の音声エージェントＩ／Ｆ２０（図３参照）を介してクライアント端末１から送信され、登録される。 The login management unit 351 performs login authentication of the user in response to a request from the client terminal 1. Specifically, for example, the login management unit 351 checks the account information (account name and password) entered by the user on the client terminal 1 against the user information DB 352 to authenticate the login. The user information DB 352 stores user information including a user ID, account information, user attribute information (birthday, gender, zip code, etc.), face information, and the IDs of purchased (user-owned) agents and scenarios. This user information is transmitted from the client terminal 1 and registered via, for example, the voice agent I/F 20 (see FIG. 3) of the agent server 2.
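A minimal sketch of the check performed by the login management unit 351, covering both account registration and login, in Python. The source only says that the entered account name and password are collated with the user information DB 352; storing a salted PBKDF2 hash instead of the raw password, the iteration count, and the `UserInfoDB` class are assumptions added for illustration.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

ITERATIONS = 100_000  # assumed PBKDF2 work factor


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


class UserInfoDB:
    """Hypothetical stand-in for the account part of the user information DB 352."""

    def __init__(self) -> None:
        # account name -> (salt, password hash); raw passwords are never stored
        self._accounts: Dict[str, Tuple[bytes, bytes]] = {}

    def register(self, account_name: str, password: str) -> bool:
        """Register a new account; returns False if the name is already taken."""
        if account_name in self._accounts:
            return False
        salt = os.urandom(16)
        self._accounts[account_name] = (salt, hash_password(password, salt))
        return True

    def authenticate(self, account_name: str, password: str) -> bool:
        """Collate the entered credentials with the stored record."""
        record = self._accounts.get(account_name)
        if record is None:
            return False
        salt, stored = record
        # constant-time comparison avoids leaking how many bytes matched
        return hmac.compare_digest(stored, hash_password(password, salt))
```

The registration branch (step S276 / No) and the login branch (step S288) can share the same store: `register` creates the record, and `authenticate` performs the collation.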

顔情報登録部３５３は、ユーザの顔情報をユーザ情報ＤＢ３５２に登録する。ユーザの顔情報は、例えばクライアント端末１に設けられたカメラにより撮像されたユーザの顔画像を解析した結果であって、クライアント端末１からエージェントサーバ２へ送信され得る。なお、クライアント端末１から顔画像が送信され、顔情報登録部３５３において解析してもよい。 The face information registration unit 353 registers the user's face information in the user information DB 352. The user's face information is, for example, the result of analyzing a face image of the user captured by a camera provided in the client terminal 1, and may be transmitted from the client terminal 1 to the agent server 2. Alternatively, the face image itself may be transmitted from the client terminal 1 and analyzed by the face information registration unit 353.

ユーザ位置情報登録部３５４は、ユーザの現在位置情報をユーザ情報ＤＢ３５２に登録する。ユーザの現在位置情報は、例えばクライアント端末１に設けられたＧＰＳ（Global Positioning System）等の位置測位部により測位され、定期的にエージェントサーバ２に送信される。 The user location information registration unit 354 registers the user's current location information in the user information DB 352. The user's current location is measured by a positioning unit, such as a GPS (Global Positioning System) receiver, provided in the client terminal 1, and is periodically transmitted to the agent server 2.

（自動発話制御部３６） (Automatic utterance control unit 36)
自動発話制御部３６は、エージェントによる自動的な発話を制御する機能を有する。図１７に、本実施形態による自動発話制御部３６の構成例を示す。図１７に示すように、自動発話制御部３６は、ユーザ音声抽出部３６１、音素データ取得部３６２、位置情報取得部３６３、フレーズ検索部３６４、フレーズＤＢ３６５、および情報解析部３６６を有する。 The automatic utterance control unit 36 has a function of controlling automatic utterances by an agent. FIG. 17 shows a configuration example of the automatic utterance control unit 36 according to the present embodiment. As shown in FIG. 17, the automatic utterance control unit 36 includes a user voice extraction unit 361, a phoneme data acquisition unit 362, a position information acquisition unit 363, a phrase search unit 364, a phrase DB 365, and an information analysis unit 366.

ユーザ音声抽出部３６１は、入力された音声情報を解析し、ユーザ音声を抽出する。かかる音声情報は、例えばクライアント端末１のマイクロホンにより収音され、ネットワークを介してクライアント端末１からエージェントサーバ２に送信される。エージェントサーバ２では、音声エージェントＩ／Ｆ２０により受信した当該音声情報を対話処理部３０ａへ出力する。なおクライアント端末１は、継続的、定期的、または所定のタイミングで周辺の音声情報を収音し、エージェントサーバ２へ送信する。ユーザ音声抽出部３６１は、抽出したユーザ音声をテキスト化し（発話テキストの生成）、音素データ取得部３６２へ出力する。 The user voice extraction unit 361 analyzes input voice information and extracts the user's voice. The voice information is picked up by, for example, the microphone of the client terminal 1 and transmitted from the client terminal 1 to the agent server 2 via the network. In the agent server 2, the voice information received by the voice agent I/F 20 is output to the dialogue processing unit 30a. The client terminal 1 picks up ambient voice information continuously, periodically, or at predetermined timings and transmits it to the agent server 2. The user voice extraction unit 361 converts the extracted user voice into text (generating utterance text) and outputs the text to the phoneme data acquisition unit 362.

位置情報取得部３６３は、ユーザの現在位置情報を取得し、フレーズ検索部３６４へ出力する。ユーザの現在位置情報は、クライアント端末１から送信され得る。 The position information acquisition unit 363 acquires the user's current position information and outputs it to the phrase search unit 364. The current position information of the user can be transmitted from the client terminal 1.

情報解析部３６６は、クライアント端末１から送信されたユーザ状況を示す種々の情報を解析し、解析結果をフレーズ検索部３６４へ出力する。具体的には、例えば情報解析部３６６は、クライアント端末１から送信された顔情報（撮像画像に基づいて解析された、現在のユーザの顔情報または周囲に居る人物の顔情報）から顔の表情を解析する。また、情報解析部３６６は、クライアント端末１から送信された加速度情報（加速度センサにより検知された情報）からユーザ行動（走っている、ジャンプしている、寝ている等）を解析する。また、情報解析部３６６は、クライアント端末１から送信された音声情報（マイクロホンにより収音された音声情報）から環境音（ユーザ周辺の雑音等）を解析する。また、情報解析部３６６は、クライアント端末１から送信された生体情報（脈拍センサ、心拍センサ、発汗センサ、体温センサ、血圧センサ、脳波センサ等により検知された情報）からユーザ状態（緊張している、怒っている、悲しんでいる、喜んでいる等）を解析する。そして、情報解析部３６６は、解析結果（ユーザまたは周辺人物の状況）をフレーズ検索部３６４へ出力する。 The information analysis unit 366 analyzes various kinds of information indicating the user's situation transmitted from the client terminal 1 and outputs the analysis result to the phrase search unit 364. Specifically, for example, the information analysis unit 366 analyzes facial expressions from the face information transmitted from the client terminal 1 (face information of the current user or of people nearby, analyzed from a captured image). The information analysis unit 366 also analyzes user behavior (running, jumping, sleeping, etc.) from acceleration information (detected by an acceleration sensor) transmitted from the client terminal 1. It further analyzes environmental sounds (noise around the user, etc.) from voice information (picked up by the microphone) transmitted from the client terminal 1, and analyzes the user's state (nervous, angry, sad, happy, etc.) from biometric information (detected by a pulse sensor, heart rate sensor, perspiration sensor, body temperature sensor, blood pressure sensor, brain wave sensor, etc.) transmitted from the client terminal 1. The information analysis unit 366 then outputs the analysis result (the situation of the user or of people nearby) to the phrase search unit 364.
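As one illustration of the acceleration-based behavior analysis, the Python sketch below labels a window of three-axis accelerometer samples by the peak and spread of their magnitudes. The thresholds, labels, and function name are hypothetical and untuned; the source does not specify how the information analysis unit 366 classifies behavior, and a motionless device is only a weak proxy for sleeping.

```python
import math
import statistics
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (ax, ay, az) in m/s^2

# Hypothetical thresholds, chosen only for illustration.
JUMP_PEAK = 25.0     # a single large spike suggests a jump
RUN_SPREAD = 3.0     # strong variation across the window suggests running
STILL_SPREAD = 0.2   # almost no variation suggests lying still / sleeping


def classify_behavior(samples: List[Sample]) -> str:
    """Return a coarse behavior label for one window of accelerometer samples."""
    if not samples:
        raise ValueError("need at least one sample")
    magnitudes = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in samples]
    spread = statistics.pstdev(magnitudes)  # population standard deviation
    if max(magnitudes) > JUMP_PEAK:
        return "jumping"
    if spread > RUN_SPREAD:
        return "running"
    if spread < STILL_SPREAD:
        return "sleeping"
    return "other"
```

Using the magnitude rather than individual axes makes the sketch independent of how the terminal is oriented in the user's pocket.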

フレーズ検索部３６４は、位置情報取得部３６３により取得されたユーザ位置、情報解析部３６６により解析されたユーザまたは相手の表情、ユーザ行動、またはユーザ状況等に応じたフレーズ（発話フレーズとも称す）をフレーズＤＢ３６５から検索し、検索結果を音素データ取得部３６２へ出力する。フレーズには、ナレーションや効果音が紐付けられていてもよい。また、フレーズＤＢ３６５は、エージェントキャラクター毎のフレーズデータが格納される。ここで、下記表１に、フレーズＤＢ３６５に格納されるエージェントキャラクター「ヒーロー」のフレーズデータ例を示す。下記表１に示すように、フレーズＤＢ３６５には、状況とフレーズや効果音が対応付けて記憶されている。下記表１に示す例では、一のセンサ種別に「状況」が対応付けられているが、本実施形態はこれに限定されず、複数のセンサの解析結果に基づいて「状況」が総合的に判断されてもよい。また、本実施形態では、複数の状況（場所、表情、時刻、状態等）が条件を満たす場合に対応する「フレーズ、効果音」が対応付けられていてもよい。 The phrase search unit 364 searches the phrase DB 365 for a phrase (also referred to as an utterance phrase) that matches the user position acquired by the position information acquisition unit 363, or the facial expression of the user or the other party, the user behavior, the user situation, or the like analyzed by the information analysis unit 366, and outputs the search result to the phoneme data acquisition unit 362. Narration and sound effects may be associated with a phrase. The phrase DB 365 stores phrase data for each agent character. Table 1 below shows an example of the phrase data of the agent character "Hero" stored in the phrase DB 365. As shown in Table 1, the phrase DB 365 stores situations in association with phrases and sound effects. In the example of Table 1, each "situation" is associated with a single sensor type, but the present embodiment is not limited to this; the "situation" may be determined comprehensively from the analysis results of a plurality of sensors. In the present embodiment, a "phrase / sound effect" may also be associated with the case where a plurality of situations (place, facial expression, time, state, etc.) satisfy their conditions together.
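The situation-to-phrase lookup, including entries that require several situations at once, might look like the Python sketch below. Table 1 itself is not reproduced in this text, so any entries used with it are made up; preferring the entry with the most satisfied conditions is likewise an assumption about how a general entry and a more specific one would be ranked.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PhraseEntry:
    conditions: Dict[str, str]          # e.g. {"place": "station", "time": "night"}
    phrase: str
    sound_effect: Optional[str] = None  # optional associated sound effect


def search_phrase(phrase_db: List[PhraseEntry],
                  situation: Dict[str, str]) -> Optional[PhraseEntry]:
    """Return the most specific entry whose conditions all hold, or None."""
    matches = [entry for entry in phrase_db
               if all(situation.get(key) == value
                      for key, value in entry.conditions.items())]
    if not matches:
        return None
    # Assumed ranking: more satisfied conditions = more specific phrase.
    return max(matches, key=lambda entry: len(entry.conditions))
```

Representing the situation as a flat dictionary lets single-sensor entries and combined entries (place plus time, for example) live in the same table.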

音素データ取得部３６２は、音声抽出部３６１から出力された発話テキスト、またはフレーズ検索部３６４から出力されたフレーズ、ナレーションを音声化するための音素データを、対応するエージェントの音素記憶部４０から取得する。例えば音素データ取得部３６２は、ユーザ音声をユーザ指定のエージェントの音声に変換するため、当該エージェントの音素データを発話テキストに応じて取得する。クライアント端末１からは、特定のエージェントキャラクターを選択する選択信号が送信され得る。 The phoneme data acquisition unit 362 acquires, from the phoneme storage unit 40 of the corresponding agent, the phoneme data for voicing the utterance text output from the user voice extraction unit 361, or the phrase and narration output from the phrase search unit 364. For example, to convert the user's voice into the voice of the agent specified by the user, the phoneme data acquisition unit 362 acquires that agent's phoneme data according to the utterance text. A selection signal for selecting a specific agent character may be transmitted from the client terminal 1.

（シナリオ管理部３７） (Scenario management unit 37)
シナリオ管理部３７は、エージェントキャラクターに紐付けられた各シナリオの管理を行う。図１８に、本実施形態によるシナリオ管理部３７の構成例を示す。図１８に示すように、シナリオ管理部３７は、データ管理部３７１、シナリオ実行部３７２、情報解析部３７３、およびシナリオＤＢ３７４を有する。 The scenario management unit 37 manages each scenario associated with an agent character. FIG. 18 shows a configuration example of the scenario management unit 37 according to the present embodiment. As shown in FIG. 18, the scenario management unit 37 includes a data management unit 371, a scenario execution unit 372, an information analysis unit 373, and a scenario DB 374.

データ管理部３７１は、シナリオＤＢ３７４に格納されているシナリオの登録、変更、更新、削除といった管理を行う。シナリオＤＢ３７４には、各エージェントキャラクターに対応する１以上のシナリオデータが格納されている。シナリオデータには、タイトル、あらすじ、購入金額等が付随情報として付与され、さらに、イベント（シナリオイベントとも称す）に関するデータが含まれる。イベントに関するデータには、イベント発生のトリガとなる状況（場所、ユーザ行動、表情、ユーザ発話等）と、イベントの開催時刻（開催期間）等が含まれる。 The data management unit 371 manages the scenarios stored in the scenario DB 374, including their registration, change, update, and deletion. The scenario DB 374 stores one or more pieces of scenario data for each agent character. Each piece of scenario data carries incidental information such as a title, a synopsis, and a purchase price, and further includes data on events (also referred to as scenario events). The event data includes the situations that trigger the event (place, user behavior, facial expression, user utterance, etc.) and the time (period) during which the event is held.

シナリオ実行部３７２は、ユーザが参加中のシナリオに従って、エージェントキャラクターの音声や画像をユーザに提示するよう制御する。具体的には、シナリオ実行部３７２は、シナリオに基づく音声や画像等の提示情報を、音声エージェントＩ／Ｆ２０からネットワークを介してクライアント端末１へ送信するよう制御する。また、シナリオ実行部３７２は、情報解析部３７３による解析結果に基づいて、シナリオに含まれるイベントのトリガ判断を行い、イベントが発生する場合はイベントの音声や画像等の提示情報を、音声エージェントＩ／Ｆ２０からネットワークを介してクライアント端末１へ送信するよう制御する。 The scenario execution unit 372 performs control so that the voice and images of the agent character are presented to the user according to the scenario in which the user is participating. Specifically, the scenario execution unit 372 controls the voice agent I/F 20 to transmit presentation information based on the scenario, such as voice and images, to the client terminal 1 via the network. The scenario execution unit 372 also determines, based on the analysis result of the information analysis unit 373, whether an event included in the scenario is triggered, and when the event occurs, controls the voice agent I/F 20 to transmit presentation information for the event, such as voice and images, to the client terminal 1 via the network.

情報解析部３７３は、クライアント端末１から送信されたユーザ状況を示す種々の情報を解析し、解析結果をフレーズ検索部３６４へ出力する。ユーザ状況を示す種々の情報とは、例えば位置情報、顔情報（撮像画像に基づいて解析された、現在のユーザの顔情報または周囲に居る人物の顔情報）、加速度情報、音声情報、生体情報等である。 The information analysis unit 373 analyzes various information indicating the user status transmitted from the client terminal 1 and outputs the analysis result to the phrase search unit 364. Various information indicating the user situation includes, for example, position information, face information (face information of the current user or face information of a person in the vicinity analyzed based on the captured image), acceleration information, voice information, and biological information. And so on.

ここで、下記表２に、シナリオＤＢ３７４に格納されるシナリオデータに含まれるイベントデータ例を示す。下記表２に示すように、イベントデータでは、トリガ発生の条件、イベント内容、およびアクションが対応付けられている。 Here, Table 2 below shows an example of event data included in the scenario data stored in the scenario DB 374. As shown in Table 2 below, the event data is associated with trigger generation conditions, event contents, and actions.
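Combining the event data described above (trigger situation plus holding period) with the trigger determination of the scenario execution unit 372, a minimal Python sketch might look like the following. The field names and the exact-match trigger test are hypothetical, since the columns of Table 2 are not reproduced in this text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class ScenarioEvent:
    trigger: Dict[str, str]  # situation that triggers the event, e.g. {"place": "Y Park"}
    start: datetime          # start of the holding period
    end: datetime            # end of the holding period
    content: str             # event content to present
    action: str              # action to perform, e.g. "play_voice" (hypothetical)


def triggered_events(events: List[ScenarioEvent], situation: Dict[str, str],
                     now: datetime) -> List[ScenarioEvent]:
    """Return events that are being held at `now` and whose trigger matches."""
    return [event for event in events
            if event.start <= now <= event.end
            and all(situation.get(k) == v for k, v in event.trigger.items())]
```

Checking the holding period before the trigger keeps expired or not-yet-started events silent even when the user happens to be at the trigger location.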

以上、本実施形態による対話処理部３０ａの構成について具体的に説明した。続いて、本実施形態による動作処理について図１９〜図４２を参照して具体的に説明する。 The configuration of the dialogue processing unit 30a according to the present embodiment has been specifically described above. Subsequently, the operation processing according to the present embodiment will be specifically described with reference to FIGS. 19 to 42.

＜４−２．動作処理＞ <4-2. Operation processing>
（４−２−１．エージェント購入処理） (4-2-1. Agent purchase processing)
図１９は、本実施形態によるエージェントアプリケーションの購入処理を示すシーケンス図である。ここで、エージェントアプリケーションとは、特定のエージェントキャラクターによる自動対話をクライアント端末１で享受するために使用されるソフトウェアであって、エージェントアプリケーションの購入は、「エージェントの購入」とも言える。以下、アプリケーションを「App」とも称する。 FIG. 19 is a sequence diagram showing the purchase processing of an agent application according to the present embodiment. Here, an agent application is software used to enjoy automatic dialogue with a specific agent character on the client terminal 1, so purchasing an agent application can also be described as "purchasing an agent". Hereinafter, an application is also referred to as an "App".

図１９に示すように、まず、クライアント端末１は、エージェントサーバ２により提供されるアプリケーションショップのＷｅｂサイトから任意の（すなわち、ユーザにより選択された）エージェントAppのダウンロードおよびインストールを行う（ステップＳ２７０）。なお、クライアント端末１とエージェントサーバ２は、ネットワークを介して接続される。エージェントサーバ２のデータの送受信は、音声エージェントＩ／Ｆ２０により行われ得る。 As shown in FIG. 19, first, the client terminal 1 downloads and installs an arbitrary (that is, user-selected) agent App from the website of the application shop provided by the agent server 2 (step S270). The client terminal 1 and the agent server 2 are connected via a network, and the agent server 2 may send and receive data through the voice agent I/F 20.

次いで、クライアント端末１は、エージェントAppを起動（初回起動）する（ステップＳ２７３）。ユーザアカウントが登録済みでない場合（ステップＳ２７６／Ｎｏ）、アカウントの登録処理をエージェントサーバ２に要求する（ステップＳ２７９）。 Next, the client terminal 1 launches the agent App for the first time (step S273). If no user account has been registered yet (step S276 / No), the client terminal 1 requests the agent server 2 to perform account registration processing (step S279).

次に、エージェントサーバ２のユーザ管理部３５（図１６参照）は、クライアント端末１からの要求に応じて、新規アカウント情報をユーザ情報ＤＢ３５２に登録する（ステップＳ２８２）。新規アカウント情報は、アカウント名やパスワード、ユーザ属性情報（性別、生年月日、ニックネーム）等であって、クライアント端末１においてユーザにより入力され、アカウント登録処理の要求と共に送信される。 Next, in response to the request from the client terminal 1, the user management unit 35 (see FIG. 16) of the agent server 2 registers the new account information in the user information DB 352 (step S282). The new account information includes the account name, password, and user attribute information (gender, date of birth, nickname), etc., which the user enters on the client terminal 1 and which are transmitted together with the account registration request.

一方、ユーザアカウントが登録済みである場合（ステップＳ２７６／Ｙｅｓ）、クライアント端末１は、ログイン処理をエージェントサーバ２に要求する（ステップＳ２８５）。 On the other hand, when the user account is already registered (step S276 / Yes), the client terminal 1 requests the agent server 2 to perform the login process (step S285).

次いで、新規アカウント登録を行った場合若しくはログイン処理要求を受信した場合、エージェントサーバ２のログイン管理部３５１は、ユーザ情報ＤＢ３５２を参照し、アカウントのログイン処理を行う（ステップＳ２８８）。ログイン処理の要求では、クライアント端末１においてユーザにより入力されたアカウント名とパスワードが送信されるので、ログイン管理部３５１はユーザ情報ＤＢ３５２を参照して照合する。 Next, when a new account has been registered or a login request has been received, the login management unit 351 of the agent server 2 refers to the user information DB 352 and performs the login processing for the account (step S288). Because the login request carries the account name and password entered by the user on the client terminal 1, the login management unit 351 checks them against the user information DB 352.

次に、ログイン処理が正常に完了すると、エージェントサーバ２は、ログイン完了通知を、音声エージェントＩ／Ｆ２０からネットワークを介してクライアント端末１へ送信する（ステップＳ２９１）。 Next, when the login processing is completed successfully, the agent server 2 transmits a login completion notification from the voice agent I/F 20 to the client terminal 1 via the network (step S291).

次いで、クライアント端末１は、クライアント端末１のカメラ（または周辺に存在する通信可能な外部端末に設けられているカメラ）を起動し、ユーザの顔を撮像し、撮像画像（顔画像）から顔情報を取得する（ステップＳ２９４）。顔情報は、撮像画像（顔画像）の解析結果でもよいし、顔画像自体であってもよい。 Next, the client terminal 1 activates its camera (or a camera provided in a nearby external terminal with which it can communicate), captures the user's face, and acquires face information from the captured image (face image) (step S294). The face information may be the result of analyzing the captured image (face image) or the face image itself.

次に、クライアント端末１は、顔情報をエージェントサーバ２へ送信し（ステップＳ２９７）、エージェントサーバ２は、顔情報をユーザ情報ＤＢ３５２に登録する（ステップＳ３００）。 Next, the client terminal 1 transmits the face information to the agent server 2 (step S297), and the agent server 2 registers the face information in the user information DB 352 (step S300).

続いて、クライアント端末１は、バックグラウンドでエージェントAppを実行させるか否かのユーザによる選択を受け付け（ステップＳ３０３）、選択内容を設定情報としてエージェントサーバ２へ送信する（ステップＳ３０６）。 Subsequently, the client terminal 1 accepts the user's selection as to whether or not to execute the agent application in the background (step S303), and transmits the selected content as setting information to the agent server 2 (step S306).

次いで、エージェントサーバ２は、設定情報をユーザ情報ＤＢ３５２に保存する（ステップＳ３０９）。なおかかる設定情報はクライアント端末１の記憶部に保存されていてもよい。 Next, the agent server 2 saves the setting information in the user information DB 352 (step S309). The setting information may be stored in the storage unit of the client terminal 1.

そして、クライアント端末１は、起動したエージェントAppに従ってメイン画面を表示する（ステップＳ３１２）。 Then, the client terminal 1 displays the main screen according to the activated agent App (step S312).

以上、エージェントApp購入とエージェントApp初回起動時の処理について説明した。ここで、エージェントApp購入とエージェントApp初回起動時におけるクライアント端末１での表示画面例について図２０〜図２２を参照して説明する。 The processing for purchasing the Agent App and starting the Agent App for the first time has been explained above. Here, an example of a display screen on the client terminal 1 at the time of purchasing the agent application and starting the agent application for the first time will be described with reference to FIGS. 20 to 22.

図２０は、本実施形態によるエージェントアプリケーションの購入時における表示画面例を示す図である。図２０左に示す画面１００には、購入対象の候補となる複数のエージェントAppのタイトルが表示されている。画面１００に示す各エージェントAppのタイトルは、例えばエージェントキャラクターの名称である。例えばエージェントキャラクター「パワフルマン」を購入したい場合、ユーザは、画面１００の「エージェントApp『パワフルマン』」を選択する。この場合、画面１００は図２０中央に示す画面１０１に遷移する。 FIG. 20 is a diagram showing an example of display screens when purchasing an agent application according to the present embodiment. Screen 100, shown on the left of FIG. 20, displays the titles of a plurality of agent Apps that are candidates for purchase. The title of each agent App on screen 100 is, for example, the name of the agent character. For example, to purchase the agent character "Powerful Man", the user selects "Agent App 'Powerful Man'" on screen 100. Screen 100 then transitions to screen 101, shown in the center of FIG. 20.

画面１０１には、アカウント情報入力欄、アカウント作成ボタン、および「アカウントをお持ちの方はこちら」ボタンが表示されている。アプリケーションショップを利用するためのアカウントを既に登録済みの場合、ユーザは、「アカウントをお持ちの方はこちら」ボタンを選択する。この場合、画面１０１は図２０右に示す画面１０２に遷移する。 Screen 101 displays account information input fields, an account creation button, and an "Already have an account? Click here" button. If the user has already registered an account for the application shop, the user selects the "Already have an account? Click here" button, and screen 101 transitions to screen 102, shown on the right of FIG. 20.

画面１０２には、アカウント名入力欄、パスワード入力欄、およびログインボタンが表示されている。ユーザは、登録済みのアカウント名（ユーザ名／ＩＤ、ログイン名／ＩＤとも称される）およびパスワードを入力し、ログインボタンを選択する。ログインボタンが選択されると、クライアント端末１は、入力されたアカウント名およびパスワードと共に、エージェントサーバ２に対してログイン処理要求を行う。 On the screen 102, an account name input field, a password input field, and a login button are displayed. The user enters the registered account name (also referred to as user name / ID and login name / ID) and password, and selects the login button. When the login button is selected, the client terminal 1 makes a login processing request to the agent server 2 together with the entered account name and password.

一方、アカウントが未登録の場合、画面１０１においてアカウント名等の入力を行い、アカウントの作成をエージェントサーバ２に依頼する。図２１は、本実施形態によるアカウント登録画面例を示す図である。図２１左の画面１０３に示すように、アカウント名等が入力され、「アカウント作成」ボタンが選択されると、クライアント端末１は、入力された情報と共にアカウント登録処理の要求をエージェントサーバ２に対して行う。 On the other hand, if no account has been registered, the user enters an account name and other details on screen 101 and requests the agent server 2 to create an account. FIG. 21 is a diagram showing an example of an account registration screen according to the present embodiment. As shown on screen 103 on the left of FIG. 21, when the account name and other details are entered and the "Create account" button is selected, the client terminal 1 sends an account registration request to the agent server 2 together with the entered information.

エージェントサーバ２においてアカウント登録処理が正常に完了すると、図２１右に示すように、アカウント作成が完了したことを通知する画面１０４が表示される。画面１０４には、「続いて、お客様の顔情報を登録しますか？」といったテキストが表示され、「はい」ボタンが選択されると、クライアント端末１のカメラが起動し、ユーザの顔の撮像、および顔情報の抽出（解析）が行われる。抽出された顔情報は、エージェントサーバ２へ送信され、ユーザ情報として登録される。 When the account registration processing is completed successfully on the agent server 2, screen 104, notifying the user that the account has been created, is displayed as shown on the right of FIG. 21. Screen 104 displays text such as "Would you like to register your face information next?", and when the "Yes" button is selected, the camera of the client terminal 1 is activated, the user's face is imaged, and face information is extracted (analyzed). The extracted face information is transmitted to the agent server 2 and registered as user information.

図２２は、本実施形態によるメイン画面例を示す図である。エージェントAppの初回起動においてログイン処理やアカウント登録処理が終了すると、図２２の左に示すように、エージェントAppを開始するか否かを確認する画面１０５が表示される。開始する場合、ユーザは画面１０５に表示されている「はい」ボタンを選択する。なお画面１０５には、バックグラウンドでの実行可否を設定するためのチェックボックスも表示されている。ユーザは、エージェントAppをバックグラウンドで実行したい場合にはチェックを入れる。クライアント端末１は、当該チェックボックスへのチェックの有無を、バックグラウンドでの実行可否の設定情報としてエージェントサーバ２へ送信する。 FIG. 22 is a diagram showing an example of the main screen according to the present embodiment. When the login processing or account registration processing ends at the first launch of the agent App, screen 105, asking whether to start the agent App, is displayed as shown on the left of FIG. 22. To start, the user selects the "Yes" button displayed on screen 105. Screen 105 also displays a check box for setting whether the App may run in the background; the user checks it to run the agent App in the background. The client terminal 1 transmits whether the check box is checked to the agent server 2 as setting information indicating whether background execution is enabled.

画面１０５の「はい」ボタンが選択されると、エージェントAppのメイン画面１０６が表示される。ここでは、例えばヒーローキャラクターの「パワフルマン」の画像がクライアント端末１の表示部に表示され、さらに「この街は俺が守る！」といった「パワフルマン」の音声やテーマ曲がクライアント端末１のスピーカーから再生される。 When the "Yes" button on screen 105 is selected, main screen 106 of the agent App is displayed. Here, for example, an image of the hero character "Powerful Man" is shown on the display unit of the client terminal 1, and Powerful Man's voice, saying something like "I'll protect this city!", and his theme song are played from the speaker of the client terminal 1.

（４−２−２．音声変換処理） (4-2-2. Voice conversion processing)
続いて、本実施形態による音声変換処理について図２３〜図２４を参照して説明する。図２３は、本実施形態による音声変換処理について説明する図である。本実施形態では、図２３に示すように、ユーザの発話音声W₄をクライアント端末１（またはクライアント端末１と通信接続する周辺に存在するウェアラブル装置）のマイクロホンにより収音すると、これを対話処理部３０ａの自動発話制御部３６により特定のエージェントキャラクターの音声W₅に変換してユーザが装着するイヤホン等から再生する。特定のエージェントキャラクターとは、例えばクライアント端末１において起動中のエージェントAppに対応するキャラクターであって、当該エージェントAppを起動する操作が、実質的なユーザによるエージェントキャラクターの選択として認識され、選択信号がエージェントサーバ２へ送信される。このように、ユーザは自分の発話音声がエージェントキャラクターの音声で聞こえることで、エージェントキャラクターに成りきることができる。 Subsequently, the voice conversion processing according to the present embodiment will be described with reference to FIGS. 23 and 24. FIG. 23 is a diagram explaining the voice conversion processing according to the present embodiment. In the present embodiment, as shown in FIG. 23, when the user's uttered voice W₄ is picked up by the microphone of the client terminal 1 (or of a nearby wearable device communicatively connected to the client terminal 1), the automatic utterance control unit 36 of the dialogue processing unit 30a converts it into the voice W₅ of a specific agent character, which is played back through earphones or the like worn by the user. The specific agent character is, for example, the character corresponding to the agent App running on the client terminal 1; the operation of launching that agent App is treated as the user's de facto selection of the agent character, and a selection signal is transmitted to the agent server 2. In this way, by hearing their own speech in the agent character's voice, users can feel as if they have become the agent character.

図２４は、本実施形態による音声変換処理を示すシーケンス図である。図２４に示すように、まず、クライアント端末１は、マイクロホンにより音データを収音すると（ステップＳ３２０）、収音した音データをエージェントサーバ２へ送信する（ステップＳ３２３）。この際、クライアント端末１は、ユーザが選択しているエージェントキャラクターを示す選択信号も併せて送信してもよい。これらのデータは、クライアント端末１からネットワーク３を介してエージェントサーバ２へ送信され、エージェントサーバ２の音声エージェントＩ／Ｆ２０（通信部として機能）で受信され得る。 FIG. 24 is a sequence diagram showing the voice conversion process according to the present embodiment. As shown in FIG. 24, first, when the client terminal 1 picks up the sound data by the microphone (step S320), the client terminal 1 transmits the picked up sound data to the agent server 2 (step S323). At this time, the client terminal 1 may also transmit a selection signal indicating the agent character selected by the user. These data can be transmitted from the client terminal 1 to the agent server 2 via the network 3 and received by the voice agent I / F20 (functioning as a communication unit) of the agent server 2.

次いで、エージェントサーバ２は、自動発話制御部３６のユーザ音声抽出部３６１（図１７参照）により、音データを解析し、ユーザ音声の抽出を行う。エージェントサーバ２は、ユーザ音声が抽出できた場合、これをユーザに選択された特定のエージェントキャラクター（ここでは、例えば「ヒーローキャラクター」）の音声に変換する（ステップＳ３２６）。より具体的には、エージェントサーバ２は、対話処理部３０ａのユーザ音声抽出部３６１により抽出、テキスト化したユーザ音声文をエージェントキャラクターの音声で音声化するための音素データを対話処理部３０ａの音素データ取得部３６２により取得する。そして、対話処理部３０ａから出力されたユーザ音声文および対応する特定のエージェントキャラクターの音素データに基づいて、音声エージェントＩ／Ｆ２０により、ユーザ音声文を特定のエージェントキャラクターの音声で音声化し（音声変換）、音声化したデータ（音声データ）を音声エージェントＩ／Ｆ２０からネットワークを介してクライアント端末１へ送信する（ステップＳ３２９）。 Next, in the agent server 2, the user voice extraction unit 361 (see FIG. 17) of the automatic utterance control unit 36 analyzes the sound data and extracts the user's voice. If the user's voice can be extracted, the agent server 2 converts it into the voice of the specific agent character selected by the user (here, for example, the "hero character") (step S326). More specifically, the phoneme data acquisition unit 362 of the dialogue processing unit 30a acquires the phoneme data needed to voice, in the agent character's voice, the user's utterance sentence that the user voice extraction unit 361 has extracted and converted into text. Then, based on the user's utterance sentence output from the dialogue processing unit 30a and the phoneme data of the corresponding agent character, the voice agent I/F 20 voices the user's utterance sentence in the specific agent character's voice (voice conversion), and the voiced data (voice data) is transmitted from the voice agent I/F 20 to the client terminal 1 via the network (step S329).

次に、クライアント端末１は、エージェントサーバ２で変換されたヒーローキャラクターの声色の音声データをイヤホン等（スピーカーの一例）から再生する（ステップＳ３３２）。 Next, the client terminal 1 plays the voice data, converted by the agent server 2 into the hero character's voice, through earphones or the like (an example of a speaker) (step S332).
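The conversion in steps S326 to S329 amounts to looking up the selected character's phoneme data for the recognized text and synthesizing speech from it. The Python sketch below reduces this to a toy concatenative synthesis: text is split into single-character units and each unit's stored samples are joined. The per-character unit split, the dictionary layout of the phoneme storage, and all names are assumptions; the source does not specify the synthesis method, and real grapheme-to-phoneme conversion and prosody handling are far more involved.

```python
from typing import Dict, List

PhonemeDB = Dict[str, List[float]]  # unit -> waveform samples (hypothetical layout)


def text_to_units(text: str) -> List[str]:
    """Toy unit splitter: one unit per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def convert_voice(utterance_text: str, agent_id: str,
                  phoneme_storage: Dict[str, PhonemeDB], gap: int = 0) -> List[float]:
    """Voice `utterance_text` using the selected agent's phoneme data."""
    db = phoneme_storage[agent_id]  # phoneme data of the selected character
    samples: List[float] = []
    for unit in text_to_units(utterance_text):
        if unit not in db:
            raise KeyError(f"no phoneme data for {unit!r}")
        samples.extend(db[unit])
        samples.extend([0.0] * gap)  # optional silence after each unit
    return samples
```

Keying the storage by agent ID mirrors the idea that each character has its own phoneme storage; switching characters only changes which dictionary is used.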

以上、音声変換処理について説明したが、本実施形態は、上述したような聴覚的な成りきりに限定されず、視覚的にも成りきり体験を提供することが可能である。以下、図２５を参照して説明する。 The voice conversion processing has been described above, but the present embodiment is not limited to the auditory role-play described above and can also provide a visual role-play experience. This is described below with reference to FIG. 25.

図２５は、本実施形態によるＡＲ（Augmented Reality）変身による視覚的な成りきりについて説明する図である。本実施形態では、図２５に示すように、例えばクライアント端末１に設けられたカメラでユーザ自身の顔を撮像し、撮像したユーザの顔画像に、エージェントキャラクターの顔画像を重畳表示した画面１０７を生成して表示することで、視覚的な成りきり体験を提供することができる。この際、エージェントサーバ２により、撮像した顔画像に基づく顔認識、すなわちユーザ情報ＤＢ３５２に登録された顔情報と一致するか否かの確認を行い、一致する場合は当該顔画像にエージェントキャラクターの顔画像を重畳表示するようにしてもよい。また、上述した音声変換の聴覚的な成りきり体験と併せて視覚的な成りきり体験を提供するようにしてもよい。 FIG. 25 is a diagram explaining visual role-play through AR (Augmented Reality) transformation according to the present embodiment. In the present embodiment, as shown in FIG. 25, a visual role-play experience can be provided by, for example, capturing the user's own face with a camera provided in the client terminal 1 and generating and displaying screen 107, in which the agent character's face image is superimposed on the captured face image of the user. At this time, the agent server 2 may perform face recognition based on the captured face image, that is, check whether it matches the face information registered in the user information DB 352, and superimpose the agent character's face image on the face image only if they match. The visual role-play experience may also be provided together with the auditory role-play experience of the voice conversion described above.
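The face check before the overlay could be sketched as a comparison between a feature vector from the captured face and the one registered in the user information DB 352. This Python sketch assumes that face information is a fixed-length embedding vector and that cosine similarity against a threshold of 0.8 decides a match; both the representation and the threshold are hypothetical, since the source does not specify the recognition method.

```python
import math
from typing import Sequence

MATCH_THRESHOLD = 0.8  # hypothetical; would need tuning for a real face model


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_overlay(captured: Sequence[float], registered: Sequence[float]) -> bool:
    """Overlay the character's face only if the captured face matches the registered one."""
    return cosine_similarity(captured, registered) >= MATCH_THRESHOLD
```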

(4-2-3. Automatic utterance processing)
Next, automatic utterance processing by a specific agent character according to the user situation will be described with reference to FIGS. 26A to 26D. The user situation is assumed to include, for example, the user's location, the attributes and facial expressions of people, the user's behavioral state (behavior recognition), and the user's psychological state.

FIG. 26A is a sequence diagram showing automatic utterance processing according to location in the present embodiment. As shown in FIG. 26A, the client terminal 1 first acquires current position information by GPS or the like (step S340) and transmits it to the agent server 2 (step S342). Such acquisition and transmission of current position information can be performed periodically, for example, while the agent program is running on the client terminal 1.

Next, the automatic utterance control unit 36 of the agent server 2 uses the phrase search unit 364 to search the phrase DB 365 for a phrase corresponding to the position information (location) of the client terminal 1, which was transmitted from the client terminal 1 and acquired by the position information acquisition unit 363 (step S344). For example, as shown in Table 1 above, phrases and sound effects associated with specific places (XX City, Y Park, Z Station, etc.) or general places (stations, post offices, parks, the sea, etc.) are searched for.
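
As a minimal sketch of the lookup in step S344 (not the actual implementation of the phrase search unit 364), the phrase DB 365 can be modeled as a dictionary keyed by situation type and value. The sketch assumes the GPS coordinates have already been resolved into a place name and a general place category, a step the source does not describe, and it assumes that a phrase tied to a specific place takes precedence over one tied to its category. The names `PHRASE_DB` and `search_location_phrase` and the placeholder phrases are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for phrase DB 365, keyed by (kind, value); phrases are placeholders.
PHRASE_DB: Dict[Tuple[str, str], str] = {
    ("place", "XX City"): "<phrase for XX City>",
    ("place", "Z Station"): "<phrase for Z Station>",
    ("place_category", "station"): "<phrase for any station>",
    ("place_category", "park"): "<phrase for any park>",
}


def search_location_phrase(place_name: str, place_category: str) -> Optional[str]:
    """Prefer a phrase tied to the specific place; fall back to its general category."""
    phrase = PHRASE_DB.get(("place", place_name))
    if phrase is None:
        phrase = PHRASE_DB.get(("place_category", place_category))
    return phrase  # None corresponds to "no hit" (step S346/No)
```

With this layout, a user at "W Station" (no dedicated entry) still gets the generic station phrase, while "Z Station" gets its own.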

Next, if a phrase corresponding to the location is found (that is, the search hits) (step S346/Yes), the automatic utterance control unit 36 converts the retrieved phrase into the voice of the specific agent character designated by the user, for example the hero character (step S348). Specifically, the phoneme data acquisition unit 362 acquires from the phoneme storage unit 40 the hero character's phoneme data for voicing the phrase; the acquired phoneme data and the phrase are output to the voice agent I/F 20, and the voice agent I/F 20 voices the phrase (for example, by speech synthesis).
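
The source says only that the voice agent I/F 20 voices the phrase from the character's phoneme data, "for example, by speech synthesis". As one simple way to picture this, the following toy sketch uses concatenative synthesis: it looks up the designated character's stored unit for each phoneme and joins the units in order. The phoneme bank layout, the `synthesize` name, and the assumption that the phrase has already been converted into a phoneme sequence are all hypothetical; practical systems also smooth the joins between units, which this sketch omits.

```python
from typing import Dict, List

# Hypothetical phoneme storage: character ID -> phoneme symbol -> PCM samples.
PhonemeBank = Dict[str, Dict[str, List[int]]]


def synthesize(phonemes: List[str], character: str, bank: PhonemeBank) -> List[int]:
    """Toy concatenative synthesis: append the character's unit for each phoneme."""
    units = bank[character]
    samples: List[int] = []
    for p in phonemes:
        if p not in units:
            # A missing unit means this character's voice cannot render the phrase.
            raise KeyError(f"no unit for phoneme {p!r} for character {character!r}")
        samples.extend(units[p])
    return samples
```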

Subsequently, the agent server 2 transmits the voice data generated by the voice agent I/F 20 to the client terminal 1 (step S350).

Then, the client terminal 1 plays back the voice data received from the agent server 2, that is, voice data in which the predetermined phrase has been voiced in the hero character's voice (step S352). Thus, triggered by the user's moving to a predetermined place, the phrase corresponding to that place is played from the speaker of the client terminal 1 in the voice of the specific agent character.

FIG. 26B is a sequence diagram showing automatic utterance processing according to person attributes and facial expressions in the present embodiment. As shown in FIG. 26B, the client terminal 1 first activates the camera and acquires a captured image (step S354), then transmits the captured image to the agent server 2 (step S356). Such camera activation, imaging, and transmission can be performed periodically, for example, while the agent program is running on the client terminal 1. The camera may be an outward-facing camera (also called an out-camera) that captures the user's surroundings, for example in the user's line-of-sight direction, or an inward-facing camera (also called an in-camera) that captures the user operating the client terminal 1.

Next, the automatic utterance control unit 36 of the agent server 2 determines whether the captured image transmitted from the client terminal 1 was captured by the in-camera (step S358). This can be determined, for example, from metadata attached to the captured image.

If the image was captured by the in-camera (step S358/Yes), the user is determined to appear in it, and the phrase search unit 364 searches the phrase DB 365 for a phrase corresponding to the user's facial expression, based on the face image analysis result from the information analysis unit 366 (step S360). For example, as shown in Table 1 above, a phrase or sound effect for when the user is smiling, or one for when the user looks angry, is searched for.

On the other hand, if the image was not captured by the in-camera (step S358/No), that is, if it was captured by the out-camera, a person around the user (for example, a person facing the user) is determined to appear in it, and the phrase search unit 364 searches the phrase DB 365 for a phrase corresponding to that person's attributes (age, gender, atmosphere, etc.) or facial expression, based on the face image analysis result from the information analysis unit 366 (step S362). For example, as shown in Table 1 above, phrases, sound effects, etc. for when the person facing the user is a woman are searched for.
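
Steps S358 to S362 amount to a branch on the camera that produced the image. The sketch below assumes a hypothetical metadata field `camera` whose value `"front"` marks the in-camera, and assumes the face image analysis returns a dictionary with `expression` and `gender` entries; only gender is used as the person attribute, for brevity. All names and phrases are illustrative placeholders.

```python
from typing import Dict, Optional, Tuple

# Hypothetical phrase DB entries for the two branches; phrases are placeholders.
PHRASE_DB: Dict[Tuple[str, str], str] = {
    ("user_expression", "smile"): "<phrase for a smiling user>",
    ("user_expression", "angry"): "<phrase for an angry user>",
    ("other_gender", "female"): "<phrase for facing a woman>",
}


def search_camera_phrase(metadata: Dict[str, str], analysis: Dict[str, str]) -> Optional[str]:
    if metadata.get("camera") == "front":
        # In-camera (S358/Yes): the user is in the frame, so key on the expression (S360).
        key = ("user_expression", analysis.get("expression", ""))
    else:
        # Out-camera (S358/No): a nearby person is in the frame, key on an attribute (S362).
        key = ("other_gender", analysis.get("gender", ""))
    return PHRASE_DB.get(key)  # None corresponds to step S364/No
```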

Next, if a phrase corresponding to the person's attributes or facial expression is found (that is, the search hits) (step S364/Yes), the automatic utterance control unit 36 converts the retrieved phrase into the voice of the specific agent character designated by the user, for example the hero character (step S366). This conversion is the same as the process described in step S348 above.

Subsequently, the agent server 2 transmits the voice data generated by the voice agent I/F 20 to the client terminal 1 (step S368).

Then, the client terminal 1 plays back the voice data received from the agent server 2, that is, voice data in which the predetermined phrase has been voiced in the hero character's voice (step S370). Thus, triggered by the user's facial expression, or by the attributes or facial expression of the person facing the user, a phrase corresponding to that expression or attribute is played from the speaker of the client terminal 1 in the voice of the specific agent character.

FIG. 26C is a sequence diagram showing automatic utterance processing according to user behavior in the present embodiment. As shown in FIG. 26C, the client terminal 1 first acquires acceleration sensor information from its acceleration sensor (step S372) and transmits it to the agent server 2 (step S374). The acceleration sensor information can be transmitted periodically, for example, while the agent program is running on the client terminal 1.

Next, in the automatic utterance control unit 36 of the agent server 2, the information analysis unit 366 performs behavior recognition based on the acceleration sensor information transmitted from the client terminal 1, and the phrase search unit 364 searches the phrase DB 365 for a phrase corresponding to the behavioral state indicated by the recognition result (step S376). For example, as shown in Table 1 above, phrases and sound effects associated with a running state or a sleeping state are searched for. Although acceleration sensor information is used here as the sensor data for behavior recognition, the present embodiment is not limited to this; data detected by various other sensors, such as a gyro sensor or a geomagnetic sensor, may also be used.
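
The source does not specify the behavior recognition method. As a stand-in for illustration only, the sketch below classifies a window of three-axis acceleration samples by the population standard deviation of the acceleration magnitude: little variation suggests a resting or sleeping state, and large variation suggests running. The thresholds `STILL_MAX` and `WALK_MAX` are hypothetical, uncalibrated values, and the "walking" state deliberately has no phrase so that it illustrates a search miss (step S378/No).

```python
import math
from statistics import pstdev
from typing import List, Optional, Tuple

# Hypothetical thresholds on the spread of |a| (m/s^2) over one window.
STILL_MAX = 0.5
WALK_MAX = 3.0

# Placeholder phrases; "walking" intentionally has no entry.
BEHAVIOR_PHRASES = {
    "running": "<phrase for a running user>",
    "still": "<phrase for a resting or sleeping user>",
}


def recognize_behavior(samples: List[Tuple[float, float, float]]) -> str:
    """Classify a window of (ax, ay, az) samples by how much the magnitude varies."""
    magnitudes = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in samples]
    spread = pstdev(magnitudes)  # population standard deviation
    if spread < STILL_MAX:
        return "still"
    if spread < WALK_MAX:
        return "walking"
    return "running"


def search_behavior_phrase(samples: List[Tuple[float, float, float]]) -> Optional[str]:
    return BEHAVIOR_PHRASES.get(recognize_behavior(samples))
```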

Next, if a phrase corresponding to the behavioral state is found (that is, the search hits) (step S378/Yes), the automatic utterance control unit 36 converts the retrieved phrase into the voice of the specific agent character designated by the user, for example the hero character (step S380).

Subsequently, the agent server 2 transmits the voice data generated by the voice agent I/F 20 to the client terminal 1 (step S382).

Then, the client terminal 1 plays back the voice data received from the agent server 2, that is, voice data in which the predetermined phrase has been voiced in the hero character's voice (step S384). Thus, triggered by the user's entering a predetermined behavioral state, a phrase corresponding to that state is played from the speaker of the client terminal 1 in the voice of the specific agent character.

FIG. 26D is a sequence diagram showing automatic utterance processing according to psychological state in the present embodiment. As shown in FIG. 26D, the client terminal 1 first detects the user's biometric information with a biometric sensor (step S386) and transmits it to the agent server 2 (step S388). The biometric information can be transmitted periodically, for example, while the agent program is running on the client terminal 1.

Next, in the automatic utterance control unit 36 of the agent server 2, the information analysis unit 366 analyzes the biometric information transmitted from the client terminal 1, and the phrase search unit 364 searches the phrase DB 365 for a phrase corresponding to the user's psychological state (that is, emotion) obtained by the analysis (step S390). For example, as shown in Table 1 above, phrases and sound effects associated with a tense state with a fast pulse are searched for. The biometric sensor is any of various sensors that detect, for example, pulse, heart rate, blood pressure, amount of perspiration, respiration, brain waves, or myoelectric potential. Based on such biometric information, the information analysis unit 366 analyzes the user's psychological state, that is, emotions such as joy, anger, sadness, tension, and excitement.
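
The analysis method in step S390 is likewise left open. The toy rule below, an assumption made purely for illustration, reads a fast pulse together with above-baseline perspiration as tension, matching the "fast pulse, tense" example from Table 1. `FAST_PULSE_BPM` and the perspiration baseline are hypothetical parameters; an actual estimator would likely combine more of the biometric signals listed above.

```python
from typing import Optional

FAST_PULSE_BPM = 100.0  # hypothetical threshold for a "fast" pulse

# Placeholder phrase; "calm" intentionally has no entry (search miss, step S392/No).
EMOTION_PHRASES = {"tense": "<phrase for a tense user>"}


def estimate_psychological_state(pulse_bpm: float, sweat_level: float,
                                 sweat_baseline: float) -> str:
    """Toy rule: fast pulse plus above-baseline perspiration is read as tension."""
    if pulse_bpm > FAST_PULSE_BPM and sweat_level > sweat_baseline:
        return "tense"
    return "calm"


def search_emotion_phrase(pulse_bpm: float, sweat_level: float,
                          sweat_baseline: float) -> Optional[str]:
    state = estimate_psychological_state(pulse_bpm, sweat_level, sweat_baseline)
    return EMOTION_PHRASES.get(state)
```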

Next, if a phrase corresponding to the psychological state is found (that is, the search hits) (step S392/Yes), the automatic utterance control unit 36 converts the retrieved phrase into the voice of the specific agent character designated by the user, for example the hero character (step S394).

Subsequently, the agent server 2 transmits the voice data generated by the voice agent I/F 20 to the client terminal 1 (step S396).

Then, the client terminal 1 plays back the voice data received from the agent server 2, that is, voice data in which the predetermined phrase has been voiced in the hero character's voice (step S398). Thus, triggered by the user's psychological state, a phrase corresponding to that state is played from the speaker of the client terminal 1 in the voice of the specific agent character.

The agent's automatic utterance control processing according to the user situation has been described above. The automatic utterance control processing according to the present embodiment is not limited to these examples. For example, phrases corresponding to one or more user situations among location, facial expression, behavior, emotion, user utterance, date and time, and so on may be searched for, or the phrase search may be performed across multiple user situations in a predetermined order (for example, in descending order of priority).
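
The priority-ordered variant can be sketched as a loop that tries each observed situation type in turn and stops at the first hit. The order in `PRIORITY` is a hypothetical choice, since the source says only that a predetermined order such as descending priority may be used; the function and key names are likewise illustrative.

```python
from typing import Dict, Optional, Tuple

# Hypothetical search order, highest priority first.
PRIORITY = ("emotion", "behavior", "expression", "location", "datetime")


def search_by_priority(situation: Dict[str, str],
                       phrase_db: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Try each observed situation in priority order; return the first matching phrase."""
    for kind in PRIORITY:
        value = situation.get(kind)
        if value is None:
            continue  # this situation was not observed in the current cycle
        phrase = phrase_db.get((kind, value))
        if phrase is not None:
            return phrase
    return None
```

A lower-priority situation (for example, location) is only consulted when no higher-priority situation produced a hit.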

(4-2-4. Scenario acquisition processing)
Next, the scenario mode according to the present embodiment will be described. In addition to converting the user's voice into an agent character's voice and making the agent character speak automatically according to the user situation, as described above, the dialogue processing unit 30a according to the present embodiment can provide an experience in which the user, playing the agent character, takes part in a scenario (story). Acquisition of the scenario program used to provide such an experience is described below with reference to FIGS. 27 to 29.

FIG. 27 is a sequence diagram showing scenario acquisition processing in the present embodiment. As shown in FIG. 27, when "Scenario list" is selected on the menu screen displayed on the display unit of the client terminal 1 (step S410), the client terminal 1 requests a scenario list from the agent server 2 (step S413). Here, for example, the request is for the scenario list of the agent character the user has purchased, the hero character "Powerful Man".

Next, the scenario management unit 37 of the agent server 2 acquires the list of scenarios associated with the hero character from the scenario DB 374 (step S416) and transmits it from the voice agent I/F 20 to the client terminal 1 via the network (step S419).

Next, the client terminal 1 displays the scenario list received from the agent server 2 on the display unit (step S422) and accepts the user's selection of a scenario (step S425).

Next, the client terminal 1 transmits selection information indicating the scenario selected by the user to the agent server 2 (step S428).

Next, the scenario management unit 37 of the agent server 2 determines whether the scenario selected by the user has been purchased (step S431). If it has not been purchased (step S431/Yes), the scenario management unit 37 instructs the client terminal 1 to display the purchase screen for that scenario (step S434).

Next, the client terminal 1 displays the scenario purchase screen (step S437). The user decides to purchase the scenario by, for example, tapping the purchase button displayed on the scenario purchase screen.

Subsequently, when the purchase of the scenario is decided (step S440/Yes), the client terminal 1 requests the agent server 2 to purchase the scenario (step S443).

Then, the agent server 2 performs scenario purchase processing (step S446). This can be carried out by, for example, payment processing (in-app billing) using a credit card or electronic money. The user's purchase of the scenario is registered by the user management unit 35 as user information in the user information DB 352.
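
The branch at step S431 and the purchase at steps S440 to S446 can be summarized as follows. This is a sketch under stated assumptions: `UserInfo` stands in for a record in the user information DB 352, `confirmed` represents the user's choice at step S440, and `charge` is a hypothetical hook for the in-app payment processing. The `payment_failed` outcome is an added assumption not described in the source.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class UserInfo:
    """Hypothetical stand-in for a user record in the user information DB."""
    user_id: str
    purchased_scenarios: Set[str] = field(default_factory=set)


def check_and_purchase(user: UserInfo, scenario_id: str, confirmed: bool,
                       charge: Callable[[str, str], bool]) -> str:
    if scenario_id in user.purchased_scenarios:
        return "already_purchased"  # proceed to participation registration (FIG. 30)
    if not confirmed:
        return "back_to_menu"  # user postponed the purchase (S440/No)
    if not charge(user.user_id, scenario_id):  # hypothetical billing hook
        return "payment_failed"
    user.purchased_scenarios.add(scenario_id)  # record the purchase as user information
    return "purchased"
```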

If the user decides not to purchase after the scenario purchase screen is displayed (step S440/No), the user may, for example, tap a back button or the like to return to the menu list screen and consider acquiring another scenario.

FIGS. 28 and 29 show examples of screens displayed on the client terminal 1 up to the purchase of a scenario. The left of FIG. 28 shows the main screen 110 displayed after the user has purchased a hero character. When the user selects the menu button 111 on the main screen 110, the menu screen 112a is displayed, as shown in the center of FIG. 28. When the user then selects the "Scenario list" item in the menu on the menu screen 112a, the scenario list screen 113 is displayed, as shown on the right of FIG. 28. The scenario list screen 113 lists the scenarios the user can join, and the user selects the one he or she wants to purchase. Scenarios whose participation period has already ended, or which are sold out, are grayed out and cannot be selected. For example, on the screen 113, scenarios 113a and 113b are displayed as selectable, and scenario 113c is grayed out as not selectable.
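
The graying-out rule on the scenario list screen 113 can be expressed as a simple predicate: a scenario is selectable only while its participation period is open and it is not sold out. The `ScenarioEntry` fields, in particular the `remaining_slots` counter used to represent "sold out", and the inclusive end time are hypothetical choices for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class ScenarioEntry:
    title: str
    period_end: datetime
    remaining_slots: int  # hypothetical stock counter; 0 means sold out


def is_selectable(entry: ScenarioEntry, now: datetime) -> bool:
    """Selectable only while the participation period is open and stock remains."""
    return now <= entry.period_end and entry.remaining_slots > 0


def selectable_titles(entries: List[ScenarioEntry], now: datetime) -> List[str]:
    """Titles to show as selectable; the rest would be grayed out."""
    return [e.title for e in entries if is_selectable(e, now)]
```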

Next, when the user selects, for example, scenario 113a "Battle in XX City", the purchase screen 114 shown on the left of FIG. 29 is displayed. The purchase screen 114 shows the scenario's title, synopsis, price, a purchase button 114a, and the participation place and period, so that the user can check the scenario's content, price, participation place, period, and so on. A scenario contains, for example, multiple events, and the goal is to clear the scenario by clearing all of them. The "participation place" is a place that triggers events. The user can take part in the scenario without being there, but fewer events are triggered, which makes the scenario difficult to clear.

Then, when the user selects the purchase button 114a, the purchase confirmation screen 115 shown on the right of FIG. 29 is displayed. When the user selects the "Yes" button, the agent server 2 performs purchase processing (payment processing), and the scenario purchase is completed.

Next, the process of registering for participation in a scenario will be described with reference to FIG. 30. FIG. 30 is a sequence diagram showing the scenario participation registration process in the present embodiment. The process shown in FIG. 30 follows step S431 above when the scenario selected by the user has already been purchased.

The scenario management unit 37 of the agent server 2 searches the scenario selected by the user for a participation group in which the specific agent character purchased by the user, for example the hero character, is not yet registered (step S450). The scenario structure according to the present embodiment will now be described with reference to FIG. 31.

FIG. 31 is a diagram explaining the scenario structure according to the present embodiment. One or more characters appear in a scenario, and one user is registered for each character. However, since multiple users may have purchased the agent program of the same character, multiple participation groups are set up for each scenario, as shown in FIG. 31, and users are registered per participation group. For example, scenario #1 is associated with multiple participation groups #1-1, #1-2, #1-3, and so on, and users are allocated so that no character is duplicated within a participation group. Specifically, in participation group #1-1 of scenario #1, users playing "Character A" and "Character B" are currently registered, but "Character C" is vacant. In participation group #1-2 of the same scenario, "Character A" is vacant. In participation group #1-3, "Character A", "Character B", and "Character C" are all vacant.
When the user selects scenario #1, the scenario management unit 37 checks the user registration status of the participation groups of that scenario and searches for a participation group in which the character purchased by the user is unregistered (vacant). For example, if the user's character is "Character A", participation group #1-2 is found; if it is "Character B", participation group #1-3; and if it is "Character C", participation group #1-1. The user can also register for different scenarios (for example, scenario #1 and scenario #2) with the same character at the same time.
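
The search described for FIG. 31 behaves like a first-fit scan over the participation groups of the selected scenario, in group order. In the sketch below, each group maps character names to the registered user ID, or `None` when the slot is vacant. The fixture reproduces the vacancies stated for FIG. 31; which users occupy the filled slots, and the assumption that the other two slots of group #1-2 are filled, are hypothetical. The source does not say what happens when no group has a vacancy, so the sketch simply returns `None` in that case.

```python
from typing import Dict, List, Optional, Tuple

# (participation group ID, {character name: registered user ID, or None if vacant})
Group = Tuple[str, Dict[str, Optional[str]]]


def find_vacant_group(groups: List[Group], character: str) -> Optional[str]:
    """Return the first participation group whose slot for `character` is vacant."""
    for group_id, slots in groups:
        if character in slots and slots[character] is None:
            return group_id
    return None  # no vacancy anywhere (handling of this case is an assumption)


def register(groups: List[Group], character: str, user_id: str) -> Optional[str]:
    """Fill the first vacant slot for the character (cf. step S465)."""
    group_id = find_vacant_group(groups, character)
    if group_id is not None:
        dict(groups)[group_id][character] = user_id  # same slot dict, so this mutates it
    return group_id


# Registration state of scenario #1 as described for FIG. 31 (user IDs hypothetical).
scenario_1: List[Group] = [
    ("#1-1", {"Character A": "user1", "Character B": "user2", "Character C": None}),
    ("#1-2", {"Character A": None, "Character B": "user3", "Character C": "user4"}),
    ("#1-3", {"Character A": None, "Character B": None, "Character C": None}),
]
```

For this state the search yields #1-2 for Character A, #1-3 for Character B, and #1-1 for Character C, matching the example in the text.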

Next, the scenario management unit 37 transmits the scenario information of the participation group found by the search to the client terminal 1 (step S453).

Next, the client terminal 1 displays the received scenario information on the display unit (step S456). The scenario information screen shows a participation button together with the scenario's synopsis and details of its participation place and date and time. FIG. 32 shows an example of a scenario participation screen according to the present embodiment. The screen 116 on the left of FIG. 32 accepts an operation to register for participation in a scenario selected on the scenario list screen 113 (see FIG. 28), or in a scenario whose purchase was decided on the purchase confirmation screen 115 (see FIG. 29) and then completed. The screen 116 displays, for example, the title and synopsis of the selected scenario, an indication that the scenario has been purchased, the participation place, participation period information, and a participation button 116a. The user can indicate an intention to participate in the scenario by selecting the participation button 116a.

Next, when the participation button is selected and the user's intention to participate is input (step S459/Yes), the client terminal 1 asks the agent server 2 to enter the user in (a participation group of) this scenario (step S462).

Next, in response to the request from the client terminal 1, the scenario management unit 37 of the agent server 2 registers the user's participation in the scenario (step S465). Information on which users are registered for the characters appearing in each participation group of each scenario may be stored in the scenario DB 374 (see FIG. 18) or in the user information DB 352 (see FIG. 16).

Next, if the scenario the user has registered for has not yet reached its start time (step S468/Yes), the scenario management unit 37 notifies the user of the start time (step S471), and the client terminal 1 presents the scenario's start time to the user via a display screen or the like (step S474). An example of such a display screen is the screen 117 at the upper right of FIG. 32, which shows the scenario title, an indication that participation has been reserved, and a countdown to the scenario's start time.

Subsequently, when the scenario's start time arrives (step S477/Yes), or when the start time of the registered scenario has already passed (step S468/No) and the scenario is still in progress (step S469/Yes), the scenario management unit 37 notifies the user that the scenario has started (step S480). For example, if the scenario has already started when the user registers (that is, it is in progress), tapping the participation button 116a shown in FIG. 32 displays the screen 118 at the lower right of FIG. 32, which shows the scenario title with "Participating!", and the scenario starts immediately. When the start time of a scenario the user has already registered for arrives, a start notification such as that shown in FIG. 33 or FIG. 34 is given. If the scenario's holding period has ended (step S469/No), the user cannot participate, and the participation registration process ends.
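
The three outcomes of steps S468 and S469 depend only on where the current time falls relative to the scenario's holding period. A minimal sketch, assuming the period is given as a start and an end `datetime` and that the end time is inclusive (the source does not say), with hypothetical outcome labels; `countdown` illustrates the value a screen such as screen 117 might display.

```python
from datetime import datetime, timedelta


def registration_outcome(now: datetime, start: datetime, end: datetime) -> str:
    if now < start:
        return "notify_start_time"  # S468/Yes: tell the user the start time (S471)
    if now <= end:
        return "notify_started"  # S468/No and S469/Yes: start notification (S480)
    return "closed"  # S469/No: participation is no longer possible


def countdown(now: datetime, start: datetime) -> timedelta:
    """Time remaining until the start, clamped at zero once the scenario has begun."""
    return max(start - now, timedelta(0))
```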

FIG. 33 is a diagram showing an example of a scenario start notification when the agent App according to the present embodiment is running in the foreground. As shown on the left of FIG. 33, if the scenario's start time arrives while a screen 120 of the agent App (for example, the main screen) is displayed, a pop-up 120a indicating that the scenario is starting appears on the screen 120, as shown on the right of FIG. 33. When the user reads the notification and presses the "OK" button, the pop-up 120a closes.

FIG. 34 is a diagram showing an example of a scenario start notification when the agent App according to the present embodiment is not running. As shown on the left of FIG. 34, if the scenario's start time arrives while the agent App is not running (for example, while the home screen 122 is displayed), a pop-up 122a (push notification) indicating that the scenario is starting appears on the home screen 122. When the user reads the notification and presses the "Open" button, the agent App launches and its main screen 123 is displayed, as shown on the right of FIG. 34.

次いで、クライアント端末１は、シナリオが開始されることを表示画面等を介してユーザに通知する（ステップＳ４８３）。 Next, the client terminal 1 notifies the user that the scenario is started via a display screen or the like (step S483).

そして、シナリオ管理部３７は、シナリオ実行処理を開始する（ステップＳ４８６）。 Then, the scenario management unit 37 starts the scenario execution process (step S486).

例えば「XX都市でバトル」といったシナリオが開始された場合、例えばクライアント端末１の表示部には、エージェントキャラクター（例えばヒーローキャラクター）の画像が表示され、さらにナレーションとヒーローキャラクターの音声がイヤホン等から以下のように出力される。
・シナリオ音声
ナレーション「2015年10月12日、舞台はXX都市。繰り広げられる激しいバトルに戦士たちは疲弊していた…」
ヒーローキャラ「XX都市が俺を待ってるぜ！」
ナレーション「そのXX都市で8人の敵を倒すことが使命である。」

For example, when a scenario such as "Battle in XX City" starts, an image of an agent character (for example, a hero character) is displayed on the display unit of the client terminal 1, and narration and the hero character's voice are output from earphones or the like as follows.
・Scenario voice
Narration: "October 12, 2015. The stage is XX City. The warriors were exhausted by the fierce battles being fought there..."
Hero character: "XX City is waiting for me!"
Narration: "Your mission is to defeat eight enemies in XX City."

次いで、シナリオが進行している通常時は、クライアント端末１にヒーローキャラクターが表示され、ユーザの状況に応じてヒーローキャラクターが自動発話したり、対応する効果音が流れたりする。状況に応じた自動発話は、図２６Ａ〜図２６Ｄを参照して説明した処理と同様である。
・自動発話音声
ユーザ状況：位置情報の解析により、ユーザがXX都市に移動したことを認識。
ヒーローキャラ「ここがXX都市か。敵はどこだ！？」

Next, while the scenario is in progress, the hero character is displayed on the client terminal 1, and depending on the user's situation the hero character speaks automatically or a corresponding sound effect is played. Automatic utterance according to the situation is the same as the process described with reference to FIGS. 26A to 26D.
・Automatic utterance voice
User situation: analysis of the location information recognizes that the user has moved to XX City.
Hero character: "So this is XX City. Where are the enemies!?"

続いて、ある条件により、事前にシナリオに用意されたイベントが発生する。イベントが発生した場合は、ヒーローキャラクターまたはナレーションによりイベントが発生した旨と、イベントクリアのために必要なアクションが通知される。ユーザがそのアクションを正しく行うことで、イベントクリアとなる。このような条件に応じたイベント発生といったシナリオ実行処理について、以下詳細に説明する。 Then, when certain conditions are met, an event prepared in advance in the scenario occurs. When an event occurs, the hero character or the narration notifies the user that the event has occurred and of the action required to clear it. The event is cleared when the user performs that action correctly. Scenario execution processing, such as generating events according to such conditions, is described in detail below.

（４−２−５．シナリオ実行処理） (4-2-5. Scenario execution processing)
本実施形態によるシナリオ管理部３７は、シナリオ実行部３７２により、ユーザの発話や移動場所、アクション（行動）等をトリガとしてシナリオイベント（本実施形態では「イベント」と称される）を発生させ、イベントクリアのための所定のアクションをユーザに指示する等の処理を行う。以下、図３５〜図４２を参照して具体的に説明する。 In the scenario management unit 37 according to the present embodiment, the scenario execution unit 372 generates scenario events (referred to simply as "events" in the present embodiment) triggered by the user's utterances, movement locations, actions (behavior), and the like, and performs processing such as instructing the user to perform a predetermined action to clear each event. This is described specifically below with reference to FIGS. 35 to 42.

・ユーザ音声をトリガとしたイベントの発生 ・Generation of an event triggered by user voice
図３５は、本実施形態によるユーザ音声をトリガとしたイベントの実行処理を示すシーケンス図である。図３５に示すように、まず、クライアント端末１は、マイクにより周辺の音データを収音し（ステップＳ４９０）、収音した音データをエージェントサーバ２へ送信する（ステップＳ４９３）。 FIG. 35 is a sequence diagram showing execution processing of an event triggered by user voice according to the present embodiment. As shown in FIG. 35, the client terminal 1 first collects ambient sound data with a microphone (step S490) and transmits the collected sound data to the agent server 2 (step S493).

次に、エージェントサーバ２のシナリオ管理部３７は、情報解析部３７３により、音データの解析を行い、ユーザ音声の抽出を行う。ユーザ音声ができた場合、シナリオ実行部３７２は、ユーザが参加中のシナリオからユーザの発話に対応するイベントを検索する（ステップＳ４９６）。イベントの検索は、シナリオＤＢ３７４に格納されている、ユーザが参加中のシナリオのシナリオデータを参照して行う。上述したように、各シナリオには、１以上のイベントが含まれ、イベント発生のトリガ（条件）とイベント内容とイベントクリアのためのアクションとが対応付けられたデータがシナリオデータとしてシナリオＤＢ３７４に格納されている。シナリオデータの具体例は、上記表２に示した通りである。シナリオ管理部３７は、表２に示したようなイベントデータを参照して、ユーザ音声（すなわち発話内容）をトリガ（発生条件）とするイベントを検索する。 Next, in the scenario management unit 37 of the agent server 2, the information analysis unit 373 analyzes the sound data and extracts the user's voice. When the user's voice has been extracted, the scenario execution unit 372 searches the scenario in which the user is participating for an event corresponding to the user's utterance (step S496). The search refers to the scenario data, stored in the scenario DB 374, of the scenario in which the user is participating. As described above, each scenario includes one or more events, and the scenario DB 374 stores, as scenario data, data associating each event's occurrence trigger (condition), its content, and the action that clears it. Specific examples of scenario data are shown in Table 2 above. The scenario management unit 37 refers to event data such as that shown in Table 2 and searches for an event whose trigger (occurrence condition) is the user's voice (that is, the utterance content).
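One way to picture the event lookup against Table 2-style scenario data is the sketch below. The `Event` fields mirror the trigger / content / clear-action association described above, but the field names, the trigger-type strings, and the case-insensitive substring match used for utterances are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    event_id: str
    trigger_type: str    # e.g. "utterance", "place", "nearby_character", "behavior"
    trigger_value: str   # e.g. a keyword, a place name, a behavior label
    content: str         # what the agent announces when the event fires
    clear_action: str    # designated action that clears the event, e.g. "jump"
    cleared: bool = False


def find_event(events: List[Event], trigger_type: str, observed: str) -> Optional[Event]:
    """Return the first uncleared event whose trigger matches the observation.

    For utterances, a case-insensitive substring match stands in for whatever
    matching the real system performs (an assumption).
    """
    for ev in events:
        if ev.cleared or ev.trigger_type != trigger_type:
            continue
        if trigger_type == "utterance":
            if ev.trigger_value.lower() in observed.lower():
                return ev
        elif ev.trigger_value == observed:
            return ev
    return None  # no match: send nothing to the client, or optionally a hint
```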

次いで、シナリオ実行部３７２は、検索したイベントの情報をシナリオデータから抽出し（ステップＳ４９９）、対応する指定のアクション（イベントクリアのための指定のアクション）に関する情報をクライアント端末１へ送信する（ステップＳ５０２）。なお、対応するイベントが検索できなかった場合、シナリオ実行部３７２は特にクライアント端末１への情報送信は行わないようにしてもよいし、イベント発生のためのヒント（トリガの示唆）を出すようにしてもよい。また、クライアント端末１による音データの収音およびエージェントサーバ２への送信は、シナリオ開催期間中に定期的に行われ得る。 Next, the scenario execution unit 372 extracts the information on the found event from the scenario data (step S499) and transmits information on the corresponding designated action (the action designated for clearing the event) to the client terminal 1 (step S502). If no corresponding event is found, the scenario execution unit 372 may send nothing to the client terminal 1, or may instead give a hint for triggering an event (a suggestion of the trigger). The client terminal 1 may collect sound data and transmit it to the agent server 2 periodically throughout the scenario holding period.

次に、クライアント端末１は、イベントクリアのための指定のアクションを行うよう、表示出力や音声出力等によりユーザに指示する（ステップＳ５０５）。ここで、図３６を参照してイベント発生時における表示画面の具体例について説明する。 Next, the client terminal 1 instructs the user, by display output, voice output, or the like, to perform the action designated for clearing the event (step S505). A specific example of the display screen when an event occurs will now be described with reference to FIG. 36.

図３６は、本実施形態によるイベント発生時における表示画面例を示す図である。図３６左に示すように、例えばユーザのある発話音声W₆が上述した音声変換機能によりエージェントキャラクターの音声W₇に変換されると共に、当該発話音声W₆が特定の発話であって対応するイベントが検索された場合、当該イベントをクリアするためのアクションが指示される。例えば図３６右に示すように、「上にジャンプするんだ！今すぐ！」といったエージェントの発話音声W₈がイヤホン等から出力されたり、「ジャンプだ！」といったテキストとエージェントの画像を含む画面１２４がクライアント端末１の表示部に表示されたりする。これにより、ユーザは、イベントに対応する指定のアクションを実行することができる。 FIG. 36 is a diagram showing an example of the display screen when an event occurs according to the present embodiment. As shown on the left of FIG. 36, suppose a user's utterance W₆ is converted into the agent character's voice W₇ by the voice conversion function described above. If the utterance W₆ is a specific utterance for which a corresponding event is found, the user is instructed to perform the action that clears the event. For example, as shown on the right of FIG. 36, the agent's utterance W₈, "Jump up! Right now!", is output from earphones or the like, and/or a screen 124 containing text such as "Jump!" and an image of the agent is displayed on the display unit of the client terminal 1. The user can thereby perform the designated action corresponding to the event.

続いて、クライアント端末１は、ユーザの行動等を検知する各センサからの出力結果を取得し（ステップＳ５０８）、各センサの出力結果をエージェントサーバ２へ送信する（ステップＳ５１１）。ユーザの行動等を検知する各センサとは、例えば加速度センサ、ジャイロセンサ、地磁気センサ、カメラ等である。 Subsequently, the client terminal 1 acquires the output result from each sensor that detects the user's behavior or the like (step S508), and transmits the output result of each sensor to the agent server 2 (step S511). The sensors that detect the user's behavior are, for example, an acceleration sensor, a gyro sensor, a geomagnetic sensor, a camera, and the like.

次いで、エージェントサーバ２は、情報解析部３６６により、各センサからの出力結果を解析し（例えば行動認識の解析）、解析結果に基づいてシナリオ実行部３７２により指定のアクションが行われたか否かを判断する（ステップＳ５１４）。 Next, in the agent server 2, the information analysis unit 366 analyzes the output results from each sensor (for example, behavior recognition analysis), and the scenario execution unit 372 determines, based on the analysis results, whether the designated action has been performed (step S514).

次に、指定のアクションが行われたと判断された場合（ステップＳ５１４／Ｙｅｓ）、シナリオ実行部３７２は、対応するイベントがクリアされたと判断し（ステップＳ５１７）、クライアント端末１に対して、イベントをクリアした旨を送信する（ステップＳ５２０）。また、シナリオ実行部３７２は、イベントクリアの情報をシナリオＤＢ３７４に登録（更新）する。 Next, when it is determined that the designated action has been performed (step S514 / Yes), the scenario execution unit 372 determines that the corresponding event has been cleared (step S517) and notifies the client terminal 1 that the event has been cleared (step S520). The scenario execution unit 372 also registers (updates) the event-clear information in the scenario DB 374.

そして、クライアント端末１は、イベントをクリアした旨を表示出力や音声出力等によりユーザに通知する（ステップＳ５２３）。 The client terminal 1 then notifies the user, by display output, voice output, or the like, that the event has been cleared (step S523).

このように、本実施形態では、ユーザの特定の発話をトリガとして所定のイベントを発生させ、所定のアクションをユーザに行うよう促し、アクションが検知された場合に当該イベントをクリアしたとしてシナリオを進行させることができる。 In this way, in the present embodiment, a specific utterance by the user can trigger a predetermined event, the user is prompted to perform a predetermined action, and, when that action is detected, the event is treated as cleared and the scenario advances.
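The action check and event-clear bookkeeping in steps S514 to S520 can be sketched as a small state holder. It assumes that behavior recognition has already reduced the sensor output to an action label such as "jump"; the class and method names are hypothetical, and a dictionary stands in for the event-clear records in the scenario DB 374.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ScenarioProgress:
    """Hypothetical in-memory stand-in for the event-clear records in scenario DB 374."""
    designated_action: Dict[str, str]                 # event_id -> required action label
    cleared: Dict[str, bool] = field(default_factory=dict)

    def on_recognized_action(self, event_id: str, recognized: str) -> bool:
        """Steps S514-S520: compare the recognized action with the designated one."""
        if self.cleared.get(event_id):
            return True                                # already cleared earlier
        if recognized == self.designated_action[event_id]:
            self.cleared[event_id] = True              # S517: event cleared, record it
            return True                                # caller then notifies the client (S520)
        return False                                   # keep waiting for more sensor data
```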

・移動場所（ユーザの位置）をトリガとしたイベントの発生 ・Generation of an event triggered by a movement location (the user's position)
図３７は、本実施形態によるユーザの位置をトリガとしたイベントの実行処理を示すシーケンス図である。図３７に示すように、まず、クライアント端末１は、ＧＰＳ等により現在位置情報を取得し（ステップＳ５３０）、取得した現在位置情報をエージェントサーバ２へ送信する（ステップＳ５３３）。 FIG. 37 is a sequence diagram showing execution processing of an event triggered by the user's position according to the present embodiment. As shown in FIG. 37, the client terminal 1 first acquires current position information by GPS or the like (step S530) and transmits it to the agent server 2 (step S533).

次に、エージェントサーバ２のシナリオ管理部３７は、情報解析部３７３により、位置情報の解析を行い、位置情報で示される場所を特定する。例えば情報解析部３７３は、ランドマーク情報が紐付けられた地図データを参照して、ユーザが現在居る場所の名称（地名、都市名、建物名、公園名等）や種別（駅、公園、海辺、郵便局等）を取得する。場所が特定できた場合、シナリオ実行部３７２は、ユーザが参加中のシナリオから特定した場所に対応するイベントを検索する（ステップＳ５３６）。イベントの検索は、シナリオＤＢ３７４に格納されている、ユーザが参加中のシナリオのシナリオデータを参照して行う。シナリオ管理部３７は、上記表２に示したようなイベントデータを参照して、ユーザの現在居る場所（すなわち移動場所）をトリガとするイベントを検索する。 Next, in the scenario management unit 37 of the agent server 2, the information analysis unit 373 analyzes the position information and identifies the place it indicates. For example, the information analysis unit 373 refers to map data linked to landmark information and acquires the name (place name, city name, building name, park name, etc.) and type (station, park, seaside, post office, etc.) of the place where the user currently is. When the place has been identified, the scenario execution unit 372 searches the scenario in which the user is participating for an event corresponding to that place (step S536). The search refers to the scenario data, stored in the scenario DB 374, of the scenario in which the user is participating. The scenario management unit 37 refers to event data such as that shown in Table 2 above and searches for an event triggered by the user's current location (that is, the movement location).
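Identifying the user's place from map data with linked landmarks could, for example, be done by picking the nearest landmark within a radius. The sketch below assumes point landmarks, a default 200 m radius, and the haversine great-circle formula; none of these details are specified in the disclosure, and `identify_place` is a hypothetical name.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Landmark:
    name: str    # e.g. place, city, building or park name
    kind: str    # e.g. "station", "park", "seaside", "post_office"
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres, using a mean Earth radius of 6,371 km."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def identify_place(lat: float, lon: float, landmarks: List[Landmark],
                   radius_m: float = 200.0) -> Optional[Landmark]:
    """Return the closest landmark within radius_m, or None if none is close enough."""
    best, best_d = None, radius_m
    for lm in landmarks:
        d = haversine_m(lat, lon, lm.lat, lm.lon)
        if d <= best_d:
            best, best_d = lm, d
    return best
```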

次いで、シナリオ実行部３７２は、検索したイベントの情報をシナリオデータから抽出し（ステップＳ５３９）、対応する指定のアクション（イベントクリアのための指定のアクション）に関する情報をクライアント端末１へ送信する（ステップＳ５４２）。なお、クライアント端末１による現在位置情報の取得およびエージェントサーバ２への送信は、シナリオ開催期間中に定期的に行われ得る。 Next, the scenario execution unit 372 extracts the information on the found event from the scenario data (step S539) and transmits information on the corresponding designated action (the action designated for clearing the event) to the client terminal 1 (step S542). The client terminal 1 may acquire its current position information and transmit it to the agent server 2 periodically throughout the scenario holding period.

次に、クライアント端末１は、イベントクリアのための指定のアクションを行うよう、表示出力や音声出力等によりユーザに指示する（ステップＳ５４５）。 Next, the client terminal 1 instructs the user, by display output, voice output, or the like, to perform the action designated for clearing the event (step S545).

続いて、クライアント端末１は、ユーザの行動等を検知する各センサからの出力結果を取得し（ステップＳ５４８）、各センサの出力結果をエージェントサーバ２へ送信する（ステップＳ５５１）。 Subsequently, the client terminal 1 acquires the output result from each sensor that detects the user's behavior or the like (step S548), and transmits the output result of each sensor to the agent server 2 (step S551).

次いで、エージェントサーバ２は、情報解析部３６６により、各センサからの出力結果を解析し（例えば行動認識の解析）、解析結果に基づいてシナリオ実行部３７２により指定のアクションが行われたか否かを判断する（ステップＳ５５４）。 Next, in the agent server 2, the information analysis unit 366 analyzes the output results from each sensor (for example, behavior recognition analysis), and the scenario execution unit 372 determines, based on the analysis results, whether the designated action has been performed (step S554).

次に、指定のアクションが行われたと判断された場合（ステップＳ５５４／Ｙｅｓ）、シナリオ実行部３７２は、対応するイベントがクリアされたと判断し（ステップＳ５５７）、クライアント端末１に対して、イベントをクリアした旨を送信する（ステップＳ５６０）。また、シナリオ実行部３７２は、イベントクリアの情報をシナリオＤＢ３７４に登録（更新）する。 Next, when it is determined that the designated action has been performed (step S554 / Yes), the scenario execution unit 372 determines that the corresponding event has been cleared (step S557) and notifies the client terminal 1 that the event has been cleared (step S560). The scenario execution unit 372 also registers (updates) the event-clear information in the scenario DB 374.

そして、クライアント端末１は、イベントをクリアした旨を表示出力や音声出力等によりユーザに通知する（ステップＳ５６３）。 Then, the client terminal 1 notifies the user that the event has been cleared by display output, voice output, or the like (step S563).

このように、本実施形態では、ユーザの位置をトリガとして所定のイベントを発生させ、所定のアクションをユーザに行うよう促し、アクションが検知された場合に当該イベントをクリアしたとしてシナリオを進行させることができる。 In this way, in the present embodiment, the user's position can trigger a predetermined event, the user is prompted to perform a predetermined action, and, when that action is detected, the event is treated as cleared and the scenario advances.

・複数ユーザが出会うこと（複数ユーザの位置）をトリガとしたイベントの発生 ・Generation of an event triggered by an encounter between multiple users (the positions of multiple users)
図３８は、本実施形態による複数ユーザの位置をトリガとしたイベントの実行処理を示すシーケンス図である。図３８に示すように、まず、クライアント端末１は、ＧＰＳ等により現在位置情報を取得し（ステップＳ５７０）、取得した現在位置情報をエージェントサーバ２へ送信する（ステップＳ５７２）。 FIG. 38 is a sequence diagram showing execution processing of an event triggered by the positions of multiple users according to the present embodiment. As shown in FIG. 38, the client terminal 1 first acquires current position information by GPS or the like (step S570) and transmits it to the agent server 2 (step S572).

次いで、エージェントサーバ２のシナリオ管理部３７は、同じシナリオに参加している他のキャラクターをエージェントとしている他ユーザがユーザの近くにいるか否かを判断する（ステップＳ５７３）。シナリオに参加している各ユーザの位置情報は、定期的にクライアント端末１から送信され、エージェントサーバ２側で管理されている。また、シナリオ管理部３７は、ユーザが特定の場所に移動した際に近辺に居る他のキャラクターのユーザを検索するようにしてもよい。また、シナリオ管理部３７は、同じシナリオに参加している不特定の他のキャラクターのユーザを検索するようにしてもよい。 Next, the scenario management unit 37 of the agent server 2 determines whether another user, whose agent is a different character participating in the same scenario, is near the user (step S573). The position information of each user participating in the scenario is periodically transmitted from the client terminals 1 and managed on the agent server 2 side. The scenario management unit 37 may search for users of other characters in the vicinity when the user moves to a specific place, or may search for users of any unspecified other characters participating in the same scenario.
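The proximity check of step S573 could be sketched as a filter over the positions that the agent server 2 already manages. The 30 m default radius, the equirectangular distance approximation, and all names below are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    user_id: str
    scenario_id: str
    character: str   # agent character the user is playing
    lat: float
    lon: float


def _distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Equirectangular approximation; adequate for the short ranges considered here.
    r = 6_371_000.0
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return r * math.hypot(x, y)


def nearby_other_characters(me: Participant, others: List[Participant],
                            radius_m: float = 30.0) -> List[Participant]:
    """Step S573 sketch: same scenario, different character, within radius_m."""
    return [p for p in others
            if p.user_id != me.user_id
            and p.scenario_id == me.scenario_id
            and p.character != me.character
            and _distance_m(me.lat, me.lon, p.lat, p.lon) <= radius_m]
```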

次に、近くに他のキャラクターをエージェントとする他ユーザが居ると判断された場合（ステップＳ５７３／Ｙｅｓ）、シナリオ管理部３７は、対応するイベントを検索する（ステップＳ５７６）。シナリオ管理部３７は、例えば上記表２に示したようなイベントデータを参照して、「同じシナリオに参加する他のキャラクターが近くに居る」場合をトリガとするイベント（例えば、「オーバーレイ表示」）を検索する。 Next, when it is determined that another user whose agent is another character is nearby (step S573 / Yes), the scenario management unit 37 searches for a corresponding event (step S576). For example, the scenario management unit 37 refers to event data such as that shown in Table 2 above and searches for an event (for example, "overlay display") whose trigger is "another character participating in the same scenario is nearby".

次いで、シナリオ実行部３７２は、検索したイベントの情報をシナリオデータから抽出し（ステップＳ５７９）、イベントの実行処理を行う。ここでは、例えば「オーバーレイ表示」というイベントである場合、シナリオ実行部３７２は、近くに居る人の顔画像の取得要求をクライアント端末１に対して行う（ステップＳ５８２）。 Next, the scenario execution unit 372 extracts the information of the searched event from the scenario data (step S579), and executes the event execution process. Here, for example, in the case of an event called "overlay display", the scenario execution unit 372 requests the client terminal 1 to acquire a face image of a nearby person (step S582).

次に、クライアント端末１は、エージェントサーバ２からの要求に応じて、カメラを起動し、ユーザに対して近くの人にカメラをかざすよう指示する（ステップＳ５８５）。ここでは、カメラを起動して近くの人にかざす行動が、イベントクリアのための指定のアクションとなる。 Next, the client terminal 1 activates the camera in response to the request from the agent server 2 and instructs the user to hold the camera over a nearby person (step S585). Here, the action of activating the camera and holding it over a nearby person is the designated action for clearing the event.

続いて、クライアント端末１は、近くの人の顔をカメラにより撮像して撮像画像を取得し（ステップＳ５８８）、撮像画像をエージェントサーバ２に送信する（ステップＳ５９１）。 Subsequently, the client terminal 1 captures the face of a nearby person with a camera to acquire a captured image (step S588), and transmits the captured image to the agent server 2 (step S591).

次いで、エージェントサーバ２のシナリオ管理部３７は、情報解析部３６６により、撮像画像を解析し、ユーザの近辺に居る人物の顔認識を行う（ステップＳ５９４）。さらに、シナリオ実行部３７２は、上記ステップＳ５７３で位置情報に基づいてユーザの近辺に居ると判断された他ユーザの顔情報と、撮像画像に基づく顔認識結果とを参照して、近辺に居る人物の顔認証を行ってもよい。 Next, in the scenario management unit 37 of the agent server 2, the information analysis unit 366 analyzes the captured image and performs face recognition on the person near the user (step S594). Further, the scenario execution unit 372 may authenticate the face of the nearby person by referring to both the face information of the other user who was determined in step S573, from the position information, to be near the user, and the face recognition result obtained from the captured image.
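The optional cross-check of face recognition results against the users found nearby in step S573 can be pictured as a filtered lookup. The face recognition step itself is out of scope here; the face-ID table and the function name are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def authenticate_nearby(recognized_face_ids: Iterable[str],
                        nearby_user_ids: Set[str],
                        face_id_to_user: Dict[str, str]) -> List[str]:
    """Keep only recognized faces that belong to users found nearby in step S573.

    recognized_face_ids: IDs returned by some face recognition step (hypothetical).
    face_id_to_user:     registered face ID -> user ID (hypothetical table).
    """
    confirmed: List[str] = []
    for fid in recognized_face_ids:
        uid = face_id_to_user.get(fid)
        # A face is accepted only if it maps to a known user who is also nearby.
        if uid is not None and uid in nearby_user_ids and uid not in confirmed:
            confirmed.append(uid)
    return confirmed
```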

次に、近辺に居る人物の顔認識ができた場合（ステップＳ５９４／Ｙｅｓ）、シナリオ実行部３７２は、上記ステップＳ５７３で判断した近辺に居る他のキャラクターの情報をシナリオＤＢ３７４から取得し（ステップＳ５９７）、クライアント端末１へ送信する（ステップＳ６００）。キャラクター情報には、キャラクターの画像が含まれる。 Next, when the face of the nearby person has been recognized (step S594 / Yes), the scenario execution unit 372 acquires, from the scenario DB 374, information on the other nearby character determined in step S573 (step S597) and transmits it to the client terminal 1 (step S600). The character information includes an image of the character.

続いて、クライアント端末１は、ユーザが近くの人物（相手ユーザ）にクライアント端末１のカメラをかざしてスルー画像が表示部に表示されている際に、エージェントサーバ２から送信されたキャラクター情報に基づいて、相手が成りきっているエージェントキャラクターの画像をスルー画像上で相手にオーバーレイ表示する（ステップＳ６０３）。これにより、ユーザは、現実空間で同シナリオに登場する他のキャラクターと出会うことができる。なお、エージェントサーバ２は、相手のキャラクター画像を相手のスルー画像に重畳表示するのみならず、例えば相手の発話音声を相手のキャラクターの音声に変換してユーザのイヤホン等から再生するようにしてもよい。また、相手ユーザのクライアント端末１においても同様にユーザのスルー画像にユーザのキャラクターを重畳表示させたり、ユーザの音声をユーザのキャラクターの音声に変換して再生したりするようにしてもよい。これにより、両ユーザは、同シナリオに登場するキャラクター同士として出会い、会話することができる。 Subsequently, while the user holds the camera of the client terminal 1 up to a nearby person (the other user) and the through image is shown on the display unit, the client terminal 1 overlays, on the other user in the through image, an image of the agent character that the other user is playing, based on the character information transmitted from the agent server 2 (step S603). The user can thus meet other characters appearing in the same scenario in real space. The agent server 2 may not only superimpose the other user's character image on the through image of that user but may also, for example, convert the other user's speech into that character's voice and play it through the user's earphones or the like. Similarly, the other user's client terminal 1 may superimpose the user's character on the through image of the user, or convert the user's voice into the voice of the user's character and play it back. Both users can thereby meet and talk with each other as characters appearing in the same scenario.

ここで、図３９Ａおよび図３９Ｂを参照して本実施形態による他のキャラクターのオーバーレイ表示の具体例について説明する。図３９Ａは、本実施形態によるカメラをかざす行動をユーザに促す表示画面例を示す図である。図示された画面１２５は、上記ステップＳ５８５でクライアント端末１の表示部に表示される誘導画面であって、エージェントキャラクターの画像およびカメラ起動ボタン１２５ａが含まれる。また、エージェントキャラクターの声色で、「カメラを起動して近くの人にかざしてみるんだ！」といった発話音声W₉が再生されてもよい。これによりユーザは、エージェントキャラクターの誘導に従ってカメラ起動ボタン１２５ａをタップしてカメラを起動し、近くの人物にかざすといったイベントクリアのための指定のアクションを取ることができる。 A specific example of the overlay display of another character according to the present embodiment will now be described with reference to FIGS. 39A and 39B. FIG. 39A is a diagram showing an example of a display screen prompting the user to hold up the camera according to the present embodiment. The illustrated screen 125 is a guidance screen displayed on the display unit of the client terminal 1 in step S585 and includes an image of the agent character and a camera activation button 125a. An utterance W₉ such as "Start the camera and try holding it up to someone nearby!" may also be played in the agent character's voice. Following the agent character's guidance, the user can then take the action designated for clearing the event: tapping the camera activation button 125a to start the camera and holding it up to a nearby person.

図３９Ｂは、本実施形態による他のキャラクターのオーバーレイ表示について説明する図である。図３９Ｂに示すように、ユーザがクライアント端末１を近くにいる人物にかざすと、クライアント端末１の表示部に、クライアント端末１のカメラで撮像したスルー画像が表示され、さらにスルー画像に写る相手ユーザに相手のキャラクター画像がリアルタイムで重畳された画像１２６が表示される。この際、クライアント端末１は、エージェントサーバ２により相手ユーザの発話音声が相手ユーザのキャラクター音声に変換された音声や、状況に応じて自動発話される相手ユーザのキャラクターの所定フレーズ音声W₁₀をイヤホン等から再生してもよい。 FIG. 39B is a diagram illustrating the overlay display of another character according to the present embodiment. As shown in FIG. 39B, when the user holds the client terminal 1 up to a nearby person, the through image captured by the camera of the client terminal 1 is shown on its display unit as an image 126 in which the other user's character image is superimposed in real time on the other user appearing in the through image. At this time, the client terminal 1 may play, through earphones or the like, the other user's speech converted by the agent server 2 into the voice of the other user's character, or a predetermined phrase W₁₀ spoken automatically by the other user's character according to the situation.

次いで、シナリオ実行部３７２は、対応するイベントがクリアされたと判断し（ステップＳ６０６）、クライアント端末１に対して、イベントをクリアした旨を送信する（ステップＳ６０９）。また、シナリオ実行部３７２は、イベントクリアの情報をシナリオＤＢ３７４に登録（更新）する。 Next, the scenario execution unit 372 determines that the corresponding event has been cleared (step S606), and transmits to the client terminal 1 that the event has been cleared (step S609). Further, the scenario execution unit 372 registers (updates) the event clear information in the scenario DB 374.

そして、クライアント端末１は、イベントをクリアした旨を表示出力や音声出力等によりユーザに通知する（ステップＳ６１２）。 Then, the client terminal 1 notifies the user that the event has been cleared by display output, voice output, or the like (step S612).

このように、本実施形態では、複数ユーザの位置に基づいて、同じシナリオに参加するキャラクター同士が現実空間で出会うことをトリガとして所定のイベントを発生させることができる。 As described above, in the present embodiment, a predetermined event can be generated based on the positions of a plurality of users, triggered by the encounter of characters participating in the same scenario in the real space.

・各センサからの出力結果をトリガとしたイベントの発生 ・Generation of an event triggered by sensor output results
図４０は、本実施形態による各センサからの出力結果をトリガとしたイベントの実行処理を示すシーケンス図である。図４０に示すように、まず、クライアント端末１は、各センサからの出力結果を取得し（ステップＳ６２０）、エージェントサーバ２へ送信する（ステップＳ６２３）。各センサとは、例えば加速度センサ、ジャイロセンサ、地磁気センサ、カメラ等であってクライアント端末１や、クライアント端末１と通信接続するウェアラブル端末（例えばスマートバンド、スマートウォッチ、スマートアイグラス）等に設けられ、ユーザの行動を認識する。 FIG. 40 is a sequence diagram showing execution processing of an event triggered by the output results of each sensor according to the present embodiment. As shown in FIG. 40, the client terminal 1 first acquires the output results from each sensor (step S620) and transmits them to the agent server 2 (step S623). The sensors are, for example, an acceleration sensor, a gyro sensor, a geomagnetic sensor, a camera, and the like, provided in the client terminal 1 or in a wearable terminal communicatively connected to it (for example, a smart band, a smartwatch, or smart eyeglasses), and are used to recognize the user's behavior.

次に、エージェントサーバ２のシナリオ管理部３７は、情報解析部３７３により、各センサの出力結果の解析を行い、ユーザの行動を特定する。ユーザの行動（寝ている、起きた、走った、歩いた、電車／自転車／自動車に乗った等）が特定できた場合、シナリオ実行部３７２は、ユーザが参加中のシナリオからユーザの行動に対応するイベントを検索する（ステップＳ６２６）。イベントの検索は、シナリオＤＢ３７４に格納されている、ユーザが参加中のシナリオのシナリオデータを参照して行う。シナリオ管理部３７は、上記表２に示したようなイベントデータを参照して、ユーザの行動をトリガとするイベントを検索する。 Next, in the scenario management unit 37 of the agent server 2, the information analysis unit 373 analyzes the output results of each sensor and identifies the user's behavior. When the user's behavior (sleeping, waking up, running, walking, riding a train/bicycle/car, etc.) has been identified, the scenario execution unit 372 searches the scenario in which the user is participating for an event corresponding to that behavior (step S626). The search refers to the scenario data, stored in the scenario DB 374, of the scenario in which the user is participating. The scenario management unit 37 refers to event data such as that shown in Table 2 above and searches for an event triggered by the user's behavior.
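Behavior recognition is not specified in detail here, so the following is only a toy heuristic: it labels a window of accelerometer samples as still, walking, or running from the spread of acceleration magnitudes. The thresholds are illustrative assumptions, and the sketch does not cover states such as riding a train or car, which would need richer models and additional sensors.

```python
import math
import statistics
from typing import Sequence, Tuple


def classify_behavior(samples: Sequence[Tuple[float, float, float]]) -> str:
    """Very rough activity label from accelerometer samples (x, y, z in m/s^2).

    Requires at least one sample. Thresholds are illustrative assumptions,
    not values from the disclosure.
    """
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    sd = statistics.pstdev(mags)   # variation of the acceleration magnitude
    if sd < 0.3:
        return "still"             # e.g. sleeping or sitting
    if sd < 3.0:
        return "walking"
    return "running"
```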

次いで、シナリオ実行部３７２は、検索したイベントの情報をシナリオデータから抽出し（ステップＳ６２９）、対応する指定のアクション（イベントクリアのための指定のアクション）に関する情報をクライアント端末１へ送信する（ステップＳ６３２）。なお、クライアント端末１による各センサからの出力結果の取得およびエージェントサーバ２への送信は、シナリオ開催期間中に定期的に行われ得る。 Next, the scenario execution unit 372 extracts the information on the found event from the scenario data (step S629) and transmits information on the corresponding designated action (the action designated for clearing the event) to the client terminal 1 (step S632). The client terminal 1 may acquire the output results from each sensor and transmit them to the agent server 2 periodically throughout the scenario holding period.

次に、クライアント端末１は、イベントクリアのための指定のアクションを行うよう、表示出力や音声出力等によりユーザに指示する（ステップＳ６３５）。 Next, the client terminal 1 instructs the user, by display output, voice output, or the like, to perform the action designated for clearing the event (step S635).

続いて、クライアント端末１は、ユーザの行動等を検知する各センサからの出力結果を取得し（ステップＳ６３８）、各センサの出力結果をエージェントサーバ２へ送信する（ステップＳ６４１）。 Subsequently, the client terminal 1 acquires the output result from each sensor that detects the user's behavior or the like (step S638), and transmits the output result of each sensor to the agent server 2 (step S641).

次いで、エージェントサーバ２は、情報解析部３６６により、各センサからの出力結果を解析し（例えば行動認識の解析）、解析結果に基づいてシナリオ実行部３７２により指定のアクションが行われたか否かを判断する（ステップＳ６４４）。 Next, in the agent server 2, the information analysis unit 366 analyzes the output results from each sensor (for example, behavior recognition analysis), and the scenario execution unit 372 determines, based on the analysis results, whether the designated action has been performed (step S644).

次に、指定のアクションが行われたと判断された場合（ステップＳ６４４／Ｙｅｓ）、シナリオ実行部３７２は、対応するイベントがクリアされたと判断し（ステップＳ６４７）、クライアント端末１に対して、イベントをクリアした旨を送信する（ステップＳ６５０）。また、シナリオ実行部３７２は、イベントクリアの情報をシナリオＤＢ３７４に登録（更新）する。 Next, when it is determined that the designated action has been performed (step S644 / Yes), the scenario execution unit 372 determines that the corresponding event has been cleared (step S647) and notifies the client terminal 1 that the event has been cleared (step S650). The scenario execution unit 372 also registers (updates) the event-clear information in the scenario DB 374.

そして、クライアント端末１は、イベントをクリアした旨を表示出力や音声出力等によりユーザに通知する（ステップＳ６５３）。 The client terminal 1 then notifies the user, by display output, voice output, or the like, that the event has been cleared (step S653).

このように、本実施形態では、ユーザの行動をトリガとして所定のイベントを発生させ、所定のアクションをユーザに行うよう促し、アクションが検知された場合に当該イベントをクリアしたとしてシナリオを進行させることができる。 In this way, in the present embodiment, the user's behavior can trigger a predetermined event, the user is prompted to perform a predetermined action, and, when that action is detected, the event is treated as cleared and the scenario advances.

以上、本実施形態によるシナリオイベントの実行処理について具体的に説明した。なお、本実施形態によるシナリオイベントの発生トリガは、上述した発話（ユーザ音声）、移動場所（位置情報）、複数ユーザが出会うこと（複数ユーザの位置情報）、各センサの出力結果（ユーザ行動）、若しくはユーザの表情（撮像画像）、日時等のうち、少なくともいずれか１以上を含む条件としてもよい。例えば、ある特定の場所で、ある発話を行うことを条件としたり、ある特定の時刻にある場所に移動することを条件としてもよい。また、上述したトリガのうち、所定の順（予め設定された順序、優先度の高い順序等）にイベント発生有無を判断してもよい。 The execution processing of scenario events according to the present embodiment has been described specifically above. The trigger for a scenario event according to the present embodiment may be a condition that includes at least one of the above-described utterance (user voice), movement location (position information), encounter between multiple users (position information of multiple users), sensor output results (user behavior), the user's facial expression (captured image), date and time, and the like. For example, the condition may be making a certain utterance at a specific place, or moving to a certain place at a specific time. Whether an event occurs may also be determined by evaluating the above triggers in a predetermined order (a preset order, descending order of priority, etc.).
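Compound trigger conditions evaluated in a predetermined priority order might be modeled as follows. Each event holds a list of predicates that must all hold, and events with a smaller priority value are checked first. The names, the context keys, and the AND-only composition are assumptions for illustration; the disclosure does not prescribe a data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional

Context = Dict[str, object]   # e.g. {"utterance": ..., "place": ..., "now": datetime}


@dataclass
class TriggeredEvent:
    event_id: str
    priority: int                                 # smaller value = checked first
    conditions: List[Callable[[Context], bool]]   # all must hold (AND)


def select_event(events: List[TriggeredEvent], ctx: Context) -> Optional[str]:
    """Evaluate events in priority order; return the first whose conditions all hold."""
    for ev in sorted(events, key=lambda e: e.priority):
        if all(cond(ctx) for cond in ev.conditions):
            return ev.event_id
    return None


# Example condition builders: "say X", "be at place P", "at or after time T".
def said(word: str) -> Callable[[Context], bool]:
    return lambda ctx: word in str(ctx.get("utterance", ""))


def at(place: str) -> Callable[[Context], bool]:
    return lambda ctx: ctx.get("place") == place


def after(t: datetime) -> Callable[[Context], bool]:
    return lambda ctx: isinstance(ctx.get("now"), datetime) and ctx["now"] >= t
```

For instance, "say 'attack' in XX City" becomes `[said("attack"), at("XX City")]`, and a higher-priority event can pre-empt it once its own conditions are met.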

また、上述したイベントは、エージェントAppが非起動時（バックグラウンドで実行中）にも発生し得る。イベント発生時は、例えばプッシュ通知でその旨が知らされ（「エージェントApp通知イベントが発生！」等）、エージェントAppを起動することでその内容を確認することができる。 The events described above can also occur while the agent App is not active (running in the background). When an event occurs, the user is informed, for example, by a push notification ("Agent App notification: an event has occurred!" or the like) and can check its details by opening the agent App.

・シナリオクリア ・Scenario clear
１つのシナリオには例えば複数のイベントが含まれ、シナリオ開催期間中に全てのイベントをクリアすることが求められる。以下、図４１〜図４２を参照して本実施形態によるシナリオクリアの一例について説明する。 A scenario includes, for example, multiple events, and the user is required to clear all of them within the scenario holding period. An example of scenario clearing according to the present embodiment is described below with reference to FIGS. 41 and 42.

図４１は、本実施形態によるシナリオクリアの判断処理を示すシーケンス図である。図４１に示すように、まず、エージェントサーバ２のシナリオ管理部３７は、ユーザが参加中のシナリオにおける全てのイベントがクリアされたか否かを判断する（ステップＳ６６０）。 FIG. 41 is a sequence diagram showing a scenario clear determination process according to the present embodiment. As shown in FIG. 41, first, the scenario management unit 37 of the agent server 2 determines whether or not all the events in the scenario in which the user is participating have been cleared (step S660).

次いで、全てのイベントがクリアされたと判断した場合（ステップＳ６６０／Ｙｅｓ）、シナリオ実行部３７２は、当該シナリオがクリアされたと判断し（ステップＳ６６３）、クライアント端末１に対して、シナリオをクリアした旨を送信する（ステップＳ６６６）。また、シナリオ実行部３７２は、シナリオクリアの情報をシナリオＤＢ３７４に登録（更新）する。 Next, when it is determined that all the events have been cleared (step S660 / Yes), the scenario execution unit 372 determines that the scenario has been cleared (step S663) and notifies the client terminal 1 that the scenario has been cleared (step S666). The scenario execution unit 372 also registers (updates) the scenario-clear information in the scenario DB 374.
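The determination in step S660 reduces to checking that every event in the scenario is marked as cleared. A minimal sketch, assuming the per-event clear flags are available as a dictionary (a hypothetical stand-in for the records in the scenario DB 374):

```python
from typing import Dict


def is_scenario_cleared(event_cleared: Dict[str, bool]) -> bool:
    """Step S660: a scenario is cleared only when every one of its events is cleared.

    An empty mapping is treated as not cleared (an assumption; the text says a
    scenario contains one or more events).
    """
    return bool(event_cleared) and all(event_cleared.values())
```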

そして、クライアント端末１は、シナリオをクリアした旨を表示出力や音声出力等によりユーザに通知する（ステップＳ６６９）。ここで、図４２に、本実施形態によるシナリオクリア時の通知画面例を示す。 Then, the client terminal 1 notifies the user that the scenario has been cleared by display output, voice output, or the like (step S669). Here, FIG. 42 shows an example of a notification screen when the scenario is cleared according to the present embodiment.

In the illustrated example, screen 128 displays a notification such as "Scenario #1 'Battle in XX City' has been cleared!!" together with an OK button. This tells the user that they have cleared all the events of scenario #1, in which they are participating. When the user taps the OK button, the notification screen closes and the display returns to, for example, the main screen of the agent app.
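Steps S660 to S669 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: a plain dictionary stands in for the scenario DB 374, a list stands in for transmission to the client terminal 1, and the `Event` and `Scenario` classes, their `start`/`end` holding-period fields and the `check_scenario_clear` function are hypothetical. Checking the holding period at the moment of judgment is likewise an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Event:
    event_id: str
    cleared: bool = False


@dataclass
class Scenario:
    scenario_id: str
    title: str
    start: datetime  # start of the scenario holding period (assumed field)
    end: datetime    # end of the scenario holding period (assumed field)
    events: List[Event] = field(default_factory=list)


def check_scenario_clear(scenario: Scenario, now: datetime,
                         scenario_db: Dict[str, dict], outbox: List[str]) -> bool:
    """Sketch of steps S660 to S666 for one participating user."""
    # S660: are all events cleared? (an empty scenario is never treated as cleared)
    all_cleared = bool(scenario.events) and all(e.cleared for e in scenario.events)
    # Assumption: a clear only counts inside the holding period.
    in_period = scenario.start <= now <= scenario.end
    if not (all_cleared and in_period):
        return False
    # S663: judge the scenario as cleared and register/update it in the scenario DB.
    scenario_db[scenario.scenario_id] = {"cleared": True, "cleared_at": now}
    # S666: message sent to the client terminal, which shows it to the user (S669).
    outbox.append(f'Scenario #{scenario.scenario_id} "{scenario.title}" has been cleared!!')
    return True
```

The function returns early while any event is still open, so it can simply be re-run each time an event is cleared.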

<< 5. Summary >>
As described above, the communication control system according to the embodiment of the present disclosure lets users experience the agent's character for themselves through the agent, which can further enhance the entertainment value of the agent system.

Although the preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, the present technology is not limited to these examples. A person having ordinary knowledge in the technical field of the present disclosure could clearly conceive of various changes or modifications within the scope of the technical ideas described in the claims, and these are naturally understood to belong to the technical scope of the present disclosure as well.

For example, a computer program can be created that causes hardware such as the CPU, ROM, and RAM built into the client terminal 1 or the agent server 2 described above to exhibit the functions of the client terminal 1 or the agent server 2. A computer-readable storage medium storing the computer program is also provided.

In the embodiment described above, the various functions are realized by the agent server 2, which is connected to the client terminal 1 via the Internet, but the present embodiment is not limited to this configuration. For example, at least some of the components of the agent server 2 shown in FIGS. 3 and 15 to 18 may be provided in the client terminal 1 (a smartphone, a wearable terminal, or the like). Alternatively, all the components of the agent server 2 shown in FIGS. 3 and 15 to 18 may be provided in the client terminal 1 so that the client terminal 1 performs all the processing.

The effects described in this specification are merely illustrative or exemplary and are not limiting. That is, the technology according to the present disclosure may exhibit, in addition to or instead of the above effects, other effects that are apparent to those skilled in the art from the description in this specification.

The present technology can also have the following configurations.
(1)
An information processing system comprising:
an agent storage unit that stores phoneme databases and utterance phrase databases corresponding to multiple types of characters;
a communication unit that receives a selection signal for selecting a specific character via a client terminal of a user, and transmits an utterance phrase according to the utterance phrase database of the specific character; and
a control unit that generates, based on a message of the user received via the communication unit, a converted message converted into the voice of the specific character by using the phoneme database corresponding to the specific character; further generates an utterance phrase of the specific character corresponding to the message of the user by using the utterance phrase database; and performs control to return the generated converted message and utterance phrase to the client terminal.
(2)
The information processing system according to (1), wherein the control unit generates the utterance phrase corresponding to the user's message based on the user's context received via the communication unit and on the utterance phrase database.
(3)
The information processing system according to (2) above, wherein the user's context is at least one of the user's position, face recognition, acceleration information, or biosensor information.
(4)
The information processing system according to (2) or (3), wherein the user's message is the user's uttered voice or a text transcription of the uttered voice.
(5)
The information processing system according to any one of (1) to (4), further comprising a scenario storage unit that stores a plurality of scenarios in which the user can participate as a character,
wherein the control unit:
selects a scenario stored in the scenario storage unit according to a scenario selection signal from the user received via the communication unit; and
when the user's context received via the communication unit matches an event occurrence condition described in the selected scenario, performs control to notify the user of the occurrence of a predetermined event.
(6)
The information processing system according to (5), wherein the content notified to the user when the predetermined event occurs includes information indicating a specific action, and
the control unit determines, based on the user's context newly received via the communication unit, whether the specific action has been executed and, upon determining that it has been executed, performs control to transmit a notification indicating that the event has been cleared to the user.
(7)
The information processing system according to (5) or (6), wherein the user's context used for determining the event occurrence condition is at least one of the user's position, utterance, acceleration information, or biosensor information.
(8)
The information processing system according to any one of (5) to (7), wherein the control unit receives, via the communication unit, position information indicating the position of the client terminal of each user participating in the selected scenario and, upon determining that another user participating in the same scenario is present near the user, performs control to notify the user of the occurrence of an event via the communication unit.
(9)
The information processing system according to (8), wherein, upon recognizing a person's face in an image captured by an imaging unit of the client terminal and received via the communication unit, the control unit performs control to transmit, to the client terminal, an image of the other user's character to be superimposed on the face.
(10)
The information processing system according to (8) or (9), wherein, upon recognizing the uttered voice of another user in sound collected by a sound collection unit of the client terminal and received via the communication unit, the control unit performs control to generate a converted message in which the uttered voice is converted into the voice of the other user's character, together with a corresponding phrase based on the utterance phrase database, and to transmit them to the client terminal.
(11)
An information processing method including, by a processor:
storing, in an agent storage unit, phoneme databases and utterance phrase databases corresponding to multiple types of characters;
receiving a selection signal for selecting a specific character via a client terminal of a user, and transmitting, by a communication unit, an utterance phrase according to the utterance phrase database of the specific character; and
performing control, by a control unit, so as to generate, based on the user's message received via the communication unit, a converted message converted into the voice of the specific character by using the phoneme database corresponding to the specific character; further generate an utterance phrase of the specific character corresponding to the user's message by using the utterance phrase database; and return the generated converted message and utterance phrase to the client terminal.
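Configurations (5), (7) and (8) leave open what an event occurrence condition looks like and what "near the user" means. Assuming, purely for illustration, that a condition is a circular geofence with an optional spoken keyword, and that "near" means within a fixed great-circle distance, the matching could be sketched in Python as follows. The `Context` and `EventCondition` classes, the `matches` and `nearby_participants` functions and the radius values are hypothetical; the disclosure does not define them.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Context:
    """Hypothetical subset of the user's context: position plus last utterance."""
    lat: float
    lon: float
    utterance: Optional[str] = None


@dataclass
class EventCondition:
    """Hypothetical condition format: a geofence plus an optional keyword."""
    lat: float
    lon: float
    radius_m: float
    keyword: Optional[str] = None


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def matches(cond: EventCondition, ctx: Context) -> bool:
    """Configurations (5)/(7): does the user's context satisfy the condition?"""
    if haversine_m(cond.lat, cond.lon, ctx.lat, ctx.lon) > cond.radius_m:
        return False
    if cond.keyword is not None:
        return ctx.utterance is not None and cond.keyword in ctx.utterance
    return True


def nearby_participants(user_id: str, positions: Dict[str, Tuple[float, float]],
                        radius_m: float) -> List[str]:
    """Configuration (8): other participants of the same scenario near the user."""
    lat, lon = positions[user_id]
    return [uid for uid, (la, lo) in positions.items()
            if uid != user_id and haversine_m(lat, lon, la, lo) <= radius_m]
```

The haversine formula is used here only as a simple, well-known way to compare GPS positions; the disclosure does not say how proximity is measured, and other context types such as acceleration or biosensor information would need their own checks.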

1 Client terminal
2 Agent server
30 Dialogue processing unit
300 Dialogue processing unit
310 Question text search unit
320 Answer text generation unit
330 Conversation DB
340 Phoneme data acquisition unit
30a Dialogue processing unit
31 Basic dialogue processing unit
32 Character A dialogue processing unit
33 Person B dialogue processing unit
34 Person C dialogue processing unit
35 User management unit
351 Login management unit
352 User information DB
353 Face information registration unit
354 User location information registration unit
36 Automatic utterance control unit
361 User voice extraction unit
362 Phoneme data acquisition unit
363 Location information acquisition unit
364 Phrase search unit
365 Phrase DB
366 Information analysis unit
37 Scenario management unit
371 Data management unit
372 Scenario execution unit
373 Information analysis unit
374 Scenario DB
40 Phoneme storage unit
41 Basic phoneme DB
42 Character A phoneme DB
43 Person B phoneme DB
44 Person C phoneme DB
50 Conversation DB generation unit
60 Phoneme DB generation unit
70 Advertisement insertion processing unit
72 Advertisement DB
80 Feedback acquisition processing unit
3 Network
10 Agent

Claims

1. An information processing system comprising:
an agent storage unit that stores phoneme databases and utterance phrase databases corresponding to multiple types of characters;
a communication unit that receives a selection signal for selecting a specific character via a client terminal of a user, and transmits an utterance phrase according to the utterance phrase database of the specific character; and
a control unit that generates, based on a message of the user received via the communication unit, a converted message converted into the voice of the specific character by using the phoneme database corresponding to the specific character; further generates an utterance phrase of the specific character corresponding to the message of the user by using the utterance phrase database; and performs control to return the generated converted message and utterance phrase to the client terminal.

2. The information processing system according to claim 1, wherein the control unit generates the utterance phrase corresponding to the user's message based on the user's context received via the communication unit and on the utterance phrase database.

3. The information processing system according to claim 2, wherein the user's context is at least one of the user's position, face recognition, acceleration information, or biosensor information.

4. The information processing system according to claim 2 or 3, wherein the user's message is the user's uttered voice or a text transcription of the uttered voice.

5. The information processing system according to any one of claims 1 to 4, further comprising a scenario storage unit that stores a plurality of scenarios in which the user can participate as a character,
wherein the control unit:
selects a scenario stored in the scenario storage unit according to a scenario selection signal from the user received via the communication unit; and
when the user's context received via the communication unit matches an event occurrence condition described in the selected scenario, performs control to notify the user of the occurrence of a predetermined event.

6. The information processing system according to claim 5, wherein the content notified to the user when the predetermined event occurs includes information indicating a specific action, and
the control unit determines, based on the user's context newly received via the communication unit, whether the specific action has been executed and, upon determining that it has been executed, performs control to transmit a notification indicating that the event has been cleared to the user.

7. The information processing system according to claim 5 or 6, wherein the user's context used for determining the event occurrence condition is at least one of the user's position, utterance, acceleration information, or biosensor information.

8. The information processing system according to any one of claims 5 to 7, wherein the control unit receives, via the communication unit, position information indicating the position of the client terminal of each user participating in the selected scenario and, upon determining that another user participating in the same scenario is present near the user, performs control to notify the user of the occurrence of an event via the communication unit.

9. The information processing system according to claim 8, wherein, upon recognizing a person's face in an image captured by an imaging unit of the client terminal and received via the communication unit, the control unit performs control to transmit, to the client terminal, an image of the other user's character to be superimposed on the face.

10. The information processing system according to claim 8 or 9, wherein, upon recognizing the uttered voice of another user in sound collected by a sound collection unit of the client terminal and received via the communication unit, the control unit performs control to generate a converted message in which the uttered voice is converted into the voice of the other user's character, together with a corresponding phrase based on the utterance phrase database, and to transmit them to the client terminal.

11. An information processing method including, by a processor:
storing, in an agent storage unit, phoneme databases and utterance phrase databases corresponding to multiple types of characters;
receiving a selection signal for selecting a specific character via a client terminal of a user, and transmitting, by a communication unit, an utterance phrase according to the utterance phrase database of the specific character; and
performing control, by a control unit, so as to generate, based on the user's message received via the communication unit, a converted message converted into the voice of the specific character by using the phoneme database corresponding to the specific character; further generate an utterance phrase of the specific character corresponding to the user's message by using the utterance phrase database; and return the generated converted message and utterance phrase to the client terminal.