JP6788620B2

JP6788620B2 - Information processing systems, information processing methods, and programs

Info

Publication number: JP6788620B2
Application number: JP2018008210A
Authority: JP
Inventors: 辰顕鈴木; 北岸　郁雄; 郁雄北岸; 健介 ▲高▼田; 宏幸穴井
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2018-01-22
Filing date: 2018-01-22
Publication date: 2020-11-25
Anticipated expiration: 2038-01-22
Also published as: JP2019128384A

Description

The present invention relates to an information processing system, an information processing method, and a program.

Conventionally, an information processing device has been disclosed that includes output control means which, during navigation processing that searches for a route to a destination and provides guidance along a route according to the search result, causes voice output means to output a voice advertisement, or a questionnaire relating to the voice advertisement, based on a dialogue with the user (see, for example, Patent Document 1).

JP-A-2017-58315 (Japanese Unexamined Patent Application Publication No. 2017-58315)

However, with the conventional technique, voice output may give the user a sense of discomfort.

The present invention has been made in view of these circumstances, and one of its objects is to provide an information processing system, an information processing method, and a program that can provide information without giving the user a sense of discomfort.

One aspect of the present invention is an information processing system including: a response unit that causes an output unit to output response content to a voice uttered by a user and specific information different from the response content; and a control unit that first changes the output mode of the specific information to a first output mode, which is harder for the user to hear than a third output mode that is the output mode of the response content, and causes the output unit to output the specific information, and then, upon receiving an instruction from the user, changes the output mode of the specific information to a second output mode, which is easier for the user to hear than the first output mode, and causes the output unit to output the specific information.

According to one aspect of the present invention, information can be provided without giving the user a sense of discomfort.

A diagram showing the configuration of the information processing system 1.
A flowchart showing an example of the flow of processing executed by the information processing system 1.
A diagram showing an example of the contents of the environment pattern information 64.
A diagram showing an example of the contents of the advertisement information 92.
A diagram showing an example of the contents of the output degree information 72.
A diagram showing an example of a conversation between the user and the automatic response device 40.
A flowchart showing an example of the flow of processing executed by the terminal device 10 and the automatic response device 40.
A diagram showing an example of the contents of the usage information 74.
A diagram showing an example of the functional configuration of the automatic response device 40A included in the information processing system 1A of the second embodiment.
A flowchart showing an example of the flow of processing executed by the terminal device 10 and the automatic response device 40A of the second embodiment.
A diagram showing an example of the contents of the instruction correspondence information 76.
A diagram showing an example of a conversation between the user and the automatic response device 40 in the second embodiment.
A diagram showing the change in volume when advertisement information is output.
A diagram showing an example of the functional configuration of the information processing system 1B of the third embodiment.
A flowchart showing an example of the flow of processing executed by the automatic response device 40B.
A diagram (Part 1) showing an example of the conversation and of the image displayed on the display unit 15 in the third embodiment.
A diagram (Part 2) showing an example of the conversation and of the image displayed on the display unit 15 in the third embodiment.
A diagram (Part 3) showing an example of the conversation and of the image displayed on the display unit 15 in the third embodiment.
A diagram (Part 4) showing an example of the conversation and of the image displayed on the display unit 15 in the third embodiment.

Embodiments of the information processing system, information processing method, and program of the present invention are described below with reference to the drawings.

<Overview (common matters)>
The information processing system is realized by one or more processors. The information processing system causes an output unit to output response content to a voice uttered by a user and specific information different from the response content. The "response content" is, for example, information determined by an automatic response device that operates using AI (artificial intelligence) or a machine-learned model such as one trained by deep learning. The "specific information" is information that is not a response to a voice uttered by the user, such as an advertisement, a greeting, an utterance intended to start a conversation, or a notification (for example, a recommendation or a request to change a password).

[Overview (Part 1)]
The information processing system controls the output mode of the specific information according to the degree of use of a user device (for example, a microphone or a speaker) through which voice is input or output. The "degree of use" is, for example, a value based on the number of times or the frequency with which voice has been input to the user device, or a value based on the number of times or the frequency with which the user device has been made to output voice. For example, the higher the degree of use of the user device, the larger the amount of specific information that is output. That is, a user who routinely makes heavy use of voice input or output receives more utterances and voice advertisements from the automatic response device. In addition, the higher the degree of use of the user device, the more the output mode of the specific information is controlled so that the user can hear it easily. The "output mode" is, for example, the loudness of the sound, the pitch of the sound, or the tempo at which the information is output. Overview (Part 1) is described mainly in the first embodiment below.
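The relationship between degree of use and output mode can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the definition of the degree of use as voice inputs per day, the threshold values, and the names `OutputMode`, `usage_degree`, and `output_mode_for` do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class OutputMode:
    volume: float   # 0.0 (silent) to 1.0 (full volume); assumed scale
    max_items: int  # how many pieces of specific information to output


def usage_degree(voice_inputs: int, days: int) -> float:
    """Degree of use as voice inputs per day (one possible definition)."""
    if days <= 0:
        raise ValueError("days must be positive")
    return voice_inputs / days


def output_mode_for(degree: float) -> OutputMode:
    """Map a degree of use to an output mode.

    The thresholds are illustrative; the only property taken from the
    text is that a higher degree of use yields more, and more audible,
    specific information.
    """
    if degree >= 10.0:
        return OutputMode(volume=1.0, max_items=3)
    if degree >= 3.0:
        return OutputMode(volume=0.7, max_items=2)
    if degree > 0.0:
        return OutputMode(volume=0.4, max_items=1)
    return OutputMode(volume=0.0, max_items=0)
```

Under these assumed thresholds, a user who spoke to the device 70 times in a week would receive up to three items at full volume, while a user who never uses voice input would receive none.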

[Overview (Part 2)]
The information processing system first changes the output mode of the specific information to a first output mode, which is harder for the user to hear than the third output mode used for the response content, and causes the output unit to output the specific information. When it then receives an instruction from the user, it changes the output mode of the specific information to a second output mode and causes the output unit to output the specific information. The "second output mode" is an output mode that is easier for the user to hear than the first output mode. That is, in a dialogue with the automatic response device, the volume of only the specific information (for example, a voice advertisement) is lowered, and the volume is raised in response to a request or operation from the user. Overview (Part 2) is described mainly in the second embodiment below.
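The switch between output modes can be sketched as a small state holder in Python. This is only an illustration under stated assumptions: the volume values, the choice to represent an output mode by volume alone, and the names `Mode`, `VOLUME`, and `SpecificInfoPlayer` are hypothetical. The text fixes only the ordering (first mode harder to hear than the third, second mode easier to hear than the first).

```python
from enum import Enum


class Mode(Enum):
    FIRST = "first"    # specific information, harder to hear than THIRD
    SECOND = "second"  # specific information, easier to hear than FIRST
    THIRD = "third"    # ordinary response content


# Assumed volumes on a 0.0-1.0 scale. Setting SECOND equal to THIRD is
# an arbitrary choice for this sketch.
VOLUME = {Mode.FIRST: 0.2, Mode.SECOND: 0.8, Mode.THIRD: 0.8}


class SpecificInfoPlayer:
    """Tracks the output mode used for specific information."""

    def __init__(self) -> None:
        # Specific information starts in the hard-to-hear first mode.
        self.mode = Mode.FIRST

    def on_user_instruction(self) -> None:
        # A request or operation from the user raises audibility.
        self.mode = Mode.SECOND

    def volume(self) -> float:
        return VOLUME[self.mode]


def response_volume() -> float:
    """Volume used for ordinary response content (third mode)."""
    return VOLUME[Mode.THIRD]
```

A caller would play the advertisement at `player.volume()`, call `on_user_instruction()` when the user asks to hear more, and then replay or continue at the new volume.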

[Overview (Part 3)]
The information processing system causes the output unit to output the response content in an output mode corresponding to a first character, and causes the output unit to output the specific information in an output mode corresponding to a second character. The information processing system further causes the output unit to output a conversation between the first character and the second character. The "first character" is, for example, a character that interacts with the user in daily life and responds to the user's utterances. The "second character" is, for example, a character that differs from the first character and is associated with the specific information (for example, an advertisement). In this way, a conversation between the first character, which interacts with the user, and the second character, which corresponds to the voice advertisement, arouses the user's interest in the advertisement. Overview (Part 3) is described mainly in the third embodiment below.

<First Embodiment>
[Overall configuration]
FIG. 1 is a diagram showing the configuration of the information processing system 1. The information processing system 1 includes, for example, a terminal device 10, an automatic response device 40, and an advertisement providing device 80. These devices communicate with each other via a network NW. The network NW includes, for example, a WAN (Wide Area Network), a LAN (Local Area Network), the Internet, a dedicated line, a radio base station, and a provider. In this embodiment, the automatic response device 40 is an example of the "information processing system". The "information processing system" may also include the terminal device 10 and/or the advertisement providing device 80.

[Functional configuration of the terminal device]
The terminal device 10 is, for example, a smart speaker (AI speaker), a smartphone, a tablet terminal, or a personal computer. In the first embodiment, the terminal device 10 is described as a smart speaker.

The terminal device 10 includes, for example, a microphone 12, a speaker 14, a voice recognition unit 16, a voice generation unit 18, a terminal control unit 20, a terminal-device-side communication unit 22, and a storage unit 30. The voice recognition unit 16, the voice generation unit 18, and the terminal control unit 20 are realized, for example, by a hardware processor such as a CPU (Central Processing Unit) executing an application program (app 32) stored in the storage unit 30, such as a flash memory. The app 32 may be downloaded from a server device or the like via a network, or may be preinstalled in the terminal device 10. Instead of the application program, a browser having the same functions as those described below may be used as a UA (User Agent). Some or all of the functions of the terminal device 10 may instead be included in the automatic response device 40.

The microphone 12 acquires the voice uttered by the user or the environmental sound of the environment in which the terminal device 10 is located. The speaker 14 outputs voice according to the information generated by the voice generation unit 18.

The voice recognition unit 16 converts the voice acquired by the microphone 12 into digital data (voice data). The voice generation unit 18 generates, based on the information transmitted by the automatic response device 40, information corresponding to the voice to be output from the speaker 14.

The terminal control unit 20 transmits the digital data converted by the voice recognition unit 16 to the automatic response device 40 using the terminal-device-side communication unit 22. The terminal control unit 20 also acquires the information transmitted by the automatic response device 40 via the terminal-device-side communication unit 22.

The terminal-device-side communication unit 22 is, for example, a wireless communication interface. The terminal-device-side communication unit 22 acquires information transmitted by the automatic response device 40 and transmits the results of processing performed in the terminal device 10 to the automatic response device 40.

[Functional configuration of the automatic response device]
The automatic response device 40 includes, for example, a user identification unit 42, an environment analysis unit 43, a pattern identification unit 44, an interpretation unit 46, a response unit 48, a provision control unit 50, a learning unit 52, a response-device-side communication unit 54, a first storage unit 60, and a second storage unit 70. The user identification unit 42, the environment analysis unit 43, the pattern identification unit 44, the interpretation unit 46, the response unit 48, the provision control unit 50, and the learning unit 52 are realized, for example, by a hardware processor such as a CPU executing a program stored in a storage device (for example, the first storage unit 60). These functional units may also be realized by hardware such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by software and hardware working together. The program may be stored in the storage device in advance, or may be stored on a removable storage medium such as a DVD or CD-ROM and installed in the storage device when the storage medium is mounted in a drive device of the automatic response device 40. The first storage unit 60 and the second storage unit 70 are realized by, for example, a ROM (Read Only Memory), a flash memory, an SD card, a RAM (Random Access Memory), or a register.

The first storage unit 60 stores, for example, user identification information 62, environment identification information 63, environment pattern information 64, regular expression information 66, and scenario information 68, all described later. The second storage unit 70 stores, for example, output degree information 72 and usage information 74, both described later. The first storage unit 60 and the second storage unit 70 need not be realized by separate storage devices; they may be different storage areas of a single storage device.

The user identification unit 42 extracts, for example, the component of the voice data transmitted by the terminal device 10 that is estimated to represent a human voice (hereinafter, the utterance component). The user identification unit 42 collates the extracted utterance component with the information included in the user identification information 62 and identifies the person who uttered the voice represented by that component. The user identification information 62 is information in which the identification information of a user is associated with information indicating the characteristics of that user's voice (for example, a voiceprint pattern or a frequency pattern).

The user identification unit 42 may also refer to the user identification information 62 to identify the types of people present around the user who uttered the voice. In this case, the user identification information 62 contains, in advance, information indicating the voice characteristics of the user's family members, friends, and so on. The user identification unit 42 may also acquire, via the terminal-device-side communication unit 22, information indicating the connection state between a Wi-Fi router and a terminal device owned by a family member or the like, and determine from that information whether the owner of that terminal device is near the location where the Wi-Fi router is installed.

The environment analysis unit 43 extracts, for example, the component of the voice data transmitted by the terminal device 10 that is estimated to represent environmental sound other than human voices (hereinafter, the environmental sound component). The environment analysis unit 43 collates the extracted environmental sound component with the information included in the environment identification information 63 and identifies the loudness of the environmental sound represented by that component and the source of the environmental sound. The environment identification information 63 is information in which the identification information of each environmental sound source is associated with the sound characteristics of that source.

The pattern identification unit 44 identifies an environment pattern based on, for example, the environment pattern information 64, the processing result of the user identification unit 42, and the processing result of the environment analysis unit 43. An environment pattern is a pattern into which the environment where the user is present is classified according to predetermined criteria. Details are described later.

The interpretation unit 46, for example, converts the voice data corresponding to a human voice into text information and collates the text information with the regular expression information 66 to interpret the meaning of the user's utterance. Suppose, for example, that the user says "Tell me how to get from Shinjuku to Shibuya." The interpretation unit 46 performs morphological analysis on the utterance and divides it into parts of speech. The interpretation unit 46 then generates a search key in which Shinjuku and Shibuya, which are proper nouns and place names, are converted into codes, and searches the regular expression information 66. The regular expression information 66 contains registered information (regular expressions) in which proper nouns have been converted into abstract codes. For example, text such as "Tell me how to get from 〇〇 to ××" and "Tell me how to get from 〇〇 as far as ××" is registered as regular expressions.

The response unit 48 acquires, for example, the text information corresponding to "Tell me how to get from (proper noun, place) to (proper noun, place)" included in the regular expression information 66, and recognizes that it should provide directions from 〇〇 to ××.

The response unit 48 then embeds "Shinjuku" and "Shibuya", the original information that was encoded, into the (proper noun, place) slots, and thereby recognizes the user's intention: "I want to know how to get from Shinjuku to Shibuya." The response unit 48 performs a network search or the like to obtain directions from Shinjuku to Shibuya. The response unit 48 refers to, for example, the scenario information 68 and generates voice source information, to be output by the terminal device 10, that gives the directions from Shinjuku to Shibuya. The scenario information 68 holds in advance, for example, the content with which to respond to the user's utterances. That is, it holds the response content for an utterance expressing the intention "I want to know how to get from 〇〇 to ××." The scenario information 68 is prepared for each user so that, for example, the response content matches that user's preferences.
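The interpretation flow described above (abstract the place names, look up the abstracted text, then restore the original names) can be sketched in Python. This is a simplified illustration, not the method of the specification: a fixed list of place names stands in for morphological analysis, the registered patterns are an in-memory dictionary, and the names `PLACE_NAMES`, `REGISTERED`, `abstract`, and `interpret` are hypothetical.

```python
import re

# Hypothetical gazetteer standing in for morphological analysis.
PLACE_NAMES = ["Shinjuku", "Shibuya", "Ikebukuro"]
CANONICAL = {name.lower(): name for name in PLACE_NAMES}
PLACE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(n.lower()) for n in PLACE_NAMES) + r")\b"
)

# Hypothetical registered patterns: abstracted text -> intent name.
REGISTERED = {
    "tell me how to get from <PLACE> to <PLACE>": "directions",
}


def abstract(utterance: str):
    """Replace known place names with <PLACE>; return (key, places)."""
    places = []

    def swap(match):
        places.append(CANONICAL[match.group(0)])
        return "<PLACE>"

    text = utterance.strip().rstrip(".?!").lower()
    return PLACE_RE.sub(swap, text), places


def interpret(utterance: str):
    """Return the recognized intent with its slots filled, or None."""
    key, places = abstract(utterance)
    intent = REGISTERED.get(key)
    if intent is None or len(places) != 2:
        return None
    return {"intent": intent, "origin": places[0], "destination": places[1]}
```

For the example utterance, `interpret("Tell me how to get from Shinjuku to Shibuya.")` yields the `directions` intent with Shinjuku as the origin and Shibuya as the destination; an utterance with no registered pattern yields `None`.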

Some or all of the functions of the automatic response device 40, such as the response unit 48, may be provided in the terminal device 10. Information such as the regular expression information 66 and the scenario information 68 may also be stored in a storage device of the terminal device 10.

To have the terminal device 10 output the voice source information generated by the response unit 48, the provision control unit 50 transmits that voice source information to the terminal device 10 using the response-device-side communication unit 54. Likewise, to have the terminal device 10 output the voice source information transmitted by the advertisement providing device 80, the provision control unit 50 transmits that voice source information to the terminal device 10 using the response-device-side communication unit 54.

The provision control unit 50 also designates the output mode of the response content or the specific information. To have the speaker 14 of the terminal device 10 output the response content or the specific information in the designated output mode, it transmits information associating the designated output mode with the response content or the specific information to the terminal device 10 using the response-device-side communication unit 54. This function of the provision control unit 50 may be provided in the terminal device 10.

The learning unit 52 learns the response content or specific information that was output from the speaker 14 of the terminal device 10, the output mode in which it was output, the user's reaction, and the environment pattern. Learning here means, for example, learning using artificial intelligence or machine learning such as deep learning.

The response-device-side communication unit 54 includes a communication interface such as a network interface card. The response-device-side communication unit 54 acquires information transmitted by the terminal device 10 or the advertisement providing device 80, and transmits the results of processing performed in the automatic response device 40 to the terminal device 10 or the advertisement providing device 80.

[Advertisement providing device]
The advertisement providing device 80 includes, for example, an information providing unit 82, an advertisement-providing-device-side communication unit 84, and an advertisement-providing-device-side storage unit 90. The information providing unit 82 extracts an advertisement to be provided to the user based on the information input by the user's utterance or on the response content of the automatic response device 40, and provides information about the extracted advertisement (for example, voice source information and the output mode in which the voice is to be output) to the automatic response device 40.

The advertisement-providing-device-side communication unit 84 includes a communication interface such as a network interface card. The advertisement-providing-device-side communication unit 84 acquires information transmitted by the automatic response device 40 and transmits the results of processing performed in the advertisement providing device 80 to the automatic response device 40. The advertisement-providing-device-side storage unit 90 stores advertisement information 92, described later. The advertisement providing device 80 and the automatic response device 40 may be provided as a single device.

[Flowchart (process for determining the degree of output)]
FIG. 2 is a flowchart showing an example of the flow of processing executed by the information processing system 1. This process controls the amount of voice output from the artifact according to how much the user uses the voice UI (user interface / user device). The voice UI here is voice recognition.

First, the terminal device 10 determines whether voice has been input by the user (S10). When voice has been input by the user (that is, when a conversation between the user and the automatic response device 40 has started), the input voice data (utterance component and environmental sound component) is transmitted to the automatic response device 40.

The automatic response device 40 acquires the utterance component and identifies the user based on the acquired utterance component and the user identification information 62 (S20). The automatic response device 40 acquires the environmental sound component and identifies the environment pattern based on the acquired environmental sound component and the environment pattern information 64 (S22).

FIG. 3 is a diagram showing an example of the contents of the environment pattern information 64. The environment pattern information 64 is information in which a plurality of environment patterns are associated with classification criteria. An environment pattern is a pattern classified based on at least one of the following items: the day of the week, the time, the number of people around the user, the types of those people, the loudness of the environmental sound where the user is, the environment in which the user is (home, office, street), the user's position, and the user's schedule (a current appointment registered in advance).

利用者が存在している環境、利用者が存在している位置、または利用者のスケジュールは、例えば予め利用者により設定された情報である。また、利用者が存在している環境、または利用者が存在している位置は、不図示のＧＰＳ（Global Positioning System）を利用した位置測位装置により測位された情報に基づいて特定されてもよい。また、利用者のスケジュールは、端末装置１０が他の装置からネットワークＮＷを介して取得した情報であってもよい。 The environment in which the user is present, the user's location, and the user's schedule are, for example, information set in advance by the user. The environment in which the user is present or the user's location may also be identified based on information measured by a positioning device (not shown) that uses GPS (Global Positioning System). The user's schedule may also be information that the terminal device 10 acquires from another device via the network NW.
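
The classification of an observation into an environment pattern can be illustrated with a minimal sketch. The pattern names, criteria keys, and the first-match rule below are assumptions made for illustration; the description does not specify how the environment pattern information 64 is encoded or matched.

```python
# Hypothetical stand-in for the environment pattern information 64:
# each pattern lists the criteria an observation must satisfy.
ENVIRONMENT_PATTERNS = [
    ("pattern_1", {"day": "Saturday", "place": "home"}),
    ("pattern_2", {"place": "office"}),
]


def identify_pattern(observation, patterns=ENVIRONMENT_PATTERNS, default="unclassified"):
    """Return the first pattern whose criteria all match the observation.

    `observation` is a dict of items such as day, place, or number of
    people nearby. Items not named in a pattern's criteria are ignored.
    """
    for name, criteria in patterns:
        if all(observation.get(key) == value for key, value in criteria.items()):
            return name
    return default
```

For instance, an observation of `{"day": "Saturday", "place": "home", "people_nearby": 2}` would fall under `pattern_1` in this sketch, because the extra item is not part of that pattern's criteria.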

次に、自動応答装置４０は、特定した利用者に提供する広告の内容を決定するように広告提供装置８０に依頼する（Ｓ２４）。この際、自動応答装置４０は、端末装置１０に入力された音声に含まれる情報をテキスト情報に変換したテキスト情報を広告提供装置８０に送信する。 Next, the automatic response device 40 requests the advertisement providing device 80 to determine the content of the advertisement to be provided to the specified user (S24). At this time, the automatic response device 40 transmits the text information obtained by converting the information contained in the voice input to the terminal device 10 into the text information to the advertisement providing device 80.

広告提供装置８０は、自動応答装置４０の依頼に応じて、広告情報９２を参照して、テキスト情報に対応する利用者に提供する広告の内容を決定する（Ｓ３０）。なお、広告提供装置８０は、利用者に提供する広告が存在しない場合、その旨を自動応答装置４０に送信する。 In response to the request of the automatic response device 40, the advertisement providing device 80 refers to the advertisement information 92 and determines the content of the advertisement to be provided to the user corresponding to the text information (S30). If the advertisement to be provided to the user does not exist, the advertisement providing device 80 transmits to that effect to the automatic response device 40.

図４は、広告情報９２の内容の一例を示す図である。広告情報９２は、広告ＩＤに対して、キャラクター、商品（またはサービス）、シナリオ、およびキーワードが対応付けられた情報である。「キャラクター」とは、所定の特徴を有する人物や、人に見立てた動物、植物、創作物、人工物などである。キャラクターは、商品ごとに設けられてもよいし、複数の商品ごとや、キャンペーンごとに設けられてもよい。 FIG. 4 is a diagram showing an example of the content of the advertisement information 92. The advertisement information 92 is information in which a character, a product (or service), a scenario, and a keyword are associated with the advertisement ID. A "character" is a person having a predetermined characteristic, an animal, a plant, a creative work, an artificial object, or the like that looks like a person. The character may be provided for each product, for each of a plurality of products, or for each campaign.

「シナリオ」とは、キャラクターが発する言葉（または言動）の内容や順序を規定したものである。シナリオは、例えば、キャラクターごとに設けられている。また、広告情報９２には、シナリオに加え、音声のトーンや、テンポ等のキャラクターの特徴がキャラクターに対して対応付けられている。商品やキャンペーンごとのキャラクターは、シナリオ（行動ルール）を基に自律的に行動する。 A "scenario" defines the content and order of words (or words and actions) uttered by a character. The scenario is provided for each character, for example. Further, in the advertisement information 92, in addition to the scenario, character characteristics such as voice tone and tempo are associated with the character. Characters for each product or campaign act autonomously based on scenarios (action rules).

「キーワード」は、広告に関連付けられた言葉である。「キーワード」は、商品を示す言葉の意味（意味情報）と同一の意味を有する言葉、または商品を示す言葉の意味に関連する言葉である。関連する言葉とは、商品を示す言葉から一般的に想起される言葉である。例えば、広告提供装置８０は、利用者により入力された言葉または自動応答装置４０により発せられた音声に含まれる言葉と、広告情報９２のキーワードとが合致する場合に、合致するキーワードに対応付けられた広告ＩＤに対応する情報（キャラクターが発話する音声元情報等）を自動応答装置４０に送信する。なお、広告提供装置８０は、人工知能や、深層学習などの機械学習されたモデルにより利用者に提供する情報を決定してもよい。 A "keyword" is a word associated with an advertisement. A "keyword" is a word having the same meaning (semantic information) as the word indicating the product, or a word related to the meaning of the word indicating the product. A related word is a word that is generally recalled from the word indicating the product. For example, when a word input by the user or a word included in the voice uttered by the automatic response device 40 matches a keyword in the advertisement information 92, the advertisement providing device 80 transmits to the automatic response device 40 the information corresponding to the advertisement ID associated with the matching keyword (such as the source information for the voice uttered by the character). The advertisement providing device 80 may also determine the information to be provided to the user by artificial intelligence or by a model trained by machine learning such as deep learning.
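
The keyword matching described above can be sketched as follows. This is only an illustration: the `Ad` record, the sample entries, and the case-insensitive substring match are assumptions, not the matching method of the advertisement providing device 80.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    """One hypothetical row of the advertisement information 92."""
    ad_id: str
    character: str
    product: str
    keywords: frozenset


# Illustrative entries only; the real table contents are not given in the text.
AD_INFO = [
    Ad("ad-001", "second character", "car A", frozenset({"car", "fuel economy"})),
    Ad("ad-002", "third character", "house B", frozenset({"house", "rent"})),
]


def select_ads(utterance_text, ad_info=AD_INFO):
    """Return the ads that have at least one keyword appearing in the text.

    Matching is a naive case-insensitive substring test, so "car" would also
    match "scarf"; a real system would need tokenization or semantic matching.
    """
    text = utterance_text.lower()
    return [ad for ad in ad_info if any(keyword in text for keyword in ad.keywords)]
```

With these sample entries, the utterance "I like a car with good fuel economy." selects only the entry for car A, and an utterance containing no keyword selects nothing, which corresponds to the case where no advertisement exists for the user.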

次に、自動応答装置４０は、後述する出力度合情報７２を参照して、環境パターンに応じた広告の出力度合を決定し、決定した出力度合で広告を出力するように端末装置１０に指示をする（Ｓ２６）。次に、端末装置１０は、自動応答装置４０の指示に基づいて、広告を出力する（Ｓ１２）。これにより本フローチャートの１ルーチンの処理が終了する。 Next, the automatic response device 40 determines the output degree of the advertisement according to the environment pattern with reference to the output degree information 72 described later, and instructs the terminal device 10 to output the advertisement at the determined output degree. (S26). Next, the terminal device 10 outputs an advertisement based on the instruction of the automatic response device 40 (S12). As a result, the processing of one routine of this flowchart is completed.

図５は、出力度合情報７２の内容の一例を示す図である。出力度合情報７２は、例えば、環境パターンごとに用意されている。また、出力度合情報７２は、利用者ＩＤに対して、環境パターンにおける過去の利用度合および広告を出力する出力度合が対応付けられた情報である。 FIG. 5 is a diagram showing an example of the contents of the output degree information 72. The output degree information 72 is prepared for each environment pattern, for example. Further, the output degree information 72 is information in which the past usage degree in the environment pattern and the output degree for outputting the advertisement are associated with the user ID.

「過去の利用度合」とは、利用者が過去にスピーカ１４から音声による情報（例えば広告）の提供を受けた度合、または利用者が過去にマイク１２に音声を用いて情報を入力した度合である。「出力度合」とは、スピーカ１４を用いて利用者に情報を出力する場合に、出力される音の大きさである。「出力度合」は、「出力態様」の一例である。出力度合は、例えば、過去の利用度合が多いほど、出力される音の大きさは大きくなるように設定されている。なお、「スピーカ１４から音声による情報の提供を受けた度合」において、音楽を出力させた度合は除かれてもよい。 The "past usage degree" is the degree to which the user has received voice information (for example, advertisements) from the speaker 14 in the past, or the degree to which the user has input information by voice into the microphone 12 in the past. The "output degree" is the loudness of the sound that is output when information is output to the user via the speaker 14. The "output degree" is an example of an "output mode". The output degree is set so that, for example, the greater the past usage degree, the louder the output sound. Note that the degree to which music was output may be excluded from "the degree to which the user has received voice information from the speaker 14".
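
The relationship "the greater the past usage degree, the louder the output" can be sketched as a clamped linear mapping. The linear form, the volume range, and the saturation count are all assumptions chosen for illustration; the description only requires that loudness increase with past usage.

```python
def output_degree(past_usage_count, min_volume=0.2, max_volume=1.0, saturation_count=50):
    """Map a past usage count to an output volume in [min_volume, max_volume].

    The volume grows linearly with usage and stops growing once the count
    reaches `saturation_count`. Negative counts are treated as zero.
    """
    clamped = min(max(past_usage_count, 0), saturation_count)
    ratio = clamped / saturation_count
    return min_volume + (max_volume - min_volume) * ratio
```

A user with no voice UI history would receive advertisements at the minimum volume in this sketch, and a heavy user at the maximum.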

また、出力度合情報７２において、出力度合に代えて、他の出力に関する態様が対応付けられていてもよい。出力に関する態様とは、例えば、音の大きさに加え、音の高低、広告の内容が出力されるテンポ等である。出力に関する態様は、例えば、過去の利用度合が多いほど、利用者が聞き取りやすいように設定されている。 Further, in the output degree information 72, another output-related mode may be associated in place of the output degree. Output-related modes include, for example, in addition to loudness, the pitch of the sound and the tempo at which the content of the advertisement is output. The output-related mode is set so that, for example, the greater the past usage degree, the easier the output is for the user to hear.

また、利用者が存在する環境の環境音が所定の大きさ以上の場合、環境音が所定の大きさ未満の場合よりも、特定情報の出力態様の変化度合を小さくしてもよい。すなわち、もともと環境音が大きい環境においては、特定情報の出力を大きくさせなくてもよい。 Further, when the environmental sound of the environment in which the user exists is equal to or larger than a predetermined loudness, the degree of change in the output mode of the specific information may be smaller than that when the environmental sound is less than the predetermined loudness. That is, in an environment where the environmental sound is originally loud, it is not necessary to increase the output of specific information.
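
As a rough sketch of this adjustment, the change applied to the output mode can be damped when the environmental sound is at or above a threshold. The decibel threshold and the damping factor are hypothetical values; the description only says the degree of change may be made smaller.

```python
def adjusted_change(base_change, ambient_level_db, threshold_db=60.0, damping=0.5):
    """Return the change in output mode to apply given the ambient sound level.

    At or above `threshold_db` the change is scaled down by `damping`;
    below it the change is applied in full.
    """
    if ambient_level_db >= threshold_db:
        return base_change * damping
    return base_change
```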

上述したように、自動応答装置４０が、出力度合情報７２を参照することにより、利用者に違和感を与えないように情報を提供することができる。 As described above, the automatic response device 40 can provide information by referring to the output degree information 72 so as not to give a sense of discomfort to the user.

なお、上述した説明では、一例として、利用者が音声を入力した場合に、利用度合に基づいて出力態様を制御する例について説明したが、単に自動応答装置４０が発話したり、情報を出力したりする場合において利用度合に基づいて出力態様を制御してもよい。 In the above description, as an example, when the user inputs voice, the output mode is controlled based on the degree of use, but the automatic response device 40 simply speaks or outputs information. In such a case, the output mode may be controlled based on the degree of utilization.

［具体例（その１）］
図６は、利用者と自動応答装置４０との会話の一例を示す図である。例えば、図６（Ａ）に示すように、（１）利用者が「新しい車が欲しいな。」とマイク１２に入力する。
（２）自動応答装置４０は、第１キャラクターの出力態様で、「どんな車が欲しいの？」と応答する。 [Specific example (1)]
FIG. 6 is a diagram showing an example of a conversation between the user and the automatic response device 40. For example, as shown in FIG. 6A, (1) the user inputs "I want a new car" to the microphone 12.
(2) The automatic response device 40 responds, in the output mode of the first character, "What kind of car do you want?"

次に、図６（Ｂ）に示すように、（３）利用者が「燃費のいい車がいいな。」とマイク１２に入力する。（４）自動応答装置４０は、第１キャラクターの出力態様で、「節約できるからいいよね。」と応答する。そして、（５）自動応答装置４０は、第２キャラクターの出力態様で、「車Ａが燃費いいよ。」と発話する。この第２キャラクターの出力態様は、ユーザデバイスの利用度合に応じた出力態様である。 Next, as shown in FIG. 6 (B), (3) the user inputs to the microphone 12 "I like a car with good fuel economy." (4) The automatic response device 40 responds with the output mode of the first character, "It's good because it saves money." Then, (5) the automatic response device 40 utters "Car A has good fuel economy" in the output mode of the second character. The output mode of the second character is an output mode according to the degree of use of the user device.

次に、図６（Ｃ）に示すように、（６）利用者が「詳しく教えて。」とマイク１２に入力する。（７）自動応答装置４０は、第２キャラクターの出力態様で、「車Ａは電気自動車だよ。フル充電で〇〇キロ走行可能だよ。」と応答する。 Next, as shown in FIG. 6 (C), (6) the user inputs "Tell me more." to the microphone 12. (7) The automatic response device 40 responds, in the output mode of the second character, "Car A is an electric vehicle. It can travel XX km on a full charge."

このように、第１キャラクターと利用者との会話において、キーワードが出現した場合、自動応答装置４０は、ユーザデバイスの利用度合に応じた出力態様で、キーワードに基づく広告を第２キャラクターの出力態様で、利用者に提供する。この結果、利用者に違和感を与えないように情報を提供することができる。 In this way, when a keyword appears in the conversation between the first character and the user, the automatic response device 40 provides the user with an advertisement based on the keyword, in the output mode of the second character and in an output mode according to the degree of use of the user device. As a result, information can be provided without giving the user a sense of discomfort.

なお、上記の（６）で、車Ａに興味を示さなかった場合、第２キャラクターは、その後、発話しなくてもよい。また、車Ａに興味を示さなかった場合、他の車に対応するキャラクターの出力態様で、他の車を紹介してもよい。 Note that if the user shows no interest in car A in (6) above, the second character need not speak thereafter. Alternatively, if the user shows no interest in car A, another car may be introduced in the output mode of a character corresponding to that car.

また、車の広告を提供したい場合、自動応答装置４０は、第１キャラクターに車の話題で会話するような発話や応答を行ってもよい。この場合、例えば、自動応答装置４０は、上述したキーワード、キーワードを誘導するような発話を行う。例えば、出力したい特定情報に基づいて、キャラクターの会話が選択される。 Further, when an advertisement for a car is to be provided, the automatic response device 40 may make utterances or responses, as the first character, that steer the conversation toward the topic of cars. In this case, for example, the automatic response device 40 utters the above-mentioned keyword or makes an utterance that elicits the keyword. For example, the character's conversation is selected based on the specific information to be output.

また、上述した例では、第２キャラクターの発話の出力度合を変更するものとしたが、第１キャラクターの発話の出力度合が変更されてもよい。また、出力度合は、利用者とキャラクターとの会話の度合に基づいて変更されてもよい。例えば、第１キャラクターと利用者との会話の度合が、第Ｎキャラクター（Ｎは任意の自然数）と利用者との会話の度合よりも高い場合、第１キャラクターが利用者に話し掛ける度合を、第Ｎキャラクターが利用者に話しかける度合よりも多くする。 Further, in the above example, the output degree of the second character's utterances is changed, but the output degree of the first character's utterances may be changed instead. The output degree may also be changed based on the degree of conversation between the user and a character. For example, if the degree of conversation between the first character and the user is higher than the degree of conversation between the Nth character (N is an arbitrary natural number) and the user, the first character is made to speak to the user more often than the Nth character does.

［フローチャート（学習する処理）］
図７は、端末装置１０および自動応答装置４０により実行される処理の流れの一例を示すフローチャートである。図７のフローチャートのＳ４０、Ｓ５０、およびＳ５２の処理は、図２のフローチャートのＳ１０、Ｓ２０、およびＳ２２の処理と同様のため説明を省略する。 [Flowchart (learning process)]
FIG. 7 is a flowchart showing an example of the flow of processing executed by the terminal device 10 and the automatic response device 40. The processing of S40, S50, and S52 in the flowchart of FIG. 7 is the same as the processing of S10, S20, and S22 in the flowchart of FIG. 2, and its description is therefore omitted.

Ｓ５２の処理後に、自動応答装置４０は、自装置が情報を利用者に提供したか否かを判定する（Ｓ５４）。情報を利用者に提供した場合、自動応答装置４０は、提供した情報の内容、および情報の提供後の利用者の反応を取得し、取得した反応を利用情報７４として第２記憶部７０に記憶させる（Ｓ５６）。 After the process of S52, the automatic response device 40 determines whether or not the own device has provided the information to the user (S54). When the information is provided to the user, the automatic response device 40 acquires the content of the provided information and the user's reaction after the information is provided, and stores the acquired reaction as the usage information 74 in the second storage unit 70. (S56).

図８は、利用情報７４の内容の一例を示す図である。利用情報７４は、利用者ごとに、過去に利用者により入力された情報、または過去に利用者に対して出力された情報と、入力された情報、または出力された情報の出力態様と、環境パターンと、出力された情報に対する利用者の反応（例えば指示）とが互いに対応付けられた情報である。 FIG. 8 is a diagram showing an example of the contents of the usage information 74. The usage information 74 is information in which, for each user, the following are associated with one another: information input by the user in the past or information output to the user in the past; the output mode of that input or output information; the environment pattern; and the user's reaction (for example, an instruction) to the output information.

次に、自動応答装置４０は、所定のタイミングに到達したか否かを判定する（Ｓ５８）。所定のタイミングに到達していない場合、本フローチャートの１ルーチンの処理が終了する。所定のタイミングに到達した場合、自動応答装置４０は、利用情報７４を学習データとして学習する（Ｓ６０）。これにより本フローチャートの１ルーチンの処理が終了する。 Next, the automatic response device 40 determines whether or not a predetermined timing has been reached (S58). If the predetermined timing has not been reached, the processing of one routine of this flowchart ends. When the predetermined timing is reached, the automatic response device 40 learns the usage information 74 as learning data (S60). As a result, the processing of one routine of this flowchart is completed.

上述したように、利用者に情報を提供した際の利用者の反応や、環境パターン、情報の出力態様、情報の内容が学習されることにより、利用者の好みを把握することができる。そして、学習部５２は、利用者の好みを反映させて出力度合情報７２を生成したり、更新したりすることができる。 As described above, the user's preference can be grasped by learning the user's reaction when the information is provided to the user, the environmental pattern, the information output mode, and the content of the information. Then, the learning unit 52 can generate or update the output degree information 72 by reflecting the user's preference.

例えば、土曜日や、時間帯が７時〜８時、利用者の周囲に親が存在している場合、利用者が自宅にいる場合、またはプライベートのスケジュールが予定されている時間帯において、他の状況の場合よりも抑制するように特定情報が出力されるように指示されたことを示す情報が、利用情報７４に含まれているものとする。この場合、学習部５２は、上述した状況に対応する環境パターンでは、特定情報の出力を抑制するように、出力度合情報７２を生成する。 For example, suppose the usage information 74 includes information indicating that the user instructed that the specific information be output in a more restrained manner than in other situations on Saturdays, between 7:00 and 8:00, when a parent is near the user, when the user is at home, or during a time slot in which a private appointment is scheduled. In this case, the learning unit 52 generates the output degree information 72 so that the output of the specific information is suppressed in the environment patterns corresponding to those situations.
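
How usage records might be turned into per-pattern output degrees can be sketched with a simple counting heuristic. This is not the learning method of the learning unit 52, which the description leaves open; the record fields, the reaction labels "suppress" and "amplify", the base degree, and the step size are all hypothetical.

```python
from collections import defaultdict

# Hypothetical reaction labels and the direction each one pushes the degree.
REACTION_DELTA = {"suppress": -1, "amplify": 1}


def learn_output_degrees(usage_records, base_degree=0.6, step=0.1, lowest=0.0, highest=1.0):
    """Derive an output degree per (user, environment pattern) from past reactions.

    Each "suppress" reaction lowers the degree by `step` and each "amplify"
    reaction raises it; other reactions leave it at the base value.
    """
    net_reactions = defaultdict(int)
    for record in usage_records:
        key = (record["user_id"], record["environment_pattern"])
        net_reactions[key] += REACTION_DELTA.get(record["reaction"], 0)
    return {
        key: min(highest, max(lowest, base_degree + step * net))
        for key, net in net_reactions.items()
    }
```

In this sketch, two recorded requests to restrain output in a "Saturday at home" pattern would lower that pattern's degree below the base value, while patterns without such requests keep or raise theirs.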

上述したように、利用者と音声インタラクションするスマートスピーカなどの人工物が、音声広告や話し掛けを過剰に行うと利用者は煩雑に感じる場合があるが、本実施形態では、利用者の音声インタラクションの利用度合や、インタラクションが行われた状況に応じて、音声広告や話し掛けを調整するため、利用者に違和感を与えないように情報を提供することができる。 As described above, when an artificial object such as a smart speaker that interacts with the user by voice delivers voice advertisements or speaks to the user excessively, the user may find it bothersome. In the present embodiment, however, voice advertisements and speech directed at the user are adjusted according to the degree to which the user uses voice interaction and the situation in which the interaction takes place, so information can be provided without giving the user a sense of discomfort.

なお、上述した例では、提供制御部５０が、音声が入力または出力の対象とされたユーザデバイスの利用度合に応じて、特定情報の出力態様を制御するものとして説明したが、これに代えて（或いは加えて）、以下のように変更されてもよい。すなわち、提供制御部５０は、ユーザデバイスの利用度合に応じて、第２応答内容の出力態様を制御する。この「第２応答内容」は、利用者により発せられた音声に対する応答内容であって広告を含む内容である。例えば、この場合、自動応答装置４０は、広告を含む応答内容を決定し、決定した応答内容をユーザデバイスの利用度合に応じた出力態様で端末装置１０に出力させる。このように、応答内容そのものが広告となり、且つ応答内容の制御態様が制御されるため、利用者に違和感を与えないように情報を提供することができる。 In the above-described example, the provision control unit 50 has been described as controlling the output mode of specific information according to the degree of use of the user device whose voice is input or output, but instead of this, (Or in addition), it may be changed as follows. That is, the provision control unit 50 controls the output mode of the second response content according to the degree of use of the user device. This "second response content" is the content of the response to the voice uttered by the user and includes the advertisement. For example, in this case, the automatic response device 40 determines the response content including the advertisement, and causes the terminal device 10 to output the determined response content in an output mode according to the degree of use of the user device. In this way, since the response content itself becomes an advertisement and the control mode of the response content is controlled, it is possible to provide information so as not to give a sense of discomfort to the user.

以上説明した第１実施形態によれば、提供制御部５０が、音声が入力または出力の対象とされたユーザデバイスの利用度合に応じて、特定情報の出力態様を制御することにより、利用者に違和感を与えないように情報を提供することができる。 According to the first embodiment described above, the provision control unit 50 controls the output mode of the specific information according to the degree of use of the user device to which the voice is input or output, thereby causing the user. Information can be provided so as not to give a sense of discomfort.

＜第２実施形態＞
以下、第２実施形態について説明する。提供制御部５０は、特定情報の出力態様を、応答内容の第３出力態様よりも利用者が聞き取りにくい第１出力態様に変更して出力部に出力させた後、利用者の指示を受け付けた場合に、特定情報の出力態様を、第１出力態様よりも利用者が聞き取りやすい第２出力態様に変更して、特定情報を出力部に出力させる。第１実施形態との相違点を中心に説明する。 <Second Embodiment>
Hereinafter, the second embodiment will be described. The provision control unit 50 changes the output mode of the specific information to the first output mode, which is harder for the user to hear than the third output mode of the response content, outputs the output to the output unit, and then accepts the user's instruction. In this case, the output mode of the specific information is changed to the second output mode that is easier for the user to hear than the first output mode, and the specific information is output to the output unit. The differences from the first embodiment will be mainly described.

図９は、第２実施形態の情報処理システム１Ａに含まれる自動応答装置４０Ａの機能構成の一例を示す図である。自動応答装置４０Ａは、第２記憶部７０に代えて、第２記憶部７０Ａを備える。第２記憶部７０Ａは、例えば、出力度合情報７２および利用情報７４に加え、更に指示対応情報７６（詳細は後述する）を備える。 FIG. 9 is a diagram showing an example of the functional configuration of the automatic response device 40A included in the information processing system 1A of the second embodiment. The automatic response device 40A includes a second storage unit 70A instead of the second storage unit 70. The second storage unit 70A includes, for example, output degree information 72 and usage information 74, as well as instruction correspondence information 76 (details will be described later).

第２実施形態の応答部４８は、特定情報を端末装置１０に出力させる場合、特定情報の出力態様を、応答内容の第３出力態様よりも利用者が聞き取りにくい第１態様に変更して、特定情報を端末装置１０に出力させる。 When the response unit 48 of the second embodiment outputs the specific information to the terminal device 10, the output mode of the specific information is changed to the first mode in which the user is harder to hear than the third output mode of the response content. The specific information is output to the terminal device 10.

上記のように特定情報を端末装置１０に出力させた後、自動応答装置４０Ａは、利用者の指示を受け付けた場合に、特定情報の出力態様を、第１出力態様よりも利用者が聞き取りやすい第２出力態様に変更して、特定情報を端末装置１０に出力させる。第２出力態様は、例えば、第１出力態様よりも、音量が大きい、音の周波数帯が利用者にとって聞き取りやすい、情報が出力されるテンポが適切である態様である。 After causing the terminal device 10 to output the specific information as described above, the automatic response device 40A, upon receiving an instruction from the user, changes the output mode of the specific information to a second output mode that is easier for the user to hear than the first output mode, and causes the terminal device 10 to output the specific information. The second output mode is, for example, a mode in which, compared with the first output mode, the volume is louder, the frequency band of the sound is easier for the user to hear, or the tempo at which the information is output is more suitable.
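
The relationship among the three output modes can be sketched as below. The `OutputMode` fields and the concrete volume values are assumptions for illustration; the description only fixes the ordering (the first mode is harder to hear than the third, and the second is easier to hear than the first).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OutputMode:
    """Hypothetical output mode: relative volume and speech tempo."""
    volume: float
    tempo: float


# Third output mode: used for the ordinary response content.
THIRD_MODE = OutputMode(volume=0.7, tempo=1.0)
# First output mode: specific information, deliberately harder to hear.
FIRST_MODE = replace(THIRD_MODE, volume=0.3)
# Second output mode: specific information after the user asks for it.
SECOND_MODE = replace(FIRST_MODE, volume=0.7)


def mode_for_specific_info(user_instructed):
    """Pick the output mode for specific information such as an advertisement."""
    return SECOND_MODE if user_instructed else FIRST_MODE
```

Here only the volume differs between modes; a fuller sketch could also vary pitch or tempo, which the description lists as alternatives.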

なお、利用者が聞き取りにくい第１態様に変更する処理において、利用者が存在する環境の環境音が所定の大きさ以上の場合、環境音が所定の大きさ未満の場合よりも、特定情報の出力態様を変化させなくてもよいし、出力態様の変化度合を小さくしてもよい。もともと環境音が大きい環境で出力態様を変更しても利用者に対する影響が小さいためである。 Note that, in the process of changing to the first mode, which is harder for the user to hear, when the environmental sound of the environment in which the user is present is at or above a predetermined loudness, the output mode of the specific information may be left unchanged, or the degree of change in the output mode may be made smaller than when the environmental sound is below the predetermined loudness. This is because changing the output mode in an environment that is already loud has little effect on the user.

［フローチャート］
図１０は、端末装置１０および第２実施形態の自動応答装置４０Ａにより実行される処理の流れの一例を示すフローチャートである。本処理は、第１出力態様で特定情報が出力された後に実行される処理である。図１０のフローチャートのＳ６０、Ｓ７０、およびＳ７２の処理は、図２のフローチャートのＳ１０、Ｓ２０、およびＳ２２の処理と同様のため説明を省略する。 [flowchart]
FIG. 10 is a flowchart showing an example of a processing flow executed by the terminal device 10 and the automatic response device 40A of the second embodiment. This process is a process executed after the specific information is output in the first output mode. The processing of S60, S70, and S72 in the flowchart of FIG. 10 is the same as the processing of S10, S20, and S22 of the flowchart of FIG. 2, and thus the description thereof will be omitted.

次に、自動応答装置４０Ａは、指示対応情報７６を参照し、特定された利用者と、特定された環境パターンと、入力された音声に含まれる情報（指示の内容）との組み合わせに合致する広告の情報の出力態様を決定する（Ｓ７４）。指示の内容とは、利用者が情報の出力に関して求めた指示の情報である。指示の内容とは、例えば、ボリュームを上げることや、ゆっくりと情報を出力させること、高い音で情報を出力させること、数秒前に出力された情報を出力すること等、またはこれらの組み合わせである。 Next, the automatic response device 40A refers to the instruction correspondence information 76 and determines the output mode of the advertisement information that matches the combination of the identified user, the identified environment pattern, and the information included in the input voice (the content of the instruction) (S74). The content of the instruction is the instruction the user has given regarding the output of information. It is, for example, raising the volume, outputting information slowly, outputting information at a higher pitch, outputting again information that was output a few seconds earlier, or a combination of these.

図１１は、指示対応情報７６の内容の一例を示す図である。指示対応情報７６は、利用者によって行われた指示に対して、どのような出力態様で情報を出力するかを決定するのに用いられる情報である。指示対応情報７６は、例えば、環境パターンごとに、利用者ＩＤ、指示の内容、および出力態様が互いに対応付けられた情報である。 FIG. 11 is a diagram showing an example of the contents of the instruction correspondence information 76. The instruction correspondence information 76 is information used for determining in what output mode the information is output in response to the instruction given by the user. The instruction correspondence information 76 is, for example, information in which a user ID, an instruction content, and an output mode are associated with each other for each environment pattern.
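
The lookup in S74 can be pictured as a table keyed by environment pattern, user ID, and instruction content. The keys, the sample rows, and the fallback mode below are hypothetical; the actual layout of the instruction correspondence information 76 is shown only in FIG. 11.

```python
# Hypothetical stand-in for the instruction correspondence information 76.
INSTRUCTION_TABLE = {
    ("pattern_1", "user_001", "volume_up"): {"volume": 0.8, "tempo": 1.0},
    ("pattern_1", "user_001", "slower"): {"volume": 0.5, "tempo": 0.8},
    ("pattern_2", "user_001", "volume_up"): {"volume": 0.6, "tempo": 1.0},
}

# Assumed fallback when no row matches the combination.
DEFAULT_MODE = {"volume": 0.3, "tempo": 1.0}


def decide_output_mode(pattern, user_id, instruction, table=INSTRUCTION_TABLE):
    """Return the output mode for the given pattern, user, and instruction.

    A copy is returned so callers cannot modify the table by accident.
    """
    return dict(table.get((pattern, user_id, instruction), DEFAULT_MODE))
```

Keying on the environment pattern lets the same instruction, such as raising the volume, produce a smaller increase in one setting than in another.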

次に、自動応答装置４０Ａは、端末装置１０に決定した出力態様で広告の情報を出力するように指示する（Ｓ７６）。次に、端末装置１０は、自動応答装置４０Ａの指示に基づいて、決定された出力態様で広告の情報をスピーカ１４に出力させる（Ｓ６２）。これにより本フローチャートの１ルーチンの処理が終了する。 Next, the automatic response device 40A instructs the terminal device 10 to output the advertisement information in the determined output mode (S76). Next, the terminal device 10 causes the speaker 14 to output the advertisement information in the determined output mode based on the instruction of the automatic response device 40A (S62). As a result, the processing of one routine of this flowchart is completed.

上述したように、自動応答装置４０が、利用者の求めに応じて出力態様を変更するため、利用者に違和感を与えないように情報を提供することができる。 As described above, since the automatic response device 40 changes the output mode according to the request of the user, it is possible to provide information so as not to give the user a sense of discomfort.

［具体例（その２−１）］
図１２は、第２実施形態の利用者と自動応答装置４０Ａとの会話の一例を示す図である。例えば、図１２（Ａ）に示すように、（１）利用者が「新しい車が欲しいな。」とマイク１２に入力する。（２）自動応答装置４０Ａは、第１キャラクターの出力態様で、「どんな車が欲しいの？」と応答する。 [Specific example (No. 2-1)]
FIG. 12 is a diagram showing an example of a conversation between the user and the automatic response device 40A of the second embodiment. For example, as shown in FIG. 12 (A), (1) the user inputs "I want a new car." to the microphone 12. (2) The automatic response device 40A responds, in the output mode of the first character, "What kind of car do you want?"

次に、図１２（Ｂ）に示すように、（３）利用者が「燃費のいい車がいいな。」とマイク１２に入力する。（４）自動応答装置４０Ａは、第１キャラクターの出力態様で、「節約できるからいいよね。」と応答する。 Next, as shown in FIG. 12 (B), (3) the user inputs to the microphone 12 "I like a car with good fuel economy." (4) The automatic response device 40A responds with "It's good because it saves money" in the output mode of the first character.

次に、例えば、数秒程度、利用者によって発話がされない場合、図１２（Ｃ）に示すように、（５）自動応答装置４０Ａは、第２キャラクターの出力態様であり、且つ第１出力態様で、「車Ａをおすすめします。・・・・」と発話する。 Next, for example, when the user does not speak for several seconds or so, as shown in FIG. 12 (C), (5) the automatic response device 40A utters "I recommend car A. ..." in the output mode of the second character and in the first output mode.

（６）利用者は、上記（５）で出力された情報に興味を持っていたが音量が小さいため聞こえなかったことから、「聞こえないよ。」と発話する。そうすると、（７）自動応答装置４０Ａは、第２キャラクターの出力態様であり、且つ音量を上げて、上記（５）で出力させた情報を端末装置１０に出力させる。すなわち、第２キャラクターが「車Ａをおすすめします。・・・」と、再度、発話する。 (6) The user was interested in the information output in (5) above but could not hear it because the volume was low, and so says, "I can't hear you." Then, (7) the automatic response device 40A causes the terminal device 10 to output the information output in (5) above, in the output mode of the second character and at a raised volume. That is, the second character says "I recommend car A. ..." again.

このように、第２キャラクターが情報を出力する場合の出力態様を、第１キャラクターが情報を出力する場合の出力態様よりも、利用者が聞き取りにくくすることにより、利用者に煩わしさを感じさせることを抑制することができる。また、利用者の求めに応じ、第２キャラクターが情報を出力する場合の出力態様を、利用者が聞き取りやすいようにすることにより、利用者にとっての利便性を向上させることができる。 In this way, the output mode when the second character outputs the information is harder for the user to hear than the output mode when the first character outputs the information, which makes the user feel annoyed. Can be suppressed. In addition, the convenience for the user can be improved by making the output mode when the second character outputs the information in response to the user's request easy for the user to hear.

なお、上述した説明では、一例として、利用者が音声を入力した場合に、特定情報が出力される例について説明したが、単に自動応答装置４０Ａが特定情報を出力する場合において、上記のように出力態様が制御されてもよい。また、例えば、出力したい特定情報に基づいて、第１のキャラクターと第２のキャラクターの会話が選択されてもよい。 In the above description, as an example, when the user inputs voice, the specific information is output. However, when the automatic response device 40A simply outputs the specific information, as described above. The output mode may be controlled. Further, for example, a conversation between the first character and the second character may be selected based on the specific information to be output.

［具体例（その２−２）］
図１３は、広告の情報が出力される際の音量の変化を示す図である。図１３の縦軸は音の大きさを示し、図１３の横軸は時間を示している。以下で説明する広告Ａ〜Ｃの各広告の長さ（時間）は、例えば所定秒（例えば１５秒程度）である。広告Ａ〜Ｃの順で広告の情報が出力される予定であるものとする。この場合において、例えば、広告Ａが出力され、広告Ｂが出力され、広告Ｂの内容が出力されている途中（図１３の時刻Ｔ）で、利用者が音量を上げることを指示した。自動応答装置４０Ａは、時刻Ｔにおいて、広告Ｂの内容を最初から端末装置１０に出力させる。すなわち、所定時間遡った部分や音量を絞った部分から、広告Ｂが再出力される。また、その後、自動応答装置４０Ａは、図示するように広告Ｂの内容が出力された後、音量を上げる前の音量に下げてもよいし、音量を上げた状態を維持してもよい。 [Specific example (2-2)]
FIG. 13 is a diagram showing the change in volume when advertisement information is output. The vertical axis of FIG. 13 indicates loudness, and the horizontal axis indicates time. The length (duration) of each of the advertisements A to C described below is, for example, a predetermined number of seconds (for example, about 15 seconds). Assume that the advertisement information is scheduled to be output in the order of advertisements A to C. In this case, suppose, for example, that advertisement A has been output and that, while the content of advertisement B is being output (time T in FIG. 13), the user instructs that the volume be raised. At time T, the automatic response device 40A causes the terminal device 10 to output the content of advertisement B from the beginning. That is, advertisement B is output again from a point a predetermined time earlier or from the portion that was output at reduced volume. After the content of advertisement B has been output, the automatic response device 40A may, as illustrated, return the volume to the level before it was raised, or may keep the volume raised.

上述したように、自動応答装置４０Ａが、利用者により指示がされた場合に、指示された際に出力していた広告を最初から出力させるため、利用者は所望の情報を取得することができる。 As described above, when the user gives an instruction, the automatic response device 40A outputs from the beginning the advertisement that was being output at the time of the instruction, so the user can obtain the desired information.

なお、上述した例では、利用者の指示に基づいて、内容Ｂを最初から出力するものとしたが、広告Ａの最初から出力してもよいし、利用者の指示がされたときから所定時間前に出力されていた情報から出力してもよい。また、利用者の発話の内容（例えば切迫度）に基づいて、再出力させる情報が決定されてもよい。また、自動応答装置４０Ａは、過去の利用者の指示の傾向または予め設定された条件に基づいて、利用者の指示がされたときから、どの程度前から広告を再度再生するかを決定してもよい。 In the above example, the content of advertisement B is output from the beginning based on the user's instruction, but output may instead start from the beginning of advertisement A, or from the information that was being output a predetermined time before the user's instruction. The information to be re-output may also be determined based on the content of the user's utterance (for example, its degree of urgency). Further, the automatic response device 40A may determine how far back from the time of the user's instruction to start replaying the advertisement, based on the tendency of the user's past instructions or on preset conditions.
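
The choice of where to resume output can be sketched as follows, assuming advertisements of equal length played back to back from time zero. The function name, the 15-second default, and the optional rewind parameter are illustrative assumptions rather than details from the description.

```python
def replay_start(instruction_time, ad_length=15.0, rewind_seconds=None):
    """Return the playback time (in seconds) from which output should resume.

    By default, playback restarts at the beginning of the advertisement that
    was being output at `instruction_time`. If `rewind_seconds` is given,
    playback instead goes back that many seconds, but never before time zero.
    """
    if rewind_seconds is not None:
        return max(0.0, instruction_time - rewind_seconds)
    current_ad_index = int(instruction_time // ad_length)
    return current_ad_index * ad_length
```

For example, with 15-second advertisements, an instruction given 22 seconds in falls within advertisement B, so playback resumes at 15 seconds; with a 5-second rewind it would resume at 17 seconds instead.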

［その他］
提供制御部５０は、特定情報の属性に基づいて、特定情報の出力態様を、第１出力態様に変更して特定情報を出力部に出力させてもよい。特定情報の属性とは、広告に関する情報、機器の操作に関する情報、楽曲、およびユーザに関連する期限に関する情報（パスワードの変更期限などの情報）のうち、少なくとも一つを含む。例えば、提供制御部５０は、広告に関する情報の出力態様を第１出力態様に変更し、他の属性の特定情報は出力態様を変更しなくてもよい。 [Other]
The provision control unit 50 may, based on an attribute of the specific information, change the output mode of the specific information to the first output mode and cause the output unit to output the specific information. The attribute of the specific information includes at least one of information about an advertisement, information about the operation of a device, a piece of music, and information about a deadline relevant to the user (such as a password change deadline). For example, the provision control unit 50 may change the output mode of information about an advertisement to the first output mode while leaving the output mode of specific information with other attributes unchanged.

提供制御部５０は、広告の種別に基づいて特定情報の出力態様を、第１出力態様に変更して特定情報を出力部に出力させてもよい。広告の種別とは、例えば、広告に対応する商品の種別である。例えば、提供制御部５０は、車の広告の出力態様については、第１出力態様に変更するが、不動産の広告の出力態様については、第１出力態様に変更せずに、出力部に出力させてもよい。 The provision control unit 50 may, based on the type of advertisement, change the output mode of the specific information to the first output mode and cause the output unit to output the specific information. The type of advertisement is, for example, the type of product to which the advertisement corresponds. For example, the provision control unit 50 may change the output mode of a car advertisement to the first output mode, while causing the output unit to output a real estate advertisement without changing its output mode to the first output mode.

Further, the provision control unit 50 may change the output mode of the specific information to the first output mode and cause the output unit to output it, based on the type of the advertisement and the results of instructions the user gave in the past. For example, the learning unit 52 learns the type of advertisement together with the results of the user's past instructions. Suppose the learning unit 52 learns that the user instructed a volume increase when a car advertisement was output, but instructed a volume decrease when a real estate advertisement was output. In this case, the provision control unit 50 may, for example, change the output mode of real estate advertisements to the first output mode while outputting car advertisements without changing them to the first output mode.
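As a rough illustration of the learning described above, the following sketch keeps a per-type tally of volume instructions and uses its sign to decide whether an advertisement type should be output in the first output mode. The class name `AdTypeLearner`, the instruction labels, and the simple counting rule are assumptions made for this example; the specification does not state how the learning unit 52 is implemented.

```python
from collections import defaultdict


class AdTypeLearner:
    """Tallies past volume instructions per advertisement type (hypothetical)."""

    def __init__(self):
        self._score = defaultdict(int)

    def record(self, ad_type, instruction):
        # A volume-up instruction suggests the user wants to hear this type;
        # a volume-down instruction suggests the opposite.
        if instruction == "volume_up":
            self._score[ad_type] += 1
        elif instruction == "volume_down":
            self._score[ad_type] -= 1

    def use_first_output_mode(self, ad_type):
        # Assumption: types the user tends to turn down are output in the
        # harder-to-hear first output mode; unseen types are left unchanged.
        return self._score.get(ad_type, 0) < 0


learner = AdTypeLearner()
learner.record("car", "volume_up")
learner.record("real_estate", "volume_down")
quiet_real_estate = learner.use_first_output_mode("real_estate")
quiet_car = learner.use_first_output_mode("car")
```

With the history from the example in the text, real estate advertisements are selected for the first output mode and car advertisements are not.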

Applying the same idea, the provision control unit 50 may change the output mode of the specific information to the first output mode based on the environment pattern corresponding to the user. For example, the learning unit 52 learns that, in a certain environment, the user prefers the specific information to be output in the first output mode. The provision control unit 50 then outputs the specific information in the first output mode based on the learning result.

Further, the output mode of information designated by the user (for example, information with a predetermined attribute) may be changed to the first output mode, while the output mode of information that has not been designated need not be changed to the first output mode.
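A minimal sketch of this attribute-based selection follows. It assumes that information whose output mode is not changed is output in the same mode as the response content (the third output mode); the names `OutputMode` and `choose_output_mode` and the attribute strings are hypothetical.

```python
from enum import Enum


class OutputMode(Enum):
    FIRST = "first"  # harder to hear than the response content
    THIRD = "third"  # output mode of the response content


def choose_output_mode(attribute, designated_attributes):
    """Use the first output mode only for attributes the user designated."""
    if attribute in designated_attributes:
        return OutputMode.FIRST
    return OutputMode.THIRD


designated = {"advertisement"}
mode_for_ad = choose_output_mode("advertisement", designated)
mode_for_deadline = choose_output_mode("deadline", designated)
```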

The instruction correspondence information 76 is generated by the learning unit 52. For example, after the specific information has been output in the first output mode, the learning unit 52 learns, for each environment pattern, the content of the instruction received from the user and the output mode to which the specific information was changed in response. The learning unit 52 thus learns how the output mode of the specific information was changed under a given environment pattern, and generates instruction correspondence information 76 that matches the user's preferences.

For example, the learning unit 52 learns the content of the instructions received from the user, and the output mode to which the specific information was changed in response, under conditions such as: on a Saturday, between 7:00 and 8:00, when a parent is near the user, when the user is at home, or during a time slot in which a private appointment is scheduled. It then generates the instruction correspondence information 76 from the learning result. For example, if the user tends to have the specific information output at volume "10" under a given environment pattern, the instruction correspondence information 76 sets the second output mode for a volume change instruction to volume "10".
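One simple way to picture how such a table could be derived is to take, for each environment pattern and instruction, the volume the user most often ended up with. The sketch below does exactly that; the function name, the tuple layout of the history, and the "most frequent value" rule are assumptions for illustration and are not taken from the specification.

```python
from collections import Counter, defaultdict


def build_instruction_correspondence(history):
    """Map (environment pattern, instruction) to the most frequent volume.

    history: iterable of (environment_pattern, instruction, resulting_volume).
    """
    counts = defaultdict(Counter)
    for environment, instruction, volume in history:
        counts[(environment, instruction)][volume] += 1
    # For each key, keep the volume the user settled on most often.
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


history = [
    ("saturday_at_home", "volume_change", 10),
    ("saturday_at_home", "volume_change", 10),
    ("saturday_at_home", "volume_change", 12),
    ("weekday_office", "volume_change", 4),
]
table = build_instruction_correspondence(history)
```

Here the entry for a volume change instruction in the "saturday_at_home" pattern becomes volume 10, matching the example in the text.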

According to the second embodiment described above, the provision control unit 50 changes the output mode of the specific information to the first output mode, which is harder for the user to hear than the output mode of the response content, and causes the output unit to output it. Then, when a user's instruction is received, it changes the output mode to the second output mode, which is easier for the user to hear than the first output mode, and causes the output unit to output the specific information. This makes it possible to provide information without giving the user a sense of discomfort.

For example, if a voice advertisement is output as a direct continuation of the dialogue between the automatic response device and the user, the user may find it annoying, or it may be regarded as stealth marketing. In the present embodiment, the specific information is first output in the first output mode, which is hard for the user to hear, and is then output in the second output mode in response to the user's instruction. This suppresses both the sense of annoyance and the risk of the output being regarded as stealth marketing.

<Third Embodiment>
The third embodiment is described below. The provision control unit 50 causes the output unit to output a conversation between a first character, which outputs the response content, and a second character, which outputs the specific information. The description focuses on the differences from the first embodiment.

FIG. 14 is a diagram showing an example of the functional configuration of the information processing system 1B of the third embodiment. The information processing system 1B includes, for example, a terminal device 10B, an automatic response device 40B, and an advertisement providing device 80B.

The terminal device 10B includes a display unit 15 and an image generation unit 19 in addition to the functional configuration of the terminal device 10 of the first embodiment. The display unit 15 displays images under the control of the image generation unit 19. The image generation unit 19 causes the display unit 15 to display an image based on information transmitted by the automatic response device 40B. For example, based on that information, the voice generation unit 18 and the image generation unit 19 cooperate to control the speaker 14 and the display unit 15 so that the image shown on the display unit 15 and the voice output from the speaker 14 occur at the intended timing. Hereinafter, the voice generation unit 18 and the image generation unit 19 are collectively referred to as the "generation unit 17".

The automatic response device 40B includes an image providing unit 49 in addition to the functional configuration of the automatic response device 40 of the first embodiment, and includes a first storage unit 60B in place of the first storage unit 60 of the first embodiment. The first storage unit 60B stores, for example, motion information 69 in addition to the information stored in the first storage unit 60 of the first embodiment. The motion information 69 defines the movements of the character that converses with the user. The image providing unit 49 provides the terminal device 10B with information for generating the image to be displayed on the terminal device 10B, based on the motion information 69 or on information provided by the advertisement providing device 80B. In the information for generating the image, the timing at which the image changes is associated with the utterance output from the speaker 14. Hereinafter, the response unit 48 and the image providing unit 49 are collectively referred to as the "response providing unit 47".

The advertisement providing device 80B includes an advertisement providing device side storage unit 90B in place of the advertisement providing device side storage unit 90 of the first embodiment. The advertisement providing device side storage unit 90B stores, for example, advertisement information 92B. The advertisement information 92B includes advertisement motion information 93 in addition to the information in the advertisement information 92 of the first embodiment. The advertisement motion information 93 defines the movements of the character associated with an advertisement ID.

[Flowchart]
FIG. 15 is a flowchart showing an example of the flow of processing executed by the automatic response device 40B. First, the response providing unit 47 causes the first character and the second character to converse (S80). Next, the response providing unit 47 causes the second character to output advertisement information (S82).

Next, the automatic response device 40B determines whether the user has input a voice in response to the output advertisement information (first specific information) (S84). Instead of a voice input, it may determine whether a predetermined operation has been performed. If the user has not input a voice, the processing of one routine of this flowchart ends.

If the user has input a voice, the automatic response device 40B determines whether the user finds the output of the advertisement information annoying (S86). The user is regarded as "finding it annoying" when, for example, the information contained in the input voice has a negative meaning with respect to the output of the advertisement information. More specifically, the user is determined to find it annoying when an utterance meaning, for example, "be quiet", "stop", or "turn it down" is made. If the user does not find it annoying, the processing of one routine of this flowchart ends. Note that when the user is determined in S86 not to find it annoying, the automatic response device 40B causes the output unit to output second specific information, which is more detailed than the first specific information. For example, when the first specific information is a product name or a product attribute, the detailed information is a description of it.

If the user finds it annoying, the response providing unit 47 stops the output of the advertisement information (S88). Instead of stopping, the output mode may be changed according to the user's reaction. For example, if the user says "turn it down", the volume at which the advertisement information is output is lowered. The processing of one routine of this flowchart then ends.
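The flow of S80 to S88 can be traced with a short sketch. Keyword matching stands in for the judgment of whether an utterance has a negative meaning; that matching rule, the phrase list, and the function name `run_routine` are assumptions made for this example only, and the specification does not say how the meaning of an utterance is determined.

```python
# Hypothetical phrases treated as having a negative meaning (S86).
NEGATIVE_PHRASES = ("be quiet", "stop", "turn it down")


def run_routine(user_utterance):
    """Return the actions taken in one routine of the flowchart."""
    actions = [
        "S80: first and second characters converse",
        "S82: second character outputs advertisement information",
    ]
    if user_utterance is None:
        # S84: no voice input, so the routine ends here.
        return actions

    text = user_utterance.lower()
    annoyed = any(phrase in text for phrase in NEGATIVE_PHRASES)  # S86
    if not annoyed:
        # Not annoyed: follow up with the more detailed second specific information.
        actions.append("output second specific information")
        return actions

    # S88: stop the output, or lower the volume if that is what was asked.
    if "turn it down" in text:
        actions.append("S88: lower advertisement volume")
    else:
        actions.append("S88: stop advertisement output")
    return actions


trace = run_routine("Please stop")
```

A real system would interpret the utterance through the interpretation unit rather than fixed phrases; the sketch only shows the branching order.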

As described above, having the characters converse with each other to output advertisement information can make the user more interested in the information. In addition, because the output of information is suppressed according to the user's reaction, convenience for the user improves.

[Specific example (No. 3-1)]
FIG. 16 is a diagram (No. 1) showing an example of a conversation and an image displayed on the display unit 15 in the third embodiment. The provision control unit causes the first character and the second character to converse based on information provided to the user. For example, as shown in FIG. 16, (1) the second character CR2 says, "How's the weather today?" (2) The first character CR1 responds, "The forecast says clear skies."

Next, (3) the second character CR2 says, "Perfect weather for a drive." Next, (4) the first character CR1 responds, "It is." Next, (5) the second character CR2 says, "Speaking of which, a car that's ideal for driving has just gone on sale."

Having the characters converse with each other to introduce a product in this way lets the user become interested in the product more naturally.

[Specific example (No. 3-2)]
For example, the automatic response device 40B acquires preference information, such as the user's likes, tastes, and planned activities, based on conversations between the first character and the user. Preference information is, for example, information on the user's hobbies, frequently used facilities or places, frequently purchased products, and products or services the user wishes to purchase.

FIG. 17 is a diagram (No. 2) showing an example of a conversation and an image displayed on the display unit 15 in the third embodiment. The provision control unit 50, for example, asks the user whether conversation information contained in the conversation between the user and the first character may be reflected in the content of the specific information output by the second character, and reflects the conversation information in the specific information if the user gives permission.

For example, as shown in FIG. 17, (1) the first character CR1 says, "User A, may I tell others that you are thinking of buying a car?" Suppose user A answers, "Sure." (2) The first character CR1 responds, "All right, I'll let them know. I'm sure you'll find a good car!" In this way, the first character provides the second character with information such as the user's interests and tendencies, which optimizes the information output by the second character.

FIG. 18 is a diagram (No. 3) showing an example of a conversation and an image displayed on the display unit 15 in the third embodiment. After the response (2) in FIG. 17, the following conversation takes place at a predetermined timing. (1) The second character CR2, while not displayed on the display unit 15 for example, says, "Hello, is anyone there?" Next, (2) the first character CR1 responds, "Who is it?" Next, (3) the second character CR2 says, "Do you have a moment?" Next, (4) the first character CR1 says, "User A, someone has come to visit. May I let them in?" Suppose user A answers, "Yes, let them in." Next, (5) the first character CR1, in response to user A's answer, says, "Please come in." The image shown in FIG. 19 is then displayed on the display unit 15.

FIG. 19 is a diagram (No. 4) showing an example of a conversation and an image displayed on the display unit 15 in the third embodiment. (1) The second character CR2, now displayed on the display unit 15 for example, says, "I heard that you are looking for a car, so I have come to introduce one." Next, (2) the first character CR1 responds, "User A, would you like to hear more?" If the user gives an affirmative reply, the second character CR2, for example, introduces the product. If the user gives a negative reply, the second character CR2, for example, disappears without introducing the product.

In this way, advertisement information matched to the user's preference information is output when permission has been obtained for handling that information, which improves convenience for the user while keeping the user from feeling annoyed.
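The permission check described in this example can be sketched as a gate in front of the advertisement selection. The function name `select_advertisement`, the dictionary layout of an advertisement, and the choice to return nothing when permission is withheld are all assumptions for illustration; the specification only states that conversation information is reflected when the user permits it.

```python
def select_advertisement(advertisements, preferences, consent_given):
    """Pick an advertisement matching the user's preferences, if permitted.

    advertisements: list of dicts with "id" and "category" keys (hypothetical).
    preferences: set of categories learned from conversation with the user.
    """
    if not consent_given:
        # Without permission, preference information is not used at all.
        return None
    for advertisement in advertisements:
        if advertisement["category"] in preferences:
            return advertisement
    return None


advertisements = [
    {"id": "ad-1", "category": "real_estate"},
    {"id": "ad-2", "category": "car"},
]
chosen = select_advertisement(advertisements, {"car"}, consent_given=True)
```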

The example above describes a conversation between the first character CR1 and the second character CR2, but instead of (or in addition to) this, the second character CR2 may converse with a third character. The third character is, for example, a character that promotes a product (or service) that competes with (or is related to) the product (or service) recommended by the second character CR2.

According to the third embodiment described above, the provision control unit 50 causes the output unit to output the response content in an output mode corresponding to the first character, to output the specific information in an output mode corresponding to the second character, and to output a conversation between the first character and the second character. This can further arouse the user's interest in the information.

In the information processing system 1 of each embodiment described above, a single terminal device 10 is assumed, but two or more terminal devices 10 may be provided. In this case, the automatic response device 40 acquires, for example from the first terminal device 10 or the second terminal device 10, the voice data input to that terminal device together with the identification information of the device. Referring to the acquired identification information, the automatic response device 40 then causes the first terminal device 10 to output the response content in the output mode of the first character, and causes the second terminal device 10 to output the specific information in the output mode of the second character.

According to the embodiments described above, the information processing system includes a response unit that causes an output unit to output response content to a voice uttered by a user and specific information different from the response content, and a control unit that changes the output mode of the specific information to a first output mode, which is harder for the user to hear than a third output mode that is the output mode of the response content, causes the output unit to output it, and then, when a user's instruction is received, changes the output mode of the specific information to a second output mode that is easier for the user to hear than the first output mode and causes the output unit to output the specific information. This makes it possible to provide information without giving the user a sense of discomfort.

Although modes for carrying out the present invention have been described above using embodiments, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

1, 1A, 1B ... information processing system, 10 ... terminal device, 12 ... microphone, 14 ... speaker, 15 ... display unit, 16 ... voice recognition unit, 18 ... voice generation unit, 19 ... image generation unit, 40, 40A, 40B ... automatic response device, 42 ... user identification unit, 43 ... environment analysis unit, 46 ... interpretation unit, 48 ... response unit, 49 ... image providing unit, 50 ... provision control unit, 52 ... learning unit, 80 ... advertisement providing device, 82 ... information providing unit

Claims

A response unit that causes an output unit to output response content to a voice uttered by a user and an advertisement, which is specific information different from the response content; and
A control unit that changes the output mode of the advertisement to a first output mode, which is harder for the user to hear than a third output mode that is the output mode of the response content, causes the output unit to output the advertisement, and then, when a user's instruction is received, changes the output mode of the advertisement to a second output mode that is easier for the user to hear than the first output mode and causes the output unit to output the specific information,
wherein the control unit changes the output mode of the advertisement to the first output mode and causes the output unit to output the advertisement, based on the type of the advertisement and on the result of an instruction to change the output mode that was given in the past when the advertisement was output to the output unit.
Information processing system.

When the control unit receives a user's instruction, the control unit changes the output mode of the specific information based on the mode of the received instruction.
The information processing system according to claim 1.

When a user's instruction is received after the output unit has output the specific information in the first output mode, the control unit changes the output mode of the specific information that was output in the first output mode to the second output mode and causes the output unit to output it again.
The information processing system according to claim 1 or 2.

The first output mode is an output mode in which, compared with the third output mode, the volume is lower, the sound is in a frequency band that is harder for the user to hear, or the tempo at which the information is output is faster.
The information processing system according to any one of claims 1 to 3.

The control unit further changes the output mode of the specific information to the first output mode based on the attribute of the specific information, and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 4.

The attribute of the specific information includes at least one of information about advertisement, information about operation of a device, music, and information about a deadline related to a user.
The information processing system according to claim 5.

The control unit changes the output mode of the specific information to the first output mode and causes the output unit to output the specific information, based on the attribute of the specific information and on an attribute of specific information designated in advance by the user.
The information processing system according to any one of claims 1 to 6.

Based on the time of day, the control unit changes the output mode of the specific information to the first output mode and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 7.

Based on the environment in which the user exists, the control unit changes the output mode of the specific information to the first output mode and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 8.

The control unit changes the output mode of the specific information to the first output mode based on the position where the user exists, and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 9.

The control unit changes the output mode of the specific information to the first output mode based on a person existing in the vicinity of the user, and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 10.

Based on the schedule information of the user, the control unit changes the output mode of the specific information to the first output mode and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 11.

The control unit changes the output mode of the specific information to the first output mode based on the environmental sound of the environment in which the user exists, and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 12.

The user's instruction is given by a voice input to a user device that accepts voice input, or by a predetermined operation.
The information processing system according to any one of claims 1 to 13.

The control unit causes the output unit to output the response content in an output mode corresponding to a first character, and causes the output unit to output the specific information in an output mode corresponding to a second character.
The information processing system according to any one of claims 1 to 14.

One or more computers:
cause an output unit to output response content to a voice uttered by a user and an advertisement, which is specific information different from the response content;
change the output mode of the advertisement to a first output mode, which is harder for the user to hear than a third output mode that is the output mode of the response content, cause the output unit to output the advertisement, and then, when a user's instruction is received, change the output mode of the advertisement to a second output mode that is easier for the user to hear than the first output mode and cause the output unit to output the specific information; and
change the output mode of the advertisement to the first output mode and cause the output unit to output the advertisement, based on the type of the advertisement and on the result of an instruction to change the output mode that was given in the past when the advertisement was output to the output unit.
Information processing method.

Causing one or more computers to execute:
a process of causing an output unit to output response content to a voice uttered by a user and an advertisement, which is specific information different from the response content;
a process of changing the output mode of the advertisement to a first output mode, which is harder for the user to hear than a third output mode that is the output mode of the response content, causing the output unit to output the advertisement, and then, when a user's instruction is received, changing the output mode of the advertisement to a second output mode that is easier for the user to hear than the first output mode and causing the output unit to output the specific information; and
a process of changing the output mode of the advertisement to the first output mode and causing the output unit to output the advertisement, based on the type of the advertisement and on the result of an instruction to change the output mode that was given in the past when the advertisement was output to the output unit.
Program.