JP6351987B2

JP6351987B2 - Speech control device, speech device, speech control system, speech control method, speech device control method, and control program

Info

Publication number: JP6351987B2
Application number: JP2014017491A
Authority: JP
Inventors: 章友大西; 斉志広瀬; 雅裕千葉; 佳世森長; 友宏相曽; 和典柴田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2014-01-31
Filing date: 2014-01-31
Publication date: 2018-07-04
Anticipated expiration: 2034-01-31
Also published as: JP2015144398A

Description

The present invention relates to an utterance device that speaks to a user by voice, an utterance control device that causes an utterance device to speak, and the like.

In recent years, utterance devices such as robots that speak in response to a user's call or the like have become increasingly widespread and refined. For example, Patent Document 1 below describes distributing robot operation information from a broadcast station together with video information. It further describes that a receiver, which is a device separate from the television, outputs the video information to the television and outputs operation commands based on the operation information to the robot, so that the robot owned by the viewer interacts with the broadcast program.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2003-169305 (published June 13, 2003)

In the above prior art, the video information to be displayed on the television is first received by a receiver, which is a device separate from the television, before it is output on the television, and the video information is thereby synchronized with the operation of the robot. In other words, the television does not receive the broadcast video information directly; it receives it via the receiver.

With such a configuration, video output on the television is delayed compared with the case where no receiver is interposed. In addition, when the television channel is switched, the receiver must be made to switch the output video, which requires control different from ordinary channel switching. In other words, the above prior art has the problem that, in order to synchronize the video information displayed on the television with the operation of the robot, the receiver that controls the robot has to transmit the video to the television for output.

The present invention has been made in view of this problem, and its object is to provide an utterance control device and the like that can cause an utterance device to make an utterance corresponding to the content being output by an output device such as a television, without transmitting content such as video information to the output device.

In order to solve the above problem, an utterance control device according to one aspect of the present invention is an utterance control device that causes an utterance device having a voice utterance function to speak, and includes: content specifying means for specifying content that an output device, associated with the utterance device as a device used together with the utterance device, is receiving from a device other than the utterance control device and outputting; and utterance control means for causing the utterance device to make an utterance whose substance corresponds to the content specified by the content specifying means.

Further, in order to solve the above problem, an utterance device according to one aspect of the present invention is an utterance device having a voice utterance function, and includes: content specifying means for specifying content that an output device, associated with the utterance device as a device used together with the utterance device, is receiving and outputting; and utterance means for making an utterance whose substance corresponds to the content specified by the content specifying means.

Further, in order to solve the above problem, an utterance control system according to one aspect of the present invention is an utterance control system that causes an utterance device having a voice utterance function to speak, and includes: a content specifying device that specifies content that an output device, associated with the utterance device as a device used together with the utterance device, is receiving from a device other than an utterance control device that causes the utterance device to speak, and is outputting; the utterance control device, which causes the utterance device to make an utterance whose substance corresponds to the content specified by the content specifying device; and the utterance device.

Further, in order to solve the above problem, an utterance control method according to one aspect of the present invention is an utterance control method for causing an utterance device having a voice utterance function to speak, and includes: a content specifying step of specifying content that an output device, associated with the utterance device as a device used together with the utterance device, is receiving from a device other than an utterance control device that causes the utterance device to speak, and is outputting; and an utterance control step of causing the utterance device to make an utterance whose substance corresponds to the content specified in the content specifying step.

Further, in order to solve the above problem, a method for controlling an utterance device according to one aspect of the present invention is a method for controlling an utterance device having a voice utterance function, and includes: a content specifying step of specifying content that an output device, associated with the utterance device as a device used together with the utterance device, is receiving and outputting; and an utterance step of making an utterance whose substance corresponds to the content specified in the content specifying step.

According to each of the above aspects of the present invention, the utterance device can be made to make an utterance corresponding to the content being output by the output device, without the content being transmitted to the output device.

FIG. 1 is a block diagram showing the main configuration of an utterance control server and a dialogue device according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing an overview of a dialogue system including the utterance control server and the dialogue device.
FIG. 3 is a diagram showing an example of information for generating utterance content corresponding to a program.
FIG. 4 is a diagram showing an example of program-utterance content correspondence information.
FIG. 5 is a diagram showing an example of TV-viewing program correspondence information.
FIG. 6 is a diagram showing data used to identify the correspondence between the dialogue device and the program being output on the TV.
FIG. 7 is a diagram showing an example of utterance management information.
FIG. 8 is a flowchart showing an example of utterance content generation processing executed by the utterance control server.
FIG. 9 is a flowchart showing an example of viewing information transmission processing executed by the TV and viewing program registration processing executed by the utterance control server.
FIG. 10 is a flowchart showing an example of utterance control processing executed by the utterance control server and utterance processing executed by the dialogue device.
FIG. 11 is a diagram showing an overview of a dialogue system according to Embodiment 2 of the present invention.
FIG. 12 is a flowchart showing an example of utterance content generation processing executed by a program information collection server and utterance content registration processing executed by a TV information collection server.
FIG. 13 is a flowchart showing an example of utterance content transmission processing executed by the TV information collection server and utterance control processing executed by an utterance control server.
FIG. 14 is a diagram showing an overview of a dialogue system according to Embodiment 3 of the present invention.
FIG. 15 is a flowchart showing an example of viewing program notification processing executed by a TV information collection server, utterance content generation processing executed by a program information collection server, and utterance control processing executed by an utterance control server.
FIG. 16 is a diagram showing an overview of a dialogue system according to Embodiment 4 of the present invention.
FIG. 17 is a flowchart showing an example of viewing program registration processing and utterance control processing executed by an utterance control server.
FIG. 18 is a block diagram illustrating the configuration of a computer usable as each of the above servers or as the dialogue device.

[Embodiment 1]
Hereinafter, Embodiment 1 of the present invention will be described in detail with reference to FIGS. 1 to 10. First, an overview of the dialogue system according to this embodiment is given with reference to FIG. 2. FIG. 2 is a diagram showing an overview of the dialogue system (utterance control system) 5.

As illustrated, the dialogue system 5 includes an utterance control server (utterance control device) 1 and a dialogue device (utterance device) 3. The utterance control server 1 is a server that controls the dialogue device 3 so that it makes predetermined utterances, and the dialogue device 3 converses with the user by voice by speaking under the control of the utterance control server 1. The figure shows an example in which the dialogue device 3 is a self-propelled cleaning robot that automatically cleans floors, but the dialogue device 3 may have only the utterance function, or may have any function other than utterance. In other words, the dialogue device 3 only needs to have a function of speaking (outputting voice) under the control of the utterance control server 1.

The main feature of the dialogue system 5 is that the utterance control server 1 causes the dialogue device 3 to speak in accordance with the program being viewed on a TV (television receiver, output device) 500. More specifically, the utterance control server 1 receives, from the TV 500, viewing program information indicating the program that the TV 500 is outputting and a TVID for identifying the TV 500. The utterance control server 1 then notifies the dialogue device 3 that is associated in advance with the TV indicated by the received TVID of the utterance content corresponding to the program indicated by the received viewing program information, and causes that dialogue device 3 to speak. This makes it possible to cause the dialogue device 3 to speak in accordance with the program the user is viewing, without transmitting the video of the program or the like from the utterance control server 1 to the dialogue device 3.
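The flow described above (receive a TVID and viewing program information, look up the associated dialogue device, and hand it the matching utterance content) can be illustrated with a minimal Python sketch. The dictionaries, the ID values, the sample utterance text, and the function name `handle_viewing_report` are all hypothetical and are not taken from the specification.

```python
from typing import Optional, Tuple

# Hypothetical in-memory stand-ins for associations held by the server.
TV_TO_DEVICE = {"TV0001": "R0001"}  # TVID -> dialogue device ID
PROGRAM_TO_UTTERANCE = {"P0001": "This show is popular right now."}


def handle_viewing_report(tv_id: str, program_id: str) -> Optional[Tuple[str, str]]:
    """Return (dialogue device ID, utterance content) for a report from a TV.

    Returns None when the TV has no registered dialogue device or when no
    utterance content exists for the reported program.
    """
    device_id = TV_TO_DEVICE.get(tv_id)
    utterance = PROGRAM_TO_UTTERANCE.get(program_id)
    if device_id is None or utterance is None:
        return None
    return device_id, utterance
```

Note that only identifiers and text pass through this lookup; no video data is involved, which is the point the paragraph above makes.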

For simplicity, the figure shows one TV 500 and one dialogue device 3. However, even when there are multiple pairs of a dialogue device 3 and a TV 500, the utterance control server 1 can cause the dialogue device 3 corresponding to each TV 500 to speak in accordance with the program that TV 500 is outputting.

[Configuration details]
Next, the configuration of each device included in the dialogue system 5 is described in detail with reference to FIG. 1. FIG. 1 is a block diagram showing the main configuration of the utterance control server 1 and the dialogue device 3. The figure also shows the main configuration of the TV 500, which is not a component of the dialogue system 5.

[Main configuration of utterance control server 1]
As illustrated, the utterance control server 1 includes a server storage unit 10 that stores various data used by the utterance control server 1, and a server control unit 20 that performs overall control of the utterance control server 1. The utterance control server 1 also includes blocks such as a communication unit for controlling the dialogue device 3 and accessing information on the Internet, and an input unit for inputting data to the utterance control server 1, but these blocks are not illustrated.

The server storage unit 10 stores a program information table 11, an advertisement search information table 12, a program search information table 13, TV-viewing program correspondence information 14, program-utterance content correspondence information 15, TV-dialogue device correspondence information (correspondence information) 16, and utterance management information 17. The server control unit 20 includes a program information acquisition unit (sponsor specifying means) 21, a related information acquisition unit (related information acquisition means) 22, a viewing program information acquisition unit (content specifying means) 23, an utterance content generation unit (utterance content generation means) 24, an utterance target specifying unit (utterance target specifying means) 25, and an utterance control unit (utterance control means) 26.

The program information table 11, the advertisement search information table 12, and the program search information table 13 are information for generating utterance content corresponding to a program, and may be, for example, data such as that shown in FIG. 3. FIG. 3 is a diagram showing an example of information for generating utterance content corresponding to a program.

FIG. 3(a) shows an example of the program information table 11. The program information table 11 is a table in which a program ID identifying a program is associated with the start time and end time of that program. By referring to the program information table 11, the program currently being broadcast and its end time can be identified, so utterance content about a program before or after its broadcast can be generated, and utterance timing can be set according to the start and end times of the program. For example, utterance content that tells the user, five minutes before the broadcast start time, that the program will start in five minutes can be generated. It is also possible, for example, to have an utterance about a program made five minutes after its broadcast end time, or to set the expiration date of utterance content at or near the broadcast end time. If such utterance control is unnecessary, the start time and end time information may be omitted and only the program ID of the program being broadcast registered.
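As an illustration of how start and end times in a program information table could drive utterance timing, here is a small Python sketch. The table contents, the five-minute defaults, and the function names are assumptions made for the example; the specification only states that such timing can be derived from the table.

```python
from datetime import datetime, timedelta

# Hypothetical rows of a program information table:
# program ID -> (start time, end time).
PROGRAM_INFO = {
    "P0001": (datetime(2014, 1, 31, 19, 0), datetime(2014, 1, 31, 20, 0)),
}


def pre_start_notice_time(program_id, lead=timedelta(minutes=5)):
    """Time at which to announce that the program starts in `lead`."""
    start, _end = PROGRAM_INFO[program_id]
    return start - lead


def post_end_utterance_time(program_id, delay=timedelta(minutes=5)):
    """Time at which to speak about the program after it has ended."""
    _start, end = PROGRAM_INFO[program_id]
    return end + delay


def on_air(now):
    """Program IDs whose broadcast interval [start, end) contains `now`."""
    return [pid for pid, (start, end) in PROGRAM_INFO.items() if start <= now < end]
```

Treating the broadcast interval as half-open is a design choice of this sketch, so that a program is no longer considered on air at its exact end time.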

FIG. 3(b) shows an example of the program search information table 13. The program search information table 13 is a table in which a program ID identifying a program is associated with search keywords used to search for information related to that program. By referring to the program search information table 13, information related to each program (hereinafter also called program related information) can be found by keyword search.

FIG. 3(c) shows an example of the advertisement search information table 12. The advertisement search information table 12 is a table in which a program ID identifying a program, the genre of that program, and a sponsor ID indicating the sponsor of that program are associated with one another. By referring to the advertisement search information table 12, utterance content concerning an advertisement suited to each program or to its sponsor can be generated. For example, by using the genre contained in the advertisement search information table 12, an advertisement matching the genre of the program can be identified among the advertisements stored in advance in the utterance control server 1 or another device, and that advertisement can be used as the utterance content. Likewise, by using the sponsor ID contained in the advertisement search information table 12, the advertisement of the program's sponsor can be identified among the advertisements stored in advance in the utterance control server 1 or another device and used as the utterance content.

Furthermore, the utterance content can also be determined by using the advertisement search information table 12 together with an advertisement sponsor management table such as that shown in FIG. 3(d). The advertisement sponsor management table shown in FIG. 3(d) is a table in which a sponsor ID, the name of the sponsor identified by that ID, and utterance content are associated with one another. By referring to the advertisement search information table 12 and the advertisement sponsor management table, utterance content corresponding to the sponsor of a program can be generated.

For example, in FIG. 3(d), SP0001 is associated with the sponsor name "Sponsor 1" and the utterance content "This is an advertisement from Sponsor 1". This makes it possible to have the dialogue device 3 say "This is an advertisement from Sponsor 1" to the user of a TV 500 that is outputting a program sponsored by SP0001. The utterance content may be whatever the sponsor desires, for example publicity for the sponsor or for its products or services. If the times at which each sponsor's commercials are broadcast are known, the utterance can also be timed to coincide with those broadcast times. In addition, an advertisement that does not target a specific sponsor may be used as utterance content, as in the records in FIG. 3(d) whose sponsor name includes "generic".
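The two-table lookup described above (program ID to sponsor ID, then sponsor ID to utterance content) might be sketched in Python as follows. Only the SP0001 record mirrors the example in the text; the program ID, the genre value, and the function name `sponsor_utterance` are hypothetical.

```python
# Hypothetical advertisement search information: program ID -> genre, sponsor ID.
AD_SEARCH_INFO = {
    "P0001": {"genre": "drama", "sponsor_id": "SP0001"},
}

# Hypothetical advertisement sponsor management table:
# sponsor ID -> sponsor name and utterance content.
AD_SPONSORS = {
    "SP0001": {
        "name": "Sponsor 1",
        "utterance": "This is an advertisement from Sponsor 1",
    },
}


def sponsor_utterance(program_id):
    """Return the sponsor's utterance content for a program, or None."""
    entry = AD_SEARCH_INFO.get(program_id)
    if entry is None:
        return None
    sponsor = AD_SPONSORS.get(entry["sponsor_id"])
    return sponsor["utterance"] if sponsor else None
```

How a "generic" record would be selected when no sponsor matches is not specified in the text, so this sketch simply returns None in that case.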

When an advertisement is used as utterance content, the utterance content may promote goods or services that appear in the program. For example, utterance content may be generated that promotes a region or store appearing in the program, or clothing, accessories, or the like worn by a performer. In this case, a party wishing to promote such a region, store, clothing, or accessories may generate the utterance content and register it in, or transmit it to, the utterance control server 1 in association with the program.

The program-utterance content correspondence information 15 is information that associates utterance content generated by the utterance content generation unit 24 with the program to which that utterance content corresponds, and is generated by the utterance content generation unit 24. The program-utterance content correspondence information 15 may also include information specifying the timing at which the utterance content is to be spoken.

For example, the program-utterance content correspondence information 15 may be data such as that shown in FIG. 4. FIG. 4 is a diagram showing an example of the program-utterance content correspondence information 15. The illustrated program-utterance content correspondence information 15 is a table in which a program ID identifying a program, utterance content, and utterance timing are associated with one another, and each record is given a sequential number. This makes it possible to identify the utterance content corresponding to a program and the timing at which that utterance content should be spoken. In the figure, "immediate" indicates that the utterance is to be made as soon as speaking becomes possible, without waiting for a specific time.
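A table of this kind, including the "immediate" timing, could be queried as in the following Python sketch. The record values and the name `due_utterances` are illustrative assumptions, and representing "immediate" as `None` is a choice made for this example only.

```python
from datetime import datetime

IMMEDIATE = None  # assumed encoding of the "immediate" timing

# Hypothetical records: (number, program ID, utterance content, timing).
ROWS = [
    (1, "P0001", "Looks like a fun program.", IMMEDIATE),
    (2, "P0001", "The finale is coming up.", datetime(2014, 1, 31, 19, 50)),
    (3, "P0002", "The news is on.", IMMEDIATE),
]


def due_utterances(program_id, now):
    """Utterance content for a program that may be spoken at time `now`.

    A record is due if its timing is "immediate" or its time has arrived.
    """
    return [
        content
        for _no, pid, content, timing in ROWS
        if pid == program_id and (timing is IMMEDIATE or timing <= now)
    ]
```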

The TV-viewing program correspondence information 14 is information that associates a TV 500 with the program that TV 500 is outputting. The TV-viewing program correspondence information 14 is generated by the viewing program information acquisition unit 23 and is updated when the program output by the TV 500 is switched.

For example, the TV-viewing program correspondence information 14 may be data such as that shown in FIG. 5. FIG. 5 is a diagram showing an example of the TV-viewing program correspondence information 14. The illustrated TV-viewing program correspondence information 14 is a table in which a TVID identifying a TV 500 is associated with the program ID of the program being output by the TV 500 identified by that TVID, and each record is given a sequential number. Thus, even when there are multiple TVs 500, the program being output by each TV 500 can be identified.

The TV-dialogue device correspondence information 16 is information indicating the dialogue device 3 associated with a TV 500. The utterance control server 1 is a server that controls the utterances of the dialogue device 3, and can therefore identify the dialogue device 3 it controls. Accordingly, the TV-dialogue device correspondence information 16 can be generated by accepting, for example from the user of a dialogue device 3, registration of the TV 500 used in the same room as that dialogue device 3.

The TV-dialogue device correspondence information 16 may be, for example, data such as that shown in FIG. 6(a). FIG. 6 is a diagram showing data used to identify the correspondence between a dialogue device 3 and the program being output on a TV 500; (a) shows an example of the TV-dialogue device correspondence information 16, and (b) shows an example of dialogue device-viewing program correspondence information. The illustrated TV-dialogue device correspondence information 16 is a table in which a TVID identifying a TV 500 is associated with a dialogue device ID identifying a dialogue device 3, and each record is given a sequential number. Thus, even when there are multiple TVs 500, the dialogue device 3 corresponding to (used together with) each TV 500 can be identified.

Here, it is assumed that program-utterance content correspondence information 15 such as that shown in FIG. 4 is used, in which a program ID indicating the program being output by a TV 500 is associated with utterance content. For this reason, the TV-viewing program correspondence information 14 and the TV-dialogue device correspondence information 16 are used to identify the correspondence between the program being output by a TV 500 and the dialogue device 3 corresponding to that TV 500. Accordingly, instead of storing the TV-viewing program correspondence information 14, dialogue device-viewing program correspondence information such as that shown in FIG. 6(b) may be stored.
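Combining the two tables on the TVID yields the dialogue device-viewing program correspondence directly. The following Python sketch shows one way to do this; the dictionary contents and the function name `device_viewing_programs` are hypothetical.

```python
# Hypothetical TV-viewing program correspondence: TVID -> program ID.
TV_VIEWING = {"TV0001": "P0001", "TV0002": "P0002"}

# Hypothetical TV-dialogue device correspondence: TVID -> dialogue device ID.
TV_DEVICE = {"TV0001": "R0001", "TV0002": "R0002", "TV0003": "R0003"}


def device_viewing_programs(tv_viewing, tv_device):
    """Join the two tables on TVID: dialogue device ID -> program ID.

    TVs without a registered dialogue device are skipped, and devices whose
    TV has reported no program do not appear in the result.
    """
    return {
        tv_device[tv_id]: program_id
        for tv_id, program_id in tv_viewing.items()
        if tv_id in tv_device
    }
```

When a TV switches programs, updating its entry in the first table and recomputing the join keeps the device-side view current; storing the joined table instead is the alternative the text mentions.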

The utterance management information 17 is management information for causing the dialogue device 3 to speak, and is generated and updated by the utterance control unit 26. The utterance management information 17 may be, for example, data such as that shown in FIG. 7. FIG. 7 is a diagram showing an example of the utterance management information 17. The illustrated utterance management information 17 is a table in which a dialogue device ID identifying the dialogue device 3 that is to speak, utterance content, an expected response to that utterance content, a secondary utterance for that expected response, and an expiration date are associated with one another, and each record is given a sequential number.

An expected response is a user response to the utterance content that has been anticipated and registered in advance. A secondary utterance is the utterance content that the dialogue device 3 is made to speak when the user gives the expected response. "Generic" in the expected response and secondary utterance fields means that no specific expected response or secondary utterance has been set; after speaking utterance content whose expected response and secondary utterance are "generic", the dialogue device 3 gives a predetermined response according to what the user says. The expiration date indicates the deadline by which the utterance of the record is to be made; for a record whose utterance instruction has not been issued by that deadline, no utterance instruction is issued afterwards.

By referring to such utterance management information 17, each of a plurality of dialogue devices 3 can be made to speak in accordance with the program being output by the TV 500 associated with that dialogue device 3. In addition, when the user gives the expected response to a dialogue device 3, that dialogue device 3 can be made to make the secondary utterance.
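One record of utterance management information, with its expected response, secondary utterance, and expiration date, could be modelled as in the Python sketch below. The field names, the `instructed` flag, and the exact-match comparison of the user's reply are assumptions for illustration; the specification does not say how a reply is matched against the expected response.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

GENERIC = "generic"  # assumed marker for "no specific value set"


@dataclass
class UtteranceRecord:
    device_id: str
    content: str
    expected_response: str
    secondary_utterance: str
    expires_at: datetime
    instructed: bool = False  # whether an utterance instruction was issued


def may_instruct(record: UtteranceRecord, now: datetime) -> bool:
    """An instruction may be issued only once and only before expiry."""
    return not record.instructed and now <= record.expires_at


def secondary_for(record: UtteranceRecord, user_reply: str) -> Optional[str]:
    """Return the secondary utterance if the reply is the expected response.

    Exact string comparison is a simplification made for this sketch.
    """
    if record.expected_response == GENERIC:
        return None  # fall back to the device's predetermined responses
    if user_reply.strip() == record.expected_response:
        return record.secondary_utterance
    return None
```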

The program information acquisition unit 21 acquires program information indicating the programs currently being broadcast. For example, the program information acquisition unit 21 may identify the programs being broadcast at the current time from an EPG (Electronic Program Guide) and acquire them as the program information. The EPG may be obtained from broadcast waves, the Internet, or the like, or an administrator of the utterance control server 1 may enter it manually for the program information acquisition unit 21 to acquire. The program information acquisition unit 21 also generates a program information table 11 containing the acquired program information and stores it in the server storage unit 10. In addition, the program information acquisition unit 21 generates the advertisement search information table 12 and the program search information table 13 using information in the EPG or the like that indicates each program's performers, genre, and sponsors, and stores them in the server storage unit 10. These tables may instead be obtained from another device that generated them. The program information acquisition unit 21 may also acquire program information that includes programs not currently on air (programs that have already ended or that are yet to be broadcast).

The related information acquisition unit 22 acquires program related information for each program indicated in the program information acquired by the program information acquisition unit 21 by performing a predetermined search process. Specifically, the related information acquisition unit 22 acquires the program related information by searching a search site or the like with the search keywords contained in the program search information table 13. Because the EPG contains information related to a program, such as its performers, in addition to the program name, the program related information may also be acquired by searching the EPG. Of course, the program related information may be acquired from any information source.

The related information acquisition unit 22 also detects the occurrence of a predetermined event and notifies the utterance content generation unit 24 of it. An utterance content is registered in advance for each event, so that when an event occurs the dialogue device 3 can be made to speak content appropriate to that event. The occurrence of an event can be detected by determining whether a condition predetermined for that event is satisfied.

For example, in recent years comments about programs being broadcast have been posted on short-message posting sites and the like. On such sites, many comments with identical or similar content may be posted during a particular scene of a particular program. For example, it is known that a line from a scene of a well-known film is posted by many users at the moment that scene is broadcast. Accordingly, the posting of at least a predetermined number of a particular comment within a predetermined time on a short-message posting site may be detected as the above event. In this case, by registering that particular comment as the utterance content corresponding to the event, the dialogue device 3 can be made to speak in step with the postings on the short-message posting site. When such a particular comment is used as the utterance content, words that appear frequently may be extracted from the posted comments and an utterance content containing those words may be generated. This makes it possible to automatically generate utterance content that reflects what is being written in large numbers on the short-message posting site at that moment.
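The comment-based event detection above could be sketched as a sliding time window, as shown below. This is a minimal sketch under stated assumptions: the class `CommentBurstDetector`, the function `most_frequent_word`, substring matching for "the particular comment", timestamps given in seconds in non-decreasing order, and whitespace word splitting are all hypothetical choices for illustration (Japanese text would need a proper tokenizer instead of `split()`).

```python
from collections import Counter, deque
from typing import Deque, Iterable, Optional


class CommentBurstDetector:
    """Detects when a phrase is posted `threshold` times within `window_seconds`."""

    def __init__(self, phrase: str, threshold: int, window_seconds: float) -> None:
        self.phrase = phrase
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times: Deque[float] = deque()  # timestamps of matching comments

    def observe(self, text: str, timestamp: float) -> bool:
        """Record one comment; return True if the event condition now holds."""
        if self.phrase in text:
            self._times.append(timestamp)
        # Drop matches that have fallen out of the time window.
        while self._times and timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) >= self.threshold


def most_frequent_word(comments: Iterable[str]) -> Optional[str]:
    """Return the most frequent whitespace-separated word, or None if there is none."""
    counts = Counter(word for comment in comments for word in comment.split())
    return counts.most_common(1)[0][0] if counts else None
```

The word returned by `most_frequent_word` could then be embedded in a fixed sentence to form the utterance content.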

The elapse of a predetermined time from the broadcast start time of a program (or the remaining time until the broadcast end time reaching a predetermined value) may also be detected as an event. For example, for a program in which a particular line is spoken toward the end of the program, such an event can be set in advance, and when the occurrence of the event is detected (that is, when it is detected that the predetermined time has elapsed since the broadcast start time), the dialogue device 3 can be made to speak that line. With this configuration, it may be difficult to make the timing at which the line is spoken in the program coincide exactly with the timing at which the dialogue device 3 speaks it. To handle such cases, a response pointing out that the timing of the line was off (too early or too late) may be registered as the assumed response, and an utterance that, for example, apologizes for the poor timing may be registered as the secondary utterance.
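The two time-based conditions described above reduce to simple comparisons, sketched below. The function names `elapsed_event` and `remaining_event` and the use of `datetime`/`timedelta` values are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def elapsed_event(start: datetime, now: datetime, offset: timedelta) -> bool:
    """True once at least `offset` has passed since the broadcast start time."""
    return now - start >= offset


def remaining_event(end: datetime, now: datetime, remaining: timedelta) -> bool:
    """True while the program is still on air and at most `remaining` is left."""
    return timedelta(0) <= end - now <= remaining
```

Because such checks run on the server clock rather than on the broadcast itself, some drift from the actual scene is to be expected, which is the situation the assumed response and secondary utterance above are meant to cover.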

Information reflecting users' feelings about a program being broadcast is also collected by means of data broadcasting or information terminals such as smartphones. Such information may therefore be used to determine the utterance content. For example, when the number of data receptions from the information terminals held by viewers is used as an index of how lively the program is, the total number of data receptions reaching a predetermined count may be detected as the above event, and the dialogue device 3 may be made to speak about the liveliness of the program. The genre of the program can also be taken into account so that the utterance matches the liveliness of the program. For example, when the genre of the program is "comedy" or "variety", the dialogue device 3 can be made to utter laughter, and when the genre is "sports", it can be made to give an utterance related to the excitement of the match, such as "Good game, isn't it?". An utterance that depends on the number of data receptions during the program may also be made after the broadcast has ended or near the end time. For example, if the count is at or above a predetermined number, a positive utterance such as "That was a good game" or "That was fun" may be made, and if the count has not reached the predetermined number, a negative utterance such as "I wish they had tried a little harder" may be made.
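The genre-dependent and count-dependent selection above can be sketched as follows. This is an illustrative sketch only: the function names, the `LIVE_LINES` table, the lowercase genre keys, and the laughter string "Ha ha ha!" are assumptions, while the sports and closing phrases follow the examples in the text.

```python
from typing import Optional

# Hypothetical mapping from genre to an utterance made while the program is lively.
LIVE_LINES = {
    "comedy": "Ha ha ha!",   # laughter; the exact wording is an assumption
    "variety": "Ha ha ha!",
    "sports": "Good game, isn't it?",
}


def live_utterance(genre: str, receive_count: int, threshold: int) -> Optional[str]:
    """Return a genre-specific utterance once the reception count reaches the threshold."""
    if receive_count < threshold:
        return None  # event condition not met yet
    return LIVE_LINES.get(genre)


def closing_utterance(receive_count: int, threshold: int) -> str:
    """Return a positive or negative remark for the end of the program."""
    if receive_count >= threshold:
        return "That was a good game."
    return "I wish they had tried a little harder."
```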

The viewing program information acquisition unit 23 acquires viewing program information (a program ID) indicating the program that the user of the dialogue device 3 is watching. Specifically, the viewing program information acquisition unit 23 receives, from the TV 500 associated with the dialogue device 3, a TV ID and viewing program information indicating the program being output. The viewing program information acquisition unit 23 then associates the program ID identified from the received viewing program information with the received TV ID, and stores the pair in the server storage unit 10 as the TV-viewing program correspondence information 14.

The viewing program information is not particularly limited as long as it indicates the program that the TV 500 is outputting; here, an example is described in which it is information indicating the content of a user operation performed on the TV 500. The user operations envisaged are, for example, power ON/OFF operations, channel selection operations, and channel up (down) operations. By receiving viewing program information indicating such operations, the program being output by the TV 500 can be identified. The program ID may be anything that uniquely identifies a program; it may be assigned by the viewing program information acquisition unit 23, or an ID used in the EPG or the like may be reused. Alternatively, the TV 500 may itself identify the program being output and transmit information indicating the identified program (for example, a program ID) to the utterance control server 1.

The utterance content generation unit 24 generates an utterance content for each program indicated in the program information acquired by the program information acquisition unit 21, associates the generated utterance content with the program ID, and stores the pair in the server storage unit 10 as the program-utterance content correspondence information 15. The utterance content to be generated is not particularly limited as long as it relates to the program. For example, the utterance content may be generated by inserting the program related information acquired by the related information acquisition unit 22 into a fixed sentence, by using the advertisement sponsor management table described above, or by inserting the program title or performers into a fixed sentence. The utterance content may also be registered in advance in the utterance control server 1 or another device. For example, utterance contents may be registered in advance for each genre, and the utterance content matching the genre of the program may be acquired from among them. Utterance contents may likewise be registered for each program and used. Furthermore, the utterance content may concern a competing program, that is, a program airing at the same time on another channel. For example, an utterance content that introduces the competing program, such as "A program called XX is on another channel right now", may be generated as the utterance content for a given program. The competing program can be identified by referring to the EPG or the like.
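Two of the generation methods above, filling a fixed sentence and looking up a program airing at the same time on another channel, can be sketched as follows. This is a hedged illustration: `fill_template`, `competing_programs`, and the EPG entry keys `channel`, `title`, `start`, and `end` are hypothetical names, not the layout of any real EPG format.

```python
from datetime import datetime
from typing import Dict, List


def fill_template(template: str, **fields: str) -> str:
    """Insert program-related values into a fixed sentence."""
    return template.format(**fields)


def competing_programs(epg: List[Dict], channel: str, now: datetime) -> List[str]:
    """Titles of programs on other channels that are on air at `now`."""
    return [
        entry["title"]
        for entry in epg
        if entry["channel"] != channel and entry["start"] <= now < entry["end"]
    ]
```

A usage example: `fill_template("A program called {title} is on another channel right now.", title=name)` for each `name` returned by `competing_programs`.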

The utterance content generation unit 24 may further determine an utterance timing for each generated utterance content. The utterance timing may be set according to the start and end times of the program contained in the program information table 11. In the dialogue system 5, utterances are triggered by detecting the program being watched on the TV 500, so the utterance timing is basically "immediate". The determination of the utterance timing may therefore be omitted.

The utterance target specifying unit 25 identifies the dialogue devices 3 that are to speak the utterance content generated by the utterance content generation unit 24. Specifically, the utterance target specifying unit 25 refers to the program-utterance content correspondence information 15, extracts the records whose utterance timing has arrived, and identifies the program ID contained in each such record. It then refers to the TV-viewing program correspondence information 14 and the TV-dialogue device correspondence information 16 to identify the dialogue devices 3 corresponding to the identified program ID, and notifies the utterance control unit 26 of the list of identified dialogue devices 3.

The utterance control unit 26 causes the dialogue device 3 (the dialogue device 3 identified by the utterance target specifying unit 25) to speak the utterance content that the utterance content generation unit 24 generated in advance for the program that the TV 500 is outputting, as identified by the viewing program information acquisition unit 23. Specifically, the utterance control unit 26 generates the utterance management information 17 from the list of dialogue devices 3 notified by the utterance target specifying unit 25 and the program-utterance content correspondence information 15. It then generates utterance data (voice data) corresponding to the utterance content and, in accordance with the generated utterance management information 17, transmits the utterance data and an utterance instruction to the dialogue device 3.

[Configuration of the main part of the dialogue device 3]
As shown in FIG. 1, the dialogue device 3 includes a dialogue device control unit 30 that performs overall control of the dialogue device 3, a voice output unit 40 that outputs voice, a storage unit 41 that stores various data used by the dialogue device 3, and a voice input unit 42 that accepts voice input. The dialogue device control unit 30 includes an utterance unit (utterance means) 31, a response generation unit 32, and an analysis unit 33.

The utterance unit 31 causes the dialogue device 3 to speak by having the voice output unit 40 output the utterance data received from the utterance control server 1. When the utterance unit 31 has received, from the utterance control server 1, information indicating the content of an assumed response and the utterance data of a secondary utterance, it stores these data in the storage unit 41.

The response generation unit 32 performs control for responding when the user speaks to the dialogue device 3. For example, when what the user says is an assumed response, the response generation unit 32 controls the utterance unit 31 so that it speaks the utterance data of the secondary utterance corresponding to that assumed response. When the user says something that is not an assumed response, the response generation unit 32 generates a response appropriate to what was said and causes the utterance unit 31 to speak it.

The analysis unit 33 analyzes what the user says to the dialogue device 3. Specifically, the analysis unit 33 recognizes the voice input to the voice input unit 42 and notifies the response generation unit 32 of the recognition result.

[Configuration of the main part of the TV 500]
As illustrated, the TV 500 includes a viewing program information transmission unit 501. The TV 500 also includes a communication unit for transmitting information to the utterance control server 1 and various components for receiving and outputting TV programs, but these are omitted from FIG. 1.

The viewing program information transmission unit 501 transmits viewing program information indicating the program being output by the TV 500 to the utterance control server 1. Specifically, the viewing program information transmission unit 501 transmits viewing program information indicating the content of a user operation performed on the TV 500. As noted above, the viewing program information is not limited to this example.

[Flow of the utterance content generation process]
Next, the flow of the utterance content generation process is described with reference to FIG. 8. FIG. 8 is a flowchart illustrating an example of the utterance content generation process executed by the utterance control server 1. First, the program information acquisition unit 21 acquires the program information of the programs being broadcast (S1) and stores the acquired program information in the server storage unit 10 as the program information table 11. The program information acquisition unit 21 also generates the program search information table 13 and the advertisement search information table 12 and stores them in the server storage unit 10.

Next, the related information acquisition unit 22 searches for program related information using the stored program search information table 13 (S2) and notifies the utterance content generation unit 24 of the program related information found. The related information acquisition unit 22 also determines whether any event condition is satisfied (S3), and if there is an event whose condition is satisfied, notifies the utterance content generation unit 24 of that event.

The utterance content generation unit 24 then generates an utterance content for each program contained in the program information table 11, using the program related information notified by the related information acquisition unit 22 and the advertisement search information table 12 stored in the server storage unit 10 (S4). When an event whose condition is satisfied has been notified, the utterance content generation unit 24 generates the utterance content corresponding to that event. When an advertisement sponsor management table (see (d) of FIG. 3) has been stored in advance or acquired from outside, the utterance content generation unit 24 generates the utterance content using that advertisement sponsor management table together with the advertisement search information table 12.

Finally, the utterance content generation unit 24 associates the utterance content generated in S4 with information specifying the timing at which that content is to be spoken, and stores the pair in the server storage unit 10 as the program-utterance content correspondence information 15 (S5). This completes the utterance content generation process. Because the programs being broadcast change over time, the utterance control server 1 executes the above flow at predetermined intervals.

[Flow of the viewing information transmission process and the viewing program registration process]
Next, the flow of the viewing information transmission process and the viewing program registration process is described with reference to FIG. 9. FIG. 9 is a flowchart illustrating an example of the viewing information transmission process executed by the TV 500 and the viewing program registration process executed by the utterance control server 1.

In the viewing information transmission process, when the viewing program information transmission unit 501 of the TV 500 accepts a user operation on the TV 500 (S21), it transmits the viewing program information and the TV ID to the utterance control server 1 (S22). These steps are performed continuously while the TV 500 is operating. The viewing program information transmission unit 501 also transmits the viewing program information and the TV ID when it detects that no user operation has been detected for a predetermined period. In this case, it transmits the content of the most recently detected user operation as the viewing program information.

In the viewing program registration process, the viewing program information acquisition unit 23 of the utterance control server 1 receives the transmitted viewing program information and TV ID (S31, content specifying step). It then determines, from the received viewing program information, whether a TV program is being watched on the TV 500 identified by the TV ID (S32). Specifically, it determines that no TV program is being watched when it receives viewing program information indicating that the TV was switched off or that the output was switched to an external output other than a TV program. It may also determine that no TV program is being watched when no viewing program information and TV ID have been received from the TV 500 for a certain period or longer. In this way, it can determine that no TV program is being watched even when viewing has ended without any operation on the TV 500, for example because the power cord of the TV 500 was unplugged. The end time of the program may also be identified by referring to the program information table 11, and it may be determined that the program is not being watched if that time has passed.

When it is determined that no program is being watched (NO in S32), the viewing program information acquisition unit 23 deletes the record for the TV ID received in S31 from the TV-viewing program correspondence information 14 stored in the server storage unit 10 (S33). In other words, for a TV 500 that is not outputting a program, the association between the TV ID and the program is removed. The viewing program information acquisition unit 23 then waits to receive new viewing program information and a TV ID. If the TV-viewing program correspondence information 14 contains no record for the TV ID received in S31, the process of S33 is skipped.

On the other hand, when it is determined that a TV program is being watched (YES in S32), the viewing program information acquisition unit 23 reads the TV-viewing program correspondence information 14 from the server storage unit 10 (S34) and checks whether no record containing the TV ID received in S31 is registered (S35).

If it is confirmed that no such record is registered (YES in S35), the process proceeds to S37. If it is confirmed that such a record is registered (NO in S35), the viewing program information acquisition unit 23 deletes the record for the TV ID received in S31 from the TV-viewing program correspondence information 14 and then proceeds to S37.

In S37, the viewing program information acquisition unit 23 registers, in the TV-viewing program correspondence information 14, a record that associates the program ID of the program identified from the viewing program information received in S31 with the TV ID received in S31. The viewing program information acquisition unit 23 then waits to receive new viewing program information and a TV ID.
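The registration flow of S32 to S37 can be sketched with a plain dictionary standing in for the TV-viewing program correspondence information 14. This is a minimal sketch under assumptions: the function `register_viewing` is hypothetical, and it takes an already resolved program ID, with `None` meaning "no TV program is being watched"; resolving a program ID from the reported user operation is outside the sketch.

```python
from typing import Dict, Optional


def register_viewing(table: Dict[str, str], tv_id: str, program_id: Optional[str]) -> None:
    """Update the TV ID -> program ID table for one received report.

    `program_id` is None when no TV program is being watched (NO in S32).
    """
    if program_id is None:
        # Not viewing: remove the association if one exists; otherwise do nothing.
        table.pop(tv_id, None)
        return
    # Viewing: delete any existing record for this TV ID (NO in S35) ...
    table.pop(tv_id, None)
    # ... and register the new association (S37).
    table[tv_id] = program_id
```

With a dictionary the delete-then-register pair is equivalent to a single assignment; the two steps are kept separate here only to mirror the flowchart.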

[Flow of the utterance control process and the utterance process]
Next, the flow of the utterance control process and the utterance process is described with reference to FIG. 10. FIG. 10 is a flowchart illustrating an example of the utterance control process executed by the utterance control server 1 and the utterance process executed by the dialogue device 3.

In the utterance control process, the utterance target specifying unit 25 of the utterance control server 1 checks whether there is any utterance content that should be spoken (that the dialogue device 3 should be made to speak) (S51). Specifically, the utterance target specifying unit 25 refers to the program-utterance content correspondence information 15 stored by the utterance content generation unit 24 and makes this determination according to whether any utterance content has reached its utterance timing. Alternatively, a timer may be set for each generated utterance content, and the existence of utterance content to be spoken may be determined by referring to the timer.

When it is confirmed that there is utterance content to be spoken (YES in S51), the utterance target specifying unit 25 identifies, for each such utterance content, the dialogue devices 3 that are to speak it, obtains a list of the identified dialogue devices 3 (S52), and notifies the utterance control unit 26 of the list. Specifically, the utterance target specifying unit 25 identifies, from the program-utterance content correspondence information 15, the program ID corresponding to the utterance content that the dialogue device 3 should speak; identifies, from the TV-viewing program correspondence information 14, the TV IDs corresponding to that program ID; and identifies, from the TV-dialogue device correspondence information 16, the dialogue device IDs corresponding to those TV IDs. The dialogue device IDs identified in this way form the list of dialogue devices 3 that are to speak.
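The two-step lookup from program ID to TV IDs to dialogue device IDs can be sketched as follows. The function `devices_for_program` and the dictionary shapes are assumptions for illustration: `tv_to_program` stands in for the TV-viewing program correspondence information 14 and `tv_to_devices` for the TV-dialogue device correspondence information 16.

```python
from typing import Dict, List


def devices_for_program(
    program_id: str,
    tv_to_program: Dict[str, str],
    tv_to_devices: Dict[str, List[str]],
) -> List[str]:
    """Return the sorted dialogue device IDs whose associated TV is showing `program_id`."""
    return sorted(
        device_id
        for tv_id, watched in tv_to_program.items()
        if watched == program_id
        for device_id in tv_to_devices.get(tv_id, [])  # a TV may have no device
    )
```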

Next, on being notified of the list of dialogue devices 3 that are to speak (the list of dialogue device IDs), the utterance control unit 26 generates the utterance management information 17 using the notified list and the program-utterance content correspondence information 15. The utterance control unit 26 also generates utterance data (data that the dialogue device 3 can output as voice) for each utterance content contained in the utterance management information 17 (S53). The utterance control unit 26 then transmits the generated utterance data to the dialogue device 3 that is to speak it and instructs that device to speak using the utterance data (S54, utterance control step). When transmitting utterance data for an utterance content that has an assumed response and a secondary utterance set, information indicating the content of the assumed response and the utterance data of the secondary utterance may be transmitted together with it.

Next, the utterance process executed by the dialogue device 3 is described. On receiving an utterance instruction and utterance data from the utterance control server 1 (S61), the utterance unit 31 passes the received utterance data to the voice output unit 40 and has it output the data as voice. An utterance corresponding to the utterance data is thereby made (S62).

When the user replies to the utterance of S62, the voice of the reply is captured by the voice input unit 42 and analyzed by the analysis unit 33 (S63), and the analysis result (the content of the input voice) is notified to the response generation unit 32. On receiving this notification, the response generation unit 32 determines whether the content of the input voice is an assumed response (S64).

If it determines that the input is an assumed response (YES in S64), the response generation unit 32 notifies the utterance unit 31 of the utterance data of the secondary utterance corresponding to that assumed response and has the voice output unit 40 output it. If it determines that the input is not an assumed response (NO in S64), the response generation unit 32 makes a normal response (S66). Specifically, the response generation unit 32 generates utterance data according to the analysis result of the analysis unit 33, notifies the utterance unit 31 of that data, and has the voice output unit 40 output it. The above processing is performed each time an utterance instruction is received.
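The branch at S64 can be illustrated with a minimal Python sketch. The names `Utterance`, `choose_reply`, and `generate_normal_reply`, the exact-match comparison on normalized text, and the placeholder normal reply are all assumptions made for illustration; the description does not specify how the analysis result is compared with the assumed response.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Utterance:
    """Primary utterance plus optional assumed responses (hypothetical structure)."""
    text: str
    # Maps an assumed user response to the secondary utterance that answers it.
    secondary: Dict[str, str] = field(default_factory=dict)


def generate_normal_reply(recognized: str) -> str:
    # Placeholder for the normal response of S66; a real device would
    # build the reply from the analysis result.
    return f"You said: {recognized}"


def choose_reply(utterance: Utterance, recognized: str) -> Tuple[str, str]:
    """S64: decide between the secondary utterance and a normal response."""
    key = recognized.strip().lower()  # assumed normalization of recognized text
    if key in utterance.secondary:
        return ("secondary", utterance.secondary[key])
    return ("normal", generate_normal_reply(recognized))
```

For example, with `Utterance("Do you like this show?", {"yes": "Me too!"})`, a recognized reply of "Yes" selects the secondary utterance, while any other reply falls through to the normal response path.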

In the processing of S63 to S66, speech recognition and the generation of utterance data may instead be performed by the utterance control server 1. In that case, the response generation unit 32 and the analysis unit 33 are provided in the utterance control server 1, and the dialogue device 3 is provided with a voice data transmission unit that transmits the voice data acquired by the voice input unit 42 to the utterance control server 1.

[Embodiment 2]
Another embodiment of the present invention is described below with reference to FIGS. 11 to 13. For convenience of explanation, members having the same functions as those described in the above embodiment, and processes similar to those described in the above embodiment, are given the same reference numerals as in the above embodiment, and their description is omitted.

First, an overview of the dialogue system according to this embodiment is given with reference to FIG. 11. FIG. 11 is a diagram showing an overview of the dialogue system 205. As illustrated, the dialogue system (speech control system) 205 includes an utterance control server (speech control device) 201, a TV information collection server 202 (content identification device), a program information collection server 203, and the dialogue device 3. In the dialogue system 205 of this embodiment, some of the functions of the utterance control server 1 of the above embodiment are distributed to the TV information collection server 202 and the program information collection server 203.

The utterance control server 201 is a server that controls the dialogue device 3 to make it perform a given utterance. The utterance control server 201 includes the utterance target specifying unit 25 and the utterance control unit 26 of FIG. 1, and stores the TV-dialogue device correspondence information 16 in advance.

The TV information collection server 202 is a server that identifies the program that the TV 500 is receiving and outputting, and it receives viewing program information and a TV ID from the TV 500. The TV information collection server 202 includes the viewing program information acquisition unit 23 of FIG. 1. It also includes an utterance content acquisition unit that acquires the program-utterance content correspondence information 15 generated by the program information collection server 203, and an utterance timing control unit that determines the utterance timing for each utterance content included in the program-utterance content correspondence information 15.

The program information collection server 203 is a server that collects information about programs being broadcast and generates utterance content corresponding to those programs. The program information collection server 203 includes the program information acquisition unit 21, the related information acquisition unit 22, and the utterance content generation unit 24 of FIG. 1. The function of the utterance content generation unit 24 is largely the same as in the above embodiment, but differs in that, in this embodiment, it additionally has a function of transmitting the generated utterance content (specifically, the program-utterance content correspondence information 15).

As shown in FIG. 11, in the dialogue system 205 the TV information collection server 202 receives viewing program information and a TV ID from the TV 500. The TV information collection server 202 also receives, from the program information collection server 203, the utterance content corresponding to each program being broadcast together with the program ID corresponding to that utterance content. The TV information collection server 202 then associates the TV ID with the utterance content and transmits them to the utterance control server 201, and the utterance control server 201 instructs the dialogue device 3 associated with the received TV ID to utter the received utterance content. This makes it possible to have the dialogue device 3 utter content that matches the program the user is viewing, without transmitting the program's video or the like from the utterance control server 201 to the dialogue device 3.

[Flow of utterance content generation process and utterance content registration process]
Next, the flow of the utterance content generation process and the utterance content registration process is described with reference to FIG. 12. FIG. 12 is a flowchart showing an example of the utterance content generation process executed by the program information collection server 203 and the utterance content registration process executed by the TV information collection server 202.

First, the program information acquisition unit 21, the related information acquisition unit 22, and the utterance content generation unit 24 of the program information collection server 203 perform steps S1 to S5 in the same way as in FIG. 8, whereby the program-utterance content correspondence information 15 is registered. The utterance content generation unit 24 then transmits the program-utterance content correspondence information 15, in which each utterance content is associated with its program ID, to the TV information collection server 202 (S201). After that, the process waits for a predetermined fixed time (S202) and returns to S1. As a result, program-utterance content correspondence information 15 that follows the changes in the programs being broadcast is transmitted as needed.

In the utterance content registration process, the utterance content acquisition unit of the TV information collection server 202 receives the utterance content and program ID (specifically, the program-utterance content correspondence information 15) sent by the utterance content transmission unit (S211). The utterance timing control unit of the TV information collection server 202 then registers an utterance timer for each utterance content (S212). An utterance timer is a timer for determining the utterance timing, and it is registered according to the information indicating the utterance timing contained in the program-utterance content correspondence information 15. These steps are performed each time program-utterance content correspondence information 15 is received.
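The registration of utterance timers in S212 can be sketched as a small priority queue keyed by firing time. This is only an illustrative model: the class `UtteranceTimers`, the record fields `fire_at`, `program_id`, and `content`, and the polling-style `pop_fired` method are hypothetical and are not taken from the description, which does not say how the timers are implemented.

```python
import heapq
from typing import Dict, List, Tuple


class UtteranceTimers:
    """Hypothetical timer registry: entries fire once their time has been reached."""

    def __init__(self) -> None:
        # Heap of (fire_at, sequence, program_id, content); the sequence number
        # keeps ordering stable when two timers share the same firing time.
        self._heap: List[Tuple[float, int, str, str]] = []
        self._seq = 0

    def register(self, fire_at: float, program_id: str, content: str) -> None:
        heapq.heappush(self._heap, (fire_at, self._seq, program_id, content))
        self._seq += 1

    def pop_fired(self, now: float) -> List[Tuple[str, str]]:
        """Remove and return (program_id, content) for every timer due at `now`."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, _, program_id, content = heapq.heappop(self._heap)
            fired.append((program_id, content))
        return fired


def register_all(timers: UtteranceTimers, correspondence: List[Dict]) -> None:
    """S212: register one timer per utterance content in the received information."""
    for entry in correspondence:
        timers.register(entry["fire_at"], entry["program_id"], entry["content"])
```

Passing the current time explicitly to `pop_fired` keeps the sketch deterministic; a real server would more likely rely on a scheduler or event loop.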

[Flow of utterance content transmission process and utterance control process]
Next, the flow of the utterance content transmission process and the utterance control process is described with reference to FIG. 13. FIG. 13 is a flowchart showing an example of the utterance content transmission process executed by the TV information collection server 202 and the utterance control process executed by the utterance control server 201. Although not shown in the figure, the viewing program information acquisition unit 23 of the TV information collection server 202 stores the TV-viewing program correspondence information 14 by performing a viewing program registration process such as the one shown in FIG. 9.

The utterance content transmission process is performed each time an utterance timer registered in the utterance content registration process described above fires. When, for example, multiple utterance timers fire, the utterance content transmission processes corresponding to the respective timers are executed in parallel. On confirming that a registered utterance timer has fired (S221), the utterance timing control unit searches for the TVs 500 that correspond to the utterance content of the fired timer (S222). Specifically, the utterance timing control unit identifies, from the program-utterance content correspondence information 15, the program ID corresponding to the fired utterance timer, and searches the TV-viewing program correspondence information 14 for the TVs 500 associated with that program ID.

The utterance timing control unit then transmits the utterance content corresponding to the fired utterance timer, together with the list of TV IDs found in S222, to the utterance control server 201 (S223). The transmitted utterance content may include information indicating the assumed response and the content of the secondary utterance.
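Steps S222 and S223 amount to a lookup from program ID to the TVs currently showing that program. The following Python sketch models the TV-viewing program correspondence information 14 as a plain dictionary from TV ID to program ID; the function names, the message layout, and the choice to send nothing when no TV matches are assumptions for illustration only.

```python
from typing import Dict, List, Optional


def find_tv_ids(tv_viewing: Dict[str, Optional[str]], program_id: str) -> List[str]:
    """S222: TV IDs whose currently viewed program matches program_id."""
    return sorted(tv for tv, pid in tv_viewing.items() if pid == program_id)


def build_message(content: str, program_id: str,
                  tv_viewing: Dict[str, Optional[str]]) -> Optional[dict]:
    """S223: pair the utterance content with the list of target TV IDs.

    Returning None when no TV shows the program is an assumption of this
    sketch; the description does not say what happens in that case.
    """
    tv_ids = find_tv_ids(tv_viewing, program_id)
    if not tv_ids:
        return None
    return {"content": content, "tv_ids": tv_ids}
```

A TV with no program (value `None`) never matches, so it is excluded from every target list.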

In the utterance control process, the utterance control unit 26 of the utterance control server 201 receives the utterance content and the list of TV IDs from the TV information collection server 202 (S231), and notifies the utterance target specifying unit 25 of the received list of TV IDs. The utterance target specifying unit 25 then refers to the TV-dialogue device correspondence information 16, identifies the dialogue device ID corresponding to each TV ID in the notified list, and obtains a list of those dialogue device IDs (S52). After this, utterance data is generated (S53) and an utterance instruction is transmitted (S54) in the same way as in FIG. 10. The dialogue device 3 that receives the utterance instruction performs the same utterance process as in FIG. 10.
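The resolution of TV IDs to dialogue device IDs (S52) followed by the creation of per-device instructions (S53, S54) can be sketched as below. The dictionary form of the TV-dialogue device correspondence information 16, the possibility of several devices per TV, and the instruction layout are hypothetical choices made for this example.

```python
from typing import Dict, List


def resolve_devices(tv_ids: List[str],
                    tv_to_devices: Dict[str, List[str]]) -> List[str]:
    """S52: dialogue device IDs for the given TV IDs, without duplicates."""
    devices: List[str] = []
    for tv_id in tv_ids:
        for device_id in tv_to_devices.get(tv_id, []):
            if device_id not in devices:
                devices.append(device_id)
    return devices


def make_instructions(content: str, tv_ids: List[str],
                      tv_to_devices: Dict[str, List[str]]) -> List[dict]:
    """S53 and S54: one utterance instruction per target dialogue device.

    The utterance data is represented here by the raw content string; a real
    system would convert it into data the device can output as voice.
    """
    return [{"device_id": d, "utterance_data": content}
            for d in resolve_devices(tv_ids, tv_to_devices)]
```

TV IDs that have no registered dialogue device are silently skipped in this sketch.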

[Control of utterance timing]
In the example above, the TV information collection server 202 transmits utterance content according to utterance timers, thereby controlling when the dialogue device 3 utters, but the invention is not limited to this example. For instance, utterance content whose utterance timing is "immediate" may be transmitted to the utterance control server 201 without registering an utterance timer. Specifically, after S211 in FIG. 12, S212 may be omitted and the process may proceed to S222 in FIG. 13, with the utterance content and the target TV list (the list of TV IDs corresponding to the utterance content) transmitted in S223. Also, when the utterance content is an advertisement, having it uttered at fixed intervals can be expected to improve the advertising effect, so such content may be transmitted to the utterance control server 201 at fixed intervals (for example, every 15 minutes) to have it uttered.
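The three timing variants mentioned above (timer-based, immediate, and periodic for advertisements) can be compared in a short Python sketch. The `timing`, `at`, and `interval` fields and the `horizon` parameter are hypothetical; the description only states that such timings may be used, not how they are encoded.

```python
from typing import Dict, List


def plan_send_times(entry: Dict, now: float, horizon: float) -> List[float]:
    """Return the times in [now, now + horizon] at which content would be sent.

    entry["timing"] is assumed to be one of:
      "immediate" - send right away, with no utterance timer
      "at"        - send once at entry["at"]
      "every"     - send repeatedly every entry["interval"] seconds
    """
    kind = entry["timing"]
    if kind == "immediate":
        return [now]
    if kind == "at":
        at = entry["at"]
        return [at] if now <= at <= now + horizon else []
    if kind == "every":
        interval = entry["interval"]
        if interval <= 0:
            raise ValueError("interval must be positive")
        times = []
        k = 0
        while now + k * interval <= now + horizon:
            times.append(now + k * interval)
            k += 1
        return times
    raise ValueError(f"unknown timing: {kind}")
```

With an interval of 900 seconds (15 minutes) and a horizon of 30 minutes, the periodic case yields three send times, which matches the advertisement example in the text.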

The method of controlling the utterance timing is also not limited to the examples above. For instance, the program information collection server 203, which generates the utterance content, may control the utterance timing; in that case, the program information collection server 203 transmits to the TV information collection server 202 only those generated utterance contents whose utterance time has arrived. The utterance control server 201 or the dialogue device 3 may also control the utterance timing; in those cases, information indicating the utterance timing is transmitted to the utterance control server 201 or the dialogue device 3 beforehand.

[Embodiment 3]
Another embodiment of the present invention is described below with reference to FIGS. 14 and 15. For convenience of explanation, members having the same functions as those described in the above embodiments, and processes similar to those described in the above embodiments, are given the same reference numerals as in the above embodiments, and their description is omitted.

First, an overview of the dialogue system according to this embodiment is given with reference to FIG. 14. FIG. 14 is a diagram showing an overview of the dialogue system 305. As illustrated, the dialogue system (speech control system) 305 includes an utterance control server (speech control device) 301, a TV information collection server (content identification device) 302, a program information collection server 303, and the dialogue device 3. In the dialogue system 305 of this embodiment, some of the functions of the utterance control server 1 of the above embodiment are distributed to the TV information collection server 302 and the program information collection server 303.

The utterance control server 301 is a server that controls the dialogue device 3 to make it perform a given utterance. The utterance control server 301 includes the utterance target specifying unit 25 and the utterance control unit 26 of FIG. 1, and stores the TV-dialogue device correspondence information 16 in advance.

The TV information collection server 302 is a server that identifies the program that the TV 500 is receiving and outputting, and it receives viewing program information and a TV ID from the TV 500. The TV information collection server 302 includes the viewing program information acquisition unit 23 and the program information acquisition unit 21 of FIG. 1. In addition to the functions described in the above embodiment, the viewing program information acquisition unit 23 and the program information acquisition unit 21 have a function of transmitting the information they acquire to the program information collection server 303.

The program information collection server 303 is a server that collects information about the programs notified from the TV information collection server 302 and generates utterance content corresponding to those programs. The program information collection server 303 includes the program information acquisition unit 21, the related information acquisition unit 22, and the utterance content generation unit 24 of FIG. 1. The program information collection server 303 also includes a viewing state specifying unit that determines whether a program is being viewed on the TV 500.

As shown in FIG. 14, in the dialogue system 305 the TV information collection server 302 receives viewing program information and a TV ID from the TV 500, associates the received TV ID with the program ID identified from the received viewing program information, and transmits them to the program information collection server 303. Next, the program information collection server 303 generates utterance content corresponding to the program identified by the received program ID, associates the generated utterance content with the TV ID, and transmits them to the utterance control server 301. The utterance control server 301 then instructs the dialogue device 3 associated with the received TV ID to utter the received utterance content. This makes it possible to have the dialogue device 3 utter content that matches the program the user is viewing, without transmitting the program's video or the like from the utterance control server 301 to the dialogue device 3.

[Flow of viewing program notification process, utterance content generation process, and utterance control process]
Next, the flow of the viewing program notification process, the utterance content generation process, and the utterance control process is described with reference to FIG. 15. FIG. 15 is a flowchart showing an example of the viewing program notification process executed by the TV information collection server 302, the utterance content generation process executed by the program information collection server 303, and the utterance control process executed by the utterance control server 301.

First, in the viewing program notification process, the viewing program information acquisition unit 23 of the TV information collection server 302 receives the viewing program information and TV ID transmitted from the TV 500 (S301, content identification step) and generates the TV-viewing program correspondence information 14. When the program ID cannot be identified, for example when the viewing program information indicates that the TV 500 has been powered off or has been switched to an external output other than a broadcast program, only the TV ID may be registered in the TV-viewing program correspondence information 14. This makes it possible to identify TVs 500 on which no program is being viewed.

Next, the program information acquisition unit 21 acquires the program information of the program indicated by the viewing program information received in S301 (S302), and generates the program information table 11 from the acquired program information. The program information acquisition unit 21 also generates the program search information table 13. The advertisement search information table 12 may be generated as well.

The program information acquisition unit 21 then transmits the generated program information table 11 and program search information table 13 to the program information collection server 303, and the viewing program information acquisition unit 23 transmits the TV-viewing program correspondence information 14 to the program information collection server 303 (S303).

Next, in the utterance content generation process, the viewing state specifying unit of the program information collection server 303 receives the program information table 11, the program search information table 13, and the TV-viewing program correspondence information 14 (S311). The viewing state specifying unit then refers to the received TV-viewing program correspondence information 14 and determines whether a program is being viewed on each TV 500 (S312). Specifically, for a TV 500 whose TV ID has no program ID associated with it in the TV-viewing program correspondence information 14, it determines that no program is being viewed. Even for a TV ID that has a program ID associated with it, it may determine that the program is not being viewed if the end time of that program has passed.
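The viewing state check of S312 reduces to two conditions: a program ID must be associated with the TV ID, and, optionally, the program must not have ended. The Python sketch below models this with plain dictionaries; the function names and the `end_times` table are assumptions for illustration and do not reflect the actual layout of the program information table 11.

```python
from datetime import datetime
from typing import Dict, List, Optional


def is_viewing(program_id: Optional[str],
               end_times: Dict[str, datetime],
               now: datetime) -> bool:
    """S312: True if the TV is considered to be showing a program right now."""
    if program_id is None:
        return False  # TV ID registered without a program ID
    end = end_times.get(program_id)
    if end is not None and now > end:
        return False  # the program's end time has already passed
    return True


def viewing_tv_ids(tv_viewing: Dict[str, Optional[str]],
                   end_times: Dict[str, datetime],
                   now: datetime) -> List[str]:
    """TV IDs for which utterance content should be generated."""
    return sorted(tv for tv, pid in tv_viewing.items()
                  if is_viewing(pid, end_times, now))
```

A program with no known end time is treated as still being viewed in this sketch, which is one possible reading of the optional end-time check.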

For a TV 500 determined not to be in use for viewing (NO in S312), the process ends without generating utterance content. When there is a TV 500 determined to be in use for viewing (YES in S312), the viewing state specifying unit notifies the related information acquisition unit 22 of the program ID corresponding to that TV 500. Next, the related information acquisition unit 22 searches for program-related information on the program identified by the notified program ID, using the program search information table 13 received in S311 (S2), and notifies the utterance content generation unit 24 of the program-related information it finds.

Next, the utterance content generation unit 24 uses the notified program-related information to generate utterance content corresponding to each program being output by each TV 500 determined in S312 to be in use for viewing (S4). As in the utterance content generation process of Embodiment 1, utterance content corresponding to events and utterance content relating to advertisements may also be generated.

The utterance content generation unit 24 then registers the utterance content generated in S4 in association with information specifying the timing at which that content is to be uttered (S5). It then associates a TV ID with each utterance content and transmits them to the utterance control server 301 (S313). The TV ID corresponding to an utterance content is identified by referring to the TV-viewing program correspondence information 14 received in S311.

Next, in the utterance control process, the utterance control unit 26 of the utterance control server 301 receives the utterance content and TV ID (S321) and notifies the utterance target specifying unit 25 of the received TV ID. The utterance target specifying unit 25 then refers to the TV-dialogue device correspondence information 16 and identifies the dialogue device ID corresponding to the notified TV ID (S52). After this, utterance data is generated (S53) and an utterance instruction is transmitted (S54) in the same way as in FIG. 10. The dialogue device 3 that receives the utterance instruction performs the same utterance process as in FIG. 10.

In the illustrated example, S312 and the subsequent steps are triggered by S311, but as long as a given program continues to be viewed, utterance content corresponding to that program may be generated continuously. That is, the process may transition to S312 after S313. This makes it possible to detect program-related information that follows the progress of the program and to generate utterance content that is closer to real time.

Also, if NO is determined in S312 after utterance content has been transmitted for a TV 500, the utterance control server 301 may be notified so that the utterance based on that content is cancelled. This prevents utterances about a program that is not being viewed.

[Embodiment 4]
Another embodiment of the present invention is described below with reference to FIGS. 16 and 17. For convenience of explanation, members having the same functions as those described in the above embodiments, and processes similar to those described in the above embodiments, are given the same reference numerals as in the above embodiments, and their description is omitted.

First, an overview of the dialogue system according to this embodiment is given with reference to FIG. 16. FIG. 16 is a diagram showing an overview of the dialogue system (speech control system) 405. As illustrated, the dialogue system 405 includes an utterance control server (speech control device) 401, a TV information collection server (content identification device) 402, a program information collection server 403, and the dialogue device 3. In the dialogue system 405 of this embodiment, some of the functions of the utterance control server 1 of the above embodiment are distributed to the TV information collection server 402 and the program information collection server 403.

The utterance control server 401 is a server that controls the dialogue device 3 to make it perform a given utterance. The utterance control server 401 includes the utterance target specifying unit 25 and the utterance control unit 26 of FIG. 1, and stores the TV-dialogue device correspondence information 16 in advance.

The TV information collection server 402 is a server that identifies the program that the TV 500 is receiving and outputting, and it receives viewing program information and a TV ID from the TV 500. The TV information collection server 402 includes the viewing program information acquisition unit 23 of FIG. 1.

The program information collection server 403 is a server that generates utterance content corresponding to each program being broadcast. The program information collection server 403 includes the program information acquisition unit 21, the related information acquisition unit 22, and the utterance content generation unit 24 of FIG. 1.

As shown in FIG. 16, in the dialogue system 405 the TV information collection server 402 receives viewing program information and a TV ID from the TV 500, associates the received TV ID with the program ID identified from the received viewing program information, and transmits them to the utterance control server 401. The program information collection server 403 generates utterance content corresponding to each program being broadcast, associates the generated utterance content with the program ID, and transmits them to the utterance control server 401. The utterance control server 401 then notifies the utterance content received from the program information collection server 403 to the dialogue device 3 that is associated in advance with a TV 500 outputting the program corresponding to that content, and has the device utter it. This makes it possible to have the dialogue device 3 utter content that matches the program the user is viewing, without transmitting the program's video or the like from the utterance control server 401 to the dialogue device 3.
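In this embodiment the utterance control server 401 effectively joins two inputs on the program ID: the TV ID to program ID pairs from the TV information collection server 402 and the program ID to utterance content pairs from the program information collection server 403. The following Python sketch shows one way such a join could look; the function name and the dictionary representations of the correspondence information 14 and 16 are hypothetical.

```python
from typing import Dict, List, Optional


def route_utterance(program_id: str,
                    content: str,
                    tv_viewing: Dict[str, Optional[str]],
                    tv_to_devices: Dict[str, List[str]]) -> Dict[str, str]:
    """Return {dialogue device ID: content} for devices whose TV shows program_id.

    tv_viewing    : TV ID -> program ID currently output (None if nothing viewed)
    tv_to_devices : TV ID -> dialogue device IDs registered for that TV
    """
    routed: Dict[str, str] = {}
    for tv_id, viewed in tv_viewing.items():
        if viewed != program_id:
            continue  # this TV is showing a different program, or none
        for device_id in tv_to_devices.get(tv_id, []):
            routed[device_id] = content
    return routed
```

Only identifiers and utterance content pass through this function, which reflects the point made above that no program video needs to reach the dialogue device.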

[Flow of viewing program notification processing and utterance control processing]
Next, the flow of the viewing program notification processing and the utterance control processing is described with reference to FIG. 17. FIG. 17 is a flowchart showing an example of the viewing program registration processing and the utterance control processing executed by the utterance control server 401. Although not shown in the figure, the TV information collection server 402 executes processing similar to the viewing program notification processing of FIG. 15 and thereby transmits TV-viewing program correspondence information 14, which associates program IDs with TV IDs, to the utterance control server 401. In this embodiment, however, the program information table 11 and the program search information table 13 are not transmitted. The program information collection server 403 executes the utterance content generation processing of FIG. 12 and thereby transmits program-utterance content correspondence information 15, which indicates the correspondence between utterance content and program IDs, to the utterance control server 401.

In the viewing program registration processing, the utterance control unit 26 of the utterance control server 401 receives the TV-viewing program correspondence information 14 from the TV information collection server 402 (S401) and notifies the utterance target specifying unit 25 of the TV IDs included in it. The utterance target specifying unit 25 then refers to the TV-dialogue device correspondence information 16 stored in advance in the utterance control server 401 and identifies the dialogue device ID corresponding to each notified TV ID (S402).

The utterance target specifying unit 25 then associates the dialogue device ID identified in S402 with the program ID included in the TV-viewing program correspondence information 14 received in S401, and registers the pair as dialogue device-viewing program correspondence information (see (b) of FIG. 6) (S403). This processing is performed each time the TV-viewing program correspondence information 14 is received from the TV information collection server 402.
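The registration steps S401 to S403 can be sketched roughly as follows. This is a minimal illustration, not the implementation described in the patent: the function name, the dictionary-based tables, and the sample IDs are all assumptions, and skipping TV IDs with no registered dialogue device is a design choice made here for simplicity.

```python
from typing import Dict, Iterable, Tuple


def register_viewing_programs(
    tv_program_pairs: Iterable[Tuple[str, str]],
    tv_to_device: Dict[str, str],
    device_to_program: Dict[str, str],
) -> Dict[str, str]:
    """Update a dialogue device ID -> program ID table (hypothetical helper)."""
    # S401: tv_program_pairs stands in for TV-viewing program correspondence information 14.
    for tv_id, program_id in tv_program_pairs:
        # S402: tv_to_device stands in for TV-dialogue device correspondence information 16.
        device_id = tv_to_device.get(tv_id)
        if device_id is None:
            continue  # assumption: TVs without a paired dialogue device are ignored
        # S403: register (or overwrite) the program the paired TV is showing.
        device_to_program[device_id] = program_id
    return device_to_program


# Hypothetical sample data.
pairing = {"tv-001": "dev-A", "tv-002": "dev-B"}
table = register_viewing_programs(
    [("tv-001", "prog-10"), ("tv-002", "prog-20"), ("tv-999", "prog-30")],
    pairing,
    {},
)
print(table)
```

Because the table is overwritten on every call, running the function each time new correspondence information arrives keeps it in step with channel changes.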

Next, in the utterance control processing, when the utterance control unit 26 receives the program-utterance content correspondence information 15 from the program information collection server 403 (S411), it passes this information to the utterance target specifying unit 25. The utterance target specifying unit 25 identifies, from the utterance content included in the program-utterance content correspondence information 15, the utterance content that the dialogue device 3 should utter, for example by referring to information indicating the utterance timing, and further identifies the program ID corresponding to the identified utterance content.

The utterance target specifying unit 25 then searches the dialogue device-viewing program correspondence information registered in S403 for the dialogue device ID associated with the identified program ID (S412), thereby identifying the dialogue device 3 to be controlled. Thereafter, as in FIG. 10, utterance data is generated (S53) and an utterance instruction is transmitted (S54). The dialogue device 3 that receives the utterance instruction performs the same utterance processing as in FIG. 10.
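The selection of due utterances and the lookup in S412 might look like the sketch below. The `Utterance` record, its `speak_at` field, and the time window are hypothetical; the text only says that information indicating the utterance timing is consulted, without fixing its format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Utterance:
    program_id: str  # program the utterance belongs to
    text: str        # utterance content
    speak_at: int    # hypothetical timing field, in seconds from program start


def select_targets(
    utterances: List[Utterance],
    device_to_program: Dict[str, str],
    now: int,
    window: int = 5,
) -> List[Tuple[str, str]]:
    """Return (dialogue device ID, text) pairs for utterances due within the window."""
    targets: List[Tuple[str, str]] = []
    for utterance in utterances:
        # Pick only utterances whose timing falls in [now, now + window).
        if not (now <= utterance.speak_at < now + window):
            continue
        # S412: find every dialogue device whose paired TV shows this program.
        for device_id, program_id in sorted(device_to_program.items()):
            if program_id == utterance.program_id:
                targets.append((device_id, utterance.text))
    return targets
```

Calling `select_targets(received, table, now)` with the table built during registration yields the devices to instruct; utterances for programs nobody is watching produce no entries.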

The processing of S53 may be performed before the processing of S412 or in parallel with it. If it is performed after S412, however, utterance data generation can be narrowed down to the utterance content for which an utterance instruction is issued in S54. This removes the need to generate utterance data for all of the utterance content received in S411 at once, so the load of the utterance data generation processing can be distributed.
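The benefit of running S53 after S412 can be illustrated with a small sketch in which utterance data is synthesized only for programs that at least one dialogue device is paired with. The `synthesize` callable is a placeholder for whatever speech synthesis is actually used, and all names here are assumptions.

```python
from typing import Callable, Dict, List, Tuple


def build_instructions(
    utterances: List[Tuple[str, str]],
    device_to_program: Dict[str, str],
    synthesize: Callable[[str], bytes],
) -> List[Tuple[str, bytes]]:
    """Generate utterance data only for content that will actually be sent."""
    watched = set(device_to_program.values())
    instructions: List[Tuple[str, bytes]] = []
    for program_id, text in utterances:
        if program_id not in watched:
            continue  # no target device found in S412, so S53 is skipped
        data = synthesize(text)  # S53, performed after S412
        for device_id, watched_id in sorted(device_to_program.items()):
            if watched_id == program_id:
                instructions.append((device_id, data))  # S54: one instruction per device
    return instructions


# Hypothetical stand-in for a speech synthesizer that records each call.
calls: List[str] = []


def fake_synthesize(text: str) -> bytes:
    calls.append(text)
    return text.encode("utf-8")


result = build_instructions(
    [("p1", "a"), ("p2", "b"), ("p3", "c")],
    {"dev-A": "p1", "dev-B": "p1"},
    fake_synthesize,
)
print(len(result), calls)
```

In this example three utterances are received but the synthesizer runs once, since only program `p1` is being watched; the same data is reused for both paired devices.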

[About content]
Each of the above embodiments shows an example in which an utterance is made according to the broadcast program that the TV 500 is receiving and outputting. However, the content only needs to be transmitted from a device other than the utterance control server and is not limited to broadcast programs. Likewise, the output device that outputs the content is not limited to the TV 500. For example, utterances may be made according to content distributed via a communication network such as the Internet rather than by broadcast (for example, on-demand content), content on a network such as a LAN, or content without video such as radio. The output device may also be a portable communication terminal such as a smartphone or tablet.

[Modification of the dialogue device]
In each of the above embodiments, the utterance content of the dialogue device 3 is controlled by the utterance control server, but the utterance content may instead be determined in the dialogue device 3. In this case, the dialogue device includes content specifying means that identifies the program being output by the TV 500, and the utterance unit 31 makes an utterance according to the program identified by that means. The control method of the dialogue device in this example includes a content specifying step of identifying the program that the TV 500 associated with the dialogue device is receiving and outputting, and an utterance step of uttering content according to the content identified in the content specifying step. In this configuration, each server described in the above embodiments can be omitted, and the configuration by which the dialogue device 3 communicates with a server can be omitted as well.

The method by which the content specifying means identifies the program is not particularly limited. For example, it may acquire information indicating the program from the TV information collection server 202 or the like described in the above embodiments. Alternatively, the dialogue device may be provided with output device operating means for operating the TV 500, and the program may be identified using information indicating the operations performed by the output device operating means. For example, the output device operating means may perform operations on the TV 500 such as power ON/OFF and channel switching, and the program can be identified by storing information indicating those operations. With this configuration, an utterance can also be made according to a program output by an ordinary TV that lacks the viewing program information transmission unit 501 shown in FIG. 1. The utterance content corresponding to the program may be generated by the dialogue device, or it may be generated by the program information collection server 203 or the like described in the above embodiments.
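One way to identify the program from stored operations is to replay the operation log to recover the TV's state and then consult a schedule. The sketch below assumes a simple log format and an hourly schedule keyed by channel; neither is specified in the text, and the dialogue device is assumed to be the only thing operating the TV.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log entry: ("power_on", None), ("power_off", None) or ("channel", 4).
Operation = Tuple[str, Optional[int]]


def current_channel(operations: List[Operation]) -> Optional[int]:
    """Replay stored operations; return the channel, or None if off or unknown."""
    powered = False
    channel: Optional[int] = None
    for name, value in operations:
        if name == "power_on":
            powered = True
        elif name == "power_off":
            powered = False
        elif name == "channel" and value is not None:
            channel = value
    return channel if powered else None


def identify_program(
    operations: List[Operation],
    schedule: Dict[Tuple[int, int], str],
    hour: int,
) -> Optional[str]:
    """Map (channel, hour) to a program ID using a hypothetical schedule table."""
    channel = current_channel(operations)
    if channel is None:
        return None
    return schedule.get((channel, hour))
```

For instance, with the log `[("power_on", None), ("channel", 4), ("channel", 6)]` and a schedule containing `(6, 20): "prog-news"`, a lookup at hour 20 gives `"prog-news"`. A user changing the channel with a separate remote would make the log stale, which is a limitation of this simplified sketch.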

[Embodiment 5]
Each block of the utterance control servers 1, 201, 301, 401, the TV information collection servers 202, 302, 402, the program information collection servers 203, 303, 403, and the dialogue device 3 may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit). In the latter case, each of the above servers or the dialogue device 3 can be configured using a computer (electronic computer) such as the one shown in FIG. 18. FIG. 18 is a block diagram illustrating the configuration of a computer 100 usable as each of the above servers or the dialogue device 3.

As shown in FIG. 18, the computer 100 includes an arithmetic device 120, a main storage device 130, an auxiliary storage device 140, and an input/output interface 150, which are connected to one another via a bus 110. The arithmetic device 120, the main storage device 130, and the auxiliary storage device 140 may be, for example, a CPU, a RAM (random access memory), and a hard disk drive, respectively. The main storage device 130 only needs to be a computer-readable "non-transitory tangible medium"; for example, a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used.

An input device 200 and an output device 300 are connected to the input/output interface 150. The input device 200 and the output device 300 of each server receive data transmitted from another server or the TV 500 and transmit data to another server or the dialogue device 3. The input device 200 and the output device 300 of the dialogue device 3 receive data from the utterance control servers 1, 201, 301, 401, utter to the user, acquire the user's voice, and so on.

The auxiliary storage device 140 stores various programs for causing the computer 100 to operate as each of the above servers or the dialogue device 3. The arithmetic device 120 loads these programs from the auxiliary storage device 140 into the main storage device 130 and executes the instructions contained in the loaded programs, thereby causing the computer 100 to function as each unit of the above servers or the dialogue device 3.

The configuration described here causes the computer 100 to function using the programs recorded in the auxiliary storage device 140, which is an internal recording medium, but a program recorded on an external recording medium may be used instead. The program may also be supplied to the computer via any transmission medium capable of transmitting it (such as a communication network or broadcast waves). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

[Summary]
An utterance control device (utterance control servers 1, 401) according to aspect 1 of the present invention is an utterance control device that causes an utterance device (dialogue device 3) having a voice utterance function to utter, and includes: content specifying means (viewing program information acquisition unit 23) that identifies the content (program) that an output device (TV 500), associated with the utterance device as a device used together with the utterance device, is receiving from a device other than the utterance control device and outputting; and utterance control means (utterance control unit 26) that causes the utterance device to utter content according to the content identified by the content specifying means.

With the above configuration, the content that the output device, associated with the utterance device as a device used together with it, is receiving from a device other than the utterance control device and outputting is identified, and the utterance device is made to utter content according to the identified content. The utterance control device can therefore cause the utterance device to make an utterance according to the content being output by the output device without transmitting the content to the output device. In other words, in Patent Document 1 described in [Background Art], video information had to be transmitted to the television from the receiver that controls the robot's operation, whereas with the above configuration the utterance control device that causes the utterance device to utter does not need to transmit the content to the output device.

Also, in Patent Document 1, once both the operation information and the video information are present at the receiver, the robot's operation is controlled based on the operation information and the video information is transmitted to the television. In contrast, the above configuration of the present invention does not require the utterance content and the content to be prepared together in advance, because it identifies the content that the output device is outputting. By identifying that content, utterance content can also be generated based on it while the content is being output, and the utterance device can be made to utter that content. Thus, even when the output device is outputting content for which utterance content is difficult to generate in advance, such as a live broadcast program, an utterance generated in substantially real time during the broadcast and suited to the scene being output at that moment can be made.

The relationship between the utterance device and the output device meant by "an output device used together with the utterance device" is one in which the utterance device is likely to be able to utter to a user who is viewing the content output by the output device. That is, regardless of whether the output device communicates or cooperates with the utterance device, it is "an output device used together with the utterance device" if it is likely to be used in the same room as the utterance device.

In an utterance control device according to aspect 2 of the present invention, in aspect 1, the content specifying means identifies the content using information that is transmitted by the output device and indicates the content output by the output device, or information indicating the operations performed on the output device. According to this aspect, when the output device transmits information indicating the content it outputs, that information can be used to identify the content the output device is outputting. Likewise, if information indicating the operations performed on the output device is acquired, the content the output device is outputting can be identified from it.

In an utterance control device according to aspect 3 of the present invention, in aspect 1 or 2, there are a plurality of utterance devices to be controlled; the utterance control device includes utterance target specifying means (utterance target specifying unit 25) that refers to correspondence information (TV-utterance device correspondence information 16) indicating the association between output devices and utterance devices and identifies, among the plurality of utterance devices, the utterance device associated with the output device whose output content has been identified by the content specifying means; and the utterance control means causes the utterance device identified by the utterance target specifying means to utter. According to this aspect, a plurality of utterance devices can be controlled so that each makes an utterance according to the content output by its associated output device.

An utterance control device according to aspect 4 of the present invention, in any of aspects 1 to 3, includes related information acquisition means (related information acquisition unit 22) that acquires related information (program related information) related to the content identified by the content specifying means, and utterance content generation means (utterance content generation unit 24) that generates utterance content including the related information acquired by the related information acquisition means. According to this aspect, utterance content according to the content can be generated automatically and uttered by the utterance device.

An utterance control device according to aspect 5 of the present invention, in any of aspects 1 to 4, includes sponsor specifying means (program information acquisition unit 21) that identifies the sponsor of the content identified by the content specifying means, and utterance content generation means that generates utterance content according to the sponsor identified by the sponsor specifying means. According to this aspect, for sponsored content, utterance content according to the sponsor can be generated automatically and uttered by the utterance device. For example, an advertisement for the sponsor can be uttered.
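As a rough illustration of aspects 4 and 5, utterance content could be assembled from a program title, optional related information, and an optional sponsor name. The template wording, the function name, and the parameters below are purely hypothetical; the text does not prescribe how the content is composed.

```python
from typing import Optional


def generate_utterance(
    title: str,
    related_info: Optional[str] = None,
    sponsor: Optional[str] = None,
) -> str:
    """Compose utterance text from hypothetical templates."""
    parts = [f"You are watching {title}."]
    if related_info:
        # Aspect 4: include related information acquired for the program.
        parts.append(f"Did you know? {related_info}")
    if sponsor:
        # Aspect 5: add content that depends on the identified sponsor.
        parts.append(f"This program is brought to you by {sponsor}.")
    return " ".join(parts)


print(generate_utterance("Evening News", "Tonight covers the local election.", "ExampleCo"))
```

A real system would likely choose among many templates and sources of related information; a single fixed template is used here only to keep the sketch short.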

An utterance device (dialogue device 3) according to aspect 6 of the present invention is an utterance device having a voice utterance function, and includes: content specifying means that identifies the content that an output device, associated with the utterance device as a device used together with the utterance device, is receiving and outputting; and utterance means (utterance unit 31) that utters content according to the content identified by the content specifying means.

With the above configuration, the content being output by the output device, associated with the utterance device as a device used together with it, is identified, and the utterance device utters content according to the identified content. An utterance according to the content being output by the output device can therefore be made without transmitting the content to the output device.

An utterance device according to aspect 7 of the present invention, in aspect 6, includes output device operating means for operating the output device, and the content specifying means identifies the content using information indicating the operations performed by the output device operating means. With this configuration, the output device can be operated via the utterance device. Because the content being output by the output device is identified using the information indicating those operations, the output device does not need a function for reporting the content it is outputting. In other words, even when an output device without such a function is used, such as an ordinary television receiver, an utterance according to the content that output device is outputting can be made.

An utterance control system (dialogue systems 205, 305, 405) according to aspect 8 of the present invention is an utterance control system that causes an utterance device having a voice utterance function to utter, and includes: a content specifying device (TV information collection servers 202, 302, 402) that identifies the content that an output device, associated with the utterance device as a device used together with it, is receiving and outputting from a device other than the utterance control device (utterance control servers 201, 301, 401) that causes the utterance device to utter; the utterance control device, which causes the utterance device to utter content according to the content identified by the content specifying device; and the utterance device. This aspect provides the same effects as aspect 1.

An utterance control method according to aspect 9 of the present invention is an utterance control method for causing an utterance device having a voice utterance function to utter, and includes: a content specifying step (S31, S301, S401) of identifying the content that an output device, associated with the utterance device as a device used together with it, is receiving and outputting from a device other than the utterance control device that causes the utterance device to utter; and an utterance control step (S54) of causing the utterance device to utter content according to the content identified in the content specifying step. This method therefore provides the same effects as aspect 1.

A control method for an utterance device according to aspect 10 of the present invention is a control method for an utterance device having a voice utterance function, and includes: a content specifying step of identifying the content that an output device, associated with the utterance device as a device used together with it, is receiving and outputting; and an utterance step of uttering content according to the content identified in the content specifying step. This method therefore provides the same effects as aspect 6.

The utterance control device and the utterance device according to each aspect of the present invention may be realized by a computer. In that case, a control program for the utterance control device and the utterance device that realizes them on a computer by causing the computer to operate as each of the means they include, and a computer-readable recording medium on which the program is recorded, also fall within the scope of the present invention.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the embodiments.

The present invention can be used for an utterance device that utters to a user, and for the control of such a device.

1, 201, 301, 401 Utterance control server (utterance control device)
16 TV-dialogue device correspondence information (correspondence information)
21 Program information acquisition unit (sponsor specifying means)
22 Related information acquisition unit (related information acquisition means)
23 Viewing program information acquisition unit (content specifying means)
24 Utterance content generation unit (utterance content generation means)
25 Utterance target specifying unit (utterance target specifying means)
26 Utterance control unit (utterance control means)
3 Dialogue device (utterance device)
31 Utterance unit (utterance means)
5, 205, 305, 405 Dialogue system (utterance control system)
202, 302, 402 TV information collection server (content specifying device)
500 TV (output device)

Claims

An utterance control device that causes an utterance device having a voice utterance function to utter, comprising:
content specifying means for identifying content that an output device, associated with the utterance device as a device used together with the utterance device, is receiving and outputting from a device other than the utterance control device; and
utterance control means for causing the utterance device to utter, at the timing specified by information that specifies an utterance timing, content according to the content identified by the content specifying means.

The utterance control device according to claim 1, wherein the content specifying means identifies the content using information that is transmitted from the output device and indicates the content output by the output device, or information indicating an operation performed on the output device.

An utterance control device that causes an utterance device having a voice utterance function to utter, comprising:
content specifying means for identifying content that an output device, associated with the utterance device as a device used together with the utterance device, is receiving and outputting from a device other than the utterance control device; and
utterance control means for causing the utterance device to utter content according to the content identified by the content specifying means,
wherein there are a plurality of utterance devices to be controlled,
the utterance control device further comprises utterance target specifying means for referring to correspondence information indicating the association between the output device and the utterance device and identifying, among the plurality of utterance devices, the utterance device associated with the output device whose output content has been identified by the content specifying means, and
the utterance control means causes the utterance device identified by the utterance target specifying means to utter.

Related information acquisition means for acquiring related information related to the content identified by the content specifying means; and
utterance content generation means for generating, as the utterance content according to the content, utterance content including the related information acquired by the related information acquisition means, are provided. The utterance control device described.

Sponsor specifying means for identifying the sponsor of the content identified by the content specifying means; and
utterance content generation means for generating, as the utterance content according to the content, utterance content according to the sponsor identified by the sponsor specifying means, are provided. Utterance control device.

An utterance device having a voice utterance function, the utterance device comprising:
content specifying means for specifying content received and output by an output device associated with the utterance device as a device used together with the utterance device; and
utterance means for uttering, in accordance with information specifying a timing of utterance, utterance content corresponding to the content specified by the content specifying means at the specified timing.
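The timing-controlled utterance in this claim can be sketched as follows. The sketch assumes that the information specifying the timing is an offset in seconds from the start of the content; the claim does not state the form of that information, so `TimedUtterance`, `offset_s`, and `due_utterances` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedUtterance:
    offset_s: float  # assumed timing information: seconds from the start of the content
    text: str        # utterance content corresponding to the specified content


def due_utterances(
    schedule: List[TimedUtterance], last_pos_s: float, now_pos_s: float
) -> List[str]:
    """Return the texts whose timing falls in the interval (last_pos_s, now_pos_s]."""
    return [u.text for u in schedule if last_pos_s < u.offset_s <= now_pos_s]


# Hypothetical schedule for one piece of content.
schedule = [
    TimedUtterance(10.0, "This scene looks exciting."),
    TimedUtterance(65.0, "The program is about to end."),
]
```

Polling with a half-open interval keeps each utterance from being emitted twice when the playback position is checked repeatedly.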

The utterance device according to claim 6, further comprising output device operating means for operating the output device,
wherein the content specifying means specifies the content using information indicating the details of an operation performed by the output device operating means.
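One way to read this claim is that the utterance device, having operated the output device itself, can infer what is being output from the operation it performed. The sketch below assumes the operation is a channel selection and that a program guide keyed by channel and hour is available; the guide, its entries, and `specify_content` are hypothetical and are not taken from the claims.

```python
from typing import Dict, Optional, Tuple

# Hypothetical program guide: (channel, hour of day) -> content title.
PROGRAM_GUIDE: Dict[Tuple[int, int], str] = {
    (5, 19): "Evening News",
    (8, 19): "Quiz Show",
}


def specify_content(
    operation: Dict[str, int], hour: int, guide: Dict[Tuple[int, int], str]
) -> Optional[str]:
    """Specify the content from the details of an operation on the output device.

    `operation` is assumed to carry a "channel" entry when a channel was selected.
    Returns None when the operation does not identify a channel or the guide
    has no entry for that channel and hour.
    """
    channel = operation.get("channel")
    if channel is None:
        return None
    return guide.get((channel, hour))
```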

An utterance control system that causes an utterance device having a voice utterance function to utter, the utterance control system comprising:
a content specifying device that specifies content that an output device, associated with the utterance device as a device used together with the utterance device, receives from a device other than an utterance control device that causes the utterance device to utter, and outputs;
the utterance control device, which causes the utterance device to utter, in accordance with information specifying a timing of utterance, utterance content corresponding to the content specified by the content specifying device at the specified timing; and
the utterance device.

An utterance control method for causing an utterance device having a voice utterance function to utter, the utterance control method comprising:
a content specifying step of specifying content that an output device, associated with the utterance device as a device used together with the utterance device, receives from a device other than an utterance control device that causes the utterance device to utter, and outputs; and
an utterance control step of causing the utterance device to utter, in accordance with information specifying a timing of utterance, utterance content corresponding to the content specified in the content specifying step at the specified timing.

A method of controlling an utterance device having a voice utterance function, the method comprising:
a content specifying step of specifying content received and output by an output device associated with the utterance device as a device used together with the utterance device; and
an utterance step of uttering, in accordance with information specifying a timing of utterance, utterance content corresponding to the content specified in the content specifying step at the specified timing.

A control program for causing a computer to function as the utterance control device according to any one of claims 1 to 5, wherein the control program causes the computer to function as each of the means.

A control program for causing a computer to function as the utterance device according to claim 6 or 7, wherein the control program causes the computer to function as each of the means.