JP6266330B2

JP6266330B2 - Remote operation system and user terminal and viewing device thereof

Info

Publication number: JP6266330B2
Application number: JP2013258475A
Authority: JP
Inventors: 剣明呉; 恒夫加藤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2013-12-13
Filing date: 2013-12-13
Publication date: 2018-01-24
Anticipated expiration: 2033-12-13
Also published as: JP2015115879A

Description

The present invention relates to a system in which a viewer's mobile terminal (user terminal) remotely operates a viewing device such as a TV, a set-top box (STB), a car navigation system, or a digital photo frame, and to the user terminal and viewing device of that system. In particular, it relates to a remote operation system that uses a character-interactive UI so that devices can be operated remotely in a unified manner without the user being aware of differences between the target devices, and to the user terminal and viewing device of that system.

Infrared remote controls are widely used to operate viewing devices such as televisions remotely. However, an infrared remote control may respond poorly when its emitter is not pointed at the receiver of the viewing device, when strong light such as a fluorescent lamp falls on the receiver, or when an obstacle lies between the remote control and the viewing device. In addition, operation becomes more complicated as remote controls gain functions, and button positions and operating methods are not standardized across viewing devices, which can confuse users who operate several devices.

Meanwhile, viewing devices have increasingly been equipped with Wi-Fi and Bluetooth (registered trademark) in recent years, making it possible for them to work together with user terminals such as smartphones and tablets.

Patent Document 1 discloses a technique that recognizes speech uttered by a user and converts it into a control code for a viewing device.

Patent Document 2 discloses a playback technique using a mobile remote control, in which the Bluetooth (registered trademark) communication system is used to synchronize the content playback time between a mobile phone and a viewing device.

Patent Document 3 discloses a technique in which screen areas of a viewing device are stored in association with mobile phones, so that when a mobile phone connects wirelessly to the viewing device, the screen area assigned to it can be operated from the mobile remote control.

Patent Document 4 discloses a universal remote control technique that can be used with different devices, such as televisions, video players, Mac computers, and tablets, without difficult operations.

Patent Document 1: JP 2006-350221 A
Patent Document 2: JP 2009-43309 A
Patent Document 3: JP 2009-27485 A
Patent Document 4: United States Patent Application No. 20120019371

In Patent Document 1, a viewing device can be controlled remotely by speaking commands such as power ON/OFF, play, and fast-forward to the remote control, but the user has to remember which operations are available on which screen and which voice command to speak.

In Patent Documents 2 and 3, a viewing device can be operated remotely from an application over a wireless communication system such as Wi-Fi or Bluetooth (registered trademark), but it is difficult to realize a UI that is unified and easy to operate regardless of which viewing device is used.

Patent Document 4 is a technique from Apple, whose strength is integrated development of everything from hardware to software and the operating system. As with Patent Documents 2 and 3, however, it is not easy to realize a UI that is unified and easy to operate regardless of which viewing device is used. In practice, the UIs of the iPhone (registered trademark) and iPad (registered trademark) versions of the remote control for Apple TV differ in many ways, and some users with low IT literacy have reported being confused.

Furthermore, each of the above prior arts stays within the scope of remote control operation. They leave open issues such as how a variety of devices can be operated in a unified and simple way, and how functions and content can be recommended across different viewing devices based on the user's lifestyle and preferences.

A first object of the present invention is to provide a remote operation system in which a single character that virtually converses with the user moves between the displays of the devices in response to switching of the operation target device, and remote operations can be requested through dialogue with the character, so that the user can operate each device remotely by a unified method without being aware of differences between viewing devices.

A second object of the present invention is to provide a remote operation system, and a user terminal and viewing device thereof, in which the user profile registered in the user terminal is reflected in the content of each viewing device's response to a remote operation from the user, so that each viewing device can respond in a way suited to the user's preferences and lifestyle.

To achieve the above objects, the present invention is a remote operation system in which a character is moved on the displays from a user terminal to the viewing device to be operated and remote operation is performed through a character-interactive UI, characterized by the following configuration.

(1) The user terminal comprises semantic understanding means for understanding the content of the user's utterance, and means for providing the utterance content to the viewing terminal.

The user terminal and the viewing device each comprise wireless communication means for wirelessly connecting the terminals to each other, dialogue response means for determining response content based on the content of the user's utterance, voice response means for outputting a voice message based on the response content, first animation control means for controlling the animation of the character on the display according to the response content, and second animation control means for making the character jump out of the user terminal and jump in to the viewing device to be operated. The viewing device further comprises control means for controlling the viewing service based on the response content.

(2) The user terminal comprises means for storing a user profile, and the dialogue response means determines the response content with the user profile reflected in it.

(3) The user terminal provides the user profile to the viewing terminal into which the character has jumped.

(4) The dialogue response means of the viewing terminal determines the response content with the user profile provided from the user terminal reflected in it.

According to the present invention, the following effects are achieved.

(1) A single character that virtually converses with the user moves between the displays of the devices in response to switching of the viewing device to be operated, and appears on the display of the operation target device. The user can therefore request remote operations through dialogue with the character shown on the display, whichever viewing device is in use. As a result, the user can operate each device remotely by a unified method without being made aware of differences between viewing devices.

(2) When the character moves from the user terminal to a viewing device, a profile including the user's preferences is also passed to the viewing device, so each viewing device can determine its response content with the user profile reflected in the requested remote operation.

For example, when a remote operation to switch the TV channel being watched is detected and a profile indicating that the user likes sports broadcasts has been obtained, the device can propose, or give priority to, switching to a sports program.
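The profile-driven channel suggestion described above can be sketched as follows. This is a minimal illustration only: the `Program` record, the genre labels, and the `suggest_channel` function are hypothetical names introduced here, and the specification does not define how the program guide or the profile is represented.

```python
from dataclasses import dataclass


@dataclass
class Program:
    channel: int
    title: str
    genre: str  # hypothetical genre label; not defined in the specification


def suggest_channel(programs, preferred_genres, current_channel):
    """Return the first program on another channel whose genre matches
    the user's profile, or None when nothing matches."""
    for program in programs:
        if program.channel != current_channel and program.genre in preferred_genres:
            return program
    return None


# Illustrative program guide and profile (made-up data).
guide = [
    Program(1, "Evening News", "news"),
    Program(4, "Pro Baseball", "sports"),
    Program(6, "Drama Hour", "drama"),
]
choice = suggest_channel(guide, {"sports"}, current_channel=1)
```

A real system would presumably rank several candidates rather than take the first match; the sketch only shows where the profile enters the decision.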

FIG. 1 is a diagram schematically showing an outline of the present invention.
FIG. 2 is a functional block diagram of a remote operation system to which the present invention is applied.
FIG. 3 is a sequence flow showing the operation of the main parts of FIG. 2 during the remote operation of FIG. 1.

Embodiments of the present invention are described in detail below with reference to the drawings. First, with reference to the schematic diagram of FIG. 1, an outline is given of how the character-interactive UI of the present invention is used to operate a viewing device remotely through dialogue, in conjunction with the user terminal 1.

When a remote operation application (hereinafter, the remote app) is started on the user terminal 1 (assumed here to be a smartphone), a character modeled on a woman is superimposed on the terminal display, as shown in FIG. 1(a). The remote app searches the TV program listings based on pre-registered profile information such as the user's interests and preferences. When it finds a program that matches them, it synthesizes a voice message such as "XX (user name), it's time for the professional baseball broadcast you like" and has the character appear to speak it.

If the user then says "Turn it on!", "TV ON", "I want to watch this program", or the like, the speech is picked up by the microphone of the user terminal 1 and passed to the voice recognition unit 103, and the recognition result is passed to the semantic understanding unit 104. The semantic understanding unit 104 recognizes the utterance as a remote operation request to switch on the viewing device 2 (assumed here to be a TV), so the user terminal 1 generates a remote control signal for switching on the TV 2 and transmits it to the TV 2.

Because the remote app knows the channel of the professional baseball broadcast it recommended to the user, it generates both a "switch-on" control signal and a "channel designation" control signal suited to the TV 2 and transmits them to the TV 2.

In the TV 2, as shown in FIG. 1(b), the power is switched on in response to these control signals, the channel is switched to the designated channel, and a menu screen including the professional baseball broadcast is displayed.

The character then jumps out of the display of the user terminal 1 and jumps in to the display of the TV 2, and in synchronization with this movement the control target switches from the user terminal 1 to the TV 2. At this point the user profile is also provided from the user terminal 1 to the TV 2.

If the user then says, for example, "They're losing. What else is on?", the speech is picked up by the microphone of the user terminal 1, voice recognition is performed, and the recognition result is transferred to the TV 2. The TV 2 determines from the recognition result that this is a request to recommend another program, so it consults the program guide to determine, based on the provided user profile, whether another program matching the user's interests and preferences is being broadcast.

When it finds that a soccer match is being broadcast on another channel, the start time "7:30" and the content "Japan national team match" are obtained from the program guide and synthesized into speech, as shown in FIG. 1(c), and the character speaks a voice message such as "The Japan national soccer team match is on from 7:30".

If the user replies to this voice message with, for example, "Switch to that", the speech is picked up by the microphone of the user terminal 1, voice recognition is performed, and the recognition result is transferred to the TV 2. The TV 2 recognizes from the recognition result that the switch to the soccer broadcast has been approved, and changes to that channel. As a result, the display of the TV 2 shows the soccer broadcast in place of the baseball broadcast, as shown in FIG. 1(d).

Later, when the soccer match is over and the end time of the TV program approaches, the character appears again, as shown in FIG. 1(e). Even while a TV program is playing, the character appears if the user calls it by saying its name or nickname. If the user then says, for example, "Tweet 'We did it! Congratulations!'", the speech is picked up by the microphone of the user terminal 1, voice recognition is performed, and the recognition result is transferred to the TV 2.

The TV 2 recognizes from the recognition result that this is a tweet request. To return the operation target from the TV 2 to the user terminal 1, the character jumps out of the display of the TV 2 and at the same time jumps in to the display of the user terminal 1.

On the user terminal 1, a tweeting application is started, and the message is recognized, converted to text, and entered in the message input field of that application. When text entry is complete, the character is displayed together with the entered text, as shown in FIG. 1(f), and speaks a voice message such as "Is this OK?" to obtain approval of the entered text.

If the user answers this question by voice with, for example, "OK", the speech is picked up by the microphone of the user terminal 1 and recognized, and if it is judged to be an approval, the tweet is sent to a predetermined address.

To switch the TV off, the user says something like "Turn off the TV", "TV OFF", or "I'm tired, so I'm going to bed now". The speech is picked up by the microphone of the user terminal 1 and passed to the voice recognition unit 103, and the recognition result is passed to the semantic understanding unit 104. The semantic understanding unit 104 recognizes the utterance as a remote operation request to switch off the viewing device 2, so the user terminal 1 generates a remote control signal for switching off the TV 2 and transmits it to the TV 2.
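The mapping from recognized text to a power-on or power-off request can be illustrated with a simple phrase table. This is a sketch under stated assumptions: the specification does not say how the semantic understanding unit 104 works internally, so the substring matching, the English phrases, and the intent names below are all hypothetical.

```python
POWER_ON = "power_on"    # hypothetical intent labels
POWER_OFF = "power_off"

# English renderings of the example utterances; real input would be Japanese.
PHRASES = {
    "turn it on": POWER_ON,
    "tv on": POWER_ON,
    "i want to watch this program": POWER_ON,
    "turn off the tv": POWER_OFF,
    "tv off": POWER_OFF,
    "going to bed": POWER_OFF,
}


def understand(utterance):
    """Return the intent whose phrase occurs in the recognized text, else None."""
    text = utterance.lower()
    for phrase, intent in PHRASES.items():
        if phrase in text:
            return intent
    return None
```

An indirect utterance such as "I'm tired, so I'm going to bed now" is mapped to power-off only because its key phrase is listed; a practical system would need something more robust than substring matching.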

In this way, to operate and coordinate several kinds of information devices, including the user terminal, in an integrated manner, the present invention adopts a character-interactive UI in which a single animated character that virtually converses with the user moves between the displays of the devices, together with various information, in response to switching of the operation target device. First, this lets the user easily recognize which device is currently selected for remote operation. Second, it realizes unified operability in which the user is not made aware of differences between the operation target devices.

FIG. 2 is a block diagram showing the configuration of the main parts of a viewing device control system according to one embodiment of the present invention; components not needed to explain the invention are omitted. This embodiment focuses on an STB as the viewing device. The TV 2 is connected to the STB 3, the TV 2 provides the display function, and the STB 3 provides the viewing device functions other than display. The description takes as an example the case where the STB 3 is operated remotely through dialogue in conjunction with the user terminal 1.

In the user terminal 1, the user profile storage unit 101 stores, as the user profile, a terminal ID unique to the user terminal (such as a MAC address or mobile phone number), together with user attribute information such as name, age, gender, hobbies, tastes, favorite programs, and the names of favorite actors.

The wireless communication unit 102 establishes a wireless connection, for example over Wi-Fi or Bluetooth (registered trademark), with the wireless communication unit 301 of the STB 3. It wirelessly transmits to the STB 3 the text obtained by understanding the user's utterance, the terminal ID unique to the user terminal, profile information including the user's name, age, and preferences, the execution data of the character-interactive UI, and the like.

The voice recognition unit 103 and the semantic understanding unit 104 recognize the terminal user's speech picked up by a microphone (not shown) and understand the user's request from the utterance content. The dialogue response PF 105 actively asks the terminal user questions and generates replies to requests from the terminal user. Small-talk dialogue patterns from the terminal user's daily life and a state transition table are registered in the dialogue response PF 105 in advance.

The character display unit 106 and the voice synthesis unit 107 realize human-like, natural conversation through animated display of the personified character and speech synthesis. The voice synthesis unit 107 also has the function of converting text, such as replies generated by the dialogue response PF 105, into speech.

The character display unit 106 includes a first animation control unit 106a that controls the animation of the character on the display according to the response content, and a second animation control unit 106b that makes the character jump out and jump in.

In the STB 3, after the character has moved from the user terminal 1 to the STB 3, the dialogue response PF 302 actively asks the terminal user questions and generates replies to requests from the terminal user.

As on the user terminal side, small-talk dialogue patterns from the terminal user's daily life and a state transition table are registered in the dialogue response PF 302. In addition, control codes for device operations of the STB 3 (channel switching, volume adjustment, application launch, and so on) are registered in association with viewing requests analyzed from the dialogue with the user.

The character display unit 303 and the voice synthesis unit 304 realize human-like, natural conversation through animated display of the personified character and speech synthesis. The voice synthesis unit 304 also has the function of converting text, such as replies generated by the dialogue response PF 302, into speech.

The program search unit 305 identifies the user terminal 1 and checks the user attributes of that terminal (terminal ID, name, age, preference information) against the rating information (viewing restriction information) of each content item. It determines whether the user satisfies the rating of the content requested for viewing and, if so, permits playback of that content, for example by the VOD (Video On Demand) service unit 306. Ratings include R20, which prohibits viewing by persons under 20, R18, which prohibits viewing by persons under 18, and R15, which prohibits viewing by persons under 15. The application unit 307 manages applications provided by third parties, such as YouTube (registered trademark), karaoke, and dictionaries. The control unit 308 controls the viewing service based on the remote operation.
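The age check against the rating information can be sketched as below. The minimum ages follow the R15, R18, and R20 definitions given above; the function name `may_view` and the use of `None` for unrated content are assumptions made for illustration.

```python
# Minimum viewer age for each rating described in the text.
RATING_MIN_AGE = {"R15": 15, "R18": 18, "R20": 20}


def may_view(age, rating):
    """Return True when a user of the given age satisfies the content rating.

    rating is None for unrated content (an assumption of this sketch).
    """
    if rating is None:
        return True
    return age >= RATING_MIN_AGE[rating]
```

In the described system, the age would come from the user attributes sent by the user terminal 1, and playback would be permitted only when the check succeeds.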

Next, the character animation performed by the character display units 106 and 303 is described.

In this embodiment, each device has the same execution framework for character display, speech synthesis, and dialogue response. To move the character and present information efficiently and continuously, it is enough to transfer only the 3D model file, the motion files, and the dialogue text files needed to run the character. Because these transferred data are in text format, transmission and reception delays are also small.

In this embodiment, the 3D model file and the motion files use the format of Miku Miku Dance (MMD, a 3DCG movie production tool). At drawing time, linking a loaded motion file to a 3D model file makes it possible to produce 3DCG animations in many combinations. The 3D model file is created with 3D polygon modeling software, which can generate and edit three-dimensional objects polygon by polygon.

A motion file is a text file produced by capturing sampled data of actual human movement using dedicated motion-capture equipment and software. Such data is widely used to reproduce human-like character movement in computer animation for films and in games. The motion file describes the same model skeleton as the 3D model file, together with per-frame difference information for the skeleton and joints. Drawing 30 frames per second at run time yields continuous, natural movement.
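Playback from per-frame difference information can be sketched as follows. This is a deliberately simplified illustration: each joint is reduced to a single angle, and the dictionary layout is hypothetical and is not the Miku Miku Dance motion format.

```python
FPS = 30  # frames drawn per second, as stated in the text


def accumulate_poses(base_pose, frame_deltas):
    """Apply per-frame joint differences to a base pose.

    base_pose maps joint name to angle; frame_deltas is a list of dicts
    holding only the joints that change in each frame.
    Returns the full pose for every frame.
    """
    pose = dict(base_pose)
    poses = []
    for delta in frame_deltas:
        for joint, change in delta.items():
            pose[joint] = pose.get(joint, 0.0) + change
        poses.append(dict(pose))
    return poses


def frame_time(index):
    """Presentation time in seconds of the frame with the given index."""
    return index / FPS


poses = accumulate_poses({"elbow": 0.0}, [{"elbow": 10.0}, {"elbow": 5.0}, {}])
```

Storing only differences keeps each frame small, which is consistent with the low transfer cost that the text attributes to these files.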

In addition, this embodiment uses rule-based speech synthesis to make the character speak text information. Mobile terminals have limited processing power and memory and cannot hold a sufficiently large voice model database, so using a read-aloud function on them normally requires that processing be delegated to a server over a network such as a mobile phone line.

To address this, the embodiment adopts the HMM speech synthesis method, which can make the voice quality data smaller. Paired text and speech data are given to a statistical model called an HMM to learn the parameters that determine the HMM's behavior, and text data is then given to the trained HMM to generate the parameters needed for speech synthesis.

With this lightweight technique, the embodiment can generate natural speech from text even on devices with limited processing power and memory, such as STBs, smartphones, tablets, and in-vehicle units, enabling real-time reading of information and narration.

Next, the way the second animation control unit 106b of the character display unit 106 renders character movement between devices (user terminal and STB) is described.

In this embodiment, the two displays A and B are treated virtually as a single drawing area in order to realize continuous movement between displays, in which the character jumps out of display A and at the same time jumps in to display B.

For example, at the moment part of the character (say, the head) has jumped out of display A, only the head is shown on display B. When the torso then jumps out of display A, the torso jumps in to display B.
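Treating two displays as one drawing area can be illustrated in one dimension: each display covers an interval of a shared coordinate axis, and the part of the character that falls inside an interval is what that display draws. The side-by-side layout and all names below are assumptions of this sketch, not details from the specification.

```python
def visible_span(char_x, char_w, disp_x, disp_w):
    """Return the part of the character visible on one display.

    Coordinates are on a shared virtual axis. The result is a
    (start, end) pair in that display's local coordinates, or None
    when the character does not overlap the display.
    """
    left = max(char_x, disp_x)
    right = min(char_x + char_w, disp_x + disp_w)
    if right <= left:
        return None
    return (left - disp_x, right - disp_x)


# Display A covers 0..100 and display B covers 100..300 (illustrative sizes).
DISPLAY_A = (0, 100)
DISPLAY_B = (100, 200)

# A 40-unit-wide character at x=80 straddles the boundary.
on_a = visible_span(80, 40, *DISPLAY_A)
on_b = visible_span(80, 40, *DISPLAY_B)
```

With the character at x=80, display A draws its trailing part and display B draws its leading part, which corresponds to the head appearing on display B while the torso is still on display A.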

This synchronization of the character is achieved by synchronizing the motion file frames of the character jump-out animation on the user terminal 1 with those of the character jump-in animation on the STB 3.

The user terminal 1 draws the motion frames of the character jump-out animation on its screen one at a time, and sends a Sync command to the STB 3 together with the ID of the frame being drawn. On receiving a Sync command, the STB 3 parses the frame ID and draws the motion frame of the character jump-in animation with the corresponding ID on the television screen.
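The Sync exchange can be sketched as a sender that emits one message per drawn frame and a receiver that draws the matching jump-in frame. The message layout, the class name `StbRenderer`, and the one-to-one mapping between jump-out and jump-in frame IDs are assumptions; the text only states that the STB uses the corresponding frame ID.

```python
def jump_out_messages(frame_count):
    """Yield one Sync message per jump-out frame drawn on the user terminal."""
    for frame_id in range(frame_count):
        # The terminal would draw jump-out frame `frame_id` here.
        yield {"cmd": "SYNC", "frame_id": frame_id}


class StbRenderer:
    """Receiver side: draws the jump-in frame matching each Sync message."""

    def __init__(self, jump_in_frames):
        self.jump_in_frames = jump_in_frames
        self.drawn = []  # stands in for the television screen

    def on_message(self, message):
        if message.get("cmd") != "SYNC":
            return
        frame_id = message["frame_id"]
        # Assumed mapping: the same index in the jump-in animation.
        if 0 <= frame_id < len(self.jump_in_frames):
            self.drawn.append(self.jump_in_frames[frame_id])


stb = StbRenderer(["in-0", "in-1", "in-2"])
for message in jump_out_messages(3):
    stb.on_message(message)
```

Because the receiver draws only when a Sync message arrives, the two animations advance in step even if the network delivers frames with uneven delay.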

Next, the functional operation of the user terminal and the STB before and after the character moves is described. The user converses with the character on the display of the user terminal 1 in daily use. When the user terminal 1 detects a request from the user to watch television, it connects wirelessly to the STB 3 and makes the character jump in to the screen of the TV 2. From this point the user terminal 1 serves as the microphone (voice input) for the TV 2: the user's utterances are converted to text by voice recognition and semantic understanding and transferred to the dialogue response PF 302 of the STB 3.

The dialogue response PF 302 of the STB 3 then estimates the user's operation intention; the character gives visual feedback and a spoken reply, and the corresponding device operation of the STB 3 is executed. Because an engine holding the visual data and speech synthesis models of the same or an equivalent character is built on both the user terminal 1 and the STB 3, a cross-device character interactive UI can be realized by exchanging only text information between the user terminal 1 and the STB 3.

Next, the packet structure of the various messages exchanged between the user terminal 1 and the STB 3 will be described. This embodiment assumes that the devices are wirelessly connected using TCP/IP socket communication, and each packet consists of the fields HEADER, CMD, PARAM, END, and SUM.

HEADER holds a start mark. CMD holds the command to be executed. PARAM contains a plurality of Value fields. END holds an end mark. SUM holds a checksum for checking the integrity of the message.
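The field layout can be illustrated with a small encoder and decoder. The description names only the fields and their order; the concrete byte values for the start and end marks, the one-byte CMD, the separator between Value fields, and the modulo-256 checksum used here are all assumptions made for this sketch.

```python
HEADER = b"\x02"         # assumed start mark
END = b"\x03"            # assumed end mark
VALUE_SEPARATOR = "\x1f"  # assumed separator between Value fields in PARAM


def checksum(data: bytes) -> int:
    """Assumed checksum: sum of all preceding bytes modulo 256."""
    return sum(data) % 256


def build_packet(cmd: int, values: list) -> bytes:
    """Serialize HEADER, CMD, PARAM, END, SUM in that order."""
    param = VALUE_SEPARATOR.join(values).encode("utf-8")
    body = HEADER + bytes([cmd]) + param + END
    return body + bytes([checksum(body)])


def parse_packet(packet: bytes):
    """Validate one complete packet and return (cmd, values)."""
    if len(packet) < 4:
        raise ValueError("packet too short")
    body, received_sum = packet[:-1], packet[-1]
    if checksum(body) != received_sum:
        raise ValueError("checksum mismatch")
    if body[:1] != HEADER or body[-1:] != END:
        raise ValueError("missing start or end mark")
    cmd = body[1]
    param = body[2:-1].decode("utf-8")
    values = param.split(VALUE_SEPARATOR) if param else []
    return cmd, values


# Hypothetical command code 0x01 standing in for "user verification".
packet = build_packet(0x01, ["Taro", "30", "terminal-001"])
cmd, values = parse_packet(packet)
```

The sketch handles one already-delimited packet; reading packets off a TCP stream would additionally need a framing rule, which the description does not specify.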

For example, in a connection request message sent from the user terminal 1 to the STB 3, the CMD field holds the command corresponding to "user verification", and the PARAM field holds the user attributes (here, name, age, preference information, and so on) and the terminal ID.

In a message carrying the understood meaning of a user utterance, the CMD field holds a command corresponding to a "control code" (here, turning the TV on or off, program search, channel switching, and so on), and the PARAM field holds the keywords of the user utterance, the part of speech of each keyword (noun, verb, place name, actor name, and so on), and the terminal ID (here, the terminal manufacturing ID).

For example, when a program search command is executed, the program guide is searched using the keywords parsed from PARAM. The search can be narrowed down by program content, actor, category, and so on.

Next, the function of the dialogue response PF 105 (302) will be described. The dialogue response PF 105 (302) is a platform that interacts with the user based on a dialogue scenario.

A dialogue scenario consists of one or more state nodes, and each state node executes its own dialogue pattern. For example, when the user asks the character at the first state node which programs are currently on air, the character makes a recommendation matching the user's preferences and the scenario moves to state node 2. When the user says at state node 2 that they want to watch the recommended program, the STB 3 is powered on, the character jumps in from the user terminal 1 to the screen of the TV 2, and the scenario moves to state node 3.

At state node 3, commands are accepted while a program is playing, such as switching to another channel, searching the TV program guide, and launching the VOD content application or other applications 307 such as YouTube (registered trademark) or karaoke. If, for example, the VOD content application is launched, the scenario moves to state node 4 and waits for the user to utter a search keyword.

The state nodes of the dialogue scenario and the transitions between them form a state node transition diagram created from statistics on actual viewing use cases. To respond more accurately to user input, general-purpose, basic state nodes and transition rules are first created from a collection of viewing-related cases from many users. The patterns of state nodes and transition rules are then added and revised iteratively, which gradually improves dialogue accuracy across the user's varied viewing operations.
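The example scenario with state nodes 1 to 4 can be written as a transition table. The intent and action names below are hypothetical labels invented for this sketch; only the node numbers and the order of events come from the description.

```python
# (current state node, recognized intent) -> (next state node, action)
TRANSITIONS = {
    (1, "ask_current_programs"): (2, "recommend_program"),
    (2, "watch_recommended"): (3, "power_on_stb_and_jump_in"),
    (3, "switch_channel"): (3, "switch_channel"),
    (3, "search_program_guide"): (3, "search_program_guide"),
    (3, "launch_vod_app"): (4, "wait_for_search_keyword"),
}


def step(state, intent):
    """Return (next_state, action); unknown input keeps the current node."""
    return TRANSITIONS.get((state, intent), (state, "ask_again"))


def run(intents, state=1):
    """Feed a sequence of intents through the scenario and log the actions."""
    actions = []
    for intent in intents:
        state, action = step(state, intent)
        actions.append(action)
    return state, actions


final_state, actions = run(
    ["ask_current_programs", "watch_recommended", "launch_vod_app"]
)
```

Keeping the rules in a table makes the iterative addition and revision of transition patterns a data change rather than a code change.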

Next, viewing operations and content recommendation based on user attributes will be described. When the STB 3 detects a connection request message sent from the user terminal 1, it extracts the terminal ID and the user profile from the message and stores them in memory. In the subsequent dialogue, the STB 3 determines whether each viewing operation requested by the user is subject to restriction; a request unrelated to ratings, such as a volume or brightness adjustment, is simply executed as requested.

If, on the other hand, the request is to view content for which a rating is set, the rating of the requested content is read from the program guide and compared with the user profile (here, the age) associated with the extracted terminal ID. Viewing is permitted if the user's age is outside the restricted range and refused if it falls within it.
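The rating check reduces to a short decision function. The sketch below assumes that a rating is expressed as a minimum viewing age and that requests are small dictionaries; the request types, content IDs, and the name `is_allowed` are illustrative, not taken from the description.

```python
# Request types assumed to be unrelated to ratings.
RATING_UNRELATED = {"volume", "brightness"}


def is_allowed(request, user_age, min_age_by_content):
    """Decide whether a requested viewing operation may be executed.

    request: dict with "type" and, for viewing requests, "content_id".
    min_age_by_content: ratings read from the program guide, modeled here
    as a minimum viewing age per content ID (an assumption).
    """
    if request["type"] in RATING_UNRELATED:
        return True
    min_age = min_age_by_content.get(request.get("content_id"))
    if min_age is None:
        # No rating set for this content, so nothing to restrict.
        return True
    return user_age >= min_age


ratings = {"movie-r15": 15}
allowed = is_allowed({"type": "view", "content_id": "movie-r15"}, 12, ratings)
```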

The user profile storage unit 101 of the user terminal 1 also stores the user's preference information. When the connection between the user terminal 1 and the STB 3 is established, this preference information is transferred to the STB 3 together with the character information and used for program search and content recommendation.

The user's preference information includes favoritetvprogram (favorite program names), favoritetvgenre (favorite categories), favoritetetalent (favorite actor names), favoriteplace (favorite places), and favaritesports (favorite sports). For example, information such as the following is associated with each key.
favoritetvprogram/Waratte Iitomo/Sukkiri
favoritetvgenre/News/Documentary/Animation
favoritetetalent/Seiji Miyane/AKB/Eiichiro Funakoshi
favoriteplace/Tokyo/Korea
favaritesports/Baseball/Golf
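Slash-separated records of this kind can be read into a dictionary as sketched below. The function name `parse_preferences` is hypothetical, and the description does not say how the records are actually parsed or stored.

```python
def parse_preferences(lines):
    """Turn 'key/value/value/...' records into {key: [values]}."""
    prefs = {}
    for line in lines:
        # The first element is the key; the rest are its values.
        key, *values = [part.strip() for part in line.split("/")]
        prefs[key] = values
    return prefs


prefs = parse_preferences([
    "favoritetvgenre/News/Documentary/Animation",
    "favoriteplace/Tokyo/Korea",
])
```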

Next, the method of estimating intent from the user's varied phrasings will be described. Rather than simply matching keywords uttered by the user, this embodiment estimates the intent of the dialogue and thereby captures the user's operation intention more accurately.

Specifically, the meaning understanding unit 104 takes a character string representing an utterance as input and outputs a symbol representing an intent, called an intent slot. For each intent slot, multiple templates are registered, each obtained by converting an utterance expected to belong to that intent into a feature vector. The unit computes the similarity between the feature vector of the input utterance string and each template and outputs the intent slot to which the most similar template belongs.

As a concrete algorithm, a set of content words, with synonyms and numerical expressions abstracted, is extracted from the utterance string. As a bag-of-words representation, a feature vector of magnitude 1 is created in which the dimensions corresponding to content words defined in the template dictionary are non-zero. The similarity between this feature vector and each template in the template dictionary is computed, and the intent slot symbol of the most similar template is output as the intent estimation result.
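A minimal sketch of this template matching follows. It assumes the similarity measure is the dot product of unit-length vectors (cosine similarity), which the description does not name, and it skips the abstraction of synonyms and numerical expressions; the slot names and template words are made up for illustration.

```python
import math


def to_unit_vector(words, vocabulary):
    """Bag-of-words vector of magnitude 1 over words known to the dictionary."""
    present = {w for w in words if w in vocabulary}
    if not present:
        return {}
    weight = 1.0 / math.sqrt(len(present))
    return {w: weight for w in present}


def similarity(a, b):
    """Dot product of two sparse unit vectors (cosine similarity, assumed)."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())


def estimate_intent(words, templates):
    """Return the intent slot owning the most similar template, or None."""
    vocabulary = {w for tpls in templates.values() for tpl in tpls for w in tpl}
    query = to_unit_vector(words, vocabulary)
    best_slot, best_score = None, 0.0
    for slot, tpls in templates.items():
        for tpl in tpls:
            score = similarity(query, to_unit_vector(tpl, vocabulary))
            if score > best_score:
                best_slot, best_score = slot, score
    return best_slot


# Hypothetical template dictionary: intent slot -> content-word templates.
TEMPLATES = {
    "SEARCH_PROGRAM": [["search", "program"], ["find", "drama"]],
    "CHANGE_CHANNEL": [["change", "channel"], ["switch", "channel"]],
}
intent = estimate_intent(["please", "switch", "channel"], TEMPLATES)
```

Because both vectors have magnitude 1, the dot product alone ranks templates, and words outside the template dictionary simply drop out of the query.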

Next, the procedure by which the user terminal 1 automatically discovers and connects to the STB 3 will be described. In general, the IP address of the STB 3 is obtained via DHCP from the CATV provider or a local router, so it is difficult to know it in advance. The present invention therefore introduces a mechanism by which the IP address of the STB 3 is notified to the user terminal 1.

In this embodiment, the user terminal 1 connected to the local network performs a broadcast search over UDP, and the STB 3 replies with the IP address and communication port assigned to it. The user terminal 1 then automatically requests a connection to the STB 3 using the returned connection information (IP address, communication port, and so on). This frees the terminal user from having to find out the IP address of the STB 3 and connect manually based on it.
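The terminal side of such a discovery exchange might look like the following sketch. The discovery port, the probe message, and the `ip:port` reply format are assumptions; the description states only that a UDP broadcast is sent and that the STB replies with its IP address and communication port.

```python
import socket

DISCOVERY_PORT = 50000               # assumed port the STB listens on
DISCOVERY_MESSAGE = b"STB_DISCOVER"  # assumed probe payload


def parse_reply(data: bytes):
    """Parse an assumed 'ip:port' reply into (ip, port)."""
    ip, port = data.decode("ascii").split(":")
    return ip, int(port)


def discover_stb(timeout=2.0):
    """Broadcast a probe and return (ip, port) of the first STB that answers."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Allow sending to the broadcast address.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(DISCOVERY_MESSAGE, ("255.255.255.255", DISCOVERY_PORT))
        try:
            data, _sender = sock.recvfrom(1024)
        except socket.timeout:
            return None  # no STB answered within the timeout
        return parse_reply(data)
```

The returned pair can then be passed to `socket.create_connection` to open the TCP connection over which the connection request message is sent.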

FIG. 3 is a sequence flow showing the operation of the main parts of FIG. 2 during the remote operation of FIG. 1. When the meaning understanding unit 104 of the user terminal 1 recognizes a voice signal for switching on the TV 2/STB 3, a power-ON signal is sent at times t1 and t2 from the dialogue response PF 105 via the wireless communication unit 102 to the wireless communication unit 301 of the STB 3. At time t3, an ACK signal (power ON complete) is returned from the wireless communication unit 301 of the STB 3 to the user terminal 1.

At times t4 and t5, the character data needed to display the character on the display of the TV 2 and have it perform various effects (the 3D model file and motion files needed to display the character) is sent from the dialogue response PF 105 of the user terminal 1 via the wireless communication unit 102 to the wireless communication unit 301 of the STB 3.

At times t6 and t7, an ACK (information transmission complete) for the character data is returned from the wireless communication unit 301 of the STB 3 via the wireless communication unit 102 of the user terminal 1 to the dialogue response PF 105. In parallel, at time t8 the character data is transferred from the wireless communication unit 301 of the STB 3 to the dialogue response PF 302.

Thereafter, at time t9, a jump-out drawing request is sent from the dialogue response PF 105 of the user terminal 1 to the character display unit 106, and the character's jump-out is rendered on the terminal display.

At time t10, the dialogue response PF 105 notifies the wireless communication unit 102 that the jump-out is complete. At times t11 and t12, the jump-out completion is sent from the wireless communication unit 102 via the wireless communication unit 301 of the STB 3 to the dialogue response PF 302. At time t13, a jump-in drawing request is passed from the dialogue response PF 302 of the STB 3 to the character display unit 303, and the character's jump-in is rendered on the TV 2.

Although the above embodiment has been described using the case where the viewing device is an STB, the present invention is not limited to this and can be applied in the same way to any viewing device that has a display and can be remotely operated wirelessly, such as a car navigation system or a digital photo frame.

DESCRIPTION OF SYMBOLS: 1 ... user terminal; 2 ... TV; 3 ... STB; 102, 301 ... wireless communication unit; 103 ... speech recognition unit; 104 ... meaning understanding unit; 105, 302 ... dialogue response PF; 106, 303 ... character display unit; 107, 304 ... speech synthesis unit; 305 ... program search unit; 306 ... VOD service unit; 307 ... application unit; 308 ... control unit

Claims

1. A remote operation system in which the display of a character of a character interactive UI that virtually converses with a user is moved between the displays of a user terminal and a viewing device, and whichever of the user terminal and the viewing device is displaying the character is operated through dialogue with the displayed character, wherein
the user terminal comprises:
meaning understanding means for understanding the user's utterance content; and
means for providing the utterance content to the viewing device,
the user terminal and the viewing device each comprise:
wireless communication means for establishing a wireless connection with each other;
dialogue response means for determining response content based on the user's utterance content;
voice response means for outputting a voice message based on the response content;
first animation control means for controlling the animation of the character on the display in accordance with the response content; and
second animation control means for causing the character to jump out of the user terminal and jump in to the viewing device to be operated, and
the viewing device further comprises
control means for controlling a viewing service based on the response content.

2. The remote operation system according to claim 1, wherein the user terminal comprises means for storing a user profile, and
the dialogue response means determines the response content by reflecting the user profile.

3. The remote operation system according to claim 2, wherein the user terminal provides the user profile to the viewing device into which the character jumps in.

4. The remote operation system according to claim 3, wherein the dialogue response means of the viewing device determines the response content by reflecting the provided user profile.

5. The remote operation system according to any one of claims 1 to 4, wherein the second animation control means causes the character to jump out of a viewing device whose operation has been completed and jump in to the user terminal.

6. A user terminal of a remote operation system in which a character is moved on a display from the user terminal to a target viewing device and the viewing device is remotely operated through a character interactive UI, the user terminal comprising:
meaning understanding means for understanding a user's utterance content;
means for providing the utterance content to the viewing device;
wireless communication means for establishing a wireless connection with the viewing device;
dialogue response means for determining response content based on the user's utterance content;
voice response means for outputting a voice message based on the response content;
first animation control means for controlling the animation of the character on the display in accordance with the response content; and
second animation control means for causing the character to jump out in synchronization with its jump-in to the viewing device.

7. The user terminal of the remote operation system according to claim 6, further comprising means for storing a user profile,
wherein the dialogue response means determines the response content by reflecting the user profile.

8. The user terminal of the remote operation system according to claim 7, wherein the user profile is provided to the viewing device into which the character has jumped in.

9. A viewing device of a remote operation system in which a character is moved on a display from a user terminal to a target viewing device and the viewing device is remotely operated through a character interactive UI, the viewing device comprising:
wireless communication means for establishing a wireless connection with the user terminal;
dialogue response means for determining response content based on a user's utterance content provided from the user terminal;
voice response means for outputting a voice message based on the response content;
first animation control means for controlling the animation of the character on the display in accordance with the response content;
second animation control means for causing the character to jump in in synchronization with its jump-out from the user terminal; and
control means for controlling a viewing service based on the response content.

10. The viewing device of the remote operation system according to claim 9, further comprising means for obtaining a user profile from the user terminal,
wherein the dialogue response means determines the response content by reflecting the user profile.