JP2004072275A

JP2004072275A - Information supply system and control method therefor

Info

Publication number: JP2004072275A
Application number: JP2002226587A
Authority: JP
Inventors: Tetsuo Kosaka; 小坂　哲夫; Masaaki Yamada; 山田　雅章; Hiroki Yamamoto; 山本　寛樹
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2002-08-02
Filing date: 2002-08-02
Publication date: 2004-03-04

Abstract

PROBLEM TO BE SOLVED: To provide an information supply system for supplying optimum information by appropriately sharing picture display and voice output and to provide a control method of the system. SOLUTION: A master machine 110 acquires web contents 120 through a network 121, separates the acquired contents into a display content and a voice output content in an output data conversion part 114 and transmits the display content and voice data generated based on the voice output content in a voice synthesis part 115 to a slave machine 100. In the slave machine 100, received display data is displayed in a browser 102 and voice is outputted based on voice data in a voice output part 105. COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、画面による情報閲覧と音声による情報読み上げを可能とする情報提供システム及びその制御方法に関する。
【０００２】
【従来の技術】
近年、ＷＷＷ（Ｗｏｒｌｄ　Ｗｉｄｅ　Ｗｅｂ）を利用した情報閲覧を行うシステムがインフラとして広く社会に浸透している。情報閲覧の際には、一般にブラウザと呼ばれるソフトウェアを使用し、画面にＧＵＩ（Ｇｒａｐｈｉｃａｌ　Ｕｓｅｒ　Ｉｎｔｅｒｆａｃｅ）を表示することによって、閲覧対象となる情報の検索を行っている。
【０００３】
また、音声合成技術を利用し、ウェブのコンテンツを読み上げることによって情報を取得する、いわゆる音声ブラウザも実用化されつつある。この音声ブラウザは、視覚障害者による利用や、電話による情報取得用の手段等としての利用が考えられる。
【０００４】
また、特開２００１−１１７６９２においては、外部のアクセス装置からコンテンツ内のどの部分を読み上げるかを設定し、該設定されたコンテンツを音声合成して出力する、という技術が開示されている。該技術においては、コンテンツ内での音声合成出力（読み上げ）対象となる部分をユーザが指定する手段を有し、さらに、「本文」や「リンク」、「タイトル」等、読み上げ対象となるテキストの種類を指定する手段を有している。
【０００５】
【発明が解決しようとする課題】
しかしながら、上記特開２００１−１１７９２に開示された外部アクセス装置においては、読み上げ対象となるコンテンツの指定を行うのみであり、外部アクセス装置におけるＧＵＩを用いた情報提供については考慮されておらず、外部アクセス装置自体が情報提供装置として機能するものではなかった。
【０００６】
本発明は上述した問題を解決するためになされたものであり、画面表示と音声出力を適切に併用することによって最適な情報提供を行う情報提供システム及びその制御方法を提供することを目的とする。
【０００７】
【課題を解決するための手段】
上記目的を達成するための一手段として、本発明の情報提供システムは以下の構成を備える。
【０００８】
すなわち、第１の装置と第２の装置からなる情報提供システムであって、前記第１の装置は、外部装置から情報データを取得する情報データ取得手段と、該取得した情報データを表示用データと音声出力用データに分離する分離手段と、前記音声出力用データに基づいて音声データを生成する音声合成手段と、前記表示用データと前記音声データを前記第２の装置に送信する送信手段と、を有し、前記第２の装置は、前記第１の装置から前記表示用データと前記音声データを受信する受信手段と、前記表示用データを表示する表示手段と、前記音声データを音声として出力する音声出力手段と、を有することを特徴とする。
【０００９】
【発明の実施の形態】
以下、本発明に係る一実施形態について、図面を参照して詳細に説明する。
【００１０】
＜第１実施形態＞
上記従来例で示した特開２００１−１１７９２に開示された外部アクセス装置を、親機と子機が無線接続されたコードレスＦＡＸ機における子機として想定することができる。
【００１１】
この場合、該子機においてウェブコンテンツを閲覧する際に、その表示部による表示と、音声による読み上げとを併用することが望ましい。すなわち、子機の表示部のサイズが大きければ、表示された情報を一瞥で取得できるため、ユーザにとっては音声出力よりも画面出力の方が情報を取得しやすい。逆に、子機の表示部サイズが小さければ、表示する情報の種類を制限し、表示されない情報を音声によって出力する、という方法も考えられる。
【００１２】
また、子機において表示対象となるコンテンツの量によっても、最適な表示方法は変わってくる。すなわち、コンテンツの量が少ない場合には、その全てを画面出力として音声出力を使用しないことも有効であるが、コンテンツの量が多い場合には、情報の一部のみを画面出力とし、その他を音声出力することが望ましい。
【００１３】
そこで本実施形態では、子機においてより最適な情報提供を実現することを目的として、ＧＵＩによる視覚的表示と音声出力を併用し、さらに表示部のサイズやコンテンツ量に基づいて出力方法を決定する。これにより、子機のユーザによる情報取得が、より容易となる。
【００１４】
●システム構成
本実施形態においては、親機と子機が無線接続されたいわゆるコードレス親子電話機における子機を本実施形態に係る情報出力装置として、以下に説明する。
【００１５】
図１は、本実施形態におけるコードレスＦＡＸ機のＷｅｂ閲覧機能部分の構成を示すブロック図である。同図において、１００はＧＵＩを表示するためのブラウザや、音声を出力するためのスピーカを備えた子機であり、１１０は情報取得のためのネットワーク接続機能や、情報取得機能、音声合成機能などを含んだ親機である。親機１１０と子機１００は何らかの通信線路（無線や有線など）によって接続され、互いに情報のやりとりを可能とする。また、１２０はコンテンツであり、たとえばＨＴＭＬで記述されたウェブページなどに相当する。また、１２１はネットワーク等の通信線路を示す。
【００１６】
子機１００は、ボタン等のユーザによる指示入力を行う指示入力部１０１、画面表示によってＧＵＩを実現するブラウザ１０２、コンテンツデータを表示データと音声データに分離する信号分離・合成部１０３、親機との信号の送受信を行う信号送受信部１０４、音声出力部１０５、実際の音声出力を行うスピーカ（又はヘッドホン）１０６、を備える。なお、指示入力部１０１としては、マウスやペン入力、タッチスクリーン等のポインティングデバイスや、キーボード、ボタン、ソフトキーボード等、一般にブラウザ１０２において使用可能な入力方法であれば、どのような装置であっても良い。
【００１７】
親機１１０は、ネットワーク１２１を介したコンテンツ１２０の受信およびリクエスト送信を行うコンテンツ受信・リクエスト送信部１１１、受信したコンテンツ量を算出するコンテンツ量計算部１１２、算出したコンテンツ量および子機におけるブラウザ１０２の表示サイズ情報を記憶するコンテンツ量・表示サイズ記憶部１１３、受信したコンテンツ１２０の形態を、例えば表示用コンテンツと音声用コンテンツに分離するように変換する出力データ変換部１１４、音声用コンテンツに基づく音声合成を行う音声合成部１１５、表示用コンテンツと音声データとの多重化等を行う信号分離・合成部１１６、子機との信号の送受信を行う信号送受信部１１７、を備える。
●情報出力処理
以下、本実施形態における情報出力処理について詳細に説明する。図２は、本実施形態において外部コンテンツを取得し、これをユーザに提供する処理を示すフローチャートである。
【００１８】
●コンテンツ取得
まず、ステップＳ１００において、子機１００におけるユーザ入力イベントを取得する。具体的には、指示入力部１０１からユーザ入力があると、ブラウザ１０２がそれを受け取り、ユーザによる送信指示によってコンテンツを取得するためのリクエストを発生させる。
【００１９】
次にステップＳ１０１において、発生したリクエストを親機へ送信する。具体的には、ステップＳ１００で発生したコンテンツ取得のためのリクエストが、まず信号分離・合成部１０３に送られる。ここで、リクエストを送信する場合は信号合成の必要はないため、単にリクエスト信号が通過するのみであり、次いで信号送受信部１０４において親機１１０と子機１００間の通信手順に従い、リクエスト信号が親機１１０に送信される。
【００２０】
次にステップＳ１０２では親機１１０において、子機１００より送信されたリクエストが受信される。具体的には、子機１００から親機１１０へ送信されたリクエスト信号が、親機１１０の信号送受信部１１７で受信される。
【００２１】
ステップＳ１０３において、ネットワーク１２１を介したリクエストの送信が行われる。具体的には、ステップＳ１０２において受信されたリクエスト信号が、信号分離・合成部１１６を介してコンテンツ受信・リクエスト送信部１１１に送られ、ネットワーク１２１や、コンテンツ１２０の取得に適した通信手順に従い、リクエストが送信される。なお、この通信手順としては、ウェブコンテンツをアクセスする場合にはｈｔｔｐプロトコル等が使用される。
【００２２】
そしてステップＳ１０４において、送信されたリクエストに従い、ユーザによって指定されたコンテンツ１２０が取得される。すなわち、ステップＳ１０２で子機１００から親機１１０へ送信されたリクエストは、ネットワーク１２１を介して指定のコンテンツ１２０に達し、その情報が取得されて親機１１０のコンテンツ受信・リクエスト送信部１１１で受信される。ここで受信される情報としては例えば、コンテンツ１２０がウェブページであれば、ＨＴＭＬやＸＨＴＭＬ等の記述言語によるテキストデータなどである。
【００２３】
次にステップＳ１０５においては、ステップＳ１０４で取得されたコンテンツの容量の計算、および該計算結果の保存が行われる。まず、コンテンツ受信・リクエスト送信部１１１で受信されたコンテンツは、コンテンツ計算部１１２に送られてそのデータ量が計算され、該計算結果はコンテンツ量・表示サイズ記憶部１１３に記憶される。ここでコンテンツ量としては、画像のサイズ、本文の文字数、リンクの文字数等、さまざまな定義が可能であるが、ここでは本文の文字数を採用する。
【００２４】
ここで図３に、ＨＴＭＬで記述されたコンテンツ例を示す。アンカータグ内の文字は本文に含めないとすると、同図に示す例においては「リンク集」、「このページでは、さまざまな情報にアクセスすることができます」、「全国の天気予報をお知らせします。」、「首都圏の電車の乗り継ぎの情報を検索することができます。」といった内容の文字数が、コンテンツ量として計算される。
【００２５】
●出力データ変換
そしてステップＳ１０６では出力データ変換部１１４において、コンテンツ量・表示サイズ記憶部１１３に保持されたコンテンツ量及び表示サイズに基づき、出力データを変換する。ここで、コンテンツ量・表示サイズ記憶部１１３に保持されている内容のうち、コンテンツ量は上記ステップＳ１０５で算出されたものであるが、表示サイズとしては、子機１００のブラウザ１０２において表示可能な文字サイズを、予め記憶しておく。あるいは、子機１００が親機１１０に接続された時点で、子機１００から親機１１０へ表示サイズデータを自動的に送信し、記憶してもよい。
【００２６】
本実施形態においては、コンテンツ量や表示サイズのそれぞれを単独で参照しても、同時に参照してもよい。以下に、参照方法の具体例を示す。
【００２７】
・コンテンツ量のみを参照する場合
以下、親機１１０の出力データ変換部１１４において、コンテンツ量・表示サイズ記憶部１１３に保持されたコンテンツ量及び表示サイズのうち、コンテンツ量のみを参照してデータ変換を行う例について説明する。
【００２８】
この場合、出力データ変換部１１４においてコンテンツ量に対する閾値を設け、コンテンツ量が該閾値を超えた場合には、該コンテンツを音声出力用コンテンツと表示用コンテンツに分離する。一方、コンテンツ量が閾値を超えない場合にはコンテンツを分離せずに、該コンテンツの全てを表示用コンテンツとして扱う。なお、閾値を０に設定することによって、コンテンツは常に音声出力用コンテンツと表示用コンテンツに分離される。この閾値としては、ユーザが設定しても良いし、予め既定値を設定しておいても良い。
【００２９】
図３に示すコンテンツを例とすると、そのコンテンツ量が閾値を超えた場合、図３のコンテンツは図４の表示用コンテンツと、図５の音声用コンテンツに分離される。ここで、図３に示すコンテンツは、ブラウザ１０２において図６のように表示される。また、コンテンツ分離後の図４に示す表示用コンテンツは、ブラウザ１０２において図７のように表示される。
【００３０】
図３に示すコンテンツ例において、アンカータグ内の文字は本文とはみなされないため、表示用コンテンツとして抽出される。これに対し、＜ｐ＞タグ内の文字は本文とみなされ、音声用コンテンツとして抽出される。また、＜ｐ＞タグ内の文字が表示用コンテンツとして出力されない代わりに「＊」記号が出力されているが、もちろん「＊」記号は出力されなくともよい。
【００３１】
ここで図３の行４０２に示されるように、画像とリンク先が同一のアンカータグ内に存在する場合、画像を含むアンカータグが表示用コンテンツとしてそのまま出力される例を図４に示したが、表示用コンテンツに画像を含まないようにしても良い。この場合、出力データ変換部１１４では例えば、行４０２の内容を以下のように変換し、画像が表示されないようにする。
＜ｐ＞＜ａ　ｈｒｅｆ＝”ｈｔｔｐ：／／ｈｏｇｅｈｏｇｅ／ｓｅｎｄｅｎ．ｈｔｍｌ”＞　画像へのリンク＜／ａ＞＜／ｐ＞
【００３２】
・コンテンツ量と表示サイズの両方を参照する場合
以下、親機１１０の出力データ変換部１１４において、コンテンツ量・表示サイズ記憶部１１３に保持されたコンテンツ量及び表示サイズの両方を参照してデータ変換を行う例について説明する。
【００３３】
本実施形態における表示サイズはブラウザ１０２の表示文字数であるから、これを例えばａとし、コンテンツ量をｂとすると、所定の関数ｆ（）について、
ｂ　＞　ｆ（ａ）
の関係が成立する場合に、出力データ変換部１１４はコンテンツを分離する。ここで関数ｆ（）としては例えば、単純に定数を減じるような関数であっても良い。なお、コンテンツの分離方法としては、上述したコンテンツ量のみに基づくデータ分離の場合と同様であるため、ここでは説明を省略する。
【００３４】
以上のように出力データ変換部１１４においては、親機１１０で受信したコンテンツ１２０を音声出力用コンテンツと表示用コンテンツに分離し、前者を音声合成部１１５へ、後者を信号分離・合成部１１６に渡す。
【００３５】
・音声合成
次にステップＳ１０７では音声合成部１１５において、出力データ変換部１１４から出力された音声出力用コンテンツに基づいて音声合成処理を行う。
【００３６】
本実施形態の音声出力用コンテンツとしては上述したように、図５に示すようなテキストデータが使用される。これを、文字を音声に変換するｔｅｘｔ−ｔｏ−ｓｐｅｅｃｈタイプの合成部によって音声波形データに変換して信号分離・合成部１１６に送る。なお、この変換を行う場合に、データ量に応じて発話スピードを変化させてもよい。例えば、音声出力用コンテンツ量が多いほど発話スピードを速くする、などが考えられる。
【００３７】
●子機におけるＧＵＩ表示及び音声出力
次にステップＳ１０８においては、親機１１０から子機１００へ、音声とテキストデータを送信する。このとき、出力データ変換部１１４で得られた表示用コンテンツおよび、音声合成部１１５で得られた音声波形データが、信号分離・合成部１１６において多重化され、該多重化されたデータが信号送受信部１１７に送られる。そして信号送受信部１１７より、表示用コンテンツおよび音声波形データの多重化信号が、所謂マルチモーダルインタフェース等、所定の通信手順に従って子機１００に対し送信される。
【００３８】
ステップＳ１０９においては、親機１１０から送信された多重化信号が、子機１００の信号送受信部１０４にて受信され、信号分離・合成部１０３において表示用コンテンツデータと音声波形データに分離される。
【００３９】
次にステップＳ１１０においては、信号分離・合成部１０３で分離された表示用コンテンツデータがブラウザ１０２に送られることによって、テキストデータが表示される。ブラウザ１０２はすなわち、例えば図４に示すようなデータを受け取り、そのディスプレイ上に図７に示すような表示を行う。
【００４０】
そしてステップＳ１１１においては、信号分離・合成部１０３で分離された音声波形データが音声出力部１０５に送られることによって、音声出力が行われる。音声出力部１０５では、音声波形データをＤ／Ａ変換器によりアナログの音声波形に変換し、スピーカ１０６から音声として出力する。
【００４１】
●ソフトウェアによるシステム構成
なお、本実施形態の構成は図１に示すハードウェア構成に限らず、ソフトウェアによって実現することも可能である。図８に、本実施形態をソフトウェアによって実現する際の構成例を示す。同図において、２００が子機、２１０が親機であり、コンテンツ１２０及びネットワーク１２１は図１と同様である。
【００４２】
図８の親機２１０内に示すコンテンツ受信・リクエスト送信モジュール２１３においては、図１に示すコンテンツ受信・リクエスト送信部１１１と同等の処理が行われる。同様に、図８に示す信号送受信モジュール２１４においては、図１に示す信号送受信部１１７と同等の処理が行われる。
【００４３】
また、図１に示す親機１１０内のコンテンツ量計算部１１２，出力データ変換部１１４，音声合成部１１５，信号分離・合成部１１６は、図８に示す構成おいては親機２１０内のメモリ２１２にプログラムとして保持され、ＣＰＵ２１１が該プログラムを実行することにより、それぞれの機能が実現される。また、図１に示すコンテンツ量・表示サイズ記憶部１１３に保持される内容は、図８ではメモリ２１２に保持される。ここで、メモリ２１２としてはＲＯＭ，ＲＡＭ，ハードディスク，磁気テープ，ＦＤ，フラッシュメモリ等、一般的にコンピュータにおいて使用されているものであれば、どのような形態であってもよい。また、ＲＯＭとＲＡＭとハードディスク等、複数のメモリ媒体を併用してもよい。
【００４４】
なお、本実施形態においては親機と子機がそれぞれ１台づつからなるシステムを例として説明したが、複数の子機を有する場合であっても、本実施形態は同様の方法で実現可能である。
【００４５】
以上説明したように本実施形態によれば、子機における表示部のサイズや表示すべきコンテンツの量に従って、ＧＵＩや音声による出力方法を最適化することにより、ユーザに対して最適な形態による情報提供を行うことができる。
【００４６】
また、表示サイズやコンテンツ量の閾値をユーザが設定可能とすることにより、ユーザは自身にとって最適な出力方法へのカスタマイズを行うことができる。
【００４７】
＜第２実施形態＞
以下、本発明に係る第２実施形態について説明する。
【００４８】
上述した第１実施形態においては、受信したコンテンツの音声出力とＧＵＩとを併用する際に、現在読み上げている内容と表示されている内容との位置関係については特に考慮されていない。そこで第２施形態においては、両者の相対的な位置関係をユーザが容易に把握できるようにすることを特徴とする。
【００４９】
以下、図１に示すブロック構成図を参照して説明する。
【００５０】
第２実施形態においては、親機１１０の音声合成部１１５で音声を合成する際に、出力データを文ごとに生成するものとし、文と文の間に、切れ目を示す所定の情報（以下、切れ目情報と称する）を挿入する。
【００５１】
また、子機１００の音声出力部１０５では、音声出力開始時に所定の開始信号をブラウザ１０２に送る。ブラウザ１０２は該開始信号を受け取ると、例えば図４の先頭行５０１に該当する「＊」を点滅表示する。
【００５２】
そして音声出力中に、音声出力部１０５が上記切れ目情報を検出すると、該切れ目情報を音声発声と同期させながらブラウザ１０２に送る。ブラウザ１０２は切れ目情報を受け取ると、図４に示す行５０２に該当する、次の「＊」を点滅させる。このように、音声発声に同期させながら「＊」を順次点滅させることにより、読み上げている内容と表示されている内容の相対的な位置関係が、容易に把握できる。
【００５３】
なお、第２実施例においては「＊」記号を表示順に点滅させる例を示したが、アンカータグ中に「＊」記号があった場合には、音声発声と「＊」記号の点滅がずれてしまうことが考えられる。このような不具合を回避するために、出力データ変換部１１４による表示用コンテンツ出力時に、「＊」記号脇に特殊タグを挿入してもよい。この場合、ブラウザ１０２においてはもちろん、特殊タグを表示しない。また、「＊」記号を点滅させる場合は、特殊タグが後続する「＊」記号のみを点滅させるようにすれば良い。
【００５４】
また同様な方法により、リンク部分の読み上げ時には、読み上げ対象のリンクを反転表示することもできる。また、本文内容を特殊タグで囲み、ブラウザが特殊タグを検出した場合に、図９に示すように該特殊タグで囲まれた内容を音声出力と同期させることによって、テキストを右から左に流すように表示させることも可能である。
【００５５】
以上説明したように第２実施形態によれば、音声出力とＧＵＩを併用したコンテンツ出力を行う際に、現在読み上げている内容と表示されている内容との相対的な位置関係を容易に把握することができる。
【００５６】
＜第３実施形態＞
以下、本発明に係る第３実施形態について説明する。
【００５７】
上述した第１実施形態においては、子機のみにおいてコンテンツ出力を行う例について説明したが、第３実施形態においては、親機においても同様にコンテンツ出力を行う例について説明する。
【００５８】
図１０は、第３実施形態における親子電話機の構成を示すブロック図である。同図において、上述した第１実施形態で示した図１と同様の構成には同一番号を付し、詳細な説明を省略する。
【００５９】
図１０に示す構成においては、親機９１０内にもブラウザ９１９と音声出力部９２１が備えられていることを特徴とする。第３実施形態の親機９１０においては、子機１００におけるコンテンツ出力処理と同様に、出力データ変換部１１４で得られた音声出力用コンテンツ及び表示用コンテンツに基づき、スピーカ９２０による音声出力及びブラウザ９１９による表示を行う。
【００６０】
但し第３実施形態においては、親機９１０と子機１００とでブラウザの表示サイズが異なることが想定されるため、親機９１０内のコンテンツ量・表示サイズ記憶部９１３では、子機１００のブラウザ１０２と、親機９１０のブラウザ９１９の両方の表示サイズをあら予め保持しておく必要がある。そして出力データ変換部９１４においては、親機９１０に対する表示用コンテンツを作成する際にはブラウザ９１９の表示サイズを参照し、子機１００に対する表示用コンテンツを作成する際にはブラウザ１０２の表示サイズを参照する。
【００６１】
以上説明したように第３実施形態によれば、親機においても子機と同様に、ブラウザによる表示と音声出力とを併用した、ユーザにとって最適な形態による情報提供を行うことができる。
【００６２】
＜他の実施形態＞
なお、本発明は、複数の機器（例えばホストコンピュータ、インタフェイス機器、スピーカ、プリンタなど）から構成されるシステムに適用しても、一つの機器からなる装置（例えば、電話機、ファクシミリ装置など）に適用しても良い。
【００６３】
また、本発明の目的は、前述した実施形態の機能を実現するソフトウェアのプログラムコードを記録した記憶媒体を、システムあるいは装置に供給し、そのシステムあるいは装置のコンピュータ（またはＣＰＵまたはＭＰＵ）が記憶媒体に格納されたプログラムコードを読み出し実行することによっても達成されることは言うまでもない。
【００６４】
この場合、記憶媒体から読み出されたプログラムコード自体が前述した実施形態の機能を実現することになり、そのプログラムコードを記憶した記憶媒体は本発明を構成することになる。
【００６５】
プログラムコードを供給するための記憶媒体としては、例えば、フロッピーディスク、ハードディスク、光ディスク、光磁気ディスク、ＣＤ−ＲＯＭ、ＣＤ−Ｒ、磁気テープ、不揮発性のメモリカード、ＲＯＭなどを用いることが出来る。
【００６６】
また、コンピュータが読み出したプログラムコードを実行することにより、前述した実施形態の機能が実現されるだけでなく、そのプログラムコードの指示に基づき、コンピュータ上で稼動しているＯＳ（オペレーティングシステム）などが実際の処理の一部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
【００６７】
さらに、記憶媒体から読み出されたプログラムコードが、コンピュータに挿入された機能拡張ボードやコンピュータに接続された機能拡張ユニットに備わるメモリに書き込まれた後、そのプログラムコードの指示に基づき、その機能拡張ボードや機能拡張ユニットに備わるＣＰＵなどが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
【００６８】
【発明の効果】
以上説明したように本発明によれば、画面表示と音声出力を適切に併用することによって最適な情報提供を行うことができる。
【図面の簡単な説明】
【図１】本発明に係る一実施形態における親子電話機の構成を示すブロック図である。
【図２】本実施形態における情報出力処理を示すフローチャートである。
【図３】本実施形態において取得したコンテンツ内容の一例を示す図である。
【図４】本実施形態における変換後の表示用コンテンツの一例を示す図である。
【図５】本実施形態における変換後の音声出力用コンテンツの一例を示す図である。
【図６】本実施形態において取得したコンテンツの表示例を示す図である。
【図７】本実施形態における表示用コンテンツの表示例を示す図である。
【図８】本実施形態をソフトウェアによって実現する場合の、親子電話機の構成を示すブロック図である。
【図９】第２実施形態における表示用コンテンツの表示例を示す図である。
【図１０】第３実施形態における親子電話機の構成を示すブロック図である。
【符号の説明】
１０１　ＣＰＵ
１０２　メインメモリ
１０３　ＳＣＳＩ　Ｉ／Ｆ
１０４　ネットワーク　Ｉ／Ｆ
１０５　ＨＤＤ
１０６　グラフィックアクセラレータ
１０７　カラーモニタ
１０８　ＵＳＢコントローラ
１０９　カラープリンタ
１１０　キーボード／マウスコントローラ
１１１　キーボード
１１２　マウス
１１３　ＬＡＮ
１１４　ＰＣＩバス
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an information providing system capable of browsing information on a screen and reading out information by voice, and a control method thereof.
[0002]
[Prior art]
In recent years, systems for browsing information using the WWW (World Wide Web) have become widespread as social infrastructure. To browse information, software generally called a browser is used, and a GUI (Graphical User Interface) is displayed on a screen to search for the information to be browsed.
[0003]
Also, a so-called voice browser that acquires information by reading out web content using a voice synthesis technology is being put into practical use. This voice browser can be used by a visually impaired person or as a means for acquiring information by telephone.
[0004]
Also, Japanese Patent Application Laid-Open No. 2001-117692 discloses a technique in which an external access device sets which part of the content is to be read aloud, and the content thus set is output by speech synthesis. This technique provides means for the user to designate the portion of the content to be output by speech synthesis (read aloud), and further provides means for specifying the type of text to be read aloud, such as "body text", "link", or "title".
[0005]
[Problems to be solved by the invention]
However, the external access device disclosed in Japanese Patent Application Laid-Open No. 2001-11792 only designates the content to be read aloud. Information provision using a GUI on the external access device is not considered, and the external access device itself does not function as an information providing device.
[0006]
The present invention has been made to solve the above-described problem, and its object is to provide an information providing system that provides information optimally by appropriately combining screen display and audio output, and a control method thereof.
[0007]
[Means for Solving the Problems]
As one means for achieving the above object, the information providing system of the present invention has the following configuration.
[0008]
That is, the information providing system comprises a first device and a second device. The first device includes: information data acquisition means for acquiring information data from an external device; separation means for separating the acquired information data into display data and audio output data; speech synthesis means for generating audio data based on the audio output data; and transmission means for transmitting the display data and the audio data to the second device. The second device includes: reception means for receiving the display data and the audio data from the first device; display means for displaying the display data; and audio output means for outputting the audio data as sound.
[0009]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
[0010]
<First embodiment>
The external access device disclosed in JP-A-2001-11792 shown in the above-mentioned conventional example can be assumed as a slave unit in a cordless FAX machine in which the master unit and the slave unit are wirelessly connected.
[0011]
In this case, when browsing web content on the slave unit, it is desirable to combine display on its display unit with reading aloud by voice. That is, if the display unit of the slave unit is large, the displayed information can be taken in at a glance, so screen output makes it easier for the user to obtain information than audio output. Conversely, if the display unit of the slave unit is small, one possible method is to limit the types of information displayed and to output the information that is not displayed by voice.
[0012]
The optimum display method also depends on the amount of content to be displayed on the slave unit. That is, when the amount of content is small, it is effective to output all of it on the screen and not use audio output; when the amount of content is large, it is desirable to output only part of the information on the screen and to output the rest as audio.
[0013]
Therefore, in the present embodiment, in order to provide information more optimally on the slave unit, visual display using a GUI and audio output are used together, and the output method is determined based on the size of the display unit and the content amount. This makes it easier for the user of the slave unit to obtain information.
[0014]
● System Configuration
In the present embodiment, the slave unit of a so-called cordless telephone set, in which a master unit and a slave unit are wirelessly connected, is described below as the information output device according to the present embodiment.
[0015]
FIG. 1 is a block diagram illustrating the configuration of the Web browsing function part of the cordless FAX machine according to the present embodiment. In the figure, reference numeral 100 denotes a slave unit having a browser for displaying a GUI and a speaker for outputting audio, and 110 denotes a master unit that includes a network connection function for information acquisition, an information acquisition function, a speech synthesis function, and the like. The master unit 110 and the slave unit 100 are connected by some kind of communication line (wireless, wired, or the like) and can exchange information with each other. Reference numeral 120 denotes content, which corresponds to, for example, a web page described in HTML. Reference numeral 121 denotes a communication line such as a network.
[0016]
The slave unit 100 includes an instruction input unit 101, such as buttons, through which the user inputs instructions; a browser 102 that realizes a GUI by screen display; a signal separation / synthesis unit 103 that separates content data into display data and audio data; a signal transmission / reception unit 104 that transmits and receives signals to and from the master unit; an audio output unit 105; and a speaker (or headphones) 106 that actually outputs the audio. The instruction input unit 101 may be any device that provides an input method generally usable with the browser 102, such as a pointing device (a mouse, pen input, or touch screen), a keyboard, buttons, or a soft keyboard.
[0017]
The master unit 110 includes a content reception / request transmission unit 111 that receives the content 120 and transmits requests via the network 121; a content amount calculation unit 112 that calculates the amount of received content; a content amount / display size storage unit 113 that stores the calculated content amount and the display size information of the browser 102 in the slave unit; an output data conversion unit 114 that converts the form of the received content 120, for example by separating it into display content and audio content; a speech synthesis unit 115 that performs speech synthesis based on the audio content; a signal separation / synthesis unit 116 that performs, for example, multiplexing of the display content and the audio data; and a signal transmission / reception unit 117 that transmits and receives signals to and from the slave unit.
● Information Output Processing
Hereinafter, the information output processing in the present embodiment will be described in detail. FIG. 2 is a flowchart illustrating the process of acquiring external content and providing it to the user in the present embodiment.
[0018]
● Content Acquisition
First, in step S100, a user input event in the slave unit 100 is acquired. Specifically, when there is user input from the instruction input unit 101, the browser 102 receives it and, in response to a transmission instruction from the user, generates a request for acquiring the content.
[0019]
Next, in step S101, the generated request is transmitted to the master unit. Specifically, the content acquisition request generated in step S100 is first sent to the signal separation / synthesis unit 103. Because no signal synthesis is needed when transmitting a request, the request signal simply passes through; the signal transmission / reception unit 104 then transmits the request signal to the master unit 110 according to the communication procedure between the master unit 110 and the slave unit 100.
[0020]
Next, in step S102, the master unit 110 receives the request transmitted from the slave unit 100. Specifically, the request signal transmitted from the slave unit 100 to the master unit 110 is received by the signal transmission / reception unit 117 of the master unit 110.
[0021]
In step S103, the request is transmitted via the network 121. Specifically, the request signal received in step S102 is sent to the content reception / request transmission unit 111 via the signal separation / synthesis unit 116, and the request is transmitted according to a communication procedure suited to the network 121 and to acquiring the content 120. When accessing web content, the HTTP protocol or the like is used as this communication procedure.
[0022]
Then, in step S104, the content 120 specified by the user is acquired according to the transmitted request. That is, the request transmitted from the slave unit 100 to the master unit 110 in step S102 reaches the designated content 120 via the network 121, and the information is acquired and received by the content reception / request transmission unit 111 of the master unit 110. If the content 120 is a web page, the received information is, for example, text data in a markup language such as HTML or XHTML.
[0023]
Next, in step S105, the amount of the content acquired in step S104 is calculated and the result is stored. First, the content received by the content reception / request transmission unit 111 is sent to the content amount calculation unit 112, which calculates its data amount, and the result is stored in the content amount / display size storage unit 113. The content amount can be defined in various ways, such as the size of images, the number of characters in the body text, or the number of characters in links; here, the number of characters in the body text is used.
[0024]
FIG. 3 shows an example of content described in HTML. Assuming that characters inside anchor tags are not counted as body text, the content amount calculated for this example is the number of characters in "Link collection", "On this page, you can access various information", "We will inform you of the weather forecast for the whole country.", and "You can search for train connection information in the Tokyo metropolitan area."
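The counting rule above (body text only, with anchor text excluded) can be sketched as follows. This is a minimal illustration and not the patent's implementation: the class name `BodyTextCounter`, the function `content_amount`, and the sample markup are hypothetical, and it assumes well-formed HTML in which every `<a>` tag is closed.

```python
from html.parser import HTMLParser


class BodyTextCounter(HTMLParser):
    """Counts characters of text that is not inside an <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self.anchor_depth = 0  # > 0 while inside <a>...</a>
        self.char_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Anchor text is treated as a link, not body text, so it is skipped.
        if self.anchor_depth == 0:
            self.char_count += len(data.strip())


def content_amount(html: str) -> int:
    """Return the number of body-text characters in the given HTML."""
    counter = BodyTextCounter()
    counter.feed(html)
    counter.close()
    return counter.char_count


# Hypothetical sample; the actual markup of FIG. 3 is not reproduced here.
SAMPLE = (
    "<html><body><h1>Links</h1>"
    "<p>Weather for the whole country.</p>"
    '<p><a href="weather.html">Weather</a></p>'
    "</body></html>"
)
amount = content_amount(SAMPLE)
```

In this sketch the heading and the paragraph text contribute to the count, while the link label "Weather" does not.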
[0025]
● Output Data Conversion
In step S106, the output data conversion unit 114 converts the output data based on the content amount and display size held in the content amount / display size storage unit 113. Of the values held in the content amount / display size storage unit 113, the content amount is the one calculated in step S105, whereas the display size, namely the character size that can be displayed in the browser 102 of the slave unit 100, is stored in advance. Alternatively, the slave unit 100 may automatically transmit its display size data to the master unit 110 when it is connected to the master unit 110, and the master unit 110 may store it.
[0026]
In the present embodiment, each of the content amount and the display size may be referred to independently or may be referred to simultaneously. Hereinafter, specific examples of the reference method will be described.
[0027]
・ When only the content amount is referenced
The following describes an example in which the output data conversion unit 114 of the master unit 110 performs data conversion by referring only to the content amount, out of the content amount and display size held in the content amount / display size storage unit 113.
[0028]
In this case, the output data conversion unit 114 sets a threshold value for the content amount, and when the content amount exceeds the threshold value, separates the content into audio output content and display content. On the other hand, when the content amount does not exceed the threshold, all of the contents are handled as display contents without separating the contents. By setting the threshold value to 0, the content is always separated into audio output content and display content. The threshold may be set by the user or may be set to a predetermined value in advance.
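The threshold rule in this paragraph amounts to a single comparison. The sketch below assumes the content amount is a character count; the function name `should_separate` and the default value are hypothetical and are not taken from the patent.

```python
DEFAULT_THRESHOLD = 200  # hypothetical default, in body-text characters


def should_separate(content_amount: int, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Return True when the content should be split into display and audio parts.

    The content is split only when its amount exceeds the threshold;
    otherwise all of it is treated as display content.
    """
    return content_amount > threshold
```

With a threshold of 0, any content that has at least one body-text character is separated, which corresponds to the "always separated" setting described above.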
[0029]
Taking the content shown in FIG. 3 as an example, if its content amount exceeds the threshold, the content in FIG. 3 is separated into the display content in FIG. 4 and the audio content in FIG. 5. The content shown in FIG. 3 is displayed in the browser 102 as shown in FIG. 6, whereas the display content shown in FIG. 4, obtained after separation, is displayed in the browser 102 as shown in FIG. 7.
[0030]
In the content example shown in FIG. 3, characters inside anchor tags are not regarded as body text and are therefore extracted as display content. In contrast, characters inside <p> tags are regarded as body text and are extracted as audio content. A “*” symbol is output in place of the <p> tag text that is not output as display content, although the “*” symbol may of course be omitted.
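One way to picture this separation is a small parser that copies markup through to the display side, moves `<p>` body text to an audio list, and leaves a "*" in its place. This is only a sketch under stated assumptions: the names `ContentSeparator` and `separate` are hypothetical, the input is assumed to be well-formed HTML with closed `<p>` and `<a>` tags, and comments and declarations are simply dropped.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class ContentSeparator(HTMLParser):
    """Splits HTML into display markup and a list of texts to be read aloud."""

    def __init__(self) -> None:
        super().__init__()
        self.display_parts: List[str] = []
        self.audio_texts: List[str] = []
        self.p_depth = 0       # > 0 while inside <p>...</p>
        self.anchor_depth = 0  # > 0 while inside <a>...</a>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1
        elif tag == "a":
            self.anchor_depth += 1
        self.display_parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.display_parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "p" and self.p_depth > 0:
            self.p_depth -= 1
        elif tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1
        self.display_parts.append(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if self.p_depth > 0 and self.anchor_depth == 0 and text:
            self.audio_texts.append(text)    # body text: read aloud
            self.display_parts.append("*")   # placeholder shown on screen
        else:
            self.display_parts.append(data)  # links and other text stay visible


def separate(html: str) -> Tuple[str, List[str]]:
    """Return (display_html, audio_texts) for the given HTML."""
    separator = ContentSeparator()
    separator.feed(html)
    separator.close()
    return "".join(separator.display_parts), separator.audio_texts
```

Emitting the "*" placeholder is optional, as the text notes; a variant could append nothing on the display side instead.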
[0031]
As shown in row 402 of FIG. 3, when an image and a link destination exist in the same anchor tag, FIG. 4 shows an example in which the anchor tag containing the image is output as display content as-is; however, the display content may also exclude images. In that case, the output data conversion unit 114 converts the content of row 402, for example, as follows so that the image is not displayed.
<p><a href="http://hogehoge/senden.html"> Link to image</a></p>
[0032]
・ When both the content amount and the display size are referenced
The following describes an example in which the output data conversion unit 114 of the master unit 110 performs data conversion by referring to both the content amount and the display size held in the content amount / display size storage unit 113.
[0033]
Since the display size in the present embodiment is the number of characters displayed in the browser 102, let this be a and let the content amount be b. Then, for a predetermined function f(), the output data conversion unit 114 separates the content when the relationship
b > f(a)
holds. The function f() may be, for example, a function that simply subtracts a constant. The content separation method is the same as in the above-described data separation based only on the content amount, so its description is omitted here.
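The condition b > f(a) can be illustrated with f() chosen as "subtract a constant", which is the example the text gives. The margin value, the clamping at zero, and the function names below are assumptions made for this sketch only.

```python
def f(display_chars: int, margin: int = 10) -> int:
    """Example f(): the displayable character count minus a constant margin.

    The margin of 10 and the clamp at zero are hypothetical choices.
    """
    return max(display_chars - margin, 0)


def should_separate_for_display(content_amount: int, display_chars: int,
                                margin: int = 10) -> bool:
    """Return True when b > f(a), with b = content_amount and a = display_chars."""
    return content_amount > f(display_chars, margin)
```

For instance, with a = 80 and a margin of 10, content of 75 characters is separated (75 > 70), while content of 60 characters is shown entirely on screen.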
[0034]
As described above, the output data conversion unit 114 separates the content 120 received by the master unit 110 into audio output content and display content, and passes the former to the speech synthesis unit 115 and the latter to the signal separation / synthesis unit 116.
[0035]
・ Speech Synthesis
Next, in step S107, the speech synthesis unit 115 performs speech synthesis processing based on the audio output content output from the output data conversion unit 114.
[0036]
As described above, text data as shown in FIG. 5 is used as the audio output content of the present embodiment. This is converted into speech waveform data by a text-to-speech type synthesizing unit that converts text to speech, and sent to the signal separating / synthesizing unit 116. When performing this conversion, the speech speed may be changed according to the data amount. For example, it is conceivable to increase the utterance speed as the amount of audio output content increases.
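The idea of raising the speech rate as the amount of audio content grows could look like the following. The patent does not specify a formula, so the stepwise mapping and every parameter here (`base_rate`, `max_rate`, `chars_per_step`, `step`) are hypothetical.

```python
def speech_rate(audio_chars: int, base_rate: float = 1.0, max_rate: float = 2.0,
                chars_per_step: int = 200, step: float = 0.1) -> float:
    """Return a speech-rate multiplier that grows with the amount of audio content.

    The rate increases by `step` for every `chars_per_step` characters and is
    capped at `max_rate` so that very long content remains intelligible.
    """
    rate = base_rate + step * (audio_chars // chars_per_step)
    return min(rate, max_rate)
```

The resulting multiplier would be passed to whatever text-to-speech engine is in use; the cap is a design choice of this sketch, not something stated in the source.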
[0037]
● GUI Display and Audio Output in the Slave Unit
Next, in step S108, the audio and text data are transmitted from the master unit 110 to the slave unit 100. At this time, the display content obtained by the output data conversion unit 114 and the audio waveform data obtained by the speech synthesis unit 115 are multiplexed by the signal separation / synthesis unit 116, and the multiplexed data is sent to the signal transmission / reception unit 117. The signal transmission / reception unit 117 then transmits the multiplexed signal of the display content and the audio waveform data to the slave unit 100 according to a predetermined communication procedure, such as a so-called multimodal interface.
[0038]
In step S109, the multiplexed signal transmitted from the master unit 110 is received by the signal transmission / reception unit 104 of the slave unit 100 and is separated into display content data and audio waveform data by the signal separation / synthesis unit 103.
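Multiplexing on the master unit and separation on the slave unit can be sketched with a simple length-prefixed frame. The patent does not define a wire format, so the header layout, the UTF-8 encoding, and the function names `multiplex` and `demultiplex` are assumptions for illustration.

```python
import struct
from typing import Tuple

# Hypothetical header: display length and audio length as big-endian uint32.
HEADER = struct.Struct(">II")


def multiplex(display_html: str, audio_pcm: bytes) -> bytes:
    """Pack display content and audio waveform data into one frame."""
    display_bytes = display_html.encode("utf-8")
    header = HEADER.pack(len(display_bytes), len(audio_pcm))
    return header + display_bytes + audio_pcm


def demultiplex(frame: bytes) -> Tuple[str, bytes]:
    """Split a frame back into display content and audio waveform data."""
    display_len, audio_len = HEADER.unpack_from(frame, 0)
    start = HEADER.size
    display_bytes = frame[start:start + display_len]
    audio_pcm = frame[start + display_len:start + display_len + audio_len]
    if len(display_bytes) != display_len or len(audio_pcm) != audio_len:
        raise ValueError("truncated frame")
    return display_bytes.decode("utf-8"), audio_pcm
```

A real system would more likely interleave audio in small chunks so that playback can start before the whole frame arrives; the single-frame layout here only shows the combine-and-split relationship between units 116 and 103.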
[0039]
Next, in step S110, the display content data separated by the signal separation / synthesis unit 103 is sent to the browser 102, so that text data is displayed. That is, the browser 102 receives, for example, data as shown in FIG. 4 and performs a display as shown in FIG. 7 on its display.
[0040]
In step S111, the audio output is performed by sending the audio waveform data separated by the signal separation / synthesis unit 103 to the audio output unit 105. The audio output unit 105 converts the audio waveform data into an analog audio waveform using a D / A converter, and outputs the analog audio waveform from the speaker 106.
[0041]
● System Configuration by Software
The configuration of the present embodiment is not limited to the hardware configuration shown in FIG. 1 and can also be realized by software. FIG. 8 shows a configuration example for realizing the present embodiment by software. In the figure, reference numeral 200 denotes a slave unit and 210 denotes a master unit; the content 120 and the network 121 are the same as in FIG. 1.
[0042]
The content reception / request transmission module 213 shown in the master unit 210 of FIG. 8 performs processing equivalent to that of the content reception / request transmission unit 111 shown in FIG. 1. Similarly, the signal transmission / reception module 214 shown in FIG. 8 performs processing equivalent to that of the signal transmission / reception unit 117 shown in FIG. 1.
[0043]
In the configuration shown in FIG. 8, the content amount calculation unit 112, the output data conversion unit 114, the speech synthesis unit 115, and the signal separation / synthesis unit 116 in the master unit 110 shown in FIG. 1 are held as programs in the memory 212 of the master unit 210, and their respective functions are realized by the CPU 211 executing those programs. The content held in the content amount / display size storage unit 113 shown in FIG. 1 is held in the memory 212 in FIG. 8. The memory 212 may take any form generally used in computers, such as a ROM, a RAM, a hard disk, a magnetic tape, an FD, or a flash memory. A plurality of memory media, such as a ROM, a RAM, and a hard disk, may also be used in combination.
[0044]
Although the present embodiment has been described using a system consisting of one master unit and one slave unit as an example, it can be realized by a similar method even when there are a plurality of slave units.
[0045]
As described above, according to the present embodiment, information can be provided to the user in an optimal form by optimizing the GUI and audio output methods according to the size of the display unit in the slave unit and the amount of content to be displayed.
[0046]
Further, by enabling the user to set the thresholds for the display size and the content amount, the user can customize the output method to be optimal for the user.
[0047]
<Second embodiment>
Hereinafter, a second embodiment according to the present invention will be described.
[0048]
In the first embodiment described above, when the audio output of the received content and the GUI are used together, the positional relationship between the currently read content and the displayed content is not particularly considered. Therefore, the second embodiment is characterized in that the user can easily grasp the relative positional relationship between the two.
[0049]
Hereinafter, description will be made with reference to the block diagram shown in FIG. 1.
[0050]
In the second embodiment, when the speech synthesis unit 115 of the base unit 110 synthesizes speech, output data is generated sentence by sentence, and predetermined information indicating a break between sentences (hereinafter referred to as “break information”) is inserted between sentences.
[0051]
Further, the audio output unit 105 of the child device 100 sends a predetermined start signal to the browser 102 at the start of the audio output. Upon receiving the start signal, the browser 102 blinks, for example, “*” corresponding to the first row 501 in FIG.
[0052]
When the audio output unit 105 detects the above-mentioned break information during audio output, the audio output unit 105 sends the break information to the browser 102 while synchronizing with the audio utterance. Upon receiving the break information, the browser 102 blinks the next “*” corresponding to the row 502 shown in FIG. As described above, by blinking “*” sequentially while synchronizing with the voice utterance, the relative positional relationship between the content being read and the content being displayed can be easily grasped.
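The exchange of the start signal and the break information described above might be sketched in Python as follows. The `BREAK` sentinel, the `Browser` class, and the functions `synthesize` and `play` are hypothetical stand-ins for the break information, the browser 102, the speech synthesis unit 115, and the audio output unit 105; actual audio playback and screen drawing are omitted.

```python
BREAK = object()  # stand-in for the predetermined break information


def synthesize(sentences):
    """Hypothetical synthesizer: one audio chunk per sentence,
    with BREAK inserted between consecutive sentences."""
    stream = []
    for i, sentence in enumerate(sentences):
        if i > 0:
            stream.append(BREAK)
        stream.append(f"<audio:{sentence}>")
    return stream


class Browser:
    """Tracks which row's "*" symbol is currently blinking."""

    def __init__(self, row_count):
        self.row_count = row_count
        self.blinking_row = None  # no row blinks before the start signal

    def on_start(self):
        self.blinking_row = 0  # blink the "*" of the first row

    def on_break(self):
        # Move the blinking "*" to the next row, if there is one.
        if self.blinking_row is not None and self.blinking_row + 1 < self.row_count:
            self.blinking_row += 1


def play(stream, browser):
    """Hypothetical audio output loop: forwards the start signal and each
    BREAK to the browser, and records which row was blinking per chunk."""
    log = []
    browser.on_start()
    for item in stream:
        if item is BREAK:
            browser.on_break()
        else:
            log.append((item, browser.blinking_row))
    return log
```

In this sketch the browser never inspects the audio itself; it only reacts to signals, which is what keeps the blinking row aligned with the sentence being uttered.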
[0053]
In the second embodiment, an example in which the “*” symbols blink in display order has been described. However, when a “*” symbol is also present inside an anchor tag, the voice utterance and the blinking of the “*” symbol may fall out of step. To avoid this problem, the output data conversion unit 114 may insert a special tag beside the “*” symbol when it outputs the display content. In this case, the browser 102 does not display the special tag, and when blinking a “*” symbol, it blinks only a “*” symbol that follows the special tag.
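One possible way for a browser to handle such a special tag is sketched below in Python. The tag name `<sync/>` and the function `blink_targets` are hypothetical; the specification does not name the tag or define its syntax. The sketch strips the tag from the visible text and records only the positions of “*” symbols that directly follow it.

```python
SYNC_TAG = "<sync/>"  # hypothetical tag name, assumed for this sketch


def blink_targets(display_content: str):
    """Return (visible_text, offsets).

    visible_text is the content with every SYNC_TAG removed, and offsets
    lists the index in visible_text of each "*" that directly follows a
    SYNC_TAG. Any other "*" (for example one inside link text) is shown
    but is never a blink target.
    """
    visible = []
    offsets = []
    pos = 0
    while pos < len(display_content):
        if display_content.startswith(SYNC_TAG + "*", pos):
            offsets.append(len(visible))  # this "*" may blink
            visible.append("*")
            pos += len(SYNC_TAG) + 1
        elif display_content.startswith(SYNC_TAG, pos):
            pos += len(SYNC_TAG)          # stray tag: hide it
        else:
            visible.append(display_content[pos])
            pos += 1
    return "".join(visible), offsets
```

For example, `blink_targets("<sync/>*News a*b <sync/>*Weather")` yields the visible text `*News a*b *Weather` with blink targets at offsets 0 and 10, so the “*” inside `a*b` is ignored.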
[0054]
In the same manner, when a link portion is read out, the link being read out can be highlighted. Also, when the browser detects the special tag, the content enclosed by the special tag can be displayed scrolling from right to left in synchronization with the audio output, as shown in FIG. 9.
[0055]
As described above, according to the second embodiment, when content is output using both voice output and the GUI, the relative positional relationship between the content currently being read out and the displayed content can be easily grasped.
[0056]
<Third embodiment>
Hereinafter, a third embodiment according to the present invention will be described.
[0057]
In the above-described first embodiment, an example has been described in which content output is performed only in the child device, but in the third embodiment, an example in which content output is similarly performed in the parent device will be described.
[0058]
FIG. 10 is a block diagram illustrating a configuration of a parent-child telephone according to the third embodiment. In the figure, the same components as those in FIG. 1 shown in the above-described first embodiment are denoted by the same reference numerals, and detailed description is omitted.
[0059]
The configuration shown in FIG. 10 is characterized in that a browser 919 and an audio output unit 921 are also provided in the parent device 910. In the parent device 910 of the third embodiment, as in the content output processing in the child device 100, audio is output from the speaker 920 and a display is presented in the browser 919, based on the audio output content and the display content obtained by the output data conversion unit 114.
[0060]
However, in the third embodiment, the display size of the browser is assumed to differ between the parent device 910 and the child device 100. The content amount / display size storage unit 913 in the parent device 910 therefore needs to store in advance the display sizes of both the browser 102 of the child device 100 and the browser 919 of the parent device 910. The output data conversion unit 914 refers to the display size of the browser 919 when creating display content for the parent device 910, and to the display size of the browser 102 when creating display content for the child device 100.
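The per-device lookup of the display size could be sketched in Python as follows. The dictionary `DISPLAY_SIZES`, its keys and values (given here in character columns and rows), and the truncation rule in `make_display_content` are all assumptions for illustration; the embodiment only states that both display sizes are stored and that the appropriate one is referred to.

```python
# Hypothetical stored display sizes: (columns, rows) in characters.
DISPLAY_SIZES = {
    "parent_browser_919": (40, 12),
    "child_browser_102": (16, 4),
}


def make_display_content(lines, target):
    """Fit the given lines to the display size stored for the target browser.

    Assumed rule: keep at most `rows` lines and cut each line to `cols`
    characters. A real converter might summarize instead of truncating.
    """
    cols, rows = DISPLAY_SIZES[target]
    return [line[:cols] for line in lines[:rows]]
```

The same source lines thus produce different display content for each device, depending only on which stored display size is referred to.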
[0061]
As described above, according to the third embodiment, similarly to the child device, the parent device can provide information in a form optimal for the user by using the display by the browser and the audio output together.
[0062]
<Other embodiments>
The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a speaker, and a printer), or to an apparatus consisting of a single device (for example, a telephone or a facsimile device).
[0063]
Needless to say, the object of the present invention can also be achieved by supplying a storage medium storing program code of software that realizes the functions of the above-described embodiments to a system or an apparatus, and by having a computer (or CPU or MPU) of the system or apparatus read out and execute the program code stored in the storage medium.
[0064]
In this case, the program code itself read from the storage medium realizes the function of the above-described embodiment, and the storage medium storing the program code constitutes the present invention.
[0065]
As a storage medium for supplying the program code, for example, a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a magnetic tape, a nonvolatile memory card, a ROM, and the like can be used.
[0066]
Needless to say, the functions of the above-described embodiments are realized not only when the computer executes the read program code, but also when an OS (Operating System) running on the computer performs part of the actual processing based on the instructions of the program code and the functions of the above-described embodiments are realized by that processing.
[0067]
Needless to say, the present invention also includes the case where, after the program code read from the storage medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, a CPU or the like provided on the function expansion board or in the function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.
[0068]
[Effect of the invention]
As described above, according to the present invention, it is possible to provide optimal information by appropriately using screen display and audio output.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a parent-child phone according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an information output process according to the embodiment.
FIG. 3 is a diagram illustrating an example of content acquired in the embodiment.
FIG. 4 is a diagram illustrating an example of converted display content according to the embodiment.
FIG. 5 is a diagram illustrating an example of converted audio output content according to the embodiment.
FIG. 6 is a diagram illustrating a display example of content acquired in the embodiment.
FIG. 7 is a diagram illustrating a display example of display content according to the embodiment.
FIG. 8 is a block diagram showing a configuration of a parent-child telephone when the present embodiment is realized by software.
FIG. 9 is a diagram illustrating a display example of display content according to the second embodiment.
FIG. 10 is a block diagram illustrating a configuration of a parent-child telephone according to a third embodiment.
[Explanation of symbols]
101 CPU
102 Main memory
103 SCSI I/F
104 Network I/F
105 HDD
106 Graphic accelerator
107 Color monitor
108 USB controller
109 Color printer
110 Keyboard / mouse controller
111 Keyboard
112 Mouse
113 LAN
114 PCI bus

Claims

1. An information providing system comprising a first device and a second device, wherein
the first device comprises:
information data acquisition means for acquiring information data from an external device;
separating means for separating the acquired information data into display data and audio output data;
voice synthesis means for generating audio data based on the audio output data; and
transmitting means for transmitting the display data and the audio data to the second device, and
the second device comprises:
receiving means for receiving the display data and the audio data from the first device;
display means for displaying the display data; and
audio output means for outputting the audio data as audio.

2. The information providing system according to claim 1, wherein said information data acquiring means acquires web content described in a predetermined description language via a network as said information data.

3. The information providing system according to claim 2, wherein the separating means separates the web content into display content and audio content according to the type of tag in the description language.

4. The information providing system according to claim 1, wherein the first device further comprises:
data amount calculation means for calculating the data amount of the acquired information data; and
holding means for holding the data amount, and
the separating means changes a ratio between the display data and the audio output data based on the data amount held in the holding means.

5. The information providing system according to claim 4, wherein the holding means further holds display size information on the display means of the second device, and
the separating means changes the ratio between the display data and the audio output data based on the data amount and the display size information.

6. The information providing system according to claim 1, wherein the first device further comprises:
display means for displaying the display data; and
audio output means for outputting the audio data as audio.

7. The information providing system according to claim 1, wherein the voice synthesis means changes a generation speed of the audio data according to a data amount of the audio output data.

8. The information providing system according to claim 1, wherein,
in the first device, the voice synthesis means inserts a special tag at a predetermined position in the audio data, and
in the second device, the audio output means, upon detecting the special tag during output of the audio data, outputs information on the special tag to the display means, and
the display means displays the display data in synchronization with the information on the special tag.

9. A method for controlling an information providing system including a first device and a second device, the method comprising:
in the first device,
an information data acquisition step of acquiring information data from an external device;
a separating step of separating the acquired information data into display data and audio output data;
a voice synthesis step of generating audio data based on the audio output data; and
a transmitting step of transmitting the display data and the audio data to the second device; and
in the second device,
a receiving step of receiving the display data and the audio data from the first device;
a display step of displaying the display data; and
an audio output step of outputting the audio data as audio.

10. A program which, when executed on a computer, causes the computer to operate as the first device according to any one of claims 1 to 7.

11. A program which, when executed on a computer, causes the computer to operate as the second device according to any one of claims 1 to 7.

12. A recording medium on which the program according to claim 10 or 11 is recorded.