JP2004171111A

JP2004171111A - Web browser control method and device

Info

Publication number: JP2004171111A
Application number: JP2002333559A
Authority: JP
Inventors: Kazuhiko Shudo; 和彦首藤
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2002-11-18
Filing date: 2002-11-18
Publication date: 2004-06-17
Anticipated expiration: 2022-11-18
Also published as: JP4110938B2

Abstract

PROBLEM TO BE SOLVED: When a talking browser reads out an entire page of a tagged Internet document, it takes time to reach the required information.
SOLUTION: The tagged document is sorted automatically so that the part to be read out by the talking browser can be designated. With this configuration, Internet information can be provided efficiently without looking at the screen, even on a small display screen.
COPYRIGHT: (C)2004,JPO

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a method for controlling software used to browse Internet Web pages by voice. More specifically, it relates to a method for controlling a voice browser and an Internet browser on a mobile phone or similar device.
[0002]
[Prior art]
In recent years, Internet use has increased greatly, and the Internet has accordingly come to be used in ways that differ from those of the past. Information on the Internet is basically written in a format called HTML (HyperText Markup Language).
There are devices, such as mobile phones with small screens and ordinary subscriber landline telephones with small displays, that can connect to the Internet, load this HTML, and make use of the Internet.
Furthermore, there are devices with which visually impaired people can access information on the Internet without referring to a screen that displays it, by using Web page reading software, a voice browser, or the like implemented with a reading device for documents with attributes (see, for example, Patent Document 1).
[0003]
[Patent Document 1]
JP-A-11-327870 (pages 2-3, FIG. 1)
[0004]
[Problems to be solved by the invention]
However, Web sites written in general HTML are often created on the assumption that they will be displayed on a high-resolution display device. Such sites often cannot be browsed comfortably on a small display screen or with a voice browser. One factor that makes such Web sites complicated is that a navigation menu is included as part of the screen display. Here, a navigation menu is not the main body text of the display but a group of links for moving to other pages.
For example, a Web site that presents news on the Internet has, in addition to the news text itself, a navigation menu that lists only topics such as "Economy", "International", and "Markets". Selecting "Economy", for instance, takes the user to a news page dedicated to that topic. The part that lists only these topics is called the navigation menu.
Such a navigation menu is very convenient for a user browsing a website using a high-resolution display device.
However, when a speech synthesizer reads a Web page aloud, having it voice such a navigation menu is a nuisance for a user who wants only the body text.
Furthermore, when the Internet is accessed from a mobile phone or the like, many navigation menu items are displayed alongside the body text on a small screen, so scrolling to read the body text is often tedious.
[0005]
The present invention has been made in view of these problems. It makes comfortable Web browsing possible by voice, or on a small screen such as that of a mobile phone, without the user being hindered by the navigation menus in a Web page.
[0006]
[Means for Solving the Problems]
The present invention employs the following means in order to solve the above problems.
[0007]
That is, the content of a Web site written in HTML or the like on the Internet contains attribute data (hereinafter sometimes called tags) that define formatting conditions and the like. The present invention uses these tags as control information for reading aloud. The Web page reading method of the present invention, for a device that analyzes content with attributes and reads aloud the text portions of the content by speech synthesis means, is characterized by comprising: a step of analyzing the content with attributes; and a step of reading aloud, when the HTML content is read aloud, the text portions in principle according to the reading conditions obtained in the analysis step.
[0008]
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 1 is a schematic diagram showing the configuration of the first embodiment of the present invention.
The first embodiment assumes a mode in which a visually impaired person accesses the Internet without using a screen, relying only on voice and an input device such as a keyboard, through Web reading software based on text-to-speech (TTS) technology. Of course, people without visual impairments can also use it.
[0009]
As shown in FIG. 1, network terminal 2, which is connected to network 1 typified by the Internet, stores the following: navigation menu division unit 3; speech output description language conversion unit 4, which converts an HTML file into a speech dialogue description language; speech dialogue control unit 5, which controls the speech dialogue description language to carry on a conversation; and skip processing generation unit 6, which, when an HTML file being converted into the speech dialogue description language contains links, sets up a move to the item following the navigation menu that contains them. Network terminal 2 further includes input unit 8, such as a keyboard, and text-to-speech unit 7, which outputs speech in response to instructions from speech dialogue control unit 5.
[0010]
Hereinafter, the processing flow will be described with reference to the flowchart of FIG. 2.
First, content written in HTML and distributed over the network is received. The received content is stored as an HTML file in a temporary storage means of network terminal 2, such as a memory (not shown) (S201).
Normally, received content is displayed on a display means such as a display; in the present embodiment, however, it is not displayed. If the HTML file stored in the memory contains a predetermined tag, for example <FRAMESET>, navigation menu division unit 3 divides the received and stored HTML file into a navigation menu and body text (S202). Any method of navigation menu division may be used in the present invention.
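As a hedged illustration of the trigger condition only, the sketch below uses Python's standard `html.parser` to check whether a stored page contains a <FRAMESET> tag and to list the frames it loads. Treating those frames as candidate navigation and body parts is an assumption made for illustration, since the text leaves the division method open; the class name `FramesetScanner` is hypothetical.

```python
from html.parser import HTMLParser


class FramesetScanner(HTMLParser):
    """Records whether a page uses <FRAMESET> and which frames it loads."""

    def __init__(self):
        super().__init__()
        self.has_frameset = False
        self.frames = []  # (name, src) pairs in document order

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case, so <FRAMESET> matches.
        if tag == "frameset":
            self.has_frameset = True
        elif tag == "frame":
            a = dict(attrs)
            self.frames.append((a.get("name"), a.get("src")))


PAGE = (
    "<HTML><FRAMESET cols='25%,75%'>"
    "<FRAME name='menu' src='menu.html'>"
    "<FRAME name='main' src='main.html'>"
    "</FRAMESET></HTML>"
)

scanner = FramesetScanner()
scanner.feed(PAGE)
print(scanner.has_frameset, scanner.frames)
```

If `has_frameset` is false, a content-based division method would be needed instead.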
[0011]
Here, an example of the division method used by navigation menu division unit 3 will be described.
First, the received HTML content is basically a text document with tags such as <html> and <head>.
This tagged text is parsed according to the DOM (Document Object Model) to create a DOM tree. Next, the following feature quantities are calculated for each folder (node) in the hierarchical structure of the DOM tree.
1. The total number of lower-level folders that are links.
2. The average text length of those lower-level folders that are both text and links.
3. The ratio of lower-level folders that are links to those that are not.
The above three feature quantities are obtained in this way for each folder and compared with preset values.
When all three feature quantities match the preset values, the portion occupied by that folder is judged to be the navigation menu portion of the Web page.
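The three-feature test can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: Python's `html.parser` stands in for a full DOM parser, "lower-level folders" are taken to be the links and non-link text runs below a node, feature 3 is expressed as the share of those items that are links, and "matching the preset values" is interpreted as passing hypothetical thresholds (`MIN_LINKS`, `MAX_AVG_LINK_LEN`, `MIN_LINK_SHARE`) whose numbers are invented for the example.

```python
from html.parser import HTMLParser

# Hypothetical thresholds standing in for the "preset values" in the text.
MIN_LINKS = 3           # feature 1: at least this many links below the node
MAX_AVG_LINK_LEN = 20   # feature 2: link texts are short on average
MIN_LINK_SHARE = 0.8    # feature 3: links dominate the items below the node
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A minimal element node standing in for a full DOM node."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""  # text appearing directly inside this element


class TreeBuilder(HTMLParser):
    """Builds a simplified element tree from (possibly sloppy) HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; ignore stray end tags.
        n = self.cur
        while n is not None and n.tag != tag:
            n = n.parent
        if n is not None and n.parent is not None:
            self.cur = n.parent

    def handle_data(self, data):
        self.cur.text += data.strip()


def all_text(node):
    return node.text + "".join(all_text(c) for c in node.children)


def items_below(node):
    """Yield (is_link, text) for each link and non-link text run below node."""
    for child in node.children:
        if child.tag == "a":
            yield True, all_text(child)
        else:
            if child.text:
                yield False, child.text
            yield from items_below(child)


def features(node):
    items = list(items_below(node))
    if node.text:
        items.append((False, node.text))
    links = [text for is_link, text in items if is_link]
    n_links = len(links)                                          # feature 1
    avg_len = sum(map(len, links)) / n_links if n_links else 0.0  # feature 2
    share = n_links / len(items) if items else 0.0                # feature 3
    return n_links, avg_len, share


def looks_like_nav(node):
    n_links, avg_len, share = features(node)
    return (n_links >= MIN_LINKS and avg_len <= MAX_AVG_LINK_LEN
            and share >= MIN_LINK_SHARE)


def find_nav_menus(node):
    """Return the outermost nodes judged to be navigation menus."""
    found = []
    for child in node.children:
        if looks_like_nav(child):
            found.append(child)
        else:
            found.extend(find_nav_menus(child))
    return found


SAMPLE = (
    "<html><body>"
    '<ul><li><a href="/eco">Economy</a></li>'
    '<li><a href="/intl">International</a></li>'
    '<li><a href="/mkt">Markets</a></li></ul>'
    "<p>Stocks rose sharply after the rate cut.</p>"
    "</body></html>"
)
builder = TreeBuilder()
builder.feed(SAMPLE)
menus = find_nav_menus(builder.root)
print([m.tag for m in menus], features(menus[0]))  # ['ul'] (3, 9.0, 1.0)
```

Returning the outermost matching node keeps a whole menu together instead of reporting its list items one by one; the enclosing <body> fails here because the paragraph lowers its link share to 0.75.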
[0012]
Next, to realize speech output, speech output description language conversion unit 4 converts the navigation menu and the body text from the HTML file into a speech output description language for speech output (S203).
The speech output description language may be a standard language for speech output such as VoiceXML (Voice Extensible Markup Language), or a proprietary language.
Next, the skip processing for the link destinations in the navigation menu is also implemented in the speech output description language, so that it too can be handled by voice (S204).
[0013]
Text-to-speech unit 7 reads aloud, in order, the HTML content converted into the speech output description language. In the present invention, navigation menu division unit 3 identifies the navigation menu portion, and reading starts from the navigation menu portion of the HTML content (S205).
[0014]
Here, an example of skip processing will be described: the case in which the user issues a skip command while the navigation menu is being read aloud (S206).
The user enters the skip command through input unit 8. For example, if input unit 8 is a keyboard, the command may be a press of a specific key; if input unit 8 is a speech recognition device, the command may be the user saying "skip", as detected by that device. In either case, speech dialogue control unit 5 receives the user's input and instructs text-to-speech unit 7 to perform skip processing. Text-to-speech unit 7 skips the entire block of the navigation menu being read, moves to the body text, and reads the body text aloud (S207).
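A minimal simulation of S205 to S207 follows, under assumptions: the page has already been split into segments labelled with the hypothetical tags "nav" and "body", and the callback `poll_input` stands in for input unit 8 by reporting whether a skip command arrived while a given segment was being read. A real system would listen for the command concurrently with speech rather than after each item.

```python
from typing import Callable, List, Optional, Tuple

# (kind, text); "nav" and "body" are hypothetical labels from the division step.
Segment = Tuple[str, str]


def read_aloud(segments: List[Segment],
               poll_input: Callable[[int], Optional[str]]) -> List[str]:
    """Simulate text-to-speech unit 7 reading segments in document order.

    poll_input(i) stands in for input unit 8: it returns "skip" if the user
    issued a skip command while segment i was being read, otherwise None.
    """
    spoken: List[str] = []
    skipping = False
    for i, (kind, text) in enumerate(segments):
        if kind != "nav":
            skipping = False   # the skip ends where the body text begins
        elif skipping:
            continue           # drop the rest of the current menu block
        spoken.append(text)    # a real system would hand this to a TTS engine
        if kind == "nav" and poll_input(i) == "skip":
            skipping = True
    return spoken


page = [("nav", "Economy"), ("nav", "International"), ("nav", "Markets"),
        ("body", "Stocks rose sharply after the rate cut.")]
# The user presses the skip key while "International" (index 1) is read.
result = read_aloud(page, lambda i: "skip" if i == 1 else None)
print(result)  # ['Economy', 'International', 'Stocks rose sharply after the rate cut.']
```

Because the skip flag is cleared as soon as a non-menu segment appears, a later menu block on the same page would again be read until the user skips it.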
[0015]
In this way, a user who is not interested in the navigation menu being read aloud can proceed immediately to the next item without waiting for the unneeded navigation menu to finish. Because what is skipped is a unit of the navigation menu rather than an arbitrary length of text, there is less risk of accidentally skipping a needed part.
[0016]
The conversion from an HTML file into a description language for speech output depends on the speech output description language used.
In this embodiment, as an example, only text and links among the elements of the HTML file are handled; HTML elements other than text and links, such as images, are assumed to be omitted. The speech output description language is assumed to be able to describe a TTS reading element, a user input element such as key input, and a jump that specifies a destination, corresponding to an HTML link.
HTML text becomes an element read aloud by TTS. For an HTML link, its text is read aloud by TTS, and the move to the link destination is also described in the speech output description language.
[0017]
To allow a move to a link destination, the dialogue is arranged so that some input can be obtained from the user immediately after the link is read aloud. For example, when key input is to be obtained from the user, speech dialogue control unit 5 creates speech output description language that says something like "To go to the link destination, press 1."
[0018]
The speech output created in this way guides the user's input. When the user provides input, speech dialogue control unit 5 operates and performs processing such as jumping to the link destination according to that input.
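The conversion rules above can be sketched with a tiny, hypothetical element vocabulary in place of a real speech output description language: `Say` stands for a TTS reading element and `AskJump` for a user input element paired with a jump. Both class names, the `convert`/`run` helpers, and the simulated key presses are assumptions for illustration; a VoiceXML target would express the same steps with its own elements.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple, Union

LINK_PROMPT = "To go to the link destination, press 1."


@dataclass
class Say:
    """TTS reading element."""
    text: str


@dataclass
class AskJump:
    """User input element paired with the jump it triggers."""
    prompt: str
    key: str
    target: str


Element = Union[Say, AskJump]


def convert(items: Iterable[Tuple[str, str, Optional[str]]]) -> List[Element]:
    """Turn (kind, text, href) items into dialogue elements.

    kind is "text" or "link"; images and other elements are assumed to have
    been dropped already.
    """
    out: List[Element] = []
    for kind, text, href in items:
        out.append(Say(text))  # text and link text are both read by TTS
        if kind == "link":
            out.append(AskJump(LINK_PROMPT, "1", href))
    return out


def run(elements: List[Element],
        keys: Iterable[Optional[str]]) -> Tuple[List[str], Optional[str]]:
    """Play elements; return what was spoken and the jump target, if any.

    keys supplies one simulated key press (or None) per AskJump element.
    """
    spoken: List[str] = []
    presses = iter(keys)
    for el in elements:
        if isinstance(el, Say):
            spoken.append(el.text)
        else:
            spoken.append(el.prompt)
            if next(presses, None) == el.key:
                return spoken, el.target  # jump, as speech dialogue control unit 5 would
    return spoken, None


elements = convert([("text", "Today's headlines.", None),
                    ("link", "Economy", "/eco"),
                    ("link", "Markets", "/mkt")])
spoken, target = run(elements, [None, "1"])  # user ignores the first prompt
print(target)  # /mkt
```

Placing the prompt immediately after each link's text keeps the choice tied to the link the user has just heard.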
[0019]
As described above, the user can skip, by input, the reading of the navigation menu within the HTML content read by the voice browser, and can therefore omit the navigation menu reading when it is not needed.
[0020]
FIG. 3 is a schematic diagram showing the configuration of the second embodiment of the present invention.
In the second embodiment, the body text and the navigation menu are separated and presented to the user separately. As a result, the reading order does not necessarily match the order in the HTML content.
[0021]
As shown in FIG. 3, network terminal 2, which is connected to network 1 typified by the Internet, stores the following: navigation menu division unit 3; speech output description language conversion unit 4, which converts an HTML file into a speech dialogue description language; speech dialogue control unit 5, which controls the speech dialogue description language and establishes a dialogue with the user; and skip processing generation unit 6, which, when an HTML file being converted into the speech dialogue description language contains links, sets up a move to the item following the navigation menu that contains them. Network terminal 2 further includes input unit 8, such as a keyboard, and text-to-speech unit 7, which outputs speech in response to instructions from speech dialogue control unit 5.
[0022]
Hereinafter, the processing flow will be described with reference to the flowchart of FIG. 4.
First, content written in HTML and distributed over the network is received. The received content is stored as an HTML file in a temporary storage means of network terminal 2, such as a memory (not shown) (S401).
Normally, received content is displayed on a display means such as a display; in the present embodiment, however, it is not displayed. If the HTML file stored in the memory contains a predetermined tag, for example <FRAMESET>, navigation menu division unit 3 divides the received and stored HTML file into a navigation menu and body text (S402). Any method of navigation menu division may be used in the present invention.
[0023]
When navigation menu division unit 3 has identified and separated the navigation menu and the body text, a new speech dialogue element called a selection dialogue is added. It describes speech output that first lets the user choose whether to hear the body text or the navigation menu. The user can then, according to preference, have either the body text or the navigation menu read aloud, and, if desired, follow the link destinations in the navigation menu.
The sentence for this selection dialogue, which contains a question, is created by speech dialogue control unit 5. For example, it creates a sentence asking the user, "Would you like to hear the body text, or the navigation menu?" (S403)
[0024]
At this time, speech output description language conversion unit 4 converts the body text portion, the navigation menu portion, and the selection dialogue portion each into an appropriate speech output description language. Of the speech output description files created by speech output description language conversion unit 4, the file for the body text portion is called Voice-B, the file for the navigation menu portion Voice-N, and the file for the selection dialogue Voice-C.
[0025]
Next, the user's answer to Voice-C is obtained through input unit 8 (S404).
Then, according to the user's input, text-to-speech unit 7 processes Voice-N if the user chose the navigation menu (S406), or Voice-B if the user chose the body text (S405). In either case, text-to-speech unit 7 produces the corresponding speech output.
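If VoiceXML is chosen as the speech output description language, the selection dialogue Voice-C maps naturally onto a VoiceXML 2.0 <menu> with one <choice> per option. The sketch below builds such a document with Python's `xml.etree.ElementTree`; the file names `voice-b.vxml` and `voice-n.vxml` and the DTMF key assignment are assumptions, not taken from the text.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def q(tag: str) -> str:
    """Qualify a tag name with the VoiceXML namespace."""
    return f"{{{VXML_NS}}}{tag}"


def build_voice_c(body_uri: str, nav_uri: str) -> str:
    """Build a selection dialogue (Voice-C) as a VoiceXML 2.0 <menu>.

    Pressing 1 (or saying "body text") goes to Voice-B; pressing 2 (or saying
    "navigation menu") goes to Voice-N.
    """
    ET.register_namespace("", VXML_NS)  # serialize as the default namespace
    vxml = ET.Element(q("vxml"), {"version": "2.0"})
    menu = ET.SubElement(vxml, q("menu"), {"id": "select"})
    prompt = ET.SubElement(menu, q("prompt"))
    prompt.text = "Would you like to hear the body text, or the navigation menu?"
    for key, label, uri in (("1", "body text", body_uri),
                            ("2", "navigation menu", nav_uri)):
        choice = ET.SubElement(menu, q("choice"), {"dtmf": key, "next": uri})
        choice.text = label  # the choice text is used to generate its speech grammar
    return ET.tostring(vxml, encoding="unicode")


# Hypothetical file names for the Voice-B and Voice-N documents.
doc = build_voice_c("voice-b.vxml", "voice-n.vxml")
print(doc)
```

A VoiceXML interpreter then handles S404 to S406 by itself: it plays the prompt, waits for a key or spoken answer, and loads whichever document the matching choice names.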
[0026]
In this manner, the navigation menu portion of the HTML content is identified, the body text and the navigation menu are separated, and the user chooses which one to have read aloud. A user who wants to hear only the body text, for example, can therefore avoid the unneeded reading of the navigation menu.
[0027]
The above embodiments assume Web browsing by voice alone. The third embodiment assumes an environment in which a display device can be used in addition to voice.
Many recent mobile terminals have a small display screen on which text and images can be displayed.
In such cases, the Web can be used through the screen alone. Browsing content designed for small displays poses no problem; general HTML Web pages, however, are created on the assumption of a high-resolution display screen, so they are hard to read and inconvenient on a small screen.
[0028]
FIG. 5 is a schematic diagram showing the configuration of the third embodiment of the present invention.
To solve this inconvenience, the present embodiment proposes a method that makes effective use of both voice and a small display screen.
[0029]
As shown in FIG. 5, network terminal 2, which is connected to network 1 typified by the Internet, stores navigation menu division unit 3; speech output description language conversion unit 4, which converts an HTML file into a speech dialogue description language; and media integration unit 11, which controls the browser (that is, starts and stops reading aloud) and synchronizes the spoken presentation of the body text with the on-screen presentation of the navigation menu. Media integration unit 11 contains speech dialogue control unit 5, which controls the speech dialogue description language and carries on the conversation, and HTML browser unit 10. Network terminal 2 further includes text-to-speech unit 7, input unit 8, and display unit 9.
[0030]
Hereinafter, the processing flow will be described with reference to the flowchart of FIG. 6.
First, content written in HTML and distributed over the network is received. The received content is stored as an HTML file in a temporary storage means of network terminal 2, such as a memory (not shown) (S601).
Next, navigation menu division unit 3 analyzes the received and stored HTML file and divides its text into a navigation menu and body text (S602). Any method of navigation menu division may be used in the present invention.
[0031]
Next, the divided navigation menu is sent to media integration unit 11, and the divided body text is sent to speech output description language conversion unit 4, which converts it into the speech output description language (S603). The converted body text data is then sent to media integration unit 11.
Media integration unit 11 passes the body text data to speech dialogue control unit 5, which reads the body text aloud using text-to-speech unit 7.
The navigation menu, meanwhile, is displayed on display unit 9 using HTML browser unit 10 (S604). This processing of the navigation menu is the same as that performed by an ordinary Internet browser.
[0032]
In this way, the body text of the HTML content is not displayed on the screen but is presented to the user mainly by voice (S605). The user can therefore listen to the body text while checking the navigation menu on the screen (S607).
[0033]
Media integration unit 11 coordinates speech dialogue control unit 5 and HTML browser unit 10. For example, when the user selects a link in the on-screen navigation menu through input unit 8, such as a keyboard, reading of the body text stops immediately and the browser moves to the selected link destination (S606).
In such cases, media integration unit 11 starts and stops reading aloud and synchronizes the spoken presentation of the body text with the on-screen display of the navigation menu.
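A single-threaded sketch of how media integration unit 11 might keep the two media in step follows, under assumptions: `load_page` is a hypothetical callback that returns an already-split page, speech is simulated one sentence at a time by `speak_next`, and stopping speech is modelled by discarding the queued sentences. A real terminal would drive an asynchronous TTS engine and a browser widget instead.

```python
from typing import Callable, List, Tuple

Link = Tuple[str, str]               # (label, href)
Page = Tuple[List[Link], List[str]]  # (navigation links, body sentences)


class MediaIntegration:
    """Hypothetical stand-in for media integration unit 11."""

    def __init__(self, load_page: Callable[[str], Page]):
        self.load_page = load_page       # fetches and splits a page (S601-S603)
        self.nav_links: List[Link] = []  # what HTML browser unit 10 shows
        self.queue: List[str] = []       # body sentences not yet spoken
        self.spoken: List[str] = []      # log of text-to-speech unit 7 output

    def open(self, href: str) -> None:
        self.nav_links, sentences = self.load_page(href)  # menu on screen (S604)
        self.queue = list(sentences)                      # body text by voice (S605)

    def speak_next(self) -> bool:
        """Speak one queued sentence; return False when nothing is left."""
        if not self.queue:
            return False
        self.spoken.append(self.queue.pop(0))
        return True

    def select_link(self, index: int) -> None:
        """The user picked an on-screen link: stop speech, then navigate (S606)."""
        self.queue.clear()  # stop reading the current body text
        self.open(self.nav_links[index][1])


PAGES = {
    "/top": ([("Economy", "/eco"), ("Markets", "/mkt")],
             ["Top story one.", "Top story two.", "Top story three."]),
    "/eco": ([("Top", "/top")], ["Economy story."]),
}
m = MediaIntegration(PAGES.__getitem__)
m.open("/top")
m.speak_next()      # the first sentence is read
m.select_link(0)    # the user selects "Economy" mid-reading
while m.speak_next():
    pass
print(m.spoken)     # ['Top story one.', 'Economy story.']
```

Routing both the stop and the page change through one object is what keeps the spoken body text and the displayed menu from drifting apart.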
[0034]
In this way, the navigation menu portion of the HTML content can be checked on the screen while the body text is heard as speech output. The user can select just the desired body text from the navigation menu displayed on the screen and avoid having the unneeded navigation menu read aloud. Furthermore, even when the screen is small, it is used effectively because only the navigation menu, which carries relatively little information, is displayed.
[0035]
Because HTML content is thus distributed effectively between screen display and reading aloud, even a terminal with limited hardware resources, such as a mobile phone with only a small screen and audio functions, can browse information-rich HTML content comfortably.
[0036]
FIG. 7 is a schematic diagram showing the configuration of the fourth embodiment of the present invention.
In the third embodiment described above, an environment in which not only voice but also a display device can be used has been assumed.
The present embodiment proposes a method for effectively using a small display screen without using sound.
[0037]
As shown in FIG. 7, network terminal 2, which is connected to network 1 typified by the Internet, stores, for example on a hard disk drive, navigation menu division unit 3, HTML content conversion unit 12, which modifies HTML content, and HTML browser unit 10. Network terminal 2 further includes input unit 8 and display unit 9.
[0038]
Hereinafter, the processing flow will be described with reference to the flowchart of FIG. 8.
First, content described in HTML distributed on a network is received. The received content is stored as an HTML file in a temporary storage means of the network terminal 2, for example, a memory (not shown) (S801).
Next, navigation menu division unit 3 analyzes the received and stored HTML file and divides its text into a navigation menu part and a body text part (S802). Any method of navigation menu division may be used in the present invention.
The HTML data of the divided navigation menu part is called HTML-N, and the HTML data of the body text part is called HTML-B.
[0039]
Next, the divided HTML data is sent to the HTML content conversion unit 12.
Next, HTML content conversion unit 12 creates links of the kind called clickable images, which cause a specific action when an image on the screen is clicked.
Specifically, a link to HTML-N, for example the character string "Navigation", is created and added at the beginning or end of the HTML-B page, which is the HTML data of the body text. Likewise, a link to HTML-B, for example the character string "Body text", is created and added to HTML-N, the HTML data of the navigation menu (S803).
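Step S803 can be sketched as plain string assembly, under assumptions: the divided fragments are available as HTML strings, the two pages are served from the hypothetical URLs `body.html` and `nav.html`, and ordinary text links are used in place of clickable images. The helper name `cross_link` is invented for this example.

```python
from html import escape
from typing import Tuple

PAGE_TEMPLATE = "<!DOCTYPE html><html><body>{}</body></html>"


def cross_link(html_b: str, html_n: str, b_url: str = "body.html",
               n_url: str = "nav.html", at_top: bool = True) -> Tuple[str, str]:
    """Wrap the body (HTML-B) and menu (HTML-N) fragments as two linked pages.

    b_url and n_url are hypothetical locations where the pages are served.
    """
    to_nav = f'<p><a href="{escape(n_url)}">Navigation</a></p>'
    to_body = f'<p><a href="{escape(b_url)}">Body text</a></p>'
    b_inner = to_nav + html_b if at_top else html_b + to_nav
    n_inner = to_body + html_n if at_top else html_n + to_body
    return PAGE_TEMPLATE.format(b_inner), PAGE_TEMPLATE.format(n_inner)


page_b, page_n = cross_link("<p>Stocks rose sharply after the rate cut.</p>",
                            '<ul><li><a href="/eco">Economy</a></li></ul>')
print(page_b)
print(page_n)
```

The `at_top` flag covers the text's choice between placing the cross link at the beginning or at the end of each page.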
[0040]
First, the HTML browser unit 10 processes only HTML-B, so that only HTML-B is displayed on the display unit 9 (S804). The displayed content therefore has a simple structure without a navigation menu, and the page is easy to view even on a small display screen.
If the user selects the "Navigation" link at the beginning or end of this page (S805), the HTML browser unit 10 processes HTML-N, the destination of that link, and displays HTML-N, that is, the navigation menu, on the display unit 9 (S806). From the displayed navigation menu, the user can move to another page.
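The page switching in steps S804 to S806 can be modeled as a two-state browser. This is a toy sketch: the class name `SplitPageBrowser` is hypothetical, rendering is reduced to returning the page source, and ordinary links from the navigation menu to other pages are out of scope (an unknown label simply raises `KeyError`).

```python
class SplitPageBrowser:
    """Toy model of steps S804-S806 for the divided pages HTML-B and HTML-N."""

    # Link labels added in step S803 and the page each one leads to.
    LINKS = {"Navigation": "HTML-N", "Text": "HTML-B"}

    def __init__(self, html_b: str, html_n: str):
        self.pages = {"HTML-B": html_b, "HTML-N": html_n}
        self.current = "HTML-B"  # S804: only the body page is shown first

    def display(self) -> str:
        """Return the page that would be rendered on the display unit 9."""
        return self.pages[self.current]

    def select(self, label: str) -> str:
        """S805/S806: follow one of the mutual links and show its target."""
        self.current = self.LINKS[label]
        return self.display()
```

For example, a browser created with a body page and a menu page first displays the body; calling `select("Navigation")` switches to the menu, and `select("Text")` returns to the body.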
[0041]
As described above, the body text and the navigation menu of the HTML content are processed separately, so the user views one of the two pages at a time and can follow a link to the other. As a result, even general HTML content can be browsed comfortably on a small display screen.
[0042]
【Effect of the Invention】
According to the present invention, HTML content on the Internet is analyzed and automatically sorted into parts such as the text to be read aloud and the navigation menu, and the portion to be read out by a voice browser can be designated. Information can therefore be obtained efficiently without looking at a screen.
In addition, because the screen display is generated after the HTML content on the Internet has been sorted, pages are easy to browse even on a small display screen.
【Brief Description of the Drawings】
FIG. 1 is a schematic diagram showing the configuration of the first embodiment of the present invention.
FIG. 2 is a flowchart showing the procedure of the first embodiment.
FIG. 3 is a diagram showing the configuration of the second embodiment of the present invention.
FIG. 4 is a flowchart showing the procedure of the second embodiment.
FIG. 5 is a schematic diagram showing the configuration of the third embodiment of the present invention.
FIG. 6 is a diagram showing the procedure of the third embodiment.
FIG. 7 is a schematic diagram showing the configuration of the fourth embodiment of the present invention.
FIG. 8 is a flowchart showing the procedure of the fourth embodiment.
1. Network
2. Network terminal
3. Navigation/menu division unit
4. Voice output description language conversion unit
5. Voice dialogue controller
6. Skip processing generation unit
7. Text-to-speech unit
8. Input unit
9. Display unit
10. HTML browser unit
11. Media integration unit
12. HTML content conversion unit

Claims

1. A Web browser control method for analyzing content with attribute data and reading out a text portion of the content with a speech synthesis unit, the method comprising:
an analysis step of analyzing the content with attribute data;
an input step of designating, with input means, the portion of the content analyzed in the analysis step that is to be read out; and
a reading step of reading out a text portion in accordance with the designation from the input means.

2. The Web browser control method according to claim 1, wherein the analysis step is a division step of dividing the link items in the content, separated by predetermined attribute data, into a navigation menu part and other parts.

3. The Web browser control method according to claim 2, wherein the division step includes:
a step of analyzing the content and dividing it into text data; and
a step of calculating, for the divided text data, the length, the average length, the ratio of link items, and the ratio of non-link items, and determining on that basis which of the divided text data is the navigation menu part made up of link items and which is other text data.

4. A Web browser control method for analyzing content with attribute data and reading out a text portion of the content with a speech synthesis unit, the method comprising:
a division step of dividing the link items in the content, separated by predetermined attribute data, into a navigation menu part and other parts;
a skip processing step of setting skip processing for skipping the navigation menu part divided in the division step;
an input step of instructing, with input means, execution of the skip processing step; and
a reading step of reading out a text portion in accordance with the instruction from the input means.
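The skip processing of claim 4 can be illustrated with a short loop over the divided segments. In this sketch, `wants_skip` is a hypothetical stand-in for the input means and `speak` for the speech synthesis unit; the specification does not define these interfaces, and skipping the entire menu in one step is an assumption about the granularity of the skip.

```python
from typing import Callable, Iterable, Tuple

Segment = Tuple[str, str]  # (kind, text), where kind is "nav" or "body"


def read_with_skip(
    segments: Iterable[Segment],
    wants_skip: Callable[[str], bool],
    speak: Callable[[str], None],
) -> None:
    """Read segments aloud, skipping a navigation menu when the user asks.

    `wants_skip` stands in for the input means and `speak` for the speech
    synthesis unit; both are hypothetical hooks, not APIs from the patent.
    """
    for kind, text in segments:
        if kind == "nav" and wants_skip(text):
            continue  # skip processing: the whole menu is passed over at once
        speak(text)
```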

5. A Web browser control method for analyzing content with attribute data and reading out a text portion of the content with a speech synthesis unit, the method comprising:
a division step of dividing the link items in the content, separated by predetermined attribute data, into a navigation menu part and other parts;
an interaction control step of setting processing for selecting either the navigation menu part or the other text part divided in the division step;
an input step of instructing, with input means, execution of the interaction control step; and
a reading step of reading out a text portion in accordance with the instruction from the input means.
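The interaction control of claim 5 amounts to asking the user which divided part to hear. In the following sketch, `ask` is a hypothetical stand-in for the dialogue with the input means and `speak` for the speech synthesis unit; the prompt wording and the rule that any answer starting with "nav" selects the menu are illustrative assumptions.

```python
from typing import Callable, List


def dialog_read(
    nav: List[str],
    body: List[str],
    ask: Callable[[str], str],
    speak: Callable[[str], None],
) -> None:
    """Ask the user which divided part to hear, then read only that part.

    `ask` models the interaction control and input means; `speak` models
    the speech synthesis unit. Both are hypothetical hooks.
    """
    answer = ask("Read the navigation menu or the text?").strip().lower()
    selected = nav if answer.startswith("nav") else body
    for text in selected:
        speak(text)
```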

6. A Web browser control method for analyzing content with attribute data and reading out a text portion of the content with a speech synthesis unit, the method comprising:
a division step of dividing the link items in the content, separated by predetermined attribute data, into a navigation menu part and other parts;
a display step of displaying the navigation menu part divided in the division step on a display unit;
a reading step of reading out the text part other than the navigation menu part;
an input step of instructing, with input means, selection of a link item in the navigation menu displayed on the display unit; and
a step of moving to the destination of the selected link item in accordance with the instruction from the input means.

7. A Web browser control method for analyzing content with attribute data and reading out a text portion of the content with a speech synthesis unit, the method comprising:
a division step of dividing the link items in the content, separated by predetermined attribute data, into a navigation menu part and other parts;
a step of setting mutual links between the divided navigation menu part and the other parts;
a display step of displaying the navigation menu part, with the mutual link attached, on a display unit;
a reading step of reading out the text part other than the navigation menu part, with the mutual link attached;
an input step of instructing, with input means, selection of a link item in the navigation menu displayed on the display unit; and
a step of moving to the destination of the selected link item in accordance with the instruction from the input means.

8. A Web browser control device that analyzes content with attribute data and reads out a text portion of the content with a speech synthesis unit, the device comprising:
analysis means for analyzing the content with attribute data;
reading means for reading out, when the HTML content is read aloud, the text portion analyzed by the analysis means, in principle in accordance with a reading condition; and
input means for designating the portion of the content, analyzed by the analysis means, that is to be read out.

9. The Web browser control device according to claim 8, wherein the analysis means is division means for dividing the link items in the content, separated by predetermined attribute data, into a navigation menu part and other parts.

10. The Web browser control device according to claim 9, wherein the division means includes:
means for analyzing the content and dividing it into text data; and
means for obtaining, for the divided text data, the length, the average length, the ratio of link items, and the ratio of non-link items, and determining on that basis which of the divided text data is the navigation menu part made up of link items and which is other text data.

11. A Web browser control device that analyzes content with attribute data and reads out a text portion of the content with a speech synthesis unit, the device comprising:
division means for dividing the link items in the content, separated by predetermined attribute data, into a navigation menu part and other parts;
skip processing means for setting skip processing for skipping the navigation menu part divided by the division means;
input means for instructing execution of the skip processing means; and
means for reading out a text portion in accordance with the instruction from the input means.

12. A Web browser control device that analyzes content with attribute data and reads out a text portion of the content with a speech synthesis unit, the device comprising:
division means for dividing the link items in the content, separated by predetermined attribute data, into a navigation menu part and other parts;
dialogue control means for setting processing for selecting either the navigation menu part or the other text part divided by the division means;
input means for instructing execution of the dialogue control means; and
means for reading out a text portion in accordance with the instruction from the input means.

13. A Web browser control device that analyzes content with attribute data and reads out a text portion of the content with a speech synthesis unit, the device comprising:
division means for dividing the link items in the content, separated by predetermined attribute data, into a navigation menu part and other parts;
display means for displaying the navigation menu part divided by the division means on a display unit;
means for reading out the text part other than the navigation menu part;
input means for instructing selection of a link item in the navigation menu displayed on the display unit; and
means for moving to the destination of the selected link item in accordance with the instruction from the input means.

14. A Web browser control device that analyzes content with attribute data and reads out a text portion of the content with a speech synthesis unit, the device comprising:
division means for dividing the link items in the content, separated by predetermined attribute data, into a navigation menu part and other parts;
means for setting mutual links between the divided navigation menu part and the other parts;
display means for displaying the navigation menu part, with the mutual link attached, on a display unit;
means for reading out the text part other than the navigation menu part, with the mutual link attached;
input means for instructing selection of a link item in the navigation menu displayed on the display unit; and
means for moving to the destination of the selected link item in accordance with the instruction from the input means.