JPH11510977A - Method and apparatus for extracting information using audio interface

Info

Publication number: JPH11510977A
Application number: JP9538046A
Authority: JP
Inventors: Michael Abraham Benedikt; David Alan Ladd; James Christopher Ramming; Kenneth G. Rehor; Curtis Duane Tuckey
Original assignee: AT&T Corp
Current assignee: AT&T Corp
Priority date: 1996-04-22
Filing date: 1997-03-18
Publication date: 1999-09-21
Also published as: CA2224712A1; IL122647A0; MX9710150A; KR19990028327A; IL122647A; WO1997040611A1; EP0834229A1

Abstract

(57) Abstract: A method and apparatus for retrieving information from a document server using an audio interface device. In one advantageous embodiment, a communication network includes an audio browsing node comprising an audio processing node and an audio interpreter node. An audio channel is established between the audio interface device and the audio browsing node, and a document serving protocol channel is established between the audio browsing node and the document server. The document server provides documents to the audio browsing node via the document serving protocol channel. The audio browsing node interprets each document into audio data and provides the audio data to the audio interface device via the audio channel. The audio interface device provides audio user input to the audio browsing node via the audio channel. The audio browsing node interprets the audio user input into user data suitable for the document server and provides that user data to the document server via the document serving protocol channel.
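The two translation directions named in the abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the ("speak", text) action tuples, and the choice table are hypothetical and do not come from the patent, and user data is encoded as an ordinary URL-encoded form field simply because the description later uses HTTP as the document serving protocol.

```python
from urllib.parse import urlencode


def document_to_audio(text_lines):
    """Turn document text into audio actions (hypothetical format).

    Each non-empty line becomes a request to speak that line; a real node
    would hand these to a text-to-speech module.
    """
    return [("speak", line) for line in text_lines if line.strip()]


def input_to_user_data(variable, digit, choices):
    """Turn a keypad digit into form data suitable for a document server.

    `choices` maps digits to values; it is an assumed, illustrative table.
    """
    return urlencode({variable: choices[digit]})


# Document -> audio data, then audio user input -> user data.
actions = document_to_audio(["Hello!", ""])
user_data = input_to_user_data("collectvar", "2", {"2": "Jim"})
```

The document server only ever sees `user_data`, which is why it does not need to know whether the request came from a graphical browser or from a telephone.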

Description

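Detailed Description of the Invention

Method and apparatus for retrieving information using an audio interface

Field of the Invention

The present invention relates to information retrieval in general. More particularly, it relates to retrieving information from a network using an audio user interface.

Background of the Invention

The amount of information available over communication networks is enormous and growing rapidly. The most popular such network is the Internet, a network of linked computers around the world. Much of the Internet's popularity can be attributed to its World Wide Web (WWW) portion. The WWW is the part of the Internet in which information is typically exchanged between server computers and client computers using the Hypertext Transfer Protocol (HTTP). A server stores information and serves (i.e., sends) it to a client in response to the client's request. A client runs a computer software program, often called a browser, to request and display information. Examples of WWW browsers are Netscape Navigator, from Netscape Communications Inc., and Internet Explorer, from Microsoft Corp.

Servers, and the information stored on them, are identified by Uniform Resource Locators (URLs). URLs are described in detail in Berners-Lee, T., et al., "Uniform Resource Locators" (RFC 1738, Network Working Group, 1994), which is incorporated herein by reference. For example, the URL http://www.hostname.com/document1.html (Note 1) identifies the document "document1.html" on the host server "www.hostname.com". A client's request for information from a host server therefore generally includes a URL. The information passed from a server to a client is generally called a document. Such documents are typically written in a document language such as the Hypertext Markup Language (HTML). On receiving a request from a client, the server sends an HTML document to the client. An HTML document contains information that the browser uses to display information to the user on a computer display screen. An HTML document may contain text, logical structure commands, hypertext links, and user input commands. When the user selects a hypertext link from the display (for example, by clicking a mouse), the browser requests another document from a server.

Current WWW browsers are based on textual and graphical user interfaces. That is, documents are presented as images on a computer screen. Such images include, for example, text, graphics, hypertext links, and dialog boxes for user input. All of a user's interaction with the WWW takes place through the graphical user interface. Audio data can be received and played at the user's computer (e.g., ".wav" or ".au" files), but receiving audio data is secondary to the graphical interface of the WWW. Audio data may be sent as the result of a user request, but there is no means for a user to interact with the WWW using an audio interface.

(Note 1) The URLs given as examples here are used for illustration only. No particular URL has any significance other than as an illustration of the invention, and none is intended to denote an actual URL.

Summary of the Invention

The present invention provides a method and apparatus for retrieving information from a document server using an audio interface device (e.g., a telephone). An interpreter is provided that retrieves documents from a document server operating in accordance with a document serving protocol. The interpreter interprets each document into audio data that is provided to the audio user interface. The interpreter also receives audio user input from the audio interface device. It interprets that audio user input into user data suitable for transmission to the document server in accordance with the document serving protocol, and provides the user data to the document server. In various embodiments, the interpreter may be located within the audio user interface, within the document server, or in a communication channel between the audio user interface and the document server.

According to one embodiment, a communication network node that performs the audio browsing functions of the invention is included as a node in a communication network, such as a long distance telephone network. An audio channel is established between the audio interface device and the node, and a document serving protocol channel is established between the node and the document server. The node receives documents served by the document server in accordance with the document serving protocol and interprets them into audio data suitable for the audio user interface. The node then sends the audio data to the audio interface device via the audio channel. The node also receives audio user input (e.g., DTMF tones or speech) from the audio interface device and interprets it into user data suitable for the document server. The node then sends the user data to the document server in accordance with the document serving protocol.

In one embodiment, the document server is a World Wide Web document server that communicates with clients via the Hypertext Transfer Protocol. An advantage of the invention is that a user can conduct an audio browsing session with a World Wide Web document server through an audio interface device. The World Wide Web document server can handle the browsing session in the conventional manner and need not know whether a particular session was initiated by a client running a conventional graphical browser or by an audio interface device. The necessary interpreting functions are performed in the communication network node, and they are transparent both to the user of the audio interface device and to the World Wide Web document server operating in accordance with the Hypertext Transfer Protocol.

These and other advantages of the invention will be apparent to those skilled in the art from the following detailed description and the accompanying drawings.

Brief Description of the Drawings

FIG. 1 shows a communication system suitable for practicing the invention.
FIG. 2 is a block diagram of the components of the audio processing node.
FIG. 3 is a block diagram of the components of the audio interpreter node.
FIG. 4 is a block diagram of the document server.
FIG. 5 shows an example of an audio-HTML document.
FIG. 6 shows an example of an HTML document.
FIG. 7 is a block diagram of an embodiment in which the audio browsing functions are performed in a user interface device.
FIG. 8 is a block diagram of the components of the user interface device of FIG. 7.
FIG. 9 is a block diagram of an embodiment in which the audio browsing functions are performed in an audio browsing document server.
FIG. 10 is a block diagram of the components of the audio browsing document server of FIG. 9.
FIG. 11 is a block diagram of an embodiment in which the audio interpreting functions are performed in an audio interpreter document server.
FIG. 12 is a block diagram of the components of the audio interpreter document server of FIG. 11.

Detailed Description

FIG. 1 shows a communication system 100 suitable for practicing the invention. An audio interface device, such as a telephone 110, is connected to a local exchange carrier (LEC) 120. Audio interface devices other than telephones may also be used; for example, the audio interface device could be a multimedia computer with telephony capabilities. In accordance with the invention, the user of the telephone 110 places a call to a telephone number associated with information provided by a document server, such as the document server 160. In the exemplary embodiment shown in FIG. 1, the document server 160 is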
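part of a communication network 162. In an advantageous embodiment, the communication network 162 is the Internet. A telephone number associated with information accessible through a document server such as the document server 160 is set up so that it is routed to a special communication network node, such as the audio browsing adjunct 150. In the embodiment shown in FIG. 1, the audio browsing adjunct 150 is a node in the communication network 102, which is a long distance telephone network. The call is therefore routed to the LEC 120, which routes it on via the trunk 125 to the long distance carrier switch 130. The long distance network 102 would generally have other switches similar to the switch 130 for routing calls; for simplicity, only one switch is shown in FIG. 1. The switch 130 in the communication network 102 is an "intelligent" switch: it includes (or is connected to) a processing unit 131 that may be programmed to perform various functions. Such use and programming of processing units in communication network switches is well known in the art. When the call is received at the switch 130, it is routed to the audio browsing adjunct 150. An audio channel is thereby established between the telephone 110 and the audio browsing adjunct 150. The routing of calls through a communication network is well known in the art and is not described further here.

In one embodiment, the audio browsing service according to the invention is provided only to users who have subscribed to it with the service provider of the communication network 102. In such an embodiment, a database 140 connected to the switch 130 contains a list of subscribers. The switch 130 consults the database 140 to determine whether a call originates from a subscriber to the service. One way to accomplish this is to store a list of calling telephone numbers (ANI) in the database 140. In a well-known manner, the LEC 120 provides the ANI of the telephone 110 to the switch 130. The switch 130 then consults the database 140 to determine whether that ANI is included in the list of audio browsing service subscribers stored there. If the ANI is on the list, the switch 130 routes the call to the audio browsing adjunct 150 in accordance with the invention. If the ANI does not belong to a subscriber of the audio browsing service, an appropriate message is sent to the telephone 110.

The audio browsing adjunct 150 includes an audio processing node 152 and an audio interpreter node 154, both described in detail below. The audio browsing adjunct 150 provides the audio browsing functions in accordance with the invention. On receiving the call from the telephone 110, the audio browsing adjunct 150 establishes, via the link 164, a communication channel with the document server 160 associated with the called telephone number. The association of telephone numbers with document servers is described in detail below. In the WWW embodiment, the link 164 is a socket connection over TCP/IP, the establishment of which is well known in the art. For further information on TCP/IP, see Comer, Douglas, "Internetworking with TCP/IP: Principles, Protocols, and Architecture" (Englewood Cliffs, NJ, Prentice Hall,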
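1988), which is incorporated herein by reference.

The audio browsing adjunct 150 and the document server 160 communicate with each other using a document serving protocol. As used here, a document serving protocol is a communication protocol for transferring information between a client and a server. Under such a protocol, a client requests information from a server by sending it a request, and the server responds by sending the client a document containing the requested information. A document serving protocol channel is thus established between the audio browsing adjunct 150 and the document server 160 via the link 164. In an advantageous WWW embodiment, the document serving protocol is the Hypertext Transfer Protocol (HTTP). This protocol is well known in the art of WWW communication and is described in detail in Berners-Lee, T. and Connolly, D., "Hypertext Transfer Protocol (HTTP) Working Draft of the Internet Engineering Task Force" (1993), which is incorporated herein by reference.

The audio browsing adjunct 150 therefore communicates with the document server 160 using HTTP. As far as the document server 160 is concerned, it behaves as if it were communicating with any conventional WWW client running a conventional graphical browser. That is, the document server 160 serves documents to the audio browsing adjunct 150 in response to requests it receives over the link 164. A document, as used here, is a collection of information. A document may be a static document that is predetermined at the server 160, in which case every request for the document results in the same information being served. Alternatively, a document may be dynamic, so that the information served in response to a request is generated at the time the request is made. Dynamic documents are generally generated by scripts, which are programs executed by the server 160 in response to a request for information. For example, a URL may be associated with a script. When the server 160 receives a request that includes that URL, it executes the script to generate a dynamic document and serves the dynamically generated document to the client that requested the information. The use of scripts to generate documents dynamically is well known in the art.

The documents served by the server 160 contain text, logical structure commands, hypertext links, and user input commands. One characteristic of such documents is that the physical structure of the information they contain (i.e., the physical layout of the information as displayed at a client running a conventional graphical browser) is not defined. Instead, a document contains logical structure commands, which the browser interprets to define a physical layout. Such logical structure commands include, for example, emphasis commands and new-paragraph commands. The syntactic structure of such commands may conform to the conventions of a more general-purpose document structuring language, such as the Standard Generalized Markup Language (SGML) described in Goldfarb, Charles, "The SGML Handbook" (Clarendon Press, 1990), which is incorporated herein by reference. In the WWW embodiment of the invention, these documents are Hypertext Markup Language (HTML) documents. HTML is a well-known language, based on SGML, that is used to define the documents served by WWW servers. HTML is described in detail in Berners-Lee, T. and Connolly, D., "Hypertext Markup Language (HTML) Working Draft of the Internet Engineering Task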
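Force" (1993), which is incorporated herein by reference.

When an HTML document is received by a client running a conventional browser, the browser interprets the HTML document into an image and displays that image on a computer display screen. In accordance with the principles of the invention, however, when the audio browsing adjunct 150 receives a document from the document server 160, it converts the document into audio data. The details of this conversion are described below. The audio data is then sent to the telephone 110 via the switch 130 and the LEC 120. In this way, the user of the telephone 110 can access the information on the document server 160 through an audio interface.

The user can also send audio user input from the telephone 110 to the audio browsing adjunct 150. The audio user input may be, for example, speech signals or DTMF tones. The audio browsing adjunct 150 converts the audio user input into user data or instructions suitable for transmission to the document server 160 over the link 164 in accordance with HTTP. The user data or instructions are then sent to the document server 160 via the document serving protocol channel. The user and the document server thus interact with each other through an audio user interface.

In this manner, a user can conduct a browsing session with a WWW document server through an audio interface. The document server can handle such a browsing session in the conventional manner and need not know whether a particular session was initiated by a client running a conventional graphical browser or by an audio interface such as a telephone. The audio browsing adjunct 150 in the network 102 interprets the documents served by the document server 160 into audio data suitable for sending to the telephone 110. In addition, the audio browsing adjunct 150 interprets the audio user input received from the telephone 110 into user data suitable for receipt by the document server 160.

An advantageous embodiment of a browsing session is now described in more detail. Assume that a user at the telephone 110 dials the number (123) 456-7890 (Note 2), which is associated with information accessible through the document server 160 and is therefore set up to be routed to the audio browsing adjunct 150. The call is routed to the LEC 120, which recognizes the telephone number as one to be routed to the long distance network 102, and in particular to the switch 130. On receiving the call, the switch 130 routes it via the link 132 to the audio browsing adjunct 150. An audio channel is thereby established between the telephone 110 and the audio browsing adjunct 150.

The audio processing node 152 is shown in detail in FIG. 2. The audio processing node 152 comprises a telephone network interface module 210, a DTMF decoder/generator 212, a speech recognition module 214, a text-to-speech module 216, and an audio play/record module 218, each of which is connected to an audio bus 220 and a control/data bus 222 as shown in FIG. 2. The audio processing node 152 further comprises a central processing unit 224, a memory unit 228, and a packet network interface 230, each of which is connected to the control/data bus 222. The overall functioning of the audio processing node 152 is controlled by the central processing unit 224, which operates under the control of executed computer program instructions 232 stored in the memory unit 228. The memory unit 228 may be any type of machine-readable storage device. For example, it may be a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), a magnetic storage medium (i.e., a magnetic disk), or an optical storage medium (i.e., a CD-ROM). Further, the audio processing node 152 may contain various combinations of machine-readable storage devices that are accessible by the central processing unit 224 and capable of storing both the computer program instructions 232 and the data 234.

The telephone network interface module 210 handles the low-level interaction between the audio processing node 152 and the telephone network switch 130. In one embodiment, the module 210 consists of one or more analog tip/ring loop-start telephone line terminations. Through the module 210, the central processing unit 224 can control the link 132 via the control/data bus 222. The control functions include on-hook/off-hook, ring detection, and far-end on-hook detection. In another embodiment, the module 210 includes one or more channelized digital interfaces, such as T1/DS1, E1, or PRI. Signaling may be in-band or out-of-band. The DTMF decoder/generator 212 handles the conversion of DTMF tone signals into digital data and the generation of DTMF tones from digital data. The speech recognition module 214 recognizes speech signals that originate at the user's telephone 110 and are received from the audio bus 220; these speech signals are processed by the speech recognition module 214 and converted into digital data. The text-to-speech module 216 converts the text of documents received from the document server 160 into audio speech signals to be sent to the user at the telephone 110. The audio play/record module 218 is used to play audio data received from the document server 160 at the telephone 110 and to record audio data such as the user's voice.

Note that the modules 210, 212, 214, 216, and 218 are shown in FIG. 2 as separate functional modules. The functions of the modules 212, 214, 216, and 218 may be implemented in hardware, software, or a combination of hardware and software, using well-known signal processing techniques. The functions of the module 210 may be implemented in hardware or in a combination of hardware and software, using well-known signal processing techniques. The functioning of each module is described in further detail below in connection with the example. The packet network interface 230 is used for communication between the audio processing node 152 and the audio interpreter node 154.

The audio browsing adjunct 150 also includes the audio interpreter node 154, which is connected to the audio processing node 152. The audio interpreter node 154 is shown in detail in FIG. 3. It comprises a central processing unit 302, a memory 304, and two packet network interfaces 306 and 308, connected by a control/data bus 310. The overall functioning of the audio interpreter node 154 is controlled by the central processing unit 302, which operates under the control of executed computer program instructions 312 stored in the memory unit 304. The memory unit 304 may be any type of machine-readable storage device. For example, it may be a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable PROM (EPROM), an electrically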
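erasable PROM (EEPROM), a magnetic storage medium (i.e., a magnetic disk), or an optical storage medium (i.e., a CD-ROM). Further, the audio interpreter node 154 may contain various combinations of machine-readable storage devices that are accessible by the central processing unit 302 and capable of storing both the computer program instructions 312 and the data 314. The control of apparatus such as the audio processing node 152 and the audio interpreter node 154 by a central processing unit executing software instructions is well known in the art and is not described in further detail here.

Returning to the example, the call from the telephone 110 to the telephone number (123) 456-7890 has been routed to the audio browsing adjunct 150, and in particular to the audio processing node 152. The central processing unit 224 detects the ringing line through the telephone network interface module 210. On detecting the call, the central processing unit performs a lookup to determine the URL associated with the dialed number (DN). The dialed number is provided by the local exchange carrier 120 to the switch 130 in a manner well known in the art, and the switch 130 in turn provides the DN to the audio browsing adjunct 150. A list of the URLs associated with DNs is stored as the data 234 in the memory 228. In this example, assume that the DN (123) 456-7890 is associated with the URL http://www.att.com/~phone/greeting.

In another embodiment, the list of URLs associated with the various DNs is stored in a network database, such as the database 140, rather than locally at the audio browsing adjunct 150. In that embodiment, the central processing unit 224 of the audio processing node 152 sends a signal to the network switch 130 requesting a lookup in the database 140. The switch requests the URL from the database 140 and sends the resulting URL back to the audio processing node 152. Note that the communication among the audio processing node 152, the switch 130, and the database 140 may take place over an out-of-band signaling system such as SS7, which is well known in the art. An advantage of this arrangement is that several audio browsing adjuncts may exist in the network 102 and share a single database 140, so that only one database 140 needs to be updated with URLs and their associated DNs.

After receiving the URL associated with the DN, the central processing unit 224 of the audio processing node 152 sends a message (including the URL) to the audio interpreter node 154, instructing it to start an audio interpreting/browsing session. This message is passed from the central processing unit 224 over the control/data bus 222 to the packet network interface 230. The message is then sent from the packet network interface 230 of the audio processing node 152 over the connection 153 to the packet network interface 306 of the audio interpreter node 154. In one advantageous embodiment, the audio processing node 152 and the audio interpreter node 154 are co-located and together form the audio browsing adjunct 150. In other embodiments, the audio processing node 152 and the audio interpreter node 154 may be geographically separated; several such alternative embodiments are described below. The connection 153 may be a packet data network connection well known in the art (e.g., TCP/IP over Ethernet).

Returning to the example, the audio interpreter node 154 receives, through the packet network interface 306, the message telling it to start a new audio interpreting/browsing session. The central processing unit 302 can control multiple audio interpreting/browsing sessions for multiple users at the same time. Such multiprocessing by a processor is well known and generally involves instantiating a software process to control each session. At the start of the audio interpreting/browsing session, the audio interpreter node 154 takes the URL http://www.att.com/~phone/greeting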
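and sends an HTTP request for it to the document server 160 via the connection 164. In this example, the document server 160 is assumed to be associated with the host name www.att.com.

The document server 160 is shown in detail in FIG. 4. The document server 160 is a computer that includes a central processing unit 402 connected to a memory 404. The functions of the document server 160 are controlled by the central processing unit 402 executing computer program instructions 416 stored in the memory 404. In operation, the document server 160 receives requests for documents from the audio interpreter node 154 via the connection 164 and a packet network interface 440. The central processing unit 402 interprets each request and retrieves the requested information from the memory 404. A request may be for an HTML document 408, an audio-HTML document 410, an audio file 412, or a graphics file 414. HTML documents 408 are well known and contain the conventional HTML instructions used by conventional WWW graphical browsers. Audio-HTML documents are similar to HTML documents but have additional instructions that are specific to interpretation by the audio interpreter node 154 in accordance with the invention. Instructions that are specific to the audio browsing aspects of the invention are referred to here as audio-HTML instructions. Audio-HTML documents and audio-HTML instructions are described in detail below. An audio file 412 is a file containing audio information. A graphics file 414 is a file containing graphical information. In a manner well known in the art, a URL identifies a particular document on a particular document server. The memory 404 may also contain scripts 418 for dynamically generated HTML documents and audio-HTML documents.

Returning to the example, the HTTP request for the URL http://www.att.com/~phone/greeting is received by the document server 160 from the audio interpreter node 154 via the connection 164. The document server interprets this URL and, under the control of the central processing unit 402, retrieves an audio-HTML page from the memory 404. The central processing unit 402 then sends this audio-HTML document to the audio interpreter node 154 via the packet network interface 440 and the link 164. For the URL http://www.att.com/~phone/greeting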
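the audio-HTML document 500 that is sent in response to the request, and received by the audio interpreter node 154, is shown in FIG. 5. The audio interpreter node 154 begins interpreting the document 500 as follows. In one embodiment, the <HEAD> section at lines 502-506 of the document 500, which contains the title of the page, is not converted to speech and is ignored by the audio interpreter node 154. In another embodiment, the <TITLE> section may be interpreted using text-to-speech as described below.

The text "Hello!" at line 508 in the <BODY> section of the document 500 is sent from the audio interpreter node 154 to the audio processing node 152 via the packet network interface 306 and the link 153. Along with the text "Hello!", the audio interpreter node 154 sends the audio processing node 152 an instruction indicating that the text is to be processed by the text-to-speech module 216. The audio processing node 152 receives the text and the instruction through the packet network interface 230, and the text is passed over the control/data bus 222 to the text-to-speech module 216. The text-to-speech module 216 generates an audio signal that plays "Hello" (Note 3) and sends this signal over the audio bus 220 to the telephone network interface module 210, which in turn sends the audio signal to the telephone 110. Note that text-to-speech conversion is well known, and conventional text-to-speech techniques may be used in the text-to-speech module 216. For example, when the text is converted to speech, the "!" punctuation in the text may be interpreted as a raised playback volume.

Line 510 of the document 500 is a form instruction, and the audio interpreter node 154 sends nothing to the audio processing node 152 for this instruction. The audio interpreter node 154 interprets line 510 as indicating that a future response is expected from the user, and that this response is to be given as an argument to the script identified by http://machine:8888/hastings-bin/getscript.sh.

Line 512 is an audio-HTML instruction. The audio interpreter node 154 interprets line 512 by sending the server 160 an HTTP request for the audio file identified by www-spr.ih.att.com/~hastings/annc/greeting.mu8, which resides in the storage area 412 of the memory 404. The document server 160 retrieves the audio file from the memory 404 and sends it to the audio interpreter node 154 via the link 164. On receiving the audio file, the audio interpreter node 154 sends the file to the audio processing node 152, together with an instruction indicating that the file is to be played by the audio play/record module 218. On receiving the file and the instruction, the audio processing node 152 routes the audio file to the audio play/record module 218. The audio play/record module 218 generates an audio signal that is sent over the audio bus 220 to the telephone network interface module 210, which then sends the audio signal to the telephone 110. As a result, the user at the telephone 110 hears the contents of the audio file www-spr.ih.att.com/~hastings/annc/greeting.mu8 through the speaker of the telephone 110.

Lines 514-516 are audio-HTML instructions. The audio interpreter node 154 does not send line 514 to the audio processing node 152. Line 514 indicates that the response from the user is to be sent to the document server 160 associated with the variable name "collectvar". This instruction marks the beginning of a prompt-and-collect sequence, in which the user is prompted for information and then supplies it. The instruction is followed by a prompt instruction 516 and a set of choice instructions 518-522. The audio interpreter node 154 processes line 516 in the same manner as line 512, with the result that the user at the telephone 110 hears the sound from the file identified by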
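http://www-spr.ih.att.com/~hastings/annc/choices.mu8. This sound asks the user to make a selection based on some criteria, and the audio interpreter node 154 waits for a response from the user at the telephone 110. Also as a result of processing line 516, the central processing unit 302 sends the audio processing node 152 a message that causes the telephone network interface module 210 to prepare to accept audio input.

The user then responds with audio user input from the telephone 110. The audio user input may take the form of DTMF tones generated by the user pressing keys on the keypad of the telephone 110. For example, if the user presses "2" on the keypad of the telephone 110, the audio processing node 152 receives the DTMF tone associated with "2" through the telephone network interface module 210. The central processing unit 224 recognizes this audio signal as a DTMF tone, and an instruction is sent to the telephone network interface module 210 to pass the signal over the audio bus 220 to the DTMF decoder/generator 212. The central processing unit 224 instructs the DTMF decoder/generator 212 to convert the DTMF tone into digital data and to send that digital data through the packet network interface 230 to the audio interpreter node 154. On receiving this signal, the audio interpreter node 154 recognizes that the user's response is "2", that is, a selection of the value "Jim" given at line 520 of the audio-HTML document 500. The audio interpreter node 154 therefore sends the value "Jim", associated with the variable "collectvar", to the script http://machine:8888/hastings-bin/getscript.sh identified at line 510 of the document 500. If the user's response selects something that is not listed, that is, in this example a response other than "1" through "3", or if the user does not respond within a predetermined time, the audio interpreter node 154 instructs the text-to-speech module 216 to generate the speech signal "Choice not accepted. Please try again." (in italics; see Note 3), and that signal is sent to the user at the telephone 110.

Alternatively, the audio user input may be a speech signal. That is, instead of pressing the number 2 on the keypad of the telephone 110, the user speaks the word "two" into the microphone of the telephone 110. This speech signal is received by the audio processing node 152 through the telephone network interface module 210. The central processing unit 224 recognizes the audio signal as a speech signal, and an instruction is given to the telephone network interface module 210 to pass the signal over the audio bus 220 to the speech recognition module 214. The central processing unit 224 instructs the speech recognition module 214 to convert the speech signal into digital data and to pass that digital data to the packet network interface 230 for transmission to the audio interpreter node 154. On receiving the digital data, the audio interpreter node 154 processes it as described above for DTMF audio user input. Note that the speech recognition module 214 operates in accordance with conventional speech recognition techniques well known in the art.

HTML documents often contain hypertext links. When such a document is displayed on the screen of a computer running a conventional graphical browser, the hypertext links are indicated graphically (e.g., by underlining). If the user selects a link graphically, for example by clicking on it with a mouse, the browser generates a request for the document indicated by the link and sends that request to a document server. Consider the HTML document 600 shown in FIG. 6. Lines 604 and 605 show conventional HTML descriptions of hypertext links. If this page were processed by a conventional graphical browser, the display would look as follows.

This page gives you a choice of links to follow to other World Wide Web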
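pages. Please click on one of the links below.
click here for information on cars
click here for information on trucks

The user would then select one of the links using a graphical pointing device such as a mouse. If the user selects "click here for information on cars", the browser generates a request for the document identified by the URL http://www.abc.com/cars.html. If the user selects "click here for information on trucks", the browser generates a request for the document identified by the URL http://www.abc.com/trucks.html.

The processing of HTML hypertext links in accordance with the invention is now described with reference to FIG. 6. Assume that the document server 160 provides the HTML document 600 shown in FIG. 6 to the audio interpreter node 154. Lines 602 and 603 are converted into audio signals by the text-to-speech module 216 and provided to the user's telephone 110 as described above. The user thus hears the speech "This page gives you a choice of links to follow to other World Wide Web pages. Please click on one of the links below." At line 604, the audio interpreter node 154 recognizes that line 604 is a hypertext link. The audio interpreter node 154 sends the audio processing node 152 an instruction that causes the DTMF decoder/generator 212 to generate a tone to the telephone 110. Alternatively, the tone may be generated by the audio interpreter node 154 sending the audio processing node 152 an instruction that causes the audio play/record module 218 to play an audio file containing the tone. This distinctive tone is used to tell the user that a hypertext link is beginning. The audio interpreter node 154 then provides the text of the hypertext link (click here for information on cars) to the audio processing node 152, together with an instruction indicating that the text is to be processed by the text-to-speech module 216. As a result, the speech signal "click here for information on cars" is provided to the telephone 110. The audio interpreter node 154 then sends the audio processing node 152 an instruction that causes the DTMF decoder/generator 212 to generate a tone to the telephone 110. This distinctive tone is used to tell the user that the hypertext link has ended. The tones that mark the beginning and the end of a hypertext link may be the same tone or different tones. The ending tone is followed by a pause. Instead of tones, the beginning and end of a hypertext link may be identified by speech signals, such as "begin link [hypertext] end link".

If the user wishes to follow a link, the user supplies audio user input during the pause. Suppose, for example, that the user wishes to follow the link "click here for information on cars". The user enters audio input during the pause that follows the speech signal generated for that link. The audio input may be, for example, a DTMF tone generated by pressing a key on the keypad of the telephone 110. The DTMF tone is received by the audio processing node 152 and processed by the DTMF decoder/generator 212. Data representing the DTMF tone is provided to the audio interpreter node 154 via the control/data bus 222, the packet network interface 230, and the link 153. On receiving this signal, the audio interpreter node 154 recognizes that it was received during the pause following the selected link and, using the URL associated with the selected link, http://www.abc.com/cars.html,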
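generates a request for the WWW document identified by that URL. Alternatively, the audio user input that selects a hypertext link may be a speech signal.

Another type of link is the hypertext anchor link. An anchor link allows the user to jump to a particular location within a single HTML document. In a conventional graphical browser, when the user selects an anchor link, the browser displays the part of the document indicated by the link. With the audio browsing techniques of the invention, when the user selects an anchor link, the audio interpreter node 154 begins interpreting the document at the point specified by the link. For example, line 620 of the document 600 contains a hypertext anchor to the portion of the document at line 625. This hypertext link is identified to the user in the same way as the hypertext links that identify new HTML documents, as described above. Hypertext anchor links may be distinguished, for example, by a different audio tone or by a generated speech signal indicating that the link is an anchor link. If the user selects the anchor link at line 620, the audio interpreter node 154 skips to the text at line 625 and begins interpreting the HTML document 600 from there.

In the advantageous embodiment described in connection with FIG. 1, the audio browsing adjunct 150, comprising the audio processing node 152 and the audio interpreter node 154, is embodied in a communication network node located within the long distance communication network 102. This allows the service provider of the telephone network 102 to offer the audio browsing functions of the invention to telephone network subscribers. With this arrangement, no additional hardware is required at the user's premises or at the document server. All of the audio browsing functions are provided by components within the telephone network 102. Other arrangements are possible, however, and such alternatives can readily be implemented by those skilled in the art given this disclosure.

One such alternative is shown in FIG. 7, in which the functions of the audio browsing adjunct are performed in the user interface device 700. In this embodiment, the functions of the audio processing node 152 and of the audio interpreter node 154 are combined in the user interface device 700. The user interface device 700 communicates with the document server 160 via a communication link 702. The link 702 is similar to the link 164 described in connection with FIG. 1; that is, it may be a socket connection over TCP/IP, the establishment of which is well known in the art.

The user interface device 700 is shown in detail in FIG. 8. It comprises a keypad/keyboard 802 and a microphone 804 for accepting user input, and a speaker 806 for providing audio output to the user. The user interface device 700 also comprises a keypad/keyboard interface module 816 connected to a control/data bus 824. It further comprises a codec 810, a speech recognition module 818, a text-to-speech module 820, and an audio play/record module 822, each of which is connected to an audio bus 808 and to the control/data bus 824 as shown in FIG. 8. The codec 810 contains an analog-to-digital converter 812 and a digital-to-analog converter 814, both of which are controlled by a central processing unit 826 via the control/data bus 824. The analog-to-digital converter 812 converts analog audio user input from the microphone 804 into digital audio signals and places them on the audio bus 808. The digital-to-analog converter 814 converts digital signals from the audio bus 808 into analog audio signals to be output by the speaker 806. The keypad/keyboard interface module 816 receives input from the keypad/keyboard 802 and places it on the control/data bus 824. The speech recognition module 818, the text-to-speech module 820, and the audio play/record module 822 perform the same functions as, and are configured similarly to, the modules 214, 216, and 218, respectively, described in connection with FIG. 2. In addition, the user interface device 700 includes a packet network interface 834 for connecting to a packet network, such as the Internet, via the link 702. The user interface device 700 further includes the central processing unit 826 and a memory unit 828, each connected to the control/data bus 824. The overall functioning of the user interface device 700 is controlled by the central processing unit 826, which operates under the control of executed computer program instructions 830 stored in the memory unit 828. The memory unit 828 also contains data 832.

The user interface device 700 performs the functions of the audio processing node 152 and the audio interpreter node 154 described in connection with the embodiment of FIG. 1. These functions are performed by the central processing unit 826 executing the computer program instructions 830. Thus the computer program instructions 830 include program instructions that are the same as, or similar to, (1) the computer program instructions 232 that implement the functions of the audio processing node 152 and (2) the computer program instructions 312 that implement the functions of the audio interpreter node 154. The functions of the audio processing node 152 and the audio interpreter node 154 were described in detail above and are not repeated here. The central processing unit 826 can execute multiple processes at the same time, and in this way it performs the functions of both the audio processing node 152 and the audio interpreter node 154. This multiprocessing is depicted in FIG. 8, where the central processing unit 826 is shown running an audio interpreting/browsing process 836 and an audio processing process 838.

In operation, the user of the user interface device 700 requests a URL using the keypad/keyboard 802 or the microphone 804. If the keypad/keyboard 802 is used to request the URL, the keypad/keyboard interface module 816 provides the requested URL to the central processing unit 826 via the control/data bus 824. If the microphone 804 is used to request the URL, the user's voice is received by the microphone 804, digitized by the analog-to-digital converter 812, and passed over the audio bus 808 to the speech recognition module 818. The speech recognition module 818 then provides the requested URL to the central processing unit 826 via the control/data bus 824.

On receiving the URL, the central processing unit 826 starts an audio browsing/interpreting session, instantiated as the audio interpreting/browsing process 836. The audio interpreting/browsing process 836 sends an HTTP request to the document server 160 through the packet network interface 834, in a manner similar to that described in connection with the embodiment of FIG. 1. On receiving a document from the document server 160, the audio interpreting/browsing process 836 interprets the document in accordance with the audio browsing techniques of the invention. The audio resulting from the interpretation of the document is provided to the user through the speaker 806 under the control of the audio processing process 838. Similarly, the user of the user interface device 700 can provide audio user input to the device through the microphone 804.

Because the audio interpreting/browsing process 836 and the audio processing process 838 both reside in the user interface device 700, all communication between the two processes takes place by inter-process communication through the central processing unit 826, and all communication between the processes 836 and 838 and the other elements of the user interface device 700 takes place over the control/data bus 824.

FIGS. 7 and 8 show the user interface device 700 communicating directly with the document server 160 in the packet network 162. Alternatively, the user interface device 700 may be configured to communicate with the document server 160 over a standard telephone connection. In that configuration, the packet network interface 834 may be replaced by a telephone interface circuit controlled by the central processing unit 826 via the control/data bus 824. The user interface device 700 places a call to the document server over the telephone network. The document server 160 terminates the call from the user interface device 700 using hardware similar to the telephone network interface module 210 (FIG. 2). Alternatively, the call may be terminated within the telephone network by a termination point that provides a packet network connection to the document server 160.

In another configuration, shown in FIG. 9, the functions of the audio browsing adjunct 150 (including the functions of the audio processing node 152 and the audio interpreter node 154) and the functions of the document server 160 are performed in an audio browsing document server 900. As depicted in FIG. 9, a call is routed from the telephone 110 through the LEC 120, the switch 130, and another LEC 902 to the audio browsing document server 900. In this embodiment, therefore, the audio browsing document server 900 can be reached from an ordinary telephone 110 over the telephone network. The audio browsing document server 900 is also connected to the Internet via a link 904.

The audio browsing document server 900 is shown in detail in FIG. 10. It comprises a telephone network interface module 1010, a DTMF decoder/generator 1012, a speech recognition module 1014, a text-to-speech module 1016, and an audio play/record module 1018, each of which is connected to an audio bus 1002 and a control/data bus 1004 as shown in FIG. 10. The modules 1010, 1012, 1014, 1016, and 1018 perform the same functions as, and are configured similarly to, the modules 210, 212, 214, 216, and 218, respectively, described in connection with FIG. 2. In addition, the audio browsing document server 900 includes a packet network interface 1044 for connecting to a packet network, such as the Internet, via the link 904. The packet network interface 1044 is similar to the packet network interface 230 described in connection with FIG. 2. The audio browsing document server 900 also includes a central processing unit 1020 and a memory unit 1030, both connected to the control/data bus 1004. The overall functioning of the audio browsing document server 900 is controlled by the central processing unit 1020, which operates under the control of executed computer program instructions 1032 stored in the memory unit 1030. The memory unit 1030 also contains data 1034, HTML documents 1036, audio-HTML documents 1038, audio files 1040, and graphics files 1042.

The audio browsing document server 900 performs the functions of the audio processing node 152, the audio interpreter node 154, and the document server 160 described in connection with the embodiment of FIG. 1. These functions are performed by the central processing unit 1020 executing the computer program instructions 1032. Thus the computer program instructions 1032 include program instructions that are the same as, or similar to, (1) the computer program instructions 232 that implement the functions of the audio processing node 152, (2) the computer program instructions 312 that implement the functions of the audio interpreter node 154, and (3) the computer program instructions 416 that implement the functions of the document server 160. The functions of the audio processing node 152, the audio interpreter node 154, and the document server 160 were described in detail above and are not repeated here. The central processing unit 1020 can execute multiple processes at the same time, and in this way it performs the functions of the audio processing node 152, the audio interpreter node 154, and the document server 160. This multiprocessing is depicted in FIG. 10, where the central processing unit 1020 is shown running an audio interpreting/browsing process 1022, a document serving process 1024, and an audio processing process 1026.

In operation, a call from the telephone 110 to a telephone number associated with information accessible through the audio browsing document server 900 is routed to the audio browsing document server 900 through the LEC 120, the switch 130, and the LEC 902. Note that several telephone numbers may be associated with the various information accessible through the audio browsing document server 900, and each such telephone number is routed to the audio browsing document server 900. The ringing line is detected through the telephone network interface module 1010 under the control of the audio processing process 1026. When the call is detected, the central processing unit 1020 performs a lookup to determine the URL associated with the dialed number (DN). The DN is provided by the LEC 902 to the audio browsing document server 900 in a manner well known in the art. The list of DNs and their associated URLs is stored as the data 1034 in the memory 1030. On obtaining the URL associated with the DN, the central processing unit 1020 starts an audio browsing/interpreting session, instantiated as the audio interpreting/browsing process 1022. The audio interpreting/browsing process 1022 sends an HTTP request to the document serving process 1024, which resides on the same central processing unit 1020. The document serving process 1024 performs the document server functions described in connection with the document server 160 in the embodiment of FIG. 1. These document server functions are supported by the HTML documents 1036, audio-HTML documents 1038, audio files 1040, and graphics files 1042 stored in the memory 1030. The central processing unit 1020 thus retrieves the document associated with the URL from the memory 1030. The audio interpreting/browsing process 1022 then interprets the document in accordance with the audio browsing techniques of the invention. The audio resulting from the interpretation of the document is provided to the user under the control of the audio processing process 1026. Similarly, the user of the telephone 110 can provide audio user input to the audio browsing document server 900 in a manner similar to that described in connection with the embodiment of FIG. 1.

Because the audio interpreting/browsing process 1022, the document serving process 1024, and the audio processing process 1026 all reside in the audio browsing document server 900, all communication among the processes 1022, 1024, and 1026 takes place by inter-process communication through the central processing unit 1020, and all communication between those processes and the other elements of the audio browsing document server 900 takes place over the control/data bus 1004. One benefit of this embodiment is efficiency: HTML documents and other data do not have to travel across a potentially unreliable wide area network in order to be processed (e.g., interpreted).

In the embodiment shown in FIG. 1, the audio processing node 152 and the audio interpreter node 154 are co-located. However, the functions of the audio processing node 152 and the audio interpreter node 154 may be geographically separated, as shown in FIG. 11. In that embodiment, the audio processing node 152 is contained within the communication network 102, and an audio interpreter document server 1100 is contained within the packet network 162. The functions of the audio processing node 152 are as described in connection with FIG. 1. The audio interpreter document server 1100, which performs the functions of a document server such as the document server 160 together with the functions of the audio interpreter node 154, is shown in detail in FIG. 12. The audio interpreter document server 1100 includes a packet network interface 1202 connected to the link 153 and to a control/data bus 1204. It also includes a central processing unit 1206 and a memory unit 1212, both connected to the control/data bus 1204. The overall functioning of the audio interpreter document server 1100 is controlled by the central processing unit 1206, which operates under the control of executed computer program instructions 1214 stored in the memory unit 1212. The memory unit 1212 also contains data 1216, HTML documents 1218, audio-HTML documents 1220, audio files 1222, and graphics files 1224.

The audio interpreter document server 1100 performs the functions of the audio interpreter node 154 and the document server 160 described in connection with the embodiment of FIG. 1. These functions are performed by the central processing unit 1206 executing the computer program instructions 1214. Thus the computer program instructions 1214 include program instructions that are the same as, or similar to, (1) the computer program instructions 312 that implement the functions of the audio interpreter node 154 and (2) the computer program instructions 416 that implement the functions of the document server 160. The functions of the audio interpreter node 154 and the document server 160 were described in detail above and are not repeated here. The central processing unit 1206 can execute multiple processes at the same time, and in this way it performs the functions of the audio interpreter node 154 and the document server 160. This multiprocessing is depicted in FIG. 12, where the central processing unit 1206 is shown running an audio interpreting/browsing process 1208 and a document serving process 1210.

In operation, the audio processing node 152 communicates with the audio interpreter document server 1100 over the link 153 in a manner similar to that described in connection with FIG. 1. However, unlike FIG. 1, in which the audio interpreter node 154 communicates with the document server over the link 164, the audio interpreting/browsing process 1208 communicates with the document serving process 1210 by inter-process communication through the central processing unit 1206.

As described above, then, the audio browsing of the invention can be realized in various forms, with the audio processing functions, the audio interpreting/browsing functions, and the document serving functions integrated or separated according to the particular configuration. Those skilled in the art will recognize that other configurations can also provide the audio browsing functions of the invention.

As the foregoing description shows, the invention may be used with standard HTML documents intended for use with conventional graphical browsers, or with audio-HTML documents created specifically for use with the audio browsing features of the invention.

For the audio interpretation of standard HTML documents, many standard text-to-speech conversion techniques may be used. The following section describes techniques that may be used to convert standard HTML documents into audio data. The techniques described here for converting HTML documents into audio data are illustrative only, and those skilled in the art could readily implement various other techniques for converting HTML documents into audio signals.

Standard text passages are interpreted using well-known conventional text-to-speech conversion techniques. Text is interpreted as it is encountered in the document, and this interpretation continues until the user supplies audio input (e.g., to answer a prompt or to follow a link) or until a prompt is reached in the document. The end of a sentence is interpreted by adding a pause to the audio, and a paragraph mark <p> is interpreted by inserting a longer pause. Text styles may be interpreted as follows.

The image instruction is an HTML feature indicating that a particular image is to be inserted into the document. An example of an HTML image instruction is the following: <IMG SRC="http://machine.att.com/image.gif" ALT="[image of car]"> This instruction indicates that the image file "image.gif" is to be fetched from the machine defined in the URL and displayed by the client's browser. Some conventional graphical browsers do not support image files, and so HTML image instructions sometimes include alternate text to be displayed in place of the image. In the example above, the text "image of car" is included as the alternative to the image file. With the audio browsing techniques of the invention, if the image instruction contains alternate text, that text is processed and converted to speech, and the speech signal is provided to the user. In this example, the speech signal "image of car" would be provided to the user at the telephone 110. If no alternate text is provided, a speech signal is generated indicating that an image without alternate text was encountered (e.g., "a picture with no alternate description").

Conventional HTML includes instructions that support the entering of user input. For example, the following instructions, <SELECT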
NAME="selectvar"> <OPTION> mary <OPTION SELECTED> joe <OPTION> </SELECT> は、ジョー（joe）がデフォールトのオプションとされているときに、マリー（m ary）とジョーという２つのオプションからのユーザが選択することを要求するものである。クライアントが通常のグラフィカルブラウザを実行する際、これらのオプションは例えばプルダウンメニューとして表されてよい。本発明のオーディオブラウジング技術によると、上記の命令は、以下のような音声信号に翻訳される。「以下のうちの１つを選んでください。メリー（休止）、現在選択されているジョー（休止）、オプションの終了。オプションをもう一度繰り返すには * ｒ、次に行くには # を押してください」もしあるオプションの後の休止期間にユーザがパウンドキー（pound key）を押すと、そのオプションが選択される。どのアイテムが選択されても、ユーザが次に進むことを選択したときは、可変な選択変数（selectvar）と関連したドキュメントサーバに戻る。ユーザがＤＴＭＦ信号で選択する代わりに、ユーザが音声信号で選択を行なってもよい。ユーザ入力を入力する別の通常のＨＴＭＬ命令は、チェックボックス命令である。例えば、以下のような一連の命令、 <INPUT TYPE="checkbox" NAME="varname"VALUE="red" CHECKED> <INPUT TYPE="checkbox" NAME="varname"VALUE="blue"> <INPUT TYPE="checkbox" NAME="varname"VALUE="green"> は、通常のグラフィックブラウザでは以下のように表示される。赤（red） □ 青（blue） □ 緑（green） □ デフォールトでは赤のボックスがチェックされている。ユーザは、青または緑のボックスをチェックすることでこのデフォールトを変更することが可能である。本発明のオーディオブラウジング技術によると、上記一連の命令は、ユーザに与えられる以下のような音声信号に翻訳される。「以下の選択は休止期間に # を押せば変えられます。現在選択されている赤（休止）、青（休止）、緑（休止）。このリストをもう一度繰り返すには * ｒ、次に行くには # を押してください」休止期間に # を押してＤＴＭＦ信号を生成すると、ユーザは休止期間の前にあるアイテムを選択することができる。# キーをもう一度押すと、ユーザは一連の入力動作から抜け出すことができる。ユーザがオプションのリストをもう一度繰り返したいときには *r を押せばよい。ＤＴＭＦ信号入力の代わりに、ユーザが音声信号入力を用いてチェックボックスオプションを選択するようにしてもよい。通常のＨＴＭＬドキュメントは、以下のような TEXTAREA 命令を用いてユーザにテキスト入力を要求することができる。 <TEXTAREA COLS=60 ROWS=4 NAME="textvar">ここにテキストを挿入してください </TEXTAREA> これにより、通常のグラフィックブラウザでは、「ここにテキストを挿入してください」というテキストに続いて、テキスト入力のためにユーザに与えられた６０行４列のテキストボックスが表示されることになる。本発明のオーディオブラウジング技術によると、上記命令は以下のように翻訳される。COL および ROWS というパラメータは無視され、「ここにテキストを挿入してください」という音声がユーザに与えられる。そして、ユーザはＤＴＭＦトーン音に続いて # 信号を入力することができる。これらＤＴＭＦ信号は、変数”textvar”と関連したドキュメントサーバに与えられる結果により処理される。或いは、ユーザは電話１１０のマイクに応答を話すことでテキストを与えることができ、その音声は音声認識モジュール２１４でデータに変換され、そのデータが変数”textva r”と関連したドキュメントサーバ１６０に与えられる。以上の記述から分かるように、通常のＨＴＭＬドキュメントが本発明のオーディオブラウジング技術によりブラウズされ得るように様々な技術を用いることができる。本発明によるオーディオブラウジングの利益をより十分に明らかにするために、通常のＨＴＭＬ命令に加えて付加的なドキュメント命令が用いられてもよい。オーディオＨＴＭＬ命令といわれるこれらの命令が、通常のＨＴＭＬドキュメントに導入されてもよい。これらのオーディオＨＴＭＬ命令を以下で説明する。音声源命令、 <VOICE SRC="//www.abc.com/audio.file"> 
によると、特定されたファイルがユーザに対して再生される結果となる。かかる命令は、図５に例示したドキュメント５００のライン５１２に詳細に記載されている。ネームコレクト命令、 <COLLECT NAME="collectvar"> はプロンプトおよびコレクトシーケンスの開始を指定している。かかるコレクトネーム命令の後には、プロンプト命令および１組の選択命令が続く。ユーザが選択を行うと、オーディオユーザ入力で示されたように、ユーザの選択の結果は可変コレクト変数（collectvar）と関連したドキュメントサーバに与えられる。コレクトネーム命令は、関連するプロンプトおよびコレクトシーケンスとともに、図５に例示したドキュメント５００のライン５１４〜５２４に詳細に説明されている。ＤＴＭＦ入力命令、 <INPUT TYPE="DTMF" MAXLENGTH="5" NAME=varname> は、ＤＴＭＦ信号形式のオーディオユーザ入力がユーザから予期されることを示している。この命令はオーディオブラウジング補助部１５０を休止させユーザからのＤＴＭＦ入力を待機するようにする。ユーザは電話１１０のキーパッドにあるキーを押すことによってＤＴＭＦシーケンスを入力し、# キーを押すことでシーケンスの終了を指示する。ＤＴＭＦ入力は、例示したＨＴＭＬドキュメント５００について上述したように処理される。そして、デコードされたＤＴＭＦ信号は、可変ネーム（varname）と関連したドキュメントサーバに与えられる。MAXLE NGTH パラメータは、入力可能な最大長（ＤＴＭＦ入力）を示す。もしユーザがＤＴＭＦキー（この例では５）の最大数を超えて入力すると、システムは超過した入力を無視する。同様に、SPEECH 入力命令、 <INPUT TYPE="SPEECH" MAXLENGTH="5" NAME=varname> は、音声信号形式のオーディオユーザ入力がユーザから予期されることを示している。この命令はオーディオブラウジング補助部１５０を休止させユーザからのＤＴＭＦ音声入力を待機するようにする。ユーザは電話１１０のマイクに向かって話して音声信号を入力する。音声入力は、例示したＨＴＭＬドキュメント５００について上述したように処理される。そして、音声信号は、可変ネームと関連したドキュメントサーバに与えられる。MAXLENGTH パラメータは、音声入力の最大長が５秒であることを示す。ここで説明したオーディオＨＴＭＬ命令は、本発明のオーディオブラウジング技術の利益を利用するために実行されうるオーディオＨＴＭＬのタイプの例である。当業者は別のタイプのオーディオＨＴＭＬ命令を容易に実行することができる。上述のオーディオＨＴＭＬ命令に加えて、オーディオブラウジング補助部１５０は様々なナビゲーション命令をサポートしている。通常のグラフィックブラウザでは、ユーザはドキュメントによりナビゲーションするための通常の技術を用いてもよい。このような通常の技術は、ドキュメントをスクロールするためのテキストスライダと、カーソル動作と、ページアップ、ページダウン、ホーム、およびエンドのような命令とを含んでいる。本発明のオーディオブラウジング技術によると、以下のように、ユーザは、ＤＴＭＦトーン形式または音声形式のいずれかのオーディオユーザ入力を用いてドキュメントをナビゲートしてもよい。以上の詳細な説明は、全ての点について実例となる典型的なものであって限定的なものと理解されるものではなく、ここで開示する発明の範囲は詳細な説明からではなく、特許法に許される最大限の範囲に解釈される請求の範囲によって決定されるものである。また、ここで示し説明した実施の形態は本発明の原則を例示しただけのものであり、当業者は本発明の範囲および特徴から離れることなく様々な設計変更を行なってもよいことが理解される。例えば、ここではパケットスイッチ通信チャネルのような通信チャネルについて説明したが、回路スイッチ通信チャネルのような通信チャネルでの実行も可能である。（注２）ここで電話番号は説明のためだけに用いたものである。いかなる特定の電話番号を用いることも本発明の実例として以外に何ら意味を持つものではない。また、実際の電話番号を意味するものではない。（注３）ここでイタリック体はテキストが音声として再生されることを示すために用いられる。DETAILED DESCRIPTION OF THE INVENTION Method and apparatus for extracting information using audio interface Field of the invention The present 
invention relates generally to information retrieval. More specifically, the invention relates to retrieving information from a network using an audio user interface.

Background of the Invention

The amount of information available on communication networks is enormous and growing rapidly. The most popular of these networks is the Internet, a network of linked computers around the world. Much of the Internet's popularity can be attributed to the World Wide Web (WWW) portion of the Internet. The WWW is the part of the Internet in which information is typically passed between server computers and client computers using the Hypertext Transfer Protocol (HTTP). A server stores information and serves (i.e., sends) it to a client in response to a request from the client. Clients run computer software programs, often called browsers, that request and display information. Examples of WWW browsers are Netscape Navigator, from Netscape Communications Inc., and Internet Explorer, from Microsoft Corp.

Servers and the information stored on them are identified by Uniform Resource Locators (URLs). URLs are described in Berners-Lee, T., "Uniform Resource Locators" (RFC 1738, Network Working Group, 1994), which is incorporated herein by reference. For example, the URL http://www.hostname.com/document1.html (Note 1) indicates that the document "document1.html" is stored on the host server "www.hostname.com". Thus, a request for information from a client to a host server generally includes a URL. The information passed from a server to a client is generally called a document. Such documents are generally written in a document language such as HTML (Hypertext Markup Language). Upon receiving a request from a client, the server sends an HTML document to the client. HTML documents contain information used by the browser to display information to the user on a computer display screen.
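As a small illustration of the URL structure described above, the following Python sketch splits the example URL into the host server name and the document name. The function name `split_url` is hypothetical and is not part of the patent; the sketch relies only on the standard library's `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> tuple[str, str]:
    """Return (host server, document name) for an http URL."""
    parts = urlsplit(url)
    # hostname is the server that stores the document; path names the document.
    return parts.hostname, parts.path.lstrip("/")


host, document = split_url("http://www.hostname.com/document1.html")
print(host, document)
```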
An HTML document may include text, logical structure commands, hypertext links, and user input commands. When the user selects a hypertext link from the display (for example, by clicking the mouse), the browser requests another document from a server.

Current WWW browsers are based on text and graphical user interfaces. That is, documents are presented as images on a computer screen. Such images include, for example, text, graphics, hypertext links, and dialog boxes for user input. All user interaction with the WWW takes place through a graphical user interface. Although audio data can be received and played back on the user's computer (e.g., ".wav" or ".au" files), such receipt of audio data is secondary to the WWW's graphical interface. In other words, audio data may be sent as a result of a user request, but the user has no means of interacting with the WWW using an audio interface.

(Note 1) The URLs given as examples here are for illustrative purposes only. The use of any particular URL has no meaning other than as an example of the invention, and does not signify an actual URL.

Summary of the Invention

The present invention provides a method and apparatus for retrieving information from a document server using an audio interface device (for example, a telephone). An interpreter is provided which receives documents from a document server operating in accordance with a document serving protocol. The interpreter translates (interprets) a document into audio data that is provided to the audio user interface. The interpreter also receives audio user input from the audio interface device. The interpreter translates that audio user input into user data suitable for transmission to the document server in accordance with the document serving protocol, and provides the user data to the document server. In various embodiments, the interpreter may be located within the audio user interface, within the document server, or in a communication channel between the audio user interface and the document server.

According to one embodiment, a communication network node for carrying out the audio browsing functions of the invention is included as a node in a communication network, such as a long-distance telephone network. An audio channel is established between the audio interface device and the node. In addition, a document serving protocol channel is established between the node and a document server. The node receives documents served by the document server in accordance with the document serving protocol and translates them into audio data suitable for the audio user interface. The node then sends the audio data to the audio interface device via the audio channel. The node also receives audio user input (e.g., DTMF tones or speech) from the audio interface device and translates it into user data suitable for the document server. The node then sends the user data to the document server in accordance with the document serving protocol.

In one embodiment, the document server is a World Wide Web document server that communicates with clients via the Hypertext Transfer Protocol. An advantage of the invention is that a user of an audio interface device can conduct an audio browsing session with a World Wide Web document server. The World Wide Web document server can handle this browsing session in the usual way, and need not know whether the session was initiated by a client running a conventional graphical browser or by an audio interface device. The necessary translation functions are performed in the communication network node, in a manner that is transparent both to the user of the audio interface device and to the World Wide Web document server operating in accordance with the Hypertext Transfer Protocol.
These and other advantages of the invention will be apparent to those skilled in the art by reference to the following detailed description and the accompanying drawings.

Brief Description of the Drawings

FIG. 1 is a diagram of a communication system suitable for practicing the present invention.
FIG. 2 is a block diagram of the components of the audio processing node.
FIG. 3 is a block diagram of the components of the audio interpreter node.
FIG. 4 is a block diagram of the document server.
FIG. 5 shows an example of an audio HTML document.
FIG. 6 shows an example of an HTML document.
FIG. 7 is a block diagram of an embodiment in which the audio browsing functions are performed in the user interface device.
FIG. 8 is a block diagram of the components of the user interface device of FIG. 7.
FIG. 9 is a block diagram of an embodiment in which the audio browsing functions are performed in an audio browsing document server.
FIG. 10 is a block diagram of the components of the audio browsing document server of FIG. 9.
FIG. 11 is a block diagram of an embodiment in which the audio translation functions are performed in an audio interpreter document server.
FIG. 12 is a block diagram of the components of the audio interpreter document server of FIG. 11.

Detailed Description

FIG. 1 shows a communication system 100 suitable for practicing the invention. An audio interface device, such as telephone 110, is connected to a local exchange carrier (LEC) 120. Audio interface devices other than a telephone may also be used. For example, the audio interface device could be a multimedia computer with telephony capabilities. In accordance with the invention, the user of telephone 110 places a call to a telephone number associated with information provided by a document server, such as document server 160.
In the exemplary embodiment shown in FIG. 1, document server 160 is part of communication network 162. In an advantageous embodiment, communication network 162 is the Internet. Telephone numbers associated with information accessible through a document server, such as document server 160, are set up so that they are routed to a special communication network node, such as audio browsing adjunct 150. In the embodiment shown in FIG. 1, the audio browsing adjunct 150 is a node in communication network 102, which is a long-distance telephone network. Thus, the call is routed to LEC 120, which further routes the call over trunk 125 to long-distance carrier switch 130. Long-distance network 102 would generally have other switches, similar to switch 130, for routing calls; for clarity, however, only one switch is shown in FIG. 1. Switch 130 in communication network 102 is an "intelligent" switch that includes a processing unit 131 which can be programmed to perform various functions. Such use of processing units in communication network switches, and their programming, is well known in the art. When the call is received at switch 130, it is routed to the audio browsing adjunct 150. An audio channel is thereby established between telephone 110 and the audio browsing adjunct 150. The routing of calls through a communication network is well known in the art and is not described further here.

In one embodiment, audio browsing services according to the invention are provided only to users who have subscribed to an audio browsing service offered by the service provider of communication network 102. In such an embodiment, a database 140 connected to switch 130 contains a list of subscribers.
Switch 130 consults database 140 to determine whether the call was placed by a subscriber to the service. One way to accomplish this is to store a list of calling telephone numbers (ANIs) in database 140. In a well-known manner, LEC 120 provides switch 130 with the ANI of telephone 110. Switch 130 then queries database 140 to determine whether that ANI is on the list of audio browsing service subscribers stored in database 140. If the ANI is on the list, switch 130 routes the call to the audio browsing adjunct 150 in accordance with the invention. If the ANI does not belong to an audio browsing service subscriber, an appropriate message is sent to telephone 110.

The audio browsing adjunct 150 includes an audio processing node 152 and an audio interpreter node 154, both of which are described in detail below. The audio browsing adjunct 150 provides the audio browsing functionality of the invention.

Upon receiving the call from telephone 110, the audio browsing adjunct 150 establishes a communication channel, via link 164, with the document server 160 associated with the called telephone number. The association between telephone numbers and document servers is described in detail below. In a WWW embodiment, link 164 is a socket connection over TCP/IP, the establishment of which is well known in the art. For further information on TCP/IP, see Comer, Douglas, "Internetworking with TCP/IP: Principles, Protocols, and Architecture" (Englewood Cliffs, NJ, Prentice Hall, 1988). The audio browsing adjunct 150 and document server 160 communicate with each other using a document serving protocol. As used here, a document serving protocol is a communication protocol for transferring information between a client and a server.
Under such a protocol, a client requests information from a server by sending a request to the server, and the server fulfills the request by sending the client a document containing the requested information. A document serving protocol channel is thus established between the audio browsing adjunct 150 and document server 160 via link 164. In an advantageous WWW embodiment, the document serving protocol is the Hypertext Transfer Protocol (HTTP). This protocol is well known in the field of WWW communication and is described in Berners-Lee, T. and Connolly, D., "Hypertext Transfer Protocol (HTTP)", Working Draft of the Internet Engineering Task Force (1993), which is incorporated herein by reference.

The audio browsing adjunct 150 therefore communicates with document server 160 using the HTTP protocol. As far as document server 160 is concerned, it behaves as if it were communicating with any conventional WWW client running a conventional graphical browser. That is, document server 160 serves documents to the audio browsing adjunct 150 in response to requests received over link 164. As used here, a document is a collection of information. A document may be static, predefined at server 160, in which case every request for the document yields the same information. Alternatively, a document may be dynamic, in which case the information supplied in response to a request is generated at the time of the request. Typically, dynamic documents are generated by scripts, which are programs executed by server 160 in response to a request for information. For example, a URL may be associated with a script. When server 160 receives a request containing that URL, it executes the script to generate a dynamic document and provides the dynamically generated document to the client that requested the information.
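The distinction between static and dynamic documents drawn above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paths `/static.html` and `/dynamic`, the `greeting_script` function, and the `serve` function are hypothetical names that do not appear in the patent, and a real server would do considerably more.

```python
from datetime import datetime, timezone
from typing import Callable, Union

# A document is either fixed text (static) or a script run per request (dynamic).
Document = Union[str, Callable[[], str]]


def greeting_script() -> str:
    """Hypothetical script: builds its document when the request arrives."""
    now = datetime.now(timezone.utc)
    return f"<HTML><BODY>Generated at {now.isoformat()}</BODY></HTML>"


# Hypothetical URL paths mapped to documents; neither path comes from the patent.
DOCUMENTS: dict[str, Document] = {
    "/static.html": "<HTML><BODY>Hello!</BODY></HTML>",
    "/dynamic": greeting_script,
}


def serve(path: str) -> str:
    """Return the document for a path, running its script if it has one."""
    doc = DOCUMENTS[path]
    return doc() if callable(doc) else doc
```

Every call to `serve("/static.html")` returns the same text, whereas `serve("/dynamic")` runs the script again on each request.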
The use of scripts to generate documents dynamically is well known in the art.

Documents served by server 160 contain text, logical structure commands, hypertext links, and user input commands. One characteristic of such documents is that the physical structure of the information they contain (i.e., its physical layout as displayed by a conventional graphical browser running on the client) is not defined. Instead, a document contains logical structure commands, which are interpreted by the browser to define a physical layout. Such logical structure commands include, for example, emphasis commands and new-paragraph commands. The syntax of such commands may conform to the specification of a more general-purpose document structuring language, such as SGML (Standard Generalized Markup Language), which is described in Goldfarb, Charles, "The SGML Handbook" (Clarendon Press, 1990). In the WWW embodiment of the invention, these documents are documents in the Hypertext Markup Language (HTML). HTML is a well-known language, based on SGML, that is used to define documents served by WWW servers. HTML is described in Berners-Lee, T. and Connolly, D., "Hypertext Markup Language (HTML)", Working Draft of the Internet Engineering Task Force (1993), which is incorporated herein by reference.

When an HTML document is received by a client running a conventional browser, the browser translates the HTML document into an image and displays the image on a computer display screen. According to the principles of the present invention, however, when a document is received from document server 160, the audio browsing adjunct 150 converts the document into audio data. The details of this conversion are described below. The audio data is then sent to telephone 110 via switch 130 and LEC 120. In this way, the user of telephone 110 can access information on document server 160 through an audio interface.
In addition, the user can send audio user input from telephone 110 to the audio browsing adjunct 150. The audio user input may be, for example, speech signals or DTMF tones. The audio browsing adjunct 150 converts the audio user input into user data or instructions suitable for transmission to document server 160 via link 164 in accordance with the HTTP protocol. The user data or instructions are then sent to document server 160 via the document serving protocol channel. The user thereby interacts with the document server through an audio user interface.

In this manner, a user can conduct a browsing session with a WWW document server through an audio interface. The document server can handle such a browsing session in the usual way, and need not know whether the session was initiated by a client running a conventional graphical browser or by an audio interface such as a telephone. The audio browsing adjunct 150 in network 102 translates the documents served by document server 160 into audio data suitable for sending to telephone 110. In addition, the audio browsing adjunct 150 translates the audio user input received from telephone 110 into user data suitable for receipt by document server 160.

An advantageous embodiment is now described in more detail in the context of an example browsing session. Suppose the user at telephone 110 dials (123) 456-7890 (Note 2), a number that is associated with information accessible through document server 160 and is therefore set up to be routed to the audio browsing adjunct 150. The call is routed to LEC 120, which recognizes the telephone number as one that is to be routed to long-distance network 102 and, more particularly, to switch 130.
Upon receiving the call, switch 130 in turn routes it to the audio browsing adjunct 150 via link 132. An audio channel is thereby established between telephone 110 and the audio browsing adjunct 150.

Details of the audio processing node 152 are shown in FIG. 2. The audio processing node 152 comprises a telephone network interface module 210, a DTMF decoder/generator 212, a speech recognition module 214, a text-to-speech module 216, and an audio play/record module 218, each of which is connected to an audio bus 220 and a control/data bus 222, as shown in FIG. 2. The audio processing node 152 further comprises a central processing unit 224, a memory unit 228, and a packet network interface 230, each of which is connected to control/data bus 222. The overall functioning of audio processing node 152 is controlled by central processing unit 224, which operates under control of executed computer program instructions 232 stored in memory unit 228. Memory unit 228 may be any type of machine-readable storage device. For example, memory unit 228 may be a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), a magnetic storage medium (i.e., a magnetic disk), or an optical storage medium (i.e., a CD-ROM). Further, the audio processing node 152 may contain various combinations of machine-readable storage devices, accessible by central processing unit 224, that are capable of storing a combination of computer program instructions 232 and data 234.

The telephone network interface module 210 handles the low-level interaction between audio processing node 152 and telephone network switch 130. In one embodiment, module 210 consists of one or more analog tip/ring loop-start telephone line terminations.
Through module 210, central processing unit 224 can control link 132 via control/data bus 222. Control functions include on-hook/off-hook, ring detection, and far-end on-hook detection. In another embodiment, module 210 includes one or more channelized digital interfaces, such as T1/DS1, E1, or PRI. Signaling may be in-band or out-of-band. The DTMF decoder/generator 212 handles the conversion of DTMF tones into digital data and the generation of DTMF tones from digital data. The speech recognition module 214 recognizes speech signals that originate at the user's telephone 110. Such speech signals are processed by the speech recognition module 214 and converted into digital data. The text-to-speech module 216 converts the text of documents received from document server 160 into audio speech signals to be sent to the user at telephone 110. The audio play/record module 218 is used to play audio data received from document server 160 to the user at telephone 110 and to record audio data. Note that modules 210, 212, 214, 216, and 218 are shown as separate functional modules in FIG. 2. The functions of each of modules 212, 214, 216, and 218 may be implemented in hardware, in software, or in a combination of hardware and software, using well-known signal processing techniques. The functions of module 210 may be implemented in hardware or in a combination of hardware and software, using well-known signal processing techniques. The functions of each module are described further below in connection with the example. The packet network interface 230 is used for communication between audio processing node 152 and audio interpreter node 154.

The audio browsing adjunct 150 also includes an audio interpreter node 154, which is connected to audio processing node 152. Details of the audio interpreter node 154 are shown in FIG. 3.
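Returning briefly to the DTMF decoder/generator 212 described above, the following Python sketch shows one conventional way such a function could be realized in software. It is an illustration only, not the patent's implementation: it assumes the standard DTMF keypad frequency pairs, an 8000 Hz sample rate, and the Goertzel algorithm for measuring tone power, none of which are specified in the text. The names `generate`, `power`, and `decode` are hypothetical.

```python
import math

ROWS = (697, 770, 852, 941)        # low-group frequencies, Hz
COLS = (1209, 1336, 1477, 1633)    # high-group frequencies, Hz
KEYS = ("123A", "456B", "789C", "*0#D")
RATE = 8000                        # samples per second (assumed)


def generate(key: str, n: int = 400) -> list[float]:
    """Return n samples of the two-tone signal for one key."""
    r = next(i for i, row in enumerate(KEYS) if key in row)
    c = KEYS[r].index(key)
    return [math.sin(2 * math.pi * ROWS[r] * t / RATE)
            + math.sin(2 * math.pi * COLS[c] * t / RATE) for t in range(n)]


def power(samples: list[float], freq: float) -> float:
    """Goertzel estimate of the signal power at one frequency."""
    coeff = 2 * math.cos(2 * math.pi * freq / RATE)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def decode(samples: list[float]) -> str:
    """Pick the strongest row tone and column tone and map them to a key."""
    r = max(range(4), key=lambda i: power(samples, ROWS[i]))
    c = max(range(4), key=lambda i: power(samples, COLS[i]))
    return KEYS[r][c]
```

For instance, `decode(generate("#"))` recovers the pound key that the text uses to terminate a DTMF sequence. A production decoder would also need to handle noise, silence, and tone duration, which this sketch ignores.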
The audio interpreter node 154 comprises a central processing unit 302, a memory 304, and two packet network interfaces 306 and 308, connected by a control/data bus 310. The overall functioning of the audio interpreter node 154 is controlled by central processing unit 302, which operates under control of executed computer program instructions 312 stored in memory unit 304. Memory unit 304 may be any type of machine-readable storage device. For example, memory unit 304 may be a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), a magnetic storage medium (i.e., a magnetic disk), or an optical storage medium (i.e., a CD-ROM). Further, the audio interpreter node 154 may contain various combinations of machine-readable storage devices, accessible by central processing unit 302, that are capable of storing a combination of computer program instructions 312 and data 314.

The control of devices such as audio processing node 152 and audio interpreter node 154 by a central processing unit executing stored software instructions is well known in the art and is not described in further detail here.

Returning to the example, the call from telephone 110 to telephone number (123) 456-7890 is routed to the audio browsing adjunct 150 and, in particular, to audio processing node 152. Central processing unit 224 detects the ringing line through telephone network interface module 210. Upon detecting the call, central processing unit 224 performs a lookup to determine the URL associated with the dialed number (DN). The dialed number is provided from the local exchange carrier 120 to switch 130 in a manner well known in the art, and the DN is in turn provided from switch 130 to the audio browsing adjunct 150. A list of URLs associated with DNs is stored as data 234 in memory 228. In this example, assume that DN (123) 456-7890 is associated with the URL http://www.att.com/~phone/greeting.

In an alternative embodiment, the list of URLs associated with the various DNs is stored in a network database, such as database 140, rather than locally in the audio browsing adjunct 150. In such an embodiment, central processing unit 224 of audio processing node 152 sends network switch 130 a signal requesting a lookup in database 140. The switch requests the URL from database 140 and returns the resulting URL to audio processing node 152. Note that communication among audio processing node 152, switch 130, and database 140 may take place over an out-of-band signaling system well known in the art, such as SS7. An advantage of this configuration is that multiple audio browsing adjuncts may be present in network 102, each sharing a single database 140. Only the one database 140 then needs to be updated with URLs and their associated DNs.

After receiving the URL associated with the DN, central processing unit 224 sends a message (including the URL) to the audio interpreter node 154, instructing it to begin an audio translation/browsing session. The message passes from central processing unit 224 over control/data bus 222 to packet network interface 230. It is then sent from packet network interface 230 of audio processing node 152, via connection 153, to packet network interface 306 of audio interpreter node 154.
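The lookup and hand-off just described can be sketched as follows. This is a minimal illustration, assuming an in-memory table in place of data 234; the one table entry is the example pairing given in the text. The message layout and the names `DN_TO_URL`, `normalize`, and `start_session_message` are assumptions, since the patent says only that the message includes the URL.

```python
import re

# Hypothetical in-memory table standing in for data 234; the single entry is
# the example pairing given in the text.
DN_TO_URL = {
    "1234567890": "http://www.att.com/~phone/greeting",
}


def normalize(dn: str) -> str:
    """Strip punctuation so '(123) 456-7890' and '1234567890' match."""
    return re.sub(r"\D", "", dn)


def start_session_message(dn: str) -> dict:
    """Build the message asking the interpreter node to begin a session.

    The message layout is an assumption; the text only states that the
    message includes the URL.
    """
    url = DN_TO_URL.get(normalize(dn))
    if url is None:
        raise LookupError(f"no URL associated with dialed number {dn!r}")
    return {"action": "start_session", "url": url}
```

In the alternative embodiment, the dictionary lookup would be replaced by a query to the shared network database 140, leaving the rest of the flow unchanged.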
In one advantageous embodiment, audio processing node 152 and audio interpreter node 154 are co-located and together form the audio browsing adjunct 150. In other embodiments, audio processing node 152 and audio interpreter node 154 may be geographically separated. Several such alternative embodiments are described below. Connection 153 may be a packet data network connection well known in the art (for example, TCP/IP over Ethernet).

Returning to the example, the audio interpreter node 154 receives, via packet network interface 306, the message instructing it to begin a new audio translation/browsing session. Central processing unit 302 can control audio translation/browsing sessions for multiple users concurrently. Such multiprocessing by a processor is well known and generally involves instantiating a software process to control each session. To begin the audio translation/browsing session, the audio interpreter node 154 sends an HTTP request for the URL http://www.att.com/~phone/greeting to document server 160 over connection 164. In this example, document server 160 is assumed to have the host name www.att.com.

Details of document server 160 are shown in FIG. 4. Document server 160 is a computer comprising a central processing unit 402 connected to a memory 404. The functions of document server 160 are controlled by central processing unit 402 executing computer program instructions 416 stored in memory 404. In operation, document server 160 receives a request for a document from the audio interpreter node 154 via connection 164 and packet network interface 440. Central processing unit 402 interprets the request and retrieves the requested information from memory 404. Such a request may be for an HTML document 408, an audio HTML document 410, an audio file 412, or a graphics file 414.
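To make the request step concrete, the sketch below builds the bytes of a minimal HTTP GET request for the example URL. It is an illustration under assumptions: the patent does not give the request format, the HTTP/1.0 request line and the optional Host header shown here are one plausible form, and `build_get_request` is a hypothetical name. The sketch only constructs the request; it does not open a connection.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Return the bytes of a minimal HTTP/1.0 GET request for url."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Request line, one header, then the blank line that ends the header block.
    lines = [f"GET {path} HTTP/1.0", f"Host: {parts.hostname}", "", ""]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("http://www.att.com/~phone/greeting")
```

These bytes are what would be written to the socket connection that serves as link 164.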
HTML documents 408 are well known and contain conventional HTML instructions for use with conventional WWW graphical browsers. An audio HTML document is similar to an HTML document, but contains additional instructions specifically directed to translation by the audio interpreter node 154 in accordance with the invention. Such instructions, which are specific to the audio browsing aspects of the invention, are referred to here as audio HTML instructions. Audio HTML documents and audio HTML instructions are described in detail below. Audio files 412 are files containing audio information. Graphics files 414 are files containing graphical information. In a manner well known in the art, a URL identifies a particular document on a particular document server. Memory 404 may also contain scripts 418 for dynamically generating HTML documents and audio HTML documents.

Returning to the example, the HTTP request for the URL http://www.att.com/~phone/greeting is received by document server 160 from the audio interpreter node 154 via connection 164. The document server interprets the URL and, under control of central processing unit 402, retrieves an audio HTML page from memory 404. Central processing unit 402 then sends this audio HTML document to the audio interpreter node 154 via packet network interface 440 and link 164.

The audio HTML document 500 that the audio interpreter node 154 receives in response to its request for the URL http://www.att.com/~phone/greeting is shown in FIG. 5. The audio interpreter node 154 begins translating document 500 as follows. In one embodiment, the <HEAD> section of document 500, lines 502-506, which includes the page title, is not converted to speech and is ignored by the audio interpreter node 154. In an alternative embodiment, the <TITLE> section may be translated using text-to-speech.

The text "Hello!" on line 508 of the <BODY> section of document 500 is sent from the audio interpreter node 154 to audio processing node 152 via packet network interface 306 and link 153. Along with the text "Hello!", the audio interpreter node 154 sends audio processing node 152 an instruction that the text is to be processed by the text-to-speech module 216. Audio processing node 152 receives the text and the instruction via packet network interface 230, and the text is passed over control/data bus 222 to text-to-speech module 216. The text-to-speech module 216 generates an audio signal for playing "Hello" (Note 3) and sends this signal over audio bus 220 to telephone network interface module 210. The telephone network interface module 210 in turn sends the audio signal to telephone 110. Note that text-to-speech conversion is well known, and conventional text-to-speech techniques may be used in text-to-speech module 216. For example, when the text is converted to speech, the punctuation mark "!" may be translated so that the speech is played at increased volume.

Line 510 of document 500 is a form instruction, and the audio interpreter node 154 sends nothing to audio processing node 152 in connection with this instruction. The audio interpreter node 154 interprets line 510 as indicating that a response is expected from the user in the future, and that this response is to be provided as an argument to the script identified by http://machine:8888/hastings-bin/getscript.sh.

Line 512 is an audio HTML instruction. The audio interpreter node 154 translates line 512 by sending to server 160 an HTTP request for the audio file identified by www-spr.ih.att.com/~hastings/annc/greeting.mu8, which resides in memory 404. Document server 160 retrieves the audio file from memory 404 and sends it to the audio interpreter node 154 via link 164.
Upon receiving the audio file, the audio interpreter node 154 sends the file to the audio processing node 152 together with an instruction indicating that the file is to be played by the audio playback/recording module 218. Upon receiving the file and the instruction, the audio processing node 152 routes the file to the audio playback/recording module 218. The audio playback/recording module 218 generates an audio signal that is sent over the audio bus 220 to the telephone network interface module 210. The telephone network interface module 210 then sends the audio signal to the telephone 110. As a result, the user at telephone 110 hears the contents of the audio file www-spr.ih.att.com/~hastings/annc/greeting.mu8 through the speaker of the telephone 110.

Lines 514-516 are audio HTML instructions. The audio interpreter node 154 does not send line 514 to the audio processing node 152. Line 514 indicates that the response from the user is to be sent to the document server 160 in association with the variable name "collectvar". This instruction marks the beginning of a prompt-and-collect sequence, in which the user is prompted for information and then provides it. The instruction is followed by a prompt instruction 516 and a set of choice instructions 518-522. The audio interpreter node 154 processes line 516 in the same manner as line 512, so that the user at telephone 110 hears the sound from the file identified by http://www-spr.ih.att.com/~hastings/annc/choices.mu8. This sound asks the user to make a selection based on some criteria, and the audio interpreter node 154 then waits for a response from the user at telephone 110. Also as a result of processing line 516, the central processing unit 302 sends the audio processing node 152 a message instructing the telephone network interface module 210 to be ready to receive audio input.
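The routing just described (plain text goes to text-to-speech, a voice source goes to the playback module, and header, form, and collect instructions produce no audio) can be pictured with a minimal sketch. The element kinds, the `Command` type, and the module labels are hypothetical names chosen for illustration; the patent does not specify a message format between the two nodes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """One instruction sent from the interpreter node to the audio processing node."""
    module: str   # hypothetical module label, e.g. "text_to_speech"
    payload: str  # text to speak, or the location of an audio file to play


def translate_element(kind: str, value: str) -> list[Command]:
    """Translate one parsed document element into commands (possibly none)."""
    if kind == "text":        # e.g. "Hello!" on line 508
        return [Command("text_to_speech", value)]
    if kind == "voice_src":   # e.g. the voice source instruction on line 512
        return [Command("playback_recording", value)]
    # <HEAD> content and FORM/COLLECT instructions only change interpreter
    # state; nothing is sent to the audio processing node for them.
    return []


# A simplified stand-in for document 500 (the title text is a placeholder).
document = [
    ("head", "(page title)"),
    ("text", "Hello!"),
    ("form", "http://machine:8888/hastings-bin/getscript.sh"),
    ("voice_src", "www-spr.ih.att.com/~hastings/annc/greeting.mu8"),
]
commands = [cmd for kind, value in document for cmd in translate_element(kind, value)]
```

In this sketch only two of the four elements yield commands, mirroring the way lines 508 and 512 reach the audio processing node while the header and line 510 do not.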
The user then responds with audio user input from telephone 110. The audio user input may take the form of DTMF tones generated when the user presses keys on the keypad of the telephone 110. For example, if the user presses "2" on the keypad, the audio processing node 152 receives the DTMF tone associated with "2" through the telephone network interface module 210. The central processing unit 224 recognizes the audio signal as a DTMF tone, and an instruction is sent to the telephone network interface module 210 to pass the signal to the DTMF decoder/generator 212. The central processing unit 224 instructs the DTMF decoder/generator 212 to convert the DTMF tone into digital data and to supply that data to the packet network interface 230 for transmission to the audio interpreter node 154. Upon receiving this signal, the audio interpreter node 154 recognizes that the user responded with "2", that is, that the value "Jim" shown on line 520 of the audio HTML document 500 has been selected. The audio interpreter node 154 therefore sends the value "Jim", associated with the variable "collectvar", to the script http://machine:8888/hastings-bin/getscript.sh identified in line 510. If the user's response is not one of the listed choices (in this example, any response other than "1" to "3"), or if the user does not respond within a fixed time, the audio interpreter node 154 instructs the text-to-speech module 216 to generate the speech signal "I can't accept your choice. Please try again.", and that signal is sent to the user at telephone 110.

Alternatively, the audio user input may be a speech signal. That is, instead of pressing the number 2 on the keypad of the telephone 110, the user speaks the word "two" into the microphone of the telephone. This speech signal is received by the audio processing node 152 via the telephone network interface module 210.
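The choice handling above can be sketched as follows. Only the pairing of key "2" with the value "Jim" comes from the description; the other two values are placeholders, and passing the result to the script as a URL-encoded `collectvar=Jim` pair is an assumption, since the text does not say how the argument is encoded.

```python
from __future__ import annotations

from urllib.parse import urlencode

RETRY_PROMPT = "I can't accept your choice. Please try again."

# Only "2" -> "Jim" is taken from the description; the other values are placeholders.
CHOICES = {"1": "choice-one", "2": "Jim", "3": "choice-three"}


def resolve_choice(key: str | None) -> tuple[str | None, str | None]:
    """Return (value, None) for a listed key, else (None, retry prompt).

    A key of None models the user not responding within the fixed time.
    """
    if key in CHOICES:
        return CHOICES[key], None
    return None, RETRY_PROMPT


def script_argument(variable: str, value: str) -> str:
    """Encode the collected variable for the script (the encoding is an assumption)."""
    return urlencode({variable: value})


value, prompt = resolve_choice("2")
argument = script_argument("collectvar", value)
```

An unlisted key and a timeout take the same path here, which matches the single retry message described for both cases.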
The central processing unit 224 recognizes the signal as a speech signal, and an instruction is given to the telephone network interface module 210 to send the signal to the speech recognition module 214 via the audio bus 220. The central processing unit 224 instructs the speech recognition module 214 to convert the speech signal into digital data and to supply that data for transmission to the audio interpreter node 154. The audio interpreter node 154 receives the data and processes it as described above for DTMF audio user input. The speech recognition module 214 operates in accordance with conventional speech recognition techniques well known in the art.

Hypertext links are frequently present in HTML documents. When such a document is displayed on the screen of a computer running a conventional graphical browser, hypertext links are indicated graphically (e.g., underlined). If the user selects a link, for example by clicking on it with a mouse, the browser generates a request for the document indicated by the link and sends the request to the document server. Consider the HTML document 600 shown in the drawings. Lines 604 and 605 give the HTML description of two hypertext links. If this page were processed by a conventional graphical browser, it would be displayed as follows:

This page gives you a choice of links to follow to other World Wide Web pages. Please click on one of the links below.

click here for information on cars

click here for information on trucks

The user would then select one of the links using a graphical pointing device such as a mouse. If the user selects "click here for information on cars", the browser generates a request for the document identified by the URL http://www.abc.com/cars.html.
If the user selects "click here for information on trucks", the browser generates a request for the document identified by the URL http://www.abc.com/trucks.html.

The processing of HTML hypertext links in accordance with the present invention will now be described with reference to the drawings. Assume that the document server 160 has provided the HTML document 600 to the audio interpreter node 154. Lines 602 and 603 are converted to an audio signal by the text-to-speech module 216 and provided to the user's telephone 110 as described above. That is, the user hears, "This page gives you a choice of links to follow to other World Wide Web pages. Please click on one of the links below." At line 604, the audio interpreter node 154 recognizes that the line is a hypertext link. The audio interpreter node 154 sends the audio processing node 152 an instruction for the DTMF decoder/generator 212 to generate a tone to the telephone 110. Alternatively, the tone may be produced by the audio interpreter node 154 sending the audio processing node 152 an instruction that causes the audio playback/recording module 218 to play an audio file containing the tone. This distinctive tone notifies the user of the beginning of a hypertext link. The audio interpreter node 154 then sends the text of the hypertext link ("click here for information on cars") to the audio processing node 152 with an instruction that the text is to be processed by the text-to-speech module 216. As a result, the speech signal "click here for information on cars" is provided to the telephone 110. The audio interpreter node 154 then sends the audio processing node 152 a further instruction for the DTMF decoder/generator 212 to generate a tone to the telephone 110.
This distinctive tone notifies the user of the end of the hypertext link. The tones used to mark the beginning and the end of a hypertext link may be the same tone or different tones. The ending tone is followed by a pause. Instead of tones, the beginning and end of a hypertext link may be identified by speech signals such as "start of link [hypertext] end of link".

If the user wishes to follow a link, the user provides audio input during the pause that follows it. For example, if the user wishes to follow the link "click here for information on cars", the user provides audio input within the pause period following the audio signal for that link. The audio input may be, for example, a DTMF tone generated by pressing a key on the keypad of the telephone 110. The DTMF tone is received by the audio processing node 152 and processed by the DTMF decoder/generator 212. Data representing the DTMF tone is provided to the audio interpreter node 154 via the control/data bus 222, the packet network interface 230, and link 153. Upon receiving this signal, the audio interpreter node 154 recognizes that the signal was received within the pause following the selected link, and it generates a request for the WWW document identified by the URL http://www.abc.com/cars.html associated with that link. Alternatively, the audio user input that selects a hypertext link may be a speech signal.

Another type of link is the hypertext anchor link. An anchor link allows the user to jump to a particular location within an HTML document. In a conventional graphical browser, when the user selects an anchor link, the browser displays the portion of the document indicated by the link. In accordance with the audio browsing techniques of the present invention, when the user selects an anchor link, the audio interpreter node 154 begins translating the document at the location specified by the link.
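As a rough illustration of the link handling described above, the sketch below models the announcement of one link as tone, spoken text, tone, pause, and resolves a key press by checking which pause window it fell into. The pause length, the timing values, and the `#name` convention for anchor targets are assumptions made for the example; the description does not specify them.

```python
from __future__ import annotations

PAUSE_SECONDS = 2.0  # assumed pause length; the description does not give one


def announce_link(text: str) -> list[tuple[str, object]]:
    """Audio events for one hypertext link: tone, spoken text, tone, pause."""
    return [
        ("tone", "link_start"),
        ("speak", text),
        ("tone", "link_end"),
        ("pause", PAUSE_SECONDS),
    ]


def resolve_selection(pause_windows, input_time):
    """Map an input received at input_time to an action, or None if outside any pause.

    pause_windows is a list of (start, end, target) tuples, one per link.
    Targets starting with "#" are treated as anchors inside the current
    document; any other target is treated as a new document to request.
    """
    for start, end, target in pause_windows:
        if start <= input_time < end:
            if target.startswith("#"):
                return ("jump", target[1:])
            return ("fetch", target)
    return None


# Hypothetical timeline: each link's pause window in seconds from page start.
windows = [
    (5.0, 7.0, "http://www.abc.com/cars.html"),
    (10.0, 12.0, "http://www.abc.com/trucks.html"),
]
action = resolve_selection(windows, 6.1)
```

Treating anchors and ordinary links through the same selection step reflects the description, where both are chosen during the pause and differ only in what the interpreter does next.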
Line 620 of document 600, for example, contains a hypertext anchor link to line 625 of the same document. This hypertext link is identified to the user in the same manner as a hypertext link that identifies a new HTML document. Alternatively, hypertext anchor links may be distinguished, for example, by a different audio tone or by a generated speech signal indicating an anchor link. If the user selects the anchor link at line 620, the audio interpreter node 154 skips to the text at line 625 and begins translating the HTML document 600 there.

In the advantageous embodiment described in connection with FIG. 1, the audio browsing adjunct 150, including the audio processing node 152 and the audio interpreter node 154, is embodied in a communication network node located within the long distance communication network 102. In this way, the audio browsing functions of the present invention can be provided to telephone network subscribers by the service provider of the telephone network 102. In such a configuration, no additional hardware is required at the user premises or at the document server. All audio browsing functions are provided by components within the telephone network 102. However, other configurations are possible, and such alternative configurations can be readily implemented by those skilled in the art based on the disclosure herein.

One such alternative configuration is shown in FIG. 7, in which the functions of the audio browsing adjunct are performed in the user interface device 700. In this embodiment, the functions of the audio processing node 152 and the functions of the audio interpreter node 154 are integrated into the user interface device 700. The user interface device 700 communicates with the document server 160 via a communication link 702. Link 702 is similar to link 164 described above. That is, link 702 may be a TCP/IP socket connection, the establishment of which is well known in the art.
Details of the user interface device 700 are shown in FIG. 8. The user interface device 700 includes a keypad/keyboard 802 and a microphone 804 for receiving user input, and a speaker 806 for providing audio output to the user. The user interface device 700 also includes a keypad/keyboard interface module 816 connected to a control/data bus 824. The user interface device 700 further includes a codec 810, a speech recognition module 818, a text-to-speech module 820, and an audio playback/recording module 822, each connected to an audio bus 808 and to the control/data bus 824 as shown. The codec 810 includes an analog-to-digital converter 812 and a digital-to-analog converter 814, both of which are controlled by the central processing unit 826 via the control/data bus 824. The analog-to-digital converter 812 converts analog audio user input from the microphone 804 into digital audio signals and supplies them to the audio bus 808. The digital-to-analog converter 814 converts digital audio signals from the audio bus 808 into analog audio signals that are sent to the speaker 806. The keypad/keyboard interface module 816 receives input from the keypad/keyboard 802 and supplies that input to the control/data bus 824. The speech recognition module 818, the text-to-speech module 820, and the audio playback/recording module 822 perform the same functions as, and are configured similarly to, the modules 214, 216, and 218, respectively, described above. The user interface device 700 also includes a packet network interface 834 for connecting, via link 702, to a packet network such as the Internet. In addition, the user interface device 700 includes a central processing unit 826 and a memory device 828, each connected to the control/data bus 824.
The overall functioning of the user interface device 700 is controlled by the central processing unit 826. The central processing unit 826 operates under the control of computer program instructions 830, which are stored in the memory device 828 and executed. The memory device 828 also contains data 832.

The user interface device 700 performs the functions of the audio processing node 152 and the audio interpreter node 154 described in connection with the embodiment of FIG. 1. These functions are performed by the central processing unit 826 executing the computer program instructions 830. That is, the computer program instructions 830 include program instructions that are the same as or similar to (1) the computer program instructions 232 that perform the functions of the audio processing node 152 and (2) the computer program instructions 312 that perform the functions of the audio interpreter node 154. The functions of the audio processing node 152 and the audio interpreter node 154 have been described in detail above and are not described further here. The central processing unit 826 can execute multiple processes concurrently, and in this way it performs the functions of both the audio processing node 152 and the audio interpreter node 154. This multiprocessing capability is depicted in FIG. 8, in which the central processing unit 826 is shown executing an audio translation/browsing process 836 and an audio processing process 838.

In operation, the user of the user interface device 700 requests a URL using the keypad/keyboard 802 or the microphone 804. If the keypad/keyboard 802 is used to request the URL, the keypad/keyboard interface module 816 supplies the requested URL to the central processing unit 826 via the control/data bus 824. If the microphone 804 is used to request the URL, the user's voice is received by the microphone 804, digitized by the analog-to-digital converter 812, and provided to the speech recognition module 818 via the audio bus 808.
The speech recognition module 818 then supplies the requested URL to the central processing unit 826 via the control/data bus 824.

Upon receiving the URL, the central processing unit 826 begins an audio browsing/translation session, illustrated as the audio translation/browsing process 836. The audio translation/browsing process 836 sends an HTTP request to the document server 160 through the packet network interface 834, in a manner similar to that described above in connection with the embodiment of FIG. 1. Upon receiving the document from the document server 160, the audio translation/browsing process 836 translates the document in accordance with the audio browsing techniques of the present invention. The audio resulting from the translation of the document is provided to the user through the speaker 806 under control of the audio processing process 838. Similarly, the user of the user interface device 700 can provide audio user input to the user interface device via the microphone 804.

Because the audio translation/browsing process 836 and the audio processing process 838 both reside in the user interface device 700, all communication between the two processes takes place by inter-process communication through the central processing unit 826, and all communication between processes 836 and 838 and the other elements of the user interface device 700 takes place via the control/data bus 824.

FIGS. 7 and 8 show the user interface device 700 in direct communication with the document server 160 in the packet network 162. Alternatively, the user interface device 700 may be configured to communicate with the document server 160 via a standard telephone connection. In such a configuration, the packet network interface 834 is replaced by a telephone interface circuit that is connected to the control/data bus 824 and controlled by the central processing unit 826.
The user interface device 700 then places a call to the document server via the telephone network. The document server 160 receives the call from the user interface device 700 using hardware similar to the telephone network interface module 210 (FIG. 2). Alternatively, the call may be received at a termination point in the telephone network that provides a packet network connection to the document server.

In another configuration, shown in FIG. 9, the functions of the audio browsing adjunct 150 (that is, the functions of the audio processing node 152 and the audio interpreter node 154) and the functions of the document server 160 are performed in an audio browsing document server 900. As depicted in FIG. 9, a call is routed from the telephone 110 through the LEC 120, the switch 130, and another LEC 902 to the audio browsing document server 900. In this embodiment, therefore, the audio browsing document server 900 can be reached from a conventional telephone 110 via the telephone network. In addition, the audio browsing document server 900 is connected to the Internet via link 904. Details of the audio browsing document server 900 are shown in FIG. 10. The audio browsing document server 900 includes a telephone network interface module 1010, a DTMF decoder/generator 1012, a speech recognition module 1014, a text-to-speech module 1016, and an audio playback/recording module 1018, each connected to an audio bus 1002 and a control/data bus 1004 as shown. These modules 1010, 1012, 1014, 1016, and 1018 perform the same functions as, and are configured similarly to, the modules 210, 212, 214, 216, and 218, respectively, described above. In addition, the audio browsing document server 900 includes a packet network interface 1044 for connecting, via link 904, to a packet network such as the Internet. The packet network interface 1044 is similar to the packet network interface 230 described above.
The audio browsing document server 900 also includes a central processing unit 1020 and a memory device 1030, both connected to the control/data bus 1004. The overall functioning of the audio browsing document server 900 is controlled by the central processing unit 1020. The central processing unit 1020 operates under the control of computer program instructions 1032, which are stored in the memory device 1030 and executed. The memory device 1030 also contains data 1034, HTML documents 1036, audio HTML documents 1038, audio files 1040, and graphics files 1042.

The audio browsing document server 900 performs the functions of the audio processing node 152, the audio interpreter node 154, and the document server 160 described in connection with the embodiment of FIG. 1. These functions are performed by the central processing unit 1020 executing the computer program instructions 1032. That is, the computer program instructions 1032 include program instructions that are the same as or similar to (1) the computer program instructions 232 that perform the functions of the audio processing node 152, (2) the computer program instructions 312 that perform the functions of the audio interpreter node 154, and (3) the computer program instructions 416 that perform the functions of the document server 160. The functions of the audio processing node 152, the audio interpreter node 154, and the document server 160 have been described in detail above and are not described further here. The central processing unit 1020 can execute multiple processes concurrently, and in this way it performs the functions of the audio processing node 152, the audio interpreter node 154, and the document server 160. This multiprocessing capability is depicted in FIG. 10, in which the central processing unit 1020 is shown executing an audio translation/browsing process 1022, a document serving process 1024, and an audio processing process 1026.
In operation, a call placed from the telephone 110 to a telephone number associated with information accessible via the audio browsing document server 900 is routed to the audio browsing document server 900 via the LEC 120, the switch 130, and the LEC 902. Multiple telephone numbers may be associated with the various information accessible via the audio browsing document server 900, with each such number routed to the audio browsing document server 900. The incoming call is detected via the telephone network interface module 1010. When the call is detected, the central processing unit 1020 performs a lookup to determine the URL associated with the dialed number (DN). The DN is supplied from the LEC 902 to the audio browsing document server 900 in a manner well known in the art. A list of DNs and their associated URLs is stored in the memory 1030 as data 1034. Upon obtaining the URL associated with the DN, the central processing unit 1020 begins an audio browsing/translation session, illustrated as the audio translation/browsing process 1022. The audio translation/browsing process 1022 sends an HTTP request to the document serving process 1024, which resides with it in the central processing unit 1020. The document serving process 1024 performs the document server functions described in connection with the document server 160 of the embodiment shown in FIG. 1. These document server functions are supported by the HTML documents 1036, audio HTML documents 1038, audio files 1040, and graphics files 1042 stored in the memory 1030. Accordingly, the central processing unit 1020 retrieves the document associated with the URL from the memory 1030. The audio translation/browsing process 1022 then translates the document in accordance with the audio browsing techniques of the present invention.
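The dialed-number lookup described above amounts to a table from DN to starting URL. A minimal sketch follows; the telephone number is invented for illustration, and normalizing the DN to digits before the lookup is an assumption rather than something the description states.

```python
from __future__ import annotations

# Hypothetical DN table standing in for data 1034; the number is invented.
DN_TO_URL = {
    "8005550100": "http://www.att.com/~phone/greeting",
}


def url_for_dialed_number(dn: str) -> str | None:
    """Return the starting URL for a dialed number, or None if the DN is unknown."""
    digits = "".join(ch for ch in dn if ch.isdigit())  # tolerate dashes and spaces
    return DN_TO_URL.get(digits)


start_url = url_for_dialed_number("800-555-0100")
```

Because several numbers can route to the same server, each entry in such a table selects the first document of the session, after which browsing proceeds as for any other document.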
The audio resulting from the translation of this document is provided to the user under control of the audio processing process 1026. Similarly, the user of the telephone 110 can provide audio user input to the audio browsing document server 900 in a manner similar to that described in connection with the embodiment of FIG. 1.

Because the audio translation/browsing process 1022, the document serving process 1024, and the audio processing process 1026 all reside in the audio browsing document server 900, all communication among processes 1022, 1024, and 1026 takes place by inter-process communication through the central processing unit 1020, and all communication between processes 1022, 1024, and 1026 and the other elements of the audio browsing document server 900 takes place via the control/data bus 1004. One benefit of this embodiment is efficiency: HTML documents and other data do not have to traverse a potentially unreliable wide area network in order to be processed (e.g., translated).

In the embodiment shown in FIG. 1, the audio processing node 152 and the audio interpreter node 154 were co-located. However, the functions of the audio processing node 152 and the audio interpreter node 154 may be geographically separated, as shown in the drawings. In such an embodiment, the audio processing node 152 is included in the communication network 102, and an audio interpreter document server 1100 is included in the packet network 162. The functions of the audio processing node 152 are similar to those described in connection with FIG. 1. Details of the audio interpreter document server 1100, which performs document server functions like those of the document server 160 together with the functions of the audio interpreter node 154, are shown in the drawings. The audio interpreter document server 1100 includes a packet network interface 1202 connected to link 153 and to a control/data bus 1204.
The audio interpreter document server 1100 also includes a central processing unit 1206 and a memory device 1212, both connected to the control/data bus 1204. The overall functioning of the audio interpreter document server 1100 is controlled by the central processing unit 1206. The central processing unit 1206 operates under the control of computer program instructions 1214, which are stored in the memory device 1212 and executed. The memory device 1212 also contains data 1216, HTML documents 1218, audio HTML documents 1220, audio files 1222, and graphics files 1224.

The audio interpreter document server 1100 performs the functions of the audio interpreter node 154 and the document server 160 described in connection with the embodiment of FIG. 1. These functions are performed by the central processing unit 1206 executing the computer program instructions 1214. That is, the computer program instructions 1214 include program instructions that are the same as or similar to (1) the computer program instructions 312 that perform the functions of the audio interpreter node 154 and (2) the computer program instructions 416 that perform the functions of the document server 160. The functions of the audio interpreter node 154 and the document server 160 have been described in detail above and are not described further here. The central processing unit 1206 can execute multiple processes concurrently, and in this way it performs the functions of the audio interpreter node 154 and the document server 160. This multiprocessing capability is depicted in the drawings, in which the central processing unit 1206 is shown executing an audio translation/browsing process 1208 and a document serving process 1210.

In operation, the audio processing node 152 communicates with the audio interpreter document server 1100 via link 153 in a manner similar to that described in connection with FIG. 1. However, unlike FIG. 1, in which the audio interpreter node 154 communicates with the document server via link 164, the audio translation/browsing process 1208 communicates with the document serving process 1210 by inter-process communication through the central processing unit 1206.

Thus, as described above, the audio browsing aspects of the present invention can be implemented in various forms, with the audio processing function, the audio translation/browsing function, and the document serving function integrated or separated depending on the particular configuration. Those skilled in the art will recognize that other configurations are possible.

As can be seen from the above description, the present invention may be used with standard HTML documents, which are intended for use with conventional graphical browsers, and with audio HTML documents, which are created specifically for use in audio browsing.

For the audio translation of standard HTML documents, many standard text-to-speech techniques may be used. The following section describes techniques that may be used to convert standard HTML documents into audio data. The techniques described here are merely illustrative, and those skilled in the art could readily implement various other techniques for converting HTML documents to audio signals.

Standard text passages are translated using well-known conventional text-to-speech techniques. Text is translated as it is encountered in the document, and translation continues until the user provides audio input (e.g., to answer a prompt or follow a link) or until a prompt is reached in the document. The end of a sentence is translated by adding a pause to the audio, and a paragraph mark <p> is translated by inserting a longer pause. Text styles may be translated as well.

An image instruction is an HTML specification indicating that a particular image is to be inserted into the document.
An example of an HTML image instruction is as follows:

    <IMG SRC="http://machine.att.com/image.gif" ALT="[image of car]">

This instruction indicates that the image file "image.gif" is to be retrieved from the machine defined by the URL and displayed by the client browser. Some conventional graphical browsers do not support image files, and HTML image instructions therefore sometimes contain alternative text that is displayed instead of the image. In the above example, the text "image of car" is included as the alternative to the image file. In accordance with the audio browsing techniques of the present invention, if an image instruction contains alternative text, the text is processed and converted to speech, and the speech signal is provided to the user. In this example, the speech signal "image of car" is provided to the user at telephone 110. If no alternative text is provided, a speech signal indicating that an image without alternative text was encountered (e.g., "A picture without an alternative description") is generated.

Conventional HTML includes instructions that support the entry of user input. For example, the following instructions:

    <SELECT NAME="selectvar">
    <OPTION> mary
    <OPTION SELECTED> joe
    </SELECT>

require the user to choose between two options, mary and joe, with joe as the default option. On a client running a conventional graphical browser, these options may be presented, for example, as a pull-down menu. In accordance with the audio browsing techniques of the present invention, the above instructions are translated into the following speech signal: "Please choose one of the following: mary (pause), joe, currently selected (pause), end of options. Press *r to repeat the options, or # to go on." If the user presses the pound key during the pause after an option, that option is selected. Whichever item is selected, when the user chooses to go on, the selection is returned to the document server in association with the variable selectvar.
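The two translations above can be sketched with Python's standard `html.parser` module. This is an illustrative sketch only: the function and class names are hypothetical, pauses are represented by a marker string rather than real silence, and stripping the square brackets from the ALT text is an assumption based on the example, in which "[image of car]" is spoken as "image of car".

```python
from __future__ import annotations

from html.parser import HTMLParser

PAUSE = "(pause)"  # marker standing in for a silent interval


def image_to_speech(alt: str | None) -> str:
    """Text to speak for an image: its ALT text, or a fallback if there is none."""
    text = (alt or "").strip().strip("[]").strip()
    return text or "A picture without an alternative description"


class _OptionCollector(HTMLParser):
    """Collects [label, selected] pairs from the OPTION tags of a SELECT."""

    def __init__(self) -> None:
        super().__init__()
        self.options: list[list] = []
        self._open = False

    def handle_starttag(self, tag, attrs):
        if tag == "option":  # HTMLParser lowercases tag and attribute names
            self.options.append(["", "selected" in dict(attrs)])
            self._open = True

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._open = False

    def handle_data(self, data):
        if self._open:
            self.options[-1][0] += data


def select_to_utterances(html: str) -> list[str]:
    """Translate a SELECT block into the sequence of utterances and pauses."""
    collector = _OptionCollector()
    collector.feed(html)
    collector.close()
    out = ["Please choose one of the following:"]
    for label, selected in collector.options:
        label = label.strip()
        out.append(f"{label}, currently selected" if selected else label)
        out.append(PAUSE)
    out.append("End of options. Press *r to repeat the options, or # to go on.")
    return out


utterances = select_to_utterances(
    '<SELECT NAME="selectvar"><OPTION> mary <OPTION SELECTED> joe </SELECT>'
)
```

Keeping each pause as its own list entry makes it straightforward for a caller to associate a key press received during a given pause with the option spoken just before it.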
Instead of selecting with DTMF signals, the user may make the selection with a voice signal.

Another conventional HTML instruction for entering user input is the checkbox instruction. For example, a series of instructions such as:

    <INPUT TYPE="checkbox" NAME="varname" VALUE="red" CHECKED>
    <INPUT TYPE="checkbox" NAME="varname" VALUE="blue">
    <INPUT TYPE="checkbox" NAME="varname" VALUE="green">

is displayed as follows in a conventional graphic browser:

    Red □ Blue □ Green □

By default, the red box is checked. The user can change this default by checking the blue or green box. According to the audio browsing technique of the present invention, the above series of instructions is translated into the following speech signal, which is provided to the user:

"The following selections can be changed by pressing # during the pause. Red (pause), blue (pause), green (pause). Press *r to repeat this list, # to continue."

By pressing # during a pause to generate a DTMF signal, the user can change the item preceding that pause. Pressing the # key a second time takes the user out of the input sequence. To hear the list of options again, the user presses *r. Instead of DTMF signal input, the user may select checkbox options using voice signal input.

A conventional HTML document can request text input from the user with the following TEXTAREA instruction:

    <TEXTAREA COLS=60 ROWS=4 NAME="textvar"> Insert text here </TEXTAREA>

In a conventional graphic browser, this displays the text "Insert text here" followed by a text box of 60 columns and 4 rows provided to the user for text input. According to the audio browsing technique of the present invention, the above instruction is translated as follows. The COLS and ROWS parameters are ignored, and the audio "Insert text here" is provided to the user. The user can then enter DTMF tones followed by the # signal.
These DTMF signals are processed and the result is provided to the document server in association with the variable "textvar". Alternatively, the user can supply the text by speaking a response into the microphone of telephone 110; the speech is converted to data by the voice recognition module 214, and the data is provided to the document server 160 in association with the variable "textvar".

As can be seen from the above description, various techniques can be used so that conventional HTML documents can be browsed by the audio browsing technique of the present invention.

To exploit the benefits of audio browsing according to the present invention more fully, additional document instructions may be used in addition to the conventional HTML instructions. These instructions, called audio HTML instructions, may be introduced into conventional HTML documents. The audio HTML instructions are described below.

A voice source instruction,

    <VOICE SRC="//www.abc.com/audio.file">

causes the specified file to be played back to the user. Such an instruction is described in detail in connection with line 512 of the example document 500 illustrated in the drawings.

A collect name instruction,

    <COLLECT NAME="collectvar">

specifies the start of a prompt-and-collect sequence. Such a collect name instruction is followed by a prompt instruction and a set of choice instructions. When the user makes a choice, as indicated by audio user input, the result of the choice is provided to the document server in association with the variable collectvar. The collect name instruction, together with the associated prompt-and-collect sequence, is described in detail in connection with lines 514-524 of the example document 500 illustrated in the drawings.

A DTMF input instruction,

    <INPUT TYPE="DTMF" MAXLENGTH="5" NAME=varname>

indicates that audio user input in the form of DTMF signals is expected from the user.
This instruction causes the audio browsing assist unit 150 to pause and wait for DTMF input from the user. The user enters a DTMF sequence by pressing keys on the keypad of telephone 110 and indicates the end of the sequence by pressing the # key. The DTMF input is processed as described above for the example HTML document 500. The decoded DTMF signal is then provided to the document server in association with the variable varname. The MAXLENGTH parameter indicates the maximum length (in DTMF inputs) allowed for the input. If the user enters more than the maximum number of DTMF keys (5 in this example), the system ignores the excess input.

Similarly, the SPEECH input instruction,

    <INPUT TYPE="SPEECH" MAXLENGTH="5" NAME=varname>

indicates that audio user input in the form of voice signals is expected from the user. This instruction causes the audio browsing assist unit 150 to pause and wait for voice input from the user. The user inputs a voice signal by speaking into the microphone of telephone 110. The voice input is processed as described above for the example HTML document 500. The voice signal is then provided to the document server in association with the variable varname. The MAXLENGTH parameter indicates that the maximum length of the voice input is 5 seconds.

The audio HTML instructions described herein are examples of the types of audio HTML instructions that can be implemented to take advantage of the audio browsing technique of the present invention. Those skilled in the art could readily implement other audio HTML instructions.

In addition to the audio HTML instructions described above, the audio browsing assist unit 150 supports various navigation instructions. In a conventional graphic browser, users navigate through a document with the usual techniques. These include text sliders for scrolling the document, cursor movement, and instructions such as page up, page down, home, and end.
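The handling of the DTMF input instruction described above (the # terminator and the MAXLENGTH limit) can be sketched as follows. This is only an illustration under stated assumptions: the function names collect_dtmf and to_form_field are hypothetical, the key presses are assumed to be already decoded into characters, and the dictionary is merely one way to show the value being associated with the variable name.

```python
def collect_dtmf(keys, maxlength):
    """Return the keys entered before the terminating '#', capped at maxlength.

    `keys` is an iterable of already-decoded key presses such as "1", "*", "#".
    """
    digits = []
    for key in keys:
        if key == "#":  # '#' marks the end of the DTMF sequence
            break
        if len(digits) < maxlength:
            digits.append(key)  # presses beyond maxlength are ignored
    return "".join(digits)


def to_form_field(name, keys, maxlength=5):
    """Associate the collected input with a variable name (hypothetical format)."""
    return {name: collect_dtmf(keys, maxlength)}
```

With MAXLENGTH="5", the key sequence 1 2 3 4 5 6 7 # yields the value 12345 for varname; the sixth and seventh presses are dropped.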
According to the audio browsing technique of the present invention, users may navigate through a document using audio user input, in either DTMF tone or voice form, as follows:

The above detailed description is to be understood as illustrative and exemplary in all respects, and not restrictive. The scope of the invention disclosed herein is defined not by the detailed description but by the claims, interpreted to the full breadth permitted by patent law. The embodiments shown and described herein merely illustrate the principles of the present invention, and those skilled in the art may make various modifications without departing from the scope and spirit of the invention. For example, although certain communication channels have been described here as packet-switched communication channels, they could also be implemented as circuit-switched communication channels.

(Note 2) The telephone numbers used herein are for illustration only. The use of any particular telephone number has no significance other than as an example for the present invention and does not denote an actual telephone number.

(Note 3) Italic type is used herein to indicate text that is reproduced as speech.

Continuation of front page

(51) Int. Cl.⁶ / Identification code / FI: H04M 11/08; H04L 11/20 102Z
(72) Inventor: James Christopher Ramming, 350 Sharon Park Drive, Apartment N-103, Menlo Park, California, USA
(72) Inventor: Kenneth G. Rehor, 7108 West 35th Street, Berwyn, Illinois, USA
(72) Inventor: Curtis Duane Tuckey, 3546 North Reta, Chicago, Illinois, USA

Claims

1. A method for providing audio access to information stored at a server, comprising the steps of:
establishing an audio channel between an audio interface device and a communication network node;
establishing a document serving protocol channel between the communication network node and the server;
receiving a document at the communication network node from the server via the document serving protocol channel;
translating the received document into audio data at the communication network node; and
transmitting the audio data from the communication network node to the audio interface device via the audio channel.

2. The method of claim 1, wherein the audio interface device is a telephone and the step of establishing the audio channel comprises:
receiving a telephone call placed to a telephone number associated with the server; and
routing the telephone call to the communication network node.

3. The method of claim 1, wherein the server is a WWW document server and the document serving protocol is the hypertext transfer protocol.

4. The method of claim 1, wherein the document includes HTML instructions.

5. The method of claim 4, wherein the document further includes audio HTML instructions.

6. The method, further comprising the steps of:
receiving audio user input at the communication network node from the audio interface device via the audio channel;
translating the audio user input at the communication network node into user data suitable for transmission via the document serving protocol; and
transmitting the user data to the server via the document serving protocol channel.

7. The method of claim 6, wherein the audio user input is a DTMF tone.

8. The method of claim 6, wherein the audio user input is a voice signal.

9. A system for accessing information stored at a server, comprising:
a communication network node for receiving a telephone call placed from an audio interface device to a telephone number associated with the server, such that an audio channel is established between the audio interface device and the communication network node;
a database accessible by the communication network node for associating the telephone number with the server;
means associated with the communication network node for establishing a document serving protocol channel between the communication network node and the server;
an interpreter associated with the communication network node for translating a document received from the server via the document serving protocol channel into audio data; and
means associated with the communication network node for transmitting the audio data to the audio interface device via the audio channel.

10. The system of claim 9, wherein the audio interface device is a telephone.

11. The system of claim 9, wherein the interpreter is adapted to translate audio user input received from the audio interface device via the audio channel into user data suitable for transmission via the document serving protocol, the system further comprising means for transmitting the user data to the server via the document serving protocol channel.

12. The system of claim 11, wherein the audio user input is a DTMF tone.

13. The system of claim 11, wherein the audio user input is a voice signal.

14. The system of claim 9, wherein the server is a WWW document server and the document serving protocol is the hypertext transfer protocol.

15. The system of claim 9, wherein the document includes HTML instructions.

16. The system of claim 15, wherein the document further includes audio HTML instructions.

17. The system of claim 9, wherein the database contains data for associating telephone numbers with URLs.

18. A method for providing audio access to information stored at a server that supplies documents according to a document serving protocol, comprising the steps of:
establishing a communication channel between an audio interface device and the server;
translating a document supplied by the server into audio data; and
providing the audio data to the audio interface device.

19. The method of claim 18, wherein the translating step is performed at the server.

20. The method of claim 19, wherein the document serving protocol is the hypertext transfer protocol.

21. The method of claim 18, wherein the translating step is performed at the audio user interface.

22. The method of claim 18, wherein the translating step is performed at an intermediate node in the communication channel, disposed between the server and the audio user interface.

23. The method of claim 18, wherein the document serving protocol is the hypertext transfer protocol.

24. The method of claim 18, further comprising the steps of:
translating audio user input received from the audio interface device into instructions compatible with the document serving protocol; and
providing the instructions to the server.

25. A system for translating information between a server and an audio interface device that are connected via a communication channel, the server operating according to a document serving protocol, the system comprising:
means for receiving a document provided by the server via the document serving protocol;
an interpreter for translating the received document into audio data; and
means for providing the audio data to the audio interface device.

26. The system of claim 25, wherein the audio interface device is a telephone and the system further comprises means for establishing the communication channel, the means for establishing comprising:
means for receiving a telephone call from the telephone to a telephone number associated with the server; and
a database for associating the telephone number with the server.

27. The system of claim 25, wherein the interpreter is located at a node in the communication channel between the audio interface device and the server.

28. The system of claim 25, wherein the interpreter is located in the document server.

29. The system of claim 25, wherein the interpreter is located in the audio interface device.

30. The system of claim 25, wherein the interpreter also translates audio user input received from the audio interface device into instructions compatible with the document serving protocol, the system further comprising means for providing the instructions to the document server.

31. A document server for providing audio access to stored documents, comprising:
an interface for connecting to a communication link that provides communication with an audio interface device;
a machine-readable storage device storing computer program instructions and the documents; and
a central processing unit, connected to the storage device and the interface, for executing the computer program instructions,
wherein the computer program instructions cause the central processing unit to perform the steps of:
in response to receiving a request for a document, retrieving the requested document from the machine-readable storage device according to a document serving protocol;
translating the requested document into audio data; and
transmitting the audio data to the audio interface device via the interface.

32. The document server of claim 31, wherein the document serving protocol is the hypertext transfer protocol.

33. The document server of claim 31, wherein the communication link is a telephone network connection and the document server further comprises a telephone network interface.

34. The document server of claim 31, wherein the communication link is a packet network connection and the document server further comprises a packet network interface.

35. The document server of claim 31, wherein the central processing unit further performs the steps of:
in response to audio user input received from the audio interface device via the communication link, translating the audio user input into user data; and
in response to the user data, retrieving a document from the machine-readable storage device according to the document serving protocol.