JP4157418B2

JP4157418B2 - Data browsing support device, data browsing method, and data browsing program

Info

Publication number: JP4157418B2
Application number: JP2003127483A
Authority: JP
Inventors: 淳後藤; 則好浦谷; 淵培金
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2003-05-02
Filing date: 2003-05-02
Publication date: 2008-10-01
Anticipated expiration: 2023-05-02
Also published as: JP2004334409A

Description

【０００１】
【発明の属する技術分野】
本発明は、マークアップ言語で記述されたデータを閲覧する際に、利用者の音声によって閲覧の操作を支援するデータ閲覧支援装置、データ閲覧方法及びデータ閲覧プログラムに関する。
【０００２】
【従来の技術】
従来、ネットワークを介して提供される、インターネットのサービスの１つであるＷＷＷ（World Wide Web）では、Ｗｅｂサーバ上で公開される情報（Ｗｅｂページ）は、ＨＴＭＬ（Hyper Text Markup Language）と呼ばれるマークアップ言語（ページ記述言語）で記述されている。このＨＴＭＬデータと他のＨＴＭＬデータとをリンクするには、リンク先のＵＲＬ（Uniform Resource Locator）を画面上に表示される文字や図形に関連付けて、データ中に記述することにより行っている。一方、利用者は、パーソナルコンピュータ等で、ＨＴＭＬを解析して表示を行うＷｅｂブラウザを動作させることで、ＨＴＭＬデータの内容を閲覧することができる。また、このとき利用者は、マウス等の入力装置でリンク先を選択することで、ＵＲＬで記述されている他のリンク先の情報を閲覧することができる。
【０００３】
また、デジタル放送のサービスの１つであるデータ放送では、放送されるデータは、社団法人電波産業会（ＡＲＩＢ：Association of Radio Industries and Businesses）によって策定された、ＢＭＬ（Broadcast Markup Language）と呼ばれるマークアップ言語（ページ記述言語）で記述されている。利用者は、ＢＭＬブラウザを搭載したデジタル放送テレビ受像機によって、データ放送を視聴することができる。このＢＭＬは、ＨＴＭＬと同様に、リンク先をＵＲＬで記述しており、利用者はリモコンの矢印キー等によってリンク先を選択することで、表示画面の遷移を行っている。
【０００４】
しかし、このようなマウスやリモコンは、子供や高齢者にとっては、操作し難いものである。そこで、最近では、マウス等の入力装置を用いずに、利用者がリンク先を示す文字列を音声として発声し、音声認識を行うことで、利用者が希望するリンク先の情報を閲覧（視聴）する、音声認識によるネットサーフィンの技術が開示されている（例えば、特許文献１参照。）。
【０００５】
【特許文献１】
特開２００１−２７３２１６号公報（第４−６頁、第１−３図）
【０００６】
【発明が解決しようとする課題】
しかし、前記従来の技術では、利用者が、リンク先を示す画面上に表示された文字列を正確に発声しなければならず、音声認識において誤って認識してしまうという問題があった。また、この文字列は、Ｗｅｂページやデータ放送の制作者によって決定される任意の文字列であるため、予め音響・言語モデルに登録された文字（文字列）でない場合は認識することができないという問題がある。
さらに、前記従来の技術では、リンク先が画面上に表示されたバナー（Ｂａｎｎｅｒ）等の図形や領域に設定されている場合、利用者は、リンク先を音声で指定することができないという問題があった。
【０００７】
本発明は、以上のような問題点に鑑みてなされたものであり、Ｗｅｂページやデータ放送を閲覧（視聴）する際に、利用者の音声によって、閲覧の操作を支援し、音声のみでリンク先への画面遷移を行うことを可能にしたデータ閲覧支援装置、データ閲覧方法及びデータ閲覧プログラムを提供することを目的とする。
【０００８】
【課題を解決するための手段】
本発明は、前記目的を達成するために創案されたものであり、まず、請求項１に記載のデータ閲覧支援装置は、マークアップ言語で記述されたデータの内容を閲覧する際に、利用者の音声によって閲覧の操作を支援するデータ閲覧支援装置であって、リンク情報検索手段と、マーカ文字付加手段と、リンク情報記憶手段と、表示データ生成手段と、音声認識手段と、リンク先切り替え手段とを備える構成とした。
【０００９】
かかる構成によれば、データ閲覧支援装置は、リンク情報検索手段によって、ＨＴＭＬ、ＢＭＬ等のマークアップ言語で記述されたデータの中から、リンク先を示すリンク情報を検索する。このリンク情報は、リンク先のアドレスを示す情報であって、例えば、ＨＴＭＬ等では、「Ａタグ」の「ｈｒｅｆ属性」によって定義される情報である。そして、データ閲覧支援装置は、マーカ文字付加手段によって、リンク情報に、同一画面のデータ内で識別可能であると共に予め音声認識用の辞書に登録してある語彙を用いたマーカ文字を付加して、データからマーカ文字付データを生成し、マーカ文字とリンク情報とを関連付けてリンク情報記憶手段に記憶する。これによって、マーカ文字によって、リンク先（アドレス）を特定することが可能になる。
【００１０】
ここで、マーカ文字とは、同一画面のデータ内で識別可能な文字（列）であれば何でもよく、「１」、「２」、「３」…、「ａ」、「ｂ」、「ｃ」…等の文字を用いることができる。
【００１１】
そして、データ閲覧支援装置は、表示データ生成手段によって、マーカ文字付データを解析して、閲覧可能な表示データを生成して、画面上に表示する。これによって、データ閲覧支援装置は、リンク先を示す画像や文字列にマーカ文字を付加して表示することができるので、利用者は、マーカ文字が表示されている箇所にリンク先が含まれていることを認識することができる。
【００１２】
さらに、データ閲覧支援装置は、音声認識手段によって、利用者の音声を認識する。すなわち、利用者のデータ閲覧支援装置に対する操作指示を音声により認識する。そして、データ閲覧支援装置は、リンク先切り替え手段によって、音声認識手段で認識された認識結果が、マーカ文字に対応する（マーカ文字と認識される）場合に、そのマーカ文字に基づいて、リンク情報記憶手段に記憶されているリンク情報で示されたリンク先から、データを取得する。これによって、利用者はマーカ文字を発声することで、リンク先を移動してデータを閲覧することが可能になる。
【００１４】
また、データ閲覧支援装置は、音声認識用の辞書に登録してある語彙（マーカ文字）を利用者が発声することになり、音声認識手段による音声認識の誤認識を低減させることができる。
【００１５】
さらに、請求項２に記載のデータ閲覧支援装置は、請求項１に記載のデータ閲覧支援装置において、前記音声認識手段の認識結果である音声認識文字を、表示画面上に合成して表示させる文字合成手段を備える構成とした。
【００１６】
かかる構成によれば、データ閲覧支援装置は、音声認識手段で認識した認識結果である音声認識文字を、文字合成手段によって、表示画面上に合成して表示する。これによって、利用者は、自分が発声した音声が、データ閲覧支援装置でどのように認識されているかを確認することができる。
【００１７】
また、請求項３に記載のデータ閲覧支援装置は、請求項１又は請求項２に記載のデータ閲覧支援装置において、前記利用者からの操作指令を予め特定文字列の組み合わせで定型化しておき、前記音声認識手段の認識結果の中に含まれる前記特定文字列に基づいて、前記操作指令を解析する操作指令解析手段を備える構成とした。
【００１８】
かかる構成によれば、データ閲覧支援装置は、操作指令を予め特定文字列の組み合わせで定型化しておくため、操作指令解析手段によって、音声認識手段の認識結果の中に含まれる特定文字列が、定型化されたパターンに含まれるかどうかで、操作指令を特定することができる。この特定文字列は、例えばテレビ視聴やインターネットのＷｅｂページの閲覧等で、高い確率で発生する文字列であって、「チャンネル」、「番組名」、「放送局名」、「ジャンル」等及び動作を示す動詞を組み合わせることで、操作指令を特定（推定）することができる。
【００１９】
さらに、請求項４に記載のデータ閲覧支援装置は、請求項１乃至請求項３のいずれか１項に記載のデータ閲覧支援装置において、前記利用者からの操作指令に対する応答文を、音声として出力する音声合成手段を備える構成とした。
【００２０】
かかる構成によれば、データ閲覧支援装置は、音声合成手段によって、利用者からの操作指令に対する応答文を、音声合成し音声として出力する。これによって、利用者の操作指令を受け付けた、操作指令に間違いがあった等の通知を音声によって利用者に通知する。あるいは、操作結果、例えば、検索を行う操作に対する検索結果の件数を音声合成によって通知することとしてもよい。
【００２１】
また、請求項５に記載のデータ閲覧支援装置は、請求項３又は請求項４に記載のデータ閲覧支援装置において、放送データ受信手段と、通信データ受信手段と、受信切り替え手段とを備える構成とした。
【００２２】
かかる構成によれば、データ閲覧支援装置は、放送データ受信手段によって、放送波を介して放送データを受信し、通信データ受信手段によって、通信回線を介して通信データを受信する。なお、データ閲覧支援装置は、利用者の指示により、受信切り替え手段が、放送データ受信手段による受信と、通信データ受信手段による受信とを切り替える。これによって、データ閲覧支援装置は、テレビ視聴時の任意のタイミングで、Ｗｅｂページの閲覧を行ったり、Ｗｅｂページの閲覧中にテレビの視聴に切り替える等の切り替え操作を行うことができる。
【００２３】
さらに、請求項６に記載のデータ閲覧方法は、マークアップ言語で記述されたデータの内容を閲覧する際に、リンク先への移動の操作を、利用者の音声によって行うデータ閲覧方法であって、前記データに埋め込まれている前記リンク先を示すリンク情報に、同一画面内で識別可能であると共に予め音声認識用の辞書に登録してある語彙を用いたマーカ文字を付加して表示するステップと、表示された前記マーカ文字を利用者が音声として発声するステップと、前記利用者が発声した前記マーカ文字を認識して、そのマーカ文字に対応する前記リンク先からデータを取得して提示するステップと、を含んでいることを特徴とする。
【００２４】
この方法によれば、ＨＴＭＬ、ＢＭＬ等のマークアップ言語で記述されたデータに埋め込まれているリンク先のアドレスを示すリンク情報に、同一画面のデータ内で識別可能であると共に予め音声認識用の辞書に登録してある語彙を用いたマーカ文字を付加するため、画面上でリンク先を示す画像や文字列にマーカ文字を付加して表示することができる。そして、このデータ閲覧方法は、利用者がマーカ文字を音声として発声することで、マーカ文字に対応するリンク先からデータを取得し、リンク先を移動（遷移）することが可能になる。
【００２５】
また、請求項７に記載のデータ閲覧プログラムは、マークアップ言語で記述されたデータの内容を閲覧する際に、前記データに埋め込まれたリンク先への移動の操作を、利用者の音声によって行うために、コンピュータを、リンク情報検索手段、マーカ文字付加手段、リンク情報記憶制御手段、表示データ生成手段、音声認識手段、リンク先切り替え手段として機能させることとした。
【００２６】
かかる構成によれば、データ閲覧プログラムは、リンク情報検索手段によって、ＨＴＭＬ、ＢＭＬ等のマークアップ言語で記述されたデータの中から、リンク先を示すリンク情報を検索する。そして、データ閲覧プログラムは、マーカ文字付加手段によって、リンク情報に、同一画面のデータ内で識別可能であると共に予め音声認識用の辞書に登録してある語彙を用いたマーカ文字を付加して、データからマーカ文字付データを生成し、リンク情報記憶制御手段がマーカ文字とリンク情報とを関連付けてリンク情報記憶手段に記憶する。
【００２７】
そして、データ閲覧プログラムは、表示データ生成手段によって、マーカ文字付データを解析して、閲覧可能な表示データを生成して、画面上に表示する。さらに、データ閲覧プログラムは、音声認識手段によって、利用者の音声を認識し、その認識結果が、マーカ文字に対応する（マーカ文字と認識される）場合に、リンク先切り替え手段が、リンク情報記憶手段を参照して、マーカ文字に関連付けられているリンク情報で示されたリンク先からデータを取得する。
【００２８】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照して説明する。ここでは、本発明をデジタル放送テレビ受像機に適用し、テレビ操作支援装置として構成している。
【００２９】
［テレビ操作支援装置（データ閲覧支援装置）の構成］
図１は、本発明におけるテレビ操作支援装置（データ閲覧支援装置）の構成を示したブロック図である。テレビ操作支援装置１は、デジタル放送の視聴と、インターネットによるＷｅｂページの閲覧を行うもので、さらに、利用者の音声によって、視聴（閲覧）のための操作を支援するものである。ここでは、テレビ操作支援装置１は、受信手段１０と、リンク情報検索手段１１と、マーカ文字付加手段１２と、表示データ生成手段１３と、表示制御手段１４と、音声認識手段１５と、対話処理手段１６と、音声合成手段１７と、記憶手段１８とを備えている。また、ここでは、テレビ操作支援装置１は、映像、データ等を表示するための表示装置２と、音声を入力する入力手段であるマイク３と、音声を出力する音声出力手段であるスピーカ４とを外部に接続している。
【００３０】
受信手段１０は、放送波５によるデジタル放送の受信と、通信回線６によるデータ受信とを行うものである。ここでは、受信手段１０は、放送データ受信部１０ａと、通信データ受信部１０ｂと、受信切り替え部１０ｃとを備えている。
【００３１】
放送データ受信部（放送データ受信手段）１０ａは、放送波５を介して、デジタル放送で放送される放送データを受信するものである。この放送データ受信部１０ａでは、放送データを受信、復調し、誤り訂正やＴＭＣＣ（Transmission and Multiplexing Configuration Control）復号等の復号を行い、ＭＰＥＧ２のトランスポートストリーム（Transport Stream）として出力する。なお、映像・音声（映像ストリーム及び音声ストリーム）は、表示制御手段１４へ出力される。また、データ（ＢＭＬ：データストリーム）は、カルーセル伝送を復号したデータファイルとして、リンク情報検索手段１１へ出力される。
【００３２】
なお、放送データ受信部１０ａは、ＡＲＩＢＳＴＤ−Ｂ１０で規定されているＳＩ（Service Information；番組配列情報）をカルーセル伝送で受信し解析することで、チャンネル、番組名、放送局名、ジャンル、出演者名等の番組情報を取得し、個々の番組情報を１つの形態素として、音響・言語モデル１８ａに記憶する。これは、一般に番組名等は複数の形態素の組み合わせで表現されるが、ここでは、後記する対話処理手段１６の操作指令解析部１６ａにおいて、番組名等を１つの形態素として認識させるためである。これによって、番組名等の音声認識率を高めることができる。
【００３３】
通信データ受信部（通信データ受信手段）１０ｂは、通信回線６を介して、データの通信を行うものである。例えば、インターネットのＷｅｂサイトに対して、Ｗｅｂページのデータ要求を送信し、要求したＷｅｂサイトからデータを受信する。なお、この通信データ受信部１０ｂは、ＴＣＰ／ＩＰの通信プロトコルによってデータ（通信データ）の受信を行う。ここで受信したＷｅｂページのデータ（ＨＴＭＬ）は、リンク情報検索手段１１へ出力される。
【００３４】
受信切り替え部（受信切り替え手段）１０ｃは、後記する対話処理手段１６から通知される「受信切り替え指示」に基づいて、放送データ受信部１０ａと通信データ受信部１０ｂとの受信の切り替えを行うものである。例えば、デジタル放送を視聴中に、Ｗｅｂページの閲覧を行うときは、放送データ受信部１０ａにおける放送データの受信を停止させ、「受信切り替え指示」とともに通知されるリンク先（アドレス）に基づいて、通信データ受信部１０ｂに対して通信データ（Ｗｅｂページ）の取得を行う旨を通知する。
【００３５】
リンク情報検索手段１１は、受信手段１０で受信したデータ（マークアップ言語）を解析して、そのデータの中に含まれるリンク先を示すリンク情報を検索するものである。例えば、ＨＴＭＬやＢＭＬにおいて、リンク先の属性を示す「Ａタグ」の「ｈｒｅｆ属性」を検索する。これによって、リンク情報検索手段１１は、「Ａタグ」の「ｈｒｅｆ属性」を含むタグ（「＜」、「＞」）内に、リンク情報が記述されていることを検出することができる。
【００３６】
なお、リンク情報検索手段１１は、リンク情報を含んだタグを検出するまでは、逐次入力されたデータ（マークアップ言語）をそのままマーカ文字付加手段１２へ通知する。また、リンク情報検索手段１１は、リンク情報を含んだタグを検出した場合は、リンク先を検出した旨を示す「検出通知」をマーカ文字付加手段１２へ通知し、その後にリンク情報を含んだタグをマーカ文字付加手段１２へ通知する。
【００３７】
マーカ文字付加手段１２は、リンク情報検索手段１１から通知されるデータ（マークアップ言語）に、「検出通知」を通知される度に、マーカ文字を付加してマーカ文字付データを生成するものである。なお、このマーカ文字は、画面の背景色とは異なる色に設定するものとする。このマーカ文字を付加されたデータ（マーカ文字付データ）は、表示データ生成手段１３へ通知される。
【００３８】
また、マーカ文字付加手段１２は、「検出通知」を通知された後のリンク情報と、付加したマーカ文字とを関連付け、記憶手段１８にリンク先データ１８ｃとして記憶しておく。
さらに、マーカ文字付加手段１２は、「戻る」、「進む」等の指示によって画面の切り替えを可能にするため、マーカ文字を付加して生成されたマーカ文字付データを、記憶手段１８に履歴データ１８ｂとして記憶しておく。
【００３９】
なお、マーカ文字は、同一画面のデータ内で、識別可能な文字であって、後記する音声認識手段１５で参照する音響・言語モデル１８ａに登録してある語彙を用いる。例えば、音響・言語モデル１８ａに予め「ｉｃｈｉ」＝「１」、「ｎｉ」＝「２」、「ｓａｎ」＝「３」、…等で語彙が登録されている場合、「１」、「２」、「３」、…をマーカ文字として使用する。このマーカ文字は、同一画面のデータ内で識別可能で、音響・言語モデル１８ａに登録されている語彙であれば何でもよい。例えば、「ａ」、「ｂ」、「ｃ」、…や、「い」、「ろ」、「は」、…であってもよい。
【００４０】
表示データ生成手段１３は、マーカ文字付加手段１２から通知されるマーカ文字付データを解析して、表示可能な出力形式に変換して出力するものである。これは、通知されたデータが、ＨＴＭＬデータの場合はＷｅｂブラウザ、ＢＭＬデータの場合はＢＭＬブラウザとして機能するものである。なお、ここで、表示可能な出力形式に変換された表示データは、表示制御手段１４を介して表示装置２の画面上に表示される。
【００４１】
なお、表示データ生成手段１３は、対話処理手段１６から「戻る」、「進む」等の履歴移動指示を通知されることで、履歴データ１８ｂとして記憶されている過去に表示したデータ（マーカ文字付データ）を参照して、画面の遷移を行う。
【００４２】
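An example of adding marker characters to data described in a markup language is now described in detail with reference to FIGS. 2 and 3 (and FIG. 1 as appropriate). FIG. 2 shows an example of screen data described in HTML. FIG. 2(a) is the original data before marker characters are added, and FIG. 2(b) is the data with marker characters obtained by adding marker characters to the data of FIG. 2(a). FIG. 3 shows example screens produced by converting HTML data into a displayable output format. FIG. 3(a) is an example screen displaying the data of FIG. 2(a), and FIG. 3(b) is an example screen displaying the data with marker characters of FIG. 2(b).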
【００４３】
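As shown in FIG. 2(a), HTML defines a link destination (lnk), for example "projectx/projectx.html", with the "href attribute" (att) of the "A tag". The link information search means 11 can therefore search for link information indicating a link destination by using the "href attribute" (att) as a key.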
【００４４】
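The marker character adding means 12 then adds a marker character mk, such as the "1", "2", and "3" shown in FIG. 2(b), in front of each tag ("<", ">") containing the "href attribute" (att) found by the link information search means 11.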
【００４５】
In FIG. 2(b), the "FONT tag" specifies the color of the marker character mk and the "b tag" specifies its weight. Here, the marker character "1" is added by writing <FONT color="#ffff99"><b>1</b></font>.
【００４６】
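In FIG. 2(b), where the "bgcolor attribute" of the "body tag" sets the background color to "#000066" (bc), the color of the marker character mk is set to "#ffff99" (fc), a color different from the background. The marker color fc is obtained by subtracting the value of the background color bc from "ffffff", which always yields a color different from the background color. In FIG. 2(b) the "b tag" also sets the marker character mk in bold. Defining the color and weight of the marker character mk in this way improves its visibility.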
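The subtraction rule for the marker color can be written out as a small sketch. The function name `marker_color` and the choice of Python are illustrative assumptions and are not part of the specification; only the rule itself (subtract the background value bc from ffffff) comes from the description.

```python
def marker_color(background: str) -> str:
    """Return the marker color fc for a background color bc given as '#rrggbb'.

    Follows the rule in the description: fc = ffffff - bc.
    The function name and the string handling are illustrative assumptions.
    """
    bc = int(background.lstrip("#"), 16)   # parse the 24-bit RGB value
    fc = 0xFFFFFF - bc                     # subtract the background from ffffff
    return f"#{fc:06x}"                    # format back as six hex digits


print(marker_color("#000066"))  # background color of FIG. 2(b)
```

For the background "#000066" of FIG. 2(b) this gives "#ffff99", the marker color used there. One caveat: for a mid-gray background such as "#808080" the rule gives "#7f7f7f", which differs from the background only slightly, so a practical implementation might add a separate contrast check.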
【００４７】
図３（ａ）は、図２（ａ）で記述されたデータを表示した画面例であって、画面左側にリンク先が設定された画像Ｂ１〜Ｂ５や、画面中央にリンク先が設定された文字列Ｃ１及びＣ２を表示している。
【００４８】
図３（ｂ）は、図２（ｂ）で記述されたマーカ文字付データを表示した画面例であって、図３（ａ）でリンク先が設定されている画像Ｂ１〜Ｂ５、文字列Ｃ１及びＣ２にマーカ文字（Ｎ１〜Ｎ６及びＮ６ａ）を付加して表示している。すなわち、画像Ｂ１に対しては、マーカ文字Ｎ１として「１」、画像Ｂ２に対しては、マーカ文字Ｎ２として「２」のように順番にマーカ文字を付加している。
【００４９】
なお、ここでは、図３（ａ）の文字列Ｃ１及びＣ２のリンク先は同じであるものとして、図３（ｂ）では、同じマーカ文字Ｎ６及びＮ６ａとして「６」を付加している。このように、同じリンク先には同じマーカ文字を付加してもよいし、順番にマーカ文字を付加することとしてもよい。
【００５０】
図２（ｂ）及び図３（ｂ）に示したように、リンク先が設定されている画像、文字等にマーカ文字を付加することで、利用者は、マウス等の入力手段がなくても、音声によってリンク先を指定することができる。
図１に戻って説明を続ける。
【００５１】
表示制御手段１４は、表示データ生成手段１３から出力された表示データや、受信手段１０の放送データ受信部１０ａから通知される映像・音声を外部に出力するものである。この表示制御手段１４は、表示データ及び映像は表示装置２へ出力し、音声は音声出力手段であるスピーカ４へ出力する。
なお、この表示制御手段１４は、表示画面上に文字を合成して表示する文字合成部１４ａを備えている。
【００５２】
文字合成部（文字合成手段）１４ａは、後記する音声認識手段１５で認識された音声認識文字を、表示画面上に合成するものである。この音声認識文字を、表示画面上の左下等の固定した領域に表示させることで、利用者は、利用者が発声した操作指令が正しく認識されているかどうかを確認することができる。
【００５３】
音声認識手段１５は、マイク３から入力される利用者の音声（操作指令）を、音声認識し、テキストデータ（文字列）として出力するものである。ここで認識された文字列は、対話処理手段１６へ出力される。なお、この音声認識手段１５の音声認識は、公知の一般的な音声認識技術を用いて実現することができる。さらに、ここで認識された文字列（音声認識文字）は、表示制御手段１４へ出力され、音声認識結果として、表示装置２上に表示される。
【００５４】
対話処理手段１６は、音声認識手段１５の認識結果であるテキストデータ（文字列）に基づいて、利用者の操作指令を解析し、その操作指令に対応する動作を実行するものである。ここでは、対話処理手段１６は、操作指令解析部１６ａと、操作指令実行部１６ｂとを備えている。
【００５５】
操作指令解析部（操作指令解析手段）１６ａは、音声認識手段１５から入力された文字列を解析して、利用者の操作指令を認識するものである。ここでは、操作指令解析部１６ａは、音響・言語モデル１８ａに基づいて、文字列を形態素解析することで形態素に分割し（ここでは、単語に分割するものとする）、その形態素を単位として、予め定めた操作テンプレート１８ｄとマッチングを行うことで、操作指令の内容を特定（判断）する。
【００５６】
また、操作指令解析部１６ａは、操作指令に対して予め設定してある応答文を音声合成手段１７に通知して、利用者に対して操作指令を認識した、あるいは、認識できなかった等の応答を返す。
【００５７】
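The way in which the operation command analysis unit 16a recognizes the content of an operation command from a character string is now described in detail with reference to FIGS. 4 and 5 (and FIG. 1 as appropriate). FIGS. 4 and 5 are both explanatory diagrams showing how the operation command analysis unit 16a determines the user's intention from the user's speech. FIG. 4 is an example in which speech corresponding to a marker character is taken to mean that the user wants to browse a link destination, and FIG. 5 is an example in which the user's intention is determined from arbitrary speech other than a marker character.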
【００５８】
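First, with reference to FIG. 4, an example is described in which the user's intention is determined from speech corresponding to a marker character. The example shows the operation character string "３番が見たい。" ("I want to see number 3.") (FIG. 4(a)), as recognized by the voice recognition means 15, being input to the operation command analysis unit 16a.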
【００５９】
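The operation command analysis unit 16a performs morphological analysis on the operation character string using the morpheme dictionary dic stored in the acoustic/language model 18a, and splits it into the morphemes "３", "番", "が", "見", and "たい。" as shown in FIG. 4(b). The marker character "３" is assumed to be registered in the morpheme dictionary dic in advance. Then, as shown in FIG. 4(c), the operation command analysis unit 16a determines the user's intention by matching the morphemes against the combinations of character strings (fixed phrases) registered as the operation template 18d.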
【００６０】
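For example, suppose that "[@marker character] * {[見る] | [替える]}" is registered in the operation template 18d. Here, [@marker character] stands for any one of the marker characters registered in the morpheme dictionary dic, and {[見る] | [替える]} stands for either of the verbs [見る] (see) or [替える] (switch). The operation command analysis unit 16a then finds that the marker character "３" in FIG. 4(c) and the action verb "見る" (the uninflected form of the verb is used here) match the template "[@marker character] * {[見る] | [替える]}", and can determine that the user intends to view the link destination of the marker character "３". In addition to the template itself, the operation template 18d also describes the action that corresponds to each template.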
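The template match for the marker character example can be sketched roughly as follows. This is only one possible reading of the pattern "[@marker character] * {[見る] | [替える]}": the function name, the `base_forms` table that maps an inflected stem to its uninflected verb, and the interpretation of "*" as "any morphemes in between" are all assumptions made for illustration.

```python
def match_marker_template(morphemes, markers, base_forms, verbs=("見る", "替える")):
    """Match the pattern [@marker character] * {[見る] | [替える]}.

    morphemes:  morphemes in utterance order, e.g. those of FIG. 4(b)
    markers:    marker characters currently shown on the screen
    base_forms: hypothetical table mapping an inflected stem to its base form
    Returns the matched marker character, or None if the template does not fit.
    """
    for i, morpheme in enumerate(morphemes):
        if morpheme not in markers:
            continue
        # "*" in the template: allow any morphemes between marker and verb.
        for later in morphemes[i + 1:]:
            if base_forms.get(later, later) in verbs:
                return morpheme
    return None


morphemes = ["３", "番", "が", "見", "たい。"]      # segmentation of FIG. 4(b)
base_forms = {"見": "見る", "替え": "替える"}       # assumed lemma table
print(match_marker_template(morphemes, {"１", "２", "３"}, base_forms))
```

With the morphemes of FIG. 4(b) the function returns the marker "３"; an utterance with no marker, or with a marker but none of the listed verbs, returns `None` and would fall through to other templates.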
【００６１】
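Next, with reference to FIG. 5, an example is described in which the user's intention is determined from arbitrary speech. The example shows the operation character string "プロジェクトＸのホームページを探して。" ("Find the home page of Project X.") (FIG. 5(a)), as recognized by the voice recognition means 15, being input to the operation command analysis unit 16a.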
【００６２】
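The operation command analysis unit 16a performs morphological analysis on the operation character string using the morpheme dictionary dic stored in the acoustic/language model 18a, and splits it into the morphemes "プロジェクトＸ", "の", "ホームページ", "を", "探し", and "て。" as shown in FIG. 5(b). The morpheme dictionary dic is assumed to contain "プロジェクトＸ" (Project X) as a "program name" and "ホームページ" (home page) as a "command word". A "command word" here is a term used in browsing the Internet and the like (other examples are "戻る" (back), which returns to the previous screen, and "中止" (stop), which cancels data reception).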
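The effect of registering a whole program name as a single morpheme can be illustrated with a toy segmenter. The description does not say which morphological analysis algorithm is used; the greedy longest-match strategy below, the function name `segment`, and the tiny lexicon are assumptions chosen only to reproduce the split of FIG. 5(b).

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation against a morpheme dictionary.

    Characters not covered by the lexicon become single-character tokens.
    This is a simplified stand-in for real morphological analysis.
    """
    longest = max(map(len, lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


lexicon = {"プロジェクトＸ", "の", "ホームページ", "を", "探し", "て。"}
print(segment("プロジェクトＸのホームページを探して。", lexicon))
```

Because "プロジェクトＸ" is in the lexicon as one entry, it comes out as one token that a template slot such as [@program name] can match directly. Without that entry this toy segmenter would break the name into single characters.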
【００６３】
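Then, as shown in FIG. 5(c), the operation command analysis unit 16a determines the user's intention by matching the morphemes against the combinations of character strings (fixed phrases) registered as the operation template 18d.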
【００６４】
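For example, suppose that "[@program name] * [@command word] * {[見る] | [探す]}" is registered in the operation template 18d. Here, [@program name] stands for any one of the program names registered in the morpheme dictionary dic, [@command word] stands for any one of the command words registered in the morpheme dictionary dic, and {[見る] | [探す]} stands for either of the verbs [見る] (see) or [探す] (find). The operation command analysis unit 16a then finds that the program name "プロジェクトＸ", the command word "ホームページ", and the action verb "探す" (the uninflected form of the verb is used here) in FIG. 5(c) match the template "[@program name] * [@command word] * {[見る] | [探す]}", and can determine that the user intends to find the home page of the program named "プロジェクトＸ".
Returning to FIG. 1, the description continues.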
【００６５】
操作指令実行部１６ｂは、操作指令解析部１６ａで解析した操作指令に対する動作を実行するものである。なお、操作指令実行部１６ｂには、リンク先切り替え部１６ｂ１を備えている。
【００６６】
リンク先切り替え部（リンク先切り替え手段）１６ｂ１は、操作指令解析部１６ａによる操作指令の解析結果が、「マーカ文字のリンク先を見たい」という操作である場合に、リンク先データ１８ｃを参照して、マーカ文字のリンク先を取得し、放送データ受信部１０ａ又は通信データ受信部１０ｂに対して、そのリンク先のデータを取得する旨の指示を行う。
【００６７】
また、操作指令実行部１６ｂには、リンク先切り替え部１６ｂ１以外にも、図示していない各操作指令を実行する処理部を備えている。例えば、テレビ視聴中にインターネットの所望のホームページを閲覧したいという、操作指示があった場合は、受信切り替え部１０ｃに対して、放送データの受信から通信データの受信へ切り替える指示を通知する。
【００６８】
また、操作指令実行部１６ｂは、操作結果を応答文として音声合成手段１７に通知して、利用者に対して操作の実行結果を返す。例えば、ある番組に関連するホームページを検索する旨の操作指示があったときに、その件数を「該当するホームページが５件ありました。」等の音声によって通知する。
【００６９】
音声合成手段１７は、対話処理手段１６から通知される応答文を音声合成することで音声に変換し、スピーカ４を介して利用者に操作指令に対する応答を行う。これによって、利用者は、テレビ操作支援装置１と会話をする感覚で当該装置１の操作を行うことが可能になる。
【００７０】
記憶手段１８は、テレビ操作支援装置１において、音声認識、対話処理等に必要となる種々のデータを記憶しておくもので、半導体メモリ、ハードディスク等の一般的な記録媒体である。ここでは、記憶手段１８に、音響・言語モデル１８ａ、履歴データ１８ｂ、リンク先データ１８ｃ及び操作テンプレート１８ｄを記憶することとした。なお、これらのデータは、１つの記憶手段に記憶する必要はなく、複数の記憶手段に記憶することとしてもよい。
【００７１】
音響・言語モデル１８ａは、発音データに基づいて生成された音声の単語辞書と、個々の単語の繋がりを確率により表現したモデルとを、含んだデータである。さらに、この音響・言語モデル１８ａは、形態素（意味を担う最小の言語単位）の辞書（形態素辞書）を含んでおり、操作指令解析部１６ａにおいて、形態素解析を行う際に用いられる。
【００７２】
なお、この形態素辞書には、「番組名」、「放送局名」、「出演者名」等の語彙を形態素として登録しておく。一般に番組名等は、複数の形態素の組み合わせで表現される場合が多いが、ここでは、テレビ操作における音声認識率を高めるため、複数の形態素からなる「番組名」等を１つの形態素として登録しておく。このテレビ操作用の語彙は、放送データ受信部１０ａが放送データを解析することで更新を行う。
【００７３】
履歴データ１８ｂは、マーカ文字付加手段１２によってマーカ文字を付加されたデータ（マーカ文字付データ）を、記憶したものである。この履歴データ１８ｂは、表示データ生成手段１３によって参照され、画面の遷移が行われる。これによって、過去に表示したデータを放送波や通信回線を介して再度取得する必要がなくなる。
【００７４】
リンク先データ１８ｃは、マーカ文字付加手段１２によって付加されたマーカ文字と、そのマーカ文字を付加したリンク情報とを関連付けたデータである。このリンク先データ１８ｃは、対話処理手段１６の操作指令解析部１６ａによって、利用者からの操作指令に含まれるマーカ文字に対応するリンク情報が読み出される。なお、記憶手段１８に記憶されたリンク先データ１８ｃが、特許請求の範囲に記載のリンク情報記憶手段に相当する。
【００７５】
操作テンプレート１８ｄは、利用者が発声する発話内容（操作指令）を、特定文字列の組み合わせで定型化して、個々の定型文毎にその動作を設定したものである。例えば、図５に示したように、「番組名」とインターネットを検索する用語である「コマンド語」と、動詞（見る又は探す）を組み合わせて定型化し、その定型文に対して、＜「番組名」の「コマンド語」を見る又は探す＞という動作を設定しておく。
【００７６】
なお、この操作テンプレート１８ｄには、個々の操作指令の定型文に対して、応答文を設定しておく。例えば、図５の例で、操作指令を認識した場合には、「分かりました。」という定型文を、「番組名」が音響・言語モデル１８ａの形態素辞書にない場合には、「＜番組名＞という番組は存在しません。」という定型文を定義しておく。
【００７７】
以上、本発明に係るテレビ操作支援装置（データ閲覧支援装置）１の構成について説明したが、本発明はこれに限定されるものではない。例えば、受信手段１０を放送データ受信部１０ａ又は通信データ受信部１０ｂのいずれか１つの構成として、デジタル放送におけるデータ放送の操作のみを支援したり、通信回線を介したインターネットのＷｅｂページの閲覧のみを支援するものとして構成してもよい。
【００７８】
なお、テレビ操作支援装置（データ閲覧支援装置）１は、一般的なコンピュータにプログラムを実行させ、コンピュータ内の演算装置や記憶装置を動作させることにより実現することができる。このプログラム（データ閲覧プログラム）は、通信回線を介して配付することも可能であるし、ＢＭＬで記述することで、データ放送によって配信することも可能である。
【００７９】
［テレビ操作支援装置（データ閲覧支援装置）の動作］
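Next, the operation of the television operation support device (data browsing support device) according to the present invention is described with reference to FIGS. 6 and 7 (and FIG. 1 as appropriate). FIG. 6 is a flowchart showing the operation in which the television operation support device 1 acquires Web page data (HTML) via the communication line 6 and presents it on the display device 2 (HTML data display operation). FIG. 7 is a flowchart showing the operation in which the television operation support device 1 moves to a link destination in response to a spoken operation command from the user.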
【００８０】
（ＨＴＭＬデータ表示動作）
まず、図６を参照（適宜図１参照）して、テレビ操作支援装置１が、ＨＴＭＬデータを取得し、マーカ文字を付加した画面を表示する動作について説明する。なお、ここでは、通信回線６を介して取得したＨＴＭＬを解析して表示を行う動作について説明を行う。
【００８１】
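The television operation support device 1 receives Web page data (HTML) with the communication data receiving unit 10b of the receiving means 10, and the link information search means 11 reads that data (HTML) (step S10). The link information search means 11 then searches the HTML data for the "href attribute", the link information that indicates a link destination (step S11).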
【００８２】
そして、リンク情報の有無を判定し（ステップＳ１２）、リンク情報が存在する場合（ステップＳ１２でＹｅｓ）は、マーカ文字付加手段１２が、リンク情報の前にマーカ文字を付加しマーカ文字付データの生成を行う（ステップＳ１３）。なお、この段階で、マーカ文字付加手段１２は、マーカ文字付データを履歴データ１８ｂとして記憶手段１８に記憶する（ステップＳ１４）。また、マーカ文字付加手段１２は、リンク情報とそのリンク情報に付加したマーカ文字とを対応付けて、記憶手段１８のリンク先データ１８ｃとして記憶する（ステップＳ１５）。そして、表示データ生成手段（ブラウザ）１３が、マーカ文字付加手段１２で生成されたマーカ文字付データから、表示可能な出力形式に変換した表示データを生成し、表示装置２へ出力する（ステップＳ１６）。
【００８３】
一方、リンク情報が存在しない場合（ステップＳ１２でＮｏ）は、ステップＳ１６へ進み、表示データ生成手段（ブラウザ）１３が、ＨＴＭＬデータから、表示可能な出力形式に変換した表示データを生成し、表示装置２へ出力する。
【００８４】
そして、リンク情報検索手段１１において、ＨＴＭＬデータの読み込みが終了したかどうかを判定し（ステップＳ１７）、終了した場合（ステップＳ１７でＹｅｓ）は動作を終了し、終了していない場合（ステップＳ１７でＮｏ）は、ステップＳ１０へ戻って、ＨＴＭＬデータの読み込み以降の動作を継続する。
【００８５】
以上の動作によって、テレビ操作支援装置１は、通信回線６を介して取得したＷｅｂページのデータ（ＨＴＭＬ）を表示装置２に提示する際に、リンク先を示す領域や文字が存在する箇所に、「１」、「２」等のマーカ文字を付して画面上に表示させることができる。
【００８６】
（リンク先移動動作）
次に、図７を参照（適宜図１参照）して、テレビ操作支援装置１が、利用者の音声による操作指令によって、リンク先を移動する動作について説明する。
まず、テレビ操作支援装置１は、音声認識手段１５によって、マイク３から入力される利用者の音声（操作指令）を音声認識して、テキストデータ（文字列）に変換する（ステップＳ２０）。そして、音声認識手段１５が音声認識結果である文字列（音声認識文字）を表示制御手段１４へ通知することで、表示制御手段１４の文字合成部１４ａが、画面上に音声認識文字を合成して、表示装置２に表示する（ステップＳ２１）。これによって、利用者は、自分が発声した音声が、どのように認識されたかを判定することができる。
【００８７】
また、テレビ操作支援装置１は、対話処理手段１６の操作指令解析部１６ａによって、音声認識文字の解析を行い利用者の操作指令を認識する。より具体的には、操作指令解析部１６ａが、記憶手段１８に記憶されている音響・言語モデル１８ａに基づいて、音声認識文字に対して形態素解析を行い（ステップＳ２２）、個々の形態素と、記憶手段１８に記憶されている操作テンプレート１８ｄとをパターンマッチングすることで、操作指令の意味を決定する（ステップＳ２３）。
【００８８】
そして、操作指令実行部１６ｂが、ステップＳ２３で決定した操作指令が、利用者がマーカ文字を発声したことによるリンク先の移動であるかどうかを判定し（ステップＳ２４）、リンク先の移動である場合（ステップＳ２４でＹｅｓ）は、リンク先切り替え部１６ｂ１が、リンク先の切り替えを行う。すなわち、リンク先切り替え部１６ｂ１が、記憶手段１８に記憶されているリンク先データ１８ｃから、マーカ文字に対応するリンク情報（リンク先のアドレス）を取得し（ステップＳ２５）、受信手段１０に対してそのリンク先を通知することで、受信手段１０が新しいリンク先からデータの取得を行う（ステップＳ２６）。
【００８９】
そして、テレビ操作支援装置１は、このステップＳ２６で取得した新しいリンク先から取得したデータ（ＨＴＭＬデータ）を、表示装置２へ表示する（ステップＳ２７）。このステップＳ２７の具体的な動作は、図６で説明した、ＨＴＭＬデータを取得し、マーカ文字を付加した画面を表示する動作と同じである。
【００９０】
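On the other hand, if the operation command is not a move to a link destination (No in step S24), the operation command execution unit 16b carries out the operation appropriate to that command. For example, when the operation command analysis unit 16a recognizes that the user has said "戻る" (back), the screen is returned to the previous one. In this case the operation command execution unit 16b sends a history movement instruction indicating "back" to the display data generating means 13, and the display data generating means 13 retrieves the data of the previous screen from the history data 18b, converts it into a displayable output format, and displays it on the screen of the display device 2.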
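The "back" and "forward" handling based on stored history data can be sketched as a small class. The class name `ScreenHistory`, its methods, and the decision to discard forward entries when a new page is visited are assumptions for illustration; the description only states that previously displayed data with marker characters is kept as history data 18b and reused for screen transitions.

```python
class ScreenHistory:
    """Minimal history of displayed pages, in the spirit of history data 18b."""

    def __init__(self):
        self._pages = []    # previously displayed data with marker characters
        self._index = -1    # position of the page currently on screen

    def visit(self, page):
        # Assumption: visiting a new page discards any "forward" entries.
        del self._pages[self._index + 1:]
        self._pages.append(page)
        self._index += 1

    def back(self):
        """Return the previous page, or None if there is none."""
        if self._index <= 0:
            return None
        self._index -= 1
        return self._pages[self._index]

    def forward(self):
        """Return the next page, or None if there is none."""
        if self._index + 1 >= len(self._pages):
            return None
        self._index += 1
        return self._pages[self._index]
```

Because the stored entries already contain the marker characters, a page returned by `back()` or `forward()` can be rendered again without being fetched over the broadcast wave or the communication line.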
【００９１】
以上の動作によって、テレビ操作支援装置１は、利用者が発声する音声によって、利用者が望む操作を認識し実行することができる。また、画面上に表示されたマーカ文字を利用者が発声するという簡単な動作で、リンク先の切り替え動作（操作）を行うことができる。
【００９２】
なお、図６及び図７においては、ＨＴＭＬデータの表示動作と、ＨＴＭＬデータのリンク先の移動動作について主に説明したが、放送波５を介して受信したデータ（ＢＭＬ）についても同様に動作させることが可能である。
【００９３】
【発明の効果】
以上説明したとおり、本発明に係るデータ閲覧支援装置、データ閲覧方法及びデータ閲覧プログラムでは、以下に示す優れた効果を奏する。
【００９４】
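According to the invention of claim 1, claim 6, or claim 7, when data described in a markup language such as HTML or BML is browsed, marker characters are displayed on the screen to show that link destinations exist, and the user can transition the screen to the desired link destination by uttering the corresponding marker character. Even when a link destination is represented by an image or the like, the user can transition the screen to it by the simple operation of uttering the marker character attached to that image. Data browsing operations can thus be supported without cumbersome input means such as a mouse or keyboard.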
【００９５】
また、請求項１に記載の発明によれば、マーカ文字は、音声認識用の辞書、例えば、音響・言語モデルの形態素辞書等に登録されている文字を用いるため、利用者が音声によって操作をする場合に、認識できない、あるいは、誤認識の確率を低減することができる。
【００９６】
請求項２に記載の発明によれば、利用者が発声し、認識された文字（音声認識文字）を表示装置の画面上に合成して表示するため、利用者はデータ閲覧支援装置が認識した文字を確認することができる。これによって、誤認識があった場合でも、利用者はその誤り箇所を知ることができ、利用者に対して操作に対する安心感を与えることができる。
【００９７】
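According to the invention of claim 3, standardizing the operation commands uttered by the user as combinations of several specific character strings allows a wide variety of operation commands to be expressed as patterns. Expressing operation commands as patterns makes it possible to recognize many operation commands with a small number of patterns, so that a conversational user interface can be realized.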
【００９８】
請求項４に記載の発明によれば、利用者からの操作指令に対する応答を音声によって行うので、操作の受付、操作間違い等を音声で通知することができる。これによって、操作性を向上させることができる。
【００９９】
請求項５に記載の発明によれば、放送データの受信と、通信データの受信とを音声によって自由に切り替えることができる。これによって、例えば、放送データの情報だけでは情報が不充分な場合であっても、音声による簡単な操作で通信データの情報を取得することができる。
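[Brief description of the drawings]
[FIG. 1] A block diagram showing the overall configuration of the data browsing support device (television operation support device) according to the present invention.
[FIG. 2] A diagram showing an example of screen data described in HTML.
[FIG. 3] A diagram showing screens displayed by converting the data of FIG. 2 into a displayable output format.
[FIG. 4] An explanatory diagram of an example in which the user's intention is determined from speech corresponding to a marker character.
[FIG. 5] An explanatory diagram of an example in which the user's intention is determined from arbitrary speech.
[FIG. 6] A flowchart showing the operation in which the data browsing support device (television operation support device) according to the present invention adds marker characters to HTML data and displays it.
[FIG. 7] A flowchart showing the operation in which the data browsing support device (television operation support device) according to the present invention moves to a link destination by voice.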
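[Explanation of reference numerals]
1 Television operation support device (data browsing support device)
10 Receiving means
10a Broadcast data receiving unit (broadcast data receiving means)
10b Communication data receiving unit (communication data receiving means)
10c Reception switching unit (reception switching means)
11 Link information search means
12 Marker character adding means
13 Display data generating means
14 Display control means
14a Character synthesizing unit (character synthesizing means)
15 Voice recognition means
16 Dialogue processing means
16a Operation command analysis unit (operation command analysis means)
16b Operation command execution unit
16b1 Link destination switching unit (link destination switching means)
17 Speech synthesis means
18 Storage means
18a Acoustic/language model
18b History data
18c Link destination data (link information storage means)
18d Operation template
2 Display device
3 Microphone
4 Speaker
5 Broadcast wave
6 Communication line
[0001]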
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a data browsing support apparatus, a data browsing method, and a data browsing program that support a browsing operation by a user's voice when browsing data described in a markup language.
[0002]
[Prior art]
Conventionally, on the WWW (World Wide Web), one of the Internet services provided over a network, the information published on a Web server (a Web page) is described in a markup language (page description language) called HTML (Hyper Text Markup Language). One piece of HTML data is linked to another by writing the URL (Uniform Resource Locator) of the link destination into the data, in association with a character or graphic displayed on the screen. The user, for their part, can browse the contents of HTML data by running a Web browser, which analyzes and displays HTML, on a personal computer or the like. The user can then browse the information at another link destination described by a URL by selecting that link destination with an input device such as a mouse.
[0003]
In data broadcasting, one of the services of digital broadcasting, the broadcast data is described in a markup language (page description language) called BML (Broadcast Markup Language), established by the Association of Radio Industries and Businesses (ARIB). The user can view data broadcasts on a digital broadcast television receiver equipped with a BML browser. As in HTML, BML describes link destinations by URL, and the user moves between display screens by selecting a link destination with the arrow keys of a remote control or the like.
[0004]
However, such a mouse or remote control is difficult for children and elderly people to operate. Recently, therefore, a technique for surfing the Internet by voice recognition has been disclosed in which, without using an input device such as a mouse, the user utters the character string that indicates a link destination and voice recognition is performed, so that the user can browse (view) the information at the desired link destination (see, for example, Patent Document 1).
[0005]
[Patent Document 1]
JP 2001-273216 A (pages 4-6, FIGS. 1-3)
[0006]
[Problems to be solved by the invention]
However, the conventional technique requires the user to utter exactly the character string displayed on the screen to indicate a link destination, and the string may be misrecognized in voice recognition. Moreover, because this character string is an arbitrary string chosen by the creator of the Web page or data broadcast, it cannot be recognized unless it consists of characters (a character string) registered in advance in the acoustic/language model.
Furthermore, with the conventional technique, when a link destination is set on a graphic or region displayed on the screen, such as a banner, the user cannot specify that link destination by voice.
[0007]
The present invention has been made in view of the above problems. Its object is to provide a data browsing support apparatus, a data browsing method, and a data browsing program that support browsing operations by the user's voice when a Web page or data broadcast is browsed (viewed), and that make it possible to transition the screen to a link destination by voice alone.
[0008]
[Means for Solving the Problems]
The present invention was devised to achieve the above object. First, the data browsing support apparatus according to claim 1 is a data browsing support apparatus that supports browsing operations by the user's voice when the contents of data described in a markup language are browsed, and comprises link information search means, marker character adding means, link information storage means, display data generating means, voice recognition means, and link destination switching means.
[0009]
According to this configuration, the data browsing support apparatus uses the link information search means to search data described in a markup language such as HTML or BML for link information indicating a link destination. This link information indicates the address of the link destination; in HTML, for example, it is the information defined by the "href attribute" of the "A tag". The marker character adding means then adds to the link information a marker character that is identifiable within the data of the same screen and that uses a vocabulary item registered in advance in the dictionary for speech recognition, thereby generating data with marker characters from the data, and stores the marker character and the link information in association with each other in the link information storage means. As a result, the link destination (address) can be identified by the marker character.
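The search, marking, and storing steps described in this paragraph can be sketched roughly as follows. This is a minimal illustration and not the implementation of the invention: the function name `add_markers`, the regular expression (which only handles quoted `href` values), and the choice of sequential numbers wrapped in a `<b>` tag are all assumptions.

```python
import re

# Opening <a ...> tag that carries a quoted href attribute (simplified).
A_TAG = re.compile(r'<a\b[^>]*\bhref\s*=\s*["\']([^"\']*)["\'][^>]*>', re.IGNORECASE)


def add_markers(html):
    """Return (data_with_markers, link_table).

    A marker "1", "2", ... is inserted in front of every link tag, and
    link_table maps each marker to the href it was attached to.
    """
    link_table = {}

    def mark(match):
        marker = str(len(link_table) + 1)    # next unused number on this screen
        link_table[marker] = match.group(1)  # associate marker with link information
        return f"<b>{marker}</b>{match.group(0)}"

    return A_TAG.sub(mark, html), link_table


page = '<p><a href="a.html">News</a> <A HREF=\'b.html\'><img src="b.png"></A></p>'
marked, table = add_markers(page)
print(marked)
print(table)
```

The second link in the sample wraps an image rather than text, which is the case where a spoken link label would otherwise be unavailable; it still receives a marker. A production version would more likely use a real HTML parser than a regular expression.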
[0010]
Here, the marker character may be any character (or character string) that is identifiable within the data of the same screen; characters such as "1", "2", "3", ... or "a", "b", "c", ... can be used.
[0011]
The data browsing support apparatus then uses the display data generating means to analyze the data with marker characters, generate browsable display data, and display it on the screen. Because the apparatus can thus display the images and character strings that indicate link destinations with marker characters added, the user can recognize that a link destination is present wherever a marker character is displayed.
[0012]
Furthermore, the data browsing support apparatus recognizes the user's voice with the voice recognition means. That is, it recognizes by voice the user's operation instruction to the data browsing support apparatus. When the recognition result produced by the voice recognition means corresponds to a marker character (is recognized as a marker character), the link destination switching means acquires data, on the basis of that marker character, from the link destination indicated by the link information stored in the link information storage means. The user can thus move to a link destination and browse its data by uttering the marker character.
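The lookup performed by the link destination switching means can be sketched as below. The function name `resolve_link`, the dictionary used as the link information store, and the resolution of relative addresses against the current page with `urllib.parse.urljoin` are assumptions for illustration; the description only requires that the marker be mapped back to its stored link information.

```python
from urllib.parse import urljoin


def resolve_link(recognized, link_table, base_url):
    """Map a recognized marker to the address of its link destination.

    recognized: text produced by speech recognition, e.g. "3"
    link_table: marker -> link information (hypothetical storage format)
    base_url:   address of the page currently displayed
    Returns an absolute URL, or None when the text is not a known marker.
    """
    href = link_table.get(recognized.strip())
    if href is None:
        return None                     # not a marker: leave to other handlers
    return urljoin(base_url, href)      # resolve relative link destinations


table = {"3": "projectx/projectx.html"}
print(resolve_link("3", table, "http://example.com/index.html"))
```

A `None` result signals that the utterance was not a marker on the current screen, so the caller can pass it on to other command handling instead of fetching anything.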
[0014]
In addition, because the user utters vocabulary (a marker character) registered in the dictionary for speech recognition, the data browsing support apparatus can reduce misrecognition by the voice recognition means.
[0015]
Further, the data browsing support apparatus according to claim 2 is the data browsing support apparatus according to claim 1, further comprising character synthesizing means that superimposes the speech recognition characters, which are the recognition result of the voice recognition means, on the display screen.
[0016]
According to such a configuration, the data browsing support apparatus synthesizes and displays the speech recognition character, which is the recognition result recognized by the speech recognition unit, on the display screen by the character synthesis unit. Thus, the user can confirm how the voice uttered by the user is recognized by the data browsing support apparatus.
[0017]
The data browsing support apparatus according to claim 3 is the data browsing support apparatus according to claim 1 or claim 2, further comprising operation command analysis means that analyzes an operation command from the user on the basis of specific character strings contained in the recognition result of the voice recognition means, the operation commands having been standardized in advance as combinations of such specific character strings.
[0018]
According to this configuration, because operation commands are standardized in advance as combinations of specific character strings, the operation command analysis means can identify an operation command by checking whether the specific character strings contained in the recognition result of the voice recognition means fit one of the standardized patterns. These specific character strings are strings that occur with high probability in, for example, television viewing or browsing of Web pages on the Internet. An operation command can be identified (estimated) by combining items such as "channel", "program name", "broadcast station name", and "genre" with a verb indicating an action.
[0019]
Further, the data browsing support apparatus according to claim 4 is the data browsing support apparatus according to any one of claims 1 to 3, further comprising speech synthesis means that outputs, as speech, a response sentence to an operation command from the user.
[0020]
According to this configuration, the data browsing support apparatus uses the speech synthesis means to synthesize a response sentence to an operation command from the user and output it as speech. The user is thereby notified by voice that, for example, the operation command was accepted or that the operation command was incorrect. Alternatively, the result of an operation, for example the number of hits returned by a search operation, may be reported by speech synthesis.
[0021]
The data browsing support apparatus according to claim 5 is the data browsing support apparatus according to claim 3 or claim 4, further comprising broadcast data receiving means, communication data receiving means, and reception switching means.
[0022]
According to this configuration, the data browsing support apparatus receives broadcast data via broadcast waves with the broadcast data receiving means, and receives communication data via a communication line with the communication data receiving means. In response to a user instruction, the reception switching means switches between reception by the broadcast data receiving means and reception by the communication data receiving means. The data browsing support apparatus can thus perform switching operations such as browsing a Web page at any time during television viewing, or switching to television viewing while a Web page is being browsed.
[0023]
Further, the data browsing method according to claim 6 is a data browsing method in which, when the contents of data described in a markup language are browsed, the operation of moving to a link destination is performed by the user's voice, the method comprising: a step of adding, to link information that is embedded in the data and indicates the link destination, a marker character that is identifiable within the same screen and uses a vocabulary item registered in advance in a dictionary for speech recognition, and displaying it; a step in which the user utters the displayed marker character; and a step of recognizing the marker character uttered by the user, acquiring data from the link destination corresponding to that marker character, and presenting it.
[0024]
According to this method, a marker character that is identifiable within the data of the same screen and uses a vocabulary item registered in advance in the dictionary for speech recognition is added to the link information, which indicates the address of a link destination embedded in data described in a markup language such as HTML or BML. The images and character strings that indicate link destinations can therefore be displayed on the screen with marker characters added. With this data browsing method, when the user utters a marker character, data is acquired from the link destination corresponding to that marker character, and the user can move (transition) to the link destination.
[0025]
The data browsing program according to claim 7 causes a computer to function as link information search means, marker character adding means, link information storage control means, display data generating means, voice recognition means, and link destination switching means, so that, when the contents of data described in a markup language are browsed, the operation of moving to a link destination embedded in the data is performed by the user's voice.
[0026]
According to this configuration, the data browsing program uses the link information search means to search data described in a markup language such as HTML or BML for link information indicating a link destination. The marker character adding means then adds to the link information a marker character that is identifiable within the data of the same screen and uses a vocabulary item registered in advance in the dictionary for speech recognition, thereby generating data with marker characters from the data, and the link information storage control means stores the marker character and the link information in association with each other in the link information storage means.
[0027]
Then, the data browsing program analyzes the data with marker characters by the display data generating means, generates display data that can be browsed, and displays it on the screen. The program also recognizes the user's voice by the voice recognition means, and when the recognition result corresponds to a marker character (is recognized as a marker character), the link destination switching means refers to the link information storage means and acquires data from the link destination indicated by the link information associated with that marker character.
[0028]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings. Here, the present invention is applied to a digital broadcast television receiver and configured as a television operation support device.
[0029]
[Configuration of TV operation support device (data browsing support device)]
FIG. 1 is a block diagram showing the configuration of a television operation support device (data browsing support device) according to the present invention. The television operation support device 1 is used for viewing digital broadcasts and browsing Web pages via the Internet, and supports the viewing (browsing) operations by means of the user's voice. Here, the television operation support device 1 includes receiving means 10, link information search means 11, marker character adding means 12, display data generating means 13, display control means 14, voice recognition means 15, dialogue processing means 16, speech synthesis means 17, and storage means 18. A display device 2 for displaying video, data, and the like, a microphone 3 serving as voice input means, and a speaker 4 serving as audio output means are connected to the television operation support device 1 externally.
[0030]
The receiving means 10 performs digital broadcast reception using the broadcast wave 5 and data reception via the communication line 6. Here, the receiving means 10 includes a broadcast data receiving unit 10a, a communication data receiving unit 10b, and a reception switching unit 10c.
[0031]
The broadcast data receiving unit (broadcast data receiving means) 10a receives broadcast data broadcast by digital broadcasting via the broadcast wave 5. The broadcast data receiving unit 10a receives and demodulates the broadcast data, performs error correction, TMCC (Transmission and Multiplexing Configuration Control) decoding, and the like, and outputs the result as an MPEG2 transport stream. The video and audio (video stream and audio stream) are output to the display control means 14. The data (BML: data stream) is output to the link information search means 11 as a data file obtained by decoding the carousel transmission.
[0032]
The broadcast data receiving unit 10a also receives, through carousel transmission, and analyzes the SI (Service Information; program arrangement information) defined in ARIB STD-B10, and thereby acquires program information such as the channel, program name, broadcasting station name, genre, and performer names. Each piece of program information is stored in the acoustic / language model 18a as one morpheme. A program name or the like is generally expressed by a combination of several morphemes, but here it is registered as a single morpheme so that the operation command analysis unit 16a of the dialogue processing means 16, described later, recognizes it as one morpheme. This raises the voice recognition rate for program names and the like.
[0033]
The communication data receiving unit (communication data receiving means) 10b performs data communication via the communication line 6. For example, it transmits a Web page data request to a website on the Internet and receives data from the requested website. The communication data receiving unit 10b receives data (communication data) using the TCP/IP communication protocol. The received Web page data (HTML) is output to the link information search means 11.
[0034]
The reception switching unit (reception switching means) 10c switches reception between the broadcast data receiving unit 10a and the communication data receiving unit 10b based on a “reception switching instruction” notified from the dialogue processing means 16 described later. For example, when a Web page is to be browsed while a digital broadcast is being viewed, the reception switching unit 10c causes the broadcast data receiving unit 10a to stop receiving broadcast data and notifies the communication data receiving unit 10b that communication data (a Web page) is to be acquired, based on the link destination (address) notified together with the “reception switching instruction”.
[0035]
The link information search unit 11 analyzes the data (markup language) received by the receiving unit 10 and searches for link information indicating a link destination included in the data. For example, “href attribute” of “A tag” indicating the attribute of the link destination is searched in HTML or BML. Thereby, the link information search means 11 can detect that the link information is described in the tags (“<”, “>”) including the “href attribute” of the “A tag”.
[0036]
Note that the link information search unit 11 notifies the marker character addition unit 12 of the sequentially input data (markup language) until the tag including the link information is detected. Further, when the link information search means 11 detects a tag including link information, the link information search means 11 notifies the marker character addition means 12 of “detection notification” indicating that the link destination has been detected, and then includes the link information. The tag is notified to the marker character adding means 12.
[0037]
Each time a “detection notification” is received, the marker character adding means 12 adds a marker character to the data (markup language) notified from the link information search means 11, thereby generating data with marker characters. The marker character is set to a color different from the background color of the screen. The data to which the marker characters have been added (data with marker characters) is notified to the display data generating means 13.
[0038]
Further, the marker character adding means 12 associates the link information that follows each “detection notification” with the marker character added for it, and stores the pair in the storage means 18 as the link destination data 18c.
In addition, so that the screen can be switched by instructions such as “return” and “forward”, the marker character adding means 12 stores the generated data with marker characters in the storage means 18 as the history data 18b.
[0039]
The marker character is a character that is identifiable within the data on the same screen, and it uses vocabulary registered in the acoustic / language model 18a referred to by the voice recognition means 15 described later. For example, if the vocabulary “ichi” = “1”, “ni” = “2”, “san” = “3”, and so on is registered in advance in the acoustic / language model 18a, then “1”, “2”, “3”, ... are used as marker characters. Any vocabulary may serve as a marker character as long as it is identifiable within the data on the same screen and is registered in the acoustic / language model 18a. For example, “a”, “b”, “c”, ... or “i”, “ro”, “ha”, ... may be used.
[0040]
The display data generating means 13 analyzes the data with marker character notified from the marker character adding means 12, converts it into a displayable output format, and outputs it. If the notified data is HTML data, it functions as a Web browser, and if it is BML data, it functions as a BML browser. Here, the display data converted into a displayable output format is displayed on the screen of the display device 2 via the display control means 14.
[0041]
When the dialogue processing means 16 notifies the display data generating means 13 of a history movement instruction such as “return” or “forward”, the display data generating means 13 switches the screen by referring to the previously displayed data (data with marker characters) stored as the history data 18b.
[0042]
Here, with reference to FIGS. 2 and 3 (refer to FIG. 1 as appropriate), an example of adding marker characters to data described in a markup language will be specifically described. FIG. 2 shows an example of screen data described in HTML. FIG. 2A shows the source data before marker characters are added, and FIG. 2B shows the data with marker characters obtained by adding marker characters to the data shown in FIG. 2A. FIG. 3 shows screen examples in which data described in HTML is converted into a displayable output format and displayed. FIG. 3A is an example of a screen displaying the data described in FIG. 2A. FIG. 3B is an example of a screen displaying the data with marker characters described in FIG. 2B.
[0043]
As shown in FIG. 2A, in HTML, the “href attribute” (att) of “A tag” defines a link destination (lnk), for example, “projectx / projectx.html”. That is, the link information search means 11 can search for link information indicating the link destination using the “href attribute” (att) as a key.
[0044]
Then, the marker character adding means 12 adds marker characters mk such as “1”, “2”, and “3”, as shown in FIG. 2B, immediately before the tags (“<”, “>”) that include the “href attribute” (att) found by the link information search means 11.
[0045]
In FIG. 2B, the color of the marker character mk is designated by the “FONT tag”, and the thickness of the marker character mk is designated by the “b tag”. Here, the marker character “1” is added by writing <FONT color="#ffff99"><b>1</b></font>.
[0046]
In FIG. 2B, the background color is set to “#000066” (bc) by the “bgcolor attribute” of the “body tag”, and a color different from the background color, “#ffff99” (fc), is used as the color of the marker character mk. The color fc of the marker character mk can always be made different from the background color by subtracting the value of the background color bc from “ffffff”. Further, in FIG. 2B, the marker character mk is set in bold by the “b tag”. Defining the color and thickness of the marker character mk in this way improves its visibility.
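The subtraction described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the patent's implementation; the function name `marker_color` is a hypothetical name chosen for this example.

```python
def marker_color(background: str) -> str:
    """Return 'ffffff' minus the background color, as a '#rrggbb' string.

    `background` is a '#rrggbb' value such as the bgcolor attribute of the
    body tag. The function name is illustrative only.
    """
    value = int(background.lstrip("#"), 16)
    return "#{:06x}".format(0xFFFFFF - value)


# The values from FIG. 2B: background bc = #000066 gives marker color fc.
print(marker_color("#000066"))  # prints "#ffff99"
```

Because each channel value is an integer from 0 to 255, the result can never equal the background color. For backgrounds near mid-gray, however, the two colors are close (for example, `#808080` gives `#7f7f7f`), so the bold setting matters for visibility in that case.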
[0047]
FIG. 3A is an example of a screen displaying the data described in FIG. 2A. Images B1 to B5, for which link destinations are set, are displayed on the left side of the screen, and character strings C1 and C2, for which link destinations are set, are displayed in the center of the screen.
[0048]
FIG. 3B is an example of a screen displaying the data with marker characters described in FIG. 2B. Marker characters (N1 to N6 and N6a) are added to the images B1 to B5 and the character strings C1 and C2 for which link destinations are set in FIG. 3A. That is, marker characters are added in order: “1” is added to the image B1 as the marker character N1, “2” is added to the image B2 as the marker character N2, and so on.
[0049]
Here, it is assumed that the character strings C1 and C2 in FIG. 3A have the same link destination, and in FIG. 3B, “6” is added as the same marker characters N6 and N6a. In this way, the same marker character may be added to the same link destination, or the marker character may be added in order.
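The marker addition walked through for FIGS. 2 and 3 can be sketched as follows. This is a minimal illustration under several assumptions: the function name `add_markers`, the regular expression, and the choice of numeric markers are all hypothetical, and a regular expression is used only to keep the sketch short (a real HTML or BML parser would be more robust). The sketch takes the option of reusing the same marker character for links that share a link destination.

```python
import re

# Matches an opening A tag that carries a quoted href attribute (assumed form).
A_TAG = re.compile(
    r'<a\b[^>]*\bhref\s*=\s*["\']([^"\']*)["\'][^>]*>', re.IGNORECASE
)


def add_markers(html, color="#ffff99"):
    """Insert a marker character before every A tag that has an href.

    Returns (data with marker characters, {marker: link destination}).
    The dictionary plays the role of the link destination data.
    """
    links = {}    # marker character -> link destination
    by_href = {}  # link destination -> marker character already assigned

    def prepend_marker(match):
        href = match.group(1)
        marker = by_href.get(href)
        if marker is None:
            # Next unused number; "1", "2", "3", ... as in FIG. 2B.
            marker = str(len(by_href) + 1)
            by_href[href] = marker
            links[marker] = href
        return '<FONT color="{}"><b>{}</b></FONT>{}'.format(
            color, marker, match.group(0)
        )

    return A_TAG.sub(prepend_marker, html), links


source = '<a href="a.html">A</a> <a href="b.html">B</a> <a href="a.html">A again</a>'
marked, table = add_markers(source)
```

In this usage example the first and third links point to the same destination, so both receive the marker "1", and `table` holds two entries.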
[0050]
As shown in FIGS. 2B and 3B, by adding a marker character to an image, character string, or the like for which a link destination is set, the user can specify the link destination by voice even without input means such as a mouse.
Returning to FIG. 1, the description will be continued.
[0051]
The display controller 14 outputs the display data output from the display data generator 13 and the video / audio notified from the broadcast data receiver 10a of the receiver 10 to the outside. The display control means 14 outputs display data and video to the display device 2 and outputs sound to the speaker 4 which is sound output means.
The display control means 14 includes a character synthesis unit 14a that synthesizes and displays characters on the display screen.
[0052]
The character synthesis unit (character synthesis means) 14a synthesizes the voice recognition characters recognized by the voice recognition means 15, described later, onto the display screen. By displaying the voice recognition characters in a fixed area of the display screen, such as the lower left, the user can confirm whether the uttered operation command was recognized correctly.
[0053]
The voice recognition means 15 recognizes a user's voice (operation command) input from the microphone 3 and outputs it as text data (character string). The character string recognized here is output to the dialogue processing means 16. Note that the voice recognition by the voice recognition means 15 can be realized using a known general voice recognition technique. Furthermore, the character string (voice recognition character) recognized here is output to the display control means 14 and displayed on the display device 2 as a voice recognition result.
[0054]
The dialogue processing means 16 analyzes the user's operation command based on the text data (character string) that is the recognition result of the voice recognition means 15 and executes an operation corresponding to the operation command. Here, the dialogue processing means 16 includes an operation command analysis unit 16a and an operation command execution unit 16b.
[0055]
The operation command analysis unit (operation command analysis means) 16a analyzes the character string input from the voice recognition means 15 and recognizes the user's operation command. Here, the operation command analysis unit 16a performs morphological analysis based on the acoustic / language model 18a to divide the character string into morphemes (here, into words), and then, taking the morpheme as the unit, specifies (determines) the content of the operation command by matching against the predetermined operation template 18d.
[0056]
Further, the operation command analysis unit 16a notifies the speech synthesis means 17 of a response sentence set in advance for the operation command, and thereby returns to the user a response indicating whether the operation command was recognized or could not be recognized.
[0057]
Here, with reference to FIG. 4 and FIG. 5 (refer to FIG. 1 as appropriate), the method by which the operation command analysis unit 16a recognizes the content of an operation command from a character string will be specifically described. FIG. 4 and FIG. 5 are both explanatory diagrams showing how the operation command analysis unit 16a determines the user's intention from the voice uttered by the user. FIG. 4 is an example in which the user's intention to view a link destination is determined from a voice corresponding to a marker character, and FIG. 5 is an example in which the user's intention is determined from an arbitrary voice other than a marker character.
[0058]
First, an example in which a user's intention is determined based on a voice corresponding to a marker character will be described with reference to FIG. Here, an example is shown in which an operation character string “I want to see number 3” (FIG. 4A) recognized by the voice recognition means 15 is input to the operation command analysis unit 16a.
[0059]
Here, the operation command analysis unit 16a performs morphological analysis on the operation character string based on the morpheme dictionary dic stored in the acoustic / language model 18a and, as shown in FIG. 4B, divides it into the morphemes “3”, “number”, “ga”, “see”, and “want”. It is assumed that the marker character “3” is registered in advance in the morpheme dictionary dic. Then, as shown in FIG. 4C, the operation command analysis unit 16a matches the morphemes against the combinations of character strings (fixed sentences) registered as the operation template 18d, and thereby determines the user's intention.
[0060]
For example, it is assumed that “[@marker character] * {[see] | [change]}” is registered in the operation template 18d. Here, [@marker character] represents any one of the marker characters registered in the morpheme dictionary dic, and {[see] | [change]} represents either of the verbs [see] or [change]. The operation command analysis unit 16a finds that the marker character “3” in FIG. 4C and the verb “see” indicating the operation (the uninflected form of the verb is used here) match “[@marker character] * {[see] | [change]}” in the operation template 18d, and can therefore determine that the user wants to see the link destination of the marker character “3”. In addition to each template, the operation corresponding to that template is described in the operation template 18d.
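The template matching described above can be sketched as a small recursive matcher over the morpheme list. Everything here is an assumption made for illustration: the list-based template representation, the names `MARKERS`, `VERBS`, and `match`, and in particular the choice to ignore morphemes that follow the last template element (such as "want"), which the text implies but does not state.

```python
MARKERS = {"1", "2", "3"}   # marker characters assumed to be in the dictionary
VERBS = {"see", "change"}   # the {[see] | [change]} alternative

# "[@marker character] * {[see] | [change]}" written as a list:
# a set matches one morpheme from the set, "*" matches any run of morphemes.
MARKER_TEMPLATE = [MARKERS, "*", VERBS]


def match(template, morphemes):
    """Return the morphemes bound to the non-wildcard slots, or None."""
    if not template:
        return []  # trailing morphemes are ignored (an assumption)
    head, rest = template[0], template[1:]
    if head == "*":
        # Try skipping 0, 1, 2, ... morphemes before the next slot.
        for skip in range(len(morphemes) + 1):
            bound = match(rest, morphemes[skip:])
            if bound is not None:
                return bound
        return None
    if morphemes and morphemes[0] in head:
        bound = match(rest, morphemes[1:])
        if bound is not None:
            return [morphemes[0]] + bound
    return None


# Morphemes of the operation character string in FIG. 4B.
result = match(MARKER_TEMPLATE, ["3", "number", "ga", "see", "want"])
```

Here `result` holds the marker character and the verb that filled the two slots, which is the information needed to look up the link destination and select the operation.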
[0061]
Next, an example in which the user's intention is determined from an arbitrary voice will be described with reference to FIG. 5. Here, an example is shown in which the operation character string “Search for the homepage of Project X” (FIG. 5A) recognized by the voice recognition means 15 is input to the operation command analysis unit 16a.
[0062]
Here, the operation command analysis unit 16a performs morphological analysis on the operation character string based on the morpheme dictionary dic stored in the acoustic / language model 18a and, as shown in FIG. 5B, divides it into the morphemes “Project X”, a particle, “homepage”, a particle, “search”, and “te”. In the morpheme dictionary dic, “Project X” is registered as a “program name”, and “homepage” is registered as a “command word”. Here, a “command word” is a term used for operations such as browsing the Internet (other examples are “return” for returning to the previous screen and “cancel” for canceling data reception).
[0063]
Then, as shown in FIG. 5C, the operation command analysis unit 16a matches each morpheme against the combinations of character strings (fixed sentences) registered as the operation template 18d, and thereby determines the user's intention.
[0064]
For example, it is assumed that “[@program name] * [@command word] * {[see] | [search]}” is registered in the operation template 18d. Here, [@program name] represents any one of the program names registered in the morpheme dictionary dic, [@command word] represents any one of the command words registered in the morpheme dictionary dic, and {[see] | [search]} represents either of the verbs [see] or [search]. The operation command analysis unit 16a finds that the program name “Project X” in FIG. 5C, the command word “homepage”, and the verb “search” indicating the operation (the uninflected form of the verb is used here) match “[@program name] * [@command word] * {[see] | [search]}” in the operation template 18d, and can therefore determine that the user intends to search for the homepage of the program named “Project X”.
Returning to FIG. 1, the description will be continued.
[0065]
The operation command execution unit 16b executes an operation for the operation command analyzed by the operation command analysis unit 16a. The operation command execution unit 16b includes a link destination switching unit 16b1.
[0066]
When the analysis result of the operation command by the operation command analysis unit 16a is the operation “I want to see the link destination of the marker character”, the link destination switching unit (link destination switching means) 16b1 refers to the link destination data 18c, acquires the link destination of that marker character, and instructs the broadcast data receiving unit 10a or the communication data receiving unit 10b to acquire the data at the link destination.
[0067]
In addition to the link destination switching unit 16b1, the operation command execution unit 16b includes a processing unit that executes each operation command (not shown). For example, when there is an operation instruction to view a desired homepage on the Internet while watching TV, the reception switching unit 10c is notified of an instruction to switch from receiving broadcast data to receiving communication data.
[0068]
In addition, the operation command execution unit 16b notifies the speech synthesis unit 17 of the operation result as a response sentence, and returns the operation execution result to the user. For example, when there is an operation instruction to search for a homepage related to a certain program, the number of cases is notified by a voice such as “There were five applicable homepages”.
[0069]
The voice synthesizing unit 17 synthesizes the response sentence notified from the dialogue processing unit 16 by voice synthesis to convert it into voice, and sends a response to the operation command to the user via the speaker 4. As a result, the user can operate the device 1 as if having a conversation with the television operation support device 1.
[0070]
The storage unit 18 stores various data necessary for voice recognition, dialogue processing, and the like in the television operation support apparatus 1, and is a general recording medium such as a semiconductor memory or a hard disk. Here, the storage unit 18 stores the acoustic / language model 18a, the history data 18b, the link destination data 18c, and the operation template 18d. Note that these data need not be stored in a single storage unit, but may be stored in a plurality of storage units.
[0071]
The acoustic / language model 18a is data including a speech word dictionary generated based on pronunciation data and a model expressing the connection of individual words by probability. Furthermore, this acoustic / language model 18a includes a dictionary (morpheme dictionary) of morphemes (minimum language units having meaning), and is used when the operation command analysis unit 16a performs morpheme analysis.
[0072]
In this morpheme dictionary, words such as “program name”, “broadcasting station name”, and “performer name” are registered as morphemes. In general, a program name or the like is often expressed by a combination of several morphemes, but here, in order to raise the voice recognition rate in TV operation, a “program name” or the like consisting of several morphemes is registered as a single morpheme. The vocabulary for operating the TV is updated as the broadcast data receiving unit 10a analyzes the broadcast data.
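The effect of registering a multi-morpheme program name as a single morpheme can be illustrated with a greedy longest-match merge. This is a simplified stand-in, not the patent's morphological analyzer: it works on space-separated English words, and the function name `merge_registered` and the sample entries are hypothetical.

```python
def merge_registered(words, registered):
    """Merge runs of words that form a registered entry into one unit.

    `registered` is a set of tuples of words, e.g. {("project", "x")}.
    At each position the longest registered entry wins; a word that starts
    no registered entry is kept as its own unit.
    """
    max_len = max((len(entry) for entry in registered), default=1)
    units, i = [], 0
    while i < len(words):
        # Try the longest candidate first, falling back to a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if n == 1 or candidate in registered:
                units.append(" ".join(candidate))
                i += n
                break
    return units


program_names = {("project", "x")}  # assumed to come from program information
units = merge_registered(["search", "project", "x", "homepage"], program_names)
```

With the program name registered, "project x" reaches the template matching stage as one unit, so it can fill a program name slot directly instead of being treated as two unrelated words.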
[0073]
The history data 18b stores data to which marker characters are added by the marker character adding means 12 (data with marker characters). The history data 18b is referred to by the display data generation means 13, and the screen transition is performed. This eliminates the need to acquire data displayed in the past again via broadcast waves or communication lines.
[0074]
The link destination data 18c is data in which the marker character added by the marker character adding unit 12 is associated with the link information to which the marker character is added. From this link destination data 18c, the operation command analysis unit 16a of the dialogue processing means 16 reads the link information corresponding to the marker character included in the operation command from the user. The link destination data 18c stored in the storage unit 18 corresponds to the link information storage unit described in the claims.
[0075]
The operation template 18d standardizes the utterance content (operation commands) uttered by the user as fixed sentences, each a combination of specific character strings, and sets an operation for each fixed sentence. For example, as shown in FIG. 5, a fixed sentence is formed by combining a “program name”, a “command word” that is a term for searching the Internet, and a verb (see or search), and the operation <see or search for the “command word” of the “program name”> is set for it in advance.
[0076]
In the operation template 18d, a response sentence is also set for the fixed sentence of each operation command. For example, in the example of FIG. 5, the response sentence “I understand” is set for the case where the operation command is recognized, and the response sentence “There is no program named <program name>” is set for the case where the “program name” is not in the morpheme dictionary of the acoustic / language model 18a.
[0077]
The configuration of the television operation support device (data browsing support device) 1 according to the present invention has been described above, but the present invention is not limited to this. For example, the receiving means 10 may be configured with only one of the broadcast data receiving unit 10a and the communication data receiving unit 10b, so that the device supports only data broadcasting operations in digital broadcasting or only the browsing of Internet Web pages over the communication line.
[0078]
The television operation support device (data browsing support device) 1 can be realized by causing a general computer to execute a program and operating an arithmetic device or a storage device in the computer. This program (data browsing program) can be distributed via a communication line, or can be distributed by data broadcasting by being described in BML.
[0079]
[Operation of TV operation support device (data browsing support device)]
Next, the operation of the television operation support device (data browsing support device) according to the present invention will be described with reference to FIGS. FIG. 6 is a flowchart showing an operation (HTML data display operation) in which the television operation support apparatus 1 acquires Web page data (HTML) through the communication line 6 and presents the data on the display apparatus 2. FIG. 7 is a flowchart showing an operation in which the television operation support apparatus 1 moves the link destination in response to an operation command by the user's voice.
[0080]
(HTML data display operation)
First, referring to FIG. 6 (refer to FIG. 1 as appropriate), an operation in which the television operation support device 1 acquires HTML data and displays a screen with marker characters added will be described. Here, the operation for analyzing and displaying the HTML acquired via the communication line 6 will be described.
[0081]
The television operation support device 1 receives the Web page data (HTML) by the communication data receiving unit 10b of the receiving means 10, and the link information search means 11 reads the data (HTML) (step S10). Then, the link information retrieval unit 11 retrieves “href attribute” which is link information indicating the attribute of the link destination from the HTML data (step S11).
[0082]
Then, the presence or absence of link information is determined (step S12). If link information exists (Yes in step S12), the marker character adding means 12 adds a marker character before the link information and generates data with marker characters (step S13). At this stage, the marker character adding means 12 stores the data with marker characters in the storage means 18 as the history data 18b (step S14). Further, the marker character adding means 12 associates the link information with the marker character added to it and stores the pair as the link destination data 18c in the storage means 18 (step S15). Then, the display data generating means (browser) 13 generates display data, converted into a displayable output format, from the data with marker characters generated by the marker character adding means 12, and outputs the display data to the display device 2 (step S16).
[0083]
On the other hand, if link information does not exist (No in step S12), the process proceeds to step S16, where the display data generating means (browser) 13 generates display data by converting the HTML data into a displayable output format and outputs it to the display device 2.
[0084]
Then, the link information search means 11 determines whether the reading of the HTML data has ended (step S17). If it has ended (Yes in step S17), the operation ends; if it has not (No in step S17), the process returns to step S10 and continues reading the HTML data.
[0085]
Through the above operation, when the television operation support device 1 presents the Web page data (HTML) acquired via the communication line 6 on the display device 2, it can add marker characters such as “1” and “2” to the link destinations and display them on the screen.
[0086]
(Link destination movement operation)
Next, with reference to FIG. 7 (refer to FIG. 1 as appropriate), an operation in which the television operation support apparatus 1 moves the link destination according to an operation command based on a user's voice will be described.
First, the television operation support device 1 recognizes the user's voice (operation command) input from the microphone 3 by the voice recognition means 15 and converts it into text data (a character string) (step S20). Then, the voice recognition means 15 notifies the display control means 14 of the character string (voice recognition characters) that is the voice recognition result, and the character synthesis unit 14a of the display control means 14 synthesizes the voice recognition characters onto the screen and displays them on the display device 2 (step S21). The user can thus see how the uttered voice was recognized.
[0087]
In addition, the television operation support device 1 recognizes the user's operation command by analyzing the voice recognition characters with the operation command analysis unit 16a of the dialogue processing means 16. More specifically, the operation command analysis unit 16a performs morphological analysis on the voice recognition characters based on the acoustic / language model 18a stored in the storage means 18 (step S22), and determines the meaning of the operation command by pattern matching against the operation template 18d stored in the storage means 18 (step S23).
[0088]
Then, the operation command execution unit 16b determines whether the operation command determined in step S23 is a movement to a link destination triggered by the user uttering a marker character (step S24). If it is (Yes in step S24), the link destination switching unit 16b1 switches the link destination. That is, the link destination switching unit 16b1 acquires the link information (link destination address) corresponding to the marker character from the link destination data 18c stored in the storage means 18 (step S25) and notifies the receiving means 10 of the link destination, whereupon the receiving means 10 acquires data from the new link destination (step S26).
[0089]
The television operation support device 1 then displays the data (HTML data) acquired from the new link destination in step S26 on the display device 2 (step S27). The specific operation in step S27 is the same as the operation described with reference to FIG. 6, in which HTML data is acquired and a screen with marker characters added is displayed.
[0090]
On the other hand, when the operation command is not the movement of the link destination in step S24 (No in step S24), the operation command execution unit 16b executes various operations according to the operation command. For example, the operation command analysis unit 16a recognizes that the user has uttered “return”, thereby performing an operation of returning the screen to the previous screen. In this case, the operation command execution unit 16b notifies the display data generation unit 13 of a history movement instruction indicating “return”, and the display data generation unit 13 acquires data one screen before from the history data 18b. The data one screen before is converted into a displayable output format and displayed on the screen of the display device 2.
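The branch at step S24 and the history movement described above can be sketched as a small session object. This is a hypothetical illustration: the class name `Session`, the injected `fetch` callable, and the operation names are assumptions, as is the choice to keep each page's marker table together with its history entry so that the markers remain valid after “return”.

```python
class Session:
    """Tracks the current page, its marker table, and the display history."""

    def __init__(self, fetch):
        # fetch(address) -> (data with marker characters, {marker: address});
        # it stands in for reception plus marker addition.
        self.fetch = fetch
        self.history = []  # role of the history data 18b
        self.pos = -1
        self.links = {}    # role of the link destination data 18c

    def open(self, address):
        page, links = self.fetch(address)
        del self.history[self.pos + 1:]  # a new page discards "forward" entries
        self.history.append((page, links))
        self.pos += 1
        self.links = links
        return page

    def handle(self, operation, marker=None):
        """Run one analyzed operation command; return the page to display."""
        if operation == "move":  # Yes branch of step S24
            if marker not in self.links:
                return None
            return self.open(self.links[marker])  # steps S25 to S27
        if operation == "return" and self.pos > 0:
            self.pos -= 1
        elif operation == "forward" and self.pos < len(self.history) - 1:
            self.pos += 1
        else:
            return None
        # History movement reuses stored data; nothing is received again.
        page, self.links = self.history[self.pos]
        return page
```

A session is created with any callable that returns a page and its marker table, after which `handle("move", "1")` follows marker “1” and `handle("return")` redisplays the previous screen from the stored history without calling `fetch` again.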
[0091]
Through the above operation, the television operation support device 1 can recognize and execute the operation desired by the user from the voice uttered by the user. Further, the link destination can be switched by the simple operation of the user uttering the marker character displayed on the screen.
[0092]
FIGS. 6 and 7 mainly explain the operation of displaying HTML data and the operation of moving to a link destination of the HTML data, but the same operations can also be performed on the data (BML) received via the broadcast wave 5.
[0093]
[Effect of the invention]
As described above, the data browsing support device, the data browsing method, and the data browsing program according to the present invention have the following excellent effects.
[0094]
According to the invention described in claim 1, claim 6, or claim 7, when browsing data described in a markup language such as HTML or BML, marker characters are displayed on the screen to indicate that link destinations exist, and the user utters a marker character to shift the screen to the desired link destination. Even if the link destination is indicated by an image or the like, the user can shift the screen to the link destination by the simple operation of uttering the marker character attached to the image. As a result, data browsing operations can be supported without using troublesome input means such as a mouse or a keyboard.
[0095]
Also, according to the invention described in claim 1, since the marker character is a character registered in a dictionary for speech recognition, for example, a morpheme dictionary of an acoustic / language model, the probability that the marker character cannot be recognized, or is misrecognized, when the user operates by voice can be reduced.
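How marker characters might be attached to links and associated with link information can be sketched in Python with the standard `html.parser` module. This is a minimal illustration, not the implementation described in the specification: the vocabulary list is assumed to contain words already registered in the speech recognition dictionary, the bracketed marker format and the names `MarkerAdder` and `add_marker_characters` are hypothetical, and comments and doctype declarations in the input are dropped for brevity.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class MarkerAdder(HTMLParser):
    """Re-emits HTML, inserting a marker after each <a href=...> start tag."""

    def __init__(self, vocabulary: List[str]) -> None:
        super().__init__(convert_charrefs=False)
        self._vocabulary = vocabulary        # assumed: words in the dictionary
        self.output: List[str] = []
        self.link_table: Dict[str, str] = {}  # marker character -> address

    def handle_starttag(self, tag, attrs):
        self.output.append(self.get_starttag_text())
        href = dict(attrs).get("href")
        if tag == "a" and href is not None:
            if len(self.link_table) >= len(self._vocabulary):
                raise ValueError("not enough marker characters for this screen")
            # Each link on the screen gets a different word, so markers
            # stay identifiable within the same screen.
            marker = self._vocabulary[len(self.link_table)]
            self.link_table[marker] = href
            self.output.append(f"[{marker}]")

    def handle_startendtag(self, tag, attrs):
        self.output.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.output.append(f"</{tag}>")

    def handle_data(self, data):
        self.output.append(data)

    def handle_entityref(self, name):
        self.output.append(f"&{name};")

    def handle_charref(self, name):
        self.output.append(f"&#{name};")


def add_marker_characters(
    html: str, vocabulary: List[str]
) -> Tuple[str, Dict[str, str]]:
    """Return (data with marker characters, marker -> link address table)."""
    adder = MarkerAdder(vocabulary)
    adder.feed(html)
    adder.close()
    return "".join(adder.output), adder.link_table


if __name__ == "__main__":
    page = '<p><a href="/news">News</a> <a href="/w"><img src="b.png"></a></p>'
    marked, table = add_marker_characters(page, ["one", "two", "three"])
    print(marked)
    print(table)
```

Because the marker is inserted after the start tag of the link rather than derived from the link text, a link shown only as an image receives a marker in the same way as a text link.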
[0096]
According to the invention described in claim 2, the characters uttered by the user and recognized (voice recognition characters) are displayed on the screen of the display device, so that the user can confirm the characters recognized by the data browsing support device. As a result, even when misrecognition occurs, the user can see where the error is, which gives the user a sense of security when operating the device.
[0097]
According to the invention described in claim 3, various operation commands can be patterned by standardizing the operation commands uttered by the user as combinations of a plurality of specific character strings. By patterning the operation commands in this way, many operation commands can be recognized with a small number of patterns, and a conversational user interface can be realized.
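The idea of patterning operation commands as combinations of specific character strings can be sketched as follows. This is a minimal illustration under stated assumptions: the template contents, the command names, and the function `analyze_operation_command` are hypothetical, and the recognition result is assumed to arrive as a list of words.

```python
from typing import List, Optional, Tuple

# Hypothetical operation templates: each entry pairs the specific character
# strings that must all appear in the recognition result with a command name.
TEMPLATES: List[Tuple[Tuple[str, ...], str]] = [
    (("broadcast", "switch"), "switch_to_broadcast"),
    (("internet", "switch"), "switch_to_communication"),
    (("return",), "history_back"),
]


def analyze_operation_command(recognized_words: List[str]) -> Optional[str]:
    """Return the first command whose character strings all occur in the input."""
    words = set(recognized_words)
    for keywords, command in TEMPLATES:
        if words.issuperset(keywords):
            return command
    return None  # no template matched


if __name__ == "__main__":
    print(analyze_operation_command(["please", "switch", "to", "broadcast"]))
    print(analyze_operation_command(["return"]))
    print(analyze_operation_command(["hello"]))
```

Because only the specific character strings are checked, different phrasings that contain the same strings map to the same command. Templates are tried in order, so more specific combinations should be listed first.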
[0098]
According to the invention described in claim 4, since the response to an operation command from the user is given by voice, acceptance of the operation, operation mistakes, and the like can be reported by voice. This improves operability.
[0099]
According to the invention described in claim 5, reception of broadcast data and reception of communication data can be switched freely by voice. Thus, for example, even when the broadcast data alone does not provide enough information, information from the communication data can be acquired by a simple voice operation.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an overall configuration of a data browsing support device (television operation support device) according to the present invention.
FIG. 2 is a diagram showing an example of screen data described in HTML.
FIG. 3 is a diagram showing a screen displayed by converting the data of FIG. 2 into a displayable output format.
FIG. 4 is an explanatory diagram for explaining an example in which a user's intention is determined based on a voice corresponding to a marker character.
FIG. 5 is an explanatory diagram for explaining an example in which a user's intention is determined by an arbitrary voice.
FIG. 6 is a flowchart showing an operation in which the data browsing support device (television operation support device) according to the present invention displays HTML data with marker characters added thereto.
FIG. 7 is a flowchart showing an operation in which the data browsing support device (television operation support device) according to the present invention moves the link destination by voice.
[Explanation of symbols]
1 TV operation support device (data browsing support device)
10 Receiving means
10a Broadcast data receiving unit (broadcast data receiving means)
10b Communication data receiving unit (communication data receiving means)
10c Reception switching unit (reception switching means)
11 Link information search means
12 Marker character addition means
13 Display data generation means
14 Display control means
14a Character synthesizing unit (character synthesizing means)
15 Voice recognition means
16 Dialogue processing means
16a Operation command analysis unit (operation command analysis means)
16b Operation command execution unit
16b1 Link destination switching unit (link destination switching means)
17 Speech synthesis means
18 Storage means
18a Acoustic / Language Model
18b History data
18c Link destination data (link information storage means)
18d Operation template
2 Display device
3 Microphone
4 Speaker
5 Broadcast wave
6 Communication line

Claims

A data browsing support device that assists browsing operations with the voice of a user when browsing the contents of data described in a markup language, the device comprising:
Link information search means for searching the data for link information indicating a link destination;
Marker character adding means for adding, to the link information found by the link information search means, a marker character that uses a vocabulary identifiable within the data of the same screen and registered in advance in a dictionary for speech recognition, thereby generating data with marker characters from the data;
Link information storage means for storing the marker character and the link information in association with each other;
Display data generation means for analyzing the data with marker characters and generating display data that can be browsed;
Voice recognition means for recognizing the user's voice; and
Link destination switching means for acquiring, when the recognition result of the voice recognition means corresponds to the marker character, data from the link destination indicated by the link information stored in the link information storage means in association with that marker character.

The data browsing support device according to claim 1, further comprising character synthesizing means for synthesizing voice recognition characters, which are the recognition result of the voice recognition means, onto a display screen and displaying them.

The data browsing support device according to claim 1 or claim 2, further comprising operation command analysis means for which operation commands from the user are defined in advance as combinations of specific character strings, and which analyzes the operation command based on the specific character strings included in the recognition result of the voice recognition means.

The data browsing support device according to any one of claims 1 to 3, further comprising speech synthesis means for outputting a response sentence to the operation command from the user as a voice.

The data browsing support device according to claim 3, further comprising:
Broadcast data receiving means for receiving broadcast data via broadcast waves;
Communication data receiving means for receiving communication data via a communication line; and
Reception switching means for switching between reception by the broadcast data receiving means and reception by the communication data receiving means, based on the operation command analyzed by the operation command analysis means.

A data browsing method for performing, by the voice of a user, an operation of moving to a link destination when browsing the contents of data described in a markup language, the method comprising:
A step of adding, to link information indicating the link destination embedded in the data, a marker character that uses a vocabulary identifiable within the same screen and registered in advance in a dictionary for speech recognition;
A step in which the user utters the displayed marker character as speech; and
A step of recognizing the marker character uttered by the user, and acquiring and presenting data from the link destination corresponding to the marker character.

A data browsing program that, in order to perform an operation of moving to a link destination embedded in data described in a markup language when browsing the contents of the data, causes a computer to function as:
Link information search means for searching the data for link information indicating the link destination;
Marker character adding means for adding, to the link information found by the link information search means, a marker character that uses a vocabulary identifiable within the same screen and registered in advance in a dictionary for speech recognition, thereby generating data with marker characters from the data;
Link information storage control means for storing the marker character and the link information in association with each other in link information storage means;
Display data generation means for analyzing the data with marker characters and generating display data that can be browsed;
Voice recognition means for recognizing the voice of the user; and
Link destination switching means for acquiring, when the recognition result of the voice recognition means corresponds to the marker character, data from the link destination indicated by the link information stored in the link information storage means in association with that marker character.