JP2005018241A

JP2005018241A - Information processor, link designation file acquisition method, link designation file acquisition program and program recording medium

Info

Publication number: JP2005018241A
Application number: JP2003179675A
Authority: JP
Inventors: Akira Tsuruta; 彰鶴田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2003-06-24
Filing date: 2003-06-24
Publication date: 2005-01-20

Abstract

PROBLEM TO BE SOLVED: To provide an information processor whose display differs little from the original HTML document, that accurately recognizes spoken character strings even when they are difficult to read or have multiple readings, and that lets the user easily follow links by voice operation. SOLUTION: The information processor highlights, within the extracted text, the character strings that can be associated with a link destination. For character strings that are difficult to read, or that otherwise require care in reading, it displays the reading for all or part of the string. COPYRIGHT: (C)2005,JPO&NCIPI

Description

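[0001]
[Technical field of the invention]
The present invention relates to an information processing apparatus, a link destination file acquisition method, a link destination file acquisition program, and a program recording medium that recognize input speech and acquire the next link destination file from a file such as hypertext.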
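[0002]
[Prior art]
In recent years, WWW (World Wide Web) systems have come to provide information such as images and audio in a server/client format over computer networks, in particular the Internet. In a WWW system, information is sent from the server to the client in a hypertext format called HTML (HyperText Markup Language). On the client, an information browsing device called a browser is used to view text written in HTML.
[0003]
In addition to the formatting of objects to be displayed, such as characters and photographs, and their layout on the screen, hypertext can describe a link description, which indicates that a link to other information exists, and a URL (Uniform Resource Locator), which is the link destination file name. A link to other information is followed by reading that information in through a tag, called a link, embedded in the hypertext.
[0004]
In a typical browser, link descriptions are displayed so that they can be distinguished from other text, usually by changing the font color or underlining them. The link description and the URL are managed in association with each other. When the user designates the description of a particular link with a pointing device such as a mouse or with a keyboard, the browser uses the URL corresponding to that description to request the server uniquely determined by the URL to send the information. By designating links in this way, the user can access related information one item after another.
[0005]
Browsers that use speech recognition have also been proposed. For example, a technique has been disclosed that extracts from a hypertext document the character strings serving as anchor points and the link destination information embedded in those anchor points, assigns each string a reading for speech recognition by referring to a reading database, and loads the linked document when the user reads aloud the description of the desired link (see, for example, Patent Document 1). The same document also discloses a technique that assigns a unique number to each piece of link destination information and loads the linked document when the user speaks that number.
[0006]
A technique has also been disclosed that performs linguistic analysis on the description associated with a link destination file, selects a unique word that characterizes the description, registers the selected word in a word dictionary, and loads the linked document when the user reads the desired word aloud (see, for example, Patent Document 2). To make the recognizable words easy to identify, this document also proposes highlighting them, for example by displaying the reading of each recognizable word in the link description in association with the word, or by displaying the word in a different color.
[0007]
[Patent Document 1]
JP-A-10-124293 (Claim 1, paragraphs 0014 to 0017, paragraph 22)
[Patent Document 2]
JP-A-11-25098 (Claim 1, paragraphs 0046 to 0054)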
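[0008]
[Problems to be solved by the invention]
However, in the invention described in Patent Document 1, readings are generated successively for the character strings serving as anchor points by accessing the reading database. Consequently, when a link description in the HTML document contains a word the user does not know how to read, or a word with more than one reading such as 市場 (shijō or ichiba), the reading is not displayed, so the user cannot pronounce the string in the link description with the expected reading and cannot follow the link by voice operation.
[0009]
In the invention described in Patent Document 2, a unique word that can specify the description is selected from the description of the link file, and the reading of the word is displayed in association with it. Because readings are attached to the whole of the specified character string, the display differs considerably from the original HTML document. A method that does not display readings is also disclosed, but then, when a word the user cannot read is present, the user cannot pronounce the string as expected and cannot follow the link by voice operation. Furthermore, because the specified character string is a noun or noun phrase, the string cannot be specified when the same noun or noun phrase occurs more than once. In other words, the conditions under which a character string can be specified are narrow.
[0010]
The present invention has been made in view of the above problems. Its object is to provide an information processing apparatus whose display differs little from the original HTML document, that can perform speech recognition accurately even for character strings that are difficult to read or have multiple readings, and that allows links to be followed easily by voice operation.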
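[0011]
[Means for solving the problems]
To achieve the above object, the information processing apparatus of the present invention acquires the next link destination file from a file acquired from a server on a network, and comprises: character string extraction means that analyzes the acquired file and extracts the file name of the next link destination file and the text associated with that link destination file name; character string specifying means that specifies, from the extracted text, a character string with which the link destination can be associated; notation generation means that generates a notation for highlighting the specified character string and, for any part of the string that is difficult to read or requires care in reading, a notation for displaying the reading of all or part of the string; a management table in which the specified character string is registered in association with the link destination file name; dictionary registration means that registers the character strings held in the management table in a speech recognition dictionary so that they can be recognized; speech recognition means that analyzes speech input by the user using the speech recognition dictionary and performs speech recognition; and link destination file name acquisition means that acquires the corresponding link destination file name from the management table based on the recognition result of the speech recognition means.
[0012]
With this configuration, the character strings in the extracted text that can be associated with a link destination are highlighted, so the user can easily recognize which strings can be input by voice. As a result, the user can select the linked information simply by reading the desired string aloud.
[0013]
In addition, for character strings that are wholly or partly difficult to read, or that require care in reading, the reading is displayed for all or part of the string, so the user can easily learn how to read it. Because readings are added only to strings that are difficult to read or require care, the display does not differ greatly from the original HTML document.
[0014]
In the information processing apparatus of the present invention, the notation generation means uses difficulty information on the readings of words and, for each word in the specified character string that is harder to read than a preset difficulty level, generates a notation that displays the reading of that word.
[0015]
With this configuration, readings are attached to words that are difficult to read. As a result, even if the character string contains a word the user does not know how to read, the user can read it correctly and will not get the utterance wrong.
[0016]
In the information processing apparatus of the present invention, the notation generation means may instead use difficulty information on the readings of kanji and, for each word in the specified character string that contains a kanji harder to read than a preset difficulty level, generate a notation that displays the reading of that kanji.
[0017]
With this configuration, readings are attached to kanji that are difficult to read. As a result, even if the string contains a kanji the user does not know how to read, the user can read it correctly and will not get the utterance wrong. If difficulty information were held for every word, it would amount to an enormous volume of data. If, as in this configuration, difficulty information is held for a few thousand kanji, it can be kept compact.
[0018]
The notation generation means may allow the difficulty level for word readings or for kanji readings to be set by user instruction. This lets users set the criterion for attaching readings according to their own language ability.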
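[0019]
The notation generation means may use difficulty information on the readings of words and, for each word in the specified character string that has multiple readings, generate a notation that displays the appropriate reading of the word in that string.
[0020]
The notation generation means may generate, for each proper noun in the specified character string, a notation that displays the reading of that proper noun.
[0021]
The notation generation means may use reading information for character strings other than kanji and kana and, for each such string contained in the specified character string, generate a notation that displays its reading. With this configuration, readings can be displayed even for strings other than kanji and kana, such as English words and symbols, so the user will not get the utterance wrong.
[0022]
As the notation for a reading, a notation may be generated that displays the reading of each substring of the specified character string that needs a reading in association with that substring. Alternatively, the reading may be displayed in a typeface different from the rest of the display, or in a color different from the rest of the display.
[0023]
Alternatively, a notation may be generated that displays the reading in place of the substring that needs it. With this configuration, the reading is shown instead of the original string, which keeps the display easy to read.
[0024]
The link destination file acquisition method of the present invention, which acquires the next link destination file from a file acquired from a server on a network, may comprise the steps of: analyzing the acquired file and extracting the file name of the next link destination file and the text associated with that link destination file name; specifying, from the extracted text, a character string with which the link destination can be associated; generating a notation for highlighting the specified character string and a notation for displaying the reading of any string, contained in all or part of the specified string, that is difficult to read or requires care in reading; registering the specified character string in a management table in association with the link destination file name; registering the character strings held in the management table in a speech recognition dictionary so that they can be recognized; analyzing speech input by the user using the speech recognition dictionary and performing speech recognition; and acquiring the corresponding link destination file name from the management table based on the recognition result.
[0025]
The present invention also provides a link destination file acquisition program for acquiring the next link destination file from a file acquired from a server on a network, the program causing a computer to function as: character string extraction means that analyzes the acquired file and extracts the file name of the next link destination file and the text associated with that link destination file name; character string specifying means that specifies, from the extracted text, a character string with which the link destination can be associated; notation generation means that generates a notation for highlighting the specified character string and a notation for displaying the reading of any string, contained in all or part of the specified string, that is difficult to read or requires care in reading; a management table in which the specified character string is registered in association with the link destination file name; dictionary registration means that registers the character strings held in the management table in a speech recognition dictionary so that they can be recognized; speech recognition means that analyzes speech input by the user using the speech recognition dictionary and performs speech recognition; and link destination file name acquisition means that acquires the corresponding link destination file name from the management table based on the recognition result of the speech recognition means.
[0026]
The present invention also provides a computer-readable recording medium on which the above link destination file acquisition program is recorded.
[0027]
The information processing apparatus of the present invention may also be an apparatus that acquires the next link destination file, contained in a file acquired from a server on a network, when a character string that can be associated with that link destination file is designated by voice, the apparatus comprising: means for extracting, from the character strings that can be associated with the next link destination file, the strings whose reading needs to be displayed; and either means for writing the reading alongside each extracted string or means for replacing each extracted string with its reading.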
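[0028]
[Embodiments of the invention]
Embodiments of the present invention are described below with reference to the drawings. The present invention is not limited to these embodiments.
[0029]
FIG. 1 shows the configuration of the information processing apparatus of the present invention. As shown in FIG. 1, the apparatus comprises a control unit 1, a program memory 2, a data memory 3, an input unit 4, an external storage unit 5, a communication unit 6, a display unit 7, and a voice input unit 8, all connected to one another by a bus.
[0030]
The control unit 1 operates according to the programs stored in the program memory (RAM) 2, controls the operation of the entire apparatus, and executes programs such as a browser. The data memory (ROM) 3 stores various control data used by the control unit 1. The data memory (RAM) 3 is used as a work area when the control unit 1 executes its control processes and temporarily holds various data. The program memory 2 stores the programs executed by the control unit 1. Programs are loaded into the program memory 2 from the external storage unit 5, or from an external network via the communication unit 6. The control unit 1 analyzes files received from the external network via the communication unit 6 and extracts the next link destination files and link destination file names. The system program, browser program, speech recognition program, language analysis program, and other programs are stored in the program memory 2.
[0031]
The word dictionary for language analysis, the information table, the reading dictionary, the cache of received HTML documents, the acoustic models for speech recognition, and similar data are either loaded into the data memory (RAM) 3 from the external storage unit 5 or from an external network via the communication unit 6, or written directly to the data memory (ROM) 3, such as a hard disk. Working variables such as the management table and the speech recognition dictionary are stored in the data memory (RAM) 3.
[0032]
The input unit 4 consists of a keyboard, a mouse, and the like. The communication unit 6 is connected to an external network such as the Internet, receives data arriving over the network, and outputs it to the control unit 1. The external storage unit 5 reads program data and the like from recording media such as CD-ROMs and DVD-ROMs and outputs it to the control unit 1. The program data read in this way is installed into the data memory 3 or elsewhere by the control unit 1.
[0033]
The display unit 7 presents the display content to the user and consists of a display device such as a CRT or a liquid crystal display.
[0034]
The voice input unit 8 is connected to a microphone or an external amplifier and converts the speech signal input by the user into a digital signal by A/D conversion.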
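[0035]
FIG. 2 is a block diagram of the information processing apparatus of this embodiment. The apparatus comprises a character string extraction unit 11, a character string specifying unit 12, a notation generation unit 13, a management table 14, a dictionary registration unit 15, a speech recognition unit 16, a link destination file name acquisition unit 17, a word dictionary 21, an information table 22, and a speech recognition dictionary 23.
[0036]
In the information processing apparatus of the present invention, an HTML document acquired via the input unit 4 is analyzed by the character string extraction unit 11. Specifically, the file name of the next link destination file and the description associated with that file name are extracted from the HTML document.
[0037]
The character string specifying unit 12 analyzes the extracted description and specifies a character string with which the link destination can be associated. Specifically, it does so by using the word dictionary 21 to determine whether words listed in the dictionary are contained in the description. Referring to the word dictionary 21 yields the reading and part of speech of each word making up the string, as well as grammatical information for determining word boundaries.
[0038]
The notation generation unit 13 refers to the information table 22 to determine whether the specified character string contains any string that needs a reading. For each word judged to need a reading, it adds to the HTML document a notation that displays the reading. The display unit 7 then displays the HTML document with the readings added. The information table 22 holds reading difficulty information either for all of the words in the word dictionary or for kanji only. For a word with multiple readings, the notation generation unit 13 refers to the reading dictionary and adds a notation that displays the appropriate reading. When attaching a reading to a word with multiple readings, the reading dictionary may be used in place of the information table. When attaching readings to strings other than kanji and kana in the specified character string, a reading table may be used in place of the information table. Alternatively, readings may be attached without any such information table or reading table, using difficulty information added to the word dictionary or the reading information that the word dictionary already holds.
[0039]
The management table 14 stores the character string specified by the character string specifying unit 12 and the reading data for that string supplied by the notation generation unit 13, in association with the link destination file.
[0040]
The dictionary registration unit 15 registers the character string specified by the character string specifying unit, together with its reading, in the speech recognition dictionary 23.
[0041]
The speech recognition unit 16 analyzes the speech signal input from the voice input unit 8 and matches it against the speech recognition dictionary 23. Any known speech recognition technique can be used. For example, recognition may be performed with hidden Markov models (hereinafter, HMMs) (see, for example, Seiichi Nakagawa, 「確率モデルによる音声認識」 (Speech Recognition by Probability Models), IEICE). Specifically, the occurrence probability of the analyzed speech signal is computed for every HMM corresponding to a recognition target phrase registered in the speech recognition dictionary 23, and the phrase corresponding to the HMM with the highest probability is taken as the recognition result.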
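The maximum-probability selection in [0041] can be sketched as follows. This is only a minimal illustration, assuming discrete observation symbols and two tiny hand-written models; the model parameters, the symbol alphabet, and the names `forward_probability`, `recognize`, and `MODELS` are hypothetical and do not come from the patent, which leaves the recognizer to known techniques.

```python
def forward_probability(observations, start, trans, emit):
    """Return P(observations | model) for a discrete HMM (forward algorithm)."""
    n_states = len(start)
    # Initialisation: start in state s and emit the first symbol.
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    # Induction: move to state s from any previous state p, then emit obs.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(observations, models):
    """Return the phrase whose HMM gives the observations the highest probability."""
    return max(models, key=lambda phrase: forward_probability(observations, *models[phrase]))


# Hypothetical two-state left-to-right models over the symbol alphabet {0, 1}.
# Each entry is (start probabilities, transition matrix, emission matrix).
MODELS = {
    "密輸未遂事件": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    "羽田空港": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}

best = recognize([0, 0, 1, 1], MODELS)
```

A practical recognizer would work with continuous feature vectors and log probabilities to avoid underflow; the toy version above only shows the "score every registered phrase, keep the best" structure.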
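[0042]
The link destination file name acquisition unit 17 takes the result obtained by the speech recognition unit 16, refers to the management table, and acquires the file name of the link destination file corresponding to the recognized character string. The file corresponding to this file name is then loaded.
[0043]
(Operation)
Next, the operation of the information processing apparatus of the present invention is described in detail with reference to FIG. 3, a flowchart showing that operation.
[0044]
In step S101, the designated HTML document, acquired from a server by the communication unit 6 over a communication network such as the Internet, is stored in the data memory (ROM) 3. In step S102, the character string extraction unit 11 analyzes this text on the basis of its tag information.
[0045]
HTML documents are briefly explained here. FIG. 4 shows an example of an HTML document. The <html> and </html> tags declare that the document is written in HTML and are placed at the very beginning and end of the document. Between the <head> and </head> tags is information about the document, such as its title, characteristics, and author. The document title is written between the <title> and </title> tags, inside the part enclosed by <head> and </head>. The text actually displayed in the browser is written in the part enclosed by the <body> and </body> tags.
[0046]
A link is set between an <A href="★"> tag and an </A> tag, where ★ is the destination file name (URL). For example, in <A href="http://www.xyz.co.jp/news/dn.html">社会</A> in the figure, "http://www.xyz.co.jp/news/dn.html" is the link destination file name and 「社会」 ("Society") is the description. Details of these tags are given in, for example, ANK, 「ＨＴＭＬタグ辞典」 (HTML Tag Dictionary), Shoeisha Co., Ltd. FIG. 5 shows the HTML document of FIG. 4 displayed with a browser program.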
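The extraction of link destination file names and their descriptions (step S102) can be sketched with Python's standard `html.parser` module. This is only an illustration under assumptions: the class name `AnchorExtractor`, the helper `extract_links`, and the one-line sample document are hypothetical, and the patent does not prescribe any particular parser.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collects (href, description) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, description) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":       # HTMLParser lower-cases tag and attribute names
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_text):
    """Return the list of (link destination file name, description) pairs."""
    parser = AnchorExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = '<html><body><A href="http://www.xyz.co.jp/news/dn.html">社会</A></body></html>'
links = extract_links(sample)
```

For the sample above, `links` holds a single pair whose first element is the URL and whose second element is the description 社会.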
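[0047]
In step S103, the description between the <A> and </A> tags is subjected to morphological analysis with reference to the word dictionary. That is, words whose headwords match the text of the description are read from the word dictionary 21. In this step the grammar, reading information, and part-of-speech information of each word are analyzed. For example, 「＜密輸未遂事件＞４億円分の高級腕時計没収できず」 ("<Attempted smuggling case> Luxury watches worth 400 million yen cannot be confiscated") is analyzed as shown in FIG. 6, which shows the result of its morphological analysis.
[0048]
The character string specifying unit 12 specifies, from the morphological analysis result, a character string with which the link destination can be associated. Possible methods include the following. (1) Use the description as it is, as in 「密輸未遂事件４億円分の高級腕時計没収できず」. (2) Build noun phrases from the morphological analysis result and use them as the character strings. A noun phrase is obtained by merging consecutive nouns. The example above yields the noun phrases 「密輸未遂事件」, 「４億円分」, and 「高級腕時計没収」. (3) Of the noun phrases obtained in (2), use the one that appears first in the description. In the example, 「密輸未遂事件」 is the specified string. (4) Of the noun phrases obtained in (2), use the longest. In the example, 「高級腕時計没収」 is the specified string. (5) Of the noun phrases obtained in (2), concatenate those from the start of the description up to a specified number of characters. In the example, with a limit of 15 characters, 「密輸未遂事件４億円分」 is the specified string.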
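Methods (2) to (5) can be sketched as follows. FIG. 6 is not reproduced in this text, so the token list below is an assumed segmentation chosen to be consistent with the noun phrases quoted in [0048]; the part-of-speech labels and all function names are hypothetical.

```python
# Assumed morphological analysis result: (surface form, part of speech).
MORPHEMES = [
    ("＜", "symbol"), ("密輸", "noun"), ("未遂", "noun"), ("事件", "noun"),
    ("＞", "symbol"), ("４億円分", "noun"), ("の", "particle"),
    ("高級", "noun"), ("腕時計", "noun"), ("没収", "noun"),
    ("でき", "verb"), ("ず", "auxiliary"),
]


def noun_phrases(morphemes):
    """Method (2): merge each run of consecutive nouns into one noun phrase."""
    phrases, run = [], []
    for surface, pos in morphemes:
        if pos == "noun":
            run.append(surface)
        elif run:
            phrases.append("".join(run))
            run = []
    if run:
        phrases.append("".join(run))
    return phrases


def first_phrase(phrases):
    """Method (3): the noun phrase that appears first in the description."""
    return phrases[0]


def longest_phrase(phrases):
    """Method (4): the longest noun phrase (the first one wins on a tie)."""
    return max(phrases, key=len)


def leading_phrases(phrases, limit):
    """Method (5): concatenate leading noun phrases while the total stays within limit."""
    picked, total = [], 0
    for phrase in phrases:
        if total + len(phrase) > limit:
            break
        picked.append(phrase)
        total += len(phrase)
    return "".join(picked)


phrases = noun_phrases(MORPHEMES)
```

With a limit of 15 characters, the first two phrases total 10 characters and the third would bring the total to 17, so `leading_phrases(phrases, 15)` stops after 「４億円分」, matching the example in the text.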
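[0049]
The method of specifying the character string is not limited to any of these, as long as it can specify the description. In particular, when a string identical to the specified one already exists in the speech recognition dictionary, the specifying method must be changed so as to specify a string that is not yet registered in the dictionary.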
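One way to read the requirement in [0049] is to try the candidate strings produced by the different methods in a fixed order and keep the first one that is not yet registered. The sketch below assumes that ordering policy, which the patent does not specify; `pick_unregistered` is a hypothetical helper name.

```python
def pick_unregistered(candidates, registered):
    """Return the first candidate string not already in the recognition dictionary.

    candidates: strings produced by different specifying methods, in preference order.
    registered: set of strings already registered for the current page.
    Returns None when every candidate collides with a registered string.
    """
    for candidate in candidates:
        if candidate not in registered:
            return candidate
    return None


registered = {"密輸未遂事件"}
choice = pick_unregistered(
    ["密輸未遂事件", "高級腕時計没収", "密輸未遂事件４億円分"], registered
)
```

Here the first candidate is already taken by another link, so the second candidate is chosen.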
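[0050]
In step S104, the notation generation unit 13 checks whether the character string specified in step S103 contains a word that is difficult to read or requires care in reading. If it does, a notation that displays the reading of that word is generated. How the reading notation is generated is described later.
[0051]
In step S105, the character string specified in step S103 and its reading are registered in the management table 14 in association with the link destination file name. FIG. 7 shows an example of specified character strings, their readings, and link destination file names registered in the management table.
[0052]
In step S106, a phoneme model of the recognition target phrase is created from the phoneme models of the individual characters making up the specified string registered in the management table 14 and from its reading. The phoneme models created are added one by one to the data in the speech recognition dictionary 23. When the displayed HTML document changes, all of the registered speech recognition data relating to that HTML document is deleted.
[0053]
In step S107, it is determined whether analysis of the HTML document has finished; if not, processing returns to step S103. Steps S103 to S106 are repeated for every piece of text enclosed between <A> and </A> tags in the HTML document.
[0054]
In step S107, once all the link destination information in the HTML document has been analyzed, the control unit 1 sends a command enabling the speech recognition unit 16. The system then waits, in step S108, until the user speaks.
[0055]
In step S109, the speech signal produced by the user's utterance is sent from the voice input unit 8 to the speech recognition unit 16. The speech recognition unit 16 extracts feature parameters from the input signal. The distance between the extracted feature parameter sequence and the phoneme models of the recognition target phrases registered in step S106 is calculated, and the phrase with the highest score is output as the recognition result.
[0056]
In step S110, the link destination file name acquisition unit 17 searches the management table 14 using the output recognition target phrase and acquires the link destination file name.
[0057]
In step S111, a command requesting the information is sent from the communication unit 6 using the link destination file name obtained in step S110.
[0058]
For example, when the user says "密輸未遂事件", the speech recognition unit returns the recognition result 「密輸未遂事件」. Searching the management table with this result yields the link destination file name "http://www.xyz.co.jp/news/dn/20030211.001.html". The HTML document at the next link destination can thus be obtained.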
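The registration in step S105 and the lookup in step S110 can be sketched as a small in-memory table. FIG. 7 is not reproduced here, so the field layout, the class names `LinkEntry` and `ManagementTable`, and the kana reading shown are assumptions for illustration; only the phrase and the URL come from the example in [0058].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkEntry:
    text: str      # specified character string shown on screen
    reading: str   # reading registered for speech recognition
    url: str       # link destination file name


class ManagementTable:
    """Maps each specified character string to its link destination."""

    def __init__(self):
        self._by_text = {}

    def register(self, text, reading, url):
        self._by_text[text] = LinkEntry(text, reading, url)

    def url_for(self, recognized_text):
        """Return the link destination for a recognition result, or None."""
        entry = self._by_text.get(recognized_text)
        return entry.url if entry else None

    def clear(self):
        """Drop all entries, for example when the displayed page changes."""
        self._by_text.clear()


table = ManagementTable()
table.register(
    "密輸未遂事件",
    "みつゆみすいじけん",
    "http://www.xyz.co.jp/news/dn/20030211.001.html",
)
url = table.url_for("密輸未遂事件")
```

Keying the table by the displayed string keeps the lookup in step S110 a single dictionary access, and `clear` mirrors the deletion of per-page data described in [0052].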
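[0059]
(Reading notation generation process)
FIG. 8 is a flowchart explaining the process by which the notation generation unit generates reading notations.
[0060]
In step S201, a word making up the character string specified in step S103 is acquired. For example, for 「密輸未遂事件」, 「密輸」 is acquired first from among the three words 「密輸」, 「未遂」, and 「事件」.
[0061]
In step S202, it is determined whether the word acquired in step S201 satisfies the reading assignment condition. If it does, processing proceeds to step S203; if not, to step S204. FIG. 9 shows an example of the information table used in step S202. Specifically, it is determined whether the word's difficulty is at or above a given value. For example, with the threshold set to 3, 「密輸」 has a difficulty of 3 and is therefore selected as a word to which a reading should be attached. The threshold can be set freely by users according to their own language ability. The difficulty scale may be based on, for example, the content tested at each level of the Japan Kanji Aptitude Test. The difficulty information of the information table in FIG. 9 may also be included in the word dictionary.
[0062]
In step S203, a flag is set indicating that the word needs a reading.
[0063]
In step S204, it is determined whether the reading assignment condition has been evaluated for every word. If not, processing returns to step S201; if so, it proceeds to step S205.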
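The loop over steps S201 to S204 amounts to filtering the words of the specified string by a difficulty threshold. In the sketch below, only the difficulty of 密輸 (3) comes from the text; the other table values are assumptions chosen to agree with the readings shown in FIG. 10, and the names are hypothetical. The comparison uses "at or above the threshold", following [0061].

```python
# Hypothetical excerpt of the word information table (FIG. 9 is not reproduced).
WORD_DIFFICULTY = {"密輸": 3, "未遂": 3, "事件": 1}


def words_needing_reading(words, difficulty, threshold):
    """Return the words whose difficulty is at or above the threshold (S201 to S204).

    Words missing from the table are treated as difficulty 0, i.e. never flagged.
    """
    return [word for word in words if difficulty.get(word, 0) >= threshold]


flagged = words_needing_reading(["密輸", "未遂", "事件"], WORD_DIFFICULTY, threshold=3)
```

Raising the threshold above 3 would flag nothing in this example, which is how a user with stronger reading ability could suppress the annotations.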
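[0064]
In step S205, two notations are generated: one for highlighting the character string specified in step S103, and one for displaying, in association with each word flagged in step S203 as needing a reading, the reading of that word. FIG. 10 shows an example in which the specified strings are displayed in a different color from other text and the readings are displayed in italics in association with their words. In this example the specified strings are shown in red and other text in black or blue, which highlights the specified strings. In FIG. 10, the voice-selectable strings 「社会」, 「政治」, 「経済」, 「国際」, 「スポーツ」, 「密輸（みつゆ）未遂（みすい）事件」, 「羽田空港」, 「鹿沼（かぬま）市職員略取（りゃくしゅ）」, and 「自民長崎県連」 are shown in red; the ordinary text 「社会ニュース（２月１１日）」 is shown in black; and the parts of link descriptions that were not specified as voice-selectable are shown in blue.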
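The notation generated in step S205 can be sketched as HTML markup. The patent does not give the actual markup, so the use of a `span` with an inline color style and an `i` element for the reading is an assumption, as is the function name `render_link_text`.

```python
from html import escape


def render_link_text(words, readings, flagged):
    """Return HTML that highlights the string in red and adds italic readings.

    words:    the words of the specified character string, in order
    readings: mapping from word to its kana reading
    flagged:  set of words that need a reading displayed
    """
    parts = []
    for word in words:
        parts.append(escape(word))
        if word in flagged:
            # Reading shown in italics, in full-width parentheses, right after the word.
            parts.append("<i>（" + escape(readings[word]) + "）</i>")
    body = "".join(parts)
    return '<span style="color:red">' + body + "</span>"


markup = render_link_text(
    ["密輸", "未遂", "事件"],
    {"密輸": "みつゆ", "未遂": "みすい", "事件": "じけん"},
    {"密輸", "未遂"},
)
```

Only the flagged words gain a reading, so 事件 is left as it is and the rendered line stays close to the original layout.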
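[0065]
As a result, the user can easily tell which strings can be input by voice, and can designate a link destination without misreading even words that are difficult or require care. FIG. 11 shows an example of specified strings displayed according to the prior art. Unlike the case shown in FIG. 11, the present invention does not attach readings to every word. The resulting display therefore does not greatly disturb the text layout of the document provided by the server.
[0066]
In the embodiment above, the notation generation unit used a word information table when generating reading notations, but the process is not limited to word difficulty information. FIG. 12 shows an example of a kanji information table. In place of the word information table, the kanji information table of FIG. 12 may be used to attach readings to words containing a kanji more difficult than a preset level. The kanji difficulty information may also be included in the word dictionary. In that configuration, the word dictionary itself holds reading information for the kanji, and this information can be used.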
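The kanji-based variant checks each character of a word against a per-kanji table instead of looking up the whole word. FIG. 12 is not reproduced here, so every value in the table below is hypothetical, as is the name `has_hard_kanji`; the "at or above" comparison is carried over from [0061] as an assumption.

```python
# Hypothetical excerpt of a kanji difficulty table.
KANJI_DIFFICULTY = {"略": 3, "取": 1, "社": 1, "会": 1}


def has_hard_kanji(word, kanji_difficulty, threshold):
    """Return True if any character of the word is at or above the threshold.

    Characters missing from the table (kana, symbols, unlisted kanji) count as 0.
    """
    return any(kanji_difficulty.get(ch, 0) >= threshold for ch in word)


needs_reading = has_hard_kanji("略取", KANJI_DIFFICULTY, 3)
```

Because the table is indexed by single characters, a few thousand entries cover every word built from those kanji, which is the compactness argument made in [0017].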
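[0067]
In the reading notation generation process of the embodiment above, when a word has multiple readings, a word reading dictionary may be used as the reading assignment condition so that the multiple readings are displayed and the appropriate one can be selected. Alternatively, the reading information held in the word dictionary may be used without a separate reading dictionary.
[0068]
Because some proper nouns are difficult to read, the reading assignment condition may instead select proper nouns using the part-of-speech information obtained by the character string specifying unit and attach readings to them.
[0069]
When the string contains characters other than kanji and kana, such as English words or symbols, a reading table may be used as the reading assignment condition to attach readings. Alternatively, the reading information held in the word dictionary may be used without a separate reading table.
[0070]
The configurations that attach the correct reading to a word with multiple readings, attach readings to proper nouns, and attach readings to strings other than kanji and kana, such as English words and symbols, may be combined with the configuration that attaches readings to words or kanji by difficulty.
[0071]
In the embodiment above, the readings were displayed in italics, a typeface different from the rest of the display, but the invention is not limited to this. For example, the readings may be displayed in a color different from that of the rest of the display.
[0072]
FIG. 13 shows a modified display example of the embodiment of the present invention. As shown in the figure, the notation generation unit may display the reading of a string in place of the specified string.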
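The replacement variant can be sketched as a substitution of each flagged word by its reading. FIG. 13 is not reproduced here, so whether the figure replaces only the flagged words or the whole string is not confirmed by this text; the sketch assumes per-word replacement, and `replace_with_reading` is a hypothetical name.

```python
def replace_with_reading(words, readings, flagged):
    """Return the string with each flagged word replaced by its kana reading."""
    return "".join(readings[word] if word in flagged else word for word in words)


display_text = replace_with_reading(
    ["密輸", "未遂", "事件"],
    {"密輸": "みつゆ", "未遂": "みすい", "事件": "じけん"},
    {"密輸", "未遂"},
)
```

Compared with writing the reading alongside the word, this keeps the line shorter at the cost of hiding the original kanji.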
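[0073]
The object of the present invention is also achieved by supplying a system or apparatus with a recording medium storing software program code that implements the functions of the embodiment described above, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the medium.
[0074]
In this case, the program code read from the recording medium itself implements the functions of the embodiment, and the recording medium on which the code is recorded constitutes the present invention.
[0075]
Recording media for supplying the program code include media that carry the program in fixed form: tape media such as magnetic tape and cassette tape; disk media, including magnetic disks such as floppy disks and hard disks and optical disks such as CD-ROM, CD-R, MO (magneto-optical) disks, MD (MiniDisc), and DVD (Digital Versatile Disc); card media such as IC (integrated circuit) cards and optical cards; and semiconductor memory such as mask ROM, EPROM (UV-erasable ROM), EEPROM (electrically erasable ROM), and flash ROM. These recording media can hold data as well as programs.
[0076]
It goes without saying that the invention covers not only the case where the functions of the embodiment are implemented by the computer executing the program code it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing on the basis of instructions in the program code, and the functions of the embodiment are implemented by that processing.
[0077]
It likewise goes without saying that the invention covers the case where the program code read from the recording medium is written to memory on a function expansion board inserted in the computer or a function expansion unit connected to it, after which a CPU or the like on that board or unit performs part or all of the actual processing on the basis of instructions in the program code, and the functions of the embodiment are implemented by that processing.
[0078]
The information processing apparatus of the present invention may also be equipped with a modem and connected to a communication network including the Internet. In this case, the program code may be carried on a medium that holds the program in a transient form, for example by being downloaded from the communication network. The download program used for this is either stored in the apparatus in advance or installed from another recording medium.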
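[0079]
[Effects of the invention]
As described above, in the information processing apparatus of the present invention, the character strings in the extracted text that can be associated with a link destination are highlighted, so the user can easily recognize which strings can be input by voice. As a result, the user can select the linked information simply by reading the desired string aloud.
In addition, for character strings that are wholly or partly difficult to read, or that require care in reading, the reading is displayed for all or part of the string, so the user can easily learn how to read it. Because readings are added only to strings that are difficult to read or require care, the display does not differ greatly from the original HTML document.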
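[Brief description of the drawings]
[FIG. 1] FIG. 1 shows the configuration of the information processing apparatus of the present invention.
[FIG. 2] FIG. 2 is a block diagram of the information processing apparatus of this embodiment.
[FIG. 3] FIG. 3 is a flowchart showing the operation of the information processing apparatus of the present invention.
[FIG. 4] FIG. 4 shows an example of an HTML document.
[FIG. 5] FIG. 5 shows the HTML document of FIG. 4 displayed with a browser program.
[FIG. 6] FIG. 6 shows the morphological analysis of 「＜密輸未遂事件＞４億円分の高級腕時計没収できず」.
[FIG. 7] FIG. 7 shows an example of specified character strings, their readings, and link destination file names registered in the management table.
[FIG. 8] FIG. 8 is a flowchart explaining the process by which the notation generation unit generates reading notations.
[FIG. 9] FIG. 9 shows an example of the difficulty information table used in step S202.
[FIG. 10] FIG. 10 shows an example in which the specified strings are displayed in a different color from other text and the readings are displayed in italics in association with their words.
[FIG. 11] FIG. 11 shows an example of specified strings displayed according to the prior art.
[FIG. 12] FIG. 12 shows an example of a kanji difficulty information table.
[FIG. 13] FIG. 13 shows a modified display example of the embodiment of the present invention.
[Explanation of reference signs]
1 Control unit
2 Program memory
3 Data memory
4 Input unit
5 External storage unit
6 Communication unit
7 Display unit
8 Voice input unit
11 Character string extraction unit
12 Character string specifying unit
13 Notation generation unit
14 Management table
15 Dictionary registration unit
16 Speech recognition unit
17 Link destination file name acquisition unit
21 Word dictionary
22 Information table
23 Speech recognition dictionary
[0001]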
[Technical field of the invention]
The present invention relates to an information processing apparatus, a link destination file acquisition method, a link destination file acquisition program, and a program recording medium that recognize an input voice and acquire a next link destination file from a file such as hypertext.
[0002]
[Prior art]
In recent years, WWW (World Wide Web) systems have come to provide information such as images and audio in a server/client format over computer networks, in particular the Internet. In a WWW system, information is sent from the server to the client in a hypertext format called HTML (HyperText Markup Language). On the client, an information browsing device called a browser is used to view text written in HTML.
[0003]
In the hypertext, in addition to the format of objects such as characters and photos to be displayed, the layout on the screen, a description of the link indicating that there is a link to other information, and the URL that is the link destination file name (Uniform Resource Locator) can be described. Links to other information are performed by reading other information from tags called links embedded in hypertext.
[0004]
In a typical browser, link descriptions are displayed so that they can be distinguished from other text, usually by changing the font color or underlining them. The link description and the URL are managed in association with each other. When the user designates the description of a particular link with a pointing device such as a mouse or with a keyboard, the browser uses the URL corresponding to that description to request the server uniquely determined by the URL to send the information. By designating links in this way, the user can access related information one item after another.
[0005]
On the other hand, browsers that employ voice recognition have also been proposed. For example, a character string serving as an anchor point and a link destination information embedded in the anchor point are extracted from a hypertext sentence, and a reading for speech recognition is given to the character string by referring to a reading database. A technique for loading a link destination sentence when a description of a desired link is read out is disclosed (see, for example, Patent Document 1). This document also discloses a technique for loading a link destination sentence when a unique number is assigned to each link destination information and the user inputs the number by voice.
[0006]
A technique has also been disclosed that performs linguistic analysis on the description associated with a link destination file, selects a unique word that characterizes the description, registers the selected word in a word dictionary, and loads the linked document when the user reads the desired word aloud (see, for example, Patent Document 2). To make the recognizable words easy to identify, this document also proposes highlighting them, for example by displaying the reading of each recognizable word in the link description in association with the word, or by displaying the word in a different color.
[0007]
[Patent Document 1]
JP-A-10-124293 (Claim 1, paragraphs 0014 to 0017, paragraph 22)
[Patent Document 2]
JP-A-11-25098 (Claim 1, paragraphs 0046 to 0054)
[0008]
[Problems to be solved by the invention]
However, in the invention described in Patent Document 1, readings are generated successively for the character strings serving as anchor points by accessing the reading database. Consequently, when a link description in the HTML document contains a word the user does not know how to read, or a word with more than one reading such as 市場 (shijō or ichiba), the reading is not displayed, so the user cannot pronounce the string in the link description with the expected reading and cannot follow the link by voice operation.
[0009]
In the invention described in Patent Document 2, a unique word that can specify the description is selected from the description of the link file, and the reading of the word is displayed in association with it. Because readings are attached to the whole of the specified character string, the display differs considerably from the original HTML document. A method that does not display readings is also disclosed, but then, when a word the user cannot read is present, the user cannot pronounce the string as expected and cannot follow the link by voice operation. Furthermore, because the specified character string is a noun or noun phrase, the string cannot be specified when the same noun or noun phrase occurs more than once. In other words, the conditions under which a character string can be specified are narrow.
[0010]
The present invention has been made in view of the above problems, and its purpose is that there is little difference between the original HTML text and the display form, and even if there is a character string that is difficult to read or has multiple readings, speech recognition can be performed accurately. An object of the present invention is to provide an information processing apparatus that can easily follow a link by voice operation.
[0011]
[Means for Solving the Problems]
To achieve the above object, the information processing apparatus of the present invention acquires the next link destination file from a file acquired from a server on a network, and comprises: character string extraction means that analyzes the acquired file and extracts the file name of the next link destination file and the text associated with that link destination file name; character string specifying means that specifies, from the extracted text, a character string with which the link destination can be associated; notation generation means that generates a notation for highlighting the specified character string and, for any part of the string that is difficult to read or requires care in reading, a notation for displaying the reading of all or part of the string; a management table in which the specified character string is registered in association with the link destination file name; dictionary registration means that registers the character strings held in the management table in a speech recognition dictionary so that they can be recognized; speech recognition means that analyzes speech input by the user using the speech recognition dictionary and performs speech recognition; and link destination file name acquisition means that acquires the corresponding link destination file name from the management table based on the recognition result of the speech recognition means.
[0012]
According to this configuration, a character string that can associate a link destination from the extracted text sentence is highlighted. The user can easily recognize a character string that can be input by voice. As a result, the user can easily select link destination information by reading out a desired character string.
[0013]
In addition, for character strings that are wholly or partly difficult to read, or that require care in reading, the reading is displayed for all or part of the string, so the user can easily learn how to read it. Because readings are added only to strings that are difficult to read or require care, the display does not differ greatly from the original HTML document.
[0014]
In the information processing apparatus of the present invention, the notation generation means uses difficulty information on the readings of words and, for each word in the specified character string that is harder to read than a preset difficulty level, generates a notation that displays the reading of that word.
[0015]
With this configuration, readings are attached to words that are difficult to read. As a result, even if the character string contains a word the user does not know how to read, the user can read it correctly and will not get the utterance wrong.
[0016]
In the information processing apparatus of the present invention, the notation generation means may instead use difficulty information on the readings of kanji and, for each word in the specified character string that contains a kanji harder to read than a preset difficulty level, generate a notation that displays the reading of that kanji.
[0017]
With this configuration, readings are attached to kanji that are difficult to read. As a result, even if the string contains a kanji the user does not know how to read, the user can read it correctly and will not get the utterance wrong. If difficulty information were held for every word, it would amount to an enormous volume of data. If, as in this configuration, difficulty information is held for a few thousand kanji, it can be kept compact.
[0018]
The notation generation means may allow the difficulty level for word readings or for kanji readings to be set by user instruction. This lets users set the criterion for attaching readings according to their own language ability.
[0019]
The notation generation means uses the difficulty level information on how to read the word, and is a word included in the identified character string, and for a word having a plurality of readings, an appropriate reading of the word in the character string It is also possible to generate a notation for displaying.
[0020]
The notation generation unit may generate a notation for displaying proper noun readings for proper nouns included in the specified character string.
[0021]
The notation generation means may use reading information for character strings other than kanji and kana and, for each such string contained in the specified character string, generate a notation that displays its reading. With this configuration, readings can be displayed even for strings other than kanji and kana, such as English words and symbols, so the user will not get the utterance wrong.
[0022]
As the notation for a reading, a notation may be generated that displays the reading of each substring of the specified character string that needs a reading in association with that substring. Alternatively, the reading may be displayed in a typeface different from the rest of the display, or in a color different from the rest of the display.
[0023]
Alternatively, a notation may be generated that displays the reading in place of the substring that needs it. With this configuration, the reading is shown instead of the original string, which keeps the display easy to read.
[0024]
The link destination file acquisition method of the present invention, which acquires the next link destination file from a file acquired from a server on a network, may comprise the steps of: analyzing the acquired file and extracting the file name of the next link destination file and the text associated with that link destination file name; specifying, from the extracted text, a character string with which the link destination can be associated; generating a notation for highlighting the specified character string and a notation for displaying the reading of any string, contained in all or part of the specified string, that is difficult to read or requires care in reading; registering the specified character string in a management table in association with the link destination file name; registering the character strings held in the management table in a speech recognition dictionary so that they can be recognized; analyzing speech input by the user using the speech recognition dictionary and performing speech recognition; and acquiring the corresponding link destination file name from the management table based on the recognition result.
[0025]
The present invention also provides a link destination file acquisition program for acquiring the next link destination file from a file acquired from a server on a network, the program causing a computer to function as: character string extraction means that analyzes the acquired file and extracts the file name of the next link destination file and the text associated with that link destination file name; character string specifying means that specifies, from the extracted text, a character string with which the link destination can be associated; notation generation means that generates a notation for highlighting the specified character string and a notation for displaying the reading of any string, contained in all or part of the specified string, that is difficult to read or requires care in reading; a management table in which the specified character string is registered in association with the link destination file name; dictionary registration means that registers the character strings held in the management table in a speech recognition dictionary so that they can be recognized; speech recognition means that analyzes speech input by the user using the speech recognition dictionary and performs speech recognition; and link destination file name acquisition means that acquires the corresponding link destination file name from the management table based on the recognition result of the speech recognition means.
[0026]
The present invention also provides a computer-readable recording medium on which the linked file acquisition program is recorded.
[0027]
The information processing apparatus of the present invention may also be an apparatus that acquires the next link destination file, contained in a file acquired from a server on a network, when a character string that can be associated with that link destination file is designated by voice, the apparatus comprising: means for extracting, from the character strings that can be associated with the next link destination file, the strings whose reading needs to be displayed; and either means for writing the reading alongside each extracted string or means for replacing each extracted string with its reading.
[0028]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings. The present invention is not limited to these embodiments.
[0029]
FIG. 1 is a diagram showing a configuration of an information processing apparatus according to the present invention. As shown in FIG. 1, the information processing apparatus includes a control unit 1, a program memory 2, a data memory 3, an input unit 4, an external storage unit 5, a communication unit 6, a display unit 7, and a voice input unit 8. These units are connected to each other via a bus.
[0030]
The control unit 1 operates in accordance with a program stored in the program memory (RAM) 2, controls the operation of the entire apparatus, and executes programs such as a browser. The data memory (ROM) 3 stores various control data used by the control unit 1. The data memory (RAM) 3 is used as a work area when the control unit 1 executes various control processes, and temporarily stores various data. The program memory 2 stores the programs executed by the control unit 1, such as system programs, browser programs, voice recognition programs, and language analysis programs. These programs are loaded into the program memory 2 from the external storage unit 5, or from an external network via the communication unit 6. The control unit 1 analyzes the file input from the external network via the communication unit 6 and extracts the next link destination file and link destination file name.
[0031]
A word dictionary for language analysis, an information table, a reading dictionary, a cache of received HTML sentences, an acoustic model for speech recognition, and the like are either loaded into the data memory (RAM) 3 from the external storage unit 5 or from an external network via the communication unit 6, or written directly in the data memory (ROM) 3, such as a hard disk. Work variables such as the management table and the speech recognition dictionary are stored in the data memory (RAM) 3.
[0032]
The input unit 4 includes a keyboard and a mouse. The communication unit 6 is connected to an external network such as the Internet, receives data that has arrived via the network, and outputs the data to the control unit 1. The external storage unit 5 reads program data from a recording medium such as a CD-ROM or DVD-ROM and outputs the program data to the control unit 1. The program data read in this way is installed in the data memory 3 or the like by the processing of the control unit 1.
[0033]
The display unit 7 presents display contents to the user, and includes display means such as a CRT or a liquid crystal display device.
[0034]
The voice input unit 8 is connected to a microphone or an external amplifier, and converts the voice signal input from the user into a digital signal by A/D conversion.
[0035]
FIG. 2 is a configuration block diagram illustrating the information processing apparatus according to the present embodiment. The information processing apparatus according to the present embodiment includes a character string extraction unit 11, a character string specifying unit 12, a notation generation unit 13, a management table 14, a dictionary registration unit 15, a voice recognition unit 16, a link destination file name acquisition unit 17, a word dictionary 21, an information table 22, and a speech recognition dictionary 23.
[0036]
In the information processing apparatus of the present invention, the HTML text acquired via the input unit 4 is analyzed by the character string extraction unit 11. Specifically, the file name of the next link destination file and the explanatory text associated with the link destination file name are extracted from the HTML text.
[0037]
The character string specifying unit 12 analyzes the extracted explanatory text and specifies a character string with which a link destination can be associated. Specifically, the word dictionary 21 is used to determine whether or not the words described in the word dictionary 21 are included in the explanatory text. By referring to the word dictionary 21, the reading and part of speech of each word constituting the character string, as well as grammatical information for determining word boundaries, can be obtained.
[0038]
The notation generation unit 13 refers to the information table 22 to determine whether or not the specified character string contains a character string whose reading needs to be displayed. For words that are determined to require a reading, a notation for displaying the reading is added to the HTML sentence. The display unit 7 displays the HTML sentence to which the reading has been added. The information table 22 is a table that holds difficulty information on reading for all the words included in the word dictionary, or for kanji only. For a word having a plurality of readings, the notation generation unit 13 adds a notation for displaying the appropriate reading with reference to the reading dictionary. When a reading is given to a word having a plurality of readings, the reading dictionary can be used instead of the information table. When a reading is given to character strings other than kanji or kana included in the specified character string, a reading table may be used instead of the information table. Alternatively, without using such an information table or reading table, readings may be assigned using difficulty information added to the word dictionary or the reading information included in the word dictionary.
[0039]
In the management table 14, the character string specified by the character string specifying unit 12 and the character string reading data given by the notation generating unit 13 are stored in association with the link destination file.
[0040]
The dictionary registration unit 15 registers the character string specified by the character string specifying unit 12 and the reading of the character string in the speech recognition dictionary 23.
[0041]
The voice recognition unit 16 analyzes the voice signal input from the voice input unit 8 and collates it with the speech recognition dictionary 23. For voice recognition, a known voice recognition technique can be used. For example, speech recognition is performed using a hidden Markov model (hereinafter referred to as HMM) (see, for example, Seiichi Nakagawa, “Speech Recognition Using Stochastic Models”, IEICE). Specifically, the occurrence probability of every HMM corresponding to the recognition target words registered in the speech recognition dictionary 23 is obtained from the analyzed speech signal, and the word corresponding to the HMM with the highest occurrence probability is taken as the recognition result.
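The selection rule described in this paragraph can be sketched as follows. This is only a minimal illustration: the function name `recognize` and the score values are hypothetical, and the computation of each HMM's occurrence probability from the speech signal is assumed to be done elsewhere.

```python
def recognize(log_likelihoods):
    """Return the registered word whose HMM scored highest.

    log_likelihoods: dict mapping each recognition target word to the
    log probability that its HMM produced the analyzed speech signal
    (assumed to be computed by a separate acoustic front end).
    """
    if not log_likelihoods:
        return None  # nothing is registered in the dictionary
    return max(log_likelihoods, key=log_likelihoods.get)


# Hypothetical scores for three registered words.
scores = {"society": -42.0, "politics": -57.5, "economy": -61.2}
result = recognize(scores)
```

With these assumed scores, `result` is the word "society", since its log probability is the largest.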
[0042]
The link destination file name acquisition unit 17 refers to the management table 14 based on the result obtained by the voice recognition unit 16 and acquires the file name of the link destination file corresponding to the specified character string. The file corresponding to this file name is then read.
[0043]
(Operation)
Next, the operation of the information processing apparatus of the present invention will be described in detail with reference to FIG. 3. FIG. 3 is a flowchart showing the operation of the information processing apparatus according to the present invention.
[0044]
In step S101, the designated HTML text acquired from the server by the communication unit 6 through a communication network such as the Internet is stored in the data memory (ROM) 3. In step S102, the character string extraction unit 11 analyzes this text sentence based on the tag information.
[0045]
Here, an HTML sentence will be briefly described. FIG. 4 is a diagram illustrating an example of an HTML sentence. The <html> tag and the </html> tag declare that the sentence is written in HTML, and are placed at the beginning and end of the entire sentence. Between the <head> tag and the </head> tag, information related to the text is described, such as its title and characteristics and information about its producer. The title of the sentence is described between the <title> tag and the </title> tag, which are placed within the portion sandwiched between the <head> tag and the </head> tag. The text actually displayed on the browser is described in the portion sandwiched between the <body> tag and the </body> tag.
[0046]
A link is set between the <A href="★"> tag and the </A> tag. The link destination file name (URL) is written in place of ★. For example, in <A href="http://www.xyz.co.jp/news/dn.html">society</A> in the figure, “http://www.xyz.co.jp/news/dn.html” is the link destination file name, and “society” is the explanatory text. Details of these tags are disclosed in, for example, Ankh's “HTML tag dictionary” (Shosuisha Co., Ltd.). FIG. 5 is an example in which the HTML text of FIG. 4 is displayed using a browser program.
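The extraction of the link destination file name and its explanatory text from such a tag can be sketched as follows. This is a minimal sketch using Python's standard `html.parser` module, not the implementation of the character string extraction unit 11; the class name `LinkExtractor` and the function `extract_links` are hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (link destination file name, explanatory text) pairs."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <A> element currently open
        self._text = []      # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <A> arrives as "a".
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_text):
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = '<html><body><A href="http://www.xyz.co.jp/news/dn.html">society</A></body></html>'
links = extract_links(sample)
```

For the sample above, `links` holds one pair: the URL as the link destination file name and "society" as the explanatory text.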
[0047]
In step S103, the explanatory text between the <A> tag and the </A> tag is subjected to morphological analysis with reference to the word dictionary 21. That is, words whose headwords match the notation of the explanatory text are read from the word dictionary 21. In this step, the grammar, reading information, and part-of-speech information of each word are analyzed. For example, “<Smuggling attempted case> 400 million yen of luxury watches cannot be confiscated” is analyzed as shown in FIG. 6. FIG. 6 is an example of a morphological analysis of “<Smuggling attempted case> 400 million yen of luxury watches cannot be confiscated”.
[0048]
The character string specifying unit 12 specifies a character string that can be associated with a link destination based on the result of the morphological analysis. The following methods can be used, for example. (1) The explanatory text is used as the character string as it is, for example “A high-class wristwatch worth 400 million yen cannot be confiscated”. (2) Noun phrases are created from the morphological analysis result and used as character strings. A noun phrase is obtained by joining consecutive nouns. In the above example, the noun phrases “smuggling attempted case”, “400 million yen”, and “confiscation of luxury watch” are obtained. (3) Among the noun phrases obtained in (2), the noun phrase that appears first in the explanatory text is used as the character string. In the above example, “smuggling attempted case” is the specified character string. (4) Among the noun phrases obtained in (2), the longest noun phrase is used as the character string. In the above example, “confiscation of luxury watch” is the specified character string. (5) The noun phrases obtained in (2) are joined in order from the beginning of the explanatory text, up to the last noun phrase that fits within a designated number of characters, and the result is used as the character string. In the above example, when the designated number of characters is 15, “400 million yen for attempted smuggling” is the specified character string.
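Methods (2) to (5) can be sketched as follows, assuming the morphological analysis has already produced a list of (surface, part of speech) pairs. The Japanese tokens are an assumed back-translation of the example headline (they do not come from FIG. 6), and all function names are hypothetical.

```python
def noun_phrases(tokens):
    """Method (2): merge runs of consecutive nouns into noun phrases."""
    phrases, run = [], []
    for surface, pos in tokens:
        if pos == "noun":
            run.append(surface)
        elif run:
            phrases.append("".join(run))
            run = []
    if run:
        phrases.append("".join(run))
    return phrases


def first_phrase(phrases):
    """Method (3): the noun phrase that appears first."""
    return phrases[0] if phrases else None


def longest_phrase(phrases):
    """Method (4): the longest noun phrase (the earliest wins a tie)."""
    return max(phrases, key=len) if phrases else None


def leading_phrases(phrases, limit):
    """Method (5): join phrases from the start while within the limit."""
    out = ""
    for phrase in phrases:
        if len(out) + len(phrase) > limit:
            break
        out += phrase
    return out or None


# Assumed tokens: smuggling / attempt / case / > / 400 million / yen /
# (particle) / luxury / wristwatch / confiscation / can / not
tokens = [
    ("＜", "symbol"), ("密輸", "noun"), ("未遂", "noun"), ("事件", "noun"),
    ("＞", "symbol"), ("４億", "noun"), ("円", "noun"), ("の", "particle"),
    ("高級", "noun"), ("腕時計", "noun"), ("没収", "noun"),
    ("でき", "verb"), ("ず", "auxiliary"),
]
phrases = noun_phrases(tokens)
```

Under these assumptions the three noun phrases are 6, 3, and 7 characters long, so a 15-character limit in method (5) keeps the first two and drops the third, which matches the behavior described for the example.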
[0049]
The method for specifying the character string is not limited to the above, and any method may be used as long as the explanatory text can be identified. In particular, when the same character string as the specified character string already exists in the speech recognition dictionary, the specifying method must be changed so that a character string not yet registered in the speech recognition dictionary is specified.
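One way to change the specifying method on a collision is to try the candidates from several methods in turn. The following is a minimal sketch; the function name `choose_unregistered`, the preference order, and the candidate "society news" are assumptions of this illustration.

```python
def choose_unregistered(candidates, registered):
    """Return the first candidate not already in the recognition dictionary.

    candidates: strings produced by different specifying methods, in
    order of preference (the ordering is an assumption of this sketch).
    registered: set of strings already registered for the current page.
    """
    for candidate in candidates:
        if candidate and candidate not in registered:
            return candidate
    return None  # every method produced a string that is already taken


registered = {"society"}
chosen = choose_unregistered(["society", "society news"], registered)
```

Here the first candidate collides with an existing entry, so the second candidate is chosen instead.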
[0050]
In step S104, the notation generation unit 13 checks whether the character string specified in step S103 includes a word that is difficult to read or a word whose reading requires care. When such a word is included, a notation for displaying the reading of the word is generated. The manner in which the reading notation is generated will be described later.
[0051]
In step S105, the character string specified in step S103 and its reading are registered in the management table 14 in association with the link destination file name. FIG. 7 is a diagram illustrating an example of a specified character string registered in the management table, its reading, and a link destination file name.
[0052]
In step S106, the phoneme model of the recognition target phrase is created from the phoneme model of each character constituting the specified character string registered in the management table 14 and its reading. The created phoneme model of the recognition target phrase is sequentially added to the data of the speech recognition dictionary 23. Note that when the displayed HTML text is changed, all the data related to the HTML text is deleted.
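The construction of a phrase model from per-character models can be sketched as follows. This is a strong simplification: each phoneme model is reduced to a list of phoneme labels rather than an HMM, the table `CHAR_PHONEMES` is hypothetical, and the kana reading for “smuggling” is an assumption.

```python
# Hypothetical per-character phoneme models, reduced here to label lists.
CHAR_PHONEMES = {
    "み": ["m", "i"],
    "つ": ["ts", "u"],
    "ゆ": ["y", "u"],
}


def phrase_model(reading, char_models=CHAR_PHONEMES):
    """Concatenate the per-character models along the reading."""
    model = []
    for ch in reading:
        if ch not in char_models:
            raise KeyError(f"no phoneme model for character {ch!r}")
        model.extend(char_models[ch])
    return model


def register(dictionary, phrase, reading):
    """Add the phrase model to the speech recognition dictionary."""
    dictionary[phrase] = phrase_model(reading)


speech_dictionary = {}
register(speech_dictionary, "smuggling", "みつゆ")
```

Clearing `speech_dictionary` when a new page is displayed would correspond to deleting the data related to the previous HTML text.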
[0053]
In step S107, it is determined whether or not the analysis of the HTML sentence has been completed. If not, the process returns to step S103. The processing from step S103 to step S106 is thus repeated for every explanatory text between an <A> tag and an </A> tag included in the HTML sentence.
[0054]
In step S107, after all the link destination information included in the HTML text has been analyzed, a command for enabling the voice recognition unit 16 is transmitted from the control unit 1. Then, in step S108, the system enters a standby state until the user speaks.
[0055]
In step S109, a voice signal generated by the user's utterance is transmitted from the voice input unit 8 to the voice recognition unit 16. The voice recognition unit 16 extracts feature parameters from the input voice signal. The distance between the extracted feature parameter sequence and the phoneme model of each recognition target phrase registered in step S106 is calculated. The recognition target phrase with the highest score is then output as the recognition result.
[0056]
In step S110, the link destination file name acquisition unit 17 searches the management table 14 based on the output recognition target word and acquires the link destination file name.
[0057]
In step S111, a command for requesting information is transmitted from the communication unit 6 using the link destination file name obtained in step S110.
[0058]
For example, when the user utters “smuggling attempted case”, a recognition result “smuggling attempted case” is obtained from the speech recognition unit. When the management table is searched using the recognition result, the link destination file name “http://www.xyz.co.jp/news/dn/20033021.101.html” is obtained. Thereby, the HTML text of the next link destination can be obtained.
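The registration in step S105 and the lookup in step S110 can be sketched with a plain dictionary standing in for the management table 14. The function names and the romanized reading are hypothetical; the character string and URL are those of the example above.

```python
def register_link(table, string, reading, url):
    """Register a specified string with its reading and link destination."""
    table[string] = {"reading": reading, "url": url}


def link_for(table, recognized):
    """Return the link destination file name for a recognition result."""
    entry = table.get(recognized)
    return entry["url"] if entry is not None else None


management_table = {}
register_link(
    management_table,
    "smuggling attempted case",
    "mitsuyu misui jiken",  # hypothetical romanized reading
    "http://www.xyz.co.jp/news/dn/20033021.101.html",
)
url = link_for(management_table, "smuggling attempted case")
```

A recognition result that is not in the table yields `None`, in which case no request would be issued.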
[0059]
(Reading notation generation process)
FIG. 8 is a flowchart illustrating a process of generating a reading notation in the notation generation unit.
[0060]
In step S201, a word constituting the character string specified in step S103 is acquired. For example, in the case of “smuggling attempted case”, “smuggling” is acquired first from the three words “smuggling”, “attempt”, and “case”.
[0061]
In step S202, it is determined whether or not the word acquired in step S201 satisfies the reading provision condition. If the condition is satisfied, the process proceeds to step S203. If not, the process proceeds to step S204. FIG. 9 is a diagram illustrating an example of the information table used in step S202. Specifically, it is determined whether the word has a difficulty level equal to or higher than a predetermined value. For example, when the predetermined difficulty level is 3, “smuggling” is selected as a word to which a reading should be given, since its difficulty level is 3. The difficulty level can be set arbitrarily according to the user's language ability. For example, the standard of difficulty may be determined from the content of each grade of the Japanese Kanji ability test. The difficulty level information of the information table shown in FIG. 9 may also be included in the word dictionary.
[0062]
In step S203, a flag indicating that a reading needs to be given to the word is set.
[0063]
In step S204, it is determined whether or not the reading provision condition has been checked for all words. If the check has not been completed for all words, the process returns to step S201. If it has been completed for all words, the process proceeds to step S205.
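The loop over steps S201 to S204 can be sketched as follows. Only the difficulty level 3 for “smuggling” comes from the description; the other table values, the treatment of unknown words as easy, and the function name `flag_words` are assumptions.

```python
# Hypothetical information table: word -> reading difficulty level.
INFO_TABLE = {"smuggling": 3, "attempt": 2, "case": 1}


def flag_words(words, info_table, threshold):
    """Return the words whose difficulty is at or above the threshold."""
    flagged = []
    for word in words:                      # S201: take the next word
        level = info_table.get(word, 0)     # unknown words count as easy
        if level >= threshold:              # S202: reading condition
            flagged.append(word)            # S203: set the flag
    return flagged                          # S204: all words checked


flagged = flag_words(["smuggling", "attempt", "case"], INFO_TABLE, 3)
```

Lowering the threshold to 2 would also flag “attempt”, which is how a user-adjustable difficulty level changes the display.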
[0064]
In step S205, a notation for highlighting the character string specified in step S103 is generated, together with a notation for displaying the reading of each word flagged in step S203 in association with that word. FIG. 10 is a diagram illustrating an example in which the specified character strings are displayed in a color different from that of other character strings, and the reading of a word is displayed in italics in association with the word. In the example of this figure, the specified character strings are displayed in red and the other character strings are displayed in black or blue, so that the specified character strings are highlighted. Specifically, in FIG. 10, the character strings “society”, “politics”, “economy”, “international”, “sports”, “Mitsuyu attempted”, “Haneda Airport”, “Kanuma city officials”, and “Nagasaki Prefectural Federation”, which can be input by voice, are displayed in red, and the normal character string “Social News (February 11th)” is displayed in black. Character strings in a link's explanatory text that have not been specified are displayed in blue.
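One possible form of the generated notation is sketched below. The description does not give the actual markup, so the use of a `<font>` element for the color and an `<i>` element for the reading, the function name `annotate`, and the romanized reading are all assumptions of this sketch.

```python
from html import escape


def annotate(words, readings, color="red", sep=""):
    """Build highlighted markup for one specified character string.

    words: the words of the specified string, in order.
    readings: dict mapping flagged words to the reading to display.
    sep: separator between words (empty for Japanese text).
    """
    chunks = []
    for word in words:
        chunk = escape(word)
        if word in readings:
            # Show the reading in italics right after the word.
            chunk += "<i>(" + escape(readings[word]) + ")</i>"
        chunks.append(chunk)
    return '<font color="' + color + '">' + sep.join(chunks) + "</font>"


markup = annotate(
    ["smuggling", "attempted", "case"],
    {"smuggling": "mitsuyu"},
    sep=" ",
)
```

Only the flagged word receives a reading, so the rest of the string is left as the server provided it apart from the color change.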
[0065]
As a result, the user can easily identify the character strings that can be input by voice. Even for words that are difficult to read or that require care in reading, the link destination can be designated without misreading. FIG. 11 is a diagram illustrating an example of a display of specified character strings according to a conventional example. According to the present invention, unlike the case shown in FIG. 11, readings are not attached to all words. As a result, a display can be obtained in which the character string layout of the sentence provided by the server is not greatly disturbed.
[0066]
In the above embodiment, the word information table is used when the notation generation unit 13 executes the reading notation generation process. However, the process is not limited to word difficulty information. FIG. 12 is a diagram illustrating an example of a kanji information table. Instead of the word information table, the kanji information table shown in FIG. 12 may be used to give a reading to a word containing a kanji that is more difficult than a preset difficulty level. The kanji difficulty information may also be included in the word dictionary. With this configuration, the word dictionary itself holds the kanji reading information, and this information can be used.
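The kanji-based condition can be sketched as a check over the characters of a word. The table contents are illustrative assumptions (they are not taken from FIG. 12), the example words are an assumed Japanese rendering of “smuggling” and “case”, and the comparison treats a level equal to the threshold as requiring a reading.

```python
# Hypothetical kanji information table: kanji -> difficulty level.
KANJI_TABLE = {"密": 4, "輸": 3, "事": 1, "件": 2}


def needs_reading(word, kanji_table, threshold):
    """True if any kanji in the word is at or above the threshold.

    Characters missing from the table (kana, symbols) count as level 0.
    """
    return any(kanji_table.get(ch, 0) >= threshold for ch in word)


hard = needs_reading("密輸", KANJI_TABLE, 3)
easy = needs_reading("事件", KANJI_TABLE, 3)
```

Because the check is per character, a single difficult kanji is enough for the whole word to receive a reading.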
[0067]
In the reading notation generation process by the notation generation unit in the above embodiment, when a word has a plurality of readings, a word reading dictionary may be used as the reading provision condition so that the plurality of readings are displayed and any of them can be selected. Alternatively, the reading information held in the word dictionary may be used without providing a separate reading dictionary.
[0068]
In the reading notation generation process by the notation generation unit in the above embodiment, since some proper nouns are difficult to read, the part-of-speech information of the words obtained by the character string specifying unit may be used as the reading provision condition to select proper nouns and give them readings.
[0069]
In the reading notation generation process by the notation generation unit in the above embodiment, when there is a character string other than kanji and kana, such as an English word or a symbol, a reading table may be used as the reading provision condition to give the character string a reading. Alternatively, the reading information held in the word dictionary may be used without providing a separate reading table.
[0070]
The configuration that gives the correct reading when a word has a plurality of readings, the configuration that gives readings to proper nouns, and the configuration that gives readings to character strings other than kanji and kana, such as English words and symbols, may each be combined with the configuration that gives readings to words or kanji.
[0071]
Further, in the above embodiment, when the notation generation unit performs the notation generation process, the reading of the character string is displayed in italics, a typeface different from that of other displayed text, but the present invention is not limited to this. For example, the reading of the character string may be displayed in a color different from other display colors.
[0072]
FIG. 13 is a diagram showing a modified display example in the embodiment of the present invention. As shown in the figure, the reading notation generation process by the notation generation unit in the above embodiment may be configured to display the reading of the character string in place of the specified character string.
[0073]
The object of the present invention is also achieved by supplying a system or apparatus with a recording medium on which the program code of software that realizes the functions of the above-described embodiments is recorded, and by having the computer (or CPU or MPU) of the system or apparatus read and execute the program code stored in the recording medium.
[0074]
In this case, the program code itself read from the recording medium realizes the functions of the above-described embodiments, and the recording medium on which the program code is recorded constitutes the present invention.
[0075]
As the recording medium for supplying the program code, a medium that carries the program in a fixed manner can be used, for example: tape systems such as magnetic tape and cassette tape; disk systems including magnetic disks such as floppy disks and hard disks, and optical discs such as CD (compact disc)-ROM, CD-R, MO (magneto-optical) discs, MD (mini disc), and DVD (digital versatile disc); card systems such as IC (integrated circuit) cards and optical cards; and semiconductor memory systems such as mask ROM, EPROM (ultraviolet erasable ROM), EEPROM (electrically erasable ROM), and flash ROM. These recording media can record data as well as programs.
[0076]
Needless to say, the present invention covers not only the case where the functions of the above-described embodiments are realized by the computer executing the read program code, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.
[0077]
Needless to say, the present invention also covers the case where the program code read from the recording medium is written in a memory provided in a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided in the function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.
[0078]
The information processing apparatus of the present invention can be connected to a communication network including the Internet by being provided with a modem. In this case, the recording medium may be a medium that carries the program code in a fluid manner, such that the program code is downloaded from the communication network. When the program code is downloaded from the communication network in this way, a download program is stored in the main device in advance or installed from another recording medium.
[0079]
[Effect of the invention]
As described above, in the information processing apparatus of the present invention, a character string that can be associated with a link destination is specified from the extracted text sentence and highlighted. The user can therefore easily recognize the character strings that can be input by voice. As a result, the user can easily select link destination information by reading out the desired character string.
Further, in the information processing apparatus of the present invention, for a character string that is difficult to read, or that requires care in reading, contained in all or part of the specified character string, the reading is displayed for all or part of the character string. The user can therefore easily know how to read the character string. In particular, since readings are written only for character strings that are difficult to read or that require care in reading, the display form does not differ greatly from that of the original HTML text.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration of an information processing apparatus according to the present invention.
FIG. 2 is a block diagram illustrating the information processing apparatus according to the present embodiment.
FIG. 3 is a flowchart showing the operation of the information processing apparatus according to the present invention.
FIG. 4 is a diagram illustrating an example of an HTML sentence.
FIG. 5 is an example in which the HTML text of FIG. 4 is displayed using a browser program.
FIG. 6 is an example of a morphological analysis of “<Smuggling attempted case> 400 million yen of luxury watches cannot be confiscated”.
FIG. 7 is a diagram illustrating an example of a specified character string registered in the management table, its reading, and a link destination file name.
FIG. 8 is a flowchart illustrating processing for generating a reading notation in a notation generation unit.
FIG. 9 is a diagram illustrating an example of a difficulty level information table used in step S202.
FIG. 10 is a diagram illustrating an example in which a specified character string is displayed in a different color from other character strings, and a word reading is displayed in association with the word in italics.
FIG. 11 is a diagram illustrating an example in which a specified character string based on a conventional example is displayed.
FIG. 12 is a diagram illustrating an example of a kanji difficulty level information table.
FIG. 13 is a diagram showing a modified display example in the embodiment of the present invention.
[Explanation of symbols]
1 Control unit
2 Program memory
3 Data memory
4 Input unit
5 External storage unit
6 Communication unit
7 Display unit
8 Voice input unit
11 Character string extraction unit
12 Character string specifying unit
13 Notation generation unit
14 Management table
15 Dictionary registration unit
16 Voice recognition unit
17 Link destination file name acquisition unit
21 Word dictionary
22 Information table
23 Speech recognition dictionary

Claims

An information processing apparatus that acquires the next link destination file from a file acquired from a server on a network, comprising:
character string extraction means for analyzing the acquired file and extracting the file name of the next link destination file and the text sentence associated with the link destination file name;
character string specifying means for specifying, from the extracted text sentence, a character string with which a link destination can be associated;
notation generation means for generating a notation for highlighting the specified character string and a notation for displaying the reading of a character string that is difficult to read, or that requires care in reading, contained in all or part of the specified character string;
a management table in which the specified character string is registered in association with the link destination file name;
dictionary registration means for registering the character strings registered in the management table in a speech recognition dictionary so that speech recognition can be performed on them;
voice recognition means for analyzing voice input from a user using the speech recognition dictionary and performing voice recognition; and
link destination file name acquisition means for acquiring the corresponding link destination file name from the management table based on the recognition result of the voice recognition means.

The information processing apparatus according to claim 1, wherein the notation generation means,
using difficulty information about how to read words,
generates a notation that displays the reading of a word that is contained in the specified character string and is more difficult to read than a preset difficulty level.

The information processing apparatus according to claim 1, wherein the notation generation means,
using difficulty information about how to read kanji,
generates a notation that displays the reading of a word that is contained in the specified character string and contains a kanji more difficult to read than a preset difficulty level.

The information processing apparatus according to claim 2 or 3, wherein, in the notation generation means,
the difficulty level relating to word reading or the difficulty level relating to kanji reading can be set by a user instruction.

The information processing apparatus according to claim 1, wherein the notation generation means,
using difficulty information about how to read words,
refers to a reading dictionary for a word that is contained in the specified character string and has a plurality of readings, and generates a notation that displays the appropriate reading of the word in the character string.

The information processing apparatus according to claim 1, wherein the notation generation means
generates, for a proper noun included in the specified character string, a notation for displaying the reading of the proper noun.

The information processing apparatus according to claim 1, wherein the notation generation means,
using reading information about how to read character strings other than kanji or kana,
generates a notation for displaying the reading of a character string other than kanji or kana included in the specified character string.

The information processing apparatus according to any one of claims 1 to 7, wherein the notation generation means
generates, for a character string to be given a reading contained in the specified character string, a notation for displaying the reading of the character string in association with the character string.

The information processing apparatus according to claim 7, wherein the notation generation means
generates, for a character string to be given a reading contained in the specified character string, a notation for displaying the reading of the character string in a typeface different from that of other displayed text.

The information processing apparatus according to claim 7, wherein the notation generation means
generates, for a character string to be given a reading contained in the specified character string, a notation for displaying the reading of the character string in a color different from that of other displayed text.

The information processing apparatus according to any one of claims 1 to 7, wherein the notation generation means
generates, for a character string to be given a reading contained in the specified character string, a notation for displaying the reading of the character string in place of the character string.

A link destination file acquisition method for acquiring the next link destination file from a file acquired from a server on a network, the method comprising:
analyzing the acquired file and extracting a file name of the next link destination file and a text sentence associated with the link destination file name;
specifying, from the extracted text sentence, a character string that can be associated with a link destination;
generating a notation for highlighting the specified character string, and a notation for displaying a reading for a difficult-to-read character string, or a character string whose reading requires care, that makes up all or part of the specified character string;
registering the specified character string in a management table in association with the link destination file name;
registering the character strings registered in the management table in a speech recognition dictionary so that speech recognition can be performed on them;
analyzing voice input from a user using the speech recognition dictionary to perform speech recognition; and
acquiring the corresponding link destination file name from the management table based on the result of the speech recognition.
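The steps of the method above can be illustrated with a minimal Python sketch. It is not the implementation described in this publication. It assumes the acquired file is HTML whose links are `<a href="...">` elements, models the management table as a plain dictionary, models the speech recognition dictionary as a set of link strings, and takes the recognizer's output as an already-transcribed string. The names `LinkExtractor`, `build_management_table`, `build_recognition_dictionary`, and `resolve_link` are hypothetical, and notation generation is omitted.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (link text, href) pairs from anchor tags (assumed link format)."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (text, href) pairs found so far
        self._href = None  # href of the anchor currently open, if any
        self._parts = []   # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._parts = []

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._parts).strip()
            if text:
                self.links.append((text, self._href))
            self._href = None


def build_management_table(html):
    """Hypothetical management table: link string -> link destination file name."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return {text: href for text, href in parser.links}


def build_recognition_dictionary(table):
    """Hypothetical recognition vocabulary: every string registered in the table."""
    return set(table)


def resolve_link(recognized_text, table, dictionary):
    """Return the link destination for a recognition result, or None if unknown."""
    if recognized_text in dictionary:
        return table[recognized_text]
    return None


page = '<p><a href="news.html">News</a> and <a href="weather.html">Weather</a></p>'
table = build_management_table(page)
vocab = build_recognition_dictionary(table)
print(resolve_link("Weather", table, vocab))  # prints weather.html
```

Restricting the vocabulary to strings registered in the table means a recognition result outside the current page's links resolves to nothing instead of to an arbitrary destination.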

A link destination file acquisition program for acquiring the next link destination file from a file acquired from a server on a network, the program causing a computer to function as:
character string extraction means for analyzing the acquired file and extracting a file name of the next link destination file and a text sentence associated with the link destination file name;
character string specifying means for specifying, from the extracted text sentence, a character string that can be associated with a link destination;
notation generation means for generating a notation for highlighting the specified character string, and a notation for displaying a reading for a difficult-to-read character string, or a character string whose reading requires care, that makes up all or part of the specified character string;
a management table in which the specified character string is registered in association with the link destination file name;
dictionary registration means for registering the character strings registered in the management table in a speech recognition dictionary so that speech recognition can be performed on them;
speech recognition means for analyzing voice input from a user using the speech recognition dictionary and performing speech recognition; and
link destination file name acquisition means for acquiring the corresponding link destination file name from the management table based on a recognition result of the speech recognition means.

A computer-readable recording medium on which the link destination file acquisition program according to claim 13 is recorded.

An information processing apparatus that acquires the next link destination file in response to a voice instruction for a character string that can be associated with the next link destination file and is included in a file acquired from a server on a network, the apparatus comprising:
means for extracting, from the character strings that can be associated with the next link destination file, a character string whose reading needs to be displayed; and
means for writing the reading alongside the extracted character string, or means for replacing the extracted character string with the reading of the character string.
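The two alternatives in the last claim, writing the reading alongside a character string or replacing the string with its reading, can be sketched as follows. This is an illustration under assumptions, not the notation this publication prescribes: it uses the HTML `<ruby>`/`<rt>` elements as one possible way to attach a reading, and the reading table, the sample entry, and the function names `write_reading` and `replace_with_reading` are hypothetical.

```python
import re
from html import escape

# Hypothetical reading table: character string -> reading to show for it.
READINGS = {"日本橋": "にほんばし"}


def _pattern(readings):
    """Build one regex matching any registered string, longest first."""
    words = sorted(readings, key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in words))


def write_reading(text, readings):
    """Attach each reading to its string using <ruby> markup (an assumed notation)."""
    if not readings:
        return escape(text)
    pieces, pos = [], 0
    for m in _pattern(readings).finditer(text):
        pieces.append(escape(text[pos:m.start()]))  # text before the match
        word = m.group(0)
        pieces.append(
            f"<ruby>{escape(word)}<rt>{escape(readings[word])}</rt></ruby>"
        )
        pos = m.end()
    pieces.append(escape(text[pos:]))  # text after the last match
    return "".join(pieces)


def replace_with_reading(text, readings):
    """Show the reading in place of the original string."""
    if not readings:
        return escape(text)
    return escape(_pattern(readings).sub(lambda m: readings[m.group(0)], text))


print(write_reading("日本橋のページ", READINGS))
print(replace_with_reading("日本橋のページ", READINGS))
```

Matching all registered strings in a single pass keeps a later substitution from touching markup or readings produced by an earlier one.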