JP2002297372A - Method, device and program for retrieving voice in web page

Info

Publication number: JP2002297372A
Application number: JP2001101640A
Authority: JP
Inventors: Masanobu Nishitani; 正信西谷; Yasunaga Miyazawa; 康永宮沢
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2001-03-30
Filing date: 2001-03-30
Publication date: 2002-10-11
Anticipated expiration: 2021-03-30
Also published as: JP3893893B2

Abstract

PROBLEM TO BE SOLVED: To provide a web page voice search method that can search by voice not only the link items of a web page but also its contents, and that recognizes speech with high accuracy even when the user speaks naturally. SOLUTION: An HTML analysis unit 2 generates a voice search dictionary 3 by associating each word extracted from a web page with the URL of the page from which it was extracted. A task control unit 5 generates a speech recognition task from the extracted words and selects the language model (LM) and acoustic model (AM) best suited to that task from a prepared language model / acoustic model group 6. An intention analysis unit 8 analyzes the user's utterance as recognized by a speech recognition unit 7 and detects the word that indicates its content. A browser control unit 9 changes the web page based on the search result.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a voice search method for web pages, and more particularly to a voice search method for web pages that uses a WWW browser.

[0002]

[Prior Art] Many conventional techniques have been proposed that perform voice search of web pages by searching, by voice, the hyperlinks embedded in a web page. For example, a technique for searching by voice the link items of the web page currently displayed in a WWW browser is disclosed in Japanese Patent Application Laid-Open No. 11-184671, "Information Presentation Method and Apparatus, Information Presentation System". In addition, a technique for searching by voice the link items of the currently displayed web page and of the web pages linked to it is disclosed in "Browsing WWW Using Voice Keywords", IPSJ Transactions (Information Processing Society of Japan), Vol. 40, No. 2, pp. 443-452 (1999).

[0003]

[Problems to be Solved by the Invention] In the prior art described above, however, only the link items of a web page are subject to voice search; the contents of the web page cannot be searched. Moreover, because the prior art relies mainly on word recognition when recognizing the user's speech, it can perform speech recognition only with a simple grammar. Recognition accuracy is therefore low when a voice search is based on the user's natural utterances. The present invention has been made to solve these problems. Its object is to provide a web page voice search method that can search by voice not only the link items of a web page but also its contents, and that achieves highly accurate speech recognition even when the user speaks naturally.

[0004]

[Means for Solving the Problems] The invention according to claim 1 is a voice search method for web pages, characterized by comprising: web page acquisition means for prefetching and downloading, to a pre-specified number of hierarchy levels, the web pages linked to the currently displayed web page; link assigning means for associating each extracted word with the URL of the web page from which the word was extracted; dictionary creating means for creating a voice search dictionary from the words and the URLs associated with them; task creating means for creating a speech recognition task from the words; language model / acoustic model selecting means for selecting the language model and acoustic model best suited to the speech recognition task; speech recognition means for recognizing speech uttered by the user; intention detection means for analyzing and detecting the intention of the user's utterance; search means for searching the voice search dictionary for the detected intention; and web page changing means for changing the web page based on the search result. According to the invention of claim 1, words can be extracted from all the downloaded web pages and links assigned to the extracted words to create a voice search dictionary, so that not only the link items of a web page but also its contents can be searched by voice. In addition, because a speech recognition task can be created from the words extracted from the web pages, the language model and acoustic model best suited to voice search of the downloaded web pages can be selected.

[0005] The invention according to claim 2 is characterized in that the word extracting means extracts text from all the web pages downloaded by the web page acquisition means and then extracts words by morphologically analyzing that text. According to the invention of claim 2, the word extracting means can extract words by analyzing the text written in all the web pages downloaded by the web page acquisition means.

[0006] The invention according to claim 3 is the voice search method according to claim 1 or claim 2, characterized in that the link assigning means assigns one or more links to each word extracted by the word extracting means. According to the invention of claim 3, the link assigning means can assign one or more links to a word.

[0007] The invention according to claim 4 is characterized in that the task creating means compares the group of words extracted by the word extracting means and takes a task of similar concept, or a task of high relevance, as the speech recognition task. According to the invention of claim 4, the task creating means can create, from the group of words extracted from a web page, a speech recognition task suitable for voice search of that web page.

[0008] The invention according to claim 5 is characterized in that the language model / acoustic model selecting means matches the speech recognition task created by the task creating means against the tasks of a language model / acoustic model group prepared in advance, and selects a task of similar concept or a task of high relevance. According to the invention of claim 5, the language model and acoustic model best suited to the speech recognition task created by the task creating means can be selected from the language model / acoustic model group prepared in advance.

[0009] The invention according to claim 6 is characterized in that the language model / acoustic model selecting means matches the speech recognition task created by the task creating means against the tasks of a speech recognition task group prepared in advance, and selects a semantically close task. According to the invention of claim 6, the language model and acoustic model best suited to the speech recognition task created by the task creating means can be selected by way of the speech recognition task group prepared in advance.

[0010] The invention according to claim 7 is characterized in that, when speech uttered by the user is recognized, a speech recognition task is selected dynamically for the recognition. According to the invention of claim 7, the speech recognition means can dynamically select a speech recognition task suited to the content of the user's utterance, so that highly accurate speech recognition can be achieved.

[0011] The invention according to claim 8 is characterized in that, when speech uttered by the user is recognized, speech recognition tasks are combined dynamically for the recognition. According to the invention of claim 8, the speech recognition means can dynamically combine speech recognition tasks suited to the content of the user's utterance, so that highly accurate speech recognition can be achieved.

[0012] The invention according to claim 9 is a voice search apparatus for web pages, characterized by comprising: web page acquisition means for prefetching and downloading, to a pre-specified number of hierarchy levels, the web pages linked to the currently displayed web page; word extracting means for extracting words from all the downloaded web pages; link assigning means for associating each extracted word with the URL of the web page from which the word was extracted; dictionary creating means for creating a voice search dictionary from the words and the URLs associated with them; task creating means for creating a speech recognition task from the words; language model / acoustic model selecting means for selecting the language model and acoustic model best suited to the speech recognition task; speech recognition means for recognizing speech uttered by the user; intention detection means for analyzing and detecting the intention of the user's utterance; search means for searching the voice search dictionary for the detected intention; and web page changing means for changing the web page based on the search result. According to the invention of claim 9, it is possible to provide a web page voice recognition apparatus that can search by voice not only the link items of a web page but also its contents, and that achieves highly accurate speech recognition even when the user speaks naturally.

[0013] The invention according to claim 10 is a voice search program for web pages, characterized by causing a computer, in order to perform voice search of web pages, to function as: web page acquisition means for prefetching and downloading, to a pre-specified number of hierarchy levels, the web pages linked to the currently displayed web page; word extracting means for extracting words from all the downloaded web pages; link assigning means for associating each extracted word with the URL of the web page from which the word was extracted; dictionary creating means for creating a voice search dictionary from the words and the URLs associated with them; task creating means for creating a speech recognition task from the words; language model / acoustic model selecting means for selecting the language model and acoustic model best suited to the speech recognition task; speech recognition means for recognizing speech uttered by the user; intention detection means for analyzing and detecting the intention of the user's utterance; search means for searching the voice search dictionary for the detected intention; and web page changing means for changing the web page based on the search result. According to the invention of claim 10, it is possible to provide a web page voice search program that can search by voice not only the link items of a web page but also its contents, and that achieves highly accurate speech recognition even when the user speaks naturally.

[0014]

[Operation] The web pages linked to the currently displayed web page are prefetched and downloaded to a pre-specified number of hierarchy levels, and words are then extracted from all the downloaded web pages. A voice search dictionary is created by associating each extracted word with the URL of the web page from which it was extracted. A speech recognition task is also created from the extracted words, and the language model and acoustic model best suited to that task are selected. The speech uttered by the user is then recognized, and a word indicating the content of the user's utterance is detected. The detected word is looked up in the voice search dictionary, and the web page is changed based on the search result.

[0015]

[Embodiments of the Invention] Embodiments of the present invention are described below in detail with reference to the drawings. FIG. 1 is a functional block diagram showing a system configuration for implementing the web page voice search method of the present invention. FIG. 2 illustrates the method of extracting hyperlinks, and FIG. 3 illustrates the method of extracting words. FIG. 4 illustrates the method of associating hyperlinks with extracted words, and FIG. 5 shows the voice search dictionary. FIG. 6 illustrates the method of creating a speech recognition task, and FIG. 7 illustrates the method of selecting a language model and an acoustic model. FIG. 8 is a flowchart showing the procedure for voice search of a web page.

[0016] First, the system configuration for implementing the web page voice search method of the present invention is described with reference to the functional block diagram of FIG. 1. In FIG. 1, 1 is a WWW browser, 2 is an HTML analysis unit, 3 is a voice search dictionary, 4 is a thesaurus, 5 is a task control unit, 6 is a language model / acoustic model group, 7 is a speech recognition unit, 8 is an intention analysis unit, and 9 is a browser control unit.

[0017] The WWW browser 1 is software for viewing, on the client side, hypertext files and data stored on WWW servers on the Internet. It downloads and displays the web page (hereinafter simply "page") specified by a URL input by the user or by the browser control unit 9. In this embodiment, the WWW browser 1 is, for example, Microsoft's Internet Explorer or Netscape Communications' Netscape Navigator. The WWW browser 1 is not limited to software used on a PC (personal computer); it may also be software used on, for example, a mobile phone or a PDA (personal digital assistant).

[0018] The HTML analysis unit 2 first analyzes the HTML documents of all the pages that the WWW browser 1 has downloaded to its storage area and creates the voice search dictionary 3. The task control unit 5 then creates a speech recognition task from the created voice search dictionary 3.

[0019] First, the method of creating the voice search dictionary is described. To create the voice search dictionary 3, the HTML documents downloaded by the WWW browser 1 are analyzed and the following steps are performed: (1) extraction of hyperlinks, (2) extraction of words, and (3) association of hyperlinks with the extracted words.

[0020] (1) Extraction of hyperlinks

To extract hyperlinks from an HTML document, the URL specified by the HREF option of the HTML <A></A> tag and the text written between the start tag <A> and the end tag </A> are extracted. All extracted text is morphologically analyzed and divided into words. The extracted URLs and words are then stored in the voice search dictionary 3. For example, as shown in FIG. 2, the URL "http://www.epson.co.jp/" specified by the HREF option of the <A></A> tag and the text "セイコーエプソン株式会社" (Seiko Epson Corporation) written between the start tag <A> and the end tag </A> are extracted. Morphological analysis divides the extracted text into the words "セイコーエプソン" (Seiko Epson) and "株式会社" (Corporation), which are stored in the voice search dictionary 3 together with the URL "http://www.epson.co.jp/". The user can freely specify the level at which text is morphologically analyzed. For example, the text "セイコーエプソン株式会社" can also be treated as a single word instead of being split into "セイコーエプソン" and "株式会社".

Once the hyperlinks have been extracted, the HTML analysis unit 2 controls the browser control unit 9 so that the pages specified by the URLs of the extracted hyperlinks are downloaded to the WWW browser 1. This is repeated until the number of hierarchy levels of the pages downloaded by the WWW browser 1 reaches the pre-specified number. For example, as shown in FIG. 2(b), if prefetching up to two pages ahead of the current page has been specified, the WWW browser 1 first prefetches and downloads pages P2, P3, and P4, which are linked from the current page P1. The WWW browser 1 then further prefetches and downloads pages P5, P6, P7, P8, and P9, which are linked from the downloaded pages P2, P3, and P4.
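As an illustration only, the link extraction and level-limited prefetching described above might be sketched in Python as follows. The specification does not give an implementation; the names `LinkExtractor`, `extract_links`, and `prefetch`, and the caller-supplied `fetch` function, are assumptions made for this sketch, and the morphological analysis of the anchor text is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects (url, anchor_text) pairs from <A HREF=...>...</A> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # list of (absolute_url, anchor_text)
        self._href = None    # HREF of the <A> element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF> matches.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def prefetch(start_url, fetch, max_depth):
    """Download pages breadth-first, up to max_depth link levels ahead.

    `fetch` is a hypothetical caller-supplied function mapping a URL to HTML.
    Returns a dict of url -> HTML for every page downloaded.
    """
    pages = {start_url: fetch(start_url)}
    frontier = [start_url]
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            for link_url, _text in extract_links(pages[url], url):
                if link_url not in pages:
                    pages[link_url] = fetch(link_url)
                    next_frontier.append(link_url)
        frontier = next_frontier
    return pages
```

With `max_depth=2`, a call starting at page P1 would download the pages linked from P1 and then the pages linked from those, which corresponds to the two-pages-ahead case of FIG. 2(b).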

[0021] (2) Extraction of words

To extract words from HTML documents, text is first extracted from the HTML documents of all the pages downloaded by the WWW browser 1, and the extracted text is then morphologically analyzed and divided into words. When text is extracted from an HTML document, the text written immediately after a specific HTML tag, or at a designated position within the tag, is extracted. The position information (row and column) of each extracted text is stored at the time of extraction. The HTML tags subject to text extraction include the <TITLE> tag, which sets the title; the <A> tag, which indicates a link destination; the <IMG> tag, which displays an image; the <H> tags, which set headings; and the <UL>, <OL>, and <DL> tags, which are used to create lists. Text specified by the ALT option of the <IMG> tag is also extracted. For example, as shown in FIG. 3, when the text "セイコーエプソン株式会社" (Seiko Epson Corporation) written immediately after the <TITLE> tag is extracted, it is morphologically analyzed and divided into the words "セイコーエプソン" (Seiko Epson) and "株式会社" (Corporation). As in "(1) Extraction of hyperlinks", the user can freely specify the level at which text is morphologically analyzed. In the example of FIG. 3, the text "セイコーエプソン株式会社" can also be treated as it is, as a single word.
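The text extraction step can be sketched roughly as below. This is a minimal illustration, not the method of the specification: `TextExtractor`, `extract_text`, and `segment` are hypothetical names, and the greedy longest-match `segment` function with a hand-made lexicon merely stands in for a real morphological analyzer, which the specification does not name.

```python
from html.parser import HTMLParser

# Tags whose text is extracted, following the list given in the description.
TEXT_TAGS = {"title", "a", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "dl"}


class TextExtractor(HTMLParser):
    """Collects (text, (line, column)) fragments from the target tags."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._depth = 0  # number of target tags currently open

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # ALT text of <IMG> is also extracted
                self.fragments.append((alt, self.getpos()))
        elif tag in TEXT_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in TEXT_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0 and data.strip():
            self.fragments.append((data.strip(), self.getpos()))


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.fragments


def segment(text, lexicon):
    """Greedy longest-match split; a stand-in for morphological analysis."""
    words, i = [], 0
    while i < len(text):
        match = next((text[i:j] for j in range(len(text), i, -1)
                      if text[i:j] in lexicon), None)
        if match is None:
            i += 1  # skip characters the lexicon does not cover
        else:
            words.append(match)
            i += len(match)
    return words
```

In this sketch, the user-selectable analysis level corresponds to the contents of the lexicon: if the lexicon also contains the full string "セイコーエプソン株式会社", the text is kept as a single word rather than split in two.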

[0022] (3) Association of hyperlinks with the extracted words

To associate hyperlinks with the extracted words, each word extracted from an HTML document in "(2) Extraction of words" is associated with a hyperlink to the page from which it was extracted. Specifically, each word extracted from an HTML document is given a hyperlink based on the URL of the page in which the word appears. At the link destination of the word, an <A NAME> tag, which marks a reference target within the same page, is embedded in the HTML document as link information. The position at which the <A NAME> tag is embedded is determined from the position information (row and column) stored when the text was extracted in "(2) Extraction of words". A word that has been given a hyperlink is stored in the voice search dictionary 3 together with the URL of that hyperlink. For example, as shown in FIG. 4(a), when the word "プリンタ" (printer) is extracted from an HTML document, it is given the URL "http://localhost/index.html#プリンタ", based on the URL of the page in which the word appears. The word "プリンタ" is then stored in the voice search dictionary 3 together with that URL. In addition, as shown in FIG. 4(b), the tag <A NAME="プリンタ">プリンタ</A> is embedded in the HTML document as link information at the link destination of the word.
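A rough sketch of this association step follows, assuming a 1-based line number and a 0-based column for the stored position. The function names `anchored_url` and `embed_anchor` are hypothetical, and the sketch covers only the simple case in which the word lies on a single line at or after the stored column.

```python
def anchored_url(page_url, word):
    """URL given to a word: the page URL plus the word as fragment."""
    return f"{page_url}#{word}"


def embed_anchor(html, word, line, column):
    """Wrap `word` in <A NAME="word">...</A> at the stored position.

    `line` is 1-based and `column` is 0-based (an assumption of this sketch).
    The document is returned unchanged if the word is not found there.
    """
    lines = html.split("\n")
    row = lines[line - 1]
    start = row.find(word, column)
    if start == -1:
        return html
    end = start + len(word)
    lines[line - 1] = f'{row[:start]}<A NAME="{word}">{word}</A>{row[end:]}'
    return "\n".join(lines)
```

For the page "http://localhost/index.html" and the word "プリンタ", `anchored_url` yields "http://localhost/index.html#プリンタ", the URL used in the example of FIG. 4(a).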

[0023] FIG. 5 shows an example of the voice search dictionary 3. The voice search dictionary 3 consists of the words extracted from the HTML documents and the URLs of the hyperlinks assigned to those words. For example, as shown in FIG. 5, the word "セイコーエプソン" (Seiko Epson) is stored in the voice search dictionary 3 together with the URL "http://www.epson.co.jp/", which indicates the link destination of that word. Similarly, the word "プリンタ" (printer) is stored in the voice search dictionary 3 together with the URL "http://localhost/index.html#プリンタ", which indicates the link destination of that word.
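One plausible in-memory shape for such a dictionary is a mapping from each word to a list of URLs, which also accommodates the case in which a word carries more than one link. The sketch below is an assumption for illustration; `build_dictionary` and `lookup` are hypothetical names, and the specification does not state how the dictionary is stored.

```python
from collections import defaultdict


def build_dictionary(entries):
    """Build word -> [URL, ...] from (word, url) pairs, dropping duplicates."""
    dictionary = defaultdict(list)
    for word, url in entries:
        if url not in dictionary[word]:
            dictionary[word].append(url)
    return dict(dictionary)


def lookup(dictionary, word):
    """Return the URLs stored for a word, or an empty list if it is absent."""
    return dictionary.get(word, [])


voice_search_dictionary = build_dictionary([
    ("セイコーエプソン", "http://www.epson.co.jp/"),
    ("プリンタ", "http://localhost/index.html#プリンタ"),
])
```

A word detected in the user's utterance would then be passed to `lookup`, and the resulting URL handed to whatever component changes the displayed page.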

[0024] Next, the method of creating a speech recognition task from the created voice search dictionary 3 is described. To create a speech recognition task, the group of words extracted in "(2) Extraction of words" from the HTML documents of all the pages downloaded to the WWW browser 1 is first taken as a group of keywords for grasping the content of those pages. The task control unit 5 then creates a speech recognition task using this keyword group and the thesaurus 4, which is an external database. For example, as shown in FIG. 6, when the words (keywords) extracted from the HTML documents are "printer", "memory", "hard disk", "product", and "purchase", the task control unit 5 refers to the thesaurus 4, compares the relatedness of these keywords, and creates the speech recognition task "purchase of PC-related equipment".
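The specification does not say how the keywords are compared against the thesaurus. As one possible reading, the sketch below maps each keyword to a broader concept and ranks the concepts by how many keywords fall under them; the thesaurus contents, the concept labels, and the function name `create_task` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical thesaurus: keyword -> broader concept.
THESAURUS = {
    "printer": "PC-related equipment",
    "memory": "PC-related equipment",
    "hard disk": "PC-related equipment",
    "product": "purchase",
    "purchase": "purchase",
}


def create_task(keywords, thesaurus):
    """Return the concepts covering the keywords, most frequent first."""
    counts = Counter(thesaurus[w] for w in keywords if w in thesaurus)
    return [concept for concept, _ in counts.most_common()]


task = create_task(
    ["printer", "memory", "hard disk", "product", "purchase"], THESAURUS
)
```

Here the ranked concepts "PC-related equipment" and "purchase" together play the role of the task label "purchase of PC-related equipment" in the example of FIG. 6.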

[0025] The thesaurus 4 is a database that collects synonyms, near-synonyms, and the like. It organizes the relationships among the words extracted by the HTML analysis unit 2, defines the interrelations between words and the scope of application of each word, and assists in creating a speech recognition task from the group of words that the HTML analysis unit 2 extracted from the HTML documents. As described later, the thesaurus 4 also assists the task control unit 5 in selecting a language model (LM) and an acoustic model (AM) from the language model / acoustic model group 6.

【００２６】タスク制御部５は、音声認識タスクを作成
するとともに、作成された音声認識タスクに基づいて、
言語モデル・音響モデル６群の中から、作成された音声
認識タスクに最適な言語モデル（ＬＭ）と音響モデル
（ＡＭ）を選択する。言語モデル（ＬＭ）と音響モデル
（ＡＭ）を選択する際は、作成された音声認識タスク
と、言語モデル・音響モデル群６が有するタスクとを照
合して、類似概念のタスクまたは関連度の高いタスクを
The task control unit 5 creates a speech recognition task and, based on the created task, selects from the language model / acoustic model group 6 the language model (LM) and acoustic model (AM) best suited to it. To make the selection, the created speech recognition task is collated with the tasks held by the language model / acoustic model group 6, and a task that is conceptually similar or highly related is selected. The thesaurus 4 is used for this collation. For example, as shown in FIG. 7, suppose that the speech recognition task created by the task control unit 5 is "purchase of personal computer-related equipment" and that the tasks held by the language model / acoustic model group 6 are a "place name search task", a "hotel reservation task", a "shopping task", and a "personal computer-related article reading task". Referring to the thesaurus 4, "purchase of personal computer-related equipment" and the "personal computer-related article reading task" are recognized as similar concepts, so the language model (LM) and acoustic model (AM) for the "personal computer-related article reading task" are selected. In this embodiment the thesaurus 4, which is an external database, is used to collate the speech recognition task with the tasks held by the language model / acoustic model group 6, but the collation method is not limited to that of this embodiment, and various methods can be used. As shown in FIG. 8, a speech recognition task group 10 may also be prepared in place of the language model / acoustic model group 6, so that a task semantically close to the created speech recognition task is selected from the speech recognition task group 10. Speech recognition tasks and language models (LM) / acoustic models (AM) may of course also be associated with each other in advance. Furthermore, in this embodiment each speech recognition task corresponds one-to-one to a language model (LM) / acoustic model (AM), but a plurality of language models (LM) and acoustic models (AM) may be combined for a single speech recognition task.
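
The thesaurus-based selection described above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the thesaurus contents, the concept labels, the model names, and the functions `concepts` and `select_models` are all hypothetical, and similarity is approximated simply by counting shared concept labels.

```python
# Hypothetical thesaurus: term -> concept label (illustrative data only).
THESAURUS = {
    "personal computer": "computing",
    "hotel": "travel",
    "place name": "geography",
    "shopping": "commerce",
}

# Hypothetical model group: prepared task -> (language model, acoustic model) names.
MODEL_GROUP = {
    "place name search task": ("lm_place", "am_place"),
    "hotel reservation task": ("lm_hotel", "am_hotel"),
    "shopping task": ("lm_shopping", "am_shopping"),
    "personal computer-related article reading task": ("lm_pc_news", "am_pc_news"),
}


def concepts(task_name, thesaurus=THESAURUS):
    """Return the concept labels of all thesaurus terms found in the task name."""
    lowered = task_name.lower()
    return {concept for term, concept in thesaurus.items() if term in lowered}


def select_models(created_task, model_group=MODEL_GROUP, thesaurus=THESAURUS):
    """Return the (LM, AM) pair of the prepared task sharing the most concepts."""
    target = concepts(created_task, thesaurus)
    best_task, best_score = None, 0
    for task in model_group:
        score = len(target & concepts(task, thesaurus))
        if score > best_score:
            best_task, best_score = task, score
    # No prepared task shares a concept: best_task is None and .get returns None.
    return model_group.get(best_task)
```

With this toy data, the task "purchase of personal computer-related equipment" shares a concept only with the "personal computer-related article reading task", which mirrors the example of FIG. 7. A practical thesaurus would of course express much richer relations than a flat term-to-label table.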

[0027] The language model / acoustic model group 6 consists of a plurality of language models (LM) and acoustic models (AM), and each language model (LM) / acoustic model (AM) holds data suited to the content of a particular speech recognition task. That is, each language model (LM) is created from text data in a specific field, and each acoustic model (AM) is created from speech data in a specific field. The language model (LM) and acoustic model (AM) may be probabilistic/statistical models based on statistics or structural models based on grammar. Each language model (LM) also has a vocabulary dictionary for speech recognition; in other words, a vocabulary dictionary for speech recognition is prepared for each language model (LM). For example, the language model (LM) / acoustic model (AM) corresponding to the speech recognition task "personal computer-related article reading task" consists of a language model (LM) created from articles related to personal computers and an acoustic model (AM) created from utterances of articles related to personal computers. The language model / acoustic model group 6 need not be installed in a local environment such as a PC (personal computer); one installed on a network such as the Internet can also be used.

[0028] The speech recognition unit 7 recognizes speech uttered by the user, targeting the pages downloaded by the WWW browser 1 and using the language model (LM) / acoustic model (AM) selected by the task control unit 5. To recognize speech, the speech recognition unit 7 first analyzes the speech uttered by the user and extracts the information needed for recognition. It then refers to the language model (LM) / acoustic model (AM) selected for the speech recognition task and obtains a word string representing the utterance content. The word string here is a sequence of words from the vocabulary of the language model (LM). Words obtained from the pages downloaded by the WWW browser 1 may not be included in the vocabulary of the language model (LM), but the handling of such unknown words is omitted here to simplify the description.

[0029] The intention analysis unit 8 analyzes the word string obtained by the speech recognition unit 7 and detects the word that indicates the content of the user's utterance. Based on the analysis result, it then searches for the page desired by the user using the voice search dictionary 3. The word string is analyzed using natural language processing such as pattern matching, semantic analysis, morphological analysis, and case structure analysis. The word detected as the analysis result is looked up in the voice search dictionary 3 to determine the destination page. For example, when pattern matching is used, the unit first finds portions of the form "〜を見たい" ("I want to see ~") or "〜を知りたい" ("I want to know ~") in the word string obtained by the speech recognition unit 7, and then detects the word corresponding to the "〜" portion. The detected word is then looked up in the voice search dictionary 3. The page found is judged to be the page desired by the user and is set as the destination page, and its URL is obtained.
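
The pattern-matching variant of the intention analysis can be sketched as follows. This is a minimal illustration under stated assumptions: the regular expression, the function names `detect_keyword` and `find_url`, and the sample dictionary entry are hypothetical, and the recognized word string is treated as one plain string rather than a sequence of vocabulary entries.

```python
import re

# Matches "<X>を見たい" (want to see X) and "<X>を知りたい" (want to know X).
WISH_PATTERN = re.compile(r"(.+?)を(?:見たい|知りたい)")


def detect_keyword(utterance):
    """Return the part matching the "〜" slot, or None if no pattern applies."""
    match = WISH_PATTERN.search(utterance)
    return match.group(1) if match else None


def find_url(utterance, search_dictionary):
    """Look the detected keyword up in a word -> list-of-URLs dictionary."""
    word = detect_keyword(utterance)
    if word is None:
        return None
    urls = search_dictionary.get(word)
    return urls[0] if urls else None
```

A usage example: with the dictionary `{"プリンタ": ["http://example.com/printer.html"]}`, the utterance "プリンタを見たい" yields the keyword "プリンタ" and that URL. The sketch takes everything before "を" as the keyword, so fillers at the start of an utterance would end up in it; combining the pattern with morphological analysis, which the description also mentions, would be one way to avoid that.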

[0030] The browser control unit 9 controls the WWW browser 1 based on the analysis result of the intention analysis unit 8. Specifically, it inputs the URL obtained by the intention analysis unit 8 into the WWW browser 1 to change the page. When the destination is the same page as the one currently displayed, the page is not downloaded again from the network; instead, the page acquired when the HTML document was analyzed is passed to the WWW browser 1.

[0031] Next, the procedure for voice search of the WWW is described with reference to the flowchart shown in FIG. 9. First, in step S1, the user inputs an arbitrary URL into the WWW browser 1. The user inputs the URL using a keyboard or mouse; the URL may also be input by voice. In step S2, the WWW browser 1 downloads and displays the page indicated by the URL. In step S3, it is determined whether the page displayed in step S2 is the page desired by the user. If it is, the procedure ends. If the user desires another page, the procedure proceeds to step S4.

[0032] In step S4, the HTML analysis unit 2 acquires HTML documents from the page currently displayed in the WWW browser 1 and from the pages linked to it. In step S5, the HTML documents acquired in step S4 are analyzed, and hyperlinks and words are extracted. This processing is repeated until the number of layers of pages downloaded by the WWW browser 1 reaches a number specified in advance. In step S6, it is determined whether the number of layers of pages downloaded by the WWW browser 1 has reached the specified number. If it has, the procedure proceeds to step S7; if it has not, the procedure returns to step S4.

[0033] In step S7, the HTML analysis unit 2 creates the voice search dictionary 3 from the hyperlinks and words extracted in step S5. In step S8, the task control unit 5 creates a speech recognition task from the voice search dictionary 3 created in step S7. In step S9, a language model (LM) / acoustic model (AM) is selected from the language model / acoustic model group 6 based on the speech recognition task created in step S8.
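
Steps S4 to S7 (prefetching linked pages down to a fixed number of layers, extracting hyperlinks and words, and building the voice search dictionary) can be sketched as follows. The sketch makes several assumptions that are not in the description: pages are obtained through a caller-supplied `fetch` function, words are split with a simple regular expression instead of morphological analysis, and the names `LinkAndTextParser` and `build_search_dictionary` are hypothetical.

```python
import re
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collects href values of <a> tags and all text nodes of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def build_search_dictionary(start_url, fetch, max_depth):
    """Map each word to the URLs of the pages it was extracted from.

    fetch(url) returns the HTML of a page, or None if it is unavailable.
    Layer 0 is the displayed page; linked pages are followed up to max_depth.
    """
    dictionary = {}
    seen = {start_url}
    frontier = [start_url]
    for _ in range(max_depth + 1):          # S6: stop at the specified layer
        next_frontier = []
        for url in frontier:
            html = fetch(url)               # S4: acquire the HTML document
            if html is None:
                continue
            parser = LinkAndTextParser()    # S5: extract hyperlinks and words
            parser.feed(html)
            for word in re.findall(r"\w+", " ".join(parser.text).lower()):
                urls = dictionary.setdefault(word, [])
                if url not in urls:
                    urls.append(url)        # S7: associate the word with its URL
            for link in parser.links:
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return dictionary
```

Because `fetch` is passed in, the sketch can be tried on an in-memory table of pages (for example `pages.get` for a dict named `pages`) without network access. Relative links are used as-is here; resolving them against the page URL is left out.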

[0034] When the user speaks in step S10, the speech recognition unit 7, in step S11, recognizes the speech uttered by the user using the language model (LM) / acoustic model (AM) selected in step S9 and obtains a word string representing the utterance content. In step S12, it is determined whether the speech recognition unit 7 was able to recognize the user's utterance. If the utterance was recognized, the procedure proceeds to step S13. If it was not, the procedure returns to step S10.

[0035] In step S13, the intention analysis unit 8 analyzes the word string obtained in step S11 and detects the word indicating the content of the user's utterance. In step S14, the word detected in step S13 is looked up in the voice search dictionary 3, and the URL of the destination page is obtained. In step S15, the browser control unit 9 inputs the URL obtained in step S14 into the WWW browser 1 to change the page.
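
The control flow of steps S10 to S15 can be sketched as a simple loop. This is an illustration only: `recognize` and `detect_word` stand in for the speech recognition unit and the intention analysis unit and are supplied by the caller, the function name `voice_search_loop` is hypothetical, and continuing with the next utterance when the detected word is not in the dictionary is an assumption, since the flowchart description does not cover that case.

```python
def voice_search_loop(utterances, recognize, detect_word, search_dictionary):
    """Return the URL of the destination page, or None if none is found."""
    for audio in utterances:                    # S10: the user speaks
        word_string = recognize(audio)          # S11: recognition with the selected LM/AM
        if word_string is None:                 # S12: not recognized, wait for the next utterance
            continue
        word = detect_word(word_string)         # S13: intention analysis
        urls = search_dictionary.get(word, [])  # S14: look the word up in the dictionary
        if urls:
            return urls[0]                      # S15: this URL is passed to the browser
    return None
```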

[0036] One embodiment of the web page voice search method of the present invention has been described above, but the present invention is not limited to this embodiment and can be implemented with a wide range of modifications. For example, this embodiment describes the analysis of HTML documents downloaded to the WWW browser 1, but the documents are not limited to HTML and may be written in XML (Extensible Markup Language) or the like. Also, although a WWW browser program is used to download web pages in this embodiment, web pages may be downloaded by other means. Furthermore, by saving information on pages that have already been analyzed as a history, processing such as page prefetching, creation of the voice search dictionary, and creation of the speech recognition task can be omitted on subsequent occasions.

【００３７】[0037]

[Effects of the Invention] According to the present invention, voice search can be performed not only on the link items of a web page but also on the content of the web page, so a voice search with a high degree of freedom can be realized. In addition, because a speech recognition task can be created from the words extracted from the web pages, the language model and acoustic model best suited to voice search of the downloaded web pages can be selected, so highly accurate speech recognition can be realized.

[Brief description of the drawings]

FIG. 1 is a functional block diagram showing a system configuration for implementing the web page voice search method of the present invention.

FIG. 2 is a diagram illustrating a method of extracting hyperlinks.

FIG. 3 is a diagram illustrating a method of extracting words.

FIG. 4 is a diagram illustrating a method of associating hyperlinks with extracted words.

FIG. 5 is a diagram showing the voice search dictionary.

FIG. 6 is a diagram illustrating a method of creating a speech recognition task.

FIG. 7 is a diagram illustrating a method of selecting a language model / acoustic model.

FIG. 8 is a diagram illustrating another method of selecting a language model / acoustic model.

FIG. 9 is a flowchart showing the procedure for voice search of web pages.

[Explanation of symbols]

1 WWW browser
2 HTML analysis unit
3 Voice search dictionary
4 Thesaurus
5 Task control unit
6 Language model / acoustic model group
7 Speech recognition unit
8 Intention analysis unit
9 Browser control unit
10 Speech recognition task group

Continued from the front page
(51) Int.Cl.7 / Identification symbol / FI / Theme code (reference): G10L 15/18, G10L 3/00 537J, 15/00, 551A, 15/28, 551P
F-terms (reference): 5B075 KK07 ND16 PP10 PP14 PP24 PQ02 QM07 UU40; 5D015 GG01 KK01 LL10

Claims

[Claims]

1. A web page voice search method comprising: web page acquisition means for prefetching and downloading web pages linked to a currently displayed web page, down to a predetermined number of layers; word extracting means for extracting words from all the downloaded web pages; link assigning means for associating each extracted word with the URL of the web page from which the word was extracted; dictionary creating means for creating a voice search dictionary from the words and the URLs associated with the words; task creating means for creating a speech recognition task from the words; language model / acoustic model selecting means for selecting a language model and an acoustic model optimal for the speech recognition task; speech recognition means for recognizing speech uttered by a user; intention detecting means for analyzing and detecting the intention of the user's utterance content; search means for searching the voice search dictionary for the detected intention; and web page changing means for changing the web page based on the search result.

2. The web page voice search method according to claim 1, wherein the word extracting means extracts text from all the web pages downloaded by the web page acquisition means and then extracts words by morphologically analyzing the text.

3. The web page voice search method according to claim 1, wherein the link assigning means assigns one or more links to each word extracted by the word extracting means.

4. The web page voice search method according to claim 1, wherein the task creating means compares the words of the word group extracted by the word extracting means and sets, as the speech recognition task, a task that is conceptually similar or highly related to the word group.

5. The web page voice search method according to any of the above claims, wherein the language model / acoustic model selecting means collates the speech recognition task created by the task creating means with the tasks held by a language model / acoustic model group prepared in advance and selects a task that is conceptually similar or highly related.

6. The web page voice search method according to any one of claims 1 to 5, wherein the language model / acoustic model selecting means collates the speech recognition task created by the task creating means with the tasks of a speech recognition task group prepared in advance and selects a semantically close task.

7. The web page voice search method according to any of the above claims, wherein the speech recognition means dynamically selects the speech recognition task and performs speech recognition when recognizing speech uttered by the user.

8. The web page voice search method according to any of the above claims, wherein the speech recognition means performs speech recognition by dynamically combining the speech recognition tasks when recognizing speech uttered by the user.

9. A web page voice search apparatus comprising: web page acquisition means for prefetching and downloading web pages linked to a currently displayed web page, down to a predetermined number of layers; word extracting means for extracting words from all the downloaded web pages; link assigning means for associating each extracted word with the URL of the web page from which the word was extracted; dictionary creating means for creating a voice search dictionary from the words and the URLs associated with the words; task creating means for creating a speech recognition task from the words; language model / acoustic model selecting means for selecting a language model and an acoustic model optimal for the speech recognition task; speech recognition means for recognizing speech uttered by a user; intention detecting means for analyzing and detecting the intention of the user's utterance content; search means for searching the voice search dictionary for the detected intention; and web page changing means for changing the web page based on the search result.

10. A web page voice search program for causing a computer, in order to perform voice search of web pages, to function as: web page acquisition means for prefetching and downloading web pages linked to a currently displayed web page, down to a predetermined number of layers; word extracting means for extracting words from all the downloaded web pages; link assigning means for associating each extracted word with the URL of the web page from which the word was extracted; dictionary creating means for creating a voice search dictionary from the words and the URLs associated with the words; task creating means for creating a speech recognition task from the words; language model / acoustic model selecting means for selecting a language model and an acoustic model optimal for the speech recognition task; speech recognition means for recognizing speech uttered by a user; intention detecting means for analyzing and detecting the intention of the user's utterance content; search means for searching the voice search dictionary for the detected intention; and web page changing means for changing the web page based on the search result.