JP2002169585A

JP2002169585A - Device and method for voice browser and storage medium with program stored therein

Info

Publication number: JP2002169585A
Application number: JP2000370347A
Authority: JP
Inventors: Takanari Ueda; 隆也上田; Yuji Ikeda; 裕治池田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2000-12-05
Filing date: 2000-12-05
Publication date: 2002-06-14

Abstract

PROBLEM TO BE SOLVED: To enable voice input even when a grammar/dictionary for voice recognition is not prepared on the content side. SOLUTION: A voice browser capable of voice input is equipped with a voice input part 107 which inputs a voice, a voice recognition part 108 which recognizes the inputted voice, a content dictionary holding part 103 which holds a content dictionary for voice recognition described in content, a user dictionary holding part 105 which holds user dictionaries for voice recognition prepared for each user, and a dictionary switching part 104 which switches the dictionary used by the voice recognition part 108 to a user dictionary when no content dictionary exists for the input field that is the target of data input among the input fields provided in the content.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition method and apparatus suitable for a voice browser.

[0002]

[Description of the Related Art] With the spread of the Internet, access to Web information has become commonplace. Furthermore, with advances in speech recognition and synthesis technology, so-called voice browsers, which access Web pages by voice over a telephone or the like, have also come into use (for example, Japanese Patent Application Laid-Open No. H11-249867).

[0003] The above accesses Web pages using voice alone, but a form in which a GUI and voice are used together to access Web pages is also conceivable (referred to here as a composite browser). For example, in Japanese Patent Application Laid-Open No. H10-124293, a link in a Web page can be selected by voice input, and the contents of the Web page can be read aloud by synthesized speech. In this conventional example, a phrase to which an anchor is attached is analyzed and assigned a reading, which makes speech recognition of that phrase possible. If a reading cannot be assigned, the user registers one, which enables voice input.

[0004] In the above Japanese Patent Application Laid-Open No. H10-124293, only link selection is possible by voice input; input to input fields and the like is performed using the GUI. It is easy to see that, if the content provider describes a grammar/dictionary for speech recognition in the content, as is done for general voice browsers, input to input fields and the like can be performed by voice in a composite browser as well. For example, in the hypothetical content description shown in FIG. 6, the grammar/dictionary is specified by the grammar attribute of the input element. Here, the grammar/dictionary denoted by "chimei" contains notations and their readings, as shown for example in FIG. 4. This makes it possible to recognize voice input to the input field 603.
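As a rough illustration of a dictionary that pairs notations with readings, the following Python sketch parses a small text dictionary into a lookup table keyed by reading. The tab-separated file format, the function name `parse_dictionary`, and the sample entries are assumptions made for this example; the actual contents of FIG. 4 are not reproduced here.

```python
from typing import Dict


def parse_dictionary(text: str) -> Dict[str, str]:
    """Parse 'notation<TAB>reading' lines into a reading -> notation map.

    The tab-separated layout is an assumption for illustration only.
    """
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        notation, reading = line.split("\t", 1)
        entries[reading.strip()] = notation.strip()
    return entries


# Hypothetical contents of a place-name ("chimei") dictionary.
CHIMEI_TEXT = "東京\tとうきょう\n横浜\tよこはま\n"
chimei = parse_dictionary(CHIMEI_TEXT)
```

Keying the table by reading reflects that a recognizer works from the spoken form and returns the written form for the input field.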

[0005]

[Problems to be Solved by the Invention] However, in the conventional composite browser, when no grammar/dictionary is described in the content, speech cannot be recognized, and therefore input to an input field cannot be performed by voice. As a result, only content that the content provider created with voice input in mind could be accessed by voice. For example, the content shown in FIG. 7 describes no grammar/dictionary, so voice could not be input to the input field 703.

[0006] The present invention provides a user dictionary for speech recognition and enables voice input by performing speech recognition with the user dictionary even when no grammar/dictionary for speech recognition is provided on the content side.

[0007]

[Means for Solving the Problems] To solve this problem, a voice browser device of the present invention has, for example, the following configuration. That is, a voice browser device capable of input by voice comprises: voice input means for inputting voice; voice recognition means for recognizing the input voice; content dictionary holding means for holding a content dictionary for voice recognition described in content; user dictionary holding means for holding a user dictionary for voice recognition prepared for each user; and dictionary switching means for switching the dictionary used by the voice recognition means to the user dictionary when no content dictionary exists that corresponds to the input field that is the target of data input among the input fields provided in the content.

[0008]

[Embodiments of the Invention] [Embodiment 1] Embodiments of the present invention are described in detail below with reference to the drawings.

[0009] FIG. 1 is a block diagram showing the basic configuration of an apparatus according to an embodiment of the present invention.

[0010] Reference numeral 101 denotes a content holding unit that holds content to be displayed by the browser. 102 denotes a content analysis unit that analyzes the content held in the content holding unit 101. 103 denotes a content dictionary holding unit that holds a speech recognition dictionary when one is described in the content. The content dictionary holding unit 103 holds, for example, data 401 as shown in FIG. 4. 104 denotes a dictionary switching unit that switches the speech recognition dictionary used by the speech recognition unit 108. 105 denotes a user dictionary holding unit that holds a user dictionary. The user dictionary holding unit 105 holds, for example, data 501 as shown in FIG. 5. It is assumed that such data has been registered in the user dictionary holding unit 105 in advance. 106 denotes an input analysis unit that analyzes voice input from the voice input unit 107 and GUI input from the input unit 109. 107 denotes a voice input unit that performs voice input. 108 denotes a speech recognition unit that performs speech recognition. 109 denotes an input unit that performs GUI input. 110 denotes a display unit that displays the content. 111 denotes a content dictionary presence/absence determination unit that determines whether a content dictionary exists.

[0011] FIG. 2 is a diagram showing a specific configuration of the apparatus of this embodiment.

[0012] Reference numeral 201 denotes a CPU, which operates according to a program that implements the procedure described later. 202 denotes a memory, which provides the content holding unit 101, the content dictionary holding unit 103, the user dictionary holding unit 105, and the storage area needed for the program to operate. 203 denotes a control memory, which holds the program that implements the procedure described later. 204 denotes a pointing device, which implements the input unit 109. 205 denotes a display, which implements the display unit 110. 206 denotes a microphone, which implements the voice input unit 107. 207 denotes a bus that connects the components.

[0013] Next, the operation of the apparatus of this embodiment is described with reference to the flowchart shown in FIG. 3. This embodiment deals with the case where one item of content contains one input field.

[0014] First, in step S301, content is acquired from a network (not shown) or the like and held in the content holding unit 101.

[0015] In step S302, the content analysis unit 102 analyzes the content held in the content holding unit 101. If the content contains a description of a grammar/dictionary for speech recognition, the grammar/dictionary is extracted based on that description. It is assumed that the grammar/dictionary is described in the content by tags or attributes of the content description language. For example, in the hypothetical content description of FIG. 6, a file name is given by the grammar attribute of the input element, so that file is acquired.
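The analysis in step S302 can be sketched as follows, assuming the content is HTML-like markup in which an input element may carry a grammar attribute. The class and function names and the sample markup are hypothetical and do not reproduce FIG. 6; Python's standard `html.parser` is used only as a convenient stand-in for the content analysis unit.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class GrammarExtractor(HTMLParser):
    """Collect (field name, grammar reference) for each input element."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: List[Tuple[Optional[str], Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            # grammar is None when the content describes no dictionary.
            self.fields.append(
                (attributes.get("name"), attributes.get("grammar"))
            )


def extract_grammars(content: str) -> List[Tuple[Optional[str], Optional[str]]]:
    parser = GrammarExtractor()
    parser.feed(content)
    parser.close()
    return parser.fields


# Hypothetical content with and without a grammar attribute.
with_grammar = '<form><input name="place" grammar="chimei"></form>'
without_grammar = '<form><input name="place"></form>'
```

A grammar value of `None` corresponds to the case in which no content dictionary can be extracted for that field.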

[0016] In step S303, the grammar/dictionary for speech recognition extracted in step S302 is held in the content dictionary holding unit 103.

[0017] In step S304, the input analysis unit 106 checks whether there has been an input. This step is repeated until an input actually occurs.

[0018] In step S305, it is checked whether the input in step S304 is voice input to the input field. If it is, the process proceeds to step S306; otherwise, the process ends.

[0019] In step S306, if automatic mode is selected as the mode for choosing the dictionary used by the speech recognition unit, the process proceeds to step S307.

[0020] In step S307, the content dictionary presence/absence determination unit 111 checks whether a content dictionary is held. If a content dictionary is held, the process proceeds to step S309; if no content dictionary is held, the process proceeds to step S308.

[0021] In step S308, the dictionary switching unit 104 switches the dictionary used by the speech recognition unit 108 to the user dictionary held in the user dictionary holding unit 105.

[0022] In step S309, the dictionary switching unit 104 switches the dictionary used by the speech recognition unit 108 to the content dictionary held in the content dictionary holding unit 103.
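The automatic switching of steps S307 to S309 can be sketched as a small function. This is a minimal sketch that follows the behaviour stated in the summary of the invention (fall back to the user dictionary when no content dictionary is held); the function name and the use of `None` to represent a missing content dictionary are assumptions.

```python
from typing import Dict, Optional

Dictionary = Dict[str, str]  # reading -> notation (assumed representation)


def switch_dictionary(
    content_dictionary: Optional[Dictionary],
    user_dictionary: Dictionary,
) -> Dictionary:
    """Return the dictionary the recognizer should use in automatic mode."""
    if content_dictionary is None:
        # S307 found no content dictionary: use the user dictionary (S308).
        return user_dictionary
    # A content dictionary is held: use it (S309).
    return content_dictionary


# Hypothetical dictionaries for illustration.
content_dict = {"とうきょう": "東京"}
user_dict = {"おおさか": "大阪"}
```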

[0023] In step S310, speech recognition is performed on the voice input received in step S304. If a content dictionary exists, the content dictionary held in the content dictionary holding unit 103 is used; if not, the user dictionary held in the user dictionary holding unit 105 is used.
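To make the role of the selected dictionary in step S310 concrete, the sketch below stands in for a recognizer with a simple closest-match lookup over readings. This is not how an actual speech recognizer works: the acoustic front end is replaced by an assumed reading hypothesis string, and `difflib.get_close_matches` from the Python standard library is used purely for illustration.

```python
import difflib
from typing import Dict, Optional

Dictionary = Dict[str, str]  # reading -> notation (assumed representation)


def recognize(reading_hypothesis: str, active: Dictionary) -> Optional[str]:
    """Return the notation whose reading is closest to the hypothesis.

    Returns None when no reading in the active dictionary is close enough.
    """
    matches = difflib.get_close_matches(
        reading_hypothesis, list(active.keys()), n=1, cutoff=0.6
    )
    return active[matches[0]] if matches else None


# Hypothetical active dictionary chosen by the switching step.
active_dictionary = {"とうきょう": "東京", "よこはま": "横浜"}
```

The point of the sketch is only that the vocabulary available to recognition is exactly the dictionary chosen beforehand, so an utterance outside that dictionary yields no result.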

[0024] In step S311, the result of the speech recognition performed in step S310 is displayed on the display unit 110. The process then ends.

[0025] Next, the processing procedure of this embodiment is described further using a specific example.

[0026] When part of the content is as shown in FIG. 6, a grammar/dictionary is specified in the content (by the grammar attribute of the input element), so it is acquired and held in the content dictionary holding unit 103. When voice is input to the input field 603, speech recognition is performed using this dictionary (for example, 401). On the other hand, when part of the content is as shown in FIG. 7, no grammar/dictionary is specified in the content. In the conventional example, therefore, voice cannot be input to the input field 703. In this embodiment, speech recognition in such a case is performed using the user dictionary held in the user dictionary holding unit 105 (for example, 501), so voice can be input to the input field 703.

[0027] [Embodiment 2] In the above embodiment, when no grammar/dictionary is described in the content, the dictionary is switched to the user dictionary automatically at step S307 in automatic mode. Instead of switching automatically, the user dictionary may be used when the user explicitly specifies it. In addition to the configuration of FIG. 1, a means for specifying use of the user dictionary is provided, and the dictionary is switched when the user makes the specification through this means. That is, when manual (user) mode is selected in step S306, the process proceeds to step S308 and the user dictionary is always used.

[0028] [Embodiment 3] With the same configuration as the preceding embodiment, the user dictionary can be used at the user's specification even when a dictionary is described in the content. For example, when part of the content is as shown in FIG. 6, a grammar/dictionary is specified in the content (for example, the dictionary of FIG. 4). If the user has explicitly specified use of the user dictionary, then when voice is input to the input field 603, speech recognition is performed using the user dictionary (for example, the dictionary of FIG. 5) rather than the content dictionary. That is, when manual (user) mode is selected in step S306, the process proceeds to step S308 and the user dictionary is always used. It is also possible to configure the apparatus so that, when manual (content) mode is selected in step S306, the process proceeds to step S309 and the content dictionary is always used.
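The three modes described for step S306 (automatic, manual user, manual content) can be sketched with an enumeration. The names `Mode` and `select_dictionary` are hypothetical. The text does not say what happens in manual (content) mode when no content dictionary is held; returning `None` in that case is an assumption of this sketch.

```python
from enum import Enum
from typing import Dict, Optional

Dictionary = Dict[str, str]  # reading -> notation (assumed representation)


class Mode(Enum):
    AUTO = "auto"
    MANUAL_USER = "manual_user"
    MANUAL_CONTENT = "manual_content"


def select_dictionary(
    mode: Mode,
    content_dictionary: Optional[Dictionary],
    user_dictionary: Dictionary,
) -> Optional[Dictionary]:
    if mode is Mode.MANUAL_USER:
        return user_dictionary        # S308: always the user dictionary
    if mode is Mode.MANUAL_CONTENT:
        return content_dictionary     # S309: may be None (assumption)
    # Automatic mode: decide from the presence of a content dictionary (S307).
    if content_dictionary is None:
        return user_dictionary
    return content_dictionary


# Hypothetical dictionaries for illustration.
content_dict = {"とうきょう": "東京"}
user_dict = {"おおさか": "大阪"}
```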

[0029] [Embodiment 4] The above embodiments describe the case of a single user dictionary, but there may be a plurality of user dictionaries. Which of the dictionaries to use can be decided either by explicit specification by the user or by automatic determination.

[0030] The former can be implemented in the same manner as the preceding embodiments.

[0031] The latter can be implemented, for example, as follows. Each user dictionary is assigned a classification category in advance (indicating the field the dictionary relates to). The classification category to which the content belongs is determined (a known document classification technique may be used for this), and the user dictionary of that category is adopted. A user dictionary to be used when no user dictionary exists for the relevant category may also be designated in advance.
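The automatic selection among several user dictionaries might look like the following sketch. The keyword-count classifier is only a stand-in for whatever known document classification technique is actually used, and the category names, keyword lists, and function names are all hypothetical.

```python
from typing import Dict, List, Optional

Dictionary = Dict[str, str]  # reading -> notation (assumed representation)


def classify(content_text: str, keywords: Dict[str, List[str]]) -> Optional[str]:
    """Naive stand-in classifier: pick the category with the most keyword hits."""
    best_category: Optional[str] = None
    best_score = 0
    for category, words in keywords.items():
        score = sum(content_text.count(word) for word in words)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def pick_user_dictionary(
    content_text: str,
    user_dictionaries: Dict[str, Dictionary],
    keywords: Dict[str, List[str]],
    default_category: str,
) -> Dictionary:
    category = classify(content_text, keywords)
    if category in user_dictionaries:
        return user_dictionaries[category]
    # No dictionary for this category: use the predetermined default.
    return user_dictionaries[default_category]


# Hypothetical data for illustration.
USER_DICTIONARIES = {
    "travel": {"とうきょう": "東京"},
    "general": {"はい": "はい"},
}
KEYWORDS = {"travel": ["hotel", "station"], "food": ["menu"]}
```

For example, `pick_user_dictionary("Find a hotel near the station", USER_DICTIONARIES, KEYWORDS, "general")` selects the travel dictionary, while content classified as food, for which no dictionary exists, falls back to the general one.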

[0032] [Embodiment 5] The above embodiments take as an example the case where there is one input field, but there may be a plurality of input fields. This case can be handled by performing the same processing as in the above embodiments for each individual input field.
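Applying the same decision to each input field can be sketched as below. The mapping from field names to grammar references and the function name are assumptions for illustration; each field independently falls back to the user dictionary when it has no corresponding content dictionary.

```python
from typing import Dict, Optional

Dictionary = Dict[str, str]  # reading -> notation (assumed representation)


def dictionaries_for_fields(
    field_grammars: Dict[str, Optional[str]],
    content_dictionaries: Dict[str, Dictionary],
    user_dictionary: Dictionary,
) -> Dict[str, Dictionary]:
    """Choose a dictionary per input field, falling back to the user dictionary."""
    chosen: Dict[str, Dictionary] = {}
    for field, grammar in field_grammars.items():
        content_dictionary = (
            content_dictionaries.get(grammar) if grammar is not None else None
        )
        chosen[field] = (
            content_dictionary if content_dictionary is not None else user_dictionary
        )
    return chosen


# Hypothetical content with two fields, only one of which names a grammar.
FIELD_GRAMMARS = {"place": "chimei", "comment": None}
CONTENT_DICTIONARIES = {"chimei": {"とうきょう": "東京"}}
USER_DICTIONARY = {"おおさか": "大阪"}
```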

[0033] [Embodiment 6] The above embodiments take as an example the case where the content dictionary and the user dictionary are simple dictionaries as shown in FIGS. 4 and 5, but the invention can be implemented in the same way when their contents are general speech recognition grammars.

[0034] [Embodiment 7] In the above embodiments the content is not output as speech, but the invention can be implemented in the same way when the content is output as speech.

[0035] [Embodiment 8] The above embodiments describe the case where all units are configured on the same computer, but the invention is not limited to this and may be realized across a plurality of computers.

[0036] [Embodiment 9] The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. Needless to say, the invention is also achieved by supplying a system or apparatus with a recording medium on which the program code of software that implements the functions of the above embodiments is recorded, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the recording medium.

[0037] In this case, the program code read from the recording medium itself implements the functions of the above embodiments, and the recording medium on which that program code is recorded constitutes the present invention.

[0038] [Embodiment 10] As the recording medium for supplying the program code, for example, a floppy (registered trademark) disk, hard disk, optical disk, CD-ROM, CD-R, DVD-ROM, magnetic tape, nonvolatile memory card, ROM, or the like can be used.

[Embodiment 11] Needless to say, the invention covers not only the case where the functions of the above embodiments are implemented by the computer executing the program code it has read, but also the case where an OS or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above embodiments are implemented by that processing.

[0039] [Embodiment 12] Furthermore, needless to say, the invention also covers the case where the program code read from the recording medium is written to a memory provided on a function expansion board inserted in the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above embodiments are implemented by that processing.

[0040]

[Effects of the Invention] As described above, according to the present invention, voice input is possible even for content in which no grammar/dictionary for speech recognition is described.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the basic configuration of an embodiment of a voice browser device according to the present invention.

FIG. 2 is a diagram showing a specific configuration of an embodiment of the present invention.

FIG. 3 is a flowchart showing an outline of the processing in the embodiment of the voice browser device according to the present invention.

FIG. 4 is a diagram showing an example of the contents of a content dictionary in the embodiment of the present invention.

FIG. 5 is a diagram showing an example of the contents of a user dictionary in the embodiment of the present invention.

FIG. 6 is a diagram for specifically explaining the embodiment of the present invention.

FIG. 7 is a diagram for specifically explaining the embodiment of the present invention.

[Explanation of symbols]

101: Content holding unit
102: Content analysis unit
103: Content dictionary holding unit
104: Dictionary switching unit
105: User dictionary holding unit
106: Input analysis unit
107: Voice input unit
108: Speech recognition unit
109: Input unit
110: Display unit
111: Content dictionary presence/absence determination unit
201: CPU
202: Memory
203: Control memory
204: Pointing device
205: Display
206: Microphone
207: Bus

Continued from the front page
(51) Int.Cl.⁷, identification code, FI, theme code (reference): G10L 15/00; G10L 3/00 551P; 15/28
F-terms (reference): 5B075 KK07 KK13 KK33 KK37 ND03 ND14 ND20 ND23 ND36 NK34 PP07 PP12 PP30 PQ02 PQ04 PQ42 UU40; 5D015 KK01 LL10

Claims

1. A voice browser device capable of input by voice, comprising: voice input means for inputting voice; voice recognition means for recognizing the input voice; content dictionary holding means for holding a content dictionary for voice recognition described in content; user dictionary holding means for holding a user dictionary for voice recognition prepared for each user; and dictionary switching means for switching the dictionary used by the voice recognition means to the user dictionary when no content dictionary exists that corresponds to the input field that is the target of data input among the input fields provided in the content.

2. The voice browser device according to claim 1, further comprising content presence/absence determining means for determining the presence or absence of a content dictionary, wherein the dictionary is automatically switched to the user dictionary when the content dictionary does not exist.

3. The voice browser device according to claim 1, further comprising user dictionary designating means by which the user explicitly instructs use of the user dictionary, wherein the dictionary is switched to the user dictionary according to the user's instruction.

4. A voice browser method capable of input by voice, comprising: a voice input step of inputting voice; a voice recognition step of recognizing the input voice; a content dictionary holding step of holding a content dictionary for voice recognition described in content; a user dictionary holding step of holding a user dictionary for voice recognition prepared for each user; and a dictionary switching step of switching the dictionary used in the voice recognition step to the user dictionary when no content dictionary exists that corresponds to the input field that is the target of data input among the input fields provided in the content.

5. The voice browser method according to claim 4, further comprising a content presence/absence determination step of determining the presence or absence of a content dictionary, wherein the dictionary is automatically switched to the user dictionary when the content dictionary does not exist.

6. The voice browser method according to claim 4, further comprising a user dictionary specifying step in which the user explicitly instructs use of the user dictionary, and switching to the user dictionary according to the user's instruction.

7. A computer-readable storage medium storing the steps included in the voice browser method according to claim 4 as a program for causing a computer to execute the steps.