JPH11184671A

JPH11184671A - Method, device and system for presenting information

Info

Publication number: JPH11184671A
Application number: JP9353710A
Authority: JP
Inventors: Yoshitaka Kuwata; 喜隆桑田
Original assignee: NTT Data Corp
Current assignee: NTT Data Group Corp
Priority date: 1997-12-22
Filing date: 1997-12-22
Publication date: 1999-07-09

Abstract

PROBLEM TO BE SOLVED: To provide an information presenting device capable of flexibly controlling the presentation style of electronic information to be presented in voice. SOLUTION: A control pattern file 18 and a voice recognition dictionary 19 are composed of a control pattern file group and a voice recognition dictionary group prepared for every information unit related to a presentation style of electronic information to be presented. When the electronic information in an information presentation processing part 13 becomes an object of voice recognition control, recognition processing of an inputted voice is performed at a voice recognizing part 15 based on the voice recognition dictionary 19. At a voice control part 16, a control pattern corresponding to a vocabulary recognized on the basis of the control pattern file 18 is detected. The information presentation processing part 13 presents the corresponding electronic information in the presentation style based on the control information of the relevant control pattern.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a technique for presenting digitized information using speech recognition technology.

[0002]

[Prior Art] Conventionally, a method is known in which a computer terminal or the like performs speech recognition based on a preset recognition vocabulary and controls the information presented on an information browsing device based on control information corresponding to the speech recognition result.

[0003] FIG. 11 is a functional block diagram of an information browsing system of this type. The information browsing system 110 comprises an information providing server 30 and a plurality of clients 40 used as information browsing devices. The information providing server 30 holds a database 17, which stores the documents or image data to be browsed (hereinafter, browse target data). The means by which the information providing server 30 acquires from the database 17 the data corresponding to a browsing request from a client 40, and the means for controlling communication between the information providing server 30 and the clients 40, are well-known techniques and are omitted here.

[0004] The client 40, for its part, comprises the following functional blocks: a speech recognition unit 111, a speech recognition dictionary 112, a speech recognition control unit 113, and a browser 114. The speech recognition unit 111 recognizes speech input by a user or the like through a voice input device such as a microphone. This recognition is performed based on the speech recognition dictionary 112, which stores the vocabulary group for speech recognition. The speech recognition control unit 113 controls the corresponding information on the browser 114 based on control information corresponding to the recognition result. This control information is set in advance. The browser 114 is the information browsing interface for data provided by the information providing server 30. The information on this browser is what is controlled by voice.

[0005] In this information browsing system 110, a user or the like browsed the desired information by inputting voice control commands to the client 40 and thereby controlling the information on the browser 114.

[0006]

[Problems to Be Solved by the Invention] In the information browsing system described above, however, applying current speech recognition technology as-is when the number of words to be recognized increases lowers the recognition rate, which makes practical use difficult.

[0007] In addition, because the vocabulary to be recognized must be registered in the speech recognition dictionary in advance, the dictionary could not be changed from the information provider side in an information browsing system in which the information provider side and the information viewer side are separated. Consequently, speech-recognition control of information on a browser, for example, was constrained to rely on a preset speech recognition dictionary, and free control for each specific information unit, such as a presented page, was impossible.

[0008] An object of the present invention is therefore to provide an improved information presentation method in which the presentation form of information presented on a browser or the like is controlled flexibly by voice. Another object of the present invention is to provide an information presentation device and system suitable for carrying out this information presentation method.

[0009]

[Means for Solving the Problems] The information presentation method of the present invention, which solves the above problem, is a method using a computer device for changing by voice the presentation form of digitized information to be presented. A recognition vocabulary group subject to speech recognition, and a presentation control information group for presenting the digitized information that is associated with the recognition vocabulary group, are held in advance for each presentation form of the digitized information. The recognition vocabulary that matches the input speech is detected from the recognition vocabulary group, the presentation control information that matches that recognition vocabulary is detected from the presentation control information group, and the corresponding digitized information is presented in a presentation form based on the detected presentation control information.

[0010] The information presentation device of the present invention, which solves the other problem above, is a device for controlling by voice the presentation form of digitized information to be presented, and comprises: a speech recognition dictionary in which recognition vocabulary groups subject to speech recognition are registered in advance for each presentation form; a control pattern file in which presentation control information for presenting the digitized information, corresponding to the recognition vocabulary groups, is set in advance for each presentation form; means for presenting the digitized information corresponding to an information presentation request from an operator and detecting whether that digitized information is subject to presentation control by voice; speech recognition means for, when digitized information subject to presentation control by voice is detected, acquiring speech from the operator and performing speech recognition by detecting from the speech recognition dictionary the recognition vocabulary that matches the speech; and presentation means for detecting from the control pattern file the presentation control information that matches the detected recognition vocabulary and presenting the corresponding digitized information in a presentation form based on that presentation control information.

[0011] In this information presentation device, the speech recognition dictionary is constructed so that unique data corresponding to a plurality of speech recognition data items, which share a common data format that can be matched against the operator's speech, is retrieved as the recognition vocabulary. Alternatively, it is constructed so that a unique character string corresponding to such a plurality of speech recognition data items is retrieved as the recognition vocabulary.

[0012] The control pattern file is constructed so that the presentation control information corresponding to a control pattern, which consists of a recognition vocabulary item or a combination of recognition vocabulary items, can be retrieved. The speech recognition dictionary and the control pattern file may also be built, for example, within the database that stores the digitized information.

[0013] The presentation means is configured, for example, to present the digitized information based on an information unit corresponding to the presentation area of the digitized information.

[0014] The information presentation system of the present invention, which solves the other problem above, comprises a first device and a second device connected for bidirectional communication. The first device creates and holds in advance, for each presentation form of the digitized information, the digitized information to be presented, the speech recognition vocabulary group subject to speech recognition, and the presentation control information group for presenting the digitized information that corresponds to the speech recognition vocabulary group. The second device presents the digitized information corresponding to an information presentation request from an operator.

[0015] In this information presentation system, the second device comprises: means for acquiring from the first device and presenting the digitized information corresponding to an information presentation request from the operator, and detecting whether that digitized information is subject to presentation control by voice; means for, when digitized information subject to presentation control by voice is detected, acquiring speech from the operator, acquiring from the first device the recognition vocabulary group corresponding to that digitized information, and detecting the recognition vocabulary that matches the speech; and presentation means for acquiring from the first device the presentation control information group corresponding to the digitized information subject to presentation control, detecting the presentation control information that matches the detected recognition vocabulary, and acquiring from the first device and presenting the corresponding digitized information in a presentation form based on that presentation control information. The presentation form of the presented digitized information is thereby controlled by voice. The presentation means is configured, for example, to present the digitized information based on an information unit corresponding to the presentation area of a given Web browser.

[0016]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. (First Embodiment) FIG. 1 is a functional block diagram showing an embodiment of an information presentation device to which the present invention is applied. The information presentation device 1 comprises a database 17, a control pattern file 18, and a speech recognition dictionary 19, which are built in the internal or external storage device of a computer device, together with a presentation request input unit 11, a data acquisition unit 12, an information presentation processing unit 13, a voice input unit 14, a speech recognition unit 15, and a voice control unit 16, which are formed when the computer device reads and executes a predetermined program.

[0017] The program is normally stored in the internal or external storage device of the computer device and is read and executed as needed. It may instead be stored on a recording medium separable from the computer device, such as a CD-ROM or FD, and installed in the internal or external storage device at the time of use for execution as needed. In other words, the information presentation device of this embodiment can also be realized by a general-purpose computer device together with a recording medium on which the program for forming the above functional blocks is recorded in computer-readable form.

[0018] The database 17 stores data such as documents and images to be presented (hereinafter, presentation target data). The control pattern file 18 defines pre-registered control patterns, and the control information corresponding to each control pattern, for controlling the information presentation processing unit 13 by voice. The control pattern file 18 stores a group of control pattern files created for each predetermined unit of the presentation target data, such as a page. FIG. 6 shows an example of a control pattern file.

[0019] The vocabulary group subject to speech recognition is registered in advance in the speech recognition dictionary 19. The speech recognition dictionary 19 stores a group of speech recognition dictionaries created for each predetermined unit of the presentation target data, such as a page. FIG. 5 shows an example of a speech recognition dictionary. The control pattern file 18 and the speech recognition dictionary 19 are each built in association with the presentation target data in the database 17.

[0020] The presentation request input unit 11 accepts a presentation request entered by a user or the like through an input device such as a keyboard or mouse (not shown) and passes it to the data acquisition unit 12.

[0021] The data acquisition unit 12 acquires the data corresponding to the input presentation request from the database 17 and passes it to the information presentation processing unit 13.

[0022] The information presentation processing unit 13 presents the data acquired by the data acquisition unit 12. It is implemented using, for example, an interface such as a browser in a Web environment. When the information output by the information presentation processing unit 13 (hereinafter, presentation information) is subject to voice control, the unit is configured to work together with the voice input unit 14 to accept voice input.

[0023] The voice input unit 14 accepts speech input by a user or the like through an input device such as a microphone (not shown) and passes it to the speech recognition unit 15. The speech recognition unit 15 performs recognition on the speech received from the voice input unit 14 based on the speech recognition dictionary 19. When the speech is recognized, the recognition result (hereinafter, recognition vocabulary) is passed to the voice control unit 16. The voice control unit 16 controls the presentation information in the information presentation processing unit 13 based on the recognition vocabulary and the control pattern file 18; that is, it performs speech recognition control (hereinafter, voice control).

[0024] Next, an information presentation method using the information presentation device 1 is described following the procedure of FIG. 2. First, when a presentation request from a user or the like is input (step S101), the presentation request input unit 11 passes the request to the data acquisition unit 12. The data acquisition unit 12 acquires the data corresponding to the request from the database 17 and passes it to the information presentation processing unit 13, which presents it to the user (step S102).

[0025] The information presentation processing unit 13 detects whether the input data, that is, the presentation information, is subject to voice control. If it is (step S103: Yes), the unit starts the voice input unit 14. If it is not (step S103: No), processing returns to step S101 and a new presentation request is accepted.

[0026] The voice input unit 14 acquires speech input by a user or the like through an input device such as a microphone (step S104) and passes it to the speech recognition unit 15. The speech recognition unit 15 performs recognition on the speech (step S105). Here, the vocabulary group subject to speech recognition that corresponds to the presentation information of step S102 is looked up in the speech recognition dictionary 19, and recognition is performed by collating it with the speech, that is, by pattern matching. If the speech is recognized (step S106: Yes), the corresponding recognition vocabulary is passed to the voice control unit 16. If there is no match (step S106: No), the user is notified by a message or the like as appropriate, and processing returns to step S104 for the speech to be input again.

[0027] The voice control unit 16 collates the recognition vocabulary obtained by speech recognition with the control pattern file 18 (step S107). If the recognition vocabulary does not match the control pattern file 18 (step S108: No), the user is notified by a message or the like as appropriate, and processing returns to step S104 for the speech to be input again. Here, setting a control pattern that serves as a default in the control pattern file 18, for example, makes it possible to handle cases where the vocabulary recognized in steps S105 and S106 is not explicitly present in the control pattern file 18.

[0028] If, on the other hand, the recognition vocabulary matches the control pattern file 18 (step S108: Yes) and is other than the control pattern "end" (step S109: No), the voice control unit 16 performs presentation processing on the presentation information of the information presentation processing unit 13 based on that control pattern (step S110). In this presentation processing, for example, the data acquisition unit 12 successively acquires the data corresponding to the control pattern from the database 17 and passes it to the information presentation processing unit 13 for presentation. Alternatively, when the data corresponding to the presentation request is acquired in step S102, the data acquisition unit 12 may acquire all related data in advance and pass it to the information presentation processing unit 13 to be held there, with the voice control unit 16 configured to perform presentation processing based on the control pattern on the data held by the information presentation processing unit 13. If the recognition vocabulary is the control pattern "end" (step S109: Yes), processing ends.

[0029] Step S109 is shown separately for convenience, to highlight how the information presentation device 1 terminates through the control pattern "end". In practice, the control pattern "end" is also defined in the control pattern file 18 and is processed in the same way as step S110.
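
The loop formed by steps S104 to S110 of FIG. 2 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation in the publication: `recognize`, `find_action`, `present` and `notify` are hypothetical callables standing in for the speech recognition unit 15, the voice control unit 16, the information presentation processing unit 13 and the message notification, and the action string "END" is an assumed encoding of the control pattern "end". The explicit END check mirrors step S109 for readability, even though the text treats "end" as an ordinary control pattern.

```python
from typing import Callable, Iterable, List, Optional


def run_voice_session(
    utterances: Iterable[str],
    recognize: Callable[[str], Optional[List[str]]],
    find_action: Callable[[List[str]], Optional[str]],
    present: Callable[[str], None],
    notify: Callable[[str], None] = print,
) -> None:
    """Process utterances until the (assumed) "END" action is matched."""
    for utterance in utterances:        # S104: acquire voice input
        words = recognize(utterance)    # S105: recognition against the dictionary
        if not words:                   # S106: No -> notify and wait for re-input
            notify("speech not recognized")
            continue
        action = find_action(words)     # S107: collate with the control pattern file
        if action is None:              # S108: No -> notify and wait for re-input
            notify("no matching control pattern")
            continue
        if action == "END":             # S109: Yes -> terminate
            return
        present(action)                 # S110: present according to the control info
```

A default control pattern of the kind mentioned for step S108 could be modeled by having `find_action` return a fallback action instead of `None`.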

[0030] FIGS. 3 and 4 show a form of implementation in which a Web browser is used as the information presentation processing unit 13 of the present invention and the description relating to voice control is embedded, using Java, in HTML (HyperText Markup Language) documents.

[0031] In the HTML document "search.html" of FIG. 3 and the HTML document "faq_top.html" of FIG. 4, the descriptions on lines 10 and 11 are the declaration that voice control is to be performed. The document is parsed by the information presentation processing unit 13, that is, the browser, and the program "RecogCont.class" named on line 10 is executed. This program performs voice control taking as arguments the speech recognition dictionary "Recog1" and the control pattern "Pattern1" in the case of "search.html" of FIG. 3, and the speech recognition dictionary "Recog2" and the control pattern "Pattern2" in the case of "faq_top.html" of FIG. 4.
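
The per-document binding described for FIGS. 3 and 4 can be pictured as a lookup from an HTML document to the dictionary and control pattern handed to "RecogCont.class". The following is a minimal sketch under that reading: the type `VoiceResources`, the table `VOICE_RESOURCES` and the function `voice_resources_for` are hypothetical names introduced here, and only the two document names and four resource names come from the text.

```python
from typing import NamedTuple, Optional


class VoiceResources(NamedTuple):
    dictionary: str       # name of the speech recognition dictionary
    control_pattern: str  # name of the control pattern file


# Bindings named in the description of FIGS. 3 and 4.
VOICE_RESOURCES = {
    "search.html": VoiceResources("Recog1", "Pattern1"),
    "faq_top.html": VoiceResources("Recog2", "Pattern2"),
}


def voice_resources_for(document: str) -> Optional[VoiceResources]:
    """Return the bindings for a document, or None if it declares no voice control."""
    return VOICE_RESOURCES.get(document)
```

A `None` result corresponds to the "No" branch of step S103, where the presented information is not subject to voice control.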

[0032] FIGS. 5 and 7 are tables showing examples of the speech recognition dictionaries "Recog1" and "Recog2" that are passed as arguments to the program "RecogCont.class". In these tables, "recognition vocabulary" represents the input speech and "output" represents the output pattern. The "recognition vocabulary" entries are what the input speech data is collated against; they are written in hiragana for convenience, but samples of digitized speech data, for example, may be used instead. "Output" is the output pattern used for internal processing between the speech recognition unit 15 and the voice control unit 16; for example, a unique character string corresponding to each "recognition vocabulary" data item may be set as appropriate.
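
The two-column dictionary described above (a hiragana reading mapped to an output string) can be sketched as a plain mapping. The entries below are assumptions chosen to be consistent with the outputs mentioned elsewhere in the description; the actual contents of FIG. 5 are not reproduced in the text. The function `spot_vocabulary` is a hypothetical text-only stand-in: a real recognizer collates audio, not hiragana strings.

```python
# Hypothetical dictionary entries: hiragana reading -> output string.
RECOG1 = {
    "いちばん": "一番",
    "いく": "行く",
    "とぶ": "飛ぶ",
    "りれき": "履歴",
}


def spot_vocabulary(reading: str, dictionary: dict) -> list:
    """Scan a hiragana reading left to right and emit the outputs of known entries."""
    entries = sorted(dictionary, key=len, reverse=True)  # prefer longer entries
    outputs = []
    i = 0
    while i < len(reading):
        for entry in entries:
            if reading.startswith(entry, i):
                outputs.append(dictionary[entry])
                i += len(entry)
                break
        else:
            i += 1  # skip characters that belong to no entry
    return outputs
```

Under these assumed entries, the reading of the example utterance "一番に行く" yields the two outputs "一番" and "行く", with the particle "に" skipped.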

[0033] FIGS. 6 and 8 are tables showing examples of the control pattern files "Pattern1" and "Pattern2" that are passed as arguments to the program "RecogCont.class". In these tables, "control pattern" represents the recognition vocabulary to be collated, and "action" represents the defined operation, that is, the control information. A "control pattern" consists of an "output" of the speech recognition dictionary or a combination of such outputs. For example, "{ヒストリ｜履歴}" in FIG. 6 covers the case where the output pattern from the speech recognition dictionary is either "ヒストリ" or "履歴" (the loanword and the native Japanese word for "history"). An "action" is the presentation processing corresponding to a "control pattern". For example, "JUMP “faq_top.html”" in FIG. 6 means moving to the HTML document "faq_top.html" and displaying it.
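
The control pattern notation quoted in the description, with "{A｜B}" for alternatives and ".*" for a gap, can be mapped onto regular expressions. The sketch below is one possible reading, not the matching procedure of the publication: the functions `compile_control_pattern` and `find_action` are hypothetical, and joining the recognizer outputs without a separator before matching is an assumption. The single table row uses the pattern "一番.*{行く｜飛ぶ}" and the action quoted for it in the description of FIG. 9.

```python
import re


def compile_control_pattern(pattern: str):
    """Translate the {A｜B} / .* notation into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith(".*", i):
            parts.append(".*")               # gap between vocabulary items
            i += 2
        elif pattern[i] == "{":
            end = pattern.index("}", i)      # raises ValueError if unbalanced
            alternatives = re.split(r"[｜|]", pattern[i + 1:end])
            parts.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
            i = end + 1
        else:
            parts.append(re.escape(pattern[i]))  # literal character
            i += 1
    return re.compile("".join(parts))


def find_action(outputs, table):
    """Return the action of the first pattern matching the joined outputs."""
    joined = "".join(outputs)  # assumption: outputs are concatenated as-is
    for pattern, action in table:
        if compile_control_pattern(pattern).fullmatch(joined):
            return action
    return None


# One row taken from the worked example for FIG. 9.
PATTERN1 = [("一番.*{行く｜飛ぶ}", 'JUMP "faq_top.html"')]
```

With this table, the outputs "一番" and "行く" select the JUMP action, while "行く" alone matches nothing and `find_action` returns `None`.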

[0034] FIGS. 9 and 10 are schematic diagrams showing the operation of the information presentation device 1 when the tables of FIGS. 3 to 8 are used. In FIG. 9, the HTML document "search.html" is first parsed, and the content indicated by reference 9A is presented on the browser. Because this document is subject to voice control, speech is input through the voice input unit 14. Suppose the speech "一番に行く" ("go to number one"), indicated by reference 9B, is input. The speech recognition unit 15 then detects "一番" ("number one") and "行く" ("go"), indicated by reference 9C, as recognition vocabulary based on the speech recognition dictionary "Recog1" of FIG. 5. The voice control unit 16 detects the control pattern "一番.*{行く｜飛ぶ}" (reference 9D; "飛ぶ" means "jump") based on the detected recognition vocabulary and the control pattern file "Pattern1" of FIG. 6, and presents the corresponding presentation information on the browser according to that pattern's control information, "JUMP “faq_top.html”". Reference 9E is the result of the action "JUMP “faq_top.html”"; it shows that the input speech changed the browser's presentation information from 9A to 9E.

[0035] The same applies to FIG. 10: the HTML document "faq_top.html" is parsed, and the content indicated by reference 10A is presented on the browser. Because this document is subject to voice control, speech is input through the voice input unit 14. Suppose the speech "ソフトウェアに関する質問" ("a question about software"), indicated by reference 10B, is input. The speech recognition unit 15 then detects "ソフトウェア" ("software"), indicated by reference 10C, as recognition vocabulary based on the speech recognition dictionary "Recog2" of FIG. 7.

[0036] The voice control unit 16 detects the control pattern "ソフトウェア" (reference 10D) based on the detected recognition vocabulary and the control pattern file "Pattern2" of FIG. 8, and presents the corresponding presentation information on the browser according to that pattern's control information, "JUMP “software_faq.html”". Reference 10E is the result of the action "JUMP “software_faq.html”"; it shows that the input speech changed the browser's presentation information from 10A to 10E.

[0037] In this embodiment the database 17, the control pattern file 18, and the speech recognition dictionary 19 are built separately, but the invention is not limited to this; for example, the control pattern file 18 and the speech recognition dictionary 19 may be included in the database 17 as appropriate.

[0038] As described above, in the information presentation device 1 of this embodiment, the speech recognition dictionary is constructed by registering a recognition vocabulary group for each predetermined information unit. Even when the number of words to be recognized increases, this enables highly flexible speech recognition without lowering the recognition rate or practicality. Because the recognition vocabulary of the speech recognition dictionary is thus limited for each predetermined information unit, the recognition accuracy can be kept at or above a certain level, in contrast to the conventional method.

[0039] Similarly, by constructing the control pattern file with control patterns set for each predetermined information unit, the presentation form can be controlled freely for each specific information unit, such as a presented page.

[0040] Presenting information by speech recognition also provides effective assistance in cases where keyboard use is often difficult, as on a portable terminal, and thus offers an information presentation environment usable in a wide range of situations.

[0041] (Second Embodiment) The present invention can also be implemented as a system for browsing, by speech recognition, the large amount of digitized information distributed over a public network such as the Internet serving as the communication line. One example is an information browsing system comprising a client that functions as the information presentation device described above and an information providing server that functions as an information providing device.

[0042] The client in this case is positioned, for example, as an information acquisition device for a plurality of large-scale databases in the Internet environment. In one configuration example, the client includes a communication control unit that communicates with the information providing server via the public network, together with the same functional blocks as the information presentation device 1: the presentation request input unit 11, the data acquisition unit 12, the information presentation processing unit 13, the voice input unit 14, the speech recognition unit 15, and the voice control unit 16.

[0043] The information providing server, on the other hand, is configured by constructing, in an internal or external storage device of a computer, a database, a control pattern file, and a speech recognition dictionary identical to the database 17, the control pattern file 18, and the speech recognition dictionary 19, and by including a communication control unit that communicates with the client.

[0044] The client differs from the information presentation device 1 in that it includes a known communication control unit for performing communication control. It is configured so that digitized information from the information providing server, corresponding to a presentation request from the presentation request input unit 11, is input to the data acquisition unit 12 via this communication control unit. The control pattern file and speech recognition dictionary corresponding to the acquired digitized information are likewise acquired from the information providing server via the communication control unit and held in the client. Configured in this way, the client can substitute for the information presentation device 1 and obtain equivalent effects.

[0045] In this information browsing system, the speech recognition dictionary and the control pattern file can easily be changed from the information provider side, that is, from the information providing server. This was impossible in conventional systems because they depended on the client environment. Such changes are reflected in the voice control processing in the client.
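The client-side handling described for the second embodiment, in which the dictionary and control pattern file are acquired from the server, held in the client, and updated when the provider changes them, can be sketched as a small cache. This is a minimal illustration under assumptions, not the patent's implementation: the class `ClientResourceCache`, the function `fetch_from_server`, the in-memory `SERVER_STORE` standing in for the information providing server, and the explicit `refresh` step are all hypothetical, and no real network communication is performed.

```python
from typing import Callable, Dict, List, Tuple

# (recognition vocabulary, control patterns as (pattern, action) pairs)
Resources = Tuple[List[str], List[Tuple[str, str]]]

# Stand-in for the information providing server's storage (hypothetical).
SERVER_STORE: Dict[str, Resources] = {
    "faq_top.html": (["software"], [("software", 'JUMP "software_faq.html"')]),
}

# Records each simulated request to the server, for inspection.
fetch_log: List[str] = []


def fetch_from_server(page: str) -> Resources:
    """Simulate acquiring a page's dictionary and patterns via the communication control unit."""
    fetch_log.append(page)
    return SERVER_STORE[page]


class ClientResourceCache:
    """Holds the dictionary and control patterns acquired for each page in the client."""

    def __init__(self, fetch: Callable[[str], Resources]) -> None:
        self._fetch = fetch
        self._held: Dict[str, Resources] = {}

    def get(self, page: str) -> Resources:
        """Return the held resources, acquiring them from the server on first use."""
        if page not in self._held:
            self._held[page] = self._fetch(page)
        return self._held[page]

    def refresh(self, page: str) -> Resources:
        """Re-acquire the resources so that server-side changes take effect."""
        self._held[page] = self._fetch(page)
        return self._held[page]
```

In this sketch a provider-side change to `SERVER_STORE` reaches the client only after `refresh` is called; how and when a real client would re-acquire the files is not specified in the text and is left open here.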

[0046]

[Effects of the Invention] As is apparent from the above description, the present invention makes it possible to control flexibly, by voice, the presentation form of information to be presented in a browser or the like. In addition, when the present invention is applied to a system in which the information providing side and the information acquiring side are separated, the information providing side can easily structure information with its presentation form in mind, and the information acquiring side can be given a voice-operated environment with a high degree of freedom and operability.

[Brief description of the drawings]

FIG. 1 is a functional block diagram of an information presentation device according to an embodiment of the present invention.

FIG. 2 is a diagram of the processing procedure in the information presentation device.

FIG. 3 is an explanatory diagram showing an example of an HTML document incorporating voice control processing.

FIG. 4 is an explanatory diagram showing an example of an HTML document incorporating voice control processing.

FIG. 5 is an explanatory diagram showing an example of a speech recognition dictionary.

FIG. 6 is an explanatory diagram showing an example of a speech recognition dictionary.

FIG. 7 is an explanatory diagram showing an example of a control pattern file.

FIG. 8 is an explanatory diagram showing an example of a control pattern file.

FIG. 9 is a schematic diagram illustrating speech recognition control in the information presentation device.

FIG. 10 is a schematic diagram illustrating speech recognition control in the information presentation device.

FIG. 11 is a functional block diagram of a conventional information browsing system.

[Explanation of symbols]

1: information presentation device
11: presentation request input unit
12: data acquisition unit
13: information presentation processing unit
14: voice input unit
15: speech recognition unit
16: voice control unit
17: database
18: control pattern file
19: speech recognition dictionary
30: information providing server
40: client
110: information browsing system
111: speech recognition unit
112: speech recognition dictionary
113: speech recognition control unit
114: browser

Claims

1. An information presentation method using a computer device, in which the presentation form of digitized information to be presented is changed by voice, the method comprising: storing in advance, for each presentation form of the digitized information, a recognition vocabulary group to be subjected to speech recognition and a presentation control information group that corresponds to the recognition vocabulary group and relates to the presentation of the digitized information; detecting, from the recognition vocabulary group, a recognition vocabulary that matches an input voice; detecting, from the presentation control information group, presentation control information that matches the detected recognition vocabulary; and presenting the corresponding digitized information in a presentation form based on the detected presentation control information.

2. An information presentation device for controlling, by voice, the presentation form of digitized information to be presented, comprising: a speech recognition dictionary in which recognition vocabulary groups to be subjected to speech recognition are registered in advance for each presentation form; a control pattern file in which presentation control information that corresponds to the recognition vocabulary groups and relates to the presentation of the digitized information is set in advance for each presentation form; means for presenting the digitized information corresponding to an information presentation request from an operator and detecting whether the digitized information is subject to presentation control by voice; speech recognition means for, when digitized information subject to presentation control by voice is detected, acquiring a voice from the operator and performing speech recognition by detecting, from the speech recognition dictionary, a recognition vocabulary that matches the voice; and presentation means for detecting, from the control pattern file, presentation control information that matches the detected recognition vocabulary and presenting the corresponding digitized information in a presentation form based on that presentation control information.

3. The information presentation device according to claim 2, wherein the speech recognition dictionary is constructed so as to find, as a recognition vocabulary, unique data corresponding to a plurality of speech recognition data items that have the same data format and can be matched against the voice from the operator.

4. The information presentation device according to claim 2, wherein the speech recognition dictionary is constructed so as to find, as a recognition vocabulary, a unique character string corresponding to a plurality of speech recognition data items that have the same data format and can be matched against the voice from the operator.

5. The information presentation device according to claim 2, wherein the control pattern file is constructed so as to find the presentation control information corresponding to a control pattern consisting of the recognition vocabulary or a combination of recognition vocabularies.

6. The information presentation device according to claim 2, wherein the speech recognition dictionary and the control pattern file are constructed as part of a database storing the digitized information.

7. The information presentation device according to claim 2, wherein the presentation means is configured to present the digitized information in information units corresponding to a presentation region of the digitized information.

8. An information presentation system in which a first device and a second device are connected so as to be capable of two-way communication, the first device creating and holding in advance, for each presentation form of digitized information to be presented, the digitized information, a recognition vocabulary group to be subjected to speech recognition, and a presentation control information group that corresponds to the recognition vocabulary group and relates to the presentation of the digitized information, and the second device presenting the digitized information corresponding to an information presentation request from an operator, wherein the second device comprises: means for acquiring from the first device and presenting the digitized information corresponding to the information presentation request from the operator, and detecting whether the digitized information is subject to presentation control by voice; means for, when digitized information subject to presentation control by voice is detected, acquiring a voice from the operator, acquiring from the first device the recognition vocabulary group corresponding to the digitized information, and detecting a recognition vocabulary that matches the voice; means for acquiring from the first device the presentation control information group corresponding to the digitized information subject to presentation control, and detecting presentation control information that matches the detected recognition vocabulary; and presentation means for acquiring from the first device and presenting the corresponding digitized information in a presentation form based on the presentation control information, whereby the presentation form of the digitized information to be presented is controlled by voice.

9. The information presentation system according to claim 8, wherein the presentation means is configured to present the digitized information in information units corresponding to a presentation area of a predetermined Web browser.