JP2000250677A

JP2000250677A - Device and method for multimodal interface

Info

Publication number: JP2000250677A
Application number: JP11054778A
Authority: JP
Inventors: Tetsuro Chino; 哲朗知野
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-03-02
Filing date: 1999-03-02
Publication date: 2000-09-14
Anticipated expiration: 2019-03-02
Also published as: JP3753882B2

Abstract

PROBLEM TO BE SOLVED: To enable natural and smooth interaction by presenting guide information that supports the interaction to the user in an optimal manner in a multimodal interface environment. SOLUTION: The device is provided with an attention information generating means 104 for detecting the spot to which a user is paying attention and generating attention information, a guide information control means 105 for determining, on the basis of the attention information, a position at which to present guide information that supports the user's input, and a presentation control means 101 for controlling presentation of the guide information at the position determined by the guide information control means 105, so that the guide information is presented near the user's attention spot.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a multimodal interface device and a multimodal interface method. In particular, it relates to a technique for realizing natural and smooth interaction in a multimodal interface environment, in which data is exchanged between a user and a computer through a plurality of interaction modes, by optimally presenting to the user guide information that assists the interaction.

[0002]

[Prior Art] In recent years, various computer systems, including personal computers, have become able to input and output multimedia information such as audio and image information, in addition to conventional input by keyboard and mouse and output of text and image information on a display.

[0003] One type of interactive system that uses such multimedia information is the spoken dialogue system. With advances in natural language analysis and generation, speech recognition and speech synthesis, and dialogue processing technology, demand is growing for spoken dialogue systems that exchange speech input/output data with the user. Various spoken dialogue systems have been developed, for example "TOSBURG-II", a dialogue system that accepts freely spoken speech input (Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. J77-D-II, No. 8, pp. 1417-1428, 1994).

[0004] In addition to such speech input and output, demand is growing for multimodal dialogue systems that interact with the user by using visual input data captured with, for example, a camera, or by using information that can be exchanged with the user through various external input/output devices such as a touch panel, pen, tablet, data glove, foot switch, human-detection sensor, head-mounted display, and force display (force feedback device). A user interface provided with such a plurality of interaction modes is hereinafter referred to as a multimodal interface (MMI).

[0005] In dialogue between humans, communication is not carried out through a single medium (channel) such as speech alone. Natural and smooth interaction is achieved by making full use of non-verbal messages exchanged through various media such as body gestures, hand gestures, and facial expressions ("Intelligent Multimedia Interfaces", Maybury M.T., Eds., The AAAI Press / The MIT Press, 1993). Likewise, in dialogue between a human and a computer, the multimodal interface is a promising approach for realizing a natural and easy-to-use human interface.

[0006] The processing of a conventional multimodal interface is described below.

[0007] When the user gives speech input or the like to the computer, the input speech waveform signal is converted from analog to digital. A speech interval is detected by, for example, computing the power of the digitized speech signal per unit time. The speech signal is analyzed by a method such as the FFT (fast Fourier transform). Next, using a method such as an HMM (hidden Markov model), the analyzed speech signal is matched against a speech recognition dictionary of standard patterns prepared in advance, and the utterance content is estimated from the matching result. Processing corresponding to the estimated utterance content is then performed.
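The speech-interval detection step in [0007] can be pictured with a minimal sketch. The patent gives no concrete procedure, so everything below is an assumption made for illustration: the function names, the non-overlapping frames of 160 samples, the fixed power threshold of 0.01, and the 8 kHz synthetic test signal. The FFT analysis and HMM matching stages are not shown.

```python
import math

def frame_power(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers

def detect_speech_intervals(samples, frame_len=160, threshold=0.01):
    """Return (start_sample, end_sample) pairs where frame power exceeds threshold."""
    intervals = []
    start = None
    for i, power in enumerate(frame_power(samples, frame_len)):
        if power > threshold and start is None:
            start = i * frame_len          # interval opens at this frame
        elif power <= threshold and start is not None:
            intervals.append((start, i * frame_len))  # interval closes
            start = None
    if start is not None:
        # Signal ended while still above threshold: close at the last full frame.
        intervals.append((start, (len(samples) // frame_len) * frame_len))
    return intervals

# Synthetic input: silence, a 440 Hz tone, silence (8 kHz sampling assumed).
rate = 8000
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(1600)]
intervals = detect_speech_intervals(silence + tone + silence)
```

A fixed threshold is the simplest possible choice; a practical detector would more likely adapt the threshold to the background noise level.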

[0008] For input by non-verbal messages other than speech, gestures and other non-verbal messages from the user have been input by recognizing the position, shape, or movement of the user's hand through analysis of image information of the user captured by a camera, analysis of output from a distance sensor using infrared light or the like, or analysis of output from a contact-type input device such as a touch sensor.

[0009]

[Problems to Be Solved by the Invention] However, conventional multimodal interfaces have had the following problems.

[0010] To begin with, the multimodal interface environment has the following characteristics that differ from conventional environments. First, it is difficult to determine whether input information is information the user intended or unnecessary information.

[0011] Second, because the meaning of input data is estimated by matching it against a dictionary or the like registered in advance, guide information for the user, such as input candidates, is indispensable for raising the accuracy of this matching. This input guide is described in detail below.

[0012] With current technology, the accuracy of analysis by matching the input from each medium is low, and the properties of each input/output medium have not been sufficiently clarified. Consequently, a multimodal interface that makes efficient use of each newly available input/output medium, or of a plurality of input/output media, and that is highly efficient, effective, and reduces the burden on the user has not yet been realized.

[0013] In addition, when giving input by speech, gesture, or the like, it is hard for the user to know which expressions, such as vocabulary and gesture types, are currently accepted as input candidates in each situation. At the same time, it is hard to know at what point in time input is possible.

[0014] Furthermore, in an interface that uses recognition technology for speech, gestures, and the like, low recognition accuracy makes it uncertain whether the input was entered with the intended content. As a result, it is hard for the user to know whether each input has been recognized correctly.

[0015] To deal with these problems, it is necessary to present an input guide to the user as appropriate. This guide assists the user's input by showing the currently accepted expressions, the current state of input acceptance, or the timing of input, or by displaying the recognition result of an entered expression, for example in part of the display area.

[0016] The combinations of expressions that can be entered at each point in time for input such as speech and gestures are generally complex and enormous. Moreover, the user often does not merely enter a predetermined expression but decides on and changes the expression on the spot while entering it. By referring to the input guide described above, the user no longer needs to memorize all of these combinations of expressions.

[0017] With this input guide, the user can decide which expression to enter while looking at the candidate expressions that can be entered at each point in time. The user can also obtain the timing for input when giving input by speech, gesture, or the like. For these reasons, the user frequently refers to the input guide when giving input.

[0018] However, conventional input guides have had the following problems.

[0019] A multimodal interface is generally used together with a device or method that performs some task, and makes input based on recognition technology for speech, gestures, and the like available to that device or method. To carry out the actual task, therefore, the user needs to look at the work area corresponding to the task most of the time.

[0020] The area in which the user carries out the current work and the area in which the guide is displayed are in most cases separate. Although the guide must be consulted when using input means based on recognition technology such as speech or gestures, a user who looks only at the work area cannot see the guide for speech or gesture input. Conversely, a user who looks only at the guide cannot see the work area and therefore cannot perform the actual task. And a user who tries to look at both the guide and the work area must move the line of sight back and forth between them frequently. This requires extra effort to shift and refocus the gaze on each area and to search each area for the information currently needed, which increases the load on the user.

[0021] Thus, the first problem is that, because the work area and the input guide are displayed in separate areas, the movement of the user's eyes is restricted even though the media being used, such as speech and hand gestures, inherently place no constraint on eye movement. As a result, the inherent advantage of these media is nullified.

[0022] The second problem is that, when the result of analyzing the input is fed back to the user, audio signals such as beeps and spoken back-channel responses have conventionally been used as the feedback signal.

[0023] Depending on the surrounding environment, however, always giving feedback by audio signal creates noise for the surroundings. It can also become bothersome for the user.

[0024] The third problem is that a means of controlling the dialogue between the user and the computer in the input guide is required. Specifically, this is a means of detecting and resolving a failure when some failure occurs in communication with the user. Such failures include failure to recognize input from the user and failure to output information to the user. Resolving them requires processing that gives appropriate output in response to the user's input and appropriately controls the timing of input from and output to the user, for example by presenting information again for confirmation, by asking the user a clarifying question, or by appropriately controlling the flow of the dialogue.

[0025] Various methods have been used for dialogue management that assumes conventional input devices such as the mouse and keyboard. Examples include a method that uses a script, which is a dialogue flow prepared in advance, and a method that uses information such as utterance pairs (paired utterances such as question/answer and greeting/greeting) and utterance exchange structures. Planning-based methods are also used, in which the entire flow of the dialogue is formalized as a plan of each individual participant or as a joint plan among the participants, and is described, generated, and recognized.

[0026] With multimodal input such as speech or gestures in particular, however, the user may think about the expression while entering it, cancel partway through, or take time to decide on the input content. This makes it difficult to control the timing appropriately when controlling the period during which input is accepted.

[0027] A first method of timing control accepts input for a fixed, preset period. With this method, however, input becomes impossible when the user takes time over the input as described above.

[0028] A second method accepts input at all times, allowing for the possibility that the user gives input at an arbitrary time. With this method, however, speech or actions not intended as input, or ambient noise and unrelated actions or images, are accepted by mistake. This causes malfunctions and increases the burden on the user.

[0029] A third method lengthens the period during which input from the user is accepted, allowing for the possibility that the user gives input later than usual. As with the second method, however, speech or actions not intended as input, or ambient noise and unrelated actions or images, are accepted by mistake. This causes malfunctions and increases the burden on the user.

[0030] Moreover, there is no technique for determining whether the user still intends to continue the input. It is therefore impossible to judge how far the input waiting time should be extended.

[0031] Consequently, conventional multimodal interfaces have been unable to make effective use of non-verbal messages such as gaze, body and hand gestures, and facial expressions, which play an important role in communication between humans.

[0032] As described above, the present invention has been made to solve the following problems in a multimodal interface environment: the various guide information used to improve the accuracy of input recognition for each medium and presented to assist the dialogue was presented in an area different from that of the application processing actually being performed; feedback by audio signal was given uniformly regardless of the situation; and, when a failure occurred in the dialogue, the input the user intended could not always be entered correctly and completely. As a result, the burden on the user increased and input guidance could not be given appropriately.

[0033] An object of the invention is to provide a multimodal interface device and a multimodal interface method that present guide information for assisting the dialogue in correspondence with the work area in which the user is working, thereby giving appropriate input guidance while reducing the burden on the user and enabling efficient use of a plurality of input/output media.

[0034] Another object is to give the user feedback on input more reliably and naturally by giving feedback by audio signal as needed in accordance with the surrounding situation.

[0035] A further object is to acquire the input information intended by the user more efficiently by appropriately controlling the timing or flow of the dialogue.

[0036]

[Means for Solving the Problems] To solve the above problems, the present invention is characterized in that the presentation of guide information that assists the user's input is optimized so that the guide information is presented near the point at which the user is gazing.

[0037] This gaze point is obtained from, for example, the direction of the user's line of sight.

[0038] To realize this function, a first feature of the present invention is a multimodal interface device that performs input and output of information between a user and a computer in a plurality of interaction modes, comprising: gaze information generating means for detecting the user's gaze point and generating it as gaze information; guide information control means for determining, on the basis of the gaze information, a position at which to present guide information for assisting the user's input; and presentation control means for controlling presentation of the guide information at the position determined by the guide information control means.

[0039] With this configuration, guide information can be presented near the user's work area. The user can therefore reliably notice the guide information without any increase in burden.

[0040] A second feature of the present invention is that the guide information control means further comprises first determination means for determining whether the distance between the user's gaze point indicated by the gaze information and the presentation position of the guide information is within a predetermined first threshold, and, when the distance is not within the first threshold, sets the presentation position of the guide information near the user's gaze point.

[0041] With this configuration, guide information can be presented near the user's work area. The user can therefore reliably notice the guide information without any increase in burden.
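The first-threshold decision of the second feature ([0040]) might look like the following sketch. It is an illustration only: the function name, the use of Euclidean distance in screen coordinates, and the `offset` parameter that places the guide beside the gaze point are all assumptions not stated in the patent.

```python
import math

def update_guide_position(gaze, guide, threshold, offset=(0.0, 20.0)):
    """Return the guide position after one update.

    gaze, guide: (x, y) points in screen coordinates.
    threshold:   the "first threshold" on the gaze-to-guide distance.
    offset:      hypothetical displacement that puts the guide beside,
                 rather than on top of, the gaze point.
    """
    distance = math.hypot(gaze[0] - guide[0], gaze[1] - guide[1])
    if distance <= threshold:
        return guide  # already within the threshold: leave the guide in place
    # Too far from the gaze point: move the guide next to it.
    return (gaze[0] + offset[0], gaze[1] + offset[1])
```

For example, with a threshold of 50, a guide 10 units from the gaze point stays put, while a guide several hundred units away is moved next to the gaze point.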

[0042] A third feature of the present invention is that the guide information control means further keeps the presentation position of the guide information fixed when the user is judged to be paying attention to the guide information.

[0043] With this configuration, when the user has already located the guide information, the guide information is not moved, which eliminates needless flicker and reduces the burden on the user.

[0044] A fourth feature of the present invention is that the guide information control means further comprises second determination means for determining whether the determined presentation position of the guide information lies within a predetermined presentation area in which the guide information is to be presented, and, when the presentation position is not within the presentation area, corrects the presentation position so that it lies within the presentation area.

[0045] With this configuration, when the calculated presentation position of the guide information extends beyond the display screen or the like, the presentation position can be offset automatically. The user can therefore reliably notice the guide information.
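One way to picture the position correction of the fourth feature ([0044]) is a simple clamp of the guide rectangle to the presentation area. The sketch below is not taken from the patent: the function name, the rectangle representation, and the assumption that the guide is no larger than the presentation area are all illustrative.

```python
def clamp_guide_to_area(pos, size, area):
    """Shift a guide rectangle so that it lies inside the presentation area.

    pos:  (x, y) of the guide's top-left corner.
    size: (width, height) of the guide.
    area: (left, top, right, bottom) of the presentation area.
    Assumes the guide is no larger than the area.
    """
    left, top, right, bottom = area
    x = min(max(pos[0], left), right - size[0])
    y = min(max(pos[1], top), bottom - size[1])
    return (x, y)
```

A guide that already lies inside the area is returned unchanged, so the correction only takes effect when the computed position would run past an edge of the screen.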

[0046] A fifth feature of the present invention is that the guide information control means corrects the size of the guide information so that the guide information is presented within the user's field of view.

[0047] With this configuration, all of the guide information to be presented fits within the user's field of view. The load on the user of acquiring the guide information is therefore reduced.

[0048] A sixth feature of the present invention is that the multimodal interface device further comprises application status grasping means for generating application presentation information indicating at least one of the usage status of each application presented in the presentation area and the layout information of each presentation element of the application, and the guide information control means sets the guide information, on the basis of the application presentation information, at a position that does not obstruct the presentation of each application.

[0049] With this configuration, guide information can be presented without obstructing the information displayed by each application on the screen. The efficiency of interaction with each application is therefore improved.
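The placement rule of the sixth feature ([0048]) could be sketched as a search over candidate positions around the gaze point, rejecting any candidate that overlaps an application element. The patent does not specify how the position is chosen, so the four candidate positions, their order, the `gap` parameter, and the function names below are assumptions made purely for illustration.

```python
def rects_overlap(a, b):
    """True if rectangles a and b, given as (left, top, right, bottom), intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def place_guide(gaze, size, elements, gap=10):
    """Pick a guide rectangle near the gaze point that overlaps no element.

    gaze:     (x, y) gaze point.
    size:     (width, height) of the guide.
    elements: rectangles occupied by application presentation elements.
    Candidates are tried below, above, right of, and left of the gaze point
    (an arbitrary order chosen for this sketch). Returns None if all overlap.
    """
    gx, gy = gaze
    w, h = size
    candidates = [
        (gx - w / 2, gy + gap),       # below the gaze point
        (gx - w / 2, gy - gap - h),   # above
        (gx + gap, gy - h / 2),       # to the right
        (gx - gap - w, gy - h / 2),   # to the left
    ]
    for x, y in candidates:
        rect = (x, y, x + w, y + h)
        if not any(rects_overlap(rect, e) for e in elements):
            return rect
    return None
```

Returning `None` when every candidate is blocked leaves the fallback (for example, semi-transparent display as in the ninth feature) to the caller.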

[0050] A seventh feature of the present invention is that the multimodal interface device further comprises auxiliary speech generating means for generating audio guide information, in the form of an audio signal, for assisting input from the user, and output control means for controlling output of the audio guide information, and the guide information control means sets the audio guide information as the guide information to be presented when it is judged from the gaze information that the user cannot capture the guide information in or near the field of view.

[0051] With this configuration, the user can be made aware of the guide information even when not looking at the display screen.
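The switch to audio guidance in the seventh feature ([0050]) can be illustrated with a small decision function. The criterion used here, namely that the user cannot see the guide when no gaze point is detected or the gaze point is off the screen, is a simplifying assumption for this sketch; the function name and return values are likewise hypothetical.

```python
def choose_guide_modality(gaze, screen):
    """Return "visual" if the gaze point is on the screen, otherwise "audio".

    gaze:   (x, y) gaze point, or None when no gaze point is detected.
    screen: (left, top, right, bottom) of the display area.
    """
    if gaze is None:
        return "audio"  # no gaze detected: the visual guide cannot be seen
    x, y = gaze
    left, top, right, bottom = screen
    on_screen = left <= x < right and top <= y < bottom
    return "visual" if on_screen else "audio"
```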

[0052] An eighth feature of the present invention is that the multimodal interface device further comprises third determination means for determining, on the basis of the gaze information, whether the user's gaze point is dwelling within the area of the presented guide information, and, when the gaze point is determined to be dwelling there, the state of waiting for input from the user is maintained.

[0053] With this configuration, the waiting time for input from the user can be optimized. The load on the system is therefore reduced.
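The dwell determination of the eighth feature ([0052]) might be approximated as follows. The patent does not define how dwelling is measured, so the timestamped gaze samples, the `min_dwell` parameter, and the rule that the gaze must have stayed inside the guide area continuously up to the latest sample are all assumptions for this sketch.

```python
def should_keep_waiting(gaze_samples, guide_rect, min_dwell):
    """True if the most recent gaze samples have stayed inside the guide area.

    gaze_samples: list of (timestamp_seconds, x, y), oldest first.
    guide_rect:   (left, top, right, bottom) of the presented guide.
    min_dwell:    hypothetical minimum dwell time, in seconds.
    """
    left, top, right, bottom = guide_rect
    dwell_start = None
    for t, x, y in gaze_samples:
        inside = left <= x < right and top <= y < bottom
        if inside:
            if dwell_start is None:
                dwell_start = t      # gaze has just entered the guide area
        else:
            dwell_start = None       # gaze left the area: reset the dwell
    if dwell_start is None:
        return False
    return gaze_samples[-1][0] - dwell_start >= min_dwell
```

While this returns true, the system would keep the input-waiting state open instead of timing out, which addresses the fixed-timeout problem described in [0027].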

[0054] A ninth feature of the present invention is that the presentation control means controls presentation of the guide information in the presentation area by semi-transparent display or highlighted display.

[0055] With this configuration, guide information can be presented without obstructing display elements that are already displayed.

[0056] A tenth feature of the present invention is a multimodal interface method for performing input and output of information between a user and a computer in a plurality of interaction modes, comprising the steps of: detecting the user's gaze point and generating it as gaze information; determining, on the basis of the gaze information, a presentation position at which to present guide information for assisting the user's input; and presenting the guide information at the presentation position.

[0057] With this configuration, guide information can be presented near the user's work area. The user can therefore reliably notice the guide information without any increase in burden.

[0058] An eleventh feature of the present invention is a computer-readable recording medium storing a multimodal interface program for performing input and output of information between a user and a computer in a plurality of interaction modes, the program comprising: a module for detecting the user's gaze point and generating it as gaze information; a module for determining, on the basis of the gaze information, a presentation position at which to present guide information for assisting the user's input; and a module for presenting the guide information at the presentation position.

[0059] With this configuration, guide information can be presented near the user's work area. The user can therefore reliably notice the guide information without any increase in burden.

[0060]

[Embodiments of the Invention] First Embodiment

A first embodiment of the present invention is described in detail below with reference to the drawings. The first embodiment provides a function of controlling the position of input guide information in accordance with the position of the user's line of sight.

[0061] FIG. 1 is a block diagram showing the functional configuration of the multimodal interface device according to the first embodiment of the present invention. As shown in FIG. 1, the multimodal interface device 1 according to the first embodiment comprises an output unit 101, an input unit 102, a visual guide presentation unit 103, a gaze target detection unit 104, and a control unit 105.

[0062] The output unit 101 outputs data from the computer to the user through various media. The output unit 101 is a device that outputs at least visual information to the user, such as a CRT display, an LCD display, a projector, or a head-mounted display. When output is through a GUI (graphical user interface), for example, the unit is configured to display windows, menus, pointers, and the like on a bitmap display.

[0063] In FIG. 1, reference numeral 102 denotes the input unit.

[0064] The input unit 102 inputs data from the user to the computer through various media. The input unit 102 captures input such as voice information, visual information, and operation information from the user through at least one input device, such as a microphone, a camera, a keyboard, a pointing device (touch panel, pen, tablet, mouse, trackball, or the like), a data glove, a data suit, an eye tracker, a head tracker, an OCR, a person-detection sensor, or a seating sensor. The captured voice, visual, and operation information is analyzed as input information by applying at least one process such as sampling, coding, digitization, filtering, signal conversion, recording, storage, pattern recognition, analysis of language, speech, images, motion, or operations, understanding, or intention extraction.

[0065] The visual guide presenting unit 103 presents guide information to the user, that is, information that assists the user's input through the input unit 102. The visual guide presenting unit 103 displays this guide information on the output unit 101 by visual means such as text or images. The guide information may be displayed overlapping other display elements such as windows, menus, and pointers. Alternatively, it may be superimposed semi-transparently, or presented with changed display attributes such as color, font, blinking, or highlighting.

[0066] The guide information presented includes, first, information on the expressions the user can currently input: for example, whether input is being accepted at each point in time, a list of the vocabulary that can be input when voice input is assumed, or a list of the type names, symbols, or pictorial representations of the gestures that can be input when gesture input is assumed. Second, it includes information on the progress of processing of the user's input, or on recognition candidates obtained by that processing. The device is configured so that this information is presented as appropriate to assist the user's input.
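
The kinds of guide information listed above can be sketched as a small record type. This is only an illustration: the class name `GuideInfo`, its field names, and the text rendering in `lines()` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GuideInfo:
    """Hypothetical container for the guide information described in the text."""
    accepting_input: bool = True                                # whether input is accepted now
    speech_vocabulary: List[str] = field(default_factory=list)  # words that can be spoken
    gestures: List[str] = field(default_factory=list)           # names of accepted gestures
    progress: Optional[str] = None                              # progress of input processing
    recognition_candidates: List[str] = field(default_factory=list)

    def lines(self) -> List[str]:
        """Return the text lines a visual guide might render (assumed layout)."""
        if not self.accepting_input:
            return ["(input not accepted)"]
        out = list(self.speech_vocabulary) + list(self.gestures)
        if self.progress:
            out.append(f"[{self.progress}]")
        out.extend(f"? {c}" for c in self.recognition_candidates)
        return out
```

For the speech-input case assumed in the first embodiment, only `speech_vocabulary` would normally be populated.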

[0067] The visual guide presenting unit 103 also presents the guide information at a designated position on the output unit 101 and in a designated format, under the control of the control unit 105 described later. The visual guide presenting unit 103 corresponds to the presentation control means in the claims.

[0068] In the following description of the first embodiment, speech recognition input is assumed as the input information from the input unit 102. As the guide information, the visual guide presenting unit 103 is assumed to present, semi-transparently and as appropriate, the speech recognition vocabulary, that is, the utterance candidates that can be input at each point in time. Needless to say, the display content and display format of the visual guide presenting unit 103 are not limited to these, and any method may be used.

[0069] The gaze target detecting unit 104 detects whether the user is currently looking at the computer screen or the like, and at least one of the place, coordinates, area, direction, or object, or part thereof, toward which the user's line of sight is directed. The detected information is output as gaze target information. Specifically, the gaze target information is detected by, for example, an eye tracker that observes the user's eye movement, a head tracker that detects the movement of the user's head, a seating sensor, or a technique that detects the user's gaze direction by processing image information from a camera observing the user or a camera worn by the user, such as the method used in Japanese Patent Application No. Hei 08-059071, "Gaze point estimation device and method". The gaze target detecting unit 104 corresponds to the gaze information generating means in the claims.

[0070] The control unit 105 controls and manages the output unit 101, the input unit 102, the visual guide presenting unit 103, and the gaze target detecting unit 104. The control unit 105 corresponds to the guide information control means in the claims.

[0071] Next, the hardware configuration of the multimodal interface device according to the first embodiment is described. The multimodal interface device 1 according to the present invention is implemented on a single computer such as a general-purpose computer, a workstation, a PC, or a network terminal, or on a system in which such computers are interconnected. Alternatively, it may be implemented in a face-to-face device system such as a vending machine, a ticket vending machine, or a game machine.

[0072] FIG. 2 shows an example of the internal configuration when the first embodiment is implemented on a general-purpose computer. The general-purpose computer shown in FIG. 2 includes a CPU unit 501, a memory unit 502, a mass storage unit 503, and a communication interface unit 504. It further includes input interface units 505a to 505n, input device units 506a to 506n, output interface units 507a to 507m, and output device units 508a to 508m. The memory unit 502 and the mass storage unit 503 may be shared.

[0073] The input device units 506a to 506n are implemented as, for example, a microphone, a keyboard, a pen tablet, an OCR, a mouse, switches, a touch panel, a camera, a data glove, or a data suit. The output device units 508a to 508m are implemented as, for example, a display, a speaker, or a force display. The CPU unit 501 realizes the functions of the first embodiment by controlling software that implements the multimodal interface device and method.

[0074] The program for realizing the various processes of the multimodal interface of the present invention can be stored on various recording media. The present invention is carried out when the CPU unit 501 of a general-purpose computer having the above hardware reads the recording medium and executes the program. Here, the recording medium includes any device capable of recording a program, for example a semiconductor memory, a magnetic disk (floppy disk, hard disk, or the like), or an optical disk (CD-ROM, DVD, or the like). The program may also be provided through various communication means such as a network.

[0075] The first embodiment of the present invention is configured as described above. Its processing flow is described below in order, with reference to FIGS. 3 to 5.

[0076] The control unit 105 controls the attributes of the visual guide presenting unit 103, such as presentation position and presentation size, according to the following procedure.

[0077] First, the presentation position determination process for the visual guide presenting unit 103 is described.

[0078] FIG. 3 is a flowchart showing the procedure by which the control unit 105 determines the presentation position of the visual guide presenting unit.

[0079] In step S10, a preset normal display position value L0 is set in a register L, which holds the display position of the visual guide presenting unit 103.

[0080] In step S20, the preset initial-state utterance candidates W0 are set in a register W, which holds the utterance candidates.

[0081] In step S30, the visual guide presenting unit 103 is displayed semi-transparently on the output unit 101, centered on the coordinates indicated by register L.

[0082] In step S40, it is determined whether the user's gaze position E has been obtained from the gaze target detecting unit 104. If the gaze position E has been obtained, the process proceeds to step S60. If not, the process proceeds to step S50.

[0083] In step S50, it is determined whether a new set of utterance candidates Wi has been obtained. If a new set Wi has been obtained, the process proceeds to step S80. If the set of utterance candidates has not changed, the process returns to step S40.

[0084] In step S60, the contents of register L are compared with the gaze position E. If the offset between the center coordinates of register L and gaze position E is no more than a threshold F1, determined in advance from, for example, the range of a typical user's peripheral vision, the process proceeds to step S50. If the offset exceeds threshold F1, the process proceeds to step S70.

[0085] In step S70, the contents of register L are updated with the contents of E, and the process proceeds to step S100.

[0086] In step S80, the contents of register W are updated with Wi.

[0087] In step S90, the contents of register W are set in the visual guide presenting unit 103.

[0088] In step S100, the contents of a register M are determined by the presentation correction process, which is described in detail later. Register M holds the presentation position used in the presentation correction process.

[0089] In step S110, the visual guide presenting unit 103 is displayed semi-transparently on the output unit 101, centered on the coordinates indicated by register M. The process then returns to step S40.
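
The flow of steps S10 to S110 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation in the specification: the class name `GuidePositioner`, the `step` method, the `drawn` list standing in for actual drawing, and the pluggable `correct` callback standing in for step S100 are all hypothetical. One call to `step` handles a gaze sample and then any new candidates, which corresponds to up to two passes through the flowchart with the same gaze sample.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


class GuidePositioner:
    """Hypothetical stand-in for the position logic of control unit 105."""

    def __init__(self, l0: Point, w0: Sequence[str], f1: float,
                 correct: Callable[[Point, List[str]], Point] = lambda L, W: L):
        self.L = l0              # S10: register L starts at the normal position L0
        self.W = list(w0)        # S20: register W starts at the initial candidates W0
        self.F1 = f1             # threshold on the gaze-to-guide offset
        self.correct = correct   # placeholder for the presentation correction (S100)
        self.M = l0
        # S30: initial display; each entry records (position, candidates) as drawn
        self.drawn: List[Tuple[Point, Tuple[str, ...]]] = [(l0, tuple(self.W))]

    def _redraw(self) -> None:
        self.M = self.correct(self.L, self.W)        # S100
        self.drawn.append((self.M, tuple(self.W)))   # S110 (stand-in for drawing)

    def step(self, gaze: Optional[Point] = None,
             new_candidates: Optional[Sequence[str]] = None) -> Point:
        # S40 and S60: move only if a gaze sample exists and is farther than F1
        if gaze is not None and math.dist(self.L, gaze) > self.F1:
            self.L = gaze                            # S70
            self._redraw()
        # S50: a new candidate set replaces W and forces a redraw (S80, S90)
        if new_candidates is not None:
            self.W = list(new_candidates)
            self._redraw()
        return self.M
```

Because small gaze movements inside F1 leave register L unchanged, the guide stays put while the user glances around it, which is the behavior the flowchart is designed to give.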

[0090] Next, the presentation correction process for the visual guide presenting unit 103 is described.

[0091] FIG. 4 is a flowchart showing the procedure of the presentation correction process that the control unit 105 performs for the visual guide presenting unit.

[0092] In step S101, the contents of register L are copied to register M.

[0093] In step S102, it is determined whether the size of the display area that the visual guide presenting unit 103 needs in order to display the current vocabulary W is no more than a threshold F2, which represents an area size defined in advance from, for example, the upper limit of a typical user's peripheral vision. If the required display area size is no more than threshold F2, the process proceeds to step S104. If it exceeds threshold F2, the process proceeds to step S103.

[0094] In step S103, the required display area size of the visual guide presenting unit 103 is adjusted to be no more than threshold F2. This adjustment can be made by changing the display style, for example by reducing the display font of the visual guide presenting unit 103.

[0095] In step S104, it is determined whether the entire display area of the visual guide presenting unit 103, centered on the contents of register M, fits within the output unit 101. If the entire display area fits within the output unit 101, the process ends. If it does not fit, the process proceeds to step S105.

[0096] In step S105, the contents of register M are shifted by a preset distance D toward the center of the output unit 101, and register M is updated.

[0097] In step S106, it is determined whether the offset between the center coordinates of register M and register L exceeds a predefined threshold F3, which represents the upper limit of a typical user's peripheral vision. If the offset exceeds threshold F3, the preset normal display position value L0 is set in register M and the presentation correction process ends. If the offset is within threshold F3, the process returns to step S104.
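
Steps S101 to S106 can be sketched as a single function. The sketch below rests on several assumptions that the specification does not state: positions are screen coordinates with the origin at the top left, F2 is modeled as a (width, height) pair, the size adjustment of step S103 is a uniform scale, and the names `correct_presentation` and `fits` are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Size = Tuple[float, float]


def fits(center: Point, size: Size, screen: Size) -> bool:
    """True if a box of `size` centered at `center` lies fully on the screen."""
    (cx, cy), (w, h), (sw, sh) = center, size, screen
    return cx - w / 2 >= 0 and cx + w / 2 <= sw and cy - h / 2 >= 0 and cy + h / 2 <= sh


def correct_presentation(L: Point, guide_size: Size, screen: Size, L0: Point,
                         F2: Size, F3: float, D: float) -> Tuple[Point, Size]:
    """Return the corrected position M and the (possibly reduced) guide size."""
    if D <= 0:
        raise ValueError("shift distance D must be positive")
    M = L                                            # S101: copy L to M
    w, h = guide_size
    if w > F2[0] or h > F2[1]:                       # S102: guide larger than F2?
        scale = min(F2[0] / w, F2[1] / h)            # S103: shrink to fit within F2
        w, h = w * scale, h * scale
    center = (screen[0] / 2, screen[1] / 2)
    while not fits(M, (w, h), screen):               # S104: does the guide fit?
        dx, dy = center[0] - M[0], center[1] - M[1]
        dist = math.hypot(dx, dy)
        if dist == 0:                                # already at center and still no fit
            return L0, (w, h)
        step = min(D, dist)
        M = (M[0] + dx / dist * step, M[1] + dy / dist * step)   # S105: shift by D
        if math.dist(M, L) > F3:                     # S106: drifted too far from gaze
            return L0, (w, h)                        # fall back to the normal position
    return M, (w, h)
```

For example, on a 1000 by 800 screen with a 100 by 50 guide requested at (990, 400) and D = 10, four shifts bring the guide to (950, 400), where its right edge meets the screen edge.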

[0098] Here, the range of the user's peripheral vision means, for example, the area in which the user can check the displayed content without moving the position of the visual field.

[0099] In the first embodiment, thresholds F1, F2, and F3 are assumed to be arbitrary sizes of a region sufficiently smaller than the whole of the output unit 101.

[0100] Next, a concrete example of the operation of the first embodiment under the above processing is described in detail with reference to FIG. 5.

[0101] Here, it is assumed that the upper right of the output unit 101 is designated as L0, the initial position of the input guide presenting unit, and that the initial-state set of utterance candidates W0 is ["w1", "w2", "w3"].

[0102] First, steps S10, S20, and S30 shown in FIG. 3 are executed, setting the initial presentation position L0 of the input guide and the utterance candidates W0. As shown in FIG. 5(a), the current recognition candidates are displayed semi-transparently as guide information at the upper right of the display screen serving as the output unit 101.

[0103] Suppose that the user, as shown in FIG. 5(b), gazes at a point (E1) near the current guide information.

[0104] In this case, steps S40 and S60 shown in FIG. 3 are executed and confirm that the user can currently see the guide information in peripheral vision. The guide information therefore remains displayed as it is. The user can make an input such as "w1" while checking the available vocabulary in peripheral vision.

[0105] Suppose further that, in this state, the user moves the line of sight around the vicinity of gaze position E1 in order to read the input guide or to check other information displayed nearby. Even then, the processing of step S60 of FIG. 3 keeps the input guide displayed at a fixed position without moving it. This realizes an interface that does not add to the user's burden through, for example, the input guide moving every time the user's gaze moves.

[0106] Next, suppose that, as shown in FIG. 5(c), the user shifts the line of sight to around gaze position E2 in order to refer to or operate application A1, which is located away from the current input guide.

[0107] This gaze movement is detected by the gaze target detecting unit 104 and reported to the control unit 105. The control unit 105 determines that gaze position E2 is farther than threshold F1 from the value of register L, which indicates the current presentation position of the input guide. The processing of steps S40 to S70 shown in FIG. 3 thus determines that the presentation position of the input guide should be changed.

[0108] Following this determination, as shown in FIG. 5(d), the processing of step S100 of FIG. 3 (steps S101 to S107 of FIG. 4) automatically moves the input guide to a position where the user can see it in peripheral vision.

[0109] The input guide is displayed semi-transparently, so it does not hide the display of application A2 that the user is currently operating or referring to. The user can therefore make inputs and operate, refer to, and use the application without hindrance.

[0110] Next, as shown in FIG. 5(e), the number of utterance candidates to be presented may be so large that the display size of the input guide exceeds peripheral vision. In this case, the processing of step S103 of FIG. 4 reduces the display size of the input guide. The user can thus keep the input guide information, such as the utterance candidates, within the field of view.

[0111] Furthermore, as shown in FIG. 5(f), the user's gaze position E3 may be near the edge of the output unit 101, so that the calculated presentation position of the input guide would extend beyond the output unit 101. In this case, the processing of steps S104 to S107 of FIG. 4 offsets the display position of the input guide appropriately toward the center of the output unit 101. The user can thus keep the input guide information within the field of view without hindrance.

[0112] At any point in time, if the recognizable utterance candidates change, the processing of steps S40 to S110 of FIG. 3 updates the displayed content of the input guide accordingly. If the change in content also changes the required display area size, the display position and display format are adjusted appropriately.

[0113] In the first embodiment, speech recognition input was used as the example for the input unit 102, but the input means is not limited to this. The first embodiment is also applicable to gesture input, for example.

[0114] In the first embodiment, a device with a single display was used as the example of the visual output unit 101, but the output unit 101 is not limited to this. The first embodiment can also be used in, for example, a multi-monitor environment with a plurality of displays, or a virtual-space environment using a head-mounted display or the like.

[0115] In the first embodiment, control of the display form, display position, size, and the like was shown as an example of how the visual guide presenting unit 103 is controlled, but the control method is not limited to this. For example, the shape of the output of the visual guide presenting unit may be changed, a plurality of outputs may be placed, or the output may be displayed so as to surround the user's gaze position.

[0116] In the first embodiment, more information is presented in the visual guide presenting unit 103 by reducing the display, but the presentation method is not limited to this. For example, an automatically scrolling display format may be used.

[0117] The visual guide presenting unit 103 may also be displayed only while input such as voice or gesture is being received.

[0118] The output of the visual guide presenting unit 103 may also be made to follow the user's gaze position only while input such as voice or gesture is being received.

[0119] In the first embodiment, the distance from the gaze position was used to determine the user's peripheral vision region, but the determination method is not limited to this. For example, other criteria that take the characteristics of human vision into account may be added.

[0120] The functions of the first embodiment described above may also be combined as appropriate. For example, when the user is looking at a certain position, the input guide information can be presented at a nearby position where the user can read it without moving the line of sight much, and where it does not overlap, or overlaps least with, the on-screen display elements the user is currently using or referring to.
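
One way to picture this combined placement is to score a few candidate positions around the gaze point by how much they overlap existing display elements and pick the least-overlapping one. The sketch below is an illustration only: the four fixed candidate offsets, the `radius` parameter, the rectangle convention (left, top, right, bottom), and the function names are assumptions, not part of the specification.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Size = Tuple[float, float]
Rect = Tuple[float, float, float, float]   # (left, top, right, bottom)


def rect_at(center: Point, size: Size) -> Rect:
    """Axis-aligned rectangle of `size` centered at `center`."""
    (cx, cy), (w, h) = center, size
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def best_guide_position(gaze: Point, guide_size: Size,
                        elements: Sequence[Rect], radius: float) -> Point:
    """Pick the candidate near the gaze point with the least total overlap."""
    gx, gy = gaze
    # Candidates: right of, left of, below, and above the gaze point.
    candidates: List[Point] = [(gx + radius, gy), (gx - radius, gy),
                               (gx, gy + radius), (gx, gy - radius)]

    def cost(c: Point) -> float:
        box = rect_at(c, guide_size)
        return sum(overlap_area(box, e) for e in elements)

    return min(candidates, key=cost)   # ties resolve to the earliest candidate
```

Keeping `radius` within the peripheral-vision threshold would keep every candidate readable without a large eye movement; a denser candidate set could be substituted without changing the scoring.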

[0121] The first embodiment provides the following effects.

[0122] The gaze target detecting unit 104 detects gaze information based on the detected gaze position of the user. Based on this gaze information and the current display position of the input guide, the control unit 105 performs control so that the input guide is displayed near the user's gaze point.

[0123] The user can therefore keep the input guide within peripheral vision without hindrance while working, and can efficiently grasp what should be input.

Second Embodiment

[0124] A second embodiment of the present invention is described in detail below with reference to the drawings, covering only the points that differ from the first embodiment.

[0125] In addition to the functions of the first embodiment, the second embodiment provides a function that avoids conflicts between the processing of applications running on the computer and the input guide. FIG. 6 is a block diagram showing the functional configuration of the multimodal interface device according to the second embodiment of the present invention. The second embodiment comprises an output unit 101, an input unit 102, a visual guide presenting unit 103, a gaze target detecting unit 104, a control unit 205, and an application management unit 206.

[0126] The application management unit 206 continuously tracks the layout, dependencies, and display state of the display elements, such as windows and menus, shown on the output unit 101 at each point in time. The application state can be obtained using commonly known techniques for monitoring the state of the tasks associated with an application. The application management unit 206 provides application state information to the control unit 205 either in response to a query from the control unit 205 or asynchronously. The application management unit 206 corresponds to the application status grasping means in the claims.

[0127] The control unit 205 has nearly the same functions as the control unit 105 of the first embodiment. However, in place of step S104 of FIG. 4, the control unit 205 performs the following processing. The control unit 205 determines whether the entire display area of the visual guide presenting unit 103, centered on the contents of register M, fits within the output unit 101. In the second embodiment, the control unit 205 additionally refers to the application management unit 206 and determines whether the display area of the visual guide presenting unit 103, centered on the contents of register M, overlaps other display elements of the running applications. If the input guide overlaps another display element, the current position of the input guide is moved to a position clear of that element. If there is no overlap, the process ends.
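
The overlap test and the move to a clear position can be sketched for a single display element as follows. The specification does not say how the clear position is chosen; the smallest axis-aligned shift used here is an assumption, as are the function names and the (left, top, right, bottom) rectangle convention.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]   # (left, top, right, bottom)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def separate(guide: Rect, element: Rect) -> Rect:
    """Shift the guide by the smallest axis-aligned move that clears the element."""
    if not overlaps(guide, element):
        return guide                               # no overlap: nothing to do
    moves = [
        (element[0] - guide[2], 0.0),              # left: guide's right edge meets element's left
        (element[2] - guide[0], 0.0),              # right: guide's left edge meets element's right
        (0.0, element[1] - guide[3]),              # up: guide's bottom meets element's top
        (0.0, element[3] - guide[1]),              # down: guide's top meets element's bottom
    ]
    dx, dy = min(moves, key=lambda m: abs(m[0]) + abs(m[1]))
    return (guide[0] + dx, guide[1] + dy, guide[2] + dx, guide[3] + dy)
```

With several display elements, the shifted guide would need to be rechecked against the others and against the bounds of the output unit, which this sketch leaves out.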

[0128] The remaining configuration and processing procedure of the second embodiment are the same as in the first embodiment, and their description is omitted.

[0129] In the second embodiment, the control unit 205 used the application management unit 206 to present the guide information while avoiding regions occupied by other display elements, but the display method is not limited to this. For example, by analyzing the movement of the user's gaze or the operating state of other input/output elements, the application the user is currently using, or the application, window, or other display element the user is currently referring to, may be identified, and the input guide may be repositioned so as not to overlap those elements.

[0130] According to the second embodiment, the following effect is obtained in addition to those of the first embodiment.

[0131] The application management unit 206 monitors the state of running applications. The control unit 205 corrects the position of the input guide as needed according to the display state of these applications. This prevents the visual guide information from being displayed over other display elements, so the user does not have to interrupt application work in order to interact through the input guide. The user's work efficiency therefore improves, and an interface that is easier for the user to understand is realized.

[0132] Third Embodiment

Hereinafter, a third embodiment of the present invention will be described in detail with reference to the drawings, covering only the differences from the first and second embodiments.

[0133] In addition to the functions of the first and second embodiments, the third embodiment provides a function of supplementing the output to the user with auxiliary voice.

[0134] FIG. 7 is a block diagram showing the functional configuration of the multimodal interface device according to the third embodiment of the present invention. The third embodiment includes an output unit 101, an input unit 102, a visual guide presenting unit 103, a gaze target detecting unit 104, a control unit 305, and an auxiliary voice presenting unit 307.

[0135] The auxiliary voice presenting unit 307 provides feedback by an audio signal, for example to confirm that input from the user has been received correctly or to prompt the user for input. This feedback is called auxiliary voice. The auxiliary voice is presented to the user by a buzzer, by playback of a digitally recorded audio signal, by synthesized speech output, or the like. The auxiliary voice presenting unit 307 corresponds to the output control means in the claims.

[0136] The control unit 305 has almost the same function as the control unit 105 of the first embodiment. However, the control unit 305 additionally performs the following processing.

[0137] That is, the control unit 305 uses the auxiliary voice presenting unit 307 as appropriate when the visual guide presenting unit 103 presents feedback on the user's input. Specifically, the control unit 305 compares the user's gaze position E, obtained from the gaze target detecting unit 104, with the contents of register M, which holds the position where the visual guide information is currently presented. If the distance between the two positions is larger than a predetermined threshold F4, the control unit 305 determines that the visual guide information is not within the user's field of view. In that case, the control unit 305 presents feedback by an audio signal through the auxiliary voice presenting unit 307. The control unit 305 or the auxiliary voice presenting unit 307 corresponds to the auxiliary voice generating means in the claims.
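The comparison of gaze position E with register M against threshold F4 can be illustrated with a short sketch. The function names and the callback parameters `show_visual` and `play_audio` are hypothetical, and the use of Euclidean distance is an assumption; the text specifies only that the deviation between the two positions is compared with F4.

```python
import math


def needs_auxiliary_voice(gaze_e, register_m, threshold_f4):
    """True when the guide position is farther than F4 from the gaze position,
    i.e. the guide is judged to be outside the user's field of view."""
    return math.dist(gaze_e, register_m) > threshold_f4


def present_feedback(message, gaze_e, register_m, threshold_f4,
                     show_visual, play_audio):
    """Show visual feedback, and add auxiliary voice only when it is needed.

    `show_visual` and `play_audio` are stand-ins for the visual guide
    presenting unit and the auxiliary voice presenting unit.
    Returns True if auxiliary voice was presented.
    """
    show_visual(message)
    if needs_auxiliary_voice(gaze_e, register_m, threshold_f4):
        play_audio(message)
        return True
    return False
```

In this sketch the visual feedback is always shown and the audio is added on top; presenting audio in place of the visual guide, which the embodiment also allows, would only require skipping the `show_visual` call.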

[0138] The other configurations and processing procedures of the third embodiment are the same as those of the first and second embodiments, and their description is omitted.

[0139] The third embodiment shows an example in which audio signal output is used to supplement the visual guide information, but it is not limited to this. The third embodiment can be applied to other output signals, for example output using vibration or force, as long as the output does not tie up the user's vision. In addition, back-channel responses (aizuchi) of the kind frequently used in conversation between people may be used as the auxiliary voice.

[0140] It goes without saying that the third embodiment may be implemented in appropriate combination with the first and second embodiments.

[0141] According to the third embodiment, the following effect is obtained in addition to those of the first and second embodiments.

[0142] Under the control of the control unit 305, the auxiliary voice presenting unit 307 presents feedback on input from the user by auxiliary voice, together with or in place of the visual guide presenting unit 103. As a result, when the user is not looking at the visual guide information, auxiliary voice feedback supplements it, and when it is not needed, no auxiliary voice is presented. An interface that gives the user input feedback more reliably is therefore realized.

[0143] Fourth Embodiment

Hereinafter, a fourth embodiment of the present invention will be described in detail with reference to the drawings, covering only the differences from the first to third embodiments.

[0144] In addition to the functions of the above embodiments, the fourth embodiment provides a function of adjusting the time spent waiting for input from the user.

[0145] FIG. 8 is a block diagram showing the functional configuration of the multimodal interface device according to the fourth embodiment of the present invention. The fourth embodiment includes an output unit 101, an input unit 102, a visual guide presenting unit 103, a gaze target detecting unit 104, a control unit 405, and a search state detecting unit 408.

[0146] The search state detecting unit 408 monitors the user's gaze position, supplied successively by the gaze target detecting unit 104, according to rules prepared in advance. For example, when the user's gaze position stays on the visual guide presenting unit 103 while it is presenting input candidates, the search state detecting unit 408 determines that the user is in the middle of selecting a candidate to input. The analysis results are reported successively to the control unit 405. The search state detecting unit 408 corresponds to the third determination means in the claims.

[0147] The control unit 405 has almost the same function as the control unit 105 of the first embodiment. However, the control unit 405 additionally performs the following processing.

[0148] That is, according to the user's search state obtained from the search state detecting unit 408, the control unit 405 extends the input waiting time, adjusts the presentation time of the input guide information, and so on, as appropriate.
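One way to picture the dwell detection and the resulting extension of the waiting time is the sketch below. The rule used here (the gaze has stayed inside the guide area for at least `min_dwell` seconds) and the extension policy are assumptions chosen for illustration; the text leaves the monitoring rules and the amount of extension open, and all names are hypothetical.

```python
def gaze_dwells_on_guide(samples, guide_area, min_dwell):
    """Decide whether the user appears to be selecting a candidate.

    samples:    list of (timestamp_sec, x, y), oldest first.
    guide_area: (left, top, width, height) of the presented guide.
    Returns True if the gaze has stayed inside the guide area continuously,
    up to the latest sample, for at least `min_dwell` seconds.
    """
    left, top, width, height = guide_area
    dwell_start = None
    for t, x, y in samples:
        inside = left <= x <= left + width and top <= y <= top + height
        if inside:
            if dwell_start is None:
                dwell_start = t      # gaze has just entered the guide area
        else:
            dwell_start = None       # gaze left the area; reset the dwell
    if dwell_start is None:
        return False
    return samples[-1][0] - dwell_start >= min_dwell


def adjusted_wait(base_timeout, elapsed, selecting, extension):
    """Remaining input waiting time in seconds.

    While the user is judged to be selecting, at least `extension` seconds
    are kept available so that the input does not time out mid-selection.
    """
    remaining = max(0.0, base_timeout - elapsed)
    return max(remaining, extension) if selecting else remaining
```

For example, with a 5-second base timeout and 4.5 seconds elapsed, `adjusted_wait` leaves 0.5 seconds normally but 2.0 seconds when the user is judged to be selecting and `extension` is 2.0.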

[0149] The other configurations and processing procedures of the fourth embodiment are the same as those of the first to third embodiments, and their description is omitted.

[0150] It goes without saying that the fourth embodiment can be used in appropriate combination with the above embodiments. With such combinations, for example, when the user is directing their gaze at a certain position, the guide information can be presented at a nearby position where the user can check its content without moving their gaze far, and where it does not overlap, or overlaps as little as possible, the on-screen display element the user is currently using or referring to.

[0151] Furthermore, if the user then moves their gaze to another display element near the one currently being referred to, for example to check the content of a different display element, the input guide can be controlled so that it moves to a position that does not overlap, or minimally overlaps, the newly referenced display element and where the user can still check its content without moving their gaze far.
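The combined placement rule described in the two paragraphs above (close to the gaze, with no or minimal overlap) can be read as a small selection problem. The sketch below is one possible reading, not the method of the embodiments: the candidate list, the `max_distance` limit, and the choice to rank by overlap area first and distance second are all assumptions, and the names are hypothetical.

```python
import math


def overlap_area(a, b):
    """Overlap area of two rectangles given as (left, top, width, height)."""
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def best_guide_position(gaze, size, elements, candidates, max_distance):
    """Pick the candidate center with the least total overlap.

    Only candidates within `max_distance` of the gaze point are considered,
    so the user can check the guide without moving their gaze far. Ties in
    overlap are broken by distance to the gaze point. Returns None when no
    candidate is close enough.
    """
    gw, gh = size
    best = None
    for cx, cy in candidates:
        distance = math.dist(gaze, (cx, cy))
        if distance > max_distance:
            continue
        rect = (cx - gw / 2, cy - gh / 2, gw, gh)
        total = sum(overlap_area(rect, e) for e in elements)
        key = (total, distance)
        if best is None or key < best[0]:
            best = (key, (cx, cy))
    return None if best is None else best[1]
```

When the user's gaze moves to a different display element, calling the function again with the new gaze point and the newly referenced element gives the relocated position.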

[0152] It goes without saying that the fourth embodiment may be implemented in appropriate combination with the above embodiments.

[0153] According to the fourth embodiment, the following effect is obtained in addition to those of the above embodiments.

[0154] The search state detecting unit 408 monitors the user's search state. According to this search state, the control unit 405 extends the input waiting time, adjusts the presentation time of the input guide information, and so on. As a result, while the user is looking at the input guide and considering or selecting the expression or content to input, the input waiting time is extended appropriately. An interface that is easier to use and places less burden on the user is therefore realized.

[0155] [Effects of the Invention] As described above, the present invention provides the following effects. The present invention provides a function of placing guide information that assists the user's input at a position determined on the basis of gaze information, such as information indicating the position of the user's line of sight. It also provides a function of supplementing the guide information, as needed, with other guide information such as voice, based on the user's gaze information.

[0156] This makes it possible to guide input efficiently and appropriately while reducing the burden on the user.

[0157] Thus, with the present invention, the accuracy of data exchange in the dialogue between the user and the computer is improved in a multimodal interface environment, and natural and smooth communication between the user and the computer is thereby realized.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the functional configuration of a multimodal interface device according to the first embodiment of the present invention.

FIG. 2 is a block diagram showing the hardware configuration of a computer system on which the multimodal interface device according to the present invention is implemented.

FIG. 3 is a flowchart showing the processing procedure performed by the control unit 105 in the multimodal interface device according to the first embodiment of the present invention.

FIG. 4 is a flowchart showing the procedure of the presentation correction process performed by the control unit 105 in the multimodal interface device according to the first embodiment of the present invention.

FIG. 5 is a diagram illustrating an example of the operation of the multimodal interface device according to the present invention as the user's line of sight moves.

FIG. 6 is a block diagram showing the functional configuration of a multimodal interface device according to the second embodiment of the present invention.

FIG. 7 is a block diagram showing the functional configuration of a multimodal interface device according to the third embodiment of the present invention.

FIG. 8 is a block diagram showing the functional configuration of a multimodal interface device according to the fourth embodiment of the present invention.

[Explanation of symbols]

1, 2, 3, 4: Multimodal interface device
101: Output unit
102: Input unit
103: Visual guide presenting unit
104: Gaze target detecting unit
105, 205, 305, 405: Control unit
206: Application management unit
307: Auxiliary voice presenting unit
408: Search state detecting unit

Claims

[Claims]

1. A multimodal interface device for inputting and outputting information between a user and a computer in a plurality of interaction modes, comprising: gaze information generating means for detecting a gaze point of the user and generating it as gaze information; guide information control means for determining, based on the gaze information, a position at which to present guide information for assisting the user's input; and presentation control means for controlling presentation of the guide information at the position determined by the guide information control means.

2. The multimodal interface device according to claim 1, wherein the guide information control means further comprises first determination means for determining whether the distance between the user's gaze point indicated by the gaze information and the presentation position of the guide information is within a predetermined first threshold, and, when the distance is not within the first threshold, sets the presentation position of the guide information in the vicinity of the user's gaze point.

3. The multimodal interface device according to claim 1 or 2, wherein the guide information control means further fixes the presentation position of the guide information when it is determined that the user is gazing at the guide information.

4. The multimodal interface device according to any one of claims 1 to 3, wherein the guide information control means further comprises second determination means for determining whether the obtained presentation position of the guide information is within a predetermined presentation area in which the guide information is to be presented, and, when the presentation position is not within the presentation area, corrects the presentation position of the guide information so that it lies within the presentation area.

5. The multimodal interface device according to claim 1, wherein the guide information control means corrects the size of the guide information so that the guide information is presented within the user's visual field.

6. The multimodal interface device according to any one of the preceding claims, further comprising application status grasping means for generating application presentation information indicating at least one of the usage status of each application presented in a presentation area and the arrangement information of each presentation element of the application, wherein the guide information control means sets the guide information, based on the application presentation information, at a position that does not hinder the presentation of each application.

7. The multimodal interface device according to claim 1, further comprising: auxiliary voice generating means for generating voice guide information, based on an audio signal, for assisting input from the user; and output control means for controlling output of the voice guide information, wherein the guide information control means sets the guide information so that the voice guide information is presented when it is determined, based on the gaze information, that the user cannot capture the guide information in or near the visual field.

8. The multimodal interface device according to any one of claims 1 to 7, further comprising third determination means for determining, based on the gaze information, whether the user's gaze point stays within the area of the presented guide information, wherein the guide information control means maintains the standby state for input from the user when it is determined that the user's gaze point is staying there.

9. The multimodal interface device according to claim 1, wherein said presentation control means controls the presentation of said guide information in a presentation area by translucent display or highlighted display.

10. A multimodal interface method for inputting and outputting information between a user and a computer in a plurality of interaction modes, comprising the steps of: detecting a gaze point of the user and generating it as gaze information; determining, based on the gaze information, a presentation position at which to present guide information for assisting the user's input; and presenting the guide information at the presentation position.

11. A computer-readable recording medium storing a multimodal interface program for inputting and outputting information between a user and a computer in a plurality of interaction modes, the program comprising: a module that detects a gaze point of the user and generates it as gaze information; a module that determines, based on the gaze information, a presentation position at which to present guide information for assisting the user's input; and a module that presents the guide information at the presentation position.