JPH0863319A

JPH0863319A - Information processor

Info

Publication number: JPH0863319A
Application number: JP19923794A
Authority: JP
Inventors: Hideaki Kikuchi; 英明菊池; Haru Andou; ハル安藤; Nobuo Hataoka; 信夫畑岡
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-08-24
Filing date: 1994-08-24
Publication date: 1996-03-08

Abstract

PURPOSE: To accept input that combines several input means simultaneously, so that the user can operate on parts that cannot be operated directly because they are not visualized on the display, and can also operate on visualized parts when there are several candidate objects for selection. CONSTITUTION: The information processor includes a voice input means such as a microphone 101, a pointing input means such as a pen 102 or a mouse, a command execution means 109, a screen display means 110 that changes the screen display in response to a command, and a screen output means 111, such as a display, that outputs the screen. In addition, an information integration means 108 integrates and interprets the voice input information and the pointing input information. With this configuration, the information processor can interpret operations that the user performs with the voice input means and the pointing input means, including indirect operations such as operating on an invisible part of the screen or selecting several objects at once.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to user interfaces for file management, document editing, graphic editing, and the like installed on office automation (OA) equipment such as personal computers, workstations, and word processors, and provides an information input method that is easy for users to use.

[0002]

[Prior Art] Conventional graphical user interfaces achieve simple operation by visualizing textual information so that it can be manipulated directly. However, they accept input from only one input means at a time, whether a pointing device such as a mouse or pen, a keyboard for entering characters, or voice. Some interfaces do allow several objects to be operated on by dragging and dropping with a pointing device while holding down a particular key on the keyboard, and so handle operations that use several input means at once. In conventional interfaces that rely mainly on the keyboard for character input, however, such operating methods are unnatural and hard for unskilled users to remember.

[0003]

[Problem to Be Solved by the Invention] The prior art described above accepts input from only a single input means in many situations. Consequently, supporting operations on parts that cannot be operated directly because they are not visualized, or operations in situations where selection is tedious, such as when several visualized objects match the selection, inevitably makes the menus more complex. The object of the present invention is therefore to make operation natural and easy in such situations, which are hard to handle with a single input means, by avoiding as far as possible the keyboard, an input means that requires user skill, and by accepting input that combines several input means simultaneously.

[0004]

[Means for Solving the Problem] To solve the above problems, an information processing apparatus that has at least a voice input means such as a microphone, a pointing input means such as a pen or mouse, a voice recognition means that recognizes the user's input speech, a pointing input information detection means that detects pointing input information entered with the pointing input means, a command execution means that executes commands, and a screen display means that changes the screen display in response to a command, is provided with an information integration means that integrates and interprets the voice input information and the pointing input information.

[0005]

[Operation] In an information processing apparatus having at least a voice input means such as a microphone, a pointing input means such as a pen or mouse, a voice recognition means that recognizes the user's input speech, a pointing input information detection means that detects pointing input information entered with the pointing input means, a command execution means that executes commands, and a screen display means that changes the screen display in response to a command, providing an information integration means that integrates and interprets the voice input information and the pointing input information makes it possible to integrate and interpret voice input and pointing input that the user enters simultaneously through the voice input means and the pointing input means. The apparatus can thus process operations that include indirect operations, such as operating on invisible parts of the screen or selecting several objects at once, which are difficult to achieve with a single input means.

[0006]

[Embodiments] Embodiments are described in detail below with reference to the drawings. The description assumes, in particular, a graphical user interface for file management on an information processing apparatus such as a personal computer. The present invention is not, however, limited to that interface and can be applied to user interfaces in general, such as database search, graphic editing, and document editing.

[0007] FIG. 1 is a block diagram showing an embodiment of the information processing apparatus of the present invention. In FIG. 1, the voice input means 101 is a device, such as a microphone, used to input speech. The voice detection means 103 detects the user's speech segments and their input times in the information entered through the voice input means 101. The voice recognition means 104 analyzes the detected speech and outputs the corresponding word string. The voice input information analysis means 105 divides or merges the voice input information into meaningful units and analyzes their semantic information. The pen input means 102 is a device, such as an electronic pen, that can provide input to a computer. The pen input information detection means 106 detects the content and input time of the information entered through the pen input means 102. The pen input information analysis means 107 analyzes the semantic information of the pen input. The information integration means 108 integrates and interprets the voice input information, which has been divided or merged into semantic units and semantically analyzed, together with the semantically analyzed pen input information. The command execution means 109 executes file management commands. The screen display means 110 changes the screen display in response to command execution. The screen output means 111 is a device, such as a display, that outputs characters or graphics.

[0008] The present invention differs from the prior art in that it provides the information integration means 108.

[0009] In the embodiment of FIG. 1, the voice detection means 103 first detects the user's speech in the information entered through the voice input means 101. The user's speech may be a file management command uttered either as single words, such as "move" or "right", or as continuous speech, such as "I want to move this file here". After such speech is entered through the voice input means 101, the voice detection means 103 detects the speech segment and at the same time measures its start time and end time. The voice recognition means 104 performs ordinary speech recognition on the speech segment detected by the voice detection means 103 and outputs a word string as the recognition result. The voice input information analysis means 105 analyzes the semantic information of that word string by dividing or merging it into semantic units. Meanwhile, information entered through the pen input means 102 is detected by the pen input information detection means 106, which measures the start time and end time of the pen input. Pen input information can come from discrete operations, such as pressing the pen down to point at an object like an icon representing a file or directory on the screen, or from continuous operations, such as moving the pen while holding it down. As the meaning of the pen input, the pen input information analysis means 107 outputs the name of the object pointed at when an object is pointed at, or the command name corresponding to the gesture when a gesture is entered. For information entered simultaneously through the voice input means 101 and the pen input means 102, the information integration means 108 interprets the intention of the user's whole operation from the semantic information of the voice input and that of the pen input. The intention corresponds to the command that the command execution means 109 executes.

[0010] FIG. 2 is a block diagram showing an embodiment of the information integration means 108 of FIG. 1. In FIG. 2, the time synchronization analysis means 201 analyzes the temporal correspondence between the voice input information and the pen input information and determines whether they are synchronized. The meaning matching means 202 compares and matches the meanings of temporally synchronized voice input information and pen input information. Using the result of the meaning matching means 202, the frame creation means 203 creates, from the voice input information and the pen input information, a frame describing the conditions under which the command execution means 109 executes a command.

[0011] First, the voice input information analysis means 105 outputs the semantic information of the voice input together with its input start and end times, and the pen input information analysis means 107 outputs the semantic information of the pen input together with its input start and end times. The time synchronization analysis means 201 takes this information as input and analyzes the temporal synchronism of the voice input and the pen input. For example, if the time at which the pen is pressed on a file icon displayed on the screen falls within a predetermined time difference of the time at which the speech segment "this file" occurs, the pen input and the voice input are judged to be synchronized. The heuristic used here is that, when voice input and pen input are synchronized, the same concept has been entered through both the voice input means 101 and the pen input means 102 of FIG. 1. Using this heuristic, the time synchronization analysis means 201 links temporally synchronized voice input and pen input as information denoting the same concept and outputs them. The meaning matching means 202 then matches the meanings of the voice input and pen input linked by the time synchronization analysis means 201. If their meanings agree, the linked voice input and pen input can be merged into one. If they disagree, the content of the pen input is adopted in preference, because it is the more reliable input, and the content of the voice input is discarded. If the content of the pen input is ambiguous, however, the meaning of the voice input is used to clarify it. The voice input is assumed here to be processed in phrase units, but the processing is essentially the same for word, sentence, or bunsetsu (clause) units. The frame creation means 203 performs interpretation on the voice input and pen input once the meaning matching means 202 has merged the information denoting the same concept. The result of the interpretation is a frame consisting of a command, such as moving or deleting a file, and its parameters.

[0012] FIG. 3 is a flowchart showing an embodiment of the time synchronization analysis means 201 of FIG. 2. The semantic information and the input start and end times of the voice input are supplied from the voice input information analysis means 105, and those of the pen input from the pen input information analysis means 107, to the time synchronization analysis means 201. On receiving this information, the time synchronization analysis means 201 analyzes, following the flowchart of FIG. 3, what temporal correspondence the voice input and the pen input have. There are six types of temporal correspondence between voice input and pen input, shown under [Types of temporal correspondence] in FIG. 3. Correspondences (1), (2), (3), and (4) can always be regarded as synchronous, whereas (5) and (6) are synchronous only if the appearance times of the voice input and the pen input lie within a predetermined time difference.

[0013] For example, as shown in the legend of FIG. 3, let Tss and Tse be the input start and end times of the voice input, and Tps and Tpe those of the pen input, and suppose that Tss = 100 ms, Tse = 1200 ms, Tps = 300 ms, and Tpe = 500 ms. In the flowchart of FIG. 3, the flow then passes through s301, s302, s308, and s310, and the voice input and pen input are judged to be synchronous. The temporal correspondence in this case is type (1) under [Types of temporal correspondence] in FIG. 3.

[0014] As another example, if Tss = 600 ms, Tse = 1500 ms, Tps = 300 ms, and Tpe = 400 ms, and the threshold in s305 is set to c = 500 ms, the flow passes through s301, s302, s303, s304, s305, and s306, and the voice input and pen input are judged to correspond. The temporal correspondence in this case is type (5) under [Types of temporal correspondence] in FIG. 3.
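Assuming that s305 compares the silence between the end of the pen input and the start of the speech with c (the text gives the threshold value but not the exact comparison), the arithmetic is:

```latex
T_{pe} = 400 < T_{ss} = 600,
\qquad
T_{ss} - T_{pe} = 600 - 400 = 200\ \text{ms} \le c = 500\ \text{ms}
\;\Longrightarrow\;
\text{type (5), synchronous}
```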

[0015] In this way, the time synchronization analysis means 201 of FIG. 2 determines whether the voice input and pen input are synchronized, links the synchronized voice input and pen input, and outputs them.

[0016] FIG. 4 is a block diagram showing an embodiment of the meaning matching means 202 of FIG. 2. The time synchronization analysis means 201 of FIG. 2 outputs voice input and pen input that have been linked because they are temporally synchronized, and the semantic information matching means 401 of FIG. 4 matches the semantic information of each linked pair. Because temporally synchronized voice and pen inputs are assumed to denote the same concept, they can carry the same semantic information. However, the voice input may contain words such as "this" or "here" whose semantic information has no counterpart in the pen input, and speech recognition errors may cause inputs with different semantic information to be linked even though they are temporally synchronized. The semantic information is therefore matched, and if it does not agree, the content of the pen input is adopted in preference because of its greater reliability. Pen input alone cannot be interpreted, however, when ambiguity remains; for example, merely pointing at a directory icon with the pen does not tell whether the directory should be taken as the object or as the destination. To handle such cases, the semantic feature matching means 402 resolves the ambiguity of the pen input by using the semantic feature information of the voice input that corresponds to that pen input.

[0017] FIG. 5 is a diagram showing the data flow in an embodiment of the meaning matching means 202 of FIG. 2. It shows the case in which the user says "copy file 1 here" while pointing with the pen at the file1 icon on the screen during the words "file 1" and at the window2 area on the screen during the word "here". The voice input carrying the expression "here" and the pen input whose object is window2 are temporally synchronized, so the time synchronization analysis means 201 of FIG. 2 links them. The semantic information matching means 401 of FIG. 4 compares the semantic information "here" of voice input information 2 in FIG. 5 with the semantic information "window2" of pen input information 2 and returns a mismatch. Next, the semantic feature matching means 402 of FIG. 4 compares the semantic feature "location" of voice input information 2 with the semantic feature "location" of pen input information 2 and returns a match. As a result, the linked voice input information 2 and pen input information 2 are merged and replaced by integrated information 2, which has "window2" as its semantic information and "location" as its semantic feature.
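The two-stage matching of FIG. 4 and the FIG. 5 example can be sketched as below. The sketch assumes an ambiguous pen input is represented by a set of candidate semantic features (a directory icon could be {"object", "location"}) and that an ambiguity speech cannot resolve is reported as an error; these representations and names are illustrative assumptions, not the patent's data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VoiceInfo:
    meaning: str   # semantic information, e.g. "here" or "file1"
    feature: str   # semantic feature, e.g. "location" or "object"

@dataclass(frozen=True)
class PenInfo:
    meaning: str         # object pointed at, e.g. "window2"
    features: frozenset  # candidate semantic features; more than one means ambiguous

@dataclass(frozen=True)
class Integrated:
    meaning: str
    feature: str
    meaning_match: bool  # outcome of the semantic information check (means 401)
    feature_match: bool  # outcome of the semantic feature check (means 402)

def match(voice: VoiceInfo, pen: PenInfo) -> Integrated:
    """Merge one temporally linked voice/pen pair into integrated information."""
    meaning_match = voice.meaning == pen.meaning
    feature_match = voice.feature in pen.features
    if feature_match:
        # Speech confirms one of the pen's candidate features, resolving any ambiguity.
        feature = voice.feature
    elif len(pen.features) == 1:
        # The pen input is unambiguous and is trusted over the speech on disagreement.
        (feature,) = pen.features
    else:
        raise ValueError(f"cannot resolve ambiguous pen input {pen.meaning!r}")
    # The pen's meaning is kept either way: on agreement the two meanings are identical,
    # and on disagreement the more reliable pen input wins.
    return Integrated(pen.meaning, feature, meaning_match, feature_match)
```

Applied to voice input information 2 ("here", "location") and pen input information 2 ("window2", {"location"}), the sketch reports a meaning mismatch, a feature match, and returns integrated information carrying "window2" and "location", as in FIG. 5.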

[0018] FIG. 6 is a diagram showing the input and output data of an embodiment of the frame creation means.

[0019] The voice input and pen input that the time synchronization analysis means 201 and the meaning matching means 202 of FIG. 2 have judged, through temporal synchronization, to denote the same concept are merged, and the frame creation means 203 of FIG. 2 creates a frame from the merged input information. A frame here is the result of interpreting the intention of the user's operation; it consists of a file management command, such as moving or deleting a file, and the parameters required to execute that command.

[0020] For example, the integrated information shown in (1) of FIG. 6 is information in which voice input and pen input denoting the same concept have already been merged. The command frame shown in (2) consists of the type of command to be executed by the command execution means 109 of FIG. 1, such as moving, deleting, or copying a file, together with the parameters needed to execute it, such as the object name and the destination name. In FIG. 6, for example, because integrated information 1 has the semantic feature "object", its semantic information is written into the object slot of the command frame. In the same way, the semantic information of integrated information 2 to 4 is written into the corresponding slots of the command frame, completing it.
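The slot filling of the frame creation means 203 can be sketched as follows, assuming each piece of integrated information arrives as a (semantic feature, semantic information) pair. FIG. 6 explicitly names only the "object" slot; the "command" and "location" slot names and the per-command list of required parameters are hypothetical.

```python
from typing import Optional

# Hypothetical slot names; FIG. 6 explicitly names only the "object" slot.
SLOTS = ("command", "object", "location")
# Assumed parameters each command needs before it can be executed.
REQUIRED = {"copy": ("object", "location"), "move": ("object", "location"), "delete": ("object",)}

def build_frame(integrated: list) -> dict:
    """Write each item's semantic information into the slot named by its semantic feature."""
    frame: dict = {slot: None for slot in SLOTS}
    for feature, meaning in integrated:
        if feature not in frame:
            raise KeyError(f"no slot for semantic feature {feature!r}")
        frame[feature] = meaning
    return frame

def is_executable(frame: dict) -> bool:
    """True when the command is known and all of its required parameters are filled."""
    needed: Optional[tuple] = REQUIRED.get(frame["command"])
    return needed is not None and all(frame[slot] is not None for slot in needed)
```

For "copy file 1 here", the pairs ("object", "file1"), ("location", "window2"), and ("command", "copy") yield a complete copy frame; a "move" frame without a destination would not yet be handed to the command execution step.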

[0021] FIG. 7 is a diagram showing a usage form of an embodiment of the apparatus of the present invention. In this figure, the display 701 is the screen output means, the window 702 is an area that displays the contents of a directory, and the icon 703 is a graphic representing a file.

[0022] In this figure, the user says "please move the icon behind this one here" while pointing with the pen at area (1) on the display during the word "icon" and at area (2) during the word "here"; in this way the user can operate on an object that is not visible on the screen. By providing the information integration means 108 of FIG. 1, the apparatus of the present invention can interpret operations on invisible parts, such as the operation above, or an operation that specifies by voice a directory or file that is not displayed on the screen because it lies in the layer below or above an object that is displayed.

[0023] FIG. 8 is a diagram showing a usage form of an embodiment of the apparatus of the present invention.

[0024] In this figure, the user says "please copy this to the window above" while pointing with the pen at the area indicated by (1) on the display during the word "this"; the copy destination is thus specified by a spoken relative position, without pointing at the destination area directly with the pen. By providing the information integration means 108 of FIG. 1, the apparatus of the present invention can interpret indirect operations, such as the one above, that substitute for direct operations.
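Resolving a spoken relative position such as "the window above" could be done along the lines of this sketch. It assumes the reference window is the one containing the object pointed at during "this", that screen y grows downward, and that "above" means the nearest window lying entirely higher on screen regardless of horizontal offset; all of these rules and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Window:
    name: str
    x: int
    y: int  # top edge; screen y grows downward
    w: int
    h: int

def window_above(windows: list, ref_name: str) -> Optional[str]:
    """Nearest window lying entirely above the reference window, or None."""
    ref = next(w for w in windows if w.name == ref_name)
    above = [w for w in windows if w.name != ref_name and w.y + w.h <= ref.y]
    if not above:
        return None
    # Pick the smallest vertical gap between a candidate's bottom edge and the reference's top edge.
    return min(above, key=lambda w: ref.y - (w.y + w.h)).name
```

A stricter variant might also require horizontal overlap between the two windows; the sketch leaves that out to keep the rule simple.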

[0025] FIG. 9 is a diagram showing a usage form of an embodiment of the apparatus of the present invention.

[0026] In this figure, the user says "please select all icons of this shape" while pointing at the area indicated by (1) on the display during the words "of this shape"; by specifying the attribute "shape" in this way, the user can select several objects at once. By providing the information integration means 108 of FIG. 1, the apparatus of the present invention can interpret the operation above, as well as operations that specify attributes such as color or size by voice.

[0027]

[Effects of the Invention] In an information processing apparatus having at least a voice input means such as a microphone, a pointing input means such as a pen or mouse, a voice recognition means that recognizes the user's input speech, a pointing input information detection means that detects pointing input information entered with the pointing input means, a command execution means that executes commands, and a screen display means that changes the screen display in response to a command, providing an information integration means that integrates and interprets the voice input information and the pointing input information makes it possible to integrate and interpret voice input and pointing input entered simultaneously by the user through the voice input means and the pointing input means. As a result, the apparatus can process operations that include indirect operations, such as operating on invisible parts of the screen or selecting several objects at once, which are difficult to achieve with a single input means.

[Brief description of drawings]

【図１】情報処理装置の一実施例を示すブロック図。FIG. 1 is a block diagram showing an embodiment of an information processing device.

【図２】情報統合手段の一実施例を示すブロック図。FIG. 2 is a block diagram showing an embodiment of information integration means.

【図３】時間同期解析手段の一実施例を示すフローチャ
ート。FIG. 3 is a flowchart showing an embodiment of time synchronization analysis means.

【図４】意味照合手段の一実施例を示すブロック図。FIG. 4 is a block diagram showing an embodiment of a meaning matching unit.

【図５】意味照合手段の一実施例のデータの流れを示す
図。FIG. 5 is a diagram showing a data flow of an embodiment of a meaning matching unit.

【図６】フレーム作成手段の一実施例の入出力データを
示す図。FIG. 6 is a diagram showing input / output data of an embodiment of a frame creating means.

【図７】本発明装置の一実施例の利用形態を示す図。FIG. 7 is a diagram showing a usage pattern of an embodiment of the device of the present invention.

【図８】本発明装置の一実施例の利用形態を示す図。FIG. 8 is a diagram showing a usage pattern of an embodiment of the device of the present invention.

FIG. 9 is a diagram showing a usage pattern of an embodiment of the device of the present invention.

[Explanation of symbols]

101 ... Voice input means, 102 ... Pen input means, 103 ... Voice detection means, 104 ... Voice recognition means, 105 ... Voice input information analysis means, 106 ... Pen input information detection means, 107 ... Pen input information analysis means, 108 ... Information integration means, 109 ... Command execution means, 110 ... Screen display means, 111 ... Screen output means.

Claims


1. An information processing apparatus comprising at least voice input means such as a microphone, pointing input means such as a pen or a mouse, voice detection means for detecting the user's voice input through the voice input means, voice recognition means for recognizing the detected voice input information, voice input information analysis means for analyzing the voice input information, pointing input information detection means for detecting pointing input information entered with the pointing input means, pointing input information analysis means for analyzing the pointing input information, command execution means for executing a command, screen display means for changing the screen display in response to the command, and screen output means, such as a display, for outputting the screen, wherein the apparatus further comprises information integration means for integrating and interpreting the voice input information and the pointing input information, and has a function of interpreting operations that the user performs with the voice input means and the pointing input means, including indirect operations such as an operation on a portion not visible on the screen or a collective selection of a plurality of objects.

2. The information processing apparatus according to claim 1, wherein the information integration means comprises time synchronization analysis means for analyzing the temporal synchronism between voice input information entered with the voice input means and pointing input information entered with the pointing input means, meaning matching means for matching the meanings of the voice input information and the pointing input information that are synchronized in time, and frame creating means for creating, from the result of the meaning matching means, a command execution condition as the result of interpretation, the apparatus having a function of interpreting an operation in which the voice input information and the pointing input information are input asynchronously.

3. The information processing apparatus according to claim 1, wherein the time synchronization analysis means has a function of analyzing the temporal correspondence between the voice input information entered with the voice input means and the pointing input information entered with the pointing input means, and of further determining their temporal synchronism.

4. The information processing apparatus according to claim 1, wherein the meaning matching means has a function of comparing, collating, and complementing semantic concepts, such as meanings and semantic features, of the voice input information and the pointing input information that the time synchronization analysis means has judged to be temporally synchronous.

5. The information processing apparatus according to claim 1, wherein the information integration means integrates voice input information relating to an object that is not visible on the screen output means with other input information, and thereby has a function of interpreting an operation on the invisible portion.

6. The information processing apparatus according to claim 1, wherein the information integration means integrates voice input information relating to a position relative to a designated object with other input information, and thereby has a function of interpreting an operation in which an object is designated by voice in place of a direct operation.

7. The information processing apparatus according to claim 1, wherein the information integration means integrates voice input information relating to attributes such as color, shape, and size with other input information, and thereby has a function of interpreting an operation of collectively selecting, by voice, a plurality of objects that share a common attribute such as color, shape, or size.
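Claims 5 to 7 describe three kinds of voice-driven object resolution but do not define their mechanics. The following minimal Python sketch shows one possible reading, assuming that every object has named attributes, a screen position, and a visibility flag. All names in it are hypothetical illustrations, not taken from the source: `ScreenObject`, `select_by_attributes`, `resolve_relative`, `resolve_by_name`, and the direction keywords.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ScreenObject:
    name: str
    color: str
    shape: str
    size: str
    x: float
    y: float
    visible: bool  # False if the object is scrolled off-screen or hidden

def select_by_attributes(objects: List[ScreenObject], **attrs: str) -> List[str]:
    """Batch selection (claim 7 sketch): return every object whose attributes
    all equal the spoken values, e.g. color="red", shape="circle"."""
    return [o.name for o in objects
            if all(getattr(o, key) == value for key, value in attrs.items())]

def resolve_relative(objects: List[ScreenObject], anchor: str,
                     relation: str) -> Optional[str]:
    """Relative position (claim 6 sketch): starting from the pen-designated
    `anchor`, return the nearest visible object in the spoken direction."""
    base = next((o for o in objects if o.name == anchor), None)
    if base is None:
        raise ValueError(f"unknown anchor object '{anchor}'")
    tests = {  # an unknown relation raises KeyError
        "right": lambda o: o.x > base.x,
        "left": lambda o: o.x < base.x,
        "above": lambda o: o.y < base.y,  # screen y grows downward
        "below": lambda o: o.y > base.y,
    }
    candidates = [o for o in objects
                  if o.name != anchor and o.visible and tests[relation](o)]
    if not candidates:
        return None
    nearest = min(candidates,
                  key=lambda o: (o.x - base.x) ** 2 + (o.y - base.y) ** 2)
    return nearest.name

def resolve_by_name(objects: List[ScreenObject], name: str) -> Optional[ScreenObject]:
    """Invisible-object reference (claim 5 sketch): a spoken name is looked up
    in the full object list, so it resolves even when `visible` is False."""
    return next((o for o in objects if o.name == name), None)
```

Take three visible shapes `a` (red circle at x=10), `b` (red square at x=50), and `c` (blue circle at x=30), plus a hidden folder `reports`. Saying "select the red ones" returns `["a", "b"]`, and "the one to the right of this" while pointing at `a` returns `c`. The hidden `reports` can still be named as a destination even though it is not on screen. Restricting relative-position candidates to visible objects is a design assumption, since the user can only judge positions of what is displayed.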