JPH09251368A - Construction supporting system for intelligent agent

Info

Publication number: JPH09251368A
Application number: JP8060593A
Authority: JP
Inventors: Shunichi Tano; 俊一田野; Hideki Sakao; 秀樹坂尾; Yasuharu Nanba; 康晴難波; Taminori Tomita; 民則冨田; Hirokazu Aoshima; 弘和青島
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1996-03-18
Filing date: 1996-03-18
Publication date: 1997-09-22

Abstract

PROBLEM TO BE SOLVED: To provide a function that lets even a user who cannot program freely construct an agent that controls a computer, application software, etc. on behalf of the user, based on instructions given through plural media such as a keyboard, a mouse, a pen, voice and gestures, and that responds using plural media such as still images, moving images, voice and sound effects. SOLUTION: The operation of the agent is described by writing, in a rule condition part 8, descriptions of the input contents and input timings in plural media such as keyboard input, mouse input, pen input, voice input and gesture input, and by writing, in a rule conclusion part 9, output information descriptions (text display output, still image output, moving image output, voice output, sound effect output, etc.), execution command descriptions and their output timing descriptions.

Description

Detailed Description of the Invention

[0001]

[Technical field of the invention] The present invention relates to user interface technology for personal computers, portable information devices and the like. More specifically, it relates to technology for constructing an agent that controls a computer, application software, etc. on behalf of a user, based on instructions given through a plurality of media such as a keyboard, a mouse, a pen, voice and gestures, and that responds using a plurality of media such as still images, moving images, voice and sound effects.

[0002]

[Description of the related art] At present, input using a keyboard and a mouse is the mainstream in personal computers, portable information devices and the like. However, because keyboard and mouse input is cumbersome, software called agents, which operate computers and applications on the user's behalf, is attracting attention.

[0003] For example, reference (a), "P. Mayes: Agents that Reduce Work and Information Overload, Communications of the ACM, Vol. 37, No. 7, pp. 30-40, 1994", describes an agent that advises whether each of the many e-mail messages a user receives should be read or discarded. This agent watches the user's keyboard and mouse operations over the user's shoulder and learns the user's operating habits.

[0004] Reference (b), "D. Smith, A. Cypher and J. Spohrer: Kid Sim: Programming Agents Without a Programming Language, Communications of the ACM, Vol. 37, No. 7, pp. 54-67, 1994", proposes a method that uses graphical rules so that even children can program, and argues that the method can be applied to building agents.

[0005]

[Problems to be solved by the invention] However, the above techniques are specialized for the keyboard and mouse. It has been difficult to apply them to an agent that controls a computer, application software, etc. on behalf of a user based on instructions given through a plurality of media that include, in addition to the keyboard and mouse, a pen, voice and gestures, and that responds using a plurality of media such as still images, moving images, voice and sound effects.

[0006] An object of the present invention is to provide a function that lets even a user who cannot program freely construct an agent that controls a computer, application software, etc. on behalf of the user, based on instructions given through a plurality of media such as a keyboard, a mouse, a pen, voice and gestures, and that responds using a plurality of media such as still images, moving images, voice and sound effects.

[0007]

[Means for solving the problems] To achieve the above object, the invention is configured with the following means.
(1) So that even a user who cannot program can easily create an agent that reacts to multimodal input and responds with multimodal output, the operation of the agent can be described by writing, in a rule condition part, descriptions of the input contents and input timings in a plurality of media such as keyboard input, mouse input, pen input, voice input and gesture input, and by writing, in a rule conclusion part, output information descriptions such as text display output, still image output, moving image output, voice output and sound effect output, execution command descriptions, and their output timing descriptions.
(2) To enable multimodal rule condition descriptions, the input contents can be specified for each input mode, such as voice or keyboard input, on a common time axis.
(3) To enable multimodal rule conclusion part descriptions, the output contents can be specified for each output mode, such as voice or screen output, on a common time axis.
(4) Because the user decides on operations by looking at the computer's output screen, and because the result of a user operation can be described as a change in the state of that screen, the state of the computer screen can be specified in both the multimodal condition description and the multimodal output description. A computer screen description in the multimodal condition description is interpreted as a condition that becomes true at the moment that screen actually appears on the computer's output screen. A computer screen description in the multimodal output description is interpreted as the execution of commands that control the computer or operate application software so that the screen reaches that state.
(5) Even if the user does not explicitly write a multimodal rule as described above, the descriptions of the rule condition part and the rule execution part are generated automatically when the user gives, multimodally, a plurality of correct and incorrect examples for each rule condition part or rule conclusion part.
(6) Furthermore, even if the user does not explicitly give examples of operations as described above, an operation history is saved that consists of input information from a plurality of media such as the keyboard, mouse, pen, voice and gestures, output information such as still images, moving images, voice and sound effects, and the user's operations; frequently appearing patterns are detected in that history, and multimodal rules are generated automatically.
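As an illustration only, the rule structure summarized in items (1) to (3) can be sketched as a small Python data model. The names `TimedEntry`, `MultimodalRule` and `timeline`, the use of seconds as the time unit, and the sample rule are assumptions made for this sketch; the patent does not specify a data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class TimedEntry:
    """One description placed on the rule's common time axis (hypothetical type)."""
    medium: str    # e.g. "voice", "keyboard", "mouse", "screen", "command"
    time: float    # offset in seconds from the start of the rule part (assumed unit)
    content: str   # expected input (condition) or output to produce (conclusion)


@dataclass
class MultimodalRule:
    """A rule is a pair: a condition part and a conclusion (execution) part."""
    name: str
    condition: List[TimedEntry] = field(default_factory=list)
    conclusion: List[TimedEntry] = field(default_factory=list)

    def timeline(self, part: str) -> List[TimedEntry]:
        """Return the entries of one part ordered along the common time axis."""
        entries = self.condition if part == "condition" else self.conclusion
        return sorted(entries, key=lambda e: e.time)


# Illustrative rule: a spoken request plus a click triggers a spoken reply and a command.
rule = MultimodalRule(
    name="delete-on-request",
    condition=[
        TimedEntry("mouse", 0.5, "click"),
        TimedEntry("voice", 0.0, "delete this"),
    ],
    conclusion=[
        TimedEntry("voice", 0.0, "Deleting the file."),
        TimedEntry("command", 1.0, "Delete File01"),
    ],
)
```

Keeping every medium on one shared time axis is what lets a single rule say "this speech, then that click" without a separate synchronization mechanism per medium.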

[0008]

[Embodiments of the invention] Embodiments of the present invention are described below with reference to the drawings.

[0009] FIG. 1 shows a first embodiment of the present invention.

[0010] As shown in FIG. 1, this embodiment concerns an agent 2 residing in a personal computer that comprises a monitor 1, a speaker 3, a camera 4, a microphone 5 and so on. Based on instructions that the user in front of the personal computer gives through a plurality of media such as the keyboard, mouse, pen, voice and gestures, the agent 2 controls the personal computer system itself and the application software running on it on behalf of the user, and responds using a plurality of media such as still images, moving images, voice and sound effects.

[0011] The feature of this system is that it has a function that lets the user freely add to and correct the behavior of this agent software without programming, that is, an intelligent agent construction support function. Its main window 7 is displayed in the monitor 6. The main window 7 displays a multimodal rule that defines the behavior of the agent, and consists of a multimodal rule condition part 8 and a multimodal rule execution part 9.

[0012] The intelligent agent construction support function is described with reference to FIG. 2 and the subsequent figures. Clicking the agent 2 with the mouse brings up a menu 21 for that agent. The menu 21 displays three submenus: multimodal rule creation 22, start saving operation history 23, and learning from operation history 24.

[0013] The overall operation flow is described with reference to FIG. 3. First, it is determined whether a menu start operation, that is, a mouse click on the intelligent agent 2 displayed on the monitor 1, has occurred (31). If so, the menu shown in FIG. 2 is displayed (32).

[0014] When the user selects "Create multimodal rule" from the menu (33), the multimodal rule editor shown in FIG. 5 is started (34). How to use the multimodal rule editor to create a multimodal rule that defines the agent's behavior is described later with reference to FIG. 5.

[0015] When the user selects "Start saving operation history" from the menu (35), the history storage switch is set to on (36).

[0016] When the user selects "Learn from operation history" from the menu (37), the history file is analyzed and multimodal rules are generated automatically (38).

[0017] Furthermore, when the history storage switch is on (39), the multimodal history needed for automatic generation of multimodal rules is written to the history file (40). The multimodal history information needed for automatic rule generation is described later with reference to FIG. 9.
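The flow of FIG. 3 (steps 31 to 40) can be sketched roughly as one dispatch step per user event. This is a minimal sketch under stated assumptions: the class name `AgentSupport`, the menu identifiers, the in-memory list standing in for the history file, and the choice to record the event that arrives in the same step as the switch-on are all hypothetical.

```python
from typing import List, Optional


class AgentSupport:
    """Hypothetical model of the menu dispatch and history switch of FIG. 3."""

    def __init__(self) -> None:
        self.history_switch = False        # the "history storage switch" (step 36)
        self.history_file: List[str] = []  # stands in for the history file
        self.log: List[str] = []           # records which tools were launched

    def step(self, menu_choice: Optional[str], event: str) -> None:
        """Process one menu selection (or None) together with one user event."""
        if menu_choice == "create_rule":            # steps 33-34
            self.log.append("open rule editor")
        elif menu_choice == "start_history":        # steps 35-36
            self.history_switch = True
        elif menu_choice == "learn_from_history":   # steps 37-38
            self.log.append(f"analyze {len(self.history_file)} records")
        if self.history_switch:                     # steps 39-40
            self.history_file.append(event)


support = AgentSupport()
support.step(None, "key:a")               # switch is off, nothing is recorded
support.step("start_history", "click")    # switch turns on, recording begins
support.step(None, "voice:hello")
support.step("learn_from_history", "key:b")
```

Note that history recording is checked on every step, independently of the menu choice, which mirrors the placement of steps 39 and 40 after the three menu branches.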

[0018] As described above, when the user selects "Create multimodal rule" from the menu, the multimodal rule editor starts and the window shown in FIG. 5 is displayed. By operating this window, the user can create multimodal rules directly or have them created automatically from examples of the user's operations.

[0019] First, the operation when the window shown in FIG. 5 appears is described using the operation flow shown in FIG. 4. The multimodal rule editor window 51 displays the rule name 52 and the rule's definition information, namely the rule condition part 55 and the rule conclusion part 58. The learning start button 53 and learning end button 54 control learning of one whole rule; the learning start button 56 and learning end button 57 control learning of the rule's condition part; and the learning start button 59 and learning end button 60 control learning of the rule's execution part.

[0020] While the multimodal rule editor window 51 is displayed, if the rule learning button 53 is pressed (41), the operation history is stored in the format illustrated in FIG. 9 until the learning end button 54 is pressed, and the result is analyzed to generate one rule (42). If the rule condition part learning button 56 is pressed (43), the operation history is stored in the same format until the learning end button 57 is pressed, and the result is analyzed to generate one rule condition part (44). If the rule execution part learning button 59 is pressed (45), the operation history is stored in the same format until the learning end button 60 is pressed, and the result is analyzed to generate one rule execution part (46). The storage format of the operation history and the learning method are described later with reference to FIG. 9; the method 47 by which users define a multimodal rule themselves is described next.

[0021] FIG. 6 is an enlarged view of the rule condition part 55 of FIG. 5. Seven input media are provided: "sound" 61, "voice" 62, "keyboard" 63, "mouse" 64, "pen" 65, "gesture" 66 and "screen" 67. "Sound" 61 denotes non-verbal sound effects, whereas "voice" 62 denotes sound that can be put into words. "Screen" 67 denotes a portion of the screen displayed on the personal computer's monitor at that time; this is described later with reference to FIG. 8.

[0022] An editor window 68 is provided for each of the seven input media. The editor windows behave differently for each medium, but each has a time axis 69. A triangle mark 70 is used to indicate a start time explicitly.

[0023] In the editor window for "sound", the user selects the time at which the sound should occur and actually produces the sound so that it is captured by the microphone; its waveform is then displayed.

[0024] In the editor window for "voice", the user selects the time at which the speech should occur and enters its text. The start time may be indicated explicitly with the triangle mark 70.

[0025] In the editor window for "keyboard", the user enters the text that should be typed on the keyboard.

[0026] In the editor window for "mouse", the clicks that should be made with the mouse are entered as triangle marks that indicate their coordinates and times. Two adjacent triangle marks represent a double click.

[0027] In the editor window for "pen", the user enters the stroke data that should be input with the pen.

[0028] In the editor window for "gesture", the user selects the time at which the gesture should be input and actually performs the gesture so that it is captured by the camera; the captured gesture is then displayed.

[0029] In the editor window for "screen", the user cuts out from the screen, and enters, the portion of the screen that should be displayed on the personal computer's screen.

[0030] By writing a description for each input medium in this way, the condition part of a multimodal rule can be defined. When the inputs described for the media specified in the condition part are each observed at their specified times, the multimodal rule is judged to be executable.
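The executability test just described can be sketched as a timed matching check. The sketch below assumes that events are `(medium, time, content)` tuples, that condition times are offsets which may be aligned with any observed event time, and that a fixed tolerance (0.5 seconds here) decides whether two times agree. None of these details, including the function name `is_executable`, come from the patent.

```python
from typing import List, Tuple

Event = Tuple[str, float, str]  # (medium, time in seconds, content), an assumed encoding


def is_executable(condition: List[Event], observed: List[Event],
                  tolerance: float = 0.5) -> bool:
    """Return True if every condition entry was observed at its specified time.

    The condition's time axis is relative, so each observed event time is tried
    as a candidate alignment for the earliest condition entry.
    """
    if not condition or not observed:
        return False
    rule_start = min(t for _, t, _ in condition)
    for _, anchor, _ in observed:
        offset = anchor - rule_start  # shift the rule's axis onto observed time
        if all(
            any(m == om and c == oc and abs((t + offset) - ot) <= tolerance
                for om, ot, oc in observed)
            for m, t, c in condition
        ):
            return True
    return False


condition = [("voice", 0.0, "delete this"), ("mouse", 0.5, "click")]
on_time = [("keyboard", 9.0, "x"), ("voice", 10.0, "delete this"), ("mouse", 10.7, "click")]
too_late = [("voice", 10.0, "delete this"), ("mouse", 12.0, "click")]
```

Here `is_executable(condition, on_time)` succeeds because the click follows the speech by 0.7 seconds, within tolerance of the specified 0.5 seconds, while `too_late` fails because the click arrives 2 seconds after the speech.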

[0031] Next, how to define the execution part of a multimodal rule is described with reference to FIG. 7. FIG. 7 is an enlarged view of the rule execution part 58 of FIG. 5. Six output media are provided: "sound" 71, "voice" 72, "text display" 73, "animation" 74, "command execution" 75 and "screen" 76. As in the input part, "sound" denotes non-verbal sound effects, whereas "voice" denotes sound that can be put into words.

[0032] An editor window 79 is provided for each of the six output media. The editor windows behave differently for each medium but, as in the input part, each has a time axis 77 and uses a triangle mark 78 to indicate a start time explicitly.

[0033] In the editor window for "sound", the user selects the time at which the sound should occur and actually produces the sound so that it is captured by the microphone; its waveform is then displayed.

[0034] In the editor window for "voice", the user selects the time at which the speech should occur and enters its text. The start time may be indicated explicitly with the triangle mark 78.

[0035] In the editor window for "text display", the user enters the text that should be displayed on the monitor as a character string.

[0036] In the editor window for "animation", the user enters the animation that should be displayed on the monitor as a moving image.

[0037] In the editor window for "command execution", the command to be executed on the personal computer is written explicitly. In the example of FIG. 7, a command (Delete File01) that deletes a file (File01) is written.

[0038] In the editor window for "screen", the user cuts out from the screen, and enters, the portion of the screen that should be displayed on the personal computer's screen after this rule has been executed. How this is interpreted is described later with reference to FIG. 8.

[0039] By writing a description for each output medium in this way, the execution part of a multimodal rule can be defined.

[0040] When a multimodal rule is executed, the content written for each medium specified in the rule's execution part is output on that medium at the specified time. The exceptions are "command execution" 75 and "screen" 76. For "command execution" 75, the written command itself is executed at the specified time. For "screen" 76, a sequence of commands that brings the screen into the described state is inferred and executed in order at the specified time.
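The three cases of rule execution (plain output, explicit command, and screen target) can be sketched as a small planner that turns an execution part into a time-ordered action list. The tuple encoding, the function name `plan_actions` and especially the `infer_commands` callback are hypothetical; the patent states only that a command sequence is inferred for a "screen" entry, not how.

```python
from typing import Callable, List, Tuple

Entry = Tuple[str, float, str]    # (medium, time offset, content), an assumed encoding
Action = Tuple[float, str, str]   # (time offset, kind of action, payload)


def plan_actions(conclusion: List[Entry],
                 infer_commands: Callable[[str], List[str]]) -> List[Action]:
    """Expand an execution part into actions ordered along the time axis."""
    actions: List[Action] = []
    for medium, time, content in sorted(conclusion, key=lambda e: e[1]):
        if medium == "command":
            # Exception 1: the written command itself is executed.
            actions.append((time, "run", content))
        elif medium == "screen":
            # Exception 2: commands that reach the target screen state are
            # inferred (by the caller-supplied stub) and run in order.
            for command in infer_commands(content):
                actions.append((time, "run", command))
        else:
            # Ordinary media: emit the content on that medium.
            actions.append((time, "output:" + medium, content))
    return actions


def stub_infer(target: str) -> List[str]:
    """Placeholder planner used only for this example."""
    return [f"step1 -> {target}", f"step2 -> {target}"]


conclusion = [
    ("screen", 2.0, "fax-in-folder-A"),
    ("voice", 0.0, "Filing the fax."),
    ("command", 1.0, "Delete File01"),
]
plan = plan_actions(conclusion, stub_infer)
```

Passing the inference step in as a callback keeps the hard part, working out which commands produce a given screen state, separate from the simple scheduling logic.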

[0041] FIG. 8 shows an example of a multimodal rule that consists only of "screen" definitions. The screen description 81 in the condition part shows the changed state of an icon that appears on the personal computer's desktop when a fax is received; that is, icon 81 appears when a fax arrives at the personal computer. The description 82 in the rule execution part shows the state of the screen when that file is placed on folder A, that is, when an operation to store it is performed. This multimodal rule therefore becomes executable when a fax is received, and a command sequence that saves the received file in folder A is executed.
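A "screen" condition such as the fax icon of FIG. 8 becomes true when the stored screen fragment appears somewhere on the current screen. As a rough sketch, this can be modeled as searching a pixel grid for an exact copy of the fragment. Exact matching on small integer grids is an assumption made here for brevity; the patent does not say how the comparison is done, and a practical system would likely need a tolerant image-matching method.

```python
from typing import Sequence

Grid = Sequence[Sequence[int]]  # rows of pixel values (assumed representation)


def fragment_appears(screen: Grid, fragment: Grid) -> bool:
    """Return True if `fragment` occurs as an exact sub-rectangle of `screen`."""
    screen_h, screen_w = len(screen), len(screen[0])
    frag_h, frag_w = len(fragment), len(fragment[0])
    # Slide the fragment over every position where it fits entirely.
    for top in range(screen_h - frag_h + 1):
        for left in range(screen_w - frag_w + 1):
            if all(
                list(screen[top + row][left:left + frag_w]) == list(fragment[row])
                for row in range(frag_h)
            ):
                return True
    return False


fax_icon = [[1, 2],
            [3, 4]]
screen_before = [[0, 0, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]]
screen_after = [[0, 0, 0, 0],
                [0, 1, 2, 0],
                [0, 3, 4, 0]]
```

With these grids the condition is false for `screen_before` and becomes true for `screen_after`, which corresponds to the moment the fax icon appears on the desktop.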

[0042] FIG. 9 shows the storage format of the history information used for learning a whole rule, a rule condition part, and a rule conclusion part. The storage format 91 merges the definition formats of the rule condition part and the rule conclusion part; its items are "sound" 92, "voice" 93, "keyboard" 94, "mouse" 95, "pen" 96, "gesture" 97 and "screen" 98. Rule condition parts and rule conclusion parts can be generated directly from these records. Common operations are extracted using a conventionally known clustering method.
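The patent leaves the pattern extraction to a conventionally known clustering method and gives no algorithm. As a deliberately simpler stand-in (not clustering), the sketch below counts how often each run of consecutive history records repeats and keeps the frequent ones as rule candidates. The record encoding, the function name `frequent_sequences` and the thresholds are assumptions made for this illustration.

```python
from collections import Counter
from typing import List, Tuple

Record = Tuple[str, str]  # (medium, content); timestamps omitted for brevity


def frequent_sequences(history: List[Record], length: int = 2,
                       min_count: int = 2) -> List[Tuple[Record, ...]]:
    """Return runs of `length` consecutive records seen at least `min_count` times."""
    windows = Counter(
        tuple(history[i:i + length])
        for i in range(len(history) - length + 1)
    )
    return [sequence for sequence, count in windows.items() if count >= min_count]


history = [
    ("screen", "fax icon"),
    ("mouse", "drag to folder A"),
    ("keyboard", "hello"),
    ("screen", "fax icon"),
    ("mouse", "drag to folder A"),
]
candidates = frequent_sequences(history)
```

In this example the only repeated pair is "fax icon appears" followed by "drag to folder A", so its first record could seed a rule condition part and its second a rule conclusion part.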

[0043]

[Effect of the invention] As described above, according to the embodiment of the present invention, users themselves can define, without programming, an intelligent agent that controls a computer, application software, etc. on behalf of the user, based on the user's instructions given through a plurality of media such as a keyboard, a mouse, a pen, voice and gestures, and that responds using a plurality of media such as still images, moving images, voice and sound effects.

[Brief description of drawings]

FIG. 1 is a diagram showing a first embodiment of the present invention.

FIG. 2 is a diagram showing the menu for the agent.

FIG. 3 is an explanatory diagram showing the overall operation flow.

FIG. 4 is a diagram explaining the operation flow of the multimodal rule window.

FIG. 5 is a diagram showing the whole multimodal rule editor.

FIG. 6 is a diagram explaining the condition part of the multimodal rule editor.

FIG. 7 is a diagram explaining the execution part of the multimodal rule editor.

FIG. 8 is a diagram showing an example of a multimodal rule consisting only of screen descriptions.

FIG. 9 is a diagram showing an example of multimodal history information.

[Explanation of symbols]

1 ... monitor of personal computer, 2 ... agent, 3 ... speaker, 4 ... camera, 5 ... microphone, 7, 51 ... multimodal rule window, 8, 55 ... multimodal rule condition part, 9, 58 ... multimodal rule execution part, 21 ... agent menu, 81, 82 ... example of a multimodal rule consisting only of screen definitions, 91 ... multimodal history storage information.

Continued from the front page

(72) Inventor: Taminori Tomita, c/o Systems Development Laboratory, Hitachi, Ltd., 1099 Ozenji, Asao-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Hirokazu Aoshima, c/o Systems Development Laboratory, Hitachi, Ltd., 1099 Ozenji, Asao-ku, Kawasaki-shi, Kanagawa

Claims

[Claims]

1. In a computer system having an agent that controls a computer, application software, etc. on behalf of a user based on the user's instructions given through a plurality of media such as a keyboard, a mouse, a pen, voice and gestures, and that responds using a plurality of media such as still images, moving images, voice and sound effects, an intelligent agent construction support system characterized in that the behavior of the agent is defined by a set of multimodal rules, each composed of a pair of: a multimodal condition description consisting of descriptions of input contents and input timings in a plurality of media such as keyboard input, mouse input, pen input, voice input and gesture input; and a multimodal execution part description that indicates what is to be executed or output in response when the condition description is satisfied, and that consists of output information descriptions such as text display output, still image output, moving image output, voice output and sound effect output, execution command descriptions, and their output timing descriptions.

2. The intelligent agent construction support system according to claim 1, wherein the multimodal condition description allows input contents to be specified for each input mode, such as voice or keyboard input, on a common time axis.

3. The intelligent agent construction support system according to claim 1, wherein the multimodal execution description allows output contents to be specified for each output mode, such as voice or screen output, on a common time axis.

4. The intelligent agent construction support system according to claim 1, wherein the multimodal condition description and the multimodal output description include a specification of the state of the computer screen; a computer screen description in the multimodal condition description is interpreted as a condition that becomes true at the moment that screen actually appears on the computer's output screen; and a computer screen description in the multimodal output description is interpreted as the execution of commands that control the computer or operate application software so that the screen reaches that state.

5. The intelligent agent construction support system according to claim 1, wherein the multimodal condition part description and the multimodal execution part description are generated automatically when the user gives a plurality of correct and incorrect examples multimodally.

6. The intelligent agent construction support system according to claim 1, wherein multimodal rules are generated automatically by saving and analyzing a multimodal operation history of the user that consists of input information from a plurality of media such as a keyboard, a mouse, a pen, voice and gestures, output information such as still images, moving images, voice and sound effects, and the user's operations.