JP3170103B2

JP3170103B2 - Object pointing method by compound form

Info

Publication number: JP3170103B2
Application number: JP14507193A
Authority: JP
Inventors: ハル安藤; 義典北原
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1993-06-16
Filing date: 1993-06-16
Publication date: 2001-05-28
Anticipated expiration: 2016-05-28
Also published as: JPH075977A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to user interfaces for systems such as graphic editing systems installed on office automation (OA) equipment, including personal computers, workstations, and word processors. In particular, it relates to a user-friendly method of pointing at an object by compound form, that is, by combining several input modes.

[0002]

[Prior Art] Known conventional object pointing methods either designate an object by voice alone, as in Kitahara et al., "A Study of Methods for Accepting Colloquial Sentences in an Information Retrieval System Using Voice Input" (Proceedings of the Spring Meeting of the Acoustical Society of Japan, 3-5-7, March 1991), or recognize an object from an instruction given with a pointing device such as a mouse (a mouse event) inside the display area of the object.

[0003] For designating a figure, other known methods include uttering a demonstrative pronoun accompanied by a gesture, as disclosed in B. A. Bolt, "Put that there" (Computer Graphics, 14, 3, 1980), pp. 262-270, and assigning names in advance to the objects drawn on the screen before graphic editing and using those names thereafter.

[0004]

[Problems to Be Solved by the Invention] With the above prior art, when the objects to be designated are scattered apart on the screen, pointing at an object with a finger or a pointing device while uttering "this" or "this square" is sufficient to identify the object. However, when the objects on the screen are close to one another or overlap, it can be difficult to identify the object uniquely with a finger or the like, and the instruction may have to be supplemented with a more complex voice instruction. A more complex voice instruction, in turn, is more likely to be misrecognized.

[0005] An object of the present invention is to let the user input information naturally without being conscious of which media are being used, to reduce misrecognition, a major problem in speech recognition, by using a plurality of input modes, and thereby to provide an object pointing method that is natural and easy to use in graphic editing, image editing, and the like.

[0006]

[Means for Solving the Problems] To solve the above problems, the object pointing method by compound form according to the present invention combines a pointing instruction given with a finger or a pen and an instruction given by voice when one of a plurality of display objects shown on the screen of a display device is designated as the object. The method obtains the mutual positional relationships among the plurality of display objects and creates in advance a display object knowledge table representing those relationships; checks the speech recognition result for a spoken instruction that includes a positional relationship against the contents of the display object knowledge table to obtain the object candidates designated by the voice; obtains position information based on the pointing instruction; and identifies the object on the basis of that position information and the object candidates.

[0007] A compound form input device according to the present invention comprises information display means for displaying information, pointing means for indicating a position on the display screen of the information display means with a finger or a pen, and speech recognition means for recognizing instructions given by voice, and designates one of a plurality of display objects on the display screen as the object by combining a pointing instruction with a voice instruction. The device further comprises: means for storing a word dictionary that includes at least words representing the names of the plurality of display objects shown on the display screen and words representing their mutual positional relationships; means for obtaining the mutual positional relationships among the plurality of display objects; means for storing a display object knowledge table representing those positional relationships; means for obtaining position information based on the pointing instruction; and object identifying means that checks the display object name and positional relationship recognized by the speech recognition means against the display object knowledge table to obtain object candidates and then identifies the object from among those candidates in light of the position information.

[0008]

[Operation] The device holds a knowledge table that stores the mutual positional relationships among the plurality of display objects shown on the screen. When the user designates an object by voice and a pointing gesture, both the voice information and the pointing gesture are captured. For the voice input, speech recognition is performed using, for example, morphological analysis and syntactic analysis of the voice information, and the name of the target figure and its positional relationship to other figures are extracted from the analysis result. Matching this speech recognition result against the knowledge table narrows down the object candidates. The object is then uniquely identified from the candidate group in light of the position coordinates entered by the pointing gesture.

[0009] Even an object candidate for which the voice information analysis means computed a high likelihood may result from a speech recognition error. In that case the candidate is excluded in light of the range indicated by the pointing gesture, so that an object recognition error caused by a speech recognition error can be prevented.

[0010] When a single operation yields a plurality of object candidates and the object must be narrowed down further, the object candidate enlarging means displays those candidates enlarged with their positional relationships preserved, and the user can narrow down the candidates by entering voice, a pointing gesture, or the like again.

[0011]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings.

[0012] FIG. 1 is a block diagram showing one embodiment of the present invention. The description here assumes a graphic editing system. The present invention is not limited to such a system, however, and can be applied to image processing systems in general, such as CAD systems.

[0013] The system of FIG. 1 comprises an information processing device 1, a main storage device 2, a touch panel 5, a control device 3 for the touch panel 5, a display 4, a display control device 6 that controls the display 4, a microphone 8, an A/D converter 7 that converts the input signal from the microphone 8 into a digital signal, and a disk 9 serving as an external storage device that stores various programs and data.

[0014] The system program 11, graphic editing program 12, sound recognition program 13, pointing coordinate reading program 14, object pointing program 15, sound standard pattern data 16, word dictionary 17, and display object knowledge table creation program 18 on the disk 9 are loaded into the main storage device 2 when the system starts up.

[0015] As shown in FIG. 5, the sound recognition program 13 consists of a voice input program 131 and a feature extraction program 132.

[0016] As shown in FIG. 10, the object pointing program 15 consists of a pointing area recognition program 151, a dictionary matching program 152, and an object candidate enlarging program 153.

[0017] As shown in FIG. 11, the word dictionary 17, which is used when the dictionary matching program 152 runs, consists of word numbers 111, a representative word 112 for the mutually synonymous words belonging to the slot of each word number, and the synonyms 113. For example, the synonyms 「円」 (en) and 「まる」 (maru), both meaning "circle", share the representative word 「円」 ("circle"), and the synonyms 「内側」, 「内」, 「中」, and 「中側」, all meaning "inside", share the representative word 「内側」 ("inside"). For words that express the mutual positional relationships of display objects defined in the display object knowledge table described later (FIG. 4), such as "inside" and "outside", the word number 111 is set equal to the relationship description number stored in the display object knowledge table.
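The synonym lookup described above can be sketched in Python as follows. This is only an illustration, not the patent's implementation: the dict layout and the names `WORD_DICTIONARY`, `build_synonym_index`, and `normalize` are assumptions, while the word numbers 701 ("circle") and 8042 ("inside") are the ones that appear later in the description.

```python
# Assumed in-memory layout for the word dictionary 17 (FIG. 11).
# Keys are word numbers; 701 and 8042 are the numbers used in the description.
WORD_DICTIONARY = {
    701: {"representative": "円", "synonyms": ["円", "まる"]},
    8042: {"representative": "内側", "synonyms": ["内側", "内", "中", "中側"]},
}


def build_synonym_index(dictionary):
    """Map every synonym to (representative word, word number)."""
    index = {}
    for word_number, entry in dictionary.items():
        for synonym in entry["synonyms"]:
            index[synonym] = (entry["representative"], word_number)
    return index


def normalize(word, index):
    """Return (representative word, word number), or None for unknown words."""
    return index.get(word)


SYNONYM_INDEX = build_synonym_index(WORD_DICTIONARY)
```

Inverting the dictionary once keeps each lookup to a single dict access, so every synonym of "inside" resolves to the same relationship number that the knowledge table stores.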

[0018] FIG. 19 shows the various buffer memories on the main storage device 2 that are used in the processing of this embodiment. Each buffer memory is described together with the processing that uses it.

[0019] FIG. 2 shows an example of a graphic editing screen displayed on the display 4 by the graphic editing program 12 loaded into the main storage device 2. In this example the screen is 1000 wide and 700 high. In the figure, six circles, one rectangle, and one triangle are drawn in graphics mode on the basis of the circle drawing table, rectangle drawing table, and triangle drawing table stored in the main storage device 2 by the graphic editing program 12.

[0020] As shown in FIG. 3, the circle drawing table TB1 consists of a table number E = 801, a circle number 100, a center x coordinate 101, a center y coordinate 102, a radius 103, and a color number 104. The rectangle drawing table TB2 consists of a table number E = 802, a rectangle number 200, the x coordinate 201 and y coordinate 202 of the upper-left vertex, a vertical length 203, a horizontal length 204, and a color number 205. The triangle drawing table TB3 consists of a table number E = 803, a triangle number 300, the x coordinate 3011 and y coordinate 3012 of vertex 1, the x coordinate 3021 and y coordinate 3022 of vertex 2, the x coordinate 3031 and y coordinate 3032 of vertex 3, and a color number 304. The color number is stored as "1001" when the figure is white, "1002" when it is black, "1003" when it is red, and so on.

In the example circle drawing table TB1 in the figure, six circles are drawn: circle C1 with center (280, 340), radius 50, white; circle C2 with center (390, 230), radius 50, black; circle C3 with center (390, 340), radius 20, white; circle C4 with center (390, 340), radius 10, white; circle C5 with center (530, 270), radius 40, white; and circle C6 with center (530, 270), radius 60, white. In the example rectangle drawing table TB2, rectangle R1 is drawn with its upper-left vertex at (140, 460), a vertical length of 120, a horizontal length of 250, and the color white. Likewise, in the example triangle drawing table TB3, triangle T1 is drawn with vertices (620, 520), (550, 660), and (680, 630) and the color white. The origin of the XY coordinates on the screen is the upper-left corner of the screen.
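As an illustration, the three drawing tables and their example rows could be held in Python as shown below. The class and field names, and the helper `fits_on_screen`, are assumptions made for this sketch; the coordinates, sizes, and color numbers are the ones given in the description.

```python
from dataclasses import dataclass

WHITE, BLACK, RED = 1001, 1002, 1003  # color numbers from the description


@dataclass(frozen=True)
class Circle:
    number: str
    cx: int
    cy: int
    radius: int
    color: int


@dataclass(frozen=True)
class Rectangle:
    number: str
    left: int    # x of the upper-left vertex
    top: int     # y of the upper-left vertex
    height: int  # vertical length
    width: int   # horizontal length
    color: int


@dataclass(frozen=True)
class Triangle:
    number: str
    vertices: tuple
    color: int


TB1 = [
    Circle("C1", 280, 340, 50, WHITE),
    Circle("C2", 390, 230, 50, BLACK),
    Circle("C3", 390, 340, 20, WHITE),
    Circle("C4", 390, 340, 10, WHITE),
    Circle("C5", 530, 270, 40, WHITE),
    Circle("C6", 530, 270, 60, WHITE),
]
TB2 = [Rectangle("R1", 140, 460, 120, 250, WHITE)]
TB3 = [Triangle("T1", ((620, 520), (550, 660), (680, 630)), WHITE)]


def fits_on_screen(circle, width=1000, height=700):
    """True if the whole circle lies inside the width x height screen."""
    return (circle.radius <= circle.cx <= width - circle.radius
            and circle.radius <= circle.cy <= height - circle.radius)
```

With the origin at the upper-left corner, C3 and C4 share a center, as do C5 and C6, which is what makes pointing alone ambiguous for these figures.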

[0021] At this point, the numbers of circles, rectangles, and triangles recorded in the drawing tables are assumed to be stored in buffer memories F1, F2, and F3 on the main storage device 2. In the example above, F1 holds "6", F2 holds "1", and F3 holds "1". The value F1 + F2 + F3 is stored in buffer memory W on the main storage device 2; in this embodiment W = 8. The objects displayed on the screen are also given serial numbers from 1 to W in the order circles, rectangles, triangles. For example, the figure with serial number 1 is circle "C1", and the figure with serial number 7 is rectangle "R1".
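The counting and serial numbering can be checked with a few lines of Python. The function name `assign_serial_numbers` and the list variables are assumptions for this sketch; F1, F2, F3, and W follow the buffer names in the description.

```python
def assign_serial_numbers(circles, rectangles, triangles):
    """Number all figures from 1 in the order circles, rectangles, triangles."""
    ordered = list(circles) + list(rectangles) + list(triangles)
    return {serial: name for serial, name in enumerate(ordered, start=1)}


circles = ["C1", "C2", "C3", "C4", "C5", "C6"]
rectangles = ["R1"]
triangles = ["T1"]

F1, F2, F3 = len(circles), len(rectangles), len(triangles)
W = F1 + F2 + F3  # total number of displayed figures
serials = assign_serial_numbers(circles, rectangles, triangles)
```

With six circles, serial number 7 falls on the first rectangle, which agrees with the example in the text.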

[0022] A display object knowledge table that stores the positional relationships between display objects is also provided on the main storage device 2. As shown for example in table NT of FIG. 4, this table defines the positional relationship between any two figures A and B with relationship description numbers as follows: "6050" when figure A is to the right of figure B, "6040" when it is to the left, "7040" when figure A intersects figure B, "8042" when figure A is contained in figure B (is contained), and "8041" when figure A contains figure B (contains). Although the figure shows only one relationship description number for each pair of figures A and B, a pair may have several. For example, when figure A is to the right of figure B and also intersects it, the two numbers "6050" and "7040" are stored. The display object knowledge table NT can store not only positional relationships between display objects but also attributes of display objects, such as the relationship automobile ⊃ window (the window of an automobile).
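One possible in-memory form of the display object knowledge table NT is sketched below. The `KnowledgeTable` class and its method names are assumptions, and the pair ("A", "B") is a made-up example of a pair carrying two numbers; only the five relationship description numbers come from the description.

```python
# Relationship description numbers from the description.
RIGHT_OF, LEFT_OF, INTERSECTS, CONTAINED_IN, CONTAINS = 6050, 6040, 7040, 8042, 8041


class KnowledgeTable:
    """Stores a set of relationship numbers for each ordered pair (A, B)."""

    def __init__(self):
        self._relations = {}

    def add(self, figure_a, figure_b, code):
        self._relations.setdefault((figure_a, figure_b), set()).add(code)

    def relations(self, figure_a, figure_b):
        return self._relations.get((figure_a, figure_b), set())

    def figures_with(self, code):
        """Figures A that hold the given relationship to at least one figure B."""
        return sorted({a for (a, _b), codes in self._relations.items() if code in codes})


nt = KnowledgeTable()
nt.add("C4", "C3", CONTAINED_IN)   # C4 lies inside C3
nt.add("C3", "C4", CONTAINS)
nt.add("A", "B", RIGHT_OF)         # hypothetical pair: right of and intersecting
nt.add("A", "B", INTERSECTS)
```

Using a set per pair covers the case mentioned in the text where one pair holds several relationship numbers at once.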

[0023] FIG. 18 shows the processing flow of the display object knowledge table creation program. This processing is executed when a figure is entered and created on the screen.

[0024] First, for each figure, the maximum value Xmx1 and minimum value Xmn1 of the X coordinates of all its contour points are extracted (s181). Next, the boundary points whose X coordinate Xb lies in the range Xmn1 < Xb < Xmx1 are extracted, and the minimum value Ymn1(Xb) and maximum value Ymx1(Xb) of their Y coordinates are obtained (s182). Then, for every combination of displayed figures, the inclusion relationship (including intersection) is judged by considering whether the two figures share coordinates within the range Xmn1 < Xb < Xmx1, Ymn1(Xb) < Yb < Ymx1(Xb) of each figure (s183). Further, for each figure the mean Xni of its maximum and minimum X values and the mean Yni of its maximum and minimum Y values are calculated (s184), and (Xni, Yni) is compared across all displayed figures to extract their mutual positional relationships (s185). The results obtained in this way are stored in the display object knowledge table NT of FIG. 4.
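The sketch below builds such a table for the six example circles. It is a simplification under stated assumptions: instead of comparing contour points as in steps s181 to s183, it uses the standard circle-to-circle test on center distance and radii, and it records only left/right (not above/below) from the extent midpoints of step s184. The function names are hypothetical.

```python
import math
from itertools import permutations

RIGHT_OF, LEFT_OF, INTERSECTS, CONTAINED_IN, CONTAINS = 6050, 6040, 7040, 8042, 8041

# Circles C1 to C6 from the example: name -> (center x, center y, radius).
CIRCLES = {
    "C1": (280, 340, 50), "C2": (390, 230, 50), "C3": (390, 340, 20),
    "C4": (390, 340, 10), "C5": (530, 270, 40), "C6": (530, 270, 60),
}


def extent_center_x(circle):
    """Mean of the minimum and maximum X of the figure (Xni in step s184)."""
    cx, _cy, r = circle
    return ((cx - r) + (cx + r)) / 2


def relation_codes(a, b):
    """Relationship numbers of circle a with respect to circle b."""
    (ax, ay, ar), (bx, by, br) = a, b
    distance = math.hypot(ax - bx, ay - by)
    codes = set()
    if ar < br and distance + ar <= br:
        codes.add(CONTAINED_IN)      # a lies entirely inside b
    elif br < ar and distance + br <= ar:
        codes.add(CONTAINS)          # b lies entirely inside a
    elif distance < ar + br:
        codes.add(INTERSECTS)        # outlines overlap
    if extent_center_x(a) > extent_center_x(b):
        codes.add(RIGHT_OF)
    elif extent_center_x(a) < extent_center_x(b):
        codes.add(LEFT_OF)
    return codes


def build_knowledge_table(figures):
    """Evaluate every ordered pair (A, B) of distinct figures."""
    return {(na, nb): relation_codes(figures[na], figures[nb])
            for na, nb in permutations(figures, 2)}


NT = build_knowledge_table(CIRCLES)
```

For this data the only figures marked "8042" (contained) are C4, inside C3, and C5, inside C6, which matches the candidates that the description later extracts for "the inner circle".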

[0025] Although the processing of FIG. 18 handles all figures at once, it is also possible, each time a new figure is added, to obtain only the positional relationships between the added figure and the existing figures.

[0026] FIG. 14 shows the flow of processing after the graphic editing program 12 is started.

[0027] Suppose the user designates one object among the objects displayed on the screen. First, the information processing device 1 starts the sound recognition program 13 on the main storage device 2 (s301) and then the pointing coordinate reading program 14 (s302). When the sound recognition program 13 starts, the voice input program 131 (FIG. 5) is started first. The user designates the object by pointing on the touch panel 5 with a finger, pen, or the like (FIG. 6), or by voice alone using the microphone 8 (example utterance: "rectangle") (FIG. 7) (s303). When pointing input alone or voice input alone is not sufficient, the user designates the object on the touch panel 5 while simultaneously designating it by voice through the microphone 8 ("the inner circle") (FIG. 8). When voice input has been made, "1" is written to the voice input identification buffer memory A on the main storage device 2; when no voice input has been made, "0" is written to buffer memory A. When pointing has been entered, "1" is written to the pointing input identification buffer memory B on the main storage device 2; when no pointing has been entered, "0" is written to buffer memory B.

[0028] First, the case in which voice input and pointing input are made simultaneously, as in FIG. 8, is described. In this case, when the user's voice is input (s101), the voice input program 131 causes the voice signal to be taken into the A/D converter 7, converted into a digital signal (s102), and sent to the main storage device 2. The feature extraction program 132 (FIG. 5) is then started and converts the digital signal, at a frame period of 10 ms, into a time series of feature vectors, for example the LPC cepstrum coefficients described in Saito and Nakata, "Fundamentals of Speech Information Processing" (Ohmsha, 1981) (s103). The frame period is not limited to 10 ms and can be set arbitrarily, for example to 20 ms or 30 ms. Nor are the feature vectors limited to LPC cepstrum coefficients; band-pass filter outputs or the like can also be used.
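The framing step can be illustrated as follows. This sketch does not compute LPC cepstrum coefficients; it substitutes a per-frame log energy as a stand-in feature so that the example stays short. The 8000 Hz sampling rate, the non-overlapping frames, and all function names are assumptions.

```python
import math


def split_into_frames(samples, sample_rate_hz, frame_period_ms=10):
    """Cut the signal into consecutive, non-overlapping frames of one frame period."""
    frame_len = sample_rate_hz * frame_period_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame period too short for this sample rate")
    last_start = len(samples) - frame_len
    # A trailing partial frame is dropped.
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def log_energy(frame):
    """Stand-in feature: log of the mean squared amplitude of the frame."""
    energy = sum(s * s for s in frame) / len(frame)
    return math.log(energy + 1e-12)


def feature_sequence(samples, sample_rate_hz, frame_period_ms=10):
    """Time series of one feature value per frame."""
    frames = split_into_frames(samples, sample_rate_hz, frame_period_ms)
    return [log_energy(frame) for frame in frames]


# 25 ms of synthetic signal at an assumed 8000 Hz: silence, full scale, half scale.
signal = [0.0] * 80 + [1.0] * 80 + [0.5] * 40
features = feature_sequence(signal, 8000)
```

Changing `frame_period_ms` to 20 or 30 changes only the frame length, which mirrors the remark that the frame period can be set arbitrarily.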

[0029] Meanwhile, the pointing coordinate reading program 14 on the main storage device 2 is started by the information processing device 1 at the same time as the sound recognition program 13 (s302), and the two run in parallel. An example of the processing flow after the pointing coordinate reading program 14 starts is described with reference to FIG. 13.

[0030] First, buffer memories P and Q on the main storage device 2 are reset to zero (s201, s202). While the user's fingertip, pen, or the like is touching the touch panel 5 (s203), the program reads the contact coordinates through the panel control device 3 at fixed time intervals (s204). Each time coordinates are read, P is incremented and, as the pointing area table on the main storage device 2, the x coordinate is written to array memory X[P], the y coordinate to array memory Y[P], and the coordinate input time to array memory T[P] (s205). As shown in table PT of FIG. 9, the pointing area table consists of a coordinate number 200, an input time 201, an x coordinate 202, and a y coordinate 203; at each fixed interval the input time, x coordinate, and y coordinate are stored in input order starting from coordinate number "1". In this example, coordinate data is stored every 100 ms. Writing ends when a fixed time To has elapsed after the fingertip, pen, or the like leaves the touch panel 5 (s203).
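A simulation of this sampling loop is sketched below. The touch panel is replaced by a list with one entry per sampling interval, either an (x, y) pair or `None` when nothing touches the panel. The 300 ms value for To, the sample data, and the name `read_pointing_coordinates` are assumptions; the role of buffer Q is not spelled out in this passage, so the sketch tracks the idle time in its own variable.

```python
def read_pointing_coordinates(samples, interval_ms=100, timeout_ms=300):
    """Fill the pointing area table from samples taken every interval_ms.

    Returns (P, X, Y, T), where X, Y, and T are indexed from 1 to P.
    """
    X, Y, T = {}, {}, {}
    P = 0
    idle_ms = 0
    touched = False
    for tick, sample in enumerate(samples):
        now_ms = tick * interval_ms
        if sample is not None:
            touched = True
            idle_ms = 0
            P += 1                  # next coordinate number
            X[P], Y[P] = sample
            T[P] = now_ms           # coordinate input time
        elif touched:
            idle_ms += interval_ms
            if idle_ms >= timeout_ms:
                break               # To elapsed since the finger left the panel
    return P, X, Y, T


touch_samples = [None, (380, 330), (390, 340), None, (400, 350),
                 None, None, None, (999, 999)]
P, X, Y, T = read_pointing_coordinates(touch_samples)
```

A brief lift of the finger (one empty sample) does not end the recording, while three empty samples in a row reach the assumed timeout, so the final sample is ignored.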

[0031] Returning to FIG. 14, when the user's pointing input and voice input have finished, the object pointing program 15 (FIG. 10) on the main storage device 2 is started (s304). First, the value of buffer memory A is checked (s305). If buffer memory A is "1" (voice input was made), the dictionary matching program 152 is started first (s306); the value of buffer memory B is then checked (s307), and if buffer memory B is "1" (pointing input was made), the pointing area recognition program 151 is started (s308). If buffer memory A is "0" (no voice input was made), the pointing area recognition program 151 is started immediately (s311).
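The branching on buffer memories A and B can be written down compactly. The function below only returns the names of the programs that would be started, in order; the function name and the string constants are assumptions made for this sketch.

```python
DICTIONARY_MATCHING = "dictionary matching program 152"
POINTING_RECOGNITION = "pointing area recognition program 151"


def programs_to_start(flag_a, flag_b):
    """Programs started by the object pointing program 15, in order.

    flag_a is 1 when voice input was made, flag_b is 1 when pointing was entered.
    """
    started = []
    if flag_a == 1:                               # s305: voice input present
        started.append(DICTIONARY_MATCHING)       # s306
        if flag_b == 1:                           # s307: pointing also present
            started.append(POINTING_RECOGNITION)  # s308
    else:
        started.append(POINTING_RECOGNITION)      # s311: pointing only
    return started
```

For the combined input of FIG. 8 (A = 1, B = 1) both programs run, with dictionary matching first so that the candidate group exists before the pointing position is evaluated.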

[0032] In the example of FIG. 8, buffer memory A is "1" and buffer memory B is "1", so both the pointing area recognition program 151 and the dictionary matching program 152 are started.

[0033] The processing of the dictionary matching program 152 is described with reference to FIG. 16. The feature vectors obtained earlier are matched against the sound standard pattern data 16 (s161). This is done, for example, by the method described in the aforementioned Kitahara et al., "A Study of Methods for Accepting Colloquial Sentences in an Information Retrieval System Using Voice Input" (Acoustical Society of Japan, 3-5-7, 1991), and as a result the input voice is converted into a character string. In the example above, this character string is 「うちがわのえん」 ("uchigawa no en", meaning "the inner circle"). The character string is then morphologically analyzed by a conventional method, for example the longest-match method described in Aizawa et al., "Kana-Kanji Conversion by Computer" (NHK Technical Research, 25, 5, 1973) (s162), and matched against the word dictionary 17 (FIG. 11). Each word is reduced to its representative word and given a word number, so that morpheme information such as ("inside", noun, 8042), ("no", case particle, 900), ("circle", noun, 701) is obtained as the speech recognition result with the maximum likelihood. Next, the relationship description numbers in the circle columns (a column being one vertical field for figure A) of the display object knowledge table NT (FIG. 4), which stores the positional relationships between display objects on the main storage device 2, are matched against the word number in ("inside", noun, 8042), namely "8042" (s163), and the object candidate group is extracted (s164). In this example, circle numbers "C4" and "C5" in the circle drawing table are extracted. The number of object candidates is stored in buffer memory G on the main storage device 2; in this embodiment G = 2.
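Steps s162 to s164 can be sketched as a greedy longest-match segmentation followed by a filter over the knowledge table. Everything about the data layout is an assumption: the lexicon keyed by kana readings, the `FIGURES` dict that keeps only the containment numbers for brevity, and the function names. The word numbers 8042, 900, and 701 are taken from the description.

```python
RELATION_CODES = {6040, 6050, 7040, 8041, 8042}

# Assumed lexicon: kana reading -> (representative word, part of speech, word number).
LEXICON = {
    "うちがわ": ("内側", "noun", 8042),
    "うち": ("内側", "noun", 8042),
    "の": ("の", "case particle", 900),
    "えん": ("円", "noun", 701),
    "まる": ("円", "noun", 701),
}

# Assumed view of table NT: figure -> (word number of its kind, relationship numbers).
FIGURES = {
    "C1": (701, set()), "C2": (701, set()),
    "C3": (701, {8041}), "C4": (701, {8042}),
    "C5": (701, {8042}), "C6": (701, {8041}),
}


def longest_match(text, lexicon):
    """Segment text greedily, always taking the longest dictionary word."""
    morphemes = []
    pos = 0
    while pos < len(text):
        for end in range(len(text), pos, -1):
            entry = lexicon.get(text[pos:end])
            if entry is not None:
                morphemes.append(entry)
                pos = end
                break
        else:
            raise ValueError(f"no dictionary word at position {pos}")
    return morphemes


def extract_candidates(morphemes, figures):
    """Figures whose kind and relationship numbers match the utterance."""
    numbers = {number for _word, _pos, number in morphemes}
    wanted_relations = numbers & RELATION_CODES
    wanted_kinds = numbers & {kind for kind, _relations in figures.values()}
    return sorted(
        name for name, (kind, relations) in figures.items()
        if (not wanted_kinds or kind in wanted_kinds) and wanted_relations <= relations
    )


morphemes = longest_match("うちがわのえん", LEXICON)
candidates = extract_candidates(morphemes, FIGURES)
G = len(candidates)  # number of object candidates
```

Because the longest entry is tried first, 「うちがわ」 is chosen over the shorter 「うち」, and the filter leaves the two contained circles as candidates.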

[0034] Next, the processing of the pointing area recognition program 151 is described with reference to FIG. 15. The minimum value XMn and maximum value XMx of the X coordinate values stored in X[1] through X[P] on the main storage device 2, and the minimum value YMn and maximum value YMx of the Y coordinate values stored in Y[1] through Y[P], are calculated (s151), and the coordinates (Xnt, Ynt) given by the mean of XMn and XMx and the mean of YMn and YMx are obtained (s152). Next, the distance lg between the coordinates (Xnt, Ynt) and each object candidate in the candidate group is calculated in turn (s153), and the candidate with the shortest distance is uniquely identified as the object (s154).
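Steps s151 to s154 amount to taking the midpoint of the pointed extent and choosing the nearest candidate. In the sketch below the distance is measured to each candidate's center coordinates, and the sample pointing coordinates and function names are assumptions. The function returns every candidate at the shortest distance so that a tie can be detected.

```python
import math


def pointing_center(xs, ys):
    """(Xnt, Ynt): midpoint of the minimum and maximum pointed coordinates."""
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


def nearest_candidates(point, candidate_centers):
    """Names of the candidates whose centers are closest to point."""
    distances = {
        name: math.hypot(cx - point[0], cy - point[1])
        for name, (cx, cy) in candidate_centers.items()
    }
    shortest = min(distances.values())
    return sorted(name for name, d in distances.items() if math.isclose(d, shortest))


# Candidate centers from the example; the pointed samples are made up.
centers = {"C4": (390, 340), "C5": (530, 270)}
point = pointing_center([380, 390, 400], [330, 340, 350])
result = nearest_candidates(point, centers)
```

A single name in the result identifies the object; more than one name corresponds to the equidistant case in which the candidates have to be shown enlarged.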

[0035] Returning to FIG. 14, if the processing of the pointing area recognition program 151 of FIG. 15 finds a plurality of object candidates in the candidate group that are equidistant from (Xnt, Ynt) (s309), the object candidate enlarging program 153 is started (s310). The candidate figures are enlarged with their positional relationships preserved, and the user designates the object again by voice or pointing (s303), so that the object can be narrowed down from among those candidates.

[0036] The processing of the object candidate group enlarging program 153 will be described with reference to FIG. 17. First, a window is generated in the area containing the object candidate group (s171). Next, an image of the object candidate group, enlarged with the mutual positional relationship of the candidates preserved, is displayed in the window (s172). After the object has been determined by a further instruction from the user (s173), the window is erased (s174).
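One way to enlarge a candidate group while preserving mutual positions (s172) is a uniform scaling about a common origin. The Python sketch below assumes the mean of the candidate centers as that origin and a single scale factor; the specification does not say how the enlargement is computed, and window creation and erasure (s171, s174) are omitted.

```python
def enlarge_layout(centers, scale):
    """Scale candidate centers uniformly about their mean point.

    `centers` maps a hypothetical object name to (x, y). A uniform scale keeps
    every left/right and above/below relation between candidates unchanged.
    """
    n = len(centers)
    ox = sum(x for x, _ in centers.values()) / n
    oy = sum(y for _, y in centers.values()) / n
    return {
        name: (ox + scale * (x - ox), oy + scale * (y - oy))
        for name, (x, y) in centers.items()
    }

enlarged = enlarge_layout({"C4": (10, 10), "C5": (14, 10)}, 2)
```

With a scale factor of 2, the two centers move from 4 units apart to 8 units apart, which makes a second pointing gesture easier to disambiguate.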

[0037] In the example of FIG. 8, speech recognition yields "inner circle" as the recognition candidate with the highest likelihood, and circles C4 and C5 are obtained as object candidates. The pointing area recognition program 151 then determines that circle C4, for which the distance from the indicated coordinates to the center coordinates of the individual object candidates in the object candidate group is smallest, is the indicated object.

[0038] In step s308 of FIG. 14, object candidates have already been obtained from the speech recognition result by the dictionary matching program 152, so the pointing area recognition processing can use the following simpler method instead of calculating the distance between the indicated coordinates and the center coordinates of each object candidate. It checks whether the center coordinates of each object candidate fall within a range in the X direction (or Y direction) around the position indicated by the pointing gesture. If exactly one object candidate falls within the range, it is determined to be the indicated object. If two or more fall within the range, the object candidate group is enlarged and the instruction is given again. If no object candidate falls within the range, the speech may have been misrecognized. In that case the likelihoods calculated by the speech information analysis means are examined in descending order, starting from the next largest, to determine whether the center coordinates of the object represented by the word corresponding to each likelihood lie within the range indicated by the pointing gesture, and as soon as a candidate within the range is found, it is uniquely identified as the object.
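The range test with fallback to lower-likelihood words can be sketched in Python as follows. All names and the data layout are hypothetical. Returning an "enlarge" result when a lower-ranked word has two or more candidates in range is an assumption, since the text only describes that branch for the top-ranked word.

```python
def resolve_by_range(ranked_words, centers_by_word, x_range):
    """Pick an object using the X-direction range test of paragraph [0038].

    ranked_words:    list of (word, likelihood) pairs from speech recognition.
    centers_by_word: word -> {object_name: (cx, cy)} for objects matching that word.
    x_range:         (lo, hi) range indicated by the pointing gesture.
    Returns ("object", name), ("enlarge", [names]) or ("none", None).
    """
    lo, hi = x_range
    # Try words in descending order of likelihood.
    for word, _likelihood in sorted(ranked_words, key=lambda wl: wl[1], reverse=True):
        inside = [
            name
            for name, (cx, _cy) in centers_by_word.get(word, {}).items()
            if lo <= cx <= hi
        ]
        if len(inside) == 1:
            return ("object", inside[0])
        if len(inside) >= 2:
            return ("enlarge", inside)
    return ("none", None)

# Hypothetical case: the top-ranked word has no object in range, the next one does.
result = resolve_by_range(
    [("inner circle", 0.9), ("outer circle", 0.6)],
    {"inner circle": {"C4": (50, 10)}, "outer circle": {"C1": (12, 10)}},
    (5, 20),
)
```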

[0039] The range indicated by the pointing gesture is, for example, the range of coordinates Xj defined by XMn - d1 ≦ Xj ≦ XMx + d2 (the values of d1 and d2 are given below). Here, d1 is 1/10 of the horizontal width of the screen, and d1 = 10 in this example. d2 is 1/20 of the horizontal width of the screen, and d2 = 5 in this example. d1 is made the larger of the two because a right-handed user is considered likely to point at the right side of a figure. For a left-handed user, the values of d1 and d2 are swapped. It is therefore preferable that the user be able to set the values of d1 and d2. However, if the range indicated by the pointing gesture is defined in the Y direction rather than the X direction, that is, as YMn - d1 ≦ Yj ≦ YMx + d2, then d1 > d2 can be used regardless of the dominant hand.
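As a small illustration, the X-direction range and the swap for left-handed users could be computed as below. The function name and the `left_handed` flag are hypothetical; the defaults d1 = 10 and d2 = 5 are the example values given in the text.

```python
def pointing_range_x(xs, d1=10, d2=5, left_handed=False):
    """Return (XMn - d1, XMx + d2) for the sampled X coordinates `xs`.

    For a left-handed user the two margins are swapped, as the text suggests.
    """
    if left_handed:
        d1, d2 = d2, d1
    return (min(xs) - d1, max(xs) + d2)

right = pointing_range_x([40, 44, 42])                   # (30, 49)
left = pointing_range_x([40, 44, 42], left_handed=True)  # (35, 54)
```

Exposing d1 and d2 as parameters reflects the recommendation that the user be able to set them.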

[0040] When only voice input is performed, as in FIG. 7, "1" is written to the voice input identification buffer memory A on the main storage device 2, and "0" is written to the pointing input identification buffer memory B. When buffer memory A is "1" and buffer memory B is "0", only the dictionary matching program 152 is started. The obtained feature vectors are matched against the acoustic standard pattern data 16 as described above, and the input speech is thereby converted into a character string. In the example of FIG. 7, this character string is "ちょうほうけい" (chohokei). The character string is then morphologically analyzed using the longest-match method described above and matched against the word dictionary, yielding morphological information such as ("rectangle", noun). Next, the extracted noun number is matched against the existing table numbers, and in this example table number E = 802 is extracted. Because this table stores only one figure, R1, R1 is selected and rectangle R1 is highlighted on the screen (for example, by blinking).

[0041] When only pointing input is performed, as in FIG. 5, "0" is written to the voice input identification buffer memory A on the main storage device 2, and "1" is written to the pointing input identification buffer memory B. When buffer memory A is "0" and buffer memory B is "1", only the pointing area recognition program 151 is started. It calculates the minimum value XMn and the maximum value XMx of the X coordinate values stored in X[1] to X[P] on the main storage device 2, and the minimum value YMn and the maximum value YMx of the Y coordinate values stored in Y[1] to Y[P]. It then calculates in turn the distance between the coordinates (Xnt, Ynt), given by the average of XMn and XMx and the average of YMn and YMx, and each object candidate in the object candidate group, and uniquely identifies the object candidate with the shortest distance as the object. If a plurality of object candidates share the shortest distance from (Xnt, Ynt), the object candidate enlarging program 153 is started (s310), the candidate figures are enlarged with their positional relationship preserved, and the user indicates the object again by voice or pointing, which narrows those candidates down to the object.
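The choice of program from the two identification buffers can be summarized in a short Python sketch. The function name and the returned labels are hypothetical, the flags are modeled as the integers 0 and 1, and the behavior for the combinations (1, 1) and (0, 0) is an assumption: the two paragraphs above only spell out the voice-only and pointing-only cases.

```python
def select_programs(buffer_a, buffer_b):
    """Return the programs to start for the given input identification flags.

    buffer_a: 1 if voice input occurred (buffer memory A), else 0.
    buffer_b: 1 if pointing input occurred (buffer memory B), else 0.
    """
    programs = []
    if buffer_a == 1:
        programs.append("dictionary_matching_152")
    if buffer_b == 1:
        programs.append("pointing_area_recognition_151")
    return programs

voice_only = select_programs(1, 0)     # only the dictionary matching program
pointing_only = select_programs(0, 1)  # only the pointing area recognition program
```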

【００４２】[0042]

[Effects of the Invention] By providing a table that stores the inclusion relationships among a plurality of display objects displayed on the screen and the positional relationships between the display objects, the object intended by the user can be accurately recognized among the plurality of display objects on the screen from the voice information input by the user and the position information from the pointing gesture. In addition, when the display object corresponding to the word with the maximum likelihood calculated in the speech recognition process does not exist within a certain range indicated by the pointing gesture, the speech recognition candidate words with the successively next largest likelihoods are adopted, which reduces misrecognition of the object caused by speech recognition errors and improves usability for the user. Furthermore, by providing a function that enlarges and displays the object candidates with their positional relationship preserved when a single operation yields a plurality of object candidates, the object can be uniquely identified quickly and easily even when a single set of input information is not enough to indicate the object uniquely.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a graphic editing system according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram showing an example of a graphic editing screen in the embodiment.

FIG. 3 is an explanatory diagram showing an example of the data structure of a drawing table in the embodiment.

FIG. 4 is an explanatory diagram showing an example of the data structure of a display graphic knowledge table in the embodiment.

FIG. 5 is an explanatory diagram showing the module configuration of the acoustic recognition program in the embodiment.

FIG. 6 is an explanatory diagram of an example of a graphic editing screen showing an object being indicated by a pointing gesture alone in the embodiment.

FIG. 7 is an explanatory diagram of an example of a graphic editing screen showing an object being indicated by voice alone in the embodiment.

FIG. 8 is an explanatory diagram of an example of a graphic editing screen showing an object being indicated by both voice and a pointing gesture in the embodiment.

FIG. 9 is an explanatory diagram showing an example of the data structure of a pointing area table in the embodiment.

FIG. 10 is an explanatory diagram showing the module configuration of the object instruction program 15 in the embodiment.

FIG. 11 is an explanatory diagram showing an example of the data structure of the word dictionary 17 in the embodiment.

FIG. 12 is a flowchart showing an example of the flow of processing after the acoustic recognition program 13 is started in the embodiment.

FIG. 13 is a flowchart showing an example of the flow of processing after the pointing coordinate reading program 14 is started in the embodiment.

FIG. 14 is a flowchart showing an example of the flow of processing after the graphic editing program 12 is started in the embodiment.

FIG. 15 is a flowchart showing an example of the flow of processing after the pointing area recognition program 151 is started in the embodiment.

FIG. 16 is a flowchart showing an example of the flow of processing after the dictionary matching program 152 is started in the embodiment.

FIG. 17 is a flowchart showing an example of the flow of processing after the object candidate enlarging program 153 is started in the embodiment.

FIG. 18 is a flowchart showing an example of the flow of processing after the display object knowledge table creation program 18 is started in the embodiment.

FIG. 19 is an explanatory diagram of the buffer memories used in the embodiment.

[Explanation of symbols]

1 ... information processing device, 2 ... main storage device, 3 ... panel control device, 4 ... display, 5 ... touch panel, 6 ... display control device, 7 ... A/D converter, 8 ... microphone, 11 ... system program, 12 ... graphic editing program, 13 ... acoustic recognition program, 14 ... pointing coordinate reading program, 15 ... object instruction program, 16 ... acoustic standard pattern data, 17 ... word dictionary, 18 ... display object knowledge table creation program.

Continuation of the front page
(56) References: JP-A-4-372012 (JP, A); JP-A-60-146327 (JP, A); JP-A-59-183429 (JP, A); Japanese Utility Model Laid-Open No. 59-178742 (JP, U)
(58) Field surveyed (Int. Cl.⁷, DB name): G06T 11/80; G06F 3/16 320; G06F 3/03 380

Claims

(57) [Claims]

1. A method of pointing to an object in a compound form, in which a pointing instruction by a finger or a pen and an instruction by voice are used together to indicate one of a plurality of display objects displayed on a screen of a display device as the object, the method comprising: obtaining the mutual positional relationships among the plurality of display objects and creating a display object knowledge table representing the positional relationships; obtaining candidates for the object indicated by the voice on the basis of a voice recognition result for a voice instruction of a display object that includes the positional relationship and the contents of the display object knowledge table; obtaining position information from the pointing instruction; and specifying the object on the basis of the position information from the pointing instruction and the object candidates.

2. The method of pointing to an object in a compound form according to claim 1, wherein, among the object candidates having the highest likelihood according to the voice recognition, the object candidate located closest to the position indicated by the pointing instruction is specified as the object.

3. The method of pointing to an object in a compound form according to claim 1, wherein an indicated range is obtained from the position information based on the pointing instruction, and when an object candidate having the highest likelihood according to the voice recognition exists within the indicated range, that object candidate is specified as the object.

4. The method of pointing to an object in a compound form according to claim 3, wherein, when a plurality of object candidates having the highest likelihood exist within the indicated range, an image area including the plurality of object candidates is enlarged, and thereafter the display object is indicated again by the instruction using the finger or the pen and by voice.

5. The method of pointing to an object in a compound form according to claim 4, wherein, when no object candidate having the highest likelihood exists within the indicated range and an object candidate having the next highest likelihood exists within the indicated range, that object candidate is specified as the object.

6. The method of pointing to an object in a compound form according to claim 3, wherein the indicated range is a predetermined range near the pointing coordinates of the finger or the pen.

7. A compound form input device comprising information display means for displaying information, pointing means for indicating a position on the display screen of the information display means with a finger or a pen, and voice recognition means for recognizing a voice instruction, the device indicating one of a plurality of display objects on the display screen as an object by using both a pointing instruction and a voice instruction, the device further comprising: means for storing a word dictionary including at least words representing the names of the plurality of display objects displayed on the display screen and words representing the mutual positional relationships of the plurality of display objects; means for determining the mutual positional relationships among the plurality of display objects; means for storing a display object knowledge table representing the positional relationships; means for obtaining position information based on the pointing instruction; and object specifying means for obtaining object candidates by comparing the name and positional relationship of the display object recognized by the voice recognition means with the display object knowledge table, and for determining the object from the object candidates by referring to the position information.

8. The compound form input device according to claim 7, further comprising enlarging means for enlarging the object candidates while maintaining their positional relationship when the object specifying means cannot identify a single object from a plurality of object candidates.