JPH10260772A

JPH10260772A - Object operation device and object operation method

Info

Publication number: JPH10260772A
Application number: JP6268197A
Authority: JP
Inventors: Osamu Yamaguchi (山口 修); Kazuhiro Fukui (福井 和広)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1997-03-17
Filing date: 1997-03-17
Publication date: 1998-09-29
Anticipated expiration: 2017-03-17
Also published as: JP4030147B2

Abstract

PROBLEM TO BE SOLVED: To operate and select an object by a non-contact detection method, without using a line-of-sight detection method that calculates the gaze direction with strict accuracy, by associating the state of the object with face-image input information and performing storage, recognition, and control. SOLUTION: The device comprises an image input unit 2, a face image processing unit 3 that analyzes a human face to obtain feature quantities, a recognition determination unit 4 that associates the state of the object with face-image input information and performs storage, recognition, and control, and an object control unit 5. While the user works, the user's face is photographed and the images are analyzed to obtain feature quantities. A dictionary pattern for recognition is created from the feature quantities and associated with the operation performed on the object. At a later time, when the user looks at the object, a face image is acquired in the same way and recognized using the dictionary pattern; if it is close to the face images captured when the object was previously operated, the object is automatically selected as the operation target.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an object operation device and an object operation method.

[0002]

[Prior Art] In recent years, the sophistication of human interfaces for computers and the like has become important, and simple, easy-to-use interfaces based on more advanced input devices have been invented. For example, many related inventions have been proposed for interfaces that use a line-of-sight detection device to control windows without devices such as a mouse or pen (e.g. Japanese Patent Laid-Open No. H4-23027: window selection method). There have also been inventions for obtaining line-of-sight information accurately.

[0003]

[Problems to Be Solved by the Invention] However, a non-contact line-of-sight detection device requires calibration to associate the gazing point with the state of the person (face orientation and pupil position). Calibration corrects the relationship between the gazing point and the pupil position by having the user look at specially prepared, predefined markers. Raising the detection accuracy demands considerable time and repeated operations, so the burden on the user has been large.

[0004] Further, large-scale equipment has been required: some systems project near-infrared light to obtain the gazing point (JP-A-S61-172552, JP-A-H4-23027), and others use a stereo apparatus to extract the pupil position accurately (Akira Tomono, Fumio Kishino, Yukio Kobayashi: "Pupil extraction processing and prototype of a gaze detection device allowing head movement", Transactions of the IEICE (D), Vol. J76-D-II, No. 3, pp. 636-646 (1993)).

[0005] The present invention makes it possible to operate and select an object by a non-contact detection method, without using a line-of-sight detection method that calculates the gaze direction with strict accuracy. With the device of the invention, the user initially works on an object using another device, as before. Because the user is gazing at the object while working on it, the user's face is photographed during the work and the images are analyzed to obtain feature quantities. A dictionary pattern for recognition is created from those feature quantities and associated with the operation performed on the object. At a later time, when the user looks at the object, a face image is acquired in the same way and recognized using the created dictionary pattern; if it is close to the face images captured when the object was previously being operated, the object is automatically selected as the operation target without explicit use of a device.

[0006] Taking a computer window system as an example, a window is initially focused using the mouse, and while the user works in that window, face images of the user are acquired and the facial feature quantities for the gazing direction are obtained. After a dictionary pattern has been created from those feature quantities, recognition is performed with it, and when the current face image is close to the face images captured while the user was looking at a given window, that window is focused automatically.

[0007] When operating a mouse, a person performs visual feedback: as shown in FIG. 13, the eyes of the person 44 follow the mouse cursor, and the position of the mouse 45 is controlled by constantly tracking its cursor on the display unit 46 with the gaze. In other words, the gaze position can be obtained naturally while the mouse is being moved. Exploiting this property removes the need to prepare special markers as in conventional methods, so no marker-based calibration is required.

[0008] Moreover, when the target to be identified by gaze has a larger area than the mouse cursor, as a window does, 1) the user gazes at the window while operating it (key input, etc.), and 2) high gaze-detection accuracy is not required.

[0009] It therefore suffices to acquire face images while the window is being operated and to build a dictionary pattern from them. A dictionary pattern is created for each window, and at recognition time the window is selected by computing which dictionary the current face image is closest to, which yields an effect similar to line-of-sight detection.

[0010] By creating dictionary patterns as needed, there is no need to perform calibration with special markers in advance. In addition, an easier-to-use environment and style can be acquired and used automatically in the course of work, without disrupting the existing way of working.

[0011]

[Means for Solving the Problems] To achieve the above object, the first invention comprises an image input unit for inputting an image, a face image processing unit that analyzes a human face to obtain feature quantities, a recognition determination unit that associates the state of an object with face-image input information and performs storage, recognition, and control, and an object control unit for controlling objects. This enables operations such as automatically selecting an object by face orientation or the like.

[0012] The second invention is the object operation device of claim 1, characterized in that the recognition determination unit has a dictionary generation unit which, upon receiving specified operation information of an object, stores and collects the feature quantities obtained by analyzing the human face and generates a dictionary of dictionary patterns for identifying those feature quantities. Because a dictionary is generated as needed for each object, calibration becomes unnecessary.

[0013] In the third invention, the recognition determination unit has a recognition unit that uses the dictionary generated from the feature quantities obtained by analyzing the human face to classify the input into the category with the closest similarity, and a determination control unit that generates the operation information associated with that category. This makes it possible to operate an object using only feature quantities obtained from the face.

[0014] In the fourth invention, the recognition determination unit has a dictionary generation unit for the feature quantities obtained by analyzing the human face and a recognition unit that identifies them, together with a determination control unit that switches between these two functions according to the operation state of the target object. This enables recognition control that follows the operation state of the object.

[0015] The fifth invention is characterized in that the dictionary generation unit and the recognition unit of the recognition determination unit use the subspace method. Because the subspace method is used to create the dictionary patterns, dictionaries can be generated easily and identification is stable.

[0016] The sixth invention comprises image input means for inputting an image, face image processing means for analyzing a photographed human face, a recognition determination method that stores the feature quantities obtained by analyzing the human face during object operation, determines the similarity to the stored feature quantities, and generates operation information for the object, and object control means for controlling objects. This provides a method for operations such as automatically selecting an object by face orientation or the like.

[0017]

[Embodiments of the Invention] An embodiment of the present invention is described below. FIG. 1 shows a device 1 according to the invention. The device consists of four parts: an image input unit 2, a face image processing unit 3, a recognition determination unit 4, and an object control unit 5.

[0018] [Image Input Unit] The image input unit 2 is a device for photographing a human face and acquires images using a single ordinary TV camera. Monochrome input suffices; color input is converted to monochrome before being sent to the face image processing unit 3.

[0019] This embodiment processes monochrome images, but a color image may be output if the face image processing unit 3 needs color information. The image input unit 2 may have multiple cameras, and the camera need not be a fixed camera: a camera with controllable focus, aperture, zoom, and so on may be used, as may a camera that rotates in the pan and tilt directions to vary the shooting direction.

[0020] [Face Image Processing Unit] FIG. 2 shows a configuration example of the face image processing unit 3. The face image processing unit 6 in this embodiment consists of a face detection unit 7, a face part detection unit 8, and a partial image generation unit 9.

[0021] The face detection unit 7 applies affine transformations (translation, scaling, rotation) and cropping to the input image to generate partial images of a specified size. Specifically, the input image is reduced in several steps, and from each reduced image, images of a fixed size (N pixels × N pixels) are cut out in raster-scan order (see FIG. 10). For each cut-out partial image, the similarity to a previously prepared dictionary pattern of face images is computed. Here the similarity is calculated using the subspace method.
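The multi-resolution raster scan described in [0021] can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the embodiment: `shrink` reduces the image by plain subsampling, `score` stands in for the subspace-method similarity against a face dictionary, and all function and parameter names are hypothetical.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # row-major gray-level image


def shrink(img: Image, factor: int) -> Image:
    """Reduce an image by keeping every `factor`-th pixel (crude stand-in for resampling)."""
    return [row[::factor] for row in img[::factor]]


def scan_for_face(img: Image, n: int, factors: List[int],
                  score: Callable[[Image], float]) -> Tuple[float, int, int, int]:
    """Return (best score, reduction factor, row, col) over all N x N windows."""
    best = (float("-inf"), 0, 0, 0)
    for f in factors:
        small = shrink(img, f)
        rows, cols = len(small), len(small[0])
        for r in range(rows - n + 1):        # raster scan: top to bottom
            for c in range(cols - n + 1):    # and left to right
                patch = [row[c:c + n] for row in small[r:r + n]]
                s = score(patch)             # assumed similarity to a face dictionary
                if s > best[0]:
                    best = (s, f, r, c)
    return best
```

The returned reduction factor and window position correspond to the "resolution and position" at which a face is taken to be present.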

[0022] In the subspace method, the collected samples are subjected to K-L expansion (principal component analysis) to obtain an orthonormal basis (eigenvectors), which serves as the dictionary pattern. For a test pattern, the sum of its inner products with the dictionary pattern (eigenvectors) of each category is computed and used as the similarity. The test pattern is then taken to belong to the category with the highest similarity. Using the eigenvectors φ = (φ_1, …, φ_m), the similarity s to a partial image G is obtained.

[0023]

(Equation 1)
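The image for Equation 1 is not reproduced in this text. As a hedged reading of the description of the subspace method, and not a transcription of the original formula, one common form of the similarity is the sum of squared inner products between the partial image G (treated as a vector) and the eigenvectors, normalized by the squared norm of G. The squaring and the normalization are assumptions; the text itself speaks only of a sum of inner products.

```latex
s = \sum_{i=1}^{m} \frac{(G, \varphi_i)^2}{\lVert G \rVert^2}
```

With orthonormal eigenvectors this value lies between 0 and 1, and it equals 1 when G lies entirely in the subspace spanned by the dictionary.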

[0024] Among the cut-out images, the resolution and position with the highest similarity are found, and a face is determined to be present there (see the lower center of FIG. 10). The face part detection unit 8 detects face parts such as the eyes, nose, and mouth within the detected face region. In this embodiment, the eyes (pupils) and nose (nostrils) are detected as face parts.

[0025] To extract these characteristic parts from the face image of FIG. 11(a), circular regions are first detected as eye and nose candidates, as shown in FIG. 11(b), using a separability mask shaped as a circular region (Osamu Yamaguchi, Kazuhiro Fukui, "Face image analysis using separability features: detection of eyes and pupils", Information Processing Society of Japan, 52nd National Convention (2), pp. 187-188, 1996). Any conventional method may be used for feature point extraction; the method is not restricted. From the multiple circular regions, combinations of four regions are considered as eye and nose candidates, and the four points judged most face-like are specified as the eyes and nose. Face-likeness is verified by comparison with a previously prepared face model. The face model may follow previously proposed methods, such as one using gray-level information or structural feature quantities based on a spring model.

[0026] The partial image generation unit 9 cuts out the face region as shown in FIG. 11(e), using the positions of the detected face parts (feature points) as a reference. Based on the four points selected as feature points, the region is resampled from the image to generate a small image of a specified size (normalized image). Here the normalized size is 15 × 15 pixels, and two vectors are set from the eye and nose feature points as shown in FIG. 11(d). The gray value at each position given by a linear combination of these vectors becomes the gray value of the cut-out image (FIG. 11(e)). The cut-out size (15 × 15 pixels) is not limited to this value, nor is the region to be cut out limited to the one described.
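The resampling step of [0026] can be sketched as follows. How the two vectors are derived from the eye and nose points is defined by FIG. 11(d), which is not reproduced here, so this sketch simply takes an origin and two vectors `u` and `v` as inputs; the coefficient range `span` and nearest-neighbor sampling are also assumptions, and all names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # row-major gray-level image, indexed img[y][x]
Vec = Tuple[float, float]  # (x, y)


def normalize_patch(img: Image, origin: Vec, u: Vec, v: Vec,
                    size: int = 15, span: float = 1.0) -> Image:
    """Sample a size x size patch at origin + a*u + b*v, with a, b in [-span, span]."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(size):
        b = -span + 2 * span * i / (size - 1)
        row = []
        for j in range(size):
            a = -span + 2 * span * j / (size - 1)
            x = origin[0] + a * u[0] + b * v[0]    # linear combination of the two vectors
            y = origin[1] + a * u[1] + b * v[1]
            xi = min(max(int(round(x)), 0), w - 1)  # nearest neighbor, clamped to the image
            yi = min(max(int(round(y)), 0), h - 1)
            row.append(img[yi][xi])
        out.append(row)
    return out
```

Because the sampling grid follows the feature points, the same facial area is mapped to the same 15 × 15 layout regardless of where the face appears in the input image.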

[0027] When the cut-out is made from the face part positions as described above, differences in face orientation and pupil position yield different feature quantities, as in FIG. 12. FIG. 12(a), (b), (c), and (d) are schematic diagrams of the gray values when the user is looking up, down, right, and left, respectively; these gray values are used directly as feature quantities. Feature quantities such as face orientation can thus be used without determining the line of sight directly.

[0028] By discriminating gray-level feature quantities like those of FIG. 12, differences in face orientation make it possible to distinguish roughly nine divisions of a screen about the size of a 17-inch display. This, however, requires calibration in advance: the facial feature quantities must be acquired while the user looks at each screen division, and the dictionaries must be generated beforehand. Simple menu selection and the like is also possible in this way.

[0029] [Object Control Unit] As shown in FIG. 4, the object control unit 5 consists of an object state management unit 15, an event management unit 16, and an object state change unit 17.

[0030] The object state management unit 15 manages state information about objects, such as their creation, management, and deletion. The event management unit 16 processes and manages information from the devices provided for operating objects, as well as all event information arising from processing performed on objects and processing performed by objects. An event represents operation information or operation content for an object in the system and is the minimum unit of operation.

[0031] The object state change unit 17 changes the state of each object and performs control and processing related to the display of objects. Because this embodiment is described concretely using the window manager of a computer window system as an example, the object control unit 18 is taken to be a window control unit (window manager) as in FIG. 5, the object state management unit a window management unit 19, and the object state change unit a window display change unit 21.

[0032] [Window Control Unit] The window control unit 18 has the same function as the window manager of an ordinary window system; it is described here as consisting of three parts: a window management unit 19, an event management unit 20, and a screen display change unit 21.

[0033] <Window Management Unit> The window management unit 19 manages attribute information such as the position and size of the displayed windows, and how the windows overlap. When a window is newly created, its size (w, h), position (x, y), name (name), and ID number (idnumber) are registered as a tuple of the form ((x, y), (w, h), (name), (idnumber)).
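The registration tuple of [0033] maps naturally onto a small record type. The following is a minimal Python sketch; the class and method names (`WindowEntry`, `WindowRegistry`, `register`, `unregister`, `get`) are hypothetical and are not taken from the publication.

```python
from typing import Dict, NamedTuple, Tuple


class WindowEntry(NamedTuple):
    position: Tuple[int, int]  # (x, y)
    size: Tuple[int, int]      # (w, h)
    name: str
    idnumber: int


class WindowRegistry:
    """Holds one tuple per displayed window, keyed by ID number."""

    def __init__(self) -> None:
        self._windows: Dict[int, WindowEntry] = {}

    def register(self, x: int, y: int, w: int, h: int,
                 name: str, idnumber: int) -> WindowEntry:
        """Store ((x, y), (w, h), name, idnumber) when a window is created."""
        entry = WindowEntry((x, y), (w, h), name, idnumber)
        self._windows[idnumber] = entry
        return entry

    def unregister(self, idnumber: int) -> None:
        """Forget a window, for example when it is deleted."""
        self._windows.pop(idnumber, None)

    def get(self, idnumber: int) -> WindowEntry:
        return self._windows[idnumber]
```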

[0034] To detect overlap between windows, the (x, y) and (w, h) of each window are used to compute and hold which window overlaps which other window.
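The overlap computation of [0034] reduces to a rectangle intersection test on (x, y) and (w, h). A minimal sketch follows, assuming axis-aligned windows and treating windows that merely touch along an edge as non-overlapping; stacking order (which window lies on top) is not modeled, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h)


def rects_overlap(a: Rect, b: Rect) -> bool:
    """True when two axis-aligned rectangles share a region of non-zero area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def overlap_table(windows: Dict[int, Rect]) -> Dict[int, Set[int]]:
    """Map each window ID to the set of other window IDs it overlaps."""
    table: Dict[int, Set[int]] = {wid: set() for wid in windows}
    ids: List[int] = sorted(windows)
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            if rects_overlap(windows[p], windows[q]):
                table[p].add(q)
                table[q].add(p)
    return table
```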

[0035] <Event Management Unit> An event is the minimum unit of operation in the window system, such as a mouse movement, button operation, window operation, or key input.

[0036] An event is expressed as a tuple (event type, ID of the object (window) in which the event occurred, event value (amount)).

[0037] The event management unit 20 processes events from devices such as the mouse and keyboard, and also issues instructions to change the screen display when an event occurs. For example, when the user types on the keyboard, the window system generates an event, which is sent to the event management unit 20. The event management unit 20 decides which window the event is delivered to; that is, it manages which window is the selected target (called the focus).

[0038] <Screen Display Change Unit> The screen display change unit 21 changes the window display when an event occurs that alters the screen, such as displaying a window, drawing images, characters, or figures in a window, or moving the mouse cursor.

[0039] For example, when the focus moves because the mouse has moved, the change of focus is indicated by changing the color of the window frame. When a window is moved, it is redrawn at the designated position.

[0040] [Recognition Determination Unit] Next, the recognition determination unit 4 of FIG. 1 is described. FIG. 3 shows its configuration. The recognition determination unit 10 consists of a dictionary generation unit 12, a recognition unit 11, and a determination control unit 13.

[0041] The dictionary generation unit 12 generates dictionary patterns for recognition from the feature quantities produced by the face image processing unit 3. Here, a dictionary pattern is generated from multiple cut-out partial face images. On instruction from the determination control unit 13, the dictionary generation unit 12 performs the processing of FIG. 6. First, images are collected until the next instruction arrives from the determination control unit (step 22). When a certain number of images have been collected, a variance-covariance matrix is constructed from them (step 23). The eigenvalues and eigenvectors of the matrix are computed by K-L expansion (step 24); concretely, a matrix computation such as the Jacobi method or LU decomposition may be used. The eigenvectors are then sorted in descending order of eigenvalue, and only the top several eigenvectors are extracted and registered as the dictionary pattern (step 25). Multiple dictionary patterns can be held, and they can be deleted at will.
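Steps 23 to 25 of the dictionary generation can be sketched with NumPy as follows. This is an illustration under assumptions: `numpy.linalg.eigh` is used in place of the Jacobi method named in the text, the samples are centered before the variance-covariance matrix is formed (some subspace-method implementations use the uncentered autocorrelation matrix instead), and the function and parameter names are hypothetical.

```python
import numpy as np


def build_dictionary(patches: np.ndarray, n_components: int) -> np.ndarray:
    """Return the leading eigenvectors (as rows) of the variance-covariance matrix.

    patches: array of shape (num_images, num_pixels), one flattened image per row.
    """
    x = patches.astype(float)
    x = x - x.mean(axis=0)                     # center the collected samples
    cov = x.T @ x / x.shape[0]                 # step 23: variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # step 24: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]          # step 25: descending order of eigenvalue
    return eigvecs[:, order[:n_components]].T  # keep only the top eigenvectors
```

For 15 × 15 normalized images each row of `patches` would have 225 elements, and the returned rows form an orthonormal basis that can serve as one dictionary pattern.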

[0042] <Recognition Unit> Using the dictionary patterns created by the dictionary generation unit 12, the recognition unit 11 determines which dictionary pattern a separately cut-out image is closest to. With the subspace method described above, the similarity s_i of the partial image is obtained for the i-th dictionary pattern (eigenvectors) φ_i.

[0043]

(Equation 2)
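Equation 2 is not reproduced in this text. Assuming it takes a usual subspace-method form (sum of squared inner products with the eigenvectors of a dictionary, normalized by the squared norm of the input), the recognition step of selecting the dictionary with the largest s_i might be sketched as follows. The similarity formula, the use of window IDs as dictionary keys, and all names are assumptions.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def similarity(g: Vector, basis: List[Vector]) -> float:
    """Sum of squared inner products with the basis, divided by |g|^2 (assumed form)."""
    norm2 = sum(x * x for x in g)
    if norm2 == 0.0:
        return 0.0
    return sum(sum(a * b for a, b in zip(g, phi)) ** 2 for phi in basis) / norm2


def classify(g: Vector, dictionaries: Dict[int, List[Vector]]) -> int:
    """Return the key (for example a window ID) of the dictionary with the largest s_i."""
    return max(dictionaries, key=lambda k: similarity(g, dictionaries[k]))
```

A practical system would probably also reject inputs whose best similarity is low, so that looking away from every window selects nothing; the text does not describe such a threshold.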

[0044] The image is then classified into the category of the dictionary pattern with the largest s_i among all s_i.

<Determination Control Unit> The determination control unit 13 1) receives information from the object control unit 5, 2) controls recognition and dictionary generation, and 3) issues instructions to the object control unit 5. In one embodiment, as in FIG. 7, the determination control unit 13 consists of an event arbitration unit 29, an event verification unit 27, and an event generation unit 28.

[0045] The event arbitration unit 29 receives events from the object control unit 5, judges whether each is a relevant event, and distributes it to the event verification unit 27 and the event generation unit 28. To select only events related to whether a window is being gazed at, it checks the event type and the ID of the window in which the event occurred. That is, events expressed as (type, winID, value) are filtered using type and winID.
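The filtering by type and winID in [0045] can be sketched as a simple predicate over event tuples. The set of event types treated as gaze-related and the function name `arbitrate` are hypothetical; the text does not list which types pass the filter.

```python
from typing import Iterable, List, NamedTuple, Set


class Event(NamedTuple):
    type: str
    win_id: int
    value: object


# Hypothetical set of event types treated as evidence that a window is being gazed at.
GAZE_RELATED_TYPES = {"keypush", "keyrelease", "mousemove"}


def arbitrate(events: Iterable[Event], tracked_ids: Set[int]) -> List[Event]:
    """Keep only events whose type and window ID are relevant; drop the rest."""
    return [e for e in events
            if e.type in GAZE_RELATED_TYPES and e.win_id in tracked_ids]
```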

[0046] The event verification unit 27 operates according to the flowchart of FIG. 8. First, it receives event information from the object control unit (window control unit) (step 30). If there is an event (step 31), it checks the ID of the window in which the event occurred. If the event comes from a targeted window ID (step 32), it instructs the dictionary generation unit to collect images for dictionary generation (step 33).
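One pass through the flow of FIG. 8, as described in [0046], might look like the following sketch. The figure itself is not reproduced here, so the control flow is inferred from the text; the callback `start_collection`, standing in for the instruction to the dictionary generation unit, is hypothetical.

```python
from typing import Callable, Optional, Set, Tuple

Event = Tuple[str, int, object]  # (type, winID, value)


def verify_event(event: Optional[Event], target_ids: Set[int],
                 start_collection: Callable[[int], None]) -> bool:
    """One pass of the described flow; returns True if image collection was requested."""
    if event is None:               # step 31: no event arrived
        return False
    _type, win_id, _value = event
    if win_id not in target_ids:    # step 32: event is not from a targeted window
        return False
    start_collection(win_id)        # step 33: ask the dictionary generator to collect images
    return True
```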

[0047] When the position information of a window itself changes, for example through moving or deleting the window, the recognition unit is instructed to delete the dictionary pattern. The event generation unit operates according to the flowchart of FIG. 9. It receives event information from the object control unit (step 37); if the received event is not a mouse event (step 38) and the recognition result from the recognition unit indicates that a certain window is being gazed at, it generates the associated event, such as a focus-change event, and sends it to the object control unit.
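The event generation flow of FIG. 9 can be sketched in the same style. The mouse event type names and the shape of the generated focus-change event are hypothetical; only the two conditions (not a mouse event, and a window recognized as being gazed at) come from the text.

```python
from typing import Optional, Tuple

Event = Tuple[str, int, object]  # (type, winID, value)

# Hypothetical names for the event types that count as mouse events.
MOUSE_EVENT_TYPES = {"mousemove", "buttonpress", "buttonrelease"}


def generate_event(received: Event, gazed_window: Optional[int]) -> Optional[Event]:
    """One pass of the described flow: emit a focus-change event or nothing."""
    event_type, _win_id, _value = received  # step 37: event from the object control unit
    if event_type in MOUSE_EVENT_TYPES:     # step 38: mouse events produce no generated event
        return None
    if gazed_window is None:                # the recognizer reports no gazed window
        return None
    return ("focus", gazed_window, None)    # hypothetical focus-change event
```

Suppressing generation during mouse events is consistent with the earlier observation that the mouse already determines the focus while it is in use.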

[0048] The operation of these units is described through an embodiment.

◆ Embodiment 1 (Window Manager) This embodiment describes application to a window system with a graphical user interface (GUI) on a personal computer, workstation, or the like.

[0049] Selecting the window focus (focusing) is taken as the example. Window focus refers to selecting, from among multiple windows, the window that is to be the target of operations such as keyboard input.

[0050] In a conventional window system, when two windows exist as in FIG. 14(c) and (d), a window is focused by moving the mouse so that the mouse cursor lies inside that window, which makes it the operation target.

[0051] In this embodiment, the window is initially focused with the mouse as before, and face images of the user are acquired while the user works in that window. Once dictionary patterns have been created, recognition is performed with them, and when the current face image is close to the face images captured while the user was looking at a given window, that window is focused.

[0052] Images of the user's face are acquired from a camera mounted near the display of the personal computer or workstation. In this embodiment the camera is mounted near the lower edge of the display and angled upward to capture the face.

[0053] The description starts from a state in which one window exists, as in FIG. 14(a). A new window is created as in FIG. 14(b). The window is then focused with the mouse as in FIG. 14(c).

[0054] FIG. 15 shows the state transition diagram of a window. Nodes drawn as ellipses represent states, and each arc is labeled with the event type that indicates the operation. new is the creation of a new window, keypush and keyrelease are keyboard input, mousemove is mouse movement, and iconify and Deiconify are the instructions to turn a window into an icon and to restore an icon to a window.

[0055] For example, (mousemove, 123, (x, y)) means that the mouse moved to position (x, y) in the window with ID 123. The change from FIG. 14(b) to FIG. 14(c) is a transition from "window creation" to the "focus state". When a window is created, the window management unit registers the new window's (ID, position, size, name) and sends the window ID to the determination control unit of the recognition determination unit.
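The event tuple and the registration step described above can be sketched as follows. This is only an illustration: the class names `Event`, `WindowInfo`, and `WindowRegistry`, and the `notified_ids` list standing in for the message sent to the determination control unit, are assumptions and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One window-system event, e.g. (mousemove, 123, (x, y))."""
    kind: str                                  # "new", "keypush", "mousemove", ...
    window_id: int
    position: Optional[Tuple[int, int]] = None


@dataclass
class WindowInfo:
    """The (ID, position, size, name) record registered for a new window."""
    window_id: int
    position: Tuple[int, int]
    size: Tuple[int, int]
    name: str


@dataclass
class WindowRegistry:
    """Hypothetical stand-in for the window management unit."""
    windows: Dict[int, WindowInfo] = field(default_factory=dict)
    # Window IDs forwarded to the determination control unit (assumed channel).
    notified_ids: List[int] = field(default_factory=list)

    def handle(self, event: Event, info: Optional[WindowInfo] = None) -> None:
        # On window creation, register the record and forward the ID.
        if event.kind == "new" and info is not None:
            self.windows[info.window_id] = info
            self.notified_ids.append(info.window_id)
```

For example, `WindowRegistry().handle(Event("new", 123), WindowInfo(123, (0, 0), (640, 480), "editor"))` registers window 123 and records its ID for forwarding.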

[0056] While the window alternates between the "focus state" and the "key input state", that is, while the user is working in the focused window, the determination control unit of the recognition determination unit instructs the dictionary generation unit to collect images for generating a dictionary pattern. For key input, images are collected not only at the instant a key is pressed but throughout the period during which keys are being pressed continuously.

[0057] When the number of images collected for a window ID reaches a fixed constant, the dictionary generation unit generates a dictionary by the procedure of FIG. 6. When a dictionary pattern has been generated, the dictionary generation unit passes the dictionary pattern information (the pair of dictionary and window ID) to the determination control unit.
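The trigger logic, collecting samples per window ID and building a dictionary once a fixed count is reached, can be sketched as below. The class `DictionaryGenerator`, its parameters, and the helper `mean_pattern` are assumptions made for illustration. In particular, `mean_pattern` is a simple placeholder and is not the FIG. 6 procedure, which is not reproduced in this section.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Feature = Sequence[float]
Pattern = List[float]


def mean_pattern(features: List[Feature]) -> Pattern:
    """Placeholder dictionary builder: the per-dimension mean of the samples."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


class DictionaryGenerator:
    """Collects face features per window ID and builds a pattern at a threshold."""

    def __init__(self, required_count: int,
                 build: Callable[[List[Feature]], Pattern]) -> None:
        self.required_count = required_count   # the "fixed constant"
        self.build = build
        self.samples: Dict[int, List[Feature]] = defaultdict(list)

    def add_sample(self, window_id: int,
                   feature: Feature) -> Optional[Tuple[Pattern, int]]:
        """Store one feature; return (dictionary, window ID) once enough exist."""
        self.samples[window_id].append(feature)
        if len(self.samples[window_id]) < self.required_count:
            return None
        # Enough images: build the pattern and reset the buffer for this window.
        return (self.build(self.samples.pop(window_id)), window_id)
```

Passing the builder in as a callable keeps the collection logic independent of how the dictionary itself is computed.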

[0058] The determination control unit sends the dictionary pattern update information to the recognition unit, and the recognition unit receives the new dictionary pattern from the dictionary generation unit. The recognition unit then sends the window ID obtained by recognition with the dictionary patterns to the determination unit.

[0059] In this example there are two windows, so a recognition dictionary is generated for each of them. The determination control unit receives event information from the window control unit and checks for mouse events such as movement or button presses. If no mouse event has occurred, it uses the recognition result (window ID) from the recognition unit to generate an event that moves focus to the window with that ID, just as if the mouse had been moved, and sends the event to the window control unit.
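A minimal sketch of this arbitration, with mouse events taking priority over recognition results, might look like the following. The function name, the `("focus", id)` event shape, and the check that skips a window that already has focus are assumptions added for illustration.

```python
from typing import Optional, Tuple


def next_focus_event(mouse_event_pending: bool,
                     recognized_window_id: Optional[int],
                     current_focus: Optional[int]) -> Optional[Tuple[str, int]]:
    """Return a synthetic focus event, or None if nothing should be sent."""
    # Mouse events win: while the mouse is active, recognition is ignored.
    if mouse_event_pending:
        return None
    # No recognition result, or the window already has focus (assumed check).
    if recognized_window_id is None or recognized_window_id == current_focus:
        return None
    # Behave as if the mouse had been moved into the recognized window.
    return ("focus", recognized_window_id)
```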

[0060] As a result, in addition to the usual mouse focusing shown in FIG. 14(e), a window can be focused even when the mouse cursor is not inside it, as in FIG. 14(f), and key input can continue without interruption.

[0061] To show on screen that focus has changed to the newly focused window, the window control unit instructs the screen display changing unit and sets the focus to that window.

[0062] Next, the case where a window is moved or deleted is described. The description so far covered giving focus to a window located in the direction the user is looking. When a window is moved or deleted with the mouse, the dictionary pattern previously used for focusing is no longer valid and must be updated.

[0063] First, when a window is moved or deleted by mouse operation, the event management unit detects the move or deletion and notifies the determination control unit which window was affected. The determination control unit instructs the recognition unit to delete the dictionary pattern that was used to identify that window. Then, when an event such as mouse focusing occurs for that window ID, the determination control unit instructs the dictionary generation unit to collect images so that a new dictionary pattern is generated.

[0064] The case where a window is iconified is described next. When a window is iconified, the determination control unit instructs the recognition unit to stop computing the similarity for that window, so the iconified window is no longer given focus. When the icon is restored to a window, its dictionary is put back into the recognition unit and recognition resumes.
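The dictionary life cycle described in the last three paragraphs (delete on move or deletion, suspend on iconify, restore on de-iconify) can be sketched as a small bookkeeping class. The class and method names are hypothetical; the specification describes the behavior, not this structure.

```python
from typing import Dict, List, Sequence


class RecognizerDictionaries:
    """Tracks which window dictionaries the recognition unit may match against."""

    def __init__(self) -> None:
        self.active: Dict[int, Sequence[float]] = {}
        self.suspended: Dict[int, Sequence[float]] = {}

    def install(self, window_id: int, pattern: Sequence[float]) -> None:
        self.active[window_id] = pattern

    def on_move_or_delete(self, window_id: int) -> None:
        # The old pattern no longer matches the window position: discard it.
        self.active.pop(window_id, None)
        self.suspended.pop(window_id, None)

    def on_iconify(self, window_id: int) -> None:
        # Keep the pattern, but stop computing similarity for this window.
        if window_id in self.active:
            self.suspended[window_id] = self.active.pop(window_id)

    def on_deiconify(self, window_id: int) -> None:
        # Put the stored pattern back so recognition resumes.
        if window_id in self.suspended:
            self.active[window_id] = self.suspended.pop(window_id)

    def candidates(self) -> List[int]:
        """Window IDs currently eligible to receive focus by recognition."""
        return sorted(self.active)
```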

[0065] Embodiment 2 (Gaze Detection). As explained earlier, by discriminating grayscale feature quantities such as those in FIG. 12, differences in face orientation make it possible to tell which of roughly nine regions of the display screen the user is looking at. This requires calibration: the face feature quantities must be acquired while the user looks at each screen region, and the dictionaries must be generated in advance.

[0066] With the present invention, however, the following calibration method, in which the user actively uses the mouse, becomes possible, unlike the conventional approach. As shown in FIG. 16, the screen is divided into nine regions and the mouse can be moved among them. When the user moves the mouse and looks at it, the color of the region changes, as from FIG. 16(a) to FIG. 16(b). The color does not change according to the mouse position as such; it darkens according to the number of face images captured while the user is looking at that region. Next, when the mouse is moved as in FIG. 16(c), the upper-right region stops changing, and the upper-center region changes color according to the number of images captured while the user looks at it. After some time the display reaches FIG. 16(d), which indicates that more images have been acquired there than for the upper-right region. This is repeated for every region in turn, as in FIG. 16(e), and when the state of FIG. 16(f) is reached, the face feature quantities for all regions have been collected.

[0067] To implement this, a window is assigned to each of the nine regions, and a dictionary is generated for the window containing the mouse, as described in Embodiment 1. During this process, the window control unit is instructed to change the window's color according to the number of images acquired for dictionary generation. When every window has changed color, dictionary generation is complete for all windows, which has the further advantage of being easy for the user to perceive.
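The count-driven coloring can be sketched as follows. The 3x3 cell indexing, the `required` image count, and the mapping of counts to a shade between 0.0 and 1.0 are assumptions; the specification only states that the color darkens with the number of acquired images.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (row, column) in the assumed 3x3 grid


class CalibrationGrid:
    """Tracks captured images per screen region and derives a display shade."""

    def __init__(self, required: int) -> None:
        self.required = required  # images needed per region (assumed constant)
        self.counts: Dict[Cell, int] = {
            (row, col): 0 for row in range(3) for col in range(3)
        }

    def record_image(self, mouse_cell: Cell) -> None:
        # An image is credited to the region that currently holds the mouse.
        self.counts[mouse_cell] += 1

    def shade(self, cell: Cell) -> float:
        """0.0 = no images yet, 1.0 = enough images; darkens with the count."""
        return min(self.counts[cell], self.required) / self.required

    def done(self) -> bool:
        # Calibration is complete once every region has enough images.
        return all(n >= self.required for n in self.counts.values())
```

Capping the shade at 1.0 means a region that keeps collecting images past the threshold simply stays fully colored.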

[0068] Recognition with the dictionaries created in this way can be applied to simple menu selection and to conventional gaze-detection applications. Calibration in the conventional form, in which the system presents the targets, may of course also be used.

[0069] Embodiment 3 (Remote Control). Taking home appliances as an example, consider applying the invention to channel selection with a television remote control. When changing channels, the user operates the remote control while looking at a location other than the screen, such as A, B, C, or D in FIG. 17. This operation corresponds to the events of the preceding embodiments.

[0070] An image input unit is installed on the television to acquire face images of the person watching. The face image processing unit performs the same processing as before. The determination control unit associates the direction the user was looking with the channel selected at that time; once the dictionary has been generated, the channel can be changed simply by looking in that direction. The object control unit requires a means of changing the channel.

[0071] In this case the position of the object (the television) does not change, but the position of the person may. This can be handled either by updating the dictionary, or by treating the person's position as a kind of event and generating a dictionary for each position.

[0072] Embodiment 4 (Support for Other Media). As an example that adds a speech recognition device, consider controlling home appliances or social systems by speech recognition. Systems that rely on speech recognition alone often misrecognize input. The recognition rate varies with whether the user is actually in a state of issuing a command, and factors such as the directivity of the microphone installed in the system also lower it.

[0073] There are examples that add lip shape to recognition, but face orientation matters when acquiring lip shape as well. Here, an example is shown in which recognition is supplemented by state information of another kind (face information) captured when the speech is registered.

[0074] At the first access (registration), the face feature quantities of the user being registered are acquired and registered in a dictionary. When several words are to be registered, treating each word as an event allows face state information to be registered for each of them.

[0075] Then, when recognition is performed by speech recognition, more reliable recognition is possible even when the speech recognition result is poor, by also considering which word's registered face feature quantities the current face is closest to.
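One way to combine the two cues is a weighted sum of per-word scores, sketched below. The specification does not state how the speech result and the face similarity are combined, so the weighted sum, the `face_weight` parameter, and the function name are all assumptions.

```python
from typing import Dict


def pick_word(speech_scores: Dict[str, float],
              face_scores: Dict[str, float],
              face_weight: float = 0.5) -> str:
    """Choose the registered word with the best combined score.

    speech_scores: speech recognition score per word (higher is better).
    face_scores:   similarity between the current face features and the
                   features registered for each word (higher is better).
    """
    combined = {
        word: (1.0 - face_weight) * score
        + face_weight * face_scores.get(word, 0.0)
        for word, score in speech_scores.items()
    }
    return max(combined, key=combined.get)
```

With `speech_scores = {"on": 0.5, "off": 0.55}` and `face_scores = {"on": 0.9, "off": 0.2}`, speech alone slightly favors "off", while the combined score selects "on".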

[0076] Modifications are described next. Embodiments 1 and 2 were described mainly in terms of windows, but the window may be any object, and the event contents may likewise be changed as in Embodiments 3 and 4. That is, the invention may be applied not only to computers but also to home appliances, in-vehicle equipment, social systems, and the like.

[0077] In the determination control unit, the embodiments give priority to events generated by a conventional device such as a mouse over events generated from face orientation as described in the present invention, but the priority may be reversed.

[0078] The recognition unit may perform recognition several times over a fixed period and take the dictionary category identified most often as the recognition result. During key input in a window system, the user may look at the keyboard or glance at another window. To guard against this, a mechanism may be provided that accumulates past recognition information and derives the relationship between gaze point and feature quantities. From this the approximate direction is judged (whether the user is looking at the keyboard, or at something other than the window for which images are being collected), and images that fall outside are excluded from collection, which improves the accuracy of the dictionary.
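The first idea, taking the most frequently identified category over several recognitions, can be sketched with a simple vote. The function name and the convention that `None` marks a frame with no recognition result are assumptions.

```python
from collections import Counter
from typing import Iterable, Optional


def majority_category(results: Iterable[Optional[int]]) -> Optional[int]:
    """Return the category identified most often, ignoring empty results."""
    votes = Counter(r for r in results if r is not None)
    if not votes:
        return None
    # most_common(1) yields the single (category, count) pair with the top count.
    return votes.most_common(1)[0][0]
```

Voting over a short window smooths out single-frame errors, such as a brief glance at the keyboard, at the cost of a slightly delayed response.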

[0079] In a window application with heavy key input, such as an editor, the cursor position may change frequently. In that case the dictionary may be updated continually regardless of whether the window moves. This control may also be varied per application.

[0080] In the embodiments above, the face image processing unit used the grayscale values of a rectangular region containing the eyes and nose as the feature quantity. Because the eye positions are also obtained by analysis, the region around the eyes alone may likewise be extracted as a set of grayscale values, so that the dictionary is generated with pupil position variation taken into account.

[0081] When still higher accuracy is required, a conventional gaze detection device may be substituted or used in combination. The object control unit can be replaced to suit various systems, so the face-orientation interface function can easily be extended or modified.

[0082]

[Effects of the Invention] According to the present invention, control of an object that was previously performed with a conventional device such as a mouse is learned incrementally, so that the object's state can be changed and the object operated using information such as face orientation and gaze direction.

[0083] Furthermore, unlike conventional gaze detection, no calibration needs to be performed in advance, and control is possible without using the mouse. The user therefore does not have to learn a special calibration procedure, wasted effort is eliminated, and work can be done efficiently.

[Brief description of the drawings]

FIG. 1: System configuration.

FIG. 2: An embodiment of the face image processing unit.

FIG. 3: An embodiment of the recognition determination unit.

FIG. 4: An embodiment of the object control unit.

FIG. 5: An embodiment of the window control unit.

FIG. 6: Flowchart of dictionary generation.

FIG. 7: A configuration example of the determination control unit.

FIG. 8: Flowchart of the event verification unit.

FIG. 9: Flowchart of the event generation unit.

FIG. 10: Explanatory diagram of face detection.

FIG. 11: Method of feature point detection and extraction.

FIG. 12: Explanatory diagram of face orientation and face feature quantities.

FIG. 13: Explanatory diagram of the state when the mouse is in use.

FIG. 14: Explanatory diagram of window focus.

FIG. 15: State transition diagram of window states.

FIG. 16: Calibration method.

FIG. 17: Explanatory diagram of the television embodiment.

[Explanation of symbols]

1: object operation device; 2: image input unit; 3: face image processing unit; 4: recognition determination unit; 5: object control unit

Claims

1. An object operation device comprising: an image input unit for inputting an image; a face image processing unit for analyzing a photographed human face; a recognition determination unit that stores feature quantities obtained by analyzing the face of a person who is operating an object, determines the similarity to the stored feature quantities, and generates operation information for the object; and an object control unit for controlling the object.

2. The object operation device according to claim 1, wherein the recognition determination unit comprises a dictionary generation unit that, upon receiving operation information for a specified object, accumulates feature quantities obtained by analyzing the human face and generates dictionary patterns for identifying those feature quantities.

3. The object operation device according to claim 1, wherein the recognition determination unit comprises: a recognition unit that uses dictionaries generated from feature quantities obtained by analyzing the human face to identify the category of high similarity; and a determination control unit that generates the operation information associated with that category.

4. The object operation device according to claim 1, wherein the recognition determination unit comprises: a dictionary generation unit for feature quantities obtained by analyzing the human face; a recognition unit that performs identification using the dictionaries; and a determination control unit that controls switching between these two units according to the operation state of the target object.

5. The object operation device according to claim 1, wherein the recognition determination unit uses a subspace method in the dictionary generation unit and the recognition unit.

6. An object operation method comprising: an image input means for inputting an image; a face image processing means for analyzing a photographed human face; a recognition determination means for storing feature quantities obtained by analyzing the face of a person who is operating an object, determining the similarity to the stored feature quantities, and generating operation information for the object; and an object control means for controlling the object.