JP2003122505A

JP2003122505A - Information processor, information processing system, and program

Info

Publication number: JP2003122505A
Application number: JP2001315301A
Authority: JP
Inventors: Yukio Takahashi; 行雄高橋; Takuya Arai; 琢哉新井; Atsuhiko Imai; 敦彦今井; Hirotomo Fukuda; 宏友福田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2001-10-12
Filing date: 2001-10-12
Publication date: 2003-04-25
Anticipated expiration: 2021-10-12
Also published as: JP4055388B2

Abstract

PROBLEM TO BE SOLVED: To enable touch-panel-like operation even on a display screen that does not have a touch panel structure. SOLUTION: The point position (actual coordinates), i.e. the position on the display screen indicated by a pointer, is recognized from an image obtained by imaging the display screen with a camera unit, and required information processing corresponding to a GUI operation is executed according to the recognized point position. In addition, the contact sound produced when a physical pointer touches the display screen is picked up by a microphone and analyzed, and required information processing corresponding to a GUI operation is executed according to the result of the analysis. Thus, by bringing the pointer into contact with the display screen, the user can perform both the operation corresponding to moving a mouse and the operation corresponding to the left mouse button.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an information processing device, an information processing system, and a program. In particular, it relates to an information processing device and an information processing system that execute required information processing in accordance with operations performed on a graphical user interface, and to a program executed by these devices and systems.

[0002]

2. Description of the Related Art
It is widely known to provide a display screen with a touch panel so that a GUI image displayed on the screen can be operated by touching the screen directly with a finger, a pointer pen, or the like. FIG. 15 schematically shows an example structure of one such touch panel, a so-called pressure-sensitive touch panel. As shown in FIGS. 15(a) and 15(b), the pressure-sensitive touch panel has an upper electrode 101 and a lower electrode 102 arranged to face each other across a spacer 103. The lower electrode 102 is formed, for example, by depositing a silver electrode on a glass substrate, and the upper electrode 101 is formed by depositing a resistive film on a PET film or the like. Lead wires are drawn out from the silver electrode and from the resistive film. When a position on the upper electrode 101 side is pressed with, for example, the user's finger or a pointer pen, the PET film flexes under the pressing force and the resistive film of the upper electrode 101 comes into contact with the silver electrode of the lower electrode 102. The operated position is detected by detecting a resistance value or similar quantity that varies with the contact position.
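
The text states only that the operated position is found from a resistance value that varies with the contact position. As a hedged illustration, the sketch below assumes the common 4-wire resistive scheme, which the text does not specify. In that scheme a voltage is driven across one layer, which then acts as a linear voltage divider, and the other layer reads a voltage proportional to the contact position along that axis. The two axes are measured in turn. The function names, the 10-bit ADC range and the linear-film assumption are all hypothetical.

```python
from typing import Tuple


def adc_to_position(adc_value: int, adc_max: int, size_px: int) -> float:
    """Map one axis reading of an assumed 4-wire resistive panel to a pixel coordinate.

    Assumption: driving a voltage across one resistive layer makes it a linear
    voltage divider, and the opposite layer, pressed against it at the contact
    point, picks up a voltage proportional to the position along that axis.
    """
    if adc_max <= 0 or size_px <= 0:
        raise ValueError("adc_max and size_px must be positive")
    if not 0 <= adc_value <= adc_max:
        raise ValueError("adc_value out of range")
    # Fraction of full scale -> pixel index in [0, size_px - 1]
    return adc_value / adc_max * (size_px - 1)


def touch_point(adc_x: int, adc_y: int, adc_max: int,
                width_px: int, height_px: int) -> Tuple[float, float]:
    """Combine the X and Y readings (taken one after the other) into a screen point."""
    return (adc_to_position(adc_x, adc_max, width_px),
            adc_to_position(adc_y, adc_max, height_px))
```

For example, `touch_point(512, 512, 1023, 800, 600)` returns a point close to the centre of an 800x600 panel. A real controller would also calibrate offsets and reject readings taken at very light pressure, which this sketch ignores.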

[0003]

Problems to Be Solved by the Invention
As is well known, GUI operations on an information processing device such as a personal computer are performed with a pointing device, typically a mouse. With such a pointing device, however, the user operates the device by feel while watching the GUI screen. The user must therefore manipulate a device that is outside the field of view, so the operation is not necessarily intuitive or easy to understand.

[0004] The following inventions have been proposed for entering operation information in place of widely used pointing devices such as the mouse. Japanese Unexamined Patent Publication No. 2000-154794 discloses an invention in which operation input is performed using a person's blinks. Japanese Patent Application Laid-Open No. 10-3150 discloses an invention in which an acoustic sensor detects the sound produced when the teeth are struck together. However, both of these inventions come from the standpoint of so-called wearable computing. A special device must be worn on the body, and the operation involves unfamiliar actions that people do not normally perform. As a result, the system configuration becomes specialized, and the operation still tends not to be intuitive.

[0005] If the goal is natural, intuitive operation, a display screen with a touch panel is therefore preferable, because it allows pointing operations directly on the GUI screen as described above. However, to realize touch-panel GUI operation on an information processing device such as a personal computer, the user would have to buy or otherwise obtain a display with a touch panel, which is troublesome. Such display devices are also expensive and impose an economic burden, which is a further disadvantage.

[0006] For intuitive input, it is also conceivable to have the user give operating instructions by voice, as disclosed for example in Japanese Unexamined Patent Publication No. 10-161801. In that case, the only addition the system needs is a general-purpose microphone to pick up the user's voice. However, because the user's speech is itself used as the input, the user can say only things related to the operation while operating, which is inconvenient.

[0007]

Means for Solving the Problems
The object of the present invention is to make intuitive input operations easy to perform while using devices that are as general-purpose as possible. To this end, the present invention first configures an information processing device as follows. The device comprises: display means for displaying and outputting an image on a display screen; image capture means for capturing an image taken by an imaging device; image part recognition means for recognizing the image partial area of the display screen within the captured image obtained by the image capture means; point position recognition means for recognizing the position indicated by a pointer present within the image partial area recognized by the image part recognition means as a point position on the actual display screen; and information processing means capable of executing required information processing according to the point position recognized by the point position recognition means.
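
The text does not say how the point position recognition means finds the pointer inside the display area of the captured image. The outline defers the related display-frame and coordinate-conversion processing to section 3-1. The function below is purely a hypothetical sketch. It compares the live camera frame with a reference frame taken without the pointer and treats the topmost changed pixel inside the display region as the pointer tip, assuming the finger or pen reaches in from below. The function name, the grayscale list-of-lists image format, the rectangular region and the threshold value are all assumptions.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]  # row-major grayscale image, values 0..255 (assumed format)


def find_pointer_tip(reference: Gray, frame: Gray,
                     region: Tuple[int, int, int, int],
                     threshold: int = 40) -> Optional[Tuple[int, int]]:
    """Return the (x, y) camera pixel taken as the pointer tip, or None.

    Hypothetical approach: a pixel counts as "pointer" if it differs from the
    pointer-free reference frame by more than `threshold`; the first such pixel
    in a top-down raster scan of the region is taken as the tip.
    region = (left, top, right, bottom) in camera pixels, right/bottom exclusive,
    an axis-aligned box enclosing the recognized display area.
    """
    left, top, right, bottom = region
    for y in range(top, bottom):          # scan rows from the top of the region
        for x in range(left, right):
            if abs(frame[y][x] - reference[y][x]) > threshold:
                return (x, y)
    return None                           # no pointer visible in the region
```

Because the GUI image itself changes while the user works, a practical system would need a reference that tracks the displayed content. This sketch only illustrates locating a tip within the recognized region.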

[0008] The information processing system is configured as follows. The information processing system of the present invention comprises at least an imaging device and an information processing device. The imaging device is placed at a position from which it can image the entire display screen of the information processing device. The information processing device comprises: display means for displaying and outputting an image on the display screen; image capture means for capturing an image taken by the imaging device; image part recognition means for recognizing the image partial area of the display screen within the captured image obtained by the image capture means; point position recognition means for recognizing the position indicated by a pointer present within the image partial area recognized by the image part recognition means as a point position on the actual display screen; and information processing means capable of executing required information processing according to the point position recognized by the point position recognition means.

[0009] The program is configured as follows. It causes the information processing device to execute: an image capture procedure for capturing an image taken by an imaging device arranged to image the display screen of the information processing device; an image part recognition procedure for recognizing the image partial area of the display screen within the captured image obtained by the image capture procedure; a point position recognition procedure for recognizing the position indicated by a pointer present within the image partial area recognized by the image part recognition procedure as a point position on the actual display screen; and an information processing procedure capable of executing required information processing according to the point position recognized by the point position recognition procedure.

[0010] In each of the above configurations, the imaging device photographs the display screen of the information processing device. The position indicated by a pointer (a finger, a pointer pen, or the like) present in the captured image is converted into a point position on the graphical user interface image (GUI image) actually displayed on the display screen. Required information processing is then executed according to this point position. Consequently, when the user operates the GUI screen by touching the display screen directly, information processing corresponding to that operation is executed. In short, adding an imaging device to the information processing device makes touch-panel-like operation possible.
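
One way to convert a point found in the camera image into a point position on the actual display screen is a planar projective transform (homography) fixed by the four corners of the display area in the captured image. This assumes the screen is flat and lens distortion is negligible. The outline lists display-frame designation, keystone correction and coordinate conversion, but this section does not give their method, so the pure-Python sketch below shows one possible technique rather than the patent's own. All function names are hypothetical, and the four camera-image corners are assumed to have been located already.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("corners are degenerate (e.g. three are collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_homography(camera_corners: Sequence[Point],
                   screen_corners: Sequence[Point]) -> List[float]:
    """Return [h0..h7] (h8 fixed at 1) of the projective map camera -> screen."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(camera_corners, screen_corners):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        # v = (h3 x + h4 y + h5) / (h6 x + h7 y + 1), rearranged the same way
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    return _solve(a, b)


def camera_to_screen(h: Sequence[float], p: Point) -> Point:
    """Apply the homography to one camera-image point, giving screen coordinates."""
    x, y = p
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)
```

Suppose the display corners are seen at (100, 80), (540, 60), (560, 420) and (90, 400) in the camera image and the screen is 1024x768. In that case `camera_to_screen(fit_homography(cam, scr), p)` maps each camera corner onto the matching screen corner, and maps any point inside the quadrilateral into the screen rectangle. Solving the 8x8 system directly keeps the example free of dependencies. OpenCV's `cv2.getPerspectiveTransform` computes the same 3x3 matrix from four point pairs.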

[0011] The information processing device of the present invention may also be configured as follows. It comprises: display means for displaying and outputting an image on a display screen; audio capture means for capturing an audio signal picked up by a microphone device; extraction means for extracting, from the audio signal captured by the audio capture means, the audio signal component of the contact sound produced when a physical pointer touches the display screen; analysis means for executing required analysis processing on the contact-sound component extracted by the extraction means; and information processing means capable of executing required information processing based on the analysis result of the analysis means.

[0012] The information processing system of the present invention may also be configured as follows. This information processing system comprises a microphone device and an information processing device. The information processing device comprises: display means for displaying and outputting an image on a display screen; audio capture means for capturing an audio signal picked up by the microphone device; extraction means for extracting, from the audio signal captured by the audio capture means, the audio signal component of the contact sound produced when a physical pointer touches the display screen; analysis means for executing required analysis processing on the contact-sound component extracted by the extraction means; and information processing means capable of executing required information processing based on the analysis result of the analysis means.

[0013] The program of the present invention may also be configured as follows. It causes the information processing device to execute: an audio capture procedure for capturing an audio signal picked up by a microphone device; an extraction procedure for extracting, from the audio signal captured by the audio capture procedure, the audio signal component of the contact sound produced when a physical pointer touches the display screen of the information processing device; an analysis procedure for executing required analysis processing on the contact-sound component extracted by the extraction procedure; and an information processing procedure capable of executing required information processing based on the analysis result of the analysis procedure.

[0014] In each of the above configurations, the user operates by bringing a pointer (a finger, a pointer pen, or the like) directly into contact with the display screen of the information processing device, for example by tapping or rubbing it. The contact sound produced by such an operation is input through a microphone and analyzed, and information processing is executed according to the analysis result. In other words, a touch-panel-like operation (bringing a pointer into contact with the display screen on which a GUI image is displayed) is realized by using the sound produced when the pointer touches the display screen.
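
The text says only that the contact-sound component is extracted from the microphone signal and analyzed. The concrete method belongs to section 3-2-1, which is outside this excerpt. As a hedged illustration of one simple extraction step, the sketch below high-pass filters the signal with a first difference, which emphasizes the sharp transient of a tap. It then reports frames whose energy jumps well above a running noise floor. The function name, frame length, ratio and smoothing constant are assumptions. The heuristic does not by itself separate taps from loud speech or other impulsive noise.

```python
from typing import List, Optional, Sequence


def detect_taps(samples: Sequence[float], frame_len: int = 256,
                ratio: float = 8.0, floor_alpha: float = 0.05) -> List[int]:
    """Return the start index of each frame in which a tap-like onset begins.

    Heuristic sketch: a first-difference high-pass filter emphasizes the sharp,
    broadband transient of a tap. A frame whose filtered energy exceeds `ratio`
    times a running noise floor counts as a hit, and only the first frame of a
    run of consecutive hits is reported (so a sustained rub yields one onset).
    """
    taps: List[int] = []
    noise: Optional[float] = None   # running background energy per sample
    prev_sample = 0.0
    prev_hit = False
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        energy = 0.0
        for s in samples[start:start + frame_len]:
            d = s - prev_sample      # high-pass: y[n] = x[n] - x[n-1]
            prev_sample = s
            energy += d * d
        energy /= frame_len
        if noise is None:            # the first frame seeds the noise floor
            noise = max(energy, 1e-12)
        hit = energy > ratio * noise
        if hit and not prev_hit:
            taps.append(start)
        if not hit:                  # adapt the floor only on non-hit frames
            noise = max(1e-12, (1 - floor_alpha) * noise + floor_alpha * energy)
        prev_hit = hit
    return taps
```

Consider a silent signal containing single-sample clicks at indices 1000 and 3000. For that signal, `detect_taps` returns 768 and 2816, the start indices of the default 256-sample frames that contain the clicks.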

[0015]

Embodiments of the Invention
Embodiments of the present invention are described below in the following order.
1. Example of system appearance
2. Configuration example of personal computer
3. Pseudo touch panel operation
3-1. Processing required for pointing operation based on captured image
3-1-1. Designation of display screen frame
3-1-2. Keystone correction of display screen frame
3-1-3. Coordinate conversion processing
3-2. Processing required for pseudo button operation based on audio signal
3-2-1. Example of audio signal recognition/analysis processing
3-2-2. State transitions according to analysis result
4. Processing operation during pseudo touch panel operation
4-1. Software configuration example of image/audio recognition driver
4-2. Example processing operation shown by flowchart
5. Modification (pseudo touch panel operation by pointing operation only)

[0016] 1. Example of System Appearance
FIG. 1 is a perspective view showing an example of the appearance of the system according to the present embodiment. The personal computer 10 shown in FIG. 1 is the information processing device according to the embodiment of the present invention. In this figure, the personal computer is a so-called notebook type, and a liquid crystal display is used as its built-in display monitor. The display screen 17a of this liquid crystal display has an ordinary structure without a touch panel. In the present embodiment, however, the image captured by the camera unit 22 and the sound picked up by the microphone 23 are used, as described below. This allows the user to perform input operations on the GUI image displayed on the display screen 17a by bringing a pointer such as a finger or a pointer pen into direct contact with the screen.

[0017] For this purpose, an external camera unit 22 is attached to the personal computer 10 with a mounting fixture 22-1, for example as shown in the figure. The camera unit 22 is fixed in a position such that the entire display screen 17a fits within the captured image. The camera unit 22 can capture color images. For example, a small unit using an image sensor such as a CCD (Charge Coupled Device) may be adopted. The camera unit 22 is connected, for example via a cable 22a, to a predetermined input terminal provided on the housing of the personal computer 10. The captured image signal obtained by the camera unit 22 is thereby input to the personal computer 10. This input terminal is not particularly limited. Besides an analog video input terminal, various data interface terminals are conceivable, such as a USB (Universal Serial Bus) terminal or an IEEE 1394 terminal.

[0018] Similarly, the microphone 23 is connected via a cable 23a to a terminal that accepts microphone input, such as an audio signal input terminal on the personal computer 10. This allows the personal computer 10 to receive the audio signal obtained when the microphone 23 picks up sound. In the present embodiment, the microphone 23 must pick up the sounds produced when the user operates by bringing a pointer directly into contact with the display screen 17a, for example the sound of tapping or rubbing the display screen 17a. The microphone 23 is therefore installed at a position where such contact sounds produced on the display screen 17a can be picked up reliably. To pick up these contact sounds even more reliably, a microphone 23 with suitable directivity may also be used.

[0019] If the personal computer 10 is equipped with a built-in microphone 23A from the outset, the built-in microphone 23A may be used instead of the external microphone 23. The same applies to the camera unit 22. If a camera unit is already attached to the personal computer 10 and its mounting structure allows it to be positioned to image the entire display screen 17a, that built-in camera unit may be used.

[0020] In FIG. 1, the personal computer 10, which is the information processing device of the present embodiment, is a notebook personal computer, but it may also be a so-called desktop personal computer. In the desktop case, the display monitor is not limited to a liquid crystal display and may be a CRT.

[0021] The system configuration of the present invention is also not limited to the form shown in FIG. 1. For example, systems are known in which a projector device is connected to a personal computer and displays the personal computer's image on a screen. The present invention can also be applied to such a personal computer system with a projector device. FIG. 15 shows a configuration example in which the present invention is applied to a system that displays a computer's image with such a projector device. In this case, the video output terminal of the personal computer 10 is connected, for example, to the video input terminal of the projector device 70, so that the image signal from the personal computer 10 is input to the projector device 70. The projector device 70 thus enlarges and projects onto the screen 71 the same image that is displayed on the display screen 17a of the personal computer 10.

[0022] In this case as well, the camera unit 22 and the microphone 23 are connected to the personal computer 10. The camera unit 22 is placed so that the entire screen 71 fits within the captured image, and it inputs the captured image signal to the personal computer 10. The microphone 23 is placed where it can pick up the "contact sound" produced when, for example, the tip of a pointing stick 72 serving as the pointer (described later) touches the screen 71 during an operation. The microphone 23 inputs the picked-up audio signal to the personal computer 10. The pointing stick 72 is a pointer for operating on the screen 71, and the user performs GUI operations by bringing its tip into direct contact with the screen 71. This configuration works in the same way as the system of FIG. 1 described below. It uses the captured image of the screen 71 from the camera unit 22 and the audio signal of the contact sound picked up by the microphone 23, and thereby realizes a GUI that responds to pointer contact operations on the image displayed on the screen 71.

[0023] In the systems shown in FIGS. 1 and 15, the camera unit 22 and the microphone 23 are connected to the personal computer 10 by cables, but they could instead be connected wirelessly, for example using known infrared or Bluetooth communication. In the system of FIG. 15 in particular, the video signal could also be transmitted wirelessly between the personal computer 10 and the projector device 70.

[0024] 2. Configuration Example of Personal Computer
The block diagram of FIG. 2 shows an example of the internal configuration of the personal computer 10 of the present embodiment. In this figure, the CPU 11 executes various processes in accordance with programs, namely firmware held in the ROM 12 and the OS (Operating System) and application programs stored in the HDD 18 and loaded into the RAM 13. The RAM 13 also holds, as appropriate, the data the CPU 11 needs to execute these processes. The CPU 11, the ROM 12, and the RAM 13 are interconnected by an internal bus 25. The network interface 20, the data interface 21, the input/output interface 14 and other components, described later, are also connected to this bus. The internal bus 25 is, for example, a PCI (Peripheral Component Interconnect) bus or a local bus.

[0025] The input/output interface 14 is provided to exchange information between the internal bus 25 and the devices connected to the interface. In this case, a keyboard 15, a mouse 16, a display monitor 17, an HDD 18, a media drive 19, the camera unit 22, the microphone 23 (23A), and so on can be connected.

[0026] Operation signals supplied from the keyboard 15 and the mouse 16 are output to the CPU 11 via the input/output interface 14. The CPU 11 executes the required processing corresponding to these operation signals, for example under the OS program. In the present embodiment, GUI operations that replace operation of the mouse 16 are made possible based on the captured image signal output from the camera unit 22 and the audio signal picked up and output by the microphone 23 (23A). Notebook computers commonly provide a touch pad and click buttons in place of a mouse. If the personal computer 10 of the present embodiment is a notebook type, the touch pad and click buttons may therefore be connected to the input/output interface 14 instead of the mouse 16, although this is not shown in FIG. 2.

[0027] A display video signal is output to the display monitor 17 via the input/output interface 14, and an image is thereby displayed on the display screen 17a of the display monitor 17.

[0028] A hard disk drive (HDD) 18 with a hard disk as its storage medium is also connected to the input/output interface 14. The CPU 11 can record data, programs, and the like on the hard disk of the HDD 18 and read them from it. The write and read data are transferred between the HDD 18 and the internal bus 25 via the input/output interface 14. In the present embodiment in particular, an image/audio recognition driver 18a is installed and stored on the HDD 18. As described later, the image/audio recognition driver 18a is an application program that lets the computer treat the user's contact operations on the display screen 17a of the display monitor 17 as input operation information. It does so based on the image signal from the camera unit 22 and the audio signal from the microphone 23.

[0029] The media driver 19 handles a specific type of medium, for example, at present, a CD-ROM or DVD, and reads data from and writes data to that medium. The CPU 11 also performs the control for this. The write data and read data are transferred between the media driver 19 and the internal bus 25 via the input/output interface 14.

[0030] The camera unit 22 in this case is provided as described with reference to FIG. 1. It is, for example, an imaging device that uses a CCD as its image sensor. The captured image signal, that is, the signal of the image captured by the camera unit 22, is input via the input/output interface 14. The microphone 23 (or the built-in microphone 23A) is likewise provided for the personal computer 10 as shown in FIG. 1. The audio signal of the sound picked up by the microphone 23 is input as a digital audio signal via the input/output interface 14.

[0031] The net interface 20 is an interface for communicating over a predetermined network. If it supports connection to the Internet over a telephone line, for example, its hardware includes a modem or the like. If it communicates over a network such as a LAN (Local Area Network), it is an interface such as Ethernet.

[0032] The data interface 21 is an interface for communicating with external peripheral devices over a cable connection, typified by SCSI, USB and IEEE 1394. The camera unit 22 and similar devices described above may be connected to this data interface 21.

[0033] The image/voice recognition driver 18a to be installed on the HDD 18 as described above is an application program that stores, as a program, the method procedures for realizing the functions described below. This application program can be stored (recorded), temporarily or permanently, on a removable recording medium. Examples are a floppy (registered trademark) disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto-Optical) disc, a DVD (Digital Versatile Disc), a magnetic disk and a semiconductor memory. Such a removable recording medium can be provided as so-called packaged software. The program is installed on the HDD 18 of the personal computer 10 by, for example, loading the removable recording medium into the media driver 19 of the personal computer 10 and playing it back. Installation from a removable recording medium is not the only option. The application program can also be transferred to the personal computer 10 wirelessly from another information processing device, or online from a server over a network such as a LAN or the Internet. The personal computer 10 then receives the transferred application program and installs it on the HDD 18.

[0034] 3. Pseudo Touch Panel Operation
3-1. Processing Required for the Pointing Operation Based on the Captured Image
3-1-1. Designation of the Display Screen Frame
As is well known, GUI operations are generally performed with a pointing device (operating device) such as a mouse, or a touch pad and click button. In the present embodiment, by contrast, GUI operations are realized by pointing operations made by touching the display screen directly, instead of by operating such a pointing device. That is, touch-panel-like operation is realized.

Moreover, this touch-panel-like operation works even though the display screen does not have a touch panel structure. It is achieved by executing the recognition processing described later on two inputs:
- the image captured by the camera unit 22, which images the display screen 17a; and
- the audio obtained when the microphone 23 (23A) picks up the sounds produced by various touch operations on the display panel.

Hereinafter, touch-panel-like operation as in the present embodiment is also referred to as "pseudo touch panel operation".

[0035] In the "pseudo touch panel operation" of the present embodiment, operation input based on recognition of the captured image of the display screen 17a works as mouse cursor movement, that is, as a "pointing operation". Operation input based on recognition of the contact sound produced at the display screen 17a works as a mouse button operation, that is, as a "pseudo button operation". In the description of the present embodiment, this "pseudo button operation" is given the function of the left mouse button (on DOS/V systems).

[0036] The "pointing operation" of the present embodiment's "pseudo touch panel operation" is described first. Realizing this pointing operation (cursor movement operation) requires the following processing:
- recognizing the display area of the display screen 17a within the captured image obtained by imaging the display screen 17a;
- recognizing which image area in the captured image is the pointer;
- recognizing the point coordinates of the position indicated by the recognized pointer; and
- converting those coordinates into coordinate information on the GUI image actually displayed on the display screen.

The resulting coordinate information is passed to the OS, which then controls cursor movement accordingly. The "pointer" in this processing means the physical object that actually touches the display screen 17a. Specifically, it is the user's finger or, for example, a rod-shaped pointer pen held by the user.

[0037] To "recognize the display area of the display screen 17a within the captured image obtained by imaging the display screen 17a", the initial setting described next with reference to FIG. 3 is performed in advance.

[0038] FIG. 3 shows the initial setting procedure that enables the display area of the display screen 17a to be recognized in the captured image. This initial setting runs under the program of the image/voice recognition driver 18a.

The user enters the initial setting mode, for example by a predetermined operation. A pointing device such as a mouse may be used for the operations during this initial setting. In the initial setting mode, the image captured by the camera unit 22, positioned as shown in FIG. 1, is displayed on the display screen 17a of the personal computer 10. While viewing this captured image, the user adjusts the position and orientation of the camera unit 22 so that the entire display screen 17a of the personal computer 10 appears in it, as shown in FIG. 3(a). That is, the display screen 17a shows a PC image portion 10-1, which is the image of the personal computer 10, and within it a display screen area 17a-1, which is the image of the display screen 17a.

[0039] Once the image state shown in FIG. 3(a) has been obtained, the user performs a predetermined operation for designating the screen corners. An image corner designating cursor CR1 then appears on the display screen 17a, as shown in FIG. 3(b). The user drags the cursor CR1, for example, until its apex lies at the upper left corner of the display screen area 17a-1. The user then performs a confirming operation, for example.

[0040] The remaining corners are designated in the same way:
- The next image corner designating cursor CR2 appears, as shown in FIG. 3(c). The user drags its apex to the upper right corner of the display screen area 17a-1 and performs the confirming operation.
- The next cursor CR3 appears, as shown in FIG. 3(d). The user drags its apex to the lower right corner of the display screen area 17a-1 and confirms.
- Finally, the last cursor CR4 appears, as shown in FIG. 3(e). The user drags its apex to the lower left corner of the display screen area 17a-1 and confirms.

[0041] Designating the four corners of the display screen area 17a-1 with the image corner designating cursors CR1 to CR4 gives the coordinate positions of the vertices a, b, c and d of the display screen area 17a-1 on the captured image, as shown in FIG. 3(f). This means that the shape of the display screen area 17a-1 has been recognized as quadrilateral "frame information".

[0042] 3-1-2. Keystone Correction of the Display Screen Frame
Because of where the camera unit 22 is placed, the display screen 17a is not necessarily photographed from the front. The quadrilateral (frame information) of the display screen area 17a-1 recognized as in FIG. 3(f) is therefore usually not a regular shape such as a rectangle or trapezoid, but a more distorted one.

In the present embodiment, the actual point position of the pointer on the display screen 17a is determined by a coordinate conversion process. This process converts coordinates within the display screen area 17a-1 (frame information) into coordinates on the actual display screen 17a. Running the conversion directly on such a distorted quadrilateral would complicate the calculation and actually reduce accuracy. The present embodiment therefore first applies the correction process described below to the quadrilateral (frame information) of the display screen area 17a-1.

[0043] FIG. 4 schematically shows this correction process. FIG. 4(a) shows the frame information recognized by the corner designating operation of FIG. 3, that is, the same quadrilateral display screen area 17a-1 as in FIG. 3(f). For the reason given above, this frame information is assumed to be a distorted quadrilateral. The horizontal side ab and the opposite side cd are not parallel. Likewise, the vertical side ad and the opposite side bc are not parallel.

[0044] In the present embodiment, side ab of the frame information is taken as the upper base and side cd as the lower base. The frame is then corrected so that these two sides become parallel, that is, it is corrected into a trapezoid. As a result, the frame information (quadrilateral abcd) of FIG. 4(a) is reshaped into a trapezoid, for example the quadrilateral ABCD indicated by the broken line in FIG. 4(b).

As a specific example, let the vertices of the frame information before correction be a(xa, ya), b(xb, yb), c(xc, yc) and d(xd, yd). Let the vertices of the corrected frame information be A(XA, YA), B(XB, YB), C(XC, YC) and D(XD, YD). The shape is corrected so that

YA = YB = (ya + yb) / 2
YC = YD = (yc + yd) / 2

subject to the condition

XA = xa, XB = xb, XC = xc, XD = xd

which keeps errors out of the coordinate conversion result. Consequently, the corrected frame information is not necessarily a so-called isosceles trapezoid with side AD = side BC.

Recognizing the frame information and applying this trapezoidal correction, as described with reference to FIGS. 3 and 4, completes the recognition of the display area (frame information) of the display screen 17a within the captured image.
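The correction in [0044] amounts to averaging two pairs of y coordinates while keeping every x coordinate unchanged. Below is a minimal Python sketch. It assumes image coordinates with y increasing downward and corners given in the designation order a (upper left), b (upper right), c (lower right), d (lower left). The function name `trapezoid_correct` is hypothetical and is not taken from the driver 18a.

```python
from typing import NamedTuple, Tuple


class Point(NamedTuple):
    x: float
    y: float


def trapezoid_correct(a: Point, b: Point, c: Point, d: Point) -> Tuple[Point, Point, Point, Point]:
    """Hypothetical helper: turn the designated quadrilateral abcd into trapezoid ABCD.

    The x coordinates are kept (XA = xa, ..., XD = xd). The upper pair and the
    lower pair each receive the mean of their y coordinates, so AB and DC
    become horizontal and therefore parallel.
    """
    top = (a.y + b.y) / 2      # YA = YB = (ya + yb) / 2
    bottom = (c.y + d.y) / 2   # YC = YD = (yc + yd) / 2
    return Point(a.x, top), Point(b.x, top), Point(c.x, bottom), Point(d.x, bottom)


# Example: a slightly skewed frame as it might appear in the captured image
A, B, C, D = trapezoid_correct(Point(10, 12), Point(310, 8), Point(300, 230), Point(20, 238))
print(A, B, C, D)
```

The legs AD and BC keep their original x offsets, so the result is generally not an isosceles trapezoid. This is why the later x-coordinate conversion needs a separate reference line.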

[0045] 3-1-3. Coordinate Conversion Process
After the frame information has been recognized, the device enters the normal mode. When the user operates with a pointer (finger, pointer pen or the like), the coordinates of the point position it indicates are recognized. A coordinate conversion process then converts those coordinates into coordinates on the actual display screen 17a. This is the conversion step listed earlier, and it is described here with reference to FIGS. 5 to 7.

The conversion relies on two of the earlier steps: recognizing which image area in the captured image is the pointer, and recognizing the point position of the pointer relative to the frame information on the captured image. Those steps are described later. Here it is assumed that the point position of the pointer has already been recognized correctly.

[0046] FIG. 5(a) shows the corrected frame information obtained as described with reference to FIG. 4. Suppose the user's pointer operation on the display screen 17a yields the coordinates input(xin, yin), marked with × in the figure. These are in-image coordinates, for example the image position detected in the effective display area of the captured image. The processing of the present embodiment converts input(xin, yin) into the real coordinates output(xout, yout) on the display screen 17a, where the GUI image is actually displayed, as shown in FIG. 5(b).

FIG. 5 first illustrates the conversion of the y coordinate. The real y coordinate (yout) follows from the ratio between two heights: the height height_img of the trapezoid forming the corrected frame information (FIG. 5(a)), and the vertical size height_dsp of the display screen 17a (FIG. 5(b)). That is, yout is calculated by

yout = (yin - YA) × height_dsp / height_img   (Equation 1)
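Equation 1 can be written as a small function. This sketch assumes that y increases downward in both the image and the display, and that height_img is the trapezoid height YD - YA (equal to YC - YB after the correction). The function name `to_display_y` is hypothetical.

```python
def to_display_y(yin: float, YA: float, YD: float, height_dsp: float) -> float:
    """Equation 1: scale the offset below the upper base AB by height_dsp / height_img."""
    height_img = YD - YA  # height of the corrected trapezoid (YA == YB, YC == YD)
    if height_img <= 0:
        raise ValueError("expected the lower base below the upper base (YD > YA)")
    return (yin - YA) * height_dsp / height_img


# The upper base maps to row 0, the lower base to height_dsp, the midpoint to the middle row
print(to_display_y(10.0, 10.0, 234.0, 768),
      to_display_y(122.0, 10.0, 234.0, 768),
      to_display_y(234.0, 10.0, 234.0, 768))
```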

[0047] The x coordinate is obtained next. For this, the corrected frame information and the display screen 17a are first aligned in the x-axis (horizontal) direction against a predetermined reference line. FIG. 6 schematically shows this alignment. The display screen 17a in FIG. 6(b) is a rectangle with vertices W, X, Y and Z. Its reference line can be taken as the median line M0-M1, which connects the midpoint M0 of side WX and the midpoint M1 of side ZY.

[0048] The reference line XN of the corrected frame information, shown in FIG. 6(a), has to be found differently, because that trapezoid is not necessarily isosceles, as noted above. The reference line XN connects a point N0 on side AB and a point N1 on side CD of the corrected frame information. It is obtained from the x coordinates (XA, XB, XC, XD) of the vertices A, B, C and D by

XN = (XA + XB + XC + XD) / 4   (Equation 2)
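The two reference lines of FIG. 6 can be computed as follows. This is a minimal sketch that assumes display x coordinates run from 0 to width_dsp, so the median line M0-M1 lies at width_dsp / 2. Both function names are hypothetical.

```python
def reference_line_x(XA: float, XB: float, XC: float, XD: float) -> float:
    """Equation 2: x position of the reference line XN of the corrected frame."""
    return (XA + XB + XC + XD) / 4


def display_reference_x(width_dsp: float) -> float:
    """The display's median line M0-M1 lies halfway across the screen width."""
    return width_dsp / 2


print(reference_line_x(10, 310, 300, 20), display_reference_x(1024))
```

Because the legs of the trapezoid need not be symmetric, XN is the average of all four vertex x coordinates rather than the midpoint of a single side.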

[0049] With the reference line XN of the corrected frame information obtained, the x coordinate of the in-image coordinates (xin) is converted into the x coordinate of the real coordinates (xout). FIG. 7 schematically shows this conversion.

[0050] Let width_img be the width of the trapezoid ABCD (corrected frame information) at the height of the y coordinate yin. The real x coordinate (xout) is then determined by the ratio between width_img and the horizontal width width_dsp of the display screen 17a. Since width_img varies with yin, it is calculated as follows. Let x0 be the x coordinate of the point on side DA at height yin, and x1 the x coordinate of the point on side BC at height yin. Then

x0 = (XD - XA) × (yin - YA) / height_img + XA
x1 = (XC - XB) × (yin - YA) / height_img + XB
width_img = x1 - x0   (Equation 3)

Using this value of width_img, the real x coordinate (xout) is calculated as

xout = (xin - XN) × width_dsp / width_img + width_dsp / 2   (Equation 4)

Performing the calculations described with reference to FIGS. 5 to 7 thus converts the in-image coordinates input(xin, yin) within the display screen area 17a-1 of the captured image into the real coordinates output(xout, yout) on the actual display screen 17a. This coordinate conversion is merely one example; other calculation formulas may be used instead.
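Equations 1 to 4 can be combined into a single mapping function. The sketch below assumes y increases downward and that the vertices are ordered A (upper left), B (upper right), C (lower right), D (lower left). It interpolates x1 along side BC from B at the top to C at the bottom, that is, x1 = XB + (XC - XB)(yin - YA) / height_img, so that x1 equals XB on the upper base and XC on the lower base. The name `image_to_display` is hypothetical.

```python
from typing import Tuple

Vertex = Tuple[float, float]


def image_to_display(xin: float, yin: float,
                     A: Vertex, B: Vertex, C: Vertex, D: Vertex,
                     width_dsp: float, height_dsp: float) -> Tuple[float, float]:
    """Hypothetical sketch of Equations 1-4: map in-image input(xin, yin) inside the
    corrected trapezoid ABCD (YA == YB, YC == YD) to display coordinates (xout, yout)."""
    (XA, YA), (XB, _), (XC, _), (XD, YD) = A, B, C, D
    height_img = YD - YA

    # Equation 1: vertical scaling
    yout = (yin - YA) * height_dsp / height_img

    # Equation 2: reference line of the corrected frame
    XN = (XA + XB + XC + XD) / 4

    # Equation 3: left edge (on DA) and right edge (on BC) at height yin
    t = (yin - YA) / height_img
    x0 = XA + (XD - XA) * t
    x1 = XB + (XC - XB) * t
    width_img = x1 - x0

    # Equation 4: horizontal scaling about the reference line
    xout = (xin - XN) * width_dsp / width_img + width_dsp / 2
    return xout, yout


frame = ((10, 10), (310, 10), (300, 234), (20, 234))
print(image_to_display(160, 122, *frame, 1024, 768))
print(image_to_display(10, 10, *frame, 1024, 768))
```

Equation 4 centres every row on XN rather than on that row's own midpoint (x0 + x1) / 2. The row edges therefore map exactly to 0 and width_dsp only where the two coincide, as they do in this example. For strongly asymmetric frames, the row midpoint is one of the "other calculation formulas" the text allows.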

[0051] 3-2. Processing Required for the Pseudo Button Operation Based on the Audio Signal
3-2-1. Example of Audio Signal Recognition/Analysis Processing
Next, the "pseudo button operation" of the present embodiment's "pseudo touch panel operation" is described. As mentioned above, the "pseudo button operation" generates operation information corresponding to the left mouse button. It does so based on the contact sound produced when the user operates the pointer while touching it directly to the display screen 17a.

This requires, first, audio signal component recognition processing. This processing recognizes, among the sounds picked up by the microphone 23, the audio signal component that is the "contact sound" of the pointer operation. It allows the "contact sound" to be distinguished from other audio signal components, such as the user's speech.

Once the "contact sound" component has been recognized, analysis processing examines the state in which contact sounds are produced by the user's pointer operations. That is, the pattern of "contact sound" production is recognized, and the result determines which left mouse button operation is regarded as having been performed.

[0052] The audio signal component recognition processing works as follows. In the present embodiment, the frequency band characteristic of the "contact sound" component is registered in advance, during an initial setting, as the audio signal component to be used in the analysis processing.

For example, the user sets a registration mode for this component by a predetermined operation. In this mode, the user produces sounds by at least tapping or rubbing the display screen 17a. The microphone 23 picks up these sounds and inputs them to the personal computer 10. Following the program of the image/voice recognition driver 18a, the personal computer 10 detects their frequency band characteristic and registers it as the audio signal component to be used in the analysis processing.

Later, when the pointer is actually used for contact operations on the display screen 17a, the microphone 23 picks up the resulting contact sounds and they are captured as an audio signal. The personal computer 10 filters this audio signal so that only the registered frequency band passes. Only the contact-sound component is thereby extracted, which completes the audio signal component recognition processing.
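The text does not specify how the frequency band characteristic is detected or how the filter is built. One simple possibility is sketched below in plain Python. It registers the DFT bins that dominate the tapping or rubbing sample, then zeroes every other bin of later input. The helper names and the `rel_threshold` parameter are assumptions. A naive O(n^2) DFT is used only to keep the block dependency-free; a practical driver would more likely use an FFT with windowing on short frames.

```python
import cmath
import math
from typing import List, Sequence, Set


def dft(x: Sequence[float]) -> List[complex]:
    """Naive discrete Fourier transform (O(n^2)), adequate for a short illustration."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n)) for f in range(n)]


def idft(X: Sequence[complex]) -> List[float]:
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(X)
    return [(sum(X[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n).real
            for k in range(n)]


def register_band(sample: Sequence[float], rel_threshold: float = 0.5) -> Set[int]:
    """Registration mode: keep the bins whose magnitude reaches rel_threshold of the peak."""
    mags = [abs(v) for v in dft(sample)]
    peak = max(mags)
    return {f for f, m in enumerate(mags) if m >= rel_threshold * peak}


def band_filter(signal: Sequence[float], bins: Set[int]) -> List[float]:
    """Pass only the registered bins (plus their mirror bins, so the output stays real)."""
    n = len(signal)
    keep = bins | {(n - f) % n for f in bins}
    X = dft(signal)
    return idft([X[f] if f in keep else 0 for f in range(n)])


n = 64
contact = [math.sin(2 * math.pi * 8 * k / n) for k in range(n)]     # stand-in "contact sound"
voice = [0.8 * math.sin(2 * math.pi * 3 * k / n) for k in range(n)]  # stand-in speech
bins = register_band(contact)
extracted = band_filter([c + v for c, v in zip(contact, voice)], bins)
```

In this synthetic case, the "speech" tone falls outside the registered bins and is removed, leaving the "contact sound" tone.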

[0053] The analysis processing, shown schematically in FIG. 8, is executed as follows. The personal computer 10 separates and extracts the "contact sound" component, produced by the user's contact operation on the display screen 17a, from the audio signal input from the microphone 23. Two thresholds are set for the level of this component. As shown in FIG. 8(a), with the 0 level as the reference, a first threshold ±th0 is set at a predetermined absolute level. A second threshold ±th1 is set at a larger predetermined absolute level.

As shown on the right side of FIG. 8(a), the "contact sound" is then classified as follows:
- an absolute level below the first threshold ±th0 is silence;
- an absolute level at or above ±th0 but below the second threshold ±th1 is a weak sound;
- an absolute level at or above ±th1 is a strong sound.

[0054] The waveform shown in FIG. 8A is the audio signal component of a contact sound produced by the user's contact operation on the display screen 17a. The personal computer 10 compares the level of this audio signal component with the first threshold ±th0 and the second threshold ±th1. In the waveform of FIG. 8A, for example, the peak level in section A lies between the first threshold ±th0 and the second threshold ±th1, so, as shown in FIG. 8B, a weak contact sound is determined to have occurred. In section B the peak level is at or above the second threshold ±th1, so a strong contact sound is determined. In section C the peak level again lies between ±th0 and ±th1, so a weak sound is determined. Between sections A and B, and between sections B and C, the waveform is at zero level or at an absolute level below the first threshold ±th0, so these intervals are judged to be silent. In this way, the contact sound analysis in this embodiment consists of classifying the level of the contact sound.
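The three-level decision of FIGS. 8A and 8B can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it assumes normalized float samples, assumes each analysis window of length t is judged by its peak absolute value, and the names `classify_window`, `classify_signal`, and `window_len` are hypothetical.

```python
from enum import Enum
from typing import List, Sequence


class Level(Enum):
    """Three-level judgment of one analysis window (FIG. 8B)."""
    SILENT = 0
    WEAK = 1
    STRONG = 2


def classify_window(samples: Sequence[float], th0: float, th1: float) -> Level:
    """Classify one window by its peak absolute level.

    Comparing |x| with th0/th1 is equivalent to comparing the signed
    waveform with the threshold pairs +/-th0 and +/-th1.
    """
    if not 0 < th0 < th1:
        raise ValueError("expected 0 < th0 < th1")
    peak = max((abs(s) for s in samples), default=0.0)
    if peak >= th1:
        return Level.STRONG
    if peak >= th0:
        return Level.WEAK
    return Level.SILENT


def classify_signal(samples: Sequence[float], window_len: int,
                    th0: float, th1: float) -> List[Level]:
    """Split the signal into consecutive windows (one per interval t) and classify each."""
    return [classify_window(samples[i:i + window_len], th0, th1)
            for i in range(0, len(samples), window_len)]
```

For example, `classify_signal([0.0, 0.1, 0.3, -0.4, 0.9, 0.0], 2, 0.2, 0.6)` yields silence, weak, and strong for its three windows. Judging by the peak rather than the average follows the text's reference to peak levels in each section.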

[0055] This comparison between the contact sound level and the thresholds is performed at every predetermined interval t, and each interval t is classified into one of three levels: silence, weak sound, or strong sound. Accordingly, FIG. 8B shows section A judged as a weak sound for a duration of t × 2, section B as a strong sound for t × 4, and section C as a weak sound for t × 9. In this embodiment, the duration over which a given sound level is determined is also treated as one of the analysis results.
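Treating the duration as an analysis result amounts to run-length encoding the per-interval labels. A minimal sketch, assuming the labels are already available as a sequence; `level_runs` is a hypothetical name, and each run's duration is its count multiplied by t.

```python
from itertools import groupby
from typing import Hashable, List, Sequence, Tuple


def level_runs(levels: Sequence[Hashable]) -> List[Tuple[Hashable, int]]:
    """Collapse per-interval labels into (label, number_of_intervals) runs.

    The duration of each run is number_of_intervals * t.
    """
    return [(label, sum(1 for _ in group)) for label, group in groupby(levels)]
```

Applied to a FIG. 8B-like sequence (the lengths of the silent gaps are illustrative, since the figure description does not give them), the runs come out as weak × 2, strong × 4, and weak × 9, separated by silent runs.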

[0056] 3-2-2. State transitions according to the analysis result
Based on the contact sound analysis results obtained as described with reference to FIGS. 8A and 8B, the contact sound is mapped to the left mouse button, for example as shown in FIG. 9. FIG. 9 shows the state transitions of the left mouse button operation according to the contact sound. First, while contact sound input is awaited and the input is silent, no action is taken, in the non-responsive state of step S1. If a weak sound is detected from the state of step S1, the state transitions to step S2, in which the left mouse button is regarded as pressed. If this weak sound ends within a predetermined time and the input becomes silent, the state transitions to step S3, in which the left button is released. In other words, when a weak sound lasts only within a short predetermined period, the sequence S2 → S3 treats it as the left button being pressed and released within a short time, that is, as a single click. In FIG. 8, the analysis result for section A corresponds to this case: because the weak sound lasts only a relatively short t × 2, section A produces a click operation, as shown in FIG. 8C.

[0057] On the other hand, if, from the state of step S2 in which the left button is regarded as pressed, the weak sound continues for a certain time or longer, the state transitions to step S4, in which the left mouse button is treated as being held down. If, in the state corresponding to step S4, the user is moving the pointer while keeping it in contact with the display screen 17a, this constitutes a drag operation. If the input then becomes silent in the state of step S4, the state transitions to step S3, in which the left button is released; if a drag operation was in progress, the drag is ended. In FIG. 8, section C corresponds to such a drag operation: the weak sound continues for a relatively long t × 9, so a drag operation occurs, as shown in FIG. 8C.

[0058] If, from the weak-sound state of step S2 in which the left button is regarded as pressed, the sound changes to a strong sound, the state transitions to step S5 and a double click is regarded as having been performed. After step S5, whether the strong sound continues or changes to a weak sound, the state transitions to the non-responsive state of step S6, which ensures that the double-click operation is carried out reliably. When the input subsequently becomes silent, the state returns to the non-responsive state of step S1.

[0059] If, in the non-responsive state of step S1 corresponding to silence, the input changes to a strong sound, the state transitions to step S7 and a double click is regarded as having been performed. In this case as well, whether the strong sound then continues or changes to a weak sound, the state transitions to the non-responsive state of step S8, and when the input returns to silence from step S8, the state returns to the non-responsive state of step S1. In FIG. 8, section B corresponds to the double-click operation of step S5 or step S7. In section B the strong sound lasts t × 4; the double-click operation occurs in response to, for example, the first t × 2, and the remaining t × 2 is handled in the non-responsive state.
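The transitions of FIG. 9 described in paragraphs [0056] to [0059] can be sketched as a small state machine fed one level per interval t. This is a hedged illustration, not the driver's code: `run_fig9`, the event strings, and `hold_windows` (the hypothetical number of weak intervals after which S2 becomes S4) are assumptions, as are the two behaviors the text leaves open, marked in the comments. S3, S5, and S7 are modeled as momentary actions rather than states, and the double click is emitted on the first strong interval.

```python
from enum import Enum
from typing import Iterable, List


class State(Enum):
    IDLE = "S1"        # no response, waiting for a contact sound
    PRESSED = "S2"     # left button regarded as pressed
    HELD = "S4"        # left button held down (a drag if the pointer moves)
    LOCKED = "S6/S8"   # no response after a double click until silence


def run_fig9(levels: Iterable[str], hold_windows: int = 3) -> List[str]:
    """Feed per-interval levels ("silent", "weak", "strong") through the
    FIG. 9 transitions and return the emitted button events."""
    state, weak_count, events = State.IDLE, 0, []
    for level in levels:
        if state is State.IDLE:
            if level == "weak":                      # S1 -> S2
                state, weak_count = State.PRESSED, 1
                events.append("press")
            elif level == "strong":                  # S1 -> S7 -> S8
                state = State.LOCKED
                events.append("double_click")
        elif state is State.PRESSED:
            if level == "silent":                    # S2 -> S3: short weak sound = click
                state = State.IDLE
                events.append("release")
            elif level == "strong":                  # S2 -> S5 -> S6
                state = State.LOCKED
                # Assumption: release the pending press first; the text does not say.
                events.extend(["release", "double_click"])
            else:
                weak_count += 1
                if weak_count >= hold_windows:       # S2 -> S4
                    state = State.HELD
                    events.append("hold")
        elif state is State.HELD:
            if level == "silent":                    # S4 -> S3: end of drag
                state = State.IDLE
                events.append("release")
            # Assumption: a strong sound while held is treated like a weak one.
        else:  # State.LOCKED
            if level == "silent":                    # S6/S8 -> S1
                state = State.IDLE
    return events
```

Fed the FIG. 8 pattern (weak × 2, strong × 4, weak × 9 separated by silence) with `hold_windows=3`, the sketch produces a click (press, release), a double click, and a drag (press, hold, release), matching FIG. 8C.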

[0060] In this embodiment, the "pseudo button operation" is realized by the contact sound analysis described with reference to FIGS. 8 and 9 and by executing, on the basis of the analysis result, state transitions that emulate the left mouse button. Concretely, this assumes that the user performs contact operations in the manner described below. Contact operations with the pointer on the display screen 17a that produce a contact sound fall broadly into two kinds: tapping the display screen 17a, and moving the pointer while rubbing it across the display screen 17a. Tapping the display screen 17a produces a tapping sound whose loudness corresponds to the tapping force, while rubbing the pointer across the display screen 17a produces a sliding (friction) sound of a certain loudness.

[0061] Accordingly, for a click operation, the user is asked to tap the display screen 17a lightly. This produces a short weak contact sound, so a click operation is generated through, for example, the flow S2 → S3 of FIG. 9 described above. For a drag operation, the user brings the pointer into contact with the drag start position on the display screen 17a by tapping it lightly, as above, and then, keeping the pointer in contact, moves it in any desired direction. The light tap at the drag start position first produces a weak tapping sound, and the subsequent rubbing on the display screen 17a produces a continuing weak sound, so the state transitions of a drag operation are obtained through the flow S2 → S4 of FIG. 9. For a double-click operation, the user is asked to tap the display screen 17a firmly. This produces a strong contact sound, so the transition to the double-click operation described for step S7 or step S5 of FIG. 9 is obtained.

[0062] Given that the left button operation is simulated from the operations the user is asked to perform, as described above, it is preferable that the first threshold ±th0 and the second threshold ±th1 used to distinguish silence, weak sound, and strong sound in FIG. 8 can be set freely by the user. The loudness of the contact sound produced when tapping or rubbing the display screen 17a differs from user to user, and the sensitivity of the microphone 23 also differs depending on the actual device used. The loudness of the contact sound further depends on the physical structure of the display screen 17a, such as its surface material. Setting the first threshold ±th0 and the second threshold ±th1 to suit these differing conditions, so that the user finds the system easiest to operate, gives a more accurate response to the user's operations.
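The text leaves the method of setting the thresholds to the user. One possible calibration aid, offered purely as a hypothetical heuristic (the names, the `noise_margin` factor, and the midpoint rule are all assumptions), proposes values from peak levels recorded while the user stays still, taps or rubs lightly, and taps firmly; the user could then fine-tune the result.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ContactSoundThresholds:
    """User-adjustable absolute-level thresholds; requires 0 < th0 < th1."""
    th0: float
    th1: float

    def __post_init__(self) -> None:
        if not 0 < self.th0 < self.th1:
            raise ValueError("expected 0 < th0 < th1")


def suggest_thresholds(noise_peaks: Sequence[float],
                       weak_peaks: Sequence[float],
                       strong_peaks: Sequence[float],
                       noise_margin: float = 1.5) -> ContactSoundThresholds:
    """Hypothetical heuristic: th0 sits a margin above the background noise,
    th1 halfway between the loudest light tap and the softest firm tap."""
    th0 = max(noise_peaks) * noise_margin
    if th0 >= min(weak_peaks):
        raise ValueError("light taps are not clearly louder than the noise")
    if max(weak_peaks) >= min(strong_peaks):
        raise ValueError("light and firm taps overlap; ask the user to tap harder")
    th1 = (max(weak_peaks) + min(strong_peaks)) / 2
    return ContactSoundThresholds(th0, th1)
```

The two checks guarantee th0 < th1, because th0 lies below every light-tap peak and th1 lies above every one.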

[0063] 4. Processing operation during pseudo touch panel operation
4-1. Example software configuration of the image/voice recognition driver
Building on the description so far, the processing performed in response to an actual pseudo touch panel operation (a contact operation by the pointer on the display screen 17a) is described next. First, the program configuration of the image/voice recognition driver 18a, the program that realizes the pseudo touch panel operation of this embodiment, is described with reference to FIG. 10.

[0064] FIG. 10 schematically shows the program configuration of the image/voice recognition driver 18a. As shown in FIG. 10, the image/voice recognition driver 18a is broadly divided into an image/voice recognition module section 30 and a mouse interface application 60.

[0065] The image/voice recognition module 30 in turn consists of a voice recognition module section 40 and an image recognition module section 50. The voice recognition module section 40 is the program portion that takes in the audio signal input from the microphone 23 (23A) and executes the contact sound analysis shown in FIG. 8, for example. For this purpose it includes an audio capture module 41 and an audio analysis module 42.

[0066] The audio capture module 41 converts the analog audio signal input from the microphone 23 (23A) into audio data in a format suitable for the analysis processing and passes it to the audio analysis module 42. The audio analysis module 42 executes the analysis processing described with reference to FIG. 8. That is, it first filters the audio data received from the audio capture module 41 to extract the signal component having the frequency characteristics of the contact sound registered in advance. It then compares the level of the contact-sound frequency component extracted by this filtering with the preset first threshold ±th0 and second threshold ±th1 at intervals of time t, classifying each interval as silence, weak sound, or strong sound. The result determined for each interval t is passed to the button operation information generation module 62 of the mouse interface application 60.
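The text does not say how the extraction filter is built. As one possibility, assuming the registered contact sound can be characterized by a center frequency `f0` (a hypothetical parameter), a second-order band-pass biquad in the widely used RBJ "Audio EQ Cookbook" form could precede the threshold comparison. This is a swapped-in standard technique, not the embodiment's stated filter.

```python
import math
from typing import List, Sequence


def bandpass_biquad(samples: Sequence[float], fs: float, f0: float,
                    q: float = 1.0) -> List[float]:
    """Second-order band-pass filter (RBJ cookbook form, 0 dB gain at f0).

    fs: sample rate in Hz; f0: assumed center frequency of the registered
    contact sound; q: quality factor controlling the bandwidth.
    """
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Coefficients normalized by a0 (b1 is zero for this band-pass form).
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out: List[float] = []
    for x in samples:
        # Direct Form I difference equation.
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

A tone at `f0` passes at roughly unit amplitude once the transient has decayed, while a low-frequency hum is strongly attenuated, so the subsequent peak comparison responds mainly to the contact sound band.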

[0067] The image recognition module section 50 recognizes the image signal of the captured image input from the camera section 22 and acquires the coordinate information of the pointer. It consists of programs serving as a video capture module 51 and a coordinate conversion module 52. The video capture module 51 takes in the video signal of the captured image input from the camera section 22, converts it into frame-by-frame video data in the required format, and writes this video data frame by frame into a predetermined area of the RAM 13. The coordinate conversion module 52 recognizes, for example, the frame information shown in FIG. 3 from the video data written frame by frame into the RAM 13, and then applies the keystone (trapezoidal) correction of the frame information shown in FIG. 4. It then recognizes the tip position of the pointer. The tip position is the tip of the finger or pointer pen serving as the pointer, and corresponds to the position on the display screen 17a that the pointer indicates; its recognition is described later. Once the pointer tip position on the captured image (image coordinates; see FIG. 5A) has been recognized, predetermined arithmetic processing is performed as described with reference to FIGS. 5 to 7, converting it to coordinates on the actual display screen 17a (actual coordinates; see FIG. 5B). The coordinate information obtained by this conversion is passed to the cursor position information generation module 63 of the mouse interface application 60.
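FIGS. 5 to 7 are outside this excerpt, so the embodiment's exact arithmetic is not reproduced here. As a stand-in, a standard way to map image coordinates to screen coordinates under keystone distortion is a planar projective transform (homography) estimated from the four recognized corners of the screen frame. The function names below are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """3x3 projective transform (row-major, h[8] fixed to 1) taking the four
    src corners (image coordinates) to the four dst corners (screen coordinates)."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h6*x + h7*y + 1) = h0*x + h1*y + h2, and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    return _solve(a, b) + [1.0]


def map_point(h: Sequence[float], x: float, y: float) -> Point:
    """Convert an image coordinate (e.g. the pointer tip) to a screen coordinate."""
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)
```

The four corners must be given in corresponding order (for example clockwise from top-left in both lists); the transform is then applied to the detected tip Pt for every frame.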

[0068] In this way, the image/voice recognition module 30 passes to the mouse interface application 60 both the analysis result of the audio signal from the voice recognition module 40 and the coordinate information obtained by the coordinate conversion in the image recognition module section 50. The button operation information generation module 62 in the mouse interface application 60 recognizes, from the analysis results received from the image/voice recognition module 30, the state transitions among silence, weak sound, and strong sound in the contact sound; according to this recognition result it converts them into operation information representing presses and releases of the left mouse button, as shown in FIG. 9, and passes this to the operation information transmission module 61. The cursor position information generation module 63 in the mouse interface application 60 sets the coordinate information received from the image recognition module section 50 as the coordinates of the current cursor position and passes this cursor position information to the operation information transmission module 61.

[0069] The operation information transmission module 61 converts the left-button press/release operation information and the cursor position coordinates passed to it as described above into operation information that the OS can process, and passes it to the OS. On receiving this operation information, the OS displays the cursor movement on the GUI, changes the display in response to click and double-click operations, and executes the required information processing: for example, switching the active window on the GUI in response to a click, or opening a file or launching an application in response to a double click.

[0070] 4-2. Example processing operation according to flowcharts
Next, as processing performed by the image/voice recognition driver 18a with the program structure shown in FIG. 10, the processing carried out while the user performs pseudo touch panel operations on the display screen 17a is described with reference to FIGS. 11 to 14.

[0071] FIG. 11 shows, side by side, the processes executed by the image recognition module 50, the voice recognition module 40, and the mouse interface application 60 described with reference to FIG. 10, as the processing of the image/voice recognition driver 18a during pseudo touch panel operation. For clarity, the processing shown in FIG. 11 is executed by the CPU 11 in accordance with the program constituting the image/voice recognition driver 18a.

[0072] The processing of the image recognition module 50 is described first. As the processing of the video capture module 51, a video signal capture process is executed in step S101. That is, the video signal of the captured image input from the camera section 22 is converted into image data in a predetermined format suited to the subsequent pointer detection and coordinate conversion, and written sequentially, frame by frame, into a predetermined area of the RAM 13, for example.

[0073] The processing from step S102 onward follows the coordinate conversion module 52. In step S102, the pointer is detected in the frame-by-frame captured image written into the RAM 13 as described above. As the foregoing description makes clear, the pointer here is the physical object the user employs to perform pseudo touch panel operations on the display screen 17a, such as the user's own finger or a thin, rod-shaped pointer pen. Detecting the pointer ultimately means detecting the position on the captured image of the point indicated by the pointer, that is, the position (coordinates) of the pointer's tip. The processing of step S102 is executed, for example, as shown in the flowchart of FIG. 12.

[0074] Before the processing of FIG. 12 is described, the initial setup for the pointer is explained. For the pointer detection process of FIG. 12 to detect the pointer properly, the color of the pointer must be known in advance. Therefore, before the pseudo touch panel operation of this embodiment is used, the color information of the pointer is registered during initial setup. To do this, the user first enters a pointer color registration mode by a predetermined operation. In this registration mode, the image captured by the camera section 22 is displayed on the display screen 17a of the personal computer 10. In this state, a user who intends to use a finger as the pointer has the camera section 22 capture the finger so that it appears on the display screen 17a. If an object such as a pointer pen is to be used as the pointer, the user makes the pointer pen appear on the display screen 17a instead.

[0075] The user then places a color acquisition cursor within the image area of the pointer that is being captured by the camera section 22 and displayed on the display screen 17a, and performs a registration operation. The color information of the captured image area at the position indicated by the color acquisition cursor is thereby registered as the pointer color information. For example, if the pointer is the user's finger, the skin color of that finger is registered; if the pointer is a blue pointer pen, that blue is registered.
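The text says only that the color at the cursor position is registered. A minimal sketch, assuming RGB frames stored as nested lists of `(r, g, b)` tuples, averaging a small hypothetical patch (`radius`) around the cursor to reduce sensor noise, and converting to analog YUV with the ITU-R BT.601 luma weights (Y = 0.299R + 0.587G + 0.114B, U = 0.492(B − Y), V = 0.877(R − Y)):

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
YUV = Tuple[float, float, float]


def rgb_to_yuv(r: float, g: float, b: float) -> YUV:
    """Analog YUV using the ITU-R BT.601 luma weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return (y, 0.492 * (b - y), 0.877 * (r - y))


def register_pointer_color(image: Sequence[Sequence[RGB]], cx: int, cy: int,
                           radius: int = 2) -> YUV:
    """Average the pixels in a square around the color-acquisition cursor
    at (cx, cy), clipped to the image bounds, and return the mean as YUV."""
    height, width = len(image), len(image[0])
    samples: List[RGB] = [image[y][x]
                          for y in range(max(0, cy - radius), min(height, cy + radius + 1))
                          for x in range(max(0, cx - radius), min(width, cx + radius + 1))]
    n = len(samples)
    mean = [sum(p[i] for p in samples) / n for i in range(3)]
    return rgb_to_yuv(*mean)
```

For a blue pointer pen, the registered value has a large positive U and a slightly negative V.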

[0076] The pointer detection process shown in FIG. 12 is performed as follows. First, in step S401, the frame-by-frame captured image data written into the RAM 13 is divided into a plurality of predetermined color areas. The "plurality of color areas" here means the areas obtained by dividing the colors represented in the YUV (luminance/color difference) color space into a plurality of predetermined regions, as shown in FIG. 13, for example.
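Step S401 can be illustrated by quantizing YUV space into boxes and grouping the pixels of a frame by box. The box sizes (`y_step`, `uv_step`) and function names are hypothetical; FIG. 13 does not specify how the areas are drawn.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

YUV = Tuple[float, float, float]
AreaKey = Tuple[int, int, int]


def color_area(yuv: YUV, y_step: float = 64.0, uv_step: float = 32.0) -> AreaKey:
    """Index of the box-shaped color area of YUV space that contains yuv."""
    y, u, v = yuv
    return (math.floor(y / y_step), math.floor(u / uv_step), math.floor(v / uv_step))


def split_into_color_areas(yuv_image: Sequence[Sequence[YUV]],
                           y_step: float = 64.0, uv_step: float = 32.0
                           ) -> Dict[AreaKey, List[Tuple[int, int]]]:
    """Group the (x, y) pixel positions of one frame by the color area they fall in."""
    areas: Dict[AreaKey, List[Tuple[int, int]]] = defaultdict(list)
    for row, line in enumerate(yuv_image):
        for col, pixel in enumerate(line):
            areas[color_area(pixel, y_step, uv_step)].append((col, row))
    return dict(areas)
```

Using `math.floor` rather than `int()` keeps negative U and V values in the correct box.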

[0077] In the following step S402, each color area obtained in step S401 is compared with the registered color area corresponding to the pointer color information registered as described above. For example, if the pointer is the user's finger and the registered pointer color information is a particular skin color, then, as indicated by hatching in FIG. 13, a fixed range in the +Y, -U, -V quadrant corresponding to that skin-color pointer color information is set as the registered color area R. The registered color area R may be set, for example, when the pointer color information is registered.
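The comparison of step S402 can be sketched by modeling the registered color area R as an axis-aligned box around the registered color and producing a per-pixel mask. The box shape, its default half-widths, and the names are assumptions; FIG. 13 shows R only as a hatched range.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

YUV = Tuple[float, float, float]


@dataclass(frozen=True)
class RegisteredColorArea:
    """Hypothetical box-shaped registered color area R around the registered color."""
    center: YUV
    half_width: YUV = (24.0, 12.0, 12.0)

    def contains(self, yuv: YUV) -> bool:
        """True if every YUV component lies within the box around the center."""
        return all(abs(c - m) <= h for c, m, h in zip(yuv, self.center, self.half_width))


def registered_color_mask(yuv_image: Sequence[Sequence[YUV]],
                          area: RegisteredColorArea) -> List[List[bool]]:
    """Per-pixel result of the comparison: True where the pixel lies in R."""
    return [[area.contains(p) for p in row] for row in yuv_image]
```

The half-widths control how much lighting variation is tolerated; a wider box finds the pointer more reliably but admits more background pixels of similar color.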

[0078] In the next step S403, the comparison result of step S402 is used to determine whether the registered color area R exists within the frame image area of the captured image. If the result is negative, meaning no registered color area R exists, the routine of step S102 is exited as it is. If the result is positive, meaning the registered color area R exists, the process proceeds to step S404.

[0079] When a positive result is obtained in step S403, the registered color area R may be found in a plurality of different regions. In that case, one of these regions can be assumed to be the registered color area R actually produced by imaging the pointer. In this embodiment, the registered color area with the largest area among those found is identified as the registered color area R actually obtained by imaging the pointer. This is the processing of step S404.

[0080] That is, in step S404, the registered color area R with the largest area is detected among the plurality of registered color areas R determined to exist in step S403, and the detected area is set as the image region of the pointer in the captured image (the pointer area). Other recognition methods may also be used to recognize the pointer area, either instead of or in combination with the method of step S404. For example, since the pointer moves over the display screen 17a as needed, motion detection could also be incorporated where necessary.
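Steps S403 and S404 can be sketched by reading "a plurality of different regions" as spatially separate blobs in the mask: label 4-connected regions with a breadth-first search, return `None` when no region exists (the negative branch of S403), and otherwise take the region with the most pixels. The connectivity choice and names are assumptions.

```python
from collections import deque
from typing import List, Optional, Sequence, Set, Tuple

Pixel = Tuple[int, int]  # (x, y)


def connected_regions(mask: Sequence[Sequence[bool]]) -> List[List[Pixel]]:
    """4-connected regions of True pixels, found by breadth-first search."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen: Set[Pixel] = set()
    regions: List[List[Pixel]] = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or (x, y) in seen:
                continue
            seen.add((x, y))
            queue, region = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                region.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and (nx, ny) not in seen):
                        seen.add((nx, ny))
                        queue.append((nx, ny))
            regions.append(region)
    return regions


def pointer_area(mask: Sequence[Sequence[bool]]) -> Optional[List[Pixel]]:
    """S403/S404: None if no registered-color region exists, else the largest one."""
    regions = connected_regions(mask)
    return max(regions, key=len) if regions else None
```

Choosing the largest blob discards small patches of similar color elsewhere in the frame, which is the rationale the text gives for step S404.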

[0081] In the next step S405, the coordinate range in the captured image data of the pointer area detected as described above is acquired. FIG. 14 shows the captured image being processed. In this figure, the captured image contains the PC image portion 10-1 of the personal computer 10, in which the display screen area 17a appears. Image data of this kind is written frame by frame into the RAM 13 and processed. If, for example, the user uses a finger as the pointer, the processing up to step S404 yields the image area of the user's hand image GH as the coordinate range of the pointer area. When the user performs a contact operation on the display screen 17a with a finger as the pointer, the point actually indicated is, for example, the tip of the user's index finger. Therefore, to identify the indicated point (coordinates) from the coordinate range of the pointer area on the captured image, it suffices to detect the position (coordinates) of the fingertip, the tip Pt. Step S406, which follows step S405, performs this processing.

[0082] Consider, for example, an actual pointer operation with the user's finger. As the image GH of the user's hand in FIG. 14 shows, the user performs the pointing operation with the index finger directed upward. The same applies when a pointer pen or the like is used: the operation is performed with the tip of the pointer facing upward. Accordingly, the position (coordinates) of the tip Pt is obtained in step S406 as follows. The coordinate-range information acquired in step S405 makes it possible to specify the shape of the pointer in the captured image. From the group of coordinate values forming this shape, the coordinate value located uppermost on the screen in the captured image data can then be taken as the coordinates of the tip Pt. Alternatively, from the shape specified by the pointer's coordinate range, the coordinate range of the image portion that extends longest in a rod shape may be identified, and, from the group of coordinate values forming that rod-shaped portion, the coordinates of its protruding end may be taken as the coordinates of the tip Pt.
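
The first of the two strategies for step S406 (take the uppermost coordinate of the pointer shape as the tip Pt) can be sketched in Python as below. This is a minimal illustration, not the patent's implementation: it assumes the pointer area from step S405 is available as a list of (x, y) pixel coordinates with y increasing downward, and the function name `fingertip_topmost` and its tie-breaking rule (middle pixel of the topmost row) are hypothetical choices.

```python
from typing import Iterable, List, Optional, Tuple

Point = Tuple[int, int]  # (x, y) in image coordinates; y grows downward


def fingertip_topmost(pointer_pixels: Iterable[Point]) -> Optional[Point]:
    """Estimate the tip Pt as the uppermost pixel of the pointer region.

    Assumes the pointer (finger or pen) is held tip-up, so the tip is the
    pixel with the smallest y. If several pixels share that row, the middle
    one is returned (a hypothetical tie-break).
    """
    pixels: List[Point] = list(pointer_pixels)
    if not pixels:
        return None  # no pointer region: treat as "pointer not detected"
    top_y = min(y for _, y in pixels)
    top_row = sorted(x for x, y in pixels if y == top_y)
    return (top_row[len(top_row) // 2], top_y)


# Example: a small upward-pointing "finger" region.
region = [(4, 10), (5, 10), (6, 10), (5, 11), (5, 12), (4, 12), (6, 12)]
print(fingertip_topmost(region))  # (5, 10)
```

Returning `None` for an empty region mirrors the case in which no pointer is found, which the later step S103 treats as a negative result.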

[0083] Returning to FIG. 11: after the pointer detection processing of step S102 shown in FIG. 12 has been executed, step S103 determines whether the pointer has been properly detected, that is, whether the processing of step S102 has detected the pointer area and the coordinates of the tip Pt. If, for example, the processing of FIG. 12 ended without executing step S404 and the subsequent steps, the pointer is regarded as not detected and a negative result is obtained. In this case the processing routine shown in FIG. 11 is exited once and the process returns to step S101. Conversely, if step S404 and the subsequent steps of FIG. 12 were executed and the coordinates of the tip Pt were finally obtained, a positive result is obtained in step S103 and the process proceeds to step S104.

[0084] In step S104, the coordinates of the tip Pt acquired in step S406 of FIG. 12 (coordinates in the image) are converted into coordinates on the display screen 17a (actual coordinates). This conversion is as described with reference to FIGS. 5 to 7. In the next step S105, in response to the acquisition of the actual coordinates in step S104, an event is generated for the mouse interface application 60.

[0085] In the processing according to the image recognition module 50, steps S101 to S105 are, for example, executed repeatedly in every frame period. As a result, an event is sent to the mouse interface application 60 for as long as the pointer is detected in the captured image. When the pointer moves, an event is generated each time actual coordinates corresponding to the new position are obtained.
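
The per-frame loop of steps S101 to S105 can be pictured with the Python sketch below. It is a hedged outline rather than the driver's code: pointer detection (S102) and the image-to-screen conversion (S104, FIGS. 5 to 7) are passed in as callables because their details lie elsewhere, and the names `run_image_recognition`, `PointerEvent`, `detect_tip`, `to_actual`, and `post_event` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class PointerEvent:
    """Hypothetical event handed to the mouse interface application (S105)."""
    frame_index: int
    actual_xy: Point  # actual coordinates on the display screen


def run_image_recognition(
    frames: Iterable[object],
    detect_tip: Callable[[object], Optional[Point]],  # S102: pointer detection
    to_actual: Callable[[Point], Point],              # S104: image -> screen coords
    post_event: Callable[[PointerEvent], None],       # S105: notify the mouse app
) -> None:
    for index, frame in enumerate(frames):  # S101: one captured frame per period
        tip = detect_tip(frame)
        if tip is None:  # S103 negative: no pointer, go back for the next frame
            continue
        post_event(PointerEvent(index, to_actual(tip)))  # S104 + S105


# Usage with stand-in "frames" that are simply the tip position or None.
events: List[PointerEvent] = []
run_image_recognition(
    frames=[None, (10, 20), (12, 22)],
    detect_tip=lambda f: f,
    to_actual=lambda p: (p[0] * 2, p[1] * 2),  # toy scale, not the FIGS. 5-7 method
    post_event=events.append,
)
```

Only frames in which a tip is found produce an event, so a moving pointer yields one event per frame, matching the behaviour described in [0085].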

[0086] Next, the processing according to the voice recognition module 40 in FIG. 11 is described. This processing is shown as steps S201 to S203 on the left side of FIG. 11. First, in step S201, the audio capture module 41 captures the audio signal. That is, the audio signal input by being picked up by the microphone 23 (23A) is, if necessary, converted into a digital signal and then into audio data in the form required by the subsequent analysis processing. This data is written sequentially into a work area reserved in, for example, the RAM 13.

[0087] Step S202 and the subsequent steps are processing according to the audio analysis module 42. In step S202, the analysis processing described with reference to FIG. 8 is executed. First, the input audio data is filtered so that only the signal component in the preset frequency band of the contact sound is separated and extracted. The level of this extracted contact-sound component is then compared with preset thresholds (a first threshold ±th0 and a second threshold ±th1), and the comparison yields an analysis result in one of three stages: silence, weak sound, or strong sound. As also described with reference to FIG. 8, this analysis is performed at intervals of a predetermined time t. In the next step S203, an event indicating that the analysis result information (silence, weak sound, or strong sound) has been obtained in step S202 is generated for the mouse interface application 60.
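
The filtering and three-stage comparison of step S202 might look like the Python sketch below. It is an illustration under stated assumptions, not the patent's filter: the band-pass stage is a standard second-order biquad from the RBJ Audio EQ Cookbook, swapped in because the text does not say which filter is used, and the centre frequency, Q, window length, and threshold values are hypothetical. Comparing the signal with ±th0 and ±th1 is modelled as comparing the peak absolute value in each window of length t.

```python
import math
from bisect import bisect_right
from typing import List, Sequence

LEVELS = ("silence", "weak", "strong")


def bandpass(x: Sequence[float], fs: float, f0: float, q: float) -> List[float]:
    """Second-order band-pass (RBJ Audio EQ Cookbook form, 0 dB peak gain).

    Stands in for the contact-sound filtering; f0 and q are assumed values
    that would be tuned to the contact sound's frequency band.
    """
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    y: List[float] = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def classify_windows(
    y: Sequence[float],
    window: int,                               # samples per analysis period t
    thresholds: Sequence[float] = (0.2, 0.8),  # (th0, th1), hypothetical values
    names: Sequence[str] = LEVELS,
) -> List[str]:
    """Label each complete window by comparing its peak |y| with th0 and th1."""
    if len(names) != len(thresholds) + 1:
        raise ValueError("need exactly one more level name than thresholds")
    labels = []
    for start in range(0, len(y) - window + 1, window):
        peak = max(abs(v) for v in y[start:start + window])
        labels.append(names[bisect_right(thresholds, peak)])
    return labels


print(classify_windows([0.0, 0.01, 0.3, -0.5, 0.9, -0.1], window=2))
# ['silence', 'weak', 'strong']
```

Because the thresholds are a sorted sequence, the same function also covers the two-stage or four-or-more-stage classifications mentioned as variations in [0099].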

[0088] Next, the processing according to the mouse interface application 60 is described. It is shown as steps S301 to S307 in the center of FIG. 11. First, steps S301 to S302 are processing according to the cursor position information generation module 63. In step S301, the application waits to receive the event generated in step S105 according to the image recognition module 50. Until an event from the image recognition module 50 is received, the processing of step S302 is skipped and the process moves to step S303; when a positive result is obtained in step S301 because an event has been received, the process proceeds to step S302.

[0089] In step S302, the actual coordinates of the pointer obtained in the earlier step S104 are fetched. In the following step S303, cursor position information is generated from the fetched actual coordinates of the pointer. The process then proceeds to step S304.

[0090] Steps S304 to S306 are processing according to the button operation information generation module 62. In step S304, the application waits to receive the event generated in step S203 by the processing of the voice recognition module 40. If it is determined in step S304 that no event has been received, steps S305 and S306 are skipped and the process moves to step S307. If, on the other hand, a positive result is obtained in step S304 because an event has been received, the process proceeds to step S305.

[0091] In step S305, the analysis result (silence, weak sound, or strong sound) obtained in the earlier step S202 is fetched. In the next step S306, the state transitions among silence, weak sound, and strong sound are recognized from the fetched analysis results of step S202, and operation information representing the pressing and releasing of the left mouse button, as shown in FIG. 9, is generated. The process then proceeds to step S307.
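
FIG. 9 is not reproduced in this excerpt, so the Python sketch below uses a purely hypothetical transition rule to show how step S306 could turn a stream of analysis results into left-button operation information: a strong sound (a tap) presses the button, silence releases it, and a weak sound (for example, the pointer rubbing the screen) leaves the state unchanged. The function name and the event labels `left_down` / `left_up` are illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def left_button_events(levels: Iterable[str]) -> List[Tuple[int, str]]:
    """Turn per-period analysis results into left-button press/release events.

    HYPOTHETICAL rule for illustration (not taken from FIG. 9):
    - 'strong' while the button is up    -> press   ('left_down')
    - 'silence' while the button is down -> release ('left_up')
    - 'weak' (e.g. pointer rubbing the screen) leaves the state unchanged
    """
    pressed = False
    events: List[Tuple[int, str]] = []
    for period, level in enumerate(levels):
        if level == "strong" and not pressed:
            pressed = True
            events.append((period, "left_down"))
        elif level == "silence" and pressed:
            pressed = False
            events.append((period, "left_up"))
    return events


levels = ["silence", "strong", "weak", "weak", "silence", "strong", "silence"]
print(left_button_events(levels))
# [(1, 'left_down'), (4, 'left_up'), (5, 'left_down'), (6, 'left_up')]
```

Keeping the button state between periods is what lets the module recognize transitions rather than isolated levels; a different FIG. 9 mapping would only change the conditions inside the loop.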

[0092] Step S307 is processing according to the operation information transmission module 61. In step S307, the current cursor position information and left-button operation information generated in steps S303 and S306 are converted into data with a structure that the OS can process, and passed to the OS. As described for FIG. 10, the OS then executes the various processes corresponding to GUI operations according to the information passed to it; that is, it executes processing for an operation corresponding to mouse movement and for an operation corresponding to pressing and releasing the left mouse button.
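
One pass through steps S301 to S307 can be sketched in Python as a small stateful object that merges whichever events arrived and emits a combined record. This is an assumption-laden outline: it does not call any real OS input API, and `MouseInterface`, `MouseReport`, and the `left_down` / `left_up` strings are hypothetical stand-ins for the OS-specific data structure mentioned in step S307.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MouseReport:
    """Hypothetical OS-facing record; a real driver would fill the OS's own structure."""
    x: int
    y: int
    left_down: bool


class MouseInterface:
    """Keeps the latest cursor position and left-button state between events."""

    def __init__(self, x: int = 0, y: int = 0) -> None:
        self.cursor: Tuple[int, int] = (x, y)
        self.left_down = False

    def step(self, pointer_xy: Optional[Tuple[int, int]],
             button_event: Optional[str]) -> MouseReport:
        if pointer_xy is not None:       # S301 yes -> S302: fetch actual coordinates
            self.cursor = pointer_xy     # S303: cursor position information
        if button_event == "left_down":  # S304 yes -> S305/S306: button information
            self.left_down = True
        elif button_event == "left_up":
            self.left_down = False
        # S307: package both pieces of information for the OS
        return MouseReport(self.cursor[0], self.cursor[1], self.left_down)


mouse = MouseInterface()
print(mouse.step((100, 200), None))   # MouseReport(x=100, y=200, left_down=False)
print(mouse.step(None, "left_down"))  # MouseReport(x=100, y=200, left_down=True)
```

Because the last known cursor position and button state are retained, a pass with only one kind of event still reports a complete state to the OS.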

[0093] Implementing the pseudo touch panel operation of this embodiment as described above makes it easy to realize touch-panel-like operation, in which the user operates by directly touching the display screen, without providing a display device with a touch panel. Unlike mouse operation or operation based on the concept of so-called wearable computing, touch panel operation is intuitive, so GUI operation becomes more convenient.

[0094] Moreover, to realize such pseudo touch panel operation in this embodiment, the user needs only a general-purpose personal computer on which the image/voice recognition driver 18a is installed, a camera device, and a microphone. The camera device may be, for example, one of the CCD cameras for personal computers that are already widely available, and the microphone may be an ordinary, commonly used one. In particular, a microphone built into the personal computer is sufficient for practical use. The user can therefore assemble the necessary equipment without much financial burden. Display devices with touch panels are currently very expensive, whereas this embodiment provides a system capable of touch-panel-like operation far more easily.

[0095] Furthermore, in this embodiment the voice recognition recognizes the "contact sound" made on the display screen 17a. Configurations are known in which GUI control is executed in response to speech uttered by the user; in that case, however, the user cannot speak casually and, for example, cannot hold an adequate conversation during operation. In this embodiment, GUI control is based on the "contact sound," whose frequency band characteristics and other properties differ completely from those of spoken voice, so this problem is eliminated.

[0096] 5. Modified Example (Pseudo Touch Panel Operation by Pointing Operation Alone)
In the pseudo touch panel operation of the embodiment described above, the pointing operation with the user's pointer corresponds to mouse movement, and information on the contact sound accompanying this pointing operation corresponds, as a "pseudo button operation," to operation of the left mouse button. In this embodiment, however, a "pseudo button operation" such as a click can also be realized without using the contact sound information. This is described below as a modified example.

[0097] An example of the operation mode in this case is described first. In this case too, when the user performs a touch operation on the display screen 17a with the pointer, the coordinates of the tip Pt of the pointer touching the display screen 17a are ultimately handled as the cursor position information, as in the embodiment above. Accordingly, as the pointer in contact with the display screen 17a moves, the cursor displayed on the display screen 17a moves so as to follow the position of the tip Pt of the pointer (see FIG. 14). When the user wants to perform a click operation, the user first holds the tip Pt of the pointer still at the point position to be clicked for at least a certain time, then, for example, lifts it away from the display screen 17a once, and then brings the tip Pt back to the same point position. In other words, the user taps the same position on the display screen 17a once at a certain speed.

[0098] In the captured images, such an operation appears as follows: the tip Pt of the pointer pauses at the coordinates of a certain point position, then moves, and afterwards is again located at the coordinates where it paused. Therefore, through cooperation between, for example, the image recognition module unit 50 and the mouse interface application 60 shown in FIG. 10, motion detection is performed on the tip Pt of the pointer in the captured image data of each frame. When the motion detection result is determined to show the state transition "the tip Pt stops at a coordinate A for at least a certain time → the tip Pt then moves away from coordinate A → the tip Pt returns to coordinate A within a certain time," operation information indicating that a click was performed with coordinate A as the click point is passed to the OS. For a double-click, the user may, for example, hold the tip Pt of the pointer still at the point position to be clicked for at least a certain time and then tap that point position twice in succession; when a motion detection result satisfying the corresponding condition is obtained, the operation is treated as a double-click.
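
The dwell, move-away, and return transition of [0098] can be expressed as a small state machine over per-frame tip positions. The Python sketch below is one possible reading, not the patent's algorithm: the text gives no numbers, so `dwell_min`, `return_max`, and the `radius` used to judge "the same position" are hypothetical, and it assumes that a second return within `return_max` is reported as a double-click (after the first tap has already been reported as a click, as many GUIs do).

```python
import math
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float]


def detect_taps(
    samples: Iterable[Tuple[float, Optional[Point]]],
    dwell_min: float = 0.5,   # seconds the tip must rest at A (hypothetical value)
    return_max: float = 0.4,  # seconds allowed to come back to A (hypothetical)
    radius: float = 5.0,      # pixels within which the tip counts as "at A"
) -> List[Tuple[float, Point, str]]:
    """Find click / double-click gestures in (time, tip) samples; tip None = no tip seen."""
    events: List[Tuple[float, Point, str]] = []
    anchor: Optional[Point] = None   # candidate click point A
    anchor_t = 0.0                   # when the tip arrived at A
    armed = False                    # tip has rested at A for dwell_min
    left_t: Optional[float] = None   # when the tip left an armed A
    taps, last_tap_t = 0, 0.0
    for t, tip in samples:
        near = tip is not None and anchor is not None and math.dist(tip, anchor) <= radius
        if left_t is not None:                   # waiting for the tip to return to A
            if near and t - left_t <= return_max:
                taps += 1
                last_tap_t = t
                events.append((t, anchor, "click" if taps == 1 else "double_click"))
                left_t = None                    # stay armed for a possible 2nd tap
                if taps == 2:
                    armed, taps = False, 0
                    anchor, anchor_t = tip, t
            elif t - left_t > return_max:        # too late: forget A
                left_t, armed, taps = None, False, 0
                anchor, anchor_t = tip, t
            continue
        if near:
            if taps == 1 and t - last_tap_t > return_max:
                taps = 0                         # too slow for a double-click
            if not armed and t - anchor_t >= dwell_min:
                armed = True                     # rested at A long enough
        elif armed:
            left_t = t                           # tip lifted or moved away from A
        else:
            anchor, anchor_t = tip, t            # start timing a new candidate A
    return events


samples = [(0.0, (10, 10)), (0.3, (10, 10)), (0.6, (10, 10)),  # rest at A
           (0.7, None), (0.8, (11, 10)),                       # lift, return: click
           (0.9, None), (1.0, (10, 11))]                       # again: double-click
print(detect_taps(samples))
# [(0.8, (10, 10), 'click'), (1.0, (10, 10), 'double_click')]
```

A tip that is lifted without first resting at A, or that returns too slowly, produces no event, which keeps ordinary cursor movement from being mistaken for a click.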

[0099] The various processes for realizing the pseudo touch panel operation of the present invention are not limited to those described so far and may be changed as appropriate. For example, other operation procedures for setting the frame information shown in FIG. 3, and other calculation methods for the coordinate conversion described with reference to FIGS. 4 to 7, are conceivable. The content of the contact-sound analysis shown in FIG. 8, and the state transitions of the left mouse button according to the analysis result shown in FIG. 9, may also be changed as appropriate according to the operation mode the user is actually to perform. For example, the analysis of FIG. 8 classifies the contact sound level into three stages (silence, weak sound, and strong sound), but depending on the operation mode, two stages, or four or more stages, are also conceivable. Furthermore, other configurations of the program structure and processing operations for realizing the pseudo touch panel operation shown in FIGS. 10 to 12 are also conceivable.

【０１００】[0100]

[Effects of the Invention] As described above, the present invention recognizes, from a captured image obtained by imaging the display screen, the point position, that is, the position on the display screen indicated by the pointer, and executes the required information processing in response to GUI operation according to the recognized point position. With this configuration, operation similar to that of a touch panel is possible even if the display screen is not structured as a touch panel. In general, if the display screen is not structured as a touch panel, the user operates a control such as a mouse or trackpad; such operation is not necessarily intuitive, partly because the control is not in the user's field of view. The present invention, however, enables intuitive touch panel operation even on a display screen without a touch panel. Furthermore, the hardware needed to realize this touch-panel-like operation is merely a camera device (imaging device) added to a general-purpose information processing device such as a personal computer. Since no expensive or special equipment is required, the user can set up the system easily. Computer displays with touch panels are also considerably expensive, so the system according to the present invention is easy to assemble even compared with purchasing such a display.

[0101] Further, according to the present invention, the contact sound produced when a physical pointer touches the display screen is picked up by a microphone and analyzed, and the required information processing in response to GUI operation is executed based on the result of the analysis. The user can then perform intuitive operations such as moving the pointer by rubbing the display screen to move the cursor, or tapping the display screen to click or double-click. This invention, too, allows operation similar to that of a touch panel on a display screen that is not structured as a touch panel. The sensor is a microphone, which is already general-purpose, inexpensive, and readily available, so the same effects as the invention using the camera unit as a sensor are obtained. Configurations in which operations are performed by the user's speech are known, but they carry the inconvenience that, for example, the user can speak only about the operation while operating. Because the present invention performs GUI control based on the contact sound, this problem is eliminated, and the user can converse freely during operation. Moreover, by combining the present invention, which realizes GUI operation based on the contact sound, with the invention described above, which realizes GUI operation based on the point position obtained from the captured image, GUI operation with further improved operability can be achieved.

[Brief description of drawings]

FIG. 1 is a perspective view showing an outline of a pseudo touch panel operation system according to the present embodiment.

FIG. 2 is a block diagram showing an example internal configuration of a personal computer according to the present embodiment.

FIG. 3 is an explanatory diagram showing an operation procedure for setting frame information.

FIG. 4 is an explanatory diagram schematically showing keystone correction processing for the frame information.

FIG. 5 is an explanatory diagram schematically showing an example of processing for converting in-image coordinates within the frame information into actual coordinates on the display screen.

FIG. 6 is an explanatory diagram schematically showing an example of processing for converting in-image coordinates within the frame information into actual coordinates on the display screen.

FIG. 7 is an explanatory diagram schematically showing an example of processing for converting in-image coordinates within the frame information into actual coordinates on the display screen.

FIG. 8 is an explanatory diagram schematically showing an example of analysis processing for a contact sound.

FIG. 9 is an explanatory diagram showing an example of state transitions of the left mouse button operation according to the analysis result of a contact sound.

FIG. 10 is a block diagram showing an example program structure of the image/voice recognition driver.

FIG. 11 is a flowchart showing the processing performed according to the image/voice recognition driver during pseudo touch panel operation.

FIG. 12 is a flowchart showing an example of pointer detection processing as processing according to the image recognition module.

FIG. 13 is an explanatory diagram representing a color region in the YUV chromaticity space.

FIG. 14 is an explanatory diagram showing a captured image, taken by the camera unit, in which the image portion of the user's hand serving as the pointer appears.

FIG. 15 is an explanatory diagram showing another configuration example of the pseudo touch panel operation system of the present embodiment.

FIG. 16 is an explanatory diagram schematically showing a structural example of a touch panel.

[Explanation of symbols]

10 personal computer, 11 CPU, 17 display monitor, 17a display screen, 18 HDD, 18a image/voice recognition driver, 22 camera section, 23 (23A) microphone, 30 image/voice recognition module, 40 voice recognition module section, 41 audio capture module, 42 audio analysis module, 50 image recognition module section, 51 video capture module, 52 coordinate acquisition module, 60 mouse interface application, CR1 to CR4 image corner designation cursors,

Continued from front page: (72) Inventor: Atsuhiko Imai, c/o Sony Corporation, 6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo. (72) Inventor: Hirotomo Fukuda, c/o Sony Corporation, 6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo. F-terms (reference): 5B068 AA05 BB18 BC05 BD09 BD20 BE08 CC06 CD05; 5B087 AA09 AB02 CC26 CC33 DD17 DE07

Claims

1. An information processing device comprising: display means for displaying and outputting an image on a display screen; image capturing means for capturing a captured image taken by an imaging device; image part recognition means for recognizing the image part area of the display screen in the captured image captured by the image capturing means; point position recognition means for recognizing the position indicated by a pointer present in the image part area recognized by the image part recognition means as an actual point position on the display screen; and information processing means capable of executing required information processing according to the point position recognized by the point position recognition means.

2. The information processing device according to claim 1, further comprising: voice capturing means for capturing a voice signal picked up by a microphone device; extraction means for extracting, from the voice signal captured by the voice capturing means, the voice signal component of a contact sound produced when a physical pointer comes into contact with the display screen; and analysis means for executing required analysis processing on the voice signal component of the contact sound extracted by the extraction means, wherein the information processing means is configured to be capable of executing required information processing based on the analysis result of the analysis means.

3. An information processing system comprising at least an imaging device and an information processing device, wherein the imaging device is placed at a position from which it can image the entire display screen of the information processing device, and the information processing device comprises: display means for displaying and outputting an image on the display screen; image capturing means for capturing the captured image taken by the imaging device; image part recognition means for recognizing the image part area of the display screen in the captured image captured by the image capturing means; point position recognition means for recognizing the position indicated by a pointer present in the image part area recognized by the image part recognition means as an actual point position on the display screen; and information processing means capable of executing required information processing according to the point position recognized by the point position recognition means.

4. The information processing system according to claim 3, further comprising a microphone device, wherein the information processing device further comprises: voice capturing means for capturing a voice signal picked up by the microphone device; extraction means for extracting, from the voice signal captured by the voice capturing means, the voice signal component of a contact sound produced when a physical pointer comes into contact with the display screen; and analysis means for executing required analysis processing on the voice signal component of the contact sound extracted by the extraction means, and wherein the information processing means is configured to be capable of executing required information processing based on the analysis result of the analysis means.

5. A program for causing an information processing device to execute: an image capturing procedure for capturing a captured image taken by an imaging device arranged to image a display screen of the information processing device; an image part recognition procedure for recognizing the image part area of the display screen in the captured image captured by the image capturing procedure; a point position recognition procedure for recognizing the position indicated by a pointer present in the image part area recognized by the image part recognition procedure as an actual point position on the display screen; and an information processing procedure capable of executing required information processing according to the point position recognized by the point position recognition procedure.

6. The program according to claim 5, further causing the information processing device to execute: a voice capturing procedure for capturing a voice signal picked up by a microphone device; an extraction procedure for extracting, from the voice signal captured by the voice capturing procedure, the voice signal component of a contact sound produced when a physical pointer comes into contact with the display screen; and an analysis procedure for executing required analysis processing on the voice signal component of the contact sound extracted by the extraction procedure, wherein the information processing procedure is configured to cause the information processing device to execute required information processing based on the analysis result of the analysis procedure.

7. An information processing apparatus comprising: display means for displaying and outputting an image on a display screen; audio capturing means for capturing an audio signal picked up by a microphone device; extraction means for extracting, from the audio signal captured by the audio capturing means, an audio signal component of a contact sound generated when a physical pointer comes into contact with the display screen; analysis means for executing required analysis processing on the audio signal component of the contact sound extracted by the extraction means; and information processing means capable of executing required information processing based on an analysis result of the analysis means.
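
The claims leave the extraction method open. A minimal sketch, assuming the contact sound carries most of its energy above a hypothetical 1 kHz cutoff while background noise such as mains hum sits lower, is a first-order high-pass filter followed by frame-energy gating. The cutoff, frame length, RMS threshold, and function names are illustrative assumptions, not values taken from the patent.

```python
import math
from typing import List, Sequence, Tuple


def high_pass(samples: Sequence[float], sample_rate: float, cutoff_hz: float) -> List[float]:
    """First-order RC high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    out: List[float] = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def extract_contact_frames(
    samples: Sequence[float],
    sample_rate: float,
    frame_len: int = 256,
    cutoff_hz: float = 1000.0,  # hypothetical cutoff separating contact sound from hum
    threshold: float = 0.05,    # hypothetical RMS gate on the filtered signal
) -> List[Tuple[int, float]]:
    """Return (frame_index, rms) for each full frame whose filtered RMS exceeds threshold.

    A trailing partial frame is ignored.
    """
    filtered = high_pass(samples, sample_rate, cutoff_hz)
    frames: List[Tuple[int, float]] = []
    for start in range(0, len(filtered) - frame_len + 1, frame_len):
        chunk = filtered[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in chunk) / frame_len)
        if rms > threshold:
            frames.append((start // frame_len, rms))
    return frames
```

The frames returned here would be the input to the analysis step; a real device would likely need its cutoff and threshold tuned to its own microphone and screen material.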

8. The information processing apparatus according to claim 7, wherein the analysis means is configured to analyze the intensity level of the contact sound in a predetermined number of steps.
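
Analyzing the strength level of the contact sound in predetermined steps could, for example, be done by quantizing the peak amplitude of the extracted component against fixed boundaries. In this sketch the boundary values and the choice of peak (rather than RMS) amplitude are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical boundaries (normalized peak amplitude) separating four intensity levels.
LEVEL_BOUNDARIES = (0.1, 0.3, 0.6)


def intensity_level(component: Sequence[float],
                    boundaries: Sequence[float] = LEVEL_BOUNDARIES) -> int:
    """Quantize the peak absolute amplitude into len(boundaries) + 1 levels, 0 = weakest.

    A peak exactly on a boundary is assigned to the higher level.
    """
    peak = max((abs(s) for s in component), default=0.0)
    return bisect_right(boundaries, peak)
```

Using a sorted tuple with `bisect_right` keeps the number of steps configurable: adding a boundary adds a level without changing the code.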

9. The information processing apparatus according to claim 7, wherein the analysis means is configured to analyze the temporal continuity of the contact sound.
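
Temporal continuity could be analyzed, for instance, by grouping consecutive frames flagged as containing contact sound into runs and comparing each run's length with a threshold. The per-frame flags, the hypothetical `min_continuous` threshold, and the "tap"/"continuous" labels are illustrative assumptions rather than details from the patent; a short run might correspond to a click-like tap and a long run to sustained contact, such as the pointer being slid across the screen.

```python
from typing import List, Optional, Sequence, Tuple


def contact_runs(active: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (start_frame, length) for each maximal run of consecutive active frames."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, on in enumerate(active):
        if on and start is None:
            start = i
        elif not on and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # run still open at the end of the signal
        runs.append((start, len(active) - start))
    return runs


def classify_runs(active: Sequence[bool], min_continuous: int = 5) -> List[Tuple[int, str]]:
    """Label each run 'continuous' if it lasts >= min_continuous frames (hypothetical), else 'tap'."""
    return [(start, "continuous" if length >= min_continuous else "tap")
            for start, length in contact_runs(active)]
```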

10. An information processing system comprising at least a microphone device and an information processing apparatus, wherein the information processing apparatus comprises: display means for displaying and outputting an image on a display screen; audio capturing means for capturing an audio signal picked up by the microphone device; extraction means for extracting, from the audio signal captured by the audio capturing means, the audio signal component of a contact sound generated when a physical pointer comes into contact with the display screen; analysis means for executing required analysis processing on the audio signal component of the contact sound extracted by the extraction means; and information processing means capable of executing required information processing based on an analysis result of the analysis means.

11. The information processing system according to claim 10, wherein the analysis means is configured to analyze the intensity level of the contact sound in a predetermined number of steps.

12. The information processing system according to claim 10, wherein the analysis means is configured to analyze the temporal continuity of the contact sound.

13. A program for causing an information processing apparatus to execute: an audio capturing procedure for capturing an audio signal picked up by a microphone device; an extraction procedure for extracting, from the audio signal captured by the audio capturing procedure, the audio signal component of a contact sound generated when a physical pointer comes into contact with a display screen of the information processing apparatus; an analysis procedure for executing required analysis processing on the audio signal component of the contact sound extracted by the extraction procedure; and an information processing procedure capable of executing required information processing based on an analysis result of the analysis procedure.

14. The program according to claim 13, wherein the analysis procedure is configured to analyze the intensity level of the contact sound in a predetermined number of steps.

15. The program according to claim 13, wherein the analysis procedure is configured to analyze temporal continuity of the contact sound.