JP5068711B2 - Object shape recognition system and object shape recognition method

Info

Publication number: JP5068711B2
Application number: JP2008205609A
Authority: JP
Inventors: 美木子中西; 力堀越; 雅朗福本
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2008-08-08
Filing date: 2008-08-08
Publication date: 2012-11-07
Anticipated expiration: 2028-08-08
Also published as: JP2010038879A

Abstract

PROBLEM TO BE SOLVED: To easily recognize the shape of an object.

SOLUTION: This device 10 for recognizing an object shape includes a camera 11 for imaging the object whose shape is to be recognized, a microphone 12 and sound detecting section 13 for detecting sound generated from the object, a position detecting section 14 for detecting the position at which the sound is generated in an image imaged by the camera 11 at a timing at which the sound is detected, and a shape estimating section 16 for estimating the shape of the object from the detected position.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an object shape recognition system and an object shape recognition method for recognizing the shape of an object.

When a projector or HMD (head-mounted display) projects video or superimposes it on a real object (augmented reality (AR), mixed reality (MR)), the video is usually projected or superimposed onto a predetermined range or object, such as a screen or a designated desk.

However, if video can be projected or superimposed only on a predetermined object or range, the places where it can be used are limited. If an object or range whose shape is not known could be recognized and video projected or superimposed on it, video could be projected and superimposed anytime and anywhere.

The following methods are known for recognizing the shape of an object. For example, Patent Document 1 discloses a method of recognizing the three-dimensional shape of an object using a distance image. Patent Document 2 discloses a method in which the background is imaged in advance, and Patent Document 3 discloses a method in which a model of the object to be recognized is created in advance.
Patent Document 1: Japanese Patent Laid-Open No. 2-181880
Patent Document 2: JP 2007-26327 A
Patent Document 3: JP 2007-42136 A

However, as the techniques above illustrate, it is very difficult to identify only the shape of an object whose shape is not known in advance. For example, the technique described in Patent Document 1 can recognize the shape of an object, but it cannot single out which object is to be recognized. The techniques described in Patent Documents 2 and 3 require preparation before the object can be recognized; for example, the shape of the object or the situation (background) in which the object is used must be known in advance.

The present invention has been made in view of the above circumstances, and its object is to provide an object shape recognition system and an object shape recognition method that can easily recognize the shape of an object. Here, the shape of the object to be recognized includes a partial range of the object onto which video is projected or superimposed as described above.

To achieve the above object, an object shape recognition system according to the present invention comprises: imaging means for imaging an object whose shape is to be recognized; sound detection means for detecting a predetermined sound; position detection means for detecting a position corresponding to the shape of the object in an image captured by the imaging means at the timing at which the sound was detected by the sound detection means; and shape estimation means for estimating the shape of the object from the position detected by the position detection means.

When the object shape recognition system according to the present invention recognizes the shape of an object, a sound is generated by the user or the like. The object is imaged at the timing at which the sound is generated, and a position corresponding to the shape of the object is detected in the captured image. The shape of the object is then identified from the detected position. In this way, the object shape recognition system according to the present invention estimates the shape of the object by imaging the object, detecting a sound, and detecting a position corresponding to the shape of the object. These operations are simpler than conventional shape recognition, and as a result the object shape recognition system according to the present invention can easily recognize the shape of an object.

It is desirable that the sound detection means detects a sound generated from the object as the predetermined sound, and the position detection means detects the position at which the sound was generated as the position corresponding to the shape of the object. In this configuration, the user or the like generates a sound from a given position on the object whose shape is to be recognized. The object is imaged at the timing at which the sound is generated, and the position at which the sound was generated is detected in the captured image. The shape of the object is then identified from the detected position. That is, this configuration estimates the shape of the object by imaging the object, detecting the sound, and detecting the position at which the sound was generated. As a result, the shape of the object can be recognized more easily.

It is desirable that the sound generated from the object is the sound of the object being struck, and that the position detection means stores in advance information on the thing that strikes the object and, based on that information, detects the position at which the object was struck as the position at which the sound was generated. Striking the object whose shape is to be recognized generates a sound easily and reliably. In addition, because what the object is struck with is decided in advance, this configuration reliably detects the position at which the sound was generated. That is, according to this configuration, the shape of the object can be recognized more easily and reliably.

It is desirable that the sound detection means stores in advance information on the sound to be detected and detects the predetermined sound based on that information. According to this configuration, the predetermined sound can be detected reliably, and the shape of the object can therefore be recognized reliably.

It is desirable that the position detection means detects a plurality of positions corresponding to the shape of the object, and the shape estimation means estimates the shape of the object from the plurality of positions detected by the position detection means. According to this configuration, the shape of the object is estimated from a plurality of positions, so the shape of the object can be recognized appropriately.

It is desirable that the imaging means images the object over a plurality of times, that the system further comprises position tracking means for detecting, in the time-varying images captured by the imaging means, a position corresponding to the position detected by the position detection means, and that the shape estimation means estimates the shape of the object from the position detected by the position tracking means. According to this configuration, the detected position is tracked, so that even if, for example, the object itself or the imaging direction moves while a plurality of positions are being detected, the shape of the object can be estimated from appropriate positions. That is, according to this configuration, the shape of the object can be recognized more appropriately.

It is desirable that the object shape recognition system further comprises projection means for projecting video according to the shape of the object estimated by the shape estimation means. According to this configuration, video can be projected onto the recognized shape, and the projection and superimposition described above can be performed appropriately.

It is desirable that the imaging means images the object over a plurality of times, that the system further comprises shape tracking means for detecting, in the time-varying images captured by the imaging means, a shape corresponding to the shape of the object estimated by the shape estimation means, and that the projection means projects video according to the shape detected by the shape tracking means. According to this configuration, the projection and superimposition described above can be performed appropriately even if the object itself or the imaging direction moves.

The present invention can be described as an invention of an object shape recognition system as above, and it can also be described as an invention of an object shape recognition method as follows. The two differ only in category; they are substantially the same invention and provide the same operation and effects.

That is, an object shape recognition method according to the present invention includes: an imaging step of imaging an object whose shape is to be recognized; a sound detection step of detecting a predetermined sound; a position detection step of detecting a position corresponding to the shape of the object in an image captured in the imaging step at the timing at which the sound was detected in the sound detection step; and a shape estimation step of estimating the shape of the object from the position detected in the position detection step.

In the present invention, the shape of an object is estimated by imaging the object whose shape is to be recognized, detecting a sound, and detecting the position at which the sound was emitted. These operations are simpler than conventional shape recognition, and as a result the present invention makes it possible to recognize the shape of an object easily.

Hereinafter, preferred embodiments of an object shape recognition system and an object shape recognition method according to the present invention will be described in detail with reference to the drawings. In the description of the drawings, the same elements are denoted by the same reference numerals, and redundant description is omitted.

FIG. 1 schematically shows the external configuration of an object shape recognition apparatus 10, which is an embodiment of the object shape recognition system according to the present invention. The object shape recognition apparatus 10 is an apparatus for recognizing the shape of an object; in this embodiment it specifically has the following functions. As shown in FIG. 1, the object shape recognition apparatus 10 is a glasses-type display shaped so that the user can wear it. The object shape recognition apparatus 10 functions as an input device for a computer. Specifically, it projects video 30 of an input device such as a (virtual) keyboard or touch panel so that the video is superimposed on a predetermined object 20, and lets the user see the video of that input device. The object shape recognition apparatus 10 detects the user's operation with a finger or pen on the video (on the location corresponding to it) and uses the operation as input to the computer.

The predetermined object 20 on which the video 30 of the input device is superimposed preferably has a flat surface; examples are a notebook or pocket diary carried by the user, or a fixed wall. The video 30 of the input device may be superimposed by projecting it onto the object 20 itself. Alternatively, when the glasses-type display is of the optical see-through type, only the video 30 of the input device may be projected onto the lenses of the glasses-type display so that it appears superimposed on the object 20 when the user views the object 20 through the lenses. When the glasses-type display is of the video see-through type, the object 20 captured by the camera may be projected onto the lenses at the same time. In that case, the same video may be projected to both eyes, or the view from each eye may be estimated from the distance between the right and left eyes and a separate video prepared for each eye. Projecting a separate video matched to each eye shows the user video with a stereoscopic effect (depth).

Here, the user specifies the shape (or range) of the object 20 onto which the video 30 is projected. As shown in FIG. 2, the user does this, for example, by tapping a corner (feature point) 21 of the shape of the object 20 with a finger to make a sound. The specified shape of the object 20 is recognized by the object shape recognition apparatus 10. That is, the object whose shape is to be recognized in this embodiment is the object 20 on which the video 30 is superimposed. Shape recognition is described in more detail later.

Next, the functions of the object shape recognition apparatus 10 will be described. As shown in FIG. 3, the object shape recognition apparatus 10 includes a camera 11, a microphone 12, a sound detection unit 13, a position detection unit 14, a position tracking unit 15, a shape estimation unit 16, a shape tracking unit 17, a display 18, and a video storage unit 19.

The camera 11 is imaging means that images the object 20 whose shape is to be recognized. The camera 11 images the object 20 over a plurality of times; that is, it captures the object 20 as a moving image. The camera 11 is mounted so that its imaging direction matches the user's line of sight, that is, the optical axis direction of the lenses of the glasses-type display. The camera 11 therefore images the object 20 when the user looks toward the object 20 (looking toward the object 20 turns the imaging direction toward the object 20). The camera 11 may be built into the glasses-type display, or it may be a separate unit installed in the surroundings or carried by the user. The camera 11 outputs the captured images (data) to the position detection unit 14. It also outputs the images to the position tracking unit 15 and the shape tracking unit 17 for the position and shape tracking described later.

The microphone 12 is one component of the sound detection means that detects the sound generated from the object 20. The microphone 12 collects sound generated at and around the object 20 and outputs the collected sound (data) to the sound detection unit 13.

The sound detection unit 13 is one component of the sound detection means; it detects that the sound collected by the microphone 12 contains a specific sound generated from the object 20. The specific sound generated from the object 20 is the sound of the user striking the object 20. Specifically, the sound detection unit 13 stores in advance information on the sound of the object 20 being struck, for example a sound pattern (rhythm, voice, volume), and performs the detection by judging whether the sound input from the microphone 12 matches that pattern (or contains a portion that matches it). That is, the sound detection unit 13 stores (registers) the sound to be detected in advance and detects the stored sound as the sound generated from the object 20. The registration may be done by the developer of the object shape recognition apparatus 10, or users may register a pattern of their own choosing. When the specific sound is detected, the sound detection unit 13 notifies the position detection unit 14.
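The pattern comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name find_tap, the threshold value, and the use of normalized cross-correlation are not taken from the specification, which says only that a stored sound pattern is compared with the microphone input.

```python
import math

def find_tap(samples, template, threshold=0.9):
    """Return the first index where `template` matches `samples` with a
    normalized correlation of at least `threshold`, or None if none does.
    (Hypothetical helper; the matching criterion is an assumption.)"""
    n = len(template)
    t_mean = sum(template) / n
    t_dev = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    if t_norm == 0:
        return None  # a flat template carries no pattern to match
    for start in range(len(samples) - n + 1):
        window = samples[start:start + n]
        w_mean = sum(window) / n
        w_dev = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if w_norm == 0:
            continue  # silent window
        score = sum(a * b for a, b in zip(t_dev, w_dev)) / (t_norm * w_norm)
        if score >= threshold:
            return start
    return None
```

Because the correlation is normalized, a louder or quieter tap with the same waveform still matches; a volume check, which the specification also mentions, would have to be added separately.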

The position detection unit 14 is position detection means that detects the position (feature point) at which the specific sound was generated in the image captured by the camera 11 at the timing at which the sound detection unit 13 detected the specific sound. Specifically, the position detection unit 14 detects this position from the image input from the camera 11 at the timing at which the sound detection unit 13 notifies it that the specific sound has been detected. The position detection unit 14 stores in advance information on the thing that strikes the object 20, uses that information to detect the position of the striking thing (the position at which the object 20 was struck) in the image at that timing, and takes the detected position as the position at which the specific sound was generated. The thing that strikes the object 20 is designated in advance and is, for example, the user's finger or a tool such as a stick used by the user.

The position in the image of the thing striking the object 20, such as the user's finger, is detected by image processing. Specifically, for example, the position detection unit 14 stores skin color, the color of the finger, in advance as the information on the thing striking the object 20, finds the largest skin-colored region in the image (the region with the most pixels), and takes the topmost coordinate of that region as the detected position. Alternatively, it may detect the contour irregularities of the skin-colored region, recognize a specific shape stored in advance (for example, an ellipse), and take a predetermined point of that shape as the detected position. As another alternative, a hand model may be created and stored in the position detection unit 14 in advance, the fingertip detected based on that model, and the fingertip point taken as the detected position. The position detection unit 14 outputs the (two-dimensional) image coordinates of the position detected in this way to the position tracking unit 15 and the shape estimation unit 16. The position detection is performed a plurality of times (for example, three times) to detect the shape of one object 20. The more times it is performed, the more detailed the recognition of the shape of the object 20 can be.
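As a minimal sketch of the first approach (largest skin-colored region, topmost coordinate), the following code assumes that the image has already been reduced to a boolean mask of skin-colored pixels. The function name fingertip_from_mask, the mask representation, and the choice of 4-connectivity are assumptions made for illustration.

```python
from collections import deque

def fingertip_from_mask(mask):
    """mask: list of rows of booleans (True = skin-colored pixel).
    Returns (x, y) of the topmost pixel of the largest 4-connected
    region, or None if no pixel is skin-colored."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected region.
            region, queue = [], deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                region.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(region) > len(best):
                best = region
    if not best:
        return None
    # Smallest y is the top of the image; ties go to the leftmost pixel.
    return min(best, key=lambda p: (p[1], p[0]))
```

Taking the topmost pixel presumes that the finger points upward in the camera image, which fits a head-mounted camera looking along the user's line of sight.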

In the above description the position detection unit 14 is notified by the sound detection unit 13 of the timing at which to detect the position from the image, but this notification is not essential. For example, each image captured by the camera 11 may be associated with its capture time, and the position detection unit 14 may receive from the sound detection unit 13 the time at which the specific sound was detected and identify the image to be used for position detection based on that time.
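The timestamp-based alternative can be illustrated with a short sketch. It assumes that the capture times are kept in a sorted list and that the frame nearest in time to the sound is the one to use; the specification does not say how the image is chosen from the time, so the nearest-frame rule and the name frame_at are assumptions.

```python
from bisect import bisect_left

def frame_at(frame_times, sound_time):
    """frame_times: sorted capture times (e.g. in milliseconds).
    Returns the index of the frame captured closest to sound_time."""
    if not frame_times:
        raise ValueError("no frames")
    i = bisect_left(frame_times, sound_time)
    if i == 0:
        return 0
    if i == len(frame_times):
        return len(frame_times) - 1
    before, after = frame_times[i - 1], frame_times[i]
    # Prefer the earlier frame when the two are equally close.
    return i if after - sound_time < sound_time - before else i - 1
```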

The position tracking unit 15 is position tracking means that detects (tracks), in the images captured by the camera 11, the position corresponding to a position detected by the position detection unit 14. The camera 11 keeps imaging while the position detection unit 14 detects the plurality of positions, and the captured image changes over time. If, for example, the user is holding the object 20 in the hand, a point detected first may move within the image before the next point is detected. The imaging direction of the camera 11 may also move, which likewise shifts the detected point within the image. Position tracking by the position tracking unit 15 makes it possible to estimate the shape properly when the positions are identified using images from a plurality of times in this way.

Specifically, the position tracking unit 15 extracts, from the image used for detection by the position detection unit 14, the image of a predetermined range around the coordinates detected by the position detection unit 14. It stores that image as an image representing the features of the detected position. The position tracking unit 15 then finds, in each image to be tracked, the portion corresponding to the stored image and determines the tracked position (two-dimensional image coordinates) from that portion. This detection is performed using, for example, a feature point tracking method based on optical flow or the like. The position tracking unit 15 outputs the coordinate data of the tracked position to the shape estimation unit 16.
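The following sketch illustrates the idea of finding the stored patch again in a later frame. Note that it substitutes a simple block-matching search (sum of squared differences within a small search window) for the optical-flow-based tracking named in the text; the function name track_patch, the grayscale list-of-lists image format, and the search radius are assumptions.

```python
def track_patch(frame, patch, guess, radius=2):
    """frame, patch: 2D lists of gray levels. guess: (x, y) of the patch's
    top-left corner in the previous frame. Searches positions within
    `radius` pixels of the guess and returns the top-left (x, y) with the
    smallest sum of squared differences, or None if none fits the frame."""
    ph, pw = len(patch), len(patch[0])
    fh, fw = len(frame), len(frame[0])
    gx, gy = guess
    best_pos, best_cost = None, None
    for y in range(max(0, gy - radius), min(fh - ph, gy + radius) + 1):
        for x in range(max(0, gx - radius), min(fw - pw, gx + radius) + 1):
            cost = sum((frame[y + j][x + i] - patch[j][i]) ** 2
                       for j in range(ph) for i in range(pw))
            if best_cost is None or cost < best_cost:
                best_pos, best_cost = (x, y), cost
    return best_pos
```

The tracked feature point is then the returned corner plus the offset of the point inside the patch. Restricting the search to a small window keeps the cost low, on the assumption that the point moves only slightly between consecutive frames.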

The shape estimation unit 16 is shape estimation means that estimates the shape of the object 20 from the positions detected by the position detection unit 14 and the positions tracked by the position tracking unit 15. The shape estimated here is a two-dimensional shape (the shape within the image captured by the camera 11). Estimating the shape also includes estimating where the shape is located within the image captured by the camera 11. The shape estimation unit 16 performs shape estimation once the number of positions detected by the position detection unit 14 (and tracked by the position tracking unit 15) reaches a predetermined number (for example, three points). The number of positions to use for shape estimation is stored in the shape estimation unit 16 in advance.

Specifically, the shape estimation unit 16 estimates the shape of the object 20 by, for example, connecting the detected positions. Alternatively, it may take as the estimated shape a figure (for example, a quadrangle) inscribed in the figure formed by connecting the detected positions, or a figure that contains all of the detected positions. As another alternative, the size on the image of the range to be projected onto may be set in advance, and the shape estimated by approximation so that the detected positions fit within that range. The shape estimation unit 16 stores in advance information for estimating the shape (for example, the type of shape, such as a quadrangle, when it is fixed) or rules, and uses that information or those rules to estimate the shape. The shape estimation unit 16 outputs information indicating the estimated shape to the shape tracking unit 17 and the display 18. This information also includes the position of the shape within the image.
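Two of these options can be sketched in a few lines: connecting the detected points into a polygon, and taking a figure that contains all detected points. The function names and the specific choices (ordering by angle around the centroid, an axis-aligned bounding rectangle) are assumptions for illustration; the specification leaves the estimation rule open.

```python
import math

def connect_points(points):
    """Order points by angle around their centroid so that joining
    consecutive points (and closing the loop) gives a simple polygon
    when the points are in convex position."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))

def bounding_rectangle(points):
    """Axis-aligned rectangle (min_x, min_y, max_x, max_y) that contains
    every detected point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

Ordering the points first matters because the user may tap the corners in any order; connecting them in tap order could produce a self-intersecting outline.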

The shape tracking unit 17 is shape tracking means that detects (tracks), in the images captured by the camera 11, the shape corresponding to the shape of the object 20 estimated by the shape estimation unit 16. The camera 11 keeps imaging while the display 18, described later, projects video, and the captured image changes over time. As noted above, the detected shape within the captured image can change as the object 20 moves or the direction of the camera 11 changes. The display 18 projects video according to the detected shape of the object 20, and the shape tracking allows the video to be projected properly when projection continues over a plurality of times.

Specifically, the shape tracking unit 17 obtains information representing the features of the shape from the image at the time the shape estimation unit 16 estimated the shape. For example, it extracts the image of the range of the shape estimated by the shape estimation unit 16 and stores that image as an image (template) representing the features of the detected shape. The shape tracking unit 17 then finds, in each image to be tracked, the portion corresponding to the stored image (template) and takes that portion as the shape of the object 20 in the image being tracked. This detection is performed using, for example, a pattern matching (template matching) technique. The shape tracking unit 17 outputs information indicating the tracked shape to the display 18.

The information representing the features of the shape that is compared with the image to be tracked need not be an image itself. For example, the color information (a histogram or the average color) of the extracted range may be used as the information representing the features of the shape, and the region of the tracked image having similar color information may be tracked. Information representing the features of the edges of the object 20 may also be used as the template. Alternatively, in the same way as the position tracking unit 15 described above, every position detected by the position detection unit 14 may be tracked and the shape estimated from the tracked positions in the same way as in the shape estimation unit 16, thereby tracking the shape. The shape tracking unit 17 outputs information indicating the tracked shape to the display 18. This information also includes the position of the shape within the image.
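The color-information variant can be illustrated with a small histogram comparison. The sketch below works on gray levels rather than full color to stay short, and it scores similarity by histogram intersection; both choices, the bin count, and the function names are assumptions rather than anything the specification prescribes.

```python
def gray_histogram(pixels, bins=4, max_value=256):
    """Normalized histogram of integer gray levels in [0, max_value)."""
    counts = [0] * bins
    for v in pixels:
        counts[v * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts]

def histogram_intersection(h1, h2):
    """1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def best_region(template_pixels, candidates):
    """candidates: dict mapping a region name to its list of pixels.
    Returns the name of the region most similar to the template."""
    ref = gray_histogram(template_pixels)
    return max(candidates,
               key=lambda name: histogram_intersection(
                   ref, gray_histogram(candidates[name])))
```

A histogram ignores where pixels sit inside the region, so this kind of comparison tolerates rotation and moderate deformation of the object better than a pixel-by-pixel template, at the cost of confusing regions that merely share the same colors.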

The display 18 is projection means that projects the video 30 in accordance with the shape of the object 20 estimated by the shape estimation unit 16 and the shape of the object 20 tracked by the shape tracking unit 17 (hereinafter collectively called the recognized shape). The display 18 acquires the video to be projected from the video storage unit 19 and projects it. As described above, the display 18 is provided, for example, in the lenses of a glasses-type display, and the video is projected so as to be superimposed on the object 20. Based on the information input from the shape estimation unit 16 or the shape tracking unit 17, the display 18 transforms the video acquired from the video storage unit 19 to fit the recognized shape and projects the transformed video 30. For example, if the image to be projected stored in the video storage unit 19 consists of characters of equal size arranged in a rectangular area as shown in FIG. 4(a), and the recognized shape is a trapezoid, the image is transformed so that the characters on the short (left) side are smaller than those on the long (right) side, as shown in FIG. 4(b). In addition, based on the information input from the shape estimation unit 16 or the shape tracking unit 17, the display 18 projects the video 30 at the location where the object 20 appears in the image captured by the camera 11 (so that the user sees it there).
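One way to picture the rectangle-to-trapezoid fitting of FIG. 4 is bilinear interpolation between the four corners of the recognized shape. This is a simplified stand-in chosen for illustration (a true perspective warp would use a projective transform), and the function name `map_to_quad` and the corner coordinates are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def map_to_quad(u: float, v: float, corners: Sequence[Point]) -> Point:
    """Map normalized source coordinates (u, v) in [0, 1] x [0, 1] into a quadrilateral.

    corners are given as top-left, top-right, bottom-right, bottom-left.
    """
    (ax, ay), (bx, by), (cx, cy), (dx, dy) = corners
    x = (1 - u) * (1 - v) * ax + u * (1 - v) * bx + u * v * cx + (1 - u) * v * dx
    y = (1 - u) * (1 - v) * ay + u * (1 - v) * by + u * v * cy + (1 - u) * v * dy
    return (x, y)


# Trapezoid whose left side (height 6) is shorter than its right side (height 10).
trapezoid = [(0.0, 2.0), (10.0, 0.0), (10.0, 10.0), (0.0, 8.0)]

# Height available to a column of text at the left edge and at the right edge.
left_height = map_to_quad(0.0, 1.0, trapezoid)[1] - map_to_quad(0.0, 0.0, trapezoid)[1]
right_height = map_to_quad(1.0, 1.0, trapezoid)[1] - map_to_quad(1.0, 0.0, trapezoid)[1]
center = map_to_quad(0.5, 0.5, trapezoid)
```

Characters drawn near the left edge come out smaller than those near the right edge, which matches the behavior described for FIG. 4(b).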

This transformation and alignment are realized by the display 18 applying existing image transformation processing, such as enlargement, reduction, rotation, and translation, to the video acquired from the video storage unit 19. For example, the image transformation is performed by transforming the video (data) acquired from the video storage unit 19 with the following transformation matrix M.

[Equation: transformation matrix M]

In the above formula, R_1x, R_2x, R_3x, R_1y, R_2y, R_3y, R_1z, R_2z, and R_3z are rotation parameters, and ΔX, ΔY, and ΔZ are translation parameters. The transformation matrix M rotates and translates the coordinates (X, Y, Z) of the image to be projected about and along each axis so that they match the coordinates (x, y, z) of the recognized shape of the object 20. Because the recognized shape and the image to be projected are two-dimensional, z = Z = 0. The display 18 calculates these parameters from the recognized shape, the shape of the video acquired from the video storage unit 19, and information indicating their respective positions, and then performs the transformation. When the recognized shape is rotated, the rotation parameters R for each rotation axis can be obtained from the rotation angle θa of the recognized shape. When the recognized shape is translated, the parameters among ΔX, ΔY, and ΔZ that lie along the axes of movement are set.
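The sketch below shows how a matrix combining rotation and translation parameters can be applied to image coordinates. The matrix layout (a 4x4 homogeneous matrix with the rotation block in the upper left and the translation in the last column) is an assumption, since the formula itself is not reproduced in this text, and only a rotation about the z axis is illustrated. The names `make_transform` and `apply_transform` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def make_transform(theta_z: float, dx: float, dy: float, dz: float = 0.0) -> Matrix:
    """Homogeneous matrix: rotate by theta_z about the z axis, then translate."""
    c, s = math.cos(theta_z), math.sin(theta_z)
    return [
        [c, -s, 0.0, dx],   # rotation parameters | translation in X
        [s, c, 0.0, dy],    # rotation parameters | translation in Y
        [0.0, 0.0, 1.0, dz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply_transform(m: Matrix, point: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Map image coordinates (X, Y, Z) to recognized-shape coordinates (x, y, z)."""
    vec = (point[0], point[1], point[2], 1.0)
    x, y, z, _ = (sum(m[i][k] * vec[k] for k in range(4)) for i in range(4))
    return (x, y, z)


# Rotate by 90 degrees and shift by (2, 3); Z stays 0 for a two-dimensional image.
M = make_transform(math.pi / 2, 2.0, 3.0)
x, y, z = apply_transform(M, (1.0, 0.0, 0.0))
```

Here the point (1, 0, 0) lands at approximately (2, 4, 0): the rotation moves it to (0, 1, 0) and the translation then shifts it.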

The display 18 transforms the video as described above and projects it. As a result, the superimposed image 30 is tilted to match the tilt of the object 20, as shown in FIG. 4(b).

The video storage unit 19 stores the video (data) 30 to be projected by the display 18 and outputs the video to the display 18 in response to a request from the display 18.

The object shape recognition device 10 also has functions (not shown) such as detecting an input operation that the user performs on (the portion corresponding to) an image of a (virtual) keyboard, touch panel, or the like projected by the display 18, and treating it as input information. This recognition can be performed, for example, by recognizing sound and finger position as described above, using the image captured by the camera 11. The finger position needs to be detected only within the (detected or tracked) shape of the object 20 in the image from the camera 11, which makes detection faster and more accurate than searching the whole image for the finger. These are the functions of the object shape recognition device 10.
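Restricting finger detection to the recognized shape amounts to testing whether a candidate point lies inside the shape's outline. A minimal sketch using the standard ray-casting test follows; the function name `point_in_polygon` and the trapezoid coordinates are assumptions for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray to the right of the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Recognized shape in image coordinates (a made-up trapezoid).
recognized_shape = [(0.0, 2.0), (10.0, 0.0), (10.0, 10.0), (0.0, 8.0)]

on_object = point_in_polygon((5.0, 5.0), recognized_shape)
off_object = point_in_polygon((5.0, 0.5), recognized_shape)
```

Candidate fingertip positions for which the test returns False can be discarded before any further processing.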

As shown in FIG. 5, the object shape recognition device 10 includes a computer having hardware such as a CPU (Central Processing Unit) 101, a RAM (Random Access Memory) 102 and a ROM (Read Only Memory) 103 serving as main storage, and an auxiliary storage device 104 such as a hard disk. As further hardware, the object shape recognition device 10 includes the camera 11, the microphone 12, and the display 18 described above. The functions of the object shape recognition device 10 described above are realized by the operation of these components.

Next, the processing executed by the object shape recognition device 10 according to this embodiment (the object shape recognition method) is described with reference to the flowchart of FIG. 6. This processing is performed when the user uses the input device function of the object shape recognition device 10 described above. It starts when the user puts on the object shape recognition device 10 and performs an operation that starts the function.

First, in the object shape recognition device 10, the camera 11 starts imaging the object 20 whose shape is to be recognized (S01, imaging step). The user wearing the object shape recognition device 10 faces the object 20, so that the imaging direction of the camera 11 points toward the object 20. Imaging continues throughout this processing. Each captured image is output to the position detection unit 14, the position tracking unit 15, and the shape tracking unit 17 as it is captured.

Next, the user taps the object 20 with something set in advance, such as a finger. The user taps a location from which the shape of the object 20 can be recognized, such as a corner 21 of the object 20, as described above. When the user taps the object 20, the microphone 12 picks up the sound and inputs it to the sound detection unit 13. The sound detection unit 13 then detects the sound of the object 20 being tapped (S02, sound detection step). When the sound is detected, the sound detection unit 13 notifies the position detection unit 14.

When notified by the sound detection unit 13 that the sound has been detected, the position detection unit 14 detects the position (feature point) at which the sound was generated in the image captured by the camera 11 (S03, position detection step). Information indicating the detected feature point is output from the position detection unit 14 to the position tracking unit 15 and the shape estimation unit 16. The processing then branches as follows according to whether the number of detected feature points has reached three, the number needed to estimate the shape (S04).

If fewer than three feature points have been detected, the position tracking unit 15 tracks, in the images from the camera 11, the positions corresponding to the detected positions (S05, position tracking step). Information indicating the tracked positions is output from the position tracking unit 15 to the shape estimation unit 16. This position tracking continues until three feature points have been detected. The sound detection (S02) and position detection (S03) described above are also performed.

On the other hand, if three or more feature points have been detected after S03, the shape estimation unit 16 estimates the shape of the object 20 (S06, shape estimation step). Information indicating the estimated shape is output from the shape estimation unit 16 to the shape tracking unit 17 and the display 18. To track the estimated shape, the shape tracking unit 17 acquires and stores information indicating the features of the shape, based on the information indicating the estimated shape (S07, shape tracking step).
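The branch at S04 can be sketched as a loop that collects a feature point for every detected tap until three are available. This is a simplified illustration that omits the position tracking of S05; the function name `collect_feature_points` and the per-frame input format are assumptions.

```python
from typing import Iterable, List, Optional, Tuple

Point = Tuple[int, int]
REQUIRED_POINTS = 3  # number of feature points needed to estimate the shape


def collect_feature_points(
    frames: Iterable[Tuple[bool, Point]]
) -> Optional[List[Point]]:
    """frames yields (sound_detected, fingertip_position) once per captured image."""
    points: List[Point] = []
    for sound_detected, fingertip in frames:
        if not sound_detected:               # S02: no tap sound in this frame
            continue
        points.append(fingertip)             # S03: position where the sound occurred
        if len(points) >= REQUIRED_POINTS:   # S04: enough points, go on to S06
            return points
    return None                              # input ended before three taps


frames = [
    (False, (0, 0)),
    (True, (1, 1)),
    (True, (5, 1)),
    (False, (3, 3)),
    (True, (5, 4)),
    (True, (9, 9)),   # never reached: three points were already collected
]
feature_points = collect_feature_points(frames)
```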

Next, the display 18 acquires the video to be projected from the video storage unit 19. The display 18 then calculates the parameters for transforming the video, based on the information indicating the shape of the object 20 input from the shape estimation unit 16 and the shape tracking unit 17 (S08, projection step). The display 18 transforms the image to be projected using the calculated parameters (S09, projection step), and then projects the transformed video 30 so that it is superimposed on the object 20 as described above (S10, projection step).

The projected video is an image of a (virtual) keyboard, touch panel, or the like, and the user performs input operations on (the portion corresponding to) the image. When an input operation is performed, the object shape recognition device 10 detects it and treats it as input information (S11).

While the display 18 is projecting the video, the shape tracking unit 17 tracks the shape of the object 20 in the images from the camera 11 (S12, shape tracking step). Information indicating the tracked shape is output to the display 18, and the video projection processing and so on (S08 to S11) is performed based on the tracked shape. This concludes the processing executed by the object shape recognition device 10 according to this embodiment.

As described above, in this embodiment, when the shape of the object 20 is to be recognized, a sound is generated by the user or the like tapping the object 20. The object 20 is being imaged at the time the sound is generated, and the position at which the sound was generated is detected in the captured image. The shape of the object 20 is determined from the detected positions. That is, this embodiment estimates the shape of the object 20 by imaging the object 20, detecting a sound, and detecting the position at which the sound was generated. These operations are simpler than conventional shape recognition, so this embodiment makes it easy to recognize the shape of the object 20.

Tapping the object 20, as in this embodiment, is an easy way to make the object 20 produce a sound. By deciding in advance what the object is tapped with (for example, the user's finger), information about the finger or the like can be stored beforehand, so that the position at which the sound was generated is detected reliably. With this configuration, the shape of the object 20 can be recognized more easily and reliably.

Furthermore, because the portion the user taps with a finger is recognized as the shape, when the shape is used as the area onto which the display 18 projects video, as in this embodiment, the user can specify the projection area freely and easily. That is, this embodiment recognizes an appropriate shape. In addition, there is no need to hold in advance any information about the object 20 to be recognized or about the background of the captured image, which also makes this embodiment easy to implement.

Moreover, storing information about the sound to be detected in advance, as in this embodiment, allows the generated sound to be detected reliably, and thus the shape of the object 20 to be recognized reliably. Storing sound information is not essential, however. For example, a sound may be detected whenever a sound of at least a certain loudness (a sound whose volume exceeds a preset threshold) occurs.
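The threshold-based alternative can be sketched as follows. Measuring volume as the root mean square of a block of audio samples is an assumption made for illustration, as are the names `rms` and `is_tap` and the threshold value.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one block of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_tap(samples: Sequence[float], threshold: float) -> bool:
    """Report a detection when the block's volume exceeds the preset threshold."""
    return rms(samples) > threshold


THRESHOLD = 0.3  # hypothetical value; it would be tuned for the microphone in use

quiet_block = [0.01, -0.02, 0.01, 0.0]
loud_block = [0.8, -0.9, 0.7, -0.6]

quiet_detected = is_tap(quiet_block, THRESHOLD)
loud_detected = is_tap(loud_block, THRESHOLD)
```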

It is also preferable to estimate the shape from a plurality of detected positions, as in this embodiment. This configuration allows the shape of the object to be recognized appropriately. In this embodiment the shape is estimated from three positions, but it may be estimated from more.
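As an illustration of how three positions can determine a shape, the sketch below completes a parallelogram from three tapped corners. Treating the outline as a parallelogram and assuming the corners are tapped in order around it are both assumptions made for this example; the function name `complete_parallelogram` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def complete_parallelogram(p1: Point, p2: Point, p3: Point) -> List[Point]:
    """Given three consecutive corners, return all four corners of the parallelogram.

    The fourth corner is p1 + p3 - p2, the point opposite p2.
    """
    p4 = (p1[0] + p3[0] - p2[0], p1[1] + p3[1] - p2[1])
    return [p1, p2, p3, p4]


# Three tapped corners in image coordinates (made-up values).
outline = complete_parallelogram((0.0, 0.0), (4.0, 0.0), (4.0, 3.0))
```

With more than three detected positions, the outline could instead be taken directly from the points without assuming a parallelogram.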

It is also preferable to track the detected positions, as in this embodiment. With this configuration, the shape of the object 20 can be estimated from appropriate positions even if, for example, the object 20 itself or the imaging direction of the camera 11 moves while the positions are being detected. That is, the shape of the object 20 can be recognized more appropriately. This configuration is not essential, however, when the object 20 and the camera 11 are fixed, or when the positions are detected from a single image or within a short time.

With the configuration of this embodiment, video is projected to fit the object 20 on which the user wants to project, so the projection and superimposition described above can be performed appropriately. For example, video can be projected naturally onto a notebook or planner the user is carrying. If projection is performed while tracking the shape as described above, the projection and superimposition can be performed appropriately even if the object 20 itself or the imaging direction of the camera 11 moves. As with position tracking, however, this configuration is not essential when the object 20 and the camera 11 are fixed.

By performing the recognition of the shape of the object 20 and the projection of the video (including the tracking described above) as a single series of processes, the video can be projected in real time, which makes the virtual input device described above more usable.

In the embodiment described above, the user makes the object 20 produce a sound, for example by tapping it, but the sound need not come from the object 20. For example, if the surroundings are so noisy that the sound of tapping the object 20 cannot be picked up, the microphone 12 and the sound detection unit 13 may detect the user's voice instead. In that case, the position detection unit 14 detects the point at which the user's finger is pointing in the image captured by the camera 11 at the time the user's voice is detected. The detected point is a position corresponding to the shape of the object 20.

When the surroundings are noisy, noise can also be removed by analyzing the noise picked up by the microphone 12 in real time and building a noise model.
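A very simple stand-in for a real-time noise model is a running estimate of the background level, with a tap reported only when a frame stands well above it. The text does not say what kind of noise model is used, so the exponential moving average, the margin factor, and the name `detect_taps` below are all assumptions.

```python
from typing import List, Sequence


def detect_taps(levels: Sequence[float], alpha: float = 0.1, margin: float = 3.0) -> List[int]:
    """Return indices of frames whose level exceeds margin times the noise floor.

    levels holds one volume measurement per audio frame; the first frame
    initializes the noise floor.
    """
    noise_floor = levels[0]
    taps: List[int] = []
    for i, level in enumerate(levels[1:], start=1):
        if level > margin * noise_floor:
            taps.append(i)  # treat as a tap; keep it out of the noise estimate
        else:
            # Update the running estimate of the background noise level.
            noise_floor = (1 - alpha) * noise_floor + alpha * level
    return taps


levels = [0.10, 0.11, 0.09, 0.50, 0.10, 0.12, 0.60]
tap_frames = detect_taps(levels)
```

Because the floor follows the background level, the same code keeps working when the surroundings become louder or quieter over time.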

In the embodiment described above, all of the positions used to estimate the shape of the object 20 are positions tapped by the user, but markers provided on the object 20 in advance may be used in combination. That is, the position of a marker provided on the object 20 may be detected in the same way as in conventional methods and used as one of the positions from which the shape estimation unit 16 estimates the shape. For example, one marker may be provided on the object 20, and the other two positions may be detected by the user tapping the object 20. A mark with a distinctive color or shape, for example, is used as the marker.

In this embodiment, the image captured by the camera 11 is an ordinary image (captured with visible light), but it need not be; any image from which the features of the object 20 and of whatever taps the object 20 (the user's finger or the like) can be recognized will do. Specifically, an infrared image, a range image, or a thermographic (temperature distribution) image, for example, may be used.

FIG. 1 is a diagram schematically showing the external configuration of an object shape recognition device according to an embodiment of the present invention.
FIG. 2 is a diagram showing how the user taps an object to generate a sound when the shape of the object is to be recognized.
FIG. 3 is a diagram showing the functional configuration of the object shape recognition device according to the embodiment of the present invention.
FIG. 4 is a diagram showing video projected by the object shape recognition device.
FIG. 5 is a diagram showing the hardware configuration of the object shape recognition device according to the embodiment of the present invention.
FIG. 6 is a flowchart showing the processing (object shape recognition method) executed by the object shape recognition device according to the embodiment of the present invention.

Explanation of symbols

10 ... object shape recognition device, 11 ... camera, 12 ... microphone, 13 ... sound detection unit, 14 ... position detection unit, 15 ... position tracking unit, 16 ... shape estimation unit, 17 ... shape tracking unit, 18 ... display, 19 ... video storage unit, 101 ... CPU, 102 ... RAM, 103 ... ROM, 104 ... auxiliary storage device, 20 ... object, 30 ... video.

Claims

Imaging means for imaging an object that is a shape recognition target;
Sound detection means for detecting a predetermined sound;
Position detecting means for detecting a position corresponding to the shape of the object in an image captured by the imaging means at a timing when the sound is detected by the sound detecting means;
Shape estimation means for estimating the shape of the object from the position detected by the position detection means;
An object shape recognition system comprising:

The sound detection means detects a sound generated from the object as the predetermined sound,
The position detecting means detects a position where the sound is generated as a position corresponding to a shape of the object;
The object shape recognition system according to claim 1.

The sound generated from the object is the sound of the object being hit,
The position detecting means stores in advance information relating to the object that strikes the object, and detects the position where the object is struck based on the information as the position where the sound is generated.
The object shape recognition system according to claim 2.

The object shape recognition system according to any one of claims 1 to 3, wherein the sound detection means stores in advance information related to a sound to be detected, and detects the predetermined sound based on the information.

The position detecting means detects a plurality of positions corresponding to the shape of the object;
The shape estimation means estimates the shape of the object from a plurality of positions detected by the position detection means;
The object shape recognition system according to any one of claims 1 to 4.

The imaging means images the object over a plurality of times,
Position following means for detecting, in the time-varying images captured by the imaging means, a position corresponding to the position detected by the position detecting means;
The shape estimating means estimates the shape of the object from the position detected by the position following means;
The object shape recognition system according to any one of claims 1 to 5.

The object shape recognition system according to any one of claims 1 to 6, further comprising projection means for projecting an image in accordance with the shape of the object estimated by the shape estimation means.

The imaging means images the object over a plurality of times,
Shape following means for detecting, in the time-varying images captured by the imaging means, a shape corresponding to the shape of the object estimated by the shape estimation means;
The projecting means projects an image according to the shape detected by the position following means;
The object shape recognition system according to claim 7.

An imaging step of imaging an object whose shape is to be recognized;
A sound detection step for detecting a predetermined sound;
A position detection step of detecting a position corresponding to the shape of the object in the image captured in the imaging step at a timing when the sound is detected in the sound detection step;
A shape estimation step for estimating the shape of the object from the position detected in the position detection step;
An object shape recognition method including: