JP2015510169A

JP2015510169A - Feature improvement by contrast improvement and optical imaging for object detection

Info

Publication number: JP2015510169A
Application number: JP2014552391A
Authority: JP
Inventors: David Holz; Hua Yang
Original assignee: HOLZ David; Leap Motion Inc
Current assignee: HOLZ David; Leap Motion Inc
Priority date: 2012-01-17
Filing date: 2013-01-16
Publication date: 2015-04-02
Also published as: CN107066962A; DE112013000590B4; DE112013000590T5; WO2013109609A2; WO2013109609A3; CN104145276B; CN104145276A; CN107066962B; JP2016186793A

Abstract

Improved contrast between an object and the background surfaces visible in an image is provided by controlled illumination directed at the object. To exploit the falloff of light intensity with distance, a light source (or multiple light sources), such as an infrared light source, can be placed near one or more cameras so that it shines light onto the object while the camera(s) capture images. The captured images can be analyzed to distinguish object pixels from background pixels. [Selected drawing] FIG. 11

Description

This application claims priority to and the benefit of U.S. Serial No. 61/724,068, filed on November 8, 2012, the entire disclosure of which is incorporated herein by reference. In addition, this application claims priority to U.S. Patent Application Nos. 13/414,485 (filed on March 7, 2012) and 13/724,357 (filed on December 21, 2012), and also claims priority to and the benefit of U.S. Provisional Patent Application Nos. 61/724,091 (filed on November 8, 2012) and 61/587,554 (filed on January 17, 2012). These applications are incorporated herein by reference in their entirety.

The present application relates to imaging systems and, in particular, to three-dimensional (3D) object detection, tracking, and characterization using optical imaging.

Motion-capture systems are used in a variety of contexts to obtain information about the conformation and motion of various objects, including objects with articulating members such as human hands or human bodies. Such systems generally include cameras to capture sequential images of an object in motion and computers to analyze the images in order to reconstruct the object's volume, position, and motion. For 3D motion capture, at least two cameras are typically used.

Image-based motion-capture systems rely on the ability to distinguish an object of interest from the background. This is often achieved using image-analysis algorithms that detect edges, typically by comparing pixels to detect abrupt changes in color and/or brightness. Such conventional systems, however, suffer performance degradation under many common circumstances, for example when contrast between the object of interest and the background is low, or when patterns in the background may be falsely detected as object edges.

In some instances, distinguishing the object from the background can be facilitated by “instrumenting” the object of interest, e.g., by having a person wear a mesh of reflectors or active light sources while performing the motion. Special lighting conditions (e.g., low light) can be used to make the reflectors or light sources stand out in the images. Instrumenting the subject, however, is not always a convenient or desirable option.

Certain embodiments of the present invention relate to imaging systems that improve object recognition by enhancing the contrast between an object and the background surfaces visible in an image. This may be accomplished, for example, by means of controlled lighting directed at the object. For instance, in a motion-capture system where the object of interest, such as a person's hand, is significantly closer to the camera than any background surface, the falloff of light intensity with distance (1/r² for point-like light sources) can be exploited by positioning a light source (or multiple light sources) near the camera(s) or other image-capture device(s) and shining that light onto the object. Source light reflected by the nearby object of interest can be expected to be much brighter than light reflected from more distant background surfaces, and the more distant the background (relative to the object), the more pronounced the effect will be. Accordingly, in some embodiments, a cutoff threshold on pixel brightness in the captured images can be used to distinguish “object” pixels from “background” pixels. While broadband ambient light sources can be employed, various embodiments use light in a confined wavelength range and a camera adapted to detect such light. For example, infrared source light can be used with one or more cameras sensitive to infrared frequencies.

Accordingly, in a first aspect, the invention relates to an image capture and analysis system for identifying an object of interest in a digitally represented image scene. In various embodiments, the system comprises at least one camera oriented toward a field of view; at least one light source disposed on the same side of the field of view as the camera and oriented to illuminate the field of view; and an image analyzer coupled to the camera and the light source(s). The image analyzer may be configured to operate the camera(s) to capture a sequence of images, including a first image captured at a time when the light source(s) are illuminating the field of view; to identify pixels corresponding to the object rather than to the background; and, based on the identified pixels, to construct a 3D model of the object, including its position and shape, in order to geometrically determine whether it corresponds to the object of interest. In certain embodiments, the image analyzer distinguishes between (i) foreground image components corresponding to objects located within a proximal zone of the field of view and (ii) background image components corresponding to objects located within a distal zone of the field of view. The proximal zone extends from the camera(s) and has a depth of at least twice the expected maximum distance between the camera(s) and the objects corresponding to the foreground image components, and the distal zone is located beyond the proximal zone relative to the at least one camera. For example, the proximal zone may have a depth of at least four times the expected maximum distance.

In other embodiments, the image analyzer operates the camera(s) to capture second and third images at times when the light source(s) are not illuminating the field of view, and identifies the pixels corresponding to the object based on the difference between the first and second images and the difference between the first and third images. The second image is captured before the first image, and the third image is captured after the second image.

The light source(s) may be, for example, diffuse emitters (e.g., infrared light-emitting diodes, in which case the camera(s) are infrared-sensitive cameras). Two or more light sources may be adjacent to the camera(s) and substantially coplanar with them. In various embodiments, the camera(s) and the light source(s) are oriented vertically upward. To enhance contrast, the camera may be operated with an exposure time on the order of 100 microseconds, and the light source(s) may be driven during the exposure time at a power level of at least 5 watts. In certain implementations, a holographic diffraction grating is positioned between the lens of each camera and the field of view (i.e., in front of the camera lens).

The image analyzer may geometrically determine whether an object corresponds to the object of interest by identifying ellipses that volumetrically define a candidate object, discarding object segments that are geometrically inconsistent with an ellipse-based definition, and determining, based on the ellipses, whether the candidate object corresponds to the object of interest.

In another aspect, the invention relates to a method of image capture and analysis. In various embodiments, the method comprises the steps of activating at least one light source to illuminate a field of view containing an object of interest; capturing a sequence of digital images of the field of view using a camera (or cameras) at a time when the light source(s) are activated; and identifying pixels corresponding to the object rather than to the background. Based on the identified pixels, a 3D model of the object, including its position and shape, is constructed in order to geometrically determine whether it corresponds to the object of interest.

The light source(s) may be positioned such that the object of interest is located within a proximal zone, the proximal zone extending from the camera to a distance of at least twice the expected maximum distance between the camera and the object of interest. For example, the proximal zone may have a depth of at least four times the expected maximum distance. The light source(s) may be, for example, diffuse emitters (e.g., infrared light-emitting diodes), in which case the camera is an infrared-sensitive camera. Two or more light sources may be adjacent to the camera and substantially coplanar with it. In various embodiments, the camera and the light source(s) are oriented vertically upward. To enhance contrast, the camera may be operated with an exposure time on the order of 100 microseconds, and the light source(s) may be driven during the exposure time at a power level of at least 5 watts.

Alternatively, object pixels may be identified by capturing a first image when the light source(s) are not activated, a second image when the light source(s) are activated, and a third image when the light source(s) are not activated. The pixels corresponding to the object are then identified based on the difference between the second and first images and the difference between the second and third images.
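The unlit/lit/unlit differencing described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function name `object_mask`, the rule of combining the two differences by a pixel-wise minimum, and the threshold of 0.5 are assumptions introduced here, since the text above says only that both differences are used.

```python
def object_mask(unlit_before, lit, unlit_after, threshold=0.5):
    """Classify pixels as object (True) or background (False).

    Each image is a list of rows of brightness values on a 0.0-1.0 scale.
    Assumption: a pixel counts as object only if it is much brighter in the
    lit frame than in BOTH unlit frames, so the two absolute differences are
    combined with a pixel-wise minimum before thresholding.
    """
    mask = []
    for row_before, row_lit, row_after in zip(unlit_before, lit, unlit_after):
        mask_row = []
        for before, during, after in zip(row_before, row_lit, row_after):
            diff_first = abs(during - before)   # second image minus first image
            diff_third = abs(during - after)    # second image minus third image
            mask_row.append(min(diff_first, diff_third) > threshold)
        mask.append(mask_row)
    return mask


# Tiny 1x4 example: only the second pixel brightens under the light source
# and darkens again afterwards; the fourth pixel stays bright after the light
# source is off (an ambient change), so it is rejected.
unlit_before = [[0.10, 0.10, 0.30, 0.05]]
lit          = [[0.15, 0.95, 0.35, 0.90]]
unlit_after  = [[0.10, 0.12, 0.30, 0.85]]
mask = object_mask(unlit_before, lit, unlit_after)
print(mask)
```

Using two unlit reference frames rather than one is what lets the sketch reject brightness changes that persist after the light source is switched off.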

The geometric determination of whether an object corresponds to the object of interest may comprise or consist of identifying ellipses that volumetrically define a candidate object, discarding object segments that are geometrically inconsistent with an ellipse-based definition, and determining, based on the ellipses, whether the candidate object corresponds to the object of interest.

In yet another aspect, the invention relates to a method of locating rounded objects within a digital image. In various embodiments, the method comprises the steps of activating at least one light source to illuminate a field of view containing an object of interest; operating a camera to capture a sequence of images, including a first image captured at a time when the at least one light source is illuminating the field of view; and analyzing the images to detect Gaussian brightness falloff patterns indicative of rounded objects in the field of view. In some embodiments, the rounded objects are detected without identifying their edges. The method may further comprise tracking the motion of the detected rounded objects through a plurality of the captured images.

In another aspect, the invention relates to an image capture and analysis system for locating rounded objects within a field of view. In various embodiments, the system comprises at least one camera oriented toward the field of view; at least one light source disposed on the same side of the field of view as the camera and oriented to illuminate the field of view; and an image analyzer coupled to the camera and the light source. The image analyzer may be configured to operate the at least one camera to capture a sequence of images, including a first image captured at a time when the at least one light source is illuminating the field of view, and to analyze the images to detect Gaussian brightness falloff patterns indicative of rounded objects in the field of view. In some embodiments, the rounded objects may be detected without identifying their edges. The system may also track the motion of the detected rounded objects through a plurality of the captured images.

As used herein, the term “substantially” or “approximately” means ±10% (e.g., by weight or by volume), and in some embodiments ±5%. The term “consists essentially of” means excluding other materials that contribute to function, unless otherwise defined herein. Reference throughout this specification to “one example,” “an example,” “one embodiment,” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the example is included in at least one example of the present technology. Thus, occurrences of the phrases “in one example,” “in an example,” “one embodiment,” or “an embodiment” in various places throughout this specification do not necessarily all refer to the same example. Furthermore, the particular features, structures, routines, steps, or characteristics may be combined in any suitable manner in one or more examples of the technology. The headings provided herein are for convenience only and are not intended to limit or interpret the scope or meaning of the claimed technology.

The following detailed description, together with the accompanying drawings, will provide a better understanding of the nature and advantages of the present invention.

The drawings, in order, show:
A system for capturing image data according to an embodiment of the present invention.
A simplified block diagram of a computer system implementing an image analysis apparatus according to an embodiment of the present invention.
A graph of brightness data for a row of pixels that may be obtained according to an embodiment of the present invention.
A graph of brightness data for a row of pixels that may be obtained according to an embodiment of the present invention.
A graph of brightness data for a row of pixels that may be obtained according to an embodiment of the present invention.
A flow diagram of a process for identifying the location of an object in an image according to an embodiment of the present invention.
A timeline in which light sources are pulsed on at regular intervals according to an embodiment of the present invention.
A timeline for pulsing light sources and capturing images according to an embodiment of the present invention.
A flow diagram of a process for identifying object edges using a sequence of images according to an embodiment of the present invention.
A top view of a computer system incorporating a motion detector as a user input device according to an embodiment of the present invention.
A front view of a tablet computer illustrating another example of a computer system incorporating a motion detector according to an embodiment of the present invention.
A goggle system incorporating a motion detector according to an embodiment of the present invention.
A flow diagram of a process for using motion information as user input to control a computer system or other system according to an embodiment of the present invention.
A system for capturing image data according to another embodiment of the present invention.
A system for capturing image data according to still another embodiment of the present invention.

Reference is first made to FIG. 1, which illustrates a system 100 for capturing image data according to an embodiment of the present invention. System 100 includes a pair of cameras 102, 104 coupled to an image analysis system 106. Cameras 102, 104 can be any type of camera, including cameras sensitive across the visible spectrum or, more typically, cameras with enhanced sensitivity to a confined wavelength band (e.g., the infrared (IR) or ultraviolet bands). More generally, the term “camera” herein refers to any device (or combination of devices) capable of capturing an image of an object and representing that image in the form of digital data. For example, line sensors or line cameras may be used instead of conventional devices that capture a two-dimensional (2D) image. The term “light” is used generally to connote any electromagnetic radiation, which may or may not be within the visible spectrum, and may be broadband (e.g., white light) or narrowband (e.g., a single wavelength or a narrow band of wavelengths).

The heart of a digital camera is an image sensor, which contains a grid of light-sensitive picture elements (pixels). A lens focuses light onto the surface of the image sensor, and the image is formed as the light strikes the pixels with varying intensity. Each pixel converts the light into an electric charge whose magnitude reflects the intensity of the detected light, and collects that charge so that it can be measured. Both CCD and CMOS image sensors perform this same function, but they differ in how the signal is measured and transferred.

In a CCD, the charge from each pixel is transported to a single structure that converts the charge into a measurable voltage. This is done by sequentially shifting the charge in each pixel to its neighbor, row by row and then column by column, in “bucket brigade” fashion until it reaches the measurement structure. A CMOS sensor, by contrast, places a measurement structure at each pixel location. The measurements are transferred directly from each location to the output of the sensor.

Cameras 102, 104 are preferably capable of capturing video images (i.e., successive image frames at a constant rate of at least 15 frames per second), although no particular frame rate is required. The capabilities of cameras 102, 104 are not critical to the invention, and the cameras can vary as to frame rate, image resolution (e.g., pixels per image), color or intensity resolution (e.g., number of bits of intensity data per pixel), focal length of lenses, depth of field, and so on. In general, for a particular application, any cameras capable of focusing on objects within a spatial volume of interest can be used. For instance, to capture the motion of the hand of an otherwise stationary person, the volume of interest might be defined as a cube approximately one meter on a side.

System 100 further includes a pair of light sources 108, 110, which are disposed on either side of cameras 102, 104 and controlled by image analysis system 106. Light sources 108, 110 can be infrared light sources of generally conventional design, e.g., infrared light-emitting diodes (LEDs), and cameras 102, 104 can be sensitive to infrared light. Filters 120, 122 can be placed in front of cameras 102, 104 to filter out visible light so that only infrared light is registered in the images captured by cameras 102, 104. In some embodiments where the object of interest is a person's hand or body, the use of infrared light allows the motion-capture system to operate under a broad range of lighting conditions and avoids various inconveniences or distractions that may be associated with directing visible light into the region where the person is moving. However, a particular wavelength or region of the electromagnetic spectrum is required.

It should be stressed that the foregoing arrangement is representative and not limiting. For example, lasers or other light sources can be used instead of LEDs. For laser setups, additional optics (e.g., a lens or diffuser) may be employed to widen the laser beam (and make its field of view similar to that of the cameras). Useful arrangements can also include short- and wide-angle illuminators for different ranges. Light sources are typically diffuse rather than specular point sources; for example, LEDs packaged with light-spreading encapsulation are suitable.

In operation, cameras 102, 104 are oriented toward a region of interest 112 in which an object of interest 114 (in this example, a hand) and one or more background objects 116 can be present. Light sources 108, 110 are arranged to illuminate region 112. In some embodiments, one or more of the light sources and one or more of the cameras 102, 104 are disposed below the motion to be detected (e.g., where hand motion is to be detected, directly beneath the spatial region in which that motion takes place). This is an optimal location because the amount of information recorded about the hand is proportional to the number of pixels it occupies in the camera images, and the hand will occupy more pixels when the camera's angle with respect to the hand's “pointing direction” is as close to perpendicular as possible. Because it is uncomfortable for a user to orient the palm toward a screen, the optimal positions are looking up from the bottom, looking down from the top, or looking diagonally up or diagonally down from the screen bezel. When looking up, there is less likelihood of confusion with background objects (e.g., clutter on the user's desk), and when looking straight up, there is little likelihood of confusion with other people outside the field of view (and privacy is also enhanced by not imaging faces). Image analysis system 106, which can be, e.g., a computer system, can control the operation of light sources 108, 110 and cameras 102, 104 to capture images of region 112. Based on the captured images, image analysis system 106 determines the position and/or motion of object 114.

For example, as a step in determining the position of object 114, image analysis system 106 can determine which pixels of the various images captured by cameras 102, 104 contain portions of object 114. In some embodiments, any pixel in an image can be classified as an “object” pixel or a “background” pixel depending on whether that pixel contains a portion of object 114. With the use of light sources 108, 110, classification of pixels as object or background pixels can be based on pixel brightness. For example, the distance (r_O) between the object of interest 114 and cameras 102, 104 is expected to be smaller than the distance (r_B) between the background object(s) 116 and cameras 102, 104. Because the intensity of light from sources 108, 110 decreases as 1/r², object 114 will be more brightly lit than background 116, and pixels containing portions of object 114 (i.e., object pixels) will be correspondingly brighter than pixels containing portions of background 116 (i.e., background pixels). For example, if r_B/r_O = 2, then object pixels will be approximately four times brighter than background pixels, assuming that object 114 and background 116 are similarly reflective of the light from sources 108, 110, and further assuming that the overall illumination of region 112 (at least within the frequency band captured by cameras 102, 104) is dominated by light sources 108, 110. These assumptions generally hold for suitable choices of cameras 102, 104, light sources 108, 110, filters 120, 122, and objects commonly encountered. For example, light sources 108, 110 can be infrared LEDs capable of strongly emitting radiation in a narrow frequency band, and filters 120, 122 can be matched to the frequency band of light sources 108, 110. Thus, although a human hand or body, or a heat source or other object in the background, may emit some infrared radiation, the response of cameras 102, 104 can still be dominated by light originating from sources 108, 110 and reflected by object 114 and/or background 116.
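The factor of four quoted above follows directly from the 1/r² falloff under the two stated assumptions (similar reflectivity, illumination dominated by the light sources). As a worked sketch, with the proportionality constant k introduced here purely for illustration:

```latex
% Brightness B of a pixel imaging a surface at distance r from the source,
% under the text's 1/r^2 model with a common constant k (assumed symbol):
B(r) = \frac{k}{r^{2}}
% Ratio of object-pixel to background-pixel brightness:
\frac{B(r_O)}{B(r_B)} = \frac{k / r_O^{2}}{k / r_B^{2}} = \left(\frac{r_B}{r_O}\right)^{2}
% For r_B / r_O = 2:
\frac{B(r_O)}{B(r_B)} = 2^{2} = 4
```

The same expression shows why a more distant background helps: at r_B/r_O = 3 the ratio under this model would rise to 9.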

In this arrangement, image analysis system 106 can quickly and accurately distinguish object pixels from background pixels by applying a brightness threshold to each pixel. For example, pixel brightness in a CMOS sensor or similar device can be measured on a scale from 0.0 (dark) to 1.0 (fully saturated), with some number of gradations in between depending on the sensor design. The brightness encoded by the camera pixels typically derives from accumulated charge or diode voltage and scales standardly (linearly) with the luminance of the object. In some embodiments, light sources 108, 110 are bright enough that light reflected from an object at distance r_O produces a brightness level of 1.0, while light reflected from an object at distance r_B = 2r_O produces a brightness level of 0.25. Object pixels can thus be readily distinguished from background pixels based on brightness. Further, edges of the object can be readily detected based on differences in brightness between adjacent pixels, allowing the position of the object within each image to be determined. Correlating object positions between the images from cameras 102, 104 allows image analysis system 106 to determine the location of object 114 in 3D space, and analyzing sequences of images allows image analysis system 106 to reconstruct the 3D motion of object 114 using conventional motion algorithms.

It will be appreciated that system 100 is illustrative and that variations and modifications are possible. For example, light sources 108, 110 are shown as being disposed on either side of cameras 102, 104. This can facilitate illuminating the edges of object 114 as seen from the perspectives of both cameras; however, no particular arrangement of cameras and lights is required. (Examples of other arrangements are described below.) As long as the object is significantly closer to the cameras than the background, enhanced contrast as described herein can be achieved.

Image analysis system 106 (also referred to as an image analyzer) can include or consist of any device or device component that is capable of capturing and processing image data, e.g., using the techniques described herein. FIG. 2 is a simplified block diagram of a computer system 200 implementing image analysis system 106 according to an embodiment of the present invention. Computer system 200 includes a processor 202, a memory 204, a camera interface 206, a display 208, speakers 209, a keyboard 210, and a mouse 211.

Memory 204 can be used to store instructions to be executed by processor 202 as well as input and/or output data associated with execution of the instructions. In particular, memory 204 contains instructions, conceptually illustrated as a group of modules described in greater detail below, that control the operation of processor 202 and its interaction with the other hardware components. An operating system directs the execution of low-level, basic system functions such as memory allocation, file management, and operation of mass storage devices. The operating system may be or include a variety of operating systems such as the Microsoft Windows operating system, the Unix operating system, the Linux operating system, the Xenix operating system, the IBM AIX operating system, the Hewlett Packard UX operating system, the Novell NETWARE operating system, the Sun Microsystems SOLARIS operating system, the OS/2 operating system, the BeOS operating system, the MACINTOSH operating system, the APACHE operating system, an OPENSTEP operating system, or another operating system or platform.

The computing environment may also include other removable/non-removable, volatile/nonvolatile computer storage media. For example, a hard disk drive may read from or write to non-removable, nonvolatile magnetic media. A magnetic disk drive may read from or write to a removable, nonvolatile magnetic disk, and an optical disk drive may read from or write to a removable, nonvolatile optical disk such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The storage media are typically connected to the system bus through a removable or non-removable memory interface.

Processor 202 may be a general-purpose microprocessor, but depending on the implementation can alternatively be a microcontroller, a peripheral integrated circuit element, a CSIC (customer-specific integrated circuit), an ASIC (application-specific integrated circuit), a logic circuit, a digital signal processor, a programmable logic device such as an FPGA (field-programmable gate array), a PLD (programmable logic device), a PLA (programmable logic array), an RFID processor, a smart chip, or any other device or arrangement of devices that is capable of implementing the steps of the processes of the invention.

Camera interface 206 can include hardware and/or software that enables communication between computer system 200 and cameras such as cameras 102, 104 shown in FIG. 1, as well as associated light sources such as light sources 108, 110 of FIG. 1. Thus, for example, camera interface 206 can include one or more data ports 216, 218 to which cameras can be connected, as well as hardware and/or software signal processors that modify data signals received from the cameras (e.g., to reduce noise or reformat data) before providing the signals as inputs to a conventional motion-capture ("mocap") program 214 executing on processor 202. In some embodiments, camera interface 206 can also transmit signals to the cameras, e.g., to activate or deactivate the cameras or to control camera settings (frame rate, image quality, sensitivity, etc.). Such signals can be transmitted, e.g., in response to control signals from processor 202, which may in turn be generated in response to user input or other detected events.

Camera interface 206 can also include controllers 217, 219 to which light sources (e.g., light sources 108, 110) can be connected. In some embodiments, controllers 217, 219 supply operating current to the light sources, e.g., in response to instructions from processor 202 executing mocap program 214. In other embodiments, the light sources can draw operating current from an external power supply (not shown), and controllers 217, 219 can generate control signals for the light sources, e.g., instructing the light sources to be turned on or off or to change brightness. In some embodiments, a single controller can be used to control multiple light sources.

Instructions defining mocap program 214 are stored in memory 204, and these instructions, when executed, perform motion-capture analysis on images supplied from cameras connected to camera interface 206. In one embodiment, mocap program 214 includes various modules, such as an object detection module 222 and an object analysis module 224; both of these modules are conventional and well characterized in the art. Object detection module 222 can analyze images (e.g., images captured via camera interface 206) to detect edges of an object therein and/or other information about the object's location. Object analysis module 224 can analyze the object information provided by object detection module 222 to determine the 3D position and/or motion of the object. Examples of operations that can be implemented in code modules of mocap program 214 are described below. Memory 204 can also include other information and/or code modules used by mocap program 214.

Display 208, speakers 209, keyboard 210, and mouse 211 can be used to facilitate user interaction with computer system 200. These components can be of generally conventional design or modified as desired to provide any type of user interaction. In some embodiments, results of motion capture using camera interface 206 and mocap program 214 can be interpreted as user input. For example, a user can perform hand gestures that are analyzed using mocap program 214, and the results of this analysis can be interpreted as an instruction to some other program executing on processor 202 (e.g., a web browser, word processor, or other application). Thus, by way of illustration, a user might use upward or downward swiping gestures to "scroll" a webpage currently displayed on display 208, or rotating gestures to increase or decrease the volume of audio output from speakers 209, and so on.

It will be appreciated that computer system 200 is illustrative and that variations and modifications are possible. Computer systems can be implemented in a variety of form factors, including server systems, desktop systems, laptop systems, tablets, smartphones, personal digital assistants, and so on. A particular implementation may include other functionality not described herein, e.g., wired and/or wireless network interfaces, media playing and/or recording capability, etc. In some embodiments, one or more cameras may be built into the computer rather than being supplied as separate components. Further, an image analyzer can be implemented using only a subset of computer system components (e.g., as a processor executing program code, an ASIC, or a fixed-function digital signal processor, with suitable I/O interfaces to receive image data and output analysis results).

While computer system 200 is described herein with reference to particular blocks, it is to be understood that the blocks are defined for convenience of description and are not intended to imply a particular physical arrangement of component parts. Further, the blocks need not correspond to physically distinct components. To the extent that physically distinct components are used, connections between components (e.g., for data communication) can be wired and/or wireless as desired.

Execution of object detection module 222 by processor 202 can cause processor 202 to operate camera interface 206 to capture images of an object and to distinguish object pixels from background pixels by analyzing the image data. FIGS. 3A-3C are three different graphs of brightness data for rows of pixels that may be obtained according to various embodiments of the present invention. While each graph illustrates one pixel row, it is to be understood that an image typically contains many rows of pixels, and a row can contain any number of pixels; for instance, an HD video image can include 1080 rows of 1920 pixels each.

FIG. 3A illustrates brightness data 300 for a row of pixels in which the object has a single cross-section, such as a cross-section through the palm of a hand. Pixels in region 302, corresponding to the object, have high brightness, while pixels in regions 304 and 306, corresponding to background, have considerably lower brightness. As can be seen, the object's location is readily apparent, and the locations of the edges of the object (at 308 and 310) are easily identified. For example, any pixel with brightness above 0.5 can be assumed to be an object pixel, while any pixel with brightness below 0.5 can be assumed to be a background pixel.
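The cutoff rule in this example can be sketched in a few lines. The sample row below is made-up data in the 0.0 to 1.0 brightness scale used in the text, and `classify_row` is a hypothetical helper name, not part of the described system.

```python
from typing import List


def classify_row(row: List[float], cutoff: float = 0.5) -> List[bool]:
    """Return True for object pixels (brightness above cutoff), False for background."""
    return [brightness > cutoff for brightness in row]


# Illustrative row: dark background, a bright object, dark background again.
row = [0.10, 0.15, 0.90, 0.95, 0.92, 0.12, 0.10]
mask = classify_row(row)
print(mask)  # [False, False, True, True, True, False, False]
```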

FIG. 3B illustrates brightness data 320 for a row of pixels in which the object has multiple distinct cross-sections, such as a cross-section through the fingers of an open hand. Regions 322, 323, and 324, corresponding to the object, have high brightness, while pixels in regions 326-329, corresponding to background, have low brightness. Again, a simple threshold cutoff on brightness (e.g., at 0.5) suffices to distinguish object pixels from background pixels, and the edges of the object can be readily ascertained.

FIG. 3C illustrates brightness data 340 for a row of pixels in which the distance to the object varies across the row, such as a cross-section of a hand with two fingers extending toward the camera. Regions 342 and 343, corresponding to the extended fingers, have the highest brightness. Regions 344 and 345, corresponding to other portions of the hand, are slightly less bright; this can be due in part to being farther away and in part to shadows cast by the extended fingers. Regions 348 and 349 are background regions and are considerably darker than the hand-containing regions 342-345. A threshold cutoff on brightness (e.g., at 0.5) again suffices to distinguish object pixels from background pixels. Further analysis of the object pixels can also be performed to detect the edges of regions 342 and 343, providing additional information about the object's shape.

It will be appreciated that the data shown in FIGS. 3A-3C is illustrative. In some embodiments, it may be desirable to adjust the intensity of light sources 108, 110 such that an object at the expected distance (e.g., r_O in FIG. 1) will be overexposed; that is, many if not all of the object pixels will be fully saturated at a brightness level of 1.0. (The actual brightness of the object may in fact be higher.) While this may also make the background pixels somewhat brighter, the 1/r² falloff of light intensity with distance still allows object and background pixels to be distinguished, as long as the intensity is not set so high that background pixels also approach the saturation level. As FIGS. 3A-3C illustrate, the use of lighting directed at the object to create strong contrast between object and background allows simple and fast algorithms to be used to distinguish between background pixels and object pixels, which can be particularly useful in real-time motion-capture systems. Simplifying the task of distinguishing background and object pixels can also free up computing resources for other motion-capture tasks (e.g., reconstructing the object's position, shape, and/or motion).
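The effect of overexposure can be illustrated with a toy sensor model: brightness follows 1/r² and is clipped at the saturation level 1.0. This is only a sketch; the `gain` parameter (unclipped brightness at unit distance) is an assumption introduced for illustration and does not appear in the text.

```python
def pixel_brightness(r: float, gain: float) -> float:
    """Toy model: brightness falls off as gain / r^2 and clips at saturation (1.0)."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return min(1.0, gain / r ** 2)


r_object, r_background = 1.0, 2.0

# Moderate overexposure: object saturates, background stays below a 0.5 cutoff.
print(pixel_brightness(r_object, gain=1.6), pixel_brightness(r_background, gain=1.6))

# Intensity set too high: background also saturates and the contrast is lost.
print(pixel_brightness(r_object, gain=4.0), pixel_brightness(r_background, gain=4.0))
```

In the first case the two pixels remain on opposite sides of a 0.5 cutoff; in the second both read 1.0, which is the situation the paragraph warns against.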

Reference is now made to FIG. 4, which is a flow diagram of a process 400 for identifying the location of an object in an image according to an embodiment of the present invention. Process 400 can be implemented, e.g., in system 100 of FIG. 1. At block 402, light sources 108, 110 are turned on. At block 404, one or more images are captured using cameras 102, 104. In some embodiments, one image from each camera is captured. In other embodiments, a sequence of images is captured from each camera. The images from the two cameras can be closely correlated in time (e.g., simultaneous to within a few milliseconds) so that correlated images from the two cameras can be used to determine the 3D location of the object.

At block 406, a threshold pixel brightness is applied to distinguish object pixels from background pixels. Block 406 can also include identifying the locations of edges of the object based on transition points between background and object pixels. In some embodiments, each pixel is first classified as either object or background based on whether it exceeds the threshold brightness cutoff. For example, as shown in FIGS. 3A-3C, a cutoff at a saturation level of 0.5 can be used. Once the pixels are classified, edges can be detected by finding locations where background pixels are adjacent to object pixels. In some embodiments, to avoid noise artifacts, the regions of background and object pixels on either side of an edge may be required to have a certain minimum size (e.g., 2, 4, or 8 pixels).
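One way to realize the classify-then-find-transitions step with a minimum region size is sketched below. It is an illustrative reading of block 406, not the application's implementation; `find_edges`, its return format, and the choice to simply skip transitions next to too-short runs are all assumptions.

```python
from itertools import groupby
from typing import List, Tuple


def find_edges(row: List[float], cutoff: float = 0.5,
               min_run: int = 2) -> List[Tuple[int, str]]:
    """Locate object edges in one pixel row.

    Returns (index, kind) pairs, where index is the first pixel after the
    transition and kind is "rising" (background to object) or "falling".
    A transition counts only if the runs on both sides are at least
    min_run pixels long, which suppresses isolated noisy pixels.
    """
    # Run-length encode the object/background classification.
    runs = []  # (is_object, start_index, length)
    start = 0
    for is_object, group in groupby(b > cutoff for b in row):
        length = len(list(group))
        runs.append((is_object, start, length))
        start += length

    edges = []
    for (_, _, left_len), (right_is_object, right_start, right_len) in zip(runs, runs[1:]):
        if left_len >= min_run and right_len >= min_run:
            edges.append((right_start, "rising" if right_is_object else "falling"))
    return edges


# The single bright pixel at index 2 is treated as noise and ignored.
noisy_row = [0.1, 0.1, 0.9, 0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1]
print(find_edges(noisy_row))  # [(5, 'rising'), (8, 'falling')]
```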

In other embodiments, edges can be detected without first classifying pixels as object or background. For example, Δβ can be defined as the difference in brightness between adjacent pixels, and |Δβ| above a threshold (e.g., 0.3 or 0.5 in units of the saturation scale) can indicate a transition from background to object or from object to background between the adjacent pixels. (The sign of Δβ can indicate the direction of the transition.) In instances where the edge of the object actually falls in the middle of a pixel, there may be a pixel with an intermediate value at the boundary. This can be detected, e.g., by computing two brightness values for a pixel i: β_L = (β_i + β_{i−1})/2 and β_R = (β_i + β_{i+1})/2, where pixel (i−1) is to the left of pixel i and pixel (i+1) is to the right of pixel i. If pixel i is not near an edge, |β_L − β_R| will generally be close to zero; if pixel i is near an edge, |β_L − β_R| will be closer to 1, and a threshold on |β_L − β_R| can be used to detect edges.
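Both difference-based tests can be sketched as follows. The function names and the sample rows are illustrative assumptions; only the quantities Δβ, β_L, and β_R come from the text.

```python
from typing import List, Tuple


def edges_by_difference(row: List[float], threshold: float = 0.3) -> List[Tuple[int, str]]:
    """Flag transitions where |delta_beta| between adjacent pixels exceeds threshold.

    The sign of the difference gives the direction of the transition.
    """
    edges = []
    for i in range(1, len(row)):
        delta = row[i] - row[i - 1]
        if abs(delta) > threshold:
            edges.append((i, "rising" if delta > 0 else "falling"))
    return edges


def half_pixel_edge_strength(row: List[float], i: int) -> float:
    """|beta_L - beta_R| for interior pixel i, using the two-neighbor averages."""
    beta_l = (row[i] + row[i - 1]) / 2
    beta_r = (row[i] + row[i + 1]) / 2
    # Algebraically this equals |row[i - 1] - row[i + 1]| / 2.
    return abs(beta_l - beta_r)


print(edges_by_difference([0.1, 0.1, 0.9, 0.9, 0.1]))  # [(2, 'rising'), (4, 'falling')]

# Pixel 3 straddles the edge and takes an intermediate value.
row = [0.1, 0.1, 0.1, 0.5, 0.9, 0.9, 0.9]
print(half_pixel_edge_strength(row, 1))  # flat region: 0.0
print(half_pixel_edge_strength(row, 3))  # at the edge: about 0.4
```

Because the second expression reduces to half the difference between the two neighbors of pixel i, a threshold for it would be chosen on that scale.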

In some instances, one part of an object may partially occlude another part in an image; for example, in the case of a hand, a finger may partly occlude the palm or another finger. Occlusion edges, which occur where one part of the object partially occludes another, can also be detected based on smaller but distinct changes in brightness once background pixels have been eliminated. FIG. 3C illustrates an example of such partial occlusion, and the locations of the occlusion edges are apparent.

Detected edges can be used for numerous purposes. For example, as previously noted, the edges of the object as viewed by the two cameras can be used to determine an approximate location of the object in 3D space. The position of the object in a 2D plane transverse to the optical axis of a camera can be determined from a single image, and the offset (parallax) between the position of the object in time-correlated images from two different cameras can be used to determine the distance to the object if the spacing between the cameras is known.
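The parallax relation can be illustrated with the standard pinhole-stereo formula Z = f·B/d. This is a textbook sketch rather than the application's method: it assumes two parallel, rectified cameras and a focal length expressed in pixels, neither of which is stated in the text, which mentions only the camera spacing.

```python
def depth_from_disparity(disparity_px: float, baseline_m: float, focal_px: float) -> float:
    """Distance to the object from stereo disparity, Z = f * B / d.

    Assumptions (not from the source): parallel, rectified pinhole cameras;
    focal_px is the focal length in pixels; baseline_m is the camera spacing.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite distance")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: 4 cm baseline, 500 px focal length, 100 px disparity.
distance_m = depth_from_disparity(disparity_px=100.0, baseline_m=0.04, focal_px=500.0)
print(distance_m)  # about 0.2 m
```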

Further, the position and shape of the object can be determined based on the locations of its edges in time-correlated images from two different cameras, and the motion of the object (including articulation) can be determined from analysis of successive pairs of images. As an example of techniques that can be used to determine an object's position, shape, and motion based on the locations of its edges, the entire disclosure of co-pending U.S. Serial No. 13/414,485 (filed Mar. 7, 2012) is incorporated herein by reference. Those skilled in the art with access to the present disclosure will recognize that other techniques for determining the position, shape, and motion of an object based on information about the locations of its edges can also be used.

In accordance with the '485 application, an object's motion and/or position is reconstructed using small amounts of information. For example, an outline of an object's shape, or silhouette, as seen from a particular vantage point can be used to define tangent lines to the object from that vantage point in various planes, referred to herein as "slices." Using as few as two different vantage points, four (or more) tangent lines from the vantage points to the object can be obtained in a given slice. From these four (or more) tangent lines, it is possible to determine the position of the object in the slice and to approximate its cross-section in the slice, e.g., using one or more ellipses or other simple closed curves. As another example, locations of points on the object's surface in a particular slice can be determined directly (e.g., using a time-of-flight camera), and the position and shape of the object's cross-section in that slice can be approximated by fitting an ellipse or other simple closed curve to the points. Positions and cross-sections determined for different slices can be correlated to construct a 3D model of the object, including its position and shape. A succession of images can be analyzed using the same technique to model motion of the object. Motion of a complex object that has multiple separately articulating members (e.g., a human hand) can be modeled using these techniques.

More specifically, an ellipse in the xy plane can be characterized by five parameters: the x and y coordinates of the center (X_C, Y_C), the semimajor axis, the semiminor axis, and a rotation angle (e.g., the angle of the semimajor axis relative to the x axis). With only four tangents, the ellipse is underdetermined. Despite this fact, an efficient process for estimating the ellipse involves making an initial working assumption (or "guess") as to one of the parameters and revisiting the assumption as additional information is gathered during the analysis. This additional information can include, for example, physical constraints based on properties of the cameras and/or the object. In some circumstances, more than four tangents to an object may be available for some or all of the slices, e.g., because more than two vantage points are available. An elliptical cross-section can still be determined, and the process in some instances is somewhat simplified because there is no need to assume a parameter value. In some instances, the additional tangents may create additional complexity. In some circumstances, fewer than four tangents to an object may be available for some or all of the slices, e.g., because an edge of the object is outside the field of view of one camera or because an edge was not detected. A slice with three tangents can still be analyzed. For example, using two parameters from an ellipse fit to an adjacent slice (e.g., a slice that had at least four tangents), the system of equations for the ellipse and three tangents is sufficiently determined that it can be solved. As another option, a circle can be fit to the three tangents; only three parameters (the center coordinates and the radius) are needed to define a circle in a plane, so three tangents suffice to fit a circle. Slices with fewer than three tangents can be discarded or combined with adjacent slices.
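The three-tangent circle fit mentioned above can be sketched as a small linear solve. The representation is an assumption made for illustration: each tangent is given as a line a·x + b·y + c = 0 whose normal (a, b) points toward the side on which the object lies, so that the signed distance from the circle center to every line equals the radius r. Without that orientation, three lines in general admit four tangent circles.

```python
import math
from typing import List, Sequence, Tuple

Line = Tuple[float, float, float]  # a*x + b*y + c = 0, normal (a, b) toward the object


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circle_from_three_tangents(lines: Sequence[Line]) -> Tuple[float, float, float]:
    """Return (x, y, r) of the circle tangent to three oriented lines.

    Solves a_i*x + b_i*y - r = -c_i (with unit normals) by Cramer's rule.
    """
    rows: List[List[float]] = []
    rhs: List[float] = []
    for a, b, c in lines:
        norm = math.hypot(a, b)
        rows.append([a / norm, b / norm, -1.0])
        rhs.append(-c / norm)

    det = _det3(rows)
    if abs(det) < 1e-12:
        raise ValueError("tangent lines are degenerate (e.g., parallel)")

    solution = []
    for k in range(3):
        replaced = [row[:] for row in rows]
        for i in range(3):
            replaced[i][k] = rhs[i]
        solution.append(_det3(replaced) / det)
    return solution[0], solution[1], solution[2]


# Sides of a 3-4-5 right triangle, normals pointing inward; the inscribed
# circle has center (1, 1) and radius 1.
x, y, r = circle_from_three_tangents([(1, 0, 0), (0, 1, 0), (-3, -4, 12)])
print(x, y, r)
```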

One approach to determining geometrically whether an object corresponds to an object of interest is to look for continuous volumes of ellipses that define an object and to discard object segments that are geometrically inconsistent with the ellipse-based definition of the object (e.g., segments that are too cylindrical, too straight, too thin, too small, or too far away). If a sufficient number of ellipses remain to characterize the object and it is consistent with the object of interest, it is so identified and can be tracked from frame to frame.

In some embodiments, each of a number of slices is analyzed separately to determine the size and location of an elliptical cross-section of the object in that slice. This provides an initial 3D model (specifically, a stack of elliptical cross-sections), which can be refined by correlating the cross-sections across different slices. For example, it is expected that an object's surface will have continuity, and discontinuous ellipses can accordingly be discounted. Further refinement can be obtained by correlating the 3D model with itself across time, e.g., based on expectations related to continuity in motion and deformation. Referring again to FIGS. 1 and 2, in some embodiments, light sources 108, 110 can be operated in a pulsed mode rather than being continually on. This can be useful, e.g., if light sources 108, 110 are able to produce brighter light in a pulse than in steady-state operation. FIG. 5 illustrates a timeline in which light sources 108, 110 are pulsed on at regular intervals, as shown at 502. The shutters of cameras 102, 104 can be opened to capture images at times coincident with the light pulses, as shown at 504. Thus, the object of interest can be brightly illuminated during the times when images are being captured. In some embodiments, the silhouettes of an object are extracted from one or more images of the object that reveal information about the object as seen from different vantage points. While silhouettes can be obtained using a number of different techniques, in some embodiments the silhouettes are obtained by using cameras to capture images of the object and analyzing the images to detect object edges.

In some embodiments, pulsing of light sources 108, 110 can be used to further enhance contrast between an object of interest and the background. In particular, the ability to discriminate between relevant and irrelevant (e.g., background) objects in a scene can be compromised if the scene contains objects that themselves emit light or are highly reflective. This problem can be addressed by setting the camera exposure time to a very short period (e.g., 100 microseconds or less) and pulsing the illumination at very high power (i.e., 5 to 20 watts or, in some cases, higher levels such as 40 watts). Over such a period, the most common sources of ambient illumination (e.g., fluorescent lights) are very dark compared with such bright, short-duration illumination; that is, on a microsecond scale, non-pulsed light sources appear dimmer than they would at exposure times of milliseconds or longer. In effect, this approach increases the contrast of an object of interest relative to other objects, even those emitting in the same general band. Accordingly, discriminating by brightness under such conditions allows irrelevant objects to be ignored for purposes of image reconstruction and processing. Average power consumption is also reduced; at 20 watts for 100 microseconds, the average power consumption is under 10 milliwatts. In general, light sources 108, 110 are operated so as to be on during the entire camera exposure period (i.e., the pulse width is equal to the exposure time and aligned with it).
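The average-power figure above is peak power multiplied by duty cycle. The text gives the peak power and the pulse width but not the pulse repetition rate, so the rate used below is an assumption made only for illustration, and the function names are hypothetical. A minimal sketch:

```python
def average_power_watts(peak_watts: float, pulse_seconds: float,
                        pulses_per_second: float) -> float:
    """Average power of a pulsed source: peak power times duty cycle."""
    duty_cycle = pulse_seconds * pulses_per_second
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must lie between 0 and 1")
    return peak_watts * duty_cycle


def max_pulse_rate_hz(peak_watts: float, pulse_seconds: float,
                      budget_watts: float) -> float:
    """Highest pulse rate that keeps the average power within a budget."""
    energy_per_pulse_joules = peak_watts * pulse_seconds
    return budget_watts / energy_per_pulse_joules


if __name__ == "__main__":
    # 20 W for 100 microseconds is 2 mJ per pulse.
    # The rate of 4 pulses per second is an assumed value, not from the text.
    print(average_power_watts(20.0, 100e-6, 4.0))
    print(max_pulse_rate_hz(20.0, 100e-6, 0.010))
```

Under these assumptions each pulse carries 2 millijoules, so the average stays under 10 milliwatts for repetition rates below about 5 pulses per second; higher rates scale the average proportionally.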

The pulsing of light sources 108, 110 can also be coordinated so as to allow comparison of images captured with light sources 108, 110 on against images captured with light sources 108, 110 off. FIG. 6 illustrates a timeline in which light sources 108, 110 are pulsed on at regular intervals, as shown at 602, while the shutters of cameras 102, 104 are opened to capture images at the times shown at 604. In this case, light sources 108, 110 are "on" for every other image. If the object of interest is significantly closer to light sources 108, 110 than the background regions are, the difference in light intensity between lit and unlit images will be stronger for object pixels than for background pixels. Accordingly, comparing pixels across successive images can help distinguish object pixels from background pixels.

FIG. 7 is a flow diagram of a process 700 for identifying object edges using successive images according to an embodiment of the present invention. At block 702, the light sources are turned off, and at block 704 a first image (A) is captured. Then, at block 706, the light sources are turned on, and at block 708 a second image (B) is captured. At block 710, a "difference" image B - A is calculated, e.g., by subtracting the brightness value of each pixel in image A from the brightness value of the corresponding pixel in image B. Because image B was captured with the lights on, B - A is expected to be positive for most pixels.

The difference image is used to discriminate between background and foreground by applying a threshold or another per-pixel criterion. At block 712, a threshold is applied to the difference image to identify object pixels: values of (B - A) above the threshold are associated with object pixels, and values of (B - A) below the threshold are associated with background pixels. Object edges can then be defined by identifying where object pixels are adjacent to background pixels, as described above. The object edges can be used for purposes such as position and/or motion detection, as described above.
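The difference-and-threshold steps of process 700 can be sketched in a few lines. This is a minimal illustration, not the patented implementation: frames are assumed to be equally sized grayscale images stored as lists of rows, the threshold value is arbitrary, and the function names are hypothetical.

```python
def difference_image(lit, unlit):
    """Per-pixel B - A, where B is the lit frame and A the unlit frame."""
    return [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(unlit, lit)]


def object_mask(diff, threshold):
    """True where the difference exceeds the threshold (object pixels)."""
    return [[value > threshold for value in row] for row in diff]


def edge_pixels(mask):
    """Object pixels that have at least one 4-connected background neighbor."""
    height, width = len(mask), len(mask[0])
    edges = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            neighbors = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            # Neighbors outside the frame are ignored rather than
            # treated as background.
            if any(0 <= ny < height and 0 <= nx < width and not mask[ny][nx]
                   for ny, nx in neighbors):
                edges.append((y, x))
    return edges


if __name__ == "__main__":
    image_a = [[10, 10, 10, 10], [10, 12, 11, 10], [10, 10, 10, 10]]  # lights off
    image_b = [[12, 13, 12, 12], [12, 90, 85, 13], [12, 12, 13, 12]]  # lights on
    mask = object_mask(difference_image(image_b, image_a), threshold=30)
    print(edge_pixels(mask))
```

In the sample frames only the two central pixels brighten strongly when the lights come on, so they are the only pixels classified as object pixels, and both border background pixels.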

In an alternative embodiment, object edges are identified using a triplet of image frames rather than a pair. For example, in one implementation, a first image (Image1) is obtained with the light sources turned off; a second image (Image2) is obtained with the light sources turned on; and a third image (Image3) is captured with the light sources turned off again. Two difference images,
Image4 = abs(Image2 - Image1) and
Image5 = abs(Image2 - Image3)
are then defined by subtracting pixel brightness values. A final image, Image6, is defined based on the two images Image4 and Image5. In particular, the value of each pixel in Image6 is the smaller of the two corresponding pixel values in Image4 and Image5. In other words, Image6 = min(Image4, Image5) on a pixel-by-pixel basis. Image6 represents an enhanced-accuracy difference image, and most of its pixels will be positive. Once again, a threshold or another criterion can be applied on a pixel-by-pixel basis to distinguish foreground and background pixels.
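The three-frame variant can be sketched as follows, again assuming equally sized grayscale frames stored as lists of rows; the function name and sample values are hypothetical. Taking the per-pixel minimum suppresses pixels that changed between only one pair of frames, for example because ambient light changed between exposures.

```python
def refined_difference(image1, image2, image3):
    """Per-pixel min(abs(Image2 - Image1), abs(Image2 - Image3)).

    image1 and image3 are captured with the light sources off,
    image2 with the light sources on.
    """
    return [[min(abs(p2 - p1), abs(p2 - p3))
             for p1, p2, p3 in zip(row1, row2, row3)]
            for row1, row2, row3 in zip(image1, image2, image3)]


if __name__ == "__main__":
    # First pixel: a nearby object, bright only while the lights are on.
    # Second pixel: background that brightened in the third frame only.
    off_before = [[10, 10]]
    lit = [[90, 12]]
    off_after = [[11, 200]]
    print(refined_difference(off_before, lit, off_after))
```

For the object pixel both differences are large (80 and 79), so the minimum stays large; for the background pixel one difference is small (2), so the minimum stays small even though the other difference is 188.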

The contrast-based object detection described herein can be applied in any situation where objects of interest are expected to be significantly closer (e.g., at half the distance) to the light source(s) than background objects are. One such application relates to the use of motion detection as user input for interacting with a computer system. For example, the user may point to the screen or make other hand gestures, which the computer system can interpret as input.

A computer system 800 incorporating a motion detector as a user input device according to an embodiment of the present invention is shown in FIG. 8. Computer system 800 includes a desktop box 802 that can house various components of a computer system such as processors, memory, fixed or removable disk drives, video drivers, audio drivers, network interface components, and so on. A display 804 is connected to desktop box 802 and positioned so that a user can view it. A keyboard 806 is positioned within easy reach of the user's hands. A motion detector unit 808 is placed near keyboard 806 (e.g., behind it, as shown, or to one side) and oriented toward a region in which it would be natural for the user to make gestures directed at display 804 (e.g., the space above the keyboard and in front of the monitor). Cameras 810, 812 (which can be similar or identical to cameras 102, 104 described above) are arranged to point generally upward, and light sources 814, 816 (which can be similar or identical to light sources 108, 110 described above) are arranged to either side of cameras 810, 812 to illuminate the area above motion detector unit 808. In typical implementations, cameras 810, 812 and light sources 814, 816 are substantially coplanar. This configuration prevents the appearance of shadows that can, for example, interfere with edge detection (as can happen when the light sources are located between the cameras rather than flanking them). A filter, not shown, can be placed over the top of motion detector unit 808 (or just over the apertures of cameras 810, 812) to filter out all light outside a band around the peak frequencies of light sources 814, 816.

In the illustrated configuration, when the user moves a hand or other object (e.g., a pencil) in the field of view of cameras 810, 812, the background will likely consist of a ceiling and/or various ceiling-mounted fixtures. The user's hand can be 10 to 20 centimeters above motion detector unit 808, while the ceiling may be five to ten times that distance (or more). Illumination from light sources 814, 816 will therefore be much more intense on the user's hand than on the ceiling, and the techniques described herein can be used to reliably distinguish object pixels from background pixels in images captured by cameras 810, 812. If infrared light is used, the user will not be distracted or disturbed by the light.
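The distances quoted above can be turned into a rough illumination ratio. The sketch below assumes the inverse-square falloff referred to elsewhere in this description, a compact light source, and equal reflectivity for object and background; it ignores ambient light. The function name is hypothetical.

```python
def illumination_ratio(object_distance: float, background_distance: float) -> float:
    """Ratio of illumination on the object to illumination on the background,
    assuming intensity falls off as 1/r^2 (both distances in the same unit)."""
    if object_distance <= 0 or background_distance <= 0:
        raise ValueError("distances must be positive")
    return (background_distance / object_distance) ** 2


if __name__ == "__main__":
    # Hand 20 cm above the unit, ceiling at five times that distance.
    print(illumination_ratio(20, 100))
    # Ceiling at ten times the hand distance.
    print(illumination_ratio(20, 200))
```

Under these assumptions, a ceiling at five to ten times the hand's distance receives roughly 1/25 to 1/100 of the illumination that the hand receives, which is why a simple brightness threshold can separate the two.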

Computer system 800 can utilize the architecture shown in FIG. 1. For example, cameras 810, 812 of motion detector unit 808 can provide image data to desktop box 802, and image analysis and subsequent interpretation can be performed using the processors and other components housed within desktop box 802. Alternatively, motion detector unit 808 can incorporate processors or other components to perform some or all stages of image analysis and interpretation. For example, motion detector unit 808 can include a processor (programmable or fixed-function) that implements one or more of the processes described above to distinguish between object pixels and background pixels. In this case, motion detector unit 808 can send a reduced representation of the captured images (e.g., a representation with all background pixels zeroed out) to desktop box 802 for further analysis and interpretation. No particular division of computational tasks between a processor inside motion detector unit 808 and a processor inside desktop box 802 is required.

It is not always necessary to discriminate between object pixels and background pixels by absolute brightness levels. For example, with knowledge of object shape, the pattern of brightness falloff can be used to detect the object in an image even without explicit detection of object edges. On rounded objects (such as hands and fingers), for example, the 1/r^2 relationship produces Gaussian or near-Gaussian brightness distributions near the centers of the objects. Imaging a cylinder illuminated by an LED and disposed perpendicular to a camera results in an image having a bright center line corresponding to the cylinder axis, with brightness falling off to each side (around the cylinder circumference). Fingers are approximately cylindrical, and by identifying these Gaussian peaks it is possible to locate fingers even in situations where the background is close and the edges are not visible because of the relative brightness of the background (either because of its proximity or because it may be actively emitting infrared light). The term "Gaussian" is used broadly herein to connote a curve with a negative second derivative. Often such curves will be bell-shaped and symmetric, but not necessarily; for example, in situations with higher object specularity or if the object is at an extreme angle, the curve may be skewed in a particular direction. Accordingly, as used herein, the term "Gaussian" is not limited to curves that explicitly conform to a Gaussian function.
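One simple way to look for such brightness peaks along a single image row is to find local maxima and record their discrete second derivative (curvature). This is a sketch under stated assumptions rather than the method of the description: the row is a list of brightness values, `min_height` is an arbitrary cutoff, and the function name is hypothetical.

```python
def brightness_peaks(row, min_height):
    """Return (index, curvature) for local brightness maxima in a row.

    Curvature is the discrete second derivative
    row[i-1] - 2*row[i] + row[i+1]; it is negative at every peak
    found here, matching the broad sense of "Gaussian" in the text.
    """
    peaks = []
    for i in range(1, len(row) - 1):
        is_local_max = row[i] >= row[i - 1] and row[i] > row[i + 1]
        if is_local_max and row[i] >= min_height:
            curvature = row[i - 1] - 2 * row[i] + row[i + 1]
            peaks.append((i, curvature))
    return peaks


if __name__ == "__main__":
    # Two finger-like ridges on a fairly bright background (values near 40).
    row = [40, 42, 60, 95, 61, 43, 41, 58, 90, 57, 40]
    print(brightness_peaks(row, min_height=80))
```

For the sample row, the two ridge centers at indices 3 and 8 are reported even though the background level never drops near zero, so no clean object edge would be available from thresholding alone.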

FIG. 9 illustrates a tablet computer 900 incorporating a motion detector according to an embodiment of the present invention. Tablet computer 900 has a housing, the front surface of which incorporates a display screen 902 surrounded by a bezel 904. One or more control buttons 906 can be incorporated into bezel 904. Within the housing (e.g., behind display screen 902), tablet computer 900 can have various conventional computer components (processors, memory, network interfaces, etc.). A motion detector unit 910 can be implemented using cameras 912, 914 (e.g., similar or identical to cameras 102, 104 of FIG. 1) and light sources 916, 918 (e.g., similar or identical to light sources 108, 110 of FIG. 1) mounted in bezel 904 and oriented toward the front surface so as to capture the motion of a user positioned in front of tablet computer 900.

When the user moves a hand or other object in the field of view of cameras 912, 914, the motion is detected as described above. In this case, the background is likely to be the user's own body, at a distance of roughly 25 to 30 centimeters from tablet computer 900. The user may hold a hand or other object at a short distance from display 902, e.g., 5 to 10 centimeters. As long as the user's hand is significantly closer than the user's body (e.g., at half the distance) to light sources 916, 918, the illumination-based contrast enhancement techniques described herein can be used to distinguish object pixels from background pixels. The image analysis and subsequent interpretation as input gestures can be performed within tablet computer 900 (e.g., by using the main processor to execute an operating system or other software that analyzes data obtained from cameras 912, 914). The user can thus interact with tablet 900 using gestures in 3D space.

A goggle system 1000, shown in FIG. 10, may also incorporate a motion detector according to an embodiment of the present invention. Goggle system 1000 can be used, e.g., in connection with virtual-reality and/or augmented-reality environments. Goggle system 1000 includes goggles 1002 that a user can wear in the same way as conventional eyeglasses. Goggles 1002 include eyepieces 1004, 1006 that can incorporate small display screens to provide images to the user's left and right eyes (e.g., images of a virtual-reality environment). These images can be provided by a base unit 1008 (e.g., a computer system) that communicates with goggles 1002 over either a wired or a wireless channel. Cameras 1010, 1012 (e.g., similar or identical to cameras 102, 104 of FIG. 1) can be mounted in a frame section of goggles 1002 such that they do not obscure the user's vision. Light sources 1014, 1016 can be mounted in the frame section of goggles 1002 to either side of cameras 1010, 1012. Images collected by cameras 1010, 1012 can be transmitted to base unit 1008 for analysis and interpretation as gestures indicating user interaction with the virtual or augmented environment. (In some embodiments, the virtual or augmented environment presented through eyepieces 1004, 1006 can include a representation of the user's hand, and that representation can be based on the images collected by cameras 1010, 1012.)

When the user gestures using a hand or other object in the field of view of cameras 1010, 1012, the motion is detected as described above. In this case, the background is likely to be a wall of the room the user is in, and the user will most likely be sitting or standing at some distance from the wall. As long as the user's hand is significantly closer than the user's body (e.g., at half the distance) to light sources 1014, 1016, the illumination-based contrast enhancement techniques described herein facilitate distinguishing object pixels from background pixels. The image analysis and subsequent interpretation as input gestures can be performed within base unit 1008.

It will be appreciated that the motion detector implementations shown in FIGS. 8 to 10 are illustrative and that variations and modifications are possible. For example, a motion detector or components thereof can be combined in a single housing with other user input devices, such as a keyboard or trackpad. As another example, a motion detector can be incorporated into a laptop computer, e.g., with upward-oriented cameras and light sources built into the same surface as the laptop keyboard (e.g., to one side of the keyboard, or in front of or behind it) or with front-oriented cameras and light sources built into a bezel surrounding the laptop's display screen. As still another example, a wearable motion detector can be implemented, e.g., as a headband or headset that does not include active displays or optical components.

As illustrated in FIG. 11, motion information can be used as user input to control a computer system or other system according to an embodiment of the present invention. Process 1100 can be implemented, e.g., in computer systems such as those shown in FIGS. 8 to 10. At block 1102, images are captured using the light sources and cameras of the motion detector. As described above, capturing the images can include using the light sources to illuminate the field of view of the cameras such that objects closer to the light sources (and the cameras) are more brightly illuminated than objects farther away.

At block 1104, the captured images are analyzed to detect edges of the object based on changes in brightness. For example, as described above, this analysis can include comparing the brightness of each pixel to a threshold, detecting transitions in brightness from a low level to a high level across adjacent pixels, and/or comparing successive images captured with and without illumination by the light sources. At block 1106, an edge-based algorithm is used to determine the object's position and/or motion. This algorithm can be, for example, any of the tangent-based algorithms described in the above-referenced application Ser. No. 13/414,485; other algorithms can also be used.
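The low-to-high transition test mentioned for block 1104 can be illustrated on a single image row. This sketch assumes a list of brightness values and an arbitrary threshold; the function name and the "rising"/"falling" labels are hypothetical and are not taken from the referenced application.

```python
def row_edges(row, threshold):
    """Locate object edges in one row by threshold crossings.

    Returns (index, kind) pairs, where index is the object pixel that
    sits at the edge and kind is "rising" (background to object, scanning
    left to right) or "falling" (object to background).
    """
    edges = []
    for i in range(1, len(row)):
        before = row[i - 1] > threshold
        after = row[i] > threshold
        if after and not before:
            edges.append((i, "rising"))
        elif before and not after:
            edges.append((i - 1, "falling"))
    return edges


if __name__ == "__main__":
    row = [5, 6, 7, 80, 85, 82, 6, 5]
    print(row_edges(row, threshold=40))
```

For the sample row the bright run spans indices 3 to 5, so one rising edge is reported at index 3 and one falling edge at index 5; an edge-based position algorithm would consume such edge locations from every row.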

At block 1108, a gesture is identified based on the object's position and/or motion. For example, a library of gestures can be defined based on the position and/or motion of a user's fingers. A "tap" can be defined based on a fast motion of an extended finger toward the display screen. A "trace" can be defined as motion of an extended finger in a plane roughly parallel to the display screen. An inward pinch can be defined as two extended fingers moving closer together, and an outward pinch can be defined as two extended fingers moving farther apart. Swipe gestures can be defined based on movement of the entire hand in a particular direction (e.g., up, down, left, right), and different swipe gestures can be further defined based on the number of extended fingers (e.g., one, two, all). Other gestures can also be defined. By comparing a detected motion to the library, a particular gesture associated with the detected position and/or motion can be determined.
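As an illustration of matching a detected motion against one library entry, the sketch below classifies a two-finger motion as an inward or outward pinch from the change in fingertip separation. The input format (3D fingertip positions in meters), the `min_change` tolerance, and the function name are all assumptions made for this example.

```python
import math


def classify_pinch(start, end, min_change=0.02):
    """Classify a two-finger motion as an inward or outward pinch.

    start and end are each a pair of (x, y, z) fingertip positions.
    Returns "inward pinch", "outward pinch", or None when the change in
    fingertip separation is smaller than min_change.
    """
    separation_before = math.dist(start[0], start[1])
    separation_after = math.dist(end[0], end[1])
    if separation_after < separation_before - min_change:
        return "inward pinch"
    if separation_after > separation_before + min_change:
        return "outward pinch"
    return None


if __name__ == "__main__":
    apart = ((0.00, 0.0, 0.3), (0.10, 0.0, 0.3))
    together = ((0.03, 0.0, 0.3), (0.06, 0.0, 0.3))
    print(classify_pinch(apart, together))
    print(classify_pinch(together, apart))
```

A full gesture library would hold one such test per gesture (tap, trace, swipe, and so on) and return the first or best match; the tolerance keeps small tracking jitter from registering as a pinch.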

At block 1110, the gesture is interpreted as user input, which the computer system can process. The particular processing generally depends on the application programs currently executing on the computer system and on how those programs are configured to respond to particular inputs. For example, a tap in a browser program can be interpreted as selecting a link toward which the finger is pointing. A tap in a word-processing program can be interpreted as placing the cursor at the position where the finger is pointing, or as selecting a menu item or other graphical control element that may be visible on the screen. The particular gestures and interpretations can be determined at the level of the operating system and/or applications as desired, and no particular interpretation of any gesture is required.

Full-body motion can be captured and used for similar purposes. In such embodiments, the analysis and reconstruction advantageously occur in approximately real time (e.g., in times comparable to human reaction times), so that the user experiences a natural interaction with the equipment. In other applications, motion capture can be used for digital rendering that is not done in real time, e.g., for computer-animated movies or the like; in such cases, the analysis can take as long as desired.

Embodiments described herein provide efficient discrimination between object and background in captured images by exploiting the decrease of light intensity with distance. By brightly illuminating the object using one or more light sources that are significantly closer to the object than to the background (e.g., by a factor of two or more), the contrast between object and background can be increased. In some instances, filters can be used to remove light from sources other than the intended sources. Using infrared light can reduce unwanted "noise" or bright spots from visible light sources likely to be present in the environment where images are being captured, and can also reduce distraction to users (who presumably cannot see infrared).

The embodiments described above provide two light sources, one disposed to either side of the cameras used to capture images of the object of interest. This arrangement can be particularly useful where the position and motion analysis relies on knowledge of the object's edges as seen from each camera, since the light sources will illuminate those edges. However, other arrangements can also be used. For example, FIG. 12 illustrates a system 1200 with a single camera 1202 and two light sources 1204, 1206 disposed to either side of camera 1202. This arrangement can be used to capture images of object 1208 and of the shadows cast by object 1208 against a flat background region 1210. In this embodiment, object pixels and background pixels can be readily distinguished. In addition, provided that background 1210 is not too far from object 1208, there will still be enough contrast between pixels in the shadowed background region and pixels in the unshadowed background region to allow the two to be distinguished. Position and motion detection algorithms using images of an object and its shadows are described in the above-referenced application Ser. No. 13/414,485, and system 1200 can provide input information to such algorithms, including the location of edges of the object and its shadows.

The single-camera implementation 1200 may benefit from the inclusion of a holographic diffraction grating 1215 placed in front of the lens of camera 1202. Grating 1215 creates fringe patterns that appear as ghost silhouettes and/or tangents of object 1208. Particularly when separable (i.e., when the overlap is not excessive), these patterns have high contrast, which facilitates distinguishing the object from the background. See, e.g., Diffraction Grating Handbook (Newport Corporation, January 2005; available at http://gratings.newport.com/library/handbook/handbook.asp), the entire disclosure of which is incorporated herein by reference.

FIG. 13 illustrates another system 1300 having two cameras 1302, 1304 and one light source 1306 disposed between the cameras. System 1300 can capture images of an object 1308 against a background 1310. System 1300 is generally less reliable for edge illumination than system 100 of FIG. 1; however, not all algorithms for determining position and motion rely on precise knowledge of the edges of an object. Accordingly, system 1300 can be used, e.g., with edge-based algorithms in situations where less accuracy is required. System 1300 can also be used with non-edge-based algorithms.

While the invention has been described with respect to specific embodiments, one skilled in the art will recognize that numerous modifications are possible. The number and arrangement of cameras and light sources can be varied. The cameras' capabilities, including frame rate, spatial resolution, and intensity resolution, can also be varied as desired. The light sources can be operated in continuous or pulsed mode. The systems described herein provide images with enhanced contrast between object and background to facilitate distinguishing between the two, and this information can be used for numerous purposes, of which position and/or motion detection is just one among many possibilities.

Cutoff thresholds and other specific criteria for distinguishing an object from the background can be adapted to particular cameras and particular environments. As noted above, contrast is expected to increase as the ratio r_B/r_O increases. In some embodiments, the system can be calibrated to a particular environment, for example by adjusting light-source brightness, threshold criteria, and so on. The use of simple criteria that can be implemented in fast algorithms can free up processing power in a given system for other uses.
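The relationship between the distance ratio and the cutoff threshold can be sketched in Python as follows. This is a simplified illustration under stated assumptions: an idealized inverse-square falloff from a point-like source, equal reflectivity for object and background, and a midpoint rule for choosing the threshold. The function names (`expected_contrast`, `calibrate_threshold`, `object_mask`) and the use of NumPy are hypothetical and do not come from the source.

```python
import numpy as np

def expected_contrast(r_object, r_background):
    """Idealized object-to-background brightness ratio, (r_B / r_O) ** 2.

    Assumes 1/r^2 intensity falloff and equal reflectivity (an assumption).
    """
    return (r_background / r_object) ** 2

def calibrate_threshold(object_samples, background_samples):
    """Pick a cutoff midway between mean object and mean background brightness.

    The midpoint rule is an illustrative choice, not the source's criterion.
    """
    return 0.5 * (float(np.mean(object_samples)) + float(np.mean(background_samples)))

def object_mask(image, threshold):
    """Classify each pixel as object (True) or background (False)."""
    return np.asarray(image) > threshold
```

For example, under these assumptions an object at half the distance of the background (r_B/r_O = 2) would appear about four times brighter, which leaves a wide margin for a single per-pixel comparison against the calibrated threshold.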

Any type of object can be the subject of motion capture using these techniques, and various aspects of the implementation can be optimized for a particular object. For example, the type and position of the cameras and/or light sources can be optimized based on the size of the object whose motion is to be captured and/or the size of the space in which motion is to be captured. Analysis techniques according to embodiments of the invention can be implemented as algorithms in any suitable computer language and executed on programmable processors. Alternatively, some or all of the algorithms can be implemented in fixed-function logic circuits, and such circuits can be designed and fabricated using conventional or other tools.

Computer programs incorporating various features of the invention may be encoded on various computer-readable storage media. Suitable media include magnetic disk or tape, optical storage media such as compact disc (CD) or DVD (digital versatile disc), flash memory, and any other non-transitory medium capable of holding data in a computer-readable form. Computer-readable storage media encoded with the program code may be packaged with a compatible device or provided separately from other devices. In addition, program code may be encoded and transmitted via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet, thereby allowing distribution, for example, via Internet download.

Thus, although the invention has been described with respect to specific embodiments, it will be appreciated that the invention is intended to cover all modifications and equivalents within the scope of the following claims.

Claims

An image capture and analysis system for identifying an object of interest in a digitally represented image scene, the system comprising:
at least one camera directed at a field of view;
at least one light source disposed on the same side of the field of view as the camera and oriented to illuminate the field of view; and
an image analyzer coupled to the camera and to the at least one light source,
wherein the image analyzer is configured to:
operate the at least one camera to capture a series of images including a first image captured at the same time as the at least one light source illuminates the field of view;
identify pixels corresponding to the object rather than to the background; and
based on the identified pixels, construct a 3D model of the object, including the position and shape of the object, and geometrically determine whether the object corresponds to the object of interest.

The system according to claim 1, wherein the image analyzer distinguishes between (i) a foreground image component corresponding to an object located in a proximity region of the field of view and (ii) a background image component corresponding to an object located in a remote region of the field of view,
wherein the proximity region extends from the at least one camera and has a depth that is at least twice the predicted maximum distance between the at least one camera and the object corresponding to the foreground image component, and
wherein the remote region is located beyond the proximity region with respect to the at least one camera.

The system of claim 2, wherein the proximity region has a depth that is at least four times the predicted maximum distance.

The system of claim 1, wherein the at least one light source is a diffuse emitter.

The system of claim 4, wherein at least one of the light sources is an infrared light emitting diode and at least one of the cameras is an infrared sensitive camera.

The system of claim 1, wherein at least two of the light sources are adjacent to at least one of the cameras and are substantially in the same plane.

The system of claim 1, wherein at least one of the cameras and at least one of the light sources are oriented vertically upward.

The system wherein the at least one camera is operated with an exposure time as long as 100 microseconds, and the at least one light source is driven at a power level of at least 5 watts during the exposure time.

The system of claim 1, further comprising a holographic diffraction grating disposed between at least one camera lens and the field of view.

The system according to claim 1, wherein the image analyzer operates the at least one camera to capture second and third images while the at least one light source is not illuminating the field of view, and identifies the pixels corresponding to the object based on the difference between the first and second images and the difference between the first and third images,
wherein the second image is captured before the first image, and the third image is captured after the second image.

A method of image capture and analysis, the method comprising:
driving at least one light source to illuminate a field of view containing an object of interest;
capturing a series of digital images of the field of view with a camera while the at least one light source is driven;
identifying pixels corresponding to the object rather than to the background; and
based on the identified pixels, constructing a 3D model of the object, including the position and shape of the object, and geometrically determining whether the object corresponds to the object of interest.

The method of claim 11, wherein the at least one light source is arranged such that the object of interest is located within a proximity region, and
wherein the proximity region extends from the camera to a distance that is at least twice the predicted maximum distance between the camera and the object of interest.

The method of claim 12, wherein the proximity region has a depth that is at least four times the predicted maximum distance.

The method of claim 11, wherein at least one of the light sources is a diffuse emitter.

The method of claim 11, wherein at least one of the light sources is an infrared light emitting diode and the camera is an infrared sensitive camera.

The method of claim 11, wherein two light sources that are adjacent to the camera and substantially in the same plane as the camera are driven.

The method of claim 11, wherein the camera and the at least one light source are oriented vertically upward.

The method of claim 11, further comprising capturing a first image while the at least one light source is not driven, a second image while the at least one light source is driven, and a third image while the at least one light source is not driven,
wherein the pixels corresponding to the object are identified based on the difference between the second and first images and the difference between the second and third images.

A method of locating a round object in a digital image, the method comprising:
driving at least one light source to illuminate a field of view containing the object of interest;
operating a camera to capture a series of images, including a first image captured at the same time as the at least one light source illuminates the field of view; and
analyzing the images to detect a Gaussian luminance decay pattern indicative of a round object in the field of view.

The method of claim 19, wherein the round object is detected without identifying its edges.

The method of claim 19, further comprising tracking movement of the round object detected through a plurality of captured images.

An image capture and analysis system for locating a round object in a field of view, the system comprising:
at least one camera directed at the field of view;
at least one light source disposed on the same side of the field of view as the camera and oriented to illuminate the field of view; and
an image analyzer coupled to the camera and to the at least one light source,
wherein the image analyzer is configured to:
operate the at least one camera to capture a series of images including a first image captured at the same time as the at least one light source illuminates the field of view; and
analyze the images to detect a Gaussian luminance decay pattern indicative of a round object in the field of view.

The system of claim 22, wherein the round object is detected without identifying its edges.

The system of claim 22, wherein the image analyzer further tracks the movement of the detected round object through a plurality of captured images.