JP2002197472A

JP2002197472A - Method for recognizing object

Info

Publication number: JP2002197472A
Application number: JP2000404599A
Authority: JP
Inventors: Masahiro Tomono; 正裕友納
Original assignee: Individual
Current assignee: Individual
Priority date: 2000-12-26
Filing date: 2000-12-26
Publication date: 2002-07-12

Abstract

PROBLEM TO BE SOLVED: To recognize a three-dimensional object appearing in a two-dimensional image and to estimate its posture under a perspective-projection camera model, which introduces no distortion, and to do so with a small amount of computation. SOLUTION: In this object recognition method, an input image is matched against an object model by matching image edges in the input image against model edges of the object model one at a time, for each posture of the object model relative to the camera. The object posture is divided into a rotation component and a translation component. First, for each discrete value of the rotation component, the method checks whether the projection of a model edge onto the image plane and an image edge can lie on the same straight line, which narrows down the candidate rotation component values and edge correspondences. Next, for each candidate obtained, the distribution of translation component values that fit it is computed and the most frequent translation component value is found. That translation component value, together with the rotation component value and the edge correspondence at that time, is taken as the solution.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a computer-based object recognition method, and in particular to a method that uses a pre-registered object model to recognize an object appearing in an input image and, further, to estimate the posture of that object relative to the camera.

[0002]

[Prior Art] The alignment method is one technique for recognizing an object in a two-dimensional image using a three-dimensional object model (reference: D. P. Huttenlocher and S. Ullman, "Recognizing Solid Objects by Alignment with an Image", International Journal of Computer Vision, Vol. 5, No. 2, pp. 195-212, 1990). The alignment method approximates the camera model by weak perspective projection and defines a transformation matrix that projects a three-dimensional object onto the two-dimensional image plane. Object recognition is then formulated as the problem of finding this transformation matrix from the set of feature points of the object model and the set of feature points in the image. Specifically, the transformation matrix is computed for a correspondence between three feature points on the object model and three feature points in the image, and the method checks whether the other model feature points, projected onto the image plane by that matrix, agree well with the other image feature points. This is repeated for every combination of model feature points and image feature points, and a transformation matrix with a high degree of agreement is taken as the solution. Corners, intersections, and inflection points of edges extracted from the image are used as feature points.

[0003]

[Problems to Be Solved by the Invention] Perspective projection is normally used to model a camera accurately. Under perspective projection, an object appears smaller in the image in inverse proportion to its depth. Because the alignment method approximates this by weak perspective projection, the distortion of the projected image grows when the object extends far in depth, and the object can no longer be recognized correctly.

[0004] In addition, when the number of model feature points is M and the number of image feature points is N, the computational cost of the alignment method is proportional to M^3 N^2 log N, so computation time becomes enormous as the number of feature points grows.

[0005] More generally, and not only in the alignment method, three-dimensional object recognition suffers from a large search space and an enormous computational cost. This is because object recognition must simultaneously solve the problem of matching model features to image features and the problem of determining the object posture.

[0006]

[Means for Solving the Problems] To solve the above problems, the object recognition method of the present invention matches the input image against the object model by matching two-dimensional edges in the input image (called image edges) against three-dimensional edges of the object model (called model edges) one at a time. This matching is performed for each posture of the object and, moreover, proceeds in stages, with the object posture divided into a rotation component and a translation component.

[0007] The invention of claim 1 discretizes the rotation component of the posture of the object model in the coordinate system of the camera that captured the input image. For each discrete value of the rotation component, it checks whether the projection onto the image plane of each model edge of the object model can lie on the same straight line as each image edge in the image edge set, and obtains, as an edge pair set, the set of pairs of model edge and image edge that can lie on the same straight line. When the number of model edges contained in the edge pair set exceeds a predetermined threshold, it computes, for each edge pair in the set, the translation component value of the object model posture at which both end points of the projected model edge coincide with both end points of the image edge of that pair. A representative value of the region of highest concentration in the distribution of these translation component values over the edge pair set is taken as the estimate of the translation component, and the rotation component value together with this estimated translation component is taken as an object posture candidate.

[0008] The invention of claim 2, for each object posture candidate, computes the degree of match of each model edge from the distance between its projection onto the image plane under that candidate and the image edges, selects the best-matching image edge as the image edge corresponding to that model edge, and selects, as the solution of posture estimation, the object posture candidate for which the sum of the degrees of match over all model edges is best.

[0009]

[Embodiments of the Invention] The object recognition method of the present invention uses a three-dimensional object model composed of a polyhedron to recognize an object appearing in a single two-dimensional image and to estimate the posture of the object relative to the camera. The input image is matched against the object model by matching image edges against model edges one at a time. Because the object model is a polyhedron, every model edge is a straight line segment. The two-dimensional image is a digital image loaded into a computer, and image edges are obtained beforehand by the edge extraction processing widely used in image processing. All image edges are assumed to have been split into straight line segments at division points such as points of high curvature and intersections. FIG. 4 shows an example of the image edges of a desk; an actual image, however, also contains many image edges that do not belong to the target object. FIG. 5 shows an example of an object model of the desk. The object model need only represent the characteristic parts of the object; in this example, only the front part of the desk visible from an ordinary viewpoint is modeled.

[0010] Next, the principle of the object recognition method of the present invention is described. First, the perspective-projection camera model is formulated as follows. FIG. 3 shows an example of the posture relationship between the object model and the camera. The shape of the object model is defined in the local coordinate system of the object model. Let the parameters of the coordinate transformation from the object model coordinate system to the camera coordinate system be τ = <τ_r, τ_t>, where τ_r = (θ, φ, ψ) is the rotation component and τ_t = (x^t, y^t, z^t)^T is the translation vector (T denotes transposition). The value P^c, in the camera coordinate system, of a point P on the object model is then given by Equation 1, where R(τ_r) is the rotation matrix determined by τ_r. Further, the projection P^s onto the screen coordinate system of a point P^c = (x^c, y^c, z^c)^T in the camera coordinate system is given by Equation 2.

[0011]

[Equation 1] $P^c = R(\tau_r)\,P + \tau_t$

[0012]

[Equation 2]
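
Equations 1 and 2 describe a rigid transform followed by a perspective projection. A minimal Python sketch of that camera model follows. The Euler-angle convention in `rotation_matrix`, the focal length parameter `f`, and all function names are assumptions made for illustration, since the text fixes neither the angle convention nor the exact form of Equation 2.

```python
import math


def rotation_matrix(theta, phi, psi):
    """Rotation R = Rz(psi) * Ry(phi) * Rx(theta); this convention is an assumption."""
    cx, sx = math.cos(theta), math.sin(theta)
    cy, sy = math.cos(phi), math.sin(phi)
    cz, sz = math.cos(psi), math.sin(psi)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def to_camera(R, P, t):
    """Equation 1: P^c = R P + tau_t."""
    return tuple(sum(R[i][j] * P[j] for j in range(3)) + t[i] for i in range(3))


def project(Pc, f=1.0):
    """Perspective projection onto a screen at distance f (assumed form of Equation 2)."""
    x, y, z = Pc
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    return (f * x / z, f * y / z)
```

For example, a model point (1, 2, 0) with no rotation and τ_t = (0, 0, 4) lands at (1, 2, 4) in the camera frame, and its image coordinates shrink in inverse proportion to the depth 4.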

[0013] Let L be the set of image edges and E the set of model edges, and let e^s, or e^s(τ), denote the edge obtained by projecting a model edge e ∈ E onto the two-dimensional image under posture τ (called a projected edge). The problem of recognizing the object in the input image is formulated as the problem of finding a posture τ and a mapping m = {(e, l) | e ∈ E, l ∈ L} from E to L that satisfy Equation 3. In Equation 3, D is a distance between two line segments. It is defined, for example, as the sum of the Euclidean distances between the end points of the two segments, but any other distance measure may be used provided it is 0 when the end points of the two segments coincide. Some model edges may have no corresponding image edge.
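
As an illustration of the example distance D described above, the following Python sketch sums the Euclidean distances between the end points of two 2D segments. Taking the minimum over the two possible end point orderings is an assumption added here so that the result does not depend on the direction in which a segment was stored; the function name is hypothetical.

```python
import math


def segment_distance(a, b):
    """Distance D between segments a and b, each given as ((x1, y1), (x2, y2)).

    Sum of end point distances, minimised over both end point orderings
    (the ordering rule is an assumption, not taken from the text).
    """
    (a1, a2), (b1, b2) = a, b
    direct = math.dist(a1, b1) + math.dist(a2, b2)
    swapped = math.dist(a1, b2) + math.dist(a2, b1)
    return min(direct, swapped)
```

The value is 0 exactly when the two segments share both end points, which is the property the text requires of any admissible distance measure.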

[0014]

[Equation 3]

[0015] Finding S requires searching the whole space of object postures τ, and the computational cost is enormous. The search space of S is therefore divided and the solution is obtained in stages, which reduces the computational cost. Specifically, the corresponding edges are first narrowed down by Equation 4 according to whether the straight-line equations of the edges can coincide, and are then narrowed further by Equation 5 according to whether the end points coincide. D_1 is the absolute value of the difference between the slopes of the two edges when they coincide at a point Q, and is defined by Equation 6, in which slope(x) is the slope of edge x. D_1 determines whether the straight-line equations of the projected edge and the image edge can coincide.

[0016]

[Equation 4]

[0017]

[Equation 5]

[0018]

[Equation 6]

[0019] S_1 is the set of pairs of model edge e and image edge l whose straight-line equations can coincide. S_2 is the set of those edge correspondence candidates obtained in S_1 for which both end points of the two edges coincide, and S_2 = S holds. Hence, to find S, it suffices to find S_1 first and then S_2.

[0020] Next, concrete methods for computing S_1 and S_2 are given, together with the resulting reduction of the search space. First, S_1 depends only on the rotation component τ_r and not on τ_t. To show this, it suffices to derive that D_1 is invariant with respect to τ_t. The proof follows.

[0021] The slope of the projected edge e^s at posture τ is given by Equation 8. Here, by Equation 1, the difference between the end points of the model edge in the camera coordinate system, R(τ_r)(P_2 − P_1), does not depend on τ_t. Also, u and v are determined by the image. Therefore slope(e^s) in Equation 8 is invariant with respect to τ_t. (End of proof.) Note that S_1 is unchanged whichever point on the image edge l is taken as Q.

[0022]

[Equation 7]

[0023]

[Equation 8]
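
The invariance argued in paragraph 0021 can be checked numerically. The Python sketch below is an independent illustration, not a reproduction of Equations 7 and 8: it assumes a focal length of 1 and uses the standard fact that the projection of a 3D line with direction d, constrained to pass through the image point Q = (u, v), has image direction (d_x − u d_z, d_y − v d_z). All function names are hypothetical. Changing the depth at which the edge sits on the viewing ray of Q plays the role of changing τ_t.

```python
import math


def project(p):
    """Perspective projection with focal length 1 (an assumed normalisation)."""
    x, y, z = p
    return (x / z, y / z)


def projected_direction(d, q):
    """Image direction of a 3D line with direction d whose projection passes through q."""
    u, v = q
    return (d[0] - u * d[2], d[1] - v * d[2])


def direction_by_projection(d, q, depth):
    """The same direction obtained by explicitly projecting two points of the line.

    The line is placed at the given depth on the viewing ray of q, so different
    depths correspond to different translations of the rotated edge.
    """
    u, v = q
    p0 = (depth * u, depth * v, depth)
    p1 = tuple(p0[i] + d[i] for i in range(3))
    (u0, v0), (u1, v1) = project(p0), project(p1)
    return (u1 - u0, v1 - v0)


def angle(vec):
    """Orientation of a 2D vector in radians."""
    return math.atan2(vec[1], vec[0])
```

Here `d` stands for the rotated edge direction R(τ_r)(P_2 − P_1). The orientation returned by `direction_by_projection` is the same for every depth, which is why the collinearity test can be run over τ_r alone.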

[0024] Thus τ_t need not be considered for S_1, so the search space shrinks to τ_r alone. To find S_1, τ_r is discretized into suitable cells and, for each discrete value, the edge correspondence m satisfying Σ_{(e,l)∈m} D_1(e^s(τ), l) = 0 is found. Because each angle of τ_r lies between 0 and 360 degrees, its range can be covered by a finite number of cells.

[0025] S_2 is found by computing, for each τ_r obtained in S_1, the translation component τ_t that gives a high degree of match between projected edges and image edges. The translation component τ_t at which both end points of the image edge and of the projected edge coincide can be computed by Equations 9 and 10, where P_1 and P_2 are the end points of the model edge and Q_1 = (u_1, v_1)^T and Q_2 = (u_2, v_2)^T are the end points of the image edge.

[0026]

[Equation 9] $\tau_t = F\,R(\tau_r)\,(P_2 - P_1) - R(\tau_r)\,P_1$

[0027]

[Equation 10]
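
Equation 9 consists of a term that is linear in R(τ_r)(P_2 − P_1) minus R(τ_r)P_1, with the matrix F supplied by Equation 10, which is not reproduced in this text. The Python sketch below therefore does not use F. Instead it solves the underlying end point constraints directly in a least-squares sense, assuming a focal length of 1: it looks for depths λ_1 and λ_2 such that λ_2 (u_2, v_2, 1) − λ_1 (u_1, v_1, 1) equals the rotated edge vector, and then sets τ_t = λ_1 (u_1, v_1, 1) − R(τ_r)P_1. The function names and the least-squares formulation are assumptions made for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def translation_from_endpoints(a, b, q1, q2):
    """Translation that maps rotated model end points onto image end points.

    a, b   : rotated model end points R P1 and R P2 (3D tuples)
    q1, q2 : image end points (u, v), focal length 1 assumed
    """
    d = [b[i] - a[i] for i in range(3)]
    c1 = [-q1[0], -q1[1], -1.0]   # column multiplying lambda_1
    c2 = [q2[0], q2[1], 1.0]      # column multiplying lambda_2
    g11, g12, g22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("image end points coincide")
    r1, r2 = dot(c1, d), dot(c2, d)
    lam1 = (r1 * g22 - g12 * r2) / det   # Cramer's rule on the normal equations
    q1h = (q1[0], q1[1], 1.0)
    return tuple(lam1 * q1h[i] - a[i] for i in range(3))
```

When the projected edge and the image edge really are collinear, the three constraints are consistent and the least-squares solution is exact; otherwise it returns the closest fit, which the later voting step can absorb as noise.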

[0028] The translation component τ_t that gives a high degree of match between projected edges and image edges is found as follows. First, for each τ_r obtained in S_1, τ_t is computed by Equation 9 for every edge pair (e, l) in the edge correspondence m at that τ_r, which gives a distribution of τ_t. The most frequent τ_t in that distribution is then selected. In practice, errors prevent τ_t from concentrating at a single point and the values scatter over some range, so the most frequent τ_t is found by a technique such as voting or clustering. This amounts to choosing, as the solution of S_2, the τ_t for which the number of edge pairs approximately satisfying D = 0 is largest. Finally, that τ_t, together with the τ_r and m at that time, forms a solution candidate.

[0029] The above method presupposes that the end points of the image edges have been extracted accurately. When an image edge can be extracted completely, its end points can be used as they are and matched against the end points of the model edge. In real images, however, edges may be extracted poorly because of lighting conditions or contrast, or may be hidden by other overlapping objects, so that the end points of an image edge cannot be extracted completely. In that case, the above method is applied with the intersection of two image edges used as an end point candidate of each image edge. The intersection of image edges here means the intersection of the straight lines obtained by extending the image edges. Because such an intersection can be found as long as the straight portions are extracted to some extent, end point candidates of image edges can be obtained stably even when the image edges are interrupted.
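
The end point candidates described in paragraph 0029 are intersections of the infinite lines through two image edges. A minimal Python sketch using the standard determinant formula for two lines is shown below; the function name and the tolerance `eps` for treating lines as parallel are assumptions.

```python
def line_intersection(seg_a, seg_b, eps=1e-12):
    """Intersection of the infinite lines through two 2D segments.

    Each segment is ((x1, y1), (x2, y2)). Returns None for (near-)parallel lines.
    """
    (x1, y1), (x2, y2) = seg_a
    (x3, y3), (x4, y4) = seg_b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < eps:
        return None
    det_a = x1 * y2 - y1 * x2
    det_b = x3 * y4 - y3 * x4
    px = (det_a * (x3 - x4) - (x1 - x2) * det_b) / denom
    py = (det_a * (y3 - y4) - (y1 - y2) * det_b) / denom
    return (px, py)
```

Because the lines are extended, the two segments need not touch: a short horizontal fragment and a short vertical fragment still yield the corner they would meet at.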

[0030] An embodiment of the object recognition method according to the present invention is described below with reference to the drawings. FIG. 1 is a flowchart showing one embodiment of the present invention. Steps S1 to S7 fall within the scope of claim 1, and steps S1 to S11 within the scope of claim 2. First, in step S1, the rotation component τ_r of the object posture is discretized into suitable cells. One way is to express τ_r as Euler angles and discretize each of the three angles. Another is to enclose the object model in a sphere, divide the surface of the sphere into polygons, and use combinations of the azimuth of the vector from the center of the sphere to the center of each polygon (which gives two discretized angles) and discrete values of the rotation angle about that vector. The range of τ_r may be restricted to the range of postures the object can take relative to the camera.
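
The first discretization option in step S1, a regular grid over three Euler angles, can be sketched in Python as follows. The uniform step, the default ranges, and the function name are assumptions made for illustration; the text leaves the cell size open.

```python
import itertools
import math


def rotation_grid(step_deg, ranges_deg=((0, 360), (0, 360), (0, 360))):
    """List of (theta, phi, psi) samples in radians on a regular grid.

    ranges_deg restricts each angle to [lo, hi), which corresponds to limiting
    tau_r to the postures the object can actually take.
    """
    axes = []
    for lo, hi in ranges_deg:
        count = int(round((hi - lo) / step_deg))
        axes.append([math.radians(lo + i * step_deg) for i in range(count)])
    return list(itertools.product(*axes))
```

The number of samples grows with the cube of the angular resolution, so restricting the ranges, as the text suggests, is the main lever for keeping the loop over τ_r small.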

[0031] Next, steps S2 and S3 repeat steps S4 to S7 for each discrete value of τ_r. In step S4, at the current τ_r, each model edge e is examined to determine whether its projected edge e^s(τ) can lie on the same straight line as any image edge l. th_1 is a threshold: if D_1(e^s(τ), l) < th_1, then D_1(e^s(τ), l) = 0 is regarded as holding, because various errors mean that D_1(e^s(τ), l) is rarely exactly 0 in practice. As shown in paragraph 0021, τ_t is not needed to compute D_1. Each pair of model edge and image edge that can lie on the same straight line (an edge pair) is registered in the edge pair set M.
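
A possible shape for step S4 is sketched below in Python. It is an illustration under several assumptions: a focal length of 1, the first end point of each image edge used as the point Q, and an orientation difference in radians used in place of the slope difference of Equation 6 (which is not reproduced in this text) so that vertical edges cause no division by zero. The names `d1` and `collect_edge_pairs` are hypothetical.

```python
import math


def line_angle(vec):
    """Orientation of an undirected 2D line, in [0, pi)."""
    return math.atan2(vec[1], vec[0]) % math.pi


def d1(edge_dir_3d, image_edge):
    """Orientation gap between an image edge and a projected model edge.

    edge_dir_3d is the rotated model edge vector R(tau_r)(P2 - P1); the
    projected edge is constrained to pass through the image edge's first
    end point, so no translation is needed.
    """
    (u, v), (u2, v2) = image_edge
    proj = (edge_dir_3d[0] - u * edge_dir_3d[2], edge_dir_3d[1] - v * edge_dir_3d[2])
    gap = abs(line_angle(proj) - line_angle((u2 - u, v2 - v)))
    return min(gap, math.pi - gap)


def collect_edge_pairs(rotated_dirs, image_edges, th1):
    """Step S4: (model index, image index) pairs whose gap is below th1."""
    return [(i, j)
            for i, d in enumerate(rotated_dirs)
            for j, l in enumerate(image_edges)
            if d1(d, l) < th1]
```

The count needed for step S5 is then the number of distinct model indices in the returned list, for example `len({i for i, _ in pairs})`.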

[0032] Next, step S5 checks whether the number of model edges contained in the edge pair set M exceeds a predetermined threshold. If it does, processing proceeds to step S6; if not, it returns to step S2 and the next τ_r is processed.

[0033] In step S6, for each pair of model edge and image edge in the edge pair set M obtained in step S4, the translation component τ_t is computed from their end points using Equations 9 and 10.

[0034] Next, in step S7, at the current τ_r, the regions where τ_t concentrates are found from the distribution of the translation components τ_t over all edge pairs in the edge pair set M, and a representative value of a region of high concentration is taken as a solution candidate. Methods for finding such regions include voting and clustering. In the voting method, the range of τ_t is suitably discretized, a histogram of τ_t is built, and a τ_t with a high frequency (number of votes) is taken as a solution candidate. In the clustering method, the number of other τ_t values in the neighborhood of each τ_t is counted, and a τ_t with a large count is taken as a solution candidate. The resulting τ_t is combined with the current τ_r to form an object posture candidate.
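
The voting variant of step S7 can be sketched in Python as a three-dimensional histogram over τ_t. The cubic bins of side `bin_size`, the use of the bin mean as the representative value, and the function name are assumptions; the text only requires that a representative of a densely populated region be chosen.

```python
from collections import Counter


def vote_translation(translations, bin_size):
    """Histogram voting over 3D translations.

    Returns (representative tau_t, votes) for the fullest bin, or (None, 0)
    when there are no translations to vote on.
    """
    if not translations:
        return None, 0
    votes = Counter()
    members = {}
    for t in translations:
        key = tuple(int(c // bin_size) for c in t)   # index of the cubic bin
        votes[key] += 1
        members.setdefault(key, []).append(t)
    key, count = votes.most_common(1)[0]
    group = members[key]
    mean = tuple(sum(t[i] for t in group) / len(group) for i in range(3))
    return mean, count
```

A fixed grid can split one cluster across neighboring bins; the clustering variant mentioned in the text, which counts neighbors around each τ_t, avoids that at a higher cost.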

[0035] Next, steps S8 and S9 repeat step S10 for each object posture candidate obtained in step S7. In step S10, the degree of match of each model edge is computed from the edge pairs at the current posture candidate <τ_r, τ_t>. The sum of these degrees of match is taken as the degree of match of the object model at that posture candidate.

[0036] The degree of match of a model edge is computed, for example, as follows. Among the image edges that form an edge pair with model edge e, the image edge l that minimizes the distance D(e^s, l) described in paragraph 0013 is taken as the image edge corresponding to e, and the value of D(e^s, l) at that time is taken as the degree of match of e. In this case, a smaller value means a better match. When no image edge corresponds to model edge e, a suitable penalty may be assigned to e, which prevents an object posture in which no model edge at all corresponds to an image edge from being selected as the solution.
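
The scoring of step S10 with the optional penalty can be sketched in Python as follows. The data layout (a dictionary from model edge index to candidate image edge indices), the end point ordering rule inside `segment_distance`, and the function names are assumptions made for illustration.

```python
import math


def segment_distance(a, b):
    """Sum of end point distances between two 2D segments, over the better ordering."""
    (a1, a2), (b1, b2) = a, b
    return min(math.dist(a1, b1) + math.dist(a2, b2),
               math.dist(a1, b2) + math.dist(a2, b1))


def model_match_score(projected_edges, candidates, image_edges, penalty):
    """Total degree of match of a posture candidate (smaller is better).

    projected_edges[i] : projected model edge i under the candidate posture
    candidates[i]      : indices of image edges paired with model edge i
    Returns (total, mapping from model edge index to chosen image edge index).
    """
    total = 0.0
    mapping = {}
    for i, proj in enumerate(projected_edges):
        best = min(candidates.get(i, ()),
                   key=lambda j: segment_distance(proj, image_edges[j]),
                   default=None)
        if best is None:
            total += penalty          # no image edge corresponds to this model edge
        else:
            mapping[i] = best
            total += segment_distance(proj, image_edges[best])
    return total, mapping
```

Step S11 would then pick the posture candidate with the smallest total and report its mapping as the edge correspondence.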

[0037] Finally, in step S11, the object posture <τ_r, τ_t> with the best degree of match of the object model is selected as the solution of posture estimation. The set of edge pairs that minimize D(e^s, l) at that posture is taken as the solution of object recognition.

[0038] FIG. 2 is a block diagram showing an example of a system configuration for executing the object recognition method of the present invention. The part enclosed by the dotted line in FIG. 2 executes the method. First, edge extraction unit 1 extracts image edges from the input image and passes the result to edge division unit 2. Image edges are extracted by image processing such as differentiating the image and tracking its extrema. Edge division unit 2 splits the image edges into straight line segments at points of high curvature, intersections, and the like, and passes the result to end point detection unit 3, posture calculation unit 4, and edge matching unit 5. End point detection unit 3 finds the end points of the segmented image edges and passes the result to posture calculation unit 4 and edge matching unit 5. As described in paragraph 0029, the end points of an image edge can be obtained either by using the end points of the line segment as they are or as intersections of image edges. End point detection unit 3 implements one of these, or implements both and lets the user choose. Next, posture calculation unit 4, referring to the object model in object model storage unit 6, matches model edges against image edges by the processing of steps S1 to S7 in FIG. 1 to obtain object posture candidates, and passes them to edge matching unit 5. For each candidate, edge matching unit 5, again referring to the object model in object model storage unit 6, performs the processing of steps S8 to S11 to find the object posture at which the model edges and image edges match best, and then the edge correspondence at that posture.

[0039] FIG. 6 is an explanatory diagram showing an example of the structure of an object model. The object model is a polyhedron and consists of vertex information, side information, and side connection information. The vertex information is the set of three-dimensional coordinates of the vertices of the polyhedron. It is convenient to set up a local coordinate system for each object and to store coordinates in that system. In FIG. 6, for example, the coordinates of vertex p1 are (100, 200, 0). The side information is the set of vertex pairs that are the end points of the sides of the polyhedron. A side corresponds to a model edge. In FIG. 6, for example, side e1 is defined as the line segment connecting vertex p1 and vertex p2.

[0040] The side connection information specifies the other sides used to obtain an end point of a side as an intersection. In FIG. 6, for example, one end point of side e1 is its intersection with sides e5 and e6, and the other end point is its intersection with sides e2 and e4. The side connection information is used as follows when, in computing the translation component τ_t or in edge matching, an end point of an image edge is obtained as an intersection with another image edge. Suppose the connection information of side e1 contains side e2. On the model, one end point of side e1 is then its intersection with side e2, so in the image the end point of the image edge corresponding to side e1 should be its intersection with one of the image edges that form an edge pair with side e2. Therefore τ_t is computed not for all image edges but only for the image edges that form an edge pair with side e2.
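
The three parts of the object model described in paragraphs 0039 and 0040 map naturally onto a small data structure. The Python sketch below is one possible layout, not the format of FIG. 6. The coordinates of p1, the definition of e1, and the connection lists for e1 come from the text; the coordinates of p2 and the choice of which end of e1 meets e5 and e6 are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModel:
    vertices: dict                                    # vertex name -> (x, y, z) in model coordinates
    sides: dict                                       # side name -> (vertex name, vertex name)
    connections: dict = field(default_factory=dict)   # side -> (sides at first end, sides at second end)

    def endpoints(self, side):
        """3D end points of a side (model edge)."""
        a, b = self.sides[side]
        return self.vertices[a], self.vertices[b]

    def neighbours(self, side):
        """Sides whose paired image edges should be used to find this side's end points."""
        first, second = self.connections.get(side, ((), ()))
        return set(first) | set(second)


desk = ObjectModel(
    vertices={"p1": (100, 200, 0), "p2": (100, 0, 0)},   # p2 is a made-up value
    sides={"e1": ("p1", "p2")},
    connections={"e1": (("e5", "e6"), ("e2", "e4"))},    # end assignment is assumed
)
```

With this layout, the restriction described in paragraph 0040 amounts to intersecting only with image edges paired with `desk.neighbours("e1")` rather than with every image edge.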

[0041] Next, the computational cost of the object recognition method of the present invention is described. In step S4, one computation is performed for every pair of model edge and image edge, so the cost is proportional to MN, where M is the number of model edges and N is the number of image edges. The cost of steps S6 to S7 depends on how the end points of the image edges are obtained when computing the translation component τ_t. When the end points of the image edges are used as they are, one computation is performed for every pair of a model edge and an image edge that can lie on the same straight line as its projected edge, so the cost is proportional to MN′. Here N′ denotes the number of image edges that can lie on the same straight line as the projected edge of one model edge. When intersections of image edges are used as end points, each of the N′ image edges has on average (N′−1)^2 combinations with other image edges for determining its two end points, so the cost is proportional to MN′^3. The cost of step S10 is of the same order as that of steps S6 to S7.

[0042] From the above, the computational cost of the present invention is M(k_1 N + k_2 N′ + k_3 N′) when the end points of the image edges are used as they are in computing the translation component τ_t. Here k_1, k_2, and k_3 are constants of proportionality, which also absorb the number of discretization cells of the rotation component τ_r. When intersections of image edges are used as end points, the cost is M(k_1 N + k_2 N′^3 + k_3 N′^3).

[0043]

[Effects of the Invention] As described above, the object recognition method of the present invention models the camera by perspective projection. Compared with the methods described in the prior art, which approximate the camera by weak perspective projection, it therefore has the effect that an object can be recognized correctly even when its extent in depth is large.

[0044] As described above, the computational cost of the object recognition method of the present invention is M(k₁N + k₂N′ + k₃N′) or M(k₁N + k₂N′³ + k₃N′³), depending on how the end points are obtained. Either cost is smaller than the prior-art cost of M³N²logN. The method therefore has the effect of shortening the calculation time compared with the prior art, and this effect becomes pronounced when M and N are large.
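To get a feel for this comparison, the cost expressions can be evaluated numerically. The sketch below is illustrative only: the function names, the choice k₁ = k₂ = k₃ = 1, the use of the natural logarithm, and the sample sizes are assumptions that do not come from the patent, and the expressions describe growth rates, not measured running times.

```python
import math


def cost_direct(m, n, n_prime, k1=1.0, k2=1.0, k3=1.0):
    """M(k1*N + k2*N' + k3*N'): image-edge end points used as they are."""
    return m * (k1 * n + k2 * n_prime + k3 * n_prime)


def cost_intersection(m, n, n_prime, k1=1.0, k2=1.0, k3=1.0):
    """M(k1*N + k2*N'^3 + k3*N'^3): intersections used as end points."""
    return m * (k1 * n + k2 * n_prime**3 + k3 * n_prime**3)


def cost_prior_art(m, n):
    """M^3 * N^2 * log N, the prior-art cost quoted in the text."""
    return m**3 * n**2 * math.log(n)


if __name__ == "__main__":
    m, n, n_prime = 20, 100, 10  # hypothetical problem sizes
    print("direct end points :", cost_direct(m, n, n_prime))
    print("intersections     :", cost_intersection(m, n, n_prime))
    print("prior art         :", cost_prior_art(m, n))
```

For these hypothetical sizes the two expressions of the present method evaluate to 2400 and 42000, against roughly 3.7 × 10⁸ for the prior-art expression. The gap widens as M and N grow, because the prior-art term is cubic in M and more than quadratic in N.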

[Brief description of the drawings]

FIG. 1 is a flowchart showing the processing procedure of the object recognition method of the present invention.

FIG. 2 is a block diagram showing the configuration of a system that executes the object recognition method of the present invention.

FIG. 3 is an explanatory diagram showing the posture relationship between the camera and the object model.

FIG. 4 is an explanatory diagram showing an example of an edge image of an object in an image.

FIG. 5 is an explanatory diagram showing an example of an object model.

FIG. 6 is an explanatory diagram showing an example of the data representation of an object model.

[Explanation of symbols]

1: edge extraction unit, 2: edge division unit, 3: end point detection unit, 4: posture calculation unit, 5: edge matching unit, 6: object model storage unit, 7: camera, 8: camera coordinate system, 9: image plane, 10: screen coordinate system, 11: object model, 12: object coordinate system. S1 to S11 are steps of the processing procedure.

Claims

[Claims]

1. An object recognition method for recognizing an object appearing in an input image by comparing a set of image edges extracted from the input image with an object model registered in advance, wherein: the rotation component of the posture of the object model with respect to the coordinate system of the camera that captured the input image is discretized; for each discrete value of the rotation component, it is checked whether the projected image of each model edge of the object model onto the image plane and each image edge of the image edge set can lie on the same straight line, and the set of pairs of a model edge and an image edge that can lie on the same straight line is obtained as an edge pair set; for each edge pair included in the edge pair set, the translation component value of the posture of the object model is obtained such that both end points of the projected image, onto the image plane, of the model edge included in the edge pair coincide with both end points of the image edge included in the edge pair; a representative value of a region of high concentration in the distribution of the translation component values over the edge pair set is taken as the estimated value of the translation component; and the rotation component value and the estimated value of the translation component are taken as an object posture candidate.

2. The object recognition method according to claim 1, wherein, for each object posture candidate, a degree of coincidence of each model edge is calculated from the distance between an image edge and the projected image of that model edge onto the image plane under the object posture candidate, the image edge with the best degree of coincidence is selected as the image edge corresponding to the model edge, and the object posture candidate with the best sum of the degrees of coincidence over all model edges is selected as the solution of the posture estimation.