JP2006343859A - Image processing system and image processing method

Info

Publication number: JP2006343859A
Application number: JP2005167300A
Authority: JP
Inventors: Masamichi Osugi; 雅道大杉
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2005-06-07
Filing date: 2005-06-07
Publication date: 2006-12-21
Anticipated expiration: 2025-06-07
Also published as: JP4774818B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing system and an image processing method capable of detecting information about a target object from a captured image with high accuracy using a small amount of data.

SOLUTION: The image processing system 11, which detects various information about an object from a captured image, is provided with an imaging means 12; an information holding means 13a for holding templates, each comprising evaluation information and three-dimensional position information, for a plurality of feature points on a two-dimensional image of the target object; an estimated value generating means 13c for generating a plurality of estimated values of the position and/or orientation of the target object; a fitness calculating means 13d for calculating the fitness of each generated estimated value on the basis of the captured image and the held templates for the plurality of feature points; and a determining means 13e for determining the presence or absence, the position, and/or the orientation of the target object in the captured image on the basis of the calculated fitness of the plurality of estimated values.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an image processing apparatus and an image processing method for detecting various information about an object from a captured image.

Some image processing detects a target object such as a face in an image captured by a camera and, when the object is detected, estimates its position and orientation. In one such technique, a large number of reference images of the target object, captured from various viewpoints and positions, are prepared in advance; the input captured image is matched against each reference image, and the target object is detected in the captured image by finding a similar reference image. In another technique, reference images of a plurality of feature points extracted from an image of the target object are prepared in advance; the input captured image is matched against the reference image of each feature point to find the position on the captured image that resembles each reference image, and the target object is detected by checking whether the positional relationship among the feature points found on the captured image is consistent (see Patent Document 1).
Patent Document 1: JP 2004-252511 A

However, with the first technique, the appearance of an object in an image varies greatly with the object's orientation, position, illumination conditions, and so on, so the target object cannot be detected in a captured image affected by such variation unless reference images covering all of these variations are prepared in advance. Preparing reference images for every variation results in an enormous amount of data, a heavier processing load, and a very long processing time. With the second technique, when the object's orientation, position, or illumination conditions change while feature points are being searched for on the image, a feature point may fail to be extracted, or a wrong point may be extracted as a feature point. In addition, because the feature points are searched for individually, their localization accuracy may degrade.

An object of the present invention is therefore to provide an image processing apparatus and an image processing method that can detect information about a target object from a captured image with high accuracy using a small amount of data.

An image processing apparatus according to the present invention comprises: imaging means; information holding means for holding templates, each consisting of evaluation information and three-dimensional position information, for a plurality of feature points on a two-dimensional image of a target object; estimated value generating means for generating a plurality of estimated values of the position and/or orientation of the target object; fitness calculating means for calculating the fitness of each of the estimated values generated by the estimated value generating means, based on the captured image captured by the imaging means and on the templates for the plurality of feature points held by the information holding means; and determining means for determining the presence or absence, the position, and/or the orientation of the target object in the captured image, based on the fitness of the estimated values calculated by the fitness calculating means.

In this image processing apparatus, the information holding means holds templates for a plurality of feature points on a two-dimensional image of the target object. Each template consists of evaluation information for evaluating, on a two-dimensional image, how likely a point is to be that feature point (for example, a luminance image (luminance pattern), a luminance histogram, an edge image, or frequency characteristics obtained by Fourier-transforming the luminance image) and three-dimensional position information indicating the three-dimensional positional relationship among the feature points (for example, three-dimensional coordinates in a three-dimensional model). The imaging means captures a two-dimensional image. The estimated value generating means generates a plurality of estimated values of the position and/or orientation of the target object relative to the imaging means. For each estimated value, the fitness calculating means evaluates the likelihood of the feature points, taking their three-dimensional positional relationship into account, based on the captured image and the templates, and from these evaluations calculates the degree to which the estimated value fits the position and/or orientation of the target object in the captured image. In other words, by evaluating the likelihood of each individual feature point while preserving the three-dimensional positional relationship among the feature points, the apparatus determines whether the position and orientation of the estimated value match the actual position and orientation. The determining means then determines the presence or absence, the position, and/or the orientation of the target object in the captured image based on the fitness of the estimated values. The higher the fitness, the better the position and/or orientation of the estimated value fits that of the target object in the captured image, so using an estimated value with high fitness makes it possible to determine whether the target object is present in the captured image and also to obtain its position and/or orientation. Because the apparatus holds evaluation information and three-dimensional position information for each feature point and processes the individual feature points rather than the whole image, it can operate on a small amount of data, with a light processing load and a short processing time. Moreover, because the apparatus generates various estimated values of position and orientation and obtains the fitness of each, it is robust against variations in the target object and can determine the presence or absence, position, and orientation of the target object in the captured image with high accuracy.
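As an illustration of how little data such a template set needs, the following is a minimal sketch of one possible in-memory layout. The class names, field names, coordinates, and the choice of a small luminance patch as evaluation information are all assumptions made for this example; the patent text does not prescribe a concrete data layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FeatureTemplate:
    """Template for one feature point (hypothetical names)."""
    name: str
    position_3d: Tuple[float, float, float]  # 3D position information (model coordinates)
    luminance_patch: List[List[float]]       # evaluation information: a luminance pattern


@dataclass(frozen=True)
class PoseEstimate:
    """One estimated position and orientation of the object relative to the camera."""
    translation: Tuple[float, float, float]  # x, y, z
    rotation: Tuple[float, float, float]     # rotation angles about x, y, z in radians


# A toy template set: a few feature points instead of many full reference images.
face_templates = [
    FeatureTemplate("left_eye_corner", (-30.0, 20.0, 0.0), [[0.2, 0.8], [0.3, 0.9]]),
    FeatureTemplate("right_eye_corner", (30.0, 20.0, 0.0), [[0.8, 0.2], [0.9, 0.3]]),
    FeatureTemplate("nose_tip", (0.0, 0.0, 25.0), [[0.5, 0.6], [0.6, 0.7]]),
]
```

Each pose hypothesis is then just six numbers, and each feature point a 3D coordinate plus a small patch, which is consistent with the small-data argument made in the paragraph above.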

In the above image processing apparatus, the estimated value generating means may be configured to generate the estimated values based on at least one of: a value relating to the position and/or orientation of the target object in a captured image captured in the past by the imaging means; a value relating to the positions and/or orientations that the target object can structurally take; and a history of the position and/or orientation of the target object.

The estimated value generating means of this image processing apparatus generates the estimated values based on at least one of: a value relating to the position and/or orientation of the target object in a previously captured image (for example, the captured image of the previous frame); a value relating to the positions and/or orientations that the target object can structurally take; and a history of the position and/or orientation of the target object. That is, rather than generating estimated values at random, it generates them with reference to positions and orientations that the target object is likely to take. By narrowing the range over which estimated values are generated in this way, the apparatus avoids generating useless estimated values and reduces the processing load.

Here, the value relating to the position and/or orientation of the target object in a previously captured image is the position or orientation itself, or another value representing it, determined from the captured images obtained successively during imaging; for example, the position and orientation of the target object determined from the captured image of the previous frame. The value relating to the positions and/or orientations that the target object can structurally take is the range of positions and orientations in which the target object can move, as physically determined by the environment in which the target object is placed, the positional relationship between the target object and the imaging means, and so on. The history of the position and/or orientation of the target object is an accumulation of the positions and orientations the target object has taken in the past; it shows the tendencies of each individual target object and therefore indicates the positions and orientations that the object is likely to take.
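The idea of restricting candidates to a neighbourhood of the previous result, clipped to the structurally possible range, can be sketched for a single pose parameter as follows. The function name, the one-dimensional parameter (for example a yaw angle in degrees), and the regular grid are assumptions for illustration only.

```python
import math


def generate_estimates(previous, limits, half_width, step):
    """Candidate values around `previous`, clipped to the structural range `limits`.

    previous   -- value determined from the previous frame
    limits     -- (low, high) range the object can physically take
    half_width -- how far from `previous` candidates may lie
    step       -- spacing between neighbouring candidates
    """
    low, high = limits
    start = max(low, previous - half_width)
    stop = min(high, previous + half_width)
    count = int(math.floor((stop - start) / step + 1e-9))
    return [start + i * step for i in range(count + 1)]


# Previous yaw was 10 degrees; the head can turn between -45 and +45 degrees.
candidates = generate_estimates(10.0, (-45.0, 45.0), 10.0, 5.0)
```

Here only five candidates are evaluated instead of a grid over the full range, and candidates outside the structural limits are never produced.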

In the above image processing apparatus, the estimated value generating means may be configured to vary the density of the plurality of estimated values it generates, based on at least one of a value relating to the position and/or orientation of the target object in a captured image captured in the past by the imaging means and a history of the position and/or orientation of the target object.

The estimated value generating means of this image processing apparatus varies the spacing between the estimated values it generates, based on at least one of a value relating to the position and/or orientation of the target object in a previously captured image and a history of the position and/or orientation of the target object. That is, instead of spacing the positions and orientations uniformly when generating the estimated values, it narrows the spacing (raises the density) near positions and orientations that the target object is likely to take and widens the spacing (lowers the density) near those it is unlikely to take. By varying the density of the estimated values in this way, the apparatus can reduce the number of estimated values and thus the processing load.
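One simple way to realise non-uniform density, shown here for a single pose parameter, is to merge a coarse grid over the whole range with a fine grid around the most likely value. This is only a sketch under that assumption; the function and parameter names are hypothetical, and the text does not say how the spacing is actually chosen.

```python
def variable_density_estimates(likely, limits, fine_step, coarse_step, fine_radius):
    """Coarse candidates over `limits`, plus fine candidates within `fine_radius` of `likely`."""
    low, high = limits
    values = set()

    # Coarse grid covering the whole admissible range.
    coarse_count = int((high - low) / coarse_step + 1e-9)
    for i in range(coarse_count + 1):
        values.add(round(low + i * coarse_step, 6))

    # Fine grid centred on the value the object is most likely to take.
    fine_count = int(fine_radius / fine_step + 1e-9)
    for i in range(-fine_count, fine_count + 1):
        value = likely + i * fine_step
        if low <= value <= high:
            values.add(round(value, 6))

    return sorted(values)


candidates = variable_density_estimates(0.0, (-40.0, 40.0), 5.0, 20.0, 10.0)
```

For this toy range, the merged grid has 9 candidates, whereas a uniform grid at the fine spacing would have 17.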

In the above image processing apparatus, the fitness calculating means may be configured to transform the three-dimensional position of each feature point of the template according to each estimated value generated by the estimated value generating means, project the transformed three-dimensional position onto the captured image captured by the imaging means, evaluate the likelihood of the feature point at the projection position based on the evaluation information of the template and the information around the projection position in the captured image, and calculate the fitness based on the resulting evaluation values.

For each estimated value, the fitness calculating means of this image processing apparatus transforms the three-dimensional position of each feature point of the template according to the position and/or orientation of the estimated value, and projects each transformed three-dimensional position onto the two-dimensional captured image. That is, the reference three-dimensional position of each feature point is moved according to the position and orientation of the estimated value, and the two-dimensional position on the captured image of the feature point at that three-dimensional position is obtained. The fitness calculating means then evaluates the likelihood of each feature point based on the evaluation information of the template and the information around the projection position in the captured image, and calculates the fitness from the evaluation values of the plurality of feature points. In other words, at projection positions that preserve the three-dimensional positional relationship of the feature points for the estimated position and orientation, it evaluates how closely the image information around each projection position resembles the evaluation information of the corresponding feature point, and thereby determines how well the estimated value fits the position and orientation of the target object in the captured image.
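The transform, project, and evaluate sequence can be sketched as below. Several simplifications are assumptions of this example and not taken from the text: orientation is reduced to a single yaw angle, the camera is an ideal pinhole with focal length `focal` and principal point `center`, the evaluation information is a single expected luminance value per feature point, and the fitness is the mean evaluation value.

```python
import math


def transform_point(point, yaw, translation):
    """Rotate a model point about the y axis by `yaw` (radians), then translate it."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return (c * x + s * z + tx, y + ty, -s * x + c * z + tz)


def project(point, focal, center):
    """Pinhole projection to integer pixel coordinates (u, v); assumes z > 0."""
    x, y, z = point
    return (round(center[0] + focal * x / z), round(center[1] + focal * y / z))


def evaluate(image, u, v, expected):
    """Similarity in [0, 1] between the luminance at (u, v) and the template value."""
    if not (0 <= v < len(image) and 0 <= u < len(image[0])):
        return 0.0  # feature point projects outside the image
    return 1.0 - abs(image[v][u] - expected)


def fitness(image, templates, yaw, translation, focal, center):
    """Mean evaluation value over all feature points for one pose estimate."""
    scores = []
    for position_3d, expected in templates:
        u, v = project(transform_point(position_3d, yaw, translation), focal, center)
        scores.append(evaluate(image, u, v, expected))
    return sum(scores) / len(scores)


# Toy 5x5 luminance image with two bright feature points on row 2.
image = [[0.0] * 5 for _ in range(5)]
image[2][1] = 1.0
image[2][3] = 1.0
templates = [((-1.0, 0.0, 0.0), 1.0), ((1.0, 0.0, 0.0), 1.0)]

good = fitness(image, templates, 0.0, (0.0, 0.0, 10.0), 10.0, (2, 2))
bad = fitness(image, templates, 0.0, (1.0, 0.0, 10.0), 10.0, (2, 2))
```

Because both feature points are moved by the same pose before projection, a wrong pose shifts them together and lowers the fitness, which is the point of keeping the three-dimensional relationship fixed.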

In the above image processing apparatus, the fitness calculating means converts the information around the projection position in the captured image into the same physical quantity as the evaluation information of the template.

When evaluating, the fitness calculating means of this image processing apparatus converts the image information around the projection position into the same physical quantity as the evaluation information of the template and then obtains the evaluation value. For example, when the evaluation information is a luminance pattern, the information around the projection position obtained from the captured image is converted into a luminance pattern.
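As a sketch of this conversion for a different kind of evaluation information, suppose the template stores a luminance histogram. The image patch around the projection position is then converted into a histogram with the same bins before comparison. The bin count, the [0, 1] luminance range, and the use of histogram intersection as the similarity measure are assumptions of this example.

```python
def luminance_histogram(patch, bins=4):
    """Normalised histogram of luminance values in [0, 1] over a 2D patch."""
    counts = [0] * bins
    total = 0
    for row in patch:
        for value in row:
            index = min(int(value * bins), bins - 1)  # value 1.0 goes in the last bin
            counts[index] += 1
            total += 1
    return [count / total for count in counts]


def histogram_similarity(first, second):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(first, second))


template_histogram = [0.5, 0.0, 0.0, 0.5]          # stored evaluation information
patch = [[0.0, 0.1], [0.9, 1.0]]                   # pixels around the projection position
score = histogram_similarity(luminance_histogram(patch), template_histogram)
```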

In the above image processing apparatus, the fitness calculating means may be configured to set the evaluation value of the likelihood of a feature point at a projection position to a constant value when that evaluation value is at or below a predetermined value.

The fitness calculating means of this image processing apparatus determines whether the evaluation value of the likelihood of the feature point at each projection position is at or below a predetermined value, and if so sets the evaluation value to a constant value. By removing poor evaluation values at or below the predetermined value in this way, the apparatus prevents evaluation values that have been lowered by noise or the like from affecting the fitness.
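A minimal sketch of this rule follows; the function name and the particular threshold and constant are illustrative assumptions.

```python
def floor_low_scores(scores, threshold, constant):
    """Replace every evaluation value at or below `threshold` with `constant`."""
    return [constant if score <= threshold else score for score in scores]


# Two feature points scored very poorly (for example because of noise or occlusion).
floored = floor_low_scores([0.9, 0.05, 0.7, 0.2], threshold=0.2, constant=0.2)
```

With the constant equal to the threshold, as here, a badly corrupted feature point contributes no less than a merely poor one, so a single outlier cannot dominate the fitness.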

In the above image processing apparatus, the fitness calculating means may be configured to generate, each time the target object in the captured image changes, a data structure consisting of the evaluation values at each projection position for each feature point.

When the target object in the captured image changes (for example, when the captured image advances to the next frame), the fitness calculating means of this image processing apparatus generates a data structure consisting of the evaluation values at each projection position on the image for each feature point. With such a data structure, while evaluation values are being obtained one after another for the plurality of estimated values, if the evaluation value for the same projection position is already stored in the data structure of a given feature point (that is, if the feature point is projected onto the same position again), the evaluation value for that projection position need not be computed again. The apparatus therefore never computes the evaluation value for the same projection position of a feature point twice, which reduces the processing load.
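Such a per-frame data structure behaves like a lookup table keyed by feature point and projection position. The sketch below uses a dictionary; the class name, the `computed` counter, and the callback signature are assumptions for illustration.

```python
class EvaluationCache:
    """Per-frame table of evaluation values keyed by (feature id, u, v)."""

    def __init__(self, evaluate):
        self._evaluate = evaluate  # function (feature_id, u, v) -> evaluation value
        self._table = {}
        self.computed = 0          # number of real evaluations, for illustration

    def new_frame(self):
        """Discard stored values when the captured image changes."""
        self._table.clear()

    def score(self, feature_id, u, v):
        key = (feature_id, u, v)
        if key not in self._table:
            self._table[key] = self._evaluate(feature_id, u, v)
            self.computed += 1
        return self._table[key]


cache = EvaluationCache(lambda feature_id, u, v: u + v)  # dummy evaluation function
first = cache.score(0, 1, 2)    # computed
second = cache.score(0, 1, 2)   # same projection position: served from the table
```

Different pose estimates often project a feature point onto the same pixel, so the saving grows with the number of estimates evaluated per frame.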

In the above image processing apparatus, the fitness calculating means may be configured to calculate the fitness using those feature points, among the plurality of feature points, that have high evaluation values.

For each estimated value, the fitness calculating means of this image processing apparatus obtains an evaluation value for each of the plurality of feature points and calculates the fitness using only the high evaluation values among them. By using only high evaluation values, the apparatus prevents evaluation values lowered by noise or the like, and evaluation values of feature points that are not projected onto the captured image at the estimated position or orientation, from affecting the fitness, and can thus obtain a highly accurate fitness.
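One way to read "only the high evaluation values" is to keep a fixed number of the best-scoring feature points and average them. The fixed count `keep` and the use of the mean are assumptions of this sketch; the text does not state how the high values are selected.

```python
def fitness_from_best(scores, keep):
    """Mean of the `keep` highest evaluation values for one pose estimate."""
    best = sorted(scores, reverse=True)[:keep]
    return sum(best) / len(best)


# Two of four feature points are hidden or noisy and score near zero.
scores = [0.9, 0.8, 0.1, 0.0]
robust = fitness_from_best(scores, keep=2)
plain = sum(scores) / len(scores)
```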

In the above image processing apparatus, the fitness calculating means may be configured to calculate a statistic of the evaluation values of the plurality of feature points and use that statistic as the fitness.

For each estimated value, the fitness calculating means of this image processing apparatus calculates a statistic (for example, the sum or the mean) of the evaluation values of the plurality of feature points and uses that statistic as the fitness. Because a statistic of the evaluation values of the feature points is used, the fitness varies with the overall similarity. This increases robustness to changes in appearance caused by the object's orientation, position, illumination conditions, and so on, and suppresses convergence to a locally plausible but wrong position or orientation.

In the above image processing apparatus, the determining means may be configured to determine that the target object is present in the captured image when at least one of the following conditions is satisfied: the maximum fitness is at or above a predetermined value; the number of fitness values at or above a predetermined value is at or above a predetermined number; or the number of fitness values within a predetermined range of the maximum fitness is at or above a predetermined number.

The determining means of this image processing apparatus determines that the target object is present in the captured image when, among the fitness values of the plurality of estimated values, any of the following conditions is satisfied: the maximum fitness is at or above a predetermined value; the number of fitness values at or above a predetermined value is at or above a predetermined number; or the number of fitness values within a predetermined range of the maximum fitness (for example, fitness values that are at least 90% of the maximum fitness) is at or above a predetermined number. When the fitness is high, the corresponding feature points can be presumed to exist at the respective projection positions in the captured image, so the target object can be judged to be present in the captured image.
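The three conditions can be written down literally as below. All parameter names and threshold values are hypothetical; the sketch simply ORs the conditions as the text describes, without judging which combination works best in practice.

```python
def object_present(fitnesses, min_best, min_value, min_count, ratio, min_near_best):
    """True if at least one of the three presence conditions holds."""
    if not fitnesses:
        return False
    best = max(fitnesses)
    best_is_high = best >= min_best
    enough_high = sum(f >= min_value for f in fitnesses) >= min_count
    enough_near_best = sum(f >= ratio * best for f in fitnesses) >= min_near_best
    return best_is_high or enough_high or enough_near_best


present = object_present([0.95, 0.3, 0.2], 0.9, 0.8, 2, 0.9, 2)
absent = object_present([0.5, 0.3, 0.2], 0.9, 0.8, 2, 0.9, 2)
```

Note that the third condition counts the maximum itself, so `min_near_best` should be greater than 1 for it to carry any information.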

The above image processing apparatus may be configured to store the estimated values of the position and/or orientation of the target object together with the fitness of each estimated value.

This image processing apparatus stores the estimated values of the position and/or orientation of the target object together with the fitness of each estimated value. By continuing to store this information, a history of the position and orientation of the target object can be accumulated from the estimated values with high fitness. The stored information therefore reveals the positions and orientations that the target object is likely to take, and the estimated value generating means can use it to generate estimated values.
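A minimal sketch of such a store, for a single pose parameter, is given below. Keeping only estimates above a fitness threshold and summarising the history by a fitness-weighted mean are assumptions of this example, as are all names; the text only says that estimates and their fitness are stored and later used.

```python
class PoseHistory:
    """Accumulates (estimate, fitness) pairs whose fitness is high enough."""

    def __init__(self, min_fitness):
        self.min_fitness = min_fitness
        self.records = []

    def store(self, estimate, fitness):
        if fitness >= self.min_fitness:
            self.records.append((estimate, fitness))

    def likely_value(self):
        """Fitness-weighted mean of stored estimates, or None if nothing is stored."""
        total = sum(fitness for _, fitness in self.records)
        if total == 0:
            return None
        return sum(estimate * fitness for estimate, fitness in self.records) / total


history = PoseHistory(min_fitness=0.5)
history.store(10.0, 1.0)
history.store(20.0, 1.0)
history.store(90.0, 0.1)   # poor fit: not added to the history
```

The value returned by `likely_value()` could then serve as the centre around which new estimates are generated.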

In the above image processing apparatus, when the target object is the iris (the dark part of the eye), or the iris and the sclera (the white of the eye), the information holding means holds an iris model as the template when the target object is the iris, and holds an iris model and a sclera model as the templates when the target object is the iris and the sclera.

In the above image processing apparatus, when the target object is the iris, or the iris and the sclera, the fitness calculating means calculates, as the fitness of an estimated value, the mean luminance of the iris region at the projection position in the captured image when the target object is the iris, and calculates the mean luminance of the iris region and the mean luminance of the sclera region at the projection position in the captured image when the target object is the iris and the sclera.
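The mean-luminance computation for a projected iris region can be sketched as follows. Modelling the projected iris as a small disk of pixels is an assumption of this example, and so is the interpretation that a lower mean luminance indicates a better fit for the iris; this section does not state how the mean is turned into a ranking.

```python
def disk_pixels(cx, cy, radius):
    """Integer pixel coordinates (u, v) inside a disk; a stand-in for a projected iris model."""
    return [(u, v)
            for v in range(cy - radius, cy + radius + 1)
            for u in range(cx - radius, cx + radius + 1)
            if (u - cx) ** 2 + (v - cy) ** 2 <= radius ** 2]


def mean_luminance(image, pixels):
    """Mean luminance over the pixels that fall inside the image, or None if none do."""
    values = [image[v][u] for u, v in pixels
              if 0 <= v < len(image) and 0 <= u < len(image[0])]
    return sum(values) / len(values) if values else None


# Toy 5x5 image: bright everywhere (1.0) with a dark iris of radius 1 centred at (2, 2).
image = [[1.0] * 5 for _ in range(5)]
for u, v in disk_pixels(2, 2, 1):
    image[v][u] = 0.0

on_target = mean_luminance(image, disk_pixels(2, 2, 1))   # estimate at the true centre
off_target = mean_luminance(image, disk_pixels(1, 2, 1))  # estimate shifted by one pixel
```

Under the stated interpretation, the estimate centred on the true iris gives the darkest region and would be preferred.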

An image processing method according to the present invention comprises: an imaging step; an estimated value generating step of generating a plurality of estimated values of the position and/or orientation of a target object; a fitness calculating step of calculating the fitness of each of the estimated values generated in the estimated value generating step, based on the captured image captured in the imaging step and on templates, each consisting of evaluation information and three-dimensional position information, for a plurality of feature points on a two-dimensional image of the target object; and a determining step of determining the presence or absence, the position, and/or the orientation of the target object based on the fitness of the estimated values calculated in the fitness calculating step.

本発明の上記画像処理方法の推定値生成ステップでは、撮像ステップで過去に撮像した撮像画像における対象物体の位置又は／及び姿勢に関する値、対象物体が構造上取りうる位置又は／及び姿勢に関する値、対象物体の位置又は／及び姿勢の履歴の少なくとも１つに基づいて推定値を生成する構成としてもよい。 In the estimated value generation step of the image processing method of the present invention, a value related to the position or / and orientation of the target object in the captured image captured in the past in the imaging step, a value related to the position or / and orientation that the target object can take structurally, The estimated value may be generated based on at least one of the position and / or orientation history of the target object.

本発明の上記画像処理方法の推定値生成ステップでは、撮像ステップで過去に撮像した撮像画像における対象物体の位置又は／及び姿勢に関する値、対象物体の位置又は／及び姿勢の履歴の少なくとも１つに基づいて複数生成する推定値の密度を変える構成としてもよい。 In the estimated value generation step of the image processing method of the present invention, at least one of a value related to the position or / and orientation of the target object and a history of the position or / and orientation of the target object in the captured image captured in the past in the imaging step. It is good also as a structure which changes the density of the estimated value produced | generated based on plural.

本発明の上記画像処理方法の適合度算出ステップでは、推定値生成ステップで生成した各推定値によりテンプレートの各特徴点の三次元位置を変換し、当該変換した三次元位置を撮像手段で撮像された撮像画像に投影し、当該投影位置での特徴点の確からしさをテンプレートの評価情報と撮像画像の投影位置周辺の情報に基づいて評価し、当該評価値に基づいて適合度を算出する構成としてもよい。 In the adaptability calculation step of the image processing method of the present invention, the three-dimensional position of each feature point of the template is converted by each estimated value generated in the estimated value generating step, and the converted three-dimensional position is imaged by the imaging means. Projecting the captured image, evaluating the probability of the feature point at the projection position based on the evaluation information of the template and the information around the projection position of the captured image, and calculating the fitness based on the evaluation value Also good.

本発明の上記画像処理方法の適合度算出ステップでは、撮像画像の投影位置周辺の情報をテンプレートの評価情報と同じ物理量に変換する。 In the fitness calculation step of the image processing method of the present invention, information around the projection position of the captured image is converted into the same physical quantity as the template evaluation information.

本発明の上記画像処理方法の適合度算出ステップでは、投影位置における特徴点の確からしさの評価値が所定値以下の場合には一定値にする構成としてもよい。 The adaptability calculation step of the image processing method of the present invention may be configured to be a constant value when the evaluation value of the probability of the feature point at the projection position is a predetermined value or less.

本発明の上記画像処理方法の適合度算出ステップでは、特徴点毎の各投影位置における評価値からなるデータ構造を撮像画像における対象物体が変化する毎に生成する構成としてもよい。 The fitness calculation step of the image processing method of the present invention may be configured to generate a data structure composed of evaluation values at each projection position for each feature point each time the target object in the captured image changes.

本発明の上記画像処理方法の適合度算出ステップでは、複数の特徴点のうち評価値が高い特徴点を用いて適合度を算出する構成としてもよい。 The fitness calculation step of the image processing method of the present invention may be configured to calculate the fitness using a feature point having a high evaluation value among a plurality of feature points.

In the fitness calculation step of the image processing method of the present invention, a statistic of the evaluation values of the plurality of feature points may be calculated, and that statistic may be used as the fitness.
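The three options for combining per-feature-point evaluation values (setting low values to a constant, keeping only high-scoring points, and taking a statistic) can be illustrated with a minimal Python sketch. The function name `fitness` and the parameters `threshold`, `constant`, and `top_k` are hypothetical, and the mean is only one possible choice of statistic; the text does not prescribe a specific one.

```python
from statistics import mean

def fitness(scores, threshold=None, constant=0.0, top_k=None):
    """Combine per-feature-point evaluation values into one fitness value.

    threshold/constant: values <= threshold are replaced by `constant`
                        (assumed parameters, not named in the source).
    top_k:              if given, only the top_k highest values are used.
    The statistic used here is the arithmetic mean (an assumption).
    """
    values = list(scores)
    if threshold is not None:
        values = [constant if s <= threshold else s for s in values]
    if top_k is not None:
        values = sorted(values, reverse=True)[:top_k]
    return mean(values)

# Example: four feature points, one with a poor (negative) correlation.
scores = [0.9, 0.8, -0.5, 0.1]
clamped = fitness(scores, threshold=0.2, constant=0.0)
best_two = fitness(scores, top_k=2)
```

Clamping keeps a single occluded or mismatched feature point from dominating the result, while `top_k` ignores such points altogether.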

In the determination step of the image processing method of the present invention, the target object may be determined to exist in the captured image when at least one of the following conditions is satisfied: the maximum fitness is equal to or greater than a predetermined value; the number of fitness values equal to or greater than a predetermined value is equal to or greater than a predetermined number; or the number of fitness values within a predetermined range of the maximum fitness is equal to or greater than a predetermined number.
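As a rough illustration of this presence test, the following Python sketch checks the three conditions and reports the object as present when at least one holds. All names and threshold parameters are hypothetical; the source only speaks of "predetermined" values and numbers.

```python
def object_present(fitnesses, max_threshold, count_threshold, min_count,
                   near_max_range, min_near_count):
    """Return True if at least one of the three presence conditions holds."""
    if not fitnesses:
        return False
    best = max(fitnesses)
    # Condition 1: the maximum fitness reaches a predetermined value.
    cond_max = best >= max_threshold
    # Condition 2: enough estimated values have a fitness above a predetermined value.
    cond_count = sum(f >= count_threshold for f in fitnesses) >= min_count
    # Condition 3: enough estimated values lie within a range of the maximum.
    cond_near = sum(f >= best - near_max_range for f in fitnesses) >= min_near_count
    return cond_max or cond_count or cond_near

# Example: one strong hypothesis is enough to satisfy condition 1.
present = object_present([0.2, 0.95, 0.3], max_threshold=0.9,
                         count_threshold=0.5, min_count=3,
                         near_max_range=0.1, min_near_count=3)
```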

In the image processing method of the present invention, the estimated values of the position and/or orientation of the target object and the fitness for each estimated value may be stored.

In the image processing method of the present invention, when the target object is the iris (the dark part of the eye), or the iris and the sclera (the white of the eye), the template is an iris model when the target object is the iris, and an iris model and a sclera model when the target object is the iris and the sclera.

In the image processing method of the present invention, when the target object is the iris, or the iris and the sclera, the fitness calculation step calculates, as the fitness of an estimated value, the average luminance of the iris region at the projected position in the captured image when the target object is the iris, and the average luminance of the iris region and the average luminance of the sclera region at the projected position in the captured image when the target object is the iris and the sclera.

The image processing methods described above provide the same operation and effects as the corresponding image processing apparatuses described above.

According to the present invention, the presence or absence, position, and/or orientation of a target object can be detected from a captured image with high accuracy using a small amount of data.

Embodiments of an image processing apparatus and an image processing method according to the present invention are described below with reference to the drawings.

In the present embodiments, the invention is applied to an image processing apparatus whose target object is a human face or a human eyeball. There are three embodiments: the first estimates the position and orientation of the face, the second determines whether a face is present, and the third estimates the orientation of the eyeball. Each embodiment comprises an image processing apparatus that performs processing in advance to obtain the information to be held, and an image processing apparatus that determines the presence or absence, position, and orientation of the target object. The target person is, for example, the driver of a vehicle such as an automobile, and the apparatus is used to estimate the position and orientation of the driver's face, the driver's line of sight, and the like.

The first embodiment is described with reference to FIGS. 1 to 16. FIG. 1 is a configuration diagram of the image processing apparatus for modeling processing according to the first and second embodiments. FIG. 2 is a configuration diagram of the image processing apparatus for face position/orientation estimation processing according to the first embodiment. FIG. 3 is an example of a captured image of a face captured by the first camera of FIG. 1. FIG. 4 is an image showing the feature points extracted from the captured image of FIG. 3. FIG. 5 is an example of a three-dimensional model including points corresponding to the feature points. FIG. 6 is a diagram showing the three-dimensional position of each feature point of FIG. 3. FIG. 7 is an example of captured images of a face captured by the first camera and the second camera of FIG. 1, respectively. FIG. 8 is an image showing the feature points extracted from each of the two captured images of FIG. 7. FIG. 9 is an example of a captured image of a face captured by the camera of FIG. 2. FIG. 10 is an image showing the face region detected from the captured image of FIG. 9. FIG. 11 is an image showing the regions, extracted from the face region of the captured image of FIG. 10, that are similar to the reference luminance patterns. FIG. 12 is an explanatory diagram of the projection of the feature points from their reference three-dimensional positions onto the two-dimensional image according to the estimated values of the face position and orientation in the first stage. FIG. 13 is an example of the normalized correlation value of each feature point in the first stage. FIG. 14 is an explanatory diagram of the projection of the feature points from their reference three-dimensional positions onto the two-dimensional image according to the estimated values of the face position and orientation in the second stage. FIG. 15 is an example of the normalized correlation value of each feature point in the second stage. FIG. 16 is an example of a map consisting of the normalized correlation values at each projected position for each feature point.

The first embodiment comprises an image processing apparatus 1 and an image processing apparatus 11. The image processing apparatus 1 is the image processing apparatus for modeling processing. The image processing apparatus 11 is the image processing apparatus for face position/orientation estimation processing.

The configuration of the image processing apparatus 1 is described first. Before the image processing apparatus 11 performs its processing, the image processing apparatus 1 obtains, as the information to be held by the image processing apparatus 11, the luminance patterns of a plurality of feature points on the face of the person to be recognized and the three-dimensional position of each feature point. For this purpose, the image processing apparatus 1 includes a first camera 2, a second camera 3, and an image processing unit 4. In the image processing unit 4, a feature point extraction unit 4a, a feature point three-dimensional position estimation unit 4b, and an information holding unit 4c are implemented by executing an application program (software) for modeling processing on a computer such as a personal computer. Although the apparatus has two cameras, the first camera 2 and the second camera 3, there are cases where only the first camera 2 can be used and cases where both cameras 2 and 3 can be used; both cases are described below.

The first camera 2 and the second camera 3 form a stereo camera pair arranged in parallel, a predetermined distance apart, facing the imaging target, and simultaneously capture two images from different viewpoints. The cameras 2 and 3 are digital cameras equipped with an image sensor such as a CCD (charge-coupled device), and each transmits its captured image, consisting of digital image data, to the image processing unit 4 as an image signal. The positional relationship between the two cameras 2 and 3 and the internal (intrinsic) and external (extrinsic) parameters of each camera are measurable, and this information is held in advance in the image processing unit 4. Because the image processing unit 4 can operate as long as luminance information is available, the cameras 2 and 3 may be either color or monochrome cameras. For a color camera, the image processing unit 4 converts the color image into a luminance image.

When only the first camera 2 is used, the person to be recognized is asked to face the first camera 2. The first camera 2 thus captures the person's face from directly in front and transmits the image signal to the image processing unit 4. FIG. 3 shows an example of a captured image BI captured by the first camera 2. When both cameras 2 and 3 are used, the person to be recognized is likewise asked to face the first camera 2. The first camera 2 captures the person's face from directly in front and transmits the image signal to the image processing unit 4. The second camera 3 captures the person's face, simultaneously with the first camera 2, from a position shifted slightly to the side of directly in front, and transmits the image signal to the image processing unit 4. FIG. 7 shows an example of a captured image BI1 captured by the first camera 2 and a captured image BI2 captured by the second camera 3.

When only the first camera 2 is used, the feature point extraction unit 4a extracts, from within the face in the captured image BI captured by the first camera 2, three or more points whose luminance is distinctive compared with their surroundings (feature points). The feature points are, for example, the outer corners of both eyes, the inner corners of both eyes, the left and right nostrils, and the left and right ends of the mouth, that is, locations on a boundary with the skin where the luminance clearly differs from that of the surrounding region. The feature point extraction unit 4a further extracts from the captured image BI, as a feature quantity for evaluating how likely a point is to be that feature point, the luminance pattern of a rectangular region centered on each feature point, and uses it as a reference luminance pattern. The feature point extraction unit 4a then stores the luminance pattern of each feature point in the information holding unit 4c. FIG. 4 shows reference luminance patterns IPa, IPb, IPc, IPd, IPe, and IPf for six feature points (the left and right outer eye corners, the left and right inner eye corners, and the left and right ends of the mouth) extracted from the face in the captured image BI captured by the first camera 2. Points selected by the computer operator may also be used as feature points. In addition, if certain points are known empirically to be desirable for the subsequent processing, images around such points may be collected, and feature points may be extracted as the positions whose luminance patterns are similar to them on the basis of their statistical characteristics.

When both cameras 2 and 3 are used, the feature point extraction unit 4a, as in the case where only the first camera 2 is used, extracts three or more feature points from within the face in the captured image BI1 captured by the first camera 2 and extracts the reference luminance pattern of each feature point from the captured image BI1. The feature point extraction unit 4a then stores the luminance pattern of each feature point in the information holding unit 4c. FIG. 8 shows reference luminance patterns IP1a, IP1b, IP1c, IP1d, IP1e, and IP1f for six feature points extracted from the face in the captured image BI1 captured by the first camera 2.

When only the first camera 2 is used, the feature point three-dimensional position estimation unit 4b extracts the points corresponding to the feature points from a three-dimensional model of an average face and reads out the three-dimensional position (three-dimensional coordinates) corresponding to each feature point in the coordinate system of the three-dimensional model. The three-dimensional face model consists of several hundred vertices representing the three-dimensional shape of a face looking straight ahead, and each vertex has three-dimensional coordinates. To generate the three-dimensional model, a range scanner, a stereo camera, or the like is used to acquire the three-dimensional shape (a set of points representing the shape) of the face of a person with an average face while that person is looking straight ahead. FIG. 5 shows an example of a three-dimensional model DM.

Further, for each feature point, the feature point three-dimensional position estimation unit 4b estimates the three-dimensional position in the camera coordinate system of the first camera 2 (that is, the three-dimensional position on the image of the first camera 2) from the relationship between the two-dimensional coordinates in the image of the first camera 2 and the three-dimensional coordinates in the coordinate system of the three-dimensional model. The feature point three-dimensional position estimation unit 4b then stores, for each feature point, the three-dimensional position in the camera coordinate system in the information holding unit 4c in association with the reference luminance pattern. FIG. 6 shows the three-dimensional positions, in the camera coordinate system, of the feature points shown in FIG. 4.

The three-dimensional positions obtained here serve as the reference positions when estimating the position and orientation of the face. Accordingly, in the first embodiment, the position and orientation are expressed as the amount of change in position and orientation relative to these three-dimensional positions. Alternatively, the state in which the coordinate system of the three-dimensional model coincides with the camera coordinate system may be taken as the reference, and the amount of change in position and orientation from that state may be used as the position and orientation.

A specific method for estimating the three-dimensional positions in the camera coordinate system of the first camera 2 is now described. In the following description, the state in which the face of the person to be recognized looks straight at the first camera 2 is taken as the reference three-dimensional position. The two-dimensional coordinates of each feature point on the image captured by the first camera 2, the internal and external parameters of that camera, and the three-dimensional coordinates of each feature point in the three-dimensional model coordinate system are known. Therefore, to obtain the unknown three-dimensional positions in the camera coordinate system, it suffices to estimate the positional relationship between the first camera 2 and the three-dimensional model (that is, between the camera coordinate system and the three-dimensional model coordinate system) for which each feature point on the three-dimensional model is projected onto the position at which the corresponding feature point appears in the image.

One method for performing this estimation is, for example, POSIT (D. DeMenthon and L. S. Davis, "Model-Based Object Pose in 25 Lines of Code", International Journal of Computer Vision, 15, pp. 123-141, June 1995). With this method, the position and orientation in the camera coordinate system of the feature points on the three-dimensional model can be calculated. Using Expression (1), the feature point three-dimensional position estimation unit 4b calculates, for each feature point, the three-dimensional position (X', Y', Z') in the camera coordinate system of the first camera 2 from the three-dimensional position (X, Y, Z) in the three-dimensional model coordinate system.

In Expression (1), t is the position (translation vector) obtained by POSIT, and R is the orientation (rotation matrix) obtained by POSIT. The three-dimensional position (X', Y', Z') is the reference three-dimensional position in the camera coordinate system.
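Expression (1) itself is not reproduced in this text. Given the definitions of R and t above, it is presumably the usual rigid-body transformation from model coordinates to camera coordinates, which would read as follows; this is a reconstruction from the surrounding definitions, not the original figure.

```latex
\begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix}
= R \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} + t
```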

When both cameras 2 and 3 are used, the feature point three-dimensional position estimation unit 4b detects, for each feature point on the captured image BI1 of the first camera 2, the similar position on the captured image BI2 of the second camera 3. For example, the following method is used for this detection. The positional relationship between the cameras 2 and 3 and their internal and external parameters are known. Therefore, the epipolar constraint given by Expression (2) holds for the projected positions m1 and m2 (in homogeneous coordinates) on the respective captured images when the same point of an object in three-dimensional space is captured by the first camera 2 and the second camera 3.

m1 is the projected position of a point in three-dimensional space on the captured image of the first camera 2; it is given by Expression (3) and is known. m2 is the projected position of the same point on the captured image of the second camera 3; it is given by Expression (4) and is unknown. F is the fundamental matrix, given by Expression (5). A1 and A2 are the intrinsic matrices of the cameras 2 and 3, respectively, and are known. T is the matrix given by Expression (6), where (t1, t2, t3) are the elements of the translation vector. R is the rotation matrix, expressed in the camera coordinate system of the first camera 2, from the camera coordinate system of the first camera 2 to the camera coordinate system of the second camera 3. Since T (position) and R (orientation) in Expression (5) have been obtained as external parameters, the fundamental matrix F can be calculated from Expression (5).
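Expressions (2) to (6) are not reproduced in this text. A plausible reconstruction in the standard form of two-view geometry, consistent with the symbols defined above, is given below. The pixel coordinate names u_1, v_1, u_2, v_2 are assumptions introduced for illustration, and the exact ordering and transposition of factors in the original expressions may differ by convention.

```latex
\begin{aligned}
m_2^{\top} F \, m_1 &= 0 && \text{(2)} \\
m_1 &= (u_1,\; v_1,\; 1)^{\top} && \text{(3)} \\
m_2 &= (u_2,\; v_2,\; 1)^{\top} && \text{(4)} \\
F &= A_2^{-\top} \, T \, R \, A_1^{-1} && \text{(5)} \\
T &= \begin{pmatrix} 0 & -t_3 & t_2 \\ t_3 & 0 & -t_1 \\ -t_2 & t_1 & 0 \end{pmatrix} && \text{(6)}
\end{aligned}
```

Under this form, for a known m1 the product F m1 defines a line in the second image (the epipolar line) on which m2 must lie, which is what restricts the search region for the corresponding point.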

The feature point three-dimensional position estimation unit 4b therefore uses m1, the two-dimensional coordinates of each feature point extracted from the captured image of the first camera 2, to calculate the corresponding position on the captured image of the second camera 3 by Expression (2). Then, within the region of the captured image of the second camera 3 in which each corresponding position exists, the feature point three-dimensional position estimation unit 4b searches for the position (feature point) that is similar to each reference luminance pattern extracted from the captured image of the first camera 2. For each feature point, the feature point three-dimensional position estimation unit 4b then extracts from the captured image of the second camera 3, as a reference luminance pattern, the luminance pattern of a rectangular region centered on the similar position found by the search, and stores that reference luminance pattern in the information holding unit 4c alongside the one extracted from the captured image of the first camera 2. FIG. 8 shows reference luminance patterns IP2a, IP2b, IP2c, IP2d, IP2e, and IP2f extracted from the captured image BI2 captured by the second camera 3. Template matching, for example, is used as the search method. Template matching is expressed, for example, by Expression (7), which gives the similarity norm(x, y) at the coordinates (x, y) on the captured image of the second camera 3.

f(x, y) is the luminance value at the coordinates (x, y) on the captured image of the second camera 3. g(u, v) is the luminance value at the coordinates (u, v) of the reference luminance pattern extracted from the captured image of the first camera 2. f_ave is the average luminance of the region of the captured image of the second camera 3 that is centered on the coordinates (x, y) and has the same size as the reference luminance pattern. g_ave is the average luminance of the reference luminance pattern. V is the set of possible vertical coordinates of the reference luminance pattern. U is the set of possible horizontal coordinates of the reference luminance pattern.
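Expression (7) is not reproduced in this text. Based on the symbol definitions above and the later references to normalized correlation values, it is presumably a zero-mean normalized cross-correlation. The following pure-Python sketch implements that standard form under this assumption; the function names `norm_similarity` and `best_match` and the list-of-rows image representation are hypothetical.

```python
import math

def norm_similarity(image, pattern, x, y):
    """Zero-mean normalized correlation between `pattern` and the patch of
    `image` centered at column x, row y. Images are lists of rows of luminance
    values. Returns a value in [-1, 1], or 0.0 if either region is flat."""
    ph, pw = len(pattern), len(pattern[0])
    top, left = y - ph // 2, x - pw // 2
    patch = [row[left:left + pw] for row in image[top:top + ph]]
    n = ph * pw
    f_ave = sum(map(sum, patch)) / n      # mean luminance of the image patch
    g_ave = sum(map(sum, pattern)) / n    # mean luminance of the reference pattern
    num = sum_f = sum_g = 0.0
    for v in range(ph):
        for u in range(pw):
            df = patch[v][u] - f_ave
            dg = pattern[v][u] - g_ave
            num += df * dg
            sum_f += df * df
            sum_g += dg * dg
    denom = math.sqrt(sum_f * sum_g)
    return num / denom if denom > 0 else 0.0

def best_match(image, pattern):
    """Search every center where the pattern fits entirely inside the image."""
    ph, pw = len(pattern), len(pattern[0])
    best_score, best_pos = -2.0, None
    for y in range(ph // 2, len(image) - (ph - 1 - ph // 2)):
        for x in range(pw // 2, len(image[0]) - (pw - 1 - pw // 2)):
            score = norm_similarity(image, pattern, x, y)
            if score > best_score:
                best_score, best_pos = score, (x, y)
    return best_score, best_pos

# Example: the pattern appears in the image with a different gain and offset.
pattern = [[10, 20, 10], [20, 90, 20], [10, 20, 10]]
image = [[50] * 6 for _ in range(6)]
for v in range(3):
    for u in range(3):
        image[2 + v][1 + u] = 2 * pattern[v][u] + 5
score, pos = best_match(image, pattern)
```

Because both regions are mean-subtracted and normalized, the score is insensitive to uniform changes in brightness and contrast, which is why the scaled and offset copy in the example still matches. In practice the search would be limited to the region indicated by the epipolar constraint rather than the whole image.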

Further, the feature point three-dimensional position estimation unit 4b uses the two-dimensional coordinates of the feature points extracted from the captured image of the first camera 2 and the two-dimensional coordinates of the feature points found in the captured image of the second camera 3 to calculate, by the principle of triangulation, the three-dimensional position (reference three-dimensional position) of each feature point in the camera coordinate system of the first camera 2. The feature point three-dimensional position estimation unit 4b then stores, for each feature point, the three-dimensional position in the camera coordinate system in the information holding unit 4c in association with the reference luminance pattern.
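The triangulation step can be illustrated for the simplest possible case. The sketch below assumes an ideal rectified stereo pair: identical cameras with parallel optical axes, a purely horizontal baseline, and the first camera on the left. The text describes the general case with arbitrary R and T, so this is a simplification, and all names (`triangulate_rectified`, `focal_px`, `baseline`, `cx`, `cy`) are hypothetical.

```python
def triangulate_rectified(u1, v1, u2, focal_px, baseline, cx, cy):
    """Return (X, Y, Z) in the first camera's coordinate system.

    u1, v1:   pixel coordinates of the feature point in the first (left) image
    u2:       horizontal pixel coordinate of the matched point in the second image
    focal_px: focal length in pixels (assumed equal for both cameras)
    baseline: distance between the camera centers, in the desired output unit
    cx, cy:   principal point in pixels
    """
    disparity = u1 - u2
    if disparity <= 0:
        raise ValueError("non-positive disparity: point is not in front of both cameras")
    z = focal_px * baseline / disparity   # depth from similar triangles
    x = (u1 - cx) * z / focal_px          # back-project the pixel at depth z
    y = (v1 - cy) * z / focal_px
    return (x, y, z)

# Example: 40 pixels of disparity with a 0.1 m baseline and 800 px focal length.
X, Y, Z = triangulate_rectified(400, 300, 360, focal_px=800.0, baseline=0.1,
                                cx=320.0, cy=240.0)
```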

The information holding unit 4c is implemented in a predetermined memory area and holds, as a template, the reference luminance patterns and the three-dimensional positions in the camera coordinate system of the first camera 2 for three or more feature points of the recognition target. When only the first camera 2 is used, only the reference luminance patterns obtained from the captured image of the first camera 2 are held; when both cameras 2 and 3 are used, the reference luminance patterns obtained from the captured images of each of the cameras 2 and 3 are held.

The configuration of the image processing apparatus 11 is described next. The image processing apparatus 11 holds the template for each feature point generated by the image processing apparatus 1 and uses the template to estimate the position and orientation of the face of the person to be recognized in the captured image. To do so, for the captured image of each frame, the image processing apparatus 11 sets a large number of estimated values of the position and orientation, calculates the degree to which each estimated value fits the position and orientation of the face in the captured image, and estimates the position and orientation based on that fitness. In particular, to estimate the position and orientation with high accuracy, the image processing apparatus 11 performs a first stage of estimated value generation and fitness calculation and then, based on the estimated values (position and orientation) narrowed down in the first stage, performs a second stage of estimated value generation and fitness calculation. For this purpose, the image processing apparatus 11 includes a camera 12 and an image processing unit 13. In the image processing unit 13, an information holding unit 13a, a storage analysis unit 13b, an estimated value generation unit 13c, a fitness calculation unit 13d, and a face position/orientation output unit 13e are implemented by executing an application program for face position/orientation estimation processing on a computer. This computer may be the same computer as that of the image processing apparatus 1 or a different one; if it is the same computer, the information holding unit may be shared.

In the first embodiment, the camera 12 corresponds to the imaging means recited in the claims, the information holding unit 13a corresponds to the information holding means recited in the claims, the estimated value generation unit 13c corresponds to the estimated value generation means recited in the claims, the fitness calculation unit 13d corresponds to the fitness calculation means recited in the claims, and the face position/orientation output unit 13e corresponds to the determination means recited in the claims.

The camera 12 is a digital camera equipped with an image sensor such as a CCD, and transmits its captured image, consisting of digital image data, to the image processing unit 13 as an image signal. The camera 12 captures images continuously over time and outputs successive captured image (moving image) data at fixed time intervals (for example, every 1/30 second). The internal and external parameters of the camera 12 are measurable, and this information is held in advance in the image processing unit 13. Because the image processing unit 13 can operate as long as luminance information is available, the camera 12 may be either a color or a monochrome camera. For a color camera, the image processing unit 13 converts the color image into a luminance image. When the image processing apparatus 11 is mounted in an automobile or the like, the camera 12 is placed in the passenger compartment at a position from which it can capture the face of the driver sitting in the driver's seat from directly in front.

The information holding unit 13a is implemented in a predetermined memory area and holds, as a template, the reference luminance patterns and the three-dimensional positions in the camera coordinate system, obtained by the image processing apparatus 1, for a plurality of feature points of the face to be recognized.

The storage analysis unit 13b is implemented in a predetermined memory area. It stores the estimated values of position and orientation generated by the estimated value generation unit 13c in association with the fitness calculated for each estimated value by the fitness calculation unit 13d, and also stores the estimated position and orientation of the face in the captured image of each frame output from the face position/orientation output unit 13e. The storage analysis unit 13b thereby accumulates, as a history, the face positions and orientations that the person to be recognized has taken in the past. From this history, the storage analysis unit 13b further identifies the combinations of position and orientation that can occur in the six-dimensional space defined by the six position and orientation parameters. Also from this history, the storage analysis unit 13b sets normal distributions that peak near the positions and orientations that occur frequently, and superimposes a plurality of these normal distributions to generate a probability density function. The storage analysis unit 13b may accumulate the history using the face positions and orientations output from the face position/orientation output unit 13e, or using those estimated values, among the many estimated values and their fitness values, whose fitness exceeds a threshold.
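The idea of superimposing normal distributions centered on frequently observed poses can be sketched as a weighted mixture. The following Python example is only an illustration under stated assumptions: each normal distribution has a diagonal covariance (independent standard deviations per parameter), and peaks are weighted by how often they were observed. Neither choice is specified in the text, and the names `normal_density` and `pose_density` are hypothetical.

```python
import math

def normal_density(x, mean, sigma):
    """Density of an axis-aligned normal distribution at point x.
    x, mean, sigma are sequences of equal length (6 for a full pose)."""
    density = 1.0
    for xi, mi, si in zip(x, mean, sigma):
        z = (xi - mi) / si
        density *= math.exp(-0.5 * z * z) / (si * math.sqrt(2.0 * math.pi))
    return density

def pose_density(x, peaks):
    """Probability density at pose x for a mixture of normal distributions.
    peaks: list of (weight, mean, sigma); weights are normalized here."""
    total_weight = sum(w for w, _, _ in peaks)
    return sum(w * normal_density(x, m, s) for w, m, s in peaks) / total_weight

# Example with a 1-D "pose" for readability: the driver mostly looks straight
# ahead (0 degrees) and sometimes toward a mirror (30 degrees).
peaks = [(3.0, [0.0], [5.0]), (1.0, [30.0], [5.0])]
p_front = pose_density([0.0], peaks)
p_mirror = pose_density([30.0], peaks)
```

A density of this kind could then be used to draw pose hypotheses more densely around poses the person has frequently taken.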

画像処理部１３では、現フレーム分の画像信号を受信する毎に、その撮像画像を所定のメモリ領域に記憶させる。そして、画像処理部１３では、前フレームの撮像画像において顔の位置と姿勢が推定されているか否かを判定する。これは、現フレームが初期フレームである場合や前フレームで顔を検出できなかった場合には前フレームで顔の位置と姿勢を推定できないので、後段の処理を行うことができない。そこで、現フレームから顔を検出し、おおよその位置と姿勢を推定し、前フレームの位置と姿勢とする。図９には、カメラ１２から入力されるあるフレームの撮像画像ＩＩの一例を示している。 The image processing unit 13 stores the captured image in a predetermined memory area every time an image signal for the current frame is received. Then, the image processing unit 13 determines whether or not the face position and posture are estimated in the captured image of the previous frame. This is because if the current frame is the initial frame or if the face cannot be detected in the previous frame, the position and orientation of the face cannot be estimated in the previous frame, and therefore the subsequent processing cannot be performed. Therefore, a face is detected from the current frame, the approximate position and orientation are estimated, and the position and orientation of the previous frame are set. FIG. 9 shows an example of a captured image II of a certain frame input from the camera 12.

前フレームで顔の位置と姿勢の推定を行っていない場合、画像処理部１３では、現フレームの撮像画像内に顔が存在するか否かの顔検出を行い、顔が存在するときにはその顔の存在する領域を求める。図１０には、撮像画像ＩＩから顔を検出でき、その顔領域ＦＡを示している。この顔の検出方法としては、従来の顔検出処理を用いてもよいし、あるいは、第２の実施の形態における顔有無判定処理を用いてもよい。ここで、顔を検出できない場合、画像処理部１３では、次フレームの撮像画像を待つ。 When the face position and posture were not estimated in the previous frame, the image processing unit 13 performs face detection to determine whether a face exists in the captured image of the current frame and, when a face exists, obtains the region in which it lies. FIG. 10 shows the face area FA for a case in which a face was detected in the captured image II. As the face detection method, a conventional face detection process may be used, or the face presence / absence determination process of the second embodiment may be used. If no face can be detected, the image processing unit 13 waits for the captured image of the next frame.

次に、画像処理部１３では、顔領域ＦＡから各特徴点に対応する点を探索する。この探索手法としては、例えば、テンプレートマッチングを利用し、情報保持部１３ａで保持されている各特徴点の参照輝度パターンを用いて、上記した式（７）により、各参照輝度パターンと類似する位置（特徴点）を探索する。図１１には、撮像画像ＩＩの顔領域から探索した各参照輝度パターンと類似する領域ＲＡａ，ＲＡｂ，ＲＡｃ，ＲＡｄ，ＲＡｅ，ＲＡｆが示され、この各領域の中心が対応点である。ここで、３つ以上の対応点を探索できない場合、後段の処理を行うことができないので、画像処理部１３では、次フレームの撮像画像を待つ。 Next, the image processing unit 13 searches the face area FA for the point corresponding to each feature point. As the search method, for example, template matching is used: with the reference luminance pattern of each feature point held in the information holding unit 13a, a position (feature point) similar to each reference luminance pattern is searched for by Expression (7) above. FIG. 11 shows the regions RAa, RAb, RAc, RAd, RAe, and RAf found in the face area of the captured image II as similar to the respective reference luminance patterns; the center of each region is the corresponding point. If three or more corresponding points cannot be found, the subsequent processing cannot be performed, so the image processing unit 13 waits for the captured image of the next frame.

次に、画像処理部１３では、各特徴点について、探索した対応点の二次元座標と情報保持部１３ａに保持されている基準三次元位置の対応関係から、基準三次元位置に対するカメラ１２のカメラ座標系におけるおおよその位置と姿勢を算出し、現フレームでの位置と姿勢の推定値とする。この推定値が、次フレームでの処理で、前フレームの位置と姿勢の推定値として利用される。この推定手法としては、例えば、ＰＯＳＩＴを用いる。 Next, from the correspondence between the two-dimensional coordinates of the corresponding point found for each feature point and the reference three-dimensional position held in the information holding unit 13a, the image processing unit 13 calculates the approximate position and posture, relative to the reference three-dimensional positions, in the camera coordinate system of the camera 12, and uses them as the estimated position and posture for the current frame. In the processing of the next frame, this estimate serves as the estimated position and posture of the previous frame. As the estimation method, for example, POSIT is used.

前フレームの位置と姿勢が推定されている場合、推定値生成部１３ｃでは、前フレームでの位置と姿勢（参照値）から取りえる現フレームの撮像画像における顔の位置と姿勢の推定値を多数個生成する。取りえる位置、姿勢の範囲は、位置と姿勢の６つのパラメータ毎に、フレーム間の時間内でそれぞれ変化できる各最大値を前フレームの位置と姿勢の６つのパラメータに加算及び減算した範囲となる。ただし、この範囲が認識対象の顔が構造上取りえない範囲を含んでいる場合、構造的に取りえない範囲を除いた範囲とする。つまり、認識対象の人の顔がおかれている環境やその顔とカメラ１２との位置関係などから、顔が動作可能な位置や姿勢の範囲は物理的に決まっているので、その範囲内で推定値が生成される。フレーム間の時間内でそれぞれ変化できる最大値としては、その顔がおかれている環境における顔の位置や姿勢を事前に測定し、その測定から得られた変化の最大値から予め設定してもよいし、あるいは、前フレームと前々フレームとの間での位置と姿勢の変化に基づいて設定してもよい。 When the position and orientation of the previous frame are estimated, the estimated value generation unit 13c provides many estimated values of the position and orientation of the face in the captured image of the current frame that can be obtained from the position and orientation (reference value) in the previous frame. Generate. The range of the position and orientation that can be taken is a range obtained by adding and subtracting each maximum value that can change within the time between frames to the six parameters of the position and orientation of the previous frame for each of the six parameters of position and orientation. . However, if this range includes a range in which the face to be recognized cannot be structurally taken, it is a range excluding a range that cannot be structurally taken. In other words, the range of positions and postures in which the face can operate is physically determined from the environment where the face of the person to be recognized is placed and the positional relationship between the face and the camera 12. An estimate is generated. The maximum value that can be changed within the time between frames is to measure the position and posture of the face in the environment where the face is placed in advance, and set it in advance from the maximum value of the change obtained from the measurement. Alternatively, it may be set based on a change in position and orientation between the previous frame and the previous frame.

このように設定した取りえる位置、姿勢の６つのパラメータの範囲において、各パラメータが実際にどのような値を取るかは同様に確からしい。そこで、推定値生成部１３ｃでは、位置、姿勢の６つのパラメータ毎にそれぞれの最小値と最大値の範囲の値をとる合計六次元空間内の一様分布からランダムにｎ回取り出し、その取り出した値を位置と姿勢の推定値ｅｓｔ１（ｉ）（ｉ＝１，・・・，ｎ）とする。具体的には、推定値生成部１３ｃでは、式（８）により、推定値を生成する。 It is equally probable what values each parameter actually takes in the range of the six parameters of the position and orientation set as described above. Therefore, the estimated value generation unit 13c randomly extracts n times from the uniform distribution in the total six-dimensional space that takes the values in the range of the minimum value and the maximum value for each of the six parameters of position and orientation. The value is an estimated value est1 (i) (i = 1,..., N) of the position and orientation. Specifically, the estimated value generation unit 13c generates an estimated value using Expression (8).

ｅｓｔ＿ｔ_ｘは位置のｘ座標の推定値であり、ｅｓｔ＿ｔ_ｙは位置のｙ座標の推定値であり、ｅｓｔ＿ｔ_ｚは位置のｚ座標の推定値であり、ｅｓｔ＿ｄｅｇ_ｘは姿勢のＸ軸周りの回転角度の推定値であり、ｅｓｔ＿ｄｅｇ_ｙは姿勢のＹ軸周りの回転角度の推定値であり、ｅｓｔ＿ｄｅｇ_ｚは姿勢のＺ軸周りの回転角度の推定値である。ｏｌｄ＿ｔ_ｘは位置のｘ座標の前フレーム値であり、ｏｌｄ＿ｔ_ｙは位置のｙ座標の前フレーム値であり、ｏｌｄ＿ｔ_ｚは位置のｚ座標の前フレーム値であり、ｏｌｄ＿ｄｅｇ_ｘは姿勢のＸ軸周りの回転角度の前フレーム値であり、ｏｌｄ＿ｄｅｇ_ｙは姿勢のＹ軸周りの回転角度の前フレーム値であり、ｏｌｄ＿ｄｅｇ_ｚは姿勢のＺ軸周りの回転角度の前フレーム値である。ｍａｘ＿ｔ_ｘは位置のｘ座標のフレーム間変化最大値であり、ｍａｘ＿ｔ_ｙは位置のｙ座標のフレーム間変化最大値であり、ｍａｘ＿ｔ_ｚは位置のｚ座標のフレーム間変化最大値であり、ｍａｘ＿ｄｅｇ_ｘは姿勢のＸ軸周りの回転角度のフレーム間変化最大値であり、ｍａｘ＿ｄｅｇ_ｙは姿勢のＹ軸周りの回転角度のフレーム間変化最大値であり、ｍａｘ＿ｄｅｇ_ｚは姿勢のＺ軸周りの回転角度のフレーム間変化最大値である。ｕ（−１，１）は−１から１の間の一様分布である。 est_t_x, est_t_y, and est_t_z are the estimated values of the x, y, and z coordinates of the position, and est_deg_x, est_deg_y, and est_deg_z are the estimated values of the rotation angles of the posture around the X, Y, and Z axes. old_t_x, old_t_y, and old_t_z are the previous-frame values of the x, y, and z coordinates of the position, and old_deg_x, old_deg_y, and old_deg_z are the previous-frame values of the rotation angles of the posture around the X, Y, and Z axes. max_t_x, max_t_y, and max_t_z are the maximum inter-frame changes of the x, y, and z coordinates of the position, and max_deg_x, max_deg_y, and max_deg_z are the maximum inter-frame changes of the rotation angles of the posture around the X, Y, and Z axes. u(-1, 1) is a uniform distribution between -1 and 1.
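The image of Expression (8) is not reproduced in this text. Judging from the symbol definitions above, each parameter appears to be drawn as est = old + max × u(-1, 1). The following Python sketch illustrates first-stage sampling under that assumption; the function name, the 6-tuple pose layout, and the numeric values are hypothetical, and the exclusion of structurally impossible ranges is omitted.

```python
import random

# Pose layout assumed here: (t_x, t_y, t_z, deg_x, deg_y, deg_z).
def sample_first_stage(old, max_change, n, rng=random):
    """Draw n pose hypotheses, each parameter as old + max_change * u(-1, 1)."""
    return [
        tuple(o + m * rng.uniform(-1.0, 1.0) for o, m in zip(old, max_change))
        for _ in range(n)
    ]

old_pose = (0.0, 0.0, 600.0, 0.0, 10.0, 0.0)    # illustrative previous-frame pose
max_change = (20.0, 20.0, 30.0, 5.0, 8.0, 5.0)  # illustrative per-frame limits
est1 = sample_first_stage(old_pose, max_change, n=500, rng=random.Random(0))
```

Every hypothesis in `est1` lies inside the box old ± max_change, which is the range described for the six parameters.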

なお、推定値の生成手法としては、記憶解析部１３ｂに蓄積されている履歴から導かれた六次元空間内で取りえる位置と姿勢の組み合わせに基づいて推定値を生成してもよいし、あるいは、記憶解析部１３ｂに蓄積されている履歴から導かれた確率密度関数に基づいて推定値を生成してもよいし、あるいは、推定値を一様分布ではなく、前フレームの位置と姿勢付近に現フレームの位置と姿勢がいる可能性が高いならば、前フレームの位置と姿勢をピークとする正規分布で推定値を生成してもよい。このような正規分布を用いて推定値を生成することにより、ピークに近いほど推定値の設定間隔が密となり、ピークから離れるほど推定値の設定間隔が疎となる。 As a method for generating an estimated value, an estimated value may be generated based on a combination of a position and a posture that can be taken in the six-dimensional space derived from the history accumulated in the storage analysis unit 13b. The estimated value may be generated based on the probability density function derived from the history accumulated in the memory analysis unit 13b, or the estimated value is not a uniform distribution but near the position and posture of the previous frame. If there is a high possibility that the position and orientation of the current frame are present, the estimated value may be generated with a normal distribution having the peak position and orientation of the previous frame. By generating an estimated value using such a normal distribution, the setting interval of estimated values becomes denser as it gets closer to the peak, and the setting interval of estimated values becomes sparser as it gets away from the peak.

さらに、推定値生成部１３ｃでは、適合度算出部１３ｄで第１段階の推定値ｅｓｔ（ｉ）に対する適合度が算出されると、その最大の適合度が閾値を超えたか否かを判定する。この閾値は、撮像画像上に顔が存在している（つまり、現フレームの撮像画像上の投影位置には特徴点として確からしい点がそれぞれ存在する）と推定できる程度の適合度であるか否かを判定するための閾値である。最大の適合度が閾値以下の場合、現フレームの撮像画像上で顔を検出できないと判断し、推定値生成部１３ｃでは、次フレームの撮像画像を待つ。なお、この判定手法としては、閾値を超える適合度の推定値の数が所定数以上の場合に顔が存在すると判定してもよいし、あるいは、適合度の最大値の一定の割合以上の値（例えば、最大値の９割以上の値）をとる適合度の推定値の数が所定数以上の場合に顔が存在すると判定してもよい。 Furthermore, in the estimated value generation unit 13c, when the fitness level for the first-stage estimated value est (i) is calculated by the fitness level calculation unit 13d, it is determined whether or not the maximum fitness level exceeds a threshold value. Whether or not this threshold is a degree of fitness that can be estimated that a face is present on the captured image (that is, each of the projected positions on the captured image of the current frame has a probable feature point). It is a threshold value for determining whether or not. If the maximum fitness is less than or equal to the threshold value, it is determined that a face cannot be detected on the captured image of the current frame, and the estimated value generation unit 13c waits for the captured image of the next frame. As this determination method, it may be determined that a face is present when the number of estimated fitness values exceeding a threshold is equal to or greater than a predetermined number, or a value equal to or greater than a certain percentage of the maximum fitness value. It may be determined that a face exists when the number of goodness-of-fit estimates that take (for example, 90% or more of the maximum value) is a predetermined number or more.
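The three presence criteria described above (maximum fitness over a threshold, a minimum count of estimates over a threshold, or a minimum count of estimates within a fixed ratio of the maximum) can be sketched as follows. This is only an illustration; the function and parameter names are hypothetical.

```python
def face_present(fitness, threshold=0.0, min_count=1, ratio=None):
    """Decide whether a face is present from first-stage fitness values.

    ratio=None, min_count=1 : maximum fitness must exceed the threshold.
    ratio=None, min_count=k : at least k estimates must exceed the threshold.
    ratio=r,    min_count=k : at least k estimates must reach r * max(fitness).
    """
    if not fitness:
        return False
    if ratio is None:
        count = sum(1 for f in fitness if f > threshold)
    else:
        best = max(fitness)
        count = sum(1 for f in fitness if f >= ratio * best)
    return count >= min_count
```

When the function returns False, the caller would simply wait for the next frame, as described above.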

一方、最大の適合度が閾値を超えた場合、顔が存在すると判断し、推定値生成部１３ｃでは、更に絞った位置と姿勢の推定値を生成する。第１段階での最大の適合度を持つ推定値の位置と姿勢付近に真の位置と姿勢が存在していると推測できるので、その付近に集中して多数の位置と姿勢の推定値を再度設定する。そこで、推定値生成部１３ｃでは、この最大の適合度を持つ推定値の位置と姿勢を平均値とするような正規分布を仮定し、その正規分布からランダムにｎ’回取り出し、その取り出した値を位置と姿勢の推定値ｅｓｔ２（ｉ）（ｉ＝１，・・・，ｎ’）とする。あるいは、推定値生成部１３ｃでは、この最大値の適合度とその最大の一定の割合以上の値をとる適合度の推定値の位置と姿勢をサンプルとして取り出し、その各取り出した全てのサンプルから位置と姿勢のパラメータ毎に平均と分散を算出する。そして、推定値生成部１３ｃでは、式（９）により各パラメータの推定値ｅｓｔをそれぞれ算出し、推定値ｅｓｔ２（ｉ）（ｉ＝１，・・・，ｎ’）を生成する。 On the other hand, when the maximum fitness exceeds the threshold, it is determined that a face exists, and the estimated value generation unit 13c generates more narrowly focused estimates of the position and posture. Because the true position and posture can be presumed to lie near those of the first-stage estimate with the maximum fitness, a large number of position and posture estimates are set again, concentrated in that vicinity. To do so, the estimated value generation unit 13c assumes a normal distribution whose mean is the position and posture of the estimate with the maximum fitness, draws from it at random n' times, and uses the drawn values as the position and posture estimates est2(i) (i = 1, ..., n'). Alternatively, the estimated value generation unit 13c takes as samples the positions and postures of the estimate with the maximum fitness and of the estimates whose fitness is at least a certain percentage of that maximum, and calculates the mean and variance of each position and posture parameter over all of these samples. The estimated value generation unit 13c then calculates the estimate est of each parameter by Expression (9) and generates the estimates est2(i) (i = 1, ..., n').

Ｎ（ａ，ｂ）は、平均ａ、分散ｂの正規分布である。ｍｅａｎは、パラメータのサンプルの平均である。σ^２は、パラメータのサンプルの分散である。 N (a, b) is a normal distribution with mean a and variance b. mean is the average of the sample of parameters. σ ² is the variance of the parameter sample.
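The image of Expression (9) is not reproduced here. From the definitions above it appears to draw each parameter as est = N(mean, σ²). A minimal Python sketch under that assumption follows; the function name is hypothetical, and the use of the population variance is an assumption, since the text does not say which variance estimator is meant.

```python
import random
import statistics

def sample_second_stage(samples, n_prime, rng=random):
    """Draw n_prime poses, each parameter from N(mean, variance) of the samples.

    samples: poses (6-tuples) of the first-stage estimates with high fitness.
    """
    columns = list(zip(*samples))                      # one column per parameter
    means = [statistics.mean(c) for c in columns]
    sigmas = [statistics.pstdev(c) for c in columns]   # sqrt of population variance
    return [
        tuple(rng.gauss(m, s) for m, s in zip(means, sigmas))
        for _ in range(n_prime)
    ]
```

Note that `random.gauss` takes a standard deviation, whereas N(a, b) above is parameterized by the variance b, hence the square root.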

適合度算出部１３ｄでは、推定値生成部１３ｃで生成した推定値毎に、式（１０）により、各推定値の位置と姿勢の６つのパラメータを用いて、情報保持部１３ａに保持している各特徴点の基準三次元位置をそれぞれ移動させる。 In the fitness calculation unit 13d, for each estimation value generated by the estimation value generation unit 13c, the information holding unit 13a holds the six parameters of the position and orientation of each estimation value according to Expression (10). The reference three-dimensional position of each feature point is moved.

（Ｘ，Ｙ，Ｚ）が情報保持部１３ａに保持されている特徴点の基準三次元位置（三次元座標）であり、（Ｘ’，Ｙ’，Ｚ’）が移動後の特徴点の三次元位置（三次元座標）である。また、Ｒ_ｚは、Ｚ軸周りの回転行列であり、式（１１）の行列で表される。Ｒ_ｙは、Ｙ軸周りの回転行列であり、式（１２）の行列で表される。Ｒ_ｘは、Ｘ軸周りの回転行列であり、式（１３）の行列で表される。ｅｓｔ＿ｔ_ｘは位置のｘ座標の推定値であり、ｅｓｔ＿ｔ_ｙは位置のｙ座標の推定値であり、ｅｓｔ＿ｔ_ｚは位置のｚ座標の推定値であり、ｅｓｔ＿ｄｅｇ_ｘは姿勢のＸ軸周りの回転角度の推定値であり、ｅｓｔ＿ｄｅｇ_ｙは姿勢のＹ軸周りの回転角度の推定値であり、ｅｓｔ＿ｄｅｇ_ｚは姿勢のＺ軸周りの回転角度の推定値である。 (X, Y, Z) is the reference three-dimensional position (three-dimensional coordinates) of a feature point held in the information holding unit 13a, and (X', Y', Z') is the three-dimensional position (three-dimensional coordinates) of the feature point after the movement. R_z is the rotation matrix around the Z axis and is given by the matrix of Expression (11). R_y is the rotation matrix around the Y axis and is given by the matrix of Expression (12). R_x is the rotation matrix around the X axis and is given by the matrix of Expression (13). est_t_x, est_t_y, and est_t_z are the estimated values of the x, y, and z coordinates of the position, and est_deg_x, est_deg_y, and est_deg_z are the estimated values of the rotation angles of the posture around the X, Y, and Z axes.
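The images of Expressions (10) to (13) are not reproduced in this text. A form consistent with the definitions above would be the following; the multiplication order R_z R_y R_x is an assumption based on the order in which the matrices are introduced, and the rotation matrices are the standard right-handed ones.

```latex
\begin{aligned}
\begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix}
  &= R_z R_y R_x \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
   + \begin{pmatrix} est\_t_x \\ est\_t_y \\ est\_t_z \end{pmatrix}
  && \text{(10, assumed form)} \\
R_z &= \begin{pmatrix}
  \cos(est\_deg_z) & -\sin(est\_deg_z) & 0 \\
  \sin(est\_deg_z) & \cos(est\_deg_z) & 0 \\
  0 & 0 & 1 \end{pmatrix}
  && \text{(11)} \\
R_y &= \begin{pmatrix}
  \cos(est\_deg_y) & 0 & \sin(est\_deg_y) \\
  0 & 1 & 0 \\
  -\sin(est\_deg_y) & 0 & \cos(est\_deg_y) \end{pmatrix}
  && \text{(12)} \\
R_x &= \begin{pmatrix}
  1 & 0 & 0 \\
  0 & \cos(est\_deg_x) & -\sin(est\_deg_x) \\
  0 & \sin(est\_deg_x) & \cos(est\_deg_x) \end{pmatrix}
  && \text{(13)}
\end{aligned}
```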

さらに、適合度算出部１３ｄでは、式（１４）により、移動後の三次元位置（Ｘ’，Ｙ’，Ｚ’）をカメラ１２の撮像画像上に投影する。図１２には、特徴点の基準三次元位置３Ｄａ，３Ｄｂ，３Ｄｃ，３Ｄｄ，３Ｄｅ，３Ｄｆが第１段階で生成した推定値ｅｓｔ１（ｉ）に応じて画像上に投影された二次元座標２Ｄ１ａ，２Ｄ１ｂ，２Ｄ１ｃ，２Ｄ１ｄ，２Ｄ１ｅ，２Ｄ１ｆを示している。また、図１４には、特徴点の基準三次元位置３Ｄａ，３Ｄｂ，３Ｄｃ，３Ｄｄ，３Ｄｅ，３Ｄｆが第２段階で生成した推定値ｅｓｔ２（ｉ）に応じて画像上に投影された二次元座標２Ｄ２ａ，２Ｄ２ｂ，２Ｄ２ｃ，２Ｄ２ｄ，２Ｄ２ｅ，２Ｄ２ｆを示している。なお、ここでは、式（１０）の変換後の座標がカメラ座標系で表現されているので、座標系の変換の必要はない。また、この投影では、推定値の位置や姿勢によっては、撮像画像内に投影されない場合もある。 Furthermore, the fitness calculation unit 13d projects the three-dimensional position (X', Y', Z') after the movement onto the captured image of the camera 12 by Expression (14). FIG. 12 shows the two-dimensional coordinates 2D1a, 2D1b, 2D1c, 2D1d, 2D1e, and 2D1f obtained by projecting the reference three-dimensional positions 3Da, 3Db, 3Dc, 3Dd, 3De, and 3Df of the feature points onto the image according to a first-stage estimated value est1(i). FIG. 14 shows the two-dimensional coordinates 2D2a, 2D2b, 2D2c, 2D2d, 2D2e, and 2D2f obtained by projecting the same reference three-dimensional positions onto the image according to a second-stage estimated value est2(i). Because the coordinates produced by the transformation of Expression (10) are already expressed in the camera coordinate system, no further coordinate system conversion is needed. Depending on the position and posture of the estimated value, a point may be projected outside the captured image.

ｓがスカラーであり、Ａがカメラ１２の内部行列であり、（ｕ，ｖ）が特徴点の撮像画像上の二次元座標であり、（Ｘ’，Ｙ’，Ｚ’）が移動後の特徴点の三次元位置（三次元座標）である。 s is a scalar, A is an internal matrix of the camera 12, (u, v) is a two-dimensional coordinate on a captured image of feature points, and (X ′, Y ′, Z ′) are features after movement. The three-dimensional position (three-dimensional coordinates) of the point.
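Expression (14) is not reproduced here; from the definitions it appears to be the usual pinhole relation s·(u, v, 1)ᵀ = A·(X', Y', Z')ᵀ. The sketch below moves a reference point by a pose hypothesis and projects it. The rotation order (X, then Y, then Z), the use of degrees, the function names, and the example intrinsic matrix are all assumptions made for illustration.

```python
import math

def rotate_translate(point, pose):
    """Apply R_z R_y R_x and then the translation of a 6-parameter pose."""
    x, y, z = point
    t_x, t_y, t_z, deg_x, deg_y, deg_z = pose
    a, b, c = (math.radians(d) for d in (deg_x, deg_y, deg_z))
    y, z = y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a)   # R_x
    x, z = x * math.cos(b) + z * math.sin(b), -x * math.sin(b) + z * math.cos(b)  # R_y
    x, y = x * math.cos(c) - y * math.sin(c), x * math.sin(c) + y * math.cos(c)   # R_z
    return (x + t_x, y + t_y, z + t_z)

def project(point, A):
    """Return (u, v) from s*(u, v, 1) = A*(X', Y', Z'), or None if s <= 0."""
    su, sv, s = (sum(A[r][k] * point[k] for k in range(3)) for r in range(3))
    if s <= 0:
        return None  # point is not in front of the camera
    return (su / s, sv / s)

# Hypothetical intrinsic matrix: focal length 800 px, principal point (320, 240).
A = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
uv = project(rotate_translate((50.0, 0.0, 0.0), (0, 0, 500.0, 0, 0, 0)), A)
```

The caller would still need to check that the returned coordinates fall inside the image bounds, since a hypothesis may project a feature point outside the captured image.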

さらに、適合度算出部１３ｄでは、式（１５）により、特徴点毎に、情報保持部１３ａに保持されている参照輝度パターンと現フレームの撮像画像上の投影位置での輝度パターンとの正規化相関値を算出する。この正規化相関値は、−１〜１の値であり、値が大きいほど相関度が高いことを示す。推定値の位置と姿勢が撮像画像の顔の位置と姿勢に近いほど、参照輝度パターンと現フレームの撮像画像から切り出された投影位置での輝度パターンとは近いパターンとなり、正規化相関値は大きくなる。 Further, the fitness calculation unit 13d normalizes the reference luminance pattern held in the information holding unit 13a and the luminance pattern at the projection position on the captured image of the current frame for each feature point by Expression (15). A correlation value is calculated. This normalized correlation value is a value between −1 and 1, and the larger the value, the higher the degree of correlation. The closer the estimated position and orientation are to the face position and orientation of the captured image, the closer the reference luminance pattern is to the luminance pattern at the projection position extracted from the captured image of the current frame, and the larger the normalized correlation value is Become.

ｆ（ｘ，ｙ）は、現フレームの撮像画像上の座標（ｘ，ｙ）における輝度値である。ｇ（ｕ，ｖ）は、参照輝度パターンの座標（ｕ，ｖ）における輝度値である。ｆ_ａｖｅは、現フレームの撮像画像上の座標（ｘ，ｙ）を中心とした参照輝度パターンと同じサイズの領域の平均輝度である。ｇ_ａｖｅは、参照輝度パターンの平均輝度である。Ｖは、参照輝度パターンの垂直方向の取りえる座標の要素の集合である。Ｕは、参照輝度パターンの水平方向の取りえる座標の要素の集合である。 f (x, y) is a luminance value at coordinates (x, y) on the captured image of the current frame. g (u, v) is a luminance value at the coordinates (u, v) of the reference luminance pattern. f _ave is an average luminance of an area having the same size as the reference luminance pattern around the coordinate (x, y) on the captured image of the current frame. g _ave is the average luminance of the reference luminance pattern. V is a set of elements of coordinates that can be taken in the vertical direction of the reference luminance pattern. U is a set of coordinate elements that can be taken in the horizontal direction of the reference luminance pattern.
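Expression (15) is not reproduced here. The definitions above match the standard zero-mean normalized cross-correlation, i.e. the sum over U and V of (f − f_ave)(g − g_ave) divided by the square root of the product of the two sums of squared deviations. The following sketch assumes that form; the function name, the row-major list-of-lists image layout, and the handling of flat patches are illustrative choices.

```python
import math

def ncc(image, x, y, pattern):
    """Normalized correlation between pattern and the image patch centred at (x, y).

    image, pattern: lists of rows of luminance values. Returns None when the
    patch would extend outside the image.
    """
    h, w = len(pattern), len(pattern[0])
    top, left = y - h // 2, x - w // 2
    if top < 0 or left < 0 or top + h > len(image) or left + w > len(image[0]):
        return None
    patch = [image[top + v][left + u] for v in range(h) for u in range(w)]
    ref = [pattern[v][u] for v in range(h) for u in range(w)]
    f_ave = sum(patch) / len(patch)
    g_ave = sum(ref) / len(ref)
    num = sum((f - f_ave) * (g - g_ave) for f, g in zip(patch, ref))
    den = math.sqrt(sum((f - f_ave) ** 2 for f in patch)
                    * sum((g - g_ave) ** 2 for g in ref))
    return num / den if den > 0 else 0.0  # flat patch: treated as uncorrelated
```

Because the means are subtracted and the result is normalized, a patch that differs from the reference only by gain and offset still scores close to 1.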

多数の推定値に応じて各特徴点の基準三次元位置を撮像画像上に投影した場合、投影される位置はある範囲に集中し、同じ位置に投影される場合もある。特に、第２段階で位置と姿勢を絞った場合には、同じ位置に投影されるケースが増加する。その場合、撮像画像上の同じ投影位置では同じ輝度パターンなので、正規化相関値を算出した場合には同じ値が得られる。そこで、重複して計算を行わないように、特徴点毎にマップを生成し、マップによって算出した投影位置の正規化相関値が既に算出されているか否かを確認する。 When the reference three-dimensional position of each feature point is projected on the captured image in accordance with a large number of estimated values, the projected position is concentrated in a certain range and may be projected at the same position. In particular, when the position and orientation are narrowed down in the second stage, the number of cases projected at the same position increases. In this case, since the same luminance pattern is obtained at the same projection position on the captured image, the same value is obtained when the normalized correlation value is calculated. Therefore, a map is generated for each feature point so that the calculation is not repeated, and it is confirmed whether the normalized correlation value of the projection position calculated by the map has already been calculated.

マップは、特徴点毎に設定され、撮像画像の全画素分の正規化相関値を格納するためのテーブルが用意される（図１６参照）。マップは、フレームが変わる毎あるいは撮像画像の顔の位置や姿勢に変化がある毎に全データが消去され、新たに生成される。図１６に示す例では、画像上の（ａ１，ｂ１）には０．５という値の正規化相関値が格納されており、既にこの投影位置が算出されていることを示し、（ａ１，ｂ２）、（ａ２，ｂ１）、（ａ２，ｂ２）などには正規化相関値が格納されておらず、未だこれらの投影位置が算出されていないことを示す。 A map is set for each feature point, and a table for storing normalized correlation values for all pixels of the captured image is prepared (see FIG. 16). All data in the map is erased and the map is generated anew each time the frame changes or each time the position or posture of the face in the captured image changes. In the example shown in FIG. 16, a normalized correlation value of 0.5 is stored at (a1, b1) on the image, indicating that this projection position has already been calculated, whereas no normalized correlation value is stored at (a1, b2), (a2, b1), (a2, b2), and so on, indicating that these projection positions have not yet been calculated.

適合度算出部１３ｄでは、ある特徴点の投影位置を算出する毎に、その特徴点のマップを参照し、その算出した投影位置に既に正規化相関値が格納されているか否かを判定する。既に正規化相関値が格納されている場合、適合度算出部１３ｄでは、その正規化相関値を取り出して後処理で使用する。一方、未だ正規化相関値が格納されていない場合、適合度算出部１３ｄでは、その投影位置での正規化相関値を算出する。次に、適合度算出部１３ｄでは、算出した正規化相関値が閾値より小さいか否かを判定し、閾値より小さい場合には正規化相関値を一定値に置き換える。例えば、正規化相関値が０より小さい場合には、正規化相関値を０に置き換える。このように、非常に小さな正規化相関値を排除することにより、ノイズなどの影響によって低下している正規化相関値が適合度に影響するのを防止する。そして、適合度算出部１３ｄでは、その特徴点のマップの該当する投影位置に正規化相関値を書き込む。このように、適合度算出部１３ｄでは、推定値毎に、全ての特徴点についての正規化相関値を求める。図１３には、第１段階の推定値ｅｓｔ１（ｉ）の場合の各特徴点の正規化相関値を示している。また、図１５には、第２段階の推定値ｅｓｔ２（ｉ）の場合の各特徴点の正規化相関値を示しており、一部の特徴点については第１段階の場合より正規化相関値が大きくなっている。 Each time the fitness calculation unit 13d calculates the projection position of a feature point, it refers to the map of that feature point and determines whether a normalized correlation value is already stored at the calculated projection position. If a value is already stored, the fitness calculation unit 13d retrieves it and uses it in the subsequent processing. If no value is stored yet, the fitness calculation unit 13d calculates the normalized correlation value at that projection position. It then determines whether the calculated normalized correlation value is smaller than a threshold and, if so, replaces it with a constant value. For example, a normalized correlation value smaller than 0 is replaced with 0. Eliminating very small normalized correlation values in this way prevents values that have been lowered by noise or similar effects from influencing the fitness. The fitness calculation unit 13d then writes the normalized correlation value at the corresponding projection position in the map of that feature point. In this way, the fitness calculation unit 13d obtains normalized correlation values for all feature points for each estimated value. FIG. 13 shows the normalized correlation value of each feature point for a first-stage estimated value est1(i). FIG. 15 shows the normalized correlation value of each feature point for a second-stage estimated value est2(i); for some feature points, the normalized correlation value is larger than in the first stage.
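The per-feature-point map lookup, clamping, and write-back described above can be sketched as a small cache. This is an illustration only: the patent describes a table covering every pixel, whereas this sketch substitutes a dictionary keyed by integer pixel coordinates, and all names are hypothetical.

```python
def make_cached_score(compute, floor=0.0):
    """Wrap compute(feature, u, v) with one map per feature point.

    Values below `floor` are replaced by `floor` before being stored.
    Returns (score, reset); call reset() when the frame changes.
    """
    maps = {}  # feature index -> {(u, v): stored correlation value}

    def score(feature, u, v):
        table = maps.setdefault(feature, {})
        key = (u, v)
        if key not in table:                 # not yet calculated at this position
            value = compute(feature, u, v)
            table[key] = floor if value < floor else value
        return table[key]

    return score, maps.clear
```

In use, `compute` would be the normalized correlation at the projected pixel, and `reset` would be called whenever a new frame arrives so that stale values are never reused.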

適合度算出部１３ｄでは、推定値毎に、全ての特徴点の中から特徴点を選択し、選択した特徴点の正規化相関値を用いて適合度を算出する。適合度は、推定値の位置と姿勢に応じて三次元的な位置関係によって投影された現フレームの撮像画像上における各投影位置において複数の特徴点の位置と確からしさを評価し、推定値の位置及び姿勢と撮像画像上の顔の位置及び姿勢とが適合しているか否かの度合いを示し、値が大きいほど適合していること示す。具体的には、基準三次元位置の各特徴点を推定値の位置と姿勢に応じて投影した撮像画像上の各投影位置の輝度パターンが各特徴点の参照輝度パターンとそれぞれ類似しているほど、適合度が大きくなる。 For each estimated value, the fitness level calculation unit 13d selects a feature point from among all the feature points, and calculates the fitness level using the normalized correlation value of the selected feature point. The goodness of fit evaluates the position and probability of a plurality of feature points at each projection position on the captured image of the current frame projected by a three-dimensional positional relationship according to the position and orientation of the estimated value, The degree of whether or not the position and orientation and the position and orientation of the face on the captured image are compatible is indicated, and the larger the value is, the more compatible it is. Specifically, the brightness pattern of each projection position on the captured image obtained by projecting each feature point of the standard three-dimensional position according to the position and orientation of the estimated value is similar to the reference brightness pattern of each feature point. , The goodness of fit increases.

適合度の算出に用いる特徴点の選択方法は、全ての特徴点を用いてもよいし、閾値より大きい正規化相関値の特徴点だけを用いてもよいし、撮像画像内に投影された特徴点だけを用いてもよいし、あるいは、特徴点の三次元位置が定義された座標系における特徴点周辺の平均的な法線ベクトル及び姿勢の推定値を考慮し、カメラ１２の撮像画像上に見えていると判断される特徴点だけを用いてもよい。このカメラ１２の撮像画像上に見えていると判断される特徴点だけを用いる場合、位置と姿勢の推定値により特徴点の基準三次元位置を移動させた後の三次元位置にカメラ１２の光学原点から向かうベクトルと姿勢の推定値により特徴点の法線ベクトルを回転させてできるベクトルの内積が０以上の閾値より大きいときに、この特徴点が使用可能だと判断する。なお、上記のような適合度算出に用いる特徴点の選択方法を示したが、これらの選択方法うちのいくつかを組み合わせて特徴点を選択してもよい。 The feature point selection method used for calculating the degree of fitness may use all feature points, or may use only feature points with a normalized correlation value greater than the threshold value, or features projected in the captured image. Only the points may be used, or an average normal vector around the feature point in the coordinate system in which the three-dimensional position of the feature point is defined and an estimated value of the posture are taken into consideration on the captured image of the camera 12. Only feature points that are determined to be visible may be used. When only the feature points that are determined to be visible on the captured image of the camera 12 are used, the optical of the camera 12 is moved to the three-dimensional position after the reference three-dimensional position of the feature point is moved based on the estimated position and orientation. When the inner product of the vector formed by rotating the normal vector of the feature point based on the vector from the origin and the estimated value of the posture is larger than the threshold value of 0 or more, it is determined that the feature point can be used. In addition, although the selection method of the feature point used for the above fitness calculation is shown, you may select a feature point combining some of these selection methods.

適合度の算出は、全ての推定値に対して実施してもよいが、適合度が小さくなると予測できる推定値に対して適合度の算出を中止してもよい。例えば、特徴点の中でも信頼性の高そうな１つ以上の特徴点の正規化相関値の平均値が閾値以上の場合に適合度の算出を実施し、その平均値が閾値未満の場合に適合度の算出を中止するようにしてもよい。 The calculation of the fitness may be performed for all the estimated values, but the calculation of the fitness may be stopped for an estimated value that can be predicted that the fitness is small. For example, if the average value of normalized correlation values of one or more feature points that are likely to be reliable among feature points is equal to or greater than a threshold value, the degree of fitness is calculated. The calculation of the degree may be stopped.

適合度の算出方法としては、選択された特徴点の各正規化相関値の統計量を算出する。統計量としては、例えば、和、平均値がある。 As a method for calculating the fitness, a statistic of each normalized correlation value of the selected feature point is calculated. Examples of statistics include sums and average values.
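As a rough illustration of the preceding paragraphs, the sketch below computes the fitness of one estimate as the sum or mean of the selected normalized correlation values, with an optional early exit when the mean over a few trusted feature points falls below a threshold. The function name, parameters, and the choice to return None for an abandoned estimate are assumptions.

```python
def fitness(correlations, select_min=None, reliable=(), reliable_min=None,
            use_mean=True):
    """Fitness of one pose estimate from per-feature-point correlations.

    select_min   : keep only correlations greater than this value (optional).
    reliable     : indices of feature points considered trustworthy (optional).
    reliable_min : abandon the estimate (return None) when the mean correlation
                   over `reliable` is below this value.
    """
    if reliable and reliable_min is not None:
        gate = sum(correlations[i] for i in reliable) / len(reliable)
        if gate < reliable_min:
            return None  # fitness predicted to be small; calculation stopped
    chosen = [c for c in correlations if select_min is None or c > select_min]
    if not chosen:
        return 0.0
    total = sum(chosen)
    return total / len(chosen) if use_mean else total
```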

顔位置・姿勢出力部１３ｅでは、第２段階の全ての推定値ｅｓｔ２（ｉ）について適合度の算出が終了すると、算出した適合度とその適合度に対応する推定値の位置と姿勢を用いて、撮像画像における顔の位置と姿勢を推定し、その推定した位置と姿勢を出力する。その推定方法としては、例えば、適合度が最大の推定値の位置と姿勢としてもよいし、閾値以上の適合度を持つ推定値の適合度による加重平均値によって位置と姿勢を算出してもよいし、適合度を算出された全ての推定値の適合度による加重平均値によって位置と姿勢を算出してもよいし、適合度の最大値の一定の割合以上の値をとる適合度の推定値の適合度による加重平均値によって位置と姿勢を算出してもよいし、適合度の最大値の一定の割合以上の値をとる適合度の推定値の適合度の数が所定数以上の場合にそれらの適合度の推定値の適合度による加重平均値によって位置と姿勢を算出してもよいし、あるいは、適合度の最大値の一定の割合以上の値をとる適合度の推定値の適合度の数が所定数未満の場合に適合度が最大の推定値の位置と姿勢としてもよい。加重平均値を利用する場合には、式（１６）によって算出を行う。 When the fitness has been calculated for all second-stage estimated values est2(i), the face position / posture output unit 13e estimates the position and posture of the face in the captured image from the calculated fitness values and the positions and postures of the corresponding estimated values, and outputs the result. For example, the position and posture of the estimated value with the maximum fitness may be used as they are. Alternatively, the position and posture may be calculated as a fitness-weighted average over the estimated values whose fitness is equal to or greater than a threshold, over all estimated values for which the fitness was calculated, or over the estimated values whose fitness is at least a certain percentage of the maximum fitness. As a further alternative, when the number of estimated values whose fitness is at least a certain percentage of the maximum is equal to or greater than a predetermined number, their fitness-weighted average may be used, and when that number is less than the predetermined number, the position and posture of the estimated value with the maximum fitness may be used. When a weighted average is used, it is calculated by Expression (16).

ｅｓｔ_ｉが推定値ｉのパラメータであり、ｗ_ｉが推定値ｉの適合度であり、Ｇが推定に使用すると判断された適合度のインデックス集合である。 est _i is a parameter of the estimated value i, w _i is the fitness of the estimated value i, and G is an index set of the fitness determined to be used for estimation.
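Expression (16) is not reproduced here; from the definitions it appears to be the weighted mean est = Σ_{i∈G} w_i·est_i / Σ_{i∈G} w_i, applied to each parameter. The sketch below assumes that form. The function name and the way G is selected are hypothetical, and averaging angles parameter by parameter is only reasonable when the selected postures are close to one another.

```python
def weighted_pose(estimates, weights, threshold=None, ratio=None):
    """Fitness-weighted mean pose over the index set G.

    G holds the indices whose weight passes the optional absolute threshold
    and the optional ratio of the maximum weight.
    """
    best = max(weights)
    G = [i for i, w in enumerate(weights)
         if (threshold is None or w >= threshold)
         and (ratio is None or w >= ratio * best)]
    total = sum(weights[i] for i in G)
    if total <= 0:
        return estimates[weights.index(best)]  # fall back to the best estimate
    dims = len(estimates[0])
    return tuple(sum(weights[i] * estimates[i][k] for i in G) / total
                 for k in range(dims))
```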

次に、図１を参照して、画像処理装置１の動作について説明する。特に、画像処理部４のモデリング処理については図１７のフローチャートに沿って説明する。図１７は、図１の画像処理装置におけるモデリング処理の流れを示すフローチャートである。 Next, the operation of the image processing apparatus 1 will be described with reference to FIG. In particular, the modeling process of the image processing unit 4 will be described with reference to the flowchart of FIG. FIG. 17 is a flowchart showing the flow of modeling processing in the image processing apparatus of FIG.

第１カメラ２では、第１カメラ２に対して真正面を向いている認識対象の人の顔を撮像し、その撮像画像の画像信号を画像処理部４に送信する。第２カメラ３も使用可能な場合、第２カメラ３では、第１カメラ２と同時に、第１カメラ２に対して真正面を向いている認識対象の人の顔を側方から撮像し、その撮像画像の画像信号を画像処理部４に送信する。 The first camera 2 captures an image of the face of a person to be recognized facing directly in front of the first camera 2, and transmits an image signal of the captured image to the image processing unit 4. When the second camera 3 can also be used, the second camera 3 captures the face of the person to be recognized facing the front of the first camera 2 from the side at the same time as the first camera 2 and captures the image. The image signal of the image is transmitted to the image processing unit 4.

画像処理部４では、第１カメラ２及び第２カメラ３が使用可能かあるいは第１カメラ２だけが使用可能かを判定する（Ｓ１０）。 The image processing unit 4 determines whether the first camera 2 and the second camera 3 can be used or only the first camera 2 can be used (S10).

Ｓ１０にて第１カメラ２だけが使用可能と判定した場合、画像処理部４では、第１カメラ２からの画像信号を受信し、顔の撮像画像を取得する（Ｓ１１）。画像処理部４では、第１カメラ２の撮像画像から複数の特徴点を抽出し、その各特徴点周辺の輝度パターンを参照輝度パターンとして保持する（Ｓ１２）。また、画像処理部４では、平均的な顔の三次元モデルから各特徴点に対応する点をそれぞれ選択し、その各対応点の三次元モデルの座標系における三次元位置を抽出する（Ｓ１３）。そして、画像処理部４では、各特徴点の撮像画像上の二次元座標と各特徴点に対応する対応点の三次元モデル座標系での三次元位置との関係から、第１カメラ２のカメラ座標系における各特徴点の三次元位置を推定する（Ｓ１４）。さらに、画像処理部４では、特徴点毎に、推定した三次元位置（三次元座標）を基準三次元位置として参照輝度パターンに対応付けて保持する。 When it is determined in S10 that only the first camera 2 is usable, the image processing unit 4 receives an image signal from the first camera 2 and acquires a captured image of the face (S11). The image processing unit 4 extracts a plurality of feature points from the captured image of the first camera 2, and holds a luminance pattern around each feature point as a reference luminance pattern (S12). Further, the image processing unit 4 selects points corresponding to each feature point from the average three-dimensional model of the face, and extracts the three-dimensional position of each corresponding point in the coordinate system of the three-dimensional model (S13). . Then, the image processing unit 4 determines the camera of the first camera 2 from the relationship between the two-dimensional coordinates on the captured image of each feature point and the three-dimensional position of the corresponding point corresponding to each feature point in the three-dimensional model coordinate system. A three-dimensional position of each feature point in the coordinate system is estimated (S14). Further, the image processing unit 4 holds the estimated three-dimensional position (three-dimensional coordinates) in association with the reference luminance pattern as a standard three-dimensional position for each feature point.

When it is determined in S10 that the first camera 2 and the second camera 3 are both usable, the image processing unit 4 receives the image signals from the first camera 2 and the second camera 3 and acquires the respective captured images of the face (S15). The image processing unit 4 extracts a plurality of feature points from the captured image of the first camera 2 and holds the luminance pattern around each feature point as a reference luminance pattern (S16). For each feature point, the image processing unit 4 uses the luminance pattern of the feature point obtained from the captured image of the first camera 2 to search the captured image of the second camera 3 for a similar position (S17). At this time, the image processing unit 4 also holds the luminance pattern around each similar position on the captured image of the second camera 3 as a reference luminance pattern. Then, from the relationship between the two-dimensional coordinates of each feature point on the captured image of the first camera 2 and the two-dimensional coordinates of the similar position on the captured image of the second camera 3, the image processing unit 4 estimates the three-dimensional position of each feature point in the camera coordinate system of the first camera 2 (S18). Further, for each feature point, the image processing unit 4 holds the estimated three-dimensional position (three-dimensional coordinates) as a reference three-dimensional position in association with the reference luminance pattern.
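The two-camera estimation in S18 can be illustrated with linear triangulation. The following is a minimal sketch, not the procedure the description specifies: it assumes both cameras are calibrated and described by known 3x4 projection matrices, and it uses the standard direct linear transform (DLT). The names `project` and `triangulate`, the matrices `P1` and `P2`, and all numeric values are hypothetical.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 projection matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one point seen at uv1 in view 1 and uv2 in view 2."""
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector
    # belonging to the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Hypothetical calibration: camera 1 at the origin, camera 2 rotated about the y axis.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.6), np.sin(0.6)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([[-0.5], [0.0], [0.2]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])

X_true = np.array([0.05, -0.02, 1.0])  # a feature point in camera-1 coordinates
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

Because `P1` is built with an identity rotation and zero translation, the recovered point is expressed in the coordinate system of the first camera, which matches the coordinate system used for the reference three-dimensional positions.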

The operation of the image processing apparatus 11 will be described with reference to FIG. 2. In particular, the face position/posture estimation process of the image processing unit 13 will be described along the flowchart of FIG. 18, and the normalized correlation value calculation process of the image processing unit 13 along the flowchart of FIG. 19. FIG. 18 is a flowchart showing the flow of the face position/posture estimation process in the image processing apparatus of FIG. 2. FIG. 19 is a flowchart showing the flow of the normalized correlation value calculation process in the image processing apparatus of FIG. 2.

The camera 12 captures images continuously in time and transmits the image signal of the captured image to the image processing unit 13 at regular intervals.

The image processing unit 13 receives the image signal from the camera 12 and sequentially acquires the captured image of the current frame (S20).

The image processing unit 13 determines whether the position and posture of the face in the captured image were estimated in the previous frame (S21). When it is determined in S21 that the position and posture were not estimated in the previous frame, the image processing unit 13 determines whether a face could be detected in the captured image of the current frame (S22). When no face could be detected in S22, the image processing unit 13 returns to S20 and waits for the captured image of the next frame. When a face could be detected in S22, the image processing unit 13 determines whether three or more feature points could be detected in that face (S23). When it is determined in S23 that three or more feature points could not be detected, the image processing unit 13 returns to S20 and waits for the captured image of the next frame. When it is determined in S23 that three or more feature points could be detected, the image processing unit 13 calculates, from the correspondence between the two-dimensional coordinates of each detected feature point and the held reference three-dimensional position of each feature point, the position and posture in the camera coordinate system of the camera 12 relative to the reference three-dimensional positions, and takes this position and posture as the estimated value of the position and posture for the current frame (that is, in the next frame this estimated value serves as the estimated value of the position and posture in the previous frame) (S24). The image processing unit 13 then returns to S20 and waits for the captured image of the next frame.

When it is determined in S21 that the position and posture were estimated in the previous frame, the image processing unit 13 generates, based on the position and posture estimated in the previous frame, estimated values est1(i) (i = 1, ..., n) of the position and posture that the face can take in the current frame (S25). The image processing unit 13 then moves the held reference three-dimensional position of each feature point according to the position and posture of each estimated value est1(i), and calculates the projection position (two-dimensional coordinates) of each moved three-dimensional position on the image captured by the camera 12 (S26). Further, for each feature point under est1(i), the image processing unit 13 calculates the normalized correlation value between the luminance pattern at the projection position on the captured image and the held reference luminance pattern (S27). From the normalized correlation values of the feature points, the image processing unit 13 then calculates a fitness as an evaluation value of how consistent the estimated value est1(i) is with the position and posture of the face in the captured image (S28). Through the processing up to this point, the first-stage estimated values est1(i) are generated and a fitness is calculated for each of them.
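Steps S25 and S26 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented procedure: a pose is stored as a hypothetical six-element vector `[tx, ty, tz, rx, ry, rz]`, candidates are drawn by adding Gaussian noise to the previous pose (the description only says the estimated values are generated from the previous frame's pose), and the camera is an ideal pinhole with focal length `focal` and principal point `center`.

```python
import numpy as np

def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (angles in radians)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx

def generate_candidates(prev_pose, n, sigma, rng):
    """S25 (assumed form): n candidate poses scattered around the previous pose."""
    return prev_pose + rng.normal(0.0, sigma, size=(n, 6))

def project_points(pose, ref_points, focal, center):
    """S26: move the reference 3D points by the pose and project them (pinhole model)."""
    t, angles = pose[:3], pose[3:]
    moved = ref_points @ rotation_matrix(*angles).T + t
    return focal * moved[:, :2] / moved[:, 2:3] + center
```

Each row returned by `project_points` is the projection position of one feature point; these positions are where the luminance patterns are compared in S27.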

The image processing unit 13 extracts the maximum fitness from all the calculated fitness values and determines whether that maximum fitness exceeds a threshold (S29). When the maximum fitness is determined in S29 to be equal to or less than the threshold, the image processing unit 13 judges that no face could be detected in the current frame, returns to S20, and waits for the captured image of the next frame.

When the maximum fitness is determined in S29 to exceed the threshold, the image processing unit 13 generates estimated values est2(i) (i = 1, ..., n') of the position and posture that the face can take in the current frame, in the vicinity of the position and posture of the estimated value est1(i) that has the maximum fitness (S30). The image processing unit 13 then moves the held reference three-dimensional position of each feature point according to the position and posture of each estimated value est2(i), and calculates the projection position (two-dimensional coordinates) of each moved three-dimensional position on the image captured by the camera 12 (S31). Further, for each feature point under est2(i), the image processing unit 13 calculates the normalized correlation value between the luminance pattern at the projection position on the captured image and the held reference luminance pattern (S32). From the normalized correlation values of the feature points, the image processing unit 13 then calculates the fitness of the estimated value est2(i) (S33). Through the processing up to this point, the second-stage estimated values est2(i) are generated and a fitness is calculated for each of them.

Based on the positions and postures of the estimated values est2(i) and the fitness calculated for each estimated value est2(i), the image processing unit 13 estimates the position and posture of the face in the captured image of the current frame and outputs the estimated position and posture (S34). The image processing unit 13 then returns to S20 and waits for the captured image of the next frame.
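One way to combine the second-stage estimated values and their fitness in S34 is a fitness-weighted average over the candidates whose fitness is at least a fixed ratio of the maximum. The sketch below assumes this form; the function name `estimate_pose`, the default `ratio`, and the use of a plain weighted mean are illustrative choices. It also assumes positive fitness values and a small angular spread among the kept candidates, since averaging angles component-wise is only reasonable in that case.

```python
import numpy as np

def estimate_pose(candidates, fitness, ratio=0.8):
    """Fitness-weighted mean of the candidates with fitness >= ratio * max fitness."""
    candidates = np.asarray(candidates, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    keep = fitness >= ratio * fitness.max()
    weights = fitness[keep]
    return (weights[:, None] * candidates[keep]).sum(axis=0) / weights.sum()
```

Setting `ratio=1.0` keeps only the candidates tied for the maximum fitness, so with a unique maximum it simply returns the best candidate; lower values blend more candidates and tend to smooth the output from frame to frame.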

In particular, when calculating the normalized correlation values in S27 and S32, once the projection positions for all the feature points have been calculated, the image processing unit 13 refers to the map for each feature point and determines whether the calculated projection position has already been projected onto (that is, whether a normalized correlation value is already stored at that projection position in the map of that feature point) (S40).

When it is determined in S40 that the calculated projection position has not yet been projected onto, the image processing unit 13 cuts out, from the captured image of the current frame, a luminance pattern that is centered on the projection position and has the same size as the held reference luminance pattern of the feature point (S41). The image processing unit 13 then calculates the normalized correlation value between the cut-out luminance pattern and the held reference luminance pattern of the feature point (S42). Further, the image processing unit 13 determines whether the calculated normalized correlation value is smaller than a threshold and, if it is, replaces the normalized correlation value with a constant value (S43). The image processing unit 13 then writes the normalized correlation value into the map of the feature point in association with the projection position (S44).

On the other hand, when it is determined in S40 that the calculated projection position has already been projected onto, the image processing unit 13 does not calculate a normalized correlation value for that feature point; in subsequent processing, the normalized correlation value for that projection position is read from the map (S45).

The image processing unit 13 determines whether the normalized correlation values have been written for all the feature points (S46). When it is determined in S46 that writing has been completed for all the feature points, the image processing unit 13 proceeds to the normalized correlation value calculation for the next estimated value. When it is determined in S46 that writing has not been completed for all the feature points, the image processing unit 13 returns to S40 and proceeds to the normalized correlation value calculation for the next feature point.
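The map lookup in S40 to S45 amounts to caching one normalized correlation value per projection position. A minimal sketch follows, assuming one dictionary per feature point and per frame as the map, projection positions rounded to integer pixels, and zero-mean normalized cross-correlation as the correlation measure. The names `ncc`, `cached_ncc`, `floor_threshold`, and `floor_value` are hypothetical, and returning the constant value for patches that fall outside the image is an added assumption.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def cached_ncc(image, ref_pattern, pos, cache, floor_threshold=0.0, floor_value=0.0):
    """Return the correlation at pixel position pos = (u, v), computing it at most once."""
    key = (int(round(pos[0])), int(round(pos[1])))
    if key in cache:                       # S40: already projected onto
        return cache[key]                  # S45: reuse the stored value
    h, w = ref_pattern.shape
    top, left = key[1] - h // 2, key[0] - w // 2
    inside = top >= 0 and left >= 0 and top + h <= image.shape[0] and left + w <= image.shape[1]
    if not inside:
        value = floor_value                # assumption: off-image patches get the constant
    else:
        patch = image[top:top + h, left:left + w].astype(float)   # S41: cut out the patch
        value = ncc(patch, ref_pattern.astype(float))             # S42: correlate
        if value < floor_threshold:                               # S43: clamp low values
            value = floor_value
    cache[key] = value                     # S44: write into the map
    return value
```

Because many candidate poses project a feature point onto the same pixel, each cache hit saves one patch extraction and one correlation.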

According to the image processing apparatus 11, a large number of estimated values are generated and, for each estimated value, a fitness is obtained that evaluates both the likelihood of each point being the corresponding feature point and the overall arrangement of the feature points, so that the position and posture of the face can be estimated from the captured image with high accuracy. In addition, the image processing apparatus 11 holds only a small amount of data, namely the reference luminance pattern and the reference three-dimensional position of each feature point, and processes only these feature points rather than the entire image, so the processing load is reduced and the processing time is short.

Because the image processing apparatus 11 uses the position and posture of the estimated value narrowed down in the first stage to narrow the position and posture down further in the second stage, it can estimate the position and posture with very high accuracy.

When generating estimated values, the image processing apparatus 11 restricts them to the range that the face of the person to be recognized can possibly take, so no useless estimated values are generated, the processing load is reduced, and estimation of locally incorrect positions and postures is suppressed. In particular, when the image processing apparatus 11 accumulates a history of positions and postures and generates estimated values taking that history into account, the estimated values can reflect the habitual face movements of the individual. The estimated values can therefore be concentrated in a relatively narrow range, which improves robustness and also reduces the processing load.

Because the image processing apparatus 11 uses a statistic of the normalized correlation values of the feature points when evaluating the fitness, the fitness varies with the overall similarity. Therefore, even if some feature points are hidden or look different because of illumination changes, and the similarity to the luminance patterns of those points drops, the fitness does not drop much because the overall similarity is used, and the influence of those points is limited. Likewise, even if the captured image contains regions that resemble some of the feature points, the fitness does not rise much unless the overall similarity is high, so the influence of those regions is also limited. The apparatus is thus highly robust to changes in appearance and is less likely to converge to a locally incorrect position or posture.

Because the image processing apparatus 11 sets a map for each feature point, the normalized correlation value is never calculated twice for the same projection position, and the processing load can be reduced. In addition, when the image processing apparatus 11 also uses, in estimating the position and posture, the estimated values whose fitness is at least a certain ratio of the maximum fitness, a filtering effect is obtained on the estimated position and posture. The estimated position and posture therefore take continuous values and change smoothly.

The second embodiment will be described with reference to FIG. 20. FIG. 20 is a configuration diagram of an image processing apparatus for face presence/absence determination processing according to the second embodiment.

In the second embodiment, an image processing apparatus 1 similar to that of the first embodiment and an image processing apparatus 21 are configured. The image processing apparatus 21 is an image processing apparatus for face presence/absence determination processing. Since the image processing apparatus 1 was described in the first embodiment, its description is omitted here. In the second embodiment as well, the image processing apparatus 21 holds the templates generated by the image processing apparatus 1 for a plurality of feature points on the face of the person to be recognized, each template consisting of the luminance pattern and the three-dimensional position of a feature point.

The configuration of the image processing apparatus 21 will be described. The image processing apparatus 21 holds the template for each feature point generated by the image processing apparatus 1 and uses it to determine whether the face of the person to be recognized is present in the captured image. To do so, the image processing apparatus 21 sets a large number of estimated values of position and posture, calculates the degree to which each estimated value fits the position and posture of the face in the captured image, and determines the presence or absence of the face based on that fitness. For this purpose, the image processing apparatus 21 includes a camera 22 and an image processing unit 23. In the image processing unit 23, an information holding unit 23a, a storage analysis unit 23b, an estimated value generation unit 23c, a fitness calculation unit 23d, and a face presence/absence output unit 23e are configured by executing an application program for face presence/absence determination processing on a computer. This computer may be the same computer as that of the image processing apparatus 1 or a different one; if it is the same computer, the information holding unit may be shared.

In the second embodiment, the camera 22 corresponds to the imaging means recited in the claims, the information holding unit 23a corresponds to the information holding means recited in the claims, the estimated value generation unit 23c corresponds to the estimated value generating means recited in the claims, the fitness calculation unit 23d corresponds to the fitness calculating means recited in the claims, and the face presence/absence output unit 23e corresponds to the determining means recited in the claims.

The camera 22 is the same kind of camera as the camera 12 according to the first embodiment, and its description is omitted. The information holding unit 23a is the same kind of information holding unit as the information holding unit 13a according to the first embodiment, and its description is omitted.

The storage analysis unit 23b is configured in a predetermined memory area. It stores the estimated values of position and posture generated by the estimated value generation unit 23c in association with the fitness calculated for each estimated value by the fitness calculation unit 23d, and also stores the presence or absence of a face in the captured image of each frame as output by the face presence/absence output unit 23e. From the many estimated values and their fitness values, the storage analysis unit 23b uses the estimated values whose fitness is greater than a threshold to accumulate, as a history, the face positions and postures that the person to be recognized has taken in the past. Further, as in the storage analysis unit 13b according to the first embodiment, the storage analysis unit 23b uses this history of positions and postures either to identify the combinations of position and posture that can be taken in the six-dimensional space, or to generate a probability density function by superimposing a plurality of normal distributions whose peaks lie near frequently taken positions and postures.
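The probability density function built from superimposed normal distributions can be pictured as a Gaussian mixture over the six-dimensional pose space. The sketch below is only an illustration under simplifying assumptions: every component is isotropic with the same standard deviation `sigma`, and the peak locations and weights are supplied by the caller (for example, from counts in the accumulated history). The name `mixture_density` and its parameters are hypothetical.

```python
import numpy as np

def mixture_density(x, peaks, weights, sigma):
    """Density at pose x of a mixture of isotropic normal distributions centered on peaks."""
    x = np.asarray(x, dtype=float)
    peaks = np.asarray(peaks, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # mixture weights sum to one
    dim = peaks.shape[1]                       # 6 for position plus posture
    sq_dist = ((peaks - x) ** 2).sum(axis=1)
    norm = (2.0 * np.pi * sigma ** 2) ** (-dim / 2.0)
    return float((weights * norm * np.exp(-sq_dist / (2.0 * sigma ** 2))).sum())
```

Candidate poses could then be drawn preferentially where this density is high, which is one way to concentrate the estimated values on the poses a particular person tends to take.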

Using the same method as the estimated value generation unit 13c according to the first embodiment, the estimated value generation unit 23c generates a large number of estimated values est(i) (i = 1, ..., n) of the position and posture of the face in the captured image of the current frame that can be reached from the position and posture in the previous frame (the reference values). Using the same method as the fitness calculation unit 13d according to the first embodiment, the fitness calculation unit 23d, for each estimated value est(i) generated by the estimated value generation unit 23c, moves the reference three-dimensional position of each feature point held in the information holding unit 23a according to the position and posture of that estimated value, projects the moved three-dimensional positions onto the captured image of the camera 22, calculates the normalized correlation value of each feature point, and calculates the fitness from the normalized correlation values of the feature points. As in the first embodiment, a map for each feature point is used in calculating the normalized correlation values.

When the fitness has been calculated for all the estimated values est(i), the face presence/absence output unit 23e extracts the maximum fitness from the calculated fitness values and determines whether that maximum fitness exceeds a threshold. This threshold is used to decide whether the fitness is high enough to infer that a face is present in the captured image. When the maximum fitness is equal to or less than the threshold, the face presence/absence output unit 23e determines that no face is present in the captured image of the current frame and outputs that result. When the maximum fitness exceeds the threshold, the face presence/absence output unit 23e determines that a face is present in the captured image of the current frame and outputs that result. As alternative determination methods, a face may be determined to be present when the number of estimated values whose fitness exceeds the threshold is at least a predetermined number, or when the number of estimated values whose fitness is at least a certain ratio of the maximum fitness is at least a predetermined number.
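The three decision rules described for the face presence/absence output unit 23e can be written down directly. The sketch below is illustrative; the function names and the parameters `threshold`, `min_count`, and `ratio` are hypothetical, and `fitness` is assumed to be a non-empty list of fitness values, one per estimated value.

```python
def present_by_max(fitness, threshold):
    """Face present if the maximum fitness exceeds the threshold."""
    return max(fitness) > threshold

def present_by_count(fitness, threshold, min_count):
    """Face present if at least min_count estimated values exceed the threshold."""
    return sum(f > threshold for f in fitness) >= min_count

def present_by_ratio(fitness, ratio, min_count):
    """Face present if at least min_count estimated values reach ratio * max fitness."""
    best = max(fitness)
    return sum(f >= ratio * best for f in fitness) >= min_count
```

The counting variants require several candidates to agree, which makes the decision less sensitive to a single spuriously high fitness value.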

It is also possible to compare every fitness value with the threshold and, when a plurality of fitness values exceed the threshold, to determine that a plurality of faces are present in the captured image. In this case, either the face used to generate the reference luminance patterns held as the template is an average face, or it becomes possible to determine the presence or absence of faces that resemble the face used to generate the held reference luminance patterns.

The operation of the image processing apparatus 21 will be described with reference to FIG. 20. In particular, the face presence/absence determination process of the image processing unit 23 will be described along the flowchart of FIG. 21. FIG. 21 is a flowchart showing the flow of the face presence/absence determination process in the image processing apparatus of FIG. 20.

The camera 22 captures images continuously in time and transmits the image signal of the captured image to the image processing unit 23 at regular intervals.

The image processing unit 23 receives the image signal from the camera 22 and sequentially acquires the captured image of the current frame (S50). Based on the position and posture estimated in the previous frame, the image processing unit 23 generates estimated values est(i) (i = 1, ..., n) of the position and posture that the face can take in the current frame (S51). The image processing unit 23 then moves the held reference three-dimensional position of each feature point according to the position and posture of each estimated value est(i), and calculates the projection position (two-dimensional coordinates) of each moved three-dimensional position on the image captured by the camera 22 (S52). Further, for each feature point under the estimated value est(i), the image processing unit 23 calculates the normalized correlation value between the luminance pattern at the projection position on the captured image and the held reference luminance pattern (S53). From the normalized correlation values of the feature points, the image processing unit 23 then calculates the fitness of the estimated value est(i) (S54).

When the fitness has been calculated for all the estimated values est(i), the image processing unit 23 extracts the maximum fitness from the fitness values and determines whether that maximum fitness exceeds the threshold (that is, whether a face is present in the captured image) (S55). When the maximum fitness is determined in S55 to be equal to or less than the threshold, the image processing unit 23 determines that no face is present in the captured image of the current frame and outputs that result. When the maximum fitness is determined in S55 to exceed the threshold, the image processing unit 23 determines that a face is present in the captured image of the current frame and outputs that result.

According to the image processing apparatus 21, a large number of estimated values are generated and, for each estimated value, a fitness is obtained that evaluates both the likelihood of each point being the corresponding feature point and the overall arrangement of the feature points, so that the presence or absence of a face can be determined from the captured image with high accuracy. In addition, the image processing apparatus 21 holds only a small amount of data, namely the reference luminance pattern and the reference three-dimensional position of each feature point, and processes only these feature points rather than the entire image, so the processing load is reduced and the processing time is short. Furthermore, with respect to the generation of estimated values, the evaluation by fitness, and the use of maps, the image processing apparatus 21 has the same effects as the image processing apparatus 11 of the first embodiment.

A third embodiment will be described with reference to FIGS. 22 to 29. FIG. 22 is a configuration diagram of an image processing apparatus for eyeball center position estimation processing according to the third embodiment. FIG. 23 is a configuration diagram of an image processing apparatus for eyeball posture estimation processing according to the third embodiment. FIG. 24 is a diagram showing the structure of the eyeball. FIG. 25 is a diagram showing an eyeball model. FIG. 26 is a diagram showing the relationship between the eyeball and the camera. FIG. 27 is an example of an image of the eye captured by the camera. FIG. 28 is an explanatory diagram of the projection of points within the dark part of the eye (iris and pupil) onto a two-dimensional image according to an estimated value of the rotation of the eyeball. FIG. 29 is a diagram showing the relationship between the camera coordinate system and the eyeball coordinate system.

第３の実施の形態では、画像処理装置３１と画像処理装置４１が構成される。画像処理装置３１は、眼球中心位置推定処理用の画像処理装置である。画像処理装置４１は、眼球姿勢推定処理用の画像処理装置である。 In the third embodiment, an image processing device 31 and an image processing device 41 are configured. The image processing device 31 is an image processing device for eyeball center position estimation processing. The image processing device 41 is an image processing device for eyeball posture estimation processing.

画像処理装置３１の構成について説明する。画像処理装置３１では、画像処理装置４１で処理を行う前に、画像処理装置４１で保持する情報として眼球の中心位置とその眼球中心位置の場合の黒目内の各点の三次元位置を求める。その際、画像処理装置３１では、眼球中心位置の推定値を多数設定し、その各推定値が撮像画像における眼球中心位置に対して適合している度合いを算出し、その適合度に基づいて眼球中心位置を推定する。そのために、画像処理装置３１は、カメラ３２、画像処理部３３を備えている。画像処理部３３は、コンピュータ上で眼球中心位置推定処理用のアプリケーションプログラムを実行することによって情報保持部３３ａ、推定値生成部３３ｂ、適合度算出部３３ｃ、眼球中心位置出力部３３ｄが構成される。 The configuration of the image processing device 31 will be described. Before the image processing device 41 performs its processing, the image processing device 31 obtains, as information to be held by the image processing device 41, the center position of the eyeball and the three-dimensional position of each point in the black eye for that eyeball center position. To do so, the image processing device 31 sets a large number of estimated values of the eyeball center position, calculates the degree to which each estimated value fits the eyeball center position in the captured image, and estimates the eyeball center position based on that fitness. For this purpose, the image processing device 31 includes a camera 32 and an image processing unit 33. In the image processing unit 33, an information holding unit 33a, an estimated value generation unit 33b, a fitness calculation unit 33c, and an eyeball center position output unit 33d are configured by executing an application program for eyeball center position estimation processing on a computer.

なお、第３の実施の形態の画像処理装置３１では、カメラ３２が特許請求の範囲に記載する撮像手段に相当し、情報保持部３３ａが特許請求の範囲に記載する情報保持手段に相当し、推定値生成部３３ｂが特許請求の範囲に記載する推定値生成手段に相当し、適合度算出部３３ｃが特許請求の範囲に記載する適合度算出手段に相当し、眼球中心位置出力部３３ｄが特許請求の範囲に記載する判断手段に相当する。 In the image processing device 31 according to the third embodiment, the camera 32 corresponds to the imaging means described in the claims, the information holding unit 33a corresponds to the information holding means described in the claims, the estimated value generation unit 33b corresponds to the estimated value generation means described in the claims, the fitness calculation unit 33c corresponds to the fitness calculation means described in the claims, and the eyeball center position output unit 33d corresponds to the determination means described in the claims.

カメラ３２は第１の実施の形態に係る第１カメラ２と同様のカメラであり、その説明を省略する。ここでの撮像では、認識対象の人に、カメラ３２のレンズ中心を覗き込むように向いてもらう。したがって、カメラ３２では、その人のレンズを覗き込む目を撮像し、その画像信号を画像処理部３３に送信する。 The camera 32 is the same camera as the first camera 2 according to the first embodiment, and a description thereof is omitted. For the imaging here, the person to be recognized is asked to look into the center of the lens of the camera 32. The camera 32 therefore captures an image of the person's eye looking into the lens and transmits the image signal to the image processing unit 33.

情報保持部３３ａは、所定のメモリ領域に構成され、眼球モデルを保持する。眼球モデルの構築方法について説明する。まず、図２４に示すような眼球構造を仮定する。この眼球構造では、眼球の左右方向の軸をＸ軸とし、眼球の上下方向の軸をＹ軸とし、眼球の奥行き方向の軸をＺ軸とする。また、Ｐが眼球中心であり、Ｒが眼球半径であり、ｒが虹彩半径である。眼球半径Ｒ、虹彩半径ｒについては、標準値を用い、固定である。また、眼球中心Ｐと虹彩（黒目領域）の輪郭とを結ぶ線分と眼球中心Ｐと虹彩中心とを結ぶ線分とが作る立体角をθとする。また、この立体角θを近似するＸ軸周りの回転角をｄｅｇ＿ｘとし、Ｙ軸周りの回転角をｄｅｇ＿ｙとする。回転角ｄｅｇ＿ｘ及び回転角ｄｅｇ＿ｙと立体角θとの関係は式（１７）で表される。また、立体角θは、例えば、標準的な眼球半径Ｒと虹彩半径ｒを使うと、式（１８）で表される。 The information holding unit 33a is configured in a predetermined memory area and holds an eyeball model. A method for constructing the eyeball model will be described. First, an eyeball structure as shown in FIG. 24 is assumed. In this eyeball structure, the left-right axis of the eyeball is taken as the X axis, the up-down direction axis of the eyeball is taken as the Y-axis, and the depth direction axis of the eyeball is taken as the Z-axis. P is the center of the eyeball, R is the eyeball radius, and r is the iris radius. The eyeball radius R and iris radius r are fixed using standard values. A solid angle formed by a line segment connecting the eyeball center P and the outline of the iris (black-eye region) and a line segment connecting the eyeball center P and the iris center is denoted by θ. A rotation angle around the X axis that approximates the solid angle θ is deg_x, and a rotation angle around the Y axis is deg_y. The relationship between the rotation angle deg_x, the rotation angle deg_y, and the solid angle θ is expressed by Expression (17). Further, the solid angle θ is expressed by Expression (18), for example, when a standard eyeball radius R and iris radius r are used.
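
Expressions (17) and (18) are not reproduced in this text. As a minimal sketch only, the following assumes that the iris contour lies on the eyeball sphere, so that sin θ = r / R, and that the relationship between deg_x, deg_y, and θ can be read as a small-angle disc constraint. Both forms, the function names, and the radii are assumptions for illustration and are not taken from the source.

```python
import math


def iris_half_angle_deg(eyeball_radius: float, iris_radius: float) -> float:
    """Angle theta (degrees) at the eyeball center P between the iris center
    and the iris contour, ASSUMING the contour lies on the eyeball sphere,
    so that sin(theta) = r / R."""
    if not 0.0 < iris_radius <= eyeball_radius:
        raise ValueError("expected 0 < r <= R")
    return math.degrees(math.asin(iris_radius / eyeball_radius))


def is_inside_iris(deg_x: float, deg_y: float, theta_deg: float) -> bool:
    """ASSUMED small-angle reading of the deg_x / deg_y constraint:
    a point belongs to the iris when sqrt(deg_x^2 + deg_y^2) <= theta."""
    return math.hypot(deg_x, deg_y) <= theta_deg


# Illustrative radii in millimetres (hypothetical, not values from the source).
theta = iris_half_angle_deg(eyeball_radius=12.0, iris_radius=6.0)
```

Because R and r are fixed standard values, θ would need to be computed only once under this reading.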

ここで、図２４に示す眼球構造を、図２５に示すような眼球中心Ｐ、眼球半径Ｒ及び虹彩半径ｒからなる簡易の眼球モデルに近似し、眼球モデルを構築する。情報保持部３３ａでは、この眼球モデルが保持されている。 Here, the eyeball structure shown in FIG. 24 is approximated to a simple eyeball model including the eyeball center P, the eyeball radius R, and the iris radius r as shown in FIG. 25 to construct an eyeball model. The eyeball model is held in the information holding unit 33a.

推定値生成部３３ｂでは、カメラ３２の光学中心Ｏと眼球中心Ｐとを結ぶ線分と眼球モデルの視線とが一致するように眼球を回転させながら眼球モデルが眼球中心位置をカメラ３２の前で変化させたときに取りえる眼球中心位置の推定値を多数個生成する（図２６参照）。ここでは、例えば、顔の位置を検出する従来技術を利用し、顔を撮像画像上から検出し、そのサイズで顔を検出できるのは平均的な顔であるとすると、どの程度の位置に顔がいるということが予測できる。このような顔の位置の予測値が使用可能な場合、撮像画像上での顔の位置と平均的な顔の位置の関係及び顔の位置に対する眼球中心位置の相対的な位置関係を利用することにより、おおよその眼球中心位置を推定できるので、その位置付近に推定値を生成する。 The estimated value generation unit 33b generates a large number of estimated values of the eyeball center position that the eyeball model can take when its eyeball center position is varied in front of the camera 32 while the eyeball is rotated so that the line of sight of the eyeball model coincides with the line segment connecting the optical center O of the camera 32 and the eyeball center P (see FIG. 26). Here, for example, if a conventional technique for detecting the position of a face is used to detect the face in the captured image, and the face detected at that size is assumed to be an average face, the approximate position of the face can be predicted. When such a predicted face position is available, the approximate eyeball center position can be estimated from the relationship between the face position on the captured image and the average face position and from the relative position of the eyeball center with respect to the face position, so the estimated values are generated near that position.

適合度算出部３３ｃでは、まず、各推定値の眼球中心位置の場合に、撮像画像上で黒目内の各点がとる二次元位置を算出する。適合度算出部３３ｃでは、眼球中心位置Ｐと視線方向（眼球中心から黒目中心に向かうベクトルＱ）が決まると、上記の式（１７）、式（１８）から、眼球モデルにおける多数の黒目内の点の三次元位置を式（１９）によりそれぞれ算出する。 The fitness calculation unit 33c first calculates, for the eyeball center position of each estimated value, the two-dimensional positions that the points in the black eye take on the captured image. Once the eyeball center position P and the line-of-sight direction (the vector Q from the eyeball center to the center of the black eye) are determined, the fitness calculation unit 33c calculates the three-dimensional positions of a large number of points in the black eye of the eyeball model by Expression (19), using Expressions (17) and (18) above.

式（１９）では、Ｐが眼球中心位置の推定値であり、Ｑが眼球中心から黒目中心に向かうベクトル（三次元座標）であり、Ｑ’が黒目内の点の三次元座標である。ここでは、後段での処理を簡単にするために、カメラ座標系でのＰ及びＱが得られたとし、Ｑが眼球モデルから求められる。Ｒ_ｙは、Ｙ軸周りの回転行列であり、式（２０）で表される。Ｒ_ｘは、Ｘ軸周りの回転行列であり、式（２１）で表される。この場合、黒目内の各点は、Ｘ軸周りの回転角ｄｅｇ＿ｘ及びＹ軸周りの回転角ｄｅｇ＿ｙと立体角θとは式（２２）の関係を満たすことになる。 In Equation (19), P is an estimated value of the center position of the eyeball, Q is a vector (three-dimensional coordinate) from the center of the eyeball to the center of the black eye, and Q ′ is the three-dimensional coordinate of the point in the black eye. Here, in order to simplify the subsequent processing, it is assumed that P and Q in the camera coordinate system are obtained, and Q is obtained from the eyeball model. R _y is a rotation matrix around the Y axis, and is represented by Expression (20). R _x is a rotation matrix around the X axis, and is represented by Expression (21). In this case, for each point in the black eye, the rotation angle deg_x around the X axis, the rotation angle deg_y around the Y axis, and the solid angle θ satisfy the relationship of Expression (22).

さらに、適合度算出部３３ｃでは、式（２３）により、黒目内の各点の三次元座標Ｑ’の撮像画像上への投影位置をそれぞれ算出する。この多数の投影位置で形成される領域が、眼球モデルを推定値の眼球中心位置で移動させたときの眼球モデル上の虹彩（黒目）の撮像画像上での領域となる。 Furthermore, the fitness level calculation unit 33c calculates the projection position on the captured image of the three-dimensional coordinates Q 'of each point in the black eye according to Expression (23). A region formed by the multiple projection positions is a region on a captured image of the iris (black eye) on the eyeball model when the eyeball model is moved at the estimated eyeball center position.

ｓがスカラーであり、Ａがカメラ３２の内部行列であり、（ｕ，ｖ）が黒目内の点の撮像画像上の投影位置（二次元座標）であり、（Ｘ’，Ｙ’，Ｚ’）が黒目内の点の三次元位置（三次元座標）Ｑ’である。ここでは、ｄｅｇ＿ｘ、ｄｅｇ＿ｙの値を様々な組み合わせで設定することによって、黒目内の多数の点を算出している。ｄｅｇ＿ｘ，ｄｅｇ＿ｙの刻み幅は、撮像画像上に投影したときに黒目内の点群が隙間だらけにならないように、例えば、数ｄｅｇ程度で行う。 In Expression (23), s is a scalar, A is the internal matrix of the camera 32, (u, v) is the projection position (two-dimensional coordinates) on the captured image of a point in the black eye, and (X′, Y′, Z′) is the three-dimensional position (three-dimensional coordinates) Q′ of the point in the black eye. A large number of points in the black eye are calculated by setting the values of deg_x and deg_y in various combinations. The step size of deg_x and deg_y is set to, for example, about several degrees so that the point group in the black eye does not become full of gaps when projected onto the captured image.
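
The point generation and projection described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the expressions of the source: it assumes Expression (19) has the form Q′ = P + R_y R_x Q, that the constraint on deg_x and deg_y is a disc of radius θ, and that the internal matrix A is a skew-free pinhole matrix with focal lengths fx, fy and principal point (cx, cy). The camera is assumed to look along +Z. All function and parameter names are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def rot_x(deg: float) -> Mat3:
    """Rotation matrix about the X axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return ((1.0, 0.0, 0.0), (0.0, c, -s), (0.0, s, c))


def rot_y(deg: float) -> Mat3:
    """Rotation matrix about the Y axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def iris_points(center: Vec3, gaze: Vec3, theta_deg: float,
                step_deg: float = 2.0) -> List[Vec3]:
    """3-D points in the black eye for eyeball center P and vector Q
    (eyeball center to iris center), ASSUMING Q' = P + R_y R_x Q."""
    n = int(theta_deg // step_deg)
    points = []
    for ix in range(-n, n + 1):
        for iy in range(-n, n + 1):
            deg_x, deg_y = ix * step_deg, iy * step_deg
            if math.hypot(deg_x, deg_y) > theta_deg:  # assumed disc constraint
                continue
            q = mat_vec(rot_y(deg_y), mat_vec(rot_x(deg_x), gaze))
            points.append(tuple(p + d for p, d in zip(center, q)))
    return points


def project(point: Vec3, fx: float, fy: float,
            cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection s*(u, v, 1)^T = A*(X', Y', Z')^T with the ASSUMED
    skew-free A = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]; here s equals Z'."""
    x, y, z = point
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)
```

For example, `[project(p, 800.0, 800.0, 320.0, 240.0) for p in iris_points((0.0, 0.0, 500.0), (0.0, 0.0, -12.0), 30.0)]` would give the image-plane footprint of the iris for one estimated eyeball center, with all numeric values chosen only for illustration.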

適合度算出部３３ｃでは、推定値毎に、カメラ３２の撮像画像から黒目内の各点の投影位置の輝度値をそれぞれ取得する。そして、適合度算出部３３ｃでは、推定値毎に、全ての投影位置での輝度値の平均値を算出し、この平均値を適合度とする。なお、適合度算出部３３ｃでも、第１の実施の形態と同様に、マップを設定し、処理負荷を軽減する。 The suitability calculation unit 33c acquires the brightness value of the projection position of each point in the black eye from the captured image of the camera 32 for each estimated value. Then, the fitness level calculation unit 33c calculates an average value of luminance values at all projection positions for each estimated value, and uses this average value as the fitness level. Note that, also in the fitness calculation unit 33c, a map is set and the processing load is reduced as in the first embodiment.

このように、各推定値の眼球中心位置を与えることにより、その眼球中心位置と光学中心とを結ぶ線分に視線が一致するように眼球モデルを回転し、眼球モデル上の虹彩にあたる円盤（図２５の斜線領域）が撮像画像上でとる位置が決まり、その円盤内部の点が黒目内の点に相当する。つまり、算出される輝度値の平均値は、黒目らしさの指標となる。この指標を適合度として捉えることにより、黒目は、目の周辺の中でもとりわけ黒いので、その平均値は小さな値になることが予測される。したがって、この指標では、最小値をとるものが最も確からしいと考えることができる。 In this way, giving the eyeball center position of each estimated value causes the eyeball model to be rotated so that its line of sight coincides with the line segment connecting that eyeball center position and the optical center, which determines the position on the captured image of the disc corresponding to the iris on the eyeball model (the hatched region in FIG. 25); the points inside the disc correspond to points in the black eye. In other words, the calculated average luminance value is an index of how much the region resembles a black eye. When this index is taken as the fitness, the average value is expected to be small, because the black eye is especially dark compared with its surroundings. Therefore, the estimated value whose index is smallest can be considered the most probable.

そこで、眼球中心位置出力部３３ｄでは、全ての推定値について適合度の算出が終了すると、算出した適合度の中から最小値の適合度を抽出する。そして、眼球中心位置出力部３３ｄでは、その最小の適合度を持つ推定値を眼球中心位置と推定し、その推定した眼球中心位置を出力する。また、眼球中心位置出力部３３ｄでは、その推定した眼球中心位置の場合の黒目内の各点の三次元座標も出力する。 Therefore, the eyeball center position output unit 33d extracts the fitness value of the minimum value from the calculated fitness values when calculation of the fitness values for all estimated values is completed. Then, the eyeball center position output unit 33d estimates the estimated value having the minimum fitness as the eyeball center position, and outputs the estimated eyeball center position. The eyeball center position output unit 33d also outputs the three-dimensional coordinates of each point in the black eye in the case of the estimated eyeball center position.
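
The fitness calculation and the selection of the minimum can be sketched as below. The image is represented as a plain list of rows of luminance values, and `footprint` is a hypothetical stand-in for the projection of the points in the black eye for a given estimated value. How projections that fall outside the image are treated is not stated in the source; skipping them is an assumption of this sketch.

```python
from typing import Callable, Iterable, Sequence, Tuple

Pixel = Tuple[float, float]


def mean_luminance(image: Sequence[Sequence[float]],
                   pixels: Iterable[Pixel]) -> float:
    """Fitness of one estimated value: the average luminance at the projected
    positions (u, v). Positions outside the image are skipped (assumption)."""
    height, width = len(image), len(image[0])
    values = []
    for u, v in pixels:
        col, row = int(round(u)), int(round(v))
        if 0 <= col < width and 0 <= row < height:
            values.append(image[row][col])
    return sum(values) / len(values) if values else float("inf")


def select_darkest(estimates, footprint: Callable, image):
    """Return the estimated value whose fitness (mean luminance) is smallest."""
    return min(estimates, key=lambda e: mean_luminance(image, footprint(e)))


# Toy 5x5 image: bright background (200) with a dark 2x2 block (20).
image = [[200] * 5 for _ in range(5)]
for row in (1, 2):
    for col in (2, 3):
        image[row][col] = 20


def footprint(estimate):
    """Hypothetical 2x2 footprint whose top-left pixel is the estimate."""
    u, v = estimate
    return [(u, v), (u + 1, v), (u, v + 1), (u + 1, v + 1)]


best = select_darkest([(0, 0), (2, 1), (3, 3)], footprint, image)
```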

画像処理装置４１の構成について説明する。画像処理装置４１では、眼球モデルについてのテンプレートを保持し、そのテンプレートを利用して撮像画像上の眼球の姿勢（視線方向）を推定する。ここでは、顔が動かないと仮定し、画像処理装置３１で求めた眼球中心位置Ｐが固定であるとしているので、眼球モデルに生じる変化はカメラ座標系に対する回転運動であり、この回転運動を眼球姿勢として推定する。その際、画像処理装置４１では、眼球姿勢の推定値を多数設定し、その各推定値が撮像画像における眼球姿勢に対して適合している度合いを算出し、その適合度に基づいて眼球姿勢を推定する。そのために、画像処理装置４１は、カメラ４２、画像処理部４３を備えている。画像処理部４３は、コンピュータ上で眼球姿勢推定処理用のアプリケーションプログラムを実行することによって情報保持部４３ａ、記憶解析部４３ｂ、推定値生成部４３ｃ、適合度算出部４３ｄ、眼球姿勢出力部４３ｅが構成される。 The configuration of the image processing device 41 will be described. The image processing device 41 holds a template for the eyeball model and uses that template to estimate the posture (line-of-sight direction) of the eyeball in the captured image. Here, the face is assumed not to move and the eyeball center position P obtained by the image processing device 31 is treated as fixed, so the only change that occurs in the eyeball model is a rotational motion with respect to the camera coordinate system, and this rotational motion is estimated as the eyeball posture. To do so, the image processing device 41 sets a large number of estimated values of the eyeball posture, calculates the degree to which each estimated value fits the eyeball posture in the captured image, and estimates the eyeball posture based on that fitness. For this purpose, the image processing device 41 includes a camera 42 and an image processing unit 43. In the image processing unit 43, an information holding unit 43a, a storage analysis unit 43b, an estimated value generation unit 43c, a fitness calculation unit 43d, and an eyeball posture output unit 43e are configured by executing an application program for eyeball posture estimation processing on a computer.

なお、第３の実施の形態の画像処理装置４１では、カメラ４２が特許請求の範囲に記載する撮像手段に相当し、情報保持部４３ａが特許請求の範囲に記載する情報保持手段に相当し、推定値生成部４３ｃが特許請求の範囲に記載する推定値生成手段に相当し、適合度算出部４３ｄが特許請求の範囲に記載する適合度算出手段に相当し、眼球姿勢出力部４３ｅが特許請求の範囲に記載する判断手段に相当する。 In the image processing device 41 according to the third embodiment, the camera 42 corresponds to the imaging means described in the claims, the information holding unit 43a corresponds to the information holding means described in the claims, the estimated value generation unit 43c corresponds to the estimated value generation means described in the claims, the fitness calculation unit 43d corresponds to the fitness calculation means described in the claims, and the eyeball posture output unit 43e corresponds to the determination means described in the claims.

カメラ４２は第１の実施の形態に係るカメラ１２と同様のカメラであり、その説明を省略する。なお、画像処理装置４１が自動車などに搭載される場合、カメラ４２は、車室内において、運転席に座っている運転者の目付近を真正面から撮像できる位置に配置される。図２７には、カメラ４２で撮像されたあるフレームの撮像画像の一例を示している。 The camera 42 is the same camera as the camera 12 according to the first embodiment, and a description thereof is omitted. Note that when the image processing device 41 is mounted in an automobile or the like, the camera 42 is arranged in a position in the passenger compartment where the vicinity of the eyes of the driver sitting in the driver's seat can be imaged from the front. FIG. 27 shows an example of a captured image of a certain frame captured by the camera 42.

情報保持部４３ａは、所定のメモリ領域に構成され、画像処理装置３１で保持した同様の眼球モデル及び画像処理装置３１から出力された眼球中心位置Ｐとその眼球中心位置Ｐの場合の眼球モデルの黒目内の各点のカメラ座標系における三次元位置を保持する。 The information holding unit 43a is configured in a predetermined memory area and holds the same eyeball model as that held by the image processing device 31, together with the eyeball center position P output from the image processing device 31 and the three-dimensional position in the camera coordinate system of each point in the black eye of the eyeball model for that eyeball center position P.

記憶解析部４３ｂは、所定のメモリ領域に構成され、推定値生成部４３ｃで生成される眼球の姿勢の推定値と適合度算出部４３ｄで算出される各推定値に対する適合度を対応付けて記憶するとともに、眼球姿勢出力部４３ｅから出力される撮像画像における眼球の姿勢を記憶する。そして、記憶解析部４３ｂでは、認識対象の人が過去にとった眼球の姿勢を履歴として蓄積する。さらに、記憶解析部４３ｂでは、この過去の履歴から取りえる眼球の姿勢を洗い出す。また、記憶解析部４３ｂでは、過去の履歴から頻度的に多くとりえる眼球の姿勢付近をピークとする正規分布を設定し、その正規分布を複数重ねて確率密度関数を生成する。なお、記憶解析部４３ｂでは、眼球姿勢出力部４３ｅから出力される眼球の姿勢を用いて履歴を蓄積してもよいし、あるいは、多数の推定値とその推定値の適合度の中から適合度が閾値より大きい推定値を用いて履歴を蓄積してもよい。 The storage analysis unit 43b is configured in a predetermined memory area and stores the estimated value of the posture of the eyeball generated by the estimated value generation unit 43c and the fitness for each estimated value calculated by the fitness calculation unit 43d in association with each other. At the same time, the posture of the eyeball in the captured image output from the eyeball posture output unit 43e is stored. In the memory analysis unit 43b, the posture of the eyeball taken by the person to be recognized in the past is accumulated as a history. Further, the memory analysis unit 43b identifies the eyeball postures that can be taken from the past history. In addition, the memory analysis unit 43b sets a normal distribution having a peak in the vicinity of the posture of the eyeball that can be frequently obtained from the past history, and generates a probability density function by superimposing a plurality of normal distributions. Note that the memory analysis unit 43b may accumulate the history using the eyeball posture output from the eyeball posture output unit 43e, or the fitness level may be selected from a large number of estimated values and the fitness levels of the estimated values. The history may be accumulated using an estimated value that is greater than the threshold.
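
One way to picture the probability density function built from the history is a weighted sum of two-dimensional normal distributions, one per frequently observed posture. The sketch below assumes isotropic distributions with a common standard deviation and weights proportional to observation counts; neither detail is specified in the source, and the names are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Pose = Tuple[float, float]  # (deg_x, deg_y)


def build_pose_pdf(peaks: Sequence[Tuple[Pose, float]],
                   sigma_deg: float) -> Callable[[float, float], float]:
    """Superimpose one isotropic 2-D normal distribution per frequent posture.
    `peaks` pairs each posture with its observation count; counts are
    normalized so that the mixture integrates to 1."""
    total = float(sum(count for _, count in peaks))
    norm = 1.0 / (2.0 * math.pi * sigma_deg ** 2)

    def pdf(deg_x: float, deg_y: float) -> float:
        density = 0.0
        for (mx, my), count in peaks:
            d2 = (deg_x - mx) ** 2 + (deg_y - my) ** 2
            density += (count / total) * norm * math.exp(-d2 / (2.0 * sigma_deg ** 2))
        return density

    return pdf


# Hypothetical history: looking straight ahead three times as often as
# glancing toward (20, -5) degrees.
pose_pdf = build_pose_pdf([((0.0, 0.0), 3.0), ((20.0, -5.0), 1.0)], sigma_deg=2.0)
```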

推定値生成部４３ｃでは、情報保持部４３ａに保持されている眼球モデルに基づいて、前フレームでの眼球の姿勢から取りえる現フレームの撮像画像における眼球の姿勢（回転角ｄｅｇ＿ｘ，ｄｅｇ＿ｙ）の推定値を多数個生成する。取りえる姿勢の範囲は、姿勢の２つのパラメータ毎に、フレーム間の時間内でそれぞれ変化できる各最大値を前フレームの姿勢の２つのパラメータに加算及び減算した範囲となる。ただし、この範囲が認識対象の眼球が構造上取りえない範囲を含んでいる場合、構造的に取りえない範囲を除いた範囲とする。つまり、回転可能な眼球の姿勢の範囲は物理的に決まっているので、その範囲を超えて推定値は生成されることはない。フレーム間の時間内でそれぞれ変化できる最大値としては、その眼球がおかれる環境における眼球の姿勢を予め測定し、その測定から得られた変化の最大値から予め設定してもよいし、あるいは、前フレームと前々フレームとの間での姿勢の変化に基づいて設定してもよい。 Based on the eyeball model held in the information holding unit 43a, the estimated value generation unit 43c generates a large number of estimated values of the eyeball posture (rotation angles deg_x, deg_y) in the captured image of the current frame that are reachable from the eyeball posture in the previous frame. For each of the two posture parameters, the range of reachable postures is obtained by adding to and subtracting from the previous-frame value the maximum amount by which that parameter can change within the time between frames. However, when this range includes postures that the eyeball to be recognized cannot structurally take, those postures are excluded. In other words, since the range over which the eyeball can rotate is physically determined, no estimated value is generated beyond that range. The maximum change within the time between frames may be set in advance from the maximum change observed by measuring the eyeball posture beforehand in the environment in which the eyeball is placed, or may be set based on the change in posture between the previous frame and the frame before it.

このように設定した取りえる姿勢の２つのパラメータの範囲において、各パラメータが実際にどのような値を取るかは同様に確からしい。そこで、推定値生成部４３ｃでは、姿勢の２つのパラメータ毎にそれぞれの最小値と最大値の範囲の値をとる合計二次元平面内の一様分布からランダムにｎ回取り出し、その取り出した値を姿勢の推定値とする。 Within the range of the two posture parameters set in this way, every value that each parameter may actually take is equally probable. Therefore, the estimated value generation unit 43c draws n random samples from a uniform distribution over the two-dimensional plane spanned by the minimum-to-maximum range of each of the two posture parameters, and uses the sampled values as the posture estimates.
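
The range construction and uniform sampling described above can be sketched as follows. The maximum inter-frame change and the structural limits of the eyeball are passed in as parameters, since the source gives no numeric values; the names and the optional random generator argument are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Pose = Tuple[float, ...]  # (deg_x, deg_y)


def sample_pose_estimates(prev: Sequence[float],
                          max_delta: Sequence[float],
                          limits: Sequence[Tuple[float, float]],
                          n: int,
                          rng: Optional[random.Random] = None) -> List[Pose]:
    """Draw n posture estimates uniformly from the reachable range.

    For each parameter the range is previous value +/- maximum inter-frame
    change, clipped to the structural limits of the eyeball."""
    rng = rng or random.Random()
    ranges = [(max(p - d, lo), min(p + d, hi))
              for p, d, (lo, hi) in zip(prev, max_delta, limits)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in ranges) for _ in range(n)]


# Hypothetical numbers: previous posture (40, 0) degrees, at most 10 degrees
# of change per frame, structural limits of +/-45 and +/-30 degrees.
estimates = sample_pose_estimates(prev=(40.0, 0.0), max_delta=(10.0, 10.0),
                                  limits=((-45.0, 45.0), (-30.0, 30.0)),
                                  n=100, rng=random.Random(0))
```

In this example the deg_x range is clipped from [30, 50] to [30, 45], so no estimate exceeds the structural limit.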

なお、推定値の生成手法としては、記憶解析部４３ｂに蓄積されている履歴から導かれた取りえる位置と姿勢の組み合わせに基づいて推定値を生成してもよいし、あるいは、記憶解析部４３ｂに蓄積されている履歴から導かれた確率密度関数に基づいて推定値を生成してもよいし、あるいは、推定値を一様分布ではなく、前フレームの姿勢付近に現フレームの姿勢がいる可能性が高いならば、前フレームの姿勢をピークとする正規分布で推定値を生成してもよい。 As a method of generating the estimated values, the estimated values may be generated based on the combinations of position and posture that can be taken, as derived from the history accumulated in the storage analysis unit 43b, or based on the probability density function derived from that history. Alternatively, if the posture in the current frame is likely to be near the posture in the previous frame, the estimated values may be generated from a normal distribution peaked at the previous-frame posture instead of from a uniform distribution.
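
The normal-distribution alternative can be sketched as below. The standard deviations and the use of rejection sampling to respect the structural limits are assumptions of this sketch; the source states only that the distribution peaks at the previous-frame posture. The previous posture is assumed to lie within the limits.

```python
import random
from typing import List, Optional, Sequence, Tuple

Pose = Tuple[float, ...]  # (deg_x, deg_y)


def sample_around_previous(prev: Sequence[float],
                           sigma: Sequence[float],
                           limits: Sequence[Tuple[float, float]],
                           n: int,
                           rng: Optional[random.Random] = None) -> List[Pose]:
    """Draw n posture estimates from independent normal distributions peaked
    at the previous-frame posture, rejecting samples outside the limits."""
    rng = rng or random.Random()
    accepted: List[Pose] = []
    while len(accepted) < n:
        candidate = tuple(rng.gauss(p, s) for p, s in zip(prev, sigma))
        if all(lo <= c <= hi for c, (lo, hi) in zip(candidate, limits)):
            accepted.append(candidate)
    return accepted


# Hypothetical numbers: 3-degree standard deviation around (5, -2) degrees.
near_previous = sample_around_previous(prev=(5.0, -2.0), sigma=(3.0, 3.0),
                                       limits=((-45.0, 45.0), (-30.0, 30.0)),
                                       n=50, rng=random.Random(1))
```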

適合度算出部４３ｄでは、推定値生成部４３ｃで生成した推定値毎に、各推定値の姿勢の２つのパラメータを用いて、撮像画像上で黒目内の各点がとる二次元位置を算出する。ここでは、情報保持部４３ａに保持されている固定の眼球中心位置Ｐと推定値の各回転角ｄｅｇ＿ｘ，ｄｅｇ＿ｙを用いて二次元位置を算出する。図２９に示すように、眼球に設定した仮想的な眼球座標系ＥＣを考え、その眼球の姿勢の方向ＥＤとカメラ４２の光軸の方向ＣＤとが真逆の方向としかつ眼球座標系ＥＣのＸ軸とカメラ座標系ＣＣのＸ軸とが真逆の方向とし、この状態を基準姿勢とする。回転角ｄｅｇ＿ｘ，ｄｅｇ＿ｙは、この基準姿勢からの眼球の回転角である。 For each estimated value generated by the estimated value generating unit 43c, the fitness calculating unit 43d calculates a two-dimensional position taken by each point in the black eye on the captured image using two parameters of the posture of each estimated value. . Here, the two-dimensional position is calculated using the fixed eyeball center position P held in the information holding unit 43a and the estimated rotation angles deg_x and deg_y. As shown in FIG. 29, a virtual eyeball coordinate system EC set for the eyeball is considered, and the eyeball orientation direction ED and the optical axis direction CD of the camera 42 are opposite to each other and the eyeball coordinate system EC is The X axis and the X axis of the camera coordinate system CC are in opposite directions, and this state is defined as a reference posture. The rotation angles deg_x and deg_y are the rotation angles of the eyeball from this reference posture.

ここで、情報保持部４３ａで保持している眼球モデルと眼球モデルの黒目内の各点のカメラ座標系における三次元位置を使用する。このとき、眼球モデルの中心を画像処理装置３１で求めた眼球中心位置Ｐ（固定）とし、眼球中心位置Ｐが眼球座標系ＥＣの原点となる（図２９参照）。この状態で、推定値の回転角ｄｅｇ＿ｘ，ｄｅｇ＿ｙを用いて、眼球モデルを回転させる。適合度算出部４３ｄでは、式（２４）により、回転前の黒目内の各点のカメラ座標系における三次元位置をＱとし、回転後の黒目内の各点の三次元位置Ｑ’をそれぞれ算出する。 Here, the eyeball model held in the information holding unit 43a and the three-dimensional position in the camera coordinate system of each point in the black eye of the eyeball model are used. The center of the eyeball model is set to the eyeball center position P (fixed) obtained by the image processing device 31, and the eyeball center position P is the origin of the eyeball coordinate system EC (see FIG. 29). In this state, the eyeball model is rotated using the estimated rotation angles deg_x and deg_y. Using Expression (24), the fitness calculation unit 43d takes the three-dimensional position in the camera coordinate system of each point in the black eye before rotation as Q and calculates the three-dimensional position Q′ of each point in the black eye after rotation.

Ｒ_ｙは、Ｙ軸周りの回転行列であり、式（２５）で表され、推定値の回転角ｄｅｇ＿ｙが使用される。Ｒ_ｘは、Ｘ軸周りの回転行列であり、式（２６）で表され、推定値の回転角ｄｅｇ＿ｘが使用される。この場合、黒目内の各点は、回転角ｄｅｇ＿ｘ及び回転角ｄｅｇ＿ｙと立体角θとは式（２７）の関係を満たすことになる。 R _y is a rotation matrix around the Y axis and is expressed by Expression (25), and the rotation angle deg_y of the estimated value is used. R _x is a rotation matrix around the X axis, and is represented by Expression (26), and an estimated rotation angle deg_x is used. In this case, for each point in the black eye, the rotation angle deg_x, the rotation angle deg_y, and the solid angle θ satisfy the relationship of Expression (27).
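
Expression (24) is not reproduced in this text. As a sketch only, the following assumes that it rotates each stored point about the fixed eyeball center P, that is, Q′ = P + R_y R_x (Q − P). It also applies the rotations about axes parallel to the camera axes, ignoring the axis reversal between the eyeball and camera coordinate systems in the reference posture; both are simplifying assumptions, and the function name is hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def rotate_about_center(q: Vec3, p: Vec3, deg_x: float, deg_y: float) -> Vec3:
    """Rotate point Q about the eyeball center P, first about the X axis by
    deg_x and then about the Y axis by deg_y (ASSUMED form of the rotation)."""
    # Translate so that the eyeball center becomes the origin.
    x, y, z = (qi - pi for qi, pi in zip(q, p))
    ax, ay = math.radians(deg_x), math.radians(deg_y)
    # Rotation about the X axis.
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    # Rotation about the Y axis.
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    # Translate back to camera coordinates.
    return (x + p[0], y + p[1], z + p[2])


# Hypothetical example: iris center 12 mm in front of an eyeball center that
# lies 500 mm from the camera, rotated by 90 degrees about the Y axis.
moved = rotate_about_center((0.0, 0.0, 488.0), (0.0, 0.0, 500.0), 0.0, 90.0)
```

Because the rotation is taken about P, every rotated point stays at its original distance from the eyeball center, which is consistent with a rigid eyeball model.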

そして、適合度算出部４３ｄでは、画像処理装置３１と同様に、上記の式（２３）により、黒目内の各点の三次元座標Ｑ’の撮像画像上への投影位置をそれぞれ算出する。この多数の投影位置で形成される領域が、眼球モデルを推定値の回転角ｄｅｇ＿ｘ，ｄｅｇ＿ｙで回転させたときの眼球モデル上の虹彩（黒目）の撮像画像上での領域となる。図２８には、推定値の回転角ｄｅｇ＿ｘ，ｄｅｇ＿ｙに応じて、画像上に投影された二次元座標ＩＤ，・・・・を示している。 Then, as in the image processing device 31, the fitness calculation unit 43d calculates the projection position on the captured image of the three-dimensional coordinates Q′ of each point in the black eye by Expression (23) above. The region formed by these many projection positions is the region on the captured image of the iris (black eye) on the eyeball model when the eyeball model is rotated by the estimated rotation angles deg_x and deg_y. FIG. 28 shows the two-dimensional coordinates ID, ... projected onto the image according to the estimated rotation angles deg_x and deg_y.

さらに、適合度算出部４３ｄでは、画像処理装置３１と同様に、推定値毎に、カメラ４２の撮像画像から黒目内の各点の投影位置の輝度値をそれぞれ取得し、全ての投影位置での輝度値の平均値を適合度とする。なお、適合度算出部４３ｄでも、第１の実施の形態と同様に、マップを設定し、処理負荷を軽減する。 Further, in the fitness level calculation unit 43d, as in the image processing device 31, the brightness value of the projection position of each point in the black eye is acquired from the captured image of the camera 42 for each estimated value, and at all projection positions. The average value of luminance values is taken as the fitness. In the fitness level calculation unit 43d, as in the first embodiment, a map is set and the processing load is reduced.

眼球姿勢出力部４３ｅでは、全ての推定値について適合度の算出が終了すると、算出した適合度の中から最小値の適合度を抽出する。そして、眼球姿勢出力部４３ｅでは、その最小の適合度を持つ推定値を姿勢（回転角ｄｅｇ＿ｘ，ｄｅｇ＿ｙ）と推定し、その推定した眼球姿勢を出力する。 When the eyeball posture output unit 43e finishes calculating the fitness level for all the estimated values, the fitness level of the minimum value is extracted from the calculated fitness levels. Then, the eyeball posture output unit 43e estimates the estimated value having the minimum fitness as the posture (rotation angle deg_x, deg_y), and outputs the estimated eyeball posture.

姿勢の推定方法としては、他の方法でもよく、例えば、閾値以下の適合度を持つ推定値の適合度による加重平均値によって姿勢を推定してもよいし、適合度を算出された全ての推定値の適合度による加重平均値によって姿勢を推定してもよいし、適合度の最小値の一定倍以下の値（例えば、最小値の１．１倍以下の値）をとる適合度の推定値の適合度による加重平均値によって姿勢を推定してもよいし、適合度の最小値の一定倍以下の値をとる適合度の推定値の適合度の数が所定数以上の場合にそれらの適合度の推定値の適合度による加重平均値によって姿勢を推定してもよいし、あるいは、適合度の最小値の一定倍以下の値をとる適合度の推定値の適合度の数が所定数未満の場合に適合度が最小の推定値の姿勢としてもよい。 Other methods of estimating the posture may also be used. For example, the posture may be estimated as the fitness-weighted average of the estimated values whose fitness is at or below a threshold, as the fitness-weighted average of all estimated values for which a fitness was calculated, or as the fitness-weighted average of the estimated values whose fitness is no more than a fixed multiple of the minimum fitness (for example, no more than 1.1 times the minimum value). Alternatively, when the number of estimated values whose fitness is no more than a fixed multiple of the minimum fitness is at or above a predetermined number, the posture may be estimated as the fitness-weighted average of those estimated values, and when that number is below the predetermined number, the posture of the estimated value with the minimum fitness may be used.
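
The last two alternatives can be sketched together as follows. The source does not state how a fitness for which smaller is better is converted into a weight; this sketch assumes a weight of 1 / (fitness + eps), which is one possible choice and not the method of the source. The ratio, the minimum count, and all names are hypothetical.

```python
from typing import Sequence, Tuple

Pose = Tuple[float, ...]  # (deg_x, deg_y)


def estimate_posture(estimates: Sequence[Pose],
                     fitnesses: Sequence[float],
                     ratio: float = 1.1,
                     min_count: int = 3,
                     eps: float = 1e-6) -> Pose:
    """Average the estimates whose fitness is within `ratio` times the
    minimum fitness, or fall back to the single best estimate when fewer
    than `min_count` estimates qualify."""
    best = min(fitnesses)
    chosen = [(e, f) for e, f in zip(estimates, fitnesses) if f <= ratio * best]
    if len(chosen) < min_count:
        return estimates[list(fitnesses).index(best)]
    # ASSUMED weighting: lower (better) fitness gives a larger weight.
    weights = [1.0 / (f + eps) for _, f in chosen]
    total = sum(weights)
    dims = len(chosen[0][0])
    return tuple(sum(w * e[i] for (e, _), w in zip(chosen, weights)) / total
                 for i in range(dims))


# Hypothetical candidates: three near-equal dark fits and one poor fit.
candidates = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (30.0, 30.0)]
scores = [10.0, 10.0, 10.0, 100.0]
averaged = estimate_posture(candidates, scores)
fallback = estimate_posture(candidates, scores, min_count=4)
```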

次に、図２２を参照して、画像処理装置３１の動作について説明する。特に、画像処理部３３の眼球中心位置推定処理については図３１のフローチャートに沿って説明する。図３１は、図２２の画像処理装置における眼球中心位置推定処理の流れを示すフローチャートである。 Next, the operation of the image processing device 31 will be described with reference to FIG. 22. In particular, the eyeball center position estimation processing of the image processing unit 33 will be described along the flowchart of FIG. 31. FIG. 31 is a flowchart showing the flow of the eyeball center position estimation processing in the image processing device of FIG. 22.

カメラ３２では、カメラを覗き込む目を撮像し、その撮像画像の画像信号を画像処理部３３に送信する。 The camera 32 captures an image of an eye looking into the camera and transmits an image signal of the captured image to the image processing unit 33.

画像処理部３３では、眼球中心位置Ｐ、眼球半径Ｒ、虹彩半径ｒからなる眼球モデルを設定する（Ｓ６０）。画像処理部３３では、カメラ３２から画像信号を受信し、目の撮像画像を取得する（Ｓ６１）。 The image processing unit 33 sets an eyeball model including an eyeball center position P, an eyeball radius R, and an iris radius r (S60). The image processing unit 33 receives an image signal from the camera 32 and acquires a captured image of the eye (S61).

画像処理部３３では、眼球モデルの眼球中心とカメラ３２の光学中心を結ぶ線分が眼球モデルの視線と一致するように回転する場合に、眼球中心がカメラ３２に対して取りえる位置の推定値を多数生成する（Ｓ６２）。そして、画像処理部３３では、各推定値の眼球中心位置の場合に、各回転角ｄｅｇ＿ｘ，ｄｅｇ＿ｙに応じて黒目内の各点をそれぞれ移動させ、その移動させた各三次元位置からカメラ３２で撮像した撮像画像上に投影した投影位置（二次元座標）をそれぞれ算出する（Ｓ６３）。さらに、画像処理部３３では、各推定値の眼球中心位置の場合に、撮像画像上の黒目内の各点の投影位置での輝度の平均値を算出し、その平均値を適合度とする（Ｓ６４）。 In the image processing unit 33, when the line segment connecting the eyeball center of the eyeball model and the optical center of the camera 32 rotates so as to coincide with the line of sight of the eyeball model, an estimated value of the position that the eyeball center can take with respect to the camera 32 Are generated (S62). Then, in the case of the eyeball center position of each estimated value, the image processing unit 33 moves each point in the black eye according to each rotation angle deg_x, deg_y, and the camera 32 from each moved three-dimensional position. Projection positions (two-dimensional coordinates) projected on the captured image are calculated (S63). Further, in the case of the eyeball center position of each estimated value, the image processing unit 33 calculates an average value of luminance at the projection position of each point in the black eye on the captured image, and uses the average value as the fitness ( S64).

全ての推定値の眼球中心位置についての適合度を算出すると、画像処理部３３では、推定値の眼球中心位置と各推定値に対して算出した適合度を用いて、撮像画像における眼球中心位置を推定し、その推定した眼球中心位置を出力する（Ｓ６５）。また、画像処理部３３では、その推定した眼球中心位置の場合の眼球モデルの黒目内の各点のカメラ座標系における三次元位置を出力する。 When the fitness has been calculated for the eyeball center positions of all the estimated values, the image processing unit 33 estimates the eyeball center position in the captured image using the estimated eyeball center positions and the fitness calculated for each of them, and outputs the estimated eyeball center position (S65). The image processing unit 33 also outputs the three-dimensional position in the camera coordinate system of each point in the black eye of the eyeball model for the estimated eyeball center position.

次に、図２３を参照して、画像処理装置４１の動作について説明する。特に、画像処理部４３の眼球姿勢推定処理については図３２のフローチャートに沿って説明する。図３２は、図２３の画像処理装置における眼球姿勢推定処理の流れを示すフローチャートである。 Next, the operation of the image processing device 41 will be described with reference to FIG. 23. In particular, the eyeball posture estimation processing of the image processing unit 43 will be described along the flowchart of FIG. 32. FIG. 32 is a flowchart showing the flow of the eyeball posture estimation processing in the image processing device of FIG. 23.

カメラ４２では、時間的に連続して撮像し、一定時間毎に撮像画像の画像信号を画像処理部４３に送信する。 The camera 42 captures images continuously in time and transmits an image signal of the captured image to the image processing unit 43 at regular time intervals.

画像処理部４３では、眼球中心位置Ｐ、眼球半径Ｒ、虹彩半径ｒからなる眼球モデルを設定する（Ｓ７０）。この眼球モデルは、眼球中心位置Ｐが画像処理装置３１で推定した眼球中心位置で固定され、その眼球中心位置Ｐの場合の黒目内の各点の基準となる三次元位置が画像処理装置３１で算出された値である。画像処理部４３では、カメラ４２から画像信号を受信し、現フレームの撮像画像を取得する（Ｓ７１）。 The image processing unit 43 sets an eyeball model including the eyeball center position P, the eyeball radius R, and the iris radius r (S70). In this eyeball model, the eyeball center position P is fixed at the eyeball center position estimated by the image processing device 31, and the three-dimensional position serving as a reference for each point in the black eye in the case of the eyeball center position P is the image processing device 31. This is a calculated value. The image processing unit 43 receives an image signal from the camera 42 and acquires a captured image of the current frame (S71).

画像処理部４３では、前フレームで推定した眼球姿勢に基づいて、眼球がカメラ座標系に対して取りえる回転の推定値を多数生成する（Ｓ７２）。そして、画像処理部４３では、各推定値の回転角ｄｅｇ＿ｘ，ｄｅｇ＿ｙに応じて、保持している黒目内の各点の三次元位置をそれぞれ移動させ、その移動させた各三次元位置からカメラ４２で撮像した撮像画像上に投影した投影位置（二次元座標）をそれぞれ算出する（Ｓ７３）。さらに、画像処理部４３では、各推定値の回転角ｄｅｇ＿ｘ，ｄｅｇ＿ｙの場合に、撮像画像上の黒目内の各点の投影位置での輝度の平均値を算出し、その平均値を適合度とする（Ｓ７４）。 Based on the eyeball posture estimated in the previous frame, the image processing unit 43 generates a large number of estimated values of the rotation that the eyeball can take with respect to the camera coordinate system (S72). The image processing unit 43 then moves the held three-dimensional position of each point in the black eye according to the rotation angles deg_x and deg_y of each estimated value, and calculates the projection position (two-dimensional coordinates) of each moved three-dimensional position on the image captured by the camera 42 (S73). Further, for the rotation angles deg_x and deg_y of each estimated value, the image processing unit 43 calculates the average luminance at the projection positions of the points in the black eye on the captured image, and uses that average value as the fitness (S74).

Once the fitness has been calculated for the rotation angles deg_x and deg_y of all estimated values, the image processing unit 43 estimates the eyeball orientation in the captured image using the rotation angles of the estimated values and the fitness calculated for each, and outputs the estimated eyeball orientation (S75).
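One possible reading of step S75 is sketched below. The text states only that the rotation angles and their fitness values are used together, so the selection rule here is an assumption: with `top_k=1` the hypothesis with the lowest mean luminance (the darkest, most iris-like) is returned, and with a larger `top_k` the angles of the `top_k` darkest hypotheses are averaged. The function name and the `top_k` parameter are hypothetical.

```python
def estimate_orientation(hypotheses, fitnesses, top_k=1):
    """Return (deg_x, deg_y) from the top_k hypotheses with the lowest fitness."""
    if not hypotheses or len(hypotheses) != len(fitnesses) or top_k < 1:
        raise ValueError("need matching, non-empty hypotheses and fitnesses")
    # Sort by fitness; a lower mean luminance is treated as a better fit.
    ranked = sorted(zip(fitnesses, hypotheses))[:top_k]
    deg_x = sum(h[0] for _, h in ranked) / len(ranked)
    deg_y = sum(h[1] for _, h in ranked) / len(ranked)
    return (deg_x, deg_y)
```

Averaging several good hypotheses can smooth the output between grid points, at the cost of blurring the estimate when the best hypotheses are far apart.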

According to the image processing apparatus 31 and the image processing apparatus 41, a large number of estimated values are generated, and for each estimated value a fitness is obtained that evaluates both the likelihood of being the iris and the position of the iris, so that the eyeball center position and the eyeball orientation can be estimated from the captured image with high accuracy. In addition, because the apparatuses hold only the small amount of data that makes up the eyeball model and process only that model rather than the entire image, the processing load is reduced and the processing time is short.

In particular, the iris on the image is nearly circular when the line of sight is directed toward the camera, but becomes elliptical as the deviation of the line of sight from the optical center increases. The image processing apparatuses 31 and 41 therefore use the eyeball model, move its position and orientation three-dimensionally, and use the resulting projection positions on the image, which allows the change in the appearance of the iris on the image to be represented with high accuracy. This makes it possible to search for the position and orientation of the eyeball model whose appearance matches the iris on the image, and thus to estimate that position and orientation. As a result, the orientation (line of sight) can be estimated with high accuracy even for an iris that appears elliptical on the captured image.
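The change from circle to ellipse can be made concrete with a simple approximation. As an illustrative sketch only, assume a weak-perspective (scaled orthographic) camera and an iris modeled as a flat disc of radius r, the iris radius of the eyeball model. The symbols θ (the angle between the line of sight and the direction toward the camera), s (the image scale in pixels per unit length), and a, b (the semi-major and semi-minor axes of the projected outline) are introduced here and do not appear in the original text. Under those assumptions:

```latex
a = s\,r, \qquad b = s\,r\cos\theta, \qquad \frac{b}{a} = \cos\theta
```

For θ = 0 the outline is a circle (b = a), and at θ = 60° the minor axis shrinks to half the major axis. A full perspective projection of the rotated model points, as used in the embodiment, reproduces this foreshortening without requiring the approximation.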

Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments and can be implemented in various forms.

For example, although the present embodiments apply the invention to a human face or eyeball as the target object, it can also be applied to other three-dimensional objects, such as a human body or a vehicle, that share a similar three-dimensional shape across instances and whose three-dimensional shape can be specified.

In the present embodiments, each unit is implemented by executing an application program (software) on a computer such as a personal computer, but each unit may instead be implemented in hardware.

In the present embodiments, the image processing apparatus for determining the presence or absence, position, and orientation of the target object has a single camera and makes the determination from a single captured image. However, the image processing apparatus may instead have a plurality of cameras (for example, a stereo camera) and make the determination from a plurality of captured images. In that case the processing load increases, but the accuracy of the presence determination and of the position and orientation estimation improves. For example, when a plurality of cameras is provided in the first embodiment, the calculation of projection positions, the generation and holding of a map for each feature point, the normalized correlation calculation, the fitness calculation, and so on must be performed for each of the captured images.

In the present embodiments, luminance information is used as the information for evaluating feature points, but other image information such as saturation or RGB color information may be used. Likewise, although a luminance pattern is used as the information for evaluating feature points, other information may be used, such as a luminance histogram, the shape of the luminance distribution, an edge pattern (edge image), or frequency characteristics obtained by a Fourier transform.

In the present embodiments, a map is generated for each feature point, but a configuration that does not generate the map is also possible.

In the present embodiments, a storage analysis unit is provided that stores the estimated values and their fitness, creates a history of the positions and orientations the target object has taken in the past, and uses that history when generating estimated values. However, the storage analysis unit may be omitted and the estimated values generated without using a history.

In the first and second embodiments, estimated values are generated for both the position and the orientation of the face, but either the position or the orientation may be fixed and estimated values generated only for the other. Some target objects change in only one of position and orientation; in that case, estimated values are generated only for the quantity that changes.

In the first embodiment, the position and orientation are estimated by generating estimated values and calculating the fitness of each estimated value in two stages, but the position and orientation may instead be estimated in a single stage.

In the third embodiment, the eyeball center position is estimated based on the eyeball model. However, when the relative positional relationship of the average eyeball center position to the face position is known, the eyeball center position can also be estimated from that relative positional relationship, starting from the face position and orientation estimated in the first embodiment.
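This alternative amounts to a single rigid transformation. As a sketch under assumed notation (only P appears in the original text), let R_face be the 3×3 rotation matrix corresponding to the estimated face orientation, t_face the estimated face position in the camera coordinate system, and c the average eyeball center expressed in the face coordinate system. The eyeball center position in camera coordinates would then be:

```latex
P = R_{\mathrm{face}}\,c + t_{\mathrm{face}}
```

The accuracy of P obtained this way depends on how closely the individual's eyeball position matches the average offset c.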

In the third embodiment, the eyeball orientation is estimated on the assumption that the face does not move, but the eyeball orientation can also be estimated when the face (head) moves. For example, the face position and orientation are first estimated as in the first embodiment, and the eyeball orientation is then estimated taking the estimated face position and orientation into account.

In the third embodiment, each estimation uses an eyeball model that considers only the iris, but an eyeball model that considers the sclera (white of the eye) in addition to the iris may be used. In a captured image of the eye, areas other than the iris, such as the vicinity of the outer corner of the eye or the upper eyelid, may appear similarly dark. An eyeball model such as the one shown in FIG. 30, which also accounts for the positional relationship between the iris and the surrounding sclera, can therefore be used. With this eyeball model, the average luminance at the projection positions of the sclera points on the image can be obtained in addition to the average luminance at the projection positions of the iris points. For example, the value obtained by subtracting the average luminance of the iris region from the average luminance of the sclera region can be used as the fitness, and the position at which this fitness is largest can be judged to be the most iris-like.
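The contrast-based fitness can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of this publication: the projected positions are assumed to be already computed and rounded to integer pixel coordinates `(u, v)`, the image is a list of rows of luminance values, and the function names are hypothetical.

```python
def mean_luminance(image, pixels):
    """Mean luminance over the (u, v) pixels that fall inside the image."""
    values = [image[row][col] for col, row in pixels
              if 0 <= row < len(image) and 0 <= col < len(image[0])]
    return sum(values) / len(values) if values else None

def contrast_fitness(image, iris_pixels, sclera_pixels):
    """Sclera mean minus iris mean; larger values are more iris-like."""
    iris = mean_luminance(image, iris_pixels)
    sclera = mean_luminance(image, sclera_pixels)
    if iris is None or sclera is None:
        # A region projected entirely outside the image cannot be scored.
        return float("-inf")
    return sclera - iris
```

A uniformly dark area such as an eyelid shadow scores close to zero under this measure, because its "sclera" points are as dark as its "iris" points, whereas a dark iris surrounded by bright sclera scores high. Note that with this measure the best hypothesis is the one with the largest fitness, not the smallest.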

In the third embodiment, because the iris region has the lowest luminance compared with other regions, the smallest fitness is extracted from among the fitness values when evaluating the likelihood of being the iris. Alternatively, a threshold for judging the average luminance that indicates the iris region may be held as evaluation information, and fitness values at or below that threshold may be extracted. When the iris is another color such as blue or green, a threshold for judging the average luminance of a region of that color may be held as evaluation information, and fitness values within that threshold may be extracted. As a further alternative, a reference luminance pattern representing an individual's iris pattern may be held as evaluation information, a normalized correlation value may be obtained between that reference luminance pattern and the luminance pattern at the projection positions, and the fitness may be calculated from the normalized correlation value.
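The last alternative relies on a normalized correlation between two luminance patterns. The sketch below uses the zero-mean form of normalized cross-correlation, which is one common definition; the text does not say which form is intended, so this choice and the function name are assumptions. Both patterns are given as flat lists of luminance values sampled at corresponding points.

```python
import math

def normalized_correlation(reference, sample):
    """Zero-mean normalized cross-correlation of two equal-length patterns."""
    if not reference or len(reference) != len(sample):
        raise ValueError("patterns must be non-empty and of equal length")
    n = len(reference)
    mean_r = sum(reference) / n
    mean_s = sum(sample) / n
    num = sum((r - mean_r) * (s - mean_s) for r, s in zip(reference, sample))
    den = math.sqrt(sum((r - mean_r) ** 2 for r in reference)
                    * sum((s - mean_s) ** 2 for s in sample))
    # A flat pattern has no structure to correlate; report no correlation.
    return num / den if den > 0 else 0.0
```

The result lies between -1 and 1 and is unchanged by a uniform gain or offset in brightness, which makes it less sensitive to lighting than a raw luminance average.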

FIG. 1 is a configuration diagram of the image processing apparatus for modeling processing according to the first and second embodiments.
FIG. 2 is a configuration diagram of the image processing apparatus for face position and orientation estimation processing according to the first embodiment.
FIG. 3 is an example of a captured image of a face taken by the first camera of FIG. 1.
FIG. 4 is an image showing the feature points extracted from the captured image of FIG. 3.
FIG. 5 is an example of a three-dimensional model including points corresponding to the feature points.
FIG. 6 is a diagram showing the three-dimensional position of each feature point of FIG. 3.
FIG. 7 is an example of captured images of a face taken by the first camera and the second camera of FIG. 1, respectively.
FIG. 8 is an image showing the feature points extracted from each of the two captured images of FIG. 7.
FIG. 9 is an example of a captured image of a face taken by the camera of FIG. 2.
FIG. 10 is an image showing the face region detected from the captured image of FIG. 9.
FIG. 11 is an image showing the regions, extracted from the face region of the captured image of FIG. 10, that are similar to the reference luminance patterns.
FIG. 12 is an explanatory diagram of the projection from the reference three-dimensional positions of the feature points onto the two-dimensional image according to the estimated values of the face position and orientation in the first stage.
FIG. 13 is an example of the normalized correlation values of each feature point in the first stage.
FIG. 14 is an explanatory diagram of the projection from the reference three-dimensional positions of the feature points onto the two-dimensional image according to the estimated values of the face position and orientation in the second stage.
FIG. 15 is an example of the normalized correlation values of each feature point in the second stage.
FIG. 16 is an example of a map consisting of the normalized correlation values at each projection position for each feature point.
FIG. 17 is a flowchart showing the flow of the modeling processing in the image processing apparatus of FIG. 1.
FIG. 18 is a flowchart showing the flow of the face position and orientation estimation processing in the image processing apparatus of FIG. 2.
FIG. 19 is a flowchart showing the flow of the normalized correlation value calculation processing in the image processing apparatus of FIG. 2.
FIG. 20 is a configuration diagram of the image processing apparatus for face presence determination processing according to the second embodiment.
FIG. 21 is a flowchart showing the flow of the face presence determination processing in the image processing apparatus of FIG. 20.
FIG. 22 is a configuration diagram of the image processing apparatus for eyeball center position estimation processing according to the third embodiment.
FIG. 23 is a configuration diagram of the image processing apparatus for eyeball orientation estimation processing according to the third embodiment.
FIG. 24 is a diagram showing the structure of the eyeball.
FIG. 25 is a diagram showing the eyeball model.
FIG. 26 is a diagram showing the relationship between the eyeball and the camera.
FIG. 27 is an example of a captured image of an eye taken by the camera of FIG. 23.
FIG. 28 is an explanatory diagram of the projection of points within the iris onto the two-dimensional image according to the estimated values of the eyeball rotation.
FIG. 29 is a diagram showing the relationship between the camera coordinate system and the eyeball coordinate system.
FIG. 30 is a diagram showing an eyeball model that also includes the sclera.
FIG. 31 is a flowchart showing the flow of the eyeball center position estimation processing in the image processing apparatus of FIG. 22.
FIG. 32 is a flowchart showing the flow of the eyeball orientation estimation processing in the image processing apparatus of FIG. 23.

Explanation of symbols

1, 11, 21, 31, 41 ... image processing apparatus, 2 ... first camera, 3 ... second camera, 12, 22, 32, 42 ... camera, 4, 13, 23, 33, 43 ... image processing unit, 4a ... feature point extraction unit, 4b ... feature point three-dimensional position estimation unit, 4c, 13a, 23a, 33a, 43a ... information holding unit, 13b, 23b, 43b ... storage analysis unit, 13c, 23c, 33b, 43c ... estimated value generation unit, 13d, 23d, 33c, 43d ... fitness calculation unit, 13e ... face position and orientation output unit, 23e ... face presence output unit, 33d ... eyeball center position output unit, 43e ... eyeball orientation output unit

Claims

1. An image processing apparatus comprising:
imaging means;
information holding means for holding a template consisting of evaluation information and three-dimensional position information for each of a plurality of feature points on a two-dimensional image of a target object;
estimated value generation means for generating a plurality of estimated values of the position and/or orientation of the target object;
fitness calculation means for calculating a fitness of each of the plurality of estimated values generated by the estimated value generation means, based on a captured image captured by the imaging means and the templates of the plurality of feature points held by the information holding means; and
determination means for determining the presence or absence, position, and/or orientation of the target object in the captured image captured by the imaging means, based on the fitness of the plurality of estimated values calculated by the fitness calculation means.

2. The image processing apparatus according to claim 1, wherein the estimated value generation means generates the estimated values based on at least one of: a value relating to the position and/or orientation of the target object in a captured image previously captured by the imaging means; a value relating to a position and/or orientation that the target object can structurally take; and a history of the position and/or orientation of the target object.

3. The image processing apparatus according to claim 1, wherein the estimated value generation means changes the density of the plurality of estimated values to be generated, based on at least one of: a value relating to the position and/or orientation of the target object in a captured image previously captured by the imaging means; and a history of the position and/or orientation of the target object.

4. The image processing apparatus according to any one of claims 1 to 3, wherein the fitness calculation means transforms the three-dimensional position of each feature point of the template according to each estimated value generated by the estimated value generation means, projects the transformed three-dimensional position onto the captured image captured by the imaging means, evaluates the likelihood of the feature point at the projection position based on the evaluation information of the template and the information around the projection position in the captured image, and calculates the fitness based on the resulting evaluation value.

5. The image processing apparatus according to claim 4, wherein the fitness calculation means converts the information around the projection position in the captured image into the same physical quantity as the evaluation information of the template.

6. The image processing apparatus according to claim 4, wherein the fitness calculation means sets the evaluation value of the likelihood of the feature point at the projection position to a constant value when that evaluation value is equal to or less than a predetermined value.

7. The image processing apparatus according to claim 4, wherein the fitness calculation means generates, each time the target object in the captured image changes, a data structure consisting of the evaluation value at each projection position for each feature point.

8. The image processing apparatus according to claim 4, wherein the fitness calculation means calculates the fitness using feature points having high evaluation values among the plurality of feature points.

9. The image processing apparatus according to any one of claims 4 to 8, wherein the fitness calculation means calculates a statistic of the evaluation values of the plurality of feature points and uses the statistic as the fitness.

10. The image processing apparatus according to claim 1, wherein the determination means determines that the target object exists in the captured image when at least one of the following conditions is satisfied: the maximum value of the fitness is equal to or greater than a predetermined value; the number of fitness values equal to or greater than a predetermined value is equal to or greater than a predetermined number; and the number of fitness values within a predetermined range from the maximum value of the fitness is equal to or greater than a predetermined number.

11. The image processing apparatus according to claim 1, wherein the estimated values of the position and/or orientation of the target object and the fitness of those estimated values are stored.

12. The image processing apparatus according to claim 11, wherein the target object is the iris, or the iris and the sclera, and
the information holding means holds an iris model as the template when the target object is the iris, and holds an iris model and a sclera model as the template when the target object is the iris and the sclera.

13. The image processing apparatus according to claim 1, wherein the target object is the iris, or the iris and the sclera, and
the fitness calculation means calculates, as the fitness of an estimated value, the average luminance of the iris region at the projection positions on the captured image when the target object is the iris, and calculates the average luminance of the iris region and the average luminance of the sclera region at the projection positions on the captured image when the target object is the iris and the sclera.

14. An image processing method comprising:
an imaging step;
an estimated value generation step of generating a plurality of estimated values of the position and/or orientation of a target object;
a fitness calculation step of calculating a fitness of each of the plurality of estimated values generated in the estimated value generation step, based on a captured image captured in the imaging step and a template consisting of evaluation information and three-dimensional position information for each of a plurality of feature points on a two-dimensional image of the target object; and
a determination step of determining the presence or absence, position, and/or orientation of the target object based on the fitness of the plurality of estimated values calculated in the fitness calculation step.

15. The image processing method according to claim 14, wherein, in the estimated value generation step, the estimated values are generated based on at least one of: a value relating to the position and/or orientation of the target object in a captured image previously captured in the imaging step; a value relating to a position and/or orientation that the target object can structurally take; and a history of the position and/or orientation of the target object.

16. The image processing method according to claim 14, wherein, in the estimated value generation step, the density of the plurality of estimated values to be generated is changed based on at least one of: a value relating to the position and/or orientation of the target object in a captured image previously captured in the imaging step; and a history of the position and/or orientation of the target object.

17. The image processing method according to any one of claims 14 to 16, wherein, in the fitness calculation step, the three-dimensional position of each feature point of the template is transformed according to each estimated value generated in the estimated value generation step, the transformed three-dimensional position is projected onto the captured image captured in the imaging step, the likelihood of the feature point at the projection position is evaluated based on the evaluation information of the template and the information around the projection position in the captured image, and the fitness is calculated based on the resulting evaluation value.

18. The image processing method according to claim 17, wherein, in the fitness calculation step, the information around the projection position in the captured image is converted into the same physical quantity as the evaluation information of the template.

19. The image processing method according to claim 17 or 18, wherein, in the fitness calculation step, the evaluation value of the likelihood of the feature point at the projection position is set to a constant value when that evaluation value is equal to or less than a predetermined value.

20. The image processing method according to one of the preceding claims, wherein, in the fitness calculation step, a data structure consisting of the evaluation value at each projection position for each feature point is generated each time the target object in the captured image changes.

21. The image processing method according to any one of claims 17 to 20, wherein, in the fitness calculation step, the fitness is calculated using feature points having high evaluation values among the plurality of feature points.

22. The image processing method according to any one of claims 17 to 21, wherein, in the fitness calculation step, a statistic of the evaluation values of the plurality of feature points is calculated and the statistic is used as the fitness.

23. The image processing method according to any one of claims 14 to 22, wherein, in the determination step, the target object is determined to exist in the captured image when at least one of the following conditions is satisfied: the maximum value of the fitness is equal to or greater than a predetermined value; the number of fitness values equal to or greater than a predetermined value is equal to or greater than a predetermined number; and the number of fitness values within a predetermined range from the maximum value of the fitness is equal to or greater than a predetermined number.

24. The image processing method according to any one of claims 14 to 23, wherein the estimated values of the position and/or orientation of the target object and the fitness of those estimated values are stored.

25. The image processing method according to any one of claims 14 to 24, wherein the target object is the iris, or the iris and the sclera, and
the template is an iris model when the target object is the iris, and is an iris model and a sclera model when the target object is the iris and the sclera.

26. The image processing method according to any one of claims 14 to 25, wherein the target object is the iris, or the iris and the sclera, and
in the fitness calculation step, as the fitness of an estimated value, the average luminance of the iris region at the projection positions on the captured image is calculated when the target object is the iris, and the average luminance of the iris region and the average luminance of the sclera region at the projection positions on the captured image are calculated when the target object is the iris and the sclera.