JP5672112B2

JP5672112B2 - Stereo image calibration method, stereo image calibration apparatus, and computer program for stereo image calibration

Info

Publication number: JP5672112B2
Application number: JP2011075931A
Authority: JP
Inventors: 岡本 浩明; 安川 裕介; 今井 章博
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-03-30
Filing date: 2011-03-30
Publication date: 2015-02-18
Anticipated expiration: 2031-03-30
Also published as: JP2012209895A

Description

The present invention relates to a stereo image calibration method, a stereo image calibration apparatus, and a computer program for stereo image calibration that calibrate, for use as a stereo image, at least one of two images of the same object captured from different directions.

Techniques for reproducing three-dimensional images have long been studied. One known method displays two images of the same object, captured from different directions, side by side and presents one image to each of the observer's left and right eyes. The pair of images used in such a method is called a stereo image.

Because the two images of a stereo image are viewed by the observer's left and right eyes respectively, they should be captured under the same conditions under which an observer normally views objects, so that a high-quality three-dimensional image can be reproduced. However, the camera that captures the left-eye image and the camera that captures the right-eye image are sometimes not positioned appropriately. As a result, the images of the object in the two images may be shifted vertically or horizontally from their proper positions, or the image of the object in one image may be rotated relative to that in the other. In such cases, to generate a stereo image that can reproduce a good three-dimensional image, a calibration process is performed to obtain calibration parameters for correcting the position of the image of the object in at least one of the two images.

Conventionally, the user performed this calibration by adjusting the position of the image in the left-eye image or the right-eye image by eye, an operation that was very tedious for the user. In particular, it is not easy for a novice user to perform the calibration properly.

Meanwhile, techniques have been proposed that automatically align the positions of the images in two images on the basis of the image content (see, for example, Patent Documents 1 to 3). For example, Patent Document 1 discloses a technique that adjusts the parallax amount of a specific target by adjusting the difference in the display position of the target between a first captured image and a second captured image, based on the difference vector between the representative coordinates of the detection marks of the target in the two images.
Patent Document 1 also discloses detecting, instead of the detection mark, parts of the specific target, such as the eyes or mouth of a face, in each captured image and adjusting the parallax amount based on the difference between the display positions of those parts in the images.

Patent Document 2 proposes a technique that, in order to correct the misalignment of the optical axes of two imaging units, detects a face in the image captured by each imaging unit and calculates a correction angle for the optical axis from the displacement of the face position between the two images. Patent Document 3 proposes a technique that, in order to correct the misalignment of the optical axes of a stereo camera, obtains the translational correction amount and rotation angle of one camera's image relative to the other's, based on a distant region and a near region whose distances are each known in advance.

Patent Document 1: JP 2010-147940 A
Patent Document 2: JP 2008-252254 A
Patent Document 3: JP H11-259632 A

However, the techniques disclosed in Patent Documents 1 and 2 obtain the correction amount for image alignment only from points on a subject, such as a face, that is relatively close to the camera. These techniques may therefore fail to align the image as a whole, and the reproduced three-dimensional image of objects located far away may look unnatural.
The technique disclosed in Patent Document 3 uses a plurality of points whose distances from the camera are known in advance for optical axis adjustment, so it cannot be applied when no point at a known distance from the camera is available.

Accordingly, an object of this specification is to provide a stereo image calibration method capable of calculating calibration parameters for correcting the position of the entire image captured by each camera so that the images are suitable as a stereo image.

According to one embodiment, a stereo image calibration method is provided. The method includes: acquiring a first image generated by capturing a region including a subject with a first camera and a second image generated by capturing that region with a second camera; detecting a first subject region in which the subject appears in the first image and a second subject region in which the subject appears in the second image; extracting, from the first subject region and the second subject region, at least one set of subject feature points corresponding to the same point on the subject; obtaining, based on the at least one set of subject feature points, the difference between the rotation angle of the first camera and the rotation angle of the second camera about each of a first axis parallel to the image plane and a second axis orthogonal to the first axis, and obtaining a first calibration parameter that corrects the displacement between the position of the subject in the first image and the position of the subject in the second image caused by that difference; and obtaining, from the images of objects appearing in a first background region, which is the first image excluding the first subject region, and in a second background region, which is the second image excluding the second subject region, a second calibration parameter that corrects the rotation between the first image and the second image caused by the difference between the rotation angle of the first camera about its optical axis and the rotation angle of the second camera about its optical axis.

The objects and advantages of the invention are realized and attained by means of the elements and combinations particularly pointed out in the claims.
It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and do not restrict the invention as claimed.

The stereo image calibration method disclosed herein can calculate calibration parameters for correcting the position of the entire image captured by each camera so that the images are suitable as a stereo image.

A schematic configuration diagram of a stereo image calibration apparatus according to one embodiment.
A functional block diagram of the processing unit.
(a) to (c) are schematic diagrams showing the procedure for extracting sets of face feature points according to the embodiment.
A diagram showing the real-space coordinate system defined for each camera.
(a) is a schematic diagram showing an example of a set of face feature points in the left image and the right image. (b) is a schematic diagram showing the relationship between the vertical difference from the image center to a face feature point and the rotation component from the horizontal plane. (c) is a schematic diagram showing the relationship between the horizontal difference from the image center to a face feature point and the rotation component from the vertical plane.
(a) is a diagram explaining the positional relationship of the pixels used to calculate a local gradient, and (b) is a schematic diagram of the distribution of local gradient orientations obtained from the background regions of the left image and the right image.
An operation flowchart of the stereo image calibration process.
(a) is a schematic diagram showing the part of the background region in the left image that is hidden by the head in the right image. (b) is a schematic diagram showing the part of the background region in the right image that is hidden by the head in the left image.
(a) and (b) are schematic diagrams showing the regions excluded from the background region in the left image and the right image, respectively, when background feature points are extracted.
(a) and (b) are schematic diagrams showing another example of the regions excluded from the background region in the left image and the right image, respectively, when background feature points are extracted.
A schematic diagram showing the relationship between the imaging range of the camera that generates the left image and the imaging range of the camera that generates the right image.
A schematic diagram of the camera imaging ranges when the distance from the two cameras to the plane on which the range captured in the left image coincides with the range captured in the right image is greater than the distance from the two cameras to the farthest object appearing in the left and right images.
A schematic diagram of the camera imaging ranges when the distance from the two cameras to the plane on which the range captured in the left image coincides with the range captured in the right image is smaller than the distance from the two cameras to the farthest object appearing in the left and right images.
(a) and (b) are schematic diagrams showing yet another example of the regions excluded from the background region in the left image and the right image, respectively, when background feature points are extracted.

A stereo image calibration apparatus according to one embodiment or its modifications is described below with reference to the drawings. From the images generated by two cameras, the apparatus detects sets of feature points corresponding to the same point on a subject located relatively close to the cameras, based on information about that subject. From the displacement between the feature points in each set, the apparatus obtains a calibration parameter that corrects the displacement of the subject position caused by the difference between the cameras' rotation angles about a horizontal axis parallel to the image plane and the difference between their rotation angles about a vertical axis parallel to the image plane. The apparatus also detects, in the region of each image showing the background outside the subject, feature points corresponding to the same object and, based on those feature points, obtains a calibration parameter that corrects the image rotation caused by the difference between the cameras' rotation angles about their optical axes. The apparatus can thus calculate calibration parameters that align the entire image captured by each camera for use as a stereo image, even when the distance from each camera to the subject is unknown.

In this embodiment, the stereo image calibration apparatus is applied to a videophone system that displays three-dimensional face images, and its cameras capture, as the subject, the head of the person making the call. However, the apparatus is applicable to various devices other than videophone systems that generate stereo images.

FIG. 1 is a schematic configuration diagram of a stereo image calibration apparatus according to one embodiment. As shown in FIG. 1, the stereo image calibration apparatus 1 includes two cameras 2-1 and 2-2, an input unit 3, a storage unit 4, and a processing unit 5. The stereo image calibration apparatus 1 may further include an interface circuit (not shown) for outputting corrected images, obtained by correcting the images generated by the cameras 2-1 and 2-2 with the calibration parameters described later, to other devices via a communication network.

The camera 2-1 generates the left-eye image, and the camera 2-2 generates the right-eye image. To this end, each of the cameras 2-1 and 2-2 has an image sensor with a two-dimensional array of solid-state imaging elements and an imaging optical system that forms an image of the subject on the image sensor.
The cameras 2-1 and 2-2 are preferably arranged with the optical axes of their imaging optical systems substantially parallel and with a predetermined spacing in a substantially horizontal direction, so that they capture a region including the same subject and can generate a stereo image for reproducing a three-dimensional image of the subject and the background. To this end, the cameras 2-1 and 2-2 may, for example, be housed at predetermined positions in a single housing (not shown). However, as described later, the calibration process executed by the processing unit 5 calculates calibration parameters that correct the position of the image in at least one of the images. The arrangement of the cameras 2-1 and 2-2 therefore need not be adjusted so precisely that the images they generate can be used as a stereo image as they are. In this embodiment, the focal lengths of the imaging optical systems and the numbers of pixels of the image sensors of the cameras 2-1 and 2-2 are assumed to be identical, so that the size of the subject in the image generated by the camera 2-1 is substantially equal to the size of the same subject in the image generated by the camera 2-2. However, the focal lengths and the numbers of pixels may differ between the cameras.
Each time the camera 2-1 or 2-2 generates an image, it sends the image to the input unit 3. In the following, for convenience, the left-eye image generated by the camera 2-1 is called the left image, and the right-eye image generated by the camera 2-2 is called the right image.

The input unit 3 receives the left image and the right image from the cameras 2-1 and 2-2, respectively. To this end, the input unit 3 has, for example, an interface circuit conforming to a serial bus standard such as Universal Serial Bus (USB), or a video interface circuit, for connecting the cameras 2-1 and 2-2 to the processing unit 5.
The input unit 3 passes the acquired left image and right image to the processing unit 5.

The storage unit 4 has, for example, a readable and writable volatile or nonvolatile semiconductor memory circuit, or a magnetic or optical recording medium. The storage unit 4 stores the left image and the right image received from the input unit 3. When the functions of the processing unit 5 of the stereo image calibration apparatus 1 are implemented by a computer program executed on a processor of the processing unit 5, the storage unit 4 may also store that computer program. The storage unit 4 may further store the calibration parameters.

The processing unit 5 has at least one processor and its peripheral circuits. The processing unit 5 obtains calibration parameters that correct the position of the image in at least one of the left image and the right image so that a stereo image can be generated from them.

FIG. 2 is a functional block diagram of the processing unit 5. As shown in FIG. 2, the processing unit 5 includes a face detection unit 11, a face feature point extraction unit 12, a background feature point extraction unit 13, a calibration parameter calculation unit 14, and a determination unit 15. These units are implemented, for example, as functional modules realized by a computer program executed on a processor of the processing unit 5. Alternatively, each of these units may be implemented in the stereo image calibration apparatus 1 as a separate arithmetic circuit, or the units may be implemented together as a single integrated circuit that realizes their functions.

The face detection unit 11 is an example of a subject region detection unit and extracts, from each of the left image and the right image, the region in which a face appears. In the following, for convenience, a region in which a face appears is called a face region. The face region is an example of a subject region.
The face detection unit 11 detects the face region in each of the left image and the right image using, for example, a classifier based on machine learning. Such a classifier can be, for example, an AdaBoost classifier, a multilayer perceptron, or a support vector machine. These classifiers are trained in advance to detect face regions by supervised learning using a plurality of images known to contain a face and a plurality of images known not to contain a face. For example, when an AdaBoost classifier is used, the face detection unit 11 divides each of the right image and the left image into a plurality of small regions and inputs the small regions to the classifier one by one. The classifier, for example, detects in each small region Haar-like features useful for detecting a face region and determines from those features whether the small region is a face region.
Similarly, when another classifier such as a multilayer perceptron or a support vector machine is used, the face detection unit 11 determines whether each small region is a face region by inputting to the classifier the small regions obtained by dividing the right image and the left image, or various feature quantities extracted from those small regions.
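As an illustration of the Haar-like features mentioned above, the following is a minimal sketch of how one such feature could be evaluated on a small region using a summed-area table. It is not taken from the embodiment: the function names, the two-rectangle layout (upper half minus lower half), and the use of NumPy are assumptions made for this example, and the training of the classifier itself is not shown.

```python
import numpy as np


def integral_image(gray):
    """Summed-area table with an extra leading row and column of zeros."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(gray, axis=0, dtype=np.float64), axis=1)
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]


def haar_two_rect_vertical(ii, top, left, height, width):
    """Hypothetical two-rectangle feature: upper half minus lower half."""
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return upper - lower
```

A classifier of the kind described would combine many such feature values, each compared against a learned threshold, to decide whether the small region is a face region.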

The face detection unit 11 may instead detect the face region by any of various methods other than a classifier. For example, the face detection unit 11 detects, in the right image and the left image, pixels having a color component corresponding to skin color and obtains, by labeling, a region in which such pixels cluster as a face candidate region. A pixel can be regarded as having a skin-color component when, with its color expressed in the HSV color system, its H component has a value in the range of approximately 0 to approximately 30. The face detection unit 11 may then treat the face candidate region as a face region when, for example, its size falls within the range of typical face sizes in the right image and the left image and its circularity is equal to or greater than a predetermined threshold corresponding to the outline of a typical face. The face detection unit 11 can calculate the circularity by taking the total number of pixels on the contour of the face candidate region as its perimeter and dividing the total number of pixels in the region multiplied by 4π by the square of the perimeter.
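The skin-color test and the circularity measure described above can be sketched as follows. This is an illustrative example only. It assumes that the H component is expressed in degrees on a 0 to 360 scale (the text does not state the unit), that contour pixels are region pixels with at least one 4-neighbor outside the region, and that the candidate region is given as a boolean mask. The function names are hypothetical.

```python
import colorsys

import numpy as np


def is_skin_hue(r, g, b, h_min=0.0, h_max=30.0):
    """True if the hue of an 8-bit RGB pixel lies in [h_min, h_max] degrees (assumed unit)."""
    h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h_min <= h * 360.0 <= h_max


def circularity(mask):
    """4*pi*area / perimeter^2, with the perimeter counted in contour pixels."""
    mask = np.asarray(mask, dtype=bool)
    area = int(mask.sum())
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior when all four of its 4-neighbors belong to the region.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())
    return 4.0 * np.pi * area / perimeter ** 2 if perimeter else 0.0
```

Because the perimeter is counted in pixels rather than measured as a geometric length, the value for a digitized disk can come out slightly above 1, so a threshold would need to be tuned for this way of counting.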

Alternatively, the face detection unit 11 may approximate the face candidate region by an ellipse by fitting the coordinates of the pixels on its contour to the equation of an ellipse using the least squares method. The face detection unit 11 may then treat the face candidate region as a face region when the ratio of the major axis to the minor axis of the ellipse falls within the range of major-to-minor axis ratios of a typical face. When evaluating the shape of the face candidate region by ellipse approximation, the face detection unit 11 may detect edge pixels by applying a neighboring-pixel operation to the luminance component of each pixel of the image. In this case, the face detection unit 11 links the edge pixels using, for example, labeling and takes edge pixels linked to at least a certain length as the contour of the face candidate region.
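One possible way to carry out the least-squares ellipse approximation and obtain the axis ratio is sketched below. The specific formulation is an assumption of this example, not something the text prescribes: the contour coordinates are first centered on their mean, a general conic a·x² + b·xy + c·y² + d·x + e·y = 1 is fitted with `numpy.linalg.lstsq`, and the ratio of the major to the minor axis is derived from the eigenvalues of the quadratic part.

```python
import numpy as np


def ellipse_axis_ratio(points):
    """Fit a conic to contour points and return major axis / minor axis.

    points: sequence of (x, y) contour coordinates (hypothetical input format).
    """
    pts = np.asarray(points, dtype=np.float64)
    # Center the points so that the constant term of the conic is non-zero.
    x, y = (pts - pts.mean(axis=0)).T
    design = np.column_stack([x * x, x * y, y * y, x, y])
    coef, *_ = np.linalg.lstsq(design, np.ones(len(pts)), rcond=None)
    a, b, c = coef[:3]
    # Semi-axis lengths are proportional to 1 / sqrt(eigenvalue).
    eigvals = np.linalg.eigvalsh(np.array([[a, b / 2.0], [b / 2.0, c]]))
    if eigvals[0] <= 0:
        raise ValueError("fitted conic is not an ellipse")
    return float(np.sqrt(eigvals[1] / eigvals[0]))
```

The returned ratio could then be compared against a range of ratios considered typical for a face; that range is not given in the text and would have to be chosen separately.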

Alternatively, the face detection unit 11 may perform template matching between the face candidate region and a template corresponding to a typical face shape, calculate the normalized cross-correlation value between the face candidate region and the template, and determine that the face candidate region is a face region when the normalized cross-correlation value is equal to or greater than a predetermined value.
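The normalized cross-correlation test described above can be sketched as follows. The example assumes that the face candidate region has already been resampled to the same size as the template and uses the zero-mean form of the correlation; the text does not say which variant is intended. The threshold of 0.7 and the function names are placeholders, not values from the embodiment.

```python
import numpy as np


def normalized_cross_correlation(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized arrays."""
    p = np.asarray(patch, dtype=np.float64)
    t = np.asarray(template, dtype=np.float64)
    p = p - p.mean()
    t = t - t.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    # A flat patch or template carries no shape information; report 0.
    return float((p * t).sum() / denom) if denom > 0 else 0.0


def is_face(patch, template, threshold=0.7):
    """Accept the candidate when the correlation reaches the (hypothetical) threshold."""
    return normalized_cross_correlation(patch, template) >= threshold
```

With this form the score is unaffected by a uniform change in brightness or contrast between the candidate region and the template.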

There may be more than one person within the imaging range of the cameras 2-1 and 2-2. In that case, a plurality of face regions are detected in the left image or the right image. When the face detection unit 11 detects a plurality of face regions in the left image, it therefore selects one of them, for example the face region closest to the center of the left image or the largest face region, so that the calibration parameters are calculated from the face of a single person. Similarly, when the face detection unit 11 detects a plurality of face regions in the right image, it selects the face region closest to the center of the right image or the largest face region.
When the stereo image calibration apparatus has a plurality of microphones arranged at different positions, the processing unit 5 may estimate the direction of the person who spoke, among the plurality of persons, from the differences in the times at which the sound reaches the microphones, and the face detection unit 11 may select the face region closest to the estimated direction.
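The rule of selecting the face region closest to the image center or the largest face region, described above, can be sketched as follows. The sketch assumes that each detected face region is represented by a bounding box `(x, y, width, height)` in pixels; that representation, the function name, and the `rule` argument are hypothetical. The microphone-based selection is not covered.

```python
def select_face(boxes, image_size, rule="center"):
    """Pick one face box from several candidates.

    boxes: list of (x, y, width, height) tuples (assumed representation).
    image_size: (image_width, image_height) in pixels.
    rule: "center" for the box nearest the image center, "largest" for the biggest box.
    """
    if not boxes:
        return None
    if rule == "largest":
        return max(boxes, key=lambda b: b[2] * b[3])
    cx, cy = image_size[0] / 2.0, image_size[1] / 2.0

    def squared_distance_to_center(box):
        x, y, w, h = box
        return (x + w / 2.0 - cx) ** 2 + (y + h / 2.0 - cy) ** 2

    return min(boxes, key=squared_distance_to_center)
```

The same rule would be applied independently to the left image and the right image.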

When the face detection unit 11 has detected the face region, it generates, for each of the left image and the right image, information representing the face region and the background region, which is the image excluding the face region. This information can be, for example, a binary image that has the same size as the right image or the left image and in which pixels in the face region and pixels in the background region have different values. When a plurality of face regions have been detected in one image as described above, the face detection unit 11 may include the face regions other than the selected one in the background region.
The face detection unit 11 then passes the information representing the face region and the background region to the face feature point extraction unit 12 and the background feature point extraction unit 13.
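A binary image of the kind described above could be built as in the following sketch. It assumes, for illustration only, that the face region is a rectangular bounding box `(x, y, width, height)` and that the value 1 marks the face region and 0 the background region; the text only requires that the two regions have different pixel values.

```python
import numpy as np


def region_mask(image_shape, face_box):
    """Binary image of the same height and width as the input image.

    image_shape: shape of the image array, e.g. (height, width, channels).
    face_box: (x, y, width, height) of the selected face region (assumed form).
    """
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    x, y, w, h = face_box
    mask[y:y + h, x:x + w] = 1  # 1 = face region, 0 = background region
    return mask
```

Any face regions that were detected but not selected are left at 0 here, which corresponds to including them in the background region.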

The face feature point extraction unit 12 is an example of a subject feature extraction unit and extracts, from the face region in the left image and the face region in the right image, at least one set of face feature points corresponding to the same point on the face. In this embodiment, the face feature point extraction unit 12 extracts first candidate points, which are candidates for face feature points, from the face region of one of the left image and the right image, and searches the face region of the other image for a second candidate point that matches each first candidate point. The face feature point extraction unit 12 then finds, in the image from which the first candidate point was extracted, a third candidate point that matches the second candidate point detected in the other image. When the first candidate point and the third candidate point can be regarded as substantially the same, the face feature point extraction unit 12 takes the first candidate point or the third candidate point, together with the second candidate point, as a set of face feature points. In this way, the face feature point extraction unit 12 can accurately extract the face feature point in the left image and the face feature point in the right image that correspond to the same part of the face.
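The forward and backward consistency check described above can be sketched as follows. The matching step itself is abstracted away: `match_forward` and `match_backward` are hypothetical callables that return the matching point in the other image, or `None` when no match is found. The tolerance `tol`, which decides when the first and third candidate points count as substantially the same, is also an assumed parameter.

```python
import math


def cross_checked_pairs(first_points, match_forward, match_backward, tol=1.5):
    """Keep only pairs whose backward match lands near the starting point.

    first_points: (x, y) first candidate points in one image.
    match_forward: maps a first candidate point to its second candidate point.
    match_backward: maps a second candidate point back to a third candidate point.
    """
    pairs = []
    for p1 in first_points:
        p2 = match_forward(p1)
        if p2 is None:
            continue
        p3 = match_backward(p2)
        # Accept the pair when the first and third points nearly coincide.
        if p3 is not None and math.dist(p1, p3) <= tol:
            pairs.append((p1, p2))
    return pairs
```

A point whose backward match lands far from where it started is likely a false correspondence, which is why discarding such points improves the accuracy of the extracted sets.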

First, to extract the first candidate points from the face region of one image, the face feature point extraction unit 12 applies, for example, a corner detector to the face region on the left image and takes each of the detected points as a first candidate point. The face feature point extraction unit 12 can use, for example, a Harris detector as the corner detector. To extract the first candidate points, the face feature point extraction unit 12 may instead use a detector that detects characteristic points other than corners, for example a scale-invariant feature transform (SIFT) detector.
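
The corner-detection step can be sketched as follows. This is an illustrative Harris response in plain NumPy, not the detector used in the embodiment: the box-shaped summation window (a Gaussian window is more common), the constant `k = 0.04`, the absence of non-maximum suppression, and the function names are all assumptions.

```python
import numpy as np

def box_filter(a, r):
    """Sum of `a` over a (2r+1) x (2r+1) window, with zero padding."""
    h, w = a.shape
    p = np.pad(a, r, mode="constant")
    out = np.zeros_like(a, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out

def harris_response(img, r=1, k=0.04):
    """Harris corner response det(M) - k * trace(M)^2 at every pixel."""
    gy, gx = np.gradient(img.astype(float))
    sxx = box_filter(gx * gx, r)
    syy = box_filter(gy * gy, r)
    sxy = box_filter(gx * gy, r)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

def candidate_points(img, face_mask, count=10):
    """Return up to `count` (row, col) pixels with the largest positive
    response inside the face region (face_mask > 0)."""
    resp = np.where(face_mask > 0, harris_response(img), -np.inf)
    order = np.argsort(resp, axis=None)[::-1][:count]
    rows, cols = np.unravel_index(order, resp.shape)
    return [(int(y), int(x)) for y, x in zip(rows, cols) if resp[y, x] > 0]
```

Restricting the search to the mask mirrors the description: only pixels inside the face region are considered as first candidate points.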

Because the face feature point extraction unit 12 can predict the ranges within the face region in which characteristic parts such as the eyes, nose, and mouth are likely to exist, it may apply such a detector only within those ranges to extract the first candidate points. Alternatively, the face feature point extraction unit 12 may make the region that is input to the detector, in order to decide whether a pixel of interest is extracted as a first candidate point, smaller as the face region becomes smaller. Furthermore, the face feature point extraction unit 12 may perform template matching between a template representing a characteristic part of the face and the face region of the left image while changing their relative position, and obtain a normalized cross-correlation value at each position. The face feature point extraction unit 12 may then extract, as a first candidate point, any pixel in the part of the face region that overlaps the template at the position where the normalized cross-correlation value is highest.

Next, for each first candidate point extracted from the left image, the face feature point extraction unit 12 sets a predetermined region centered on that first candidate point as a template. The face feature point extraction unit 12 then performs template matching between the template and the face region on the right image while changing their relative position, and obtains, for example, a normalized cross-correlation value at each position. The face feature point extraction unit 12 takes, as the second candidate point, the pixel on the right image that corresponds to the center of the template at the position where the normalized cross-correlation value is maximized, that is, where the right image best matches the template.

Similarly, the face feature point extraction unit 12 sets a predetermined region centered on the second candidate point as a re-search template. The face feature point extraction unit 12 then performs template matching between the re-search template and the face region on the left image while changing their relative position, and obtains, for example, a normalized cross-correlation value at each position. The face feature point extraction unit 12 takes, as the third candidate point, the pixel on the left image that corresponds to the center of the re-search template at the position where the normalized cross-correlation value is maximized, that is, where the left image best matches the re-search template. The face feature point extraction unit 12 obtains the distance between the third candidate point and the original first candidate point and, if that distance is equal to or less than a predetermined distance threshold, takes the first or third candidate point on the left image and the second candidate point on the right image as a set of face feature points corresponding to the same part. The predetermined region is preferably large enough to evaluate how well the structure of the face around a candidate point matches, yet small enough that the difference in shooting direction has little effect. For example, the predetermined region can be a rectangular region whose horizontal and vertical sides are each about 1/10 to 1/4 of the horizontal width of the face region. The distance threshold is set, for example, to the maximum distance at which two points can still be regarded as feature points corresponding to the same part.
On the other hand, when the maximum normalized cross-correlation value between the template and the face region on the right image is less than a predetermined threshold, the face feature point extraction unit 12 may determine that the right image contains no second candidate point matching the first candidate point for that template, and may exclude that first candidate point from the search for sets of face feature points. Similarly, when the maximum normalized cross-correlation value between the re-search template and the face region on the left image is less than the predetermined threshold, the face feature point extraction unit 12 may exclude the first and second candidate points corresponding to that re-search template from the search for sets of face feature points. The higher the predetermined threshold is set, the more certain it is that an extracted set of face feature points corresponds to the same part. The predetermined threshold is set, for example, to a value from 0.9 to 0.95. Alternatively, the face feature point extraction unit 12 may raise the predetermined threshold as the number of first candidate points extracted from the left image increases. In this way, when many candidate points are extracted from the face region of one image, the face feature point extraction unit 12 extracts only sets of face feature points that are highly likely to correspond to the same part. When few candidate points are extracted, the threshold stays lower, so the face feature point extraction unit 12 can still extract enough sets of face feature points to obtain the calibration parameters.
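
The search, re-search, and threshold tests described above can be sketched as follows. This is an illustrative brute-force implementation, assuming grayscale images stored as 2-D NumPy arrays, a zero-mean form of the normalized cross-correlation, and a search over the whole image rather than only the face region. The function names and the default values of `half`, `score_min`, and `dist_max` are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_match(template, image):
    """Slide the template over the image; return (row, col, score) of the
    template center at the best-matching position."""
    th, tw = template.shape
    best = (-1, -1, -2.0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            s = ncc(template, image[r:r + th, c:c + tw])
            if s > best[2]:
                best = (r + th // 2, c + tw // 2, s)
    return best

def patch(image, center, half):
    """Square region of side 2*half+1 centered on `center` (row, col).
    The center is assumed to lie at least `half` pixels from the border."""
    r, c = center
    return image[r - half:r + half + 1, c - half:c + half + 1]

def match_pair(left, right, p1, half=3, score_min=0.9, dist_max=1.5):
    """Return (p1, p2) if the first candidate p1 on the left image has a
    consistent second candidate p2 on the right image, else None."""
    r2, c2, s12 = best_match(patch(left, p1, half), right)
    if s12 < score_min:            # no reliable second candidate point
        return None
    r3, c3, s21 = best_match(patch(right, (r2, c2), half), left)
    if s21 < score_min:            # re-search did not match reliably
        return None
    if np.hypot(r3 - p1[0], c3 - p1[1]) > dist_max:
        return None                # third candidate too far from the first
    return p1, (r2, c2)
```

Searching in both directions rejects matches that are only locally plausible in one image, at the cost of a second template search per candidate.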

FIGS. 3A to 3C are schematic diagrams showing the procedure for extracting a set of face feature points according to the present embodiment. In FIG. 3A, a plurality of first candidate points 301 are extracted within the face region of the left image 300. At this stage, no processing is performed on the right image 310.
Next, as shown in FIG. 3B, a template 302 is set that is centered on a candidate point of interest 301a among the plurality of first candidate points extracted from the left image 300. Template matching with the template 302 is then performed on the right image 310, and a second candidate point 311 is extracted.
Thereafter, as shown in FIG. 3C, a re-search template 312 centered on the second candidate point 311 is set on the right image 310, and template matching between the re-search template 312 and the face region is performed on the left image 300. As a result, a third candidate point 303 is extracted. If the distance between the third candidate point 303 and the first candidate point 301a is equal to or less than the distance threshold, the first candidate point 301a (or the third candidate point 303) and the second candidate point 311 form a set of face feature points.

The face feature point extraction unit 12 may instead first extract the first candidate points from the face region of the right image and search the face region of the left image for the second candidate point corresponding to each first candidate point.
For each set of face feature points obtained, the face feature point extraction unit 12 stores the horizontal and vertical image coordinates of the two face feature points in the storage unit 4.

The background feature point extraction unit 13 extracts, from the background regions of the left image and the right image, at least one set of background feature points that correspond to the same point on the same object shown in the background regions.
To do so, the background feature point extraction unit 13, like the face feature point extraction unit 12, extracts a plurality of first candidate points, which are candidates for background feature points, from the background region of one of the left and right images. For each first candidate point, the background feature point extraction unit 13 performs template matching between a template centered on that candidate point and the background region of the other image while changing their relative position, and thereby obtains the second candidate point that best matches the first candidate point. The background feature point extraction unit 13 then performs template matching between a re-search template centered on the second candidate point and the background region of the image from which the first candidate point was extracted, and thereby obtains the third candidate point that best matches the second candidate point. If the distance between the third candidate point and the first candidate point is equal to or less than the distance threshold, the background feature point extraction unit 13 extracts the first or third candidate point in the one image and the second candidate point in the other image as a set of background feature points.
For each set of background feature points obtained, the background feature point extraction unit 13 stores the horizontal and vertical image coordinates of the two background feature points in the storage unit 4.

The calibration parameter calculation unit 14 calculates, for at least one of the left image and the right image, calibration parameters for correcting the position of the image of the subject shown in that image. In the present embodiment, the calibration parameter calculation unit 14 obtains, from the sets of face feature points corresponding to the same part, a horizontal calibration parameter that corrects the positional deviation of the image of the subject caused by the difference between the rotation angle of the camera 2-1 and the rotation angle of the camera 2-2 around the horizontal axis parallel to the image plane. Similarly, the calibration parameter calculation unit 14 obtains, from the sets of face feature points corresponding to the same part, a vertical calibration parameter that corrects the positional deviation of the image caused by the difference between the rotation angle of the camera 2-1 and the rotation angle of the camera 2-2 around the vertical axis parallel to the image plane. Furthermore, the calibration parameter calculation unit 14 obtains, from the sets of background feature points corresponding to the same object, a rotation direction calibration parameter that corrects the rotation between the image on the right image and the image on the left image caused by the difference between the rotation angle of the camera 2-1 and the rotation angle of the camera 2-2 around the optical axis.

Here, for convenience in expressing the calibration parameters, coordinate systems in real space are defined for the cameras 2-1 and 2-2. FIG. 4 is a schematic diagram showing these coordinate systems. The coordinate system 401 is defined for the camera 2-1, which generates the left image. Its x-axis represents the horizontal direction parallel to the image plane of the camera 2-1, and its y-axis represents the vertical direction. The z-axis is the optical axis of the imaging optical system of the camera 2-1. The x-axis is positive to the right when facing the face 410 of the person who is the subject, the y-axis is positive upward, and the z-axis is positive in the direction toward the face 410. The center (X_C, Y_C) of the left image 402 corresponds to the origin of the coordinate system 401. Therefore, the horizontal line at vertical coordinate Y_C on the left image 402 corresponds to the x-axis in real space, and the vertical line at horizontal coordinate X_C on the left image 402 corresponds to the y-axis in real space. Similarly, the coordinate system 403 is defined for the camera 2-2, which generates the right image. Its x-axis represents the horizontal direction parallel to the image plane of the camera 2-2, and its y-axis represents the vertical direction parallel to the image plane. The z-axis is the optical axis of the imaging optical system of the camera 2-2. The center (X_C, Y_C) of the right image 404 corresponds to the origin of the coordinate system 403.

Because of assembly errors and the like when the camera 2-1 and the camera 2-2 are installed, the orientation of one camera may differ from that of the other. This difference in orientation is expressed as a rotation around the x-axis, a rotation around the y-axis, and a rotation around the z-axis. Even when the mounting positions of the left and right cameras differ, that positional difference is expressed as part of the difference in the amount of rotation around each axis.
In the present embodiment, therefore, the calibration parameter calculation unit 14 obtains, from the sets of face feature points, the difference θ_x in the amount of rotation around the x-axis as the horizontal calibration parameter and the difference θ_y in the amount of rotation around the y-axis as the vertical calibration parameter.

The relationship between the sets of face feature points and the calibration parameters θ_x and θ_y will be described with reference to FIGS. 5A to 5C. FIG. 5A shows an example of a set of face feature points in the left image and the right image. In FIG. 5A, a face feature point 501 belonging to the i-th set of face feature points (where i is an integer of 1 or more) is detected at coordinates (X_{L,i}, Y_{L,i}) in the left image 500, and a face feature point 511 belonging to the same i-th set is detected at coordinates (X_{R,i}, Y_{R,i}) in the right image 510. In this case, the horizontal and vertical differences on the left image from the image center 502, which is located at coordinates (X_C, Y_C) and corresponds to the optical axis of the camera 2-1, to the face feature point 501 are (X_{L,i} - X_C, Y_{L,i} - Y_C). Similarly, the horizontal and vertical differences on the right image from the image center 512, which is located at coordinates (X_C, Y_C) and corresponds to the optical axis of the camera 2-2, to the face feature point 511 are (X_{R,i} - X_C, Y_{R,i} - Y_C). The vertical difference corresponds to the rotation component from the horizontal plane passing through the optical axis of the camera 2-1 or 2-2 to the feature point, whereas the horizontal difference corresponds to the rotation component from the vertical plane passing through the optical axis of the camera 2-1 or 2-2 to the feature point. The rotation component from the horizontal plane to the feature point corresponds to a rotation angle around the x-axis, and the rotation component from the vertical plane to the feature point corresponds to a rotation angle around the y-axis.

FIG. 5B is a schematic diagram showing the relationship between the vertical difference from the image center to a face feature point and the rotation components θ_{x,L,i} and θ_{x,R,i} from the horizontal plane. The image plane is assumed to lie at a distance equal to the focal length of the camera from the viewpoint of the camera. Therefore, as shown in FIG. 5B, the rotation components θ_{x,L,i} and θ_{x,R,i} are each calculated according to equation (1), where f is the focal length of the imaging optical systems of the cameras 2-1 and 2-2.
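
The image of equation (1) is not reproduced in this text. A form consistent with the geometry described above, in which the image plane lies at distance f from the viewpoint and the vertical offset from the image center is Y - Y_C, would be the following. It is offered as a reconstruction under that assumption, not as the published formula, and the sign depends on whether the image's vertical coordinate increases upward or downward.

```latex
\theta_{x,L,i} = \arctan\!\left(\frac{Y_{L,i} - Y_C}{f}\right), \qquad
\theta_{x,R,i} = \arctan\!\left(\frac{Y_{R,i} - Y_C}{f}\right) \tag{1}
```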

For each set of face feature points, the calibration parameter calculation unit 14 obtains the rotation components θ_{x,L,i} and θ_{x,R,i} according to equation (1) and obtains the difference between them, (θ_{x,L,i} - θ_{x,R,i}). The calibration parameter calculation unit 14 then takes the average of these differences as the horizontal calibration parameter θ_x.

FIG. 5C is a schematic diagram showing the relationship between the horizontal difference from the image center to a face feature point and the rotation components θ_{y,L,i} and θ_{y,R,i} from the vertical plane. As shown in FIG. 5C, the rotation components θ_{y,L,i} and θ_{y,R,i} are each calculated according to equation (2).
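
The image of equation (2) is likewise not reproduced here. Under the same assumption that the image plane lies at the focal length f from the viewpoint, with the horizontal offset X - X_C taking the place of the vertical offset, a consistent form would be the following. It is again a reconstruction rather than the published formula.

```latex
\theta_{y,L,i} = \arctan\!\left(\frac{X_{L,i} - X_C}{f}\right), \qquad
\theta_{y,R,i} = \arctan\!\left(\frac{X_{R,i} - X_C}{f}\right) \tag{2}
```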

For each set of face feature points, the calibration parameter calculation unit 14 obtains the rotation components θ_{y,L,i} and θ_{y,R,i} according to equation (2) and obtains the difference between them, (θ_{y,L,i} - θ_{y,R,i}). The calibration parameter calculation unit 14 then takes the average of these differences as the vertical calibration parameter θ_y.

The calibration parameter calculation unit 14 may instead obtain the horizontal calibration parameter θ_x and the vertical calibration parameter θ_y as weighted averages of the rotation angle differences of the sets of face feature points. In that case, the calibration parameter calculation unit 14 preferably makes the weighting coefficient for the rotation angle difference of a set of face feature points smaller as the average distance from the image center (X_C, Y_C) to the face feature points in that set becomes smaller. Because the image center corresponds to the optical axis, a face feature point closer to the image center has rotation components with smaller absolute values relative to the horizontal and vertical planes. Consequently, for a set of face feature points closer to the image center, the error contained in the difference between the rotation components of the left and right images is relatively larger.
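
The averaging of rotation-component differences can be sketched as follows. The arctangent form used for the rotation components is an assumption based on the stated pinhole geometry, since the equation images are not available, and the choice of weights proportional to the mean distance from the image center is only one possible way to give smaller weights to sets near the center. The function name and argument layout are hypothetical.

```python
import numpy as np

def rotation_offsets(left_pts, right_pts, center, f, weighted=False):
    """Estimate (theta_x, theta_y) in radians from paired face feature points.

    left_pts, right_pts: (N, 2) arrays of (X, Y) pixel coordinates, where
    row i of each array belongs to the i-th set of face feature points.
    center: (X_C, Y_C); f: focal length in pixels.
    """
    L = np.asarray(left_pts, dtype=float) - center
    R = np.asarray(right_pts, dtype=float) - center
    # Rotation components from the vertical offsets (around the x-axis)
    d_theta_x = np.arctan(L[:, 1] / f) - np.arctan(R[:, 1] / f)
    # Rotation components from the horizontal offsets (around the y-axis)
    d_theta_y = np.arctan(L[:, 0] / f) - np.arctan(R[:, 0] / f)
    if weighted:
        # Assumed weighting: mean distance of the pair from the image center
        # (np.average raises an error if every weight is zero)
        w = 0.5 * (np.hypot(L[:, 0], L[:, 1]) + np.hypot(R[:, 0], R[:, 1]))
    else:
        w = np.ones(len(L))
    return (float(np.average(d_theta_x, weights=w)),
            float(np.average(d_theta_y, weights=w)))
```

With `weighted=False` the function returns the plain averages; with `weighted=True` sets far from the image center dominate the estimate.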

The calibration parameter calculation unit 14 also obtains the rotation direction calibration parameter θ_z based on the sets of background feature points. To do so, it first cancels, for each set of background feature points, the shift in feature point position on the image caused by the differences in camera rotation angle around the x-axis and around the y-axis. For this purpose, the calibration parameter calculation unit 14 corrects the position of each background feature point extracted from the right image according to equation (3), using the calibration parameters θ_x and θ_y.

In equation (3), (x_{R,j}, y_{R,j}) (where j is an integer of 1 or more) are the horizontal and vertical coordinates of the background feature point on the right image in the j-th set of background feature points, and (x'_{R,j}, y'_{R,j}) are the corrected horizontal and vertical coordinates of that background feature point. To correct the position of each background feature point on the left image instead of the right image, the calibration parameter calculation unit 14 replaces θ_x and θ_y in equation (3) with (-θ_x) and (-θ_y), respectively. The calibration parameter calculation unit 14 may also move the background feature points of each image by half of the calibration parameters θ_x and θ_y. In this case, for the background feature points on the left image, θ_x and θ_y in equation (3) are replaced with (-θ_x/2) and (-θ_y/2), respectively, and for the background feature points on the right image, they are replaced with (θ_x/2) and (θ_y/2), respectively.

Thereafter, the calibration parameter calculation unit 14 rotates each background feature point on the left image and each background feature point on the right image on the image by an angle of Δθ/2 in mutually opposite directions about the image center (X_C, Y_C). While changing the rotation angle Δθ of the right image relative to the left image little by little, the calibration parameter calculation unit 14 takes, as the rotation direction calibration parameter θ_z, the rotation angle Δθ at which the average, over the sets of background feature points, of the difference between the left and right vertical coordinates is smallest.
The calibration parameter calculation unit 14 may instead obtain the rotation direction calibration parameter θ_z using a weighted average of the vertical coordinate differences of the sets of background feature points. In this case too, the calibration parameter calculation unit 14 preferably makes the weighting coefficient for the vertical coordinate difference of a set of background feature points smaller as the average distance from the image center (X_C, Y_C) to the background feature points in that set becomes smaller.
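
The search for θ_z described above can be sketched as a simple grid search. The search range `max_angle`, the step size, the use of the mean absolute vertical difference as the cost, and the sign convention of the rotation are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def rotate(pts, center, angle):
    """Rotate (x, y) points about `center` by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    d = np.asarray(pts, dtype=float) - center
    rotated = np.column_stack((c * d[:, 0] - s * d[:, 1],
                               s * d[:, 0] + c * d[:, 1]))
    return rotated + center

def estimate_theta_z(left_pts, right_pts, center, max_angle=0.1, step=1e-3):
    """Try rotation angles in [-max_angle, max_angle]; rotate the left and
    right points by half the angle in opposite directions and keep the angle
    that gives the smallest mean vertical difference between paired points."""
    best_angle, best_cost = 0.0, np.inf
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        l = rotate(left_pts, center, +angle / 2)
        r = rotate(right_pts, center, -angle / 2)
        cost = np.mean(np.abs(l[:, 1] - r[:, 1]))
        if cost < best_cost:
            best_angle, best_cost = angle, cost
    return float(best_angle)
```

The input points are assumed to have already been corrected for the rotation differences around the x-axis and the y-axis, as the description requires before this step.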

Local changes in pixel value within the background region may be small, for example when there is only a wall behind the person who is the subject. In such a case, fewer sets of background feature points are extracted, and their number may be insufficient to obtain the rotation direction calibration parameter θ_z. The calibration parameter calculation unit 14 may therefore obtain the distribution of local gradient orientations in the background region of each of the left image and the right image, and take the difference between the peak angles of the two distributions as the rotation direction calibration parameter θ_z.

FIG. 6A is a diagram illustrating the positional relationship of the pixels used to calculate the local gradient. As shown in FIG. 6A, for a pixel of interest (i, j) in the background region 600, the calibration parameter calculation unit 14 refers to, for example, the pixel (i-1, j) adjacent on the left and the pixel (i, j-1) adjacent above. The calibration parameter calculation unit 14 then obtains the orientation φ_{ij} of the local gradient at the pixel (i, j) according to equation (4).

In equation (4), V_{i,j}, V_{i-1,j}, and V_{i,j-1} are the pixel values of the pixels (i, j), (i-1, j), and (i, j-1), respectively. When the left image and the right image are color images, these pixel values can be, for example, the values of any one color component. For each pixel in the background region, the calibration parameter calculation unit 14 obtains the orientation angle according to equation (4), and then counts the number of pixels for each orientation angle.

FIG. 6B is a schematic diagram of the distributions of local gradient orientations obtained from the background regions of the left image and the right image. The histogram 610 represents the distribution of local gradient orientations obtained from the background region of the left image, and the histogram 620 represents the distribution obtained from the background region of the right image. The horizontal axis represents the orientation angle, and the vertical axis represents the number of pixels. In the histogram 610, for example, the orientation angle φ_a has the largest number of pixels, that is, φ_a is the peak angle. In the histogram 620, the orientation angle φ_b is the peak angle. In this case, the calibration parameter calculation unit 14 obtains the difference between the orientation angles, (φ_a - φ_b), as the rotation direction calibration parameter θ_z.
The calibration parameter calculation unit 14 passes the set of calibration parameters (θ_x, θ_y, θ_z) to the determination unit 15.
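
The histogram-based estimate of θ_z can be sketched as follows. Because the image of equation (4) is not available, the orientation is computed here with `arctan2` on the differences to the left and upper neighbors, which is an assumption about its form. The number of bins, the minimum gradient magnitude, the wrapping of the result, and the function names are also illustrative choices.

```python
import numpy as np

def peak_gradient_orientation(img, background_mask, bins=180, min_mag=1e-6):
    """Return the center of the most populated orientation bin (radians).

    img is indexed as img[row, col]; the pixel (i, j) of the description
    corresponds to img[j, i]. background_mask is a boolean array.
    """
    img = img.astype(float)
    dx = np.zeros_like(img)
    dy = np.zeros_like(img)
    dx[:, 1:] = img[:, 1:] - img[:, :-1]   # V[i,j] - V[i-1,j] (left neighbor)
    dy[1:, :] = img[1:, :] - img[:-1, :]   # V[i,j] - V[i,j-1] (upper neighbor)
    valid = background_mask & (np.hypot(dx, dy) > min_mag)
    valid[0, :] = False                    # first row has no upper neighbor
    valid[:, 0] = False                    # first column has no left neighbor
    if not valid.any():
        raise ValueError("no background pixel with a usable gradient")
    phi = np.arctan2(dy[valid], dx[valid])
    hist, edges = np.histogram(phi, bins=bins, range=(-np.pi, np.pi))
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])

def theta_z_from_orientation(left, left_mask, right, right_mask, bins=180):
    """Difference of the peak orientation angles, wrapped to [-pi, pi)."""
    d = (peak_gradient_orientation(left, left_mask, bins)
         - peak_gradient_orientation(right, right_mask, bins))
    return float((d + np.pi) % (2 * np.pi) - np.pi)
```

The resolution of the estimate is limited by the bin width, which is 2 degrees with the default of 180 bins.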

The determination unit 15 determines whether the calibration parameters calculated by the calibration parameter calculation unit 14 are appropriate. To display a stereo image correctly in three dimensions, it is important that the same object appears at the same height in the left image and the right image corrected with the calibration parameters, that is, that the epipolar lines for the same object are at the same height in both images. The determination unit 15 therefore determines whether the calibration parameters are appropriate based on the magnitude of the vertical deviation of each set of face feature points or each set of background feature points in the corrected left and right images. An epipolar line is the line obtained by projecting the line that connects one camera and a point of interest on the subject onto the image captured by the other camera.

For example, the determination unit 15 uses the above set of calibration parameters to transform the coordinates of the face feature points and the background feature points of the right image, and thereby obtains the coordinates of each feature point on the corrected right image. To do so, the determination unit 15 first corrects the position of each feature point according to, for example, equation (3) above. The determination unit 15 then transforms the corrected coordinates of each feature point according to equation (5), thereby correcting the positional deviation caused by the difference in rotation angle between camera 2-1 and camera 2-2 around the z-axis, that is, around the optical axis.

Here, (x_R,k, y_R,k) (k is an integer of 1 or more) are the horizontal and vertical coordinates of the background feature point or face feature point on the right image, included in the k-th set of background feature points or face feature points, after the positional deviation caused by the difference in camera rotation angles around the x-axis and y-axis has been corrected. Likewise, (x"_R,k, y"_R,k) are the horizontal and vertical coordinates of that feature point on the right image after the positional deviation caused by the difference in camera rotation angle around the optical axis has been corrected. When the positions of the feature points on the left image are corrected instead of those extracted from the right image, the determination unit 15 may replace θ_x, θ_y, and θ_z in equations (3) and (5) with (−θ_x), (−θ_y), and (−θ_z), respectively. The determination unit 15 may also move the feature points of each image by half of each of the calibration parameters θ_x, θ_y, and θ_z. In this case, for feature points on the left image, θ_x, θ_y, and θ_z in equations (3) and (5) are replaced with (−θ_x/2), (−θ_y/2), and (−θ_z/2), respectively. For feature points on the right image, θ_x, θ_y, and θ_z in equations (3) and (5) are replaced with (θ_x/2), (θ_y/2), and (θ_z/2), respectively.
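Equation (5) itself does not appear in this text. As a rough sketch only, assuming the correction is a plain in-plane rotation by θ_z about the image coordinate origin (both the origin and the sign convention are assumptions, not taken from the patent), it could take a form such as:

```latex
\begin{pmatrix} x''_{R,k} \\ y''_{R,k} \end{pmatrix}
=
\begin{pmatrix} \cos\theta_z & -\sin\theta_z \\ \sin\theta_z & \cos\theta_z \end{pmatrix}
\begin{pmatrix} x_{R,k} \\ y_{R,k} \end{pmatrix}
```

Under this form, replacing θ_z with −θ_z yields the inverse rotation, which is consistent with the substitution described for correcting the left image instead of the right image.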

The determination unit 15 determines that the set of calibration parameters (θ_x, θ_y, θ_z) is appropriate if the average of the absolute values of the vertical coordinate deviations between the feature points included in each set of feature points is within a predetermined allowable range. If the average is outside the allowable range, the determination unit 15 determines that the set of calibration parameters is inappropriate. For example, when a stereo image is displayed by an interlaced method, a deviation of one pixel in the vertical direction is not perceived, so the allowable range is set to ±1 pixel.
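The acceptance test described above can be sketched as follows. The sketch assumes the feature point coordinates have already been corrected, and represents each set as a pair of (x, y) tuples. The function names and the data layout are hypothetical.

```python
def mean_abs_vertical_shift(pairs):
    """Average |y_left - y_right| over corrected feature point pairs.

    pairs: list of ((x_left, y_left), (x_right, y_right)) tuples.
    """
    shifts = [abs(y_left - y_right) for (_, y_left), (_, y_right) in pairs]
    if not shifts:
        raise ValueError("at least one feature point pair is required")
    return sum(shifts) / len(shifts)


def parameters_are_appropriate(pairs, tolerance_px=1.0):
    """True when the mean absolute vertical shift is within the tolerance."""
    return mean_abs_vertical_shift(pairs) <= tolerance_px


# Hypothetical corrected pairs with vertical shifts of 0.5 and 1.5 pixels.
good_pairs = [((10, 100.0), (12, 100.5)), ((50, 200.0), (55, 198.5))]
bad_pairs = [((10, 100.0), (12, 102.0)), ((50, 200.0), (55, 197.0))]
```

With the ±1 pixel range mentioned for interlaced display, `good_pairs` would be accepted and `bad_pairs` rejected.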

The allowable range for the average absolute vertical coordinate deviation within the sets of face feature points may differ from the allowable range for the average absolute vertical coordinate deviation within the sets of background feature points. For example, when the quality of the three-dimensional display image of the face is considered more important than that of the objects in the background region, the allowable range for the sets of face feature points may be set narrower than the allowable range for the sets of background feature points. For example, the allowable range for the sets of face feature points may be ±1 pixel, and the allowable range for the sets of background feature points may be ±2 pixels.
To evaluate the vertical coordinate deviation between the feature points included in each set of feature points, the determination unit 15 may calculate the average of the squares of the deviations instead of the average of their absolute values. In this case, the determination unit 15 determines that the set of calibration parameters (θ_x, θ_y, θ_z) is appropriate when the average of the squares of the vertical coordinate deviations is equal to or less than a predetermined threshold.
Furthermore, the determination unit 15 may obtain the average vertical coordinate deviation between feature points as described above using only the sets of face feature points or only the sets of background feature points, and determine whether the set of calibration parameters (θ_x, θ_y, θ_z) is appropriate according to whether that average falls within the allowable range.
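The mean-square variant with separate limits for face and background sets might look like the sketch below. The thresholds of 1.0 and 4.0 (the squares of 1 and 2 pixels) and all names are illustrative assumptions. The patent only states that a predetermined threshold is used.

```python
def mean_squared_vertical_shift(pairs):
    """Average (y_left - y_right)**2 over corrected feature point pairs."""
    squares = [(y_left - y_right) ** 2 for (_, y_left), (_, y_right) in pairs]
    if not squares:
        raise ValueError("at least one feature point pair is required")
    return sum(squares) / len(squares)


def appropriate_by_group(face_pairs, background_pairs,
                         face_threshold=1.0, background_threshold=4.0):
    """Apply a tighter limit to face pairs than to background pairs (assumed values)."""
    return (mean_squared_vertical_shift(face_pairs) <= face_threshold
            and mean_squared_vertical_shift(background_pairs) <= background_threshold)
```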

When the determination unit 15 determines that the set of calibration parameters (θ_x, θ_y, θ_z) is appropriate, it stores the set in the storage unit 4 and notifies the processing unit 5 that an appropriate set of calibration parameters has been obtained. Thereafter, each time a pair of left and right images is obtained from cameras 2-1 and 2-2, the processing unit 5 uses the set of calibration parameters to transform the coordinates of each pixel of at least one of the images according to equations (3) and (5), thereby obtaining a pair of stereo images.
On the other hand, when the determination unit 15 determines that the set of calibration parameters (θ_x, θ_y, θ_z) is inappropriate, it discards the set and notifies the processing unit 5 accordingly. The processing unit 5 then acquires a left image and a right image again from cameras 2-1 and 2-2, and obtains a set of calibration parameters again based on the newly acquired left and right images.

The determination unit 15 may also use the set of calibration parameters (θ_x, θ_y, θ_z) to generate a corrected image of a right image or left image obtained after the set was calculated, and extract sets of face feature points and sets of background feature points again from the corrected image. The determination unit 15 may then determine whether the set of calibration parameters is appropriate according to whether the average of the absolute values of the vertical coordinate deviations between the feature points in those sets falls within the predetermined allowable range.

FIG. 7 is an operation flowchart of the calibration process (stereo image calibration process) executed by the processing unit 5.
The processing unit 5 acquires a left image from camera 2-1 and a right image from camera 2-2 (step S101). The processing unit 5 then passes the left image and the right image to the face detection unit 11, the face feature point extraction unit 12, and the background feature point extraction unit 13. The face detection unit 11 detects a face region in each of the left image and the right image (step S102). The face detection unit 11 then passes information representing the face region and the background region to the face feature point extraction unit 12 and the background feature point extraction unit 13.

The face feature point extraction unit 12 extracts, from the face regions of the left image and the right image, at least one set of face feature points corresponding to the same point on the face (step S103). The face feature point extraction unit 12 then stores the coordinates of each feature point included in the set of face feature points in the storage unit 4.
The background feature point extraction unit 13 extracts, from the background regions of the left image and the right image, at least one set of background feature points corresponding to the same point on the same object (step S104). The background feature point extraction unit 13 then stores the coordinates of each feature point included in the set of background feature points in the storage unit 4.

The calibration parameter calculation unit 14 calculates the horizontal and vertical calibration parameters θ_x and θ_y on the image from the sets of face feature points (step S105). The calibration parameter calculation unit 14 also calculates the rotation direction calibration parameter θ_z from the sets of background feature points (step S106). The calibration parameter calculation unit 14 then passes the set of calibration parameters (θ_x, θ_y, θ_z) to the determination unit 15.

The determination unit 15 determines whether the set of calibration parameters (θ_x, θ_y, θ_z) is appropriate (step S107). If the set of calibration parameters (θ_x, θ_y, θ_z) is not appropriate (step S107: No), the processing unit 5 repeats the processing from step S101. If the set of calibration parameters (θ_x, θ_y, θ_z) is appropriate (step S107: Yes), the determination unit 15 stores the set in the storage unit 4, and the calibration process then ends. The processing unit 5 may swap the order of steps S103 and S104.
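The flow of steps S101 to S107 can be summarised in a short sketch. Every callable passed in is a hypothetical stand-in for the corresponding unit, and the `max_attempts` bound is an added assumption. The flowchart as described simply repeats from step S101 until an appropriate set is obtained.

```python
def calibrate(acquire_images, detect_faces, extract_face_pairs,
              extract_background_pairs, compute_xy, compute_rotation,
              is_appropriate, store, max_attempts=10):
    """Repeat steps S101 to S107 until an appropriate parameter set is found."""
    for _ in range(max_attempts):
        left, right = acquire_images()                                   # S101
        regions = detect_faces(left, right)                              # S102
        face_pairs = extract_face_pairs(left, right, regions)            # S103
        background_pairs = extract_background_pairs(left, right, regions)  # S104
        theta_x, theta_y = compute_xy(face_pairs)                        # S105
        theta_z = compute_rotation(background_pairs)                     # S106
        params = (theta_x, theta_y, theta_z)
        if is_appropriate(params, face_pairs, background_pairs):         # S107
            store(params)
            return params
    return None  # no appropriate set within the assumed attempt limit
```

Because steps S103 and S104 do not depend on each other, swapping the two extraction calls leaves the result unchanged, which matches the remark that their order may be exchanged.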

As described above, this stereo image calibration apparatus detects a face, which is known in advance to be the subject, from the left and right images by using its features, and obtains the horizontal and vertical calibration parameters based on feature points within the region in which the face appears. The apparatus can therefore correct the position of the face, which is the subject, in the left and right images so that it suits a stereo image. The apparatus also obtains the rotation direction calibration parameter based on the sets of background feature points in the background region. Because the background region is usually far from the image center, which corresponds to the optical axis, it readily reflects the difference in rotation angle around the optical axis between the cameras. The apparatus can therefore appropriately correct the rotation of the image between the left and right images.

According to a modification, the calibration parameter calculation unit may obtain the rotation direction calibration parameter θ_z using not only the sets of background feature points but also the sets of face feature points. The calibration parameter calculation unit can then refer to sets of feature points distributed over a wider range of the image, and can therefore determine the rotation direction calibration parameter θ_z more appropriately. In particular, the number of extracted sets of background feature points may be small, for example when a wall without any pattern appears in the background region. In such a case, the calibration parameter calculation unit can obtain the rotation direction calibration parameter accurately by also using the sets of face feature points.

According to yet another modification, the background feature point extraction unit may limit the region from which background feature points are extracted to a region in which the same object is likely to appear in both the left image and the right image.
Near the face region, occlusion occurs: objects behind the head of the person appearing in the face region are hidden by the head. In this embodiment, because camera 2-1 and camera 2-2 are at different positions, the range hidden by the head in the left image differs from the range hidden by the head in the right image. The background feature point extraction unit therefore removes from the background region the region within a predetermined range of the face region, and extracts background feature points from the remaining region.

This will be described with reference to FIGS. 8A and 8B. FIG. 8A is a schematic diagram showing the part of the background region captured in the left image that is hidden by the head in the right image. FIG. 8B is a schematic diagram showing the part of the background region captured in the right image that is hidden by the head in the left image.
In FIGS. 8A and 8B, points 801 and 802 represent the viewpoints of camera 2-1 and camera 2-2, respectively. Lines 803 and 804 represent the imaging plane of camera 2-1 and the imaging plane of camera 2-2, respectively. Line 805 represents the extent of the head in real space. In this example, the head 805 is approximated by a plane for simplicity. Line 806 represents the object farthest from cameras 2-1 and 2-2. In this case, an object in the region 807 appears in the left image but is hidden behind the head 805 and does not appear in the right image. The background feature point extraction unit therefore excludes, in the left image, the range from the horizontal coordinate x_L,0,max, which corresponds to the left end of the region 807, to the horizontal coordinate x_L,0, which corresponds to the right end of the region 807, from the range from which background feature points are extracted.

In this case, because the right end of the region 807 is defined by the head 805 appearing in the face region, the horizontal coordinate x_L,0 on the left image is the coordinate of the left end of the face region. The horizontal coordinate x_L,0,max is the position corresponding to the coordinate x_R,0 of the left end of the face region on the right image. If the distance Z_max from cameras 2-1 and 2-2 to the farthest object is known, then, because the distance between camera 2-1 and camera 2-2 is usually also known, the horizontal coordinate x_L,0,max on the left image corresponding to the coordinate x_R,0 on the right image can be calculated based on the principle of triangulation. If the distance Z_max is unknown, the background feature point extraction unit assumes Z_max = ∞, that is, it assumes that the line passing through the viewpoint 802 of camera 2-2 and the coordinate x_R,0 is parallel to the line passing through the viewpoint 801 of camera 2-1 and the horizontal coordinate x_L,0,max, and sets the horizontal coordinate x_L,0,max to the coordinate x_R,0.

Similarly, the region that appears in the right image but is hidden by the head 805 and therefore does not appear in the left image is the region 808 shown in FIG. 8B. The background feature point extraction unit therefore excludes, in the right image, the range from the horizontal coordinate x_R,1, which corresponds to the left end of the region 808, to the horizontal coordinate x_R,1,max, which corresponds to the right end of the region 808, from the range from which background feature points are extracted. Because the left end of the region 808 is defined by the head 805 appearing in the face region, the horizontal coordinate x_R,1 is the coordinate of the right end of the face region. The horizontal coordinate x_R,1,max is the position corresponding to the coordinate x_L,1 of the right end of the face region on the left image. If the distance Z_max from cameras 2-1 and 2-2 to the farthest object is known, the horizontal coordinate x_R,1,max can be calculated based on the principle of triangulation in the same way as the horizontal coordinate x_L,0,max. If the distance Z_max is unknown, the background feature point extraction unit assumes Z_max = ∞ and sets the horizontal coordinate x_R,1,max to the coordinate x_L,1.
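The triangulation step for the two exclusion bounds can be sketched as follows. The sketch assumes parallel optical axes, image x coordinates increasing to the right, and a focal length expressed in pixels (`focal_px`). None of these is stated for this step in the text, and the function names are hypothetical. Under a pinhole model, `focal_px` could be taken as W / (2 tan(θ/2)) for an image W pixels wide with horizontal angle of view θ.

```python
import math


def disparity_px(baseline, depth, focal_px):
    """Disparity x_left - x_right (pixels) of a point at the given depth."""
    if math.isinf(depth):
        return 0.0  # Z_max = infinity: the two viewing rays are parallel
    return focal_px * baseline / depth


def left_exclusion_start(x_r0, baseline, focal_px, z_max=math.inf):
    """x_L,0,max: left-image column matching right-image column x_R,0 at depth Z_max."""
    return x_r0 + disparity_px(baseline, z_max, focal_px)


def right_exclusion_end(x_l1, baseline, focal_px, z_max=math.inf):
    """x_R,1,max: right-image column matching left-image column x_L,1 at depth Z_max."""
    return x_l1 - disparity_px(baseline, z_max, focal_px)
```

When `z_max` is left at infinity, the functions return x_R,0 and x_L,1 unchanged, which corresponds to the fallback described for an unknown Z_max.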

FIGS. 9A and 9B are schematic diagrams showing the regions excluded from the background region in the left image and the right image, respectively, for the extraction of background feature points. As shown in FIG. 9A, in the left image 900, the range 902 from the horizontal coordinate x_L,0,max to x_L,0, adjacent to the left side of the face region 910, is excluded from the background region 901 as a region in which background feature points are not extracted. As shown in FIG. 9B, in the right image 905, the range 907 from the horizontal coordinate x_R,1 to x_R,1,max, adjacent to the right side of the face region 915, is excluded from the background region 906 as a region in which background feature points are not extracted. As a result, for example, the object point 920, which is hidden by the head in the left image 900, falls outside the background feature point extraction target region in the right image 905. When the face region is not rectangular, the background feature point extraction unit may, for the left image, take the coordinate of the left end of the face region in each row as x_L,0, and take as x_L,0,max the coordinate located to its left by the width from x_L,0,max to x_L,0 obtained as described above.
Similarly, for the right image, the background feature point extraction unit may take the coordinate of the right end of the face region in each row as x_R,1, and take as x_R,1,max the coordinate located to its right by the width from x_R,1 to x_R,1,max obtained as described above.

FIGS. 10A and 10B are schematic diagrams of another example of the regions excluded from the background region in the left image and the right image, respectively, for the extraction of background feature points. As shown in FIG. 10A, in the left image 1000, an object appearing in the region 1001 of width d near the left end, such as the object point 1020, does not appear in the right image. As shown in FIG. 10B, in the right image 1010, an object appearing in the region 1011 of width d near the right end does not appear in the left image. The background feature point extraction unit may therefore refrain from extracting background feature points from the region 1001 of the left image and the region 1011 of the right image.

A method for determining the width d will be described with reference to FIGS. 11 to 13. FIG. 11 is a schematic diagram showing the relationship between the imaging range of camera 2-1, which generates the left image, and the imaging range of camera 2-2, which generates the right image. In this example, the optical axis of the imaging optical system of camera 2-1 is assumed to be parallel to the optical axis of the imaging optical system of camera 2-2.
In FIG. 11, points 1101 and 1102 represent the viewpoints of camera 2-1 and camera 2-2, respectively. Lines 1103 and 1104 represent the imaging plane of camera 2-1 and the imaging plane of camera 2-2, respectively. The horizontal angles of view of cameras 2-1 and 2-2 are both assumed to be θ. Line 1105 represents the object farthest from cameras 2-1 and 2-2. In this case, the horizontal width of the object 1105 captured in the left image is 2 Z_max tan(θ/2), where Z_max is the distance from camera 2-1 to the object 1105. Similarly, the horizontal width of the object 1105 captured in the right image is 2 Z_max tan(θ/2). If the distance between camera 2-1 and camera 2-2 is T, an object in the region 1107, which extends from the point 1106 on the object 1105 corresponding to the left end of the left image to the point at distance T from it, does not appear in the right image. The distance d from the left end of the left image within which background feature points are not extracted is therefore calculated by equation (6).

Here, W is the number of pixels in the horizontal direction of the left image. Similarly, the distance d from the right end of the right image within which background feature points are not extracted is also calculated by equation (6).
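Equation (6) itself does not appear in this text. From the quantities defined above, a plausible reconstruction (offered as a sketch, not as the patent's exact expression) follows from proportionality: the W pixels of the image span a width of 2 Z_max tan(θ/2) on the object 1105, and the uncaptured strip spans a width T, so

```latex
d = W \cdot \frac{T}{2\, Z_{\max} \tan(\theta/2)}
```

In this form d tends to 0 as Z_max grows, so a very distant background would need almost no exclusion strip at the image edge.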

The optical axis of the imaging optical system of camera 2-1 may not be parallel to the optical axis of the imaging optical system of camera 2-2.
FIG. 12 is a schematic diagram of the imaging ranges of cameras 2-1 and 2-2 when the distance Z' from cameras 2-1 and 2-2 to the plane on which the range captured in the left image coincides with the range captured in the right image is greater than the distance Z_max from cameras 2-1 and 2-2 to the farthest object captured in the left and right images.
In FIG. 12, points 1201 and 1202 represent the viewpoints of camera 2-1 and camera 2-2, respectively. Lines 1203 and 1204 represent the imaging plane of camera 2-1 and the imaging plane of camera 2-2, respectively. The horizontal angles of view of cameras 2-1 and 2-2 are both assumed to be θ. The object 1205 is the object farthest from cameras 2-1 and 2-2. Further, the optical axis of the imaging optical system of camera 2-1 and the optical axis of the imaging optical system of camera 2-2 are assumed to intersect on the object side at an angle α, each being inclined by α/2 with respect to the normal of the object 1205. This α is, for example, a design value.

Here, let P be the point on the object 1205 corresponding to the left end of the right image. The distance l between P and the foot of the perpendicular dropped from the viewpoint 1202 of camera 2-2 onto the object 1205 is Z_max tan(θ/2 + α/2). Meanwhile, the angle β between the straight line passing through the viewpoint 1201 of camera 2-1 and P and the line of sight is (α/2 + tan⁻¹((l − T)/Z_max)). Here, T is the distance between camera 2-1 and camera 2-2. Accordingly, letting W be the number of pixels in the horizontal direction of the left image and the right image, the width d, measured from the image edge in the other image, of the region that does not appear in one image is obtained by the following equation.
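The equation referred to here does not appear in this text. As a hedged sketch only, assume an ideal pinhole projection in which the image half-width W/2 corresponds to tan(θ/2), take the line of sight to be the optical axis of camera 2-1, and assume the uncaptured strip lies between the image edge (at angle θ/2 from the optical axis) and the direction of P (at angle β). Under those assumptions the width could be written as:

```latex
d = \frac{W}{2}\left(1 - \frac{\tan\beta}{\tan(\theta/2)}\right),
\qquad
\beta = \frac{\alpha}{2} + \tan^{-1}\!\left(\frac{l - T}{Z_{\max}}\right),
\qquad
l = Z_{\max}\tan\!\left(\frac{\theta}{2} + \frac{\alpha}{2}\right)
```

As a consistency check, setting α = 0 gives tan β = tan(θ/2) − T/Z_max and hence d = W·T / (2 Z_max tan(θ/2)), which is the proportion implied by the parallel-axis geometry of FIG. 11.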

FIG. 13 is a schematic diagram of the imaging ranges of cameras 2-1 and 2-2 when the distance Z' from cameras 2-1 and 2-2 to the plane on which the range captured in the left image coincides with the range captured in the right image is smaller than the distance Z_max from cameras 2-1 and 2-2 to the farthest object captured in the left and right images.
In FIG. 13, points 1301 and 1302 represent the viewpoints of camera 2-1 and camera 2-2, respectively. Lines 1303 and 1304 represent the imaging plane of camera 2-1 and the imaging plane of camera 2-2, respectively. The horizontal angles of view of cameras 2-1 and 2-2 are both assumed to be θ. The object 1305 is the object farthest from cameras 2-1 and 2-2 (at distance Z_max). Further, the optical axis of the imaging optical system of camera 2-1 and the optical axis of the imaging optical system of camera 2-2 are assumed to intersect on the object side at an angle α, each being inclined by α/2 with respect to the normal of the object 1305.
Here, let Q be the point on the object 1305 corresponding to the left end of the right image. The distance m between Q and the foot of the perpendicular dropped from the viewpoint 1301 of camera 2-1 onto the object 1305 is Z_max tan(θ/2 − α/2). Meanwhile, the angle γ between the straight line passing through the viewpoint 1302 of camera 2-2 and Q and the optical axis 1306 of camera 2-2 is (tan⁻¹((m + T)/Z_max) − α/2). Here, T is the distance between camera 2-1 and camera 2-2. Accordingly, letting W be the number of pixels in the horizontal direction of the left image and the right image, the width d, measured from the image edge in the other image, of the region that does not appear in one image is obtained by the following equation.
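The equation referred to here does not appear in this text either. As a hedged sketch only, assume an ideal pinhole projection in which the image half-width W/2 corresponds to tan(θ/2), and assume Q is the point where the edge of the field of view of camera 2-1 meets the object 1305 (which is what m = Z_max tan(θ/2 − α/2), measured from the viewpoint 1301, suggests). The uncaptured strip would then lie at the edge of the image from camera 2-2, between the edge direction (angle θ/2) and the direction of Q (angle γ), giving:

```latex
d = \frac{W}{2}\left(1 - \frac{\tan\gamma}{\tan(\theta/2)}\right),
\qquad
\gamma = \tan^{-1}\!\left(\frac{m + T}{Z_{\max}}\right) - \frac{\alpha}{2},
\qquad
m = Z_{\max}\tan\!\left(\frac{\theta}{2} - \frac{\alpha}{2}\right)
```

This expression is positive only when γ < θ/2, which under these assumptions holds exactly when Z_max exceeds Z', the situation of FIG. 13.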

Furthermore, the background feature point extraction unit may extract background feature points only from the region that remains after removing from the background region both the regions near the left end of the left image and near the right end of the right image described above and the region near the face region in which occlusion by the head occurs.
FIGS. 14A and 14B are schematic diagrams of yet another example of the regions excluded from the background region in the left image and the right image, respectively, for the extraction of background feature points. As shown in FIG. 14A, in the left image 1400, the region 1405 that remains after removing from the background region the region 1402 of width d near the left end of the left image and the region 1404 from the horizontal coordinate x_L,0,max to x_L,0 near the left end of the face region 1403 is the target of background feature point extraction. As shown in FIG. 14B, in the right image 1410, the region 1415 that remains after removing from the background region the region 1412 of width d near the right end of the right image and the region 1414 from the horizontal coordinate x_R,1 to x_R,1,max near the right end of the face region 1413 is the target of background feature point extraction.
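The combined exclusion of FIGS. 14A and 14B can be expressed as a simple column test. This is a minimal sketch that assumes pixel columns are numbered from 0 at the left edge and that the face region itself has already been removed from the background. The function names are hypothetical.

```python
def left_column_allowed(x, d, x_l0_max, x_l0):
    """True if background column x of the left image may yield feature points."""
    in_edge_strip = x < d                        # strip of width d at the left end
    in_occlusion_strip = x_l0_max <= x <= x_l0   # strip left of the face region
    return not (in_edge_strip or in_occlusion_strip)


def right_column_allowed(x, width, d, x_r1, x_r1_max):
    """True if background column x of the right image may yield feature points."""
    in_edge_strip = x >= width - d               # strip of width d at the right end
    in_occlusion_strip = x_r1 <= x <= x_r1_max   # strip right of the face region
    return not (in_edge_strip or in_occlusion_strip)
```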

上述したような背景特徴点抽出の除外範囲の設定は、キャリブレーション済みのステレオカメラで３次元計測を行う装置にも適用が可能である。
例えば、カメラ２−１とカメラ２−２間の間隔は既知であるので、処理部は、各背景特徴点の組に対して、三角測量の原理に基づいてカメラ２−１、２−２からその背景特徴点の組に対応する物体までの距離を求めることができる。同様に、処理部は、各顔特徴点の組に対して、三角測量の原理に基づいてカメラ２−１、２−２からその顔特徴点の組に対応する顔上の点までの距離を求めることができる。
その際、片方のカメラで撮影された画像において被写体により隠蔽されている特徴点、または片方のカメラにしか写らない特徴点が含まれれば、無駄な処理量が増えるとともに、特徴点の誤対応による計測ミスが生じる。この時、上述したように、背景特徴点の抽出を行わない領域を設定することにより、処理部は、処理の効率化と精度の向上を実現できる。 The setting of the exclusion range of background feature point extraction as described above can also be applied to an apparatus that performs three-dimensional measurement with a calibrated stereo camera.
For example, since the distance between the camera 2-1 and the camera 2-2 is known, the processing unit can obtain, for each set of background feature points, the distance from the cameras 2-1 and 2-2 to the object corresponding to that set, based on the principle of triangulation. Similarly, for each set of face feature points, the processing unit can obtain the distance from the cameras 2-1 and 2-2 to the point on the face corresponding to that set, based on the principle of triangulation.
In this case, if the feature points include one that is hidden by the subject in the image taken by one of the cameras, or one that is visible to only one of the cameras, wasted processing increases and measurement errors arise from mismatched feature points. By setting regions in which background feature points are not extracted, as described above, the processing unit can improve both the efficiency and the accuracy of the processing.
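The triangulation step can be sketched as follows. This is a minimal example under a simplifying assumption that the specification does not state: the two cameras are treated as an ideal rectified pair with parallel optical axes, so that distance follows from the horizontal disparity alone. The function name `depth_from_disparity` and the parameters `focal_px` (focal length in pixels) and `baseline` (camera spacing T) are hypothetical.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline):
    """Distance to a point seen at x_left in the left image and x_right
    in the right image, assuming a rectified, parallel-axis camera pair.

    Returns None when the disparity is not positive, which in this model
    means the pair cannot be a valid match.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        return None
    # Similar triangles: Z / baseline = focal_px / disparity
    return focal_px * baseline / disparity
```

A non-positive disparity is exactly the kind of result that a mismatched or one-sided feature point can produce, which is one reason to exclude such points before measurement.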

場合によっては、カメラ２−１の結像光学系の焦点距離とカメラ２−２の結像光学系の焦点距離が異なることがある。このような場合、左画像に写る像の拡大率（以下、カメラ倍率と呼ぶ）が右画像に写る像の拡大率と異なることになる。変形例によれば、処理部は、同一の物体の同一の点に対応する特徴点の組を、このようなカメラ倍率の差異の較正にも利用できる。
ここで、左画像上の二つの特徴点間の距離の平均値と右画像上の二つの特徴点間の距離の平均値は、カメラ倍率の違いを反映していると推測される。特に、カメラまでの距離が相対的に遠い物体上の点については、両方のカメラからその点までの距離がほぼ等しくなるので、各カメラとその点までの距離間の差は無視できるほど小さい。
そこで、例えば、較正パラメータ算出部は、左画像及び右画像のそれぞれについて、カメラから相対的に物体に対応する、複数の背景特徴点から任意の二つの背景特徴点を選択し、その二つの背景特徴点間の距離（画像上の位置の差）を計算する。較正パラメータ算出部は、選択する背景特徴点の組み合わせを様々に変えて、背景特徴点間の距離を多数算出し、その距離の平均値を求める。そして較正パラメータ算出部は、左画像について求めた背景特徴点間の距離の平均値に対する、右画像について求めた背景特徴点間の距離の平均値の比率をスケール比Sとして求める。スケール比Sが決まれば、較正パラメータ算出部は、x軸周りの較正パラメータθ_x及びy軸周りの較正パラメータθ_yを決定する際に、（１）式、（２）式においてX_R,i、Y_R,iにスケール比Sを乗じることで、カメラ倍率の違いを補正できる。 In some cases, the focal length of the imaging optical system of the camera 2-1 may be different from the focal length of the imaging optical system of the camera 2-2. In such a case, the magnification of the image shown in the left image (hereinafter referred to as the camera magnification) is different from the magnification of the image shown in the right image. According to the modification, the processing unit can also use a set of feature points corresponding to the same point of the same object for calibration of the difference in camera magnification.
Here, the average distance between pairs of feature points on the left image and the average distance between pairs of feature points on the right image are presumed to reflect the difference in camera magnification. In particular, for a point on an object relatively far from the cameras, the distances from the two cameras to that point are almost equal, so the difference between them is negligibly small.
Therefore, for example, for each of the left image and the right image, the calibration parameter calculation unit selects any two background feature points from among a plurality of background feature points corresponding to objects relatively far from the cameras, and calculates the distance between those two background feature points (the difference between their positions on the image). The calibration parameter calculation unit calculates many such distances while varying the combination of background feature points selected, and obtains the average of those distances. The calibration parameter calculation unit then obtains, as the scale ratio S, the ratio of the average distance between background feature points obtained for the right image to the average distance between background feature points obtained for the left image. Once the scale ratio S is determined, the calibration parameter calculation unit can correct the difference in camera magnification by multiplying X_{R,i} and Y_{R,i} in equations (1) and (2) by the scale ratio S when determining the calibration parameter θ_x about the x-axis and the calibration parameter θ_y about the y-axis.
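The computation of the scale ratio S can be sketched in Python as below. The function names and the choice to average over every pair of points (rather than a sampled subset of combinations) are assumptions made for this illustration; the two lists are assumed to hold distant background feature points, with the i-th entries of each list forming a matched set.

```python
from itertools import combinations
from math import hypot


def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs of (x, y) points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("at least two points are required")
    total = sum(hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in pairs)
    return total / len(pairs)


def scale_ratio(left_points, right_points):
    """Scale ratio S: right-image average distance divided by the
    left-image average distance, as defined in the text."""
    return mean_pairwise_distance(right_points) / mean_pairwise_distance(left_points)
```

For example, if the right-image points are the left-image points magnified by a factor of 1.1 about the origin, `scale_ratio` returns a value very close to 1.1.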

さらに他の変形例によれば、一旦求めた較正パラメータの組(θ_x,θ_y,θ_z)に基いて補正された右画像または左画像から顔特徴点の組及び背景特徴点の組を求めることにより、カメラから顔までの距離及び各背景特徴点の組に対応する物体までの距離が求められる。その結果、顔から各背景特徴点の組に対応する物体までの相対的な距離が計測できる。計測した距離に基いて、背景特徴点抽出部は、背景の特徴点の組を求める際の、テンプレートマッチングの探索範囲をその距離を中心とする所定の範囲内の距離となる画像上の領域に限定できる。そして背景特徴点抽出部は、限定された範囲内で再度テンプレートマッチングを行うことにより、同一の物体の同一の点に対応している可能性がより高い背景特徴点の組を得ることが可能になる。較正パラメータ算出部は、その背景特徴点の組を使用して、再度、較正パラメータを算出し直せば、より正確な較正パラメータを求めることができる。特に、上記のスケール比の推定においては、較正パラメータ算出部は、遠方の背景特徴点を選択的に利用することにより、カメラ２−１から物体までの距離とカメラ２−２からその物体までの距離の違いに起因するスケール比の測定誤差を低減できる。 According to still another modification, a set of face feature points and a set of background feature points are obtained from the right image or the left image corrected based on the once-obtained set of calibration parameters (θ_x, θ_y, θ_z), which yields the distance from the cameras to the face and the distance to the object corresponding to each set of background feature points. As a result, the relative distance from the face to the object corresponding to each set of background feature points can be measured. Based on the measured distance, the background feature point extraction unit can limit the template matching search range used to obtain a set of background feature points to the area on the image that corresponds to distances within a predetermined range centered on that distance. By performing template matching again within the limited range, the background feature point extraction unit can obtain sets of background feature points that are more likely to correspond to the same point of the same object. The calibration parameter calculation unit can then obtain more accurate calibration parameters by recalculating them using those sets of background feature points.
In particular, in the above-described estimation of the scale ratio, the calibration parameter calculation unit can reduce the measurement error of the scale ratio caused by the difference between the distance from the camera 2-1 to an object and the distance from the camera 2-2 to that object by selectively using distant background feature points.
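One way to picture the restriction of the template matching search range is to convert a distance interval into an interval of horizontal disparities. The sketch below does this under the assumption, not stated in the specification, of an ideal rectified camera pair with parallel optical axes; the function name `disparity_search_range` and its parameters (`focal_px` for the focal length in pixels, `baseline` for the camera spacing) are hypothetical.

```python
def disparity_search_range(distance, tolerance, focal_px, baseline):
    """Return (min_disparity, max_disparity) in pixels for objects whose
    distance lies within distance +/- tolerance.

    Assumes a rectified, parallel-axis pair, where disparity equals
    focal_px * baseline / distance.
    """
    if distance <= 0 or tolerance < 0:
        raise ValueError("distance must be positive and tolerance non-negative")
    near = max(distance - tolerance, 1e-9)  # guard against zero or negative depth
    far = distance + tolerance
    # Nearer objects produce larger disparities, so the bounds swap.
    return focal_px * baseline / far, focal_px * baseline / near
```

Template matching for a background feature point would then only examine horizontal offsets inside the returned interval, instead of the whole image row.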

さらに他の変形例によれば、被写体は、顔ではなく、人の全身あるいは上半身、車両または植物などであってもよい。この場合、処理部は、左画像及び右画像から、それぞれそのような被写体が写っている被写体領域を検出する被写体領域検出部を有する。この被写体領域検出部は、例えば、顔検出部と同様に、予め被写体が写っていることが分かっている複数の画像と被写体が写っていないことが分かっている複数の画像を用いた教師付き学習によって、画像上の被写体領域を抽出するように学習された識別器を用いて被写体領域を抽出する。そして被写体領域が抽出された後、処理部は、上記の実施形態またはその変形例と同様に、左画像及び右画像の被写体領域から被写体の同一の点を表す被写体特徴点の組を求め、背景領域から背景特徴点の組を求める。そして処理部は、被写体特徴点の組から水平方向及び垂直方向の較正パラメータを求め、背景特徴点の組から回転方向較正パラメータを求めればよい。 According to still another modification, the subject may be, instead of a face, a person's whole body or upper body, a vehicle, a plant, or the like. In this case, the processing unit includes a subject region detection unit that detects, from each of the left image and the right image, the subject region in which such a subject appears. For example, like the face detection unit, the subject region detection unit extracts the subject region using a classifier trained to extract the subject region from an image by supervised learning with a plurality of images known in advance to contain the subject and a plurality of images known not to contain the subject. After the subject regions are extracted, the processing unit obtains, as in the above-described embodiment or its modifications, sets of subject feature points representing the same point of the subject from the subject regions of the left image and the right image, and obtains sets of background feature points from the background regions. The processing unit may then obtain the horizontal and vertical calibration parameters from the sets of subject feature points and the rotational calibration parameter from the sets of background feature points.

上記の実施形態またはその変形例による処理部の機能を実現するコンピュータプログラムは、磁気記録媒体、光記録媒体といったコンピュータ読み取り可能な記録媒体に記録された形で提供されてもよい。 The computer program that realizes the function of the processing unit according to the above-described embodiment or its modification may be provided in a form recorded on a computer-readable recording medium such as a magnetic recording medium or an optical recording medium.

ここに挙げられた全ての例及び特定の用語は、読者が、本発明及び当該技術の促進に対する本発明者により寄与された概念を理解することを助ける、教示的な目的において意図されたものであり、本発明の優位性及び劣等性を示すことに関する、本明細書の如何なる例の構成、そのような特定の挙げられた例及び条件に限定しないように解釈されるべきものである。本発明の実施形態は詳細に説明されているが、本発明の精神及び範囲から外れることなく、様々な変更、置換及び修正をこれに加えることが可能であることを理解されたい。 All examples and specific terms recited herein are intended for instructional purposes, to help the reader understand the present invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to the organization of any example in this specification that relates to showing the superiority or inferiority of the present invention, or to such specifically recited examples and conditions. Although embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and modifications can be made thereto without departing from the spirit and scope of the present invention.

以上説明した実施形態及びその変形例に関し、更に以下の付記を開示する。
（付記１）
被写体を含む領域を第１のカメラで撮影することにより生成された第１の画像と当該領域を第２のカメラで撮影することにより生成された第２の画像とを取得し、
前記第１の画像から前記被写体が写っている第１の被写体領域を検出し、かつ、前記第２の画像から前記被写体が写っている第２の被写体領域を検出し、
前記第１の被写体領域及び前記第２の被写体領域から、前記被写体上の同一の点に対応する被写体特徴点の組を少なくとも一つ抽出し、
前記少なくとも一つの被写体特徴点の組に基づいて、像面に対して平行な第１の軸及び当該第１の軸と直交する第２の軸のそれぞれの周りの前記第１のカメラの回転角と前記第２のカメラの回転角との差を求め、当該差による前記第１の画像上の前記被写体の位置と前記第２の画像上の前記被写体の位置のずれを補正する第１の較正パラメータを求め、
前記第１の画像から前記第１の被写体領域を除いた第１の背景領域及び前記第２の画像から前記第２の被写体領域を除いた第２の背景領域に写っている物体の像から、前記第１のカメラの光軸周りの回転角と前記第２のカメラの光軸周りの回転角の差による前記第１の画像と前記第２の画像間の回転を補正する第２の較正パラメータを求める、
ことを含むステレオ画像較正方法。
（付記２）
前第１の背景領域及び前記第２の背景領域から、同一の物体上の同一の点に対応する背景特徴点の組を少なくとも一つ抽出することをさらに含み、
前記第２の較正パラメータを求めることは、前記第１の画像及び前記第２の画像のうちの少なくとも一方を回転させつつ、前記背景特徴点の組ごとに、前記第１の画像に含まれる背景特徴点と前記第２の画像に含まれる背景特徴点間の距離を求めて該距離の平均値を求め、当該平均値が最小となるときの前記第１の画像に対する前記第２の画像の回転角度を前記第２の較正パラメータとする、付記１に記載のステレオ画像較正方法。
（付記３）
前記背景特徴点の組を抽出することは、前記第１の背景領域の中から前記第２のカメラについて前記被写体によって隠蔽される領域に相当する範囲を除いた領域から前記背景特徴点を抽出する、付記２に記載のステレオ画像較正方法。
（付記４）
前記背景特徴点の組を抽出することは、前記第１の背景領域の中から前記第２のカメラの撮影範囲から外れる範囲を除いた領域から前記背景特徴点を抽出する、付記２または３に記載のステレオ画像較正方法。
（付記５）
前記第２の較正パラメータを求めることは、前記第１の背景領域内の各画素における画素値の局所勾配の方位角を求めることにより当該局所勾配の頻度が最大となる第１の方位角を求め、かつ、前記第２の背景領域内の各画素における画素値の局所勾配の方位角を求めることにより当該局所勾配の頻度が最大となる第２の方位角を求め、前記第１の方位角と前記第２の方位角の差を前記第２の較正パラメータとする、付記１に記載のステレオ画像較正方法。
（付記６）
前記第１の画像及び前記第２の画像のうちの少なくとも一方についての前記被写体特徴点の座標を前記第１の較正パラメータ及び前記第２の較正パラメータを用いて変換して得られた補正後の被写体特徴点の垂直方向座標と、前記第１の画像及び前記第２の画像のうちの他方についての前記被写体上の同一の点に対応する前記被写体特徴点の垂直方向座標との差が所定の許容範囲内である場合に、前記第１の較正パラメータ及び前記第２の較正パラメータを記憶部に記憶させる、付記１〜５の何れか一項に記載のステレオ画像較正方法。
（付記７）
前記被写体特徴点の組を少なくとも一つ抽出することは、
前記第１の被写体領域上の第１の候補点を所定の検出器を用いて検出し、
前記第１の候補点を含む前記第１の被写体領域内の第１の部分領域と最も一致する前記第２の被写体領域内の第２の部分領域を求めて当該第２の部分領域内の所定点を第２の候補点とし、
前記第２の部分領域と最も一致する前記第１の被写体領域内の第３の部分領域を求め、当該第３の部分領域内の第３の候補点と前記第１の候補点間の距離が所定の閾値以下である場合、前記第１の候補点または当該第３の候補点と前記第２の被写体候補点とを、前記被写体特徴点の組とする、付記１〜６の何れか一項に記載のステレオ画像較正方法。
（付記８）
前記第１の較正パラメータを求めることは、
前記被写体特徴点の組ごとに、
前記第１の画像上の前記第１のカメラの光軸に対応する点から当該被写体特徴点の組に含まれる前記第１の画像上の第１の被写体特徴点までの第１の距離に応じた当該第１の被写体特徴点の前記第１の方向に対する回転角と前記第２の画像上の前記第２のカメラの光軸に対応する点から当該被写体特徴点の組に含まれる前記第２の画像上の第２の被写体特徴点までの第２の距離に応じた当該第２の被写体特徴点の前記第１の方向に対する回転角との第１の差を求め、
前記第１の距離に応じた前記第１の被写体特徴点の前記第２の方向に対する回転角と前記第２の距離に応じた前記第２の被写体特徴点の前記第２の方向に対する回転角との第２の差を求め、
前記第１の差の平均及び前記第２の差の平均を前記第１の較正パラメータとする、付記１〜７の何れか一項に記載のステレオ画像較正方法。
（付記９）
被写体を含む領域を第１のカメラで撮影することにより生成された第１の画像と当該領域を第２のカメラで撮影することにより生成された第２の画像とを取得し、
前記第１の画像から前記被写体が写っている第１の被写体領域を検出し、かつ、前記第２の画像から前記被写体が写っている第２の被写体領域を検出し、
前記第１の被写体領域及び前記第２の被写体領域から、前記被写体上の同一の点に対応する被写体特徴点の組を少なくとも一つ抽出し、
前記少なくとも一つの被写体特徴点の組に基づいて、像面に対して平行な第１の軸及び当該第１の軸と直交する第２の軸のそれぞれの周りの前記第１のカメラの回転角と前記第２のカメラの回転角との差を求め、当該差による前記第１の画像上の前記被写体の位置と前記第２の画像上の前記被写体の位置のずれを補正する第１の較正パラメータを求め、
前記第１の画像から前記第１の被写体領域を除いた第１の背景領域及び前記第２の画像から前記第２の被写体領域を除いた第２の背景領域に写っている物体の像から、前記第１のカメラの光軸周りの回転角と前記第２のカメラの光軸周りの回転角の差による前記第１の画像と前記第２の画像間の回転を補正する第２の較正パラメータを求める、
ことをコンピュータに実行させるステレオ画像較正用コンピュータプログラム。
（付記１０）
所定の被写体を含む領域を撮影することにより第１の画像を生成する第１のカメラと、
前記第１のカメラと異なる位置に配置され、かつ前記領域を撮影することにより第２の画像を生成する第２のカメラと、
前記第１の画像から前記被写体が写っている第１の被写体領域を検出し、かつ、前記第２の画像から前記被写体が写っている第２の被写体領域を検出する被写体領域検出部と、
前記第１の被写体領域及び前記第２の被写体領域から、前記被写体上の同一の点に対応する被写体特徴点の組を少なくとも一つ抽出する被写体特徴点抽出部と、
前記少なくとも一つの被写体特徴点の組に基づいて、像面に対して平行な第１の軸及び当該第１の軸と直交する第２の軸のそれぞれの周りの前記第１のカメラの回転角と前記第２のカメラの回転角との差を求め、当該差による前記第１の画像上の前記被写体の位置と前記第２の画像上の前記被写体の位置のずれを補正する第１の較正パラメータを求め、かつ、前記第１の画像から前記第１の被写体領域を除いた第１の背景領域及び前記第２の画像から前記第２の被写体領域を除いた第２の背景領域に写っている物体の像から、前記第１のカメラの光軸周りの回転角と前記第２のカメラの光軸周りの回転角の差による前記第１の画像と前記第２の画像間の回転を補正する第２の較正パラメータを求める較正パラメータ算出部と、
を有するステレオ画像較正装置。 The following supplementary notes are further disclosed regarding the embodiment described above and its modifications.
(Appendix 1)
A stereo image calibration method comprising:
acquiring a first image generated by photographing a region including a subject with a first camera and a second image generated by photographing the region with a second camera;
detecting, from the first image, a first subject region in which the subject appears, and detecting, from the second image, a second subject region in which the subject appears;
extracting, from the first subject region and the second subject region, at least one set of subject feature points corresponding to the same point on the subject;
obtaining, based on the at least one set of subject feature points, the difference between the rotation angle of the first camera and the rotation angle of the second camera about each of a first axis parallel to the image plane and a second axis orthogonal to the first axis, and obtaining a first calibration parameter that corrects the displacement between the position of the subject on the first image and the position of the subject on the second image caused by that difference; and
obtaining, from images of objects appearing in a first background region, which is the first image excluding the first subject region, and in a second background region, which is the second image excluding the second subject region, a second calibration parameter that corrects the rotation between the first image and the second image caused by the difference between the rotation angle of the first camera about its optical axis and the rotation angle of the second camera about its optical axis.
(Appendix 2)
The stereo image calibration method according to appendix 1, further comprising extracting, from the first background region and the second background region, at least one set of background feature points corresponding to the same point on the same object,
wherein obtaining the second calibration parameter comprises, while rotating at least one of the first image and the second image, obtaining for each set of background feature points the distance between the background feature point included in the first image and the background feature point included in the second image, obtaining the average of those distances, and setting as the second calibration parameter the rotation angle of the second image relative to the first image at which the average is minimized.
(Appendix 3)
The stereo image calibration method according to appendix 2, wherein extracting the set of background feature points comprises extracting the background feature points from the first background region excluding the range corresponding to the area hidden by the subject from the second camera.
(Appendix 4)
The stereo image calibration method according to appendix 2 or 3, wherein extracting the set of background feature points comprises extracting the background feature points from the first background region excluding the range that falls outside the shooting range of the second camera.
(Appendix 5)
The stereo image calibration method according to appendix 1, wherein obtaining the second calibration parameter comprises obtaining a first azimuth angle at which the frequency of local gradients is maximized by obtaining the azimuth angle of the local gradient of the pixel value at each pixel in the first background region, obtaining a second azimuth angle at which the frequency of local gradients is maximized by obtaining the azimuth angle of the local gradient of the pixel value at each pixel in the second background region, and setting the difference between the first azimuth angle and the second azimuth angle as the second calibration parameter.
(Appendix 6)
The stereo image calibration method according to any one of appendices 1 to 5, wherein the first calibration parameter and the second calibration parameter are stored in a storage unit when the difference between (i) the vertical coordinate of a corrected subject feature point, obtained by transforming the coordinates of the subject feature point in at least one of the first image and the second image using the first calibration parameter and the second calibration parameter, and (ii) the vertical coordinate of the subject feature point corresponding to the same point on the subject in the other of the first image and the second image is within a predetermined allowable range.
(Appendix 7)
The stereo image calibration method according to any one of appendices 1 to 6, wherein extracting the at least one set of subject feature points comprises:
detecting a first candidate point in the first subject region using a predetermined detector;
finding the second partial region in the second subject region that best matches a first partial region in the first subject region that includes the first candidate point, and setting a predetermined point in the second partial region as a second candidate point; and
finding the third partial region in the first subject region that best matches the second partial region and, when the distance between a third candidate point in the third partial region and the first candidate point is equal to or less than a predetermined threshold, setting the first candidate point or the third candidate point, together with the second candidate point, as a set of subject feature points.
(Appendix 8)
The stereo image calibration method according to any one of appendices 1 to 7, wherein obtaining the first calibration parameter comprises:
for each set of subject feature points,
obtaining a first difference between (i) the rotation angle, with respect to the first direction, of a first subject feature point on the first image included in the set, according to a first distance from the point on the first image corresponding to the optical axis of the first camera to the first subject feature point, and (ii) the rotation angle, with respect to the first direction, of a second subject feature point on the second image included in the set, according to a second distance from the point on the second image corresponding to the optical axis of the second camera to the second subject feature point, and
obtaining a second difference between the rotation angle of the first subject feature point with respect to the second direction according to the first distance and the rotation angle of the second subject feature point with respect to the second direction according to the second distance; and
setting the average of the first differences and the average of the second differences as the first calibration parameter.
(Appendix 9)
A computer program for stereo image calibration that causes a computer to execute:
acquiring a first image generated by photographing a region including a subject with a first camera and a second image generated by photographing the region with a second camera;
detecting, from the first image, a first subject region in which the subject appears, and detecting, from the second image, a second subject region in which the subject appears;
extracting, from the first subject region and the second subject region, at least one set of subject feature points corresponding to the same point on the subject;
obtaining, based on the at least one set of subject feature points, the difference between the rotation angle of the first camera and the rotation angle of the second camera about each of a first axis parallel to the image plane and a second axis orthogonal to the first axis, and obtaining a first calibration parameter that corrects the displacement between the position of the subject on the first image and the position of the subject on the second image caused by that difference; and
obtaining, from images of objects appearing in a first background region, which is the first image excluding the first subject region, and in a second background region, which is the second image excluding the second subject region, a second calibration parameter that corrects the rotation between the first image and the second image caused by the difference between the rotation angle of the first camera about its optical axis and the rotation angle of the second camera about its optical axis.
(Appendix 10)
A stereo image calibration apparatus comprising:
a first camera that generates a first image by photographing a region including a predetermined subject;
a second camera that is arranged at a position different from the first camera and generates a second image by photographing the region;
a subject region detection unit that detects, from the first image, a first subject region in which the subject appears and detects, from the second image, a second subject region in which the subject appears;
a subject feature point extraction unit that extracts, from the first subject region and the second subject region, at least one set of subject feature points corresponding to the same point on the subject; and
a calibration parameter calculation unit that obtains, based on the at least one set of subject feature points, the difference between the rotation angle of the first camera and the rotation angle of the second camera about each of a first axis parallel to the image plane and a second axis orthogonal to the first axis, obtains a first calibration parameter that corrects the displacement between the position of the subject on the first image and the position of the subject on the second image caused by that difference, and obtains, from images of objects appearing in a first background region, which is the first image excluding the first subject region, and in a second background region, which is the second image excluding the second subject region, a second calibration parameter that corrects the rotation between the first image and the second image caused by the difference between the rotation angle of the first camera about its optical axis and the rotation angle of the second camera about its optical axis.

１ステレオ画像較正装置
２−１、２−２カメラ
３入力部
４記憶部
５処理部
１１顔検出部
１２顔特徴点抽出部
１３背景特徴点抽出部
１４較正パラメータ算出部
１５判定部
DESCRIPTION OF SYMBOLS
1 Stereo image calibration apparatus
2-1, 2-2 Cameras
3 Input unit
4 Storage unit
5 Processing unit
11 Face detection unit
12 Face feature point extraction unit
13 Background feature point extraction unit
14 Calibration parameter calculation unit
15 Determination unit

Claims

A stereo image calibration method comprising:
acquiring a first image generated by photographing a region including a subject with a first camera and a second image generated by photographing the region with a second camera;
detecting, from the first image, a first subject region in which the subject appears, and detecting, from the second image, a second subject region in which the subject appears;
extracting, from the first subject region and the second subject region, at least one set of subject feature points corresponding to the same point on the subject;
obtaining, based on the at least one set of subject feature points, the difference between the rotation angle of the first camera and the rotation angle of the second camera about each of a first axis parallel to the image plane and a second axis orthogonal to the first axis, and obtaining a first calibration parameter that corrects the displacement between the position of the subject on the first image and the position of the subject on the second image caused by that difference; and
obtaining, from images of objects appearing in a first background region, which is the first image excluding the first subject region, and in a second background region, which is the second image excluding the second subject region, a second calibration parameter that corrects the rotation between the first image and the second image caused by the difference between the rotation angle of the first camera about its optical axis and the rotation angle of the second camera about its optical axis.

The stereo image calibration method according to claim 1, further comprising extracting, from the first background region and the second background region, at least one set of background feature points corresponding to the same point on the same object,
wherein obtaining the second calibration parameter comprises, while rotating at least one of the first image and the second image, obtaining for each set of background feature points the distance between the background feature point included in the first image and the background feature point included in the second image, obtaining the average of those distances, and setting as the second calibration parameter the rotation angle of the second image relative to the first image at which the average is minimized.

The stereo image calibration method according to claim 1, wherein obtaining the second calibration parameter comprises obtaining a first azimuth angle at which the frequency of local gradients is maximized by obtaining the azimuth angle of the local gradient of the pixel value at each pixel in the first background region, obtaining a second azimuth angle at which the frequency of local gradients is maximized by obtaining the azimuth angle of the local gradient of the pixel value at each pixel in the second background region, and setting the difference between the first azimuth angle and the second azimuth angle as the second calibration parameter.

The stereo image calibration method according to any one of claims 1 to 3, wherein extracting the at least one set of subject feature points comprises:
detecting a first candidate point in the first subject region using a predetermined detector;
finding the second partial region in the second subject region that best matches a first partial region in the first subject region that includes the first candidate point, and setting a predetermined point in the second partial region as a second candidate point; and
finding the third partial region in the first subject region that best matches the second partial region and, when the distance between a third candidate point in the third partial region and the first candidate point is less than a predetermined threshold, setting the first candidate point or the third candidate point, together with the second candidate point, as a set of subject feature points.

A computer program for stereo image calibration that causes a computer to execute:
acquiring a first image generated by photographing a region including a subject with a first camera and a second image generated by photographing the region with a second camera;
detecting, from the first image, a first subject region in which the subject appears, and detecting, from the second image, a second subject region in which the subject appears;
extracting, from the first subject region and the second subject region, at least one set of subject feature points corresponding to the same point on the subject;
obtaining, based on the at least one set of subject feature points, the difference between the rotation angle of the first camera and the rotation angle of the second camera about each of a first axis parallel to the image plane and a second axis orthogonal to the first axis, and obtaining a first calibration parameter that corrects the displacement between the position of the subject on the first image and the position of the subject on the second image caused by that difference; and
obtaining, from images of objects appearing in a first background region, which is the first image excluding the first subject region, and in a second background region, which is the second image excluding the second subject region, a second calibration parameter that corrects the rotation between the first image and the second image caused by the difference between the rotation angle of the first camera about its optical axis and the rotation angle of the second camera about its optical axis.

A stereo image calibration apparatus comprising:
a first camera that generates a first image by photographing a region including a predetermined subject;
a second camera that is arranged at a position different from the first camera and generates a second image by photographing the region;
a subject region detection unit that detects, from the first image, a first subject region in which the subject appears and detects, from the second image, a second subject region in which the subject appears;
a subject feature point extraction unit that extracts, from the first subject region and the second subject region, at least one set of subject feature points corresponding to the same point on the subject; and
a calibration parameter calculation unit that obtains, based on the at least one set of subject feature points, the difference between the rotation angle of the first camera and the rotation angle of the second camera about each of a first axis parallel to the image plane and a second axis orthogonal to the first axis, obtains a first calibration parameter that corrects the displacement between the position of the subject on the first image and the position of the subject on the second image caused by that difference, and obtains, from images of objects appearing in a first background region, which is the first image excluding the first subject region, and in a second background region, which is the second image excluding the second subject region, a second calibration parameter that corrects the rotation between the first image and the second image caused by the difference between the rotation angle of the first camera about its optical axis and the rotation angle of the second camera about its optical axis.