JP2010187130A

JP2010187130A - Camera calibrating device, camera calibration method, camera calibration program, and recording medium having the program recorded therein

Info

Publication number: JP2010187130A
Application number: JP2009029082A
Authority: JP
Inventors: Tatsuya Osawa; 達哉大澤; Hiroyuki Arai; 啓之新井; Hideki Koike; 秀樹小池
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-02-10
Filing date: 2009-02-10
Publication date: 2010-08-26

Abstract

PROBLEM TO BE SOLVED: To facilitate measurement of the internal and external parameters of a camera.
SOLUTION: A silhouette image creation means 12 creates a silhouette image by extracting the region of an object from an image input from an image acquisition means 11 such as a camera. A parameter estimation means 14 selects one of the change types prepared in advance for the object according to an arbitrary probability and generates a state candidate according to the selected change type. Based on the state candidate, a simulation image that simulates the silhouette image is created by projection onto a virtual camera having the camera internal and external parameters of the generated state candidate. An evaluation value is obtained as the product of the result of comparing the two images and a likelihood determination for the object itself based on prior knowledge. Whether the state candidate is accepted as the final change state is determined according to the evaluation value. The internal and external parameters of the camera are estimated from the accepted change states.
COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a technique for estimating the internal and external parameters of a camera using images of objects, including persons and animals, captured by that camera.

In the field of computer vision, much research has been conducted on tracking moving targets such as persons using information captured by cameras. This person tracking technology is applied mainly to video surveillance and similar uses. For example, the following processing has been realized.

A target object is observed using an image acquisition device such as a camera, and a silhouette image is prepared by extracting the region (silhouette) of the target on the image. If the internal and external parameters of the camera used for tracking are measured in advance, a person model is placed in three-dimensional space, and the scene is captured with a virtual camera having the previously calculated parameters, then the silhouette image can be simulated.

A method has been proposed for tracking a plurality of persons by comparing the silhouette image with a simulation image generated using ellipsoids as the person model. The technique of Non-Patent Document 1 is known as one example.

Non-Patent Document 1: Tatsuya Osawa, Kyoko Sudo, Hiroyuki Arai, Hideki Koike, "Tracking multiple adjacent objects using monocular moving images," IEICE Technical Report, Vol. 108, No. 93, published June 12, 2008, pp. 1-6.

According to the method of Non-Patent Document 1, the position of a person in three-dimensional space is predicted directly, the predicted result is captured with a virtual camera having the same internal and external parameters as the camera actually installed, and the resulting simulated silhouette image is checked for consistency with the actually captured image. In this way, the position of a person in three-dimensional space is tracked using information from only a single camera.

However, simulating the silhouette image requires that the internal and external parameters of the camera used for capture be measured in advance. Because measuring these parameters requires specialized knowledge, there is a risk that only trained experts can perform the measurement.

The present invention has been made to solve the above problems of the prior art, and its object is to provide a technique by which the internal and external parameters of a camera can be measured easily.

To solve this problem, the present invention automatically estimates the internal and external parameters of the camera by evaluating predictions of the change state of objects (the number of objects, their sizes, and their positions).

One aspect of the present invention is a camera calibration device that obtains, from captured images of an object, the internal and external parameters of the camera used for capture. The device comprises silhouette image creation means for creating a silhouette image by extracting the region in which the object appears from the image captured by the camera, and parameter estimation means for estimating the internal and external parameters of the camera by using the silhouette image to evaluate a state predicted according to a change type of the object.

Another aspect of the present invention is a camera calibration method for obtaining, from captured images of an object, the internal and external parameters of the camera used for capture. The method has a first step in which silhouette image creation means creates a silhouette image by extracting the region in which the object appears from the image captured by the camera, and a second step in which parameter estimation means estimates the internal and external parameters of the camera by using the silhouette image to evaluate a state predicted according to a change type of the object.

In both aspects, the state of the object is preferably represented by a set of ellipsoid models, with a person represented by the two-dimensional coordinates of the top of the model's head on the image, the minor axis of the ellipsoid approximating the head, the minor axis of the ellipsoid approximating the body, and the model height representing the person's stature.

The present invention may also be provided in the form of a program that causes a computer to function as the camera calibration device, or in the form of a recording medium on which the program is recorded.

According to the present invention, the internal and external parameters of the camera are estimated automatically, which makes measuring those parameters easy.

FIG. 1 is a schematic diagram showing how the camera captures the scene.
FIG. 2 is a coordinate diagram showing the position and orientation of the camera.
FIG. 3 is a schematic diagram of the ellipsoid model used as the three-dimensional model.
FIG. 4 is a block diagram of the camera calibration device according to an embodiment of the present invention.
FIG. 5 is a chart showing the processing of the camera calibration device.
FIG. 6: (a) is an input image, (b) is a silhouette image.
FIG. 7: (a) is an input image, (b) is a vertical edge image, (c) is a horizontal edge image.
FIG. 8 is a flowchart showing the processing steps of the parameter estimation.
FIG. 9 is a probability distribution of the horizontal head position calculated from the silhouette image.
FIG. 10 is a probability distribution of the vertical head position calculated from the edge detection image.
FIG. 11: (a) is the input image of the example, (b) the silhouette image, (c) the edge image (horizontal), (d) the edge image (vertical), (e) the simulation image, and (f) the three-dimensional positions of the targets estimated in the example.

(1) Basic Principle
First, the basic principle of the present invention is described. The present invention captures one or more objects (including persons and animals) present in three-dimensional space and estimates both the change state of each object, that is, the number of objects, their positions, and their sizes, and the internal and external camera parameters of the image acquisition means used for capture.

The one or more objects being captured are assumed to lie on the same plane. As an example, as shown in FIG. 1, the targets may be a plurality of persons walking on a plane, captured by a camera 1 (for example, a digital camera or a video camera) whose position and orientation are fixed. Because the coordinates that define the three-dimensional space can be chosen freely, the XY plane can be set to the plane on which the persons walk and the Z axis to the height above the floor.

As shown in FIG. 2, if the camera 1 is placed at the origin of the X and Y axes, its three-dimensional position is (0, 0, Tz) and can be expressed by a single variable, the height above the plane. The orientation of the camera 1 can likewise be expressed by two variables, the rotation angles (φ, θ) about the X and Y axes. These three variables (Tz, φ, θ) are the external parameters of the camera 1.

As for the internal parameters of the camera 1, if lens distortion is ignored and the projection center is assumed to coincide with the center of the image, they can be expressed by the focal length f of the lens alone.

Using these internal and external parameters of the camera 1, the projection matrix P needed to calculate where a point in three-dimensional space is projected on the two-dimensional image can be calculated by Equation 1 below. In Equation 1, the matrix I is the 3 × 3 identity matrix.
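Equation 1 itself is not reproduced in this text, so the following is only a sketch of one common way to build such a matrix, assuming the form P = A R [I | -T] with A = diag(f, f, 1), R the product of rotations about the X axis (φ) and the Y axis (θ), and T = (0, 0, Tz). The rotation order, the sign conventions, and the helper names `mat_mul`, `projection_matrix`, and `project` are assumptions made for illustration, not taken from the patent.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def projection_matrix(f, tz, phi, theta):
    """Build a 3x4 matrix P = A R [I | -T] (assumed form, see lead-in)."""
    # Intrinsics: focal length only, projection centre at the image centre.
    A = [[f, 0.0, 0.0], [0.0, f, 0.0], [0.0, 0.0, 1.0]]
    cx, sx = math.cos(phi), math.sin(phi)
    cy, sy = math.cos(theta), math.sin(theta)
    # Rotation about the X axis by phi and about the Y axis by theta.
    Rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    Ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    # [I | -T] with the camera placed at T = (0, 0, Tz).
    i_minus_t = [[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, -tz]]
    return mat_mul(A, mat_mul(mat_mul(Rx, Ry), i_minus_t))

def project(P, point):
    """Project a 3D point (X, Y, Z); result is relative to the image centre."""
    X, Y, Z = point
    u, v, w = (row[0] * X + row[1] * Y + row[2] * Z + row[3] for row in P)
    return u / w, v / w
```

With no rotation, a point at depth Z - Tz in front of the camera is scaled by f / (Z - Tz), which is the usual pinhole relation.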

To represent a target object, a model approximating the shape of a person is defined using two ellipsoids P and Q, as shown in FIG. 3. A person is represented by the two-dimensional coordinates (x, y) of the top of the model's head on the image, the minor axis r1 of the ellipsoid P approximating the head, the minor axis r2 of the ellipsoid Q approximating the body, and the model height h representing the person's stature. The major and minor axes of the ellipsoid P are assumed to keep a constant ratio, so its major-axis radius is r = k1 × r1, where k1 is an arbitrary constant; a value of 1.5 may be used, for example. Similarly, the major-axis radius of the ellipsoid Q is r = h − k2 × r1, where k2 is an arbitrary constant; a value of 1.2 may be used, for example.

The state of a single person can therefore be expressed by five parameters (x, y, r1, r2, h). The target state representing a plurality of persons can be expressed by concatenating the vectors M (person models) that represent the position and size of each person. The state S, which contains the internal and external parameters of the camera 1 and the states of the persons, is expressed by Equation 2 below.

Here, the person model M_k, which represents the position, size, and shape of person k in the state S, is the five-dimensional vector given by Equation 3 below.

The five parameters that make up the person model M_k are the parameters of the model approximating the shape of a person described above.
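As an illustration of how the state S and the person models M_k might be held in a program, the sketch below uses two small data classes. Equations 2 and 3 are not reproduced in this text, so the ordering (f, Tz, φ, θ, M_1, ..., M_K) in `as_vector` is an assumption inferred from the surrounding description; the class names `PersonModel` and `State` and the constants `K1` and `K2` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

K1 = 1.5  # example value the text gives for k1
K2 = 1.2  # example value the text gives for k2

@dataclass
class PersonModel:
    """Five-parameter person model M_k = (x, y, r1, r2, h)."""
    x: float   # image x-coordinate of the top of the head
    y: float   # image y-coordinate of the top of the head
    r1: float  # minor axis of the head ellipsoid P
    r2: float  # minor axis of the body ellipsoid Q
    h: float   # model height (stature)

    def head_major_radius(self) -> float:
        # Major-axis radius of the head ellipsoid: r = k1 * r1.
        return K1 * self.r1

    def body_major_radius(self) -> float:
        # Major-axis radius of the body ellipsoid: r = h - k2 * r1.
        return self.h - K2 * self.r1

@dataclass
class State:
    """Camera parameters plus any number of person models."""
    f: float
    tz: float
    phi: float
    theta: float
    persons: List[PersonModel] = field(default_factory=list)

    def as_vector(self) -> List[float]:
        # Flatten to (f, Tz, phi, theta, M_1, ..., M_K); assumed ordering.
        vec = [self.f, self.tz, self.phi, self.theta]
        for m in self.persons:
            vec.extend([m.x, m.y, m.r1, m.r2, m.h])
        return vec
```

A state with K persons flattens to a vector of 4 + 5K values, so the dimension of S changes whenever a person is added or removed.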

It follows that if the state S, containing the internal and external parameters of the camera 1 and the states of the targets, is known, the silhouette image can be simulated in the same way as in the method of Non-Patent Document 1.

The present invention predicts this state S, simulates the silhouette image, compares the simulation with the actual silhouette image, and evaluates their consistency, thereby evaluating the predicted state S.

The basic principle of the present invention is to carry out the above processing with the MCMC (Markov Chain Monte Carlo) method, a kind of probabilistic state distribution estimation method, and thereby estimate the state S that represents the correct combination of parameters. This basic principle is reflected in the camera calibration device according to the embodiment of the present invention.

(2) Specific Embodiment
Next, an embodiment of the present invention is described with reference to FIGS. 4 to 10. FIG. 4 shows a configuration example of the camera calibration device 10 according to the embodiment. The camera calibration device 10 is assumed to be connected over a network to image acquisition means 11 that acquires images of objects.

The image acquisition means 11 is means for acquiring the image data captured at each time and corresponds to the camera 1. More specifically, it consists of an image sensor or the like that converts the image of an object captured by a digital camera, video camera, or similar device into a digital signal. Here too, the objects being captured (one or more persons, animals, and so on) are assumed to lie on the same plane.

Specifically, the camera calibration device 10 is implemented on a computer and receives image data in time series from the image acquisition means 11. Through the cooperation of the computer's hardware resources (CPU, memory, hard disk drive, communication interface, and so on) with software, the camera calibration device 10 functions as silhouette image creation means 12, edge image creation means 13, and parameter estimation means 14.

The silhouette image creation means 12 creates a silhouette image by extracting from the input image data only the region in which the target object appears. The edge image creation means 13 creates an edge image by extracting edges from the input image data.

The parameter estimation means 14 estimates the state S using the silhouette image created by the silhouette image creation means 12. That is, it estimates the number, sizes, and three-dimensional positions of the objects present in the silhouette image, together with the internal and external parameters of the image acquisition means 11: the focal length f, the three-dimensional position (0, 0, Tz), and the orientation (φ, θ) expressed as rotation angles about the X and Y axes. The estimation result is output to a monitor, printer, or the like. The processing steps (S501 to S504) of the camera calibration device 10 are described below with reference to the chart of FIG. 5.

S501: When processing starts, captured image data (Im) is input from the image acquisition means 11 via the communication interface. This embodiment is described on the assumption that the data is input over a network, but any means of acquisition may be used as long as the image data (Im) can be obtained.

S502: The silhouette image creation means 12 creates a silhouette image (Sil) from the image data (Im) input in S501. The silhouette image is a binary image in which the luminance value is 1 in the region of the captured image where the target object appears and 0 elsewhere. FIG. 6(a) shows an example of an input image, and FIG. 6(b) shows an example of a silhouette image. Such a silhouette image can be generated easily with well-known methods such as background subtraction or inter-frame differencing.
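The background subtraction mentioned in S502 can be sketched as a per-pixel threshold on the difference from a reference frame. This is a minimal illustration on grayscale images stored as nested lists; the function name `make_silhouette` and the threshold value of 30 are assumptions, since the text does not specify how the difference is thresholded.

```python
def make_silhouette(image, background, threshold=30):
    """Return a binary image: 1 where the frame differs from the background."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(img_row, bg_row)]
            for img_row, bg_row in zip(image, background)]

# Tiny usage example: one bright object pixel against a dark background.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 200, 10], [12, 10, 10]]
silhouette = make_silhouette(frame, background)
```

Small differences such as the 12 versus 10 pixel above fall under the threshold and are treated as background noise.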

S503: The edge image creation means 13 creates an edge image from the image data (Im) input in S501. Edges are detected in the vertical and horizontal directions, and the pair (Eh, Ev), whose components are the vertical-direction value Eh and the horizontal-direction value Ev, is stored for each pixel; this is the edge image.

The edge image can be generated easily using the well-known Sobel operator or a similar method. As an example, FIG. 7(a) shows an input image, FIG. 7(b) a vertical edge detection image, and FIG. 7(c) a horizontal edge detection image. The silhouette image and edge image created in S502 and S503 may be stored in memory (RAM) or on a hard disk drive.
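A plain Sobel filter over a grayscale image can be sketched as follows. The code stores a pair of responses per pixel, as the edge image in S503 does, but which response corresponds to Eh and which to Ev in the patent's notation is not asserted here; the names `correlate3x3` and `edge_image` are hypothetical, and border pixels are simply left at zero.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to left-right intensity change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to top-bottom intensity change

def correlate3x3(image, kernel):
    """Apply a 3x3 kernel (no flip) to every interior pixel; borders stay 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(kernel[i][j] * image[r + i - 1][c + j - 1]
                            for i in range(3) for j in range(3))
    return out

def edge_image(image):
    """Return a per-pixel pair (gx, gy) of Sobel responses."""
    gx = correlate3x3(image, SOBEL_X)
    gy = correlate3x3(image, SOBEL_Y)
    return [[(gx[r][c], gy[r][c]) for c in range(len(image[0]))]
            for r in range(len(image))]
```

For an image whose brightness steps up from left to right, only the first component of the pair is nonzero at the step.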

S504: The parameter estimation means 14 uses the silhouette image (Sil) created in S502 to estimate the state S, which contains the number, three-dimensional positions, and sizes of the objects present in the image and the internal and external parameters of the camera in the image acquisition means 11 [the focal length f, the three-dimensional position (0, 0, Tz), and the orientation (φ, θ) expressed as rotation angles about the X and Y axes].

The specific processing of S504 (S601 to S607) is described below with reference to the flowchart of FIG. 8. The state S is estimated using (B + P) state candidates generated by (B + P) iterations. The initial state is denoted S'_0, and the state candidate generated in the n-th iteration is denoted S'_n. The state candidates generated in the first B iterations are discarded, and the parameters are estimated by treating the state candidates obtained in the last P iterations as all equally probable. This is because the candidates generated early on are considered to depend strongly on the initial state S'_0.
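Discarding the first B candidates and weighting the remaining P equally can be sketched as below. The text does not say how the equally probable candidates are combined into a single estimate, so taking the arithmetic mean of the camera parameters is an assumption of this sketch, as is the function name `estimate_camera_parameters`; each sample is taken to be a tuple (f, Tz, φ, θ).

```python
def estimate_camera_parameters(samples, burn_in):
    """Average (f, tz, phi, theta) over the samples kept after burn-in."""
    kept = samples[burn_in:]  # drop the first B candidates
    if not kept:
        raise ValueError("no samples left after burn-in")
    n = len(kept)
    # Equal weight for every kept candidate, i.e. a plain mean per parameter.
    return tuple(sum(s[i] for s in kept) / n for i in range(4))
```

Calling `estimate_camera_parameters(samples, B)` on a list of B + P candidates uses only the last P of them.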

Processing then starts with the update count set to n = 1. The initial state of S is arbitrary, but it can be set as follows, for example: (f, Tz, φ, θ) = (X_SIZE, 200, 0.0, 0.0), with the set of person states empty. That is, S = (X_SIZE, 200, 0.0, 0.0), where X_SIZE is the horizontal size of the input image.

S601: A change type is selected. A change type denotes a kind of operation that changes the target state; there are the following seven types.

1: add person, 2: delete person, 3: change person position, 4: change person size, 5: change focal length, 6: change camera position, 7: change camera orientation.

If the current state contains no person, (1: add person) is always selected. Otherwise, the probability of each type being chosen is set arbitrarily in advance, and one of the seven types is selected at random according to those probabilities. For example, (p1, p2, p3, p4, p5, p6, p7) = (0.05, 0.05, 0.2, 0.2, 0.1, 0.2, 0.2) may be used, where pi is the probability that type i is selected.
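The selection rule in S601 can be sketched with a weighted random draw. The probabilities are the example values from the text; the function name `select_change_type` and the optional `rng` argument (used here only to make the draw reproducible) are hypothetical.

```python
import random

CHANGE_TYPES = [1, 2, 3, 4, 5, 6, 7]
PROBS = [0.05, 0.05, 0.2, 0.2, 0.1, 0.2, 0.2]  # example (p1, ..., p7) from the text

def select_change_type(num_persons, rng=random):
    """Pick a change type; force 'add person' when the state has no person."""
    if num_persons == 0:
        return 1
    return rng.choices(CHANGE_TYPES, weights=PROBS, k=1)[0]

# Usage: a seeded generator gives a repeatable sequence of change types.
rng = random.Random(0)
sequence = [select_change_type(2, rng) for _ in range(5)]
```

The forced choice avoids selecting an operation on a person (types 2 to 4) when there is no person to operate on.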

S602: A state candidate S* is generated according to the change type selected in S601. The processing for each change type is described below.

<1: Add person>
When adding a person is selected, new model parameters must be set. The following method, described in Non-Patent Document 1, may be used for this, for example.

For the parameters r1, r2, and h, which represent the sizes of the ellipsoids approximating a person, the range of possible values can be restricted using empirical prior knowledge about the size of persons (for example, that nobody is taller than 3 m). Let these ranges be
r1: r1_min ≤ r1 ≤ r1_max
r2: r2_min ≤ r2 ≤ r2_max
h: h_min ≤ h ≤ h_max
New values of r1, r2, and h can then be generated by drawing uniform random numbers within these ranges.
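Drawing the size parameters uniformly within their ranges can be sketched as follows. The numeric bounds in `SIZE_BOUNDS` are hypothetical placeholders, since the text gives no values for r1_min, r1_max, and so on, and the function name `sample_size_parameters` is likewise an assumption.

```python
import random

# Hypothetical bounds (min, max) for each size parameter; not from the text.
SIZE_BOUNDS = {"r1": (8.0, 15.0), "r2": (15.0, 35.0), "h": (100.0, 200.0)}

def sample_size_parameters(rng=random, bounds=SIZE_BOUNDS):
    """Draw r1, r2, and h independently and uniformly within their ranges."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
```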

The parameters for the head position (x, y) can be generated as follows. First, the horizontal head position x is generated. Assuming, as shown in FIG. 9, that a person appears standing vertically in the image, summing the silhouette image (Sil), in which only the targets appear, in the vertical direction at a position x gives a value V that is high at positions where a person's head is present. The sum V(x) at a position x is given by Equation 4 below and is calculated for every x on the image.

Here, Sil(x, y) is the component of the silhouette image at row x and column y, and Y is the vertical size of the image. To turn V(x) into a probability distribution of the horizontal head position, it is normalized so that the sum of V(x) over all x on the image is 1.0, and a position x is drawn according to the resulting distribution V(x). This can be done as follows, for example. A value val is obtained from a uniform random number between 0.0 and 1.0. Then V(x) is added to a running total sum in order from x = 0, and the first x at which sum ≥ val is taken.
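The column sum and the cumulative-sum draw described above can be sketched as follows. Equation 4 is not reproduced in this text, so V(x) is assumed here to be the plain sum of Sil over the vertical direction. The silhouette is indexed as `sil[y][x]` (row, then column), which is this sketch's own convention, and the function names are hypothetical.

```python
import random

def horizontal_head_distribution(sil):
    """Column sums V(x) of a binary silhouette image, normalized to sum to 1."""
    width = len(sil[0])
    v = [sum(row[x] for row in sil) for x in range(width)]
    total = sum(v)
    if total == 0:
        raise ValueError("silhouette image is empty")
    return [value / total for value in v]

def sample_from_distribution(dist, rng=random):
    """Accumulate probabilities from index 0 until they reach a uniform draw."""
    val = rng.random()
    cumulative = 0.0
    for index, p in enumerate(dist):
        cumulative += p
        if cumulative >= val:
            return index
    return len(dist) - 1  # guard against floating-point round-off

# Usage: columns with more silhouette pixels are drawn more often.
sil = [[0, 1, 0], [0, 1, 1], [0, 1, 0]]
dist = horizontal_head_distribution(sil)
x = sample_from_distribution(dist, random.Random(0))
```

The same `sample_from_distribution` helper could be reused for any other normalized one-dimensional distribution.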

Next, the vertical head position y is generated. Because the parameters representing the sizes of the ellipsoids approximating the person have already been generated, these and the internal and external parameters of the camera (f, Tz, φ, θ) are used to calculate the projection matrix P and to project the ellipsoid representing the person's head (hereinafter, the head model) onto the image. As shown in FIG. 10, let W(y) be the average of the edge detection image values (Ev, Eh) summed over the head model; W(y) becomes large near a person's head. W(y) is calculated by Equation 5 below.

Here, HL is the head contour length, n(i) is the normal vector of the head model, and g(i) = (Ev(i), Eh(i)) is the gradient vector whose elements are the values of the edge detection image on the head contour.

W(y) is calculated for every y on the image and then normalized so that its sum is 1.0, making it the probability distribution of the vertical head position at position x. A position y is drawn according to this distribution W(y), using the same procedure as for the horizontal direction.

A new model parameter M_n can be generated in this way. The provisional state candidate S* is the state obtained by adding the new target state M_n to S'_{n-1}.

<2: Delete person>
When deleting a person is selected, a target is first chosen at random from the previously generated state candidate S'_{n-1}. That is, if there are K targets, each is chosen with probability 1/K. The chosen target M_k is then deleted. The provisional state candidate S* is the state obtained by deleting the target state M_k from S'_{n-1}.

<3: Change person position>
When changing a person's position is selected, a target is first chosen at random from the previously generated state candidate S'_{n-1}. That is, if there are K targets, each is chosen with probability 1/K. Then, among the elements of the chosen M_k, a provisional position (x'_k, y'_k) is calculated by changing the person's position (x_k, y_k) on the two-dimensional plane according to Equation 6 below.

Here, δ_x and δ_y are each one-dimensional Gaussian noise, whose parameters can be set to arbitrary values determined experimentally. M'_k is defined as in Equation 7 below.

The provisional state candidate S* is the state obtained by updating the target state M_k in S'_{n-1} to M'_k.

<4: Change person size>
When changing a person's size is selected, a target is first chosen at random from the previously generated state candidate S'_{n-1}. That is, if there are K targets, each is chosen with probability 1/K. Then, among the elements of the chosen M_k, provisional shape and size parameters (r1'_k, r2'_k, h'_k) are calculated by changing (r1_k, r2_k, h_k), which represent the radius r1, the radius r2, and the height h of the ellipsoids approximating the person, according to Equation 8 below.

Here, δ_r1, δ_r2, and δ_h are each one-dimensional Gaussian noise, whose parameters can be set to arbitrary values determined experimentally. Using these provisional shape and size parameters, M'_k is defined as in Equation 9 below.

<5: Change focal length>
When changing the focal length is selected, the camera's focal length parameter is updated from f to a provisional focal length f' obtained by changing it according to Equation 10 below.

Here, δ_f is one-dimensional Gaussian noise, whose parameters can be set to arbitrary values determined experimentally.

<6: Change camera position>
When changing the camera position is selected, the camera's position parameter is updated from Tz to a provisional camera position Tz' obtained by changing it according to Equation 11 below.

Here, δ_Tz is one-dimensional Gaussian noise, and its parameter can be set experimentally to an arbitrary value.

<7: Changing the camera posture>
When the camera posture change is selected, the camera posture parameters (φ, θ) are updated to provisional camera posture rotation angles (φ', θ'), obtained by varying (φ, θ) according to Equation 12 below.

Here, δ_Φ and δ_Θ are each one-dimensional Gaussian noise, and their parameters can be set experimentally to arbitrary values.
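Change types 5 to 7 all perturb camera parameters, and might be sketched together as below. Equations 10 to 12 are not reproduced in this text, so additive zero-mean Gaussian noise is assumed. The `Camera` class, the move names, and the use of a single `sigma` per move are hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Camera:
    f: float      # focal length
    tz: float     # camera position Tz
    phi: float    # posture rotation angle phi
    theta: float  # posture rotation angle theta


def propose_camera_change(cam, move, sigma, rng=random):
    """Return a provisional camera with one parameter group perturbed."""
    if move == "focal_length":   # assumed Equation 10: f' = f + delta_f
        return replace(cam, f=cam.f + rng.gauss(0.0, sigma))
    if move == "position":       # assumed Equation 11: Tz' = Tz + delta_Tz
        return replace(cam, tz=cam.tz + rng.gauss(0.0, sigma))
    if move == "posture":        # assumed Equation 12: both angles perturbed
        return replace(cam,
                       phi=cam.phi + rng.gauss(0.0, sigma),
                       theta=cam.theta + rng.gauss(0.0, sigma))
    raise ValueError(f"unknown move: {move}")
```

Making `Camera` immutable means a rejected proposal never alters the previous state candidate.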

S603: Using the provisional state candidate S^* from S602, an image that simulates the silhouette image (Sil) created in S503, namely a simulation image (Sim), is created. Here, the projection matrix P, which represents how an object in three-dimensional space is projected onto the camera image, can be calculated from the parameters (f, Tz, φ, θ).

Using this relationship, the two-dimensional coordinates (X, Y) on the plane of an object of height h standing upright on that plane can be calculated from the two-dimensional image coordinates (x, y) onto which it is projected. This makes it possible to place, in three-dimensional space, a plurality of three-dimensional models composed of ellipsoids based on the provisional target state S^* calculated in S602.
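The text does not give the form of P, so the following is only a sketch under stated assumptions: a pinhole camera with its principal point at the image origin, a camera centre at (0, 0, Tz) above the ground plane Z = 0, and a rotation R = Rx(φ)·Rz(θ). It also assumes that (x, y) is the image of the point at height h above the ground position (X, Y). All function names are hypothetical.

```python
import math


def projection_matrix(f, tz, phi, theta):
    """3x4 matrix P = K [R | -R C] as nested lists (assumed parameterisation)."""
    cp, sp = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    rot = [[ct, -st, 0.0],               # R = Rx(phi) @ Rz(theta)
           [cp * st, cp * ct, -sp],
           [sp * st, sp * ct, cp]]
    scale = (f, f, 1.0)                  # K = diag(f, f, 1)
    # Camera centre C = (0, 0, tz), so the last column is -K R C.
    return [[k * r[0], k * r[1], k * r[2], -k * r[2] * tz]
            for k, r in zip(scale, rot)]


def project(p, X, Y, Z):
    """Project a 3D point to image coordinates (x, y)."""
    u, v, w = (row[0] * X + row[1] * Y + row[2] * Z + row[3] for row in p)
    return u / w, v / w


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def ground_position(p, x, y, h):
    """Solve P [X, Y, h, 1]^T = s [x, y, 1]^T for the ground position (X, Y)."""
    img = (x, y, 1.0)
    a = [[p[i][0], p[i][1], -img[i]] for i in range(3)]  # unknowns: X, Y, s
    b = [-(h * p[i][2] + p[i][3]) for i in range(3)]
    d = det3(a)
    ax = [[b[i], a[i][1], a[i][2]] for i in range(3)]    # Cramer's rule
    ay = [[a[i][0], b[i], a[i][2]] for i in range(3)]
    return det3(ax) / d, det3(ay) / d
```

Fixing the height h reduces the back-projection to a 3x3 linear system, which is why a single image point is enough to recover a position on the plane.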

A simulation image is created by projecting the virtual scene generated as described above onto the virtual camera. By making this a binary image in which pixels inside the ellipsoid regions have the value 1 and all other pixels have the value 0, a simulation image (Sim) that simulates the silhouette image (Sil) is obtained.

S604: The likelihood L is calculated to judge whether the provisional target state S^* from S602 is plausible. This is done by judging the plausibility of the target state itself based on prior knowledge, and by comparing the silhouette image (Sil) created in S503 with the simulation image (Sim) generated in S603.

Here, the judgment of the plausibility of the target state itself based on prior knowledge uses the condition that persons do not overlap one another in three-dimensional space. Because each person is approximated by an ellipsoid model, the condition that individual ellipsoids do not overlap must be satisfied, namely that the distance between the centers of any two different ellipsoids is greater than the sum of their radii.

To keep two persons from overlapping in three-dimensional space, a penalty function E(S^*) is set according to the distance R given by Equation 13 below, that is, the distance between the centers of the ellipsoids representing the two persons.

Here, X_k and Y_k denote the X and Y coordinates of person k's position on the plane in three-dimensional space, as calculated when the simulation image was generated. The function E, which imposes a penalty as the ellipsoids come closer together, can be set, for example, as in Equation 14 below, where α is a constant that can be set arbitrarily by experiment.
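Equations 13 and 14 are not reproduced in this text. The sketch below takes R to be the Euclidean distance between two centers on the plane, which matches the description of Equation 13. For Equation 14 it substitutes an assumed stand-in: a factor that equals 1 when two ellipsoids do not overlap and decays exponentially with the overlap depth. The single radius key "r", the exponential form, and the default α are hypothetical and should not be read as the patent's actual formula.

```python
import math
from itertools import combinations


def centre_distance(a, b):
    """Distance R between the ground-plane centers of two persons."""
    return math.hypot(a["X"] - b["X"], a["Y"] - b["Y"])


def overlap_penalty(people, alpha=5.0):
    """Multiplicative penalty E in (0, 1]; 1.0 means no pair overlaps."""
    e = 1.0
    for a, b in combinations(people, 2):
        gap = centre_distance(a, b) - (a["r"] + b["r"])
        if gap < 0.0:                    # centers closer than the sum of radii
            e *= math.exp(alpha * gap)   # assumed form, stands in for Equation 14
    return e
```

Because the likelihood is formed as a product, a penalty expressed as a factor no greater than 1 lowers the likelihood of overlapping configurations without affecting the others.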

Next, the comparison between the silhouette image (Sil) and the simulation image (Sim) generated in S603 is described. It can be calculated, for example, by the evaluation formula of Equation 15 below, where (u, v) denotes the element in row u and column v of the image.

Finally, the likelihood L of the provisional state S^* is calculated by the product operation expressed by Equation 16.

Using such a likelihood makes it possible to combine the plausibility of the person positions in three-dimensional space with tracking on the two-dimensional image.
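The likelihood computation of S604 can be sketched as below. Equation 15 is not reproduced in this text, so the comparison is replaced by an assumed measure, the fraction of pixels on which the two binary images agree. Only the product structure of Equation 16 is taken from the description. Both function names are hypothetical, and images are plain nested lists of 0 and 1.

```python
def silhouette_agreement(sil, sim):
    """Fraction of pixels (u, v) where Sil and Sim have the same value.

    Assumed stand-in for Equation 15; both images must have the same shape.
    """
    total = 0
    same = 0
    for row_sil, row_sim in zip(sil, sim):
        for a, b in zip(row_sil, row_sim):
            total += 1
            same += (a == b)
    return same / total


def likelihood(sil, sim, penalty):
    """Likelihood L as the product of the prior-knowledge term and the image term."""
    return penalty * silhouette_agreement(sil, sim)
```

For example, with `sil = [[0, 1], [1, 1]]` and `sim = [[0, 1], [0, 1]]`, three of the four pixels agree, so the image term is 0.75.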

S605: Using the likelihood L calculated in S604, it is decided whether to accept or reject the provisional state candidate S^*. The decision is made by calculating the acceptance/rejection probability A, which can be calculated, for example, by Equation 17 below.

Here, assuming the current update is the n-th, L' denotes the likelihood of the (n-1)-th state candidate S'_{n-1}. That is, if the current state has a higher likelihood than the previous state it is always accepted, and if its likelihood is lower, the provisional state S^* is adopted with probability L/L'.

If the provisional state S^* is adopted, S'_n = S^* is set; if it is rejected, S'_n = S'_{n-1} is set. The process then proceeds to the condition determination step S606.
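The acceptance rule described for S605 (always accept when the likelihood rises, otherwise accept with probability L/L') corresponds to A = min(1, L/L'), which can be sketched as below. The function names are hypothetical, and the guard for a non-positive previous likelihood is an added assumption that the text does not discuss.

```python
import random


def accept_probability(l_new, l_prev):
    """A = min(1, L / L') for candidate likelihood L and previous likelihood L'."""
    if l_prev <= 0.0:   # assumption: any candidate replaces a zero-likelihood state
        return 1.0
    return min(1.0, l_new / l_prev)


def metropolis_step(prev_state, prev_l, cand_state, cand_l, rng=random):
    """Return (S'_n, likelihood): the candidate if accepted, else the previous state."""
    if rng.random() < accept_probability(cand_l, prev_l):
        return cand_state, cand_l
    return prev_state, prev_l
```

Since `rng.random()` returns a value in [0, 1), a candidate with A = 1 is always accepted and one with A = 0 is always rejected.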

S606: It is determined whether the update process has been performed the prescribed number of times. If the number of updates n exceeds B + P, the sum of the arbitrarily chosen constants B and P, the iterative calculation ends and the process proceeds to the target state estimation step S607.

Conversely, if the number of updates n does not exceed B + P, the process returns to S601 and is repeated. The prescribed number of updates is assumed to be defined in a program or the like.
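The iteration over S601 to S606 might be organised as in the sketch below. It performs exactly B + P updates and keeps the states from the last P of them. Whether "exceeds B + P" implies one further update is not clear from the text, so this count is an assumption. The callables `propose` and `likelihood` stand for S601 to S602 and S603 to S604 respectively, and all names are hypothetical.

```python
import random


def run_sampler(initial, propose, likelihood, burn_in, keep, rng=random):
    """Run burn_in + keep updates (B + P) and return the last `keep` states."""
    state, l_prev = initial, likelihood(initial)
    kept = []
    for n in range(1, burn_in + keep + 1):
        cand = propose(state, rng)        # S601-S602: provisional candidate S^*
        l_cand = likelihood(cand)         # S603-S604: likelihood L
        # S605: accept with probability min(1, L / L')
        if l_prev <= 0.0 or rng.random() < min(1.0, l_cand / l_prev):
            state, l_prev = cand, l_cand
        if n > burn_in:                   # only the last P states are retained
            kept.append(state)
    return kept
```

A toy call such as `run_sampler(0.0, lambda s, rng: s + rng.gauss(0, 1), lambda s: 1.0 / (1.0 + s * s), 50, 20)` returns 20 states, which S607 would then use for the final estimate.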

S607: The state with the maximum probability is calculated. Because the last P of the state candidates generated by the iterative calculation are all assumed to occur with equal probability, the state S with the maximum probability can be calculated by Equation 18 below.

The state S calculated in this way is the estimation result. For the one or more objects or persons present in three-dimensional space and photographed, it contains the number, position, and size of the objects, together with the internal and external parameters of the camera of the image acquisition means 11 used for the photographing.

Because this estimation result includes the internal and external parameters of the camera, measuring those parameters becomes easy. Moreover, because the state S calculated here has the highest plausibility of person positions in three-dimensional space, the camera's internal and external parameters contained in it can be used as appropriate parameters for camera calibration and the like, which helps simplify the work.

In particular, measuring the internal and external parameters of a camera has until now had the problem that only trained experts could carry it out. With the camera calibration device 10, the parameters are estimated automatically from images, so simply installing an image acquisition means 11 such as a camera enables applications such as estimating and tracking the three-dimensional positions of objects appearing in the image.

The intermediate data of S601 to S606 is stored as appropriate in memory (RAM) or on a hard disk drive, and Equations 4 to 18 are assumed to be defined in a program or the like.

The present invention is not limited to the embodiment described above. For example, the comparison of the silhouette image and the simulation image in S604 may use, instead of Equation 15, an evaluation function that measures the similarity between images. The processing of S604 can likewise be realized by using a simple product operation as this evaluation function.

The present invention can also be configured as a program that causes a computer to function as some or all of the means 12 to 14 of the camera calibration device 10. In this case, the computer is caused to execute all or some of steps S501 to S504 and S601 to S607.

This program can be provided over a network, for example via a website or e-mail. The program can also be recorded on a recording medium 10 such as a CD-ROM, DVD-ROM, CD-R, CD-RW, DVD-R, DVD-RW, MO, HDD, or Blu-ray Disc (registered trademark) for storage and distribution. The recording medium 10 is read using a recording medium drive device, and the program code itself realizes the processing of the embodiment, so the recording medium also constitutes the present invention.
(3) Example
The following describes, with reference to the processing flow of FIG. 5, the result of using the camera calibration device 10 to process video obtained from a video camera and track the persons photographed. When processing starts, a camera image is acquired (S501), and a silhouette image and an edge image are created (S502, S503). FIG. 11(a) shows an input image from the video camera, FIG. 11(b) shows a silhouette image created using the background subtraction method, and FIGS. 11(c) and 11(d) show edge images created using the Sobel operator.

Next, the parameters are estimated in S504. Here, state candidates are generated one after another, a simulation image simulating each state is created, and the state is evaluated. FIG. 11(e) shows a simulation image representing the target state with the calculated maximum probability, and FIG. 11(f) shows a bird's-eye view of that target state. FIG. 11(f) shows that the three-dimensional positional relationship of the persons is estimated correctly. Because the three-dimensional positional relationship of the persons cannot be estimated correctly unless the camera's internal and external parameters are estimated accurately, this result also indicates the accuracy of the estimated internal and external parameters.

Through the processing described above, this example automatically outputs, from an image in which persons are photographed, estimates of parameters such as the three-dimensional positions of the persons, the number of objects, the sizes of the objects, and the focal length, position, and posture of the video camera.

DESCRIPTION OF SYMBOLS
1.11 ... Image acquisition means (camera)
10 ... Camera calibration device
12 ... Silhouette image creation means
13 ... Edge image creation means
14 ... Parameter estimation means

Claims

A camera calibration device for obtaining internal and external parameters of a camera used for photographing from a photographed image of an object,
Silhouette image creation means for creating a silhouette image obtained by extracting a region where an object is captured from a photographed image of the camera;
Parameter estimation means for estimating internal and external parameters of the camera by evaluating the predicted state according to the change type of the object using the silhouette image;
A camera calibration device comprising:

The parameter estimation means selects a prepared object change type according to an arbitrary probability, and generates a state candidate according to the selected change type;
Means for projecting onto a virtual camera having the same internal and external parameters of the camera as the state candidate based on the state candidate, and generating a simulation image simulating the silhouette image;
Means for evaluating the state candidate from the result of comparing the silhouette image and the simulation image and the determination result of the probability of the target state itself based on prior knowledge;
The camera calibration device according to claim 1, further comprising:

The means for evaluating the state candidate includes means for obtaining an evaluation value obtained by calculating the comparison result and the determination result;
Determining whether to accept the state candidate as a final change state based on the evaluation value,
The camera calibration device according to claim 2, wherein internal and external parameters of the camera are estimated from the final change state.

The camera calibration device according to claim 1, wherein the state of the object is represented as a set of ellipsoid models, and a person is represented by the two-dimensional coordinate values on the image of the top of the model, the short axis of the ellipsoid that approximates the head, the short axis of the ellipsoid that approximates the body, and the height of the model indicating the person's height.

A camera calibration method for obtaining internal and external parameters of a camera used for photographing from a photographed image of an object,
A first step in which a silhouette image creating means creates a silhouette image obtained by extracting an area in which an object is captured from a photographed image of the camera;
A second step of estimating internal and external parameters of the camera by evaluating a state predicted by the parameter estimation unit according to the change type of the object using the silhouette image;
A camera calibration method characterized by comprising:

The second step includes selecting a change type of an object prepared in advance according to an arbitrary probability, and generating a state candidate according to the selected change type;
Based on the state candidate, projecting to a virtual camera having the same camera internal / external parameters as the state candidate, and generating a simulation image simulating the silhouette image;
Evaluating the candidate state from the determination result of the probability of the target state itself based on the comparison result of the silhouette image and the simulation image and prior knowledge;
The camera calibration method according to claim 5, further comprising:

The step of evaluating the state candidate includes obtaining an evaluation value obtained by calculating the comparison result and the determination result;
Determining whether to accept the state candidate as a final change state based on the evaluation value,
The camera calibration method according to claim 6, wherein internal and external parameters of the camera are estimated from the final change state.

The camera calibration method according to claim 5, wherein the state of the object is represented as a set of ellipsoid models, and a person is represented by the two-dimensional coordinate values on the image of the top of the model, the short axis of the ellipsoid that approximates the head, the short axis of the ellipsoid that approximates the body, and the height of the model indicating the person's height.

A camera calibration program for causing a computer to function as the camera calibration device according to claim 1.

A recording medium on which the camera calibration program according to claim 9 is recorded.