JP4839330B2

JP4839330B2 - Image processing apparatus and image processing program

Info

Publication number: JP4839330B2
Application number: JP2008050976A
Authority: JP
Inventors: 亮和田; 弘幸音山
Original assignee: Toshiba Teli Corp
Current assignee: Toshiba Teli Corp
Priority date: 2008-02-29
Filing date: 2008-02-29
Publication date: 2011-12-21
Anticipated expiration: 2028-02-29
Also published as: JP2009210286A

Description

The present invention relates to an image processing apparatus and an image processing program that process images captured by a monocular camera (a single camera).

One way to obtain world coordinates for a point in a camera image is to compute three-dimensional coordinates on the two-dimensional screen, by three-dimensional measurement that uses camera information and the camera installation conditions, and then determine the position on the screen.

One system of this kind is a camera system for estimating a person's position, which photographs the same person with two or more asynchronous cameras and estimates the three-dimensional position of the top of the person's head (see Patent Document 1).

However, camera techniques based on the above method require the camera information and the installation position to be measured accurately, and depending on where the camera is installed, measurement may not be possible at all. In addition, image processing tasks such as identifying an object from a change region that appears on the screen need information such as how far the object is from the camera and how large the object's width and height actually are. When change regions are extracted from the camera video and judged only by their size in the image, objects of different actual sizes, such as a small animal at the front of the screen and a person at the back, cannot be told apart by size, and their positions cannot be calculated correctly.
Patent Document 1: JP 2007-233523 A

As described above, conventional image processing techniques for camera images take a long time to install and configure, and their recognition accuracy is also problematic.

An object of the present invention is to solve the above problems and to provide an image processing apparatus and an image processing program that can estimate object positions with high accuracy from monocular camera images using only simple settings.

The present invention provides an image processing apparatus comprising: actual distance table acquisition means that, based on information on four vertices on the screen indicating the monitoring range on the ground plane within the monitoring range of a monocular camera, and on screen information and actual measurement information about a sample object in three or more sample images captured by the monocular camera, obtains the actual distance for pixels in the depth direction on the screen by an exponential approximation formula and creates an actual distance table that associates each depth-direction pixel on the screen, given as the depth-direction pixel position of an object's ground contact position, with the actual distance in the depth direction on the ground plane; horizontal weight table acquisition means that, based on the information in the per-pixel actual distance table, the information on the four vertices on the screen indicating the monitoring range, and the screen information and actual measurement information about the sample object in the three or more sample images, obtains by a linear approximation formula the weight of the actual distance for pixels in the horizontal direction from the ground contact position of the object on the screen, and creates a horizontal weight table that associates each horizontal pixel on the screen, for each distance in the depth direction on the ground plane, with a horizontal real coordinate in a coordinate system referenced to the monitoring range on the ground plane; vertical weight table acquisition means that, based on the same per-pixel actual distance table information, four-vertex information, and sample object screen and measurement information, obtains by a linear approximation formula the weight of the actual distance for pixels in the vertical direction from the ground contact position of the object on the screen, and creates a vertical weight table that stores, for each vertical pixel position on the screen, the actual height spanned by one pixel at the corresponding actual depth distance; and image processing means that holds the per-pixel actual distance table, the horizontal weight table, and the vertical weight table created by the acquisition means as reference tables and, using at least the per-pixel actual distance table, executes image processing that recognizes the actual position of the object captured by the monocular camera.

The present invention also provides an image processing program for causing a computer to function as an image processing apparatus for a monocular camera, the program causing the computer to realize: a function that, based on information on four vertices on the screen indicating the monitoring range on the ground plane within the monitoring range of the monocular camera, and on screen information and actual measurement information about a sample object in three or more sample images captured by the monocular camera, obtains the actual distance for pixels in the depth direction on the screen by an exponential approximation formula and creates an actual distance table that associates each depth-direction pixel on the screen, given as the depth-direction pixel position of an object's ground contact position, with the actual distance in the depth direction on the ground plane; a function that, based on the information in the per-pixel actual distance table, the information on the four vertices on the screen indicating the monitoring range, and the screen information and actual measurement information about the sample object in the three or more sample images, obtains by a linear approximation formula the weight of the actual distance for pixels in the horizontal direction from the ground contact position of the object on the screen, and creates a horizontal weight table that associates each horizontal pixel on the screen, for each distance in the depth direction on the ground plane, with a horizontal real coordinate in a coordinate system referenced to the monitoring range on the ground plane; a function that, based on the same per-pixel actual distance table information, four-vertex information, and sample object screen and measurement information, obtains by a linear approximation formula the weight of the actual distance for pixels in the vertical direction from the ground contact position of the object on the screen, and creates a vertical weight table that stores, for each vertical pixel position on the screen, the actual height spanned by one pixel at the corresponding actual depth distance; and an image processing function that holds the per-pixel actual distance table, the horizontal weight table, and the vertical weight table created by the above functions as reference tables and, using at least the per-pixel actual distance table, executes either a process of recognizing the actual position of the object captured by the monocular camera or a process of removing the object captured by the monocular camera from the processing targets.

With this image processing, monocular (single-camera) image processing can estimate positions on the captured screen without using camera information or camera installation conditions; it is enough to enter the monitoring range (four vertices) on the captured image and position information for three or more locations. The actual distance corresponding to a position in the image can also be calculated, so the position of an object in real-world coordinates can be determined with high accuracy. Furthermore, once an object (region) is extracted from the captured video, the actual dimensions of its size (width and height) can be estimated, as can the actual distance to its position. Setting a monitoring range on the captured video also makes it possible to exclude objects outside that zone, which removes noise.

According to the present invention, object positions can be estimated with high accuracy from monocular camera images using only simple settings.

Embodiments of the present invention are described below with reference to the drawings. This embodiment shows an example in which the present invention is applied to tracking processing with a monocular camera, but the invention is widely applicable to various image processing techniques that use a monocular camera.

The image processing apparatus according to the embodiment of the present invention implements an object position estimation function that determines, from its position in the image, the actual position (real-world coordinates) of a target detected (extracted) or tracked in images captured by a monocular camera (a single camera).

In monocular camera image processing, the present invention comprises the following components, which together realize the function of estimating a position in world coordinates from the image position of an object extracted from an input image captured by the monocular camera:
actual distance table acquisition means that, based on information on four vertices on the screen indicating the monitoring range on the ground plane within the monitoring range of the monocular camera, and on screen information and actual measurement information about a sample object in three or more sample images captured by the monocular camera, obtains the actual distance for pixels in the depth direction on the screen by an exponential approximation formula and creates an actual distance table that associates each depth-direction pixel on the screen, given as the depth-direction pixel position of an object's ground contact position, with the actual distance in the depth direction on the ground plane; horizontal weight table acquisition means that, based on the information in the per-pixel actual distance table, the information on the four vertices on the screen indicating the monitoring range, and the screen information and actual measurement information about the sample object in the three or more sample images, obtains by a linear approximation formula the weight of the actual distance for pixels in the horizontal direction from the ground contact position of the object on the screen, and creates a horizontal weight table that associates each horizontal pixel on the screen, for each distance in the depth direction on the ground plane, with a horizontal real coordinate in a coordinate system referenced to the monitoring range on the ground plane; vertical weight table acquisition means that, based on the same per-pixel actual distance table information, four-vertex information, and sample object screen and measurement information, obtains by a linear approximation formula the weight of the actual distance for pixels in the vertical direction from the ground contact position of the object on the screen, and creates a vertical weight table that stores, for each vertical pixel position on the screen, the actual height spanned by one pixel at the corresponding actual depth distance; and image processing means that holds the per-pixel actual distance table, the horizontal weight table, and the vertical weight table created by the acquisition means as reference tables and, using at least the per-pixel actual distance table, executes image processing that recognizes the actual position of the object captured by the monocular camera.

An outline of the object position estimation process in the image processing apparatus according to the embodiment of the present invention is described with reference to FIG. 1. Here, images captured by the monocular camera 11 are assumed to be output as camera video in QVGA format, 320 × 240 pixels per frame.

Object position estimation according to the embodiment of the present invention is based on images of a real object (target) serving as a sample model, captured by the monocular camera. The sample model is an object for which a spacing in the width (horizontal) direction can be set, for example the model of two people standing side by side shown in the figure.

First, in process A1 shown in FIG. 1, three or more images in which the sample model object appears are prepared from the monitoring video of the monocular camera 11. These images must show the same object (the same sample model) captured at different locations on the screen. The angle of view of the monocular camera 11 is set under installation conditions in which, for example, the camera images the monitoring target in the monitoring area from obliquely above.

Next, the object region in each prepared image is enclosed in a rectangle, and object information is acquired for the three or more rectangles. Here, as shown in FIG. 5, three images (P1, P2, P3) with rectangles of different sizes (that is, taken at different positions) are acquired, and the objects in these three images are enclosed in rectangles.

In process A2 shown in FIG. 1, the monitoring range (four vertices) to be monitored by the monocular camera 11 is designated on the screen. Here, as shown in FIG. 6, the monitoring range (four vertices) is set on one of the acquired images. In FIG. 6, for example, where pedestrians on a road are the monitoring target, the four vertices delimit the road region. The line at the left edge of the screen in FIG. 6 is taken as the horizontal reference line (0 m line), and positions are estimated within this monitoring range (four vertices). In other words, the reference line (0 m line) is used to determine how many meters a given position on the screen lies from it in the horizontal direction.

Next, for each of the three images (P1, P2, P3) shown in FIG. 5 within the monitoring range (four vertices), the following information is acquired for the objects enclosed in rectangles, as shown in FIG. 1 or FIG. 7: the distance from the bottom of the image to the foot position (ground contact position), Pyi (pix: pixels) and Wyi (m: meters); the spacing between the two people, ΔPxi (pix: pixels) and ΔWx (m: meters); and the height of either one of the objects, ΔPhi (pix: pixels) and ΔWh (m: meters). Here i is the image index, and ΔWx and ΔWh are common to all images.

Next, three acquisition processes are performed: the actual distance graph acquisition process (calculation of the depth-direction distance model), the horizontal weight graph acquisition process (calculation of the horizontal distance weight model), and the vertical weight graph acquisition process (calculation of the height-direction distance weight model). These yield the actual distance graph (foot position vs. distance), the horizontal weight graph (distance vs. horizontal weight), and the vertical (height) weight graph (distance vs. height weight).

The acquisition (calculation) process for the actual distance graph (foot position vs. distance) uses the distances from the bottom of the image to the foot position of the object in the three (or more) images, Pyi (pix: pixels) and Wyi (m: meters), where i is the image index, and obtains by the least squares method an exponential approximation formula for the actual distance Wy as a function of the foot position Py in the image.

Distance model: Wy = A·e^(B·Py) (A and B are parameters determined by the least squares method) ... (1)
From the distance model approximation formula calculated by equation (1), an actual distance graph as a function of image position is obtained. (Adding the distance from directly below the camera to the position at the bottom of the image, that is, the blind-spot distance, to the value of Wy also gives the actual distance from directly below the camera.) An example of this actual distance graph (foot position vs. distance) is shown in FIG. 8.
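The fit behind equation (1) can be sketched as follows. The source only says the parameters are found "by the least squares method"; this sketch assumes the common log-linear variant (an ordinary least-squares line through ln Wy versus Py), and the function names and sample values are hypothetical.

```python
import math


def fit_distance_model(py_pixels, wy_meters):
    """Fit Wy = A * exp(B * Py) by least squares on ln(Wy) (assumed variant)."""
    n = len(py_pixels)
    if n < 3 or n != len(wy_meters):
        raise ValueError("need three or more matching (Py, Wy) samples")
    logs = [math.log(w) for w in wy_meters]
    mean_p = sum(py_pixels) / n
    mean_l = sum(logs) / n
    sxx = sum((p - mean_p) ** 2 for p in py_pixels)
    sxy = sum((p - mean_p) * (l - mean_l) for p, l in zip(py_pixels, logs))
    b = sxy / sxx                         # slope of ln(Wy) against Py
    a = math.exp(mean_l - b * mean_p)     # intercept mapped back from log space
    return a, b


def actual_distance(a, b, py):
    """Evaluate equation (1) for a foot position py, in pixels from the image bottom."""
    return a * math.exp(b * py)


# Hypothetical measurements from three sample images (not from the patent).
sample_py = [20, 100, 180]     # Pyi, pixels from the bottom of the image
sample_wy = [2.0, 5.0, 12.5]   # Wyi, meters
A, B = fit_distance_model(sample_py, sample_wy)
```

Fitting in log space keeps the problem linear, at the cost of weighting relative rather than absolute distance errors; a nonlinear least-squares fit would be an equally valid reading of the text.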

The acquisition (calculation) process for the horizontal weight graph (distance vs. horizontal weight) uses the depth-direction distance model obtained in the actual distance graph acquisition process and the spacing between the two people (objects) in the three (or more) images, ΔPxi (pix: pixels) and ΔWx (m: meters), where i is the image index, to obtain the horizontal weight per pixel at the position of actual distance Wy (ωxi = ΔWx / ΔPxi).

Using the three (or more) actual distances Wy and the corresponding horizontal weights per pixel ωxi (i is the image index), a linear approximation formula for the horizontal weight per pixel as a function of the actual distance Wy is obtained by the least squares method.

Horizontal weight model per pixel: ωx = C·Wy + D (C and D are parameters determined by the least squares method) ... (2)
From the horizontal weight approximation formula calculated by equation (2), a horizontal weight graph as a function of the actual distance Wy is obtained. An example of this horizontal weight graph (distance vs. horizontal weight) is shown in FIG. 9.
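As a minimal sketch of equation (2), the per-sample weights ωxi = ΔWx / ΔPxi can be computed first and then fitted with an ordinary least-squares line against Wy. The helper names and the sample numbers are hypothetical; only the formulas come from the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_horizontal_weight(wy_meters, dpx_pixels, dwx_meters):
    """Return (C, D) of equation (2) from the sample spacings."""
    # Meters covered by one horizontal pixel at each sample distance.
    weights = [dwx_meters / dpx for dpx in dpx_pixels]
    return fit_line(wy_meters, weights)


# Hypothetical samples: two people standing 1.0 m apart in every image.
sample_wy = [2.0, 5.0, 12.5]      # actual depth distance of each sample, meters
sample_dpx = [100.0, 40.0, 16.0]  # on-screen spacing of the two people, pixels
C, D = fit_horizontal_weight(sample_wy, sample_dpx, 1.0)
```

The same `fit_line` helper would serve for the height-direction weight of equation (3), with ΔWh / ΔPhi in place of ΔWx / ΔPxi.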

The horizontal position of each pixel on the screen is calculated by multiplying the image distance from the reference line to that pixel by the calculated weight.

The acquisition (calculation) process for the vertical (height) weight graph (distance vs. height weight) uses the depth-direction distance model obtained in the actual distance graph acquisition process and the object heights in the three or more images, ΔPhi (pix: pixels) and ΔWh (m: meters), to obtain the height-direction weight per pixel at the position of actual distance Wy (ωzi = ΔWh / ΔPhi).

Using the three (or more) actual distances Wy and the corresponding height-direction weights per pixel ωzi (i is the image index), a linear approximation formula for the height-direction weight per pixel as a function of the actual distance Wy is obtained by the least squares method.

Height-direction weight model per pixel: ωz = E·Wy + F (E and F are parameters determined by the least squares method) ... (3)
From the height-direction weight approximation formula calculated by equation (3), a height-direction weight graph as a function of the actual distance Wy is obtained. An example of this vertical weight graph (distance vs. height weight) is shown in FIG. 10.

The height-direction position of each pixel on the screen is calculated by multiplying the image distance from the foot position to the designated pixel in the height direction by the calculated weight.

In process A3 shown in FIG. 1, the actual distance table (actual distance Y table 155a shown in FIG. 3) is created from the information (parameter values) of the actual distance graph (foot position vs. distance), the horizontal weight table (actual distance X weight table 155b shown in FIG. 3) is created from the information (parameter values) of the horizontal weight graph (distance vs. horizontal weight), and the vertical weight table (actual distance Z weight table 155c shown in FIG. 3) is created from the information (parameter values) of the vertical (height) weight graph (distance vs. height weight).

Using the actual distance table (actual distance Y table 155a), the horizontal weight table (actual distance X weight table 155b), and the vertical weight table (actual distance Z weight table 155c) as reference tables, the position and size of objects captured by the monocular camera 11 are estimated, and image processing based on these estimates is carried out.
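One way to picture the three reference tables is as per-row lookup arrays precomputed from the fitted parameters. The sketch below is an assumption about layout, not the apparatus's actual data structure: it indexes each table by the pixel row counted from the bottom of a 240-row QVGA frame, and stores one weight per row rather than a full per-pixel horizontal coordinate map.

```python
import math

FRAME_HEIGHT = 240  # rows in a QVGA frame, as assumed in the description


def build_reference_tables(a, b, c, d, e, f, height=FRAME_HEIGHT):
    """Precompute Wy, omega_x and omega_z for every row of the frame.

    Index py is the row measured in pixels from the bottom of the image,
    matching the definition of Py in equation (1).
    """
    y_table = []    # role of the actual distance Y table: depth distance in meters
    x_weights = []  # role of the X weight table: meters per horizontal pixel
    z_weights = []  # role of the Z weight table: meters per vertical pixel
    for py in range(height):
        wy = a * math.exp(b * py)       # equation (1)
        y_table.append(wy)
        x_weights.append(c * wy + d)    # equation (2)
        z_weights.append(e * wy + f)    # equation (3)
    return y_table, x_weights, z_weights


# Hypothetical parameter values, for illustration only.
Y_TABLE, X_WEIGHTS, Z_WEIGHTS = build_reference_tables(
    2.0, 0.01, 0.005, 0.0, 0.004, 0.002
)
```

Precomputing the tables replaces an exponential and two multiplications per query with a single array lookup, which matters when every tracked rectangle in every frame needs a position estimate.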

The estimation coefficients obtained when creating these graphs make it possible to determine the actual distance at a position in the image, the distance between two points, or a height. It is also possible to calculate where on the image a point lies that is a given number of meters from another point.

To determine the actual distance of a position (i, j) in the image, j is substituted for Py in equation (1) (the depth-direction distance model approximation Wy = A·e^(B·Py)) to calculate the depth-direction actual distance Wy. The calculated Wy is then substituted into equation (2) (the horizontal weight approximation ωx = C·Wy + D) to calculate ωx.

The number of pixels DPx in the horizontal direction from the horizontal reference line of the monitoring range (four vertices) to the position (i, j) is calculated. The horizontal actual distance Wx is then given by ωx × DPx.

This gives the actual distance (Wx, Wy) of the position (i, j) in the image.
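The steps above can be sketched as a single conversion function. It assumes that j is measured in pixels from the bottom of the image (as Py is) and that the horizontal reference line is the straight line through the two left-hand vertices of the monitoring range; both the helper names and the numeric values are hypothetical.

```python
import math


def reference_column(j, bottom_left, top_left):
    """Column of the 0 m reference line at row j, by linear interpolation."""
    (x0, y0), (x1, y1) = bottom_left, top_left
    t = (j - y0) / (y1 - y0)
    return x0 + t * (x1 - x0)


def pixel_to_ground(i, j, a, b, c, d, bottom_left, top_left):
    """Map image position (i, j) to ground coordinates (Wx, Wy) in meters."""
    wy = a * math.exp(b * j)                            # equation (1)
    omega_x = c * wy + d                                # equation (2)
    dpx = i - reference_column(j, bottom_left, top_left)  # DPx, in pixels
    return omega_x * dpx, wy                            # Wx = omega_x * DPx


# Hypothetical parameters and left-edge vertices of the monitoring range.
wx, wy = pixel_to_ground(
    140, 0, 2.0, 0.01, 0.005, 0.0, bottom_left=(40, 0), top_left=(100, 200)
)
```

Because the reference line is generally slanted on screen, DPx has to be measured from the line's column at the same row j, not from a fixed image column.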

To determine the actual height of a point DPz pixels above a position (i, j) in the image, j is substituted for Py in equation (1) (the depth-direction distance model approximation Wy = A·e^(B·Py)) to calculate the depth-direction actual distance Wy. The calculated Wy is substituted into equation (3) (the height-direction weight approximation ωz = E·Wy + F) to calculate ωz. The actual distance in the height direction is then given by ωz × DPz.
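A minimal sketch of this height calculation follows, again assuming j is counted in pixels from the bottom of the image. The function name and parameter values are hypothetical.

```python
import math


def actual_height(j, dpz, a, b, e, f):
    """Actual height in meters of a point dpz pixels above foot row j."""
    wy = a * math.exp(b * j)   # equation (1): depth distance of the foot position
    omega_z = e * wy + f       # equation (3): meters per vertical pixel at that depth
    return omega_z * dpz


# Hypothetical case: a rectangle 170 pixels tall whose foot is on row 0.
height_m = actual_height(0, 170, a=2.0, b=0.01, e=0.004, f=0.002)
```

Applied to the height of a tracked rectangle, this gives the real-size estimate that lets a nearby small animal be distinguished from a distant person.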

Using the actual distance table (actual distance Y table 155a), the horizontal weight table (actual distance X weight table 155b), and the vertical weight table (actual distance Z weight table 155c) as reference tables, object position estimation and the various kinds of image processing that accompany it are performed on images captured by the monocular camera 11.

In this embodiment, in process A11 shown in FIG. 1, video from the monocular camera 11 is input at a predetermined cycle, the current image and a past image of the input screen are differenced, and rectangular regions containing the changed pixels are extracted.
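The differencing step can be illustrated with a small pure-Python sketch. It simplifies the process in two ways that the source does not specify: it uses a fixed grayscale threshold (a hypothetical parameter) and returns a single rectangle enclosing all changed pixels, whereas a real implementation would likely label separate regions.

```python
def changed_region(current, previous, threshold=20):
    """Bounding rectangle (left, top, right, bottom) of changed pixels, or None.

    Frames are lists of rows of grayscale values; rows are indexed from the top.
    """
    left = top = right = bottom = None
    for y, (cur_row, prev_row) in enumerate(zip(current, previous)):
        for x, (c, p) in enumerate(zip(cur_row, prev_row)):
            if abs(c - p) > threshold:
                if left is None:
                    left = right = x
                    top = bottom = y
                else:
                    left = min(left, x)
                    right = max(right, x)
                    bottom = y  # rows are visited in increasing order
    if left is None:
        return None
    return left, top, right, bottom


# Tiny hypothetical frames: two pixels change between the past and current image.
past = [[0] * 5 for _ in range(4)]
now = [row[:] for row in past]
now[1][2] = 200
now[2][3] = 200
region = changed_region(now, past)
```

The bottom edge of the returned rectangle serves as the foot (ground contact) position that is fed into the distance model.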

In process A12 shown in FIG. 1, each newly extracted rectangular region is treated as a tracking start object and tracked; the tracked object information is retained and updated each time the tracked object appears.

In process A13 shown in FIG. 1, the reference tables obtained in the preprocessing described above are used to estimate the object position for each extracted rectangular region (tracking start objects and tracked objects) and to perform various kinds of object processing based on that estimate. In this object processing, for example, the actual distance table (actual distance Y table 155a) can be used to recognize the actual position of an object in an image captured by the monocular camera 11. The actual distance table (actual distance Y table 155a) can also be used to exclude an image captured by the monocular camera 11 from the processing targets. The actual distance table (actual distance Y table 155a), together with the horizontal weight table (actual distance X weight table 155b) or the vertical weight table (actual distance Z weight table 155c), can be used to recognize the physical nature of what appears in an image captured by the monocular camera 11 (for example, the actual size of an object) and to perform sorting based on actual object size. Furthermore, the information on the four vertices of the monitoring range, the actual distance table (actual distance Y table 155a), and the horizontal weight table (actual distance X weight table 155b) or the vertical weight table (actual distance Z weight table 155c) can be used to set the monitoring range (four vertices) in real space, and various kinds of processing, such as object discrimination, selection (extraction), image deletion, and space monitoring, can then be applied to that real-space monitoring range (four vertices). These kinds of processing on monocular camera video become possible only through the graph processing based on the exponential approximation model and the linear approximation models described above.

FIG. 2 shows the configuration of an image processing apparatus according to an embodiment that uses the above actual distance table (actual distance Y table 155a), horizontal weight table (actual distance X weight table 155b), and vertical weight table (actual distance Z weight table 155c) as reference tables, and FIG. 3 shows the configuration of the main part of this image processing apparatus.

As shown in FIG. 2, the image processing apparatus according to the embodiment of the present invention comprises a camera (monocular camera) 11, a capture unit 12, an image processing storage unit 13, an image processing unit 15, and a display unit 16.

The camera 11 includes a lens unit and an image sensor (for example, a CCD solid-state image sensor or a CMOS image sensor) located at the image-forming position of the lens unit. It images moving subjects (moving objects), outdoors or indoors, at a fixed angle of view and outputs each one-screen image in predetermined pixel units (for example, 320 × 240 pixels (pix) per frame = QVGA).

The capture unit 12 has a processing function of capturing each frame imaged by the camera 11 as a processing target image (input image) for the image processing unit 15 and holding it in the image buffer memory 131 in the image processing storage unit 13. The on-screen input image captured into the image buffer memory 131 is here assumed to be the original image, but it may be an edge image.

The image processing storage unit 13 stores various data used in the processing of the image processing unit 15, including the input images captured by the capture unit 12 and the images being processed. Under the control of the image processing unit 15, the image processing storage unit 13 reserves an area constituting the image buffer memory 131, which holds, among the frame images captured by the capture unit 12, the one-screen image captured this time (current image) and the one-screen image captured the previous time (past image) as processing target images, and also reserves image areas used in the processing of the image processing unit 15. The image processing storage unit 13 further reserves a storage area for the various parameters and control data used in the processing of the image processing unit 15, storage areas for the various information generated or acquired in the tracking process, and the like.

The position estimation coefficient parameter calculation unit 14 (see FIG. 3) takes as input the four-vertex information described above, the screen information and actual measurement information of sample images captured at three or more locations by the monocular camera 11, and the like, and calculates the position estimation coefficient parameter values of the distance model and of each weight model described above. It supplies these values to the position estimation table calculation unit 154 provided in the image processing unit 15 as element data for creating the reference tables (actual distance Y table 155a, actual distance X weight table 155b, actual distance Z weight table 155c).

The image processing unit 15 comprises: a change area extraction processing unit 151 that generates a binarized difference image by differencing and binarizing the past image and the current image held in the image buffer memory 131 of the image processing storage unit 13, removes the noise contained in this binarized difference image, and extracts rectangular regions containing changed pixels from it; a tracking processing unit 152 that tracks the rectangular regions extracted by the change area extraction processing unit 151; and an object position/size estimation processing unit 153 that executes the various position-estimation-based processes described above using the reference tables (actual distance Y table 155a, actual distance X weight table 155b, actual distance Z weight table 155c) calculated by the position estimation table calculation unit 154.

The display unit (display device) 16 displays the moving object regions processed by the image processing unit 15.

FIG. 4 shows the processing procedure in the image processing apparatus described above. In step B3 shown in FIG. 4, the object position/size estimation process using the reference tables (actual distance Y table 155a, actual distance X weight table 155b, actual distance Z weight table 155c) is performed.

The operation of the image processing apparatus shown in FIGS. 2 and 3 will now be described according to the processing procedure shown in FIG. 4.

The change area extraction processing unit 151 provided in the image processing unit 15 generates a binarized difference image by differencing and binarizing the past image and the current image held in the image buffer memory 131 of the image processing storage unit 13, removes the noise contained in this binarized difference image, and writes the resulting image information to the change area buffer memory 132. It then extracts rectangular regions containing changed pixels from the binarized difference image and writes the information on these rectangular regions to the change area buffer memory 132 (step B1 in FIG. 4).
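Step B1 can be illustrated with a minimal sketch. It assumes grayscale frames stored as lists of rows, a hypothetical threshold of 30, and a deliberately crude noise filter that clears isolated pixels; none of these choices comes from the source. For brevity it returns a single rectangle around all changed pixels, whereas separating several objects would need connected-component labelling.

```python
def diff_binarize(past, current, threshold=30):
    """Return a binary image: 1 where |current - past| exceeds the threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(past, current)
    ]


def remove_isolated(binary):
    """Crude noise removal: clear changed pixels that have no changed 4-neighbour."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x]:
                neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
                if not any(0 <= ny < h and 0 <= nx < w and binary[ny][nx]
                           for ny, nx in neighbours):
                    out[y][x] = 0
    return out


def bounding_rect(binary):
    """Return (x_min, y_min, x_max, y_max) of all changed pixels, or None."""
    points = [(x, y) for y, row in enumerate(binary)
              for x, v in enumerate(row) if v]
    if not points:
        return None
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)
```

Calling `bounding_rect(remove_isolated(diff_binarize(past, current)))` yields the rectangle that would be handed to the tracking step.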

The tracking processing unit 152 reads the rectangular regions extracted by the change area extraction processing unit 151 and held in the change area buffer memory 132, and tracks them as tracking target objects. The tracking results are written to and held in the tracking information buffer memory 133. At initial operation there is no object yet subject to tracking, so the extracted object is registered in the tracking information buffer memory 133 as a new object and becomes a target of subsequent tracking. For this tracking, it is possible to apply, for example, region tracking that combines region determination based on rectangle shape and overlap with region determination based on a predicted position along a predicted direction obtained by least-squares approximation (step B2 in FIG. 4).
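The combination of overlap-based and prediction-based region determination in step B2 might look like the sketch below. The source does not give the matching rule, so the "overlap OR close to the predicted centre" test, the `max_dist` value, and all function names are assumptions made for illustration. The prediction fits a least-squares line to the past centre positions, one per frame.

```python
def rects_overlap(a, b):
    """True if rectangles (x_min, y_min, x_max, y_max) share any area."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def fit_line(ts, vs):
    """Least-squares line v = slope * t + intercept."""
    n = len(ts)
    mt, mv = sum(ts) / n, sum(vs) / n
    stt = sum((t - mt) ** 2 for t in ts)
    if stt == 0:
        return 0.0, mv  # a single sample: predict no motion
    slope = sum((t - mt) * (v - mv) for t, v in zip(ts, vs)) / stt
    return slope, mv - slope * mt


def predict_center(history):
    """Predict the next (x, y) centre from past centres, one per frame."""
    ts = list(range(len(history)))
    sx, ix = fit_line(ts, [p[0] for p in history])
    sy, iy = fit_line(ts, [p[1] for p in history])
    t_next = len(history)
    return sx * t_next + ix, sy * t_next + iy


def matches_track(track_rect, history, candidate, max_dist=20.0):
    """Accept a candidate that overlaps the last rectangle or lies near the prediction."""
    if rects_overlap(track_rect, candidate):
        return True
    px, py = predict_center(history)
    cx = (candidate[0] + candidate[2]) / 2
    cy = (candidate[1] + candidate[3]) / 2
    return ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5 <= max_dist
```

A candidate rectangle that matches no existing track would be registered as a new object, as the paragraph above describes for initial operation.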

For the rectangular regions extracted by the change area extraction processing unit 151 (tracking-start (new) objects and tracked objects), the object position/size estimation processing unit 153 estimates the object using the tracking results of the tracking processing unit 152 and the reference tables (actual distance Y table 155a, actual distance X weight table 155b, actual distance Z weight table 155c) calculated by the position estimation table calculation unit 154 (step B3 in FIG. 4).

The object position/size estimation processing unit 153 realizes an object position estimation function that obtains the position (real-world coordinates) of a detected or tracked object from its position on the image. For example, when the user designates a monitoring range (two-dimensional plane = four vertices), the foot and head-top positions of objects within that monitoring range (four vertices) can be estimated with relatively high positional accuracy.

As described above, the actual distance Y table 155a can be used to recognize the real-world position of an image captured by the monocular camera 11 and to exclude an image captured by the monocular camera 11 from the processing targets. Using the actual distance Y table 155a together with the actual distance X weight table 155b or the actual distance Z weight table 155c, it is further possible to recognize the actual entity of an image captured by the monocular camera 11 (for example, the real size of an object) and to perform sorting based on the actual object size. In addition, using the information on the four vertices of the monitoring range, the actual distance Y table 155a, and the actual distance X weight table 155b or the actual distance Z weight table 155c, various processes are possible, such as object recognition with the monitoring range (four vertices) set in real space, and object discrimination, selection (extraction), image deletion, and space monitoring targeting the real-space monitoring range (four vertices).
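The size-based sorting mentioned above can be sketched as follows. The sketch assumes that the X and Z weight tables give metres per pixel indexed by screen row, that the bottom edge of a rectangle is its ground contact row, and that a hypothetical 1.0 m height threshold separates a person from a small object; the labels, the threshold, and the example weights are illustrative and do not come from the source.

```python
def real_size(rect, x_weight, z_weight):
    """Real width and height in metres of a rectangle (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = rect
    foot_row = y_max  # assumption: the bottom edge is the ground contact row
    width_m = (x_max - x_min) * x_weight[foot_row]
    height_m = (y_max - y_min) * z_weight[foot_row]
    return width_m, height_m


def classify(rect, x_weight, z_weight, min_person_height=1.0):
    """Sort an object by real height; the threshold is a hypothetical value."""
    _, height_m = real_size(rect, x_weight, z_weight)
    return "person" if height_m >= min_person_height else "small object"


# Hypothetical metres-per-pixel weights for two screen rows (near and far).
weights = {230: 0.008, 110: 0.034}
near_animal = (100, 180, 160, 230)  # 50 px tall, close to the camera
far_person = (150, 60, 165, 110)    # also 50 px tall, far from the camera
print(classify(near_animal, weights, weights))
print(classify(far_person, weights, weights))
```

Both rectangles have the same pixel height, which is the case that size-on-screen alone cannot separate; the per-row weights give them different real heights.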

The object position and size estimation function according to the embodiment of the present invention will be further described with reference to FIGS. 11 to 21.

(1) Concept of the monitoring range (four vertices)
On the screen captured by the monocular camera 11 shown in FIG. 11, four points on the screen, p0 (x0, y0), p1 (x1, y1), p2 (x2, y1), and p3 (x3, y0), are designated. The quadrilateral connecting these four points is regarded as a planar region, and that planar region is taken as the monitoring range (four vertices). In FIG. 11 the designated monitoring range is indicated by the straight lines connecting the four points.

The two endpoints of the front line in the depth direction (line p0-p3) share the same y coordinate, as do those of the back line (line p1-p2), so the two lines are parallel (the planar region need not be an exact rectangle, but it must follow this rule). Points outside the screen may also be entered (however, p1 must always be set within the screen). The points are set so that x1 < x2, x0 < x3, and y0 > y1. Setting the height direction (the height in world coordinates) defines the monitoring range (four vertices) in the height direction and makes it possible to delete unnecessary regions (the spatial setting of the monitoring range is performed last, after X, Y, and Z have been obtained). Because p0 and p1 form the reference line, p1 must lie within the screen.

(2) Concept of position estimation and reference positions
Within the monitoring range (four vertices), the screen coordinates (x, y) of a plurality of reference positions (a plurality of equally spaced points) and the corresponding world coordinates (X, Y, Z) are acquired, and a graph of distance (Y) versus on-screen position (y) is obtained.

Using the acquired distance (Y), graphs of distance per pixel (ΔX) versus distance (Y) and of height per pixel (ΔZ) versus distance (Y) are obtained. FIG. 12 shows an example of the relationship graph between distance from the camera and image coordinate (Y-y graph). FIG. 13 shows an example of the relationship graph between distance per pixel and distance (ΔX-Y graph).

The reference positions are as follows.

As shown on the screen in FIG. 14, three reference positions (y0, y1, y2) are chosen at the front, center, and back of the monitoring range (four vertices), and the intervals ΔY0 [m: meters] (which may be set by visual estimate), ΔY1 [m], and ΔY2 [m] are measured to obtain Y0 [m], Y1 [m], and Y2 [m]. These reference positions at ΔY [m] intervals may be shared with the positions used to obtain the object model in the preceding section. At each reference position, the pixel width (Δx [pix: pixels]) of a given equal interval ΔX1 [m], ΔX2 [m], ΔX3 [m] is also obtained. When doing so, the y coordinates on the image should be approximately the same at the left and right ends (that is, parallel).

Using the acquired Y0 [m], Y1 [m], Y2 [m], y0 [pix], y1 [pix], y2 [pix], ΔX1 [m], ΔX2 [m], ΔX3 [m], Δx1 [pix], Δx2 [pix], and Δx3 [pix], an exponential approximation formula for distance (Y) versus image coordinate (y) is first obtained by the least-squares method. Next, using the acquired distance (Y), a linear approximation formula for distance per pixel (X) versus distance (Y) is obtained. In the same way, using the acquired Y0 [m], Y1 [m], Y2 [m], y0 [pix], y1 [pix], y2 [pix], ΔH1 [m], ΔH2 [m], ΔH3 [m], Δh1 [pix], Δh2 [pix], and Δh3 [pix], a linear approximation formula for height per pixel (Z) versus distance (Y) is obtained.
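The two fits can be sketched as follows. The source does not state the exact form of the exponential model, so this sketch assumes Y = a·exp(b·y) and fits it by ordinary least squares on ln(Y), which is one common approach and is not identical to minimising the error in Y itself. The per-pixel weight is assumed to be the measured interval in metres divided by its pixel width, fitted as a straight line in Y. All function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def fit_exponential(rows_pix, dists_m):
    """Fit Y = a * exp(b * y) by least squares on ln(Y); returns (a, b)."""
    b, ln_a = fit_line(rows_pix, [math.log(d) for d in dists_m])
    return math.exp(ln_a), b


def fit_per_pixel_weight(dists_m, spans_m, spans_pix):
    """Fit metres-per-pixel = c * Y + d from reference spans; returns (c, d)."""
    per_pixel = [m / p for m, p in zip(spans_m, spans_pix)]
    return fit_line(dists_m, per_pixel)
```

The same `fit_per_pixel_weight` call serves for both the horizontal data (ΔX, Δx) and the height data (ΔH, Δh), since both are linear in the distance Y.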

From the above exponential approximation formula, a Y coordinate reference table (actual distance table; actual distance Y table 155a) such as that shown in FIG. 16 is created.

From the above linear approximation formula, an X coordinate reference table (horizontal weight table; actual distance X weight table 155b) such as that shown in FIG. 17 is created.

From the above linear approximation formula, a Z coordinate reference table (vertical weight table; actual distance Z weight table 155c) such as that shown in FIG. 18 is created.
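One plausible layout for the three reference tables is a list with one entry per screen row, as sketched below. The actual layouts of FIGS. 16 to 18 are not reproduced in this text, so the per-row indexing, the model form Y = a·exp(b·y), and the parameter names are assumptions; the 240-row default follows the QVGA example given for the camera 11.

```python
import math


def build_tables(a, b, cx, dx, cz, dz, height=240):
    """Build per-row lookup tables from the fitted coefficients.

    y_table[row]  : actual depth distance Y [m] for that screen row
    x_weight[row] : actual width of one pixel [m] at that depth
    z_weight[row] : actual height of one pixel [m] at that depth
    """
    y_table = [a * math.exp(b * row) for row in range(height)]
    x_weight = [cx * dist + dx for dist in y_table]
    z_weight = [cz * dist + dz for dist in y_table]
    return y_table, x_weight, z_weight
```

Precomputing the tables once means that each later position estimate needs only index lookups and multiplications, with no exponential evaluated per object.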

(3) Object position estimation method (concept of the monitoring range and the three-dimensional position map)
When the monitoring range (four vertices) is expanded into a three-dimensional map, the result is as shown in FIG. 19.

Taking the foot position (xp, yp) of the object as input, the Yp coordinate corresponding to the yp coordinate is obtained from the Y coordinate reference table (155a) of FIG. 16.

Taking the foot position (xp, yp) of the object as input, the Xp coordinate corresponding to the (xp, yp) coordinates, measured from the left side of the monitoring range (four vertices), is obtained from the X coordinate reference table (155b) of FIG. 17.

The height Z is obtained from the Z coordinate reference table (155c) of FIG. 18 using the difference (yt − yp) between the on-screen y coordinates of the top (yt) and the foot (yp) of the object.

The (X, Y, Z) coordinates obtained in this way are taken as the position of the object (which can then be plotted on the three-dimensional map).
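The three lookups can be combined as in the sketch below. It assumes that the tables are indexable by screen row, that the left side of the monitoring range is the straight line from p0 to p1, and that screen y grows downward so the pixel height is taken as |yt − yp|; the function names and the example values are hypothetical.

```python
def left_edge_x(p0, p1, yp):
    """Screen x of the left side p0-p1 of the monitoring range at row yp."""
    (x0, y0), (x1, y1) = p0, p1
    t = (yp - y0) / (y1 - y0)
    return x0 + t * (x1 - x0)


def estimate_position(foot, top_y, p0, p1, y_table, x_weight, z_weight):
    """Return world coordinates (X, Y, Z) for a foot pixel and a head-top row."""
    xp, yp = foot
    Y = y_table[yp]                                     # depth from the Y table
    X = (xp - left_edge_x(p0, p1, yp)) * x_weight[yp]   # offset from left side
    Z = abs(top_y - yp) * z_weight[yp]                  # pixel height times weight
    return X, Y, Z


# Hypothetical single-row tables and monitoring-range left side.
tables = {150: 5.0}, {150: 0.02}, {150: 0.02}
X, Y, Z = estimate_position((120, 150), 65, (40, 200), (100, 100), *tables)
print(X, Y, Z)
```

With these example values the foot lies 50 pixels to the right of the left side and the object spans 85 pixel rows, so the result is roughly 1.0 m across, 5.0 m deep, and 1.7 m high.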

(4) Deleting unnecessary regions using the monitoring range (four vertices)
As an example, when the region of an object to be monitored is judged to have appeared outside the monitoring range (four vertices) indicated by the straight lines in FIG. 20, that region (object) is deleted. For example, an object judged to have appeared outside the monitoring range is deleted only when the spatial setting (height information in world coordinates) has been made for the monitoring range (four vertices); if the height direction has not been set in the monitoring range spatial setting, the object is not deleted. Also, for example, as shown by the rectangles in FIG. 21, when the foot is not on the monitoring plane or the head-top position is outside the monitoring area, the region is deleted.
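One possible reading of these deletion rules is sketched below in world coordinates. It assumes the monitoring plane is an axis-aligned rectangle given by `x_range` and `y_range` in metres and that the spatial setting is a single `height_limit`; these names and the way the two examples are merged into one function are assumptions for illustration, not the source's definition.

```python
def should_remove(foot, head_z, x_range, y_range, height_limit=None):
    """Return True if an object should be discarded as outside the monitoring space.

    foot         : (X, Y) world position of the foot [m]
    head_z       : world height of the head top [m]
    height_limit : spatial (height) setting, or None if it was not made
    """
    if height_limit is None:
        return False  # no height setting: nothing is deleted
    X, Y = foot
    on_plane = (x_range[0] <= X <= x_range[1]
                and y_range[0] <= Y <= y_range[1])
    head_inside = 0.0 <= head_z <= height_limit
    return not (on_plane and head_inside)
```

An object is kept only when its foot lies on the monitoring plane and its head top stays within the configured height.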

According to the embodiment of the present invention described above, in image processing with a single (monocular) camera, positions on the captured screen can be estimated, without using camera information or camera installation conditions, by entering the monitoring range (four vertices) on the captured image and position information for three or more locations. The actual distance corresponding to a position on the image can also be calculated, so the position of an object in real-world coordinates can be determined with high accuracy. Moreover, once an object (region) is extracted from the captured video, the actual dimensions of its size (width and height) can be estimated, as can the actual distance to its position. In addition, by setting a monitoring range (four vertices) on the captured video, objects outside that zone can be excluded, which removes noise.

FIG. 1 is a diagram showing an outline of the operation of the image processing apparatus according to the embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the image processing apparatus according to the embodiment.
FIG. 3 is a diagram showing the flow of information between the components of the image processing apparatus according to the embodiment.
FIG. 4 is a flowchart showing the processing procedure of the image processing apparatus according to the embodiment.
FIG. 5 is a diagram showing captured images for creating the reference tables of the image processing apparatus according to the embodiment.
FIG. 6 is a diagram showing a setting example of the monitoring range (four vertices) of the image processing apparatus according to the embodiment.
FIG. 7 is a diagram for explaining the position information setting method of the image processing apparatus according to the embodiment.
FIG. 8 is a diagram showing an example of an acquired graph of the image processing apparatus according to the embodiment.
FIG. 9 is a diagram showing an example of an acquired graph of the image processing apparatus according to the embodiment.
FIG. 10 is a diagram showing an example of an acquired graph of the image processing apparatus according to the embodiment.
FIG. 11 is a diagram for explaining the method of setting the monitoring range (four vertices) of the image processing apparatus according to the embodiment.
FIG. 12 is a diagram showing an example of an acquired graph of the image processing apparatus according to the embodiment.
FIG. 13 is a diagram showing an example of an acquired graph of the image processing apparatus according to the embodiment.
FIG. 14 is a diagram for explaining the position coordinates of the image processing apparatus according to the embodiment.
FIG. 15 is a diagram showing an example of an acquired graph of the image processing apparatus according to the embodiment.
FIG. 16 is a diagram showing a configuration example of a reference table of the image processing apparatus according to the embodiment.
FIG. 17 is a diagram showing a configuration example of a reference table of the image processing apparatus according to the embodiment.
FIG. 18 is a diagram showing a configuration example of a reference table of the image processing apparatus according to the embodiment.
FIG. 19 is a diagram for explaining the object position estimation method of the image processing apparatus according to the embodiment.
FIG. 20 is a diagram showing an example of the processing operation of the image processing apparatus according to the embodiment.
FIG. 21 is a diagram showing an example of the processing operation of the image processing apparatus according to the embodiment.

Explanation of symbols

11 ... monocular camera, 12 ... capture unit, 13 ... image processing storage unit, 14 ... position estimation coefficient parameter calculation unit, 15 ... image processing unit, 16 ... display unit (display device), 151 ... change area extraction processing unit, 152 ... tracking processing unit, 153 ... object position/size estimation processing unit, 154 ... position estimation table calculation unit.

Claims

An image processing apparatus comprising:
actual distance table acquisition means for obtaining, by an exponential approximation formula, the actual distance for each pixel in the depth direction on the screen, based on information on four vertices on the screen that indicate the monitoring range on the ground plane within the monitoring range of a monocular camera and on screen information and actual measurement information about a sample object in three or more sample images captured by the monocular camera, and for creating a pixel-based actual distance table that associates each pixel in the depth direction on the screen, from the depth-direction pixel position of the ground contact position of an object, with the actual distance in the depth direction on the ground plane;
horizontal weight table acquisition means for obtaining, by a linear approximation formula, a weight of the actual distance for pixels in the horizontal direction from the ground contact position of the object on the screen, based on the pixel-based actual distance table, the information on the four vertices on the screen indicating the monitoring range, and the screen information and actual measurement information about the sample object in the three or more sample images, and for creating a horizontal weight table that associates each pixel in the horizontal direction on the screen, at each distance in the depth direction on the ground plane, with a horizontal real coordinate in a coordinate system referenced to the monitoring range on the ground plane;
vertical weight table acquisition means for obtaining, by a linear approximation formula, a weight of the actual distance for pixels in the vertical direction from the ground contact position of the object on the screen, based on the pixel-based actual distance table, the information on the four vertices on the screen indicating the monitoring range, and the screen information and actual measurement information about the sample object in the three or more sample images, and for creating a vertical weight table that stores, for each vertical pixel position on the screen, the actual height of one pixel at the corresponding actual distance in the depth direction; and
image processing means for holding, as reference tables, the pixel-based actual distance table, the horizontal weight table on the screen, and the vertical weight table on the screen created by the respective acquisition means, and for executing, using at least the pixel-based actual distance table, image processing that recognizes the actual position of the object captured by the monocular camera.

The image processing apparatus according to claim 1, wherein the screen information and actual measurement information about the sample object in the sample images include information indicating the distance from a reference position to the ground contact position of a sample model for which an interval can be set, and information indicating its height and interval.

The image processing apparatus according to claim 1, wherein the image processing means executes a process of excluding the object captured by the monocular camera from the processing targets using the pixel-based actual distance table.

The image processing apparatus according to claim 1, wherein the image processing means executes a process of recognizing the actual entity of the object captured by the monocular camera using the pixel-based actual distance table and either the horizontal weight table on the screen or the vertical weight table on the screen.

The image processing apparatus according to claim 1, wherein the image processing means executes a process of setting the monitoring range in real space using the information on the four vertices on the screen indicating the monitoring range, the pixel-based actual distance table, and either the horizontal weight table on the screen or the vertical weight table on the screen.

The image processing apparatus according to claim 1, wherein the image processing means estimates or specifies the object captured by the monocular camera using the pixel-based actual distance table and either the horizontal weight table on the screen or the vertical weight table on the screen.

An image processing program for causing a computer to function as an image processing apparatus for a monocular camera, the program causing the computer to implement:
a function of obtaining, by an exponential approximation formula, the actual distance for each pixel in the depth direction on the screen, based on information on four vertices on the screen that indicate the monitoring range on the ground plane within the monitoring range of the monocular camera and on screen information and actual measurement information about a sample object in three or more sample images captured by the monocular camera, and of creating a pixel-based actual distance table that associates each pixel in the depth direction on the screen, from the depth-direction pixel position of the ground contact position of an object, with the actual distance in the depth direction on the ground plane;
a function of obtaining, by a linear approximation formula, a weight of the actual distance for pixels in the horizontal direction from the ground contact position of the object on the screen, based on the pixel-based actual distance table, the information on the four vertices on the screen indicating the monitoring range, and the screen information and actual measurement information about the sample object in the three or more sample images, and of creating a horizontal weight table that associates each pixel in the horizontal direction on the screen, at each distance in the depth direction on the ground plane, with a horizontal real coordinate in a coordinate system referenced to the monitoring range on the ground plane;
a function of obtaining, by a linear approximation formula, a weight of the actual distance for pixels in the vertical direction from the ground contact position of the object on the screen, based on the pixel-based actual distance table, the information on the four vertices on the screen indicating the monitoring range, and the screen information and actual measurement information about the sample object in the three or more sample images, and of creating a vertical weight table that stores, for each vertical pixel position on the screen, the actual height of one pixel at the corresponding actual distance in the depth direction; and
an image processing function of holding, as reference tables, the pixel-based actual distance table, the horizontal weight table on the screen, and the vertical weight table on the screen created by the respective functions, and of executing, using at least the pixel-based actual distance table, a process of recognizing the actual position of the object captured by the monocular camera or a process of excluding the object captured by the monocular camera from the processing targets.

The image processing program according to claim 7, wherein the image processing function executes a process of recognizing the actual entity of the object captured by the monocular camera, a process of setting the monitoring range in real space, or a process of estimating or specifying the object captured by the monocular camera, using the pixel-based actual distance table and either the horizontal weight table on the screen or the vertical weight table on the screen.