JP7241492B2

JP7241492B2 - Image processing device, image processing method, program, and storage medium

Info

Publication number: JP7241492B2
Application number: JP2018171678A
Authority: JP
Inventors: 秀敏椿
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-09-13
Filing date: 2018-09-13
Publication date: 2023-03-17
Anticipated expiration: 2038-09-13
Also published as: JP2020042727A

Description

The present invention relates to a technique for discriminating stationary regions and moving object regions from stereo images and generating a spatial map.

There is a technique called spatial mapping, or SLAM (Simultaneous Localization and Mapping), that obtains a map of a space and the trajectory of a camera from images captured by a camera moving through the space and from observation data of external and internal sensors. In in-vehicle applications, the spatial map serves as input for recognizing people and road signs and is used for decisions on automated driving and hazard avoidance. On mobile phones, the spatial map is used for AR (augmented reality) navigation based on object recognition and discrimination.

If stationary objects can be identified as stationary regions and moving objects as moving object regions when the spatial map is generated, the robustness and accuracy of the camera trajectory itself can be improved. In addition, if a human body can be distinguished from stationary objects, the human body can be managed as a moving object, which improves the usefulness and value of the spatial map for object recognition and similar tasks.

Patent Literature 1 describes a technique that manages separate spatial maps for stationary regions and for moving object regions, and discriminates stationary regions from moving object regions based on the correlation between the input image and the stationary-region map and the correlation between the input image and the moving-object-region map. Patent Literature 2 proposes a technique that uses stereo or multi-view images input in time series, detects three-dimensional objects by correlation between the stereo images, tracks the regions of those objects across time-series frames by motion vector tracking, and discriminates stationary regions from moving object regions according to the presence or absence of motion.

Patent literature:
JP 2012-103819 A
JP 2017-142760 A
Japanese Patent No. 5192096

Non-patent literature:
"Project Tango Development Kit", https://developers.google.com/tango/, accessed 2017-02-28.
"Computer Vision Leading Edge Guide 3", Chapter 2 "ICP Algorithm", Takeshi Masuda (Institute of Advanced Industrial Science and Technology), pp. 33-62.

In Patent Literature 1, in order to discriminate absolute stationary regions and moving object regions (that is, regions defined independently of the camera rather than relative to it) in images captured by a moving camera, two spatial maps, one for stationary regions and one for moving object regions, must be maintained and updated. This requires a large storage area and high processing power. In addition, because the method uses loop processing that requires prediction and update, like a Kalman filter, the initialization must be repeated over several frames until the hidden parameters stabilize, so recovering from exception handling takes time. Furthermore, heavy processing such as the initialization itself is required separately. In Patent Literature 2, discrimination by distance and discrimination by motion are performed separately, so an object that does not move relative to the camera is classified as a stationary object.

The present invention has been made in view of the above problems. Its object is to realize a technique that, in spatial mapping using stereo images captured by a moving imaging device, can discriminate stationary regions from moving object regions without requiring a large storage area or high processing power.

To solve the above problems and achieve the object, an image processing apparatus of the present invention comprises: input means for inputting stereo images; acquisition means for acquiring information on the position and orientation of the imaging device at the time the stereo images were captured; and generation means for discriminating stationary regions and moving object regions from the correlation between the stereo images and the correlation between stereo images captured at different times, and for generating three-dimensional information. The generation means performs first correlation processing for obtaining the correlation between the stereo images in consideration of the position and orientation of the imaging device, and performs second correlation processing for obtaining the correlation between the stereo images and at least one of the stereo images captured at a different time. From the combination of the correlation obtained by the first correlation processing and the correlation obtained by the second correlation processing, it discriminates a stationary region and a possible moving object region, that is, a region that is likely to be a moving object. Based on the possible moving object region, the generation means then performs the first correlation processing between the stereo images in consideration of the position and orientation of the imaging device, and performs third correlation processing between the stereo images and the stereo images captured at a different time in consideration of the position and orientation of the imaging device and the change in position and orientation of the moving object. From the combination of the correlation obtained by the first correlation processing and the correlation obtained by the third correlation processing, it discriminates the moving object region from other regions.

According to the present invention, in spatial mapping using stereo images captured by a moving imaging device, stationary regions and moving object regions can be discriminated without requiring a large storage area or high processing power.

Brief description of the drawings:
A diagram explaining the image processing apparatus of Embodiment 1.
A diagram explaining the processing flow of Embodiment 1.
A diagram explaining correlation processing by the plane sweep method.
A diagram explaining correlation processing of stereo images.
A diagram explaining the moving object region discrimination processing.
A diagram explaining correlation processing for discriminating moving object regions.
A diagram explaining a map in which stationary regions, possible moving object regions, and moving object regions are labeled.
A diagram explaining an example of application to a stereo camera with three or more lenses.
A diagram explaining the apparatus configuration of Embodiment 2.
A diagram explaining the configuration of the imaging unit of a pupil-division optical system.

Embodiments of the present invention are described in detail below. The embodiments described below are examples for realizing the present invention and should be modified or changed as appropriate according to the configuration of the apparatus to which the invention is applied and various conditions; the present invention is not limited to the following embodiments. Parts of the embodiments described later may also be combined as appropriate.

[Embodiment 1] Embodiment 1 is described below.

First, the configuration and functions of the image processing apparatus 100 of Embodiment 1 are described with reference to FIG. 1.

The image processing apparatus 100 includes a stereo image input unit 101, a camera state acquisition unit 102, an image processing unit 103, and a storage unit 104.

The stereo image input unit 101 inputs stereo images captured by an imaging device such as a stereo camera. A stereo image includes at least two images captured from different viewpoints by a stereo camera with two lenses, or with three or more lenses, and is used to reconstruct three-dimensional shape, such as the distance to and orientation of a subject, and to generate a spatial map. Stereo images can be obtained from an imaging device such as a stereo camera as temporally consecutive images or as frames of a moving image. In the following, the stereo image input unit 101 is assumed to input frames of stereo images captured as video by a stereo camera.

The camera state acquisition unit 102 acquires information on the position and orientation of the camera at the time each frame of the stereo images was captured. The position and orientation may be relative to those at some reference time, or may be computed by integrating changes in position and orientation. The unit may be realized, for example, by a combination of inertial sensors such as gyro sensors and acceleration sensors, or by an inertial measurement unit (IMU) assembly. It is not restricted to inertial sensors and may instead combine an attitude and heading reference system (AHRS) with a GPS service. The accuracy of the position and orientation change information may be further improved by adding other sensors such as a geomagnetic sensor, or by combining image information. Indoors, changes in position may be detected by additionally using a beacon-based positioning mechanism. Alternatively, information on the change in camera position and orientation for each frame may be obtained by Visual SLAM processing using the stereo images input from the stereo image input unit 101 or one monocular image of each pair, or by Visual-Inertial SLAM processing that combines the sensors above with Visual SLAM processing.

The image processing unit 103 receives the sequence of stereo images acquired by the stereo image input unit 101 and the camera position and orientation at the capture of each frame acquired by the camera state acquisition unit 102. It generates a spatial map for each frame and a label map in which stationary regions, possible moving object regions, and moving object regions are discriminated. As needed, it then integrates the per-frame spatial maps and label maps over time using the camera position and orientation at the capture of each frame.

The technique called spatial mapping or SLAM acquires a map of the space and the camera trajectory simultaneously and in an interdependent manner. In practice, updates to the map that serves as the reference for obtaining the camera trajectory (in other words, the camera position and orientation), such as adding map data for a new frame, occur less frequently than the tracking of the camera position and orientation. In addition, as spatial maps become more valuable, an increasingly common way to realize attractive devices with limited resources is to estimate the camera trajectory using a sparse map that is light to process, and to generate a dense, information-rich spatial map separately from the trajectory information and the image input (Non-Patent Document 1). The spatial map generated by the image processing unit 103, and the label map in which stationary regions, possible moving object regions, and moving object regions are discriminated, may be used in subsequent processing by the camera state acquisition unit 102, or may not be used there at all. A configuration that only generates the spatial map, or one that only acquires the depth image that serves as its input, is also conceivable.

The storage unit 104 stores the stereo images input by the stereo image input unit 101 and the camera position and orientation information acquired by the camera state acquisition unit 102. It also stores the spatial map generated for each frame by the image processing unit 103, the corresponding label map of stationary and moving object regions, and the spatial map and label map integrated on the basis of the camera position and orientation information.

Next, the spatial map generation processing of Embodiment 1 is described with reference to FIG. 2.

The processing in FIG. 2 comprises stereo image input processing (S101), camera position and orientation acquisition processing (S102), discrimination of stationary regions and possible moving object regions (S103), discrimination of moving object regions (S104), and spatial map integration processing (S105). The stereo image input processing (S101) is executed by the stereo image input unit 101, and the camera position and orientation acquisition processing (S102) by the camera state acquisition unit 102. The discrimination of stationary regions and possible moving object regions (S103), the discrimination of moving object regions (S104), and the spatial map integration processing (S105, S106) are executed by the image processing unit 103, and the results are stored in the storage unit 104.

In S101, the stereo image input unit 101 inputs stereo images in time series. When a stereo camera is connected to the image processing apparatus 100, temporally consecutive stereo images, or frames of stereo images captured as video, are input sequentially from the stereo camera.

In S102, for each of the time-series stereo images input in S101, the camera state acquisition unit 102 acquires either the camera position T and orientation R of the frame at capture, or the inter-frame change in position ΔT and change in orientation ΔR. When the inter-frame changes ΔT and ΔR are used, they are accumulated from the camera position and orientation at some reference time and output as relative camera position and orientation information.
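The accumulation of ΔT and ΔR described for S102 can be illustrated with a minimal sketch. The composition convention used here (each increment expressed in the previous camera frame, so that T is advanced by R·ΔT and R by R·ΔR) and the function names are assumptions made for illustration; the source does not specify them.

```python
def mat_mul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(a, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def integrate_pose(deltas):
    """Accumulate (dR, dT) increments into poses relative to the first frame.

    Assumed convention (not stated in the source): dR and dT are expressed
    in the coordinate frame of the previous camera pose.
    """
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # reference orientation
    T = [0.0, 0.0, 0.0]                                      # reference position
    poses = [(R, T)]
    for dR, dT in deltas:
        step = mat_vec(R, dT)                  # move along the current heading
        T = [T[i] + step[i] for i in range(3)]
        R = mat_mul(R, dR)                     # then apply the rotation increment
        poses.append((R, T))
    return poses
```

For example, two identical increments of "one unit forward along x, then turn 90 degrees about z" leave the camera at position (1, 1, 0) facing the negative x direction.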

In S103, the image processing unit 103 discriminates stationary regions and possible moving object regions. The image processing unit 103 performs correlation processing between the stereo images in consideration of the camera position and orientation, and also performs correlation processing between the stereo images and at least one of the time-series stereo images in consideration of the camera position and orientation, and thereby discriminates stationary regions and regions that may be moving objects. Specifically, two correlations are evaluated: the correlation between at least two temporally synchronized images captured by the stereo camera, and the correlation, taking into account the camera position and orientation at capture, between those two images and at least one image captured at a different time (an image for which the relationship between the camera positions and orientations at each capture is known). Stationary regions and possible moving object regions are discriminated from the combination of high and low values of these two correlations.

In S104, the image processing unit 103 discriminates moving object regions from among the regions that may be moving objects. The discrimination uses the set of stereo images and time-series stereo images, together with the camera position and orientation at capture for each of the time-series stereo images obtained in S102.

Here, the correlation processing in S103 between stereo images for which the relationship between the camera positions and orientations at capture is known is described. The following describes, with reference to FIG. 3, an example that combines correlation by three-dimensional reconstruction using the plane sweep method with correlation of the mapping obtained by inverse projective transformation.

In the plane sweep method, three-dimensional shape is reconstructed by performing correlation calculations using reference images (camera images) obtained by photographing an object from a plurality of cameras and a plurality of planes arranged parallel to one another at equal intervals along the optical axis of a virtual camera set at a virtual viewpoint in a virtual space.

First, the cameras that captured the reference images are placed in the virtual space based on the camera position and orientation at the capture of each stereo image obtained in S102.

FIG. 3(a) shows the left camera 301 and the right camera 302 placed in the virtual space according to the camera position and orientation at the capture of each image, together with the virtual camera 303 and the planes 304. To simplify the explanation, the virtual camera 303 is set apart from the left camera 301 and the right camera 302, but it may coincide with either of them.

As shown in FIG. 3(b), each reference image is back-projected onto each of the planes 304. The position in the virtual space of the object surface imaged in each picture is then determined by computing the identity of luminance or color on each plane. For example, as shown in FIG. 3(c), for the part of each plane 304 where the images projected from the left camera 301 and the right camera 302 overlap, an evaluation value (score) for judging the identity of luminance or color between the camera images is computed for each pixel of the image of the virtual camera 303. A calculation point on a plane where the identity is high is regarded as likely to be a point on an object surface (an object surface point). This calculation is carried out for all planes in ascending order of depth, that is, one plane at a time from the front plane nearest the virtual viewpoint toward the rear. Finally, for each pixel of the virtual camera image, the object surface point is judged to lie at the depth with the best score, that is, the highest identity, and the three-dimensional object shape seen from the virtual viewpoint is reconstructed.
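The sweep over depth planes can be sketched on a single scanline. This is a deliberately reduced illustration, not the procedure of the embodiment: it assumes a rectified stereo pair, takes the virtual camera to coincide with the left camera, and uses the fact that under those assumptions a fronto-parallel plane at depth z corresponds to a disparity of f·B/z pixels. The sum of absolute differences over a small window stands in for the identity score; the function and parameter names are hypothetical.

```python
def plane_sweep_scanline(left, right, focal_px, baseline, depths, half_win=1):
    """Pick, for each left-image pixel, the candidate depth with the best score.

    left, right : lists of intensities on one rectified scanline
    focal_px    : focal length in pixels (assumed parameter)
    baseline    : camera separation, in the same unit as the depths
    depths      : candidate plane depths to sweep
    Returns a list with the chosen depth per pixel, or None where no
    candidate plane could be evaluated (for example near the image border).
    """
    n = len(left)
    best_depth = [None] * n
    best_cost = [float("inf")] * n
    for z in sorted(depths):                      # front plane first
        d = round(focal_px * baseline / z)        # disparity induced by this plane
        for x in range(n):
            lo_l, hi_l = x - half_win, x + half_win
            lo_r, hi_r = lo_l - d, hi_l - d
            if lo_l < 0 or hi_l >= n or lo_r < 0 or hi_r >= len(right):
                continue                          # window falls outside an image
            # Lower cost means higher luminance identity between the two views.
            cost = sum(abs(left[x + k] - right[x - d + k])
                       for k in range(-half_win, half_win + 1))
            if cost < best_cost[x]:
                best_cost[x], best_depth[x] = cost, z
    return best_depth
```

With a right scanline that is the left one shifted by two pixels, `focal_px=4` and `baseline=1`, the interior pixels select the plane at depth 2. A full implementation would back-project two-dimensional images through homographies for arbitrary camera poses, which this sketch omits.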

The average of the colors of the camera images used in computing the score may be assigned as the luminance or color attribute of the object surface point in the virtual space. FIG. 3(d) shows an example of a reconstructed three-dimensional shape. The white circles 305 are sampling points of the three-dimensional shape reconstructed on the planes in three-dimensional space, and the solid line is the assumed envelope surface 306 of the shape. For the sampling points to follow the envelope surface 306 densely enough in the depth direction, the spacing between the planes must be made sufficiently small.

The above processing yields, for each reconstructed point, the correspondence between a pixel on the left camera image, a three-dimensional point in the virtual space, and a pixel on the right camera image. Regions where no correspondence is obtained include, for example, occlusion regions that are visible from the right camera but not from the left camera.

Next, correlation processing is performed between the stereo images and at least one of the stereo images captured at a different time. Stationary regions and possible moving object regions are then discriminated from the combination of high and low values of the previously obtained correlation between the stereo images and the correlation between the stereo images and at least one of the stereo images captured at a different time.

When the plane sweep method is used, the correlation between the stereo images and the correlation between those stereo images and at least one frame of the stereo images captured at a different time can be computed more simply, as a single sequence of steps, using the three-dimensional reconstruction result.

With reference to FIG. 4, the following describes correlation processing between the three-dimensional shape and at least one of the stereo images captured at a time different from that of the stereo images used to compute the three-dimensional shape.

FIG. 4(a) shows the positional relationship of the stereo cameras, together with the three-dimensional shape reconstructed from the images captured at a certain time by the stereo cameras 301 and 302 described with FIG. 3. The white circles 305 are the sampling points. The cameras 401 and 402 correspond to a stereo frame captured at a time different from that of the frame from which the three-dimensional shape was reconstructed, for example an adjacent frame in the time series. The camera position and orientation are assumed to have been acquired in S102 either absolutely or relative to the cameras of the frame from which the three-dimensional shape was reconstructed. The three-dimensional shape is mapped onto one of the images of this new stereo frame, and correlation processing on luminance or color is performed. FIG. 4(b) shows the shape being mapped onto the left camera 401 of the stereo camera for correlation processing. As in the correlation between the images of a stereo frame, an evaluation value (score) for judging the identity of luminance or color between the camera image and the three-dimensional shape is computed for each pixel onto which the shape is mapped.
(Formula 1)

Here, R_a, G_a, B_a are the pixel values of the camera image, and R_3d, G_3d, B_3d are the color attribute values of the mapped three-dimensional shape.
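Formula 1 itself is not reproduced in this text, so the following is only a hedged sketch of the two operations the paragraph describes: projecting a reconstructed sampling point into the camera of a later frame, and scoring color identity at the projected pixel. The pinhole model, the pose convention (R and T map camera coordinates to world coordinates), and the use of Euclidean RGB distance as the score are all assumptions chosen for illustration, not the patent's formula.

```python
import math


def project_point(X, R, T, focal_px, cx, cy):
    """Project world point X into a pinhole camera with pose (R, T).

    Assumed convention: X_world = R * X_cam + T, so X_cam = R^T * (X - T).
    focal_px, cx, cy are hypothetical intrinsics (focal length and
    principal point, in pixels). Returns None for points behind the camera.
    """
    d = [X[i] - T[i] for i in range(3)]
    xc = [sum(R[k][i] * d[k] for k in range(3)) for i in range(3)]  # R^T * d
    if xc[2] <= 0:
        return None
    return (focal_px * xc[0] / xc[2] + cx, focal_px * xc[1] / xc[2] + cy)


def color_score(rgb_a, rgb_3d):
    """Stand-in for Formula 1: Euclidean distance between the camera pixel
    (R_a, G_a, B_a) and the point's color attribute (R_3d, G_3d, B_3d).
    A value of 0 means identical colors; larger values mean lower identity."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rgb_a, rgb_3d)))
```

In use, each sampling point 305 would be projected into camera 401, the image sampled at the resulting pixel, and `color_score` evaluated against the color attribute stored with the point.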

A pixel with high identity on the image is highly correlated between the stereo images, so that three-dimensional reconstruction is possible, and is also considered to correspond on the image of a stereo frame captured at a different time once the camera position and orientation are taken into account. In other words, it is highly likely to be the projection of the surface of a stationary object that has not moved over time in three-dimensional space. Conversely, a region where the correlation between the stereo images is high and three-dimensional reconstruction is possible, but the correlation with the image of a stereo frame captured at a different time (taking the camera position and orientation into account) is low, is considered to have low correlation because a moving object was projected there. Low-correlation regions also include erroneous correspondences such as the occlusion mentioned above. A region with these attributes, mapped onto any image at any time, is called a possible moving object region. FIG. 4(c) shows how the combinations of high and low correlation between the stereo images, and between the stereo images and at least one of the stereo images captured at a different time, relate to stationary regions and possible moving object regions.
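The decision rule described above can be sketched as a small per-pixel classifier. The thresholds, the label names, and the treatment of pixels whose stereo correlation is low (labeled "undetermined" here) are assumptions; the text only states the outcome for pixels whose stereo correlation is high. Scores are treated as costs, so a low cost means a high correlation.

```python
STATIONARY = "stationary"
POSSIBLE_MOVING = "possible_moving_object"
UNDETERMINED = "undetermined"  # hypothetical label for pixels without a reliable 3D point


def classify_pixel(stereo_cost, temporal_cost, stereo_thresh, temporal_thresh):
    """Combine the stereo correlation and the temporal correlation.

    stereo_cost   : cost between the two images of one stereo frame
    temporal_cost : cost between the reconstructed point and a frame
                    captured at a different time (None if not computable)
    """
    if stereo_cost is None or stereo_cost > stereo_thresh:
        return UNDETERMINED              # stereo correlation low: no reconstruction
    if temporal_cost is not None and temporal_cost <= temporal_thresh:
        return STATIONARY                # both correlations high
    return POSSIBLE_MOVING               # stereo high, temporal low (or occluded)


def label_map(stereo_costs, temporal_costs, stereo_thresh=10.0, temporal_thresh=10.0):
    """Apply classify_pixel to two equally sized 2D cost arrays (lists of rows)."""
    return [[classify_pixel(s, t, stereo_thresh, temporal_thresh)
             for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(stereo_costs, temporal_costs)]
```

Note that the possible moving object label deliberately absorbs occlusions and mismatches as well, which is why S104 performs a further discrimination within these regions.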

An example using the plane sweep method has been described above for the discrimination of stationary regions and possible moving object regions in S103, but the method is not limited to that example. For example, the stereo images may be rectified, correspondences between the pixels of the stereo images obtained by an ordinary region correlation method or the like, and three-dimensional reconstruction performed for each stereo pair based on the baseline length, focal length, pixel size, and so on of the stereo camera. The camera position and orientation at the capture of each stereo frame are then acquired, and the spatial coordinate systems of the three-dimensional reconstruction results are unified based on the change in position T and orientation R between the cameras, so that the reconstruction results obtained for the individual stereo frames can be correlated with one another. This correlation uses the color and luminance information attached as attributes to the sampling points of each three-dimensional shape, together with the three-dimensional point coordinates. As with the plane sweep method, regions of high correlation can be regarded as stationary regions, while regions of low correlation can be judged to be regions that may be moving objects. The possible moving object regions also include erroneous correspondences such as occlusion.
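The two geometric steps in this alternative, triangulation from a rectified pair and unification of coordinate systems, can be sketched as follows. The depth relation Z = f·B / (d·p) for a rectified pair is standard; the pose convention (X_ref = R·X_cam + T) and the simple distance tolerance used to compare points are assumptions for illustration.

```python
def depth_from_disparity(disparity_px, focal_mm, baseline_mm, pixel_mm):
    """Depth in millimetres for a rectified stereo pair: Z = f * B / (d * p).

    disparity_px : disparity in pixels
    focal_mm     : focal length
    baseline_mm  : baseline length between the two cameras
    pixel_mm     : pixel size (pitch) on the sensor
    Returns None when the disparity is not positive.
    """
    if disparity_px <= 0:
        return None
    return focal_mm * baseline_mm / (disparity_px * pixel_mm)


def to_reference_frame(point_cam, R, T):
    """Express a camera-frame point in the common reference frame.

    Assumed convention: X_ref = R * X_cam + T, with (R, T) the camera pose
    obtained in S102.
    """
    return [sum(R[i][k] * point_cam[k] for k in range(3)) + T[i]
            for i in range(3)]


def points_match(p, q, tol):
    """True if two 3D points agree within tol on every axis (hypothetical test
    standing in for the coordinate part of the correlation)."""
    return all(abs(a - b) <= tol for a, b in zip(p, q))
```

A point reconstructed at time t and a point reconstructed at time t+1 that land on the same reference-frame coordinates with similar color would then be treated as highly correlated, that is, as part of a stationary region.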

For simplicity, the example above described three-dimensional reconstruction by correlation between the stereo images to determine static regions and possible moving object regions. As with the plane sweep method, the same approach applies when correlation processing is performed, in consideration of the camera position and orientation, between a stereo image and at least one of the stereo images captured at a different time. First, reconstruction is performed for two pairs: between the stereo images, and between a stereo image and at least one of the stereo images captured at a different time. Then the spatial coordinate systems of the three-dimensional reconstruction results are unified, and static regions and possible moving object regions are determined by correlation processing between the reconstruction results. Image rectification and unification of the coordinate systems are based on information on the change in camera position T and orientation R between the stereo images and between the time-series stereo images.

In S104, the image processing unit 103 determines the moving object regions from among the regions determined to be possible moving object regions in S103. The determination uses the set of the stereo image and the time-series stereo images, together with the information obtained in S102 on the camera position and orientation at the time each stereo image was captured. If the image processing unit 103 has processing capacity to spare, however, moving object regions may be determined over the entire image, using the result of the possible-region determination as prior information.

Next, the operation of Embodiment 1 is described with reference to FIG. 5. FIG. 5(a) shows a reference example to simplify the explanation. It shows the three-dimensional coordinate points of a static region whose three-dimensional shape was reconstructed from the stereo image at a certain time, and those of a static region reconstructed from a stereo frame captured at a different time. Because the subject is stationary even though the stereo camera moves, the reconstructed coordinate points overlap wherever a common region is imaged. Reference numeral 506 denotes the envelope surface of the three-dimensional shape reconstructed at a certain time, and 507 denotes the envelope surface reconstructed from the stereo image captured at a different time. The corresponding regions projected onto each stereo image are the static regions. Although this is only a reference example, the correspondence between the stereo image from which the three-dimensional shape was first reconstructed and a stereo image captured at a different time is easily obtained by projecting the coordinate points of the shape reconstructed from the first stereo image onto each of the stereo images captured at the different time.
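
The projection of a reconstructed static point onto a frame captured at a different time can be sketched as follows. The sketch assumes a pinhole camera and the convention that R and T map coordinates of the first camera into those of the second (X2 = R·X1 + T); the function names and the intrinsic parameters `focal_px`, `cx`, and `cy` are hypothetical.

```python
def rigid_transform(R, T, p):
    """Apply X' = R * X + T to a 3D point; R is a 3x3 nested list, T a 3-vector."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + T[i] for i in range(3))


def project(p, focal_px, cx, cy):
    """Pinhole projection of a camera-frame point; None if it lies behind the camera."""
    x, y, z = p
    if z <= 0:
        return None
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def reproject_static_point(p_first, R, T, focal_px, cx, cy):
    """Pixel at which a static point seen by the first camera appears in the second."""
    return project(rigid_transform(R, T, p_first), focal_px, cx, cy)
```

For a truly static point, the image content around the returned pixel should match the content around the original pixel; a mismatch is what marks a possible moving object region.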

FIG. 5(b) shows an example of a possible moving object region whose three-dimensional shape was reconstructed from the stereo image at a certain time and from a stereo image captured at a different time. For example, when the subject moves to the right in the figure, the envelope surface 509 reconstructed from the stereo image captured at the different time is displaced from the envelope surface 508 reconstructed at the first time, so the two do not match under correlation. Moreover, the amount and nature of the displacement are unknown.

Unlike the static region case, the three-dimensional shape for the stereo image captured at a different time cannot be obtained by inverse projection of the result reconstructed from the stereo image in the previous processing. The three-dimensional reconstruction is therefore carried out in the same way as the reconstruction between stereo images in S103, restricted to the possible moving object regions.

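Correlation is performed using the three-dimensional information obtained as in FIG. 5(b), in which a three-dimensional shape is reconstructed from each of the stereo images captured at different times, together with the coordinate information, acquired in S102, of the camera that captured each stereo image, and the moving object region is thereby determined.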

Next, with reference to FIG. 6, a method is described in which, when evaluating a possible moving object region that arises from subject motion, the displacement between the pieces of reconstructed three-dimensional information is found by searching over the change in the subject's position and orientation, and correlation is then performed. FIG. 6(a) shows a state in which the subject's position and orientation have changed. If the subject is a rigid body, for example, the change in the subject's position and orientation can be treated as a change in the camera's position and orientation, that is, as a search over the movement of the coordinate system between the sampling points of the three-dimensional shapes. In this way, the highly correlated region is obtained as the moving object region, with occlusion regions and regions of erroneous correspondence removed from the possible moving object region.

FIG. 6(b) shows how the known ICP (Iterative Closest Point) method (Non-Patent Document 2) is used to search for the change in the subject's position and orientation, treated as the inverse projective transformation of a change in the camera's position and orientation, and to compute the three-dimensional region of the moving object for which a match is obtained.

A moving object region whose three-dimensional shape was reconstructed from stereo images captured at different times is searched over position and orientation in each of the camera coordinate systems in which it was captured.
The search over camera position and orientation is performed using Equations 2 and 3 below.
(Equation 2)

(Equation 3)

Here, p and q are the three-dimensional reconstructed shapes, ΔR_ob and ΔT_ob are the changes in the subject's orientation and position, and p′ is the reconstructed shape obtained by transforming the three-dimensional point p, from the coordinate system that accounts for the camera position and orientation, by the inverse projective transformation of the change in the subject's position and orientation.

The inverse projective transformation of camera position and orientation that maximizes the correlation between the subject's reconstructed shapes 508 and 509 then gives the change in the subject's position and orientation, and the highly correlated portion is the reconstructed shape of the true moving object. Its projection onto the stereo image is the true moving object region.
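
The search for the subject's change in position and orientation can be illustrated with a minimal point-to-point ICP loop. Because Equations 2 and 3 are not reproduced in this text, the sketch assumes the standard formulation, p′ = ΔR·p + ΔT with the sum of squared distances to the nearest points of q minimized; it uses NumPy, brute-force nearest-neighbor search, and hypothetical function names, and it is not claimed to be the exact procedure of the embodiment.

```python
import numpy as np


def best_rigid_transform(p, q):
    """Least-squares R, T with R @ p_i + T ~ q_i for paired points (SVD solution)."""
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    H = (p - p_mean).T @ (q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, q_mean - R @ p_mean


def icp(p, q, iterations=20):
    """Estimate (R, T) aligning point set p to q; also return the mean residual."""
    R_total, T_total = np.eye(3), np.zeros(3)
    moved = p.copy()
    for _ in range(iterations):
        # Pair every transformed point with its closest point in q.
        d2 = ((moved[:, None, :] - q[None, :, :]) ** 2).sum(axis=2)
        nearest = q[d2.argmin(axis=1)]
        R, T = best_rigid_transform(moved, nearest)
        moved = moved @ R.T + T
        R_total, T_total = R @ R_total, R @ T_total + T
    d2 = ((moved[:, None, :] - q[None, :, :]) ** 2).sum(axis=2)
    residual = float(np.sqrt(d2.min(axis=1)).mean())
    return R_total, T_total, residual
```

Under these assumptions, a small residual after alignment would correspond to the highly correlated portion, and the returned rotation and translation play the role of ΔR_ob and ΔT_ob.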

The above processing determines, from the degree of agreement in the correlation, whether a possible moving object region is truly a moving object. At the same time, the changes ΔT_ob and ΔR_ob in the subject's position and orientation found in the correlation search provide information on how the subject's position and orientation changed between the correlated stereo frames captured at different times.

FIG. 7 shows examples of the possible moving object region map obtained in S103 and the moving object region map obtained in S104. FIG. 7(a) is one of the stereo images of the corresponding frame. FIG. 7(b) shows the static/possible moving object region map, in which the static region 701 and the possible moving object region 702 obtained in S103 are labeled; it corresponds to the stereo image of FIG. 7(a). FIG. 7(c) shows the moving object region map obtained in S104: a static/moving object region map in which the static region 701, the moving object region 703, and other regions 704, which include erroneous correspondences and unmatchable regions caused by occlusion, are labeled. FIGS. 7(b) and 7(c) show maps labeled down to the pixel level of the corresponding image, but a coarser, block-wise labeling map may also be used.
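
One way to obtain a coarser block-wise labeling map from a pixel-level map is sketched below. The majority-vote rule and the function name `blockwise_labels` are assumptions of the example; the text does not specify how block labels are chosen. The integer labels reuse the reference numerals 701, 703, and 704 purely for readability.

```python
from collections import Counter


def blockwise_labels(pixel_labels, block):
    """Reduce a 2D label map to one label per block x block tile by majority vote."""
    rows, cols = len(pixel_labels), len(pixel_labels[0])
    coarse = []
    for r0 in range(0, rows, block):
        coarse_row = []
        for c0 in range(0, cols, block):
            votes = Counter(
                pixel_labels[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            coarse_row.append(votes.most_common(1)[0][0])
        coarse.append(coarse_row)
    return coarse
```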

The determinations in S103 and S104 correspond to constructing a graph of per-pixel correspondences between the stereo images captured at the same time, and between a stereo image and at least one of the stereo images captured at a different time. FIGS. 7(d) and 7(e) illustrate the graphs of pixel correspondences between images corresponding to FIGS. 7(b) and 7(c). For the edges of the graph, correspondences that are consistent in terms of adjacent-pixel relationships and three-dimensional geometric depth ordering are obtained by the correlation processing described so far, that is, region-based matching or three-dimensional reconstruction by the plane sweep method. In addition, for the correspondence between images of a moving object captured at different times in the moving object region map of FIG. 7(d), the subject's motion is taken into account geometrically in the search using the ICP method, as described for S104, so that a correspondence that is also spatio-temporally consistent is obtained.

Correlation processing is performed on the pixels of the possible moving object regions, for which a correspondence was found between the stereo images in S103 but not with the stereo image captured at a different time, to check whether an edge can be constructed. Labeling of the map is carried out by applying this graph construction evenly to every pixel of the reference image, or to sub-regions formed by grouping some of the pixels. As in FIG. 7(d), a pixel or sub-region for which a geometrically consistent correspondence is found between the stereo images (stereoA, stereoB), and for which a correspondence edge can also be constructed to the image captured at a different time (diffTime), is given the static region label. If a consistent correspondence is found between the stereo images (stereoA, stereoB) but, in S103, no consistent correspondence is found with the image captured at a different time (diffTime) and no edge can be constructed, the possible moving object region label is given. For regions given the possible moving object region label, the correlation processing of S104, which accounts for both camera motion and subject motion, is performed between the stereo images captured at different times, or with any one of the stereo images captured at a different time, as in FIG. 7(e). If a consistent correspondence is found and an edge can be constructed between graph nodes, the moving object region label is given. All other regions are given an "unmatched" label or left as NaN.

Graph construction can also be expressed in formulas. The operator for the correlation processing in S103 between stereo images captured at the same time is expressed by Equation 4.
(Equation 4)

The operator for correlation processing between images captured at different times that does not account for subject motion is expressed by Equation 5.
(Equation 5)

The operator for the correlation processing in S104 between images captured at different times that accounts for subject motion is expressed by Equation 6.
(Equation 6)

Here, x→ (x with an arrow above it) is a coordinate on a reference image, and "…" stands for additional parameters such as the camera position and orientation information and thresholds on the various intermediate values used for the determination. flg ∈ {ok, ng} indicates whether a correspondence between images or sub-regions could be established by correlation. y→ (y with an arrow above it) is the coordinate on the other image with which the correlation was made. y→ has a value only when flg = ok, in which case it can serve as the input x→ of the next operator (if flg = ng, then y→ = NaN). A static region is then a pixel or sub-region that takes the value of Equation 7.
(Equation 7)

A possible moving object region is a pixel or sub-region that takes the value of Equation 8.
(Equation 8)

Furthermore, Equation 5 is applied to a region determined by Equation 8, and a region that satisfies Equation 9 is determined to be a pixel or sub-region of the moving object region.
(Equation 9)
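
The operator notation can be mimicked in code to show how the flg/y→ convention chains from one correlation step to the next. The sketch below is only an illustration: the lookup-table operators stand in for real correlation processing, the names `make_operator` and `label_pixel` are hypothetical, and, since Equations 4 to 9 are not reproduced in this text, the labeling logic follows the S103/S104 description (the motion-aware operator decides the moving object label) rather than the equations themselves.

```python
import math

NAN = (math.nan, math.nan)


def make_operator(table):
    """Toy correlation operator: returns ('ok', y) if x has a match, else ('ng', NaN)."""
    def operator(x):
        if x in table:
            return "ok", table[x]
        return "ng", NAN
    return operator


def label_pixel(x, stereo_op, temporal_op, motion_op):
    """Label one reference pixel using the chained operators."""
    flg, y = stereo_op(x)              # same-time stereo correlation (S103)
    if flg != "ok":
        return "other"
    flg_t, _ = temporal_op(y)          # different time, subject motion ignored
    if flg_t == "ok":
        return "static"
    flg_m, _ = motion_op(y)            # different time, subject motion considered (S104)
    return "moving" if flg_m == "ok" else "other"
```

For example, with tables in which pixel (0, 0) matches across both stereo and time, and pixel (5, 5) matches across time only when subject motion is considered, the two pixels would be labeled "static" and "moving" respectively.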

This embodiment has been described using a two-lens stereo camera, but images from a stereo camera with three or more lenses can be input in the same way. In particular, when the plane sweep method is used for the correlation processing, cameras can easily be added to the computation, as shown in FIG. 8(a). Likewise, the stereo frame captured at a different time need not be a single adjacent image; several adjacent frames may be used, as shown in FIG. 8(b), and the more frames are used, the more pronounced the score becomes.

This embodiment has also been described for a single subject, but when a frame contains multiple possible moving object regions, the moving object region determination is performed for each region separately, so that moving object regions corresponding to multiple moving objects can be handled. Similarly, although a rigid subject was assumed, non-rigid subjects can be handled in the same way by dividing the possible moving object regions into smaller pieces, which also makes it possible to calculate the change in position and orientation of each moving object region. Information on the changes in position and orientation of adjacent regions may be used as prior information.

The above processing makes it possible to distinguish, for each frame of the time-series stereo images, the static regions, the moving object regions, and the remaining undeterminable regions. When recognition processing is to be performed downstream and only image regions that may contain moving people or cars are selected and passed to it, it is sufficient to obtain the labeling information that distinguishes static subjects from moving objects. In that case, the labeling image information corresponding to each stereo frame is stored sequentially in the storage unit 104.

On the other hand, when three-dimensional information is generated frame by frame from the stereo image or its relationship with past frames, and the frames are processed sequentially and accumulated, the region attribute (static region or moving object region) is attached to the three-dimensional information generated from the corresponding pixels. If necessary, an attribute for uncomputable regions that belong to neither may also be assigned. For example, when the three-dimensional information is held as a point cloud, in the same way that the R, G, B color information is held together with the X, Y, Z spatial coordinates as a multidimensional vector, one more dimension is added to hold the moving object attribute. The information ΔT_ob and ΔR_ob on the change in the moving object's position and orientation may also be attached to a moving object region.
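
A point cloud entry with the extra attribute dimension might look like the following sketch. The class name `MapPoint`, the integer attribute codes, and the optional `motion` field are assumptions chosen for illustration; the text only states that one dimension is added for the moving object attribute and that ΔT_ob and ΔR_ob may be attached.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STATIC, MOVING, UNCOMPUTABLE = 0, 1, 2  # hypothetical attribute codes


@dataclass
class MapPoint:
    xyz: Tuple[float, float, float]
    rgb: Tuple[int, int, int]
    attribute: int = UNCOMPUTABLE
    # Optional (delta_T_ob, delta_R_ob) for points in a moving object region.
    motion: Optional[Tuple[tuple, tuple]] = None

    def as_vector(self) -> List[float]:
        """Seven-dimensional vector: X, Y, Z, R, G, B, attribute."""
        return [*self.xyz, *self.rgb, self.attribute]
```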

In S105 and S106, the image processing unit 103 accumulates and integrates the three-dimensional information generated for each stereo frame. Accumulation and integration are carried out using the camera position and orientation of each frame. For example, if T and R are the relative position and orientation of the camera at a given frame with respect to a certain point in time, the three-dimensional information generated for that frame is transformed from the corresponding camera coordinates into the unified coordinate system of the first frame. It is then integrated with the past three-dimensional information by processing such as addition.
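
The accumulation step can be sketched as a coordinate transformation followed by appending to the map. The sketch assumes that R and T map a frame's camera coordinates into the first frame's coordinate system (X0 = R·Xk + T) and that "addition" means appending points; both are assumptions, as are the function names.

```python
def to_first_frame(points, R, T):
    """Transform camera-frame points into the first frame's coordinate system."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + T[i] for i in range(3))
        for p in points
    ]


def integrate(space_map, frame_points, R, T):
    """Append one frame's points to the accumulated map, in unified coordinates."""
    space_map.extend(to_first_frame(frame_points, R, T))
    return space_map
```

A fuller implementation would likely also merge duplicate points of static regions rather than only appending them, which this sketch omits.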

The moving object attribute calculated in this way indicates whether a region was static or moving between one particular pair of frames. Therefore, when a region of the integrated and accumulated three-dimensional space map that can be regarded as the same subject, based on adjacent distances and similar color information, was a moving object region in the past but becomes a static region from some point of integration onward, map update processing may be performed, such as converting all of the three-dimensional points of that region in the accumulated map to the static region.
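
The map update can be sketched as relabeling accumulated points that are close to, and similar in color to, a point newly observed as static. The dictionary layout of a map point, the thresholds, and the function name are assumptions of the example; the text does not specify how "the same subject" is decided beyond adjacency and color similarity.

```python
import math


def relabel_as_static(space_map, seed, max_dist, max_color_diff):
    """Relabel 'moving' points near the static seed point with similar color."""
    updated = 0
    for pt in space_map:
        near = math.dist(pt["xyz"], seed["xyz"]) <= max_dist
        similar = max(abs(a - b) for a, b in zip(pt["rgb"], seed["rgb"])) <= max_color_diff
        if pt["label"] == "moving" and near and similar:
            pt["label"] = "static"
            updated += 1
    return updated
```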

As described above, according to this embodiment, regions that may be moving objects can be determined by performing correlation processing, in consideration of the camera position and orientation, between a stereo image captured by a moving camera and at least one of the time-series stereo images. Furthermore, moving object regions can be determined by performing correlation processing between the stereo image and the stereo image captured at a different time, in consideration of the camera position and orientation. Information on the change in position and orientation of each moving object can also be calculated.

Consequently, spatial mapping with a moving stereo camera no longer requires processing that demands a large storage area or high processing power, and static regions and moving object regions can be distinguished with a small amount of computation.

[Embodiment 2] Next, Embodiment 2 is described.

First, the configuration and functions of the imaging apparatus 200 of Embodiment 2 are described with reference to FIG. 9.

The imaging apparatus 200 includes one or more imaging units 201, a camera state acquisition unit 102, an image processing unit 203, and a storage unit 104.

As shown in FIG. 10(a), the imaging unit 201 is a pupil-division optical system that can capture two or more stereo images with a single imaging unit (Patent Document 3). When the stereo imaging unit is realized with two or more imaging systems, ordinary imaging systems that are not pupil-division optical systems may be used.

In a pupil-division optical system, a relative shift in viewpoint position arises between the optical images of the subject (the A image and the B image) formed by the light beams 1032a and 1032b that pass through different pupil regions 1031a and 1031b of the exit pupil 1030 of the same optical system 1001. This makes it possible to capture stereo images whose baseline length corresponds to the distance between the centroids, on the exit pupil 1030, of the beams forming the A image and the B image.

As shown in FIG. 10(b), the image sensor 1003 has a large number of ranging pixels (hereinafter also simply called pixels) 1010R, 1010G, and 1010B arranged two-dimensionally in the xy plane.

As shown in FIG. 10(c), each of the pixels 1010R, 1010G, and 1010B includes a microlens 1011, a color filter 1022R, 1022G, or 1022B, photoelectric conversion units 1010Ra and 1010Rb, 1010Ga and 1010Gb, or 1010Ba and 1010Bb, and a waveguide 1013. The color filters 1022R, 1022G, and 1022B give each pixel of the image sensor 1003 spectral characteristics matched to the wavelength band it detects, so that the pixels mainly acquire red, green, and blue light, respectively. The pixels are arranged in the xy plane in a known color pattern (not shown). The substrate 1024 is made of a material that absorbs in the detected wavelength band, for example Si, and the photoelectric conversion units are formed in at least part of its interior by ion implantation or the like. Each pixel has wiring (not shown).

The light beam 1032a, which has passed through the first pupil region 1031a, enters the photoelectric conversion units 1010Ra, 1010Ga, and 1010Ba, and the light beam 1032b, which has passed through the second pupil region 1031b, a different region of the exit pupil 1030, enters the photoelectric conversion units 1010Rb, 1010Gb, and 1010Bb, yielding a first signal and a second signal, respectively. A photoelectric conversion unit that acquires the first signal, which forms the A image, is called an A pixel, and one that acquires the second signal, which forms the B image, is called a B pixel. The signals acquired by the photoelectric conversion units are first transferred to the image processing unit 203, where development processing is applied to generate the stereo image frames. Because the stereo images are thus synchronized on the image sensor 1003 itself, synchronized stereo images with a short baseline, for which correspondences are easy to establish, can be acquired even at high frame rates.

The functions of the camera state acquisition unit 102, the image processing unit 203 (other than the stereo image development processing), and the storage unit 104 are the same as in Embodiment 1, so their description is omitted.

The processing flow of Embodiment 2 is also the same as that of Embodiment 1, so its description is omitted.

As described above, this embodiment provides the same effects as Embodiment 1: spatial mapping with a moving stereo camera no longer requires processing that demands a large storage area or high processing power, and static regions and moving object regions can be distinguished with a small amount of computation.

[Other embodiments]
Image processing apparatuses and imaging apparatuses to which the embodiments are applicable include digital still cameras, digital video cameras, in-vehicle cameras, mobile phones, and smartphones.

When the image processing apparatus or imaging apparatus of this embodiment is combined with an image recognition device and mounted in an automobile, recognition processing for assessing the danger posed by pedestrians, bicycles, and the like that may enter the vehicle's path can be applied only to the moving regions of the image, which helps increase the number of frames that can be processed for recognition and reduce the processing load.

The present invention can also be realized by supplying a program that implements one or more functions of the above embodiments to a system or apparatus via a network or storage medium, and having one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Description of symbols: 100 ... image processing apparatus, 101 ... stereo image input unit, 102 ... camera state acquisition unit, 103 ... image processing unit, 104 ... storage unit

Claims

1. An image processing apparatus comprising:
input means for inputting a stereo image;
acquisition means for acquiring information on the position and orientation of the imaging device at the time the stereo image was captured; and
generating means for determining a static region and a moving object region based on the correlation between the stereo images and the correlation with stereo images captured at different times, and for generating three-dimensional information,
wherein the generating means performs first correlation processing for obtaining the correlation between the stereo images in consideration of the position and orientation of the imaging device, performs second correlation processing for obtaining the correlation between the stereo image and at least one of the stereo images captured at different times, and determines the static region and a possible moving object region by combining the correlation obtained by the first correlation processing with the correlation obtained by the second correlation processing, and
wherein, based on the possible moving object region, the generating means performs the first correlation processing between the stereo images in consideration of the position and orientation of the imaging device, performs, between the stereo image and the stereo images captured at different times, third correlation processing that takes into account changes in the position and orientation of the imaging device and in the position and orientation of the moving object, and distinguishes the moving object region from other regions by combining the correlation obtained by the first correlation processing with the correlation obtained by the third correlation processing.

2. The image processing apparatus according to claim 1, wherein the generating means calculates changes in the position and/or orientation of the moving object region.

3. The image processing apparatus according to claim 1, wherein, when the image includes a plurality of possible moving object regions, the generating means performs the moving object region determination for each region.

The generating means divides the possible region of the moving body and performs moving body region determination processing to determine a rigid body region and a non-rigid body region, and calculates changes in the position and orientation of each region. The image processing apparatus according to any one of claims 1 to 3 , characterized by:

The first correlation processing is performed using a plane sweep method, and the second correlation processing is correlation processing by inverse projective transformation of the first correlation processing using the plane sweep method. The image processing apparatus according to any one of claims 1 to 4 .

6. The image processing apparatus according to any one of claims 1 to 5, wherein the generating means discriminates an area in which both the correlation obtained by the first correlation processing and the correlation obtained by the second correlation processing are high as a static area, and discriminates an area in which the correlation obtained by the first correlation processing is high but the correlation obtained by the second correlation processing is low as the possible region of the moving object.
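
The decision rule in the claim above (both correlations high means static; first correlation high but second low means possible moving object) can be sketched as a per-pixel threshold test. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `classify_regions`, the label strings, correlations normalized to [0, 1], and the threshold of 0.5 are all hypothetical. The claim does not say how "high" and "low" are decided, nor how pixels with a low stereo correlation are treated; they are labeled `unknown` here.

```python
from typing import List

# Hypothetical label names, chosen for this sketch only.
STATIC = "static"
POSSIBLE_MOVING = "possible_moving"
UNKNOWN = "unknown"


def classify_regions(stereo_corr: List[List[float]],
                     temporal_corr: List[List[float]],
                     threshold: float = 0.5) -> List[List[str]]:
    """Label each pixel from two correlation maps of equal shape.

    stereo_corr   : correlation between the stereo pair (first process).
    temporal_corr : correlation with an image from another time (second process).
    threshold     : assumed cut-off separating "high" from "low".
    """
    labels = []
    for row_s, row_t in zip(stereo_corr, temporal_corr):
        row = []
        for s, t in zip(row_s, row_t):
            if s >= threshold and t >= threshold:
                row.append(STATIC)            # both correlations high
            elif s >= threshold:
                row.append(POSSIBLE_MOVING)   # stereo high, temporal low
            else:
                row.append(UNKNOWN)           # not covered by the claim
        labels.append(row)
    return labels
```

For example, `classify_regions([[0.9, 0.8, 0.1]], [[0.9, 0.2, 0.9]])` labels the three pixels as static, possible moving object, and unknown, in that order.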

7. The image processing apparatus according to any one of claims 1 to 4, wherein the third correlation processing performs three-dimensional shape correspondence, in consideration of changes in the position and orientation of the imaging device and the moving object, between the three-dimensional information generated by the first correlation processing and the three-dimensional information generated by the second correlation processing, for the moving object area and the other areas.

8. The image processing apparatus according to claim 7, wherein the association is performed using an ICP (Iterative Closest Point) method.
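
The claim above names ICP without further detail. As a rough illustration of the general technique only, the sketch below alternates nearest-neighbor matching with a least-squares rigid fit. It is simplified to 2D points so that the rotation has a closed form; the claimed association concerns three-dimensional shapes, where the rigid fit is usually solved differently (for example with an SVD). All function names (`nearest`, `best_rigid_2d`, `transform`, `icp_2d`) and the iteration count are assumptions made for this sketch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest(p: Point, targets: List[Point]) -> Point:
    """Return the target point closest to p (brute-force search)."""
    return min(targets, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_2d(src: List[Point], dst: List[Point]) -> Tuple[float, float, float]:
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(q[0] for q in dst) / n
    cy_d = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - cx_s, py - cy_s      # centered source point
        bx, by = qx - cx_d, qy - cy_d      # centered matched point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)         # angle maximizing the alignment
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_d - (c * cx_s - s * cy_s)      # translation aligning the centroids
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, tx, ty


def transform(points: List[Point], theta: float, tx: float, ty: float) -> List[Point]:
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp_2d(src: List[Point], dst: List[Point], iterations: int = 20) -> List[Point]:
    """Iteratively align src to dst and return the moved copy of src."""
    current = list(src)
    for _ in range(iterations):
        matches = [nearest(p, dst) for p in current]   # correspondence step
        theta, tx, ty = best_rigid_2d(current, matches)  # alignment step
        current = transform(current, theta, tx, ty)
    return current
```

Like any basic ICP, this sketch only converges when the initial misalignment is small relative to the point spacing, since the nearest-neighbor matches must be mostly correct from the start.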

9. The image processing apparatus according to any one of claims 1 to 8, wherein the generating means generates space maps from at least two sets of time-series stereo images used for determining the static area and the moving object area, and integrates the generated space maps in consideration of the position and orientation of the imaging device.

10. The image processing apparatus according to any one of claims 1 to 9, wherein the input means is imaging means, and
the image processing apparatus is the imaging device.

11. The image processing apparatus according to claim 10, wherein the imaging means includes at least one pupil division optical system.

12. An image processing method comprising:
an input step in which an input means inputs a stereo image;
an acquisition step in which an acquisition means acquires information on the position and orientation of the imaging device when the stereo image was captured; and
a generating step of generating three-dimensional information by determining a static area and a moving object area based on the correlation between the stereo images and the correlation with stereo images captured at different times,
wherein, in the generating step, a first correlation process is performed to obtain a correlation between the stereo images in consideration of the position and orientation of the imaging device, a second correlation process is performed to obtain a correlation between at least one of the stereo images and a stereo image captured at a different time, and a static area and a region with a high possibility of being a moving object (the possible region of the moving object) are discriminated by combining the correlation obtained by the first correlation process with the correlation obtained by the second correlation process,
and, based on the possible region of the moving object, the first correlation process is performed between the stereo images in consideration of the position and orientation of the imaging device, a third correlation process is performed between the stereo images and stereo images captured at different times in consideration of changes in the position and orientation of the imaging device and in the position and orientation of the moving object, and a moving object area and other areas are discriminated by combining the correlation obtained by the first correlation process with the correlation obtained by the third correlation process.

13. A program for causing a computer to execute the image processing method according to claim 12.

14. A computer-readable storage medium storing a program for causing a computer to execute the image processing method according to claim 12.