JP2017167970A

JP2017167970A - Image processing apparatus, object recognition apparatus, device control system, image processing method, and program

Info

Publication number: JP2017167970A
Application number: JP2016054453A
Authority: JP
Inventors: Seiya Amano; Hiroyoshi Sekiguchi; Soichiro Yokota
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2016-03-17
Filing date: 2016-03-17
Publication date: 2017-09-21

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus, an object recognition apparatus, a device control system, an image processing method, and a program capable of accurately tracking an object.
SOLUTION: The image processing apparatus comprises first matching means and second matching means. The first matching means searches a distance image corresponding to the current frame from above the object downward and acquires the contour of the object by connecting the pixels at which the search first meets distance information of the object. When template matching with a contour template shows that the contour corresponds to an object to be detected, the first matching means specifies a candidate position of the object in the horizontal direction. The second matching means performs template matching with an image template in the vertical direction from the candidate position in the distance image, and thereby determines the horizontal position of the object and a first position of the object in the vertical direction.
SELECTED DRAWING: Figure 14

Description

The present invention relates to an image processing device, an object recognition device, a device control system, an image processing method, and a program.

Conventionally, automobile safety has been pursued by developing vehicle body structures and the like from the viewpoint of how to protect pedestrians and occupants when a pedestrian and an automobile collide. In recent years, however, advances in information processing and image processing technology have produced techniques that detect people and vehicles at high speed. Automobiles that apply these techniques to brake automatically before colliding with an object, thereby preventing the collision, have already been developed. Automatic control of an automobile requires accurate measurement of the distance to an object such as a person or another vehicle; for this purpose, ranging with millimeter-wave radar and laser radar, ranging with a stereo camera, and the like have been put into practical use.

Object recognition processing using a stereo camera can be broadly divided into clustering processing and tracking processing. Clustering processing newly detects objects, in particular by using a luminance image captured in real time and a parallax image derived from the stereo camera. Tracking processing follows the objects detected by the clustering processing, using information from a plurality of frames. Basically, tracking processing uses template matching to detect, in the current frame, a region similar to the object detected in the previous frame, based on the pattern of parallax values or luminance values on the two-dimensional image.
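The template matching step of tracking can be illustrated with a minimal exhaustive search. This sketch assumes the sum of absolute differences (SAD) as the similarity measure and a search over the whole frame; the source does not specify either at this point, and a practical tracker would restrict the search to a region around the object's predicted position. The function names `sad` and `match_template` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # a parallax or luminance image stored as rows of pixel values


def sad(image: Image, template: Image, top: int, left: int) -> float:
    """Sum of absolute differences between the template and the same-sized image patch at (top, left)."""
    return sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def match_template(image: Image, template: Image) -> Tuple[int, int, float]:
    """Slide the template over every valid position and return (top, left, score) of the lowest SAD."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best: Tuple[int, int, float] = (0, 0, float("inf"))
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            score = sad(image, template, top, left)
            if score < best[2]:  # keep the most similar region found so far
                best = (top, left, score)
    return best
```

In use, a template cut from the previous frame around a tracked object would be passed together with the current frame; the returned (top, left) marks the most similar region, and a lower score means a closer match.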

As one such tracking technique, a method has been proposed that identifies a pedestrian recognition area in which a pedestrian is recognized to be present, calculates a pedestrian score indicating the likelihood that the object is a pedestrian, and decides, based on the pedestrian score, whether to adopt the recognition result that a pedestrian is present (see Patent Document 1).

However, Patent Document 1 does not describe processing that copes with feature amounts that change from frame to frame. It is therefore difficult to accurately track an object such as a pedestrian, which is non-rigid, has features that vary from individual to individual, and has feature amounts that change over time.

The present invention has been made in view of the above, and an object thereof is to provide an image processing device, an object recognition device, a device control system, an image processing method, and a program capable of accurately tracking an object.

To solve the above problem and achieve the object, the present invention includes first matching means and second matching means. The first matching means acquires the contour of an object in a distance image corresponding to the current frame by connecting the pixels at which a search from above the object downward first meets distance information of the object, and, when template matching of the contour with a contour template shows that the contour corresponds to an object to be detected, specifies a candidate position of the object in the horizontal direction. The second matching means performs template matching with an image template in the vertical direction from the candidate position in the distance image, and determines the horizontal position of the object and a first position in the vertical direction.
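The first matching means can be sketched as follows, under assumptions that are not taken from the source: the distance image is a parallax image stored as a list of rows; a pixel counts as "distance information of the object" when its parallax lies within a hypothetical tolerance `tol` of the object's parallax `d_obj`; the contour template is stored as row offsets relative to its highest point; and the match cost is a mean absolute difference compared with a hypothetical threshold `max_cost`. The sketch only shows how a top-down scan yields a contour and how a contour template can select a horizontal candidate position.

```python
from typing import List, Optional


def extract_contour(disp: List[List[float]], d_obj: float, tol: float = 1.0) -> List[Optional[int]]:
    """For each column, scan from the top down and record the first row whose parallax
    is within +/- tol of the object's parallax d_obj (None if the column has no such pixel)."""
    height, width = len(disp), len(disp[0])
    contour: List[Optional[int]] = []
    for x in range(width):
        hit: Optional[int] = None
        for y in range(height):  # top-down search within column x
            if abs(disp[y][x] - d_obj) <= tol:
                hit = y
                break
        contour.append(hit)
    return contour


def match_contour(contour: List[Optional[int]], template: List[int], max_cost: float) -> Optional[int]:
    """Slide the contour template along the extracted contour and return the leftmost column
    of the best placement, or None if no placement has a cost within max_cost."""
    width = len(template)
    best_x: Optional[int] = None
    best_cost = float("inf")
    for x in range(len(contour) - width + 1):
        window = contour[x:x + width]
        if any(v is None for v in window):
            continue  # contour incomplete under this placement
        top = min(window)  # normalise so the comparison ignores the absolute height
        cost = sum(abs((v - top) - t) for v, t in zip(window, template)) / width
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x if best_cost <= max_cost else None
```

The returned column would then seed the second matching means, a vertical search with an image template starting from that column; an SAD search restricted to a narrow band of columns is one possible way to realize it.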

According to the present invention, an object can be tracked accurately.

FIG. 1 is a diagram illustrating an example in which the device control system according to the embodiment is mounted on a vehicle.
FIG. 2 is a diagram illustrating an example of the appearance of the object recognition device according to the embodiment.
FIG. 3 is a diagram illustrating an example of the hardware configuration of the object recognition device according to the embodiment.
FIG. 4 is a diagram illustrating an example of the functional block configuration of the object recognition device according to the embodiment.
FIG. 5 is a diagram illustrating an example of the functional block configuration of the parallax value calculation processing unit of the object recognition device according to the embodiment.
FIG. 6 is a diagram explaining the principle of deriving the distance from an imaging unit to an object.
FIG. 7 is an explanatory diagram of finding the corresponding pixel in a comparison image that corresponds to a reference pixel in a reference image.
FIG. 8 is a diagram illustrating an example graph of the result of block matching processing.
FIG. 9 is a diagram illustrating an example of the functional block configuration of the recognition processing unit of the object recognition device according to the embodiment.
FIG. 10 is a diagram illustrating an example of a V map generated from a parallax image.
FIG. 11 is a diagram illustrating an example of a U map generated from a parallax image.
FIG. 12 is a diagram illustrating an example of a real U map generated from the U map.
FIG. 13 is a diagram explaining processing for creating a detection frame.
FIG. 14 is a diagram illustrating an example of the functional block configuration of the tracking processing unit of the recognition processing unit of the object recognition device according to the embodiment.
FIG. 15 is a flowchart illustrating an example of the block matching operation of the parallax value deriving unit according to the embodiment.
FIG. 16 is a flowchart illustrating an example of the tracking operation of the tracking processing unit of the recognition processing unit according to the embodiment.
FIG. 17 is a diagram explaining the movement prediction operation.
FIG. 18 is a flowchart illustrating an example of the branch processing of the determination unit of the tracking processing unit according to the embodiment.
FIG. 19 is a flowchart illustrating an example of the pedestrian matching processing of the matching unit of the tracking processing unit according to the embodiment.
FIG. 20 is a diagram explaining the contour detection operation within the shape matching processing of the pedestrian matching processing.
FIG. 21 is a diagram illustrating an example of a contour detected in the shape matching processing.
FIG. 22 is a diagram illustrating an example of a contour template detected from the parallax image corresponding to the previous frame.
FIG. 23 is a diagram explaining the shape matching processing in the pedestrian matching processing of the matching unit according to the embodiment.
FIG. 24 is a diagram illustrating an example of an image template used in the image matching processing of the pedestrian matching processing.
FIG. 25 is a diagram explaining the image matching processing in the pedestrian matching processing of the matching unit according to the embodiment.
FIG. 26 is a diagram explaining the boundary determination processing in the pedestrian matching processing of the matching unit according to the embodiment.
FIG. 27 is a diagram explaining the frame correction processing in the pedestrian matching processing of the matching unit according to the embodiment.

Embodiments of the image processing device, object recognition device, device control system, image processing method, and program according to the present invention are described in detail below with reference to FIGS. 1 to 27. The present invention is not limited by the following embodiments, and the constituent elements in the following embodiments include those that a person skilled in the art could easily conceive of, those that are substantially identical, and those within the so-called range of equivalents. Furthermore, various omissions, substitutions, changes, and combinations of the constituent elements can be made without departing from the gist of the following embodiments.

[Schematic Configuration of a Vehicle Equipped with the Object Recognition Device]
FIG. 1 is a diagram illustrating an example in which the device control system according to the embodiment is mounted on a vehicle. With reference to FIG. 1, the case where the device control system 60 of the present embodiment is mounted on a vehicle 70 is described as an example.

FIG. 1(a) is a side view of the vehicle 70 on which the device control system 60 is mounted, and FIG. 1(b) is a front view of the vehicle 70.

As shown in FIG. 1, the vehicle 70, which is an automobile, is equipped with the device control system 60. The device control system 60 includes the object recognition device 1 installed in the passenger compartment of the vehicle 70, a vehicle control device 6 (control device), a steering wheel 7, and a brake pedal 8.

The object recognition device 1 has an imaging function for capturing images in the traveling direction of the vehicle 70 and is installed, for example, near the rearview mirror on the inside of the windshield of the vehicle 70. Its configuration and operation are described in detail later; it includes a main body 2 and imaging units 10a and 10b fixed to the main body 2. The imaging units 10a and 10b are fixed to the main body 2 so that they can capture subjects in the traveling direction of the vehicle 70.

The vehicle control device 6 is an ECU (Electronic Control Unit) that executes various kinds of vehicle control based on recognition information received from the object recognition device 1. Examples of such vehicle control, each based on that recognition information, are steering control, which controls the steering system including the steering wheel 7 (an example of a control target) to avoid an obstacle, and brake control, which controls the brake pedal 8 (an example of a control target) to decelerate and stop the vehicle 70.

By executing vehicle control such as steering control or brake control, the device control system 60, which includes the object recognition device 1 and the vehicle control device 6, can improve the driving safety of the vehicle 70.

Although the object recognition device 1 has been described above as capturing images ahead of the vehicle 70, it is not limited to this; it may instead be installed to capture images behind or to the side of the vehicle 70. In that case, the object recognition device 1 can detect the positions of following vehicles and people behind the vehicle 70, or of other vehicles and people beside it, and the vehicle control device 6 can detect danger when the vehicle 70 changes lanes or merges and execute the vehicle control described above. Likewise, when the vehicle 70 is reversing, for example to park, the vehicle control device 6 can execute the vehicle control described above if it determines, based on the recognition information about obstacles behind the vehicle 70 output by the object recognition device 1, that there is a risk of collision.

[Configuration of the Object Recognition Device]
FIG. 2 is a diagram illustrating an example of the appearance of the object recognition device according to the embodiment. As shown in FIG. 2 and as described above, the object recognition device 1 includes the main body 2 and the imaging units 10a and 10b fixed to the main body 2. The imaging units 10a and 10b consist of a pair of cylindrical cameras arranged in parallel, at the same level, on the main body 2. For convenience of explanation, the imaging unit 10a shown in FIG. 2 may be referred to as the right camera and the imaging unit 10b as the left camera.

(Hardware Configuration of the Object Recognition Device)
FIG. 3 is a diagram illustrating an example of the hardware configuration of the object recognition device according to the embodiment. The hardware configuration of the object recognition device 1 is described with reference to FIG. 3.

As shown in FIG. 3, the object recognition device 1 includes a parallax value deriving unit 3 and a recognition processing unit 5 inside the main body 2.

The parallax value deriving unit 3 is a device that derives, from a plurality of images obtained by imaging an object, a parallax value dp indicating the parallax with respect to the object, and outputs a parallax image indicating the parallax value dp of each pixel. The recognition processing unit 5 is a device that, based on the parallax image output from the parallax value deriving unit 3, performs object recognition processing and the like on objects such as people and cars appearing in the captured image, and outputs recognition information, which indicates the result of the object recognition processing, to the vehicle control device 6.
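For a parallel stereo pair such as the imaging units 10a and 10b, whose lens surfaces lie in the same plane, a parallax value dp can be converted to distance by triangulation, Z = B * f / dp, where B is the baseline between the two cameras and f is the focal length expressed in pixels. The source illustrates this principle in FIG. 6 but introduces no symbols here, so B, f, and the function names below are assumptions for illustration.

```python
import math
from typing import List


def disparity_to_distance(dp: float, baseline_m: float, focal_px: float) -> float:
    """Distance Z in metres for a parallax value dp in pixels, using Z = B * f / dp.

    Non-positive parallax (no valid match, or a point at infinity) maps to math.inf.
    """
    if dp <= 0:
        return math.inf
    return baseline_m * focal_px / dp


def parallax_to_distance_image(parallax: List[List[float]], baseline_m: float,
                               focal_px: float) -> List[List[float]]:
    """Convert a whole parallax image into a distance image, pixel by pixel."""
    return [[disparity_to_distance(d, baseline_m, focal_px) for d in row] for row in parallax]
```

With a hypothetical baseline of 0.16 m and a focal length of 1000 px, a parallax of 20 px corresponds to 8 m and a parallax of 10 px to 16 m. Because distance is inversely proportional to parallax, a one-pixel parallax error matters much more for distant objects than for near ones.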

As shown in FIG. 3, the parallax value deriving unit 3 includes an imaging unit 10a, an imaging unit 10b, a signal conversion unit 20a, a signal conversion unit 20b, and an image processing unit 30.

The imaging unit 10a is a processing unit that images a subject ahead and generates an analog image signal. The imaging unit 10a includes an imaging lens 11a, a diaphragm 12a, and an image sensor 13a.

The imaging lens 11a is an optical element that refracts incident light to form an image of an object on the image sensor 13a. The diaphragm 12a is a member that adjusts the amount of light reaching the image sensor 13a by blocking part of the light that has passed through the imaging lens 11a. The image sensor 13a is a semiconductor element that converts the light that has entered the imaging lens 11a and passed through the diaphragm 12a into an electrical analog image signal. The image sensor 13a is realized by a solid-state image sensor such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor.

The imaging unit 10b is a processing unit that images a subject ahead and generates an analog image signal. The imaging unit 10b includes an imaging lens 11b, a diaphragm 12b, and an image sensor 13b, whose functions are the same as those of the imaging lens 11a, the diaphragm 12a, and the image sensor 13a, respectively. The imaging lenses 11a and 11b are installed with their lens surfaces in the same plane so that the left and right cameras capture images under the same conditions.

The signal conversion unit 20a is a processing unit that converts the analog image signal generated by the imaging unit 10a into digital image data. The signal conversion unit 20a includes a CDS (Correlated Double Sampling) 21a, an AGC (Auto Gain Control) 22a, an ADC (Analog Digital Converter) 23a, and a frame memory 24a.

The CDS 21a removes noise from the analog image signal generated by the image sensor 13a by means of correlated double sampling, a horizontal differential filter, a vertical smoothing filter, or the like. The AGC 22a performs gain control that adjusts the intensity of the analog image signal from which the CDS 21a has removed noise. The ADC 23a converts the analog image signal gain-controlled by the AGC 22a into digital image data. The frame memory 24a stores the image data converted by the ADC 23a.
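The CDS, AGC, and ADC stages are analog and mixed-signal circuits, but their per-pixel arithmetic can be modeled numerically. The following is a simplified sketch under assumptions not stated in the source: CDS is modeled as the difference between a reset-level sample and a signal-level sample of each pixel (with the signal sample below the reset level, as in a typical CCD output), AGC as a single frame-wide gain that moves the mean toward a hypothetical `target_mean`, capped at a hypothetical `max_gain`, and the ADC as clipping to a hypothetical `full_scale` followed by uniform quantization to `bits` bits. The differential and smoothing filters the source also mentions are omitted.

```python
from typing import List


def cds(reset_levels: List[float], signal_levels: List[float]) -> List[float]:
    """Correlated double sampling: subtract each pixel's signal sample from its reset sample.
    Noise and offsets common to both samples cancel; the result tracks the collected charge."""
    return [r - s for r, s in zip(reset_levels, signal_levels)]


def agc(samples: List[float], target_mean: float, max_gain: float = 16.0) -> List[float]:
    """Apply one gain to all samples so that their mean approaches target_mean (gain capped at max_gain)."""
    mean = sum(samples) / len(samples)
    gain = max_gain if mean <= 0 else min(target_mean / mean, max_gain)
    return [v * gain for v in samples]


def adc(samples: List[float], full_scale: float, bits: int = 10) -> List[int]:
    """Clip each sample to [0, full_scale] and quantize it uniformly to an integer code of `bits` bits."""
    levels = (1 << bits) - 1
    return [round(min(max(v / full_scale, 0.0), 1.0) * levels) for v in samples]
```

Chaining the three functions, `adc(agc(cds(reset, signal), target_mean), full_scale)`, mirrors the order in which the image signal passes through the CDS 21a, the AGC 22a, and the ADC 23a before being stored in the frame memory 24a.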

The signal conversion unit 20b is a processing unit that converts the analog image signal generated by the imaging unit 10b into digital image data. The signal conversion unit 20b includes a CDS 21b, an AGC 22b, an ADC 23b, and a frame memory 24b, whose functions are the same as those of the CDS 21a, the AGC 22a, the ADC 23a, and the frame memory 24a, respectively.

The image processing unit 30 is a device that performs image processing on the image data converted by the signal conversion units 20a and 20b. The image processing unit 30 includes an FPGA (Field Programmable Gate Array) 31, a CPU (Central Processing Unit) 32, a ROM (Read Only Memory) 33, a RAM (Random Access Memory) 34, an I/F (Interface) 35, and a bus line 39.

The FPGA 31 is an integrated circuit that here performs the processing for deriving the parallax value dp of an image based on the image data. The CPU 32 controls each function of the parallax value deriving unit 3. The ROM 33 stores an image processing program that the CPU 32 executes to control each function of the parallax value deriving unit 3. The RAM 34 is used as a work area for the CPU 32. The I/F 35 is an interface for communicating with the I/F 55 of the recognition processing unit 5 via a communication line 4. As shown in FIG. 3, the bus line 39 comprises an address bus, a data bus, and the like that connect the FPGA 31, the CPU 32, the ROM 33, the RAM 34, and the I/F 35 so that they can communicate with each other.

Although the image processing unit 30 includes the FPGA 31 as the integrated circuit that derives the parallax value dp, it is not limited to this; another integrated circuit such as an ASIC (Application Specific Integrated Circuit) may be used instead.

As shown in FIG. 3, the recognition processing unit 5 includes an FPGA 51, a CPU 52, a ROM 53, a RAM 54, an I/F 55, a CAN (Controller Area Network) I/F 58, and a bus line 59.

The FPGA 51 is an integrated circuit that here performs object recognition processing on objects based on the parallax image received from the image processing unit 30. The CPU 52 controls each function of the recognition processing unit 5. The ROM 53 stores an object recognition processing program that the CPU 52 executes to carry out the object recognition processing of the recognition processing unit 5. The RAM 54 is used as a work area for the CPU 52. The I/F 55 is an interface for data communication with the I/F 35 of the image processing unit 30 via the communication line 4. The CAN I/F 58 is an interface for communicating with an external controller (for example, the vehicle control device 6 shown in FIG. 1) and is connected to, for example, the CAN of the automobile. As shown in FIG. 3, the bus line 59 comprises an address bus, a data bus, and the like that connect the FPGA 51, the CPU 52, the ROM 53, the RAM 54, the I/F 55, and the CAN I/F 58 so that they can communicate with each other.

With this configuration, when a parallax image is transmitted from the I/F 35 of the image processing unit 30 to the recognition processing unit 5 via the communication line 4, the FPGA 51, in response to a command from the CPU 52 of the recognition processing unit 5, executes object recognition processing and the like, based on the parallax image, on objects such as people and cars appearing in the captured image.

Each of the programs described above may be recorded, as a file in an installable or executable format, on a computer-readable recording medium and distributed. The recording medium is, for example, a CD-ROM (Compact Disc Read Only Memory) or an SD (Secure Digital) memory card.

As shown in FIG. 3, the image processing unit 30 of the parallax value deriving unit 3 and the recognition processing unit 5 are separate devices, but the configuration is not limited to this. For example, the image processing unit 30 and the recognition processing unit 5 may be implemented as a single device that both generates the parallax image and performs the object recognition processing.

(Configuration and Operation of the Functional Blocks of the Object Recognition Device)
FIG. 4 is a diagram illustrating an example of the functional block configuration of the object recognition device according to the embodiment. First, the configuration and operation of the main functional blocks of the object recognition device 1 are described with reference to FIG. 4.

As described above with reference to FIG. 3, and as shown in FIG. 4, the object recognition device 1 includes the parallax value deriving unit 3 and the recognition processing unit 5. The parallax value deriving unit 3 includes an image acquisition unit 100a (first imaging means), an image acquisition unit 100b (second imaging means), conversion units 200a and 200b, and a parallax value calculation processing unit 300 (generation means).

The image acquisition unit 100a is a functional unit that captures a subject ahead with the right camera, generates an analog image signal, and obtains a luminance image, which is an image based on the image signal. The image acquisition unit 100a is implemented by the imaging unit 10a shown in FIG. 3.

The image acquisition unit 100b is a functional unit that captures a subject ahead with the left camera, generates an analog image signal, and obtains a luminance image, which is an image based on the image signal. The image acquisition unit 100b is implemented by the imaging unit 10b shown in FIG. 3.

The conversion unit 200a is a functional unit that removes noise from the image data of the luminance image obtained by the image acquisition unit 100a, converts it into digital image data, and outputs it. The conversion unit 200a is implemented by the signal conversion unit 20a shown in FIG. 3.

The conversion unit 200b is a functional unit that removes noise from the image data of the luminance image obtained by the image acquisition unit 100b, converts it into digital image data, and outputs it. The conversion unit 200b is implemented by the signal conversion unit 20b shown in FIG. 3.

Of the image data of the two luminance images output by the conversion units 200a and 200b (hereinafter simply "luminance images"), the luminance image captured by the image acquisition unit 100a, i.e., the right camera (imaging unit 10a), is used as the image data of a reference image Ia (hereinafter simply "reference image Ia") (first captured image), and the luminance image captured by the image acquisition unit 100b, i.e., the left camera (imaging unit 10b), is used as the image data of a comparison image Ib (hereinafter simply "comparison image Ib") (second captured image). In other words, the conversion units 200a and 200b output the reference image Ia and the comparison image Ib, respectively, based on the two luminance images output from the image acquisition units 100a and 100b.

The parallax value calculation processing unit 300 is a functional unit that derives a parallax value for each pixel of the reference image Ia from the reference image Ia and the comparison image Ib received from the conversion units 200a and 200b, and generates a parallax image (an example of a distance image) in which each pixel of the reference image Ia is associated with its parallax value. The parallax value calculation processing unit 300 outputs the generated parallax image to the recognition processing unit 5. The image generated by the parallax value calculation processing unit 300 is not limited to a parallax image; any image whose pixel values indicate the distance to an object, as parallax values do (a distance image), may be used.

The recognition processing unit 5 is a functional unit that performs object recognition processing: it recognizes (detects) objects based on the reference image Ia and the parallax image received from the parallax value deriving unit 3, and tracks the recognized objects.

<Configuration and Operation of the Functional Blocks of the Parallax Value Calculation Processing Unit>
FIG. 5 is a diagram illustrating an example of the functional block configuration of the parallax value calculation processing unit of the object recognition device according to the embodiment. FIG. 6 is a diagram explaining the principle of deriving the distance from the imaging units to an object. FIG. 7 is an explanatory diagram of finding the pixel in the comparison image that corresponds to a reference pixel in the reference image. FIG. 8 is a diagram illustrating an example graph of the result of the block matching process.

First, an outline of the distance measurement method using block matching will be described with reference to FIGS. 6 to 8.

<<Principle of Distance Measurement>>
With reference to FIG. 6, the following describes the principle of deriving, by stereo matching, the parallax of an object as seen from the stereo camera, and of measuring the distance from the stereo camera to the object from the parallax value that expresses this parallax.

The imaging system shown in FIG. 6 has an imaging unit 10a and an imaging unit 10b arranged in parallel and at the same level (a parallel, row-aligned stereo arrangement). The imaging units 10a and 10b have imaging lenses 11a and 11b, respectively, which refract incident light to form an image of an object on an image sensor, a solid-state imaging device. The images captured by the imaging units 10a and 10b are the reference image Ia and the comparison image Ib, respectively. In FIG. 6, a point S on an object E in three-dimensional space is mapped, in each of the reference image Ia and the comparison image Ib, to a position on a straight line parallel to the line connecting the imaging lens 11a and the imaging lens 11b. Let the images of the point S be the point Sa(x, y) in the reference image Ia and the point Sb(X, y) in the comparison image Ib. The parallax value dp is then expressed by the following (Equation 1), using the point Sa(x, y) in the coordinates of the reference image Ia and the point Sb(X, y) in the coordinates of the comparison image Ib.

dp = X − x (Equation 1)

In FIG. 6, let Δa be the distance between the point Sa(x, y) in the reference image Ia and the foot of the perpendicular dropped from the imaging lens 11a onto the imaging surface, and let Δb be the distance between the point Sb(X, y) in the comparison image Ib and the foot of the perpendicular dropped from the imaging lens 11b onto the imaging surface. The parallax value dp can then also be expressed as dp = Δa + Δb.

Next, the distance Z between the imaging units 10a and 10b and the object E is derived from the parallax value dp. Here, the distance Z is the distance from the straight line connecting the focal position of the imaging lens 11a and the focal position of the imaging lens 11b to the point S on the object E. As shown in FIG. 6, the distance Z can be calculated by the following (Equation 2), using the focal length f of the imaging lenses 11a and 11b, the baseline length B (the distance between the imaging lens 11a and the imaging lens 11b), and the parallax value dp.

Z = (B × f) / dp (Equation 2)

(Equation 2) shows that the larger the parallax value dp, the smaller the distance Z, and the smaller the parallax value dp, the larger the distance Z.
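As a numerical illustration of (Equation 2), the following Python sketch converts parallax values into distances. The baseline of 160 mm and focal length of 800 pixels are hypothetical values chosen for the example; they are not parameters given in this document.

```python
def distance_from_parallax(dp: float, baseline_mm: float, focal_px: float) -> float:
    """Return Z = (B * f) / dp, i.e. Equation 2.

    dp and focal_px are both in pixels, so Z comes out in the unit of baseline_mm.
    """
    if dp <= 0:
        # dp = 0 would correspond to a point at infinity; reject it explicitly.
        raise ValueError("parallax value dp must be positive")
    return baseline_mm * focal_px / dp


if __name__ == "__main__":
    # Hypothetical rig (assumption): B = 160 mm, f = 800 px.
    for dp in (4, 8, 16):
        print(dp, distance_from_parallax(dp, 160.0, 800.0))
```

With these values, dp = 4, 8, and 16 give Z = 32000 mm, 16000 mm, and 8000 mm: doubling the parallax halves the distance, which is the inverse relationship stated above.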

<<Block Matching Process>>
Next, the distance measurement method using block matching will be described with reference to FIGS. 7 and 8.

The method of calculating the cost value C(p, d) will be described with reference to FIGS. 7 and 8. In the following description, C(p, d) denotes C(x, y, d).

FIG. 7(a) is a conceptual diagram showing the reference pixel p and the reference region pb in the reference image Ia, and FIG. 7(b) is a conceptual diagram of calculating the cost value C while sequentially shifting the candidate for the pixel in the comparison image Ib that corresponds to the reference pixel p shown in FIG. 7(a). Here, the corresponding pixel is the pixel in the comparison image Ib that is most similar to the reference pixel p in the reference image Ia. The cost value C is an evaluation value (degree of matching) representing the similarity or dissimilarity of each pixel in the comparison image Ib to the reference pixel p in the reference image Ia. In the following, the cost value C is treated as a dissimilarity: the smaller the value, the more similar the pixel in the comparison image Ib is to the reference pixel p.

As shown in FIG. 7(a), the cost value C(p, d) of a candidate pixel q(x + d, y) is calculated from the luminance values (pixel values) of the reference pixel p(x, y) in the reference image Ia and of the candidate pixel q(x + d, y), which lies on the epipolar line EL in the comparison image Ib and is a candidate for the pixel corresponding to p(x, y). Here, d is the shift amount between the reference pixel p and the candidate pixel q, and it is varied in steps of one pixel. That is, the cost value C(p, d), the dissimilarity between the luminance values of the candidate pixel q(x + d, y) and the reference pixel p(x, y), is calculated while the candidate pixel q(x + d, y) is shifted one pixel at a time over a predetermined range (for example, 0 < d < 25). In this embodiment, block matching is used as the stereo matching process for finding the pixel corresponding to the reference pixel p.
Block matching computes the dissimilarity between a reference region pb, a predetermined region centered on the reference pixel p of the reference image Ia, and a candidate region qb of the same size centered on the candidate pixel q of the comparison image Ib. As the cost value C indicating the dissimilarity between the reference region pb and the candidate region qb, SAD (Sum of Absolute Differences), SSD (Sum of Squared Differences), ZSSD (Zero-mean Sum of Squared Differences, which is the SSD computed after subtracting each block's mean value), or a similar measure is used. These evaluation values become smaller as the correlation (the degree of similarity) becomes higher, so they express dissimilarity.
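The three dissimilarity measures named above can be sketched in plain Python as follows, with each block given as equal-sized lists of rows of luminance values. This is an illustrative sketch, not the FPGA implementation described later; the example blocks are made up to show that ZSSD ignores a uniform brightness offset between the two images, whereas SAD and SSD do not.

```python
from typing import Iterator, Sequence, Tuple

Block = Sequence[Sequence[float]]


def _pairs(a: Block, b: Block) -> Iterator[Tuple[float, float]]:
    """Yield corresponding pixel pairs; both blocks are assumed to have the same size."""
    for row_a, row_b in zip(a, b):
        yield from zip(row_a, row_b)


def sad(a: Block, b: Block) -> float:
    """Sum of Absolute Differences."""
    return sum(abs(pa - pb) for pa, pb in _pairs(a, b))


def ssd(a: Block, b: Block) -> float:
    """Sum of Squared Differences."""
    return sum((pa - pb) ** 2 for pa, pb in _pairs(a, b))


def _mean(block: Block) -> float:
    values = [v for row in block for v in row]
    return sum(values) / len(values)


def zssd(a: Block, b: Block) -> float:
    """Zero-mean SSD: subtract each block's own mean before squaring the differences."""
    ma, mb = _mean(a), _mean(b)
    return sum(((pa - ma) - (pb - mb)) ** 2 for pa, pb in _pairs(a, b))


reference_block = [[10, 20], [30, 40]]
candidate_block = [[15, 25], [35, 45]]  # same pattern, 5 levels brighter (made-up data)
print(sad(reference_block, candidate_block))   # 20
print(ssd(reference_block, candidate_block))   # 100
print(zssd(reference_block, candidate_block))  # 0.0
```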

As described above, because the imaging units 10a and 10b are arranged in parallel and at the same level, the reference image Ia and the comparison image Ib are likewise in a parallel, row-aligned relationship. Therefore, the pixel in the comparison image Ib corresponding to the reference pixel p in the reference image Ia lies on the epipolar line EL, shown as a horizontal line in FIG. 7, and the corresponding pixel can be found simply by searching the pixels on the epipolar line EL of the comparison image Ib.

The cost values C(p, d) calculated by such block matching can be plotted against the shift amount d, for example as in the graph shown in FIG. 8. In the example of FIG. 8, the cost value C is smallest at the shift amount d = 7, so the parallax value is derived as dp = 7.
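The relationship between the cost curve and the parallax value in the FIG. 8 example can be reproduced with a short Python sketch. For brevity it assumes a single-row window (a 1 × (2·half + 1) block rather than a 2-D block) and SAD as the cost; the synthetic rows are made up so that the pattern in the comparison row sits 7 pixels to the right of the reference pixel, which places the minimum at d = 7.

```python
from typing import List, Sequence


def cost_curve(ref_row: Sequence[int], cmp_row: Sequence[int],
               x: int, half: int, d_max: int) -> List[int]:
    """SAD cost C(p, d) for d = 0..d_max along one epipolar line.

    The reference window is centered on x; the candidate window is centered on
    x + d, matching the candidate pixel q(x + d, y) in the text.
    """
    if x - half < 0 or x + half >= len(ref_row):
        raise ValueError("reference window falls outside the row")
    ref = ref_row[x - half:x + half + 1]
    costs = []
    for d in range(d_max + 1):
        xc = x + d
        if xc + half >= len(cmp_row):
            break  # candidate window would leave the image
        cand = cmp_row[xc - half:xc + half + 1]
        costs.append(sum(abs(a - b) for a, b in zip(ref, cand)))
    return costs


def parallax_from_costs(costs: Sequence[int]) -> int:
    """Winner-takes-all: the shift d with the smallest cost (first one on ties)."""
    if not costs:
        raise ValueError("empty cost curve")
    return min(range(len(costs)), key=lambda d: costs[d])


# Synthetic rows (made-up data): the pattern is shifted by 7 pixels.
ref_row = [0] * 30
ref_row[10:13] = [50, 90, 50]
cmp_row = [0] * 40
cmp_row[17:20] = [50, 90, 50]

costs = cost_curve(ref_row, cmp_row, x=11, half=1, d_max=24)
print(parallax_from_costs(costs))  # 7
```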

<<Specific Configuration and Operation of the Functional Blocks of the Parallax Value Calculation Processing Unit>>
The specific configuration and operation of the functional blocks of the parallax value calculation processing unit 300 will be described with reference to FIG. 5.

As shown in FIG. 5, the parallax value calculation processing unit 300 includes a cost calculation unit 301, a determination unit 302, and a first generation unit 303.

The cost calculation unit 301 is a functional unit that calculates the cost value C(p, d) of each candidate pixel q(x + d, y) from the luminance value of the reference pixel p(x, y) in the reference image Ia and the luminance values of the candidate pixels q(x + d, y). Each candidate pixel is a candidate for the corresponding pixel and is identified by shifting by the shift amount d from the pixel at the position of the reference pixel p(x, y) on the epipolar line EL in the comparison image Ib. Specifically, the cost calculation unit 301 uses block matching to calculate, as the cost value C, the dissimilarity between the reference region pb, a predetermined region centered on the reference pixel p of the reference image Ia, and the candidate region qb (of the same size as the reference region pb) centered on the candidate pixel q of the comparison image Ib.

The determination unit 302 is a functional unit that takes the shift amount d at which the cost value C calculated by the cost calculation unit 301 is smallest and sets it as the parallax value dp of the pixel of the reference image Ia for which those cost values were calculated.

The first generation unit 303 is a functional unit that, based on the parallax values dp determined by the determination unit 302, generates a parallax image: an image in which the pixel value of each pixel of the reference image Ia is replaced with the parallax value dp of that pixel.
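Taken together, the cost calculation unit 301, the determination unit 302, and the first generation unit 303 amount to the loop sketched below. It is a deliberately naive Python sketch meant to show the data flow, not the FPGA implementation: it assumes a square 2-D SAD block, integer shifts, and leaves border pixels at 0 as a hypothetical "no parallax" value. The function names and parameters are illustrative assumptions.

```python
import random
from typing import List, Sequence

Image = Sequence[Sequence[int]]


def block_sad(ref: Image, comp: Image, x: int, xc: int, y: int, half: int) -> int:
    """SAD between the block around (x, y) in ref and the block around (xc, y) in comp."""
    return sum(
        abs(ref[y + dy][x + dx] - comp[y + dy][xc + dx])
        for dy in range(-half, half + 1)
        for dx in range(-half, half + 1)
    )


def parallax_image(ref: Image, comp: Image, half: int = 1, d_max: int = 8) -> List[List[int]]:
    h, w = len(ref), len(ref[0])
    out = [[0] * w for _ in range(h)]  # 0 = no parallax value (assumption for borders)
    for y in range(half, h - half):
        for x in range(half, w - half):
            best_d, best_cost = 0, None
            # Cost calculation: C(p, d) for each shift d along the same row.
            for d in range(d_max + 1):
                xc = x + d
                if xc + half >= w:
                    break
                cost = block_sad(ref, comp, x, xc, y, half)
                # Determination: keep the shift with the minimum cost.
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            # Generation: write the parallax value dp into the parallax image.
            out[y][x] = best_d
    return out


# Made-up example: the comparison image is the reference image shifted right by 3 px.
rng = random.Random(0)
W, H = 20, 6
ref = [[rng.randint(0, 255) for _ in range(W)] for _ in range(H)]
comp = [[ref[y][x - 3] if x >= 3 else 0 for x in range(W)] for y in range(H)]
disp = parallax_image(ref, comp, half=1, d_max=6)
```

In this synthetic case every interior pixel whose search window stays inside the image should receive dp = 3, the shift that was applied.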

The cost calculation unit 301, the determination unit 302, and the first generation unit 303 shown in FIG. 5 are each implemented by the FPGA 31 shown in FIG. 3. Some or all of them may instead be implemented by the CPU 32 executing a program stored in the ROM 33, rather than by the FPGA 31, which is a hardware circuit.

The cost calculation unit 301, the determination unit 302, and the first generation unit 303 of the parallax value calculation processing unit 300 shown in FIG. 5 represent functions conceptually, and the configuration is not limited to this one. For example, several functional units shown as independent units in FIG. 5 may be configured as a single functional unit. Conversely, the function of a single functional unit in FIG. 5 may be divided among several functional units.

<Configuration and Operation of the Functional Blocks of the Recognition Processing Unit>
FIG. 9 is a diagram illustrating an example of the functional block configuration of the recognition processing unit of the object recognition device according to the embodiment. FIG. 10 is a diagram illustrating an example of a V map generated from a parallax image. FIG. 11 is a diagram illustrating an example of a U map generated from a parallax image. FIG. 12 is a diagram illustrating an example of a real U map generated from the U map. FIG. 13 is a diagram explaining the process of creating detection frames. The configuration and operation of the functional blocks of the recognition processing unit 5 will be described with reference to FIGS. 9 to 13.

As shown in FIG. 9, the recognition processing unit 5 includes a second generation unit 500, a clustering processing unit 510 (detection means), and a tracking processing unit 520.

The second generation unit 500 is a functional unit that receives the parallax image from the parallax value calculation processing unit 300 and the reference image Ia from the parallax value deriving unit 3, and generates a V-Disparity map, a U-Disparity map, a Real U-Disparity map, and the like. Specifically, to detect the road surface in the parallax image input from the parallax value calculation processing unit 300, the second generation unit 500 generates the V map VM, a V-Disparity map, shown in FIG. 10(b). The V-Disparity map is a two-dimensional histogram of the frequency distribution of the parallax values dp, whose vertical axis is the y-axis of the reference image Ia and whose horizontal axis is the parallax value dp (or distance) of the parallax image. The reference image Ia shown in FIG. 10(a) contains, for example, a road surface 700, a utility pole 701, and a car 702. In the V map VM, the road surface 700 corresponds to the road surface portion 700a, the utility pole 701 to the utility pole portion 701a, and the car 702 to the car portion 702a.
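A V-Disparity map of this kind can be built by counting, row by row, how often each parallax value occurs. The Python sketch below assumes integer parallax values and treats 0 as "no parallax value"; both are illustrative assumptions rather than details given in the text.

```python
from typing import List, Sequence


def v_disparity(parallax: Sequence[Sequence[int]], d_max: int) -> List[List[int]]:
    """Return vmap, where vmap[y][d] counts the pixels in image row y with parallax d."""
    vmap = [[0] * (d_max + 1) for _ in parallax]
    for y, row in enumerate(parallax):
        for d in row:
            if 0 < d <= d_max:  # skip invalid (0) and out-of-range values (assumption)
                vmap[y][d] += 1
    return vmap


parallax = [
    [1, 1, 2],
    [3, 3, 3],
]
print(v_disparity(parallax, d_max=3))  # [[0, 2, 1, 0], [0, 0, 0, 3]]
```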

The second generation unit 500 also fits a straight line to the positions estimated to be the road surface in the generated V map VM. A flat road surface can be approximated by a single straight line, but for a road whose gradient changes, the V map VM must be divided into sections, each fitted with its own line, to keep the approximation accurate. Known techniques such as the Hough transform or the least-squares method can be used for the line fitting. In the V map VM, the utility pole portion 701a and the car portion 702a, which are clusters located above the detected road surface portion 700a, correspond to the utility pole 701 and the car 702, objects on the road surface. When the second generation unit 500 generates the U-Disparity map described below, only the information above the road surface is used, in order to remove noise.
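The least-squares option can be sketched as follows: take, for each row of the V map, the parallax value with the highest count as a road-surface candidate, and fit the line y = a·d + b through those points. This is a single-segment fit for a flat road; the piecewise fitting needed for changing gradients is not shown, and choosing one peak per row, the `min_count` threshold, and the `is_above_road` helper are simplifications assumed here for illustration.

```python
from typing import List, Sequence, Tuple


def fit_road_line(vmap: Sequence[Sequence[int]], min_count: int = 1) -> Tuple[float, float]:
    """Least-squares fit of y = a * d + b to the per-row peaks of a V map."""
    points: List[Tuple[int, int]] = []
    for y, hist in enumerate(vmap):
        d = max(range(len(hist)), key=lambda i: hist[i])
        if d > 0 and hist[d] >= min_count:  # rows without valid parallax are skipped
            points.append((d, y))
    n = len(points)
    if n < 2:
        raise ValueError("need at least two road candidates")
    mean_d = sum(d for d, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in points)
    if sxx == 0:
        raise ValueError("all candidates share one parallax value")
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_d


def is_above_road(y: int, d: int, a: float, b: float, margin: float = 0.0) -> bool:
    """Hypothetical helper: image y grows downward, so 'above' means y < road row."""
    return y < a * d + b - margin


# Synthetic V map (made-up data): row y has its road peak at parallax d = y - 2.
vmap = [[0] * 11 for _ in range(10)]
for y in range(3, 10):
    vmap[y][y - 2] = 5
a, b = fit_road_line(vmap)
print(round(a, 6), round(b, 6))  # 1.0 2.0
```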

Using only the information located above the road surface detected in the V map VM, that is, the information in the parallax image corresponding to the left guardrail 711, the right guardrail 712, the car 713, and the car 714 in the reference image Ia shown in FIG. 11(a), the second generation unit 500 also generates the U map UM, a U-Disparity map shown in FIG. 11(b), in order to recognize objects. The U map UM is a two-dimensional histogram of the frequency distribution of the parallax values dp, whose horizontal axis is the x-axis of the reference image Ia and whose vertical axis is the parallax value dp (or distance) of the parallax image. The left guardrail 711 in the reference image Ia shown in FIG. 11(a) corresponds to the left guardrail portion 711a in the U map UM, the right guardrail 712 to the right guardrail portion 712a, the car 713 to the car portion 713a, and the car 714 to the car portion 714a.
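A U-Disparity map is the column-wise counterpart of the V map. In the Python sketch below, the restriction to information above the road surface is expressed as an optional predicate `keep(x, y, d)`; in practice it could be built from a fitted road line, but here it is a hypothetical hook that keeps the example self-contained. Integer parallax values with 0 meaning "invalid" are again assumed.

```python
from typing import Callable, List, Sequence

Keep = Callable[[int, int, int], bool]


def u_disparity(parallax: Sequence[Sequence[int]], d_max: int,
                keep: Keep = lambda x, y, d: True) -> List[List[int]]:
    """Return umap, where umap[d][x] counts the pixels in column x with parallax d."""
    width = len(parallax[0])
    umap = [[0] * width for _ in range(d_max + 1)]
    for y, row in enumerate(parallax):
        for x, d in enumerate(row):
            # keep() is a hypothetical stand-in for the "above the road surface" test.
            if 0 < d <= d_max and keep(x, y, d):
                umap[d][x] += 1
    return umap


parallax = [
    [2, 0],
    [2, 1],
]
print(u_disparity(parallax, d_max=2))  # [[0, 0], [0, 1], [2, 0]]
# Hypothetical road filter that discards the bottom row (y == 1):
print(u_disparity(parallax, d_max=2, keep=lambda x, y, d: y < 1))  # [[0, 0], [0, 0], [1, 0]]
```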

Similarly, using only the information located above the road surface detected in the V map VM (the information in the parallax image corresponding to the left guardrail 711, the right guardrail 712, the car 713, and the car 714 in the reference image Ia shown in FIG. 11(a)), the second generation unit 500 generates the U map UM_H shown in FIG. 11(c), another example of a U-Disparity map. The U map UM_H is an image whose horizontal axis is the x-axis of the reference image Ia, whose vertical axis is the parallax value dp of the parallax image, and whose pixel values represent object height. The left guardrail 711 in the reference image Ia shown in FIG. 11(a) corresponds to the left guardrail portion 711b in the U map UM_H, the right guardrail 712 to the right guardrail portion 712b, the car 713 to the car portion 713b, and the car 714 to the car portion 714b.
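One possible way to fill such a height map is sketched below. The text says only that the pixel value is the object's height, so several choices here are assumptions made for illustration: the map keeps the maximum height seen in each (x, dp) cell, height is measured from a fitted road line y = a·dp + b, and pixel offsets are converted to millimetres with B/dp, because at distance Z one pixel spans Z/f = B/dp by (Equation 2).

```python
from typing import List, Sequence


def u_height_map(parallax: Sequence[Sequence[int]], d_max: int,
                 road_a: float, road_b: float, baseline_mm: float) -> List[List[float]]:
    """Return umh, where umh[d][x] is the largest height (mm) above the road seen at (x, d)."""
    width = len(parallax[0])
    umh = [[0.0] * width for _ in range(d_max + 1)]
    for y, row in enumerate(parallax):
        for x, d in enumerate(row):
            if not 0 < d <= d_max:
                continue
            y_road = road_a * d + road_b                # road row at this parallax (assumed line)
            height_mm = (y_road - y) * baseline_mm / d  # pixels above road -> mm, via B / d
            if height_mm > umh[d][x]:                   # points below the road stay at 0
                umh[d][x] = height_mm
    return umh


# Made-up data: road line y = d + 2; two pixels of one column share parallax 4.
parallax = [[4], [0], [0], [4]]
print(u_height_map(parallax, d_max=4, road_a=1.0, road_b=2.0, baseline_mm=100.0)[4])  # [150.0]
```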

From the generated U map UM shown in FIG. 12(a), the second generation unit 500 also generates the real U map RM shown in FIG. 12(b), a Real U-Disparity map in which the horizontal axis has been converted into actual distance. In the real U map RM, the horizontal axis is the actual distance in the direction from the imaging unit 10b (left camera) toward the imaging unit 10a (right camera), and the vertical axis is the parallax value dp of the parallax image (or the depth-direction distance converted from that parallax value dp). The left guardrail portion 711a of the U map UM shown in FIG. 12(a) corresponds to the left guardrail portion 711c in the real U map RM, the right guardrail portion 712a to the right guardrail portion 712c, the car portion 713a to the car portion 713c, and the car portion 714a to the car portion 714c. Specifically, in the U map UM a distant object (small parallax value dp) appears small, so it yields little parallax information and its distance resolution is low; the second generation unit 500 therefore does not thin out those pixels. A nearby object appears large, so it yields much parallax information and its distance resolution is high; the second generation unit 500 therefore thins out those pixels heavily. The real U map RM is generated in this way.
As described later, the clustering processing unit 510 can detect objects by extracting clusters of pixel values (objects) from the real U map RM. The second generation unit 500 is not limited to generating the real U map RM from the U map UM; it can also generate the real U map RM directly from the parallax image.
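The conversion of the horizontal axis to actual distance, and the resulting thinning at short range, can be sketched as below. The lateral position of column x at parallax d is taken as (x − cx)·B/d, which follows from the pinhole model and (Equation 2); the principal-point column `cx`, the bin width, and the map half-width are hypothetical parameters introduced only for this sketch.

```python
from typing import List, Sequence


def real_u_map(umap: Sequence[Sequence[int]], cx: float, baseline_mm: float,
               bin_mm: float, half_width_mm: float) -> List[List[int]]:
    """Re-bin umap[d][x] into rmap[d][b], where b indexes lateral distance in bin_mm steps.

    cx, bin_mm, and half_width_mm are hypothetical parameters of this sketch.
    """
    n_bins = int(2 * half_width_mm // bin_mm)
    rmap = [[0] * n_bins for _ in umap]
    for d, row in enumerate(umap):
        if d == 0:
            continue  # zero parallax has no finite distance
        for x, count in enumerate(row):
            if count == 0:
                continue
            lateral_mm = (x - cx) * baseline_mm / d  # one pixel spans B / d at this range
            b = int((lateral_mm + half_width_mm) // bin_mm)
            if 0 <= b < n_bins:
                rmap[d][b] += count
    return rmap


umap = [[0] * 8 for _ in range(5)]
umap[4][4:8] = [1, 1, 1, 1]  # near (large d): four adjacent columns
umap[1][4:6] = [1, 1]        # far (small d): two adjacent columns
rmap = real_u_map(umap, cx=4, baseline_mm=100.0, bin_mm=100.0, half_width_mm=400.0)
print(rmap[4])  # [0, 0, 0, 0, 4, 0, 0, 0]  near columns merge into one bin
print(rmap[1])  # [0, 0, 0, 0, 1, 1, 0, 0]  far columns keep separate bins
```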

The image input from the parallax value deriving unit 3 to the second generation unit 500 is not limited to the reference image Ia; the comparison image Ib may be used instead.

The clustering processing unit 510 is a functional unit that detects objects appearing in the parallax image based on the maps input from the second generation unit 500. From the generated U map UM or real U map RM, the clustering processing unit 510 can identify the position and width (xmin, xmax) of an object in the x-axis direction in the parallax image and the reference image Ia. It can also identify the object's actual depth from the object's vertical extent (dmin, dmax) in the generated U map UM or real U map RM. From the generated V map VM, it can identify the object's position and height in the y-axis direction in the parallax image and the reference image Ia (ymin = the y-coordinate corresponding to the maximum height above the road surface at the maximum parallax value; ymax = the y-coordinate of the road surface height obtained from the maximum parallax value). Further, from the object's width (xmin, xmax) in the x-axis direction, its height (ymin, ymax) in the y-axis direction identified in the parallax image, and the corresponding parallax values dp, it can determine the object's actual size in the x-axis and y-axis directions.
In this way, the clustering processing unit 510 uses the V map VM, the U map UM, and the real U map RM to identify the position of an object in the reference image Ia as well as its actual width, height, and depth. Because the object's position in the reference image Ia is identified, its position in the parallax image is also determined, so the clustering processing unit 510 can also identify the distance to the object.
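The size computation can be sketched with the same pinhole relations: at parallax d, one pixel spans B/d in the scene, and (Equation 2) converts dmin and dmax into the far and near distances. Using dmax (the nearest face) as the representative parallax for width and height is an assumption of this sketch, as are the numbers in the example.

```python
from typing import Tuple


def object_size_mm(xmin: int, xmax: int, ymin: int, ymax: int,
                   dmin: float, dmax: float,
                   baseline_mm: float, focal_px: float) -> Tuple[float, float, float]:
    """Return (width, height, depth) in mm from an image box and its parallax range."""
    if not 0 < dmin <= dmax:
        raise ValueError("expected 0 < dmin <= dmax")
    # Assumption: measure width and height at the nearest face (parallax dmax).
    mm_per_px = baseline_mm / dmax  # equals Z / f with Z = B * f / dmax
    width = (xmax - xmin) * mm_per_px
    height = (ymax - ymin) * mm_per_px
    # Depth: distance of the far end minus distance of the near end (Equation 2).
    depth = baseline_mm * focal_px / dmin - baseline_mm * focal_px / dmax
    return width, height, depth


# Made-up example with a hypothetical rig (B = 160 mm, f = 800 px).
print(object_size_mm(100, 200, 50, 140, dmin=10, dmax=16,
                     baseline_mm=160.0, focal_px=800.0))  # (1000.0, 900.0, 4800.0)
```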

Finally, as shown in FIG. 13(a), the clustering processing unit 510 creates detection frames 721a to 724a on the reference image Ia or the parallax image Ip shown in FIG. 13(b), corresponding respectively to the detection areas 721 to 724 of the objects identified (detected) on the real U map RM.

The clustering processing unit 510 can also identify what an object is from the actual size (width, height, depth) identified for it, using (Table 1) below. For example, an object 1300 mm wide, 1800 mm high, and 2000 mm deep can be identified as an "ordinary car". Information associating width, height, and depth with object types, as in (Table 1), may be stored as a table in the RAM 54 or the like.

The clustering processing unit 510 generates information about each detected (recognized) object as recognition area information. Recognition area information is information about an object detected by the clustering processing unit 510 and includes, for example, the position and size of the detected object in the reference image Ia, the V-Disparity map, the U-Disparity map, the Real U-Disparity map, and so on, the type of the detected object, and a rejection flag described later.

The second generation unit 500 and the clustering processing unit 510 of the recognition processing unit 5 shown in FIG. 9 are each implemented by the FPGA 51 shown in FIG. 3. Some or all of them may instead be implemented by the CPU 52 executing a program stored in the ROM 53, rather than by the FPGA 51, which is a hardware circuit.

The tracking processing unit 520 is a functional unit that executes tracking processing: based on the recognition area information, that is, the information about an object detected (recognized) by the clustering processing unit 510, it either rejects the object or tracks it. The specific configuration of the tracking processing unit 520 is described later with reference to FIG. 14. Here, "rejection" means excluding the object from subsequent processing (for example, control processing in the vehicle control device 6).

The "image processing apparatus" according to the present invention may be the tracking processing unit 520 itself or the recognition processing unit 5 that includes the tracking processing unit 520.

<<Configuration and Operation of the Functional Blocks of the Tracking Processing Unit>>
FIG. 14 is a diagram illustrating an example of the functional block configuration of the tracking processing unit in the recognition processing unit of the object recognition device according to the embodiment. The configuration and operation of the functional blocks of the tracking processing unit 520 of the recognition processing unit 5 will be described with reference to FIG. 14.

As shown in FIG. 14, the tracking processing unit 520 includes a movement prediction unit 600 (prediction means), a matching unit 610, a check unit 620, a feature update unit 630 (update means), and a state transition unit 640.

The movement prediction unit 600 is a functional unit that uses the history of movement and operating states of objects newly detected by the clustering processing unit 510, together with vehicle information, to predict, for each object tracked so far, a prediction region in which the object is likely to be present in the current luminance image (hereinafter sometimes simply called a "frame") or in the corresponding parallax image. The movement prediction unit 600 predicts the object's motion in the xz plane (x: horizontal position in the frame, z: distance) from movement information up to the preceding frame (hereinafter sometimes simply called the "previous frame"), such as the history of the relative position and relative velocity of the object's center of gravity, together with vehicle information. To handle objects that move more than predicted, the movement prediction unit 600 may enlarge the previously predicted region. The movement information may be included in the recognition area information of each detected object; the following description assumes that it is.
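A constant-velocity version of this prediction in the xz plane is sketched below. The text does not specify the motion model, how vehicle information enters the calculation, or how much the region is enlarged, so the model, the fixed frame interval `dt_s`, the `enlarge` factor, and all names here are hypothetical; positions are assumed to be already expressed relative to the own vehicle.

```python
from typing import Sequence, Tuple

Region = Tuple[float, float, float, float]  # (x_min, x_max, z_min, z_max) in mm


def predict_region(history: Sequence[Tuple[float, float]], dt_s: float,
                   half_width_mm: float, half_depth_mm: float,
                   enlarge: float = 1.0) -> Region:
    """Predict a search region one frame ahead in the xz plane (hypothetical model).

    history holds relative (x, z) center positions from past frames, oldest first,
    sampled every dt_s seconds.
    """
    if len(history) < 2:
        raise ValueError("need at least two past positions")
    (x0, z0), (x1, z1) = history[-2], history[-1]
    vx, vz = (x1 - x0) / dt_s, (z1 - z0) / dt_s  # relative velocity estimate
    px, pz = x1 + vx * dt_s, z1 + vz * dt_s       # constant-velocity prediction
    hw, hd = half_width_mm * enlarge, half_depth_mm * enlarge  # optional enlargement
    return px - hw, px + hw, pz - hd, pz + hd


# Made-up track: drifting right by 50 mm and approaching by 100 mm per frame.
region = predict_region([(0.0, 10000.0), (50.0, 9900.0)], dt_s=0.1,
                        half_width_mm=1000.0, half_depth_mm=1000.0, enlarge=1.2)
print(region)
```

For this made-up track the predicted center is about (100 mm, 9800 mm), and the enlargement factor widens the region to roughly ±1200 mm around it.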

The matching unit 610 is a functional unit that performs template matching within the prediction region predicted by the movement prediction unit 600. The matching is based on similarity to the feature quantity (template) obtained in the previous frame. From it, the unit determines the position of an object (in particular, a vehicle or a pedestrian) in the frame now being processed (hereinafter simply called the "current frame"). Here, a pedestrian means a person contained in an image captured by the imaging means or the like, and includes anyone walking, running, or standing still. The matching unit 610 includes:

- a determination unit 611;
- a shape matching unit 612 (first matching means);
- an image matching unit 613 (second matching means);
- a boundary determination unit 614 (determination means);
- a correction processing unit 615.

The determination unit 611 is a functional unit that determines, based on the recognition area information of an object, whether the object is a vehicle or a pedestrian, and performs branch processing accordingly. If the object is a vehicle, it has the subsequent stage execute vehicle matching processing for tracking the vehicle. If the object is a pedestrian, it has the subsequent stage execute pedestrian matching processing for tracking the pedestrian.

The shape matching unit 612 is a functional unit that performs shape matching processing in the pedestrian matching process. It detects, in the parallax image, a contour consisting mainly of the pedestrian's head. It then performs template matching using, as a template (contour template), the pedestrian contour detected in the parallax image corresponding to the previous frame.

The image matching unit 613 is a functional unit that performs image matching processing in the pedestrian matching process. This is template matching on the luminance image of the current frame, using an image template based on the contour template detected in the parallax image corresponding to the previous frame.

The boundary determination unit 614 is a functional unit that performs boundary determination processing in the pedestrian matching process. When the contours of multiple pedestrians are detected in the current frame, it determines the boundary between the pedestrian detected by the image matching unit 613 (that is, the one whose position has been determined) and the other pedestrians.

The correction processing unit 615 is a functional unit that performs frame correction processing on the frame (detection frame) surrounding the detection area of the pedestrian detected by the image matching unit 613. After this correction, the image inside the pedestrian's detection frame becomes that pedestrian's detection area in the current frame (or in the parallax image corresponding to the current frame).

The check unit 620 is a functional unit that determines, based on the size of the detection area of the object detected by the matching unit 610, whether that size corresponds to the size of an object targeted for tracking (for example, a pedestrian or a vehicle).

The feature update unit 630 is a functional unit that updates feature quantities from the image of the detection area of the object detected in the current frame. These feature quantities (the contour template and the image template) are the ones the shape matching unit 612 and the image matching unit 613 will use for template matching in the next frame.

The state transition unit 640 is a functional unit that transitions the state of an object based on the recognition area information of the object finally determined by the correction processing unit 615.

For example, the state transition unit 640 transitions the state of an object by adding a rejection flag to its recognition area information. The flag indicates that the object is to be rejected. It is added in two cases:

- the check unit 620 did not judge the object to be a tracking target;
- the object could not be detected (tracked) by the matching of the shape matching unit 612 and the image matching unit 613.

The state transition unit 640 outputs the recognition area information, reflecting the transitioned state of the object, to the vehicle control device 6 (see FIG. 4) as recognition information.

The following functional units shown in FIG. 14 are each implemented by the FPGA 51 shown in FIG. 3:

- the movement prediction unit 600;
- the determination unit 611, shape matching unit 612, image matching unit 613, boundary determination unit 614, and correction processing unit 615 of the matching unit 610;
- the check unit 620;
- the feature update unit 630;
- the state transition unit 640.

Some or all of these functional units may instead be implemented by the CPU 52 executing a program stored in the ROM 53, rather than by the FPGA 51, which is a hardware circuit.

Each functional unit of the tracking processing unit 520 shown in FIG. 14 represents its function conceptually, and the configuration is not limited to the one shown. For example, several functional units illustrated as independent units in FIG. 14 may be configured as a single functional unit. Conversely, the function of a single functional unit in FIG. 14 may be divided among multiple functional units.

[Operation of object recognition device]
Next, the specific operation of the object recognition device 1 will be described with reference to FIGS. 15 to 27.

(Block matching process of parallax value deriving unit)
FIG. 15 is a flowchart illustrating an example of the block matching operation of the parallax value deriving unit according to the embodiment. The flow of the block matching operation of the parallax value deriving unit 3 of the object recognition device 1 will be described with reference to FIG. 15.

<Step S1-1>
The image acquisition unit 100b of the parallax value deriving unit 3 captures the subject ahead with the left camera (imaging unit 10b), generates an analog image signal, and obtains a luminance image based on that signal. This yields the image signal to be processed in the subsequent image processing. The process then proceeds to step S2-1.

<Step S1-2>
The image acquisition unit 100a of the parallax value deriving unit 3 captures the subject ahead with the right camera (imaging unit 10a), generates an analog image signal, and obtains a luminance image based on that signal. This yields the image signal to be processed in the subsequent image processing. The process then proceeds to step S2-2.

<Step S2-1>
The conversion unit 200b of the parallax value deriving unit 3 removes noise from the analog image signal obtained by the imaging unit 10b and converts it into digital image data. Converting to digital image data makes pixel-by-pixel image processing possible on the image based on that data. The process then proceeds to step S3-1.

<Step S2-2>
The conversion unit 200a of the parallax value deriving unit 3 removes noise from the analog image signal obtained by the imaging unit 10a and converts it into digital image data. Converting to digital image data makes pixel-by-pixel image processing possible on the image based on that data. The process then proceeds to step S3-2.

<Step S3-1>
The conversion unit 200b outputs the image based on the digital image data converted in step S2-1 as the comparison image Ib for the block matching process. This provides the image that is compared against when parallax values are obtained in the block matching process. The process then proceeds to step S4.

<Step S3-2>
The conversion unit 200a outputs the image based on the digital image data converted in step S2-2 as the reference image Ia for the block matching process. This provides the image that serves as the reference when parallax values are obtained in the block matching process. The process then proceeds to step S4.

<Step S4>
The cost calculation unit 301 of the parallax value calculation processing unit 300 of the parallax value deriving unit 3 calculates a cost value C(p, d) for each candidate pixel q(x+d, y) of the corresponding pixel. The calculation uses the luminance value of the reference pixel p(x, y) in the reference image Ia and the luminance value of each candidate pixel. The candidate pixels lie on the epipolar line EL in the comparison image Ib that corresponds to the reference pixel p(x, y). Each is found by shifting by a shift amount d from the pixel at the position of p(x, y).

Specifically, the cost calculation unit 301 uses block matching to calculate the cost value C as the dissimilarity between two regions of the same size:

- a reference region pb, which is a predetermined region centered on the reference pixel p of the reference image Ia;
- a candidate region qb, centered on the candidate pixel q of the comparison image Ib.

The process then proceeds to step S5.
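Step S4 does not fix the dissimilarity measure. Assuming the sum of absolute differences (SAD), and writing W for the set of pixel offsets spanned by the reference region pb, the cost value can be written as:

$$
C(p, d) = \sum_{(i, j) \in W} \left| I_a(x + i,\ y + j) - I_b(x + d + i,\ y + j) \right|
$$

Here I_a and I_b denote the luminance values of the reference image Ia and the comparison image Ib. A cost of zero means the two blocks are identical. Larger values mean a poorer match.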

<Step S5>
The determination unit 302 of the parallax value calculation processing unit 300 takes the shift amount d at which the cost value C calculated by the cost calculation unit 301 is smallest. It sets this d as the parallax value dp of the reference-image Ia pixel for which those cost values were calculated.

The first generation unit 303 of the parallax value calculation processing unit 300 then generates a parallax image based on the parallax values dp determined by the determination unit 302. In this image, the luminance value of each pixel of the reference image Ia is replaced by that pixel's parallax value dp. The first generation unit 303 outputs the generated parallax image to the recognition processing unit 5.
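Steps S4 and S5 together amount to a winner-takes-all block matcher. The following sketch is illustrative only. Its assumptions:

- SAD is used as the dissimilarity.
- The block is a square of (2r+1)x(2r+1) pixels.
- Images are stored as nested lists `image[y][x]`.
- Pixels whose candidate blocks would leave the image are left at 0.

The names `block_sad` and `parallax_image` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # image[y][x] luminance (or parallax) values


def block_sad(ref: Image, comp: Image, x: int, y: int, d: int, r: int) -> int:
    """C(p, d): SAD between the block around p(x, y) in the reference image
    and the block around q(x + d, y) in the comparison image."""
    return sum(
        abs(ref[y + j][x + i] - comp[y + j][x + d + i])
        for j in range(-r, r + 1)
        for i in range(-r, r + 1)
    )


def parallax_image(ref: Image, comp: Image, max_d: int, r: int = 1) -> Image:
    """Winner-takes-all: dp = argmin_d C(p, d), for d = 0..max_d.

    Only pixels for which every candidate block fits inside the image are
    computed; all other pixels stay 0.
    """
    h, w = len(ref), len(ref[0])
    out = [[0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r - max_d):
            costs = [block_sad(ref, comp, x, y, d, r) for d in range(max_d + 1)]
            out[y][x] = costs.index(min(costs))  # first shift with the lowest cost
    return out
```

As a check, a comparison image whose content is the reference image shifted two pixels to the right gives dp = 2 at every computed pixel. In practice, an approach such as SGM (mentioned below) replaces this purely local argmin.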

The stereo matching process above was described using block matching as an example, but it is not limited to this. A process using the SGM (Semi-Global Matching) method may also be used.

(Tracking process of tracking processing unit of recognition processing unit)
FIG. 16 is a flowchart illustrating an example of the tracking operation of the tracking processing unit of the recognition processing unit according to the embodiment. FIG. 17 is a diagram explaining the movement prediction operation. The flow of the tracking operation of the tracking processing unit 520 of the recognition processing unit 5 will be described with reference to FIGS. 16 and 17.

<Step S11>
The movement prediction unit 600 of the tracking processing unit 520 predicts a prediction region 800 for each object tracked so far, as shown in FIG. 17. This is the region in which the object is highly likely to be present on the current frame (reference image Ia) or the corresponding parallax image. The prediction uses recognition area information that includes the vehicle information and the history of movement and motion state of objects newly detected by the upstream clustering processing unit 510. The process then proceeds to step S12.

<Step S12>
The matching unit 610 of the tracking processing unit 520 performs template matching within the prediction region 800, based on similarity to the feature quantity (template) obtained in the previous frame. From this, it determines the position of objects (in particular, vehicles and pedestrians) in the current frame. The matching process of the matching unit 610 is described in detail later with reference to FIGS. 18 and 19. The process then proceeds to step S13.

<Step S13>
The check unit 620 of the tracking processing unit 520 determines, based on the size of the detection area of the object detected by the matching unit 610, whether that size corresponds to the size of an object targeted for tracking (for example, a pedestrian or a vehicle). The process then proceeds to step S14.
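The source does not say how the size comparison is carried out. One possible way is to convert the detection frame to metric size at the object's depth and compare it with a plausible range for the target class. The sketch below assumes a stereo baseline B and uses Z = B·f/d. An extent of s pixels then corresponds to s·Z/f = s·B/d metres, so the focal length cancels. The ranges in `SIZE_RANGES` and all function names are hypothetical values chosen for illustration, not taken from the source.

```python
from typing import Dict, Tuple

# Hypothetical plausible (width, height) ranges in metres for each target class.
SIZE_RANGES: Dict[str, Tuple[Tuple[float, float], Tuple[float, float]]] = {
    "pedestrian": ((0.3, 1.2), (0.8, 2.2)),
    "vehicle": ((1.2, 2.6), (1.0, 3.5)),
}


def real_size_m(size_px: float, disparity_px: float, baseline_m: float) -> float:
    """Metric length of an image extent at the object's depth.

    With Z = B * f / d, an extent of size_px pixels spans size_px * Z / f
    = size_px * B / d metres, so the focal length f cancels out.
    """
    if disparity_px <= 0:
        raise ValueError("parallax must be positive")
    return size_px * baseline_m / disparity_px


def size_is_plausible(kind: str, w_px: float, h_px: float,
                      disparity_px: float, baseline_m: float) -> bool:
    """True if the detection frame's metric size lies in the range for `kind`."""
    (w_min, w_max), (h_min, h_max) = SIZE_RANGES[kind]
    w = real_size_m(w_px, disparity_px, baseline_m)
    h = real_size_m(h_px, disparity_px, baseline_m)
    return w_min <= w <= w_max and h_min <= h <= h_max
```

With a 0.16 m baseline and a parallax of 8 pixels, each pixel spans 0.02 m. A 30 x 85 pixel frame (0.6 m x 1.7 m) then passes as a pedestrian, while a 100 x 75 pixel frame (2.0 m x 1.5 m) does not.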

<Step S14>
The feature update unit 630 of the tracking processing unit 520 updates the feature quantities (contour template and image template) from the image of the detection area of the object detected in the current frame. The shape matching unit 612 and the image matching unit 613 use these feature quantities for template matching in the next frame. The process then proceeds to step S15.

<Step S15>
The state transition unit 640 of the tracking processing unit 520 transitions the state of the object based on the recognition area information of the object finally determined by the correction processing unit 615. The state transition unit 640 outputs the recognition area information, reflecting the transitioned state of the object, to the vehicle control device 6 as recognition information.

The tracking processing unit 520 performs the tracking process through steps S11 to S15 above. Steps S11 to S15 are executed for each detection area of an object newly detected by the clustering processing unit 510.

(Branch processing in tracking process)
FIG. 18 is a flowchart illustrating an example of the branch processing operation of the determination unit of the tracking processing unit according to the embodiment. The flow of the branch processing operation of the matching unit 610 of the tracking processing unit 520 will be described with reference to FIG. 18.

<Step S121>
The determination unit 611 of the matching unit 610 determines the type of the object to be tracked (tracked object) based on its recognition area information. If the object is a vehicle (step S121: vehicle), the process proceeds to step S122. If it is a pedestrian (step S121: pedestrian), the process proceeds to step S123. The objects subject to tracking may be limited to vehicles and pedestrians. In that case, when the determination unit 611 determines that an object is neither, it may add a rejection flag to the object's recognition area information indicating that the object is to be rejected.
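The branch of steps S121 to S123 can be sketched as a simple dispatch. The dictionary keys `type` and `reject`, the function name `branch`, and the callback parameters are hypothetical stand-ins for the recognition area information and the two matching processes. The rejection path corresponds to the optional behaviour described above for the case where only vehicles and pedestrians are tracked.

```python
from typing import Any, Callable, Dict, Optional

Matcher = Callable[[Dict[str, Any]], Any]


def branch(info: Dict[str, Any], vehicle_matching: Matcher,
           pedestrian_matching: Matcher) -> Optional[Any]:
    """Dispatch an object to the matching process for its type (steps S121 to S123)."""
    kind = info.get("type")
    if kind == "vehicle":         # step S121: vehicle -> step S122
        return vehicle_matching(info)
    if kind == "pedestrian":      # step S121: pedestrian -> step S123
        return pedestrian_matching(info)
    # Neither a vehicle nor a pedestrian: mark the object for rejection.
    info["reject"] = True
    return None
```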

<Step S122>
When the determination unit 611 determines that the object is a vehicle, it causes the vehicle matching process for tracking the vehicle to be executed. The branch processing then ends.

<Step S123>
When the determination unit 611 determines that the object is a pedestrian, it causes the pedestrian matching process for tracking the pedestrian to be executed. The pedestrian matching process is described in detail later with reference to FIG. 19. The branch processing then ends.

The matching unit 610 of the tracking processing unit 520 performs the branch processing through steps S121 to S123 above.

(Pedestrian matching process in tracking process)
The pedestrian matching process is illustrated in FIGS. 19 to 27:

- FIG. 19 is a flowchart illustrating an example of the pedestrian matching operation of the matching unit of the tracking processing unit according to the embodiment.
- FIG. 20 is a diagram explaining how a contour is detected in the shape matching process of the pedestrian matching process.
- FIG. 21 is a diagram illustrating an example of a contour detected in the shape matching process.
- FIG. 22 is a diagram illustrating an example of a contour template detected in the parallax image corresponding to the previous frame.
- FIG. 23 is a diagram explaining the shape matching operation in the pedestrian matching process of the matching unit according to the embodiment.
- FIG. 24 is a diagram illustrating an example of an image template used in the image matching process of the pedestrian matching process.
- FIG. 25 is a diagram explaining the image matching operation in the pedestrian matching process of the matching unit according to the embodiment.
- FIG. 26 is a diagram explaining the boundary determination operation in the pedestrian matching process of the matching unit according to the embodiment.
- FIG. 27 is a diagram explaining the frame correction operation in the pedestrian matching process of the matching unit according to the embodiment.

The flow of the pedestrian matching operation of the matching unit 610 of the tracking processing unit 520 will be described with reference to FIGS. 19 to 27.

<Step S1231>
The shape matching unit 612 of the matching unit 610 performs shape matching processing. It detects, in the parallax image, a contour consisting mainly of the pedestrian's head. It then performs template matching using, as a template (contour template), the pedestrian contour detected in the parallax image corresponding to the previous frame. The procedure is as follows.

First, as shown in FIG. 20, the shape matching unit 612 works in the parallax image corresponding to the current frame, within the prediction region 801 predicted by the movement prediction unit 600. At each x coordinate, it searches downward from the top edge and identifies the position (Y coordinate) at which a parallax value of the pedestrian is first encountered.

Next, the shape matching unit 612 connects the identified positions. This gives, within the prediction region 801, a contour 901 running from the pedestrian's head to the vicinity of the shoulders, as shown in FIG. 21. At this point it has not yet been determined whether the contour belongs to a pedestrian or to some other object. The description nevertheless uses the pedestrian of FIG. 20 as an example. FIG. 21 shows a case in which the contours of two pedestrians are extracted.

Next, the shape matching unit 612 takes as its template the pedestrian contour detected in the prediction region 802 of the parallax image corresponding to the previous frame (contour template 902, shown in FIG. 22). It performs template matching against the contour 901 while shifting this template in the horizontal direction (X direction) within the prediction region 801. The cost value used to evaluate similarity in the template matching may be, for example, SAD, SSD, or ZSSD. The example in FIG. 23 uses SAD.
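The top-down column scan that produces contour 901 can be sketched as below. Assumptions:

- The parallax image is a nested list `disp[y][x]`.
- A pixel belongs to the pedestrian when its parallax value lies within a tolerance `tol` of the pedestrian's expected parallax `target_d` (for example, derived from the predicted distance).
- The prediction region is given as half-open index ranges.

These parameters and the name `extract_contour` are hypothetical. Columns with no hit are returned as `None`.

```python
from typing import List, Optional

Image = List[List[float]]  # disp[y][x] parallax values


def extract_contour(disp: Image, x0: int, x1: int, y0: int, y1: int,
                    target_d: float, tol: float = 1.0) -> List[Optional[int]]:
    """For each column x in [x0, x1), scan rows y0..y1-1 from the top and
    return the first Y whose parallax is within tol of target_d (None if
    the column contains no such pixel)."""
    contour: List[Optional[int]] = []
    for x in range(x0, x1):
        hit = None
        for y in range(y0, y1):  # search from the upper edge downward
            if abs(disp[y][x] - target_d) <= tol:
                hit = y
                break
        contour.append(hit)
    return contour
```

For a head one pixel wide sitting above shoulders three pixels wide, the result traces the head-and-shoulders outline, for example `[None, 2, 1, 2, None]`.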

Within the prediction region 801, the shape matching unit 612 calculates the SAD while shifting the contour template 902 horizontally. It identifies the X-direction positions at which the SAD takes a local minimum, that is, positions of high similarity. In the example of FIG. 23, it identifies candidate positions P1 and P2. These are the X positions where the SAD is locally minimal, and so the candidate X positions at which the tracked pedestrian may be present.

The X position that the shape matching unit 612 takes as the position of the tracked pedestrian may be defined, for example, by the template placement at which the SAD is minimal. At that placement, the X position of the uppermost pixel of the contour template 902 is used. The shape matching unit 612 sends information on the identified candidate positions P1 and P2 to the image matching unit 613. The process then proceeds to step S1232.
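The horizontal search can be sketched as follows. Assumptions:

- Both the current contour and the contour template are lists of Y coordinates, one per column. A smaller Y means higher in the image.
- Columns that had no hit have already been filled in, for example with the bottom row of the region.

Each local minimum of the SAD curve is reported as a candidate, at the X position of the template's uppermost pixel, as described above. The name `candidate_positions` and the optional `max_sad` cut-off are hypothetical.

```python
from typing import List, Optional, Tuple


def candidate_positions(contour: List[int], template: List[int],
                        max_sad: Optional[int] = None) -> List[Tuple[int, int]]:
    """Slide the contour template along X; return (x, sad) for each local SAD minimum.

    x is the column of the template's uppermost pixel at that placement.
    """
    n = len(template)
    sads = [sum(abs(contour[s + i] - template[i]) for i in range(n))
            for s in range(len(contour) - n + 1)]
    top = template.index(min(template))  # column of the template's highest point
    candidates = []
    for s, v in enumerate(sads):
        # Strict on the left, non-strict on the right: a flat minimum is reported once.
        left_ok = s == 0 or v < sads[s - 1]
        right_ok = s == len(sads) - 1 or v <= sads[s + 1]
        if left_ok and right_ok and (max_sad is None or v <= max_sad):
            candidates.append((s + top, v))
    return candidates
```

Consider two heads in the current contour, one matching the template exactly and one slightly differently shaped. The sketch yields two candidates, analogous to P1 and P2 in FIG. 23. It leaves the choice between them to the image matching of step S1232.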

<Step S1232>
The image matching unit 613 of the matching unit 610 performs image matching processing. This is template matching on the luminance image of the current frame (reference image Ia), using an image template based on the contour template detected in the parallax image corresponding to the previous frame.

Specifically, an image template 902a has been created in advance by the feature update unit 630 when it processed the previous frame, as shown in FIG. 24(a). It is taken from the luminance image (reference image Ia) at the position corresponding to the contour template 902 detected in the previous frame's parallax image. It consists of the N pixels extending downward from each pixel of the contour.

First, as shown in FIG. 25, the image matching unit 613 works in the prediction region 811 of the luminance image (reference image Ia), which corresponds to the prediction region 801 of the parallax image. At each of the candidate X positions P1 and P2 identified by the shape matching unit 612, it performs template matching while shifting the image template 902a downward from the top edge of the prediction region 811. The cost value used to evaluate similarity may be, for example, SAD, SSD, or ZSSD. The example in FIG. 25 uses SAD.

Within the prediction region 811, the image matching unit 613 calculates the SAD while shifting the image template 902a in the Y direction. It identifies the Y position at which the SAD is smallest, that is, the position of highest similarity. In the example of FIG. 25, this matching is performed at both candidate positions P1 and P2. The image matching unit 613 compares the minimum SAD obtained at each candidate position. It takes the candidate position with the smaller minimum as the pedestrian's X position. It then takes the Y position at which the SAD is smallest at that candidate position as the pedestrian's Y position (first position).

The Y position of the tracked pedestrian may be defined, for example, by the template placement at which the SAD is smallest. At that placement, the Y position of the uppermost pixel of the image template 902a is used. In this way, the position of the tracked pedestrian in the parallax image (or luminance image) is detected.
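The vertical search over the candidate positions can be sketched as follows. The image template is assumed to be stored column by column:

- `dy[i]` is the contour's Y offset in template column i, relative to the template's uppermost pixel (so `min(dy) == 0`).
- `cols[i]` holds the N luminance values below that contour pixel.

Candidates are given as the X coordinate of the template's left edge, which is an assumption for simplicity. The returned Y is the row of the template's uppermost pixel at the best placement. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # lum[y][x] luminance values


def best_vertical_match(lum: Image, x_left: int, y_start: int, y_end: int,
                        dy: List[int], cols: List[List[int]]) -> Optional[Tuple[int, int]]:
    """Slide the image template downward at one X position; return (min SAD, top Y)."""
    n = len(cols[0])
    best = None
    for t in range(y_start, y_end):
        if t + max(dy) + n > len(lum):  # template would leave the image
            break
        sad = sum(abs(lum[t + dy[i] + k][x_left + i] - cols[i][k])
                  for i in range(len(cols)) for k in range(n))
        if best is None or sad < best[0]:
            best = (sad, t)
    return best


def locate_pedestrian(lum: Image, candidates: List[int], y_start: int, y_end: int,
                      dy: List[int], cols: List[List[int]]) -> Optional[Tuple[int, int, int]]:
    """Pick the candidate whose best SAD is smallest; return (x, y, sad)."""
    result = None
    for x in candidates:
        m = best_vertical_match(lum, x, y_start, y_end, dy, cols)
        if m is not None and (result is None or m[0] < result[2]):
            result = (x, m[1], m[0])
    return result
```

If the template is cut from the image at left edge x = 3 and top row 4, then searching candidates [1, 3] returns (3, 4, 0). The other candidate's best SAD is larger, so it is discarded.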

The detection frame marks the detection area of the pedestrian whose position has been detected in the current frame. It may be placed, for example, so that two things match the previous frame:

- the area of the detection area (that is, the area inside the detection frame);
- the position of the detection frame relative to the pedestrian's position.

The position and size of the detection frame determined in this way are then corrected by the frame correction processing of the correction processing unit 615, described later.
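The placement rule above can be sketched by reusing the previous frame's width and height and translating the frame together with the pedestrian. Keeping width and height unchanged is one simple way to keep the area the same, and is an assumption here. `Box` and `place_frame` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def place_frame(prev_box: Box, prev_pos: Tuple[int, int], cur_pos: Tuple[int, int]) -> Box:
    """Move the previous detection frame so its offset from the pedestrian is unchanged."""
    off_x = prev_box.x - prev_pos[0]
    off_y = prev_box.y - prev_pos[1]
    return Box(cur_pos[0] + off_x, cur_pos[1] + off_y, prev_box.w, prev_box.h)
```

A frame at (10, 20) with size 30 x 60 around a pedestrian at (25, 20) moves to (13, 22) when the pedestrian is found at (28, 22). The size stays 30 x 60.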

Note that the image matching processing by the image matching unit 613 is not limited to the method described above, in which the SAD is obtained while the image template 902a is shifted downward from the upper end at each candidate position (for example, the candidate positions P1 and P2 shown in FIG. 25). The rectangular template 902b shown in FIG. 24(b) is obtained from the image template 902a shown in FIG. 24(a), which is created by the feature update unit 630. At each X coordinate of the image template 902a, the N pixels extending in the Y direction are taken and aligned in the Y direction (height direction) so that together they form a rectangle. At the candidate positions P1 and P2 specified by the shape matching unit 612, the image matching unit 613 lowers the N-pixel column for each X coordinate of the rectangular template 902b toward the contour 901. It places each column so that the uppermost pixel of the column overlaps the corresponding pixel of the contour 901. The image matching unit 613 then calculates the SAD or a similar measure from the pixel values of the rectangular template 902b and the pixel values of the part of the prediction region 811 that the N-pixel columns overlap. When the calculated value is less than a predetermined threshold, the image matching unit 613 may detect the position defined by the corresponding portion of the contour 901 as the position of the pedestrian to be tracked in the parallax image (or luminance image). This removes the need to calculate the SAD at every position while shifting the image template 902a in the Y direction, so the processing speed for detecting the position of the pedestrian to be tracked can be improved.
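The column-hanging variant can be sketched as follows. This is a minimal illustration under several assumptions. Images are plain 2-D lists indexed `image[y][x]`. The contour is stored as one top-pixel Y coordinate per column. The template is stored column-aligned as `template[i][j]`, meaning the i-th pixel below the contour in column j. Keeping the lowest-SAD candidate when several pass the threshold is also an assumption. The function names are hypothetical.

```python
def column_sad(image, contour_top, x0, template):
    """SAD between a rectangular template and image columns hung from a contour.

    image:       2-D list of pixel values, image[y][x]
    contour_top: contour_top[x] is the Y of the contour pixel in column x
                 (assumed defined, and the N pixels below it inside the image)
    x0:          candidate X position of the template's left column
    template:    template[i][j] = i-th pixel below the contour in column j
    """
    n_rows = len(template)
    n_cols = len(template[0])
    total = 0
    for j in range(n_cols):
        x = x0 + j
        y_top = contour_top[x]  # the column's top pixel sits on the contour
        for i in range(n_rows):
            total += abs(image[y_top + i][x] - template[i][j])
    return total


def match_at_candidates(image, contour_top, candidates, template, threshold):
    """Return (x0, sad) of the best candidate with SAD below threshold, else None.

    Only one SAD is computed per candidate: no sliding in the Y direction.
    """
    best = None
    for x0 in candidates:
        sad = column_sad(image, contour_top, x0, template)
        if sad < threshold and (best is None or sad < best[1]):
            best = (x0, sad)
    return best
```

Each candidate costs a single template-sized comparison. That is the speed-up the paragraph describes relative to sliding the template 902a down the Y axis.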

The image matching unit 613 sends the determined position of the pedestrian and information on the position and size of the detection frame to the boundary determination unit 614. The process then proceeds to step S1233.

<Step S1233>
When the contours of a plurality of pedestrians are detected in the current frame, the boundary determination unit 614 of the matching unit 610 performs boundary determination processing. This processing determines the boundary between the pedestrian detected (whose position has been determined) by the image matching unit 613 and the other pedestrians. Specifically, as shown in FIG. 26, the boundary determination unit 614 first lowers a reference line BL, which extends in the X direction, from the upper end of the prediction region 801 in the parallax image for the current frame. The reference line BL eventually begins to overlap the cluster of parallax values (isolated region) of the object (pedestrian) whose position has been determined by the image matching unit 613. From that point the boundary determination unit 614 keeps lowering the reference line BL. At each step it determines whether the region from the position where the overlap began to the current position of the reference line BL is a region in which parallax values are continuous in the Y direction, and whether its length in the Y direction has reached a predetermined length (for example, 20 cm) or more.

At the same time, when the contours of a plurality of pedestrians have been extracted, as with the contour 901 shown in FIG. 21, the boundary determination unit 614 applies the same processing to each pedestrian other than the pedestrian to be tracked, whose position was determined by the image matching unit 613. For each such pedestrian, it takes the region from the position where that pedestrian's isolated region began to overlap the reference line BL to the current position of the reference line BL. It then determines whether this region is a region in which parallax values are continuous in the Y direction and whose length in the Y direction has reached the predetermined length (for example, 20 cm). If no such region, with parallax values continuous in the Y direction and a Y-direction length equal to or greater than the predetermined length, is found for the other pedestrian, the boundary determination unit 614 performs no further boundary determination processing.

The boundary position is then determined as follows. For each pedestrian's isolated region, the second position is the Y-direction position of the reference line BL at the moment when the region from the start of the overlap to the current position of the reference line BL becomes a region of parallax values continuous in the Y direction with a Y-direction length equal to or greater than the predetermined length (for example, 20 cm). When the pedestrian whose position has been determined by the image matching unit 613 is to the right of the other pedestrian, the boundary determination unit 614 takes the midpoint between the left end of that pedestrian's region and the right end of the other pedestrian's region, each measured at the second position. This midpoint becomes the boundary position between the two pedestrians in the X direction. Conversely, when the pedestrian whose position has been determined by the image matching unit 613 is to the left of the other pedestrian, the boundary determination unit 614 takes, as the boundary position in the X direction, the midpoint between the right end of that pedestrian's region and the left end of the other pedestrian's region, measured in the same way at the second position. For example, suppose that in FIG. 26 the pedestrian on the right side of the drawing is the one whose position has been determined by the image matching unit 613. At the Y-direction position of the reference line BL at which the condition above is satisfied for each isolated region, the boundary determination unit 614 detects the left end of that pedestrian's region as the detection position P3 and the right end of the other pedestrian's region as the detection position P4. The boundary determination unit 614 then detects the midpoint between the detection positions P3 and P4 as the boundary position Pb.
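A minimal sketch of this boundary search follows. It makes several assumptions. The isolated regions are given as a label mask `mask[y][x]` (0 means no parallax) cropped to the prediction region. Lowering BL by one row is modelled as advancing one row index. "Continuous in the Y direction" is read as "present on every row since BL first touched the region", and a gap ends the search. Following the wording of claim 4, each pedestrian's end point is read on that pedestrian's own second-position row. The 20 cm length is converted to rows with the pinhole relation rows = length × f / Z (f in pixels, Z in metres). All function names are hypothetical.

```python
import math


def rows_for_length(length_m, focal_px, distance_m):
    """Number of image rows spanned by length_m at distance distance_m."""
    return math.ceil(length_m * focal_px / distance_m)


def second_position(mask, label, min_rows):
    """Row of BL at which region `label` has been overlapped for min_rows
    consecutive rows, or None if that never happens."""
    run = 0
    for y, row in enumerate(mask):  # lower BL one row at a time
        if label in row:
            run += 1
            if run >= min_rows:
                return y
        elif run > 0:
            return None  # assumption: a gap breaks Y continuity for good
    return None


def boundary_x(mask, tracked, other, min_rows):
    """Boundary position Pb between the tracked region and a neighbour."""
    y_t = second_position(mask, tracked, min_rows)
    y_o = second_position(mask, other, min_rows)
    if y_t is None or y_o is None:
        return None  # no further boundary determination
    xs_t = [x for x, v in enumerate(mask[y_t]) if v == tracked]
    xs_o = [x for x, v in enumerate(mask[y_o]) if v == other]
    if min(xs_t) > min(xs_o):          # tracked pedestrian is on the right
        p3, p4 = min(xs_t), max(xs_o)  # its left end, neighbour's right end
    else:                              # tracked pedestrian is on the left
        p3, p4 = max(xs_t), min(xs_o)  # its right end, neighbour's left end
    return (p3 + p4) / 2
```

For instance, with f = 800 px and Z = 10 m, `rows_for_length(0.25, 800, 10)` gives 20 rows.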

The boundary determination unit 614 sends the position of the pedestrian, the position and size of the pedestrian's detection frame, and the boundary position to the correction processing unit 615. The process then proceeds to step S1234.

<Step S1234>
The correction processing unit 615 of the matching unit 610 performs frame correction processing on the frame (detection frame) of the detection region of the pedestrian whose position has been detected in the current frame. Specifically, as shown in FIG. 27, the correction processing unit 615 first works on the image inside the pedestrian's detection frame 820 on the parallax image corresponding to the current frame. From it, the unit creates a histogram 910 that gives, for each X position, the number of pixels having a parallax value, and a histogram 911 that gives the same count for each Y position. Then, as shown in FIG. 27, the correction processing unit 615 takes the X positions at which the histogram 910 exceeds a threshold Th as the left and right ends of the corrected detection frame 821. It takes the Y positions at which the histogram 911 exceeds the threshold Th as the upper and lower ends of the corrected detection frame 821. The threshold Th may be, for example, 10 to 20% of the maximum value of the histogram. Although FIG. 27 uses the same threshold Th for the X and Y directions, the two thresholds need not be the same. The detection frame 821 corrected in this way by the correction processing unit 615 becomes the detection region finally detected by the pedestrian matching processing of the matching unit 610. The correction processing unit 615 then adds information on the detected region of the pedestrian (position, size, and so on) to that pedestrian's recognition region information, and the pedestrian matching processing ends.
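The histogram-based frame correction can be sketched as follows. This is a minimal illustration, not the patented implementation. Its assumptions: the contents of detection frame 820 are given as a boolean grid `has_parallax[y][x]`, the new frame ends are the outermost positions whose counts exceed Th, and the default ratio of 0.15 is one arbitrary choice inside the 10 to 20% range mentioned above. The function name is hypothetical.

```python
def correct_frame(has_parallax, ratio_x=0.15, ratio_y=0.15):
    """Shrink a detection frame to where the parallax histograms exceed Th.

    has_parallax[y][x] is True where the pixel inside the frame has a valid
    parallax value. Returns (left, right, top, bottom), inclusive and in frame
    coordinates, or None if the frame contains no parallax at all.
    """
    col_hist = [sum(col) for col in zip(*has_parallax)]  # histogram along X
    row_hist = [sum(row) for row in has_parallax]        # histogram along Y
    if max(col_hist) == 0:
        return None
    th_x = ratio_x * max(col_hist)  # X and Y thresholds may differ
    th_y = ratio_y * max(row_hist)
    xs = [x for x, c in enumerate(col_hist) if c > th_x]
    ys = [y for y, c in enumerate(row_hist) if c > th_y]
    return xs[0], xs[-1], ys[0], ys[-1]
```

Because the thresholds are relative to each histogram's peak, sparse noise at the edges of frame 820 is trimmed away, while the main body of the pedestrian is kept.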

The pedestrian matching processing of the matching unit 610 consists of steps S1231 to S1234 described above. After the pedestrian matching processing of the matching unit 610 ends, the feature update unit 630 updates the contour template, as described above. When the image matching unit 613 determines the pedestrian's position in the prediction region 801 corresponding to the current frame, the pedestrian's contour becomes fixed. The feature update unit 630 replaces the currently stored contour template 902 with this contour, which then serves as the contour template for the parallax image corresponding to the next frame. The feature update unit 630 also creates, in the prediction region 811 of the current frame, an image template for the pedestrian whose position was determined by the image matching unit 613, and replaces the currently stored image template 902a with it. In this case, for example, the feature update unit 630 may use, as the image template for the next frame, an image composed of the N pixels extending downward from the position of each pixel of the fixed pedestrian contour in the current frame.
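The template update can be sketched in two parts. First the contour is extracted: for each column, the first pixel with parallax met when scanning from the top, as in the abstract. Then the N pixels below each contour pixel are collected. This is a minimal sketch with these assumptions: the object's parallax pixels are given as a boolean grid, the N pixels below the contour lie inside the image, columns without a contour pixel are simply skipped, and the template is stored directly in the column-aligned (rectangular) layout of template 902b. The function names are hypothetical.

```python
def extract_contour(has_parallax):
    """Top contour: for each column, the first row (scanning from the top)
    holding parallax of the object, or None if the column holds none."""
    height = len(has_parallax)
    width = len(has_parallax[0])
    contour = []
    for x in range(width):
        top = next((y for y in range(height) if has_parallax[y][x]), None)
        contour.append(top)
    return contour


def build_image_template(image, contour, n):
    """Image template for the next frame: the n pixels directly below each
    contour pixel, laid out as template[i][j] (i-th pixel below the contour
    in the j-th usable column). Columns with no contour pixel are skipped."""
    columns = [[image[y + i][x] for i in range(n)]
               for x, y in enumerate(contour) if y is not None]
    return [list(row) for row in zip(*columns)]
```

The contour list itself plays the role of the updated contour template 902, and the returned grid plays the role of the updated image template.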

As described above, the pedestrian matching processing within the tracking processing of the object recognition apparatus 1 according to the present embodiment works in two stages. First, the shape matching processing detects the contour of the pedestrian from the head to the vicinity of the shoulders and identifies candidate positions of the pedestrian in the X direction by template matching on that contour. Then, the image matching processing uses an image template covering the head to the vicinity of the shoulders to perform image template matching along the Y direction at each candidate X position, which identifies the pedestrian's position in the Y direction. The position of the pedestrian is thereby finally identified. Because the position is detected from the head-to-shoulder contour of the pedestrian to be tracked, each pedestrian can be detected individually and accurately, even if the posture of the pedestrian's limbs changes or another pedestrian (for example, one wearing different clothing) is nearby. Moreover, because the shape matching processing first narrows down the candidate positions in the X direction, template matching with the image template only has to be performed in the Y direction at those candidate positions. This improves the processing speed of pedestrian detection.
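The first stage, matching the contour against the contour template along X only, might look like the sketch below. The text does not specify the comparison measure. The following are therefore assumptions: the mean-removed absolute difference used here (which ignores the contour's vertical offset, since Y is resolved later by image matching), the threshold test, and the function name.

```python
def shape_match_candidates(contour, template, threshold):
    """Candidate X positions where the observed contour matches the template.

    contour:  contour[x] = Y of the top pixel in column x (None = no object)
    template: contour template, one top-pixel Y value per column
    Each window is compared after subtracting its own mean height, so the
    score depends on the contour's shape, not on its vertical position.
    """
    w = len(template)
    t_mean = sum(template) / w
    t_shape = [t - t_mean for t in template]
    candidates = []
    for x0 in range(len(contour) - w + 1):
        window = contour[x0:x0 + w]
        if None in window:
            continue  # contour incomplete under this window
        c_mean = sum(window) / w
        score = sum(abs((c - c_mean) - t) for c, t in zip(window, t_shape))
        if score < threshold:
            candidates.append(x0)
    return candidates
```

The positions it returns would then be passed to the Y-direction image matching of the second stage.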

In addition, the boundary determination processing determines, for each pedestrian's cluster of parallax values (isolated region), the position assumed to be the head, and derives the boundary with another pedestrian from it. This identifies the boundary with another pedestrian more accurately than, for example, processing that searches for a local valley in a histogram of parallax values or in the shape of the contour. As a result, individual pedestrians can be detected even when several pedestrians are so close together that they would tend to be detected as a single object.

Furthermore, the shape matching processing, the image matching processing, and the boundary determination processing are all performed only on the prediction region in which the movement prediction unit 600 has predicted the pedestrian to be present, not on the entire image (luminance image or parallax image). The processing speed is therefore higher than if the entire image were processed.

In the embodiment described above, the cost value C is an evaluation value that represents dissimilarity, but it may instead be an evaluation value that represents similarity. In that case, the shift amount d at which the cost value C (the similarity) reaches its maximum (extreme value) becomes the parallax value dp.
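The only change between the two conventions is which extreme is selected. As a small illustration, assume integer shift amounts d = 0, 1, 2, ... stored as list indices (the function name is hypothetical):

```python
def select_disparity(costs, cost_is_similarity=False):
    """Return the shift amount d whose cost value C is the extreme value.

    costs[d] is the cost value C for shift amount d. With a dissimilarity
    measure (e.g. SAD) the minimum wins; with a similarity measure the
    maximum wins. Ties resolve to the smallest d.
    """
    pick = max if cost_is_similarity else min
    return pick(range(len(costs)), key=lambda d: costs[d])


print(select_disparity([9, 4, 1, 6]))                   # dissimilarity: 2
print(select_disparity([0.1, 0.7, 0.9, 0.3], True))     # similarity: 2
```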

The embodiment described above concerns an object recognition apparatus mounted on an automobile as the vehicle 70, but the invention is not limited to this. For example, the apparatus may be mounted on other vehicles such as a motorcycle, a bicycle, a wheelchair, or an agricultural cultivator. It may also be mounted not only on a vehicle, which is one example of a moving body, but on other moving bodies such as a robot.

In the embodiment described above, at least one of the functional units of the parallax value derivation unit 3 and the recognition processing unit 5 of the object recognition apparatus 1 may be implemented by executing a program. In that case, the program is provided preinstalled in a ROM or the like. The program executed by the object recognition apparatus 1 according to the embodiment may also be provided as a file in an installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R (Compact Disc Recordable), or a DVD (Digital Versatile Disc). Alternatively, the program may be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network, or it may be provided or distributed via a network such as the Internet. The program executed by the object recognition apparatus 1 according to the embodiment has a module configuration that includes at least one of the functional units described above. In terms of actual hardware, the CPU 52 (CPU 32) reads the program from the ROM 53 (ROM 33) and executes it, whereby the functional units are loaded onto and generated in the main storage device (such as the RAM 54 (RAM 34)).

DESCRIPTION OF SYMBOLS
1 Object recognition apparatus
2 Main body unit
3 Parallax value derivation unit
4 Communication line
5 Recognition processing unit
6 Vehicle control device
7 Steering wheel
8 Brake pedal
10a, 10b Imaging unit
11a, 11b Imaging lens
12a, 12b Aperture
13a, 13b Image sensor
20a, 20b Signal conversion unit
21a, 21b CDS
22a, 22b AGC
23a, 23b ADC
24a, 24b Frame memory
30 Image processing unit
31 FPGA
32 CPU
33 ROM
34 RAM
35 I/F
39 Bus line
51 FPGA
52 CPU
53 ROM
54 RAM
55 I/F
58 CAN I/F
59 Bus line
60 Device control system
70 Vehicle
100a, 100b Image acquisition unit
200a, 200b Conversion unit
300 Parallax value calculation processing unit
301 Cost calculation unit
302 Determination unit
303 First generation unit
500 Second generation unit
510 Clustering processing unit
520 Tracking processing unit
600 Movement prediction unit
610 Matching unit
611 Determination unit
612 Shape matching unit
613 Image matching unit
614 Boundary determination unit
615 Correction processing unit
620 Check unit
630 Feature update unit
640 State transition unit
700 Road surface
700a Road surface portion
701 Utility pole
701a Utility pole portion
702 Car
702a Car portion
711 Left guardrail
711a to 711c Left guardrail portion
712 Right guardrail
712a to 712c Right guardrail portion
713 Car
713a to 713c Car portion
714 Car
714a to 714c Car portion
721 to 724 Detection region
721a to 724a Detection frame
800 to 802 Prediction region
811 Prediction region
820, 821 Detection frame
901 Contour
902 Contour template
902a Image template
902b Rectangular template
910, 911 Histogram
B Baseline length
BL Reference line
C Cost value
d Shift amount
dp Parallax value
E Object
EL Epipolar line
f Focal length
Ia Reference image
Ib Comparison image
Ip Parallax image
p Reference pixel
P1, P2 Candidate position
P3, P4 Detection position
pb Reference region
Pb Boundary position
q Candidate pixel
qb Candidate region
RM Real U map
S, Sa, Sb Point
Th Threshold
UM, UM_H U map
VM V map
Z Distance

JP 2014-146267 A

Claims

An image processing apparatus comprising:
first matching means for obtaining a contour of an object by connecting the pixels at which distance information of the object is first hit when the distance image corresponding to the current frame is searched from the upper side of the object toward the lower side, and for specifying a candidate position of the object in the horizontal direction when template matching of the contour against a contour template shows that the contour corresponds to an object to be detected; and
second matching means for performing template matching using an image template in the vertical direction from the candidate position in the distance image, and for determining the horizontal position of the object and a first position of the object in the vertical direction.

The image processing apparatus according to claim 1, wherein the first matching means performs template matching on the contour using the contour template, which includes at least part of the head of a person.

The image processing apparatus according to claim 1, further comprising determination means that, when the first matching means specifies a plurality of contours corresponding to objects to be detected, specifies for each object a second position in the vertical direction, namely the position at which the length of the pixels of distance information in the vertical direction reaches a predetermined length when the distance image is searched from the upper side of that object toward the lower side, and that determines a boundary between the objects in the horizontal direction on the basis of the second positions.

The image processing apparatus according to claim 3, wherein the determination means determines, as the boundary, the midpoint between the horizontal positions of the ends that face the other object, in the horizontal rows of distance-information pixels of the respective objects at their second positions.

The image processing apparatus according to claim 1, further comprising update means that, in the distance image corresponding to the frame preceding the current frame, updates the contour template with the contour of the object whose position has been determined by the second matching means, and updates the image template with an image composed of a predetermined number of pixels extending downward from the position of each pixel of that contour in the preceding frame.

The image processing apparatus according to claim 1, further comprising prediction means for obtaining a prediction region in which the object exists in the distance image corresponding to the current frame, wherein:
the first matching means performs template matching using the contour template in the prediction region; and
the second matching means performs template matching using the image template in a region of the current frame corresponding to the prediction region.

An object recognition device comprising:
first imaging means for obtaining a first captured image by imaging a subject;
second imaging means, arranged at a position different from that of the first imaging means, for obtaining a second captured image by imaging the subject;
generation means for generating the distance image on the basis of distance information obtained for the subject from the first captured image and the second captured image;
detection means for newly detecting an object on the basis of the distance image and the first captured image or the second captured image; and
the image processing apparatus according to any one of claims 1 to 6.

A device control system comprising:
the object recognition device according to claim 7; and
a control device that controls a control target on the basis of information on the object detected by the object recognition device.

An image processing method comprising:
a step of obtaining a contour of an object by connecting the pixels at which distance information of the object is first hit when the distance image corresponding to the current frame is searched from the upper side of the object toward the lower side;
a first matching step of specifying a candidate position of the object in the horizontal direction when template matching of the contour against a contour template shows that the contour corresponds to an object to be detected; and
a second matching step of performing template matching using an image template in the vertical direction from the candidate position in the distance image, and determining the horizontal position of the object and a first position of the object in the vertical direction.

A program for causing a computer to function as:
first matching means for obtaining a contour of an object by connecting the pixels at which distance information of the object is first hit when the distance image corresponding to the current frame is searched from the upper side of the object toward the lower side, and for specifying a candidate position of the object in the horizontal direction when template matching of the contour against a contour template shows that the contour corresponds to an object to be detected; and
second matching means for performing template matching using an image template in the vertical direction from the candidate position in the distance image, and for determining the horizontal position of the object and a first position of the object in the vertical direction.