JP6907513B2 - Information processing equipment, imaging equipment, equipment control system, information processing method and program

Info

Publication number: JP6907513B2
Application number: JP2016229348A
Authority: JP
Inventors: 直樹本橋
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2016-11-25
Filing date: 2016-11-25
Publication date: 2021-07-21
Anticipated expiration: 2036-11-25
Also published as: JP2018085059A

Description

The present invention relates to an information processing device, an imaging device, a device control system, an information processing method, and a program.

Conventionally, automobile body structures and the like have been developed for safety, from the viewpoint of how to protect pedestrians and occupants when a pedestrian and an automobile collide. In recent years, however, advances in information processing and image processing have produced technologies that detect people and automobiles at high speed. Automobiles that apply these technologies to brake automatically before colliding with an object, thereby preventing the collision, have already been developed. Automatic vehicle control requires accurate measurement of the distance to an object such as a person or another vehicle, and for this purpose distance measurement using millimeter-wave radar and laser radar, as well as distance measurement using a stereo camera, has been put into practical use. When measuring distance with a stereo camera, for example, a parallax image is generated based on the amount of displacement (parallax) between local regions captured by the left and right cameras, and the distance between an object ahead and the own vehicle can be measured. A clustering process then detects a group of parallax pixels located at about the same distance (having about the same parallax value) as one object.

If all parallax pixels (parallax points) are clustered, however, parallax points of white lines on the road surface are picked up in addition to the objects to be detected, and part of the road surface, which should be flat, is detected as a falsely recognized object. In this case, the system determines that an object is present ahead and applies sudden braking. To solve this problem, a known technique (see, for example, Patent Document 1) generates information (a V-Disparity Map) by voting each parallax point (the x-coordinate value of the parallax image, the y-coordinate value of the parallax image, and the parallax value d) onto a two-dimensional histogram whose horizontal axis is the parallax value d, whose vertical axis is the y-coordinate value of the parallax image, and whose depth axis is the frequency value. The technique estimates the road surface shape from the voted points using a statistical method such as the least squares method, and clusters only the parallax points located at least a predetermined height above the estimated road surface, which avoids detecting the road surface as a falsely recognized object.
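The prior-art idea described above can be sketched as follows. This is a minimal illustration, not the method of Patent Document 1: it assumes the road appears as a single straight line y = a*d + b in the V-Disparity Map, and it expresses the "predetermined height" as a pixel offset in the image. The function names and the `min_height_px` parameter are hypothetical.

```python
def fit_road_line(points):
    """Least-squares fit of y = a*d + b to (d, y) points voted on the V map."""
    n = len(points)
    sd = sum(d for d, _ in points)
    sy = sum(y for _, y in points)
    sdd = sum(d * d for d, _ in points)
    sdy = sum(d * y for d, y in points)
    denom = n * sdd - sd * sd
    if denom == 0:
        raise ValueError("need at least two distinct parallax values")
    a = (n * sdy - sd * sy) / denom
    b = (sy - a * sd) / n
    return a, b


def points_above_road(parallax_points, a, b, min_height_px):
    """Keep (x, y, d) points at least min_height_px above the fitted road.

    Image y grows downward, so "above the road" means a smaller y than
    the road line at the same parallax value d.
    Assumption: the height threshold is given in pixels, not metres.
    """
    return [
        (x, y, d)
        for x, y, d in parallax_points
        if (a * d + b) - y >= min_height_px
    ]
```

Only the points returned by `points_above_road` would then be passed to clustering, so that road-surface points such as white lines are left out.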

In the prior art, however, in a scene where an object such as a vehicle is present and few parallax values correspond to the road surface, parallax corresponding to the object may be mistakenly selected as parallax corresponding to the road surface, so that the estimated road surface is pulled up above the actual road surface. In other words, because object detection is performed using an estimated road surface that differs from the actual one, it is difficult to ensure sufficient object detection accuracy.

The present invention has been made in view of the above, and its object is to provide an information processing device, an imaging device, a device control system, an information processing method, and a program that can ensure sufficient object detection accuracy.

To solve the above problem and achieve the object, the present invention is an information processing device comprising: an acquisition unit that acquires a distance image having distance information for each pixel; a generation unit that generates, based on a plurality of pixels included in the distance image, correspondence information in which a position in the vertical direction is associated with a position in the depth direction; an estimation unit that estimates, for each of a plurality of segments obtained by dividing the correspondence information, the shape of a reference object that serves as a reference for the height of an object; and a rejection unit that, for each segment, takes as a reference an extended shape obtained by extending a shape based on the estimated shape (the shape of the reference object estimated by the estimation unit) to the adjacent segment, and rejects the estimated shape according to either the distribution of the distance information existing below the predetermined shape or the ratio of the total number of coordinates to which pixels having the distance information have been voted to the extended shape.

According to the present invention, sufficient object detection accuracy can be ensured.

FIG. 1 is a schematic diagram showing the general configuration of a mobile body control system according to an embodiment.
FIG. 2 is a schematic block diagram of an imaging unit and an analysis unit.
FIG. 3 is a diagram showing the positional relationship between a subject and the imaging lens of each camera unit.
FIG. 4 is a diagram schematically explaining the functions of the analysis unit.
FIG. 5 is a diagram showing an example of the functions of an object detection processing unit.
FIG. 6 is a diagram showing an example of the functions of a road surface detection processing unit.
FIG. 7 is a diagram showing an example of a parallax image.
FIG. 8 is a diagram showing an example of the detailed configuration of a clustering processing unit.
FIG. 9 is a diagram showing an example of a captured image.
FIG. 10 is a diagram showing an example of an Adult Large Umap.
FIG. 11 is a diagram showing an example of a Large Umap.
FIG. 12 is a diagram showing an example of a captured image.
FIG. 13 is a diagram showing an example of an isolated region.
FIG. 14 is a diagram showing the region on the parallax image that corresponds to the isolated region shown in FIG. 13.
FIG. 15 is a diagram showing the size range defined for each object type.
FIG. 16 is a diagram for explaining a rejection process.
FIG. 17 is a diagram showing an example of the functions of a road surface estimation unit.
FIG. 18 is a diagram showing an example of a V map generated by a first generation unit.
FIG. 19 is a diagram showing an example of a plurality of segments obtained through division by a division unit.
FIG. 20 is a diagram for explaining the rejection determination of the first embodiment.
FIG. 21 is a diagram for explaining a method of measuring the frequency values of parallax values existing below an extended road surface.
FIG. 22 is a diagram for explaining another example of a method of measuring the frequency values of parallax values existing below the extended road surface.
FIG. 23 is a diagram for explaining a default road surface.
FIG. 24 is a flowchart showing an example of processing by the road surface detection processing unit.
FIG. 25 is a diagram for explaining the rejection determination of the second embodiment.

Embodiments of the information processing device, imaging device, device control system, information processing method, and program according to the present invention are described in detail below with reference to the accompanying drawings.

(First Embodiment)
FIG. 1 is a schematic diagram showing the general configuration of the mobile body control system 100 of the embodiment. As shown in FIG. 1, the mobile body control system 100 is installed in a vehicle 101 such as an automobile, which is an example of a mobile body. The mobile body control system 100 includes an imaging unit 102, an analysis unit 103, a control unit 104, and a display unit 105.

The imaging unit 102 is installed near the rearview mirror on the windshield 106 of the vehicle 101 and captures images in, for example, the traveling direction of the vehicle 101. Various data, including the image data obtained by the imaging operation of the imaging unit 102, are supplied to the analysis unit 103. Based on the data supplied from the imaging unit 102, the analysis unit 103 analyzes recognition targets such as the road surface on which the vehicle 101 is traveling, vehicles ahead of the vehicle 101, pedestrians, and obstacles. Based on the analysis result of the analysis unit 103, the control unit 104 issues warnings and the like to the driver of the vehicle 101 via the display unit 105. The control unit 104 also provides driving support based on the analysis result, such as control of various in-vehicle devices and steering or brake control of the vehicle 101.

FIG. 2 is a schematic block diagram of the imaging unit 102 and the analysis unit 103. In this example, the analysis unit 103 functions as the "information processing device", and the combination of the imaging unit 102 and the analysis unit 103 functions as the "imaging device". The control unit 104 described above functions as the "control unit" and controls a device (the vehicle 101 in this example) based on the output of the imaging device. The imaging unit 102 consists of two camera units mounted in parallel: a first camera unit 1A for the left eye and a second camera unit 1B for the right eye. That is, the imaging unit 102 is configured as a stereo camera that captures stereo images. A stereo image is an image that includes a plurality of captured images obtained by imaging from each of a plurality of viewpoints (captured images in one-to-one correspondence with the viewpoints), and the imaging unit 102 is the device that captures this stereo image (it functions as the "imaging unit"). Each of the camera units 1A and 1B includes a lens 5, an image sensor 6, and a sensor controller 7. The image sensor 6 is, for example, a CCD image sensor or a CMOS image sensor. CCD is an abbreviation for "Charge Coupled Device". CMOS is an abbreviation for "Complementary Metal-Oxide Semiconductor". The sensor controller 7 performs exposure control of the image sensor 6, image readout control, communication with external circuits, transmission control of image data, and the like.

The analysis unit 103 has a data bus line 10, a serial bus line 11, a CPU 15, an FPGA 16, a ROM 17, a RAM 18, a serial IF 19, and a data IF 20. CPU is an abbreviation for "Central Processing Unit". FPGA is an abbreviation for "Field-Programmable Gate Array". ROM is an abbreviation for "Read Only Memory". RAM is an abbreviation for "Random Access Memory". IF is an abbreviation for "interface".

The imaging unit 102 described above is connected to the analysis unit 103 via the data bus line 10 and the serial bus line 11. The CPU 15 executes and controls the overall operation of the analysis unit 103, image processing, and image recognition processing. Luminance image data of the images captured by the image sensors 6 of the first camera unit 1A and the second camera unit 1B is written to the RAM 18 of the analysis unit 103 via the data bus line 10. Control data for changing sensor exposure values, control data for changing image readout parameters, various setting data, and the like from the CPU 15 or the FPGA 16 are transmitted and received via the serial bus line 11.

The FPGA 16 performs processing that requires real-time performance on the image data stored in the RAM 18. Of the luminance image data (captured images) captured by the first camera unit 1A and the second camera unit 1B, the FPGA 16 uses one as a reference image and the other as a comparison image. The FPGA 16 then calculates the amount of positional displacement between the image portion on the reference image and the image portion on the comparison image that correspond to the same point in the imaging region, as the parallax value (parallax image data) of that image portion.

FIG. 3 shows the positional relationship on the XZ plane between a subject 30, the imaging lens 5A of the first camera unit 1A, and the imaging lens 5B of the second camera unit 1B. In FIG. 3, the distance b between the imaging lenses 5A and 5B and the focal length f of the imaging lenses 5A and 5B are both fixed values. Let Δ1 be the X-coordinate displacement of the imaging lens 5A with respect to the gazing point P of the subject 30, and let Δ2 be the X-coordinate displacement of the imaging lens 5B with respect to the gazing point P of the subject 30. In this case, the FPGA 16 calculates the parallax value d, which is the difference between the X coordinates of the imaging lenses 5A and 5B with respect to the gazing point P of the subject 30, by Equation 1.
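Equation 1 is not reproduced in this text. As a hedged reading of the definitions above, and assuming that Δ1 and Δ2 are signed offsets measured along the same X direction (an assumption, since the sign convention of FIG. 3 is not visible here), the parallax can be written as the first expression below. The second expression is the standard distance relation for a parallel stereo pair; the symbol Z, the distance from the lenses to the gazing point P, is introduced only for this illustration.

```latex
d = \left| \Delta_1 - \Delta_2 \right|, \qquad Z = \frac{b \, f}{d}
```

For example, with b = 0.2 m, f = 1000 pixels, and d = 20 pixels, the relation gives Z = 0.2 × 1000 / 20 = 10 m. A larger parallax value therefore corresponds to a nearer object.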

The FPGA 16 of the analysis unit 103 applies processing that requires real-time performance, such as gamma correction and distortion correction (rectification of the left and right captured images), to the luminance image data supplied from the imaging unit 102. Using the luminance image data processed in this way, the FPGA 16 performs the calculation of Equation 1 above to generate parallax image data (parallax value D) and writes it to the RAM 18.

Returning to FIG. 2, the description continues. The CPU 15 controls each sensor controller 7 of the imaging unit 102 and performs overall control of the analysis unit 103. The ROM 17 stores a three-dimensional object recognition program for executing situation recognition, prediction, three-dimensional object recognition, and the like, which are described later. The three-dimensional object recognition program is an example of an image processing program. The CPU 15 acquires, for example, CAN information of the own vehicle (vehicle speed, acceleration, steering angle, yaw rate, and so on) as parameters via the data IF 20. Following the three-dimensional object recognition program stored in the ROM 17, the CPU 15 then executes and controls various processes such as situation recognition using the luminance image and the parallax image stored in the RAM 18, and thereby recognizes recognition targets such as a preceding vehicle. CAN is an abbreviation for "Controller Area Network".

The recognition data for the recognition target is supplied to the control unit 104 via the serial IF 19. The control unit 104 uses this recognition data to provide driving support such as brake control and speed control of the own vehicle.

FIG. 4 is a diagram schematically explaining the functions of the analysis unit 103. The stereo image captured by the imaging unit 102, which constitutes the stereo camera, is supplied to the analysis unit 103. If the first camera unit 1A and the second camera unit 1B are color cameras, for example, each of them performs a color-to-luminance conversion that generates a luminance (Y) signal from the RGB (red, green, blue) signals by the calculation of Equation 2. Each of the first camera unit 1A and the second camera unit 1B supplies the luminance image data (captured image) generated by this conversion to the preprocessing unit 111 of the analysis unit 103. The pair consisting of the luminance image data (captured image) from the first camera unit 1A and the luminance image data (captured image) from the second camera unit 1B can be regarded as the stereo image. In this example, the preprocessing unit 111 is implemented by the FPGA 16.
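Equation 2 is not reproduced in this text, so the conversion can only be illustrated under an assumption. The sketch below uses the widely used ITU-R BT.601 luma weights (0.299, 0.587, 0.114); the weights actually used in the patent may differ, and the function names are hypothetical.

```python
def rgb_to_luminance(r, g, b):
    """Weighted sum of R, G, B giving a luminance (Y) value.

    Assumption: BT.601 weights; the patent's Equation 2 is not shown here.
    """
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_luminance_image(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of integer Y values."""
    return [[round(rgb_to_luminance(*px)) for px in row] for row in rgb_image]
```

Applied to each camera's RGB output, this yields the single-channel luminance image that the later stages (preprocessing, rectification, parallax calculation) operate on.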

The preprocessing unit 111 preprocesses the luminance image data received from the first camera unit 1A and the second camera unit 1B. In this example, gamma correction is performed as the preprocessing. The preprocessing unit 111 then supplies the preprocessed luminance image data to the rectified image generation unit 112.

The rectified image generation unit 112 applies rectification (distortion correction) to the luminance image data supplied from the preprocessing unit 111. This rectification converts the luminance image data output from the first camera unit 1A and the second camera unit 1B into the ideal rectified stereo image that would be obtained if two pinhole cameras were mounted in parallel. Specifically, the amount of distortion at each pixel is calculated using the polynomials Δx = f(x, y) and Δy = g(x, y), and the result is used to transform each pixel of the luminance image data output from the first camera unit 1A and the second camera unit 1B. The polynomials are, for example, fifth-order polynomials in x (the horizontal position in the image) and y (the vertical position in the image). This yields parallel luminance images in which the distortion of the optical systems of the first camera unit 1A and the second camera unit 1B has been corrected. In this example, the rectified image generation unit 112 is implemented by the FPGA 16.
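The per-pixel correction described above can be sketched as follows. This is an illustration under several assumptions: the polynomial coefficients are hypothetical calibration values stored as a `{(i, j): c}` dictionary for terms c * x^i * y^j, the correction is applied as an inverse mapping (each output pixel reads from the displaced source position), and nearest-neighbour sampling is used. The text does not specify any of these details.

```python
def poly2d(coeffs, x, y):
    """Evaluate the sum of c * x**i * y**j over a {(i, j): c} dictionary."""
    return sum(c * (x ** i) * (y ** j) for (i, j), c in coeffs.items())


def rectify(image, f_coeffs, g_coeffs, fill=0):
    """Remap an image using displacements dx = f(x, y) and dy = g(x, y).

    Assumption: inverse mapping with nearest-neighbour sampling; pixels
    whose source position falls outside the image are set to `fill`.
    """
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = round(x + poly2d(f_coeffs, x, y))
            sy = round(y + poly2d(g_coeffs, x, y))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out
```

A fifth-order polynomial would simply carry more `(i, j)` terms (all pairs with i + j ≤ 5) in each coefficient dictionary.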

The parallax image generation unit 113 is an example of a "distance image generation unit". From the stereo image captured by the imaging unit 102, it generates a parallax image having a parallax value for each pixel, which is an example of a distance image having distance information for each pixel. Here, the parallax image generation unit 113 uses the luminance image data of the first camera unit 1A as reference image data and the luminance image data of the second camera unit 1B as comparison image data, and performs the calculation of Equation 1 above to generate parallax image data indicating the parallax between the reference image data and the comparison image data. Specifically, for a given "row" of the reference image data, the parallax image generation unit 113 defines a block consisting of a plurality of pixels (for example, 16 pixels × 1 pixel) centered on one pixel of interest. In the same "row" of the comparison image data, a block of the same size as the defined reference block is shifted one pixel at a time in the horizontal line direction (X direction). For each shifted block, the parallax image generation unit 113 calculates a correlation value indicating the correlation between a feature value representing the pixel values of the block defined in the reference image data and a feature value representing the pixel values of that block in the comparison image data.

Based on the calculated correlation values, the parallax image generation unit 113 also performs a matching process that selects, from the blocks in the comparison image data, the block most strongly correlated with the block of the reference image data. It then calculates, as the parallax value D, the amount of positional displacement between the pixel of interest in the reference block and the corresponding pixel in the comparison block selected by the matching process. Parallax image data is obtained by performing this calculation of the parallax value D over the entire reference image data or over one specific region of it. Various known techniques can be used to generate the parallax image. In short, the parallax image generation unit 113 can be regarded as calculating (generating), from the stereo image captured by the stereo camera, a distance image (a parallax image in this example) having distance information for each pixel.

As the feature value of a block used in the matching process, the value (luminance value) of each pixel in the block can be used, for example. As the correlation value, one can use, for example, the sum of the absolute differences between the value (luminance value) of each pixel in the block of the reference image data and the value (luminance value) of the corresponding pixel in the block of the comparison image data. In this case, the block with the smallest sum is detected as the most strongly correlated block.
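The block matching described above can be sketched for a single image row as follows. This is a minimal illustration, not the FPGA implementation: it assumes the reference image comes from the left camera, so that a point at position x in the reference row appears at x - d in the comparison row, and it uses a short one-dimensional block. The function names and the `half` and `max_d` parameters are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-length blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def match_disparity(ref_row, cmp_row, x, half=2, max_d=16):
    """Return (best_d, best_cost) for the pixel of interest at column x.

    The block spans columns x-half .. x+half of the reference row and is
    compared against blocks shifted by d = 0 .. max_d in the comparison row.
    Assumption: the comparison block lies to the left (at x - d).
    """
    lo, hi = x - half, x + half + 1
    if lo < 0 or hi > len(ref_row):
        return None, None  # block does not fit inside the row
    ref_block = ref_row[lo:hi]
    best_d, best_cost = None, None
    for d in range(max_d + 1):
        if lo - d < 0:
            break  # shifted block would leave the image
        cost = sad(ref_block, cmp_row[lo - d:hi - d])
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost
```

Repeating this for every pixel of interest in every row gives an integer-valued parallax image; the block with the smallest SAD is taken as the match, as stated above.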

For the matching process of the parallax image generation unit 113, methods such as SSD (Sum of Squared Difference), ZSSD (Zero-mean Sum of Squared Difference), SAD (Sum of Absolute Difference), or ZSAD (Zero-mean Sum of Absolute Difference) can be used. If the matching process requires a parallax value at the sub-pixel level (finer than one pixel), an estimated value is used. The estimate can be obtained by, for example, equiangular line fitting or quadratic curve (parabola) fitting. However, an estimated sub-pixel parallax value contains error, so a method that reduces the estimation error, such as EEC (estimation error correction), may also be used.
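The two sub-pixel estimators named above can be sketched from the matching costs at the best integer parallax and its two neighbours. These are the commonly used textbook forms of equiangular line fitting and parabola fitting, given here as an illustration; the text does not state the exact formulas used. The function and parameter names are hypothetical.

```python
def subpixel_parabola(c_prev, c_best, c_next):
    """Offset of the minimum of a parabola through the three cost samples.

    c_prev, c_best, c_next are the costs at d-1, d, d+1.
    Returns an offset in [-0.5, 0.5] to add to the integer parallax d.
    """
    denom = 2.0 * (c_prev - 2.0 * c_best + c_next)
    if denom == 0:
        return 0.0  # flat cost curve: no sub-pixel information
    return (c_prev - c_next) / denom


def subpixel_equiangular(c_prev, c_best, c_next):
    """Offset from two lines of equal and opposite slope through the samples."""
    if c_prev > c_next:
        denom = 2.0 * (c_prev - c_best)
    else:
        denom = 2.0 * (c_next - c_best)
    if denom == 0:
        return 0.0
    return (c_prev - c_next) / denom
```

For example, with an integer match at d = 12 and an offset of 0.25, the sub-pixel parallax would be 12.25. Parabola fitting suits squared-difference costs such as SSD, while equiangular fitting suits absolute-difference costs such as SAD.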

In this example, the parallax image generation unit 113 is implemented by the FPGA 16. The parallax image generated by the parallax image generation unit 113 is supplied to the object detection processing unit 114. In this example, the functions of the object detection processing unit 114 are implemented by the CPU 15 executing the three-dimensional object recognition program.

FIG. 5 is a diagram showing an example of the functions of the object detection processing unit 114. As shown in FIG. 5, the object detection processing unit 114 includes a road surface detection processing unit 122, a clustering processing unit 123, and a tracking processing unit 124.

Using the parallax image input from the parallax image generation unit 113, the road surface detection processing unit 122 detects the road surface, which is an example of a reference object that serves as a reference for the height of an object. As shown in FIG. 6, the road surface detection processing unit 122 includes an acquisition unit 125, a first generation unit 126, and a road surface estimation unit 127. The acquisition unit 125 acquires the parallax image, which is an example of a distance image having distance information for each pixel. The parallax image acquired by the acquisition unit 125 is input to the first generation unit 126 and to the clustering processing unit 123 described later.

第１の生成部１２６は、「生成部」の一例であり、視差画像に含まれる複数の画素に基づいて、視差画像の縦方向の位置と、ステレオカメラの光軸の方向を示す奥行方向の位置とが対応付けられた対応情報を生成する。この例では、第１の生成部１２６は、視差画像の各画素を、画像の垂直方向の座標（ｙ）を縦軸、視差値ｄを横軸とする２次元ヒストグラム上に投票して、上述の対応情報を生成する。以下の説明では、この対応情報を「Ｖマップ（Ｖ−Ｄｉｓｐａｒｉｔｙマップ）」と称する。Ｖマップは、視差画像の（ｘ座標値、ｙ座標値、視差値ｄ）の組のうち、横軸（ｘ軸）を視差値ｄ、縦軸（ｙ軸）をｙ座標値、奥行方向の軸（ｚ軸）を頻度とした２次元ヒストグラムである。要するに、Ｖマップは、縦方向の位置と視差値ｄ（奥行方向の位置に相当）との組み合わせごとに、視差値ｄの頻度値を記録した情報であると考えることもできる。以下の説明では、Ｖマップ内の座標点のうち、視差画像に含まれる視差値ｄを有する画素（視差画素）が投票された座標を視差点と称する場合がある。なお、Ｖマップの生成において、視差画像のｙ座標とＶマップのｙ座標とは対応関係にあり、視差画像の特定のｙ座標の水平ライン上の視差値ｄは、Ｖマップの対応するｙ座標の水平ラインのうち、該視差値ｄに対応する点（Ｖマップ上の座標点）に投票される。したがって、視差画像の同じ水平ラインに含まれる視差値ｄは同値となるものも存在するため、Ｖマップの任意の座標点には、同値の視差値ｄの数を示す頻度値が格納されることになる。視差画像の特定の水平ラインにおいては、同じ路面であれば、視差値ｄは互いに類似する値となるため、Ｖマップにおける路面に対応する視差画素は密集して投票されることになる。 The first generation unit 126 is an example of the “generation unit”, and is in the depth direction indicating the vertical position of the parallax image and the direction of the optical axis of the stereo camera based on a plurality of pixels included in the parallax image. Generates correspondence information associated with the position. In this example, the first generation unit 126 votes each pixel of the parallax image on a two-dimensional histogram having the coordinates (y) in the vertical direction of the image as the vertical axis and the parallax value d as the horizontal axis. Generate the correspondence information of. In the following description, this correspondence information will be referred to as a "V map (V-Disparity map)". In the V map, among the sets of (x coordinate value, y coordinate value, difference value d) of the difference image, the horizontal axis (x axis) is the difference value d, the vertical axis (y axis) is the y coordinate value, and the depth direction. It is a two-dimensional histogram with the axis (z-axis) as a frequency. 
In short, the V map can be regarded as information that records a frequency value for each combination of a vertical position and a parallax value d (corresponding to a position in the depth direction). In the following description, a coordinate in the V map to which a pixel having a parallax value d in the parallax image (a parallax pixel) has been voted may be referred to as a parallax point. In generating the V map, the y coordinate of the parallax image corresponds to the y coordinate of the V map: a parallax value d on the horizontal line at a given y coordinate of the parallax image is voted to the point corresponding to that parallax value d on the horizontal line at the corresponding y coordinate of the V map. Because several pixels on the same horizontal line of the parallax image can share the same parallax value d, each coordinate point of the V map stores a frequency value indicating the number of pixels having that parallax value d. On a given horizontal line of the parallax image, pixels on the same road surface have similar parallax values d, so the parallax pixels corresponding to the road surface are voted densely in the V map.
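The voting described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_v_map`, the use of integer parallax values, and the convention that a value of 0 marks a pixel without parallax are all assumptions made for the example.

```python
import numpy as np

def build_v_map(disparity, max_d):
    """Vote every valid parallax pixel (y, d) into a (height, max_d) histogram."""
    height, _ = disparity.shape
    v_map = np.zeros((height, max_d), dtype=np.int32)
    # Assumption: 0 means "no parallax value" for that pixel.
    ys, xs = np.nonzero(disparity > 0)
    ds = disparity[ys, xs].astype(np.int64)
    keep = ds < max_d  # ignore parallax values outside the histogram
    # np.add.at accumulates correctly when the same (y, d) cell is hit repeatedly.
    np.add.at(v_map, (ys[keep], ds[keep]), 1)
    return v_map

# Two rows: the first has two pixels with d = 3, the second three pixels with d = 5.
example = np.array([[0, 3, 3],
                    [5, 5, 5]])
v_map = build_v_map(example, max_d=8)
```

Each cell `v_map[y, d]` then holds the number of pixels on row `y` that have parallax value `d`, which is the frequency value the text refers to.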

なお、第１の生成部１２６は、視差画像中の全ての視差画素を投票してもよいが、図７に示す視差画像Ｉｐのように、所定の領域（例えば、図７に示す投票領域７０１〜７０３）を設定し、その領域に含まれる視差画素のみを投票するものとしてもよい。例えば、路面は遠方になるにつれて、消失点に向かって狭くなっていくという性質を利用し、図７に示すように、路面の幅にあった投票領域を所定数設定する方法が考えられる。このように投票領域を制限することによって、路面以外のノイズがＶマップに混入することを抑制することができる。また、視差画像中の一水平ラインにおける視差画素を適宜間引いて投票するものとしてもよい。また、間引きに関しては、水平方向だけではなく、垂直方向に対して実行してもよい。 The first generation unit 126 may vote for all the parallax pixels in the parallax image, but like the parallax image Ip shown in FIG. 7, a predetermined region (for example, the voting region 701 shown in FIG. 7). ~ 703) may be set and only the parallax pixels included in the area may be voted. For example, by utilizing the property that the road surface becomes narrower toward the vanishing point as the road surface becomes farther, as shown in FIG. 7, a method of setting a predetermined number of voting areas suitable for the width of the road surface can be considered. By limiting the voting area in this way, it is possible to prevent noise other than the road surface from being mixed into the V-map. Further, the parallax pixels in one horizontal line in the parallax image may be appropriately thinned out for voting. Further, the thinning may be performed not only in the horizontal direction but also in the vertical direction.

第１の生成部１２６により生成されたＶマップは、図６に示す路面推定部１２７へ入力される。路面推定部１２７は、Ｖマップ内の投票された視差点から所定の方法で標本点を選択し、選択された点群を直線近似(または、曲線近似)する形で路面の形状を推定する。ここでは、オブジェクトの高さの基準となる基準オブジェクトは路面に相当する。路面推定部１２７の具体的な内容については後述する。路面推定部１２７による推定結果（路面推定情報）は、クラスタリング処理部１２３へ入力される。 The V map generated by the first generation unit 126 is input to the road surface estimation unit 127 shown in FIG. The road surface estimation unit 127 selects a sample point from the voted disparity points in the V map by a predetermined method, and estimates the shape of the road surface by linearly approximating (or curving) the selected point cloud. Here, the reference object, which is the reference for the height of the object, corresponds to the road surface. The specific contents of the road surface estimation unit 127 will be described later. The estimation result (road surface estimation information) by the road surface estimation unit 127 is input to the clustering processing unit 123.

クラスタリング処理部１２３は、路面推定情報を用いて、取得部１２５により取得された視差画像上の物体位置を検出する。図８は、クラスタリング処理部１２３の詳細な構成の一例を示す図である。図８に示すように、クラスタリング処理部１２３は、第２の生成部１３０と、孤立領域検出処理部１４０と、視差画処理部１５０と、棄却処理部１６０と、を有する。第２の生成部１３０は、視差画像のうち、路面（基準オブジェクトの一例）よりも高い範囲に存在する複数の画素を用いて、ステレオカメラの光軸と直交する方向を示す横方向の位置と、ステレオカメラの光軸の方向を示す奥行方向の位置とが対応付けられた第２の対応情報を生成する。この例では、第２の対応情報は、横軸（Ｘ軸）を横方向の実際の距離（実距離）、縦軸（Ｙ軸）を視差画像の視差値ｄ、奥行方向の軸（Ｚ軸）を頻度とした２次元ヒストグラムである。第２の対応情報は、実距離と視差値ｄとの組み合わせごとに、視差の頻度値を記録した情報であると考えることもできる。 The clustering processing unit 123 uses the road surface estimation information to detect object positions on the parallax image acquired by the acquisition unit 125. FIG. 8 is a diagram showing an example of a detailed configuration of the clustering processing unit 123. As shown in FIG. 8, the clustering processing unit 123 includes a second generation unit 130, an isolated region detection processing unit 140, a parallax image processing unit 150, and a rejection processing unit 160. The second generation unit 130 uses a plurality of pixels of the parallax image that lie in a range higher than the road surface (an example of a reference object) to generate second correspondence information, in which a position in the lateral direction, which indicates a direction orthogonal to the optical axis of the stereo camera, is associated with a position in the depth direction, which indicates the direction of the optical axis of the stereo camera. In this example, the second correspondence information is a two-dimensional histogram with the actual lateral distance (real distance) on the horizontal axis (X-axis), the parallax value d of the parallax image on the vertical axis (Y-axis), and the frequency on the depth axis (Z-axis). The second correspondence information can also be regarded as information that records a parallax frequency value for each combination of real distance and parallax value d.

ここで、上述の路面推定部１２７の路面推定により、路面を表す直線式が得られているため、視差ｄが決まれば、対応するｙ座標ｙ０が決まり、この座標ｙ０が路面の高さとなる。例えば視差値がｄでｙ座標がｙ’である場合、ｙ’−ｙ０が視差値ｄのときの路面からの高さを示す。上述の座標（ｄ，ｙ’）の路面からの高さＨは、Ｈ＝（ｚ×（ｙ’−ｙ０））／ｆという演算式で求めることができる。なお、この演算式における「ｚ」は、視差値ｄから計算される距離（ｚ＝Ｂｆ／（ｄ−ｏｆｆｓｅｔ））、「ｆ」は撮像ユニット１０２の焦点距離を（ｙ’−ｙ０）の単位と同じ単位に変換した値である。ここで、Ｂｆは、撮像ユニット１０２の基線長Ｂと焦点距離ｆを乗じた値、ｏｆｆｓｅｔは無限遠のオブジェクトを撮影したときの視差である。 Here, since the road surface estimation by the road surface estimation unit 127 described above yields a linear equation representing the road surface, once the parallax value d is determined, the corresponding y coordinate y0 is determined, and this coordinate y0 is the height of the road surface. For example, when the parallax value is d and the y coordinate is y', y'−y0 indicates the height from the road surface at parallax value d. The height H of the coordinates (d, y') above the road surface can be obtained by the expression H = (z × (y'−y0)) / f. In this expression, "z" is the distance calculated from the parallax value d (z = Bf / (d − offset)), and "f" is the focal length of the imaging unit 102 converted into the same unit as (y'−y0). Here, Bf is the product of the baseline length B of the imaging unit 102 and the focal length f, and offset is the parallax obtained when an object at infinity is photographed.
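As a worked illustration of the expression H = (z × (y'−y0)) / f, the sketch below evaluates it for one point. The function name, the parameter names, and the form y0 = slope × d + intercept for the road line are assumptions made for the example; the sign of the result depends on which direction the image y axis is taken to increase.

```python
def height_above_road(d, y_obj, slope, intercept, baseline, focal_px, offset=0.0):
    """Evaluate H = z * (y' - y0) / f for one (d, y') point.

    slope, intercept: assumed parameters of the estimated road line y0 = slope * d + intercept.
    baseline: baseline length B (metres); focal_px: focal length f in pixels.
    offset: parallax measured for an object at infinity.
    """
    y0 = slope * d + intercept                 # road surface row at this parallax
    z = baseline * focal_px / (d - offset)     # z = Bf / (d - offset)
    return z * (y_obj - y0) / focal_px

# B = 0.2 m, f = 1000 px, d = 20 px gives z = 10 m.
# The road line gives y0 = 2 * 20 + 100 = 140, so y' - y0 = 150 px.
h = height_above_road(d=20, y_obj=290, slope=2.0, intercept=100.0,
                      baseline=0.2, focal_px=1000.0)
```

With these illustrative numbers the result is 10 × 150 / 1000 = 1.5 (metres).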

第２の生成部１３０は、第２の対応情報として、「ＡｄｕｌｔＬａｒｇｅＵｍａｐ」、「ＬａｒｇｅＵｍａｐ」、「ＳｍａｌｌＵｍａｐ」のうちの少なくとも１つを生成する。以下、これらのマップについて説明する。まず、「ＡｄｕｌｔＬａｒｇｅＵｍａｐ」について説明する。視差画像の横方向の位置をｘ、縦方向の位置をｙ、画素ごとに設定される視差値をｄとすると、第２の生成部１３０は、視差画像のうち、路面よりも高い第１の範囲内の所定値以上の高さの範囲を示す第２の範囲内に存在する点（ｘ、ｙ、ｄ）を、（ｘ、ｄ）の値に基づいて投票することで、横軸を視差画像のｘ、縦軸を視差値ｄ、奥行方向の軸を頻度とした２次元ヒストグラムを生成する。そして、この２次元ヒストグラムの横軸を実距離に変換して、ＡｄｕｌｔＬａｒｇｅＵｍａｐを生成する。 The second generation unit 130 generates at least one of "Adult Large Umap", "Large Umap", and "Small Umap" as the second correspondence information. These maps are described below. First, "Adult Large Umap" will be described. Let x be the horizontal position in the parallax image, y the vertical position, and d the parallax value set for each pixel. The second generation unit 130 votes the points (x, y, d) of the parallax image that lie in a second range, which is the part of a first range above the road surface whose height is equal to or greater than a predetermined value, based on their (x, d) values. This produces a two-dimensional histogram with x of the parallax image on the horizontal axis, the parallax value d on the vertical axis, and the frequency on the depth axis. The horizontal axis of this two-dimensional histogram is then converted into real distance to generate the Adult Large Umap.

例えば図９に示す撮像画像においては、大人と子供を含む人グループ１と、大人同士の人グループ２と、ポールと、車両とが映り込んでいる。この例では、路面からの実高さが１５０ｃｍ〜２００ｃｍの範囲が第２の範囲として設定され、該第２の範囲の視差値ｄが投票されたＡｄｕｌｔＬａｒｇｅＵｍａｐは図１０のようになる。高さが１５０ｃｍ未満の子供の視差値ｄは投票されないためマップ上に現れないことになる。なお、縦軸は、距離に応じた間引き率を用いて視差値ｄを間引き処理した間引き視差となっている。第２の生成部１３０により生成されたＡｄｕｌｔＬａｒｇｅＵｍａｐは孤立領域検出処理部１４０に入力される。 For example, the captured image shown in FIG. 9 contains person group 1, which includes an adult and a child, person group 2, which consists of adults, a pole, and a vehicle. In this example, the range of actual heights from 150 cm to 200 cm above the road surface is set as the second range, and the Adult Large Umap to which the parallax values d in this second range are voted is as shown in FIG. 10. The parallax values d of the child, whose height is less than 150 cm, are not voted and therefore do not appear on the map. Note that the vertical axis is a thinned parallax, obtained by thinning the parallax value d at a thinning rate that depends on distance. The Adult Large Umap generated by the second generation unit 130 is input to the isolated region detection processing unit 140.
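The height-limited voting can be sketched as below. This is only an illustration of the filtering step: the function `build_u_map`, the point format (x, d, h) with h in metres, and the array layout are assumptions, and the conversion of the horizontal axis to real distance and the thinning of d are deliberately left out.

```python
import numpy as np

def build_u_map(points, width, max_d, h_min, h_max):
    """Vote (x, d) for every point whose height above the road lies in [h_min, h_max].

    points: iterable of (x, d, h), with h the height above the road in metres
            (hypothetical input format for this sketch).
    Returns a (max_d, width) histogram indexed as u_map[d, x].
    """
    u_map = np.zeros((max_d, width), dtype=np.int32)
    for x, d, h in points:
        if h_min <= h <= h_max and 0 <= d < max_d and 0 <= x < width:
            u_map[d, x] += 1
    return u_map

# Two points of an adult (1.7 m, 1.8 m), one of a child (1.2 m), one of another adult (1.6 m).
pts = [(1, 4, 1.7), (1, 4, 1.8), (2, 4, 1.2), (3, 6, 1.6)]
adult_map = build_u_map(pts, width=5, max_d=8, h_min=1.5, h_max=2.0)  # second range
large_map = build_u_map(pts, width=5, max_d=8, h_min=0.0, h_max=2.0)  # first range
```

With the 150 cm to 200 cm range the 1.2 m point is not voted, which mirrors the child disappearing from the map in the text; widening the range to 0 cm to 200 cm votes all four points.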

次に、「ＬａｒｇｅＵｍａｐ」について説明する。視差画像の横方向の位置をｘ、縦方向の位置をｙ、画素ごとに設定される視差値をｄとすると、第２の生成部１３０は、視差画像のうち第１の範囲内に存在する点（ｘ、ｙ、ｄ）を、（ｘ、ｄ）の値に基づいて投票することで、横軸を視差画像のｘ、縦軸を視差値ｄ、奥行方向の軸を頻度とした２次元ヒストグラムを生成する。そして、この２次元ヒストグラムの横軸を実距離に変換して、ＬａｒｇｅＵｍａｐを生成する。図９の例では、０ｃｍ〜２００ｃｍの範囲（上述の第２の範囲を含んでいる）が第１の範囲として設定され、該第１の範囲の視差値ｄが投票されたＬａｒｇｅＵｍａｐは図１１のようになる。また、第２の生成部１３０は、ＬａｒｇｅＵｍａｐと併せて、ＬａｒｇｅＵｍａｐに投票される視差点（実距離と視差値ｄとの組）のうち、路面からの高さ（ｈ）が最も高い視差点の高さを記録して、横軸を実距離（カメラの左右方向の距離）、縦軸を視差値ｄとし、対応する点ごとに高さが記録された高さ情報を生成することもできる。高さ情報は、実距離と視差値ｄとの組み合わせごとに高さを記録した情報であると考えてもよい。以下の説明では、この高さ情報を、「ＬａｒｇｅＵｍａｐの高さマップ」と称する。「ＬａｒｇｅＵｍａｐの高さマップ」に含まれる各画素の位置はＬａｒｇｅＵｍａｐに含まれる各画素の位置に対応している。第２の生成部１３０により生成されたＬａｒｇｅＵｍａｐおよびＬａｒｇｅＵｍａｐの高さマップは孤立領域検出処理部１４０に入力される。 Next, "Large Umap" will be described. Let x be the horizontal position in the parallax image, y the vertical position, and d the parallax value set for each pixel. The second generation unit 130 votes the points (x, y, d) of the parallax image that lie in the first range, based on their (x, d) values, to generate a two-dimensional histogram with x of the parallax image on the horizontal axis, the parallax value d on the vertical axis, and the frequency on the depth axis. The horizontal axis of this two-dimensional histogram is then converted into real distance to generate the Large Umap. In the example of FIG. 9, the range from 0 cm to 200 cm (which includes the second range described above) is set as the first range, and the Large Umap to which the parallax values d in this first range are voted is as shown in FIG. 11. Together with the Large Umap, the second generation unit 130 can also generate height information: among the parallax points (pairs of real distance and parallax value d) voted to the Large Umap, it records the height of the parallax point with the greatest height (h) above the road surface, with real distance (distance in the left-right direction of the camera) on the horizontal axis and the parallax value d on the vertical axis, so that a height is recorded for each corresponding point. The height information may be regarded as information that records a height for each combination of real distance and parallax value d. In the following description, this height information is referred to as the "Large Umap height map". The position of each pixel in the "Large Umap height map" corresponds to the position of each pixel in the Large Umap. The Large Umap and the Large Umap height map generated by the second generation unit 130 are input to the isolated region detection processing unit 140.
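A height map of this kind can be sketched as a second array, aligned cell for cell with the Umap, that keeps the maximum height seen in each cell. The function name `build_height_map` and the (x, d, h) point format are assumptions made for the example; the horizontal axis is left in image x rather than converted to real distance.

```python
import numpy as np

def build_height_map(points, width, max_d, h_min=0.0, h_max=2.0):
    """Record, for each (d, x) cell, the greatest height among the points voted there.

    points: iterable of (x, d, h), with h the height above the road in metres
            (hypothetical input format). Cells with no vote stay at 0.0.
    """
    height_map = np.zeros((max_d, width), dtype=np.float64)
    for x, d, h in points:
        if h_min <= h <= h_max and 0 <= d < max_d and 0 <= x < width:
            height_map[d, x] = max(height_map[d, x], h)
    return height_map

# Three points fall into cell (d=4, x=1); the tallest is 1.8 m. The 2.5 m point is out of range.
pts = [(1, 4, 0.9), (1, 4, 1.8), (1, 4, 1.3), (3, 6, 2.5)]
height_map = build_height_map(pts, width=5, max_d=8)
```

Because the array has the same shape and indexing as the Umap, a later stage can look up the height of any isolated region directly from the region's cell coordinates.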

次に、「ＳｍａｌｌＵｍａｐ」について説明する。視差画像の横方向の位置をｘ、縦方向の位置をｙ、画素ごとに設定される視差値をｄとすると、第２の生成部１３０は、視差画像のうち第１の範囲内に存在する点（ｘ、ｙ、ｄ）を、（ｘ、ｄ）の値に基づいて投票（ＬａｒｇｅＵｍａｐを作成する場合よりも少ない数を投票）することで、横軸を視差画像のｘ、縦軸を視差値ｄ、奥行方向の軸を頻度とした２次元ヒストグラムを生成する。そして、この２次元ヒストグラムの横軸を実距離に変換して、ＳｍａｌｌＵｍａｐを生成する。ＳｍａｌｌＵｍａｐは、ＬａｒｇｅＵｍａｐと比較して１画素の距離分解能が低い。また、第２の生成部１３０は、ＳｍａｌｌＵｍａｐと併せて、ＳｍａｌｌＵｍａｐに投票される視差点（実距離と視差値ｄとの組）のうち、路面からの高さ（ｈ）が最も高い視差点の高さを記録して、横軸を実距離（カメラの左右方向の距離）、縦軸を視差値ｄとし、対応する点ごとに高さが記録された高さ情報を生成することもできる。高さ情報は、実距離と視差値ｄとの組み合わせごとに高さを記録した情報であると考えてもよい。以下の説明では、この高さ情報を、「ＳｍａｌｌＵマップの高さｍａｐ」と称する。「ＳｍａｌｌＵｍａｐの高さｍａｐ」に含まれる各画素の位置はＳｍａｌｌＵｍａｐに含まれる各画素の位置に対応している。第２の生成部１３０により生成されたＳｍａｌｌＵｍａｐおよびＳｍａｌｌＵマップの高さマップは孤立領域検出処理部１４０に入力される。 Next, "Small Umap" will be described. Let x be the horizontal position in the parallax image, y the vertical position, and d the parallax value set for each pixel. The second generation unit 130 votes the points (x, y, d) of the parallax image that lie in the first range, based on their (x, d) values (voting fewer points than when creating the Large Umap), to generate a two-dimensional histogram with x of the parallax image on the horizontal axis, the parallax value d on the vertical axis, and the frequency on the depth axis. The horizontal axis of this two-dimensional histogram is then converted into real distance to generate the Small Umap. The Small Umap has a lower distance resolution per pixel than the Large Umap. Together with the Small Umap, the second generation unit 130 can also generate height information: among the parallax points (pairs of real distance and parallax value d) voted to the Small Umap, it records the height of the parallax point with the greatest height (h) above the road surface, with real distance (distance in the left-right direction of the camera) on the horizontal axis and the parallax value d on the vertical axis, so that a height is recorded for each corresponding point. The height information may be regarded as information that records a height for each combination of real distance and parallax value d. In the following description, this height information is referred to as the "Small Umap height map". The position of each pixel in the "Small Umap height map" corresponds to the position of each pixel in the Small Umap. The Small Umap and the Small Umap height map generated by the second generation unit 130 are input to the isolated region detection processing unit 140.

この例では、第２の生成部１３０はＬａｒｇｅＵｍａｐを生成し、その生成されたＬａｒｇｅＵｍａｐが孤立領域検出処理部１４０に入力される場合を例に挙げて説明するが、これに限らず、例えば「ＡｄｕｌｔＬａｒｇｅＵｍａｐ」、「ＬａｒｇｅＵｍａｐ」、「ＳｍａｌｌＵｍａｐ」を用いて物体検出を行う場合は、第２の生成部１３０は、「ＡｄｕｌｔＬａｒｇｅＵｍａｐ」、「ＬａｒｇｅＵｍａｐ」、「ＳｍａｌｌＵｍａｐ」を生成し、これらのマップが孤立領域検出処理部１４０に入力されてもよい。 This example describes the case where the second generation unit 130 generates the Large Umap and the generated Large Umap is input to the isolated region detection processing unit 140. However, the present invention is not limited to this. For example, when object detection is performed using "Adult Large Umap", "Large Umap", and "Small Umap", the second generation unit 130 may generate all three maps, and these maps may be input to the isolated region detection processing unit 140.

図８に戻って説明を続ける。孤立領域検出処理部１４０は、前述の第２の対応情報（この例ではＬａｒｇｅＵｍａｐ）から、視差値ｄの塊の領域である孤立領域（集合領域）を検出する。例えば図１２に示す撮像画像の場合、左右にガードレール８１、８２があり、車両７７及び車両７９がセンターラインを挟んで対面通行をしている。各走行車線には、それぞれ１台の車両７７又は車両７９が走行している。車両７９とガードレール８２との間には２本のポール８０Ａ，８０Ｂが存在している。図１３は、図１２に示す撮像画像に基づいて得られたＬａｒｇｅＵｍａｐであり、枠で囲まれた領域が孤立領域に相当する。 The explanation will be continued by returning to FIG. The isolated region detection processing unit 140 detects an isolated region (aggregate region) which is a region of a mass having a parallax value d from the above-mentioned second correspondence information (Large Umap in this example). For example, in the case of the captured image shown in FIG. 12, there are guardrails 81 and 82 on the left and right, and the vehicle 77 and the vehicle 79 are facing each other with the center line in between. One vehicle 77 or 79 is traveling in each traveling lane. There are two poles 80A and 80B between the vehicle 79 and the guardrail 82. FIG. 13 is a Large Umap obtained based on the captured image shown in FIG. 12, and the region surrounded by the frame corresponds to the isolated region.

図８に示す視差画処理部１５０は、孤立領域検出処理部１４０により検出された孤立領域に対応する視差画像上の領域や実空間での物体情報を検出する視差画処理を行う。図１４は、図１３に示す孤立領域に対応する視差画像上の領域（視差画処理部１５０による処理の結果）を示す図であり、図１４の領域９１はガードレール８１に対応する領域であり、領域９２は車両７７に対応する領域であり、領域９３は車両７９に対応する領域であり、領域９４はポール８０Ａに対応する領域であり、領域９５はポール８０Ｂに対応する領域であり、領域９６はガードレール８２に対応する領域である。 The parallax image processing unit 150 shown in FIG. 8 performs parallax image processing for detecting a region on the parallax image corresponding to the isolated region detected by the isolated region detection processing unit 140 and object information in the real space. FIG. 14 is a diagram showing a region on the parallax image (result of processing by the parallax image processing unit 150) corresponding to the isolated region shown in FIG. 13, and region 91 in FIG. 14 is a region corresponding to the guardrail 81. The area 92 is an area corresponding to the vehicle 77, the area 93 is an area corresponding to the vehicle 79, the area 94 is an area corresponding to the pole 80A, and the area 95 is an area corresponding to the pole 80B, and the area 96. Is an area corresponding to the guardrail 82.

図８に示す棄却処理部１６０は、視差画処理部１５０により検出された視差画上の領域や実空間での物体情報に基づき、出力すべきオブジェクトを選別する棄却処理を行う。棄却処理部１６０は、物体のサイズに着目したサイズ棄却と、物体同士の位置関係に着目したオーバラップ棄却とを実行する。例えばサイズ棄却では、図１５に示すオブジェクトタイプごとに定められたサイズ範囲に当てはまらないサイズの検出結果を棄却する。例えば図１６の例では、領域９１および領域９６は棄却されている。また、オーバラップ棄却では、視差画処理により検出された、視差画上の孤立領域に対応する領域同士に対し、重なりを持つ結果の取捨選択を行う。 The rejection processing unit 160 shown in FIG. 8 performs a rejection process for selecting objects to be output based on the object information in the parallax image region and the real space detected by the parallax image processing unit 150. The rejection processing unit 160 executes size rejection focusing on the size of the object and overlap rejection focusing on the positional relationship between the objects. For example, in size rejection, the detection result of a size that does not fall within the size range defined for each object type shown in FIG. 15 is rejected. For example, in the example of FIG. 16, regions 91 and 96 are rejected. Further, in the overlap rejection, the result of overlapping is selected for the regions corresponding to the isolated regions on the parallax image detected by the parallax image processing.

クラスタリング処理部１２３からの出力情報（検出結果）は図５に示すトラッキング処理部１２４に入力される。トラッキング処理部１２４は、クラスタリング処理部１２３による検出結果（検出された物体）が複数のフレームにわたって連続して出現する場合に追跡対象であると判定し、追跡対象である場合には、その検出結果を物体検出結果として制御ユニット１０４へ出力する。制御ユニット１０４は、物体検出結果に基づいて、実際に車両１０１を制御する。 The output information (detection result) from the clustering processing unit 123 is input to the tracking processing unit 124 shown in FIG. The tracking processing unit 124 determines that the detection result (detected object) by the clustering processing unit 123 is a tracking target when it appears continuously over a plurality of frames, and if it is a tracking target, the detection result. Is output to the control unit 104 as an object detection result. The control unit 104 actually controls the vehicle 101 based on the object detection result.

以下では、基準オブジェクトの一例である路面の形状を推定する路面推定部１２７（図６）の具体的な内容を説明する。図１７は、路面推定部１２７が有する機能の一例を示す図である。図１７に示すように、路面推定部１２７は、分割部１７１と、推定部１７２と、棄却部１７３と、補間部１７４と、を有する。 Hereinafter, the specific contents of the road surface estimation unit 127 (FIG. 6) for estimating the shape of the road surface, which is an example of the reference object, will be described. FIG. 17 is a diagram showing an example of the function of the road surface estimation unit 127. As shown in FIG. 17, the road surface estimation unit 127 includes a division unit 171, an estimation unit 172, a rejection unit 173, and an interpolation unit 174.

分割部１７１は、第１の生成部１２６から入力されるＶマップ（対応情報）を複数のセグメントに分割する。この例では、分割部１７１は、Ｖマップを、奥行方向（視差値ｄの方向、Ｖマップの横軸の方向）に連続する複数のセグメントに分割する。ただし、これに限らず、例えば視差画像のｙ方向（Ｖマップの縦軸方向）に分割してもよい。また、セグメントの位置は任意の位置に設定することが可能である。通常、セグメント間は連続させることが望ましいが、不連続となっても構わない(例えば、所定距離範囲(ｄ値)での推定をあえて実行しない場合など)。本実施形態では、セグメントは２つ以上設定する。セグメントは、等間隔に設定せずに所定の幅で設定することもできる。例えば、遠方領域は解像度が低い(路面分解能が低い)ことがわかっているため、遠方に行くに連れて、セグメントを細かく分割する方法が考えられる。従って、上記に合わせてセグメント数を決定すれば良い。 The division unit 171 divides the V map (correspondence information) input from the first generation unit 126 into a plurality of segments. In this example, the division unit 171 divides the V-map into a plurality of segments continuous in the depth direction (the direction of the parallax value d, the direction of the horizontal axis of the V-map). However, the present invention is not limited to this, and for example, the parallax image may be divided in the y direction (vertical direction of the V map). Moreover, the position of the segment can be set to an arbitrary position. Normally, it is desirable to make the segments continuous, but it may be discontinuous (for example, when the estimation in a predetermined distance range (d value) is not performed intentionally). In this embodiment, two or more segments are set. The segments may be set with a predetermined width instead of being set at equal intervals. For example, since it is known that the distant region has a low resolution (the road surface resolution is low), it is conceivable to divide the segment into smaller pieces as the distance goes far. Therefore, the number of segments may be determined according to the above.
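The division into segments along the d axis can be sketched as below. The function name `split_segments`, the half-open (d_start, d_end) ranges, and the ordering from the largest d to the smallest are assumptions made for the example; the edge values are illustrative only.

```python
def split_segments(edges):
    """Turn ascending d-axis edges into half-open (d_start, d_end) segments.

    Segments are returned from the largest parallax range to the smallest
    (an assumed ordering for this sketch). Unequal spacing of the edges
    gives segments of unequal width.
    """
    pairs = list(zip(edges[:-1], edges[1:]))
    return pairs[::-1]

# Narrow segments at small d (far away), wider ones at large d (nearby).
segments = split_segments([0, 4, 8, 16, 32])
# A V map stored as v_map[y, d] can then be sliced per segment as v_map[:, lo:hi].
```

Leaving a gap between two consecutive edge pairs, instead of sharing an edge, would give the discontinuous segments that the text also allows.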

例えば、第１の生成部１２６が、図１８の（Ａ）に示す視差画像Ｉｐ２から図１８の（Ｂ）に示すＶマップを生成し、このＶマップが分割部１７１に入力される場合を想定する。図１８の（Ａ）に示す視差画像Ｉｐ２には、路面６００と、軽トラック６０１とが写り込んでいる。この視差画像Ｉｐ２内の軽トラック６０１は、図１８の（Ｂ）に示すＶマップにおいて６０３で示された投票点群（視差点群）に対応している。分割部１７１は、図１８の（Ｂ）に示すＶマップを、所定のｄ座標の範囲で区切られた複数のセグメントに分割する。図１９は、分割部１７１による分割で得られた複数のセグメントの一例を示す図であり、右から順番に、第１セグメントｓｅｇ１、第２セグメントｓｅｇ２、第３セグメントｓｅｇ３、第４セグメントｓｅｇ４、第５セグメントｓｅｇ５、第６セグメントｓｅｇ６、第７セグメントｓｅｇ７と称する。 For example, assume that the first generation unit 126 generates the V map shown in FIG. 18 (B) from the parallax image Ip2 shown in FIG. 18 (A), and that this V map is input to the division unit 171. The parallax image Ip2 shown in FIG. 18 (A) contains a road surface 600 and a light truck 601. The light truck 601 in the parallax image Ip2 corresponds to the voting point group (parallax point group) denoted by 603 in the V map shown in FIG. 18 (B). The division unit 171 divides the V map shown in FIG. 18 (B) into a plurality of segments delimited by predetermined ranges of the d coordinate. FIG. 19 is a diagram showing an example of the segments obtained by the division unit 171; in order from the right, they are referred to as the first segment seg1, the second segment seg2, the third segment seg3, the fourth segment seg4, the fifth segment seg5, the sixth segment seg6, and the seventh segment seg7.

図１７の説明を続ける。推定部１７２は、対応情報を分割した複数のセグメントごとに、路面（基準オブジェクトの一例）の形状を推定する。より具体的には、推定部１７２は、セグメントごとに以下の処理を行う。まず推定部１７２は、処理対象のセグメント（以下、「対象セグメント」と称する場合がある）における視差値ｄの方向（奥行方向）の各座標（以下、「ｄ座標」と称する場合がある）の位置から、所定の個数（例えば１点など）の代表点（以下、「標本点」と称する）を選択する。標本点の選択方法としては、例えば、各ｄ座標に対して、その垂直（縦）方向に存在する視差点のうち、単純に頻度の最も多い視差点（最頻点）を選択してもよく、または、着目するｄ座標とその左右の複数の画素を併せてＶマップの下方向から上方向に上げていき、路面の視差点が含まれ得る領域を制限した上で、その中から最頻点を選択するといったように、より正確に路面の視差点を捉える方法を用いてもよい。または、視差点がない位置（座標）を標本点として選択してもよい。例えば、着目している座標（ｄ，ｙ）には視差点は存在していないが、周囲に頻度が多い視差点が集中している場合、偶発的に座標（ｄ，ｙ）の視差点が欠落している可能性があるため、この抜けている位置を標本点として選択することも可能である。 The description of FIG. 17 will be continued. The estimation unit 172 estimates the shape of the road surface (an example of a reference object) for each of the plurality of segments into which the correspondence information is divided. More specifically, the estimation unit 172 performs the following processing for each segment. First, at each coordinate in the direction of the parallax value d (the depth direction; hereinafter sometimes referred to as a "d coordinate") in the segment being processed (hereinafter sometimes referred to as the "target segment"), the estimation unit 172 selects a predetermined number (for example, one) of representative points (hereinafter referred to as "sample points"). As a method of selecting sample points, for each d coordinate, the parallax point with the highest frequency (the mode point) among the parallax points in the vertical direction may simply be selected. Alternatively, a method that captures the road surface parallax points more accurately may be used: for example, the d coordinate of interest and several pixels to its left and right are scanned together from the bottom of the V map upward to restrict the region that can contain road surface parallax points, and the mode point is selected from within that region. 
Alternatively, a position (coordinate) with no parallax point may be selected as a sample point. For example, if no parallax point exists at the coordinates (d, y) of interest but high-frequency parallax points are concentrated around them, the parallax point at (d, y) may have been dropped by chance, so this missing position can also be selected as a sample point.
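The simplest of these selection methods, picking the most frequent parallax point in each d column, can be sketched as follows. The function name `select_sample_points` and the `v_map[y, d]` layout are assumptions made for the example; the more elaborate variants (restricting the search region, or filling in missing points) are not shown.

```python
import numpy as np

def select_sample_points(v_map, d_start, d_end):
    """Pick one sample point per d coordinate: the row with the highest frequency.

    v_map is indexed as v_map[y, d]; columns without any vote are skipped.
    Returns a list of (d, y) pairs for d in [d_start, d_end).
    """
    samples = []
    for d in range(d_start, d_end):
        column = v_map[:, d]
        if column.max() == 0:
            continue  # no parallax point at this d coordinate
        samples.append((d, int(np.argmax(column))))
    return samples

v_map = np.zeros((6, 5), dtype=np.int32)
v_map[4, 1] = 3   # strongest vote in column d = 1
v_map[2, 1] = 1
v_map[3, 2] = 2   # only vote in column d = 2
samples = select_sample_points(v_map, 0, 5)
```

Note that `np.argmax` returns the first maximum, so ties are resolved toward the smaller row index; a real implementation may prefer a different tie-break.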

また、推定部１７２は、以上のようにして選択した標本点のうち、不適切な標本点を除去してもよい。これによって、後述の標本点群に対する直線近似の際に、不適切な標本点（外れ点）の影響を受けて、路面の推定結果が不適切になってしまうことを抑制することができる。外れ点の除去方法としては、例えば、一旦、対象セグメント内の全ての標本点を使って最小二乗法で直線近似し、近似直線から所定の距離離れた標本点を除去するものとしてもよい。この場合、外れ点を除去した状態で、再度、最小二乗法により推定した結果が最終的な推定結果となる。 Further, the estimation unit 172 may remove an inappropriate sample point from the sample points selected as described above. As a result, it is possible to prevent the road surface estimation result from becoming inappropriate due to the influence of an inappropriate sample point (off point) in the case of linear approximation with respect to the sample point group described later. As a method for removing the outliers, for example, all the sample points in the target segment may be linearly approximated by the least squares method, and the sample points separated from the approximate straight line by a predetermined distance may be removed. In this case, the final estimation result is the result estimated by the least squares method again with the outliers removed.
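The fit, remove, and refit procedure described above can be sketched with an ordinary least-squares line fit. The function name `fit_road_line` and the parameter `max_dist` are assumptions, and the vertical residual is used here as a simplification of the distance from the approximate line.

```python
import numpy as np

def fit_road_line(points, max_dist):
    """Fit y = slope * d + intercept, drop far-away samples, then fit again.

    points: list of (d, y) sample points; max_dist: residual limit in rows.
    Returns (slope, intercept, keep) where keep marks the samples that were retained.
    """
    d = np.array([p[0] for p in points], dtype=float)
    y = np.array([p[1] for p in points], dtype=float)
    slope, intercept = np.polyfit(d, y, 1)            # first pass on all samples
    keep = np.abs(y - (slope * d + intercept)) <= max_dist
    if keep.sum() >= 2 and not keep.all():
        slope, intercept = np.polyfit(d[keep], y[keep], 1)  # second pass without outliers
    return slope, intercept, keep

# Ten samples on y = 2d + 1 plus one outlier at (5, 40).
pts = [(d, 2 * d + 1) for d in range(10)] + [(5, 40)]
slope, intercept, keep = fit_road_line(pts, max_dist=5.0)
```

In this example the first fit is pulled upward by the outlier, but the outlier's residual is still far larger than that of the other samples, so it is removed and the second fit recovers the underlying line.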

推定部１７２は、残った標本点を使って、路面の形状を推定する。路面の形状を推定する方法としては、例えば、最小二乗法等によって標本点群に対して直線近似を行う方法、または、多項式近似等を用いて曲線形状を推定する方法等がある。同時に、後段の成否判定（路面の形状を推定した結果に対する成否判定）に使用するために、これらの手法に基づいた相関係数などの数値尺度を算出しておいても良い。以降の説明では、特に断らない限り、路面の形状推定は直線近似によるものとして説明する。また、路面の形状の推定結果を推定路面と称する場合がある。 The estimation unit 172 estimates the shape of the road surface using the remaining sample points. As a method of estimating the shape of the road surface, for example, there are a method of performing a straight line approximation to a sample point cloud by a least squares method or the like, a method of estimating a curve shape using a polynomial approximation or the like, and the like. At the same time, a numerical scale such as a correlation coefficient based on these methods may be calculated for use in the success / failure judgment of the latter stage (success / failure judgment based on the result of estimating the shape of the road surface). In the following description, unless otherwise specified, the shape estimation of the road surface will be described by linear approximation. Further, the estimation result of the shape of the road surface may be referred to as an estimated road surface.

ここで、例えば図１８の（Ａ）の視差画像Ｉｐ２における台形形状の領域７１１内の視差画素がＶマップ生成時の投票対象である場合を想定する。そして、領域７１１に含まれる２つの領域７１２および領域７１３のうち、領域７１２は路面の視差画素（視差値を有する画素）が存在する領域を表し、領域７１３は路面の視差画素が存在しない領域を表すものとする。したがって、図１８に示す対応関係で、視差画像Ｉｐ２のｙ座標に存在する視差画素はＶマップ上の対応する座標（ｄ，ｙ）に投票されたと仮定すると、領域７１２内の路面に対応する視差画素は路面として投票されるが、領域７１３には路面に対応する視差画素が存在しないので投票されない。また、軽トラック６０１は荷台部分６１２とキャビン部分６１１とでカメラからの距離が異なるため、それぞれに対応する視差画素は、Ｖマップ上の異なるセグメントに投票される。さらに、軽トラック６０１には荷台カバーが存在するため、画像の下から上に向かうにつれて緩やかに距離が変化していくため、Ｖマップ上では路面の視差分布と類似する。このため、推定される路面は、軽トラックに対応する視差（物体視差）の影響を受けてしまい、正解路面（実際の路面）に比べて高い位置に推定されてしまう。 Here, for example, it is assumed that the parallax pixels in the trapezoidal region 711 in the parallax image Ip2 of FIG. 18A are the voting targets at the time of V-map generation. Of the two regions 712 and 713 included in the region 711, the region 712 represents a region in which parallax pixels (pixels having a parallax value) on the road surface exist, and the region 713 is a region in which the parallax pixels on the road surface do not exist. It shall be represented. Therefore, in the correspondence shown in FIG. 18, assuming that the parallax pixels existing at the y-coordinate of the parallax image Ip2 are voted for the corresponding coordinates (d, y) on the V-map, the parallax corresponding to the road surface in the region 712. The pixels are voted as the road surface, but are not voted because there are no parallax pixels corresponding to the road surface in the area 713. Further, since the distance from the camera of the light truck 601 is different between the loading platform portion 612 and the cabin portion 611, the parallax pixels corresponding to each are voted for different segments on the V map. Further, since the light truck 601 has a loading platform cover, the distance gradually changes from the bottom to the top of the image, which is similar to the parallax distribution of the road surface on the V map. 
Therefore, the estimated road surface is affected by the parallax (object parallax) corresponding to the light truck, and is estimated at a higher position than the correct road surface (actual road surface).

そこで、本実施形態では、セグメントごとに、該セグメントにおける推定路面を延長して延長路面を設定し、延長路面よりも下方に存在する視差値ｄの頻度値が一定以上存在する場合には、該延長路面に対応する推定路面は物体視差の影響を受けて引き上がっていると判断し、該当推定路面を棄却する（推定は失敗と判断する）。以下、具体的な内容を説明する。 Therefore, in the present embodiment, the estimated road surface in the segment is extended to set the extended road surface for each segment, and when the frequency value of the parallax value d existing below the extended road surface exists more than a certain level, the said It is judged that the estimated road surface corresponding to the extended road surface is pulled up due to the influence of the object parallax, and the corresponding estimated road surface is rejected (the estimation is judged to be a failure). The specific contents will be described below.

図１７に示す棄却部１７３は、セグメントごとに、推定部１７２により推定された推定路面（路面の形状を示す推定形状）に基づいて設定された所定の形状を基準に、投票された画素の視差値ｄの分布（より具体的には、Ｖマップ上の投票点の分布（視差値ｄの頻度値の分布））が所定の基準に合致する場合は、推定路面を棄却する。上記所定の形状は、推定路面に基づく形状を、隣接するセグメントまで延長させた形状を含む。この例では、推定路面を、そのまま隣接するセグメントまで延長させた形状（以下、「延長路面」と称する）を上記所定の形状としているが、これに限らず、上記所定の形状は、後述のマージン線であってもよい。棄却部１７３は、延長路面よりも下方に存在する視差値ｄの頻度値の分布に応じて、推定路面を棄却するか否かを決定する。より具体的には、棄却部１７３は、延長路面よりも下方に存在する視差値ｄの頻度値が閾値以上の場合、推定路面を棄却する。 The rejection unit 173 shown in FIG. 17 is a parallax of the pixels voted based on a predetermined shape set based on the estimated road surface (estimated shape indicating the shape of the road surface) estimated by the estimation unit 172 for each segment. If the distribution of the value d (more specifically, the distribution of voting points on the V map (distribution of the frequency value of the parallax value d)) meets a predetermined criterion, the estimated road surface is rejected. The predetermined shape includes a shape obtained by extending a shape based on an estimated road surface to an adjacent segment. In this example, the shape obtained by extending the estimated road surface to the adjacent segment as it is (hereinafter, referred to as “extended road surface”) is defined as the above-mentioned predetermined shape, but the above-mentioned predetermined shape is not limited to this, and the above-mentioned predetermined shape is a margin described later. It may be a line. The rejection unit 173 determines whether or not to reject the estimated road surface according to the distribution of the frequency values of the parallax value d existing below the extended road surface. More specifically, the rejection unit 173 rejects the estimated road surface when the frequency value of the parallax value d existing below the extended road surface is equal to or greater than the threshold value.

通常、図２０の第７セグメントｓｅｇ７における推定路面Ｂのように正解路面を推定できている場合、該推定路面Ｂを、隣接する第６セグメントｓｅｇ６まで延長した延長路面Ｂの下方に存在する視差値ｄの頻度値は少量となる（または出現しない）。一方で、第３セグメントｓｅｇ３における推定路面Ａのように物体視差により路面が引き上がってしまっている場合、該推定路面Ａを、隣接する第２セグメントｓｅｇ２まで延長した延長路面Ａの下方には物体視差（荷台６１２に対応する視差）が存在する。したがって、棄却部１７３は、延長路面Ａよりも下方に存在する視差値ｄの頻度値を計測し、視差値ｄの頻度値が閾値以上の場合は、推定路面Ａを棄却する。このような処理をセグメントごとに実行する。なお、推定路面の成否判定として、上記処理に加えて、角度による成否判定や標本点群の分散による成否判定などの異なる成否判定を併せて適用しても構わない。なお、角度による成否判定の一例として、推定路面の実角度が所定値を超えていた場合に該推定路面を棄却する態様などがある。また、標本点群の分散による成否判定の一例として、ばらついた点群から推定された路面はその形状が信頼できないものとして、該推定路面を棄却する態様などがある。 Normally, when the correct road surface is estimated, as with the estimated road surface B in the seventh segment seg7 of FIG. 20, the frequency values of the parallax values d below the extended road surface B, obtained by extending the estimated road surface B to the adjacent sixth segment seg6, are small (or absent). On the other hand, when the road surface has been pulled up by object parallax, as with the estimated road surface A in the third segment seg3, object parallax (the parallax corresponding to the loading platform 612) exists below the extended road surface A, obtained by extending the estimated road surface A to the adjacent second segment seg2. Therefore, the rejection unit 173 measures the frequency values of the parallax values d below the extended road surface A and rejects the estimated road surface A when they are equal to or greater than the threshold value. This processing is executed for each segment. In addition to the above processing, other success/failure determinations, such as one based on angle or one based on the variance of the sample point group, may also be applied to the estimated road surface. 
As an example of success / failure judgment based on the angle, there is a mode in which the estimated road surface is rejected when the actual angle of the estimated road surface exceeds a predetermined value. Further, as an example of success / failure judgment based on the dispersion of the sample point cloud, there is an embodiment in which the estimated road surface is rejected because the shape of the road surface estimated from the scattered point cloud is unreliable.

In practice, the road surface parallax may be scattered (for example, parallax accuracy degrades with distance), so predetermined margin lines, such as the margin lines A and B shown in FIG. 20, may be provided and the area below the margin line may be used as the measurement range. In short, the predetermined shape may be any shape that includes a shape based on the estimated road surface extended to an adjacent segment. The "shape based on the estimated road surface" may be the estimated road surface itself or a margin line. The extended road surface may be extended into a segment nearer than the segment of interest or into a segment farther away, and a road surface extended in both directions may of course be used. The extension length can also be set to a predetermined length: the road surface may be extended by one segment or by more than one segment, and it may also be extended by a predetermined distance rather than in segment units. The range to which this processing is applied may also be limited. For example, considering the risk that an object immediately ahead goes unrecognized if road surface estimation fails at close range and the estimated road surface is pulled up, this processing may be executed only for segments nearer than a predetermined segment. It may equally be executed only for distant segments, so the range of application is arbitrary.

Next, a method of measuring the frequency values of the parallax value d existing below the extended road surface will be described. In the present embodiment, the rejection unit 173 generates a frequency histogram that associates, with each position in the depth direction (each position along the parallax value d direction, that is, the horizontal axis, of the V map), a frequency value obtained by counting the frequency values of the parallax value d existing below the extended road surface. Referring to the frequency histogram, the rejection unit 173 then measures the number of bins, which is the number of depth-direction positions whose corresponding frequency value is equal to or greater than a predetermined value, and rejects the estimated road surface corresponding to the extended road surface when the ratio of the number of bins to the segment length is equal to or greater than a threshold value. Because each coordinate of the V map stores a frequency value of the parallax value d, the frequency histogram may be built either from the sum of the frequency values or from the number of coordinates whose frequency value is equal to or greater than a predetermined value. In more detail, the rejection unit 173 counts the number of frequency histogram coordinates (coordinates along the parallax value d direction) associated with a frequency value equal to or greater than a bin determination threshold, and calculates the ratio of that number of bins to the segment length (the "bin ratio"). When the bin ratio is equal to or greater than a threshold value (referred to as the "ratio threshold"), it is determined that object parallax exists below the extended road surface (or the margin line).

For example, in FIG. 21, the frequency histogram A corresponding to the second segment seg2 has two bins at or above the bin determination threshold. If the ratio threshold is 50%, for example, the ratio of those bins to the segment length (equivalent to three bins) is at or above the ratio threshold, so it is determined that object parallax exists below the extended road surface A, and the estimated road surface A corresponding to the extended road surface A is rejected. In the frequency histogram B corresponding to the sixth segment seg6, on the other hand, no bin exceeds the bin determination threshold, so the bin ratio is below the ratio threshold. It is therefore determined that no object parallax exists below the extended road surface B, and the estimated road surface B corresponding to the extended road surface B is not rejected. The bin determination threshold is used to make the determination robust against noise. If bins with a frequency of 1 or more were counted without a bin determination threshold, the bin ratio would easily exceed the ratio threshold in noisy scenes such as rainy weather. The bin determination threshold may nevertheless be omitted (the bin determination threshold may be 0).
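The histogram-based determination described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes the V map is a list of rows indexed as `vmap[y][d]`, that the image y axis points downward (so "below the road surface" means a larger y), and that the extended road surface is a straight line `y = slope * d + intercept`. The function and parameter names are hypothetical.

```python
def below_histogram(vmap, slope, intercept, d_start, d_end, margin=0.0):
    """For each d in [d_start, d_end), sum the V-map frequencies lying
    below the line y = slope * d + intercept (shifted down by margin)."""
    hist = []
    for d in range(d_start, d_end):
        line_y = slope * d + intercept + margin
        # y grows downward, so "below the road" is y > line_y (assumption).
        hist.append(sum(row[d] for y, row in enumerate(vmap) if y > line_y))
    return hist


def should_reject_by_histogram(vmap, slope, intercept, d_start, d_end,
                               bin_threshold, ratio_threshold, margin=0.0):
    """Reject when the share of bins at or above bin_threshold reaches
    ratio_threshold. bin_threshold is assumed to be at least 1 here."""
    hist = below_histogram(vmap, slope, intercept, d_start, d_end, margin)
    bins_hit = sum(1 for f in hist if f >= bin_threshold)
    bin_ratio = bins_hit / (d_end - d_start)
    return bin_ratio >= ratio_threshold
```

With a three-bin segment in which two bins reach the bin determination threshold, the bin ratio is 2/3, which is at or above a 50% ratio threshold, mirroring the frequency histogram A case.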

As another method, the rejection unit 173 may reject the estimated road surface corresponding to an extended road surface when, within a predetermined region of the V map below the extended road surface (or the margin line), the ratio of the total number of coordinates to which at least a predetermined number of parallax pixels have been voted (coordinates having a frequency value of at least the predetermined number) is equal to or greater than a threshold value. The predetermined number may be 1, or may be larger than 1 as a noise countermeasure. The shape of the predetermined region is arbitrary. As shown in FIG. 22, for example, it may be a trapezoid so as to suitably capture the region below the margin line (or the extended road surface), but it is not limited to this and may be, for example, a rectangle. In the example of FIG. 22, at least half of the coordinates in the measurement region A have at least the predetermined number of parallax pixels voted to them. If the threshold value is 50%, for example, the rejection unit 173 determines that object parallax exists below the margin line A and rejects the estimated road surface A corresponding to the margin line A. In the measurement region B, on the other hand, no coordinate has the predetermined number or more of parallax pixels voted to it, so the rejection unit 173 determines that no object parallax exists below the margin line B and does not reject the estimated road surface B corresponding to the margin line B.
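The region-based alternative can be sketched in the same style. The example below assumes the rectangular variant of the measurement region (the text also allows a trapezoid), a V map indexed as `vmap[y][d]`, and half-open index ranges. All names are hypothetical and chosen only for illustration.

```python
def should_reject_by_region(vmap, d_start, d_end, y_top, y_bottom,
                            ratio_threshold, min_votes=1):
    """Reject when the share of occupied cells inside the rectangle
    [y_top, y_bottom) x [d_start, d_end) reaches ratio_threshold.

    A cell counts as occupied when its frequency is at least min_votes;
    min_votes > 1 acts as a simple noise countermeasure.
    """
    total_cells = (d_end - d_start) * (y_bottom - y_top)
    occupied = sum(
        1
        for y in range(y_top, y_bottom)
        for d in range(d_start, d_end)
        if vmap[y][d] >= min_votes
    )
    return occupied / total_cells >= ratio_threshold
```

The caller is expected to place the rectangle below the margin line (or the extended road surface) of the segment being checked.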

Returning to FIG. 17, the description continues. The interpolation unit 174 is an example of a "setting unit". When the rejection unit 173 rejects the estimated road surface corresponding to a segment, the interpolation unit 174 sets (interpolates) a predetermined road surface (predetermined shape) as the shape of the road surface corresponding to that segment. Examples of the predetermined road surface include a default road surface (default shape), which assumes a flat shape, and a historical road surface (historical shape), which indicates a shape estimated in a past frame. FIG. 23(A) shows a parallax image Ip3 captured while the vehicle is traveling on a flat road surface, and FIG. 23(B) shows the V map generated from the parallax image Ip3. As shown in FIG. 23(B), when the vehicle is traveling on a flat road surface, the estimated road surface ER1 substantially coincides with the default road surface DR, which is the road surface assumed to be flat. The default road surface DR can be calculated in advance from the mounting height and pitch angle of the camera. The historical road surface is a road surface estimated one or more frames earlier, and may be the average of the road surfaces estimated in a predetermined number of past frames.
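The text states only that the default road surface can be derived from the camera mounting height and pitch angle. One way this might be done, assuming a standard rectified pinhole stereo model (an assumption, not something the text specifies), is to use the flat-ground relation d = (B / h) * ((y - cy) * cos(pitch) + f * sin(pitch)) and solve it for y. The sketch below also averages past lines as one possible historical road surface. All names and the sign convention (positive pitch means the camera is tilted downward) are hypothetical.

```python
import math


def default_road_line(focal_px, baseline_m, cam_height_m, pitch_rad, cy_px):
    """Return (slope, intercept) of the flat-road line y = slope * d + intercept
    on the V map, under a rectified pinhole stereo model (assumption)."""
    slope = cam_height_m / (baseline_m * math.cos(pitch_rad))
    intercept = cy_px - focal_px * math.tan(pitch_rad)
    return slope, intercept


def historical_road_line(past_lines):
    """Average the (slope, intercept) pairs estimated in past frames.
    Averaging the coefficients equals averaging the lines point by point."""
    n = len(past_lines)
    mean_slope = sum(s for s, _ in past_lines) / n
    mean_intercept = sum(b for _, b in past_lines) / n
    return mean_slope, mean_intercept
```

Either result could then be assigned to a segment whose estimated road surface was rejected.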

FIG. 24 is a flowchart showing an example of processing by the road surface detection processing unit 122 of the present embodiment. Since the specific content of each step is as described above, detailed description is omitted as appropriate. First, the acquisition unit 125 acquires a parallax image (step S1). The acquisition unit 125 may directly acquire the parallax image generated by the parallax image generation unit 113, or the parallax image may be stored in advance on a recording medium such as a CD, DVD, or HDD, or in network storage, and read when needed. A single parallax image may be acquired, or moving image data may be acquired sequentially frame by frame. It is also possible to construct the V map in advance and input it to the road surface detection processing unit 122. In this case, step S1 and the following step S2 are skipped, and processing starts from step S3.

Next, the first generation unit 126 generates a V map using the parallax image acquired in step S1 (step S2). The specific content is as described above.

Next, the road surface estimation unit 127 (division unit 171) divides the V map generated in step S2 into a plurality of segments (step S3). The specific content is as described above.

The following steps S4 to S7 are repeated as many times as there are segments. Here, steps S4 to S7 are completed for one segment before they are executed for the next segment, but the configuration is not limited to this. For example, each step may be executed for all segments before moving on to the next step; that is, step S4 may be executed for all segments before proceeding to step S5.

In step S4, the road surface estimation unit 127 (estimation unit 172) performs a sample point search for each d coordinate. The number of sample points is not limited to one, and a plurality of sample points may be determined. Since some d coordinates have no parallax in the vertical direction, there may be d coordinates for which no sample point is determined. The specific content is as described above.

In step S5, the road surface estimation unit 127 (estimation unit 172) estimates the shape of the road surface. The specific content is as described above.

In step S6, the road surface estimation unit 127 (rejection unit 173) sets an extended road surface by extending the estimated road surface, and determines whether the frequency value of the parallax value d existing below the extended road surface is equal to or greater than the threshold value. In other words, the road surface estimation unit 127 (rejection unit 173) determines whether to reject the estimated road surface. The specific content is as described above.

If the result of step S6 is negative (step S6: No), the estimated road surface is adopted as it is. If the result of step S6 is affirmative (step S6: Yes), the estimated road surface is rejected, and the road surface estimation unit 127 (interpolation unit 174) sets (interpolates) the above-described predetermined road surface (for example, the default road surface or the historical road surface) as the new estimated road surface for that segment (step S7).
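The per-segment loop over steps S4 to S7 can be summarized in a short sketch. The three callables stand in for the estimation, rejection, and interpolation stages and are placeholders rather than the actual processing of the units described above; the names are hypothetical.

```python
def detect_road_surface(segments, estimate, should_reject, fallback):
    """Run S4-S7 once per segment and return one road line per segment.

    segments      : segment descriptors produced by the division step (S3)
    estimate      : callable(segment) -> road line, standing in for S4 and S5
    should_reject : callable(segment, road_line) -> bool, standing in for S6
    fallback      : callable(segment) -> road line, standing in for S7
                    (for example a default or historical road surface)
    """
    result = []
    for seg in segments:
        road = estimate(seg)              # S4 + S5: sample search and fitting
        if should_reject(seg, road):      # S6: rejection determination
            road = fallback(seg)          # S7: set the predetermined road surface
        result.append(road)
    return result
```

This follows the segment-by-segment order described above; the step-by-step order over all segments would instead map each stage over the whole list in turn.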

The processing is not limited to the above. For example, the above-described processing for removing outlier points may be inserted between step S4 and step S5. In addition, after steps S4 to S7 have been completed for all segments, smoothing processing may be performed to correct the estimated road surfaces so that they connect smoothly between segments. As an example of the smoothing processing, for the estimated road surfaces of two segments, the d coordinate corresponding to the start point of one estimated road surface and the d coordinate corresponding to the end point of the other estimated road surface (when there is no gap between the segments, the end point and the start point refer to the same d coordinate) may be corrected so that both pass through a predetermined y coordinate position. This correction is equivalent to changing the slope and intercept of the estimated road surface on the V map. The smoothing processing ensures the continuity of the estimated road surface across all segments. As the predetermined y coordinate position, for example, the y coordinate of the midpoint between the y coordinate corresponding to the start point and the y coordinate corresponding to the end point may be used. Because smoothing may correct an estimated road surface that is unsuitable in a certain segment, it has the effect of improving the accuracy of road surface estimation. The smoothed estimated road surface is the final result. The smoothing processing may also be performed each time steps S4 to S7 are completed for one segment, between the estimated road surface of that segment and the estimated road surface of the preceding segment. The processing for removing outlier points and the smoothing processing may also be omitted.
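The midpoint-based smoothing can be sketched as follows. The sketch assumes contiguous segments whose boundaries are given as d coordinates and whose road surfaces are straight lines `y = slope * d + intercept`. It also assumes that the two outermost end points keep their original y values, which the text does not specify. The names are hypothetical.

```python
def smooth_segments(bounds, lines):
    """Make per-segment road lines continuous at shared segment boundaries.

    bounds : d coordinates [d0, d1, ..., dn] delimiting n contiguous segments
    lines  : n (slope, intercept) pairs, one per segment
    Each inner boundary is moved to the midpoint of the two y values the
    neighbouring lines give there; each line is then refitted through its
    two (possibly moved) end points.
    """
    n = len(lines)
    joint_y = [lines[0][0] * bounds[0] + lines[0][1]]  # outer start kept as is
    for i in range(1, n):
        d = bounds[i]
        y_left = lines[i - 1][0] * d + lines[i - 1][1]
        y_right = lines[i][0] * d + lines[i][1]
        joint_y.append((y_left + y_right) / 2.0)
    joint_y.append(lines[-1][0] * bounds[-1] + lines[-1][1])  # outer end kept

    smoothed = []
    for i in range(n):
        d_a, d_b = bounds[i], bounds[i + 1]
        slope = (joint_y[i + 1] - joint_y[i]) / (d_b - d_a)
        intercept = joint_y[i] - slope * d_a
        smoothed.append((slope, intercept))
    return smoothed
```

Refitting each line through its end points changes both its slope and its intercept, which matches the remark that the correction amounts to changing the slope and intercept on the V map.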

As described above, in the present embodiment, the V map is divided into a plurality of segments and the road surface is estimated for each segment. Then, for each segment, an extended road surface is set by extending the estimated road surface, and when the frequency values of the parallax value d existing below the extended road surface reach or exceed a certain level, the estimated road surface corresponding to the extended road surface is judged to have been pulled up under the influence of object parallax and is rejected (the estimation is judged to have failed). This prevents object detection from being performed using an estimated road surface that differs from the actual road surface, and as a result, sufficient object detection accuracy can be secured.

(Second embodiment)
Next, the second embodiment will be described. Description of the parts common to the first embodiment is omitted as appropriate. The basic configuration is the same as that of the first embodiment, but in the present embodiment, the rejection unit 173 rejects the estimated road surface when the ratio, relative to the predetermined shape (the extended road surface or the margin line), of the total number of coordinates to which pixels having a parallax value d (pixels of the parallax image) have been voted is less than a threshold value.

Here, assume for example that the parallax pixels in the trapezoidal region 811 of the parallax image Ip4 in FIG. 25(A) are the voting targets when the V map is generated. Of the three regions 812, 813, and 814 included in the region 811, the regions 812 and 814 represent regions in which parallax pixels exist, and the region 813 represents a region in which no parallax pixels exist. When there is little or no road surface parallax and an upright object such as a large truck 820 is present, the estimated road surface may be pulled up by the object parallax. For example, when a large truck 820 is present as in FIG. 25, the parallax corresponding to the truck 820 is distributed in the vertical direction in the corresponding segment on the V map. Normally, if sufficient road surface parallax exists and the road surface can be estimated correctly, the estimated road surface lies on the group of voting points corresponding to the road surface parallax, and the extended road surface therefore also lies on that group of voting points. However, when the estimated road surface has been pulled up with an inappropriate slope by object parallax, the extended road surface is placed at a position off the distribution of coordinates to which pixels having parallax (pixels of the parallax image) have been voted (the voting point group). As shown in FIG. 25(B), the extended road surface B obtained by extending the estimated road surface B lies on the distribution of coordinates to which pixels having parallax have been voted, whereas the extended road surface A of the estimated road surface A, which has been improperly pulled up by object parallax, lies in a region that is not on that distribution.

In the present embodiment, therefore, an extended road surface is set for each segment by extending the estimated road surface, the number of coordinates on the extended road surface to which pixels having parallax have been voted is counted, and it is determined whether the ratio of that count to the length of the extended road surface (the segment length may be used instead) is equal to or greater than a threshold value. If the ratio is equal to or greater than the threshold value, the estimated road surface in the segment of interest is regarded as having correctly picked up the road surface parallax, and the estimation is judged successful (the estimated road surface is not rejected). If the ratio is less than the threshold value, the estimation is judged to have failed, the estimated road surface in the segment of interest is rejected, and the default road surface or the historical road surface is set.

Although the above describes counting the number of coordinate points on the extended road surface (the number of coordinates to which pixels having parallax have been voted), focusing only on points on the line makes the measurement sensitive to slight differences in slope and may prevent correct measurement. A margin line may therefore be provided, and the number of coordinate points in the region between the extended road surface and the margin line may be counted. In this case, the region between the extended road surface and the margin line can be regarded as corresponding to the predetermined shape (the predetermined shape set based on the estimated road surface). The margin line is not limited to the lower side of the extended road surface and may also be provided on the upper side. When there is one margin line, the coordinate points in the region between the margin line and the extended road surface are counted; when there are two margin lines, the coordinate points in the region between the two outermost lines are counted. As for the counting itself, coordinate points with a frequency value of 1 or more may be counted, or frequency values below a predetermined value may be regarded as noise parallax so that only coordinates with a frequency value at or above the predetermined value are counted. The road surface may be extended only into a nearer segment or over a predetermined near distance, or it may be extended into a farther segment or over a predetermined far distance. In other words, the length of the extended road surface can be set arbitrarily. The range to which this processing is applied may also be limited to predetermined segments or to a predetermined distance range (for example, it may be executed only for the second and third segments). In addition to the above processing, other success/failure determinations for the estimated road surface, such as a determination based on angle or a determination based on the variance of the sample point group, may be applied together. Examples of the angle-based and variance-based determinations are as described above.

The flow of processing by the road surface detection processing unit 122 of the present embodiment is the same as the flowchart shown in FIG. 24, except that the determination in step S6 differs from that of the first embodiment. More specifically, the rejection unit 173 sets an extended road surface by extending the estimated road surface in the segment of interest, counts the number of coordinates on the extended road surface to which pixels having parallax have been voted, and calculates the ratio of that count to the length of the extended road surface (the segment length may be used instead). When the calculated ratio is less than the threshold value, the rejection unit 173 determines that the estimated road surface corresponding to the extended road surface (the estimated road surface of the segment of interest) is to be rejected. When the calculated ratio is equal to or greater than the threshold value, it determines that the estimated road surface corresponding to the extended road surface is not to be rejected.
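The step S6 determination of the second embodiment can be sketched as follows. The example assumes a V map indexed as `vmap[y][d]` and a straight extended road surface `y = slope * d + intercept`. As a simplification that the text does not prescribe, it counts each d position at most once: a position is treated as supported when any cell within `margin` rows of the line has at least `min_votes` votes. The names are hypothetical.

```python
def should_reject_by_support(vmap, slope, intercept, d_start, d_end,
                             ratio_threshold, margin=0, min_votes=1):
    """Reject when too few d positions along the extended road surface
    are backed by voted coordinates on (or within margin rows of) the line."""
    height = len(vmap)
    supported = 0
    for d in range(d_start, d_end):
        y_line = round(slope * d + intercept)
        lo = max(0, y_line - margin)
        hi = min(height - 1, y_line + margin)
        # An empty range (line outside the map) simply counts as unsupported.
        if any(vmap[y][d] >= min_votes for y in range(lo, hi + 1)):
            supported += 1
    support_ratio = supported / (d_end - d_start)
    return support_ratio < ratio_threshold
```

Note the direction of the comparison: unlike the first embodiment, a low ratio leads to rejection, because a correctly estimated road surface is expected to lie on the voted road surface points.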

As described above, in the present embodiment, the V map is divided into a plurality of segments and the road surface is estimated for each segment. Then, for each segment, an extended road surface is set by extending the estimated road surface, and when the ratio, relative to the extended road surface on the V map, of the total number of coordinates to which pixels having a parallax value d have been voted is less than the threshold value, the estimated road surface corresponding to the extended road surface is judged to have been pulled up under the influence of object parallax and is rejected (the estimation is judged to have failed). This prevents object detection from being performed using an estimated road surface that differs from the actual road surface, and as a result, sufficient object detection accuracy can be secured.

The first and second embodiments described above may also be used in combination. For example, the rejection unit 173 may be able to switch, for each segment, between the rejection determination of the first embodiment and that of the second embodiment. For example, when the position of the estimated road surface in the segment of interest is higher than the default road surface or the historical road surface by a predetermined value or more, the rejection unit 173 may perform the rejection determination described in the second embodiment for that segment, and when the height difference is less than the predetermined value, it may perform the rejection determination described in the first embodiment. As another example, when the absolute value of the slope of the extended road surface set by extending the estimated road surface of the segment of interest is equal to or greater than a predetermined value (that is, the slope is steep), the rejection unit 173 may perform the rejection determination described in the second embodiment for that segment, and when the absolute value of the slope is less than the predetermined value, it may perform the rejection determination described in the first embodiment.
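The two switching rules described above can be sketched as small selector functions. The sketch assumes that y grows downward on the V map, so an estimated road surface that is "higher" than the reference has a smaller y at the same d. The return labels and all names are hypothetical.

```python
def choose_by_height(estimated_y, reference_y, height_limit):
    """Pick the rejection test from how far the estimated road surface sits
    above the reference (default or historical) road surface at one d."""
    rise = reference_y - estimated_y  # positive when the estimate is higher
    return "second_embodiment" if rise >= height_limit else "first_embodiment"


def choose_by_slope(extended_slope, slope_limit):
    """Pick the rejection test from the steepness of the extended road surface."""
    if abs(extended_slope) >= slope_limit:
        return "second_embodiment"
    return "first_embodiment"
```

The returned label would then select between the frequency-below-the-line test and the support-ratio test for the segment of interest.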

Although embodiments of the present invention have been described above, the present invention is not limited to those embodiments as they stand, and at the implementation stage the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiments. For example, some constituent elements may be removed from the full set of constituent elements shown in the embodiments.

The program executed by the mobile body control system 100 of the above-described embodiments may be provided as a file in an installable or executable format recorded on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, a DVD (Digital Versatile Disk), or a USB (Universal Serial Bus) device, or may be provided or distributed via a network such as the Internet. The various programs may also be provided by being incorporated in advance in a ROM or the like.

1A First camera unit
1B Second camera unit
5 Lens
6 Image sensor
7 Sensor controller
10 Data bus line
11 Serial bus line
15 CPU
16 FPGA
17 ROM
18 RAM
19 Serial IF
20 Data IF
100 Mobile control system
101 Vehicle
102 Imaging unit
103 Analysis unit
104 Control unit
105 Display unit
106 Windshield
111 Preprocessing unit
112 Parallelized image generation unit
113 Parallax image generation unit
114 Object detection processing unit
122 Road surface detection processing unit
123 Clustering processing unit
124 Tracking processing unit
125 Acquisition unit
126 First generation unit
127 Road surface estimation unit
130 Second generation unit
140 Isolated area detection processing unit
150 Parallax image processing unit
160 Rejection processing unit
171 Dividing unit
172 Estimation unit
173 Rejection unit
174 Interpolation unit

Japanese Unexamined Patent Publication No. 2011-128844

Claims

An information processing device comprising:
an acquisition unit that acquires a distance image having distance information for each pixel;
a generation unit that generates, based on a plurality of pixels included in the distance image, correspondence information in which a position in the vertical direction and a position in the depth direction are associated with each other;
an estimation unit that estimates, for each of a plurality of segments into which the correspondence information is divided, the shape of a reference object that serves as the reference for the height of an object; and
a rejection unit that, for each segment, based on an extension shape obtained by extending a shape based on an estimated shape, which indicates the shape of the reference object estimated by the estimation unit, to an adjacent segment, rejects the estimated shape according to the distribution of the distance information existing below the extension shape or according to the ratio of the total number of coordinates voted by pixels having the distance information to the extension shape.

The information processing device according to claim 1, wherein the shape based on the estimated shape is the estimated shape itself.

The information processing device according to claim 1, wherein the shape based on the estimated shape is a margin line of the estimated shape.

The information processing device according to claim 1, wherein the rejection unit rejects the estimated shape when a frequency value of the distance information existing below the extension shape is equal to or greater than a threshold value.

The information processing device according to claim 4, wherein the rejection unit generates a frequency histogram in which each position in the depth direction is associated with a frequency value obtained by counting the distance information existing below the predetermined shape, counts the number of bins, that is, the number of positions in the depth direction whose frequency value is equal to or greater than a predetermined value, and rejects the estimated shape when the ratio of the number of bins to the length of the segment is equal to or greater than a threshold value.

The information processing device according to claim 4, wherein the rejection unit rejects the estimated shape when the ratio of the total number of coordinates voted by pixels having the distance information to a predetermined area below the extension shape in the correspondence information is equal to or greater than a threshold value.

The information processing device according to claim 1, wherein the rejection unit rejects the estimated shape when the ratio of the total number of coordinates voted by pixels having the distance information to the extension shape is less than a threshold value.

The information processing device according to any one of claims 1 to 7, further comprising a setting unit that, when the estimated shape corresponding to a segment is rejected by the rejection unit, sets a predetermined shape as the shape of the reference object corresponding to that segment.

The information processing device according to claim 8, wherein the predetermined shape includes a default shape that is assumed to be a flat shape, or a historical shape that indicates a shape estimated in a past frame.

An imaging device comprising:
an imaging unit that captures stereo images;
a distance image generation unit that generates, from the stereo images captured by the imaging unit, a distance image having distance information for each pixel;
a generation unit that generates, based on a plurality of pixels included in the distance image, correspondence information in which a position in the vertical direction and a position in the depth direction are associated with each other;
an estimation unit that estimates, for each of a plurality of segments into which the correspondence information is divided, the shape of a reference object that serves as the reference for the height of an object; and
a rejection unit that, for each segment, based on an extension shape obtained by extending a shape based on an estimated shape, which indicates the shape of the reference object estimated by the estimation unit, to an adjacent segment, rejects the estimated shape according to the distribution of the distance information existing below the extension shape or according to the ratio of the total number of coordinates voted by pixels having the distance information to the extension shape.

A device control system comprising an imaging device and a control unit that controls a device based on an output result of the imaging device, wherein the imaging device comprises:
an imaging unit that captures stereo images;
a distance image generation unit that generates, from the stereo images captured by the imaging unit, a distance image having distance information for each pixel;
a generation unit that generates, based on a plurality of pixels included in the distance image, correspondence information in which a position in the vertical direction and a position in the depth direction are associated with each other;
an estimation unit that estimates, for each of a plurality of segments into which the correspondence information is divided, the shape of a reference object that serves as the reference for the height of an object; and
a rejection unit that, for each of the segments, based on an extension shape obtained by extending a shape based on an estimated shape, which indicates the shape of the reference object estimated by the estimation unit, to an adjacent segment, rejects the estimated shape according to the distribution of the distance information existing below a predetermined shape or according to the ratio of the total number of coordinates voted by pixels having the distance information to the extension shape.

An information processing method comprising:
an acquisition step of acquiring a distance image having distance information for each pixel;
a generation step of generating, based on a plurality of pixels included in the distance image, correspondence information in which a vertical position and a depth position are associated with each other;
an estimation step of estimating, for each of a plurality of segments obtained by dividing the correspondence information, the shape of a reference object that serves as the reference for the height of an object; and
a rejection step of, for each of the segments, based on an extension shape obtained by extending a shape based on an estimated shape, which indicates the shape of the reference object estimated in the estimation step, to an adjacent segment, rejecting the estimated shape according to the distribution of the distance information existing below a predetermined shape or according to the ratio of the total number of coordinates voted by pixels having the distance information to the extension shape.

A program that causes a computer to execute:
an acquisition step of acquiring a distance image having distance information for each pixel;
a generation step of voting a plurality of pixels included in the distance image to generate correspondence information in which a vertical position and a depth position are associated with each other;
an estimation step of estimating, for each of a plurality of segments obtained by dividing the correspondence information, the shape of a reference object that serves as the reference for the height of an object; and
a rejection step of, for each segment, based on an extension shape obtained by extending a shape based on an estimated shape, which indicates the shape of the reference object estimated in the estimation step, to an adjacent segment, rejecting the estimated shape according to the distribution of the distance information existing below the extension shape or according to the ratio of the total number of coordinates voted by pixels having the distance information to the extension shape.
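
As an informal illustration only, and not part of the claims, two of the rejection criteria recited above can be sketched in Python: the bin-ratio test on the frequency histogram, and the test on the ratio of voted coordinates to the extension shape. The function names, the argument layout, and the assumption that a segment contributes exactly one histogram bin per depth position are hypothetical choices made for this sketch. All thresholds are caller-supplied because the claims leave them unspecified.

```python
from typing import Sequence


def reject_by_bins_below(counts_below: Sequence[int],
                         min_count: int,
                         ratio_threshold: float) -> bool:
    """Bin-ratio test.

    counts_below[i] is the number of distance values found below the shape at
    depth position i of the segment (assumed: one bin per depth position, so
    len(counts_below) stands in for the segment length).
    """
    if not counts_below:
        return False  # assumption: an empty segment gives no evidence to reject
    # Count the depth positions whose frequency value reaches min_count.
    bins = sum(1 for c in counts_below if c >= min_count)
    return bins / len(counts_below) >= ratio_threshold


def reject_by_coverage(voted_count: int,
                       shape_length: int,
                       ratio_threshold: float) -> bool:
    """Coverage test.

    voted_count is the number of coordinates on the extension shape that
    received votes; shape_length is the number of coordinates the extension
    shape spans. The estimate is rejected when too few coordinates are voted.
    """
    if shape_length <= 0:
        raise ValueError("shape_length must be positive")
    return voted_count / shape_length < ratio_threshold
```

For example, with `counts_below = [0, 5, 6, 0]`, `min_count = 3`, and `ratio_threshold = 0.5`, two of the four bins qualify, the ratio is 0.5, and the estimated shape is rejected.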