JP2023521765A

JP2023521765A - Method and system for providing depth maps with confidence estimates

Info

Publication number: JP2023521765A
Application number: JP2022561423A
Authority: JP
Inventors: オールデンジャン、リーフ; ルイスブルス、カルステン
Original assignee: Nodar Inc
Current assignee: Nodar Inc
Priority date: 2021-01-06
Filing date: 2021-12-22
Publication date: 2023-05-25
Anticipated expiration: 2041-12-22
Also published as: KR20220167794A; KR102668282B1; JP7525936B2; KR20230142640A; WO2022150196A1; EP4275194A1; KR102583256B1

Abstract

An automated vehicle assistance system is provided for supervised or unsupervised vehicle movement. The system includes a control system and a first sensor system. The first sensor system may receive first image data of a scene and may output a first disparity map and a first confidence map based on the first image data. The control system may output a video stream based on the first disparity map and the first confidence map. The vehicle assistance system may also include a second sensor system that receives second image data of at least a portion of the scene and outputs a second confidence map based on the second image data. The video stream may include superframes, each superframe including a 2D image of the scene, a depth map corresponding to the 2D image, and a certainty map corresponding to the depth map.

Description

The present technology relates to stereo vision systems. In particular, it relates to a stereo vision system (e.g., a stereo camera system) that captures images of a scene and outputs two-dimensional ("2D") image data, depth-estimate data relating to distances to objects in the scene, and confidence data relating to the level of certainty of the depth-estimate data.

Stereo vision systems typically use two cameras (e.g., a left camera and a right camera) to estimate distance by measuring the disparity, or parallax, between matched pixels in the image captured by the left camera and the image captured by the right camera. For example, Patent Document 1 (U.S. Pat. No. 8,208,716) discloses a stereo vision system and method that output a disparity map: a 2D matrix containing pixel-shift data corresponding to a rectified image captured by one of the cameras (often the left camera). Because the depth of a pixel is inversely proportional to its disparity, an estimate of the depth for each pixel of the image, corresponding to the distance from the camera to the object imaged at that pixel, can easily be computed from the disparity map. For this reason, the terms "depth map" and "disparity map" may be used interchangeably herein, since they provide nearly identical information about the scene of an image.
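
The inverse relation between depth and disparity can be illustrated with a minimal sketch. It assumes the standard rectified-stereo relation Z = f * B / d, which the text above does not state explicitly; the function name, the focal length, and the baseline are hypothetical values chosen for illustration only.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a 2D disparity map (pixels) to a depth map (metres).

    Assumes the rectified-stereo relation Z = f * B / d.
    Pixels with zero or negative disparity have no valid match and
    are mapped to None.
    """
    depth = []
    for row in disparity_px:
        depth.append([
            focal_length_px * baseline_m / d if d > 0 else None
            for d in row
        ])
    return depth


# Hypothetical camera: 1000 px focal length, 0.5 m baseline.
disparity_map = [[50.0, 25.0], [10.0, 0.0]]
depth_map = disparity_to_depth(disparity_map, focal_length_px=1000.0, baseline_m=0.5)
```

Halving the disparity doubles the estimated depth, which is why a small disparity error at long range translates into a large depth error.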

A problem with existing stereo vision systems is that depth estimates are typically provided without any indication of how reliable (or unreliable) the estimates are, whether on a pixel-by-pixel or even a frame-by-frame basis. In conventional driver assistance systems, automated decisions to stop or steer a vehicle may be made based on information from multiple sensors (e.g., radar, lidar, cameras, etc.) without knowledge of the reliability of the information from any of the sensors. This may result in the driver assistance system overcompensating for the uncertainty of conflicting sensor information by performing unnecessary procedures (e.g., driving very slowly, braking too frequently, etc.), which may reduce passenger comfort, or in the driver assistance system undercompensating for that uncertainty by arbitrarily choosing one sensor over another, which may increase passenger risk. As will be appreciated, decisions made by a driver assistance system can greatly affect passenger safety and may at times be critical, life-or-death decisions. For example, when a vehicle is traveling at a cruising speed typical of highway driving, accurate knowledge of the distance to an object may be essential to how the driver assistance system controls the vehicle so as to avoid colliding with the object while maintaining, or nearly maintaining, the cruising speed.

Patent Document 1: U.S. Pat. No. 8,208,716

Autonomous vehicles and advanced driver assistance systems may use information from various types of sensors (e.g., lidar, radar, cameras, ultrasound, stereo cameras, etc.) to acquire information about the vehicle's surroundings, so that the vehicle's control system can decide how to steer the vehicle, how to adjust its speed (e.g., how to accelerate or decelerate), whether to deploy safety measures (e.g., turn on hazard warning lights), and so on. As will be appreciated, a vehicle's control system may include dozens of electronic control modules or units ("ECUs"), and in some cases more than 100 ECUs, each of which controls one aspect of vehicle operation (e.g., a speed control ECU, a brake control ECU, a transmission control ECU, an engine control ECU, a battery management ECU, etc.).

The inventors have recognized and appreciated that, to best fuse or combine data from different types of sensors, it is important to know the level of confidence or certainty of the measurements from each sensor. In some aspects of the present technology, high-resolution depth information indicating distances to features appearing in a pair of images may be determined through stereo matching of the image pair. Stereo matching may be performed so as to provide distance-certainty information based on any one or any combination of disparity features, prior images, cost curves, and local characteristics. The distance-certainty information may be used to minimize or eliminate problems in sensor fusion by allowing the reliability of the depth information to be rated, which may in turn increase the safety of decisions made using the depth information.
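
To make concrete why per-sensor confidence helps fusion, the following is a minimal sketch of one possible fusion rule: a confidence-weighted average of depth estimates for a single pixel. The weighting rule, the function name `fuse_depths`, and the sample values are assumptions for illustration; the text above does not prescribe a particular fusion method.

```python
def fuse_depths(estimates):
    """Fuse per-sensor depth estimates for one pixel.

    `estimates` is a list of (depth_m, confidence) pairs, with each
    confidence in [0, 1]. Returns the confidence-weighted mean depth,
    or None when no sensor reports any confidence.
    """
    total = sum(conf for _, conf in estimates)
    if total <= 0.0:
        return None
    return sum(depth * conf for depth, conf in estimates) / total


# Hypothetical readings: stereo is trusted more than lidar at this pixel.
fused = fuse_depths([(40.0, 0.75), (44.0, 0.25)])
```

With these weights the fused depth lands closer to the more trusted stereo reading, rather than being chosen arbitrarily from one sensor.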

According to an aspect of the present technology, an automated vehicle assistance system for supervised or unsupervised vehicle movement is provided. The system may comprise a vehicle control system, comprising a computer processor and a memory coupled to the computer processor, and a first sensor system configured to receive first image data of a scene and to output a first disparity map and a first confidence map based on the first image data. The vehicle control system may be configured to receive the first disparity map and the first confidence map from the first sensor system and to output a video stream comprising the first disparity map and the first confidence map.

In some embodiments of this aspect, the first confidence map may be encoded in the video stream so as to be part of the first disparity map. In some embodiments, the disparity map may comprise disparity data for each pixel, and the confidence map may comprise confidence data for each pixel. In some embodiments, the first image data may comprise data for left and right two-dimensional (2D) first images; the first sensor system may be configured to generate, from the first image data, left and right rectified 2D first images and a first cost volume map, and to generate the first confidence map from the rectified 2D first images, the first disparity map, and the first cost volume map. In some embodiments, the first sensor system may be configured to generate the first confidence map based on one or both of a uniqueness value determined from a semi-global matching (SGM) algorithm and an image texture metric determined from a Sobel operation on the first image data.
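
The two confidence cues named above can be sketched as follows. This is an illustrative approximation only: the uniqueness score (one minus the ratio of the lowest to the second-lowest matching cost), the product used to combine the cues, and the `texture_scale` parameter are assumptions, not the method defined in this document. A practical SGM implementation would also typically exclude the immediate neighbors of the best disparity when searching for the second-lowest cost.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def uniqueness(cost_curve):
    """Score in [0, 1] for non-negative costs: 1 - lowest / second-lowest."""
    best, second = sorted(cost_curve)[:2]
    if second <= 0:
        return 0.0
    return 1.0 - best / second


def sobel_texture(image, r, c):
    """Sobel gradient magnitude at interior pixel (r, c) of a grayscale image."""
    gx = gy = 0.0
    for i in range(3):
        for j in range(3):
            p = image[r + i - 1][c + j - 1]
            gx += SOBEL_X[i][j] * p
            gy += SOBEL_Y[i][j] * p
    return math.hypot(gx, gy)


def pixel_confidence(cost_curve, image, r, c, texture_scale=255.0):
    """Combine uniqueness and normalized texture into a value in [0, 1]."""
    texture = min(sobel_texture(image, r, c) / texture_scale, 1.0)
    return uniqueness(cost_curve) * texture


# Hypothetical data: a clear cost minimum at a vertical edge, then a flat patch.
costs = [40, 10, 20, 50]
edge = [[0, 0, 100], [0, 0, 100], [0, 0, 100]]
flat = [[7, 7, 7], [7, 7, 7], [7, 7, 7]]
edge_conf = pixel_confidence(costs, edge, 1, 1)
flat_conf = pixel_confidence(costs, flat, 1, 1)
```

A textureless patch yields zero confidence here regardless of the cost curve, reflecting the intuition that stereo matches in flat regions are unreliable.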

In some embodiments of this aspect, the vehicle assistance system may further comprise a second sensor system configured to receive second image data of at least a portion of the scene and to output a second confidence map based on the second image data. The vehicle control system may be configured to receive the second confidence map from the second sensor system and to output the video stream as a sequence of superframes, each superframe comprising information based on the first disparity map, the first confidence map, and the second confidence map. In some embodiments, the vehicle control system may be configured to output a control signal to an electronic control unit (ECU) of the vehicle based on information in the video stream. In some embodiments, the first sensor system may be a first sensor module configured to process the first image data to generate the first disparity map and the first confidence map, and the second sensor system may be a second sensor module configured to process the second image data to generate the second confidence map; the first and second sensor modules may be stored in the memory, and the computer processor may be configured to execute them. In some embodiments, the video stream may comprise at least one superframe comprising the first disparity map and the first confidence map, and at least one superframe comprising the first disparity map and the second confidence map. In some embodiments, the video stream may comprise at least one superframe comprising a portion of the first confidence map and a portion of the second confidence map. In some embodiments, the first image data may comprise stereo vision data and the second image data may comprise lidar data.

In some embodiments of this aspect, the vehicle assistance system may further comprise a third sensor system configured to receive third image data of at least a portion of the scene and to output a third confidence map based on the third image data. The third image data comprises radar data or acoustic data.

In some embodiments of this aspect, each superframe of the video stream may comprise a two-dimensional (2D) image of the scene, a depth map of the scene, and a certainty map of the scene. In some embodiments, the certainty map of the scene may comprise the first confidence map, the second confidence map, or a combination of the first and second confidence maps. In some embodiments, the depth map of the scene may comprise the first disparity map modulated with image data corresponding to the 2D image of the scene, and the certainty map of the scene may comprise the first confidence map, the second confidence map, or a combination of the two, modulated with image data corresponding to the 2D image of the scene. In some embodiments, the pixels of the 2D image, the pixels of the depth map, and the pixels of the certainty map may be matched temporally and spatially. In some embodiments, the vehicle control system may be configured to encode disparity information from the first disparity map and confidence information from the first and second confidence maps so as to reduce the data size of the video stream.

In some embodiments of this aspect, the vehicle assistance system may further comprise a pair of cameras configured to be mounted on a vehicle. The cameras may be configured to provide the first image data to the first sensor system.

In some embodiments of this aspect, the video stream may comprise two-dimensional (2D) color images, each 2D color image comprising a plurality of pixels, with the alpha-channel transparency of each pixel proportional to the confidence value for that pixel. In some embodiments, the colors of the 2D color images may indicate depth ranges.
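
The color and alpha encoding described above might look like the following sketch. It is one interpretation under stated assumptions: the alpha value is taken to scale with confidence so that low-confidence pixels fade out, and the red-to-blue color ramp, the `max_depth_m` limit, and the function name are hypothetical.

```python
def depth_confidence_to_rgba(depth_m, confidence, max_depth_m=100.0):
    """Map one pixel's depth and confidence to an (R, G, B, A) tuple of 0-255 ints.

    Near pixels are red and far pixels are blue (hypothetical color ramp).
    Alpha scales with confidence in [0, 1]; alpha 0 is fully transparent.
    """
    t = min(max(depth_m / max_depth_m, 0.0), 1.0)
    red = round(255 * (1.0 - t))
    blue = round(255 * t)
    alpha = round(255 * min(max(confidence, 0.0), 1.0))
    return (red, 0, blue, alpha)


near_sure = depth_confidence_to_rgba(0.0, 1.0)
far_unsure = depth_confidence_to_rgba(100.0, 0.0)
```

Rendering such pixels over the 2D camera image lets a viewer see depth as color only where the estimate is trusted.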

According to another aspect of the present technology, a non-transitory computer-readable storage medium is provided, storing code that, when executed by a computer processor, causes the computer processor to perform a method of an automated vehicle assistance system for supervised or unsupervised vehicle movement. The method may comprise: obtaining, by the computer processor, a first disparity map and a first confidence map, the first disparity map and the first confidence map corresponding to first image data of a scene; and outputting, by the computer processor, a video stream comprising the first disparity map and the first confidence map.

In some embodiments of this aspect, outputting the video stream may comprise the computer processor encoding the first confidence map so that it becomes part of the first disparity map. In some embodiments, the first image data may comprise a plurality of pixels, the disparity map may comprise disparity data for each of the pixels, and the confidence map may comprise confidence data for each of the pixels. In some embodiments, the method may further comprise: obtaining, by the computer processor, a second confidence map corresponding to second image data of at least a portion of the scene; and outputting, by the computer processor, the video stream as a sequence of superframes, each superframe comprising information based on the first disparity map, the first confidence map, and the second confidence map. In some embodiments, the method may further comprise the computer processor outputting a control signal to an electronic control unit (ECU) of the vehicle based on information in the video stream. In some embodiments, the method may further comprise the computer processor processing the first image data to obtain the first disparity map and the first confidence map, and processing the second image data to obtain the second confidence map.
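
Encoding a confidence map as part of a disparity map could be done in many ways. The sketch below shows one hypothetical bit layout, which is an assumption and not a format given in this document: a 12-bit integer disparity in the upper bits of a 16-bit word and a 4-bit quantized confidence in the lower bits.

```python
CONF_BITS = 4
CONF_MAX = (1 << CONF_BITS) - 1  # 15 quantization levels above zero


def pack_pixel(disparity, confidence):
    """Pack an integer disparity (0..4095) and a confidence in [0, 1] into 16 bits."""
    if not 0 <= disparity <= 0xFFF:
        raise ValueError("disparity out of 12-bit range")
    level = round(min(max(confidence, 0.0), 1.0) * CONF_MAX)
    return (disparity << CONF_BITS) | level


def unpack_pixel(word):
    """Recover (disparity, confidence) from a packed 16-bit word."""
    return word >> CONF_BITS, (word & CONF_MAX) / CONF_MAX


packed = pack_pixel(100, 1.0)
disparity, confidence = unpack_pixel(packed)
```

Packing both values into one word keeps the disparity and its confidence in the same pixel of the same frame, at the cost of quantizing the confidence to 16 levels.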

In some embodiments of this aspect, outputting the video stream may comprise the computer processor preparing at least one superframe to comprise the first disparity map and the first confidence map, and preparing at least one superframe to comprise the first disparity map and the second confidence map. In some embodiments, outputting the video stream may further comprise the computer processor preparing at least one superframe comprising a portion of the first confidence map and a portion of the second confidence map. In some embodiments, the first image data may comprise stereo vision data, and the second image data may comprise lidar data, radar data, or acoustic data.

In some embodiments of this aspect, outputting the video stream may comprise the computer processor preparing each superframe of the video stream to comprise a two-dimensional (2D) image of the scene, a depth map of the scene, and a certainty map of the scene. In some embodiments, preparing each superframe by the computer processor may comprise temporally and spatially matching the pixels of the 2D image of the scene, the pixels of the depth map of the scene, and the pixels of the certainty map of the scene. In some embodiments, outputting the video stream may comprise the computer processor encoding disparity information from the first disparity map and confidence information from the first and second confidence maps so as to reduce the data size of the video stream. In some embodiments, outputting the video stream may comprise preparing two-dimensional (2D) color images such that each 2D color image comprises a plurality of pixels, the alpha-channel transparency of each pixel is proportional to the confidence value for that pixel, and the colors of the 2D color image indicate depth ranges.

According to another aspect of the present technology, a stereo vision system is provided. The system may comprise a stereo camera system configured to capture a sequence of image pairs, each pair comprising a first image and a second image captured simultaneously, and a computer processor programmed to receive a stream of image data from the stereo camera system, the image data corresponding to the sequence of image pairs. The computer processor may also be programmed, for each of the image pairs, to rectify the first image and the second image to generate a two-dimensional (2D) pixel map of matched pixels, to determine a depth value for each pixel of the pixel map, and to determine a confidence value for the depth value for each pixel of the pixel map. The computer processor may also be programmed to issue a control signal when at least one of the confidence values indicates an image anomaly.

In some embodiments of this aspect, the image anomaly may correspond to one or more pixels of a portion of the confidence map having confidence values below a predetermined threshold. In some embodiments, the image anomaly may correspond to one or more pixels of a portion of the confidence map having confidence values below a predetermined threshold for two or more consecutive image pairs of the sequence. In some embodiments, the image anomaly may comprise a plurality of pixels in a contiguous region of the confidence map. In some embodiments, the control signal may be configured to cause an audible sound, which may be a pre-recorded message. In some embodiments, the control signal may be issued to an engine control module of a vehicle. In some embodiments, for each pixel of the pixel map, the confidence value may be determined based on the presence or absence of an edge at the pixel, the illumination level of the pixel, and texture values of the first and second images from which the pixel map was generated.
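
The anomaly criteria above (a threshold, persistence over consecutive image pairs, and a contiguous region) can be combined in a short sketch. The threshold of 0.3, the minimum region size of 3 pixels, the use of 4-connectivity, and all function names are assumptions for illustration rather than values from this document.

```python
def low_confidence_mask(conf_maps, threshold):
    """True where confidence is below threshold in every map of the sequence."""
    rows, cols = len(conf_maps[0]), len(conf_maps[0][0])
    return [[all(m[r][c] < threshold for m in conf_maps) for c in range(cols)]
            for r in range(rows)]


def largest_region(mask):
    """Size of the largest 4-connected region of True pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            stack, size = [(r, c)], 0
            seen[r][c] = True
            while stack:  # iterative flood fill
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            best = max(best, size)
    return best


def has_anomaly(conf_maps, threshold=0.3, min_pixels=3):
    """Flag a persistent, contiguous low-confidence region."""
    return largest_region(low_confidence_mask(conf_maps, threshold)) >= min_pixels


# Hypothetical confidence maps for consecutive image pairs.
frame_a = [[0.9, 0.1, 0.1], [0.9, 0.1, 0.9], [0.9, 0.9, 0.9]]
frame_b = [[0.9, 0.2, 0.1], [0.9, 0.2, 0.9], [0.1, 0.9, 0.9]]
frame_ok = [[0.9, 0.9, 0.9], [0.9, 0.9, 0.9], [0.9, 0.9, 0.9]]
persistent = has_anomaly([frame_a, frame_b])
transient = has_anomaly([frame_a, frame_ok])
```

Requiring persistence across frames filters out momentary dips in confidence, so a control signal would be raised only for a region that stays unreliable, such as a blocked or dirty part of a lens.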

In some embodiments of this aspect, the computer processor may be programmed to output a sequence of superframes corresponding to the sequence of image pairs, each superframe comprising a 2D image and a confidence map corresponding to the 2D image. In some embodiments, the 2D image may be the first image or the second image. In some embodiments, the computer processor may be programmed to output the sequence of superframes as a display signal that causes a display to show the 2D image and a visible confidence indicator corresponding to the confidence map. The display signal may cause the confidence indicator to be displayed pixel by pixel as the transparency of each pixel of the 2D image. In some embodiments, each superframe may comprise the 2D image, the confidence map, and a disparity map corresponding to the 2D image.

According to another aspect of the present technology, a non-transitory computer-readable storage medium is provided, storing code that, when executed by a computer processor, causes the computer processor to perform a method of a stereo vision system. The method may comprise: receiving, by the computer processor, a stream of image data from a stereo camera system, the image data corresponding to a sequence of image pairs, each pair comprising a first image and a second image captured simultaneously; for each of the image pairs, rectifying, by the computer processor, the first image and the second image to generate a two-dimensional (2D) pixel map of matched pixels, determining a depth value for each pixel of the pixel map, and determining a confidence value for the depth value for each pixel of the pixel map; and issuing, by the computer processor, a control signal when at least one of the confidence maps indicates an image anomaly.

In some embodiments of this aspect, the image anomaly may correspond to one or more pixels of a portion of the confidence map having confidence values below a predetermined threshold. In some embodiments, the image anomaly may correspond to one or more pixels of a portion of the confidence map having confidence values below a predetermined threshold for two or more consecutive image pairs of the sequence. In some embodiments, the image anomaly may comprise a plurality of pixels in a contiguous region of the confidence map.

In some embodiments of this aspect, the control signal may be configured to cause an audible sound. For example, the audible sound may be a pre-recorded message. In some embodiments, the control signal may be issued to an engine control module of a vehicle. In some embodiments, for each pixel of the pixel map, the confidence value may be determined based on the presence or absence of an edge at the pixel, the illumination level of the pixel, and texture values of the first and second images from which the pixel map was generated.

In some embodiments of this aspect, the method may further comprise the computer processor outputting a sequence of superframes corresponding to the sequence of image pairs, each superframe comprising a 2D image, a disparity map corresponding to the 2D image, and a confidence map corresponding to the 2D image. In some embodiments, the 2D image may be the first image or the second image. In some embodiments, outputting the sequence of superframes may comprise outputting a display signal that causes a display to show the 2D image and a visible confidence indicator corresponding to the confidence map. In some embodiments, the display signal may cause the confidence indicator to be displayed pixel by pixel as the transparency of each pixel of the 2D image. In some embodiments, each superframe may comprise the 2D image, the confidence map, and a disparity map corresponding to the 2D image.

The features described above may be used separately or together in any combination in any of the embodiments discussed herein.
Various aspects and embodiments of the technology disclosed herein are described below with reference to the accompanying figures. It should be appreciated that the figures are not necessarily drawn to scale. Items appearing in multiple figures may be indicated by the same reference numeral. For clarity, not every component may be labeled in every figure. At least one of the accompanying figures is produced at least partly in color, as indicated below.

A block diagram of a stereo vision system, according to some embodiments of the present technology.
A diagram showing an arrangement of sensors and electronic control units on a vehicle, according to some embodiments of the present technology.
A block diagram of a stereo vision system coupled to a control system, according to some embodiments of the present technology.
A diagram for understanding how a cost volume may be determined, according to some embodiments of the present technology.
A diagram showing an example of a cost curve, according to some embodiments of the present technology.
A diagram showing a process performed by a confidence processing procedure, according to some embodiments of the present technology.
A diagram showing an example of a video stream and superframes of the video stream (part of FIG. 6 is in color), according to some embodiments of the present technology.
A diagram showing an example of different types of superframes that may be included in a video stream, according to some embodiments of the present technology.
A diagram showing an example of different types of superframes that may be included in a video stream, according to some embodiments of the present technology.
A diagram showing an example of different types of superframes that may be included in a video stream, according to some embodiments of the present technology.
A diagram showing an example of different types of superframes that may be included in a video stream, according to some embodiments of the present technology.
A diagram showing an example of different types of superframes that may be included in a video stream, according to some embodiments of the present technology.
A diagram showing an example of different types of superframes that may be included in a video stream, according to some embodiments of the present technology.
A schematic depiction of a beam region illuminating a scene captured by a sensor, according to some embodiments of the present technology.
A graph depicting the relationship between cross-sectional area and range for an object in a captured image of a scene, according to some embodiments of the present technology.
A diagram showing examples of a disparity map and a confidence map (part of FIG. 9A is in color), according to some embodiments of the present technology.
A diagram showing examples of a disparity map and a confidence map, according to some embodiments of the present technology.

Sensor information from different types of sensor systems (e.g., lidar systems, radar systems, mono-vision camera systems, stereo-vision camera systems, temperature measurement systems (e.g., thermocouples), acoustic systems (e.g., ultrasonic transducer systems, audible-sound microphone systems, etc.)) may be used by a vehicle to obtain information about the vehicle's surroundings while the vehicle is in motion. This information may enable the vehicle's electronic control system to make decisions about how to operate the vehicle and/or may provide the driver with information useful for operating the vehicle. For example, sensor information may be used by the control system to adjust the vehicle's speed (e.g., accelerate or decelerate), to deploy safety measures (e.g., turn on warning flashers, windshield wipers, etc.), to steer away from an object in the vehicle's path, etc. In another example, sensor information may be used by the control system to warn the driver of a specific object in the vehicle's path. In some embodiments of the present technology, the control system may be a centralized computer system comprising ECUs configured to control various aspects of operation of the vehicle. Each of the ECUs may comprise software and/or hardware configured to receive data from one or more sensors and to process the data to output one or more control signals used to operate a part of the vehicle. As noted above, there may be more than 100 ECUs in operation in a moving vehicle. Some ECUs may operate independently of other ECUs, and some ECUs may operate interdependently with one or more other ECUs. In some other embodiments of the present technology, the vehicle's electronic control system may be decentralized. For example, a battery management ECU may operate as a separate system that is independent of, e.g., a speed control ECU. Each ECU may receive sensor information from one type of sensor or from multiple types of sensors. For example, a cabin temperature ECU may receive sensor information from one or more thermometers located in different regions of the vehicle's cabin, and may use the sensor information to control a heater and/or an air conditioner to maintain the cabin temperature at a temperature set by an occupant of the vehicle. In another example, a steering ECU may receive sensor information from various combinations of one or more sets of cameras for stereo imaging, one or more radar systems, one or more lidar systems, one or more tire pressure gauges, one or more microphones, one or more cameras for 2D imaging, and one or more navigation systems (e.g., GPS systems), and may use the sensor information to determine the best course of action for steering the vehicle safely to a destination.

The inventors have recognized and appreciated that, to optimize the use of data from different types of sensors, it is important to know the level of confidence or certainty of the measurements from each of the sensors. According to some embodiments of the present technology, confidence information may be used to determine which sensor or which combination of sensors should be used for data to determine the distance to an object. Conventional sensors typically report measurements or estimated measurements without reporting error estimates (e.g., error bars) that indicate a level of confidence in the measurements, which has made accurate fusion difficult. However, as noted above, functionally safe systems, which are of particular importance in automotive applications, rely on sensor measurements to make decisions that can affect the safety of human life. Accordingly, there is heightened interest in providing accurate sensor data to a vehicle's electronic control system, which may entail providing the control system with a level of confidence in the sensor data. Equipped with confidence data, the control system may be better able to decide whether the sensor data is trustworthy enough to be used to control the vehicle or is not reliable enough and should not be used. As will be appreciated, sensor fusion may be performed by a central control system to combine data from different types of sensors, but in some embodiments of the present technology sensor fusion may be performed by one or more of the vehicle's ECUs, or by an auxiliary system that operates in conjunction with the ECUs and/or the central control system.

The inventors have recognized and appreciated that the confidence level of an estimate may be used in Bayesian inference to update the probability of a hypothesis or estimate as more evidence or information becomes available. Sensor data from various sensors of the same type and/or different types deployed on a vehicle may be used to corroborate sensor data from a particular sensor deployed on the vehicle. The inventors have recognized and appreciated that sensors that report a confidence range for depth estimates can lead to safer vehicles by enabling a vehicle's driver assistance system to make decisions based on reliable data. This is especially important for autonomous vehicles, which may not be controlled by a human driver. For example, suppose a vehicle is traveling at a cruising speed typical of highway driving and a sensor (e.g., a camera) is partially obstructed (e.g., by debris on the sensor) such that the sensor falsely detects an object within close range of the vehicle. Based on various factors that go into the analysis of the images captured by the sensor, a stereo vision system according to some embodiments of the present technology may determine and output a confidence indicating a relatively low level of certainty for the sensor data corresponding to that object, and therefore that sensor data may be ignored. This may allow the vehicle to be controlled to continue at cruising speed on its path, or to reduce its speed gradually to increase the observation time, instead of being controlled in a way that could increase the likelihood of the vehicle being rear-ended, such as sudden application of the vehicle's brakes (emergency braking) to avoid colliding with a potentially spurious (i.e., non-existent) object. In some embodiments, the confidence associated with sensor data may be used to determine whether the sensor data can be used or whether sensor data from another sensor having a relatively higher confidence should be used instead. For example, if depth information from sensor data indicates an object in the vehicle's path but the depth information is associated with a low confidence value, the vehicle may be controlled to reduce its travel speed to allow the apparent object to be observed over more time and/or to allow the apparent object to be cross-checked by other sensor systems. As discussed below, according to various embodiments of the present technology, different types of sensor systems (e.g., systems based on radar, lidar, acoustics, etc.) may be used to corroborate or supplement data obtained by stereo vision techniques.
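As a minimal sketch of the kind of Bayesian update mentioned above, the following Python function revises the probability that an object is present after one positive detection. It is an illustration under stated assumptions, not a procedure given in this disclosure: the function name `bayes_update`, the prior, and the two likelihood values are hypothetical, and the likelihoods stand in for whatever a reported confidence level would be mapped to in practice.

```python
def bayes_update(prior, p_detect_given_object, p_detect_given_no_object):
    """Posterior probability that an object is present after one positive detection.

    All three inputs are probabilities in [0, 1]; the two likelihoods are
    hypothetical stand-ins for a sensor's reported confidence.
    """
    numerator = p_detect_given_object * prior
    evidence = numerator + p_detect_given_no_object * (1.0 - prior)
    return numerator / evidence


# Hypothetical numbers: the same prior updated by a low-confidence detection
# (likelihoods barely differ) and by a high-confidence detection.
prior = 0.1
after_low_confidence = bayes_update(prior, 0.55, 0.45)
after_high_confidence = bayes_update(prior, 0.95, 0.05)
```

With these assumed numbers, the low-confidence detection moves the probability only slightly above the prior, while the high-confidence detection raises it substantially, which is the behavior that would let a control system defer a drastic response until the detection is corroborated.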

The inventors have recognized and appreciated that the confidence level or degree of certainty of the depth estimates from each sensor may be used advantageously to increase passenger safety, by enabling a driver assistance system to determine whether a particular sensor is partially or totally malfunctioning (e.g., partially or totally covered by debris). In some embodiments, information derived from the confidence levels may be used to warn the driver of an anomaly in a sensor. In some embodiments, the stereo vision system, or a device that operates in conjunction with the stereo vision system, may track occurrences of low confidence (e.g., values below a threshold confidence level) across multiple images and may use, e.g., the frequency of low confidence to determine whether a sensor may be functioning abnormally.
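The tracking described above could be sketched as a sliding window over per-image confidence values. This is only an illustrative sketch: the class name `LowConfidenceTracker`, the window length, and both thresholds are assumptions chosen for the example and do not come from the disclosure.

```python
from collections import deque


class LowConfidenceTracker:
    """Tracks how often recent images fall below a confidence threshold."""

    def __init__(self, threshold, window, max_low_fraction):
        self.threshold = threshold                # confidence below this counts as "low"
        self.max_low_fraction = max_low_fraction  # tolerated fraction of low images
        self.history = deque(maxlen=window)       # True where an image was low-confidence

    def update(self, image_confidence):
        """Record one image; return True if the sensor may be functioning abnormally."""
        self.history.append(image_confidence < self.threshold)
        if len(self.history) < self.history.maxlen:
            return False  # not enough images observed yet
        low_fraction = sum(self.history) / len(self.history)
        return low_fraction > self.max_low_fraction


# Hypothetical stream of overall per-image confidence values.
tracker = LowConfidenceTracker(threshold=0.5, window=4, max_low_fraction=0.5)
flags = [tracker.update(c) for c in [0.9, 0.2, 0.3, 0.1, 0.2]]
```

A frequency-based test like this tolerates an occasional low-confidence image (e.g., a wiper pass) and raises a flag only when low confidence persists across the window.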

According to some embodiments of the present technology, a stereo vision system may be configured to determine and output an overall confidence for an image. The terms "confidence," "confidence range," and "confidence level" may be used interchangeably herein. The image may be of a scene captured by sensors on a moving vehicle (e.g., a car, a truck, an airplane, etc.) or by sensors mounted on a fixed structure (e.g., a street light, a control tower, a residence, an office building, etc.). The stereo vision system may be a stand-alone system deployed on the vehicle or may be incorporated into the vehicle's control system. In some embodiments, the stereo vision system may determine and output a confidence for each region of a plurality of regions of the image. For example, the image may be divided into quadrants (e.g., upper left, upper right, lower left, lower right), and the stereo vision system may determine and output a confidence for each quadrant. In some embodiments, the stereo vision system may determine and output a confidence for each pixel of the image.
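One way the per-quadrant case could be derived from per-pixel values is sketched below. Averaging the pixel confidences in each quadrant is an assumption made for illustration; the disclosure does not say how a regional confidence is computed, and the function name `quadrant_confidence` is hypothetical.

```python
def quadrant_confidence(conf_map):
    """Average a per-pixel confidence map over four quadrants.

    conf_map is a list of equal-length rows with at least 2 rows and 2 columns.
    Averaging is an illustrative choice, not one taken from the source.
    """
    rows, cols = len(conf_map), len(conf_map[0])
    mid_r, mid_c = rows // 2, cols // 2

    def mean(r0, r1, c0, c1):
        values = [conf_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        return sum(values) / len(values)

    return {
        "upper_left": mean(0, mid_r, 0, mid_c),
        "upper_right": mean(0, mid_r, mid_c, cols),
        "lower_left": mean(mid_r, rows, 0, mid_c),
        "lower_right": mean(mid_r, rows, mid_c, cols),
    }


# Hypothetical 4x4 confidence map with values in [0, 1].
example_map = [
    [1.0, 1.0, 0.5, 0.5],
    [1.0, 1.0, 0.5, 0.5],
    [0.0, 0.0, 0.25, 0.75],
    [0.0, 0.0, 0.25, 0.75],
]
quadrants = quadrant_confidence(example_map)
```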

According to some embodiments of the present technology, a stereo vision system may be configured to output a depth map corresponding to a captured image of a scene. The image may be a captured digital image or may be digitized from an analog image. The depth map may be a map of depth values, i.e., distances from the sensors of the stereo vision system to objects in the scene. The depth map may comprise pixels that correspond to pixels of the image, such that each pixel of the depth map (and each pixel of the image) may have an associated depth value. The stereo vision system may also be configured to output confidence data together with the depth map. In some embodiments, the confidence data may be a confidence map that indicates the certainty or confidence of the depth map. In some embodiments, the confidence map may comprise pixels that correspond to pixels of the depth map (and of the image), such that each pixel of the confidence map (and each pixel of the image) may have an associated confidence. In some embodiments, the confidence map may express the confidence estimates as error bars, as standard deviation values, in buckets (e.g., high confidence, medium confidence, low confidence), or in any other metric capable of indicating a quality level of each estimated confidence.

As noted above, because the depth of a pixel is inversely proportional to the disparity of the pixel, an estimate of depth for each pixel of an image may be computed from a disparity map. For this reason, the terms "depth map" and "disparity map" may be used interchangeably herein, as they provide nearly identical information about the scene captured in an image and are related by a simple algebraic transformation known in the art.
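For a rectified pair of pinhole cameras, the transformation referred to above is commonly written as Z = f * B / d, where d is the disparity in pixels, f is the focal length in pixels, and B is the baseline between the cameras. The sketch below applies that standard relation; the function name `disparity_to_depth` and the focal length and baseline values are assumptions for illustration and do not appear in this section.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity map (pixels) to a depth map (meters) via Z = f * B / d.

    A non-positive disparity has no valid depth and is mapped to None.
    """
    return [
        [focal_length_px * baseline_m / d if d > 0 else None for d in row]
        for row in disparity_px
    ]


# Hypothetical camera parameters: 1000-pixel focal length, 0.5 m baseline.
example_disparity = [[50.0, 5.0], [0.0, 2.0]]
example_depth = disparity_to_depth(example_disparity, 1000.0, 0.5)
```

Because depth varies as 1/d, a fixed disparity error produces a larger depth error at long range, which is one reason a depth estimate benefits from an accompanying confidence value.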

According to some embodiments of the present technology, an autonomous vehicle and/or an advanced driver assistance system (ADAS) may advantageously use a depth map and the confidence values associated with the depth map to avoid accidents and/or to warn the driver when there is unreliable data. In some embodiments, because a confidence value may be provided for each pixel of the depth map, it may not be necessary to discard an entire frame of a captured video sequence of a scene when some pixels have low confidence values. Instead, the pixels with low confidence values may be discarded, and the remaining pixels with sufficiently high confidence values may be used for depth calculation. This selectivity closely resembles human vision, in which a driver processing the scene in the field of view while driving may naturally disregard obstructing objects such as the vehicle's A-pillars and windshield wipers. That is, the driver automatically ignores the obstructing objects while evaluating the scene in the field of view. In some embodiments, the confidence map may increase sensor availability, because frames of a video sequence may be used even if some of the frames are ignored, which allows the vehicle's sensors to operate at a higher duty cycle over a larger range of environmental conditions. For example, a sensor may operate in any situation in which some pixels of the image captured by the sensor have useful data, even if other pixels of the image have data that should be ignored: when part of the sensor's lens is dirty, when a windshield wiper partially obstructs the sensor's field of view, when a section of the captured image is overexposed, or when a section of the captured image has a low light level. By providing a confidence map comprising per-pixel confidence values, the vehicle's electronic control system may be enabled to attend to areas of the depth map that are valid and have relatively high confidence values.
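The per-pixel selection described above can be sketched as a simple threshold mask over a depth map and its confidence map. The function name `mask_low_confidence`, the threshold, and the sample values are hypothetical; the disclosure does not prescribe a particular threshold or data layout.

```python
def mask_low_confidence(depth_map, conf_map, min_confidence):
    """Keep depth values whose confidence meets the threshold; set the rest to None.

    depth_map and conf_map are lists of rows with identical shapes.
    """
    return [
        [d if c >= min_confidence else None for d, c in zip(depth_row, conf_row)]
        for depth_row, conf_row in zip(depth_map, conf_map)
    ]


# Hypothetical 2x2 frame: the right-hand column is obstructed (e.g., by a wiper).
example_depths = [[12.0, 3.0], [40.0, 41.0]]
example_confidences = [[0.9, 0.1], [0.8, 0.4]]
masked = mask_low_confidence(example_depths, example_confidences, 0.5)

# Fraction of the frame that remains usable after masking.
usable = sum(v is not None for row in masked for v in row) / 4
```

Here half of the frame stays usable instead of the whole frame being discarded, which is the availability gain the paragraph describes.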

According to some embodiments of the present technology, an autonomous vehicle may fuse information from different sensors, such as cameras, lidar, radar, and/or ultrasonic sensors, to increase reliability and safety. Such sensors may report or provide information about the distance to an object, but when different sensors report different distances, it may be unclear which sensor to trust. In some embodiments, a sensor fusion algorithm may combine data from different sensors and may output fused information having less uncertainty than is possible when unfused information from one or another sensor is used individually. The inventors have recognized and appreciated that a sensor fusion algorithm may be enhanced to increase the certainty of the fused information by providing the algorithm with a certainty parameter (e.g., a variance) for each of the different sensors. In some embodiments, because confidence maps may be determined for different sensors on a pixel-by-pixel basis, sensor fusion may be enabled for two or more sensors that have different but overlapping fields of view. In some embodiments, when there are overlapping fields of view that contain an object, a radar distance estimate for the object may be compared with a stereo vision distance estimate for the object. For example, sensors on a car traveling during the day in clear weather may have very high confidence values for stereo vision distance determinations based on images captured by cameras on the car. The stereo vision distance determinations may therefore be trusted by the car's electronic control system, especially for objects at ranges or distances of 300 meters or more, for which other sensors on the car may not be expected to return high confidence values. If the weather deteriorates (e.g., dense fog, pouring rain, etc.), visibility may decrease and, as a result, light attenuation may degrade the stereo vision distance estimates and lower the associated confidence values. The car's control system may then switch to obtaining distance estimates from radar data instead of data from the cameras. Similarly, if the car is traveling at night or at low ambient light levels, without fog or other precipitation, the car's control system may switch to obtaining distance estimates from lidar data instead of from radar data or data from the cameras, because a typical lidar system has its own active illumination source. In some embodiments, instead of the car's control system determining when to switch among stereo vision distance estimates, lidar distance estimates, radar distance estimates, or acoustic distance estimates, the switching may be performed by the car's stereo vision system, which may be the car's primary sensor system. As will be appreciated, although the examples above relate to distance estimates for a car, the present technology is not limited to cars and may be applicable to other vehicles (e.g., trucks and other road vehicles, trains and other rail vehicles, ships and other marine vehicles, airplanes and other air vehicles, etc.).
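One well-known way to combine distance estimates that each carry a variance is inverse-variance weighting, sketched below. This technique is named here as an illustrative assumption; the section does not identify which fusion algorithm is used, and the function name `fuse_estimates` and the sample distances and variances are hypothetical. The sketch also assumes the sensor errors are independent.

```python
def fuse_estimates(estimates):
    """Fuse (distance_m, variance) pairs by inverse-variance weighting.

    Assumes independent errors and strictly positive variances.
    Returns (fused_distance_m, fused_variance).
    """
    weights = [1.0 / variance for _, variance in estimates]
    total_weight = sum(weights)
    fused_distance = sum(
        w * distance for w, (distance, _) in zip(weights, estimates)
    ) / total_weight
    return fused_distance, 1.0 / total_weight


# Hypothetical readings for one object seen in overlapping fields of view.
stereo_estimate = (100.0, 1.0)  # distance in meters, variance in square meters
radar_estimate = (110.0, 4.0)
fused_distance, fused_variance = fuse_estimates([stereo_estimate, radar_estimate])
```

With these assumed inputs the fused distance lies closer to the lower-variance stereo estimate, and the fused variance is smaller than either input variance, which matches the claim that fused information can carry less uncertainty than either sensor alone.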

FIG. 1 shows a block diagram of a stereo vision system 1 configured to provide values for stereo matching confidence or certainty, according to some embodiments of the present technology. The stereo vision system 1 may comprise a computer processor 10 coupled to a plurality of sensor systems and configured to receive measurements or sensor data from each of the sensor systems. In some embodiments, the sensor systems coupled to the processor 10 may comprise an imaging system comprising sensors 100, which may be cameras, lidar sensors, radar sensors, and/or ultrasonic sensors. In some embodiments, the stereo vision system 1 may be mounted on a vehicle capable of moving autonomously (i.e., without human control) and/or semi-autonomously (i.e., with human control or limited human control). For example, the vehicle may be an automobile, a truck, a robot, a marine vessel, a flying vehicle, or the like.

According to some embodiments of the present technology, the sensors 100 may comprise two stereo cameras 100 configured to capture images of the vehicle's environment simultaneously, i.e., at the same or nearly the same instant of time. To simplify notation, the cameras 100 may be referred to herein as the "left" camera and the "right" camera, even though they may be positioned vertically relative to each other (e.g., one above the other), or diagonally relative to each other, or offset in different range bins (e.g., one camera at a front portion of the vehicle and the other camera at a rear portion of the vehicle). The cameras 100 may be, e.g., color CMOS (complementary metal-oxide-semiconductor) cameras, grayscale CMOS cameras, CCD (charge-coupled device) cameras, SWIR (short-wavelength infrared) cameras, LWIR (long-wavelength infrared) cameras, or focal-plane array sensors.

According to some embodiments of the present technology, sensors S1, S2, S3, S4, S5, S6, S7, S8, S9 may be located at a plurality of different locations on a vehicle 20, as schematically depicted in FIG. 2. In some embodiments, some of the sensors S1, S2, ..., S9 may be located inside the cabin of the vehicle 20 and therefore may be protected from dust and rain. In some embodiments, some of the sensors S1, S2, ..., S9 may be located outside the cabin of the vehicle 20, which may allow the baseline (i.e., the distance between two sensors) to be larger than the width of the cabin. A larger baseline may allow objects at relatively greater distances to be detected, compared with a baseline limited to the width of the cabin. In some embodiments, the sensors S1, S2, ..., S9 may be coupled, wirelessly or by wire, to one or more ECUs 22, 24 configured to receive data from the sensors S1, S2, ..., S9 and/or to supply power to the sensors S1, S2, ..., S9. In some embodiments, at least some of the ECUs 22, 24 may be part of the computer processor 10. In some other embodiments, at least some of the ECUs 22, 24 may be located external to the computer processor 10 and may be configured to transmit signals to the computer processor 10 wirelessly or by wire. In some embodiments, only two cameras are needed, such as a pair of cameras in the headlights (e.g., S1 and S9), or a pair of cameras in the side-view mirrors (e.g., S2 and S8), or a pair of cameras in the windshield (e.g., S3 and S7, or alternatively S3 and S5), or a pair of cameras on the roof (e.g., S4 and S6).

According to some embodiments of the present technology, the stereo vision system 1 may be coupled to a main system controller 30 of the vehicle, as schematically shown in FIG. 3. In some embodiments, the main system controller 30 may be the vehicle's control system, which may be configured to control all automated aspects of vehicle operation. In some embodiments, the stereo vision system 1 may be configured to be commanded by the main system controller 30 and may communicate signals to and from the main system controller 30 via a command and control line 32. As will be appreciated, the command and control line 32 may be a wired communication mechanism (e.g., a data bus, a communication line) or may be a wireless communication mechanism using communication techniques known in the art. In some embodiments, the main system controller 30 may comprise a computer configured to orchestrate high-level functions (e.g., automatic emergency braking, route selection, etc.) and to communicate with various subsystems or ECUs (e.g., the stereo vision system 1) to carry out the high-level functions. In some embodiments, a common communication protocol may be used for communication via the command and control line 32 (e.g., Ethernet, CAN (Controller Area Network), I2C (Inter-Integrated Circuit), LIN (Local Interconnect Network), etc.). Although the stereo vision system 1 is shown in FIG. 3 as separate from the main system controller 30, the stereo vision system 1 may, in some embodiments, be part of the main system controller 30 and may, in some embodiments, be physically located in a housing of the main system controller 30.

Returning to FIG. 1, the cameras 100 may be coupled to an image acquisition module 102, wirelessly or by a wired connection, according to some embodiments of the present technology. In some embodiments, image data of a scene captured by the cameras 100 may be transferred to the image acquisition module 102 via a known communication interface (e.g., a USB (Universal Serial Bus) connector, an Ethernet connector, a MIPI (Mobile Industry Processor Interface) CSI (Camera Serial Interface) connector, a GMSL (Gigabit Multimedia Serial Link) connector, a Flat Panel Display Link (FPD-Link) connector, etc.). In some embodiments, the cameras 100 may be configured to transmit the image data to the image acquisition module 102 in real time or nearly real time, either directly or via a buffer memory device (e.g., RAM), which may be incorporated in the cameras 100. In some embodiments, the cameras 100 may be associated with a data storage memory device 140 that is accessible by the image acquisition module 102 and by other parts of the computer processor 10, and the cameras 100 may be configured to transmit the image data to the data storage device 140. In some embodiments, the cameras 100 may be video cameras configured to capture streams of video data of the scene. The streams of video data may comprise a left stream and a right stream, with each stream comprising a sequence of frames. Thus, the term "image data" as used herein may, in some embodiments, refer to frames of video data.

According to some embodiments of the present technology, the image acquisition module 102 may be configured to digitize the image data from the camera 100 to produce raw digital image data or "raw image data." In some embodiments, the image acquisition module 102 may provide the raw image data to the image preprocessing module 104. In some embodiments, the image acquisition module 102 may provide the raw image data to the memory 140, which may store the raw image data for future processing.

According to some embodiments of the present technology, the image preprocessing module 104 may be configured to correct the raw image data to produce corrected left and right images. For example, the image preprocessing module 104 may perform any one or any combination of: demosaicing; auto-focus, auto-exposure, and auto-white-balance correction; vignetting; noise reduction; bad-pixel filtering; HDR (high dynamic range) look-up-table color processing; and image compression. The corrected left and right images may be transferred to the image rectification module 106.

According to some embodiments of the present technology, the image rectification module 106 may be configured to rectify the corrected left and right images by warping them so that corresponding rows of pixels of the corrected left and right images lie on the same epipolar plane. After warping, the image rectification module 106 may output left and right rectified 2D images 114, which may be color images or grayscale images. As will be appreciated, image rectification is a known technique used to simplify matching of common objects in the corrected left and right images. The image rectification module 106 may provide the left and right rectified 2D images 114 to the stereo matching module 108, the confidence processing module 110, and the encoder module 112.

According to some embodiments of the present technology, the stereo matching module 108 may be configured to compute a disparity between each matching pixel pair in the rectified 2D images 114. The processing performed by the stereo matching module 108 may, in some embodiments, consist of four procedures: a cost computation procedure, a cost aggregation procedure, a disparity computation procedure, and a disparity refinement procedure, each of which is discussed below.

According to some embodiments of the present technology, the cost computation procedure may consist of constructing a three-dimensional (3D) cost volume map 118, which may also be referred to as a "disparity-space image," by computing a matching cost for each pixel at each disparity value of a set of possible disparity values. As discussed below, a cost volume (or more precisely, a matching cost volume) may be determined as the product W × H × D, where W and H are the width and height dimensions of each image and D is the number of disparity hypotheses or possible disparities. The matching cost for a particular pixel and a particular disparity value represents how unlikely it is that the particular pixel has the particular disparity value. Typically, for a given pixel, the disparity value with the lowest matching cost is chosen for use in the disparity map, as discussed below. This approach to selecting a disparity value for a given pixel is the so-called winner-take-all (WTA) approach, in which the winner is the disparity value with the lowest matching cost, i.e., the best of all the disparity hypotheses. The matching costs may be computed using known techniques such as, for example, an absolute difference technique, a mutual information (MI) technique (e.g., a hierarchical MI (HMI) technique), a normalized cross-correlation (NCC) technique, a Hamming distance technique, etc. In some embodiments, the NCC technique may be used to match costs for two sub-windows around a pixel under consideration (one sub-window in each of the left and right rectified 2D images 114), as described in "Evaluation of Cost Functions for Stereo Matching" by H. Hirschmuller et al. (2007 IEEE Conference on Computer Vision and Pattern Recognition). In some embodiments, the Hamming distance technique may be used in a census transform, in which neighboring pixels surrounding a pixel under consideration are mapped to a bit string depending on whether the intensity values of these pixels are greater or smaller than that of the pixel under consideration, as described in "Census Filtering Based Stereomatching Under Varying Radiometric Conditions" by S. Sarika et al. (2015 Procedia Computer Science).
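By way of a non-limiting illustration, the cost computation and winner-take-all selection described above may be sketched in Python as follows. The sketch uses the absolute difference technique on grayscale images stored as nested lists. The function names, the pairing of left column c with right column c + d, and the use of an infinite cost for candidates that fall outside the image are assumptions made for illustration and are not taken from the described embodiments.

```python
def build_cost_volume(left, right, num_disparities):
    """Return cost[row][col][d], the absolute-difference matching cost."""
    height, width = len(left), len(left[0])
    invalid = float("inf")  # assumed sentinel for out-of-image candidates
    volume = [[[invalid] * num_disparities for _ in range(width)]
              for _ in range(height)]
    for row in range(height):
        for col in range(width):
            for d in range(num_disparities):
                # Assumed pairing: left (row, col) <-> right (row, col + d).
                if col + d < width:
                    volume[row][col][d] = abs(left[row][col] - right[row][col + d])
    return volume


def winner_take_all(volume):
    """Pick, for each pixel, the disparity with the lowest matching cost."""
    return [[min(range(len(curve)), key=curve.__getitem__) for curve in row]
            for row in volume]
```

The returned volume has H × W × D elements, one cost curve per pixel coordinate, and `winner_take_all` reduces each curve to a single disparity value.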

FIG. 4A shows a diagram for understanding how the cost volume map 118 may be determined, according to some embodiments of the present technology. In a cost volume analysis, the cost volume map 118 may be constructed by collecting cost curves (e.g., cost curve 400) for every pixel in the left rectified image (or the right rectified image) of the rectified 2D images 114. The cost volume map 118 has H × W × D elements, where H is the height of the image (e.g., H may be the number of pixels in the row direction or, if lower-precision matching is permitted, the number of pixel groups in the row direction), W is the width of the image (e.g., W may be the number of pixels in the column direction or, if lower-precision matching is permitted, the number of pixel groups in the column direction), and D is the number of disparities searched. Thus, stated simply, the cost volume map 118 may be thought of as comprising a cost curve for each pixel coordinate (row and column). In some embodiments, the cost volume map 118 may be refined (e.g., by filtering) to obtain more reliable matching costs, using techniques known in the art, for example, the techniques described in "Fast cost-volume filtering for visual correspondence and beyond" by C. Rhemann et al. (2011 Conference on Computer Vision and Pattern Recognition).

FIG. 4B shows an example of a cost curve 400 that may be used to explain the cost matching procedure, according to some embodiments of the present technology. As will be appreciated, cost matching typically may be performed between a pixel P having (row, column) coordinates of (H0, W0) in the left rectified image and a pixel having coordinates of (H0, W0 + D0) in the right rectified image. The cost curve 400, which shows the relationship between matching cost and disparity, may represent a particular pixel under consideration. (The term "matching cost" may also be referred to herein as "match cost" or "cost.") As shown in FIG. 4B, the cost curve 400 has a lowest global minimum cost of c_d1 at a disparity of d_1, a second-lowest global minimum cost of c_d2 at a disparity of d_2, and a second-lowest local minimum cost of c_d2m at a disparity of d_2m. As will be appreciated, the term "global" may be used for values evaluated over all points of the cost curve 400, and the term "local" may be used for values evaluated over a portion of the cost curve 400.

According to some embodiments of the present technology, values for the matching cost may be determined using the NCC technique (see above) for sub-windows spanning 5 pixels in the column direction and 5 pixels in the row direction ("5×5" sub-windows) around the pixel under consideration in the left and right rectified 2D images 114.
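As a hedged illustration only, an NCC-based matching cost over 5×5 sub-windows may be sketched as follows. The conversion of the correlation into a cost as 1 minus NCC, the handling of textureless windows, and the pairing of left column c with right column c + d are assumptions of this sketch; the caller is assumed to keep both windows inside the images.

```python
import math


def ncc_cost(left, right, row, col, disparity, half=2):
    """Return 1 - NCC for (2*half+1) x (2*half+1) windows; half=2 gives 5x5."""
    a, b = [], []
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            a.append(left[row + dr][col + dc])
            # Assumed pairing: left column c <-> right column c + disparity.
            b.append(right[row + dr][col + dc + disparity])
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    denominator = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                            * sum((y - mean_b) ** 2 for y in b))
    if denominator == 0:
        # Textureless window: correlation is undefined, so treat it as a
        # non-match (an assumption of this sketch).
        return 1.0
    return 1.0 - numerator / denominator
```

A perfectly matching pair of windows yields a cost near 0, whereas uncorrelated windows yield a cost near 1.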

According to some embodiments of the present technology, the cost aggregation procedure may consist of aggregating matching costs over a support region of each pixel, utilizing the results of the cost computation procedure. For "local" stereo matching techniques, the support region may be understood to be a weighted sum of costs in a group of neighboring pixels around a pixel of interest. For "semi-global" and "global" stereo matching techniques, the support region may be understood to be a function of the costs for all pixels in the image.
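For the "local" case, the aggregation described above may be sketched as follows, assuming a square support region with uniform weights (i.e., a box filter applied separately at each disparity). The function name, the window radius, and the uniform weighting are illustrative assumptions; other weightings may equally be used.

```python
def aggregate_costs(volume, radius=1):
    """Average each matching cost over a (2*radius+1)^2 support region."""
    height, width, depth = len(volume), len(volume[0]), len(volume[0][0])
    out = [[[0.0] * depth for _ in range(width)] for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # Clip the support region at the image borders.
            rows = range(max(0, row - radius), min(height, row + radius + 1))
            cols = range(max(0, col - radius), min(width, col + radius + 1))
            count = len(rows) * len(cols)
            for d in range(depth):
                total = sum(volume[r][c][d] for r in rows for c in cols)
                out[row][col][d] = total / count
    return out
```

Averaging over neighbors suppresses isolated noisy costs, so that a single outlier pixel is less likely to change the disparity selected for it.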

According to some embodiments of the present technology, the disparity computation procedure may consist of computing a disparity for each pixel using a local or global optimization method applied to the results of the cost aggregation procedure, and producing an unrefined disparity map. Computational speed versus accuracy may determine the choice between local and global optimization methods. For example, a local method may be used where speed is desired over accuracy, whereas a global method may be used where accuracy is desired over speed. In some embodiments, a local optimization method such as block matching may be used. In some embodiments, a global optimization method such as semi-global matching (SGM) may be used.

According to some embodiments of the present technology, the disparity refinement procedure may consist of filtering the unrefined disparity map to produce a 2D disparity map 116 for the left and right rectified 2D images 114. The disparity refinement procedure is an optional procedure for correcting disparity values. Conventional refinement steps include a left-right check, hole filling, smoothing filters, and outlier detection and removal.
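One of the conventional refinement steps named above, the left-right check, may be sketched as follows. The sketch assumes that a disparity map has been computed for each of the left and right images, that left column c corresponds to right column c + d, and that rejected pixels are marked with a sentinel value so that a later hole-filling step can replace them; the names and the tolerance are hypothetical.

```python
def left_right_check(disp_left, disp_right, max_diff=1, invalid=-1):
    """Invalidate left disparities that the right disparity map does not confirm."""
    height, width = len(disp_left), len(disp_left[0])
    checked = [row[:] for row in disp_left]
    for row in range(height):
        for col in range(width):
            d = disp_left[row][col]
            match_col = col + d  # assumed pairing: left (r, c) <-> right (r, c + d)
            inside = 0 <= match_col < width
            if not inside or abs(disp_right[row][match_col] - d) > max_diff:
                checked[row][col] = invalid
    return checked
```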

According to some embodiments of the present technology, the stereo matching module 108 may employ stereo matching techniques different from those described above. For example, the stereo matching module may use one or more techniques described in "Stereo matching algorithm based on deep learning: A survey" by Mohd Saad Hamid et al. (2020 Journal of King Saud University - Computer and Information Sciences), and/or one or more techniques described in "Stereo Processing by Semi-Global Matching and Mutual Information" by H. Hirschmuller (2008 IEEE Transactions on Pattern Analysis and Machine Intelligence).

According to some embodiments of the present technology, the stereo matching module 108 outputs the 2D disparity map 116 to the encoder module 112 and the confidence processing module 110 of the processor 10 of the stereo vision system 1, as shown in FIG. 1. In some embodiments, the stereo matching module 108 also outputs, to the confidence processing module 110, the 3D cost volume map 118 discussed above in connection with deriving the disparity map 116.

According to some embodiments of the present technology, the confidence processing module 110 is configured to receive, as inputs, the left and right rectified 2D images 114 from the image rectification module 106, the cost volume map 118 from the stereo matching module 108, and the 2D disparity map 116 from the stereo matching module 108, and to determine from these inputs the accuracy of the disparity estimated for each pixel of the rectified images 114.

The confidence processing module 110 may be configured to compute a confidence value for each frame of a video stream, according to some embodiments of the present technology. In some embodiments, the confidence processing module 110 may compute a confidence value for each pixel of each frame of the video stream. Accordingly, it should be understood that the term "image" herein may encompass a frame of a video stream.

Returning to FIG. 1, the confidence processing module 110 may output confidence information to the encoder module 112, according to some embodiments of the present technology. The confidence information may, in some embodiments, be a confidence map 120 for each frame of the video stream. The confidence map 120 may have the same dimensions as the disparity map 116 output by the stereo matching module 108 to the encoder module 112. In some embodiments, the confidence map 120 may include confidence information for each pixel of the frame, indicating a value that represents the confidence level of the depth estimate for the corresponding pixel in the disparity map 116. In some embodiments, the value representing the confidence level of a particular pixel may be a root-mean-square error of the disparity or depth determined for the pixel, or a 95% confidence interval of the disparity or depth determined for the pixel, or any measure of variability (e.g., standard deviation, standard error of the mean, confidence interval, data range, percentile, etc.). Other confidence metrics may be used, such as those described in "On the confidence of stereo matching in a deep-learning era: a quantitative evaluation" by M. Poggi et al. (2021 IEEE Transactions on Pattern Analysis & Machine Intelligence) and in "A Quantitative Evaluation of Confidence Measures for Stereo Vision" by X. Hu et al. (2012 IEEE Transactions on Pattern Analysis and Machine Intelligence).

FIG. 5 is a diagram showing processes performed by the confidence processing module 110, according to some embodiments of the present technology. In some embodiments, at block 500, a first confidence measure process computes a first confidence map 508 based on the cost volume 118 received from the stereo matching module 108. In some embodiments, a winner margin of the cost curve (e.g., the cost curve 400) for each pixel may be computed and used as the confidence value of that pixel on the first confidence map 508. The winner margin (WMN) may be defined as the difference between the second-lowest local minimum cost c_d2m and the lowest global minimum cost c_d1, normalized over the entire cost curve, or

$$\mathrm{WMN}(p) = \frac{c_{d2m}(p) - c_{d1}(p)}{\sum_{d \in D} c(p, d)}$$

where p is the coordinates of the pixel under consideration, c is the match cost, and D is the set of disparities searched. In some embodiments, the first confidence map 508 may also be derived from other measures, such as those summarized in "On the confidence of stereo matching in a deep-learning era: a quantitative evaluation" by M. Poggi et al. (2021 IEEE Transactions on Pattern Analysis & Machine Intelligence).
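As a minimal sketch, the winner margin of a single cost curve, i.e., the second-lowest local minimum cost minus the lowest global minimum cost, divided by the sum of all costs on the curve, may be computed as follows. The treatment of plateaus as local minima, the fallback used when the curve has no second local minimum, and the function name are assumptions of this sketch rather than features of the described embodiments.

```python
def winner_margin(cost_curve):
    """Return (c_d2m - c_d1) / sum of costs for one pixel's cost curve."""
    n = len(cost_curve)
    total = sum(cost_curve)
    if n < 2 or total <= 0:
        return 0.0
    best = min(range(n), key=cost_curve.__getitem__)  # index of c_d1
    # Local minima other than the global one; endpoints are compared with
    # their single neighbor, and plateaus count as minima (assumption).
    local_minima = [
        i for i in range(n)
        if i != best
        and (i == 0 or cost_curve[i] <= cost_curve[i - 1])
        and (i == n - 1 or cost_curve[i] <= cost_curve[i + 1])
    ]
    if local_minima:
        second = min(cost_curve[i] for i in local_minima)  # c_d2m
    else:
        # Assumed fallback: a curve with a single minimum is unambiguous,
        # so the largest cost stands in for the missing second minimum.
        second = max(cost_curve)
    return (second - cost_curve[best]) / total
```

A curve with two equally low minima gives a margin of zero, which reflects an ambiguous match, whereas a single pronounced minimum gives a larger margin.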

According to some embodiments of the present technology, at block 502, a second confidence measure process of the confidence processing module 110 computes a second confidence map 510 based on the rectified 2D images 114 received from the image rectification module 106. In some embodiments, texture is measured on the rectified 2D images 114, e.g., by calculating a derivative (e.g., an x-derivative) of the image data to determine changes in the rectified 2D images 114, and the texture values may be used to derive the second confidence map 510. In some embodiments, image texture may be measured using an x-Sobel operator. As will be appreciated, it is known in stereo vision technology that it is difficult to perform 3D reconstruction on images having little texture, or texture consisting of repeating structures, due to the difficulty of matching features that have no texture or that are identical to multiple other features in the image. In view of this difficulty, the second confidence map 510 may be derived from a grayscale image convolved with an x-Sobel operator, which in its standard 3×3 form is

$$S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$$

Such Sobel convolution or filtering, which is known to emphasize edges in an image, may yield high values where there are edges in the rectified 2D images 114, and thus may result in sharper, better-defined features that may be easier to stereo match. In some embodiments, in addition to or instead of the Sobel convolution, the second confidence map 510 may be derived by evaluating each pixel based on its intensity and penalizing pixels having an intensity below a minimum threshold or above a maximum threshold. For example, a zero or low confidence value may be determined for each low-intensity or under-illuminated pixel having a signal intensity level below the minimum threshold and, similarly, a zero or low confidence value may be determined for each over-illuminated or saturated pixel having a signal intensity level above the maximum threshold.
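A texture-based confidence of this kind may be sketched as follows, assuming an 8-bit grayscale image stored as nested lists. The kernel shown is the standard 3×3 x-Sobel kernel; the threshold values, the use of the absolute filter response as the confidence, and the zero confidence assigned to border pixels are hypothetical choices made only for illustration.

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def texture_confidence(gray, low=10, high=245):
    """Return |x-Sobel response| per pixel, zeroed for dark or saturated pixels."""
    height, width = len(gray), len(gray[0])
    conf = [[0.0] * width for _ in range(height)]  # border pixels stay at 0
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            # Taking the absolute value makes the kernel's sign convention
            # irrelevant to the result.
            gx = sum(SOBEL_X[i][j] * gray[row + i - 1][col + j - 1]
                     for i in range(3) for j in range(3))
            # Penalize under-illuminated and saturated pixels (assumed thresholds).
            if low <= gray[row][col] <= high:
                conf[row][col] = float(abs(gx))
    return conf
```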

According to some embodiments of the present technology, at block 504, a third confidence measure process of the confidence processing module 110 computes a third confidence map 512 based on the disparity map 116 received from the stereo matching module 108. In some embodiments, confidence values for the third confidence map 512 may be computed from a variance of the disparity determined for each pixel of the rectified 2D images 114. In some embodiments, the variance may be a statistical variance of the disparities determined for neighboring pixels (e.g., pixels surrounding the pixel under consideration). For example, a pixel having a relatively higher variance may indicate that the pixel is part of a region of the disparity map 116 having noisier (e.g., more scattered) data, and therefore may be assigned a relatively lower confidence value. As will be appreciated, noisiness may be indicative of blurriness and may indicate that the sensor (e.g., the camera 100) used to capture the image, or a region of the sensor, may be dirty or partially obscured. Known techniques (e.g., techniques based on a Laplace filter and/or a Sobel filter) may be used to detect variance in the disparity map 116. In some embodiments, a variance below a threshold variance may indicate that the pixel under consideration is blurry or is in a blurry region, and therefore the pixel may be assigned a low confidence value in the third confidence map 512.
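The case in which a relatively higher local variance leads to a relatively lower confidence value may be sketched as follows. The sketch computes the statistical variance of the disparities in a square neighborhood and maps it linearly to a confidence between 0 and 1; the neighborhood radius, the `max_variance` scale, and the linear mapping are assumptions, and the threshold-based blur test mentioned above is not modeled here.

```python
def disparity_variance_confidence(disparity, radius=1, max_variance=4.0):
    """Map the local variance of the disparity map to a confidence in [0, 1]."""
    height, width = len(disparity), len(disparity[0])
    conf = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # Neighborhood clipped at the image borders.
            rows = range(max(0, row - radius), min(height, row + radius + 1))
            cols = range(max(0, col - radius), min(width, col + radius + 1))
            window = [disparity[r][c] for r in rows for c in cols]
            mean = sum(window) / len(window)
            variance = sum((v - mean) ** 2 for v in window) / len(window)
            # Higher variance (noisier region) gives lower confidence.
            conf[row][col] = max(0.0, 1.0 - variance / max_variance)
    return conf
```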

As will be appreciated, although the first confidence map 508, the second confidence map 510, and the third confidence map 512 have been described as consisting of confidence values for each pixel of an image, or for each pixel of each frame of a video stream, according to some embodiments of the present technology, in some other embodiments one or more of the first confidence map 508, the second confidence map 510, and the third confidence map 512 may consist of confidence values that are each representative of two or more pixels of the image or frame. For example, each frame of a video stream may consist of pixels categorized into n groups, and the confidence processing module 110 may be configured to compute n confidence values for each frame of the video stream (i.e., a confidence value for each of the n groups).
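One possible grouping, offered only as an assumption, is to categorize the pixels of a frame into non-overlapping square blocks and to use the mean per-pixel confidence of each block as the confidence value of that group. The function name and the block size are hypothetical.

```python
def block_confidence(conf_map, block=2):
    """Return one confidence value (the mean) per block x block pixel group."""
    height, width = len(conf_map), len(conf_map[0])
    grouped = []
    for r0 in range(0, height, block):
        grouped_row = []
        for c0 in range(0, width, block):
            # Blocks at the right and bottom edges may be smaller.
            cells = [conf_map[r][c]
                     for r in range(r0, min(r0 + block, height))
                     for c in range(c0, min(c0 + block, width))]
            grouped_row.append(sum(cells) / len(cells))
        grouped.append(grouped_row)
    return grouped
```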

According to some embodiments of the present technology, at block 506, an aggregator process of the confidence processing module 110 uses the first confidence map 508, the second confidence map 510, and the third confidence map 512, and combines the estimated confidence values in these maps to produce the confidence map 120. Thus, the confidence map 120 may consist of enhanced confidence values, which are best estimates of certainty based on a plurality of confidence measures. In some embodiments, the aggregator process may consist of computing, for each pixel, a sum of the confidences, and may use the sum as the enhanced confidence value for the pixel in the confidence map 120. In some embodiments, the aggregator process may comprise weighting the confidences in the first confidence map 508, the second confidence map 510, and the third confidence map 512 and then computing, for each pixel, a weighted sum of the confidences, and may use the weighted sum as the enhanced confidence value for the pixel in the confidence map 120. In some embodiments, the aggregator process may comprise, for each pixel, using the confidences for the pixel in the first confidence map 508, the second confidence map 510, and the third confidence map 512 as three input values to a look-up table, which may output a single value used as the enhanced confidence value for the pixel in the confidence map 120.
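The sum and weighted-sum variants of the aggregator process may be sketched as follows, assuming that the confidence maps have identical dimensions and are stored as nested lists. The function name and the example weights are hypothetical, and the look-up-table variant is not shown.

```python
def aggregate_confidence(maps, weights=None):
    """Combine per-pixel confidences from several maps by a (weighted) sum."""
    if weights is None:
        weights = [1.0] * len(maps)  # plain sum of the confidences
    height, width = len(maps[0]), len(maps[0][0])
    return [[sum(w * m[row][col] for w, m in zip(weights, maps))
             for col in range(width)]
            for row in range(height)]
```

For example, `aggregate_confidence([first, second, third], weights=[0.5, 0.25, 0.25])` would give the cost-curve-based confidence twice the influence of each of the other two measures.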

Returning to FIG. 1, the confidence processing module 110 may output the confidence map 120 to the encoder module 112 of the computer processor 10, according to some embodiments of the present technology. The encoder module 112 may be configured to receive the rectified 2D images 114, the disparity map 116, and the confidence map 120, to encode the received information, and to output a video stream 122 consisting of the encoded information. In some embodiments, the video stream 122 may be provided to the main system controller 30 of the vehicle.

According to some embodiments of the present technology, the video stream 122 may consist of 24-bit color depth video, in which each 24-bit color value may be encoded to be a range or distance to the scene captured by the camera 100. In some embodiments, distances from 0 to approximately 16800 meters may be represented with 24 bits, with each color representing a different 1-millimeter portion of the 16800-meter range. In some embodiments, the video stream 122 may consist of 2D frames made up of pixels, with the pixels of each frame corresponding to the pixels of the rectified 2D images 114. Each pixel of each frame of the video stream 122 may be encoded with an enhanced confidence value. For example, each enhanced confidence value may be an 8-bit unsigned value from 0 to 255, with relatively higher values indicating a higher level of confidence. In some embodiments, using an 8-bit representation for the enhanced confidence values of the confidence map 120 may enable the confidence map 120 to be displayed as a grayscale image. In some embodiments, the video stream 122 may consist of a 24-bit color video stream representing depth or distance, and may also consist of an 8-bit monochrome video stream representing confidence. Outputting the video stream 122 such that the depth data may be separated from the confidence data may, in some embodiments, facilitate sensor fusion, in which data from different types of sensors (e.g., lidar, radar, ultrasound, cameras, etc.) may be combined to provide enhanced reliability in, e.g., detecting objects and determining distances of the objects from the vehicle. For example, in a foggy environment, where objects in the scene may not be clearly visible in images captured by a typical camera, the confidence values for pixels of the images may be generally low throughout the images. In such a case, the control system of the vehicle (e.g., the main system controller 30) may determine that the images are not sufficiently reliable to be used.
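The 24-bit depth encoding and the 8-bit confidence encoding described above may be sketched as follows. With 1-millimeter steps, 24 bits span 2^24 millimeters, or about 16777 meters, which is consistent with the approximately 16800-meter range stated above. The assignment of the most significant byte to the red channel, the clamping of out-of-range values, and the function names are assumptions of this sketch.

```python
MAX_CODE = (1 << 24) - 1  # 16,777,215 mm, i.e., about 16.8 km


def encode_depth_mm(depth_m):
    """Encode a distance in meters as a 24-bit (R, G, B) value in 1 mm steps."""
    code = min(max(int(round(depth_m * 1000)), 0), MAX_CODE)
    # Assumed channel order: R holds the most significant byte.
    return (code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF


def decode_depth_m(rgb):
    """Recover the distance in meters from a 24-bit (R, G, B) value."""
    r, g, b = rgb
    return ((r << 16) | (g << 8) | b) / 1000.0


def encode_confidence(confidence, max_confidence=1.0):
    """Quantize a confidence value to an 8-bit unsigned value (0 to 255)."""
    scaled = round(255 * confidence / max_confidence)
    return min(max(scaled, 0), 255)
```

Because the depth and confidence values are encoded separately, a consumer of the stream may decode either one without the other.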

According to some embodiments of the present technology, computer processor 10 of stereo vision system 1 may comprise a lidar confidence processing module 124, and sensors 100 may comprise a lidar sensor configured to illuminate the scene with laser light, to generate lidar image data (e.g., a video stream) and lidar confidence data from reflected light received from the scene, and to output a lidar confidence map 126 to encoder module 112. Optionally, the lidar image data may be output together with lidar confidence map 126. In some embodiments, computer processor 10 may comprise a radar confidence processing module 128, and sensors 100 may comprise a radar sensor configured to illuminate the scene with waves of a known wavelength (e.g., corresponding to a frequency of 76.5 GHz), to generate radar image data (e.g., a video stream) and radar confidence data from reflected waves that have the known wavelength and are reflected from the scene, and to output a radar confidence map 130 to encoder module 112. Optionally, the radar image data may be output together with radar confidence map 130. In some embodiments, computer processor 10 may comprise an acoustic confidence processing module 132, and sensors 100 may comprise a transducer configured to irradiate the scene with acoustic waves (e.g., ultrasound) of a known wavelength (e.g., corresponding to a frequency of 20 kHz), to generate acoustic image data (e.g., a video stream) and acoustic confidence data from reflected waves that have the known wavelength and are reflected from the scene, and to output an acoustic confidence map 134 to encoder module 112. Optionally, the acoustic image data may be output together with acoustic confidence map 134. Techniques for determining confidence values for lidar image data, radar image data, and acoustic image data are discussed below.

According to some embodiments of the present technology, lidar confidence processing module 124 may be configured to receive disparity map 116 and cost volume map 118 from stereo matching module 108 and confidence map 120 from confidence processing module 110. Although not shown in FIG. 1, lidar confidence processing module 124 may, in some embodiments, also be configured to receive one or both of the left and right rectified 2D images 114 from image rectification module 106. Lidar confidence processing module 124 may use confidence map 120 and some or all of the other received information (i.e., disparity map 116, and/or cost volume map 118, and/or one or both of rectified 2D images 114) to perform a comparison and to identify regions where the lidar confidence data has confidence values higher than those of the corresponding regions of confidence map 120. In some embodiments, lidar confidence processing module 124 may provide the comparison information to encoder module 112 together with lidar confidence map 126. Similarly, in some embodiments, disparity map 116, cost volume map 118, and confidence map 120 may be provided to radar confidence processing module 128, which may be configured to provide comparison information to encoder module 112 together with radar confidence map 130. Similarly, in some embodiments, disparity map 116, cost volume map 118, and confidence map 120 may be provided to acoustic confidence processing module 132, which may be configured to provide comparison information to encoder module 112 together with acoustic confidence map 134.

According to some embodiments of the present technology, encoder module 112 may use confidence map 120 and one or more of lidar confidence map 126, radar confidence map 130, and acoustic confidence map 134 to determine whether to encode video stream 122 to include depth information based only on sensor data acquired by cameras 100, or to encode video stream 122 to include depth information based only on another sensor system (e.g., a lidar sensor, a radar sensor, an acoustic sensor, another pair of cameras, etc.), or to perform sensor fusion to combine sensor data derived from a plurality of different sensor systems on the vehicle. In some embodiments, encoder module 112 may be configured to encode video stream 122 such that the image data, depth data, and/or confidence data in video stream 122 correspond to the highest confidence values determined from the different sensor systems. In some embodiments, the frames of video stream 122 may comprise one or more frames based on rectified 2D images 114 and confidence map 120, followed by one or more frames based on the lidar image data and lidar confidence map 126, followed by one or more frames based on the acoustic image data and acoustic confidence map 134. In some embodiments, one or more frames of video stream 122 may each comprise a combination of data derived from different sensor systems. For example, each of the frames of video stream 122 may comprise four quadrants (e.g., upper left, upper right, lower left, lower right), with each quadrant comprising the sensor data having the highest overall confidence (e.g., the highest average confidence level across the pixels of the quadrant). Thus, if debris on one or both of cameras 100 causes the upper-left quadrant of confidence map 120 to have the lowest overall confidence compared with the overall confidence of the vehicle's other sensor systems, while the other three quadrants of confidence map 120 have the highest overall confidence values, the upper-left quadrant of the corresponding frame of video stream 122 may be replaced with data from the sensor system having the highest overall confidence for that quadrant. As will be appreciated, instead of optimizing the frames of video stream 122 frame by frame or quadrant by quadrant, the frames may be optimized in other ways that provide high-reliability estimates of the scene captured by the sensors of the different sensor systems. In some embodiments, when the vehicle is being operated at night or in a dark environment, encoder module 112 may encode video stream 122 using, for example, lidar-based data instead of camera-based data.
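The quadrant-by-quadrant selection described above can be sketched as follows. This is an illustration only: it assumes that each sensor system supplies a confidence map of the same size as a 2D list of values, and the function names `quadrant_means` and `pick_sensor_per_quadrant` and the sensor labels are hypothetical.

```python
def quadrant_means(conf):
    """Mean confidence in each quadrant of a 2D list with equal-length rows."""
    h, w = len(conf), len(conf[0])
    bounds = {
        "upper_left": (0, h // 2, 0, w // 2),
        "upper_right": (0, h // 2, w // 2, w),
        "lower_left": (h // 2, h, 0, w // 2),
        "lower_right": (h // 2, h, w // 2, w),
    }
    means = {}
    for name, (r0, r1, c0, c1) in bounds.items():
        values = [conf[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        means[name] = sum(values) / len(values)
    return means


def pick_sensor_per_quadrant(conf_maps):
    """conf_maps maps a sensor label to its confidence map.

    Returns, for each quadrant, the label of the sensor whose mean
    confidence over that quadrant is highest.
    """
    per_sensor = {name: quadrant_means(m) for name, m in conf_maps.items()}
    quadrants = list(next(iter(per_sensor.values())).keys())
    return {q: max(per_sensor, key=lambda s: per_sensor[s][q]) for q in quadrants}
```

Plain lists keep the sketch free of dependencies; an actual encoder would more likely operate on image arrays, and it could equally apply the same comparison per frame or per arbitrary region rather than per quadrant.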

FIG. 6 depicts an example of video stream 122, according to some embodiments of the present technology. Video stream 122 may comprise a sequence of frames, of which only three are shown: frame N 600, frame N+1 602, and frame N+2 604. In some embodiments, each frame may be a superframe 606 comprising a left (or right) rectified 2D image 114 (in color or grayscale), a disparity map 116 (in color or grayscale), and a confidence map 120 (in color or grayscale). In some embodiments, superframe 606 may also include one or more velocity maps, such as any one or any combination of: a velocity magnitude map, a horizontal velocity map (depicting motion along the x-axis), a vertical velocity map (depicting motion along the y-axis), and a line-of-sight velocity map (depicting motion along the z-axis). The velocity maps may be computed by tracking matched points (or pixels) across two or more frames using dense optical flow and then determining the differences in the disparity-map values of the matched points. In this manner, superframes 606 may be transmitted like any other stream of video frames and may therefore advantageously use existing video technologies for encoding, decoding, transmission, etc. As will be appreciated, disparity map 116 may instead be a depth map, because both types of maps relate to the distance from the sensors (e.g., cameras 100) to the scene captured by the sensors, and the two types of maps are typically inversely proportional to each other. In some embodiments, the "alpha channel" or transparency of each pixel of rectified 2D image 114 of each superframe may be used to store information. For example, a value for the confidence or certainty of the depth estimate for a pixel may be stored in the pixel's alpha channel.
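Two points from the paragraph above lend themselves to a short sketch: the inverse relationship between disparity and depth, and the use of a pixel's alpha channel to carry a confidence value. The first function uses the standard rectified-stereo relation Z = f·B/d, which the description does not state explicitly; the focal length in pixels and the function names are assumptions made for illustration.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Rectified-stereo relation Z = f * B / d.

    Returns None for a non-positive disparity, which has no finite depth.
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def pack_rgba(rgb, confidence):
    """Store an 8-bit confidence value in the alpha slot of an RGBA pixel."""
    if not 0 <= confidence <= 255:
        raise ValueError("confidence must fit in 8 bits")
    r, g, b = rgb
    return (r, g, b, confidence)
```

For instance, with a hypothetical focal length of 3000 pixels and a 1.2 m baseline, a disparity of 60 pixels corresponds to a depth of about 60 m, and halving the disparity doubles the depth.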

FIGS. 7A through 7F show six types of superframes that may be included in video stream 122, according to some embodiments of the present technology. As will be appreciated, different types of superframes may comprise different types of information and may therefore have different sizes. FIG. 7A shows a superframe 750 comprising a 2D RGB (red-green-blue) color image 700 (e.g., left (or right) rectified 2D image 114), a depth map or disparity map 702 (e.g., disparity map 116), and a confidence map 704 (e.g., confidence map 120), concatenated vertically. In some embodiments, each pixel of superframe 750 may use 3 bytes to describe the pixel's disparity or range, and 3 bytes may be used to describe the pixel's confidence level. In some embodiments, each pixel of superframe 750 may be specified using 9 bytes. FIG. 7B shows a superframe 752 comprising a 2D color image 706 and a concatenation of a disparity map and a confidence map. In some embodiments, each pixel of superframe 752 may use 2 bytes for the disparity map and 1 byte for the confidence map, for a total of 3 bytes when concatenated. In some embodiments, each pixel of superframe 752 may be specified using 6 bytes. FIG. 7C shows a superframe 754 comprising a 2D color image formed of channels, including a channel for a red image 710, a channel for a green image 712, and a channel for a blue image 714. Superframe 754 also comprises a 16-bit disparity map, which includes an upper byte 716 and a lower byte 718, and a confidence map. In some embodiments, each pixel of superframe 754 may be specified using 6 bytes. FIG. 7D shows a superframe 756 comprising a color image formed of a red image 722, a green image 724, and a blue image 726, together with a disparity map 728 and a confidence map 730. In some embodiments, each pixel of disparity map 728 may be specified using 5 bytes. FIG. 7E shows a superframe 758 comprising a 2D grayscale image 732, a disparity map 734, and a confidence map 736. In some embodiments, each pixel of superframe 758 may be specified using 3 bytes. FIG. 7F shows a superframe 760 comprising a 2D color image 738, a disparity map 740, and a confidence map 742. Each pixel of color image 738 may be specified using 1 byte. For example, color image 738 may be specified by a Bayer filter pattern or by a lookup table into a color palette, which may allow superframe 760 to be as compact as an RGB color image. In some embodiments, each pixel of superframe 760 may be specified using 3 bytes.
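As an illustration of the FIG. 7C layout, the sketch below stacks six single-byte planes vertically: red, green, blue, the upper byte of a 16-bit disparity, the lower byte of the disparity, and an 8-bit confidence. The plane order, the 8-bit width of the confidence plane, and the function names are assumptions; the description states only which components the superframe comprises and that each pixel may be specified using 6 bytes.

```python
def split_disparity_planes(disparity_rows):
    """Split 16-bit disparity values into upper-byte and lower-byte planes."""
    upper = [[(d >> 8) & 0xFF for d in row] for row in disparity_rows]
    lower = [[d & 0xFF for d in row] for row in disparity_rows]
    return upper, lower


def build_superframe_754(rgb_rows, disparity_rows, confidence_rows):
    """Vertically concatenate R, G, B, disparity-upper, disparity-lower, confidence.

    rgb_rows holds (r, g, b) tuples; the other inputs hold integers.
    Every plane has the same width and height as the input image.
    """
    planes = []
    for channel in range(3):
        planes += [[pixel[channel] for pixel in row] for row in rgb_rows]
    upper, lower = split_disparity_planes(disparity_rows)
    planes += upper + lower
    planes += [list(row) for row in confidence_rows]
    return planes
```

Because every plane is an ordinary single-channel image, a frame built this way could be handed to a standard video encoder, which is the advantage the description attributes to superframes.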

As noted above, in addition to stereo vision confidence map 120, which is based on sensor data from cameras 100, lidar confidence map 126, radar confidence map 130, and acoustic confidence map 134 may be provided to encoder module 112. The lidar sensor, the radar sensor, and the acoustic sensor may have the same or a similar field of view as cameras 100, as described above, or may have overlapping fields of view. In some embodiments, one or more of lidar confidence map 126, radar confidence map 130, and acoustic confidence map 134 may be appended to video stream 122. For example, lidar confidence map 126 (or another confidence map) may be encoded as separate bits within the confidence map of any of superframes 750, 752, 754, 756, 758, 760 of FIGS. 7A through 7F, or may simply be appended to the superframe, e.g., by vertical concatenation.
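One way to read "encoded as separate bits" is to share a single confidence byte between two sensors. The sketch below assumes a 4-bit/4-bit split, with the stereo confidence in the upper nibble and the lidar confidence in the lower nibble; the description does not specify a bit allocation, so the split and the function names are purely illustrative.

```python
def pack_confidences(stereo_conf, lidar_conf):
    """Quantize two 8-bit confidences to 4 bits each and pack them into one byte."""
    for value in (stereo_conf, lidar_conf):
        if not 0 <= value <= 255:
            raise ValueError("confidence must fit in 8 bits")
    return ((stereo_conf >> 4) << 4) | (lidar_conf >> 4)


def unpack_confidences(packed):
    """Return (stereo, lidar) confidences rescaled to the 0-240 range in steps of 16."""
    return ((packed >> 4) & 0xF) << 4, (packed & 0xF) << 4
```

The packing is lossy: each confidence keeps only 16 levels. Vertical concatenation, the alternative mentioned above, preserves the full 8 bits at the cost of a larger superframe.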

According to some embodiments of the present technology, if the lidar sensor is configured to provide a lidar return signal for each pixel of an image, the lidar confidence map (e.g., lidar confidence map 126) may comprise a confidence value for each pixel of the 2D image (e.g., left (or right) rectified 2D image 114). In some embodiments, the lidar sensor may be configured to provide a lidar return signal for, e.g., every second pixel or every third pixel, or for predetermined groups of pixels, which may result in a relatively sparser but nevertheless useful lidar confidence map. If each pixel is identified by its coordinates $(i, j)$, the confidence value of pixel $(i, j)$ may be denoted $m_{ij}$. In some embodiments, the lidar confidence map may be a function of 2D image 114 and disparity map 116 according to
$$m_{ij} = \begin{cases} a_0 R^{-2} (P + P_0)^{-1/2}, & R \ge R_{\min} \\ 0, & R < R_{\min} \end{cases}$$
where $R$ is the distance or range to the object at pixel $(i, j)$, as measured by disparity map 116; $P$ is the background light power (from sunlight or other light sources) at pixel $(i, j)$, as measured by 2D image 114; $a_0$ is a normalization constant; $P_0$ is a fitting constant; and $R_{\min}$ is the minimum range of the lidar sensor. In some embodiments, the value of $P$ may be, for example, the grayscale value of 2D image 114 at pixel $(i, j)$ divided by the sensor gain and exposure values for cameras 100. As indicated above, the 2D image and the disparity map used in generating the lidar confidence map may be 2D image 114 captured by cameras 100 and disparity map 116 determined based on stereo vision processing. In some embodiments, when the object at pixel $(i, j)$ is closer than the minimum range of the lidar sensor, the confidence value for pixel $(i, j)$ may be set to zero. The above expression for the confidence value $m_{ij}$ is derived from an estimate of the lidar signal-to-noise ratio (SNR), which is inversely proportional to the square of the distance or range, $R^2$, and also inversely proportional to the square root of the background light power $P$, i.e.,
$$\mathrm{SNR}_{\mathrm{lidar}} \propto R^{-2} P^{-1/2}$$
That is, the farther the object is from the lidar sensor, the weaker the return signal, and the accuracy of the lidar estimate therefore decreases in proportion to the number of received photons corresponding to the return signal. In addition, any background light (e.g., from the sun, the moon, or artificial sources) may cause "shot noise" to be sensed by the lidar sensor, with the noise energy being equal to the square root of the background light power. Lidar confidence map 126 may therefore be based on two physical quantities: the distance to the object at a pixel and the level of background light at the pixel.
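The lidar confidence expression above translates directly into code. The following is a minimal sketch in which the default values of `a0`, `p0`, and `r_min` are placeholders rather than values from the description, and in which the helper `background_power` assumes that "divided by the sensor gain and exposure values" means division by the product of gain and exposure.

```python
def background_power(gray_value, gain, exposure):
    """Estimate P from a grayscale value (assumes division by gain * exposure)."""
    return gray_value / (gain * exposure)


def lidar_confidence(range_m, power, a0=1.0, p0=1.0, r_min=1.0):
    """m_ij = a0 * R^-2 * (P + P0)^-1/2 for R >= R_min, and 0 otherwise."""
    if range_m < r_min:
        return 0.0
    return a0 * range_m ** -2 * (power + p0) ** -0.5


def lidar_confidence_map(range_map, power_map, **params):
    """Apply lidar_confidence pixel by pixel to two 2D lists of equal shape."""
    return [
        [lidar_confidence(r, p, **params) for r, p in zip(range_row, power_row)]
        for range_row, power_row in zip(range_map, power_map)
    ]
```

With the placeholder constants, a pixel at a range of 2 m with a background power of 3 yields 1 × 2⁻² × 4^(−1/2) = 0.125, while a pixel closer than the minimum range yields 0. The fitting constant keeps the expression finite when the background power is zero.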

Radar sensors and acoustic sensors, like lidar sensors, may also have an SNR that is proportional to $R^{-2} P^{-1/2}$. According to some embodiments of the present technology, this property may be employed to compute a radar confidence map (e.g., radar confidence map 130) and/or an acoustic confidence map (e.g., acoustic confidence map 134) in a manner similar to that described above for computing the lidar confidence map.

According to some embodiments of the present technology, different sensor systems on the vehicle may be arranged to sense objects at different distance ranges. For example, FIG. 8A schematically depicts a beam region 800 irradiated by a radar sensor system or an acoustic sensor system. Beam region 800 may encompass a scene having multiple objects A, B, C, D at multiple distances or ranges $R_A$, $R_B$, $R_C$, $R_D$. The strength of the return signal reflected from an object is proportional to the cross-sectional area of the object, which may be measured by stereo vision system 1 (e.g., the range is based on disparity map 116, and the area of the target is given by left (or right) rectified 2D image 114). In some embodiments, the return signals detected by a sensor system may be categorized into different bins based on signal strength. FIG. 8B is a graph depicting the relationship between cross-sectional area and range for each of objects A, B, C, and D. The dashed line represents the noise floor 802 of the radar sensor system (or lidar sensor system). Noise floor 802 is a known quantity for radar and lidar technologies and may correspond to sensor sensitivity, which is known to degrade as a function of the square of the distance. As noted above, the lidar SNR may be estimated as a value that is inversely proportional to the square of the distance or range. In some embodiments, the lidar (or radar) confidence for a pixel may be determined as a value proportional to the SNR for the pixel. For example, the ratio of the cross-sectional area of an object, which may span multiple pixels, to the range-scaled value of noise floor 802 is a value proportional to the SNR and may be used as the confidence value for the pixels in which the object appears.
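The ratio described above can be sketched under one explicit modeling assumption: that the noise floor, expressed as the smallest detectable cross-sectional area, grows with the square of the range. The constant `k`, the quadratic form, and the function names are hypothetical and chosen only to mirror the stated degradation of sensitivity with distance squared.

```python
def noise_floor_area(range_m, k=1e-4):
    """Hypothetical noise floor: smallest detectable cross-section (m^2) at a range.

    Assumes the floor scales as k * R^2.
    """
    if range_m <= 0:
        raise ValueError("range must be positive")
    return k * range_m ** 2


def object_confidence(cross_section_m2, range_m, k=1e-4):
    """SNR-proportional confidence: cross-section divided by the range-scaled floor."""
    return cross_section_m2 / noise_floor_area(range_m, k)
```

In this sketch a ratio above 1 corresponds to an object lying above the dashed noise-floor line in a plot like FIG. 8B, and the same value would be assigned to every pixel in which the object appears.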

According to an example embodiment of the present technology, the vehicle in which stereo vision system 1 is installed may be a passenger car. Sensors 100 may be two cameras mounted on the upper-left and upper-right portions of the car's windshield (e.g., at the locations of sensors S3, S7 in FIG. 2) and may be configured to capture images that are processed and converted into video stream 122, which may comprise superframes such as superframe 750 discussed above. RGB image 700 of superframe 750 may be analyzed by an object detection module (e.g., a monocular object detector), which may be part of image preprocessing module 104 or may be a separate module of computer processor 10 that processes the image data of the captured images before (or after) the images undergo rectification and before stereo matching is performed. For example, the object detection module may be configured to use a trained convolutional neural network to determine bounding boxes of various shapes in the scene of the captured images. The shapes may correspond to categories of objects that the neural network has been trained to recognize as typical objects often encountered by cars (e.g., other vehicles, pedestrians, cyclists, traffic signs, traffic lights, etc.). Disparity map 702 of superframe 750 may be averaged over a bounding box to determine the average distance to the detected object. In addition, the ground or road surface may be computed from disparity map 702 using known techniques, such as the technique using vertical disparity histograms described by S. Kakegawa et al. in "Road Surface Segmentation based on Vertically Local Disparity Histogram for Stereo Camera" (Int'l J. ITS Res., vol. 16, pp. 90-97, 2018), and the technique using stochastic occupancy grids and dynamic programming described by H. Badino et al. in "Free Space Computation Using Stochastic Occupancy Grids and Dynamic Programming" (Workshop on Dynamical Vision, ICCV, 2007). From the road surface, unclassified road hazards (e.g., a motorcycle helmet, a brick, a wooden pallet from a loose load, a muffler, a tread separated from a tire, etc.) may be found and marked. Small unclassified objects on a typical road surface often have well-defined edge and contour features, which may result in high confidence values in confidence map 704 and may therefore prevent false detections and unnecessary automatic emergency braking. In this example embodiment, if there is debris (e.g., mud) on the windshield that blocks a lower-left portion of the images captured by one or both of cameras 100, confidence processing module 110 may detect that stereo matching module 108 repeatedly reports high costs in the lower-left portion of cost volume map 118 output by stereo matching module 108 and may therefore provide low confidence values for the corresponding region of confidence map 120. The car's main system controller 30 may be configured to determine whether the low confidence values are within an acceptable confidence threshold and, if they are not, may control computer processor 10 to invalidate the data from the lower-left region. That is, in some embodiments, because the confidence values for the lower-left region are below the reliability threshold and the depth values computed for the lower-left region may therefore be invalid, computer processor 10 may be controlled not to use the data from cameras 100 relating to the lower-left region to detect objects in the scene of the captured images, in order to avoid misreadings and, consequently, erroneous control of the car (e.g., to prevent the car from being controlled to perform an evasive maneuver to avoid an object that is not actually in the scene). Instead, as discussed above, data from other sensors on the car may be used if the confidence values for that data satisfy the confidence threshold.
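Two steps of this example embodiment, averaging the disparity over a bounding box and ignoring data whose confidence falls below a threshold, can be combined in a short sketch. Combining them in a single function is a choice made here for illustration; the box format (half-open row and column ranges), the threshold parameter, and the function name are assumptions.

```python
def mean_disparity_in_box(disparity, confidence, box, min_confidence):
    """Average disparity over a bounding box, skipping low-confidence pixels.

    box is (row_start, row_end, col_start, col_end) with exclusive end indices.
    Returns None when no pixel in the box meets the confidence threshold,
    signalling that the region should be treated as invalid.
    """
    r0, r1, c0, c1 = box
    valid = [
        disparity[r][c]
        for r in range(r0, r1)
        for c in range(c0, c1)
        if confidence[r][c] >= min_confidence
    ]
    if not valid:
        return None
    return sum(valid) / len(valid)
```

A `None` result stands in for the invalidated lower-left region in the mud-on-windshield example: the caller would then fall back to another sensor system whose confidence meets the threshold.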

Returning to FIG. 3, main system controller 30 may, according to some embodiments of the present technology, be configured to issue control signals to a plurality of ECUs 34-1, 34-2, 34-3, 34-4, ..., 34-n configured to control various aspects of the vehicle's operation. For example, video stream 122 may be provided to a display of the vehicle (not shown) and may also be provided to main system controller 30, which may use the disparity information and confidence information from video stream 122 to determine, with high certainty (e.g., greater than 90% certainty), that red tail lights are ahead in the vehicle's path, which may indicate a traffic jam or other potential hazard. Main system controller 30 may then issue a control signal to ECU 34-1, which may control the vehicle's powertrain to decelerate and stop the vehicle. In some embodiments, processing by computer processor 10 and/or main system controller 30 may be fast enough that the display can show video corresponding to video stream 122 in real time or near real time (e.g., within less than 2 seconds or less than 1 second) relative to when image data of the scene is captured by cameras 100 and transferred to image acquisition module 102. In some embodiments, ECU 34-1 may receive the control signal in real time or near real time relative to when image data of the scene is captured by cameras 100 and transferred to image acquisition module 102. As noted above, communication between main system controller 30 and ECUs 34-1, 34-2, 34-3, 34-4, ..., 34-n may be via known techniques. For example, command and control line 32 may be a network or bus built into the vehicle and may use known protocols (e.g., Ethernet (registered trademark), CAN, I2C, LIN, etc.) to send signals between main system controller 30 and ECUs 34-1, 34-2, 34-3, 34-4, ..., 34-n.

FIGS. 9A and 9B show an example of data resulting from images of a scene captured by two 5-megapixel cameras and processed according to various embodiments of the present technology described above. The two cameras were mounted on the roof of a standard passenger car such that they were separated by a baseline of 1.2 m. FIG. 9A shows a color disparity map blended with a grayscale camera image (not shown) so that both the distance and the brightness of the scene can be seen at the same time. The disparity map has values from 0 to 1023, encoded with a jet colormap in which red indicates relatively near objects and blue indicates relatively distant objects. The transparency of the disparity map is modulated by the grayscale camera image. FIG. 9B shows the corresponding confidence map of the scene. The confidence map shows, for each pixel, the minimum aggregated cost value from a semi-global matching algorithm such as that described in H. Hirschmuller, "Stereo Processing by Semi-Global Matching and Mutual Information" (2008 IEEE Transactions on Pattern Analysis and Machine Intelligence).
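To make the two maps concrete, the sketch below picks, for each pixel, the disparity with the lowest aggregated cost and keeps that minimum cost as a raw confidence indicator (a lower minimum cost generally suggests a better match). It then converts a disparity to depth with the standard pinhole-stereo relation Z = f * B / d. The 1.2 m baseline comes from the text; the focal length in pixels, the tiny cost volume, and the function names are assumptions for illustration, and this is a winner-take-all step only, not the full semi-global matching algorithm.

```python
BASELINE_M = 1.2           # camera separation stated in the text
FOCAL_LENGTH_PX = 4000.0   # assumed focal length in pixels (not from the source)


def winner_take_all(cost_volume):
    """For each pixel, return (best disparity, minimum aggregated cost).

    cost_volume[row][col] is a list of aggregated costs indexed by disparity.
    """
    disparity_map, min_cost_map = [], []
    for row in cost_volume:
        disparities, costs = [], []
        for pixel_costs in row:
            best = min(range(len(pixel_costs)), key=pixel_costs.__getitem__)
            disparities.append(best)
            costs.append(pixel_costs[best])
        disparity_map.append(disparities)
        min_cost_map.append(costs)
    return disparity_map, min_cost_map


def depth_from_disparity(disparity_px, focal_px=FOCAL_LENGTH_PX, baseline_m=BASELINE_M):
    """Pinhole-stereo depth Z = f * B / d; zero disparity maps to infinity."""
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px


# One row, two pixels, four candidate disparities each.
cost_volume = [[[9, 7, 2, 8], [5, 5, 6, 1]]]
disparity_map, min_cost_map = winner_take_all(cost_volume)
depth_m = depth_from_disparity(48)  # roughly 100 m under the assumed focal length
```

With a wide baseline such as 1.2 m, a given depth produces a larger disparity than it would with a narrow baseline, which is why distant objects remain resolvable in the 0 to 1023 disparity range.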

According to some embodiments of the present technology, the processing modules and components in FIGS. 1, 3, and 5 may be implemented in hardware (e.g., one or more computer processors programmed to perform the procedures and methods described above) and/or in software (e.g., computer-executable code). In some embodiments, at least some of the software may be pre-programmed in a computer processor. In some embodiments, at least some of the software may be stored on a non-transitory computer-readable storage medium, or on a plurality of non-transitory computer-readable storage media, which may be accessed and executed by a computer processor. In some embodiments, a processing module may be a software module, a hardware module, or a module combining hardware and software.

A vehicle assistance system according to the technology described herein may be embodied in different configurations. Example configurations include combinations of configurations (1) through (21), as follows.

(1) An automated vehicle assistance system for supervised or unsupervised vehicle movement, the system comprising: a vehicle control system comprising a computer processor and a memory coupled to the computer processor; and a first sensor system configured to receive first image data of a scene and to output a first disparity map and a first confidence map based on the first image data, wherein the vehicle control system is configured to receive the first disparity map and the first confidence map from the first sensor system and to output a video stream comprising the first disparity map and the first confidence map.

(2) The system of configuration (1), wherein, in the video stream, the first confidence map is encoded to be part of the first disparity map.

(3) The system of configuration (1) or configuration (2), wherein the first image data comprises a plurality of pixels, the disparity map comprises disparity data for each of the pixels, and the confidence map comprises confidence data for each of the pixels.

(4) The system of any one of configurations (1) through (3), wherein the first image data comprises data for left and right two-dimensional (2D) first images, the first sensor system is configured to generate left and right 2D rectified first images and a first cost volume map from the first image data, and the first sensor system is configured to generate the first confidence map from the 2D rectified first images, the first disparity map, and the first cost volume map.

(5) The system of any one of configurations (1) through (4), wherein the first sensor system is configured to generate the first confidence map based on one or both of: a uniqueness value determined from a semi-global matching (SGM) algorithm, and an image texture value determined from a Sobel operation on the first image data.

(6) The system of any one of configurations (1) through (5), further comprising a second sensor system configured to receive second image data of at least a portion of the scene and to output a second confidence map based on the second image data, wherein the vehicle control system is configured to: receive the second confidence map from the second sensor system; and output the video stream as a sequence of superframes, each superframe comprising information based on the first disparity map, the first confidence map, and the second confidence map.

(7) The system of any one of configurations (1) through (6), wherein the vehicle control system is configured to output a control signal to an electronic control unit (ECU) of the vehicle based on information in the video stream.

(8) The system of any one of configurations (1) through (7), wherein the first sensor system is a first sensor module configured to process the first image data to generate the first disparity map and the first confidence map, the second sensor system is a second sensor module configured to process the second image data to generate the second confidence map, the first sensor module and the second sensor module are stored in the memory, and the computer processor is configured to execute the first sensor module and the second sensor module.

(9) The system of any one of configurations (1) through (8), wherein the video stream comprises: at least one superframe comprising the first disparity map and the first confidence map; and at least one superframe comprising the first disparity map and the second confidence map.

(10) The system of any one of configurations (1) through (9), wherein the video stream comprises at least one superframe comprising a portion of the first confidence map and a portion of the second confidence map.

(11) The system of any one of configurations (1) through (10), wherein the first image data comprises stereo vision data and the second image data comprises lidar data.

(12) The system of any one of configurations (1) through (11), further comprising a third sensor system configured to receive third image data of at least a portion of the scene and to output a third confidence map based on the third image data.

(13) The system of any one of configurations (1) through (12), wherein the third image data comprises radar data or acoustic data.

(14) The system of any one of configurations (1) through (13), wherein each superframe of the video stream comprises a two-dimensional (2D) image of the scene, a depth map of the scene, and a certainty map of the scene.

(15) The system of any one of configurations (1) through (14), wherein the certainty map of the scene comprises the first confidence map, or the second confidence map, or a combination of the first confidence map and the second confidence map.

(16) The system of any one of configurations (1) through (15), wherein the depth map of the scene comprises the first disparity map modulated with image data corresponding to the 2D image of the scene, and the certainty map of the scene comprises the first confidence map, or the second confidence map, or a combination of the first confidence map and the second confidence map, modulated with image data corresponding to the 2D image of the scene.

(17) The system of any one of configurations (1) through (16), wherein pixels of the 2D image of the scene, pixels of the depth map of the scene, and pixels of the certainty map of the scene are temporally and spatially matched.

(18) The system of any one of configurations (1) through (17), wherein the vehicle control system is configured to encode disparity information from the first disparity map and confidence information from the first confidence map and the second confidence map to reduce a data size of the video stream.

(19) The system of any one of configurations (1) through (18), further comprising a pair of cameras configured to be mounted on the vehicle, the cameras being configured to provide the first image data to the first sensor system.

(20) The system of any one of configurations (1) through (19), wherein the video stream comprises two-dimensional (2D) color images, each 2D color image comprises a plurality of pixels, and an alpha-channel transparency of each pixel is proportional to a confidence value for the pixel.

(21) The system of any one of configurations (1) through (20), wherein colors of the 2D color images indicate depth ranges.
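Configuration (5) above bases the confidence map on a uniqueness value from the matching costs and on an image texture value from a Sobel operation. A minimal sketch of how those two cues might be computed and combined is given here. The specific uniqueness formula (one minus the ratio of best to second-best cost), the multiplicative combination, and the `texture_scale` parameter are assumptions for illustration and are not taken from the source.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image, row, col):
    """Gradient magnitude at an interior pixel of a 2D grayscale image."""
    gx = gy = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            value = image[row + dr][col + dc]
            gx += SOBEL_X[dr + 1][dc + 1] * value
            gy += SOBEL_Y[dr + 1][dc + 1] * value
    return (gx * gx + gy * gy) ** 0.5


def uniqueness(pixel_costs):
    """1 - best/second-best cost: 0 for ambiguous matches, near 1 for distinct ones."""
    best, second = sorted(pixel_costs)[:2]
    return 0.0 if second == 0 else 1.0 - best / second


def confidence(pixel_costs, image, row, col, texture_scale=255.0):
    """Assumed combination: uniqueness times clipped, normalized texture."""
    texture = min(sobel_magnitude(image, row, col) / texture_scale, 1.0)
    return uniqueness(pixel_costs) * texture


# A vertical edge gives strong texture at the centre pixel.
edge_image = [
    [0, 0, 255],
    [0, 0, 255],
    [0, 0, 255],
]
flat_image = [[7] * 3 for _ in range(3)]
edge_conf = confidence([40, 10, 50], edge_image, 1, 1)
flat_conf = confidence([40, 10, 50], flat_image, 1, 1)
```

The same matching costs yield a high confidence on the edge and zero confidence on the textureless patch, which reflects the intuition that stereo matches in flat regions are unreliable even when one cost happens to be lowest.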
A non-transitory computer-readable storage medium may be configured to store code that, when executed by a computer processor, causes the computer processor to perform a method of an automated vehicle assistance system for supervised or unsupervised vehicle movement according to the technology described herein. Examples of such a computer-readable storage medium include combinations of configurations (22) through (34), as follows.

(22) A non-transitory computer-readable storage medium storing code that, when executed by a computer processor, causes the computer processor to perform a method of an automated vehicle assistance system for supervised or unsupervised vehicle movement, the method comprising: the computer processor obtaining a first disparity map and a first confidence map, the first disparity map and the first confidence map corresponding to first image data of a scene; and the computer processor outputting a video stream comprising the first disparity map and the first confidence map.

(23) The computer-readable storage medium of configuration (22), wherein the outputting of the video stream comprises the computer processor encoding the first confidence map to be part of the first disparity map.

(24) The computer-readable storage medium of configuration (22) or configuration (23), wherein the first image data comprises a plurality of pixels, the disparity map comprises disparity data for each of the pixels, and the confidence map comprises confidence data for each of the pixels.

(25) The computer-readable storage medium of any one of configurations (22) through (24), wherein the method further comprises: the computer processor obtaining a second confidence map corresponding to second image data of at least a portion of the scene; and the computer processor outputting the video stream as a sequence of superframes, each superframe comprising information based on the first disparity map, the first confidence map, and the second confidence map.

(26) The computer-readable storage medium of any one of configurations (22) through (25), wherein the method further comprises the computer processor outputting a control signal to an electronic control unit (ECU) of the vehicle based on information in the video stream.

(27) The computer-readable storage medium of any one of configurations (22) through (26), wherein the method further comprises: the computer processor processing the first image data to obtain the first disparity map and the first confidence map; and the computer processor processing the second image data to obtain the second confidence map.

(28) The computer-readable storage medium of any one of configurations (22) through (27), wherein the outputting of the video stream comprises: the computer processor preparing at least one superframe to comprise the first disparity map and the first confidence map; and the computer processor preparing at least one superframe to comprise the first disparity map and the second confidence map.

(29) The computer-readable storage medium of any one of configurations (22) through (28), wherein the outputting of the video stream comprises the computer processor preparing at least one superframe comprising a portion of the first confidence map and a portion of the second confidence map.

(30) The computer-readable storage medium of any one of configurations (22) through (29), wherein the first image data comprises stereo vision data, and the second image data comprises lidar data, or radar data, or acoustic data.

(31) The computer-readable storage medium of any one of configurations (22) through (30), wherein the outputting of the video stream comprises the computer processor preparing each superframe of the video stream to comprise a two-dimensional (2D) image of the scene, a depth map of the scene, and a certainty map of the scene.

(32) The computer-readable storage medium of any one of configurations (22) through (31), wherein the preparing of each superframe by the computer processor comprises temporally and spatially matching pixels of the 2D image of the scene, pixels of the depth map of the scene, and pixels of the certainty map of the scene.

(33) The computer-readable storage medium of any one of configurations (22) through (32), wherein the outputting of the video stream comprises the computer processor encoding disparity information from the first disparity map and confidence information from the first confidence map and the second confidence map to reduce a data size of the video stream.

(34) The computer-readable storage medium of any one of configurations (22) through (33), wherein the outputting of the video stream comprises preparing two-dimensional (2D) color images such that each 2D color image comprises a plurality of pixels, an alpha-channel transparency of each pixel is proportional to a confidence value for the pixel, and colors of the 2D color images indicate depth ranges.
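The encoding in configuration (34), where color carries depth and the alpha channel carries confidence, can be sketched as a per-pixel RGBA mapping. The red-to-blue linear ramp below is a simple stand-in for a jet colormap, and the near and far limits, function names, and 8-bit channel depth are assumptions for illustration. Alpha is treated here as opacity, so low-confidence pixels fade out; the source wording leaves the direction of the proportionality open.

```python
def depth_to_rgb(depth_m, near_m=1.0, far_m=100.0):
    """Linear red (near) to blue (far) ramp; a simple stand-in for a jet colormap."""
    clamped = max(near_m, min(depth_m, far_m))
    t = (clamped - near_m) / (far_m - near_m)  # 0.0 at near, 1.0 at far
    return (round(255 * (1.0 - t)), 0, round(255 * t))


def encode_pixel(depth_m, confidence):
    """RGBA pixel: color from depth, alpha proportional to confidence in [0, 1]."""
    alpha = round(255 * max(0.0, min(confidence, 1.0)))
    return depth_to_rgb(depth_m) + (alpha,)


def encode_frame(depth_map, confidence_map):
    """Apply encode_pixel to spatially matched depth and confidence maps."""
    return [
        [encode_pixel(d, c) for d, c in zip(depth_row, conf_row)]
        for depth_row, conf_row in zip(depth_map, confidence_map)
    ]


# A near, fully trusted pixel and a far, untrusted pixel.
frame = encode_frame([[1.0, 100.0]], [[1.0, 0.0]])
# frame == [[(255, 0, 0, 255), (0, 0, 255, 0)]]
```

Packing confidence into the alpha channel lets a single four-channel image carry both depth and certainty, so standard image and video tooling can transport it without a separate confidence stream.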

A stereo vision system according to the technology described herein may be embodied in different configurations. Example configurations include combinations of configurations (35) through (46), as follows.

(35) A stereo vision system comprising: a stereo camera system configured to capture a sequence of pairs of images, each pair of images comprising a first image and a second image captured simultaneously; and a computer processor programmed to: receive a stream of image data from the stereo camera system, the image data corresponding to the sequence of pairs of images; for each of the pairs of images, rectify the first image and the second image to generate a two-dimensional (2D) pixel map of matched pixels; determine a depth value for each pixel of the pixel map; determine a confidence value for the depth value for each pixel of the pixel map; and issue a control signal when at least one of the confidence values indicates an image anomaly.

(36) The system of configuration (35), wherein the image anomaly corresponds to one or more pixels of a portion of a confidence map having a confidence value below a predetermined threshold.

(37) The system of configuration (35) or configuration (36), wherein the image anomaly corresponds to one or more pixels of a portion of a confidence map having a confidence value below a predetermined threshold for two or more consecutive pairs of images of the sequence.

(38) The system of any one of configurations (35) through (37), wherein the image anomaly comprises a plurality of pixels of a contiguous region of a confidence map.

(39) The system of any one of configurations (35) through (38), wherein the control signal is configured to cause an audible sound.

(40) The system of any one of configurations (35) through (39), wherein the audible sound is a pre-recorded message.

(41) The system of any one of configurations (35) through (40), wherein the control signal is issued to an engine control module of a vehicle.

(42) The system of any one of configurations (35) through (41), wherein, for each pixel of the pixel map, the confidence value is determined based on the presence or absence of an edge at the pixel, an illumination level of the pixel, and texture values of the first image and the second image from which the pixel map was generated.

(43) The system of any one of configurations (35) through (42), wherein the computer processor is programmed to output a sequence of superframes corresponding to the sequence of pairs of images, each of the superframes comprising a 2D image and a confidence map corresponding to the 2D image.

(44) The system of any one of configurations (35) through (43), wherein the 2D image is the first image or the second image.

(45) The system of any one of configurations (35) through (44), wherein the computer processor is programmed to output the sequence of superframes as a display signal that causes a display to show the 2D image and a visible confidence indicator corresponding to the confidence map.

(46) The system of any one of configurations (35) through (45), wherein the display signal causes the confidence indicator to be displayed, pixel by pixel, as a transparency of each pixel of the 2D image.
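The anomaly criteria of configurations (36) through (38), namely low confidence, a contiguous region, and persistence over consecutive image pairs, could be combined roughly as in the sketch below. The threshold values, the 4-connectivity rule, and the function names are assumptions for illustration. As a simplification, the sketch checks only that some sufficiently large low-confidence region exists in each consecutive frame; it does not verify that the region stays in the same place.

```python
def low_confidence_mask(confidence_map, threshold):
    """True where the per-pixel confidence falls below the threshold."""
    return [[value < threshold for value in row] for row in confidence_map]


def largest_region(mask):
    """Size of the largest 4-connected group of True pixels (iterative flood fill)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = set()
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            stack, size = [(r, c)], 0
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            best = max(best, size)
    return best


def anomaly_detected(confidence_maps, threshold=0.3, min_pixels=3, min_frames=2):
    """True if a large low-confidence region persists over consecutive frames."""
    streak = 0
    for confidence_map in confidence_maps:
        mask = low_confidence_mask(confidence_map, threshold)
        streak = streak + 1 if largest_region(mask) >= min_pixels else 0
        if streak >= min_frames:
            return True
    return False


blocked = [[0.1, 0.1, 0.9], [0.1, 0.9, 0.9]]  # e.g. a partially obscured lens
clear = [[0.9, 0.9, 0.9], [0.9, 0.9, 0.9]]
persistent = anomaly_detected([blocked, blocked])
transient = anomaly_detected([blocked, clear, blocked])
```

Requiring persistence across frames filters out single-frame dropouts, so a control signal such as an audible alert would be raised only for conditions that last.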

A non-transitory computer-readable storage medium may be configured to store code that, when executed by a computer processor, causes the computer processor to perform a method of a vehicle assistance system according to the technology described herein. Examples of such a computer-readable storage medium include combinations of configurations (47) through (59), as follows.

(47) A non-transitory computer-readable storage medium storing code that, when executed by a computer processor, causes the computer processor to perform a method of a stereo vision system, the method comprising: the computer processor receiving a stream of image data from a stereo camera system, the image data corresponding to a sequence of pairs of images, each pair of images comprising a first image and a second image captured simultaneously; for each of the pairs of images, the computer processor rectifying the first image and the second image to generate a two-dimensional (2D) pixel map of matched pixels; determining a depth value for each pixel of the pixel map; determining a confidence value for the depth value for each pixel of the pixel map; and the computer processor issuing a control signal when at least one of the confidence maps indicates an image anomaly.

(48) The computer-readable storage medium of configuration (47), wherein the image anomaly corresponds to one or more pixels of a portion of a confidence map having a confidence value below a predetermined threshold.

(49) The computer-readable storage medium of configuration (47) or configuration (48), wherein the image anomaly corresponds to one or more pixels of a portion of a confidence map having a confidence value below a predetermined threshold for two or more consecutive pairs of images of the sequence.

(50) The computer-readable storage medium of any one of configurations (47) through (49), wherein the image anomaly comprises a plurality of pixels of a contiguous region of a confidence map.

(51) The computer-readable storage medium of any one of configurations (47) through (50), wherein the control signal is configured to cause an audible sound.

(52) The computer-readable storage medium of any one of configurations (47) through (51), wherein the audible sound is a pre-recorded message.

(53) The computer-readable storage medium of any one of configurations (47) through (52), wherein the control signal is issued to an engine control module of a vehicle.

(54) The computer-readable storage medium of any one of configurations (47) through (53), wherein, for each pixel of the pixel map, the confidence value is determined based on the presence or absence of an edge at the pixel, an illumination level of the pixel, and texture values of the first image and the second image from which the pixel map was generated.

(55) The computer-readable storage medium of any one of configurations (47) through (54), wherein the method further comprises the computer processor outputting a sequence of superframes corresponding to the sequence of pairs of images, each of the superframes comprising a 2D image, a disparity map corresponding to the 2D image, and a confidence map corresponding to the 2D image.

(56) The computer-readable storage medium of any one of configurations (47) through (55), wherein the 2D image is the first image or the second image.

(57) The computer-readable storage medium of any one of configurations (47) through (56), wherein the outputting of the sequence of superframes comprises outputting a display signal that causes a display to show the 2D image and a visible confidence indicator corresponding to the confidence map.

(58) The computer-readable storage medium of any one of configurations (47) through (57), wherein the display signal causes the confidence indicator to be displayed, pixel by pixel, as a transparency of each pixel of the 2D image.

(59) The computer-readable storage medium of any one of configurations (47) through (58), wherein each of the superframes comprises a 2D image, a confidence map, and a disparity map corresponding to the 2D image.
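The superframe of configurations (55) and (59) bundles a 2D image with its disparity map and confidence map. One possible in-memory representation is sketched below as a small data class that rejects maps whose shapes do not match. The class name, field names, timestamp field, and nested-list grid type are assumptions for illustration; the source does not specify a data layout.

```python
from dataclasses import dataclass
from typing import List

Grid = List[List[float]]


def shape_of(grid: Grid):
    """(rows, cols) of a rectangular grid; raises if rows have unequal length."""
    widths = {len(row) for row in grid}
    if len(widths) > 1:
        raise ValueError("ragged grid")
    return (len(grid), widths.pop() if widths else 0)


@dataclass(frozen=True)
class Superframe:
    """One time-stamped bundle of pixel-aligned maps (field names are illustrative)."""
    timestamp_s: float
    image_2d: Grid
    disparity_map: Grid
    confidence_map: Grid

    def __post_init__(self):
        grids = (self.image_2d, self.disparity_map, self.confidence_map)
        shapes = {shape_of(grid) for grid in grids}
        if len(shapes) != 1:
            raise ValueError(f"maps are not spatially matched: {sorted(shapes)}")


def build_stream(timestamps, images, disparity_maps, confidence_maps):
    """Zip per-frame outputs into a sequence of superframes."""
    return [
        Superframe(t, img, disp, conf)
        for t, img, disp, conf in zip(timestamps, images, disparity_maps, confidence_maps)
    ]


# Two frames, each a 1x2 image with matching disparity and confidence maps.
stream = build_stream(
    [0.0, 0.1],
    [[[10.0, 20.0]], [[11.0, 21.0]]],
    [[[3.0, 4.0]], [[3.0, 5.0]]],
    [[[0.9, 0.8]], [[0.7, 0.95]]],
)
```

Validating shapes at construction time is one way to enforce the spatial matching between image, depth, and confidence pixels that the configurations call for; temporal matching is represented here only by the shared timestamp.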

Conclusion
It should be understood that various alterations, modifications, and improvements may be made to the structures, configurations, and methods discussed above, and are intended to be within the spirit and scope of the invention disclosed herein. Further, although advantages of the present invention are indicated, it should be appreciated that not every embodiment of the invention will include every described advantage. Some embodiments may not implement any features described as advantageous herein. Accordingly, the foregoing description and the attached drawings are by way of example only.

It should be understood that some aspects of the present technology may be embodied as one or more methods, and that acts performed as part of a method of the present technology may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different from that shown and/or described, which may include performing some acts simultaneously, even though they are shown and/or described as sequential acts in various embodiments.

Various aspects of the present invention may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described above, and are therefore not limited in their application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.

Use of ordinal terms such as "first," "second," "third," etc., in the description and the claims to modify an element does not by itself connote any priority, precedence, or order of one element over another, or the temporal order in which acts of a method are performed. Such terms are used merely as labels to distinguish one element or act having a certain name from another element or act having the same name (but for use of the ordinal term).

本願明細書において定義され、使用されるような、すべての定義は、辞書の定義、本願明細書に援用される文書における定義、および／または定義された用語の通常の意味よりも優先されると理解されるべきである。 All definitions, as defined and used herein, supersede dictionary definitions, definitions in documents incorporated herein, and/or the ordinary meaning of the defined terms. should be understood.

As used herein in the specification and in the claims, the indefinite articles "a" and "an" should be understood to mean "at least one," unless clearly indicated to the contrary.

As used herein in the specification and in the claims, the phrase "at least one," in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements, and not excluding any combination of elements in the list of elements. This definition also allows that elements other than those specifically identified within the list of elements to which the phrase "at least one" refers may optionally be present, whether related or unrelated to those elements specifically identified.

As used herein in the specification and in the claims, the phrase "equal" or "the same," in reference to two values (e.g., distances, widths, etc.), means that the two values are the same within manufacturing tolerances. Thus, two values being equal, or the same, may mean that the two values differ from one another by ±5%.

The phrase "and/or," as used herein in the specification and in the claims, should be understood to mean "either or both" of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with "and/or" should be construed in the same fashion, i.e., as "one or more" of the elements so conjoined. Elements other than those specifically identified by the "and/or" clause may optionally be present, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to "A and/or B," when used in conjunction with open-ended language such as "comprising," can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); and so on.

As used herein in the specification and in the claims, "or" should be understood to have the same meaning as "and/or" as defined above. For example, when separating items in a list, "or" or "and/or" shall be interpreted as being inclusive, i.e., as including at least one, but also more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicating the contrary, such as "only one of" or "exactly one of," or, when used in the claims, "consisting of," refer to the inclusion of exactly one element of a number or list of elements. In general, the term "or" as used herein shall be interpreted as indicating exclusive alternatives (i.e., "one or the other but not both") only when preceded by terms of exclusivity, such as "either," "one of," "only one of," or "exactly one of." "Consisting essentially of," when used in the claims, shall have its ordinary meaning as used in the field of patent law.

Also, the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use herein of terms such as "including," "comprising," "comprised of," "having," "containing," and "involving," and variations thereof, is meant to encompass the items listed thereafter and equivalents thereof, as well as additional items.

The terms "approximately" and "about," as used herein, may be construed to mean within ±20% of a target value in some embodiments, within ±10% of a target value in some embodiments, within ±5% of a target value in some embodiments, and within ±2% of a target value in some embodiments. The terms "approximately" and "about" may equal the target value.

The term "substantially," as used herein, may be construed to mean within 95% of a target value in some embodiments, within 98% of a target value in some embodiments, within 99% of a target value in some embodiments, and within 99.5% of a target value in some embodiments. In some embodiments, the term "substantially" may equal 100% of the target value.

Claims

1. An automated vehicle assistance system for supervised or unsupervised vehicle movement, the system comprising:
a vehicle control system comprising a computer processor and a memory coupled to the computer processor; and
a first sensor system configured to receive first image data of a scene and to output a first disparity map and a first confidence map based on the first image data,
wherein the vehicle control system is configured to:
receive the first disparity map and the first confidence map from the first sensor system; and
output a video stream comprising the first disparity map and the first confidence map.

2. The vehicle assistance system of claim 1, wherein in the video stream the first confidence map is encoded to become part of the first disparity map.
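One way to picture a confidence map that is encoded to become part of a disparity map is to pack both values for a pixel into a single word. The following is a minimal sketch only: the 16-bit layout (12 bits of disparity code, 4 bits of confidence code) and the names `pack_pixel`, `unpack_pixel`, and `pack_maps` are illustrative assumptions and are not taken from the claims.

```python
from typing import List, Tuple

# Hypothetical bit layout (an assumption for illustration):
# the upper 12 bits hold a quantized disparity code,
# the lower 4 bits hold a quantized confidence code.
DISPARITY_BITS = 12
CONFIDENCE_BITS = 4
DISPARITY_MAX = (1 << DISPARITY_BITS) - 1    # largest disparity code
CONFIDENCE_MAX = (1 << CONFIDENCE_BITS) - 1  # largest confidence code


def pack_pixel(disparity: int, confidence: float) -> int:
    """Pack a disparity code and a confidence in [0, 1] into one 16-bit word."""
    if not 0 <= disparity <= DISPARITY_MAX:
        raise ValueError("disparity code out of range")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    conf_code = round(confidence * CONFIDENCE_MAX)
    return (disparity << CONFIDENCE_BITS) | conf_code


def unpack_pixel(word: int) -> Tuple[int, float]:
    """Recover the disparity code and the (quantized) confidence from a word."""
    disparity = word >> CONFIDENCE_BITS
    confidence = (word & CONFIDENCE_MAX) / CONFIDENCE_MAX
    return disparity, confidence


def pack_maps(disparity_map: List[List[int]],
              confidence_map: List[List[float]]) -> List[List[int]]:
    """Pack two same-sized per-pixel maps into one combined map."""
    return [
        [pack_pixel(d, c) for d, c in zip(d_row, c_row)]
        for d_row, c_row in zip(disparity_map, confidence_map)
    ]
```

Under this assumed layout the confidence is quantized to 16 levels, so the round trip through `unpack_pixel` is exact only for confidence values that fall on those levels.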

3. The vehicle assistance system of claim 1, wherein:
the first image data comprises a plurality of pixels;
the first disparity map comprises disparity data for each of the pixels; and
the first confidence map comprises confidence data for each of the pixels.

4. The vehicle assistance system of claim 1, wherein:
the first image data comprises data for left and right two-dimensional (2D) first images;
the first sensor system is configured to generate left and right 2D rectified first images and a first cost volume map from the first image data; and
the first sensor system is configured to generate the first confidence map from the 2D rectified first images, the first disparity map, and the first cost volume map.

5. The vehicle assistance system of claim 1, wherein the first sensor system is configured to generate the first confidence map based on one or both of a uniqueness value determined from a semi-global matching (SGM) algorithm and an image texture metric determined from a Sobel operation on the first image data.
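The two cues named above can be illustrated with a small per-pixel sketch. The specific formulas are assumptions made for illustration and are not taken from the claims: uniqueness is computed here as one minus the ratio of the best to the second-best matching cost (costs assumed non-negative, lower meaning a better match), texture is the Sobel gradient magnitude scaled by a hypothetical `texture_scale`, and the two are combined by multiplication.

```python
from typing import List, Sequence


def uniqueness(costs: Sequence[float]) -> float:
    """Uniqueness of the best match among per-disparity matching costs.

    Assumed definition: 1 - best / second_best, so 0 means the two best
    candidates are indistinguishable and values near 1 mean a clear winner.
    Costs are assumed non-negative, with lower cost meaning a better match.
    """
    if len(costs) < 2:
        return 0.0
    best, second = sorted(costs)[:2]
    if second <= 0:
        return 0.0
    return 1.0 - best / second


def sobel_magnitude(image: List[List[float]], row: int, col: int) -> float:
    """Sobel gradient magnitude at an interior pixel of a grayscale image."""
    def p(dr: int, dc: int) -> float:
        return image[row + dr][col + dc]

    gx = (p(-1, 1) + 2 * p(0, 1) + p(1, 1)) - (p(-1, -1) + 2 * p(0, -1) + p(1, -1))
    gy = (p(1, -1) + 2 * p(1, 0) + p(1, 1)) - (p(-1, -1) + 2 * p(-1, 0) + p(-1, 1))
    return (gx * gx + gy * gy) ** 0.5


def pixel_confidence(costs: Sequence[float], image: List[List[float]],
                     row: int, col: int, texture_scale: float = 255.0) -> float:
    """Combine uniqueness and texture into a confidence in [0, 1] (assumed rule)."""
    texture = min(1.0, sobel_magnitude(image, row, col) / texture_scale)
    return uniqueness(costs) * texture
```

With this combination rule a pixel in a textureless region receives zero confidence no matter how unique its cost minimum is, which reflects the usual difficulty of stereo matching on flat surfaces. Other combination rules, or using only one of the two cues, are equally compatible with the claim language.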

6. The vehicle assistance system of claim 1, further comprising:
a second sensor system configured to receive second image data of at least a portion of the scene and to output a second confidence map based on the second image data,
wherein the vehicle control system is configured to:
receive the second confidence map from the second sensor system; and
output the video stream as a sequence of superframes, each superframe comprising information based on the first disparity map, the first confidence map, and the second confidence map.

7. The vehicle assistance system of claim 6, wherein the vehicle control system is configured to output control signals to a vehicle electronic control unit (ECU) based on the information in the video stream.

8. The vehicle assistance system of claim 6, wherein:
the first sensor system is a first sensor module configured to process the first image data to generate the first disparity map and the first confidence map;
the second sensor system is a second sensor module configured to process the second image data to generate the second confidence map;
the first sensor module and the second sensor module are stored in the memory; and
the computer processor is configured to execute the first sensor module and the second sensor module.

9. The vehicle assistance system of claim 6, wherein the video stream comprises:
at least one superframe comprising the first disparity map and the first confidence map; and
at least one superframe comprising the first disparity map and the second confidence map.

10. The vehicle assistance system of claim 6, wherein the video stream comprises at least one superframe comprising a portion of the first confidence map and a portion of the second confidence map.

11. The vehicle assistance system of claim 6, wherein:
the first image data comprises stereo vision data; and
the second image data comprises lidar data.

12. The vehicle assistance system of claim 11, further comprising a third sensor system configured to receive third image data of at least a portion of the scene and to output a third confidence map based on the third image data.

13. The vehicle assistance system of claim 12, wherein the third image data comprises radar data or acoustic data.

14. The vehicle assistance system of claim 6, wherein each superframe of the video stream comprises:
a two-dimensional (2D) image of the scene;
a depth map of the scene; and
a certainty map of the scene.

15. The vehicle assistance system of claim 14, wherein the certainty map of the scene comprises the first confidence map, the second confidence map, or a combination of the first and second confidence maps.

16. The vehicle assistance system of claim 14, wherein:
the depth map of the scene comprises the first disparity map modulated with image data corresponding to the 2D image of the scene; and
the certainty map of the scene comprises the first confidence map, or the second confidence map, or a combination of the first confidence map and the second confidence map, modulated with image data corresponding to the 2D image of the scene.

17. The vehicle assistance system of claim 14, wherein pixels of the 2D image of the scene, pixels of the depth map of the scene, and pixels of the certainty map of the scene are temporally and spatially matched.
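A superframe whose three per-pixel layers are temporally and spatially matched can be sketched as a small container type. The field names, the single shared `timestamp_us`, and the nested-list representation are assumptions for illustration only. Spatial matching is modeled here simply as all three layers having the same dimensions, so that index `[row][col]` refers to the same scene point in each layer.

```python
from dataclasses import dataclass
from typing import List, Tuple

Grid = List[List[float]]


def _shape(grid: Grid) -> Tuple[int, int]:
    """Return (rows, cols) of a rectangular grid, rejecting ragged rows."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    if any(len(row) != cols for row in grid):
        raise ValueError("grid rows must have equal length")
    return rows, cols


@dataclass
class Superframe:
    """Hypothetical superframe: one capture time, three pixel-aligned layers."""
    timestamp_us: int      # one timestamp shared by all layers (temporal match)
    image_2d: Grid         # 2D image of the scene (grayscale for simplicity)
    depth_map: Grid        # depth value per pixel
    certainty_map: Grid    # certainty value per pixel

    def __post_init__(self) -> None:
        # Spatial match: every layer must have the same dimensions as the image.
        shape = _shape(self.image_2d)
        for name in ("depth_map", "certainty_map"):
            if _shape(getattr(self, name)) != shape:
                raise ValueError(f"{name} does not match the 2D image dimensions")

    def pixel(self, row: int, col: int) -> Tuple[float, float, float]:
        """Return (intensity, depth, certainty) for one scene point."""
        return (self.image_2d[row][col],
                self.depth_map[row][col],
                self.certainty_map[row][col])
```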

18. The vehicle assistance system of claim 17, wherein the vehicle control system is configured to encode disparity information from the first disparity map and confidence information from the first and second confidence maps to reduce a data size of the video stream.

19. The vehicle assistance system of claim 6, further comprising a pair of cameras configured to be mounted on a vehicle,
wherein the cameras are configured to provide the first image data to the first sensor system.

20. The vehicle assistance system of claim 6, wherein the video stream comprises two-dimensional (2D) color images, each 2D color image comprising a plurality of pixels, an alpha channel transparency of each pixel being proportional to a confidence value for the pixel.

21. The vehicle assistance system of claim 20, wherein colors of the 2D color image indicate depth range.
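A color image in which color indicates depth range and the alpha channel tracks confidence can be sketched per pixel as follows. The red-to-blue color ramp, the `near_m` and `far_m` limits, and the function name `depth_to_rgba` are illustrative assumptions. The sketch also assumes that the stored alpha value grows with confidence, so that low-confidence pixels fade out when composited; the opposite convention would be an equally valid reading of a proportional relationship.

```python
from typing import Tuple


def depth_to_rgba(depth_m: float, confidence: float,
                  near_m: float = 1.0, far_m: float = 100.0) -> Tuple[int, int, int, int]:
    """Map one pixel's depth and confidence to an 8-bit RGBA tuple.

    Color encodes depth range (assumed ramp: red at near_m, blue at far_m).
    Alpha is proportional to confidence (0 -> 0, 1 -> 255).
    """
    clipped = min(max(depth_m, near_m), far_m)
    t = (clipped - near_m) / (far_m - near_m)   # 0 at near limit, 1 at far limit
    red = round(255 * (1.0 - t))
    blue = round(255 * t)
    alpha = round(255 * min(max(confidence, 0.0), 1.0))
    return (red, 0, blue, alpha)
```

For example, under these assumptions a fully trusted pixel at the near limit maps to opaque red, and an untrusted pixel at or beyond the far limit maps to fully transparent blue.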

22. A non-transitory computer-readable storage medium storing code that, when executed by a computer processor, causes the computer processor to perform a method of an automated vehicle assistance system for supervised or unsupervised vehicle movement, the method comprising:
the computer processor obtaining a first disparity map and a first confidence map, the first disparity map and the first confidence map corresponding to first image data of a scene; and
the computer processor outputting a video stream comprising the first disparity map and the first confidence map.

23. The computer-readable storage medium of claim 22, wherein the step of outputting the video stream comprises the computer processor encoding the first confidence map to become part of the first disparity map.

24. The computer-readable storage medium of claim 22, wherein:
the first image data comprises a plurality of pixels;
the first disparity map comprises disparity data for each of the pixels; and
the first confidence map comprises confidence data for each of the pixels.

25. The computer-readable storage medium of claim 22, wherein the method further comprises:
the computer processor obtaining a second confidence map corresponding to second image data of at least a portion of the scene; and
the computer processor outputting the video stream as a sequence of superframes, each superframe comprising information based on the first disparity map, the first confidence map, and the second confidence map.

26. The computer-readable storage medium of claim 25, wherein the method further comprises the computer processor outputting a control signal to a vehicle electronic control unit (ECU) based on the information in the video stream.

27. The computer-readable storage medium of claim 25, wherein the method further comprises:
the computer processor processing the first image data to obtain the first disparity map and the first confidence map; and
the computer processor processing the second image data to obtain the second confidence map.

28. The computer-readable storage medium of claim 25, wherein the step of outputting the video stream comprises:
the computer processor preparing at least one superframe to comprise the first disparity map and the first confidence map; and
the computer processor preparing at least one superframe to comprise the first disparity map and the second confidence map.

29. The computer-readable storage medium of claim 25, wherein the step of outputting the video stream comprises the computer processor preparing at least one superframe comprising a portion of the first confidence map and a portion of the second confidence map.

30. The computer-readable storage medium of claim 25, wherein:
the first image data comprises stereo vision data; and
the second image data comprises lidar data, radar data, or acoustic data.

31. The computer-readable storage medium of claim 25, wherein the step of outputting the video stream comprises the computer processor preparing each superframe of the video stream to comprise:
a two-dimensional (2D) image of the scene;
a depth map of the scene; and
a certainty map of the scene.

32. The computer-readable storage medium of claim 31, wherein the step of preparing each superframe comprises the computer processor temporally and spatially matching pixels of the 2D image of the scene, pixels of the depth map of the scene, and pixels of the certainty map of the scene.

33. The computer-readable storage medium of claim 31, wherein the step of outputting the video stream comprises the computer processor encoding disparity information from the first disparity map and confidence information from the first confidence map and the second confidence map to reduce a data size of the video stream.

34. The computer-readable storage medium of claim 33, wherein the step of outputting the video stream comprises preparing two-dimensional (2D) color images such that:
each 2D color image comprises a plurality of pixels;
an alpha channel transparency of each pixel is proportional to a confidence value for the pixel; and
colors of the 2D color image indicate depth range.

35. A stereo vision system comprising:
a stereo camera system configured to capture a sequence of image pairs, each pair of images comprising a first image and a second image captured simultaneously; and
a computer processor programmed to:
receive a stream of image data from the stereo camera system, the image data corresponding to the sequence of image pairs;
for each of the pairs of images:
rectify the first image and the second image to generate a two-dimensional (2D) pixel map of matched pixels,
determine a depth value for each pixel of the pixel map, and
determine a confidence value for the depth value for each pixel of the pixel map; and
issue a control signal when at least one of the confidence values indicates an image anomaly.

36. The system of claim 35, wherein the image anomaly corresponds to one or more pixels of a portion of a confidence map having a confidence value below a predetermined threshold.

37. The system of claim 35, wherein the image anomaly corresponds to one or more pixels of a portion of a confidence map having confidence values below a predetermined threshold for two or more consecutive pairs of images of the sequence.

38. The system of claim 35, wherein the image anomaly comprises a plurality of pixels in a contiguous region of a confidence map.
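The three anomaly criteria above (confidence below a threshold, persistence over consecutive image pairs, and a contiguous region of pixels) can be combined in a short sketch. The default values for `threshold`, `min_pixels`, and `min_frames`, the use of 4-connectivity, and all function names are assumptions for illustration and are not specified in the claims.

```python
from collections import deque
from typing import List, Sequence

ConfidenceMap = List[List[float]]
Mask = List[List[bool]]


def low_confidence_mask(confidence_map: ConfidenceMap, threshold: float) -> Mask:
    """Flag pixels whose confidence is below the threshold."""
    return [[value < threshold for value in row] for row in confidence_map]


def persistent_mask(masks: Sequence[Mask]) -> Mask:
    """Keep only pixels flagged in every mask (same pixel, consecutive frames)."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[all(m[r][c] for m in masks) for c in range(cols)] for r in range(rows)]


def largest_region(mask: Mask) -> int:
    """Size of the largest 4-connected region of flagged pixels (breadth-first search)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best


def has_anomaly(recent_maps: Sequence[ConfidenceMap], threshold: float = 0.3,
                min_pixels: int = 2, min_frames: int = 2) -> bool:
    """True when a contiguous low-confidence region persists over the last min_frames maps."""
    if len(recent_maps) < min_frames:
        return False
    masks = [low_confidence_mask(m, threshold) for m in recent_maps[-min_frames:]]
    return largest_region(persistent_mask(masks)) >= min_pixels
```

A caller would keep the most recent confidence maps in a short buffer and issue the control signal whenever `has_anomaly` returns true. Requiring persistence and contiguity in this way is one possible guard against reacting to isolated single-frame noise.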

39. The system of claim 35, wherein the control signal is configured to cause an audible sound.

40. The system of claim 39, wherein the audible sound is a pre-recorded message.

41. The system of claim 35, wherein the control signal is issued to an engine control module of a vehicle.

42. The system of claim 35, wherein, for each pixel of the pixel map, the confidence value is determined based on a presence or absence of an edge at the pixel, an illumination level of the pixel, and texture values of the first and second images from which the pixel map was generated.

43. The system of claim 35, wherein the computer processor is programmed to output a sequence of superframes corresponding to the sequence of image pairs, each superframe comprising a 2D image and a confidence map corresponding to the 2D image.

44. The system of claim 43, wherein said 2D image is said first image or said second image.

45. The system of claim 43, wherein the computer processor is programmed to output the sequence of superframes as a display signal that causes a display to show the 2D image and a visual confidence indicator corresponding to the confidence map.

46. The system of claim 45, wherein the display signal causes the confidence indicator to be displayed on a pixel-by-pixel basis as the transparency of each pixel of the 2D image.

47. The system of claim 43, wherein each of the superframes comprises the 2D image, the confidence map, and a disparity map corresponding to the 2D image.

48. A non-transitory computer-readable storage medium storing code that, when executed by a computer processor, causes the computer processor to perform a method of a stereo vision system, the method comprising:
the computer processor receiving a stream of image data from a stereo camera system, the image data corresponding to a sequence of image pairs, each pair of images comprising a first image and a second image captured simultaneously;
for each of the pairs of images, the computer processor:
rectifying the first image and the second image to generate a two-dimensional (2D) pixel map of matched pixels,
determining a depth value for each pixel of the pixel map, and
determining a confidence value for the depth value for each pixel of the pixel map; and
the computer processor issuing a control signal when at least one of the confidence values indicates an image anomaly.

49. The computer-readable storage medium of claim 48, wherein the image anomaly corresponds to one or more pixels of the portion of the confidence map having a confidence value below a predetermined threshold.

50. The computer-readable storage medium of claim 48, wherein the image anomaly corresponds to one or more pixels of a portion of a confidence map that have confidence values below a predetermined threshold for two or more consecutive pairs of images of the sequence.

51. The computer-readable storage medium of claim 48, wherein the image anomaly comprises a plurality of pixels in a contiguous region of a confidence map.

52. The computer-readable storage medium of claim 48, wherein the control signal is configured to cause an audible sound.

53. The computer-readable storage medium of claim 52, wherein the audible sound is a pre-recorded message.

54. The computer-readable storage medium of claim 48, wherein the control signal is issued to an engine control module of a vehicle.

55. The computer-readable storage medium of claim 48, wherein, for each pixel of the pixel map, the confidence value is determined based on a presence or absence of an edge at the pixel, an illumination level of the pixel, and texture values of the first and second images from which the pixel map was generated.

56. The computer-readable storage medium of claim 48, wherein the method further comprises the computer processor outputting a sequence of superframes corresponding to the sequence of image pairs, each of the superframes comprising a 2D image, a disparity map corresponding to the 2D image, and a confidence map corresponding to the 2D image.

57. The computer-readable storage medium of claim 56, wherein said 2D image is said first image or said second image.

58. The computer-readable storage medium of claim 56, wherein the step of outputting the sequence of superframes comprises outputting a display signal that causes a display to show the 2D image and a visual confidence indicator corresponding to the confidence map.

59. The computer-readable storage medium of Claim 58, wherein the display signal causes the confidence indicator to be displayed on a pixel-by-pixel basis as transparency for each pixel of the 2D image.

60. The computer-readable storage medium of claim 56, wherein each of the superframes comprises the 2D image, the confidence map, and a disparity map corresponding to the 2D image.