JP2023508276A

JP2023508276A - Maps comprising covariances in multi-resolution voxels

Info

Publication number: JP2023508276A
Application number: JP2022537282A
Authority: JP
Inventors: カーステンボッセマイケル; ブラエスパトリック; アダムスデレク; レブサメンブライス
Original assignee: ズークスインコーポレイテッド
Priority date: 2019-12-20
Filing date: 2020-12-15
Publication date: 2023-03-02
Also published as: WO2021127692A1; EP4078534A1; EP4078534A4; CN114868154A

Abstract

Techniques for representing a scene or map based on statistical data of captured environmental data are described herein. In some cases, the data (e.g., covariance data, mean data, etc.) may be stored as a multi-resolution voxel space that includes a plurality of semantic layers. In some cases, individual semantic layers may include multiple voxel grids having different resolutions. Multiple multi-resolution voxel spaces may be merged to generate a combined scene based on the covariances of voxels detected at one or more resolutions.

Description

Cross-Reference to Related Applications
This application claims priority to U.S. Application No. 16/722,598, filed December 20, 2019, and entitled "MAPS COMPRISING COVARIANCES IN MULTI-RESOLUTION VOXELS," which is incorporated herein by reference in its entirety.

Data can be captured in an environment and represented as a map of the environment. Often, such maps are used by vehicles navigating within the environment, although the maps can be used for a variety of purposes. In some cases, the environment can be represented as a two-dimensional map, while in other cases the environment can be represented as a three-dimensional map. Further, surfaces within the environment are often represented using a plurality of polygons or triangles.

The detailed description is given with reference to the accompanying drawings. In the drawings, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears. The use of the same reference numbers in different drawings indicates similar or identical components or features.

FIG. 1 is an example diagram illustrating an example architecture of a multi-resolution voxel space, as described herein.
FIG. 2 is an example pictorial diagram illustrating example resolutions of a semantic layer of a multi-resolution voxel space, as described herein.
FIG. 3 is an example diagram illustrating a first resolution of the multi-resolution voxel space of FIG. 2, as described herein.
FIG. 4 is an example diagram illustrating a second resolution of the multi-resolution voxel space of FIG. 2, as described herein.
FIG. 5 is an example diagram illustrating a third resolution of the multi-resolution voxel space of FIG. 2, as described herein.
FIG. 6 is an example process flow diagram illustrating an example data flow of a system configured to align data representing a physical environment with map data, as described herein.
FIG. 7 is an example flow diagram illustrating an example process associated with generating a multi-resolution voxel space, as described herein.
FIG. 8 is another flow diagram illustrating an example process for aligning a target multi-resolution voxel space with a reference multi-resolution voxel space, as described herein.
FIG. 9 is a block diagram of an example system for implementing the multi-resolution voxel space alignment system, as described herein.
FIG. 10 is a pictorial diagram of the example multi-resolution voxel space of FIGS. 2-4 compared with a point cloud representation of the captured data, as described herein.

The techniques described herein are directed to determining and/or using map data comprising a multi-resolution voxel space whose voxels store the spatial mean, covariance, and weight of the point distributions of data representing a physical environment. The map data may include multiple voxel grids or voxel layers that represent the physical environment at different resolutions or physical distances. As an example, each voxel layer may represent the physical environment at twice the resolution of the preceding layer (that is, with voxels twice as long along each edge). A voxel in a first layer may represent a first volume (e.g., 10 cm x 10 cm x 10 cm), while a voxel in a second layer may represent a second volume (e.g., 20 cm x 20 cm x 20 cm). Data associated with the voxels of the multi-resolution voxel space may be represented as a plurality of covariance ellipsoids. The covariance ellipsoid representation may be generated based on the mean and covariance values calculated from the data points associated with individual voxels. In some cases, voxel data can be associated with semantic information, such as classification and/or segmentation information, and data associated with a particular classification can be associated with a particular multi-resolution voxel space associated with that classification. In this example, each voxel covariance semantic layer may contain, as covariance ellipsoids, the data points associated with a particular semantic class (e.g., tree, vehicle, building, etc.).
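The per-voxel statistics described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `voxel_statistics`, the use of NumPy, the floor-based voxel indexing, and the choice of the population covariance (dividing by the point count) are all assumptions made for the example.

```python
import numpy as np

def voxel_statistics(points, voxel_size):
    """Group 3-D points by voxel index; return {index: (weight, mean, covariance)}."""
    points = np.asarray(points, dtype=float)
    # Integer voxel index of every point (assumed indexing scheme).
    indices = np.floor(points / voxel_size).astype(int)
    stats = {}
    for row in np.unique(indices, axis=0):
        key = tuple(int(v) for v in row)
        members = points[np.all(indices == row, axis=1)]
        mean = members.mean(axis=0)
        centered = members - mean
        cov = centered.T @ centered / len(members)  # population covariance
        stats[key] = (len(members), mean, cov)      # weight = number of points
    return stats

# Three points in a grid of 10 cm voxels: two share a voxel, one is alone.
points = [[0.01, 0.02, 0.03], [0.03, 0.04, 0.05], [0.25, 0.01, 0.01]]
stats = voxel_statistics(points, voxel_size=0.10)
weight, mean, cov = stats[(0, 0, 0)]
```

Storing only the weight, mean, and covariance keeps each voxel at a fixed size regardless of how many points fall into it.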

In some cases, the map data represented by the multi-resolution voxel space may be generated from data points representing the physical environment, such as the output of a light detection and ranging (lidar) system. As an example, the system may receive a plurality of lidar points or lidar data represented as a point cloud. The system may assign or otherwise associate the lidar points with voxels of a voxel grid having a first resolution (e.g., the voxel grid of the multi-resolution voxel space having the largest number of voxels), based at least in part on a local reference frame of the vehicle (e.g., of the system capturing the lidar points). The system may determine statistical data associated with each voxel, such as the mean, centroid, covariance, and the like of the accumulated data. The system may then merge or otherwise combine voxels (or data associated with the voxels) of a lower-resolution grid when generating a higher-level voxel grid. For example, voxels (or data associated with the voxels) within a three-dimensional neighborhood of the lower-resolution grid (e.g., a number of voxels in the x, y, and z directions associated with the physical space of the higher-level voxel) may be merged when forming the next higher-level voxel grid. In one particular example, the voxels within a neighborhood are merged by taking a weighted sum of the individual Gaussian distributions of each voxel of the lower-resolution grid. In some cases, merging the voxels of a lower-resolution grid to form a higher-resolution grid (in this document, a higher resolution or level denotes a coarser grid with larger voxels) is not only computationally inexpensive but also allows the lower-resolution grids to assist in localizing the lidar data with respect to the reference frame.
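One way to read "a weighted sum of individual Gaussian distributions" is moment matching: the merged voxel keeps the total weight, the weighted mean, and the covariance of the resulting mixture. The sketch below makes that assumption explicit; the text does not spell out the formula, and `merge_gaussians` is a hypothetical name.

```python
import numpy as np

def merge_gaussians(children):
    """Merge child voxels given as (weight, mean, covariance) into one parent.

    Assumed formula (moment matching of the weighted mixture):
      W     = sum_i w_i
      mu    = sum_i w_i * mu_i / W
      Sigma = sum_i w_i * (Sigma_i + (mu_i - mu)(mu_i - mu)^T) / W
    """
    total = sum(w for w, _, _ in children)
    mean = sum(w * np.asarray(m, dtype=float) for w, m, _ in children) / total
    cov = np.zeros((3, 3))
    for w, m, c in children:
        d = np.asarray(m, dtype=float) - mean
        # Child spread plus the spread contributed by the offset of its mean.
        cov += w * (np.asarray(c, dtype=float) + np.outer(d, d))
    return total, mean, cov / total

# Two single-point children, 2 m apart along x, merged into one parent voxel.
children = [
    (1, [0.0, 0.0, 0.0], np.zeros((3, 3))),
    (1, [2.0, 0.0, 0.0], np.zeros((3, 3))),
]
weight, mean, cov = merge_gaussians(children)
```

Because the merge touches only a handful of 3 x 3 matrices per parent voxel, each coarser grid can be built from the grid below it without revisiting the raw points.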

In some implementations, the system may utilize the multi-resolution voxel space to align multiple scans of a physical environment in order to generate maps and scenes of the physical environment, as well as to assist in localizing a vehicle within the map or scene. As an example, once a multi-resolution voxel space (e.g., a target multi-resolution voxel space) is generated for a particular scan or dataset representing the physical environment, the system may align the generated multi-resolution voxel space with a multi-resolution voxel space representing the scene (e.g., a reference multi-resolution voxel space). In some cases, the alignment may be performed by finding correspondences between voxels at each resolution of the reference and target multi-resolution voxel spaces substantially simultaneously. For example, for each voxel of a particular resolution of the target multi-resolution voxel space, the system may search for occupied voxels among the voxels of the corresponding resolution of the reference multi-resolution voxel space that lie within a threshold distance, or within a threshold number of voxels (e.g., a neighborhood of voxels), of the voxel containing the mean target point. In examples that include semantic layers, the system may, for each voxel of a particular resolution of each semantic layer of the target multi-resolution voxel space, search the neighborhood of the voxel containing the mean target point at the corresponding resolution of the corresponding semantic layer of the reference multi-resolution voxel space.
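The neighborhood search can be sketched as below for a single resolution of a single semantic layer. The dictionary layout (`{voxel index: centroid}`), the function name `find_correspondence`, the cubic neighborhood, and the nearest-centroid tie-breaking are assumptions for illustration only.

```python
import itertools
import math

def find_correspondence(target_mean, reference, voxel_size, radius=1):
    """Return the index of the occupied reference voxel whose centroid is nearest
    to target_mean, searching only a (2*radius+1)^3 neighborhood, or None."""
    # Reference voxel that contains the target voxel's mean point.
    center = tuple(math.floor(c / voxel_size) for c in target_mean)
    best, best_dist = None, math.inf
    for offset in itertools.product(range(-radius, radius + 1), repeat=3):
        key = tuple(c + o for c, o in zip(center, offset))
        mean = reference.get(key)
        if mean is None:  # unoccupied voxel
            continue
        dist = math.dist(target_mean, mean)
        if dist < best_dist:
            best, best_dist = key, dist
    return best

# Occupied reference voxels (10 cm grid), keyed by index, storing centroids.
reference = {
    (0, 0, 0): (0.05, 0.05, 0.05),
    (1, 0, 0): (0.12, 0.05, 0.05),
    (5, 5, 5): (0.55, 0.55, 0.55),
}
match = find_correspondence((0.11, 0.05, 0.05), reference, voxel_size=0.10)
no_match = find_correspondence((2.0, 2.0, 2.0), reference, voxel_size=0.10)
```

With semantic layers, the same search would simply be run against the reference grid of the matching class, so a foliage voxel is never paired with a building voxel.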

Of the voxels identified within the neighborhood of the reference multi-resolution voxel space, the system may select the voxel having the centroid closest to the voxel of the target multi-resolution voxel space. The system may then average the distribution of the selected voxel in the reference multi-resolution voxel space with that of the voxel of the target covariance stack. Next, the system may perform principal component analysis on the combined covariance matrix and select the eigenvector associated with an eigenvalue (e.g., the smallest eigenvalue) as the matched normal vector for the two voxels. The system may then determine a residual (or error, etc.) for each of the matched voxels, which, in at least some examples, may be based at least in part on the matched normal vector, and subsequently perform an optimization over all such residuals. The optimization may minimize the distance between the pairs of voxel centroids. In this manner, a merged voxel representing the two voxels may be placed within the grid at a position that accurately represents the covariances (e.g., of the associated data) and weights of both original voxels. Of course, some applications do not require such merging of voxels. As a non-limiting example, the relative transform between two voxel spaces may be used for localization generally, without combining (merging) the voxels.
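The matching step can be sketched as follows. Two choices here are assumptions rather than statements from the text: the distributions are combined with an unweighted average of the two covariance matrices, and the residual is taken to be the centroid offset projected onto the matched normal (a point-to-plane style error). The function names are hypothetical.

```python
import numpy as np

def matched_normal(cov_ref, cov_tgt):
    """Unit eigenvector of the averaged covariance with the smallest eigenvalue."""
    combined = 0.5 * (np.asarray(cov_ref, dtype=float) + np.asarray(cov_tgt, dtype=float))
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    _, eigenvectors = np.linalg.eigh(combined)
    return eigenvectors[:, 0]

def residual(mean_ref, mean_tgt, normal):
    """Signed distance between the two centroids along the matched normal."""
    offset = np.asarray(mean_tgt, dtype=float) - np.asarray(mean_ref, dtype=float)
    return float(np.dot(normal, offset))

# Two roughly planar voxels (thin along z), e.g. patches of a road surface.
cov_ref = np.diag([1.0, 1.0, 0.01])
cov_tgt = np.diag([2.0, 1.0, 0.03])
normal = matched_normal(cov_ref, cov_tgt)
error = residual([0.0, 0.0, 0.0], [1.0, 1.0, 0.5], normal)
```

An optimizer would then adjust the transform between the two voxel spaces to reduce the sum of squared residuals over all matched pairs. The sign of an eigenvector is arbitrary, so only the magnitude of a single residual is meaningful.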

During alignment, even where each layer may be merged substantially simultaneously, the coarser resolutions (e.g., resolutions corresponding to larger voxels) may result in matches before the finer resolutions. In this manner, matches at the coarser resolutions may help bring the two multi-resolution voxel spaces into closer alignment, so that the finer resolutions can begin to match and complete the alignment process. In some cases, by merging captured sensor data into a multi-resolution voxel space representing the environment, a vehicle may be able to initialize or localize its position within the environment more accurately and/or more quickly than systems that utilize traditional map data comprising polygons and/or meshes. In addition, storing the voxels in a multi-resolution voxel space keeps the data in a more easily indexable/searchable form, which may improve processing speed and throughput. For example, if a coarse resolution is acceptable for the task at hand, only the coarse layers may be loaded into memory, reducing the amount of data accessed and processed for the desired operation.

In some cases, the multi-resolution voxel space may represent the environment more accurately than conventional systems, because each layer of the space provides a different resolution of detail about the environment. Thus, in some situations, having access to a more detailed representation of the physical environment may improve the overall safety of an autonomous vehicle.

FIG. 1 is an example diagram illustrating an example architecture 100 of a multi-resolution voxel space 102, as described herein. In the current example, the multi-resolution voxel space 102 is formed from a plurality of semantic layers, illustrated as semantic layers 104, 106, and 108. Each of the semantic layers 104-108 may represent data for a particular semantic class or type. As an example, the first semantic layer 104 may include data representing trees, while the second semantic layer 106 may include data representing buildings. Thus, the multi-resolution voxel space 102, including the plurality of semantic layers 104-108, may represent the data from each semantic layer 104-108 as a full picture or map of the physical environment, as illustrated below with respect to FIGS. 2-5. In some cases, some applications may require identification or awareness of only particular semantic classes, while other applications may require a detailed understanding of the entire physical environment. By segmenting the multi-resolution voxel space 102 into the semantic layers 104-108, each application may process only the appropriate classes or types of data representing the environment, thereby improving processing speed in some applications.

Further, each of the semantic layers 104-108 may include one or more voxel grids, illustrated as voxel covariance grids 110, 112, and 114. Each of the voxel covariance grids 110-114 represents the same semantic data of the corresponding semantic layer 104-108, but at a different resolution. As an example, a first voxel covariance grid of the plurality of grids 110 may have voxels with a size of approximately 25 centimeters, while a second voxel covariance grid of the plurality of grids 110 may have voxels with a size of approximately 16 meters. Thus, each voxel covariance grid of each of the plurality of grids 110-114 may have a different resolution or coarseness to assist in aligning and processing the data representing the physical environment. For example, some applications may require only a coarse, general understanding of the physical environment, while other applications may require a detailed understanding of it; each application may process the voxel grids at a desired or appropriate resolution, thereby improving processing speed in some applications.

In some examples, such as those illustrated below with respect to FIGS. 2-5, the data associated with the voxels of the voxel covariance grids 110-114 of the multi-resolution voxel space 102 may be represented by voxels that store a covariance matrix, a mean, and a weight representing the point distribution. In some cases, the voxels of the grids 110-114 may be presented visually as covariance ellipsoids. The covariance ellipsoids may be based at least in part on eigenvalue-ratio shape parameters of each voxel.
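A covariance ellipsoid can be derived from a voxel's stored mean and covariance by eigendecomposition: the eigenvectors give the axis directions and the square roots of the eigenvalues give the semi-axis lengths. The sketch below is an illustrative assumption about how such an ellipsoid might be computed for display; `covariance_ellipsoid` and the `n_sigma` scale factor are hypothetical.

```python
import numpy as np

def covariance_ellipsoid(mean, cov, n_sigma=1.0):
    """Return (center, semi-axis lengths, axis directions) for a voxel's ellipsoid.

    Axis directions are the columns of the returned matrix, ordered from the
    shortest to the longest axis.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(np.asarray(cov, dtype=float))
    # Clip tiny negative values that can arise from floating-point round-off.
    radii = n_sigma * np.sqrt(np.clip(eigenvalues, 0.0, None))
    return np.asarray(mean, dtype=float), radii, eigenvectors

center, radii, axes = covariance_ellipsoid([1.0, 2.0, 3.0], np.diag([4.0, 1.0, 0.25]))
```

Ratios between the semi-axis lengths indicate shape: one dominant axis suggests a line-like structure, two suggest a surface, and three comparable axes suggest a diffuse volume such as foliage.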

In the illustrated example, three semantic layers 104-108 and three sets of voxel covariance grids 110-114 are shown. However, it should be understood that the multi-resolution voxel space 102 may include any number of semantic layers, and that each semantic layer may include any number of voxel covariance grids. In some implementations, the number of voxel covariance grids may be the same for each semantic layer, while in other implementations the number of voxel covariance grids within each semantic layer may differ. As an example, some semantic classes, such as foliage (or pedestrians), may require more additional fine-resolution voxel covariance grids than other semantic classes, such as buildings, and thus the semantic layer representing the pedestrian class may include more voxel covariance grids than the semantic layer representing the building class.
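The layered architecture can be pictured as a small container hierarchy. The class and field names below are hypothetical, and the factor-of-two spacing between grid sizes is borrowed from the doubling example earlier in the description rather than required by the architecture.

```python
from dataclasses import dataclass, field

@dataclass
class VoxelGrid:
    voxel_size: float                           # edge length in meters
    voxels: dict = field(default_factory=dict)  # (i, j, k) -> (weight, mean, cov)

@dataclass
class SemanticLayer:
    label: str
    grids: list = field(default_factory=list)   # finest grid first

@dataclass
class MultiResolutionVoxelSpace:
    layers: dict = field(default_factory=dict)  # label -> SemanticLayer

    def add_layer(self, label, base_size, levels):
        """Create a layer whose grids double in voxel size at every level."""
        grids = [VoxelGrid(base_size * 2 ** n) for n in range(levels)]
        self.layers[label] = SemanticLayer(label, grids)
        return self.layers[label]

space = MultiResolutionVoxelSpace()
# Layers need not hold the same number of grids.
foliage = space.add_layer("foliage", base_size=0.25, levels=7)
building = space.add_layer("building", base_size=0.25, levels=4)
foliage_sizes = [g.voxel_size for g in foliage.grids]
```

With a 25 cm base grid, seven doubling levels end at 16 m voxels, which matches the size range given as an example in the description.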

FIG. 2 is an example pictorial diagram 200 illustrating example resolutions 202, 204, and 206 of a semantic layer of a multi-resolution voxel space 208, as described herein. In the current example, the resolutions are shown in two dimensions for illustration purposes only, and it should be understood that any number of dimensions may be used (such as three dimensions to represent a real-world three-dimensional physical space). In the current example, the voxels within a first neighborhood 210 of the first resolution 202 are combined to form a voxel 212 of the second resolution 204. Likewise, the voxels within a first neighborhood 214 of the second resolution 204 are combined to form a voxel 216 of the third resolution 206. The voxel 216 of the third resolution may be formed based on a weighted sum of the individual Gaussian distributions from each of the voxels 218 and 220 within the neighborhood 214, to generate a single higher-resolution voxel. It should be understood that determining the weighted sum of the individual Gaussian distributions is computationally inexpensive in terms of processing resources and time, and thus building the multi-resolution voxel space 208 may be performed faster and with fewer processing resources than in conventional systems.

In the current example, a two-dimensional 2x2 neighborhood is shown. However, it should be understood that the multi-resolution voxel space can be formed as a three-dimensional voxel grid representing physical space, and that a neighborhood may have various uniform sizes, such as 2x2x2, 3x3x3, or 5x5x5, or non-uniform sizes, such as 2x3x4, 4x3x4, or 5x1x3. In one particular example, the neighborhoods may have a 2x2x2 voxel size, with each higher-resolution layer having half the number of voxels of the preceding lower layer.
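Grouping child voxels into neighborhoods reduces to integer division of the voxel index by the neighborhood size along each axis. The helper names below are hypothetical; floor division is an assumed convention that also handles negative indices consistently.

```python
def parent_index(child_index, neighborhood=(2, 2, 2)):
    """Index of the coarser voxel that contains the given finer voxel."""
    return tuple(c // n for c, n in zip(child_index, neighborhood))

def group_children(child_indices, neighborhood=(2, 2, 2)):
    """Map each parent index to the list of occupied child indices it covers."""
    groups = {}
    for index in child_indices:
        groups.setdefault(parent_index(index, neighborhood), []).append(index)
    return groups

occupied = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (3, 5, -1)]
uniform = group_children(occupied)                             # 2x2x2 neighborhoods
non_uniform = parent_index((3, 5, 7), neighborhood=(2, 3, 4))  # 2x3x4 neighborhood
```

Each list in `uniform` holds the children whose statistics would be merged into a single voxel of the next grid.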

FIGS. 3-5 are example diagrams illustrating the multiple resolutions 202, 204, and 206 of the multi-resolution voxel space 208 of FIG. 2, as described herein. In the current example, each of the semantic layers of the multi-resolution voxel space 208 is shown in order to produce a picture or map of the physical environment. As an example, the multi-resolution voxel space 208 may be formed by merging or aligning multiple lidar scans of the physical environment captured by an autonomous vehicle. In the current example, the multi-resolution voxel space 208 may be zoomed in or out to show the physical environment at the different resolutions 202, 204, and 206. As an example, the resolution 202 shows voxels at the first or finest resolution. Thus, the resolution 202 of the multi-resolution voxel space 208 contains more voxels than either of the resolutions 204 or 206, and also contains the most detailed representation of the physical environment. Each subsequent resolution 204 or 206 shows the physical environment with successively coarser voxels. As an example, each voxel of the multi-resolution voxel space at the resolution 202 may represent a region of 25 cm, while each voxel of the multi-resolution voxel space at the resolution 206 may represent a region of 16 m.

In some cases, the voxels associated with a particular semantic layer may be colored or textured so that, when viewing the multi-resolution voxel space 208, the voxels associated with two semantic layers can be visually distinguished from one another. Further, it should be noted that, because the data associated with each voxel is represented as a covariance ellipsoid having a shape based at least in part on the voxel's eigenvalue ratios, shape parameters, and spatial statistics, the data illustrated in FIGS. 2-5 has shapes that substantially represent the real-life shapes of the corresponding objects.

In some examples, each higher resolution 300-500 of the multi-resolution voxel space 102 may have half the number of voxels of the preceding lower-level resolution 200-400. As an example, if the resolution 300 has voxels approximately 4 meters in size, the voxels of the resolution 400 may be approximately 8 meters in size (e.g., twice the size of the voxels of the resolution 300). However, in other examples, the size and/or number of voxels at each resolution 200-500 may have other mathematical and/or arbitrary relationships.

In the current example, the various semantic classes are indicated by differences in the pattern or color of the ellipsoids. As an example, the ellipsoids 302 may correspond to foliage, the ellipsoids 304 may correspond to walls, structures, or buildings, and the ellipsoids 306 may correspond to ground cover, such as grass.

FIGS. 6-8 are flow diagrams illustrating example processes associated with the multi-resolution voxel spaces of FIGS. 1-5. The processes are illustrated as collections of blocks in logical flow diagrams, which represent sequences of operations, some or all of which can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer-executable instructions stored on one or more computer-readable media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types.

The order in which the operations are described should not be construed as a limitation. Any number of the described blocks can be combined in any order and/or in parallel to implement the processes, or alternative processes, and not all of the blocks need be executed. For discussion purposes, the processes herein are described with reference to the frameworks, architectures, and environments described in the examples herein, although the processes may be implemented in a wide variety of other frameworks, architectures, or environments.

FIG. 6 is an example process flow diagram 600 illustrating an example data flow of a system configured to align data representing a physical environment with a scene, as described herein. In the illustrated example, the system may be configured to store the scene, as well as the data representing the environment, as multi-resolution voxel spaces. As discussed above, a multi-resolution voxel space may have a plurality of semantic layers, each of which includes a plurality of voxel grids that represent voxels as covariance ellipsoids at different resolutions.

In one particular example, a sensor system 602, such as a lidar, radar, sonar, infrared, camera, or other image capture device, may capture data representing the physical environment surrounding the system. In some cases, the captured data may be a plurality of data points 604, such as a point cloud generated from the output of a lidar scan. In this example, the data points 604 may be received by a multi-resolution voxel space generation component 606.

The multi-resolution voxel space generation component 606 may be configured to generate a target multi-resolution voxel space 608 from the data points 604. In some cases, the multi-resolution voxel space generation component 606 may process the data points via classification and/or segmentation techniques. As an example, the multi-resolution voxel space generation component 606 may assign types or classes to the data points using one or more neural networks (e.g., deep neural networks, convolutional neural networks, etc.), regression techniques, and the like, in order to identify and categorize the data points 604 with semantic labels. In some cases, the semantic labels may include a class or entity type, such as vehicle, pedestrian, cyclist, animal, building, tree, road surface, curb, sidewalk, unknown, and so on. In additional and/or alternative examples, the semantic labels may include one or more characteristics associated with the data points 604. For example, the characteristics may include, but are not limited to, an x-position (global and/or local position), a y-position (global and/or local position), a z-position (global and/or local position), an orientation (e.g., roll, pitch, yaw), an entity type (e.g., a classification), a velocity of the entity, an acceleration of the entity, rates of change of the velocity and/or acceleration, an extent (size) of the entity, and the like.

In some examples, generating target multi-resolution voxel space 608 may include associating data associated with static objects (e.g., buildings, trees, foliage, etc.) with target multi-resolution voxel space 608 while filtering out data associated with dynamic objects (e.g., data representing pedestrians, vehicles, etc.).

In alternative implementations, data points 604 may be output by a perception pipeline or component with the semantic labels attached. By way of example, data points 604 may be received as part of a sparse object state representation output by the perception component, details of which are described in U.S. Application Serial No. 16/549,694, which is incorporated herein by reference in its entirety.

In the current example, multi-resolution voxel space generation component 606 may assign the semantically labeled data points 604 to the semantic layer of target multi-resolution voxel space 608 having the corresponding semantic label (e.g., tree, building, pedestrian, etc.). By way of example, multi-resolution voxel space generation component 606 may project data points 604 into a common reference frame and then multiplex the data points 604 within the common reference frame into the appropriate point cloud associated with the corresponding semantic class. For each point cloud, multi-resolution voxel space generation component 606 may then assign each data point 604 to a voxel of the finest resolution voxel grid (e.g., the base voxel grid) of the corresponding semantic layer. In some specific instances, the multi-resolution voxel space may be a single layer that stores a plurality of statistical values, including the semantic class of each of the voxels.
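The assignment of labeled points to per-class base voxels can be sketched as follows. This is a minimal illustration only, assuming the points are already expressed in the common reference frame; the names `voxel_index` and `bin_points`, the 0.5-unit voxel size, and the sample points are hypothetical and do not come from the publication.

```python
import math
from collections import defaultdict


def voxel_index(point, voxel_size):
    """Integer (i, j, k) index of the base voxel containing a 3-D point."""
    return tuple(math.floor(c / voxel_size) for c in point)


def bin_points(labeled_points, voxel_size):
    """Group (label, (x, y, z)) pairs by semantic label, then by base voxel."""
    layers = defaultdict(lambda: defaultdict(list))
    for label, point in labeled_points:
        layers[label][voxel_index(point, voxel_size)].append(point)
    return layers


# Hypothetical labeled points in a common frame.
points = [
    ("tree", (0.2, 0.3, 1.1)),
    ("tree", (0.4, 0.1, 1.3)),
    ("building", (5.6, 2.2, 0.4)),
    ("tree", (-0.1, 0.3, 1.0)),
]
layers = bin_points(points, voxel_size=0.5)
```

Each semantic label gets its own sparse grid, so a point labeled "tree" never contributes to the statistics of a "building" voxel.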

Once each of the data points 604 of the corresponding point cloud has been assigned to a voxel, multi-resolution voxel space generation component 606 may compute spatial statistics (e.g., a spatial mean, a covariance, and a weight or number of the data points 604 assigned to the voxel) for each voxel of the finest resolution grid of the semantic layer. In one specific example, the spatial statistics of a particular voxel may be computed using Welford's online algorithm.
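A Welford-style running update of the per-voxel mean, covariance, and point count might look like the sketch below. The class name `VoxelStats` and the choice of a population (divide-by-count) covariance are assumptions made for illustration; the publication names the algorithm but gives no implementation.

```python
class VoxelStats:
    """Running mean, covariance, and point count for one voxel."""

    def __init__(self):
        self.count = 0
        self.mean = [0.0, 0.0, 0.0]
        # Accumulated outer products of deviations; covariance = m2 / count.
        self.m2 = [[0.0] * 3 for _ in range(3)]

    def add(self, point):
        """Fold one 3-D point into the statistics without storing it."""
        self.count += 1
        delta = [point[a] - self.mean[a] for a in range(3)]   # deviation from old mean
        for a in range(3):
            self.mean[a] += delta[a] / self.count
        delta2 = [point[a] - self.mean[a] for a in range(3)]  # deviation from new mean
        for a in range(3):
            for b in range(3):
                self.m2[a][b] += delta[a] * delta2[b]

    def covariance(self):
        """Population covariance of the points seen so far."""
        return [[v / self.count for v in row] for row in self.m2]


stats = VoxelStats()
for p in [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]:
    stats.add(p)
```

Because each point is folded in as it arrives, the voxel only needs to keep its count, mean, and accumulated matrix rather than the raw points.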

Once the base or finest resolution voxel grid of a semantic layer is complete, multi-resolution voxel space generation component 606 may iteratively or recursively generate each of the next higher (coarser) resolution voxel grids of the semantic layer. By way of example, multi-resolution voxel space generation component 606 may utilize the preceding lower resolution grid (starting with the base or finest resolution grid) and merge the data associated with the voxels within a 2x2x2 neighborhood to form the next higher level voxel grid. In one specific example, the voxels within a neighborhood of the lower resolution voxel grid are merged by taking a weighted sum of the individual Gaussian distributions of each voxel within the neighborhood. Thus, the voxel grids within a semantic layer of the multi-resolution voxel space may form a multi-resolution pyramid in which each higher (coarser) resolution grid contains fewer voxels than the preceding lower (finer) resolution grid, as discussed in more detail above with respect to FIGS. 1-5. In one specific example, each preceding lower resolution grid of a semantic layer may have four times the number of voxels of the next higher resolution grid.
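One way to read the weighted merge of the Gaussians in a neighborhood is moment matching: the merged voxel takes the weighted mean of the child means, and its covariance combines each child's covariance with the spread of the child means. That interpretation, the function name `merge_voxels`, and the `(weight, mean, covariance)` tuple layout are assumptions for illustration; the publication does not spell out the formula.

```python
def merge_voxels(children):
    """Merge (weight, mean, covariance) triples into one coarser voxel.

    Assumes the moment-matched combination of the child Gaussians.
    """
    total = sum(w for w, _, _ in children)
    mean = [sum(w * m[a] for w, m, _ in children) / total for a in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for w, m, c in children:
        d = [m[a] - mean[a] for a in range(3)]  # offset of child mean from merged mean
        for a in range(3):
            for b in range(3):
                cov[a][b] += w * (c[a][b] + d[a] * d[b]) / total
    return total, mean, cov


zero = [[0.0] * 3 for _ in range(3)]
# Two occupied children of a 2x2x2 neighborhood; unoccupied children are simply absent.
weight, mean, cov = merge_voxels([
    (2, [0.0, 0.0, 0.0], zero),
    (2, [2.0, 0.0, 0.0], zero),
])
```

Applying the same merge level after level, with parent index equal to the child index integer-divided by two, would yield the pyramid described above.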

Once target multi-resolution voxel space 608 is generated from data points 604, target multi-resolution voxel space 608 is aligned with reference multi-resolution voxel space 610 (e.g., a multi-resolution voxel space representing the scene). By way of example, in the illustrated example, multi-resolution voxel space alignment component 612 may align the newly generated target multi-resolution voxel space 608 with reference multi-resolution voxel space 610, or may determine a transform between target multi-resolution voxel space 608 and reference multi-resolution voxel space 610. To align target multi-resolution voxel space 608 with reference multi-resolution voxel space 610, multi-resolution voxel space alignment component 612 may, substantially simultaneously for each semantic layer and each resolution of target multi-resolution voxel space 608, take each voxel and determine the mean target point in the corresponding resolution and semantic layer of reference multi-resolution voxel space 610. Multi-resolution voxel space alignment component 612 may then determine a 2x2x2 neighborhood of the voxel grid at the corresponding resolution and semantic layer of reference multi-resolution voxel space 610 and identify whether the voxels of the neighborhood are occupied. Multi-resolution voxel space alignment component 612 then selects the voxel having the centroid closest to the voxel from target multi-resolution voxel space 608 and averages the distribution of the selected voxel with the distribution of the voxel from the target. Next, multi-resolution voxel space alignment component 612 may perform principal component analysis on the combined covariance matrix of the selected voxel and the voxel from the target and select the eigenvector associated with the smallest eigenvalue as the matched normal vector for the two voxels, thereby bringing target multi-resolution voxel space 608 into closer alignment with reference multi-resolution voxel space 610. In some cases, an optimization may be performed on the matched voxels to improve the overall alignment between the reference and target multi-resolution voxel spaces and/or to determine a relative transform (e.g., to be used for localization), including, but not limited to, a non-linear optimization (e.g., a non-linear least squares optimization). As one example, a gradient descent technique, such as the Gauss-Newton technique discussed below, may be utilized.

In the Gauss-Newton technique, for each match between a first voxel $i$ of target multi-resolution voxel space 608 and a second voxel $j$ of reference multi-resolution voxel space 610, a matched residual $z_{ij}$ may be computed as follows:

[Equation not reproduced in the source text.]

where $n_{ij}$ is the matched normal vector, $\mu_i$ is the mean of voxel $i$, and $\lambda_0$ is the smallest eigenvalue of the match covariance matrix. As noted above, the matched normal vector is computed from the eigenvector, associated with the smallest eigenvalue, of the weighted sum of the corresponding voxel covariance matrices. The weight of each residual $z_{ij}$ is reweighted according to an M-estimator framework (e.g., using a Cauchy loss function). Next, the Jacobian of the match error $ij$ with respect to the transform between the reference grid and the target grid (written $x = [R \;\; p]^T$ in the update below) is given by:

$$J_{ij} = [\, R n_{ij} \quad \mu_j \times R n_{ij} \,]^T$$

Next, multi-resolution voxel space alignment component 612 may compute, over the matches $ij$, the total gradient $g$ and the approximate Hessian $H$ as follows:

[Equations not reproduced in the source text.]

The Gauss-Newton optimization is computed as follows:

$$H \, \delta T = -g$$

Further, multi-resolution voxel space alignment component 612 may compute the delta transform by modeling it as an element of $SO(3) \times \mathbb{R}^3$, such that the updated alignment transform is given by:

$$x^{n+1} = [\, \exp(\delta R)\, R^{n} \quad \delta p + p^{n} \,]^T$$

where $\exp()$ is the $SO(3)$ exponential map. It should be understood that the transform given above may be applied to the entire multi-resolution voxel space in further iterations of the optimization, and that the final iteration may comprise the transform between the two voxel spaces.
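The gradient and Hessian expressions are not legible in this text. The forms below are the standard weighted Gauss-Newton accumulations and the usual Cauchy reweighting, offered as an assumption rather than as the published equations; the scale parameter $c$ and the symbol $w_{ij}$ are hypothetical names introduced here.

```latex
% Assumed standard forms; not taken from the source text.
% w_{ij}: robust (Cauchy) weight of match ij, with hypothetical scale parameter c.
w_{ij} = \frac{1}{1 + (z_{ij}/c)^{2}}
\qquad
g = \sum_{ij} w_{ij}\, z_{ij}\, J_{ij}
\qquad
H = \sum_{ij} w_{ij}\, J_{ij} J_{ij}^{T}
```

If the per-match Jacobians $J_{ij}^{T}$ are stacked as the rows of a matrix $J$ and the weights placed on the diagonal of $W$, these forms give $H = J^{T} W J$ and $g = J^{T} W z$, which agrees with the matrix $J^{T} W J$ used in the uncertainty discussion at 812.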

The alignment process may continue to iterate after each adjustment of target multi-resolution voxel space 608 until the two multi-resolution voxel spaces 608 and 610 are aligned within a tolerance or threshold, or until a predetermined number of iterations (e.g., voxel merges) is complete. In this manner, during alignment, the coarser resolutions (e.g., resolutions corresponding to larger voxels) may result in matches prior to the finer resolutions, bringing the two multi-resolution voxel spaces 608 and 610 into closer alignment so that the finer resolutions are able to begin matching and complete the alignment process. However, in some implementations, the operations may be performed across all layers and/or semantic classes substantially simultaneously, with a single data transform determined to align some or all of the various voxel spaces.

In one specific example, multi-resolution voxel space alignment component 612 may utilize only the highest or coarsest resolution of each semantic layer in a first iteration in order to initialize the alignment prior to additional iterations. In some cases, each additional iteration may introduce another, finer resolution to the alignment process. The fully aligned multi-resolution voxel space 614 may then be output by multi-resolution voxel space alignment component 612 and used as the next reference multi-resolution voxel space 610.

FIG. 7 is an example flow diagram illustrating an example process 700 associated with construction of a multi-resolution voxel space, as described herein. As discussed above, a multi-resolution voxel space may comprise a plurality of voxel grids or voxel layers representing the physical environment at different resolutions or physical distances. By way of example, each voxel layer may represent the physical environment at twice the resolution of the preceding layer (e.g., 1 foot, 2 feet, 4 feet, etc.). In some cases, the multi-resolution voxel space may be separated into a plurality of semantic layers, each semantic layer comprising a plurality of voxel grids at different resolutions.

７０２において、多重解像度ボクセル空間生成コンポーネントは、物理環境を表すデータを受信することがある。例えば、多重解像度ボクセル空間は、たとえば、ライダーシステムの出力など、物理環境を表すデータ点から生成されることがある。他の例では、データは、レーダー、ソナー、赤外線、カメラ、または他の画像／データキャプチャデバイスの出力を含むことがある。いくつかの例では、多重解像度ボクセル空間生成コンポーネントは、各データ点にセマンティッククラスを割り当てることがある。例として、１つの特定の例では、データ点へのセマンティッククラスの割り当ては、全体として本明細書に参照により組み入れられる米国出願シリアル番号１５／８２０，２４５に述べられる。 At 702, a multi-resolution voxel space generation component may receive data representing a physical environment. For example, a multi-resolution voxel space may be generated from data points representing the physical environment, eg, the output of a lidar system. In other examples, the data may include the output of radar, sonar, infrared, cameras, or other image/data capture devices. In some examples, the multi-resolution voxel space generation component may assign a semantic class to each data point. By way of example, in one particular example, the assignment of semantic classes to data points is set forth in US Application Serial No. 15/820,245, which is incorporated herein by reference in its entirety.

７０４において、多重解像度ボクセル空間生成コンポーネントは、物理環境を表すデータからセマンティック点群（semantic point cloud）を生成する。例えば、多重解像度ボクセル空間生成コンポーネントは、物理環境を表すデータから共通フレームにデータ点を投影することがある。 At 704, the multi-resolution voxel space generation component generates a semantic point cloud from the data representing the physical environment. For example, a multi-resolution voxel space generation component may project data points from data representing the physical environment onto a common frame.

By way of example, the multi-resolution voxel space generation component or another component may apply classification and/or segmentation techniques to the data points to assign semantic classes. In some examples, one or more neural networks (e.g., deep neural networks, convolutional neural networks, etc.), regression techniques, etc. may be used to identify and classify the data points with semantic classes. In some cases, the semantic classes may include a class or entity type, such as vehicle, pedestrian, cyclist, animal, building, tree, road surface, curb, sidewalk, unknown, etc.

At 706, the multi-resolution voxel space generation component may generate a voxel covariance grid per semantic class for the first resolution of the multi-resolution voxel space. In some examples, the multi-resolution voxel space generation component may assign the data points to the corresponding voxels of the matching semantic layer of the multi-resolution voxel space to generate each of the first resolution grids. Once the data points are assigned to the voxels of a semantic layer, the multi-resolution voxel space generation component may determine voxel spatial statistics, such as a mean and a covariance for each voxel. In some cases, when forming the multi-resolution voxel space, the multi-resolution voxel space generation component may begin with the finest resolution layer and then generate each next coarser layer.

７０８において、多重解像度ボクセル空間生成コンポーネントは、生成する追加の解像度があるかどうかを決定する。例えば、多重解像度ボクセル空間生成コンポーネントは、解像度が解像度しきい値より大きいかどうか、および／または、層数が層しきい値より大きいかどうかを決定することがある。追加の解像度があるならば、処理７００は７１０に進む。しかしながら、生成する追加の解像度がないならば、処理７００は７１２に進む。 At 708, the multi-resolution voxel space generation component determines whether there are additional resolutions to generate. For example, the multi-resolution voxel space generation component may determine whether the resolution is greater than the resolution threshold and/or whether the number of layers is greater than the layer threshold. If there are additional resolutions, process 700 proceeds to 710 . However, if there are no additional resolutions to generate, process 700 proceeds to 712 .

７１０において、多重解像度ボクセル空間生成コンポーネントは、次のより高い解像度に対して、セマンティッククラスごとのボクセルコバリアンスグリッドを生成することがある。各々のより高い解像度のグリッドは、より低いグリッドのボクセルをマージすることによって、より低い解像度のグリッドに少なくとも一部に基づいて形成されることがある。例として、多重解像度ボクセル空間生成コンポーネントは、セマンティック層内のより低い解像度グリッドからボクセルの近傍（たとえば２ｘ２ｘ２グループなど）を取り、近傍内のボクセルの各々から個々の分布（例えば、ガウス分布）の重み付き和を計算して、単一のより高い解像度ボクセルを生成することがある。このように、各々のより高い解像度は、より低い解像度のグリッドよりも少ないボクセルを有し、多重解像度ボクセル空間は、多重解像度ボクセルピラミッドを形成することがある。 At 710, the multi-resolution voxel space generation component may generate a voxel covariance grid for each semantic class for the next higher resolution. Each higher resolution grid may be formed based at least in part on a lower resolution grid by merging voxels of the lower grid. As an example, the multi-resolution voxel space generation component takes a neighborhood of voxels (e.g., 2x2x2 groups, etc.) from a lower resolution grid in the semantic layer and generates an individual distribution (e.g., Gaussian distribution) of weights from each of the voxels in the neighborhood. A weighted sum may be computed to produce a single higher resolution voxel. Thus, each higher resolution has fewer voxels than the lower resolution grid, and the multi-resolution voxel space may form a multi-resolution voxel pyramid.

At 712, the multi-resolution voxel space generation component may smooth the resulting multi-resolution voxel space. For example, the multi-resolution voxel space generation component may convolve the voxels of the multi-resolution voxel space with a Gaussian kernel to reduce noise in the normal estimates of the voxels. In addition, when lidar is used to collect the data representing the physical environment, the multi-resolution voxel space generation component may remove voxels corresponding to fewer than a threshold number of observations (e.g., where only a single lidar beam was observed), as such voxels may have poorly determined normals and insufficient statistical information.

At 714, the multi-resolution voxel space generation component may reduce voxels having a weight greater than a maximum weight to the maximum weight and remove voxels having a weight less than a minimum weight. In some cases, by applying a maximum and minimum weight range to the voxels, the multi-resolution voxel space may maintain a more uniform sample density and prevent voxels close to the system (e.g., an autonomous vehicle) from disrupting the alignment process discussed below with respect to FIG. 8. In other examples, the multi-resolution voxel space may be stored as a hash of the location of each voxel within three-dimensional space, together with a lookup table that indexes the hashes to provide rapid memory access (e.g., using voxel hashing). In this manner, only the desired portions of the multi-resolution voxel space need be loaded into memory, and accessing the multi-resolution voxel space may be performed using fewer processing resources.
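The weight range and the hashed storage can be illustrated together with a plain dictionary keyed by integer voxel coordinates, which Python hashes internally. This is only a sketch: the function name `clamp_weights`, the `(weight, mean, covariance)` tuple layout, and the threshold values are hypothetical.

```python
def clamp_weights(voxels, min_weight, max_weight):
    """Drop voxels lighter than min_weight and cap heavier ones at max_weight.

    `voxels` maps an integer (i, j, k) index to a (weight, mean, covariance) tuple.
    """
    kept = {}
    for key, (weight, mean, cov) in voxels.items():
        if weight < min_weight:
            continue  # too few observations to trust
        kept[key] = (min(weight, max_weight), mean, cov)
    return kept


# Sparse grid: only occupied voxels are stored, looked up by their index.
grid = {
    (0, 0, 0): (1, [0.1, 0.2, 0.3], None),
    (1, 0, 0): (50, [1.4, 0.5, 0.2], None),
    (2, 0, 0): (500, [2.5, 0.5, 0.1], None),
}
grid = clamp_weights(grid, min_weight=5, max_weight=100)
```

Since only occupied voxels appear as keys, a region of interest can be loaded or queried without touching the rest of the map.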

FIG. 8 is another flow diagram illustrating an example process 800 for aligning a target multi-resolution voxel space with a reference multi-resolution voxel space, as described herein. By way of example, the multi-resolution voxel space alignment component discussed above may utilize the multi-resolution voxel space structure to align multiple scans of a physical environment, such as a target multi-resolution voxel space and a reference multi-resolution voxel space. For instance, once a multi-resolution voxel space (e.g., the target multi-resolution voxel space) is generated for a particular scan or data set representing the physical environment, the multi-resolution voxel space alignment component may align that multi-resolution voxel space with a multi-resolution voxel space representing the scene (e.g., the reference multi-resolution voxel space).

８０２において、多重解像度ボクセル空間アライメントコンポーネントは、シーンを表すリファレンス多重解像度ボクセル空間とアライメントされるターゲット多重解像度ボクセル空間を受信することがある。場合によっては、リファレンス多重解像度ボクセル空間は、システムによって維持され、環境の各々の新しいスキャンにより更新されて、物体検出およびトラッキングを前もって決めることがある。 At 802, a multi-resolution voxel space alignment component may receive a target multi-resolution voxel space aligned with a reference multi-resolution voxel space representing a scene. In some cases, a reference multi-resolution voxel space may be maintained by the system and updated with each new scan of the environment to predetermine object detection and tracking.

８０４において、多重解像度ボクセル空間アライメントコンポーネントは、ターゲット多重解像度ボクセル空間とリファレンス多重解像度ボクセル空間との間のボクセル対応（voxel correspondence）を決定することがある。いくつかの例では、対応は、セマンティック層ごと、および解像度ごとであり得る。さらに、対応は、各セマンティック層の各解像度に対して、実質的に同時に決定されることもある。例として、８０４において、多重解像度ボクセル空間アライメントコンポーネントは、ターゲット多重解像度ボクセル空間における特有の解像度の各ボクセルに対して、リファレンス多重解像度ボクセル空間の対応する特有の解像度において平均ターゲット点を含むボクセルの２ｘ２ｘ２近傍を検索することがある。次に、多重解像度ボクセル空間アライメントコンポーネントは、ターゲット多重解像度ボクセル空間におけるボクセルに最も近いセントロイドを有する２ｘ２ｘ２近傍からボクセルを選択することがある。 At 804, a multi-resolution voxel space alignment component may determine a voxel correspondence between the target multi-resolution voxel space and the reference multi-resolution voxel space. In some examples, the correspondence may be per semantic layer and per resolution. Moreover, correspondences may be determined substantially simultaneously for each resolution of each semantic layer. As an example, at 804, for each voxel of a unique resolution in the target multiresolution voxel space, the multiresolution voxel space alignment component aligns 2x2x2 of the voxels containing the average target point at the corresponding unique resolution of the reference multiresolution voxel space. It may search the neighborhood. A multi-resolution voxel space alignment component may then select a voxel from the 2x2x2 neighborhood with the closest centroid to the voxel in the target multi-resolution voxel space.
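The neighborhood search at 804 can be sketched for a single resolution of a single semantic layer as follows. The sketch assumes the reference grid is a dictionary from integer voxel index to voxel mean and that the 2x2x2 neighborhood consists of the eight voxels whose centers surround the target mean; both are illustrative assumptions, as is the name `closest_reference_voxel`.

```python
import itertools
import math


def closest_reference_voxel(target_mean, reference, voxel_size):
    """Return the index of the occupied reference voxel, within the 2x2x2
    neighborhood around target_mean, whose mean is closest; None if all are empty."""
    # Lowest-index corner of the eight voxels whose centers surround the point.
    base = [math.floor(c / voxel_size - 0.5) for c in target_mean]
    best_key, best_dist = None, math.inf
    for offset in itertools.product((0, 1), repeat=3):
        key = tuple(b + o for b, o in zip(base, offset))
        mean = reference.get(key)
        if mean is None:
            continue  # unoccupied voxel
        dist = math.dist(target_mean, mean)
        if dist < best_dist:
            best_key, best_dist = key, dist
    return best_key


reference = {
    (0, 0, 0): (0.5, 0.5, 0.5),
    (1, 0, 0): (1.2, 0.5, 0.5),
    (5, 5, 5): (5.5, 5.5, 5.5),
}
match = closest_reference_voxel((0.9, 0.4, 0.6), reference, voxel_size=1.0)
no_match = closest_reference_voxel((10.0, 10.0, 10.0), reference, voxel_size=1.0)
```

Restricting the search to eight candidates keeps the cost per target voxel constant, which is what allows the correspondences to be computed for every layer and resolution in one pass.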

At 806, the multi-resolution voxel space alignment component may reweight the corresponding voxels. By way of example, the multi-resolution voxel space alignment component may compute a weighted average of the data contained in the two corresponding voxels (e.g., the target voxel and the selected voxel). For instance, a combined covariance may be computed. Once the combined covariance is determined, the multi-resolution voxel space alignment component may perform principal component analysis (e.g., an eigenvalue decomposition) on the combined covariance matrix of the two corresponding voxels and select the eigenvector associated with the smallest eigenvalue as the matched normal vector. A residual (or error) for each voxel may be computed as proportional to the matched normal vector and/or a difference in the means (or centroids) of the corresponding voxels, and an optimization over the transform between the two frames may be performed to minimize the residuals. During alignment, the coarser resolutions (e.g., resolutions corresponding to larger voxels) may result in matches prior to the finer resolutions. In this manner, matches at the coarser resolutions bring the two multi-resolution voxel spaces into closer alignment, such that the finer resolutions are able to begin matching and complete the alignment process, as discussed above with respect to FIG. 6.

８０８において、多重解像度ボクセル空間アライメントコンポーネントは、イテレーションの数が完了したかどうかを決定することがある。例えば、システムは、アライメント処理のイテレーションの最大数を含み、２つの多重解像度ボクセル空間をアライメントすることに関連付けられた処理時間の上限を定める、または制限をすることがある。イテレーションの数が完了したならば、処理８００は８１２に進み、そうでなければ、処理８００は８１０に進行する。 At 808, the multi-resolution voxel spatial alignment component may determine whether the number of iterations has been completed. For example, the system may cap or limit the processing time associated with aligning two multi-resolution voxel spaces, including the maximum number of iterations of the alignment process. If the number of iterations has been completed, process 800 proceeds to 812 , otherwise process 800 proceeds to 810 .

８１０において、多重解像度ボクセル空間アライメントコンポーネントは、組み合わされた多重解像度ボクセル空間についての再重み付き平均が許容誤差しきい値より下であるかどうかを決定することがある。再重み付き平均が許容誤差しきい値より下であるならば、処理８００は８１２に進み、そうでなければ、処理８００は８０４に戻る。システムは、２つの多重解像度ボクセル空間が、特有の使用に対してどれぐらい十分にアライメントされるべきかという要件を設定する許容誤差を含むことがある。例として、いくつかのアプリケーションでは、物理環境の粗い理解のみが必要とされる一方、たとえば自律車両などの他の場合、より正確な詳細な理解が必要とされることがある。 At 810, a multi-resolution voxel space alignment component may determine whether the re-weighted average for the combined multi-resolution voxel space is below a tolerance threshold. If the re-weighted average is below the tolerance threshold, process 800 proceeds to 812 , otherwise process 800 returns to 804 . Systems may include tolerances that set requirements on how well two multi-resolution voxel spaces should be aligned for a particular use. As an example, some applications may require only a coarse understanding of the physical environment, while others, such as autonomous vehicles, may require a more precise and detailed understanding.
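The two stopping conditions at 808 and 810 amount to a loop with an iteration cap and an error tolerance. A minimal sketch follows, in which `step` is a hypothetical callable standing in for one pass of correspondence, reweighting, and optimization that returns the current alignment error; the error sequence in the usage is made up for illustration.

```python
def align(step, max_iterations, tolerance):
    """Call step() until its returned error is below tolerance or the cap is reached.

    Returns (iterations_run, last_error).
    """
    iteration, error = 0, float("inf")
    while iteration < max_iterations:
        error = step()
        iteration += 1
        if error < tolerance:
            break
    return iteration, error


# Hypothetical error sequence produced by successive alignment passes.
errors = iter([1.0, 0.5, 0.2, 0.05, 0.01])
iterations, final_error = align(lambda: next(errors), max_iterations=10, tolerance=0.1)
```

The cap bounds the processing time, while the tolerance lets an application that needs only a coarse alignment stop early.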

At 812, the multi-resolution voxel space alignment component may reduce and/or determine an amount of uncertainty in the alignment of the target multi-resolution voxel space and the reference multi-resolution voxel space. For example, after the optimization (e.g., the non-linear optimization above), the multi-resolution voxel space alignment component may propagate measurement noise to the aligned voxels. In one specific example, discussed in more detail below, the multi-resolution voxel space alignment component may determine a model of the alignment uncertainty according to a Gaussian distribution having a zero mean and zero covariance. By way of example, the multi-resolution voxel space alignment component may model a normal random variable $x \sim N(\mu_x, \Sigma_x)$ such that each step is computed as follows:

$$x = C z$$

where $C = (J^T W J)^{-1} J^T W$ (with $J$ and $W$ denoting the same Jacobian and weights determined above, and $C$ denoting the weighted pseudo-inverse) and $z \sim N(0, \sigma_z^2 I)$ is the residual.

The covariance of $x$ may then be determined by propagating the residual noise and expanding as follows:

$$\Sigma_x = C \Sigma_z C^T$$

$$\Sigma_x = (J^T W J)^{-1} J^T \sigma_z^2 W^2 J \, (J^T W J)^{-1}$$

In some cases, the residual noise may be computed incrementally, and the matrices $J^T W J$ and $\sigma_z^2 J^T W^2 J$ may then be accumulated for each voxel. In one specific example, the multi-resolution voxel space alignment component may also derive an isotropic variance for each residual,

[Expression not reproduced in the source text.]

with the mean of each voxel distributed according to

[Expression not reproduced in the source text.]

where $\sigma_p^2$ is the isotropic Gaussian noise on each point observation and $W_i$ is the weight of the voxel. The covariance of the residual may then be computed as follows:

$$\sigma_z^2 I = E[z z^T]$$

The terms involving $E[z z^T]$ become zero due to the assumed independence of the voxel means. Therefore,

[Equation not reproduced in the source text.]

where

[Expression not reproduced in the source text.]

Next, the multi-resolution voxel space alignment component may further regularize the resulting covariance matrix by adding an assumed contribution from a nominal number of outliers and by clamping or limiting the diagonal to a minimum value.

At 814, the multi-resolution voxel space alignment component may output the aligned multi-resolution voxel space (which may include the measurement uncertainty). By way of example, the aligned multi-resolution voxel space may be provided to another system, such as a planning system or perception system of an autonomous vehicle. In other cases, the aligned multi-resolution voxel space may be sent via one or more networks to a remote system or remote device, such as a cloud-based computing system. In other examples, the multi-resolution voxel space alignment component may output localization data or transform data between the target multi-resolution voxel space and the reference multi-resolution voxel space associated with a position of the vehicle with respect to the physical environment. In some examples, the reference multi-resolution voxel space may be pre-generated by a cloud-based computing system and sent to the vehicle before the vehicle begins navigating. In some cases, the cloud-based system may update the reference multi-resolution voxel space using data collected from multiple vehicles during operation (e.g., merged target multi-resolution voxel spaces). Further, in some examples, the vehicle may be equipped to update the reference multi-resolution voxel space in an offline manner (e.g., when parked or otherwise not actively navigating).

図９は、本開示の態様にしたがって、本明細書に説明される技法を実装するための例示的なシステムを例示する。いくつかの例では、システムは、図１～８を参照して本明細書に説明される態様の１つまたは複数の特徴、処理リソース、構成要素、および／または機能性を含むことがある。上に述べている、いくつかの態様では、システムは、自律車両を含むことがある。 FIG. 9 illustrates an example system for implementing the techniques described herein, in accordance with aspects of this disclosure. In some examples, a system may include one or more of the features, processing resources, components, and/or functionality of the aspects described herein with reference to FIGS. 1-8. In some aspects noted above, the system may include an autonomous vehicle.

図９は、本明細書に説明されている多重解像度ボクセル空間アライメントシステムを実装するための例示的なシステム９００のブロック図である。今述べた態様では、システム９００は、車両コンピューティングデバイス９０４、１つまたは複数のセンサーシステム９０６、１つまたは複数の通信接続９０８、および１つまたは複数のドライブシステム９１０を含むことがある車両９０２である。 FIG. 9 is a block diagram of an example system 900 for implementing the multi-resolution voxel space alignment system described herein. In this aspect, the system 900 is a vehicle 902 that may include a vehicle computing device 904, one or more sensor systems 906, one or more communication connections 908, and one or more drive systems 910.

車両コンピューティングデバイス９０４は、１つまたは複数のプロセッサー９１２（または処理リソース）と、１つまたは複数のプロセッサー９１２と通信接続されたコンピューター読取り可能媒体９１４とを含むことがある。例示される例では、車両９０２は、自律車両であるが、しかしながら、車両９０２は、どんな他の種類の車両でも、またはどんな他のシステム（例えば、ロボティックシステム、カメラ可能スマートフォンなど）でもあることが可能だろう。例示される例では、車両コンピューティングデバイス９０４のコンピューター読取り可能媒体９１４は、多重解像度ボクセル空間生成コンポーネント９１６、多重解像度ボクセル空間アライメントコンポーネント９１８、プランニングコンポーネント９２０、パーセプションコンポーネント９２２を、自律車両に関連付けられた他のシステムも同様に、格納する。さらに、コンピューター読取り可能媒体９１４は、センサーデータ９２４および多重解像度ボクセル空間９２６を格納することもある。いくつかの実装では、システムが、コンピューター読取り可能媒体に格納されたデータも同様に、加えてまたは代わりに、車両９０２にアクセス可能であり得る（例えば、車両９０２から離れた他のコンピューター読取り可能媒体に格納される、ないしは別のやり方によりアクセス可能である）ことは、理解されるべきである。 The vehicle computing device 904 may include one or more processors 912 (or processing resources) and computer-readable media 914 communicatively coupled with the one or more processors 912. In the illustrated example, the vehicle 902 is an autonomous vehicle; however, the vehicle 902 could be any other type of vehicle or any other system (e.g., a robotic system, a camera-enabled smartphone, etc.). In the illustrated example, the computer-readable media 914 of the vehicle computing device 904 stores a multi-resolution voxel space generation component 916, a multi-resolution voxel space alignment component 918, a planning component 920, and a perception component 922, as well as other systems associated with the autonomous vehicle. The computer-readable media 914 may also store sensor data 924 and multi-resolution voxel spaces 926. It should be understood that, in some implementations, the systems, as well as the data stored on the computer-readable media, may additionally or alternatively be accessible to the vehicle 902 (e.g., stored on, or otherwise accessible from, other computer-readable media remote from the vehicle 902).

多重解像度ボクセル空間生成コンポーネント９１６は、たとえば、ライダーシステムの出力など、物理環境を表すデータ点から多重解像度ボクセル空間を生成することがある。場合によっては、多重解像度ボクセル空間生成コンポーネント９１６は、複数のライダーポイント、または点群として表されるライダーデータを受信することがある。多重解像度ボクセル空間生成コンポーネント９１６は、ライダーポイントを、第１のベース解像度のボクセルグリッドのボクセルに割り当てることがある。次に、多重解像度ボクセル空間生成コンポーネント９１６は、より高いレベルのボクセルグリッドを生成するとき、より低い解像度のグリッドのボクセルをマージすることがある。例えば、多重解像度ボクセル空間生成コンポーネント９１６は次のより高いレベルのボクセルグリッドを形成するとき、より低い解像度のグリッドにおける近傍（たとえば、２ｘ２ｘ２近傍など）内のボクセルをマージすることがある。 A multi-resolution voxel space generation component 916 may generate a multi-resolution voxel space from data points representing a physical environment, such as the output of a lidar system, for example. In some cases, the multi-resolution voxel space generation component 916 may receive lidar data represented as a plurality of lidar points or point clouds. A multi-resolution voxel space generation component 916 may assign lidar points to voxels of the first base resolution voxel grid. Next, the multi-resolution voxel space generation component 916 may merge the voxels of the lower resolution grid when generating the higher level voxel grid. For example, the multi-resolution voxel space generation component 916 may merge voxels within a neighborhood (eg, a 2x2x2 neighborhood, etc.) in a lower resolution grid when forming the next higher level voxel grid.
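The assignment of lidar points to a base grid and the merging of 2x2x2 neighborhoods into the next coarser grid can be sketched as below. This is a minimal illustration, assuming each voxel stores a point count, a mean, and a population covariance; the names `voxelize`, `merge_voxels`, and `coarsen` are hypothetical, and the count-weighted pooling formula (including the spread of the child means) is a standard statistical identity used here as an assumption rather than taken from the disclosure.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Assign points to base-resolution voxels: {index: (count, mean, cov)}."""
    points = np.asarray(points, dtype=float)
    keys = np.floor(points / voxel_size).astype(int)
    grid = {}
    for key in set(map(tuple, keys.tolist())):
        members = points[(keys == key).all(axis=1)]
        mean = members.mean(axis=0)
        centered = members - mean
        grid[key] = (len(members), mean, centered.T @ centered / len(members))
    return grid

def merge_voxels(voxels):
    """Merge (count, mean, cov) tuples into the statistics of one coarser voxel."""
    counts = np.array([n for n, _, _ in voxels], dtype=float)
    means = np.array([m for _, m, _ in voxels], dtype=float)
    covs = np.array([c for _, _, c in voxels], dtype=float)
    total = counts.sum()
    # Count-weighted average of the child means.
    mean = (counts[:, None] * means).sum(axis=0) / total
    # Count-weighted average of the child covariances plus the spread of the child means.
    diffs = means - mean
    spread = np.einsum("i,ij,ik->jk", counts, diffs, diffs)
    cov = ((counts[:, None, None] * covs).sum(axis=0) + spread) / total
    return total, mean, cov

def coarsen(grid):
    """Group voxels by their 2x2x2 parent cell and merge each group."""
    groups = {}
    for (i, j, k), voxel in grid.items():
        groups.setdefault((i // 2, j // 2, k // 2), []).append(voxel)
    return {key: merge_voxels(vs) for key, vs in groups.items()}
```

Calling `coarsen` repeatedly yields successively lower-resolution grids without revisiting the raw points, since each level is computed only from the statistics of the level below it.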

１つの特有の例では、多重解像度ボクセル空間生成コンポーネント９１６は、ブロックに、メモリー内にて動かされるまたは再配置されることを可能にするオフセットとして実装されたポインタを有するコリジョンフリーのハッシュテーブルを介してアクセス可能なメモリーのマッピング可能な連続ブロックとして多重解像度ボクセル空間を生成することがある。場合によっては、メモリーブロックは、ヘッダ、インデックス（例えば、ハッシュテーブル）、およびボクセルアレイを有するタイルとして表されることがある。インデックスは、層および／または解像度によって分離されることがある。ボクセルアレイは、単一のアレイ、または解像度によって配置される複数のアレイ（例えば、第１のセマンティック層の第１の解像度のグリッド、第２のセマンティック層の第１の解像度のグリッド、第３セマンティック層の第１の解像度のグリッド、．．．）を含むことがある。ボクセルアレイにおいて、各エレメントは、ボクセルと、ボクセルの空間位置のキーとであり得る。場合によっては、ヘッダは、スタック識別子、バージョン数、解像度の数、セマンティックラベルの数、層の総数、オフセットなどを含むことがある。インデックスは、ハッシュ値をメモリーブロック内のオフセットに関係させるスパースハッシュテーブル（sparse hash table）であり得る。さらに、インデックスは、今述べた特有のテーブルに対する入力をソルトするのに用いられるソルト値と、モジュラス計算の第１のラウンドに用いられる素数値とを含むこともある。 In one specific example, the multi-resolution voxel space generation component 916 may generate the multi-resolution voxel space as a mappable contiguous block of memory accessible via a collision-free hash table, with pointers implemented as offsets so that the block can be moved or relocated within memory. In some cases, the memory block may be represented as a tile having a header, an index (e.g., a hash table), and a voxel array. The index may be separated by layer and/or resolution. The voxel array may include a single array, or multiple arrays arranged by resolution (e.g., a first-resolution grid of a first semantic layer, a first-resolution grid of a second semantic layer, a first-resolution grid of a third semantic layer, ...). In the voxel array, each element may be a voxel and a key of the spatial position of the voxel. In some cases, the header may include a stack identifier, a version number, a number of resolutions, a number of semantic labels, a total number of layers, offsets, and the like. The index may be a sparse hash table that relates hash values to offsets within the memory block. The index may also include a salt value used to salt the input to this particular table and a prime value used in the first round of modulus calculation.
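One way to picture the salted, collision-free index is the sketch below. It is an assumption-laden illustration only: the key packing, the hash formula (a salt-dependent odd multiplier, a first modulus by a prime, then a modulus by the table size), the constant `PRIME`, and the names `pack_key`, `slot`, `build_index`, and `lookup` are all hypothetical, since the text above does not specify them. The table stores offsets into the voxel array rather than addresses, which is what lets the whole block be relocated.

```python
PRIME = 2**61 - 1  # assumed prime for the first modulus round (a Mersenne prime)

def pack_key(i, j, k, bits=20):
    """Pack three non-negative voxel indices into one integer spatial key."""
    return (i << (2 * bits)) | (j << bits) | k

def slot(key, salt, size, prime=PRIME):
    # Salt the input, reduce modulo the prime first, then modulo the table size.
    return ((key * (2 * salt + 1) + salt) % prime) % size

def build_index(keys, max_salt=100_000):
    """Search for a salt that sends every key to its own slot (no collisions)."""
    size = max(1, 2 * len(keys))  # sparse table: about half of the slots stay empty
    for salt in range(max_salt):
        table = [-1] * size
        for offset, key in enumerate(keys):
            s = slot(key, salt, size)
            if table[s] != -1:
                break  # collision, try the next salt
            table[s] = offset  # offset into the voxel array, not an absolute address
        else:
            return salt, table
    raise ValueError("no collision-free salt found")

def lookup(key, keys, salt, table):
    """Return the voxel-array offset for key, or None if the voxel is absent."""
    offset = table[slot(key, salt, len(table))]
    # Each voxel-array element stores its key, so keys that were never inserted are rejected.
    return offset if offset != -1 and keys[offset] == key else None
```

In a full tile, the salt and prime would sit in the index, and `keys` would be read from the voxel array elements themselves; here a plain list stands in for that array.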

いくつかの例では、多重解像度ボクセル空間アライメントコンポーネント９１８は、２つの多重解像度ボクセル空間（例えば、ターゲット多重解像度ボクセル空間およびリファレンス多重解像度ボクセル空間）をアライメントすることがある。場合によっては、多重解像度ボクセル空間アライメントコンポーネント９１８は、リファレンス多重解像度ボクセル空間およびターゲット多重解像度ボクセル空間のボクセル間の対応を見つけることがある。多重解像度ボクセル空間アライメントコンポーネント９１８は、ターゲット多重解像度ボクセル空間における特有の解像度の各ボクセルに対して、リファレンス多重解像度ボクセル空間の対応する特有の解像度にて平均ターゲット点を含むボクセルの３次元（例えば、２ｘ２ｘ２、３ｘ３ｘ３、５ｘ５ｘ５など）近傍を検索することによって、対応を見つけることがある。近傍内の識別されたボクセルのうち、多重解像度ボクセル空間アライメントコンポーネント９１８は、ターゲット多重解像度ボクセル空間のボクセルに近いセントロイドを有するボクセルを選択することがある。次に、多重解像度ボクセル空間アライメントコンポーネント９１８は、リファレンス多重解像度ボクセル空間における選択されたボクセルの分布を、ターゲット共分散スタック（target covariance stack）のボクセルにより平均することがある。次に、多重解像度ボクセル空間アライメントコンポーネント９１８は、組み合わされた共分散行列に主成分分析を行い、最小の固有値を、２つのボクセルに対してマッチングされた法線ベクトルとして選択することがある。 In some examples, the multi-resolution voxel space alignment component 918 may align two multi-resolution voxel spaces (e.g., a target multi-resolution voxel space and a reference multi-resolution voxel space). In some cases, the multi-resolution voxel space alignment component 918 may find correspondences between voxels of the reference multi-resolution voxel space and voxels of the target multi-resolution voxel space. The multi-resolution voxel space alignment component 918 may find correspondences by, for each voxel of a particular resolution in the target multi-resolution voxel space, searching a three-dimensional neighborhood (e.g., 2x2x2, 3x3x3, 5x5x5, etc.) of the voxel that contains the mean target point at the corresponding resolution of the reference multi-resolution voxel space. Of the voxels identified within the neighborhood, the multi-resolution voxel space alignment component 918 may select the voxel whose centroid is closest to the voxel of the target multi-resolution voxel space. The multi-resolution voxel space alignment component 918 may then average the distribution of the selected voxel in the reference multi-resolution voxel space with that of the voxel of the target covariance stack. Next, the multi-resolution voxel space alignment component 918 may perform principal component analysis on the combined covariance matrix and select the eigenvector associated with the smallest eigenvalue as the matched normal vector for the two voxels.
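The correspondence and normal-selection steps can be sketched as follows. This is a minimal illustration under assumptions: voxels are represented as hypothetical `(mean, covariance)` pairs, the neighborhood search is assumed to have already produced `reference_neighbors`, and the two covariances are averaged with equal weights, whereas the disclosure also contemplates weighted averages. The function name `matched_normal` is invented for the example.

```python
import numpy as np

def matched_normal(target, reference_neighbors):
    """Pick the nearest reference voxel and return its mean and the shared normal."""
    t_mean = np.asarray(target[0], dtype=float)
    t_cov = np.asarray(target[1], dtype=float)
    # Choose the neighbor whose centroid is closest to the target centroid.
    r_mean, r_cov = min(
        reference_neighbors,
        key=lambda v: np.linalg.norm(np.asarray(v[0], dtype=float) - t_mean),
    )
    # Assumption: equal-weight average of the two distributions' covariances.
    combined = 0.5 * (t_cov + np.asarray(r_cov, dtype=float))
    # eigh returns eigenvalues in ascending order, so column 0 is the direction
    # of least variance, which serves as the surface normal.
    eigenvalues, eigenvectors = np.linalg.eigh(combined)
    return np.asarray(r_mean, dtype=float), eigenvectors[:, 0]
```

The sign of the returned normal is arbitrary, so downstream code that compares normals would typically need to account for that.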

プランニングコンポーネント９２０は、物理環境を通過して横切るために従う車両９０２に対してパスを決定することがある。例えば、プランニングコンポーネント９２０は、種々のルートおよび軌道および種々の詳細レベルを決定することがある。例えば、プランニングコンポーネント９２０は、現在ロケーションから目標ロケーションまでの移動経路を決定することがある。本解説の目的のために、ルートは、２つのロケーション間を進むためのウェイポイントのシーケンスであり得る。 A planning component 920 may determine a path for the vehicle 902 to follow to traverse through the physical environment. For example, planning component 920 may determine different routes and trajectories and different levels of detail. For example, planning component 920 may determine a travel route from a current location to a target location. For the purposes of this discussion, a route may be a sequence of waypoints to travel between two locations.

いくつかの実装では、プレディクションコンポーネント９２２は、多重解像度ボクセル空間生成コンポーネント９１６および多重解像度ボクセル空間アライメントコンポーネント９１８によって出力された多重解像度ボクセル空間９２６に少なくとも部分的に基づいて、たとえば、姿勢、速さ、軌道、速度、ヨー、ヨー率、ロール、ロール率、ピッチ、ピッチ率、位置、加速度、または他の特性など、オブジェクト（例えば、車両、歩行者、動物など）の現在を推定する、および／または将来、特性、または状態を予測するように構成されることがある。 In some implementations, the prediction component 922 may be configured to estimate current, and/or predict future, characteristics or states of objects (e.g., vehicles, pedestrians, animals, etc.), such as pose, speed, trajectory, velocity, yaw, yaw rate, roll, roll rate, pitch, pitch rate, position, acceleration, or other characteristics, based at least in part on the multi-resolution voxel space 926 output by the multi-resolution voxel space generation component 916 and the multi-resolution voxel space alignment component 918.

さらに、車両９０２は、車両９０２と、他のローカルまたはリモートのコンピューティングデバイス（複数可）との間の通信を可能にする通信接続（複数可）９０８を含むことも可能である。例として、通信接続（複数可）９０８は、車両９０２の他のローカルコンピューティングデバイス（複数可）との、および／またはドライブシステム（複数可）９１０との通信を容易にすることがある。さらに、通信接続（複数可）９０８は、車両９０２が、他の近くのコンピュータデバイス（複数可）（たとえば、他の近くの車両、交通信号機など）と通信できるようにすることもある。さらに、通信接続（複数可）９０８は、車両９０２に、リモート遠隔操作コンピューティングデバイス、または他のリモートサービスと通信できるようにもする。 Additionally, vehicle 902 may also include communication connection(s) 908 that enable communication between vehicle 902 and other local or remote computing device(s). By way of example, communication connection(s) 908 may facilitate communication with other local computing device(s) of vehicle 902 and/or with drive system(s) 910 . Additionally, communication connection(s) 908 may allow vehicle 902 to communicate with other nearby computing device(s) (eg, other nearby vehicles, traffic lights, etc.). Additionally, communication connection(s) 908 also allow vehicle 902 to communicate with a remote teleoperated computing device or other remote service.

通信接続（複数可）９０８は、車両コンピューティングデバイス９０４を、別のコンピューティングデバイス（例えば、コンピューティングデバイス（複数可）９３０）に、および／またはネットワークたとえばネットワーク（複数可）９２８などに接続するための物理および／または論理インターフェースを含むことがある。例えば、通信接続（複数可）９０８は、たとえば、ＩＥＥＥ８０２．１１規格によって定義された周波数、たとえばＢＬＵＥＴＯＯＴＨ（登録商標）などのショートレンジのワイヤレス周波数、セルラー通信（例えば２Ｇ、３Ｇ、４Ｇ、４ＧＬＴＥ、５Ｇなど）、またはそれぞれのコンピューティングデバイスに他のコンピューティングデバイス（複数可）とインターフェースできるようにするどんな適切なワイヤードもしくはワイヤレスの通信プロトコルでも介してなど、Ｗｉ－Ｆｉベースの通信を可能にすることがある。いくつかの例では、車両９０２の通信接続９０８は、多重解像度ボクセル空間９２６をコンピューティングデバイス（複数可）９３０に送信するまたは送ることがある。 The communication connection(s) 908 may include physical and/or logical interfaces for connecting the vehicle computing device 904 to another computing device (e.g., computing device(s) 930) and/or to a network, such as network(s) 928. For example, the communication connection(s) 908 may enable Wi-Fi-based communication, such as via frequencies defined by the IEEE 802.11 standards, short-range wireless frequencies such as BLUETOOTH (registered trademark), cellular communication (e.g., 2G, 3G, 4G, 4G LTE, 5G, etc.), or any suitable wired or wireless communication protocol that enables the respective computing device to interface with the other computing device(s). In some examples, the communication connection(s) 908 of the vehicle 902 may transmit or send the multi-resolution voxel space 926 to the computing device(s) 930.

少なくとも１つの例にて、センサーシステム（複数可）９０６は、ライダーセンサー、レーダーセンサー、超音波トランスデューサー、ソナーセンサー、ロケーションセンサー（例えば、ＧＰＳ、方位磁針など）、慣性センサー（例えば、慣性測定ユニット（ＩＭＵ）、加速度計、磁力計、ジャイロスコープなど）、カメラ（例えば、ＲＧＢ、ＩＲ、強度、深度、タイムオブフライトなど）、マイクロフォン、ホイールエンコーダー、環境センサー（例えば、温度センサー、湿度センサー、光センサー、圧力センサーなど）、および１つまたは複数のタイムオブフライト（time of flight：ＴｏＦ）センサーなどを含むことが可能である。センサーシステム（複数可）９０６は、今述べたまたは他の種類のセンサーの各々に関する複数のインスタンスを含むことが可能である。例として、ライダーセンサーは、車両９０２の角、前面、後面、側面、および／または上面に位置される個々のライダーセンサーを含むことがある。別の例として、カメラセンサーは、車両９０２の外部および／または内部のあちこちに、種々のロケーションに配置された複数のカメラを含むことが可能である。センサーシステム（複数可）９０６は、入力を、車両コンピューティングデバイス９０４に提供することがある。加えて、または代わりに、センサーシステム（複数可）９０６は、１つまたは複数のネットワーク９２８を介して、センサーデータを、特有の周波数において、予め決められた一定の時間が経つと、ほぼリアルタイムにおいてなど、１つまたは複数のコンピューティングデバイス（複数可）９３０に送ることが可能である。 In at least one example, the sensor system(s) 906 may include lidar sensors, radar sensors, ultrasonic transducers, sonar sensors, location sensors (e.g., GPS, compass, etc.), inertial sensors (e.g., inertial measurement units (IMUs), accelerometers, magnetometers, gyroscopes, etc.), cameras (e.g., RGB, IR, intensity, depth, time of flight, etc.), microphones, wheel encoders, environment sensors (e.g., temperature sensors, humidity sensors, light sensors, pressure sensors, etc.), one or more time of flight (ToF) sensors, and the like. The sensor system(s) 906 may include multiple instances of each of these or other types of sensors. By way of example, the lidar sensors may include individual lidar sensors located at the corners, front, back, sides, and/or top of the vehicle 902. As another example, the camera sensors may include multiple cameras positioned at various locations about the exterior and/or interior of the vehicle 902. The sensor system(s) 906 may provide input to the vehicle computing device 904. Additionally or alternatively, the sensor system(s) 906 may send sensor data, via the one or more networks 928, to the one or more computing device(s) 930 at a particular frequency, after a lapse of a predetermined period of time, in near real time, etc.

少なくとも１つの例にて、車両９０２は、１つまたは複数のドライブシステム９１０を含むことが可能である。いくつかの例では、車両９０２は、単一のドライブモジュール９１０を有することがある。少なくとも１つの例にて、車両９０２が複数のドライブシステム９１０を有するならば、個々のドライブシステム９１０は、車両９０２の向き合う端部（例えば、前方および後方など）に置かれることが可能である。少なくとも１つの例にて、ドライブシステム（複数可）９１０は、上に述べている、ドライブシステム（複数可）９１０の状態を、および／または車両９０２の周囲の状態を検出する１つまたは複数のセンサーシステム９０６を含むことが可能である。例および非限定として、センサーシステム（複数可）９０６は、ドライブモジュールの車輪の回転を感知する１つまたは複数のホイールエンコーダー（たとえば、ロータリーエンコーダー）、ドライブシステムの向きおよび加速度を測定する慣性センサー（たとえば、慣性測定ユニット、加速度計、ジャイロスコープ、磁力計など）、カメラまたは他の画像センサー、ドライブシステムの周囲のオブジェクトを聴覚的に検出する超音波センサー、ライダーセンサー、レーダーセンサーなどを含むことが可能である。いくつかのセンサー、たとえばホイールエンコーダーなどは、ドライブシステム（複数可）９１０に一意的であり得る。場合によっては、ドライブシステム（複数可）９１０におけるセンサーシステム（複数可）９０６は、車両９０２の対応するシステムに重なるまたは対応するシステムを補うことが可能である。 In at least one example, the vehicle 902 may include one or more drive systems 910. In some examples, the vehicle 902 may have a single drive system 910. In at least one example, if the vehicle 902 has multiple drive systems 910, individual drive systems 910 may be positioned at opposite ends of the vehicle 902 (e.g., the front and the rear, etc.). In at least one example, the drive system(s) 910 may include one or more sensor systems 906, as described above, to detect conditions of the drive system(s) 910 and/or of the surroundings of the vehicle 902. By way of example and not limitation, the sensor system(s) 906 may include one or more wheel encoders (e.g., rotary encoders) to sense rotation of the wheels of the drive system, inertial sensors (e.g., inertial measurement units, accelerometers, gyroscopes, magnetometers, etc.) to measure orientation and acceleration of the drive system, cameras or other image sensors, ultrasonic sensors to acoustically detect objects in the surroundings of the drive system, lidar sensors, radar sensors, etc. Some sensors, such as the wheel encoders, may be unique to the drive system(s) 910. In some cases, the sensor system(s) 906 on the drive system(s) 910 may overlap or supplement corresponding systems of the vehicle 902.

少なくとも１つの例にて、本明細書に述べられる構成要素は、上に説明されるようにセンサーデータ９２４を処理することができ、１つまたは複数のネットワーク（複数可）９２８を介して、それぞれの出力を１つまたは複数のコンピューティングデバイス（複数可）９３０に送ることがある。少なくとも１つの例にて、本明細書に述べられるコンポーネントは、それらのそれぞれの出力を、特定の周波数において、予め決められた一定の時間が経つと、ほぼリアルタイムにおいてなど、１つまたは複数のコンピューティングデバイス（複数可）９３０に送ることがある。 In at least one example, the components described herein may process the sensor data 924, as described above, and may send their respective outputs, via the one or more network(s) 928, to the one or more computing device(s) 930. In at least one example, the components described herein may send their respective outputs to the one or more computing device(s) 930 at a particular frequency, after a lapse of a predetermined period of time, in near real time, etc.

いくつかの例では、車両９０２は、ネットワーク（複数可）９２８を介して１つまたは複数のコンピューティングデバイス（複数可）９３０にセンサーデータを送ることが可能である。いくつかの例では、車両９０２は、生のセンサーデータ９２４または処理された多重解像度ボクセル空間９２６をコンピューティングデバイス（複数可）９３０に送ることが可能である。他の例では、車両９０２は、処理されたセンサーデータ９２４および／またはセンサーデータの表現（例として、物体パーセプショントラック（object perception track））をコンピューティングデバイス（複数可）９３０に送ることが可能である。いくつかの例では、車両９０２は、センサーデータ９２４を、特有の周波数において、予め決められた一定の時間が経つと、ほぼリアルタイムにおいてなど、コンピューティングデバイス（複数可）９３０に送ることが可能である。場合によっては、車両９０２は、（生のまたは処理された）センサーデータをコンピューティングデバイス（複数可）９３０に送ることが可能である。 In some examples, the vehicle 902 may send sensor data to the one or more computing device(s) 930 via the network(s) 928. In some examples, the vehicle 902 may send raw sensor data 924 or processed multi-resolution voxel spaces 926 to the computing device(s) 930. In other examples, the vehicle 902 may send processed sensor data 924 and/or representations of the sensor data (e.g., object perception tracks) to the computing device(s) 930. In some examples, the vehicle 902 may send the sensor data 924 to the computing device(s) 930 at a particular frequency, after a lapse of a predetermined period of time, in near real time, etc. In some cases, the vehicle 902 may send sensor data (raw or processed) to the computing device(s) 930.

コンピューティングシステム（複数可）９３０は、プロセッサー（複数可）９３２と、多重解像度ボクセル空間生成コンポーネント９３６、多重解像度ボクセル空間アライメントコンポーネント９３８を、車両９０２から受信されるセンサーデータ９４０および多重解像度ボクセル空間９４２も同様に、格納するコンピューター読取り可能媒体９３４とを含むことがある。いくつかの例では、多重解像度ボクセル空間生成コンポーネント９３６および多重解像度ボクセル空間アライメントコンポーネント９３８は、多重解像度ボクセル空間９４２を生成して、または複数の車両９０２によってキャプチャされたデータから生成された多重解像度ボクセル空間９４２をアライメントして、種々の物理環境のより完全なシーンを形成する、および／または信号拡張物理環境としていっしょに種々のシーンを接続するように構成されることがある。場合によっては、多重解像度ボクセル空間生成コンポーネント９３６および／または多重解像度ボクセル空間アライメントコンポーネント９３８は、機械学習および／または将来のコードテストのために用いられることがあるセンサーデータ９２４からの１つまたは複数のモデルを生成するように構成されることがある。 The computing system(s) 930 may include processor(s) 932 and computer-readable media 934 storing a multi-resolution voxel space generation component 936 and a multi-resolution voxel space alignment component 938, as well as sensor data 940 and multi-resolution voxel spaces 942 received from the vehicle 902. In some examples, the multi-resolution voxel space generation component 936 and the multi-resolution voxel space alignment component 938 may be configured to generate multi-resolution voxel spaces 942, or to align multi-resolution voxel spaces 942 generated from data captured by multiple vehicles 902, in order to form more complete scenes of various physical environments and/or to connect various scenes together as a single extended physical environment. In some cases, the multi-resolution voxel space generation component 936 and/or the multi-resolution voxel space alignment component 938 may be configured to generate, from the sensor data 924, one or more models that may be used for machine learning and/or future code testing.

車両９０２のプロセッサー（複数可）９１２およびコンピューティングデバイス（複数可）９３０のプロセッサー（複数可）９３２は、本明細書に説明されているデータを処理し、動作を行う命令を実行する性能があるどんな適切なプロセッサーでもあり得る。例および非限定として、プロセッサー（複数可）９１２および９３２は、１つまたは複数のＣＰＵ（中央処理装置）、ＧＰＵ（Graphics Processing Unit）、または電子データを処理して、その電子データを、レジスターおよび／もしくはコンピューター読取り可能媒体に格納されることが可能である他の電子データに変換するどんな他のデバイスもしくはデバイスの部分でも含むことが可能である。いくつかの例では、さらに、集積回路（例えば、ＡＳＩＣなど）、ゲートアレイ（例えば、ＦＰＧＡなど）、および他のハードウェアデバイスは、エンコードされた命令を実装するように構成される限り、考慮されるプロセッサーであることも可能である。 The processor(s) 912 of the vehicle 902 and the processor(s) 932 of the computing device(s) 930 may be any suitable processor capable of executing instructions to process data and perform the operations described herein. By way of example and not limitation, the processor(s) 912 and 932 may comprise one or more central processing units (CPUs), graphics processing units (GPUs), or any other device or portion of a device that processes electronic data to transform that electronic data into other electronic data that can be stored in registers and/or computer-readable media. In some examples, integrated circuits (e.g., ASICs, etc.), gate arrays (e.g., FPGAs, etc.), and other hardware devices may also be considered processors insofar as they are configured to implement encoded instructions.

コンピューター読取り可能媒体９１４および９３４は、非一時的なコンピューター読取り可能媒体の例である。コンピューター読取り可能媒体９１４および９３４は、本明細書に説明される方法と、種々のシステムに帰する機能とを実装するためのオペレーティングシステムおよび１つまたは複数のソフトウェアアプリケーション、命令、プログラム、および／またはデータを格納することが可能である。種々の実装において、コンピューター読取り可能媒体は、どんな適切なコンピューター読取り可能媒体技術でも、例えば、ＳＲＡＭ（スタティックＲＡＭ）、ＳＤＲＡＭ（シンクロナスＤＲＡＭ）、不揮発性／フラッシュ型メモリー、または情報を格納する性能があるどんな他のタイプのメモリーでも用いて実装されることが可能である。本明細書に説明されるアーキテクチャ、システム、および個々の要素は、多くの他の論理的な、プログラム的な、および物理的なコンポーネントを含むことが可能であり、添付の図面に示されるそれらは、本明細書の説明に関係する単なる例である。 Computer-readable media 914 and 934 are examples of non-transitory computer-readable media. Computer readable media 914 and 934 include an operating system and one or more software applications, instructions, programs, and/or software for implementing the methods and functions ascribed to various systems described herein. It is possible to store data. In various implementations, the computer-readable medium can be any suitable computer-readable medium technology, such as SRAM (static RAM), SDRAM (synchronous DRAM), non-volatile/flash memory, or the ability to store information. It can be implemented with some other type of memory. The architectures, systems, and individual elements described herein can include many other logical, programmatic, and physical components, illustrated in the accompanying figures , are merely examples relevant to the description herein.

理解されることが可能であるように、本明細書に述べられる構成要素は、例示の目的のために区分されているとして説明される。しかしながら、種々の構成要素によって行われる動作は、どんな他の構成要素にでも組み合わされるまたは行われることが可能である。 As can be understood, the components described herein are described as partitioned for purposes of illustration. However, the actions performed by various components may be combined or performed on any other components.

図９が分散システムとして例示される一方、代替えの例では、車両９０２のコンポーネントがコンピューティングデバイス（複数可）９３０に関連付けられることが可能であること、および／または、コンピューティングデバイス（複数可）９３０のコンポーネントが車両９０２に関連付けられることが可能であることは特筆されるべきである。すなわち、車両９０２は、コンピューティングデバイス（複数可）９３０に関連付けられた１つまたは複数の機能を行い、逆もまた同様であることが可能である。 While FIG. 9 is illustrated as a distributed system, in alternative examples, components of vehicle 902 can be associated with computing device(s) 930 and/or computing device(s). It should be noted that components at 930 can be associated with vehicle 902 . That is, vehicle 902 may perform one or more functions associated with computing device(s) 930 and vice versa.

図１０は、本明細書に説明されている、たとえばキャプチャされたデータの点群表現１００８と比較した図２～４の多重解像度ボクセル空間２０８など、多重解像度ボクセル空間の例についての絵入りの図１０００である。例示されている多重解像度ボクセル空間２０８および点群表現１００８の両方は、実世界の物理的なロケーションまたは空間に対応する。 FIG. 10 is a pictorial diagram 1000 of an example of a multi-resolution voxel space described herein, such as the multi-resolution voxel space 208 of FIGS. 2-4, shown in comparison with a point cloud representation 1008 of the captured data. Both the illustrated multi-resolution voxel space 208 and the point cloud representation 1008 correspond to a real-world physical location or space.

例示的な箇条 Example Clauses
Ａ．ライダーセンサーと、１つまたは複数のプロセッサーと、１つまたは複数のプロセッサーによって実行可能な命令を格納する１つまたは複数の非一時的なコンピューター読取り可能媒体とを含み、命令は、実行されると、システムに、ライダーセンサーから物理環境を表すデータを受信することと、データの第１の部分に関連付けられた第１のセマンティッククラスを決定することと、データの第２の部分に関連付けられた第２のセマンティッククラスを決定することと、データの第１の部分を第１のボクセルグリッドの第１のボクセルのボクセルに関連付け、第１のボクセルグリッドがターゲット多重解像度ボクセル空間の第１のセマンティック層に関連付けられることと、データの第２の部分を第２のボクセルグリッドの第２のボクセルのボクセルに関連付け、第２のボクセルグリッドがターゲット多重解像度ボクセル空間の第２のセマンティック層に関連付けられ、第１のボクセルグリッドおよび第２のボクセルグリッドが第１の解像度に関連付けられることと、第１のボクセルグリッドの隣り合ったボクセルの第１のセットをマージして、第１のセマンティック層に関連付けられた第３のボクセルグリッドのボクセルを形成し、第３のボクセルグリッドが第１の解像度よりも低い第２の解像度に関連付けられることと、第２のボクセルグリッドの隣り合ったボクセルの第２のセットをマージして、第２のセマンティック層に関連付けられた第４のボクセルグリッドのボクセルを形成し、第４のボクセルグリッドが第２の解像度に関連付けられることとを含む動作を行わせる、システム。 A. A system comprising: a lidar sensor; one or more processors; and one or more non-transitory computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising: receiving data representing a physical environment from the lidar sensor; determining a first semantic class associated with a first portion of the data; determining a second semantic class associated with a second portion of the data; associating the first portion of the data with a first voxel of a first voxel grid, the first voxel grid associated with a first semantic layer of a target multi-resolution voxel space; associating the second portion of the data with a second voxel of a second voxel grid, the second voxel grid associated with a second semantic layer of the target multi-resolution voxel space, the first voxel grid and the second voxel grid associated with a first resolution; merging a first set of adjacent voxels of the first voxel grid to form a voxel of a third voxel grid associated with the first semantic layer, the third voxel grid associated with a second resolution that is lower than the first resolution; and merging a second set of adjacent voxels of the second voxel grid to form a voxel of a fourth voxel grid associated with the second semantic layer, the fourth voxel grid associated with the second resolution.

Ｂ．動作は、第３のボクセルグリッドの隣り合ったボクセルの第３のセットをマージして、第１のセマンティック層に関連付けられた第５のボクセルグリッドのボクセルを形成し、第５のボクセルグリッドが第２の解像度よりも低い第３の解像度を有することと、第４のボクセルグリッドの隣り合ったボクセルの第４のセットをマージして、第２のセマンティック層に関連付けられた第６のボクセルグリッドのボクセルを形成し、第６のボクセルグリッドが第３の解像度を有することとをさらに含む、項Ａのシステム。 B. The system of Section A, wherein the operations further comprise: merging a third set of adjacent voxels of the third voxel grid to form a voxel of a fifth voxel grid associated with the first semantic layer, the fifth voxel grid having a third resolution that is lower than the second resolution; and merging a fourth set of adjacent voxels of the fourth voxel grid to form a voxel of a sixth voxel grid associated with the second semantic layer, the sixth voxel grid having the third resolution.

Ｃ．データの第１の部分を関連付けることは、データの第１の部分の観測の数が観測のしきい値数以上であると決定することと、データの第１の部分の平均値を決定することと、データの第１の部分の共分散を決定することと、平均および共分散を第１のボクセルに関連付けることとを含む、項Ａのシステム。 C. The system of Section A, wherein associating the first portion of the data comprises: determining that a number of observations of the first portion of the data is greater than or equal to a threshold number of observations; determining a mean value of the first portion of the data; determining a covariance of the first portion of the data; and associating the mean and the covariance with the first voxel.

Ｄ．動作は、リファレンス多重解像度ボクセル空間を受信することと、ターゲット多重解像度ボクセル空間のターゲットボクセルとリファレンス多重解像度ボクセル空間のリファレンスボクセル（reference voxel）との間のボクセル対応（voxel correspondence）を決定し、ターゲットボクセルおよびリファレンスボクセルが同一の解像度を含むことと、ターゲットボクセルおよびリファレンスボクセルを表す組み合わされたボクセルの重み付き統計量を決定することと、重み付き平均共分散に少なくとも部分的に基づいてリファレンス多重解像度ボクセル空間とターゲット多重解像度ボクセル空間との間の変換を決定することと、変換に少なくとも部分的に基づいて自律車両を制御することとをさらに含む、項Ｃのシステム。 D. The operations include receiving a reference multiresolution voxel space, determining a voxel correspondence between a target voxel in the target multiresolution voxel space and a reference voxel in the reference multiresolution voxel space; voxels and reference voxels comprising the same resolution; determining a combined voxel weighted statistic representing the target voxels and the reference voxels; The system of Section C, further comprising determining a transform between the voxel space and the target multi-resolution voxel space and controlling the autonomous vehicle based at least in part on the transform.

Ｅ．センサーからセンサーデータを受信することと、センサーデータの少なくとも第１の部分を多重解像度ボクセル空間の第１のボクセルグリッドの第１のボクセルに関連付け、第１のボクセルが第１のセマンティック分類（semantic classification）および第１の解像度に関連付けられることと、センサーデータの少なくとも第２の部分を多重解像度ボクセル空間の第２のボクセルグリッドの第２のボクセルに関連付け、第２のボクセルが第１のセマンティック分類および第１の解像度に関連付けられることと、第１のボクセルおよび第２のボクセルに少なくとも部分的に基づいて、第１の解像度よりも低い第２の解像度に関連付けられた第３のボクセルを決定し、第３のボクセルが第１のセマンティック分類に関連付けられることと、多重解像度ボクセル空間に少なくとも部分的に基づいて自律車両を制御することとを含む、方法。 E. A method comprising: receiving sensor data from a sensor; associating at least a first portion of the sensor data with a first voxel of a first voxel grid of a multi-resolution voxel space, the first voxel associated with a first semantic classification and a first resolution; associating at least a second portion of the sensor data with a second voxel of a second voxel grid of the multi-resolution voxel space, the second voxel associated with the first semantic classification and the first resolution; determining, based at least in part on the first voxel and the second voxel, a third voxel associated with a second resolution that is lower than the first resolution, the third voxel associated with the first semantic classification; and controlling an autonomous vehicle based at least in part on the multi-resolution voxel space.

Ｆ．データの第１の部分に関連付けられた第１のセマンティック分類を決定することと、データの第３の部分に関連付けられた第２のセマンティック分類を決定することと、第２のセマンティック分類に少なくとも部分的に基づいて、データの第３の部分を多重解像度ボクセル空間の第３のボクセルに関連付けることとをさらに含む、項Ｅの方法。 F. The method of Section E, further comprising: determining the first semantic classification associated with the first portion of the data; determining a second semantic classification associated with a third portion of the data; and associating, based at least in part on the second semantic classification, the third portion of the data with a third voxel of the multi-resolution voxel space.

Ｇ．データの第１の部分を関連付けることは、データの第１の部分の第１の平均値を決定することと、データの第１の部分の第１の共分散を決定することと、第１の平均および第１の共分散を第１のボクセルに関連付けることと、データの第２の部分の第２の平均値を決定することと、データの第２の部分の第２の共分散を決定することと、第２の平均および第２の共分散を第２のボクセルに関連付けることとを含む、項Ｅの方法。 G. The method of Section E, wherein associating the first portion of the data comprises: determining a first mean value of the first portion of the data; determining a first covariance of the first portion of the data; associating the first mean and the first covariance with the first voxel; determining a second mean value of the second portion of the data; determining a second covariance of the second portion of the data; and associating the second mean and the second covariance with the second voxel.

Ｈ．第３のボクセルを決定することは、第１のボクセルの第１の平均および第２のボクセルの第２の平均の重み付き平均を決定することと、第１のボクセルの第１の共分散および第２のボクセルの第２の共分散の重み付き平均を決定することと、第１の平均および第２の平均の重み付き平均と第１の共分散および第２の共分散の重み付き平均とを第３のボクセルに関連付けることを含む、項Ｅの方法。 H. The method of Section E, wherein determining the third voxel comprises: determining a weighted average of a first mean of the first voxel and a second mean of the second voxel; determining a weighted average of a first covariance of the first voxel and a second covariance of the second voxel; and associating the weighted average of the first mean and the second mean, and the weighted average of the first covariance and the second covariance, with the third voxel.

Ｉ．リファレンス多重解像度ボクセル空間を受信することと、第１のボクセルとリファレンス多重解像度ボクセル空間のリファレンスボクセルとの間のボクセル対応を決定し、リファレンスボクセルが第１の解像度を有することと、第１のボクセルおよびリファレンスボクセルを表す組み合わされたボクセルの重み付き統計量を決定することと、重み付き平均統計量に少なくとも部分的に基づいて多重解像度ボクセル空間とリファレンス多重解像度ボクセル空間との間の変換を決定することとをさらに含み、自律車両を制御することは変換に少なくとも部分的に基づく、項Ｅの方法。 I. The method of Section E, further comprising: receiving a reference multi-resolution voxel space; determining a voxel correspondence between the first voxel and a reference voxel of the reference multi-resolution voxel space, the reference voxel having the first resolution; determining a weighted statistic of a combined voxel representing the first voxel and the reference voxel; and determining a transformation between the multi-resolution voxel space and the reference multi-resolution voxel space based at least in part on the weighted average statistic, wherein controlling the autonomous vehicle is based at least in part on the transformation.

Ｊ．ボクセル対応は、少なくとも、リファレンス多重解像度ボクセル空間に関連付けられた第１のセントロイドと、ターゲット多重解像度ボクセル空間に関連付けられた第２のセントロイドとの間の距離に基づく、項Ｉの方法。 J. The method of Section I, wherein the voxel correspondence is based on at least a distance between a first centroid associated with the reference multiresolution voxel space and a second centroid associated with the target multiresolution voxel space.

Ｋ．重み付き統計量は、重み付き共分散を含む、項Ｉの方法。 K. The method of Section I, wherein the weighted statistics include weighted covariances.

Ｌ．変換を決定することは、アライメントをガウス分布としてモデリングすることに少なくとも部分的に基づいて、測定の不確かさを決定することをさらに含む、項Ｉの方法。 L. The method of Section I, wherein determining the transform further comprises determining measurement uncertainty based at least in part on modeling the alignment as a Gaussian distribution.

Ｍ．変換を決定することは、第１のボクセルの共分散とリファレンスボクセルの共分散との重み付き平均を決定することと、重み付き平均の最小固有ベクトルを決定することとを含む、項Ｉの方法。 M. The method of Section I, wherein determining the transform includes determining a weighted average of the covariance of the first voxel and the covariance of the reference voxels; and determining the smallest eigenvector of the weighted average.

N. The method of clause E, wherein the first voxel and the second voxel are adjacent within the first resolution.

O. A non-transitory computer-readable medium storing instructions that, when executed, cause one or more processors to perform operations comprising: receiving sensor data from a sensor associated with a vehicle; associating a first portion of the data with a first voxel of a first grid of a voxel space, the first portion of the data having a first semantic class; determining a first weighted statistic associated with the first portion of the data; associating a second portion of the data with a second voxel of the first grid of the voxel space; determining a second weighted statistic associated with the second portion of the data, the second portion of the data having the first semantic class; determining, based at least in part on the first weighted statistic and the second weighted statistic, a third weighted statistic associated with a third voxel of a second grid of the voxel space, wherein the first grid is associated with a first resolution having fewer voxels than a second resolution associated with the second grid; and controlling the vehicle based at least in part on the voxel space.

P. The non-transitory computer-readable medium of clause O, wherein the operations comprise: associating the first portion of the data and the second portion of the data with a first semantic layer of the voxel space, the first semantic layer corresponding to the first semantic class; associating a third portion of the data with a third voxel of the first grid of the voxel space, the third portion of the data having a second semantic class; and associating the third portion of the data with a second semantic layer of the voxel space, the second semantic layer corresponding to the second semantic class.
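
By way of non-limiting illustration, the association of classified sensor data with per-class semantic layers recited in clauses O and P might be sketched as follows. The names `VoxelStats` and `build_semantic_layers`, the dictionary layout, the indexing of voxels by floor division, and the use of unweighted (population) statistics are all assumptions made for this sketch.

```python
import math
from collections import defaultdict


class VoxelStats:
    """Running sums from which a voxel's mean and covariance follow."""

    def __init__(self):
        self.n = 0
        self.s = [0.0, 0.0, 0.0]                  # sum of points
        self.ss = [[0.0] * 3 for _ in range(3)]   # sum of outer products

    def add(self, p):
        self.n += 1
        for i in range(3):
            self.s[i] += p[i]
            for j in range(3):
                self.ss[i][j] += p[i] * p[j]

    def mean(self):
        return [v / self.n for v in self.s]

    def covariance(self):
        # Population covariance: E[x x^T] - mean mean^T.
        m = self.mean()
        return [[self.ss[i][j] / self.n - m[i] * m[j] for j in range(3)]
                for i in range(3)]


def build_semantic_layers(points, voxel_size):
    """Place (point, semantic_class) pairs into one voxel grid per class."""
    layers = defaultdict(lambda: defaultdict(VoxelStats))
    for p, semantic_class in points:
        index = tuple(math.floor(c / voxel_size) for c in p)
        layers[semantic_class][index].add(p)
    return layers


# Hypothetical classified points; the class labels are illustrative only.
points = [((0.2, 0.2, 0.2), "building"), ((0.6, 0.2, 0.2), "building"),
          ((0.4, 0.4, 0.4), "foliage"), ((1.5, 0.2, 0.2), "building")]
layers = build_semantic_layers(points, voxel_size=1.0)
building_voxel = layers["building"][(0, 0, 0)]
```

In this sketch the "building" point and the "foliage" point that fall in the same cell are kept in separate layers, so the statistics of one semantic class are not mixed with those of another.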

Q. The non-transitory computer-readable medium of clause O, wherein the first semantic class comprises a pedestrian, a vehicle, a building, an animal, or foliage.

R. The non-transitory computer-readable medium of clause O, wherein the first weighted statistic comprises a first mean and a first covariance of the first portion of the data, and the second weighted statistic comprises a second mean and a second covariance of the second portion of the data.

S. The non-transitory computer-readable medium of clause O, wherein the third weighted statistic is determined based at least in part on: determining a weighted average of the first mean and the second mean; determining a weighted average of the first covariance and the second covariance; and associating the weighted average of the first mean and the second mean and the weighted average of the first covariance and the second covariance with the third voxel.

T. The non-transitory computer-readable medium of clause O, wherein the operations further comprise determining a location of the vehicle within a physical environment based at least in part on the voxel space and a multi-resolution voxel space.

U. A system comprising: one or more processors; and one or more non-transitory computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising: receiving data; associating the data with a target multi-resolution voxel space; receiving a reference multi-resolution voxel space; determining a target voxel of the target multi-resolution voxel space associated with a reference voxel of the reference multi-resolution voxel space, the target voxel and the reference voxel being associated with a same resolution; determining a weighted statistic associated with a combined voxel representing the target voxel and the reference voxel; determining a transformation based at least in part on the weighted statistic; and controlling an autonomous vehicle based at least in part on the transformation.

V. The system of clause U, wherein the weighted statistic is a weighted covariance matrix.

W. The system of clause U, wherein the operations further comprise: performing a principal component analysis on the weighted average covariance matrix; and determining a smallest eigenvector of the principal component analysis, wherein determining the transformation is further based on the smallest eigenvector.

X. The system of clause U, wherein determining the target voxel of the target multi-resolution voxel space associated with the reference voxel of the reference multi-resolution voxel space and determining the weighted statistic associated with the combined voxel representing the target voxel and the reference voxel are performed iteratively for a plurality of pairs of voxels, each pair of voxels comprising a voxel of the target multi-resolution voxel space and a voxel of the reference multi-resolution voxel space.

Y. A method comprising: receiving map data comprising a first voxel space, the first voxel space having a first layer associated with a first resolution and a second layer associated with a second resolution different from the first resolution; receiving sensor data from a sensor associated with a vehicle; associating the sensor data with a second voxel space, the second voxel space having a first layer associated with the first resolution and a second layer associated with the second resolution; determining first aggregated voxel data based at least in part on the first voxel space and the second voxel space; determining, based at least in part on the first aggregated voxel data, a transformation between the first voxel space and the second voxel space; and determining a location of the vehicle in a physical environment based at least in part on the transformation.

Z. The method of clause Y, wherein determining the first aggregated voxel data comprises: identifying, for a first voxel of the first voxel space, a set of voxels of the second voxel space having centroids within a specified distance of a centroid of the first voxel; selecting a second voxel of the set of voxels, the second voxel having the centroid nearest to the centroid of the first voxel; and determining a weighted average of a covariance of the first voxel and a covariance of the reference voxel.

AA. The method of clause Z, wherein the first voxel and the second voxel comprise a same semantic class.

AB. The method of clause Z, wherein determining the first aggregated voxel data further comprises: determining a smallest eigenvector of the weighted average; and determining, based at least in part on the smallest eigenvector, a normal vector representing the first aggregated voxel data.
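
By way of non-limiting illustration, the "smallest eigenvector" of clause AB (read here as the eigenvector associated with the smallest eigenvalue of the averaged covariance) might be approximated as follows. The sketch substitutes a shifted power iteration for a full principal component analysis so that it needs no external libraries; the function name `smallest_eigenvector` and the iteration count are assumptions.

```python
import math


def smallest_eigenvector(cov, iterations=200):
    """Approximate the eigenvector of a symmetric 3x3 covariance matrix
    that belongs to its smallest eigenvalue.

    Runs power iteration on B = trace(cov) * I - cov. For a positive
    semi-definite cov, the dominant eigenvector of B is the eigenvector
    of cov with the smallest eigenvalue. Assumes cov is non-zero and the
    start vector is not orthogonal to the sought eigenvector.
    """
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    b = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)]
         for i in range(3)]
    v = [1.0, 1.0, 1.0]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the matching eigenvalue of cov.
    cv = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
    eigenvalue = sum(v[i] * cv[i] for i in range(3))
    return v, eigenvalue


# Hypothetical covariance of a nearly flat, horizontal surface patch.
flat_cov = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.01]]
normal, lam_min = smallest_eigenvector(flat_cov)
```

For a voxel whose points lie on a roughly planar surface, the direction of least variance is perpendicular to that surface, which is why this eigenvector can serve as the normal vector.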

AC. The method of clause AB, wherein reweighting the first aggregated voxel data comprises applying an m-estimator framework.
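
By way of non-limiting illustration, one possible m-estimator reweighting is sketched below. Clause AC does not name a particular estimator; the Huber weight function and the tuning constant 1.345 (a conventional choice for residuals of unit variance) are assumptions of this sketch, as are the function names.

```python
def huber_weight(residual, k=1.345):
    """Huber weight: 1 inside the threshold k, k / |residual| outside."""
    r = abs(residual)
    return 1.0 if r <= k else k / r


def reweight(residuals, k=1.345):
    """Return one robust weight per voxel-pair residual."""
    return [huber_weight(r, k) for r in residuals]


# Hypothetical residuals; the last two are treated as outliers.
weights = reweight([0.1, -1.0, 2.69, -13.45])
```

Voxel pairs with small residuals keep full weight, while pairs with large residuals, such as mismatched correspondences, contribute progressively less to the alignment.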

AD. The method of clause AB, wherein determining the transformation comprises: determining a residual based at least in part on a smallest eigenvalue; and determining, based at least in part on the residual, one or more of a rotation or a translation between a target voxel space and a reference voxel space.

AE. The method of clause AD, further comprising determining an uncertainty associated with the alignment based at least in part on modeling a distribution.

AF. The method of clause Z, wherein the transformation indicates a difference in one or more of a position or an orientation between the first voxel space and the second voxel space.

AG. The method of clause Z, wherein the vehicle is an autonomous vehicle, the method further comprising controlling the autonomous vehicle based at least in part on the location of the autonomous vehicle in the physical environment.

AH. A non-transitory computer-readable medium storing instructions that, when executed, cause one or more processors to perform operations comprising: receiving a target multi-resolution voxel space; receiving a reference multi-resolution voxel space; determining that a first target voxel of the target multi-resolution voxel space is associated with a first reference voxel of the reference multi-resolution voxel space, the first target voxel and the first reference voxel sharing a first resolution; determining a first weighted statistic of the first target voxel and the first reference voxel; determining that a second target voxel of the target multi-resolution voxel space is associated with a second reference voxel of the reference multi-resolution voxel space, the second target voxel and the second reference voxel sharing a second resolution, the second resolution being different from the first resolution; determining a second weighted statistic of a second combined voxel representing the second target voxel and the second reference voxel; and determining, based at least in part on the first weighted statistic and the second weighted statistic, a transformation between the target multi-resolution voxel space and the reference multi-resolution voxel space.

AI. The non-transitory computer-readable medium of clause AH, wherein the target multi-resolution voxel space comprises a first set of voxels associated with a first classification and a second set of voxels associated with a second classification.

AJ. The non-transitory computer-readable medium of clause AH, wherein determining that the first target voxel of the target multi-resolution voxel space is associated with the first reference voxel of the reference multi-resolution voxel space comprises: identifying, for the first target voxel, a set of voxels of the reference multi-resolution voxel space having centroids within a specified distance of a centroid of the first target voxel; and determining the first reference voxel from the set of voxels based on a distance of a centroid of the first reference voxel to the centroid of the first target voxel.
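
By way of non-limiting illustration, the centroid-based correspondence search of clause AJ might be sketched as follows. The function name `find_correspondence` and the brute-force scan are assumptions; the sketch also assumes the reference centroids have already been restricted to voxels of the same resolution as the target voxel.

```python
import math


def find_correspondence(target_centroid, reference_centroids, max_distance):
    """Return the index of the nearest reference centroid that lies within
    max_distance of the target centroid, or None if there is none."""
    best_index, best_distance = None, max_distance
    for index, centroid in enumerate(reference_centroids):
        distance = math.dist(target_centroid, centroid)
        if distance <= best_distance:
            best_index, best_distance = index, distance
    return best_index


# Hypothetical reference centroids at a single resolution.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
match = find_correspondence((0.9, 0.1, 0.0), reference, max_distance=0.5)
no_match = find_correspondence((3.0, 3.0, 3.0), reference, max_distance=0.5)
```

A practical implementation would likely replace the linear scan with a spatial index, but the selection rule (candidates within the specified distance, then the nearest centroid) is the same.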

AK. The non-transitory computer-readable medium of clause AH, wherein the operations further comprise determining that the first target voxel and the first reference voxel correspond based at least in part on a distance between a centroid of the first target voxel and a centroid of the first reference voxel.

AL. The non-transitory computer-readable medium of clause AH, wherein the first weighted statistic is a weighted average covariance.

AM. The non-transitory computer-readable medium of clause AH, wherein determining the transformation comprises: performing a principal component analysis on the first weighted statistic; determining a smallest eigenvalue of the principal component analysis; determining a residual based at least in part on the smallest eigenvalue; and determining, as the transformation, one or more of a rotation or a translation between the target multi-resolution map and the reference multi-resolution map that optimizes the residual.

AN. The non-transitory computer-readable medium of clause AM, wherein the operations further comprise applying one or more of a gradient descent technique or a non-linear optimization technique that minimizes a value based at least in part on the residual, and wherein the transformation comprises one or more of a translation or a rotation.
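
By way of non-limiting illustration, a gradient descent of the kind mentioned in clause AN might be sketched as follows for the translation-only case. The clauses state only that the residual is based at least in part on the smallest eigenvalue; the specific residual used here (the distance between corresponding centroids along the normal vector, divided by the square root of the smallest eigenvalue), the function names, the step size, and the omission of rotation are all assumptions of this sketch.

```python
import math


def residual(mu_t, mu_r, normal, lam_min, t):
    """Distance along the normal between the shifted target centroid and
    the reference centroid, scaled by 1/sqrt(lam_min) (assumed form)."""
    d = sum(n * (a + s - b) for n, a, s, b in zip(normal, mu_t, t, mu_r))
    return d / math.sqrt(lam_min)


def estimate_translation(pairs, step=0.1, iterations=200):
    """Gradient descent on 0.5 * sum(residual**2) over a translation t.

    pairs: list of (mu_t, mu_r, normal, lam_min) tuples, one tuple per
    corresponding target/reference voxel pair.
    """
    t = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0, 0.0, 0.0]
        for mu_t, mu_r, normal, lam_min in pairs:
            r = residual(mu_t, mu_r, normal, lam_min, t)
            for i in range(3):
                # d(residual)/d(t_i) = normal_i / sqrt(lam_min)
                grad[i] += r * normal[i] / math.sqrt(lam_min)
        t = [ti - step * gi for ti, gi in zip(t, grad)]
    return t


# Hypothetical data: three voxel pairs with axis-aligned normals, where
# the target centroids are the reference centroids shifted by -offset.
offset = (0.3, -0.2, 0.1)
axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
pairs = [(tuple(a - o for a, o in zip(axis, offset)), axis, axis, 0.25)
         for axis in axes]
t_est = estimate_translation(pairs)
```

The step size must be small relative to the curvature of the cost for the iteration to converge; the value used here suits the example data only. Estimating a rotation as well would require a parameterization of orientation and is left out of the sketch.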

While the example clauses described above are described with respect to one particular implementation, it should be understood that, in the context of this document, the content of the example clauses can also be implemented via a method, device, system, computer-readable medium, and/or another implementation. Additionally, any of examples A-AN may be implemented alone or in combination with any other one or more of the examples A-AN.

Conclusion
As can be understood, the components discussed herein are described as divided for illustrative purposes. However, the operations performed by the various components can be combined or performed in any other component. It should also be understood that components or steps discussed with respect to one example or implementation may be used in conjunction with components or steps of other examples. For example, the components and instructions of FIG. 9 may utilize the processes and flows of FIGS. 1-8.

While one or more examples of the techniques described herein have been described, various alterations, additions, permutations, and equivalents thereof are included within the scope of the techniques described herein.

In the description of examples, reference is made to the accompanying drawings that form a part hereof, which show by way of illustration specific examples of the claimed subject matter. It is to be understood that other examples can be used and that changes or alterations, such as structural changes, can be made. Such examples, changes, or alterations are not necessarily departures from the scope with respect to the intended claimed subject matter. While the steps herein can be presented in a certain order, in some cases the ordering can be changed so that certain inputs are provided at different times or in a different order without changing the function of the systems and methods described. The disclosed procedures could also be executed in different orders. Additionally, the various computations described herein need not be performed in the order disclosed, and other examples using alternative orderings of the computations could be readily implemented. In addition to being reordered, in some instances the computations could also be decomposed into sub-computations with the same results.

Claims

1. A method comprising:
receiving map data including a first voxel space, the first voxel space having a first layer associated with a first resolution and a second layer associated with a second resolution different from the first resolution;
receiving sensor data from a sensor associated with a vehicle;
associating the sensor data with a second voxel space, the second voxel space having a first layer associated with the first resolution and a second layer associated with the second resolution;
determining first aggregated voxel data based at least in part on the first voxel space and the second voxel space;
determining a transformation between the first voxel space and the second voxel space based at least in part on the first aggregated voxel data; and
determining a location of the vehicle in a physical environment based at least in part on the transformation.

2. The method of claim 1, wherein determining the first aggregated voxel data comprises:
identifying, for a first voxel of the first voxel space, a set of voxels of the second voxel space having centroids within a specified distance of a centroid of the first voxel;
selecting a second voxel of the set of voxels, the second voxel having the centroid closest to the centroid of the first voxel; and
determining a weighted average of a covariance of the first voxel and a covariance of the reference voxel.

3. A method according to claim 1 or 2, wherein said first voxel and said second voxel contain the same semantic class.

4. The method of any one of claims 1 to 3, wherein determining the first aggregated voxel data further comprises:
determining a smallest eigenvector of the weighted average; and
determining a normal vector representing the first aggregated voxel data based at least in part on the smallest eigenvector.

5. The method of any preceding claim, wherein re-weighting the first aggregated voxel data comprises applying an m-estimator framework.

6. The method of any one of claims 1 to 5, wherein determining the transformation comprises:
determining a residual based at least in part on a smallest eigenvalue; and
determining one or more of a rotation or a translation between a target voxel space and a reference voxel space based at least in part on the residual.

7. The method of any one of claims 1-6, further comprising determining an uncertainty associated with the alignment based at least in part on modeling a distribution.

8. The method of any one of claims 1 to 7, wherein the transformation indicates a difference in one or more of a position or an orientation between the first voxel space and the second voxel space.

9. The method of any one of claims 1 to 8, wherein the vehicle is an autonomous vehicle, the method further comprising controlling the autonomous vehicle based at least in part on the location of the autonomous vehicle in the physical environment.

10. A computer program product comprising coded instructions that, when executed on a computer, implement the method of any one of claims 1 to 9.

11. A system comprising:
one or more processors; and
one or more non-transitory computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising:
receiving a target multi-resolution voxel space;
receiving a reference multi-resolution voxel space;
determining that a first target voxel of the target multi-resolution voxel space is associated with a first reference voxel of the reference multi-resolution voxel space, the first target voxel and the first reference voxel sharing a first resolution;
determining a first weighted statistic of the first target voxel and the first reference voxel;
determining that a second target voxel of the target multi-resolution voxel space is associated with a second reference voxel of the reference multi-resolution voxel space, the second target voxel and the second reference voxel sharing a second resolution, the second resolution being different from the first resolution;
determining a second weighted statistic of a second combined voxel representing the second target voxel and the second reference voxel; and
determining a transformation between the target multi-resolution voxel space and the reference multi-resolution voxel space based at least in part on the first weighted statistic and the second weighted statistic.

12. The system of claim 11, wherein the target multi-resolution voxel space includes a first set of voxels associated with a first classification and a second set of voxels associated with a second classification.

13. The system of claim 11 or 12, wherein determining that the first target voxel of the target multi-resolution voxel space is associated with the first reference voxel of the reference multi-resolution voxel space comprises:
identifying, for the first target voxel, a set of voxels of the reference multi-resolution voxel space having centroids within a specified distance of a centroid of the first target voxel; and
determining the first reference voxel from the set of voxels based on a distance of a centroid of the first reference voxel to the centroid of the first target voxel.

14. The system of any one of claims 11 to 13, wherein the operations further comprise determining that the first target voxel and the first reference voxel correspond based at least in part on a distance between a centroid of the first target voxel and a centroid of the first reference voxel.

15. The system of any one of claims 11 to 14, wherein determining the transformation comprises:
performing a principal component analysis on the first weighted statistic;
determining a smallest eigenvalue of the principal component analysis;
determining a residual based at least in part on the smallest eigenvalue; and
determining, as the transformation, one or more of a rotation or a translation between the target multi-resolution map and the reference multi-resolution map that optimizes the residual,
wherein the operations further comprise applying one or more of a gradient descent technique or a non-linear optimization technique that minimizes a value based at least in part on the residual, and wherein the transformation comprises one or more of a translation or a rotation.