JP7418762B2

JP7418762B2 - Computing systems, methods and non-transitory computer-readable media

Info

Publication number: JP7418762B2
Application number: JP2023018862A
Authority: JP
Inventors: アブエラ，アメッド; サルル，ハムディ; ロドリゲス，ホセジェロニモモレイラ; ヨウ，シュタオ; ユ，ジンゼ
Original assignee: Mujin Inc
Current assignee: Mujin Inc
Priority date: 2021-08-09
Filing date: 2023-02-10
Publication date: 2024-01-22
Anticipated expiration: 2042-08-09
Also published as: JP2024026473A; JP2023102783A; JP2023102781A; JP2023102782A; JP7391342B2; JP7417882B2

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Patent Application No. 63/230,931, filed August 9, 2021, entitled "A ROBOTIC SYSTEM FOR FACILITATING TEMPLATE MATCHING AND DETECTION FOR OBJECT PICKING," which is incorporated herein by reference in its entirety.

TECHNICAL FIELD
The present technology is directed generally to robotic systems and, more specifically, to systems, processes, and techniques for identifying and detecting objects. More particularly, the present technology may be used to identify and detect objects in containers.

As their performance continues to improve and their cost continues to fall, many robots (e.g., machines configured to automatically/autonomously execute physical actions) are now widely used in a variety of fields. Robots may be used, for example, to execute various tasks (e.g., manipulating or transferring an object through space) in manufacturing and/or assembly, packing and/or packaging, transport and/or shipping, and the like. In executing the tasks, robots can replicate human actions, thereby replacing or reducing the human involvement otherwise required to perform dangerous or repetitive tasks.

Despite these technological advances, however, robots often lack the sophistication necessary to duplicate the human interactions required for executing larger and/or more complex tasks. Accordingly, there remains a need for improved techniques and systems for managing operations of and/or interactions between robots.

In one embodiment, a computing system configured to generate an object recognition template set for identifying an object in a scene is provided. The computing system includes at least one processing circuit configured to: obtain registration data of the object, the registration data including an object model representative of the object; determine a plurality of viewpoints of the object model in three-dimensional space; estimate a plurality of appearances of the object model at each of the plurality of viewpoints; generate, according to the plurality of appearances, a plurality of object recognition templates, each corresponding to a respective one of the plurality of appearances; and communicate the plurality of object recognition templates to a robot control system as the object recognition template set. Each of the plurality of object recognition templates represents a pose that the object may have relative to an optical axis of a camera that generates image information of the object in the scene.

In another embodiment, a method of generating an object recognition template set for identifying an object in a scene is provided. The method includes: obtaining registration data of the object, the registration data including an object model representative of the object; determining a plurality of viewpoints of the object model in three-dimensional space; estimating a plurality of appearances of the object model at each of the plurality of viewpoints; generating, according to the plurality of appearances, a plurality of object recognition templates, each corresponding to a respective one of the plurality of appearances; and communicating the plurality of object recognition templates to a robot control system as the object recognition template set. Each of the plurality of object recognition templates represents a pose that the object may have relative to an optical axis of a camera that generates image information of the object in the scene.

In another embodiment, a non-transitory computer-readable medium is provided that is operable by at least one processing circuit via a communication interface configured to communicate with a robotic system, and that is configured with executable instructions for implementing a method of generating object recognition templates for identifying an object in a scene. The method includes: receiving registration data of the object, the registration data including an object model representative of the object; performing an operation to generate a plurality of viewpoints of the object model in three-dimensional space; performing an operation to estimate a plurality of appearances of the object model at each of the plurality of viewpoints; performing an operation to generate, according to the plurality of appearances, a plurality of object recognition templates, each corresponding to a respective one of the plurality of appearances; and outputting the plurality of object recognition templates to the robotic system as an object recognition template set. Each of the plurality of object recognition templates represents a pose that the object may have relative to an optical axis of a camera that generates image information of the object in the scene.
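The template-set embodiments above can be pictured with a minimal Python sketch. It is illustrative only and rests on assumptions that do not come from the text: viewpoints are spread over a unit sphere with a Fibonacci lattice, the several "appearances" at one viewpoint are modeled as evenly spaced rotations about the camera's optical axis, and the names `Template`, `sample_viewpoints`, and `build_template_set` are hypothetical. A real template would also carry the rendered 2D and 3D data for its pose.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical template record: one candidate pose relative to the camera."""
    viewpoint: tuple   # unit vector from the object model toward the camera
    roll_deg: float    # rotation about the optical axis, in degrees


def sample_viewpoints(n):
    """Spread n unit vectors roughly evenly over a sphere (Fibonacci lattice).

    Assumption: the source does not say how viewpoints are chosen.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def build_template_set(n_viewpoints, rolls_per_viewpoint):
    """One template per (viewpoint, in-plane rotation) pair."""
    step = 360.0 / rolls_per_viewpoint
    return [
        Template(viewpoint, k * step)
        for viewpoint in sample_viewpoints(n_viewpoints)
        for k in range(rolls_per_viewpoint)
    ]


template_set = build_template_set(n_viewpoints=20, rolls_per_viewpoint=4)
print(len(template_set))
```

Each entry stands for one pose the object might have relative to the optical axis; the full list plays the role of the template set that would be communicated to the robot control system.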

In another embodiment, a computing system configured to generate an object recognition template for identifying an object in a scene is provided. The computing system includes at least one processing circuit. The processing circuit is configured to perform the steps of: obtaining object information including a digitally represented object; extracting two-dimensional measurement information from the object information; extracting three-dimensional measurement information from the object information; and generating an object recognition template according to the two-dimensional measurement information and the three-dimensional measurement information.

In another embodiment, a method of generating an object recognition template for identifying an object in a scene is provided. The method includes: obtaining object information including a digitally represented object; extracting two-dimensional measurement information from the object information; extracting three-dimensional measurement information from the object information; and generating an object recognition template according to the two-dimensional measurement information and the three-dimensional measurement information.

In another embodiment, a non-transitory computer-readable medium is provided that is operable by at least one processing circuit via a communication interface configured to communicate with a robotic system, and that has executable instructions for implementing a method of generating an object recognition template for identifying an object in a scene. The method includes: receiving object information including a digitally represented object; performing an operation to extract two-dimensional measurement information from the object information; performing an operation to extract three-dimensional measurement information from the object information; and outputting an object recognition template to the robotic system according to the two-dimensional measurement information and the three-dimensional measurement information.

In another embodiment, a computing system is provided. The computing system includes at least one processing circuit in communication with a robot having an arm and an end effector connected to the arm, and in communication with a camera having a field of view. The at least one processing circuit is configured to execute instructions stored on a non-transitory computer-readable medium when one or more objects are or have been in the field of view. The executed instructions include: obtaining object image information of an object in a scene; obtaining a detection hypothesis including a corresponding object recognition template representing a template object; identifying a discrepancy between the template object and the object image information; identifying a set of template locations in the template object corresponding to a set of object locations of the object image information; adjusting the set of template locations to converge to the set of object locations; and generating, according to the adjusted set of template locations, an adjusted detection hypothesis including an adjusted corresponding object recognition template.

In another embodiment, a method is provided. The method includes: obtaining object image information of an object in a scene; obtaining a detection hypothesis including a corresponding object recognition template representing a template object; identifying a discrepancy between the template object and the object image information; identifying a set of template locations in the template object corresponding to a set of object locations of the object image information; adjusting the set of template locations to converge to the set of object locations; and generating, according to the adjusted set of template locations, an adjusted detection hypothesis including an adjusted corresponding object recognition template.

In another embodiment, a non-transitory computer-readable medium is provided that is operable by at least one processing circuit via a communication interface configured to communicate with a robotic system, and that has executable instructions for implementing a method of refining a detection hypothesis. The method includes: receiving object image information of an object in a scene; receiving a detection hypothesis including a corresponding object recognition template representing a template object; performing an operation to identify a discrepancy between the template object and the object image information; performing an operation to identify a set of template locations in the template object corresponding to a set of object locations of the object image information; performing an operation to adjust the set of template locations to converge to the set of object locations; and outputting to the robotic system, according to the adjusted set of template locations, an adjusted detection hypothesis including an adjusted corresponding object recognition template.
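The refinement embodiments above pair template locations with object locations and then adjust the template locations until they converge. The sketch below shows one simple way such a loop could look, under loud assumptions: correspondences are found by nearest-neighbor search, only a translation is estimated (a practical refinement would presumably also handle rotation), and the function names and sample points are hypothetical rather than taken from the source.

```python
import math


def nearest(point, candidates):
    """Return the candidate closest to point (Euclidean distance)."""
    return min(candidates, key=lambda q: math.dist(point, q))


def refine_translation(template_pts, object_pts, iterations=10, tol=1e-9):
    """Shift template points toward object points until the shift is negligible.

    Assumption: translation-only alignment with nearest-neighbor matching.
    """
    pts = [tuple(p) for p in template_pts]
    dims = len(pts[0])
    for _ in range(iterations):
        # Template locations paired with their corresponding object locations.
        matches = [nearest(p, object_pts) for p in pts]
        # Mean offset between each template location and its match.
        shift = [
            sum(m[d] - p[d] for p, m in zip(pts, matches)) / len(pts)
            for d in range(dims)
        ]
        pts = [tuple(p[d] + shift[d] for d in range(dims)) for p in pts]
        if math.hypot(*shift) < tol:
            break
    return pts


template = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
observed = [(0.2, -0.1), (1.2, -0.1), (0.2, 0.9)]
refined = refine_translation(template, observed)
print(refined)
```

The adjusted locations would then be used to produce the adjusted detection hypothesis; here the misalignment is a pure offset, so the loop settles after one effective update.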

In another embodiment, a computing system is provided. The computing system includes at least one processing circuit in communication with a robot having an arm and an end effector connected to the arm, and in communication with a camera having a field of view, the at least one processing circuit being configured to execute instructions stored on a non-transitory computer-readable medium when one or more objects are or have been in the field of view. The executed instructions include: obtaining object image information of an object in a scene; obtaining a set of detection hypotheses, each including a corresponding object recognition template representing a template object; and validating each detection hypothesis of the set of detection hypotheses. The validating is performed by: generating a plurality of three-dimensional validation scores, including at least one of an occlusion validator score, a point cloud validator score, a hole matching validator score, and a normal vector validator score, based on comparing three-dimensional information of the object recognition template of the detection hypothesis with three-dimensional information of the object image information corresponding to the object; generating a plurality of two-dimensional validation scores, including at least one of a rendered match validator score and a template matching validator score, based on comparing two-dimensional information of the corresponding object recognition template of the detection hypothesis with three-dimensional information of the object image information; filtering the detection hypothesis from the set of detection hypotheses according to the plurality of three-dimensional validation scores and the plurality of two-dimensional validation scores; and detecting the object in the scene according to the unfiltered detection hypotheses remaining in the set of detection hypotheses after validation.

In another embodiment, a method is provided. The method includes: obtaining object image information of an object in a scene; obtaining a set of detection hypotheses, each including a corresponding object recognition template representing a template object; and validating each detection hypothesis of the set of detection hypotheses. The validating is performed by: generating a plurality of three-dimensional validation scores, including at least one of an occlusion validator score, a point cloud validator score, a hole matching validator score, and a normal vector validator score, based on comparing three-dimensional information of the object recognition template of the detection hypothesis with three-dimensional information of the object image information corresponding to the object; generating a plurality of two-dimensional validation scores, including at least one of a rendered match validator score and a template matching validator score, based on comparing two-dimensional information of the corresponding object recognition template of the detection hypothesis with three-dimensional information of the object image information; filtering the detection hypothesis from the set of detection hypotheses according to the plurality of three-dimensional validation scores and the plurality of two-dimensional validation scores; and detecting the object in the scene according to the unfiltered detection hypotheses remaining in the set of detection hypotheses after validation.

In another embodiment, a non-transitory computer-readable medium is provided that is operable by at least one processing circuit via a communication interface configured to communicate with a robotic system, and that has executable instructions for implementing a method of validating a detection hypothesis. The method includes: receiving object image information of an object in a scene; receiving a set of detection hypotheses, each detection hypothesis including a corresponding object recognition template representing a template object; performing an operation to generate a plurality of three-dimensional validation scores, including at least one of an occlusion validator score, a point cloud validator score, a hole matching validator score, and a normal vector validator score, based on comparing three-dimensional information of the object recognition template of the detection hypothesis with three-dimensional information of the object image information corresponding to the object; performing an operation to generate a plurality of two-dimensional validation scores, including at least one of a rendered match validator score and a template matching validator score, based on comparing two-dimensional information of the corresponding object recognition template of the detection hypothesis with three-dimensional information of the object image information; performing an operation to filter the detection hypothesis from the set of detection hypotheses according to the plurality of three-dimensional validation scores and the plurality of two-dimensional validation scores; detecting the object in the scene according to the unfiltered detection hypotheses remaining in the set of detection hypotheses after validation; and outputting the detected object in the scene to the robotic system.
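The validation embodiments above score each detection hypothesis with several 3D and 2D validators and then filter the set. A minimal sketch of that filtering stage follows. The validator names mirror the text, but everything else is an assumption made for illustration: scores are taken to lie in [0, 1] with higher meaning better agreement, a hypothesis is dropped when any available score falls below a per-validator threshold, and the class, function names, threshold values, and sample scores are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DetectionHypothesis:
    """Hypothetical container for one hypothesis and its validator scores."""
    template_id: str
    scores_3d: dict = field(default_factory=dict)
    scores_2d: dict = field(default_factory=dict)


def passes(hypothesis, thresholds):
    """True when every available score meets its (assumed) minimum."""
    scores = {**hypothesis.scores_3d, **hypothesis.scores_2d}
    return all(
        scores[name] >= minimum
        for name, minimum in thresholds.items()
        if name in scores
    )


def filter_hypotheses(hypotheses, thresholds):
    """Keep only the hypotheses that survive validation."""
    return [h for h in hypotheses if passes(h, thresholds)]


# Threshold values are placeholders, not values from the source.
thresholds = {
    "occlusion": 0.5,
    "point_cloud": 0.6,
    "hole_matching": 0.5,
    "normal_vector": 0.6,
    "rendered_match": 0.7,
    "template_matching": 0.7,
}

hypotheses = [
    DetectionHypothesis(
        "view_03",
        scores_3d={"occlusion": 0.9, "point_cloud": 0.8},
        scores_2d={"rendered_match": 0.75},
    ),
    DetectionHypothesis(
        "view_17",
        scores_3d={"occlusion": 0.9, "point_cloud": 0.4},
        scores_2d={"rendered_match": 0.9},
    ),
]

kept = filter_hypotheses(hypotheses, thresholds)
print([h.template_id for h in kept])
```

Objects would then be detected according to the hypotheses left in `kept`. Other combination rules (weighted sums, ranked voting) are equally plausible; the all-thresholds rule is chosen here only because it is easy to read.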

Illustrates a system for performing or facilitating the detection, identification, and retrieval of objects, according to embodiments herein.

Illustrates an embodiment of the system for performing or facilitating the detection, identification, and retrieval of objects, according to embodiments herein.

Illustrates another embodiment of the system for performing or facilitating the detection, identification, and retrieval of objects, according to embodiments herein.

Illustrates yet another embodiment of the system for performing or facilitating the detection, identification, and retrieval of objects, according to embodiments herein.

A block diagram illustrating a computing system configured to perform or facilitate the detection, identification, and retrieval of objects, consistent with embodiments herein.

A block diagram illustrating an embodiment of a computing system configured to perform or facilitate the detection, identification, and retrieval of objects, consistent with embodiments herein.

A block diagram illustrating another embodiment of a computing system configured to perform or facilitate the detection, identification, and retrieval of objects, consistent with embodiments herein.

A block diagram illustrating yet another embodiment of a computing system configured to perform or facilitate the detection, identification, and retrieval of objects, consistent with embodiments herein.

An example of image information processed by the system, consistent with embodiments herein.

Another example of image information processed by the system, consistent with embodiments herein.

Illustrates an example environment for operating a robotic system, according to embodiments herein.

Illustrates an example environment for the detection, identification, and retrieval of objects by a robotic system, consistent with embodiments herein.

Provides a flow diagram illustrating the overall flow of methods and operations for the detection, identification, and retrieval of objects, according to embodiments herein.

Illustrates an example of object registration data, consistent with embodiments herein.

Illustrates a method of generating object recognition templates, consistent with embodiments herein.

Illustrates aspects of a method of generating object recognition templates, consistent with embodiments herein.

Illustrates aspects of a method of generating object recognition templates, consistent with embodiments herein.

Illustrates a method of generating object recognition templates, consistent with embodiments herein.

Illustrates aspects of a method of generating object recognition templates, consistent with embodiments herein.

Illustrates aspects of a method of generating object recognition templates, consistent with embodiments herein.

Illustrates aspects of a method of generating object recognition templates, consistent with embodiments herein.

Illustrates aspects of a method of generating object recognition templates, consistent with embodiments herein.

Illustrates a method of object identification and hypothesis generation via template matching, consistent with embodiments herein.

Illustrates a method of object identification and hypothesis generation via template matching, consistent with embodiments herein.

FIG. 11 illustrates a method of refining a detection hypothesis, consistent with embodiments herein.

Illustrates aspects of a method of refining a detection hypothesis, consistent with embodiments herein.

Illustrates aspects of a method of refining a detection hypothesis, consistent with embodiments herein.

Illustrates aspects of a method of refining a detection hypothesis, consistent with embodiments herein.

Illustrates a method of validating a detection hypothesis, consistent with embodiments herein.

Illustrates aspects of a method of refining a detection hypothesis, consistent with embodiments herein.

Systems and methods related to object detection, identification, and retrieval are described herein. Specifically, the disclosed systems and methods may facilitate the detection, identification, and retrieval of objects that are located in a container. As discussed herein, the objects may be made of metal or another material and may be located in a container such as a box, bin, or crate. The objects may be located in the container in an unorganized or irregular manner, for example, a box filled with screws. Detecting and identifying objects in such circumstances can be difficult because of the irregular arrangement, although the systems and methods discussed herein may equally improve the detection, identification, and retrieval of objects arranged in a regular or semi-regular manner. Accordingly, the systems and methods described herein are designed to identify individual objects from among multiple objects, where the individual objects may be arranged at different positions, at different angles, and so on.

The systems and methods discussed herein may include a robotic system. A robotic system configured in accordance with embodiments herein may autonomously execute integrated tasks by coordinating the operations of multiple robots. A robotic system, as described herein, may include any suitable combination of robotic devices, actuators, sensors, cameras, and computing systems configured to control and issue commands to robotic devices and sensors and receive information from them; to access, analyze, and process data generated by robotic devices, sensors, and cameras; to generate data or information usable in controlling the robotic system; and to plan actions for robotic devices, sensors, and cameras. As used herein, a robotic system need not have immediate access to or control of robotic actuators, sensors, or other devices. A robotic system, as described herein, may be a computing system configured to improve the performance of such robotic actuators, sensors, and other devices through the reception, analysis, and processing of information.

The technology described herein provides technical improvements to robotic systems configured for use in object identification, detection, and retrieval. The technical improvements described herein increase the speed, precision, and accuracy of these tasks and further facilitate the detection, identification, and retrieval of objects from a container. The robotic systems and computing systems described herein address the technical problem of identifying, detecting, and retrieving objects from a container in which the objects may be irregularly arranged. Addressing this technical problem improves the technology of object identification, detection, and retrieval.

This application refers to systems and robotic systems. A robotic system, as discussed herein, may include robotic actuator components (e.g., robotic arms, robotic grippers, etc.), various sensors (e.g., cameras, etc.), and various computing or control systems. As discussed herein, a computing system or control system may be said to "control" various robotic components, such as robotic arms, robotic grippers, and cameras. Such "control" may refer to direct control of, and interaction with, the various actuators, sensors, and other functional aspects of the robotic components. For example, a computing system may control a robotic arm by issuing or providing all of the signals required to cause the various motors, actuators, and sensors to produce robotic movement. Such "control" may also refer to the issuance of abstract or indirect commands to a further robot control system, which then converts those commands into the signals necessary to cause robotic movement. For example, a computing system may control a robotic arm by issuing a command describing a trajectory or destination location to which the robotic arm should move, and a further robot control system associated with the robotic arm may receive and interpret that command and then provide the necessary direct signals to the various actuators and sensors of the robotic arm to cause the required movement.
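The indirect form of "control" described above can be sketched as two layers. This is a toy illustration under stated assumptions, not a real robot API: `MoveCommand`, `RobotController`, and `ComputingSystem` are hypothetical names, and the conversion from an abstract destination to per-axis signals is a placeholder for whatever kinematics and drive logic an actual robot control system would apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoveCommand:
    """Abstract command: where the arm should end up, not how to get there."""
    target_xyz: tuple


class RobotController:
    """Stand-in for the further robot control system.

    It receives abstract commands and converts them into low-level signals.
    """

    def __init__(self):
        self.sent_signals = []

    def execute(self, command):
        # Placeholder conversion: one signal per Cartesian axis.
        signals = [("axis", i, value) for i, value in enumerate(command.target_xyz)]
        self.sent_signals.extend(signals)
        return signals


class ComputingSystem:
    """Controls the arm indirectly by issuing commands to the controller."""

    def __init__(self, controller):
        self.controller = controller

    def move_arm_to(self, xyz):
        return self.controller.execute(MoveCommand(tuple(xyz)))


controller = RobotController()
system = ComputingSystem(controller)
signals = system.move_arm_to((0.4, 0.1, 0.3))
print(signals)
```

In the direct form of control, the computing system would itself play the role of `RobotController.execute` and emit the low-level signals.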

In particular, the present technology described herein assists a robotic system in interacting with a target object among a plurality of objects in a container. Detection, identification, and retrieval of an object from a container requires several steps, including the generation of suitable object recognition templates, the extraction of features usable for identification, and the generation, refinement, and validation of detection hypotheses. For example, because objects may be irregularly arranged, it may be necessary to recognize and identify an object in multiple different poses (e.g., angles and positions) and when it is potentially obscured by portions of other objects.

In the following, specific details are set forth to provide an understanding of the presently disclosed technology. In embodiments, the techniques introduced herein may be practiced without including each specific detail disclosed herein. In other instances, well-known features, such as specific functions or routines, are not described in detail to avoid unnecessarily obscuring the present disclosure. References in this description to "an embodiment," "one embodiment," or the like mean that a particular feature, structure, material, or characteristic being described is included in at least one embodiment of the present disclosure. Thus, the appearances of such phrases in this specification do not necessarily all refer to the same embodiment. On the other hand, such references are not necessarily mutually exclusive either. Furthermore, any particular feature, structure, material, or characteristic described with respect to any one embodiment can be combined in any suitable manner with those of any other embodiment, unless such items are mutually exclusive. It is to be understood that the various embodiments shown in the figures are merely illustrative representations and are not necessarily drawn to scale.

Several details describing structures or processes that are well known and often associated with robotic systems and subsystems, but that can unnecessarily obscure some significant aspects of the disclosed techniques, are not set forth in the following description for purposes of clarity. Moreover, although the following disclosure sets forth several embodiments of different aspects of the present technology, several other embodiments may have different configurations or different components than those described in this section. Accordingly, the disclosed techniques may have other embodiments with additional elements or without several of the elements described below.

Many embodiments or aspects of the present disclosure described below may take the form of computer- or controller-executable instructions, including routines executed by a programmable computer or controller. Those skilled in the relevant art will appreciate that the disclosed techniques can be practiced on or with computer or controller systems other than those shown and described below. The techniques described herein can be embodied in a special-purpose computer or data processor that is specifically programmed, configured, or constructed to execute one or more of the computer-executable instructions described below. Accordingly, the terms "computer" and "controller" as generally used herein refer to any data processor and can include Internet appliances and handheld devices (including palm-top computers, wearable computers, cellular or mobile phones, multi-processor systems, processor-based or programmable consumer electronics, network computers, minicomputers, and the like). Information handled by these computers and controllers can be presented on any suitable display medium, including a liquid crystal display (LCD). Instructions for executing computer- or controller-executable tasks can be stored in or on any suitable computer-readable medium, including hardware, firmware, or a combination of hardware and firmware. Instructions can be contained in any suitable memory device, including, for example, a flash drive, a USB device, and/or other suitable media.

The terms "coupled" and "connected," along with their derivatives, can be used herein to describe structural relationships between components. It should be understood that these terms are not intended as synonyms for each other. Rather, in particular embodiments, "connected" can be used to indicate that two or more elements are in direct contact with each other. Unless otherwise made apparent in the context, the term "coupled" can be used to indicate that two or more elements are in either direct or indirect (with other intervening elements between them) contact with each other, or that the two or more elements cooperate or interact with each other (e.g., as in a cause-and-effect relationship, such as for signal transmission/reception or for function calls), or both.

Any reference herein to image analysis by a computing system may be performed according to or using spatial structure information, which may include depth information describing respective depth values of various locations relative to a chosen point. The depth information may be used to identify objects or to estimate how objects are spatially arranged. In some instances, the spatial structure information may include, or may be used to generate, a point cloud that describes locations on one or more surfaces of an object. Spatial structure information is merely one form of possible image analysis, and other forms known to those skilled in the art may be used in accordance with the methods described herein.

FIG. 1A illustrates a system 1000 for performing object detection or, more specifically, object recognition. More particularly, the system 1000 may include a computing system 1100 and a camera 1200. In this example, the camera 1200 may be configured to generate image information that describes or otherwise represents an environment in which the camera 1200 is located or, more specifically, represents an environment in the field of view of the camera 1200 (also referred to as a camera field of view). The environment may be, e.g., a warehouse, a manufacturing plant, a retail space, or other premises. In such instances, the image information may represent objects located at such premises, such as boxes, bins, cases, crates, pallets, or other containers. The system 1000 may be configured to generate, receive, and/or process the image information, such as by using the image information to distinguish between individual objects in the camera field of view, to perform object recognition or object registration based on the image information, and/or to perform robot interaction planning based on the image information, as discussed below in more detail (the terms "and/or" and "or" are used interchangeably in this disclosure). The robot interaction planning may be used to, e.g., control a robot at the premises to facilitate robot interaction between the robot and the containers or other objects. The computing system 1100 and the camera 1200 may be located at the same premises or may be located remotely from each other. For instance, the computing system 1100 may be part of a cloud computing platform hosted in a data center that is remote from the warehouse or retail space and may communicate with the camera 1200 via a network connection.

In an embodiment, the camera 1200 (which may also be referred to as an image sensing device) may be a 2D camera and/or a 3D camera. For example, FIG. 1B illustrates a system 1500A (which may be an embodiment of the system 1000) that includes the computing system 1100 as well as a camera 1200A and a camera 1200B, both of which may be an embodiment of the camera 1200. In this example, the camera 1200A may be a 2D camera that is configured to generate 2D image information, which includes or forms a 2D image describing a visual appearance of the environment in the camera's field of view. The camera 1200B may be a 3D camera (also referred to as a spatial structure sensing camera or spatial structure sensing device) that is configured to generate 3D image information, which includes or forms spatial structure information regarding the environment in the camera's field of view. The spatial structure information may include depth information (e.g., a depth map) that describes respective depth values of various locations relative to the camera 1200B, such as locations on surfaces of various objects in the field of view of the camera 1200. These locations in the camera's field of view or on an object's surface may also be referred to as physical locations. The depth information in this example may be used to estimate how the objects are spatially arranged in three-dimensional (3D) space. In some instances, the spatial structure information may include, or may be used to generate, a point cloud that describes locations on one or more surfaces of an object in the field of view of the camera 1200B. More specifically, the spatial structure information may describe various locations on a structure of the object (also referred to as an object structure).
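As a non-limiting illustration of how a depth map may be used to generate a point cloud, the following sketch back-projects each valid depth value into a 3D point in the camera frame. It assumes an ideal pinhole camera model with no lens distortion; the disclosure does not specify a camera model, and the function name `depth_map_to_point_cloud` and the intrinsic parameters `fx`, `fy`, `cx`, and `cy` are hypothetical names introduced only for this example.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def depth_map_to_point_cloud(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project a depth map into camera-frame 3D points.

    depth[v][u] is the depth value at pixel row v, column u.
    fx, fy are focal lengths in pixels; cx, cy is the principal point
    (all assumed pinhole intrinsics, not taken from the disclosure).
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                # Treat non-positive depth as "no measurement" and skip it.
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Each returned point describes one physical location on an object surface relative to the camera; a collection of such points is one possible form of the point cloud referred to above.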

In an embodiment, the system 1000 may be a robot operation system for facilitating robot interaction between a robot and various objects in the environment of the camera 1200. For example, FIG. 1C illustrates a robot operation system 1500B, which may be an embodiment of the system 1000/1500A of FIGS. 1A and 1B. The robot operation system 1500B may include the computing system 1100, the camera 1200, and a robot 1300. As stated above, the robot 1300 may be used to interact with one or more objects in the environment of the camera 1200, such as boxes, crates, bins, pallets, or other containers. For example, the robot 1300 may be configured to pick up containers from one location and move them to another location. In some cases, the robot 1300 may be used to perform a de-palletization operation in which a group of containers or other objects is unloaded and moved to, e.g., a conveyor belt. In some embodiments, the camera 1200 may be attached to the robot 1300 or to the robot 3300, discussed below. This is also known as an in-hand camera or in-hand camera solution. The camera 1200 may be attached to a robot arm 3320 of the robot 1300. The robot arm 3320 may then move to various pick-up regions to generate image information regarding those regions. In some embodiments, the camera 1200 may be separate from the robot 1300. For example, the camera 1200 may be mounted to a ceiling of a warehouse or other structure and may remain stationary relative to the structure. In some embodiments, multiple cameras 1200 may be used, including multiple cameras 1200 separate from the robot 1300 and/or cameras 1200 separate from the robot 1300 that are used in conjunction with in-hand cameras 1200. In some embodiments, a camera 1200 or multiple cameras 1200 may be mounted or affixed to a dedicated robotic system that is separate from the robot 1300 used for object manipulation, such as a robotic arm, gantry, or other automated system configured for camera movement. Throughout this specification, "control" or "controlling" of the camera 1200 may be discussed. For in-hand camera solutions, control of the camera 1200 also includes control of the robot 1300 to which the camera 1200 is mounted or attached.

In an embodiment, the computing system 1100 of FIGS. 1A-1C may form or be integrated into the robot 1300, which may also be referred to as a robot controller. A robot control system may be included in the system 1500B and is configured to generate commands for the robot 1300, such as robot interaction movement commands for controlling robot interaction between the robot 1300 and a container or other object. In such an embodiment, the computing system 1100 may be configured to generate such commands based on, e.g., image information generated by the camera 1200. For instance, the computing system 1100 may be configured to determine a motion plan based on the image information, wherein the motion plan may be intended for, e.g., gripping or otherwise picking up an object. The computing system 1100 may generate one or more robot interaction movement commands to execute the motion plan.

In an embodiment, the computing system 1100 may form or be part of a vision system. The vision system may be a system that generates, e.g., vision information that describes an environment in which the robot 1300 is located or, alternatively or in addition, describes an environment in which the camera 1200 is located. The vision information may include the 3D image information and/or the 2D image information discussed above, or some other image information. In some scenarios, if the computing system 1100 forms a vision system, the vision system may be part of the robot control system discussed above or may be separate from the robot control system. If the vision system is separate from the robot control system, the vision system may be configured to output information describing the environment in which the robot 1300 is located. The information may be output to the robot control system, which may receive such information from the vision system and, based on the information, perform motion planning and/or generate robot interaction movement commands. Further information regarding the vision system is detailed below.

In an embodiment, the computing system 1100 may communicate with the camera 1200 and/or with the robot 1300 via a direct connection, such as a connection provided via a dedicated wired communication interface, such as an RS-232 interface or a universal serial bus (USB) interface, and/or via a local computer bus, such as a peripheral component interconnect (PCI) bus. In an embodiment, the computing system 1100 may communicate with the camera 1200 and/or with the robot 1300 via a network. The network may be any type and/or form of network, such as a personal area network (PAN), a local area network (LAN), e.g., an intranet, a metropolitan area network (MAN), a wide area network (WAN), or the Internet. The network may utilize different techniques and layers or stacks of protocols, including, e.g., the Ethernet protocol, the Internet protocol suite (TCP/IP), ATM (Asynchronous Transfer Mode) technology, the SONET (Synchronous Optical Networking) protocol, or the SDH (Synchronous Digital Hierarchy) protocol.

In an embodiment, the computing system 1100 may communicate information directly with the camera 1200 and/or with the robot 1300, or may communicate via an intermediate storage device or, more generally, an intermediate non-transitory computer-readable medium. For example, FIG. 1D illustrates a system 1500C, which may be an embodiment of the system 1000/1500A/1500B, that includes a non-transitory computer-readable medium 1400, which may be external to the computing system 1100 and may act as an external buffer or repository for storing, e.g., image information generated by the camera 1200. In such an example, the computing system 1100 may retrieve or otherwise receive the image information from the non-transitory computer-readable medium 1400. Examples of the non-transitory computer-readable medium 1400 include an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. The non-transitory computer-readable medium may form, e.g., a computer diskette, a hard disk drive (HDD), a solid-state drive (SSD), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), and/or a memory stick.
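As a non-limiting illustration of an intermediate medium acting as an external buffer or repository, the following sketch stores image information in a directory and lets a separate consumer retrieve it later. The class name `ImageRepository`, the JSON file layout, and the `frame_id` key are assumptions made solely for this example; the disclosure does not prescribe any storage format.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


class ImageRepository:
    """Hypothetical file-backed buffer standing in for an intermediate medium."""

    def __init__(self, root: Union[str, Path]) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def store(self, frame_id: str, image_info: Dict[str, Any]) -> None:
        # Producer side (e.g., a camera process) writes image information.
        (self.root / f"{frame_id}.json").write_text(json.dumps(image_info))

    def retrieve(self, frame_id: str) -> Dict[str, Any]:
        # Consumer side (e.g., a computing system) reads it back later.
        return json.loads((self.root / f"{frame_id}.json").read_text())
```

Because the producer and the consumer interact only through the repository, neither needs a direct connection to the other.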

As stated above, the camera 1200 may be a 3D camera and/or a 2D camera. The 2D camera may be configured to generate a 2D image, such as a color image or a grayscale image. The 3D camera may be, e.g., a depth-sensing camera, such as a time-of-flight (TOF) camera or a structured light camera, or any other type of 3D camera. In some cases, the 2D camera and/or the 3D camera may include an image sensor, such as a charge-coupled device (CCD) sensor and/or a complementary metal oxide semiconductor (CMOS) sensor. In an embodiment, the 3D camera may include lasers, a LIDAR device, an infrared device, a light/dark sensor, a motion sensor, a microwave detector, an ultrasonic detector, a RADAR detector, or any other device configured to capture depth information or other spatial structure information.

As stated above, the image information may be processed by the computing system 1100. In an embodiment, the computing system 1100 may include or be configured as a server (e.g., having one or more server blades, processors, etc.), a personal computer (e.g., a desktop computer, a laptop computer, etc.), a smartphone, a tablet computing device, and/or any other computing system. In an embodiment, any or all of the functionality of the computing system 1100 may be performed as part of a cloud computing platform. The computing system 1100 may be a single computing device (e.g., a desktop computer) or may include multiple computing devices.

FIG. 2A provides a block diagram that illustrates an embodiment of the computing system 1100. The computing system 1100 in this embodiment includes at least one processing circuit 1110 and one or more non-transitory computer-readable media 1120. In some instances, the processing circuit 1110 may include processors (e.g., central processing units (CPUs), special-purpose computers, and/or onboard servers) configured to execute instructions (e.g., software instructions) stored on the non-transitory computer-readable medium 1120 (e.g., computer memory). In some embodiments, the processors may be included in a separate/stand-alone controller that is operably coupled to other electronic/electrical devices. The processors may execute the program instructions to control or interface with other devices, thereby causing the computing system 1100 to execute actions, tasks, and/or operations. In an embodiment, the processing circuit 1110 includes one or more processors, one or more processing cores, a programmable logic controller ("PLC"), an application-specific integrated circuit ("ASIC"), a programmable gate array ("PGA"), a field programmable gate array ("FPGA"), any combination thereof, or any other processing circuit.

In an embodiment, the non-transitory computer-readable medium 1120, which is part of the computing system 1100, may be an alternative or addition to the intermediate non-transitory computer-readable medium 1400 discussed above. The non-transitory computer-readable medium 1120 may be a storage device, such as an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof, for example, a computer diskette, a hard disk drive (HDD), a solid-state drive (SSD), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, any combination thereof, or any other storage device. In some instances, the non-transitory computer-readable medium 1120 may include multiple storage devices. In certain embodiments, the non-transitory computer-readable medium 1120 is configured to store image information generated by the camera 1200 and received by the computing system 1100. In some instances, the non-transitory computer-readable medium 1120 may store one or more object recognition templates used for performing the methods and operations discussed herein. The non-transitory computer-readable medium 1120 may alternatively or additionally store computer-readable program instructions that, when executed by the processing circuit 1110, cause the processing circuit 1110 to perform one or more methodologies described herein.

FIG. 2B depicts a computing system 1100A that is an embodiment of the computing system 1100 and includes a communication interface 1131. The communication interface 1131 may be configured to, e.g., receive image information generated by the camera 1200 of FIGS. 1A-1D. The image information may be received via the intermediate non-transitory computer-readable medium 1400 or the network discussed above, or via a more direct connection between the camera 1200 and the computing system 1100/1100A. In an embodiment, the communication interface 1131 may be configured to communicate with the robot 1300 of FIG. 1C. If the computing system 1100 is external to a robot control system, the communication interface 1131 of the computing system 1100 may be configured to communicate with the robot control system. The communication interface 1131 may also be referred to as a communication component or communication circuit, and may include, e.g., a communication circuit configured to perform communication over a wired or wireless protocol. As an example, the communication circuit may include an RS-232 port controller, a USB controller, an Ethernet controller, a Bluetooth (registered trademark) controller, a PCI bus controller, any other communication circuit, or a combination thereof.

In an embodiment, as depicted in FIG. 2C, the non-transitory computer-readable medium 1120 may include a storage space 1125 configured to store one or more data objects discussed herein. For example, the storage space may store object recognition templates, detection hypotheses, image information, object image information, robotic arm movement commands, and any additional data objects to which the computing systems discussed herein may require access.

In an embodiment, the processing circuit 1110 may be programmed by one or more computer-readable program instructions stored on the non-transitory computer-readable medium 1120. For example, FIG. 2D illustrates a computing system 1100C, which is an embodiment of the computing system 1100/1100A/1100B, in which the processing circuit 1110 is programmed by one or more modules, including an object recognition module 1121, a motion planning module 1129, and an object manipulation planning module 1126. The processing circuit 1110 may further be programmed with a hypothesis generation module 1128, an object registration module 1130, a template generation module 1132, a feature extraction module 1134, a hypothesis refinement module 1136, and a hypothesis validation module 1138. Each of the above modules may represent computer-readable program instructions configured to carry out certain tasks when instantiated on one or more of the processors, processing circuits, computing systems, etc., described herein. Each of the above modules may operate in concert with one another to achieve the functionality described herein. Various aspects of the functionality described herein may be carried out by one or more of the software modules described above, and the software modules and their descriptions are not to be understood as limiting the computational structure of the systems disclosed herein. For example, although a specific task or functionality may be described with respect to a specific module, that task or functionality may also be performed by a different module as required. Further, the system functionality described herein may be performed by a different set of software modules configured with a different breakdown or allotment of functionality.

In one embodiment, the object recognition module 1121 may be configured to obtain and analyze image information, as discussed throughout this disclosure. The methods, systems, and techniques discussed herein with respect to image information may use the object recognition module 1121. The object recognition module may further be configured for object recognition tasks related to object identification, as discussed herein.

The motion planning module 1129 may be configured to plan and execute the movement of a robot. For example, the motion planning module 1129 may interact with other modules described herein to plan the motion of a robot 3300 for object retrieval operations and camera placement operations. The methods, systems, and techniques discussed herein with respect to robot arm movements and trajectories may be performed by the motion planning module 1129.

The object manipulation planning module 1126 may be configured to plan and execute the object manipulation activities of a robot arm, e.g., grasping and releasing objects and executing robot arm commands to aid and facilitate such grasping and releasing.

The hypothesis generation module 1128 may be configured to perform template matching and recognition tasks to generate detection hypotheses, as described, e.g., with respect to FIGS. 10A-10B. The hypothesis generation module 1128 may be configured to interact or communicate with any other necessary module.

The object registration module 1130 may be configured to obtain, store, generate, and otherwise process object registration information that may be required for the various tasks discussed herein. The object registration module 1130 may be configured to interact or communicate with any other necessary module.

The template generation module 1132 may be configured to complete object recognition template generation tasks as discussed herein, for example, in relation to FIGS. 6-9D. The template generation module 1132 may be configured to interact with the object registration module 1130, the feature extraction module 1134, and any other necessary module.

The feature extraction module 1134 may be configured to complete feature extraction and generation tasks as discussed herein, for example, in relation to FIGS. 8-9D. The feature extraction module 1134 may be configured to interact with the object registration module 1130, the template generation module 1132, the hypothesis generation module 1128, and any other necessary module.

The hypothesis refinement module 1136 may be configured to complete hypothesis refinement tasks as discussed herein, for example, in relation to FIGS. 11-12C. The hypothesis refinement module 1136 may be configured to interact with the object recognition module 1121 and the hypothesis generation module 1128, as well as any other necessary module.

The hypothesis verification module 1138 may be configured to complete hypothesis verification tasks as discussed herein, for example, in relation to FIGS. 13-14. The hypothesis verification module 1138 may be configured to interact with the object registration module 1130, the feature extraction module 1134, the hypothesis generation module 1128, the hypothesis refinement module 1136, and any other necessary module.
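The module interactions named above can be summarized as a small graph. The following Python sketch is illustrative only and is not taken from the disclosure: the dictionary layout and the helper names `interacts_with` and `partners` are hypothetical, and only the explicitly named interactions are recorded (each module may also interact with any other necessary module, which is not modeled).

```python
# Reference numerals and names of the modules discussed above.
MODULE_NAMES = {
    1121: "object recognition",
    1128: "hypothesis generation",
    1130: "object registration",
    1132: "template generation",
    1134: "feature extraction",
    1136: "hypothesis refinement",
    1138: "hypothesis verification",
}

# Interactions that the text names explicitly (hypothetical layout).
# "Any other necessary module" is deliberately left unmodeled.
NAMED_INTERACTIONS = {
    1132: {1130, 1134},
    1134: {1130, 1132, 1128},
    1136: {1121, 1128},
    1138: {1130, 1134, 1128, 1136},
}


def interacts_with(a: int, b: int) -> bool:
    """Return True if the text names an interaction between a and b (either direction)."""
    return b in NAMED_INTERACTIONS.get(a, set()) or a in NAMED_INTERACTIONS.get(b, set())


def partners(module: int) -> list[str]:
    """List, alphabetically, the modules explicitly paired with `module`."""
    return sorted(
        MODULE_NAMES[m]
        for m in MODULE_NAMES
        if m != module and interacts_with(module, m)
    )
```

For instance, `partners(1136)` lists the modules that the text pairs with the hypothesis refinement module, including the hypothesis verification module 1138, which names 1136 from its own side.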

Referring to FIGS. 2E, 2F, 3A, and 3B, methods related to the object recognition module 1121 that may be performed for image analysis are explained. FIGS. 2E and 2F illustrate example image information associated with the image analysis methods, while FIGS. 3A and 3B illustrate example robotic environments associated with the image analysis methods. References herein to image analysis by a computing system may be performed according to or using spatial structure information, which may include depth information describing respective depth values of various locations relative to a chosen point. The depth information may be used to identify objects or to estimate how objects are spatially arranged. In some instances, the spatial structure information may include, or may be used to generate, a point cloud that describes locations on one or more surfaces of an object. Spatial structure information is merely one form of possible image analysis, and other forms known to one skilled in the art may be used in accordance with the methods described herein.

In embodiments, the computing system 1100 may obtain image information representing an object in a camera field of view (e.g., 3200) of the camera 1200. The steps and techniques described below for obtaining image information may be referred to below as the image information capture process 3001. In some instances, the object may be one object 5012 from a plurality of objects 5012 in a scene 5013 in the field of view 3200 of the camera 1200. The image information 2600, 2700 may be generated by the camera (e.g., 1200) when the objects 5012 are (or have been) in the camera field of view 3200, and may describe one or more of the individual objects 5012 or the scene 5013. The object appearance describes the appearance of an object 5012 from the viewpoint of the camera 1200. If there are multiple objects 5012 in the camera field of view, the camera may generate image information that represents the multiple objects or a single object, as necessary (such image information related to a single object may be referred to as object image information). The image information may be generated by the camera (e.g., 1200) when a group of objects is (or has been) in the camera field of view, and may include, e.g., 2D image information and/or 3D image information.

As an example, FIG. 2E depicts a first set of image information, or more specifically, 2D image information 2600, which, as stated above, is generated by the camera 1200 and represents the objects 3410A/3410B/3410C/3410D of FIG. 3A. More specifically, the 2D image information 2600 may be a grayscale or color image and may describe the appearance of the objects 3410A/3410B/3410C/3410D from the viewpoint of the camera 1200. In one embodiment, the 2D image information 2600 may correspond to a single color channel (e.g., a red, green, or blue channel) of a color image. If the camera 1200 is disposed above the objects 3410A/3410B/3410C/3410D, the 2D image information 2600 may represent the appearance of the respective top surfaces of the objects 3410A/3410B/3410C/3410D. In the example of FIG. 2E, the 2D image information 2600 may include respective portions 2000A/2000B/2000C/2000D/2550, also referred to as image portions or object image information, that represent respective surfaces of the objects 3410A/3410B/3410C/3410D. In FIG. 2E, each image portion 2000A/2000B/2000C/2000D/2550 of the 2D image information 2600 may be an image region, or more specifically, a pixel region (if the image is formed by pixels). Each pixel in the pixel region of the 2D image information 2600 may be characterized as having a position described by a set of coordinates [U, V] and may have values relative to a camera coordinate system, or some other coordinate system, as shown in FIGS. 2E and 2F. Each of the pixels may also have an intensity value, such as a value from 0 to 255 or from 0 to 1023. In further embodiments, each of the pixels may include any additional information associated with pixels in various formats (e.g., hue, saturation, intensity, CMYK, RGB, etc.).
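The pixel description above maps naturally onto a grid indexed by [U, V]. The following is a minimal Python sketch under stated assumptions: the nested-list layout, the convention that rows are indexed by V and columns by U, and the helper names `single_channel` and `intensity_at` are all hypothetical and are not part of the disclosure.

```python
# A tiny 3x4 RGB image as nested lists: image[v][u] = (r, g, b), 8-bit values.
# Indexing rows by V and columns by U is an assumed convention.
image = [
    [(10, 20, 30), (40, 50, 60), (70, 80, 90), (100, 110, 120)],
    [(15, 25, 35), (45, 55, 65), (75, 85, 95), (105, 115, 125)],
    [(0, 0, 0), (255, 255, 255), (128, 128, 128), (200, 100, 50)],
]


def single_channel(img, channel):
    """Extract one color channel (0=red, 1=green, 2=blue) as a 2D intensity grid."""
    return [[pixel[channel] for pixel in row] for row in img]


def intensity_at(grid, u, v):
    """Look up the intensity value of the pixel at coordinates [U, V]."""
    return grid[v][u]


# 2D image information corresponding to the red channel only.
red = single_channel(image, 0)
```

Here `red` plays the role of 2D image information that corresponds to a single color channel, with every intensity in the 0 to 255 range.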

As mentioned above, the image information may, in some embodiments, be all or a portion of an image, such as the 2D image information 2600. In an example, the computing system 1100 may be configured to extract an image portion 2000A from the 2D image information 2600 to obtain only the image information associated with a corresponding object 3410A. Where an image portion (such as the image portion 2000A) is directed toward a single object, it may be referred to as object image information. Object image information is not required to contain information only about the object to which it is directed. For example, the object to which it is directed may be close to, under, over, or otherwise situated in the vicinity of one or more other objects. In such cases, the object image information may include information about the object to which it is directed as well as about one or more neighboring objects. The computing system 1100 may extract the image portion 2000A by performing an image segmentation or other analysis or processing operation based on the 2D image information 2600 and/or the 3D image information 2700 illustrated in FIG. 2F. In some embodiments, the image segmentation or other operation may include detecting image locations at which physical edges of objects (e.g., edges of the object) appear in the 2D image information 2600, and using such image locations to identify object image information that is limited to representing an individual object in the camera field of view (e.g., 3200) and that substantially excludes other objects. "Substantially excludes" means that the image segmentation or other processing technique is designed and configured to exclude non-target objects from the object image information, with the understanding that errors may occur, noise may be present, and various other factors may result in portions of other objects being included.
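Extracting an image portion can be sketched as a crop. The Python example below is an assumption-laden illustration, not the segmentation method of the disclosure: it presumes that some earlier edge-detection or segmentation step has already produced a bounding box, and the function name `extract_image_portion` and the half-open bounds convention are hypothetical.

```python
def extract_image_portion(image, u_min, v_min, u_max, v_max):
    """Crop the pixel region [u_min, u_max) x [v_min, v_max) from a grid image[v][u].

    The bounding box is assumed to come from a prior segmentation step.
    """
    if not (0 <= v_min < v_max <= len(image)):
        raise ValueError("row bounds fall outside the image")
    if not (0 <= u_min < u_max <= len(image[0])):
        raise ValueError("column bounds fall outside the image")
    return [row[u_min:u_max] for row in image[v_min:v_max]]


# Toy scene: a 2x2 object (intensity 9) next to a narrow neighbor (intensity 7).
scene = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 7, 0],
    [0, 9, 9, 0, 7, 0],
    [0, 0, 0, 0, 0, 0],
]

# A tight box yields object image information limited to the target object.
portion = extract_image_portion(scene, 1, 1, 3, 3)

# A looser box also captures part of the neighboring object.
loose_portion = extract_image_portion(scene, 1, 1, 5, 3)
```

The looser crop illustrates why object image information may only "substantially" exclude other objects: an imprecise boundary pulls in pixels belonging to a neighbor.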

FIG. 2F depicts an example in which the image information is 3D image information 2700. More specifically, the 3D image information 2700 may include, e.g., a depth map or a point cloud that indicates respective depth values of various locations on one or more surfaces (e.g., a top surface or other outer surface) of the objects 3410A/3410B/3410C/3410D. In some embodiments, an image segmentation operation for extracting image information may involve detecting image locations at which physical edges of objects (e.g., edges of a box) appear in the 3D image information 2700, and using such image locations to identify an image portion (e.g., 2730) that is limited to representing an individual object (e.g., 3410A) in the camera field of view.

The respective depth values may be relative to the camera 1200 that generates the 3D image information 2700, or may be relative to some other reference point. In some embodiments, the 3D image information 2700 may include a point cloud that includes respective coordinates for various locations on structures of objects in the camera field of view (e.g., 3200). In the example of FIG. 2F, the point cloud may include respective sets of coordinates that describe the locations of the respective surfaces of the objects 3410A/3410B/3410C/3410D. The coordinates may be 3D coordinates, such as [X Y Z] coordinates, and may have values that are relative to a camera coordinate system or some other coordinate system. For instance, the 3D image information 2700 may include a first image portion 2710, also referred to as an image portion, that indicates respective depth values for a set of locations 2710_1 to 2710_n, which are also referred to as physical locations on a surface of the object 3410D. Further, the 3D image information 2700 may include a second, a third, a fourth, and a fifth portion 2720, 2730, 2740, and 2750. These portions may then further indicate respective depth values for sets of locations, which may be represented by 2720_1 to 2720_n, 2730_1 to 2730_n, 2740_1 to 2740_n, and 2750_1 to 2750_n, respectively. These figures are merely examples, and any number of objects with corresponding image portions may be used. As mentioned above, the 3D image information 2700 obtained may in some instances be a portion of a first set of 3D image information 2700 generated by the camera. In the example of FIG. 2F, if the 3D image information 2700 obtained represents the object 3410A of FIG. 3A, then the 3D image information 2700 may be narrowed so as to refer only to the image portion 2710. Similar to the discussion of the 2D image information 2600, an identified image portion 2710 may pertain to an individual object and may be referred to as object image information. Thus, object image information, as used herein, may include 2D and/or 3D image information.
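The relationship between a depth map and a point cloud of [X Y Z] coordinates can be sketched with a pinhole camera model. This is an assumption: the disclosure does not specify a camera model, and the intrinsic parameters `fx`, `fy`, `cx`, `cy` and the function name `depth_map_to_point_cloud` are hypothetical names introduced for illustration.

```python
def depth_map_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project depth[v][u] into [X, Y, Z] points in the camera coordinate system.

    Assumes a pinhole model in which each depth value is the distance along
    the optical axis; fx, fy, cx, cy are hypothetical camera intrinsics.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue  # no valid depth was measured at this pixel
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# A 2x2 depth map with one missing measurement.
depth = [
    [2.0, 2.0],
    [None, 1.0],
]
cloud = depth_map_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Each valid depth value yields one 3D location, which corresponds to the way an image portion such as 2710 indicates depth values for a set of physical locations on an object surface.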

In one embodiment, an image normalization operation may be performed by the computing system 1100 as part of obtaining the image information. The image normalization operation may involve transforming an image or an image portion generated by the camera 1200 so as to generate a transformed image or a transformed image portion. For example, the obtained image information, which may include the 2D image information 2600, the 3D image information 2700, or a combination of the two, may undergo an image normalization operation that attempts to alter the image information to match the viewpoint, object pose, and lighting condition associated with the visual description information. Such normalization may be performed to facilitate a more accurate comparison between the image information and model (e.g., template) information. The viewpoint may refer to a pose of an object relative to the camera 1200, and/or an angle at which the camera 1200 is viewing the object when the camera 1200 generates an image representing the object.

For example, the image information may be generated during an object recognition operation in which a target object is in the camera field of view 3200. The camera 1200 may generate image information that represents the target object when the target object has a specific pose relative to the camera. For instance, the target object may have a pose that causes its top surface to be perpendicular to the optical axis of the camera 1200. In such an example, the image information generated by the camera 1200 may represent a specific viewpoint, such as a top view of the target object. In some instances, when the camera 1200 is generating the image information during the object recognition operation, the image information may be generated under a particular lighting condition, such as a lighting intensity. In such instances, the image information may represent a particular lighting intensity, lighting color, or other lighting condition.

In one embodiment, the image normalization operation may involve adjusting an image or an image portion of a scene generated by the camera so as to cause the image or image portion to better match the viewpoint and/or lighting condition associated with the information of an object recognition template. The adjustment may involve transforming the image or image portion to generate a transformed image that matches at least one of the object pose or the lighting condition associated with the visual description information of the object recognition template.

Viewpoint adjustment may involve processing, warping, and/or shifting of an image of the scene so that the image represents the same viewpoint as the visual description information that may be included within an object recognition template. Processing may include, for example, altering the color, contrast, or lighting of the image; warping of the scene may include changing the size, dimensions, or proportions of the image; and shifting of the image may include changing the position, orientation, or rotation of the image. In an example embodiment, processing, warping, and/or shifting may be used to alter an object in the image of the scene so that it has an orientation and/or a size that matches or better corresponds to the visual description information of the object recognition template. If the object recognition template describes a front view (e.g., a top view) of some object, the image of the scene may be warped so as to also represent a front view of an object in the scene.
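One simple way to alter the lighting of an image so that it better matches a template is a linear gain and offset applied to the pixel intensities. The Python sketch below is only one possible adjustment and is not stated in the disclosure; the function name `match_lighting` and the choice of matching the mean and standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def match_lighting(image, template):
    """Linearly rescale image intensities so their mean and spread match the template's.

    A swapped-in illustrative technique (mean/standard-deviation matching),
    not the normalization specified by the source text.
    """
    img_vals = [p for row in image for p in row]
    tpl_vals = [p for row in template for p in row]
    img_mean, tpl_mean = mean(img_vals), mean(tpl_vals)
    img_std, tpl_std = pstdev(img_vals), pstdev(tpl_vals)
    # Guard against a flat image, for which no contrast gain can be computed.
    gain = tpl_std / img_std if img_std > 0 else 1.0
    return [
        [min(255, max(0, round((p - img_mean) * gain + tpl_mean))) for p in row]
        for row in image
    ]


# A dim, low-contrast image and a brighter template of the same object.
dark = [[10, 20], [30, 40]]
template = [[110, 130], [150, 170]]
adjusted = match_lighting(dark, template)
```

Warping and shifting would additionally require a geometric transform (for example, a homography), which is outside the scope of this sketch.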

Further aspects of the object recognition methods performed herein are described in greater detail in U.S. Patent Application No. 16/991,510, filed August 12, 2020, and U.S. Patent Application No. 16/991,466, filed August 12, 2020, each of which is incorporated herein by reference.

In various embodiments, the terms "computer-readable instructions" and "computer-readable program instructions" are used to describe software instructions or computer code configured to carry out various tasks and operations. In various embodiments, the term "module" refers broadly to a collection of software instructions or code configured to cause the processing circuitry 1110 to perform one or more functional tasks. The modules and computer-readable instructions may be described as performing various operations or tasks when a processing circuit or other hardware component is executing the modules or computer-readable instructions.

FIGS. 3A-3B illustrate example environments in which the computer-readable program instructions stored on the non-transitory computer-readable medium 1120 are utilized via the computing system 1100 to increase the efficiency of object identification, detection, and retrieval operations and methods. The image information obtained by the computing system 1100 and exemplified in FIG. 3A influences the system's decision-making procedures and command outputs to a robot 3300 present within the object environment.

FIGS. 3A-3B illustrate example environments in which the processes and methods described herein may be performed. FIG. 3A depicts an environment having a system 3000 (which may be an embodiment of the system 1000/1500A/1500B/1500C of FIGS. 1A-1D) that includes at least the computing system 1100, a robot 3300, and a camera 1200. The camera 1200 may be an embodiment of the camera 1200 and may be configured to generate image information that represents a scene 5013 in a camera field of view 3200 of the camera 1200, or more specifically, that represents objects (such as boxes) in the camera field of view 3200, such as objects 3410A, 3410B, 3410C, and 3410D. In one example, each of the objects 3410A-3410D may be, e.g., a container such as a box or crate, while the object 3400 may be, e.g., a pallet on which the containers are disposed. Further, each of the objects 3410A-3410D may be a container containing individual objects 5012. Each object 5012 may be, for example, a rod, bar, gear, bolt, nut, screw, nail, rivet, spring, linkage, cog, or any other type of physical object, as well as an assembly of multiple objects. FIG. 3A illustrates an embodiment including multiple containers of objects 5012, while FIG. 3B illustrates an embodiment including a single container of objects 5012.

In one embodiment, the system 3000 of FIG. 3A may include one or more light sources. The light source may be, e.g., a light emitting diode (LED), a halogen lamp, or any other light source, and may be configured to emit visible light, infrared radiation, or any other form of light toward the surfaces of the objects 3410A-3410D. In some embodiments, the computing system 1100 may be configured to communicate with the light source to control when the light source is activated. In other embodiments, the light source may operate independently of the computing system 1100.

In one embodiment, the system 3000 may include a camera 1200 or multiple cameras 1200, including a 2D camera configured to generate the 2D image information 2600 and a 3D camera configured to generate the 3D image information 2700. The camera 1200 or cameras 1200 may be mounted on or affixed to the robot 3300, may be stationary within the environment, and/or may be affixed to a dedicated robotic system separate from the robot 3300 used for object manipulation, such as a robotic arm, gantry, or other automated system configured for camera movement. FIG. 3A shows an example having a stationary camera 1200 and an on-hand camera 1200, while FIG. 3B shows an example having only a stationary camera 1200. The 2D image information 2600 (e.g., a color image or a grayscale image) may describe the appearance of one or more objects, such as the objects 3410A/3410B/3410C/3410D or the object 5012, in the camera field of view 3200. For instance, the 2D image information 2600 may capture or otherwise represent visual detail disposed on the respective outer surfaces (e.g., top surfaces) of the objects 3410A/3410B/3410C/3410D and 5012, and/or the contours of those outer surfaces. In one embodiment, the 3D image information 2700 may describe the structure of one or more of the objects 3410A/3410B/3410C/3410D/3400 and 5012, wherein the structure for an object may also be referred to as an object structure or a physical structure of the object. For example, the 3D image information 2700 may include a depth map, or more generally may include depth information, which may describe respective depth values of various locations in the camera field of view 3200 relative to the camera 1200 or relative to some other reference point. The locations corresponding to the respective depth values may be locations (also referred to as physical locations) on various surfaces in the camera field of view 3200, such as locations on the respective top surfaces of the objects 3410A/3410B/3410C/3410D/3400 and 5012. In some instances, the 3D image information 2700 may include a point cloud, which may include a plurality of 3D coordinates that describe various locations on one or more outer surfaces of the objects 3410A/3410B/3410C/3410D/3400 and 5012, or of some other objects in the camera field of view 3200. A point cloud is shown in FIG. 2F.

In the example of FIGS. 3A and 3B, the robot 3300 (which may be an embodiment of the robot 1300) may include a robot arm 3320 having one end attached to a robot base 3310 and another end that is attached to or formed by an end effector device 3330, such as a robot gripper. The robot base 3310 may be used for mounting the robot arm 3320, while the robot arm 3320, or more specifically the end effector device 3330, may be used to interact with one or more objects in the environment of the robot 3300. The interaction (also referred to as robot interaction) may include, e.g., gripping or otherwise picking up at least one of the objects 3410A-3410D and 5012. For example, the robot interaction may be part of an object picking operation to identify, detect, and retrieve the objects 5012 from a container. The end effector device 3330 may have suction cups or other components for grasping or grabbing the object 5012. The end effector device 3330 may be configured, using a suction cup or other grasping component, to grasp or grab an object through contact with a single face or surface of the object, for example, via a top face.

The robot 3300 may further include additional sensors configured to obtain information used to implement tasks, such as for manipulating the structural members and/or for transporting the robotic units. The sensors may include devices configured to detect or measure one or more physical properties of the robot 3300 (e.g., a state, a condition, and/or a location of one or more structural members/joints thereof) and/or of the surrounding environment. Some examples of the sensors may include accelerometers, gyroscopes, force sensors, strain gauges, tactile sensors, torque sensors, position encoders, and the like.

FIG. 4 provides a flow diagram illustrating the overall flow of methods and operations for object detection, identification, and retrieval according to embodiments herein. The object detection, identification, and retrieval method 4000 may include any combination of features of the sub-methods and operations described herein. The method 4000 may include any or all of an object registration process 5000, an object recognition template generation method 6000, a feature generation method 8000, an image information capture process 3001, a hypothesis generation process 10000, a hypothesis refinement method 11000, a hypothesis verification method 13000, and a robot control process 15000 that includes obstacle detection, motion planning, and motion execution. In embodiments, the object registration process 5000, the object recognition template generation method 6000, and the feature generation method 8000 may be performed in a pre-processing or offline environment, outside the context of robot operation. These processes and methods may therefore be performed in advance to facilitate later actions by the robot. The image information capture process 3001, the hypothesis generation process 10000, the hypothesis refinement method 11000, the hypothesis verification method 13000, and the robot control process 15000 may each be performed in the context of robot operation for detecting, identifying, and retrieving objects from a container.

FIG. 5 illustrates object registration data, related to a type of object, that may be generated, acquired, received, or otherwise obtained during the object registration process 5000. As discussed above, the methods and systems described herein are configured to obtain and use object registration data 5001, e.g., known, previously stored information related to an object 5011, to generate object recognition templates for use in identifying and recognizing similar objects in a physical scene. The object registration data 5001 may include any type of computer-readable information that identifies, relates to, and/or describes an object model 4200. The object registration data 5001 of the object model 4200 may represent the object 5011, where the object model 4200 is in a two- or three-dimensional format, which may or may not be interactive, that provides measurements and dimensions of the represented object 5011. The object registration data 5001 may include, for example, CAD (i.e., computer-aided design) data or other modeling data that describes the object model 4200 and is stored in any suitable format. The registration data may be a solid CAD model, a wireframe CAD model, or a surface CAD model. In one embodiment, the registration data may be in any type of three-dimensional file format, such as FBX, OBJ, USD, STL, STEP, or COLLADA. The object model 4200 represents one or more physical objects. The object model 4200 is a modeled (i.e., computer-stored) version of one or more corresponding objects 5011 that physically exist in the world. As shown in FIG. 5, the object 5011 is a physical object existing in the physical world, and the object model 4200 is a digital representation of the object 5011 described by the object registration data 5001. The object 5011 shown may be any object, including, for example, rods, bars, gears, bolts, nuts, screws, nails, rivets, springs, chains, cogs, or any other type of physical object, as well as assemblies of multiple objects. In embodiments, the object 5011 may refer to an object accessible from a container (e.g., a bin, box, bucket, etc.) and having, for example, a mass in the range of a few grams to a few kilograms and a size in the range of 5 mm to 500 mm. The object model 4200 may be specific to an exact version of the real-world object 5011, for example a screw with a particular length, number of threads, thread size, head size, and so on. For illustrative purposes, this description refers to a screw-shaped object as the object 5011. This is presented for convenience only and is not intended to limit the scope of the description in any way.

In some embodiments, the present disclosure relates to generating an object recognition template set for identifying an object 5012 within a scene 5013. The object registration data 5001 may be based on a physical object 5011 and may be used to facilitate recognition of other physical objects 5012 that are similar to the physical object 5011 (and may be copies or versions thereof). Identifying an object 5012 in a scene may include identifying the object model 4200 to which the object 5012 corresponds (e.g., identifying what the object 5012 is) and identifying the pose of the object 5012 (e.g., identifying its position, angle, and orientation).

FIG. 6 shows a flow diagram of an example object recognition template generation method 6000 for generating an object recognition template set. In one embodiment, the object recognition template generation method 6000 may be performed by, for example, the computing system 1100 (or 1100A/1100B/1100C) of FIGS. 2A-2D or the computing system 1100 of FIGS. 3A-3B, or more specifically by at least one processing circuit 1110 of the computing system 1100. In some scenarios, the computing system 1100 may perform the object recognition template generation method 6000 by executing instructions stored on a non-transitory computer-readable medium (e.g., 1120). For example, the instructions may cause the computing system 1100 to execute one or more of the modules illustrated in FIG. 2D, which may perform the object recognition template generation method 6000. For example, in embodiments, the steps of the object recognition template generation method 6000 may be performed by the object registration module 1130 and the template generation module 1132 operating in cooperation to generate object recognition templates.

The steps of the object recognition template generation method 6000 may be used to accomplish object recognition template generation, and the resulting templates may later be used together with a particular sequential robot trajectory for performing a particular task. As a general overview, the object recognition template generation method 6000 may operate to cause the computing system 1100 to generate a set of object recognition templates for the computing system to use in identifying objects in a scene for operations related to object picking. The object recognition template generation method 6000 is described below with further reference to FIGS. 7A and 7B.

少なくとも一つの処理回路１１１０は、複数の物体認識テンプレート４３００を含み得る物体認識テンプレートセット４３０１を生成するための、物体認識テンプレート生成方法６０００の特定のステップを実行し得る。物体認識テンプレート生成方法６０００は、物体５０１１を表す物体モデル４２００の物体登録データ５００１を取得することを含む、処理６００１で始まるか、又はこれを含んでもよい。 At least one processing circuit 1110 may perform certain steps of an object recognition template generation method 6000 to generate an object recognition template set 4301 that may include a plurality of object recognition templates 4300. The object recognition template generation method 6000 may begin with or include operation 6001, which includes obtaining object registration data 5001 for an object model 4200 representing an object 5011.

In operation 6001, the object recognition template generation method 6000 may include obtaining object registration data 5001 representing the object 5011, and the object registration data 5001 may include an object model 4200 representing the object 5011. The at least one processing circuit 1110 may determine a plurality of viewpoints 4120 of the object model in a three-dimensional space 4100. The at least one processing circuit 1110 may further estimate a plurality of appearances 4140 of the object model 4200 at each of the plurality of viewpoints 4120. The robot system may further generate a plurality of object recognition templates 4300 (e.g., 4300A/4300B/4300C/4300D) according to the plurality of appearances, where each of the plurality of object recognition templates 4300 corresponds to a respective one of the plurality of appearances 4140. The at least one processing circuit 1110 may then communicate the plurality of object recognition templates 4300, as an object recognition template set 4301, to a robot system or storage system for later use. Each of the plurality of object recognition templates 4300 may represent a pose that the object model 4200 may have relative to the optical axis 4130 of a virtual camera 4110. Each object recognition template 4300 represents a view of the object 5011 corresponding to the object model 4200, as seen from the perspective of a camera 1200 whose perspective corresponds to that of the virtual camera 4110 during generation of the object recognition template 4300.

The at least one processing circuit 1110 may obtain the object registration data 5001 from within its own hardware storage components (i.e., HDD, SSD, USB, CD, RAID, etc.) or software storage components (i.e., cloud, VSP, etc.). In one embodiment, the at least one processing circuit 1110 may obtain the registration data from an external processor (i.e., an external laptop, desktop, mobile phone, or any other separate device with its own processing system).

物体認識テンプレート生成方法６０００は、三次元空間４１００における物体モデル４２００の複数の視点４１２０を決定することを含み得る処理６００３をさらに含み得る。これは、空間サブサンプリング手順と呼ばれ得る。物体モデル４２００を囲む三次元空間４１００は、表面４１０１によって囲まれてもよい。三次元空間４１００及び表面４１０１は、これも仮想実体である物体モデル４２００を囲む仮想実体である。処理６００３で決定される複数の視点４１２０の各々は、三次元空間４１００を囲む表面４１０１上の仮想カメラ４１１０、及び仮想カメラ４１１０の光学軸４１３０の周りの仮想カメラ４１１０の回転角の位置に対応するか、又はそれらを表してもよい。したがって、表面４１０１上の各位置は、複数の視点４１２０に対応し得る。 The object recognition template generation method 6000 may further include a process 6003 that may include determining multiple viewpoints 4120 of the object model 4200 in the three-dimensional space 4100. This may be called a spatial subsampling procedure. A three-dimensional space 4100 surrounding the object model 4200 may be surrounded by a surface 4101. Three-dimensional space 4100 and surface 4101 are virtual entities surrounding object model 4200, which is also a virtual entity. Each of the plurality of viewpoints 4120 determined in the process 6003 corresponds to the position of the virtual camera 4110 on the surface 4101 surrounding the three-dimensional space 4100 and the rotational angle of the virtual camera 4110 around the optical axis 4130 of the virtual camera 4110. or may represent them. Thus, each location on surface 4101 may correspond to multiple viewpoints 4120.

The virtual camera 4110 used in the spatial subsampling procedure may capture the appearance of the object from the viewpoint 4120 at which the virtual camera 4110 is located. For example, as shown in FIG. 7A, a virtual camera 4110 located at an individual viewpoint 4120A may capture an appearance 4140 of the object model 4200. The appearance 4140 includes information describing how the object model 4200 appears to the virtual camera 4110, based on the viewing angle of the virtual camera 4110 and its rotation about its optical axis. The object model 4200 may be fixed within the three-dimensional space 4100. In one embodiment, the three-dimensional space may be substantially spherical, and the object model 4200 may be fixed at or near the center of the substantially spherical three-dimensional space. In another embodiment, the three-dimensional space may be any other suitable three-dimensional shape, such as an ellipsoid or a parallelepiped. The object model 4200 may be fixed at either a central or a non-central point within the three-dimensional space. Each of the individual object recognition templates 4300A/4300B/4300C/4300D, etc., that is generated (e.g., via operation 6007, as discussed further below) may correspond to one captured appearance 4140 of the object model 4200 from one viewpoint 4120 of the plurality of viewpoints 4120. Each object recognition template 4300 may include the appearance 4140 of the object model 4200 from a viewpoint 4120 that captures the pose of the object, i.e., the orientation of the object, its visible surfaces, and so on. In one embodiment, each of the plurality of viewpoints 4120 may further correspond to a rotation angle of the virtual camera 4110 within the three-dimensional space 4100, i.e., a rotation angle of the camera from 1 to 360° about its optical axis 4130.

Operation 6003 may include a spatial subsampling procedure performed to select the viewpoints 4120 whose corresponding object recognition templates 4300 are included in the object recognition template set 4301. The efficiency of the object recognition template generation method 6000 may be increased or maximized by reducing or otherwise optimizing the space (e.g., the number of viewpoints 4120 and appearances 4140) over which object recognition templates 4300 are generated. In embodiments, superfluous viewpoints 4120 may be eliminated after the object appearances 4140 at those viewpoints 4120 have first been captured. For example, a superfluous viewpoint 4120 may be eliminated if it is determined to contain information that is substantially similar to that of another viewpoint 4120 (e.g., due to symmetry). In embodiments, superfluous viewpoints 4120 may be eliminated before the object appearances 4140 are captured, based on predetermined decisions regarding pose, spacing, and so on, as discussed below. In embodiments, the number of selected viewpoints 4120 and the spacing between adjacent viewpoints 4120 may depend on the number of object recognition templates 4300 required, for example based on the complexity and/or symmetry of the object model 4200 in question.

The plurality of viewpoints 4120 may be selected or determined according to several different methods. For example, the at least one processing circuit 1110 may determine the viewpoints 4120 according to the intersections of circles of longitude 4170 and circles of latitude 4180. The viewpoints 4120 may be located at the intersections of the circles of longitude 4170 and the circles of latitude 4180 across the surface 4101 of the three-dimensional space 4100. In such a selection scheme, a high density of viewpoints 4120 may cluster at or near the poles of the surface 4101, and a lower density of viewpoints may form around the intersecting circles of longitude and latitude farther from the poles (e.g., closer to the equator of the surface 4101). Such a non-uniform distribution of sample locations may cause the plurality of object recognition templates 4300 to overrepresent one range or set of ranges of relative pose/orientation between the virtual camera 4110 and the object model 4200 and to underrepresent another range or set of ranges. Such a selection may be advantageous in some scenarios with some object models 4200 and less advantageous in other scenarios.
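
The clustering near the poles can be illustrated with a minimal sketch. It assumes a unit sphere as the surface 4101 and takes the Y axis as "up" (following the later example that places a camera directly above the model along the Y axis). The function names `latlong_viewpoints` and `ring_spacing` and the grid sizes are hypothetical and do not come from the source.

```python
import math

def latlong_viewpoints(n_lat, n_lon, radius=1.0):
    """Camera positions at the intersections of latitude and longitude circles.

    Returns n_lat rings of n_lon points each, ordered from the ring nearest
    the +Y pole to the ring nearest the -Y pole (the poles themselves are skipped).
    """
    points = []
    for i in range(1, n_lat + 1):
        polar = math.pi * i / (n_lat + 1)        # angle measured from the +Y pole
        for j in range(n_lon):
            azimuth = 2.0 * math.pi * j / n_lon  # angle around the Y axis
            points.append((
                radius * math.sin(polar) * math.cos(azimuth),
                radius * math.cos(polar),
                radius * math.sin(polar) * math.sin(azimuth),
            ))
    return points

def ring_spacing(points, n_lon, ring):
    """Straight-line distance between two neighbouring viewpoints on one ring."""
    first = points[ring * n_lon]
    second = points[ring * n_lon + 1]
    return math.dist(first, second)
```

With nine rings of twelve points, neighbouring viewpoints on the ring nearest a pole are much closer together than neighbours on the equatorial ring, which is the over-representation of near-polar poses described above.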

さらなる実施形態では、複数の視点４１２０は、三次元空間４１００を囲む表面４１０１にわたる均等分布に従って選択され得る。均等分布は、表面４１０１にわたって互いに等しい距離で分布されている視点４１２０を指し得る。均等分布は、不均一な分布よりもより一貫性のあるテンプレート生成を提供してもよく、対称性を欠く物体に対して好ましい場合がある。 In further embodiments, the plurality of viewpoints 4120 may be selected according to an even distribution across the surface 4101 surrounding the three-dimensional space 4100. Even distribution may refer to viewpoints 4120 that are distributed at equal distances from each other across surface 4101. An even distribution may provide more consistent template generation than an uneven distribution and may be preferred for objects lacking symmetry.
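
The source does not say how an even distribution would be computed. One common approximation, used here purely as an assumption, is a Fibonacci (golden-angle) lattice on a unit sphere. The function name `fibonacci_viewpoints` is hypothetical, and the Y axis is again taken as "up".

```python
import math

def fibonacci_viewpoints(n, radius=1.0):
    """Approximately evenly spaced camera positions on a sphere.

    Heights are spaced uniformly along Y (equal-area bands) and each
    successive point is turned by the golden angle around the Y axis.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        y = 1.0 - 2.0 * (k + 0.5) / n          # uniform in height, poles avoided
        ring_radius = math.sqrt(1.0 - y * y)   # radius of the latitude circle at y
        theta = golden_angle * k
        points.append((
            radius * ring_radius * math.cos(theta),
            radius * y,
            radius * ring_radius * math.sin(theta),
        ))
    return points
```

Because the heights are uniform in Y, each hemisphere receives the same number of viewpoints, unlike the latitude/longitude grid, which concentrates samples at the poles.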

いくつかの実施形態では、複数の視点４１２０は、視点４１２０の総数を減少させるように、及び／又は特定の視点を支持するために、視点分布を重み付け又は偏らせるように選択され得る。 In some embodiments, multiple viewpoints 4120 may be selected to reduce the total number of viewpoints 4120 and/or to weight or bias the viewpoint distribution in favor of particular viewpoints.

一実施形態では、複数の視点４１２０は、物理的状況において複数の物体５０１１に対して観察されることが期待される、予測可能な姿勢の範囲に基づいて決定され得る。例えば、いくつかの先細りボトルを保持する容器では、ボトルの姿勢は、より広い端又は基端が下向きに面するようにすることが期待され得る。したがって、視点分布は、表面４１０１の上半分により多くの視点４１２０を有するように偏らせる又は重み付けされ得る。 In one embodiment, multiple viewpoints 4120 may be determined based on a range of predictable poses that are expected to be observed for multiple objects 5011 in a physical situation. For example, in a container holding several tapered bottles, the orientation of the bottles may be expected to be such that the wider or proximal end faces downward. Therefore, the viewpoint distribution may be biased or weighted to have more viewpoints 4120 in the upper half of the surface 4101.
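
One simple way to bias a viewpoint distribution toward the upper half of the surface is to keep every candidate above the equator and only a fraction of those below it. This is a sketch under assumptions: the Y axis is taken as "up", and the function name `bias_to_upper_half` and the thinning factor `keep_every` are hypothetical, not taken from the source.

```python
def bias_to_upper_half(points, keep_every=4):
    """Keep all viewpoints with y >= 0 and every `keep_every`-th one below.

    `points` is a sequence of (x, y, z) camera positions on the surface.
    """
    upper = [p for p in points if p[1] >= 0.0]
    lower = [p for p in points if p[1] < 0.0]
    return upper + lower[::keep_every]
```

A weighting scheme of this kind trades coverage of unlikely poses (for example, a tapered bottle resting on its narrow end) for fewer templates overall.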

In another embodiment, the plurality of viewpoints 4120 may be determined based on the symmetry (or lack thereof) of the object model 4200. The symmetry of the object model 4200 may be determined based on whether the appearance 4140 of the object model 4200 changes after the object model 4200 is rotated by a given number of degrees about an axis of the object model 4200. For example, an object model 4200 that looks substantially the same after a 180-degree rotation has two-way symmetry. An object model 4200 that looks substantially the same after a 120-degree rotation has three-way symmetry. An object model 4200 that looks substantially the same after a 90-degree rotation has four-way symmetry. An object model 4200 that looks substantially the same after a 60-degree rotation has six-way symmetry. Other symmetries may be possible for different objects. Whether two appearances are substantially the same may be determined according to a similarity threshold.
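
The rotation test can be sketched as follows. The sketch stands in for an appearance with a set of 2D outline points viewed along the rotation axis, and for the similarity threshold with a maximum point-to-point distance. Both choices, and the names `rotate`, `looks_same`, and `symmetry_order`, are assumptions made for illustration.

```python
import math

def rotate(points, degrees):
    """Rotate 2D points about the origin by the given angle."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def looks_same(a, b, threshold):
    """True if every point of each set lies within `threshold` of the other set."""
    a_in_b = all(min(math.dist(p, q) for q in b) <= threshold for p in a)
    b_in_a = all(min(math.dist(p, q) for q in a) <= threshold for p in b)
    return a_in_b and b_in_a

def symmetry_order(points, max_order=12, threshold=1e-6):
    """Largest n (up to max_order) for which a 360/n degree rotation looks the same."""
    best = 1
    for n in range(2, max_order + 1):
        if looks_same(rotate(points, 360.0 / n), points, threshold):
            best = n
    return best
```

A hexagonal outline, such as a nut seen end-on, yields order 6 under this test, a square yields 4, and a non-square rectangle yields 2. Viewpoints that differ only by a multiple of 360/n degrees about that axis would then produce substantially the same appearance.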

The object recognition template generation method 6000 may further include an operation 6005 that includes estimating or capturing a plurality of appearances 4140 of the object model 4200 at each of the plurality of viewpoints 4120. The estimation of the plurality of appearances 4140 may be performed at each viewpoint 4120 of the plurality of viewpoints 4120. Each appearance 4140 includes the pose or orientation of the object model 4200 as seen from the respective viewpoint 4120. Each of the object recognition templates 4300 corresponds to a respective viewpoint 4120 of the plurality of viewpoints 4120 and includes information representing the appearance 4140 of the object model 4200 from that viewpoint 4120. For example, an object recognition template 4300 may correspond to, or represent, the appearance 4140 of the object model 4200 from a viewpoint 4120 corresponding to a virtual camera 4110 placed directly above the object model (i.e., along the Y axis of the three-dimensional plane). In another example, an object recognition template 4300 may correspond to the appearance 4140 of the object model from a viewpoint 4120 corresponding to a virtual camera 4110 placed directly to the left of the object model 4200 (i.e., along the X axis of the three-dimensional plane). In one embodiment, each of the object recognition templates 4300 of the object recognition template set 4301 may correspond to, or represent, a respective appearance 4140 of the plurality of appearances 4140 of the object model 4200 from a respective viewpoint 4120 of the plurality of viewpoints 4120, where the viewpoints correspond to virtual cameras 4110 placed at many different positions and orientations around the object model 4200 (i.e., at many positions within the three-dimensional plane). Accordingly, estimating the plurality of appearances 4140 may include determining or estimating how the object model 4200 looks when observed from a particular viewpoint in a particular orientation. For example, the viewpoints may include a view of the object model 4200 looking directly down, looking up, looking left, or looking right, or any angle/position between and along the principal axes X, Y, and Z of the surface 4101 surrounding the three-dimensional space 4100. As discussed above, each viewpoint 4120 may also include a rotation angle of the virtual camera 4110 about the optical axis 4130 of the camera, from 1 to 360°. Thus, each camera position may correspond to a set of viewpoints 4120, and each viewpoint 4120 of the set may further correspond to a different rotation angle of the virtual camera 4110. For example, two separate viewpoints 4120 of a set of viewpoints 4120 may be estimated or captured from the same angle/location between the principal axes X, Y, and Z of the surface 4101, with the rotation angle of the first viewpoint 4120 rotated by 45° relative to the rotation angle of the second viewpoint 4120.
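
The relationship between camera positions and viewpoints can be sketched as a Cartesian product of positions and rotation angles about the optical axis. The `Viewpoint` type, the function name `expand_with_roll`, and the 45° default step are illustrative assumptions. The source says only that a viewpoint combines a position on the surface with a rotation angle.

```python
from itertools import product
from typing import List, NamedTuple, Sequence, Tuple

class Viewpoint(NamedTuple):
    position: Tuple[float, float, float]  # camera location on the surface
    roll_deg: float                       # rotation about the optical axis

def expand_with_roll(positions: Sequence[Tuple[float, float, float]],
                     roll_step_deg: float = 45.0) -> List[Viewpoint]:
    """Pair every camera position with each rotation angle in one full turn."""
    count = int(round(360.0 / roll_step_deg))
    rolls = [k * roll_step_deg for k in range(count)]
    return [Viewpoint(p, r) for p, r in product(positions, rolls)]
```

With a 45° step, each camera position yields eight viewpoints, so the number of appearances to estimate grows with both the number of positions and the angular resolution.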

The object recognition template generation method 6000 may further include an operation 6007 in which a plurality of object recognition templates 4300 are generated based on the plurality of appearances 4140. Each of the plurality of object recognition templates 4300 corresponds to a respective one of the plurality of appearances 4140. Thus, a generated object recognition template 4300 may include information representing the object model 4200 in a particular pose and at a particular angle and/or rotation of the virtual camera 4110 relative to the object model 4200. Accordingly, each of the plurality of object recognition templates 4300 may differ from the others (although, in some scenarios, two different object recognition templates 4300 may contain substantially the same information because of a symmetry of the object model 4200 that was not accounted for in the selection of the viewpoints 4120).

各物体認識テンプレート４３００は、捕捉された又は推定されたそれぞれの外観４１４０に従って生成される２Ｄ外観４３０２及び３Ｄ外観４３０３を含み得る。２Ｄ外観４３０２は、例えば、レイトレーシング及び不連続検出技術に従ってレンダリングされ得る、レンダリングされた二次元画像を含み得る。３Ｄ外観４３０３は、例えば、図２Ｆに関連して説明された３Ｄ画像情報２７００に類似のレンダリングされた３Ｄ点群を含む。 Each object recognition template 4300 may include a 2D appearance 4302 and a 3D appearance 4303 generated according to a respective captured or estimated appearance 4140. 2D appearance 4302 may include a rendered two-dimensional image, which may be rendered according to ray tracing and discontinuity detection techniques, for example. 3D appearance 4303 includes, for example, a rendered 3D point cloud similar to 3D image information 2700 described in connection with FIG. 2F.

In some embodiments, the 2D appearance 4302 and/or the 3D appearance 4303 may be generated via ray tracing techniques. The ray tracing operation may simulate various rays, cast from the perspective of the virtual camera 4110, striking the surface of the object model 4200. It may further determine the angle at which a ray strikes the surface of the object model 4200, the distance the ray travels to the surface of the object model 4200, and/or the effects of diffuse reflection (in which the deflected rays leave at multiple angles) or specular reflection (in which the deflected ray leaves at a single angle). The angle of a deflected ray reflected from the surface of the object model 4200 may indicate a change in the angle of the surface normal of the object. Such changes in the angle of the surface normal may occur at the edges of the object.
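
The two per-ray quantities named above, the distance travelled and the angle at which the ray strikes the surface, can be computed in closed form for a simple shape. The sketch below uses a sphere as a stand-in for the object model 4200, which is an assumption made only to keep the example short. The function name `trace_ray_to_sphere` is hypothetical.

```python
import math

def trace_ray_to_sphere(origin, direction, center, radius):
    """Cast one ray from the camera and intersect it with a sphere.

    Returns (distance, incidence_deg) for the nearest hit in front of the
    camera, where incidence_deg is the angle between the incoming ray and
    the surface normal, or None if the ray misses.
    """
    length = math.sqrt(sum(c * c for c in direction))
    d = tuple(c / length for c in direction)              # unit ray direction
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(x * y for x, y in zip(oc, d))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c                                      # quadratic discriminant
    if disc < 0.0:
        return None                                       # ray misses the sphere
    t = -b - math.sqrt(disc)                              # nearest intersection
    if t < 0.0:
        return None                                       # hit is behind the camera
    hit = tuple(o + t * x for o, x in zip(origin, d))
    normal = tuple((h - k) / radius for h, k in zip(hit, center))
    cos_inc = -sum(x * y for x, y in zip(d, normal))
    cos_inc = max(-1.0, min(1.0, cos_inc))                # guard against rounding
    return t, math.degrees(math.acos(cos_inc))
```

Collecting the distance per ray yields a depth image from which a point cloud can be derived, while abrupt changes in the incidence angle between neighbouring rays indicate a change in surface normal of the kind that occurs at an edge.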

物体モデル４２００に対して生成される複数の物体認識テンプレート４３００の総数は、およそ１００個のテンプレートから３２００個のテンプレートの範囲であってもよく、このテンプレートの数が多いほど、複数の物体認識テンプレート４３００が生成される物体モデル４２００の複雑さに相関してもよい。引用された数は、いくつかの用途及びいくつかの物体の種類に共通するが、本発明の範囲から逸脱することなく、より多くの又はより少ないテンプレートを使用してもよい。例えば、実質的に対称的な外観（例えば、ねじ付きナット）を呈する物体モデル４２００は、多数の冗長テンプレート（すなわち、合致するテンプレート）又は実質的に同じテンプレートを生成する。したがって、こうした単純な物体モデル４２００は、わずか１００個のテンプレート、又は１００～３２００個のテンプレートの範囲の下半分の任意の数のテンプレートを生成することができる。逆に、対称性を欠く物体モデル４２００は、より大きい実行可能な角度で物体モデル４２００を適切に表すために、より多くの物体認識テンプレート４３００を必要とし得る。 The total number of multiple object recognition templates 4300 generated for the object model 4200 may range from approximately 100 templates to 3200 templates, and the larger the number of templates, the more the multiple object recognition templates 4300 may be correlated to the complexity of the object model 4200 being generated. Although the numbers cited are common to some applications and some object types, more or fewer templates may be used without departing from the scope of the invention. For example, an object model 4200 that exhibits a substantially symmetrical appearance (eg, a threaded nut) will generate multiple redundant templates (ie, matching templates) or templates that are substantially the same. Thus, such a simple object model 4200 can generate as few as 100 templates, or any number of templates in the lower half of the range of 100 to 3200 templates. Conversely, an object model 4200 that lacks symmetry may require more object recognition templates 4300 to adequately represent the object model 4200 at a larger feasible angle.

The object recognition template generation method 6000 may further include an operation 6009 that includes communicating the plurality of object recognition templates 4300, as an object recognition template set 4301, to a robot control system. The object recognition template set 4301 may be communicated to a robot control system such as the computing system 1100, any other type of robot control system, and/or any other system that may employ the object recognition templates 4300. In embodiments, communicating the object recognition template set 4301 may include transmission via any suitable networking protocol and/or storage in a memory or other storage device for any period of time, for later access by a robot control system or other system that can employ the object recognition templates. Each of the plurality of object recognition templates 4300 in the object recognition template set 4301 represents a pose that the object model 4200 may have relative to the optical axis 4130 of the virtual camera 4110 when the virtual camera is located at a particular viewpoint 4120. As discussed above, a pose may include any position angle and rotation angle.

As discussed above, the object recognition template generation method 6000 of the present invention includes generating an object recognition template set 4301 from object registration data 5001. The object recognition template set 4301 may be used to identify one or more objects 5011 in a scene during physical operations to grasp, pick up, or otherwise interact with the one or more objects 5011. Object registration data 5001 of an object model 4200 representing an object 5011 is obtained. A plurality of viewpoints 4120 of the object model 4200 in a three-dimensional space 4100 is determined. An appearance 4140 of the object model at each of the plurality of viewpoints 4120 is estimated or captured. A plurality of object recognition templates 4300 is generated according to the plurality of appearances 4140, with each of the object recognition templates 4300 corresponding to a respective one of the appearances 4140. The plurality of object recognition templates 4300 is communicated to a robot control system as the object recognition template set 4301. Each of the object recognition templates 4300 represents a pose that the object model 4200 may have relative to the optical axis 4130 of the virtual camera 4110. Accordingly, each of the object recognition templates 4300 may correspond to a potential pose of the object 5011 in a physical scene relative to the optical axis of a camera (such as camera 1200) that generates image information (e.g., image information 2600/2700) of the object 5011 in the physical scene.
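The enumeration of viewpoints and poses summarized above can be illustrated with a short sketch. The description does not specify how the viewpoints 4120 are distributed, so the Fibonacci-lattice sampling, the fixed number of in-plane rotations, and all names below (`TemplatePose`, `fibonacci_sphere`, `enumerate_template_poses`) are assumptions chosen for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplatePose:
    """Hypothetical record: one pose for which a template would be rendered."""
    viewpoint: tuple   # unit vector from the object model toward the virtual camera
    roll_deg: float    # rotation about the virtual camera's optical axis


def fibonacci_sphere(n):
    """Return n roughly evenly spaced unit vectors (an assumed sampling scheme)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def enumerate_template_poses(n_viewpoints, n_rolls):
    """Pair every sampled viewpoint with n_rolls evenly spaced in-plane rotations."""
    step = 360.0 / n_rolls
    return [TemplatePose(v, k * step)
            for v in fibonacci_sphere(n_viewpoints)
            for k in range(n_rolls)]


# 100 viewpoints x 8 rotations gives 800 poses, inside the 100 to 3200 range cited above.
poses = enumerate_template_poses(n_viewpoints=100, n_rolls=8)
```

In a fuller pipeline, each pose would be rendered to obtain an appearance 4140 and then converted into one object recognition template; for a highly symmetric model, many of these poses would yield near-identical templates that could be discarded.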

In further embodiments, additional or alternative methods may be used to generate the object recognition template set 4301 from the object registration data 5001, and the object recognition templates 4300 may include information in addition to, or different from, the 2D appearance 4302 and the 3D appearance 4303. Specifically, an object recognition template 4300 may include two-dimensional (2D) measurement information 4304 and three-dimensional (3D) measurement information 4305.

２Ｄ測定情報４３０４は、勾配特徴マップを指し得る。勾配特徴マップは、以下に説明するように、デジタル物体の表面上の一つ以上の勾配抽出位置５１００で、物体のデジタル表示から捕捉又は抽出された勾配情報９１００を含み得る。３Ｄ測定情報４３０５は、表面法線特徴マップを指し得る。表面法線特徴マップは、以下に説明するように、デジタル物体の表面上の一つ以上の表面法線位置５１０１で、物体のデジタル表示から捕捉又は抽出された表面法線ベクトル９１０１を含み得る。２Ｄ測定情報４３０４及び３Ｄ測定情報４３０５の生成及び／又は抽出は、図８～９Ｃに関して以下でより詳細に説明される。 2D measurement information 4304 may refer to a gradient feature map. The gradient feature map may include gradient information 9100 captured or extracted from the digital representation of the object at one or more gradient extraction locations 5100 on the surface of the digital object, as described below. 3D measurement information 4305 may refer to a surface normal feature map. The surface normal feature map may include surface normal vectors 9101 captured or extracted from the digital representation of the object at one or more surface normal locations 5101 on the surface of the digital object, as described below. The generation and/or extraction of 2D measurement information 4304 and 3D measurement information 4305 is described in more detail below with respect to FIGS. 8-9C.

FIG. 8 shows a flow diagram of an example feature generation method 8000. In embodiments, the feature generation method 8000 may be used to generate an object recognition template set and/or a plurality of object recognition templates. In further embodiments, as discussed in more detail below, the feature generation method 8000 may be used to extract features from object image information in hypothesis generation, refinement, and verification methods. In one embodiment, the feature generation method 8000 may be performed by, for example, the computing system 1100 (or 1100A/1100B/1100C) of FIGS. 2A-2D or the computing system 1100 of FIGS. 3A-3B, or more specifically by at least one processing circuit 1110 of the computing system 1100. In some scenarios, the computing system 1100 may perform the feature generation method 8000 by executing instructions stored on a non-transitory computer-readable medium (e.g., 1120). For example, the instructions may cause the computing system 1100 to execute one or more of the modules illustrated in FIG. 2D, which may perform the feature generation method 8000. For example, in embodiments, the steps of the feature generation method 8000 may be performed by the object registration module 1130, the object recognition module 1121, the feature extraction module 1134, and the template generation module 1132 operating in conjunction.

In embodiments, the steps of the feature generation method 8000 may be used to accomplish object recognition template generation, for example through feature generation and/or extraction methods, and the resulting templates may later be used in conjunction with specific sequential robot trajectories for performing specific tasks. In embodiments, the steps of the feature generation method 8000 may be applied to extract or generate features from object image information for use in hypothesis generation, refinement, and verification. As a general overview, the feature generation method 8000 may operate to cause the computing system 1100 to generate a set of object recognition templates, feature maps, and/or extracted/generated features for a computing system (e.g., the computing system 1100 or a similar computing system) to use in identifying objects in a scene for operations related to object picking. The feature generation method 8000 is described below with additional reference to FIGS. 7A and 7B and FIGS. 9A-9C.

The feature generation method 8000 may involve 2D measurement information 4304 and 3D measurement information 4305, which may be used to generate an object recognition template 4300 and/or to characterize an object 5012 (see, e.g., FIG. 3B) within a physical scene 5013. At least one processing circuit 1110 may obtain object information 9121. As shown in FIG. 9A, the object information 9121 may include a digitally represented object 9200, for example, object registration data 5001 of the object model 4200, an appearance 4140 of the object model 4200, an object recognition template 4300, and/or scene information 9131. The scene information 9131 may include captured 2D or 3D image information of a physical scene 5013 containing a plurality of objects 5012, similar to, for example, the 2D image information 2600 and/or 3D image information 2700. The scene information 9131 may also include image information 12001, discussed below with respect to the hypothesis generation, verification, and refinement methods and operations. The at least one processing circuit 1110 may further extract or generate 2D measurement information 4304 and/or 3D measurement information 4305 from the object information 9121. In embodiments, the at least one processing circuit 1110 may further generate an object recognition template 4300 according to the 2D measurement information 4304 and the 3D measurement information 4305. In embodiments, the 2D measurement information 4304 and the 3D measurement information 4305 may be used for alternative purposes, such as hypothesis generation, verification, and refinement. The at least one processing circuit 1110 may perform specific steps of the feature generation method 8000 for generating the object recognition template set 4301 and/or for use in hypothesis refinement and verification.

In an operation 8001, the feature generation method 8000 may include obtaining object information 9121. The object information 9121 may include a digitally represented object 9200. The object information 9121 and the digitally represented object 9200 may represent an object 5015 that physically exists in the world. The object 5015 may include, for example, an object 5011 (e.g., a physical object represented by the object model 4200) and/or an object 5012 (e.g., a physical object represented by captured image information of a physical scene 5013). In embodiments, the object information 9121 may include one or more of an object recognition template 4300, object registration data 5001, an object appearance 4140, and/or scene information 9131. The at least one processing circuit 1110 may obtain the object information 9121 from within a hardware storage component (e.g., HDD, SSD, USB, CD, RAID, etc.) or a software storage component (e.g., cloud, VSP, etc.) of the computing system 1100. The at least one processing circuit 1110 may obtain the object information 9121 as part of internal processing, for example as an object recognition template 4300. The at least one processing circuit 1110 may obtain the object information 9121 from a camera 1200 associated with the computing system 1100. The at least one processing circuit 1110 may obtain the object information 9121 of an object from an external processor (e.g., an external laptop, desktop, mobile phone, or any other separate device with its own processing system) or from an external storage device.

処理８００３では、特徴生成方法８０００は、勾配抽出位置５１００（図９Ｂに示す）及び表面法線位置５１０１（図９Ｃに示す）を含む特徴位置を選択することをさらに含み得る。勾配抽出位置５１００は、２Ｄ測定情報４３０４の抽出又は生成のために選択される位置である。表面法線位置５１０１は、３Ｄ測定情報４３０５の抽出又は生成のために選択される位置である。勾配抽出位置５１００及び表面法線位置５１０１の各々は、デジタルで表される物体９２００の表面９１２２上の位置である。 In operation 8003, the feature generation method 8000 may further include selecting feature locations that include gradient extraction locations 5100 (shown in FIG. 9B) and surface normal locations 5101 (shown in FIG. 9C). Gradient extraction location 5100 is a location selected for extraction or generation of 2D measurement information 4304. Surface normal position 5101 is the position selected for extraction or generation of 3D measurement information 4305. Each of the gradient extraction position 5100 and the surface normal position 5101 is a position on the surface 9122 of the digitally represented object 9200.

実施形態では、勾配抽出位置５１００及び表面法線位置５１０１は、互いに対応してもよい。実施形態では、一部の勾配抽出位置５１００は、一部の表面法線位置５１０１に対応してもよく、一方、その他の勾配抽出位置５１００は、表面法線位置５１０１に対応しない。さらなる実施形態において、勾配抽出位置５１００及び表面法線位置５１０１は、互いに重ならないように選択されてもよい。したがって、勾配抽出位置５１００及び表面法線位置５１０１は、完全な重なり及び重なりなしを含む、任意の量の重なりを有し得る。 In embodiments, gradient extraction location 5100 and surface normal location 5101 may correspond to each other. In embodiments, some gradient extraction locations 5100 may correspond to some surface normal locations 5101, while other gradient extraction locations 5100 do not correspond to surface normal locations 5101. In further embodiments, gradient extraction locations 5100 and surface normal locations 5101 may be selected such that they do not overlap each other. Thus, gradient extraction location 5100 and surface normal location 5101 may have any amount of overlap, including complete overlap and no overlap.

In embodiments, the gradient extraction locations 5100 and surface normal locations 5101 on the surface 9122 of the digitally represented object 9200 may be selected as a limited set in order to limit the amount of memory required to store the extracted or generated 2D measurement information 4304 and 3D measurement information 4305. This memory-saving practice may be referred to as a linear modality, which may refer to a fixed total number of features (such as gradient information 9100 and/or surface normal vectors 9101, as described below) that are extracted and/or analyzed regardless of the size (in bytes) of the object information 9121 of the digitally represented object 9200. The number of features captured for the 2D measurement information 4304 may be the same as, or different from, the number of features captured for the 3D measurement information 4305.

In embodiments, the limited number of gradient extraction locations 5100 and surface normal locations 5101 may be positioned so as to produce efficient results. For example, the gradient extraction locations 5100 may be positioned along identified edges of the digitally represented object 9200, as shown in FIG. 9B, while the surface normal locations 5101 may be positioned away from the edges of the digitally represented object 9200. In embodiments, the edges of the digitally represented object 9200 may be identified according to, for example, ray tracing, pixel intensity discontinuities, or other analysis techniques. This placement may prove efficient because, as described below, gradient information 9100 may be more significant in hypothesis generation, verification, and refinement when captured near object edges, while surface normal vectors 9101 may be more significant when captured away from edges. The combined number of gradient extraction locations 5100 and surface normal locations 5101 selected may be in a range of 100 to 1000, 50 to 5000, and/or 10 to 1000, although more or fewer may be appropriate as well. In particular embodiments, the number of gradient extraction locations 5100 and surface normal locations 5101 may be 256 each, or 256 in total.
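As a minimal sketch of this selection step, the following assumes the digitally represented object is available as a binary mask (1 = object pixel) and treats any object pixel with a background 4-neighbor as an edge pixel. The mask representation, the neighbor test, the even subsampling, and the function names are illustrative assumptions, not the edge identification described in the specification.

```python
def subsample(items, limit):
    """Keep at most `limit` items, taken at even intervals through the list."""
    if len(items) <= limit:
        return list(items)
    step = len(items) / limit
    return [items[int(i * step)] for i in range(limit)]


def select_feature_locations(mask, limit=256):
    """Return (gradient_locations, normal_locations) as lists of (y, x).

    Edge pixels become candidate gradient extraction locations; interior
    pixels become candidate surface normal locations. Each list is capped
    at `limit` so the feature count stays fixed regardless of image size.
    """
    h, w = len(mask), len(mask[0])
    edge, interior = [], []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            neighbors = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            on_edge = any(
                ny < 0 or ny >= h or nx < 0 or nx >= w or not mask[ny][nx]
                for ny, nx in neighbors
            )
            (edge if on_edge else interior).append((y, x))
    return subsample(edge, limit), subsample(interior, limit)


# Hypothetical 40x40 mask containing a 30x30 square object.
mask = [[1 if 5 <= y < 35 and 5 <= x < 35 else 0 for x in range(40)]
        for y in range(40)]
gradient_locations, normal_locations = select_feature_locations(mask, limit=64)
```

Capping both lists at a fixed limit mirrors the fixed feature count described above: storage for the extracted features does not grow with the resolution of the input.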

In an operation 8005, the feature generation method 8000 may further include extracting 2D measurement information 4304 from the object information 9121. The 2D measurement information 4304 may represent a smaller set of information for representing the object 5015 (e.g., as compared to the 2D appearance 4302), in order to conserve memory or other resources and/or to improve the speed at which the object recognition template set 4301 is generated or hypothesis verification and refinement are performed. As discussed above, an object recognition template 4300 from the object recognition template set 4301 may include 2D measurement information 4304 (and/or 3D measurement information 4305) describing the object 5015.

The 2D measurement information 4304 may include two-dimensional features extracted or generated from the object information 9121. In one embodiment, extracting or generating the 2D measurement information 4304 may include extracting gradient information 9100 from the object information 9121. Accordingly, the 2D measurement information 4304 may include a gradient feature map that includes the gradient information 9100 as described herein. The gradient information 9100 indicates a direction or orientation of an edge 5110 of the digitally represented object 9200. The gradient information 9100 may be extracted at a plurality of gradient extraction locations 5100 of the digitally represented object 9200. The gradient extraction locations 5100 may represent any or all internal and external edges identified within the digitally represented object 9200.

Extracting the gradient information 9100 may include analyzing pixel intensities of two-dimensional image information of the object information 9121 to measure, in a process referred to as gradient extraction, the direction (e.g., as represented by arrows 9150) in which the pixel intensity of the two-dimensional image information changes at each gradient extraction location. Changes in pixel intensity may represent the contours and orientations of surfaces and edges of the digitally represented object 9200, and thus provide information that may help in comparing two digitally represented objects 9200. Locations near one another along an edge 5110 are likely to have similar gradient information 9100; for example, the pixel intensities near such neighboring locations change in a similar manner with increasing distance from the edge 5110. In some examples, portions of the digitally represented object 9200 that present higher-than-average pixel intensity may indicate an edge 5110 or other identifiable feature. As discussed above, in some examples, the gradient extraction locations 5100 may be positioned along the edges 5110 of the digitally represented object 9200.
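Gradient extraction of this kind can be sketched with central differences over a grayscale image. The specification does not name a particular gradient operator, so the central-difference estimate, the representation of the image as a list of rows, and the function names below are assumptions for illustration; locations are assumed to lie at least one pixel inside the image border.

```python
import math


def gradient_direction(image, y, x):
    """Return the direction of intensity change at (y, x) in degrees [0, 360),
    or None where the neighborhood is flat. Uses central differences."""
    gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
    gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
    if gx == 0 and gy == 0:
        return None
    return math.degrees(math.atan2(gy, gx)) % 360.0


def extract_gradient_features(image, locations):
    """Build a sparse gradient feature map: {(y, x): direction_in_degrees}.
    Locations with no intensity change are skipped."""
    features = {}
    for (y, x) in locations:
        direction = gradient_direction(image, y, x)
        if direction is not None:
            features[(y, x)] = direction
    return features


# Hypothetical image with a vertical edge: dark on the left, bright on the right.
image = [[0, 0, 0, 0, 255, 255, 255, 255] for _ in range(5)]
features = extract_gradient_features(image, [(2, 3), (2, 1)])
# (2, 3) sits on the edge and receives a direction; (2, 1) is flat and is skipped.
```

Storing only a direction per selected location keeps the feature map small compared with storing the full 2D appearance.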

In one embodiment, the extracted gradient information 9100 may be used to improve a template matching, hypothesis generation, or hypothesis verification operation, which may determine whether an object recognition template 4300 from the object recognition template set 4301 matches an object 5012 in a scene. For example, if the 2D appearance 4302 has certain portions that overlap or intersect with a digitally represented object 9200 from the scene, the at least one processing circuit 1110 may determine whether the matching portions also present matching or similar gradients (e.g., whether portions of the 2D measurement information 4304 match). If the gradients are dissimilar or do not match, the at least one processing circuit 1110 may determine that the differing portions are the result of a poor match or are coincidental. A poor match may result from the 2D appearance 4302 overlapping a portion of the scene by only a slight amount.

For example, referring now to FIG. 9D, the 2D appearance 4302A of an object recognition template 4300A is represented by a rectangle, and the 2D measurement information 4304A (e.g., gradient information 9100) of the object recognition template 4300A is represented by arrows. The 2D image information 2600B of an object 5012 in the scene (physical object not shown) is represented by an L-shaped solid. The object 5012 is further represented by 2D measurement information 4304B, shown as arrows. A portion of the 2D appearance 4302A may be compared to, and overlaid on, a patterned portion 9309 of the 2D image information 2600B representing the object 5012 in the scene (physical object not shown). However, the gradients represented by the 2D measurement information 4304A and 4304B do not match, and the at least one processing circuit 1110 may therefore determine that the object recognition template 4300A is a poor fit for the object 5012 in the scene 5013.
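The gradient comparison in this example can be sketched as a simple agreement score. The scoring rule (fraction of template locations whose scene gradient lies within an angular tolerance), the 20-degree tolerance, and the dictionary representation of the feature maps are assumptions for illustration; the specification does not define a particular similarity measure here.

```python
def angle_difference(a, b):
    """Smallest absolute difference between two angles, in degrees [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def gradient_match_score(template, scene, tolerance_deg=20.0):
    """Fraction of template gradient locations that agree with the scene.

    template, scene: dicts mapping (y, x) to a gradient direction in degrees.
    A location agrees when the scene has a gradient there and the two
    directions differ by no more than tolerance_deg.
    """
    if not template:
        return 0.0
    hits = 0
    for location, angle in template.items():
        scene_angle = scene.get(location)
        if scene_angle is not None and angle_difference(angle, scene_angle) <= tolerance_deg:
            hits += 1
    return hits / len(template)


# Overlapping region where the appearances coincide but the gradients do not,
# loosely mirroring the FIG. 9D situation (values are hypothetical).
template_gradients = {(0, 0): 0.0, (0, 1): 0.0, (0, 2): 0.0}
scene_gradients = {(0, 0): 90.0, (0, 1): 90.0, (0, 2): 90.0}
score = gradient_match_score(template_gradients, scene_gradients)  # 0.0: poor fit
```

A low score on a region whose appearance overlaps would flag the overlap as coincidental, so the template could be rejected even though part of its outline coincides with the scene object.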

処理８００７では、特徴生成方法８０００は、物体情報９１２１から３Ｄ測定情報４３０５を抽出又は生成することをさらに含み得る。ここで図９Ｃを参照すると、処理８００７は、表面法線位置５１０１で表面法線ベクトル９１０１を決定することを含み得る。３Ｄ測定情報４３０５は、抽出又は生成された表面法線ベクトル９１０１を含む表面法線特徴マップを含み得る。 In operation 8007, the feature generation method 8000 may further include extracting or generating 3D measurement information 4305 from object information 9121. Referring now to FIG. 9C, process 8007 may include determining a surface normal vector 9101 at surface normal position 5101. 3D measurement information 4305 may include a surface normal feature map that includes extracted or generated surface normal vectors 9101.

抽出された３Ｄ測定情報４３０５は、表面法線ベクトル情報、例えば、デジタルで表される物体９２００の表面９１２２上にある表面法線位置５１０１で取られた法線ベクトル（表面に対して垂直なベクトル）であってもよい表面法線ベクトル９１０１を記述する測定値を含んでもよい。一実施形態では、３Ｄ測定情報４３０５を抽出又は生成することは、物体情報９１２１から表面法線ベクトル９１０１及び／又は表面法線ベクトル情報を抽出又は生成することを含む。表面法線ベクトル９１０１は、デジタルで表される物体９２００の表面９１２２に対して法線の複数のベクトルを記述する。表面法線ベクトル９１０１は、デジタルで表される物体９２００の複数の表面法線位置５１０１で抽出又は生成されてもよい。表面法線ベクトル９１０１を抽出することは、表面法線ベクトル位置５１０１のそれぞれで、デジタルで表される物体９２００の複数の表面法線ベクトル９１０１を識別することを含み得る。 The extracted 3D measurement information 4305 includes surface normal vector information, for example, a normal vector taken at a surface normal position 5101 on the surface 9122 of the digitally represented object 9200 (a vector perpendicular to the surface). ) may include measurements that describe the surface normal vector 9101. In one embodiment, extracting or generating 3D measurement information 4305 includes extracting or generating surface normal vector 9101 and/or surface normal vector information from object information 9121. Surface normal vector 9101 describes a plurality of vectors normal to surface 9122 of digitally represented object 9200. Surface normal vectors 9101 may be extracted or generated at multiple surface normal positions 5101 of digitally represented object 9200. Extracting the surface normal vector 9101 may include identifying a plurality of surface normal vectors 9101 of the digitally represented object 9200 at each of the surface normal vector positions 5101.
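One way to obtain such normal vectors is sketched below, under the assumption that the 3D information is available as a depth map treated as a height field z(x, y) with unit pixel spacing. The specification does not prescribe this representation or estimator; the central-difference slope estimate and the function names are illustrative only, and locations are assumed to lie at least one pixel inside the map border.

```python
import math


def surface_normal(depth, y, x):
    """Estimate the unit normal of the height field z = depth[y][x] at (y, x).

    For a surface z(x, y), an (unnormalized) normal is (-dz/dx, -dz/dy, 1);
    the slopes are estimated with central differences.
    """
    dzdx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
    dzdy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
    nx, ny, nz = -dzdx, -dzdy, 1.0
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def extract_normal_features(depth, locations):
    """Build a sparse surface normal feature map: {(y, x): (nx, ny, nz)}."""
    return {(y, x): surface_normal(depth, y, x) for (y, x) in locations}


# Hypothetical flat patch: every normal points straight along the z axis.
flat = [[5.0] * 5 for _ in range(5)]
normals = extract_normal_features(flat, [(2, 2), (1, 3)])
```

Sampling normals away from edges, as described earlier, avoids locations where neighboring depth values straddle a discontinuity and the slope estimate becomes unreliable.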

In an operation 8009, the feature generation method 8000 may include generating an object recognition template set 4301 or a plurality of object recognition templates 4300. The at least one processing circuit 1110 may generate one or more object recognition templates 4300 that include the 2D measurement information 4304 and the 3D measurement information 4305 described above. The one or more object recognition templates 4300 may form an object recognition template set 4301. As discussed above, an object recognition template 4300 may include one or more of the 2D measurement information 4304, the 3D measurement information 4305, the 2D appearance 4302, and the 3D appearance 4303. Accordingly, in some embodiments, the feature generation method 8000 may augment or further develop previously established object recognition templates 4300 and object recognition template sets 4301. The extracted or generated 3D measurement information 4305 and 2D measurement information 4304 may be used to identify an object 5012 in a scene during real-time or near-real-time picking operations, as discussed below. The feature generation method 8000 may operate in tandem with, or subsequent to, the object recognition template generation method 6000 described above in generating an object recognition template set 4301 for later performing matching (hypothesis refinement and verification) operations against a scene (or objects in a scene). The feature generation method 8000 may serve as a final step toward creating an object recognition template set 4301 to be used in later hypothesis operations (e.g., method 11000 and method 13000, described in further detail below).
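The composition of a template and a template set described in this operation can be pictured as a pair of simple containers. The class and field names below are hypothetical and only mirror the four kinds of content the description lists (2D measurement information, 3D measurement information, 2D appearance, 3D appearance); they are not a data layout taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Location = Tuple[int, int]
Normal = Tuple[float, float, float]


@dataclass
class ObjectRecognitionTemplate:
    """Hypothetical container for one template (one pose of the object model)."""
    pose_id: int
    gradient_map: Dict[Location, float] = field(default_factory=dict)  # 2D measurement information
    normal_map: Dict[Location, Normal] = field(default_factory=dict)   # 3D measurement information
    appearance_2d: Optional[object] = None                             # optional 2D appearance
    appearance_3d: Optional[object] = None                             # optional 3D appearance

    def feature_count(self) -> int:
        """Total number of stored gradient and surface normal features."""
        return len(self.gradient_map) + len(self.normal_map)


@dataclass
class ObjectRecognitionTemplateSet:
    """Hypothetical container grouping all templates for one object model."""
    object_name: str
    templates: List[ObjectRecognitionTemplate] = field(default_factory=list)

    def add(self, template: ObjectRecognitionTemplate) -> None:
        self.templates.append(template)


# Augmenting a set with a template that carries only measurement information.
template_set = ObjectRecognitionTemplateSet("threaded_nut")
template_set.add(ObjectRecognitionTemplate(
    pose_id=0,
    gradient_map={(10, 12): 90.0},
    normal_map={(20, 20): (0.0, 0.0, 1.0)},
))
```

Keeping the appearances optional reflects the statement that a template may include one or more of the four kinds of information rather than all of them.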

図１０Ａ及び１０Ｂは、本明細書の実施形態と一致する、テンプレートの合致及び仮説生成方法１００００の態様を示す。本明細書で論じる仮説生成技術は、概してｌｉｎｅＭｏｄ技術と一致し得る。 10A and 10B illustrate aspects of a template matching and hypothesis generation method 10000 consistent with embodiments herein. The hypothesis generation techniques discussed herein may be generally consistent with lineMod techniques.

処理１０００１では、テンプレートマッチング及び仮説生成方法１００００は、画像情報の取得を含み得る。一実施形態では、画像情報１２００１を取得することは、シーン５０１３及びシーン内の一つ以上の物体５０１２の画像を捕捉することを含み得る。こうした実例では、画像情報１２００１が、箱、ビン、ケース、木枠、パレット、又は他の容器内に位置する物体５０１２を表し得る。画像情報１２００１は、本明細書で論じるように、カメラ１２００によって取得されてもよい。 In process 10001, the template matching and hypothesis generation method 10000 may include obtaining image information. In one embodiment, obtaining image information 12001 may include capturing images of a scene 5013 and one or more objects 5012 within the scene. In these examples, image information 12001 may represent an object 5012 located within a box, bin, case, crate, pallet, or other container. Image information 12001 may be acquired by camera 1200, as discussed herein.

The at least one processing circuit 1110 may be configured to generate, receive, and/or process the image information 12001, such as by using the image information 12001 to distinguish between individual objects in the field of view of the camera 1200 and to perform object recognition based on the image information 12001. In one embodiment, the image information 12001 may include two-dimensional image information (e.g., similar to 2D image information 2600) that describes the visual appearance of the environment or scene 5013 in the field of view of the camera 1200. In one embodiment, the image information 12001 may include three-dimensional image information (e.g., similar to 3D image information 2700) that provides a point cloud, spatial structure information, a depth map, or other three-dimensional image of the scene 5013 in the field of view of the camera 1200. The three-dimensional image information in this example may be used to estimate how the objects 5012 are spatially arranged in three-dimensional space (e.g., the scene 5013). Obtaining the image information 12001 may include generating or obtaining image information 12001 that represents the scene 5013 and, as necessary, generating or obtaining object image information 12002 that represents an individual object 5012 or multiple objects 5012 in the scene 5013. The image information 12001 may be generated by the camera 1200 when the objects 5012 are (or were) in the field of view of the camera 1200 and may include, for example, two-dimensional image information and/or three-dimensional image information.

一実施形態では、画像情報１２００１は、二次元グレースケール又はカラー画像を含んでもよく、カメラ１２００の視点からのシーン５０１３（及び／又はシーン内の物体５０１２）の外観を記述してもよい。一実施形態では、画像情報１２００１は、カラー画像の単一色チャネル（例えば、赤、緑、又は青色のチャネル）に対応し得る。カメラ１２００が物体５０１２の上方に配置される場合、二次元画像情報は、物体５０１２のそれぞれの上部表面の外観を表し得る。さらに、画像情報１２００１は、例えば、物体５０１２の一つ以上の表面（例えば、上部表面、又は他の外側表面）上の、又は物体５０１２の一つ以上のエッジに沿った様々な物体位置６２２０のそれぞれの奥行き値を示す、奥行きマップ又は点群を含み得る、三次元画像情報を含み得る。物体画像情報１２００２の二次元画像情報及び三次元画像情報は、それぞれ、２Ｄ画像情報１２６００及び３Ｄ画像情報１２７００と呼んでもよい。一部の実施形態では、物体５０１２の物理的エッジを表す物体位置６２２０は、個々の物体５０１２を表すことに限定される物体画像情報１２００２を識別するために使用され得る。 In one embodiment, image information 12001 may include a two-dimensional grayscale or color image and may describe the appearance of scene 5013 (and/or objects 5012 within the scene) from the perspective of camera 1200. In one embodiment, image information 12001 may correspond to a single color channel (eg, a red, green, or blue channel) of a color image. If camera 1200 is positioned above object 5012, the two-dimensional image information may represent the appearance of the respective upper surface of object 5012. Further, image information 12001 may include, for example, various object positions 6220 on one or more surfaces (e.g., a top surface, or other outer surface) of object 5012 or along one or more edges of object 5012. It may include three-dimensional image information, which may include a depth map or point cloud, indicating respective depth values. The two-dimensional image information and three-dimensional image information of the object image information 12002 may be referred to as 2D image information 12600 and 3D image information 12700, respectively. In some embodiments, object positions 6220 representing physical edges of objects 5012 may be used to identify object image information 12002 that is limited to representing individual objects 5012.

Object image information 12002 may include image information relating to a particular physical object 5012 in scene 5013. Object image information 12002 may include 2D image information 12600, similar to image information 2600, representing object 5012. Object image information 12002 may include 3D image information 12700, similar to image information 2700, representing object 5012. Object image information 12002 may include object positions 6220, which may further include gradient extraction positions 8100 and surface normal positions 8101 representing the positions at which respective gradient information 8102 and surface normal vectors 8103 are obtained, for example, via feature generation method 8000. Gradient extraction positions 8100, surface normal positions 8101, gradient information 8102, and surface normal vectors 8103 may be similar to gradient extraction positions 5100, surface normal positions 5101, gradient information 9100, and surface normal vectors 9101 described above, except that gradient extraction positions 8100, surface normal positions 8101, gradient information 8102, and surface normal vectors 8103 are obtained from image information captured of a physical object.

The template matching and hypothesis generation operations discussed below may be performed by comparing object recognition templates with image information 12001 and/or object image information 12002. In embodiments, object image information 12002 may be generated from image information 12001, for example, based on image segmentation or other techniques, as well as feature generation method 8000, as described above.

In operation 10003, template matching and hypothesis generation method 10000 may include matching templates to the object image information. The type of objects 5012 present in scene 5013 (whether a single type or multiple types) may be known. Accordingly, an object recognition template set 4301 corresponding to the known object type may be obtained, for example, via any method described herein. The information in each object recognition template 4300 of object recognition template set 4301, which represents how object 5012 should appear in various poses, may be compared with object image information 12002 representing object 5012 to determine whether each object recognition template 4300 is a candidate match. Good candidate matches may then be selected for generating detection hypotheses.

Any relevant information of object recognition template 4300 may be compared with the corresponding information of object image information 12002. For example, gradient information 8102 and gradient extraction positions 8100 of object image information 12002 may be compared with gradient information 9100 and gradient extraction positions 5100 of object recognition template 4300. Surface normal vectors 8103 and surface normal positions 8101 of object image information 12002 may be compared with surface normal vectors 9101 and surface normal positions 5101 of object recognition template 4300. 2D image information 12600 and 3D image information 12700 may be compared with 2D appearance 4302 and 3D appearance 4303, respectively.

The above information from object recognition template 4300 and from object image information 12002 can be understood as maps, in that the information can be attributed to a series of two-dimensional positions. A template map (representing any of the information of object recognition template 4300) can be slid laterally across an object map (representing any of object image information 12002) until a match exceeding a threshold is found. Template matching may involve comparing the respective gradient information, the respective 2D image information, the respective 3D information, and/or the respective surface normal vector information.
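The sliding comparison described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each map is a small grid of quantized orientation labels, that `None` marks a cell without a feature, and that the score is the fraction of template features whose label agrees with the object map. The names `match_score` and `slide_template` are hypothetical and do not come from the source. Unlike the description, which stops at the first match above the threshold, the sketch collects every qualifying offset.

```python
from typing import List, Optional, Tuple

# One quantized orientation label per cell; None means "no feature here" (assumed encoding).
Grid = List[List[Optional[int]]]


def match_score(template: Grid, scene: Grid, row: int, col: int) -> float:
    """Fraction of template features whose label equals the scene label at this offset."""
    total = 0
    hits = 0
    for r, line in enumerate(template):
        for c, label in enumerate(line):
            if label is None:
                continue
            total += 1
            if scene[row + r][col + c] == label:
                hits += 1
    return hits / total if total else 0.0


def slide_template(template: Grid, scene: Grid, threshold: float) -> List[Tuple[int, int, float]]:
    """Slide the template map over the object map; keep offsets scoring at or above threshold."""
    t_rows, t_cols = len(template), len(template[0])
    s_rows, s_cols = len(scene), len(scene[0])
    matches = []
    for row in range(s_rows - t_rows + 1):
        for col in range(s_cols - t_cols + 1):
            score = match_score(template, scene, row, col)
            if score >= threshold:
                matches.append((row, col, score))
    return matches
```

Each returned tuple gives the row and column offset of the template within the object map together with its score, which could then serve as a template matching score for later filtering.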

A threshold may be used, and tolerances may be permitted, to account for potential pose differences between object recognition template 4300 and object image information 12002. The spatial subsampling procedure described above cannot capture every possible pose in object recognition templates 4300, so some variation is understood to occur and is accounted for. Such tolerance techniques may include, for example, spreading, in which gradient information 9100 is spread among neighboring gradient extraction positions 5100 of object recognition template 4300 to increase the chance of a match. Another tolerance technique may include finding a match based on a threshold level of matching, for example when gradient information or surface normal vectors are close to each other but do not match perfectly. Template matching may produce a template matching score indicating the quality of the match.
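As a rough illustration of the spreading tolerance, the sketch below copies each template label into its neighboring cells so that a feature displaced by a pixel can still count as a hit. The grid encoding, the square neighborhood, and the function names `spread_labels` and `tolerant_score` are assumptions made for this example; the source does not specify how spreading is implemented.

```python
from typing import List, Optional, Set

Grid = List[List[Optional[int]]]  # quantized label per cell, None = no feature (assumed)


def spread_labels(grid: Grid, radius: int = 1) -> List[List[Set[int]]]:
    """Copy every label into all cells within `radius` (square neighborhood)."""
    rows, cols = len(grid), len(grid[0])
    spread: List[List[Set[int]]] = [[set() for _ in range(cols)] for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            label = grid[r][c]
            if label is None:
                continue
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    spread[rr][cc].add(label)
    return spread


def tolerant_score(spread_template: List[List[Set[int]]], patch: Grid) -> float:
    """Fraction of patch features that land on a template cell allowing that label."""
    total = 0
    hits = 0
    for r, line in enumerate(patch):
        for c, label in enumerate(line):
            if label is None:
                continue
            total += 1
            if label in spread_template[r][c]:
                hits += 1
    return hits / total if total else 0.0
```

With `radius=0` the comparison is exact; increasing the radius trades precision for robustness to small pose differences, and the resulting score can be compared against a matching threshold.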

In operation 10005, template matching and hypothesis generation method 10000 may include clustering and grouping the matching templates to reduce the total number of matches. The template matching operation may find multiple object recognition templates 4300 that match the object represented by object image information 12002. In some embodiments, the template matching operation may be limited, by time or computational resources, in how many matches it can identify. In such situations, operation 10005 may avoid concentrating the matches on a single portion or set of portions of scene 5013. Accordingly, matched templates with good match quality (e.g., above a threshold) may be clustered, grouped, and filtered to maintain good scene coverage. Object recognition templates 4300 identified as corresponding to the same object image information 12002 may be clustered or grouped. Within each cluster or group, the best match may be selected and the rest discarded. The remaining matches may therefore represent objects 5012 throughout scene 5013 rather than clustering in a single region. In one example, an object 5012 in scene 5013 that is near the top of the container and very easily recognized may generate more matches than a partially hidden object. Selecting only the best match for each object image information 12002 allows more objects to be identified.
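The grouping and filtering step can be sketched as follows, under the assumption that each match already carries an identifier for the object image information it corresponds to. The `Match` record, its field names, and `best_match_per_object` are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Match:
    object_id: str    # hypothetical key for one object image information
    template_id: str  # hypothetical key for one object recognition template
    score: float      # template matching score, higher is better


def best_match_per_object(matches: List[Match], min_score: float) -> List[Match]:
    """Drop weak matches, group the rest by object, and keep the best one per group."""
    best: Dict[str, Match] = {}
    for match in matches:
        if match.score < min_score:
            continue
        current = best.get(match.object_id)
        if current is None or match.score > current.score:
            best[match.object_id] = match
    return sorted(best.values(), key=lambda m: m.score, reverse=True)
```

An easily recognized object near the top of a container may contribute many high-scoring matches, but only one survives per object, so a partially hidden object with a single acceptable match is still represented in the result.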

In operation 10007, template matching and hypothesis generation method 10000 may include generating a set of one or more detection hypotheses. The object recognition templates 4300 remaining after clustering and grouping may be selected as detection hypotheses. These object recognition templates 4300 may be stored together with pose information 6301 indicating where in scene 5013 each object recognition template 4300 should be positioned to match the corresponding object image information. Pose information 6301 may further include information associating each object recognition template 4300 with the corresponding object image information 12002. Detection hypotheses 6300 may be combined into groups and/or sets. For example, a detection hypothesis set 8309 may include multiple detection hypotheses 8300 relating to object image information 12002 representing a single object 5012, whereas a group of detection hypotheses 8300 may include multiple detection hypotheses 8300 relating to object image information 12002 representing multiple different objects 5012 in scene 5013.

FIG. 11 shows a flow diagram of an example hypothesis refinement method 11000 for refining detection hypotheses. In one embodiment, hypothesis refinement method 11000 may be performed by, for example, the computing system 1100 (i.e., 1100A/1100B/1100C) of FIGS. 2A-2D or the computing system 1100 of FIGS. 3A-3B, or more specifically by the at least one processing circuit 1110 of computing system 1100. In some scenarios, computing system 1100 may perform hypothesis refinement method 11000 by executing instructions stored on a non-transitory computer-readable medium (e.g., 1120). For example, the instructions may cause computing system 1100 to execute one or more of the modules shown in FIG. 2D, which may perform method 11000. For example, in embodiments, the steps of method 11000 may be performed by hypothesis generation module 1128 and hypothesis refinement module 1136 operating together.

Hypothesis refinement method 11000 may be used to refine one or more detection hypotheses 6300 (e.g., as described above) generated to identify objects 5012 physically located in scene 5013. Hypothesis refinement method 11000 may operate on image information 12001 obtained from scene 5013. Image information 12001 may be similar to 2D image information 2600 and 3D image information 2700. Image information 12001 may contain one or more instances of object image information 12002, each representing an object 5012 in scene 5013. Identifying an object 5012 may include identifying the type of the object, or identifying the dimensions of the object from the corresponding object image information 12002, and/or matching object image information 12002 to an object recognition template 4300. A detection hypothesis 6300 may therefore be a hypothesis as to which of one or more object recognition templates 4300 may match object image information 12002 within image information 12001 representing scene 5013. Object image information 12002 may include 2D image information 12600 representing object 5012. 2D image information 12600 may be similar to image information 2600 and/or may include rendered 2D image information generated according to rendering techniques such as ray tracing and discontinuity detection. Object image information 12002 may include 3D image information 12700, similar to image information 2700, representing object 5012. Detection hypotheses 6300 may be generated according to a template matching procedure, as described above. For example, in one embodiment, detection hypotheses 6300 may be generated via the lineMod algorithm and/or procedure described above. Hypothesis refinement method 11000 may operate to refine the match between object recognition template 4300 and object image information 12002, even in scenarios where object recognition template 4300 does not exactly match object image information 12002.

In hypothesis refinement method 11000, at least one processing circuit 1110 may be in communication with a robot 3300, which has a robot arm 3320 and an end effector device 3330 connected to the arm, and with a camera 1200 having a field of view, and may be configured to execute instructions stored on a non-transitory computer-readable medium when one or more objects 5012 are or have been in the field of view. In embodiments, at least one processing circuit 1110 may not communicate directly with robot 3300 and may instead exchange information with robot 3300 via a network and/or a storage device. In embodiments, at least one processing circuit 1110 may communicate directly with robot 3300. At least one processing circuit 1110 may obtain image information 12001 of one or more objects 5012 in scene 5013. At least one processing circuit 1110 may also obtain a detection hypothesis 6300. Detection hypothesis 6300 may include information associating object image information 12002 with an object recognition template 4300 (e.g., a corresponding object recognition template 4300B selected from multiple object recognition templates 4300), and may include pose information 6301 of the object 5012 represented by object image information 12002. Pose information 6301 of object 5012 may refer to the position and orientation of object 5012. In embodiments, detection hypothesis 6300 may include corresponding object recognition template 4300B or may include a reference to corresponding object recognition template 4300B. At least one processing circuit 1110 may operate to identify a mismatch between corresponding object recognition template 4300B and the corresponding object image information 12002. At least one processing circuit 1110 may operate to identify a set of template positions 6210 in corresponding object recognition template 4300B that correspond to a set of object positions 6220 in object image information 12002. At least one processing circuit 1110 may further operate to adjust the set of template positions 6210 so that it converges on the set of object positions 6220. At least one processing circuit 1110 may operate to generate an adjusted detection hypothesis 6300', or a plurality of iteratively adjusted detection hypotheses 6300', including an object recognition template adjusted according to the adjusted set of template positions 6210.

At least one processing circuit 1110 may perform certain steps of hypothesis refinement method 11000 to refine detection hypothesis 6300. In operation 11001, hypothesis refinement method 11000 may include obtaining image information 12001 of one or more objects 5012 in scene 5013. In one embodiment, obtaining image information 12001 may include capturing an image of scene 5013. In such instances, image information 12001 may represent objects 5012 located in a box, bin, case, crate, pallet, or other container. Image information 12001 may be acquired by camera 1200, as discussed herein.

At least one processing circuit 1110 may be configured to generate, receive, and/or process image information 12001, such as by using image information 12001 to distinguish individual objects in the field of view of camera 1200 and performing object recognition or object registration based on image information 12001. In one embodiment, image information 12001 may include two-dimensional image information (e.g., similar to 2D image information 2600) that describes the visual appearance of the environment or scene 5013 in the field of view of camera 1200. In one embodiment, image information 12001 may include three-dimensional image information (e.g., similar to 3D image information 2700) that provides a point cloud, spatial structure information, a depth map, or another three-dimensional image of scene 5013 in the field of view of camera 1200. The three-dimensional image information in this example may be used to estimate how objects 5012 are spatially arranged in three-dimensional space (e.g., scene 5013). With respect to operation 11001, obtaining image information 12001 may include generating or obtaining image information 12001 representing scene 5013 and, as needed, generating or obtaining one or more instances of object image information 12002 representing an individual object 5012 or multiple objects 5012 in scene 5013. Image information 12001 may be generated by camera 1200 when objects 5012 are (or were) in the field of view of camera 1200, and may include, for example, two-dimensional image information and/or three-dimensional image information.

In one embodiment, image information 12001 may include a two-dimensional grayscale or color image and may describe the appearance of scene 5013 (and/or objects 5012 within the scene) from the viewpoint of camera 1200. In one embodiment, image information 12001 may correspond to a single color channel (e.g., the red, green, or blue channel) of a color image. If camera 1200 is positioned above objects 5012, the two-dimensional image information may represent the appearance of the respective top surfaces of objects 5012.

Object image information 12002 may include image information relating to a particular physical object 5012 in scene 5013. Object image information 12002 may include 2D image information 12600, similar to image information 2600, representing object 5012. Object image information 12002 may include 3D image information 12700, similar to image information 2700, representing object 5012. Object image information 12002 may include object positions 6220, which may further include gradient extraction positions 8100 and surface normal positions 8101 representing the positions at which respective gradient information 8102 and surface normal vectors 8103 are obtained, for example, via feature generation method 8000. Gradient extraction positions 8100, surface normal positions 8101, gradient information 8102, and surface normal vectors 8103 may be similar to gradient extraction positions 5100, surface normal positions 5101, gradient information 9100, and surface normal vectors 9101 described above, except that gradient extraction positions 8100, surface normal positions 8101, gradient information 8102, and surface normal vectors 8103 are obtained from image information captured of a physical object.

In operation 11003, hypothesis refinement method 11000 may further include obtaining a detection hypothesis 6300. Detection hypothesis 6300 may include multiple pieces of information. For example, detection hypothesis 6300 may include a corresponding object recognition template 4300B and object pose information 6301 indicating the position and orientation of corresponding object recognition template 4300B needed to overlay the corresponding object image information 12002 in image information 12001. Corresponding object recognition template 4300B may include one or more of 2D appearance 4302B, 3D appearance 4303B, 2D measurement information 4304B, and 3D measurement information 4305B. As described above, 2D measurement information 4304B may include gradient information 9100B and gradient extraction positions 5100B, while 3D measurement information 4305B may include surface normal vectors 9101B and surface normal positions 5101B. Corresponding object recognition template 4300B may further include template positions 6210, which may include gradient extraction positions 5100B and surface normal positions 5101B or a subset thereof.
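One way to picture the pieces of information carried by a detection hypothesis is as a set of nested records. The sketch below is only an illustration: the class names, field names, vector encodings, and the roll-pitch-yaw orientation convention are all assumptions, and the 2D and 3D appearance data are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


@dataclass
class Measurement2D:
    """Loosely mirrors 2D measurement information 4304B."""
    gradient_positions: List[Vec2]  # cf. gradient extraction positions 5100B
    gradient_vectors: List[Vec2]    # cf. gradient information 9100B (assumed encoding)


@dataclass
class Measurement3D:
    """Loosely mirrors 3D measurement information 4305B."""
    normal_positions: List[Vec3]    # cf. surface normal positions 5101B
    normal_vectors: List[Vec3]      # cf. surface normal vectors 9101B


@dataclass
class ObjectRecognitionTemplate:
    template_id: str                # hypothetical identifier
    measurement_2d: Measurement2D
    measurement_3d: Measurement3D


@dataclass
class Pose:
    position: Vec3                  # where the template is placed in the scene
    orientation_rpy: Vec3           # roll, pitch, yaw in radians (assumed convention)


@dataclass
class DetectionHypothesis:
    template: ObjectRecognitionTemplate
    object_image_id: str            # hypothetical key for the matched object image information
    pose: Pose
```

Holding the template by value, as here, corresponds to the embodiment in which the hypothesis includes the template itself; storing only `template_id` would correspond to the embodiment in which the hypothesis holds a reference.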

In operation 11005, hypothesis refinement method 11000 may further include identifying a mismatch between corresponding object recognition template 4300B and the object image information 12002 to which the template was matched according to detection hypothesis 6300. The two-dimensional information of corresponding object recognition template 4300B (e.g., 2D appearance 4302B) may be compared with object image information 12002 to identify the mismatch. The mismatch may be identified or quantified according to regions of non-alignment or other misalignment between 2D appearance 4302B and object image information 12002.

In identifying a mismatch or misalignment between corresponding object recognition template 4300B and object image information 12002, the two-dimensional information of corresponding object recognition template 4300B (e.g., 2D appearance 4302B) may be transformed from two-dimensional space to three-dimensional space for comparison and alignment with object image information 12002. In some instances, 3D appearance 4303B or a 3D transformation of 2D appearance 4302B may be used for comparison with object image information 12002 to identify the mismatch. In some embodiments, the mismatch may be identified or quantified according to the misalignment between object positions 6220 and template positions 6210. Object positions 6220 represent points on the digital representation of object 5012 (e.g., object image information 12002), while template positions 6210 represent points on template object 6290 (as discussed below).

The transformation from two-dimensional space to three-dimensional space may be based on calibration parameters or other parameters of camera 1200 or another image sensor, which may be determined during a camera calibration operation or may be predefined. As described above, corresponding object recognition template 4300B is derived from object registration data 5001 and may have a coordinate system associated with it. In the transformation to three-dimensional space, the coordinate system of corresponding object recognition template 4300B may be mapped to the coordinate system of scene 5013 as captured in image information 12001. Accordingly, the calibration parameters or other parameters of the camera 1200 that captured image information 12001 may be used for the transformation. The information of detection hypothesis 6300 may define a digital representation of an object, referred to herein as template object 6290. The three-dimensional transformation may be referred to as template object 6290 and may represent the information of detection hypothesis 6300 in three-dimensional space, in the coordinate system of image information 12001, for comparison with object image information 12002.
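The source does not state which camera model or calibration parameters are used. As a minimal sketch, assuming an ideal pinhole camera without lens distortion, a pixel with a known depth can be lifted into the camera coordinate system using the focal lengths `fx`, `fy` and the principal point `cx`, `cy`. These parameter names and the function `back_project` are assumptions for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[float, float]
Point3 = Tuple[float, float, float]


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Lift pixel (u, v) at the given depth into camera coordinates (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def back_project_all(pixels: List[Pixel], depths: List[float],
                     fx: float, fy: float, cx: float, cy: float) -> List[Point3]:
    """Apply back_project to matching lists of pixels and depth values."""
    return [back_project(u, v, d, fx, fy, cx, cy) for (u, v), d in zip(pixels, depths)]
```

A pixel at the principal point maps onto the optical axis, and pixels farther from it map to points proportionally farther off axis as depth increases. A complete mapping into the scene coordinate system would additionally apply the camera's extrinsic pose, which is not shown here.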

In operation 11007, hypothesis refinement method 11000 may further include identifying a set of template positions in the corresponding object recognition template that correspond to a set of object positions on the corresponding object. Object positions 6220 represent points on the digital representation of object 5012 (e.g., object image information 12002), while template positions 6210 represent points on template object 6290. Aligning object positions 6220 with template positions 6210 may therefore help refine detection hypothesis 6300.

As described above, template positions 6210 may correspond to gradient extraction positions 5100B and surface normal positions 5101B or a subset thereof. In further embodiments, template positions 6210 may include additional or different positions used for alignment with object positions 6220 of object image information 12002.

Template positions 6210 (and the corresponding object positions 6220) may be selected as positions that have a high impact on hypothesis refinement (e.g., on the alignment between object image information 12002 and template object 6290). In some instances, template positions 6210 and object positions 6220 may be selected as positions around the edges of template object 6290 and object image information 12002, respectively. Such positions may be more useful for performing hypothesis refinement because they are less susceptible to noise and can provide an outline of the object's shape.
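One simple way to pick positions around the edges of an object, shown here only as an illustration, is to start from a binary mask of the object and keep every cell that has at least one 4-connected neighbor outside the object. The mask representation and the function name `edge_positions` are assumptions; the source does not describe how edge positions are selected.

```python
from typing import List, Tuple


def edge_positions(mask: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (row, col) of object cells that touch the background or the image border."""
    rows, cols = len(mask), len(mask[0])
    edges = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            on_edge = any(
                not (0 <= nr < rows and 0 <= nc < cols) or not mask[nr][nc]
                for nr, nc in neighbors
            )
            if on_edge:
                edges.append((r, c))
    return edges
```

Interior cells are dropped, so the surviving positions trace the outline of the object, which is the property the paragraph above identifies as useful for refinement.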

In operation 11009, hypothesis refinement method 11000 may further include adjusting the set of template positions 6210 to converge on the set of object positions 6220. At least one processing circuit 1110 may be further configured to adjust the set of template positions 6210. If a mismatch is identified, at least one processing circuit 1110 may make adjustments to improve the alignment value between template positions 6210 of template object 6290 and the corresponding object positions 6220 of object image information 12002.

The alignment procedure may be performed using an iterative closest point (ICP) technique, as shown in FIGS. 12B and 12C. The ICP technique may include adjusting the template positions 6210 to converge to the set of object positions 6220. A set of vectors 6215 between the template positions 6210 and their corresponding object positions 6220 may be determined. Each vector 6215 extends from a template position 6210 to an object position 6220 and represents a direction and a magnitude. In an embodiment, the direction and magnitude of the vectors may be used to adjust the template positions 6210 so that they converge to the object positions 6220. If the set of vectors 6215 is understood mathematically as forces that have the direction and magnitude of the vectors and act on the template object 6290 at the template positions 6210, the template object 6290 may be adjusted or moved according to the direction and magnitude of the vectors 6215 applied or acting at their respective template positions 6210. Accordingly, vectors 6215 having a greater magnitude, which represent template positions 6210 and object positions 6220 that are farther apart (e.g., having a greater delta or offset), are understood to apply a greater "force" to the template adjustment. For example, referring to FIG. 12B, the template object 6290 may overlay the object image information 12002. The vectors 6215 extend between the template positions 6210 and the object positions 6220. If the vectors 6215 are collectively applied as "forces" to the template object 6290 according to their direction and magnitude, the template object 6290 (as shown in FIG. 12B) will tend to rotate clockwise, coming into closer alignment with the object image information 12002. After the vectors 6215 are applied, a new set of vectors 6215 may be generated and applied in an iterative fashion. In another example, as shown in FIG. 12C, applying the vectors 6215 may cause a translational movement of the template object 6290 that brings it into alignment with the object image information 12002. In some embodiments, through iterative generation and application of the vectors 6215, the template object 6290 moves into better alignment with the object image information 12002 until the remaining vectors 6215 cancel each other out and no further movement is produced. When no further movement can be produced, the alignment quality may be assessed. In some embodiments, the iterative adjustment may be carried out until the alignment quality surpasses a threshold.
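The iterative loop described above can be sketched in code. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: each template position is paired with its nearest object position, the differences stand in for the vectors 6215, and a standard least-squares rigid fit (Kabsch/SVD) is swapped in for a literal "force" simulation. The function names, the brute-force nearest-neighbor search, and the tolerance are all hypothetical.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Return rotation R and translation t minimizing ||(src @ R.T + t) - dst||."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(template_pts, object_pts, max_iters=50, tol=1e-9):
    """Point-to-point ICP; returns adjusted template points and a distance history."""
    pts = np.asarray(template_pts, dtype=float).copy()
    object_pts = np.asarray(object_pts, dtype=float)
    history = []
    for _ in range(max_iters):
        # Pair every template position with its nearest object position.
        d2 = ((pts[:, None, :] - object_pts[None, :, :]) ** 2).sum(axis=2)
        nearest = object_pts[d2.argmin(axis=1)]
        vectors = nearest - pts  # analog of the vectors 6215
        history.append(float(np.linalg.norm(vectors, axis=1).mean()))
        if history[-1] < tol:  # no further movement would be produced
            break
        R, t = best_rigid_transform(pts, nearest)
        pts = pts @ R.T + t  # apply the rotation and translation to the template
    return pts, history
```

The returned `history` records the mean vector magnitude at each iteration, which is one possible input to the alignment-quality measures discussed in the following paragraphs.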

The quality of alignment (or level of misalignment) may be assessed or determined in a number of different ways. For example, the quality of alignment may be assessed or determined according to a level of misalignment defined by the direction and magnitude of the vectors 6215. The quality of alignment may also be assessed or determined according to a distance measurement between the new, updated, or adjusted set of template positions 6210 and the set of object positions 6220. The quality of alignment may also be assessed or determined according to a rate of convergence. In embodiments, any combination of these alignment quality measures may be used.

The quality of alignment may be determined based on a level of misalignment defined by the direction and magnitude of each new or updated vector 6215. As discussed above, the vectors 6215 may be interpreted mathematically as forces acting on the template object 6290 according to their direction and magnitude. An object that is stationary and subjected to forces experiences stress. In embodiments, the level of misalignment may be computed by mathematically treating the vectors 6215 as forces that generate internal stress in the template object 6290. Thus, for example, equal and opposite vectors 6215 do not cancel each other out (as they would if the vectors 6215 were simply added together) but instead generate "stress" in the template object 6290. When the level of alignment quality is good (and the level of misalignment is low), the vectors 6215 are relatively small in magnitude and therefore correspond to low internal stress. When the alignment quality is poor (and the level of misalignment is high), the vectors 6215 are larger and therefore correspond to more significant internal stress. This computed internal stress may be considered indicative of the alignment quality.
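The difference between simply adding the vectors and treating them as stress-producing forces can be illustrated with a small sketch. The disclosure does not give a stress formula, so the mean vector magnitude used below is only an assumed stand-in, and both function names are hypothetical.

```python
import numpy as np


def net_force(vectors):
    """Magnitude of the plain vector sum; opposing vectors cancel here."""
    return float(np.linalg.norm(np.asarray(vectors, dtype=float).sum(axis=0)))


def stress_level(vectors):
    """Assumed stress proxy: mean magnitude, so opposing vectors do not cancel."""
    return float(np.linalg.norm(np.asarray(vectors, dtype=float), axis=1).mean())
```

For two equal and opposite unit vectors, `net_force` is zero (no movement is produced) while `stress_level` remains 1.0, which still signals misalignment.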

In an embodiment, the quality of alignment may be determined based on a distance measurement between the new or updated set of template positions 6210 and the set of object positions 6220. The distance measurement may be a Euclidean distance measurement, i.e., the length of the line segment between two points in Euclidean space. The Euclidean distance (or Pythagorean distance) may be expressed by the following formula:

$$ d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + (p_3 - q_3)^2} $$

where:

d = the distance

p = a first point having 3D coordinates p_1, p_2, p_3

q = a second point having 3D coordinates q_1, q_2, q_3

The distance measurement generated via the above equation outputs a distance value (greater than or equal to zero), where output values closer to zero represent a closer distance between the points p and q (zero representing no distance, i.e., identical or overlapping points). The distance measurements between each of the new or updated template positions 6210 and the corresponding object positions 6220 may be combined, for example, by taking an arithmetic mean or a geometric mean. The combined distance value may then be compared to a predetermined threshold: a distance value equal to or below the predetermined threshold (i.e., between zero and the predetermined threshold) indicates a good match between the template object 6290 and the object image information 12002, and a distance value greater than the predetermined threshold indicates a mismatch.
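The Euclidean distance computation, its combination by arithmetic or geometric mean, and the threshold comparison can be sketched as follows. This is an illustration only; the function names and the assumption that template and object positions are already paired row by row are hypothetical.

```python
import numpy as np


def euclidean_distances(template_pts, object_pts):
    """Per-pair Euclidean distance between corresponding 3D points (rows)."""
    diff = np.asarray(template_pts, dtype=float) - np.asarray(object_pts, dtype=float)
    return np.linalg.norm(diff, axis=1)


def combined_distance(distances, mode="arithmetic"):
    """Combine per-pair distances into a single value."""
    distances = np.asarray(distances, dtype=float)
    if mode == "arithmetic":
        return float(distances.mean())
    if mode == "geometric":
        if np.any(distances == 0.0):
            return 0.0  # a single zero distance drives the geometric mean to zero
        return float(np.exp(np.log(distances).mean()))
    raise ValueError(f"unknown mode: {mode}")


def is_good_match(distance_value, threshold):
    """A value at or below the threshold indicates a good match."""
    return distance_value <= threshold
```

Note that the geometric mean collapses to zero as soon as one pair coincides exactly, so the arithmetic mean may be the more robust choice when some points already overlap.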

In an embodiment, the distance measurement may be a cosine distance between the surface normal vectors associated with the new set of template positions 6210 (i.e., template vectors 6260) and the surface normal vectors associated with the set of object positions 6220 (i.e., object vectors 6270). The template vectors 6260 may include some or all of the previously determined surface normal vectors 9101 associated with the corresponding object recognition template 4300B. The object vectors 6270 may include the surface normal vectors 8101 associated with the object image information 12002. The measured cosine distance may be indicative of the angle between the surface normal vectors (e.g., the template vectors 6260 and the object vectors 6270), and the indicated angle correlates directly with the degree or quality of alignment between those surface normal vectors. The cosine distance may be expressed by the following formula.

Cosine distance = 1 - cosine similarity,

where the cosine similarity is expressed by the following formula:

$$ \text{cosine similarity}(X, Y) = \frac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2}\,\sqrt{\sum_{i} y_i^2}} $$

where x_i and y_i are the components of the vectors X and Y,

or, in the alternative:

$$ \text{cosine similarity}(X, Y) = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert} $$

The distance measurement generated by the above equations outputs a value indicative of the distance between two surface normal vectors (i.e., as a cosine distance). This output value may further indicate the angle between a template vector 6260 and an object vector 6270 or, more specifically, the angle between a planar portion of the template object 6290 and a planar portion of the object image information 12002. A planar portion refers to a surface from which the surface normal vectors extend and over which they are parallel. An output corresponding to a small angle may indicate a good match (i.e., good convergence or alignment) between the planar portion of the template object 6290 and the planar portion of the object image information 12002. The cosine distances between each corresponding pair of template vector 6260 and object vector 6270 may be combined to produce a distance measurement, for example, by taking an arithmetic or geometric mean.
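A short sketch of the cosine distance between paired surface normals, combined by arithmetic mean, is given below. The function names are hypothetical, and the normals are assumed to be non-zero and already paired row by row.

```python
import numpy as np


def cosine_distances(template_normals, object_normals):
    """Per-pair cosine distance (1 - cosine similarity) between row vectors."""
    a = np.asarray(template_normals, dtype=float)
    b = np.asarray(object_normals, dtype=float)
    dots = np.einsum("ij,ij->i", a, b)  # row-wise dot products
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return 1.0 - dots / norms


def mean_cosine_distance(template_normals, object_normals):
    """Arithmetic mean of the per-pair cosine distances."""
    return float(cosine_distances(template_normals, object_normals).mean())


def angle_from_cosine_distance(distance):
    """Angle in radians implied by a cosine distance."""
    return float(np.arccos(np.clip(1.0 - distance, -1.0, 1.0)))
```

A cosine distance of 0 corresponds to parallel normals, 1 to perpendicular normals, and 2 to opposed normals, so smaller values indicate better alignment of the planar portions.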

In another embodiment, the distance measurement may be a planar distance measurement, measured from one of the template positions 6210 to a plane containing the corresponding point from the object positions 6220, or vice versa. The planar distances between each corresponding pair of template vector 6260 and object vector 6270 may be combined to produce a distance measurement, for example, by taking an arithmetic or geometric mean.
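The planar distance can be sketched as a point-to-plane distance. The sketch assumes, as an interpretation not spelled out in the text, that the plane through each object position is oriented by that position's surface normal; the function name is hypothetical.

```python
import numpy as np


def point_to_plane_distances(template_pts, object_pts, object_normals):
    """Distance from each template point to the plane through the paired
    object point, where the plane is oriented by the object surface normal."""
    p = np.asarray(template_pts, dtype=float)
    q = np.asarray(object_pts, dtype=float)
    n = np.asarray(object_normals, dtype=float)
    n_unit = n / np.linalg.norm(n, axis=1, keepdims=True)
    # Project the offset (p - q) onto the unit normal of each plane.
    return np.abs(np.einsum("ij,ij->i", p - q, n_unit))
```

Unlike the Euclidean distance, this measure ignores sliding within the plane, so a template point that lies anywhere on the object's planar surface scores zero.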

The quality of alignment between the template object 6290 and the object image information 12002 may further be determined according to a profile indicating how the distance decreases over successive iterations of the ICP technique. As discussed above, the ICP technique may be used to align the template object 6290 with the object image information 12002 by converging the template positions 6210 to the object positions 6220. During the successive iterations, distance measurements (e.g., cosine distance, Euclidean distance, planar distance, etc.) between the template object 6290 and the object image information 12002 may be obtained. The profile may indicate the change in these distances over the successive iterations.

For example, a profile showing a consistent decrease in distance over successive ICP iterations may indicate high alignment quality in terms of convergence between the template object 6290 and the object image information 12002. Conversely, if the profile shows that there are successive ICP iterations in which the distance increases, or in which the distance otherwise does not decrease very rapidly, the profile may indicate that the template object 6290 and the object image information 12002 do not exhibit high-quality convergence and that the final alignment between the template object 6290 and the object image information 12002 may be of low quality.
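One way to evaluate such a profile is sketched below. The criteria used here (no increase between iterations and a minimum overall relative decrease) and the default value of `min_relative_drop` are assumptions chosen for illustration; the disclosure does not fix specific numbers.

```python
import numpy as np


def assess_profile(distances, min_relative_drop=0.1):
    """Classify a per-iteration distance profile as high- or low-quality convergence."""
    d = np.asarray(distances, dtype=float)
    steps = np.diff(d)
    never_increases = bool(np.all(steps <= 0.0))
    relative_drop = float((d[0] - d[-1]) / d[0]) if d[0] > 0 else 0.0
    return {
        "never_increases": never_increases,
        "relative_drop": relative_drop,
        "high_quality": never_increases and relative_drop >= min_relative_drop,
    }
```

A profile such as `[1.0, 0.5, 0.2, 0.1]` is classified as high quality, whereas a profile that rises at any iteration, or that barely decreases, is not.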

In operation 11011, the hypothesis refinement method 11000 may include generating an adjusted detection hypothesis. The adjusted detection hypothesis 6300' may be generated according to the adjustments made to the template positions 6210, as described above. The adjustments may represent adjusted versions of the various pieces of information stored in the detection hypothesis 6300. For example, the adjusted detection hypothesis 6300' may include information associating the object image information 12002 with an adjusted corresponding object recognition template 4300B', and may include adjusted pose information 6301'. The adjusted corresponding object recognition template 4300B' may include one or more of an adjusted 2D appearance 4302B', an adjusted 3D appearance 4303B', adjusted 2D measurement information 4304B', and adjusted 3D measurement information 4305B'. The adjusted 2D measurement information 4304B' may include adjusted gradient information 9100B' and adjusted gradient extraction positions 5100B', while the adjusted 3D measurement information 4305B' may include adjusted surface normal vectors 9101B' and adjusted surface normal positions 5101B'. The adjusted object recognition template 4300B' may further include adjusted template positions 6210', which may include the adjusted gradient extraction positions 5100B' and the adjusted surface normal positions 5101B', or a subset thereof. It is not required that every "adjusted" version of the information contained in the adjusted detection hypothesis 6300' differ from the corresponding information in the detection hypothesis 6300. For example, in embodiments, a position may be adjusted while the information associated with that position (gradient and surface normal) remains the same. In embodiments, the adjusted information may be captured by storing information about the adjustments in conjunction with storing the original detection hypothesis 6300.
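The last option above, keeping the original detection hypothesis and storing only the adjustment alongside it, could look like the following sketch. The class and field names are hypothetical, and the pose is assumed to be a rotation matrix plus a translation vector, with the adjustment applied after the original pose.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class DetectionHypothesis:
    template_id: str          # reference to an object recognition template
    rotation: np.ndarray      # 3x3 rotation of the template in the scene
    translation: np.ndarray   # 3-vector position of the template in the scene


@dataclass(frozen=True)
class AdjustedDetectionHypothesis:
    original: DetectionHypothesis   # left untouched
    delta_rotation: np.ndarray      # rotation found during refinement
    delta_translation: np.ndarray   # translation found during refinement

    def pose(self):
        """Compose the stored adjustment with the original pose."""
        R = self.delta_rotation @ self.original.rotation
        t = self.delta_rotation @ self.original.translation + self.delta_translation
        return R, t
```

Storing the delta rather than overwriting the pose keeps the initial hypothesis available for comparison and makes the refinement reversible.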

The present disclosure further relates to verification of detection hypotheses. FIG. 13 shows a flow diagram of an exemplary detection hypothesis verification method 13000 for verifying detection hypotheses. The following description of detection hypothesis verification refers to FIG. 14. The detection hypothesis verification method 13000 may operate on one or more previously obtained detection hypotheses to verify a particular detection hypothesis as corresponding to a specific physical object detected in a scene. As discussed above, through template matching and the generation and refinement of detection hypotheses, multiple detection hypotheses may be proposed as pertaining to or describing a particular physical object in the scene. The detection hypothesis verification method 13000 may receive object image information of a particular physical object together with a set of detection hypotheses associated with it, and may verify the multiple detection hypotheses to determine the optimal or best-fit detection hypothesis. The set of detection hypotheses may be initial detection hypotheses (such as the detection hypothesis 6300), adjusted detection hypotheses (such as the adjusted detection hypothesis 6300'), or a combination thereof. The at least one processing circuit 1110 may perform specific steps of the detection hypothesis verification method 13000 to verify a detection hypothesis 8300, as described below.

In an embodiment, the detection hypothesis verification method 13000 may be performed by, for example, the computing system 1100 (or 1100A/1100B/1100C) of FIGS. 2A-2D or the computing system 1100 of FIGS. 3A-3B, or more specifically by the at least one processing circuit 1110 of the computing system 1100. In some scenarios, the computing system 1100 may perform the detection hypothesis verification method 13000 by executing instructions stored on a non-transitory computer-readable medium (e.g., 1120). For example, the instructions may cause the computing system 1100 to execute one or more of the modules illustrated in FIG. 2D, which may perform the detection hypothesis verification method 13000. For example, in embodiments, the steps of the method 13000 may be performed by the hypothesis generation module 1128, the hypothesis refinement module 1136, and the hypothesis verification module 1138 operating in conjunction.

The detection hypothesis verification method 13000 may be used to verify one or more detection hypotheses 8300 of a detection hypothesis set 8309 generated to identify one or more objects 5012 physically located within a scene 5013. The detection hypothesis verification method 13000 may operate on image information 12001 obtained from the scene 5013. The image information 12001 may be similar to the 2D image information 2600 and the 3D image information 2700. Within the image information 12001 there may be one or more items of object image information 12002 representing the objects 5012 within the scene 5013.

In the following discussion, the detection hypothesis verification method 13000 is described in terms of a detection hypothesis set 8309 pertaining to a single object 5012 to be identified. As discussed below, the detection hypothesis verification method 13000 operates to identify the best detection hypothesis 8300 corresponding to the single object 5012. In other embodiments, the detection hypothesis set 8309 may include detection hypotheses 8300 pertaining to multiple objects 5012 in the scene 5013. Each individual object 5012 may have a corresponding group of detection hypotheses 8300 from the detection hypothesis set 8309, which may be verified according to the method described with respect to the corresponding individual object 5012. In this manner, the detection hypothesis verification method 13000 may be used to verify and identify the best detection hypothesis 8300 for a single object 5012, or to identify multiple best detection hypotheses 8300, each corresponding to a different individual object 5012. Verifying multiple detection hypotheses 8300 makes it possible to plan and execute complex picking operations that involve picking multiple objects 5012 in succession.

In the detection hypothesis verification method 13000, the at least one processing circuit 1110 may be in communication with a robot 3300 having a robot arm 3320 and an end effector apparatus 3330 connected to the arm, and with a camera 1200 having a field of view, and may be configured to execute instructions stored on a non-transitory computer-readable medium when one or more objects 5012 are or have been in the field of view. In embodiments, the at least one processing circuit 1110 may not communicate directly with the robot 3300 and may instead send information to and receive information from the robot 3300 via a network and/or a storage device. In other embodiments, the at least one processing circuit 1110 may communicate directly with the robot 3300. The at least one processing circuit 1110 may obtain image information 12001 of the one or more objects 5012 in the scene 5013. The at least one processing circuit 1110 may also obtain one or more detection hypotheses 8300 and/or a detection hypothesis set 8309.

Each detection hypothesis 8300 may include information associating an object recognition template 4300 (e.g., a corresponding object recognition template 4300C selected from among multiple object recognition templates 4300) with the object image information 12002, and may include pose information 6301 of the object 5012 represented by the object image information 12002. The pose information 6301 of the object 5012 may refer to the position and orientation of the object 5012. In embodiments, the detection hypothesis 8300 may include the corresponding object recognition template 4300C or may include a reference to the corresponding object recognition template 4300C.

In operation 13001, the detection hypothesis verification method 13000 includes obtaining image information of one or more objects in a scene. Operation 13001 may be similar to operation 11001, discussed above. Obtaining the image information 12001 may include capturing an image of the scene 5013. In such instances, the image information 12001 may represent objects 5012 located in a box, bin, case, crate, pallet, or other container. The image information 12001 may be acquired by the camera 1200, as discussed herein. The at least one processing circuit 1110 may be configured to generate, receive, and/or process the image information 12001, for example by using the image information 12001 to distinguish individual objects in the field of view of the camera 1200 and to perform object recognition or object registration based on the image information 12001. In an embodiment, the image information 12001 may include two-dimensional image information (e.g., similar to the 2D image information 2600) that describes the visual appearance of the environment or scene 5013 in the field of view of the camera 1200. In an embodiment, the image information 12001 may include three-dimensional image information (e.g., similar to the 3D image information 2700) that provides a point cloud, spatial structure information, a depth map, or another three-dimensional image of the scene 5013 in the field of view of the camera 1200. The three-dimensional image information in this example may be used to estimate how the objects 5012 are spatially arranged in three-dimensional space (e.g., the scene 5013). Obtaining the image information 12001 may include generating or obtaining the image information 12001 representing the scene 5013 and, as needed, generating or obtaining one or more items of object image information 12002 representing an individual object 5012 or multiple objects 5012 of the scene 5013. The object image information 12002 may include 2D image information 12600 representing the object 5012. The 2D image information 12600 may be similar to the 2D image information 2600 and/or may include rendered 2D image information generated according to rendering techniques such as ray tracing and discontinuity detection. The object image information 12002 may include 3D image information 12700 representing the object 5012, similar to the 3D image information 2700. The image information 12001 may be generated by the camera 1200 when the objects 5012 are (or have been) in the field of view of the camera 1200, and may include, for example, two-dimensional image information and/or three-dimensional image information.

The image information 12001 may be the same image information 12001 obtained during performance of the hypothesis refinement method 11000. Thus, the computing system 1100 may obtain the image information 12001 for performance of the hypothesis refinement method 11000, store the image information 12001, and access the image information 12001 for performance of the detection hypothesis verification method 13000. In embodiments, the image information 12001 may instead be newly obtained specifically for performance of the detection hypothesis verification method 13000.

As discussed above, the image information 12001 may include a two-dimensional grayscale and/or color image and may describe the appearance of the scene 5013 (and/or the objects 5012 within the scene) from the viewpoint of the camera 1200. In an embodiment, the image information 12001 may correspond to a single color channel (e.g., a red, green, or blue channel) of a color image. If the camera 1200 is disposed above the objects 5012, the two-dimensional image information may represent the appearance of the respective top surfaces of the objects 5012. Further, the image information 12001 may include three-dimensional image information, which may include, for example, a depth map or a point cloud indicating respective depth values of various object positions 6220 on one or more surfaces (e.g., a top surface or other outer surface) of the objects 5012 or along one or more edges of the objects 5012. The two-dimensional image information and the three-dimensional image information of the object image information 12002 may be referred to as the 2D image information 12600 and the 3D image information 12700, respectively. In some embodiments, object positions 6220 representing the physical edges of an object 5012 may be used to identify object image information 12002 that is limited to representing that individual object 5012.

In operation 13003, the detection hypothesis verification method 13000 may further include obtaining one or more detection hypotheses 8300 and/or a detection hypothesis set 8309. For ease of explanation, unless otherwise noted, the described attributes and qualities of a particular detection hypothesis 8300 may be understood to apply to each of the detection hypotheses 8300 of the detection hypothesis set 8309. A detection hypothesis 8300 may be obtained as an adjusted detection hypothesis 6300' following performance of the hypothesis refinement method 11000. A detection hypothesis 8300 may also be obtained as an initial detection hypothesis 8300 from the template matching operation, as described above.

The detection hypothesis 8300 may include a corresponding object recognition template 4300C and object pose information 8301 indicating the position and orientation of the corresponding object recognition template 4300C needed to overlay the corresponding object image information 12002 within the image information 12001. The corresponding object recognition template 4300C may include one or more of a 2D appearance 4302C, a 3D appearance 4303C, 2D measurement information 4304C, and 3D measurement information 4305C. As discussed above, the 2D measurement information 4304C may include gradient information 9100C and gradient extraction positions 5100C, while the 3D measurement information 4305C may include surface normal vectors 9101C and surface normal positions 5101C. The corresponding object recognition template 4300C may further include template positions 8210, which may include the gradient extraction positions 5100B and the surface normal positions 5101B, or a subset thereof. The information of the detection hypothesis 8300 may define a digital representation of an object, referred to herein as a template object 8290. The template object 8290 represents the information of the detection hypothesis 6300 in three-dimensional space, in the coordinate system of the image information 12001, for comparison with the object image information 12002.

The detection hypothesis set 8309, and in particular the size of the set, may be selected or determined so as to balance speed and thoroughness. Selecting a larger number of detection hypotheses 8300 may provide a higher chance of achieving a good match, but may also take longer to process. As discussed above with respect to hypothesis refinement method 11000, the quality of an alignment may be measured or determined during the steps associated with refinement. Exceeding a quality threshold may be a marker that causes hypothesis refinement method 11000 to be determined as complete. Similarly, exceeding the quality threshold may be treated as a marker that permits an adjusted detection hypothesis 6300' to be included in the detection hypothesis set 8309. Failure to exceed the quality threshold may result in the adjusted detection hypothesis 6300' being excluded. Accordingly, the size of the detection hypothesis set 8309 may be driven by how stringent the quality threshold is. In some embodiments, the size of the detection hypothesis set 8309 may be limited, and only the adjusted detection hypotheses 6300' with the highest quality alignment are included. In embodiments, both a quality threshold and a ranked order may be used. In embodiments, it may be beneficial to use template matching and hypothesis refinement techniques that generate a large detection hypothesis set 8309 (e.g., more than 500, 1,000, or 10,000 total detection hypotheses), with the understanding that many false positives will be generated. Such embodiments may rely on detection hypothesis validation method 13000, discussed below, to filter out the false positives.
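The combination of a quality threshold and a ranked order described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `RefinedHypothesis` type, the `alignment_quality` field (assumed here to be "higher is better"), and the `max_size` cap are hypothetical names introduced only for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RefinedHypothesis:
    """Hypothetical stand-in for an adjusted detection hypothesis."""
    template_id: str
    alignment_quality: float  # assumed scale: higher means better alignment


def build_hypothesis_set(candidates: List[RefinedHypothesis],
                         quality_threshold: float,
                         max_size: Optional[int] = None) -> List[RefinedHypothesis]:
    """Keep hypotheses that exceed the threshold, best first, optionally capped."""
    kept = [h for h in candidates if h.alignment_quality > quality_threshold]
    kept.sort(key=lambda h: h.alignment_quality, reverse=True)
    return kept if max_size is None else kept[:max_size]
```

A stricter `quality_threshold` or a smaller `max_size` trades thoroughness for speed; leaving `max_size` unset corresponds to the large-set variant that relies on later validation to remove false positives.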

In operation 13005, detection hypothesis validation method 13000 includes validating each detection hypothesis of the detection hypothesis set. A plurality of detection hypotheses 8300 are obtained and compared with the object image information 12002 of image information 12001 to identify which detection hypothesis 8300 is the best estimate, or best fit, for describing the physical object 5012 represented by the object image information 12002. Selecting the best detection hypothesis from the detection hypothesis set 8309 involves validating each of the detection hypotheses according to operations 13007 to 13011, described below. Hypothesis validation may include generating three-dimensional and two-dimensional validation scores and filtering the detection hypothesis set 8309 according to them.

In operation 13007, detection hypothesis validation method 13000 includes generating a plurality of three-dimensional validation scores. Each three-dimensional validation score may be based on a comparison between the three-dimensional information of a detection hypothesis 8300 and the corresponding three-dimensional information of the image information corresponding to an object from the scene (e.g., object image information 12002). The plurality of three-dimensional validation scores may include at least one of an occlusion validator score, a point cloud validator score, a hole match validator score, and a normal vector validator score. The three-dimensional information of detection hypothesis 8300 may include the 3D appearance 4303C and the 3D measurement information 4305C, which includes surface normal vectors 9101C and surface normal locations 5101C. The three-dimensional information of object image information 12002 may include 3D image information 12700, surface normal locations 8101, and surface normal vectors 8103.

A validator score, as discussed herein, may be a score or number that represents how well a particular detection hypothesis corresponds to, or aligns with, object image information 12002. A validator score may be a penalty score applied to a hypothesis confidence score, in which case lower values represent a better fit. Alternatively, a validator score may be a bonus score, in which case higher values represent a better fit. For ease of explanation, the validator scores discussed herein are treated as penalty scores, but it is understood that all of the same concepts and techniques may be applied using bonus scores.

The occlusion validator score and the point cloud validator score may each be obtained by comparing object locations 6220 of object image information 12002 with the surface of the template object 8290 represented by detection hypothesis 8300, and identifying discrepancies between the object locations 6220 and the surface of template object 8290. The three-dimensional information of the detection hypothesis may indicate the location of the surface of template object 8290. If the three-dimensional information of detection hypothesis 8300 does in fact represent object 5012 in scene 5013, the object locations 6220 associated with 3D image information 12700 should fall on or close to that surface. If they do not fall close to the surface, the match determined by the template matching operation may be a false positive. Comparing the object locations 6220 with the surface of template object 8290 may identify valid points as well as two types of discrepancy: occlusions and invalid points. Discrepancies that place an object location 6220 above or otherwise outside the surface of template object 8290 are referred to as occlusions and may be used to calculate the occlusion validator score. Discrepancies that place an object location 6220 below the surface of template object 8290 are referred to as invalid points and may be used to calculate the point cloud validator score. Object locations 6220 that fall on or near the surface of template object 8290 (within a threshold distance, also referred to as a skin depth parameter) may be referred to as valid points. Some amount of deviation between the object locations 6220 and the surface of template object 8290 is to be expected. Such deviation may be accounted for by the skin depth parameter, the size of which determines the amount of deviation tolerated.
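The three-way classification above can be illustrated with a short sketch. It assumes, purely for illustration, that each object location has already been reduced to a signed distance from the template surface, positive when the point lies above or outside the surface and negative when it lies below it. The function names and the sign convention are hypothetical and do not come from the source.

```python
from typing import Dict, Iterable


def classify_point(signed_distance: float, skin_depth: float) -> str:
    """Label one point relative to the template surface.

    signed_distance > 0: above/outside the surface (assumed convention)
    signed_distance < 0: below/inside the surface
    """
    if abs(signed_distance) <= skin_depth:
        return "valid"  # on or near the surface, within the skin depth
    return "occlusion" if signed_distance > 0 else "invalid"


def classify_points(signed_distances: Iterable[float], skin_depth: float) -> Dict[str, int]:
    """Count valid points, occlusions, and invalid points."""
    counts = {"valid": 0, "occlusion": 0, "invalid": 0}
    for d in signed_distances:
        counts[classify_point(d, skin_depth)] += 1
    return counts
```

A larger `skin_depth` tolerates more sensor noise at the cost of accepting points that are farther from the hypothesized surface.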

The occlusion validator score is obtained by identifying discrepancies that place object locations 6220 above or outside the surface of template object 8290. These discrepancies are referred to as occlusions. The occlusion validator score results in a weighted penalty against the hypothesis confidence score, where the weight depends on the distance of the object location 6220 from the surface of template object 6290. The occlusion validator score may be calculated as a function of the distance from the surface of template object 6290. The function may be, for example, a log-normal function, where the peak of the log-normal curve represents a distance from the surface that coincides with 3D points that are near the surface of template object 8290 but are unlikely to be part of template object 8290. In embodiments, the function may be selected so that its peak lies at a distance just beyond the point at which the sensor or camera capturing image information 12001 loses accuracy. For example, an object location 6220 that lies beyond the surface of template object 6290 by a very large distance may have a lower penalty applied to it, because such an object location 6220 is likely to result from occlusion by another object 5012 located between the matching portion of scene 5013 and camera 1200, or from noise in image information 12001, rather than from an actual point on the object 5012 represented by object image information 12002. Accordingly, the penalty of the occlusion validator score for a particular object location 6220 may initially increase with distance, lowering the confidence in the detection hypothesis. Once the distance increases past the peak, it becomes increasingly likely that the particular object location 6220 was not generated by the object 5012 represented by object image information 12002, and the penalty decreases.
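One way to picture the rise-then-fall penalty is a log-normal curve rescaled so that its peak equals 1. The sketch below is an assumption-laden illustration: the parameter names `peak_distance` and `sigma`, the default `sigma`, and the normalization are choices made for the example, not values given in the source. It uses the fact that the mode of a log-normal density with parameters $\mu$ and $\sigma$ is $e^{\mu - \sigma^2}$.

```python
import math


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Log-normal probability density; zero for non-positive x."""
    if x <= 0:
        return 0.0
    coeff = 1.0 / (x * sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2))


def occlusion_penalty_weight(distance: float, peak_distance: float, sigma: float = 0.5) -> float:
    """Penalty weight in [0, 1] that peaks at peak_distance.

    mu is chosen so the mode exp(mu - sigma^2) equals peak_distance,
    and the density is divided by its peak value (assumed normalization).
    """
    mu = math.log(peak_distance) + sigma ** 2
    return lognormal_pdf(distance, mu, sigma) / lognormal_pdf(peak_distance, mu, sigma)
```

Points just outside the surface receive a growing penalty up to `peak_distance`; points much farther away, which are more plausibly another object or noise, receive a shrinking one.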

In embodiments, an occlusion confidence score may be determined for the occlusion validator score. The occlusion confidence score represents the level of confidence that the occlusion validator score provides good information on which decisions may be based. The object locations 6220 may represent points or locations for which there is confidence that they belong to the object. The object image information 12002, however, may include additional points that are not confidently identified as belonging to object 5012. The occlusion confidence score may be based on the ratio of the number of object locations 6220 to the total number of visible points in object image information 12002. Accordingly, when the object locations 6220 that are confidently attributed to the object make up a lower proportion of all visible points, there is less confidence that an occlusion validator score based on the object locations 6220 provides accurate information, and the associated occlusion confidence score is correspondingly lower. In some embodiments, the final occlusion validator score may be represented by an initial occlusion validator score modified according to the occlusion confidence score.
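The confidence ratio can be written down directly. In the sketch below, the ratio follows the description above, whereas scaling the initial score by the confidence is only one assumed way of "modifying" it; the source does not specify the operation. Function names are hypothetical.

```python
def score_confidence(num_object_locations: int, num_visible_points: int) -> float:
    """Ratio of confidently attributed object locations to all visible points."""
    if num_visible_points <= 0:
        return 0.0
    return min(1.0, num_object_locations / num_visible_points)


def apply_confidence(initial_score: float, confidence: float) -> float:
    """Assumed modification: scale the penalty by its confidence."""
    return initial_score * confidence
```

For instance, 50 confident object locations out of 200 visible points give a confidence of 0.25, so an initial penalty of 8.0 would contribute 2.0 under this assumed scaling.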

The point cloud validator score is obtained by identifying discrepancies that place object locations 6220 inside template object 8290 or below its surface. These discrepancies are referred to as invalid points. The point cloud validator score results in a penalty against the hypothesis confidence score. An object location 6220 identified as an invalid point, e.g., one that lies below the surface of template object 8290, may be a strong indicator that detection hypothesis 8300 is incorrect, and may result in a correspondingly high penalty score. In embodiments, the point cloud validator score may be based on the number of invalid points, or on the ratio of the number of invalid points to an invalid point cutoff value.
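A ratio-based point cloud validator score might look like the following sketch. Capping the ratio at 1.0 once the cutoff is reached is an assumption made for the example, as are the function and parameter names.

```python
def point_cloud_validator_score(num_invalid_points: int, invalid_point_cutoff: int) -> float:
    """Penalty in [0, 1]: invalid points relative to a cutoff value.

    The cap at 1.0 is an assumed design choice, not stated in the source.
    """
    if invalid_point_cutoff <= 0:
        raise ValueError("invalid_point_cutoff must be positive")
    return min(1.0, num_invalid_points / invalid_point_cutoff)
```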

The point cloud validator score may have a point cloud confidence score, determined in the same manner as described above for the occlusion confidence score, e.g., according to the ratio of the number of object locations 6220 to the total number of visible points in object image information 12002. In embodiments, the final point cloud validator score may be represented by the point cloud validator score modified according to the point cloud confidence score.

In embodiments, the point cloud validator score and the occlusion validator score may be combined into a single surface validator score. The surface validator score may be determined as a combination of the point cloud validator score and the occlusion validator score, e.g., by adding, averaging, or performing another mathematical operation to combine the two.

The normal vector validator score may be obtained by determining whether the valid points, identified according to object locations 6220 on or near the surface of template object 6290, have surface normal vectors 8103 that match the orientation of the surface of template object 6290. This determination may be made by comparing the surface normal vector 8103 associated with an object location 6220 with the corresponding surface normal vector 9101C associated with the corresponding surface normal location 5101C of the corresponding object recognition template 4300C. When a surface normal vector 8103 does not align with, or match the orientation of, the corresponding surface normal vector 9101C, the normal vector validator score may be applied as a penalty against the detection hypothesis confidence score. In embodiments, the amount of mismatch or misalignment may affect the size of the penalty applied.

In embodiments, some tolerance may be provided for situations in which the surface normal vectors 8103 are not expected to align with, or match the orientation of, the corresponding surface normal vectors 9101C even when the detection hypothesis is accurate. For example, an object such as a gear with many teeth may have portions that exhibit edges and abrupt changes in surface normal vectors. Such an object structure may cause large deviations between the surface normal vectors 8103 and the corresponding surface normal vectors 9101C even when there is only a slight misalignment between object image information 12002 and the template object 8290 overlaid on the scene. To account for such scenarios, the at least one processing circuit 1110 may check whether the corresponding object recognition template 4300C or image information 12001 has regions with high variation in the surface normal vectors 9101C/8103. If the result is positive, the at least one processing circuit 1110 may apply a greater amount of tolerance by lowering the normal vector validator score for differences, in the high-variation regions, between the corresponding surface normal vectors 9101C of the corresponding object recognition template 4300C and the surface normal vectors 8103 of object image information 12002.
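The normal comparison and the relaxed tolerance for high-variation regions can be sketched as below. The angular tolerances, the per-point `high_variation` flags, and the way excess angle is turned into a penalty are all assumptions chosen for illustration; the source only states that greater misalignment may yield a greater penalty and that more tolerance may be applied in high-variation regions.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def angle_between_deg(n1: Vec3, n2: Vec3) -> float:
    """Angle in degrees between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norms = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    cos_angle = max(-1.0, min(1.0, dot / norms))  # clamp against rounding error
    return math.degrees(math.acos(cos_angle))


def normal_vector_penalty(scene_normals: Sequence[Vec3],
                          template_normals: Sequence[Vec3],
                          tolerance_deg: float = 15.0,
                          high_variation: Optional[Sequence[bool]] = None,
                          relaxed_tolerance_deg: float = 45.0) -> float:
    """Mean penalty over paired normals; only angle beyond the tolerance counts."""
    total = 0.0
    for i, (s, t) in enumerate(zip(scene_normals, template_normals)):
        relaxed = high_variation is not None and high_variation[i]
        tol = relaxed_tolerance_deg if relaxed else tolerance_deg
        excess = angle_between_deg(s, t) - tol
        if excess > 0:
            total += excess / 180.0
    return total / max(1, len(scene_normals))
```

Flagging a point as lying in a high-variation region (for example, near a gear tooth) lowers its contribution to the penalty for the same angular mismatch.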

In embodiments, the surface normal validator score may have a surface normal confidence level associated with it. The surface normal confidence level may represent the level of confidence in the information provided by the surface normal validator score. In an embodiment, the surface normal confidence level may be determined according to the quality of the extracted edges. In an embodiment, the surface normal validator score may be adjusted according to the surface normal confidence level.

The hole match validator score is obtained by comparing the object locations 6220 obtained from object image information 12002 with the structure of template object 8290, as represented by the corresponding object recognition template 4300C, and identifying discrepancies between the object locations 6220 and that structure. Invalid holes or empty coordinates (referred to as hole invalidities) are identified according to object locations 6220 that correspond to empty volumes of the template object's structure, i.e., locations where no structure is present. Because the object locations 6220 of object image information 12002 represent locations on the surface of the physical structure of object 5012, the object 5012 in the scene should not have structure in a space that the corresponding object recognition template 4300C indicates to be empty. The presence of object locations 6220 in a portion that the corresponding object recognition template 4300C indicates to be empty may be due to noise, but may also indicate an incorrect detection hypothesis. Accordingly, the hole match validator score may be determined as a penalty score against the detection hypothesis confidence level for every hole invalidity identified.

In embodiments, the hole match validator score may have a hole match confidence level associated with it. The hole match confidence level may represent the level of confidence in the information provided by the hole match validator score. In an embodiment, the hole match confidence level may be determined according to the quality of the extracted edges. In an embodiment, the hole match validator score may be adjusted according to the hole match confidence level.

In embodiments, tolerance may be provided to account for noise or other situations that can produce hole invalidities even with a correct detection hypothesis. For example, if object image information 12002 includes an object location 6220 that corresponds to an empty space of template object 8290 (e.g., a hole or opening in the object), that object location 6220 may correspond to a portion of another object that happens to be located in that space. Such a scenario may be consistent with an accurate detection hypothesis 8300 for the object 5012 in scene 5013, because the object location 6220 in the supposedly empty space belongs not to the object 5012 represented by the corresponding object recognition template 4300C but to another object. In an embodiment, the hole match validator score may provide greater tolerance when the hole, opening, or empty coordinates of template object 8290 are relatively large, because a larger size increases both the likelihood of irregularities affecting measurement of that space (e.g., an object intersecting, or protruding through, the hole or opening) and the likelihood of another object extending into that space.

In embodiments, the point cloud validator score, occlusion validator score, hole match validator score, and surface normal validator score may be combined into a single 3D validator score. The 3D validator score may be determined as any combination of the point cloud validator score, the occlusion validator score (or the combined surface validator score), the hole match validator score, and the surface normal validator score, e.g., by adding, averaging, or performing another mathematical operation to combine them.

In operation 13009, detection hypothesis validation method 13000 includes generating a plurality of two-dimensional validation scores, which may include at least one of a rendered match validator score and a template matching validator score.

The rendered match validator score is obtained by comparing rendered 2D image information 12600 of image information 12001 with the corresponding 2D appearance 4302C. Generating the rendered match validator score may further involve extracting edge information from both the rendered 2D image information 12600 and the corresponding 2D appearance 4302C. The rendered match validator score may be based on a determination of whether the edges extracted from the 2D image information 12600 align with the edges extracted from the corresponding 2D appearance 4302C. The rendered match validator score may be based on the amount of overlap between the regions defined by the extracted edges, on the average distance between the extracted edges, or on any other suitable metric. The rendered match validator score may be used as a penalty score applied to the detection hypothesis confidence score. In some instances, using rendering (e.g., ray tracing) to generate and extract the edge information may compensate for noise and for other conditions that can cause artifacts, such as shadows or glare from light reflecting off metal objects. In some instances, operation 13009 may also re-render information from the corresponding object recognition template 4300C in order to extract edges from the corresponding object recognition template 4300C.
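As one example of an "average distance between extracted edges" metric, the sketch below computes the mean distance from each edge pixel in one image to its nearest edge pixel in the other. This one-directional nearest-neighbor form is an assumption chosen for illustration; the source does not specify how the distance is measured, and the brute-force search is only suitable for small inputs.

```python
import math
from typing import Sequence, Tuple

Pixel = Tuple[int, int]


def mean_edge_distance(edges_a: Sequence[Pixel], edges_b: Sequence[Pixel]) -> float:
    """Mean distance from each pixel in edges_a to its nearest pixel in edges_b."""
    if not edges_a or not edges_b:
        return math.inf  # no edges to compare: treat as worst case
    nearest = (min(math.dist(p, q) for q in edges_b) for p in edges_a)
    return sum(nearest) / len(edges_a)
```

A value of zero means every rendered edge pixel coincides with an appearance edge pixel; larger values could be mapped to larger penalties.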

In embodiments, the rendered match validator score may have a rendered match confidence level associated with it. The rendered match confidence level may represent the level of confidence in the information provided by the rendered match validator score. In an embodiment, the rendered match confidence level may be determined according to the quality of the extracted edges. In an embodiment, the rendered match validator score may be adjusted according to the rendered match confidence level.

The template matching validator score is obtained by comparing edges extracted from object image information 12002 with an object image derived from the corresponding object recognition template 4300C (e.g., template object 8290 or 2D appearance 4302C). An edge detection algorithm, e.g., a Canny edge detector, may be used to identify object edges directly from object image information 12002 and template edges from the image information stored in the corresponding object recognition template 4300C. The template matching validator score may be determined according to whether there is an offset between the object edges and the template edges, by sliding the template edges relative to the object edges to determine how much sliding, if any, yields a peak response or overlap. The template matching validator score may be based on the amount of sliding, movement, offset, or adjustment required to achieve the peak response or overlap. The greater the amount of movement or sliding required, the higher the template matching validator score and the larger the penalty applied. In other words, more required movement indicates a poorer match.
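The sliding comparison can be illustrated with a brute-force search over small integer shifts. The sketch below assumes edges are already available as lists of pixel coordinates (for example, from a Canny detector run elsewhere); the search window `max_shift`, the pixel-overlap count as the "response", and the Euclidean shift length as the penalty are all assumptions for the example.

```python
import math
from typing import Sequence, Tuple

Pixel = Tuple[int, int]


def best_offset(object_edges: Sequence[Pixel],
                template_edges: Sequence[Pixel],
                max_shift: int = 3) -> Tuple[Pixel, int]:
    """Return the (row, col) shift of the template that maximizes pixel overlap."""
    obj = set(object_edges)
    shifts = [(dr, dc)
              for dr in range(-max_shift, max_shift + 1)
              for dc in range(-max_shift, max_shift + 1)]
    # Try small shifts first so that ties resolve to the smallest movement.
    shifts.sort(key=lambda s: s[0] ** 2 + s[1] ** 2)
    best, best_overlap = (0, 0), -1
    for dr, dc in shifts:
        overlap = sum((r + dr, c + dc) in obj for r, c in template_edges)
        if overlap > best_overlap:
            best, best_overlap = (dr, dc), overlap
    return best, best_overlap


def template_matching_penalty(object_edges: Sequence[Pixel],
                              template_edges: Sequence[Pixel],
                              max_shift: int = 3) -> float:
    """Penalty equal to the length of the shift needed for peak overlap."""
    (dr, dc), _ = best_offset(object_edges, template_edges, max_shift)
    return math.hypot(dr, dc)
```

If the template edges already coincide with the object edges, the best shift is `(0, 0)` and the penalty is zero; a template that must be moved two pixels to reach peak overlap yields a penalty of 2.0.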

In embodiments, the template matching validator score may have a template matching confidence level associated with it. The template matching confidence level may represent the level of confidence in the information provided by the template matching validator score. In an embodiment, the template matching confidence level may be determined according to the quality of the extracted edges. In embodiments, the template matching validator score may be adjusted according to the template matching confidence level.

The three-dimensional validator scores and the two-dimensional validator scores may be combined to determine a total validation score, which may be used in further operations to determine an overall level of confidence in the detection hypothesis. The total validation score may be based on a combination of each of the three-dimensional and two-dimensional validator scores and the confidence value associated with each validator score. For example, a validator score with a higher confidence value and/or a higher score weight may have a greater influence on the total validation score, whereas a validator score with a lower confidence value and/or a lower score weight may have a smaller influence on the total validation score.
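A confidence-weighted sum is one simple way to realize the combination described above. The sketch assumes each validator contributes a `(score, confidence, weight)` triple and that the contributions are multiplied and summed; the source leaves the exact combination open, so this form is illustrative only.

```python
from typing import Iterable, Tuple


def total_validation_score(entries: Iterable[Tuple[float, float, float]]) -> float:
    """Sum of score * confidence * weight over all validator entries.

    Each entry is (penalty score, confidence in [0, 1], relative weight).
    """
    return sum(score * confidence * weight for score, confidence, weight in entries)
```

Under this form, a large penalty held with low confidence can contribute no more than a small penalty held with high confidence.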

In embodiments, operation 13005 may further include an additional validation step of determining whether the corresponding object recognition template 4300C is globally consistent with other structures or objects in the image information 12001 corresponding to scene 5013. Such other structures and objects may include, for example, a container in which workpieces or other objects are located. For example, operation 13005 may further determine whether template object 8290 fits completely within such a container (e.g., based on the location of template object 8290 as determined by pose information 6301), or whether template object 8290 extends or protrudes outside a surface of the container. If template object 8290, or a portion thereof, is outside the container, this may be an indication of an incorrect detection hypothesis. In such a situation, the total validation score may be adjusted accordingly, with a penalty weighted according to how far outside the container template object 8290 is. In embodiments, if template object 8290 or a portion thereof is outside the container by more than a threshold amount, the total validation score may be adjusted so as to indicate an incorrect detection hypothesis. Some tolerance may be provided to account for situations in which an accurate detection hypothesis may still be consistent with a template object 8290 that extends outside the container or beyond a plane defining an inner surface of the container. Such situations may arise, for example, when the container is a mesh container, or when the object is a metal object hard enough to dent or otherwise deform the inner surface of the container.
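The container consistency check can be sketched with an axis-aligned box standing in for the container. The box representation, the `margin` used as tolerance, and measuring protrusion as the fraction of sampled template points outside the box are assumptions made for the example.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def fraction_outside(points: Sequence[Vec3],
                     box_min: Vec3,
                     box_max: Vec3,
                     margin: float = 0.0) -> float:
    """Fraction of template points lying outside the box, allowing a margin."""
    def inside(p: Vec3) -> bool:
        return all(lo - margin <= v <= hi + margin
                   for v, lo, hi in zip(p, box_min, box_max))

    if not points:
        return 0.0
    return sum(not inside(p) for p in points) / len(points)
```

The resulting fraction could be used as a weighted penalty on the total validation score, or compared with a threshold above which the hypothesis is treated as incorrect. A non-zero `margin` models the tolerance for mesh containers or deformed container walls.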

In operation 13011, detection hypothesis validation method 13000 further includes filtering detection hypotheses from the detection hypothesis set according to the plurality of three-dimensional validation scores and the plurality of two-dimensional validation scores.

In embodiments, the multiple validator scores may be combined to produce a total validator score, which may be used to determine a detection hypothesis confidence level. The total validator score and the detection hypothesis confidence level may indicate how well the corresponding object recognition template 4300C matches the object image information 12002 obtained from scene 5013. The detection hypothesis confidence level may be used to determine whether to filter out a detection hypothesis or to use that detection hypothesis in planning a robot motion for picking up the object 5012 in scene 5013.

In embodiments, the filtering of detection hypotheses may be performed according to a sequential filtering technique, in which each of the validator scores is compared with a corresponding threshold to determine whether a given detection hypothesis 8300 from the detection hypothesis set 8309 is retained or filtered out. Each successive validator score may be compared with its threshold, and if the validator score exceeds the threshold, the detection hypothesis 8300 may be filtered out. In one example, filtering detection hypotheses 8300 from the detection hypothesis set 8309 may include comparing the occlusion validator score, point cloud validator score, hole match validator score, normal vector validator score, rendered match validator score, and template matching validator score with their corresponding thresholds, and removing any detection hypothesis 8300 that has a validator score exceeding the corresponding threshold. The comparisons may be performed sequentially. The order given above is for illustration only, and any order may be used. When sequential filtering is used, the efficiency of the process may be increased by not computing the remaining validator scores for detection hypotheses that have already been filtered out.
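The efficiency gain from sequential filtering comes from evaluating validators lazily and stopping at the first failure. The sketch below illustrates this with validators passed as `(name, score_function, threshold)` triples; this interface is hypothetical and chosen only for the example.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

Validator = Tuple[str, Callable[[Any], float], float]


def passes_sequential_filter(hypothesis: Any,
                             validators: Sequence[Validator]) -> Tuple[bool, Optional[str]]:
    """Apply validators in order; stop at the first penalty above its threshold.

    Returns (True, None) if the hypothesis survives every validator, otherwise
    (False, name_of_failing_validator). Later validators are never computed
    for a hypothesis that has already failed.
    """
    for name, score_fn, threshold in validators:
        if score_fn(hypothesis) > threshold:
            return False, name
    return True, None
```

Placing cheap or highly discriminating validators first would reduce the total work further, although the source notes that any order may be used.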

実施形態では、バリデータスコアを対応する閾値と比較することは、バリデータスコアに関連付けられた信頼レベルを考慮に入れ得る。例えば、関連する閾値は、バリデータスコア信頼レベルに従って調整されてもよく、及び／又はバリデータスコアは、信頼レベルに従って調整されてもよい。したがって、不良合致を示す信頼度の低いバリデータスコアは、検出仮説として除外されてもよく、一方、信頼度の高いバリデータスコアは、より大きな影響を有してもよい。 In embodiments, comparing the validator score to a corresponding threshold may take into account the confidence level associated with the validator score. For example, the associated threshold may be adjusted according to the validator score confidence level, and/or the validator score may be adjusted according to the confidence level. Therefore, validator scores with low confidence indicating a bad match may be excluded as detection hypotheses, while validator scores with high confidence may have a greater impact.
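One possible reading of the confidence adjustment above is sketched below in Python. The function name `exceeds_threshold`, the `adjust` parameter, and both adjustment formulas are assumptions made for illustration only; the specification states that the score and/or the threshold may be adjusted according to the confidence level but does not give a formula. The sketch assumes a confidence in the range 0 to 1 and lets low-confidence scores carry less weight in the comparison.

```python
def exceeds_threshold(
    score: float,
    threshold: float,
    confidence: float,
    adjust: str = "score",
) -> bool:
    """Compare a validator score to its threshold, weighted by confidence.

    Both adjustment rules are illustrative assumptions, not formulas from
    the specification.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if adjust == "score":
        # Scale the score down as confidence drops.
        return score * confidence > threshold
    if adjust == "threshold":
        # Relax (raise) the threshold as confidence drops.
        return score > threshold * (2.0 - confidence)
    raise ValueError("adjust must be 'score' or 'threshold'")
```

Under these assumed rules, a score of 0.8 against a threshold of 0.5 triggers filtering at full confidence but not at a confidence of 0.5 (score mode) or 0.2 (threshold mode).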

したがって、三次元検証スコア又は二次元検証スコアのうちの一つ以上が、対応する閾値を超える場合（必要に応じて信頼レベルを考慮する）、検出仮説８３００は、検出仮説セット８３０９から除去又はフィルタリングされ得る。検出仮説８３００は、三次元検証スコア及び二次元検証スコアの全てが、対応する閾値の全てを超えることができない場合（必要に応じて信頼レベルを考慮に入れる）、検出仮説セット８３０９内に留まり得る。 Accordingly, if one or more of the three-dimensional validation scores or the two-dimensional validation scores exceed the corresponding thresholds (taking confidence levels into account as necessary), the detection hypothesis 8300 may be removed or filtered from the detection hypothesis set 8309. A detection hypothesis 8300 may remain within the detection hypothesis set 8309 if all of the three-dimensional validation scores and two-dimensional validation scores fail to exceed all of the corresponding thresholds (taking confidence levels into account as necessary).

フィルタリングプロセスは、物体画像情報１２００２に対応する各特定の物体５０１２に対して単一の検出仮説８３００が残るまで継続され得る。こうしたことは、最も高い検出仮説信頼レベル（及び最も低い総検証スコア）を有する検出仮説を選択することによって、及び／又は各物体５０１２に対して単一の検出仮説８３００のみが成功するまで、ますます下げられたフィルター閾値を有するフィルタリングプロセスを反復することによって、発生し得る。単一検出仮説８３００は、フィルタリングされていない検出仮説８３００であってもよい。実施形態では、最小信頼レベルは、検出仮説８３００に設定されてもよい。こうした実施形態では、物体５０１２に対する最良適合検出仮説８３００が信頼閾値を超えることができない場合、システムは、その物体に対する検出仮説８３００を返さなくてもよい。 The filtering process may continue until a single detection hypothesis 8300 remains for each particular object 5012 corresponding to the object image information 12002. This may occur by selecting the detection hypothesis having the highest detection hypothesis confidence level (and the lowest total validation score), and/or by repeating the filtering process with increasingly lowered filter thresholds until only a single detection hypothesis 8300 passes for each object 5012. The single detection hypothesis 8300 may be an unfiltered detection hypothesis 8300. In embodiments, a minimum confidence level may be set for the detection hypotheses 8300. In such embodiments, if the best-fit detection hypothesis 8300 for an object 5012 fails to exceed the confidence threshold, the system may return no detection hypothesis 8300 for that object.
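The selection step above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `DetectionHypothesis` dataclass, its fields, and the function `select_best_per_object` are hypothetical names, and the sketch covers only the option of picking the highest-confidence hypothesis per object and applying a minimum confidence level, not the iterative lowering of filter thresholds.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class DetectionHypothesis:
    """Hypothetical record; field names are illustrative only."""
    object_id: int
    template_id: str
    confidence: float


def select_best_per_object(
    hypotheses: Iterable[DetectionHypothesis],
    min_confidence: float,
) -> Dict[int, Optional[DetectionHypothesis]]:
    """Keep the highest-confidence hypothesis for each object.

    An object maps to None when its best hypothesis fails to exceed
    the minimum confidence level.
    """
    best: Dict[int, DetectionHypothesis] = {}
    for hyp in hypotheses:
        current = best.get(hyp.object_id)
        if current is None or hyp.confidence > current.confidence:
            best[hyp.object_id] = hyp
    return {
        object_id: (hyp if hyp.confidence > min_confidence else None)
        for object_id, hyp in best.items()
    }


candidates = [
    DetectionHypothesis(1, "template_a", 0.70),
    DetectionHypothesis(1, "template_b", 0.92),
    DetectionHypothesis(2, "template_c", 0.30),
]
selected = select_best_per_object(candidates, min_confidence=0.5)
```

Returning `None` for an object, rather than dropping its key, keeps visible the case in which the system declines to report a detection.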

処理１３０１３では、仮説検証方法１３０００は、検証後に、検出仮説のセットに残っているフィルタリングされていない検出仮説に従って、シーン内の一つ以上の物体を検出することを含む。フィルタリング後、物体画像情報１２００２に関連付けられた物体５０１２に対応する最良の検出仮説８３００が識別されて、シーン内の物体５０１２を検出する。上述のように、仮説検証方法１３０００は、複数の異なる関連検出仮説８３００に従って、複数の異なる物体５０１２を識別するためにも利用されてもよい。 In operation 13013, the hypothesis testing method 13000 includes detecting one or more objects in the scene according to the unfiltered detection hypotheses remaining in the set of detection hypotheses after validation. After filtering, the best detection hypothesis 8300 corresponding to the object 5012 associated with the object image information 12002 is identified to detect the object 5012 in the scene. As mentioned above, hypothesis testing method 13000 may also be utilized to identify multiple different objects 5012 according to multiple different associated detection hypotheses 8300.

いくつかの実施形態では、仮説検証方法１３０００は、重複検出処理をさらに含んでもよく、それによって、一つ以上の検出仮説８３００が互いに比較されて、それらの対応するテンプレート物体８２９０が重複を有するかどうかを判定する。こうした重複は、重複を有する検出仮説８３００の一方又は両方が不正確であることを示し得る。検出仮説８３００は、フィルタリング処理１３０１１後の重複について比較されてもよい。フィルタリング処理１３０１１の前に、複数の検出仮説８３００は、各物体５０１２に対して残ってもよく、したがって重複が予想される。フィルタリング処理１３０１１の後、残りの検出仮説８３００は、個々の物体５０１２に対する最良適合を表し、重複は予想されない。重複の検出に応答して、システムは、例えば、その信頼スコアに基づいて、重複検出仮説８３００の一方又は両方を破棄するように構成されてもよく、又は重複検出仮説８３００に関して追加の分析又は処理を行うように構成されてもよい。重複検出仮説８３００を破棄、維持、又は再分析する決定は、重複の程度をさらに考慮し得る。 In some embodiments, the hypothesis testing method 13000 may further include an overlap detection operation, whereby one or more detection hypotheses 8300 are compared to each other to determine whether their corresponding template objects 8290 overlap. Such overlap may indicate that one or both of the overlapping detection hypotheses 8300 are inaccurate. The detection hypotheses 8300 may be compared for overlap after the filtering operation 13011. Before the filtering operation 13011, multiple detection hypotheses 8300 may remain for each object 5012, and overlap is therefore expected. After the filtering operation 13011, the remaining detection hypotheses 8300 represent the best fit for each individual object 5012, and no overlap is expected. In response to detecting an overlap, the system may be configured to discard one or both of the overlapping detection hypotheses 8300, for example based on their confidence scores, or may be configured to perform additional analysis or processing on the overlapping detection hypotheses 8300. The decision to discard, keep, or reanalyze an overlapping detection hypothesis 8300 may further take the degree of overlap into account.
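A minimal Python sketch of such an overlap check follows. It rests on assumptions not found in the specification: each template object is approximated by an axis-aligned 3D bounding box with positive volume, the degree of overlap is measured as intersection volume divided by the smaller box volume, and only the lower-confidence hypothesis of an overlapping pair is discarded. The names `PlacedTemplate`, `overlap_ratio`, and `resolve_overlaps` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class PlacedTemplate:
    """Hypothetical stand-in for a hypothesis and its template object extent."""
    name: str
    confidence: float
    box_min: Point
    box_max: Point


def _volume(t: PlacedTemplate) -> float:
    v = 1.0
    for lo, hi in zip(t.box_min, t.box_max):
        v *= hi - lo
    return v


def overlap_ratio(a: PlacedTemplate, b: PlacedTemplate) -> float:
    """Intersection volume divided by the smaller of the two box volumes."""
    inter = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a.box_min, a.box_max, b.box_min, b.box_max):
        extent = min(hi_a, hi_b) - max(lo_a, lo_b)
        if extent <= 0.0:
            return 0.0  # boxes are disjoint along this axis
        inter *= extent
    return inter / min(_volume(a), _volume(b))


def resolve_overlaps(
    hyps: Sequence[PlacedTemplate], max_overlap: float
) -> List[PlacedTemplate]:
    """Discard the lower-confidence member of each excessively overlapping pair."""
    discarded = set()
    for a, b in combinations(hyps, 2):
        if a.name in discarded or b.name in discarded:
            continue
        if overlap_ratio(a, b) > max_overlap:
            loser = a if a.confidence < b.confidence else b
            discarded.add(loser.name)
    return [h for h in hyps if h.name not in discarded]


a = PlacedTemplate("a", 0.9, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
b = PlacedTemplate("b", 0.6, (0.5, 0.0, 0.0), (1.5, 1.0, 1.0))
c = PlacedTemplate("c", 0.7, (2.0, 2.0, 2.0), (3.0, 3.0, 3.0))
remaining = resolve_overlaps([a, b, c], max_overlap=0.2)
```

The `max_overlap` tolerance reflects the statement that the decision may consider the degree of overlap; a real system could instead keep both hypotheses and reanalyze them.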

シーン５０１３中の一つ以上の物体５０１２の検出に続いて、少なくとも一つの処理回路１１１０は、一つ以上の物体５０１２の取り出しのためにロボット制御処理１５０００を実行するように処理し、ロボット３３００の移動に一つ以上の物体５０１２を取り出しさせるコマンドを出力し得る。ロボット制御処理１５０００は、障害物検出、動作計画、動作実行を含み得る。 Following detection of the one or more objects 5012 in the scene 5013, the at least one processing circuit 1110 may operate to perform a robot control operation 15000 for retrieval of the one or more objects 5012, and may output a command that causes movement of the robot 3300 to retrieve the one or more objects 5012. The robot control operation 15000 may include obstacle detection, motion planning, and motion execution.

障害物検出は、取り出される物体５０１２の近傍の障害物を検出し、それを説明することを含み得る。本明細書で論じるように、物体５０１２は、他の品目及び物体を有する容器の中にあってもよい。したがって、他の品目及び物体、ならびに容器自体は、ロボット３３００のロボット動作に対する障害物を表し得る。こうした障害物は、物体の近傍の障害物の位置を決定するために処理されてもよい画像情報１２００１及び／又は物体画像情報１２００２内に捕捉されてもよい。 Obstacle detection may include detecting and accounting for obstacles in the vicinity of the object 5012 to be retrieved. As discussed herein, the object 5012 may be in a container with other items and objects. Accordingly, the other items and objects, as well as the container itself, may represent obstacles to the robotic motion of the robot 3300. Such obstacles may be captured in the image information 12001 and/or the object image information 12002, which may be processed to determine the locations of the obstacles in the vicinity of the object.

動作計画は、例えば、ロボット３３００が物体５０１２を取り出すために実行する軌道をプロットするなど、ロボット動作を計画することを含み得る。軌道は、識別された障害物を説明し、回避するようにプロットされ得る。動作実行は、ロボット３３００に計画された動作を実行させるように、動作計画に関連するコマンドをロボット又はロボット制御システムに送信することを含み得る。 Motion planning may include, for example, planning robot motions, such as plotting a trajectory that robot 3300 will take to retrieve object 5012. A trajectory may be plotted to account for and avoid identified obstacles. Motion execution may include sending commands related to the motion plan to the robot or robot control system to cause the robot 3300 to perform the planned motion.

本明細書で論じる方法、例えば、方法６０００、８０００、１００００、１１０００、及び１３０００は、物体認識テンプレートを作成し、物体認識テンプレートを利用して、シーン内の物体に対する検出仮説を生成、精密化、及び検証するために、協働して処理されてもよい。したがって、方法６０００、８０００、１００００、１１０００、及び１３０００は、容器内から複数の物体を検出、識別、及び取り出すロボットプロセスを容易にするために利用されてもよい。 The methods discussed herein, e.g., methods 6000, 8000, 10000, 11000, and 13000, may be operated in concert to create object recognition templates and to use the object recognition templates to generate, refine, and validate detection hypotheses for objects in a scene. Accordingly, methods 6000, 8000, 10000, 11000, and 13000 may be utilized to facilitate a robotic process of detecting, identifying, and retrieving multiple objects from within a container.

関連分野の当業者にとって、本明細書に記載する方法及び用途への、その他の好適な修正ならびに適応を、実施形態のうちのいずれの範囲からも逸脱することなく行うことができることは明らかであろう。上に記載する実施形態は、例示的な例であり、本開示がこれらの特定の実施形態に限定されると解釈されるべきではない。本明細書に開示する様々な実施形態は、記載及び添付の図に具体的に提示する組み合わせとは異なる組み合わせで、組み合わせてもよいことは理解されるべきである。例によって、本明細書に記載するプロセスもしくは方法のいずれのある特定の行為又は事象は、異なるシーケンスで行われてもよく、追加、統合、又は完全に省略し得ることも理解されるべきである（例えば、記載したすべての行為又は事象が、方法又はプロセスを実行するのに必要でなくてもよい）。加えて、本明細書の実施形態のある特定の特徴を、明確にするために、単一の構成要素、モジュール、又はユニットにより行われていると記載しているものの、本明細書に記載する特徴及び機能は、構成要素、ユニット、又はモジュールのいかなる組み合わせによって行われてもよいことは理解されるべきである。従って、添付の特許請求の範囲で定義されるような、発明の趣旨又は範囲から逸脱することなく、様々な変更及び修正を当業者が及ぼし得る。 It will be apparent to those skilled in the relevant art that other suitable modifications and adaptations to the methods and applications described herein can be made without departing from the scope of any of the embodiments. The embodiments described above are illustrative examples, and the disclosure should not be construed as limited to these particular embodiments. It is to be understood that the various embodiments disclosed herein may be combined in combinations different from those specifically presented in the description and accompanying figures. It should also be understood that, depending on the example, certain acts or events of any of the processes or methods described herein may be performed in a different sequence, and may be added, merged, or omitted entirely (for example, not all described acts or events may be necessary to carry out a method or process). In addition, although certain features of the embodiments herein are described, for clarity, as being performed by a single component, module, or unit, it should be understood that the features and functions described herein may be performed by any combination of components, units, or modules. Accordingly, various changes and modifications may be made by those skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.

さらなる実施形態は、以下の実施形態を含む。 Further embodiments include the following embodiments.

実施形態１は、シーン内の物体を識別するための物体認識テンプレートセットを生成するように構成された計算システムであって、物体を表す物体モデルを含む、物体の登録データを取得することと、三次元空間における物体モデルの複数の視点を決定することと、複数の視点の各々で、物体モデルの複数の外観を推定することと、複数の外観に従って、それぞれが複数の外観のそれぞれの一つに対応する複数の物体認識テンプレートを生成することと、複数の物体認識テンプレートを、ロボット制御システムに物体認識テンプレートセットとして伝達することと、を行うように構成された少なくとも一つの処理回路を備え、複数の物体認識テンプレートの各々は、シーン内の物体の画像情報を生成するカメラの光学軸に対して物体が有し得る姿勢を表す、計算システムである。 Embodiment 1 is a computing system configured to generate an object recognition template set for identifying an object in a scene, the computing system comprising at least one processing circuit configured to: obtain registration data of the object, the registration data including an object model representing the object; determine a plurality of viewpoints of the object model in a three-dimensional space; estimate a plurality of appearances of the object model at each of the plurality of viewpoints; generate, according to the plurality of appearances, a plurality of object recognition templates each corresponding to a respective one of the plurality of appearances; and communicate the plurality of object recognition templates to a robot control system as the object recognition template set, wherein each of the plurality of object recognition templates represents a pose that the object may have relative to an optical axis of a camera that generates image information of the object in the scene.

実施形態２は、三次元空間が表面によって囲まれ、複数の視点の各々が、表面上のカメラ位置に対応し、物体認識テンプレートの各々が、複数の視点のうちの一つの視点に対応し、一つの視点からの物体の外観を含む、実施形態１に記載の計算システムである。 Embodiment 2 is the computing system of embodiment 1, wherein the three-dimensional space is surrounded by a surface, each of the plurality of viewpoints corresponds to a camera position on the surface, and each of the object recognition templates corresponds to one viewpoint of the plurality of viewpoints and includes an appearance of the object from the one viewpoint.

実施形態３は、複数の視点の各々が、カメラ回転角度にさらに対応する、実施形態１に記載の計算システムである。 Embodiment 3 is the calculation system according to Embodiment 1, in which each of the plurality of viewpoints further corresponds to a camera rotation angle.

実施形態４は、物体モデルが三次元空間内に固定される、実施形態２に記載の計算システムである。 Embodiment 4 is the calculation system according to Embodiment 2, in which the object model is fixed in three-dimensional space.

実施形態５は、三次元空間が実質的に球状であり、物体モデルが三次元空間の中心に固定される、実施形態２に記載の計算システムである。 Embodiment 5 is the calculation system according to Embodiment 2, in which the three-dimensional space is substantially spherical and the object model is fixed at the center of the three-dimensional space.

実施形態６は、複数の視点が、表面にわたる均等分布に従って選択される、実施形態２に記載の計算システムである。 Embodiment 6 is the computing system according to Embodiment 2, in which the multiple viewpoints are selected according to an even distribution over the surface.

実施形態７は、各カメラ位置が、視点のセットに対応し、視点のセットの各視点は、異なるカメラ回転角度に対応する、実施形態３に記載の計算システムである。 Embodiment 7 is the computing system described in Embodiment 3, where each camera position corresponds to a set of viewpoints, and each viewpoint of the set of viewpoints corresponds to a different camera rotation angle.

実施形態８は、物体認識テンプレートのセットのサブセットが、異なる位置及び異なるカメラ回転角度に対応する視点に対応する物体認識テンプレートを含む、実施形態３に記載の計算システムである。 Embodiment 8 is the computing system according to Embodiment 3, wherein a subset of the set of object recognition templates includes object recognition templates corresponding to viewpoints corresponding to different positions and different camera rotation angles.

実施形態９は、複数の物体認識テンプレートに対して観察される姿勢の予測範囲に基づいて、複数の視点を決定することをさらに含む、実施形態２に記載の計算システムである。 Embodiment 9 is the calculation system according to Embodiment 2, further including determining a plurality of viewpoints based on predicted ranges of postures observed for a plurality of object recognition templates.

実施形態１０は、物体の対称性に基づいて複数の視点を決定することをさらに含む、実施形態２に記載の計算システムである。 Embodiment 10 is the computing system of Embodiment 2, further comprising determining multiple viewpoints based on symmetry of the object.

実施形態１１は、回転後に物体の物体外観が変化するという判定及び物体の軸の識別のうちの少なくとも一つに従って、物体の対称性を決定することをさらに含む、実施形態１０に記載の計算システムである。 Embodiment 11 is the computing system of embodiment 10, further comprising determining the symmetry of the object according to at least one of a determination that an object appearance of the object changes after rotation and an identification of an axis of the object.

実施形態１２は、シーン内の物体を識別するための物体認識テンプレートセットを生成する方法であって、物体を表す物体モデルを含む物体の登録データを取得することと、三次元空間における物体モデルの複数の視点を決定することと、複数の視点の各々で、物体モデルの複数の外観を推定することと、複数の外観に従って、それぞれが複数の外観のそれぞれの一つに対応する複数の物体認識テンプレートを生成することと、複数の物体認識テンプレートを、ロボット制御システムに物体認識テンプレートセットとして伝達することと、を含み、複数の物体認識テンプレートの各々は、シーン内の物体の画像情報を生成するカメラの光学軸に対して物体が有し得る姿勢を表す、方法である。 Embodiment 12 is a method of generating an object recognition template set for identifying an object in a scene, the method comprising: obtaining registration data of the object, the registration data including an object model representing the object; determining a plurality of viewpoints of the object model in a three-dimensional space; estimating a plurality of appearances of the object model at each of the plurality of viewpoints; generating, according to the plurality of appearances, a plurality of object recognition templates each corresponding to a respective one of the plurality of appearances; and communicating the plurality of object recognition templates to a robot control system as the object recognition template set, wherein each of the plurality of object recognition templates represents a pose that the object may have relative to an optical axis of a camera that generates image information of the object in the scene.

実施形態１３は、三次元空間が表面によって囲まれ、複数の視点を表面上のカメラ位置に対応させ、物体認識テンプレートの各々を複数の視点のうちの一つの視点に対応させることをさらに含む、実施形態１２に記載の方法である。 Embodiment 13 further comprises: the three-dimensional space is surrounded by a surface, the plurality of viewpoints correspond to camera positions on the surface, and each of the object recognition templates corresponds to one of the plurality of viewpoints. This is the method described in Embodiment 12.

実施形態１４は、複数の視点の各々をカメラ回転角度に対応させることを更に含む、実施形態１３に記載の方法である。 Embodiment 14 is the method of Embodiment 13, further comprising associating each of the plurality of viewpoints with a camera rotation angle.

実施形態１５は、三次元空間内に物体モデルを固定することをさらに含む、実施形態１３に記載の方法である。 Embodiment 15 is the method of embodiment 13, further comprising fixing the object model in three-dimensional space.

実施形態１６は、表面にわたる均等分布に従って複数の視点を選択することをさらに含む、実施形態１３に記載の方法である。 Embodiment 16 is the method of embodiment 13, further comprising selecting the plurality of viewpoints according to an even distribution across the surface.

実施形態１７は、複数の物体認識テンプレートに対して観察される姿勢の予測範囲に基づいて、複数の視点を決定することをさらに含む、実施形態１３に記載の方法である。 Embodiment 17 is the method described in Embodiment 13, further comprising determining a plurality of viewpoints based on predicted ranges of postures observed for a plurality of object recognition templates.

実施形態１８は、物体の対称性に基づいて複数の視点を決定することをさらに含む、実施形態１３に記載の方法である。 Embodiment 18 is the method of embodiment 13, further comprising determining multiple viewpoints based on symmetry of the object.

実施形態１９は、回転後に物体の物体外観が変化するという判定及び物体の軸の識別のうちの少なくとも一つに従って、物体の対称性を決定することをさらに含む、実施形態１８に記載の方法である。 Embodiment 19 is the method of embodiment 18, further comprising determining the symmetry of the object according to at least one of a determination that an object appearance of the object changes after rotation and an identification of an axis of the object.

実施形態２０は、ロボットシステムと通信するように構成された通信インターフェースを介して少なくとも一つの処理回路によって動作可能な非一時的コンピュータ可読媒体であって、シーン内の物体を識別するための物体認識テンプレートを生成するための方法を実行するための実行可能な命令を有する非一時的コンピュータ可読媒体である。当該方法は、物体を表す物体モデルを含む、物体の登録データを受信することと、物体モデルの複数の視点を三次元空間内で生成するための動作を行うことと、複数の視点の各々で、物体モデルの複数の外観を推定する動作を行うことと、複数の外観に従って、それぞれが複数の外観のそれぞれの一つに対応する複数の物体認識テンプレートを生成する動作を行うことと、複数の物体認識テンプレートをロボットシステムに物体認識テンプレートセットとして出力することと、を含み、複数の物体認識テンプレートの各々は、シーン内の物体の画像情報を生成するカメラの光学軸に対して物体が有し得る姿勢を表す。 Embodiment 20 is a non-transitory computer-readable medium operable by at least one processing circuit via a communication interface configured to communicate with a robotic system, the non-transitory computer-readable medium having executable instructions for performing a method for generating object recognition templates for identifying an object in a scene. The method includes: receiving registration data of the object, the registration data including an object model representing the object; performing an operation to generate a plurality of viewpoints of the object model in a three-dimensional space; performing an operation to estimate a plurality of appearances of the object model at each of the plurality of viewpoints; performing an operation to generate, according to the plurality of appearances, a plurality of object recognition templates each corresponding to a respective one of the plurality of appearances; and outputting the plurality of object recognition templates to the robotic system as an object recognition template set, wherein each of the plurality of object recognition templates represents a pose that the object may have relative to an optical axis of a camera that generates image information of the object in the scene.

実施形態２１は、シーン内の物体を識別するための物体認識テンプレートを生成するように構成された計算システムであって、デジタルで表される物体を含む物体情報を取得することと、物体情報から二次元測定情報を抽出することと、物体情報から三次元測定情報を抽出することと、二次元測定情報及び三次元測定情報に従って、物体認識テンプレートを生成することとを行うように構成された少なくとも一つの処理回路を含む、計算システムである。 Embodiment 21 is a computing system configured to generate an object recognition template for identifying an object in a scene, the computing system comprising at least one processing circuit configured to: obtain object information including a digitally represented object; extract two-dimensional measurement information from the object information; extract three-dimensional measurement information from the object information; and generate the object recognition template according to the two-dimensional measurement information and the three-dimensional measurement information.

実施形態２２は、デジタルで表される物体が、物体モデルであり、二次元測定情報及び三次元測定を抽出することは、選択された視点で物体モデルの特徴マップを生成するために実行される、実施形態２１に記載の計算システムである。 Embodiment 22 is the computing system of embodiment 21, wherein the digitally represented object is an object model, and extracting the two-dimensional measurement information and the three-dimensional measurement information is performed to generate a feature map of the object model at a selected viewpoint.

実施形態２３は、少なくとも一つの処理回路が、シーンの画像情報を取得すること、物体認識テンプレートへアクセスすること、及び二次元測定情報及び三次元測定情報と画像情報とを比較して、デジタルで表される物体に対応する物体を識別することを行うようにさらに構成される、実施形態２１に記載の計算システムである。 Embodiment 23 is the computing system of embodiment 21, wherein the at least one processing circuit is further configured to: obtain image information of a scene; access the object recognition template; and compare the two-dimensional measurement information and the three-dimensional measurement information with the image information to identify an object corresponding to the digitally represented object.

実施形態２４は、二次元測定情報を抽出することが、物体情報から勾配情報を抽出することを含み、勾配情報が、デジタルで表される物体の候補エッジの方向又は配向を示し、三次元測定情報を抽出することが、物体情報から表面法線ベクトル情報を抽出することを含み、表面法線ベクトル情報が、デジタルで表される物体の表面に対して法線の複数のベクトルを記述する、実施形態２１に記載の計算システムである。 Embodiment 24 provides that extracting the two-dimensional measurement information includes extracting gradient information from the object information, the gradient information indicating a direction or orientation of a candidate edge of the digitally represented object; extracting the information includes extracting surface normal vector information from the object information, the surface normal vector information describing a plurality of vectors normal to a surface of the digitally represented object; This is the calculation system described in Embodiment 21.

実施形態２５は、物体情報が、物体の登録データを含み、デジタルで表される物体が、物体モデルを含む、実施形態２１に記載の計算システムである。 Embodiment 25 is the calculation system according to Embodiment 21, wherein the object information includes object registration data, and the digitally represented object includes an object model.

実施形態２６は、物体情報が、二次元画像情報及び三次元画像情報のうちの少なくとも一方を含む、実施形態２１に記載の計算システムである。 Embodiment 26 is the calculation system according to Embodiment 21, wherein the object information includes at least one of two-dimensional image information and three-dimensional image information.

実施形態２７は、勾配情報が、デジタルで表される物体の複数の勾配抽出位置で抽出され、勾配情報を抽出することが、物体情報の二次元画像情報のピクセル強度を分析して、各勾配抽出位置での二次元画像情報のピクセル強度が変化する方向を測定することを含む、実施形態２４に記載の計算システムである。 Embodiment 27 is the computing system of embodiment 24, wherein the gradient information is extracted at a plurality of gradient extraction positions of the digitally represented object, and extracting the gradient information includes analyzing pixel intensities of two-dimensional image information of the object information to measure a direction in which the pixel intensity of the two-dimensional image information changes at each gradient extraction position.

実施形態２８は、表面法線ベクトル情報が、デジタルで表される物体の複数の表面法線位置で抽出され、表面法線ベクトル情報を抽出することが、各表面法線位置で、デジタルで表される物体の表面に対して法線の複数のベクトルを識別することを含む、実施形態２４に記載の計算システムである。 Embodiment 28 is the computing system of embodiment 24, wherein the surface normal vector information is extracted at a plurality of surface normal positions of the digitally represented object, and extracting the surface normal vector information includes identifying, at each surface normal position, a plurality of vectors normal to a surface of the digitally represented object.

実施形態２９は、勾配情報が、デジタルで表される物体の複数の勾配抽出位置で抽出され、表面法線ベクトル情報が、デジタルで表される物体の複数の表面法線位置で抽出され、複数の勾配抽出位置が、複数の表面法線位置と異なっている、実施形態２４に記載の計算システムである。 Embodiment 29 is the computing system of embodiment 24, wherein the gradient information is extracted at a plurality of gradient extraction positions of the digitally represented object, the surface normal vector information is extracted at a plurality of surface normal positions of the digitally represented object, and the plurality of gradient extraction positions differ from the plurality of surface normal positions.

実施形態３０は、複数の勾配抽出位置が、複数の表面法線位置と重複しない、実施形態２９に記載の計算システムである。 Embodiment 30 is the calculation system according to Embodiment 29, wherein the plurality of gradient extraction positions do not overlap with the plurality of surface normal positions.

実施形態３１は、複数の勾配抽出位置が、デジタルで表される物体のエッジに配置され、複数の表面法線位置が、デジタルで表される物体のエッジから離れて配置される、実施形態２９に記載の計算システムである。 Embodiment 31 is the computing system of embodiment 29, wherein the plurality of gradient extraction positions are located at edges of the digitally represented object, and the plurality of surface normal positions are located away from the edges of the digitally represented object.

実施形態３２は、シーン内の物体を識別するための物体認識テンプレートを生成する方法であって、デジタルで表される物体を含む物体情報を取得することと、物体情報から二次元測定情報を抽出することと、物体情報から三次元測定情報を抽出することと、二次元測定情報及び三次元測定情報に従って、物体認識テンプレートを生成することと、を含む方法である。 Embodiment 32 is a method for generating an object recognition template for identifying objects in a scene, the method comprising acquiring object information including a digitally represented object, and extracting two-dimensional measurement information from the object information. The method includes: extracting three-dimensional measurement information from the object information; and generating an object recognition template according to the two-dimensional measurement information and the three-dimensional measurement information.

実施形態３３は、選択された視点で物体モデルの特徴マップを生成することをさらに含む、実施形態３２に記載の方法である。 Embodiment 33 is the method of embodiment 32, further comprising generating a feature map of the object model at the selected viewpoint.

実施形態３４は、シーンの画像情報を取得することと、物体認識テンプレートにアクセスすることと、二次元測定情報及び三次元測定情報を画像情報と比較して、デジタルで表される物体に対応するものとして物体を識別することをさらに含む、実施形態３２に記載の方法である。 Embodiment 34 is the method of embodiment 32, further comprising: obtaining image information of a scene; accessing the object recognition template; and comparing the two-dimensional measurement information and the three-dimensional measurement information with the image information to identify an object as corresponding to the digitally represented object.

実施形態３５は、二次元測定情報を抽出することが、物体情報から勾配情報を抽出することをさらに含み、勾配情報は、デジタルで表される物体の候補エッジの方向又は配向を示す、実施形態３２に記載の方法である。 Embodiment 35 is the method of embodiment 32, wherein extracting the two-dimensional measurement information further includes extracting gradient information from the object information, the gradient information indicating a direction or orientation of a candidate edge of the digitally represented object.

実施形態３６は、三次元測定情報を抽出することが、物体情報から表面法線ベクトル情報を抽出することをさらに含み、表面法線ベクトル情報が、デジタルで表される物体の表面に対して法線の複数のベクトルを記述する、実施形態３２に記載の方法である。 Embodiment 36 is the method of embodiment 32, wherein extracting the three-dimensional measurement information further includes extracting surface normal vector information from the object information, the surface normal vector information describing a plurality of vectors normal to a surface of the digitally represented object.

実施形態３７は、デジタルで表される物体の複数の勾配抽出位置で勾配情報を抽出することと、物体情報の二次元画像情報のピクセル強度を分析して、各勾配抽出位置での二次元画像情報のピクセル強度が変化する方向を測定することとをさらに含む、実施形態３５に記載の方法である。 Embodiment 37 is the method of embodiment 35, further comprising: extracting the gradient information at a plurality of gradient extraction positions of the digitally represented object; and analyzing pixel intensities of two-dimensional image information of the object information to measure a direction in which the pixel intensity of the two-dimensional image information changes at each gradient extraction position.

実施形態３８は、デジタルで表される物体の複数の表面法線位置で表面法線ベクトル情報を抽出することと、各表面法線位置で、デジタルで表される物体の表面に対して法線の複数のベクトルを識別することをさらに含む、実施形態３６に記載の方法である。 Embodiment 38 is the method of embodiment 36, further comprising: extracting the surface normal vector information at a plurality of surface normal positions of the digitally represented object; and identifying, at each surface normal position, a plurality of vectors normal to a surface of the digitally represented object.

実施形態３９は、ロボットシステムと通信するように構成された通信インターフェースを介して少なくとも一つの処理回路によって動作可能な非一時的コンピュータ可読媒体であって、シーン内の物体を識別するための物体認識テンプレートを生成するための方法を実行するための実行可能な命令で構成された非一時的コンピュータ可読媒体である。当該方法は、デジタルで表される物体を含む物体情報を受信することと、物体情報から二次元測定情報を抽出する動作を行うことと、物体情報から三次元測定情報を抽出する動作を行うことと、二次元測定情報及び三次元測定情報に従って、物体認識テンプレートをロボットシステムに出力することと、を含む。 Embodiment 39 is a non-transitory computer-readable medium operable by at least one processing circuit via a communication interface configured to communicate with a robotic system, the non-transitory computer-readable medium operable by at least one processing circuit to perform object recognition for identifying objects in a scene. A non-transitory computer-readable medium configured with executable instructions for performing a method for generating a template. The method includes receiving object information including a digitally represented object, performing an operation of extracting two-dimensional measurement information from the object information, and performing an operation of extracting three-dimensional measurement information from the object information. and outputting an object recognition template to the robot system according to the two-dimensional measurement information and the three-dimensional measurement information.

実施形態４０は、シーンの画像情報を受信することと、物体認識テンプレートにアクセスすることと、画像情報に対する二次元測定情報と三次元測定情報との比較をロボットシステムへ出力して、物体を、デジタルで表わされる物体に対応するものとして識別することと、をさらに含む、実施形態３９に記載の実施形態である。 Embodiment 40 is the embodiment described in embodiment 39, further comprising: receiving image information of a scene; accessing the object recognition template; and outputting, to the robotic system, a comparison of the two-dimensional measurement information and the three-dimensional measurement information against the image information to identify an object as corresponding to the digitally represented object.

実施形態４１は、アーム及び当該アームに接続されたエンドエフェクタを有するロボットと通信すると共に、視野を有するカメラと通信する少なくとも一つの処理回路を備え、少なくとも一つの処理回路は、一つ以上の物体が視野内にあるか又は視野内にあったとき、非一時的コンピュータ可読媒体に記憶された命令を実行するように構成され、命令は、シーン内の物体の物体画像情報を取得することと、テンプレート物体を表す対応する物体認識テンプレートを含む検出仮説を取得することと、テンプレート物体と物体画像情報との間の不一致を識別することと、物体画像情報の物体位置のセットに対応するテンプレート物体内のテンプレート位置のセットを識別することと、テンプレート位置のセットを、物体位置のセットに収束するように調整することと、調整後のテンプレート位置のセットに従って、調整された対応する物体認識テンプレートを含む、調整された検出仮説を生成することと、を含む、計算システムである。 Embodiment 41 is a computing system comprising at least one processing circuit in communication with a robot having an arm and an end effector connected to the arm, and in communication with a camera having a field of view, wherein the at least one processing circuit is configured to execute instructions stored on a non-transitory computer-readable medium when one or more objects are or have been in the field of view, the instructions including: obtaining object image information of an object in a scene; obtaining a detection hypothesis including a corresponding object recognition template representing a template object; identifying a discrepancy between the template object and the object image information; identifying a set of template positions in the template object corresponding to a set of object positions of the object image information; adjusting the set of template positions to converge to the set of object positions; and generating an adjusted detection hypothesis including an adjusted corresponding object recognition template according to the set of template positions after adjustment.

実施形態４２は、テンプレート位置のセットと、物体位置のセットの対応するものとの間で延びるそれぞれのベクトルを識別すること、及びそれぞれのベクトルに従って、テンプレート位置のセットを繰り返し調整することによって、テンプレート位置のセットを調整することをさらに含む、実施形態４１に記載の計算システムである。 Embodiment 42 is the computing system of embodiment 41, further comprising adjusting the set of template positions by identifying respective vectors extending between the set of template positions and corresponding ones of the set of object positions, and iteratively adjusting the set of template positions according to the respective vectors.

実施形態４３は、テンプレート位置のセットを繰り返し調整することは、テンプレート物体に作用するそれぞれのベクトルの大きさ及び方向に従って、テンプレート位置の調整されたセットを繰り返し生成することと、調整されたテンプレート位置のセットに従って、それぞれのベクトルを調整することと、アライメントの品質が閾値を超えるまで、調整されたテンプレート位置のセットに従って新しいそれぞれのベクトルを識別することと、を含む、実施形態４２に記載の計算システムである。 Embodiment 43 is the computing system of embodiment 42, wherein iteratively adjusting the set of template positions includes: iteratively generating an adjusted set of template positions according to the magnitude and direction of the respective vectors acting on the template object; adjusting the respective vectors according to the adjusted set of template positions; and identifying new respective vectors according to the adjusted set of template positions until a quality of alignment exceeds a threshold.

実施形態４４は、アライメントの品質が、新しいそれぞれのベクトルによって画定されるミスアライメントのレベルに基づいて決定される、実施形態４３に記載の計算システムである。 Embodiment 44 is the computing system of embodiment 43, wherein the quality of alignment is determined based on the level of misalignment defined by each new vector.

実施形態４５は、アライメントの品質が、調整されたテンプレート位置のセットと、物体位置のセットとの間の距離測定値に基づいて決定される、実施形態４３に記載の計算システムである。 Embodiment 45 is the computing system of embodiment 43, wherein the quality of the alignment is determined based on distance measurements between the set of adjusted template positions and the set of object positions.

実施形態４６は、距離測定値がユークリッド距離測定値を含む、実施形態４５に記載の計算システムである。 Embodiment 46 is the computing system of embodiment 45, wherein the distance measurements include Euclidean distance measurements.

実施形態４７は、距離測定値が、調整されたテンプレート位置のセット及び物体位置のセットに関連付けられた表面法線ベクトル間のコサイン距離を含む、実施形態４５に記載の計算システムである。 Embodiment 47 is the computing system of embodiment 45, wherein the distance measurement comprises a cosine distance between surface normal vectors associated with the set of adjusted template positions and the set of object positions.

実施形態４８は、コサイン距離が、表面法線ベクトル間の角度を示し、角度のサイズがアライメントの品質と相関する、実施形態４７に記載の計算システムである。 Embodiment 48 is the calculation system of embodiment 47, wherein the cosine distance indicates an angle between surface normal vectors, and the size of the angle correlates with the quality of the alignment.

Embodiment 49 is the computing system of embodiment 45, wherein the distance measurement is a measurement from a first position of the adjusted set of template positions to a plane of a second position of the set of object positions.
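
The three distance measurements named in embodiments 46 to 49 can be illustrated with the short sketch below. It is a hedged example under stated assumptions: the function names are hypothetical, and the "plane of a second position" is assumed to be the plane passing through the object position and perpendicular to that position's surface normal vector.

```python
import math


def euclidean_distance(template_pt, object_pt):
    """Straight-line distance between a template position and an object position."""
    return math.dist(template_pt, object_pt)


def cosine_distance(template_normal, object_normal):
    """1 - cos(angle) between two surface normal vectors (0 when parallel)."""
    dot = sum(a * b for a, b in zip(template_normal, object_normal))
    norm = math.hypot(*template_normal) * math.hypot(*object_normal)
    return 1.0 - dot / norm


def point_to_plane_distance(template_pt, object_pt, object_normal):
    """Distance from a template position to the plane through `object_pt`.

    Assumption: the plane is perpendicular to `object_normal`.
    """
    diff = [t - o for t, o in zip(template_pt, object_pt)]
    dot = sum(d * n for d, n in zip(diff, object_normal))
    return abs(dot) / math.hypot(*object_normal)


print(euclidean_distance((0, 0, 0), (3, 4, 0)))
print(cosine_distance((0, 0, 1), (0, 1, 0)))
print(point_to_plane_distance((1, 2, 3), (0, 0, 1), (0, 0, 1)))
```

A smaller cosine distance corresponds to a smaller angle between the normals, which is consistent with the correlation between angle size and alignment quality noted in embodiment 48.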

実施形態５０は、アライメントの品質が、調整されたテンプレート位置のセットと、物体位置のセットとの間の収束割合によって決定される、実施形態４３に記載の計算システムである。 Embodiment 50 is the computational system of embodiment 43, wherein the quality of alignment is determined by the percentage of convergence between the set of adjusted template positions and the set of object positions.
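
One possible reading of the convergence percentage in embodiment 50 is the share of adjusted template positions that lie within a tolerance of some object position. The sketch below follows that reading; the tolerance value and the name `convergence_percentage` are assumptions made for illustration and are not taken from the disclosure.

```python
import math


def convergence_percentage(template_pts, object_pts, tolerance=0.005):
    """Percentage of template positions within `tolerance` of an object position."""
    converged = sum(
        1 for p in template_pts
        if min(math.dist(p, q) for q in object_pts) <= tolerance
    )
    return 100.0 * converged / len(template_pts)


template = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
scene = [(0, 0, 0.001), (1, 0, 0.002), (0, 1, 0.5), (1, 1, 0.004)]
print(convergence_percentage(template, scene))
```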

Embodiment 51 is the computing system of embodiment 41, further comprising obtaining the detection hypothesis by overlaying image information of the scene on the object recognition template to identify the object image information based on a comparison between template gradient information and template surface normal vector information of the object recognition template and object gradient information and object surface normal vector information extracted from the image information.

実施形態５２は、シーン内の物体の物体画像情報を取得することと、テンプレート物体を表す対応する物体認識テンプレートを含む検出仮説を取得することと、テンプレート物体と物体画像情報との間の不一致を識別することと、物体画像情報の物体位置のセットに対応するテンプレート物体内のテンプレート位置のセットを識別することと、テンプレート位置のセットを、物体位置のセットに収束するように調整することと、調整後のテンプレート位置のセットに従って、調整された対応する物体認識テンプレートを含む、調整された検出仮説を生成することと、を含む、方法である。 Embodiment 52 includes obtaining object image information for an object in a scene, obtaining a detection hypothesis that includes a corresponding object recognition template representing the template object, and detecting a mismatch between the template object and the object image information. identifying a set of template positions within the template object that correspond to a set of object positions of the object image information; and adjusting the set of template positions to converge to the set of object positions; generating an adjusted detection hypothesis that includes an adjusted corresponding object recognition template according to the set of adjusted template positions.

実施形態５３は、テンプレート位置のセットを調整することが、テンプレート位置のセットと、物体位置のセットの対応するものとの間で延びるそれぞれのベクトルを識別することと、それぞれのベクトルに従ってテンプレート位置のセットを繰り返し調整することと、をさらに含む、実施形態５２に記載の方法である。 Embodiment 53 provides that adjusting the set of template positions includes identifying respective vectors extending between the set of template positions and a corresponding one of the set of object positions, and adjusting the template positions according to the respective vectors. 53. The method of embodiment 52, further comprising iteratively adjusting the set.

Embodiment 54 is the method of embodiment 53, further comprising: iteratively generating an adjusted set of template positions according to the magnitude and direction of the respective vectors acting on the template object; adjusting the respective vectors according to the adjusted set of template positions; and identifying new respective vectors according to the adjusted set of template positions until the quality of alignment exceeds a threshold.

実施形態５５は、新しいそれぞれのベクトルによって画定されるミスアライメントのレベルに基づいてアライメントの品質を決定することをさらに含む、実施形態５４に記載の方法である。 Embodiment 55 is the method of embodiment 54, further comprising determining the quality of the alignment based on the level of misalignment defined by each new vector.

実施形態５６は、調整されたテンプレート位置のセットと物体位置のセットとの間の距離測定値に基づいてアライメントの品質を決定することをさらに含む、実施形態５４に記載の方法である。 Embodiment 56 is the method of embodiment 54, further comprising determining the quality of alignment based on a distance measurement between the set of adjusted template positions and the set of object positions.

実施形態５７は、調整されたテンプレート位置のセットと物体位置のセットとの間の収束割合によってアライメントの品質を決定することをさらに含む、実施形態５４に記載の方法である。 Embodiment 57 is the method of embodiment 54, further comprising determining the quality of alignment by a percentage of convergence between the set of adjusted template positions and the set of object positions.

Embodiment 58 is the method of embodiment 52, wherein obtaining the detection hypothesis further includes overlaying image information of the scene on the object recognition template to identify the object image information based on a comparison between template gradient information and template surface normal vector information of the object recognition template and object gradient information and object surface normal vector information extracted from the image information.

Embodiment 59 is a non-transitory computer-readable medium operable by at least one processing circuit via a communication interface configured to communicate with a robotic system, the non-transitory computer-readable medium having executable instructions for performing a method for refining a detection hypothesis, the method including: receiving object image information of an object in a scene; receiving a detection hypothesis including a corresponding object recognition template representing a template object; performing processing to identify a discrepancy between the template object and the object image information; performing processing to identify a set of template positions in the template object corresponding to a set of object positions of the object image information; performing processing to adjust the set of template positions to converge to the set of object positions; and outputting, to the robotic system, an adjusted detection hypothesis including an adjusted corresponding object recognition template according to the adjusted set of template positions.

Embodiment 60 is the method of embodiment 59, wherein the processing for adjusting the set of template positions includes performing processing to identify respective vectors extending between the set of template positions and corresponding ones of the set of object positions, and performing processing to adjust the set of template positions after iteratively adjusting the set of template positions according to the respective vectors.

Embodiment 61 is a computing system comprising at least one processing circuit in communication with a robot having an arm and an end effector connected to the arm, and in communication with a camera having a field of view. The at least one processing circuit is configured, when one or more objects are or have been in the field of view, to execute instructions stored on a non-transitory computer-readable medium, the instructions including: obtaining object image information of an object in a scene; obtaining a set of detection hypotheses, each including a corresponding object recognition template representing a template object; and validating each detection hypothesis of the set of detection hypotheses. The validating is performed by: generating a plurality of three-dimensional validation scores based on a comparison between three-dimensional information of the object recognition template of the detection hypothesis and three-dimensional information of the object image information corresponding to the object, the plurality of three-dimensional validation scores including at least one of an occlusion validator score, a point cloud validator score, a hole match validator score, and a normal vector validator score; generating a plurality of two-dimensional validation scores based on a comparison between two-dimensional information of the corresponding object recognition template of the detection hypothesis and three-dimensional information of the object image information, the plurality of two-dimensional validation scores including at least one of a rendered match validator score and a template matching validator score; filtering detection hypotheses from the set of detection hypotheses according to the plurality of three-dimensional validation scores and the plurality of two-dimensional validation scores; and detecting the object in the scene according to the unfiltered detection hypotheses remaining in the set of detection hypotheses after the validating.

Embodiment 62 is the computing system of embodiment 61, wherein the instructions further include: performing a robot motion planning procedure for retrieving the object from the scene; and outputting a command to move the robot to retrieve the object.

Embodiment 63 is the computing system of embodiment 61, wherein the plurality of three-dimensional validation scores includes the point cloud validator score, and the point cloud validator score is obtained by comparing object positions obtained from the object image information with a surface of the template object, and identifying discrepancies between the object positions and the surface to obtain the point cloud validator score.

Embodiment 64 is the computing system of embodiment 63, wherein an invalid object position is identified according to a discrepancy that places the object position below the surface of the template object, and the point cloud validator score is based on the invalid object positions.

Embodiment 65 is the computing system of embodiment 61, wherein the plurality of three-dimensional validation scores includes the occlusion validator score, and the occlusion validator score is obtained by comparing object positions obtained from the object image information with a surface of the template object, and identifying discrepancies between the object positions and the surface to obtain the occlusion validator score.

実施形態６６は、テンプレート物体の表面の上方又は外側に対応する物体位置を配置する不一致に従って遮蔽が識別され、遮蔽バリデータスコアは遮蔽に基づく、実施形態６５に記載の計算システムである。 Embodiment 66 is the computational system of embodiment 65, wherein occlusion is identified according to a discrepancy that places the corresponding object position above or outside a surface of the template object, and the occlusion validator score is based on the occlusion.
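
The point cloud validator (embodiments 63 and 64) and the occlusion validator (embodiments 65 and 66) both compare observed object positions with the surface of the template object. The sketch below is a simplified illustration and rests on several assumptions that are not in the source: the camera looks along the depth axis, the template surface and the observation are given as per-pixel depth lists of equal length, "below the surface" means farther from the camera than the template surface, "above the surface" means closer to the camera, and each score is the fraction of positions that are not flagged. The name `validate_depths` and the `tolerance` value are hypothetical.

```python
def validate_depths(template_depth, observed_depth, tolerance=0.002):
    """Return (point_cloud_score, occlusion_score), each in the range [0, 1].

    template_depth: expected depth of the template surface at each pixel.
    observed_depth: measured depth at the same pixels.
    """
    invalid = 0   # observed position lies below (behind) the template surface
    occluded = 0  # observed position lies above (in front of) the template surface
    for expected, observed in zip(template_depth, observed_depth):
        if observed > expected + tolerance:
            invalid += 1
        elif observed < expected - tolerance:
            occluded += 1
    total = len(template_depth)
    return 1.0 - invalid / total, 1.0 - occluded / total


point_cloud_score, occlusion_score = validate_depths(
    [1.0, 1.0, 1.0, 1.0], [1.0, 1.001, 1.05, 0.9]
)
print(point_cloud_score, occlusion_score)
```

Under these assumptions, a position seen behind the hypothesized surface contradicts the hypothesis (the object would have blocked the view), whereas a position seen in front of it suggests that something else is covering the object.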

Embodiment 67 is the computing system of embodiment 61, wherein the plurality of three-dimensional validation scores includes the normal vector validator score, and the normal vector validator score is obtained by comparing surface normal vectors obtained from the object image information with corresponding surface normal vectors of the template object, and identifying discrepancies between the surface normal vectors and the corresponding surface normal vectors to obtain the normal vector validator score.

Embodiment 68 is the computing system of embodiment 61, wherein the plurality of three-dimensional validation scores includes the hole match validator score, and the hole match validator score is obtained by comparing object positions obtained from the object image information with a structure of the template object, and identifying discrepancies between the object positions and the structure so as to identify hole invalidity according to object positions located at positions corresponding to an empty volume in the structure of the template object.

Embodiment 69 is the computing system of embodiment 61, wherein the rendered match validator score is obtained by generating a two-dimensional rendering of the object in the scene, and comparing rendered edges of the two-dimensional rendering of the object with extracted edges of the template object to identify invalid edges.

Embodiment 70 is the computing system of embodiment 61, wherein validating each detection hypothesis of the set of detection hypotheses further includes comparing the corresponding object recognition template with scene elements other than the object corresponding to the template object.

Embodiment 71 is the computing system of embodiment 70, wherein comparing the corresponding object recognition template representing the estimated object with the scene elements includes determining whether the object corresponding to the template object is within a container.

Embodiment 72 is the computing system of embodiment 71, wherein filtering detection hypotheses from the set of detection hypotheses includes comparing the occlusion validator score, the point cloud validator score, the hole match validator score, the normal vector validator score, the rendered match validator score, and the template matching validator score with corresponding thresholds, wherein a detection hypothesis is removed from the set of detection hypotheses if any of the three-dimensional validation scores or the two-dimensional validation scores fails to exceed the corresponding threshold, and the detection hypothesis remains in the set of detection hypotheses if the three-dimensional validation scores and the two-dimensional validation scores exceed all of the corresponding thresholds.
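
The threshold-based filtering of embodiment 72 can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the `DetectionHypothesis` class, the score names used as dictionary keys, and the numeric thresholds are all hypothetical. A hypothesis is kept only if every one of its scores exceeds the corresponding threshold.

```python
from dataclasses import dataclass, field


@dataclass
class DetectionHypothesis:
    """Hypothetical container pairing a template with its validation scores."""
    template_id: str
    scores_3d: dict = field(default_factory=dict)
    scores_2d: dict = field(default_factory=dict)


def filter_hypotheses(hypotheses, thresholds):
    """Keep hypotheses whose 3D and 2D scores all exceed their thresholds."""
    kept = []
    for hypothesis in hypotheses:
        scores = {**hypothesis.scores_3d, **hypothesis.scores_2d}
        if all(scores[name] > thresholds[name] for name in scores):
            kept.append(hypothesis)
    return kept


thresholds = {"occlusion": 0.6, "point_cloud": 0.7, "template_matching": 0.5}
candidates = [
    DetectionHypothesis("box_a", {"occlusion": 0.9, "point_cloud": 0.8},
                        {"template_matching": 0.75}),
    DetectionHypothesis("box_b", {"occlusion": 0.9, "point_cloud": 0.4},
                        {"template_matching": 0.95}),
]
print([h.template_id for h in filter_hypotheses(candidates, thresholds)])
```

Because a single failing score removes a hypothesis, a strong two-dimensional match cannot compensate for a weak three-dimensional one, and vice versa.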

Embodiment 73 is a method including: obtaining object image information of an object in a scene; obtaining a set of detection hypotheses, each including a corresponding object recognition template representing a template object; and validating each detection hypothesis of the set of detection hypotheses by: generating a plurality of three-dimensional validation scores based on a comparison between three-dimensional information of the object recognition template of the detection hypothesis and three-dimensional information of the object image information corresponding to the object, the plurality of three-dimensional validation scores including at least one of an occlusion validator score, a point cloud validator score, a hole match validator score, and a normal vector validator score; generating a plurality of two-dimensional validation scores based on a comparison between two-dimensional information of the corresponding object recognition template of the detection hypothesis and three-dimensional information of the object image information, the plurality of two-dimensional validation scores including at least one of a rendered match validator score and a template matching validator score; filtering detection hypotheses from the set of detection hypotheses according to the plurality of three-dimensional validation scores and the plurality of two-dimensional validation scores; and detecting the object in the scene according to the unfiltered detection hypotheses remaining in the set of detection hypotheses after the validating.

Embodiment 74 is the method of embodiment 73, further comprising: performing a robot motion planning procedure for retrieving the object from the scene; and outputting a command to move the robot to retrieve the object.

Embodiment 75 is the method of embodiment 73, wherein generating the plurality of three-dimensional validation scores further includes obtaining the normal vector validator score, the normal vector validator score being obtained by comparing surface normal vectors obtained from the object image information with corresponding surface normal vectors of the template object, and identifying discrepancies between the surface normal vectors and the corresponding surface normal vectors to obtain the normal vector validator score.

Embodiment 76 is the method of embodiment 73, wherein obtaining the hole match validator score includes: comparing object positions obtained from the object image information with a structure of the template object; and identifying discrepancies between the object positions and the structure to identify hole invalidity due to object positions located at positions corresponding to an empty volume in the structure of the template object.

Embodiment 77 is the method of embodiment 73, wherein obtaining the rendered match validator score includes: generating a two-dimensional rendering of the object in the scene; and comparing rendered edges of the two-dimensional rendering of the object with extracted edges of the template object to identify invalid edges.

Embodiment 78 is the method of embodiment 73, wherein validating each detection hypothesis of the set of detection hypotheses further includes comparing the corresponding object recognition template with scene elements other than the object corresponding to the template object.

Embodiment 79 is the method of embodiment 73, wherein filtering detection hypotheses from the set of detection hypotheses includes: comparing the occlusion validator score, the point cloud validator score, the hole match validator score, the normal vector validator score, the rendered match validator score, and the template matching validator score with corresponding thresholds; removing a detection hypothesis from the set of detection hypotheses if any of the three-dimensional validation scores or the two-dimensional validation scores fails to exceed the corresponding threshold; and keeping the detection hypothesis in the set of detection hypotheses if the three-dimensional validation scores and the two-dimensional validation scores exceed all of the corresponding thresholds.

Embodiment 80 is a non-transitory computer-readable medium operable by at least one processing circuit via a communication interface configured to communicate with a robotic system, the non-transitory computer-readable medium having executable instructions for performing a method for validating detection hypotheses, the method including: receiving object image information of an object in a scene; receiving a set of detection hypotheses, each including a corresponding object recognition template representing a template object; performing processing to generate a plurality of three-dimensional validation scores, including at least one of an occlusion validator score, a point cloud validator score, a hole match validator score, and a normal vector validator score, based on a comparison between three-dimensional information of the object recognition template of the detection hypothesis and three-dimensional information of the object image information corresponding to the object; performing processing to generate a plurality of two-dimensional validation scores, including at least one of a rendered match validator score and a template matching validator score, based on a comparison between two-dimensional information of the corresponding object recognition template of the detection hypothesis and three-dimensional information of the object image information; performing processing to filter detection hypotheses from the set of detection hypotheses according to the plurality of three-dimensional validation scores and the plurality of two-dimensional validation scores; detecting the object in the scene according to the unfiltered detection hypotheses remaining in the set of detection hypotheses after the validating; and outputting the detected object in the scene to the robotic system.

実施形態８１は、三次元空間が実質的に球状であり、物体モデルが三次元空間の中心に固定される、実施形態１３に記載の方法である。 Embodiment 81 is the method of embodiment 13, wherein the three-dimensional space is substantially spherical and the object model is fixed at the center of the three-dimensional space.

実施形態８２は、各カメラ位置が、視点のセットに対応し、視点のセットの各視点は、異なるカメラ回転角度に対応する、実施形態１４に記載の方法である。 Embodiment 82 is the method of embodiment 14, wherein each camera position corresponds to a set of viewpoints, and each viewpoint of the set of viewpoints corresponds to a different camera rotation angle.

実施形態８３は、物体認識テンプレートのセットのサブセットが、異なる位置及び異なるカメラ回転角度に対応する視点に対応する物体認識テンプレートを含む、実施形態１４に記載の方法である。 Embodiment 83 is the method of embodiment 14, wherein the subset of the set of object recognition templates includes object recognition templates corresponding to viewpoints corresponding to different positions and different camera rotation angles.

実施形態８４は、デジタルで表される物体が、物体モデルであり、二次元測定情報及び三次元測定を抽出することが、選択された視点で物体モデルの特徴マップを生成するために実施される、実施形態３２に記載の方法である。 Embodiment 84 provides that the digitally represented object is an object model, and extracting two-dimensional measurement information and three-dimensional measurements is performed to generate a feature map of the object model at a selected viewpoint. , the method described in Embodiment 32.

実施形態８５は、物体情報が、物体の登録データを含み、デジタルで表される物体が、物体モデルを含む、実施形態３２に記載の方法である。 Embodiment 85 is the method of embodiment 32, wherein the object information includes object registration data and the digitally represented object includes an object model.

実施形態８６は、物体情報が、二次元画像情報及び三次元画像情報のうちの少なくとも一方を含む、実施形態３２に記載の方法である。 Embodiment 86 is the method according to Embodiment 32, wherein the object information includes at least one of two-dimensional image information and three-dimensional image information.

Embodiment 87 is the method of embodiment 36, wherein the gradient information is extracted at a plurality of gradient extraction positions of the digitally represented object, the surface normal vector information is extracted at a plurality of surface normal positions of the digitally represented object, and the plurality of gradient extraction positions differ from the plurality of surface normal positions.

実施形態８８は、複数の勾配抽出位置が、複数の表面法線位置と重複しない、実施形態８７に記載の方法である。 Embodiment 88 is the method of embodiment 87, wherein the plurality of gradient extraction positions do not overlap the plurality of surface normal positions.

Embodiment 89 is the method of embodiment 87, wherein the plurality of gradient extraction positions are located at edges of the digitally represented object, and the plurality of surface normal positions are located away from the edges of the digitally represented object.

実施形態９０は、距離測定値がユークリッド距離測定値を含む、実施形態５６に記載の方法である。 Embodiment 90 is the method of embodiment 56, wherein the distance measurements include Euclidean distance measurements.

実施形態９１は、距離測定値が、調整されたテンプレート位置のセット及び物体位置のセットに関連付けられた表面法線ベクトル間のコサイン距離を含む、実施形態５６に記載の方法である。 Embodiment 91 is the method of embodiment 56, wherein the distance measurement comprises a cosine distance between surface normal vectors associated with the set of adjusted template positions and the set of object positions.

実施形態９２は、コサイン距離が、表面法線ベクトル間の角度を示し、角度のサイズがアライメントの質と相関する、実施形態９１に記載の方法である。 Embodiment 92 is the method of embodiment 91, wherein the cosine distance indicates an angle between surface normal vectors, and the size of the angle correlates with the quality of the alignment.

Embodiment 93 is the method of embodiment 56, wherein the distance measurement is a measurement from a first position of the adjusted set of template positions to a plane of a second position of the set of object positions.

実施形態９４は、複数の三次元検証スコアが、点群バリデータスコアを含み、点群バリデータスコアが、物体画像情報から取得された物体位置をテンプレート物体の表面と比較すること、物体位置と表面との間の不一致を識別して、点群バリデータスコアを得ることによって取得される、実施形態７３に記載の方法である。 Embodiment 94 provides that the plurality of three-dimensional validation scores include a point cloud validator score, the point cloud validator score comparing an object position obtained from object image information with a surface of a template object; 74. The method of embodiment 73, wherein the point cloud validator score is obtained by identifying mismatches between the points and the surface.

Embodiment 95 is the method of embodiment 94, wherein an invalid object position is identified according to a discrepancy that places the object position below the surface of the template object, and the point cloud validator score is based on the invalid object positions.

Embodiment 96 is the method of embodiment 73, wherein the plurality of three-dimensional validation scores includes the occlusion validator score, and the occlusion validator score is obtained by comparing object positions obtained from the object image information with a surface of the template object, and identifying discrepancies between the object positions and the surface to obtain the occlusion validator score.

実施形態９７は、遮蔽が、テンプレート物体の表面の上方又は外側に対応する物体位置を配置する不一致に従って識別され、遮蔽バリデータスコアは、遮蔽に基づく、実施形態９６に記載の方法である。 Embodiment 97 is the method of embodiment 96, wherein the occlusion is identified according to a discrepancy that places the corresponding object position above or outside a surface of the template object, and the occlusion validator score is based on the occlusion.

Embodiment 98 is the method of embodiment 78, wherein comparing the corresponding object recognition template representing the estimated object with the scene elements includes determining whether the object corresponding to the template object is within a container.

Claims

1. A computing system configured to generate an object recognition template for identifying objects in a scene, the computing system comprising:
at least one processing circuit,
wherein the at least one processing circuit is configured to perform:
obtaining object information including a digitally represented object;
extracting two-dimensional measurement information from the object information;
extracting three-dimensional measurement information from the object information; and
generating an object recognition template including the two-dimensional measurement information and the three-dimensional measurement information.

2. The computing system of claim 1, wherein the digitally represented object is an object model, and
extracting the two-dimensional measurement information and the three-dimensional measurement information is performed to generate a feature map of the object model at a selected viewpoint.

3. The computing system of claim 1, wherein the at least one processing circuit is further configured to perform:
obtaining image information of the scene;
accessing the object recognition template; and
comparing the two-dimensional measurement information and the three-dimensional measurement information with the image information to identify an object corresponding to the digitally represented object.

4. The computing system of claim 1, wherein extracting the two-dimensional measurement information includes extracting gradient information from the object information,
the gradient information indicates a direction or orientation of a candidate edge of the digitally represented object,
extracting the three-dimensional measurement information includes extracting surface normal vector information from the object information, and
the surface normal vector information describes a plurality of vectors normal to the surface of the digitally represented object.

5. The computing system of claim 1, wherein the object information includes registration data of an object, and
the digitally represented object includes an object model.

6. The computing system of claim 1, wherein the object information includes at least one of two-dimensional image information and three-dimensional image information.

7. The computing system of claim 4, wherein the gradient information is extracted at a plurality of gradient extraction positions of the digitally represented object, and
extracting the gradient information includes analyzing pixel intensities of two-dimensional image information of the object information to determine a direction in which the pixel intensities of the two-dimensional image information change at each gradient extraction position.

8. The computing system of claim 4, wherein the surface normal vector information is extracted at a plurality of surface normal positions of the digitally represented object, and
extracting the surface normal vector information includes identifying, at each surface normal position, a plurality of vectors normal to the surface of the digitally represented object.

9. The computing system of claim 4, wherein the gradient information is extracted at a plurality of gradient extraction positions of the digitally represented object,
the surface normal vector information is extracted at a plurality of surface normal positions of the digitally represented object, and
the plurality of gradient extraction positions are different from the plurality of surface normal positions.

10. The computing system of claim 9, wherein the plurality of gradient extraction positions do not overlap with the plurality of surface normal positions.

11. The computing system of claim 9, wherein the plurality of gradient extraction positions are located at edges of the digitally represented object, and
the plurality of surface normal positions are located away from the edges of the digitally represented object.

12. A method for generating an object recognition template for identifying objects in a scene, the method comprising:
obtaining object information including a digitally represented object;
extracting two-dimensional measurement information from the object information;
extracting three-dimensional measurement information from the object information; and
generating an object recognition template including the two-dimensional measurement information and the three-dimensional measurement information.

13. The method of claim 12, further comprising generating a feature map of the object model at the selected viewpoint.

14. The method of claim 12, further comprising using the object recognition template to identify an object,
wherein using the object recognition template includes:
obtaining image information of the scene;
accessing the object recognition template; and
comparing the two-dimensional measurement information and the three-dimensional measurement information with the image information to identify the object as corresponding to the digitally represented object.

15. The method of claim 12, wherein extracting the two-dimensional measurement information further includes extracting gradient information from the object information, and
the gradient information indicates a direction or orientation of a candidate edge of the digitally represented object.

16. The method of claim 12, wherein extracting the three-dimensional measurement information further includes extracting surface normal vector information from the object information, and
the surface normal vector information describes a plurality of vectors normal to the surface of the digitally represented object.

17. The method of claim 15, further comprising:
extracting the gradient information at a plurality of gradient extraction positions of the digitally represented object; and
analyzing pixel intensities of two-dimensional image information of the object information and measuring the direction in which the pixel intensities of the two-dimensional image information change at each gradient extraction position.

18. The method of claim 16, further comprising:
extracting the surface normal vector information at a plurality of surface normal positions of the digitally represented object; and
identifying, at each surface normal position, a plurality of vectors normal to the surface of the digitally represented object.

19. A non-transitory computer-readable medium having executable instructions for performing, when operated by at least one processing circuit through a communication interface configured to communicate with a robotic system, a method for generating an object recognition template for identifying objects in a scene,
the method comprising:
receiving object information including a digitally represented object;
performing a process of extracting two-dimensional measurement information from the object information;
performing a process of extracting three-dimensional measurement information from the object information; and
outputting, to the robotic system, an object recognition template including the two-dimensional measurement information and the three-dimensional measurement information.

20. The non-transitory computer-readable medium of claim 19, wherein the method further comprises:
receiving image information of the scene;
accessing the object recognition template; and
outputting, to the robotic system, a comparison of the two-dimensional measurement information and the three-dimensional measurement information with the image information to identify an object as corresponding to the digitally represented object.