JP7129065B2 - Object pose detection from image data

Info

Publication number: JP7129065B2
Application number: JP2019150869A
Authority: JP
Inventors: エルベオドレン; ノゲスフェルナンドカマロ; ランケフー; ナッタワンチャロエングンワニッチ; ロモホセイバンロペス; 睦月榊原; マルコシミッチ
Original assignee: アセントロボティクス株式会社
Priority date: 2019-08-21
Filing date: 2019-08-21
Publication date: 2022-09-01
Anticipated expiration: 2039-08-21
Also published as: JP2021056542A; US20210056247A1

Description

The present invention relates to pose detection. More specifically, the present invention relates to a pose determination function trained on simulations of computer model poses.

Product manufacturing employs a growing range of robotics techniques. For example, an assembly line may use robotic arms that detect, pick up, and put together parts to assemble a final product. Human intervention may be increased to reduce the programming burden. For example, if parts are placed by hand in the proper position and orientation, the robotic arm needs only minimal detection capability. As robotic arms become better at detecting and manipulating objects, human intervention may be reduced, which may in turn reduce manufacturing costs.

To manipulate an object effectively, a robotic system must be able to recognize how the object is placed in 6D space, which can be defined by position along three axes and orientation around three axes. To train such a robotic system and evaluate its performance, a large amount of training data containing much environmental diversity must be obtained. Designers of such robotic systems face challenges in attempting to maximize accuracy while keeping both run time and data modality requirements low.
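As an illustration of what a 6D placement can look like in software, the following is a minimal sketch rather than anything taken from the described embodiments. The class name `Pose6D`, the use of roll, pitch, and yaw angles, and the rotation order Rz·Ry·Rx are all assumptions chosen for the example; the source only states that a pose has a position along three axes and an orientation around three axes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose6D:
    """Hypothetical 6D pose: position along three axes, rotation about three axes."""
    x: float
    y: float
    z: float
    roll: float   # rotation about the x axis, in radians
    pitch: float  # rotation about the y axis, in radians
    yaw: float    # rotation about the z axis, in radians

    def rotation_matrix(self):
        """Return the 3x3 rotation R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
        cr, sr = math.cos(self.roll), math.sin(self.roll)
        cp, sp = math.cos(self.pitch), math.sin(self.pitch)
        cy, sy = math.cos(self.yaw), math.sin(self.yaw)
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]

    def apply(self, point):
        """Map a point from the object frame into the reference frame."""
        r = self.rotation_matrix()
        t = (self.x, self.y, self.z)
        return tuple(
            sum(r[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
        )


# A quarter turn about z carries the object's x axis onto the reference y axis.
pose = Pose6D(1.0, 2.0, 3.0, 0.0, 0.0, math.pi / 2)
corner = pose.apply((1.0, 0.0, 0.0))
```

Other orientation encodings, such as quaternions or rotation matrices, would serve equally well; Euler angles are used here only because they map directly onto "orientation around three axes".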

According to one aspect of the present invention, there is provided a computer program executable by a computer to cause the computer to perform operations comprising: obtaining a computer model of an object; simulating the computer model in a realistic environment simulator; capturing training data including a plurality of pose representations, each pose representation including an image of the computer model in one of a plurality of poses, paired with a label including a pose specification of the computer model as shown in the image, the image and the pose specification being defined by the simulator; and applying a learning process to the pose representations to generate a pose determination function for relating an image of the object to a pose specification.

This aspect may also include a method performed by a computer executing the instructions of the computer program, and an apparatus that performs the method.

The above summary of the invention does not list all of the necessary features of the embodiments of the present invention. Sub-combinations of the above features may also constitute the invention.

FIG. 1 shows a diagram of the interaction between hardware and software elements from CAD model to corrected pose detection, according to one embodiment of the present invention.

FIG. 2 shows an exemplary hardware configuration for pose detection, according to one embodiment of the present invention.

FIG. 3 shows an operational flow for pose detection, according to one embodiment of the present invention.

FIG. 4 shows an operational flow for simulating a computer model to capture training data, according to one embodiment of the present invention.

FIG. 5 shows an operational flow for generating a pose determination function, according to one embodiment of the present invention.

FIG. 6 shows an operational flow for determining a pose specification, according to one embodiment of the present invention.

Exemplary embodiments of the present invention are described below. The exemplary embodiments do not limit the invention according to the claims, and the combinations of features described in the embodiments are not necessarily essential to the invention.

FIG. 1 shows a diagram of the interaction between hardware and software elements from CAD model to corrected pose detection, according to one embodiment of the present invention. The diagram shows a multi-step approach consisting of simulation, deep learning, and traditional computer vision. In this embodiment, a computer-aided design (CAD) model 112 is obtained. CAD model 112 may be created from a 3D scan of the object or may be designed manually. In some cases, such as on an assembly line, a CAD model may already exist and may simply be reused.

One or more instances of CAD model 112 are used by simulator 104 to create images of the instances of CAD model 112 in random poses. Simulator 104 may be used so that the actual pose of each instance of CAD model 112, sometimes referred to as the "ground truth", can be readily output from the simulator. In this way, manual derivation of the actual pose, which can be very tedious and time-consuming, is no longer necessary. Random poses for each instance of CAD model 112 may be achieved by dropping, colliding, shaking, or stirring the objects within the simulation, among other ways. Simulator 104 uses a physics engine to simplify these operations. Once each instance of CAD model 112 has settled into a resting position, an image is captured. Features that do not correlate with pose may be randomized. Thus, lighting effects can be varied, and surface color, texture, and gloss can all be randomized. This effectively focuses the learning process on the features that do correlate with pose, such as shape data and edges. Noise may be added to the images so that the learning process becomes accustomed to the imperfections of real images of the object. Lighting effects also play a role here, because real images are not always taken under ideal lighting conditions, which may leave some relevant aspects difficult to detect.

Each color image captured by simulator 104 is paired with the corresponding actual pose output from simulator 104, which is used as a label. In this embodiment, the learning process is an untrained convolutional neural network 117U, which is applied to each pair of color image and label. The pairs of color image and label constitute the training data. The training data may be generated before or during the training process. In embodiments where the training data is generated using computational resources separate from those used for the training process, it may be more time-efficient to apply each pair as soon as it is generated. During the training process, the output from untrained convolutional neural network 117U is compared with the corresponding label, and the weights are adjusted accordingly. The training process continues until a condition indicating that training is complete is met. This condition may be the application of a certain amount of training data, the settling of the weights of untrained convolutional neural network 117U, the output reaching a threshold accuracy, or the like.
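The control flow of such a training process, with pairs applied as they are generated and the three example stopping conditions, can be sketched as follows. This is a toy illustration under loud assumptions: a two-parameter linear model stands in for the convolutional neural network, scalar pairs stand in for the image and label pairs, and the function names, learning rate, warm-up length, and thresholds are all hypothetical.

```python
import random


def generate_pairs(rng):
    """Endless stream of (input, label) pairs, produced on demand.

    Stands in for image/label pairs generated while training is running.
    """
    while True:
        x = rng.uniform(-1.0, 1.0)
        yield x, 2.0 * x + 0.5


def train(pairs, lr=0.1, max_samples=10_000, warmup=100,
          target_error=1e-4, settle_tol=1e-6):
    """Fit y ~ w * x + b by stochastic gradient descent.

    Returns (w, b, reason), where reason names the stop condition that fired.
    """
    w, b = 0.0, 0.0
    avg_sq_error = avg_update = None
    for seen, (x, y) in enumerate(pairs, start=1):
        error = (w * x + b) - y              # compare the output with the label
        dw, db = lr * error * x, lr * error
        w, b = w - dw, b - db                # adjust the weights accordingly
        sq, upd = error * error, abs(dw) + abs(db)
        # Exponential moving averages smooth out single lucky samples.
        avg_sq_error = sq if avg_sq_error is None else 0.99 * avg_sq_error + 0.01 * sq
        avg_update = upd if avg_update is None else 0.99 * avg_update + 0.01 * upd
        if seen >= max_samples:
            return w, b, "sample budget reached"
        if seen > warmup and avg_sq_error < target_error:
            return w, b, "threshold accuracy reached"
        if seen > warmup and avg_update < settle_tol:
            return w, b, "weights settled"
    return w, b, "data exhausted"


w, b, reason = train(generate_pairs(random.Random(0)))
```

Because `train` consumes an iterator, it does not matter whether the pairs were prepared in advance or are still being produced by a separate process, which mirrors the option of applying each pair as soon as it is generated.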

Once training is complete, the resulting trained convolutional neural network 117T is ready to be used in a physical environment. In this embodiment, the physical environment includes objects that are physical counterparts of CAD model 112. These objects are photographed by camera 125, resulting in a color image of one or more of the objects. Trained convolutional neural network 117T is applied to this color image to output the 6D pose of each object in the color image. Although camera 125 may be a more basic, lower-performance camera, and the lighting conditions may not be ideal, trained convolutional neural network 117T should be able to process this color image properly, in the same manner as the simulated images used during training.

Once the 6D pose of each object in the color image has been output, a final operation, correction 109, is performed. Correction operation 109 once again utilizes CAD model 112 to make fine adjustments to each detected 6D pose. In this embodiment, CAD model 112 is used to recreate the image according to each output 6D pose, and then to adjust the 6D pose of any object that appears misaligned between the images. As the 6D pose is adjusted, the recreated image is manipulated accordingly, and the comparison continues until the images match. In this embodiment, correction operation 109 is a traditional hand-written algorithm rather than another learning process.
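The adjust, re-create, and compare loop can be illustrated with a deliberately simplified sketch. The assumptions are significant: the "image" is reduced to a list of 2D points with known correspondences, the pose has only three parameters (two translations and one angle) instead of six, and the search is a plain coordinate-wise pattern search. None of the names below come from the source, and this is not the alignment method of any particular embodiment.

```python
import math


def render(model, pose):
    """Re-create the observation: place 2D model points under pose (tx, ty, theta)."""
    tx, ty, th = pose
    c, s = math.cos(th), math.sin(th)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in model]


def mismatch(a, b):
    """Sum of squared distances between corresponding points."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))


def refine(model, observed, pose, step=0.1, min_step=1e-6):
    """Nudge one pose parameter at a time while the re-created points get closer."""
    pose = list(pose)
    best = mismatch(render(model, pose), observed)
    while step > min_step:
        improved = False
        for i in range(3):
            for delta in (step, -step):
                trial = pose.copy()
                trial[i] += delta
                cost = mismatch(render(model, trial), observed)
                if cost < best:
                    pose, best, improved = trial, cost, True
        if not improved:
            step /= 2  # no neighbouring pose is better, so search more finely
    return tuple(pose), best


model = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
observed = render(model, (0.5, -0.3, 0.4))          # stands in for the camera image
refined, residual = refine(model, observed, (0.4, -0.2, 0.3))  # coarse pose from the network
```

The point of the sketch is the division of labour: the learned function supplies a pose that is already close, and a hand-written search only has to remove the small remaining discrepancy.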

Once correction 109 is complete, final pose 119 is output. Final pose 119 may be utilized in a variety of ways depending on the context of the embodiment. For example, on an assembly line, a robotic arm can use final pose 119 to grasp each object strategically, in a manner that allows the robotic arm to perform an assembly step. Many applications exist beyond robotic arms and assembly lines, and the number of applications that require proper pose detection is increasing.

FIG. 2 shows an exemplary hardware configuration for pose detection, according to one embodiment of the present invention. The exemplary hardware configuration includes pose detection device 220, which communicates with network 228 and may interact with CAD modeler 224, camera 225, and robotic arm 226. Pose detection device 220 may be a host computer, such as a server computer or a mainframe computer, that runs an on-premises application and hosts client computers that use it. In this case, pose detection device 220 need not be directly connected to CAD modeler 224, camera 225, and robotic arm 226, and is instead connected to them through network 228. Pose detection device 220 may be a computer system including two or more computers. Pose detection device 220 may be a personal computer that runs an application for a user of pose detection device 220.

Pose detection device 220 includes logic unit 200, storage unit 210, communication interface 221, and input/output controller 222. Logic unit 200 may be a computer program product including one or more computer-readable storage media collectively storing program instructions that are executable by a processor or programmable circuitry to cause the processor or programmable circuitry to perform the operations of the various units. Logic unit 200 may alternatively be analog or digital programmable circuitry, or any combination thereof. Logic unit 200 may be composed of physically separated storage or circuitry that interacts through communication. Storage unit 210 may be a non-volatile computer-readable medium capable of storing non-executable data for access by logic unit 200 during execution of the processes herein. Communication interface 221 reads transmission data, which may be stored in a transmission buffer region provided in a recording medium such as storage unit 210, and transmits the read transmission data to network 228, or writes reception data received from network 228 to a reception buffer region provided in the recording medium. Input/output controller 222 connects to various input and output units, such as CAD modeler 224, camera 225, and robotic arm 226, via a parallel port, a serial port, a keyboard port, a mouse port, a monitor port, or the like, to accept commands and present information.

Acquisition unit 202 is the portion of logic unit 200 that obtains data from CAD modeler 224, camera 225, robotic arm 226, and network 228 in the course of pose detection. Acquisition unit 202 may obtain computer model 212 of an object. Acquisition unit 202 may store computer model 212 in storage unit 210. Acquisition unit 202 may include sub-sections for performing additional functions, as described in the flow charts below. Such sub-sections may be referred to by a name associated with their function.

Simulation unit 204 is the portion of logic unit 200 that simulates the computer model in a realistic environment. Simulation unit 204 may simulate the computer model of the object in random poses. In doing so, simulation unit 204 may include a physics engine, for example to induce motion of the computer model. Simulation unit 204 may store simulation parameters 214, such as those of the physics engine, in storage unit 210. Simulation unit 204 may include sub-sections for performing additional functions, as described in the flow charts below. Such sub-sections may be referred to by a name associated with their function.

Capture unit 205 is the portion of logic unit 200 that captures training data. The training data may include a plurality of pose representations 215, each pose representation 215 including an image of the computer model in one of a plurality of poses, paired with a label including a pose specification of the computer model as shown in the image. The image and the corresponding pose specification are defined by simulation unit 204. Capture unit 205 may store pose representations 215 in storage unit 210. Capture unit 205 may include sub-sections for performing additional functions, as described in the flow charts below. Such sub-sections may be referred to by a name associated with their function.

Function generator 206 is the portion of logic unit 200 that, in the course of pose detection, applies a learning process to the pose representations to generate a pose determination function. For example, the pose determination function may relate an image of the object to a pose specification. Function generator 206 may store parameters of the trained learning process, such as pose determination function parameters 217, in storage unit 210. Function generator 206 may include sub-sections for performing additional functions, as described in the flow charts below. Such sub-sections may be referred to by a name associated with their function.

Pose determiner 208 is the portion of logic unit 200 that, in the course of pose detection, determines a pose specification of an object by applying the pose determination function to an image of the object. For example, the pose specification is a 6D specification of position and orientation. In doing so, pose determiner 208 may utilize pose determination function parameters 217 stored in storage unit 210 and an image, captured by camera 225, of an object in the physical environment that is identical to computer model 212, resulting in the output of a 6D pose specification. Pose determiner 208 may include sub-sections for performing additional functions, as described in the flow charts below. Such sub-sections may be referred to by a name associated with their function.

Pose corrector 209 is the portion of logic unit 200 that corrects the pose specification of the object in the course of pose detection. In doing so, pose corrector 209 may utilize correction parameters 218 and computer model 212 stored in storage unit 210, resulting in the output of a corrected 6D pose specification. Pose corrector 209 may include sub-sections for performing additional functions, as described in the flow charts below. Such sub-sections may be referred to by a name associated with their function.

In this embodiment, pose detection device 220 may make it possible, simply by inputting a computer model, to automatically generate training data, train a learning process to generate a pose determination function, and then use the trained pose determination function.

In other embodiments, the pose detection device may be any other device capable of processing logic functions in order to perform the processes herein. The pose detection device may not need to be connected to a network in environments where the inputs, the outputs, and all information are directly connected. The logic unit and the storage unit need not be entirely separate devices and may share one or more computer-readable media. For example, the storage unit may be a hard drive storing both the computer-executable instructions and the data accessed by the logic unit, and the logic unit may be a combination of a central processing unit (CPU) and random access memory (RAM), into which the computer-executable instructions may be copied in whole or in part for execution by the CPU during performance of the processes herein. One or more graphics processing units (GPUs) may be included in the logic unit, particularly in embodiments that utilize a neural network.

In embodiments where the pose detection device is a computer, a program installed on the computer may cause the computer to function as, or perform operations associated with, the apparatus of the embodiments of the present invention or one or more units thereof (including modules, components, elements, and the like), and/or may cause the computer to perform the processes of the embodiments of the present invention or steps thereof. Such a program may be executed by a processor to cause the computer to perform certain operations associated with some or all of the blocks of the flow charts and block diagrams described herein.

In other embodiments, the camera may be a depth camera capable of capturing depth information for each pixel in addition to color information. In such embodiments, the capture unit also captures depth information as defined by the simulation unit, and the learning function is trained accordingly. In other words, the images of the computer model may include depth information, and capturing an image of the object therefore also includes capturing depth information. However, many depth cameras may have poor accuracy at close range. Depth cameras may therefore be better suited to larger-scale applications.

In some embodiments, multiple computer models may be used for a single application. Multiple computer models can easily be simulated in the simulation unit, but more training may be required to generate a reliable pose determination function. For example, if a single object includes two components that are connected but movable relative to each other, the components may be treated as separate objects, and the learning function is trained accordingly. In further embodiments, the label may include parameters that define the relationship between the components. For objects that change shape in more complex ways, such as objects that flow, deform, or have many moving parts, it may not be possible to generate a reliable pose determination function at all.

FIG. 3 shows an operational flow for pose detection, according to one embodiment of the present invention. This operational flow may provide a method of pose detection that may be performed by a pose detection device, such as pose detection device 220, or by any other device capable of performing the following operations.

At S330, an acquisition unit, such as acquisition unit 202, obtains a computer model. For example, the acquisition unit may obtain a computer model of the object from direct user input, such as from a CAD modeler (for example, CAD modeler 224), through a network such as network 228, or from another source. In some embodiments, the acquisition unit may generate the computer model by performing a 3D scan of the object.

At S340, a simulation unit, such as simulation unit 204, simulates the computer model in a realistic environment. In some embodiments, the simulation unit may simulate more than one instance of the computer model at the same time.

At S346, a capture unit, such as capture unit 205, captures training data consisting of pose representations. For example, the capture unit may capture a plurality of pose representations, each pose representation including an image of the computer model in one of a plurality of poses, paired with a label including a pose specification of the computer model as shown in the image. The image and the corresponding pose specification are defined by the simulation unit. In embodiments where the simulation unit simulates more than one instance of the computer model, each image may also include more than one instance of the computer model, each instance being in its own unique pose.

At S350, a function generator, such as function generator 206, generates a pose determination function. For example, the function generator may apply a learning process to the pose representations to generate a pose determination function that relates an image of the object to a pose specification.

At S360, a pose determiner, such as pose determiner 208, determines a pose specification. For example, the pose determiner may determine the pose specification of an object in the course of pose detection by applying the pose determination function to an image of the object. In some embodiments, a pose corrector, such as pose corrector 209, may correct the pose specification of the object. In some embodiments, the pose corrector may apply Direct Image Alignment (DIA) to reduce the difference between an image of the computer model rendered according to the pose specification of the object and the image of the object in the physical environment. In some embodiments, such as those in which depth information is available, the pose corrector may apply Coherent Point Drift (CPD) to reduce the difference between an image of the computer model rendered according to the pose specification of the object and the image of the object in the physical environment.

At S370, a robotic arm, such as robotic arm 226, may be positioned. For example, the pose detection device may position the robotic arm in accordance with the pose specification. In some embodiments, positioning the robotic arm may include determining the position of the object relative to the robotic arm based on the position of the camera that captured the image of the object, such as camera 225.
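Determining the position of the object relative to the robotic arm from the position of the camera amounts to chaining coordinate transforms. The sketch below shows that composition with 4x4 homogeneous matrices. It is a simplified illustration only: rotations are restricted to a single angle about the z axis to keep the code short, and the function names and the numeric camera placement are hypothetical.

```python
import math


def make_transform(yaw, translation):
    """4x4 homogeneous transform: rotate about z by yaw (radians), then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a, b):
    """Matrix product of two 4x4 transforms stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def object_in_robot_frame(robot_from_camera, camera_from_object):
    """Chain the camera's pose in the robot frame with the object's pose in the camera frame."""
    return compose(robot_from_camera, camera_from_object)


# Assumed setup: camera mounted 1 m ahead of the robot base and 0.5 m up,
# turned a quarter turn; the object was detected 0.2 m along the camera's x axis.
robot_from_camera = make_transform(math.pi / 2, (1.0, 0.0, 0.5))
camera_from_object = make_transform(0.0, (0.2, 0.0, 0.0))
robot_from_object = object_in_robot_frame(robot_from_camera, camera_from_object)
object_position = [row[3] for row in robot_from_object[:3]]
```

The last column of the composed matrix holds the object's position in the robot's frame, and the upper-left 3x3 block holds its orientation, which is the information a grasp planner would need.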

FIG. 4 shows an operational flow for simulating a computer model to capture training data, such as S340 and S346 in FIG. 3, according to one embodiment of the present invention. The operations within this operational flow may be performed by a simulation unit, such as simulation unit 204, or a correspondingly named sub-section thereof, and by a capture unit, such as capture unit 205, or a correspondingly named sub-section thereof.

At S442, an environment generator, such as simulation unit 204 or a sub-section thereof, generates a simulation environment. For example, the environment generator may create a 3D space in which the computer model is to be placed and part of which forms a platform. The remaining details of the environment, such as the background color and any other objects, matter little, if at all, for the purposes of the simulation, and may additionally be randomized to prevent the learning process from assigning any value to them.

At S444, a random assigner, such as simulation unit 204 or a sub-section thereof, randomly assigns colors, textures, and lighting. For example, the random assigner may randomly assign, for each pose, one or more surface colors to the computer model and the platform within the realistic environment simulator. As another example, the random assigner may randomly assign, for each pose, one or more surface textures to the computer model and the platform within the realistic environment simulator. As yet another example, the random assigner may randomly assign, for each pose, lighting effects in the environment within the realistic environment simulator. Such lighting effects may include at least one of brightness, contrast, color temperature, and direction.
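A per-pose random assignment of this kind might be sketched as follows. Everything concrete here is an assumption made for illustration: the `Appearance` record, the texture names, and the numeric ranges are hypothetical, and a real simulator would expose its own material and lighting parameters.

```python
import math
import random
from dataclasses import dataclass

TEXTURES = ("matte", "brushed", "glossy", "rough")  # hypothetical texture names


@dataclass
class Appearance:
    """Pose-independent scene properties, re-drawn for every captured pose."""
    model_color: tuple         # RGB components in [0, 1]
    platform_color: tuple      # RGB components in [0, 1]
    model_texture: str
    platform_texture: str
    brightness: float
    contrast: float
    color_temperature_k: float
    light_direction: tuple     # unit vector


def random_unit_vector(rng):
    """Draw a direction uniformly on the sphere by normalizing a Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:  # reject the (practically impossible) zero vector
            return tuple(c / norm for c in v)


def sample_appearance(rng):
    """Assign colors, textures, and lighting at random (ranges are assumptions)."""
    return Appearance(
        model_color=tuple(rng.random() for _ in range(3)),
        platform_color=tuple(rng.random() for _ in range(3)),
        model_texture=rng.choice(TEXTURES),
        platform_texture=rng.choice(TEXTURES),
        brightness=rng.uniform(0.3, 1.5),
        contrast=rng.uniform(0.5, 1.5),
        color_temperature_k=rng.uniform(2500.0, 7500.0),
        light_direction=random_unit_vector(rng),
    )


appearance = sample_appearance(random.Random(42))
```

Passing in a seeded `random.Random` keeps each training image reproducible, which can help when a particular sample needs to be regenerated for inspection.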

At S445, a motion inducing unit, such as simulation unit 204 or a subsection thereof, induces motion of the computer model. For example, the motion inducing unit may induce motion of the computer model relative to the platform within the realistic environment simulator so that the computer model assumes a random pose. Examples of induced motion include dropping, rotating, and colliding the computer model against the platform or against other instances of the computer model.

At S446, a capture unit, such as capture unit 205, may capture an image and a pose specification. For example, the capture unit may capture an image of the computer model within the simulation, such as by defining a soft camera within the simulation and using that soft camera to capture the image. The capture unit may also capture the pose specification of the computer model. The pose specification may be expressed from the viewpoint of the soft camera. Alternatively, the pose specification may be expressed from some other viewpoint, obtained for example by transforming the pose specification. In embodiments in which the simulation unit simulates more than one instance of the computer model, each image may also contain more than one instance of the computer model; each instance is in a unique pose and is therefore associated with a unique pose specification.
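Re-expressing a pose specification from another viewpoint amounts to a change of coordinate frame. The sketch below assumes, purely for illustration, that poses are stored as 4x4 homogeneous rigid transforms held in nested Python lists; the patent does not prescribe a representation, and all function names are hypothetical.

```python
import math


def rot_z(theta):
    """3x3 rotation about the z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    rows = [list(rotation[i]) + [translation[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t):
    """Inverse of a rigid transform [R t; 0 1], which is [R^T, -R^T t; 0 1]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [t[i][3] for i in range(3)]
    inv_trans = [-sum(rt[i][k] * trans[k] for k in range(3)) for i in range(3)]
    return make_pose(rt, inv_trans)


def world_to_camera(t_world_model, t_world_camera):
    """Express the model pose in the camera frame instead of the world frame."""
    return matmul(invert_rigid(t_world_camera), t_world_model)
```

For example, with a camera at world position (0, 0, 2) and a model at (1, 0, 0), `world_to_camera` places the model at (1, 0, -2) in the camera frame, and multiplying by the camera pose again recovers the world-frame pose.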

At S448, the simulation unit determines whether the capture unit has captured a sufficient amount of training data. If the amount of training data is insufficient, the operational flow proceeds to S449, where the environment is reset in preparation for capturing further training data. If the amount of training data is sufficient, the operational flow ends.
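The loop formed by S444 through S449 can be sketched as follows. `ToySimulator` is a hypothetical stand-in, not the interface of any real simulator: it draws a random 6D pose directly instead of running a physics engine, and it returns a small placeholder grid instead of a rendered image. The sufficiency test of S448 is assumed here to be a simple sample count.

```python
import math
import random


class ToySimulator:
    """Hypothetical stand-in for a realistic environment simulator."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.tint = 0.0
        self.pose = None

    def reset(self):
        # S449: clear the scene before the next sample.
        self.pose = None

    def randomize_appearance(self):
        # S444: a real simulator would redraw colors, textures, and lighting.
        self.tint = self.rng.random()

    def induce_motion(self):
        # S445: a physics engine would drop or collide the model; here a
        # random 6D pose (x, y, z, roll, pitch, yaw) is drawn directly.
        position = tuple(self.rng.uniform(-0.5, 0.5) for _ in range(3))
        orientation = tuple(self.rng.uniform(-math.pi, math.pi) for _ in range(3))
        self.pose = position + orientation

    def capture(self):
        # S446: return a placeholder image together with its pose label.
        image = [[self.tint] * 4 for _ in range(4)]
        return image, self.pose


def generate_training_data(sim, target_count):
    """Collect (image, pose specification) pairs until enough are captured."""
    dataset = []
    while len(dataset) < target_count:  # S448: is there enough data yet?
        if dataset:
            sim.reset()                 # S449: reset before another capture
        sim.randomize_appearance()
        sim.induce_motion()
        dataset.append(sim.capture())
    return dataset
```

Because the label comes from the simulator state rather than from manual annotation, every image is paired with the exact pose that produced it.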

FIG. 5 shows an operational flow for generating a pose determination function, such as S350 of FIG. 3, according to one embodiment of the invention. The operations in this flow may be performed by a function generation unit, such as function generation unit 206, or a correspondingly named subsection thereof.

At S552, a learning process definition unit, such as the function generation unit or a subsection thereof, defines a learning process. Defining the learning process may include defining the type of neural network, the dimensions of the neural network, the number of layers, and so on. In some embodiments, the learning process definition unit defines the learning process as a convolutional neural network.

At S554, a pose representation selection unit, such as the function generation unit or a subsection thereof, selects a pose representation from among the plurality of pose representations. To ensure that every pose representation is processed, only pose representations that have not previously been selected may be selected at S554 as the iterations of the operational flow proceed. In embodiments in which pose representations are processed as soon as they are captured, pose representation selection may be unnecessary.

At S556, a learning process application unit, such as the function generation unit or a subsection thereof, applies the learning process to the image of the selected pose representation. Applying the learning process to the pose representation may include using the image as input to the learning process so that the learning process produces an output. In embodiments in which the learning process includes a neural network and the pose representations are simulated images, the learning process may output a 6D pose specification.

At S557, a learning process adjustment unit, such as the function generation unit or a subsection thereof, adjusts the learning process using the label, that is, the pose specification defined by the simulation unit, as the target. As the iterations of the operational flow proceed, the learning process adjustment unit adjusts the parameters of the learning process, such as pose determination function parameters 217, to train the learning process to become the pose determination function. In embodiments in which the learning process includes a neural network and the pose representations are simulated images, the learning process adjustment unit may adjust the weights of the neural network, and the learning process may be trained to output a 6D pose specification for each instance of the computer model in the image. For example, after an image is input to the neural network, the error between the actual output of the neural network and the corresponding pose specification is calculated. Once the error is calculated, it is back-propagated, that is, the error is expressed as a derivative with respect to each weight of the network. Once the derivatives are obtained, the weights of the neural network are updated according to a function of those derivatives.
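The error, derivative, and weight-update sequence of S557 can be shown on a deliberately small model. The sketch below substitutes a single linear layer for the convolutional neural network of the description, takes a short feature vector in place of an image, and assumes a squared-error loss with plain gradient descent; these are illustrative choices, and the names `forward` and `train_step` are hypothetical.

```python
def forward(weights, features):
    """Linear layer: one output per row of weights (six rows for a 6D pose)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def train_step(weights, features, target_pose, learning_rate=0.1):
    """One update: compute the error, its derivatives, and adjust the weights.

    Returns the loss measured before the update.
    """
    predicted = forward(weights, features)
    errors = [p - t for p, t in zip(predicted, target_pose)]
    loss = 0.5 * sum(e * e for e in errors)
    # For this loss, d(loss)/d(weights[i][j]) = errors[i] * features[j].
    for i, row in enumerate(weights):
        for j in range(len(row)):
            row[j] -= learning_rate * errors[i] * features[j]
    return loss


# Usage: repeatedly fit one (features, pose label) pair.
weights = [[0.0, 0.0] for _ in range(6)]
features = [1.0, 0.5]
label = [0.1, -0.2, 0.3, 0.5, -1.0, 1.5]  # x, y, z, roll, pitch, yaw
losses = [train_step(weights, features, label) for _ in range(200)]
```

In a multi-layer network the same derivatives are obtained layer by layer through the chain rule, which is what back-propagation refers to; the update rule applied to each weight has the same form.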

At S559, the function generation unit determines whether all of the pose representations have been processed. If any pose representation remains unprocessed, the operational flow returns to S554, where another pose representation is selected for processing. If no pose representation remains unprocessed, the operational flow ends. As the operational flow of FIG. 5 is performed iteratively, the iterations of operations S554, S556, and S557 together constitute the operation of generating the pose determination function. At the end of the operational flow of FIG. 5, the learning process has been trained sufficiently to serve as the pose determination function.

Although training in this embodiment ends when all of the pose representations have been processed, other embodiments may use different criteria for deciding when training ends, such as the number of epochs or the amount of error. Likewise, although the parameters of the learning process in this embodiment are adjusted after each pose representation is applied, other embodiments may adjust the parameters at different intervals, for example once per epoch or depending on the amount of error. Finally, in this embodiment the trained learning process itself becomes the pose determination function, meaning that the output of the learning process is the pose specification. In other embodiments, the learning process does not itself output the pose specification, but produces some output that is combined with camera parameters to yield the pose specification. In these embodiments, the training data may be generated by removing such camera parameters from the pose specification so that the target output of the learning process is properly defined. In these embodiments, the pose determination function includes both the trained learning process and a function for combining its output with the camera parameters.
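One way to picture the variant in which the learning process output is combined with camera parameters is a pinhole camera model, which is an assumption made here for illustration and is not stated in the patent. In this sketch the learning process would predict a pixel location and a depth; `to_network_target` removes the camera parameters from a camera-frame position to define the training target, and `to_position` combines an output with the camera parameters again. The `Intrinsics` record and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    fx: float  # focal length in pixels along x
    fy: float  # focal length in pixels along y
    cx: float  # principal point x in pixels
    cy: float  # principal point y in pixels


def to_network_target(position, cam):
    """Camera-frame position (x, y, z) -> camera-independent target (u, v, depth)."""
    x, y, z = position
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy, z)


def to_position(output, cam):
    """Output (u, v, depth) combined with camera parameters -> position (x, y, z)."""
    u, v, z = output
    return ((u - cam.cx) * z / cam.fx, (v - cam.cy) * z / cam.fy, z)
```

Under this assumption the same trained learning process could be paired with a different camera by changing only the `Intrinsics` passed to `to_position`.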

FIG. 6 shows an operational flow for determining a pose specification, such as S360 of FIG. 3, according to one embodiment of the invention. The operations in this flow may be performed by a pose determination unit, such as pose determination unit 208, or a correspondingly named subsection thereof, and by a pose correction unit, such as pose correction unit 209, or a correspondingly named subsection thereof.

At S662, an image capture unit, such as pose determination unit 208 or a subsection thereof, captures an image of the object. For example, the image capture unit may capture an image of the object in a physical environment. The image capture unit may communicate with a camera, such as camera 225, or with another photosensor for capturing images. Although the pose determination function can be effectively trained so that color information does not affect the pose specification it outputs, an image captured in color can provide more information than, for example, an image captured in grayscale: the values representing the image differ more sharply at edges, which allows the pose determination function to detect the edges that delimit objects in the image more easily.

At S664, a pose determination function application unit, such as pose determination unit 208 or a correspondingly named subsection thereof, applies the pose determination function to the image. Applying the pose determination function to the image may include using the image as input to the pose determination function so that the pose determination function produces an output. In embodiments in which the pose determination function includes a neural network, the neural network may output a 6D pose specification for each instance of the computer model in the image.

At S666, an image creation unit, such as pose correction unit 209 or a correspondingly named subsection thereof, creates an image of the computer model. For example, the image creation unit may create an image of the computer model according to the pose specification of the object. In some embodiments, the image consists only of the computer model, posed according to the pose specification, on a plain background.

At S667, an image comparison unit, such as pose correction unit 209 or a correspondingly named subsection thereof, compares the created image with the captured image. For example, the image comparison unit may compare the image of the computer model according to the pose specification of the object with the image of the object in the physical environment. In some embodiments, to facilitate the comparison, a silhouette of the captured image, which may be generated by segmentation, is compared with a silhouette of the created image, which is computed directly from the simulation. This comparison may be performed iteratively until the error is sufficiently minimized.

At S669, a pose adjustment unit, such as pose correction unit 209 or a correspondingly named subsection thereof, adjusts the pose specification output by the pose determination function. For example, the pose adjustment unit may adjust the pose specification to reduce the difference between the captured image and the created image.
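The create, compare, and adjust cycle of S666 through S669 can be sketched on a toy problem. The example below is not Direct Image Alignment or Coherent Point Drift: it reduces the pose to an integer 2D offset, treats a silhouette as a set of pixel coordinates, and uses a greedy local search, all of which are simplifying assumptions. The function names are hypothetical.

```python
def render_silhouette(pose, width=4, height=3):
    """S666 (toy): pixels covered by a width x height rectangle at offset pose."""
    x0, y0 = pose
    return {(x0 + dx, y0 + dy) for dx in range(width) for dy in range(height)}


def silhouette_error(rendered, observed):
    """S667 (toy): number of pixels present in exactly one of the silhouettes."""
    return len(rendered ^ observed)


def refine_pose(initial_pose, observed, max_iterations=50):
    """S669 (toy): move the pose one pixel at a time while the error decreases."""
    pose = initial_pose
    error = silhouette_error(render_silhouette(pose), observed)
    for _ in range(max_iterations):
        if error == 0:
            break
        candidates = [(pose[0] + dx, pose[1] + dy)
                      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        best = min(candidates,
                   key=lambda p: silhouette_error(render_silhouette(p), observed))
        best_error = silhouette_error(render_silhouette(best), observed)
        if best_error >= error:
            break  # no neighboring pose improves the match
        pose, error = best, best_error
    return pose, error


# Usage: the initial estimate (3, 8) is refined toward the observed pose (5, 7).
observed = render_silhouette((5, 7))
refined_pose, remaining_error = refine_pose((3, 8), observed)
```

The search only works when the initial estimate already overlaps the observed silhouette, which mirrors the role of the pose determination function in supplying a starting pose close enough for the correction step to converge.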

In many of the embodiments herein, the pose detection device may make it possible to generate training data, train a learning process to produce a pose determination function, and then use the trained pose determination function, all automatically and simply by inputting a computer model. By using a simulator to generate the training data, the embodiments described herein may enable rapid image capture, including capture of the pose specification defined by the simulator as the label. Using the pose specification defined by the simulator also allows the labels to be very accurate. Existing simulators known for their realism, such as the UNREAL (registered trademark) engine, not only increase confidence in that accuracy but may also come with capabilities for image processing and for randomizing aspects of the environment.

Various embodiments of the present invention may be described with reference to flowcharts and block diagrams, whose blocks may represent (1) steps of a process in which operations are performed or (2) units of an apparatus responsible for performing the operations. Certain steps and units may be implemented by dedicated circuitry, by programmable circuitry supplied with computer-readable instructions stored on a computer-readable medium, and/or by a processor supplied with computer-readable instructions stored on a computer-readable medium. Dedicated circuitry may include digital and/or analog hardware circuits, and may include integrated circuits (ICs) and/or discrete circuits. Programmable circuitry may include reconfigurable hardware circuits comprising logical AND, OR, XOR, NAND, NOR, and other logical operations, flip-flops, registers, memory elements, and the like, such as field-programmable gate arrays (FPGAs) and programmable logic arrays (PLAs). A processor may include a central processing unit (CPU), a graphics processing unit (GPU), a mobile processing unit (MPU), and the like.

A computer-readable medium may include any tangible device capable of storing instructions for execution by a suitable device, so that the computer-readable medium having instructions stored thereon constitutes an article of manufacture including instructions that can be executed to create means for performing the operations specified in the flowcharts or block diagrams. Examples of computer-readable media include electronic storage media, magnetic storage media, optical storage media, electromagnetic storage media, and semiconductor storage media. More specific examples include a floppy disk, a diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), electrically erasable programmable read-only memory (EEPROM), static random access memory (SRAM), compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a BLU-RAY (registered trademark) disc, a memory stick, and an integrated circuit card.

Computer-readable instructions may include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk (registered trademark), JAVA (registered trademark), and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages.

Computer-readable instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, or to programmable circuitry, either locally or via a local area network (LAN) or a wide area network (WAN) such as the Internet, and may be executed to create means for performing the operations specified in the flowcharts or block diagrams. Examples of processors include computer processors, processing units, microprocessors, digital signal processors, controllers, and microcontrollers.

Many of the embodiments of the present invention involve, in particular, artificial intelligence, learning processes, and neural networks. Some of the foregoing embodiments describe specific types of neural network. However, a learning process usually starts with values such as weights set to random numbers. Such an untrained learning process must be trained before it can reasonably be expected to perform a function successfully. Many of the processes described herein serve the purpose of training a learning process for pose detection. Once trained, the learning process can be used for pose detection and may not require further training. In this sense, a trained pose determination function is the product of the process of training an untrained learning process.

Although the present invention has been described above by way of embodiments, the technical scope of the invention is not limited to the scope described in those embodiments. It will be apparent to those skilled in the art that various changes or improvements can be made to the embodiments. It is clear from the claims that forms incorporating such changes or improvements can also be included in the technical scope of the invention.

The operations, procedures, steps, and stages of each process performed by the apparatuses, systems, programs, and methods shown in the claims, embodiments, or drawings can be performed in any order, unless the order is indicated by "prior to," "before," or the like, and as long as the output of an earlier process is not used in a later process. Even where an operational flow in the claims, specification, or drawings is described using "first" or "next" for convenience, this does not mean that the operations must be performed in that order.

Claims

1. A computer program executable by a computer, the computer program causing the computer to perform operations comprising:
obtaining a computer model of an object;
simulating the computer model in a realistic environment simulator;
capturing training data comprising a plurality of pose representations, each pose representation comprising an image of the computer model in one of a plurality of poses, paired with a label containing a pose specification of the computer model as shown in the image, wherein the image and the pose specification of the computer model are defined by the realistic environment simulator;
applying a learning process to the plurality of pose representations to generate a pose determination function for associating an image of the object with a pose specification;
capturing an image of the object in a physical environment;
determining a pose specification of the object by applying the pose determination function to the image of the object; and
correcting the pose specification of the object;
wherein the image of the computer model includes depth information;
wherein the capturing of the image of the object further comprises capturing depth information; and
wherein the correcting includes applying one of Direct Image Alignment (DIA) and Coherent Point Drift (CPD) to reduce differences between the image of the computer model according to the pose specification of the object and the image of the object in the physical environment.

2. The computer program according to claim 1, wherein the operations further comprise positioning a robot arm according to the pose specification.

3. The computer program according to claim 2, wherein the positioning of the robot arm comprises determining the position of the object relative to the robot arm based on the position of a camera that captured the image of the object.

4. The computer program according to any one of claims 1 to 3, wherein the correcting comprises creating an image of the computer model according to the pose specification of the object.

5. The computer program according to claim 4, wherein the correcting further comprises:
comparing the image of the computer model according to the pose specification of the object with the image of the object in the physical environment; and
adjusting the pose specification to reduce differences between the captured image and the created image.

6. The computer program according to any preceding claim, wherein the pose specification is a 6D specification of position and orientation.

7. The computer program according to any one of claims 1 to 6, wherein:
the simulating includes simulating more than one instance of the computer model; and
each image includes the more than one instance of the computer model, each instance of the computer model being in a unique pose.

8. The computer program according to any one of claims 1 to 7, wherein:
the realistic environment simulator includes a physics engine; and
the simulating includes inducing motion of the computer model relative to a platform within the realistic environment simulator such that the computer model assumes a random pose.

9. The computer program according to claim 8, wherein the inducing of motion comprises at least one of dropping, rolling, and colliding.

10. The computer program according to claim 8 or 9, wherein the simulating comprises randomly assigning one or more surface colors to the computer model and the platform for each pose within the realistic environment simulator.

11. A computer program executable by a computer, the computer program causing the computer to perform operations comprising:
obtaining a computer model of an object;
simulating the computer model in a realistic environment simulator;
capturing training data comprising a plurality of pose representations, each pose representation comprising an image of the computer model in one of a plurality of poses, paired with a label containing a pose specification of the computer model as shown in the image, wherein the image and the pose specification of the computer model are defined by the realistic environment simulator; and
applying a learning process to the plurality of pose representations to generate a pose determination function for associating an image of the object with a pose specification;
wherein the realistic environment simulator includes a physics engine;
wherein the simulating includes inducing motion of the computer model relative to a platform within the realistic environment simulator such that the computer model assumes a random pose; and
wherein the simulating includes randomly assigning one or more surface colors to the computer model and the platform for each pose within the realistic environment simulator.

12. The computer program according to claim […], wherein the simulating comprises randomly assigning one or more surface textures to the computer model and the platform for each pose within the realistic environment simulator.

13. A computer program executable by a computer, the computer program causing the computer to perform operations comprising:
obtaining a computer model of an object;
simulating the computer model in a realistic environment simulator;
capturing training data comprising a plurality of pose representations, each pose representation comprising an image of the computer model in one of a plurality of poses, paired with a label containing a pose specification of the computer model as shown in the image, wherein the image and the pose specification of the computer model are defined by the realistic environment simulator; and
applying a learning process to the plurality of pose representations to generate a pose determination function for associating an image of the object with a pose specification;
wherein the realistic environment simulator includes a physics engine;
wherein the simulating includes inducing motion of the computer model relative to a platform within the realistic environment simulator such that the computer model assumes a random pose; and
wherein the simulating includes randomly assigning one or more surface textures to the computer model and the platform for each pose within the realistic environment simulator.

14. The computer program according to any one of claims 1 to 13, wherein the simulating comprises randomly assigning lighting effects in the environment for each pose within the realistic environment simulator.

15. A computer program executable by a computer, the computer program causing the computer to perform operations comprising:
obtaining a computer model of an object;
simulating the computer model in a realistic environment simulator;
capturing training data comprising a plurality of pose representations, each pose representation comprising an image of the computer model in one of a plurality of poses, paired with a label containing a pose specification of the computer model as shown in the image, wherein the image and the pose specification of the computer model are defined by the realistic environment simulator; and
applying a learning process to the plurality of pose representations to generate a pose determination function for associating an image of the object with a pose specification;
wherein the simulating includes randomly assigning lighting effects in the environment for each pose within the realistic environment simulator.

16. A computer program as claimed in claim 14 or 15, wherein the lighting effects include at least one of brightness, contrast, color temperature and direction.

17. The computer program according to any preceding claim, wherein the learning process is a convolutional neural network.

18. A computer-implemented method comprising:
obtaining a computer model of an object;
simulating the computer model in a realistic environment simulator;
capturing training data comprising a plurality of pose representations, each pose representation comprising an image of the computer model in one of a plurality of poses, paired with a label containing a pose specification of the computer model as shown in the image, wherein the image and the pose specification of the computer model are defined by the realistic environment simulator;
applying a learning process to the plurality of pose representations to generate a pose determination function for associating an image of the object with a pose specification;
capturing an image of the object in a physical environment;
determining a pose specification of the object by applying the pose determination function to the image of the object; and
correcting the pose specification of the object;
wherein the image of the computer model includes depth information;
wherein the capturing of the image of the object further comprises capturing depth information; and
wherein the correcting includes applying one of Direct Image Alignment (DIA) and Coherent Point Drift (CPD) to reduce differences between the image of the computer model according to the pose specification of the object and the image of the object in the physical environment.

obtaining a computer model of the object;
simulating the computer model in a realistic environment simulator;
capturing training data comprising a plurality of pose representations, each pose representation comprising an image of said computer model in one of a plurality of poses, one of said plurality of poses being said image; wherein said image and said pose designation of said computer model are defined by said realistic environment simulator;
applying a learning process to the plurality of pose representations to generate a pose determination function for associating an image of the object with a pose specification;
including
the realistic environment simulator includes a physics engine;
the simulating step includes inducing motion of the computer model relative to a platform within the realistic environment simulator such that the computer model assumes a random pose;
the simulating step includes randomly assigning one or more surface colors to the computer model and the platform for each pose within the realistic environment simulator;
Computer-implemented method.
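
The per-pose color randomization in the claim above could be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed simulator: there is no physics engine here, and `sample_pose` only draws a random pose as a placeholder for the pose the model would settle into after motion is induced. Image rendering is omitted, and every name (`PoseRepresentation`, `sample_pose`, `capture_training_data`) is hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class PoseRepresentation:
    pose: tuple            # (x, y, z, roll, pitch, yaw): label defined by the simulator
    model_color: tuple     # RGB in [0, 1], assigned at random for this pose
    platform_color: tuple  # RGB in [0, 1], assigned at random for this pose


def random_color(rng):
    """Draw one RGB surface color uniformly at random."""
    return tuple(rng.random() for _ in range(3))


def sample_pose(rng):
    """Placeholder for the physics step (assumption): a real simulator would
    move the model relative to the platform and read back the settled pose."""
    x, y = rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2)
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(3)]
    return (x, y, 0.0, *angles)


def capture_training_data(n, seed=0):
    """Build n pose representations, re-drawing both colors for every pose."""
    rng = random.Random(seed)
    return [
        PoseRepresentation(sample_pose(rng), random_color(rng), random_color(rng))
        for _ in range(n)
    ]


data = capture_training_data(5, seed=1)
```

Because the simulator defines both the pose and the appearance, each label is exact by construction, and a fixed seed makes the data set reproducible.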

obtaining a computer model of the object;
simulating the computer model in a realistic environment simulator;
capturing training data comprising a plurality of pose representations, each pose representation comprising an image of the computer model in one of a plurality of poses, paired with a label containing a pose specification of the computer model shown in the image, wherein the image and the pose specification of the computer model are defined by the realistic environment simulator;
applying a learning process to the plurality of pose representations to generate a pose determination function for associating an image of the object with a pose specification;
including
the realistic environment simulator includes a physics engine;
the simulating step includes inducing motion of the computer model relative to a platform within the realistic environment simulator such that the computer model assumes a random pose;
the simulating step includes randomly assigning one or more surface textures to the computer model and the platform for each pose within the realistic environment simulator;
Computer-implemented method.
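
One way to picture the random texture assignment in the claim above is with small procedural textures. The sketch below is illustrative only: the two texture kinds (checkerboard and uniform noise), the 8x8 grayscale grids, and all function names are assumptions, since the claim does not say how textures are produced or chosen.

```python
import random


def checkerboard(size, cell, a, b):
    """Grayscale checkerboard with square cells of side `cell`, values a and b."""
    return [
        [a if ((r // cell) + (c // cell)) % 2 == 0 else b for c in range(size)]
        for r in range(size)
    ]


def noise(size, rng):
    """Grayscale uniform-noise texture."""
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def random_texture(rng, size=8):
    """Pick a texture kind at random and generate it (hypothetical texture set)."""
    kind = rng.choice(["checkerboard", "noise"])
    if kind == "checkerboard":
        texture = checkerboard(size, rng.choice([1, 2, 4]), rng.random(), rng.random())
    else:
        texture = noise(size, rng)
    return kind, texture


def assign_textures(num_poses, seed=0):
    """For every pose, draw independent textures for the model and the platform."""
    rng = random.Random(seed)
    return [
        {"model": random_texture(rng), "platform": random_texture(rng)}
        for _ in range(num_poses)
    ]


samples = assign_textures(4, seed=3)
```

Drawing the model and platform textures independently for each pose keeps the learned function from relying on any one surface appearance.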

obtaining a computer model of the object;
simulating the computer model in a realistic environment simulator;
capturing training data comprising a plurality of pose representations, each pose representation comprising an image of the computer model in one of a plurality of poses, paired with a label containing a pose specification of the computer model shown in the image, wherein the image and the pose specification of the computer model are defined by the realistic environment simulator;
applying a learning process to the plurality of pose representations to generate a pose determination function for associating an image of the object with a pose specification;
including
the simulating step includes randomly assigning an environmental lighting effect to each pose within the realistic environment simulator;
Computer-implemented method.
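
The random lighting in the claim above can be illustrated with a single directional light and Lambertian shading. This is a sketch under assumptions, not the claimed simulator's lighting model: the light parameters, their ranges, and the names `random_light` and `shade` are hypothetical.

```python
import math
import random


def random_light(rng):
    """Draw a random directional light above the platform (assumed ranges)."""
    azimuth = rng.uniform(0.0, 2.0 * math.pi)
    elevation = rng.uniform(math.radians(20), math.radians(90))
    # Unit vector pointing from the surface toward the light.
    direction = (
        math.cos(elevation) * math.cos(azimuth),
        math.cos(elevation) * math.sin(azimuth),
        math.sin(elevation),
    )
    return {
        "direction": direction,
        "intensity": rng.uniform(0.3, 1.0),
        "ambient": rng.uniform(0.0, 0.3),
    }


def shade(normal, light):
    """Lambertian brightness of a surface with unit normal `normal`, clamped to 1."""
    lambert = max(0.0, sum(n * d for n, d in zip(normal, light["direction"])))
    return min(1.0, light["ambient"] + light["intensity"] * lambert)


rng = random.Random(0)
lights = [random_light(rng) for _ in range(3)]  # one lighting draw per pose
top_face = [shade((0.0, 0.0, 1.0), light) for light in lights]
```

The same upward-facing surface gets a different brightness under each draw, which is the variation the training images are meant to cover.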

an acquisition unit configured to acquire a computer model of an object;
a simulation unit configured to simulate the computer model in a realistic environment simulator;
a capture unit configured to capture training data comprising a plurality of pose representations, and an image of the object in a physical environment, each pose representation comprising an image of the computer model in one of a plurality of poses, paired with a label including a pose specification of the computer model shown in the image, the image and the pose specification of the computer model being defined by the realistic environment simulator;
a learning process applying unit configured to apply a learning process to the plurality of pose representations to generate a pose determination function for associating an image of the object with a pose specification;
a pose determination unit that determines a pose specification of the object by applying the pose determination function to the image of the object;
a pose correction unit that corrects the pose specification of the object;
with
the image of the computer model includes depth information;
the capture unit further captures depth information;
the pose correction unit applies one of Direct Image Alignment (DIA) and Coherent Point Drift (CPD) to reduce a difference between an image of the computer model according to the pose specification of the object and the image of the object in the physical environment;
Device.

an acquisition unit configured to acquire a computer model of an object;
a simulation unit configured to simulate the computer model in a realistic environment simulator;
a capture unit configured to capture training data comprising a plurality of pose representations, each pose representation comprising an image of the computer model in one of a plurality of poses, paired with a label containing a pose specification of the computer model shown in the image, the image and the pose specification of the computer model being defined by the realistic environment simulator;
a learning process applying unit configured to apply a learning process to the plurality of pose representations to generate a pose determination function for associating an image of the object with a pose specification;
with
the realistic environment simulator includes a physics engine;
the simulation unit induces motion of the computer model relative to a platform within the realistic environment simulator such that the computer model assumes a random pose;
wherein the simulation unit randomly assigns one or more surface colors to the computer model and the platform for each pose within the realistic environment simulator;
Device.

an acquisition unit configured to acquire a computer model of an object;
a simulation unit configured to simulate the computer model in a realistic environment simulator;
a capture unit configured to capture training data comprising a plurality of pose representations, each pose representation comprising an image of the computer model in one of a plurality of poses, paired with a label containing a pose specification of the computer model shown in the image, the image and the pose specification of the computer model being defined by the realistic environment simulator;
a learning process applying unit configured to apply a learning process to the plurality of pose representations to generate a pose determination function for associating an image of the object with a pose specification;
with
the realistic environment simulator includes a physics engine;
the simulation unit induces motion of the computer model relative to a platform within the realistic environment simulator such that the computer model assumes a random pose;
the simulation unit randomly assigns one or more surface textures to the computer model and the platform for each pose within the realistic environment simulator;
Device.

an acquisition unit configured to acquire a computer model of an object;
a simulation unit configured to simulate the computer model in a realistic environment simulator;
a capture unit configured to capture training data comprising a plurality of pose representations, each pose representation comprising an image of the computer model in one of a plurality of poses, paired with a label containing a pose specification of the computer model shown in the image, the image and the pose specification of the computer model being defined by the realistic environment simulator;
a learning process applying unit configured to apply a learning process to the plurality of pose representations to generate a pose determination function for associating an image of the object with a pose specification;
with
the simulation unit randomly assigns an environmental lighting effect to each pose within the realistic environment simulator;
Device.