JP2022024189A

JP2022024189A - Learning data creation method, learning data creation device, and program

Info

Publication number: JP2022024189A
Application number: JP2018182538A
Authority: JP
Inventors: 叡一松元; Eiichi Matsumoto; 颯介小林; Sosuke Kobayashi; 悠太菊池; Yuta Kikuchi; 祐貴五十嵐; Yuki Igarashi; 統太郎中島; Totaro Nakajima
Original assignee: Preferred Networks Inc
Current assignee: Preferred Networks Inc
Priority date: 2018-09-27
Filing date: 2018-09-27
Publication date: 2022-02-09
Also published as: WO2020067204A1

Abstract

To easily create learning data. SOLUTION: In one embodiment, a computer executes: a first creation procedure in which a three-dimensional simulator creates a virtual space in which one or more objects are positioned; a second creation procedure in which, in response to the creation of a first image capturing the inside of a prescribed range in a real space corresponding to the virtual space, a second image is created in which the three-dimensional simulator renders the inside of that range in the virtual space; and a third creation procedure in which prescribed information obtainable from the second image is added to the first image as teacher information, thereby creating learning data used for training a prescribed machine learning model. SELECTED DRAWING: Figure 2

Description

The present invention relates to a learning data creation method, a learning data creation device, and a program.

In recent years, machine learning techniques have been used to perform a variety of tasks. One known example of such a task is semantic segmentation. Semantic segmentation is the task of classifying each pixel in an image captured by a camera device or the like into a class according to the meaning that the pixel represents (for example, the name of the object that the pixel depicts).

In many tasks, including semantic segmentation, the machine learning model is often trained by supervised learning.

Japanese Unexamined Patent Publication No. 2017-182129
Japanese Unexamined Patent Publication No. 2016-71597

However, the learning data used for supervised learning is often created manually. In semantic segmentation, for example, learning data in which teacher information (the class of each pixel) is attached to an image is created by painting each pixel in the image with the color of the class that corresponds to the meaning of that pixel.
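As a rough illustration of the manual labeling described above, the sketch below represents the teacher information for semantic segmentation as a grid of per-pixel class IDs and paints it into a color mask. The class IDs, object names, and colors in `PALETTE` are hypothetical and do not come from the source.

```python
# Hypothetical palette: class ID -> RGB color (0 = background, 1 = cup, 2 = book).
PALETTE = {0: (0, 0, 0), 1: (255, 0, 0), 2: (0, 0, 255)}


def colorize(label_map):
    """Turn a 2D grid of class IDs into a grid of RGB tuples (the painted mask)."""
    return [[PALETTE[class_id] for class_id in row] for row in label_map]


# A tiny 2x3 "image": each entry is the class of the pixel at that location.
label_map = [
    [0, 1, 1],
    [0, 2, 2],
]
mask = colorize(label_map)
```

Producing `label_map` by hand for every pixel of every image is the labor that the embodiment aims to avoid.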

In addition, depending on the task to be performed by the machine learning technique, it may be necessary to create learning data to which multiple pieces of teacher information are attached. For example, in addition to the class labels described above, learning data may be required in which the posture of an object in the image (its orientation, rotation, and so on), the state of the object, and the like are attached as teacher information.

Furthermore, training a machine learning model generally requires a large amount of learning data. Creating the learning data used for supervised learning has therefore sometimes required a great deal of labor and time.

The embodiment of the present invention has been made in view of the above points, and its object is to make learning data easy to create.

To achieve the above object, the embodiment of the present invention is characterized in that a computer executes: a first creation procedure of creating, with a three-dimensional simulator, a virtual space in which one or more objects are arranged; a second creation procedure of creating, in response to the creation of a first image capturing a predetermined range of the real space corresponding to the virtual space, a second image in which the three-dimensional simulator renders that range of the virtual space; and a third creation procedure of creating learning data used for training a predetermined machine learning model by attaching, to the first image, predetermined information obtained from the second image as teacher information.

Learning data can be created easily.

A diagram showing an example of the overall configuration of the learning data creation system according to the embodiment of the present invention.
A diagram for schematically explaining an example of learning data creation.
A diagram for explaining an example of the flow of the advance preparation procedure.
A diagram for explaining an example of the flow of the learning data creation procedure.
A diagram showing an example of a teacher information list.
A diagram showing an example of the hardware configuration of the learning data creation device according to the embodiment of the present invention.

Embodiments of the present invention are described below. The following describes a learning data creation system 1 that can easily create learning data for a machine learning model that performs a predetermined task. Examples of the predetermined task include recognizing and classifying an object in an image captured by a camera device or the like, grasping the state of the object, and taking some action concerning the object (for example, gripping the object or avoiding the object).

In the embodiment of the present invention, learning data is created from two images: an image of a virtual space created by a three-dimensional simulator, captured by a camera device (that is, a virtual camera device installed in the virtual space) (hereinafter also referred to as a "virtual captured image"), and an image of the real space corresponding to that virtual space, captured by an actual camera device (hereinafter also referred to as an "actual captured image"). The learning data is created by attaching, to the actual captured image, teacher information obtained from the virtual captured image. Examples of the teacher information include contour line information of an object in the virtual captured image, the class into which the object is classified, the object name of the object, state information of the object, the depth to the object, the posture of the object, and information for performing a predetermined action concerning the object.

Here, the real space corresponding to the virtual space is, for example, a real space in which the same objects are arranged at the same positions as in the virtual space created by the three-dimensional simulator. Positions in the virtual space and the real space are the same when, for example, their position coordinates are identical under a single coordinate system set for both spaces. However, the virtual space and the real space may instead each be given its own coordinate system, provided the two are mutually convertible. In the following, "position" and "posture" denote a position and a posture in the same coordinate system set for the virtual space and the real space.
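The case of mutually convertible coordinate systems can be sketched as follows. This is a minimal illustration that assumes the two systems differ only by a rotation about the vertical (z) axis and a translation; the source does not specify the form of the conversion, and `make_transform`, `to_virtual`, and `to_real` are hypothetical names.

```python
import math


def make_transform(yaw_deg, translation):
    """Build a rigid transform (rotate about z by yaw_deg, then translate).

    Returns a pair of functions: real -> virtual and virtual -> real.
    """
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    tx, ty, tz = translation

    def to_virtual(p):
        x, y, z = p
        return (c * x - s * y + tx, s * x + c * y + ty, z + tz)

    def to_real(q):
        # Inverse: undo the translation, then apply the transposed rotation.
        x, y, z = q[0] - tx, q[1] - ty, q[2] - tz
        return (c * x + s * y, -s * x + c * y, z)

    return to_virtual, to_real


to_virtual, to_real = make_transform(90.0, (1.0, 2.0, 0.0))
real_point = (1.0, 0.0, 0.5)
virtual_point = to_virtual(real_point)
round_trip = to_real(virtual_point)
```

Because the conversion is invertible, a position measured in the real space can be expressed in the virtual space and back without loss, which is all that "mutually convertible" requires here.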

Likewise, an object in the virtual space and an object in the real space are the same when the object represented by a three-dimensional model arranged in the virtual space is the same as the actual object arranged in the real space. To distinguish it from an actual object arranged in the real space, an object arranged in the virtual space is also referred to as a "virtual object".

<Overall configuration of learning data creation system 1>
First, the overall configuration of the learning data creation system 1 according to the embodiment of the present invention is described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the overall configuration of the learning data creation system 1 according to the embodiment of the present invention.

As shown in FIG. 1, the learning data creation system 1 according to the embodiment of the present invention includes a learning data creation device 10, one or more camera devices 20, and one or more tracking devices 30. The learning data creation device 10, the camera devices 20, and the tracking devices 30 are communicably connected via a communication network such as a wireless LAN (Local Area Network). All or part of this communication network may instead be, for example, a wired LAN.

The learning data creation device 10 is a computer or computer system that creates learning data. The learning data creation device 10 includes a three-dimensional simulator 100, a learning data creation unit 200, and a storage unit 300.

The three-dimensional simulator 100 is a simulator capable of simulating a three-dimensional virtual space. The three-dimensional simulator 100 can place virtual objects in the virtual space and perform physics calculations that simulate the physical laws governing those objects (for example, collision detection between objects).

The three-dimensional simulator 100 can also render a virtual captured image, that is, an image of the virtual space captured by a virtual camera device. In doing so, the three-dimensional simulator 100 can attach to the virtual captured image, for example, contour line information and the object name of an object (virtual object) in the virtual captured image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, and the result of a predetermined physics calculation concerning the object. This information is generated by computation or the like in the three-dimensional simulator 100.

Such a three-dimensional simulator 100 is implemented by, for example, a game engine such as Unity, Unreal Engine 4 (UE4), or Blender. However, the three-dimensional simulator 100 is not limited to these game engines and may be implemented by any three-dimensional simulation software.

The learning data creation unit 200 creates learning data by attaching, to an actual captured image, teacher information obtained from a virtual captured image. The teacher information is the information that the three-dimensional simulator 100 attached to the virtual captured image when rendering it (for example, contour line information and the object name of an object in the virtual captured image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, and the result of a predetermined physics calculation concerning the object).

In this way, the learning data creation unit 200 creates learning data by attaching information obtained from the virtual captured image rendered by the three-dimensional simulator 100 to the actual captured image as teacher information.

The storage unit 300 stores various kinds of information. Examples of the information stored in the storage unit 300 include actual captured images of the real space taken by the camera device 20, virtual captured images rendered by the three-dimensional simulator 100, tracking information acquired from the tracking device 30, and three-dimensional models of the objects (virtual objects) arranged in the virtual space. Here, the tracking information is information obtained by tracking the position and posture of the camera device 20 in the real space. That is, the tracking information indicates both the position and the posture of the camera device 20 at each point in time. However, the tracking information may instead indicate, for example, only the position of the camera device 20 at each point in time.
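One possible shape for the tracking information described above is a list of time-stamped pose samples. The sketch below is an assumption for illustration only: the source does not define a record format, and the field names and the use of a quaternion for the posture are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TrackingSample:
    """Pose of the camera device at one point in time (hypothetical layout)."""
    timestamp: float                      # seconds
    position: Tuple[float, float, float]  # (x, y, z) in the shared coordinate system
    # Posture as a quaternion (w, x, y, z); None models position-only tracking.
    orientation: Optional[Tuple[float, float, float, float]] = None


tracking_info = [
    TrackingSample(0.0, (0.0, 0.0, 1.5), (1.0, 0.0, 0.0, 0.0)),
    TrackingSample(0.5, (0.1, 0.0, 1.5)),  # position-only sample
]
has_full_pose = [s.orientation is not None for s in tracking_info]
```

Making the posture optional covers both variants mentioned in the text: tracking information with position and posture, and tracking information with position only.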

The camera device 20 is an imaging device that captures the real space and creates an actual captured image. The camera device 20 is fixed to, for example, a portable camera stand or the like on which the tracking device 30 is mounted. The actual captured image created by the camera device 20 is transmitted to the learning data creation device 10 and stored in the storage unit 300. The camera device 20 may be, for example, a depth camera capable of creating an actual captured image to which depth information is attached.

The tracking device 30 is a device that tracks the position and posture of the camera device 20 and creates tracking information (for example, a sensing device equipped with a position sensor and a posture sensor). The tracking device 30 is mounted on, for example, a portable camera stand or the like. In this way, one tracking device 30 is installed in association with one camera device 20. The tracking information created by the tracking device 30 is transmitted to the learning data creation device 10 and stored in the storage unit 300. The tracking device 30 may instead be, for example, mounted directly on the camera device 20 or built into the camera device 20.

The configuration of the learning data creation system 1 shown in FIG. 1 is an example, and other configurations are possible. For example, the learning data creation system 1 may include any number of camera devices 20 and the tracking devices 30 corresponding to those camera devices 20.

Further, if the position and posture of a camera device 20 in the real space are known, the tracking device 30 corresponding to that camera device 20 may be omitted. For example, when the camera device 20 is fixedly installed at a predetermined position and in a predetermined posture, the tracking device 30 corresponding to that camera device 20 is not required.

<How to create learning data>
Here, an outline of how the learning data creation device 10 according to the embodiment of the present invention creates learning data is described with reference to FIG. 2. FIG. 2 is a diagram for schematically explaining an example of learning data creation.

As shown in FIG. 2, let an actual captured image taken by the camera device 20 at a certain position and in a certain posture in the real space be the "actual captured image G110". Likewise, let a virtual captured image taken by a virtual camera device at the same position and in the same posture in the virtual space corresponding to the real space be the "virtual captured image G210".

At this time, information generated by computation or the like in the three-dimensional simulator 100 is attached to the virtual captured image G210 (in FIG. 2, "contour line" and "object name" as an example). That is, in the example shown in FIG. 2, the contour line and the object name of each object in the virtual captured image G210 are attached. Which of the information that the three-dimensional simulator can generate is attached to the virtual captured image G210 depends on the task that the machine learning model is to perform.

The learning data creation device 10 creates learning data G120 by attaching the information attached to the virtual captured image G210 (that is, "contour line" and "object name") to the actual captured image G110 as teacher information. As a result, learning data G120 represented by the pair of the actual captured image G110 and the teacher information (that is, "contour line" and "object name") is created.

In this way, the learning data creation device 10 according to the embodiment of the present invention uses the actual captured image G110, which actually captures a certain range of the real space, and the virtual captured image G210, which virtually captures the same range of the virtual space, and creates the learning data G120 by attaching the information obtained from the virtual captured image G210 (that is, the information generated by computation or the like in the three-dimensional simulator) to the actual captured image G110. The learning data creation device 10 according to the embodiment of the present invention can therefore create the learning data G120 easily.

Moreover, in the learning data creation device 10 according to the embodiment of the present invention, the position and posture of the camera device 20 are identified from the tracking information acquired from the tracking device 30, so the position and posture of the virtual camera device in the virtual space can be synchronized with those of the camera device 20. The user can therefore easily obtain an actual captured image and the corresponding virtual captured image simply by, for example, capturing the real space with the camera device 20.

<Flow of advance preparation procedure>
In the embodiment of the present invention, as described above, the virtual space and the real space must correspond to each other. Therefore, as advance preparation for creating learning data, the virtual space must be brought into correspondence with the real space. The following describes, with reference to FIG. 3, a procedure for doing so by having the three-dimensional simulator 100 place virtual objects in the virtual space and then placing the actual objects in the real space so as to correspond to this virtual space. FIG. 3 is a diagram for explaining an example of the flow of the advance preparation procedure.

Step S101: The three-dimensional simulator 100 acquires, from the storage unit 300, the three-dimensional models of the objects (virtual objects) to be arranged in the virtual space. This means, for example, importing the three-dimensional model data stored in the storage unit 300 into the three-dimensional simulator 100. In addition to the shape of the object, each three-dimensional model carries information such as an object ID, an object name, and the category to which the object belongs.

The three-dimensional models may be created in advance by any method and saved in the storage unit 300. A three-dimensional model may be created, for example, by scanning the three-dimensional shape of an actual object with a three-dimensional scanner or the like, or manually with three-dimensional modeling software or the like.

Step S102: The three-dimensional simulator 100 places the virtual object represented by a three-dimensional model in the virtual space. For example, the user can place a virtual object in the virtual space by selecting the desired three-dimensional model from among the three-dimensional models imported in step S101 above and dragging and dropping it into the virtual space. Alternatively, the user may be able to place the virtual object represented by the three-dimensional model in the virtual space by specifying position coordinates in the virtual space.

When placing the virtual object represented by a three-dimensional model in the virtual space, the user may tilt or rotate the virtual object as desired before placing it. The user may also, for example, enlarge or reduce the virtual object before placing it.

When multiple virtual objects are to be placed in the virtual space, step S102 above is simply repeated.

Through steps S101 and S102 above, the three-dimensional simulator 100 creates a virtual space in which one or more objects (virtual objects) are arranged at the desired positions.
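Steps S101 and S102 can be sketched in plain Python as follows. This is not the API of any particular game engine; the classes `Model3D`, `PlacedObject`, and `VirtualSpace`, the dictionary standing in for the storage unit 300, and all object IDs and coordinates are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Model3D:
    """Metadata carried by a three-dimensional model (shape data omitted)."""
    object_id: str
    name: str
    category: str


@dataclass
class PlacedObject:
    model: Model3D
    position: Vec3
    rotation_deg: Vec3 = (0.0, 0.0, 0.0)  # tilt / rotation applied before placing
    scale: float = 1.0                    # enlargement or reduction


@dataclass
class VirtualSpace:
    objects: List[PlacedObject] = field(default_factory=list)

    def place(self, model, position, rotation_deg=(0.0, 0.0, 0.0), scale=1.0):
        """Step S102: place one virtual object; call repeatedly for several."""
        placed = PlacedObject(model, position, rotation_deg, scale)
        self.objects.append(placed)
        return placed


# Step S101: "import" models from a stand-in for the storage unit 300.
storage = {
    "obj-001": Model3D("obj-001", "cup", "tableware"),
    "obj-002": Model3D("obj-002", "book", "stationery"),
}

space = VirtualSpace()
space.place(storage["obj-001"], (0.2, 0.0, 0.7))
space.place(storage["obj-002"], (0.5, 0.1, 0.7), rotation_deg=(0.0, 0.0, 90.0), scale=1.2)
```

In an actual simulator the same information would be held by the engine's scene graph; the point of the sketch is only that each placed object keeps its model metadata together with its position, rotation, and scale.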

Step S103: The user places actual objects in the real space so as to correspond to the virtual space created in steps S101 and S102 above.

Here, the user may, for example, use a display device that can display the virtual objects arranged in the virtual space superimposed on the real space and that is equipped with a position sensor and a posture sensor, and place each actual object at the same position as the virtual object shown on this display device. Examples of such a display device include a head-mounted display equipped with a position sensor, a posture sensor, and a camera; a see-through head-mounted display equipped with a position sensor and a posture sensor, through which the real space is visible; a projection mapping device; a tablet terminal equipped with a position sensor, a posture sensor, and a camera; and a smartphone equipped with a position sensor, a posture sensor, and a camera.

These display devices can synchronize positions in the virtual space with positions in the real space and then display video in which the virtual objects are superimposed on the real space. The user can therefore, for example, carry or wear the display device, move around the real space, and place the same object in the real space at the same position and in the same posture as the virtual object in the video.

As a result, the virtual space created in steps S101 and S102 above can be brought into correspondence with the real space. Alternatively, the same object may be placed in the real space at the same position and in the same posture as the virtual object arranged in the virtual space by, for example, creating a mixed reality that merges the real space and the virtual space using a technology such as MR (Mixed Reality).

<Flow of learning data creation procedure>
Next, a procedure for creating an actual captured image and the corresponding virtual captured image, and then creating learning data from these two images, is described with reference to FIG. 4. FIG. 4 is a diagram for explaining an example of the flow of the learning data creation procedure.

Step S201: The user captures a desired range of the real space using the camera device 20. The camera device 20 thereby creates an actual captured image and transmits it to the learning data creation device 10. In the learning data creation device 10, the actual captured image is stored in the storage unit 300.

At this time, the tracking device 30 corresponding to the camera device 20 also transmits tracking information to the learning data creation device 10. In the learning data creation device 10, the tracking information is thereby stored in the storage unit 300. As described above, the tracking information indicates the position and posture of the camera device 20.

In step S201 above, the tracking information created by the tracking device 30 tracking the position and posture of the camera device 20 is stored in the storage unit 300, but the embodiment is not limited to this. The position and posture of the camera device 20 may be tracked by any method, and tracking information indicating the result may be stored in the storage unit 300. For example, a two-dimensional code such as a QR code (registered trademark) may be attached to the camera device 20 in advance, and the position and posture of the camera device 20 may be tracked by reading this two-dimensional code with a camera or the like.

Step S202: The three-dimensional simulator 100 captures the virtual space with a virtual camera device at the same position and in the same posture as the camera device 20 that performed the capture in step S201 above. That is, the three-dimensional simulator 100 renders, in the virtual space, the capture range of a virtual camera device whose position and posture are the same as those of the camera device 20 that performed the capture in step S201.

Here, the three-dimensional simulator 100 can identify the position and posture of the camera device 20 from the tracking information created in step S201 above. The three-dimensional simulator 100 can therefore install a virtual camera device in the virtual space at the same position and in the same posture as the camera device 20 in the real space. A virtual captured image corresponding to the actual captured image created in step S201 is thereby created.
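One way to identify the camera pose for a given actual captured image is to pick the tracking sample closest in time to the capture. The source does not say how images and tracking samples are matched, so the nearest-timestamp rule below is an assumption, and `pose_at` and the dictionary keys are hypothetical.

```python
def pose_at(tracking_info, capture_time):
    """Return the (position, orientation) of the sample nearest to capture_time."""
    if not tracking_info:
        raise ValueError("no tracking information available")
    sample = min(tracking_info, key=lambda s: abs(s["timestamp"] - capture_time))
    return sample["position"], sample["orientation"]


# Hypothetical tracking samples; orientation is a quaternion (w, x, y, z).
tracking_info = [
    {"timestamp": 0.0, "position": (0.0, 0.0, 1.5), "orientation": (1.0, 0.0, 0.0, 0.0)},
    {"timestamp": 0.2, "position": (0.1, 0.0, 1.5), "orientation": (1.0, 0.0, 0.0, 0.0)},
    {"timestamp": 0.4, "position": (0.2, 0.0, 1.5), "orientation": (1.0, 0.0, 0.0, 0.0)},
]

# Pose to apply to the virtual camera for an image captured at t = 0.27 s.
virtual_camera_position, virtual_camera_orientation = pose_at(tracking_info, 0.27)
```

The returned pose would then be applied to the virtual camera device before rendering; interpolating between neighboring samples is another option when the tracking rate is low.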

At this time, the three-dimensional simulator 100 attaches to the virtual captured image predetermined information that is acquired in the virtual space or generated there by computation. The three-dimensional simulator 100 then stores the virtual captured image, with the predetermined information attached, in the storage unit 300.

As described above, examples of the predetermined information include contour line information and the object name of an object in the virtual captured image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, and the result of a predetermined physics computation performed on the object. One example of such a physics computation result is information on a motion by which a robot arm capable of preset motions can grip the object at that position. Another example is information on a motion by which a mobile robot capable of preset motions can avoid the object at that position. These robot arms and mobile robots are examples of an operating subject capable of preset motions.

Step S202 above may, for example, be executed automatically after step S201, or may be executed in response to a user operation (for example, an operation to start rendering in the virtual space).

Step S203: The learning data creation unit 200 attaches, to the actual captured image created in step S201 above, the predetermined information attached to the virtual captured image created in step S202 above, as teacher information. This creates learning data represented by a pair of the actual captured image and the teacher information.
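Step S203 can be pictured with the following minimal sketch, which copies the information attached to a virtual captured image onto the corresponding actual captured image. The dictionary keys and the function name are hypothetical, chosen only for this example.

```python
def attach_teacher_info(real_image, virtual_record):
    """Pair an actual captured image with the information attached to the
    corresponding virtual captured image (hypothetical record layout)."""
    return {
        "image": real_image,
        # Copy the list so later edits to the sample do not alter the stored record.
        "teacher_info": list(virtual_record["predetermined_info"]),
    }


# Illustrative values only.
virtual_record = {
    "image": "virtual_image101.png",
    "predetermined_info": [{"object_id": "obj1", "depth": 1.25}],
}
sample = attach_teacher_info("image101.png", virtual_record)
```

No manual labeling step appears anywhere in the sketch: the teacher information comes entirely from the simulator side.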

Here, the teacher information included in the learning data is represented, for example, in list form. As an example, FIG. 5 shows multiple pieces of teacher information represented in list form (also referred to as a "teacher information list"). FIG. 5 is an example of the teacher information list attached to a certain actual captured image (image ID: image101).

Each piece of teacher information in the teacher information list shown in FIG. 5 associates an object ID, position information, contour line information, contact information, and gripping motion information with one another.

The object ID is an ID that identifies an object. The object ID is, for example, information assigned to the three-dimensional model of an object placed in the virtual space.

The position information is the position coordinates at which the object is placed. It is, for example, information assigned to the object represented by a three-dimensional model when that object is placed in step S102 above.

The contour line information indicates the contour line of the object. It can be obtained, for example, from the rendering result produced when the virtual captured image is drawn (rendered) in step S202 above.
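One way contour line information could be derived from a rendering result is sketched below. It assumes the renderer can also output a per-pixel object ID map alongside the virtual captured image; the specification does not say how the contour is extracted, so this 4-neighbour boundary test is only an illustrative choice.

```python
def contour_pixels(id_mask, object_id):
    """Return the (row, col) pixels of object_id that touch another ID or the image edge."""
    height, width = len(id_mask), len(id_mask[0])
    contour = []
    for r in range(height):
        for c in range(width):
            if id_mask[r][c] != object_id:
                continue
            # A pixel is on the contour if any 4-neighbour is outside the object.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < height and 0 <= nc < width)
                if outside or id_mask[nr][nc] != object_id:
                    contour.append((r, c))
                    break
    return contour


# Illustrative 5x5 ID map: object 7 occupies the central 3x3 block.
mask = [[7 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
edge = contour_pixels(mask, 7)
```

Since the ID map is rendered rather than hand-drawn, the resulting contour follows the object boundary pixel by pixel.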

The contact information indicates, when the object with this object ID is in contact with another object, the object ID of the other object, the position of contact with it, and so on. It can be obtained, for example, from the results of the physics computation of the three-dimensional simulator 100.

The gripping motion information is, for example, information on a motion by which a robot arm capable of preset motions can grip the object with this object ID at the shooting position of the virtual captured image. It can be obtained, for example, from the results of the physics computation of the three-dimensional simulator 100.

In this way, the teacher information list shown in FIG. 5 is a list of teacher information in which, for each object, position information, an object name, contour line information, contact information, and gripping motion information are associated with one another. In addition, any information that the three-dimensional simulator 100 can acquire or compute may be associated with the teacher information. For example, the virtual captured image itself, or a partial region of it, may be associated as teacher information. Specifically, for example, the portion of the image region of the virtual captured image that represents the object with a given object ID may be associated with that object ID.

Each piece of teacher information may also associate only some of the items above (position information, object name, contour line information, contact information, gripping motion information, and so on).
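The teacher information list described above might be modeled as follows. This is a minimal sketch: the field names, types, and JSON layout are assumptions for illustration, since the actual layout of FIG. 5 is not reproduced here. Every field except the object ID is optional, reflecting that a piece of teacher information may carry only some of the items.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass
class TeacherInfo:
    """One entry of a teacher information list (hypothetical field names)."""
    object_id: str
    position: Optional[Tuple[float, float, float]] = None
    contour: Optional[List[Tuple[int, int]]] = None
    contact: Optional[List[dict]] = None
    gripping_motion: Optional[dict] = None


def to_json(image_id, teacher_info_list):
    """Serialize the teacher information list attached to one actual captured image."""
    record = {
        "image_id": image_id,
        "teacher_info": [asdict(t) for t in teacher_info_list],
    }
    return json.dumps(record)


# Illustrative values only.
entries = [
    TeacherInfo("obj1", position=(0.0, 1.0, 2.0),
                contact=[{"other_object_id": "obj2", "point": [0.0, 1.0, 1.5]}]),
    TeacherInfo("obj2"),  # an entry carrying only some of the items
]
serialized = to_json("image101", entries)
```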

Representing the teacher information included in the learning data in list form is only one example; the teacher information may be represented in any other format.

<Hardware configuration of learning data creation device 10>
Next, the hardware configuration of the learning data creation device 10 according to the embodiment of the present invention will be described with reference to FIG. 6. FIG. 6 is a diagram showing an example of the hardware configuration of the learning data creation device 10 according to the embodiment of the present invention.

As shown in FIG. 6, the learning data creation device 10 according to the embodiment of the present invention includes an input device 401, a display device 402, an external I/F 403, a communication I/F 404, a RAM (Random Access Memory) 405, a ROM (Read Only Memory) 406, a processor 407, and an auxiliary storage device 408. These hardware components are connected to one another by a bus 409.

The input device 401 is, for example, a keyboard, a mouse, or a touch panel, and is used by the user to input various operations. The display device 402 is, for example, a display, and shows the results of the various processes of the learning data creation device 10. The learning data creation device 10 may lack at least one of the input device 401 and the display device 402.

The external I/F 403 is an interface to external devices. The external devices include a recording medium 403a and the like. The learning data creation device 10 can read from and write to the recording medium 403a and the like via the external I/F 403. One or more programs that implement the three-dimensional simulator 100 and the learning data creation unit 200 may be recorded on the recording medium 403a.

Examples of the recording medium 403a include a flexible disk, a CD (Compact Disc), a DVD (Digital Versatile Disc), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory.

The communication I/F 404 is an interface for connecting the learning data creation device 10 to a communication network. The one or more programs that implement the three-dimensional simulator 100 and the learning data creation unit 200 may be acquired (downloaded) from a predetermined server device or the like via the communication I/F 404.

The RAM 405 is a volatile semiconductor memory that temporarily holds programs and data. The ROM 406 is a non-volatile semiconductor memory that retains programs and data even when the power is turned off. The ROM 406 stores, for example, settings related to the OS (Operating System) and settings related to the communication network.

The processor 407 is, for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), and is an arithmetic unit that reads programs and data from the ROM 406, the auxiliary storage device 408, and the like into the RAM 405 and executes processing. The three-dimensional simulator 100 and the learning data creation unit 200 are implemented, for example, by processing that one or more programs stored in the auxiliary storage device 408 cause the processor 407 to execute. As the processor 407, the learning data creation device 10 may have both a CPU and a GPU, or only one of them.

The auxiliary storage device 408 is, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and is a non-volatile storage device that stores programs and data. The auxiliary storage device 408 stores, for example, the OS, various application software, and the one or more programs that implement the three-dimensional simulator 100 and the learning data creation unit 200. The storage unit 300 is implemented using, for example, the auxiliary storage device 408. However, instead of the auxiliary storage device 408, the storage unit 300 may be implemented using, for example, a storage device communicably connected to the learning data creation device 10 via a communication network.

With the hardware configuration shown in FIG. 6, the learning data creation device 10 according to the embodiment of the present invention can carry out the various processes described above. The example in FIG. 6 describes the case where the learning data creation device 10 is implemented by a single device (computer), but the invention is not limited to this. The learning data creation device 10 according to the embodiment of the present invention may be implemented by multiple devices (computers).

<Summary>
As described above, the learning data creation system 1 according to the embodiment of the present invention creates learning data by attaching information obtained from the virtual captured image (that is, information that the three-dimensional simulator 100 can acquire or compute) to the actual captured image as teacher information. The learning data creation system 1 can therefore create learning data easily, without work such as manually attaching teacher information to actual captured images. In particular, learning data can be created easily even when there is a large amount of teacher information (for example, when there are many types of objects or many categories).

In addition, when semantic segmentation is performed, for example, the three-dimensional simulator 100 obtains the boundary lines of objects in the learning data creation system 1 according to the embodiment of the present invention, so objects can be segmented with high accuracy.

Furthermore, the learning data creation system 1 according to the embodiment of the present invention can easily create learning data that includes teacher information that is difficult to attach by hand, such as depth or the posture of an object.

Moreover, in the learning data creation system 1 according to the embodiment of the present invention, an actual captured image and the corresponding virtual captured image are created simply by, for example, the user photographing an arbitrary range with the camera device 20 while moving through the real space, so a large amount of learning data can be created easily. A large amount of learning data can therefore be obtained at lower cost than, for example, attaching teacher information to actual captured images through crowdsourcing.

Accordingly, by using the learning data creation system 1 according to the embodiment of the present invention, it is possible to easily obtain the large amount of learning data used to train, for example, the recognition engine of a robot that cleans a room in the real space or tidies up objects in that room (that is, a machine learning model that performs the tasks of cleaning and tidying the room).

In the embodiment of the present invention, as an advance preparation procedure, the virtual space is created first and objects are then placed in the real space so as to correspond to the virtual space, but the invention is not limited to this. For example, the virtual space may be created so as to correspond to a real space in which objects are already placed.

The embodiment of the present invention has been described on the assumption that the actual captured image and the virtual captured image are still images, but the invention is not limited to this. The actual captured image and the virtual captured image may be videos.

In the embodiment of the present invention, the learning data is a pair of the actual captured image and the teacher information acquired from the three-dimensional simulator 100, but the invention is not limited to this. For example, the actual captured image may serve as the teacher information, so that the learning data is a pair of the virtual captured image and the teacher information (the actual captured image). In this case, learning data can be created for the task of predicting an actual captured image from a virtual captured image created by the three-dimensional simulator 100.
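The two pairings described above differ only in which element serves as the input and which as the teacher information. A minimal sketch follows; the function name, the mode strings, and the dictionary keys are hypothetical.

```python
def make_learning_pair(real_image, virtual_image, simulator_info, mode="real_to_info"):
    """Build one learning sample in either of the two pairings (hypothetical names)."""
    if mode == "real_to_info":
        # Actual captured image as input, simulator-derived information as teacher.
        return {"input": real_image, "teacher": simulator_info}
    if mode == "virtual_to_real":
        # Virtual captured image as input, actual captured image as teacher.
        return {"input": virtual_image, "teacher": real_image}
    raise ValueError(f"unknown mode: {mode}")


# Illustrative values only.
pair_a = make_learning_pair("real.png", "virtual.png", [{"object_id": "obj1"}])
pair_b = make_learning_pair("real.png", "virtual.png", [{"object_id": "obj1"}],
                            mode="virtual_to_real")
```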

The present invention is not limited to the specifically disclosed embodiment described above, and various modifications and changes can be made without departing from the scope of the claims.

1 Learning data creation system
10 Learning data creation device
20 Camera device
30 Tracking device
100 Three-dimensional simulator
200 Learning data creation unit
300 Storage unit

Claims

A first procedure of arranging a first object in a real space and a second object in a virtual space created by a three-dimensional simulator so that, when each space is given the same coordinate system with the same origin, the two objects have the same coordinates;
a second procedure of photographing a predetermined range of the real space including the first object and creating a first image;
a third procedure of drawing, by the three-dimensional simulator, a predetermined range in the virtual space corresponding to the first image and creating a second image; and
a fourth procedure of adding at least one of information obtained from the second image and information generated by the three-dimensional simulator to the first image as teacher information to create learning data for a machine learning model;
A learning data creation method characterized in that a computer executes the above procedures.

A first creation procedure of creating, by a three-dimensional simulator, a virtual space in which one or more objects are arranged;
a second creation procedure of creating, in response to the creation of a first image capturing a predetermined range of a real space corresponding to the virtual space, a second image in which the three-dimensional simulator draws that range of the virtual space; and
a third creation procedure of creating learning data used for training a predetermined machine learning model by adding predetermined information obtained from the second image to the first image as teacher information;
A learning data creation method characterized in that a computer executes the above procedures.

The learning data creation method according to claim 2, wherein the second creation procedure creates the second image by using information indicating the position and posture of the camera device that created the first image to photograph the inside of the virtual space with a virtual camera device having that posture at that position in the virtual space.

The learning data creation method according to claim 2 or 3, wherein the third creation procedure creates the learning data represented by either a pair of the first image and the predetermined information or a pair of the second image and the first image.

The learning data creation method according to any one of claims 2 to 4, wherein the predetermined information is information acquired or calculated by the three-dimensional simulator.

The learning data creation method according to claim 5, wherein the predetermined information includes at least one of: information indicating the contour line of an object represented by a three-dimensional model arranged in the range of the virtual space; information indicating the object name of the object; information indicating the state of the object; information indicating the depth to the object; information indicating the posture of the object; information for an operating subject capable of preset operations to perform a predetermined operation on the object; and an image region of all or part of the second image.

A first creation unit that creates, by a three-dimensional simulator, a virtual space in which one or more objects are arranged;
a second creation unit that creates, in response to the creation of a first image capturing a predetermined range of a real space corresponding to the virtual space, a second image in which the three-dimensional simulator draws that range of the virtual space; and
a third creation unit that creates learning data used for training a predetermined machine learning model by adding predetermined information obtained from the second image to the first image as teacher information;
A learning data creation device characterized by having the above units.

A first creation procedure of creating, by a three-dimensional simulator, a virtual space in which one or more objects are arranged;
a second creation procedure of creating, in response to the creation of a first image capturing a predetermined range of a real space corresponding to the virtual space, a second image in which the three-dimensional simulator draws that range of the virtual space; and
a third creation procedure of creating learning data used for training a predetermined machine learning model by adding predetermined information obtained from the second image to the first image as teacher information;
A program characterized by causing a computer to execute the above procedures.