WO2020067204A1 - Learning data creation method, machine learning model generation method, learning data creation device, and program - Google Patents

Learning data creation method, machine learning model generation method, learning data creation device, and program

Info

Publication number
WO2020067204A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
learning data
data
virtual
data creation
Prior art date
Application number
PCT/JP2019/037684
Other languages
French (fr)
Japanese (ja)
Inventor
叡一 松元
颯介 小林
悠太 菊池
祐貴 五十嵐
統太郎 中島
Original Assignee
株式会社Preferred Networks
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社Preferred Networks
Publication of WO2020067204A1 publication Critical patent/WO2020067204A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T7/00 Image analysis

Definitions

  • The present invention relates to a learning data creation method, a machine learning model generation method, a learning data creation device, and a program.
  • Semantic segmentation is a task of classifying each pixel in an image captured by a camera device or the like into a class corresponding to the meaning indicated by the pixel (for example, the object name of the object represented by the pixel).
  • In many tasks such as semantic segmentation, a machine learning model is often trained by supervised learning.
  • Learning data used for supervised learning is often created manually. For example, in semantic segmentation, teacher information (the class classification of each pixel) is given to an image by painting each pixel with the color of the class corresponding to its meaning, and learning data is thereby created.
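  • As a concrete sketch of such a supervised learning sample for semantic segmentation (the class IDs, array sizes, and key names below are assumptions for illustration, not from the patent):

```python
import numpy as np

CLASS_NAMES = {0: "background", 1: "desk", 2: "chair"}  # hypothetical classes

image = np.zeros((4, 6, 3), dtype=np.uint8)  # H x W x 3 RGB photograph
label = np.zeros((4, 6), dtype=np.uint8)     # H x W per-pixel class IDs
label[1:3, 2:5] = 1                          # pixels showing the "desk"

# Supervised learning consumes (image, label) pairs; producing `label`
# manually means painting every pixel, which is the costly step.
sample = {"image": image, "teacher_information": label}
```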
  • Also, depending on the task executed by the machine learning method, it may be necessary to create learning data to which a plurality of pieces of teacher information are added. For example, in addition to the above-described class classification, learning data to which the posture of an object in an image (the direction, rotation, and the like of the object), the state of the object, and the like are added as teacher information may be necessary.
  • To achieve this, an embodiment of the present invention uses first data recording a first target in the real world and second data, generated by a simulator, recording a second target that corresponds to the first target and is arranged in alignment with it. One or more computers execute adding, to the first data, at least either information based on the second data or information generated by the simulator as teacher information.
  • FIG. 1 is a diagram illustrating an example of the overall configuration of a learning data creation system according to an embodiment of the present invention.
  • FIG. 2 is a diagram for schematically explaining an example of creation of learning data.
  • FIG. 3 is a diagram for explaining an example of the flow of the advance preparation procedure.
  • FIG. 4 is a diagram for explaining an example of the flow of the learning data creation procedure.
  • FIG. 5 is a diagram illustrating an example of a teacher information list.
  • FIG. 6 is a diagram illustrating an example of the hardware configuration of a learning data creation device according to an embodiment of the present invention.
  • The predetermined task includes, for example, recognition and classification of an object in an image captured by a camera device or the like, grasping the state of the object, and some action related to the object (for example, a grasping action or an avoidance action for the object).
  • Note that the object is an example of a target to which teacher information is added.
  • In the embodiment of the present invention, two images are used: an image obtained by photographing a virtual space created by a three-dimensional simulator with a camera device (that is, a virtual camera device installed in the virtual space; hereinafter also referred to as a "virtual photographed image"), and an image of the real space corresponding to the virtual space photographed by an actual camera device (hereinafter also referred to as an "actual photographed image").
  • Learning data is created by adding teacher information obtained from the virtual photographed image to the actual photographed image.
  • The teacher information includes, for example, contour information of an object in the virtual photographed image, the class into which the object is classified, the object name of the object, state information of the object, the depth to the object, the posture of the object, and information for performing a predetermined action on the object.
  • the virtual captured image is an example of data that records an object (a second target) captured in the virtual space, and is an example of second data.
  • the actual photographed image is an example of data recording an object (first object) photographed in the real space, and is an example of first data.
  • a three-dimensional simulator is an example of a simulator.
  • the real space corresponding to the virtual space is, for example, a real space in which the same object is arranged at the same position as the virtual space created by the three-dimensional simulator.
  • the position being the same in the virtual space and the real space means that, for example, when the same coordinate system is set in the virtual space and the real space, the position coordinates are the same.
  • However, coordinate systems that can be mutually converted into each other may instead be set in the virtual space and the real space; a sketch of such a conversion follows below.
  • Hereinafter, "position" and "posture" represent a position and a posture in the same coordinate system set in the virtual space and the real space.
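  • As an illustrative sketch of such mutually convertible coordinate systems (the rotation and translation values below are assumptions, not taken from the patent):

```python
# Minimal sketch: a rigid transform (rotation R, translation t) maps
# real-space coordinates to virtual-space coordinates and back.
import numpy as np

R = np.eye(3)                    # rotation from the real frame to the virtual frame
t = np.array([0.0, 0.0, 0.0])    # translation between the two origins

def real_to_virtual(p_real: np.ndarray) -> np.ndarray:
    """Convert a 3D point from the real-space frame to the virtual-space frame."""
    return R @ p_real + t

def virtual_to_real(p_virtual: np.ndarray) -> np.ndarray:
    """Inverse conversion; possible because a rigid transform is invertible."""
    return R.T @ (p_virtual - t)
```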
  • In this specification, "same" or "identical" is not limited to being exactly the same or identical.
  • Deviations may be permitted depending on, for example, errors in devices or calculations, or the purpose for which the data set is used.
  • The object being the same in the virtual space and the real space means that the object represented by the three-dimensional model arranged in the virtual space is the same as the actual object arranged in the real space.
  • an object arranged in a virtual space is also referred to as an “object” in order to distinguish it from an actual object arranged in a real space.
  • the fact that the target or the object is the same means that the information necessary for providing the teacher information is the same. For example, in semantic segmentation, an object having the same contour is included, and in teaching a gripping position of the object, an object having the same tactile sensation and material information in addition to the contour is included.
  • FIG. 1 is a diagram illustrating an example of an overall configuration of a learning data creation system 1 according to an embodiment of the present invention.
  • the learning data creation system 1 includes a learning data creation device 10, one or more camera devices 20, and one or more tracking devices 30.
  • the learning data creation device 10, the camera device 20, and the tracking device 30 are communicably connected via a communication network such as a wireless LAN (Local Area Network).
  • This communication network may be, for example, a wired LAN or the like in whole or in part.
  • the learning data creation device 10 is a computer or a computer system that creates learning data.
  • the learning data creation device 10 includes a three-dimensional simulator 100, a learning data creation unit 200, and a storage unit 300.
  • the three-dimensional simulator 100 of the present embodiment is a simulator that can simulate a three-dimensional virtual space.
  • In the three-dimensional simulator 100, objects can be arranged in the virtual space, and physics calculations that simulate physical laws acting on the objects (for example, collision determination between objects) can be performed.
  • a virtual photographed image photographed in a virtual space with a virtual camera device can be drawn.
  • For example, the contour information and object name of an object in the virtual captured image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, the result of performing a predetermined physics calculation on the object, and the like can be added to the virtual captured image.
  • These pieces of information are generated by calculation or the like in the three-dimensional simulator 100. Note that these pieces of information may be generated in advance before the virtual captured image is drawn, for example, and separately stored in the storage unit 300 or the like.
  • a data set can be easily created by adding the information generated by the simulator to the first data.
  • Such a three-dimensional simulator 100 is realized by a game engine such as Unity, Unreal Engine 4 (UE4), or Blender, for example.
  • the three-dimensional simulator 100 is not limited to these game engines, and may be realized by any three-dimensional simulation software.
  • the learning data creation unit 200 creates learning data by adding teacher information obtained from a virtual captured image to a real captured image.
  • The teacher information is the information added to the virtual captured image when the three-dimensional simulator 100 drew it (for example, the contour information and object name of an object in the virtual captured image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, and the result of performing a predetermined physics calculation on the object).
  • the learning data creating unit 200 creates learning data by adding information obtained from the virtual captured image drawn by the three-dimensional simulator 100 to the real captured image as teacher information.
  • the storage unit 300 stores various information.
  • The information stored in the storage unit 300 includes, for example, real captured images captured by the camera device 20 in the real space, virtual captured images drawn by the three-dimensional simulator 100, tracking information acquired from the tracking device 30, and three-dimensional models of the objects arranged in the virtual space.
  • the tracking information is information obtained by tracking the position and orientation of the camera device 20 in the real space. That is, the tracking information is information indicating both the position and the posture of the camera device 20 at each time. However, the tracking information may be, for example, information indicating only the position of the camera device 20 at each time.
  • the tracking information obtained by the tracking device 30 is used for synchronizing information in the real space and information in the virtual space.
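  • As a sketch of what such tracking information might look like in code (the field names and the nearest-timestamp lookup are assumptions for illustration):

```python
# One tracking record: a timestamped camera pose (position and posture).
from dataclasses import dataclass

@dataclass
class TrackingRecord:
    time: float                                       # acquisition time
    position: tuple[float, float, float]              # camera position in the shared frame
    orientation: tuple[float, float, float, float]    # posture as a quaternion (w, x, y, z)

def pose_at(records: list[TrackingRecord], t: float) -> TrackingRecord:
    """Synchronization step: pick the pose whose timestamp is closest to
    the capture time of a real photographed image."""
    return min(records, key=lambda r: abs(r.time - t))
```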
  • the camera device 20 is an imaging device that captures an image of a real space and creates an actually captured image.
  • the camera device 20 is fixed to, for example, a portable camera stand on which the tracking device 30 is mounted.
  • the actual captured image created by the camera device 20 is transmitted to the learning data creation device 10 and stored in the storage unit 300.
  • the camera device 20 may be, for example, a depth camera that can create an actual captured image to which depth information has been added.
  • the tracking device 30 is a device that tracks the position and orientation of the camera device 20 and creates tracking information (for example, a sensing device equipped with a position sensor and an orientation sensor).
  • the tracking device 30 is mounted on, for example, a portable camera stand or the like. As described above, one tracking device 30 is installed and associated with one camera device 20.
  • the tracking information created by the tracking device 30 is transmitted to the learning data creation device 10 and stored in the storage unit 300.
  • the tracking device 30 may be directly mounted on the camera device 20 or may be built in the camera device 20, for example.
  • Because the tracking device 30 provides tracking information for the camera device 20, a virtual captured image corresponding to a real captured image taken by that camera device 20 can be obtained. That is, the tracking device 30 of the present embodiment is a device that acquires information at the time the first data is acquired, and the second data can easily be created based on that information.
  • the configuration of the learning data creation system 1 shown in FIG. 1 is an example, and another configuration may be used.
  • the learning data creation system 1 may include an arbitrary number of camera devices 20 and tracking devices 30 corresponding to the camera devices 20.
  • the camera device 20 may not have the tracking device 30 corresponding to the camera device 20 as long as the position and the posture in the real space are known. For example, when the camera device 20 is fixedly installed at a predetermined position in a predetermined posture, the tracking device 30 corresponding to the camera device 20 may not be provided.
  • FIG. 2 is a diagram for schematically explaining an example of creating learning data.
  • an actual photographed image photographed by the camera device 20 in a certain posture at a certain position in the real space is referred to as “real photographed image G110”.
  • a virtual captured image captured by the virtual camera device having the same posture at the same position in the virtual space corresponding to the real space is referred to as a “virtual captured image G210”.
  • At this time, information generated by calculation of the three-dimensional simulator 100 (in FIG. 2, as an example, "contour line" and "object name") is added to the virtual captured image G210. That is, in the example shown in FIG. 2, the contour of each object in the virtual captured image G210 and the object name of each object are given. Note that which of the information that can be generated by calculation of the three-dimensional simulator is given to the virtual captured image G210 differs depending on the task the machine learning model is to execute (that is, depending on what learning data set is to be created).
  • The learning data creation device 10 adds the information assigned to the virtual captured image G210 (that is, the "contour line" and "object name") to the real photographed image G110 as teacher information, thereby creating the learning data G120.
  • As a result, learning data G120 represented by a set of the real photographed image G110 and the teacher information (that is, "contour line" and "object name") is created.
  • In this way, the learning data creation device 10 according to the embodiment of the present invention uses the real photographed image G110, which actually photographs a certain area of the real space, and the virtual captured image G210, which virtually photographs the same area of the virtual space, and adds the information obtained from the virtual captured image G210 (that is, information generated by calculation of the three-dimensional simulator or the like) to the real photographed image G110, thereby creating the learning data G120. Therefore, the learning data creation device 10 according to the embodiment of the present invention can easily create the learning data G120.
  • Moreover, since the position and posture of the camera device 20 are specified by the tracking information acquired from the tracking device 30, the position and posture of the virtual camera device in the virtual space can be synchronized with those of the camera device 20. For this reason, the user can easily obtain a real photographed image and the virtual captured image corresponding to it simply by photographing the real space with the camera device 20; a sketch of assembling such a pair follows below.
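  • As a minimal sketch of assembling one such learning sample (file names and dictionary keys are hypothetical, not from the patent):

```python
# The real photograph supplies the input; the synchronized virtual render
# supplies the teacher information, so no manual annotation is needed.
def make_learning_sample(real_image_path: str, simulator_annotations: dict) -> dict:
    return {
        "input": real_image_path,  # e.g. "real/G110.png" (hypothetical path)
        "teacher_information": {
            "contour_lines": simulator_annotations["contours"],
            "object_names": simulator_annotations["names"],
        },
    }
```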
  • FIG. 3 is a diagram for explaining an example of the flow of the advance preparation procedure.
  • Step S101 The three-dimensional simulator 100 acquires a three-dimensional model of an object placed in the virtual space from the storage unit 300. This means, for example, that the data of the three-dimensional model stored in the storage unit 300 is imported into the three-dimensional simulator 100.
  • the three-dimensional model is provided with information such as an object ID, an object name, and a category to which the object belongs, in addition to the shape of the object.
  • the three-dimensional model may be created in advance by an arbitrary method and stored in the storage unit 300.
  • As a method of creating a three-dimensional model, for example, the three-dimensional shape of an actual object may be captured by scanning it with a three-dimensional scanner or the like, or the model may be created manually with three-dimensional model creation software or the like.
  • Step S102 The three-dimensional simulator 100 arranges the object represented by the three-dimensional model in the virtual space.
  • For example, the user selects a desired three-dimensional model from the plurality of three-dimensional models imported in step S101 and drags and drops it into the virtual space, whereby an object can be arranged in the virtual space.
  • the user may be able to arrange the object represented by the three-dimensional model in the virtual space by designating the position coordinates in the virtual space.
  • the user when arranging the object represented by the three-dimensional model in the virtual space, the user may arbitrarily tilt or rotate the object, and then arrange the object. In addition, the user may arrange the object after enlarging or reducing the object, for example.
  • step S102 may be repeatedly performed.
  • the three-dimensional simulator 100 creates a virtual space in which one or more objects (objects) are arranged at desired positions.
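  • As an illustrative sketch of step S102 using Blender's Python API (bpy), Blender being one of the engines named above; the object name and transform values are assumptions:

```python
import bpy
import math

# Place an object imported in step S101 at a desired position, with an
# optional rotation and scaling, as described above.
obj = bpy.data.objects["desk_model"]                  # hypothetical model name
obj.location = (1.0, 0.5, 0.0)                        # position in the virtual space
obj.rotation_euler = (0.0, 0.0, math.radians(30.0))   # tilt/rotate before placing
obj.scale = (1.0, 1.0, 1.0)                           # enlarge or reduce if desired
```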
  • Step S103 The user places an actual object in the real space so as to correspond to the virtual space created in steps S101 to S102.
  • For example, using a display device that is equipped with a position sensor and a posture sensor and that can superimpose and display the objects arranged in the virtual space on the real space, the user need only place each actual object at the same position as the corresponding object displayed on the display device.
  • Examples of such a display device include a head-mounted display equipped with a position sensor, a posture sensor, and a camera; a see-through head-mounted display, through which the real space is visible, equipped with a position sensor and a posture sensor; a projection mapping device; a tablet terminal equipped with a position sensor, a posture sensor, and a camera; and a smartphone equipped with a position sensor, a posture sensor, and a camera.
  • These display devices can display video in which the objects are superimposed on the real space after synchronizing positions in the virtual space with positions in the real space. Therefore, the user can carry or wear the display device, move through the real space, and place the same actual object at the same position and in the same posture as each object shown in the video.
  • the virtual space created in steps S101 to S102 can be associated with the real space.
  • That is, the same object may be arranged in the real space at the same position and in the same posture as the object arranged in the virtual space.
  • FIG. 4 is a diagram for explaining an example of the flow of the learning data creation procedure.
  • Step S201 The user uses the camera device 20 to photograph a desired range in the real space. As a result, an actual photographed image is created by the camera device 20 and transmitted to the learning data creating device 10. In the learning data creation device 10, the actual photographed image is stored in the storage unit 300.
  • the tracking device 30 corresponding to the camera device 20 transmits the tracking information to the learning data creation device 10.
  • the tracking information is stored in the storage unit 300.
  • the tracking information is information indicating the position and orientation of the camera device 20 as described above.
  • In the present embodiment, the tracking information created by the tracking device 30 tracking the position and posture of the camera device 20 is stored in the storage unit 300, but the present invention is not limited to this.
  • The position and posture of the camera device 20 may be tracked by another method, and tracking information indicating that tracking result may be stored in the storage unit 300.
  • For example, a two-dimensional code such as a QR code (registered trademark) may be pasted on the camera device 20 in advance, and the position and posture of the camera device 20 may be tracked by reading the two-dimensional code with a camera or the like, as sketched below.
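  • A rough sketch of that QR-code alternative using OpenCV (the camera intrinsics and printed-code size are assumed values, and a production tracker would need proper calibration):

```python
import cv2
import numpy as np

MARKER_SIZE = 0.05  # edge length of the printed code in meters (assumption)
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])           # assumed intrinsics
dist_coeffs = np.zeros(5)                             # assume no lens distortion

def track_pose(frame: np.ndarray):
    """Estimate the pose of a QR code pasted on the camera device."""
    _data, corners, _ = cv2.QRCodeDetector().detectAndDecode(frame)
    if corners is None:
        return None
    s = MARKER_SIZE / 2
    object_points = np.array([[-s, s, 0], [s, s, 0], [s, -s, 0], [-s, -s, 0]],
                             dtype=np.float32)        # code corners in its own frame
    ok, rvec, tvec = cv2.solvePnP(object_points,
                                  corners.reshape(4, 2).astype(np.float32),
                                  camera_matrix, dist_coeffs)
    return (rvec, tvec) if ok else None               # rotation and translation
```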
  • Step S202 The three-dimensional simulator 100 photographs the virtual space with the virtual camera device at the same position and orientation as the camera device 20 photographed in step S201. In other words, the three-dimensional simulator 100 draws (renders) in the virtual space an imaging range of the virtual camera device having the same position and posture as the camera device 20 imaged in step S201.
  • the three-dimensional simulator 100 can specify the position and orientation of the camera device 20 from the tracking information created in step S201. Therefore, the three-dimensional simulator 100 can install a virtual camera device in the virtual space at the same position and orientation as the camera device 20 in the real space. As a result, a virtual captured image corresponding to the real captured image created in step S201 is created.
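  • A minimal sketch of step S202, again assuming Blender (bpy) as the three-dimensional simulator; the pose values stand in for the tracking information and the output path is hypothetical:

```python
import bpy

# Move the virtual camera to the tracked pose of the real camera device 20,
# then render the corresponding virtual photographed image.
cam = bpy.data.objects["Camera"]
cam.location = (1.2, -0.8, 1.5)                       # position from tracking info
cam.rotation_mode = "QUATERNION"
cam.rotation_quaternion = (1.0, 0.0, 0.0, 0.0)        # posture from tracking info

bpy.context.scene.camera = cam
bpy.context.scene.render.filepath = "/tmp/virtual_G210.png"  # hypothetical path
bpy.ops.render.render(write_still=True)               # draws (renders) the image
```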
  • the three-dimensional simulator 100 adds predetermined information acquired or calculated in the virtual space to the virtual captured image. Then, the three-dimensional simulator 100 stores the virtual captured image to which the predetermined information is added in the storage unit 300.
  • Examples of the predetermined information include the contour information and object name of an object in the virtual captured image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, and the result of performing a predetermined physics calculation on the object.
  • As an example of the result of performing a predetermined physics calculation on an object, information on a motion by which a robot arm capable of performing preset motions could grip the object at that position can be given.
  • Step S202 may be performed automatically after step S201, for example, or may be performed in response to a user operation (for example, an operation to start rendering in the virtual space).
  • Step S203: The learning data creation unit 200 assigns, as teacher information, the predetermined information given to the virtual captured image created in step S202 to the real captured image created in step S201. As a result, learning data represented by a set of the real captured image and the teacher information is created.
  • the teacher information included in the learning data is represented, for example, in a list format.
  • FIG. 5 illustrates a plurality of pieces of teacher information represented in a list format (this is also referred to as a “teacher information list”).
  • FIG. 5 is an example of a teacher information list assigned to a certain actually shot image (image ID: image101).
  • Each teacher information included in the teacher information list shown in FIG. 5 is information in which an object ID, position information, contour line information, contact information, and gripping operation information are associated with each other.
  • the object ID is an ID for identifying an object.
  • the object ID is, for example, information given to a three-dimensional model of an object arranged in a virtual space.
  • the position information is the position coordinates where the object is located.
  • the position information is, for example, information given to the object when the object represented by the three-dimensional model is arranged in step S102 described above.
  • Contour line information is information indicating a contour line of an object.
  • the contour information can be acquired, for example, from the rendering result when the virtual captured image is drawn (rendered) in step S202.
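  • As a sketch of one way such contour line information could be derived from a rendering result (the per-object mask and image size are assumptions; OpenCV is not named in the patent):

```python
import cv2
import numpy as np

# A per-object instance mask as a renderer might output it: the pixels
# covered by one object are set, everything else is zero.
instance_mask = np.zeros((480, 640), dtype=np.uint8)
instance_mask[100:200, 150:300] = 255

contours, _ = cv2.findContours(instance_mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
contour_line = contours[0].reshape(-1, 2)  # (x, y) vertices of the object outline
```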
  • the contact information is information indicating the object ID of the other object, the contact position with the other object, and the like when the object with the object ID is in contact with another object (object).
  • the contact information can be obtained, for example, from the calculation result of the physical calculation of the three-dimensional simulator 100.
  • the gripping operation information is, for example, information relating to an operation in which a robot arm capable of performing a preset operation can grip an object having the object ID at a shooting position of a virtual shot image.
  • the gripping motion information can be obtained, for example, from the calculation result of the physical calculation of the three-dimensional simulator 100.
  • As described above, the teacher information list illustrated in FIG. 5 is a list of teacher information in which position information, object name, contour line information, contact information, and gripping motion information are associated with each other for each object.
  • any information that can be obtained or calculated by the three-dimensional simulator 100 may be associated with the teacher information.
  • the virtual captured image itself or a partial region of the virtual captured image may be associated with the teacher information.
  • an image area portion representing the object of the object ID in the image area of the virtual captured image may be associated with the object ID.
  • the gripping motion information may include information on a target part that is considered to be grippable or easy to grip when gripping the target.
  • Each piece of teacher information may be information in which only a part of the above information (position information, object name, contour line information, contact information, gripping motion information, etc.) is associated.
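  • A hypothetical rendering of such a teacher information list as a Python structure (all field names and values are illustrative, not copied from FIG. 5):

```python
teacher_information_list = {
    "image_id": "image101",
    "teacher_information": [
        {
            "object_id": "obj-001",          # ID given to the three-dimensional model
            "position": [1.0, 0.5, 0.0],     # placement coordinates from step S102
            "contour_line": [[150, 100], [300, 100], [300, 200], [150, 200]],
            "contact": [                      # from the simulator's physics calculation
                {"object_id": "obj-002", "contact_position": [1.1, 0.5, 0.7]},
            ],
            "gripping_motion": {             # motion by which a robot arm could grip it
                "grip_point": [1.0, 0.5, 0.8],
                "approach_direction": [0.0, 0.0, -1.0],
            },
        },
    ],
}
```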
  • In the present embodiment, the teacher information included in the learning data is represented in a list format, but it may be represented in any other format.
  • FIG. 6 is a diagram illustrating an example of a hardware configuration of the learning data creation device 10 according to the embodiment of the present invention.
  • As shown in FIG. 6, the learning data creation device 10 includes an input device 401, a display device 402, an external I/F 403, a communication I/F 404, a RAM (Random Access Memory) 405, a ROM (Read Only Memory) 406, a processor 407, and an auxiliary storage device 408. These pieces of hardware are interconnected by a bus 409.
  • the input device 401 is, for example, a keyboard, a mouse, a touch panel, or the like, and is used by a user to input various operations.
  • the display device 402 is, for example, a display or the like, and displays various processing results of the learning data creation device 10. Note that the learning data creation device 10 may not have at least one of the input device 401 and the display device 402.
  • the external I / F 403 is an interface with an external device.
  • The external device includes a recording medium 403a and the like; the learning data creation device 10 can read from and write to the recording medium 403a via the external I/F 403.
  • The recording medium 403a may store one or more programs for realizing the three-dimensional simulator 100 and the learning data creation unit 200.
  • Examples of the recording medium 403a include a flexible disk, a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital Memory card), and a USB (Universal Serial Bus) memory card.
  • the communication I / F 404 is an interface for connecting the learning data creation device 10 to a communication network.
  • One or more programs that implement the three-dimensional simulator 100 and the learning data creation unit 200 may be obtained (downloaded) from a predetermined server device or the like via the communication I / F 404.
  • the RAM 405 is a volatile semiconductor memory that temporarily stores programs and data.
  • the ROM 406 is a nonvolatile semiconductor memory that can retain programs and data even when the power is turned off.
  • the ROM 406 stores, for example, settings related to an OS (Operating System), settings related to a communication network, and the like.
  • the processor 407 is, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or the like, and is an arithmetic device that reads a program or data from the ROM 406 or the auxiliary storage device 408 onto the RAM 405 and executes processing.
  • the three-dimensional simulator 100 and the learning data creation unit 200 are realized by, for example, processing that causes the processor 407 to execute one or more programs stored in the auxiliary storage device 408.
  • the learning data creation device 10 may include both the CPU and the GPU as the processor 407, or may include only one of the CPU and the GPU.
  • the auxiliary storage device 408 is, for example, a hard disk drive (HDD) or a solid state drive (SSD), and is a nonvolatile storage device that stores programs and data.
  • the auxiliary storage device 408 stores, for example, an OS, various application software, one or more programs for implementing the three-dimensional simulator 100 and the learning data creation unit 200, and the like.
  • the storage unit 300 is realized using, for example, the auxiliary storage device 408.
  • the storage unit 300 may be implemented using, for example, a storage device that is communicably connected to the learning data creation device 10 via a communication network, instead of the auxiliary storage device 408.
  • The learning data creation device 10 can implement the above-described various processes by having the hardware configuration shown in FIG. 6. In the example illustrated in FIG. 6, a case has been described where the learning data creation device 10 according to the embodiment of the present invention is implemented by one device (computer), but the invention is not limited to this.
  • the learning data creation device 10 according to the embodiment of the present invention may be realized by a plurality of devices (computers).
  • As described above, the learning data creation system 1 according to the embodiment of the present invention creates learning data by giving information obtained from a virtual captured image (that is, information that can be obtained or calculated by the three-dimensional simulator 100) to a real captured image as teacher information. For this reason, the learning data creation system 1 according to the embodiment of the present invention can create learning data easily, without, for example, manually adding teacher information to real captured images. In particular, the learning data can be created easily even when there is a large amount of teacher information (for example, when there are many types of objects or many categories).
  • For example, a machine learning model trained with such learning data can perform object segmentation with high accuracy.
  • Learning data that includes the teacher information can thus be created easily.
  • Also, the user simply photographs an arbitrary range with the camera device 20 while moving through the real space, and the real captured image and the teacher information corresponding to it are generated, so a large amount of learning data can be created easily. For this reason, a large amount of learning data can be obtained at low cost compared with, for example, adding teacher information to real captured images using crowdsourcing or the like.
  • By using the learning data creation system 1, it is possible, for example, to easily obtain a large amount of learning data used for training the recognition engine of a robot that cleans a certain room in the real space or tidies up objects in that room (that is, a machine learning model that performs the tasks of cleaning and tidying the room).
  • a virtual space is created, and objects are arranged in the real space so as to correspond to the virtual space.
  • a virtual space may be created so as to correspond to a real space in which an object is actually arranged.
  • the object in the virtual space may be moved so as to correspond to the object in the real space.
  • It is preferable to provide a function for facilitating the alignment between the first target and the second target.
  • a function may be provided that facilitates movement when moving an object in the virtual space, or facilitates alignment with an object in the real space.
  • Examples of such functions include: a function of moving an object in the virtual space with reference to the position of the user (for example, a function of drawing the object to the vicinity of the user); a function of fixing any of the information on an object in the virtual space, such as its direction, position, and angle, and then aligning it with the object in the real space by manipulating the remaining information; and a function of aligning an object in the real space with the corresponding object in the virtual space using a machine learning model or the like.
  • The second target may also be subjected to processing so that it corresponds well to the first target or so that alignment becomes easier. For example, when an object in the real space does not correspond to any object in the virtual space, or when it is difficult to match positions, an object may be newly added in the virtual space, or its information may be corrected or generated again.
  • data other than main data used as learning data may be recorded.
  • Specifically, such data includes parameters of the device that acquires or creates the main data (for example, in the case of a camera, the focal length, the position of the camera, the shutter speed, and the aperture value), information on the camera at the time the image was acquired (such as information from an acceleration sensor provided in the camera or information on the movement of the camera), information on the environment in which the data was acquired (such as lighting conditions), and category or tag information on the target or the main data recorded in the data (for example, in the case of an image, category information on the photographed object).
  • When the main data is a moving image, time information and the like may be included, and information used for managing the recorded data, for example an ID of a worker or an imaging device, may also be included.
  • data other than the main data may be used in the process of creating the learning data, or may be used together with the main data as a part of the learning data.
  • the real photographed image and the virtual photographed image are still images, but the present invention is not limited to this.
  • the real photographed image and the virtual photographed image may be moving images.
  • a set of the real photographed image and the teacher information acquired from the three-dimensional simulator 100 is used as the learning data, but is not limited thereto.
  • Conversely, a set of a virtual captured image and teacher information may be used as the learning data, with the real captured image serving as the teacher information. In this case, it is possible to create learning data for a task of predicting a real captured image from a virtual captured image created by the three-dimensional simulator 100.
  • Reference Signs List: 1 learning data creation system; 10 learning data creation device; 20 camera device; 30 tracking device; 100 three-dimensional simulator; 200 learning data creation unit; 300 storage unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

With the present invention, using first data recording a first real subject and second data recording a second subject which is generated by a simulator, is positioned in alignment with the first subject, and corresponds to the first subject, one or a plurality of computers executes appending information based on the second data and/or information generated by the simulator to the first data as teaching information.

Description

Learning data creation method, machine learning model generation method, learning data creation device, and program
The present invention relates to a learning data creation method, a machine learning model generation method, a learning data creation device, and a program.
In recent years, various tasks have come to be executed by machine learning techniques. One known example of such a task is semantic segmentation. Semantic segmentation is a task of classifying each pixel in an image captured by a camera device or the like into a class corresponding to the meaning indicated by the pixel (for example, the object name of the object represented by the pixel).
In many tasks such as semantic segmentation, a machine learning model is often trained by supervised learning.
JP 2017-182129 A; JP 2016-71597 A
However, learning data used for supervised learning is often created manually. For example, in semantic segmentation, teacher information (the class classification of each pixel) is given to an image by painting each pixel with the color of the class corresponding to the meaning indicated by the pixel, thereby creating learning data.
Also, depending on the task executed by the machine learning method, it is necessary to create learning data to which a plurality of pieces of teacher information are added. For example, in addition to the above class classification, learning data to which the posture of an object in an image (the direction, rotation, and the like of the object), the state of the object, and the like are added as teacher information may be required.
Furthermore, training a machine learning model generally requires a large amount of learning data. For this reason, creating the learning data used for supervised learning can require a great deal of labor and an enormous amount of time.
The embodiment of the present invention has been made in view of the above points, and aims to make it easy to create learning data.
To achieve the above object, in an embodiment of the present invention, first data recording a first target in the real world and second data, generated by a simulator, recording a second target that corresponds to the first target and is arranged in alignment with it are used, and one or more computers execute adding, to the first data, at least either information based on the second data or information generated by the simulator as teacher information.
Learning data can thus be created easily.
FIG. 1 is a diagram illustrating an example of the overall configuration of a learning data creation system according to an embodiment of the present invention. FIG. 2 is a diagram for schematically explaining an example of creation of learning data. FIG. 3 is a diagram for explaining an example of the flow of the advance preparation procedure. FIG. 4 is a diagram for explaining an example of the flow of the learning data creation procedure. FIG. 5 is a diagram illustrating an example of a teacher information list. FIG. 6 is a diagram illustrating an example of the hardware configuration of a learning data creation device according to an embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described. In the following, a learning data creation system 1 that can easily create learning data for a machine learning model that executes a predetermined task is described. Examples of the predetermined task include recognition and classification of an object in an image captured by a camera device or the like, grasping the state of the object, and some action related to the object (for example, a grasping action or an avoidance action for the object). Note that the object is an example of a target to which teacher information is added.
In the embodiment of the present invention, an image obtained by photographing a virtual space created by a three-dimensional simulator with a camera device (that is, a virtual camera device installed in the virtual space; hereinafter also referred to as a "virtual photographed image") and an image of the real space corresponding to the virtual space (that is, the actual space) photographed by an actual camera device (hereinafter also referred to as a "real photographed image") are used, and learning data is created by adding teacher information obtained from the virtual photographed image to the real photographed image. Examples of the teacher information include contour information of an object in the virtual photographed image, the class into which the object is classified, the object name of the object, state information of the object, the depth to the object, the posture of the object, and information for performing a predetermined action on the object. Note that the virtual photographed image is an example of data recording an object (second target) photographed in the virtual space and is an example of the second data, the real photographed image is an example of data recording an object (first target) photographed in the real space and is an example of the first data, and the three-dimensional simulator is an example of a simulator.
Here, the real space corresponding to the virtual space is, for example, a real space in which the same objects are arranged at the same positions as in the virtual space created by the three-dimensional simulator. In the present disclosure, positions being the same in the virtual space and the real space means, for example, that when the same coordinate system is set in the virtual space and the real space, the position coordinates are the same. However, coordinate systems that can be mutually converted may instead be set in the virtual space and the real space. Hereinafter, "position" and "posture" represent a position and a posture in the same coordinate system set in the virtual space and the real space.
In this specification, "same" or "identical" is not limited to being exactly the same or identical; it may include deviations that are permissible depending on, for example, errors in devices or calculations, or the purpose for which the data set is used.
An object being the same in the virtual space and the real space means that the object represented by the three-dimensional model arranged in the virtual space is the same as the actual object arranged in the real space. Note that an object arranged in the virtual space is also referred to as an "object" to distinguish it from an actual object arranged in the real space. The target or object being the same means that the information necessary for giving the teacher information is the same; for example, in semantic segmentation this includes objects whose contours are the same, and in teaching the gripping position of an object it includes objects whose tactile and material information, in addition to the contour, are the same.
<Overall configuration of the learning data creation system 1>
First, the overall configuration of the learning data creation system 1 according to the embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of the overall configuration of the learning data creation system 1 according to the embodiment of the present invention.
As shown in FIG. 1, the learning data creation system 1 according to the embodiment of the present invention includes a learning data creation device 10, one or more camera devices 20, and one or more tracking devices 30. The learning data creation device 10, the camera devices 20, and the tracking devices 30 are communicably connected via a communication network such as a wireless LAN (Local Area Network). This communication network may be, for example, a wired LAN or the like in whole or in part.
The learning data creation device 10 is a computer or computer system that creates learning data. The learning data creation device 10 includes a three-dimensional simulator 100, a learning data creation unit 200, and a storage unit 300.
The three-dimensional simulator 100 of the present embodiment is a simulator capable of simulating a three-dimensional virtual space. In the three-dimensional simulator 100, objects can be arranged in the virtual space, and physics calculations that simulate physical laws acting on the objects (for example, collision determination between objects) can be performed.
The three-dimensional simulator 100 can also draw virtual photographed images obtained by photographing the virtual space with a virtual camera device. At this time, the three-dimensional simulator 100 can give to the virtual photographed image, for example, the contour line information and object name of an object in the image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, and the result of performing a predetermined physics calculation on the object. These pieces of information are generated by calculation or the like in the three-dimensional simulator 100. Note that these pieces of information may be generated in advance, before the virtual photographed image is drawn, and stored separately in the storage unit 300 or the like.
Because the three-dimensional simulator 100 gives various information to the virtual photographed image, the present embodiment makes it possible to create learning data in which all or part of this information is given as teacher information to the real photographed image corresponding to the virtual photographed image.
By giving the information generated by the simulator to the first data, a data set can be created easily.
Such a three-dimensional simulator 100 is realized by a game engine such as Unity, Unreal Engine 4 (UE4), or Blender, for example. However, the three-dimensional simulator 100 is not limited to these game engines and may be realized by any three-dimensional simulation software.
The learning data creation unit 200 creates learning data by giving teacher information obtained from a virtual photographed image to a real photographed image. The teacher information is the information given to the virtual photographed image when the three-dimensional simulator 100 drew it (for example, the contour line information and object name of an object in the virtual photographed image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, and the result of performing a predetermined physics calculation on the object).
In this way, the learning data creation unit 200 creates learning data by giving information obtained from the virtual photographed image drawn by the three-dimensional simulator 100 to the real photographed image as teacher information.
The storage unit 300 stores various information. The information stored in the storage unit 300 includes, for example, real photographed images captured by the camera device 20 in the real space, virtual photographed images drawn by the three-dimensional simulator 100, tracking information acquired from the tracking device 30, and three-dimensional models of objects arranged in the virtual space. Here, the tracking information is information obtained by tracking the position and posture of the camera device 20 in the real space; that is, it indicates both the position and the posture of the camera device 20 at each time. However, the tracking information may be, for example, information indicating only the position of the camera device 20 at each time. The tracking information obtained by the tracking device 30 is used to synchronize information in the real space with information in the virtual space.
The camera device 20 is an imaging device that photographs the real space and creates real photographed images. The camera device 20 is fixed to, for example, a portable camera stand on which the tracking device 30 is mounted. The real photographed images created by the camera device 20 are transmitted to the learning data creation device 10 and stored in the storage unit 300. Note that the camera device 20 may be, for example, a depth camera that can create real photographed images to which depth information is added.
The tracking device 30 is a device that tracks the position and posture of the camera device 20 and creates tracking information (for example, a sensing device equipped with a position sensor and a posture sensor). The tracking device 30 is mounted on, for example, a portable camera stand; in this way, one tracking device 30 is installed in association with one camera device 20. The tracking information created by the tracking device 30 is transmitted to the learning data creation device 10 and stored in the storage unit 300. Note that the tracking device 30 may be directly mounted on the camera device 20 or built into the camera device 20.
Because the tracking device 30 provides tracking information for the camera device 20, a virtual photographed image corresponding to a real photographed image captured by that camera device 20 can be obtained. That is, the tracking device 30 of the present embodiment is a device that acquires information at the time the first data is acquired, and the second data can easily be created based on that information.
The configuration of the learning data creation system 1 shown in FIG. 1 is an example, and other configurations may be used. For example, the learning data creation system 1 may include an arbitrary number of camera devices 20 and tracking devices 30 corresponding to those camera devices 20.
Also, if the position and posture of a camera device 20 in the real space are known, the tracking device 30 corresponding to that camera device 20 may be omitted. For example, when the camera device 20 is fixedly installed at a predetermined position in a predetermined posture, the corresponding tracking device 30 may be omitted.
<How to create learning data>
Here, an outline of how the learning data creation device 10 according to the embodiment of the present invention creates learning data will be described with reference to FIG. 2. FIG. 2 is a diagram schematically illustrating an example of learning data creation.
As shown in FIG. 2, a real captured image taken by the camera device 20 at a certain position and in a certain orientation in the real space is referred to as the "real captured image G110". A virtual captured image taken by a virtual camera device at the same position and in the same orientation in the virtual space corresponding to the real space is referred to as the "virtual captured image G210".
At this time, information generated by computation of the three-dimensional simulator 100 (in FIG. 2, "contour lines" and "object names" as an example) is attached to the virtual captured image G210. That is, in the example shown in FIG. 2, the contour line of each object in the virtual captured image G210 and the object name of each object are attached. Which of the pieces of information that can be generated by computation of the three-dimensional simulator is attached to the virtual captured image G210 depends on the task the machine learning model is to perform (that is, on what kind of learning data set one wants to create).
The learning data creation device 10 creates the learning data G120 by attaching the information assigned to the virtual captured image G210 (that is, the "contour lines" and "object names") to the real captured image G110 as teacher information. As a result, learning data G120 represented by a pair of the real captured image G110 and the teacher information (that is, the "contour lines" and "object names") is created.
In this way, the learning data creation device 10 according to the embodiment of the present invention creates the learning data G120 by using the real captured image G110, which actually photographs a certain range of the real space, and the virtual captured image G210, which virtually photographs the same range of the virtual space, and attaching the information obtained from the virtual captured image G210 (that is, information generated by computation of the three-dimensional simulator or the like) to the real captured image G110. The learning data creation device 10 according to the embodiment of the present invention can therefore create the learning data G120 easily.
Moreover, in the learning data creation device 10 according to the embodiment of the present invention, the position and orientation of the camera device 20 are identified from the tracking information acquired from the tracking device 30, so the position and orientation of the virtual camera device in the virtual space can be synchronized with those of the camera device 20. Therefore, the user can easily obtain a real captured image and the virtual captured image corresponding to it simply by, for example, photographing the real space with the camera device 20.
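As a minimal sketch of this synchronization, assuming a hypothetical simulator object with set_camera_pose and render_with_annotations methods (names invented here, not an API defined by this document), the flow could look like:

    def create_training_pair(real_image, tracking_sample, simulator):
        # Place the virtual camera at the tracked pose of the real camera
        # (simulator and its methods are hypothetical stand-ins).
        simulator.set_camera_pose(tracking_sample.position,
                                  tracking_sample.orientation)
        # Render the corresponding virtual captured image together with the
        # simulator-generated information (contour lines, object names, etc.).
        virtual_image, generated_info = simulator.render_with_annotations()
        # Attach the simulator-generated information to the real captured
        # image as teacher information.
        return {"image": real_image, "teacher_info": generated_info}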
<Flow of the preparation procedure>
In the embodiment of the present invention, as described above, the virtual space and the real space must correspond to each other. For this reason, as advance preparation for creating the learning data, the virtual space and the real space need to be matched. In the following, a procedure in which the three-dimensional simulator 100 places objects in a virtual space and actual objects are then placed in the real space so as to correspond to that virtual space (that is, the virtual space and the real space are matched) will be described with reference to FIG. 3. FIG. 3 is a diagram illustrating an example of the flow of the preparation procedure.
Step S101: The three-dimensional simulator 100 acquires, from the storage unit 300, three-dimensional models of the objects to be placed in the virtual space. This means, for example, importing the three-dimensional model data stored in the storage unit 300 into the three-dimensional simulator 100. In addition to the shape of an object, a three-dimensional model carries information such as an object ID, an object name, and the category to which the object belongs.
A three-dimensional model may be created in advance by any method and stored in the storage unit 300. For example, a three-dimensional model may be created by scanning the three-dimensional shape of an actual object with a three-dimensional scanner, or may be created manually with three-dimensional modeling software.
Step S102: The three-dimensional simulator 100 places the objects represented by the three-dimensional models in the virtual space. The user can place an object in the virtual space by, for example, selecting a desired three-dimensional model from the models imported in step S101 and dragging and dropping the selected model into the virtual space. Alternatively, the user may be able to place the object represented by a three-dimensional model in the virtual space by specifying position coordinates in the virtual space.
When placing an object represented by a three-dimensional model in the virtual space, the user may tilt or rotate the object arbitrarily before placing it. The user may also, for example, enlarge or reduce the object before placing it.
When a plurality of objects are to be placed in the virtual space, step S102 may simply be repeated.
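A sketch of steps S101 to S102 under the same assumption of a hypothetical simulator handle (import_model and place are invented names, as are the file path and object attributes) might be:

    def build_scene(sim):
        # Import a 3D model; the model carries an object ID, name and
        # category in addition to its shape (step S101).
        mug = sim.import_model("models/mug.obj", object_id="obj001",
                               name="mug", category="tableware")
        # Place the object at the desired coordinates; it may be tilted,
        # rotated or scaled before placement (step S102).
        sim.place(mug, position=(0.40, 0.10, 0.75),
                  rotation_deg=(0.0, 0.0, 30.0), scale=1.0)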
Through steps S101 to S102 above, a virtual space in which one or more objects are placed at desired positions is created by the three-dimensional simulator 100.
Step S103: The user places actual objects in the real space so as to correspond to the virtual space created in steps S101 to S102.
Here, the user may, for example, use a display device that can display the objects placed in the virtual space superimposed on the real space and that is equipped with a position sensor and an orientation sensor, and place an actual object at the same position as an object displayed on the device. Examples of such display devices include a head-mounted display equipped with a position sensor, an orientation sensor, and a camera; a head-mounted display through which the real space is transparently visible, equipped with a position sensor and an orientation sensor; a projection mapping device; a tablet terminal equipped with a position sensor, an orientation sensor, and a camera; and a smartphone equipped with a position sensor, an orientation sensor, and a camera.
These display devices can synchronize positions in the virtual space with positions in the real space and then display video in which the objects are superimposed on the real space. Therefore, the user can, for example, carry or wear such a display device, move around in the real space, and place the same physical object in the real space at the same position and in the same orientation as the object in the video.
In this way, the virtual space created in steps S101 to S102 can be made to correspond to the real space. Besides the above, a mixed reality in which the real space and the virtual space are fused may be created by a technology such as MR (Mixed Reality), and the same physical object may then be placed in the real space at the same position and in the same orientation as the object placed in the virtual space.
<Flow of the learning data creation procedure>
Next, the procedure for creating a real captured image and the corresponding virtual captured image, and then creating learning data using the real captured image and the virtual captured image, will be described with reference to FIG. 4. FIG. 4 is a diagram illustrating an example of the flow of the learning data creation procedure.
Step S201: The user photographs a desired range in the real space using the camera device 20. As a result, a real captured image is created by the camera device 20 and transmitted to the learning data creation device 10, where it is stored in the storage unit 300.
At this time, the tracking device 30 corresponding to the camera device 20 transmits tracking information to the learning data creation device 10, where it is stored in the storage unit 300. As described above, the tracking information indicates the position and orientation of the camera device 20.
In step S201 above, the tracking information created by the tracking device 30 tracking the position and orientation of the camera device 20 is stored in the storage unit 300, but this is not limiting. The position and orientation of the camera device 20 may be tracked by any method, and tracking information indicating the tracking result may be stored in the storage unit 300. For example, a two-dimensional code such as a QR code (registered trademark) may be attached to the camera device 20 in advance, and the position and orientation of the camera device 20 may be tracked by reading the two-dimensional code with another camera or the like.
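As one possible sketch of such marker-based tracking, using OpenCV's ArUco fiducial markers in place of an actual QR code (an assumption; the classic cv2.aruco interface of opencv-contrib-python with pre-4.7 signatures and a calibrated observer camera are also assumed):

    import cv2

    ARUCO_DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

    def estimate_marker_pose(frame, camera_matrix, dist_coeffs,
                             marker_length_m=0.05):
        # Detect the marker attached to camera device 20 in a frame
        # taken by the observer camera.
        corners, ids, _ = cv2.aruco.detectMarkers(frame, ARUCO_DICT)
        if ids is None:
            return None
        # Estimate the marker pose (rotation and translation vectors)
        # relative to the observer camera.
        rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
            corners, marker_length_m, camera_matrix, dist_coeffs)
        return rvecs[0], tvecs[0]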
Step S202: The three-dimensional simulator 100 photographs the virtual space with a virtual camera device at the same position and orientation as the camera device 20 used for photographing in step S201. That is, the three-dimensional simulator 100 draws (renders) the imaging range, within the virtual space, of a virtual camera device having the same position and orientation as the camera device 20 of step S201.
Here, the three-dimensional simulator 100 can identify the position and orientation of the camera device 20 from the tracking information created in step S201. The three-dimensional simulator 100 can therefore install a virtual camera device in the virtual space at the same position and orientation as the camera device 20 in the real space. A virtual captured image corresponding to the real captured image created in step S201 is thereby created.
At this time, the three-dimensional simulator 100 attaches, to the virtual captured image, predetermined information acquired in the virtual space or generated by computation. The three-dimensional simulator 100 then stores the virtual captured image with the predetermined information attached in the storage unit 300.
Here, examples of the predetermined information include, as described above, contour line information and object names of the objects in the virtual captured image, the class into which each object is classified, state information of each object, the depth to each object, the orientation of each object, and the results of performing predetermined physics computations on each object. Examples of results of performing predetermined physics computations on an object include information on motions by which a robot arm capable of preset motions can grasp the object at that position, or information on motions by which a mobile robot capable of preset motions can avoid the object at that position. These robot arms and mobile robots are examples of acting agents capable of preset motions.
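For instance, if the renderer can also output an instance-ID buffer (an assumed intermediate; how such a buffer is produced depends on the simulator), contour line information per object could be derived along these lines:

    import cv2
    import numpy as np

    def contours_from_instance_ids(id_buffer):
        # id_buffer: H x W integer array in which each pixel holds the ID
        # of the object rendered there (0 = background).
        contours_by_id = {}
        for obj_id in np.unique(id_buffer):
            if obj_id == 0:
                continue
            mask = (id_buffer == obj_id).astype(np.uint8)
            # Outer contour(s) of this object's rendered silhouette.
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            contours_by_id[int(obj_id)] = contours
        return contours_by_id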
Step S202 may, for example, be executed automatically after step S201, or may be executed in response to a user operation (for example, an operation to start rendering in the virtual space).
Step S203: The learning data creation unit 200 attaches, to the real captured image created in step S201, the predetermined information attached to the virtual captured image created in step S202 as teacher information. As a result, learning data represented by a pair of the real captured image and the teacher information is created.
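One way such a pair could be persisted, sketched here with an invented file layout and invented field names, is a JSON record that points at the real captured image and embeds the teacher information:

    import json
    from pathlib import Path

    def save_training_sample(image_id, real_image_path, teacher_info,
                             out_dir="dataset"):
        # Pair the real captured image with the teacher information taken
        # from the corresponding virtual captured image (step S203).
        sample = {"image_id": image_id,
                  "image_path": str(real_image_path),
                  "teacher_info": teacher_info}
        out_path = Path(out_dir) / f"{image_id}.json"
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(json.dumps(sample, ensure_ascii=False, indent=2))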
Here, the teacher information included in the learning data is represented, for example, in a list format. As an example, FIG. 5 shows a plurality of pieces of teacher information represented in a list format (also referred to as a "teacher information list"). FIG. 5 is an example of a teacher information list attached to a certain real captured image (image ID: image101).
Each piece of teacher information included in the teacher information list shown in FIG. 5 is information in which an object ID, position information, contour line information, contact information, and grasping motion information are associated with one another.
The object ID is an ID that identifies an object. The object ID is, for example, information attached to the three-dimensional model of an object placed in the virtual space.
The position information is the position coordinates at which the object is placed. The position information is, for example, information attached to an object when the object represented by the three-dimensional model is placed in step S102 above.
The contour line information is information indicating the contour line of an object. The contour line information can be acquired, for example, from the rendering result when the virtual captured image is drawn (rendered) in step S202.
The contact information is information indicating, when the object with the given object ID is in contact with another object, the object ID of the other object, the contact position with the other object, and so on. The contact information can be acquired, for example, from the results of the physics computations of the three-dimensional simulator 100.
The grasping motion information is, for example, information on motions by which a robot arm capable of preset motions can grasp the object with the given object ID at the shooting position of the virtual captured image. The grasping motion information can be acquired, for example, from the results of the physics computations of the three-dimensional simulator 100.
In this way, the teacher information list shown in FIG. 5 is a list of teacher information in which position information, an object name, contour line information, contact information, and grasping motion information are associated with each object. Beyond these, any information that the three-dimensional simulator 100 can acquire or compute may be associated with the teacher information. For example, the virtual captured image itself, or a partial region of the virtual captured image, may be associated as teacher information. Specifically, for example, the image region of the virtual captured image that represents the object with a given object ID may be associated with that object ID. The grasping motion information may include information on the part of a target that is considered graspable, or easy to grasp, when the target is grasped.
Each piece of teacher information may also be information in which only some of the above items (position information, object name, contour line information, contact information, grasping motion information, and the like) are associated.
Representing the teacher information included in the learning data in a list format is only an example; the teacher information included in the learning data may be represented in any other format.
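As a concrete illustration of one list entry in the spirit of FIG. 5 (all field names and values below are invented for illustration):

    # Hypothetical example of a single teacher information list entry.
    teacher_info_list = [
        {
            "object_id": "obj001",
            "position": (0.40, 0.10, 0.75),                   # placement coordinates
            "contour": [(312, 208), (315, 210), (318, 215)],  # outline pixels
            "contact": {"other_object_id": "obj002",          # touching object
                        "contact_point": (0.42, 0.10, 0.75)},
            "grasp": {"feasible": True,                       # robot-arm physics result
                      "grasp_part": "handle"},                # easy-to-grasp part
        },
    ]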
<Hardware configuration of the learning data creation device 10>
Next, the hardware configuration of the learning data creation device 10 according to the embodiment of the present invention will be described with reference to FIG. 6. FIG. 6 is a diagram showing an example of the hardware configuration of the learning data creation device 10 according to the embodiment of the present invention.
As shown in FIG. 6, the learning data creation device 10 according to the embodiment of the present invention includes an input device 401, a display device 402, an external I/F 403, a communication I/F 404, a RAM (Random Access Memory) 405, a ROM (Read Only Memory) 406, a processor 407, and an auxiliary storage device 408. These pieces of hardware are interconnected by a bus 409.
The input device 401 is, for example, a keyboard, a mouse, or a touch panel, and is used by the user to input various operations. The display device 402 is, for example, a display, and shows the various processing results of the learning data creation device 10. The learning data creation device 10 need not have at least one of the input device 401 and the display device 402.
The external I/F 403 is an interface to external devices. The external devices include a recording medium 403a and the like. The learning data creation device 10 can read from and write to the recording medium 403a and the like via the external I/F 403. The recording medium 403a may store one or more programs that implement the three-dimensional simulator 100 and the learning data creation unit 200.
Examples of the recording medium 403a include a flexible disk, a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory card.
The communication I/F 404 is an interface for connecting the learning data creation device 10 to a communication network. One or more programs that implement the three-dimensional simulator 100 and the learning data creation unit 200 may be acquired (downloaded) from a predetermined server device or the like via the communication I/F 404.
The RAM 405 is a volatile semiconductor memory that temporarily holds programs and data. The ROM 406 is a nonvolatile semiconductor memory that can hold programs and data even when the power is turned off. The ROM 406 stores, for example, settings related to the OS (Operating System) and settings related to the communication network.
The processor 407 is, for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), and is an arithmetic device that reads programs and data from the ROM 406, the auxiliary storage device 408, or the like onto the RAM 405 and executes processing. The three-dimensional simulator 100 and the learning data creation unit 200 are implemented by, for example, processing that one or more programs stored in the auxiliary storage device 408 cause the processor 407 to execute. The learning data creation device 10 may have both a CPU and a GPU as the processor 407, or may have only one of them.
The auxiliary storage device 408 is, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and is a nonvolatile storage device that stores programs and data. The auxiliary storage device 408 stores, for example, the OS, various application software, and one or more programs that implement the three-dimensional simulator 100 and the learning data creation unit 200. The storage unit 300 is implemented using, for example, the auxiliary storage device 408. However, the storage unit 300 may be implemented not with the auxiliary storage device 408 but with, for example, a storage device communicably connected to the learning data creation device 10 via a communication network.
With the hardware configuration shown in FIG. 6, the learning data creation device 10 according to the embodiment of the present invention can realize the various processes described above. In the example shown in FIG. 6, the case where the learning data creation device 10 according to the embodiment of the present invention is implemented by a single device (computer) has been described, but this is not limiting. The learning data creation device 10 according to the embodiment of the present invention may be implemented by a plurality of devices (computers).
<Summary>
As described above, the learning data creation system 1 according to the embodiment of the present invention creates learning data by attaching information obtained from a virtual captured image (that is, information that the three-dimensional simulator 100 can acquire or compute) to a real captured image as teacher information. The learning data creation system 1 according to the embodiment of the present invention can therefore create learning data easily, without work such as manually attaching teacher information to real captured images. In particular, learning data can be created easily even when the amount of teacher information is large (for example, when there are many kinds of objects or many categories).
Further, when performing semantic segmentation, for example, the three-dimensional simulator 100 in the learning data creation system 1 according to the embodiment of the present invention obtains the boundary lines of the objects, so object segmentation can be performed with high accuracy.
Furthermore, in the learning data creation system 1 according to the embodiment of the present invention, even teacher information that is difficult to attach manually, such as depth or the orientation of an object, can easily be included in the created learning data.
Moreover, in the learning data creation system 1 according to the embodiment of the present invention, a real captured image and the corresponding virtual captured image are created simply by, for example, the user photographing arbitrary ranges with the camera device 20 while moving through the real space, so a large amount of learning data can be created easily. For this reason, a large amount of learning data can be obtained at low cost compared with, for example, attaching teacher information to real captured images using crowdsourcing or the like.
Therefore, by using the learning data creation system 1 according to the embodiment of the present invention, it is possible to easily obtain a large amount of learning data used to train, for example, the recognition engine of a robot that cleans a certain room in the real space or tidies up the objects in that room (that is, a machine learning model that performs the tasks of cleaning and tidying the room).
In the embodiment of the present invention, as the preparation procedure, a virtual space is created and objects are then placed in the real space so as to correspond to the virtual space, but this is not limiting. For example, a virtual space may be created so as to correspond to a real space in which objects are actually placed. In this case, the objects in the virtual space may be moved so as to correspond to the objects in the real space.
In this embodiment, it is preferable to provide functions that facilitate the alignment of the first object and the second object. For example, a function may be provided that makes it easy to move an object in the virtual space or to align it with an object in the real space. Specific examples include: a function for moving an object in the virtual space relative to the user's position (for example, pulling the object toward the user's vicinity); a function for matching normals with the real space (for example, moving a predetermined face of an object in the virtual space along another object, such as a wall or floor in the virtual space, while keeping it in contact with that object); a function for aligning with an object in the real space by fixing some of the information such as the direction, position, and angle of the object in the virtual space and manipulating the rest; and a function for aligning an object in the real space with the corresponding object in the virtual space using a machine learning model or the like.
The second object may be subjected to processing so that it corresponds well to the first object or is easy to align with it. For example, when an object in the real space does not correspond to an object in the virtual space, or when alignment is difficult, an object in the virtual space may be newly added, its information may be corrected, or it may be generated again.
When acquiring or creating the first data or the second data, data other than the main data used as learning data (for example, the images in the embodiment described above) may also be recorded. Specifically, such data includes: parameters of the device that acquires or creates the data (for example, in the case of a camera, information about the camera at the time the image was acquired, such as the focal length, camera position, shutter speed and aperture value, and camera motion information from an acceleration sensor provided in the camera); information about the environment in which the data was acquired, such as lighting conditions; and category or tag information about the target recorded in the data or about the main data (for example, in the case of an image, category information about the photographed object). When the main data is a moving image, it preferably includes time information and the like. Information used for managing the recorded data, such as the IDs of workers or imaging devices, may also be included. Such data other than the main data may be used during the process of creating the learning data, or may be used together with the main data as part of the learning data.
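Purely as an illustration, such auxiliary metadata could be kept alongside each real captured image; every field name and value below is invented:

    # Hypothetical auxiliary metadata recorded with one real captured image.
    capture_metadata = {
        "focal_length_mm": 26.0,           # camera parameter at acquisition
        "shutter_speed_s": 1.0 / 60.0,
        "aperture_f": 2.8,
        "camera_position": (1.2, 0.4, 1.5),
        "lighting": "indoor_fluorescent",  # environment information
        "tags": ["tableware", "kitchen"],  # category/tag information
        "operator_id": "worker01",         # data-management information
        "device_id": "cam01",
    }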
In the embodiment of the present invention, the description has assumed that the real captured images and the virtual captured images are still images, but this is not limiting. The real captured images and the virtual captured images may be moving images.
In the embodiment of the present invention, a pair of a real captured image and teacher information acquired from the three-dimensional simulator 100 is used as the learning data, but this is not limiting. For example, with the real captured image as teacher information, a pair of a virtual captured image and the teacher information (the real captured image) may be used as the learning data. In this case, learning data can be created for a task of predicting a real captured image from a virtual captured image created by the three-dimensional simulator 100.
The present invention is not limited to the embodiments specifically disclosed above, and various modifications and changes are possible without departing from the scope of the claims.
This application is based on Japanese Patent Application No. 2018-182538 filed in Japan on September 27, 2018, the entire contents of which are incorporated herein by reference.
Reference Signs List
1 Learning data creation system
10 Learning data creation device
20 Camera device
30 Tracking device
100 Three-dimensional simulator
200 Learning data creation unit
300 Storage unit

Claims (7)

1.  A learning data creation method in which one or more computers execute: using first data recording a first object in the real world and second data, generated by a simulator, recording a second object that corresponds to the first object and is placed in alignment with the first object, assigning to the first data, as teacher information, at least either information based on the second data or information generated by the simulator.
2.  The learning data creation method according to claim 1, wherein the first object is a physical object, the first data and the second data are images, and the second object is a three-dimensional model.
3.  The learning data creation method according to claim 1 or 2, wherein placing the first object and the second object in alignment means matching the position of the first object in the real world to the position of the second object in a virtual space, or moving the second object in the virtual space to match the position of the first object in the real world.
4.  The learning data creation method according to claim 3, wherein MR technology is used to place the first object and the second object in alignment.
5.  A machine learning model generation method in which one or more computers execute: performing learning using learning data created by the learning data creation method according to any one of claims 1 to 4.
6.  A learning data creation device comprising an assigning unit that, using first data recording a first object in the real world and second data, generated by a simulator, recording a second object that corresponds to the first object and is placed in alignment with the first object, assigns to the first data, as teacher information, at least either information based on the second data or information generated by the simulator.
7.  A program that causes one or more computers to execute: using first data recording a first object in the real world and second data, generated by a simulator, recording a second object that corresponds to the first object and is placed in alignment with the first object, assigning to the first data, as teacher information, at least either information based on the second data or information generated by the simulator.
PCT/JP2019/037684 2018-09-27 2019-09-25 Learning data creation method, machine learning model generation method, learning data creation device, and program WO2020067204A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018182538A JP2022024189A (en) 2018-09-27 2018-09-27 Learning data creation method, learning data creation device, and program
JP2018-182538 2018-09-27

Publications (1)

Publication Number Publication Date
WO2020067204A1 true WO2020067204A1 (en) 2020-04-02

Family

ID=69953491

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/037684 WO2020067204A1 (en) 2018-09-27 2019-09-25 Learning data creation method, machine learning model generation method, learning data creation device, and program

Country Status (2)

Country Link
JP (1) JP2022024189A (en)
WO (1) WO2020067204A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7281576B1 (en) 2022-03-31 2023-05-25 Kddi株式会社 Video projection system and video projection method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004362440A (en) * 2003-06-06 2004-12-24 National Printing Bureau Character string extracting processor from printed matter
WO2018020954A1 (en) * 2016-07-29 2018-02-01 株式会社日立製作所 Database construction system for machine-learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FUJIHASHI, KAZUKI ET AL.: "Estimation of Number and Locations of Products in Pictures by Semi-Supervised Learning", Proceedings DVD of the 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 5 June 2018 (2018-06-05), pages 1-3 *

Also Published As

Publication number Publication date
JP2022024189A (en) 2022-02-09

Similar Documents

Publication Publication Date Title
Nebeling et al. The trouble with augmented reality/virtual reality authoring tools
JP6105092B2 (en) Method and apparatus for providing augmented reality using optical character recognition
JP2020535509A (en) Methods, devices and systems for automatically annotating target objects in images
US20080231631A1 (en) Image processing apparatus and method of controlling operation of same
WO2019041900A1 (en) Method and device for recognizing assembly operation/simulating assembly in augmented reality environment
US11436755B2 (en) Real-time pose estimation for unseen objects
CN112154486B (en) System and method for multi-user augmented reality shopping
CN110573992B (en) Editing augmented reality experiences using augmented reality and virtual reality
JPWO2015107665A1 (en) Work support data creation program
JP2023103265A (en) Control device, control method and program
Nishino et al. 3d object modeling using spatial and pictographic gestures
KR20200136723A (en) Method and apparatus for generating learning data for object recognition using virtual city model
WO2020067204A1 (en) Learning data creation method, machine learning model generation method, learning data creation device, and program
US20200226833A1 (en) A method and system for providing a user interface for a 3d environment
TW201724054A (en) System, method, and computer program product for simulated reality learning
GB2555521A (en) Improved object painting through use of perspectives or transfers in a digital medium environment
JP7401245B2 (en) Image synthesis device, control method and program for image synthesis device
CN115082648A (en) AR scene arrangement method and system based on marker model binding
Okamoto et al. Assembly assisted by augmented reality (A 3 R)
BARON et al. APPLICATION OF AUGMENTED REALITY TOOLS TO THE DESIGN PREPARATION OF PRODUCTION.
JP7045863B2 (en) Information management system, information management method, and program
JP6967150B2 (en) Learning device, image generator, learning method, image generation method and program
JP6859763B2 (en) Program, information processing device
JP6204781B2 (en) Information processing method, information processing apparatus, and computer program
JP2021192230A5 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19865869

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19865869

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP