WO2020067204A1 - Learning data creation method, machine learning model generation method, learning data creation device, and program - Google Patents

Learning data creation method, machine learning model generation method, learning data creation device, and program

Info

Publication number
WO2020067204A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
learning data
data
virtual
data creation
Prior art date
Application number
PCT/JP2019/037684
Other languages
French (fr)
Japanese (ja)
Inventor
叡一 松元
颯介 小林
悠太 菊池
祐貴 五十嵐
統太郎 中島
Original Assignee
株式会社Preferred Networks
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社Preferred Networks
Publication of WO2020067204A1 publication Critical patent/WO2020067204A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T7/00 Image analysis

Definitions

  • The present invention relates to a learning data creation method, a machine learning model generation method, a learning data creation device, and a program.
  • Semantic segmentation is a task of classifying each pixel in an image captured by a camera device or the like into a class corresponding to the meaning indicated by the pixel (for example, the object name of the object represented by the pixel).
  • In many tasks such as semantic segmentation, a machine learning model is often trained by supervised learning.
  • Learning data used for supervised learning is often created manually. For example, in semantic segmentation, teacher information (the class classification of each pixel) is given to an image by painting each pixel with the color of the class corresponding to its meaning, and learning data is thereby created.
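  • As a concrete sketch of such a supervised learning sample for semantic segmentation (the class IDs, array sizes, and key names below are assumptions for illustration, not from the patent):

```python
import numpy as np

CLASS_NAMES = {0: "background", 1: "desk", 2: "chair"}  # hypothetical classes

image = np.zeros((4, 6, 3), dtype=np.uint8)  # H x W x 3 RGB photograph
label = np.zeros((4, 6), dtype=np.uint8)     # H x W per-pixel class IDs
label[1:3, 2:5] = 1                          # pixels showing the "desk"

# Supervised learning consumes (image, label) pairs; producing `label`
# manually means painting every pixel, which is the costly step.
sample = {"image": image, "teacher_information": label}
```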
  • Also, depending on the task executed by the machine learning method, it may be necessary to create learning data to which a plurality of pieces of teacher information are added. For example, in addition to the above-described class classification, learning data to which the posture of an object in an image (the direction, rotation, and the like of the object), the state of the object, and the like are added as teacher information may be necessary.
  • To achieve this, an embodiment of the present invention uses first data recording a first target in the real world and second data, generated by a simulator, recording a second target that corresponds to the first target and is arranged in alignment with it. One or more computers execute adding, to the first data, at least either information based on the second data or information generated by the simulator as teacher information.
  • FIG. 1 is a diagram illustrating an example of the overall configuration of a learning data creation system according to an embodiment of the present invention.
  • FIG. 2 is a diagram for schematically explaining an example of creation of learning data.
  • FIG. 3 is a diagram for explaining an example of the flow of the advance preparation procedure.
  • FIG. 4 is a diagram for explaining an example of the flow of the learning data creation procedure.
  • FIG. 5 is a diagram illustrating an example of a teacher information list.
  • FIG. 6 is a diagram illustrating an example of the hardware configuration of a learning data creation device according to an embodiment of the present invention.
  • The predetermined task includes, for example, recognition and classification of an object in an image captured by a camera device or the like, grasping the state of the object, and some action related to the object (for example, a grasping action or an avoidance action for the object).
  • Note that the object is an example of a target to which teacher information is added.
  • In the embodiment of the present invention, two images are used: an image obtained by photographing a virtual space created by a three-dimensional simulator with a camera device (that is, a virtual camera device installed in the virtual space; hereinafter also referred to as a "virtual photographed image"), and an image of the real space corresponding to the virtual space photographed by an actual camera device (hereinafter also referred to as an "actual photographed image").
  • Learning data is created by adding teacher information obtained from the virtual photographed image to the actual photographed image.
  • The teacher information includes, for example, contour information of an object in the virtual photographed image, the class into which the object is classified, the object name of the object, state information of the object, the depth to the object, the posture of the object, and information for performing a predetermined action on the object.
  • the virtual captured image is an example of data that records an object (a second target) captured in the virtual space, and is an example of second data.
  • the actual photographed image is an example of data recording an object (first object) photographed in the real space, and is an example of first data.
  • a three-dimensional simulator is an example of a simulator.
  • the real space corresponding to the virtual space is, for example, a real space in which the same object is arranged at the same position as the virtual space created by the three-dimensional simulator.
  • the position being the same in the virtual space and the real space means that, for example, when the same coordinate system is set in the virtual space and the real space, the position coordinates are the same.
  • However, coordinate systems that can be mutually converted into each other may instead be set in the virtual space and the real space; a sketch of such a conversion follows below.
  • Hereinafter, "position" and "posture" represent a position and a posture in the same coordinate system set in the virtual space and the real space.
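  • As an illustrative sketch of such mutually convertible coordinate systems (the rotation and translation values below are assumptions, not taken from the patent):

```python
# Minimal sketch: a rigid transform (rotation R, translation t) maps
# real-space coordinates to virtual-space coordinates and back.
import numpy as np

R = np.eye(3)                    # rotation from the real frame to the virtual frame
t = np.array([0.0, 0.0, 0.0])    # translation between the two origins

def real_to_virtual(p_real: np.ndarray) -> np.ndarray:
    """Convert a 3D point from the real-space frame to the virtual-space frame."""
    return R @ p_real + t

def virtual_to_real(p_virtual: np.ndarray) -> np.ndarray:
    """Inverse conversion; possible because a rigid transform is invertible."""
    return R.T @ (p_virtual - t)
```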
  • In this specification, "same" or "identical" is not limited to being exactly the same or identical.
  • Deviations may be permitted depending on, for example, errors in devices or calculations, or the purpose for which the data set is used.
  • The object being the same in the virtual space and the real space means that the object represented by the three-dimensional model arranged in the virtual space is the same as the actual object arranged in the real space.
  • an object arranged in a virtual space is also referred to as an “object” in order to distinguish it from an actual object arranged in a real space.
  • the fact that the target or the object is the same means that the information necessary for providing the teacher information is the same. For example, in semantic segmentation, an object having the same contour is included, and in teaching a gripping position of the object, an object having the same tactile sensation and material information in addition to the contour is included.
  • FIG. 1 is a diagram illustrating an example of an overall configuration of a learning data creation system 1 according to an embodiment of the present invention.
  • the learning data creation system 1 includes a learning data creation device 10, one or more camera devices 20, and one or more tracking devices 30.
  • the learning data creation device 10, the camera device 20, and the tracking device 30 are communicably connected via a communication network such as a wireless LAN (Local Area Network).
  • This communication network may be, for example, a wired LAN or the like in whole or in part.
  • the learning data creation device 10 is a computer or a computer system that creates learning data.
  • the learning data creation device 10 includes a three-dimensional simulator 100, a learning data creation unit 200, and a storage unit 300.
  • the three-dimensional simulator 100 of the present embodiment is a simulator that can simulate a three-dimensional virtual space.
  • In the three-dimensional simulator 100, objects can be arranged in the virtual space, and physics calculations that simulate physical laws acting on the objects (for example, collision determination between objects) can be performed.
  • a virtual photographed image photographed in a virtual space with a virtual camera device can be drawn.
  • For example, the contour information and object name of an object in the virtual captured image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, the result of performing a predetermined physics calculation on the object, and the like can be added to the virtual captured image.
  • These pieces of information are generated by calculation or the like in the three-dimensional simulator 100. Note that these pieces of information may be generated in advance before the virtual captured image is drawn, for example, and separately stored in the storage unit 300 or the like.
  • a data set can be easily created by adding the information generated by the simulator to the first data.
  • Such a three-dimensional simulator 100 is realized by a game engine such as Unity, Unreal Engine 4 (UE4), or Blender, for example.
  • the three-dimensional simulator 100 is not limited to these game engines, and may be realized by any three-dimensional simulation software.
  • the learning data creation unit 200 creates learning data by adding teacher information obtained from a virtual captured image to a real captured image.
  • The teacher information is the information added to the virtual captured image when the three-dimensional simulator 100 drew it (for example, the contour information and object name of an object in the virtual captured image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, and the result of performing a predetermined physics calculation on the object).
  • the learning data creating unit 200 creates learning data by adding information obtained from the virtual captured image drawn by the three-dimensional simulator 100 to the real captured image as teacher information.
  • the storage unit 300 stores various information.
  • The information stored in the storage unit 300 includes, for example, real captured images captured by the camera device 20 in the real space, virtual captured images drawn by the three-dimensional simulator 100, tracking information acquired from the tracking device 30, and three-dimensional models of the objects arranged in the virtual space.
  • the tracking information is information obtained by tracking the position and orientation of the camera device 20 in the real space. That is, the tracking information is information indicating both the position and the posture of the camera device 20 at each time. However, the tracking information may be, for example, information indicating only the position of the camera device 20 at each time.
  • the tracking information obtained by the tracking device 30 is used for synchronizing information in the real space and information in the virtual space.
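  • As a sketch of what such tracking information might look like in code (the field names and the nearest-timestamp lookup are assumptions for illustration):

```python
# One tracking record: a timestamped camera pose (position and posture).
from dataclasses import dataclass

@dataclass
class TrackingRecord:
    time: float                                       # acquisition time
    position: tuple[float, float, float]              # camera position in the shared frame
    orientation: tuple[float, float, float, float]    # posture as a quaternion (w, x, y, z)

def pose_at(records: list[TrackingRecord], t: float) -> TrackingRecord:
    """Synchronization step: pick the pose whose timestamp is closest to
    the capture time of a real photographed image."""
    return min(records, key=lambda r: abs(r.time - t))
```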
  • the camera device 20 is an imaging device that captures an image of a real space and creates an actually captured image.
  • the camera device 20 is fixed to, for example, a portable camera stand on which the tracking device 30 is mounted.
  • the actual captured image created by the camera device 20 is transmitted to the learning data creation device 10 and stored in the storage unit 300.
  • the camera device 20 may be, for example, a depth camera that can create an actual captured image to which depth information has been added.
  • the tracking device 30 is a device that tracks the position and orientation of the camera device 20 and creates tracking information (for example, a sensing device equipped with a position sensor and an orientation sensor).
  • the tracking device 30 is mounted on, for example, a portable camera stand or the like. As described above, one tracking device 30 is installed and associated with one camera device 20.
  • the tracking information created by the tracking device 30 is transmitted to the learning data creation device 10 and stored in the storage unit 300.
  • the tracking device 30 may be directly mounted on the camera device 20 or may be built in the camera device 20, for example.
  • Because the tracking device 30 provides tracking information for the camera device 20, a virtual captured image corresponding to a real captured image taken by that camera device 20 can be obtained. That is, the tracking device 30 of the present embodiment is a device that acquires information at the time the first data is acquired, and the second data can easily be created based on that information.
  • the configuration of the learning data creation system 1 shown in FIG. 1 is an example, and another configuration may be used.
  • the learning data creation system 1 may include an arbitrary number of camera devices 20 and tracking devices 30 corresponding to the camera devices 20.
  • the camera device 20 may not have the tracking device 30 corresponding to the camera device 20 as long as the position and the posture in the real space are known. For example, when the camera device 20 is fixedly installed at a predetermined position in a predetermined posture, the tracking device 30 corresponding to the camera device 20 may not be provided.
  • FIG. 2 is a diagram for schematically explaining an example of creating learning data.
  • an actual photographed image photographed by the camera device 20 in a certain posture at a certain position in the real space is referred to as “real photographed image G110”.
  • a virtual captured image captured by the virtual camera device having the same posture at the same position in the virtual space corresponding to the real space is referred to as a “virtual captured image G210”.
  • At this time, information generated by calculation of the three-dimensional simulator 100 (in FIG. 2, as an example, "contour line" and "object name") is added to the virtual captured image G210. That is, in the example shown in FIG. 2, the contour of each object in the virtual captured image G210 and the object name of each object are given. Note that which of the information that can be generated by calculation of the three-dimensional simulator is given to the virtual captured image G210 differs depending on the task the machine learning model is to execute (that is, depending on what learning data set is to be created).
  • The learning data creation device 10 adds the information assigned to the virtual captured image G210 (that is, the "contour line" and "object name") to the real photographed image G110 as teacher information, thereby creating the learning data G120.
  • As a result, learning data G120 represented by a set of the real photographed image G110 and the teacher information (that is, "contour line" and "object name") is created.
  • In this way, the learning data creation device 10 according to the embodiment of the present invention uses the real photographed image G110, which actually photographs a certain area of the real space, and the virtual captured image G210, which virtually photographs the same area of the virtual space, and adds the information obtained from the virtual captured image G210 (that is, information generated by calculation of the three-dimensional simulator or the like) to the real photographed image G110, thereby creating the learning data G120. Therefore, the learning data creation device 10 according to the embodiment of the present invention can easily create the learning data G120.
  • Moreover, since the position and posture of the camera device 20 are specified by the tracking information acquired from the tracking device 30, the position and posture of the virtual camera device in the virtual space can be synchronized with those of the camera device 20. For this reason, the user can easily obtain a real photographed image and the virtual captured image corresponding to it simply by photographing the real space with the camera device 20; a sketch of assembling such a pair follows below.
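  • As a minimal sketch of assembling one such learning sample (file names and dictionary keys are hypothetical, not from the patent):

```python
# The real photograph supplies the input; the synchronized virtual render
# supplies the teacher information, so no manual annotation is needed.
def make_learning_sample(real_image_path: str, simulator_annotations: dict) -> dict:
    return {
        "input": real_image_path,  # e.g. "real/G110.png" (hypothetical path)
        "teacher_information": {
            "contour_lines": simulator_annotations["contours"],
            "object_names": simulator_annotations["names"],
        },
    }
```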
  • FIG. 3 is a diagram for explaining an example of the flow of the advance preparation procedure.
  • Step S101 The three-dimensional simulator 100 acquires a three-dimensional model of an object placed in the virtual space from the storage unit 300. This means, for example, that the data of the three-dimensional model stored in the storage unit 300 is imported into the three-dimensional simulator 100.
  • the three-dimensional model is provided with information such as an object ID, an object name, and a category to which the object belongs, in addition to the shape of the object.
  • the three-dimensional model may be created in advance by an arbitrary method and stored in the storage unit 300.
  • As a method of creating a three-dimensional model, for example, the three-dimensional shape of an actual object may be captured by scanning it with a three-dimensional scanner or the like, or the model may be created manually with three-dimensional model creation software or the like.
  • Step S102 The three-dimensional simulator 100 arranges the object represented by the three-dimensional model in the virtual space.
  • For example, the user selects a desired three-dimensional model from the plurality of three-dimensional models imported in step S101 and drags and drops it into the virtual space, whereby an object can be arranged in the virtual space.
  • the user may be able to arrange the object represented by the three-dimensional model in the virtual space by designating the position coordinates in the virtual space.
  • the user when arranging the object represented by the three-dimensional model in the virtual space, the user may arbitrarily tilt or rotate the object, and then arrange the object. In addition, the user may arrange the object after enlarging or reducing the object, for example.
  • step S102 may be repeatedly performed.
  • the three-dimensional simulator 100 creates a virtual space in which one or more objects (objects) are arranged at desired positions.
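  • As an illustrative sketch of step S102 using Blender's Python API (bpy), Blender being one of the engines named above; the object name and transform values are assumptions:

```python
import bpy
import math

# Place an object imported in step S101 at a desired position, with an
# optional rotation and scaling, as described above.
obj = bpy.data.objects["desk_model"]                  # hypothetical model name
obj.location = (1.0, 0.5, 0.0)                        # position in the virtual space
obj.rotation_euler = (0.0, 0.0, math.radians(30.0))   # tilt/rotate before placing
obj.scale = (1.0, 1.0, 1.0)                           # enlarge or reduce if desired
```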
  • Step S103 The user places an actual object in the real space so as to correspond to the virtual space created in steps S101 to S102.
  • For example, using a display device that is equipped with a position sensor and a posture sensor and that can superimpose and display the objects arranged in the virtual space on the real space, the user need only place each actual object at the same position as the corresponding object displayed on the display device.
  • Examples of such a display device include a head-mounted display equipped with a position sensor, a posture sensor, and a camera; a see-through head-mounted display, through which the real space is visible, equipped with a position sensor and a posture sensor; a projection mapping device; a tablet terminal equipped with a position sensor, a posture sensor, and a camera; and a smartphone equipped with a position sensor, a posture sensor, and a camera.
  • These display devices can display video in which the objects are superimposed on the real space after synchronizing positions in the virtual space with positions in the real space. Therefore, the user can carry or wear the display device, move through the real space, and place the same actual object at the same position and in the same posture as each object shown in the video.
  • the virtual space created in steps S101 to S102 can be associated with the real space.
  • That is, the same object may be arranged in the real space at the same position and in the same posture as the object arranged in the virtual space.
  • FIG. 4 is a diagram for explaining an example of the flow of the learning data creation procedure.
  • Step S201 The user uses the camera device 20 to photograph a desired range in the real space. As a result, an actual photographed image is created by the camera device 20 and transmitted to the learning data creating device 10. In the learning data creation device 10, the actual photographed image is stored in the storage unit 300.
  • the tracking device 30 corresponding to the camera device 20 transmits the tracking information to the learning data creation device 10.
  • the tracking information is stored in the storage unit 300.
  • the tracking information is information indicating the position and orientation of the camera device 20 as described above.
  • In the present embodiment, the tracking information created by the tracking device 30 tracking the position and posture of the camera device 20 is stored in the storage unit 300, but the present invention is not limited to this.
  • The position and posture of the camera device 20 may be tracked by another method, and tracking information indicating that tracking result may be stored in the storage unit 300.
  • For example, a two-dimensional code such as a QR code (registered trademark) may be pasted on the camera device 20 in advance, and the position and posture of the camera device 20 may be tracked by reading the two-dimensional code with a camera or the like, as sketched below.
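  • A rough sketch of that QR-code alternative using OpenCV (the camera intrinsics and printed-code size are assumed values, and a production tracker would need proper calibration):

```python
import cv2
import numpy as np

MARKER_SIZE = 0.05  # edge length of the printed code in meters (assumption)
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])           # assumed intrinsics
dist_coeffs = np.zeros(5)                             # assume no lens distortion

def track_pose(frame: np.ndarray):
    """Estimate the pose of a QR code pasted on the camera device."""
    _data, corners, _ = cv2.QRCodeDetector().detectAndDecode(frame)
    if corners is None:
        return None
    s = MARKER_SIZE / 2
    object_points = np.array([[-s, s, 0], [s, s, 0], [s, -s, 0], [-s, -s, 0]],
                             dtype=np.float32)        # code corners in its own frame
    ok, rvec, tvec = cv2.solvePnP(object_points,
                                  corners.reshape(4, 2).astype(np.float32),
                                  camera_matrix, dist_coeffs)
    return (rvec, tvec) if ok else None               # rotation and translation
```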
  • Step S202 The three-dimensional simulator 100 photographs the virtual space with the virtual camera device at the same position and orientation as the camera device 20 photographed in step S201. In other words, the three-dimensional simulator 100 draws (renders) in the virtual space an imaging range of the virtual camera device having the same position and posture as the camera device 20 imaged in step S201.
  • the three-dimensional simulator 100 can specify the position and orientation of the camera device 20 from the tracking information created in step S201. Therefore, the three-dimensional simulator 100 can install a virtual camera device in the virtual space at the same position and orientation as the camera device 20 in the real space. As a result, a virtual captured image corresponding to the real captured image created in step S201 is created.
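  • A minimal sketch of step S202, again assuming Blender (bpy) as the three-dimensional simulator; the pose values stand in for the tracking information and the output path is hypothetical:

```python
import bpy

# Move the virtual camera to the tracked pose of the real camera device 20,
# then render the corresponding virtual photographed image.
cam = bpy.data.objects["Camera"]
cam.location = (1.2, -0.8, 1.5)                       # position from tracking info
cam.rotation_mode = "QUATERNION"
cam.rotation_quaternion = (1.0, 0.0, 0.0, 0.0)        # posture from tracking info

bpy.context.scene.camera = cam
bpy.context.scene.render.filepath = "/tmp/virtual_G210.png"  # hypothetical path
bpy.ops.render.render(write_still=True)               # draws (renders) the image
```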
  • the three-dimensional simulator 100 adds predetermined information acquired or calculated in the virtual space to the virtual captured image. Then, the three-dimensional simulator 100 stores the virtual captured image to which the predetermined information is added in the storage unit 300.
  • Examples of the predetermined information include the contour information and object name of an object in the virtual captured image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, and the result of performing a predetermined physics calculation on the object.
  • As an example of the result of performing a predetermined physics calculation on an object, information on a motion by which a robot arm capable of performing preset motions could grip the object at that position can be given.
  • Step S202 may be performed automatically after step S201, for example, or may be performed in response to a user operation (for example, an operation to start rendering in the virtual space).
  • Step S203: The learning data creation unit 200 assigns, as teacher information, the predetermined information given to the virtual captured image created in step S202 to the real captured image created in step S201. As a result, learning data represented by a set of the real captured image and the teacher information is created.
  • the teacher information included in the learning data is represented, for example, in a list format.
  • FIG. 5 illustrates a plurality of pieces of teacher information represented in a list format (this is also referred to as a “teacher information list”).
  • FIG. 5 is an example of a teacher information list assigned to a certain actually shot image (image ID: image101).
  • Each teacher information included in the teacher information list shown in FIG. 5 is information in which an object ID, position information, contour line information, contact information, and gripping operation information are associated with each other.
  • the object ID is an ID for identifying an object.
  • the object ID is, for example, information given to a three-dimensional model of an object arranged in a virtual space.
  • the position information is the position coordinates where the object is located.
  • the position information is, for example, information given to the object when the object represented by the three-dimensional model is arranged in step S102 described above.
  • Contour line information is information indicating a contour line of an object.
  • the contour information can be acquired, for example, from the rendering result when the virtual captured image is drawn (rendered) in step S202.
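  • As a sketch of one way such contour line information could be derived from a rendering result (the per-object mask and image size are assumptions; OpenCV is not named in the patent):

```python
import cv2
import numpy as np

# A per-object instance mask as a renderer might output it: the pixels
# covered by one object are set, everything else is zero.
instance_mask = np.zeros((480, 640), dtype=np.uint8)
instance_mask[100:200, 150:300] = 255

contours, _ = cv2.findContours(instance_mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
contour_line = contours[0].reshape(-1, 2)  # (x, y) vertices of the object outline
```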
  • the contact information is information indicating the object ID of the other object, the contact position with the other object, and the like when the object with the object ID is in contact with another object (object).
  • the contact information can be obtained, for example, from the calculation result of the physical calculation of the three-dimensional simulator 100.
  • the gripping operation information is, for example, information relating to an operation in which a robot arm capable of performing a preset operation can grip an object having the object ID at a shooting position of a virtual shot image.
  • the gripping motion information can be obtained, for example, from the calculation result of the physical calculation of the three-dimensional simulator 100.
  • As described above, the teacher information list illustrated in FIG. 5 is a list of teacher information in which position information, object name, contour line information, contact information, and gripping motion information are associated with each other for each object.
  • any information that can be obtained or calculated by the three-dimensional simulator 100 may be associated with the teacher information.
  • the virtual captured image itself or a partial region of the virtual captured image may be associated with the teacher information.
  • an image area portion representing the object of the object ID in the image area of the virtual captured image may be associated with the object ID.
  • the gripping motion information may include information on a target part that is considered to be grippable or easy to grip when gripping the target.
  • Each piece of teacher information may be information in which only a part of the above information (position information, object name, contour line information, contact information, gripping motion information, etc.) is associated.
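  • A hypothetical rendering of such a teacher information list as a Python structure (all field names and values are illustrative, not copied from FIG. 5):

```python
teacher_information_list = {
    "image_id": "image101",
    "teacher_information": [
        {
            "object_id": "obj-001",          # ID given to the three-dimensional model
            "position": [1.0, 0.5, 0.0],     # placement coordinates from step S102
            "contour_line": [[150, 100], [300, 100], [300, 200], [150, 200]],
            "contact": [                      # from the simulator's physics calculation
                {"object_id": "obj-002", "contact_position": [1.1, 0.5, 0.7]},
            ],
            "gripping_motion": {             # motion by which a robot arm could grip it
                "grip_point": [1.0, 0.5, 0.8],
                "approach_direction": [0.0, 0.0, -1.0],
            },
        },
    ],
}
```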
  • In the present embodiment, the teacher information included in the learning data is represented in a list format, but it may be represented in any other format.
  • FIG. 6 is a diagram illustrating an example of a hardware configuration of the learning data creation device 10 according to the embodiment of the present invention.
  • As shown in FIG. 6, the learning data creation device 10 includes an input device 401, a display device 402, an external I/F 403, a communication I/F 404, a RAM (Random Access Memory) 405, a ROM (Read Only Memory) 406, a processor 407, and an auxiliary storage device 408. These pieces of hardware are interconnected by a bus 409.
  • the input device 401 is, for example, a keyboard, a mouse, a touch panel, or the like, and is used by a user to input various operations.
  • the display device 402 is, for example, a display or the like, and displays various processing results of the learning data creation device 10. Note that the learning data creation device 10 may not have at least one of the input device 401 and the display device 402.
  • the external I / F 403 is an interface with an external device.
  • The external device includes a recording medium 403a and the like; the learning data creation device 10 can read from and write to the recording medium 403a via the external I/F 403.
  • The recording medium 403a may store one or more programs for realizing the three-dimensional simulator 100 and the learning data creation unit 200.
  • Examples of the recording medium 403a include a flexible disk, a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital Memory card), and a USB (Universal Serial Bus) memory card.
  • the communication I / F 404 is an interface for connecting the learning data creation device 10 to a communication network.
  • One or more programs that implement the three-dimensional simulator 100 and the learning data creation unit 200 may be obtained (downloaded) from a predetermined server device or the like via the communication I / F 404.
  • the RAM 405 is a volatile semiconductor memory that temporarily stores programs and data.
  • the ROM 406 is a nonvolatile semiconductor memory that can retain programs and data even when the power is turned off.
  • the ROM 406 stores, for example, settings related to an OS (Operating System), settings related to a communication network, and the like.
  • the processor 407 is, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or the like, and is an arithmetic device that reads a program or data from the ROM 406 or the auxiliary storage device 408 onto the RAM 405 and executes processing.
  • the three-dimensional simulator 100 and the learning data creation unit 200 are realized by, for example, processing that causes the processor 407 to execute one or more programs stored in the auxiliary storage device 408.
  • the learning data creation device 10 may include both the CPU and the GPU as the processor 407, or may include only one of the CPU and the GPU.
  • the auxiliary storage device 408 is, for example, a hard disk drive (HDD) or a solid state drive (SSD), and is a nonvolatile storage device that stores programs and data.
  • the auxiliary storage device 408 stores, for example, an OS, various application software, one or more programs for implementing the three-dimensional simulator 100 and the learning data creation unit 200, and the like.
  • the storage unit 300 is realized using, for example, the auxiliary storage device 408.
  • the storage unit 300 may be implemented using, for example, a storage device that is communicably connected to the learning data creation device 10 via a communication network, instead of the auxiliary storage device 408.
  • The learning data creation device 10 can implement the above-described various processes by having the hardware configuration shown in FIG. 6. In the example illustrated in FIG. 6, a case has been described where the learning data creation device 10 according to the embodiment of the present invention is implemented by one device (computer), but the invention is not limited to this.
  • the learning data creation device 10 according to the embodiment of the present invention may be realized by a plurality of devices (computers).
  • As described above, the learning data creation system 1 according to the embodiment of the present invention creates learning data by giving information obtained from a virtual captured image (that is, information that can be obtained or calculated by the three-dimensional simulator 100) to a real captured image as teacher information. For this reason, the learning data creation system 1 according to the embodiment of the present invention can create learning data easily, without, for example, manually adding teacher information to real captured images. In particular, the learning data can be created easily even when there is a large amount of teacher information (for example, when there are many types of objects or many categories).
  • For example, a machine learning model trained with such learning data can perform object segmentation with high accuracy.
  • Learning data that includes the teacher information can thus be created easily.
  • Also, the user simply photographs an arbitrary range with the camera device 20 while moving through the real space, and the real captured image and the teacher information corresponding to it are generated, so a large amount of learning data can be created easily. For this reason, a large amount of learning data can be obtained at low cost compared with, for example, adding teacher information to real captured images using crowdsourcing or the like.
  • By using the learning data creation system 1, it is possible, for example, to easily obtain a large amount of learning data used for training the recognition engine of a robot that cleans a certain room in the real space or tidies up objects in that room (that is, a machine learning model that performs the tasks of cleaning and tidying the room).
  • a virtual space is created, and objects are arranged in the real space so as to correspond to the virtual space.
  • a virtual space may be created so as to correspond to a real space in which an object is actually arranged.
  • the object in the virtual space may be moved so as to correspond to the object in the real space.
  • It is preferable to provide a function for facilitating the alignment between the first target and the second target.
  • a function may be provided that facilitates movement when moving an object in the virtual space, or facilitates alignment with an object in the real space.
  • Examples of such functions include: a function of moving an object in the virtual space with reference to the position of the user (for example, a function of drawing the object to the vicinity of the user); a function of fixing any of the information on an object in the virtual space, such as its direction, position, and angle, and then aligning it with the object in the real space by manipulating the remaining information; and a function of aligning an object in the real space with the corresponding object in the virtual space using a machine learning model or the like.
  • The second target may also be subjected to processing so that it corresponds well to the first target or so that alignment becomes easier. For example, when an object in the real space does not correspond to any object in the virtual space, or when it is difficult to match positions, an object may be newly added in the virtual space, or its information may be corrected or generated again.
  • data other than main data used as learning data may be recorded.
  • Specifically, such data includes parameters of the device that acquires or creates the main data (for example, in the case of a camera, the focal length, the position of the camera, the shutter speed, and the aperture value), information on the camera at the time the image was acquired (such as information from an acceleration sensor provided in the camera or information on the movement of the camera), information on the environment in which the data was acquired (such as lighting conditions), and category or tag information on the target or the main data recorded in the data (for example, in the case of an image, category information on the photographed object).
  • When the main data is a moving image, time information and the like may be included, and information used for managing the recorded data, for example an ID of a worker or an imaging device, may also be included.
  • data other than the main data may be used in the process of creating the learning data, or may be used together with the main data as a part of the learning data.
  • the real photographed image and the virtual photographed image are still images, but the present invention is not limited to this.
  • the real photographed image and the virtual photographed image may be moving images.
  • a set of the real photographed image and the teacher information acquired from the three-dimensional simulator 100 is used as the learning data, but is not limited thereto.
  • Conversely, a set of a virtual captured image and teacher information may be used as the learning data, with the real captured image serving as the teacher information. In this case, it is possible to create learning data for a task of predicting a real captured image from a virtual captured image created by the three-dimensional simulator 100.
  • Reference Signs List: 1 learning data creation system; 10 learning data creation device; 20 camera device; 30 tracking device; 100 three-dimensional simulator; 200 learning data creation unit; 300 storage unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

With the present invention, using first data recording a first real subject and second data recording a second subject which is generated by a simulator, is positioned in alignment with the first subject, and corresponds to the first subject, one or a plurality of computers executes appending information based on the second data and/or information generated by the simulator to the first data as teaching information.

Description

Learning data creation method, machine learning model generation method, learning data creation device, and program
The present invention relates to a learning data creation method, a machine learning model generation method, a learning data creation device, and a program.
In recent years, various tasks have come to be executed by machine learning techniques. One known example of such a task is semantic segmentation. Semantic segmentation is a task of classifying each pixel in an image captured by a camera device or the like into a class corresponding to the meaning indicated by the pixel (for example, the object name of the object represented by the pixel).
In many tasks such as semantic segmentation, a machine learning model is often trained by supervised learning.
JP 2017-182129 A; JP 2016-71597 A
However, learning data used for supervised learning is often created manually. For example, in semantic segmentation, teacher information (the class classification of each pixel) is given to an image by painting each pixel with the color of the class corresponding to the meaning indicated by the pixel, thereby creating learning data.
Also, depending on the task executed by the machine learning method, it is necessary to create learning data to which a plurality of pieces of teacher information are added. For example, in addition to the above class classification, learning data to which the posture of an object in an image (the direction, rotation, and the like of the object), the state of the object, and the like are added as teacher information may be required.
Furthermore, training a machine learning model generally requires a large amount of learning data. For this reason, creating the learning data used for supervised learning can require a great deal of labor and an enormous amount of time.
The embodiment of the present invention has been made in view of the above points, and aims to make it easy to create learning data.
To achieve the above object, in an embodiment of the present invention, first data recording a first target in the real world and second data, generated by a simulator, recording a second target that corresponds to the first target and is arranged in alignment with it are used, and one or more computers execute adding, to the first data, at least either information based on the second data or information generated by the simulator as teacher information.
Learning data can thus be created easily.
FIG. 1 is a diagram illustrating an example of the overall configuration of a learning data creation system according to an embodiment of the present invention. FIG. 2 is a diagram for schematically explaining an example of creation of learning data. FIG. 3 is a diagram for explaining an example of the flow of the advance preparation procedure. FIG. 4 is a diagram for explaining an example of the flow of the learning data creation procedure. FIG. 5 is a diagram illustrating an example of a teacher information list. FIG. 6 is a diagram illustrating an example of the hardware configuration of a learning data creation device according to an embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described. In the following, a learning data creation system 1 that can easily create learning data for a machine learning model that executes a predetermined task is described. Examples of the predetermined task include recognition and classification of an object in an image captured by a camera device or the like, grasping the state of the object, and some action related to the object (for example, a grasping action or an avoidance action for the object). Note that the object is an example of a target to which teacher information is added.
In the embodiment of the present invention, an image obtained by photographing a virtual space created by a three-dimensional simulator with a camera device (that is, a virtual camera device installed in the virtual space; hereinafter also referred to as a "virtual photographed image") and an image of the real space corresponding to the virtual space (that is, the actual space) photographed by an actual camera device (hereinafter also referred to as a "real photographed image") are used, and learning data is created by adding teacher information obtained from the virtual photographed image to the real photographed image. Examples of the teacher information include contour information of an object in the virtual photographed image, the class into which the object is classified, the object name of the object, state information of the object, the depth to the object, the posture of the object, and information for performing a predetermined action on the object. Note that the virtual photographed image is an example of data recording an object (second target) photographed in the virtual space and is an example of the second data, the real photographed image is an example of data recording an object (first target) photographed in the real space and is an example of the first data, and the three-dimensional simulator is an example of a simulator.
Here, the real space corresponding to the virtual space is, for example, a real space in which the same objects are arranged at the same positions as in the virtual space created by the three-dimensional simulator. In the present disclosure, positions being the same in the virtual space and the real space means, for example, that when the same coordinate system is set in the virtual space and the real space, the position coordinates are the same. However, coordinate systems that can be mutually converted may instead be set in the virtual space and the real space. Hereinafter, "position" and "posture" represent a position and a posture in the same coordinate system set in the virtual space and the real space.
In this specification, "same" or "identical" is not limited to being exactly the same or identical; it may include deviations that are permissible depending on, for example, errors in devices or calculations, or the purpose for which the data set is used.
An object being the same in the virtual space and the real space means that the object represented by the three-dimensional model arranged in the virtual space is the same as the actual object arranged in the real space. Note that an object arranged in the virtual space is also referred to as an "object" to distinguish it from an actual object arranged in the real space. The target or object being the same means that the information necessary for giving the teacher information is the same; for example, in semantic segmentation this includes objects whose contours are the same, and in teaching the gripping position of an object it includes objects whose tactile and material information, in addition to the contour, are the same.
<Overall configuration of the learning data creation system 1>
First, the overall configuration of the learning data creation system 1 according to the embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of the overall configuration of the learning data creation system 1 according to the embodiment of the present invention.
As shown in FIG. 1, the learning data creation system 1 according to the embodiment of the present invention includes a learning data creation device 10, one or more camera devices 20, and one or more tracking devices 30. The learning data creation device 10, the camera devices 20, and the tracking devices 30 are communicably connected via a communication network such as a wireless LAN (Local Area Network). This communication network may be, for example, a wired LAN or the like in whole or in part.
The learning data creation device 10 is a computer or computer system that creates learning data. The learning data creation device 10 includes a three-dimensional simulator 100, a learning data creation unit 200, and a storage unit 300.
The three-dimensional simulator 100 of the present embodiment is a simulator capable of simulating a three-dimensional virtual space. In the three-dimensional simulator 100, objects can be arranged in the virtual space, and physics calculations that simulate physical laws acting on the objects (for example, collision determination between objects) can be performed.
The three-dimensional simulator 100 can also draw virtual photographed images obtained by photographing the virtual space with a virtual camera device. At this time, the three-dimensional simulator 100 can give to the virtual photographed image, for example, the contour line information and object name of an object in the image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, and the result of performing a predetermined physics calculation on the object. These pieces of information are generated by calculation or the like in the three-dimensional simulator 100. Note that these pieces of information may be generated in advance, before the virtual photographed image is drawn, and stored separately in the storage unit 300 or the like.
Because the three-dimensional simulator 100 gives various information to the virtual photographed image, the present embodiment makes it possible to create learning data in which all or part of this information is given as teacher information to the real photographed image corresponding to the virtual photographed image.
By giving the information generated by the simulator to the first data, a data set can be created easily.
Such a three-dimensional simulator 100 is realized by a game engine such as Unity, Unreal Engine 4 (UE4), or Blender, for example. However, the three-dimensional simulator 100 is not limited to these game engines and may be realized by any three-dimensional simulation software.
The learning data creation unit 200 creates learning data by giving teacher information obtained from a virtual photographed image to a real photographed image. The teacher information is the information given to the virtual photographed image when the three-dimensional simulator 100 drew it (for example, the contour line information and object name of an object in the virtual photographed image, the class into which the object is classified, state information of the object, the depth to the object, the posture of the object, and the result of performing a predetermined physics calculation on the object).
In this way, the learning data creation unit 200 creates learning data by giving information obtained from the virtual photographed image drawn by the three-dimensional simulator 100 to the real photographed image as teacher information.
The storage unit 300 stores various information. The information stored in the storage unit 300 includes, for example, real photographed images captured by the camera device 20 in the real space, virtual photographed images drawn by the three-dimensional simulator 100, tracking information acquired from the tracking device 30, and three-dimensional models of objects arranged in the virtual space. Here, the tracking information is information obtained by tracking the position and posture of the camera device 20 in the real space; that is, it indicates both the position and the posture of the camera device 20 at each time. However, the tracking information may be, for example, information indicating only the position of the camera device 20 at each time. The tracking information obtained by the tracking device 30 is used to synchronize information in the real space with information in the virtual space.
The camera device 20 is an imaging device that photographs the real space and creates real photographed images. The camera device 20 is fixed to, for example, a portable camera stand on which the tracking device 30 is mounted. The real photographed images created by the camera device 20 are transmitted to the learning data creation device 10 and stored in the storage unit 300. Note that the camera device 20 may be, for example, a depth camera that can create real photographed images to which depth information is added.
The tracking device 30 is a device that tracks the position and posture of the camera device 20 and creates tracking information (for example, a sensing device equipped with a position sensor and a posture sensor). The tracking device 30 is mounted on, for example, a portable camera stand; in this way, one tracking device 30 is installed in association with one camera device 20. The tracking information created by the tracking device 30 is transmitted to the learning data creation device 10 and stored in the storage unit 300. Note that the tracking device 30 may be directly mounted on the camera device 20 or built into the camera device 20.
Because the tracking device 30 provides tracking information for the camera device 20, a virtual photographed image corresponding to a real photographed image captured by that camera device 20 can be obtained. That is, the tracking device 30 of the present embodiment is a device that acquires information at the time the first data is acquired, and the second data can easily be created based on that information.
The configuration of the learning data creation system 1 shown in FIG. 1 is an example, and other configurations may be used. For example, the learning data creation system 1 may include an arbitrary number of camera devices 20 and tracking devices 30 corresponding to those camera devices 20.
Also, if the position and posture of a camera device 20 in the real space are known, the tracking device 30 corresponding to that camera device 20 may be omitted. For example, when the camera device 20 is fixedly installed at a predetermined position in a predetermined posture, the corresponding tracking device 30 may be omitted.
<How to create learning data>
Here, an outline of how the learning data creation device 10 according to the embodiment of the present invention creates learning data will be described with reference to FIG. 2. FIG. 2 is a diagram schematically illustrating an example of learning data creation.
As shown in FIG. 2, a real captured image taken by the camera device 20 at a certain position and in a certain orientation in the real space is referred to as the "real captured image G110". A virtual captured image taken by a virtual camera device at the same position and in the same orientation in the virtual space corresponding to the real space is referred to as the "virtual captured image G210".
At this time, information generated by computation of the three-dimensional simulator 100 (in FIG. 2, "contour lines" and "object names" as an example) is attached to the virtual captured image G210. That is, in the example shown in FIG. 2, the contour line of each object in the virtual captured image G210 and the object name of each object are attached. Which of the pieces of information that can be generated by computation of the three-dimensional simulator is attached to the virtual captured image G210 depends on the task the machine learning model is to perform (that is, on what kind of learning data set one wants to create).
The learning data creation device 10 creates the learning data G120 by attaching the information assigned to the virtual captured image G210 (that is, the "contour lines" and "object names") to the real captured image G110 as teacher information. As a result, learning data G120 represented by a pair of the real captured image G110 and the teacher information (that is, the "contour lines" and "object names") is created.
In this way, the learning data creation device 10 according to the embodiment of the present invention creates the learning data G120 by using the real captured image G110, which actually photographs a certain range of the real space, and the virtual captured image G210, which virtually photographs the same range of the virtual space, and attaching the information obtained from the virtual captured image G210 (that is, information generated by computation of the three-dimensional simulator or the like) to the real captured image G110. The learning data creation device 10 according to the embodiment of the present invention can therefore create the learning data G120 easily.
Moreover, in the learning data creation device 10 according to the embodiment of the present invention, the position and orientation of the camera device 20 are identified from the tracking information acquired from the tracking device 30, so the position and orientation of the virtual camera device in the virtual space can be synchronized with those of the camera device 20. Therefore, the user can easily obtain a real captured image and the virtual captured image corresponding to it simply by, for example, photographing the real space with the camera device 20.
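As a minimal sketch of this synchronization, assuming a hypothetical simulator object with set_camera_pose and render_with_annotations methods (names invented here, not an API defined by this document), the flow could look like:

    def create_training_pair(real_image, tracking_sample, simulator):
        # Place the virtual camera at the tracked pose of the real camera
        # (simulator and its methods are hypothetical stand-ins).
        simulator.set_camera_pose(tracking_sample.position,
                                  tracking_sample.orientation)
        # Render the corresponding virtual captured image together with the
        # simulator-generated information (contour lines, object names, etc.).
        virtual_image, generated_info = simulator.render_with_annotations()
        # Attach the simulator-generated information to the real captured
        # image as teacher information.
        return {"image": real_image, "teacher_info": generated_info}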
<Flow of the preparation procedure>
In the embodiment of the present invention, as described above, the virtual space and the real space must correspond to each other. For this reason, as advance preparation for creating the learning data, the virtual space and the real space need to be matched. In the following, a procedure in which the three-dimensional simulator 100 places objects in a virtual space and actual objects are then placed in the real space so as to correspond to that virtual space (that is, the virtual space and the real space are matched) will be described with reference to FIG. 3. FIG. 3 is a diagram illustrating an example of the flow of the preparation procedure.
Step S101: The three-dimensional simulator 100 acquires, from the storage unit 300, three-dimensional models of the objects to be placed in the virtual space. This means, for example, importing the three-dimensional model data stored in the storage unit 300 into the three-dimensional simulator 100. In addition to the shape of an object, a three-dimensional model carries information such as an object ID, an object name, and the category to which the object belongs.
A three-dimensional model may be created in advance by any method and stored in the storage unit 300. For example, a three-dimensional model may be created by scanning the three-dimensional shape of an actual object with a three-dimensional scanner, or may be created manually with three-dimensional modeling software.
Step S102: The three-dimensional simulator 100 places the objects represented by the three-dimensional models in the virtual space. The user can place an object in the virtual space by, for example, selecting a desired three-dimensional model from the models imported in step S101 and dragging and dropping the selected model into the virtual space. Alternatively, the user may be able to place the object represented by a three-dimensional model in the virtual space by specifying position coordinates in the virtual space.
When placing an object represented by a three-dimensional model in the virtual space, the user may tilt or rotate the object arbitrarily before placing it. The user may also, for example, enlarge or reduce the object before placing it.
When a plurality of objects are to be placed in the virtual space, step S102 may simply be repeated.
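A sketch of steps S101 to S102 under the same assumption of a hypothetical simulator handle (import_model and place are invented names, as are the file path and object attributes) might be:

    def build_scene(sim):
        # Import a 3D model; the model carries an object ID, name and
        # category in addition to its shape (step S101).
        mug = sim.import_model("models/mug.obj", object_id="obj001",
                               name="mug", category="tableware")
        # Place the object at the desired coordinates; it may be tilted,
        # rotated or scaled before placement (step S102).
        sim.place(mug, position=(0.40, 0.10, 0.75),
                  rotation_deg=(0.0, 0.0, 30.0), scale=1.0)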
Through steps S101 to S102 above, a virtual space in which one or more objects are placed at desired positions is created by the three-dimensional simulator 100.
Step S103: The user places actual objects in the real space so as to correspond to the virtual space created in steps S101 to S102.
Here, the user may, for example, use a display device that can display the objects placed in the virtual space superimposed on the real space and that is equipped with a position sensor and an orientation sensor, and place an actual object at the same position as an object displayed on the device. Examples of such display devices include a head-mounted display equipped with a position sensor, an orientation sensor, and a camera; a head-mounted display through which the real space is transparently visible, equipped with a position sensor and an orientation sensor; a projection mapping device; a tablet terminal equipped with a position sensor, an orientation sensor, and a camera; and a smartphone equipped with a position sensor, an orientation sensor, and a camera.
These display devices can synchronize positions in the virtual space with positions in the real space and then display video in which the objects are superimposed on the real space. Therefore, the user can, for example, carry or wear such a display device, move around in the real space, and place the same physical object in the real space at the same position and in the same orientation as the object in the video.
In this way, the virtual space created in steps S101 to S102 can be made to correspond to the real space. Besides the above, a mixed reality in which the real space and the virtual space are fused may be created by a technology such as MR (Mixed Reality), and the same physical object may then be placed in the real space at the same position and in the same orientation as the object placed in the virtual space.
<Flow of the learning data creation procedure>
Next, the procedure for creating a real captured image and the corresponding virtual captured image, and then creating learning data using the real captured image and the virtual captured image, will be described with reference to FIG. 4. FIG. 4 is a diagram illustrating an example of the flow of the learning data creation procedure.
Step S201: The user photographs a desired range in the real space using the camera device 20. As a result, a real captured image is created by the camera device 20 and transmitted to the learning data creation device 10, where it is stored in the storage unit 300.
At this time, the tracking device 30 corresponding to the camera device 20 transmits tracking information to the learning data creation device 10, where it is stored in the storage unit 300. As described above, the tracking information indicates the position and orientation of the camera device 20.
In step S201 above, the tracking information created by the tracking device 30 tracking the position and orientation of the camera device 20 is stored in the storage unit 300, but this is not limiting. The position and orientation of the camera device 20 may be tracked by any method, and tracking information indicating the tracking result may be stored in the storage unit 300. For example, a two-dimensional code such as a QR code (registered trademark) may be attached to the camera device 20 in advance, and the position and orientation of the camera device 20 may be tracked by reading the two-dimensional code with another camera or the like.
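As one possible sketch of such marker-based tracking, using OpenCV's ArUco fiducial markers in place of an actual QR code (an assumption; the classic cv2.aruco interface of opencv-contrib-python with pre-4.7 signatures and a calibrated observer camera are also assumed):

    import cv2

    ARUCO_DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

    def estimate_marker_pose(frame, camera_matrix, dist_coeffs,
                             marker_length_m=0.05):
        # Detect the marker attached to camera device 20 in a frame
        # taken by the observer camera.
        corners, ids, _ = cv2.aruco.detectMarkers(frame, ARUCO_DICT)
        if ids is None:
            return None
        # Estimate the marker pose (rotation and translation vectors)
        # relative to the observer camera.
        rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
            corners, marker_length_m, camera_matrix, dist_coeffs)
        return rvecs[0], tvecs[0]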
Step S202: The three-dimensional simulator 100 photographs the virtual space with a virtual camera device at the same position and orientation as the camera device 20 used for photographing in step S201. That is, the three-dimensional simulator 100 draws (renders) the imaging range, within the virtual space, of a virtual camera device having the same position and orientation as the camera device 20 of step S201.
Here, the three-dimensional simulator 100 can identify the position and orientation of the camera device 20 from the tracking information created in step S201. The three-dimensional simulator 100 can therefore install a virtual camera device in the virtual space at the same position and orientation as the camera device 20 in the real space. A virtual captured image corresponding to the real captured image created in step S201 is thereby created.
At this time, the three-dimensional simulator 100 attaches, to the virtual captured image, predetermined information acquired in the virtual space or generated by computation. The three-dimensional simulator 100 then stores the virtual captured image with the predetermined information attached in the storage unit 300.
Here, examples of the predetermined information include, as described above, contour line information and object names of the objects in the virtual captured image, the class into which each object is classified, state information of each object, the depth to each object, the orientation of each object, and the results of performing predetermined physics computations on each object. Examples of results of performing predetermined physics computations on an object include information on motions by which a robot arm capable of preset motions can grasp the object at that position, or information on motions by which a mobile robot capable of preset motions can avoid the object at that position. These robot arms and mobile robots are examples of acting agents capable of preset motions.
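For instance, if the renderer can also output an instance-ID buffer (an assumed intermediate; how such a buffer is produced depends on the simulator), contour line information per object could be derived along these lines:

    import cv2
    import numpy as np

    def contours_from_instance_ids(id_buffer):
        # id_buffer: H x W integer array in which each pixel holds the ID
        # of the object rendered there (0 = background).
        contours_by_id = {}
        for obj_id in np.unique(id_buffer):
            if obj_id == 0:
                continue
            mask = (id_buffer == obj_id).astype(np.uint8)
            # Outer contour(s) of this object's rendered silhouette.
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            contours_by_id[int(obj_id)] = contours
        return contours_by_id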
Step S202 may, for example, be executed automatically after step S201, or may be executed in response to a user operation (for example, an operation to start rendering in the virtual space).
Step S203: The learning data creation unit 200 attaches, to the real captured image created in step S201, the predetermined information attached to the virtual captured image created in step S202 as teacher information. As a result, learning data represented by a pair of the real captured image and the teacher information is created.
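One way such a pair could be persisted, sketched here with an invented file layout and invented field names, is a JSON record that points at the real captured image and embeds the teacher information:

    import json
    from pathlib import Path

    def save_training_sample(image_id, real_image_path, teacher_info,
                             out_dir="dataset"):
        # Pair the real captured image with the teacher information taken
        # from the corresponding virtual captured image (step S203).
        sample = {"image_id": image_id,
                  "image_path": str(real_image_path),
                  "teacher_info": teacher_info}
        out_path = Path(out_dir) / f"{image_id}.json"
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(json.dumps(sample, ensure_ascii=False, indent=2))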
Here, the teacher information included in the learning data is represented, for example, in a list format. As an example, FIG. 5 shows a plurality of pieces of teacher information represented in a list format (also referred to as a "teacher information list"). FIG. 5 is an example of a teacher information list attached to a certain real captured image (image ID: image101).
Each piece of teacher information included in the teacher information list shown in FIG. 5 is information in which an object ID, position information, contour line information, contact information, and grasping motion information are associated with one another.
The object ID is an ID that identifies an object. The object ID is, for example, information attached to the three-dimensional model of an object placed in the virtual space.
The position information is the position coordinates at which the object is placed. The position information is, for example, information attached to an object when the object represented by the three-dimensional model is placed in step S102 above.
The contour line information is information indicating the contour line of an object. The contour line information can be acquired, for example, from the rendering result when the virtual captured image is drawn (rendered) in step S202.
The contact information is information indicating, when the object with the given object ID is in contact with another object, the object ID of the other object, the contact position with the other object, and so on. The contact information can be acquired, for example, from the results of the physics computations of the three-dimensional simulator 100.
The grasping motion information is, for example, information on motions by which a robot arm capable of preset motions can grasp the object with the given object ID at the shooting position of the virtual captured image. The grasping motion information can be acquired, for example, from the results of the physics computations of the three-dimensional simulator 100.
In this way, the teacher information list shown in FIG. 5 is a list of teacher information in which position information, an object name, contour line information, contact information, and grasping motion information are associated with each object. Beyond these, any information that the three-dimensional simulator 100 can acquire or compute may be associated with the teacher information. For example, the virtual captured image itself, or a partial region of the virtual captured image, may be associated as teacher information. Specifically, for example, the image region of the virtual captured image that represents the object with a given object ID may be associated with that object ID. The grasping motion information may include information on the part of a target that is considered graspable, or easy to grasp, when the target is grasped.
Each piece of teacher information may also be information in which only some of the above items (position information, object name, contour line information, contact information, grasping motion information, and the like) are associated.
Representing the teacher information included in the learning data in a list format is only an example; the teacher information included in the learning data may be represented in any other format.
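As a concrete illustration of one list entry in the spirit of FIG. 5 (all field names and values below are invented for illustration):

    # Hypothetical example of a single teacher information list entry.
    teacher_info_list = [
        {
            "object_id": "obj001",
            "position": (0.40, 0.10, 0.75),                   # placement coordinates
            "contour": [(312, 208), (315, 210), (318, 215)],  # outline pixels
            "contact": {"other_object_id": "obj002",          # touching object
                        "contact_point": (0.42, 0.10, 0.75)},
            "grasp": {"feasible": True,                       # robot-arm physics result
                      "grasp_part": "handle"},                # easy-to-grasp part
        },
    ]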
<Hardware configuration of the learning data creation device 10>
Next, the hardware configuration of the learning data creation device 10 according to the embodiment of the present invention will be described with reference to FIG. 6. FIG. 6 is a diagram showing an example of the hardware configuration of the learning data creation device 10 according to the embodiment of the present invention.
As shown in FIG. 6, the learning data creation device 10 according to the embodiment of the present invention includes an input device 401, a display device 402, an external I/F 403, a communication I/F 404, a RAM (Random Access Memory) 405, a ROM (Read Only Memory) 406, a processor 407, and an auxiliary storage device 408. These pieces of hardware are interconnected by a bus 409.
The input device 401 is, for example, a keyboard, a mouse, or a touch panel, and is used by the user to input various operations. The display device 402 is, for example, a display, and shows the various processing results of the learning data creation device 10. The learning data creation device 10 need not have at least one of the input device 401 and the display device 402.
The external I/F 403 is an interface to external devices. The external devices include a recording medium 403a and the like. The learning data creation device 10 can read from and write to the recording medium 403a and the like via the external I/F 403. The recording medium 403a may store one or more programs that implement the three-dimensional simulator 100 and the learning data creation unit 200.
Examples of the recording medium 403a include a flexible disk, a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory card.
The communication I/F 404 is an interface for connecting the learning data creation device 10 to a communication network. One or more programs that implement the three-dimensional simulator 100 and the learning data creation unit 200 may be acquired (downloaded) from a predetermined server device or the like via the communication I/F 404.
The RAM 405 is a volatile semiconductor memory that temporarily holds programs and data. The ROM 406 is a nonvolatile semiconductor memory that can hold programs and data even when the power is turned off. The ROM 406 stores, for example, settings related to the OS (Operating System) and settings related to the communication network.
The processor 407 is, for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), and is an arithmetic device that reads programs and data from the ROM 406, the auxiliary storage device 408, or the like onto the RAM 405 and executes processing. The three-dimensional simulator 100 and the learning data creation unit 200 are implemented by, for example, processing that one or more programs stored in the auxiliary storage device 408 cause the processor 407 to execute. The learning data creation device 10 may have both a CPU and a GPU as the processor 407, or may have only one of them.
The auxiliary storage device 408 is, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and is a nonvolatile storage device that stores programs and data. The auxiliary storage device 408 stores, for example, the OS, various application software, and one or more programs that implement the three-dimensional simulator 100 and the learning data creation unit 200. The storage unit 300 is implemented using, for example, the auxiliary storage device 408. However, the storage unit 300 may be implemented not with the auxiliary storage device 408 but with, for example, a storage device communicably connected to the learning data creation device 10 via a communication network.
With the hardware configuration shown in FIG. 6, the learning data creation device 10 according to the embodiment of the present invention can realize the various processes described above. In the example shown in FIG. 6, the case where the learning data creation device 10 according to the embodiment of the present invention is implemented by a single device (computer) has been described, but this is not limiting. The learning data creation device 10 according to the embodiment of the present invention may be implemented by a plurality of devices (computers).
<Summary>
As described above, the learning data creation system 1 according to the embodiment of the present invention creates learning data by attaching information obtained from a virtual captured image (that is, information that the three-dimensional simulator 100 can acquire or compute) to a real captured image as teacher information. The learning data creation system 1 according to the embodiment of the present invention can therefore create learning data easily, without work such as manually attaching teacher information to real captured images. In particular, learning data can be created easily even when the amount of teacher information is large (for example, when there are many kinds of objects or many categories).
Further, when performing semantic segmentation, for example, the three-dimensional simulator 100 in the learning data creation system 1 according to the embodiment of the present invention obtains the boundary lines of the objects, so object segmentation can be performed with high accuracy.
Furthermore, in the learning data creation system 1 according to the embodiment of the present invention, even teacher information that is difficult to attach manually, such as depth or the orientation of an object, can easily be included in the created learning data.
Moreover, in the learning data creation system 1 according to the embodiment of the present invention, a real captured image and the corresponding virtual captured image are created simply by, for example, the user photographing arbitrary ranges with the camera device 20 while moving through the real space, so a large amount of learning data can be created easily. For this reason, a large amount of learning data can be obtained at low cost compared with, for example, attaching teacher information to real captured images using crowdsourcing or the like.
Therefore, by using the learning data creation system 1 according to the embodiment of the present invention, it is possible to easily obtain a large amount of learning data used to train, for example, the recognition engine of a robot that cleans a certain room in the real space or tidies up the objects in that room (that is, a machine learning model that performs the tasks of cleaning and tidying the room).
In the embodiment of the present invention, as the preparation procedure, a virtual space is created and objects are then placed in the real space so as to correspond to the virtual space, but this is not limiting. For example, a virtual space may be created so as to correspond to a real space in which objects are actually placed. In this case, the objects in the virtual space may be moved so as to correspond to the objects in the real space.
In this embodiment, it is preferable to provide functions that facilitate the alignment of the first object and the second object. For example, a function may be provided that makes it easy to move an object in the virtual space or to align it with an object in the real space. Specific examples include: a function for moving an object in the virtual space relative to the user's position (for example, pulling the object toward the user's vicinity); a function for matching normals with the real space (for example, moving a predetermined face of an object in the virtual space along another object, such as a wall or floor in the virtual space, while keeping it in contact with that object); a function for aligning with an object in the real space by fixing some of the information such as the direction, position, and angle of the object in the virtual space and manipulating the rest; and a function for aligning an object in the real space with the corresponding object in the virtual space using a machine learning model or the like.
The second object may be subjected to processing so that it corresponds well to the first object or is easy to align with it. For example, when an object in the real space does not correspond to an object in the virtual space, or when alignment is difficult, an object in the virtual space may be newly added, its information may be corrected, or it may be generated again.
When acquiring or creating the first data or the second data, data other than the main data used as learning data (for example, the images in the embodiment described above) may also be recorded. Specifically, such data includes: parameters of the device that acquires or creates the data (for example, in the case of a camera, information about the camera at the time the image was acquired, such as the focal length, camera position, shutter speed and aperture value, and camera motion information from an acceleration sensor provided in the camera); information about the environment in which the data was acquired, such as lighting conditions; and category or tag information about the target recorded in the data or about the main data (for example, in the case of an image, category information about the photographed object). When the main data is a moving image, it preferably includes time information and the like. Information used for managing the recorded data, such as the IDs of workers or imaging devices, may also be included. Such data other than the main data may be used during the process of creating the learning data, or may be used together with the main data as part of the learning data.
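Purely as an illustration, such auxiliary metadata could be kept alongside each real captured image; every field name and value below is invented:

    # Hypothetical auxiliary metadata recorded with one real captured image.
    capture_metadata = {
        "focal_length_mm": 26.0,           # camera parameter at acquisition
        "shutter_speed_s": 1.0 / 60.0,
        "aperture_f": 2.8,
        "camera_position": (1.2, 0.4, 1.5),
        "lighting": "indoor_fluorescent",  # environment information
        "tags": ["tableware", "kitchen"],  # category/tag information
        "operator_id": "worker01",         # data-management information
        "device_id": "cam01",
    }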
In the embodiment of the present invention, the description has assumed that the real captured images and the virtual captured images are still images, but this is not limiting. The real captured images and the virtual captured images may be moving images.
In the embodiment of the present invention, a pair of a real captured image and teacher information acquired from the three-dimensional simulator 100 is used as the learning data, but this is not limiting. For example, with the real captured image as teacher information, a pair of a virtual captured image and the teacher information (the real captured image) may be used as the learning data. In this case, learning data can be created for a task of predicting a real captured image from a virtual captured image created by the three-dimensional simulator 100.
The present invention is not limited to the embodiments specifically disclosed above, and various modifications and changes are possible without departing from the scope of the claims.
This application is based on Japanese Patent Application No. 2018-182538 filed in Japan on September 27, 2018, the entire contents of which are incorporated herein by reference.
Reference Signs List
1 Learning data creation system
10 Learning data creation device
20 Camera device
30 Tracking device
100 Three-dimensional simulator
200 Learning data creation unit
300 Storage unit

Claims (7)

1.  A learning data creation method in which one or more computers execute: using first data recording a first object in the real world and second data, generated by a simulator, recording a second object that corresponds to the first object and is placed in alignment with the first object, assigning to the first data, as teacher information, at least either information based on the second data or information generated by the simulator.
2.  The learning data creation method according to claim 1, wherein the first object is a physical object, the first data and the second data are images, and the second object is a three-dimensional model.
3.  The learning data creation method according to claim 1 or 2, wherein placing the first object and the second object in alignment means matching the position of the first object in the real world to the position of the second object in a virtual space, or moving the second object in the virtual space to match the position of the first object in the real world.
4.  The learning data creation method according to claim 3, wherein MR technology is used to place the first object and the second object in alignment.
5.  A machine learning model generation method in which one or more computers execute: performing learning using learning data created by the learning data creation method according to any one of claims 1 to 4.
6.  A learning data creation device comprising an assigning unit that, using first data recording a first object in the real world and second data, generated by a simulator, recording a second object that corresponds to the first object and is placed in alignment with the first object, assigns to the first data, as teacher information, at least either information based on the second data or information generated by the simulator.
7.  A program that causes one or more computers to execute: using first data recording a first object in the real world and second data, generated by a simulator, recording a second object that corresponds to the first object and is placed in alignment with the first object, assigning to the first data, as teacher information, at least either information based on the second data or information generated by the simulator.
PCT/JP2019/037684 2018-09-27 2019-09-25 Learning data creation method, machine learning model generation method, learning data creation device, and program WO2020067204A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018182538A JP2022024189A (en) 2018-09-27 2018-09-27 Learning data creation method, learning data creation device, and program
JP2018-182538 2018-09-27

Publications (1)

Publication Number Publication Date
WO2020067204A1 true WO2020067204A1 (en) 2020-04-02

Family

ID=69953491

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/037684 WO2020067204A1 (en) 2018-09-27 2019-09-25 Learning data creation method, machine learning model generation method, learning data creation device, and program

Country Status (2)

Country Link
JP (1) JP2022024189A (en)
WO (1) WO2020067204A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7281576B1 (en) 2022-03-31 2023-05-25 Kddi株式会社 Video projection system and video projection method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004362440A (en) * 2003-06-06 2004-12-24 National Printing Bureau Character string extracting processor from printed matter
WO2018020954A1 (en) * 2016-07-29 2018-02-01 株式会社日立製作所 Database construction system for machine-learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FUJIHASHI, KAZUKI ET AL.: "Estimation of Number and Locations of Products in Pictures by Semi-Supervised Learning", Proceedings DVD of the 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 5 June 2018 (2018-06-05), pages 1-3 *

Also Published As

Publication number Publication date
JP2022024189A (en) 2022-02-09

Similar Documents

Publication Publication Date Title
Nebeling et al. The trouble with augmented reality/virtual reality authoring tools
JP6105092B2 (en) Method and apparatus for providing augmented reality using optical character recognition
JP2020535509A (en) Methods, devices and systems for automatically annotating target objects in images
US20080231631A1 (en) Image processing apparatus and method of controlling operation of same
WO2019041900A1 (en) Method and device for recognizing assembly operation/simulating assembly in augmented reality environment
US11436755B2 (en) Real-time pose estimation for unseen objects
CN112154486B (en) System and method for multi-user augmented reality shopping
CN110573992B (en) Editing augmented reality experiences using augmented reality and virtual reality
JPWO2015107665A1 (en) Work support data creation program
JP2023103265A (en) Control device, control method and program
Nishino et al. 3d object modeling using spatial and pictographic gestures
KR20200136723A (en) Method and apparatus for generating learning data for object recognition using virtual city model
WO2020067204A1 (en) Learning data creation method, machine learning model generation method, learning data creation device, and program
US20200226833A1 (en) A method and system for providing a user interface for a 3d environment
TW201724054A (en) System, method, and computer program product for simulated reality learning
GB2555521A (en) Improved object painting through use of perspectives or transfers in a digital medium environment
JP7401245B2 (en) Image synthesis device, control method and program for image synthesis device
CN115082648A (en) AR scene arrangement method and system based on marker model binding
Okamoto et al. Assembly assisted by augmented reality (A 3 R)
BARON et al. APPLICATION OF AUGMENTED REALITY TOOLS TO THE DESIGN PREPARATION OF PRODUCTION.
JP7045863B2 (en) Information management system, information management method, and program
JP6967150B2 (en) Learning device, image generator, learning method, image generation method and program
JP6859763B2 (en) Program, information processing device
JP6204781B2 (en) Information processing method, information processing apparatus, and computer program
JP2021192230A5 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19865869

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19865869

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP