WO2023100282A1 - Data generation system, model generation system, estimation system, trained model production method, robot control system, data generation method, and data generation program


Info

Publication number
WO2023100282A1
Authority
WO
WIPO (PCT)
Prior art keywords
region, data generation, unit, interest, robot
Prior art date
Application number
PCT/JP2021/044058
Other languages
French (fr)
Japanese (ja)
Inventor
光司 曽我部
次郎 村岡
Original Assignee
株式会社安川電機
Priority date
Filing date
Publication date
Application filed by 株式会社安川電機
Priority to PCT/JP2021/044058
Publication of WO2023100282A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis

Description

  • One aspect of the present disclosure relates to a data generation system, a model generation system, an estimation system, a trained model manufacturing method, a robot control system, a data generation method, and a data generation program.
  • Patent Document 1 describes a fraud presumption system that includes item information acquisition means for acquiring item information about an item, mark specifying means for specifying a mark of the item based on the item information, classification specifying means for specifying an item classification based on the item information, and inference means for presuming fraud for the item based on the specified mark and classification.
  • a data generation system according to one aspect of the present disclosure includes a detection unit that detects the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image, a specifying unit that specifies, from the input image, a region focused on by the machine learning model in the detection as a region of interest, and an annotation unit that associates an annotation corresponding to the detected object with the region of interest.
  • a data generation method according to one aspect of the present disclosure is executed by a data generation system including at least one processor.
  • This data generation method includes the steps of detecting the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image, specifying, from the input image, a region focused on by the machine learning model in the detection as a region of interest, and associating an annotation corresponding to the detected object with the region of interest.
  • a data generation program according to one aspect of the present disclosure causes a computer to execute the steps of detecting the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image, specifying, from the input image, a region focused on by the machine learning model in the detection as a region of interest, and associating an annotation corresponding to the detected object with the region of interest.
  • FIG. 3 is a flowchart showing an example of processing in the object detection system. FIG. 4 is a flowchart showing an example of annotation processing. FIG. 5 is a diagram showing an example of identification of a region of interest. FIG. 6 is a diagram showing an example of the functional configuration of a robot system. FIG. 7 is a flowchart showing an example of processing in the robot system. FIG. 8 is a diagram showing an example of robot control.
  • the object detection system 1 is a computer system that generates a trained model for detecting the position of an object from an image and uses this trained model to perform the detection.
  • a trained model is a computational model that detects a position where an object appears in an image as the position of the object.
  • a trained model is generated in advance by machine learning. Machine learning refers to a method of autonomously discovering laws or rules by repeatedly learning based on given information.
  • a trained model is built using algorithms and data structures.
  • the trained model is built by a neural network such as a convolutional neural network (CNN). Generating a trained model corresponds to the learning phase, and using the trained model corresponds to the operation phase. Therefore, the object detection system 1 performs both a learning phase and an operational phase.
  • the object is any tangible object and is set according to the purpose of use of the object detection system 1. For example, if the object detection system 1 is used for automatic harvesting of crops by a robot, the object is the crop. As another example, if the object detection system 1 is used for automatic boxing of items by a robot, the object is the item.
  • FIG. 1 is a diagram showing an example of the functional configuration of an object detection system 1.
  • object detection system 1 comprises data generation system 10 , model generation system 20 and estimation system 30 .
  • Data generation system 10, model generation system 20, and estimation system 30 are examples of a data generation system, a model generation system, and an estimation system, respectively, according to the present disclosure.
  • the data generation system 10 and model generation system 20 correspond to the learning phase
  • the estimation system 30 corresponds to the operation phase.
  • in the learning phase, the model generation system 20 generates the trained model 42 used by the estimation system 30. Generating the trained model 42 requires machine learning using teacher data that includes a plurality of annotated images. Machine learning generally requires a large amount of teacher data, so a large number of images must be annotated.
  • An annotation is information (metadata) related to an image and indicates, for example, a class value indicating the type of object and the position of the object in the image. Class values can be expressed in any form, such as numbers or text. The position of the object may be indicated by a rectangular bounding box set corresponding to that position. Conventionally, annotation is performed manually, which is very costly and time-consuming, and variations in annotations between workers can also occur. In the object detection system 1, the data generation system 10 executes the annotation to automatically generate at least part of the teacher data, so objects appearing in images can be annotated efficiently.
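For illustration only (not part of the patent text), an annotation of the kind described above can be represented as a small record that pairs a class value with a rectangular bounding box; the field names and box convention below are assumptions chosen for readability.

```python
# Hypothetical annotation record: a class value plus a rectangular bounding box.
# Field names and the (x_min, y_min, x_max, y_max) convention are assumptions.
annotation = {
    "class_value": 1,             # type of the detected object (e.g., 1 = "product Pa")
    "class_name": "product Pa",
    "bbox": [120, 64, 210, 175],  # circumscribing rectangle in pixel coordinates
}

# A second training image is then the input image together with its annotations.
second_training_image = {
    "image_path": "input_0001.png",
    "annotations": [annotation],
}
```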
  • the data generation system 10 uses a machine learning model 41 to annotate images.
  • the machine learning model 41 is a computational model for detecting the presence or absence of an object based on an image. In the present disclosure, detecting the presence or absence of an object may include processing for identifying the type of object, that is, the class value.
  • Machine learning model 41 processes the image to detect whether or not the object is visible in the image.
  • the machine learning model 41 is generated in advance by machine learning, so it can be said that this is also a trained model.
  • machine learning model 41 is constructed by a neural network such as a convolutional neural network (CNN).
  • in the present disclosure, the computational model used for annotating images in the data generation system 10, which corresponds to part of the learning phase, is referred to as the "machine learning model", and the computational model used by the estimation system 30, which corresponds to the operation phase, to detect the position of an object in an image is referred to as the "trained model".
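As a rough, hedged sketch of what a CNN-based presence/absence model such as the machine learning model 41 could look like, the following PyTorch module outputs per-class presence scores. The architecture is an assumption for illustration; the patent does not specify one.

```python
import torch
import torch.nn as nn

class PresenceClassifier(nn.Module):
    """Illustrative stand-in for the machine learning model 41: a small CNN that
    outputs class scores indicating whether the object appears in the image."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)    # Global Average Pooling (GAP)
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fmap = self.features(x)               # feature maps later reused by CAM/Grad-CAM
        pooled = self.gap(fmap).flatten(1)
        return self.classifier(pooled)        # presence/absence class scores
```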
  • the object detection system 1 can access a first image database 51 and a second image database 52 . These databases may be provided outside the object detection system 1 or may be part of the object detection system 1 .
  • the first image database 51 is a device that stores a plurality of first training images with labels indicating the presence or absence of objects as first teacher data used to generate the machine learning model 41 .
  • a label is information (metadata) associated with an image, and indicates, for example, a class value indicating whether or not an object exists in the image.
  • the second image database 52 is a device that stores a plurality of second training images annotated with annotations corresponding to objects as second teacher data used to generate the trained model 42 .
  • Labels and annotations have in common that both are used as ground truth in machine learning. In the present disclosure, the metadata attached to the first training images is referred to as "labels", and the metadata attached to the second training images is referred to as "annotations".
  • the data generation system 10 includes a display control unit 11, a labeling unit 12, a preparation unit 13, a detection unit 14, an identification unit 15, and an annotation unit 16 as functional modules.
  • the display control unit 11 is a functional module that displays a user interface related to labels or annotations.
  • the labeling unit 12 is a functional module that generates a first training image by adding a label input via a user interface to a given image.
  • the labeling unit 12 stores the first training images in the first image database 51 .
  • the preparation unit 13 is a functional module that executes machine learning based on the labeled first training image to generate a machine learning model 41 .
  • the detection unit 14 is a functional module that uses the machine learning model 41 to detect the presence or absence of an object in the input image.
  • the detection unit 14 may detect whether or not a specific type of object exists in the input image. Alternatively, the detection unit 14 may detect whether or not each of a plurality of types of target objects exists in the input image.
  • the detection unit 14 may specify each type of one or more types of objects, that is, each class value.
  • the specifying unit 15 is a functional module that specifies, from the input image, a region focused on by the machine learning model 41 in its detection as a region of interest.
  • the annotation unit 16 is a functional module that associates the annotation corresponding to the detected object with the region of interest.
  • the annotation unit 16 stores the input image whose annotation is associated with the region of interest, that is, the input image to which the annotation is added, in the second image database 52 as a second training image.
  • the input image is an image processed by the machine learning model 41, and is treated as a second training image by annotating it.
  • the model generation system 20 includes a learning unit 21.
  • the learning unit 21 is a functional module that acquires second teacher data including second training images generated by the data generation system 10 and generates a trained model 42 based on this second teacher data. Therefore, the learning unit 21 also functions as an acquisition unit.
  • model generation system 20 may be constructed as a computer system including data generation system 10 and learning unit 21 .
  • the estimation system 30 includes an estimation unit 31 .
  • the estimation unit 31 is a functional module that inputs a target image to the trained model 42 and detects at least the position of the target object from the target image.
  • the estimation unit 31 may detect the position of each of two or more types of target objects from one target image.
  • a target image is an image to be processed by the trained model 42 .
  • No metadata is associated with the target image, and the trained model 42 at least detects the position of the target based on the pixel information of the target image.
  • the estimator 31 may further estimate a class value for each detected object.
  • the estimation system 30 may be constructed as a computer system that includes the model generation system 20 that may include the data generation system 10 and the estimation unit 31 .
  • the object detection system 1 can be realized by any kind of computer.
  • the computer may be a general-purpose computer such as a personal computer or a server for business use, or may be incorporated in a dedicated device that executes specific processing.
  • the object detection system 1 may be implemented by one computer, or may be implemented by a distributed system having a plurality of computers.
  • Each of the data generation system 10, the model generation system 20, and the estimation system 30 may be implemented by one computer, or may be implemented by a distributed system of multiple computers.
  • one computer may function as at least two of data generation system 10 , model generation system 20 and estimation system 30 .
  • FIG. 2 is a diagram showing an example of the hardware configuration of the computer 100 used in the object detection system 1.
  • computer 100 comprises main body 110 , monitor 120 and input device 130 .
  • the main body 110 is a device that executes the main functions of the computer.
  • the main body 110 has a circuit 160 which has at least one processor 161 , a memory 162 , a storage 163 , an input/output port 164 and a communication port 165 .
  • Storage 163 records programs for configuring each functional module of main body 110 .
  • the storage 163 is a computer-readable recording medium such as a hard disk, nonvolatile semiconductor memory, magnetic disk, or optical disk.
  • the memory 162 temporarily stores programs loaded from the storage 163, calculation results of the processor 161, and the like.
  • the processor 161 configures each functional module by executing a program in cooperation with the memory 162 .
  • the input/output port 164 inputs and outputs electrical signals to/from the monitor 120 or the input device 130 according to instructions from the processor 161 .
  • the input/output port 164 may input/output electrical signals to/from other devices.
  • Communication port 165 performs data communication with other devices via communication network N according to instructions from processor 161 .
  • the monitor 120 is a device for displaying information output from the main body 110 .
  • the monitor 120 may be of any type as long as it can display graphics, and a specific example thereof is a liquid crystal panel.
  • the input device 130 is a device for inputting information to the main body 110.
  • the input device 130 may be of any type as long as desired information can be input, and specific examples thereof include operation interfaces such as a keypad, mouse, and operation controller.
  • the monitor 120 and the input device 130 may be integrated as a touch panel.
  • the main body 110, the monitor 120, and the input device 130 may be integrated like a tablet computer.
  • Each functional module of the object detection system 1 is realized by loading an object detection program onto the processor 161 or memory 162 and causing the processor 161 to execute the object detection program.
  • the processor 161 operates the input/output port 164 or communication port 165 according to the object detection program to read and write data in the memory 162 or storage 163 .
  • the object detection program includes a data generation program, a model generation program, and an estimation program.
  • the data generation program includes code for realizing each functional module of data generation system 10 .
  • the model generation program includes code for realizing each functional module of the model generation system 20 .
  • the estimation program includes code for implementing each functional module of estimation system 30 .
  • the object detection program may be provided after being fixedly recorded on a non-transitory recording medium such as a CD-ROM, DVD-ROM, or semiconductor memory. Alternatively, the object detection program may be provided over a communication network as a data signal superimposed on a carrier wave.
  • the data generation program, model generation program, and estimation program may be provided separately. Alternatively, at least two of these three types of programs may be provided as one package.
  • FIG. 3 is a flowchart showing an example of processing in the object detection system 1 as a processing flow S1. That is, the object detection system 1 executes the processing flow S1.
  • in step S11, the data generation system 10 associates a given image with a label to generate a first training image.
  • this process is performed by the display control section 11 and the labeling section 12 .
  • the display control unit 11 displays on the monitor 120 a labeling user interface for labeling a given image.
  • the user inputs a label indicating whether or not the object is present in the image via the label user interface.
  • the labeling unit 12 assigns the label to the image to generate the first training image, and stores the first training image in the first image database 51 as at least part of the first teacher data.
  • in step S12, the data generation system 10 executes machine learning based on the first teacher data including at least one first training image to generate the machine learning model 41.
  • this processing is performed by the preparation unit 13 .
  • the preparation unit 13 accesses the first image database 51 and executes the following processing for each first training image. That is, the preparation unit 13 inputs the first training image to the first reference model, which is the calculation model on which the machine learning model 41 is based, and obtains the estimation result of the class value output from the first reference model.
  • the preparation unit 13 executes back propagation (error backpropagation method) based on the error between the estimation result and the label (correct answer) to update the parameter group in the first reference model.
  • the preparation unit 13 obtains a machine learning model 41 by repeating this learning until a given termination condition is satisfied.
  • the machine learning model 41 is a computational model estimated to be optimal for detecting the presence or absence of objects based on images. Note that the machine learning model 41 is not necessarily a "computational model that is optimal in reality".
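A minimal training-loop sketch for step S12 (generation of the machine learning model 41), assuming a PyTorch DataLoader that yields (image, label) batches built from the first training images. The optimizer, loss, and fixed epoch count stand in for the unspecified "given termination condition".

```python
import torch
import torch.nn as nn

def train_presence_model(model, loader, num_epochs: int = 10, lr: float = 1e-3):
    """Illustrative learning loop for the presence/absence model (machine learning model 41)."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(num_epochs):                # placeholder for the given termination condition
        for images, labels in loader:
            scores = model(images)             # estimation result (class scores)
            loss = criterion(scores, labels)   # error between estimate and label (correct answer)
            optimizer.zero_grad()
            loss.backward()                    # back propagation
            optimizer.step()                   # update the parameter group
    return model
```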
  • in step S13, the data generation system 10 generates a second training image by annotation using the machine learning model 41.
  • this processing is performed by the detection unit 14, the identification unit 15, and the annotation unit 16. The details of this processing will be described later.
  • in step S14, the model generation system 20 executes machine learning based on the second teacher data including at least one second training image to generate the trained model 42.
  • this processing is performed by the learning unit 21 .
  • the learning unit 21 accesses the second image database 52 and executes the following processing for each second training image. That is, the learning unit 21 inputs the second training image to the second reference model, which is the calculation model on which the trained model 42 is based, and obtains the estimation result of the object position output from the second reference model.
  • the estimation result may further indicate a class value indicating the type of object.
  • the learning unit 21 executes back propagation based on the error between the estimation result and the annotation (correct answer) to update the parameter group in the second reference model.
  • the learning unit 21 obtains a trained model 42 by repeating this learning until a given termination condition is satisfied.
  • trained model 42 is a computational model that is estimated to be optimal for locating objects based on images. Note that the trained model 42 is not necessarily the "actually optimal computational model".
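The patent does not name a detector architecture for the trained model 42; as one hedged example of the second learning step (S14), a torchvision Faster R-CNN can be fine-tuned on the annotated second training images. The loader is assumed to yield images and targets in the torchvision convention ({"boxes", "labels"}) derived from the annotations.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_and_train_detector(loader, num_classes: int, num_epochs: int = 10):
    """Illustrative stand-in for generating the trained model 42 from the second teacher data."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
    model.train()
    for _ in range(num_epochs):                    # placeholder termination condition
        for images, targets in loader:             # targets: [{"boxes": ..., "labels": ...}, ...]
            loss_dict = model(images, targets)     # position (and class) losses vs. the annotations
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()                        # back propagation
            optimizer.step()
    return model
```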
  • in step S15, the estimation system 30 performs estimation using the trained model 42.
  • this process is performed by the estimation unit 31 .
  • the estimation unit 31 inputs the target image to the trained model 42 and detects at least the position of the target object in the target image.
  • the estimation unit 31 outputs the detected position as an estimation result.
  • the estimating unit 31 may further detect class values, and thus the estimation result may further include class values.
  • the estimation unit 31 may superimpose a bounding box or the like indicating the position of the target object on the target image to generate an estimation result, and display the estimation result on the monitor 120 .
  • the estimation unit 31 may store the estimation result in a recording medium such as the storage 163 .
  • the estimation unit 31 may transmit the estimation result to another computer.
  • the estimation unit 31 may perform detection for each of the plurality of target images.
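A hedged sketch of the estimation step S15: run the trained model on a target image and superimpose bounding boxes on it. The model is assumed to follow the torchvision detector interface used in the sketch above; the drawing style (OpenCV rectangles and labels) is arbitrary.

```python
import cv2
import torch

def estimate_and_visualize(model, image_bgr, score_threshold: float = 0.5):
    """Detect object positions in a target image and draw bounding boxes on it."""
    tensor = torch.from_numpy(image_bgr[:, :, ::-1].copy()).permute(2, 0, 1).float() / 255.0
    model.eval()
    with torch.no_grad():
        result = model([tensor])[0]            # {"boxes", "labels", "scores"}
    for box, label, score in zip(result["boxes"], result["labels"], result["scores"]):
        if score < score_threshold:
            continue
        x1, y1, x2, y2 = (int(v) for v in box.tolist())
        cv2.rectangle(image_bgr, (x1, y1), (x2, y2), (0, 255, 0), 2)   # bounding box
        cv2.putText(image_bgr, f"class {int(label)}: {score:.2f}", (x1, max(y1 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return image_bgr, result
```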
  • FIG. 4 is a flowchart showing an example of annotation processing.
  • in step S131, the detection unit 14 acquires one input image.
  • the input image may be a still image, or may be one frame image forming a video.
  • the detector 14 may receive input images sent from a camera or other computer. Alternatively, the detection unit 14 may accept an input image input by a user, or read an input image from a given storage device based on user input.
  • in step S132, the detection unit 14 inputs the input image to the machine learning model 41 and detects the presence or absence of the object. That is, the detection unit 14 detects whether or not the object appears in the input image.
  • the detection unit 14 may use the machine learning model 41 to detect the presence or absence of each of multiple types of objects in the input image.
  • the detection unit 14 may specify a class value for each detected object.
  • in step S133, the specifying unit 15 executes a visualization method on the machine learning model 41 that has executed the detection, and calculates the degree of attention for each of the plurality of pixels forming the input image.
  • the degree of attention is an index indicating the degree to which the machine learning model 41 paid attention to a pixel.
  • the degree of attention can also be said to be an index indicating how much influence a pixel has on the decision by the machine learning model 41, or an index indicating the grounds for that decision.
  • a pixel with a higher degree of attention has a greater influence on the determination by the machine learning model 41.
  • the visualization method is executed based on values calculated in the machine learning model 41 that processed the input image, for example, calculated values corresponding to individual nodes and individual edges in the neural network.
  • the identification unit 15 uses Class Activation Mapping (CAM) as a visualization technique.
  • CAM is a technique for visualizing the grounds for judgment by a neural network based on a feature map and weights corresponding to edges from Global Average Pooling (GAP) to detected classes.
  • the identifying unit 15 may use Gradient-weighted CAM (Grad-CAM).
  • Grad-CAM is a method of substituting gradients during backpropagation for weights used in CAM calculations, which makes it possible to visualize the grounds for decisions in various types of neural networks.
  • the identification unit 15 may use Grad-CAM++, Score-CAM, Ablation-CAM, Eigen-CAM, or Integrated Grad-CAM.
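A minimal Grad-CAM sketch for step S133, assuming the small classifier sketched earlier (its last convolutional layer, e.g. model.features[-2], would be the target layer). It computes the gradient of the detected class score with respect to the feature maps, averages those gradients into channel weights, and forms a normalized per-pixel degree of attention; this is illustrative, not the patent's exact computation.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image: torch.Tensor, target_layer, class_index: int = None):
    """Per-pixel degree of attention for the detection made by the model (Grad-CAM sketch)."""
    feature_maps, gradients = [], []

    def forward_hook(_module, _inputs, output):
        feature_maps.append(output)

    def backward_hook(_module, _grad_in, grad_out):
        gradients.append(grad_out[0])

    h1 = target_layer.register_forward_hook(forward_hook)
    h2 = target_layer.register_full_backward_hook(backward_hook)
    try:
        scores = model(image.unsqueeze(0))               # (1, num_classes)
        if class_index is None:
            class_index = scores.argmax(dim=1).item()    # detected class
        model.zero_grad()
        scores[0, class_index].backward()                # gradients during backpropagation

        fmap = feature_maps[0]                           # (1, C, h, w) feature maps
        grad = gradients[0]                              # (1, C, h, w) gradients
        weights = grad.mean(dim=(2, 3), keepdim=True)    # GAP of gradients -> channel weights
        cam = F.relu((weights * fmap).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear", align_corners=False)
        cam = cam[0, 0]
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # degree of attention in [0, 1]
    finally:
        h1.remove()
        h2.remove()
```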
  • in step S134, the specifying unit 15 selects, from the plurality of pixels of the input image, one or more pixels whose degree of attention is equal to or greater than a given threshold value Ta as pixels of interest. That is, the specifying unit 15 selects pixels that have a relatively large influence on the determination by the machine learning model 41 as pixels of interest.
  • in step S135, the identifying unit 15 identifies a region of interest based on the set of one or more selected pixels of interest.
  • the identifying unit 15 identifies a dense area, which is an area where a plurality of pixels of interest are concentrated, as the attention area.
  • a dense area is a limited range in which a plurality of pixels of interest are gathered at a density equal to or higher than a given reference value. At least part of the dense area may be formed by two or more target pixels that are continuously present.
  • the specifying unit 15 may specify at least one dense region by clustering the pixels of interest, and specify the dense region as the region of interest.
  • the specifying unit 15 may calculate the area of each of one or more dense areas, and specify a dense area whose area is equal to or larger than a given threshold value Tb as the attention area.
  • the specifying unit 15 may calculate the area of the circumscribed shape of each of one or more dense areas, and specify the dense areas whose area is equal to or larger than a given threshold value Tc as the attention area.
  • the circumscribed shape may be a circumscribed rectangle or a circumscribed circle.
  • the identifying unit 15 identifies from the input image the area that has been noticed by the machine learning model 41 in the detection as the area of interest.
  • the specifying unit 15 can specify a plurality of attention areas from one input image.
  • the specifying unit 15 may specify a region of interest for each of one or more types of target objects that are detected among the plurality of types of target objects.
  • the identifying unit 15 may identify the attention area based on a plurality of degrees of attention corresponding to a plurality of pixels forming the input image.
  • the identification unit 15 may execute Grad-CAM on the machine learning model to identify the region of interest.
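The selection of pixels of interest and dense regions in steps S134 and S135 can be sketched as follows, with connected-component labeling standing in for the "clustering" mentioned above and the thresholds Ta and Tc as placeholders.

```python
import numpy as np
from scipy import ndimage

def identify_regions_of_interest(attention_map: np.ndarray,
                                 ta: float = 0.5,      # attention threshold Ta
                                 tc: float = 100.0):   # circumscribed-rectangle area threshold Tc
    """Illustrative identification of regions of interest from a per-pixel attention map."""
    pixels_of_interest = attention_map >= ta
    labeled, num_regions = ndimage.label(pixels_of_interest)   # dense regions of adjacent pixels

    regions = []
    for region_id in range(1, num_regions + 1):
        ys, xs = np.nonzero(labeled == region_id)
        x1, y1, x2, y2 = xs.min(), ys.min(), xs.max(), ys.max()
        circumscribed_area = (x2 - x1 + 1) * (y2 - y1 + 1)      # area of the circumscribed rectangle
        if circumscribed_area >= tc:
            regions.append({"bbox": (int(x1), int(y1), int(x2), int(y2)),
                            "num_pixels": int(len(xs))})
    return regions
```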
  • in step S136, the annotation unit 16 associates the annotation with the region of interest.
  • the annotation unit 16 associates a class value indicating the type of object and an annotation indicating the position of the object with the region of interest.
  • the annotation unit 16 associates annotations with each attention area.
  • the annotation unit 16 may associate different annotations for each type of object with one or more attention regions corresponding to one or more types of detected objects. “Annotations different for each type of object” is, for example, an annotation including a class value indicating the type of object.
  • the annotation unit 16 associates the annotation corresponding to the object detected by the machine learning model 41 with the region of interest.
  • the annotation unit 16 may associate an annotation including a graphic representation corresponding to the region of interest with the region of interest.
  • This graphical representation can be used to indicate the position of the object in the input image.
  • the annotation unit 16 may generate a graphical representation rendered corresponding to the location of the region of interest.
  • This graphic representation is for example a bounding box.
  • the graphical representation may be a shape that encloses the entire region of interest, such as a circumscribing shape.
  • the graphical representation may have a shape that partially overlaps the area of interest, so that the area of interest may extend beyond the graphical representation.
  • the graphic representation may be a shape that surrounds the entire object located in the region of interest, or a shape that overlaps part of the object.
  • the annotation unit 16 may generate graphic representations that provide visual effects such as blinking, highlighting, and the like.
  • the annotation unit 16 may associate an annotation that indicates the position of an object and does not include a class value with the region of interest.
  • the annotation unit 16 may perform segmentation for setting annotations in units of pixels.
  • the annotation unit 16 may generate a graphic representation corresponding to the segment as an example of a graphic representation corresponding to the region of interest.
  • in step S137, the display control unit 11 accepts correction of the annotation.
  • the display control unit 11 displays on the monitor 120 a correction user interface for allowing the user to correct the annotation associated with the attention area by the annotation unit 16 .
  • the display control unit 11 displays the input image on which the annotation is superimposed on the correction user interface, and receives user input for correcting the annotation.
  • a user can use this modification user interface to modify the position or dimensions of a graphic representation such as a bounding shape, modify class values, add or delete annotations, and so on.
  • the annotation section 16 modifies the annotation based on the user's input. The user does not have to correct the annotation, and in this case the annotation unit 16 does not execute correction processing.
  • in step S138, the annotation unit 16 stores the processed input image, that is, the annotated input image, in the second image database 52 as a second training image.
  • in step S139, the detection unit 14 determines whether or not all input images have been processed. If there is an unprocessed input image (NO in step S139), the process returns to step S131. In step S131, the detection unit 14 acquires the next input image, and the processes of steps S132 to S138 are executed based on this input image. When all input images have been processed (YES in step S139), the data generation system 10 ends the process of step S13.
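Tying the annotation flow of steps S131 to S139 together, a hedged end-to-end sketch might look like the following; it reuses the hypothetical helpers sketched above (grad_cam, identify_regions_of_interest), assumes class 0 means "no object", and stubs out manual correction (S137) and database storage (S138).

```python
def annotate_input_images(model, target_layer, input_images, class_names):
    """Illustrative annotation loop producing second training images."""
    second_training_images = []
    for image_path, image_tensor in input_images:                          # step S131
        scores = model(image_tensor.unsqueeze(0))                          # step S132: presence detection
        class_index = int(scores.argmax(dim=1))
        if class_index == 0:                                               # assumption: class 0 = "no object"
            continue
        attention = grad_cam(model, image_tensor, target_layer, class_index)       # step S133
        regions = identify_regions_of_interest(attention.detach().numpy())         # steps S134-S135
        annotations = [{"class_value": class_index,                        # step S136
                        "class_name": class_names[class_index],
                        "bbox": region["bbox"]} for region in regions]
        second_training_images.append({"image_path": image_path,           # step S138 (storage stubbed out)
                                       "annotations": annotations})
    return second_training_images
```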
  • FIG. 5 is a diagram showing an example of identifying a region of interest from an input image.
  • data generation system 10 performs annotation on input image 200 .
  • An input image 200 is an image captured by a camera that captures the periphery of the end effector 3b of the robot, and shows three decorations 201-203.
  • Decorations 201 and 202 are products Pa, and decoration 203 is product Pb.
  • the detection unit 14 detects the product Pa as the object.
  • the detection unit 14 inputs the input image 200 to the machine learning model 41 and detects the decorations 201 and 202 as objects (step S132).
  • the specifying unit 15 executes a visualization method such as Grad-CAM on the machine learning model 41 that has processed the input image 200, and calculates the degree of attention for each pixel of the input image 200 (step S133).
  • Image 210 is a heat map that visualizes the degree of attention of each pixel. Looking at this image 210, it can be seen that a pixel group 211 located in the area of the decoration 201 and a pixel group 212 located in the area of the decoration 202 are highly noticeable.
  • the specifying unit 15 selects pixels whose degree of attention is equal to or higher than a given threshold as pixels of interest, and specifies a region of interest based on the set of pixels of interest (for example, a dense region) (steps S134 and S135).
  • Image 220 shows a dense region 221 obtained from pixel group 211 corresponding to decoration 201 and a dense region 222 obtained from pixel group 212 corresponding to decoration 202 .
  • the specifying unit 15 determines whether or not to specify the dense area as the attention area.
  • the annotation unit 16 associates the annotation with each attention area (step S136).
  • An image 230 shows that dense regions 221 and 222 are specified as regions of interest, an annotation 231 is associated with the dense region (region of interest) 221, and an annotation 232 is associated with the dense region (region of interest) 222.
  • annotations 231 and 232 related to product Pa are represented by circumscribing rectangles of dense areas (areas of interest).
  • the annotation unit 16 can modify at least one of the annotations 231, 232 based on user input (step S137).
  • Steps S13 and S14 are also an example of a trained model production method according to the present disclosure.
  • the production method is realized as follows. That is, the data generation system 10 uses the machine learning model 41 to detect the presence or absence of the object in the input image (step S132). The data generation system 10 specifies, from the input image, the region focused on by the machine learning model 41 in the detection as the region of interest (steps S133 to S135). The data generation system 10 associates the annotation corresponding to the detected object with the region of interest (step S136). The model generation system 20 generates the trained model 42 based on second teacher data including input images (that is, second training images) in which annotations are associated with regions of interest.
  • the method for producing a trained model according to the present disclosure includes executing the data generation method according to the present disclosure, and generating, based on teacher data including an input image in which an annotation is associated with a region of interest by the data generation method, a trained model for detecting at least the position of an object from an image.
  • an example robot control system is described below as a component of the robot system 2.
  • the robot system 2 is a mechanism for automating given work by causing a robot to execute a task, which is a series of processes for achieving a given purpose.
  • in a situation where the surrounding environment of the robot is unknown, the robot system 2 actively extracts an area in which the robot is expected to be able to perform a task, that is, an area in which it is expected to process an object, as a task area, and makes the robot approach the object (task area).
  • the robot system 2 uses the mechanism of the data generation system 10 that identifies the attention area from the input image.
  • the robot system 2 detects the presence or absence of an object in the input image, specifies, from the input image, a region focused on by the machine learning model in this detection as a region of interest, and selects a task area based on the region of interest.
  • the robot system 2 makes the robot approach the object while repeatedly extracting the task area.
  • the robot system 2 causes the robot to perform the task.
  • when the robot reaches the task area, it means that the robot has come close enough to the object to perform the task.
  • the task includes the step of bringing the robot 3 into contact with an object.
  • Examples of such tasks include grasping an object and pushing or pulling an object.
  • the robot system 2 extracts, as a task area, an area where the robot 3 can come into contact with the object in order to perform the task.
  • the robot system 2 uses active sensing, which is a technique for searching and collecting necessary information by actively changing sensor conditions. This technology allows the robot to recognize the target to be aimed at, even if the conditions regarding the object or surrounding environment change frequently or are impossible or difficult to model in advance. Active sensing is a technique for finding unknown targets, and thus differs from visual feedback, which positions a mechanical system toward a known target.
  • FIG. 6 is a diagram showing an example of the functional configuration of the robot system 2.
  • the robot system 2 comprises a robot control system 60 , one or more robots 3 , and one or more robot controllers 4 corresponding to the one or more robots 3 .
  • FIG. 6 shows one robot 3 and one robot controller 4 and shows a configuration in which one robot 3 is connected to one robot controller 4 .
  • a communication network that connects devices may be a wired network or a wireless network.
  • the communication network may comprise at least one of the Internet and an intranet. Alternatively, the communication network may simply be implemented by a single communication cable.
  • the robot control system 60 is a computer system for autonomously operating the robot 3 in at least some situations.
  • the robot control system 60 performs given operations to generate command signals for controlling the robot 3 .
  • the command signal includes data for controlling the robot 3, such as a path indicating the trajectory of the robot 3.
  • the trajectory of the robot 3 refers to the path of movement of the robot 3 or its components.
  • the trajectory of the robot 3 can be the trajectory of the tip.
  • the robot control system 60 transmits the generated command signal to the robot controller 4 .
  • the robot controller 4 is a device that operates the robot 3 according to command signals from the robot control system 60 .
  • the robot controller 4 calculates joint angle target values (angle target values of the joints of the robot 3) for matching the position and orientation of the tip portion with the target values indicated by the command signal, and controls the robot 3 according to the calculated joint angle target values.
  • the robot 3 is a device or machine that works on behalf of humans.
  • the robot 3 is a multi-axis serial link type vertical articulated robot.
  • the robot 3 includes a manipulator 3a and an end effector 3b, which is a tool attached to the tip of the manipulator 3a.
  • the robot 3 can perform various processes using its end effector 3b.
  • the robot 3 can freely change the position and posture of the end effector 3b within a given range.
  • the robot 3 may be a 6-axis vertical multi-joint robot or a 7-axis vertical multi-joint robot in which one redundant axis is added to the 6 axes.
  • the robot 3 operates under the control of the robot control system 60 to perform a given task.
  • the execution of the task by the robot 3 produces the result desired by the user of the robot system 2 .
  • a task is set to process some object, in which case the robot 3 processes the object.
  • Examples of tasks include "grab an object and place it on a conveyor”, “grab an object and attach it to a workpiece”, and "spray paint an object”.
  • the robot includes a camera 3c that captures the surroundings of the end effector 3b.
  • the coverage of the camera 3c may be set so as to capture at least part of the end effector 3b.
  • the camera 3c may be arranged on the manipulator 3a, for example attached near the tip of the manipulator 3a.
  • the camera 3c moves corresponding to the motion of the robot 3. This movement may include changes in at least one of the position and orientation of the camera 3c.
  • the camera 3 c may be provided at a different location from the robot 3 as long as it moves in response to the motion of the robot 3 .
  • the camera 3c may be attached to another robot, or may be movably provided on the ceiling, wall, or camera stand.
  • the robot control system 60 includes a display control unit 11, a labeling unit 12, a preparation unit 13, a detection unit 14, an identification unit 15, and a robot control unit 61 as functional modules.
  • Display control unit 11 , labeling unit 12 , preparation unit 13 , detection unit 14 , and identification unit 15 are the same as the functional modules shown in data generation system 10 .
  • the display control unit 11 displays a label user interface.
  • the labeling unit 12 generates a first training image by adding a label input through the label user interface to a given image, and stores the first training image in the first image database 51 .
  • the preparation unit 13 executes machine learning based on the labeled first training image to generate a machine learning model 41 .
  • the detection unit 14 uses the machine learning model 41 to detect the presence or absence of an object in the input image.
  • the specifying unit 15 specifies from the input image a region focused on by the machine learning model 41 in the detection as a region of interest.
  • the robot control unit 61 is a functional module that controls the robot 3 that processes an object based on its attention area.
  • the robot system 2 (robot control system 60) can access the first image database 51.
  • the first image database 51 may be provided outside the robot system 2 or the robot control system 60 or may be part of the robot system 2 or the robot control system 60 .
  • the first image database 51 is a device that stores a plurality of first training images with labels indicating the presence or absence of objects as first teacher data used to generate the machine learning model 41 .
  • the robot control system 60 may be implemented by any kind of computer, for example by the computer 100 shown in FIG. Each functional module of the robot control system 60 is implemented by loading a robot control program into the processor 161 or memory 162 and causing the processor 161 to execute the robot control program.
  • the processor 161 operates the input/output port 164 or the communication port 165 according to the robot control program to read and write data in the memory 162 or storage 163 .
  • the robot control program may be provided by a non-transitory recording medium or via a communication network.
  • FIG. 7 is a flowchart showing an example of processing in the robot system 2 (robot control system 60) as a processing flow S2. That is, the robot system 2 (robot control system 60) executes the processing flow S2.
  • the processing flow S2 is executed on the premise that the machine learning model 41 has already been prepared.
  • the machine learning model 41 generated by steps S11 and S12 of process flow S1 is used in process flow S2.
  • in step S21, the detection unit 14 acquires one input image.
  • the input image may be a still image, or may be one frame image forming a video.
  • the detection unit 14 may receive an input image sent from the camera 3c.
  • in step S22, the detection unit 14 inputs the input image to the machine learning model 41 and detects the presence or absence of the object. This process is the same as step S132.
  • in step S23, the identifying unit 15 executes the visualization method on the machine learning model 41 that has executed the detection, and calculates the degree of attention for each of the plurality of pixels forming the input image. This process is the same as step S133.
  • in step S24, the identifying unit 15 selects, from the plurality of pixels of the input image, one or more pixels whose degree of attention is equal to or greater than a given threshold value Ta as pixels of interest. This process is the same as step S134.
  • in step S25, the specifying unit 15 specifies a region of interest based on the set of one or more selected pixels of interest. This process is the same as step S135.
  • the identifying unit 15 identifies from the input image the area that has been noticed by the machine learning model 41 in the detection as the area of interest.
  • the specifying unit 15 can specify a plurality of attention areas from one input image.
  • the identifying unit 15 may identify the attention area based on a plurality of degrees of attention corresponding to a plurality of pixels forming the input image.
  • the identification unit 15 may execute Grad-CAM on the machine learning model to identify the region of interest.
  • in step S26, the robot control unit 61 selects a task area based on the region of interest. For example, the robot control unit 61 selects one of the one or more regions of interest as the task area. When a plurality of regions of interest are specified, the robot control section 61 may select the region of interest having the largest circumscribed-rectangle area, or may select the region of interest having the largest area. When the task area is selected in this manner, there is a high probability that the region of the object closest to the robot 3 within the coverage area will be selected as the task area.
  • in step S27, the robot control unit 61 determines whether the robot 3 has reached the task area. For example, the robot control section 61 may calculate the distance between the end effector 3b and the task area and execute the determination based on this distance. If the calculated distance is equal to or less than the given threshold value Td, the robot control unit 61 determines that the robot 3 has reached the task area. On the other hand, when the calculated distance is greater than the threshold Td, the robot control section 61 determines that the robot 3 has not reached the task area.
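A small sketch of steps S26 and S27 under the assumptions above: the region of interest with the largest circumscribed-rectangle area becomes the task area, and the robot is considered to have reached it when the end-effector distance falls to the threshold Td or below. The 3-D positions and Td value are placeholders.

```python
import numpy as np

def select_task_area(regions):
    """Step S26 (one of the selection rules): pick the region of interest whose
    circumscribed rectangle has the largest area as the task area."""
    def rect_area(region):
        x1, y1, x2, y2 = region["bbox"]
        return (x2 - x1 + 1) * (y2 - y1 + 1)
    return max(regions, key=rect_area) if regions else None

def has_reached_task_area(end_effector_pos, task_area_pos, td: float = 0.05):
    """Step S27: reached when the distance between end effector and task area is at most Td."""
    distance = np.linalg.norm(np.asarray(end_effector_pos) - np.asarray(task_area_pos))
    return distance <= td
```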
  • in step S28, the robot control unit 61 controls the robot 3 toward the task area as an example of controlling the robot 3 based on the region of interest. That is, the robot control unit 61 controls the robot 3 so that the robot 3 approaches the object.
  • the robot control unit 61 generates a path of the robot 3 for moving the end effector 3b from the current position to the task area by planning.
  • the robot control unit 61 may generate a path (trajectory) of the robot 3 by planning so that the distance to the task area is reduced and the task area appears in the center of the image of the camera 3c.
  • the robot control unit 61 outputs a command signal indicating the generated path to the robot controller 4, and the robot controller 4 controls the robot 3 according to the command signal. As a result, the robot 3 approaches the object along its path.
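As a toy stand-in for the planning in step S28 (not the patent's planner), the following computes a small camera-frame motion increment that advances toward the task area while nudging it toward the image center; a real system would plan a joint-space path and hand it to the robot controller 4.

```python
import numpy as np

def approach_step(task_area_bbox, image_size, step_gain: float = 0.001):
    """Illustrative motion increment: shrink the distance to the task area and
    drift the task area toward the center of the camera image."""
    x1, y1, x2, y2 = task_area_bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0       # task-area center in the image
    width, height = image_size
    dx = step_gain * (cx - width / 2.0)             # lateral correction toward the image center
    dy = step_gain * (cy - height / 2.0)
    dz = 0.01                                       # small constant advance toward the object
    return np.array([dx, dy, dz])                   # motion increment in the camera frame
```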
  • after step S28, the process returns to step S21, and the robot control system 60 executes the processes from step S21 onward again.
  • the detection unit 14 acquires a new input image (step S21), and further detects the presence or absence of the object in the input image using the machine learning model 41 (step S22).
  • the detection unit 14 processes an input image captured after the robot 3 approaches the object as a new input image.
  • the specifying unit 15 specifies from the new input image, as a new attention area, the area that has been noticed by the machine learning model 41 in the detection (steps S23 to S25).
  • the robot control unit 61 further controls the robot 3 based on the new attention area (from step S26).
  • if the robot 3 has reached the task area (YES in step S27), the process proceeds to step S29.
  • in step S29, the robot control unit 61 causes the robot 3 to execute the task.
  • the robot control unit 61 generates a path for executing a task through planning, and outputs a command signal indicating the path to the robot controller 4 .
  • the robot controller 4 controls the robot 3 according to the command signal. As a result, the robot 3 executes the task.
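Putting processing flow S2 together, a hedged outer loop could look as follows; it reuses the hypothetical helpers sketched above, and camera.capture(), robot.end_effector_position(), robot.move_by(), robot.execute_task(), and estimate_task_area_position() are assumed interfaces, not APIs defined by the patent.

```python
def run_active_sensing_cycle(camera, model, target_layer, robot, max_iterations: int = 100):
    """Illustrative outer loop of processing flow S2 (approach the object, then execute the task)."""
    for _ in range(max_iterations):
        image_bgr, image_tensor = camera.capture()                          # step S21
        scores = model(image_tensor.unsqueeze(0))                           # step S22
        class_index = int(scores.argmax(dim=1))
        attention = grad_cam(model, image_tensor, target_layer, class_index)        # step S23
        regions = identify_regions_of_interest(attention.detach().numpy())          # steps S24-S25
        task_area = select_task_area(regions)                               # step S26
        if task_area is None:
            continue                                                        # nothing detected; try again
        # estimate_task_area_position() is an assumed helper that converts the
        # image-space region into a 3-D position (e.g., using depth data).
        if has_reached_task_area(robot.end_effector_position(),
                                 estimate_task_area_position(task_area)):   # step S27
            robot.execute_task(task_area)                                   # step S29
            break
        height, width = image_bgr.shape[:2]
        robot.move_by(approach_step(task_area["bbox"], (width, height)))    # step S28
```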
  • in step S30, the robot control unit 61 determines whether or not to end robot control.
  • the robot control section 61 may perform this determination based on any termination condition. For example, the robot control unit 61 may determine to end the robot control when the task has been executed a specified number of times, and may determine to continue the robot control when the number of task executions is less than the specified number of times. Alternatively, the robot control unit 61 may determine to end the robot control when an error occurs in the robot control, and determine to continue the robot control when the error does not occur.
  • if it is determined to end robot control (YES in step S30), the process proceeds to step S31.
  • in step S31, the robot control system 60 executes end processing.
  • the robot control unit 61 may return the robot 3 to its initial posture and position. Alternatively, the robot control unit 61 may notify the user by visual information or voice that all tasks have been completed.
  • if it is determined to continue robot control (NO in step S30), the process proceeds to step S32.
  • in step S32, the robot control unit 61 prepares for the next task. For example, the robot control unit 61 may return the robot 3 to its initial posture and position. Alternatively, the robot control unit 61 may notify the user by visual information or voice that the next task is to be executed.
  • FIG. 8 is a diagram showing an example of robot control by the robot control system 60.
  • the robot 3 performs the task of putting balls in its surroundings into the box 410. That is, in this example the object is a ball.
  • FIG. 8 sequentially represents a series of actions of the robot 3 by scenes S301 to S304. The following description also shows the correspondence with the processing flow S2.
  • the detection unit 14 acquires an input image showing the ball 421 (step S21), and uses the machine learning model 41 to detect the presence or absence of the object (ball) in the input image (step S22).
  • the specifying unit 15 specifies the attention area in the machine learning model 41 in the detection (steps S23 to S25). This attention area corresponds to the position where the ball 421 exists.
  • the robot control unit 61 selects a task area corresponding to the ball 421 based on the attention area (step S26).
  • the robot control unit 61 controls the robot 3 so that it approaches the ball 421 (steps S27 and S28). Through this control, the distance between the end effector 3b and the ball 421 is shortened. After that, the processing of steps S21 to S28 is repeated.
  • the robot control unit 61 causes the robot 3 to execute the task (step S29).
  • the robot 3 executes the task under the control of the robot control unit 61 (step S29).
  • the robot 3 grips the ball 421 with the end effector 3 b and puts the ball 421 into the box 410 .
  • as described above, a data generation system according to one aspect of the present disclosure includes a detection unit that detects the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image, a specifying unit that specifies, from the input image, a region focused on by the machine learning model in the detection as a region of interest, and an annotation unit that associates an annotation corresponding to the detected object with the region of interest.
  • a data generation method is executed by a data generation system including at least one processor.
  • This data generation method includes the steps of detecting the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image, specifying, from the input image, a region focused on by the machine learning model in the detection as a region of interest, and associating an annotation corresponding to the detected object with the region of interest.
  • a data generation program according to one aspect of the present disclosure causes a computer to execute the steps of detecting the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image, specifying, from the input image, a region focused on by the machine learning model in the detection as a region of interest, and associating an annotation corresponding to the detected object with the region of interest.
  • annotations corresponding to the object are automatically associated with the region focused on by the machine learning model in detecting the object. Therefore, it is possible to efficiently annotate an object in an image.
  • a data generation system may further include a preparation unit that executes machine learning based on a plurality of training images labeled to indicate the presence or absence of an object to generate a machine learning model.
  • a data generation system may further include a display control unit that displays a label user interface for assigning a label to a given image, and a labeling unit that assigns the label input via the label user interface to the given image to generate a training image. This configuration enables training images to be prepared as desired by the user.
  • a data generation system may further include a display control unit that displays a correction user interface for allowing a user to correct the annotation, and the annotation unit may correct the annotation based on user input via the correction user interface. This configuration gives the user the opportunity to modify the automatically set annotations. Also, since the annotation is corrected according to user input, the accuracy of the annotation can be improved.
  • the annotation unit may associate a graphical representation corresponding to the region of interest with the region of interest as at least part of the annotation. Annotations using graphical representations can clearly indicate objects in the input image.
  • the annotation unit may generate the circumscribed shape of the region of interest as a graphic representation. This circumscribed shape can clearly indicate the position of the object in the input image.
  • the detection unit may detect the presence or absence of each of a plurality of types of objects in the input image using the machine learning model, the specifying unit may specify a region of interest for each of one or more types of objects detected among the plurality of types of objects, and the annotation unit may associate annotations that differ for each type of object with one or more regions of interest corresponding to the one or more types of detected objects. In this case, annotations can be added to the input image so that the type of object can be determined.
  • the annotation unit may associate a class value for identifying the type of object with the region of interest as at least part of the annotation.
  • the specifying unit may calculate a degree of attention indicating the degree of attention given by the machine learning model to each of the plurality of pixels forming the input image, and identify the region of interest based on the plurality of degrees of attention corresponding to the plurality of pixels. Since the region of interest is specified based on the degree of attention for each pixel, the region of interest can be specified in detail.
  • the specifying unit may select, from the plurality of pixels, one or more pixels whose degree of attention is equal to or higher than a given threshold value as pixels of interest, and identify the region of interest based on the selected one or more pixels of interest. Since pixels with a relatively high degree of attention are set as the region of interest, an annotation can be associated with a position where there is a high probability that an object exists. As a result, the accuracy of the annotation can be further improved.
  • the specifying unit may calculate the area of each of one or more dense regions, which are regions where pixels of interest are concentrated, and identify a dense region whose area is equal to or larger than a given threshold value as the region of interest.
  • a region in which pixels of interest are concentrated over a relatively wide range is set as a region of interest, so that an annotation can be associated with an object clearly appearing in the input image.
  • the specifying unit may calculate the area of the circumscribed shape of each of one or more dense regions, and identify a dense region whose circumscribed-shape area is equal to or larger than a given threshold value as the region of interest.
  • since the region of interest is specified based not on the area of the dense region itself but on the area of its circumscribed shape, the area can be calculated easily, and the region of interest can accordingly be specified at high speed.
  • the identifying unit may execute Grad-CAM on the machine learning model to identify the region of interest.
  • Grad-CAM can be used to identify regions of interest for various types of machine learning models, such as various types of neural networks.
  • a model generation system includes the data generation system described above, an acquisition unit that acquires teacher data including an input image in which an annotation is associated with a region of interest by the data generation system, and based on the teacher data, a learning unit that generates a trained model for detecting at least the position of the object from the image.
  • a trained model for detecting the position of an object can be generated using teacher data including annotated input images.
  • An estimation system according to one aspect of the present disclosure includes the model generation system described above, and an estimation unit that inputs a target image to a trained model generated by the model generation system and detects at least the position of the object from the target image.
  • the trained model can be used to efficiently detect the position of the target object from the target image.
  • a robot control system according to one aspect of the present disclosure includes a detection unit that detects the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image, a specifying unit that specifies, from the input image, a region focused on by the machine learning model in the detection as a region of interest, and a robot control unit that controls, based on the region of interest, a robot that processes the object.
  • the robot since the robot is controlled based on the area focused on by the machine learning model in detecting the object, the robot can be operated autonomously according to the position of the object.
  • the robot control unit may control the robot so that the robot approaches the object.
  • the robot can autonomously approach the object according to the position of the object.
  • the detection unit may further detect, using the machine learning model, the presence or absence of the object in a new input image acquired after the robot approaches the object, the specifying unit may specify, from the new input image, a region noticed by the machine learning model in the further detection as a new region of interest, and the robot control unit may further control the robot based on the new region of interest.
  • in this case, the region of interest is specified again after the robot approaches the object, and the robot is further controlled based on that region of interest. This mechanism allows the robot to operate more accurately.
  • a method for producing a trained model according to an aspect of the present disclosure includes the data generation method described above, and generating, based on teacher data including an input image in which an annotation is associated with a region of interest by the data generation method, a trained model for detecting at least the position of the object from an image. Since an automatically annotated input image is used as at least part of the teacher data, a trained model for object detection can be generated efficiently.
  • the data generation system 10 may be constructed independently without including the model generation system 20 and the estimation system 30 .
  • the computer systems corresponding to model generation system 20 and estimation system 30 may be computer systems owned by different owners from data generation system 10 .
  • a combination of data generation system 10 and model generation system 20 may be constructed without including estimation system 30 .
  • the computer system corresponding to the estimation system 30 may be a computer system owned by a different owner from the data generation system 10 and the model generation system 20 .
  • the data generation system and robot control system according to the present disclosure may not include the display control unit 11, the labeling unit 12, and the preparation unit 13. That is, the data generation system and robot control system may use machine learning models generated by other computer systems. Because machine learning models and trained models are portable between computer systems, various systems according to the present disclosure can be implemented flexibly.
  • the hardware configuration of the system according to the present disclosure is not limited to the aspect of implementing each functional module by executing a program.
  • at least part of the functional module group described above may be configured by a logic circuit specialized for that function, or may be configured by an ASIC (Application Specific Integrated Circuit) that integrates the logic circuit.
  • the processing procedure of the method executed by at least one processor is not limited to the above examples. For example, some of the steps or processes described above may be omitted, or the steps may be performed in a different order. Also, two or more of the steps described above may be combined, and some of the steps may be modified or deleted. Alternatively, other steps may be performed in addition to the above steps.
  • in comparing the magnitudes of two numerical values, either of the two criteria "greater than or equal to" and "greater than" may be used, and either of the two criteria "less than or equal to" and "less than" may be used.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A data generation system according to one example comprises: a detection unit that detects the presence or absence of a target object in an input image using a machine learning model that detects the presence or absence of a target object on the basis of an image; an identification unit that identifies, from the input image and as an area of interest, the area targeted by the machine learning model at the time of the detection; and an annotation unit that associates an annotation corresponding to the detected target object with the area of interest.

Description

Data generation system, model generation system, estimation system, trained model manufacturing method, robot control system, data generation method, and data generation program
One aspect of the present disclosure relates to a data generation system, a model generation system, an estimation system, a trained model manufacturing method, a robot control system, a data generation method, and a data generation program.
Patent Document 1 describes a fraud estimation system that includes item information acquisition means for acquiring item information about an item, mark specifying means for specifying a mark of the item based on the item information, classification specifying means for specifying a classification of the item based on the item information, and estimation means for estimating fraud regarding the item based on the specified mark and classification.
International Publication No. 2020/240834 pamphlet
There is a demand for more efficient processing of objects appearing in images.
A data generation system according to one aspect of the present disclosure includes: a detection unit that detects the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of an object based on an image; a specifying unit that specifies, from the input image, a region noticed by the machine learning model in the detection as a region of interest; and an annotation unit that associates an annotation corresponding to the detected object with the region of interest.
A data generation method according to one aspect of the present disclosure is executed by a data generation system including at least one processor. This data generation method includes: detecting the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of an object based on an image; specifying, from the input image, a region noticed by the machine learning model in the detection as a region of interest; and associating an annotation corresponding to the detected object with the region of interest.
A data generation program according to one aspect of the present disclosure causes a computer to execute: detecting the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of an object based on an image; specifying, from the input image, a region noticed by the machine learning model in the detection as a region of interest; and associating an annotation corresponding to the detected object with the region of interest.
According to one aspect of the present disclosure, it is possible to improve the efficiency of processing related to an object appearing in an image.
FIG. 1 is a diagram showing an example of the functional configuration of an object detection system. FIG. 2 is a diagram showing an example of the hardware configuration of a computer used in the object detection system. FIG. 3 is a flowchart showing an example of processing in the object detection system. FIG. 4 is a flowchart showing an example of processing related to annotation. FIG. 5 is a diagram showing an example of specifying a region of interest. FIG. 6 is a diagram showing an example of the functional configuration of a robot system. FIG. 7 is a flowchart showing an example of processing in the robot system. FIG. 8 is a diagram showing an example of robot control.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the description of the drawings, the same or equivalent elements are denoted by the same reference numerals, and overlapping descriptions are omitted.
[Object detection system]
(system overview)
In one example, the data generation system, model generation system, and estimation system according to the present disclosure are applied to the object detection system 1 . The object detection system 1 is a computer system that generates a trained model for detecting the position of an object from an image and uses this trained model to perform the detection. A trained model is a computational model that detects a position where an object appears in an image as the position of the object. A trained model is generated in advance by machine learning. Machine learning refers to a method of autonomously discovering laws or rules by repeatedly learning based on given information. A trained model is built using algorithms and data structures. In one example, the trained model is built by a neural network such as a convolutional neural network (CNN). Generating a trained model corresponds to the learning phase, and using the trained model corresponds to the operation phase. Therefore, the object detection system 1 performs both a learning phase and an operational phase.
The object is any tangible object and is set according to the purpose of use of the object detection system 1. For example, if the object detection system 1 is used to automatically harvest crops with a robot, the object is the crop. As another example, if the object detection system 1 is used to automatically box items with a robot, the object is the item.
(System configuration)
FIG. 1 is a diagram showing an example of the functional configuration of the object detection system 1. In this example, the object detection system 1 includes a data generation system 10, a model generation system 20, and an estimation system 30. The data generation system 10, the model generation system 20, and the estimation system 30 are examples of the data generation system, the model generation system, and the estimation system according to the present disclosure, respectively. The data generation system 10 and the model generation system 20 correspond to the learning phase, and the estimation system 30 corresponds to the operation phase.
In the learning phase, the model generation system 20 generates a trained model 42 used by the estimation system 30. In order to generate the trained model 42, machine learning using teacher data including a plurality of annotated images is required. Generally, a large amount of teacher data is required for that machine learning, so it is necessary to annotate a large number of images. An annotation is information (metadata) related to an image, and indicates, for example, a class value indicating the type of an object and the position of the object in the image. Class values can be expressed in any form, such as numbers or text. The position of the object may be indicated by a bounding box, which is a rectangle set corresponding to that position. Conventionally, annotation is performed manually, which is very costly and time consuming. In addition, variations in annotations between workers can also occur. In the object detection system 1, the data generation system 10 executes the annotation to automatically generate at least part of the teacher data. Therefore, annotation of objects appearing in images can be carried out efficiently.
The data generation system 10 uses a machine learning model 41 to annotate images. The machine learning model 41 is a computational model for detecting the presence or absence of an object based on an image. In the present disclosure, detecting the presence or absence of an object may include processing for identifying the type of the object, that is, the class value. The machine learning model 41 processes an image to detect whether or not the object appears in the image. The machine learning model 41 is generated in advance by machine learning, so it can also be said to be a trained model. In one example, the machine learning model 41 is constructed by a neural network such as a convolutional neural network (CNN).
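The disclosure does not fix a concrete architecture for the machine learning model 41 beyond "a neural network such as a CNN". The following is a minimal sketch, assuming PyTorch, of a presence/absence classifier of that kind; the layer sizes, the two-class output, and the use of global average pooling are illustrative assumptions (the pooling is chosen here only because it keeps the sketch compatible with the CAM-style visualization discussed later).

```python
# Minimal sketch of a presence/absence classifier in the spirit of machine
# learning model 41. PyTorch and the layer sizes are assumptions.
import torch
import torch.nn as nn

class PresenceClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):  # e.g. "object present" / "object absent"
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)       # global average pooling
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x)          # feature maps usable by CAM/Grad-CAM
        v = self.gap(f).flatten(1)    # pooled feature vector
        return self.classifier(v)     # class scores (presence/absence)

# Example: score one RGB image tensor of shape (1, 3, H, W).
model = PresenceClassifier(num_classes=2)
scores = model(torch.randn(1, 3, 224, 224))
print(scores.softmax(dim=1))
```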
In the present disclosure, for convenience of explanation, the computational model used to annotate images in the data generation system 10, which corresponds to a part of the learning phase, is referred to as the "machine learning model", and the computational model used to detect the position of an object in an image in the estimation system 30, which corresponds to the operation phase, is referred to as the "trained model".
In one example, the object detection system 1 can access a first image database 51 and a second image database 52. These databases may be provided outside the object detection system 1 or may be part of the object detection system 1. The first image database 51 is a device that stores a plurality of first training images, to which labels indicating the presence or absence of the object are attached, as first teacher data used to generate the machine learning model 41. A label is information (metadata) associated with an image, and indicates, for example, a class value indicating whether or not the object exists in the image. The second image database 52 is a device that stores a plurality of second training images, to which annotations corresponding to the object are attached, as second teacher data used to generate the trained model 42.
Labels and annotations are common in that they are used as ground truth in machine learning. In the present disclosure, for convenience of explanation, the metadata attached to the first training images is referred to as "labels", and the metadata attached to the second training images is referred to as "annotations".
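As an illustration of the distinction drawn above, the records below contrast a label (presence/absence only) with an annotation (class value plus position). The dictionary layout and field names are hypothetical; the disclosure does not prescribe a storage format for the first and second image databases.

```python
# Hypothetical record layouts contrasting a "label" (first training image)
# with an "annotation" (second training image). Field names are illustrative.
first_training_record = {
    "image_path": "images/0001.png",
    "label": {"class_value": 1},          # 1 = object present, 0 = absent
}

second_training_record = {
    "image_path": "images/0001.png",
    "annotations": [
        {
            "class_value": 1,             # type of the detected object
            "bbox": [120, 80, 210, 190],  # bounding box: x_min, y_min, x_max, y_max
        }
    ],
}
```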
In one example, the data generation system 10 includes a display control unit 11, a labeling unit 12, a preparation unit 13, a detection unit 14, a specifying unit 15, and an annotation unit 16 as functional modules. The display control unit 11 is a functional module that displays a user interface related to labels or annotations. The labeling unit 12 is a functional module that generates a first training image by attaching a label input via the user interface to a given image. The labeling unit 12 stores the first training image in the first image database 51. The preparation unit 13 is a functional module that executes machine learning based on the labeled first training images to generate the machine learning model 41. The detection unit 14 is a functional module that uses the machine learning model 41 to detect the presence or absence of an object in an input image. The detection unit 14 may detect whether or not one specific type of object exists in the input image. Alternatively, the detection unit 14 may detect, for each of a plurality of types of objects, whether or not that object exists in the input image. The detection unit 14 may specify the type of each of the one or more types of objects, that is, the class value of each object. The specifying unit 15 is a functional module that specifies, from the input image, a region noticed by the machine learning model 41 in the detection as a region of interest. The annotation unit 16 is a functional module that associates an annotation corresponding to the detected object with the region of interest. The annotation unit 16 stores the input image in which the annotation is associated with the region of interest, that is, the annotated input image, in the second image database 52 as a second training image. The input image is an image processed by the machine learning model 41 and is treated as a second training image once it has been annotated.
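One possible way to read the functional modules listed above is as a set of narrow interfaces. The sketch below expresses three of them as Python protocols; the method names, argument types, and the rectangle-based region type are assumptions made only for illustration.

```python
# Schematic interfaces for the detection, specifying, and annotation units.
# These protocols are an illustration; the disclosure defines the modules
# functionally, not as a concrete API.
from typing import Any, List, Protocol, Tuple

Region = Tuple[int, int, int, int]  # x_min, y_min, x_max, y_max

class DetectionUnit(Protocol):
    def detect(self, image: Any) -> List[int]:
        """Return class values of the object types detected in the image."""

class SpecifyingUnit(Protocol):
    def specify(self, image: Any, model: Any) -> List[Region]:
        """Return the regions of interest the model focused on."""

class AnnotationUnit(Protocol):
    def annotate(self, image: Any, regions: List[Region], class_values: List[int]) -> dict:
        """Associate an annotation with each region of interest."""
```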
In one example, the model generation system 20 includes a learning unit 21. The learning unit 21 is a functional module that acquires second teacher data including second training images generated by the data generation system 10 and generates the trained model 42 based on this second teacher data. Therefore, the learning unit 21 also functions as an acquisition unit. In one example, the model generation system 20 may be constructed as a computer system including the data generation system 10 and the learning unit 21.
In one example, the estimation system 30 includes an estimation unit 31. The estimation unit 31 is a functional module that inputs a target image to the trained model 42 and detects at least the position of the object from the target image. The estimation unit 31 may detect, from one target image, the position of each of two or more types of objects. The target image is an image to be processed by the trained model 42. No metadata is associated with the target image, and the trained model 42 detects at least the position of the object based on the pixel information of the target image. The estimation unit 31 may further estimate a class value for each detected object. In one example, the estimation system 30 may be constructed as a computer system including the model generation system 20, which may include the data generation system 10, and the estimation unit 31.
The object detection system 1 can be realized by any kind of computer. The computer may be a general-purpose computer such as a personal computer or a business server, or may be incorporated in a dedicated device that executes specific processing. The object detection system 1 may be implemented by one computer or by a distributed system having a plurality of computers. Each of the data generation system 10, the model generation system 20, and the estimation system 30 may be implemented by one computer or by a distributed system of a plurality of computers. Alternatively, one computer may function as at least two of the data generation system 10, the model generation system 20, and the estimation system 30.
FIG. 2 is a diagram showing an example of the hardware configuration of a computer 100 used in the object detection system 1. In this example, the computer 100 includes a main body 110, a monitor 120, and an input device 130.
The main body 110 is a device that executes the main functions of the computer. The main body 110 has a circuit 160, and the circuit 160 has at least one processor 161, a memory 162, a storage 163, an input/output port 164, and a communication port 165. The storage 163 records programs for configuring each functional module of the main body 110. The storage 163 is a computer-readable recording medium such as a hard disk, a nonvolatile semiconductor memory, a magnetic disk, or an optical disk. The memory 162 temporarily stores programs loaded from the storage 163, calculation results of the processor 161, and the like. The processor 161 configures each functional module by executing a program in cooperation with the memory 162. The input/output port 164 inputs and outputs electrical signals to and from the monitor 120 or the input device 130 according to instructions from the processor 161. The input/output port 164 may also input and output electrical signals to and from other devices. The communication port 165 performs data communication with other devices via a communication network N according to instructions from the processor 161.
The monitor 120 is a device for displaying information output from the main body 110. The monitor 120 may be of any type as long as it can display graphics, and a specific example thereof is a liquid crystal panel.
The input device 130 is a device for inputting information to the main body 110. The input device 130 may be of any type as long as desired information can be input, and specific examples thereof include operation interfaces such as a keypad, a mouse, and an operation controller.
The monitor 120 and the input device 130 may be integrated as a touch panel. For example, the main body 110, the monitor 120, and the input device 130 may be integrated like a tablet computer.
Each functional module of the object detection system 1 is realized by loading an object detection program onto the processor 161 or the memory 162 and causing the processor 161 to execute the program. The processor 161 operates the input/output port 164 or the communication port 165 according to the object detection program, and reads and writes data in the memory 162 or the storage 163.
In one example, the object detection program includes a data generation program, a model generation program, and an estimation program. The data generation program includes code for realizing each functional module of the data generation system 10. The model generation program includes code for realizing each functional module of the model generation system 20. The estimation program includes code for realizing each functional module of the estimation system 30.
The object detection program may be provided after being fixedly recorded on a non-transitory recording medium such as a CD-ROM, a DVD-ROM, or a semiconductor memory. Alternatively, the object detection program may be provided via a communication network as a data signal superimposed on a carrier wave. The data generation program, the model generation program, and the estimation program may be provided separately. Alternatively, at least two of these three types of programs may be provided as one package.
(Object detection method)
As an example of the object detection method according to the present disclosure, an example of a processing procedure executed by the object detection system 1 will be described with reference to FIG. 3 . FIG. 3 is a flowchart showing an example of processing in the object detection system 1 as a processing flow S1. That is, the object detection system 1 executes the processing flow S1.
In step S11, the data generation system 10 associates a label with a given image to generate a first training image. In one example, this processing is performed by the display control unit 11 and the labeling unit 12. The display control unit 11 displays, on the monitor 120, a label user interface for attaching a label to a given image. Via the label user interface, the user inputs a label indicating whether or not the object is present in the image. The labeling unit 12 attaches the label to the image to generate a first training image, and stores the first training image in the first image database 51 as at least part of the first teacher data.
In step S12, the data generation system 10 executes machine learning based on the first teacher data including at least one first training image to generate the machine learning model 41. In one example, this processing is performed by the preparation unit 13. The preparation unit 13 accesses the first image database 51 and executes the following processing for each first training image. That is, the preparation unit 13 inputs the first training image to a first reference model, which is the computational model on which the machine learning model 41 is based, and obtains the estimation result of the class value output from the first reference model. The preparation unit 13 executes backpropagation based on the error between the estimation result and the label (correct answer) to update the parameter group in the first reference model. The preparation unit 13 repeats this learning until a given termination condition is satisfied to obtain the machine learning model 41. In one example, the machine learning model 41 is a computational model estimated to be optimal for detecting the presence or absence of an object based on an image. Note that the machine learning model 41 is not necessarily a "computational model that is actually optimal".
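A minimal sketch of the step S12 loop described above, assuming PyTorch: the first reference model is run forward on a first training image, the class-value estimate is compared with the label, and backpropagation updates the parameter group until a termination condition (here simply a fixed number of epochs) is met. The optimizer, loss function, and data loader are implementation choices, not requirements of the disclosure.

```python
# Sketch of the step S12 training loop: forward pass, compare the class-value
# estimate with the label (correct answer), backpropagate, repeat.
import torch
import torch.nn as nn

def train_presence_model(model, loader, epochs: int = 10, lr: float = 1e-3):
    criterion = nn.CrossEntropyLoss()            # error between estimate and label
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                      # stands in for the termination condition
        for images, labels in loader:            # first training images and their labels
            optimizer.zero_grad()
            scores = model(images)               # estimated class values
            loss = criterion(scores, labels)
            loss.backward()                      # backpropagation
            optimizer.step()                     # update the parameter group
    return model                                 # plays the role of machine learning model 41
```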
In step S13, the data generation system 10 generates second training images by annotation using the machine learning model 41. In one example, this processing is performed by the detection unit 14, the specifying unit 15, and the annotation unit 16. The details of this processing will be described later.
In step S14, the model generation system 20 executes machine learning based on the second teacher data including at least one second training image to generate the trained model 42. In one example, this processing is performed by the learning unit 21. The learning unit 21 accesses the second image database 52 and executes the following processing for each second training image. That is, the learning unit 21 inputs the second training image to a second reference model, which is the computational model on which the trained model 42 is based, and obtains the estimation result of the position of the object output from the second reference model. The estimation result may further indicate a class value indicating the type of the object. The learning unit 21 executes backpropagation based on the error between the estimation result and the annotation (correct answer) to update the parameter group in the second reference model. The learning unit 21 repeats this learning until a given termination condition is satisfied to obtain the trained model 42. In one example, the trained model 42 is a computational model estimated to be optimal for detecting the position of an object based on an image. Note that the trained model 42 is not necessarily a "computational model that is actually optimal".
In step S15, the estimation system 30 performs estimation using the trained model 42. In one example, this processing is performed by the estimation unit 31. The estimation unit 31 inputs a target image to the trained model 42 and detects at least the position of the object in the target image. The estimation unit 31 outputs the detected position as an estimation result. The estimation unit 31 may further detect class values, in which case the estimation result may also include class values. The estimation unit 31 may generate the estimation result by superimposing a bounding box or the like indicating the position of the object on the target image, and may display the estimation result on the monitor 120. Alternatively, the estimation unit 31 may store the estimation result in a recording medium such as the storage 163, or may transmit the estimation result to another computer. The estimation unit 31 may perform the detection for each of a plurality of target images.
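As a sketch of step S15, the snippet below runs a detection model on a target image and reads out positions (bounding boxes), class values, and scores. A torchvision Faster R-CNN is used purely as a stand-in for the trained model 42; the disclosure does not prescribe this architecture, and the confidence threshold is an assumption.

```python
# Step S15 sketch: detect object positions and class values in a target image.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

target_image = torch.rand(3, 480, 640)          # placeholder for a real image tensor in [0, 1]
with torch.no_grad():
    (result,) = model([target_image])           # one result dict per input image

for box, label, score in zip(result["boxes"], result["labels"], result["scores"]):
    if score >= 0.5:                            # confidence threshold (assumption)
        print(int(label), [round(v, 1) for v in box.tolist()], float(score))
```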
(data generation method)
Details of step S13 will be described with reference to FIG. 4 as an example of the data generation method according to the present disclosure. FIG. 4 is a flowchart showing an example of annotation processing.
In step S131, the detection unit 14 acquires one input image. The input image may be a still image or one frame image forming a video. The detection unit 14 may receive an input image sent from a camera or another computer. Alternatively, the detection unit 14 may accept an input image entered by a user, or may read an input image from a given storage device based on user input.
In step S132, the detection unit 14 inputs the input image to the machine learning model 41 and detects the presence or absence of the object. That is, the detection unit 14 detects whether or not the object appears in the input image. The detection unit 14 may use the machine learning model 41 to detect the presence or absence of each of a plurality of types of objects in the input image. The detection unit 14 may specify a class value for each detected object.
In step S133, the specifying unit 15 executes a visualization method on the machine learning model 41 that has executed the detection, and calculates the degree of attention for each of the plurality of pixels forming the input image. In the present disclosure, the degree of attention refers to an index indicating the degree to which a pixel was noticed by the machine learning model 41. The degree of attention can be said to be an index indicating how much a pixel influenced the determination by the machine learning model 41, and can also be said to be an index representing the grounds for the determination by the machine learning model 41. A pixel with a higher degree of attention has a greater influence on the determination by the machine learning model 41.
The visualization method is executed based on values calculated in the machine learning model 41 that processed the input image, for example, calculated values corresponding to individual nodes and individual edges in the neural network. In one example, the specifying unit 15 uses Class Activation Mapping (CAM) as the visualization method. CAM is a method for visualizing the grounds for a determination by a neural network based on feature maps and the weights corresponding to the edges from Global Average Pooling (GAP) to the detected class. The specifying unit 15 may use Grad-CAM (Gradient-weighted CAM). Grad-CAM is a method that substitutes the gradients obtained during backpropagation for the weights used in the CAM calculation, which makes it possible to visualize the grounds for determinations in various types of neural networks. The specifying unit 15 may use Grad-CAM++, Score-CAM, Ablation-CAM, Eigen-CAM, or Integrated Grad-CAM.
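The snippet below is one compact way to obtain the per-pixel degree of attention described in step S133, using a Grad-CAM-style computation with forward and backward hooks in PyTorch. The choice of target layer and the normalization to [0, 1] are assumptions; dedicated libraries (for example pytorch-grad-cam) provide equivalent functionality, and any of the CAM variants listed above could be substituted.

```python
# Grad-CAM-style per-pixel attention map via hooks (a sketch, not the
# disclosure's mandated implementation).
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    activations, gradients = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: activations.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))
    try:
        scores = model(image)                      # forward pass, shape (1, num_classes)
        model.zero_grad()
        scores[0, class_idx].backward()            # gradient of the detected class score
    finally:
        h1.remove(); h2.remove()
    acts, grads = activations[0], gradients[0]     # both (1, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True) # GAP of gradients as channel weights
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0].detach()                      # per-pixel degree of attention in [0, 1]
```

The returned map can then be thresholded as described in step S134.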
In step S134, the specifying unit 15 selects, from the plurality of pixels of the input image, one or more pixels whose degree of attention is equal to or greater than a given threshold Ta as pixels of interest. That is, the specifying unit 15 selects, as pixels of interest, pixels that have a relatively large influence on the determination by the machine learning model 41.
In step S135, the specifying unit 15 specifies a region of interest based on the set of one or more selected pixels of interest. In one example, the specifying unit 15 specifies a dense region, which is a region where a plurality of pixels of interest are concentrated, as the region of interest. A dense region is a limited range in which a plurality of pixels of interest are gathered at a density equal to or higher than a given reference value. At least part of the dense region may be formed by two or more pixels of interest that are contiguous. In one example, the specifying unit 15 may specify at least one dense region by clustering the pixels of interest and specify that dense region as the region of interest. The specifying unit 15 may calculate the area of each of one or more dense regions and specify a dense region whose area is equal to or larger than a given threshold Tb as the region of interest. Alternatively, the specifying unit 15 may calculate, for each of one or more dense regions, the area of the circumscribed shape of the dense region, and specify a dense region whose circumscribed-shape area is equal to or larger than a given threshold Tc as the region of interest. The circumscribed shape may be a circumscribed rectangle or a circumscribed circle.
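A sketch of steps S134 and S135 under simple assumptions: the attention map is thresholded with Ta to obtain pixels of interest, connected components stand in for the dense regions, and a region is kept when the area of its circumscribed rectangle reaches Tc. OpenCV is an implementation choice; Ta and Tc are tuning parameters, not values given in the disclosure.

```python
# Threshold the attention map, group pixels of interest into regions, and
# keep regions whose circumscribed rectangle is large enough.
import cv2
import numpy as np

def regions_of_interest(attention: np.ndarray, ta: float = 0.5, tc: int = 100):
    mask = (attention >= ta).astype(np.uint8)          # pixels of interest (threshold Ta)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    boxes = []
    for i in range(1, n):                              # label 0 is the background
        x, y, w, h = (stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP],
                      stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_HEIGHT])
        if w * h >= tc:                                # circumscribed-rectangle area vs Tc
            boxes.append((int(x), int(y), int(x + w), int(y + h)))
    return boxes                                       # regions of interest as x_min, y_min, x_max, y_max
```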
As shown in steps S133 to S135, the specifying unit 15 specifies, from the input image, a region noticed by the machine learning model 41 in the detection as a region of interest. The specifying unit 15 can specify a plurality of regions of interest from one input image. The specifying unit 15 may specify a region of interest for each of one or more types of objects detected among the plurality of types of objects. The specifying unit 15 may specify the region of interest based on a plurality of degrees of attention corresponding to the plurality of pixels forming the input image. The specifying unit 15 may execute Grad-CAM on the machine learning model to specify the region of interest.
In step S136, the annotation unit 16 associates an annotation with the region of interest. In one example, the annotation unit 16 associates, with the region of interest, an annotation indicating a class value representing the type of the object and the position of the object. When a plurality of regions of interest are specified, the annotation unit 16 associates an annotation with each region of interest. The annotation unit 16 may associate, with each of the one or more regions of interest corresponding to the detected one or more types of objects, a different annotation for each type of object. An "annotation that differs for each type of object" is, for example, an annotation including a class value indicating the type of the object. The annotation unit 16 associates the annotation corresponding to the object detected by the machine learning model 41 with the region of interest.
In one example, the annotation unit 16 may associate, with the region of interest, an annotation including a graphic representation corresponding to the region of interest. This graphic representation can be used to indicate the position of the object in the input image. For example, the annotation unit 16 may generate a graphic representation that is drawn corresponding to the position of the region of interest. This graphic representation is, for example, a bounding box. The graphic representation may be a shape, such as a circumscribed shape, that encloses the entire region of interest. Alternatively, the graphic representation may be a shape that partially overlaps the region of interest, in which case the region of interest may extend beyond the graphic representation. Alternatively, the graphic representation may be a shape that encloses the entire object located in the region of interest, or a shape that overlaps part of the object. The annotation unit 16 may generate a graphic representation that provides a visual effect such as blinking or highlighting. In one example, the annotation unit 16 may associate, with the region of interest, an annotation that indicates the position of the object and does not include a class value.
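Continuing the hypothetical record format shown earlier, step S136 can be pictured as attaching a class value and a bounding-box graphic representation to each region of interest. The helper below is illustrative only; the disclosure does not mandate this layout.

```python
# Attach an annotation (class value + circumscribed rectangle) to each region
# of interest for one input image. Purely illustrative.
def annotate(image_path, regions, class_value):
    return {
        "image_path": image_path,
        "annotations": [
            {"class_value": class_value, "bbox": list(region)}  # circumscribed rectangle
            for region in regions
        ],
    }

# Example: two regions of interest detected for class value 1.
record = annotate("images/0002.png", [(120, 80, 210, 190), (300, 60, 380, 150)], class_value=1)
print(record)
```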
As an example of the processing for associating an annotation with a region of interest, the annotation unit 16 may execute segmentation, which sets the annotation in units of pixels. In this case, the annotation unit 16 may generate a graphic representation corresponding to the segment as an example of a graphic representation corresponding to the region of interest.
In step S137, the display control unit 11 accepts correction of the annotation. The display control unit 11 displays, on the monitor 120, a correction user interface for allowing the user to correct the annotation associated with the region of interest by the annotation unit 16. For example, the display control unit 11 displays the input image on which the annotation is superimposed on the correction user interface and receives user input for correcting the annotation. Using this correction user interface, the user can modify the position or dimensions of a graphic representation such as a circumscribed shape, modify class values, and add or delete annotations. When the user corrects the annotation, the annotation unit 16 corrects the annotation based on the user input. The user does not have to correct the annotation, in which case the annotation unit 16 does not execute the correction processing.
In step S138, the annotation unit 16 stores the processed input image, that is, the annotated input image, in the second image database 52 as a second training image.
In step S139, the detection unit 14 determines whether or not all input images have been processed. If there is an unprocessed input image (NO in step S139), the processing returns to step S131. In step S131, the detection unit 14 acquires the next input image, and the processing of steps S132 to S138 is executed based on this input image. When all input images have been processed (YES in step S139), the data generation system 10 ends the processing of step S13.
FIG. 5 is a diagram showing an example of specifying a region of interest from an input image. In this example, the data generation system 10 performs annotation on an input image 200. The input image 200 is an image captured by a camera that captures the periphery of an end effector 3b of a robot, and shows three decorations 201 to 203. The decorations 201 and 202 are a product Pa, and the decoration 203 is a product Pb. In this example, the detection unit 14 detects the product Pa as the object.
The detection unit 14 inputs the input image 200 to the machine learning model 41 and detects the decorations 201 and 202 as objects (step S132). The specifying unit 15 executes a visualization method such as Grad-CAM on the machine learning model 41 that has processed the input image 200, and calculates the degree of attention for each pixel of the input image 200 (step S133). An image 210 is a heat map that visualizes the degree of attention of each pixel. Looking at this image 210, it can be seen that the degree of attention is high in a pixel group 211 located in the region of the decoration 201 and in a pixel group 212 located in the region of the decoration 202. The specifying unit 15 selects pixels whose degree of attention is equal to or higher than a given threshold as pixels of interest, and specifies regions of interest based on the set of pixels of interest (for example, dense regions) (steps S134 and S135). An image 220 shows a dense region 221 obtained from the pixel group 211 corresponding to the decoration 201 and a dense region 222 obtained from the pixel group 212 corresponding to the decoration 202. Based on the area of the dense region itself or the area of the circumscribed shape of the dense region, the specifying unit 15 determines whether or not to specify the dense region as a region of interest. The annotation unit 16 associates an annotation with each region of interest (step S136). An image 230 shows that the dense regions 221 and 222 are specified as regions of interest, an annotation 231 is associated with the dense region (region of interest) 221, and an annotation 232 is associated with the dense region (region of interest) 222. In this example, the annotations 231 and 232 related to the product Pa are represented by circumscribed rectangles of the dense regions (regions of interest). The annotation unit 16 can correct at least one of the annotations 231 and 232 based on user input (step S137).
(Manufacturing method of learned model)
Steps S13 and S14 are also an example of the method for producing a trained model according to the present disclosure. For example, the production method is realized as follows. That is, the data generation system 10 uses the machine learning model 41 to detect the presence or absence of the object in the input image (step S132). The data generation system 10 specifies, from the input image, the region noticed by the machine learning model 41 in the detection as a region of interest (steps S133 to S135). The data generation system 10 associates the annotation corresponding to the detected object with the region of interest (step S136). The model generation system 20 generates the trained model 42 based on the second teacher data including the input images in which annotations are associated with regions of interest (that is, the second training images). In other words, in one example, the method for producing a trained model according to the present disclosure includes the data generation method according to the present disclosure, and generating, based on teacher data including an input image in which an annotation is associated with a region of interest by the data generation method, a trained model for detecting at least the position of the object from an image.
[Robot system]
(system overview)
In one example, an example robotic control system according to the present disclosure is shown as a component of the robotic system 2 . The robot system 2 is a mechanism for automating given work by causing a robot to execute a task, which is a series of processes for achieving a given purpose. In one example, the robot system 2 actively extracts an area in which the robot is expected to be able to perform a task, i.e., an area in which it is expected to process an object, as a task area in a situation where the surrounding environment of the robot is unknown. , the robot approaches the object (task area). In order to extract a task area, the robot system 2 uses the mechanism of the data generation system 10 that identifies the attention area from the input image. That is, the robot system 2 detects the presence or absence of an object in the input image, specifies an area of interest from the input image as an area of interest by the machine learning model in this detection, and selects a task area based on the area of interest. In one example, the robot system 2 makes the robot approach the object while repeatedly extracting the task area. When the robot reaches the task area, the robot system 2 causes the robot to perform the task. In this disclosure, when the robot reaches the task area, it means that the robot has come close enough to the object to perform the task.
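The approach-and-execute behavior described above can be pictured as a simple loop: capture an image, detect the object, specify the region of interest (used as the task area), and either move closer or execute the task. Every name in the sketch below (camera, detector, specifier, controller) is a placeholder; the actual sensing and motion interfaces depend on the robot system.

```python
# High-level sketch of the active-sensing approach loop. All objects passed
# in are placeholders for the real camera, model, and robot interfaces.
def approach_and_execute(camera, detector, specifier, controller, max_steps=20):
    for _ in range(max_steps):
        image = camera.capture()                 # new input image
        if not detector.detect(image):           # presence/absence of the object
            controller.search()                  # keep exploring (active sensing)
            continue
        region = specifier.specify(image)        # region of interest -> task area
        if controller.reached(region):           # close enough to perform the task
            controller.execute_task(region)
            return True
        controller.approach(region)              # move the robot toward the region
    return False                                 # give up after max_steps iterations
```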
In one example, the task includes a step in which the robot 3 comes into contact with the object. Examples of such tasks include a task that involves grasping the object and a task that involves pushing or pulling the object. When the robot 3 is caused to execute such a task, the robot system 2 extracts, as a task area, an area where the robot 3 can come into contact with the object in order to execute the task.
In order to recognize the unknown surrounding environment, the robot system 2 uses active sensing, which is a technique for searching for and collecting necessary information by actively changing the conditions of a sensor. This technique allows the robot to recognize the goal it should aim for even when the conditions regarding the object or the surrounding environment change frequently, or when modeling them in advance is impossible or difficult. Active sensing is a technique for discovering an unknown goal, and therefore differs from visual feedback, which positions a mechanical system toward a known goal.
(System configuration)
FIG. 6 is a diagram showing an example of the functional configuration of the robot system 2. In this example, the robot system 2 includes a robot control system 60, one or more robots 3, and one or more robot controllers 4 corresponding to the one or more robots 3. FIG. 6 shows one robot 3 and one robot controller 4, and shows a configuration in which one robot 3 is connected to one robot controller 4. However, neither the number of devices nor the connection method is limited to the example of FIG. 6. For example, one robot controller 4 may be connected to a plurality of robots 3. The communication network connecting the devices may be a wired network or a wireless network. The communication network may include at least one of the Internet and an intranet. Alternatively, the communication network may simply be realized by a single communication cable.
The robot control system 60 is a computer system for operating the robot 3 autonomously in at least some situations. The robot control system 60 executes given computations to generate command signals for controlling the robot 3. In one example, a command signal includes data for controlling the robot 3, for example a path indicating the trajectory of the robot 3. The trajectory of the robot 3 refers to the path of movement of the robot 3 or of its components; for example, it can be the trajectory of the tip of the robot. The robot control system 60 transmits the generated command signal to the robot controller 4.
The robot controller 4 is a device that operates the robot 3 in accordance with command signals from the robot control system 60. In one example, the robot controller 4 calculates joint angle targets (angle targets for each joint of the robot 3) for matching the position and posture of the tip with the target values indicated by the command signal, and controls the robot 3 in accordance with those angle targets.
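As a simplified illustration of this kind of computation, the following sketch solves the inverse kinematics of a hypothetical two-link planar arm in closed form. The link lengths and the planar geometry are assumptions made only for illustration; they do not reflect the actual kinematics of the robot 3 or the interface of the robot controller 4.

```python
import math

def joint_angle_targets(x, y, l1=0.4, l2=0.3):
    """Closed-form inverse kinematics for an assumed two-link planar arm.

    Returns joint angle targets (theta1, theta2) in radians that place the
    arm tip at the commanded position (x, y).
    """
    d2 = x * x + y * y
    # Law of cosines for the elbow joint.
    c2 = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target position is out of reach")
    theta2 = math.acos(c2)                      # elbow-down solution
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2

# Example: joint targets that move the tip to (0.5, 0.2).
print(joint_angle_targets(0.5, 0.2))
```

A real six- or seven-axis articulated arm would require a numerical solver; the planar case is shown only because it admits a compact closed-form solution.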
The robot 3 is a device or machine that works on behalf of humans. In one example, the robot 3 is a multi-axis serial-link vertical articulated robot. The robot 3 includes a manipulator 3a and an end effector 3b, which is a tool attached to the tip of the manipulator 3a. The robot 3 can execute various processes using the end effector 3b and can freely change the position and posture of the end effector 3b within a given range. The robot 3 may be a six-axis vertical articulated robot, or a seven-axis vertical articulated robot in which one redundant axis is added to the six axes.
The robot 3 operates under the control of the robot control system 60 to execute a given task. By the robot 3 executing the task, the result desired by the user of the robot system 2 is obtained. For example, a task is set in order to process some object, in which case the robot 3 processes that object. Examples of tasks include "grasp an object and place it on a conveyor", "pick up an object and attach it to a workpiece", and "spray-paint an object".
An example of a sensor for recognizing the three-dimensional space around the robot is a visual sensor such as a camera. In one example, the robot includes a camera 3c that captures the surroundings of the end effector 3b. The field of view of the camera 3c may be set so as to capture at least part of the end effector 3b. The camera 3c may be arranged on the manipulator 3a, for example attached near the tip of the manipulator 3a. In one example, the camera 3c moves in response to the motion of the robot 3; this movement may include a change in at least one of the position and the posture of the camera 3c. As long as it moves in response to the motion of the robot 3, the camera 3c may be provided at a location other than the robot 3. For example, the camera 3c may be attached to another robot, or may be movably provided on a ceiling, a wall, or a camera stand.
In one example, the robot control system 60 includes, as functional modules, a display control unit 11, a labeling unit 12, a preparation unit 13, a detection unit 14, an identification unit 15, and a robot control unit 61. The display control unit 11, the labeling unit 12, the preparation unit 13, the detection unit 14, and the identification unit 15 are the same as the corresponding functional modules of the data generation system 10. The display control unit 11 displays a labeling user interface. The labeling unit 12 generates a first training image by assigning a label input through the labeling user interface to a given image and stores the first training image in the first image database 51. The preparation unit 13 executes machine learning based on the labeled first training images to generate the machine learning model 41. The detection unit 14 uses the machine learning model 41 to detect the presence or absence of an object in an input image. The identification unit 15 identifies, as a region of interest, the region noticed by the machine learning model 41 in that detection from the input image. The robot control unit 61 is a functional module that controls the robot 3, which processes the object, based on the region of interest.
In one example, the robot system 2 (robot control system 60) can access the first image database 51. The first image database 51 may be provided outside the robot system 2 or the robot control system 60, or may be part of the robot system 2 or the robot control system 60. The first image database 51 is a device that stores a plurality of first training images, each assigned a label indicating the presence or absence of an object, as first teacher data used to generate the machine learning model 41.
Like the object detection system 1, the robot control system 60 may be implemented by any kind of computer, for example by the computer 100 shown in FIG. 2. Each functional module of the robot control system 60 is implemented by loading a robot control program onto the processor 161 or the memory 162 and causing the processor 161 to execute that program. The processor 161 operates the input/output port 164 or the communication port 165 in accordance with the robot control program and reads and writes data in the memory 162 or the storage 163. Like the object detection program, the robot control program may be provided on a non-transitory recording medium or via a communication network.
(Robot control method)
As an example of the robot control method according to the present disclosure, an example of a processing procedure executed by the robot system 2 (robot control system 60) will be described with reference to FIG. 7. FIG. 7 is a flowchart showing an example of processing in the robot system 2 (robot control system 60) as a processing flow S2. That is, the robot system 2 (robot control system 60) executes the processing flow S2.
The processing flow S2 is executed on the premise that the machine learning model 41 has already been prepared. In one example, the machine learning model 41 generated in steps S11 and S12 of the processing flow S1 is used in the processing flow S2.
In step S21, the detection unit 14 acquires one input image. This processing is the same as step S131. The input image may be a still image or a single frame of a video. For example, the detection unit 14 may receive an input image sent from the camera 3c.
In step S22, the detection unit 14 inputs the input image to the machine learning model 41 and detects the presence or absence of the object. This processing is the same as step S132.
In step S23, the identification unit 15 executes a visualization method on the machine learning model 41 that performed the detection, and calculates an attention degree for each of the plurality of pixels forming the input image. This processing is the same as step S133.
In step S24, the identification unit 15 selects, from the plurality of pixels of the input image, one or more pixels whose attention degree is equal to or greater than a given threshold Ta as pixels of interest. This processing is the same as step S134.
In step S25, the identification unit 15 identifies a region of interest based on the set of one or more selected pixels of interest. This processing is the same as step S135.
As shown in steps S23 to S25, the identification unit 15 identifies, as a region of interest, the region noticed by the machine learning model 41 in the detection from the input image. The identification unit 15 may identify a plurality of regions of interest from one input image. The identification unit 15 may identify the region of interest based on a plurality of attention degrees corresponding to the plurality of pixels forming the input image, and may execute Grad-CAM on the machine learning model to identify the region of interest. These points regarding steps S23 to S25 are also the same as for steps S133 to S135.
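The following is a minimal sketch of how steps S23 to S25 could be realized with Grad-CAM for a CNN-based classifier. The torchvision ResNet-18 backbone, the two-class output, and the choice of the last convolutional block as the target layer are assumptions standing in for the machine learning model 41, not its actual structure.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Assumed stand-in for the machine learning model 41: a CNN classifier whose
# two outputs indicate the presence or absence of the object.
model = models.resnet18(weights=None, num_classes=2).eval()
target_layer = model.layer4            # assumed last convolutional block

activations, gradients = {}, {}

def _save(module, inputs, output):
    activations["value"] = output
    # Capture the gradient flowing back through this activation.
    output.register_hook(lambda grad: gradients.update(value=grad))

target_layer.register_forward_hook(_save)

def attention_map(image):
    """Per-pixel attention degrees in [0, 1] for a (1, 3, H, W) image."""
    logits = model(image)
    score = logits[0, logits.argmax(dim=1)]     # score of the detected class
    model.zero_grad()
    score.backward()
    acts, grads = activations["value"], gradients["value"]
    weights = grads.mean(dim=(2, 3), keepdim=True)        # GAP of gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    return (cam / (cam.max() + 1e-8))[0, 0].detach()      # (H, W) map

heatmap = attention_map(torch.rand(1, 3, 224, 224))
print(heatmap.shape)    # torch.Size([224, 224])
```

Thresholding this heatmap with Ta and grouping the remaining pixels then yields the regions of interest, as sketched later for the dense-region variant.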
In step S26, the robot control unit 61 selects a task area based on the region of interest. For example, the robot control unit 61 selects one of the one or more regions of interest as the task area. When a plurality of regions of interest have been identified, the robot control unit 61 may select the region of interest whose circumscribed rectangle has the largest area, or may select the region of interest that itself has the largest area. When the task area is selected in this way, there is a high probability that the region of the object closest to the robot 3 within the field of view will be selected as the task area.
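A possible implementation of this selection rule, assuming the attention map has already been computed, is sketched below using OpenCV; the threshold value and the use of contours are assumptions made for illustration.

```python
import cv2
import numpy as np

def select_task_area(attention, threshold=0.5):
    """Pick the region of interest whose circumscribed rectangle is largest.

    attention: (H, W) array of per-pixel attention degrees in [0, 1].
    Returns the bounding box (x, y, w, h) of the selected region, or None.
    """
    mask = (attention >= threshold).astype(np.uint8)      # assumed threshold Ta
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    boxes = [cv2.boundingRect(c) for c in contours]
    return max(boxes, key=lambda b: b[2] * b[3])          # largest w * h

print(select_task_area(np.random.rand(120, 160)))
```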
In step S27, the robot control unit 61 determines whether the robot 3 has reached the task area. For example, the robot control unit 61 may calculate the distance between the end effector 3b and the task area and make the determination based on this distance. If the calculated distance is equal to or less than a given threshold Td, the robot control unit 61 determines that the robot 3 has reached the task area; if the calculated distance is greater than the threshold Td, it determines that the robot 3 has not reached the task area.
If it is determined that the robot 3 has not reached the task area (NO in step S27), the processing proceeds to step S28. In step S28, the robot control unit 61 controls the robot 3 toward the task area as an example of controlling the robot 3 based on the region of interest. That is, the robot control unit 61 controls the robot 3 so that the robot 3 approaches the object. In one example, the robot control unit 61 generates, by planning, a path for the robot 3 that moves the end effector 3b from its current position to the task area. Alternatively, the robot control unit 61 may generate, by planning, a path (trajectory) for the robot 3 that reduces the distance to the task area while keeping the task area at the center of the image of the camera 3c. The robot control unit 61 outputs a command signal indicating the generated path to the robot controller 4, and the robot controller 4 controls the robot 3 in accordance with that command signal. As a result, the robot 3 approaches the object along the path.
After step S28, the processing returns to step S21, and the robot control system 60 executes the processing from step S21 onward again. In this repetition, the detection unit 14 acquires a new input image (step S21) and further detects the presence or absence of the object in that input image using the machine learning model 41 (step S22). For example, the detection unit 14 processes, as the new input image, an input image captured after the robot 3 has approached the object. The identification unit 15 identifies, from the new input image, the region noticed by the machine learning model 41 in that detection as a new region of interest (steps S23 to S25). The robot control unit 61 further controls the robot 3 based on the new region of interest (step S26 onward).
If it is determined that the robot 3 has reached the task area (YES in step S27), the processing proceeds to step S29. In step S29, the robot control unit 61 causes the robot 3 to execute the task. In one example, the robot control unit 61 generates a path for executing the task by planning and outputs a command signal indicating that path to the robot controller 4. The robot controller 4 controls the robot 3 in accordance with the command signal. As a result, the robot 3 executes the task.
In step S30, the robot control unit 61 determines whether to end the robot control. The robot control unit 61 may make this determination based on any termination condition. For example, the robot control unit 61 may determine to end the robot control when a specified number of tasks has been executed, and to continue the robot control when the number of executed tasks is less than that specified number. Alternatively, the robot control unit 61 may determine to end the robot control when an error has occurred in the robot control, and to continue it when no such error has occurred.
If it is determined that the robot control is to be ended (YES in step S30), the processing proceeds to step S31. In step S31, the robot control system 60 executes end processing. In this end processing, the robot control unit 61 may return the robot 3 to its initial posture and position. Alternatively, the robot control unit 61 may notify the user by visual information or sound that all tasks have been completed.
If the robot control is to be continued (NO in step S30), the processing proceeds to step S32. In step S32, the robot control unit 61 prepares for the next task. For example, the robot control unit 61 may return the robot 3 to its initial posture and position. Alternatively, the robot control unit 61 may notify the user by visual information or sound that the next task is to be executed.
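Putting steps S21 to S32 together, the following self-contained sketch simulates the overall loop of processing flow S2 in two dimensions and with a single task. The robot state, the noisy region-of-interest estimate, the step size, and the thresholds are all assumptions made only to keep the sketch runnable; they do not model the actual robot 3, the camera 3c, or the machine learning model 41.

```python
import math
import random

TD = 0.05     # assumed reach threshold Td (step S27)
STEP = 0.1    # assumed approach step per control cycle (step S28)

def identify_attention_region(ball):
    # Stand-in for steps S21-S25: the region of interest is a noisy
    # image-based estimate of where the object (the ball) currently appears.
    return (ball[0] + random.gauss(0, 0.01), ball[1] + random.gauss(0, 0.01))

def run_flow_s2(ball=(0.8, 0.5)):
    tip = [0.0, 0.0]    # assumed end-effector position
    while True:
        rx, ry = identify_attention_region(ball)         # S21-S26: task area
        dist = math.hypot(rx - tip[0], ry - tip[1])
        if dist <= TD:                                    # S27: reached?
            print("S29: executing task near", (round(rx, 3), round(ry, 3)))
            break                                         # S30/S31: finish
        # S28: approach the task area, then sense again (back to S21).
        step = min(STEP, dist)
        tip[0] += step * (rx - tip[0]) / dist
        tip[1] += step * (ry - tip[1]) / dist

run_flow_s2()
```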
An example of the motion of the robot based on the processing flow S2 will be described with reference to FIG. 8. FIG. 8 is a diagram showing an example of robot control by the robot control system 60. In this example, the robot 3 executes the task of putting balls located around it into a box 410; that is, the object in this example is a ball. FIG. 8 represents a series of actions of the robot 3 in order through scenes S301 to S304. The following description also indicates the correspondence with the processing flow S2.
In scene S301, the detection unit 14 acquires an input image showing a ball 421 (step S21) and uses the machine learning model 41 to detect the presence or absence of the object (the ball) in that input image (step S22). The identification unit 15 identifies the region of interest of the machine learning model 41 in that detection (steps S23 to S25). This region of interest corresponds to the position where the ball 421 exists. The robot control unit 61 selects the task area corresponding to the ball 421 based on that region of interest (step S26).
In scene S302, the robot control unit 61 controls the robot 3 so that it approaches the ball 421 (steps S27 and S28). Through this control, the distance between the end effector 3b and the ball 421 becomes shorter. The processing of steps S21 to S28 is then repeated.
In scene S303, in response to the robot 3 reaching the task area (YES in step S27), the robot control unit 61 causes the robot 3 to execute the task (step S29).
From scene S303 to scene S304, the robot 3 executes the task under the control of the robot control unit 61 (step S29). The robot 3 grips the ball 421 with the end effector 3b and puts the ball 421 into the box 410.
[Effects]
As described above, a data generation system according to one aspect of the present disclosure includes a detection unit that detects the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image, an identification unit that identifies, as a region of interest, the region noticed by the machine learning model in the detection from the input image, and an annotation unit that associates an annotation corresponding to the detected object with the region of interest.
A data generation method according to one aspect of the present disclosure is executed by a data generation system including at least one processor. This data generation method includes a step of detecting the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image, a step of identifying, as a region of interest, the region noticed by the machine learning model in the detection from the input image, and a step of associating an annotation corresponding to the detected object with the region of interest.
A data generation program according to one aspect of the present disclosure causes a computer to execute a step of detecting the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image, a step of identifying, as a region of interest, the region noticed by the machine learning model in the detection from the input image, and a step of associating an annotation corresponding to the detected object with the region of interest.
In these aspects, an annotation corresponding to the object is automatically associated with the region noticed by the machine learning model in detecting the object. Annotation of objects appearing in images can therefore be made more efficient.
A data generation system according to another aspect may further include a preparation unit that executes machine learning based on a plurality of training images, each assigned a label indicating the presence or absence of the object, to generate the machine learning model. With this configuration, the machine learning model for detecting the presence or absence of the object, that is, the machine learning model used to obtain the region of interest, can be generated.
A data generation system according to another aspect may further include a display control unit that displays a labeling user interface for assigning a label to a given image, and a labeling unit that assigns the label input through the labeling user interface to the given image to generate a training image. This configuration makes it possible to prepare training images as the user desires.
A data generation system according to another aspect may further include a display control unit that displays a correction user interface for allowing the user to correct the annotation, and the annotation unit may correct the annotation based on user input via the correction user interface. With this configuration, the user is given an opportunity to correct the automatically set annotation, and since the annotation is corrected in response to user input, the accuracy of the annotation can be improved.
In a data generation system according to another aspect, the annotation unit may associate a graphic representation corresponding to the region of interest with the region of interest as at least part of the annotation. Annotation using a graphic representation can indicate the object in the input image in an easily understandable way.
In a data generation system according to another aspect, the annotation unit may generate the circumscribed shape of the region of interest as the graphic representation. This circumscribed shape can clearly indicate the position of the object in the input image.
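As an illustration, the following sketch attaches a circumscribed rectangle to a region of interest and records it together with a class value. The binary region mask and the annotation dictionary format are assumptions made for the sketch, not a format defined in the present disclosure.

```python
import cv2
import numpy as np

def annotate_with_circumscribed_box(image, region_mask, class_value=1):
    """Attach a circumscribed rectangle to one region of interest.

    image: BGR image (H, W, 3); region_mask: binary (H, W) mask of the region.
    Returns the image with the box drawn and an assumed annotation record.
    """
    x, y, w, h = cv2.boundingRect(region_mask.astype(np.uint8))
    annotated = image.copy()
    cv2.rectangle(annotated, (x, y), (x + w, y + h), (0, 255, 0), 2)
    annotation = {"bbox": [x, y, w, h], "class": class_value}
    return annotated, annotation

# Example with a dummy image and a rectangular region of interest.
img = np.zeros((240, 320, 3), dtype=np.uint8)
mask = np.zeros((240, 320), dtype=np.uint8)
mask[80:150, 100:180] = 1
_, ann = annotate_with_circumscribed_box(img, mask)
print(ann)   # {'bbox': [100, 80, 80, 70], 'class': 1}
```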
In a data generation system according to another aspect, the detection unit may use the machine learning model to detect the presence or absence of each of a plurality of types of objects in the input image, the identification unit may identify a region of interest for each of one or more types of objects detected among the plurality of types, and the annotation unit may associate, with each of the one or more regions of interest corresponding to the detected one or more types of objects, an annotation that differs for each type of object. In this case, annotations can be assigned to the input image so that the type of object can be distinguished.
In a data generation system according to another aspect, the annotation unit may associate a class value for identifying the type of the object with the region of interest as at least part of the annotation. With this configuration, an annotation that clearly indicates what appears where in the input image can be assigned.
In a data generation system according to another aspect, the identification unit may calculate, for each of the plurality of pixels forming the input image, an attention degree indicating the degree to which that pixel was noticed by the machine learning model, and may identify the region of interest based on the plurality of attention degrees corresponding to the plurality of pixels. Since the region of interest is identified based on the per-pixel attention degrees, it can be identified in detail.
In a data generation system according to another aspect, the identification unit may select, from the plurality of pixels, one or more pixels whose attention degree is equal to or greater than a given threshold as pixels of interest, and may identify the region of interest based on the selected one or more pixels of interest. Since pixels with relatively high attention degrees are set as the region of interest, the annotation can be associated with a position where the object is highly likely to exist. As a result, the accuracy of the annotation can be further improved.
In a data generation system according to another aspect, the identification unit may identify a dense region, which is a region in which a plurality of pixels of interest are concentrated, as the region of interest when the area of the dense region is equal to or greater than a given threshold. Since a region in which pixels of interest are concentrated over a relatively wide range is set as the region of interest, the annotation can be associated with an object that appears clearly in the input image.
In a data generation system according to another aspect, the identification unit may identify a dense region, which is a region in which a plurality of pixels of interest are concentrated, as the region of interest when the area of the circumscribed shape of the dense region is equal to or greater than a given threshold. Since the region of interest is identified based on the area of the circumscribed shape of the dense region rather than the area of the dense region itself, the area can be calculated easily, and the region of interest can be identified correspondingly faster.
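The dense-region variants described above could be realized as sketched below, using connected-component labeling from SciPy; the attention threshold Ta and the area threshold on the circumscribed rectangle are assumed values chosen only for illustration.

```python
import numpy as np
from scipy import ndimage

def dense_attention_regions(attention, ta=0.5, min_box_area=100):
    """Identify regions of interest from a per-pixel attention map.

    attention: (H, W) array of attention degrees in [0, 1].
    ta: assumed attention threshold Ta; min_box_area: assumed threshold on
    the area of each cluster's circumscribed rectangle.
    Returns a list of boolean masks, one per accepted dense region.
    """
    mask = attention >= ta                       # pixels of interest
    labels, n = ndimage.label(mask)              # dense (connected) clusters
    regions = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        box_area = (xs.max() - xs.min() + 1) * (ys.max() - ys.min() + 1)
        if box_area >= min_box_area:             # keep sufficiently large ones
            regions.append(labels == i)
    return regions

print(len(dense_attention_regions(np.random.rand(224, 224))))
```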
In a data generation system according to another aspect, the identification unit may execute Grad-CAM on the machine learning model to identify the region of interest. Using Grad-CAM makes it possible to identify regions of interest for various types of machine learning models, for example various types of neural networks.
A model generation system according to one aspect of the present disclosure includes the above data generation system, an acquisition unit that acquires teacher data including the input image in which the annotation has been associated with the region of interest by the data generation system, and a learning unit that generates, based on the teacher data, a trained model for detecting at least the position of the object from an image. In this aspect, a trained model for detecting the position of the object can be generated using teacher data including annotated input images.
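As an illustration of such a learning unit, the following sketch fine-tunes an off-the-shelf detector on teacher data of the kind produced by the annotation step. The use of torchvision's Faster R-CNN, the dummy sample, and the two-class setting are assumptions and do not represent the actual trained model 42.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Assumed stand-in for the trained model 42: an off-the-shelf detector
# fine-tuned on annotated teacher data (image plus bounding box and class).
model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None,
                                num_classes=2)        # background + object
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

# One dummy teacher sample: an image with a single annotated box.
image = torch.rand(3, 480, 640)
target = {"boxes": torch.tensor([[100.0, 80.0, 180.0, 150.0]]),  # x1, y1, x2, y2
          "labels": torch.tensor([1])}                            # class value

model.train()
for _ in range(2):                         # a couple of dummy iterations
    loss_dict = model([image], [target])   # detection losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print({k: round(v.item(), 3) for k, v in loss_dict.items()})
```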
An estimation system according to one aspect of the present disclosure includes the above model generation system and an estimation unit that inputs a target image to the trained model generated by the model generation system and detects at least the position of the object from the target image. In this aspect, the position of the object can be efficiently detected from the target image using the trained model.
A robot control system according to one aspect of the present disclosure includes a detection unit that detects the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image, an identification unit that identifies, as a region of interest, the region noticed by the machine learning model in the detection from the input image, and a robot control unit that controls a robot, which processes the object, based on the region of interest.
In this aspect, the robot is controlled based on the region noticed by the machine learning model in detecting the object, so the robot can be operated autonomously in accordance with the position of the object.
In a robot control system according to another aspect, the robot control unit may control the robot so that the robot approaches the object. In this case, the robot can be made to approach the object autonomously in accordance with the position of the object.
In a robot control system according to another aspect, the detection unit may further detect, using the machine learning model, the presence or absence of the object in a new input image acquired after the robot has approached the object; the identification unit may identify, from the new input image, the region noticed by the machine learning model in that further detection as a new region of interest; and the robot control unit may further control the robot based on the new region of interest. When the robot approaches the object, the region of interest is identified again, and the robot is further controlled based on that region of interest. This mechanism allows the robot to operate with higher precision.
A trained model manufacturing method according to one aspect of the present disclosure includes the above data generation method and a step of generating, based on teacher data including the input image in which the annotation has been associated with the region of interest by the data generation method, a trained model for detecting at least the position of the object from an image. Since input images that have been automatically annotated are used as at least part of the teacher data, a trained model for object detection can be generated efficiently.
[Modifications]
The above has been a detailed description based on embodiments of the present disclosure. However, the present disclosure is not limited to the above examples. Various modifications can be made to the present disclosure without departing from its gist.
The functional configuration of the systems according to the present disclosure is not limited to the above examples. For example, the data generation system 10 may be constructed on its own, without the model generation system 20 and the estimation system 30. In that case, the computer systems corresponding to the model generation system 20 and the estimation system 30 may be computer systems with an owner different from that of the data generation system 10. Alternatively, a combination of the data generation system 10 and the model generation system 20 may be constructed without the estimation system 30, in which case the computer system corresponding to the estimation system 30 may be a computer system with an owner different from that of the data generation system 10 and the model generation system 20. The data generation system and the robot control system according to the present disclosure need not include the display control unit 11, the labeling unit 12, and the preparation unit 13; that is, they may use a machine learning model generated by another computer system. Since machine learning models and trained models are portable between computer systems, the various systems according to the present disclosure can be implemented flexibly.
The hardware configuration of the systems according to the present disclosure is not limited to an implementation in which each functional module is realized by executing a program. For example, at least some of the functional modules described above may be configured by a logic circuit specialized for their functions, or by an ASIC (Application Specific Integrated Circuit) integrating such logic circuits.
The processing procedure of a method executed by at least one processor is not limited to the above examples. For example, some of the steps or processes described above may be omitted, or the steps may be executed in a different order. Two or more of the steps described above may be combined, and some of the steps may be modified or deleted. Alternatively, other steps may be executed in addition to the above steps.
When comparing the magnitudes of two numerical values in a computer system or a computer, either of the two criteria "equal to or greater than" and "greater than" may be used, and either of the two criteria "equal to or less than" and "less than" may be used.
DESCRIPTION OF SYMBOLS: 1... Object detection system, 2... Robot system, 3... Robot, 3b... End effector, 4... Robot controller, 10... Data generation system, 11... Display control unit, 12... Labeling unit, 13... Preparation unit, 14... Detection unit, 15... Identification unit, 16... Annotation unit, 20... Model generation system, 21... Learning unit, 30... Estimation system, 31... Estimation unit, 41... Machine learning model, 42... Trained model, 51... First image database, 52... Second image database, 60... Robot control system, 61... Robot control unit, 200... Input image, 221, 222... Dense region, 231, 232... Annotation, 421... Ball (object).

Claims (21)

1. A data generation system comprising:
a detection unit that detects the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image;
a specifying unit that specifies, as a region of interest, a region noticed by the machine learning model in the detection from the input image; and
an annotation unit that associates an annotation corresponding to the detected object with the region of interest.
2. The data generation system according to claim 1, further comprising a preparation unit that executes machine learning based on a plurality of training images, each assigned a label indicating the presence or absence of the object, to generate the machine learning model.
3. The data generation system according to claim 2, further comprising:
a display control unit that displays a labeling user interface for assigning the label to a given image; and
a labeling unit that assigns the label input through the labeling user interface to the given image to generate the training image.
4. The data generation system according to any one of claims 1 to 3, further comprising a display control unit that displays a correction user interface for allowing a user to correct the annotation,
wherein the annotation unit corrects the annotation based on user input via the correction user interface.
5. The data generation system according to any one of claims 1 to 4, wherein the annotation unit associates a graphic representation corresponding to the region of interest with the region of interest as at least part of the annotation.
6. The data generation system according to claim 5, wherein the annotation unit generates a circumscribed shape of the region of interest as the graphic representation.
7. The data generation system according to any one of claims 1 to 6, wherein:
the detection unit uses the machine learning model to detect the presence or absence of each of a plurality of types of objects in the input image;
the specifying unit specifies the region of interest for each of one or more types of objects detected among the plurality of types of objects; and
the annotation unit associates, with each of one or more regions of interest corresponding to the detected one or more types of objects, an annotation that differs for each type of object.
8. The data generation system according to any one of claims 1 to 7, wherein the annotation unit associates a class value for identifying a type of the object with the region of interest as at least part of the annotation.
9. The data generation system according to any one of claims 1 to 8, wherein the specifying unit:
calculates, for each of a plurality of pixels forming the input image, an attention degree indicating a degree to which the pixel was noticed by the machine learning model; and
specifies the region of interest based on a plurality of the attention degrees corresponding to the plurality of pixels.
10. The data generation system according to claim 9, wherein the specifying unit:
selects, from the plurality of pixels, one or more pixels whose attention degree is equal to or greater than a given threshold as pixels of interest; and
specifies the region of interest based on the selected one or more pixels of interest.
11. The data generation system according to claim 10, wherein the specifying unit specifies a dense region, which is a region in which a plurality of the pixels of interest are concentrated, as the region of interest when an area of the dense region is equal to or greater than a given threshold.
12. The data generation system according to claim 10 or 11, wherein the specifying unit specifies a dense region, which is a region in which a plurality of the pixels of interest are concentrated, as the region of interest when an area of a circumscribed shape of the dense region is equal to or greater than a given threshold.
13. The data generation system according to any one of claims 1 to 12, wherein the specifying unit executes Grad-CAM on the machine learning model to specify the region of interest.
14. A model generation system comprising:
the data generation system according to any one of claims 1 to 13;
an acquisition unit that acquires teacher data including the input image in which the annotation has been associated with the region of interest by the data generation system; and
a learning unit that generates, based on the teacher data, a trained model for detecting at least a position of the object from an image.
15. An estimation system comprising:
the model generation system according to claim 14; and
an estimation unit that inputs a target image to the trained model generated by the model generation system and detects at least the position of the object from the target image.
16. A robot control system comprising:
a detection unit that detects the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image;
a specifying unit that specifies, as a region of interest, a region noticed by the machine learning model in the detection from the input image; and
a robot control unit that controls a robot that processes the object based on the region of interest.
17. The robot control system according to claim 16, wherein the robot control unit controls the robot so that the robot approaches the object.
18. The robot control system according to claim 16 or 17, wherein:
the detection unit further detects, using the machine learning model, the presence or absence of the object in a new input image acquired after the robot approaches the object;
the specifying unit specifies, from the new input image, a region noticed by the machine learning model in the further detection as a new region of interest; and
the robot control unit further controls the robot based on the new region of interest.
19. A data generation method executed by a data generation system comprising at least one processor, the method comprising:
detecting the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image;
specifying, as a region of interest, a region noticed by the machine learning model in the detection from the input image; and
associating an annotation corresponding to the detected object with the region of interest.
20. A trained model manufacturing method comprising:
the data generation method according to claim 19; and
generating, based on teacher data including the input image in which the annotation has been associated with the region of interest by the data generation method, a trained model for detecting at least a position of the object from an image.
21. A data generation program causing a computer to execute:
detecting the presence or absence of an object in an input image using a machine learning model that detects the presence or absence of the object based on an image;
specifying, as a region of interest, a region noticed by the machine learning model in the detection from the input image; and
associating an annotation corresponding to the detected object with the region of interest.
PCT/JP2021/044058 2021-12-01 2021-12-01 Data generation system, model generation system, estimation system, trained model production method, robot control system, data generation method, and data generation program WO2023100282A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/044058 WO2023100282A1 (en) 2021-12-01 2021-12-01 Data generation system, model generation system, estimation system, trained model production method, robot control system, data generation method, and data generation program

Publications (1)

Publication Number Publication Date
WO2023100282A1 true WO2023100282A1 (en) 2023-06-08

Family

ID=86611780

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/044058 WO2023100282A1 (en) 2021-12-01 2021-12-01 Data generation system, model generation system, estimation system, trained model production method, robot control system, data generation method, and data generation program

Country Status (1)

Country Link
WO (1) WO2023100282A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019150628A1 (en) * 2018-01-30 2019-08-08 三菱電機株式会社 Entry area extraction device and entry area extraction program
JP2020107170A (en) * 2018-12-28 2020-07-09 住友電気工業株式会社 Annotation device, learning model, image sensor, annotation method, and computer program
JP2021033494A (en) * 2019-08-21 2021-03-01 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America Annotation support method, annotation support device, and annotation support program
WO2021125019A1 (en) * 2019-12-17 2021-06-24 株式会社Preferred Networks Information system, information processing method, information processing program and robot system
WO2021152727A1 (en) * 2020-01-29 2021-08-05 楽天グループ株式会社 Object recognition system, positional information acquisition method, and program
JP2021163078A (en) * 2020-03-31 2021-10-11 Jfeスチール株式会社 Foreign matter detection device, foreign matter removal device, and foreign matter detection method

Similar Documents

Publication Publication Date Title
CN114080583B (en) Visual teaching and repetitive movement manipulation system
Barth et al. Design of an eye-in-hand sensing and servo control framework for harvesting robotics in dense vegetation
US11407111B2 (en) Method and system to generate a 3D model for a robot scene
US11717959B2 (en) Machine learning methods and apparatus for semantic robotic grasping
SE526119C2 (en) Method and system for programming an industrial robot
CN111462154A (en) Target positioning method and device based on depth vision sensor and automatic grabbing robot
Schröder et al. Real-time hand tracking with a color glove for the actuation of anthropomorphic robot hands
Zhang et al. Sim2real learning of obstacle avoidance for robotic manipulators in uncertain environments
JP6973444B2 (en) Control system, information processing device and control method
CN113829343A (en) Real-time multi-task multi-person man-machine interaction system based on environment perception
Bertino et al. Experimental autonomous deep learning-based 3d path planning for a 7-dof robot manipulator
Cong Visual servoing control of 4-DOF palletizing robotic arm for vision based sorting robot system
Zhang et al. Recent advances on vision-based robot learning by demonstration
WO2023100282A1 (en) Data generation system, model generation system, estimation system, trained model production method, robot control system, data generation method, and data generation program
CN109934155B (en) Depth vision-based collaborative robot gesture recognition method and device
Ye et al. Design of Industrial Robot Teaching System Based on Machine Vision
WO2023286138A1 (en) Robot control system, robot system, robot control method, and robot control program
Al-Shanoon et al. DeepNet‐Based 3D Visual Servoing Robotic Manipulation
Maeda et al. Lighting-and occlusion-robust view-based teaching/playback for model-free robot programming
Al-Shanoon Developing a mobile manipulation system to handle unknown and unstructured objects
Chaudhary et al. Visual Feedback based Trajectory Planning to Pick an Object and Manipulation using Deep learning
KR20240096990A (en) Control Device of Robot for Moving the Position of Non-fixed Object
RAMÍREZ Visual Servoing for Reaching and Grasping Behaviors
Hu Research on trajectory control of multi‐degree‐of‐freedom industrial robot based on visual image
CA2625805A1 (en) System and method for image mapping and visual attention

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21966366

Country of ref document: EP

Kind code of ref document: A1