WO2022250154A1 - Trained Model Generation Device, Trained Model Generation Method, and Recognition Device - Google Patents
Trained Model Generation Device, Trained Model Generation Method, and Recognition Device
- Publication number: WO2022250154A1 (application PCT/JP2022/021815)
- Authority: WIPO (PCT)
- Prior art keywords: information, model, learning, adapter, target
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the present disclosure relates to a trained model generation device, a trained model generation method, and a recognition device.
- a trained model generating device includes a control unit that generates a trained model that outputs a recognition result of a recognition target included in input information.
- the control unit acquires an adapter that is coupled to at least one base model generated by executing first learning using teacher data including learning target information that is the same as or related to the input information, the adapter being generated by executing second learning using teacher data including learning target information different from the information used in the first learning, and being capable of converting the input information before inputting it to the at least one base model.
- the control unit generates a target model by executing third learning using teacher data including learning target information different from the information used in the first learning and the information used in the second learning.
- the control unit generates the learned model by combining the adapter and the target model.
- a trained model generation method is executed by a trained model generation device that generates a trained model that outputs a recognition result of a recognition target included in input information.
- the trained model generation method includes acquiring an adapter that is coupled to at least one base model generated by executing first learning using teacher data including learning target information that is the same as or related to the input information, the adapter being generated by executing second learning using teacher data including learning target information different from the information used in the first learning, and being capable of converting the input information before inputting it to the at least one base model.
- the trained model generation method includes generating a target model by executing third learning using teacher data including learning target information different from the information used in the first learning and the information used in the second learning.
- the trained model generation method includes generating the trained model by combining the adapter and the target model.
- a recognition device includes a trained model that outputs a recognition result of a recognition target included in input information.
- the trained model includes an adapter that is coupled to at least one base model generated by executing first learning using teacher data including learning target information that is the same as or related to the input information, the adapter being generated by executing second learning using teacher data including learning target information different from the information used in the first learning, and being capable of converting the input information before inputting it to the at least one base model.
- the trained model includes a target model generated by executing third learning using teacher data including learning target information different from the information used in the first learning and the information used in the second learning.
- the trained model is constructed by combining the adapter and the target model.
- FIG. 1 is a block diagram showing a configuration example of a trained model generation system according to an embodiment.
- FIG. 2 is a schematic diagram showing a generic library and a trained model to which an image adapter is coupled.
- FIG. 3 is a diagram showing an example of an image adapter.
- FIG. 4 is a schematic diagram showing generation of an image adapter coupled to a plurality of base models and generation of a trained model by transferring the image adapter to the target model.
- FIG. 5 is a flow chart showing an example procedure of a trained model generation method.
- FIG. 6 is a schematic diagram showing a configuration example of a robot control system.
- recognition accuracy can be improved.
- as shown in FIG. 1, a trained model generation device 20 according to an embodiment of the present disclosure includes a control unit 22 and an information generation unit 26. The trained model generation device 20 generates a trained model 70 (see FIG. 2).
- the control unit 22 acquires information about the target applied to learning from the information generation unit 26 .
- Objects that are applied to learning are also referred to as learning objects.
- the control unit 22 performs learning using the information about the learning target acquired from the information generating unit 26 as teacher data, and outputs information or data based on the learning result.
- the learning target for generating the trained model 70 may include the object to be recognized itself, or may include another object. An object that can be recognized by the trained model 70 is also called a recognition target.
- the control unit 22 may include at least one processor to provide control and processing power to perform various functions.
- the processor may execute programs that implement various functions of the controller 22 .
- a processor may be implemented as a single integrated circuit.
- An integrated circuit is also called an IC (Integrated Circuit).
- a processor may be implemented as a plurality of communicatively coupled integrated and discrete circuits. Processors may be implemented based on various other known technologies.
- the control unit 22 may include a storage unit.
- the storage unit may include an electromagnetic storage medium such as a magnetic disk, or may include a memory such as a semiconductor memory or a magnetic memory.
- the storage unit stores various information.
- the storage unit stores programs and the like executed by the control unit 22 .
- the storage unit may be configured as a non-transitory readable medium.
- the storage unit may function as a work memory for the control unit 22. At least part of the storage unit may be configured separately from the control unit 22.
- the information generation unit 26 outputs teacher data used in learning in the control unit 22 to the control unit 22 .
- the information generator 26 may generate teacher data, or acquire teacher data from an external device.
- the information generation unit 26 may be configured including at least one processor to provide control and processing capabilities for generating or acquiring teacher data.
- the processor may execute a program that generates or acquires teacher data.
- the information generator 26 may be configured identically or similarly to the controller 22 .
- the information generator 26 may be configured integrally with the controller 22 .
- the information generation unit 26 may generate information representing the actual aspect of the learning target as teacher data. Information representing the actual aspect of the learning target is also referred to as real information.
- the information generator 26 may include a camera that takes an actual image of the learning target.
- the information generation unit 26 may perform annotation by adding information such as a label to the actual image to be learned.
- the information generator 26 may receive an operation input related to annotation from the user.
- the information generation unit 26 may perform annotation based on a learning model for annotation prepared in advance.
- the information generator 26 can generate actual information by annotating the actual image to be learned.
- the information generating unit 26 virtually generates, as teacher data, information about the learning target as information of a task that is the same as or related to the input information input to the trained model 70 .
- for example, the input information is an image in which the object to be recognized is captured.
- a task that is the same as or related to the input information corresponds to a task that is executed using the input information to be processed by the trained model 70 or a task that is executed using information similar to or related to the input information.
- for example, when the trained model 70 actually classifies screws and nails, the same task as the input information corresponds to the task of classifying those screws and nails.
- the task associated with the input information corresponds to the task of classifying screws and nails from an image that also includes other types of screws or nails that are similar to a given type of screws and nails, or objects that are similar to these.
- the information about the learning object that is virtually generated is also called pseudo information.
- the pseudo information may be, for example, a computer graphics (CG) image of the screw or nail to be recognized instead of image information of the actual screw or nail.
- the task may include, for example, a classification task for classifying recognition targets included in input information into at least two types.
- the task may include, for example, a task of distinguishing whether a recognition target is a screw or a nail, or an evaluation task of calculating at least one type of evaluation value based on input information.
- the classification task can be subdivided into, for example, a task of distinguishing whether a recognition target is a dog or a cat.
- Tasks are not limited to classification tasks, and may include tasks that implement various other operations.
- a task may include segmentation, which determines the pixels belonging to a particular object.
- a task may include object detection, which detects a rectangular region enclosing an object.
- the task may include object pose estimation.
- a task may include keypoint detection to find certain feature points.
- for example, if both the input information and the information about the learning target are classification task information, the relationship between the input information and the information about the learning target corresponds to related task information.
- if both the input information and the information about the learning target are task information for distinguishing whether the recognition target is a dog or a cat, the relationship between the input information and the information about the learning target corresponds to the same task information.
- the relationship between the input information and the learning target information is not limited to these examples, and can be determined under various conditions.
- the information generation unit 26 may generate information that virtually represents the appearance of the learning target in order to generate pseudo information.
- the information generator 26 may generate modeling data such as three-dimensional CAD (Computer Aided Design) data of the appearance of the learning object as information that virtually represents the appearance of the learning object.
- the information generation unit 26 may generate an image of the learning target as information that virtually represents the appearance of the learning target.
- the information generation unit 26 may perform annotation by adding information such as a label to modeling data or an image that virtually represents the appearance of the object to be learned.
- the information generation unit 26 can generate pseudo information by annotating the generated information that virtually represents the appearance of the object to be learned.
- the information generation unit 26 may acquire information that virtually represents the appearance of the learning object from an external device.
- the information generation unit 26 may receive input regarding modeling data from the user.
- the information generation unit 26 may acquire data obtained by annotating information that virtually represents the appearance of the object to be learned.
- the information generator 26 may receive an operation input related to annotation from the user.
- the information generation unit 26 may perform annotation on information that virtually represents the appearance of a learning object based on a learning model for annotation that has been prepared in advance.
- the trained model generating device 20 generates a trained model 70 that outputs recognition results of recognition targets included in input information.
- the trained model 70 is configured as a model in which the image adapter 50 is coupled to the input side of the target model 40 .
- the image adapter 50 is configured to be able to input input information.
- the image adapter 50 is also simply called an adapter.
- the trained model generation device 20 performs the following operations in preparation for generating the trained model 70.
- the trained model generating device 20 generates the base model 30 by learning based on the pseudo information.
- the training performed to generate the base model 30 is also referred to as first training.
- the teacher data used in the first learning may include learning target information that is the same as or related to the input information.
- the trained model generating device 20 may use real information instead of pseudo information, or may use both pseudo information and real information.
- the pseudo information used for learning to generate the base model 30 is also called first pseudo information.
- the trained model generation device 20 generates the image adapter 50 by further learning based on the actual information while the image adapter 50 is connected to the input side of the base model 30 .
- the learning performed to generate the image adapter 50 is also referred to as second learning.
- the teacher data used in the second learning includes learning target information that is the same as or related to the input information, and may include information different from the information used in the first learning.
- the real information used for learning to generate the image adapter 50 is also called first real information. Second pseudo information and second real information, which will be described later, may be used as the first pseudo information and first real information.
- the trained model generation device 20 generates the target model 40 by learning based on pseudo information or real information without connecting the image adapter 50 .
- the learning performed to generate the target model 40 is also referred to as third learning.
- the teacher data used in the third learning contains learning target information that is the same as or related to the input information, and may contain information different from the information used in the first learning and the information used in the second learning.
- the pseudo information used for learning to generate the target model 40 is also called second pseudo information.
- the real information used for learning to generate the target model 40 is also referred to as second real information.
- the trained model generation device 20 transfers the image adapter 50, generated in advance by pre-learning in a state of being connected to the base model 30, and connects it to the input side of the newly generated target model 40 to generate the trained model 70. Note that the trained model generation device 20 may transfer the base model 30 used for pre-learning as the target model 40. Also, the trained model generation device 20 may combine the image adapter 50 and the target model 40 and perform further learning using the second pseudo information and the second real information as teacher data to generate the trained model 70.
- the trained model generation device 20 can generate the image adapter 50 in advance by pre-learning, generate the target model 40 by learning based only on the pseudo information, and generate the trained model 70 simply by connecting the image adapter 50. As a result, the workload of generating the target model 40 can be reduced.
- in the pre-learning, real information, pseudo information, or information combining these may be used as teacher data.
- the base model 30 and the target model 40 are configured as a CNN (Convolutional Neural Network) having multiple layers. Information input to the base model 30 and the target model 40 is subjected to convolution based on predetermined weighting coefficients in each layer of the CNN. In training the base model 30 and the target model 40, the weighting coefficients are updated.
- the base model 30 and the target model 40 may be configured as VGG16 or ResNet50.
- the base model 30 and the target model 40 are not limited to these examples, and may be configured as various other models.
- the base model 30 includes a first base model 31 and a second base model 32 .
- the target model 40 includes a first target model 41 and a second target model 42 .
- the first base model 31 and the first target model 41 are also called backbone.
- the second base model 32 and the second target model 42 are also called heads.
- Base model 30 and target model 40 include a backbone and a head.
- each trained model included in the target model 40 may be different from the trained model included in the base model 30 .
- each of the trained models included in the target model 40 may be subjected to a different learning process than each of the trained models included in the base model 30 . More specifically, the learning process may be performed using teacher data containing information different from each other.
- the model before learning included in the target model 40 may be the same model as the model before learning included in the base model 30.
- the backbone is configured to output the result of extracting the feature quantity of the input information.
- the feature quantity represents, for example, the feature of the appearance of the learning object as a numerical value.
- the head is configured to make predetermined decisions about the input information based on the output of the backbone. Specifically, the head may output the recognition result of the recognition target included in the input information based on the feature amount of the input information output by the backbone. That is, the head is configured to perform recognition of the recognition target as a predetermined determination.
- the feature quantity can be a parameter representing the ratio of striped area on the body surface.
- the predetermined determination may be to determine whether the recognition target is a horse or a zebra by comparing the area ratio of the striped pattern on the body surface with a threshold value.
- the feature quantity may be a parameter representing the size or the number of holes in the shell.
- the predetermined determination may be comparing the size or the number of holes in the shell with a threshold value to determine whether the recognition target is an abalone or a tokobushi.
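- to make the backbone/head split concrete, the following is a minimal sketch assuming PyTorch. The layer sizes, class names, and the two-class head are illustrative assumptions rather than the patent's exact architecture; the text only notes that VGG16 or ResNet50 may be used. The point of the split is that the backbone outputs a feature quantity and the head turns it into a recognition result, so either part can be replaced independently.

```python
# Minimal backbone/head sketch, assuming PyTorch; sizes are illustrative.
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Extracts a feature quantity (feature vector) from the input image."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (N, 32, 1, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x).flatten(1)  # feature quantity, shape (N, 32)

class Head(nn.Module):
    """Makes a predetermined determination (here: two-class recognition)."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.classifier(features)  # recognition result (logits)

# A base model or target model is the backbone followed by the head.
model = nn.Sequential(Backbone(), Head())
out = model(torch.randn(1, 3, 64, 64))  # e.g. "screw" vs "nail" logits
```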
- the image adapter 50 may be configured as a CNN with multiple layers, as illustrated in FIG.
- the image adapter 50 is configured to convert information input to the base model 30 or the target model 40 before being input to the base model 30 or the target model 40 .
- the image adapter 50 is coupled to the input side of the target model 40 in FIG. 3, but can also be coupled to the input side of the base model 30.
- the block labeled "Conv” represents executing convolution. Convolution is also called downsampling. Also, the block described as “Conv Trans” represents the execution of transposed convolution. Transposed convolution is also called upsampling. Transposed convolution is sometimes referred to as deconvolution.
- the block labeled "Conv 4x4" represents that the size of the filter used to perform the convolution on the two-dimensional data is 4x4.
- a filter also called a kernel, corresponds to a set of weighting coefficients in performing a convolution or deconvolution of the information input to the block.
- the block labeled “Conv Trans 4x4" represents that the size of the filter used to perform the transposed convolution on the two-dimensional data is 4x4.
- the block labeled "stride 2" represents shifting the filter by two elements when performing convolution or transposed convolution. Conversely, blocks without “stride 2" indicate that the filter is shifted by one element when performing convolution or transposed convolution.
- when the image adapter 50 is connected to the input side of the base model 30, it converts the pseudo information or real information input for learning and outputs it to the base model 30. If the pseudo information or real information is an image, the image adapter 50 converts the input image and outputs it to the base model 30. When connected to the input side of the target model 40, the image adapter 50 converts and outputs the image to be recognized included in the input information input to the trained model 70. Further, the image adapter 50 may convert the form of the input image before outputting it. The image adapter 50 may, for example, emphasize the edges of the image or brighten the shaded portion of the image before outputting it. The image adapter 50 converts the input so that the base model 30 or the target model 40 to which it is connected can process the task correctly. For example, if the task is recognition of an object included in an image, the image adapter 50 converts the mode of the image so that the base model 30 or the target model 40 can output the result of correctly recognizing the recognition target.
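- as a non-authoritative illustration of the FIG. 3 structure, the following sketch assumes PyTorch and uses 4x4 convolutions with stride 2 for downsampling and 4x4 transposed convolutions with stride 2 for upsampling, as in the block labels described above. The depth and channel counts are illustrative assumptions.

```python
# Minimal image adapter sketch, assuming PyTorch: downsample with
# "Conv 4x4, stride 2" blocks, then upsample back to an image with
# "Conv Trans 4x4, stride 2" blocks, so output shape matches input shape.
import torch
import torch.nn as nn

class ImageAdapter(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            # Conv 4x4, stride 2: halves the spatial resolution
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            # Conv Trans 4x4, stride 2: doubles the spatial resolution
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Output is an image of the same shape, converted before being
        # fed to the base model or target model.
        return self.net(image)

adapter = ImageAdapter()
converted = adapter(torch.randn(1, 3, 64, 64))  # same shape as the input
```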
- the control unit 22 of the trained model generating device 20 can generate the trained model 70 by executing the operations schematically shown in FIG. 4, for example.
- the operation of the trained model generation device 20 will be described below with reference to FIG.
- the control unit 22 generates at least one base model 30 as a first step. Specifically, the control unit 22 acquires the first pseudo information as teacher data from the information generation unit 26. The control unit 22 generates the base model 30 by learning based on the first pseudo information. The control unit 22 updates the base model 30 so as to increase the probability that the information output from the base model 30 being learned is the information representing the learning target included in the first pseudo information. The control unit 22 may update the base model 30 by updating the weighting coefficients of the base model 30. Before starting learning, the base model 30 may be in a predetermined initial state. That is, the weighting coefficients of the base model 30 may be set to predetermined initial values.
- the control unit 22 can generate the base model 30 by learning based on the first pseudo information. Since the learning for generating the base model 30 is executed prior to the learning for generating the image adapter 50 in the second step, which will be described later, it can be said to be pre-learning.
- the controller 22 has been described as acquiring the first pseudo information from the information generator 26 as teacher data, but the present invention is not limited to this.
- as teacher data, not only the first pseudo information but also the first real information can be used.
- the second pseudo information or the second real information may be used as the training data.
- the control unit 22 generates x base models 30 .
- the x base models 30 are distinguished as the first base model 301 through the x-th base model 30x.
- the control unit 22 acquires different pieces of information as the first pseudo information used for learning to generate each base model 30 .
- the first base model 301 includes a first base model 311 and a second base model 321 .
- the x-th base model 30x includes a first base model 31x and a second base model 32x.
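- a minimal sketch of this first learning follows, assuming PyTorch. The (CG image, label) data loader and make_base_model() are hypothetical names introduced for illustration.

```python
# First learning sketch: each base model is trained on first pseudo
# information (e.g. annotated CG images), assuming PyTorch.
import torch
import torch.nn as nn

def first_learning(base_model: nn.Module, pseudo_loader) -> nn.Module:
    """Train one base model on first pseudo information."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(base_model.parameters(), lr=1e-3)
    base_model.train()
    for cg_image, label in pseudo_loader:
        optimizer.zero_grad()
        loss = criterion(base_model(cg_image), label)
        loss.backward()
        optimizer.step()  # update the weighting coefficients
    return base_model

# x base models, each learned from different first pseudo information:
# base_models = [first_learning(make_base_model(), loader)
#                for loader in pseudo_loaders]
```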
- the control unit 22 generates the image adapter 50 as a second step. Specifically, the control unit 22 may further acquire actual information as teacher data from the information generation unit 26 .
- the control unit 22 updates the image adapter 50 by learning based on the first pseudo information and real information while the image adapter 50 is connected to the learned base model 30 generated in the first step.
- the controller 22 may update the image adapter 50 by updating the weighting coefficients of the image adapter 50 .
- the control unit 22 acquires different information as actual information used for learning for generating each base model 30 .
- the image adapter 50 coupled to the base model 30 may be in a predetermined initial state. That is, the weighting factor of the image adapter 50 may be set to a predetermined initial value.
- in FIG. 4, the image adapter 50a being learned, which is updated by learning, is represented by a black rectangle.
- the control unit 22 has been described as updating the image adapter 50 by learning based on the first pseudo information and the real information in a state in which the image adapter 50 is connected to the learned base model 30 generated in the first step, but the learning is not limited to this.
- the control unit 22 may perform learning based on only one of the first pseudo information and the real information to update the image adapter 50 .
- the control unit 22 learns based on the first pseudo information or real information corresponding to each base model 30 while the image adapter 50a being learned is connected to each of the x number of base models 30 .
- the control unit 22 inputs the first pseudo information and the real information to the image adapter 50a under learning, and inputs the output of the image adapter 50a under learning to each of the x base models 30 for learning.
- the control unit 22 generates the image adapter 50 by updating the image adapter 50 through learning.
- the control unit 22 updates the image adapter 50 so that the information output from each base model 30 to which the first pseudo information is input via the image adapter 50 and the information output from each base model 30 to which the real information is input via the image adapter 50 become closer.
- the control unit 22 may update the image adapter 50 so as to increase the probability that the information output from each base model 30 to which the first pseudo information is input via the image adapter 50 matches the information output from each base model 30 to which the real information is input via the image adapter 50.
- the control unit 22 may update each base model 30 together with the image adapter 50 through learning, or may update only the image adapter 50 .
- the control unit 22 may perform learning for each combination of one base model 30 coupled with the image adapter 50a being learned.
- the control unit 22 may combine a plurality of combinations of one base model 30 and the image adapter 50a being learned and perform learning in parallel.
- the control unit 22 can generate the image adapter 50 through learning based on the first pseudo information and the real information.
- the learning for generating the image adapter 50 can be performed independently of the learning for generating the target model 40 in the third step, which will be described later.
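- the following is a minimal sketch of this second learning, assuming PyTorch: the adapter is coupled in front of each trained base model, and the adapter (optionally also the base models) is updated so that each base model's outputs for adapter-converted pseudo and real information become closer. Pairing pseudo and real samples in one loader and using an MSE gap are illustrative simplifications, not the patent's prescribed loss.

```python
# Second learning sketch: generalize one adapter across x base models.
import torch
import torch.nn.functional as F

def second_learning(adapter, base_models, paired_loader, update_base=False):
    params = list(adapter.parameters())
    if update_base:  # the base models may be updated together, or only the adapter
        for bm in base_models:
            params += list(bm.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-4)
    for pseudo_img, real_img in paired_loader:
        optimizer.zero_grad()
        loss = torch.zeros(())
        for bm in base_models:  # each of the x base models in turn
            out_pseudo = bm(adapter(pseudo_img))
            out_real = bm(adapter(real_img))
            loss = loss + F.mse_loss(out_real, out_pseudo)
        loss.backward()
        optimizer.step()
    return adapter
```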
- the control unit 22 generates the target model 40 as a third step. Specifically, the control unit 22 acquires the second pseudo information as teacher data from the information generation unit 26. As the second pseudo information, the control unit 22 acquires information of a task that is the same as or related to the first pseudo information used for learning to generate the base model 30. The control unit 22 generates the target model 40 by learning based on the second pseudo information. The control unit 22 inputs the second pseudo information to the target model 40 without inputting it to the image adapter 50, that is, without conversion. The control unit 22 updates the target model 40 so as to increase the probability that the information output from the target model 40 being learned is the information representing the learning target included in the second pseudo information. The control unit 22 may update the target model 40 by updating the weighting coefficients of the target model 40.
- before starting learning, the target model 40 may be in a predetermined initial state. That is, the weighting coefficients of the target model 40 may be set to predetermined initial values.
- the target models 40 to be updated by learning include a first target model 41a and a second target model 42a that are being learned, and are represented by black rectangles.
- the control unit 22 can generate the target model 40 by learning based on the second pseudo information.
- the controller 22 has been described as acquiring the second pseudo information from the information generator 26 as teacher data, but the present invention is not limited to this. As training data, not only the second pseudo information but also the second real information may be used.
- the control unit 22 has been described as inputting the second pseudo information to the target model 40 without converting it to update the target model 40, but the present invention is not limited to this.
- the control unit 22 may combine the target model 40 and the image adapter 50 and update both by learning using the second pseudo information, the second real information, or both.
- the control unit 22 generates a trained model 70 by connecting the image adapter 50 to the target model 40 .
- the control unit 22 couples the trained image adapter 50b generated in the second step to the target model 40, which includes the trained first target model 41b and the trained second target model 42b generated in the third step. That is, the control unit 22 transfers the image adapter 50 generated in the second step and couples it to the target model 40.
- the image adapter 50 has been described as being combined with the target model 40 generated in the third step, but the present invention is not limited to this.
- as the target model 40, the base model 30 generated in the first step may be used. In this case, the third step need not be executed.
- the control unit 22 of the trained model generation device 20 may perform the above-described operations as a trained model generation method including the procedures of the flowchart illustrated in FIG. 5.
- the learned model generation method may be implemented as a learned model generation program that is executed by a processor that configures the control unit 22 .
- the trained model generation program may be stored on non-transitory computer-readable media.
- the control unit 22 acquires a plurality of base models 30 (step S1).
- the control unit 22 may generate a plurality of base models 30 by learning based on the first pseudo information, or may acquire them from an external device.
- it suffices that the control unit 22 acquires the plurality of base models 30 used for learning to generate the image adapter 50.
- the control unit 22 selects at least one base model 30 from a plurality of base models 30 (step S2).
- the control unit 22 acquires information on a learning target (step S3).
- the control unit 22 may acquire real information of a task that is the same as or related to pseudo information used in learning for generating the selected base model 30 as learning target information.
- the control unit 22 generates the image adapter 50 by learning based on the learning target information while the image adapter 50 is connected to the selected base model 30 (step S4). Specifically, the control unit 22 inputs real information to the image adapter 50 as learning target information. Information converted from actual information by the image adapter 50 is input to the selected base model 30 . The control unit 22 generates the image adapter 50 by updating the image adapter 50 based on the information output from the selected base model 30 .
- the control unit 22 determines whether all base models 30 have been selected (step S5). If all the base models 30 have not been selected (step S5: NO), that is, if at least one base model 30 has not been selected, the control unit 22 returns to the procedure of step S2 and selects an unselected base model 30.
- if all the base models 30 have been selected (step S5: YES), the control unit 22 acquires information on the recognition target (step S6). Specifically, the control unit 22 may acquire, as information on the recognition target, second pseudo information of a task that is the same as or related to the first pseudo information used in learning for generating the selected base model 30.
- the control unit 22 generates the target model 40 by learning based on the information of the recognition target (step S7).
- the control unit 22 connects the image adapter 50 and the target model 40 (step S8).
- the control unit 22 can generate the learned model 70 that combines the image adapter 50 and the target model 40 by executing the above procedure.
- after executing the procedure of step S8, the control unit 22 ends the execution of the procedure of the flowchart of FIG. 5.
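- a minimal end-to-end sketch of steps S1 to S8 follows, assuming PyTorch. acquire_base_models, acquire_learning_info, acquire_recognition_info, third_learning, ImageAdapter, and TargetModel are hypothetical helper names introduced for illustration; second_learning is the sketch shown earlier.

```python
# Flowchart sketch (steps S1 to S8), assuming PyTorch.
import torch.nn as nn

def generate_trained_model():
    base_models = acquire_base_models()            # S1: acquire base models
    adapter = ImageAdapter()
    for bm in base_models:                         # S2, S5: select each base model
        real_loader = acquire_learning_info(bm)    # S3: learning target info
        adapter = second_learning(adapter, [bm], real_loader)  # S4: update adapter
    pseudo_loader = acquire_recognition_info()     # S6: recognition target info
    target_model = third_learning(TargetModel(), pseudo_loader)  # S7
    return nn.Sequential(adapter, target_model)    # S8: couple adapter and target
```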
- the control unit 22 may input the input information to the generated trained model 70 and evaluate the recognition accuracy of the recognition target included in the input information based on the output of the trained model 70.
- the control unit 22 may output the generated learned model 70 to the robot control device 110 (see FIG. 6), which will be described later.
- the trained model generation device 20 combines the image adapter 50 generated by learning in the state of being connected to the base model 30 with the target model 40 newly generated by another learning. By doing so, the trained model 70 can be generated.
- the trained model generating device 20 generates the image adapter 50 by learning based on real information or pseudo information.
- the trained model generating device 20 generates the target model 40 by learning based only on the pseudo information.
- the recognition accuracy by the trained model 70 combined with the image adapter 50 generated by learning based on real information or pseudo information is improved compared to the case of using only the target model 40 . Therefore, if the image adapter 50 is generated in advance by learning based on real information or pseudo information, high recognition accuracy can be expected by combining the image adapter 50 with the target model 40 .
- the trained model generating device 20 can increase the recognition accuracy by generating the trained model 70 by connecting the image adapter 50 . In other words, the recognition accuracy of the trained model 70 can be improved without transferring the base model 30 to the target model 40 .
- the operation of transferring the base model 30 itself can be a constraint on the generation of the trained model 70.
- for example, if the base model 30 itself is transferred, the target model 40 may not match the desired recognition target.
- the trained model generation device 20 according to the present embodiment does not need to transfer the base model 30 to the target model 40, so that the target model 40 can be easily matched with the model desired by the end user.
- the generation of the image adapter 50 by learning in a state of being linked to each of the plurality of base models 30 is also called an upstream task because it is performed in advance by the service provider.
- the trained model 70, generated by transferring the image adapter 50 from the upstream task and combining it with the newly generated target model 40, is generated according to the recognition target desired by the end user of the service; this generation is also called a downstream task.
- in the downstream task, it is required to generate the trained model 70 with little data acquisition effort or in a short learning time so as to quickly put the system into operation.
- in the upstream task, a large amount of data and computational resources can be expended in advance in order to provide a high-quality meta-model with fast transfer learning and high generalization performance.
- the trained model generation device 20 according to the present embodiment executes the upstream task using a large amount of data and computational resources, so that the downstream task can be executed with a small load; as a result, the system can be put into operation early.
- the trained model generation device 20 can improve the recognition accuracy for real information even in a downstream task in which learning based on real information has not been performed.
- the image adapter 50 is generated so as to increase the recognition accuracy for real information of each of the plurality of base models 30 generated so as to increase the recognition accuracy for pseudo information.
- the recognition accuracy of the target model 40 newly generated in the downstream task can also be improved.
- the generation of the image adapter 50 to improve the recognition accuracy of each of the plurality of base models 30 is also called generalization of the image adapter 50 or Generalized Image Adapter (GIA).
- image quality improvements that are fundamentally useful for the task can be obtained, such as emphasizing common features that perform well across multiple base models 30 while suppressing features that are sources of noise. This improvement in image quality is expected not only to mitigate the Sim-to-Real problem, but also to improve recognition accuracy with various base models.
- the trained model generation device 20 may generate the image adapter 50 in the upstream task and transfer the image adapter 50 generated in the upstream task to the downstream task.
- the trained model generation device 20 may generate the image adapter 50 by learning based on the second real information or the second pseudo information only in downstream tasks.
- <Comparison of recognition accuracy> When a recognition target is recognized from input information including a real image using a model generated by learning based only on generated images, which are pseudo information, the recognition accuracy decreases due to the difference between the generated images and the real image. Specifically, a model that can recognize a recognition target with a probability close to 100% for a generated image may see that probability drop to about 70% for a real image.
- the trained model 70 is generated as a model in which the image adapter 50 generated by learning in a state of being connected to each of the plurality of base models 30 is connected to the target model 40 .
- the image adapter 50 can correct errors in recognition results due to differences between the generated image and the actual image.
- the probability that the recognition target can be recognized with respect to the real image can be increased to about 80%. That is, when the image adapter 50 is connected, the probability of recognizing the recognition target can be increased compared to when the image adapter 50 is not connected.
- the learned model 70 according to this embodiment is generated without transferring the base model 30 . That is, it is possible to increase the probability that the recognition target can be recognized with respect to the real image without transferring the base model 30 . By not having to transfer the base model 30, the target model 40 is more likely to match the model desired by the end user.
- a robot control system 100 includes a robot 2 and a robot control device 110 .
- the robot 2 moves the work object 8 from the work start point 6 to the work target point 7 . That is, the robot control device 110 controls the robot 2 so that the work object 8 moves from the work start point 6 to the work target point 7 .
- the work object 8 is also referred to as a workpiece.
- the robot control device 110 controls the robot 2 based on information regarding the space in which the robot 2 works. Information about space is also referred to as spatial information.
- the robot 2 has an arm 2A and an end effector 2B.
- the arm 2A may be configured as, for example, a 6-axis or 7-axis vertical articulated robot.
- the arm 2A may be configured as a 3-axis or 4-axis horizontal articulated robot or SCARA robot.
- the arm 2A may be configured as a 2-axis or 3-axis Cartesian robot.
- Arm 2A may be configured as a parallel link robot or the like.
- the number of axes forming the arm 2A is not limited to those illustrated.
- the robot 2 has an arm 2A connected by a plurality of joints and operates by driving the joints.
- the end effector 2B may include, for example, a gripping hand configured to grip the work object 8.
- the grasping hand may have multiple fingers. The number of fingers of the grasping hand may be two or more. The fingers of the grasping hand may have one or more joints.
- the end effector 2B may include a suction hand configured to be able to suction the work object 8 .
- the end effector 2B may include a scooping hand configured to scoop the work object 8 .
- the end effector 2B includes a tool such as a drill, and may be configured to be able to perform various machining operations such as drilling a hole in the work object 8.
- the end effector 2B is not limited to these examples, and may be configured to perform various other operations. In the configuration illustrated in FIG. 6, the end effector 2B is assumed to include a grasping hand.
- the robot 2 can control the position of the end effector 2B by operating the arm 2A.
- the end effector 2B may have an axis that serves as a reference for the direction in which it acts on the work object 8. If the end effector 2B has an axis, the robot 2 can control the direction of the axis of the end effector 2B by operating the arm 2A.
- the robot 2 controls the start and end of the action of the end effector 2B acting on the work object 8 .
- the robot 2 can move or process the work object 8 by controlling the position of the end effector 2B or the direction of the axis of the end effector 2B and controlling the operation of the end effector 2B.
- in the configuration illustrated in FIG. 6, the robot 2 causes the end effector 2B to grip the work object 8 at the work start point 6 and moves the end effector 2B to the work target point 7.
- the robot 2 causes the end effector 2B to release the work object 8 at the work target point 7 . By doing so, the robot 2 can move the work object 8 from the work start point 6 to the work target point 7 .
- the robot control system 100 further comprises a sensor 3, as shown in FIG. 6. The sensor 3 detects physical information of the robot 2.
- the physical information of the robot 2 may include information on the actual position or orientation of each constituent part of the robot 2 or the velocity or acceleration of each constituent part of the robot 2 .
- the physical information of the robot 2 may include information about forces acting on each component of the robot 2 .
- the physical information of the robot 2 may include information about the current flowing through the motors that drive each component of the robot 2 or the torque of the motors.
- the physical information of the robot 2 represents the result of the actual motion of the robot 2 . In other words, the robot control system 100 can grasp the result of the actual motion of the robot 2 by acquiring the physical information of the robot 2 .
- the sensor 3 may include a force sensor or a tactile sensor that detects force acting on the robot 2, distributed pressure, slip, or the like as physical information of the robot 2.
- the sensor 3 may include a motion sensor that detects the position or posture, or the speed or acceleration of the robot 2 as the physical information of the robot 2 .
- the sensor 3 may include a current sensor that detects the current flowing through the motor that drives the robot 2 as the physical information of the robot 2 .
- the sensor 3 may include a torque sensor that detects the torque of the motor that drives the robot 2 as the physical information of the robot 2 .
- the sensor 3 may be installed in a joint of the robot 2 or in a joint driving section that drives the joint.
- the sensor 3 may be installed on the arm 2A of the robot 2 or the end effector 2B.
- the sensor 3 outputs the detected physical information of the robot 2 to the robot control device 110 .
- the sensor 3 detects and outputs physical information of the robot 2 at a predetermined timing.
- the sensor 3 outputs physical information of the robot 2 as time-series data.
- the robot control system 100 is assumed to have two cameras 4 .
- the camera 4 captures an image of an object, a person, or the like located within the influence range 5 that may affect the motion of the robot 2 .
- An image captured by the camera 4 may include monochrome luminance information, or may include luminance information of each color represented by RGB (Red, Green and Blue) or the like.
- the range of influence 5 includes the motion range of the robot 2 . It is assumed that the influence range 5 is a range obtained by expanding the motion range of the robot 2 further outward.
- the range of influence 5 may be set so that the robot 2 can be stopped before a person or the like moving from the outside to the inside of the motion range of the robot 2 enters the inside of the motion range of the robot 2 .
- the range of influence 5 may be set, for example, as a range that extends a predetermined distance from the boundary of the motion range of the robot 2 to the outside.
- the camera 4 may be installed so as to capture a bird's-eye view of the influence range 5 or the motion range of the robot 2 or a peripheral area thereof.
- the number of cameras 4 is not limited to two, and may be one or three or more.
- the robot control device 110 acquires the trained model 70 generated by the trained model generation device 20. Based on the image captured by the camera 4 and the trained model 70, the robot control device 110 recognizes the work object 8, the work start point 6, the work target point 7, and the like that exist in the space where the robot 2 works. In other words, the robot control device 110 acquires the trained model 70 generated for recognizing the work object 8 and the like based on the image captured by the camera 4. The robot control device 110 is also referred to as a recognition device.
- the robot controller 110 may be configured with at least one processor to provide control and processing power to perform various functions.
- Each component of the robot control device 110 may be configured including at least one processor.
- a plurality of components among the components of the robot control device 110 may be realized by one processor.
- the entire robot controller 110 may be implemented with one processor.
- the processor may execute programs that implement various functions of the robot controller 110 .
- a processor may be implemented as a single integrated circuit.
- An integrated circuit is also called an IC (Integrated Circuit).
- a processor may be implemented as a plurality of communicatively coupled integrated and discrete circuits. Processors may be implemented based on various other known technologies.
- the robot control device 110 may include a storage unit.
- the storage unit may include an electromagnetic storage medium such as a magnetic disk, or may include a memory such as a semiconductor memory or a magnetic memory.
- the storage unit stores various information, programs executed by the robot control device 110, and the like.
- the storage unit may be configured as a non-transitory readable medium.
- the storage unit may function as a work memory for the robot control device 110 . At least part of the storage unit may be configured separately from the robot controller 110 .
- the robot control device 110 acquires the learned model 70 in advance.
- the robot control device 110 may store the trained model 70 in the storage unit.
- the robot control device 110 obtains an image of the work object 8 from the camera 4 .
- the robot control device 110 inputs the captured image of the work target 8 to the learned model 70 as input information.
- the robot control device 110 acquires output information output from the learned model 70 according to the input of input information.
- the robot control device 110 recognizes the work object 8 based on the output information, and performs work such as gripping and moving the work object 8 .
- the robot control system 100 can acquire the learned model 70 from the learned model generation device 20 and recognize the work object 8 by the learned model 70 .
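- a minimal sketch of this recognition step follows, assuming PyTorch; capture_image_from_camera() and the argmax decoding of the recognition result are hypothetical illustrations of how the output information might be used.

```python
# Inference sketch in the recognition device (robot control device 110).
import torch

trained_model.eval()
with torch.no_grad():
    image = capture_image_from_camera()           # image of the work object 8
    output = trained_model(image.unsqueeze(0))    # input information -> output
    recognized = output.argmax(dim=1)             # recognition result
```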
- the trained model generation device 20 may set the loss function so that the output when input information is input to the generated trained model 70 approaches the output when teacher data is input.
- cross-entropy can be used as the loss function.
- cross-entropy is calculated as a value representing the relationship between two probability distributions. Specifically, in this embodiment, the cross-entropy is calculated as a value representing the relationship between the teacher data and the output obtained from the backbone, head, or adapter for the input pseudo information or real information.
- the trained model generation device 20 learns so that the value of the loss function becomes small.
- by doing so, the output corresponding to the input of the input information can approach the output corresponding to the input of the teacher data.
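- a minimal sketch of cross-entropy as the loss function follows, assuming PyTorch; the tensor shapes are illustrative.

```python
# Cross-entropy loss sketch, assuming PyTorch.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 2, requires_grad=True)  # model outputs, 2 classes
labels = torch.tensor([0, 1, 1, 0])             # teacher data labels
loss = criterion(logits, labels)                # small when outputs match labels
loss.backward()                                 # learning reduces this value
```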
- the control unit 22 of the trained model generation device 20 may generate the image adapter 50 by learning to optimize the loss function of a task that is the same as or related to the input information while the image adapter 50 is connected to the base model 30. Optimization of the loss function may be, for example, minimization of the value of the loss function. Loss functions for tasks that are the same as or related to the input information include the loss function of the base model 30. On the other hand, the control unit 22 may generate the image adapter 50 by learning to optimize a loss function other than that of a task that is the same as or related to the input information while the image adapter 50 is connected to the base model 30. Loss functions other than those of tasks that are the same as or related to the input information include various meaningful loss functions other than the loss function of the base model 30.
- for example, Discrimination Loss is a loss function used to learn the authenticity of a generated image by labeling it with a numerical value between 1, which represents complete truth, and 0, which represents complete falsehood.
- the control unit 22 learns the image output by the image adapter 50 when an image is input to the image adapter 50 as input information, using the correct answer as a label. By doing so, the control unit 22 can generate the image adapter 50 so that the base model 30 generated by learning based on the pseudo information cannot distinguish between the image as real information and the image output by the image adapter 50.
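- as a non-authoritative illustration of such a discrimination loss, the following sketch assumes PyTorch. The discriminator network is an assumption introduced for illustration; the patent names only the loss, not a specific discriminator.

```python
# Discrimination loss sketch, assuming PyTorch: a discriminator scores
# authenticity between 1 (completely true) and 0 (completely false),
# and the adapter is trained so that its output is labeled as true.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def discrimination_loss(discriminator, adapter, pseudo_img):
    converted = adapter(pseudo_img)           # adapter-converted image
    pred = discriminator(converted)           # authenticity score (logit)
    real_label = torch.ones_like(pred)        # 1 = "completely true"
    return bce(pred, real_label)              # small when the output looks real
```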
- the control unit 22 of the trained model generation device 20 generates the image adapter 50 by learning with the image adapter 50 coupled to each of the plurality of base models 30 . That is, the control unit 22 applies each of the plurality of base models 30 to pre-learning for generating the image adapter 50 .
- when the plurality of base models 30 includes the first base model 301 through the x-th base model 30x, the control unit 22 may generate combinations in which each base model 30 is coupled to the image adapter 50 in order, and generate the image adapter 50 by learning and updating the image adapter 50 for each combination. That is, the control unit 22 may sequentially apply each of the plurality of base models 30 one by one to pre-learning for generating the image adapter 50.
- the control unit 22 may randomly determine the order in which the base model 30 is applied to pre-learning, or may determine it based on a predetermined rule.
- the control unit 22 may execute in parallel a plurality of pre-learnings applying each of a plurality of combinations. That is, the control unit 22 may apply a plurality of base models 30 in parallel to pre-learning.
The control unit 22 may classify the plurality of base models 30 into a plurality of groups and apply each group in turn to the pre-training for generating the image adapter 50.

The control unit 22 may classify a plurality of base models 30 into a single group. In this case, the control unit 22 may apply the base models 30 classified into the group to pre-training in parallel, or may apply each of them to pre-training one at a time, in sequence.

The control unit 22 may instead classify one base model 30 into each group.

The control unit 22 may determine the order in which each group is applied to pre-training at random, or may determine it based on a predetermined rule.
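Grouping could then be sketched as below, still under the same hypothetical names; the group size of two and the random ordering are illustrative assumptions only.

```python
# Sketch: classify the base models into groups and apply each group to
# the adapter's pre-training in turn, in a randomly determined order.
import random

groups = [base_models[i:i + 2] for i in range(0, len(base_models), 2)]
random.shuffle(groups)                 # order determined at random

for group in groups:                   # each group is applied in turn
    for images, labels in loader:
        loss = sum(criterion(bm(adapter(images)), labels) for bm in group)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```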
The embodiments of the trained model generation system 1 and the robot control system 100 have been described above. The present disclosure can also be embodied as a storage medium on which a program is recorded (for example, an optical disc, a magneto-optical disc, a CD-ROM, a CD-R, a CD-RW, a magnetic tape, a hard disk, or a memory card).
The implementation form of the program is not limited to an application program such as object code compiled by a compiler or program code executed by an interpreter; other forms may also be used.

The program may, but need not, be configured so that all processing is performed solely by the CPU on the control board. The program may be configured so that part or all of it is executed by another processing unit mounted on an expansion board or expansion unit added to the board as required.

Embodiments according to the present disclosure are not limited to the specific configurations of the embodiments described above. Embodiments of the present disclosure may extend to any novel feature described in the present disclosure, or any combination of such features, and to any novel method or process step described, or any combination of such steps.
Descriptions such as “first” and “second” in this disclosure are identifiers for distinguishing between configurations. Configurations distinguished by descriptions such as “first” and “second” in this disclosure may have their numbers exchanged; for example, the first pseudo information can exchange the identifiers “first” and “second” with the second pseudo information. The exchange of identifiers is performed simultaneously, and the configurations remain distinct after the exchange. Identifiers may be deleted, in which case the configurations are distinguished by reference signs. The identifiers “first” and “second” in this disclosure alone must not be used as a basis for interpreting the order of the configurations or as grounds for the existence of an identifier with a smaller number.
20 Trained model generation device (22: control unit, 26: information generation unit)
30 Base model (31: first base model (31a: during learning, 31b: trained), 32: second base model (32a: during learning, 32b: trained), 301 to 30x: 1st to x-th base models, 311 to 31x: 1st to x-th first base models, 321 to 32x: 1st to x-th second base models)
40 Target model (41: first target model (41a: during learning, 41b: trained), 42: second target model (42a: during learning, 42b: trained))
50 Adapter (50a: during learning, 50b: trained)
70 Trained model
100 Robot control system (2: robot, 2A: arm, 2B: end effector, 3: sensor, 4: camera, 5: robot influence range, 6: work start table, 7: work target table, 8: work object, 110: robot control device (recognition device))
Description
As shown in FIG. 1, a trained model generation device 20 according to an embodiment of the present disclosure includes a control unit 22 and an information generation unit 26. The trained model generation device 20 generates a trained model 70 (see FIG. 2).

As shown in FIG. 2, the trained model generation device 20 generates a trained model 70 that outputs a recognition result for a recognition target included in input information. The trained model 70 is configured as a model in which an image adapter 50 is coupled to the input side of a target model 40. The image adapter 50 is configured so that the input information can be input to it. The image adapter 50 is also referred to simply as an adapter.
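As one way to picture this composition, the following is a minimal sketch assuming PyTorch; the class and argument names are hypothetical, since the disclosure does not prescribe an implementation.

```python
# Sketch of the trained model 70: an image adapter 50 coupled to the
# input side of a target model 40.
import torch.nn as nn

class TrainedModel(nn.Module):
    def __init__(self, adapter: nn.Module, target_model: nn.Module):
        super().__init__()
        self.adapter = adapter             # converts the input information first
        self.target_model = target_model   # outputs the recognition result

    def forward(self, x):
        return self.target_model(self.adapter(x))
```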
The control unit 22 of the trained model generation device 20 can generate the trained model 70 by executing, for example, the operations schematically shown in FIG. 4. The operation of the trained model generation device 20 is described below with reference to FIG. 4.

The control unit 22 of the trained model generation device 20 may execute the operations described above as a trained model generation method that includes the procedure of the flowchart illustrated in FIG. 5. The trained model generation method may be realized as a trained model generation program to be executed by the processor constituting the control unit 22. The trained model generation program may be stored on a non-transitory computer-readable medium.

As described above, the trained model generation device 20 according to the present embodiment can generate the trained model 70 by coupling the image adapter 50, generated by learning while coupled to the base model 30, to a target model 40 newly generated by separate learning. The trained model generation device 20 generates the image adapter 50 by learning based on real information or pseudo information, and generates the target model 40 by learning based on pseudo information alone. The recognition accuracy of the trained model 70 incorporating the image adapter 50 generated by learning based on real or pseudo information is higher than that of the target model 40 alone. Therefore, if the image adapter 50 is generated in advance by learning based on real or pseudo information, high recognition accuracy can be expected simply by coupling the image adapter 50 to the target model 40.

When a recognition target is recognized from input information including real images by a model generated by learning based only on generated images, which are pseudo information, the recognition accuracy decreases due to differences between the generated images and the real images. Specifically, for a model that can recognize a recognition target in generated images with a probability close to 100%, the probability of recognizing the target in real images may drop to about 70%.
As shown in FIG. 6, a robot control system 100 according to an embodiment includes a robot 2 and a robot control device 110. In the present embodiment, it is assumed that the robot 2 moves a work object 8 from a work start point 6 to a work target point 7. That is, the robot control device 110 controls the robot 2 so that the work object 8 moves from the work start point 6 to the work target point 7. The work object 8 is also referred to as a work target. The robot control device 110 controls the robot 2 based on information about the space in which the robot 2 performs work. Information about the space is also referred to as spatial information.

The robot 2 includes an arm 2A and an end effector 2B. The arm 2A may be configured, for example, as a six-axis or seven-axis vertical articulated robot. The arm 2A may also be configured as a three-axis or four-axis horizontal articulated robot or SCARA robot, as a two-axis or three-axis Cartesian robot, or as a parallel link robot or the like. The number of axes constituting the arm 2A is not limited to those illustrated. In other words, the robot 2 has an arm 2A connected by a plurality of joints and operates by driving the joints.

As shown in FIG. 2, the robot control system 100 further includes a sensor 3. The sensor 3 detects physical information about the robot 2. The physical information about the robot 2 may include information about the actual position or posture of each component of the robot 2, or about the velocity or acceleration of each component. The physical information may include information about forces acting on each component of the robot 2, about the current flowing in the motors that drive each component, or about the torque of those motors. The physical information about the robot 2 represents the result of the robot 2's actual operation. That is, by acquiring the physical information about the robot 2, the robot control system 100 can grasp the result of the robot 2's actual operation.

In the configuration example shown in FIG. 1, the robot control system 100 is assumed to include two cameras 4. The cameras 4 capture images of objects, people, and the like located within an influence range 5 that may affect the operation of the robot 2. The images captured by the cameras 4 may contain monochrome luminance information or luminance information for each color expressed in RGB (Red, Green and Blue) or the like. The influence range 5 includes the operating range of the robot 2 and is assumed to extend further outward than the operating range. The influence range 5 may be set so that the robot 2 can be stopped before a person or the like moving from outside the operating range of the robot 2 toward its inside actually enters the operating range. For example, the influence range 5 may be set as a range extended outward by a predetermined distance from the boundary of the operating range of the robot 2. The cameras 4 may be installed so as to capture a bird's-eye view of the influence range 5 or operating range of the robot 2 or the surrounding area. The number of cameras 4 is not limited to two; there may be one, or three or more.

The robot control device 110 acquires the trained model 70 generated by the trained model generation device 20. Based on the images captured by the cameras 4 and the trained model 70, the robot control device 110 recognizes the work object 8, the work start point 6, the work target point 7, and the like that exist in the space in which the robot 2 performs work. In other words, the robot control device 110 acquires the trained model 70 generated for recognizing the work object 8 and the like based on images captured by the cameras 4. The robot control device 110 is also referred to as a recognition device.
The robot control device 110 (recognition device) acquires the trained model 70 in advance and may store it in its storage unit. The robot control device 110 acquires an image of the work object 8 from the camera 4 and inputs the captured image to the trained model 70 as input information. The robot control device 110 acquires the output information output from the trained model 70 in response to the input of the input information, recognizes the work object 8 based on that output information, and executes work such as gripping and moving the work object 8.
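The recognition flow on the robot control device 110 could be sketched as follows, assuming PyTorch; load_trained_model and grab_frame are hypothetical helpers standing in for the storage unit and the camera interface.

```python
# Hedged sketch of inference on the recognition device: input an image
# of the work object 8 and read the recognition result from the output.
import torch

trained_model = load_trained_model()   # trained model 70, acquired in advance
trained_model.eval()

image = grab_frame()                   # image tensor captured by a camera 4
with torch.no_grad():
    output = trained_model(image.unsqueeze(0))  # output information
recognized = output.argmax(dim=1)      # recognition result for the work object
```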
As described above, the robot control system 100 can acquire the trained model 70 from the trained model generation device 20 and recognize the work object 8 using the trained model 70.

Other embodiments are described below.

The trained model generation device 20 may set the loss function so that the output obtained when input information is input to the generated trained model 70 approaches the output obtained when the teacher data is input. In the present embodiment, cross-entropy may be used as the loss function. Cross-entropy is calculated as a value representing the relationship between two probability distributions; specifically, in the present embodiment, it is calculated as a value representing the relationship between the input pseudo information or real information and the backbone, head, or adapter.

The control unit 22 of the trained model generation device 20 generates the image adapter 50 by learning with the image adapter 50 coupled to each of a plurality of base models 30. That is, the control unit 22 applies each of the plurality of base models 30 to the pre-training for generating the image adapter 50.
30 Base model (31: first base model (31a: during learning, 31b: trained), 32: second base model (32a: during learning, 32b: trained), 301 to 30x: 1st to x-th base models, 311 to 31x: 1st to x-th first base models, 321 to 32x: 1st to x-th second base models)
40 Target model (41: first target model (41a: during learning, 41b: trained), 42: second target model (42a: during learning, 42b: trained))
50 Adapter (50a: during learning, 50b: trained)
70 Trained model
100 Robot control system (2: robot, 2A: arm, 2B: end effector, 3: sensor, 4: camera, 5: robot influence range, 6: work start table, 7: work target table, 8: work object, 110: robot control device (recognition device))
Claims (16)
1. A trained model generation device comprising a control unit that generates a trained model that outputs a recognition result for a recognition target included in input information, wherein the control unit:
acquires an adapter capable of converting the input information before the input information is input to at least one base model, the adapter having been generated by executing second learning, while the adapter is coupled to the at least one base model, using teacher data containing information on a learning target that differs from the information used in first learning, the at least one base model having been generated by executing the first learning using teacher data containing information on a learning target that is identical or related to the input information;
generates a target model by executing third learning using teacher data containing information on the learning target that differs from both the information used in the first learning and the information used in the second learning; and
generates the trained model by coupling the adapter and the target model.

2. The trained model generation device according to claim 1, wherein the base model is a model pre-trained using, as teacher data, first pseudo information on the learning target generated virtually as information for a task that is identical or related to the input information, and the control unit trains the adapter coupled to the base model further using, as teacher data, at least one of first real information representing an actual aspect of the learning target and the first pseudo information, and generates the target model by learning using, as teacher data, second pseudo information generated virtually as data representing the recognition target or second real information representing an actual aspect of the recognition target.

3. The trained model generation device according to claim 2, wherein the first pseudo information, the second pseudo information, the first real information, and the second real information include images, and the adapter converts an aspect of an input image and outputs the result.

4. The trained model generation device according to claim 2 or 3, wherein the base model is a model pre-trained using only, as teacher data, first pseudo information on the learning target generated virtually as information for a task that is identical or related to the input information, and the control unit generates the target model by learning using only, as teacher data, second pseudo information generated virtually as data representing the recognition target.

5. The trained model generation device according to any one of claims 1 to 4, wherein a plurality of the base models are generated by pre-training, the adapter is configured so that the input information can be input to each of the plurality of base models, and the control unit generates at least the adapter by inputting the output of the adapter to each of the plurality of base models for learning.

6. The trained model generation device according to claim 5, wherein the control unit generates or updates only the adapter by inputting the output of the adapter to each of the plurality of base models for learning.

7. The trained model generation device according to claim 5, wherein, in order to generate the adapter, the control unit classifies the plurality of base models into a plurality of groups and applies each group in turn to the pre-training for generating the adapter.

8. The trained model generation device according to claim 7, wherein the control unit classifies one base model into each group.

9. The trained model generation device according to claim 7 or 8, wherein the control unit randomly determines the order in which each group is applied to the pre-training for generating the adapter.

10. The trained model generation device according to any one of claims 1 to 9, wherein the control unit generates the target model to be coupled to the adapter by learning based on second pseudo information generated virtually as data representing the recognition target.

11. The trained model generation device according to any one of claims 1 to 10, wherein the control unit generates the adapter by learning with the adapter coupled to the target model.

12. The trained model generation device according to any one of claims 1 to 11, wherein the control unit trains the adapter coupled to the base model so as to optimize a loss function of a task that is identical or related to the input information.

13. The trained model generation device according to any one of claims 1 to 11, wherein the control unit generates the adapter coupled to the base model by learning so as to optimize a loss function other than that of a task that is identical or related to the input information.

14. The trained model generation device according to any one of claims 1 to 13, wherein the base model includes a first base model that outputs a result of extracting a feature value of the input information, and a second base model that makes a predetermined judgment about the input information based on the output of the first base model.

15. A trained model generation method executed by a trained model generation device that generates a trained model that outputs a recognition result for a recognition target included in input information, the method comprising:
acquiring an adapter capable of converting the input information before the input information is input to at least one base model, the adapter having been generated by executing second learning, while the adapter is coupled to the at least one base model, using teacher data containing information on a learning target that differs from the information used in first learning, the at least one base model having been generated by executing the first learning using teacher data containing information on a learning target that is identical or related to the input information;
generating a target model by executing third learning using teacher data containing information on the learning target that differs from both the information used in the first learning and the information used in the second learning; and
generating the trained model by coupling the adapter and the target model.

16. A recognition device comprising a trained model that outputs a recognition result for a recognition target included in input information, wherein the trained model includes:
an adapter capable of converting the input information before the input information is input to at least one base model, the adapter having been generated by executing second learning, while the adapter is coupled to the at least one base model, using teacher data containing information on a learning target that differs from the information used in first learning, the at least one base model having been generated by executing the first learning using teacher data containing information on a learning target that is identical or related to the input information; and
a target model generated by executing third learning using teacher data containing information on the learning target that differs from both the information used in the first learning and the information used in the second learning,
the trained model being configured by coupling the adapter and the target model.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023513902A JP7271809B2 (ja) | 2021-05-28 | 2022-05-27 | Trained model generation device, trained model generation method, and recognition device |
EP22811422.9A EP4350614A1 (en) | 2021-05-28 | 2022-05-27 | Trained model generating device, trained model generating method, and recognition device |
CN202280037790.3A CN117396927A (zh) | 2021-05-28 | 2022-05-27 | Training model generation device, training model generation method, and recognition device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021090676 | 2021-05-28 | ||
JP2021-090676 | 2021-05-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022250154A1 (ja) | 2022-12-01 |
Family
ID=84228930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/021815 WO2022250154A1 (ja) | Trained model generation device, trained model generation method, and recognition device | 2021-05-28 | 2022-05-27 |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP4350614A1 (ja) |
JP (2) | JP7271809B2 (ja) |
CN (1) | CN117396927A (ja) |
WO (1) | WO2022250154A1 (ja) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- JP2016071502A | 2014-09-29 | 2016-05-09 | セコム株式会社 | Object identification device
- WO2019194256A1 * | 2018-04-05 | 2019-10-10 | 株式会社小糸製作所 | Arithmetic processing device, object identification system, learning method, automobile, and vehicle lamp
- US10565471B1 * | 2019-03-07 | 2020-02-18 | Capital One Services, Llc | Systems and methods for transfer learning of neural networks
- US20200134469A1 * | 2018-10-30 | 2020-04-30 | Samsung Sds Co., Ltd. | Method and apparatus for determining a base model for transfer learning
- JP2020144700A * | 2019-03-07 | 2020-09-10 | 株式会社日立製作所 | Image diagnosis device, image processing method, and program
- JP2021056785A * | 2019-09-30 | 2021-04-08 | セコム株式会社 | Image recognition system, imaging device, recognition device, and image recognition method
2022
- 2022-05-27: EP application EP22811422.9A (EP4350614A1), status: pending
- 2022-05-27: CN application CN202280037790.3A (CN117396927A), status: pending
- 2022-05-27: JP application JP2023513902A (JP7271809B2), status: active
- 2022-05-27: WO application PCT/JP2022/021815 (WO2022250154A1), status: application filing

2023
- 2023-04-26: JP application JP2023072697A (JP2023099084A), status: pending
Also Published As
Publication number | Publication date |
---|---|
JP2023099084A (ja) | 2023-07-11 |
EP4350614A1 (en) | 2024-04-10 |
JPWO2022250154A1 (ja) | 2022-12-01 |
JP7271809B2 (ja) | 2023-05-11 |
CN117396927A (zh) | 2024-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11338435B2 (en) | Gripping system with machine learning | |
US11741701B2 (en) | Autonomous task performance based on visual embeddings | |
- CN111275063A | Robot intelligent grasping control method and system based on 3D vision | |
- JP7200610B2 | Position detection program, position detection method, and position detection device | |
Moutinho et al. | Deep learning-based human action recognition to leverage context awareness in collaborative assembly | |
- JP7271809B2 | Trained model generation device, trained model generation method, and recognition device | |
- JP7271810B2 | Trained model generation device, trained model generation method, and recognition device | |
Mohammed et al. | Color matching based approach for robotic grasping | |
- WO2023042895A1 | Trained model generation method, inference device, and trained model generation device | |
Andersen et al. | Using a flexible skill-based approach to recognize objects in industrial scenarios | |
EP4389367A1 (en) | Holding mode determination device for robot, holding mode determination method, and robot control system | |
- JP7483179B1 | Estimation device, learning device, estimation method, and estimation program | |
EP4393660A1 (en) | Trained model generation method, trained model generation device, trained model, and device for estimating maintenance state | |
Somani et al. | Scene perception and recognition for human-robot co-operation | |
Güler et al. | Visual state estimation in unseen environments through domain adaptation and metric learning | |
- JP7470062B2 | Information processing device and learning recognition system | |
Tokuda et al. | CNN-based Visual Servoing for Pose Control of Soft Fabric Parts | |
Johnson et al. | Recognition of Marker-less human actions in videos using hidden Markov models | |
Gu et al. | TOWARDS AUTOMATED ROBOT MANIPULATION: A UNIFIED ACTIVE VISION FRAMEWORK | |
Qi et al. | 3D Hand Joint and Grasping Estimation for Teleoperation System | |
- KR20230175122A | Robot control method for manipulating, in particular picking up, an object | |
Somei et al. | Clustering of image features based on contact and occlusion among robot body and objects | |
- KR20240096990A | Control device for a robot that repositions a non-fixed object | |
Li et al. | Multilevel part-based model for object manipulation | |
Palm | Recognition of Human Grasps by Time-clustering, Fuzzy Modeling, and Hidden Markov Models |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22811422; Country of ref document: EP; Kind code of ref document: A1
 | ENP | Entry into the national phase | Ref document number: 2023513902; Country of ref document: JP; Kind code of ref document: A
 | WWE | Wipo information: entry into national phase | Ref document number: 202280037790.3; Country of ref document: CN
 | NENP | Non-entry into the national phase | Ref country code: DE
 | WWE | Wipo information: entry into national phase | Ref document number: 2022811422; Country of ref document: EP
 | ENP | Entry into the national phase | Ref document number: 2022811422; Country of ref document: EP; Effective date: 20240102