WO2022250153A1 - Trained model generation device, trained model generation method, and recognition device - Google Patents
Trained model generation device, trained model generation method, and recognition device
- Publication number
- WO2022250153A1 (PCT/JP2022/021814)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- information
- target
- learning
- adapter
- Prior art date
Classifications
- G06T7/75 (Image analysis — determining position or orientation of objects or cameras using feature-based methods involving models)
- G06V10/70 (Image or video recognition or understanding using pattern recognition or machine learning)
- G06T2207/10024 (Image acquisition modality — color image)
- G06T2207/20076 (Special algorithmic details — probabilistic image processing)
- G06T2207/20081 (Special algorithmic details — training; learning)
- G06T2207/20084 (Special algorithmic details — artificial neural networks [ANN])
Definitions
- the present disclosure relates to a trained model generation device, a trained model generation method, and a recognition device.
- a trained model generating device includes a control unit that generates a trained model that outputs a recognition result of a recognition target included in input information.
- the control unit acquires base models including at least a first base model generated by learning first information that is the same as or related to the input information as teacher data.
- the control unit uses the first base model as a first target model and, in a state in which a second target model to be combined with the first target model is combined with the first target model, learns second information representing the recognition target as teacher data, thereby generating a target model including the first target model and the second target model.
- the control unit acquires an adapter generated by learning at least third information as teacher data while being coupled to the base model.
- the control unit generates the learned model by binding the adapter to the target model.
- a trained model generation method is executed by a trained model generation device that generates a trained model that outputs a recognition result of a recognition target included in input information.
- the trained model generation method includes the trained model generation device acquiring a base model including at least a first base model generated by learning first information that is the same as or related to the input information as teacher data.
- the trained model generation method includes the trained model generation device, using the first base model as a first target model and with a second target model to be combined with the first target model combined with the first target model, generating a target model including the first target model and the second target model by learning second information representing the recognition target as teacher data.
- the learned model generation method includes acquiring an adapter generated by learning at least third information as teacher data while the learned model generation device is coupled to the base model.
- the trained model generation method includes the trained model generation device generating the trained model by binding the adapter to the target model.
- a recognition device includes a trained model that outputs a recognition result of a recognition target included in input information.
- the trained models include base models including at least a first base model generated by learning first information that is the same as or related to the input information as teacher data.
- the trained model includes a target model comprising the first target model, which is the first base model, and a second target model generated by learning second information representing the recognition target as teacher data while combined with the first target model.
- the learned model includes an adapter generated by learning at least third information as teacher data while being combined with the base model. The adapter is coupled to the target model.
- FIG. 1 is a block diagram showing a configuration example of a trained model generation system according to an embodiment.
- FIG. 2 is a schematic diagram showing a general-purpose library and a trained model to which an image adapter is coupled as an adapter.
- FIG. 3 is a diagram showing an example of an image adapter.
- FIG. 4 is a schematic diagram showing a general-purpose library and a trained model to which a weight adapter is coupled as an adapter.
- FIG. 5 is a diagram showing an example of a weight adapter.
- FIG. 6 is a schematic diagram showing generation of a general-purpose library through learning, and generation of a trained model through learning and transfer of a part of the general-purpose library.
- FIG. 7 is a flow chart showing an example procedure of a general-purpose library generation method.
- FIG. 8 is a flow chart showing an example procedure of a trained model generation method.
- FIG. 9 is a schematic diagram showing a configuration example of a robot control system.
- when a model trained on pseudo data is applied to real data, the recognition accuracy may decrease due to the domain gap, also called the Sim-to-Real gap.
- more generally, recognition accuracy may decrease due to a domain gap that occurs when learning models are transferred, so improvement of recognition accuracy in the presence of various domain gaps is required. According to the trained model generation device, the trained model generation method, and the recognition device according to an embodiment of the present disclosure, recognition accuracy can be improved.
- (Configuration example of trained model generation system 1) A trained model generation system 1 according to an embodiment of the present disclosure generates a trained model 70 (see FIG. 2 or 4, etc.) that outputs a recognition result of a recognition target included in input information.
- the trained model generation system 1 generates a general-purpose library 60 (see FIG. 2 or 4, etc.) in preparation for generating the trained model 70, and generates the trained model 70 based on the general-purpose library 60.
- a trained model generation system 1 includes a general-purpose library generation device 10 and a trained model generation device 20.
- the trained model generation system 1 generates a general-purpose library 60 with a general-purpose library generation device 10 and a trained model 70 with a trained model generation device 20 .
- the trained model generation system 1 can reduce the workload of the trained model generation device 20 that generates the trained model 70 by generating the general-purpose library 60 and the trained model 70 in separate devices.
- the general-purpose library generation device 10 includes a first control section 12, a first interface 14, and a first information generation section 16.
- the trained model generation device 20 includes a second control section 22 , a second interface 24 and a second information generation section 26 .
- the designations "first" and "second" are provided merely to distinguish between the features included in each of the different devices.
- the first control unit 12 and the second control unit 22 are also simply referred to as control units.
- the first interface 14 and the second interface 24 are also simply referred to as interfaces.
- the first control unit 12 of the general-purpose library generation device 10 acquires information about the target applied to learning from the first information generation unit 16 .
- the second control unit 22 of the trained model generation device 20 acquires information about the target applied to learning from the second information generation unit 26 .
- Objects that are applied to learning are also referred to as learning objects.
- the first control unit 12 and the second control unit 22 perform learning using the information about the learning target acquired from the first information generation unit 16 and the second information generation unit 26 as teacher data, and output information or data based on the learning result.
- the learning target for generating the trained model 70 may include the object itself to be recognized, or may include another object. An object that can be recognized by the trained model 70 is also called a recognition target.
- the first control unit 12 and the second control unit 22 may be configured with at least one processor to provide control and processing power to perform various functions.
- the processor may execute programs that implement various functions of the first controller 12 and the second controller 22 .
- a processor may be implemented as a single integrated circuit.
- An integrated circuit is also called an IC (Integrated Circuit).
- a processor may be implemented as a plurality of communicatively coupled integrated and discrete circuits. Processors may be implemented based on various other known technologies.
- the first control unit 12 and the second control unit 22 may have storage units.
- the storage unit may include an electromagnetic storage medium such as a magnetic disk, or may include a memory such as a semiconductor memory or a magnetic memory.
- the storage unit stores various information.
- the storage unit stores programs and the like executed by the first control unit 12 and the second control unit 22 .
- the storage unit may be configured as a non-transitory readable medium.
- the storage section may function as a work memory for the first control section 12 and the second control section 22 . At least a part of the storage section may be configured separately from the first control section 12 and the second control section 22 .
- the first interface 14 of the general-purpose library generation device 10 and the second interface 24 of the trained model generation device 20 mutually input and output information or data.
- the first interface 14 and the second interface 24 may be configured including communication devices configured to be capable of wired or wireless communication.
- the first interface 14 and the second interface 24 are also called communication units.
- a communication device may be configured to be able to communicate with communication schemes based on various communication standards.
- the first interface 14 and the second interface 24 can be constructed using known communication techniques.
- the first interface 14 outputs information or data acquired from the first control unit 12 to the learned model generation device 20, and outputs information or data acquired from the learned model generation device 20 to the first control unit 12.
- the second interface 24 outputs information or data acquired from the second control unit 22 to the general-purpose library generation device 10 and outputs information or data acquired from the general-purpose library generation device 10 to the second control unit 22 .
- the first information generation unit 16 of the general-purpose library generation device 10 outputs teacher data used in learning in the first control unit 12 to the first control unit 12 .
- the second information generation unit 26 of the trained model generation device 20 outputs teacher data used in learning in the second control unit 22 to the second control unit 22 .
- the first information generation unit 16 and the second information generation unit 26 may generate teacher data, or may acquire teacher data from an external device.
- the first information generation unit 16 and the second information generation unit 26 may include at least one processor to provide control and processing capability for generating or acquiring teacher data.
- the processor may execute a program that generates or acquires teacher data.
- the first information generator 16 and the second information generator 26 may be configured identically or similarly to the first controller 12 and the second controller 22 .
- the first information generator 16 may be configured integrally with the first controller 12 .
- the second information generator 26 may be configured integrally with the second controller 22 .
- the first information generation unit 16 may generate information representing the actual mode of the learning target as teacher data.
- Information representing the actual aspect of the learning object is also referred to as actual information.
- the information representing the actual mode of the learning target can be said to be the same as or related to the input information.
- the first information generation unit 16 may include a camera that captures an actual image of the learning object.
- the first information generation unit 16 may perform annotation by adding information such as a label to the actual image to be learned.
- the first information generator 16 may receive an operation input related to annotation from the user.
- the first information generation unit 16 may perform annotation based on a learning model for annotation prepared in advance.
- the first information generator 16 can generate actual information by annotating the actual image to be learned.
- the first information generation unit 16 and the second information generation unit 26 virtually generate, as teacher data, information about the learning target as information of a task that is the same as or related to the input information input to the trained model 70 .
- an example of input information is an image depicting organisms including mammals.
- the information about the learning object generated as the information of the same task as the input information is an image of a mammal.
- the information about the learning target generated as the information of the task related to the input information is, for example, an image of a reptile.
- the information about the learning object that is virtually generated is also called pseudo information.
- the pseudo information generated by the first information generator 16 is also called first pseudo information.
- the pseudo information generated by the second information generator 26 is also called second pseudo information.
- the first information generator 16 and the second information generator 26 may generate the first pseudo information and the second pseudo information using, for example, the same method, the same specifications, or the same environment. More specifically, when the first information generator 16 and the second information generator 26 virtually generate the first pseudo information and the second pseudo information, the same software processing may be used to generate the information.
- a task may include, for example, a classification task that classifies recognition targets included in input information into at least two types.
- the classification task can be subdivided into, for example, a task of distinguishing whether the recognition target is a dog or a cat, or a task of distinguishing whether the recognition target is a cow or a horse.
- Tasks are not limited to classification tasks, and may include tasks that implement various other operations.
- a task may include a segmentation task that determines which pixels belong to a particular object.
- a task may include object detection that detects a rectangular region enclosing an object.
- the task may include object pose estimation.
- a task may include keypoint detection to find certain feature points.
- for example, when the input information and the information about the learning target are both classification task information but relate to different classification tasks, the relationship between the input information and the information about the learning target is assumed to be that of related task information.
- when the input information and the information about the learning target are both task information for distinguishing whether the recognition target is a dog or a cat, the relationship between the input information and the information about the learning target is that of the same task information.
- the relationship between the input information and the learning target information is not limited to these examples, and can be determined under various conditions.
- the first information generation unit 16 and the second information generation unit 26 may generate information that virtually represents the appearance of the learning target in order to generate pseudo information.
- the first information generation unit 16 and the second information generation unit 26 may generate, as information that virtually represents the appearance of the learning target, modeling data such as three-dimensional CAD (Computer Aided Design) data of the appearance of the learning target.
- the first information generation unit 16 and the second information generation unit 26 may generate a learning target image as information that virtually represents the appearance of the learning target.
- the first information generation unit 16 and the second information generation unit 26 may perform annotation by adding information such as a label to modeling data or an image that virtually represents the appearance of the object to be learned.
- the first information generation unit 16 and the second information generation unit 26 can generate pseudo information by annotating the generated information that virtually represents the appearance of the object to be learned.
- the first information generation unit 16 and the second information generation unit 26 may acquire information that virtually represents the appearance of the learning object from an external device.
- the first information generation unit 16 and the second information generation unit 26 may receive input regarding modeling data from the user.
- the first information generation unit 16 and the second information generation unit 26 may acquire data obtained by annotating information that virtually represents the appearance of the object to be learned.
- the first information generation unit 16 and the second information generation unit 26 may receive an operation input regarding annotation from the user.
- the first information generation unit 16 and the second information generation unit 26 may perform annotation on information that virtually represents the appearance of a learning target based on a learning model for annotation prepared in advance.
- the trained model generation system 1 generates a general-purpose library 60 in advance and generates a trained model 70 based on the general-purpose library 60. Specifically, the trained model generation system 1 transfers part of the general-purpose library 60 to the trained model 70 as illustrated in FIGS. 2 and 4.
- the general-purpose library 60 is represented as a model in which the adapter 50 is combined with the base model 30 .
- Base model 30 includes a first base model 31 and a second base model 32 .
- the trained model 70 is expressed as a model in which the adapter 50 is coupled to the target model 40 .
- Target model 40 includes a first target model 41 and a second target model 42 .
- the base model 30 and the target model 40 are configured as a CNN (Convolutional Neural Network) having multiple layers. Information input to the base model 30 and the target model 40 is subjected to convolution based on predetermined weighting factors in each layer of the CNN. In training the base model 30 and the target model 40, the weighting factors are updated.
- the base model 30 and the target model 40 may be configured as VGG16 or ResNet50.
- the base model 30 and the target model 40 are not limited to these examples, and may be configured as various other models.
- the part transferred from the general-purpose library 60 to the trained model 70 is commonly included in the general-purpose library 60 and the trained model 70, and is also called a backbone.
- the first base model 31 and the first target model 41 correspond to the backbone.
- a portion that is not common between the general-purpose library 60 and the trained model 70 is also called a head.
- the second base model 32 and the second target model 42 correspond to the head.
- Base model 30 and target model 40 include a backbone and a head.
- Generic library 60 and trained model 70 also include backbone, head, and adapter 50 .
- the backbone is configured to output the result of extracting the feature quantity of the input information.
- the feature quantity represents, for example, the feature of the appearance of the learning object as a numerical value.
- the head is configured to make predetermined decisions about the input information based on the output of the backbone. Specifically, the head may output the recognition result of the recognition target included in the input information based on the feature amount of the input information output by the backbone. That is, the head is configured to perform recognition of the recognition target as a predetermined determination.
- the feature quantity can be a parameter representing the ratio of striped area on the body surface.
- the predetermined determination may be to determine whether the recognition target is a horse or a zebra by comparing the area ratio of the striped pattern on the body surface with a threshold value.
- the feature quantity may be a parameter representing the size or the number of holes in the shell.
- the predetermined determination may be comparing the size or the number of holes in the shell with a threshold value to determine whether the recognition target is an abalone or a tokobushi (a small Japanese abalone).
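- the backbone/head division described above can be illustrated with a short code sketch. The patent does not specify an implementation framework; the following is a minimal PyTorch sketch, assuming a ResNet50 backbone (one of the models named above) and a two-class head, with all sizes and names chosen for illustration only.

```python
# Minimal PyTorch sketch of the backbone/head split described above.
# The split point, feature size, and class count are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

# backbone: feature extractor (everything up to the final classifier)
backbone = nn.Sequential(*list(models.resnet50(weights=None).children())[:-1])
# head: makes the predetermined determination from the extracted features
head = nn.Linear(2048, 2)  # e.g. a two-class decision such as horse vs. zebra

x = torch.randn(1, 3, 224, 224)    # dummy input image
features = backbone(x).flatten(1)  # backbone outputs a feature quantity
logits = head(features)            # head outputs the recognition result
```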
- the general-purpose library generation device 10 generates a base model 30 including a first base model 31 and a second base model 32 by learning based on teacher data. Further, the general-purpose library generating device 10 generates the adapter 50 by executing learning based on the teacher data while the adapter 50 is coupled to the base model 30 and updating the adapter 50 based on the learning result. The general-purpose library generation device 10 generates the general-purpose library 60 by connecting the adapter 50 to the base model 30 .
- the learned model generation device 20 acquires the first base model 31 from the general-purpose library generation device 10 as the first target model 41 . That is, the first target model 41 is the same as the first base model 31 .
- the trained model generation device 20 generates the target model 40 by learning based on teacher data.
- the trained model generation device 20 generates the second target model 42 in accordance with the already acquired first target model 41 .
- the trained model generation device 20 acquires the adapter 50 from the general-purpose library generation device 10 .
- the trained model generation device 20 generates a trained model 70 by connecting the adapter 50 acquired from the general-purpose library generation device 10 to the generated target model 40 .
- the trained model generation system 1 transfers the first base model 31 as the first target model 41 from the general-purpose library 60 to the trained model 70 . Also, the trained model generation system 1 transfers the adapter 50 from the general-purpose library 60 to the trained model 70 .
- the first base model 31 may be included in the base model 30 learned by using the first pseudo information generated by the first information generation unit 16 as teacher data.
- the general-purpose library 60 and the trained model 70 illustrated in FIG. 2 include an image adapter 51 as the adapter 50.
- Image adapter 51 is coupled to the input side of base model 30 or target model 40 .
- the image adapter 51 is configured to convert input information before it is input to the base model 30 or target model 40 .
- the image adapter 51 may be configured as a CNN with multiple layers, as illustrated in FIG. 3.
- the image adapter 51 is coupled to the input side of the target model 40, but it can also be coupled to the input side of the base model 30.
- Blocks labeled “Conv” represent performing convolution. Convolution is also called downsampling. Also, the block described as “Conv Trans” represents the execution of transposed convolution. Transposed convolution is also called upsampling. Transposed convolution is sometimes referred to as deconvolution.
- the block labeled "Conv 4x4" represents that the size of the filter used to perform the convolution on the two-dimensional data is 4x4.
- a filter also called a kernel, corresponds to a set of weighting coefficients in performing a convolution or deconvolution of the information input to the block.
- the block labeled "Conv Trans 4x4" represents that the size of the filter used to perform the transposed convolution on the two-dimensional data is 4x4.
- the block labeled "stride 2" represents shifting the filter by two elements when performing convolution or transposed convolution. Conversely, blocks without “stride 2" indicate that the filter is shifted by one element when performing convolution or transposed convolution.
- when the image adapter 51 is connected to the input side of the base model 30, it converts the first pseudo information or real information input for learning and outputs it to the base model 30.
- the image adapter 51 converts the input image and outputs it to the base model 30 .
- the image adapter 51 converts and outputs an image to be recognized included in the input information input to the trained model 70 .
- the image adapter 51 may convert the form of the input image and output it.
- the image adapter 51 may convert the form of the input image to, for example, emphasize the edges of the image or brighten the shadowed part, and then output the converted image, but is not limited to this.
- the image adapter 51 converts the input information into a form in which the connected base model 30 or target model 40 can correctly process the task. For example, if the task is recognition of an object included in an image, the input is converted so that the base model 30 or the target model 40 can output the result of correctly recognizing the recognition target.
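- as an illustration of the image adapter structure described above (strided "Conv 4x4" downsampling followed by "Conv Trans 4x4" upsampling), here is a minimal PyTorch sketch; since FIG. 3 is not reproduced here, the channel counts and depth are assumptions.

```python
# Minimal PyTorch sketch of an image adapter: 4x4 strided convolutions
# (downsampling) followed by 4x4 transposed convolutions (upsampling).
# Channel counts and depth are illustrative assumptions.
import torch
import torch.nn as nn

image_adapter = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),             # "Conv 4x4", stride 2
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),           # "Conv 4x4", stride 2
    nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # "Conv Trans 4x4", stride 2
    nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),    # back to an image
)

x = torch.randn(1, 3, 224, 224)  # input image
converted = image_adapter(x)     # converted image fed to the base/target model
assert converted.shape == x.shape
```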
- the general-purpose library 60 and the trained model 70 illustrated in FIG. 4 include a weight adapter 52 as the adapter 50.
- Weight adapter 52 is coupled to the interior of base model 30 or target model 40 .
- the weight adapter 52 is configured to convert information output by the base model 30 or the target model 40 by being coupled inside the base model 30 or the target model 40 .
- a configuration in which the weight adapter 52 is coupled inside the target model 40 will be described below with reference to FIG. 5.
- the target model 40 includes an input layer 43 to which input information is input, an intermediate layer 44, and an output layer 45 to output information from the target model 40.
- the intermediate layer 44 is coupled with the input layer 43 via the first target model 41 .
- the first target model 41 represents a first relationship specified by a first weighting coefficient that represents the strength of coupling between the input layer 43 and the intermediate layer 44 . That is, the intermediate layer 44 is coupled with the input layer 43 in the first relationship by being coupled with the input layer 43 via the first target model 41 .
- Input information input to the input layer 43 is transformed by the first target model 41 based on the first relationship. Information obtained by converting the input information based on the first relationship is also referred to as conversion information.
- Intermediate layer 44 passes the transform information to output layer 45 .
- the output layer 45 is coupled with the intermediate layer 44 via the second target model 42 .
- a second target model 42 represents a second relationship specified by a second weighting factor representing the strength of the coupling between the intermediate layer 44 and the output layer 45 . That is, the output layer 45 is coupled with the intermediate layer 44 in a second relationship by being coupled with the intermediate layer 44 via the second target model 42 .
- the transformed information passed through the intermediate layer 44 is transformed in the second target model 42 based on the second relationship. Information obtained by converting the conversion information based on the second relationship is also referred to as output information.
- the output layer 45 outputs output information as a recognition result of the recognition target included in the input information by the trained model 70 .
- the weight adapter 52 is connected in parallel to the first target model 41. Weight adapters 52 may be coupled in parallel to at least one layer of the CNN that makes up target model 40 . Although the weight adapter 52 is coupled to the first target model 41 in the example of FIG. 5 , it may be coupled to the second target model 42 . Weight adapter 52 may be coupled to first base model 31 or second base model 32 . The weight adapter 52 may be configured as one layer, or may be configured as two or more layers.
- a block labeled "Conv 1x1" represents performing convolution on two-dimensional data with a filter of size 1x1. Blocks labeled "Conv 1x1" may be replaced by various other blocks, such as "Conv 3x3".
- by being coupled in parallel to the first target model 41, the weight adapter 52 influences the conversion from input information to conversion information. That is, the weight adapter 52 can convert the conversion information.
- the weight adapter 52 is coupled to the second target model 42 to affect the conversion of transform information to output information. That is, the weight adapter 52 can convert the output information.
- the weight adapter 52 is coupled inside the target model 40 so as to convert at least one of the conversion information and the output information.
- the weight adapter 52 converts at least one of the conversion information and the output information so that the target model 40 can correctly process the task for the input information.
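- the parallel coupling of the weight adapter can be sketched as follows; a minimal PyTorch sketch, assuming the adapter is a single "Conv 1x1" branch whose output is added to the output of one convolution layer (layer sizes are illustrative).

```python
# Minimal PyTorch sketch of a weight adapter coupled in parallel to one
# convolution layer: the "Conv 1x1" branch output is added to the layer
# output, so the adapter converts the transformed information.
# Channel counts are illustrative assumptions.
import torch
import torch.nn as nn

class AdaptedConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.base = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)  # layer of the target model
        self.weight_adapter = nn.Conv2d(in_ch, out_ch, kernel_size=1)   # parallel "Conv 1x1" branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.weight_adapter(x)  # the parallel paths are summed

layer = AdaptedConv(64, 64)
y = layer(torch.randn(1, 64, 56, 56))  # conversion information corrected by the adapter
```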
- the trained model generation system 1 can generate the trained model 70 by executing the operations schematically shown in FIG. 6, for example. The operation of the trained model generation system 1 will be described below with reference to FIG.
- as a first step, the trained model generation system 1 generates a base model 30 using the general-purpose library generation device 10.
- the first control unit 12 of the general-purpose library generation device 10 acquires first pseudo information from the first information generation unit 16 as teacher data.
- the first control unit 12 learns based on the first pseudo information.
- the first control unit 12 inputs the first pseudo information to the base model 30 including the first base model 31a and the second base model 32a that are being learned.
- the first control unit 12 updates the base model 30 so as to increase the probability that the information output from the base model 30 being trained represents the learning target included in the first pseudo information.
- the first control unit 12 may update the base model 30 by updating the weighting coefficients.
- before starting learning, the base model 30 may be in a predetermined initial state.
- the weighting factor of the base model 30 may be set to a predetermined initial value.
- the first base model 31a and the second base model 32a to be updated by learning are represented by black rectangles.
- the first control unit 12 can generate the base model 30 by learning based on the first pseudo information.
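- the first step amounts to a standard supervised training loop; the following is a minimal PyTorch sketch, assuming a small classification-style base model and labeled pseudo images (the model definition, optimizer, and data are placeholders, not the patent's configuration).

```python
# Minimal PyTorch sketch of the first step: the base model (backbone + head)
# is updated so that its output matches the labels in the first pseudo
# information. Model, optimizer, and data are illustrative placeholders.
import torch
import torch.nn as nn

base_model = nn.Sequential(              # first base model 31a + second base model 32a
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 2),
)
optimizer = torch.optim.SGD(base_model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

pseudo_images = torch.randn(8, 3, 64, 64)  # dummy first pseudo information
pseudo_labels = torch.randint(0, 2, (8,))  # dummy annotation labels
loss = loss_fn(base_model(pseudo_images), pseudo_labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()                           # update the weighting factors
```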
- as a second step, the trained model generation system 1 generates the adapter 50 using the general-purpose library generation device 10.
- the first control unit 12 of the general-purpose library generation device 10 further acquires actual information as teacher data from the first information generation unit 16 .
- with the adapter 50 coupled to the base model 30 including the learned first base model 31b and second base model 32b generated in the first step, the first control unit 12 performs learning based on the first pseudo information and the real information, and updates the adapter 50.
- the first control unit 12 may update the adapter 50 by updating the weighting factor of the adapter 50 .
- the adapter 50 coupled to the base model 30 may be in a predetermined initial state. That is, the weighting factor of the adapter 50 may be set to a predetermined initial value.
- the first control unit 12 inputs the first pseudo information and the real information to the general-purpose library 60, in which the adapter 50a being trained is coupled to the learned base model 30 generated in the first step.
- the first control unit 12 updates the adapter 50 so that the information output from the general-purpose library 60 when the first pseudo information is input and the information output when the real information is input become closer to each other.
- the first control unit 12 may update the adapter 50 so as to increase the probability that the information output from the general-purpose library 60 to which the first pseudo information is input matches the information output from the general-purpose library 60 to which the real information is input.
- the adapter 50a to be updated by learning is represented by a black rectangle.
- the first control unit 12 can generate the adapter 50 through learning based on the first pseudo information and real information. Learning based on the first pseudo information and real information is also called pre-learning because it is executed prior to learning in the third step described later.
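- the pre-learning of the second step can be sketched as follows; a minimal PyTorch sketch, assuming paired pseudo and real images and a mean-squared-error distance as the "closeness" measure, neither of which the text specifies.

```python
# Minimal PyTorch sketch of the second step: the learned base model is
# frozen, the adapter is coupled to it, and only the adapter is updated so
# that the outputs for pseudo and real inputs become closer. The image
# pairing and the MSE criterion are assumptions.
import torch
import torch.nn as nn

base_model = nn.Sequential(              # stand-in for the learned base model
    nn.Conv2d(3, 8, 3, padding=1), nn.Flatten(), nn.Linear(8 * 64 * 64, 2),
)
adapter = nn.Conv2d(3, 3, kernel_size=1) # stand-in adapter in its initial state

for p in base_model.parameters():
    p.requires_grad = False              # the base model is not updated here

optimizer = torch.optim.SGD(adapter.parameters(), lr=0.01)

pseudo = torch.randn(4, 3, 64, 64)       # first pseudo information
real = torch.randn(4, 3, 64, 64)         # real information

loss = nn.functional.mse_loss(base_model(adapter(pseudo)),
                              base_model(adapter(real)))  # bring outputs closer
optimizer.zero_grad()
loss.backward()
optimizer.step()                         # update only the adapter
```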
- as a third step, the trained model generation system 1 generates the target model 40 using the trained model generation device 20.
- the second control unit 22 of the trained model generation device 20 acquires the second pseudo information from the second information generation unit 26 as teacher data.
- the second control unit 22 acquires the first base model 31 generated in the first step as the first target model 41 .
- the second control unit 22 performs learning by inputting the second pseudo information to the target model 40 including the acquired first target model 41 and the second target model 42a being trained, and updates the second target model 42a.
- the second control unit 22 may update the second target model 42a by updating the weighting coefficients of the second target model 42a.
- the second target model 42a being learned may be in a predetermined initial state before starting learning.
- the weighting factor of the second target model 42a during learning may be set to a predetermined initial value.
- the second control unit 22 updates the second target model 42a so as to increase the probability that the information output from the target model 40 being trained represents the learning target included in the second pseudo information.
- the second target model 42a to be updated by learning is represented by a black rectangle.
- by executing the operation described as the third step, the second control unit 22 generates the second target model 42 by learning based on the second pseudo information, and can generate a target model 40 including the acquired first target model 41 and the generated second target model 42.
- the second control unit 22 generates only the second target model 42 by learning in the third step.
- as a fourth step, the trained model generation system 1 generates a trained model 70 using the trained model generation device 20.
- the second control unit 22 of the trained model generation device 20 acquires the adapter 50 generated in the second step.
- the adapter 50 acquired by the second control unit 22 is represented as a learned adapter 50b.
- the second control unit 22 couples the learned adapter 50b to the target model 40 including the first target model 41 acquired in the third step and the learned second target model 42b generated in the third step. By doing so, a trained model 70 is generated.
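- the third and fourth steps can be sketched together; a minimal PyTorch sketch, assuming the transferred backbone is kept fixed while only the new head is trained, after which the pre-learned adapter is coupled (all module definitions are illustrative placeholders).

```python
# Minimal PyTorch sketch of the third and fourth steps: the transferred
# first target model (backbone) is frozen, only the second target model
# (head) is trained on second pseudo information, and the pre-learned
# adapter is then coupled to form the trained model.
import torch
import torch.nn as nn

first_target_model = nn.Sequential(       # transferred backbone (= first base model)
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
second_target_model = nn.Linear(16, 2)    # new head generated downstream
adapter = nn.Conv2d(3, 3, kernel_size=1)  # pre-learned adapter from the second step

for p in first_target_model.parameters():
    p.requires_grad = False               # only the second target model is updated

optimizer = torch.optim.SGD(second_target_model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 64, 64)        # dummy second pseudo information
labels = torch.randint(0, 2, (8,))
loss = loss_fn(second_target_model(first_target_model(images)), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# fourth step: couple the adapter to the target model to obtain the trained model
trained_model = nn.Sequential(adapter, first_target_model, second_target_model)
```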
- the trained model generation system 1 may execute the operations described above as procedures of the trained model generation method.
- the operations described above are divided into operations executed by the general-purpose library generation device 10 and operations executed by the trained model generation device 20 .
- An example of the procedure of operations performed by the general-purpose library generation device 10 and the trained model generation device 20 will be described below.
- the general-purpose library generation device 10 may execute a general-purpose library generation method including the procedure of the flowchart illustrated in FIG. 7.
- the general-purpose library generation method may be implemented as a general-purpose library generation program that is executed by a processor that constitutes the general-purpose library generation device 10 .
- the general purpose library generator program may be stored on non-transitory computer readable media.
- the first control unit 12 of the general-purpose library generation device 10 acquires first pseudo information from the first information generation unit 16 (step S1).
- the first control unit 12 generates the base model 30 by learning based on the first pseudo information (step S2).
- the first control unit 12 further acquires actual information from the first information generation unit 16 (step S3).
- the first control unit 12 generates the adapter 50 by learning based on the first pseudo information and the actual information with the adapter 50 coupled to the base model 30 (step S4).
- after executing the procedure of step S4, the first control unit 12 ends the execution of the procedure of the flowchart of FIG. 7.
- the first control unit 12 may output the generated first base model 31 and the adapter 50 to the learned model generating device 20 after executing the procedure of step S4.
- the trained model generation device 20 may execute a trained model generation method including the procedure of the flowchart illustrated in FIG. 8.
- the trained model generation method may be implemented as a trained model generation program that is executed by a processor that constitutes the trained model generation device 20 .
- the trained model generation program may be stored on non-transitory computer-readable media.
- the second control unit 22 of the trained model generation device 20 acquires the first base model 31 as the first target model 41 from the general-purpose library generation device 10 (step S11).
- the second control unit 22 acquires the second pseudo information from the second information generation unit 26 (step S12).
- the second control unit 22 generates the second target model 42 by learning based on the second pseudo information (step S13).
- the second control unit 22 acquires the adapter 50 from the general-purpose library generation device 10 (step S14).
- the second control unit 22 couples the adapter 50 to the target model 40 including the acquired first target model 41 and the generated second target model 42 (step S15). By doing so, the second control unit 22 can generate the learned model 70 including the adapter 50 and the target model 40 .
- after executing the procedure of step S15, the second control unit 22 ends the execution of the procedure of the flowchart of FIG. 8. After executing the procedure of step S15, the second control unit 22 may input the input information to the generated trained model 70 and evaluate the recognition accuracy of the recognition target included in the input information based on the output of the trained model 70.
- the trained model generation system 1 can generate the trained model 70 by executing the general-purpose library generation method and the trained model generation method on separate devices.
- since the trained model generation system 1 learns based on real information when generating the general-purpose library 60, it does not need to learn based on real information when generating the trained model 70.
- the trained model generation system 1 causes the trained model generation device 20 to execute the operation of generating the trained model 70, so the trained model generation device 20 only executes operations that do not involve learning based on actual information. As a result, the operation load of the trained model generation device 20 can be reduced.
- the general-purpose library generation device 10 is also called an upstream device.
- the general-purpose library 60 generated by learning with the general-purpose library generation device 10, which is an upstream device, is also called an upstream task.
- the upstream tasks are generated by prior learning by the service provider.
- the trained model generation device 20 is also called a downstream device.
- a trained model 70 generated by learning in the trained model generating device 20, which is a downstream device, is also called a downstream task. Downstream tasks are generated so that the end-user of the service can improve the recognition accuracy of the desired recognition target by learning to the desired recognition target.
- the trained model generation system 1 uses a large amount of data and computational resources in the upstream device to generate the upstream task, so that the downstream device can generate the downstream task and get the system up and running early.
- the trained model generation system 1 transfers the adapter 50 for domain adaptation from the upstream task to the downstream task, so that even a downstream task that has not learned based on actual information can increase its recognition accuracy for actual information. In other words, the base model 30 included in the upstream task is generated by learning so as to increase the recognition accuracy for pseudo information. In this case, although the recognition accuracy for real information is lower than the recognition accuracy for pseudo information, it can be improved by correction by the adapter 50.
- a new framework that can be proposed by the trained model generation system 1 according to this embodiment is also called Task Rehearsal Bridging (TRB).
- the trained model generation system 1 can apply the image adapter 51 or the weight adapter 52 as the adapter 50 .
- the trained model generation system 1 generates the upstream task based on the result of learning the adapter 50 based on pseudo information and real information, so that the learning performed for the upstream task can stand in for learning in the downstream task.
- (Comparison of recognition accuracy) When recognizing a recognition target from input information including a real image using a model generated by learning based only on generated images that are pseudo information, the recognition accuracy decreases due to the difference between the generated images and the real image. Specifically, a model that can recognize a recognition target with a probability close to 100% for a generated image may see that probability drop to about 70% for a real image.
- a trained model 70 according to this embodiment is generated as a model in which the adapter 50 is coupled to the target model 40 .
- the adapter 50 can correct errors in recognition results due to differences between the generated image and the actual image.
- the probability that the recognition target can be recognized with respect to the real image can be increased to about 90%. That is, when the adapter 50 is connected, the probability of being able to recognize the recognition target can be increased compared to when the adapter 50 is not connected.
- a robot control system 100 includes a robot 2 and a robot control device 110 .
- the robot 2 moves the work object 8 from the work start point 6 to the work target point 7 . That is, the robot control device 110 controls the robot 2 so that the work object 8 moves from the work start point 6 to the work target point 7 .
- the work object 8 is also referred to as a workpiece.
- the robot control device 110 controls the robot 2 based on information regarding the space in which the robot 2 works. Information about space is also referred to as spatial information.
- the robot 2 has an arm 2A and an end effector 2B.
- the arm 2A may be configured as, for example, a 6-axis or 7-axis vertical articulated robot.
- the arm 2A may be configured as a 3-axis or 4-axis horizontal articulated robot or SCARA robot.
- the arm 2A may be configured as a 2-axis or 3-axis Cartesian robot.
- Arm 2A may be configured as a parallel link robot or the like.
- the number of shafts forming the arm 2A is not limited to the illustrated one.
- the robot 2 has an arm 2A connected by a plurality of joints and operates by driving the joints.
- the end effector 2B may include, for example, a gripping hand configured to grip the work object 8.
- the grasping hand may have multiple fingers. The number of fingers of the grasping hand may be two or more. The fingers of the grasping hand may have one or more joints.
- the end effector 2B may include a suction hand configured to be able to suction the work object 8 .
- the end effector 2B may include a scooping hand configured to scoop the work object 8 .
- the end effector 2B may include a tool such as a drill, and may be configured to be able to perform various machining operations such as drilling a hole in the work object 8.
- the end effector 2B is not limited to these examples, and may be configured to perform various other operations. In the configuration illustrated in FIG. 9, the end effector 2B is assumed to include a grasping hand.
- the robot 2 can control the position of the end effector 2B by operating the arm 2A.
- the end effector 2B may have an axis that serves as a reference for the direction in which it acts on the work object 8. If the end effector 2B has an axis, the robot 2 can control the direction of the axis of the end effector 2B by operating the arm 2A.
- the robot 2 controls the start and end of the action of the end effector 2B acting on the work object 8 .
- the robot 2 can move or process the workpiece 8 by controlling the position of the end effector 2B or the direction of the axis of the end effector 2B and controlling the operation of the end effector 2B.
- in the configuration illustrated in FIG. 9, the robot 2 causes the end effector 2B to grip the work object 8 at the work start point 6 and moves the end effector 2B to the work target point 7.
- the robot 2 causes the end effector 2B to release the work object 8 at the work target point 7 . By doing so, the robot 2 can move the work object 8 from the work start point 6 to the work target point 7 .
- the robot control system 100 further comprises a sensor 3, as shown in FIG. 9. The sensor 3 detects physical information of the robot 2.
- the physical information of the robot 2 may include information on the actual position or orientation of each constituent part of the robot 2 or the velocity or acceleration of each constituent part of the robot 2 .
- the physical information of the robot 2 may include information about forces acting on each component of the robot 2 .
- the physical information of the robot 2 may include information about the current flowing through the motors that drive each component of the robot 2 or the torque of the motors.
- the physical information of the robot 2 represents the result of the actual motion of the robot 2 . In other words, the robot control system 100 can grasp the result of the actual motion of the robot 2 by acquiring the physical information of the robot 2 .
- the sensor 3 may include a force sensor or a tactile sensor that detects force acting on the robot 2, distributed pressure, slip, or the like as physical information of the robot 2.
- the sensor 3 may include a motion sensor that detects the position or posture, or the speed or acceleration of the robot 2 as the physical information of the robot 2 .
- the sensor 3 may include a current sensor that detects the current flowing through the motor that drives the robot 2 as the physical information of the robot 2 .
- the sensor 3 may include a torque sensor that detects the torque of the motor that drives the robot 2 as the physical information of the robot 2 .
- the sensor 3 may be installed in a joint of the robot 2 or in a joint driving section that drives the joint.
- the sensor 3 may be installed on the arm 2A of the robot 2 or the end effector 2B.
- the sensor 3 outputs the detected physical information of the robot 2 to the robot control device 110 .
- the sensor 3 detects and outputs physical information of the robot 2 at a predetermined timing.
- the sensor 3 outputs physical information of the robot 2 as time-series data.
- the robot control system 100 is assumed to have two cameras 4 .
- the camera 4 captures an image of an object, a person, or the like located within the influence range 5 that may affect the motion of the robot 2 .
- An image captured by the camera 4 may include monochrome luminance information, or may include luminance information of each color represented by RGB (Red, Green and Blue) or the like.
- the range of influence 5 includes the motion range of the robot 2 . It is assumed that the influence range 5 is a range obtained by expanding the motion range of the robot 2 further outward.
- the range of influence 5 may be set so that the robot 2 can be stopped before a person or the like moving from the outside to the inside of the motion range of the robot 2 enters the inside of the motion range of the robot 2 .
- the range of influence 5 may be set, for example, as a range that extends a predetermined distance from the boundary of the motion range of the robot 2 to the outside.
- the camera 4 may be installed so as to capture a bird's-eye view of the influence range 5 or the motion range of the robot 2 or a peripheral area thereof.
- the number of cameras 4 is not limited to two, and may be one or three or more.
- the robot control device 110 acquires the trained model 70 generated by the trained model generation device 20. Based on the image captured by the camera 4 and the trained model 70, the robot control device 110 recognizes the work object 8, the work start point 6, the work target point 7, and the like that exist in the space where the robot 2 works. In other words, the robot control device 110 acquires the trained model 70 generated for recognizing the work object 8 and the like based on the image captured by the camera 4. The robot control device 110 is also referred to as a recognition device.
- the robot controller 110 may be configured with at least one processor to provide control and processing power to perform various functions.
- Each component of the robot control device 110 may be configured including at least one processor.
- a plurality of components among the components of the robot control device 110 may be realized by one processor.
- the entire robot controller 110 may be implemented with one processor.
- the processor may execute programs that implement various functions of the robot controller 110 .
- a processor may be implemented as a single integrated circuit.
- An integrated circuit is also called an IC (Integrated Circuit).
- a processor may be implemented as a plurality of communicatively coupled integrated and discrete circuits. Processors may be implemented based on various other known technologies.
- the robot control device 110 may include a storage unit.
- the storage unit may include an electromagnetic storage medium such as a magnetic disk, or may include a memory such as a semiconductor memory or a magnetic memory.
- the storage unit stores various information, programs executed by the robot control device 110, and the like.
- the storage unit may be configured as a non-transitory readable medium.
- the storage unit may function as a work memory for the robot control device 110 . At least part of the storage unit may be configured separately from the robot controller 110 .
- The robot control device 110 acquires the trained model 70 in advance.
- The robot control device 110 may store the trained model 70 in the storage unit.
- The robot control device 110 obtains an image of the work object 8 from the camera 4.
- The robot control device 110 inputs the captured image of the work object 8 to the trained model 70 as input information.
- The robot control device 110 acquires the output information that the trained model 70 outputs in response to the input information.
- The robot control device 110 recognizes the work object 8 based on the output information, and performs work such as gripping and moving the work object 8.
- The robot control system 100 can thus acquire the trained model 70 from the trained model generation system 1 and recognize the work object 8 using the trained model 70, as illustrated by the sketch below.
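As a concrete illustration of this recognition flow, here is a minimal sketch assuming a PyTorch-based trained model 70; the file names, the preprocessing, and the classification-style output are assumptions, since the patent does not prescribe a framework or an output format.

```python
import torch
from torchvision import transforms
from PIL import Image

# Preprocessing for an image captured by camera 4 (assumed RGB input).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),  # luminance values per RGB channel in [0, 1]
])

# The trained model 70, acquired in advance and kept in the storage unit.
# (The file name is hypothetical; recent PyTorch versions may require
# torch.load(..., weights_only=False) to unpickle a whole module.)
model = torch.load("trained_model_70.pt", map_location="cpu")
model.eval()

image = Image.open("work_object_8.png").convert("RGB")  # image from camera 4
x = preprocess(image).unsqueeze(0)                      # input information

with torch.no_grad():
    output = model(x)  # output information from the trained model 70

# Interpreting the output as class scores for the recognition target.
print("recognized class:", output.softmax(dim=1).argmax(dim=1).item())
```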
- The foregoing describes a configuration for reducing the influence on recognition accuracy of the Sim-to-Real domain gap that can occur when a model trained using pseudo data is transferred to the recognition of real data.
- In transferring a model, various domain gaps may occur, not limited to the examples described above.
- The trained model generation system 1 according to this embodiment can be configured to reduce the influence of various domain gaps on recognition accuracy. For example, it can reduce the influence on recognition accuracy of a domain gap that can occur when a model is transferred to recognize data different from the teacher data used in upstream learning.
- Suppose a model is generated in upstream learning with real images as teacher data. This model may be transferred to recognize images captured in an environment different from the one in which the teacher-data images were captured; for example, the lighting environment may have changed. A domain gap may thus arise from changes in the shooting environment, such as changes in illumination.
- The trained model generation system 1 can reduce the influence on recognition accuracy of such various domain gaps, including those caused by changes in the shooting environment.
- The data used for model learning in this embodiment may include not only pseudo data but also real data, or may include real data instead of pseudo data.
- The teacher data for the base model 30 and the target model 40 may be real image data of the learning target.
- The teacher data for the adapter 50 may be image data obtained in a real environment in which work is performed on the real object to be learned, or image data obtained by simulating that real environment.
- The first pseudo information and the second pseudo information described in the above embodiments are also called first information and second information, respectively.
- Real information, as distinguished from pseudo information, is also called third information in order to distinguish it from the first information and the second information.
- The first information generation unit 16 and the second information generation unit 26 may generate the first information, the second information, and the like with cameras or the like having the same specifications.
- The first control unit 12 of the general-purpose library generation device 10 may generate the base model 30, including at least the first base model 31, by learning the first information as teacher data.
- The first control unit 12 may generate the adapter 50 by learning with the adapter 50 coupled to the base model 30.
- The second control unit 22 of the trained model generation device 20 may acquire the base model 30, including at least the first base model 31, and the adapter 50.
- The second control unit 22 may use the first base model 31 as the first target model 41 and, with the second target model 42 coupled to the first target model 41, generate the target model 40, including the first target model 41 and the second target model 42, by learning the second information as teacher data.
- The second control unit 22 may generate the trained model 70 by coupling the acquired adapter 50 to the generated target model 40, as sketched below.
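The flow in the steps above can be pictured with the following sketch; it is a schematic assuming PyTorch and toy network shapes of our own choosing, not the architecture the patent actually uses.

```python
import torch
import torch.nn as nn

first_base = nn.Sequential(  # first base model 31: feature extractor
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
second_base = nn.Linear(8, 5)  # second base model 32: upstream head
adapter = nn.Conv2d(3, 3, 1)   # adapter 50: input-side converter

def train(params, forward, data, epochs=1):
    opt = torch.optim.SGD(params, lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss_fn(forward(x), y).backward()
            opt.step()

# Toy stand-ins for the first and second information (teacher data).
first_info = [(torch.randn(4, 3, 32, 32), torch.randint(0, 5, (4,)))]
second_info = [(torch.randn(4, 3, 32, 32), torch.randint(0, 5, (4,)))]

# Generate the base model 30 by learning the first information.
train(list(first_base.parameters()) + list(second_base.parameters()),
      lambda x: second_base(first_base(x)), first_info)

# Generate the adapter 50 by learning with it coupled to the base model 30.
for p in list(first_base.parameters()) + list(second_base.parameters()):
    p.requires_grad = False
train(list(adapter.parameters()),
      lambda x: second_base(first_base(adapter(x))), first_info)

# Use the first base model 31 as the first target model 41, and learn only
# the newly coupled second target model 42 on the second information.
second_target = nn.Linear(8, 5)
train(list(second_target.parameters()),
      lambda x: second_target(first_base(x)), second_info)

# Trained model 70: couple the acquired adapter 50 to the target model 40.
trained_model_70 = nn.Sequential(adapter, first_base, second_target)
```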
- The trained model generation system 1 may set the loss function so that the output produced when input information is input to the generated trained model 70 approaches the output produced when the teacher data is input.
- In this embodiment, cross-entropy can be used as the loss function. Cross-entropy is calculated as a value representing the relationship between two probability distributions; specifically, in this embodiment, it is calculated as a value representing the relationship between the input pseudo or real information and the backbone, head, or adapter 50.
- The trained model generation system 1 learns so that the value of the loss function becomes small. As a result, the output corresponding to the input information can approach the output corresponding to the teacher data.
- Discrimination loss is a loss function used to learn the authenticity of a generated image by labeling it with a numerical value between 1, which represents complete truth, and 0, which represents complete falsehood.
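To make the two losses above concrete, here is a minimal sketch, again assuming PyTorch; the tensor shapes and the use of binary cross-entropy for the discrimination loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Cross-entropy: pulls the model output toward the teacher data.
logits = torch.randn(4, 10)           # model output for 4 inputs
teacher = torch.randint(0, 10, (4,))  # teacher-data class labels
ce = nn.CrossEntropyLoss()(logits, teacher)

# Discrimination loss: scores an image between 0 (completely false)
# and 1 (completely true); binary cross-entropy is one common choice.
disc_scores = torch.rand(4, 1).clamp(1e-6, 1 - 1e-6)  # discriminator output
real_labels = torch.ones(4, 1)                        # label: completely true
disc = nn.BCELoss()(disc_scores, real_labels)

print(float(ce), float(disc))  # learning drives these values down
```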
- The embodiments of the trained model generation system 1 and the robot control system 100 have been described above. An embodiment of the present disclosure can also be realized as a method or program for implementing these systems or devices, or as a storage medium on which the program is recorded (for example, an optical disc, a magneto-optical disk, a CD-ROM, a CD-R, a CD-RW, a magnetic tape, a hard disk, or a memory card).
- The implementation form of the program is not limited to an application program such as object code compiled by a compiler or program code executed by an interpreter; other forms may also be used.
- The program may or may not be configured so that all processing is performed only in the CPU on the control board.
- The program may be configured to be partially or wholly executed by another processing unit mounted on an expansion board or expansion unit added to the board as required.
- Embodiments according to the present disclosure are not limited to the specific configurations of the embodiments described above. Embodiments of the present disclosure can extend to any novel feature described in the present disclosure, or any combination thereof, and to any novel method or process step described, or any combination thereof.
- Descriptions such as "first" and "second" in this disclosure are identifiers for distinguishing between configurations. Configurations distinguished by descriptions such as "first" and "second" in this disclosure may have their numbers exchanged; for example, the first pseudo information can exchange the identifiers "first" and "second" with the second pseudo information. The exchange of identifiers is performed simultaneously, and the configurations remain distinct after the exchange. Identifiers may also be deleted, in which case the configurations are distinguished by reference signs. The mere use of identifiers such as "first" and "second" in this disclosure shall not be used as a basis for interpreting the order of the configurations or the existence of a lower-numbered identifier.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
Description
A trained model generation system 1 according to an embodiment of the present disclosure generates a trained model 70 (see FIG. 2 or FIG. 4, etc.) that outputs a recognition result of a recognition target included in input information. As preparation for generating the trained model 70, the trained model generation system 1 generates a general-purpose library 60 (see FIG. 2 or FIG. 4, etc.) and generates the trained model 70 based on the general-purpose library 60.
The first control unit 12 of the general-purpose library generation device 10 acquires information on a target to which learning is applied from the first information generation unit 16. The second control unit 22 of the trained model generation device 20 acquires information on a target to which learning is applied from the second information generation unit 26. A target to which learning is applied is also referred to as a learning target. The first control unit 12 and the second control unit 22 execute learning using the information on the learning target acquired from the first information generation unit 16 and the second information generation unit 26 as teacher data, and output information or data based on the learning result. For example, when the trained model 70 is generated as a model that recognizes a specific object such as an industrial part, the learning target for generating that trained model 70 may include the object to be recognized itself or may include other objects. An object that the trained model 70 can recognize is also referred to as a recognition target.
The first interface 14 of the general-purpose library generation device 10 and the second interface 24 of the trained model generation device 20 input and output information or data to and from each other. The first interface 14 and the second interface 24 may include communication devices configured to communicate by wire or wirelessly. The first interface 14 and the second interface 24 are also referred to as communication units. The communication devices may be configured to communicate using communication methods based on various communication standards. The first interface 14 and the second interface 24 can be configured using known communication technology.
The first information generation unit 16 of the general-purpose library generation device 10 outputs, to the first control unit 12, the teacher data used for learning in the first control unit 12. The second information generation unit 26 of the trained model generation device 20 outputs, to the second control unit 22, the teacher data used for learning in the second control unit 22. The first information generation unit 16 and the second information generation unit 26 may generate the teacher data themselves or may acquire it from an external device.
The trained model generation system 1 generates the general-purpose library 60 in advance and generates the trained model 70 based on the general-purpose library 60. Specifically, as illustrated in FIGS. 2 and 3, the trained model generation system 1 transfers part of the general-purpose library 60 to the trained model 70. The general-purpose library 60 is represented as a model in which the adapter 50 is coupled to the base model 30. The base model 30 includes a first base model 31 and a second base model 32. The trained model 70 is represented as a model in which the adapter 50 is coupled to the target model 40. The target model 40 includes a first target model 41 and a second target model 42. The base model 30 and the target model 40 are configured as CNNs (Convolutional Neural Networks) having a plurality of layers. Convolution based on predetermined weighting coefficients is executed in each layer of the CNN on the information input to the base model 30 and the target model 40. The weighting coefficients are updated in the learning of the base model 30 and the target model 40. The base model 30 and the target model 40 may be configured by VGG16 or ResNet50. The base model 30 and the target model 40 are not limited to these examples and may be configured as various other models.
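As an illustration of this structure, the following sketch builds a general-purpose library 60 from a ResNet50 backbone with an input-side adapter, assuming PyTorch/torchvision; the layer sizes and the adapter design are our own assumptions, since the patent only names VGG16 and ResNet50 as candidate architectures.

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=None)  # candidate first base model 31
backbone.fc = nn.Identity()               # expose the 2048-d feature vector
head = nn.Linear(2048, 10)                # candidate second base model 32
adapter = nn.Sequential(                  # adapter 50: image-to-image converter
    nn.Conv2d(3, 3, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(3, 3, kernel_size=3, padding=1))

# General-purpose library 60: adapter 50 coupled to the base model 30.
general_purpose_library_60 = nn.Sequential(adapter, backbone, head)

x = torch.randn(1, 3, 224, 224)             # one input image
print(general_purpose_library_60(x).shape)  # torch.Size([1, 10])
```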
The trained model generation system 1 can generate the trained model 70 by executing, for example, the operations schematically shown in FIG. 6. The operation of the trained model generation system 1 is described below with reference to FIG. 6.
The general-purpose library generation device 10 may execute a general-purpose library generation method including the procedure of the flowchart illustrated in FIG. 7. The general-purpose library generation method may be realized as a general-purpose library generation program executed by a processor constituting the general-purpose library generation device 10. The general-purpose library generation program may be stored in a non-transitory computer-readable medium.
The trained model generation device 20 may execute a trained model generation method including the procedure of the flowchart illustrated in FIG. 8. The trained model generation method may be realized as a trained model generation program executed by a processor constituting the trained model generation device 20. The trained model generation program may be stored in a non-transitory computer-readable medium.
As described above, the trained model generation system 1 according to the present embodiment can generate the trained model 70 by executing the general-purpose library generation method and the trained model generation method on separate devices. By learning based on real information when generating the general-purpose library 60, the trained model generation system 1 does not need to learn based on real information when generating the trained model 70. Since the trained model generation system 1 has the trained model generation device 20 execute the operation of generating the trained model 70, the trained model generation device 20 only needs to execute operations that do not include learning based on real information. As a result, the operational load on the trained model generation device 20 can be reduced.
When a recognition target is recognized from input information including real images using a model generated by learning based only on generated images, which are pseudo information, the recognition accuracy decreases due to the difference between the generated images and the real images. Specifically, for a model that can recognize a recognition target in generated images with a probability close to 100%, the probability of recognizing the recognition target in real images can drop to about 70%.
As shown in FIG. 9, a robot control system 100 according to an embodiment includes a robot 2 and a robot control device 110. In the present embodiment, it is assumed that the robot 2 moves the work object 8 from the work start point 6 to the work target point 7. That is, the robot control device 110 controls the robot 2 so that the work object 8 moves from the work start point 6 to the work target point 7. The work object 8 is also referred to as a work target. The robot control device 110 controls the robot 2 based on information about the space in which the robot 2 performs its work. Information about the space is also referred to as spatial information.
The robot 2 includes an arm 2A and an end effector 2B. The arm 2A may be configured, for example, as a 6-axis or 7-axis vertical articulated robot. The arm 2A may be configured as a 3-axis or 4-axis horizontal articulated robot or SCARA robot. The arm 2A may be configured as a 2-axis or 3-axis Cartesian robot. The arm 2A may be configured as a parallel link robot or the like. The number of axes constituting the arm 2A is not limited to those illustrated. In other words, the robot 2 has the arm 2A connected by a plurality of joints and operates by driving the joints.
As shown in FIG. 2, the robot control system 100 further includes a sensor 3. The sensor 3 detects physical information of the robot 2. The physical information of the robot 2 may include information about the actual position or posture of each component of the robot 2, or the velocity or acceleration of each component of the robot 2. The physical information of the robot 2 may include information about forces acting on each component of the robot 2. The physical information of the robot 2 may include information about the current flowing in the motors that drive each component of the robot 2, or the torque of those motors. The physical information of the robot 2 represents the results of the actual operation of the robot 2. That is, by acquiring the physical information of the robot 2, the robot control system 100 can grasp the results of the actual operation of the robot 2.
In the configuration example shown in FIG. 1, the robot control system 100 is assumed to include two cameras 4. The cameras 4 capture images of objects, people, or the like located within the influence range 5 that may affect the motion of the robot 2. An image captured by the camera 4 may include monochrome luminance information, or may include luminance information of each color represented by RGB (Red, Green and Blue) or the like. The influence range 5 includes the motion range of the robot 2 and is assumed to be a range obtained by expanding the motion range of the robot 2 further outward. The influence range 5 may be set so that the robot 2 can be stopped before a person or the like moving from outside the motion range of the robot 2 toward its inside actually enters the motion range. The influence range 5 may be set, for example, as a range extending a predetermined distance outward from the boundary of the motion range of the robot 2. The cameras 4 may be installed so as to capture a bird's-eye view of the influence range 5, the motion range of the robot 2, or their surrounding areas. The number of cameras 4 is not limited to two, and may be one, or three or more.
The robot control device 110 acquires the trained model 70 generated by the trained model generation device 20. Based on the images captured by the cameras 4 and the trained model 70, the robot control device 110 recognizes the work object 8, the work start point 6, the work target point 7, and the like, which exist in the space where the robot 2 performs its work. In other words, the robot control device 110 acquires the trained model 70 generated for recognizing the work object 8 and the like based on images captured by the cameras 4. The robot control device 110 is also referred to as a recognition device.
The robot control device 110 (recognition device) acquires the trained model 70 in advance. The robot control device 110 may store the trained model 70 in its storage unit. The robot control device 110 obtains an image of the work object 8 from the camera 4 and inputs the captured image of the work object 8 to the trained model 70 as input information. The robot control device 110 acquires the output information that the trained model 70 outputs in response to the input information. The robot control device 110 recognizes the work object 8 based on the output information and performs work such as gripping and moving the work object 8.
As described above, the robot control system 100 can acquire the trained model 70 from the trained model generation system 1 and recognize the work object 8 using the trained model 70.
Other embodiments are described below.
The embodiments described above explain a configuration for reducing the influence on recognition accuracy of the Sim-to-Real domain gap that can occur when a model trained on pseudo data is transferred to the recognition of real data. In transferring the trained model 70, various domain gaps may occur beyond the examples described above. The trained model generation system 1 according to this embodiment can be configured to reduce the influence of such various domain gaps on recognition accuracy. For example, it can reduce the influence on recognition accuracy of a domain gap that can occur when a model is transferred to recognize data different from the teacher data used in upstream learning.
The trained model generation system 1 may set the loss function so that the output produced when input information is input to the generated trained model 70 approaches the output produced when the teacher data is input. In this embodiment, cross-entropy can be used as the loss function. Cross-entropy is calculated as a value representing the relationship between two probability distributions; specifically, in this embodiment, it is calculated as a value representing the relationship between the input pseudo or real information and the backbone, head, or adapter 50.
10 General-purpose library generation device (12: first control unit, 14: first interface, 16: first information generation unit)
20 Trained model generation device (22: second control unit, 24: second interface, 26: second information generation unit)
30 Base model (31: first base model (31a: during learning, 31b: trained), 32: second base model (32a: during learning, 32b: trained))
40 Target model (41: first target model, 42: second target model (42a: during learning, 42b: trained))
50 Adapter (50a: during learning, 50b: trained)
60 General-purpose library
70 Trained model
100 Robot control system (2: robot, 2A: arm, 2B: end effector, 3: sensor, 4: camera, 5: robot influence range, 6: work start platform, 7: work target platform, 8: work object, 110: robot control device (recognition device))
Claims (10)
- A trained model generation device comprising a control unit that generates a trained model that outputs a recognition result of a recognition target included in input information, wherein the control unit:
acquires a base model, including at least a first base model, generated by learning, as teacher data, first information that is identical or related to the input information;
generates a target model including a first target model and a second target model by taking the first base model as the first target model and learning, as teacher data, second information that is identical or related to the input information, with a second target model to be coupled to the first target model coupled to the first target model;
acquires an adapter generated by learning, as teacher data, at least third information that is identical or related to the input information while the adapter is coupled to the base model; and
generates the trained model by coupling the adapter to the target model.
- The trained model generation device according to claim 1, wherein the adapter is coupled to the input side of the target model and is thereby configured to be able to convert the input information before it is input to the target model.
- The trained model generation device according to claim 1, wherein the target model has an input layer to which the input information is input, an intermediate layer coupled to the input layer, and an output layer coupled to the intermediate layer;
the intermediate layer is coupled to the input layer in a first relationship specified by a first weighting coefficient representing the strength of the coupling with the input layer, and passes to the output layer conversion information obtained by converting the input information based on the first relationship;
the output layer is coupled to the intermediate layer in a second relationship specified by a second weighting coefficient representing the strength of the coupling with the intermediate layer, and outputs output information obtained by converting the conversion information based on the second relationship as the recognition result, by the trained model, of the recognition target included in the input information; and
the adapter is coupled inside the target model so as to convert at least one of the conversion information and the output information.
- The trained model generation device according to any one of claims 1 to 3, wherein the adapter is generated by pre-learning the third information and the first information as teacher data.
- The trained model generation device according to any one of claims 1 to 4, wherein the second target model is obtained by learning only a second base model coupled to the first target model, with the second information as teacher data.
- The trained model generation device according to any one of claims 1 to 5, wherein the first information, the second information, the third information, and the input information include images;
the adapter coupled to the base model converts and outputs an input image of the first information or the third information; and
the adapter coupled to the target model converts and outputs an image of the recognition target included in the input information.
- The trained model generation device according to any one of claims 1 to 6, wherein the first target model outputs a result of extracting a feature amount of the input information, and the second target model makes a predetermined judgment about the input information based on the output of the first target model.
- The trained model generation device according to any one of claims 1 to 7, wherein the first information, the second information, the third information, and the input information include images, and the adapter converts and outputs the form of an input image.
- A trained model generation method executed by a trained model generation device that generates a trained model that outputs a recognition result of a recognition target included in input information, the method comprising:
the trained model generation device acquiring a base model, including at least a first base model, generated by learning, as teacher data, first information that is identical or related to the input information;
the trained model generation device generating a target model including a first target model and a second target model by taking the first base model as the first target model and learning, as teacher data, second information representing the recognition target, with a second target model to be coupled to the first target model coupled to the first target model;
the trained model generation device acquiring an adapter generated by learning at least third information as teacher data while the adapter is coupled to the base model; and
the trained model generation device generating the trained model by coupling the adapter to the target model.
- A recognition device comprising a trained model that outputs a recognition result of a recognition target included in input information, wherein the trained model includes:
a base model, including at least a first base model, generated by learning, as teacher data, first information that is identical or related to the input information;
a target model including a first target model, which is the first base model, and a second target model generated by learning, as teacher data, second information representing the recognition target while coupled to the first target model; and
an adapter generated by learning at least third information as teacher data while coupled to the base model,
and wherein the adapter is coupled to the target model.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/564,443 US20240265669A1 (en) | 2021-05-27 | 2022-05-27 | Trained model generating device, trained model generating method, and recognition device |
JP2023513909A JP7271810B2 (ja) | 2021-05-27 | 2022-05-27 | 学習済みモデル生成装置、学習済みモデル生成方法、及び認識装置 |
CN202280036765.3A CN117377986A (zh) | 2021-05-27 | 2022-05-27 | 训练模型生成装置、训练模型生成方法和识别装置 |
EP22811421.1A EP4350613A1 (en) | 2021-05-27 | 2022-05-27 | Trained model generating device, trained model generating method, and recognition device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021089565 | 2021-05-27 | ||
JP2021-089565 | 2021-05-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022250153A1 true WO2022250153A1 (ja) | 2022-12-01 |
Family
ID=84228923
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/021814 WO2022250153A1 (ja) | 2021-05-27 | 2022-05-27 | 学習済みモデル生成装置、学習済みモデル生成方法、及び認識装置 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20240265669A1 (ja) |
EP (1) | EP4350613A1 (ja) |
JP (2) | JP7271810B2 (ja) |
CN (1) | CN117377986A (ja) |
WO (1) | WO2022250153A1 (ja) |
- 2022
- 2022-05-27 EP EP22811421.1A patent/EP4350613A1/en active Pending
- 2022-05-27 CN CN202280036765.3A patent/CN117377986A/zh active Pending
- 2022-05-27 US US18/564,443 patent/US20240265669A1/en active Pending
- 2022-05-27 WO PCT/JP2022/021814 patent/WO2022250153A1/ja active Application Filing
- 2022-05-27 JP JP2023513909A patent/JP7271810B2/ja active Active
- 2023
- 2023-04-26 JP JP2023072691A patent/JP2023099083A/ja active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016071502A (ja) | 2014-09-29 | 2016-05-09 | セコム株式会社 | 対象識別装置 |
WO2019194256A1 (ja) * | 2018-04-05 | 2019-10-10 | 株式会社小糸製作所 | 演算処理装置、オブジェクト識別システム、学習方法、自動車、車両用灯具 |
US20200134469A1 (en) * | 2018-10-30 | 2020-04-30 | Samsung Sds Co., Ltd. | Method and apparatus for determining a base model for transfer learning |
US10565471B1 (en) * | 2019-03-07 | 2020-02-18 | Capital One Services, Llc | Systems and methods for transfer learning of neural networks |
JP2020144700A (ja) * | 2019-03-07 | 2020-09-10 | 株式会社日立製作所 | 画像診断装置、画像処理方法及びプログラム |
JP2021056785A (ja) * | 2019-09-30 | 2021-04-08 | セコム株式会社 | 画像認識システム、撮像装置、認識装置及び画像認識方法 |
Also Published As
Publication number | Publication date |
---|---|
EP4350613A1 (en) | 2024-04-10 |
CN117377986A (zh) | 2024-01-09 |
JPWO2022250153A1 (ja) | 2022-12-01 |
US20240265669A1 (en) | 2024-08-08 |
JP2023099083A (ja) | 2023-07-11 |
JP7271810B2 (ja) | 2023-05-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11338435B2 (en) | Gripping system with machine learning | |
CN110480637B (zh) | 一种基于Kinect传感器的机械臂零件图像识别抓取方法 | |
CN111347411B (zh) | 基于深度学习的双臂协作机器人三维视觉识别抓取方法 | |
CN111275063A (zh) | 一种基于3d视觉的机器人智能抓取控制方法及系统 | |
CN111695562A (zh) | 一种基于卷积神经网络的机器人自主抓取方法 | |
WO2020057440A1 (zh) | 一种装配方法、装配装置及装配设备 | |
WO2019192402A1 (zh) | 一种插机方法及插机设备 | |
CN114131603B (zh) | 基于感知增强和场景迁移的深度强化学习机器人抓取方法 | |
JP7051751B2 (ja) | 学習装置、学習方法、学習モデル、検出装置及び把持システム | |
JP7271810B2 (ja) | 学習済みモデル生成装置、学習済みモデル生成方法、及び認識装置 | |
JP7271809B2 (ja) | 学習済みモデル生成装置、学習済みモデル生成方法、及び認識装置 | |
CN114187312A (zh) | 目标物的抓取方法、装置、系统、存储介质及设备 | |
Mohammed et al. | Color matching based approach for robotic grasping | |
Chen et al. | Robotic grasp control policy with target pre-detection based on deep Q-learning | |
WO2023042895A1 (ja) | 学習済みモデル生成方法、推論装置、及び学習済みモデル生成装置 | |
CN115936105A (zh) | 生成用于监督学习的训练数据以训练神经网络的方法 | |
Tokuda et al. | CNN-based Visual Servoing for Simultaneous Positioning and Flattening of Soft Fabric Parts | |
WO2023027187A1 (ja) | 学習済みモデル生成方法、学習済みモデル生成装置、学習済みモデル、及び保持態様の推定装置 | |
WO2023234062A1 (ja) | データ取得装置、データ取得方法、及びデータ取得台 | |
EP4389367A1 (en) | Holding mode determination device for robot, holding mode determination method, and robot control system | |
CN112533739A (zh) | 机器人控制装置、机器人控制方法以及机器人控制程序 | |
WO2023234061A1 (ja) | データ取得装置、データ取得方法、及びデータ取得台 | |
Elachkar | Robot Learning From Human Observation Using Deep Neural Networks | |
Tokuda et al. | CNN-based Visual Servoing for Pose Control of Soft Fabric Parts | |
Tailor et al. | Mono Camera-based Localization of Objects to Guide Real-time Grasp of a Robotic Manipulator |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22811421 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2023513909 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202280036765.3 Country of ref document: CN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18564443 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022811421 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2022811421 Country of ref document: EP Effective date: 20240102 |