WO2023286847A1 - Recognition model generation method and recognition model generation device - Google Patents

Info

Publication number
WO2023286847A1
Authority
WO
WIPO (PCT)
Prior art keywords
recognition model
captured image
image
model generation
detection target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/027775
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
匡芳 中村
匡史 堤
智之 和泉
康平 古川
慧 村岡
達将 樺澤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rist Inc
Kyocera Corp
Original Assignee
Rist Inc
Kyocera Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rist Inc, Kyocera Corp
Priority to US18/579,257 (US20240355097A1)
Priority to JP2023534865A (JP7581521B2)
Priority to CN202280049628.3A (CN117651971A)
Priority to EP22842188.9A (EP4372679A4)
Publication of WO2023286847A1
Anticipated expiration
Priority to JP2024190926A (JP2025014039A)
Ceased (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional [3D] objects

Definitions

  • the present disclosure relates to a recognition model generation method and a recognition model generation device.
  • the recognition model generation method includes: acquiring a plurality of synthetic images showing a detection target; performing first learning to create, based on the plurality of synthetic images, a first recognition model that outputs an object recognition result in response to an image input; acquiring captured images of the detection target; adding, to the captured images as annotation data, the object recognition results output when the plurality of captured images are input to the first recognition model; and performing second learning to create a second recognition model based on the captured images and the annotation data.
  • the recognition model generation device includes: a first recognition model generating means for generating, based on a plurality of synthetic images representing a detection target, a first recognition model that outputs an object recognition result in response to an image input; an adding means for adding the object recognition results, obtained by inputting a plurality of captured images of the detection target to the first recognition model, to the captured images as annotation data; and a second recognition model generating means for generating a second recognition model based on the captured images and the annotation data.
  • the recognition model generation device is a recognition model generation device that generates a second recognition model by training a first recognition model using captured images of a detection target as teacher data, wherein
  • the first recognition model is a recognition model generated by training an original recognition model used for object recognition, using synthetic images generated based on three-dimensional shape data of the detection target as teacher data.
  • FIG. 1 is a functional block diagram showing a schematic configuration of a recognition model generation device according to one embodiment.
  • FIG. 2 is a functional block diagram showing a virtual schematic configuration of a control unit in FIG. 1;
  • FIG. 3 is a first flowchart for explaining recognition model generation processing executed by the control unit in FIG. 1;
  • FIG. 4 is a second flowchart for explaining recognition model generation processing executed by the control unit in FIG. 1;
  • a large amount of training data requires, for example, images of the same object to be recognized viewed from various directions, images viewed under various lighting conditions, and the like.
  • the recognition model generation device creates the first recognition model by training the original recognition model using synthetic images based on the three-dimensional shape data of the detection target.
  • the recognition model generation device adds annotation data by annotating at least part of the captured images of the detection target using the first recognition model.
  • the recognition model generation device trains the first recognition model to obtain the second recognition model, from which a model for deployment is created.
  • the recognition model generation device uses the captured image of the detection target to which the annotation data is added to generate the model for deployment.
  • the recognition model generation device 10 may include a communication unit 11, a storage unit 12, and a control unit 13.
  • the recognition model generation device 10 is, for example, one or a plurality of server devices that can communicate with each other, a general-purpose electronic device such as a PC (Personal Computer), or a dedicated electronic device.
  • the communication unit 11 may communicate with external devices.
  • External devices are, for example, imaging devices, storage media, and terminal devices.
  • the imaging device is provided in, for example, a mobile terminal such as a smart phone or a tablet, or a device such as a robot.
  • a storage medium is, for example, any storage medium that can be attached and detached at a connector.
  • a terminal device is, for example, a general-purpose electronic device such as a smartphone, a tablet, or a PC, or a dedicated electronic device.
  • the communication unit 11 may communicate with an external device by wire or wirelessly.
  • the communication unit 11 may acquire information and instructions through communication with external devices.
  • the communication unit 11 may give information and instructions through communication with an external device.
  • the communication unit 11 may acquire the three-dimensional shape data of the detection target.
  • Three-dimensional shape data is, for example, CAD data.
  • the name of the detection target may be associated with the three-dimensional shape data as label data.
  • the communication unit 11 may acquire texture information of the detection target.
  • as the texture data, the texture of a material commonly used for the assumed detection target may be digitized as a template, or the surface shown in an actual photograph may be digitized.
  • the communication unit 11 may acquire a synthetic image generated based on the three-dimensional shape data of the detection target.
  • the acquired composite image may be associated with annotation data.
  • the annotation data may include, for example, data corresponding to at least one of a mask image of the detection target, a bounding box of the detection target, and a label.
  • the mask image is, for example, an image that fills in the contour of the detection target within the entire image range.
  • a bounding box is, for example, a rectangular frame surrounding a detection target.
  • a label is, for example, the name of a detection target.
  • the synthesized image may be generated based on, for example, a plurality of two-dimensional shape data.
  • the communication unit 11 may acquire the captured image of the detection target. As will be described later, the communication unit 11 may acquire corrected annotation data that replaces the annotation data added to the captured image.
  • the communication unit 11 may provide the mobile terminal or the robot with an imaging guide for imaging the detection target. As will be described later, the communication unit 11 may provide the terminal device with the annotation information obtained by applying the first recognition model to the acquired captured image.
  • the storage unit 12 includes arbitrary storage devices such as RAM (Random Access Memory) and ROM (Read Only Memory).
  • the storage unit 12 may store various programs that cause the control unit 13 to function and various information that the control unit 13 uses.
  • the control unit 13 includes one or more processors and memory.
  • the processor may include a general-purpose processor that loads a specific program to execute a specific function, and a dedicated processor that specializes in specific processing.
  • a dedicated processor may include an Application Specific Integrated Circuit (ASIC).
  • the processor may include a programmable logic device (PLD).
  • the PLD may include an FPGA (Field-Programmable Gate Array).
  • the control unit 13 may be either SoC (System-on-a-Chip) in which one or more processors cooperate, or SiP (System In a Package).
  • the control unit 13 may function as synthesizing means 14, first recognition model generating means 15, imaging guide generating means 16, adding means 17, and second recognition model generating means 18, which will be described below.
  • the synthesizing means 14 may generate a synthetic image of the detection target based on the three-dimensional shape data. Based on the three-dimensional shape data, the synthesizing means 14 may generate a two-dimensional synthesized image including a single or a plurality of detection target images in an image display area such as a rectangle, for example. The synthesizing means 14 may generate a plurality of synthetic images. The synthesizing means 14 may generate a synthesized image in which the images of the detection target are arranged in various ways in the image display area. The synthesizing means 14 may generate a synthetic image including images of different detection targets separately. The synthesizing means 14 may generate a synthetic image including different detection targets.
  • the synthesizing means 14 may generate a synthetic image in the same format as the input used at inference time by the first recognition model, which will be described later. For example, if the captured image input to the first recognition model is two-dimensional, the synthesized image may also be two-dimensional.
  • the synthesizing means 14 may generate a synthesized image including images of various postures of the detection target in the image display area.
  • the synthesizing means 14 may determine the posture of the detection target in the image based on the three-dimensional shape data of the detection target. For example, when the detection target is spherical, the synthesizing means 14 generates a synthesized image in which the detection target is viewed from any one direction. For example, when the detection target is cubic, the synthesizing means 14 may generate, as a composite image, an image viewed from a direction that is tilted 45° from an arbitrary face about an arbitrary side as an axis and then rotated 10° about an axis perpendicular to that side.
  • the synthesizing means 14 may further generate, as a composite image, an image viewed from a direction that is tilted 50° from the arbitrary face about the arbitrary side as an axis and then rotated 10° about an axis perpendicular to that side.
  • when the synthesizing means 14 generates a plurality of synthetic images for the same detection target, it may designate some of them as learning data and the rest as evaluation data. For example, when synthetic images of a cube-shaped detection target are generated as described above, the synthetic images viewed from the direction tilted 45° from an arbitrary face about an arbitrary side as an axis may be designated as learning data, and the synthetic images viewed from the direction tilted 50° from the arbitrary face about the arbitrary side may be designated as evaluation data. The learning data may be further divided into training data and validation data.
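  • The split described above can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the `SyntheticImage` record, the file names, the 20% validation fraction, and the fixed random seed are all assumptions made for the example. Only the idea of routing one viewing angle (50°) to evaluation and dividing the rest into training and validation data comes from the text.

```python
import random
from dataclasses import dataclass


@dataclass
class SyntheticImage:
    name: str        # hypothetical file name
    tilt_deg: float  # tilt of the viewing direction from a face of the target


def split_by_tilt(images, eval_tilts, val_fraction=0.2, seed=0):
    """Return (training, validation, evaluation) lists of images."""
    # Images rendered at the designated tilt angles become evaluation data.
    evaluation = [im for im in images if im.tilt_deg in eval_tilts]
    learning = [im for im in images if im.tilt_deg not in eval_tilts]
    # The remaining learning data is divided into training and validation data.
    rng = random.Random(seed)
    rng.shuffle(learning)
    n_val = int(len(learning) * val_fraction)
    return learning[n_val:], learning[:n_val], evaluation


images = [
    SyntheticImage(f"cube_{tilt}_{i}.png", tilt)
    for tilt in (45.0, 50.0)
    for i in range(10)
]
train, val, evaluation = split_by_tilt(images, eval_tilts={50.0})
print(len(train), len(val), len(evaluation))
```

  • Keeping the evaluation viewpoints disjoint from the learning viewpoints means the evaluation measures how well the model handles postures it was not trained on.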
  • the synthesizing means 14 may generate a synthetic image using the texture corresponding to the detection target.
  • the texture corresponding to the detection target may be selected by designating either a template, registered in advance in the storage unit 12 for each type of material such as metal, or an image of the material.
  • the image of the material may be a texture image corresponding to the material, specified based on an overall image generated by imaging the detection target with imaging means such as a camera.
  • the image of the material may be stored in the storage unit 12 in advance. Texture selection may be performed by detecting, via the communication unit 11, manual input to a pointing device such as a mouse or an input device such as a keyboard.
  • the synthesizing means 14 may generate a synthetic image so as to reproduce the characteristics of the captured image based on the three-dimensional shape data.
  • the synthesizing unit 14 may generate a synthesized image having the same features as the captured image.
  • the same feature is, for example, the same orientation, in other words, the same appearance, and the same color, in other words, the same hue, saturation, and brightness as the detection target in the captured image.
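  • As a rough illustration of the colour part of this comparison, the sketch below converts two RGB colours to hue, saturation, and brightness with Python's standard `colorsys` module and checks whether they agree within a tolerance. The function names, the use of a single representative RGB value per image, and the tolerance of 0.05 are assumptions for the example; the disclosure does not specify how similarity is measured.

```python
import colorsys


def hsv_of(rgb):
    """Convert an 8-bit (R, G, B) tuple to (hue, saturation, value) in [0, 1]."""
    r, g, b = (channel / 255.0 for channel in rgb)
    return colorsys.rgb_to_hsv(r, g, b)


def same_color(rgb_a, rgb_b, tol=0.05):
    """True if hue, saturation, and brightness all agree within `tol`."""
    ha, sa, va = hsv_of(rgb_a)
    hb, sb, vb = hsv_of(rgb_b)
    # Hue is circular, so 0.99 and 0.01 are close to each other.
    hue_diff = min(abs(ha - hb), 1.0 - abs(ha - hb))
    return hue_diff <= tol and abs(sa - sb) <= tol and abs(va - vb) <= tol


print(same_color((200, 30, 30), (205, 32, 31)))  # two similar reds
print(same_color((200, 30, 30), (30, 30, 200)))  # red versus blue
```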
  • the synthesizing unit 14 may store the newly generated synthetic image in the storage unit 12 as data for creating a deployment model, which will be described later.
  • the synthesizing means 14 may annotate the synthetic image based on the three-dimensional shape data.
  • Annotation refers to adding annotation data to a synthesized image. That is, the synthesizing unit 14 may add annotation data to the synthetic image by annotating the synthetic image.
  • the annotation data added by the synthesizing means 14 as an annotation may include, for example, a mask image of the detection target and a bounding box of the detection target.
  • the synthesizing means 14 may generate polygons based on the three-dimensional shape data and calculate the area occupied by the detection target as viewed from the photographing direction of the synthesized image, thereby generating a mask image and a bounding box surrounding the polygons.
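  • A minimal sketch of this step, assuming the polygons have already been projected onto the image plane (the projection itself is omitted): each pixel centre is tested against the polygons with the even-odd rule to build a binary mask, and the bounding box is taken as the extent of the mask. The function names and the `(x_min, y_min, x_max, y_max)` box convention are assumptions for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule test of a point against a 2D polygon given as (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_mask(polygons, width, height):
    """Binary mask: 1 where a pixel centre falls inside any projected polygon."""
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5
            if any(point_in_polygon(cx, cy, p) for p in polygons):
                mask[row][col] = 1
    return mask


def bounding_box(mask):
    """Smallest (x_min, y_min, x_max, y_max) pixel box containing the mask."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    cols = [c for c in range(len(mask[0])) if any(line[c] for line in mask)]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


# One projected face of the target, as a hypothetical rectangle.
face = [(2, 1), (6, 1), (6, 4), (2, 4)]
mask = rasterize_mask([face], width=8, height=6)
print(bounding_box(mask))
```

  • Because both outputs are derived from the shape data rather than drawn by hand, every synthetic image can receive its annotation data automatically.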
  • the synthesizing unit 14 may store the synthesized image with the annotation data in the storage unit 12 as data for creating a deployment model.
  • the first recognition model generating means 15 performs first learning, in which the original recognition model is trained using the synthesized images as teacher data.
  • the original recognition model is a recognition model used for object recognition.
  • the original recognition model is a model for detecting at least one of a mask image and a rectangular frame-shaped bounding box for each area of each object in order to perform object detection such as instance segmentation.
  • the original recognition model may be, for example, a trained model using a large data set such as ImageNet or MS COCO, or a data set of a specific product group such as industrial products.
  • the first learning is, for example, transfer learning and fine tuning of the original recognition model.
  • the first recognition model generating means 15 generates a first recognition model through first learning.
  • the first recognition model outputs an object recognition result for any input image.
  • the object recognition result may be data corresponding to at least one of a mask image of the detection target, a bounding box of the detection target, a label, a mask score, and a bounding box score.
  • the first recognition model generation means 15 may calculate the accuracy of validation data for each epoch in learning using training data.
  • the first recognition model generating means 15 may decay the learning rate when the accuracy on the validation data has failed to improve a certain number of times. Furthermore, the first recognition model generating means 15 may terminate learning when the accuracy on the validation data has failed to improve a certain number of times.
  • the first recognition model generating means 15 may store the model of the epoch with the best accuracy for the validation data in the storage unit 12 as the first recognition model.
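  • The training controls described above (per-epoch validation accuracy, learning-rate decay, early termination, and keeping the best epoch) can be sketched as follows. To stay self-contained, the sketch takes a pre-computed list of validation accuracies instead of training a real model; the decay factor and the two patience values are assumptions, since the disclosure only says "a certain number of times".

```python
def run_training(val_accuracies, lr=0.01, decay=0.1, decay_patience=2, stop_patience=4):
    """Simulate plateau-based learning-rate decay and early stopping.

    val_accuracies stands in for the accuracy measured on the validation data
    after each epoch of training on the training data.
    """
    best_acc, best_epoch = float("-inf"), None
    stale = 0          # epochs since the validation accuracy last improved
    history = []       # (epoch, learning rate used in that epoch)
    for epoch, acc in enumerate(val_accuracies):
        history.append((epoch, lr))
        if acc > best_acc:
            # This epoch's weights would be stored as the current best model.
            best_acc, best_epoch, stale = acc, epoch, 0
        else:
            stale += 1
            if stale >= stop_patience:
                break              # terminate learning
            if stale % decay_patience == 0:
                lr *= decay        # decay the learning rate
    return best_epoch, best_acc, lr, history


accuracies = [0.60, 0.70, 0.72, 0.71, 0.72, 0.70, 0.69, 0.75]
best_epoch, best_acc, final_lr, history = run_training(accuracies)
print(best_epoch, best_acc, len(history))
```

  • In this run the last accuracy value is never reached, because training stops after four epochs without improvement; the model from the best epoch is the one that would be stored.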
  • the first recognition model generating means 15 may search for a certainty threshold that maximizes accuracy with respect to the validation data while changing the certainty threshold.
  • the first recognition model generating means 15 may determine the retrieved certainty threshold as the certainty threshold of the first recognition model.
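  • A minimal sketch of such a threshold search, under stated assumptions: each validation prediction is reduced to a pair of certainty score and a flag saying whether it corresponds to a real detection target, and accuracy is defined as the fraction of predictions that are correctly kept or correctly discarded. Both the data layout and this accuracy definition are illustrative; the disclosure does not fix them.

```python
def accuracy_at(threshold, scored_predictions):
    """Fraction of predictions handled correctly at a given certainty threshold.

    scored_predictions: list of (certainty, is_real_target) pairs.
    """
    correct = 0
    for certainty, is_real_target in scored_predictions:
        accepted = certainty >= threshold
        if accepted == is_real_target:
            correct += 1
    return correct / len(scored_predictions)


def search_threshold(scored_predictions, candidates):
    """Return the candidate threshold with the highest validation accuracy."""
    return max(candidates, key=lambda t: accuracy_at(t, scored_predictions))


validation = [
    (0.95, True), (0.80, True), (0.65, True),
    (0.55, False), (0.40, False), (0.30, False),
]
candidates = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
best = search_threshold(validation, candidates)
print(best, accuracy_at(best, validation))
```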
  • the first recognition model generating means 15 may use the evaluation data to evaluate the first recognition model.
  • the imaging guide generation means 16 may provide the imaging guide based on the acquired three-dimensional shape data.
  • the imaging guide may indicate a method of imaging the detection target corresponding to the acquired three-dimensional shape data.
  • the imaging guide may include, for example, designation of the imaging direction of the detection target, in other words, the appearance of the detection target in the captured image generated by imaging.
  • the imaging guide may include, for example, designation of the size of the image of the detection target in the entire captured image, in other words, the focal length, the distance between the detection target and the camera, and the like.
  • the imaging guide generating means 16 may determine the imaging direction and image size of the detection target based on the three-dimensional shape data.
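  • One way to relate the designated image size to the focal length and the camera distance is the pinhole camera approximation, in which image size equals focal length times object size divided by distance. The sketch below uses that relation to compute a camera distance; the function name, the parameters, and the choice of the pinhole model are assumptions for illustration and are not stated in the disclosure.

```python
def camera_distance_mm(object_size_mm, focal_length_mm, sensor_size_mm, fill_fraction):
    """Distance at which the target spans `fill_fraction` of the sensor (pinhole model)."""
    if not 0 < fill_fraction <= 1:
        raise ValueError("fill_fraction must be in (0, 1]")
    # Size the target's image should have on the sensor.
    image_size_mm = fill_fraction * sensor_size_mm
    # Pinhole relation: image_size / focal_length = object_size / distance.
    return focal_length_mm * object_size_mm / image_size_mm


distance = camera_distance_mm(
    object_size_mm=100, focal_length_mm=50, sensor_size_mm=24, fill_fraction=0.5
)
print(round(distance, 1))
```

  • The approximation holds when the camera distance is much larger than the focal length; for close-up imaging a thin-lens correction would be needed.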
  • the imaging guide may be sent to a mobile terminal with an imaging device, such as a smartphone or tablet, or to a control device of a robot equipped with an imaging device.
  • the imaging device may perform imaging under control based on the imaging guide and acquire the captured image of the detection target.
  • in a configuration in which the imaging guide is sent to a mobile terminal, the imaging guide may describe the imaging method using text and drawings.
  • the detection target may be imaged by a user's manual operation with reference to the imaging guide.
  • in a configuration in which the imaging guide is sent to the control device of a robot, the imaging guide may be a control command that causes the robot to adjust the position of the imaging device so as to achieve the specified imaging direction and size.
  • the detection target may be imaged at a position adjusted by the robot based on the imaging guide.
  • the control unit 13 may acquire the captured image via the communication unit 11.
  • the control unit 13 may selectably present the name of the detection target corresponding to the acquired three-dimensional shape data when acquiring the captured image.
  • the control unit 13 may present the name of the detection target on, for example, a display connected to the recognition model generation device 10 or a terminal device.
  • the control unit 13 may acquire the name corresponding to the captured image by operation input from an input device or terminal device connected to the recognition model generation device 10 .
  • the control unit 13 may associate the name of the detection target as a label with the captured image to be acquired.
  • the adding means 17 adds annotation data to the captured image by annotating at least part of the acquired captured image using the first recognition model.
  • the annotation data may include data corresponding to at least one of the mask image of the detection target and the bounding box of the detection target.
  • the adding unit 17 may store the captured image to which the annotation data is added in the storage unit 12 as data for creating a deployment model.
  • the adding means 17 may generate a removed image by removing noise from the captured image to be annotated.
  • the adding unit 17 may perform the annotation by causing the first recognition model to recognize the removed image, and add the annotation data to the captured image corresponding to the removed image. Therefore, the generated removed image is not used in the second recognition model generating means 18, which will be described later, and the second learning is performed using the captured image to which the annotation data is added.
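  • The flow described above, in which the noise-removed image is used only for inference and the resulting annotation data is attached to the original captured image, can be sketched as follows. The 3x3 median filter and `toy_first_model` are stand-ins chosen for the example: the disclosure does not name a noise-removal method, and the real first recognition model is a trained network.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class CapturedImage:
    pixels: list                      # 2D list of grayscale values
    annotations: list = field(default_factory=list)


def remove_noise(pixels):
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [pixels[j][i] for j in range(y - 1, y + 2) for i in range(x - 1, x + 2)]
            out[y][x] = median(window)
    return out


def toy_first_model(pixels):
    """Stand-in for the first recognition model: boxes all pixels brighter than 128."""
    pts = [(x, y) for y, row in enumerate(pixels) for x, v in enumerate(row) if v > 128]
    if not pts:
        return []
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return [{"label": "target", "bbox": (min(xs), min(ys), max(xs), max(ys))}]


def annotate(captured, first_model):
    removed = remove_noise(captured.pixels)       # used only for inference
    captured.annotations = first_model(removed)   # attached to the original image
    return captured


pixels = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        pixels[y][x] = 200
pixels[1][6] = 255  # isolated noise pixel

captured = annotate(CapturedImage(pixels), toy_first_model)
print(captured.annotations[0]["bbox"])
```

  • The captured image keeps its original pixels, including the noise, so the second learning sees realistic data while the annotation is not distorted by the noise.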
  • the adding means 17 may present the captured image with the annotation data on a display connected to the recognition model generation device 10 or on a terminal device connected via the communication unit 11.
  • the annotation data may be correctable by operation input to an input device connected to the recognition model generation device 10 or to the terminal device.
  • the adding means 17 may acquire the corrected annotation data via the communication unit 11.
  • the adding means 17 may use the corrected annotation data to update the annotation data stored in the storage unit 12 as the data for creating the deployment model.
  • the adding means 17 may instruct the synthesizing means 14 to create a composite image having the features of the captured image.
  • the second recognition model generating means 18 performs second learning, in which the first recognition model is trained using the captured images.
  • the second recognition model generating means 18 generates a second recognition model through second learning.
  • the second recognition model outputs an object recognition result for any input image.
  • the object recognition result may be data corresponding to at least one of a mask image of the detection target, a bounding box of the detection target, a label, a mask score, and a bounding box score.
  • the second recognition model generating means 18 may generate the second recognition model by performing second learning using the captured image to which the annotation data is attached as teacher data.
  • the second recognition model generating means 18 may perform the second learning using the synthesized image with annotation data, which is stored in the storage unit 12 as data for creating a deployment model.
  • the second recognition model generating means 18 may determine at least part of the captured images with annotation data, stored in the storage unit 12 as data for creating the deployment model, as learning data. Furthermore, the second recognition model generating means 18 may divide the learning data into training data and validation data. The second recognition model generating means 18 may determine another part of the captured images with annotation data as evaluation data.
  • the second recognition model generation means 18 may calculate the accuracy of validation data for each epoch in learning using training data.
  • the second recognition model generating means 18 may decay the learning rate when the accuracy on the validation data has failed to improve a certain number of times. Furthermore, the second recognition model generating means 18 may terminate learning when the accuracy on the validation data has failed to improve a certain number of times.
  • the second recognition model generating means 18 may store the model of the epoch with the best accuracy for the validation data in the storage unit 12 as the second recognition model.
  • the second recognition model generating means 18 may search for a certainty threshold that maximizes accuracy with respect to the validation data while changing the certainty threshold.
  • the second recognition model generating means 18 may determine the retrieved certainty threshold as the certainty threshold of the second recognition model.
  • the second recognition model generating means 18 may use the evaluation data to evaluate the second recognition model.
  • the second recognition model generating means 18 may generate the second recognition model by retraining the first recognition model, as the second learning, through domain adaptation using captured images to which annotation data is not attached.
  • the second recognition model generating means 18 may determine the captured images with annotation data, stored in the storage unit 12 as data for creating a deployment model, as evaluation data. The second recognition model generating means 18 may use the evaluation data to evaluate the second recognition model.
  • the second recognition model generating means 18 may store the second recognition model after evaluation in the storage unit 12 as a deployment model.
  • the recognition model generation process is started, for example, when an operation input for starting the generation process to an input device or the like connected to the recognition model generation apparatus 10 is detected.
  • in step S100, the control unit 13 determines whether the three-dimensional shape data of the detection target has been acquired. If not, the process returns to step S100. If so, the process proceeds to step S101.
  • in step S101, the control unit 13 generates a composite image based on the three-dimensional shape data whose acquisition was confirmed in step S100. After generation, the process proceeds to step S102.
  • in step S102, the control unit 13 generates annotation data based on the three-dimensional shape data whose acquisition was confirmed in step S100.
  • the control unit 13 adds the generated annotation data to the composite image generated in step S101. After the annotation data is added, the process proceeds to step S103.
  • in step S103, the control unit 13 executes first learning by training the original recognition model using the synthesized image to which the annotation data was added in step S102.
  • the control unit 13 stores the first recognition model generated by executing the first learning in the storage unit 12. After the first learning is performed, the process proceeds to step S104.
  • in step S104, the control unit 13 may generate an imaging guide based on the three-dimensional shape data whose acquisition was confirmed in step S100.
  • the control unit 13 may generate an imaging guide suited to the destination to which it is provided. After generation, the process proceeds to step S105.
  • in step S105, the control unit 13 provides the imaging guide generated in step S104 to the external device. After the imaging guide is provided, the process proceeds to step S106.
  • in step S106, the control unit 13 determines whether a captured image has been acquired from the external device. If no captured image has been acquired, the process returns to step S106. If a captured image has been acquired, the process proceeds to step S107.
  • in step S107, the control unit 13 presents the name of the detection target corresponding to the three-dimensional shape data stored in the storage unit 12 in a selectable manner. After presentation, the process proceeds to step S108.
  • in step S108, the control unit 13 determines whether the name of the detection target has been acquired. If the name of the detection target has been acquired, the process proceeds to step S109. If the name of the detection target has not been acquired, the process proceeds to step S110.
  • in step S109, the control unit 13 associates the name whose acquisition was confirmed in step S108 with the captured image whose acquisition was confirmed in step S106.
  • the control unit 13 stores the captured image associated with the name of the detection target in the storage unit 12. After association, the process proceeds to step S110.
  • in step S110, the control unit 13 generates a removed image by removing noise from the captured image whose acquisition was confirmed in step S106. After noise removal, the process proceeds to step S111.
  • in step S111, the control unit 13 uses the first recognition model generated in step S103 to annotate the removed image generated in step S110.
  • the control unit 13 adds the annotation data generated by the annotation to the captured image corresponding to the removed image. After the annotation data is added, the process proceeds to step S112.
  • in step S112, the control unit 13 presents the captured image with the annotation data. After presentation, the process proceeds to step S113.
  • in step S113, the control unit 13 determines whether corrected annotation data has been acquired in response to the presentation of the captured image with the annotation data. If corrected annotation data has been acquired, the process proceeds to step S114. If corrected annotation data has not been acquired, the process proceeds to step S115.
  • in step S114, the control unit 13 updates the annotation data stored in the storage unit 12 using the corrected annotation data whose acquisition was confirmed in step S113. After updating, the process proceeds to step S115.
  • in step S115, the control unit 13 generates a second recognition model by executing second learning.
  • in a configuration that uses captured images with annotation data in the second learning, the control unit 13 generates a synthesized image having the same characteristics as each captured image whose annotation has a certainty factor equal to or less than a threshold.
  • the control unit 13 then further trains the first recognition model using the captured images with annotation data and the newly generated synthesized images.
  • in a configuration that uses captured images without annotation data in the second learning, the control unit 13 performs domain adaptation using the captured images. After the second learning is performed, the process proceeds to step S116.
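  • The selection of captured images whose annotations have a certainty factor at or below the threshold can be sketched as follows. The record fields (`name`, `score`, `pose`, `hsv`), the file names, and the request format are hypothetical; they only illustrate passing the features of a low-certainty captured image to the synthesis step so that a matching synthesized image can be generated.

```python
def low_certainty_images(annotated_images, threshold):
    """Captured images whose annotation certainty is at or below the threshold."""
    return [im for im in annotated_images if im["score"] <= threshold]


def synthesis_requests(images):
    """Build one request per image, carrying the features the synthetic image should share."""
    return [
        {"source": im["name"], "pose": im["pose"], "hsv": im["hsv"]}
        for im in images
    ]


annotated = [
    {"name": "cap_001.png", "score": 0.92, "pose": (45, 10), "hsv": (0.02, 0.85, 0.78)},
    {"name": "cap_002.png", "score": 0.41, "pose": (50, 10), "hsv": (0.60, 0.30, 0.55)},
    {"name": "cap_003.png", "score": 0.50, "pose": (30, 0), "hsv": (0.10, 0.40, 0.90)},
]
requests = synthesis_requests(low_certainty_images(annotated, threshold=0.5))
print([r["source"] for r in requests])
```

  • Targeting only the low-certainty cases concentrates the additional synthetic data on the appearances the first recognition model handles worst.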
  • In step S116, the control unit 13 evaluates the second recognition model generated in step S115 using the captured images to which the annotation data has been added. After the evaluation, the process proceeds to step S117.
  • In step S117, the control unit 13 stores the second recognition model evaluated in step S116 in the storage unit 12 as a deployment model. After the storing, the recognition model generation processing ends.
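  • Steps S116 and S117 can be sketched as an evaluation on annotated captured images followed by storing the model. The metric used here (the fraction of captured images whose annotated label the model reproduces) is an assumption chosen for brevity, since the disclosure does not name an evaluation metric in this passage, and `evaluate_on_captured` and `store_as_deployment_model` are hypothetical names.

```python
from typing import Callable, Dict, Sequence, Tuple


def evaluate_on_captured(
    model: Callable[[str], str],
    annotated: Sequence[Tuple[str, str]],
) -> float:
    """Fraction of annotated captured images whose label the model reproduces."""
    if not annotated:
        raise ValueError("evaluation needs at least one annotated captured image")
    correct = sum(1 for image_id, label in annotated if model(image_id) == label)
    return correct / len(annotated)


def store_as_deployment_model(storage: Dict[str, object], model, score: float) -> None:
    """Step S117: keep the evaluated model, with its score, as the deployment model."""
    storage["deployment_model"] = {"model": model, "score": score}
```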
  • The recognition model generation device 10 of the present embodiment configured as described above generates, based on a plurality of synthetic images depicting the detection target, a first recognition model that outputs an object recognition result for an input image; adds annotation data to a plurality of captured images using the object recognition results obtained by inputting the captured images into the first recognition model; and creates a second recognition model based on the captured images and the annotation data. With such a configuration, the recognition model generation device 10 annotates the captured images using the first recognition model, so the annotation work can be reduced.
  • Because the recognition model generation device 10 creates the second recognition model as described above, it can improve the recognition accuracy for the detection target in actually captured images.
  • Because the recognition model generation device 10 can learn from a large number of synthetic images generated based on the three-dimensional shape data, it can generate a model with high recognition accuracy even from a small number of captured images.
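  • The overall flow summarized above (first learning on synthetic images, annotation of captured images by the first recognition model, then second learning) can be sketched as a small orchestration function. The training routines are passed in as callables because the disclosure does not fix a particular learning algorithm here; `generate_recognition_models`, `train`, and `fine_tune` are hypothetical names.

```python
from typing import Any, Callable, Sequence, Tuple

Image = Any  # assumption: any image representation
Label = Any  # assumption: any annotation representation
Model = Callable[[Image], Label]


def generate_recognition_models(
    synthetic: Sequence[Tuple[Image, Label]],
    captured: Sequence[Image],
    train: Callable[[Sequence[Tuple[Image, Label]]], Model],
    fine_tune: Callable[[Model, Sequence[Tuple[Image, Label]]], Model],
) -> Tuple[Model, Model]:
    """Return (first recognition model, second recognition model)."""
    # First learning: synthetic images only, so no manual labels are needed.
    first_model = train(synthetic)
    # Annotation: the first model's recognition results become annotation data.
    annotated = [(image, first_model(image)) for image in captured]
    # Second learning: continue from the first model using the captured images.
    second_model = fine_tune(first_model, annotated)
    return first_model, second_model
```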
  • In the recognition model generation method of the present embodiment configured as described above, the first recognition model can be created by training the original recognition model using synthetic images during the period before the real detection target is manufactured and captured images can be acquired. After the real detection target is manufactured and captured images become available, annotation data is added to at least a part of the captured images using the first recognition model, and a second recognition model can be created by training the first recognition model using the captured images of the detection target.
  • With such a configuration, the recognition model generation method of the present embodiment allows construction of the production line and generation of the recognition model to proceed in parallel, so that a production line that uses the recognition model can be introduced at an early stage.
  • The recognition model generation device 10 of the present embodiment generates the second recognition model using captured images to which annotation data has been added. With such a configuration, the recognition model generation device 10 can shorten the time required for the second learning.
  • In the second learning, the recognition model generation device 10 of the present embodiment retrains the first recognition model by performing domain adaptation using captured images of the detection target to which annotation data is not added, and uses the captured images to which annotation data is added to evaluate the second recognition model.
  • With such a configuration, the recognition model generation device 10 evaluates the trained recognition model using captured images rather than synthetic images, so the reliability of the evaluation result can be improved.
  • When the confidence of the annotation of a captured image, that is, the confidence with which the first recognition model recognizes the captured image for annotation, is equal to or less than a threshold, the recognition model generation device 10 of the present embodiment generates synthetic images of the detection target so that they have the same features as that captured image, and uses the synthetic images for the second learning.
  • With such a configuration, the recognition model generation device 10 can generate many synthetic images whose appearance matches the appearance for which recognition accuracy is low, so the recognition accuracy of the finally trained second recognition model can be improved.
  • The recognition model generation device 10 can thus improve the recognition accuracy for the detection target in actually captured images by using the captured images, while ensuring robustness in the domain of the synthetic images.
  • The recognition model generation device 10 of this embodiment provides an imaging guide based on the three-dimensional shape data.
  • With such a configuration, captured images can be taken in accordance with the imaging guide. Therefore, regardless of the user's experience and knowledge, the recognition model generation device 10 can acquire, based on the three-dimensional shape data, captured images of the detection target in the orientations for which learning is most needed. As a result, the recognition model generation device 10 can finally generate a second recognition model with high recognition accuracy.
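  • This passage does not describe how the imaging guide selects orientations. One conceivable ingredient, shown here purely as an assumption and not taken from the disclosure, is a list of candidate viewing directions spread roughly evenly around the three-dimensional shape. The sketch below generates such directions with a Fibonacci sphere lattice; `viewing_directions` is a hypothetical name.

```python
import math
from typing import List, Tuple


def viewing_directions(n: int) -> List[Tuple[float, float, float]]:
    """Return n roughly evenly spread unit vectors (Fibonacci sphere lattice)."""
    if n < 1:
        raise ValueError("n must be positive")
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n    # heights evenly spaced strictly inside (-1, 1)
        radius = math.sqrt(1.0 - z * z)  # radius of the horizontal circle at height z
        theta = golden_angle * i         # rotate by the golden angle at each step
        directions.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return directions
```

A guide could then, for example, prioritize the directions from which few captured images exist so far; that prioritization is outside this sketch.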
  • In the annotation, the recognition model generation device 10 of the present embodiment adds annotation data by having the first recognition model recognize the noise-removed image obtained by removing noise from the captured image, and in the second learning it trains the first recognition model using the captured image.
  • With such a configuration, the recognition model generation device 10 can add highly accurate annotation data, because noise removal brings the captured image closer to the synthetic images, which contain little noise.
  • Because the recognition model generation device 10 performs the second learning using the captured images as they are, without noise removal, it can improve the recognition accuracy for the detection target in actually captured images.
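  • The disclosure does not specify a noise removal method in this passage. As one common choice, assumed here only for illustration, a 3x3 median filter suppresses isolated outlier pixels while leaving the input image unmodified, so the untouched captured image remains available for the second learning. `median_filter_3x3` is a hypothetical name, and the image is assumed to be a non-ragged list of pixel rows.

```python
from statistics import median
from typing import List

Image = List[List[float]]  # assumption: grayscale image as rows of pixel values


def median_filter_3x3(image: Image) -> Image:
    """Replace each pixel by the median of its 3x3 neighbourhood (edges clamped)."""
    height = len(image)
    width = len(image[0]) if height else 0
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the 3x3 window, clamping coordinates at the image border.
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(median(window))
        result.append(row)
    return result  # a new image; the captured image itself is not modified
```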
  • The recognition model generation device 10 of the present embodiment generates the synthetic images using textures. With such a configuration, the recognition model generation device 10 can further improve the recognition accuracy of the first recognition model and the second recognition model.
  • Embodiments of the present disclosure may also take the form of a storage medium on which the program is recorded, for example, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a CD-RW, a magnetic tape, a hard disk, or a memory card.
  • The implementation form of the program is not limited to an application program such as object code compiled by a compiler or program code executed by an interpreter.
  • The program may or may not be configured so that all processing is performed only by the CPU on the control board.
  • The program may be configured so that part or all of it is executed by another processing unit mounted on an expansion board or expansion unit added to the board as required.
  • Embodiments according to the present disclosure are not limited to any specific configuration of the embodiments described above. Embodiments of the present disclosure can extend to any novel feature, or combination of features, described in the present disclosure, and to any novel method or process step, or combination thereof, described in the present disclosure.
  • Descriptions such as “first” and “second” in this disclosure are identifiers for distinguishing configurations. Configurations distinguished by descriptions such as “first” and “second” in this disclosure may have their numbers exchanged. For example, the first recognition model can exchange the identifiers “first” and “second” with the second recognition model. The identifiers are exchanged simultaneously. The configurations remain distinct after the exchange of identifiers. Identifiers may be deleted. Configurations from which identifiers have been deleted are distinguished by reference signs. The description of identifiers such as “first” and “second” in this disclosure should not, by itself, be used as a basis for interpreting the order of the configurations or as grounds for the existence of an identifier with a smaller number.
  • In the present embodiment, the synthesizing means 14, the first recognition model generating means 15, the imaging guide generating means 16, the imparting means 17, and the second recognition model generating means 18 have been described as functions of the control unit 13, but the configuration is not limited to this. The synthesizing means 14, the first recognition model generating means 15, the imaging guide generating means 16, the imparting means 17, and the second recognition model generating means 18 may each be configured by one or more devices.
  • In that case, the recognition model generation method disclosed in this embodiment can be implemented, for example, in a recognition model generation system that includes a synthesizing device, a first recognition model generation device, an imaging guide generation device, an imparting device that adds annotation data, and a second recognition model generation device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
PCT/JP2022/027775 2021-07-15 2022-07-14 認識モデル生成方法及び認識モデル生成装置 Ceased WO2023286847A1 (ja)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US18/579,257 US20240355097A1 (en) 2021-07-15 2022-07-14 Recognition model generation method and recognition model generation apparatus
JP2023534865A JP7581521B2 (ja) 2021-07-15 2022-07-14 認識モデル生成方法及び認識モデル生成装置
CN202280049628.3A CN117651971A (zh) 2021-07-15 2022-07-14 识别模型生成方法以及识别模型生成装置
EP22842188.9A EP4372679A4 (en) 2021-07-15 2022-07-14 Recognition model generation method and recognition model generation device
JP2024190926A JP2025014039A (ja) 2021-07-15 2024-10-30 認識モデル生成方法及び認識モデル生成装置

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-117345 2021-07-15
JP2021117345 2021-07-15

Publications (1)

Publication Number Publication Date
WO2023286847A1 true WO2023286847A1 (ja) 2023-01-19

Family

ID=84920258

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/027775 Ceased WO2023286847A1 (ja) 2021-07-15 2022-07-14 認識モデル生成方法及び認識モデル生成装置

Country Status (5)

Country Link
US (1) US20240355097A1
EP (1) EP4372679A4
JP (2) JP7581521B2
CN (1) CN117651971A
WO (1) WO2023286847A1

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024201655A1 (ja) * 2023-03-27 2024-10-03 ファナック株式会社 学習データ生成装置、ロボットシステム、学習データ生成方法および学習データ生成プログラム
WO2025069198A1 (ja) * 2023-09-26 2025-04-03 日本電信電話株式会社 設定補助装置、設定補助方法、および設定補助プログラム
WO2026009899A1 (ja) * 2024-07-02 2026-01-08 京セラ株式会社 学習方法、学習装置、処理装置及びプログラム

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019021456A1 (ja) * 2017-07-28 2019-01-31 株式会社ソニー・インタラクティブエンタテインメント 学習装置、認識装置、学習方法、認識方法及びプログラム
WO2019059343A1 (ja) * 2017-09-22 2019-03-28 Ntn株式会社 ワーク情報処理装置およびワークの認識方法
JP2019056966A (ja) * 2017-09-19 2019-04-11 株式会社東芝 情報処理装置、画像認識方法および画像認識プログラム
JP2019191973A (ja) * 2018-04-26 2019-10-31 株式会社神戸製鋼所 学習画像生成装置及び学習画像生成方法、並びに画像認識装置及び画像認識方法

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3655926A1 (en) * 2017-08-08 2020-05-27 Siemens Aktiengesellschaft Synthetic depth image generation from cad data using generative adversarial neural networks for enhancement
JP6924413B2 (ja) * 2017-12-25 2021-08-25 オムロン株式会社 データ生成装置、データ生成方法及びデータ生成プログラム
EP3624008A1 (en) * 2018-08-29 2020-03-18 Panasonic Intellectual Property Corporation of America Information processing method and information processing system
JP7466928B2 (ja) * 2018-09-12 2024-04-15 オルソグリッド システムズ ホールディング,エルエルシー 人工知能の術中外科的ガイダンスシステムと使用方法
WO2020102767A1 (en) * 2018-11-16 2020-05-22 Google Llc Generating synthetic images and/or training machine learning model(s) based on the synthetic images
CN109816634B (zh) * 2018-12-29 2023-07-11 歌尔股份有限公司 检测方法、模型训练方法、装置及设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4372679A4 *

Also Published As

Publication number Publication date
EP4372679A1 (en) 2024-05-22
EP4372679A4 (en) 2025-05-21
US20240355097A1 (en) 2024-10-24
JP2025014039A (ja) 2025-01-28
JPWO2023286847A1 2023-01-19
JP7581521B2 (ja) 2024-11-12
CN117651971A (zh) 2024-03-05

Similar Documents

Publication Publication Date Title
WO2023286847A1 (ja) 認識モデル生成方法及び認識モデル生成装置
CN112598785B (zh) 虚拟形象的三维模型生成方法、装置、设备及存储介质
KR101635730B1 (ko) 몽타주 생성 장치 및 방법, 그 방법을 수행하기 위한 기록 매체
US20200265231A1 (en) Method, apparatus, and system for automatically annotating a target object in images
WO2015182134A1 (en) Improved setting of virtual illumination environment
WO2006009257A1 (ja) 画像処理装置および画像処理方法
JP2013101633A (ja) メイクアップシミュレーションシステム
US20170206676A1 (en) Information processing apparatus and control method thereof
JP2010026818A (ja) 画像処理プログラム、画像処理装置及び画像処理方法
JP6880618B2 (ja) 画像処理プログラム、画像処理装置、及び画像処理方法
CN111009028A (zh) 虚拟脸部模型的表情拟真系统及方法
CN111353069A (zh) 一种人物场景视频生成方法、系统、装置及存储介质
JP5953953B2 (ja) 画像認識装置、画像認識方法及びプログラム
CN109325493A (zh) 一种基于人形机器人的文字识别方法及人形机器人
WO2014006786A1 (ja) 特徴量抽出装置および特徴量抽出方法
CN119888024B (zh) 一种基于仿真环境的人体姿态多目视觉识别ai训练数据集自动生成和标识方法
US20260120351A1 (en) Drawing processing device and drawing processing method
KR101792701B1 (ko) 도면 검사 장치 및 방법
JP6719168B1 (ja) 教師データとしてのデプス画像にラベルを付与するプログラム、装置及び方法
KR20200052812A (ko) 가상 환경에 활동 캐릭터를 생성하기 위한 가상환경의 활동캐릭터 생성 방법
CN121039579A (zh) 用于形成设施的、尤其工业环境中的设施的、用于形成设施的数字孪生的三维运动学模型的计算机辅助的方法和装置
TWI892076B (zh) 向量化立體模型機器學習方法與學習系統
Farrukh et al. Comparative Analysis of Synthetic Data Generation for Object Detection: CAD Models vs. 3D Scans of Industrial Items and Hybrid Approaches
TW202103047A (zh) 物件姿態辨識方法及系統與電腦程式產品
JP2003331318A (ja) 物体データ生成装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22842188; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 18579257; Country of ref document: US) (Ref document number: 202280049628.3; Country of ref document: CN)
WWE Wipo information: entry into national phase (Ref document number: 2023534865; Country of ref document: JP)
WWE Wipo information: entry into national phase (Ref document number: 2022842188; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2022842188; Country of ref document: EP; Effective date: 20240215)