WO2023206268A1 - Method and apparatus for generating a training data set, electronic device and readable medium
- Publication number
- WO2023206268A1 (PCT/CN2022/090006)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- CAD model
- training data
- rendering
- point clouds
- label
- Prior art date: 2022-04-28
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/521—Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
Definitions
- Embodiments of the present application mainly relate to the field of image processing, and in particular, to a method, device, electronic device, and computer-readable medium for generating a training data set.
- Embodiments of the present application provide a method, device, electronic device, and readable medium for generating a training data set, which are used to generate a point cloud data set with depth information to train an AI (artificial intelligence) model, so that the model can be applied to related predictions in scenarios involving three-dimensional object manipulation.
- according to a first aspect, a method for generating a training data set is provided, including: obtaining a CAD model and rendering parameters; generating a label for the CAD model; rendering the CAD model according to the rendering parameters to obtain a plurality of rendered depth maps of the CAD model, where each of the plurality of rendered depth maps has a corresponding label of the CAD model; and converting a rendered depth map into a set of point clouds, where the point clouds and the corresponding labels constitute a set of training data in the training data set.
- according to a second aspect, a device for generating a training data set is provided, including components for executing each step of the method provided in the first aspect.
- an electronic device is provided, including: at least one memory configured to store computer-readable code; and at least one processor configured to call the computer-readable code to execute each step of the method provided in the first aspect.
- a computer-readable medium is provided, on which computer-readable instructions are stored; when executed by a processor, the computer-readable instructions cause the processor to execute each step of the method provided in the first aspect.
- Figure 1 is a flow chart of a method for generating a training data set according to an embodiment of the present application
- Figure 2 is a schematic diagram of a device for generating a training data set according to an embodiment of the present application
- Figure 3 is a schematic diagram of an electronic device according to an embodiment of the present application.
- the term "includes” and variations thereof represent an open term meaning “including, but not limited to.”
- the term “based on” means “based at least in part on.”
- the terms “one embodiment” and “an embodiment” mean “at least one embodiment.”
- the term “another embodiment” means “at least one other embodiment”.
- the terms “first”, “second”, etc. may refer to different objects or to the same object. Other definitions, whether explicit or implicit, may be included below; the definition of a term is consistent throughout this specification unless the context clearly dictates otherwise.
- Point clouds can contain depth information and can therefore carry more information than two-dimensional images. This is a very important feature for industrial scenarios, where many applications rely on the depth information in point clouds to estimate the pose of an object before manipulating it.
- however, point cloud data sets are not easy to obtain, which has largely hindered further research on and application of point-cloud-based AI models in industry.
- Figure 1 is a flow chart of a method for generating a training data set according to an embodiment of the present application. As shown in Figure 1, the method 100 for generating a training data set includes:
- Step 101: Obtain the CAD model and rendering parameters.
- Step 102: Generate a label for the CAD model.
- a label of the CAD model may be generated based on the file name of the CAD model, description information of the CAD model, or user input, where the label is used to characterize the name of the CAD model (see the sketch after this list).
- the generated label of the CAD model may be a specific name or a category name of the CAD model, such as robot, conveyor belt, or material box.
- the generated label of the CAD model may also be an abstract name, such as A, B, or C.
- image recognition can be performed on the CAD model to generate a corresponding label of the CAD model.
- the label is used to characterize the name of the CAD model.
- the generated label of the CAD model may be a description in the form of an accompanying file, or the name of the CAD model may be added to a data structure in the CAD model.
- the generated label may surround the CAD model in the form of a bounding box and indicate a specific name of the CAD model.
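- as a minimal illustration of the file-name variant above (a hypothetical sketch, not part of the claimed method; the version-suffix naming convention is an assumption):

```python
from pathlib import Path

def label_from_filename(cad_path: str) -> str:
    """Derive a label from a CAD file name, e.g. 'conveyor_belt_v2.stl' -> 'conveyor_belt'."""
    stem = Path(cad_path).stem.lower()
    # Drop trailing version tokens such as 'v2' (assumed naming convention).
    parts = [p for p in stem.split("_") if not (p.startswith("v") and p[1:].isdigit())]
    return "_".join(parts)
```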
- Step 103: Render the CAD model according to the rendering parameters to obtain multiple rendered depth maps of the CAD model, where each of the multiple rendered depth maps has a corresponding label of the CAD model.
- the CAD model can be rendered through a rendering engine, and for each pixel of the rendered CAD model, the distance from the corresponding spatial point in the rendering engine to the camera center of the rendering engine, measured along the optical axis, is used as the depth of that pixel.
- the number, position, scaling and/or rotation of the CAD models to be rendered in the rendering engine may be defined by rendering parameters.
- the number, position, scaling and/or rotation of the CAD models to be rendered in the rendering engine can also be randomly determined according to preset rules to generate more rendered depth maps, as in the sketch below.
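- the embodiments do not prescribe a particular rendering engine. As a non-authoritative sketch, the snippet below uses the open-source trimesh and pyrender packages to render depth maps while randomizing scale, rotation, and position within preset ranges; all numeric bounds and camera intrinsics are illustrative assumptions:

```python
import numpy as np
import trimesh
import pyrender

def render_random_depth_maps(cad_path, n_views=10, width=640, height=480,
                             fx=600.0, fy=600.0, cx=320.0, cy=240.0, seed=0):
    """Render depth maps of a CAD model under randomized pose and scale.

    Each pixel of a returned depth map stores the distance from the surface
    point to the camera center, measured along the optical axis; 0 = background.
    """
    rng = np.random.default_rng(seed)
    base_mesh = trimesh.load(cad_path, force="mesh")
    camera = pyrender.IntrinsicsCamera(fx=fx, fy=fy, cx=cx, cy=cy)
    renderer = pyrender.OffscreenRenderer(width, height)
    depth_maps = []
    for _ in range(n_views):
        scene = pyrender.Scene()
        scene.add(camera, pose=np.eye(4))  # camera at the origin, looking down -z
        mesh = base_mesh.copy()
        mesh.apply_scale(rng.uniform(0.8, 1.2))                   # random scaling
        pose = trimesh.transformations.random_rotation_matrix()  # random rotation
        pose[:3, 3] = [rng.uniform(-0.2, 0.2),                    # random position,
                       rng.uniform(-0.2, 0.2),                    # kept in front of
                       rng.uniform(-1.5, -0.8)]                   # the camera (-z)
        scene.add(pyrender.Mesh.from_trimesh(mesh), pose=pose)
        _, depth = renderer.render(scene)
        depth_maps.append(depth)
    renderer.delete()
    return depth_maps
```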
- Step 104: Convert a rendered depth map into a set of point clouds.
- the point clouds and corresponding labels constitute a set of training data in the training data set.
- the coordinates (x, y, z) of each pixel in a rendered depth map are converted into the corresponding camera coordinates (x′, y′, z′) in the rendering engine.
- the camera coordinates in the rendering engine approximate the scale in the real environment.
- the depth map can be converted into a point cloud based on the intrinsic calibration parameters of the rendering engine's camera, as in the sketch below.
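- the conversion itself is a standard pinhole back-projection; a minimal sketch, assuming the virtual camera's intrinsics (fx, fy, cx, cy) are available from the rendering engine:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) camera-frame point cloud."""
    h, w = depth.shape
    v, u = np.indices((h, w))        # pixel row (v) and column (u) coordinates
    valid = depth > 0                # drop background pixels
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```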
- the CAD model is rendered through a rendering engine, and the obtained rendered depth maps are then converted into corresponding point clouds, thereby generating a corresponding training data set with which to train an AI model, that is, a neural network model.
- the trained AI model can be applied to scenarios involving three-dimensional object manipulation, such as object recognition, object classification, and six-dimensional pose estimation in the field of robotic applications.
- the embodiments of the present application can be applied at different production stages, for example to part-level, component-level, sub-assembly-level, or assembly-level products, so the application prospects are very broad.
- the size of the point cloud data set can be continuously expanded according to user needs, thereby further improving the performance of the AI model.
- noise can be generated at multiple points in the set of point clouds to obtain multiple sets of point clouds containing different noises.
- random noise or Gaussian noise can be generated at multiple points in a set of point clouds.
- random noise can be generated at multiple points in a set of point clouds, where the perturbation of the random noise can be set within a small range; for example, the random noise added to the x, y, and z coordinates of multiple points lies within the following ranges: x₁: (-1, 1), y₁: (-1, 1), z₁: (-1, 1).
- Gaussian noise can be generated at multiple points in a set of point clouds, where the perturbation of the Gaussian noise is set to oscillate around the mean of the coordinates of all points in the point cloud. Specifically, the closer a perturbation is to the mean, the higher its probability; the further from the mean, the lower its probability, so the resulting noise conforms to a Gaussian distribution. Additionally, Gaussian noise can be generated at multiple points in different sets of point clouds.
- in this way, noise is added on top of the point clouds converted from the rendered CAD model, generating more point cloud data sets that are closer to real application scenarios. Both noise variants are sketched below.
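- a minimal sketch of the two augmentations; the uniform range mirrors the (-1, 1) example above, and the sigma heuristic for the Gaussian case is an assumption, not part of the embodiments:

```python
import numpy as np

def add_uniform_noise(points, scale=1.0, rng=None):
    """Perturb every coordinate with random noise drawn from (-scale, scale)."""
    rng = rng or np.random.default_rng()
    return points + rng.uniform(-scale, scale, size=points.shape)

def add_gaussian_noise(points, sigma=None, rng=None):
    """Perturb points with Gaussian noise: small perturbations are more
    probable than large ones, so the noise conforms to a Gaussian distribution."""
    rng = rng or np.random.default_rng()
    if sigma is None:
        # Assumed heuristic: scale sigma to the spread of the cloud about its mean.
        sigma = 0.01 * np.linalg.norm(points - points.mean(axis=0), axis=1).mean()
    return points + rng.normal(0.0, sigma, size=points.shape)
```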
- the point cloud and the label of each point in the point cloud constitute a set of training data in the training data set.
- the AI model trained with the generated training data set can be used in point cloud segmentation scenarios. For example, a first point cloud of a real scene, collected by a depth camera mounted on a robot, is input into a first AI model trained with the corresponding training data set; the first AI model can then segment each point in the first point cloud so as to extract the different objects in the first point cloud.
- the point cloud and the labels of the point cloud constitute a set of training data in the training data set.
- the AI model trained with the generated training data set can be used in point cloud classification scenarios. For example, point clouds of different categories are input into a second AI model trained with the corresponding training data set, and the second AI model can predict the categories to which the different input point clouds belong. Both kinds of training pairs are sketched below.
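- the two kinds of training pairs can be assembled as in the hypothetical sketch below, with per-point labels for segmentation and one label per cloud for classification; the field names are illustrative:

```python
import numpy as np

def make_segmentation_sample(points, label_id):
    """One (N, 3) point cloud paired with an (N,) array of per-point labels."""
    return {"points": points,
            "point_labels": np.full(len(points), label_id, dtype=np.int64)}

def make_classification_sample(points, label_id):
    """One (N, 3) point cloud paired with a single category label."""
    return {"points": points, "label": int(label_id)}
```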
- at least one two-dimensional depth map can be added at a random position in each of the multiple rendered depth maps to introduce interference terms, where each pixel in the added two-dimensional depth map contains depth information that is randomly generated within a preset range.
- richer, more challenging rendered depth maps can be generated by adding one or more 2D depth maps of different sizes, positions, or rotations at random locations.
- the multiple rendered depth maps thus obtained are converted into multiple sets of point clouds, yielding point clouds that are partially occluded by the interference items.
- the resulting multiple sets of point clouds are used as a training data set to train an AI model for point cloud detection scenarios, as in the sketch below.
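- a sketch of the interference step: a randomly generated rectangular 2D depth patch is pasted into a rendered depth map so that, after conversion, part of the object's point cloud appears occluded; patch sizes, positions, and the depth range are assumptions:

```python
import numpy as np

def add_depth_patch(depth, rng=None, max_size=80, z_range=(0.3, 0.9)):
    """Overlay a random rectangular depth patch at a random position.

    Patch depths are drawn within a preset range; the patch wins wherever it
    is closer to the camera than the existing surface, so it acts as an occluder.
    """
    rng = rng or np.random.default_rng()
    h, w = depth.shape
    ph, pw = rng.integers(10, max_size, size=2)
    top, left = rng.integers(0, h - ph), rng.integers(0, w - pw)
    patch = rng.uniform(*z_range, size=(ph, pw))
    out = depth.copy()
    region = out[top:top + ph, left:left + pw]
    out[top:top + ph, left:left + pw] = np.where(
        (region == 0) | (patch < region), patch, region)
    return out
```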
- Figure 2 is a schematic diagram of a training data set generating device 20 according to an embodiment of the present application. As shown in Figure 2, the training data set generating device 20 includes:
- the acquisition module 21 is configured to: acquire the CAD model and rendering parameters.
- the generation module 22 is configured to: generate labels of the CAD model.
- the rendering module 23 is configured to: render the CAD model according to the rendering parameters to obtain multiple rendered depth maps of the CAD model, where each of the multiple rendered depth maps has a corresponding label of the CAD model.
- the conversion module 24 is configured to: convert a rendered depth map into a set of point clouds, where the point clouds and the corresponding labels constitute a set of training data in the training data set.
- the CAD model is rendered through a rendering engine, and the resulting rendered depth map is converted into a corresponding point cloud, thereby generating a corresponding training data set to train the AI model.
- the trained AI model can be applied to related predictions in scenarios involving three-dimensional object manipulation.
- Figure 3 is a schematic diagram of an electronic device 300 according to an embodiment of the present application.
- the electronic device 300 includes a processor 302 and a memory 301 .
- the memory 301 stores instructions that, when executed by the processor 302, implement the method 100 described above.
- At least one processor 302 may include a microprocessor, an application specific integrated circuit (ASIC), a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), a state machine, etc.
- examples of computer-readable media include, but are not limited to, floppy disks, CD-ROMs, magnetic disks, memory chips, ROM, RAM, ASICs, configured processors, optical media, magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions.
- various other forms of computer-readable media can send or carry instructions to a computer, including routers, private or public networks, or other wired and wireless transmission devices or channels. The instructions may include code in any computer programming language, including C, C++, Visual Basic, Java, and JavaScript.
- the embodiments of the present application also provide a computer-readable medium.
- computer-readable instructions are stored on the computer-readable medium, and when executed by a processor, the computer-readable instructions cause the processor to execute the aforementioned method for generating a training data set. Examples of computer-readable media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tape, non-volatile memory cards, and ROM.
- the computer-readable instructions may be downloaded from a server computer or from the cloud via a communication network.
- the execution order of each step is not fixed and can be adjusted as needed.
- the system structure described in the above embodiments may be a physical structure or a logical structure; that is, some modules may be implemented by the same physical entity, some modules may be implemented by multiple physical entities, and some modules may be implemented jointly by components in multiple independent devices.
Abstract
Embodiments of the present application mainly relate to the field of image processing, and in particular to a method and apparatus for generating a training data set, as well as an electronic device and a computer-readable medium. The method comprises: acquiring a CAD model and rendering parameters; generating labels of the CAD model; rendering the CAD model according to the rendering parameters so as to obtain a plurality of rendered depth maps of the CAD model, each of the plurality of rendered depth maps having a corresponding label of the CAD model; and converting a rendered depth map into a set of point clouds, the point clouds and the corresponding labels constituting a set of training data in a training data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/090006 WO2023206268A1 (fr) | 2022-04-28 | 2022-04-28 | Method and apparatus for generating a training data set, electronic device and readable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023206268A1 (fr) | 2023-11-02 |
Family
ID=88516799
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/090006 WO2023206268A1 (fr) | Method and apparatus for generating a training data set, electronic device and readable medium | 2022-04-28 | 2022-04-28 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023206268A1 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106548194A (zh) * | 2016-09-29 | 2017-03-29 | Institute of Automation, Chinese Academy of Sciences | Construction and localization methods for a human joint point localization model in two-dimensional images |
WO2018156126A1 (fr) * | 2017-02-23 | 2018-08-30 | Siemens Aktiengesellschaft | Real-time generation of synthetic data from multi-view structured-light sensors for three-dimensional object pose estimation |
CN112541908A (zh) * | 2020-12-18 | 2021-03-23 | Guangdong University of Technology | Machine-vision-based casting flash recognition method and storage medium |
CN113012122A (zh) * | 2021-03-11 | 2021-06-22 | Fudan University | Category-level 6D pose and size estimation method and device |
CN113129370A (zh) * | 2021-03-04 | 2021-07-16 | Tongji University | Semi-supervised object pose estimation method combining generated data and unlabeled data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22939119; Country of ref document: EP; Kind code of ref document: A1 |