WO2023206268A1 - Method and apparatus for generating training data set, and electronic device and readable medium - Google Patents

Method and apparatus for generating training data set, and electronic device and readable medium

Info

Publication number
WO2023206268A1
Authority
WO
WIPO (PCT)
Prior art keywords
cad model
training data
rendering
point clouds
label
Application number
PCT/CN2022/090006
Other languages
French (fr)
Chinese (zh)
Inventor
王海峰 (WANG Haifeng)
Original Assignee
西门子股份公司 (Siemens Aktiengesellschaft)
西门子(中国)有限公司 (Siemens Ltd., China)
Application filed by Siemens Aktiengesellschaft (西门子股份公司) and Siemens Ltd., China (西门子(中国)有限公司)
Priority to PCT/CN2022/090006
Publication of WO2023206268A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/50: Depth or shape recovery
    • G06T 7/521: Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light


Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Optics & Photonics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)

Abstract

The embodiments of the present application mainly relate to the field of image processing, and in particular to a method and apparatus for generating a training data set, and an electronic device and a computer-readable medium. The method comprises: acquiring a CAD model and rendering parameters; generating labels of the CAD model; rendering the CAD model according to the rendering parameters, so as to obtain a plurality of rendered depth maps of the CAD model, wherein each of the plurality of rendered depth maps has a corresponding label of the CAD model; and converting one rendered depth map into a group of point clouds, wherein the point clouds and the corresponding labels constitute a group of training data in a training data set.

Description

Method and Apparatus for Generating a Training Data Set, Electronic Device and Readable Medium

Technical Field

Embodiments of the present application relate generally to the field of image processing, and in particular to a method, apparatus, electronic device, and computer-readable medium for generating a training data set.

Background

With the rapid development of artificial intelligence (AI), image recognition with AI models has become commonplace. To obtain an AI model that can recognize specific images, a large number of labeled images must first be generated to train the model. One current approach is to generate such images with a rendering engine. The problem is that an AI model trained on the two-dimensional synthetic images obtained in this way cannot be applied to scenarios involving the manipulation of three-dimensional objects, in particular in the field of robotic applications.
Summary

Embodiments of the present application provide a method, apparatus, electronic device, and readable medium for generating a training data set. They generate a point cloud data set carrying depth information for training an AI model, so that the trained model can make predictions in scenarios involving the manipulation of three-dimensional objects.

In a first aspect, a method for generating a training data set is provided, including: obtaining a CAD model and rendering parameters; generating a label for the CAD model; rendering the CAD model according to the rendering parameters to obtain multiple rendered depth maps of the CAD model, where each of the multiple rendered depth maps has the corresponding label of the CAD model; and converting a rendered depth map into a set of point clouds, where the point clouds and the corresponding label constitute a set of training data in the training data set.

In a second aspect, an apparatus for generating a training data set is provided, including components for executing the steps of the method provided in the first aspect.

In a third aspect, an electronic device is provided, including: at least one memory configured to store computer-readable code; and at least one processor configured to invoke the computer-readable code to execute the steps of the method provided in the first aspect.

In a fourth aspect, a computer-readable medium is provided, on which computer-readable instructions are stored. When executed by a processor, the computer-readable instructions cause the processor to execute the steps of the method provided in the first aspect.
Brief Description of the Drawings

The following drawings are intended only to illustrate and explain the embodiments of the present application schematically; they do not limit the scope of the embodiments. In the drawings:

Figure 1 is a flow chart of a method for generating a training data set according to an embodiment of the present application;

Figure 2 is a schematic diagram of an apparatus for generating a training data set according to an embodiment of the present application;

Figure 3 is a schematic diagram of an electronic device according to an embodiment of the present application.
Reference Signs

100: method for generating a training data set; 101-104: method steps
20: apparatus for generating a training data set; 21: acquisition module; 22: generation module; 23: rendering module; 24: conversion module
300: electronic device; 301: memory; 302: processor
Detailed Description

The subject matter described herein is now discussed with reference to example implementations. It should be understood that these implementations are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein; they do not limit the scope of protection, applicability, or examples set forth in the claims. The functions and arrangement of the discussed elements may be changed without departing from the scope of the embodiments of the present application. Each example may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from the one described, and individual steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.

As used herein, the term "includes" and its variants are open terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first", "second", and so on may refer to different or identical objects. Other definitions, explicit or implicit, may be given below. Unless the context clearly dictates otherwise, the definition of a term is consistent throughout this specification.

AI models based on two-dimensional images are widely used to solve real-world problems. Compared with ordinary two-dimensional images, point clouds can contain depth information and therefore carry more information. This is a very important property in industrial scenarios, because many applications rely on the depth information in a point cloud to estimate an object's pose before operating on it. Under the current state of the art, however, point cloud data sets are not easy to obtain, which has largely hindered further research on, and application of, point-cloud-based AI models in industry.

The embodiments of the present application are described in detail below with reference to the accompanying drawings.
Figure 1 is a flow chart of a method for generating a training data set according to an embodiment of the present application. As shown in Figure 1, the method 100 for generating a training data set includes:

Step 101: obtain a CAD model and rendering parameters.

Users can import different CAD models into the rendering engine according to the needs of the real application scenario.

Step 102: generate a label for the CAD model.

In one embodiment, the label of the CAD model can be generated from the file name of the CAD model, from the description information of the CAD model, or from user input. The label characterizes the name of the CAD model. The name can be a specific name or a category name, for example robot, conveyor belt, or material box; optionally, it can also be an abstract name such as A, B, or C. In another embodiment, image recognition can be performed on the CAD model to generate the corresponding label, which again characterizes the name of the CAD model. This label generation flow frees users from the tedious manual labeling process and significantly speeds up large-scale customization work.
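As a concrete illustration of the file-name variant of step 102, here is a minimal Python sketch; the helper name, the example file name, and the normalization rule are illustrative assumptions rather than part of the application:

```python
from pathlib import Path

def label_from_filename(cad_path, user_label=None):
    """Derive a label from the CAD file name unless the user supplies one.

    The file name stem is normalized, so "conveyor_belt.step" yields
    the label "conveyor belt".
    """
    if user_label:                  # user input takes precedence
        return user_label
    return Path(cad_path).stem.replace("_", " ").lower()

print(label_from_filename("conveyor_belt.step"))  # -> conveyor belt
```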
In one embodiment, the generated label can take the form of a companion file describing the CAD model, or the name of the CAD model can be added to a data structure inside the CAD model.

In one embodiment, the generated label can surround the CAD model in the form of a bounding box and indicate the specific name of the CAD model.

Step 103: render the CAD model according to the rendering parameters to obtain multiple rendered depth maps of the CAD model, where each of the multiple rendered depth maps has the corresponding label of the CAD model.

The CAD model can be rendered by a rendering engine. For each pixel of the rendered CAD model, the distance along the optical axis from the corresponding spatial point in the rendering engine to the camera center of the rendering engine is taken as the depth of that pixel.

Optionally, the number, position, scaling, and/or rotation of the CAD models to be rendered in the rendering engine can be defined by the rendering parameters. Optionally, they can instead be determined randomly according to preset rules, so as to generate more rendered depth maps.
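The application does not prescribe a particular rendering engine. As one possible realization of step 103, the sketch below uses the open-source pyrender engine; the mesh file name, the camera intrinsics, and the randomization ranges are illustrative assumptions:

```python
import numpy as np
import pyrender
import trimesh

# pyrender consumes triangle meshes, so the CAD model is assumed to have been
# exported to a mesh format such as STL or OBJ beforehand.
mesh = pyrender.Mesh.from_trimesh(trimesh.load("material_box.stl"))

scene = pyrender.Scene()
scene.add(pyrender.IntrinsicsCamera(fx=572.4, fy=573.6, cx=320.0, cy=240.0),
          pose=np.eye(4))          # camera at the origin, looking down -z

# Rendering parameters: position, scale and rotation about z, drawn at random
# within preset ranges so that every run yields a different depth map.
rng = np.random.default_rng()
angle, scale = rng.uniform(0.0, 2.0 * np.pi), rng.uniform(0.8, 1.2)
pose = np.eye(4)
pose[:3, :3] = scale * np.array([[np.cos(angle), -np.sin(angle), 0.0],
                                 [np.sin(angle),  np.cos(angle), 0.0],
                                 [0.0,            0.0,           1.0]])
pose[:3, 3] = [rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2), -rng.uniform(0.5, 1.5)]
scene.add(mesh, pose=pose)

renderer = pyrender.OffscreenRenderer(640, 480)
_, depth = renderer.render(scene)  # (480, 640) array of z-distances
```

pyrender reports each pixel's depth as the z-distance along the optical axis, which matches the depth definition used above.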
Step 104: convert a rendered depth map into a set of point clouds; the point clouds and the corresponding label constitute a set of training data in the training data set.

Specifically, according to the principle of perspective imaging, the coordinates (x, y, z) of each pixel in a rendered depth map are converted into a set of point clouds in the corresponding camera coordinates (x′, y′, z′) of the rendering engine, where the camera coordinates in the rendering engine approximate the scale of the real environment. Optionally, the depth map can be converted into a point cloud using the intrinsic calibration parameters of the rendering engine's camera.
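A minimal NumPy sketch of this back-projection, assuming a pinhole camera whose intrinsic calibration parameters are fx, fy, cx and cy:

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project an H x W depth map (z measured along the optical axis)
    into an N x 3 point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx       # perspective imaging: u = fx * x / z + cx
    y = (v - cy) * depth / fy       #                      v = fy * y / z + cy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]       # drop background pixels with zero depth
```

Every retained point inherits the label of the rendered CAD model, yielding the (point cloud, label) pair that constitutes one set of training data.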
In the embodiments of the present application, the CAD model is rendered by a rendering engine and the resulting rendered depth maps are converted into corresponding point clouds, thereby generating a training data set for training an AI model, i.e., a neural network model. The trained AI model can be applied to scenarios involving the manipulation of three-dimensional objects, such as object recognition, object classification, and six-dimensional pose estimation in robotic applications. The embodiments can be applied at different production stages, for example to part-level, component-level, sub-assembly, or assembly-level products, so the application prospects are broad. In addition, the point cloud data set can be expanded continuously according to user needs, further improving the performance of the AI model.

In one embodiment, after a depth map has been converted into a set of point clouds, noise can be generated at multiple points of that set to obtain multiple sets of point clouds containing different noise. Optionally, random noise or Gaussian noise can be generated at multiple points of a set of point clouds. In one embodiment, random noise is generated at multiple points, where the perturbation is kept within a small range, for example random offsets on the x, y, and z coordinates within x1: (-1, 1), y1: (-1, 1), z1: (-1, 1). Random noise can also be generated at multiple points of different sets of point clouds, thereby augmenting the generated point clouds and yielding more training data under different noise conditions. In another embodiment, Gaussian noise is generated at multiple points, where the perturbation oscillates around the mean of the coordinates of all points in the cloud: the closer a perturbation is to the mean, the more probable it is; the farther from the mean, the less probable, producing noise that follows a Gaussian distribution. Gaussian noise can likewise be generated at multiple points of different sets of point clouds. Since point cloud data of real objects captured by depth sensors may contain all kinds of noise, adding noise to the point clouds converted from the rendered CAD model narrows the gap to real data and produces point cloud data sets that are closer to real application scenarios.
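Both noise variants could look as follows in NumPy, reusing the depth map and the back-projection helper sketched above. Reading the Gaussian case as zero-mean per-coordinate noise is one plausible interpretation; sigma and the uniform range are example values:

```python
import numpy as np

rng = np.random.default_rng()

def add_random_noise(points, scale=1.0):
    """Uniform offsets in (-scale, scale) on each of the x, y, z coordinates."""
    return points + rng.uniform(-scale, scale, size=points.shape)

def add_gaussian_noise(points, sigma=0.5):
    """Zero-mean Gaussian offsets: small perturbations are the most probable,
    large ones increasingly rare, i.e. noise following a Gaussian distribution."""
    return points + rng.normal(0.0, sigma, size=points.shape)

# One converted cloud yields several differently-noised training clouds.
pts = depth_to_pointcloud(depth, fx=572.4, fy=573.6, cx=320.0, cy=240.0)
noisy_clouds = [add_random_noise(pts), add_gaussian_noise(pts)]
```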
In one embodiment, the point cloud together with a label for each point in the point cloud constitutes a set of training data in the training data set. An AI model trained on such a data set can be used for point cloud segmentation. For example, a first point cloud of a real scene, captured by a depth camera mounted on a robot, is fed into a first AI model trained with the corresponding training data set; the first AI model can then segment every point of the first point cloud and thereby extract the different objects it contains.

In one embodiment, the point cloud together with the label of the whole point cloud constitutes a set of training data in the training data set. An AI model trained on such a data set can be used for point cloud classification. For example, point clouds of different categories are fed into a second AI model trained with the corresponding training data set, and the second AI model predicts the category to which each input point cloud belongs.
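Putting the two labeling granularities side by side, with pts a converted point cloud as above; the class id and the container lists are hypothetical:

```python
import numpy as np

# Per-cloud label (classification): the whole cloud carries one name.
classification_set = [(pts, "material box")]

# Per-point labels (segmentation): every point carries its own label.
MATERIAL_BOX_ID = 1                # hypothetical numeric class id
segmentation_set = [(pts, np.full(len(pts), MATERIAL_BOX_ID))]
```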
In one embodiment, after the multiple rendered depth maps of the CAD model have been obtained, at least one two-dimensional depth map can be added at a random position of each rendered depth map to introduce distractors. Every pixel of the two-dimensional depth map contains depth information generated randomly within a preset range. Optionally, one or more two-dimensional depth maps of different sizes, positions, or rotations are added at random positions to produce richer, more distracting rendered depth maps. The resulting rendered depth maps are then converted into multiple sets of point clouds, yielding point clouds partially occluded by the distractors; these sets of point clouds serve as a training data set for training an AI model for point cloud detection scenarios.
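A sketch of this occlusion augmentation, applied to a rendered depth map such as the one produced in the rendering sketch above; the default patch size and depth range are made-up values:

```python
import numpy as np

rng = np.random.default_rng()

def add_depth_patch(depth, patch_h=60, patch_w=60, zmin=0.3, zmax=0.8):
    """Overwrite a randomly placed rectangle with depth values drawn at random
    from the preset range (zmin, zmax), acting as an occluding distractor."""
    h, w = depth.shape
    top = rng.integers(0, h - patch_h)
    left = rng.integers(0, w - patch_w)
    occluded = depth.copy()
    occluded[top:top + patch_h, left:left + patch_w] = \
        rng.uniform(zmin, zmax, size=(patch_h, patch_w))
    return occluded

occluded_depth = add_depth_patch(depth)   # then back-project as in step 104
```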
Figure 2 is a schematic diagram of an apparatus 20 for generating a training data set according to an embodiment of the present application. As shown in Figure 2, the apparatus 20 includes:

an acquisition module 21, configured to obtain a CAD model and rendering parameters;

a generation module 22, configured to generate a label for the CAD model;

a rendering module 23, configured to render the CAD model according to the rendering parameters to obtain multiple rendered depth maps of the CAD model, where each of the multiple rendered depth maps has the corresponding label of the CAD model;

a conversion module 24, configured to convert a depth map into a set of point clouds, where the point clouds and the corresponding label constitute a set of training data in the training data set.

In the embodiments of the present application, the CAD model is rendered by a rendering engine, and the resulting rendered depth maps are converted into corresponding point clouds, thereby generating a training data set for training an AI model. The trained AI model can be applied to predictions in scenarios involving the manipulation of three-dimensional objects.
An embodiment of the present application further provides an electronic device 300. Figure 3 is a schematic diagram of the electronic device 300 according to an embodiment of the present application. As shown in Figure 3, the electronic device 300 includes a processor 302 and a memory 301; the memory 301 stores instructions which, when executed by the processor 302, implement the method 100 described above.

The at least one processor 302 may include a microprocessor, an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), a state machine, and the like. Examples of computer-readable media include, but are not limited to, floppy disks, CD-ROMs, magnetic disks, memory chips, ROM, RAM, ASICs, configured processors, all-optical media, all magnetic tapes or other magnetic media, or any other medium from which a computer processor can read instructions. In addition, various other forms of computer-readable media may send or carry instructions to a computer, including routers, private or public networks, or other wired and wireless transmission devices or channels. The instructions may include code in any computer programming language, including C, C++, Visual Basic, Java, and JavaScript.

An embodiment of the present application further provides a computer-readable medium on which computer-readable instructions are stored; when executed by a processor, the computer-readable instructions cause the processor to execute the method for generating a training data set described above. Examples of computer-readable media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tapes, non-volatile memory cards, and ROM. Optionally, the computer-readable instructions may be downloaded from a server computer or from the cloud over a communication network.

It should be noted that not all steps and modules in the above flows and system diagrams are necessary; some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as needed. The system structures described in the above embodiments may be physical or logical structures; that is, some modules may be implemented by the same physical entity, some modules may be implemented by several physical entities, or some modules may be implemented jointly by components in multiple independent devices.

Claims (13)

  1. A method for generating a training data set, characterized by comprising:
    - obtaining (101) a CAD model and rendering parameters;
    - generating (102) a label for the CAD model;
    - rendering (103) the CAD model according to the rendering parameters to obtain multiple rendered depth maps of the CAD model, each of the multiple rendered depth maps having the corresponding label of the CAD model;
    - converting (104) a rendered depth map into a set of point clouds, the point clouds and the corresponding label constituting a set of training data in the training data set.
  2. The method according to claim 1, characterized in that, after converting (104) a depth map into a set of point clouds, the method further comprises:
    - generating noise at multiple points of a set of point clouds to obtain multiple sets of point clouds containing different noise.
  3. The method according to claim 2, characterized in that generating noise at multiple points of a set of point clouds comprises:
    - generating random noise at multiple points of a set of point clouds; or,
    - generating Gaussian noise at multiple points of a set of point clouds.
  4. The method according to claim 1, characterized in that rendering (103) the CAD model comprises:
    - rendering the CAD model with a rendering engine, and taking, for each pixel of the rendered CAD model, the distance along the optical axis from the corresponding spatial point in the rendering engine to the camera center of the rendering engine as the depth of that pixel.
  5. The method according to claim 1, characterized in that generating (102) a label for the CAD model comprises:
    - generating a label for the CAD model according to the file name of the CAD model, the description information of the CAD model, or user input, the label characterizing the name of the CAD model.
  6. The method according to claim 1, characterized in that generating (102) a label for the CAD model comprises:
    - performing recognition on the CAD model to generate a label for the CAD model, the label characterizing the name of the CAD model.
  7. The method according to claim 1, characterized in that the point clouds and the corresponding label constituting a set of training data in the training data set comprises:
    - the point cloud and a label for each point in the point cloud constituting a set of training data in the training data set.
  8. The method according to claim 1, characterized in that the point clouds and the corresponding label constituting a set of training data in the training data set comprises:
    - the point cloud and the label of the point cloud constituting a set of training data in the training data set.
  9. The method according to claim 1, characterized in that converting (104) a depth map into a set of point clouds comprises:
    - converting, according to the principle of perspective imaging, the coordinates of each pixel in a depth map into a set of point clouds in the corresponding camera coordinates of the rendering engine.
  10. The method according to claim 1, characterized in that, after obtaining the multiple rendered depth maps of the CAD model, the method further comprises:
    - adding at least one two-dimensional depth map at a random position of each of the multiple rendered depth maps, each pixel of the two-dimensional depth map containing depth information randomly generated within a preset range.
  11. An apparatus for generating a training data set, characterized by comprising:
    - an acquisition module (21), configured to obtain a CAD model and rendering parameters;
    - a generation module (22), configured to generate a label for the CAD model;
    - a rendering module (23), configured to render the CAD model according to the rendering parameters to obtain multiple rendered depth maps of the CAD model, each of the multiple rendered depth maps having the corresponding label of the CAD model;
    - a conversion module (24), configured to convert a depth map into a set of point clouds, the point clouds and the corresponding label constituting a set of training data in the training data set.
  12. An electronic device, characterized by comprising:
    at least one memory (301), configured to store computer-readable code;
    at least one processor (302), configured to invoke the computer-readable code to perform the steps of the method according to any one of claims 1 to 10.
  13. A computer-readable medium, characterized in that computer-readable instructions are stored on the computer-readable medium, and the computer-readable instructions, when executed by a processor, cause the processor to perform the steps of the method according to any one of claims 1 to 10.
PCT/CN2022/090006 2022-04-28 2022-04-28 Method and apparatus for generating training data set, and electronic device and readable medium WO2023206268A1 (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
PCT/CN2022/090006 WO2023206268A1 (en) | 2022-04-28 | 2022-04-28 | Method and apparatus for generating training data set, and electronic device and readable medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
PCT/CN2022/090006 WO2023206268A1 (en) | 2022-04-28 | 2022-04-28 | Method and apparatus for generating training data set, and electronic device and readable medium

Publications (1)

Publication Number
WO2023206268A1

Family

ID=88516799

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
PCT/CN2022/090006 WO2023206268A1 (en) | 2022-04-28 | 2022-04-28 | Method and apparatus for generating training data set, and electronic device and readable medium

Country Status (1)

Country Link
WO (1) WO2023206268A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548194A (en) * 2016-09-29 2017-03-29 中国科学院自动化研究所 The construction method and localization method of two dimensional image human joint pointses location model
WO2018156126A1 (en) * 2017-02-23 2018-08-30 Siemens Aktiengesellschaft Real-time generation of synthetic data from multi-shot structured light sensors for three-dimensional object pose estimation
CN112541908A (en) * 2020-12-18 2021-03-23 广东工业大学 Casting flash identification method based on machine vision and storage medium
CN113012122A (en) * 2021-03-11 2021-06-22 复旦大学 Category-level 6D pose and size estimation method and device
CN113129370A (en) * 2021-03-04 2021-07-16 同济大学 Semi-supervised object pose estimation method combining generated data and label-free data


Similar Documents

Publication Publication Date Title
CN108460338B (en) Human body posture estimation method and apparatus, electronic device, storage medium, and program
JP7040278B2 (en) Training method and training device for image processing device for face recognition
CN110322512A (en) In conjunction with the segmentation of small sample example and three-dimensional matched object pose estimation method
CN110163208B (en) Scene character detection method and system based on deep learning
CN113408584B (en) RGB-D multi-modal feature fusion 3D target detection method
Rostianingsih et al. COCO (creating common object in context) dataset for chemistry apparatus
WO2022089143A1 (en) Method for generating analog image, and electronic device and storage medium
CN117274388B (en) Unsupervised three-dimensional visual positioning method and system based on visual text relation alignment
JP2020135679A (en) Data set creation method, data set creation device, and data set creation program
Talukdar et al. Data augmentation on synthetic images for transfer learning using deep CNNs
CN110969641A (en) Image processing method and device
CN107330363B (en) Rapid internet billboard detection method
Sagues-Tanco et al. Fast synthetic dataset for kitchen object segmentation in deep learning
Sharma Object detection and recognition using Amazon Rekognition with Boto3
Károly et al. Automated dataset generation with blender for deep learning-based object segmentation
Buls et al. Generation of synthetic training data for object detection in piles
CN116310349B (en) Large-scale point cloud segmentation method, device, equipment and medium based on deep learning
WO2023206268A1 (en) Method and apparatus for generating training data set, and electronic device and readable medium
CN113223037A (en) Unsupervised semantic segmentation method and unsupervised semantic segmentation system for large-scale data
JP2019185295A (en) Image processing device and program for generating two-dimensional image
Volokitin et al. Efficiently detecting plausible locations for object placement using masked convolutions
CN110490852A (en) Search method, device, computer-readable medium and the electronic equipment of target object
CN114241052B (en) Method and system for generating new view image of multi-object scene based on layout
CN110910478B (en) GIF map generation method and device, electronic equipment and storage medium
CN110751153B (en) Semantic annotation method for indoor scene RGB-D image

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22939119

Country of ref document: EP

Kind code of ref document: A1