WO2023206268A1 - Method and apparatus for generating training data set, and electronic device and readable medium - Google Patents

Method and apparatus for generating training data set, and electronic device and readable medium

Info

Publication number
WO2023206268A1
Authority
WO
WIPO (PCT)
Prior art keywords
cad model
training data
rendering
point clouds
label
Application number
PCT/CN2022/090006
Other languages
French (fr)
Chinese (zh)
Inventor
王海峰 (WANG Haifeng)
Original Assignee
西门子股份公司 (Siemens Aktiengesellschaft)
西门子(中国)有限公司 (Siemens Ltd., China)
Application filed by Siemens Aktiengesellschaft (西门子股份公司) and Siemens Ltd., China (西门子(中国)有限公司)
Priority to PCT/CN2022/090006
Publication of WO2023206268A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/50: Depth or shape recovery
    • G06T 7/521: Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light


Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Optics & Photonics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)

Abstract

The embodiments of the present application mainly relate to the field of image processing, and in particular to a method and apparatus for generating a training data set, and an electronic device and a computer-readable medium. The method comprises: acquiring a CAD model and rendering parameters; generating labels of the CAD model; rendering the CAD model according to the rendering parameters, so as to obtain a plurality of rendered depth maps of the CAD model, wherein each of the plurality of rendered depth maps has a corresponding label of the CAD model; and converting one rendered depth map into a group of point clouds, wherein the point clouds and the corresponding labels constitute a group of training data in a training data set.

Description

Method and Apparatus for Generating a Training Data Set, Electronic Device and Readable Medium

Technical Field

Embodiments of the present application relate generally to the field of image processing, and in particular to a method, apparatus, electronic device, and computer-readable medium for generating a training data set.

Background

With the rapid development of artificial intelligence (AI), image recognition with AI models has become commonplace. To obtain an AI model that can recognize specific images, a large number of labeled images must first be generated to train the model. One current approach is to generate such images with a rendering engine. The problem is that an AI model trained on the two-dimensional synthetic images obtained in this way cannot be applied to scenarios involving the manipulation of three-dimensional objects, in particular in the field of robotic applications.
Summary

Embodiments of the present application provide a method, apparatus, electronic device, and readable medium for generating a training data set. They generate a point cloud data set carrying depth information for training an AI model, so that the trained model can make predictions in scenarios involving the manipulation of three-dimensional objects.

In a first aspect, a method for generating a training data set is provided, including: obtaining a CAD model and rendering parameters; generating a label for the CAD model; rendering the CAD model according to the rendering parameters to obtain multiple rendered depth maps of the CAD model, where each of the multiple rendered depth maps has the corresponding label of the CAD model; and converting a rendered depth map into a set of point clouds, where the point clouds and the corresponding label constitute a set of training data in the training data set.

In a second aspect, an apparatus for generating a training data set is provided, including components for executing the steps of the method provided in the first aspect.

In a third aspect, an electronic device is provided, including: at least one memory configured to store computer-readable code; and at least one processor configured to invoke the computer-readable code to execute the steps of the method provided in the first aspect.

In a fourth aspect, a computer-readable medium is provided, on which computer-readable instructions are stored. When executed by a processor, the computer-readable instructions cause the processor to execute the steps of the method provided in the first aspect.
Brief Description of the Drawings

The following drawings are intended only to illustrate and explain the embodiments of the present application schematically; they do not limit the scope of the embodiments. In the drawings:

Figure 1 is a flow chart of a method for generating a training data set according to an embodiment of the present application;

Figure 2 is a schematic diagram of an apparatus for generating a training data set according to an embodiment of the present application;

Figure 3 is a schematic diagram of an electronic device according to an embodiment of the present application.
Reference Signs

100: method for generating a training data set; 101-104: method steps
20: apparatus for generating a training data set; 21: acquisition module; 22: generation module; 23: rendering module; 24: conversion module
300: electronic device; 301: memory; 302: processor
Detailed Description

The subject matter described herein is now discussed with reference to example implementations. It should be understood that these implementations are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein; they do not limit the scope of protection, applicability, or examples set forth in the claims. The functions and arrangement of the discussed elements may be changed without departing from the scope of the embodiments of the present application. Each example may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from the one described, and individual steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.

As used herein, the term "includes" and its variants are open terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first", "second", and so on may refer to different or identical objects. Other definitions, explicit or implicit, may be given below. Unless the context clearly dictates otherwise, the definition of a term is consistent throughout this specification.

AI models based on two-dimensional images are widely used to solve real-world problems. Compared with ordinary two-dimensional images, point clouds can contain depth information and therefore carry more information. This is a very important property in industrial scenarios, because many applications rely on the depth information in a point cloud to estimate an object's pose before operating on it. Under the current state of the art, however, point cloud data sets are not easy to obtain, which has largely hindered further research on, and application of, point-cloud-based AI models in industry.

The embodiments of the present application are described in detail below with reference to the accompanying drawings.
Figure 1 is a flow chart of a method for generating a training data set according to an embodiment of the present application. As shown in Figure 1, the method 100 for generating a training data set includes:

Step 101: obtain a CAD model and rendering parameters.

Users can import different CAD models into the rendering engine according to the needs of the real application scenario.

Step 102: generate a label for the CAD model.

In one embodiment, the label of the CAD model can be generated from the file name of the CAD model, from the description information of the CAD model, or from user input. The label characterizes the name of the CAD model. The name can be a specific name or a category name, for example robot, conveyor belt, or material box; optionally, it can also be an abstract name such as A, B, or C. In another embodiment, image recognition can be performed on the CAD model to generate the corresponding label, which again characterizes the name of the CAD model. This label generation flow frees users from the tedious manual labeling process and significantly speeds up large-scale customization work.
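As a concrete illustration of the file-name variant of step 102, here is a minimal Python sketch; the helper name, the example file name, and the normalization rule are illustrative assumptions rather than part of the application:

```python
from pathlib import Path

def label_from_filename(cad_path, user_label=None):
    """Derive a label from the CAD file name unless the user supplies one.

    The file name stem is normalized, so "conveyor_belt.step" yields
    the label "conveyor belt".
    """
    if user_label:                  # user input takes precedence
        return user_label
    return Path(cad_path).stem.replace("_", " ").lower()

print(label_from_filename("conveyor_belt.step"))  # -> conveyor belt
```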
In one embodiment, the generated label can take the form of a companion file describing the CAD model, or the name of the CAD model can be added to a data structure inside the CAD model.

In one embodiment, the generated label can surround the CAD model in the form of a bounding box and indicate the specific name of the CAD model.

Step 103: render the CAD model according to the rendering parameters to obtain multiple rendered depth maps of the CAD model, where each of the multiple rendered depth maps has the corresponding label of the CAD model.

The CAD model can be rendered by a rendering engine. For each pixel of the rendered CAD model, the distance along the optical axis from the corresponding spatial point in the rendering engine to the camera center of the rendering engine is taken as the depth of that pixel.

Optionally, the number, position, scaling, and/or rotation of the CAD models to be rendered in the rendering engine can be defined by the rendering parameters. Optionally, they can instead be determined randomly according to preset rules, so as to generate more rendered depth maps.
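The application does not prescribe a particular rendering engine. As one possible realization of step 103, the sketch below uses the open-source pyrender engine; the mesh file name, the camera intrinsics, and the randomization ranges are illustrative assumptions:

```python
import numpy as np
import pyrender
import trimesh

# pyrender consumes triangle meshes, so the CAD model is assumed to have been
# exported to a mesh format such as STL or OBJ beforehand.
mesh = pyrender.Mesh.from_trimesh(trimesh.load("material_box.stl"))

scene = pyrender.Scene()
scene.add(pyrender.IntrinsicsCamera(fx=572.4, fy=573.6, cx=320.0, cy=240.0),
          pose=np.eye(4))          # camera at the origin, looking down -z

# Rendering parameters: position, scale and rotation about z, drawn at random
# within preset ranges so that every run yields a different depth map.
rng = np.random.default_rng()
angle, scale = rng.uniform(0.0, 2.0 * np.pi), rng.uniform(0.8, 1.2)
pose = np.eye(4)
pose[:3, :3] = scale * np.array([[np.cos(angle), -np.sin(angle), 0.0],
                                 [np.sin(angle),  np.cos(angle), 0.0],
                                 [0.0,            0.0,           1.0]])
pose[:3, 3] = [rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2), -rng.uniform(0.5, 1.5)]
scene.add(mesh, pose=pose)

renderer = pyrender.OffscreenRenderer(640, 480)
_, depth = renderer.render(scene)  # (480, 640) array of z-distances
```

pyrender reports each pixel's depth as the z-distance along the optical axis, which matches the depth definition used above.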
Step 104: convert a rendered depth map into a set of point clouds; the point clouds and the corresponding label constitute a set of training data in the training data set.

Specifically, according to the principle of perspective imaging, the coordinates (x, y, z) of each pixel in a rendered depth map are converted into a set of point clouds in the corresponding camera coordinates (x′, y′, z′) of the rendering engine, where the camera coordinates in the rendering engine approximate the scale of the real environment. Optionally, the depth map can be converted into a point cloud using the intrinsic calibration parameters of the rendering engine's camera.
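A minimal NumPy sketch of this back-projection, assuming a pinhole camera whose intrinsic calibration parameters are fx, fy, cx and cy:

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project an H x W depth map (z measured along the optical axis)
    into an N x 3 point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx       # perspective imaging: u = fx * x / z + cx
    y = (v - cy) * depth / fy       #                      v = fy * y / z + cy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]       # drop background pixels with zero depth
```

Every retained point inherits the label of the rendered CAD model, yielding the (point cloud, label) pair that constitutes one set of training data.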
In the embodiments of the present application, the CAD model is rendered by a rendering engine and the resulting rendered depth maps are converted into corresponding point clouds, thereby generating a training data set for training an AI model, i.e., a neural network model. The trained AI model can be applied to scenarios involving the manipulation of three-dimensional objects, such as object recognition, object classification, and six-dimensional pose estimation in robotic applications. The embodiments can be applied at different production stages, for example to part-level, component-level, sub-assembly, or assembly-level products, so the application prospects are broad. In addition, the point cloud data set can be expanded continuously according to user needs, further improving the performance of the AI model.

In one embodiment, after a depth map has been converted into a set of point clouds, noise can be generated at multiple points of that set to obtain multiple sets of point clouds containing different noise. Optionally, random noise or Gaussian noise can be generated at multiple points of a set of point clouds. In one embodiment, random noise is generated at multiple points, where the perturbation is kept within a small range, for example random offsets on the x, y, and z coordinates within x1: (-1, 1), y1: (-1, 1), z1: (-1, 1). Random noise can also be generated at multiple points of different sets of point clouds, thereby augmenting the generated point clouds and yielding more training data under different noise conditions. In another embodiment, Gaussian noise is generated at multiple points, where the perturbation oscillates around the mean of the coordinates of all points in the cloud: the closer a perturbation is to the mean, the more probable it is; the farther from the mean, the less probable, producing noise that follows a Gaussian distribution. Gaussian noise can likewise be generated at multiple points of different sets of point clouds. Since point cloud data of real objects captured by depth sensors may contain all kinds of noise, adding noise to the point clouds converted from the rendered CAD model narrows the gap to real data and produces point cloud data sets that are closer to real application scenarios.
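Both noise variants could look as follows in NumPy, reusing the depth map and the back-projection helper sketched above. Reading the Gaussian case as zero-mean per-coordinate noise is one plausible interpretation; sigma and the uniform range are example values:

```python
import numpy as np

rng = np.random.default_rng()

def add_random_noise(points, scale=1.0):
    """Uniform offsets in (-scale, scale) on each of the x, y, z coordinates."""
    return points + rng.uniform(-scale, scale, size=points.shape)

def add_gaussian_noise(points, sigma=0.5):
    """Zero-mean Gaussian offsets: small perturbations are the most probable,
    large ones increasingly rare, i.e. noise following a Gaussian distribution."""
    return points + rng.normal(0.0, sigma, size=points.shape)

# One converted cloud yields several differently-noised training clouds.
pts = depth_to_pointcloud(depth, fx=572.4, fy=573.6, cx=320.0, cy=240.0)
noisy_clouds = [add_random_noise(pts), add_gaussian_noise(pts)]
```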
In one embodiment, the point cloud together with a label for each point in the point cloud constitutes a set of training data in the training data set. An AI model trained on such a data set can be used for point cloud segmentation. For example, a first point cloud of a real scene, captured by a depth camera mounted on a robot, is fed into a first AI model trained with the corresponding training data set; the first AI model can then segment every point of the first point cloud and thereby extract the different objects it contains.

In one embodiment, the point cloud together with the label of the whole point cloud constitutes a set of training data in the training data set. An AI model trained on such a data set can be used for point cloud classification. For example, point clouds of different categories are fed into a second AI model trained with the corresponding training data set, and the second AI model predicts the category to which each input point cloud belongs.
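Putting the two labeling granularities side by side, with pts a converted point cloud as above; the class id and the container lists are hypothetical:

```python
import numpy as np

# Per-cloud label (classification): the whole cloud carries one name.
classification_set = [(pts, "material box")]

# Per-point labels (segmentation): every point carries its own label.
MATERIAL_BOX_ID = 1                # hypothetical numeric class id
segmentation_set = [(pts, np.full(len(pts), MATERIAL_BOX_ID))]
```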
In one embodiment, after the multiple rendered depth maps of the CAD model have been obtained, at least one two-dimensional depth map can be added at a random position of each rendered depth map to introduce distractors. Every pixel of the two-dimensional depth map contains depth information generated randomly within a preset range. Optionally, one or more two-dimensional depth maps of different sizes, positions, or rotations are added at random positions to produce richer, more distracting rendered depth maps. The resulting rendered depth maps are then converted into multiple sets of point clouds, yielding point clouds partially occluded by the distractors; these sets of point clouds serve as a training data set for training an AI model for point cloud detection scenarios.
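A sketch of this occlusion augmentation, applied to a rendered depth map such as the one produced in the rendering sketch above; the default patch size and depth range are made-up values:

```python
import numpy as np

rng = np.random.default_rng()

def add_depth_patch(depth, patch_h=60, patch_w=60, zmin=0.3, zmax=0.8):
    """Overwrite a randomly placed rectangle with depth values drawn at random
    from the preset range (zmin, zmax), acting as an occluding distractor."""
    h, w = depth.shape
    top = rng.integers(0, h - patch_h)
    left = rng.integers(0, w - patch_w)
    occluded = depth.copy()
    occluded[top:top + patch_h, left:left + patch_w] = \
        rng.uniform(zmin, zmax, size=(patch_h, patch_w))
    return occluded

occluded_depth = add_depth_patch(depth)   # then back-project as in step 104
```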
Figure 2 is a schematic diagram of an apparatus 20 for generating a training data set according to an embodiment of the present application. As shown in Figure 2, the apparatus 20 includes:

an acquisition module 21, configured to obtain a CAD model and rendering parameters;

a generation module 22, configured to generate a label for the CAD model;

a rendering module 23, configured to render the CAD model according to the rendering parameters to obtain multiple rendered depth maps of the CAD model, where each of the multiple rendered depth maps has the corresponding label of the CAD model;

a conversion module 24, configured to convert a depth map into a set of point clouds, where the point clouds and the corresponding label constitute a set of training data in the training data set.

In the embodiments of the present application, the CAD model is rendered by a rendering engine, and the resulting rendered depth maps are converted into corresponding point clouds, thereby generating a training data set for training an AI model. The trained AI model can be applied to predictions in scenarios involving the manipulation of three-dimensional objects.
An embodiment of the present application further provides an electronic device 300. Figure 3 is a schematic diagram of the electronic device 300 according to an embodiment of the present application. As shown in Figure 3, the electronic device 300 includes a processor 302 and a memory 301; the memory 301 stores instructions which, when executed by the processor 302, implement the method 100 described above.

The at least one processor 302 may include a microprocessor, an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), a state machine, and the like. Examples of computer-readable media include, but are not limited to, floppy disks, CD-ROMs, magnetic disks, memory chips, ROM, RAM, ASICs, configured processors, all-optical media, all magnetic tapes or other magnetic media, or any other medium from which a computer processor can read instructions. In addition, various other forms of computer-readable media may send or carry instructions to a computer, including routers, private or public networks, or other wired and wireless transmission devices or channels. The instructions may include code in any computer programming language, including C, C++, Visual Basic, Java, and JavaScript.

An embodiment of the present application further provides a computer-readable medium on which computer-readable instructions are stored; when executed by a processor, the computer-readable instructions cause the processor to execute the method for generating a training data set described above. Examples of computer-readable media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tapes, non-volatile memory cards, and ROM. Optionally, the computer-readable instructions may be downloaded from a server computer or from the cloud over a communication network.

It should be noted that not all steps and modules in the above flows and system diagrams are necessary; some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as needed. The system structures described in the above embodiments may be physical or logical structures; that is, some modules may be implemented by the same physical entity, some modules may be implemented by several physical entities, or some modules may be implemented jointly by components in multiple independent devices.

Claims (13)

  1. A method for generating a training data set, characterized by comprising:
    - obtaining (101) a CAD model and rendering parameters;
    - generating (102) a label for the CAD model;
    - rendering (103) the CAD model according to the rendering parameters to obtain multiple rendered depth maps of the CAD model, each of the multiple rendered depth maps having the corresponding label of the CAD model;
    - converting (104) a rendered depth map into a set of point clouds, the point clouds and the corresponding label constituting a set of training data in the training data set.
  2. The method according to claim 1, characterized in that, after converting (104) a depth map into a set of point clouds, the method further comprises:
    - generating noise at multiple points of a set of point clouds to obtain multiple sets of point clouds containing different noise.
  3. The method according to claim 2, characterized in that generating noise at multiple points of a set of point clouds comprises:
    - generating random noise at multiple points of a set of point clouds; or,
    - generating Gaussian noise at multiple points of a set of point clouds.
  4. The method according to claim 1, characterized in that rendering (103) the CAD model comprises:
    - rendering the CAD model with a rendering engine, and taking, for each pixel of the rendered CAD model, the distance along the optical axis from the corresponding spatial point in the rendering engine to the camera center of the rendering engine as the depth of that pixel.
  5. The method according to claim 1, characterized in that generating (102) a label for the CAD model comprises:
    - generating a label for the CAD model according to the file name of the CAD model, the description information of the CAD model, or user input, the label characterizing the name of the CAD model.
  6. The method according to claim 1, characterized in that generating (102) a label for the CAD model comprises:
    - performing recognition on the CAD model to generate a label for the CAD model, the label characterizing the name of the CAD model.
  7. The method according to claim 1, characterized in that the point clouds and the corresponding label constituting a set of training data in the training data set comprises:
    - the point cloud and a label for each point in the point cloud constituting a set of training data in the training data set.
  8. The method according to claim 1, characterized in that the point clouds and the corresponding label constituting a set of training data in the training data set comprises:
    - the point cloud and the label of the point cloud constituting a set of training data in the training data set.
  9. The method according to claim 1, characterized in that converting (104) a depth map into a set of point clouds comprises:
    - converting, according to the principle of perspective imaging, the coordinates of each pixel in a depth map into a set of point clouds in the corresponding camera coordinates of the rendering engine.
  10. The method according to claim 1, characterized in that, after obtaining the multiple rendered depth maps of the CAD model, the method further comprises:
    - adding at least one two-dimensional depth map at a random position of each of the multiple rendered depth maps, each pixel of the two-dimensional depth map containing depth information randomly generated within a preset range.
  11. An apparatus for generating a training data set, characterized by comprising:
    - an acquisition module (21), configured to obtain a CAD model and rendering parameters;
    - a generation module (22), configured to generate a label for the CAD model;
    - a rendering module (23), configured to render the CAD model according to the rendering parameters to obtain multiple rendered depth maps of the CAD model, each of the multiple rendered depth maps having the corresponding label of the CAD model;
    - a conversion module (24), configured to convert a depth map into a set of point clouds, the point clouds and the corresponding label constituting a set of training data in the training data set.
  12. An electronic device, characterized by comprising:
    at least one memory (301), configured to store computer-readable code;
    at least one processor (302), configured to invoke the computer-readable code to perform the steps of the method according to any one of claims 1 to 10.
  13. A computer-readable medium, characterized in that computer-readable instructions are stored on the computer-readable medium, and the computer-readable instructions, when executed by a processor, cause the processor to perform the steps of the method according to any one of claims 1 to 10.
PCT/CN2022/090006 2022-04-28 2022-04-28 Method and apparatus for generating training data set, and electronic device and readable medium WO2023206268A1 (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
PCT/CN2022/090006 WO2023206268A1 (en) | 2022-04-28 | 2022-04-28 | Method and apparatus for generating training data set, and electronic device and readable medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
PCT/CN2022/090006 WO2023206268A1 (en) | 2022-04-28 | 2022-04-28 | Method and apparatus for generating training data set, and electronic device and readable medium

Publications (1)

Publication Number
WO2023206268A1

Family

ID=88516799

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
PCT/CN2022/090006 WO2023206268A1 (en) | 2022-04-28 | 2022-04-28 | Method and apparatus for generating training data set, and electronic device and readable medium

Country Status (1)

Country Link
WO (1) WO2023206268A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548194A (en) * 2016-09-29 2017-03-29 中国科学院自动化研究所 The construction method and localization method of two dimensional image human joint pointses location model
WO2018156126A1 (en) * 2017-02-23 2018-08-30 Siemens Aktiengesellschaft Real-time generation of synthetic data from multi-shot structured light sensors for three-dimensional object pose estimation
CN112541908A (en) * 2020-12-18 2021-03-23 广东工业大学 Casting flash identification method based on machine vision and storage medium
CN113012122A (en) * 2021-03-11 2021-06-22 复旦大学 Category-level 6D pose and size estimation method and device
CN113129370A (en) * 2021-03-04 2021-07-16 同济大学 Semi-supervised object pose estimation method combining generated data and label-free data


Similar Documents

Publication Publication Date Title
CN108460338B (en) Human body posture estimation method and apparatus, electronic device, storage medium, and program
JP7040278B2 (en) Training method and training device for image processing device for face recognition
CN110322512A (en) In conjunction with the segmentation of small sample example and three-dimensional matched object pose estimation method
CN110163208B (en) Scene character detection method and system based on deep learning
CN113408584B (en) RGB-D multi-modal feature fusion 3D target detection method
Rostianingsih et al. COCO (creating common object in context) dataset for chemistry apparatus
WO2022089143A1 (en) Method for generating analog image, and electronic device and storage medium
CN117274388B (en) Unsupervised three-dimensional visual positioning method and system based on visual text relation alignment
JP2020135679A (en) Data set creation method, data set creation device, and data set creation program
Talukdar et al. Data augmentation on synthetic images for transfer learning using deep CNNs
CN110969641A (en) Image processing method and device
CN107330363B (en) Rapid internet billboard detection method
Sagues-Tanco et al. Fast synthetic dataset for kitchen object segmentation in deep learning
Sharma Object detection and recognition using Amazon Rekognition with Boto3
Károly et al. Automated dataset generation with blender for deep learning-based object segmentation
Buls et al. Generation of synthetic training data for object detection in piles
CN116310349B (en) Large-scale point cloud segmentation method, device, equipment and medium based on deep learning
WO2023206268A1 (en) Method and apparatus for generating training data set, and electronic device and readable medium
CN113223037A (en) Unsupervised semantic segmentation method and unsupervised semantic segmentation system for large-scale data
JP2019185295A (en) Image processing device and program for generating two-dimensional image
Volokitin et al. Efficiently detecting plausible locations for object placement using masked convolutions
CN110490852A (en) Search method, device, computer-readable medium and the electronic equipment of target object
CN114241052B (en) Method and system for generating new view image of multi-object scene based on layout
CN110910478B (en) GIF map generation method and device, electronic equipment and storage medium
CN110751153B (en) Semantic annotation method for indoor scene RGB-D image

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22939119

Country of ref document: EP

Kind code of ref document: A1