WO2022116678A1 - Method and apparatus for determining pose of target object, storage medium and electronic device - Google Patents


Info

Publication number
WO2022116678A1
Authority
WO
WIPO (PCT)
Prior art keywords
point
coordinates
target object
seed
input
Prior art date
Application number
PCT/CN2021/122454
Other languages
French (fr)
Chinese (zh)
Inventor
杜国光
赵开勇
Original Assignee
达闼机器人股份有限公司
Priority date
Filing date
Publication date
Application filed by 达闼机器人股份有限公司 filed Critical 达闼机器人股份有限公司
Publication of WO2022116678A1 publication Critical patent/WO2022116678A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/66 Analysis of geometric attributes of image moments or centre of gravity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the present disclosure relates to the technical field of robotics and computer vision, and in particular, to a method, device, storage medium and electronic device for determining the pose of a target object.
  • 6D object pose estimation (6D Object Pose Estimation) refers to the translation and rotation transformation between the camera coordinate system at the moment the current image is captured and the world coordinate system in which the original object is located, comprising 3 displacement degrees of freedom and 3 rotational degrees of freedom. Based on an object's 6D pose, the object can be located accurately, which is of great significance in robot grasping and augmented reality applications.
  • In the related art, the 6D pose of a specific object in the camera frame is calculated from the object's 3D data, based on either a 3D point cloud or an RGB-D image.
  • For example, based on the 3D point cloud, a random sampling method can be used: corresponding point pairs are found at random to obtain a spatial 6D transformation, the error after each transformation is calculated, and the point pairs with the smallest error after transformation are taken as the object's 6D pose.
  • As another example, based on the 3D point cloud, a feature-point method can be used: multiple feature-point pairs are found and matched to obtain an initial object 6D pose, and on this basis an exact matching algorithm such as ICP (Iterative Closest Points) yields the final object 6D pose.
  • Based on RGB-D images, template image information can be used: the template image most similar to the current image is found, and its 6D pose is taken as the current object's 6D pose.
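To make the ICP refinement mentioned above concrete, here is a minimal NumPy sketch of the single alignment step that ICP repeats until convergence. It illustrates the general algorithm only and is not code from this disclosure; the Kabsch/SVD solution of the rigid transform is a standard choice, not one the text specifies.

```python
import numpy as np

def icp_step(src, dst):
    """One ICP iteration: match each source point to its nearest
    destination point, then solve the best rigid transform (Kabsch)."""
    # Brute-force nearest-neighbor correspondences (for clarity only).
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    matched = dst[d2.argmin(axis=1)]

    # Center both point sets.
    mu_s, mu_d = src.mean(axis=0), matched.mean(axis=0)
    S, D = src - mu_s, matched - mu_d

    # SVD of the cross-covariance gives the optimal rotation.
    U, _, Vt = np.linalg.svd(S.T @ D)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s        # dst is approximately R @ src + t
    return R, t
```

Repeating this step and applying (R, t) to src each time drives the two clouds together when they start roughly aligned, which is why the feature-point method supplies the initial 6D pose first.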
  • The purpose of the present disclosure is to provide a method, apparatus, storage medium and electronic device for determining the pose of a target object, so as to solve the problem of low accuracy in determining the pose of a target object in the related art.
  • a first aspect of the embodiments of the present disclosure provides a method for determining the pose of a target object, the method comprising:
  • the target coordinates of each input point in the target object are determined according to the 3D point cloud corresponding to the target object;
  • seed points are generated according to the target coordinates of each input point based on a downsampling method;
  • the original coordinates of each seed point are input into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point;
  • the 6D pose of the target object is determined according to the center point coordinates and the in-situ point coordinates of each seed point.
  • an apparatus for determining the pose of a target object comprising:
  • a determination module configured to determine the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
  • a generation module, configured to generate seed points according to the target coordinates of each input point based on a downsampling method;
  • an input module, configured to input the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point;
  • an execution module, configured to determine the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
  • a third aspect of the embodiments of the present disclosure provides a computer program, including computer-readable codes, which, when the computer-readable codes are executed on a computing and processing device, cause the computing and processing device to execute the embodiments of the first aspect of the present disclosure the methods proposed.
  • A third aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which the computer program provided by the embodiments of the third aspect is stored; when the program is executed by a processor, the steps of the method described in the first aspect are performed.
  • a fourth aspect of the embodiments of the present disclosure provides an electronic device, including:
  • a processor for executing the computer program in the memory to implement the steps of the method in the first aspect.
  • Fig. 1 is a flow chart of a method for determining the pose of a target object according to an exemplary embodiment.
  • Fig. 2 is a flow chart showing step S14 in Fig. 1 according to an exemplary embodiment.
  • Fig. 3 is a flow chart showing step S11 in Fig. 1 according to an exemplary embodiment.
  • Fig. 4 is a flow chart showing step S12 in Fig. 1 according to an exemplary embodiment.
  • Fig. 5 is a flowchart showing a method for determining a 3D point cloud of a target object according to an exemplary embodiment.
  • Fig. 6 is a flowchart showing another method for determining a 3D point cloud of a target object according to an exemplary embodiment.
  • Fig. 7 is a block diagram of an apparatus for determining the pose of a target object according to an exemplary embodiment.
  • FIG. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment.
  • FIG. 9 provides a schematic structural diagram of a computing processing device according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of a portable or fixed storage unit for program code implementing the method according to the present disclosure, provided by an embodiment of the present disclosure.
  • the method for determining the pose of a target object provided by the present disclosure can be applied to an electronic device, and the electronic device can be, for example, a smart phone, a PC (Personal Computer), and the like.
  • The inventors found that determining the pose of a target object based on a random sampling method requires repeatedly searching for corresponding point pairs and calculating transformation errors; the pose determination process is cumbersome and time-consuming, wasting human and material resources and increasing time costs.
  • Determining grasping feature points based on geometric structure requires the target object's geometry to be highly salient. If the geometry is less salient, the accuracy of the determined grasping feature points is reduced, which may make the grasping robot's grasp-position determination and augmented reality less accurate, thereby increasing the risk of damage to the target object.
  • Fig. 1 is a flowchart of a method for determining the pose of a target object according to an exemplary embodiment. As shown in Fig. 1 , the method includes the following steps.
  • S11. The target coordinates of each input point in the target object are determined according to the 3D point cloud corresponding to the target object.
  • S12. Based on a downsampling method, seed points are generated according to the target coordinates of each input point.
  • S13. The original coordinates of each seed point are input into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point.
  • S14. The 6D pose of the target object is determined according to the center point coordinates and the in-situ point coordinates of each seed point.
  • With the above technical solution, the target coordinates of each input point in the target object are determined according to the 3D point cloud corresponding to the target object; based on a downsampling method, seed points are generated according to the target coordinates of each input point; the original coordinates of each seed point are input into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point; and the 6D pose of the target object is determined according to these coordinates.
  • In this way, the deep neural network model regresses the coordinates of each seed point back to their coordinates under the standard pose, which not only shortens the time needed to determine the pose but also improves the accuracy of determining the target object's pose, thereby improving the accuracy of robot grasping and augmented reality.
  • Moreover, not only can the 6D pose of an instance-level target object be estimated, but also that of a category-level target object, reducing the dependence on complete model consistency.
  • Optionally, the deep neural network model generates the center point coordinates and the in-situ point coordinates of each seed point as follows: the displacement deviation and the rotation deviation corresponding to each seed point are determined according to the seed point's original coordinates;
  • the center point coordinates of each seed point are then determined according to its original coordinates and the corresponding displacement deviation, and its in-situ point coordinates are determined according to its original coordinates and the corresponding rotation deviation.
  • Optionally, the three-dimensional convolutional neural network model PointNet++ can be used to obtain high-dimensional features of each seed point's original coordinates, from which the seed point's displacement deviation and rotation deviation relative to the predicted target point are obtained by regression.
  • For example, if the original coordinates of the i-th seed point are p_i, Δq_i is the displacement deviation corresponding to the i-th seed point, and Δr_i is the corresponding rotation deviation, then the seed point's center point coordinates are q_i = p_i + Δq_i and its in-situ point coordinates are r_i = p_i + Δr_i, for i = 1, …, n, where n equals the number of determined seed points.
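As a small illustration of this regression step, the sketch below applies per-seed offset arrays to the seed coordinates; the array names (dq, dr) are placeholders for whatever the backbone actually outputs, not identifiers from this disclosure.

```python
import numpy as np

def apply_offsets(seeds, dq, dr):
    """Per-seed predictions (all arrays n x 3):
    center point  q_i = p_i + dq_i  (a vote for the object center),
    in-situ point r_i = p_i + dr_i  (the seed mapped to the standard pose)."""
    q = seeds + dq
    r = seeds + dr
    return q, r
```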
  • Optionally, in step S14, determining the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point includes:
  • S141. A first first-order matrix is constructed according to the original coordinates of the seed points, and a second first-order matrix is constructed according to the in-situ point coordinates of the seed points.
  • S142. Based on the least squares method, the rotation matrix corresponding to the target object is determined according to the first and second first-order matrices.
  • S143. The 3 rotational degrees of freedom of the target object are obtained according to the rotation matrix.
  • S144. The average coordinates are calculated according to the center point coordinates of each seed point, and the three coordinate values of the average coordinates are used as the 3 displacement degrees of freedom of the target object.
  • S145. The 6D pose of the target object is determined according to the 3 rotational degrees of freedom and the 3 displacement degrees of freedom.
  • In a specific implementation, the predicted center point coordinates of the target object are aggregated from the center point coordinates of each seed point, for example by calculating the sum of the center point coordinates of all seed points and then taking their average. The three coordinate values of the average coordinates are used as the 3 displacement degrees of freedom of the target object; for example, if the obtained predicted center point coordinates are (1, 2, 3), then the 3 displacement degrees of freedom of the target object are (1, 2, 3).
  • The first first-order matrix is constructed from the original coordinates of the seed points, e.g. A = [p_1, p_2, …, p_n], and the second first-order matrix is constructed from the in-situ point coordinates of the seed points, e.g. B = [r_1, r_2, …, r_n]. If the rotation matrix is X, then A = XB, and X is calculated using the least squares method.
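The following is a minimal sketch of steps S141 to S145 under a common reading of this least-squares step, namely the orthogonal Procrustes problem solved by SVD; the patent's exact matrix formulas are rendered as images and not reproduced here, so this concrete solution method is an assumption.

```python
import numpy as np

def rotation_from_seeds(P, B):
    """Least-squares rotation X with P approximately X @ B, where
    P (3 x n) stacks the seeds' original coordinates and B (3 x n)
    their predicted in-situ point coordinates (Procrustes via SVD)."""
    U, _, Vt = np.linalg.svd(P @ B.T)
    X = U @ Vt
    if np.linalg.det(X) < 0:   # enforce a proper rotation (det = +1)
        U[:, -1] *= -1
        X = U @ Vt
    return X

def translation_from_centers(q):
    """3 displacement DoF: the mean of the per-seed center votes q (n x 3)."""
    return q.mean(axis=0)
```

The 3 rotational degrees of freedom can then be read off X, and together with the 3 displacement degrees of freedom they form the 6D pose.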
  • Optionally, determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object includes:
  • S112. Determine the barycentric coordinates of the 3D point cloud according to the original coordinates of each input point.
  • The target coordinates of each input point are then obtained by subtracting the barycentric coordinates from the input point's original coordinates in each dimension. For example, if the barycentric coordinates of the 3D point cloud are (2, 3, 4) and the original coordinates of a given input point of the target object are (8, 7, 9), the target coordinates of that input point are determined to be (6, 4, 5).
  • In this way, the original coordinates of the 3D point cloud can be normalized by the barycentric coordinates of the 3D point cloud, reducing the influence of the target object's original coordinates on the model calculation.
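A one-function sketch of this normalization, assuming the point cloud is an n x 3 NumPy array:

```python
import numpy as np

def normalize_to_barycenter(points):
    """Target coordinates: subtract the 3D point cloud's barycenter from
    every input point, e.g. (8, 7, 9) - (2, 3, 4) = (6, 4, 5)."""
    barycenter = points.mean(axis=0)
    return points - barycenter, barycenter
```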
  • Optionally, the downsampling method is the farthest point sampling method. Generating seed points according to the target coordinates of each input point based on the downsampling method includes:
  • selecting, from the input points, the one with the farthest Euclidean distance from the already-determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
  • It should be noted that downsampling methods include random sampling, farthest point sampling (FPS), and depth-model-based sampling.
  • Taking the farthest point sampling method as an example: the input point with the farthest Euclidean distance from the center point of the 3D point cloud is selected as the first seed point; then the input point with the farthest Euclidean distance from the first seed point is selected as the second seed point; then the input point with the farthest Euclidean distance from the first and second seed points is selected as the third seed point; then the input point with the farthest Euclidean distance from the first, second, and third seed points is selected as the fourth seed point; and so on, until the number of seed points reaches 1024, i.e., the preset threshold is 1024.
  • In this way, the seed points of the target object can be determined by the downsampling method, reducing the influence of varying densities of the target object's original coordinates on the model calculation.
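The farthest point sampling just described can be sketched as follows (plain NumPy, brute-force distances for clarity; the default of 1024 mirrors the preset threshold above):

```python
import numpy as np

def farthest_point_sampling(points, n_seeds=1024):
    """Greedily pick the input point farthest (in Euclidean distance)
    from the seeds chosen so far; the first seed is the point farthest
    from the cloud's center."""
    center = points.mean(axis=0, keepdims=True)
    first = int(np.linalg.norm(points - center, axis=1).argmax())
    chosen = [first]
    # min_d[j] = distance from point j to its nearest chosen seed
    min_d = np.linalg.norm(points - points[first], axis=1)
    while len(chosen) < min(n_seeds, len(points)):
        nxt = int(min_d.argmax())
        chosen.append(nxt)
        min_d = np.minimum(min_d, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]
```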
  • the training of the deep neural network model includes:
  • the coordinate information of the sample seed points is input into the deep neural network model, and the predicted displacement deviation and predicted rotation deviation output by the deep neural network model are obtained;
  • the predicted center point coordinates of each sample seed point are determined according to its coordinate information and the predicted displacement deviation, and its predicted in-situ point coordinates are determined according to its coordinate information and the predicted rotation deviation;
  • the loss L is calculated by the following loss function, and the parameters of the deep neural network model are updated according to L:
  • L = λ1·L_center + λ2·L_initial;
  • where n is the total number of sample seed points; L_center is the displacement-offset loss value; L_initial is the rotation-offset loss value; λ1 is the preset weight of the displacement-offset loss value; λ2 is the preset weight of the rotation-offset loss value; Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point; and Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point.
  • The true offset of the i-th sample seed point's coordinates from the in-situ point coordinates can be calculated from the ground-truth value of the rotational degrees of freedom and the seed point's predicted rotation deviation; the true offset of the i-th sample seed point's coordinates from the center point coordinates can be calculated from the ground-truth value of the displacement degrees of freedom and the seed point's predicted displacement deviation.
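A hedged PyTorch sketch of this training loss follows. The weighted two-term structure matches the formula above, but the per-term form (a mean L1 error between predicted and true offsets over the n sample seed points) is an assumption, since the component formulas are rendered as images in the source.

```python
import torch

def pose_loss(dq_pred, dr_pred, dq_true, dr_true, lam1=1.0, lam2=1.0):
    """L = lam1 * L_center + lam2 * L_initial, each term averaging the
    L1 offset error over the n sample seed points (tensors of shape n x 3)."""
    l_center = (dq_pred - dq_true).abs().sum(dim=1).mean()
    l_initial = (dr_pred - dr_true).abs().sum(dim=1).mean()
    return lam1 * l_center + lam2 * l_initial
```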
  • FIG. 5 is a flowchart of a method for determining the 3D point cloud of a target object according to an exemplary embodiment, which is used in determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object. The method includes:
  • S51. Collect an RGB image of the target object through an image acquisition device.
  • S52. Determine the depth image of the target object according to the RGB image, and perform instance segmentation on the RGB image to obtain the corresponding category-level mask area.
  • S53 Determine a 3D point cloud corresponding to the target object according to the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
  • In a specific implementation, when it is determined that a target object exists within the acquisition range of the image acquisition device, for example when the target object is placed on a workbench within that range, the image acquisition device collects an RGB image of its acquisition range. Instance segmentation is then performed on the RGB image, for example using the Mask R-CNN instance segmentation algorithm, to obtain the category-level mask area occupied by the target object.
  • The depth image corresponding to the target object is determined, from which the area occupied by the target object in the depth image, i.e., its depth area, can be determined.
  • The 3D point cloud corresponding to the target object is determined according to the category-level mask area and the depth area, combined with the noise suppression parameter and the texture restoration parameter of the image acquisition device. In this way, the background noise of the image can be reasonably removed and the accuracy of 3D point cloud determination improved.
  • Thus, the accuracy of determining the 3D point cloud can be improved, thereby improving the accuracy of determining the pose of the target object.
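The mask-plus-depth step amounts to a standard pinhole back-projection; a sketch follows, where (fx, fy, cx, cy) stand for the image acquisition device's internal parameters (the noise-suppression and texture-restoration handling described above is omitted).

```python
import numpy as np

def depth_to_point_cloud(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into camera-frame 3D points."""
    v, u = np.nonzero(mask)     # pixel rows (v) and columns (u) in the mask
    z = depth[v, u]
    valid = z > 0               # drop pixels with missing depth
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)   # n x 3 point cloud
```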
  • FIG. 6 is a flowchart of another method for determining the 3D point cloud of a target object according to an exemplary embodiment, which is likewise used in determining the target coordinates of each input point in the target object according to the corresponding 3D point cloud.
  • In a specific implementation, the RGB image of the target object is collected by the image acquisition device, and a single-stage 2D target detection algorithm such as YOLOv3 can be used to identify the category-level rectangular area occupied by the target object.
  • The category-level rectangular area is cropped with near and far planes to obtain the viewing frustum corresponding to the target object; based on the cropping planes of the image acquisition device intersecting the generatrices of the viewing frustum, a frustum operation is performed on the viewing frustum to obtain the frustum corresponding to the target object and determine the area it occupies.
  • Within this area, a semantic segmentation model, such as the PointNet++ model, can be used to segment the points belonging to the target object.
  • In this way, the accuracy of determining the 3D point cloud can be improved, thereby improving the accuracy of determining the pose of the target object.
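A rough sketch of this frustum cropping for 3D points already expressed in the camera frame; the near/far limits are illustrative placeholders, and the detected 2D box plays the role of the category-level rectangular area above.

```python
import numpy as np

def crop_frustum(points, box, fx, fy, cx, cy, near=0.1, far=3.0):
    """Keep points whose pinhole projection falls inside the 2D box
    (u0, v0, u1, v1) and whose depth lies within [near, far]."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = fx * x / z + cx         # project back to pixel coordinates
    v = fy * y / z + cy
    u0, v0, u1, v1 = box
    keep = ((u >= u0) & (u <= u1) & (v >= v0) & (v <= v1)
            & (z >= near) & (z <= far))
    return points[keep]
```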
  • FIG. 7 is a block diagram of an apparatus for determining the pose of a target object according to an exemplary embodiment. As shown in FIG. 7 , the apparatus 700 includes: a determination module 710 , a generation module 720 , an input module 730 and an execution module 740 .
  • the determining module 710 is configured to determine the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
  • a generating module 720, configured to generate seed points according to the target coordinates of each input point based on a downsampling method;
  • an input module 730, configured to input the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point;
  • the execution module 740 is configured to determine the 6D pose of the target object according to the coordinates of the center point of each of the seed points and the coordinates of the in-situ point.
  • The above apparatus regresses the coordinates of each seed point back to their coordinates under the standard pose through the deep neural network model, which not only shortens the time needed to determine the pose but also improves the accuracy of determining the target object's pose, thereby improving the accuracy of robot grasping and augmented reality.
  • Optionally, the deep neural network model generates the center point coordinates and the in-situ point coordinates of each seed point as follows: the displacement deviation and the rotation deviation corresponding to each seed point are determined according to the seed point's original coordinates;
  • the center point coordinates of each seed point are then determined according to its original coordinates and the corresponding displacement deviation, and its in-situ point coordinates are determined according to its original coordinates and the corresponding rotation deviation.
  • the execution module is used to:
  • the 6D pose of the target object is determined according to the three rotational degrees of freedom and the three displacement degrees of freedom.
  • the determining module is used to:
  • the target coordinate corresponding to the input point is determined according to the offset value in each dimension.
  • the generation module is used for:
  • the input point with the farthest Euclidean distance from the determined seed point is used as a new seed point, until the number of seed points reaches a preset threshold.
  • the training of the deep neural network model includes:
  • the coordinate information of the sample seed points is input into the deep neural network model, and the predicted displacement deviation and predicted rotation deviation output by the deep neural network model are obtained;
  • the predicted center point coordinates of each sample seed point are determined according to its coordinate information and the predicted displacement deviation, and its predicted in-situ point coordinates are determined according to its coordinate information and the predicted rotation deviation;
  • the loss L is calculated by the following loss function, and the parameters of the deep neural network model are updated according to L:
  • L = λ1·L_center + λ2·L_initial;
  • where n is the total number of sample seed points; L_center is the displacement-offset loss value; L_initial is the rotation-offset loss value; λ1 is the preset weight of the displacement-offset loss value; λ2 is the preset weight of the rotation-offset loss value; Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point; Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point; and the true offsets of each sample seed point's coordinates from the center point coordinates and from the in-situ point coordinates serve as the corresponding ground-truth values.
  • Optionally, the above apparatus further includes a collection module configured to: collect an RGB image of the target object through an image acquisition device;
  • determine the depth image of the target object according to the RGB image, and perform instance segmentation on the RGB image to obtain the corresponding category-level mask area; and
  • determine the 3D point cloud corresponding to the target object according to the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
  • Optionally, the above apparatus further includes an acquisition module configured to: determine, through a target detection algorithm, the category-level rectangular area corresponding to the target object.
  • It should be noted that the embodiments described in this specification are all preferred embodiments, and the parts involved are not necessarily required by the present disclosure; for example, the input module 730 and the execution module 740 may be mutually independent devices or the same device in a specific implementation, which is not limited in the present disclosure.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, implements the steps of any one of the methods described above.
  • Embodiments of the present disclosure also provide an electronic device, including:
  • a processor configured to execute the computer program in the memory, to implement the steps of any one of the above methods.
  • FIG. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment.
  • the electronic device 800 may include: a processor 701 and a memory 702 .
  • the electronic device 800 may also include one or more of a multimedia component 703 , an input/output (I/O) interface 704 , and a communication component 705 .
  • I/O input/output
  • the processor 701 is configured to control the overall operation of the electronic device 800 to complete all or part of the steps in the above-mentioned method for determining the pose of the target object.
  • The memory 702 is used to store various types of data to support operation of the electronic device 800. Such data may include, for example, instructions for any application or method operating on the electronic device 800, and application-related data such as contact data, sent and received messages, pictures, audio, video, and so on.
  • the memory 702 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (Static Random Access Memory, SRAM for short), electrically erasable programmable read-only memory ( Electrically Erasable Programmable Read-Only Memory (EEPROM for short), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (Read-Only Memory, ROM for short), magnetic memory, flash memory, magnetic disk or optical disk.
  • Multimedia components 703 may include screen and audio components. Wherein the screen can be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals.
  • the audio component may include a microphone for receiving external audio signals.
  • the received audio signal may be further stored in memory 702 or transmitted through communication component 705 .
  • the audio assembly also includes at least one speaker for outputting audio signals.
  • the I/O interface 704 provides an interface between the processor 701 and other interface modules, and the above-mentioned other interface modules may be a keyboard, a mouse, a button, and the like. These buttons can be virtual buttons or physical buttons.
  • the communication component 705 is used for wired or wireless communication between the electronic device 800 and other devices.
  • The wireless communication may be, for example, one or more of Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or others;
  • the corresponding communication component 705 may include: Wi-Fi module, Bluetooth module, NFC module and so on.
  • In an exemplary embodiment, the electronic device 800 may be implemented by one or more of: Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components.
  • a computer-readable storage medium including program instructions is also provided, and when the program instructions are executed by a processor, the steps of the above-mentioned method for determining the pose of a target object are implemented.
  • the computer-readable storage medium can be the above-mentioned memory 702 including program instructions, and the above-mentioned program instructions can be executed by the processor 701 of the electronic device 800 to complete the above-mentioned method for determining the pose of a target object.
  • the present disclosure also proposes a computing processing device, including:
  • one or more processors; when the computer-readable code is executed by the one or more processors, the computing processing device executes the aforementioned method for determining the pose of the target object.
  • Correspondingly, the present disclosure also proposes a computer program, including computer-readable codes which, when run on a computing processing device, cause the computing processing device to perform the aforementioned method for determining the pose of the target object.
  • the computer-readable storage medium proposed by the present disclosure stores the aforementioned computer program therein.
  • FIG. 9 provides a schematic structural diagram of a computing processing device according to an embodiment of the present disclosure.
  • the computing processing device typically includes a processor 1110 and a computer program product or computer readable medium in the form of a memory 1130 .
  • the memory 1130 may be electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the memory 1130 has storage space 1150 for program code 1151 for performing any of the method steps in the above-described methods.
  • the storage space 1150 for program codes may include various program codes 1151 for implementing various steps in the above methods, respectively. These program codes can be read from or written to one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks. Such computer program products are typically portable or fixed storage units as shown in FIG. 10 .
  • the storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 1130 in the computing processing device of FIG. 9 .
  • the program code may, for example, be compressed in a suitable form.
  • the storage unit includes computer readable code 1151', i.e. code readable by a processor such as 1110, for example, which when executed by a server, causes the server to perform the various steps in the methods described above.
  • a method for determining the pose of a target object comprising:
  • the target coordinates of each input point in the target object are determined according to the 3D point cloud corresponding to the target object;
  • seed points are generated according to the target coordinates of each input point based on a downsampling method;
  • the original coordinates of each seed point are input into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point;
  • the 6D pose of the target object is determined according to the center point coordinates and the in-situ point coordinates of each seed point.
  • Optionally, the center point coordinates of each seed point are determined according to its original coordinates and the corresponding displacement deviation, and its in-situ point coordinates are determined according to its original coordinates and the corresponding rotation deviation.
  • the 6D pose of the target object is determined according to the three rotational degrees of freedom and the three displacement degrees of freedom.
  • the target coordinate corresponding to the input point is determined according to the offset value in each dimension.
  • the input point with the farthest Euclidean distance from the determined seed point is used as a new seed point, until the number of seed points reaches a preset threshold.
  • the training of the deep neural network model comprises:
  • the coordinate information of the sample seed points is input into the deep neural network model, and the predicted displacement deviation and predicted rotation deviation output by the deep neural network model are obtained;
  • the loss L is calculated by the following loss function, and the parameters of the deep neural network model are updated according to L:
  • L = λ1·L_center + λ2·L_initial;
  • where n is the total number of sample seed points; L_center is the displacement-offset loss value; L_initial is the rotation-offset loss value; λ1 is the preset weight of the displacement-offset loss value; λ2 is the preset weight of the rotation-offset loss value; Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point; Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point; and the true offsets of each sample seed point's coordinates from the center point coordinates and from the in-situ point coordinates serve as the corresponding ground-truth values.
  • an RGB image of the target object is collected through an image acquisition device; the depth image of the target object is determined according to the RGB image, and instance segmentation is performed on the RGB image to obtain a corresponding category-level mask area;
  • a 3D point cloud corresponding to the target object is determined according to the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
  • the category-level rectangular area corresponding to the target object is determined through a target detection algorithm.
  • a device for determining the pose of a target object comprising:
  • a determination module for determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
  • a generation module is used to generate a seed point according to the target coordinates of each input point based on the downsampling method
  • an input module for inputting the original coordinates of each of the seed points into the deep neural network model, to obtain the center point coordinates and the in-situ coordinates of each of the seed points;
  • the execution module is configured to determine the 6D pose of the target object according to the coordinates of the center point of each of the seed points and the coordinates of the in-situ point.
  • Optionally, the center point coordinates of each seed point are determined according to the seed point's original coordinates and the corresponding displacement deviation, and its in-situ point coordinates are determined according to its original coordinates and the corresponding rotation deviation.
  • the 6D pose of the target object is determined according to the three rotational degrees of freedom and the three displacement degrees of freedom.
  • the target coordinate corresponding to the input point is determined according to the offset value in each dimension.
  • the input point with the farthest Euclidean distance from the determined seed point is used as a new seed point, until the number of seed points reaches a preset threshold.
  • the coordinate information of the sample seed points is input into the deep neural network model, and the predicted displacement deviation and predicted rotation deviation output by the deep neural network model are obtained;
  • the loss L is calculated by the following loss function, and the parameters of the deep neural network model are updated according to L:
  • L = λ1·L_center + λ2·L_initial;
  • where n is the total number of sample seed points; L_center is the displacement-offset loss value; L_initial is the rotation-offset loss value; λ1 is the preset weight of the displacement-offset loss value; λ2 is the preset weight of the rotation-offset loss value; Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point; Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point; and the true offsets of each sample seed point's coordinates from the center point coordinates and from the in-situ point coordinates serve as the corresponding ground-truth values.
  • an RGB image of the target object is collected through an image acquisition device; the depth image of the target object is determined according to the RGB image, and instance segmentation is performed on the RGB image to obtain a corresponding category-level mask area;
  • a 3D point cloud corresponding to the target object is determined according to the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
  • the category-level rectangular area corresponding to the target object is determined through a target detection algorithm.
  • a computer program comprising computer readable code which, when run on a computing processing device, causes the computing processing device to perform the method of any of embodiments 1-8.
  • An electronic device comprising:
  • a processor configured to execute the computer program in the memory, to implement the steps of the method in any one of Embodiments 1 to 8.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

A method and an apparatus for determining a pose of a target object, a storage medium and an electronic device, which aim to solve the problem in the related art of low accuracy in determining a pose of a target object. Said method comprises: determining target coordinates of each input point in the target object according to a 3D point cloud corresponding to the target object (S11); on the basis of a down-sampling method, generating a seed point according to the target coordinates of each input point (S12); inputting original coordinates of each seed point into a deep neural network model, so as to obtain center point coordinates and in-situ point coordinates of each seed point (S13); and determining a 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point (S14). Said method improves the accuracy in determining a pose of a target object.

Description

Method and apparatus for determining pose of target object, storage medium and electronic device
CROSS-REFERENCE TO RELATED APPLICATIONS
The present disclosure claims priority to the Chinese patent application No. 202011401984.6, titled "Method, apparatus, storage medium and electronic device for determining the pose of a target object", filed with the China Patent Office on December 02, 2020, the entire contents of which are incorporated into the present disclosure by reference.
TECHNICAL FIELD
The present disclosure relates to the technical field of robotics and computer vision, and in particular to a method, apparatus, storage medium and electronic device for determining the pose of a target object.
BACKGROUND
6D object pose estimation refers to the translation and rotation transformation between the camera coordinate system at the moment the current image is captured and the world coordinate system in which the original object is located, comprising 3 displacement degrees of freedom and 3 rotational degrees of freedom. Based on an object's 6D pose, the object can be located accurately, which is of great significance in robot grasping and augmented reality applications.
In the related art, the 6D pose of a specific object in the camera frame is calculated from the object's 3D data, based on either a 3D point cloud or an RGB-D image. For example, based on the 3D point cloud, a random sampling method can be used: corresponding point pairs are found at random to obtain a spatial 6D transformation, the error after each transformation is calculated, and the point pairs with the smallest error after transformation are taken as the object's 6D pose. As another example, based on the 3D point cloud, a feature-point method can find and match multiple feature-point pairs to obtain an initial object 6D pose, which is then refined by an exact matching algorithm such as ICP (Iterative Closest Points) to yield the final object 6D pose. Based on RGB-D images, template image information can be used: the template image most similar to the current image is found, and its 6D pose is taken as the current object's 6D pose.
SUMMARY
The purpose of the present disclosure is to provide a method, apparatus, storage medium and electronic device for determining the pose of a target object, so as to solve the problem of low accuracy in determining the pose of a target object in the related art.
To achieve the above purpose, a first aspect of the embodiments of the present disclosure provides a method for determining the pose of a target object, the method comprising:
determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
generating seed points according to the target coordinates of each input point based on a downsampling method;
inputting the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point; and
determining the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
A second aspect of the embodiments of the present disclosure provides an apparatus for determining the pose of a target object, the apparatus comprising:
a determination module, configured to determine the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
a generation module, configured to generate seed points according to the target coordinates of each input point based on a downsampling method;
an input module, configured to input the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point; and
an execution module, configured to determine the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
A third aspect of the embodiments of the present disclosure provides a computer program, including computer-readable codes which, when run on a computing processing device, cause the computing processing device to execute the method proposed in the embodiments of the first aspect.
A third aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which the computer program proposed by the embodiments of the third aspect is stored; when the program is executed by a processor, the steps of the method described in the first aspect are performed.
A fourth aspect of the embodiments of the present disclosure provides an electronic device, including:
a memory, on which the computer program proposed by the embodiments of the third aspect is stored; and
a processor, configured to execute the computer program in the memory to implement the steps of the method in the first aspect.
Through the above technical solutions, at least the following beneficial effects can be achieved:
The target coordinates of each input point in the target object are determined according to the 3D point cloud corresponding to the target object; based on a downsampling method, seed points are generated according to the target coordinates of each input point; the original coordinates of each seed point are input into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point; and the 6D pose of the target object is determined according to these coordinates. In this way, the deep neural network model regresses the coordinates of each seed point back to their coordinates under the standard pose, which not only shortens the time needed to determine the pose but also improves the accuracy of determining the target object's pose, thereby improving the accuracy of robot grasping and augmented reality.
Other features and advantages of the present disclosure will be described in detail in the detailed description that follows.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are provided for a further understanding of the present disclosure and constitute a part of the specification; together with the following detailed description, they serve to explain the present disclosure, but do not limit it. In the drawings:
Fig. 1 is a flowchart of a method for determining the pose of a target object according to an exemplary embodiment.
Fig. 2 is a flowchart of an implementation of step S14 in Fig. 1 according to an exemplary embodiment.
Fig. 3 is a flowchart of an implementation of step S11 in Fig. 1 according to an exemplary embodiment.
Fig. 4 is a flowchart of an implementation of step S12 in Fig. 1 according to an exemplary embodiment.
Fig. 5 is a flowchart of a method for determining the 3D point cloud of a target object according to an exemplary embodiment.
Fig. 6 is a flowchart of another method for determining the 3D point cloud of a target object according to an exemplary embodiment.
Fig. 7 is a block diagram of an apparatus for determining the pose of a target object according to an exemplary embodiment.
Fig. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment.
Fig. 9 is a schematic structural diagram of a computing processing device according to an embodiment of the present disclosure.
Fig. 10 is a schematic diagram of a portable or fixed storage unit for program code implementing the method according to the present disclosure, provided by an embodiment of the present disclosure.
DETAILED DESCRIPTION
The specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are only used to illustrate and explain the present disclosure, not to limit it.
It should be noted that, in the present disclosure, the terms "first", "second", etc. in the specification, claims and drawings are used to distinguish similar objects and are not necessarily to be understood as describing a specific order or sequence. Likewise, the terms "S51", "S61", etc. are used to distinguish steps and are not necessarily to be understood as requiring that the method steps be performed in a specific order.
Before introducing the method, apparatus, storage medium and electronic device for determining the pose of a target object provided by the present disclosure, the application scenarios of the present disclosure are first introduced. The method for determining the pose of a target object provided by the present disclosure can be applied to an electronic device, which may be, for example, a smart phone, a PC (Personal Computer), and the like.
The inventors found that determining the pose of a target object based on a random sampling method requires repeatedly searching for corresponding point pairs and calculating transformation errors; the pose determination process is cumbersome and time-consuming, wasting human and material resources and increasing time costs. Determining grasping feature points based on geometric structure requires the target object's geometry to be highly salient; if it is less salient, the accuracy of the determined grasping feature points is reduced, which may make the grasping robot's grasp-position determination and augmented reality less accurate, thereby increasing the risk of damage to the target object.
To solve the above technical problems, the present disclosure provides a method for determining the pose of a target object. Fig. 1 is a flowchart of such a method according to an exemplary embodiment; as shown in Fig. 1, the method includes the following steps.
S11. Determine the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object.
S12. Based on a downsampling method, generate seed points according to the target coordinates of each input point.
S13. Input the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point.
S14. Determine the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
With the above technical solution, the target coordinates of each input point in the target object are determined according to the 3D point cloud corresponding to the target object; based on a downsampling method, seed points are generated according to the target coordinates of each input point; the original coordinates of each seed point are input into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point; and the 6D pose of the target object is determined according to these coordinates. In this way, the deep neural network model regresses the coordinates of each seed point back to their coordinates under the standard pose, which not only shortens the time needed to determine the pose but also improves the accuracy of determining the target object's pose, thereby improving the accuracy of robot grasping and augmented reality. Moreover, not only can the 6D pose of an instance-level target object be estimated, but also that of a category-level target object, reducing the dependence on complete model consistency.
Optionally, the deep neural network model generates the center point coordinates and the in-situ point coordinates of each seed point as follows:
determining the displacement deviation and rotation deviation corresponding to each seed point from the original coordinates of that seed point;
determining the center point coordinates of each seed point from its original coordinates and the corresponding displacement deviation, and determining the in-situ point coordinates of each seed point from its original coordinates and the corresponding rotation deviation.
Optionally, a high-dimensional feature of the original coordinates of each seed point can be obtained with the 3D convolutional neural network model PointNet++, and the displacement deviation and rotation deviation of that seed point relative to the predicted target point can be regressed from this high-dimensional feature.
For example, if the original coordinates of the i-th seed point are p_i, Δq_i is the displacement deviation corresponding to the i-th seed point, and Δr_i is the rotation deviation corresponding to the i-th seed point, then the center point coordinates of that seed point are q_i = p_i + Δq_i and its in-situ point coordinates are r_i = p_i + Δr_i, where i = 1, …, n and n equals the number of determined seed points.
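As an illustrative sketch only (the disclosure does not fix the architecture of the regression head), the per-seed offsets could be regressed from PointNet++ features with a small MLP head; the class name OffsetHead, the feature dimension and the layer sizes below are all assumptions, and the feature extractor itself is stubbed:

    import torch
    import torch.nn as nn

    class OffsetHead(nn.Module):
        """Regress per-seed displacement and rotation offsets from features."""
        def __init__(self, feat_dim: int = 256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim, 128), nn.ReLU(),
                nn.Linear(128, 6),          # 3 values for Δq, 3 for Δr
            )

        def forward(self, seed_xyz: torch.Tensor, seed_feat: torch.Tensor):
            delta = self.mlp(seed_feat)     # (n, 6)
            dq, dr = delta[:, :3], delta[:, 3:]
            q = seed_xyz + dq               # center point coordinates q_i = p_i + Δq_i
            r = seed_xyz + dr               # in-situ point coordinates r_i = p_i + Δr_i
            return q, r

    # usage with stubbed features for 1024 seeds
    head = OffsetHead()
    xyz = torch.rand(1024, 3)
    feat = torch.rand(1024, 256)            # would come from PointNet++ in practice
    q, r = head(xyz, feat)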
Optionally, referring to Fig. 2, which shows a flowchart for implementing step S14 of Fig. 1, in step S14 the determining of the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point includes:
S141. Construct a first first-order matrix from the original coordinates of the seed points, and construct a second first-order matrix from the in-situ point coordinates of the seed points.
S142. Based on the least squares method, determine the rotation matrix corresponding to the target object from the first first-order matrix and the second first-order matrix.
S143. Obtain the 3 rotational degrees of freedom of the target object from the rotation matrix.
S144. Calculate average coordinates from the center point coordinates of the seed points, and take the three coordinate values of the average coordinates as the 3 displacement degrees of freedom of the target object.
S145. Determine the 6D pose of the target object from the 3 rotational degrees of freedom and the 3 displacement degrees of freedom.
In a specific implementation, the predicted center point coordinates of the target object are obtained by aggregating the center point coordinates of the seed points, for example by summing the center point coordinates of all seed points and taking their average:

    q_avg = (1/n) · Σ_{i=1}^{n} q_i,

where q_i is the center point coordinate of the i-th seed point, n is the number of seed points, and q_avg is the predicted center point coordinate of the target object.
Further, the three coordinate values of the average coordinates are taken as the 3 displacement degrees of freedom of the target object; for example, if the obtained predicted center point coordinates are (1, 2, 3), the 3 displacement degrees of freedom of the target object are (1, 2, 3).
Further, a first first-order matrix P is constructed from the original coordinates of the seed points,

    P = [p_1; p_2; …; p_n]  (an n×3 matrix whose i-th row is p_i),

and a second first-order matrix R is constructed from the in-situ point coordinates of the seed points,

    R = [r_1; r_2; …; r_n]  (an n×3 matrix whose i-th row is r_i).

Letting the rotation matrix be

    X = [x_11 x_12 x_13; x_21 x_22 x_23; x_31 x_32 x_33],

the two matrices are related by

    R · X = P.
Further, X is obtained from this relation by the least squares method.
Further, letting the 3 rotational degrees of freedom of the target object be (α, β, γ), then

    α = atan2(x_32, x_33),
    β = atan2(−x_31, √(x_32² + x_33²)),
    γ = atan2(x_21, x_11).
Optionally, referring to Fig. 3, which shows a flowchart for implementing step S11 of Fig. 1, in step S11 the determining of the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object includes:
S111. Determine the original coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object.
S112. Determine the barycentric coordinates of the 3D point cloud from the original coordinates of each input point.
S113. For each input point, determine the offset between the original coordinates of that input point and the barycentric coordinates in each dimension.
S114. Determine the target coordinates of that input point from the offsets in the respective dimensions.
Specifically, if the barycentric coordinates of the 3D point cloud are o_center and the original coordinates of an input point are p_ori, the target coordinates of that input point are p_new = p_ori − o_center.
For example, suppose the barycentric coordinates of the 3D point cloud are (2, 3, 4) and the original coordinates of an input point of the target object are (8, 7, 9). The offset between the original coordinates and the barycentric coordinates is 8 − 2 = 6 in the first dimension, 7 − 3 = 4 in the second dimension, and 9 − 4 = 5 in the third dimension.
Further, from the offset 6 in the first dimension, 4 in the second dimension and 5 in the third dimension, the target coordinates of the input point are determined to be (6, 4, 5).
With the above technical solution, since the position at which the 3D point cloud of the target object appears is random, the original coordinates of the 3D point cloud can be normalized by means of its barycentric coordinates, which reduces the influence of the original coordinates of the target object on the model calculation.
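A minimal NumPy sketch of steps S111 to S114 (centroid subtraction); the function name is illustrative:

    import numpy as np

    def normalize_points(points: np.ndarray) -> np.ndarray:
        """S111-S114: subtract the barycenter from every input point."""
        o_center = points.mean(axis=0)    # barycentric coordinates of the cloud
        return points - o_center          # per-dimension offsets = target coordinates

    cloud = np.array([[8.0, 7.0, 9.0], [2.0, 3.0, 4.0]])
    print(normalize_points(cloud))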
Optionally, the down-sampling method is the farthest point sampling method. Referring to Fig. 4, which shows a flowchart for implementing step S12 of Fig. 1, in step S12 the generating of seed points from the target coordinates of each input point based on the down-sampling method includes:
S121. Determine the center point of the 3D point cloud from the original coordinates of each input point in the target object.
S122. Select the input point with the largest Euclidean distance from the center point of the 3D point cloud as the first seed point.
S123. Taking the first seed point as the starting point, take the input point with the largest Euclidean distance from the already determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
Optionally, the down-sampling method includes random sampling, farthest point sampling (FPS), and deep-model-based sampling methods.
As an example, taking the farthest point sampling method: the input point with the largest Euclidean distance from the center point of the 3D point cloud is selected as the first seed point, and the input point with the largest Euclidean distance from the first seed point is then selected as the second seed point. Next, based on the first and second seed points, the input point with the largest Euclidean distance is selected as the third seed point, that is, the input point farthest from the first seed point and farthest from the second seed point.
Further, based on the second and third seed points, the input point with the largest Euclidean distance is selected as the fourth seed point, that is, the input point farthest from the second seed point and farthest from the third seed point, and so on, until the number of seed points reaches 1024; that is, the preset threshold is 1024.
With the above technical solution, since the points of the target object are distributed with uneven density, the seed points of the target object can be determined by the down-sampling method, which reduces the influence of the varying density of the original coordinates of the target object on the model calculation.
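The text above is ambiguous about the distance to "the already determined seed points"; standard farthest point sampling maximizes the distance to the nearest already chosen seed, and the following NumPy sketch (illustrative, not the claimed implementation) follows that convention:

    import numpy as np

    def farthest_point_sample(points: np.ndarray, k: int = 1024) -> np.ndarray:
        """FPS: the first seed is the point farthest from the cloud center;
        each next seed maximizes the distance to the nearest chosen seed."""
        center = points.mean(axis=0)
        idx = [int(np.argmax(np.linalg.norm(points - center, axis=1)))]
        dist = np.linalg.norm(points - points[idx[0]], axis=1)
        while len(idx) < min(k, len(points)):
            nxt = int(np.argmax(dist))      # farthest from the chosen set
            idx.append(nxt)
            dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
        return points[idx]

    seeds = farthest_point_sample(np.random.rand(4096, 3).astype(np.float32), 1024)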
Optionally, the training of the deep neural network model includes:
inputting the coordinate information of sample seed points into the deep neural network model to obtain the predicted displacement deviation and predicted rotation deviation output by the deep neural network model;
determining the predicted center point coordinates of each sample seed point from its coordinate information and the predicted displacement deviation, and determining the predicted in-situ point coordinates of each sample seed point from its coordinate information and the predicted rotation deviation;
computing the loss L by the following loss function and updating the parameters of the deep neural network model according to the loss L:
    L = λ_1 · L_center + λ_2 · L_initial,

where n is the total number of sample seed points, L_center is the displacement offset loss, L_initial is the rotation offset loss, λ_1 is the preset weight of the displacement offset loss, λ_2 is the preset weight of the rotation offset loss,

    L_center = (1/n) · Σ_{i=1}^{n} ‖Δq_i − Δq_i*‖,
    L_initial = (1/n) · Σ_{i=1}^{n} ‖Δr_i − Δr_i*‖,

Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point, Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point, Δq_i* is the real offset of the coordinates of the i-th sample seed point from the center point coordinates, and Δr_i* is the real offset of the coordinates of the i-th sample seed point from the in-situ point coordinates.
It is worth noting that the real offset Δr_i* of the coordinates of the i-th sample seed point from the in-situ point coordinates can be calculated from the true values of the rotational degrees of freedom and the predicted rotation deviation of that seed point, and the real offset Δq_i* of the coordinates of the i-th sample seed point from the center point coordinates can be calculated from the true values of the displacement degrees of freedom and the predicted displacement deviation of that seed point.
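A NumPy sketch of the loss; since the exact norm inside L_center and L_initial is not recoverable from the publication, a mean Euclidean distance over the n sample seed points is assumed:

    import numpy as np

    def pose_loss(dq, dr, dq_gt, dr_gt, lam1: float = 1.0, lam2: float = 1.0):
        """L = lam1 * L_center + lam2 * L_initial, with (n, 3) arrays of
        predicted deviations (dq, dr) and real offsets (dq_gt, dr_gt)."""
        l_center = np.linalg.norm(dq - dq_gt, axis=1).mean()    # displacement term
        l_initial = np.linalg.norm(dr - dr_gt, axis=1).mean()   # rotation term
        return lam1 * l_center + lam2 * l_initial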
Optionally, Fig. 5 is a flowchart of a method for determining the 3D point cloud of a target object according to an exemplary embodiment. Before the determining of the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object, the method includes:
S51. Collect an RGB image of the target object through an image acquisition device.
S52. Determine a depth image of the target object from the RGB image, and perform instance segmentation on the RGB image to obtain a corresponding category-level mask area.
S53. Determine the 3D point cloud corresponding to the target object from the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
In a specific implementation, when it is determined that a target object is present within the acquisition range of the image acquisition device, for example when a target object is placed within the acquisition range of an image acquisition device on a workbench, the image acquisition device collects an RGB image of its acquisition range, and instance segmentation is then performed on the RGB image, for example with the Mask R-CNN instance segmentation algorithm, to obtain the category-level mask area occupied by the target object.
Further, since the RGB image and the depth image are aligned, the depth image corresponding to the target object is determined from its RGB image, and the area occupied by the depth image of the target object, that is, its depth image area, can then be determined.
Further, the 3D point cloud corresponding to the target object is determined from the category-level mask area and the depth image area, combined with the noise suppression parameter and the texture restoration parameter of the image acquisition device. In this way, the background noise of the image can be reasonably removed, improving the accuracy of the 3D point cloud determination.
With the above technical solution, by determining the category-level mask area and the depth image area from the RGB image of the target object and removing background noise, the accuracy of the 3D point cloud determination can be improved, which in turn improves the accuracy of the pose determination of the target object.
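An illustrative NumPy sketch of S53 under the usual pinhole-camera assumption, with the instance mask and the intrinsics (fx, fy, cx, cy) of the image acquisition device given as inputs; the function name is hypothetical:

    import numpy as np

    def masked_depth_to_cloud(depth, mask, fx, fy, cx, cy):
        """Back-project masked depth pixels into a 3D point cloud."""
        v, u = np.nonzero(mask)           # pixel coordinates inside the mask
        z = depth[v, u]
        valid = z > 0                     # drop missing depth readings
        u, v, z = u[valid], v[valid], z[valid]
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=1)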
Optionally, Fig. 6 is a flowchart of another method for determining the 3D point cloud of a target object according to an exemplary embodiment. Before the determining of the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object, the method includes:
S61. Determine the category-level rectangular area corresponding to the target object from the acquired RGB image of the target object and a target detection algorithm.
S62. Perform near-far cropping on the category-level rectangular area to obtain a frustum-area point cloud of the target object.
S63. Perform semantic segmentation on the frustum-area point cloud based on a semantic segmentation model to determine the 3D point cloud corresponding to the target object.
In a specific implementation, when it is determined that a target object is present within the acquisition range of the image acquisition device, an RGB image of the target object is collected through the image acquisition device, and the category-level rectangular box area occupied by the target object can be identified with the single-stage 2D object detection algorithm YOLOv3.
Further, based on the near-far cropping function of the image acquisition device, near-far cropping is performed on the category-level rectangular area to obtain the viewing cone corresponding to the target object. Based on the frustum plane of the image acquisition device, a plane that does not pass through the apex of the viewing cone and intersects its generatrices, a frustum operation is performed on the viewing cone to obtain the frustum corresponding to the target object, and the area occupied by the frustum is determined.
Further, the frustum-area point cloud is determined from the area occupied by the frustum, and semantic segmentation is performed on the frustum-area point cloud based on a semantic segmentation model, for example the PointNet++ model, to remove the background noise of the image and determine the 3D point cloud corresponding to the target object.
With the above technical solution, by determining the category-level rectangular box area and the area occupied by the frustum from the RGB image of the target object, then determining the frustum-area point cloud and removing the background noise of the image, the accuracy of the 3D point cloud determination can be improved, which in turn improves the accuracy of the pose determination of the target object.
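A simple stand-in for the near-far frustum cropping of S62, assuming a pinhole camera and a 2D detection box: points are kept when their projection falls inside the box and their depth lies between the near and far planes. All names here are illustrative:

    import numpy as np

    def frustum_crop(points, box, depth_range, fx, fy, cx, cy):
        """Keep points projecting inside the 2D box within [near, far]."""
        near, far = depth_range
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        in_depth = (z >= near) & (z <= far)        # near/far cropping
        with np.errstate(divide="ignore", invalid="ignore"):
            u = fx * x / z + cx                    # project onto the image plane
            v = fy * y / z + cy
        u0, v0, u1, v1 = box                       # detection box, e.g. from YOLOv3
        in_box = (u >= u0) & (u <= u1) & (v >= v0) & (v <= v1)
        return points[in_depth & in_box]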
Based on the same inventive concept, the present disclosure further provides an apparatus 700 for determining the pose of a target object, configured to perform the steps of the method for determining the pose of a target object provided by the above method embodiments; the apparatus 700 may implement the method in software, in hardware, or in a combination of the two. Fig. 7 is a block diagram of an apparatus for determining the pose of a target object according to an exemplary embodiment. As shown in Fig. 7, the apparatus 700 includes a determination module 710, a generation module 720, an input module 730 and an execution module 740.
The determination module 710 is configured to determine the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object.
The generation module 720 is configured to generate seed points from the target coordinates of each input point based on a down-sampling method.
The input module 730 is configured to input the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point.
The execution module 740 is configured to determine the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
The above apparatus regresses the coordinates of each seed point to its coordinates in the standard pose through the deep neural network model, which not only shortens the time required for pose determination but also improves the accuracy of the determined pose, thereby improving the accuracy of robot grasping and augmented reality.
Optionally, the deep neural network model generates the center point coordinates and the in-situ point coordinates of each seed point as follows:
determining the displacement deviation and rotation deviation corresponding to each seed point from the original coordinates of that seed point;
determining the center point coordinates of each seed point from its original coordinates and the corresponding displacement deviation, and determining the in-situ point coordinates of each seed point from its original coordinates and the corresponding rotation deviation.
Optionally, the execution module is configured to:
construct a first first-order matrix from the original coordinates of the seed points, and construct a second first-order matrix from the in-situ point coordinates of the seed points;
based on the least squares method, determine the rotation matrix corresponding to the target object from the first first-order matrix and the second first-order matrix;
obtain the 3 rotational degrees of freedom of the target object from the rotation matrix;
calculate average coordinates from the center point coordinates of the seed points, and take the three coordinate values of the average coordinates as the 3 displacement degrees of freedom of the target object;
determine the 6D pose of the target object from the 3 rotational degrees of freedom and the 3 displacement degrees of freedom.
Optionally, the determination module is configured to:
determine the original coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
determine the barycentric coordinates of the 3D point cloud from the original coordinates of each input point;
for each input point, determine the offset between the original coordinates of that input point and the barycentric coordinates in each dimension;
determine the target coordinates of that input point from the offsets in the respective dimensions.
Optionally, the generation module is configured to:
determine the center point of the 3D point cloud from the original coordinates of each input point in the target object;
select the input point with the largest Euclidean distance from the center point of the 3D point cloud as the first seed point;
taking the first seed point as the starting point, take the input point with the largest Euclidean distance from the already determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
Optionally, the training of the deep neural network model includes:
inputting the coordinate information of sample seed points into the deep neural network model to obtain the predicted displacement deviation and predicted rotation deviation output by the deep neural network model;
determining the predicted center point coordinates of each sample seed point from its coordinate information and the predicted displacement deviation, and determining the predicted in-situ point coordinates of each sample seed point from its coordinate information and the predicted rotation deviation;
computing the loss L by the following loss function and updating the parameters of the deep neural network model according to the loss L:

    L = λ_1 · L_center + λ_2 · L_initial,

where n is the total number of sample seed points, L_center is the displacement offset loss, L_initial is the rotation offset loss, λ_1 is the preset weight of the displacement offset loss, λ_2 is the preset weight of the rotation offset loss,

    L_center = (1/n) · Σ_{i=1}^{n} ‖Δq_i − Δq_i*‖,
    L_initial = (1/n) · Σ_{i=1}^{n} ‖Δr_i − Δr_i*‖,

Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point, Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point, Δq_i* is the real offset of the coordinates of the i-th sample seed point from the center point coordinates, and Δr_i* is the real offset of the coordinates of the i-th sample seed point from the in-situ point coordinates.
Optionally, the apparatus further includes an acquisition module configured to:
collect an RGB image of the target object through an image acquisition device;
determine a depth image of the target object from the RGB image, and perform instance segmentation on the RGB image to obtain a corresponding category-level mask area;
determine the 3D point cloud corresponding to the target object from the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
Optionally, the apparatus further includes an obtaining module configured to:
determine the category-level rectangular area corresponding to the target object from the acquired RGB image of the target object and a target detection algorithm;
perform near-far cropping on the category-level rectangular area to obtain a frustum-area point cloud of the target object;
perform semantic segmentation on the frustum-area point cloud based on a semantic segmentation model to determine the 3D point cloud corresponding to the target object.
Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
It is further worth noting that, for convenience and brevity of description, the embodiments described in the specification are all preferred embodiments, and the parts involved are not necessarily essential to the present disclosure; for example, the input module 730 and the execution module 740 may, in a specific implementation, be mutually independent devices or the same device, which is not limited by the present disclosure.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the steps of any of the methods described above are implemented.
An embodiment of the present disclosure further provides an electronic device, including:
a memory on which a computer program is stored; and
a processor configured to execute the computer program in the memory to implement the steps of any of the methods described above.
Fig. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment. As shown in Fig. 8, the electronic device 800 may include a processor 701 and a memory 702, and may further include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 is configured to control the overall operation of the electronic device 800 to complete all or part of the steps of the above method for determining the pose of a target object. The memory 702 is configured to store various types of data to support operation on the electronic device 800; such data may include, for example, instructions for any application or method operating on the electronic device 800, as well as application-related data such as contact data, sent and received messages, pictures, audio, video, and so on. The memory 702 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 703 may include a screen and an audio component, where the screen may be, for example, a touch screen, and the audio component is configured to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; a received audio signal may be further stored in the memory 702 or sent through the communication component 705. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules such as a keyboard, a mouse, or buttons, where the buttons may be virtual or physical. The communication component 705 is configured for wired or wireless communication between the electronic device 800 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or the like, or a combination of one or more of them, which is not limited here; accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method for determining the pose of a target object.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided; when the program instructions are executed by a processor, the steps of the above method for determining the pose of a target object are implemented. For example, the computer-readable storage medium may be the above memory 702 including program instructions, and the program instructions may be executed by the processor 701 of the electronic device 800 to complete the above method for determining the pose of a target object.
To implement the above embodiments, the present disclosure further proposes a computing processing device, including:
a memory in which computer-readable code is stored; and
one or more processors, wherein, when the computer-readable code is executed by the one or more processors, the computing processing device performs the aforementioned method for determining the pose of a target object.
To implement the above embodiments, the present disclosure further proposes a computer program including computer-readable code which, when run on a computing processing device, causes the computing processing device to perform the aforementioned method for determining the pose of a target object.
The present disclosure further proposes a computer-readable storage medium in which the aforementioned computer program is stored.
Fig. 9 is a schematic structural diagram of a computing processing device according to an embodiment of the present disclosure. The computing processing device typically includes a processor 1110 and a computer program product or computer-readable medium in the form of a memory 1130. The memory 1130 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 1130 has a storage space 1150 for program code 1151 for performing any of the method steps of the above methods. For example, the storage space 1150 for program code may include individual program codes 1151 for implementing the various steps of the above methods. These program codes may be read from, or written to, one or more computer program products, which include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit as shown in Fig. 10. The storage unit may have storage segments, storage spaces, and so on arranged similarly to the memory 1130 in the computing processing device of Fig. 9. The program code may, for example, be compressed in a suitable form. Typically, the storage unit includes computer-readable code 1151', that is, code readable by a processor such as the processor 1110, which, when run by a server, causes the server to perform the various steps of the methods described above.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments; within the scope of the technical concept of the present disclosure, various simple modifications may be made to the technical solutions of the present disclosure, and these simple modifications all fall within the protection scope of the present disclosure.
It should further be noted that the specific technical features described in the above specific embodiments may be combined in any suitable manner provided there is no contradiction; to avoid unnecessary repetition, the present disclosure does not separately describe the various possible combinations.
In addition, the various embodiments of the present disclosure may also be combined arbitrarily, and as long as such combinations do not depart from the spirit of the present disclosure, they should likewise be regarded as content disclosed by the present disclosure.
Embodiments
1. A method for determining the pose of a target object, including:
determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
generating seed points from the target coordinates of each input point based on a down-sampling method;
inputting the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point;
determining the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
2. The method of Embodiment 1, wherein the deep neural network model generates the center point coordinates and the in-situ point coordinates of each seed point as follows:
determining the displacement deviation and rotation deviation corresponding to each seed point from the original coordinates of that seed point;
determining the center point coordinates of each seed point from its original coordinates and the corresponding displacement deviation, and determining the in-situ point coordinates of each seed point from its original coordinates and the corresponding rotation deviation.
3. The method of Embodiment 1, wherein the determining of the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point includes:
constructing a first first-order matrix from the original coordinates of the seed points, and constructing a second first-order matrix from the in-situ point coordinates of the seed points;
based on the least squares method, determining the rotation matrix corresponding to the target object from the first first-order matrix and the second first-order matrix;
obtaining the 3 rotational degrees of freedom of the target object from the rotation matrix;
calculating average coordinates from the center point coordinates of the seed points, and taking the three coordinate values of the average coordinates as the 3 displacement degrees of freedom of the target object;
determining the 6D pose of the target object from the 3 rotational degrees of freedom and the 3 displacement degrees of freedom.
4. The method of Embodiment 1, wherein the determining of the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object includes:
determining the original coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
determining the barycentric coordinates of the 3D point cloud from the original coordinates of each input point;
for each input point, determining the offset between the original coordinates of that input point and the barycentric coordinates in each dimension;
determining the target coordinates of that input point from the offsets in the respective dimensions.
5. The method of Embodiment 1, wherein the down-sampling method is the farthest point sampling method, and the generating of seed points from the target coordinates of each input point based on the down-sampling method includes:
determining the center point of the 3D point cloud from the original coordinates of each input point in the target object;
selecting the input point with the largest Euclidean distance from the center point of the 3D point cloud as the first seed point;
taking the first seed point as the starting point, taking the input point with the largest Euclidean distance from the already determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
6. The method of Embodiment 1, wherein the training of the deep neural network model includes:
inputting the coordinate information of sample seed points into the deep neural network model to obtain the predicted displacement deviation and predicted rotation deviation output by the deep neural network model;
determining the predicted center point coordinates of each sample seed point from its coordinate information and the predicted displacement deviation, and determining the predicted in-situ point coordinates of each sample seed point from its coordinate information and the predicted rotation deviation;
computing the loss L by the following loss function and updating the parameters of the deep neural network model according to the loss L:

    L = λ_1 · L_center + λ_2 · L_initial,

where n is the total number of sample seed points, L_center is the displacement offset loss, L_initial is the rotation offset loss, λ_1 is the preset weight of the displacement offset loss, λ_2 is the preset weight of the rotation offset loss,

    L_center = (1/n) · Σ_{i=1}^{n} ‖Δq_i − Δq_i*‖,
    L_initial = (1/n) · Σ_{i=1}^{n} ‖Δr_i − Δr_i*‖,

Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point, Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point, Δq_i* is the real offset of the coordinates of the i-th sample seed point from the center point coordinates, and Δr_i* is the real offset of the coordinates of the i-th sample seed point from the in-situ point coordinates.
7. The method of any one of Embodiments 1 to 6, before the determining of the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object, including:
collecting an RGB image of the target object through an image acquisition device;
determining a depth image of the target object from the RGB image, and performing instance segmentation on the RGB image to obtain a corresponding category-level mask area;
determining the 3D point cloud corresponding to the target object from the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
8. The method of any one of Embodiments 1 to 6, before the determining of the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object, including:
determining the category-level rectangular area corresponding to the target object from the acquired RGB image of the target object and a target detection algorithm;
performing near-far cropping on the category-level rectangular area to obtain a frustum-area point cloud of the target object;
performing semantic segmentation on the frustum-area point cloud based on a semantic segmentation model to determine the 3D point cloud corresponding to the target object.
9. An apparatus for determining the pose of a target object, including:
a determination module configured to determine the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
a generation module configured to generate seed points from the target coordinates of each input point based on a down-sampling method;
an input module configured to input the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point;
an execution module configured to determine the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
10. The apparatus of Embodiment 9, wherein the deep neural network model generates the center point coordinates and the in-situ point coordinates of each seed point as follows:
determining the displacement deviation and rotation deviation corresponding to each seed point from the original coordinates of that seed point;
determining the center point coordinates of each seed point from its original coordinates and the corresponding displacement deviation, and determining the in-situ point coordinates of each seed point from its original coordinates and the corresponding rotation deviation.
11. The apparatus of Embodiment 9, wherein the execution module is configured to:
construct a first first-order matrix from the original coordinates of the seed points, and construct a second first-order matrix from the in-situ point coordinates of the seed points;
based on the least squares method, determine the rotation matrix corresponding to the target object from the first first-order matrix and the second first-order matrix;
obtain the 3 rotational degrees of freedom of the target object from the rotation matrix;
calculate average coordinates from the center point coordinates of the seed points, and take the three coordinate values of the average coordinates as the 3 displacement degrees of freedom of the target object;
determine the 6D pose of the target object from the 3 rotational degrees of freedom and the 3 displacement degrees of freedom.
12. The apparatus of Embodiment 9, wherein the determination module is configured to:
determine the original coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
determine the barycentric coordinates of the 3D point cloud from the original coordinates of each input point;
for each input point, determine the offset between the original coordinates of that input point and the barycentric coordinates in each dimension;
determine the target coordinates of that input point from the offsets in the respective dimensions.
13. The apparatus of Embodiment 9, wherein the generation module is configured to:
determine the center point of the 3D point cloud from the original coordinates of each input point in the target object;
select the input point with the largest Euclidean distance from the center point of the 3D point cloud as the first seed point;
taking the first seed point as the starting point, take the input point with the largest Euclidean distance from the already determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
14. The apparatus of Embodiment 9, wherein the training of the deep neural network model includes:
inputting the coordinate information of sample seed points into the deep neural network model to obtain the predicted displacement deviation and predicted rotation deviation output by the deep neural network model;
determining the predicted center point coordinates of each sample seed point from its coordinate information and the predicted displacement deviation, and determining the predicted in-situ point coordinates of each sample seed point from its coordinate information and the predicted rotation deviation;
computing the loss L by the following loss function and updating the parameters of the deep neural network model according to the loss L:

    L = λ_1 · L_center + λ_2 · L_initial,

where n is the total number of sample seed points, L_center is the displacement offset loss, L_initial is the rotation offset loss, λ_1 is the preset weight of the displacement offset loss, λ_2 is the preset weight of the rotation offset loss,

    L_center = (1/n) · Σ_{i=1}^{n} ‖Δq_i − Δq_i*‖,
    L_initial = (1/n) · Σ_{i=1}^{n} ‖Δr_i − Δr_i*‖,

Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point, Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point, Δq_i* is the real offset of the coordinates of the i-th sample seed point from the center point coordinates, and Δr_i* is the real offset of the coordinates of the i-th sample seed point from the in-situ point coordinates.
15. The apparatus of any one of Embodiments 9 to 14, further including an acquisition module configured to:
collect an RGB image of the target object through an image acquisition device;
determine a depth image of the target object from the RGB image, and perform instance segmentation on the RGB image to obtain a corresponding category-level mask area;
determine the 3D point cloud corresponding to the target object from the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
16. The apparatus of any one of Embodiments 9 to 14, further including an obtaining module configured to:
determine the category-level rectangular area corresponding to the target object from the acquired RGB image of the target object and a target detection algorithm;
perform near-far cropping on the category-level rectangular area to obtain a frustum-area point cloud of the target object;
perform semantic segmentation on the frustum-area point cloud based on a semantic segmentation model to determine the 3D point cloud corresponding to the target object.
17. A computer program, comprising computer-readable code which, when run on a computing processing device, causes the computing processing device to perform the method according to any one of Embodiments 1 to 8.
18. A computer-readable storage medium having stored thereon the computer program according to Embodiment 17, wherein the program, when executed by a processor, implements the steps of the method according to any one of Embodiments 1 to 8.
19. An electronic device, comprising:
a memory having stored thereon the computer program according to Embodiment 17; and
a processor configured to execute the computer program in the memory to implement the steps of the method according to any one of Embodiments 1 to 8.

Claims (19)

  1. A method for determining a pose of a target object, characterized in that the method comprises:
    determining target coordinates of each input point in the target object according to a 3D point cloud corresponding to the target object;
    generating seed points according to the target coordinates of each input point based on a downsampling method;
    inputting original coordinates of each seed point into a deep neural network model to obtain center point coordinates and origin point coordinates of each seed point;
    determining a 6D pose of the target object according to the center point coordinates and the origin point coordinates of each seed point.
  2. The method according to claim 1, characterized in that the deep neural network model generates the center point coordinates and the origin point coordinates of each seed point in the following manner:
    determining a displacement deviation and a rotation deviation corresponding to each seed point according to the original coordinates of the seed point;
    determining the center point coordinates of each seed point according to the original coordinates of the seed point and the corresponding displacement deviation, and determining the origin point coordinates of the seed point according to the original coordinates of the seed point and the corresponding rotation deviation.
  3. The method according to claim 1, characterized in that determining the 6D pose of the target object according to the center point coordinates and the origin point coordinates of each seed point comprises:
    constructing a first first-order matrix according to the center point coordinates of the seed points, and constructing a second first-order matrix according to the origin point coordinates of the seed points;
    determining a rotation matrix corresponding to the target object according to the first first-order matrix and the second first-order matrix based on a least squares method;
    obtaining three rotational degrees of freedom of the target object according to the rotation matrix;
    calculating average coordinates according to the center point coordinates of each seed point, and taking the three coordinate values of the average coordinates as the three displacement degrees of freedom of the target object;
    determining the 6D pose of the target object according to the three rotational degrees of freedom and the three displacement degrees of freedom.
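One common least-squares solution for the rotation between two stacked point matrices is the SVD-based Kabsch alignment sketched below; the claim does not name a specific solver, so this realization and its names are assumptions.

```python
import numpy as np

def rotation_from_point_sets(origins, centers):
    """Least-squares rotation R aligning the origin-point matrix to the
    center-point matrix (Kabsch algorithm on two (n, 3) point sets).
    """
    a = origins - origins.mean(axis=0)           # center both point sets
    b = centers - centers.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)            # SVD of the cross-covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # proper rotation matrix

def pose_6d(origins, centers):
    """Three rotational DOF from R, three displacement DOF from the mean center point."""
    rot = rotation_from_point_sets(origins, centers)
    trans = centers.mean(axis=0)                 # average of the center point coordinates
    return rot, trans
```

The three rotation angles can then be read off rot (for example as Euler angles), matching the claim's three rotational degrees of freedom.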
  4. The method according to claim 1, characterized in that determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object comprises:
    determining original coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
    determining center-of-gravity coordinates of the target object according to the original coordinates of each input point;
    for each input point, determining an offset value in each dimension between the original coordinates of the input point and the center-of-gravity coordinates;
    determining the target coordinates of the input point according to the offset values in each dimension.
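In other words, the target coordinates are the per-axis offsets from the cloud's center of gravity, as in this short sketch (names illustrative):

```python
import numpy as np

def to_target_coordinates(points):
    """Target coordinates = offsets of each input point's original coordinates
    from the object's center-of-gravity coordinates, per dimension.
    """
    centroid = points.mean(axis=0)   # center of gravity of all input points
    return points - centroid         # (N, 3) per-dimension offset values
```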
  5. The method according to claim 1, characterized in that the downsampling method is a farthest point sampling method, and generating seed points according to the target coordinates of each input point based on the downsampling method comprises:
    determining a center point of the 3D point cloud according to the original coordinates of each input point in the target object;
    selecting the input point with the largest Euclidean distance from the center point of the 3D point cloud as the first seed point;
    taking the first seed point as a reference, selecting the input point with the largest Euclidean distance from the already determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
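A minimal farthest point sampling sketch consistent with this claim follows: it seeds from the point farthest from the cloud's center, then greedily maximizes the distance to all chosen seeds. The threshold k and the names are illustrative.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Select k seed points by farthest point sampling.

    The first seed is the point with the largest Euclidean distance from the
    cloud's center; each further seed maximizes the distance to the seeds so far.
    """
    center = points.mean(axis=0)
    first = int(np.argmax(np.linalg.norm(points - center, axis=1)))
    chosen = [first]
    min_dist = np.linalg.norm(points - points[first], axis=1)  # distance to nearest seed
    while len(chosen) < k:                                     # k: preset threshold
        nxt = int(np.argmax(min_dist))                         # farthest from all seeds
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]
```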
  6. The method according to claim 1, characterized in that training of the deep neural network model comprises:
    inputting coordinate information of sample seed points into the deep neural network model to obtain a predicted displacement deviation and a predicted rotation deviation output by the deep neural network model;
    determining predicted center point coordinates of the sample seed points according to the coordinate information of the sample seed points and the predicted displacement deviation, and determining predicted origin point coordinates of the sample seed points according to the coordinate information of the sample seed points and the predicted rotation deviation;
    calculating a loss L through the following loss function, and updating parameters of the deep neural network model according to the loss L:
    L = λ_1·L_center + λ_2·L_initial;
    where n is the total number of sample seed points, L_center is the displacement-offset loss value, L_initial is the rotation-offset loss value, λ_1 is the preset weight of the displacement-offset loss value, λ_2 is the preset weight of the rotation-offset loss value, Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point, Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point, Δq_i* is the true offset of the coordinates of the i-th sample seed point from the center point coordinates, and Δr_i* is the true offset of the coordinates of the i-th sample seed point from the origin point coordinates. The per-term formulas appear in the application only as images (PCTCN2021122454-appb-100001 to -100003); a mean-norm form consistent with these definitions is
    L_center = (1/n)·Σ_{i=1..n} ‖Δq_i − Δq_i*‖ and L_initial = (1/n)·Σ_{i=1..n} ‖Δr_i − Δr_i*‖.
  7. The method according to any one of claims 1 to 6, characterized in that before determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object, the method comprises:
    collecting an RGB image of the target object through an image acquisition device;
    determining a depth image of the target object according to the RGB image, and performing instance segmentation on the RGB image to obtain a corresponding category-level mask region;
    determining the 3D point cloud corresponding to the target object according to the category-level mask region, intrinsic parameters of the image acquisition device, and the depth image.
  8. The method according to any one of claims 1 to 6, characterized in that before determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object, the method comprises:
    determining a category-level rectangular region corresponding to the target object according to the obtained RGB image of the target object and an object detection algorithm;
    cropping the category-level rectangular region with near and far planes to obtain a frustum region point cloud of the target object;
    performing semantic segmentation on the frustum region point cloud based on a semantic segmentation model to determine the 3D point cloud corresponding to the target object.
  9. An apparatus for determining a pose of a target object, characterized in that the apparatus comprises:
    a determination module, configured to determine target coordinates of each input point in the target object according to a 3D point cloud corresponding to the target object;
    a generation module, configured to generate seed points according to the target coordinates of each input point based on a downsampling method;
    an input module, configured to input original coordinates of each seed point into a deep neural network model to obtain center point coordinates and origin point coordinates of each seed point;
    an execution module, configured to determine a 6D pose of the target object according to the center point coordinates and the origin point coordinates of each seed point.
  10. The apparatus according to claim 9, characterized in that the deep neural network model generates the center point coordinates and the origin point coordinates of each seed point in the following manner:
    determining a displacement deviation and a rotation deviation corresponding to each seed point according to the original coordinates of the seed point;
    determining the center point coordinates of each seed point according to the original coordinates of the seed point and the corresponding displacement deviation, and determining the origin point coordinates of the seed point according to the original coordinates of the seed point and the corresponding rotation deviation.
  11. The apparatus according to claim 9, characterized in that the execution module is configured to:
    construct a first first-order matrix according to the center point coordinates of the seed points, and construct a second first-order matrix according to the origin point coordinates of the seed points;
    determine a rotation matrix corresponding to the target object according to the first first-order matrix and the second first-order matrix based on a least squares method;
    obtain three rotational degrees of freedom of the target object according to the rotation matrix;
    calculate average coordinates according to the center point coordinates of each seed point, and take the three coordinate values of the average coordinates as the three displacement degrees of freedom of the target object;
    determine the 6D pose of the target object according to the three rotational degrees of freedom and the three displacement degrees of freedom.
  12. The apparatus according to claim 9, characterized in that the determination module is configured to:
    determine original coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
    determine center-of-gravity coordinates of the target object according to the original coordinates of each input point;
    for each input point, determine an offset value in each dimension between the original coordinates of the input point and the center-of-gravity coordinates;
    determine the target coordinates of the input point according to the offset values in each dimension.
  13. The apparatus according to claim 9, characterized in that the generation module is configured to:
    determine a center point of the 3D point cloud according to the original coordinates of each input point in the target object;
    select the input point with the largest Euclidean distance from the center point of the 3D point cloud as the first seed point;
    taking the first seed point as a reference, select the input point with the largest Euclidean distance from the already determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
  14. The apparatus according to claim 9, characterized in that training of the deep neural network model comprises:
    inputting coordinate information of sample seed points into the deep neural network model to obtain a predicted displacement deviation and a predicted rotation deviation output by the deep neural network model;
    determining predicted center point coordinates of the sample seed points according to the coordinate information of the sample seed points and the predicted displacement deviation, and determining predicted origin point coordinates of the sample seed points according to the coordinate information of the sample seed points and the predicted rotation deviation;
    calculating a loss L through the following loss function, and updating parameters of the deep neural network model according to the loss L:
    L = λ_1·L_center + λ_2·L_initial;
    where n is the total number of sample seed points, L_center is the displacement-offset loss value, L_initial is the rotation-offset loss value, λ_1 is the preset weight of the displacement-offset loss value, λ_2 is the preset weight of the rotation-offset loss value, Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point, Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point, Δq_i* is the true offset of the coordinates of the i-th sample seed point from the center point coordinates, and Δr_i* is the true offset of the coordinates of the i-th sample seed point from the origin point coordinates. The per-term formulas appear in the application only as images (PCTCN2021122454-appb-100004 to -100006); a mean-norm form consistent with these definitions is
    L_center = (1/n)·Σ_{i=1..n} ‖Δq_i − Δq_i*‖ and L_initial = (1/n)·Σ_{i=1..n} ‖Δr_i − Δr_i*‖.
  15. The apparatus according to any one of claims 9 to 14, characterized in that the apparatus further comprises a collection module configured to:
    collect an RGB image of the target object through an image acquisition device;
    determine a depth image of the target object according to the RGB image, and perform instance segmentation on the RGB image to obtain a corresponding category-level mask region;
    determine the 3D point cloud corresponding to the target object according to the category-level mask region, intrinsic parameters of the image acquisition device, and the depth image.
  16. The apparatus according to any one of claims 9 to 14, characterized in that the apparatus further comprises an acquisition module configured to:
    determine a category-level rectangular region corresponding to the target object according to the obtained RGB image of the target object and an object detection algorithm;
    crop the category-level rectangular region with near and far planes to obtain a frustum region point cloud of the target object;
    perform semantic segmentation on the frustum region point cloud based on a semantic segmentation model to determine the 3D point cloud corresponding to the target object.
  17. A computer program, characterized by comprising computer-readable code which, when run on a computing processing device, causes the computing processing device to perform the method according to any one of claims 1 to 8.
  18. A computer-readable storage medium having stored thereon the computer program according to claim 17, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
  19. An electronic device, characterized by comprising:
    a memory having stored thereon the computer program according to claim 17; and
    a processor configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1 to 8.
PCT/CN2021/122454 2020-12-02 2021-09-30 Method and apparatus for determining pose of target object, storage medium and electronic device WO2022116678A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011401984.6 2020-12-02
CN202011401984.6A CN112435297B (en) 2020-12-02 2020-12-02 Target object pose determining method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
WO2022116678A1 true WO2022116678A1 (en) 2022-06-09

Family

ID=74691510

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/122454 WO2022116678A1 (en) 2020-12-02 2021-09-30 Method and apparatus for determining pose of target object, storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN112435297B (en)
WO (1) WO2022116678A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112435297B (en) * 2020-12-02 2023-04-18 达闼机器人股份有限公司 Target object pose determining method and device, storage medium and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6736362B2 (en) * 2016-06-03 2020-08-05 キヤノン株式会社 Image processing device, image processing method, and program
JP6955783B2 (en) * 2018-01-10 2021-10-27 達闥機器人有限公司Cloudminds (Shanghai) Robotics Co., Ltd. Information processing methods, equipment, cloud processing devices and computer program products
CN110660101B (en) * 2019-08-19 2022-06-07 浙江理工大学 Object 6D posture prediction method based on RGB image and coordinate system transformation
CN111243017B (en) * 2019-12-24 2024-05-10 广州中国科学院先进技术研究所 Intelligent robot grabbing method based on 3D vision
CN111259934B (en) * 2020-01-09 2023-04-07 清华大学深圳国际研究生院 Stacked object 6D pose estimation method and device based on deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190278983A1 (en) * 2018-03-12 2019-09-12 Nvidia Corporation Three-dimensional (3d) pose estimation from a monocular camera
CN109829947A (en) * 2019-02-25 2019-05-31 Beijing Kuangshi Technology Co., Ltd. Pose determination method, tray loading method, apparatus, medium and electronic device
CN110322510A (en) * 2019-06-27 2019-10-11 6D pose estimation method using profile information
CN111145253A (en) * 2019-12-12 2020-05-12 深圳先进技术研究院 Efficient object 6D attitude estimation algorithm
CN111899301A (en) * 2020-06-02 2020-11-06 广州中国科学院先进技术研究所 Workpiece 6D pose estimation method based on deep learning
CN112435297A (en) * 2020-12-02 2021-03-02 达闼机器人有限公司 Target object pose determining method and device, storage medium and electronic equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245944A (en) * 2022-12-29 2023-06-09 南京航空航天大学 Cabin automatic docking method and system based on measured data
CN116245944B (en) * 2022-12-29 2024-01-05 南京航空航天大学 Cabin automatic docking method and system based on measured data
CN116245950A (en) * 2023-05-11 2023-06-09 合肥高维数据技术有限公司 Screen corner positioning method for full screen or single corner deletion
CN116245950B (en) * 2023-05-11 2023-08-01 合肥高维数据技术有限公司 Screen corner positioning method for full screen or single corner deletion

Also Published As

Publication number Publication date
CN112435297B (en) 2023-04-18
CN112435297A (en) 2021-03-02

Similar Documents

Publication Publication Date Title
WO2022116678A1 (en) Method and apparatus for determining pose of target object, storage medium and electronic device
CN108509848B (en) The real-time detection method and system of three-dimension object
JP7373554B2 (en) Cross-domain image transformation
Zhang et al. Guided mesh normal filtering
WO2022116677A1 (en) Target object grasping method and apparatus, storage medium, and electronic device
US20210012093A1 (en) Method and apparatus for generating face rotation image
CN110675487B (en) Three-dimensional face modeling and recognition method and device based on multi-angle two-dimensional face
JP5705147B2 (en) Representing 3D objects or objects using descriptors
Tejani et al. Latent-class hough forests for 6 DoF object pose estimation
CN110348454B (en) Matching local image feature descriptors
WO2016054779A1 (en) Spatial pyramid pooling networks for image processing
US10311099B2 (en) Method and system for 3D model database retrieval
US9311756B2 (en) Image group processing and visualization
JP2018523881A (en) Method and system for aligning data
CN108875133A (en) Determine architectural composition
JP7453470B2 (en) 3D reconstruction and related interactions, measurement methods and related devices and equipment
CN112150551A (en) Object pose acquisition method and device and electronic equipment
TW201616451A (en) System and method for selecting point clouds using a free selection tool
CN112085033A (en) Template matching method and device, electronic equipment and storage medium
CN110956131B (en) Single-target tracking method, device and system
CN116309880A (en) Object pose determining method, device, equipment and medium based on three-dimensional reconstruction
CN111382791B (en) Deep learning task processing method, image recognition task processing method and device
JP2014102746A (en) Subject recognition device and subject recognition program
CN113793370B (en) Three-dimensional point cloud registration method and device, electronic equipment and readable medium
CN117372604A (en) 3D face model generation method, device, equipment and readable storage medium

Legal Events

121 — EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21899712; Country of ref document: EP; Kind code of ref document: A1)
NENP — Non-entry into the national phase (Ref country code: DE)
122 — EP: PCT application non-entry in European phase (Ref document number: 21899712; Country of ref document: EP; Kind code of ref document: A1)
32PN — EP: public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.11.2023))