WO2022116678A1 - Method and apparatus for determining pose of target object, storage medium and electronic device - Google Patents


Info

Publication number
WO2022116678A1
Authority
WO
WIPO (PCT)
Prior art keywords
point
coordinates
target object
seed
input
Prior art date
Application number
PCT/CN2021/122454
Other languages
French (fr)
Chinese (zh)
Inventor
杜国光
赵开勇
Original Assignee
达闼机器人股份有限公司
Priority date
Filing date
Publication date
Application filed by 达闼机器人股份有限公司 filed Critical 达闼机器人股份有限公司
Publication of WO2022116678A1 publication Critical patent/WO2022116678A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/66 Analysis of geometric attributes of image moments or centre of gravity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the present disclosure relates to the technical field of robotics and computer vision, and in particular, to a method, device, storage medium and electronic device for determining the pose of a target object.
  • 6D object pose estimation (6D Object Pose Estimation) refers to the translation and rotation transformation between the camera coordinate system at the moment the current image is captured and the world coordinate system in which the original object is located, comprising 3 displacement degrees of freedom and 3 rotational degrees of freedom. Based on an object's 6D pose, the object can be located accurately, which is of great significance in robot grasping and augmented reality applications.
  • In the related art, the 6D pose of a specific object in the camera frame is calculated from the object's 3D data, based on either a 3D point cloud or an RGB-D image.
  • For example, based on the 3D point cloud, a random sampling method can be used: corresponding point pairs are found at random to obtain a spatial 6D transformation, the error after each transformation is calculated, and the point pairs with the smallest error after transformation are taken as the object's 6D pose.
  • As another example, based on the 3D point cloud, a feature-point method can be used: multiple feature-point pairs are found and matched to obtain an initial object 6D pose, and on this basis an exact matching algorithm such as ICP (Iterative Closest Points) yields the final object 6D pose.
  • Based on RGB-D images, template image information can be used: the template image most similar to the current image is found, and its 6D pose is taken as the current object's 6D pose.
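To make the ICP refinement mentioned above concrete, here is a minimal NumPy sketch of the single alignment step that ICP repeats until convergence. It illustrates the general algorithm only and is not code from this disclosure; the Kabsch/SVD solution of the rigid transform is a standard choice, not one the text specifies.

```python
import numpy as np

def icp_step(src, dst):
    """One ICP iteration: match each source point to its nearest
    destination point, then solve the best rigid transform (Kabsch)."""
    # Brute-force nearest-neighbor correspondences (for clarity only).
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    matched = dst[d2.argmin(axis=1)]

    # Center both point sets.
    mu_s, mu_d = src.mean(axis=0), matched.mean(axis=0)
    S, D = src - mu_s, matched - mu_d

    # SVD of the cross-covariance gives the optimal rotation.
    U, _, Vt = np.linalg.svd(S.T @ D)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s        # dst is approximately R @ src + t
    return R, t
```

Repeating this step and applying (R, t) to src each time drives the two clouds together when they start roughly aligned, which is why the feature-point method supplies the initial 6D pose first.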
  • The purpose of the present disclosure is to provide a method, apparatus, storage medium and electronic device for determining the pose of a target object, so as to solve the problem of low accuracy in determining the pose of a target object in the related art.
  • a first aspect of the embodiments of the present disclosure provides a method for determining the pose of a target object, the method comprising:
  • the target coordinates of each input point in the target object are determined according to the 3D point cloud corresponding to the target object;
  • seed points are generated according to the target coordinates of each input point based on a downsampling method;
  • the original coordinates of each seed point are input into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point;
  • the 6D pose of the target object is determined according to the center point coordinates and the in-situ point coordinates of each seed point.
  • an apparatus for determining the pose of a target object comprising:
  • a determination module configured to determine the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
  • a generation module, configured to generate seed points according to the target coordinates of each input point based on a downsampling method;
  • an input module, configured to input the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point;
  • an execution module, configured to determine the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
  • a third aspect of the embodiments of the present disclosure provides a computer program, including computer-readable codes, which, when the computer-readable codes are executed on a computing and processing device, cause the computing and processing device to execute the embodiments of the first aspect of the present disclosure the methods proposed.
  • A third aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which the computer program provided by the embodiments of the third aspect is stored; when the program is executed by a processor, the steps of the method described in the first aspect are performed.
  • a fourth aspect of the embodiments of the present disclosure provides an electronic device, including:
  • a processor for executing the computer program in the memory to implement the steps of the method in the first aspect.
  • Fig. 1 is a flow chart of a method for determining the pose of a target object according to an exemplary embodiment.
  • Fig. 2 is a flow chart showing step S14 in Fig. 1 according to an exemplary embodiment.
  • Fig. 3 is a flow chart showing step S11 in Fig. 1 according to an exemplary embodiment.
  • Fig. 4 is a flow chart showing step S12 in Fig. 1 according to an exemplary embodiment.
  • Fig. 5 is a flowchart showing a method for determining a 3D point cloud of a target object according to an exemplary embodiment.
  • Fig. 6 is a flowchart showing another method for determining a 3D point cloud of a target object according to an exemplary embodiment.
  • Fig. 7 is a block diagram of an apparatus for determining the pose of a target object according to an exemplary embodiment.
  • FIG. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment.
  • FIG. 9 provides a schematic structural diagram of a computing processing device according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of a portable or fixed storage unit for program code implementing the method according to the present disclosure, provided by an embodiment of the present disclosure.
  • the method for determining the pose of a target object provided by the present disclosure can be applied to an electronic device, and the electronic device can be, for example, a smart phone, a PC (Personal Computer), and the like.
  • The inventors found that determining the pose of a target object based on a random sampling method requires repeatedly searching for corresponding point pairs and calculating transformation errors; the pose determination process is cumbersome and time-consuming, wasting human and material resources and increasing time costs.
  • Determining grasping feature points based on geometric structure requires the target object's geometry to be highly salient. If the geometry is less salient, the accuracy of the determined grasping feature points is reduced, which may make the grasping robot's grasp-position determination and augmented reality less accurate, thereby increasing the risk of damage to the target object.
  • Fig. 1 is a flowchart of a method for determining the pose of a target object according to an exemplary embodiment. As shown in Fig. 1 , the method includes the following steps.
  • S11. The target coordinates of each input point in the target object are determined according to the 3D point cloud corresponding to the target object.
  • S12. Based on a downsampling method, seed points are generated according to the target coordinates of each input point.
  • S13. The original coordinates of each seed point are input into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point.
  • S14. The 6D pose of the target object is determined according to the center point coordinates and the in-situ point coordinates of each seed point.
  • With the above technical solution, the target coordinates of each input point in the target object are determined according to the 3D point cloud corresponding to the target object; based on a downsampling method, seed points are generated according to the target coordinates of each input point; the original coordinates of each seed point are input into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point; and the 6D pose of the target object is determined according to these coordinates.
  • In this way, the deep neural network model regresses the coordinates of each seed point back to their coordinates under the standard pose, which not only shortens the time needed to determine the pose but also improves the accuracy of determining the target object's pose, thereby improving the accuracy of robot grasping and augmented reality.
  • Moreover, not only can the 6D pose of an instance-level target object be estimated, but also that of a category-level target object, reducing the dependence on complete model consistency.
  • Optionally, the deep neural network model generates the center point coordinates and the in-situ point coordinates of each seed point as follows: the displacement deviation and the rotation deviation corresponding to each seed point are determined according to the seed point's original coordinates;
  • the center point coordinates of each seed point are then determined according to its original coordinates and the corresponding displacement deviation, and its in-situ point coordinates are determined according to its original coordinates and the corresponding rotation deviation.
  • Optionally, the three-dimensional convolutional neural network model PointNet++ can be used to obtain high-dimensional features of each seed point's original coordinates, from which the seed point's displacement deviation and rotation deviation relative to the predicted target point are obtained by regression.
  • For example, if the original coordinates of the i-th seed point are p_i, Δq_i is the displacement deviation corresponding to the i-th seed point, and Δr_i is the corresponding rotation deviation, then the seed point's center point coordinates are q_i = p_i + Δq_i and its in-situ point coordinates are r_i = p_i + Δr_i, for i = 1, …, n, where n equals the number of determined seed points.
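As a small illustration of this regression step, the sketch below applies per-seed offset arrays to the seed coordinates; the array names (dq, dr) are placeholders for whatever the backbone actually outputs, not identifiers from this disclosure.

```python
import numpy as np

def apply_offsets(seeds, dq, dr):
    """Per-seed predictions (all arrays n x 3):
    center point  q_i = p_i + dq_i  (a vote for the object center),
    in-situ point r_i = p_i + dr_i  (the seed mapped to the standard pose)."""
    q = seeds + dq
    r = seeds + dr
    return q, r
```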
  • Optionally, in step S14, determining the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point includes:
  • S141. A first first-order matrix is constructed according to the original coordinates of the seed points, and a second first-order matrix is constructed according to the in-situ point coordinates of the seed points.
  • S142. Based on the least squares method, the rotation matrix corresponding to the target object is determined according to the first and second first-order matrices.
  • S143. The 3 rotational degrees of freedom of the target object are obtained according to the rotation matrix.
  • S144. The average coordinates are calculated according to the center point coordinates of each seed point, and the three coordinate values of the average coordinates are used as the 3 displacement degrees of freedom of the target object.
  • S145. The 6D pose of the target object is determined according to the 3 rotational degrees of freedom and the 3 displacement degrees of freedom.
  • In a specific implementation, the predicted center point coordinates of the target object are aggregated from the center point coordinates of each seed point, for example by calculating the sum of the center point coordinates of all seed points and then taking their average. The three coordinate values of the average coordinates are used as the 3 displacement degrees of freedom of the target object; for example, if the obtained predicted center point coordinates are (1, 2, 3), then the 3 displacement degrees of freedom of the target object are (1, 2, 3).
  • The first first-order matrix is constructed from the original coordinates of the seed points, e.g. A = [p_1, p_2, …, p_n], and the second first-order matrix is constructed from the in-situ point coordinates of the seed points, e.g. B = [r_1, r_2, …, r_n]. If the rotation matrix is X, then A = XB, and X is calculated using the least squares method.
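The following is a minimal sketch of steps S141 to S145 under a common reading of this least-squares step, namely the orthogonal Procrustes problem solved by SVD; the patent's exact matrix formulas are rendered as images and not reproduced here, so this concrete solution method is an assumption.

```python
import numpy as np

def rotation_from_seeds(P, B):
    """Least-squares rotation X with P approximately X @ B, where
    P (3 x n) stacks the seeds' original coordinates and B (3 x n)
    their predicted in-situ point coordinates (Procrustes via SVD)."""
    U, _, Vt = np.linalg.svd(P @ B.T)
    X = U @ Vt
    if np.linalg.det(X) < 0:   # enforce a proper rotation (det = +1)
        U[:, -1] *= -1
        X = U @ Vt
    return X

def translation_from_centers(q):
    """3 displacement DoF: the mean of the per-seed center votes q (n x 3)."""
    return q.mean(axis=0)
```

The 3 rotational degrees of freedom can then be read off X, and together with the 3 displacement degrees of freedom they form the 6D pose.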
  • Optionally, determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object includes:
  • S112. Determine the barycentric coordinates of the 3D point cloud according to the original coordinates of each input point.
  • The target coordinates of each input point are then obtained by subtracting the barycentric coordinates from the input point's original coordinates in each dimension. For example, if the barycentric coordinates of the 3D point cloud are (2, 3, 4) and the original coordinates of a given input point of the target object are (8, 7, 9), the target coordinates of that input point are determined to be (6, 4, 5).
  • In this way, the original coordinates of the 3D point cloud can be normalized by the barycentric coordinates of the 3D point cloud, reducing the influence of the target object's original coordinates on the model calculation.
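A one-function sketch of this normalization, assuming the point cloud is an n x 3 NumPy array:

```python
import numpy as np

def normalize_to_barycenter(points):
    """Target coordinates: subtract the 3D point cloud's barycenter from
    every input point, e.g. (8, 7, 9) - (2, 3, 4) = (6, 4, 5)."""
    barycenter = points.mean(axis=0)
    return points - barycenter, barycenter
```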
  • Optionally, the downsampling method is the farthest point sampling method. Generating seed points according to the target coordinates of each input point based on the downsampling method includes:
  • selecting, from the input points, the one with the farthest Euclidean distance from the already-determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
  • It should be noted that downsampling methods include random sampling, farthest point sampling (FPS), and depth-model-based sampling.
  • Taking the farthest point sampling method as an example: the input point with the farthest Euclidean distance from the center point of the 3D point cloud is selected as the first seed point; then the input point with the farthest Euclidean distance from the first seed point is selected as the second seed point; then the input point with the farthest Euclidean distance from the first and second seed points is selected as the third seed point; then the input point with the farthest Euclidean distance from the first, second, and third seed points is selected as the fourth seed point; and so on, until the number of seed points reaches 1024, i.e., the preset threshold is 1024.
  • In this way, the seed points of the target object can be determined by the downsampling method, reducing the influence of varying densities of the target object's original coordinates on the model calculation.
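The farthest point sampling just described can be sketched as follows (plain NumPy, brute-force distances for clarity; the default of 1024 mirrors the preset threshold above):

```python
import numpy as np

def farthest_point_sampling(points, n_seeds=1024):
    """Greedily pick the input point farthest (in Euclidean distance)
    from the seeds chosen so far; the first seed is the point farthest
    from the cloud's center."""
    center = points.mean(axis=0, keepdims=True)
    first = int(np.linalg.norm(points - center, axis=1).argmax())
    chosen = [first]
    # min_d[j] = distance from point j to its nearest chosen seed
    min_d = np.linalg.norm(points - points[first], axis=1)
    while len(chosen) < min(n_seeds, len(points)):
        nxt = int(min_d.argmax())
        chosen.append(nxt)
        min_d = np.minimum(min_d, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]
```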
  • the training of the deep neural network model includes:
  • the coordinate information of the sample seed points is input into the deep neural network model, and the predicted displacement deviation and predicted rotation deviation output by the deep neural network model are obtained;
  • the predicted center point coordinates of each sample seed point are determined according to its coordinate information and the predicted displacement deviation, and its predicted in-situ point coordinates are determined according to its coordinate information and the predicted rotation deviation;
  • the loss L is calculated by the following loss function, and the parameters of the deep neural network model are updated according to L:
  • L = λ1·L_center + λ2·L_initial;
  • where n is the total number of sample seed points; L_center is the displacement-offset loss value; L_initial is the rotation-offset loss value; λ1 is the preset weight of the displacement-offset loss value; λ2 is the preset weight of the rotation-offset loss value; Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point; and Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point.
  • The true offset of the i-th sample seed point's coordinates from the in-situ point coordinates can be calculated from the ground-truth value of the rotational degrees of freedom and the seed point's predicted rotation deviation; the true offset of the i-th sample seed point's coordinates from the center point coordinates can be calculated from the ground-truth value of the displacement degrees of freedom and the seed point's predicted displacement deviation.
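A hedged PyTorch sketch of this training loss follows. The weighted two-term structure matches the formula above, but the per-term form (a mean L1 error between predicted and true offsets over the n sample seed points) is an assumption, since the component formulas are rendered as images in the source.

```python
import torch

def pose_loss(dq_pred, dr_pred, dq_true, dr_true, lam1=1.0, lam2=1.0):
    """L = lam1 * L_center + lam2 * L_initial, each term averaging the
    L1 offset error over the n sample seed points (tensors of shape n x 3)."""
    l_center = (dq_pred - dq_true).abs().sum(dim=1).mean()
    l_initial = (dr_pred - dr_true).abs().sum(dim=1).mean()
    return lam1 * l_center + lam2 * l_initial
```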
  • FIG. 5 is a flowchart of a method for determining the 3D point cloud of a target object according to an exemplary embodiment, which is used in determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object. The method includes:
  • S51. Collect an RGB image of the target object through an image acquisition device.
  • S52. Determine the depth image of the target object according to the RGB image, and perform instance segmentation on the RGB image to obtain the corresponding category-level mask area.
  • S53 Determine a 3D point cloud corresponding to the target object according to the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
  • In a specific implementation, when it is determined that a target object exists within the acquisition range of the image acquisition device, for example when the target object is placed on a workbench within that range, the image acquisition device collects an RGB image of its acquisition range. Instance segmentation is then performed on the RGB image, for example using the Mask R-CNN instance segmentation algorithm, to obtain the category-level mask area occupied by the target object.
  • The depth image corresponding to the target object is determined, from which the area occupied by the target object in the depth image, i.e., its depth area, can be determined.
  • The 3D point cloud corresponding to the target object is determined according to the category-level mask area and the depth area, combined with the noise suppression parameter and the texture restoration parameter of the image acquisition device. In this way, the background noise of the image can be reasonably removed and the accuracy of 3D point cloud determination improved.
  • Thus, the accuracy of determining the 3D point cloud can be improved, thereby improving the accuracy of determining the pose of the target object.
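The mask-plus-depth step amounts to a standard pinhole back-projection; a sketch follows, where (fx, fy, cx, cy) stand for the image acquisition device's internal parameters (the noise-suppression and texture-restoration handling described above is omitted).

```python
import numpy as np

def depth_to_point_cloud(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into camera-frame 3D points."""
    v, u = np.nonzero(mask)     # pixel rows (v) and columns (u) in the mask
    z = depth[v, u]
    valid = z > 0               # drop pixels with missing depth
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)   # n x 3 point cloud
```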
  • FIG. 6 is a flowchart of another method for determining the 3D point cloud of a target object according to an exemplary embodiment, which is likewise used in determining the target coordinates of each input point in the target object according to the corresponding 3D point cloud.
  • In a specific implementation, the RGB image of the target object is collected by the image acquisition device, and a single-stage 2D target detection algorithm such as YOLOv3 can be used to identify the category-level rectangular area occupied by the target object.
  • The category-level rectangular area is cropped with near and far planes to obtain the viewing frustum corresponding to the target object; based on the cropping planes of the image acquisition device intersecting the generatrices of the viewing frustum, a frustum operation is performed on the viewing frustum to obtain the frustum corresponding to the target object and determine the area it occupies.
  • Within this area, a semantic segmentation model, such as the PointNet++ model, can be used to segment the points belonging to the target object.
  • In this way, the accuracy of determining the 3D point cloud can be improved, thereby improving the accuracy of determining the pose of the target object.
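A rough sketch of this frustum cropping for 3D points already expressed in the camera frame; the near/far limits are illustrative placeholders, and the detected 2D box plays the role of the category-level rectangular area above.

```python
import numpy as np

def crop_frustum(points, box, fx, fy, cx, cy, near=0.1, far=3.0):
    """Keep points whose pinhole projection falls inside the 2D box
    (u0, v0, u1, v1) and whose depth lies within [near, far]."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = fx * x / z + cx         # project back to pixel coordinates
    v = fy * y / z + cy
    u0, v0, u1, v1 = box
    keep = ((u >= u0) & (u <= u1) & (v >= v0) & (v <= v1)
            & (z >= near) & (z <= far))
    return points[keep]
```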
  • FIG. 7 is a block diagram of an apparatus for determining the pose of a target object according to an exemplary embodiment. As shown in FIG. 7 , the apparatus 700 includes: a determination module 710 , a generation module 720 , an input module 730 and an execution module 740 .
  • the determining module 710 is configured to determine the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
  • a generating module 720, configured to generate seed points according to the target coordinates of each input point based on a downsampling method;
  • an input module 730, configured to input the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point;
  • the execution module 740 is configured to determine the 6D pose of the target object according to the coordinates of the center point of each of the seed points and the coordinates of the in-situ point.
  • The above apparatus regresses the coordinates of each seed point back to their coordinates under the standard pose through the deep neural network model, which not only shortens the time needed to determine the pose but also improves the accuracy of determining the target object's pose, thereby improving the accuracy of robot grasping and augmented reality.
  • Optionally, the deep neural network model generates the center point coordinates and the in-situ point coordinates of each seed point as follows: the displacement deviation and the rotation deviation corresponding to each seed point are determined according to the seed point's original coordinates;
  • the center point coordinates of each seed point are then determined according to its original coordinates and the corresponding displacement deviation, and its in-situ point coordinates are determined according to its original coordinates and the corresponding rotation deviation.
  • the execution module is used to:
  • the 6D pose of the target object is determined according to the three rotational degrees of freedom and the three displacement degrees of freedom.
  • the determining module is used to:
  • the target coordinate corresponding to the input point is determined according to the offset value in each dimension.
  • the generation module is used for:
  • the input point with the farthest Euclidean distance from the determined seed point is used as a new seed point, until the number of seed points reaches a preset threshold.
  • the training of the deep neural network model includes:
  • the coordinate information of the sample seed points is input into the deep neural network model, and the predicted displacement deviation and predicted rotation deviation output by the deep neural network model are obtained;
  • the predicted center point coordinates of each sample seed point are determined according to its coordinate information and the predicted displacement deviation, and its predicted in-situ point coordinates are determined according to its coordinate information and the predicted rotation deviation;
  • the loss L is calculated by the following loss function, and the parameters of the deep neural network model are updated according to L:
  • L = λ1·L_center + λ2·L_initial;
  • where n is the total number of sample seed points; L_center is the displacement-offset loss value; L_initial is the rotation-offset loss value; λ1 is the preset weight of the displacement-offset loss value; λ2 is the preset weight of the rotation-offset loss value; Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point; Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point; and the true offsets of each sample seed point's coordinates from the center point coordinates and from the in-situ point coordinates serve as the corresponding ground-truth values.
  • Optionally, the above apparatus further includes a collection module configured to: collect an RGB image of the target object through an image acquisition device;
  • determine the depth image of the target object according to the RGB image, and perform instance segmentation on the RGB image to obtain the corresponding category-level mask area; and
  • determine the 3D point cloud corresponding to the target object according to the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
  • Optionally, the above apparatus further includes an acquisition module configured to: determine, through a target detection algorithm, the category-level rectangular area corresponding to the target object.
  • It should be noted that the embodiments described in this specification are all preferred embodiments, and the parts involved are not necessarily required by the present disclosure; for example, the input module 730 and the execution module 740 may be mutually independent devices or the same device in a specific implementation, which is not limited in the present disclosure.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, implements the steps of any one of the methods described above.
  • Embodiments of the present disclosure also provide an electronic device, including:
  • a processor configured to execute the computer program in the memory, to implement the steps of any one of the above methods.
  • FIG. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment.
  • the electronic device 800 may include: a processor 701 and a memory 702 .
  • the electronic device 800 may also include one or more of a multimedia component 703 , an input/output (I/O) interface 704 , and a communication component 705 .
  • I/O input/output
  • the processor 701 is configured to control the overall operation of the electronic device 800 to complete all or part of the steps in the above-mentioned method for determining the pose of the target object.
  • The memory 702 is used to store various types of data to support operation of the electronic device 800. Such data may include, for example, instructions for any application or method operating on the electronic device 800, and application-related data such as contact data, sent and received messages, pictures, audio, video, and so on.
  • the memory 702 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (Static Random Access Memory, SRAM for short), electrically erasable programmable read-only memory ( Electrically Erasable Programmable Read-Only Memory (EEPROM for short), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (Read-Only Memory, ROM for short), magnetic memory, flash memory, magnetic disk or optical disk.
  • Multimedia components 703 may include screen and audio components. Wherein the screen can be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals.
  • the audio component may include a microphone for receiving external audio signals.
  • the received audio signal may be further stored in memory 702 or transmitted through communication component 705 .
  • the audio assembly also includes at least one speaker for outputting audio signals.
  • the I/O interface 704 provides an interface between the processor 701 and other interface modules, and the above-mentioned other interface modules may be a keyboard, a mouse, a button, and the like. These buttons can be virtual buttons or physical buttons.
  • the communication component 705 is used for wired or wireless communication between the electronic device 800 and other devices.
  • The wireless communication may be, for example, one or more of Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or others;
  • the corresponding communication component 705 may include: Wi-Fi module, Bluetooth module, NFC module and so on.
  • In an exemplary embodiment, the electronic device 800 may be implemented by one or more of: Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components.
  • a computer-readable storage medium including program instructions is also provided, and when the program instructions are executed by a processor, the steps of the above-mentioned method for determining the pose of a target object are implemented.
  • the computer-readable storage medium can be the above-mentioned memory 702 including program instructions, and the above-mentioned program instructions can be executed by the processor 701 of the electronic device 800 to complete the above-mentioned method for determining the pose of a target object.
  • the present disclosure also proposes a computing processing device, including:
  • one or more processors; when the computer-readable code is executed by the one or more processors, the computing processing device executes the aforementioned method for determining the pose of the target object.
  • Correspondingly, the present disclosure also proposes a computer program, including computer-readable codes which, when run on a computing processing device, cause the computing processing device to perform the aforementioned method for determining the pose of the target object.
  • the computer-readable storage medium proposed by the present disclosure stores the aforementioned computer program therein.
  • FIG. 9 provides a schematic structural diagram of a computing processing device according to an embodiment of the present disclosure.
  • the computing processing device typically includes a processor 1110 and a computer program product or computer readable medium in the form of a memory 1130 .
  • the memory 1130 may be electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the memory 1130 has storage space 1150 for program code 1151 for performing any of the method steps in the above-described methods.
  • the storage space 1150 for program codes may include various program codes 1151 for implementing various steps in the above methods, respectively. These program codes can be read from or written to one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks. Such computer program products are typically portable or fixed storage units as shown in FIG. 10 .
  • the storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 1130 in the computing processing device of FIG. 9 .
  • the program code may, for example, be compressed in a suitable form.
  • the storage unit includes computer readable code 1151', i.e. code readable by a processor such as 1110, for example, which when executed by a server, causes the server to perform the various steps in the methods described above.
  • a method for determining the pose of a target object comprising:
  • the target coordinates of each input point in the target object are determined according to the 3D point cloud corresponding to the target object;
  • seed points are generated according to the target coordinates of each input point based on a downsampling method;
  • the original coordinates of each seed point are input into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point;
  • the 6D pose of the target object is determined according to the center point coordinates and the in-situ point coordinates of each seed point.
  • Optionally, the center point coordinates of each seed point are determined according to its original coordinates and the corresponding displacement deviation, and its in-situ point coordinates are determined according to its original coordinates and the corresponding rotation deviation.
  • the 6D pose of the target object is determined according to the three rotational degrees of freedom and the three displacement degrees of freedom.
  • the target coordinate corresponding to the input point is determined according to the offset value in each dimension.
  • the input point with the farthest Euclidean distance from the determined seed point is used as a new seed point, until the number of seed points reaches a preset threshold.
  • the training of the deep neural network model comprises:
  • the coordinate information of the sample seed points is input into the deep neural network model, and the predicted displacement deviation and predicted rotation deviation output by the deep neural network model are obtained;
  • the loss L is calculated by the following loss function, and the parameters of the deep neural network model are updated according to L:
  • L = λ1·L_center + λ2·L_initial;
  • where n is the total number of sample seed points; L_center is the displacement-offset loss value; L_initial is the rotation-offset loss value; λ1 is the preset weight of the displacement-offset loss value; λ2 is the preset weight of the rotation-offset loss value; Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point; Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point; and the true offsets of each sample seed point's coordinates from the center point coordinates and from the in-situ point coordinates serve as the corresponding ground-truth values.
  • an RGB image of the target object is collected through an image acquisition device; the depth image of the target object is determined according to the RGB image, and instance segmentation is performed on the RGB image to obtain a corresponding category-level mask area;
  • a 3D point cloud corresponding to the target object is determined according to the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
  • the category-level rectangular area corresponding to the target object is determined through a target detection algorithm.
  • a device for determining the pose of a target object comprising:
  • a determination module for determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
  • a generation module is used to generate a seed point according to the target coordinates of each input point based on the downsampling method
  • an input module for inputting the original coordinates of each of the seed points into the deep neural network model, to obtain the center point coordinates and the in-situ coordinates of each of the seed points;
  • the execution module is configured to determine the 6D pose of the target object according to the coordinates of the center point of each of the seed points and the coordinates of the in-situ point.
  • Optionally, the center point coordinates of each seed point are determined according to the seed point's original coordinates and the corresponding displacement deviation, and its in-situ point coordinates are determined according to its original coordinates and the corresponding rotation deviation.
  • the 6D pose of the target object is determined according to the three rotational degrees of freedom and the three displacement degrees of freedom.
  • the target coordinate corresponding to the input point is determined according to the offset value in each dimension.
  • the input point with the farthest Euclidean distance from the determined seed point is used as a new seed point, until the number of seed points reaches a preset threshold.
  • the coordinate information of the sample seed points is input into the deep neural network model, and the predicted displacement deviation and predicted rotation deviation output by the deep neural network model are obtained;
  • the loss L is calculated by the following loss function, and the parameters of the deep neural network model are updated according to L:
  • L = λ1·L_center + λ2·L_initial;
  • where n is the total number of sample seed points; L_center is the displacement-offset loss value; L_initial is the rotation-offset loss value; λ1 is the preset weight of the displacement-offset loss value; λ2 is the preset weight of the rotation-offset loss value; Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point; Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point; and the true offsets of each sample seed point's coordinates from the center point coordinates and from the in-situ point coordinates serve as the corresponding ground-truth values.
  • an RGB image of the target object is collected through an image acquisition device; the depth image of the target object is determined according to the RGB image, and instance segmentation is performed on the RGB image to obtain a corresponding category-level mask area;
  • a 3D point cloud corresponding to the target object is determined according to the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
  • the category-level rectangular area corresponding to the target object is determined through a target detection algorithm.
  • a computer program comprising computer readable code which, when run on a computing processing device, causes the computing processing device to perform the method of any of embodiments 1-8.
  • An electronic device comprising:
  • a processor configured to execute the computer program in the memory, to implement the steps of the method in any one of Embodiments 1 to 8.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

A method and an apparatus for determining a pose of a target object, a storage medium and an electronic device, which aim to solve the problem in the related art of low accuracy in determining a pose of a target object. Said method comprises: determining target coordinates of each input point in the target object according to a 3D point cloud corresponding to the target object (S11); on the basis of a down-sampling method, generating a seed point according to the target coordinates of each input point (S12); inputting original coordinates of each seed point into a deep neural network model, so as to obtain center point coordinates and in-situ point coordinates of each seed point (S13); and determining a 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point (S14). Said method improves the accuracy in determining a pose of a target object.

Description

Method and apparatus for determining pose of target object, storage medium and electronic device
CROSS-REFERENCE TO RELATED APPLICATIONS
The present disclosure claims priority to the Chinese patent application No. 202011401984.6, titled "Method, apparatus, storage medium and electronic device for determining the pose of a target object", filed with the China Patent Office on December 02, 2020, the entire contents of which are incorporated into the present disclosure by reference.
TECHNICAL FIELD
The present disclosure relates to the technical field of robotics and computer vision, and in particular to a method, apparatus, storage medium and electronic device for determining the pose of a target object.
BACKGROUND
6D object pose estimation refers to the translation and rotation transformation between the camera coordinate system at the moment the current image is captured and the world coordinate system in which the original object is located, comprising 3 displacement degrees of freedom and 3 rotational degrees of freedom. Based on an object's 6D pose, the object can be located accurately, which is of great significance in robot grasping and augmented reality applications.
In the related art, the 6D pose of a specific object in the camera frame is calculated from the object's 3D data, based on either a 3D point cloud or an RGB-D image. For example, based on the 3D point cloud, a random sampling method can be used: corresponding point pairs are found at random to obtain a spatial 6D transformation, the error after each transformation is calculated, and the point pairs with the smallest error after transformation are taken as the object's 6D pose. As another example, based on the 3D point cloud, a feature-point method can find and match multiple feature-point pairs to obtain an initial object 6D pose, which is then refined by an exact matching algorithm such as ICP (Iterative Closest Points) to yield the final object 6D pose. Based on RGB-D images, template image information can be used: the template image most similar to the current image is found, and its 6D pose is taken as the current object's 6D pose.
SUMMARY
The purpose of the present disclosure is to provide a method, apparatus, storage medium and electronic device for determining the pose of a target object, so as to solve the problem of low accuracy in determining the pose of a target object in the related art.
To achieve the above purpose, a first aspect of the embodiments of the present disclosure provides a method for determining the pose of a target object, the method comprising:
determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
generating seed points according to the target coordinates of each input point based on a downsampling method;
inputting the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point; and
determining the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
A second aspect of the embodiments of the present disclosure provides an apparatus for determining the pose of a target object, the apparatus comprising:
a determination module, configured to determine the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
a generation module, configured to generate seed points according to the target coordinates of each input point based on a downsampling method;
an input module, configured to input the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point; and
an execution module, configured to determine the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
A third aspect of the embodiments of the present disclosure provides a computer program, including computer-readable codes which, when run on a computing processing device, cause the computing processing device to execute the method proposed in the embodiments of the first aspect.
A third aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which the computer program proposed by the embodiments of the third aspect is stored; when the program is executed by a processor, the steps of the method described in the first aspect are performed.
A fourth aspect of the embodiments of the present disclosure provides an electronic device, including:
a memory, on which the computer program proposed by the embodiments of the third aspect is stored; and
a processor, configured to execute the computer program in the memory to implement the steps of the method in the first aspect.
Through the above technical solutions, at least the following beneficial effects can be achieved:
The target coordinates of each input point in the target object are determined according to the 3D point cloud corresponding to the target object; based on a downsampling method, seed points are generated according to the target coordinates of each input point; the original coordinates of each seed point are input into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point; and the 6D pose of the target object is determined according to these coordinates. In this way, the deep neural network model regresses the coordinates of each seed point back to their coordinates under the standard pose, which not only shortens the time needed to determine the pose but also improves the accuracy of determining the target object's pose, thereby improving the accuracy of robot grasping and augmented reality.
Other features and advantages of the present disclosure will be described in detail in the detailed description that follows.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are provided for a further understanding of the present disclosure and constitute a part of the specification; together with the following detailed description, they serve to explain the present disclosure, but do not limit it. In the drawings:
Fig. 1 is a flowchart of a method for determining the pose of a target object according to an exemplary embodiment.
Fig. 2 is a flowchart of an implementation of step S14 in Fig. 1 according to an exemplary embodiment.
Fig. 3 is a flowchart of an implementation of step S11 in Fig. 1 according to an exemplary embodiment.
Fig. 4 is a flowchart of an implementation of step S12 in Fig. 1 according to an exemplary embodiment.
Fig. 5 is a flowchart of a method for determining the 3D point cloud of a target object according to an exemplary embodiment.
Fig. 6 is a flowchart of another method for determining the 3D point cloud of a target object according to an exemplary embodiment.
Fig. 7 is a block diagram of an apparatus for determining the pose of a target object according to an exemplary embodiment.
Fig. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment.
Fig. 9 is a schematic structural diagram of a computing processing device according to an embodiment of the present disclosure.
Fig. 10 is a schematic diagram of a portable or fixed storage unit for program code implementing the method according to the present disclosure, provided by an embodiment of the present disclosure.
DETAILED DESCRIPTION
The specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are only used to illustrate and explain the present disclosure, not to limit it.
It should be noted that, in the present disclosure, the terms "first", "second", etc. in the specification, claims and drawings are used to distinguish similar objects and are not necessarily to be understood as describing a specific order or sequence. Likewise, the terms "S51", "S61", etc. are used to distinguish steps and are not necessarily to be understood as requiring that the method steps be performed in a specific order.
Before introducing the method, apparatus, storage medium and electronic device for determining the pose of a target object provided by the present disclosure, the application scenarios of the present disclosure are first introduced. The method for determining the pose of a target object provided by the present disclosure can be applied to an electronic device, which may be, for example, a smart phone, a PC (Personal Computer), and the like.
The inventors found that determining the pose of a target object based on a random sampling method requires repeatedly searching for corresponding point pairs and calculating transformation errors; the pose determination process is cumbersome and time-consuming, wasting human and material resources and increasing time costs. Determining grasping feature points based on geometric structure requires the target object's geometry to be highly salient; if it is less salient, the accuracy of the determined grasping feature points is reduced, which may make the grasping robot's grasp-position determination and augmented reality less accurate, thereby increasing the risk of damage to the target object.
To solve the above technical problems, the present disclosure provides a method for determining the pose of a target object. Fig. 1 is a flowchart of such a method according to an exemplary embodiment; as shown in Fig. 1, the method includes the following steps.
S11. Determine the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object.
S12. Based on a downsampling method, generate seed points according to the target coordinates of each input point.
S13. Input the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point.
S14. Determine the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
With the above technical solution, the target coordinates of each input point in the target object are determined according to the 3D point cloud corresponding to the target object; based on a downsampling method, seed points are generated according to the target coordinates of each input point; the original coordinates of each seed point are input into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point; and the 6D pose of the target object is determined according to these coordinates. In this way, the deep neural network model regresses the coordinates of each seed point back to their coordinates under the standard pose, which not only shortens the time needed to determine the pose but also improves the accuracy of determining the target object's pose, thereby improving the accuracy of robot grasping and augmented reality. Moreover, not only can the 6D pose of an instance-level target object be estimated, but also that of a category-level target object, reducing the dependence on complete model consistency.
Optionally, the deep neural network model generates the center point coordinates and the in-situ point coordinates of each seed point as follows:
determining the displacement deviation and rotation deviation corresponding to each seed point from the original coordinates of that seed point;
determining the center point coordinates of each seed point from its original coordinates and the corresponding displacement deviation, and determining the in-situ point coordinates of each seed point from its original coordinates and the corresponding rotation deviation.
Optionally, a high-dimensional feature of the original coordinates of each seed point can be obtained with the 3D convolutional neural network model PointNet++, and the displacement deviation and rotation deviation of that seed point relative to the predicted target point can be regressed from this high-dimensional feature.
For example, if the original coordinates of the i-th seed point are p_i, Δq_i is the displacement deviation corresponding to the i-th seed point, and Δr_i is the rotation deviation corresponding to the i-th seed point, then the center point coordinates of that seed point are q_i = p_i + Δq_i and its in-situ point coordinates are r_i = p_i + Δr_i, where i = 1, …, n and n equals the number of determined seed points.
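As an illustrative sketch only (the disclosure does not fix the architecture of the regression head), the per-seed offsets could be regressed from PointNet++ features with a small MLP head; the class name OffsetHead, the feature dimension and the layer sizes below are all assumptions, and the feature extractor itself is stubbed:

    import torch
    import torch.nn as nn

    class OffsetHead(nn.Module):
        """Regress per-seed displacement and rotation offsets from features."""
        def __init__(self, feat_dim: int = 256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim, 128), nn.ReLU(),
                nn.Linear(128, 6),          # 3 values for Δq, 3 for Δr
            )

        def forward(self, seed_xyz: torch.Tensor, seed_feat: torch.Tensor):
            delta = self.mlp(seed_feat)     # (n, 6)
            dq, dr = delta[:, :3], delta[:, 3:]
            q = seed_xyz + dq               # center point coordinates q_i = p_i + Δq_i
            r = seed_xyz + dr               # in-situ point coordinates r_i = p_i + Δr_i
            return q, r

    # usage with stubbed features for 1024 seeds
    head = OffsetHead()
    xyz = torch.rand(1024, 3)
    feat = torch.rand(1024, 256)            # would come from PointNet++ in practice
    q, r = head(xyz, feat)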
Optionally, referring to Fig. 2, which shows a flowchart for implementing step S14 of Fig. 1, in step S14 the determining of the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point includes:
S141. Construct a first first-order matrix from the original coordinates of the seed points, and construct a second first-order matrix from the in-situ point coordinates of the seed points.
S142. Based on the least squares method, determine the rotation matrix corresponding to the target object from the first first-order matrix and the second first-order matrix.
S143. Obtain the 3 rotational degrees of freedom of the target object from the rotation matrix.
S144. Calculate average coordinates from the center point coordinates of the seed points, and take the three coordinate values of the average coordinates as the 3 displacement degrees of freedom of the target object.
S145. Determine the 6D pose of the target object from the 3 rotational degrees of freedom and the 3 displacement degrees of freedom.
In a specific implementation, the predicted center point coordinates of the target object are obtained by aggregating the center point coordinates of the seed points, for example by summing the center point coordinates of all seed points and taking their average:

    q_avg = (1/n) · Σ_{i=1}^{n} q_i,

where q_i is the center point coordinate of the i-th seed point, n is the number of seed points, and q_avg is the predicted center point coordinate of the target object.
Further, the three coordinate values of the average coordinates are taken as the 3 displacement degrees of freedom of the target object; for example, if the obtained predicted center point coordinates are (1, 2, 3), the 3 displacement degrees of freedom of the target object are (1, 2, 3).
Further, a first first-order matrix P is constructed from the original coordinates of the seed points,

    P = [p_1; p_2; …; p_n]  (an n×3 matrix whose i-th row is p_i),

and a second first-order matrix R is constructed from the in-situ point coordinates of the seed points,

    R = [r_1; r_2; …; r_n]  (an n×3 matrix whose i-th row is r_i).

Letting the rotation matrix be

    X = [x_11 x_12 x_13; x_21 x_22 x_23; x_31 x_32 x_33],

the two matrices are related by

    R · X = P.
Further, X is obtained from this relation by the least squares method.
Further, letting the 3 rotational degrees of freedom of the target object be (α, β, γ), then

    α = atan2(x_32, x_33),
    β = atan2(−x_31, √(x_32² + x_33²)),
    γ = atan2(x_21, x_11).
Optionally, referring to Fig. 3, which shows a flowchart for implementing step S11 of Fig. 1, in step S11 the determining of the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object includes:
S111. Determine the original coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object.
S112. Determine the barycentric coordinates of the 3D point cloud from the original coordinates of each input point.
S113. For each input point, determine the offset between the original coordinates of that input point and the barycentric coordinates in each dimension.
S114. Determine the target coordinates of that input point from the offsets in the respective dimensions.
Specifically, if the barycentric coordinates of the 3D point cloud are o_center and the original coordinates of an input point are p_ori, the target coordinates of that input point are p_new = p_ori − o_center.
For example, suppose the barycentric coordinates of the 3D point cloud are (2, 3, 4) and the original coordinates of an input point of the target object are (8, 7, 9). The offset between the original coordinates and the barycentric coordinates is 8 − 2 = 6 in the first dimension, 7 − 3 = 4 in the second dimension, and 9 − 4 = 5 in the third dimension.
Further, from the offset 6 in the first dimension, 4 in the second dimension and 5 in the third dimension, the target coordinates of the input point are determined to be (6, 4, 5).
With the above technical solution, since the position at which the 3D point cloud of the target object appears is random, the original coordinates of the 3D point cloud can be normalized by means of its barycentric coordinates, which reduces the influence of the original coordinates of the target object on the model calculation.
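A minimal NumPy sketch of steps S111 to S114 (centroid subtraction); the function name is illustrative:

    import numpy as np

    def normalize_points(points: np.ndarray) -> np.ndarray:
        """S111-S114: subtract the barycenter from every input point."""
        o_center = points.mean(axis=0)    # barycentric coordinates of the cloud
        return points - o_center          # per-dimension offsets = target coordinates

    cloud = np.array([[8.0, 7.0, 9.0], [2.0, 3.0, 4.0]])
    print(normalize_points(cloud))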
Optionally, the down-sampling method is the farthest point sampling method. Referring to Fig. 4, which shows a flowchart for implementing step S12 of Fig. 1, in step S12 the generating of seed points from the target coordinates of each input point based on the down-sampling method includes:
S121. Determine the center point of the 3D point cloud from the original coordinates of each input point in the target object.
S122. Select the input point with the largest Euclidean distance from the center point of the 3D point cloud as the first seed point.
S123. Taking the first seed point as the starting point, take the input point with the largest Euclidean distance from the already determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
Optionally, the down-sampling method includes random sampling, farthest point sampling (FPS), and deep-model-based sampling methods.
As an example, taking the farthest point sampling method: the input point with the largest Euclidean distance from the center point of the 3D point cloud is selected as the first seed point, and the input point with the largest Euclidean distance from the first seed point is then selected as the second seed point. Next, based on the first and second seed points, the input point with the largest Euclidean distance is selected as the third seed point, that is, the input point farthest from the first seed point and farthest from the second seed point.
Further, based on the second and third seed points, the input point with the largest Euclidean distance is selected as the fourth seed point, that is, the input point farthest from the second seed point and farthest from the third seed point, and so on, until the number of seed points reaches 1024; that is, the preset threshold is 1024.
With the above technical solution, since the points of the target object are distributed with uneven density, the seed points of the target object can be determined by the down-sampling method, which reduces the influence of the varying density of the original coordinates of the target object on the model calculation.
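The text above is ambiguous about the distance to "the already determined seed points"; standard farthest point sampling maximizes the distance to the nearest already chosen seed, and the following NumPy sketch (illustrative, not the claimed implementation) follows that convention:

    import numpy as np

    def farthest_point_sample(points: np.ndarray, k: int = 1024) -> np.ndarray:
        """FPS: the first seed is the point farthest from the cloud center;
        each next seed maximizes the distance to the nearest chosen seed."""
        center = points.mean(axis=0)
        idx = [int(np.argmax(np.linalg.norm(points - center, axis=1)))]
        dist = np.linalg.norm(points - points[idx[0]], axis=1)
        while len(idx) < min(k, len(points)):
            nxt = int(np.argmax(dist))      # farthest from the chosen set
            idx.append(nxt)
            dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
        return points[idx]

    seeds = farthest_point_sample(np.random.rand(4096, 3).astype(np.float32), 1024)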
Optionally, the training of the deep neural network model includes:
inputting the coordinate information of sample seed points into the deep neural network model to obtain the predicted displacement deviation and predicted rotation deviation output by the deep neural network model;
determining the predicted center point coordinates of each sample seed point from its coordinate information and the predicted displacement deviation, and determining the predicted in-situ point coordinates of each sample seed point from its coordinate information and the predicted rotation deviation;
computing the loss L by the following loss function and updating the parameters of the deep neural network model according to the loss L:
    L = λ_1 · L_center + λ_2 · L_initial,

where n is the total number of sample seed points, L_center is the displacement offset loss, L_initial is the rotation offset loss, λ_1 is the preset weight of the displacement offset loss, λ_2 is the preset weight of the rotation offset loss,

    L_center = (1/n) · Σ_{i=1}^{n} ‖Δq_i − Δq_i*‖,
    L_initial = (1/n) · Σ_{i=1}^{n} ‖Δr_i − Δr_i*‖,

Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point, Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point, Δq_i* is the real offset of the coordinates of the i-th sample seed point from the center point coordinates, and Δr_i* is the real offset of the coordinates of the i-th sample seed point from the in-situ point coordinates.
It is worth noting that the real offset Δr_i* of the coordinates of the i-th sample seed point from the in-situ point coordinates can be calculated from the true values of the rotational degrees of freedom and the predicted rotation deviation of that seed point, and the real offset Δq_i* of the coordinates of the i-th sample seed point from the center point coordinates can be calculated from the true values of the displacement degrees of freedom and the predicted displacement deviation of that seed point.
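A NumPy sketch of the loss; since the exact norm inside L_center and L_initial is not recoverable from the publication, a mean Euclidean distance over the n sample seed points is assumed:

    import numpy as np

    def pose_loss(dq, dr, dq_gt, dr_gt, lam1: float = 1.0, lam2: float = 1.0):
        """L = lam1 * L_center + lam2 * L_initial, with (n, 3) arrays of
        predicted deviations (dq, dr) and real offsets (dq_gt, dr_gt)."""
        l_center = np.linalg.norm(dq - dq_gt, axis=1).mean()    # displacement term
        l_initial = np.linalg.norm(dr - dr_gt, axis=1).mean()   # rotation term
        return lam1 * l_center + lam2 * l_initial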
Optionally, Fig. 5 is a flowchart of a method for determining the 3D point cloud of a target object according to an exemplary embodiment. Before the determining of the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object, the method includes:
S51. Collect an RGB image of the target object through an image acquisition device.
S52. Determine a depth image of the target object from the RGB image, and perform instance segmentation on the RGB image to obtain a corresponding category-level mask area.
S53. Determine the 3D point cloud corresponding to the target object from the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
In a specific implementation, when it is determined that a target object is present within the acquisition range of the image acquisition device, for example when a target object is placed within the acquisition range of an image acquisition device on a workbench, the image acquisition device collects an RGB image of its acquisition range, and instance segmentation is then performed on the RGB image, for example with the Mask R-CNN instance segmentation algorithm, to obtain the category-level mask area occupied by the target object.
Further, since the RGB image and the depth image are aligned, the depth image corresponding to the target object is determined from its RGB image, and the area occupied by the depth image of the target object, that is, its depth image area, can then be determined.
Further, the 3D point cloud corresponding to the target object is determined from the category-level mask area and the depth image area, combined with the noise suppression parameter and the texture restoration parameter of the image acquisition device. In this way, the background noise of the image can be reasonably removed, improving the accuracy of the 3D point cloud determination.
With the above technical solution, by determining the category-level mask area and the depth image area from the RGB image of the target object and removing background noise, the accuracy of the 3D point cloud determination can be improved, which in turn improves the accuracy of the pose determination of the target object.
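An illustrative NumPy sketch of S53 under the usual pinhole-camera assumption, with the instance mask and the intrinsics (fx, fy, cx, cy) of the image acquisition device given as inputs; the function name is hypothetical:

    import numpy as np

    def masked_depth_to_cloud(depth, mask, fx, fy, cx, cy):
        """Back-project masked depth pixels into a 3D point cloud."""
        v, u = np.nonzero(mask)           # pixel coordinates inside the mask
        z = depth[v, u]
        valid = z > 0                     # drop missing depth readings
        u, v, z = u[valid], v[valid], z[valid]
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=1)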
Optionally, Fig. 6 is a flowchart of another method for determining the 3D point cloud of a target object according to an exemplary embodiment. Before the determining of the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object, the method includes:
S61. Determine the category-level rectangular area corresponding to the target object from the acquired RGB image of the target object and a target detection algorithm.
S62. Perform near-far cropping on the category-level rectangular area to obtain a frustum-area point cloud of the target object.
S63. Perform semantic segmentation on the frustum-area point cloud based on a semantic segmentation model to determine the 3D point cloud corresponding to the target object.
In a specific implementation, when it is determined that a target object is present within the acquisition range of the image acquisition device, an RGB image of the target object is collected through the image acquisition device, and the category-level rectangular box area occupied by the target object can be identified with the single-stage 2D object detection algorithm YOLOv3.
Further, based on the near-far cropping function of the image acquisition device, near-far cropping is performed on the category-level rectangular area to obtain the viewing cone corresponding to the target object. Based on the frustum plane of the image acquisition device, a plane that does not pass through the apex of the viewing cone and intersects its generatrices, a frustum operation is performed on the viewing cone to obtain the frustum corresponding to the target object, and the area occupied by the frustum is determined.
Further, the frustum-area point cloud is determined from the area occupied by the frustum, and semantic segmentation is performed on the frustum-area point cloud based on a semantic segmentation model, for example the PointNet++ model, to remove the background noise of the image and determine the 3D point cloud corresponding to the target object.
With the above technical solution, by determining the category-level rectangular box area and the area occupied by the frustum from the RGB image of the target object, then determining the frustum-area point cloud and removing the background noise of the image, the accuracy of the 3D point cloud determination can be improved, which in turn improves the accuracy of the pose determination of the target object.
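A simple stand-in for the near-far frustum cropping of S62, assuming a pinhole camera and a 2D detection box: points are kept when their projection falls inside the box and their depth lies between the near and far planes. All names here are illustrative:

    import numpy as np

    def frustum_crop(points, box, depth_range, fx, fy, cx, cy):
        """Keep points projecting inside the 2D box within [near, far]."""
        near, far = depth_range
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        in_depth = (z >= near) & (z <= far)        # near/far cropping
        with np.errstate(divide="ignore", invalid="ignore"):
            u = fx * x / z + cx                    # project onto the image plane
            v = fy * y / z + cy
        u0, v0, u1, v1 = box                       # detection box, e.g. from YOLOv3
        in_box = (u >= u0) & (u <= u1) & (v >= v0) & (v <= v1)
        return points[in_depth & in_box]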
Based on the same inventive concept, the present disclosure further provides an apparatus 700 for determining the pose of a target object, configured to perform the steps of the method for determining the pose of a target object provided by the above method embodiments; the apparatus 700 may implement the method in software, in hardware, or in a combination of the two. Fig. 7 is a block diagram of an apparatus for determining the pose of a target object according to an exemplary embodiment. As shown in Fig. 7, the apparatus 700 includes a determination module 710, a generation module 720, an input module 730 and an execution module 740.
The determination module 710 is configured to determine the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object.
The generation module 720 is configured to generate seed points from the target coordinates of each input point based on a down-sampling method.
The input module 730 is configured to input the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point.
The execution module 740 is configured to determine the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
The above apparatus regresses the coordinates of each seed point to its coordinates in the standard pose through the deep neural network model, which not only shortens the time required for pose determination but also improves the accuracy of the determined pose, thereby improving the accuracy of robot grasping and augmented reality.
Optionally, the deep neural network model generates the center point coordinates and the in-situ point coordinates of each seed point as follows:
determining the displacement deviation and rotation deviation corresponding to each seed point from the original coordinates of that seed point;
determining the center point coordinates of each seed point from its original coordinates and the corresponding displacement deviation, and determining the in-situ point coordinates of each seed point from its original coordinates and the corresponding rotation deviation.
Optionally, the execution module is configured to:
construct a first first-order matrix from the original coordinates of the seed points, and construct a second first-order matrix from the in-situ point coordinates of the seed points;
based on the least squares method, determine the rotation matrix corresponding to the target object from the first first-order matrix and the second first-order matrix;
obtain the 3 rotational degrees of freedom of the target object from the rotation matrix;
calculate average coordinates from the center point coordinates of the seed points, and take the three coordinate values of the average coordinates as the 3 displacement degrees of freedom of the target object;
determine the 6D pose of the target object from the 3 rotational degrees of freedom and the 3 displacement degrees of freedom.
Optionally, the determination module is configured to:
determine the original coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
determine the barycentric coordinates of the 3D point cloud from the original coordinates of each input point;
for each input point, determine the offset between the original coordinates of that input point and the barycentric coordinates in each dimension;
determine the target coordinates of that input point from the offsets in the respective dimensions.
Optionally, the generation module is configured to:
determine the center point of the 3D point cloud from the original coordinates of each input point in the target object;
select the input point with the largest Euclidean distance from the center point of the 3D point cloud as the first seed point;
taking the first seed point as the starting point, take the input point with the largest Euclidean distance from the already determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
Optionally, the training of the deep neural network model includes:
inputting the coordinate information of sample seed points into the deep neural network model to obtain the predicted displacement deviation and predicted rotation deviation output by the deep neural network model;
determining the predicted center point coordinates of each sample seed point from its coordinate information and the predicted displacement deviation, and determining the predicted in-situ point coordinates of each sample seed point from its coordinate information and the predicted rotation deviation;
computing the loss L by the following loss function and updating the parameters of the deep neural network model according to the loss L:

    L = λ_1 · L_center + λ_2 · L_initial,

where n is the total number of sample seed points, L_center is the displacement offset loss, L_initial is the rotation offset loss, λ_1 is the preset weight of the displacement offset loss, λ_2 is the preset weight of the rotation offset loss,

    L_center = (1/n) · Σ_{i=1}^{n} ‖Δq_i − Δq_i*‖,
    L_initial = (1/n) · Σ_{i=1}^{n} ‖Δr_i − Δr_i*‖,

Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point, Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point, Δq_i* is the real offset of the coordinates of the i-th sample seed point from the center point coordinates, and Δr_i* is the real offset of the coordinates of the i-th sample seed point from the in-situ point coordinates.
Optionally, the apparatus further includes an acquisition module configured to:
collect an RGB image of the target object through an image acquisition device;
determine a depth image of the target object from the RGB image, and perform instance segmentation on the RGB image to obtain a corresponding category-level mask area;
determine the 3D point cloud corresponding to the target object from the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
Optionally, the apparatus further includes an obtaining module configured to:
determine the category-level rectangular area corresponding to the target object from the acquired RGB image of the target object and a target detection algorithm;
perform near-far cropping on the category-level rectangular area to obtain a frustum-area point cloud of the target object;
perform semantic segmentation on the frustum-area point cloud based on a semantic segmentation model to determine the 3D point cloud corresponding to the target object.
Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
It is further worth noting that, for convenience and brevity of description, the embodiments described in the specification are all preferred embodiments, and the parts involved are not necessarily essential to the present disclosure; for example, the input module 730 and the execution module 740 may, in a specific implementation, be mutually independent devices or the same device, which is not limited by the present disclosure.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the steps of any of the methods described above are implemented.
An embodiment of the present disclosure further provides an electronic device, including:
a memory on which a computer program is stored; and
a processor configured to execute the computer program in the memory to implement the steps of any of the methods described above.
Fig. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment. As shown in Fig. 8, the electronic device 800 may include a processor 701 and a memory 702, and may further include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 is configured to control the overall operation of the electronic device 800 to complete all or part of the steps of the above method for determining the pose of a target object. The memory 702 is configured to store various types of data to support operation on the electronic device 800; such data may include, for example, instructions for any application or method operating on the electronic device 800, as well as application-related data such as contact data, sent and received messages, pictures, audio, video, and so on. The memory 702 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 703 may include a screen and an audio component, where the screen may be, for example, a touch screen, and the audio component is configured to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; a received audio signal may be further stored in the memory 702 or sent through the communication component 705. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules such as a keyboard, a mouse, or buttons, where the buttons may be virtual or physical. The communication component 705 is configured for wired or wireless communication between the electronic device 800 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or the like, or a combination of one or more of them, which is not limited here; accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method for determining the pose of a target object.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided; when the program instructions are executed by a processor, the steps of the above method for determining the pose of a target object are implemented. For example, the computer-readable storage medium may be the above memory 702 including program instructions, and the program instructions may be executed by the processor 701 of the electronic device 800 to complete the above method for determining the pose of a target object.
To implement the above embodiments, the present disclosure further proposes a computing processing device, including:
a memory in which computer-readable code is stored; and
one or more processors, wherein, when the computer-readable code is executed by the one or more processors, the computing processing device performs the aforementioned method for determining the pose of a target object.
To implement the above embodiments, the present disclosure further proposes a computer program including computer-readable code which, when run on a computing processing device, causes the computing processing device to perform the aforementioned method for determining the pose of a target object.
The present disclosure further proposes a computer-readable storage medium in which the aforementioned computer program is stored.
Fig. 9 is a schematic structural diagram of a computing processing device according to an embodiment of the present disclosure. The computing processing device typically includes a processor 1110 and a computer program product or computer-readable medium in the form of a memory 1130. The memory 1130 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 1130 has a storage space 1150 for program code 1151 for performing any of the method steps of the above methods. For example, the storage space 1150 for program code may include individual program codes 1151 for implementing the various steps of the above methods. These program codes may be read from, or written to, one or more computer program products, which include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit as shown in Fig. 10. The storage unit may have storage segments, storage spaces, and so on arranged similarly to the memory 1130 in the computing processing device of Fig. 9. The program code may, for example, be compressed in a suitable form. Typically, the storage unit includes computer-readable code 1151', that is, code readable by a processor such as the processor 1110, which, when run by a server, causes the server to perform the various steps of the methods described above.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments; within the scope of the technical concept of the present disclosure, various simple modifications may be made to the technical solutions of the present disclosure, and these simple modifications all fall within the protection scope of the present disclosure.
It should further be noted that the specific technical features described in the above specific embodiments may be combined in any suitable manner provided there is no contradiction; to avoid unnecessary repetition, the present disclosure does not separately describe the various possible combinations.
In addition, the various embodiments of the present disclosure may also be combined arbitrarily, and as long as such combinations do not depart from the spirit of the present disclosure, they should likewise be regarded as content disclosed by the present disclosure.
Embodiments
1. A method for determining the pose of a target object, including:
determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
generating seed points from the target coordinates of each input point based on a down-sampling method;
inputting the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point;
determining the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
2. The method of Embodiment 1, wherein the deep neural network model generates the center point coordinates and the in-situ point coordinates of each seed point as follows:
determining the displacement deviation and rotation deviation corresponding to each seed point from the original coordinates of that seed point;
determining the center point coordinates of each seed point from its original coordinates and the corresponding displacement deviation, and determining the in-situ point coordinates of each seed point from its original coordinates and the corresponding rotation deviation.
3. The method of Embodiment 1, wherein the determining of the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point includes:
constructing a first first-order matrix from the original coordinates of the seed points, and constructing a second first-order matrix from the in-situ point coordinates of the seed points;
based on the least squares method, determining the rotation matrix corresponding to the target object from the first first-order matrix and the second first-order matrix;
obtaining the 3 rotational degrees of freedom of the target object from the rotation matrix;
calculating average coordinates from the center point coordinates of the seed points, and taking the three coordinate values of the average coordinates as the 3 displacement degrees of freedom of the target object;
determining the 6D pose of the target object from the 3 rotational degrees of freedom and the 3 displacement degrees of freedom.
4. The method of Embodiment 1, wherein the determining of the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object includes:
determining the original coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
determining the barycentric coordinates of the 3D point cloud from the original coordinates of each input point;
for each input point, determining the offset between the original coordinates of that input point and the barycentric coordinates in each dimension;
determining the target coordinates of that input point from the offsets in the respective dimensions.
5. The method of Embodiment 1, wherein the down-sampling method is the farthest point sampling method, and the generating of seed points from the target coordinates of each input point based on the down-sampling method includes:
determining the center point of the 3D point cloud from the original coordinates of each input point in the target object;
selecting the input point with the largest Euclidean distance from the center point of the 3D point cloud as the first seed point;
taking the first seed point as the starting point, taking the input point with the largest Euclidean distance from the already determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
6. The method of Embodiment 1, wherein the training of the deep neural network model includes:
inputting the coordinate information of sample seed points into the deep neural network model to obtain the predicted displacement deviation and predicted rotation deviation output by the deep neural network model;
determining the predicted center point coordinates of each sample seed point from its coordinate information and the predicted displacement deviation, and determining the predicted in-situ point coordinates of each sample seed point from its coordinate information and the predicted rotation deviation;
computing the loss L by the following loss function and updating the parameters of the deep neural network model according to the loss L:

    L = λ_1 · L_center + λ_2 · L_initial,

where n is the total number of sample seed points, L_center is the displacement offset loss, L_initial is the rotation offset loss, λ_1 is the preset weight of the displacement offset loss, λ_2 is the preset weight of the rotation offset loss,

    L_center = (1/n) · Σ_{i=1}^{n} ‖Δq_i − Δq_i*‖,
    L_initial = (1/n) · Σ_{i=1}^{n} ‖Δr_i − Δr_i*‖,

Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point, Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point, Δq_i* is the real offset of the coordinates of the i-th sample seed point from the center point coordinates, and Δr_i* is the real offset of the coordinates of the i-th sample seed point from the in-situ point coordinates.
7. The method of any one of Embodiments 1 to 6, before the determining of the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object, including:
collecting an RGB image of the target object through an image acquisition device;
determining a depth image of the target object from the RGB image, and performing instance segmentation on the RGB image to obtain a corresponding category-level mask area;
determining the 3D point cloud corresponding to the target object from the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
8. The method of any one of Embodiments 1 to 6, before the determining of the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object, including:
determining the category-level rectangular area corresponding to the target object from the acquired RGB image of the target object and a target detection algorithm;
performing near-far cropping on the category-level rectangular area to obtain a frustum-area point cloud of the target object;
performing semantic segmentation on the frustum-area point cloud based on a semantic segmentation model to determine the 3D point cloud corresponding to the target object.
9. An apparatus for determining the pose of a target object, including:
a determination module configured to determine the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
a generation module configured to generate seed points from the target coordinates of each input point based on a down-sampling method;
an input module configured to input the original coordinates of each seed point into a deep neural network model to obtain the center point coordinates and the in-situ point coordinates of each seed point;
an execution module configured to determine the 6D pose of the target object according to the center point coordinates and the in-situ point coordinates of each seed point.
10. The apparatus of Embodiment 9, wherein the deep neural network model generates the center point coordinates and the in-situ point coordinates of each seed point as follows:
determining the displacement deviation and rotation deviation corresponding to each seed point from the original coordinates of that seed point;
determining the center point coordinates of each seed point from its original coordinates and the corresponding displacement deviation, and determining the in-situ point coordinates of each seed point from its original coordinates and the corresponding rotation deviation.
11. The apparatus of Embodiment 9, wherein the execution module is configured to:
construct a first first-order matrix from the original coordinates of the seed points, and construct a second first-order matrix from the in-situ point coordinates of the seed points;
based on the least squares method, determine the rotation matrix corresponding to the target object from the first first-order matrix and the second first-order matrix;
obtain the 3 rotational degrees of freedom of the target object from the rotation matrix;
calculate average coordinates from the center point coordinates of the seed points, and take the three coordinate values of the average coordinates as the 3 displacement degrees of freedom of the target object;
determine the 6D pose of the target object from the 3 rotational degrees of freedom and the 3 displacement degrees of freedom.
12. The apparatus of Embodiment 9, wherein the determination module is configured to:
determine the original coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
determine the barycentric coordinates of the 3D point cloud from the original coordinates of each input point;
for each input point, determine the offset between the original coordinates of that input point and the barycentric coordinates in each dimension;
determine the target coordinates of that input point from the offsets in the respective dimensions.
13. The apparatus of Embodiment 9, wherein the generation module is configured to:
determine the center point of the 3D point cloud from the original coordinates of each input point in the target object;
select the input point with the largest Euclidean distance from the center point of the 3D point cloud as the first seed point;
taking the first seed point as the starting point, take the input point with the largest Euclidean distance from the already determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
14. The apparatus of Embodiment 9, wherein the training of the deep neural network model includes:
inputting the coordinate information of sample seed points into the deep neural network model to obtain the predicted displacement deviation and predicted rotation deviation output by the deep neural network model;
determining the predicted center point coordinates of each sample seed point from its coordinate information and the predicted displacement deviation, and determining the predicted in-situ point coordinates of each sample seed point from its coordinate information and the predicted rotation deviation;
computing the loss L by the following loss function and updating the parameters of the deep neural network model according to the loss L:

    L = λ_1 · L_center + λ_2 · L_initial,

where n is the total number of sample seed points, L_center is the displacement offset loss, L_initial is the rotation offset loss, λ_1 is the preset weight of the displacement offset loss, λ_2 is the preset weight of the rotation offset loss,

    L_center = (1/n) · Σ_{i=1}^{n} ‖Δq_i − Δq_i*‖,
    L_initial = (1/n) · Σ_{i=1}^{n} ‖Δr_i − Δr_i*‖,

Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point, Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point, Δq_i* is the real offset of the coordinates of the i-th sample seed point from the center point coordinates, and Δr_i* is the real offset of the coordinates of the i-th sample seed point from the in-situ point coordinates.
15. The apparatus of any one of Embodiments 9 to 14, further including an acquisition module configured to:
collect an RGB image of the target object through an image acquisition device;
determine a depth image of the target object from the RGB image, and perform instance segmentation on the RGB image to obtain a corresponding category-level mask area;
determine the 3D point cloud corresponding to the target object from the category-level mask area, the internal parameters of the image acquisition device, and the depth image.
16. The apparatus of any one of Embodiments 9 to 14, further including an obtaining module configured to:
determine the category-level rectangular area corresponding to the target object from the acquired RGB image of the target object and a target detection algorithm;
perform near-far cropping on the category-level rectangular area to obtain a frustum-area point cloud of the target object;
perform semantic segmentation on the frustum-area point cloud based on a semantic segmentation model to determine the 3D point cloud corresponding to the target object.
17. A computer program, comprising computer-readable code which, when run on a computing processing device, causes the computing processing device to perform the method according to any one of Embodiments 1 to 8.
18. A computer-readable storage medium having stored thereon the computer program according to Embodiment 17, wherein the program, when executed by a processor, implements the steps of the method according to any one of Embodiments 1 to 8.
19. An electronic device, comprising:
a memory having stored thereon the computer program according to Embodiment 17; and
a processor configured to execute the computer program in the memory to implement the steps of the method according to any one of Embodiments 1 to 8.

Claims (19)

  1. A method for determining a pose of a target object, characterized in that the method comprises:
    determining target coordinates of each input point in the target object according to a 3D point cloud corresponding to the target object;
    generating seed points according to the target coordinates of each input point based on a downsampling method;
    inputting original coordinates of each seed point into a deep neural network model to obtain center point coordinates and origin point coordinates of each seed point;
    determining a 6D pose of the target object according to the center point coordinates and the origin point coordinates of each seed point.
  2. The method according to claim 1, characterized in that the deep neural network model generates the center point coordinates and the origin point coordinates of each seed point in the following manner:
    determining a displacement deviation and a rotation deviation corresponding to each seed point according to the original coordinates of the seed point;
    determining the center point coordinates of each seed point according to the original coordinates of the seed point and the corresponding displacement deviation, and determining the origin point coordinates of the seed point according to the original coordinates of the seed point and the corresponding rotation deviation.
  3. The method according to claim 1, characterized in that determining the 6D pose of the target object according to the center point coordinates and the origin point coordinates of each seed point comprises:
    constructing a first first-order matrix according to the center point coordinates of the seed points, and constructing a second first-order matrix according to the origin point coordinates of the seed points;
    determining a rotation matrix corresponding to the target object according to the first first-order matrix and the second first-order matrix based on a least squares method;
    obtaining three rotational degrees of freedom of the target object according to the rotation matrix;
    calculating average coordinates according to the center point coordinates of each seed point, and taking the three coordinate values of the average coordinates as the three displacement degrees of freedom of the target object;
    determining the 6D pose of the target object according to the three rotational degrees of freedom and the three displacement degrees of freedom.
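One common least-squares solution for the rotation between two stacked point matrices is the SVD-based Kabsch alignment sketched below; the claim does not name a specific solver, so this realization and its names are assumptions.

```python
import numpy as np

def rotation_from_point_sets(origins, centers):
    """Least-squares rotation R aligning the origin-point matrix to the
    center-point matrix (Kabsch algorithm on two (n, 3) point sets).
    """
    a = origins - origins.mean(axis=0)           # center both point sets
    b = centers - centers.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)            # SVD of the cross-covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # proper rotation matrix

def pose_6d(origins, centers):
    """Three rotational DOF from R, three displacement DOF from the mean center point."""
    rot = rotation_from_point_sets(origins, centers)
    trans = centers.mean(axis=0)                 # average of the center point coordinates
    return rot, trans
```

The three rotation angles can then be read off rot (for example as Euler angles), matching the claim's three rotational degrees of freedom.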
  4. The method according to claim 1, characterized in that determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object comprises:
    determining original coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
    determining center-of-gravity coordinates of the target object according to the original coordinates of each input point;
    for each input point, determining an offset value in each dimension between the original coordinates of the input point and the center-of-gravity coordinates;
    determining the target coordinates of the input point according to the offset values in each dimension.
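In other words, the target coordinates are the per-axis offsets from the cloud's center of gravity, as in this short sketch (names illustrative):

```python
import numpy as np

def to_target_coordinates(points):
    """Target coordinates = offsets of each input point's original coordinates
    from the object's center-of-gravity coordinates, per dimension.
    """
    centroid = points.mean(axis=0)   # center of gravity of all input points
    return points - centroid         # (N, 3) per-dimension offset values
```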
  5. The method according to claim 1, characterized in that the downsampling method is a farthest point sampling method, and generating seed points according to the target coordinates of each input point based on the downsampling method comprises:
    determining a center point of the 3D point cloud according to the original coordinates of each input point in the target object;
    selecting the input point with the largest Euclidean distance from the center point of the 3D point cloud as the first seed point;
    taking the first seed point as a reference, selecting the input point with the largest Euclidean distance from the already determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
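A minimal farthest point sampling sketch consistent with this claim follows: it seeds from the point farthest from the cloud's center, then greedily maximizes the distance to all chosen seeds. The threshold k and the names are illustrative.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Select k seed points by farthest point sampling.

    The first seed is the point with the largest Euclidean distance from the
    cloud's center; each further seed maximizes the distance to the seeds so far.
    """
    center = points.mean(axis=0)
    first = int(np.argmax(np.linalg.norm(points - center, axis=1)))
    chosen = [first]
    min_dist = np.linalg.norm(points - points[first], axis=1)  # distance to nearest seed
    while len(chosen) < k:                                     # k: preset threshold
        nxt = int(np.argmax(min_dist))                         # farthest from all seeds
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]
```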
  6. The method according to claim 1, characterized in that training of the deep neural network model comprises:
    inputting coordinate information of sample seed points into the deep neural network model to obtain a predicted displacement deviation and a predicted rotation deviation output by the deep neural network model;
    determining predicted center point coordinates of the sample seed points according to the coordinate information of the sample seed points and the predicted displacement deviation, and determining predicted origin point coordinates of the sample seed points according to the coordinate information of the sample seed points and the predicted rotation deviation;
    calculating a loss L through the following loss function, and updating parameters of the deep neural network model according to the loss L:
    L = λ_1·L_center + λ_2·L_initial;
    where n is the total number of sample seed points, L_center is the displacement-offset loss value, L_initial is the rotation-offset loss value, λ_1 is the preset weight of the displacement-offset loss value, λ_2 is the preset weight of the rotation-offset loss value, Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point, Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point, Δq_i* is the true offset of the coordinates of the i-th sample seed point from the center point coordinates, and Δr_i* is the true offset of the coordinates of the i-th sample seed point from the origin point coordinates. The per-term formulas appear in the application only as images (PCTCN2021122454-appb-100001 to -100003); a mean-norm form consistent with these definitions is
    L_center = (1/n)·Σ_{i=1..n} ‖Δq_i − Δq_i*‖ and L_initial = (1/n)·Σ_{i=1..n} ‖Δr_i − Δr_i*‖.
  7. The method according to any one of claims 1 to 6, characterized in that before determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object, the method comprises:
    collecting an RGB image of the target object through an image acquisition device;
    determining a depth image of the target object according to the RGB image, and performing instance segmentation on the RGB image to obtain a corresponding category-level mask region;
    determining the 3D point cloud corresponding to the target object according to the category-level mask region, intrinsic parameters of the image acquisition device, and the depth image.
  8. The method according to any one of claims 1 to 6, characterized in that before determining the target coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object, the method comprises:
    determining a category-level rectangular region corresponding to the target object according to the obtained RGB image of the target object and an object detection algorithm;
    cropping the category-level rectangular region with near and far planes to obtain a frustum region point cloud of the target object;
    performing semantic segmentation on the frustum region point cloud based on a semantic segmentation model to determine the 3D point cloud corresponding to the target object.
  9. An apparatus for determining a pose of a target object, characterized in that the apparatus comprises:
    a determination module, configured to determine target coordinates of each input point in the target object according to a 3D point cloud corresponding to the target object;
    a generation module, configured to generate seed points according to the target coordinates of each input point based on a downsampling method;
    an input module, configured to input original coordinates of each seed point into a deep neural network model to obtain center point coordinates and origin point coordinates of each seed point;
    an execution module, configured to determine a 6D pose of the target object according to the center point coordinates and the origin point coordinates of each seed point.
  10. The apparatus according to claim 9, characterized in that the deep neural network model generates the center point coordinates and the origin point coordinates of each seed point in the following manner:
    determining a displacement deviation and a rotation deviation corresponding to each seed point according to the original coordinates of the seed point;
    determining the center point coordinates of each seed point according to the original coordinates of the seed point and the corresponding displacement deviation, and determining the origin point coordinates of the seed point according to the original coordinates of the seed point and the corresponding rotation deviation.
  11. The apparatus according to claim 9, characterized in that the execution module is configured to:
    construct a first first-order matrix according to the center point coordinates of the seed points, and construct a second first-order matrix according to the origin point coordinates of the seed points;
    determine a rotation matrix corresponding to the target object according to the first first-order matrix and the second first-order matrix based on a least squares method;
    obtain three rotational degrees of freedom of the target object according to the rotation matrix;
    calculate average coordinates according to the center point coordinates of each seed point, and take the three coordinate values of the average coordinates as the three displacement degrees of freedom of the target object;
    determine the 6D pose of the target object according to the three rotational degrees of freedom and the three displacement degrees of freedom.
  12. The apparatus according to claim 9, characterized in that the determination module is configured to:
    determine original coordinates of each input point in the target object according to the 3D point cloud corresponding to the target object;
    determine center-of-gravity coordinates of the target object according to the original coordinates of each input point;
    for each input point, determine an offset value in each dimension between the original coordinates of the input point and the center-of-gravity coordinates;
    determine the target coordinates of the input point according to the offset values in each dimension.
  13. The apparatus according to claim 9, characterized in that the generation module is configured to:
    determine a center point of the 3D point cloud according to the original coordinates of each input point in the target object;
    select the input point with the largest Euclidean distance from the center point of the 3D point cloud as the first seed point;
    taking the first seed point as a reference, select the input point with the largest Euclidean distance from the already determined seed points as a new seed point, until the number of seed points reaches a preset threshold.
  14. The apparatus according to claim 9, characterized in that training of the deep neural network model comprises:
    inputting coordinate information of sample seed points into the deep neural network model to obtain a predicted displacement deviation and a predicted rotation deviation output by the deep neural network model;
    determining predicted center point coordinates of the sample seed points according to the coordinate information of the sample seed points and the predicted displacement deviation, and determining predicted origin point coordinates of the sample seed points according to the coordinate information of the sample seed points and the predicted rotation deviation;
    calculating a loss L through the following loss function, and updating parameters of the deep neural network model according to the loss L:
    L = λ_1·L_center + λ_2·L_initial;
    where n is the total number of sample seed points, L_center is the displacement-offset loss value, L_initial is the rotation-offset loss value, λ_1 is the preset weight of the displacement-offset loss value, λ_2 is the preset weight of the rotation-offset loss value, Δq_i is the predicted displacement deviation corresponding to the i-th sample seed point, Δr_i is the predicted rotation deviation corresponding to the i-th sample seed point, Δq_i* is the true offset of the coordinates of the i-th sample seed point from the center point coordinates, and Δr_i* is the true offset of the coordinates of the i-th sample seed point from the origin point coordinates. The per-term formulas appear in the application only as images (PCTCN2021122454-appb-100004 to -100006); a mean-norm form consistent with these definitions is
    L_center = (1/n)·Σ_{i=1..n} ‖Δq_i − Δq_i*‖ and L_initial = (1/n)·Σ_{i=1..n} ‖Δr_i − Δr_i*‖.
  15. The apparatus according to any one of claims 9 to 14, characterized in that the apparatus further comprises a collection module configured to:
    collect an RGB image of the target object through an image acquisition device;
    determine a depth image of the target object according to the RGB image, and perform instance segmentation on the RGB image to obtain a corresponding category-level mask region;
    determine the 3D point cloud corresponding to the target object according to the category-level mask region, intrinsic parameters of the image acquisition device, and the depth image.
  16. The apparatus according to any one of claims 9 to 14, characterized in that the apparatus further comprises an acquisition module configured to:
    determine a category-level rectangular region corresponding to the target object according to the obtained RGB image of the target object and an object detection algorithm;
    crop the category-level rectangular region with near and far planes to obtain a frustum region point cloud of the target object;
    perform semantic segmentation on the frustum region point cloud based on a semantic segmentation model to determine the 3D point cloud corresponding to the target object.
  17. A computer program, characterized by comprising computer-readable code which, when run on a computing processing device, causes the computing processing device to perform the method according to any one of claims 1 to 8.
  18. A computer-readable storage medium having stored thereon the computer program according to claim 17, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
  19. An electronic device, characterized by comprising:
    a memory having stored thereon the computer program according to claim 17; and
    a processor configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1 to 8.
PCT/CN2021/122454 2020-12-02 2021-09-30 Method and apparatus for determining pose of target object, storage medium and electronic device WO2022116678A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011401984.6 2020-12-02
CN202011401984.6A CN112435297B (en) 2020-12-02 2020-12-02 Target object pose determining method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
WO2022116678A1 true WO2022116678A1 (en) 2022-06-09

Family

ID=74691510

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/122454 WO2022116678A1 (en) 2020-12-02 2021-09-30 Method and apparatus for determining pose of target object, storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN112435297B (en)
WO (1) WO2022116678A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112435297B (en) * 2020-12-02 2023-04-18 达闼机器人股份有限公司 Target object pose determining method and device, storage medium and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6736362B2 (en) * 2016-06-03 2020-08-05 キヤノン株式会社 Image processing device, image processing method, and program
JP6955783B2 (en) * 2018-01-10 2021-10-27 達闥機器人有限公司Cloudminds (Shanghai) Robotics Co., Ltd. Information processing methods, equipment, cloud processing devices and computer program products
CN110660101B (en) * 2019-08-19 2022-06-07 浙江理工大学 Object 6D posture prediction method based on RGB image and coordinate system transformation
CN111243017B (en) * 2019-12-24 2024-05-10 广州中国科学院先进技术研究所 Intelligent robot grabbing method based on 3D vision
CN111259934B (en) * 2020-01-09 2023-04-07 清华大学深圳国际研究生院 Stacked object 6D pose estimation method and device based on deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190278983A1 (en) * 2018-03-12 2019-09-12 Nvidia Corporation Three-dimensional (3d) pose estimation from a monocular camera
CN109829947A (en) * 2019-02-25 2019-05-31 Beijing Kuangshi Technology Co., Ltd. Pose determination method, tray loading method, apparatus, medium and electronic device
CN110322510A (en) * 2019-06-27 2019-10-11 6D pose estimation method using profile information
CN111145253A (en) * 2019-12-12 2020-05-12 深圳先进技术研究院 Efficient object 6D attitude estimation algorithm
CN111899301A (en) * 2020-06-02 2020-11-06 广州中国科学院先进技术研究所 Workpiece 6D pose estimation method based on deep learning
CN112435297A (en) * 2020-12-02 2021-03-02 达闼机器人有限公司 Target object pose determining method and device, storage medium and electronic equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245944A (en) * 2022-12-29 2023-06-09 南京航空航天大学 Cabin automatic docking method and system based on measured data
CN116245944B (en) * 2022-12-29 2024-01-05 南京航空航天大学 Cabin automatic docking method and system based on measured data
CN116245950A (en) * 2023-05-11 2023-06-09 合肥高维数据技术有限公司 Screen corner positioning method for full screen or single corner deletion
CN116245950B (en) * 2023-05-11 2023-08-01 合肥高维数据技术有限公司 Screen corner positioning method for full screen or single corner deletion

Also Published As

Publication number Publication date
CN112435297B (en) 2023-04-18
CN112435297A (en) 2021-03-02

Similar Documents

Publication Publication Date Title
WO2022116678A1 (en) Method and apparatus for determining pose of target object, storage medium and electronic device
CN108509848B (en) The real-time detection method and system of three-dimension object
JP7373554B2 (en) Cross-domain image transformation
Zhang et al. Guided mesh normal filtering
WO2022116677A1 (en) Target object grasping method and apparatus, storage medium, and electronic device
US20210012093A1 (en) Method and apparatus for generating face rotation image
CN110675487B (en) Three-dimensional face modeling and recognition method and device based on multi-angle two-dimensional face
JP5705147B2 (en) Representing 3D objects or objects using descriptors
Tejani et al. Latent-class hough forests for 6 DoF object pose estimation
CN110348454B (en) Matching local image feature descriptors
WO2016054779A1 (en) Spatial pyramid pooling networks for image processing
US10311099B2 (en) Method and system for 3D model database retrieval
US9311756B2 (en) Image group processing and visualization
JP2018523881A (en) Method and system for aligning data
CN108875133A (en) Determine architectural composition
JP7453470B2 (en) 3D reconstruction and related interactions, measurement methods and related devices and equipment
CN112150551A (en) Object pose acquisition method and device and electronic equipment
TW201616451A (en) System and method for selecting point clouds using a free selection tool
CN112085033A (en) Template matching method and device, electronic equipment and storage medium
CN110956131B (en) Single-target tracking method, device and system
CN116309880A (en) Object pose determining method, device, equipment and medium based on three-dimensional reconstruction
CN111382791B (en) Deep learning task processing method, image recognition task processing method and device
JP2014102746A (en) Subject recognition device and subject recognition program
CN113793370B (en) Three-dimensional point cloud registration method and device, electronic equipment and readable medium
CN117372604A (en) 3D face model generation method, device, equipment and readable storage medium

Legal Events

121 — EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21899712; Country of ref document: EP; Kind code of ref document: A1)
NENP — Non-entry into the national phase (Ref country code: DE)
122 — EP: PCT application non-entry in European phase (Ref document number: 21899712; Country of ref document: EP; Kind code of ref document: A1)
32PN — EP: public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.11.2023))