WO2023273272A1 - Target pose estimation method and apparatus, computing device, storage medium, and computer program - Google Patents


Info

Publication number
WO2023273272A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
model
detection area
rgb image
pose estimation
Prior art date
Application number
PCT/CN2021/143442
Other languages
French (fr)
Chinese (zh)
Inventor
杨佳丽
杜国光
赵开勇
Original Assignee
达闼科技(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 达闼科技(北京)有限公司
Publication of WO2023273272A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose

Definitions

  • Embodiments of the present disclosure relate to the technical field of computer vision, and specifically relate to a target pose estimation method, device, computing device, storage medium, and computer program.
  • Robot grasping has great application value in both industrial and home scenarios.
  • Pose estimation of the object to be grasped is an important factor in whether grasping succeeds.
  • Existing pose estimation methods are generally divided into feature matching methods, template methods, and deep learning-based methods.
  • The feature matching method usually computes and matches feature points between the 3D model and the 2D image, and then uses a PnP method to compute the pose.
  • The template method models the 3D model of the object to be grasped from various viewpoints and estimates the pose by matching captured images against the templates.
  • Deep-learning-based methods usually first collect a large number of color and depth images of the object to be grasped in various poses to create a dataset, and then train a convolutional neural network to directly or indirectly estimate the pose of the object to be grasped.
  • However, current algorithms still fall short when grasping real objects.
  • The feature matching method often requires heavy computation, so the algorithm runs slowly. Moreover, the success of feature point selection and matching directly affects the accuracy of pose estimation, and for objects with few feature points the algorithm often cannot obtain accurate and stable results.
  • Template matching methods often require producing a large number of templates; since pose estimation is essentially a regression problem, the accuracy of the algorithm is roughly proportional to the number of templates, making it hard to strike a balance.
  • Deep-learning-based methods regress the object pose directly through a convolutional neural network; most existing deep learning methods are instance-level and generalize poorly.
  • In view of these problems, embodiments of the present disclosure provide a target pose estimation method, device, computing device, and storage medium that overcome, or at least partially solve, the above problems.
  • A target pose estimation method includes: performing 2D detection according to an RGB image and a depth image to obtain a detection area of the target; acquiring a normalized model of the target from the RGB image in the detection area; acquiring size information of the target according to the depth image in the detection area; and fusing the size information with the normalized model to obtain a 3D model and applying a PnP algorithm according to the 3D model to obtain the pose information of the target.
  • Performing 2D detection according to the RGB image and the depth image to obtain the detection area of the target includes: applying a pre-built first convolutional neural network to process the RGB image to obtain the detection area of the target in the RGB image; and acquiring, in the depth image, the detection area corresponding to the same target as in the RGB image.
  • Acquiring the normalized model of the target from the RGB image in the detection area includes: applying a first network structure to process the RGB image in the detection area to obtain the normalized model of the target.
  • Applying the first network structure to process the RGB image in the detection area to obtain the normalized model of the target includes: applying multiple groups of convolution + convolution + downsampling to downsample the RGB image in the detection area and performing a convolution operation on the lowest-resolution feature map; and applying multiple groups of upsampling + convolution + convolution to restore the resolution of the RGB image in the detection area to its original size, performing a preset number of convolution operations to obtain the normalized model of the target.
  • Acquiring the size information of the target according to the depth image in the detection area includes: converting the depth image in the detection area into a point cloud, and applying a second network structure to process the point cloud to obtain the size information of the target.
  • Fusing the size information with the normalized model to obtain the 3D model includes: calculating the 3D model from the size information and the normalized model using the relations x′ = x × w, y′ = y × l, z′ = z × h, where (x, y, z) are the coordinates of the normalized model, (x′, y′, z′) are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, h denoting the width, length, and height of the target, respectively.
  • Applying the PnP algorithm according to the 3D model to obtain the pose information of the target includes: applying the PnP algorithm to match the coordinates of the 3D model with the 2D image to obtain the pose information of the target.
  • A target pose estimation device includes: a 2D detection unit configured to perform 2D detection according to an RGB image and a depth image to obtain a detection area of a target; a normalization unit configured to acquire a normalized model of the target from the RGB image in the detection area; a size acquisition unit configured to acquire size information of the target according to the depth image in the detection area; and a pose estimation unit configured to fuse the size information with the normalized model to obtain a 3D model and apply a PnP algorithm according to the 3D model to obtain the pose information of the target.
  • The 2D detection unit is configured to: apply a pre-built first convolutional neural network to process the RGB image to obtain the detection area of the target in the RGB image; and acquire, in the depth image, the detection area corresponding to the same target as in the RGB image.
  • The normalization unit is configured to: apply a first network structure to process the RGB image in the detection area to obtain the normalized model of the target.
  • The normalization unit is further configured to: apply multiple groups of convolution + convolution + downsampling to downsample the RGB image in the detection area and perform a convolution operation on the lowest-resolution feature map; and apply multiple groups of upsampling + convolution + convolution to restore the resolution of the RGB image in the detection area to its original size, performing a preset number of convolution operations to obtain the normalized model of the target.
  • The size acquisition unit is configured to: convert the depth image in the detection area into a point cloud, and apply a second network structure to process the point cloud to obtain the size information of the target.
  • The pose estimation unit is configured to: calculate the 3D model from the size information and the normalized model using the relations x′ = x × w, y′ = y × l, z′ = z × h, where (x, y, z) are the coordinates of the normalized model, (x′, y′, z′) are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, h denoting the width, length, and height of the target, respectively.
  • The pose estimation unit is further configured to: apply a PnP algorithm to match the coordinates of the 3D model with the 2D image to obtain the pose information of the target.
  • A computing device includes: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with each other through the communication bus.
  • The memory is used to store at least one executable instruction, and the executable instruction causes the processor to execute the steps of the above target pose estimation method.
  • A computer storage medium stores at least one executable instruction, and the executable instruction causes a processor to execute the steps of the above target pose estimation method.
  • A computer program includes instructions which, when run on a computer, cause the computer to execute the above target pose estimation method.
  • The target pose estimation method in the embodiments of the present disclosure includes: performing 2D detection according to the RGB image and the depth image to obtain the detection area of the target; acquiring the normalized model of the target from the RGB image in the detection area; acquiring the size information of the target according to the depth image in the detection area; and fusing the size information with the normalized model to obtain a 3D model and applying a PnP algorithm according to the 3D model to obtain the pose information of the target. In this way, the pose information of the target object can be accurately obtained, which facilitates grasping the target object and improves the user experience.
  • FIG. 1 shows a schematic flow diagram of a target pose estimation method provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic diagram of the first convolutional neural network in the target pose estimation method provided by an embodiment of the present disclosure
  • FIG. 3 shows a schematic diagram of the first network structure in the target pose estimation method provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of acquiring size information in the target pose estimation method provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic structural diagram of a target pose estimation device provided by an embodiment of the present disclosure;
  • FIG. 6 shows a schematic structural diagram of a computing device provided by an embodiment of the present disclosure.
  • Fig. 1 shows a schematic flowchart of a method for estimating a target pose provided by an embodiment of the present disclosure. As shown in Fig. 1, the method includes:
  • Step S11: Perform 2D detection according to the RGB image and the depth image to obtain the detection area of the target.
  • The RGB image is processed by applying a pre-built first convolutional neural network to obtain the detection area of the target in the RGB image, and the detection area corresponding to the same target as in the RGB image is acquired in the depth image.
  • The first convolutional neural network is not limited to a specific detection or segmentation method; the goal is to obtain the specific area of the target (the object to be grasped) in the image and reduce background interference for subsequent pose estimation.
  • Before the first convolutional neural network is applied, it needs to be constructed.
  • First, a dataset is built: RGB images of the object to be grasped are collected under different environmental backgrounds, and each RGB image is annotated with the best-fitting bounding box (x, y, w, h) and object category id. Then the large set of RGB image data is used to train a convolutional neural network (CNN), yielding the first convolutional neural network model.
  • The network structure of the first convolutional neural network is shown in Figure 2; the network has 31 layers, and the image block is scaled to a 448x448-pixel block as the network input.
  • Step S12: Acquire the normalized model of the target from the RGB image in the detection area.
  • A first network structure is applied to process the RGB image in the detection area to obtain the normalized model of the target.
  • The specific structure of the first network is shown in Figure 3: multiple groups of convolution + convolution + downsampling downsample the RGB image in the detection area, a convolution operation is performed on the lowest-resolution feature map, multiple groups of upsampling + convolution + convolution restore the resolution of the RGB image in the detection area to its original size, and a preset number of convolution operations yields the normalized model of the target.
  • Preferably, the normalized model of the target is output after two consecutive final convolution operations.
  • On the basis of 2D detection, the embodiments of the present disclosure use a U-net network structure to regress the normalized map, which can greatly improve the accuracy of the algorithm.
  • Step S13: Acquire the size information of the target according to the depth image in the detection area.
  • The depth image in the detection area is converted into a point cloud using the pinhole camera model:
  • X = (x′ - c_x) × D / f_x, Y = (y′ - c_y) × D / f_y, Z = D,
  • where (X, Y, Z) are the point cloud coordinates, (x′, y′) are the image coordinates, D is the depth value, f_x and f_y are the focal lengths, and c_x and c_y are the principal point offsets.
  • The second network structure preferably consists of a PointNet++ network followed by a convolutional layer and a fully connected layer.
  • Through the PointNet++ network, the size of the object can be regressed, represented by S(w, l, h); the subsequent convolutional layer and fully connected layer recover the size information of the object.
  • Step S14: Fuse the size information with the normalized model to obtain a 3D model, and apply a PnP algorithm according to the 3D model to obtain the pose information of the target.
  • the complete 3D information of the target can be obtained by fusing the normalized model and the object size information.
  • The 3D model is calculated from the size information and the normalized model using the relations x′ = x × w, y′ = y × l, z′ = z × h, where (x, y, z) are the coordinates of the normalized model, (x′, y′, z′) are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, h denoting the width, length, and height of the target, respectively.
  • the PnP algorithm may be any existing PnP algorithm capable of realizing the above-mentioned functions, which will not be repeated here.
  • Size information is recovered from the depth map through PointNet++, which adds prior information for the post-processing algorithm and yields higher-precision results.
  • The embodiments of the present disclosure use the RGB image to obtain the object category, segmentation result, and normalized model through convolutional neural networks; the depth image and the segmentation result yield the object size information; the size information and the normalized model are fused into a 3D model; and finally the pose information is obtained through PnP.
  • T(x, y, z) is used to represent position information in three-dimensional space
  • a rotation matrix R is used to represent three-axis rotation in three-dimensional space.
  • Using the normalized model solves the problems that objects of the same category vary in size and that camera scaling prevents recovering the exact size of the object; combined with the size recovered from the depth map, it addresses the instance-level limitation of current deep learning methods.
  • The target pose estimation method in the embodiments of the present disclosure includes: performing 2D detection according to the RGB image and the depth image to obtain the detection area of the target; acquiring the normalized model of the target from the RGB image in the detection area; acquiring the size information of the target according to the depth image in the detection area; and fusing the size information with the normalized model to obtain a 3D model and applying a PnP algorithm according to the 3D model to obtain the pose information of the target. In this way, the pose information of the target object can be accurately obtained, which facilitates grasping the target object and improves the user experience.
  • Fig. 5 shows a schematic structural diagram of a target pose estimation device according to an embodiment of the present disclosure. As shown in Fig. 5, the device includes: a 2D detection unit 501, a normalization unit 502, a size acquisition unit 503, and a pose estimation unit 504.
  • the 2D detection unit 501 is used to perform 2D detection according to the RGB image and the depth image to obtain the detection area of the target;
  • the normalization unit 502 is used to obtain the normalized model of the target from the RGB image in the detection area;
  • the size acquisition unit 503 is configured to acquire size information of the target according to the depth image in the detection area;
  • The pose estimation unit 504 is configured to fuse the size information with the normalized model to obtain a 3D model, and apply a PnP algorithm according to the 3D model to obtain the pose information of the target.
  • The 2D detection unit 501 is configured to: apply a pre-built first convolutional neural network to process the RGB image to obtain the detection area of the target in the RGB image; and acquire, in the depth image, the detection area corresponding to the same target as in the RGB image.
  • The normalization unit 502 is configured to: apply a first network structure to process the RGB image in the detection area to obtain the normalized model of the target.
  • The normalization unit 502 is further configured to: apply multiple groups of convolution + convolution + downsampling to downsample the RGB image in the detection area and perform a convolution operation on the lowest-resolution feature map; and apply multiple groups of upsampling + convolution + convolution to restore the resolution of the RGB image in the detection area to its original size, performing a preset number of convolution operations to obtain the normalized model of the target.
  • The size acquisition unit 503 is configured to: convert the depth image in the detection area into a point cloud, and apply a second network structure to process the point cloud to obtain the size information of the target.
  • The pose estimation unit 504 is configured to: calculate the 3D model from the size information and the normalized model using the relations x′ = x × w, y′ = y × l, z′ = z × h, where (x, y, z) are the coordinates of the normalized model, (x′, y′, z′) are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, h denoting the width, length, and height of the target, respectively.
  • the pose estimation unit 504 is configured to: apply a PnP algorithm to match the coordinates of the 3D model with the 2D image, and acquire pose information of the target.
  • The target pose estimation method in the embodiments of the present disclosure includes: performing 2D detection according to the RGB image and the depth image to obtain the detection area of the target; acquiring the normalized model of the target from the RGB image in the detection area; acquiring the size information of the target according to the depth image in the detection area; and fusing the size information with the normalized model to obtain a 3D model and applying a PnP algorithm according to the 3D model to obtain the pose information of the target. In this way, the pose information of the target object can be accurately obtained, which facilitates grasping the target object and improves the user experience.
  • An embodiment of the present disclosure provides a non-volatile computer storage medium, where at least one executable instruction is stored in the computer storage medium, and the computer executable instruction can execute the target pose estimation method in any method embodiment above.
  • The executable instruction can specifically be used to cause the processor to perform the following operations: performing 2D detection according to an RGB image and a depth image to obtain a detection area of a target; acquiring a normalized model of the target from the RGB image in the detection area; acquiring size information of the target according to the depth image in the detection area; and fusing the size information with the normalized model to obtain a 3D model and applying a PnP algorithm according to the 3D model to obtain the pose information of the target.
  • In an optional manner, the executable instructions cause the processor to: apply a pre-built first convolutional neural network to process the RGB image to obtain the detection area of the target in the RGB image; and acquire, in the depth image, the detection area corresponding to the same target as in the RGB image.
  • In an optional manner, the executable instructions cause the processor to: apply a first network structure to process the RGB image in the detection area to obtain the normalized model of the target.
  • In an optional manner, the executable instructions cause the processor to: apply multiple groups of convolution + convolution + downsampling to downsample the RGB image in the detection area and perform a convolution operation on the lowest-resolution feature map; and apply multiple groups of upsampling + convolution + convolution to restore the resolution of the RGB image in the detection area to its original size, performing a preset number of convolution operations to obtain the normalized model of the target.
  • In an optional manner, the executable instructions cause the processor to: convert the depth image in the detection area into a point cloud, and apply a second network structure to process the point cloud to obtain the size information of the target.
  • In an optional manner, the executable instructions cause the processor to: calculate the 3D model from the size information and the normalized model using x′ = x × w, y′ = y × l, z′ = z × h, where (x, y, z) are the coordinates of the normalized model, (x′, y′, z′) are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, h denoting the width, length, and height of the target, respectively.
  • In an optional manner, the executable instructions cause the processor to: apply a PnP algorithm to match the coordinates of the 3D model with the 2D image to obtain the pose information of the target.
  • The target pose estimation method in the embodiments of the present disclosure includes: performing 2D detection according to the RGB image and the depth image to obtain the detection area of the target; acquiring the normalized model of the target from the RGB image in the detection area; acquiring the size information of the target according to the depth image in the detection area; and fusing the size information with the normalized model to obtain a 3D model and applying a PnP algorithm according to the 3D model to obtain the pose information of the target. In this way, the pose information of the target object can be accurately obtained, which facilitates grasping the target object and improves the user experience.
  • Fig. 6 shows a schematic structural diagram of a computing device according to an embodiment of the present disclosure; the specific embodiments of the present disclosure do not limit the specific implementation of the computing device.
  • As shown in Fig. 6, the computing device may include: a processor 602, a communication interface 604, a memory 606, and a communication bus 608.
  • The processor 602, the communication interface 604, and the memory 606 communicate with each other through the communication bus 608.
  • the communication interface 604 is used to communicate with network elements of other devices such as clients or other servers.
  • the processor 602 is configured to execute the program 610, and specifically, may execute relevant steps in the above embodiment of the target pose estimation method.
  • The program 610 may include program code, including computer operation instructions.
  • The processor 602 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.
  • The one or more processors included in the device may be of the same type, such as one or more CPUs, or may be different types of processors, such as one or more CPUs and one or more ASICs.
  • The memory 606 is used for storing the program 610.
  • The memory 606 may include high-speed RAM memory and may also include non-volatile memory, such as at least one disk storage.
  • The program 610 can specifically be used to cause the processor 602 to perform the following operations: performing 2D detection according to an RGB image and a depth image to obtain a detection area of a target; acquiring a normalized model of the target from the RGB image in the detection area; acquiring size information of the target according to the depth image in the detection area; and fusing the size information with the normalized model to obtain a 3D model and applying a PnP algorithm according to the 3D model to obtain the pose information of the target.
  • In an optional manner, the program 610 causes the processor 602 to: apply a pre-built first convolutional neural network to process the RGB image to obtain the detection area of the target in the RGB image; and acquire, in the depth image, the detection area corresponding to the same target as in the RGB image.
  • In an optional manner, the program 610 causes the processor 602 to: apply a first network structure to process the RGB image in the detection area to obtain the normalized model of the target.
  • In an optional manner, the program 610 causes the processor 602 to: apply multiple groups of convolution + convolution + downsampling to downsample the RGB image in the detection area and perform a convolution operation on the lowest-resolution feature map; and apply multiple groups of upsampling + convolution + convolution to restore the resolution of the RGB image in the detection area to its original size, performing a preset number of convolution operations to obtain the normalized model of the target.
  • In an optional manner, the program 610 causes the processor 602 to: convert the depth image in the detection area into a point cloud, and apply a second network structure to process the point cloud to obtain the size information of the target.
  • In an optional manner, the program 610 causes the processor 602 to: calculate the 3D model from the size information and the normalized model using x′ = x × w, y′ = y × l, z′ = z × h, where (x, y, z) are the coordinates of the normalized model, (x′, y′, z′) are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, h denoting the width, length, and height of the target, respectively.
  • In an optional manner, the program 610 causes the processor 602 to: apply a PnP algorithm to match the coordinates of the 3D model with the 2D image to obtain the pose information of the target.
  • The target pose estimation method in the embodiments of the present disclosure includes: performing 2D detection according to the RGB image and the depth image to obtain the detection area of the target; acquiring the normalized model of the target from the RGB image in the detection area; acquiring the size information of the target according to the depth image in the detection area; and fusing the size information with the normalized model to obtain a 3D model and applying a PnP algorithm according to the 3D model to obtain the pose information of the target. In this way, the pose information of the target object can be accurately obtained, which facilitates grasping the target object and improves the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the technical field of computer vision, and provides a target pose estimation method and apparatus, a computing device, a storage medium, and a computer program. The method comprises: performing 2D detection according to an RGB image and a depth image to obtain a detection region of a target; obtaining a normalization model of the target by using the RGB image in the detection region; obtaining size information of the target according to the depth image in the detection region; fusing the size information and the normalization model to obtain a 3D model, and applying a PnP algorithm according to the 3D model to obtain pose information of the target. By means of the method, in embodiments of the present invention, pose information of a target object can be accurately obtained, the target object can be conveniently grabbed, and the user experience is improved.

Description

Target pose estimation method, device, computing device, storage medium and computer program
Cross Reference
This application claims priority to the Chinese patent application No. 202110743454.8, filed on June 30, 2021 and entitled "Target pose estimation method, device, computing device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the technical field of computer vision, and in particular to a target pose estimation method, device, computing device, storage medium, and computer program.
Background
Intelligent robots must not only perceive the surrounding world but also interact with the environment, and grasping is an indispensable capability. Robot grasping has great application value in both industrial and home scenarios, and pose estimation of the object to be grasped is an important factor in whether grasping succeeds. Existing pose estimation methods generally fall into feature matching methods, template methods, and deep-learning-based methods. The feature matching method usually computes and matches feature points between the 3D model and the 2D image, and then uses a PnP method to compute the pose. The template method models the 3D model of the object to be grasped from various viewpoints and estimates the pose by matching captured images against the templates. Deep-learning-based methods usually first collect a large number of color and depth images of the object to be grasped in various poses to create a dataset, and then train a convolutional neural network to directly or indirectly estimate the pose of the object to be grasped.
However, current algorithms still fall short when grasping real objects. The feature matching method often requires heavy computation, so the algorithm runs slowly. Moreover, the success of feature point selection and matching directly affects the accuracy of pose estimation, and for objects with few feature points the algorithm often cannot obtain accurate and stable results. Template matching methods often require producing a large number of templates, and since pose estimation is essentially a regression problem, the accuracy of the algorithm is roughly proportional to the number of templates, making it hard to strike a balance. Deep-learning-based methods regress the object pose directly through a convolutional neural network; most existing deep learning methods are instance-level and generalize poorly.
Summary
In view of the above problems, embodiments of the present disclosure provide a target pose estimation method, device, computing device, and storage medium that overcome, or at least partially solve, the above problems.
According to one aspect of the embodiments of the present disclosure, a target pose estimation method is provided, the method including: performing 2D detection according to an RGB image and a depth image to obtain a detection area of a target; acquiring a normalized model of the target from the RGB image in the detection area; acquiring size information of the target according to the depth image in the detection area; and fusing the size information with the normalized model to obtain a 3D model and applying a PnP algorithm according to the 3D model to obtain pose information of the target.
In an optional manner, performing 2D detection according to the RGB image and the depth image to obtain the detection area of the target includes: applying a pre-built first convolutional neural network to process the RGB image to obtain the detection area of the target in the RGB image; and acquiring, in the depth image, the detection area corresponding to the same target as in the RGB image.

In an optional manner, acquiring the normalized model of the target from the RGB image in the detection area includes: applying a first network structure to process the RGB image in the detection area to obtain the normalized model of the target.

In an optional manner, applying the first network structure to process the RGB image in the detection area to obtain the normalized model of the target includes: applying multiple groups of convolution + convolution + downsampling to downsample the RGB image in the detection area and performing a convolution operation on the lowest-resolution feature map; and applying multiple groups of upsampling + convolution + convolution to restore the resolution of the RGB image in the detection area to its original size, performing a preset number of convolution operations to obtain the normalized model of the target.

In an optional manner, acquiring the size information of the target according to the depth image in the detection area includes: converting the depth image in the detection area into a point cloud; and applying a second network structure to process the point cloud to obtain the size information of the target.
In an optional manner, fusing the size information with the normalized model to obtain the 3D model includes: calculating the 3D model from the size information and the normalized model using the following relations:

x′ = x × w,

y′ = y × l,

z′ = z × h,

where (x, y, z) are the coordinates of the normalized model, (x′, y′, z′) are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, h denoting the width, length, and height of the target, respectively.

In an optional manner, applying the PnP algorithm according to the 3D model to obtain the pose information of the target includes: applying the PnP algorithm to match the coordinates of the 3D model with the 2D image to obtain the pose information of the target.
According to another aspect of the embodiments of the present disclosure, a target pose estimation device is provided, including: a 2D detection unit configured to perform 2D detection according to an RGB image and a depth image to obtain a detection area of a target; a normalization unit configured to acquire a normalized model of the target from the RGB image in the detection area; a size acquisition unit configured to acquire size information of the target according to the depth image in the detection area; and a pose estimation unit configured to fuse the size information with the normalized model to obtain a 3D model and apply a PnP algorithm according to the 3D model to obtain pose information of the target.

In an optional manner, the 2D detection unit is configured to: apply a pre-built first convolutional neural network to process the RGB image to obtain the detection area of the target in the RGB image; and acquire, in the depth image, the detection area corresponding to the same target as in the RGB image.

In an optional manner, the normalization unit is configured to: apply a first network structure to process the RGB image in the detection area to obtain the normalized model of the target.

In an optional manner, the normalization unit is configured to: apply multiple groups of convolution + convolution + downsampling to downsample the RGB image in the detection area and perform a convolution operation on the lowest-resolution feature map; and apply multiple groups of upsampling + convolution + convolution to restore the resolution of the RGB image in the detection area to its original size, performing a preset number of convolution operations to obtain the normalized model of the target.

In an optional manner, the size acquisition unit is configured to: convert the depth image in the detection area into a point cloud; and apply a second network structure to process the point cloud to obtain the size information of the target.
In an optional manner, the pose estimation unit is configured to: calculate the 3D model from the size information and the normalized model using the following relations:

x′ = x × w,

y′ = y × l,

z′ = z × h,

where (x, y, z) are the coordinates of the normalized model, (x′, y′, z′) are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, h denoting the width, length, and height of the target, respectively.

In an optional manner, the pose estimation unit is configured to: apply a PnP algorithm to match the coordinates of the 3D model with the 2D image to obtain the pose information of the target.

According to another aspect of the embodiments of the present disclosure, a computing device is provided, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with each other through the communication bus; the memory is used to store at least one executable instruction, and the executable instruction causes the processor to execute the steps of the above target pose estimation method.

According to yet another aspect of the embodiments of the present disclosure, a computer storage medium is provided, storing at least one executable instruction that causes a processor to execute the steps of the above target pose estimation method.

According to yet another aspect of the embodiments of the present disclosure, a computer program is provided, including instructions which, when run on a computer, cause the computer to execute the above target pose estimation method.
The target pose estimation method in the embodiments of the present disclosure includes: performing 2D detection according to the RGB image and the depth image to obtain the detection area of the target; acquiring the normalized model of the target from the RGB image in the detection area; acquiring the size information of the target according to the depth image in the detection area; and fusing the size information with the normalized model to obtain a 3D model and applying a PnP algorithm according to the 3D model to obtain the pose information of the target. In this way, the pose information of the target object can be accurately obtained, which facilitates grasping the target object and improves the user experience.

The above description is only an overview of the technical solutions of the embodiments of the present disclosure. In order that the technical means of the embodiments of the present disclosure may be understood more clearly and implemented according to the contents of the specification, and in order to make the above and other objects, features, and advantages of the embodiments of the present disclosure more apparent, specific implementations of the present disclosure are set forth below.

Description of Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the present disclosure. Throughout the drawings, the same reference numerals designate the same components. In the drawings:
Fig. 1 shows a schematic flowchart of a target pose estimation method provided by an embodiment of the present disclosure;

Fig. 2 shows a schematic diagram of the first convolutional neural network in the target pose estimation method provided by an embodiment of the present disclosure;

Fig. 3 shows a schematic diagram of the first network structure in the target pose estimation method provided by an embodiment of the present disclosure;

Fig. 4 shows a schematic diagram of acquiring size information in the target pose estimation method provided by an embodiment of the present disclosure;

Fig. 5 shows a schematic structural diagram of a target pose estimation device provided by an embodiment of the present disclosure;

Fig. 6 shows a schematic structural diagram of a computing device provided by an embodiment of the present disclosure.
Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for a more thorough understanding of the present disclosure and to fully convey its scope to those skilled in the art.

Fig. 1 shows a schematic flowchart of a target pose estimation method provided by an embodiment of the present disclosure. As shown in Fig. 1, the target pose estimation method includes:

Step S11: Perform 2D detection according to the RGB image and the depth image to obtain the detection area of the target.
In the embodiments of the present disclosure, optionally, a pre-built first convolutional neural network is applied to process the RGB image to obtain the detection area of the target in the RGB image, and the detection area corresponding to the same target as in the RGB image is acquired in the depth image. The first convolutional neural network is not limited to a specific detection or segmentation method; the goal is to obtain the specific area of the target (the object to be grasped) in the image and reduce background interference for subsequent pose estimation.

Before the first convolutional neural network is applied, it needs to be constructed. First, a dataset is built: RGB images of the object to be grasped are collected under different environmental backgrounds, and each RGB image is annotated with the best-fitting bounding box (x, y, w, h) and object category id. Then the large set of RGB image data is used to train a convolutional neural network (CNN), yielding the first convolutional neural network model. The network structure of the first convolutional neural network is shown in Fig. 2; the network has 31 layers, and the image block is scaled to a 448x448-pixel block as the network input.
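As a concrete illustration, the following is a minimal sketch of this detection step, assuming a trained detector `first_cnn` that maps a 448x448 RGB input to one normalized bounding box (x, y, w, h) and a class id; this interface is an assumption for illustration, not the patent's exact network.

```python
import cv2
import numpy as np

def detect_target(first_cnn, rgb: np.ndarray, depth: np.ndarray):
    """Run 2D detection on the RGB image and crop the same area in the depth image."""
    h, w = rgb.shape[:2]
    inp = cv2.resize(rgb, (448, 448)).astype(np.float32) / 255.0
    # Assumed output: a normalized (x, y, w, h) box and an object class id.
    (bx, by, bw, bh), class_id = first_cnn(inp)
    x0, y0 = int(bx * w), int(by * h)
    x1, y1 = int((bx + bw) * w), int((by + bh) * h)
    rgb_roi = rgb[y0:y1, x0:x1]
    depth_roi = depth[y0:y1, x0:x1]  # same detection area in the depth image
    return rgb_roi, depth_roi, class_id
```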
Step S12: Acquire the normalized model of the target from the RGB image in the detection area.

In the embodiments of the present disclosure, optionally, a first network structure is applied to process the RGB image in the detection area to obtain the normalized model of the target. The specific structure of the first network is shown in Fig. 3: multiple groups of convolution + convolution + downsampling downsample the RGB image in the detection area, a convolution operation is performed on the lowest-resolution feature map, multiple groups of upsampling + convolution + convolution restore the resolution of the RGB image in the detection area to its original size, and a preset number of convolution operations yields the normalized model of the target. Preferably, the RGB image in the detection area is processed by four groups of convolution + convolution + downsampling, one convolution operation on the lowest-resolution feature map, and four groups of upsampling + convolution + convolution, after which the normalized model of the target is output following two consecutive convolution operations. On the basis of 2D detection, the embodiments of the present disclosure use a U-net network structure to regress the normalized map, which can greatly improve the accuracy of the algorithm.
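A minimal PyTorch sketch of the structure just described follows: four convolution + convolution + downsampling groups, a convolution on the lowest-resolution feature map, four upsampling + convolution + convolution groups, and two final convolutions. The channel widths and the 3-channel output map are illustrative assumptions, and the skip connections of a full U-net are omitted for brevity.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # "convolution + convolution" group used on both the down and up paths
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class NormalizedModelNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = nn.ModuleList(
            [conv_block(3, 64), conv_block(64, 128),
             conv_block(128, 256), conv_block(256, 512)])
        self.pool = nn.MaxPool2d(2)
        self.mid = nn.Conv2d(512, 512, 3, padding=1)  # conv on lowest-resolution map
        self.up = nn.ModuleList(
            [conv_block(512, 256), conv_block(256, 128),
             conv_block(128, 64), conv_block(64, 64)])
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear",
                                    align_corners=False)
        self.head = nn.Sequential(  # two consecutive final convolutions
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, x):
        for block in self.down:            # conv + conv + downsample, four groups
            x = self.pool(block(x))
        x = torch.relu(self.mid(x))
        for block in self.up:              # upsample + conv + conv, four groups
            x = block(self.upsample(x))
        return self.head(x)                # per-pixel normalized coordinates
```

For a 448x448 input, the four pooling stages reduce the feature map to 28x28, and the four upsampling stages restore the original resolution.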
Step S13: Acquire the size information of the target according to the depth image in the detection area.

Optionally, as shown in Fig. 4, the depth image in the detection area is converted into a point cloud. The conversion follows the pinhole camera model:

X = (x′ - c_x) × D / f_x,

Y = (y′ - c_y) × D / f_y,

Z = D,

where (X, Y, Z) are the point cloud coordinates, (x′, y′) are the image coordinates, D is the depth value, f_x and f_y are the focal lengths, and c_x and c_y are the principal point offsets.
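A short sketch of this conversion, assuming a depth image in metric units and known camera intrinsics (f_x, f_y, c_x, c_y):

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Back-project every depth pixel into a 3D point (N, 3)."""
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))  # image coordinates (x', y')
    Z = depth
    X = (xs - cx) * Z / fx
    Y = (ys - cy) * Z / fy
    points = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only pixels with valid depth
```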
Then a second network structure is applied to process the point cloud to obtain the size information of the target. The second network structure preferably consists of a PointNet++ network followed by a convolutional layer and a fully connected layer. Through the PointNet++ network, the size of the object can be regressed, represented by S(w, l, h); the subsequent convolutional layer and fully connected layer recover the size information of the object.
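A minimal sketch of such a size-regression head is shown below, assuming a PointNet++-style backbone (not implemented here) that encodes the point cloud into a global feature vector; the feature dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SizeHead(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int = 1024):
        super().__init__()
        self.backbone = backbone            # e.g. a PointNet++ encoder, (B, feat_dim)
        self.conv = nn.Conv1d(feat_dim, 256, kernel_size=1)
        self.fc = nn.Linear(256, 3)         # regresses S(w, l, h)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(points)                    # global point-cloud feature
        feat = self.conv(feat.unsqueeze(-1)).squeeze(-1)  # convolutional layer
        return self.fc(torch.relu(feat))                # fully connected layer -> (w, l, h)
```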
Step S14: Fuse the size information with the normalized model to obtain a 3D model, and apply a PnP algorithm according to the 3D model to obtain the pose information of the target.

In the embodiments of the present disclosure, the complete 3D information of the target (the object to be grasped) is obtained by fusing the normalized model and the object size information. Optionally, the 3D model is calculated from the size information and the normalized model using the following relations:

x′ = x × w,

y′ = y × l,

z′ = z × h,

where (x, y, z) are the coordinates of the normalized model, (x′, y′, z′) are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, h denoting the width, length, and height of the target, respectively.
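The fusion step itself reduces to elementwise scaling of the normalized model by the regressed size, as in the relations above; a minimal sketch:

```python
import numpy as np

def fuse_size(normalized_model: np.ndarray, size_whl: np.ndarray) -> np.ndarray:
    """normalized_model: (N, 3) array of (x, y, z); size_whl: (w, l, h)."""
    return normalized_model * size_whl  # broadcasts to (x*w, y*l, z*h)
```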
A PnP algorithm is then applied to match the coordinates of the 3D model with the 2D image to obtain the pose information of the target. The pose information of the target includes a rotation matrix R and a translation T. The PnP algorithm may be any existing PnP algorithm capable of realizing the above function, which will not be repeated here. By recovering size information from the depth map through PointNet++, the embodiments of the present disclosure add prior information for the post-processing algorithm and can obtain higher-precision results.
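A sketch of this final step using OpenCV's generic PnP solver; the patent does not mandate a particular PnP variant, and the 3D-2D correspondences passed in are assumed to be established as described above.

```python
import cv2
import numpy as np

def estimate_pose(model_points: np.ndarray, image_points: np.ndarray,
                  K: np.ndarray):
    """Solve for the pose (R, T) from 3D model points and their 2D pixels."""
    ok, rvec, tvec = cv2.solvePnP(
        model_points.astype(np.float64),
        image_points.astype(np.float64),
        K, None)  # no lens distortion assumed
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix R
    return R, tvec              # pose (R, T) of the target
```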
The embodiments of the present disclosure use the RGB image to obtain the object category, segmentation result, and normalized model through convolutional neural networks; the depth image and the segmentation result yield the object size information; the size information and the normalized model are fused into a 3D model; and finally the pose information is obtained through PnP. T(x, y, z) represents the position in three-dimensional space, and a rotation matrix R represents the three-axis rotation in three-dimensional space. Using the normalized model solves the problems that objects of the same category vary in size and that camera scaling prevents recovering the exact size of the object; combined with the size recovered from the depth map, it addresses the instance-level limitation of current deep learning methods.
The following illustrates the steps of applying the target pose estimation method of the embodiments of the present disclosure to a robot (a sketch chaining these steps follows the list):

1) Prepare the robot equipment, including the robot base, robotic arm, depth camera, etc.;

2) Place the object on the table in front of the robotic arm, and capture the RGB image and depth image of the current position;

3) For the RGB image of the target object, use the target detection method to obtain the region of the object to be grasped under the current grasping viewpoint;

4) Use the normalized-model generation network to generate the standard normalized model of the object to be grasped;

5) Use the size estimation network to calculate the size information of the object to be grasped;

6) Fuse the size information with the normalized model, and use the PnP algorithm to calculate the pose information of the object to be grasped;

7) According to the pose, make the robotic arm perform the grasp.
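Under the same assumptions as the earlier sketches, steps 3) to 6) can be chained as follows; `roi_origin` is a hypothetical parameter that shifts ROI-local pixels into full-image coordinates so they agree with the intrinsics K, and the three networks are assumed to return NumPy arrays as annotated.

```python
import numpy as np

def estimate_object_pose(first_cnn, norm_net, size_net, rgb, depth, K,
                         roi_origin=(0, 0)):
    rgb_roi, depth_roi, _ = detect_target(first_cnn, rgb, depth)   # step 3)
    coord_map = norm_net(rgb_roi)        # (H, W, 3) normalized model, step 4)
    h, w = coord_map.shape[:2]
    points = depth_to_point_cloud(depth_roi, K[0, 0], K[1, 1], K[0, 2], K[1, 2])
    size_whl = size_net(points)          # regressed (w, l, h), step 5)
    model_3d = fuse_size(coord_map.reshape(-1, 3), size_whl)       # step 6)
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    x0, y0 = roi_origin                  # shift ROI pixels to full-image coordinates
    image_points = np.stack([xs + x0, ys + y0], -1).reshape(-1, 2)
    return estimate_pose(model_3d, image_points, K)                # pose (R, T)
```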
The target pose estimation method in the embodiments of the present disclosure includes: performing 2D detection according to the RGB image and the depth image to obtain the detection area of the target; acquiring the normalized model of the target from the RGB image in the detection area; acquiring the size information of the target according to the depth image in the detection area; and fusing the size information with the normalized model to obtain a 3D model and applying a PnP algorithm according to the 3D model to obtain the pose information of the target. In this way, the pose information of the target object can be accurately obtained, which facilitates grasping the target object and improves the user experience.
图5示出了本公开实施例的目标位姿估计装置的结构示意图,如图5所示,该目标位姿估计装置包括:2D检测单元501、归一化单元502、尺寸获取单元503以及位姿估计单元504。Fig. 5 shows a schematic structural diagram of an object pose estimation device according to an embodiment of the present disclosure. As shown in Fig. Pose Estimation Unit 504.
2D检测单元501用于根据RGB图像和深度图像进行2D检测,获取目标的检测区域;归一化单元502用于将所述检测区域内的所述RGB图像获取所述目标的归一化模型;尺寸获取单元503用于根据所述检测区域内的所述深度图像获取所述目标的尺寸信息;位姿估计单元504用于将所述尺寸信息与所述归一化模型融合获取3D模型,并根据所述3D模型应用PnP算法获取所述目标的位姿信息。The 2D detection unit 501 is used to perform 2D detection according to the RGB image and the depth image to obtain the detection area of the target; the normalization unit 502 is used to obtain the normalized model of the target from the RGB image in the detection area; The size acquisition unit 503 is configured to acquire size information of the target according to the depth image in the detection area; the pose estimation unit 504 is configured to fuse the size information with the normalized model to obtain a 3D model, and Applying a PnP algorithm according to the 3D model to obtain pose information of the target.
在一种可选的方式中,2D检测单元501用于:应用预构建的第一卷积神经网络对所述RGB图像进行处理,获取所述RGB图像中所述目标的所述检测区域;在所述深度图像中获取与所述RGB图像对应相同的所述目标的所述检测区域。In an optional manner, the 2D detection unit 501 is configured to: apply a pre-built first convolutional neural network to process the RGB image, and obtain the detection area of the target in the RGB image; The detection area corresponding to the same target as the RGB image is acquired in the depth image.
在一种可选的方式中,归一化单元502用于:应用第一网络结构对所述检测区域内的所述RGB图像进行处理,获取归一化的所述目标的归一化模型。In an optional manner, the normalization unit 502 is configured to: apply a first network structure to process the RGB image in the detection area, and obtain a normalized normalized model of the target.
在一种可选的方式中,归一化单元502用于:应用多组卷积+卷积+下采样组合对所述检测区域内的所述RGB图像进行下采样后在最低分辨率特征图上进行卷积操作;应用多组上采样+卷积+卷积组合将操作后的所述检测区域内的所述RGB图像的分辨率恢复到原始大小,并进行预设数量个卷积操作后得到所述目标的归一化模型。In an optional manner, the normalization unit 502 is configured to: apply multiple sets of convolution+convolution+downsampling combinations to downsample the RGB image in the detection area and generate the lowest resolution feature map Perform a convolution operation on the above; apply multiple groups of upsampling + convolution + convolution combination to restore the resolution of the RGB image in the detection area after the operation to the original size, and perform a preset number of convolution operations A normalized model of the target is obtained.
In an optional manner, the size acquisition unit 503 is configured to: convert the depth image in the detection area into a point cloud; and apply a second network structure to process the point cloud to obtain the size information of the target.
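A minimal sketch of the depth-to-point-cloud conversion using standard pinhole back-projection is given below; fx, fy, cx, cy are camera intrinsics, and the pixel coordinates are assumed to be in the same frame as the principal point (for a crop, offset them by the crop origin first). The second network structure itself, e.g. a point-cloud network that regresses (w, l, h), is not shown.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project a depth image to an N x 3 point cloud:
    Z = depth * depth_scale, X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    v, u = np.nonzero(depth)                       # keep pixels with valid depth
    z = depth[v, u].astype(np.float32) * depth_scale
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy usage with assumed intrinsics and a flat 0.8 m depth map
depth = np.full((480, 640), 800, dtype=np.uint16)
cloud = depth_to_point_cloud(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```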
In an optional manner, the pose estimation unit 504 is configured to: calculate the 3D model from the size information and the normalized model by applying the following relations:
x' = x × w,
y' = y × l,
z' = z × h,
where (x, y, z) are the coordinates of the normalized model, (x', y', z') are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, and h denoting the width, length, and height of the target, respectively.
In an optional manner, the pose estimation unit 504 is configured to: apply a PnP algorithm to match the coordinates of the 3D model with the 2D image to obtain the pose information of the target.
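The two steps performed by the pose estimation unit — scaling the normalized model by (w, l, h) and solving PnP against the 2D matches — can be sketched with OpenCV as below. The unit-cube corners, intrinsics, and synthetic ground-truth pose exist only to make the demo self-contained and verifiable; in the method, the 3D points and their 2D matches come from the network outputs.

```python
import numpy as np
import cv2

# Hypothetical normalized model: the 8 corners of a unit cube centred at the origin.
corners = np.array([[x, y, z] for x in (-0.5, 0.5)
                              for y in (-0.5, 0.5)
                              for z in (-0.5, 0.5)], dtype=np.float32)
size_wlh = np.array([0.06, 0.12, 0.20], dtype=np.float32)  # assumed (w, l, h), metres
model_3d = corners * size_wlh              # x' = x*w, y' = y*l, z' = z*h

K = np.array([[600., 0., 320.],            # assumed camera intrinsics
              [0., 600., 240.],
              [0., 0., 1.]], dtype=np.float32)
rvec_gt = np.array([0.1, -0.2, 0.05], dtype=np.float32)    # synthetic pose
tvec_gt = np.array([0.02, -0.01, 0.60], dtype=np.float32)
points_2d, _ = cv2.projectPoints(model_3d, rvec_gt, tvec_gt, K, None)

ok, rvec, tvec = cv2.solvePnP(model_3d, points_2d, K, None)
R, _ = cv2.Rodrigues(rvec)                 # 3x3 rotation; (R, tvec) is the pose
```

cv2.Rodrigues converts the recovered rotation vector into a rotation matrix, so (R, tvec) is the 6-DoF pose that the grasping step then consumes.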
An embodiment of the present disclosure provides a non-volatile computer storage medium. The computer storage medium stores at least one executable instruction, and the computer-executable instruction can execute the target pose estimation method in any of the above method embodiments.
Specifically, the executable instruction may be used to cause a processor to perform the following operations:
performing 2D detection according to the RGB image and the depth image to obtain the detection area of the target;
obtaining a normalized model of the target from the RGB image in the detection area;
obtaining size information of the target according to the depth image in the detection area;
fusing the size information with the normalized model to obtain a 3D model, and applying a PnP algorithm according to the 3D model to obtain the pose information of the target.
In an optional manner, the executable instruction causes the processor to perform the following operations:
applying the pre-built first convolutional neural network to process the RGB image to obtain the detection area of the target in the RGB image;
obtaining, in the depth image, the detection area of the same target corresponding to the RGB image.
In an optional manner, the executable instruction causes the processor to perform the following operation:
applying the first network structure to process the RGB image in the detection area to obtain a normalized model of the target.
In an optional manner, the executable instruction causes the processor to perform the following operations:
applying multiple convolution + convolution + downsampling groups to downsample the RGB image in the detection area and then performing convolution operations on the lowest-resolution feature map;
applying multiple upsampling + convolution + convolution groups to restore the resolution of the processed feature map to the original size, and performing a preset number of convolution operations to obtain the normalized model of the target.
In an optional manner, the executable instruction causes the processor to perform the following operations:
converting the depth image in the detection area into a point cloud;
applying the second network structure to process the point cloud to obtain the size information of the target.
In an optional manner, the executable instruction causes the processor to perform the following operation:
calculating the 3D model from the size information and the normalized model by applying the following relations:
x' = x × w,
y' = y × l,
z' = z × h,
where (x, y, z) are the coordinates of the normalized model, (x', y', z') are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, and h denoting the width, length, and height of the target, respectively.
In an optional manner, the executable instruction causes the processor to perform the following operation:
applying a PnP algorithm to match the coordinates of the 3D model with the 2D image to obtain the pose information of the target.
Fig. 6 shows a schematic structural diagram of a computing device according to an embodiment of the present disclosure; the specific embodiments of the present disclosure do not limit the specific implementation of the device.
As shown in Fig. 6, the device may include: a processor 602, a communications interface 604, a memory 606, and a communication bus 608.
The processor 602, the communications interface 604, and the memory 606 communicate with one another through the communication bus 608. The communications interface 604 is used to communicate with network elements of other devices, such as clients or other servers. The processor 602 is configured to execute a program 610, and may specifically execute the relevant steps in the above embodiments of the target pose estimation method.
Specifically, the program 610 may include program code, and the program code includes computer operation instructions.
The processor 602 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure. The one or more processors included in the device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 606 is used to store the program 610. The memory 606 may include a high-speed RAM memory, and may also include a non-volatile memory, such as at least one disk memory.
Specifically, the program 610 may be used to cause the processor 602 to perform the following operations:
performing 2D detection according to the RGB image and the depth image to obtain the detection area of the target;
obtaining a normalized model of the target from the RGB image in the detection area;
obtaining size information of the target according to the depth image in the detection area;
fusing the size information with the normalized model to obtain a 3D model, and applying a PnP algorithm according to the 3D model to obtain the pose information of the target.
In an optional manner, the program 610 causes the processor to perform the following operations:
applying the pre-built first convolutional neural network to process the RGB image to obtain the detection area of the target in the RGB image;
obtaining, in the depth image, the detection area of the same target corresponding to the RGB image.
In an optional manner, the program 610 causes the processor to perform the following operation:
applying the first network structure to process the RGB image in the detection area to obtain a normalized model of the target.
In an optional manner, the program 610 causes the processor to perform the following operations:
applying multiple convolution + convolution + downsampling groups to downsample the RGB image in the detection area and then performing convolution operations on the lowest-resolution feature map;
applying multiple upsampling + convolution + convolution groups to restore the resolution of the processed feature map to the original size, and performing a preset number of convolution operations to obtain the normalized model of the target.
In an optional manner, the program 610 causes the processor to perform the following operations:
converting the depth image in the detection area into a point cloud;
applying the second network structure to process the point cloud to obtain the size information of the target.
In an optional manner, the program 610 causes the processor to perform the following operation:
calculating the 3D model from the size information and the normalized model by applying the following relations:
x' = x × w,
y' = y × l,
z' = z × h,
where (x, y, z) are the coordinates of the normalized model, (x', y', z') are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, and h denoting the width, length, and height of the target, respectively.
In an optional manner, the program 610 causes the processor to perform the following operation:
applying a PnP algorithm to match the coordinates of the 3D model with the 2D image to obtain the pose information of the target.
The algorithms or displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such a system is apparent from the above description. Furthermore, the embodiments of the present disclosure are not directed to any particular programming language. It should be understood that various programming languages may be used to implement the content of the present disclosure described herein, and the above description of a specific language is intended to disclose the best mode of the present disclosure.
Numerous specific details are set forth in the specification provided herein. However, it should be understood that the embodiments of the present disclosure may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be appreciated that, in the above description of the exemplary embodiments of the present disclosure, in order to streamline the disclosure and facilitate an understanding of one or more of the various inventive aspects, various features of the embodiments of the present disclosure are sometimes grouped together into a single embodiment, figure, or description thereof. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present disclosure.
In addition, those skilled in the art will appreciate that, although some embodiments herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the present disclosure and form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the present disclosure, and those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present disclosure may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and the like does not denote any order; these words may be interpreted as names. Unless otherwise specified, the steps in the above embodiments should not be construed as limiting the order of execution.

Claims (17)

  1. A target pose estimation method, characterized in that the method comprises:
    performing 2D detection according to an RGB image and a depth image to obtain a detection area of a target;
    obtaining a normalized model of the target from the RGB image in the detection area;
    obtaining size information of the target according to the depth image in the detection area;
    fusing the size information with the normalized model to obtain a 3D model, and applying a PnP algorithm according to the 3D model to obtain pose information of the target.
  2. The target pose estimation method according to claim 1, characterized in that the performing 2D detection according to the RGB image and the depth image to obtain the detection area of the target comprises:
    applying a pre-built first convolutional neural network to process the RGB image to obtain the detection area of the target in the RGB image;
    obtaining, in the depth image, the detection area of the same target corresponding to the RGB image.
  3. The target pose estimation method according to claim 1, characterized in that the obtaining a normalized model of the target from the RGB image in the detection area comprises:
    applying a first network structure to process the RGB image in the detection area to obtain a normalized model of the target.
  4. The target pose estimation method according to claim 3, characterized in that the applying a first network structure to process the RGB image in the detection area to obtain the normalized model of the target comprises:
    applying multiple convolution + convolution + downsampling groups to downsample the RGB image in the detection area and then performing convolution operations on the lowest-resolution feature map;
    applying multiple upsampling + convolution + convolution groups to restore the resolution of the processed feature map to the original size, and performing a preset number of convolution operations to obtain the normalized model of the target.
  5. The target pose estimation method according to claim 1, characterized in that the obtaining size information of the target according to the depth image in the detection area comprises:
    converting the depth image in the detection area into a point cloud;
    applying a second network structure to process the point cloud to obtain the size information of the target.
  6. The target pose estimation method according to claim 1, characterized in that the fusing the size information with the normalized model to obtain a 3D model comprises:
    calculating the 3D model from the size information and the normalized model by applying the following relations:
    x' = x × w,
    y' = y × l,
    z' = z × h,
    where (x, y, z) are the coordinates of the normalized model, (x', y', z') are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, and h denoting the width, length, and height of the target, respectively.
  7. The target pose estimation method according to claim 6, characterized in that the applying a PnP algorithm according to the 3D model to obtain the pose information of the target comprises:
    applying the PnP algorithm to match the coordinates of the 3D model with a 2D image to obtain the pose information of the target.
  8. A target pose estimation apparatus, characterized in that the apparatus comprises:
    a 2D detection unit, configured to perform 2D detection according to an RGB image and a depth image to obtain a detection area of a target;
    a normalization unit, configured to obtain a normalized model of the target from the RGB image in the detection area;
    a size acquisition unit, configured to obtain size information of the target according to the depth image in the detection area;
    a pose estimation unit, configured to fuse the size information with the normalized model to obtain a 3D model, and to apply a PnP algorithm according to the 3D model to obtain pose information of the target.
  9. The target pose estimation apparatus according to claim 8, characterized in that the 2D detection unit is configured to:
    apply a pre-built first convolutional neural network to process the RGB image to obtain the detection area of the target in the RGB image;
    obtain, in the depth image, the detection area of the same target corresponding to the RGB image.
  10. The target pose estimation apparatus according to claim 8, characterized in that the normalization unit is configured to:
    apply a first network structure to process the RGB image in the detection area to obtain a normalized model of the target.
  11. The target pose estimation apparatus according to claim 10, characterized in that the normalization unit is configured to:
    apply multiple convolution + convolution + downsampling groups to downsample the RGB image in the detection area and then perform convolution operations on the lowest-resolution feature map;
    apply multiple upsampling + convolution + convolution groups to restore the resolution of the processed feature map to the original size, and perform a preset number of convolution operations to obtain the normalized model of the target.
  12. The target pose estimation apparatus according to claim 8, characterized in that the size acquisition unit is configured to:
    convert the depth image in the detection area into a point cloud;
    apply a second network structure to process the point cloud to obtain the size information of the target.
  13. The target pose estimation apparatus according to claim 8, characterized in that the pose estimation unit is configured to:
    calculate the 3D model from the size information and the normalized model by applying the following relations:
    x' = x × w,
    y' = y × l,
    z' = z × h,
    where (x, y, z) are the coordinates of the normalized model, (x', y', z') are the coordinates of the 3D model, and (w, l, h) is the size information of the target, with w, l, and h denoting the width, length, and height of the target, respectively.
  14. The target pose estimation apparatus according to claim 13, characterized in that the pose estimation unit is configured to:
    apply a PnP algorithm to match the coordinates of the 3D model with a 2D image to obtain the pose information of the target.
  15. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;
    the memory is used to store at least one executable instruction, and the executable instruction causes the processor to execute the steps of the target pose estimation method according to any one of claims 1-7.
  16. A computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to execute the steps of the target pose estimation method according to any one of claims 1-7.
  17. A computer program, comprising instructions which, when run on a computer, cause the computer to execute the target pose estimation method according to any one of claims 1-7.
PCT/CN2021/143442 2021-06-30 2021-12-30 Target pose estimation method and apparatus, computing device, storage medium, and computer program WO2023273272A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110743454.8 2021-06-30
CN202110743454.8A CN115222809B (en) 2021-06-30 2021-06-30 Target pose estimation method, device, computing equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023273272A1

Family

ID=83606059

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/143442 WO2023273272A1 (en) 2021-06-30 2021-12-30 Target pose estimation method and apparatus, computing device, storage medium, and computer program

Country Status (2)

Country Link
CN (1) CN115222809B (en)
WO (1) WO2023273272A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171748A (en) * 2018-01-23 2018-06-15 哈工大机器人(合肥)国际创新研究院 A kind of visual identity of object manipulator intelligent grabbing application and localization method
CN108555908A (en) * 2018-04-12 2018-09-21 同济大学 A kind of identification of stacking workpiece posture and pick-up method based on RGBD cameras
US20190012806A1 (en) * 2017-07-06 2019-01-10 Siemens Healthcare Gmbh Mobile Device Localization In Complex, Three-Dimensional Scenes
CN109255813A (en) * 2018-09-06 2019-01-22 大连理工大学 A kind of hand-held object pose real-time detection method towards man-machine collaboration
CN110322512A (en) * 2019-06-28 2019-10-11 中国科学院自动化研究所 In conjunction with the segmentation of small sample example and three-dimensional matched object pose estimation method
CN110793441A (en) * 2019-11-05 2020-02-14 北京华捷艾米科技有限公司 High-precision object geometric dimension measuring method and device
CN111968235A (en) * 2020-07-08 2020-11-20 杭州易现先进科技有限公司 Object attitude estimation method, device and system and computer equipment
CN112270249A (en) * 2020-10-26 2021-01-26 湖南大学 Target pose estimation method fusing RGB-D visual features
CN112562001A (en) * 2020-12-28 2021-03-26 中山大学 Object 6D pose estimation method, device, equipment and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111055281B (en) * 2019-12-19 2021-05-07 杭州电子科技大学 ROS-based autonomous mobile grabbing system and method
CN112233181A (en) * 2020-10-29 2021-01-15 深圳市广宁股份有限公司 6D pose recognition method and device and computer storage medium

Also Published As

Publication number Publication date
CN115222809B (en) 2023-04-25
CN115222809A (en) 2022-10-21

Similar Documents

Publication Publication Date Title
CN110619676B (en) End-to-end three-dimensional face reconstruction method based on neural network
US11436745B1 (en) Reconstruction method of three-dimensional (3D) human body model, storage device and control device
WO2019174377A1 (en) Monocular camera-based three-dimensional scene dense reconstruction method
US9715761B2 (en) Real-time 3D computer vision processing engine for object recognition, reconstruction, and analysis
CN111862201B (en) Deep learning-based spatial non-cooperative target relative pose estimation method
CN113409384B (en) Pose estimation method and system of target object and robot
CN107953329B (en) Object recognition and attitude estimation method and device and mechanical arm grabbing system
WO2019011249A1 (en) Method, apparatus, and device for determining pose of object in image, and storage medium
CN110176032B (en) Three-dimensional reconstruction method and device
WO2022178952A1 (en) Target pose estimation method and system based on attention mechanism and hough voting
CN110097599B (en) Workpiece pose estimation method based on component model expression
CN111582220B (en) Bone point behavior recognition system based on shift map convolution neural network and recognition method thereof
CN111798373A (en) Rapid unmanned aerial vehicle image stitching method based on local plane hypothesis and six-degree-of-freedom pose optimization
CN112053441A (en) Full-automatic layout recovery method for indoor fisheye image
CN114898313A (en) Bird's-eye view image generation method, device, equipment and storage medium of driving scene
CN114494150A (en) Design method of monocular vision odometer based on semi-direct method
CN115471748A (en) Monocular vision SLAM method oriented to dynamic environment
CN115008454A (en) Robot online hand-eye calibration method based on multi-frame pseudo label data enhancement
WO2024037562A1 (en) Three-dimensional reconstruction method and apparatus, and computer-readable storage medium
CN111709269B (en) Human hand segmentation method and device based on two-dimensional joint information in depth image
CN117351078A (en) Target size and 6D gesture estimation method based on shape priori
WO2023273272A1 (en) Target pose estimation method and apparatus, computing device, storage medium, and computer program
CN115937002B (en) Method, apparatus, electronic device and storage medium for estimating video rotation
CN111198563A (en) Terrain recognition method and system for dynamic motion of foot type robot
CN113436251B (en) Pose estimation system and method based on improved YOLO6D algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21948194

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE