CN115222809A - Target pose estimation method and device, computing equipment and storage medium - Google Patents

Target pose estimation method and device, computing equipment and storage medium

Info

Publication number
CN115222809A
Authority
CN
China
Prior art keywords
target
model
detection area
applying
pose estimation
Prior art date
Legal status
Granted
Application number
CN202110743454.8A
Other languages
Chinese (zh)
Other versions
CN115222809B (en)
Inventor
杨佳丽
杜国光
赵开勇
Current Assignee
Cloudminds Beijing Technologies Co Ltd
Original Assignee
Cloudminds Beijing Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Cloudminds Beijing Technologies Co Ltd
Priority to CN202110743454.8A (granted as CN115222809B)
Priority to PCT/CN2021/143442 (WO2023273272A1)
Publication of CN115222809A
Application granted
Publication of CN115222809B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention relates to the technical field of computer vision, and discloses a target pose estimation method and apparatus, a computing device and a storage medium, wherein the method comprises the following steps: performing 2D detection according to the RGB image and the depth image to obtain a detection area of a target; obtaining a normalized model of the target from the RGB image in the detection area; acquiring size information of the target according to the depth image in the detection area; and fusing the size information and the normalized model to obtain a 3D model, and obtaining the pose information of the target by applying a PnP algorithm according to the 3D model. In this way, the embodiment of the invention can accurately acquire the pose information of the target object, which facilitates grasping the target object and improves the user experience.

Description

Target pose estimation method and device, computing equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computer vision, in particular to a target pose estimation method, a target pose estimation device, a target pose estimation computing device and a storage medium.
Background
In addition to perceiving the surrounding world, intelligent robots must also be able to interact with the environment, and grasping is an indispensable capability for doing so. Robotic grasping has great application value in both industrial and household scenes, and the pose estimation of the object to be grasped is an important factor influencing grasping success. Existing pose estimation methods are generally classified into feature matching methods, template methods and deep learning-based methods. A feature matching method generally computes and matches feature points between a 3D model and a 2D image, and then calculates the pose using a PnP method. A template method generally models the 3D model of the object to be grasped from various viewing angles and estimates the pose by matching the acquired image against the templates. A deep learning-based method generally requires first acquiring a large number of color images and depth images of the object to be grasped in various pose states to create a data set, and then directly or indirectly estimating the pose of the object to be grasped by training a convolutional neural network.
However, current algorithms still have defects when grasping real objects. Feature matching methods usually require a large amount of computation, so the algorithm running time is long; moreover, the accuracy of pose estimation depends directly on the success or failure of feature point selection and matching, and accurate, stable results cannot be obtained for objects with few feature points. Template matching-based methods usually need a large number of templates to be made, and because pose estimation is essentially a regression problem, the accuracy of the algorithm is usually proportional to the number of templates, so a balance is difficult to achieve. Deep learning-based methods directly regress the pose of an object through a convolutional neural network, but most existing deep learning methods operate at the instance level and have poor generalization capability.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present invention provide a target pose estimation method, apparatus, computing device and storage medium, which overcome or at least partially solve the above problems.
According to an aspect of an embodiment of the present invention, there is provided a target pose estimation method, including: performing 2D detection according to the RGB image and the depth image to obtain a detection area of a target; acquiring the RGB image in the detection area to obtain a normalized model of the target; acquiring size information of the target according to the depth image in the detection area; and fusing the size information and the normalized model to obtain a 3D model, and obtaining the pose information of the target by applying a PnP algorithm according to the 3D model.
In an optional manner, the performing 2D detection according to the RGB image and the depth image to obtain a detection area of the target includes: processing the RGB image by applying a pre-constructed first convolution neural network to obtain the detection area of the target in the RGB image; and acquiring the detection area of the target corresponding to the same RGB image in the depth image.
In an alternative mode, the obtaining of a normalized model map of the target from the RGB image in the detection area includes: processing the RGB image in the detection area by applying a first network structure to obtain the normalized model map of the target.
In an optional manner, the processing the RGB image in the detection area by using the first network structure to obtain the normalized model of the target includes:
applying a plurality of groups of convolution + downsampling combinations to downsample the RGB image in the detection area, and then performing a convolution operation on the lowest-resolution feature map; and restoring the RGB image in the detection area to its original resolution by applying a plurality of groups of upsampling + convolution combinations, and performing a preset number of convolution operations to obtain the normalized model of the target.
In an alternative mode, the obtaining size information of the target according to the depth image in the detection area includes: converting the depth image within the detection area to a point cloud; and processing the point cloud by applying a second network structure to acquire the size information of the target.
In an alternative manner, the fusing the size information with the normalized model to obtain a 3D model includes: calculating the 3D model from the dimensional information and the normalized model using the following relationship:
x’=x×w,
y’=y×l,
z’=z×h,
wherein (x, y, z) is a coordinate of the normalized model, (x', y', z') is a coordinate of the 3D model, (w, l, h) is size information of the object, and w, l, h respectively represent a width, a length, and a height of the object.
In an optional manner, the obtaining pose information of the target by applying the PnP algorithm according to the 3D model includes: and matching the coordinates of the 3D model with the coordinates of the 2D image by applying a PnP algorithm to acquire the pose information of the target.
According to another aspect of the embodiments of the present invention, there is provided an object pose estimation apparatus including: the 2D detection unit is used for carrying out 2D detection according to the RGB image and the depth image to obtain a detection area of a target;
the normalization unit is used for obtaining a normalized model of the target from the RGB image in the detection area; a size acquisition unit for acquiring size information of the target according to the depth image in the detection area; and the pose estimation unit is used for fusing the size information and the normalized model to obtain a 3D model and obtaining pose information of the target by applying a PnP algorithm according to the 3D model.
According to another aspect of embodiments of the present invention, there is provided a computing device including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the steps of the target pose estimation method.
According to still another aspect of the embodiments of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing the processor to execute the steps of the above-mentioned target pose estimation method.
The target pose estimation method provided by the embodiment of the invention comprises the following steps: performing 2D detection according to the RGB image and the depth image to obtain a detection area of a target; acquiring the RGB image in the detection area to obtain a normalized model of the target; acquiring size information of the target according to the depth image in the detection area; and fusing the size information and the normalized model to obtain a 3D model, and obtaining the pose information of the target by applying a PnP algorithm according to the 3D model, so that the pose information of the target object can be accurately obtained, the target object can be conveniently grabbed, and the user experience is improved.
The foregoing description is only an overview of the technical solutions of the embodiments of the present invention, and the embodiments of the present invention can be implemented according to the content of the description in order to make the technical means of the embodiments of the present invention more clearly understood, and the detailed description of the present invention is provided below in order to make the foregoing and other objects, features, and advantages of the embodiments of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a target pose estimation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a first convolutional neural network in a target pose estimation method according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating a first network structure in the target pose estimation method according to the embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating size information acquisition in a target pose estimation method according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an object pose estimation apparatus provided by an embodiment of the present invention;
fig. 6 shows a schematic structural diagram of a computing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 shows a schematic flow chart of a target pose estimation method provided by an embodiment of the present invention, and as shown in fig. 1, the target pose estimation method includes:
step S11: and carrying out 2D detection according to the RGB image and the depth image to obtain a detection area of the target.
In the embodiment of the present invention, optionally, a pre-constructed first convolutional neural network is applied to process the RGB image to obtain the detection area of the target in the RGB image; and the detection area of the target corresponding to the same region of the RGB image is acquired in the depth image. The first convolutional neural network is not limited to a specific detection or segmentation method; the detection area is the specific region of the target (the object to be grasped) in the image, which reduces background interference factors for the subsequent pose estimation.
Before the first convolutional neural network is applied, it needs to be constructed. First, a data set is constructed: RGB images of the object to be grasped are collected under different environment backgrounds, and an optimal bounding box (x, y, w, h) and an object type id are annotated for each RGB image. Second, a convolutional neural network (CNN) is trained on this large amount of RGB image data to obtain the first convolutional neural network model. The network structure of the first convolutional neural network is shown in fig. 2; the number of network layers is 31, and the image is scaled to a block of 448x448 pixels as the network input.
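As an illustration only, the following minimal sketch uses an off-the-shelf torchvision detector as a stand-in for the patent's own 31-layer first convolutional neural network (which takes 448x448 inputs); the score threshold and the helper names detect_target and crop_detection_area are assumptions for this example, not names defined by the patent.

```python
# Hedged sketch of step S11: 2D detection and cropping the same region
# from the RGB image and the depth image. The detector is a stand-in.
import torch
import torchvision


def detect_target(rgb, score_thresh=0.5):
    """Return the highest-scoring box (x1, y1, x2, y2) in an RGB image (H x W x 3, uint8)."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([tensor])[0]          # detections are sorted by score
    keep = pred["scores"] > score_thresh
    if keep.sum() == 0:
        return None
    x1, y1, x2, y2 = pred["boxes"][keep][0].round().int().tolist()
    return x1, y1, x2, y2


def crop_detection_area(rgb, depth, box):
    """Crop the same detection area from the RGB image and from the depth image."""
    x1, y1, x2, y2 = box
    return rgb[y1:y2, x1:x2], depth[y1:y2, x1:x2]
```

The patent's own network would instead be trained on the annotated bounding boxes described above and would receive the 448x448 scaled image block as input.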
Step S12: and acquiring the RGB image in the detection area to obtain a normalized model of the target.
In this embodiment of the present invention, optionally, a first network structure is applied to process the RGB image in the detection area to obtain a normalized model map of the target. The specific structure of the first network structure is shown in fig. 3: a plurality of groups of convolution + downsampling combinations are applied to downsample the RGB image in the detection area, a convolution operation is then performed on the lowest-resolution feature map, a plurality of groups of upsampling + convolution combinations restore the feature map to the original resolution, and a preset number of convolution operations produce the normalized model of the target. Preferably, after 4 groups of convolution + downsampling are applied to the RGB image in the detection area, one convolution operation is performed on the lowest-resolution feature map, 4 groups of upsampling + convolution are applied, and two consecutive convolution operations then output the normalized model of the target. The embodiment of the invention regresses the normalized model map with a U-Net-style network structure on the basis of the 2D detection, which can greatly improve the accuracy of the algorithm.
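A minimal PyTorch sketch of the first network structure as described (4 groups of convolution + downsampling, one convolution at the lowest resolution, 4 groups of upsampling + convolution, then two final convolutions outputting the normalized model map) is given below. The channel widths, kernel sizes and the sigmoid output range are assumptions, and the skip connections of a full U-Net are omitted for brevity.

```python
import torch
import torch.nn as nn


class NormalizedModelNet(nn.Module):
    """Sketch of the first network structure: encoder-decoder regressing a 3-channel normalized model map."""

    def __init__(self, widths=(32, 64, 128, 256)):
        super().__init__()
        self.downs = nn.ModuleList()
        in_ch = 3
        for w in widths:                          # 4 groups of convolution + downsampling
            self.downs.append(nn.Sequential(
                nn.Conv2d(in_ch, w, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2)))
            in_ch = w
        self.bottleneck = nn.Sequential(          # one convolution at the lowest resolution
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.ups = nn.ModuleList()
        for w in reversed(widths):                # 4 groups of upsampling + convolution
            self.ups.append(nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(in_ch, w, 3, padding=1), nn.ReLU(inplace=True)))
            in_ch = w
        self.head = nn.Sequential(                # two final convolutions -> normalized (x, y, z) per pixel
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, 3, 1), nn.Sigmoid())

    def forward(self, x):
        for down in self.downs:
            x = down(x)
        x = self.bottleneck(x)
        for up in self.ups:
            x = up(x)
        return self.head(x)                       # values in [0, 1]: one normalized coordinate per pixel
```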
Step S13: and acquiring the size information of the target according to the depth image in the detection area.
Optionally, as shown in fig. 4, the depth image within the detection area is converted to a point cloud. The following conversion formula is specifically applied for conversion:
X = (x' - cx) × D / fx,
Y = (y' - cy) × D / fy,
Z = D,
wherein (X, Y, Z) is a point cloud coordinate, (x', y') is an image coordinate, D is the depth value, fx and fy are the focal lengths, and cx, cy are the principal point offsets.
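A sketch of this back-projection in NumPy follows, assuming the depth image is stored in millimeters (the depth_scale factor and the function name are assumptions) and that the camera intrinsics fx, fy, cx, cy are known.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project a depth image (H x W) into an N x 3 point cloud in meters."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # image coordinates (x', y')
    d = depth.astype(np.float32) * depth_scale       # depth value D
    X = (u - cx) * d / fx
    Y = (v - cy) * d / fy
    Z = d
    points = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                  # drop pixels with no valid depth
```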
Then, a second network structure is applied to process the point cloud to acquire the size information of the target. The second network structure preferably consists of a PointNet++ network followed by a convolution layer and a fully connected layer: point features are extracted by the PointNet++ network, and the size of the object, represented as S(w, l, h), is then recovered by the additional convolution layer and fully connected layer.
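The following is a simplified sketch of such a size-regression head. A PointNet-style shared MLP with global max pooling stands in for the PointNet++ backbone (an assumption made to keep the example self-contained), followed by the convolution layer and fully connected layer described above; all layer sizes are assumptions.

```python
import torch
import torch.nn as nn


class SizeNet(nn.Module):
    """Sketch of the second network structure: point features -> conv layer -> FC -> size (w, l, h)."""

    def __init__(self):
        super().__init__()
        self.point_mlp = nn.Sequential(           # per-point feature extraction (PointNet++ stand-in)
            nn.Conv1d(3, 64, 1), nn.ReLU(inplace=True),
            nn.Conv1d(64, 128, 1), nn.ReLU(inplace=True))
        self.conv = nn.Sequential(                # convolution layer on the pooled global feature
            nn.Conv1d(128, 256, 1), nn.ReLU(inplace=True))
        self.fc = nn.Linear(256, 3)               # fully connected layer -> (w, l, h)

    def forward(self, points):                    # points: (B, N, 3)
        x = self.point_mlp(points.transpose(1, 2))        # (B, 128, N)
        x = torch.max(x, dim=2, keepdim=True).values      # global max pooling -> (B, 128, 1)
        x = self.conv(x).squeeze(-1)                      # (B, 256)
        return self.fc(x)                                 # predicted size S(w, l, h)
```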
Step S14: and fusing the size information and the normalized model to obtain a 3D model, and obtaining the pose information of the target by applying a PnP algorithm according to the 3D model.
In the embodiment of the invention, the normalized model and the object size information are fused, so that the complete 3D information of the target (the object to be grabbed) can be obtained. Optionally, the 3D model is calculated from the size information and the normalized model applying the following relation:
x’=x×w,
y’=y×l,
z’=z×h,
wherein (x, y, z) is a coordinate of the normalized model, (x', y', z') is a coordinate of the 3D model, (w, l, h) is size information of the object, and w, l, h respectively represent a width, a length, and a height of the object.
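A one-function sketch of this fusion step, taking the flattened normalized model coordinates and the predicted size (the function name is illustrative):

```python
import numpy as np


def fuse_size_and_normalized_model(nocs_xyz, size_wlh):
    """nocs_xyz: (N, 3) normalized coordinates (x, y, z); size_wlh: (w, l, h)."""
    w, l, h = size_wlh
    return nocs_xyz * np.array([w, l, h], dtype=np.float32)   # (x', y', z') = (x*w, y*l, z*h)
```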
Then, a PnP algorithm is applied to match the coordinates of the 3D model with the coordinates of the 2D image to acquire the pose information of the target. The pose information of the target includes a rotation matrix R and a translation matrix T. The PnP algorithm may be any existing PnP algorithm that can implement the above function, and will not be described here again. In the embodiment of the invention, the size information is recovered from the depth map through PointNet++, which adds prior information for the post-processing algorithm, so that higher-precision results can be obtained.
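As one possible implementation of this step (the patent does not fix a particular PnP variant), OpenCV's solvePnP can be used to solve for the rotation and translation from the 3D-2D correspondences; the EPnP flag and the function name estimate_pose are choices made for this sketch.

```python
import cv2
import numpy as np


def estimate_pose(object_points, image_points, fx, fy, cx, cy):
    """Return the rotation matrix R and translation vector T of the target."""
    K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(
        object_points.astype(np.float64),
        image_points.astype(np.float64),
        K, distCoeffs=None, flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)    # convert the rotation vector to a rotation matrix
    return R, tvec
```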
In summary, the method obtains the object class, the segmentation result and the normalized model map from the RGB image through a convolutional neural network, obtains the object size information from the depth image and the segmentation result, obtains a 3D model by fusing the size information with the normalized model map, and finally obtains the pose information through PnP, where T(x, y, z) represents the position in three-dimensional space and the rotation matrix R represents the three-axis rotation in three-dimensional space. Using the normalized model map addresses the problems that objects of the same kind vary in size and that the exact size cannot be obtained from the image because of camera scaling, and supplementing it with the size recovered from the depth map addresses the problem that most current deep learning methods operate only at the instance level.
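Tying the steps together, a high-level sketch of the whole pipeline might look as follows. The helper functions and class names are the illustrative sketches given above rather than names defined by the patent, and constructing the 2D image points as the pixel coordinates of the detection area is one plausible reading of the correspondence step.

```python
import numpy as np
import torch


def estimate_target_pose(rgb, depth, fx, fy, cx, cy, nocs_net, size_net):
    # Step S11: 2D detection, then crop the same region from RGB and depth.
    box = detect_target(rgb)
    rgb_roi, depth_roi = crop_detection_area(rgb, depth, box)
    # Step S12: regress the normalized model map (in practice the crop would
    # first be resized to the network's fixed input resolution).
    roi = torch.from_numpy(rgb_roi).permute(2, 0, 1)[None].float() / 255.0
    nocs_map = nocs_net(roi)[0].permute(1, 2, 0).detach().numpy()    # (h, w, 3)
    # Step S13: size information from the depth image in the detection area
    # (the crop offset only translates the cloud; it does not affect the size).
    points = depth_to_point_cloud(depth_roi, fx, fy, cx, cy)
    size = size_net(torch.from_numpy(points).float()[None])[0].detach().numpy()
    # Step S14: fuse size and normalized model into 3D points, then solve PnP.
    h, w, _ = nocs_map.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    image_points = np.stack([u + box[0], v + box[1]], axis=-1).reshape(-1, 2)
    object_points = fuse_size_and_normalized_model(nocs_map.reshape(-1, 3), size)
    return estimate_pose(object_points, image_points, fx, fy, cx, cy)  # (R, T)
```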
The following description exemplifies the steps of applying the target pose estimation method of the embodiment of the present invention to a robot:
1) Preparing robot equipment which comprises a robot base, a mechanical arm, a depth camera and the like;
2) Placing an object on a desktop in front of a mechanical arm, and collecting an RGB (red, green and blue) image and a Depth image at the current position;
3) Aiming at the RGB image of the target object, obtaining the region of the object to be grabbed under the current grabbing visual angle by using a target detection method;
4) Using the normalized model generation network, generate a standard normalized model map of the object to be grasped;
5) Calculating size information of the object to be grabbed by using a size estimation network;
6) Fusing the size information and the normalized model graph, and calculating the position and orientation information of the object to be grabbed by using a PnP algorithm;
7) According to the pose, the robot arm performs the grasp.
The target pose estimation method provided by the embodiment of the invention comprises the following steps: performing 2D detection according to the RGB image and the depth image to obtain a detection area of a target; acquiring a normalization model of the target from the RGB image in the detection area; acquiring size information of the target according to the depth image in the detection area; and fusing the size information and the normalized model to obtain a 3D model, and obtaining the pose information of the target by applying a PnP algorithm according to the 3D model, so that the pose information of the target object can be accurately obtained, the target object can be conveniently grabbed, and the user experience is improved.
Fig. 5 is a schematic structural view of the target pose estimation apparatus according to the embodiment of the present invention, and as shown in fig. 5, the target pose estimation apparatus includes: a 2D detection unit 501, a normalization unit 502, a size acquisition unit 503, and a pose estimation unit 504.
The 2D detection unit 501 is configured to perform 2D detection according to the RGB image and the depth image to obtain a detection area of a target; the normalization unit 502 is configured to obtain a normalized model of the target from the RGB image in the detection area; the size obtaining unit 503 is configured to obtain size information of the target according to the depth image in the detection area; the pose estimation unit 504 is configured to fuse the size information and the normalized model to obtain a 3D model, and obtain pose information of the target by applying a PnP algorithm according to the 3D model.
In an alternative manner, the 2D detection unit 501 is configured to: processing the RGB image by applying a pre-constructed first convolution neural network to obtain the detection area of the target in the RGB image; and acquiring the detection area of the target corresponding to the same RGB image in the depth image.
In an optional manner, the normalization unit 502 is configured to: and processing the RGB image in the detection area by applying a first network structure to obtain a normalized model graph of the target.
In an optional manner, the normalization unit 502 is configured to: applying a plurality of groups of convolution + downsampling combinations to carry out downsampling on the RGB image in the detection area, and then carrying out convolution operation on the characteristic diagram with the lowest resolution; and restoring the resolution of the RGB image in the detection area after operation to the original size by applying a plurality of groups of up-sampling + convolution combinations, and performing a preset number of convolution operations to obtain a normalization model of the target.
In an alternative manner, the size obtaining unit 503 is configured to: converting the depth image within the detection area to a point cloud; and processing the point cloud by applying a second network structure to acquire the size information of the target.
In an optional manner, the pose estimation unit 504 is configured to: calculating the 3D model from the dimensional information and the normalized model using the following relationship:
x’=x×w,
y’=y×l,
z’=z×h,
wherein (x, y, z) is a coordinate of the normalized model, (x', y', z') is a coordinate of the 3D model, (w, l, h) is size information of the object, and w, l, h respectively represent a width, a length, and a height of the object.
In an alternative manner, the pose estimation unit 504 is configured to: and matching the coordinates of the 3D model with the coordinates of the 2D image by applying a PnP algorithm to acquire the pose information of the target.
The target pose estimation method provided by the embodiment of the invention comprises the following steps: performing 2D detection according to the RGB image and the depth image to obtain a detection area of a target; acquiring a normalization model of the target from the RGB image in the detection area; acquiring size information of the target according to the depth image in the detection area; and fusing the size information and the normalized model to obtain a 3D model, and obtaining the pose information of the target by applying a PnP algorithm according to the 3D model, so that the pose information of the target object can be accurately obtained, the target object can be conveniently grabbed, and the user experience is improved.
An embodiment of the present invention provides a non-volatile computer storage medium, where at least one executable instruction is stored in the computer storage medium, and the computer executable instruction may execute the target pose estimation method in any of the above method embodiments.
The executable instructions may be specifically configured to cause the processor to perform the following operations:
performing 2D detection according to the RGB image and the depth image to obtain a detection area of a target;
acquiring a normalization model of the target from the RGB image in the detection area;
acquiring size information of the target according to the depth image in the detection area;
and fusing the size information and the normalized model to obtain a 3D model, and obtaining the pose information of the target by applying a PnP algorithm according to the 3D model.
In an alternative, the executable instructions cause the processor to:
processing the RGB image by applying a pre-constructed first convolution neural network to obtain the detection area of the target in the RGB image;
and acquiring the detection area of the target corresponding to the same RGB image in the depth image.
In an alternative, the executable instructions cause the processor to:
and processing the RGB image in the detection area by applying a first network structure to obtain a normalized model diagram of the target.
In an alternative, the executable instructions cause the processor to:
applying a plurality of groups of convolution + downsampling combinations to carry out downsampling on the RGB image in the detection area, and then carrying out convolution operation on the characteristic diagram with the lowest resolution;
and restoring the RGB image in the detection area to its original resolution by applying a plurality of groups of upsampling + convolution combinations, and performing a preset number of convolution operations to obtain the normalized model of the target.
In an alternative, the executable instructions cause the processor to:
converting the depth image within the detection area to a point cloud;
and processing the point cloud by applying a second network structure to acquire the size information of the target.
In an alternative, the executable instructions cause the processor to:
calculating the 3D model from the dimensional information and the normalized model using the following relation:
x’=x×w,
y’=y×l,
z’=z×h,
wherein (x, y, z) is a coordinate of the normalized model, (x', y', z') is a coordinate of the 3D model, (w, l, h) is size information of the object, and w, l, h respectively represent a width, a length, and a height of the object.
In an alternative, the executable instructions cause the processor to:
and matching the coordinates of the 3D model with the coordinates of the 2D image by applying a PnP algorithm to acquire the pose information of the target.
The target pose estimation method provided by the embodiment of the invention comprises the following steps: performing 2D detection according to the RGB image and the depth image to obtain a detection area of a target; acquiring the RGB image in the detection area to obtain a normalized model of the target; acquiring size information of the target according to the depth image in the detection area; and fusing the size information and the normalized model to obtain a 3D model, and obtaining the pose information of the target by applying a PnP algorithm according to the 3D model, so that the pose information of the target object can be accurately obtained, the target object can be conveniently grabbed, and the user experience is improved.
Fig. 6 shows a schematic structural diagram of an embodiment of the apparatus according to the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the apparatus.
As shown in fig. 6, the apparatus may include: a processor (processor) 602, a communication Interface 604, a memory 606, and a communication bus 608.
Wherein: the processor 602, communication interface 604, and memory 606 communicate with one another via a communication bus 608. A communication interface 604 for communicating with network elements of other devices, such as clients or other servers. The processor 602 is configured to execute the program 610, and may specifically perform the relevant steps in the above-described embodiment of the target pose estimation method.
In particular, program 610 may include program code comprising computer operating instructions.
The processor 602 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The device includes one or more processors, which may be the same type of processor, such as one or more CPUs, or different types of processors, such as one or more CPUs and one or more ASICs.
And a memory 606 for storing a program 610. Memory 606 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 610 may specifically be configured to cause the processor 602 to perform the following operations:
performing 2D detection according to the RGB image and the depth image to obtain a detection area of a target;
acquiring the RGB image in the detection area to obtain a normalized model of the target;
acquiring size information of the target according to the depth image in the detection area;
and fusing the size information and the normalized model to obtain a 3D model, and obtaining the pose information of the target by applying a PnP algorithm according to the 3D model.
In an alternative, the program 610 causes the processor to:
processing the RGB image by applying a pre-constructed first convolution neural network to obtain the detection area of the target in the RGB image;
and acquiring the detection area of the target corresponding to the same RGB image in the depth image.
In an alternative, the program 610 causes the processor to:
and processing the RGB image in the detection area by applying a first network structure to obtain a normalized model diagram of the target.
In an alternative, the program 610 causes the processor to:
applying a plurality of groups of convolution + downsampling combinations to carry out downsampling on the RGB image in the detection area, and then carrying out convolution operation on the characteristic diagram with the lowest resolution;
and restoring the resolution of the RGB image in the detection area after operation to the original size by applying a plurality of groups of up-sampling + convolution combinations, and performing a preset number of convolution operations to obtain a normalization model of the target.
In an alternative, the program 610 causes the processor to:
converting the depth image within the detection area to a point cloud;
and processing the point cloud by applying a second network structure to acquire the size information of the target.
In an alternative, the program 610 causes the processor to:
calculating the 3D model from the dimensional information and the normalized model using the following relationship:
x’=x×w,
y’=y×l,
z’=z×h,
wherein (x, y, z) is coordinates of the normalized model, (x', y', z') is coordinates of the 3D model, (w, l, h) is size information of the object, and w, l, h respectively represent width, length, and height of the object.
In an alternative, the program 610 causes the processor to:
and matching the coordinates of the 3D model with the coordinates of the 2D image by applying a PnP algorithm to acquire the pose information of the target.
The target pose estimation method provided by the embodiment of the invention comprises the following steps: performing 2D detection according to the RGB image and the depth image to obtain a detection area of a target; acquiring the RGB image in the detection area to obtain a normalized model of the target; acquiring size information of the target according to the depth image in the detection area; and fusing the size information and the normalized model to obtain a 3D model, and obtaining the pose information of the target by applying a PnP algorithm according to the 3D model, so that the pose information of the target object can be accurately obtained, the target object can be conveniently grabbed, and the user experience is improved.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means can be embodied by one and the same item of hardware. The usage of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (10)

1. A method of target pose estimation, the method comprising:
performing 2D detection according to the RGB image and the depth image to obtain a detection area of a target;
acquiring the RGB image in the detection area to obtain a normalized model of the target;
acquiring size information of the target according to the depth image in the detection area;
and fusing the size information and the normalized model to obtain a 3D model, and obtaining the pose information of the target by applying a PnP algorithm according to the 3D model.
2. The object pose estimation method according to claim 1, wherein the performing 2D detection based on the RGB image and the depth image to obtain a detection area of the object comprises:
processing the RGB image by applying a pre-constructed first convolution neural network to obtain the detection area of the target in the RGB image;
and acquiring the detection area of the target corresponding to the same RGB image in the depth image.
3. The object pose estimation method according to claim 1, wherein the obtaining of a normalized model map of the object from the RGB images within the detection area comprises:
and processing the RGB image in the detection area by applying a first network structure to obtain a normalized model diagram of the target.
4. The object pose estimation method according to claim 3, wherein the applying a first network structure to the RGB images in the detection area to obtain a normalized model of the object comprises:
applying a plurality of groups of convolution + downsampling combinations to downsample the RGB image in the detection area, and then performing convolution operation on the feature map with the lowest resolution;
and restoring the RGB image in the detection area to its original resolution by applying a plurality of groups of upsampling + convolution combinations, and performing a preset number of convolution operations to obtain a normalized model of the target.
5. The object pose estimation method according to claim 1, wherein the acquiring size information of the object from the depth image in the detection area includes:
converting the depth image within the detection area to a point cloud;
and processing the point cloud by applying a second network structure to acquire the size information of the target.
6. The object pose estimation method according to claim 1, wherein the fusing the size information with the normalized model to obtain a 3D model includes:
calculating the 3D model from the dimensional information and the normalized model using the following relationship:
x’=x×w,
y’=y×l,
z’=z×h,
wherein (x, y, z) is coordinates of the normalized model, (x', y', z') is coordinates of the 3D model, (w, l, h) is size information of the object, and w, l, h respectively represent width, length, and height of the object.
7. The pose estimation method of an object according to claim 6, wherein the applying the PnP algorithm to obtain pose information of the object based on the 3D model comprises:
and matching the coordinates of the 3D model with the coordinates of the 2D image by applying a PnP algorithm to acquire the pose information of the target.
8. An object pose estimation apparatus, characterized in that the apparatus comprises:
the 2D detection unit is used for carrying out 2D detection according to the RGB image and the depth image to obtain a detection area of a target;
the normalization unit is used for obtaining a normalized model of the target from the RGB image in the detection area;
a size acquisition unit for acquiring size information of the target according to the depth image in the detection area;
and the pose estimation unit is used for fusing the size information and the normalized model to obtain a 3D model and obtaining pose information of the target by applying a PnP algorithm according to the 3D model.
9. A computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the steps of the object pose estimation method according to any one of claims 1-7.
10. A computer storage medium having stored therein at least one executable instruction that causes a processor to perform the steps of the object pose estimation method according to any one of claims 1-7.
CN202110743454.8A 2021-06-30 2021-06-30 Target pose estimation method, device, computing equipment and storage medium Active CN115222809B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110743454.8A CN115222809B (en) 2021-06-30 2021-06-30 Target pose estimation method, device, computing equipment and storage medium
PCT/CN2021/143442 WO2023273272A1 (en) 2021-06-30 2021-12-30 Target pose estimation method and apparatus, computing device, storage medium, and computer program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110743454.8A CN115222809B (en) 2021-06-30 2021-06-30 Target pose estimation method, device, computing equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115222809A (en) 2022-10-21
CN115222809B (en) 2023-04-25

Family

ID=83606059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110743454.8A Active CN115222809B (en) 2021-06-30 2021-06-30 Target pose estimation method, device, computing equipment and storage medium

Country Status (2)

Country Link
CN (1) CN115222809B (en)
WO (1) WO2023273272A1 (en)


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10699438B2 (en) * 2017-07-06 2020-06-30 Siemens Healthcare Gmbh Mobile device localization in complex, three-dimensional scenes
CN108171748B (en) * 2018-01-23 2021-12-07 哈工大机器人(合肥)国际创新研究院 Visual identification and positioning method for intelligent robot grabbing application
CN108555908B (en) * 2018-04-12 2020-07-28 同济大学 Stacked workpiece posture recognition and pickup method based on RGBD camera
CN109255813B (en) * 2018-09-06 2021-03-26 大连理工大学 Man-machine cooperation oriented hand-held object pose real-time detection method
CN110322512A (en) * 2019-06-28 2019-10-11 中国科学院自动化研究所 In conjunction with the segmentation of small sample example and three-dimensional matched object pose estimation method
CN110793441B (en) * 2019-11-05 2021-07-27 北京华捷艾米科技有限公司 High-precision object geometric dimension measuring method and device
CN112562001B (en) * 2020-12-28 2023-07-21 中山大学 Object 6D pose estimation method, device, equipment and medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111055281A (en) * 2019-12-19 2020-04-24 杭州电子科技大学 ROS-based autonomous mobile grabbing system and method
CN111968235A (en) * 2020-07-08 2020-11-20 杭州易现先进科技有限公司 Object attitude estimation method, device and system and computer equipment
CN112270249A (en) * 2020-10-26 2021-01-26 湖南大学 Target pose estimation method fusing RGB-D visual features
CN112233181A (en) * 2020-10-29 2021-01-15 深圳市广宁股份有限公司 6D pose recognition method and device and computer storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG HE et al.: "Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition *

Also Published As

Publication number Publication date
WO2023273272A1 (en) 2023-01-05
CN115222809B (en) 2023-04-25

Similar Documents

Publication Publication Date Title
CN110135455B (en) Image matching method, device and computer readable storage medium
CN111862201B (en) Deep learning-based spatial non-cooperative target relative pose estimation method
CN110176032B (en) Three-dimensional reconstruction method and device
Azad et al. Stereo-based 6d object localization for grasping with humanoid robot systems
CN113223091B (en) Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
CN111079565B (en) Construction method and identification method of view two-dimensional attitude template and positioning grabbing system
CN111401266B (en) Method, equipment, computer equipment and readable storage medium for positioning picture corner points
CN112651881B (en) Image synthesizing method, apparatus, device, storage medium, and program product
CN110097599B (en) Workpiece pose estimation method based on component model expression
JPH0773344A (en) Method and apparatus for three- dimensional point in two-dimensional graphic display
CN114022542A (en) Three-dimensional reconstruction-based 3D database manufacturing method
CN112184815A (en) Method and device for determining position and posture of panoramic image in three-dimensional model
US11189053B2 (en) Information processing apparatus, method of controlling information processing apparatus, and non-transitory computer-readable storage medium
CN115008454A (en) Robot online hand-eye calibration method based on multi-frame pseudo label data enhancement
JP2010205095A (en) Three-dimensional object recognition device, three-dimensional object recognition program, and computer readable recording medium having program recorded therein
CN115063485B (en) Three-dimensional reconstruction method, device and computer-readable storage medium
CN116469101A (en) Data labeling method, device, electronic equipment and storage medium
JP6198104B2 (en) 3D object recognition apparatus and 3D object recognition method
CN115222809B (en) Target pose estimation method, device, computing equipment and storage medium
CN115713547A (en) Motion trail generation method and device and processing equipment
CN115222810A (en) Target pose estimation method and device, computing equipment and storage medium
CN112634439A (en) 3D information display method and device
CN112652056A (en) 3D information display method and device
CN117095131B (en) Three-dimensional reconstruction method, equipment and storage medium for object motion key points
CN117011474B (en) Fisheye image sample generation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant