CN115115700B - Object pose estimation method and device, electronic device and storage medium


Info

Publication number
CN115115700B
Authority
CN
China
Prior art keywords
point
point cloud
symmetry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210538722.7A
Other languages
Chinese (zh)
Other versions
CN115115700A (en)
Inventor
季向阳 (Ji Xiangyang)
张睿达 (Zhang Ruida)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202210538722.7A
Publication of CN115115700A
Application granted
Publication of CN115115700B

Classifications

    • G06T7/73: Image analysis; determining the position or orientation of objects or cameras using feature-based methods
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V10/774: Processing image or video features in feature spaces; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level; fusion of extracted features
    • G06T2207/10028: Image acquisition modality; range image, depth image, 3D point clouds


Abstract

The application relates to an object pose estimation method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining the region where an object to be measured is located with a pre-trained object detector, and back-projecting the depth map of that region into three-dimensional space to obtain point cloud data of the object; extracting features of the point cloud together with the prior shape information of the object's category; concatenating the features and inputting them into a pose regression branch, a symmetry reconstruction branch, and a recover-object-shape branch to obtain the predicted pose of the object, the symmetry reconstruction result, a mask for each point, and the normalized coordinates of each point in the point cloud data; and obtaining the pose estimation result of the object from these outputs. This addresses the limited accuracy, slow speed, susceptibility to external interference, and poor robustness of object pose estimation methods in the related art: the object's shape is recovered by introducing category-level shape prior information while the pose is solved by a direct method, improving both accuracy and computation speed.

Description

Object pose estimation method and device, electronic device and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a method and an apparatus for estimating an object pose, an electronic device, and a storage medium.
Background
Category-level object pose estimation plays an important role in robotic arm grasping, autonomous driving, augmented reality, and related fields. Its goal is to accurately estimate, from a color image and a depth map, the pose of an object of a particular category relative to the camera, which typically includes: (1) a three-degree-of-freedom rotation, i.e., the rotation of the camera coordinate system relative to the target object coordinate system; (2) a three-degree-of-freedom translation, i.e., the translation of the camera coordinate system's origin relative to the target object coordinate system's origin; and (3) a three-degree-of-freedom size, i.e., the length, width, and height of the object.
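For concreteness, this nine-degree-of-freedom pose can be applied to points of a canonical object model as below; a minimal Python sketch (the function name and layout are illustrative, not taken from the patent):

```python
import numpy as np

def apply_pose(points_obj, R, t, s):
    """Map canonical-model points into the camera frame with a 9-DoF pose.

    points_obj: (N, 3) points in the normalized object coordinate system
    R: (3, 3) rotation, t: (3,) translation, s: (3,) length/width/height
    """
    # Scale the normalized model to metric size, then rotate and translate.
    return (points_obj * s) @ R.T + t
```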
Compared with instance-level object pose estimation, a category-level method applies to every object in a category: the shape and color of the target object need not be known in advance, which guarantees broad applicability and diversity of target objects. The difficulty of category-level pose estimation lies in handling the diversity of objects within a category in shape, material, and color. The existing methods with the best results fall mainly into direct methods and indirect methods that use category-level prior shape information.
A direct method trains a pose prediction model to predict object pose information directly from the image, and is computationally efficient. An indirect method first predicts, for the three-dimensional point cloud observed by the camera, coordinates in a normalized object coordinate system, establishing correspondences, and then solves the object pose from those correspondences with the Umeyama algorithm. Most indirect methods use category-level shape prior information, i.e., the mean point cloud of objects in the category, to improve accuracy: they first compute a deformation field that deforms the category prior point cloud into an estimated three-dimensional model of the current object to be measured, then compute an assignment matrix that associates the observed point cloud with the estimated model, thereby obtaining corresponding coordinates from which the pose is solved. Because it uses category shape priors, the indirect method is relatively more accurate than the direct method.
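The Umeyama step of such an indirect pipeline has a compact closed-form solution via SVD; a sketch of that standard algorithm (background material, not the patent's contribution), assuming matched point sets:

```python
import numpy as np

def umeyama(src, dst):
    """Least-squares similarity transform: find (s, R, t) with dst ~ s * R @ src + t.

    src, dst: (N, 3) corresponding point sets (e.g. normalized coords vs. observed cloud).
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)            # cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # keep R a proper rotation (det = +1)
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src      # scale
    t = mu_d - s * R @ mu_s
    return s, R, t
```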
However, in the related art both approaches have drawbacks. Existing direct methods often ignore category-level prior information, which limits their accuracy; indirect methods introduce category priors and are relatively accurate, but they are easily disturbed by outliers, have poor robustness, and solve relatively slowly.
Disclosure of Invention
The application provides an object pose estimation method and apparatus, an electronic device, and a storage medium, which address the limited solution accuracy, slow speed, susceptibility to external interference, and poor robustness of object pose estimation methods in the related art.
An embodiment of a first aspect of the present application provides an object pose estimation method, including the following steps:
obtaining the region where an object to be measured is located based on a pre-trained object detector, and back-projecting the depth map corresponding to that region into three-dimensional space to obtain point cloud data of the object to be measured;
obtaining a first feature of the point cloud data and a second feature of the prior shape information according to the point cloud data and the prior shape information of the category of the object to be measured; and
concatenating the first feature and the second feature and inputting the result into a pose regression branch, a symmetry reconstruction branch, and a recover-object-shape branch to obtain the predicted pose of the object to be measured, the symmetry reconstruction result, a mask for each point, and the normalized coordinates of each point in the point cloud data, and obtaining the pose estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data.
According to an embodiment of the application, obtaining the pose estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data includes:
obtaining the true position of each point of the object to be measured from its normalized coordinates and the true pose of the object to be measured;
judging an observed position to be an outlier when the absolute difference between the true position and the observed position exceeds a distance threshold;
and, based on a consistency loss function, generating a mask for each point in the point cloud data through the symmetry reconstruction branch, and estimating the pose of the object to be measured based on the mask of each point.
According to an embodiment of the present application, the object pose estimation method further includes:
when the object to be measured is a rotationally symmetric object, determining, with the symmetry reconstruction branch and based on the consistency loss function, a first mirror point in the canonical coordinate system for each point in the point cloud data of the rotationally symmetric object;
when the object to be measured is a reflection-symmetric object, determining, with the symmetry reconstruction branch and based on the consistency loss function, a second mirror point in the canonical coordinate system for each point in the point cloud data of the reflection-symmetric object;
and when the object to be measured is an asymmetric object, determining, with the symmetry reconstruction branch and based on the consistency loss function, a third mirror point in the canonical coordinate system for each point in the point cloud data of the asymmetric object.
According to an embodiment of the present application, when the object to be measured is the rotationally symmetric object, the method further includes:
determining a plurality of candidate poses;
calculating, for each candidate pose, the object coordinates corresponding to all points in the observed point cloud;
and obtaining a pose loss function from the minimum average distance between those object coordinates and the predicted coordinates.
According to one embodiment of the present application, the pose loss function is:

$$\mathcal{L}_{\text{pose}} = \min_{1 \le i \le n} \frac{1}{|C|} \sum_{p} \left\lVert c_p - q_{i,p} \right\rVert_1$$

where p is the coordinate of a point in the observed point cloud, C is the set of predicted object coordinates of all observed points (with $c_p \in C$), $C_i$ is the set of object coordinates of all points under the i-th candidate pose, and $q_{i,p} \in C_i$ is the coordinate of the point under that candidate pose.
According to an embodiment of the present application, the geometric relationship between the pose of the object to be measured and the normalized coordinates is:

$$c_p = R^{\mathsf{T}}(p - t)/L;$$

where p is the coordinate of a point in the observed point cloud, $c_p$ is the normalized coordinate of p, L is the diagonal length of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.
With the object pose estimation method of the embodiments of the present application, the region where the object to be measured is located is obtained with a pre-trained object detector, and the corresponding depth map is back-projected into three-dimensional space to obtain the point cloud data of the object; features are extracted jointly with the prior shape information of the object's category, concatenated, and input to the pose regression branch, the symmetry reconstruction branch, and the recover-object-shape branch to obtain the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data, from which the pose estimation result is obtained. This addresses the limited accuracy, slow speed, susceptibility to external interference, and poor robustness of related-art methods: the object's shape is recovered by introducing category-level shape priors while the pose is solved by a direct method, improving both accuracy and computation speed.
An embodiment of a second aspect of the present application provides an object pose estimation apparatus, including:
a projection module, configured to obtain the region where an object to be measured is located based on a pre-trained object detector, and to back-project the depth map corresponding to that region into three-dimensional space to obtain point cloud data of the object to be measured;
an acquisition module, configured to obtain a first feature of the point cloud data and a second feature of the prior shape information according to the point cloud data and the prior shape information of the category of the object to be measured; and
an estimation module, configured to concatenate the first feature and the second feature, input the result into a pose regression branch, a symmetry reconstruction branch, and a recover-object-shape branch to obtain the predicted pose of the object to be measured, the symmetry reconstruction result, a mask for each point, and the normalized coordinates of each point in the point cloud data, and obtain the pose estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data.
According to an embodiment of the present application, the estimation module is specifically configured to:
obtain the true position of each point of the object to be measured from its normalized coordinates and the true pose of the object to be measured;
judge an observed position to be an outlier when the absolute difference between the true position and the observed position exceeds a distance threshold;
and, based on a consistency loss function, generate a mask for each point in the point cloud data through the symmetry reconstruction branch, and estimate the pose of the object to be measured based on the mask of each point.
According to an embodiment of the present application, the object pose estimation apparatus further includes:
a first determining unit, configured to, when the object to be measured is a rotationally symmetric object, determine, with the symmetry reconstruction branch and based on the consistency loss function, a first mirror point in the canonical coordinate system for each point in the point cloud data of the rotationally symmetric object;
a second determining unit, configured to, when the object to be measured is a reflection-symmetric object, determine a second mirror point in the canonical coordinate system for each point in the point cloud data of the reflection-symmetric object in the same way;
and a third determining unit, configured to, when the object to be measured is an asymmetric object, determine a third mirror point in the canonical coordinate system for each point in the point cloud data of the asymmetric object in the same way.
According to an embodiment of the present application, when the object to be measured is the rotationally symmetric object, the apparatus is further configured to:
determine a plurality of candidate poses;
calculate, for each candidate pose, the object coordinates corresponding to all points in the observed point cloud;
and obtain a pose loss function from the minimum average distance between those object coordinates and the predicted coordinates.
According to one embodiment of the present application, the pose loss function is:

$$\mathcal{L}_{\text{pose}} = \min_{1 \le i \le n} \frac{1}{|C|} \sum_{p} \left\lVert c_p - q_{i,p} \right\rVert_1$$

where p is the coordinate of a point in the observed point cloud, C is the set of predicted object coordinates of all observed points (with $c_p \in C$), $C_i$ is the set of object coordinates of all points under the i-th candidate pose, and $q_{i,p} \in C_i$ is the coordinate of the point under that candidate pose.
According to an embodiment of the present application, the geometric relationship between the pose of the object to be measured and the normalized coordinates is:

$$c_p = R^{\mathsf{T}}(p - t)/L;$$

where p is the coordinate of a point in the observed point cloud, $c_p$ is the normalized coordinate of p, L is the diagonal length of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.
With the object pose estimation apparatus of the embodiments of the present application, the region where the object to be measured is located is obtained with a pre-trained object detector, and the corresponding depth map is back-projected into three-dimensional space to obtain the point cloud data of the object; features are extracted jointly with the prior shape information of the object's category, concatenated, and input to the pose regression branch, the symmetry reconstruction branch, and the recover-object-shape branch to obtain the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data, from which the pose estimation result is obtained. This addresses the limited accuracy, slow speed, susceptibility to external interference, and poor robustness of related-art methods: the object's shape is recovered by introducing category-level shape priors while the pose is solved by a direct method, improving both accuracy and computation speed.
An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the object pose estimation method described in the above embodiments.
An embodiment of a fourth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the object pose estimation method described in the above embodiments.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of an object pose estimation method according to an embodiment of the present application;
FIG. 2 is a block diagram of an object pose estimation method combining direct prediction with category prior information according to an embodiment of the present application;
FIG. 3 is a schematic diagram of symmetry-based object reconstruction according to an embodiment of the present application;
FIG. 4 is a diagram illustrating the normalized object coordinates of a rotationally symmetric object's poses according to an embodiment of the present application;
FIG. 5 is an exemplary diagram of an object pose estimation apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, where identical or similar reference numerals denote identical or similar elements or elements having identical or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, intended to explain the present application, and should not be construed as limiting it.
An object pose estimation method and apparatus, an electronic device, and a storage medium according to embodiments of the present application are described below with reference to the drawings. In view of the problems of the related art noted above (limited solution accuracy, slow speed, susceptibility to external interference, and poor robustness), the present application provides an object pose estimation method in which the region where the object to be measured is located is obtained with a pre-trained object detector; the depth map corresponding to that region is back-projected into three-dimensional space to obtain the point cloud data of the object; features are extracted jointly with the prior shape information of the object's category, concatenated, and input to a pose regression branch, a symmetry reconstruction branch, and a recover-object-shape branch to obtain the predicted pose of the object, the symmetry reconstruction result, a mask for each point, and the normalized coordinates of each point in the point cloud data; and the pose estimation result is obtained from these outputs. This addresses the limited accuracy, slow speed, susceptibility to external interference, and poor robustness of related-art methods: the object's shape is recovered by introducing category-level shape priors while the pose is solved by a direct method, improving both accuracy and computation speed.
Specifically, fig. 1 is a schematic flowchart of an object pose estimation method according to an embodiment of the present disclosure.
As shown in fig. 1, the object pose estimation method includes the following steps:
in step S101, a region where the object to be detected is located is obtained based on a pre-trained object detector, and a depth map corresponding to the region where the object to be detected is located is back-projected to a three-dimensional space, so as to obtain point cloud data of the object to be detected.
Specifically, as shown in fig. 1, in the embodiment of the present application the region where the object to be measured is located is first obtained with a pre-trained object detector; then the depth map corresponding to that region is back-projected into three-dimensional space to obtain the point cloud data of the object to be measured, which improves the method's accuracy. Here, point cloud data refers to a set of vectors in a three-dimensional coordinate system.
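Back-projection needs only the camera intrinsics; a hedged sketch under the standard pinhole model (the function layout and names are illustrative):

```python
import numpy as np

def backproject(depth, K, region_mask):
    """Lift depth pixels inside the detector's region to a 3D point cloud.

    depth: (H, W) depth map in meters; K: (3, 3) camera intrinsics;
    region_mask: (H, W) boolean map of the region of the object to be measured.
    """
    v, u = np.nonzero(region_mask)
    z = depth[v, u]
    u, v, z = u[z > 0], v[z > 0], z[z > 0]      # drop missing depth readings
    x = (u - K[0, 2]) * z / K[0, 0]             # pinhole model: x = (u - cx) z / fx
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)          # (N, 3) points in the camera frame
```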
In step S102, a first feature of the point cloud data and a second feature of the prior shape information are obtained from the point cloud data and the prior shape information of the category of the object to be measured.
Specifically, the point cloud data obtained above and the prior shape information of the category of the object to be measured are input into a shared feature extractor, which extracts features from each of them. The feature extractor is a graph convolutional network and can extract the geometric structure information of both the point cloud and the prior shape.
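The patent specifies only that the extractor is a graph convolutional network; one common instantiation of such a layer is an EdgeConv-style graph convolution, sketched below under that assumption (not necessarily the patented architecture):

```python
import torch

def edge_conv(x, k, mlp):
    """One EdgeConv-style graph convolution over per-point features.

    x: (N, C) features; k: neighborhood size;
    mlp: module mapping (..., 2C) -> (..., C_out), e.g. Linear + ReLU.
    """
    idx = torch.cdist(x, x).topk(k, largest=False).indices   # (N, k) kNN graph (self included)
    neighbors = x[idx]                                       # (N, k, C)
    center = x.unsqueeze(1).expand(-1, k, -1)                # (N, k, C)
    edge = torch.cat([center, neighbors - center], dim=-1)   # edge features (N, k, 2C)
    return mlp(edge).max(dim=1).values                       # max-pool over neighbors

# e.g. for raw xyz input:
# mlp = torch.nn.Sequential(torch.nn.Linear(6, 64), torch.nn.ReLU())
```

A full extractor would stack several such layers; the choice of k and layer widths here is an assumption.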
In step S103, the first feature and the second feature are concatenated and input to the pose regression branch, the symmetry reconstruction branch, and the recover-object-shape branch to obtain the predicted pose of the object to be measured, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data, and the pose estimation result of the object to be measured is obtained based on these outputs.
Further, in some embodiments, obtaining the pose estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data includes: obtaining the true position of each point of the object to be measured from its normalized coordinates and the true pose of the object to be measured; judging an observed position to be an outlier when the absolute difference between the true position and the observed position exceeds a distance threshold; and, based on the consistency loss function, generating a mask for each point in the point cloud data through the symmetry reconstruction branch and estimating the pose of the object to be measured based on the mask of each point.
Further, in some embodiments, the geometric relationship between the pose of the object to be measured and the normalized coordinates is:

$$c_p = R^{\mathsf{T}}(p - t)/L;$$

where p is the coordinate of a point in the observed point cloud, $c_p$ is the normalized coordinate of p, L is the diagonal length of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.
Specifically, after the feature extractor extracts the features of the point cloud data and of the prior shape information of the object's category, the two features are concatenated and input into three parallel branches, as shown in fig. 1: a direct pose regression branch, a symmetry reconstruction branch, and a recover-object-shape branch. Their functions are as follows:
(1) Direct pose regression branch: directly regresses the pose of the object.
(2) Symmetry reconstruction branch: outputs the symmetry reconstruction result and the mask of each point.
(3) Recover-object-shape branch: first computes a deformation field that deforms the category prior point cloud, recovering a three-dimensional point cloud model of the current object to be measured; then computes an assignment matrix that associates the observed point cloud with the estimated object model, thereby obtaining the coordinates of each point of the point cloud in the normalized object coordinate system (a sketch of this branch's output computation follows this list).
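In code form, the recover-object-shape branch's output can be sketched as follows; the network would predict deform and assign_logits, and all names here are illustrative assumptions rather than the patent's notation:

```python
import numpy as np

def recover_nocs(prior, deform, assign_logits):
    """Normalized coordinates from the recover-object-shape branch.

    prior: (M, 3) category prior point cloud (mean shape of the class);
    deform: (M, 3) predicted deformation field;
    assign_logits: (N, M) predicted correspondence scores per observed point.
    """
    model = prior + deform                          # deformed model of this instance
    A = np.exp(assign_logits - assign_logits.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)               # row-wise softmax: assignment matrix
    return A @ model                                # (N, 3) normalized coordinates
```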
Further, the region where the object to be measured is located is obtained by the pre-trained object detector, and the point cloud data of the object is obtained by back-projecting that region. Because the detector and the depth sensor both have some error, points that do not belong to the object to be measured are inevitably introduced. The obtained point cloud data may therefore contain a non-negligible number of outliers, which can mislead the pose estimate.
Further, the geometric relationship between the pose of the object to be measured and the normalized coordinates is:

$$c_p = R^{\mathsf{T}}(p - t)/L; \quad (1)$$

where p is the coordinate of a point in the observed point cloud, $c_p$ is the normalized coordinate of p, L is the diagonal length of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.
Further, the consistency loss function may be expressed as:

$$\mathcal{L}_{\text{con}} = \frac{1}{N}\sum_{p} \left\lVert \hat{c}_p - \hat{R}^{\mathsf{T}}(p - \hat{t})/\hat{L} \right\rVert_1; \quad (2)$$

where $\hat{c}_p$ is the predicted normalized coordinate of point p, $\hat{R}$, $\hat{t}$, and $\hat{L}$ are the predicted rotation, translation, and bounding-box diagonal length, and the sum runs over the N observed points.
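A numpy sketch of this consistency term, assuming the reconstruction of equation (2) above (a row-vector convention, so $R^{\mathsf{T}}(p - t)$ becomes (p - t) @ R):

```python
import numpy as np

def consistency_loss(points, c_pred, R, t, L):
    """Average L1 gap between predicted coordinates and pose-implied coordinates.

    points: (N, 3) observed cloud; c_pred: (N, 3) predicted normalized coordinates;
    R, t, L: the *predicted* rotation, translation, and bounding-box diagonal.
    """
    c_geo = (points - t) @ R / L            # row form of c_p = R^T (p - t) / L
    return np.abs(c_pred - c_geo).mean()
```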
Further, to obtain the true position of each point, the true position of the object to be measured may be computed from the normalized coordinates and the true pose of the object, as in the following formula:

$$p_{gt} = R(Lc) + t; \quad (3)$$

where $p_{gt}$ denotes the true position of each point, c is the normalized coordinate, L is the diagonal length of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.

Further, the observed position p is compared with the true position $p_{gt}$: if $|p_{gt} - p| > \lambda_{pt}$, the point is defined as an outlier, where $\lambda_{pt}$ is a distance threshold.
Further, when judging whether a point is an outlier, the judgment can be made through the symmetry reconstruction branch based on the consistency loss function: a mask is predicted for each point to indicate whether it is an outlier, and during training the mask prediction can be supervised with an L1 loss, guiding the network to recognize outliers and improving robustness.
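A sketch of the outlier labeling and mask supervision, under the corrected reading above that a point is an outlier when its reprojection error exceeds the threshold (variable names are illustrative):

```python
import numpy as np

def outlier_labels(points, c, R_gt, t_gt, L_gt, lam):
    """Ground-truth outlier labels for supervising the per-point mask.

    points: (N, 3) observed cloud; c: (N, 3) normalized coordinates;
    R_gt, t_gt, L_gt: true pose; lam: the distance threshold lambda_pt.
    """
    p_gt = (L_gt * c) @ R_gt.T + t_gt              # p_gt = R (L c) + t, row form
    dist = np.linalg.norm(p_gt - points, axis=1)
    return dist > lam                              # True where the point is an outlier

def mask_loss(mask_pred, mask_gt):
    """L1 supervision of the predicted mask, as described above."""
    return np.abs(mask_pred - mask_gt.astype(float)).mean()
```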
Further, in some embodiments, the object pose estimation method further includes: when the object to be measured is a rotationally symmetric object, determining, with the symmetry reconstruction branch and based on the consistency loss function, a first mirror point in the canonical coordinate system for each point in the point cloud data of the rotationally symmetric object; when the object to be measured is a reflection-symmetric object, determining a second mirror point in the canonical coordinate system for each point in the point cloud data of the reflection-symmetric object in the same way; and when the object to be measured is an asymmetric object, determining a third mirror point in the canonical coordinate system for each point in the point cloud data of the asymmetric object in the same way.
Specifically, as shown in fig. 3, the embodiment of the present application proposes symmetry-based object reconstruction for two kinds of symmetry: rotational symmetry and reflection symmetry. A rotationally symmetric object is symmetric about an axis, and a reflection-symmetric object is symmetric about a plane. For ease of exposition and without loss of generality, the axis of rotational symmetry can be assumed to be the y-axis and the plane of reflection symmetry to be the xy-plane. Note that this assumption is merely exemplary: for any object, its canonical pose can be changed so that the assumption holds.
Further, for rotationally symmetric, reflection-symmetric, and asymmetric objects, the mirror point $F_{mir}(p)$ of each point $p = [p_x, p_y, p_z]$ in the canonical coordinate system is defined as follows.

For a rotationally symmetric object, the mirror point, i.e., the first mirror point, can be expressed as:

$$F_{mir}(p) = [-p_x, p_y, -p_z]; \quad (4)$$

For a reflection-symmetric object, the mirror point, i.e., the second mirror point, can be expressed as:

$$F_{mir}(p) = [p_x, p_y, -p_z]; \quad (5)$$

For an asymmetric object, the mirror point, i.e., the third mirror point, can be expressed as:

$$F_{mir}(p) = [p_x, p_y, p_z]; \quad (6)$$

where $p_x$, $p_y$, and $p_z$ are the coordinates of p on the x, y, and z axes, respectively.
It should be noted that, for symmetry-based object reconstruction, the predicted mirror point of each point can be supervised with an L1 loss during training.
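Equations (4) through (6) transcribe directly into code:

```python
import numpy as np

def mirror_points(p, symmetry):
    """Mirror image F_mir(p) in the canonical coordinate system, eqs. (4)-(6).

    p: (N, 3) points; symmetry: 'rotational' (axis = y),
    'reflective' (plane = xy), or 'none'.
    """
    if symmetry == 'rotational':
        return p * np.array([-1.0, 1.0, -1.0])   # [-px, py, -pz]
    if symmetry == 'reflective':
        return p * np.array([1.0, 1.0, -1.0])    # [px, py, -pz]
    return p.copy()                              # asymmetric object: identity
```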
Further, in some embodiments, when the object to be measured is a rotationally symmetric object, the method further includes: determining a plurality of candidate poses; calculating, for each candidate pose, the object coordinates corresponding to all points in the observed point cloud; and obtaining a pose loss function from the minimum average distance between those object coordinates and the predicted coordinates.
Specifically, as shown in fig. 4, since multiple candidate poses of the object can be obtained by rotating the true pose around the symmetry axis, the pose of a rotationally symmetric object is ambiguous, and so, likewise, are the corresponding normalized object coordinates. To remove this ambiguity and make all predicted normalized coordinates correspond to one specific pose, when the object to be measured is rotationally symmetric, first, a number of candidate poses are determined, say n candidate poses $\{R_1, R_2, \dots, R_n\}$ sampled by rotation about the symmetry axis; second, for each candidate pose, the object coordinates $C_i$ corresponding to all points in the observed point cloud are calculated; finally, the minimum average L1 distance between these coordinates and the predicted coordinates C is taken as the pose loss function:

$$\mathcal{L}_{\text{pose}} = \min_{1 \le i \le n} \frac{1}{|C|} \sum_{p} \left\lVert c_p - q_{i,p} \right\rVert_1; \quad (7)$$

where p is the coordinate of a point in the observed point cloud, C is the set of predicted object coordinates of all observed points (with $c_p \in C$), $C_i$ is the set of object coordinates of all points under the i-th candidate pose, and $q_{i,p} \in C_i$ is the coordinate of the point under that candidate pose.
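A numpy sketch of equation (7); here the candidate coordinates $C_i$ are generated by rotating the ground-truth coordinates about the y (symmetry) axis, and the sampling count n is an assumption:

```python
import numpy as np

def pose_loss(c_pred, c_gt, n=12):
    """Symmetry-aware pose loss of eq. (7) for rotationally symmetric objects.

    c_pred: (N, 3) predicted normalized coordinates;
    c_gt: (N, 3) coordinates under the true pose;
    n: number of candidate poses sampled about the y (symmetry) axis.
    """
    losses = []
    for i in range(n):
        a = 2.0 * np.pi * i / n
        Ry = np.array([[ np.cos(a), 0.0, np.sin(a)],
                       [ 0.0,       1.0, 0.0      ],
                       [-np.sin(a), 0.0, np.cos(a)]])
        c_i = c_gt @ Ry.T                 # object coordinates C_i under candidate pose i
        losses.append(np.abs(c_pred - c_i).mean())
    return min(losses)                    # minimum average L1 distance
```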
In summary, the object pose estimation method provided by the embodiments of the present application contributes the following techniques and advantages:
(1) An object pose estimation framework combining direct prediction with category prior information.
(2) A geometry-guided consistency loss function.
(3) An outlier filtering technique.
(4) Symmetry-based object reconstruction.
(5) A symmetry-based pose loss function.
By combining direct prediction with category prior information, the nine-degree-of-freedom pose of an object of a specific category relative to the camera, comprising a three-degree-of-freedom rotation, a three-degree-of-freedom translation, and a three-degree-of-freedom size, is estimated from a color image and a depth image, meeting real-time requirements while maintaining high accuracy.
With the object pose estimation method of the embodiments of the present application, the region where the object to be measured is located is obtained with a pre-trained object detector, and the corresponding depth map is back-projected into three-dimensional space to obtain the point cloud data of the object; features are extracted jointly with the prior shape information of the object's category, concatenated, and input to the pose regression branch, the symmetry reconstruction branch, and the recover-object-shape branch to obtain the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data, from which the pose estimation result is obtained. This addresses the limited accuracy, slow speed, susceptibility to external interference, and poor robustness of related-art methods: the object's shape is recovered by introducing category-level shape priors while the pose is solved by a direct method, improving both accuracy and computation speed.
Next, an object pose estimation apparatus proposed according to an embodiment of the present application is described with reference to the drawings.
Fig. 5 is a block diagram of an object pose estimation apparatus according to an embodiment of the present application.
As shown in fig. 5, the object pose estimation apparatus 10 includes: a projection module 100, an acquisition module 200, and an estimation module 300.
The projection module 100 is configured to obtain the region where an object to be measured is located based on a pre-trained object detector, and to back-project the depth map corresponding to that region into three-dimensional space to obtain point cloud data of the object to be measured;
the acquisition module 200 is configured to obtain a first feature of the point cloud data and a second feature of the prior shape information according to the point cloud data and the prior shape information of the category of the object to be measured; and
the estimation module 300 is configured to splice the first feature and the second feature, and input the spliced first feature and second feature to the regression pose branch, the symmetry reconstruction branch, and the restored object shape branch to obtain a predicted pose of the object to be measured, a symmetry reconstruction result, a mask of each point, and a normalized coordinate of each point in the point cloud data, and obtain a pose estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinate of each point in the point cloud data.
Further, in some embodiments, the estimation module 300 is specifically configured to:
obtain the true position of each point of the object to be measured from its normalized coordinates and the true pose of the object to be measured;
judge an observed position to be an outlier when the absolute difference between the true position and the observed position exceeds a distance threshold;
and, based on the consistency loss function, generate a mask for each point in the point cloud data through the symmetry reconstruction branch, and estimate the pose of the object to be measured based on the mask of each point.
Further, in some embodiments, the object pose estimation apparatus 10 further includes:
a first determining unit, configured to, when the object to be measured is a rotationally symmetric object, determine, with the symmetry reconstruction branch and based on the consistency loss function, a first mirror point in the canonical coordinate system for each point in the point cloud data of the rotationally symmetric object;
a second determining unit, configured to, when the object to be measured is a reflection-symmetric object, determine a second mirror point in the canonical coordinate system for each point in the point cloud data of the reflection-symmetric object in the same way;
and a third determining unit, configured to, when the object to be measured is an asymmetric object, determine a third mirror point in the canonical coordinate system for each point in the point cloud data of the asymmetric object in the same way.
Further, in some embodiments, when the object to be measured is a rotationally symmetric object, the apparatus is further configured to:
determine a plurality of candidate poses;
calculate, for each candidate pose, the object coordinates corresponding to all points in the observed point cloud;
and obtain a pose loss function from the minimum average distance between those object coordinates and the predicted coordinates.
Further, in some embodiments, the pose loss function is:

$$\mathcal{L}_{\text{pose}} = \min_{1 \le i \le n} \frac{1}{|C|} \sum_{p} \left\lVert c_p - q_{i,p} \right\rVert_1$$

where p is the coordinate of a point in the observed point cloud, C is the set of predicted object coordinates of all observed points (with $c_p \in C$), $C_i$ is the set of object coordinates of all points under the i-th candidate pose, and $q_{i,p} \in C_i$ is the coordinate of the point under that candidate pose.
Further, in some embodiments, the geometric relationship between the pose of the object to be measured and the normalized coordinates is:

$$c_p = R^{\mathsf{T}}(p - t)/L;$$

where p is the coordinate of a point in the observed point cloud, $c_p$ is the normalized coordinate of p, L is the diagonal length of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.
With the object pose estimation apparatus of the embodiments of the present application, the region where the object to be measured is located is obtained with a pre-trained object detector, and the corresponding depth map is back-projected into three-dimensional space to obtain the point cloud data of the object; features are extracted jointly with the prior shape information of the object's category, concatenated, and input to the pose regression branch, the symmetry reconstruction branch, and the recover-object-shape branch to obtain the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data, from which the pose estimation result is obtained. This addresses the limited accuracy, slow speed, susceptibility to external interference, and poor robustness of related-art methods: the object's shape is recovered by introducing category-level shape priors while the pose is solved by a direct method, improving both accuracy and computation speed.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include: a memory 601, a processor 602, and a computer program stored on the memory 601 and executable on the processor 602.
When executing the program, the processor 602 implements the object pose estimation method provided in the above embodiments.
The electronic device further includes:
a communication interface 603 for communication between the memory 601 and the processor 602.
The memory 601 is used for storing computer programs that can be run on the processor 602.
The memory 601 may comprise a high-speed RAM memory, and may also include a non-volatile memory, such as at least one magnetic disk memory.
If the memory 601, the processor 602, and the communication interface 603 are implemented independently, the communication interface 603, the memory 601, and the processor 602 may be connected to one another through a bus to communicate with one another. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 6, but this does not mean there is only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 601, the processor 602, and the communication interface 603 are integrated on one chip, the memory 601, the processor 602, and the communication interface 603 may communicate with one another through an internal interface.
The processor 602 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
Embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the object pose estimation method above is implemented.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine different embodiments or examples, and features of different embodiments or examples, described in this specification, provided they do not contradict one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process; and the scope of the preferred embodiments of the present application includes alternative implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as will be understood by those skilled in the art of the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium on which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be implemented by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one of or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as a separate product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application; variations, modifications, substitutions, and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (8)

1. An object pose estimation method, comprising the following steps:
obtaining the region where an object to be measured is located based on a pre-trained object detector, and back-projecting the depth map corresponding to that region into three-dimensional space to obtain point cloud data of the object to be measured;
obtaining a first feature of the point cloud data and a second feature of the prior shape information according to the point cloud data and the prior shape information of the category of the object to be measured; and
concatenating the first feature and the second feature and inputting the result respectively into a pose regression branch, a symmetry reconstruction branch, and a recover-object-shape branch to obtain the predicted pose of the object to be measured, the symmetry reconstruction result, a mask for each point, and the normalized coordinates of each point in the point cloud data, and obtaining the pose estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data;
wherein obtaining the pose estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data comprises: obtaining the true position of each point of the object to be measured from its normalized coordinates and the true pose of the object to be measured; judging an observed position to be an outlier when the absolute difference between the true position and the observed position exceeds a distance threshold; and, based on a consistency loss function, generating a mask for each point in the point cloud data through the symmetry reconstruction branch, and estimating the pose of the object to be measured based on the mask of each point.
2. The method of claim 1, further comprising:
when the object to be detected is a rotationally symmetric object, determining, based on the consistency loss function, a first mirror point of each point in the point cloud data of the rotationally symmetric object under the canonical coordinate system using the symmetry reconstruction branch;
when the object to be detected is a reflection-symmetric object, determining, based on the consistency loss function, a second mirror point of each point in the point cloud data of the reflection-symmetric object under the canonical coordinate system using the symmetry reconstruction branch; and
when the object to be detected is an asymmetric object, determining, based on the consistency loss function, a third mirror point of each point in the point cloud data of the asymmetric object under the canonical coordinate system using the symmetry reconstruction branch.
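For illustration only, the mirror points of claim 2 may be sketched in the canonical coordinate frame as follows; the choice of the x = 0 plane as the reflection plane, the y-axis as the rotation axis, and the identity map for the asymmetric case are assumptions of the sketch, not axes or mappings fixed by the claim:

import numpy as np

def mirror_reflection(points):
    # Reflect canonical-frame points (N, 3) across the x = 0 plane.
    mirrored = points.copy()
    mirrored[:, 0] *= -1.0
    return mirrored

def mirror_rotation(points, angle):
    # Rotate canonical-frame points (N, 3) about the y-axis by `angle` radians.
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return points @ rot.T

def mirror_identity(points):
    # An asymmetric object has no non-trivial symmetry; one hedged reading is
    # that its mirror point is the point itself.
    return points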
3. The method according to claim 2, further comprising, when the object to be detected is the rotationally symmetric object:
determining a plurality of candidate poses;
calculating, for each candidate pose, the object coordinates corresponding to all points in the observed point cloud; and
obtaining a pose loss function from the minimum average distance between the object coordinates corresponding to all the points and the predicted coordinates.
4. The method of claim 3, wherein the pose loss function is:
L_pose = min_i (1/|C|) Σ_{p∈C} ‖p − q‖, with q ∈ C_i the coordinate corresponding to p under the i-th candidate pose;
where p is the coordinate of a point in the observed point cloud, C is the set of object coordinates corresponding to all points in the observed point cloud, C_i is the set of object coordinates of all points under the i-th candidate pose, and q is the coordinate of the point under the candidate pose.
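For illustration only, one reading of the loss of claim 4 may be sketched as follows: for each candidate pose the average distance between the observed object coordinates C and the coordinates C_i predicted under that pose is computed, and the minimum over all candidates is kept; the index-wise pairing of p with q is an assumption of the sketch:

import numpy as np

def pose_loss(C, candidates):
    # C: (N, 3) object coordinates of the observed point cloud.
    # candidates: list of (N, 3) arrays C_i, the coordinates of the same
    # points under each candidate pose, paired with C by index.
    losses = [np.linalg.norm(C - C_i, axis=1).mean() for C_i in candidates]
    return min(losses)    # minimum average distance over candidate poses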
5. The method according to claim 1, wherein the geometric relationship between the pose of the object to be detected and the normalized coordinates is:
c_p = R^T (p − t) / L;
where p is the coordinate of a point in the observed point cloud, c_p is the normalized coordinate of p, L is the length of the diagonal of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation vector.
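For illustration only, the relation of claim 5 and its inverse p = L·R·c_p + t form an exact round trip, as the following check shows; the rotation, translation and diagonal length are arbitrary illustrative values:

import numpy as np

def to_normalized(p, R, t, L):
    # c_p = R^T (p - t) / L, mapping a camera-frame point to normalized coordinates.
    return R.T @ (p - t) / L

def to_camera(c_p, R, t, L):
    # Inverse mapping: p = L * R @ c_p + t.
    return L * (R @ c_p) + t

theta = np.pi / 6                            # arbitrary rotation about the z-axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.1, -0.2, 0.8])               # arbitrary translation (meters)
L = 0.25                                     # arbitrary bounding-box diagonal length

p = np.array([0.15, -0.1, 0.9])
c_p = to_normalized(p, R, t, L)
assert np.allclose(to_camera(c_p, R, t, L), p)   # round trip recovers p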
6. A pose estimation device for an object, comprising:
a projection module configured to acquire the region of an object to be detected using a pre-trained object detector, and to back-project the depth map corresponding to that region into three-dimensional space to obtain point cloud data of the object to be detected;
an acquisition module configured to obtain a first feature of the point cloud data and a second feature of prior shape information from the point cloud data and the prior shape information of the category of the object to be detected; and
an estimation module configured to concatenate the first feature and the second feature, feed the concatenated features respectively to a pose regression branch, a symmetry reconstruction branch and a shape recovery branch to obtain a predicted pose of the object to be detected, a symmetry reconstruction result, a mask for each point and normalized coordinates for each point in the point cloud data, and to obtain a pose estimation result of the object to be detected based on the predicted pose, the symmetry reconstruction result, the mask of each point and the normalized coordinates of each point in the point cloud data;
wherein the estimation module is specifically configured to: obtain the true position of the object to be detected from the normalized coordinates of the object to be detected and the true pose of the object to be detected; judge an observed position to be an outlier when the absolute difference between the true position and the observed position exceeds a distance threshold; and, based on a consistency loss function, generate a mask for each point in the point cloud data through the symmetry reconstruction branch and estimate the pose of the object to be detected based on the mask of each point.
7. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the object pose estimation method according to any one of claims 1 to 5.
8. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the object pose estimation method according to any one of claims 1 to 5.
CN202210538722.7A 2022-05-17 2022-05-17 Object attitude estimation method and device, electronic equipment and storage medium Active CN115115700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210538722.7A CN115115700B (en) 2022-05-17 2022-05-17 Object attitude estimation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210538722.7A CN115115700B (en) 2022-05-17 2022-05-17 Object attitude estimation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115115700A CN115115700A (en) 2022-09-27
CN115115700B true CN115115700B (en) 2023-04-11

Family

ID=83326889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210538722.7A Active CN115115700B (en) 2022-05-17 2022-05-17 Object attitude estimation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115115700B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11462023B2 (en) * 2019-11-14 2022-10-04 Toyota Research Institute, Inc. Systems and methods for 3D object detection
CN112800822A (en) * 2019-11-14 2021-05-14 丰田研究所股份有限公司 3D automatic tagging with structural and physical constraints
CN112651944B (en) * 2020-12-28 2023-08-22 哈尔滨工业大学(深圳) 3C component high-precision six-dimensional pose estimation method and system based on CAD model
CN113139996B (en) * 2021-05-06 2024-02-06 南京大学 Point cloud registration method and system based on three-dimensional point cloud geometric feature learning
CN113393503B (en) * 2021-05-24 2022-05-27 湖南大学 Classification-driven shape prior deformation category-level object 6D pose estimation method
CN113780240B (en) * 2021-09-29 2023-12-26 上海交通大学 Object pose estimation method based on neural network and rotation characteristic enhancement
CN114004883B (en) * 2021-09-30 2024-05-03 哈尔滨工业大学 Visual perception method and device for curling ball, computer equipment and storage medium

Also Published As

Publication number Publication date
CN115115700A (en) 2022-09-27

Similar Documents

Publication Publication Date Title
CN110763251B (en) Method and system for optimizing visual inertial odometer
US9542745B2 (en) Apparatus and method for estimating orientation of camera
CN109872366B (en) Method and device for detecting three-dimensional position of object
US10033985B2 (en) Camera pose estimation apparatus and method for augmented reality imaging
US20140002597A1 (en) Tracking Poses of 3D Camera Using Points and Planes
KR102169309B1 (en) Information processing apparatus and method of controlling the same
JP2002024807A (en) Object movement tracking technique and recording medium
JP2011134012A (en) Image processor, image processing method for the same and program
US20210374978A1 (en) Capturing environmental scans using anchor objects for registration
JP2019109747A (en) Position attitude estimation apparatus, position attitude estimation method, and program
JP2019190969A (en) Image processing apparatus, and image processing method
JP2018195070A (en) Information processing apparatus, information processing method, and program
Sánchez et al. Towards real time 3D tracking and reconstruction on a GPU using Monte Carlo simulations
JP2001101419A (en) Method and device for image feature tracking processing and three-dimensional data preparing method
CN115115700B (en) Object attitude estimation method and device, electronic equipment and storage medium
US11055865B2 (en) Image acquisition device and method of operating image acquisition device
JP4389663B2 (en) Image processing method and image processing apparatus
JP2006113832A (en) Stereoscopic image processor and program
CN116469101A (en) Data labeling method, device, electronic equipment and storage medium
CN117252912A (en) Depth image acquisition method, electronic device and storage medium
JP6204781B2 (en) Information processing method, information processing apparatus, and computer program
JP7061092B2 (en) Image processing equipment and programs
CN113284181A (en) Scene map point and image frame matching method in environment modeling
WO2020217377A1 (en) Degree of movement estimation device, degree of movement estimation method, and degree of movement estimation program
CN115115701A (en) Object attitude estimation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant