CN115115700B - Object attitude estimation method and device, electronic equipment and storage medium - Google Patents
- Authority: CN (China)
- Prior art keywords: point, detected, point cloud, cloud data, symmetry
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06T7/73 — Image analysis; determining position or orientation of objects or cameras using feature-based methods
- G06V10/764 — Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06T2207/10028 — Range image; depth image; 3D point clouds
Abstract
The application relates to an object pose estimation method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining the region where an object to be measured is located using a pre-trained object detector; back-projecting the depth map corresponding to that region into three-dimensional space to obtain point cloud data of the object; extracting features from the point cloud data together with the prior shape information of the object's category; concatenating the features and feeding them into a regression pose branch, a symmetry reconstruction branch, and a shape recovery branch to obtain the predicted pose, the symmetry reconstruction result, a mask for each point, and the normalized coordinates of each point in the point cloud data; and obtaining the pose estimation result of the object from these outputs. This addresses the limited accuracy, slow speed, susceptibility to external interference, and poor robustness of object pose estimation methods in the related art: category shape prior information is introduced to recover the object shape, the pose is solved by a direct method, and both accuracy and speed are improved.
Description
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a method and an apparatus for estimating an object pose, an electronic device, and a storage medium.
Background
Category-level object pose estimation plays an important role in fields such as robotic-arm grasping, autonomous driving, and augmented reality. Its goal is to accurately estimate, from color images and depth maps, the pose of an object of a particular category relative to the camera, which typically includes: (1) a three-degree-of-freedom rotation, i.e., the rotation of the camera coordinate system relative to the target object coordinate system; (2) a three-degree-of-freedom translation, i.e., the translation of the camera coordinate system origin relative to the target object coordinate system origin; and (3) a three-degree-of-freedom size, i.e., the length, width, and height of the object.
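The nine degrees of freedom above can be illustrated numerically. The following minimal sketch (illustrative only, not part of the patent disclosure; all values are assumed) composes a rotation matrix, a translation vector, and a size vector to map an object-frame point into the camera frame.

```python
import numpy as np

def pose_to_camera(points_obj, R, t):
    """Map Nx3 object-frame points into the camera frame: p_cam = R @ p_obj + t."""
    return points_obj @ R.T + t

# 90-degree rotation about the y-axis, a translation, and an object size (length, width, height)
R = np.array([[0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0]])
t = np.array([0.1, 0.2, 1.5])
size = np.array([0.2, 0.1, 0.3])

corner = size / 2.0                          # a bounding-box corner in the object frame
p_cam = pose_to_camera(corner[None, :], R, t)[0]
```

The size entries do not enter the rigid transform itself; they describe the extent of the object's bounding box in its own frame.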
Compared with instance-level object pose estimation, category-level methods apply to all objects of the same category: the shape and color of the target object need not be known in advance, which ensures broad applicability across diverse target objects. The difficulty of category-level pose estimation lies in handling the diversity of in-category objects in shape, material, and color. Existing methods with good performance fall mainly into direct methods and indirect methods that use category prior shape information.
Direct methods train a pose prediction model to regress the object pose directly from the image, and are computationally efficient. Indirect methods first predict the coordinates of the observed three-dimensional point cloud in a normalized object coordinate system, establish a correspondence, and then solve for the object pose from that correspondence via the Umeyama algorithm. Most indirect methods use category-level shape prior information (the mean point cloud of objects in a category) to improve accuracy: they first compute a deformation field that deforms the category prior point cloud into an estimated three-dimensional point cloud model of the current object, then compute an assignment matrix that maps the observed point cloud onto the estimated model, thereby obtaining corresponding coordinates from which the pose is solved. Because indirect methods use the category shape prior, their accuracy is relatively higher than that of direct methods.
However, both approaches in the related art have shortcomings. Existing direct methods often ignore category prior information, which limits their solution accuracy; indirect methods introduce category priors and achieve relatively high accuracy, but are easily disturbed by outliers, have poor robustness, and solve relatively slowly.
Disclosure of Invention
The application provides an object pose estimation method and apparatus, an electronic device, and a storage medium, to address the limited solution accuracy, slow speed, susceptibility to external interference, and poor robustness of object pose estimation methods in the related art.
An embodiment of a first aspect of the present application provides a method for estimating an attitude of an object, including the following steps:
acquiring the area of an object to be detected based on a pre-trained object detector, and back-projecting a depth map corresponding to the area of the object to be detected to a three-dimensional space to obtain point cloud data of the object to be detected;
obtaining a first characteristic of the point cloud data and a second characteristic of the shape prior information according to the point cloud data and the prior shape information of the category of the object to be detected; and
and concatenating the first feature and the second feature, inputting the result into a regression pose branch, a symmetry reconstruction branch, and a shape recovery branch to obtain a predicted pose of the object to be measured, a symmetry reconstruction result, a mask of each point, and a normalized coordinate of each point in the point cloud data, and obtaining a pose estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data.
According to an embodiment of the application, the obtaining a pose estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data includes:
obtaining the true position of each point of the object to be measured according to the normalized coordinates and the true pose of the object to be measured;
judging an observed point to be an outlier when the absolute difference between its true position and its observed position exceeds a distance threshold;
and generating, based on a consistency loss function, a mask for each point in the point cloud data through the symmetry reconstruction branch, and estimating the pose of the object to be measured based on the mask of each point.
According to an embodiment of the present application, the object pose estimation method further includes:
when the object to be detected is a rotational symmetric object, determining a first mirror image point of each point in point cloud data of the rotational symmetric object under a regular coordinate system by using the symmetry reconstruction branch based on the consistency loss function;
when the object to be detected is a reflection symmetric object, determining a second mirror image point of each point in point cloud data of the reflection symmetric object under a regular coordinate system by using the symmetry reconstruction branch based on the consistency loss function;
and when the object to be detected is an asymmetric object, determining a third mirror image point of each point in the point cloud data of the asymmetric object under the regular coordinate system by using the symmetry reconstruction branch based on the consistency loss function.
According to an embodiment of the present application, when the object to be measured is the rotationally symmetric object, the method further includes:
determining a plurality of candidate poses;
calculating, for each candidate pose, the object coordinates corresponding to all points in the observed point cloud;
and obtaining a pose loss function from the minimum average distance between those object coordinates and the predicted coordinates.
According to one embodiment of the present application, the pose loss function takes the minimum, over all candidate poses, of the average distance between the predicted normalized coordinates and the corresponding object coordinates:
L_pose = min_i (1/|C|) Σ_p || c_p - q_i(p) ||;
wherein p is the coordinate of a point in the observed point cloud, c_p is its predicted normalized coordinate, C is the set of object coordinates corresponding to all points in the observed point cloud, C_i is that set under the i-th candidate pose, and q_i(p) ∈ C_i is the object coordinate corresponding to p under the i-th candidate pose.
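For illustration, a minimal sketch of such a candidate-pose loss follows, assuming (as the description later does) that the symmetry axis is the y-axis. The candidate count, names, and uniform angle sampling are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def rot_y(theta):
    """Rotation matrix about the y-axis by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def sym_pose_loss(pred_coords, gt_coords, n_candidates=12):
    """Minimum, over candidate rotations about the y symmetry axis, of the mean point distance."""
    losses = []
    for i in range(n_candidates):
        Ci = gt_coords @ rot_y(2.0 * np.pi * i / n_candidates).T  # candidate object coordinates
        losses.append(np.linalg.norm(pred_coords - Ci, axis=1).mean())
    return min(losses)

gt = np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0]])
pred = gt @ rot_y(np.pi / 2.0).T   # prediction rotated by exactly one candidate step
loss = sym_pose_loss(pred, gt, n_candidates=4)
```

Because the prediction matches one of the four candidate rotations exactly, the minimum-distance loss vanishes, which is precisely why this loss does not penalize the ambiguous rotation of a symmetric object.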
According to an embodiment of the present application, the geometric relationship between the posture of the object to be measured and the normalized coordinates is:
c_p = R^T (p - t) / L;
wherein p is the coordinate of a point in the observed point cloud, c_p is the normalized coordinate of p, L is the length of the diagonal of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.
According to the object pose estimation method of the embodiments of the application, the region where the object to be measured is located is obtained with a pre-trained object detector; the corresponding depth map is back-projected into three-dimensional space to obtain point cloud data of the object; features are extracted in combination with the prior shape information of the object's category, concatenated, and input into the regression pose branch, the symmetry reconstruction branch, and the shape recovery branch to obtain the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data; and the pose estimation result of the object is obtained from these outputs. This addresses the limited accuracy, slow speed, susceptibility to external interference, and poor robustness of related-art methods: category shape prior information is introduced to recover the object shape, the pose is solved by a direct method, and both accuracy and speed are improved.
An embodiment of a second aspect of the present application provides an apparatus for estimating an attitude of an object, including:
the projection module is used for acquiring the area of an object to be detected based on a pre-trained object detector, and back-projecting a depth map corresponding to the area of the object to be detected to a three-dimensional space to obtain point cloud data of the object to be detected;
the acquisition module is used for obtaining a first characteristic of the point cloud data and a second characteristic of the shape prior information according to the point cloud data and the prior shape information of the category of the object to be detected; and
and the estimation module is used for concatenating the first feature and the second feature, inputting the result into a regression pose branch, a symmetry reconstruction branch, and a shape recovery branch to obtain a predicted pose of the object to be measured, a symmetry reconstruction result, a mask of each point, and a normalized coordinate of each point in the point cloud data, and obtaining a pose estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data.
According to an embodiment of the present application, the estimation module is specifically configured to:
obtaining the real position of the object to be detected according to the normalized coordinates of the object to be detected and the real posture of the object to be detected;
judging an observed point to be an outlier when the absolute difference between its true position and its observed position exceeds a distance threshold;
and based on a consistency loss function, generating a mask of each point for each point in the point cloud data through the symmetry reconstruction branch, and estimating the posture of the object to be detected based on the mask of each point.
According to an embodiment of the present application, the above object pose estimation apparatus further includes:
the first determining unit is used for determining a first mirror image point of each point in point cloud data of the rotationally symmetric object under a regular coordinate system by using the symmetry rebuilding branch based on the consistency loss function when the object to be detected is the rotationally symmetric object;
the second determining unit is used for determining a second mirror image point of each point in point cloud data of the reflection symmetric object under a regular coordinate system by using the symmetry rebuilding branch based on the consistency loss function when the object to be detected is the reflection symmetric object;
and the third determining unit is used for determining a third mirror image point of each point in the point cloud data of the asymmetric object under the regular coordinate system by using the symmetry rebuilding branch based on the consistency loss function when the object to be detected is the asymmetric object.
According to an embodiment of the present application, when the object to be measured is the rotationally symmetric object, the apparatus is further configured for:
determining a plurality of candidate poses;
calculating, for each candidate pose, the object coordinates corresponding to all points in the observed point cloud;
and obtaining the pose loss function from the minimum average distance between those object coordinates and the predicted coordinates.
According to one embodiment of the present application, the pose loss function takes the minimum, over all candidate poses, of the average distance between the predicted normalized coordinates and the corresponding object coordinates:
L_pose = min_i (1/|C|) Σ_p || c_p - q_i(p) ||;
wherein p is the coordinate of a point in the observed point cloud, c_p is its predicted normalized coordinate, C is the set of object coordinates corresponding to all points in the observed point cloud, C_i is that set under the i-th candidate pose, and q_i(p) ∈ C_i is the object coordinate corresponding to p under the i-th candidate pose.
According to an embodiment of the present application, the geometric relationship between the posture of the object to be measured and the normalized coordinates is:
c_p = R^T (p - t) / L;
wherein p is the coordinate of a point in the observed point cloud, c_p is the normalized coordinate of p, L is the length of the diagonal of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.
According to the object pose estimation apparatus of the embodiments of the application, the region where the object to be measured is located is obtained with a pre-trained object detector; the corresponding depth map is back-projected into three-dimensional space to obtain point cloud data of the object; features are extracted in combination with the prior shape information of the object's category, concatenated, and input into the regression pose branch, the symmetry reconstruction branch, and the shape recovery branch to obtain the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data; and the pose estimation result of the object is obtained from these outputs. This addresses the limited accuracy, slow speed, susceptibility to external interference, and poor robustness of related-art methods: category shape prior information is introduced to recover the object shape, the pose is solved by a direct method, and both accuracy and speed are improved.
An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the method for estimating the pose of an object as described in the above embodiments.
An embodiment of a fourth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the object pose estimation method described in the foregoing embodiments.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of an object posture estimation method according to an embodiment of the present application;
FIG. 2 is a block diagram of a method for estimating an object pose by combining a direct prediction method with category prior information according to an embodiment of the present application;
FIG. 3 is a schematic diagram of object reconstruction based on symmetry according to an embodiment of the present application;
FIG. 4 is a diagram illustrating normalized object coordinates of a rotationally symmetric object pose provided in accordance with an embodiment of the present application;
FIG. 5 is an exemplary diagram of an apparatus for estimating an attitude of an object according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
A method, an apparatus, an electronic device, and a storage medium for estimating the pose of an object according to embodiments of the present application are described below with reference to the drawings. In view of the related-art problems of limited solution accuracy, slow speed, susceptibility to external interference, and poor robustness, the present application provides an object pose estimation method, in which
the region where an object to be measured is located is obtained with a pre-trained object detector; the depth map corresponding to that region is back-projected into three-dimensional space to obtain point cloud data of the object; features are extracted in combination with the prior shape information of the object's category, concatenated, and input into a regression pose branch, a symmetry reconstruction branch, and a shape recovery branch to obtain the predicted pose, the symmetry reconstruction result, a mask for each point, and the normalized coordinates of each point in the point cloud data; and the pose estimation result is obtained from these outputs. This addresses the limited accuracy, slow speed, susceptibility to external interference, and poor robustness of related-art methods: category shape prior information is introduced to recover the object shape, the pose is solved by a direct method, and both accuracy and speed are improved.
Specifically, fig. 1 is a schematic flow chart of a method for estimating an object pose according to an embodiment of the present disclosure.
As shown in fig. 1, the method for estimating the attitude of the object includes the following steps:
in step S101, a region where the object to be detected is located is obtained based on a pre-trained object detector, and a depth map corresponding to the region where the object to be detected is located is back-projected to a three-dimensional space, so as to obtain point cloud data of the object to be detected.
Specifically, as shown in fig. 1, in the embodiment of the present application, the region where the object to be measured is located is first obtained with the pre-trained object detector; then the depth map corresponding to that region is back-projected into three-dimensional space to obtain the point cloud data of the object, which improves the accuracy of the method. Here, point cloud data refers to a set of points (vectors) in a three-dimensional coordinate system.
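A minimal back-projection sketch under a standard pinhole camera model is given below; the intrinsic values, resolution, and function names are illustrative assumptions, not part of the patent.

```python
import numpy as np

def backproject(depth, K, mask=None):
    """Back-project a depth map (meters) to an Nx3 point cloud using pinhole intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates, shape (h, w)
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]                  # x = (u - cx) * z / fx
    y = (v - K[1, 2]) * z / K[1, 1]                  # y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    valid = depth.reshape(-1) > 0                    # keep only pixels with a depth reading
    if mask is not None:                             # optionally restrict to the detected region
        valid &= mask.reshape(-1)
    return pts[valid]

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
depth = np.zeros((480, 640))
depth[240, 320] = 1.0                                # a single valid pixel at the principal point
cloud = backproject(depth, K)
```

The `mask` parameter stands in for the detector's region of interest: restricting back-projection to detected pixels is what yields the point cloud of the object rather than of the whole scene.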
In step S102, a first feature of the point cloud data and a second feature of the shape prior information are obtained according to the point cloud data and the prior shape information of the category of the object to be measured.
Specifically, the point cloud data obtained above and the prior shape information of the category of the object to be measured are input into a shared feature extractor, which extracts features from each. The feature extractor is composed of a graph convolutional network and can extract the geometric structure information of both the point cloud data and the prior shape.
In step S103, the first feature and the second feature are concatenated and input into the regression pose branch, the symmetry reconstruction branch, and the shape recovery branch to obtain the predicted pose of the object to be measured, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data; the pose estimation result of the object to be measured is then obtained based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data.
Further, in some embodiments, obtaining the pose estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinates of each point in the point cloud data includes: obtaining the true position of each point from the normalized coordinates and the true pose of the object to be measured; judging an observed point to be an outlier when the absolute difference between its true position and its observed position exceeds a distance threshold; and, based on the consistency loss function, generating a mask for each point in the point cloud data through the symmetry reconstruction branch and estimating the pose of the object based on the masks.
Further, in some embodiments, the geometric relationship between the pose of the object to be measured and the normalized coordinates is:
c_p = R^T (p - t) / L;
where p is the coordinate of a point in the observed point cloud, c_p is the normalized coordinate of p, L is the length of the diagonal of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.
Specifically, the features of the point cloud data and of the category prior shape information extracted by the feature extractor are concatenated and input into three parallel branches, as shown in fig. 1: a direct pose regression branch, a symmetry reconstruction branch, and a shape recovery branch. Their functions are as follows:
(1) Direct pose regression branch: directly regresses the pose of the object.
(2) Symmetry reconstruction branch: outputs the symmetry reconstruction result and the mask of each point.
(3) Shape recovery branch: first computes a deformation field that deforms the category prior point cloud to recover a three-dimensional point cloud model of the current object to be measured; then computes an assignment matrix that maps the observed point cloud onto the estimated object model, thereby obtaining the coordinates of each point of the point cloud in the normalized object coordinate system.
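The shape recovery branch in (3) can be sketched with random stand-in predictions as follows. The array sizes, the additive deformation, and the softmax row-normalization of the assignment matrix are illustrative assumptions; in the actual method these quantities would be predicted by the network.

```python
import numpy as np

rng = np.random.default_rng(0)

prior = rng.uniform(-0.5, 0.5, size=(64, 3))     # category prior point cloud (stand-in)
deform = 0.01 * rng.standard_normal((64, 3))     # predicted per-point deformation field (stand-in)
model = prior + deform                            # recovered instance point cloud model

# Predicted assignment scores: one row per observed point, one column per model point.
logits = rng.standard_normal((128, 64))
A = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # row-wise softmax

# Each observed point's normalized coordinate is a convex combination of model points.
nocs = A @ model
```

Because each row of the assignment matrix sums to one, every recovered coordinate lies inside the convex hull of the deformed model, which keeps the correspondence step well behaved.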
Further, the region of the object to be measured is obtained by the pre-trained object detector, and the point cloud data is obtained by back-projecting this region. However, because the detector and the depth sensor both carry some error, points that do not belong to the object are inevitably introduced. The obtained point cloud data may therefore contain a non-negligible number of outliers, which can mislead the pose estimate.
Further, the geometric relationship between the posture of the object to be measured and the normalized coordinates is as follows:
c_p = R^T (p - t) / L; (1)
where p is the coordinate of a point in the observed point cloud, c_p is the normalized coordinate of p, L is the length of the diagonal of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.
Further, the consistency loss function, which penalizes the discrepancy between the observed points and the points reconstructed from the predicted pose and the predicted normalized coordinates, may be expressed as:
L_con = (1/N) Σ_p || R(L c_p) + t - p ||; (2)
where N is the number of points in the observed point cloud.
Further, the true position of each point may be computed from the normalized coordinates of the object to be measured and its true pose, as shown in the following formula:
p_gt = R(Lc) + t; (3)
wherein p_gt denotes the true position of each point, c is the normalized coordinate, L is the length of the diagonal of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.
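Equations (1) and (3) are inverse mappings of each other; the following sketch (with illustrative values for R, t, L, and p) verifies the round trip numerically.

```python
import numpy as np

def normalize(p, R, t, L):
    """Eq. (1): c_p = R^T (p - t) / L."""
    return R.T @ (p - t) / L

def denormalize(c, R, t, L):
    """Eq. (3): p = R (L c) + t."""
    return R @ (L * c) + t

theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])              # rotation about the z-axis
t = np.array([0.05, -0.02, 0.8])
L = 0.35                                      # bounding-box diagonal length

p = np.array([0.1, 0.1, 0.9])
c = normalize(p, R, t, L)
p_back = denormalize(c, R, t, L)
```

Since R is orthonormal, applying (3) to the output of (1) recovers the original point exactly, which is what makes (3) usable for computing true positions from ground-truth normalized coordinates.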
Further, the observed position p is compared with the true position p_gt; if the deviation |p_gt - p| exceeds the distance threshold λ_pt, that point is defined as an outlier.
Further, when judging whether a point is an outlier, the judgment can be made through the symmetry reconstruction branch based on the consistency loss function: a mask is predicted for each point to indicate whether it is an outlier, and during training the predicted mask can be supervised with an L1 loss, guiding the network to identify outliers and improving robustness.
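A minimal sketch of the outlier mask and its L1 supervision follows; the threshold value and the convention that 1 marks an inlier are illustrative assumptions, not fixed by the patent text.

```python
import numpy as np

def outlier_mask(p_obs, p_gt, lam=0.02):
    """Per-point mask: 1 where |p_gt - p| stays below the distance threshold, 0 for outliers."""
    dist = np.linalg.norm(p_gt - p_obs, axis=1)
    return (dist < lam).astype(np.float64)

def mask_l1_loss(pred_mask, gt_mask):
    """L1 supervision of the predicted per-point mask."""
    return np.abs(pred_mask - gt_mask).mean()

p_gt = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0]])
p_obs = np.array([[0.001, 0.0, 1.0], [0.3, 0.0, 1.0]])  # the second point deviates strongly
gt_mask = outlier_mask(p_obs, p_gt)
loss = mask_l1_loss(np.array([1.0, 0.0]), gt_mask)
```

Here the first point is within the threshold and kept, while the second is flagged as an outlier; a network predicting that same mask would incur zero L1 loss.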
Further, in some embodiments, the method for estimating the posture of the object further includes: when the object to be detected is a rotational symmetric object, determining a first mirror image point of each point in point cloud data of the rotational symmetric object under a regular coordinate system by using a symmetry reconstruction branch based on a consistency loss function; when the object to be measured is a reflection symmetric object, determining a second mirror image point of each point in point cloud data of the reflection symmetric object under a regular coordinate system by using the symmetry reconstruction branch based on the consistency loss function; and when the object to be detected is a non-symmetrical object, determining a third mirror image point of each point in the point cloud data of the non-symmetrical object in the regular coordinate system by using the symmetry reconstruction branch based on the consistency loss function.
Specifically, as shown in fig. 3, the embodiment of the present application proposes symmetry-based object reconstruction for two kinds of symmetry, namely rotational symmetry and reflection symmetry. A rotationally symmetric object is symmetric about an axis, and a reflectively symmetric object is symmetric about a plane. For ease of understanding and without loss of generality, it can be assumed that the axis of rotational symmetry is the y-axis and the plane of reflection symmetry is the xy-plane. It should be noted that this assumption is only exemplary; for any object, its canonical pose can be adjusted to satisfy it.
Further, for rotationally symmetric, reflectively symmetric, and asymmetric objects, each point p = [p_x, p_y, p_z] of the object in the canonical coordinate system defines a mirror point F_mir(p), expressed specifically as follows:
for a rotationally symmetric object, its mirror point, i.e. the first mirror point, can be expressed as:
F_mir(p) = [-p_x, p_y, -p_z];  (4)
for a reflectively symmetric object, its mirror point, i.e. the second mirror point, can be expressed as:
F_mir(p) = [p_x, p_y, -p_z];  (5)
for an object without symmetry, its mirror point, i.e. the third mirror point, can be expressed as:
F_mir(p) = [p_x, p_y, p_z];  (6)
wherein p_x, p_y, p_z are the coordinates of p on the x, y, and z axes, respectively.
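Formulas (4)-(6) amount to simple sign flips of the coordinates under the stated axis and plane assumptions; a sketch (the string labels for the symmetry types are illustrative):

```python
import numpy as np

def mirror_points(p, symmetry):
    """Mirror operator F_mir from formulas (4)-(6), assuming the
    rotation axis is y and the reflection plane is xy, as in the text."""
    p = np.asarray(p, dtype=float)
    if symmetry == "rotational":       # (4): flip x and z
        return p * np.array([-1.0, 1.0, -1.0])
    if symmetry == "reflective":       # (5): flip z only
        return p * np.array([1.0, 1.0, -1.0])
    return p.copy()                    # (6): no symmetry, identity
```

For a perfectly symmetric object, the mirrored point cloud coincides with the original, which is what makes the mirror points a usable supervision signal for the reconstruction branch.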
It should be noted that, in the symmetry-based object reconstruction, the mirror point predicted for each point can be supervised with an L1 loss function during training.
Further, in some embodiments, when the object to be detected is a rotationally symmetric object, the method further includes: determining a plurality of candidate poses; calculating the object coordinates corresponding to all points in the observed point cloud under each candidate pose; and obtaining an attitude loss function according to the minimum average distance between these object coordinates and the predicted coordinates.
Specifically, as shown in fig. 4, since multiple candidate poses of the object can be obtained by rotating the real pose around the rotation axis, the pose of a rotationally symmetric object is ambiguous, and the normalized object coordinates corresponding to the object are likewise ambiguous. Therefore, in order to eliminate the ambiguity and make all predicted normalized coordinates correspond to one specific pose, when the object to be detected is a rotationally symmetric object, first, a number of candidate poses are determined, e.g., n candidate poses; second, for each candidate pose, the object coordinates C_i corresponding to all points in the observed point cloud are calculated; finally, the minimum average L1 distance between these coordinates and the predicted coordinates C is taken as the attitude loss function, which is:
L_pose = min_i (1/N) Σ_p |q - C(p)|_1 ;
wherein p is the coordinate of a point in the observed point cloud, N is the number of points, C denotes the object coordinates corresponding to all points in the observed point cloud, C_i denotes the object coordinates corresponding to all points under the i-th candidate pose, and q = C_i(p) is the coordinate of point p under the candidate pose.
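Under the stated assumption that the rotation axis is the y-axis, the candidate poses and the minimum average L1 distance can be sketched as follows; the function names, the uniform sampling of the candidates, and the use of formula (1) to compute C_i are assumptions, not details from the patent:

```python
import numpy as np

def rotation_about_y(theta):
    """Rotation matrix about the y-axis, the assumed symmetry axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def symmetric_pose_loss(p_obs, c_pred, R, t, L, n_candidates=12):
    """Minimum average L1 distance between the predicted normalized
    coordinates and those induced by each candidate pose obtained by
    rotating the real pose about the symmetry (y) axis."""
    losses = []
    for i in range(n_candidates):
        Ri = R @ rotation_about_y(2.0 * np.pi * i / n_candidates)
        c_i = (p_obs - t) @ Ri / L      # formula (1) under candidate pose i
        losses.append(np.abs(c_i - c_pred).sum(axis=1).mean())
    return min(losses)
```

Because the minimum is taken over candidates, a prediction matching any rotation of the true pose about the symmetry axis incurs no penalty, which removes the ambiguity described above.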
In summary, the method for estimating the attitude of the object provided by the embodiment of the present application has the following innovative technical solutions and advantages:
(1) An object pose estimation framework combining a direct prediction method with category prior information.
(2) A geometry-guided consistency loss function.
(3) An outlier filtering technique.
(4) Symmetry-based object reconstruction.
(5) A symmetry-based attitude loss function.
Through a method combining direct prediction with category prior information, the nine-degree-of-freedom pose of a specific-category object relative to the camera is estimated from a color image and a depth image, where the nine degrees of freedom comprise three degrees of rotation, three degrees of translation, and three degrees of size, so that the real-time requirement is met while high accuracy is guaranteed.
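A minimal sketch of how such a nine-degree-of-freedom pose (3-DoF rotation R, 3-DoF translation t, 3-DoF size s) acts on points of a canonical model; the function name and conventions are illustrative:

```python
import numpy as np

def apply_pose(points, R, t, s):
    """Apply a nine-degree-of-freedom pose to points of a canonical
    model: per-axis scaling s (3 DoF), then rotation R (3 DoF), then
    translation t (3 DoF)."""
    return (points * s) @ R.T + t
```

Estimating the pose is the inverse problem: recovering R, t, and s such that the transformed canonical model aligns with the observed point cloud.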
According to the object posture estimation method, the region where the object to be detected is located is obtained based on the pre-trained object detector, the corresponding depth map is back-projected into three-dimensional space to obtain the point cloud data of the object to be detected, the features of the object to be detected are obtained by combining the prior shape information of its category, and the features are spliced and input to the regression posture branch, the symmetry reconstruction branch, and the object shape recovery branch to obtain the predicted pose and symmetry reconstruction result of the object to be detected, the mask of each point, and the normalized coordinate of each point in the point cloud data, from which the posture estimation result of the object to be detected is obtained. Therefore, the problems of object posture estimation methods in the related art, such as limited solving precision, low speed, susceptibility to external interference, and poor robustness, are solved; by introducing the shape prior information of the category to recover the shape of the object and solving the pose with a direct method, both the solving precision and the computing speed are improved.
Next, an attitude estimation device of an object proposed according to an embodiment of the present application is described with reference to the drawings.
Fig. 5 is a block diagram schematically illustrating an apparatus for estimating an orientation of an object according to an embodiment of the present application.
As shown in fig. 5, the posture estimation device 10 of the object includes: a projection module 100, an acquisition module 200, and an estimation module 300.
The projection module 100 is configured to obtain a region where an object to be detected is located based on a pre-trained object detector, and back-project a depth map corresponding to the region where the object to be detected is located to a three-dimensional space to obtain point cloud data of the object to be detected;
the acquisition module 200 is used for obtaining a first feature of the point cloud data and a second feature of the shape prior information according to the point cloud data and the prior shape information of the category of the object to be detected; and
the estimation module 300 is configured to splice the first feature and the second feature, and input the spliced first feature and second feature to the regression pose branch, the symmetry reconstruction branch, and the restored object shape branch to obtain a predicted pose of the object to be measured, a symmetry reconstruction result, a mask of each point, and a normalized coordinate of each point in the point cloud data, and obtain a pose estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point, and the normalized coordinate of each point in the point cloud data.
Further, in some embodiments, the estimation module 300 is specifically configured to:
obtaining the real position of the object to be measured according to the normalized coordinate of the object to be measured and the real posture of the object to be measured;
judging that the observation position is an inlier when the absolute value of the difference between the real position and the observation position is smaller than a distance threshold;
and based on the consistency loss function, generating a mask of each point for each point in the point cloud data through the symmetry reconstruction branch, and estimating the posture of the object to be detected based on the mask of each point.
Further, in some embodiments, the above-mentioned object posture estimating apparatus 10 further includes:
the first determining unit is used for determining a first mirror image point of each point in point cloud data of the rotationally symmetric object under a regular coordinate system by utilizing the symmetry reconstruction branch based on the consistency loss function when the object to be detected is the rotationally symmetric object;
the second determining unit is used for determining a second mirror image point of each point in the point cloud data of the reflection symmetric object under the regular coordinate system by using the symmetry reconstruction branch based on the consistency loss function when the object to be detected is the reflection symmetric object;
and the third determining unit is used for determining a third mirror image point of each point in the point cloud data of the asymmetric object under the regular coordinate system by using the symmetry reconstruction branch based on the consistency loss function when the object to be detected is the asymmetric object.
Further, in some embodiments, when the object to be measured is a rotationally symmetric object, the method further includes:
determining a plurality of candidate poses;
calculating the object coordinates corresponding to all points in the observed point cloud under each candidate pose;
and obtaining an attitude loss function according to the minimum average distance between these object coordinates and the predicted coordinates.
Further, in some embodiments, the attitude loss function is:
L_pose = min_i (1/N) Σ_p |q - C(p)|_1 ;
wherein p is the coordinate of a point in the observed point cloud, N is the number of points, C denotes the object coordinates corresponding to all points in the observed point cloud, C_i denotes the object coordinates corresponding to all points under the i-th candidate pose, and q = C_i(p) is the coordinate of point p under the candidate pose.
Further, in some embodiments, the geometric relationship between the pose of the object to be measured and the normalized coordinates is:
c_p = R^T (p - t) / L;
where p is the coordinate of a point in the observed point cloud, c_p is the normalized coordinate of p, L is the length of the diagonal of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.
According to the attitude estimation device of the object, the region where the object to be detected is located is obtained based on the pre-trained object detector, the depth map corresponding to the region is back-projected into three-dimensional space to obtain the point cloud data of the object to be detected, the features of the object to be detected are obtained by combining the prior shape information of its category, and the features are spliced and input to the regression attitude branch, the symmetry reconstruction branch, and the object shape recovery branch to obtain the predicted attitude and symmetry reconstruction result of the object to be detected, the mask of each point, and the normalized coordinate of each point in the point cloud data, from which the attitude estimation result of the object to be detected is obtained. Therefore, the problems of object posture estimation methods in the related art, such as limited solving precision, low speed, susceptibility to external interference, and poor robustness, are solved; by introducing the shape prior information of the category to recover the shape of the object and solving the pose with a direct method, both the solving precision and the computing speed are improved.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
The processor 602, when executing the program, implements the method of estimating the attitude of the object provided in the above-described embodiments.
Further, the electronic device further includes:
a communication interface 603 for communication between the memory 601 and the processor 602.
The memory 601 is used for storing computer programs that can be run on the processor 602.
If the memory 601, the processor 602 and the communication interface 603 are implemented independently, the communication interface 603, the memory 601 and the processor 602 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but that does not indicate only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 601, the processor 602, and the communication interface 603 are integrated on a chip, the memory 601, the processor 602, and the communication interface 603 may complete mutual communication through an internal interface.
The processor 602 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method for estimating the attitude of an object as above is implemented.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware that is related to instructions of a program, and the program may be stored in a computer-readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (8)
1. A method of estimating the pose of an object, comprising the steps of:
acquiring the area of an object to be detected based on a pre-trained object detector, and back-projecting a depth map corresponding to the area of the object to be detected to a three-dimensional space to obtain point cloud data of the object to be detected;
obtaining a first feature of the point cloud data and a second feature of the prior shape information according to the point cloud data and the prior shape information of the category of the object to be detected; and
splicing the first feature and the second feature, inputting the spliced features to a regression attitude branch, a symmetry reconstruction branch and a recovery object shape branch respectively to obtain a prediction pose and a symmetry reconstruction result of the object to be detected, a mask of each point and a normalized coordinate of each point in the point cloud data, and acquiring a pose estimation result of the object to be detected based on the prediction pose and the symmetry reconstruction result, the mask of each point and the normalized coordinate of each point in the point cloud data;
wherein the obtaining of the attitude estimation result of the object to be measured based on the predicted pose, the symmetry reconstruction result, the mask of each point and the normalized coordinates of each point in the point cloud data comprises: obtaining the real position of the object to be detected according to the normalized coordinates of the object to be detected and the real posture of the object to be detected; judging that the observation position is an inlier when the absolute value of the difference between the real position and the observation position is smaller than a distance threshold; and based on a consistency loss function, generating a mask of each point for each point in the point cloud data through the symmetry reconstruction branch, and estimating the posture of the object to be detected based on the mask of each point.
2. The method of claim 1, further comprising:
when the object to be detected is a rotational symmetric object, determining a first mirror image point of each point in point cloud data of the rotational symmetric object under a regular coordinate system by using the symmetry reconstruction branch based on the consistency loss function;
when the object to be detected is a reflection symmetric object, determining a second mirror image point of each point in point cloud data of the reflection symmetric object under a regular coordinate system by using the symmetry reconstruction branch based on the consistency loss function;
and when the object to be detected is an asymmetric object, determining a third mirror image point of each point in the point cloud data of the asymmetric object under the regular coordinate system by using the symmetry reconstruction branch based on the consistency loss function.
3. The method according to claim 2, wherein when the object to be measured is the rotationally symmetric object, further comprising:
determining a plurality of candidate poses;
calculating the object coordinates corresponding to all points in the observed point cloud under each candidate pose;
and obtaining an attitude loss function according to the minimum average distance between these object coordinates and the predicted coordinates.
4. The method of claim 3, wherein the attitude loss function is:
L_pose = min_i (1/N) Σ_p |q - C(p)|_1 ;
wherein p is the coordinate of a point in the observed point cloud, N is the number of points, C denotes the object coordinates corresponding to all points in the observed point cloud, C_i denotes the object coordinates corresponding to all points under the i-th candidate pose, and q = C_i(p) is the coordinate of point p under the candidate pose.
5. The method according to claim 1, wherein the geometric relationship between the pose of the object to be measured and the normalized coordinates is:
c_p = R^T (p - t) / L;
where p is the coordinate of a point in the observed point cloud, c_p is the normalized coordinate of p, L is the length of the diagonal of the object's three-dimensional bounding box, R is the rotation matrix, and t is the translation.
6. An attitude estimation device of an object, characterized by comprising:
the projection module is used for acquiring the area of an object to be detected based on a pre-trained object detector, and back-projecting a depth map corresponding to the area of the object to be detected to a three-dimensional space to obtain point cloud data of the object to be detected;
the acquisition module is used for obtaining a first characteristic of the point cloud data and a second characteristic of the prior shape information according to the point cloud data and the prior shape information of the category of the object to be detected; and
the estimation module is used for splicing the first feature and the second feature, inputting the spliced features to a regression attitude branch, a symmetry reconstruction branch and a recovery object shape branch respectively to obtain a prediction attitude, a symmetry reconstruction result, a mask of each point and a normalized coordinate of each point in the point cloud data of the object to be measured, and acquiring an attitude estimation result of the object to be measured based on the prediction attitude, the symmetry reconstruction result, the mask of each point and the normalized coordinate of each point in the point cloud data;
wherein the estimation module is specifically configured to: obtaining the real position of the object to be detected according to the normalized coordinates of the object to be detected and the real posture of the object to be detected; judging that the observation position is an inlier when the absolute value of the difference between the real position and the observation position is smaller than a distance threshold; and based on a consistency loss function, generating a mask of each point for each point in the point cloud data through the symmetry reconstruction branch, and estimating the posture of the object to be detected based on the mask of each point.
7. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the method of attitude estimation of an object according to any one of claims 1 to 5.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor for implementing a method of attitude estimation of an object according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210538722.7A CN115115700B (en) | 2022-05-17 | 2022-05-17 | Object attitude estimation method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115115700A CN115115700A (en) | 2022-09-27 |
CN115115700B true CN115115700B (en) | 2023-04-11 |
Family
ID=83326889
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210538722.7A Active CN115115700B (en) | 2022-05-17 | 2022-05-17 | Object attitude estimation method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115115700B (en) |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11462023B2 (en) * | 2019-11-14 | 2022-10-04 | Toyota Research Institute, Inc. | Systems and methods for 3D object detection |
CN112800822A (en) * | 2019-11-14 | 2021-05-14 | 丰田研究所股份有限公司 | 3D automatic tagging with structural and physical constraints |
CN112651944B (en) * | 2020-12-28 | 2023-08-22 | 哈尔滨工业大学(深圳) | 3C component high-precision six-dimensional pose estimation method and system based on CAD model |
CN113139996B (en) * | 2021-05-06 | 2024-02-06 | 南京大学 | Point cloud registration method and system based on three-dimensional point cloud geometric feature learning |
CN113393503B (en) * | 2021-05-24 | 2022-05-27 | 湖南大学 | Classification-driven shape prior deformation category-level object 6D pose estimation method |
CN113780240B (en) * | 2021-09-29 | 2023-12-26 | 上海交通大学 | Object pose estimation method based on neural network and rotation characteristic enhancement |
CN114004883B (en) * | 2021-09-30 | 2024-05-03 | 哈尔滨工业大学 | Visual perception method and device for curling ball, computer equipment and storage medium |
- 2022-05-17 CN CN202210538722.7A patent/CN115115700B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN115115700A (en) | 2022-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110763251B (en) | Method and system for optimizing visual inertial odometer | |
US9542745B2 (en) | Apparatus and method for estimating orientation of camera | |
CN109872366B (en) | Method and device for detecting three-dimensional position of object | |
US10033985B2 (en) | Camera pose estimation apparatus and method for augmented reality imaging | |
US20140002597A1 (en) | Tracking Poses of 3D Camera Using Points and Planes | |
KR102169309B1 (en) | Information processing apparatus and method of controlling the same | |
JP2002024807A (en) | Object movement tracking technique and recording medium | |
JP2011134012A (en) | Image processor, image processing method for the same and program | |
US20210374978A1 (en) | Capturing environmental scans using anchor objects for registration | |
JP2019109747A (en) | Position attitude estimation apparatus, position attitude estimation method, and program | |
JP2019190969A (en) | Image processing apparatus, and image processing method | |
JP2018195070A (en) | Information processing apparatus, information processing method, and program | |
Sánchez et al. | Towards real time 3D tracking and reconstruction on a GPU using Monte Carlo simulations | |
JP2001101419A (en) | Method and device for image feature tracking processing and three-dimensional data preparing method | |
CN115115700B (en) | Object attitude estimation method and device, electronic equipment and storage medium | |
US11055865B2 (en) | Image acquisition device and method of operating image acquisition device | |
JP4389663B2 (en) | Image processing method and image processing apparatus | |
JP2006113832A (en) | Stereoscopic image processor and program | |
CN116469101A (en) | Data labeling method, device, electronic equipment and storage medium | |
CN117252912A (en) | Depth image acquisition method, electronic device and storage medium | |
JP6204781B2 (en) | Information processing method, information processing apparatus, and computer program | |
JP7061092B2 (en) | Image processing equipment and programs | |
CN113284181A (en) | Scene map point and image frame matching method in environment modeling | |
WO2020217377A1 (en) | Degree of movement estimation device, degree of movement estimation method, and degree of movement estimation program | |
CN115115701A (en) | Object attitude estimation method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||