CN110706285A - Object pose prediction method based on CAD model - Google Patents
- Publication number
- CN110706285A (application CN201910947809.8A)
- Authority
- CN
- China
- Prior art keywords
- camera
- rotation
- pose
- cad model
- sampling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
- G01B11/002—Measuring arrangements characterised by the use of optical techniques for measuring two or more coordinates
- G01B11/24—Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
- G01C11/00—Photogrammetry or videogrammetry, e.g. stereogrammetry; photographic surveying
- G06N3/045—Neural networks; combinations of networks
- G06T7/13—Edge detection
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
Abstract
The invention discloses an object pose prediction method based on a CAD model, relating to the technical field of image processing. The method comprises the following steps: obtaining the relevant parameters of a monocular camera through calibration, and generating the data required for coarse matching from a CAD model; detecting and identifying an object in the image and outputting its mask, and obtaining the object's contour information from the mask; combining the contour information with the coarse-matching data to obtain a coarse pose of the object, and then obtaining an accurate pose through an iterative algorithm. The method can serve as an object pose detection algorithm when real-time requirements are modest, and offers high detection accuracy and strong anti-interference performance.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to an object pose prediction method based on a computer-aided design (CAD) model.
Background
Augmented Reality (AR) builds on computer graphics and visualization technology to place positioned virtual objects in three-dimensional space; it can fuse information from the real scene and the virtual scene, and offers real-time interactivity. Since the concept of AR-guided maintenance was proposed, research on AR in the maintenance field has deepened steadily. For example, when a robot using AR technology performs tasks such as grasping and welding, accurate three-dimensional pose information of the object must first be acquired from the visual information captured by a camera; likewise, in unmanned driving, aerospace, deep-sea operations, weapon guidance, and similar fields, the three-dimensional pose of an object must be judged in advance from visual-sensor information. Current AR systems rely mainly on sensors such as cameras, lidar, and ultrasonic radar. Cameras are further divided into monocular and binocular cameras; binocular cameras are bulky, heavy, expensive, and fragile, while ultrasonic radar suffers from low accuracy, poor real-time performance, inability to handle occlusion, and susceptibility to noise.
Disclosure of Invention
The technical problem the invention aims to solve is how to provide a low-cost identification method that can accurately obtain the pose of an object.
In order to solve the above technical problem, the technical scheme adopted by the invention is as follows: an object pose prediction method based on a CAD model, characterized by comprising the following steps:
obtaining the relevant parameters of a monocular camera through calibration, and generating the data required for coarse matching from a CAD model;
detecting and identifying an object in the image and outputting its mask, and obtaining the object's contour information from the mask;
and combining the object's contour information with the coarse-matching data to obtain a coarse pose of the object, and then obtaining an accurate pose of the object through an iterative algorithm.
A further technical scheme is that the method for obtaining the relevant parameters of the monocular camera through calibration comprises the following steps:
constructing a camera imaging model:
M is a three-dimensional space point and m is its projected image point on the image plane; according to the relationships between the coordinate systems involved in the camera, the projection from the world coordinate system to pixel coordinates is obtained:

Z_C·[u, v, 1]^T = [a_x 0 u_0 0; 0 a_y v_0 0; 0 0 1 0] · [R t; 0^T 1] · [X_W, Y_W, Z_W, 1]^T (1)

equation (1) can be written as (2):

Z_C·[u, v, 1]^T = K·M_1·[X_W, Y_W, Z_W, 1]^T = M·[X_W, Y_W, Z_W, 1]^T (2)

where a_x and a_y are the scale factors of the horizontal and vertical axes of the image, respectively, and u_0 and v_0 are the principal-point coordinates; K is the camera intrinsic parameter matrix; M_1 contains a rotation matrix and a translation vector, and its parameters are determined by the position of the camera coordinate system relative to the world coordinate system, so M_1 is called the camera extrinsic parameter matrix; the product M of the intrinsic and extrinsic parameter matrices is the projection matrix; X_W, Y_W, and Z_W are the x-, y-, and z-axis coordinates of the object center W in the world coordinate system;
the focal length of the camera is f, the optical axis is the positive z direction, the x and y axes lie in the plane of the optical center O, and the optical center O is taken as the origin of the camera coordinate system; the position of the object center is represented by W in the camera coordinate system, where:
W = (W_x, W_y, W_z) (3)
if P = (u, v) are the coordinates of the object's corresponding pixel on the image and K is the camera intrinsic parameter matrix, the following equation is obtained:
[u, v, 1]^T = (1/W_z)·K·[W_x, W_y, W_z]^T (4)
this equation gives the two-dimensional pixel position P obtained when the actual object center W, expressed in the camera coordinate system, is projected onto the image through the camera intrinsics K.
A further technical scheme is that the method for generating the coarse-matching data from the CAD model is as follows:
firstly, rendering the mask of the object at a specified pose from the object's CAD model, obtaining the object's bounding box from the mask, and then sampling the object's contour at intervals along the bounding box according to requirements;
taking the length L of the left edge of the bounding box as a reference, dividing L into n equal parts, taking every L/n as a sampling position, traversing the points on the contour, and, when a point's coordinate coincides with a sampling position, computing its distance to the left edge;
normalizing the sampled values, i.e., unifying the left-edge length to one unit;
and sampling the object's contour at different rotation angles around the center of the CAD model at a specified distance, and storing the contour samples together with the corresponding pose information to obtain the object's coarse-matching template data.
A further technical solution is that the method for detecting and identifying objects in an image and outputting the image's mask is as follows:
image recognition is performed with a Mask-RCNN neural network, which outputs the object's category and mask.
A further technical scheme is that, when training the Mask-RCNN neural network, the training dataset is generated automatically with Blender and OpenCV software.
A further technical scheme is that the coarse pose-matching method is as follows:
the pose of the rigid body comprises a rotation part R and a displacement part T, and the matching process of the rotation part is as follows:
firstly, normalizing the output contour information so that it is compared at the same scale;
if the sampled data of the object's actual mask is S_in and the i-th group of template data is S_i, each group containing n sampling values, the L_1 distance between the actual mask samples and each template group is computed, the L_1 distance L_i of the i-th group being:

L_i = Σ_{j=1}^{n} |S_in(j) − S_i(j)| (5)

ideally, the sampling values coincide when the poses coincide, i.e., the rotation angle whose distance is 0 in the template data is the rotation angle of the contour; therefore, among all results satisfying a threshold, the rotation angle of the minimum value is taken as the currently matched rotation angle, and matching is considered to have failed when the threshold is not met;
in coarse matching, the error in each degree of freedom of the Euler angles is controlled to be at most 12°; the Euler angles are then converted into a rotation matrix R, giving the rotation information of the object;
the algorithm of the translation part is as follows:
when the template data are generated, the object is sampled at a specified distance and the CAD model's size is known, so the size of the object's bounding box is inversely proportional to its distance; that is, the smaller the bounding box, the farther the object, which is consistent with human vision; the distance between the model's center point and the camera's optical center is found from (6):

D = (w_in / w_i) · D_i (6)

where w_in is the width of the bounding box output by object identification, w_i is the bounding-box width of the template data matched to the object's rotation, D_i is the specified distance at which the template data were collected, and D is the distance between the model's center point and the camera's optical center;
since the size of the CAD model is known a priori, the actual physical distance represented by each pixel in the template can be calculated, from which the object's displacement vector t = (t_x, t_y, t_z) is obtained, where t_x is the displacement of the object along the x-axis and t_y along the y-axis; after the rotation R and the displacement T of the object are obtained, the object's world coordinates are obtained by combining them with the camera's intrinsic and extrinsic parameters.
The further technical scheme is that the method for obtaining the accurate pose of the object through the iterative algorithm comprises the following steps:
if the coarse-matching rotation of the object is A = (ψ, θ, φ), each coordinate axis is incremented and decremented by an angle Δε on that basis, with Δε set to half the coarse-matching interval, giving the angles of A's neighborhood in the coarse-matching rotation space; the object's contours at these neighboring angles are obtained from the CAD model, the contour-sampling method is applied with equation (5), and the rotation A_1 = (ψ_1, θ_1, φ_1) minimizing L_i in equation (5) is taken as the object's rotation after one iteration;
then Δε is repeatedly halved to obtain angle values over a smaller range for further iteration, finally yielding a rotation angle for which equation (5) equals 0;
and combining this with the translation information obtained in the coarse matching to obtain the accurate pose of the object.
The beneficial effects of the above technical scheme are as follows: the method first obtains the camera's relevant parameters through calibration and generates the data required for coarse matching from a CAD model; a deep neural network or another algorithm then detects and identifies the object in the image and outputs its mask, from which the object's contour information is obtained; combining the contour information with the coarse-matching data yields the coarse pose of the object, and an iterative algorithm then yields the accurate pose.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a method according to an embodiment of the invention;
FIG. 2 is a diagram of a coordinate system in an embodiment of the present invention;
FIG. 3 is a diagram illustrating a pinhole model of a camera according to an embodiment of the present invention;
FIG. 4 is a graph of the results of sampling an object profile in an embodiment of the present invention;
FIG. 5 is a diagram illustrating the segmentation effect of Mask-RCNN images according to an embodiment of the present invention;
FIG. 6 is a diagram showing the results of Mask-RCNN image recognition in an embodiment of the present invention;
FIG. 7 is a comparison graph of coarse matching and post-iteration poses in an embodiment of the present invention;
FIG. 8 is a diagram of the pose accuracy of an object under occlusion in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
As shown in fig. 1, an embodiment of the present invention discloses a method for predicting the pose of an object based on a CAD model, which includes the following steps:
The method comprises the following steps: first, the camera's relevant parameters are obtained through calibration and the data required for coarse matching are generated from the CAD model; a deep neural network or another algorithm then detects and identifies the object in the image and outputs its mask; the object's contour information is obtained from the mask; combining the contour information with the coarse-matching data yields the object's coarse pose, and the accurate pose is then obtained through an iterative algorithm.
The above method is explained in detail below:
a camera imaging model:
M is a three-dimensional space point, and m is its projected image point on the image plane. From the relationships between the coordinate systems involved in the camera (the coordinate-system relationships are shown in Fig. 2), the projection from the world coordinate system to pixel coordinates is obtained:

Z_C·[u, v, 1]^T = [a_x 0 u_0 0; 0 a_y v_0 0; 0 0 1 0] · [R t; 0^T 1] · [X_W, Y_W, Z_W, 1]^T (1)

Equation (1) can be written as (2):

Z_C·[u, v, 1]^T = K·M_1·[X_W, Y_W, Z_W, 1]^T = M·[X_W, Y_W, Z_W, 1]^T (2)

where a_x and a_y are the scale factors of the horizontal and vertical axes of the image, respectively, and u_0 and v_0 are the principal-point coordinates; K contains the focal length, principal-point coordinates, and other internal parameters, and is called the intrinsic parameter matrix; M_1 contains a rotation matrix and a translation vector, and its parameters are determined by the position of the camera coordinate system relative to the world coordinate system, so it is called the camera extrinsic parameter matrix; the product M of the intrinsic and extrinsic parameter matrices is called the projection matrix. By comparing equations (1) and (2), the concrete expressions of the camera's intrinsic and extrinsic parameters represented by these matrices are easily determined; calibrating the camera consists of determining these intrinsic and extrinsic parameters.
Calibrating a camera:
Assume the camera follows the pinhole model, as shown in Fig. 3. The pose of an object is the general term for its position and orientation.
The focal length of the camera is f, the axis is the positive z direction, the x and y axes are on the plane of the optical center O, the optical center O is used as the origin of the camera coordinate system, and the position of the center of the object can be represented by W in the camera coordinate system, wherein:
W=(Wx,Wy,Wz) (3)
the specified object center is the position of the object CAD model center, and is generally the volume center. If P is (u, v) the coordinates of the corresponding pixel of the object on the image and K is the camera reference matrix, then this equation can be obtained:
this equation represents the two-dimensional coordinate position P of the actual object center position W after passing through the camera intrinsic parameters K and being projected onto the image in the camera coordinate system.
Therefore, to obtain the three-dimensional coordinate position of the object, the camera internal reference K must be calibrated, where there are many calibration methods, and in the present application, K is obtained by using the camera calibration method provided by Opencv software.
Generating coarse matching data by using a CAD model:
The core algorithm in generating template data from the object's CAD model is a sampling algorithm based on the object's contour. First, the object's mask is rendered at a specified pose from the CAD model, and the object's bounding box is obtained from the mask; the contour is then sampled at regular intervals along the bounding box according to requirements. As shown in Fig. 4, the contour is sampled at regular intervals along the left edge.
Taking the length L of the left bounding-box edge as a reference (the other edges are treated similarly), L is divided into n equal parts and every L/n is taken as a sampling position; the points on the contour are traversed, and when a point's coordinate coincides with a sampling position, its distance to the left edge is computed. The contour information is thus turned into a set of sample values.
Because the contour's size varies, the sample values must be normalized, i.e., the left-edge length is unified to one unit; in the experiments it is unified to 128 px, which balances sampling accuracy against sampling speed.
The advantage of this sampling scheme is that it yields a set of contour features, the sample values, which are scale-invariant with respect to the contour yet sensitive to object rotation, and whose dimensions are consistent, making comparison convenient.
And sampling the contour of the object at different rotation angles by taking the center of the object CAD model as the center at the specified distance, and storing the contour sampling information and the corresponding pose information to obtain rough matching template data of the object.
Image recognition output object mask:
The approach to image recognition with the most prominent results at present uses deep neural networks, among which Mask-RCNN is one of the better-performing models; its effect is shown in Fig. 5. After training, the network outputs the object's category and mask in real time with high accuracy, so it serves as the image-detection front-end module of the method.
As neural networks continue to develop, various algorithms and deep-network models may surpass Mask-RCNN; the algorithm here is compatible with any algorithm or deep-network model that outputs a mask or contour, i.e., it can serve as a general solution. When training the Mask-RCNN neural network, the dataset is generated automatically with Blender and OpenCV software, and the recognition accuracy is high.
And (3) coarse matching algorithm:
the pose of the rigid body comprises a rotation part R and a displacement part T, and the matching process of the rotation part is as follows:
Because the resolutions of the object masks output for different frames differ, too low a resolution degrades the data gathered by the sampling algorithm, while too high a resolution slows down sampling. Therefore, as in the sampling algorithm, the output contour information is first normalized and unified to the same scale for comparison.
If the sampling data of the actual shade of the object is SinThe ith group of data in the template data is SiAnd each group has n sampling values, calculating L of each group of actual mask sampling data and template1Distance, L of the ith group of data1Distance LiComprises the following steps:
ideally, when the positions are the same, the sampling values should be consistent, that is, the rotation angle with the distance of 0 in the template data is the rotation angle corresponding to the profile, in practice, if the angle is too finely divided, a large amount of data is generated, and the matching is too slow, so that the rotation angle corresponding to the minimum value satisfying the threshold in all the results is taken as the rotation angle obtained by current matching, and the rotation angle not satisfying the threshold is regarded as the matching failure. In order to guarantee the matching speed, in the coarse matching, the error is controlled to be not more than 12 degrees per degree of freedom error at the Euler angle (namely, 360 degrees are equally divided into 30 parts for sampling to generate a coarse matching template). And then the Euler angle information can be converted into a rotation matrix R, so that the rotation information of the object can be obtained.
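A compact sketch of this coarse rotation match; the threshold value and the layout of the template arrays are assumptions, not fixed by the text above.

```python
import numpy as np

def coarse_match_rotation(s_in, templates, angles, threshold=1.0):
    """Coarse rotation match via equation (5).

    s_in      : (n,) normalized samples of the actual mask
    templates : (m, n) array, row i holding the template samples S_i
    angles    : (m, 3) Euler angles corresponding to each template row
    threshold : maximum acceptable L1 distance (assumed parameter)
    Returns the Euler angles of the best template, or None on failure.
    """
    dists = np.abs(templates - s_in).sum(axis=1)  # L_i = sum_j |S_in(j) - S_i(j)|
    i = dists.argmin()
    if dists[i] > threshold:                      # no template close enough
        return None
    return angles[i]
```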
The algorithm of the translation part is as follows:
When the template data are generated, the object is sampled at a specified distance and the CAD model's size is known, so the size of the object's bounding box is inversely proportional to its distance: the smaller the bounding box, the farther the object, which is consistent with human visual perception. The distance between the model's center point and the camera's optical center is found from (6):

D = (w_in / w_i) · D_i (6)

where w_in is the width of the bounding box output by object recognition (the bounding-box height can be used equivalently), w_i is the bounding-box width of the template data matched to the object's rotation, D_i is the specified distance at which the template data were collected, and D is the distance between the model's center point and the camera's optical center.
Similarly, since the CAD model has known a priori information about its size, the actual physical distance represented by each pixel in the template can be calculated, and the displacement vector of the object can be calculated:
in an actual experiment, since information at a sub-pixel level cannot be acquired, when the calculation is performed only by using pixels, the displacement vector error is large, and the problem can be solved by improving the resolution of a camera image, that is, the higher the resolution of the camera is, the more accurate the position of an object is. After the rotation R and the displacement T of the object are obtained, the world coordinates of the object can be obtained by combining internal and external parameters of the camera.
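A sketch of the translation step: the depth follows equation (6), and, since equation (7) is not reproduced here, recovering (t_x, t_y, t_z) by back-projecting the detected mask center along the camera ray is our assumption rather than the patent's exact formula.

```python
import numpy as np

def estimate_translation(w_in, w_i, D_i, center_px, K):
    """Estimate the object's translation in the camera frame.

    w_in      : bounding-box width output by object recognition
    w_i       : bounding-box width of the matched template
    D_i       : specified distance at template-collection time
    center_px : (u, v) pixel coordinates of the detected object center
    K         : 3x3 camera intrinsic matrix
    """
    D = (w_in / w_i) * D_i                    # eq. (6): smaller box -> farther
    ray = np.linalg.inv(K) @ np.array([center_px[0], center_px[1], 1.0])
    t = D * ray / np.linalg.norm(ray)         # point at distance D along the ray
    return t                                  # (t_x, t_y, t_z), assumed form
```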
An iterative algorithm:
After the object's coarse pose is obtained, its rotation theoretically has an error of less than 12°. To eliminate this error, an iterative algorithm is introduced; it computes on the coarsely matched rotation information and finally yields rotation information with zero error (at a floating-point precision of 8 decimal places).
Let the coarse-matching rotation of the object be A = (ψ, θ, φ). Each coordinate axis is incremented and decremented by a small angle Δε; since the coarse-matching interval was set to 12°, Δε is set to 6°, half that interval, and thus the 26 angles of A's neighborhood in the coarse-matching rotation space are obtained. The object's contours at these 26 angles are obtained from the CAD model, the contour-sampling method is applied with equation (5), and the rotation A_1 = (ψ_1, θ_1, φ_1) minimizing L_i in equation (5) is taken as the object's rotation after one iteration.
Δε is then repeatedly halved to obtain angle values over a smaller range for further iteration, finally yielding a rotation angle for which equation (5) equals 0 (at a floating-point precision of 8 decimal places); more accurate rotation information can be obtained by raising the computer's floating-point precision.
Combining this with the translation information obtained in coarse matching gives the object's accurate 6-DoF pose.
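A sketch of the refinement loop described above; render_samples is a hypothetical helper that renders the CAD model at given Euler angles and returns its normalized contour samples, and the six iterations mirror the average reported in the experiments below.

```python
import numpy as np

def refine_rotation(A, s_in, render_samples, delta=6.0, iters=6):
    """Refine a coarse rotation A = (psi, theta, phi).

    At each step, evaluate the 26 neighbors of A at offset delta (plus A
    itself), keep the candidate minimizing the L1 distance of eq. (5)
    against the observed samples s_in, then halve delta and repeat.
    """
    A = np.asarray(A, dtype=float)
    for _ in range(iters):
        candidates = [A + delta * np.array([i, j, k])
                      for i in (-1, 0, 1) for j in (-1, 0, 1) for k in (-1, 0, 1)]
        dists = [np.abs(render_samples(c) - s_in).sum() for c in candidates]
        A = candidates[int(np.argmin(dists))]   # includes A itself (i=j=k=0)
        delta /= 2.0                            # shrink the search radius
    return A
```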
Experimental data:
The experimental environment was configured as follows: a Lenovo Y7000 notebook running Ubuntu 16.04, with Python 3.6 as the programming language.
Object recognition accuracy: the mask data input to the method are the masks output by the Mask-RCNN neural network. Owing to the network's strong performance, training on the self-built dataset achieved ideal recognition accuracy, which meets the method's requirements, as shown in Fig. 6.
Rotation accuracy: the coarse-matching rotation accuracy is determined by the generated coarse-matching data template. In the experiment, each degree of freedom of the Euler angles was divided into 30 equal parts during coarse matching, so the error does not exceed 12° (360°/30), and the maximum precision of 8-digit floating point is reached after 6 iterations on average. Fig. 7 compares the results after coarse matching and after iteration, showing a random-pose mask of the object (i.e., the input), the rendered image of the object at the coarsely matched pose, and the difference image of the two.
Experiments show that the post-iteration rotation error is 0 at 8-digit floating point; the green error region in the difference image is caused by the position error.
Position accuracy: since the method computes position from the object's bounding box, accuracy is limited by pixel precision. In extreme cases, for example when a distant object appears small and its bounding box shrinks proportionally, a one-pixel difference in the bounding box produces a large increase in position error. The object's position accuracy therefore depends on the camera resolution: the higher the resolution, the smaller the bounding-box error and the higher the position accuracy.
Experiments show that, at a camera resolution of 512x512 pixels, the position accuracy of a 4x4x3 cm object varies with its distance from the camera as shown in Table 1:
TABLE 1. Position error versus distance

Distance between object and camera (mm) | Error (mm)
---|---
500 | 0-5
1000 | 2-12
2000 | 10-100
5000 | >100
Comparison with other related pose methods:
Compared with typical neural-network methods such as SSD-6D and BB8, under the currently common evaluation standards (2D projection, 5cm-5°, or 6D pose) the method's rotation accuracy is close to 100%, far exceeding the other algorithms. The main error source is the position error; since the position error depends on the resolution of the camera image, no direct comparison with other methods is made for position.
The difference in real-time performance is large: running in the personal-notebook environment, the method takes about 0.6 s for the coarse matching of one image, and the average time including iteration is about 40-60 s. In general, neural-network-based 6-DoF pose methods run essentially in real time (>20 fps), while among algorithm-based 6-DoF pose methods the representative LineMOD reaches roughly 15-18 fps.
The method's anti-interference capability is outstanding: as long as the object's contour sampling is basically correct, missing parts of the object mask have little influence on the computed pose, as shown in Fig. 8.
In conclusion, the method can serve as a general algorithm for object pose detection when real-time requirements are modest, with high detection accuracy and strong anti-interference performance; in practical applications, real-time performance can be improved with C++ code and parallel computing to meet usage requirements.
Claims (7)
1. An object pose prediction method based on a CAD model is characterized by comprising the following steps:
obtaining the relevant parameters of a monocular camera through calibration, and generating the data required for coarse matching from a CAD model;
detecting and identifying an object in the image and outputting its mask, and obtaining the object's contour information from the mask;
and combining the object's contour information with the coarse-matching data to obtain a coarse pose of the object, and then obtaining an accurate pose of the object through an iterative algorithm.
2. The CAD model-based object pose prediction method of claim 1, wherein the method for obtaining monocular camera related parameters by calibration comprises the steps of:
constructing a camera imaging model:
M is a three-dimensional space point and m is its projected image point on the image plane; according to the relationships between the coordinate systems involved in the camera, the projection from the world coordinate system to pixel coordinates is obtained:

Z_C·[u, v, 1]^T = [a_x 0 u_0 0; 0 a_y v_0 0; 0 0 1 0] · [R t; 0^T 1] · [X_W, Y_W, Z_W, 1]^T (1)

equation (1) can be written as (2):

Z_C·[u, v, 1]^T = K·M_1·[X_W, Y_W, Z_W, 1]^T = M·[X_W, Y_W, Z_W, 1]^T (2)

where a_x and a_y are the scale factors of the horizontal and vertical axes of the image, respectively, and u_0 and v_0 are the principal-point coordinates; K is the camera intrinsic parameter matrix; M_1 contains a rotation matrix and a translation vector, and its parameters are determined by the position of the camera coordinate system relative to the world coordinate system, so M_1 is called the camera extrinsic parameter matrix; the product M of the intrinsic and extrinsic parameter matrices is the projection matrix; X_W, Y_W, and Z_W are the x-, y-, and z-axis coordinates of the object center W in the world coordinate system;
the focal length of the camera is f, the optical axis is the positive z direction, the x and y axes lie in the plane of the optical center O, and the optical center O is taken as the origin of the camera coordinate system; the position of the object center is represented by W in the camera coordinate system, where:
W = (W_x, W_y, W_z) (3)
the object center is specified as the position of the center of the object's CAD model; if P = (u, v) are the coordinates of the object's corresponding pixel on the image and K is the camera intrinsic parameter matrix, the following equation is obtained:
[u, v, 1]^T = (1/W_z)·K·[W_x, W_y, W_z]^T (4)
this equation gives the two-dimensional pixel position P obtained when the actual object center W, expressed in the camera coordinate system, is projected onto the image through the camera intrinsics K.
3. The CAD model-based object pose prediction method of claim 1, wherein the method of generating coarse match data using a CAD model is as follows:
firstly, rendering the mask of the object at a specified pose from the object's CAD model, obtaining the object's bounding box from the mask, and then sampling the object's contour at intervals along the bounding box according to requirements;
taking the length L of the left edge of the bounding box as a reference, dividing L into n equal parts, taking every L/n as a sampling position, traversing the points on the contour, and, when a point's coordinate coincides with a sampling position, computing its distance to the left edge;
normalizing the sampled values, i.e., unifying the left-edge length to one unit;
and sampling the object's contour at different rotation angles around the center of the CAD model at a specified distance, and storing the contour samples together with the corresponding pose information to obtain the object's coarse-matching template data.
4. The CAD model-based object pose prediction method of claim 1, wherein the method of detecting masks that identify objects in an image and output an image is as follows:
and performing image recognition by using a Mask-RCNN neural network, and outputting the category of the object and the Mask of the object.
5. The CAD model-based object pose prediction method of claim 4, wherein, during training of the Mask-RCNN neural network, the training dataset is generated automatically with Blender and OpenCV software.
6. The CAD model-based object pose prediction method of claim 1, wherein the coarse matching pose method is as follows:
the pose of the rigid body comprises a rotation part R and a displacement part T, and the matching process of the rotation part is as follows:
firstly, normalizing the output contour information so that it is compared at the same scale;
if the sampled data of the object's actual mask is S_in and the i-th group of template data is S_i, each group containing n sampling values, the L_1 distance between the actual mask samples and each template group is computed, the L_1 distance L_i of the i-th group being:

L_i = Σ_{j=1}^{n} |S_in(j) − S_i(j)| (5)

ideally, the sampling values coincide when the poses coincide, i.e., the rotation angle whose distance is 0 in the template data is the rotation angle of the contour; therefore, among all results satisfying a threshold, the rotation angle of the minimum value is taken as the currently matched rotation angle, and matching is considered to have failed when the threshold is not met;
in coarse matching, the error in each degree of freedom of the Euler angles is controlled to be at most 12°; the Euler angles are then converted into a rotation matrix R, giving the rotation information of the object;
the algorithm of the translation part is as follows:
when the template data are generated, the object is sampled at a specified distance and the CAD model's size is known, so the size of the object's bounding box is inversely proportional to its distance; that is, the smaller the bounding box, the farther the object, which is consistent with human vision; the distance between the model's center point and the camera's optical center is found from (6):

D = (w_in / w_i) · D_i (6)

where w_in is the width of the bounding box output by object identification, w_i is the bounding-box width of the template data matched to the object's rotation, D_i is the specified distance at which the template data were collected, and D is the distance between the model's center point and the camera's optical center;
since the size of the CAD model is known a priori, the actual physical distance represented by each pixel in the template can be calculated, from which the object's displacement vector t = (t_x, t_y, t_z) is obtained, where t_x is the displacement of the object along the x-axis and t_y along the y-axis; after the rotation R and the displacement T of the object are obtained, the object's world coordinates are obtained by combining them with the camera's intrinsic and extrinsic parameters.
7. The CAD model-based object pose prediction method of claim 6, wherein the exact pose of the object is obtained by an iterative algorithm as follows:
if the coarse-matching rotation of the object is A = (ψ, θ, φ), each coordinate axis is incremented and decremented by an angle Δε on that basis, with Δε set to half the coarse-matching interval, giving the angles of A's neighborhood in the coarse-matching rotation space; the object's contours at these neighboring angles are obtained from the CAD model, the contour-sampling method is applied with equation (5), and the rotation A_1 = (ψ_1, θ_1, φ_1) minimizing L_i in equation (5) is taken as the object's rotation after one iteration;
then Δε is repeatedly halved to obtain angle values over a smaller range for further iteration, finally yielding a rotation angle for which equation (5) equals 0;
and combining this with the translation information obtained in the coarse matching to obtain the accurate pose of the object.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910947809.8A | 2019-10-08 | 2019-10-08 | Object pose prediction method based on CAD model
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910947809.8A | 2019-10-08 | 2019-10-08 | Object pose prediction method based on CAD model
Publications (1)
Publication Number | Publication Date |
---|---|
CN110706285A (en) | 2020-01-17
Family
ID=69196741
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910947809.8A | Object pose prediction method based on CAD model | 2019-10-08 | 2019-10-08
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110706285A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112465898A (en) * | 2020-11-20 | 2021-03-09 | 上海交通大学 | Object 3D pose tag acquisition method based on checkerboard calibration plate |
CN112630639A (en) * | 2020-12-01 | 2021-04-09 | 国网江苏省电力有限公司检修分公司 | System and method for online detection of meshing state of handcart contact of high-voltage switch cabinet |
CN115033998A (en) * | 2022-07-13 | 2022-09-09 | 北京航空航天大学 | Personalized 2D data set construction method for mechanical parts |
WO2022252487A1 (en) * | 2021-06-04 | 2022-12-08 | 浙江商汤科技开发有限公司 | Pose acquisition method, apparatus, electronic device, storage medium, and program |
EP4166281A4 (en) * | 2020-07-29 | 2024-03-13 | Siemens Ltd. China | Method and apparatus for robot to grab three-dimensional object |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110157178A1 (en) * | 2009-12-28 | 2011-06-30 | Cuneyt Oncel Tuzel | Method and System for Determining Poses of Objects |
CN103365249A (en) * | 2013-07-10 | 2013-10-23 | 西安电子科技大学 | Rapid solving method for failure workspace of six-degree-of-freedom parallel robot |
CN104596502A (en) * | 2015-01-23 | 2015-05-06 | 浙江大学 | Object posture measuring method based on CAD model and monocular vision |
CN106600639A (en) * | 2016-12-09 | 2017-04-26 | 江南大学 | Genetic algorithm and adaptive threshold constraint-combined ICP (iterative closest point) pose positioning technology |
CN106845354A (en) * | 2016-12-23 | 2017-06-13 | 中国科学院自动化研究所 | Partial view base construction method, part positioning grasping means and device |
CN106845515A (en) * | 2016-12-06 | 2017-06-13 | 上海交通大学 | Robot target identification and pose reconstructing method based on virtual sample deep learning |
CN107818577A (en) * | 2017-10-26 | 2018-03-20 | 滁州学院 | A kind of Parts Recognition and localization method based on mixed model |
CN108010082A (en) * | 2017-12-28 | 2018-05-08 | 上海觉感视觉科技有限公司 | A kind of method of geometric match |
CN108555908A (en) * | 2018-04-12 | 2018-09-21 | 同济大学 | A kind of identification of stacking workpiece posture and pick-up method based on RGBD cameras |
CN109087323A (en) * | 2018-07-25 | 2018-12-25 | 武汉大学 | A kind of image three-dimensional vehicle Attitude estimation method based on fine CAD model |
CN109801337A (en) * | 2019-01-21 | 2019-05-24 | 同济大学 | A kind of 6D position and orientation estimation method of Case-based Reasoning segmentation network and iteration optimization |
CN110097598A (en) * | 2019-04-11 | 2019-08-06 | 暨南大学 | A kind of three-dimension object position and orientation estimation method based on PVFH feature |
CN110298854A (en) * | 2019-05-17 | 2019-10-01 | 同济大学 | The snakelike arm co-located method of flight based on online adaptive and monocular vision |
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110157178A1 (en) * | 2009-12-28 | 2011-06-30 | Cuneyt Oncel Tuzel | Method and System for Determining Poses of Objects |
CN103365249A (en) * | 2013-07-10 | 2013-10-23 | 西安电子科技大学 | Rapid solving method for failure workspace of six-degree-of-freedom parallel robot |
CN104596502A (en) * | 2015-01-23 | 2015-05-06 | 浙江大学 | Object posture measuring method based on CAD model and monocular vision |
CN106845515A (en) * | 2016-12-06 | 2017-06-13 | 上海交通大学 | Robot target identification and pose reconstructing method based on virtual sample deep learning |
CN106600639A (en) * | 2016-12-09 | 2017-04-26 | 江南大学 | Genetic algorithm and adaptive threshold constraint-combined ICP (iterative closest point) pose positioning technology |
CN106845354A (en) * | 2016-12-23 | 2017-06-13 | 中国科学院自动化研究所 | Partial view base construction method, part positioning grasping means and device |
CN107818577A (en) * | 2017-10-26 | 2018-03-20 | 滁州学院 | A kind of Parts Recognition and localization method based on mixed model |
CN108010082A (en) * | 2017-12-28 | 2018-05-08 | 上海觉感视觉科技有限公司 | A kind of method of geometric match |
CN108555908A (en) * | 2018-04-12 | 2018-09-21 | 同济大学 | A kind of identification of stacking workpiece posture and pick-up method based on RGBD cameras |
CN109087323A (en) * | 2018-07-25 | 2018-12-25 | 武汉大学 | A kind of image three-dimensional vehicle Attitude estimation method based on fine CAD model |
CN109801337A (en) * | 2019-01-21 | 2019-05-24 | 同济大学 | A kind of 6D position and orientation estimation method of Case-based Reasoning segmentation network and iteration optimization |
CN110097598A (en) * | 2019-04-11 | 2019-08-06 | 暨南大学 | A kind of three-dimension object position and orientation estimation method based on PVFH feature |
CN110298854A (en) * | 2019-05-17 | 2019-10-01 | 同济大学 | The snakelike arm co-located method of flight based on online adaptive and monocular vision |
Non-Patent Citations (3)
Title |
---|
YIBO CUI et al., "Estimation of 6Dof Pose Using Image Mask and Bounding Box", IGTA 2019: Image and Graphics Technologies and Applications
ZHUANGNAN XU et al., "A Monocular Object Pose Recognition Algorithm Based on CAD Model and Object Contour", Journal of Computing and Electronic Information Management
CUI Yibo et al., "Object 6DoF Pose Estimation Using RGB Images and DNN", Computer Simulation
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4166281A4 (en) * | 2020-07-29 | 2024-03-13 | Siemens Ltd. China | Method and apparatus for robot to grab three-dimensional object |
CN112465898A (en) * | 2020-11-20 | 2021-03-09 | 上海交通大学 | Object 3D pose tag acquisition method based on checkerboard calibration plate |
CN112630639A (en) * | 2020-12-01 | 2021-04-09 | 国网江苏省电力有限公司检修分公司 | System and method for online detection of meshing state of handcart contact of high-voltage switch cabinet |
CN112630639B (en) * | 2020-12-01 | 2022-12-23 | 国网江苏省电力有限公司检修分公司 | System and method for online detection of meshing state of handcart contact of high-voltage switch cabinet |
WO2022252487A1 (en) * | 2021-06-04 | 2022-12-08 | 浙江商汤科技开发有限公司 | Pose acquisition method, apparatus, electronic device, storage medium, and program |
CN115033998A (en) * | 2022-07-13 | 2022-09-09 | 北京航空航天大学 | Personalized 2D data set construction method for mechanical parts |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110706285A (en) | Object pose prediction method based on CAD model | |
Yang et al. | Monocular object and plane slam in structured environments | |
CN109345588B (en) | Tag-based six-degree-of-freedom attitude estimation method | |
CN109544677B (en) | Indoor scene main structure reconstruction method and system based on depth image key frame | |
CN105021124B (en) | A kind of planar part three-dimensional position and normal vector computational methods based on depth map | |
CN111897349B (en) | Autonomous obstacle avoidance method for underwater robot based on binocular vision | |
CN108122256B (en) | A method of it approaches under state and rotates object pose measurement | |
CN110688947B (en) | Method for synchronously realizing human face three-dimensional point cloud feature point positioning and human face segmentation | |
EP3159125A1 (en) | Device for recognizing position of mobile robot by using direct tracking, and method therefor | |
CN111401266B (en) | Method, equipment, computer equipment and readable storage medium for positioning picture corner points | |
EP3751517A1 (en) | Fast articulated motion tracking | |
KR100874817B1 (en) | Facial feature detection method, media and apparatus using stereo combining mechanism | |
US20050265604A1 (en) | Image processing apparatus and method thereof | |
EP3159122A1 (en) | Device and method for recognizing location of mobile robot by means of search-based correlation matching | |
CN113393524B (en) | Target pose estimation method combining deep learning and contour point cloud reconstruction | |
CN114022542A (en) | Three-dimensional reconstruction-based 3D database manufacturing method | |
EP3185212B1 (en) | Dynamic particle filter parameterization | |
CN108335325A (en) | A kind of cube method for fast measuring based on depth camera data | |
CN113439289A (en) | Image processing for determining the thickness of an object | |
Sun et al. | A fast underwater calibration method based on vanishing point optimization of two orthogonal parallel lines | |
CN111709269B (en) | Human hand segmentation method and device based on two-dimensional joint information in depth image | |
CN108694348B (en) | Tracking registration method and device based on natural features | |
CN105339981A (en) | Method for registering data using set of primitives | |
CN111915632B (en) | Machine learning-based method for constructing truth database of lean texture target object | |
CN117218205B (en) | Camera external parameter correction method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200117