CN110706285A - Object pose prediction method based on CAD model - Google Patents

Object pose prediction method based on CAD model

Info

Publication number
CN110706285A
Authority
CN
China
Prior art keywords
camera
rotation
pose
cad model
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910947809.8A
Other languages
Chinese (zh)
Inventor
许状男
王广龙
刁俊岐
庞健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Army Engineering University of PLA filed Critical Army Engineering University of PLA
Priority to CN201910947809.8A priority Critical patent/CN110706285A/en
Publication of CN110706285A publication Critical patent/CN110706285A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01B MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00 Measuring arrangements characterised by the use of optical techniques
    • G01B11/002 Measuring arrangements characterised by the use of optical techniques for measuring two or more coordinates
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01B MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00 Measuring arrangements characterised by the use of optical techniques
    • G01B11/24 Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C11/00 Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an object pose prediction method based on a CAD model, and relates to the technical field of image processing methods. The method comprises the following steps: obtaining the relevant parameters of a monocular camera through calibration, and generating the data required for coarse matching from the CAD model; detecting and identifying an object in the image, outputting a mask of the object, and obtaining the contour information of the object from the mask; combining the contour information of the object with the coarse-matching data to obtain a coarse-matched pose of the object, and then obtaining the accurate pose of the object through an iterative algorithm. The method can be used as an algorithm for detecting the pose of an object when real-time requirements are not high, and it offers high detection precision and strong anti-interference performance.

Description

Object pose prediction method based on CAD model
Technical Field
The invention relates to the technical field of image processing methods, in particular to an object pose prediction method based on a computer-aided design (CAD) model.
Background
Augmented Reality (AR) builds on computer graphics and visualization technology to place positioned virtual objects in three-dimensional space; it can fuse information from real and virtual scenes and offers real-time interactivity. Since the concept of augmented-reality-guided maintenance was proposed, research on AR in the maintenance field has steadily deepened. For example, when a robot using augmented reality technology performs tasks such as grasping and welding, it must first acquire accurate three-dimensional pose information of the object from the visual information captured by a camera; likewise, in unmanned driving, aerospace, deep-sea operations, weapon guidance and similar areas, the three-dimensional pose of an object must be estimated in advance from visual sensor information. Current augmented-reality sensors mainly rely on cameras, laser radar, ultrasonic radar and the like. Cameras are divided into monocular and binocular cameras; binocular cameras are bulky, heavy, expensive and fragile, while ultrasonic radar has low precision, poor real-time performance, cannot handle occlusion and is easily affected by noise.
Disclosure of Invention
The technical problem to be solved by the invention is how to provide a low-cost recognition method that can accurately obtain the pose of an object.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: an object pose prediction method based on a CAD model is characterized by comprising the following steps:
obtaining the relevant parameters of the monocular camera through calibration, and generating the data required for coarse matching from a CAD model;
detecting and identifying an object in the image, outputting a mask of the object, and obtaining the contour information of the object from the mask;
combining the contour information of the object with the coarse-matching data to obtain a coarse-matched pose of the object, and then obtaining the accurate pose of the object through an iterative algorithm.
A further technical solution is that the method for obtaining the relevant parameters of the monocular camera through calibration comprises the following steps:
constructing the camera imaging model:
M is a point in three-dimensional space and m is its projected image point on the image plane; from the relationships between the coordinate systems involved in the camera, the projection from the world coordinate system to pixel coordinates is

$$Z_C\begin{bmatrix}u\\v\\1\end{bmatrix}=\begin{bmatrix}a_x&0&u_0&0\\0&a_y&v_0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}\mathbf{R}&\mathbf{t}\\\mathbf{0}^T&1\end{bmatrix}\begin{bmatrix}X_W\\Y_W\\Z_W\\1\end{bmatrix}\qquad(1)$$

which can be written as

$$Z_C\begin{bmatrix}u\\v\\1\end{bmatrix}=\mathbf{K}\,\mathbf{M}_1\begin{bmatrix}X_W\\Y_W\\Z_W\\1\end{bmatrix}=\mathbf{M}\begin{bmatrix}X_W\\Y_W\\Z_W\\1\end{bmatrix}\qquad(2)$$

where a_x and a_y are the scale factors of the horizontal and vertical image axes; K is the camera intrinsic parameter matrix; M_1 contains a rotation matrix and a translation vector, and since its parameters are determined by the position of the camera coordinate system relative to the world coordinate system, M_1 is called the camera extrinsic parameter matrix; the product M of the intrinsic and extrinsic parameter matrices is the projection matrix; X_W, Y_W and Z_W are the x-, y- and z-axis coordinates of the object center W in the world coordinate system;
the focal length of the camera is f, the optical axis is the positive z direction, the x and y axes lie in the plane of the optical center O, and the optical center O is taken as the origin of the camera coordinate system; the position of the object center in the camera coordinate system is denoted by W, where:
W = (W_x, W_y, W_z)    (3)
If P = (u, v) are the coordinates of the pixel corresponding to the object in the image and K is the camera intrinsic parameter matrix, then:

$$\begin{bmatrix}u\\v\\1\end{bmatrix}=\frac{1}{W_z}\,\mathbf{K}\begin{bmatrix}W_x\\W_y\\W_z\end{bmatrix}\qquad(4)$$

This equation gives the two-dimensional coordinate P onto which the actual object center W in the camera coordinate system is projected in the image through the camera intrinsic parameters K.
A further technical solution is that the method for generating coarse-matching data from the CAD model comprises the following steps:
first rendering a mask of the object at a specified pose from the object CAD model, obtaining a bounding box of the object from the mask, and then sampling the contour of the object at intervals along the bounding box according to the requirements;
taking the length L of the left border as a reference, dividing L into n equal parts, taking every L/n as a sampling point, traversing the points on the contour and, for each point whose coordinate along the border equals a sampling point, computing its distance to the left border;
normalizing the sampled values, i.e. scaling the length of the left border of the bounding box to a unified unit;
sampling the contour of the object at different rotation angles around the center of the object CAD model at a specified distance, and storing the contour samples together with the corresponding pose information to obtain the coarse-matching template data of the object.
A further technical solution is that the method for detecting and identifying an object in an image and outputting its mask comprises the following step:
performing image recognition with a Mask-RCNN neural network and outputting the class of the object and the mask of the object.
A further technical solution is that, when training the Mask-RCNN neural network, a data set is generated automatically with Blender and OpenCV software.
A further technical solution is that the coarse pose-matching method is as follows:
the pose of a rigid body comprises a rotation part R and a displacement part T, and the matching process for the rotation part is:
first normalizing the output contour information and unifying it to the same scale for comparison;
letting S_in be the sampled data of the actual object mask and S_i the i-th group of data in the template data, each group containing n sample values, and computing the L1 distance between each group of actual mask samples and the template; the L1 distance L_i of the i-th group is:

$$L_i=\sum_{j=1}^{n}\left|S_{in}(j)-S_i(j)\right|\qquad(5)$$

ideally, when the poses are identical the sample values coincide, i.e. the rotation angle in the template data whose distance is 0 is the rotation angle corresponding to the contour; therefore the rotation angle corresponding to the minimum value among all results that satisfy a threshold is taken as the currently matched rotation angle, and if the threshold is not satisfied the matching is considered to have failed;
in the coarse matching, the error on each degree of freedom of the Euler angles is kept below 12°; the Euler-angle information is then converted into a rotation matrix R, giving the rotation information of the object;
the algorithm for the translation part is as follows:
when the template data are generated, the object is sampled at a specified distance and the CAD model size is known, so the size of the object's bounding box is inversely proportional to its distance, i.e. the smaller the bounding box, the greater the distance, which matches human visual perception; the distance between the model center point and the camera optical center can therefore be obtained from (6):
D = (w_in / w_i) · D_i    (6)
where w_in is the bounding-box width output by object recognition, w_i is the bounding-box width of the template data matched to the rotation, D_i is the specified distance at which the template data were collected, and D is the distance between the model center point and the camera optical center;
since the prior information on the CAD model size is known, the actual physical distance represented by each pixel in the template can be calculated, and the displacement components of the object can then be calculated:
$$t_x=\frac{(u-u_0)\,D}{a_x}\qquad(7)\qquad\qquad t_y=\frac{(v-v_0)\,D}{a_y}\qquad(8)$$
where t_x is the displacement of the object along the x-axis and t_y is the displacement along the y-axis; after the rotation R and the displacement T of the object are obtained, the world coordinates of the object are obtained by combining the intrinsic and extrinsic parameters of the camera.
A further technical solution is that the method for obtaining the accurate pose of the object through the iterative algorithm is as follows:
if the coarse-matched object rotation is A = (ψ, θ, φ), an angle Δε is added to and subtracted from each coordinate axis on this basis, with Δε set to half the coarse-matching interval, which yields the angles of the cells neighbouring the coarse-matched rotation in the coarse-matching rotation space; the object contours are obtained from the CAD model at these neighbouring angles, the contour-sampling method is applied together with formula (5), and the rotation A_1 = (ψ_1, θ_1, φ_1) that minimizes L_i in formula (5) is the rotation angle of the object after one iteration;
the angle Δε is then halved repeatedly to obtain angle values over a smaller range for further iterations, until a rotation angle is finally obtained for which the distance in formula (5) is 0;
combined with the translation information of the object obtained in the coarse matching, the accurate pose of the object is obtained.
The beneficial effects produced by the above technical solution are as follows: the method first obtains the relevant camera parameters through calibration and generates the data required for coarse matching from the CAD model; a deep neural network or another algorithm then detects and identifies the object in the image and outputs its mask; the contour information obtained from the mask is combined with the coarse-matching data to obtain the coarse-matched pose of the object, and the accurate pose of the object is then obtained through an iterative algorithm.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a method according to an embodiment of the invention;
FIG. 2 is a diagram of a coordinate system in an embodiment of the present invention;
FIG. 3 is a diagram illustrating a pinhole model of a camera according to an embodiment of the present invention;
FIG. 4 is a graph of the results of sampling an object contour in an embodiment of the present invention;
FIG. 5 is a diagram illustrating the segmentation effect of Mask-RCNN images according to an embodiment of the present invention;
FIG. 6 is a diagram showing the results of Mask-RCNN image recognition in an embodiment of the present invention;
FIG. 7 is a comparison graph of coarse matching and post-iteration poses in an embodiment of the present invention;
FIG. 8 is a diagram of the pose accuracy of an object under occlusion in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
As shown in fig. 1, an embodiment of the present invention discloses a method for predicting the pose of an object based on a CAD model, which includes the following steps:
The method first obtains the relevant parameters of the camera through calibration and generates the data required for coarse matching from the CAD model; a deep neural network or another algorithm detects and identifies the object in the image and outputs its mask; the contour information obtained from the mask is combined with the coarse-matching data to obtain the coarse-matched pose of the object, and the accurate pose of the object is then obtained through an iterative algorithm.
The above method is explained in detail below:
A camera imaging model:
M is a point in three-dimensional space and m is its projected image point on the image plane. From the relationships between the coordinate systems involved in the camera (the coordinate-system relationships are shown in fig. 2), the projection from the world coordinate system to pixel coordinates is

$$Z_C\begin{bmatrix}u\\v\\1\end{bmatrix}=\begin{bmatrix}a_x&0&u_0&0\\0&a_y&v_0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}\mathbf{R}&\mathbf{t}\\\mathbf{0}^T&1\end{bmatrix}\begin{bmatrix}X_W\\Y_W\\Z_W\\1\end{bmatrix}\qquad(1)$$

which can be written as

$$Z_C\begin{bmatrix}u\\v\\1\end{bmatrix}=\mathbf{K}\,\mathbf{M}_1\begin{bmatrix}X_W\\Y_W\\Z_W\\1\end{bmatrix}=\mathbf{M}\begin{bmatrix}X_W\\Y_W\\Z_W\\1\end{bmatrix}\qquad(2)$$

where a_x and a_y are the scale factors of the horizontal and vertical image axes; K comprises the internal parameters such as the focal length and the principal-point coordinates and is called the intrinsic parameter matrix; M_1 contains a rotation matrix and a translation vector, and since its parameters are determined by the position of the camera coordinate system relative to the world coordinate system, it is called the camera extrinsic parameter matrix; the product M of the intrinsic and extrinsic parameter matrices is called the projection matrix. By comparing formula (1) with formula (2), the concrete matrix expressions of the camera's intrinsic and extrinsic parameters are easily determined; calibrating the camera means determining its intrinsic and extrinsic parameters.
Calibrating a camera:
Assuming that the camera follows the pinhole model, as shown in fig. 3; the pose of an object is the general term for its position and orientation.
The focal length of the camera is f, the axis is the positive z direction, the x and y axes are on the plane of the optical center O, the optical center O is used as the origin of the camera coordinate system, and the position of the center of the object can be represented by W in the camera coordinate system, wherein:
W=(Wx,Wy,Wz) (3)
the specified object center is the position of the object CAD model center, and is generally the volume center. If P is (u, v) the coordinates of the corresponding pixel of the object on the image and K is the camera reference matrix, then this equation can be obtained:
Figure BDA0002224735330000071
this equation represents the two-dimensional coordinate position P of the actual object center position W after passing through the camera intrinsic parameters K and being projected onto the image in the camera coordinate system.
Therefore, to obtain the three-dimensional coordinate position of the object, the camera internal reference K must be calibrated, where there are many calibration methods, and in the present application, K is obtained by using the camera calibration method provided by Opencv software.
Generating coarse matching data by using a CAD model:
The core algorithm in generating the template data from the object CAD model is a sampling algorithm based on the object contour. First, a mask of the object is rendered at a designated pose from the object CAD model, and a bounding box of the object is obtained from the mask; the contour of the object is then sampled at regular intervals along the bounding box according to the requirements. As shown in fig. 4, the contour of the object is sampled at regular intervals along the left border.
Taking the length L of the left border as a reference (the other borders are handled similarly), L is divided into n equal parts and every L/n is taken as a sampling point; the points on the contour are traversed and, for each point whose coordinate along the border equals a sampling point, its distance to the left border is computed. This turns the contour information into a set of sample values.
Because the contour can vary in size, the sample values need to be normalized, i.e. the length of the left border of the bounding box is unified to one unit; in the experiments it is unified to 128 px, which ensures both sampling precision and sampling speed.
The advantage of this sampling scheme is that it yields a set of contour features (the sample values) that are invariant to contour scaling but sensitive to object rotation, and that have a fixed dimensionality, which makes comparison convenient.
The contour of the object is sampled at different rotation angles around the center of the object CAD model at a specified distance, and the contour samples are stored together with the corresponding pose information to obtain the coarse-matching template data of the object.
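A minimal sketch of this contour-sampling idea under one reading of the description: step down the left edge of the bounding box in n equal intervals and record, per row, the normalised distance from the left edge to the object contour (the function name, the value of n, and the use of the leftmost mask pixel per row are assumptions):

```python
import numpy as np

def sample_contour(mask: np.ndarray, n: int = 128) -> np.ndarray:
    """mask: binary object mask (H x W). Returns n normalised contour samples."""
    ys, xs = np.nonzero(mask)
    top, bottom, left = ys.min(), ys.max(), xs.min()
    height = max(bottom - top, 1)
    samples = np.zeros(n, dtype=np.float32)
    for k in range(n):
        row = int(top + k * height / n)
        cols = np.nonzero(mask[row])[0]
        if cols.size:
            # distance from the left border to the contour, scaled by the box height
            samples[k] = (cols.min() - left) / height
    return samples
```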
Image recognition and object mask output:
The currently most effective approach to image recognition is the deep neural network, and Mask-RCNN is one of the better-performing deep-neural-network models for image recognition; its effect is shown in fig. 5. After training, the model can output the object class and the object mask in real time and with high precision, so it is used as the image-detection processing module in this method.
As neural networks continue to develop, other algorithms and deep-neural-network models will surpass Mask-RCNN in performance; the present algorithm is applicable to any algorithm or deep-neural-network model that outputs a mask or contour, i.e. it can serve as a general solution. When training the Mask-RCNN neural network, the data set is generated automatically with Blender and OpenCV software, and the recognition precision is high.
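As a stand-in for the trained network, the sketch below uses torchvision's off-the-shelf Mask R-CNN purely to illustrate the "image in, class + mask out" interface; the patent's own model is trained on a Blender/OpenCV-generated data set, and the image path and score threshold here are assumptions:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = to_tensor(Image.open("scene.jpg").convert("RGB"))
with torch.no_grad():
    out = model([img])[0]               # dict with 'boxes', 'labels', 'scores', 'masks'

keep = out["scores"] > 0.7              # assumed confidence threshold
masks = out["masks"][keep, 0] > 0.5     # one boolean H x W mask per detection
labels = out["labels"][keep]            # object classes for the kept detections
```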
Coarse matching algorithm:
the pose of the rigid body comprises a rotation part R and a displacement part T, and the matching process of the rotation part is as follows:
Because the resolution of the object mask output differs from frame to frame, too low a resolution degrades the quality of the data acquired by the sampling algorithm, while too high a resolution slows sampling down; therefore, as in the sampling algorithm, the output contour information is first normalized and unified to the same scale for comparison.
Let S_in be the sampled data of the actual object mask and S_i the i-th group of data in the template data, each group containing n sample values. The L1 distance between the actual mask samples and each template group is computed; the L1 distance L_i of the i-th group is:

$$L_i=\sum_{j=1}^{n}\left|S_{in}(j)-S_i(j)\right|\qquad(5)$$

Ideally, when the poses are identical the sample values coincide, i.e. the rotation angle in the template data whose distance is 0 is the rotation angle corresponding to the contour. In practice, dividing the angles too finely produces a large amount of data and makes matching too slow, so the rotation angle corresponding to the minimum value among all results that satisfy a threshold is taken as the currently matched rotation angle; if the threshold is not satisfied, the matching is considered to have failed. To guarantee matching speed, the coarse-matching error on each degree of freedom of the Euler angles is kept below 12° (i.e. 360° is divided into 30 equal parts for sampling to generate the coarse-matching template). The Euler-angle information is then converted into a rotation matrix R, giving the rotation information of the object.
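A sketch of this coarse rotation match under stated assumptions (the template arrays, threshold value and the xyz Euler convention are illustrative; the templates would come from the CAD sampling step above):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def match_rotation(s_in, template_samples, template_eulers, threshold=5.0):
    """s_in: (n,) samples of the detected mask.
    template_samples: (m, n) template samples; template_eulers: (m, 3) Euler angles in degrees."""
    l1 = np.abs(template_samples - s_in).sum(axis=1)    # eq. (5): L_i for every template
    i = int(np.argmin(l1))
    if l1[i] > threshold:
        return None                                     # matching failed
    R = Rotation.from_euler("xyz", template_eulers[i], degrees=True).as_matrix()
    return R, template_eulers[i]
```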
The algorithm of the translation part is as follows:
When the template data are generated, the object is sampled at a specified distance and the CAD model size is known, so the size of the object's bounding box is inversely proportional to its distance: the smaller the bounding box, the greater the distance, which matches human visual perception. The distance between the model center point and the camera optical center can therefore be obtained from (6):
D = (w_in / w_i) · D_i    (6)
where w_in is the bounding-box width output by object recognition (the bounding-box length can be used instead), w_i is the bounding-box width of the template data matched to the rotation, D_i is the specified distance at which the template data were collected, and D is the distance between the model center point and the camera optical center.
Similarly, since the prior information on the CAD model size is known, the actual physical distance represented by each pixel in the template can be calculated, and the displacement components of the object can then be calculated:

$$t_x=\frac{(u-u_0)\,D}{a_x}\qquad(7)\qquad\qquad t_y=\frac{(v-v_0)\,D}{a_y}\qquad(8)$$
In actual experiments, because sub-pixel information cannot be acquired, the displacement-vector error is large when the computation uses pixels only; this can be mitigated by raising the resolution of the camera image, i.e. the higher the camera resolution, the more accurate the object position. Once the rotation R and the displacement T of the object are obtained, the world coordinates of the object are obtained by combining the camera intrinsic and extrinsic parameters.
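A sketch of the translation estimate under these assumptions: eq. (6) gives the distance D from the bounding-box ratio, and the lateral components follow from the pinhole model using the principal point and scale factors in K (the exact form of eqs. (7)-(8) is an image in the original, so this is an inferred reconstruction, and D is treated as the depth):

```python
import numpy as np

def estimate_translation(w_in, w_i, D_i, u, v, K):
    """w_in: detected box width (px); w_i: matched template box width (px);
    D_i: template capture distance; (u, v): detected object-center pixel; K: intrinsics."""
    D = (w_in / w_i) * D_i                  # eq. (6): distance to the optical center
    ax, ay = K[0, 0], K[1, 1]               # scale factors a_x, a_y
    u0, v0 = K[0, 2], K[1, 2]               # principal point
    tx = (u - u0) * D / ax                  # lateral offsets from the pinhole model
    ty = (v - v0) * D / ay
    return np.array([tx, ty, D])
```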
An iterative algorithm:
After the coarse-matched pose of the object is obtained, the rotation of the object theoretically has an error of less than 12°. To eliminate this error an iterative algorithm is introduced; it computes on the rotation information obtained from coarse matching and finally yields rotation information whose error is zero (with floating-point numbers set to 8 decimal places).
If the coarse-matched object rotation is A = (ψ, θ, φ), a small angle Δε is added to and subtracted from each axis on this basis. Since the coarse-matching interval was previously set to 12°, Δε is set to 6°, half of that interval, so that the 26 angles of the neighbouring cells are obtained in the coarse-matching rotation space. The object contour is rendered from the CAD model at each of these 26 angles, the contour-sampling method is applied together with formula (5), and the rotation A_1 = (ψ_1, θ_1, φ_1) that minimizes L_i in formula (5) is the rotation angle of the object after one iteration.
Δε is then halved repeatedly to obtain angle values over a smaller range for further iterations, until a rotation angle is finally obtained for which the distance in formula (5) is 0 (with floating-point numbers set to 8 decimal places); more accurate rotation information can be obtained by increasing the computer's floating-point precision.
Combined with the translation information of the object obtained in the coarse matching, the accurate 6DoF pose of the object is obtained.
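A sketch of this refinement loop under the stated assumptions; render_samples() is a hypothetical stand-in for "render the CAD mask at this Euler rotation and run the contour sampler", and the starting delta and iteration count mirror the 6° halving scheme described above:

```python
import itertools
import numpy as np

def refine_rotation(euler, s_in, render_samples, delta=6.0, iters=6):
    """euler: coarse-matched Euler angles (deg); s_in: samples of the detected mask."""
    best = np.asarray(euler, dtype=float)
    best_err = np.abs(render_samples(best) - s_in).sum()
    for _ in range(iters):
        # test the 26 neighbours of the current rotation at +/- delta on each axis
        for step in itertools.product((-delta, 0.0, delta), repeat=3):
            if step == (0.0, 0.0, 0.0):
                continue
            cand = best + np.array(step)
            err = np.abs(render_samples(cand) - s_in).sum()
            if err < best_err:
                best, best_err = cand, err
        delta /= 2.0                        # shrink the search range each iteration
    return best
```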
Experimental data:
The experimental environment was configured as follows: a Lenovo Y7000 notebook running Ubuntu 16.04, with Python 3.6 as the programming language.
Object recognition accuracy: the mask data input to the method are the masks output by the Mask-RCNN neural network. Because Mask-RCNN is a powerful network, training on the self-built data set achieves the desired recognition precision and meets the requirements of the method, as shown in fig. 6.
Rotation accuracy: the coarse-matching rotation precision is determined by the generated coarse-matching data template. In the experiment each degree of freedom of the Euler angles is divided into 30 equal parts for coarse matching, so the coarse error does not exceed 12° (360°/30), and the highest precision of 8-digit floating-point numbers is reached after an average of 6 iterations. Fig. 7 compares the results after coarse matching and after iteration, showing a random-pose mask of the object (i.e. the input), the rendered image of the object at the matched pose, and the difference image of the two.
Experiments show that the rotation error after iteration is 0 at 8-digit floating-point precision; the green error visible in part of fig. 5 is caused by the position error.
Position accuracy: since the position in this method is computed from the object's bounding box, its accuracy is limited by pixel accuracy. In extreme cases, for example a distant object that appears small, the bounding box shrinks proportionally, so a difference of one pixel in the bounding box causes a large increase in position error. The position accuracy of the object therefore depends on the camera resolution: the higher the camera resolution, the smaller the bounding-box error and the higher the position accuracy of the object.
Experiments show that, with a camera resolution of 512x512 pixels, the position accuracy of the method for a 4x4x3 cm object varies with the object's distance from the camera as shown in Table 1:
TABLE 1 position error versus distance
Distance between object and camera (mm) Error (mm)
500 0-5
1000 2-12
2000 10-100
5000 >100
Comparison with other related pose methods:
Compared with typical neural-network methods such as SSD-6D and BB8, when the evaluation standard is the commonly used 2D-projection, 5 cm-5° or 6D-pose metric, the rotation accuracy of this method is close to 100%, far better than the other algorithms. The main error source is the position error; since the cause of the position error depends on the resolution of the camera image, no comparison with other methods was made for it.
The difference in real-time performance is large: running in the personal-notebook environment, the method takes about 0.6 s for the coarse matching of one image, and the average time after iteration is about 40-60 s. Neural-network-based 6DoF pose methods can generally run in real time (>20 fps), and among algorithm-based 6DoF pose methods the representative Linemod can reach roughly 15-18 fps.
The method's anti-interference capability is outstanding: as long as the contour sampling of the object is essentially correct, missing parts of the object mask have little influence on the pose computed by the method, as shown in fig. 8.
In conclusion, the method can serve as a general algorithm for detecting the pose of an object when real-time requirements are not high; its detection precision is high and its anti-interference performance is strong. In practical applications, real-time performance can be improved with C++ code and parallel computing to meet usage requirements.

Claims (7)

1. An object pose prediction method based on a CAD model is characterized by comprising the following steps:
obtaining relevant parameters of the monocular camera through calibration, and generating the data required for coarse matching from a CAD model;
detecting and identifying an object in the image, outputting a mask of the object, and obtaining the contour information of the object through the mask of the object;
and combining the contour information of the object with the coarse-matching data to obtain a coarse-matched pose of the object, and then obtaining the accurate pose of the object through an iterative algorithm.
2. The CAD model-based object pose prediction method of claim 1, wherein the method for obtaining monocular camera related parameters by calibration comprises the steps of:
constructing a camera imaging model:
M is a point in three-dimensional space and m is its projected image point on the image plane; from the relationships between the coordinate systems involved in the camera, the projection from the world coordinate system to pixel coordinates is

$$Z_C\begin{bmatrix}u\\v\\1\end{bmatrix}=\begin{bmatrix}a_x&0&u_0&0\\0&a_y&v_0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}\mathbf{R}&\mathbf{t}\\\mathbf{0}^T&1\end{bmatrix}\begin{bmatrix}X_W\\Y_W\\Z_W\\1\end{bmatrix}\qquad(1)$$

which can be written as

$$Z_C\begin{bmatrix}u\\v\\1\end{bmatrix}=\mathbf{K}\,\mathbf{M}_1\begin{bmatrix}X_W\\Y_W\\Z_W\\1\end{bmatrix}=\mathbf{M}\begin{bmatrix}X_W\\Y_W\\Z_W\\1\end{bmatrix}\qquad(2)$$

where a_x and a_y are the scale factors of the horizontal and vertical image axes; K is the camera intrinsic parameter matrix; M_1 contains a rotation matrix and a translation vector, and since its parameters are determined by the position of the camera coordinate system relative to the world coordinate system, M_1 is the camera extrinsic parameter matrix; the product M of the intrinsic and extrinsic parameter matrices is the projection matrix; X_W, Y_W and Z_W are the x-, y- and z-axis coordinates of the object center W in the world coordinate system;
the focal length of the camera is f, the optical axis is the positive z direction, the x and y axes lie in the plane of the optical center O, and the optical center O is taken as the origin of the camera coordinate system; the position of the object center in the camera coordinate system is denoted by W, where:
W = (W_x, W_y, W_z)    (3)
the specified object center is the position of the center of the object CAD model, and if P = (u, v) are the coordinates of the pixel corresponding to the object in the image and K is the camera intrinsic parameter matrix, then:

$$\begin{bmatrix}u\\v\\1\end{bmatrix}=\frac{1}{W_z}\,\mathbf{K}\begin{bmatrix}W_x\\W_y\\W_z\end{bmatrix}\qquad(4)$$

this equation gives the two-dimensional coordinate P onto which the actual object center W in the camera coordinate system is projected in the image through the camera intrinsic parameters K.
3. The CAD model-based object pose prediction method of claim 1, wherein the method of generating coarse match data using a CAD model is as follows:
firstly rendering a mask of the object at a specified pose from the object CAD model, obtaining a bounding box of the object from the mask, and then sampling the contour of the object at intervals along the bounding box according to the requirements;
taking the length L of the left border as a reference, dividing L into n equal parts, taking every L/n as a sampling point, traversing the points on the contour and, for each point whose coordinate along the border equals a sampling point, computing its distance to the left border;
normalizing the sampled values, i.e. scaling the length of the left border of the bounding box to a unified unit;
and sampling the contour of the object at different rotation angles around the center of the object CAD model at a specified distance, and storing the contour samples together with the corresponding pose information to obtain the coarse-matching template data of the object.
4. The CAD model-based object pose prediction method of claim 1, wherein the method for detecting and identifying objects in an image and outputting their masks is as follows:
performing image recognition with a Mask-RCNN neural network and outputting the class of the object and the mask of the object.
5. The CAD model-based object pose prediction method of claim 4, wherein, during training of the Mask-RCNN neural network, a data set is automatically generated with Blender and OpenCV software for training.
6. The CAD model-based object pose prediction method of claim 1, wherein the coarse matching pose method is as follows:
the pose of the rigid body comprises a rotation part R and a displacement part T, and the matching process of the rotation part is as follows:
firstly, normalizing the output contour information, unifying the output contour information to the same scale for comparison;
letting S_in be the sampled data of the actual object mask and S_i the i-th group of data in the template data, each group containing n sample values, and computing the L1 distance between each group of actual mask samples and the template; the L1 distance L_i of the i-th group is:

$$L_i=\sum_{j=1}^{n}\left|S_{in}(j)-S_i(j)\right|\qquad(5)$$

ideally, when the poses are identical the sample values coincide, i.e. the rotation angle in the template data whose distance is 0 is the rotation angle corresponding to the contour; therefore the rotation angle corresponding to the minimum value among all results that satisfy a threshold is taken as the currently matched rotation angle, and if the threshold is not satisfied the matching is considered to have failed;
in coarse matching, the error is controlled to be not more than 12 degrees in each degree of freedom of the Euler angle; then, the Euler angle information is converted into a rotation matrix R, and the rotation information of the object is obtained;
the algorithm of the translation part is as follows:
when the template data are generated, the object is sampled at a specified distance and the CAD model size is known, so the size of the object's bounding box is inversely proportional to its distance, i.e. the smaller the bounding box, the greater the distance, which matches human visual perception; the distance between the model center point and the camera optical center can therefore be obtained from (6):
D = (w_in / w_i) · D_i    (6)
where w_in is the bounding-box width output by object recognition, w_i is the bounding-box width of the template data matched to the rotation, D_i is the specified distance at which the template data were collected, and D is the distance between the model center point and the camera optical center;
the prior information on the CAD model size is known, so the actual physical distance represented by each pixel in the template can be calculated, and the displacement components of the object can then be calculated:

$$t_x=\frac{(u-u_0)\,D}{a_x}\qquad(7)\qquad\qquad t_y=\frac{(v-v_0)\,D}{a_y}\qquad(8)$$

where t_x is the displacement of the object along the x-axis and t_y is the displacement along the y-axis; after the rotation R and the displacement T of the object are obtained, the world coordinates of the object are obtained by combining the intrinsic and extrinsic parameters of the camera.
7. The CAD model-based object pose prediction method of claim 6, wherein the exact pose of the object is obtained by an iterative algorithm as follows:
if the coarse-matched object rotation is A = (ψ, θ, φ), an angle Δε is added to and subtracted from each coordinate axis on this basis, with Δε set to half the coarse-matching interval, which yields the angles of the cells neighbouring the coarse-matched rotation in the coarse-matching rotation space; the object contours are obtained from the CAD model at these neighbouring angles, the contour-sampling method is applied together with formula (5), and the rotation A_1 = (ψ_1, θ_1, φ_1) that minimizes L_i in formula (5) is the rotation angle of the object after one iteration;
the angle Δε is then halved repeatedly to obtain angle values over a smaller range for further iterations, until a rotation angle is finally obtained for which the distance in formula (5) is 0;
and combining the translation information of the object obtained in the coarse matching to obtain the accurate pose of the object.
CN201910947809.8A 2019-10-08 2019-10-08 Object pose prediction method based on CAD model Pending CN110706285A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910947809.8A CN110706285A (en) 2019-10-08 2019-10-08 Object pose prediction method based on CAD model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910947809.8A CN110706285A (en) 2019-10-08 2019-10-08 Object pose prediction method based on CAD model

Publications (1)

Publication Number Publication Date
CN110706285A true CN110706285A (en) 2020-01-17

Family

ID=69196741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910947809.8A Pending CN110706285A (en) 2019-10-08 2019-10-08 Object pose prediction method based on CAD model

Country Status (1)

Country Link
CN (1) CN110706285A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110157178A1 (en) * 2009-12-28 2011-06-30 Cuneyt Oncel Tuzel Method and System for Determining Poses of Objects
CN103365249A (en) * 2013-07-10 2013-10-23 西安电子科技大学 Rapid solving method for failure workspace of six-degree-of-freedom parallel robot
CN104596502A (en) * 2015-01-23 2015-05-06 浙江大学 Object posture measuring method based on CAD model and monocular vision
CN106845515A (en) * 2016-12-06 2017-06-13 上海交通大学 Robot target identification and pose reconstructing method based on virtual sample deep learning
CN106600639A (en) * 2016-12-09 2017-04-26 江南大学 Genetic algorithm and adaptive threshold constraint-combined ICP (iterative closest point) pose positioning technology
CN106845354A (en) * 2016-12-23 2017-06-13 中国科学院自动化研究所 Partial view base construction method, part positioning grasping means and device
CN107818577A (en) * 2017-10-26 2018-03-20 滁州学院 A kind of Parts Recognition and localization method based on mixed model
CN108010082A (en) * 2017-12-28 2018-05-08 上海觉感视觉科技有限公司 A kind of method of geometric match
CN108555908A (en) * 2018-04-12 2018-09-21 同济大学 A kind of identification of stacking workpiece posture and pick-up method based on RGBD cameras
CN109087323A (en) * 2018-07-25 2018-12-25 武汉大学 A kind of image three-dimensional vehicle Attitude estimation method based on fine CAD model
CN109801337A (en) * 2019-01-21 2019-05-24 同济大学 A kind of 6D position and orientation estimation method of Case-based Reasoning segmentation network and iteration optimization
CN110097598A (en) * 2019-04-11 2019-08-06 暨南大学 A kind of three-dimension object position and orientation estimation method based on PVFH feature
CN110298854A (en) * 2019-05-17 2019-10-01 同济大学 The snakelike arm co-located method of flight based on online adaptive and monocular vision

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YIBO CUI 等: "Estimation of 6Dof Pose Using Image Mask and Bounding Box", 《IGTA 2019: IMAGE AND GRAPHICS TECHNOLOGIES AND APPLICATIONS》 *
ZHUANGNAN XU 等: "A Monocular Object Pose Recognition Algorithm Based on CAD Model and Object Contour", 《JOURNAL OF COMPUTING AND ELECTRONIC INFORMATION MANAGEMENT》 *
崔毅博 等 (Cui Yibo et al.): "利用RGB图像和DNN进行物体6DOf位姿推算" [6DoF Object Pose Estimation Using RGB Images and a DNN], 《计算机仿真》 [Computer Simulation] *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4166281A4 (en) * 2020-07-29 2024-03-13 Siemens Ltd. China Method and apparatus for robot to grab three-dimensional object
CN112465898A (en) * 2020-11-20 2021-03-09 上海交通大学 Object 3D pose tag acquisition method based on checkerboard calibration plate
CN112630639A (en) * 2020-12-01 2021-04-09 国网江苏省电力有限公司检修分公司 System and method for online detection of meshing state of handcart contact of high-voltage switch cabinet
CN112630639B (en) * 2020-12-01 2022-12-23 国网江苏省电力有限公司检修分公司 System and method for online detection of meshing state of handcart contact of high-voltage switch cabinet
WO2022252487A1 (en) * 2021-06-04 2022-12-08 浙江商汤科技开发有限公司 Pose acquisition method, apparatus, electronic device, storage medium, and program
CN115033998A (en) * 2022-07-13 2022-09-09 北京航空航天大学 Personalized 2D data set construction method for mechanical parts

Similar Documents

Publication Publication Date Title
CN110706285A (en) Object pose prediction method based on CAD model
Yang et al. Monocular object and plane slam in structured environments
CN109345588B (en) Tag-based six-degree-of-freedom attitude estimation method
CN109544677B (en) Indoor scene main structure reconstruction method and system based on depth image key frame
CN105021124B (en) A kind of planar part three-dimensional position and normal vector computational methods based on depth map
CN111897349B (en) Autonomous obstacle avoidance method for underwater robot based on binocular vision
CN108122256B (en) A method of it approaches under state and rotates object pose measurement
CN110688947B (en) Method for synchronously realizing human face three-dimensional point cloud feature point positioning and human face segmentation
EP3159125A1 (en) Device for recognizing position of mobile robot by using direct tracking, and method therefor
CN111401266B (en) Method, equipment, computer equipment and readable storage medium for positioning picture corner points
EP3751517A1 (en) Fast articulated motion tracking
KR100874817B1 (en) Facial feature detection method, media and apparatus using stereo combining mechanism
US20050265604A1 (en) Image processing apparatus and method thereof
EP3159122A1 (en) Device and method for recognizing location of mobile robot by means of search-based correlation matching
CN113393524B (en) Target pose estimation method combining deep learning and contour point cloud reconstruction
CN114022542A (en) Three-dimensional reconstruction-based 3D database manufacturing method
EP3185212B1 (en) Dynamic particle filter parameterization
CN108335325A (en) A kind of cube method for fast measuring based on depth camera data
CN113439289A (en) Image processing for determining the thickness of an object
Sun et al. A fast underwater calibration method based on vanishing point optimization of two orthogonal parallel lines
CN111709269B (en) Human hand segmentation method and device based on two-dimensional joint information in depth image
CN108694348B (en) Tracking registration method and device based on natural features
CN105339981A (en) Method for registering data using set of primitives
CN111915632B (en) Machine learning-based method for constructing truth database of lean texture target object
CN117218205B (en) Camera external parameter correction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200117)