CN113034571B - Object three-dimensional size measuring method based on vision-inertia - Google Patents

Object three-dimensional size measuring method based on vision-inertia

Info

Publication number
CN113034571B
CN113034571B (application CN202110413394.3A)
Authority
CN
China
Prior art keywords
imu
coordinate system
box
ellipsoid
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110413394.3A
Other languages
Chinese (zh)
Other versions
CN113034571A (en)
Inventor
管贻生 (Guan Yisheng)
林旭滨 (Lin Xubin)
杨益枘 (Yang Yinen)
何力 (He Li)
张宏 (Zhang Hong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Tianhua Science and Education Co.,Ltd.
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202110413394.3A priority Critical patent/CN113034571B/en
Publication of CN113034571A publication Critical patent/CN113034571A/en
Application granted granted Critical
Publication of CN113034571B publication Critical patent/CN113034571B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/60 Analysis of geometric attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras

Abstract

The invention discloses a vision-inertia-based method for measuring the three-dimensional size of an object. A monocular camera and an IMU (inertial measurement unit) are used as the sensing and acquisition equipment. By calibrating the camera parameters, detecting the target object in images and estimating the motion attitude of the sensor, the method establishes the mathematical equations relating the three-dimensional dual ellipsoid of the object to its projected outlines in multiple views, and finally computes the three-dimensional space occupancy of the object's minimum envelope box. The invention measures the true three-dimensional size of the object and has the advantages of low measurement cost and convenience.

Description

Object three-dimensional size measuring method based on vision-inertia
Technical Field
The invention relates to the technical field of machine vision, in particular to a method for measuring the three-dimensional size of an object based on vision-inertia.
Background
In important links of the industrial production process such as logistics and warehousing, product packaging, production line design and loading space optimization, the three-dimensional size of a product is an important physical parameter. Acquisition of this information mostly relies on manual measurement or direct consultation of product design parameters; manual measurement cannot meet the requirements of production automation, and design models are usually confidential or difficult to obtain. Existing measurement of three-dimensional object size belongs to the field of industrial reverse engineering: refined three-dimensional modelling of the object is usually performed with three-dimensional ranging sensors such as multi-line laser scanners, line structured light or depth cameras, and involves modules for acquiring three-dimensional point clouds of the object, point cloud registration and pose estimation between adjacent frames, and fusion and optimization of multi-frame dense point clouds. On the one hand, the sensing equipment required for refined three-dimensional reconstruction is expensive, and point cloud registration and pose estimation require a certain overlap rate between point clouds of different frames, so the equipment has to be moved slowly in practice to guarantee point cloud quality and overlap, leading to drawbacks such as limited usage scenarios, high cost and poor portability. On the other hand, in most industrial scenarios a refined three-dimensional model of the object is not required; only the minimum envelope volume of the object needs to be measured as an estimate of its space occupancy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a vision-inertia-based method for measuring the three-dimensional size of an object. A monocular camera and an IMU (inertial measurement unit) are used as the sensing and acquisition equipment; by calibrating the camera parameters, detecting the target object in images and estimating the motion attitude of the sensor, the mathematical equations relating the three-dimensional dual ellipsoid of the object to its projected outlines in multiple views are established, and finally the three-dimensional space occupancy of the object's minimum envelope box is computed, thereby realizing low-cost, efficient, convenient and true-scale measurement of the three-dimensional size of the object.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
a method for measuring the three-dimensional size of an object based on vision-inertia comprises the following steps:
s1, estimating the pose of a camera based on visual-inertial coupling;
s2, extracting an object detection frame, and calculating a back projection plane corresponding to the object detection frame;
s3, constructing a dual ellipsoid envelope equation;
S4, calculating the three-dimensional size of the minimum envelope box of the ellipsoid so as to obtain the three-dimensional space occupancy information of the object.
Further, the specific process of estimating the pose of the camera based on the visual-inertial coupling in step S1 is as follows:
adopting a visual-inertial tightly coupled framework, including SIFT feature extraction and matching association on the images; because the IMU and the image data differ in frequency, pre-integrating the IMU data between image keyframes C_i and C_j, the integration results being the inter-frame IMU rotation ΔR, velocity Δv and position Δp; fusing the state of C_i with the pre-integration results to obtain the attitude R, velocity v and position p of the IMU at C_j relative to the world coordinate system;
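The following is an illustrative sketch only (the function names and the simple Euler integration scheme are assumptions, not part of the original disclosure), showing how the inter-frame pre-integration terms ΔR, Δv and Δp can be accumulated from raw gyroscope and accelerometer samples between two keyframes:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def so3_exp(phi):
    """Exponential map from a rotation vector to a rotation matrix (Rodrigues formula)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3) + skew(phi)
    a = phi / theta
    K = skew(a)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def preintegrate(gyro, accel, dt, bg, ba):
    """Accumulate ΔR, Δv, Δp over the IMU samples between keyframes C_i and C_j.

    gyro, accel: (N, 3) raw measurements; dt: sample period;
    bg, ba: current gyroscope / accelerometer bias estimates."""
    dR = np.eye(3)
    dv = np.zeros(3)
    dp = np.zeros(3)
    for w, a in zip(gyro, accel):
        a_corr = a - ba
        dp += dv * dt + 0.5 * (dR @ a_corr) * dt ** 2   # position before velocity/rotation update
        dv += (dR @ a_corr) * dt
        dR = dR @ so3_exp((w - bg) * dt)
    return dR, dv, dp
```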
estimating the three-dimensional coordinates of the tracked feature points X by triangulation, and adding feature points that already exist in the map and have established data association with the feature points of the current frame into the optimization function;
constructing a visual reprojection error and an IMU inertial error as an optimization cost function, wherein the functions are as follows:
θ* = argmin_θ ( Σ_k E_proj(k, j) + E_IMU(i, j) )
where θ = { R_j, p_j, v_j, b_j } denotes the keyframe states to be optimized.
wherein E_proj(k, j) is the visual sensing cost function, which defines the reprojection error between the feature point observed on the image and its three-dimensional coordinates:
E_proj(k, j) = ρ( (x_k - π(X_C^k))^T Σ_k^{-1} (x_k - π(X_C^k)) )
X_C^k = R_CB · R_BW^j · ( X_W^k - p_WB^j ) + p_CB
wherein { R_CB, p_CB } is the relative transformation between the camera coordinate system and the IMU coordinate system, used to convert the coordinates X_W^k of the k-th feature point in the world coordinate system into the coordinates X_C^k in the camera coordinate system. The function π(·) is the camera imaging model, which maps the three-dimensional point X_C^k into the image coordinate system to obtain the pixel coordinates of its projection. ρ(·) is the Huber kernel function, which replaces the square term of the original error with a function that grows more slowly, while ensuring the smoothness of the error function.
E_IMU(i, j) is the IMU error term:
E_IMU(i, j) = ρ( [e_R^T, e_v^T, e_p^T] Σ_I^{-1} [e_R^T, e_v^T, e_p^T]^T ) + ρ( e_b^T Σ_R^{-1} e_b )
e_R = Log( (ΔR_ij · Exp(J_ΔR^g · δb_j^g))^T · R_BW^i · R_WB^j )
e_v = R_BW^i · ( v_B^j - v_B^i - g_W · Δt_ij ) - ( Δv_ij + J_Δv^g · δb_j^g + J_Δv^a · δb_j^a )
e_p = R_BW^i · ( p_WB^j - p_WB^i - v_B^i · Δt_ij - 0.5 · g_W · Δt_ij^2 ) - ( Δp_ij + J_Δp^g · δb_j^g + J_Δp^a · δb_j^a )
e_b = b_j - b_i
As above, ρ(·) is the Huber kernel function. E_IMU(i, j) is the IMU error term between the i-th and j-th frames, comprising the attitude error e_R, velocity error e_v, position error e_p and bias error e_b, where Σ_I is the information matrix of the pre-integration and Σ_R is the information matrix of the bias random walk. In the attitude error e_R, ΔR_ij represents the relative rotation of the IMU from frame i to frame j, J_ΔR^g · δb_j^g is the approximate correction term of the rotation pre-integration with respect to the gyroscope bias at frame j, (ΔR_ij · Exp(J_ΔR^g · δb_j^g)) represents the bias-corrected relative rotation from i to j obtained by the IMU through pre-integration, and Log(·) maps elements of the orthogonal rotation group SO(3) into the Lie algebra so(3). In the velocity error e_v, v_B^j and v_B^i represent the IMU velocities of frames j and i in the world coordinate system, g_W represents the gravity vector, Δv_ij represents the pre-integrated velocity difference between the two frames, and J_Δv^g · δb_j^g and J_Δv^a · δb_j^a are the approximate bias-correction terms of the gyroscope and accelerometer at frame j. In the position error e_p, p_WB^j and p_WB^i denote the IMU positions relative to the world coordinate system at frames j and i, v_B^i is the IMU linear velocity of frame i expressed in the world coordinate system, and Δp_ij is the pre-integrated position difference between frames i and j. In the bias term e_b, b_j and b_i denote the IMU biases of frames j and i respectively.
The above equations, which contain both the visual sensing error and the inertial sensing error, are optimized with the g2o optimization framework, solving for the attitude R_j and position p_j of each keyframe.
Further, the specific process of step S2 is as follows:
training and fine-tuning a pre-trained convolutional deep neural network on a data set of the target objects to obtain a target object detection network module; the module extracts the target object from the input image and outputs the semantic label of the object together with the coordinate and size information of its 2D envelope box. Let the four vertices of the envelope box be p_1, p_2, p_3, p_4 and its four edges be l_1, l_2, l_3, l_4, all represented in the homogeneous pixel coordinate system. Under these coordinates, any point p has coordinates [u, v, 1], and any straight line l_t can be obtained as the cross product of two points p_a and p_b, i.e.:
l_t = p_a × p_b
according to the camera imaging model, back-projecting a straight line l in the image yields a plane π passing through the optical center of the camera; the back-projection plane for the given k-th frame image can then be obtained from the camera projection matrix P_k = K[R_k t_k], namely:
π_t = P_k^T · l_t
wherein π_t = [π_t1, π_t2, π_t3, π_t4]^T is the back-projection plane obtained from l_t. This operation is performed on all four edges of the object detection boxes in all images, and each detection box yields four back-projection planes.
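A minimal sketch of this step (the helper name and its interface are assumptions, not part of the original disclosure): one edge of a 2D detection box is turned into its back-projection plane through the camera optical center.

```python
import numpy as np

def edge_to_plane(p_a, p_b, K, R_k, t_k):
    """p_a, p_b: homogeneous pixel coordinates [u, v, 1] of two box corners.
    K: 3x3 calibrated intrinsics; R_k, t_k: pose of the k-th keyframe.
    Returns the back-projection plane pi_t = P_k^T l_t as a 4-vector."""
    l_t = np.cross(p_a, p_b)                          # image line through the two corners
    P_k = K @ np.hstack([R_k, t_k.reshape(3, 1)])     # 3x4 camera projection matrix
    return P_k.T @ l_t                                # plane through the optical center

# Each detection box contributes four such planes, one per edge.
```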
Further, the specific process of constructing the dual ellipsoid envelope equation in the step S3 is as follows:
the dual form of an ellipsoid can be regarded as the algebraic variety formed by all planes tangent to the ellipsoid; in projective geometry the dual ellipsoid Q* has the exact definition π^T Q* π = 0 for every tangent plane, from which the equation of its tangent plane π_t = [π_t1, π_t2, π_t3, π_t4]^T is established:
q_1·π_t1^2 + 2·q_2·π_t1·π_t2 + 2·q_3·π_t1·π_t3 + 2·q_4·π_t1·π_t4 + q_5·π_t2^2 + 2·q_6·π_t2·π_t3 + 2·q_7·π_t2·π_t4 + q_8·π_t3^2 + 2·q_9·π_t3·π_t4 + q_10·π_t4^2 = 0
written in vector form, this is:
[π_t1^2, 2π_t1π_t2, 2π_t1π_t3, 2π_t1π_t4, π_t2^2, 2π_t2π_t3, 2π_t2π_t4, π_t3^2, 2π_t3π_t4, π_t4^2] · q* = 0, with q* = [q_1, q_2, q_3, q_4, q_5, q_6, q_7, q_8, q_9, q_10]^T    (1)
the envelope equation (1) of the dual ellipsoid above is an equation in homogeneous coordinates, i.e. solutions that differ only by a constant scale factor are regarded as the same solution;
combining the back-projection planes of all edges of the object detection boxes from multiple views yields multiple instances of equation (1), which together form a system of linear equations, written as:
A q* = 0
where A is an n × 10 matrix, n represents the number of backprojection planes, and the equation is a linear least squares problem:
q̂* = argmin ||A q*||^2  subject to ||q*|| = 1
SVD is performed on the matrix A; the singular vector corresponding to the smallest singular value gives the independent elements of the sought ellipsoid, which are arranged into a 4 × 4 symmetric matrix according to the quadric form. This matrix is the sought minimum envelope ellipsoid of the object, written as:
Q* = [ q_1   q_2   q_3   q_4
       q_2   q_5   q_6   q_7
       q_3   q_6   q_8   q_9
       q_4   q_7   q_9   q_10 ]
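A minimal sketch of this construction and solution (names are illustrative and assume the planes are supplied as 4-vectors; this is not code from the original disclosure):

```python
import numpy as np

def plane_to_row(pi):
    """Row of the constraint matrix A for one back-projection plane pi = [p1, p2, p3, p4]."""
    p1, p2, p3, p4 = pi
    return np.array([p1*p1, 2*p1*p2, 2*p1*p3, 2*p1*p4,
                     p2*p2, 2*p2*p3, 2*p2*p4,
                     p3*p3, 2*p3*p4, p4*p4])

def fit_dual_ellipsoid(planes):
    """Solve A q* = 0 by SVD and rebuild the 4x4 symmetric dual-ellipsoid matrix."""
    A = np.stack([plane_to_row(pi) for pi in planes])   # n x 10
    _, _, Vt = np.linalg.svd(A)
    q = Vt[-1]                     # singular vector of the smallest singular value
    q1, q2, q3, q4, q5, q6, q7, q8, q9, q10 = q
    Q = np.array([[q1, q2, q3, q4],
                  [q2, q5, q6, q7],
                  [q3, q6, q8, q9],
                  [q4, q7, q9, q10]])
    return Q                       # minimum envelope dual ellipsoid, up to scale
```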
further, the specific process of step S4 is as follows:
using the equation of the ellipsoid in dual form, the tangent planes aligned with the three axes of the scene coordinate system are sought; the tangent points are obtained by solving one quadratic equation in a single variable per axis, with the analytic solutions:
box(1) = ( q_4 + sqrt(q_4^2 - q_1·q_10) ) / q_10
box(4) = ( q_4 - sqrt(q_4^2 - q_1·q_10) ) / q_10
box(2) = ( q_7 + sqrt(q_7^2 - q_5·q_10) ) / q_10
box(5) = ( q_7 - sqrt(q_7^2 - q_5·q_10) ) / q_10
box(3) = ( q_9 + sqrt(q_9^2 - q_8·q_10) ) / q_10
box(6) = ( q_9 - sqrt(q_9^2 - q_8·q_10) ) / q_10
the length, width and height of the envelope box cuboid are thereby obtained:
length l = abs(box(1) - box(4));
width w = abs(box(2) - box(5));
height h = abs(box(3) - box(6));
in the above formulas, box(1) and box(4) are the x-axis coordinates of the two points where the ellipsoid is tangent to planes parallel to the y-o-z plane of the world coordinate system, box(2) and box(5) are the y-axis coordinates of the two tangent points with planes parallel to the x-o-z plane, box(3) and box(6) are the z-axis coordinates of the two tangent points with planes parallel to the x-o-y plane, and abs(·) denotes the absolute value function.
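A minimal sketch of this computation (it assumes the Q* parameterisation given in step S3; the tangent-point formulas are derived from that assumption rather than quoted from the original disclosure):

```python
import numpy as np

def envelope_box_size(Q):
    """Q: 4x4 symmetric dual-ellipsoid matrix. Returns (length, width, height)."""
    def tangents(qa, qb, qc):
        # roots of qa - 2*qb*x + qc*x^2 = 0, i.e. the two tangent-point coordinates
        disc = np.sqrt(qb * qb - qa * qc)
        return (qb + disc) / qc, (qb - disc) / qc

    x1, x2 = tangents(Q[0, 0], Q[0, 3], Q[3, 3])   # planes parallel to y-o-z
    y1, y2 = tangents(Q[1, 1], Q[1, 3], Q[3, 3])   # planes parallel to x-o-z
    z1, z2 = tangents(Q[2, 2], Q[2, 3], Q[3, 3])   # planes parallel to x-o-y
    return abs(x1 - x2), abs(y1 - y2), abs(z1 - z2)
```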
Compared with the prior art, the principle and the advantages of the scheme are as follows:
the scheme utilizes a monocular camera and an IMU (inertial measurement Unit) as sensing acquisition equipment, establishes mathematical equations of a three-dimensional dual ellipsoid of an object and a projection outline thereof in multiple views by calibrating camera parameters, detecting a target object by an image and estimating a motion attitude of a sensor, and finally calculates the occupied size of the three-dimensional space of the robot in the three-dimensional minimum envelope box of the object.
The scheme not only measures the true three-dimensional size of the object, but also has the advantages of low measurement cost and convenience.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a method for measuring the three-dimensional size of an object based on vision-inertia according to the present invention;
FIG. 2 is a schematic diagram of a pose estimation algorithm for visual inertial fusion;
FIG. 3 is a schematic diagram illustrating coordinate definitions of object detection frames in an image;
FIG. 4 is a schematic diagram of a back projection plane corresponding to a straight line in an image;
FIG. 5 is a schematic diagram of a multi-view geometry in which the dual form of an ellipsoid may be viewed as a back-projected tangential envelope;
FIG. 6 is a schematic diagram of a multi-view construction of dual ellipsoids.
Detailed Description
The invention will be further illustrated with reference to specific examples:
as shown in fig. 1, the method for measuring the three-dimensional size of an object based on vision-inertia according to the embodiment includes the following steps:
s1, estimating the pose of a camera based on visual-inertial coupling;
adopting a visual-inertial tightly coupled framework, including SIFT feature extraction and matching association on the images; because the IMU and the image data differ in frequency, pre-integrating the IMU data between image keyframes C_i and C_j, the integration results being the inter-frame IMU rotation ΔR, velocity Δv and position Δp; fusing the state of C_i with the pre-integration results to obtain the attitude R, velocity v and position p of the IMU at C_j relative to the world coordinate system;
estimating the three-dimensional coordinates of the tracked feature points X by triangulation, and adding feature points that already exist in the map and have established data association with the feature points of the current frame into the optimization function;
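A minimal sketch of the triangulation of a tracked feature point (standard linear DLT triangulation from two keyframe observations; the helper below is an assumption, not code from the original disclosure):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """P1, P2: 3x4 projection matrices of two keyframes;
    x1, x2: pixel coordinates (u, v) of the same feature in each frame.
    Returns the 3D point X in world coordinates."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]                 # homogeneous solution: right singular vector of smallest singular value
    return X_h[:3] / X_h[3]      # dehomogenise
```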
The poses and 3D feature points in the visual-inertial pose estimation together form a multi-constraint graph, as shown in FIG. 2; a visual reprojection error and an IMU inertial error are constructed as the optimization cost function, as follows:
θ* = argmin_θ ( Σ_k E_proj(k, j) + E_IMU(i, j) )
where θ = { R_j, p_j, v_j, b_j } denotes the keyframe states to be optimized.
wherein E_proj(k, j) is the visual sensing cost function, which defines the reprojection error between the feature point observed on the image and its three-dimensional coordinates:
E_proj(k, j) = ρ( (x_k - π(X_C^k))^T Σ_k^{-1} (x_k - π(X_C^k)) )
X_C^k = R_CB · R_BW^j · ( X_W^k - p_WB^j ) + p_CB
wherein { R_CB, p_CB } is the relative transformation between the camera coordinate system and the IMU coordinate system, used to convert the coordinates X_W^k of the k-th feature point in the world coordinate system into the coordinates X_C^k in the camera coordinate system. The function π(·) is the camera imaging model, which maps the three-dimensional point X_C^k into the image coordinate system to obtain the pixel coordinates of its projection. ρ(·) is the Huber kernel function, which replaces the square term of the original error with a function that grows more slowly, while ensuring the smoothness of the error function.
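For reference, assuming the common single-parameter form of the Huber kernel with threshold δ (the parameters used are not stated in the original disclosure), the kernel applied to a squared residual s can be written as:

ρ(s) = s                      for s ≤ δ^2
ρ(s) = 2·δ·sqrt(s) - δ^2      for s > δ^2

so large residuals contribute linearly rather than quadratically, which limits the influence of outliers while keeping the cost function smooth at the threshold.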
E_IMU(i, j) is the IMU error term:
E_IMU(i, j) = ρ( [e_R^T, e_v^T, e_p^T] Σ_I^{-1} [e_R^T, e_v^T, e_p^T]^T ) + ρ( e_b^T Σ_R^{-1} e_b )
e_R = Log( (ΔR_ij · Exp(J_ΔR^g · δb_j^g))^T · R_BW^i · R_WB^j )
e_v = R_BW^i · ( v_B^j - v_B^i - g_W · Δt_ij ) - ( Δv_ij + J_Δv^g · δb_j^g + J_Δv^a · δb_j^a )
e_p = R_BW^i · ( p_WB^j - p_WB^i - v_B^i · Δt_ij - 0.5 · g_W · Δt_ij^2 ) - ( Δp_ij + J_Δp^g · δb_j^g + J_Δp^a · δb_j^a )
e_b = b_j - b_i
As above, ρ(·) is the Huber kernel function. E_IMU(i, j) is the IMU error term between the i-th and j-th frames, comprising the attitude error e_R, velocity error e_v, position error e_p and bias error e_b, where Σ_I is the information matrix of the pre-integration and Σ_R is the information matrix of the bias random walk. In the attitude error e_R, ΔR_ij represents the relative rotation of the IMU from frame i to frame j, J_ΔR^g · δb_j^g is the approximate correction term of the rotation pre-integration with respect to the gyroscope bias at frame j, (ΔR_ij · Exp(J_ΔR^g · δb_j^g)) represents the bias-corrected relative rotation from i to j obtained by the IMU through pre-integration, and Log(·) maps elements of the orthogonal rotation group SO(3) into the Lie algebra so(3). In the velocity error e_v, v_B^j and v_B^i represent the IMU velocities of frames j and i in the world coordinate system, g_W represents the gravity vector, Δv_ij represents the pre-integrated velocity difference between the two frames, and J_Δv^g · δb_j^g and J_Δv^a · δb_j^a are the approximate bias-correction terms of the gyroscope and accelerometer at frame j. In the position error e_p, p_WB^j and p_WB^i denote the IMU positions relative to the world coordinate system at frames j and i, v_B^i is the IMU linear velocity of frame i expressed in the world coordinate system, and Δp_ij is the pre-integrated position difference between frames i and j. In the bias term e_b, b_j and b_i denote the IMU biases of frames j and i respectively.
The above equations, which contain both the visual sensing error and the inertial sensing error, are optimized with the g2o optimization framework, solving for the attitude R_j and position p_j of each keyframe.
S2, extracting an object detection frame, and calculating a back projection plane corresponding to the object detection frame;
A pre-trained convolutional deep neural network is trained and fine-tuned on a data set of the target objects to obtain a target object detection network module; the module extracts the target object from the input image and outputs the semantic label of the object together with the coordinate and size information of its 2D envelope box. As shown in FIG. 3, let the four vertices of the envelope box be p_1, p_2, p_3, p_4 and its four edges be l_1, l_2, l_3, l_4, all represented in the homogeneous pixel coordinate system. Under these coordinates, any point p has coordinates [u, v, 1], and any straight line l_t can be obtained as the cross product of two points p_a and p_b, i.e.:
l_t = p_a × p_b
As shown in FIG. 4, according to the camera imaging model, back-projecting a straight line l in the image yields a plane π passing through the optical center of the camera. The back-projection plane for the given k-th frame image can then be obtained from the camera projection matrix P_k = K[R_k t_k], namely:
π_t = P_k^T · l_t
wherein π_t = [π_t1, π_t2, π_t3, π_t4]^T is the back-projection plane obtained from l_t. This operation is performed on all four edges of the object detection box in all images, and each detection box yields four back-projection planes.
S3, then, constructing a dual ellipsoid envelope equation;
the dual form of an ellipsoid can be regarded as the algebraic variety formed by all planes tangent to the ellipsoid; in projective geometry the dual ellipsoid Q* has the exact definition π^T Q* π = 0 for every tangent plane, from which the equation of its tangent plane π_t = [π_t1, π_t2, π_t3, π_t4]^T is established:
q_1·π_t1^2 + 2·q_2·π_t1·π_t2 + 2·q_3·π_t1·π_t3 + 2·q_4·π_t1·π_t4 + q_5·π_t2^2 + 2·q_6·π_t2·π_t3 + 2·q_7·π_t2·π_t4 + q_8·π_t3^2 + 2·q_9·π_t3·π_t4 + q_10·π_t4^2 = 0
written in vector form, this is:
[π_t1^2, 2π_t1π_t2, 2π_t1π_t3, 2π_t1π_t4, π_t2^2, 2π_t2π_t3, 2π_t2π_t4, π_t3^2, 2π_t3π_t4, π_t4^2] · q* = 0, with q* = [q_1, q_2, q_3, q_4, q_5, q_6, q_7, q_8, q_9, q_10]^T    (1)
the envelope equation (1) of the dual ellipsoid above is an equation in homogeneous coordinates, i.e. solutions that differ only by a constant scale factor are regarded as the same solution;
combining the back-projection planes of all edges of the object detection boxes from multiple views yields multiple instances of equation (1), which together form a system of linear equations, written as:
A q* = 0
where A is an n × 10 matrix, n represents the number of backprojection planes, and the equation is a linear least squares problem:
q̂* = argmin ||A q*||^2  subject to ||q*|| = 1
SVD is performed on the matrix A; the singular vector corresponding to the smallest singular value gives the independent elements of the sought ellipsoid, which are arranged into a 4 × 4 symmetric matrix according to the quadric form. This matrix is the sought minimum envelope ellipsoid of the object, written as:
Q* = [ q_1   q_2   q_3   q_4
       q_2   q_5   q_6   q_7
       q_3   q_6   q_8   q_9
       q_4   q_7   q_9   q_10 ]
S4, finally, calculating the three-dimensional size of the minimum envelope box of the ellipsoid;
The ellipsoid obtained in step S3 is the minimum enveloping ellipsoid of the object as it is normally placed, i.e. the pose of the ellipsoid itself is not constrained and only the minimum enveloping volume of the object is sought. If the envelope box were taken directly from the three semi-axis lengths of the ellipsoid, it might not suit applications in which the object must be placed upright; what is sought instead is an envelope box that fits the object's placement and is aligned with its supporting plane. Most methods search, by traversal and sampling, for the extreme coordinates along the axes of the scene coordinate system and take them as the envelope box size, which is too inefficient. Therefore, this step uses the equation of the ellipsoid in dual form to seek the tangent planes aligned with the three axes of the scene coordinate system; the tangent points are obtained by solving one quadratic equation in a single variable per axis, with the analytic solutions:
box(1) = ( q_4 + sqrt(q_4^2 - q_1·q_10) ) / q_10
box(4) = ( q_4 - sqrt(q_4^2 - q_1·q_10) ) / q_10
box(2) = ( q_7 + sqrt(q_7^2 - q_5·q_10) ) / q_10
box(5) = ( q_7 - sqrt(q_7^2 - q_5·q_10) ) / q_10
box(3) = ( q_9 + sqrt(q_9^2 - q_8·q_10) ) / q_10
box(6) = ( q_9 - sqrt(q_9^2 - q_8·q_10) ) / q_10
The length, width and height of the envelope box cuboid are thereby obtained:
length l = abs(box(1) - box(4));
width w = abs(box(2) - box(5));
height h = abs(box(3) - box(6));
in the above formulas, box(1) and box(4) are the x-axis coordinates of the two points where the ellipsoid is tangent to planes parallel to the y-o-z plane of the world coordinate system, box(2) and box(5) are the y-axis coordinates of the two tangent points with planes parallel to the x-o-z plane, box(3) and box(6) are the z-axis coordinates of the two tangent points with planes parallel to the x-o-y plane, and abs(·) denotes the absolute value function.
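As a consistency check on the analytic solution above (under the Q* parameterisation assumed in step S3, so this derivation is an illustration rather than a quotation of the original disclosure), substituting a plane parallel to the y-o-z plane, π = [1, 0, 0, -x]^T, into the dual constraint gives:

π^T · Q* · π = q_1 - 2·q_4·x + q_10·x^2 = 0   which yields   x = ( q_4 ± sqrt(q_4^2 - q_1·q_10) ) / q_10

The two roots are box(1) and box(4); the y and z cases follow identically using (q_5, q_7) and (q_8, q_9) respectively.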
In this embodiment, a monocular camera and an IMU are used as the sensing and acquisition equipment; by calibrating the camera parameters, detecting the target object in images and estimating the motion attitude of the sensor, the mathematical equations relating the three-dimensional dual ellipsoid of the object to its projected outlines in multiple views are established, and finally the three-dimensional space occupancy of the object's minimum envelope box is computed, realizing low-cost, efficient, convenient and true-scale measurement of the object's three-dimensional size.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that variations based on the shape and principle of the present invention should be covered within the scope of the present invention.

Claims (3)

1. A method for measuring the three-dimensional size of an object based on vision-inertia is characterized by comprising the following steps:
s1, estimating the pose of a camera based on visual-inertial coupling;
s2, extracting an object detection frame, and calculating a back projection plane corresponding to the object detection frame;
s3, constructing a dual ellipsoid envelope equation;
s4, calculating the three-dimensional size of the ellipsoid minimum envelope box so as to obtain the three-dimensional occupied space size information of the object;
the specific process of estimating the pose of the camera based on the visual-inertial coupling in the step S1 is as follows:
adopting a visual-inertial tightly coupled framework, including SIFT feature extraction and matching association on the images; because the IMU and the image data differ in frequency, pre-integrating the IMU data between image keyframes C_i and C_j, the integration results being the inter-frame IMU rotation ΔR, velocity Δv and position Δp; fusing the state of C_i with the pre-integration results to obtain the attitude R, velocity v and position p of the IMU at C_j relative to the world coordinate system;
estimating the three-dimensional coordinates of the tracked feature points X by triangulation, and adding feature points that already exist in the map and have established data association with the feature points of the current frame into the optimization function;
constructing a visual reprojection error and an IMU inertial error as an optimization cost function, wherein the functions are as follows:
θ* = argmin_θ ( Σ_k E_proj(k, j) + E_IMU(i, j) )
where θ = { R_j, p_j, v_j, b_j } denotes the keyframe states to be optimized;
wherein E_proj(k, j) is the visual sensing cost function, which defines the reprojection error between the feature point observed on the image and its three-dimensional coordinates:
E_proj(k, j) = ρ( (x_k - π(X_C^k))^T Σ_k^{-1} (x_k - π(X_C^k)) )
X_C^k = R_CB · R_BW^j · ( X_W^k - p_WB^j ) + p_CB
wherein { R_CB, p_CB } is the relative transformation between the camera coordinate system and the IMU coordinate system, used to convert the coordinates X_W^k of the k-th feature point in the world coordinate system into the coordinates X_C^k in the camera coordinate system; the function π(·) is the camera imaging model, which maps the three-dimensional point X_C^k into the image coordinate system to obtain the pixel coordinates of its projection; ρ(·) is the Huber kernel function, which replaces the square term of the original error with a function that grows more slowly, while ensuring the smoothness of the error function;
E_IMU(i, j) is the IMU error term:
E_IMU(i, j) = ρ( [e_R^T, e_v^T, e_p^T] Σ_I^{-1} [e_R^T, e_v^T, e_p^T]^T ) + ρ( e_b^T Σ_R^{-1} e_b )
e_R = Log( (ΔR_ij · Exp(J_ΔR^g · δb_j^g))^T · R_BW^i · R_WB^j )
e_v = R_BW^i · ( v_B^j - v_B^i - g_W · Δt_ij ) - ( Δv_ij + J_Δv^g · δb_j^g + J_Δv^a · δb_j^a )
e_p = R_BW^i · ( p_WB^j - p_WB^i - v_B^i · Δt_ij - 0.5 · g_W · Δt_ij^2 ) - ( Δp_ij + J_Δp^g · δb_j^g + J_Δp^a · δb_j^a )
e_b = b_j - b_i
E_IMU(i, j) is the IMU error term between the i-th and j-th frames, comprising the attitude error e_R, velocity error e_v, position error e_p and bias error e_b, where Σ_I is the information matrix of the pre-integration and Σ_R is the information matrix of the bias random walk; in the attitude error e_R, ΔR_ij represents the relative rotation of the IMU from frame i to frame j, J_ΔR^g · δb_j^g is the approximate correction term of the rotation pre-integration with respect to the gyroscope bias at frame j, (ΔR_ij · Exp(J_ΔR^g · δb_j^g)) represents the bias-corrected relative rotation from i to j obtained by the IMU through pre-integration, and Log(·) maps elements of the orthogonal rotation group SO(3) into the Lie algebra so(3); in the velocity error e_v, v_B^j and v_B^i represent the IMU velocities of frames j and i in the world coordinate system, g_W represents the gravity vector, Δv_ij represents the pre-integrated velocity difference between the two frames, and J_Δv^g · δb_j^g and J_Δv^a · δb_j^a are the approximate bias-correction terms of the gyroscope and accelerometer at frame j; in the position error e_p, p_WB^j and p_WB^i denote the IMU positions relative to the world coordinate system at frames j and i, v_B^i is the IMU linear velocity of frame i expressed in the world coordinate system, and Δp_ij is the pre-integrated position difference between frames i and j; in the bias term e_b, b_j and b_i denote the IMU biases of frames j and i respectively;
the method comprises the steps of carrying out optimization solution on the equation containing the visual sensing errors and the inertial sensing errors by using a g2o optimization frame, and solving the posture R of each key frame j And position P j
The specific process of step S2 is as follows:
training and fine-tuning a pre-trained convolutional deep neural network on a data set of the target objects to obtain a target object detection network module; the module extracts the target object from the input image and outputs the semantic label of the object together with the coordinate and size information of its 2D envelope box; let the four vertices of the envelope box be p_1, p_2, p_3, p_4 and its four edges be l_1, l_2, l_3, l_4, all expressed in the homogeneous pixel coordinate system; under these coordinates, any point p has coordinates [u, v, 1], and any straight line l_t can be obtained as the cross product of two points p_a and p_b, i.e.:
l_t = p_a × p_b
according to the camera imaging model, back-projecting a straight line l in the image yields a plane π passing through the optical center of the camera; the back-projection plane for the given k-th frame image can then be obtained from the camera projection matrix P_k = K[R_k t_k], namely:
π_t = P_k^T · l_t
wherein π_t = [π_t1, π_t2, π_t3, π_t4]^T is the back-projection plane obtained from l_t; this operation is performed on all four edges of the object detection boxes in all images, and each detection box yields four back-projection planes.
2. The method for measuring the three-dimensional size of an object based on vision-inertia as claimed in claim 1, wherein the specific process of constructing the dual ellipsoid envelope equation in step S3 is as follows:
the dual form of an ellipsoid can be regarded as the algebraic variety formed by all planes tangent to the ellipsoid; in projective geometry the dual ellipsoid Q* has the exact definition π^T Q* π = 0 for every tangent plane, from which the equation of its tangent plane π_t = [π_t1, π_t2, π_t3, π_t4]^T is established:
q_1·π_t1^2 + 2·q_2·π_t1·π_t2 + 2·q_3·π_t1·π_t3 + 2·q_4·π_t1·π_t4 + q_5·π_t2^2 + 2·q_6·π_t2·π_t3 + 2·q_7·π_t2·π_t4 + q_8·π_t3^2 + 2·q_9·π_t3·π_t4 + q_10·π_t4^2 = 0
written in vector form, this is:
[π_t1^2, 2π_t1π_t2, 2π_t1π_t3, 2π_t1π_t4, π_t2^2, 2π_t2π_t3, 2π_t2π_t4, π_t3^2, 2π_t3π_t4, π_t4^2] · q* = 0, with q* = [q_1, q_2, q_3, q_4, q_5, q_6, q_7, q_8, q_9, q_10]^T    (1)
the envelope equation (1) of the dual ellipsoid above is an equation in homogeneous coordinates, i.e. solutions that differ only by a constant scale factor are regarded as the same solution;
combining the back-projection planes of all edges of the object detection boxes from multiple views yields multiple instances of equation (1), which together form a system of linear equations, written as:
A q* = 0
where A is an n × 10 matrix, n represents the number of backprojection planes, and the equation is a linear least squares problem:
q̂* = argmin ||A q*||^2  subject to ||q*|| = 1
SVD is performed on the matrix A; the singular vector corresponding to the smallest singular value gives the independent elements of the sought ellipsoid, which are arranged into a 4 × 4 symmetric matrix according to the quadric form. This matrix is the sought minimum envelope ellipsoid of the object, written as:
Q* = [ q_1   q_2   q_3   q_4
       q_2   q_5   q_6   q_7
       q_3   q_6   q_8   q_9
       q_4   q_7   q_9   q_10 ]
3. the method for measuring the three-dimensional size of the object based on the vision-inertia as claimed in claim 1, wherein the specific process of the step S4 is as follows:
using the equation of the ellipsoid in dual form, the tangent planes aligned with the three axes of the scene coordinate system are sought; the tangent points are obtained by solving one quadratic equation in a single variable per axis, with the analytic solutions:
box(1) = ( q_4 + sqrt(q_4^2 - q_1·q_10) ) / q_10
box(4) = ( q_4 - sqrt(q_4^2 - q_1·q_10) ) / q_10
box(2) = ( q_7 + sqrt(q_7^2 - q_5·q_10) ) / q_10
box(5) = ( q_7 - sqrt(q_7^2 - q_5·q_10) ) / q_10
box(3) = ( q_9 + sqrt(q_9^2 - q_8·q_10) ) / q_10
box(6) = ( q_9 - sqrt(q_9^2 - q_8·q_10) ) / q_10
the length, width and height of the envelope box cuboid are thereby obtained:
length l = abs(box(1) - box(4));
width w = abs(box(2) - box(5));
height h = abs(box(3) - box(6));
in the above formulas, box(1) and box(4) are the x-axis coordinates of the two points where the ellipsoid is tangent to planes parallel to the y-o-z plane of the world coordinate system, box(2) and box(5) are the y-axis coordinates of the two tangent points with planes parallel to the x-o-z plane, box(3) and box(6) are the z-axis coordinates of the two tangent points with planes parallel to the x-o-y plane, and abs(·) denotes the absolute value function.
CN202110413394.3A 2021-04-16 2021-04-16 Object three-dimensional size measuring method based on vision-inertia Active CN113034571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110413394.3A CN113034571B (en) 2021-04-16 2021-04-16 Object three-dimensional size measuring method based on vision-inertia

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110413394.3A CN113034571B (en) 2021-04-16 2021-04-16 Object three-dimensional size measuring method based on vision-inertia

Publications (2)

Publication Number Publication Date
CN113034571A CN113034571A (en) 2021-06-25
CN113034571B true CN113034571B (en) 2023-01-20

Family

ID=76457585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110413394.3A Active CN113034571B (en) 2021-04-16 2021-04-16 Object three-dimensional size measuring method based on vision-inertia

Country Status (1)

Country Link
CN (1) CN113034571B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114485479B (en) * 2022-01-17 2022-12-30 吉林大学 Structured light scanning and measuring method and system based on binocular camera and inertial navigation
CN114719759B (en) * 2022-04-01 2023-01-03 南昌大学 Object surface perimeter and area measurement method based on SLAM algorithm and image instance segmentation technology
CN115655262B (en) * 2022-12-26 2023-03-21 广东省科学院智能制造研究所 Deep learning perception-based multi-level semantic map construction method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019090487A1 (en) * 2017-11-07 2019-05-16 大连理工大学 Highly dynamic wide-range any-contour-error monocular six-dimensional measurement method for numerical control machine tool

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104748746B (en) * 2013-12-29 2017-11-03 刘进 Intelligent machine attitude determination and virtual reality loaming method
CN111156984B (en) * 2019-12-18 2022-12-09 东南大学 Monocular vision inertia SLAM method oriented to dynamic scene

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019090487A1 (en) * 2017-11-07 2019-05-16 大连理工大学 Highly dynamic wide-range any-contour-error monocular six-dimensional measurement method for numerical control machine tool

Also Published As

Publication number Publication date
CN113034571A (en) 2021-06-25


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Guan Yisheng

Inventor after: Lin Xubin

Inventor after: Yang Yinen

Inventor after: He Li

Inventor after: Zhang Hong

Inventor before: He Li

Inventor before: Lin Xubin

Inventor before: Yang Yinen

Inventor before: Guan Yisheng

Inventor before: Zhang Hong

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240322

Address after: Room 2412, Xingwei Mingzuo, No. 460 Shaoshan North Road, Yuhua District, Changsha City, Hunan Province, 410007

Patentee after: Hunan Tianhua Science and Education Co.,Ltd.

Country or region after: China

Address before: 510062 Dongfeng East Road, Yuexiu District, Guangzhou, Guangdong 729

Patentee before: GUANGDONG University OF TECHNOLOGY

Country or region before: China