CN113034571B - Object three-dimensional size measuring method based on vision-inertia - Google Patents

Object three-dimensional size measuring method based on vision-inertia

Info

Publication number
CN113034571B
CN113034571B (application CN202110413394.3A)
Authority
CN
China
Prior art keywords
imu
coordinate system
box
ellipsoid
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110413394.3A
Other languages
Chinese (zh)
Other versions
CN113034571A (en)
Inventor
管贻生 (Guan Yisheng)
林旭滨 (Lin Xubin)
杨益枘 (Yang Yinen)
何力 (He Li)
张宏 (Zhang Hong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Tianhua Science and Education Co.,Ltd.
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202110413394.3A priority Critical patent/CN113034571B/en
Publication of CN113034571A publication Critical patent/CN113034571A/en
Application granted granted Critical
Publication of CN113034571B publication Critical patent/CN113034571B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/60 Analysis of geometric attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras

Abstract

The invention discloses a vision-inertia-based method for measuring the three-dimensional size of an object. A monocular camera and an IMU (inertial measurement unit) are used as the sensing and acquisition equipment. By calibrating the camera parameters, detecting the target object in images and estimating the motion attitude of the sensor, the method establishes the mathematical equations relating the three-dimensional dual ellipsoid of the object to its projected outlines in multiple views, and finally computes the three-dimensional space occupancy of the object's minimum envelope box. The invention measures the true three-dimensional size of the object and has the advantages of low measurement cost and convenience.

Description

Object three-dimensional size measuring method based on vision-inertia
Technical Field
The invention relates to the technical field of machine vision, in particular to a method for measuring the three-dimensional size of an object based on vision-inertia.
Background
In important links of the industrial production process such as logistics and warehousing, product packaging, production line design and loading space optimization, the three-dimensional size of a product is an important physical parameter. Acquisition of this information mostly relies on manual measurement or direct consultation of product design parameters; manual measurement cannot meet the requirements of production automation, and design models are usually confidential or difficult to obtain. Existing measurement of three-dimensional object size belongs to the field of industrial reverse engineering: refined three-dimensional modelling of the object is usually performed with three-dimensional ranging sensors such as multi-line laser scanners, line structured light or depth cameras, and involves modules for acquiring three-dimensional point clouds of the object, point cloud registration and pose estimation between adjacent frames, and fusion and optimization of multi-frame dense point clouds. On the one hand, the sensing equipment required for refined three-dimensional reconstruction is expensive, and point cloud registration and pose estimation require a certain overlap rate between point clouds of different frames, so the equipment has to be moved slowly in practice to guarantee point cloud quality and overlap, leading to drawbacks such as limited usage scenarios, high cost and poor portability. On the other hand, in most industrial scenarios a refined three-dimensional model of the object is not required; only the minimum envelope volume of the object needs to be measured as an estimate of its space occupancy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a vision-inertia-based method for measuring the three-dimensional size of an object. A monocular camera and an IMU (inertial measurement unit) are used as the sensing and acquisition equipment; by calibrating the camera parameters, detecting the target object in images and estimating the motion attitude of the sensor, the mathematical equations relating the three-dimensional dual ellipsoid of the object to its projected outlines in multiple views are established, and finally the three-dimensional space occupancy of the object's minimum envelope box is computed, thereby realizing low-cost, efficient, convenient and true-scale measurement of the three-dimensional size of the object.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
a method for measuring the three-dimensional size of an object based on vision-inertia comprises the following steps:
s1, estimating the pose of a camera based on visual-inertial coupling;
s2, extracting an object detection frame, and calculating a back projection plane corresponding to the object detection frame;
s3, constructing a dual ellipsoid envelope equation;
S4, calculating the three-dimensional size of the minimum envelope box of the ellipsoid so as to obtain the three-dimensional space occupancy information of the object.
Further, the specific process of estimating the pose of the camera based on the visual-inertial coupling in step S1 is as follows:
adopting a visual-inertial tightly coupled framework, including SIFT feature extraction and matching association on the images; because the IMU and the image data differ in frequency, pre-integrating the IMU data between image keyframes C_i and C_j, the integration results being the inter-frame IMU rotation ΔR, velocity Δv and position Δp; fusing the state of C_i with the pre-integration results to obtain the attitude R, velocity v and position p of the IMU at C_j relative to the world coordinate system;
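The following is an illustrative sketch only (the function names and the simple Euler integration scheme are assumptions, not part of the original disclosure), showing how the inter-frame pre-integration terms ΔR, Δv and Δp can be accumulated from raw gyroscope and accelerometer samples between two keyframes:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def so3_exp(phi):
    """Exponential map from a rotation vector to a rotation matrix (Rodrigues formula)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3) + skew(phi)
    a = phi / theta
    K = skew(a)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def preintegrate(gyro, accel, dt, bg, ba):
    """Accumulate ΔR, Δv, Δp over the IMU samples between keyframes C_i and C_j.

    gyro, accel: (N, 3) raw measurements; dt: sample period;
    bg, ba: current gyroscope / accelerometer bias estimates."""
    dR = np.eye(3)
    dv = np.zeros(3)
    dp = np.zeros(3)
    for w, a in zip(gyro, accel):
        a_corr = a - ba
        dp += dv * dt + 0.5 * (dR @ a_corr) * dt ** 2   # position before velocity/rotation update
        dv += (dR @ a_corr) * dt
        dR = dR @ so3_exp((w - bg) * dt)
    return dR, dv, dp
```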
estimating the three-dimensional coordinates of the tracked feature points X by triangulation, and adding feature points that already exist in the map and have established data association with the feature points of the current frame into the optimization function;
constructing a visual reprojection error and an IMU inertial error as an optimization cost function, wherein the functions are as follows:
θ* = argmin_θ ( Σ_k E_proj(k, j) + E_IMU(i, j) )
where θ = { R_j, p_j, v_j, b_j } denotes the keyframe states to be optimized.
wherein E_proj(k, j) is the visual sensing cost function, which defines the reprojection error between the feature point observed on the image and its three-dimensional coordinates:
E_proj(k, j) = ρ( (x_k - π(X_C^k))^T Σ_k^{-1} (x_k - π(X_C^k)) )
X_C^k = R_CB · R_BW^j · ( X_W^k - p_WB^j ) + p_CB
wherein { R_CB, p_CB } is the relative transformation between the camera coordinate system and the IMU coordinate system, used to convert the coordinates X_W^k of the k-th feature point in the world coordinate system into the coordinates X_C^k in the camera coordinate system. The function π(·) is the camera imaging model, which maps the three-dimensional point X_C^k into the image coordinate system to obtain the pixel coordinates of its projection. ρ(·) is the Huber kernel function, which replaces the square term of the original error with a function that grows more slowly, while ensuring the smoothness of the error function.
E_IMU(i, j) is the IMU error term:
E_IMU(i, j) = ρ( [e_R^T, e_v^T, e_p^T] Σ_I^{-1} [e_R^T, e_v^T, e_p^T]^T ) + ρ( e_b^T Σ_R^{-1} e_b )
e_R = Log( (ΔR_ij · Exp(J_ΔR^g · δb_j^g))^T · R_BW^i · R_WB^j )
e_v = R_BW^i · ( v_B^j - v_B^i - g_W · Δt_ij ) - ( Δv_ij + J_Δv^g · δb_j^g + J_Δv^a · δb_j^a )
e_p = R_BW^i · ( p_WB^j - p_WB^i - v_B^i · Δt_ij - 0.5 · g_W · Δt_ij^2 ) - ( Δp_ij + J_Δp^g · δb_j^g + J_Δp^a · δb_j^a )
e_b = b_j - b_i
As above, ρ(·) is the Huber kernel function. E_IMU(i, j) is the IMU error term between the i-th and j-th frames, comprising the attitude error e_R, velocity error e_v, position error e_p and bias error e_b, where Σ_I is the information matrix of the pre-integration and Σ_R is the information matrix of the bias random walk. In the attitude error e_R, ΔR_ij represents the relative rotation of the IMU from frame i to frame j, J_ΔR^g · δb_j^g is the approximate correction term of the rotation pre-integration with respect to the gyroscope bias at frame j, (ΔR_ij · Exp(J_ΔR^g · δb_j^g)) represents the bias-corrected relative rotation from i to j obtained by the IMU through pre-integration, and Log(·) maps elements of the orthogonal rotation group SO(3) into the Lie algebra so(3). In the velocity error e_v, v_B^j and v_B^i represent the IMU velocities of frames j and i in the world coordinate system, g_W represents the gravity vector, Δv_ij represents the pre-integrated velocity difference between the two frames, and J_Δv^g · δb_j^g and J_Δv^a · δb_j^a are the approximate bias-correction terms of the gyroscope and accelerometer at frame j. In the position error e_p, p_WB^j and p_WB^i denote the IMU positions relative to the world coordinate system at frames j and i, v_B^i is the IMU linear velocity of frame i expressed in the world coordinate system, and Δp_ij is the pre-integrated position difference between frames i and j. In the bias term e_b, b_j and b_i denote the IMU biases of frames j and i respectively.
The above equations, which contain both the visual sensing error and the inertial sensing error, are optimized with the g2o optimization framework, solving for the attitude R_j and position p_j of each keyframe.
Further, the specific process of step S2 is as follows:
training and fine-tuning a pre-trained convolutional deep neural network on a data set of the target objects to obtain a target object detection network module; the module extracts the target object from the input image and outputs the semantic label of the object together with the coordinate and size information of its 2D envelope box. Let the four vertices of the envelope box be p_1, p_2, p_3, p_4 and its four edges be l_1, l_2, l_3, l_4, all represented in the homogeneous pixel coordinate system. Under these coordinates, any point p has coordinates [u, v, 1], and any straight line l_t can be obtained as the cross product of two points p_a and p_b, i.e.:
l_t = p_a × p_b
according to the camera imaging model, back-projecting a straight line l in the image yields a plane π passing through the optical center of the camera; the back-projection plane for the given k-th frame image can then be obtained from the camera projection matrix P_k = K[R_k t_k], namely:
π_t = P_k^T · l_t
wherein π_t = [π_t1, π_t2, π_t3, π_t4]^T is the back-projection plane obtained from l_t. This operation is performed on all four edges of the object detection boxes in all images, and each detection box yields four back-projection planes.
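A minimal sketch of this step (the helper name and its interface are assumptions, not part of the original disclosure): one edge of a 2D detection box is turned into its back-projection plane through the camera optical center.

```python
import numpy as np

def edge_to_plane(p_a, p_b, K, R_k, t_k):
    """p_a, p_b: homogeneous pixel coordinates [u, v, 1] of two box corners.
    K: 3x3 calibrated intrinsics; R_k, t_k: pose of the k-th keyframe.
    Returns the back-projection plane pi_t = P_k^T l_t as a 4-vector."""
    l_t = np.cross(p_a, p_b)                          # image line through the two corners
    P_k = K @ np.hstack([R_k, t_k.reshape(3, 1)])     # 3x4 camera projection matrix
    return P_k.T @ l_t                                # plane through the optical center

# Each detection box contributes four such planes, one per edge.
```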
Further, the specific process of constructing the dual ellipsoid envelope equation in the step S3 is as follows:
the dual form of an ellipsoid can be regarded as the algebraic variety formed by all planes tangent to the ellipsoid; in projective geometry the dual ellipsoid Q* has the exact definition π^T Q* π = 0 for every tangent plane, from which the equation of its tangent plane π_t = [π_t1, π_t2, π_t3, π_t4]^T is established:
q_1·π_t1^2 + 2·q_2·π_t1·π_t2 + 2·q_3·π_t1·π_t3 + 2·q_4·π_t1·π_t4 + q_5·π_t2^2 + 2·q_6·π_t2·π_t3 + 2·q_7·π_t2·π_t4 + q_8·π_t3^2 + 2·q_9·π_t3·π_t4 + q_10·π_t4^2 = 0
written in vector form, this is:
[π_t1^2, 2π_t1π_t2, 2π_t1π_t3, 2π_t1π_t4, π_t2^2, 2π_t2π_t3, 2π_t2π_t4, π_t3^2, 2π_t3π_t4, π_t4^2] · q* = 0, with q* = [q_1, q_2, q_3, q_4, q_5, q_6, q_7, q_8, q_9, q_10]^T    (1)
the envelope equation (1) of the dual ellipsoid above is an equation in homogeneous coordinates, i.e. solutions that differ only by a constant scale factor are regarded as the same solution;
combining the back-projection planes of all edges of the object detection boxes from multiple views yields multiple instances of equation (1), which together form a system of linear equations, written as:
A q* = 0
where A is an n × 10 matrix, n represents the number of backprojection planes, and the equation is a linear least squares problem:
q̂* = argmin ||A q*||^2  subject to ||q*|| = 1
SVD is performed on the matrix A; the singular vector corresponding to the smallest singular value gives the independent elements of the sought ellipsoid, which are arranged into a 4 × 4 symmetric matrix according to the quadric form. This matrix is the sought minimum envelope ellipsoid of the object, written as:
Q* = [ q_1   q_2   q_3   q_4
       q_2   q_5   q_6   q_7
       q_3   q_6   q_8   q_9
       q_4   q_7   q_9   q_10 ]
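A minimal sketch of this construction and solution (names are illustrative and assume the planes are supplied as 4-vectors; this is not code from the original disclosure):

```python
import numpy as np

def plane_to_row(pi):
    """Row of the constraint matrix A for one back-projection plane pi = [p1, p2, p3, p4]."""
    p1, p2, p3, p4 = pi
    return np.array([p1*p1, 2*p1*p2, 2*p1*p3, 2*p1*p4,
                     p2*p2, 2*p2*p3, 2*p2*p4,
                     p3*p3, 2*p3*p4, p4*p4])

def fit_dual_ellipsoid(planes):
    """Solve A q* = 0 by SVD and rebuild the 4x4 symmetric dual-ellipsoid matrix."""
    A = np.stack([plane_to_row(pi) for pi in planes])   # n x 10
    _, _, Vt = np.linalg.svd(A)
    q = Vt[-1]                     # singular vector of the smallest singular value
    q1, q2, q3, q4, q5, q6, q7, q8, q9, q10 = q
    Q = np.array([[q1, q2, q3, q4],
                  [q2, q5, q6, q7],
                  [q3, q6, q8, q9],
                  [q4, q7, q9, q10]])
    return Q                       # minimum envelope dual ellipsoid, up to scale
```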
further, the specific process of step S4 is as follows:
using the equation of the ellipsoid in dual form, the tangent planes aligned with the three axes of the scene coordinate system are sought; the tangent points are obtained by solving one quadratic equation in a single variable per axis, with the analytic solutions:
box(1) = ( q_4 + sqrt(q_4^2 - q_1·q_10) ) / q_10
box(4) = ( q_4 - sqrt(q_4^2 - q_1·q_10) ) / q_10
box(2) = ( q_7 + sqrt(q_7^2 - q_5·q_10) ) / q_10
box(5) = ( q_7 - sqrt(q_7^2 - q_5·q_10) ) / q_10
box(3) = ( q_9 + sqrt(q_9^2 - q_8·q_10) ) / q_10
box(6) = ( q_9 - sqrt(q_9^2 - q_8·q_10) ) / q_10
the length, width and height of the envelope box cuboid are thereby obtained:
length l = abs(box(1) - box(4));
width w = abs(box(2) - box(5));
height h = abs(box(3) - box(6));
in the above formulas, box(1) and box(4) are the x-axis coordinates of the two points where the ellipsoid is tangent to planes parallel to the y-o-z plane of the world coordinate system, box(2) and box(5) are the y-axis coordinates of the two tangent points with planes parallel to the x-o-z plane, box(3) and box(6) are the z-axis coordinates of the two tangent points with planes parallel to the x-o-y plane, and abs(·) denotes the absolute value function.
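A minimal sketch of this computation (it assumes the Q* parameterisation given in step S3; the tangent-point formulas are derived from that assumption rather than quoted from the original disclosure):

```python
import numpy as np

def envelope_box_size(Q):
    """Q: 4x4 symmetric dual-ellipsoid matrix. Returns (length, width, height)."""
    def tangents(qa, qb, qc):
        # roots of qa - 2*qb*x + qc*x^2 = 0, i.e. the two tangent-point coordinates
        disc = np.sqrt(qb * qb - qa * qc)
        return (qb + disc) / qc, (qb - disc) / qc

    x1, x2 = tangents(Q[0, 0], Q[0, 3], Q[3, 3])   # planes parallel to y-o-z
    y1, y2 = tangents(Q[1, 1], Q[1, 3], Q[3, 3])   # planes parallel to x-o-z
    z1, z2 = tangents(Q[2, 2], Q[2, 3], Q[3, 3])   # planes parallel to x-o-y
    return abs(x1 - x2), abs(y1 - y2), abs(z1 - z2)
```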
Compared with the prior art, the principle and the advantages of the scheme are as follows:
the scheme utilizes a monocular camera and an IMU (inertial measurement Unit) as sensing acquisition equipment, establishes mathematical equations of a three-dimensional dual ellipsoid of an object and a projection outline thereof in multiple views by calibrating camera parameters, detecting a target object by an image and estimating a motion attitude of a sensor, and finally calculates the occupied size of the three-dimensional space of the robot in the three-dimensional minimum envelope box of the object.
The scheme not only measures the true three-dimensional size of the object, but also has the advantages of low measurement cost and convenience.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a method for measuring the three-dimensional size of an object based on vision-inertia according to the present invention;
FIG. 2 is a schematic diagram of a pose estimation algorithm for visual inertial fusion;
FIG. 3 is a schematic diagram illustrating coordinate definitions of object detection frames in an image;
FIG. 4 is a schematic diagram of a back projection plane corresponding to a straight line in an image;
FIG. 5 is a schematic diagram of a multi-view geometry in which the dual form of an ellipsoid may be viewed as a back-projected tangential envelope;
FIG. 6 is a schematic diagram of a multi-view construction of dual ellipsoids.
Detailed Description
The invention will be further illustrated with reference to specific examples:
as shown in fig. 1, the method for measuring the three-dimensional size of an object based on vision-inertia according to the embodiment includes the following steps:
s1, estimating the pose of a camera based on visual-inertial coupling;
adopting a visual-inertial tightly coupled framework, including SIFT feature extraction and matching association on the images; because the IMU and the image data differ in frequency, pre-integrating the IMU data between image keyframes C_i and C_j, the integration results being the inter-frame IMU rotation ΔR, velocity Δv and position Δp; fusing the state of C_i with the pre-integration results to obtain the attitude R, velocity v and position p of the IMU at C_j relative to the world coordinate system;
estimating the three-dimensional coordinates of the tracked feature points X by triangulation, and adding feature points that already exist in the map and have established data association with the feature points of the current frame into the optimization function;
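A minimal sketch of the triangulation of a tracked feature point (standard linear DLT triangulation from two keyframe observations; the helper below is an assumption, not code from the original disclosure):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """P1, P2: 3x4 projection matrices of two keyframes;
    x1, x2: pixel coordinates (u, v) of the same feature in each frame.
    Returns the 3D point X in world coordinates."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]                 # homogeneous solution: right singular vector of smallest singular value
    return X_h[:3] / X_h[3]      # dehomogenise
```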
The poses and 3D feature points in the visual-inertial pose estimation together form a multi-constraint graph, as shown in FIG. 2; a visual reprojection error and an IMU inertial error are constructed as the optimization cost function, as follows:
θ* = argmin_θ ( Σ_k E_proj(k, j) + E_IMU(i, j) )
where θ = { R_j, p_j, v_j, b_j } denotes the keyframe states to be optimized.
wherein E_proj(k, j) is the visual sensing cost function, which defines the reprojection error between the feature point observed on the image and its three-dimensional coordinates:
E_proj(k, j) = ρ( (x_k - π(X_C^k))^T Σ_k^{-1} (x_k - π(X_C^k)) )
X_C^k = R_CB · R_BW^j · ( X_W^k - p_WB^j ) + p_CB
wherein { R_CB, p_CB } is the relative transformation between the camera coordinate system and the IMU coordinate system, used to convert the coordinates X_W^k of the k-th feature point in the world coordinate system into the coordinates X_C^k in the camera coordinate system. The function π(·) is the camera imaging model, which maps the three-dimensional point X_C^k into the image coordinate system to obtain the pixel coordinates of its projection. ρ(·) is the Huber kernel function, which replaces the square term of the original error with a function that grows more slowly, while ensuring the smoothness of the error function.
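For reference, assuming the common single-parameter form of the Huber kernel with threshold δ (the parameters used are not stated in the original disclosure), the kernel applied to a squared residual s can be written as:

ρ(s) = s                      for s ≤ δ^2
ρ(s) = 2·δ·sqrt(s) - δ^2      for s > δ^2

so large residuals contribute linearly rather than quadratically, which limits the influence of outliers while keeping the cost function smooth at the threshold.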
E_IMU(i, j) is the IMU error term:
E_IMU(i, j) = ρ( [e_R^T, e_v^T, e_p^T] Σ_I^{-1} [e_R^T, e_v^T, e_p^T]^T ) + ρ( e_b^T Σ_R^{-1} e_b )
e_R = Log( (ΔR_ij · Exp(J_ΔR^g · δb_j^g))^T · R_BW^i · R_WB^j )
e_v = R_BW^i · ( v_B^j - v_B^i - g_W · Δt_ij ) - ( Δv_ij + J_Δv^g · δb_j^g + J_Δv^a · δb_j^a )
e_p = R_BW^i · ( p_WB^j - p_WB^i - v_B^i · Δt_ij - 0.5 · g_W · Δt_ij^2 ) - ( Δp_ij + J_Δp^g · δb_j^g + J_Δp^a · δb_j^a )
e_b = b_j - b_i
As above, ρ(·) is the Huber kernel function. E_IMU(i, j) is the IMU error term between the i-th and j-th frames, comprising the attitude error e_R, velocity error e_v, position error e_p and bias error e_b, where Σ_I is the information matrix of the pre-integration and Σ_R is the information matrix of the bias random walk. In the attitude error e_R, ΔR_ij represents the relative rotation of the IMU from frame i to frame j, J_ΔR^g · δb_j^g is the approximate correction term of the rotation pre-integration with respect to the gyroscope bias at frame j, (ΔR_ij · Exp(J_ΔR^g · δb_j^g)) represents the bias-corrected relative rotation from i to j obtained by the IMU through pre-integration, and Log(·) maps elements of the orthogonal rotation group SO(3) into the Lie algebra so(3). In the velocity error e_v, v_B^j and v_B^i represent the IMU velocities of frames j and i in the world coordinate system, g_W represents the gravity vector, Δv_ij represents the pre-integrated velocity difference between the two frames, and J_Δv^g · δb_j^g and J_Δv^a · δb_j^a are the approximate bias-correction terms of the gyroscope and accelerometer at frame j. In the position error e_p, p_WB^j and p_WB^i denote the IMU positions relative to the world coordinate system at frames j and i, v_B^i is the IMU linear velocity of frame i expressed in the world coordinate system, and Δp_ij is the pre-integrated position difference between frames i and j. In the bias term e_b, b_j and b_i denote the IMU biases of frames j and i respectively.
The above equations, which contain both the visual sensing error and the inertial sensing error, are optimized with the g2o optimization framework, solving for the attitude R_j and position p_j of each keyframe.
S2, extracting an object detection frame, and calculating a back projection plane corresponding to the object detection frame;
A pre-trained convolutional deep neural network is trained and fine-tuned on a data set of the target objects to obtain a target object detection network module; the module extracts the target object from the input image and outputs the semantic label of the object together with the coordinate and size information of its 2D envelope box. As shown in FIG. 3, let the four vertices of the envelope box be p_1, p_2, p_3, p_4 and its four edges be l_1, l_2, l_3, l_4, all represented in the homogeneous pixel coordinate system. Under these coordinates, any point p has coordinates [u, v, 1], and any straight line l_t can be obtained as the cross product of two points p_a and p_b, i.e.:
l_t = p_a × p_b
As shown in FIG. 4, according to the camera imaging model, back-projecting a straight line l in the image yields a plane π passing through the optical center of the camera. The back-projection plane for the given k-th frame image can then be obtained from the camera projection matrix P_k = K[R_k t_k], namely:
π_t = P_k^T · l_t
wherein π_t = [π_t1, π_t2, π_t3, π_t4]^T is the back-projection plane obtained from l_t. This operation is performed on all four edges of the object detection box in all images, and each detection box yields four back-projection planes.
S3, then, constructing a dual ellipsoid envelope equation;
the dual form of an ellipsoid can be regarded as the algebraic variety formed by all planes tangent to the ellipsoid; in projective geometry the dual ellipsoid Q* has the exact definition π^T Q* π = 0 for every tangent plane, from which the equation of its tangent plane π_t = [π_t1, π_t2, π_t3, π_t4]^T is established:
q_1·π_t1^2 + 2·q_2·π_t1·π_t2 + 2·q_3·π_t1·π_t3 + 2·q_4·π_t1·π_t4 + q_5·π_t2^2 + 2·q_6·π_t2·π_t3 + 2·q_7·π_t2·π_t4 + q_8·π_t3^2 + 2·q_9·π_t3·π_t4 + q_10·π_t4^2 = 0
written in vector form, this is:
[π_t1^2, 2π_t1π_t2, 2π_t1π_t3, 2π_t1π_t4, π_t2^2, 2π_t2π_t3, 2π_t2π_t4, π_t3^2, 2π_t3π_t4, π_t4^2] · q* = 0, with q* = [q_1, q_2, q_3, q_4, q_5, q_6, q_7, q_8, q_9, q_10]^T    (1)
the envelope equation (1) of the dual ellipsoid above is an equation in homogeneous coordinates, i.e. solutions that differ only by a constant scale factor are regarded as the same solution;
combining the back-projection planes of all edges of the object detection boxes from multiple views yields multiple instances of equation (1), which together form a system of linear equations, written as:
A q* = 0
where A is an n × 10 matrix, n represents the number of backprojection planes, and the equation is a linear least squares problem:
q̂* = argmin ||A q*||^2  subject to ||q*|| = 1
SVD is performed on the matrix A; the singular vector corresponding to the smallest singular value gives the independent elements of the sought ellipsoid, which are arranged into a 4 × 4 symmetric matrix according to the quadric form. This matrix is the sought minimum envelope ellipsoid of the object, written as:
Q* = [ q_1   q_2   q_3   q_4
       q_2   q_5   q_6   q_7
       q_3   q_6   q_8   q_9
       q_4   q_7   q_9   q_10 ]
S4, finally, calculating the three-dimensional size of the minimum envelope box of the ellipsoid;
The ellipsoid obtained in step S3 is the minimum enveloping ellipsoid of the object as it is normally placed, i.e. the pose of the ellipsoid itself is not constrained and only the minimum enveloping volume of the object is sought. If the envelope box were taken directly from the three semi-axis lengths of the ellipsoid, it might not suit applications in which the object must be placed upright; what is sought instead is an envelope box that fits the object's placement and is aligned with its supporting plane. Most methods search, by traversal and sampling, for the extreme coordinates along the axes of the scene coordinate system and take them as the envelope box size, which is too inefficient. Therefore, this step uses the equation of the ellipsoid in dual form to seek the tangent planes aligned with the three axes of the scene coordinate system; the tangent points are obtained by solving one quadratic equation in a single variable per axis, with the analytic solutions:
box(1) = ( q_4 + sqrt(q_4^2 - q_1·q_10) ) / q_10
box(4) = ( q_4 - sqrt(q_4^2 - q_1·q_10) ) / q_10
box(2) = ( q_7 + sqrt(q_7^2 - q_5·q_10) ) / q_10
box(5) = ( q_7 - sqrt(q_7^2 - q_5·q_10) ) / q_10
box(3) = ( q_9 + sqrt(q_9^2 - q_8·q_10) ) / q_10
box(6) = ( q_9 - sqrt(q_9^2 - q_8·q_10) ) / q_10
The length, width and height of the envelope box cuboid are thereby obtained:
length l = abs(box(1) - box(4));
width w = abs(box(2) - box(5));
height h = abs(box(3) - box(6));
in the above formulas, box(1) and box(4) are the x-axis coordinates of the two points where the ellipsoid is tangent to planes parallel to the y-o-z plane of the world coordinate system, box(2) and box(5) are the y-axis coordinates of the two tangent points with planes parallel to the x-o-z plane, box(3) and box(6) are the z-axis coordinates of the two tangent points with planes parallel to the x-o-y plane, and abs(·) denotes the absolute value function.
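As a consistency check on the analytic solution above (under the Q* parameterisation assumed in step S3, so this derivation is an illustration rather than a quotation of the original disclosure), substituting a plane parallel to the y-o-z plane, π = [1, 0, 0, -x]^T, into the dual constraint gives:

π^T · Q* · π = q_1 - 2·q_4·x + q_10·x^2 = 0   which yields   x = ( q_4 ± sqrt(q_4^2 - q_1·q_10) ) / q_10

The two roots are box(1) and box(4); the y and z cases follow identically using (q_5, q_7) and (q_8, q_9) respectively.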
In this embodiment, a monocular camera and an IMU are used as the sensing and acquisition equipment; by calibrating the camera parameters, detecting the target object in images and estimating the motion attitude of the sensor, the mathematical equations relating the three-dimensional dual ellipsoid of the object to its projected outlines in multiple views are established, and finally the three-dimensional space occupancy of the object's minimum envelope box is computed, realizing low-cost, efficient, convenient and true-scale measurement of the object's three-dimensional size.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that variations based on the shape and principle of the present invention should be covered within the scope of the present invention.

Claims (3)

1. A method for measuring the three-dimensional size of an object based on vision-inertia is characterized by comprising the following steps:
s1, estimating the pose of a camera based on visual-inertial coupling;
s2, extracting an object detection frame, and calculating a back projection plane corresponding to the object detection frame;
s3, constructing a dual ellipsoid envelope equation;
s4, calculating the three-dimensional size of the ellipsoid minimum envelope box so as to obtain the three-dimensional occupied space size information of the object;
the specific process of estimating the pose of the camera based on the visual-inertial coupling in the step S1 is as follows:
adopting a visual-inertial tightly coupled framework, including SIFT feature extraction and matching association on the images; because the IMU and the image data differ in frequency, pre-integrating the IMU data between image keyframes C_i and C_j, the integration results being the inter-frame IMU rotation ΔR, velocity Δv and position Δp; fusing the state of C_i with the pre-integration results to obtain the attitude R, velocity v and position p of the IMU at C_j relative to the world coordinate system;
estimating the three-dimensional coordinates of the tracked feature points X by triangulation, and adding feature points that already exist in the map and have established data association with the feature points of the current frame into the optimization function;
constructing a visual reprojection error and an IMU inertial error as an optimization cost function, wherein the functions are as follows:
θ* = argmin_θ ( Σ_k E_proj(k, j) + E_IMU(i, j) )
where θ = { R_j, p_j, v_j, b_j } denotes the keyframe states to be optimized;
wherein E_proj(k, j) is the visual sensing cost function, which defines the reprojection error between the feature point observed on the image and its three-dimensional coordinates:
E_proj(k, j) = ρ( (x_k - π(X_C^k))^T Σ_k^{-1} (x_k - π(X_C^k)) )
X_C^k = R_CB · R_BW^j · ( X_W^k - p_WB^j ) + p_CB
wherein { R_CB, p_CB } is the relative transformation between the camera coordinate system and the IMU coordinate system, used to convert the coordinates X_W^k of the k-th feature point in the world coordinate system into the coordinates X_C^k in the camera coordinate system; the function π(·) is the camera imaging model, which maps the three-dimensional point X_C^k into the image coordinate system to obtain the pixel coordinates of its projection; ρ(·) is the Huber kernel function, which replaces the square term of the original error with a function that grows more slowly, while ensuring the smoothness of the error function;
E_IMU(i, j) is the IMU error term:
E_IMU(i, j) = ρ( [e_R^T, e_v^T, e_p^T] Σ_I^{-1} [e_R^T, e_v^T, e_p^T]^T ) + ρ( e_b^T Σ_R^{-1} e_b )
e_R = Log( (ΔR_ij · Exp(J_ΔR^g · δb_j^g))^T · R_BW^i · R_WB^j )
e_v = R_BW^i · ( v_B^j - v_B^i - g_W · Δt_ij ) - ( Δv_ij + J_Δv^g · δb_j^g + J_Δv^a · δb_j^a )
e_p = R_BW^i · ( p_WB^j - p_WB^i - v_B^i · Δt_ij - 0.5 · g_W · Δt_ij^2 ) - ( Δp_ij + J_Δp^g · δb_j^g + J_Δp^a · δb_j^a )
e_b = b_j - b_i
E_IMU(i, j) is the IMU error term between the i-th and j-th frames, comprising the attitude error e_R, velocity error e_v, position error e_p and bias error e_b, where Σ_I is the information matrix of the pre-integration and Σ_R is the information matrix of the bias random walk; in the attitude error e_R, ΔR_ij represents the relative rotation of the IMU from frame i to frame j, J_ΔR^g · δb_j^g is the approximate correction term of the rotation pre-integration with respect to the gyroscope bias at frame j, (ΔR_ij · Exp(J_ΔR^g · δb_j^g)) represents the bias-corrected relative rotation from i to j obtained by the IMU through pre-integration, and Log(·) maps elements of the orthogonal rotation group SO(3) into the Lie algebra so(3); in the velocity error e_v, v_B^j and v_B^i represent the IMU velocities of frames j and i in the world coordinate system, g_W represents the gravity vector, Δv_ij represents the pre-integrated velocity difference between the two frames, and J_Δv^g · δb_j^g and J_Δv^a · δb_j^a are the approximate bias-correction terms of the gyroscope and accelerometer at frame j; in the position error e_p, p_WB^j and p_WB^i denote the IMU positions relative to the world coordinate system at frames j and i, v_B^i is the IMU linear velocity of frame i expressed in the world coordinate system, and Δp_ij is the pre-integrated position difference between frames i and j; in the bias term e_b, b_j and b_i denote the IMU biases of frames j and i respectively;
the method comprises the steps of carrying out optimization solution on the equation containing the visual sensing errors and the inertial sensing errors by using a g2o optimization frame, and solving the posture R of each key frame j And position P j
The specific process of step S2 is as follows:
training and fine-tuning a pre-trained convolutional deep neural network on a data set of the target objects to obtain a target object detection network module; the module extracts the target object from the input image and outputs the semantic label of the object together with the coordinate and size information of its 2D envelope box; let the four vertices of the envelope box be p_1, p_2, p_3, p_4 and its four edges be l_1, l_2, l_3, l_4, all expressed in the homogeneous pixel coordinate system; under these coordinates, any point p has coordinates [u, v, 1], and any straight line l_t can be obtained as the cross product of two points p_a and p_b, i.e.:
l_t = p_a × p_b
according to the camera imaging model, back-projecting a straight line l in the image yields a plane π passing through the optical center of the camera; the back-projection plane for the given k-th frame image can then be obtained from the camera projection matrix P_k = K[R_k t_k], namely:
π_t = P_k^T · l_t
wherein π_t = [π_t1, π_t2, π_t3, π_t4]^T is the back-projection plane obtained from l_t; this operation is performed on all four edges of the object detection boxes in all images, and each detection box yields four back-projection planes.
2. The method for measuring the three-dimensional size of an object based on vision-inertia as claimed in claim 1, wherein the specific process of constructing the dual ellipsoid envelope equation in step S3 is as follows:
the dual form of an ellipsoid can be regarded as the algebraic variety formed by all planes tangent to the ellipsoid; in projective geometry the dual ellipsoid Q* has the exact definition π^T Q* π = 0 for every tangent plane, from which the equation of its tangent plane π_t = [π_t1, π_t2, π_t3, π_t4]^T is established:
q_1·π_t1^2 + 2·q_2·π_t1·π_t2 + 2·q_3·π_t1·π_t3 + 2·q_4·π_t1·π_t4 + q_5·π_t2^2 + 2·q_6·π_t2·π_t3 + 2·q_7·π_t2·π_t4 + q_8·π_t3^2 + 2·q_9·π_t3·π_t4 + q_10·π_t4^2 = 0
written in vector form, this is:
[π_t1^2, 2π_t1π_t2, 2π_t1π_t3, 2π_t1π_t4, π_t2^2, 2π_t2π_t3, 2π_t2π_t4, π_t3^2, 2π_t3π_t4, π_t4^2] · q* = 0, with q* = [q_1, q_2, q_3, q_4, q_5, q_6, q_7, q_8, q_9, q_10]^T    (1)
the envelope equation (1) of the dual ellipsoid above is an equation in homogeneous coordinates, i.e. solutions that differ only by a constant scale factor are regarded as the same solution;
combining the back-projection planes of all edges of the object detection boxes from multiple views yields multiple instances of equation (1), which together form a system of linear equations, written as:
A q* = 0
where A is an n × 10 matrix, n represents the number of backprojection planes, and the equation is a linear least squares problem:
q̂* = argmin ||A q*||^2  subject to ||q*|| = 1
SVD is performed on the matrix A; the singular vector corresponding to the smallest singular value gives the independent elements of the sought ellipsoid, which are arranged into a 4 × 4 symmetric matrix according to the quadric form. This matrix is the sought minimum envelope ellipsoid of the object, written as:
Q* = [ q_1   q_2   q_3   q_4
       q_2   q_5   q_6   q_7
       q_3   q_6   q_8   q_9
       q_4   q_7   q_9   q_10 ]
3. the method for measuring the three-dimensional size of the object based on the vision-inertia as claimed in claim 1, wherein the specific process of the step S4 is as follows:
using the equation of the ellipsoid in dual form, the tangent planes aligned with the three axes of the scene coordinate system are sought; the tangent points are obtained by solving one quadratic equation in a single variable per axis, with the analytic solutions:
box(1) = ( q_4 + sqrt(q_4^2 - q_1·q_10) ) / q_10
box(4) = ( q_4 - sqrt(q_4^2 - q_1·q_10) ) / q_10
box(2) = ( q_7 + sqrt(q_7^2 - q_5·q_10) ) / q_10
box(5) = ( q_7 - sqrt(q_7^2 - q_5·q_10) ) / q_10
box(3) = ( q_9 + sqrt(q_9^2 - q_8·q_10) ) / q_10
box(6) = ( q_9 - sqrt(q_9^2 - q_8·q_10) ) / q_10
the length, width and height of the envelope box cuboid are thereby obtained:
length l = abs(box(1) - box(4));
width w = abs(box(2) - box(5));
height h = abs(box(3) - box(6));
in the above formulas, box(1) and box(4) are the x-axis coordinates of the two points where the ellipsoid is tangent to planes parallel to the y-o-z plane of the world coordinate system, box(2) and box(5) are the y-axis coordinates of the two tangent points with planes parallel to the x-o-z plane, box(3) and box(6) are the z-axis coordinates of the two tangent points with planes parallel to the x-o-y plane, and abs(·) denotes the absolute value function.
CN202110413394.3A 2021-04-16 2021-04-16 Object three-dimensional size measuring method based on vision-inertia Active CN113034571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110413394.3A CN113034571B (en) 2021-04-16 2021-04-16 Object three-dimensional size measuring method based on vision-inertia

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110413394.3A CN113034571B (en) 2021-04-16 2021-04-16 Object three-dimensional size measuring method based on vision-inertia

Publications (2)

Publication Number Publication Date
CN113034571A CN113034571A (en) 2021-06-25
CN113034571B true CN113034571B (en) 2023-01-20

Family

ID=76457585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110413394.3A Active CN113034571B (en) 2021-04-16 2021-04-16 Object three-dimensional size measuring method based on vision-inertia

Country Status (1)

Country Link
CN (1) CN113034571B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114485479B (en) * 2022-01-17 2022-12-30 吉林大学 Structured light scanning and measuring method and system based on binocular camera and inertial navigation
CN114719759B (en) * 2022-04-01 2023-01-03 南昌大学 Object surface perimeter and area measurement method based on SLAM algorithm and image instance segmentation technology
CN115655262B (en) * 2022-12-26 2023-03-21 广东省科学院智能制造研究所 Deep learning perception-based multi-level semantic map construction method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019090487A1 (en) * 2017-11-07 2019-05-16 大连理工大学 Highly dynamic wide-range any-contour-error monocular six-dimensional measurement method for numerical control machine tool

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104748746B (en) * 2013-12-29 2017-11-03 刘进 Intelligent machine attitude determination and virtual reality loaming method
CN111156984B (en) * 2019-12-18 2022-12-09 东南大学 Monocular vision inertia SLAM method oriented to dynamic scene

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019090487A1 (en) * 2017-11-07 2019-05-16 大连理工大学 Highly dynamic wide-range any-contour-error monocular six-dimensional measurement method for numerical control machine tool

Also Published As

Publication number Publication date
CN113034571A (en) 2021-06-25


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Guan Yisheng

Inventor after: Lin Xubin

Inventor after: Yang Yinen

Inventor after: He Li

Inventor after: Zhang Hong

Inventor before: He Li

Inventor before: Lin Xubin

Inventor before: Yang Yinen

Inventor before: Guan Yisheng

Inventor before: Zhang Hong

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240322

Address after: Room 2412, Xingwei Mingzuo, No. 460 Shaoshan North Road, Yuhua District, Changsha City, Hunan Province, 410007

Patentee after: Hunan Tianhua Science and Education Co.,Ltd.

Country or region after: China

Address before: 510062 Dongfeng East Road, Yuexiu District, Guangzhou, Guangdong 729

Patentee before: GUANGDONG University OF TECHNOLOGY

Country or region before: China