WO2021063128A1

WO2021063128A1 - Method for determining pose of active rigid body in single-camera environment, and related apparatus

Info

Publication number: WO2021063128A1
Application number: PCT/CN2020/110254
Authority: WO
Inventors: 王越; 许秋子
Original assignee: 深圳市瑞立视多媒体科技有限公司
Priority date: 2019-09-30
Filing date: 2020-08-20
Publication date: 2021-04-08
Also published as: CN110689577B; CN110689577A; CN114170307A

Abstract

The present invention relates to the technical field of computer vision, and particularly relates to a method for determining a pose of an active rigid body in a single-camera environment, and a related apparatus. The method comprises: acquiring two-dimensional spatial point coordinates and two-dimensional spatial point codes of two adjacent frames, and a camera parameter, matching, according to the two-dimensional spatial point codes, the two-dimensional spatial point coordinates so as to obtain multiple two-dimensional spatial feature pairs, constructing a set of linear equations from the multiple two-dimensional spatial feature pairs and the camera parameter, and solving to obtain an intrinsic matrix; decomposing the intrinsic matrix by means of a singular value decomposition algorithm so as to obtain multiple rotation matrices and translation matrices; and estimating three-dimensional spatial point coordinates, detecting a depth value, determining a target rotation matrix and a target translation matrix, and determining a pose of a rigid body according to the target rotation matrix and the target translation matrix. The invention achieves tracking and positioning of an active light emitting rigid body in a single-camera environment at a low cost, and is thus more advantageous than employing a complicated multi-camera setting.

Description

Position and attitude positioning method of active rigid body in single camera environment and related equipment

Technical field

The invention relates to the technical field of computer vision, in particular to a method for positioning an active rigid body in a single-camera environment and related equipment.

Background technique

The traditional optical motion capture method uses the ultra-high-power near-infrared light source in the motion capture camera to emit infrared light and irradiate it on the passive marking point; the marking point coated with high-reflective material reflects the irradiated infrared light, and this part of the infrared light The ambient light with background information will pass through the low-distortion lens and reach the infrared narrow band pass filter unit of the camera. Since the light band of the infrared narrow band pass filter unit is the same as that of the infrared light source, the ambient light with redundant background information will be filtered out, leaving only the infrared light with the marking point information to pass through and be taken by the camera. Photosensitive element recording. The photosensitive element then converts the light signal into an image signal and outputs it to the control circuit, and the image processing unit in the control circuit uses a Field Programmable Gate Array (FPGA) to preprocess the image signal in the form of hardware, and finally to The tracking software flows out the 2D coordinate information of the marked points.

In the traditional optical motion capture system, whether it is active rigid body tracking or passive rigid body tracking, the system server needs to receive the 2D data of each camera in the multi-camera system, and then adopt the principle of multi-eye vision, according to the matching relationship between 2D point clouds And calibrate the calculated pose relationship between the cameras in advance, calculate the 3D coordinates in the three-dimensional space, and use this as the basis to calculate the motion information of the rigid body in the space. This method relies on the collaborative work between multiple cameras, so that it can be used in a relatively large space to realize the identification and tracking of rigid bodies, which leads to the high cost and difficult maintenance of the motion capture system.

Summary of the invention

The main purpose of the present invention is to provide an active rigid body pose positioning method and related equipment in a single-camera environment, aiming to solve the high cost and difficult-to-maintain technology caused by the use of multi-camera systems in current passive or active motion capture methods problem.

To achieve the above objective, the present invention provides a method for positioning an active rigid body in a single-camera environment. The method includes the following steps:

Obtain the two-dimensional space point coordinates of two adjacent frames captured by the monocular camera, the two-dimensional space point code corresponding to the two-dimensional space point coordinates, and the camera parameters of the camera. According to the two-dimensional space point code, the phase Matching the two-dimensional spatial point coordinates of two adjacent frames to obtain multiple sets of two-dimensional spatial feature pairs, constructing a linear equation system from the multiple sets of two-dimensional spatial feature pairs and the camera parameters, and solving the essential matrix;

Decompose the essential matrix through a singular value decomposition algorithm to obtain multiple sets of rotation matrices and translation matrices;

According to the two-dimensional space feature pair, multiple sets of the rotation matrix and the translation matrix, the three-dimensional space point coordinates are estimated, the depth value of the three-dimensional space point coordinates is detected, and the set of the rotation matrix whose depth value is a positive number The translation matrix is defined as a target rotation matrix and a target translation matrix, and the rigid body pose is determined according to the target rotation matrix and the target translation matrix.

Optionally, the determining the rigid body pose according to the target rotation matrix and the target translation matrix includes:

Sum the distances between all three-dimensional space points in the three-dimensional space point coordinates and take an average value to obtain a three-dimensional average distance;

Obtain rigid body coordinates, sum the distances between all rigid body mark points in the rigid body coordinates and take an average value to obtain the average rigid body distance;

The target translation matrix is optimized by an optimization formula to obtain an optimized target translation matrix, and the rigid body pose is determined according to the target rotation matrix and the optimized target translation matrix;

The optimization formula is:

Wherein, L1 is the three-dimensional average distance, L2 is the average distance of the rigid body, T is the target translation matrix before optimization, and T′ is the target translation matrix after optimization.

Optionally, the acquiring rigid body coordinates, summing the distances between all rigid body mark points in the rigid body coordinates and then taking the average value, before obtaining the average distance of the rigid body, includes:

Obtain the two-dimensional space point coordinates of two adjacent frames captured by multiple cameras, the two-dimensional space point code corresponding to the two-dimensional space point coordinates, and the space position data of multiple cameras, and encode the two-dimensional space point The same multiple of the two-dimensional space point coordinates are classified into the same type, and are marked under the same marking point;

Match a plurality of the cameras in pairs, and obtain the three-dimensional space point coordinates of each frame of each of the marker points according to the spatial position data of the two cameras and the plurality of the two-dimensional space point coordinates of the same frame ；

The coordinates of all three-dimensional space points in the same frame are converted into rigid body coordinates in the rigid body coordinate system, and the rigid body coordinates of each of the marked points in each frame are obtained.

Optionally, the multiple cameras are matched in pairs, and each of the marked points is obtained according to the spatial position data of the two cameras and the coordinates of the multiple two-dimensional spatial points in the same frame. The three-dimensional space point coordinates of the frame, including:

The two two-dimensional space point coordinates captured by the two matching cameras in the same frame are matched by pairwise matching of all the cameras that have captured the same marked point, and the least squares method is solved by singular value decomposition. Calculate a set of three-dimensional space point coordinates;

Judging whether the three-dimensional space point coordinates are within a preset threshold value range, and if the three-dimensional space point coordinates exceed the threshold value range, the three-dimensional space point coordinates are eliminated to obtain a set of the eliminated three-dimensional space point coordinates;

Calculate the average value of a set of the three-dimensional space point coordinates, and optimize by Gauss-Newton method to obtain the three-dimensional space point coordinates of the mark point.

Optionally, the conversion of all three-dimensional space point coordinates in the same frame into rigid body coordinates in a rigid body coordinate system to obtain the rigid body coordinates of each of the marked points in each frame includes:

Calculate the coordinate average value of the three-dimensional space point coordinates corresponding to the multiple mark points in the same frame, and record the coordinate average value as the origin in the rigid body coordinate system;

The difference between the origin and the coordinates of the three-dimensional space points corresponding to each of the marking points in the same frame is respectively calculated to obtain the rigid body coordinates of each of the marking points in each frame.

Optionally, the estimating three-dimensional spatial point coordinates through the two-dimensional spatial feature pair, multiple sets of the rotation matrix and the translation matrix includes:

Suppose the two cameras are camera 1 and camera 2, the two two-dimensional space point coordinates captured in the same frame are A(a1, a2), B(b1, b2), and the rotation matrix of camera 1 is R1( R11, R12, R13), R1 is a 3*3 matrix, the translation matrix is T1 (T11, T12, T13), T1 is a 3*1 matrix, and the rotation matrix of camera 2 is R2 (R21, R22, R23), The translation matrix is T2 (T21, T22, T23). Similarly, R2 is a 3*3 matrix, and T2 is a 3*1 matrix. The three-dimensional space point coordinates can be obtained by the following method:

1) According to the internal parameters and distortion parameters of the two cameras, transform the pixel coordinates A(a1, a2), B(b1, b2) into camera coordinates A′(a1′, a2′), B′(b1′, b2′) );

2) Construct the least squares matrix X and Y, where X is a 4*3 matrix and Y is a 4*1 matrix; the first line of X matrix is a1′*R13-R11, and the second line of X matrix is a2′*R13- R12, the third row of X matrix is b1′*R23-R21, the fourth row of X matrix is b2′*R23-R22; the first row of Y matrix is T11-a1′*T13, the second row of Y matrix is T12-a2′*T13, The third row of Y matrix is T21-b1′*T23, and the fourth row of Y matrix is T22-b2′*T23;

3) According to the equation X*C=Y and the already constructed matrix X and matrix Y, use singular value decomposition (SVD) to obtain a three-dimensional space point coordinate C(c1, c2, c3);

4) According to a plurality of different rotation matrices and translation matrices R1, T1, R2, T2, a plurality of different three-dimensional space point coordinates are obtained.

Optionally, the detecting the depth value of the three-dimensional space point coordinates, and defining the group of the rotation matrix and the translation matrix whose depth value is a positive number as the target rotation matrix and the target translation matrix includes:

According to the estimated three-dimensional space point coordinates, detect whether the depth value corresponding to the three-dimensional space point coordinates is a positive number, and if so, define the corresponding set of the rotation matrix and translation matrix as the target rotation matrix and target translation matrix.

Further, in order to achieve the above objective, the present invention also provides an active rigid body pose positioning device in a single-camera environment, including:

The calculation essential matrix module is used to obtain the two-dimensional space point coordinates of two adjacent frames captured by the monocular camera, the two-dimensional space point code corresponding to the two-dimensional space point coordinates, and the camera parameters of the camera, according to the two Two-dimensional space point coding, matching the two-dimensional space point coordinates of two adjacent frames to obtain multiple sets of two-dimensional space feature pairs, and constructing a linear equation system from multiple sets of the two-dimensional space feature pairs and the camera parameters, Solve the essential matrix;

Calculating rotation matrix and translation matrix module, which is used to decompose the essential matrix through a singular value decomposition algorithm to obtain multiple sets of rotation matrix and translation matrix;

The rigid body pose determination module is used to estimate the three-dimensional space point coordinates through the two-dimensional space feature pair, multiple sets of the rotation matrix and the translation matrix, detect the depth value of the three-dimensional space point coordinate, and set the depth value to be positive The number of rotation matrices and translation matrices are defined as a target rotation matrix and a target translation matrix, and the rigid body pose is determined according to the target rotation matrix and the target translation matrix.

In order to achieve the above objective, the present invention also provides a device for positioning an active rigid body in a single-camera environment. The device includes a memory, a processor, and a device that is stored in the memory and can run on the processor. A pose positioning program for an active rigid body in a single-camera environment, where the pose positioning program for an active rigid body in a single-camera environment is executed by the processor to realize the pose positioning of an active rigid body in a single-camera environment as described above Method steps.

In order to achieve the above object, the present invention also provides a computer-readable storage medium that stores a program for positioning the pose of an active rigid body in a single-camera environment. When the pose positioning program is executed by the processor, the steps of the active rigid body pose positioning method in the single-camera environment as described above are realized.

In the pose positioning method of an active rigid body in a single-camera environment provided by the present invention, in the process of determining the pose of a rigid body, the essence matrix is solved by matching the characteristic points in the coordinates of two adjacent frames; and the essence is graded by the singular value decomposition algorithm Matrix, multiple groups of rotation matrix and translation matrix are obtained; by detecting the depth value of the feature point, the final target rotation matrix and translation matrix are determined. The whole process does not depend on the rigid body structure, and the required matching data can be obtained according to the code and coordinates to calculate the rigid body pose information. In a single-camera environment, the present invention can realize the tracking and positioning of an active optical rigid body at a lower cost, and has obvious advantages compared with a complex multi-camera environment. In addition, the present invention matches the feature points of two adjacent frames each time, so that the active light rigid body can be tracked and positioned every time the current frame is compared to the initial frame's motion posture, thereby avoiding the common features of monocular camera tracking. The cumulative error problem further improves the tracking accuracy.

Description of the drawings

By reading the detailed description of the preferred embodiments below, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only used for the purpose of illustrating the preferred embodiments, and are not considered as a limitation to the present invention.

FIG. 1 is a schematic structural diagram of the operating environment of an active rigid body pose positioning device in a single-camera environment related to a solution of an embodiment of the present invention;

2 is a flowchart of a method for positioning an active rigid body in a single-camera environment in an embodiment of the present invention;

Figure 3 is a detailed flowchart of step S3 in an embodiment of the present invention;

4 is a detailed flowchart of step S302 in an embodiment of the present invention;

Fig. 5 is a structural diagram of an active rigid body pose positioning device in a single-camera environment in an embodiment of the present invention.

Detailed ways

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the following further describes the present invention in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, but not to limit the present invention.

Those skilled in the art can understand that, unless specifically stated, the singular forms "a", "an", "said" and "the" used herein may also include plural forms. It should be further understood that the term "comprising" used in the specification of the present invention refers to the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, Integers, steps, operations, elements, components, and/or groups of them.

1, which is a schematic structural diagram of an operating environment of an active rigid body pose positioning device in a single-camera environment related to a solution of an embodiment of the present invention.

As shown in FIG. 1, the active rigid body pose positioning device in a single-camera environment includes: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. Among them, the communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.

Those skilled in the art can understand that the hardware structure of the active rigid body pose positioning device in the single-camera environment shown in FIG. 1 does not constitute a limitation on the active rigid body pose positioning device in the single-camera environment, and may include ratios More or fewer parts are shown, or some parts are combined, or different parts are arranged.

As shown in FIG. 1, the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a pose positioning program for an active rigid body in a single-camera environment. Among them, the operating system is a program that manages and controls the pose positioning equipment and software resources of the active rigid body in the single-camera environment, and supports the operation of the pose positioning program of the active rigid body in the single-camera environment and other software and/or programs.

In the hardware structure of the active rigid body pose positioning device in the single-camera environment shown in Figure 1, the network interface 1004 is mainly used to access the network; the user interface 1003 is mainly used to detect confirmation commands and edit commands, and process The device 1001 can be used to call the pose positioning program of the active rigid body in the single-camera environment stored in the memory 1005, and execute the operations of the following embodiments of the pose positioning method of the active rigid body in the single-camera environment.

2 is a flowchart of a method for positioning an active rigid body in a single-camera environment in an embodiment of the present invention. As shown in FIG. 2, a method for positioning an active rigid body in a single-camera environment includes The following steps:

Step S1, solving the essential matrix: Obtain the two-dimensional point coordinates of the two adjacent frames captured by the monocular camera, the two-dimensional point code corresponding to the two-dimensional point coordinates, and the camera parameters of the camera, according to the two-dimensional point code, The two-dimensional space point coordinates of two adjacent frames are matched to obtain multiple sets of two-dimensional spatial feature pairs. The multiple sets of two-dimensional spatial feature pairs and camera parameters are constructed to form a linear equation set to solve the essential matrix.

The marker points in this step are generally set at different positions of the rigid body. When the rigid body moves within the camera capture range, the two-dimensional space coordinate information of the marker point is captured by the monocular camera to determine the spatial point data. The spatial point data includes two-dimensional space. Point coordinates and corresponding two-dimensional space point codes. Usually, there are eight marking points on the rigid body, and the marking points can be eight luminous LED lights. Therefore, a rigid body usually contains eight spatial point data, and each frame of monocular camera data contains eight mark points. The coding of the same mark point in different frames is the same, and different mark points are in the same frame. The encoding is different. Based on this, all the two-dimensional space points in two adjacent frames captured by the monocular camera can be matched, and the two two-dimensional space points with the same two-dimensional space point code are regarded as a set of two-dimensional space feature pairs, and the same group is considered The two-dimensional spatial feature pair is the projection of the same marker point in two adjacent frames on the monocular camera. When a rigid body contains eight marker points, it has eight sets of two-dimensional spatial feature pairs.

Before the monocular camera captures the spatial point data, the camera parameters of the monocular camera need to be calibrated, that is, the camera's optical center, focal length and distortion parameters, etc. These camera parameters are used as a matrix, recorded as matrix M, and used in the essential matrix calculation. In this step, when solving the essential matrix, the principle of epipolar geometric constraint is adopted. The linear equations are constructed for multiple sets of two-dimensional spatial feature pairs and camera parameters in the following way to solve the essential matrix:

In order to solve the essential matrix, first calculate the fundamental matrix F,

According to multiple sets of two-dimensional spatial feature pairs, the fundamental matrix F is obtained. According to F=M- ^T EM, since the matrix M corresponding to the camera parameters is known, the essential matrix E can be obtained.

Step S2: Decompose the essential matrix: Decompose the essential matrix through a singular value decomposition algorithm to obtain multiple sets of rotation matrices and translation matrices.

After the essential matrix is obtained, the motion information of the rigid body is recovered according to the essential matrix: rotation matrix R and translation matrix T. This process is obtained by singular value decomposition (SVD) in this step. After passing the singular value decomposition of the essential matrix E obtained in step S1, a total of four possible solutions (R, T) can be obtained, that is, four sets of rotation matrices and translation matrices, of which only one correct solution is available in the monocular camera Positive depth (the depth value is a positive number). Therefore, the next step of detecting depth information is required.

Step S3: Determine the pose of the rigid body: Estimate the three-dimensional space point coordinates through the two-dimensional space feature pairs, multiple sets of rotation matrices and translation matrices, detect the depth value of the three-dimensional space point coordinates, and set the group of rotation matrices with a positive depth value The translation matrix is defined as the target rotation matrix and the target translation matrix, and the rigid body pose is determined according to the target rotation matrix and the target translation matrix.

After decomposing the essential matrix using singular value decomposition in step S2, four possible solutions are obtained. Therefore, in this step, it is necessary to finally determine the correct solution among the four possible solutions. First, it is necessary to estimate the coordinates of the three-dimensional space point, and detect the depth value of the feature point according to the three-dimensional space point coordinate. Only the set of solutions (R, T) with a positive depth value is the final target (R, T).

In one embodiment, in step S3, estimating three-dimensional space point coordinates through two-dimensional spatial feature pairs, multiple sets of rotation matrices and translation matrices, further includes:

Suppose the two cameras are camera 1 and camera 2, the two two-dimensional space point coordinates captured in the same frame are A(a1, a2), B(b1, b2), and the rotation matrix of camera 1 is R1( R11, R12, R13), R1 is a 3*3 matrix, the translation matrix is T1 (T11, T12, T13), T1 is a 3*1 matrix, and the rotation matrix of camera 2 is R2 (R21, R22, R23), The translation matrix is T2 (T21, T22, T23). Similarly, R2 is a 3*3 matrix and T2 is a 3*1 matrix. The coordinates of a three-dimensional space point C(c1, c2) in the same frame are obtained by the following method , C3):

3) According to the equation X*C=Y and the already constructed matrix X and matrix Y, one of the three-dimensional space point coordinates C(c1, c2, c3) can be obtained by SVD decomposition;

Finally, according to multiple different rotation matrices and translation matrices, such as multiple sets of R1, T1, R2, and T2 rotation matrix and translation matrix data pairs, multiple different three-dimensional space point coordinates are obtained.

For example, if four sets of rotation matrices and translation matrices are obtained in step S2, four different three-dimensional space point coordinates can be estimated through this step, but there is only one three-dimensional space point coordinate C whose coordinate value c3 is greater than 0, then The R and T corresponding to the coordinate C of the three-dimensional space point are the final target data.

In this embodiment, the matched sets of two-dimensional space feature pairs are combined with four possible solutions (R, T), and the corresponding three-dimensional space coordinate data (x, y, z) is estimated by the above method according to the principle of triangulation. , Provide accurate data for the subsequent detection depth value z.

In one embodiment, in step S3, the depth value of the three-dimensional space point coordinates is detected, and the group of rotation matrices and translation matrices with a positive depth value are defined as the target rotation matrix and the target translation matrix, including:

According to the estimated three-dimensional space point coordinates, it is detected whether the depth value corresponding to the three-dimensional space point coordinates is a positive number, and if so, the corresponding group of rotation matrix and translation matrix is defined as the target rotation matrix and the target translation matrix.

In this embodiment, multiple depth values z are obtained by solving in the above-mentioned manner, and the corresponding solution (R, T) when the depth value z is zero or negative is eliminated, and the corresponding solution (R, T) when the depth value z is positive is retained. And as the final target data, the rigid body pose is determined with this target data.

In one embodiment, in step S3, after defining the set of rotation matrices and translation matrices with a positive depth value as the target rotation matrix and the target translation matrix, before determining the rigid body pose according to the target rotation matrix and the target translation matrix, such as As shown in Figure 3, it includes:

Step S301: Calculate the three-dimensional average distance: sum the distances between all three-dimensional space points in the three-dimensional space point coordinates and take the average value to obtain the three-dimensional average distance.

When calculating the three-dimensional average distance, you can randomly select a three-dimensional space point 1 to calculate the distance between the three-dimensional space point 1 and any other three-dimensional space point 2. The following formula can be used in the calculation:

D=sqrt((a ₁ -b ₁ ) ² +(a ₂ -b ₂ ) ² +(a ₃ -b ₃ ) ² )

Among them, D is the distance between two three-dimensional space points, (a ₁ , a ₂ , a ₃ ) is the three-dimensional space point coordinates of the three-dimensional space point 1, (b ₁ , b ₂ , b ₃ ) is the three-dimensional space point 2 The coordinates of the point in the three-dimensional space.

Calculate the distance between the 3D space point 2 and any other 3D space point 3 that has not participated in the calculation until all the 3D space points have participated in the calculation, then sum all the distances, and then take the average value. It is also possible to calculate the distance between the last three-dimensional space point 8 that participated in the calculation and the first randomly taken three-dimensional space point 1 after all three-dimensional space points have participated in the calculation, and then sum all the distances, and then take the average value.

For example, when there are eight marked points in a rigid body, there are eight three-dimensional space points in the three-dimensional space point coordinates. Through the above method, eight distance values are calculated, the eight distance values are added, and then divided by Eight, get the three-dimensional average distance.

Step S302: Calculate the average distance of the rigid body: obtain the coordinates of the rigid body, sum the distances between all rigid body mark points in the coordinates of the rigid body and take the average value to obtain the average distance of the rigid body.

When calculating the average distance of a rigid body, a distance calculation formula similar to that in step S301 can be used to calculate the distance between two rigid body mark points in the rigid body coordinate system after calculating the distance between the two coordinates of the rigid body coordinate system, and then the sum is performed and the average value is taken.

The rigid body coordinates in this step can be obtained by actually measuring the rigid body coordinates of the marked points, as shown in Figure 4, or can be obtained by a multi-camera system, that is, in the following way, accurate rigid body coordinates can be obtained with only one initialization. Multiple calculations:

Step S30201, acquiring data: acquiring the two-dimensional point coordinates of two adjacent frames captured by multiple cameras, the two-dimensional point codes corresponding to the two-dimensional point coordinates, and the spatial position data of multiple cameras, and encode the two-dimensional points The coordinates of the same multiple two-dimensional space points are classified into the same kind, and are marked under the same marking point.

The marker points in this step are generally set at different positions of the rigid body. The two-dimensional space coordinate information of the marker points is captured by multiple cameras, and the spatial point data is determined through the preset rigid body encoding technology. The spatial point data includes two-dimensional spatial points. Coordinates and corresponding two-dimensional space point codes. The spatial position data is obtained by calibrating and calculating the spatial position relationship of each camera. Usually, there are eight marking points on the rigid body, and the marking points can be eight luminous LED lights. Therefore, a rigid body usually contains eight spatial point data. Among the information captured by multiple cameras, each frame of data for a single camera contains the spatial point data of eight marker points. The encoding of the same marker point in different frames is the same. Yes, the coding of different marker points in the same frame is different. Based on this, the spatial point data with the same spatial point code in all cameras can be divided together as the same type, and these spatial point data are considered to be projections of the same marker point in space on different cameras.

Step S30202: Calculate the three-dimensional space data: match multiple cameras in pairs, and obtain the three-dimensional space point coordinates of each marker point per frame according to the spatial position data of the two cameras and the multiple two-dimensional space point coordinates of the same frame .

The processing of this step is performed on each frame of data of each marker point. During processing, multiple cameras that capture the marker point are matched in pairs. Using the principle of triangulation in multi-view geometry, through singular value decomposition (Singular Value Decomposition). Decomposition, SVD) solve the least square method to obtain a set of three-dimensional space point data.

For example, when the rigid body includes eight marker points, eight three-dimensional space point codes and three-dimensional space point coordinates of the eight marker points are obtained through this step.

This step further includes:

(1) Solve the least squares method: match all the cameras that captured the same mark point in pairs, and use the multi-view geometry to match the two two-dimensional space point coordinates captured by the two matched cameras in the same frame. The principle of triangulation, the least squares method is solved by singular value decomposition, and a three-dimensional space point is obtained after traversing all pairwise matching cameras, and a set of three-dimensional space points is obtained. A set of three-dimensional space points is the three-dimensional mark point Space point coordinates.

Suppose the two cameras are camera 1 and camera 2, the coordinates of two two-dimensional space points captured in the same frame are A(a1, a2), B(b1, b2), and the rotation matrix of camera 1 is R1( R11, R12, R13), R1 is a 3*3 matrix, the translation matrix is T1 (T11, T12, T13), T1 is a 3*1 matrix, and the rotation matrix of camera 2 is R2 (R21, R22, R23), The translation matrix is T2 (T21, T22, T23). Similarly, R2 is a 3*3 matrix, the translation matrix is T2, and T2 is a 3*1 matrix. The coordinates of a three-dimensional space point in the same frame are obtained by the following method C(c1, c2, c3):

2) Construct the least squares matrix X and Y, where X is a 4*3 matrix and Y is a 4*1 matrix; the first line of X matrix is a1′*R13-R11, and the second line of X matrix is a2′*R13- R12, the third row of X matrix is b1′*R23-R21, the fourth row of X matrix is b2′*R23-R22; the first row of Y matrix is T11-a1′*T13, the second row of Y matrix is T12-a2′*T13, The third row of the Y matrix is T21-b1'*T23, and the fourth row of the Y matrix is T22-b2'*T23.

3) According to the equation X*C=Y and the already constructed matrices X and Y, a three-dimensional space point coordinate C can be obtained by SVD decomposition.

In this step, the two two-dimensional space point coordinates captured by all pairwise matching cameras are finally calculated to obtain a set of three-dimensional space point coordinates.

(2) Eliminate the coordinates outside the threshold: determine whether the three-dimensional space point coordinates are within the preset threshold range, if it exceeds the threshold range, the three-dimensional space point coordinates are eliminated, and a set of three-dimensional space point coordinates after elimination is obtained.

After obtaining multiple three-dimensional space point coordinates, it is necessary to check whether these three-dimensional space point coordinates are within a preset threshold range, that is, a smaller threshold distance, and this threshold range is a coordinate parameter preset in advance. If the three-dimensional space point coordinates are found to deviate from the threshold range, the three-dimensional space point coordinates are considered to be wrong data and eliminated.

(3) Calculate the average value: Calculate the average value of a set of three-dimensional space point coordinates, optimize by Gauss Newton method, and obtain the three-dimensional space point coordinates of the marker point.

Calculate the average value of all three-dimensional space point coordinates after excluding the error data. When calculating, calculate the average value of each dimension of the three-dimensional space point coordinates to obtain the three-dimensional space point coordinates C′ (c1′, c2′, c3′), The obtained three-dimensional space point coordinates are optimized by the Gauss-Newton method, and finally the three-dimensional space point coordinates C (c1, c2, c3) of a certain mark point are obtained:

1) According to the R and T of each camera, calculate the following values for C′ and sum up g0, H0;

Calculate the projection coordinates of the three-dimensional space point coordinate C′ on each camera, match the actual image coordinate to the nearest point and calculate the residual error of the image coordinate with the nearest point;

Calculate the 3D coordinate q of C′ in the camera coordinate system according to the R and T of each camera, and define:

Return D*R;

Given a 3D point p (x, y, z) in the camera I coordinate system and its imaging coordinates (u, v) on the camera, then

Corresponding Jacobian matrix

Taking the 3D point variable in the world coordinate system, there are

According to the Gauss-Newton algorithm, calculate the gradient

2) Calculation

3) Finally, the optimized three-dimensional space point coordinates C (c1, c2, c3) are obtained.

Step S30203: Calculate rigid body coordinates: convert all three-dimensional space point codes and three-dimensional space point coordinates in the same frame into rigid body coordinates in the rigid body coordinate system, and obtain the rigid body coordinates of each marker point and each frame.

Through step S2, the three-dimensional space point data corresponding to each marking point can be obtained, and the multiple three-dimensional space point data corresponding to multiple marking points can be formed into a rigid body. If the rigid body currently in use has eight luminous LED lights, the rigid body Contains eight three-dimensional space point data. Through multiple three-dimensional space point data, such as three-dimensional space point coordinates in eight three-dimensional space point data, it can be transformed into rigid body coordinates in a rigid body coordinate system.

This step further includes:

(1) Calculate the average value: Calculate the coordinate average value of the three-dimensional space point coordinates corresponding to multiple mark points in the same frame, and record the coordinate average value as the origin in the rigid body coordinate system.

When determining the rigid body coordinates, first determine the origin under the rigid body coordinate system. In this step, the average value is calculated for each dimension of the three-dimensional space point coordinates corresponding to all the mark points in the same frame to obtain the coordinate average value, and this coordinate average value is recorded as the origin in the rigid coordinate system as all the mark points The reference data of the corresponding three-dimensional space point coordinates.

For example, when the rigid body contains eight marker points, step S2 obtains eight three-dimensional space point coordinate data, and calculates the average value of each dimension of the eight three-dimensional space point coordinate data to obtain the coordinate average value.

(2) Calculate the difference: calculate the difference between the origin and the three-dimensional space point coordinates corresponding to each mark point in the same frame, and obtain the rigid body coordinates of each mark point in each frame.

The average value of the coordinates is taken as the origin in the rigid body coordinate system, and the difference between the coordinates of each three-dimensional space point and the origin is calculated, and the difference obtained is the rigid body coordinate of each marked point.

For example, when a rigid body contains eight mark points, the three-dimensional space point coordinates corresponding to the eight mark points are calculated as the difference from the origin. When calculating, the difference between the coordinates of each dimension and the dimension coordinates corresponding to the origin is calculated, and finally Get eight rigid body coordinates.

In this embodiment, multiple cameras are used to capture multiple two-dimensional space point coordinates, a set of three-dimensional space point data is analyzed through a specific solution algorithm, and after operations such as integration, averaging, and optimization of multiple three-dimensional space point data are performed, Finally, more accurate three-dimensional space point data is obtained, and the accurate three-dimensional space point data is converted into rigid body coordinate data in the rigid body coordinate system, which provides definite and accurate data for the subsequent calculation of the average distance of the rigid body.

Step S303, optimization: the target translation matrix is optimized by the optimization formula to obtain the optimized target translation matrix, and the rigid body pose is determined according to the target rotation matrix and the optimized target translation matrix. The optimization formula is:

Among them, L1 is the three-dimensional average distance, L2 is the average distance of the rigid body, T is the target translation matrix before optimization, and T′ is the target translation matrix after optimization.

Under the monocular camera, after estimating the target rotation matrix R and the target translation matrix T of the rigid body, under the same rotation angle of the rigid body, the translation amount may have various situations, so there is no guarantee that the translation matrix T is accurate and true data. . In order to obtain more optimized and reliable rigid body pose information, and then determine the rigid body movement, after the three-dimensional space point coordinates of the rigid body are estimated by the triangulation principle, the three-dimensional space point coordinates and the rigid body coordinates in the rigid body coordinate system are estimated according to the estimated three-dimensional space point coordinates and the rigid body coordinates. The target translation matrix is optimized. The optimized target translation matrix is obtained through the above optimization formula, so that the final rigid body pose is more accurate and more authentic.

The pose positioning method of the active rigid body in the single-camera environment of this embodiment. The active optical rigid body has coding information so that the motion capture tracking and positioning no longer depends on the rigid body structure, but can directly obtain matching two-dimensional spatial features based on the coding information. Yes, to solve the rigid body pose. In a single-camera environment, the invention can realize the tracking and positioning of a rigid body at a lower cost, which has obvious advantages compared with a complex multi-camera environment. In addition, because the encoding information of the active optical rigid body is used to match the adjacent two frames, each time the active optical rigid body is tracked and positioned, the motion posture of the current frame compared to the initial frame can be calculated, thereby avoiding monocular camera tracking The common cumulative error problem further improves the tracking accuracy.

In one embodiment, a device for positioning an active rigid body in a single-camera environment is proposed. As shown in FIG. 5, the device includes:

The calculation essential matrix module is used to obtain the two-dimensional space point coordinates of two adjacent frames captured by the monocular camera, the two-dimensional space point code corresponding to the two-dimensional point coordinates, and the camera parameters of the camera. According to the two-dimensional space point code, Match the two-dimensional space point coordinates of two adjacent frames to obtain multiple sets of two-dimensional spatial feature pairs, construct a linear equation system from multiple sets of two-dimensional spatial feature pairs and camera parameters, and solve the essential matrix;

Calculate rotation matrix and translation matrix module, which is used to decompose the essential matrix through the singular value decomposition algorithm to obtain multiple sets of rotation matrix and translation matrix;

Determine the rigid body pose module, used to estimate the 3D space point coordinates through the 2D space feature pairs, multiple sets of rotation matrices and translation matrices, detect the depth value of the 3D space point coordinates, and set the group of rotation matrices with a positive depth value The translation matrix is defined as the target rotation matrix and the target translation matrix, and the rigid body pose is determined according to the target rotation matrix and the target translation matrix.

Based on the same embodiment description as the above-mentioned embodiment of the present invention for the active rigid body pose positioning method in the single-camera environment, this embodiment does not describe the contents of the embodiment of the active rigid body pose positioning device in the single-camera environment. Too much repeat.

In one embodiment, a device for positioning an active rigid body in a single-camera environment is proposed. The device includes a memory, a processor, and an active rigid body in a single-camera environment that is stored in the memory and can run on the processor. The pose positioning program of the active rigid body in the single-camera environment is executed by the processor to implement the steps in the pose positioning method of the active rigid body in the single-camera environment of the foregoing embodiments.

In one embodiment, a computer-readable storage medium stores a pose positioning program for an active rigid body in a single-camera environment, and the pose positioning program for an active rigid body in a single-camera environment is processed by the processor The steps in the active rigid body pose positioning method in the single-camera environment of the foregoing embodiments are implemented during execution. Wherein, the storage medium may be a non-volatile storage medium.

Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above-mentioned embodiments can be completed by a program instructing relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium can include: Read only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk, etc.

The technical features of the above-mentioned embodiments can be combined arbitrarily. In order to make the description concise, all possible combinations of the various technical features in the above-mentioned embodiments are not described. However, as long as there is no contradiction in the combination of these technical features, All should be considered as the scope of this specification.

The above-mentioned embodiments only express some exemplary embodiments of the present invention, and their descriptions are more specific and detailed, but they should not be interpreted as limiting the patent scope of the present invention. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of the present invention, several modifications and improvements can be made, and these all fall within the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention should be subject to the appended claims.

Claims

A method for positioning an active rigid body in a single-camera environment, characterized in that the method includes the following steps:

Obtain the two-dimensional space point coordinates of two adjacent frames captured by the monocular camera, the two-dimensional space point code corresponding to the two-dimensional space point coordinates, and the camera parameters of the camera. According to the two-dimensional space point code, the phase Matching the two-dimensional spatial point coordinates of two adjacent frames to obtain multiple sets of two-dimensional spatial feature pairs, constructing a linear equation system from the multiple sets of two-dimensional spatial feature pairs and the camera parameters, and solving the essential matrix;

Decompose the essential matrix through a singular value decomposition algorithm to obtain multiple sets of rotation matrices and translation matrices;

According to the two-dimensional space feature pair, multiple sets of the rotation matrix and the translation matrix, the three-dimensional space point coordinates are estimated, the depth value of the three-dimensional space point coordinates is detected, and the set of the rotation matrix whose depth value is a positive number The translation matrix is defined as a target rotation matrix and a target translation matrix, and the rigid body pose is determined according to the target rotation matrix and the target translation matrix.
The method for positioning the pose of an active rigid body in a single-camera environment according to claim 1, wherein the determining the pose of the rigid body according to the target rotation matrix and the target translation matrix comprises:

Sum the distances between all three-dimensional space points in the three-dimensional space point coordinates and take an average value to obtain a three-dimensional average distance;

Obtain rigid body coordinates, sum the distances between all rigid body mark points in the rigid body coordinates and take an average value to obtain the average rigid body distance;

The target translation matrix is optimized by an optimization formula to obtain an optimized target translation matrix, and the rigid body pose is determined according to the target rotation matrix and the optimized target translation matrix;

The optimization formula is:

Wherein, L1 is the three-dimensional average distance, L2 is the average distance of the rigid body, T is the target translation matrix before optimization, and T′ is the target translation matrix after optimization.
The pose positioning method of an active rigid body in a single-camera environment according to claim 2, wherein the acquiring rigid body coordinates, summing the distances between all rigid body marking points in the rigid body coordinates, and taking the average Value, before getting the average distance of the rigid body, including:

Obtain the two-dimensional space point coordinates of two adjacent frames captured by multiple cameras, the two-dimensional space point code corresponding to the two-dimensional space point coordinates, and the space position data of multiple cameras, and encode the two-dimensional space point The same multiple of the two-dimensional space point coordinates are classified into the same type, and are marked under the same marking point;

Match a plurality of the cameras in pairs, and obtain the three-dimensional space point coordinates of each frame of each of the marker points according to the spatial position data of the two cameras and the plurality of the two-dimensional space point coordinates of the same frame ；

The coordinates of all three-dimensional space points in the same frame are converted into rigid body coordinates in the rigid body coordinate system, and the rigid body coordinates of each of the marked points in each frame are obtained.
The pose positioning method of an active rigid body in a single-camera environment according to claim 3, wherein the matching of a plurality of the cameras is performed according to the spatial position data of the two cameras and the same type. Obtaining the three-dimensional space point coordinates of each of the marking points in each frame of a plurality of the two-dimensional space point coordinates of the frame includes:

The two two-dimensional space point coordinates captured by the two matching cameras in the same frame are matched by pairwise matching of all the cameras that have captured the same marked point, and the least squares method is solved by singular value decomposition. Calculate a set of three-dimensional space point coordinates;

Judging whether the three-dimensional space point coordinates are within a preset threshold value range, and if the three-dimensional space point coordinates exceed the threshold value range, the three-dimensional space point coordinates are eliminated to obtain a set of the eliminated three-dimensional space point coordinates;

Calculate the average value of a set of the three-dimensional space point coordinates, and optimize by Gauss-Newton method to obtain the three-dimensional space point coordinates of the mark point.
The pose positioning method of an active rigid body in a single-camera environment according to claim 3, wherein the coordinates of all three-dimensional space points in the same frame are converted into rigid body coordinates in a rigid body coordinate system to obtain each position The rigid body coordinates of each frame of the marked points include:

Calculate the coordinate average value of the three-dimensional space point coordinates corresponding to the plurality of the mark points in the same frame, and record the coordinate average value as the origin in the rigid body coordinate system;

The difference between the origin and the coordinates of the three-dimensional space points corresponding to each of the marking points in the same frame is respectively calculated to obtain the rigid body coordinates of each of the marking points in each frame.
The pose positioning method of an active rigid body in a single-camera environment according to claim 1, wherein the three-dimensional image is estimated by the two-dimensional spatial feature pair, multiple sets of the rotation matrix and the translation matrix. Space point coordinates, including:

Suppose the two cameras are camera 1 and camera 2, and the coordinates of two two-dimensional space points captured in the same frame are A (a1, a2) and B (b1, b2) respectively, and the rotation matrix of the camera 1 is R1 (R11, R12, R13), the translation matrix is T1 (T11, T12, T13), the rotation matrix of the camera 2 is R2 (R21, R22, R23), and the translation matrix is T2 (T21, T22, T23), Wherein, the R1 and R2 are 3*3 matrices, the T1 and T2 are 3*1 matrices, and the three-dimensional space point coordinates are obtained by the following method:

According to the internal parameters and distortion parameters of the two cameras, the pixel coordinates A(a1, a2), B(b1, b2) are converted into camera coordinates A′(a1′, a2′), B′(b1′, b2′) );

Construct the least squares matrix X and Y, where X is a 4*3 matrix, Y is a 4*1 matrix, the first row of X matrix is a1′*R13-R11, the second row of X matrix is a2′*R13-R12, The third row of X matrix is b1′*R23-R21, the fourth row of X matrix is b2′*R23-R22; the first row of Y matrix is T11-a1′*T13, the second row of Y matrix is T12-a2′*T13, Y matrix The third row is T21-b1′*T23, and the fourth row of Y matrix is T22-b2′*T23;

According to the equation X*C=Y and the matrix X and matrix Y, a three-dimensional space point coordinate C (c1, c2, c3) is obtained by using singular value decomposition;

According to multiple different rotation matrices and translation matrices, multiple different three-dimensional space point coordinates are obtained.
The pose positioning method of an active rigid body in a single-camera environment according to claim 1, wherein the detection of the depth value of the three-dimensional space point coordinates, the group of the rotation matrix with a positive depth value and the translation The matrix is defined as the target rotation matrix and the target translation matrix, including:

According to the estimated three-dimensional space point coordinates, detect whether the depth value corresponding to the three-dimensional space point coordinates is a positive number, and if so, define the corresponding set of the rotation matrix and translation matrix as the target rotation matrix and target translation matrix.
An active rigid body pose positioning device in a single-camera environment, characterized in that the device includes:

The calculation essential matrix module is used to obtain the two-dimensional space point coordinates of two adjacent frames captured by the monocular camera, the two-dimensional space point code corresponding to the two-dimensional space point coordinates, and the camera parameters of the camera, according to the two Two-dimensional space point coding, matching the two-dimensional space point coordinates of two adjacent frames to obtain multiple sets of two-dimensional space feature pairs, and constructing a linear equation system from multiple sets of the two-dimensional space feature pairs and the camera parameters, Solve the essential matrix;

Calculating rotation matrix and translation matrix module, which is used to decompose the essential matrix through a singular value decomposition algorithm to obtain multiple sets of rotation matrix and translation matrix;

The rigid body pose determination module is used to estimate the three-dimensional space point coordinates through the two-dimensional space feature pair, multiple sets of the rotation matrix and the translation matrix, detect the depth value of the three-dimensional space point coordinate, and set the depth value to be positive The number of rotation matrices and translation matrices are defined as a target rotation matrix and a target translation matrix, and the rigid body pose is determined according to the target rotation matrix and the target translation matrix.
An active rigid body pose positioning device in a single-camera environment, characterized in that the device includes:

A memory, a processor, and a pose positioning program for an active rigid body in a single-camera environment stored on the memory and running on the processor. The pose positioning program for an active rigid body in the single-camera environment is controlled by When the processor is executed, the steps of the method for positioning an active rigid body in a single-camera environment according to any one of claims 1 to 7 are realized.
A computer-readable storage medium, wherein the computer-readable storage medium stores a pose positioning program for an active rigid body in a single-camera environment, and the pose positioning program for an active rigid body in the single-camera environment is When the processor is executed, the steps of the method for positioning an active rigid body in a single-camera environment according to any one of claims 1 to 7 are realized.