CN117197333A - Space target reconstruction and pose estimation method and system based on multi-view vision - Google Patents


Info

Publication number
CN117197333A
CN117197333A (application CN202310964313.8A)
Authority
CN
China
Prior art keywords
camera
point
coordinate system
point cloud
points
Prior art date
Legal status
Pending
Application number
CN202310964313.8A
Other languages
Chinese (zh)
Inventor
魏承
邓志明
谷海宇
刘天喜
曹喜滨
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN202310964313.8A
Publication of CN117197333A




Abstract

The invention provides a space target reconstruction and pose estimation method and system based on multi-view vision, belonging to the technical field of space non-cooperative target pose estimation. First, an ideal camera model and a distortion model are established, a multi-view camera vision system is built, and the intrinsic parameters of the simulation camera are calibrated; then a deep convolutional neural network is used to extract and match feature points, multi-view visual three-dimensional point cloud reconstruction of the space target is carried out, and sparse point cloud generation is completed; finally, the pose of the non-cooperative target is estimated based on a probabilistic iterative closest point algorithm (PICP). The method improves on the problems of feature point mismatching, too few point cloud points and low robustness of point cloud registration in the binocular vision process, and addresses the fact that traditional methods are highly susceptible to interference from the external environment and prone to mismatching when facing complex environments, which in turn affects subsequent pose measurement.

Description

Space target reconstruction and pose estimation method and system based on multi-view vision
Technical Field
The invention belongs to the technical field of pose estimation of space non-cooperative targets, and particularly relates to a space target reconstruction and pose estimation method and system based on multi-view vision.
Background
In space missions, continuous and accurate observation of the state of a mission target is an indispensable step. The pose measurement sensors commonly used at present include GPS receivers, inertial navigation elements, laser rangefinders and vision sensors. Measurement methods can be divided into internal and external measurement according to whether the mission target itself carries pose-measurement equipment. For a target equipped internally with a GPS receiver or inertial sensor, the position and attitude of the space target can be determined directly from the data provided by these instruments, yielding the target motion state. For targets that contain no pose-measurement equipment and cannot communicate, a laser rangefinder or vision sensor is typically used to achieve contact-free pose measurement of the target, a process referred to as external measurement. Because vision sensors have the advantages of simple structure and light weight, they are often used for space target pose measurement tasks.
In spacecraft missions, task targets can be divided into cooperative targets and non-cooperative targets according to whether prior knowledge of the task object, such as feature markers and geometric shape parameters of the target, is available; accordingly, vision-based space target measurement technology can be divided into cooperative target vision measurement and non-cooperative target vision measurement.
Vision measurement based on cooperative targets can achieve high-precision state estimation of close-range targets, but it is not applicable to non-cooperative targets; in addition, when the cooperative marker moves outside the observation window, the target pose is lost.
Vision measurement of non-cooperative targets can be divided, according to the technical principle, into five measurement modes based on strong geometric features, model matching, laser projection, point cloud registration and multi-sensor fusion, but each of these modes has its own disadvantages.
Disclosure of Invention
Aiming at the problems that the existing mature binocular-vision matching and point cloud registration schemes are easily interfered with and prone to mismatching when facing complex scenes, the invention provides a space target reconstruction and pose estimation method and system based on multi-view vision, in which the feature point matching process and the point cloud registration process are optimized. The monocular vision model is analyzed, a multi-view measurement system is derived from it, and the intrinsic parameters of the simulation camera are solved.
The invention is realized by the following technical scheme:
the space target reconstruction and pose estimation method based on the multi-view vision comprises the following steps: the method specifically comprises the following steps:
S1, an ideal camera model and a distortion model are established, a multi-view camera vision system is built, and the intrinsic parameters of the simulation camera are calibrated;
S2, according to the system established in S1, a deep convolutional neural network is used to extract and match feature points, multi-view visual three-dimensional point cloud reconstruction of the space target is carried out, and sparse point cloud generation is completed;
S3, the pose of the non-cooperative target is estimated based on a probabilistic iterative closest point algorithm (PICP).
Further, S1 comprises: S1.1, modeling the monocular camera vision system and establishing an ideal camera model and a camera distortion model;
a common optical camera adopts the ordinary pinhole camera model, which uses the pinhole imaging principle and translates the two-dimensional inverted image of a three-dimensional object onto a virtual plane P in front of the camera lens to form an upright image; the pinhole camera imaging model involves conversions among four coordinate systems, namely the world coordinate system, the camera coordinate system, the physical imaging coordinate system and the image pixel coordinate system;
the transformation between the world coordinate system and the camera coordinate system involves the pose R, t of the camera placed in the world coordinate system, usually referred to as the camera extrinsic parameters; in monocular vision experiments the world coordinate system is often made to coincide with the camera coordinate system to avoid computing the extrinsic parameters, and in binocular vision experiments the world coordinate system is made to coincide with the coordinate system of the left camera to reduce computation;
The transformation of the camera coordinate system with the physical imaging coordinate system can be described as a similar triangle scaling process, assuming that the physical imaging plane is parallel to the object image plane and perpendicular to the camera optical axis,
the coordinates of the three-dimensional point P (X, Y, Z) in space in the physical imaging plane can be expressed as:
the transformation relationship between the physical imaging plane coordinate system and the image coordinate system is a translation and scaling; the origin of the physical plane coordinate system is at the camera optical centre while the origin of the image coordinate system is at the upper-left corner of the image, with the u-axis pointing along the column direction of the image and the v-axis along the row direction; assuming that the translation between the camera optical centre and the image coordinate system is (p_u, p_v) (in meters), the coordinates of the point P in the image coordinate system are, according to equation (1):
while the physical imaging coordinate system and the image coordinate system also differ by a scaling: in equation (2) the coordinate unit is the meter, whereas a digital image is a matrix whose unit is the pixel; assuming that the pixel coordinate system is scaled by a factor α on the u-axis and by a factor β on the v-axis, where α and β correspond to the physical length of a single image pixel along the x-axis and y-axis respectively, then according to equation (2):
the transformation relation between the world coordinate system and the image pixel coordinate system can be obtained in a comprehensive way:
Where s is a scaling factor and f_x, f_y, c_x, c_y are called the camera intrinsic parameters; this can be abbreviated as:
equation (5) is the conversion relation between the image pixel coordinate system and the world coordinate system of the undistorted camera in the ideal model; introducing distortion parameters to eliminate errors due to real processes;
the camera distortion can be divided into tangential distortion and radial distortion, the tangential distortion of the camera has little influence on the image imaging process, and the radial distortion has great influence on the final imaging of the image, so that the undistorted image required by the subsequent experiment can be obtained only by correcting the radial distortion;
taking radial distortion into consideration, a radial distortion parameter is introduced into the ideal pinhole imaging model to eliminate the influence caused by the distortion; since the distortion occurs during imaging, the world-to-camera coordinate transformation is assumed to have already been carried out, and for any point P (X, Y, Z) in the world coordinate system there is:
where P_c is the coordinate of the point P in the camera coordinate system; letting the radial distance of P_c in the normalized image plane be r, with r^2 = (p_cx/p_cz)^2 + (p_cy/p_cz)^2, a polynomial is used to remove the distortion of P_c:
in (8), k_1, k_2, k_3 are the radial distortion parameters and p_1, p_2 are the tangential distortion parameters;
although the tangential distortion of the camera can be set to 0 in the actual experimental process, it can still be solved during camera calibration, and the distortion parameters of the camera are:
distCoeffs = [k_1 k_2 p_1 p_2 k_3] (9)
From equations (5) and (9) it can be seen that, taking camera distortion into account, solving the transformation matrix between the real world coordinate system and the image pixel coordinate system involves nine parameters in total, f_x, f_y, c_x, c_y, k_1, k_2, k_3, p_1, p_2; the first four are collectively referred to as the camera intrinsic parameters and the last five as the distortion parameters;
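To make the projection chain above concrete, the following is a minimal sketch (not taken from the patent) that maps a world point to pixel coordinates using the intrinsic parameters of equation (5) and the distortion parameters of equation (9), assuming the common Brown-Conrady distortion convention; all numerical values are illustrative placeholders.

```python
# Hedged sketch: project a world point to pixel coordinates with the pinhole
# model of Eq. (5) plus the radial/tangential distortion of Eqs. (8)-(9).
import numpy as np

def project_point(P_w, R, t, fx, fy, cx, cy, dist):
    """P_w: 3D point in the world frame; R, t: camera extrinsics;
    dist = [k1, k2, p1, p2, k3] as in distCoeffs of Eq. (9)."""
    k1, k2, p1, p2, k3 = dist
    P_c = R @ P_w + t                        # world -> camera frame, Eq. (6)
    x, y = P_c[0] / P_c[2], P_c[1] / P_c[2]  # normalized image coordinates
    r2 = x * x + y * y                       # r^2 = (p_cx/p_cz)^2 + (p_cy/p_cz)^2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return np.array([fx * x_d + cx, fy * y_d + cy])  # to pixel coordinates, Eq. (5)

# Example with placeholder intrinsics, zero rotation and zero translation:
uv = project_point(np.array([0.1, -0.2, 5.0]), np.eye(3), np.zeros(3),
                   800.0, 800.0, 320.0, 240.0, [0.01, -0.002, 0.0, 0.0, 0.0])
print(uv)
```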
s1.2, modeling a multi-camera vision system;
in the ideal binocular vision measurement model, the imaging planes of the two cameras are coplanar, the optical axes are parallel, the two cameras have the same intrinsic parameters, and the extrinsic parameters consist only of a translation along the X_LR axis, with no rotation; the coordinate systems of the left and right cameras are O_L-X_LY_LZ_L and O_R-X_RY_RZ_R respectively, and the line connecting the origins of the two camera coordinate systems is called the baseline, generally denoted by b; in the ideal case, the optical axis of each camera passes through the centre of its image coordinate system;
let the coordinates of an arbitrary spatial point P be (X, Y, Z); the projection points of P in the pixel coordinate systems o_L-u_Lv_L and o_R-u_Rv_R of the left and right cameras are p_L(u_L, v_L) and p_R(u_R, v_R) respectively, and the corresponding coordinates in the physical imaging coordinate systems are (x_L, y_L) and (x_R, y_R); according to the triangle similarity principle:
where f is the focal length of the camera; substituting the conversion relation between the physical imaging coordinate system and the pixel coordinate system gives:
the coordinates of the P point are thus obtained as:
where d is defined as the difference between the left and right image abscissas, commonly referred to as the parallax (disparity); several cameras of the same specification are then introduced on the basis of the binocular stereo vision measurement model to form a multi-camera observation model; any two cameras in the multi-camera observation model can be regarded as a pair of binocular stereo vision measurement models, so a multi-camera observation model with n cameras contains C(n,2) = n(n-1)/2 binocular stereo measurement model combinations.
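Since the explicit expressions of equation (12) are not reproduced in this text, the following sketch uses the standard rectified-stereo relations consistent with equations (10)-(11) to recover the 3D coordinates of P from the disparity; the focal length, principal point, baseline and pixel coordinates are placeholder values.

```python
# Hedged sketch of the ideal rectified binocular model around Eqs. (10)-(12):
# with pixel focal length fx, principal point (cx, cy), baseline b and
# disparity d = uL - uR, the standard relations give Z = fx*b/d.
import numpy as np

def stereo_to_3d(uL, vL, uR, fx, cx, cy, b):
    d = uL - uR                    # parallax (disparity), assumed > 0
    Z = fx * b / d                 # depth inversely proportional to disparity
    X = (uL - cx) * Z / fx         # back-project using the left camera
    Y = (vL - cy) * Z / fx         # assumes square pixels (fx == fy)
    return np.array([X, Y, Z])

print(stereo_to_3d(400.0, 260.0, 360.0, 800.0, 320.0, 240.0, 0.5))  # placeholder values
```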
Further, S1.3, calibrating the intrinsic parameters of the simulation camera; the intrinsic parameters of the camera are solved in the simulation environment; the simulation camera model is chosen as the OpenGL perspective camera model, and its relationship with the calibration parameters of the ideal pinhole camera model is:
where θ is the field angle in the pitch direction; Aspect is the aspect ratio; n is the near-plane distance in m; f is the far-plane distance in m; f_x, f_y, c_x, c_y are the camera intrinsic parameters;
the intrinsic parameters of the simulation camera under the current settings can be calculated using equation (16); since the environment is a simulation environment, it can be regarded as an ideal camera model, so no distortion exists and both the radial and tangential distortion are 0;
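The exact form of equation (16) is not reproduced above; the sketch below shows the usual conversion from an OpenGL perspective camera (vertical field of view θ and aspect ratio) to pinhole intrinsics, assuming square pixels and a principal point at the image centre, which matches the ideal, distortion-free simulation camera described here.

```python
# Hedged sketch of Eq. (16)'s intent: converting an OpenGL perspective camera
# (vertical field of view theta, image W x H) to pinhole intrinsic parameters,
# assuming square pixels and a centred principal point.
import math

def opengl_to_pinhole(theta_deg, width, height):
    fy = (height / 2.0) / math.tan(math.radians(theta_deg) / 2.0)
    fx = fy                      # square pixels: the aspect ratio only widens the image
    cx, cy = width / 2.0, height / 2.0
    return fx, fy, cx, cy        # distortion is zero for the simulated camera

print(opengl_to_pinhole(45.0, 640, 480))  # placeholder resolution and field of view
```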
in S2, S2.1, image acquisition and image preprocessing are carried out, and the extraction of the pixel positions of the characteristic points and the corresponding descriptors is realized by using a SuperPoint algorithm;
the SuperPoint adopts a semi-self-supervision training method, and the training process is divided into three steps, namely, generation of a basic extractor, self-labeling of characteristic points, final network training and error evaluation;
Basic extractor generation: virtual three-dimensional objects are used as the training set, so that corner points with strong features can be extracted, in preparation for the subsequent self-labeling;
and (3) self-labeling of characteristic points: on the basis of obtaining a basic feature point extractor, carrying out self-labeling on general pictures which are not labeled; performing homography transformation on the non-marked image to generate a new image, re-using a basic extractor to mark characteristic points, and performing inverse transformation after extraction to generate a final self-marked image;
final network training: the self-labeling images are subjected to known transformation, the pose relation between the two images can be obtained, and then the two images are respectively input into a SuperPoint network for training; because the relative pose relation of the two images is known, the error analysis can be carried out on the finally extracted characteristic points and descriptors;
s2.2, detecting and matching the feature points, and matching the descriptors obtained in the S2.1 by using a SuperGlue algorithm to obtain a final feature point matching pair and a matching confidence;
s2.3, mismatching and rejecting, namely rejecting the mismatching pair in the S2.2 by using a RANSAC algorithm;
setting the size of the data set as K and the minimum parameter required for solving the model as S, the steps of the RANSAC algorithm are as follows:
S2.3.1 randomly sampling S points from the data set K as an initial model to solve the data set;
s2.3.2 fitting a possible model of the dataset from the S points sampled in the previous step;
s2.3.3 the other points in the dataset are brought into the model to calculate the error; when the error is less than the threshold P, the point is taken as an inlier;
s2.3.4 the above is repeated N times, and the set with the most inliers is selected and re-fitted to obtain a new candidate model;
s2.3.5 repeating the steps until reaching the appointed iteration times, and ending the algorithm;
for different outlier ratios, different numbers of iterations are needed to reach the expected accuracy; for any dataset, the number of iterations can be expressed as:
where w is the probability that a point randomly chosen from the dataset is an inlier and p is the probability that the sampling is successful.
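In its standard form, the iteration-count relation referred to above is N = log(1 − p)/log(1 − w^S); the small sketch below evaluates it under that assumption with illustrative numbers.

```python
# Hedged sketch of the standard RANSAC iteration-count formula,
# N = log(1 - p) / log(1 - w**S), with w the inlier probability,
# p the desired success probability and S the minimal sample size.
import math

def ransac_iterations(w, p, S):
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w**S))

# e.g. an 8-point model, 50% inliers, 99% confidence (illustrative numbers):
print(ransac_iterations(w=0.5, p=0.99, S=8))   # about 1177 iterations
```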
Further, S2.4, resolving the relative pose of the camera and determining space points, resolving the relative pose of the camera by utilizing epipolar constraint, and obtaining three-dimensional coordinates of the feature points by utilizing triangulation on the basis of obtaining the relative pose of the camera; performing three-dimensional point cloud reconstruction based on triangulation;
the relative pose relations of the multi-camera group can be obtained by camera calibration, and on this basis sparse point cloud generation is realized using the triangulation principle;
consider images I_1, I_2 taken by two cameras placed in parallel at different poses; taking the left camera coordinate system as the reference coordinate system, the transformation matrix of the right camera is T; let the camera optical centres be O_1 and O_2; for any three-dimensional point P in space, its projection pixel in I_1 is p_1 and its projection pixel in I_2 is p_2; theoretically the lines O_1p_1 and O_2p_2 intersect at P, but owing to the influence of noise, lens distortion and other factors the two rays emitted from the optical centres often have no intersection point in space; the least-squares function constructed from this process can therefore be solved; letting x_1, x_2 be the normalized coordinates of the two feature points, we have:
s_1 x_1 = s_2 R x_2 + t (18)
in equation (18) the camera rotation R and translation t are known, and what needs to be solved are the depths s_1, s_2 of the two feature points in their respective images; left-multiplying both sides of the above equation by the skew-symmetric matrix of x_1 gives:
s_2 can be solved from equation (19), and with it the three-dimensional coordinates of the spatial point;
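The following sketch illustrates the triangulation step of equations (18)-(19) under the stated convention (left camera as reference, normalized coordinates x_1, x_2): left-multiplying by the skew-symmetric matrix of x_1 eliminates s_1, and s_2 is then solved in a least-squares sense.

```python
# Hedged sketch of the triangulation around Eqs. (18)-(19).
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def triangulate_depth(x1, x2, R, t):
    """x1, x2: normalized homogeneous coordinates [x, y, 1] in the two views."""
    A = skew(x1) @ (R @ x2)            # coefficient of s2 after eliminating s1
    b = -skew(x1) @ t                  # right-hand side
    s2 = float(A @ b) / float(A @ A)   # least-squares solution of A * s2 = b
    P = s2 * (R @ x2) + t              # 3D point expressed in the first (left) camera frame
    return s2, P

s2, P = triangulate_depth(np.array([0.02, -0.01, 1.0]), np.array([0.12, -0.01, 1.0]),
                          np.eye(3), np.array([-0.5, 0.0, 0.0]))  # placeholder data
print(s2, P)
```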
s2.5, optimizing the point cloud, and simultaneously optimizing the pose of the camera and the three-dimensional coordinates of the space points by using a Bundle Adjustment algorithm;
in order to minimize the re-projection error, the coordinates of the three-dimensional points and the pose of the camera need to be optimized simultaneously, so that the light rays emitted from the feature points finally converge at the optical centre; this is Bundle Adjustment (BA); considering the camera observation model, a least-squares problem is set up for the error produced in one such observation:
e=h(x,y)-h(ξ,p) (20)
where ξ denotes the Lie-group element corresponding to the true pose R, t of the camera at the current moment, p denotes the true three-dimensional coordinates of the landmark at the current moment, x denotes the Lie-group element corresponding to the pose R, t solved for the camera at this moment, and y denotes the three-dimensional coordinates of the landmark p observed in state x;
considering the overall observation process, the cost function can be expressed as:
where z = h(x, y), and z_ij is the data obtained when the camera at pose ξ_i observes landmark p_j;
since equation (21) is a nonlinear function, an increment Δx is added to the overall argument in order to solve it, giving:
where E_ij denotes the partial derivative of the cost function with respect to the three-dimensional coordinates of the landmark in this state:
and F_ij denotes the partial derivative of the cost function with respect to the camera pose in this state:
the above equation is solved using the LM (Levenberg-Marquardt) algorithm:
f(x + δx) ≈ f(x) + Jδx (25)
where J is the Jacobian matrix obtained from the first-order partial derivatives of the cost function, J = [F E]; minimizing the re-projection error and setting f(x) = ε gives:
(J^T J + λI)δx = -J^T ε (26)
the simultaneous optimization of the pose of the camera and the coordinates of the three-dimensional points can be realized by solving the equation;
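A minimal sketch of one damped update in the sense of equations (25)-(26) is given below; it only solves the normal equation for the increment and leaves the damping-parameter schedule and the sparse block structure of J to a full BA implementation.

```python
# Hedged sketch of one Levenberg-Marquardt update as in Eqs. (25)-(26):
# given the stacked Jacobian J = [F E] and the current residual vector eps,
# solve (J^T J + lambda*I) dx = -J^T eps for the increment.
import numpy as np

def lm_step(J, eps, lam):
    H = J.T @ J + lam * np.eye(J.shape[1])   # damped Gauss-Newton Hessian
    g = -J.T @ eps                           # negative gradient
    return np.linalg.solve(H, g)             # increment for camera poses and 3D points

# In a full BA loop, lam is decreased when the re-projection error drops after the
# update and increased otherwise; the sparse camera/landmark block structure of J
# is normally exploited via the Schur complement.
```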
further, in step S3, S3.1: probability point cloud generation; the point cloud generated in S2.5 by the sparse point cloud generation module is fused with the matching confidence obtained in S2.2 to obtain a probability point cloud;
S3.2: far interference point removal based on outlier rejection; outliers are removed so as to improve the subsequent registration accuracy; meanwhile, in order to increase the running speed of the registration algorithm, the point cloud is down-sampled to reduce its scale;
in order to realize the preliminary processing of the source point cloud and the target point cloud, an outlier elimination algorithm based on statistical filtering is adopted; the point cloud in the three-dimensional coordinate system can be expressed as:
P = {p_i ∈ R^3}, i = 1, 2, 3, ... (27)
where p_i = (x_i, y_i, z_i) are the three-dimensional coordinates of a point in the Cartesian coordinate system; for each 3D point two steps are taken: a group of nearest points in the neighbourhood of the point is searched and the Euclidean distances are calculated, and then the standard deviation of the distances between the point and its neighbours is computed; in this way it can be determined whether it is an outlier:
where d_i denotes the statistical distance of the point; K denotes the number of selected neighbouring points; m is a threshold for controlling the number of outliers; d_std denotes the standard deviation of the distances from p_i to its neighbours;
for p_i, if the Euclidean distance between p_j and p_i is greater than d_i, then p_j is regarded as an outlier.
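A minimal sketch of one common statistical-filtering scheme in the spirit of equations (27)-(28) follows; the exact thresholding rule of equation (28) is not reproduced in this text, so the sketch uses the widely used mean-plus-m-standard-deviations criterion on the k-nearest-neighbour distances.

```python
# Hedged sketch of statistical-filtering outlier removal: average each point's
# distances to its K nearest neighbours and reject points whose mean distance
# exceeds the global mean by more than m standard deviations.
import numpy as np

def remove_outliers(points, K=20, m=2.0):
    """points: (N, 3) array; returns the filtered point cloud."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)              # pairwise Euclidean distances
    dist.sort(axis=1)
    mean_knn = dist[:, 1:K + 1].mean(axis=1)         # skip the zero self-distance
    mu, sigma = mean_knn.mean(), mean_knn.std()
    keep = mean_knn <= mu + m * sigma                # statistical threshold
    return points[keep]
```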
Further, S3 also comprises: S3.3: point cloud registration; the point clouds between frames are registered using the probabilistic iterative closest point algorithm (PICP), the relative pose of the non-cooperative target is obtained and compared with the ground-truth pose to measure the robustness and accuracy of the algorithm; given two point cloud models, a reference point cloud {p_i} and a target point cloud {p'_i}, the non-cooperative target pose estimation problem is converted into solving the optimal transformation parameters that minimize the distance from the reference point cloud to the target point cloud, from which the rotation matrix R and translation vector t of the non-cooperative target are obtained:
the single point distance error can be expressed as:
e_i = p_i - (Rp'_i + t) (30)
from the above equation, the registration problem can be converted into minimizing the sum of the Euclidean distances of the errors between the reference point cloud and the target point cloud, i.e. equation (31):
considering the influence of near interference points on the point cloud registration process, a distance weight is added on the basis of equation (31) to reduce their influence, so the new loss function can be expressed as:
where w_i denotes the distance weight and w_i ∈ [0, 1]; if the value of w_i tends to 0, the error contributed by that point to the loss function decreases accordingly;
near interference points typically appear at the edges of the target, so the weights can be constructed from the distance to the point cloud centroid; for a point pair, the distances from the points to the centroids of the source and target point clouds are:
where q_μ and q'_μ denote the centroids of the reference point cloud and the target point cloud respectively; taking d_pi and d_qi as independent variables, a two-dimensional Gaussian distribution function is selected to evaluate the weight of each point, as in equation (34); the shapes of non-cooperative targets such as satellites are approximately symmetric and structurally uniform, and the two-dimensional Gaussian function is rotationally symmetric, so the weights will not be biased to either side and cause registration failure; furthermore, the Gaussian function is single-valued, and the weight of each point increases or decreases monotonically with the distance to the centroid;
since w_i ∈ [0, 1], the value of σ can be set accordingly; thus w_i can be expressed as:
combining equation (32), the loss function with near interference point consideration can be obtained as:
considering the degradation of feature recognition accuracy caused by feature destruction under extreme illumination conditions, the matching confidence obtained by the SuperGlue algorithm is taken as the confidence of the point cloud and participates in the point cloud registration process as a weight, so as to reduce the influence of mismatching on the final pose estimation; the new loss function is:
where P_i is the probability weight and P_i ∈ [0, 1]; considering the matching confidence obtained by the SuperGlue algorithm, a higher matching confidence indicates that the pair of feature points is more strongly matched in the image recognition process and the three-dimensional point obtained from it is more credible; conversely, a lower matching confidence indicates a greater likelihood that the pair of feature points is mismatched; the probability weight P_i can therefore be expressed as:
where P(p_i) and P(p'_i) are the probabilities of the reference point and the target point respectively; therefore, comprehensively considering complex background and extreme illumination effects, the loss function of the probabilistic iterative closest point (PICP) algorithm is obtained as:
according to the above, the point cloud registration problem can be restated as a least-squares problem; to simplify the computation, the transformation of the point cloud can be decomposed into two steps, translation and rotation; the motion of a rigid body should be regarded as rotation of the source point cloud around the centroid plus translation of the centroid; the centroids of the two point clouds are therefore written as:
Combining the above gives:
since the cross term involving (p_i - p̄ - R(p'_i - p̄')) is zero after summation, the optimization function can be expressed as:
the centroid-removed coordinates of the reference point cloud and the target point cloud are computed as:
q_i = p_i - p̄, q'_i = p'_i - p̄' (43)
the least squares problem can be optimized as follows:
the error term for R is expanded with:
removing the terms irrelevant to R gives:
singular value decomposition of the above is:
where:
thus, there are:
the rotation matrix and the displacement vector of the point cloud registration can be obtained through the above formula.
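The sketch below assembles the pieces of the PICP derivation above into one weighted alignment step: distance weights from a two-dimensional Gaussian of the centroid distances, probability weights from the SuperGlue match confidences (taken here, as an assumption, to be the product of the two point confidences), and a weighted SVD solve for R and t consistent with e_i = p_i − (Rp'_i + t). It is an illustrative reading of the equations, not the patent's reference implementation.

```python
# Hedged sketch of one weighted SVD alignment step in the spirit of Eqs. (29)-(49).
import numpy as np

def picp_step(P, Q, conf_p, conf_q, sigma):
    """P: reference cloud (N,3); Q: target cloud (N,3), already associated point-to-point;
    conf_p, conf_q: per-point match confidences in [0, 1]; sigma: Gaussian width."""
    d_p = np.linalg.norm(P - P.mean(axis=0), axis=1)         # distance to reference centroid
    d_q = np.linalg.norm(Q - Q.mean(axis=0), axis=1)         # distance to target centroid
    w_dist = np.exp(-(d_p**2 + d_q**2) / (2.0 * sigma**2))   # 2D Gaussian distance weight
    w = w_dist * conf_p * conf_q                             # combined weight (assumption)
    w = w / w.sum()

    p_bar = (w[:, None] * P).sum(axis=0)                     # weighted centroids
    q_bar = (w[:, None] * Q).sum(axis=0)
    Pc, Qc = P - p_bar, Q - q_bar                            # centroid-removed coordinates
    W = (w[:, None] * Pc).T @ Qc                             # weighted cross-covariance
    U, _, Vt = np.linalg.svd(W)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # avoid reflections
    R = U @ D @ Vt
    t = p_bar - R @ q_bar                                    # so that p_i ≈ R p'_i + t
    return R, t
```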
A space target reconstruction and pose estimation system based on multi-view vision: the system comprises a camera vision module, a sparse point cloud generation module and a point cloud registration module; the camera vision module is used for establishing an ideal camera model and a distortion model, building a multi-camera vision system and calibrating the intrinsic parameters of the simulation camera; the sparse point cloud generation module is used for extracting and matching feature points with a deep convolutional neural network according to the system established in S1, performing multi-view visual three-dimensional point cloud reconstruction of the space target and completing sparse point cloud generation; the point cloud registration module performs non-cooperative target pose estimation based on the probabilistic iterative closest point algorithm (PICP).
An electronic device comprising a memory storing a computer program and a processor implementing the steps of the above method when the processor executes the computer program.
A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the above method.
The invention has the beneficial effects that
The invention optimizes the feature point matching process and the point cloud registration process, and alleviates the problems of feature point mismatching, too few point cloud points and low robustness of point cloud registration in the binocular vision process. It addresses the fact that traditional methods such as the SIFT, ORB and SURF algorithms, although performing well in stable indoor environments, are highly susceptible to interference from the external environment in actual use and, when facing complex environments such as dark, weak-light or over-exposed scenes, are easily disturbed into mismatching, which in turn affects subsequent pose measurement.
Aiming at the problem that traditional feature point extraction and matching algorithms are easily disturbed into mismatching in complex environments, feature point extraction and matching of a non-cooperative satellite is realized using the SuperPoint and SuperGlue algorithms, mismatched pairs are removed using the RANSAC algorithm, three-dimensional point cloud reconstruction of the target is carried out using the triangulation principle, and finally the camera pose and the three-dimensional point coordinates are optimized simultaneously using the BA algorithm. A non-cooperative satellite three-dimensional point cloud reconstruction experiment was carried out in a simulation environment, and the finally generated sparse point cloud has a scale error within the range of 1%-2%, showing high accuracy.
Aiming at the problem of far interference points, an outlier removal method is provided to eliminate their influence on the pose measurement process; aiming at the problems of near interference points and unavoidable mismatching, distance weights and probability weights are added for each point in the point cloud on the basis of the traditional point cloud registration algorithm, yielding the proposed probabilistic iterative closest point algorithm (PICP), so as to cope with the complex backgrounds and strong illumination that non-cooperative target pose estimation must face in a real environment.
Drawings
FIG. 1 is a frame of a pose measurement system of the present invention; FIG. 2 is a camera distortion model; FIG. 3 is a schematic view of radial distortion; FIG. 4 is an ideal binocular vision model;
FIG. 5 is a three-dimensional vision model; FIG. 6 is an OpenGL perspective camera model; FIG. 7 is a sparse point cloud generation box; FIG. 8 is a SuperPoint algorithm architecture;
FIG. 9 is a SuperPoint training step framework; FIG. 10 is a SuperGlue algorithm architecture;
FIG. 11 is a schematic diagram of RANSAC mismatching rejection, wherein (a) is the original SIFT recognition result and (b) is the result after RANSAC mismatching rejection;
FIG. 12 is a schematic diagram of triangulation;
FIG. 13 is a graph showing the result of the SuperGlue algorithm, wherein (a) is frame 1, (b) is frame 88, (c) is frame 183, and (d) is frame 268;
FIG. 14 is a schematic view of a non-cooperative satellite sparse point cloud; FIG. 15 is a flowchart of PICP algorithm; fig. 16 is an outlier schematic.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
With reference to fig. 1 to 16, the space target reconstruction and pose estimation method based on multi-view vision specifically comprises the following steps:
S1, an ideal camera model and a distortion model are established, a multi-view camera vision system is built, and the intrinsic parameters of the simulation camera are calibrated;
S2, according to the system established in S1, a deep convolutional neural network is used to extract and match feature points, multi-view visual three-dimensional point cloud reconstruction of the space target is carried out, and sparse point cloud generation is completed;
S3, the pose of the non-cooperative target is estimated based on a probabilistic iterative closest point algorithm (PICP).
In S1, S1.1, the monocular camera vision system is modeled and an ideal camera model and a camera distortion model are established; using the principle of pinhole imaging, light rays pass through the optical centre of the camera and fall on the photosensitive plane to form a two-dimensional inverted image of the three-dimensional object; for modeling convenience, this inverted image is generally translated to a virtual plane P in front of the camera lens to form an upright image; the pinhole camera imaging model mainly involves conversions among four coordinate systems, namely the world coordinate system (inertial coordinate system), the camera coordinate system, the physical imaging coordinate system and the image pixel coordinate system;
The transformation process of the world coordinate system and the camera coordinate system mainly relates to the pose R, t of the camera placed in the world coordinate system, and is also called an external parameter of the camera in a common sense; in monocular vision experiments, the world coordinate system often coincides with the camera coordinate system to avoid the calculation of external parameters, and in binocular vision experiments, the world coordinate system often adopts a mode of coinciding with the camera coordinate system of the left camera to reduce calculation; the transformation process of the camera coordinate system and the physical imaging coordinate system can be simply described as a similar triangle scaling process, and for the sake of clarity in describing the transformation occurring in this process, it is assumed that the physical imaging plane is parallel to the object image plane and perpendicular to the camera optical axis;
according to triangle similarity, the coordinates of the three-dimensional point P (X, Y, Z) in space in the physical imaging plane can be expressed as:
the transformation relationship between the physical imaging plane coordinate system and the image coordinate system is a translation and scaling; the origin of the physical plane coordinate system is at the camera optical centre while the origin of the image coordinate system is at the upper-left corner of the image, with the u-axis pointing along the column direction of the image and the v-axis along the row direction; assuming that the translation between the camera optical centre and the image coordinate system is (p_u, p_v) (in meters), the coordinates of the point P in the image coordinate system are, according to equation (1):
while the physical imaging coordinate system and the image coordinate system also differ by a scaling: in equation (2) the coordinate unit is the meter, whereas a digital image is a matrix whose unit is the pixel; assuming that the pixel coordinate system is scaled by a factor α on the u-axis and by a factor β on the v-axis, where α and β correspond to the physical length of a single image pixel along the x-axis and y-axis respectively, then according to equation (2):
the transformation relation between the world coordinate system and the image pixel coordinate system can be obtained in a comprehensive way:
where s is a scaling factor and f_x, f_y, c_x, c_y are called the camera intrinsic parameters; this can be abbreviated as:
equation (5) is the conversion relation between the image pixel coordinate system and the world coordinate system of an undistorted camera under the ideal model; because the lens surface is inevitably somewhat uneven in the production of a real camera lens, the light propagation path is bent to a certain extent, so the ideal pinhole imaging model deviates from the real model, and this error causes the image captured by the camera to be distorted; distortion parameters therefore need to be introduced to eliminate the error introduced by the real imaging process.
Camera distortion can be classified into tangential distortion and radial distortion. Tangential distortion is caused by manufacturing imperfections in which the lens is not perfectly parallel to the imaging plane; radial distortion arises because rays farther from the optical axis are bent more strongly inward, so that the image of a square becomes barrel-shaped, referred to as barrel distortion, as shown in fig. 3; if the rays far from the optical axis are bent more strongly outward, pincushion distortion results. In practical use of cameras, the tangential distortion is found to be small, typically of order 10^-4, with little influence on the imaging process, while the radial distortion is typically of order 10^-2 and has a larger influence on the final image; therefore, in some image correction experiments, correcting only the radial distortion yields the undistorted images required by subsequent experiments;
considering radial distortion, it can be seen from fig. 3 that the farther a pixel is from the image centre, the more obvious the distortion appears, and the part near the edge is obviously curved, which would cause significant errors in the subsequent image processing and feature matching. A radial distortion parameter is introduced into the ideal pinhole imaging model to eliminate the influence of this distortion. Since the distortion occurs during imaging, the world-to-camera coordinate transformation is assumed to have already been carried out, and for any point P (X, Y, Z) in the world coordinate system there is:
where P_c is the coordinate of the point P in the camera coordinate system. Letting the radial distance of P_c in the normalized image plane be r, with r^2 = (p_cx/p_cz)^2 + (p_cy/p_cz)^2, a polynomial is used to remove the distortion of P_c:
in (8), k_1, k_2, k_3 are the radial distortion parameters and p_1, p_2 are the tangential distortion parameters. Although the tangential distortion of the camera can be set to 0 in the actual experimental process, it can still be solved during camera calibration, and the distortion parameters of the camera are:
distCoeffs = [k_1 k_2 p_1 p_2 k_3] (9)
From equations (5) and (9) it can be seen that, taking camera distortion into account, solving the transformation matrix between the real world coordinate system and the image pixel coordinate system involves nine parameters in total, f_x, f_y, c_x, c_y, k_1, k_2, k_3, p_1, p_2; the first four are collectively referred to as the camera intrinsic parameters and the last five as the distortion parameters.
S1.2, modeling a multi-camera vision system; the most basic binocular stereoscopic vision system needs to be studied before the model of the multi-view camera vision system is built. The binocular stereoscopic vision system is inspired by a human vision system and consists of two optical cameras, wherein the two cameras are positioned at different positions in space, and the binocular parallax principle is utilized to match and calculate homonymous feature points of images shot by the two cameras at different positions so as to acquire three-dimensional geometric information and motion state of a target object in space.
As shown in fig. 4, in the ideal binocular vision measurement model the imaging planes of the two cameras are coplanar, the optical axes are parallel, the two cameras have the same intrinsic parameters, and the extrinsic parameters consist only of a translation along the X_LR axis, with no rotation. The coordinate systems of the left and right cameras are O_L-X_LY_LZ_L and O_R-X_RY_RZ_R respectively, and the line connecting the origins of the two camera coordinate systems is called the baseline, generally denoted by b; in the ideal case, the optical axis of each camera passes through the centre of its image coordinate system.
Let the coordinates of an arbitrary spatial point P be (X, Y, Z); the projection points of P in the pixel coordinate systems o_L-u_Lv_L and o_R-u_Rv_R of the left and right cameras are p_L(u_L, v_L) and p_R(u_R, v_R) respectively, and the corresponding coordinates in the physical imaging coordinate systems are (x_L, y_L) and (x_R, y_R); according to the triangle similarity principle:
where f is the focal length of the camera. Substituting the conversion relation between the physical imaging coordinate system and the pixel coordinate system gives:
the coordinates of the P point are thus obtained as:
where d is defined as the difference between the left and right image abscissas and is commonly referred to as the parallax (disparity). Since the binocular stereo vision measurement process uses the principle of triangulation, combining this with (10) shows that the depth measurable by the binocular camera is inversely proportional to the parallax and proportional to the baseline; when the left and right cameras are too close together along the X_LR axis, the binocular camera group degrades into a monocular camera, triangulation cannot be performed and the target depth cannot be estimated. Therefore, in practical experiments the choice of baseline often affects the recognizable range and accuracy of the whole measurement system; in engineering experiments it is generally considered that a ratio of baseline length to working distance between 0.5 and 2.2 gives a better measurement result.
Several cameras of the same specification are introduced on the basis of the binocular stereo vision measurement model to form a multi-camera observation model. Any two cameras in the multi-camera observation model can be regarded as a pair of binocular stereo vision measurement models, so a multi-camera observation model with n cameras contains C(n,2) = n(n-1)/2 binocular stereo measurement model combinations.
Taking a trinocular camera as an example, the construction of the multi-camera observation model is described, as shown in fig. 5. Assuming the coordinates of any point P in the world coordinate system are (X, Y, Z), the projection points of P on the pixel planes of the three cameras are p_L(u_L, v_L), p_H(u_H, v_H) and p_R(u_R, v_R) respectively. The three cameras form C(3,2) = 3 combinations, i.e. three groups of binocular stereo vision systems composed of camera L, camera H and camera R, whose intersection points in space are P_0(X_0, Y_0, Z_0), P_1(X_1, Y_1, Z_1) and P_2(X_2, Y_2, Z_2) respectively. Ideally the three points would coincide at the point P in space; however, owing to interference from errors such as camera calibration, a multi-camera measurement system in practical experiments generally yields one spatial point for each group of binocular stereo vision systems, as shown by P_0, P_1, P_2 in fig. 5. According to the binocular stereo vision measurement principle described above, assuming the projection matrices of camera L, camera H and camera R are M_L, M_H and M_R respectively, the coordinates of the spatial points P_0, P_1 and P_2 can be solved from the following equations:
From equations (13), (14) and (15), the three-dimensional coordinates of the spatial points P_0, P_1 and P_2 can be calculated.
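The explicit forms of equations (13)-(15) are not reproduced in this text; the sketch below shows the usual way such pairwise solutions are obtained, a linear (DLT) triangulation for each camera pair from its two 3×4 projection matrices, with the n(n−1)/2 pairwise points then fused by a simple average.

```python
# Hedged sketch: linear (DLT) triangulation per camera pair and a simple
# fusion of the pairwise spatial points P0, P1, P2, ...
import numpy as np
from itertools import combinations

def triangulate_pair(M1, M2, uv1, uv2):
    """M1, M2: 3x4 projection matrices; uv1, uv2: matching pixel coordinates."""
    A = np.vstack([uv1[0] * M1[2] - M1[0],
                   uv1[1] * M1[2] - M1[1],
                   uv2[0] * M2[2] - M2[0],
                   uv2[1] * M2[2] - M2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                      # homogeneous -> Euclidean coordinates

def multi_view_point(Ms, uvs):
    """Ms: list of projection matrices; uvs: matching pixel coordinates per camera."""
    pts = [triangulate_pair(Ms[i], Ms[j], uvs[i], uvs[j])
           for i, j in combinations(range(len(Ms)), 2)]
    return np.mean(pts, axis=0)              # simple fusion of the pairwise estimates
```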
S1.3, calibrating internal parameters of the simulation camera; in order to cooperate with the follow-up simulation experiment, performing internal reference solving on a camera in a simulation environment;
The simulated camera model is selected as an OpenGL perspective camera model, as shown in fig. 6. According to the perspective camera model shown in fig. 6, there is a relationship with the ideal pinhole camera model calibration parameters:
where θ is the field angle in the pitch direction; Aspect is the aspect ratio; n is the near-plane distance in m; f is the far-plane distance in m; f_x, f_y, c_x, c_y are the camera intrinsic parameters. The simulation camera parameter settings are shown in Table 1.
Table 1 OpenGL camera parameter settings
The intrinsic parameters of the simulation camera under the current settings can be calculated using equation (16); since the environment is a simulation environment, it can be regarded as an ideal camera model, so there is no distortion caused by lens manufacturing and similar issues, and both the radial and tangential distortion are 0; the obtained camera intrinsic parameters are shown in Table 2.
TABLE 2 ideal camera internal parameters
In S1, a non-cooperative target pose measurement framework based on multi-view vision is first proposed as an improvement on the traditional binocular-vision-based framework, addressing the problems of feature point mismatching, too few point cloud points and low robustness of point cloud registration in the binocular vision process. Second, the ideal monocular camera vision system is modeled, mainly covering the conversion relations among the world, camera, physical imaging and image pixel coordinate systems in the imaging process, and the camera intrinsic parameters are derived; the effect of realistic errors is then added to the ideal monocular camera vision system, leading to the monocular camera distortion model and the camera distortion parameters. Next, the binocular vision system model is derived, including the solution of the parallax and of the three-dimensional coordinates of spatial points; finally, several cameras are added on the basis of the binocular vision system model to form a multi-view system, which is split into several pairs of binocular vision system combinations to derive the multi-view system model.
In S2, since non-cooperative target pose measurement has no prior information about the target, such as its geometry, key features or cooperative markers, the relative pose of the target cannot be obtained by identifying cooperative markers or strong geometric features, or by model matching. Traditional non-cooperative target pose estimation algorithms generally generate a three-dimensional point cloud of the target through binocular vision and feature point matching, converting the non-cooperative target into a cooperative one, but they are highly susceptible to interference from the external environment, which affects the subsequent pose measurement. To solve these problems, based on the feature point extraction and matching principle of deep convolutional neural network algorithms, the SuperPoint algorithm is used to extract feature point pixel positions and the corresponding descriptors, and the SuperGlue algorithm is used to match the descriptors obtained in the previous step to obtain the final feature point matching pairs and matching confidences; a sparse point cloud of the non-cooperative target is generated using the RANSAC algorithm and the triangulation principle, and finally the camera pose and three-dimensional point coordinates are optimized with the Bundle Adjustment algorithm to obtain a three-dimensional point cloud with minimized re-projection error for the subsequent experiments. The sparse point cloud generation framework is shown in fig. 7.
S2.1, image acquisition and image preprocessing are carried out, mainly comprising image de-distortion and the like, and extraction of feature point pixel positions and corresponding descriptors is realized by using a SuperPoint algorithm; the architecture of the SuperPoint algorithm is shown in FIG. 8, and the algorithm main body framework consists of a shared encoder and decoders for extracting corresponding feature points and generating descriptors respectively;
In the shared encoder part, SuperPoint follows the practice of VGG networks: the encoder consists of convolutional layers, pooled spatial down-sampling and nonlinear activation functions. The SuperPoint encoder uses three max-pooling layers, so for an image of size H×W we define H_c = H/8 and W_c = W/8; the three 2×2 non-overlapping max-pooling operations yield 8×8 pixel cells. The encoder maps the input image to an intermediate tensor with smaller spatial dimensions and greater channel depth.
The feature point decoder computes the keypoint head output and produces a tensor of size H_c × W_c × 65. The 65 channels correspond to local, non-overlapping 8×8 pixel grid regions plus one additional "no interest point" channel. After a per-channel softmax, the extra "no interest point" channel is removed and the result is reshaped to size H × W.
The descriptor decoder computes the descriptor head output and produces a tensor of size H_c × W_c × D. To output a dense map of L2-normalized fixed-length descriptors, SuperPoint uses a model similar to UCN, first outputting a semi-dense grid of descriptors (e.g. one per 8 pixels). Learning descriptors semi-densely rather than densely reduces the memory required for training and keeps the run time manageable. The decoder then performs bicubic interpolation on the descriptors and finally normalizes them to unit length.
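As an illustration of the keypoint-decoder post-processing just described, the sketch below applies the channel softmax, drops the extra "no interest point" channel and reshapes the 8×8 cells back into an H×W heatmap; the row-major ordering of the 64 cell channels is an assumption of this sketch.

```python
# Hedged sketch of the SuperPoint keypoint-decoder post-processing described above.
import numpy as np

def decode_keypoint_heatmap(logits):
    """logits: (65, Hc, Wc) raw decoder output with Hc = H/8, Wc = W/8."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    prob = e / e.sum(axis=0, keepdims=True)        # softmax over the 65 channels
    prob = prob[:-1]                               # drop the "no interest point" channel
    c, hc, wc = prob.shape                         # c == 64
    heat = prob.reshape(8, 8, hc, wc)              # each channel is one pixel of an 8x8 cell
    heat = heat.transpose(2, 0, 3, 1).reshape(hc * 8, wc * 8)
    return heat                                    # per-pixel keypoint probability, H x W
```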
Because feature points cannot be described by a strict mathematical expression, a feature point dataset cannot be built by manually labeling images captured by the camera; if a traditional feature point detection algorithm, such as SIFT, is used to extract and label feature points to form the training dataset, the recognition accuracy of the traditional algorithm itself limits the result, and a model trained on such a training set can hardly achieve a good recognition effect. SuperPoint therefore adopts semi-self-supervised training, as shown in fig. 9. The SuperPoint training process mainly comprises three steps, namely generation of the basic extractor, self-labeling of feature points, and final network training with error evaluation.
The basis extractor generates: although feature points have no strict mathematical expression, there are still some points recognized by people due to their strong features, such as corner points of squares, triangles, and the like. According to the principle, a virtual three-dimensional object is adopted as a training set, so that the extraction of the corner points with strong characteristics is realized, and the preparation is carried out for the subsequent self-calibration.
And (3) self-labeling of characteristic points: and on the basis of obtaining the basic feature point extractor, carrying out self-labeling on the general pictures which are not labeled. Because the basic feature point extractor can only extract the strong feature points, in order to obtain more feature point labels, the homography transformation is carried out on the unlabeled image to generate a new image, the basic extractor is reused for feature point labels, and the final self-labeling image is generated by using the inverse transformation after extraction.
Final network training: the self-labeling images are subjected to known transformation, the pose relation between the two images can be obtained, and then the two images are respectively input into a SuperPoint network for training. Since the relative pose relationship of the two images is known, error analysis can be performed on the finally extracted feature points and descriptors.
S2.2, detecting and matching the feature points, and matching the descriptors obtained in the S2.1 by using a SuperGlue algorithm to obtain a final feature point matching pair and a matching confidence;
The SuperGlue matching algorithm is used together with the SuperPoint algorithm to achieve a good matching effect. The architecture of the SuperGlue algorithm is shown in fig. 10. It can be divided into an attentional graph neural network part and an optimal matching layer part. The first part uses a keypoint encoder to map the keypoint position p and its visual descriptor d into a single vector; inspired by the Transformer network, SuperGlue uses self-attention and cross-attention layers (repeated L times) to create a more powerful representation f. The optimal matching layer creates an M×N score matrix and then finds the optimal partial assignment using the Sinkhorn algorithm (run for T iterations).
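A minimal sketch of the Sinkhorn normalization used by the optimal matching layer is given below: alternating row and column normalization of the score matrix in log space for T iterations; the dustbin row/column that SuperGlue appends for unmatched keypoints is omitted here for brevity.

```python
# Hedged sketch of Sinkhorn normalization on an M x N score matrix.
import numpy as np

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))

def sinkhorn(scores, T=100):
    """scores: (M, N) matching score matrix; returns a doubly-normalized assignment matrix."""
    log_P = scores.copy()
    for _ in range(T):
        log_P = log_P - logsumexp(log_P, axis=1)   # row normalization
        log_P = log_P - logsumexp(log_P, axis=0)   # column normalization
    return np.exp(log_P)
```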
S2.3, mismatch rejection: the mismatched pairs from S2.2 are rejected using the RANSAC algorithm. Although the SuperGlue algorithm can extract and match feature points accurately and robustly, a small number of mismatches can still occur in harsh environments; to further improve the accuracy of the subsequent camera pose estimation and three-dimensional point cloud reconstruction, the RANSAC algorithm is used to reject mismatches from the obtained feature point matching pairs. For mismatch rejection, a common approach is to first filter out matching pairs whose matching distance exceeds a multiple of the minimum matching distance, and then use the RANSAC algorithm to remove mismatched pairs from the filtered dataset.
Setting the size of the data set as K and the minimum parameter required for solving the model as S, the steps of the RANSAC algorithm are as follows:
S2.3.1 randomly sample S points from the data set K as the data for solving an initial model;
S2.3.2 fit a candidate model of the data set from the S points sampled in the previous step;
S2.3.3 substitute the remaining points of the data set into the model and compute their errors; a point whose error is below the threshold P is taken as an inlier;
S2.3.4 repeat this N times, select the set with the largest number of inliers, and re-fit to obtain a new candidate model;
S2.3.5 repeat the above steps until the specified number of iterations is reached, then end the algorithm; a minimal code sketch of this loop is given below.
For different outlier ratios, different numbers of iterations are required to reach the desired accuracy. For any data set, the required number of iterations can be expressed as

N = ln(1 − p) / ln(1 − w^S)

where w is the probability that a point randomly chosen from the data set is an inlier and p is the desired probability that the sampling succeeds. For p = 99%, the corresponding iteration counts are shown in Table 3.
TABLE 3 RANSAC iteration number
To demonstrate the effect of the algorithm, the raw matching pairs obtained by the SIFT algorithm are passed through RANSAC to remove mismatches; the result is shown in fig. 11 (b).
Fig. 11 (a) shows the raw result of SIFT feature matching, with 138 feature matching pairs in total; although the matches are numerous, a large number of them are mismatches, which would seriously affect camera pose estimation and three-dimensional point cloud generation accuracy in practice. Fig. 11 (b) shows the 65 SIFT feature point pairs remaining after RANSAC mismatch rejection; although the number of matching pairs is significantly reduced, the remaining pairs are all correct matches, so the RANSAC algorithm achieves mismatch rejection well.
S2.4, camera relative pose solution and spatial point determination: the relative pose of the cameras is solved using the epipolar constraint, and on that basis the three-dimensional coordinates of the feature points are obtained by triangulation; three-dimensional point cloud reconstruction is then performed based on triangulation;
The relative pose relation of the multi-camera group is obtained from camera calibration; on this basis, sparse point cloud generation is realized using the triangulation principle, as shown in fig. 12. Consider images I_1 and I_2 taken by two cameras placed in parallel at different poses, with the left camera coordinate system taken as the reference coordinate system and T the transformation matrix of the right camera. Let the camera optical centers be O_1 and O_2; an arbitrary three-dimensional point P in space projects to pixel p_1 in I_1 and to pixel p_2 in I_2. In theory the lines O_1p_1 and O_2p_2 intersect at P, but because of noise, lens distortion, and other factors the two rays emitted from the optical centers often have no intersection in space. The problem is therefore solved through the least squares function constructed in this process. Let x_1 and x_2 be the normalized coordinates of the two feature points; then:
s_1 x_1 = s_2 R x_2 + t    (18)
In equation (18) the camera rotation R and translation t are known; the unknowns are the depths s_1 and s_2 of the feature point in the two images. Left-multiplying both sides by x_1^∧ (the skew-symmetric matrix of x_1) gives:

s_2 x_1^∧ R x_2 + x_1^∧ t = 0    (19)

From equation (19), s_2 can be solved, and hence the three-dimensional coordinates of the spatial point are obtained.
S2.5, optimizing the point cloud, and simultaneously optimizing the pose of the camera and the three-dimensional coordinates of the space points by using a Bundle Adjustment algorithm;
Because of camera parameter calibration errors, mismatches, and other factors, the lines joining the optical centers of different cameras to the three-dimensional point corresponding to the same feature do not intersect at a single point in space. To minimize the reprojection error, the three-dimensional point coordinates and the camera poses must be optimized simultaneously so that the rays from the feature points finally converge at the optical centers; this is Bundle Adjustment (BA). Considering the camera observation model, a least squares term is built for the error generated in a single observation:
e=h(x,y)-h(ξ,p) (20)
where ξ is the Lie group element corresponding to the true camera pose R, t at the current moment, p is the true three-dimensional coordinate of the landmark at the current moment, x is the Lie group element corresponding to the pose R, t solved by the camera at this moment, and y is the three-dimensional coordinate of the point corresponding to landmark p observed in state x. Considering the overall observation process, the cost function can be expressed as

(1/2) Σ_i Σ_j ‖e_ij‖² = (1/2) Σ_i Σ_j ‖z_ij − h(ξ_i, p_j)‖²    (21)

where z = h(x, y) and z_ij is the observation obtained when the camera at pose ξ_i observes landmark p_j.
Since equation (21) is a nonlinear function, it is linearized: adding an increment Δx to the full state vector gives

(1/2) ‖f(x + Δx)‖² ≈ (1/2) Σ_i Σ_j ‖e_ij + F_ij Δξ_i + E_ij Δp_j‖²    (22)
where E_ij is the partial derivative of the cost function with respect to the landmark's three-dimensional point coordinates in this state (equation (23)), and F_ij is the partial derivative of the cost function with respect to the camera pose in this state (equation (24)).
The above equation is solved using the Levenberg-Marquardt (LM) algorithm:
f(x+δx)≈f(x)+Jδx (25)
where J is the Jacobian matrix of first-order partial derivatives of the cost function, with J = [F  E] as above. Minimizing the reprojection error and writing f(x) = ε gives:
(J^T J + λI) δx = −J^T ε    (26)
Solving this equation realizes the simultaneous optimization of the camera poses and the three-dimensional point coordinates.
Comparative simulation experiments of the target three-dimensional point cloud reconstruction algorithm: four cameras are set up in the simulation environment to form a multi-camera group that observes and photographs the non-cooperative satellite, with background starlight added as an interference term. The camera intrinsics follow the calibration results, and the poses of the cameras in the world coordinate system are listed in Table 4.
TABLE 4 Multi-view camera spatial location
The non-cooperative satellite model is the Aura satellite model provided by NASA. The multi-camera group photographs the non-cooperative satellite, and SuperPoint feature extraction and SuperGlue feature matching are applied to the captured images, as shown in FIG. 13; the match confidence is colored from blue (low) through yellow to red (high), and the feature point pairs obtained by SuperPoint combined with SuperGlue show high confidence.
After the feature point pairs are obtained, mismatches are rejected with the RANSAC algorithm, sparse point clouds of the non-cooperative satellite are obtained by triangulating each feature point pair using the relative camera pose data in Table 4, and finally the camera poses and three-dimensional point coordinates are optimized simultaneously by BA to obtain the final sparse point cloud, as shown in fig. 14. The errors of the solar panel length and width in the obtained sparse point cloud relative to the true values are within 1%-2%, indicating high reconstruction accuracy.
In S3, considering the complex environment of the spatial non-cooperative target, the factors affecting pose recognition accuracy mainly come from two aspects: edge interference points caused by the complex background, and mismatches and degraded point cloud generation caused by feature destruction under extreme illumination.
To analyze the influence of the cluttered background on non-cooperative target pose recognition, the interference points are divided into two classes when constructing the pose estimation model. The first class consists of far interference points, whose positions are far from the actual target and which severely affect the centroid of the target point cloud; they therefore need to be screened out before point cloud registration. The second class consists of near interference points that are close to, or even in contact with, the target. Their influence on the centroid of the target point cloud is small, but near interference points often remain stationary or follow a motion pattern inconsistent with that of the target point cloud, which greatly affects the overall registration accuracy during point cloud registration.
Extreme illumination mainly takes the form of dark and strongly lit environments, and its most important effect on non-cooperative target pose recognition is the damage it does to the target's features. Because the optical camera relies mainly on target features for matching and point cloud generation, when the features are severely damaged the point cloud generated by the multi-camera array contains many mismatched points, which seriously affects the accuracy of target point cloud registration; the traditional point cloud registration algorithm therefore needs to be further modified.
S3.1: Probability point cloud generation. The point cloud generated in S2.5 of the sparse point cloud generation module is fused with the match confidence obtained in S2.2 to obtain the probability point cloud. Point cloud registration then combines a distance weight with the confidence probability: the distance weight suppresses the influence of near interference points on the registration process, and the confidence probability reduces the influence of illumination interference. The overall flow of the algorithm is shown in fig. 15.
S3.2: Far interference point removal based on outlier rejection, to improve subsequent registration accuracy; at the same time, to speed up the registration algorithm, the point cloud is downsampled to reduce its size. To preprocess the source and target point clouds, an outlier rejection algorithm based on statistical filtering is adopted. A point cloud in the three-dimensional coordinate system can be expressed as:
P = {p_i ∈ R³},  i = 1, 2, 3, …    (27)
where p_i = (x_i, y_i, z_i) are the three-dimensional coordinates of a point in the Cartesian coordinate system. For each 3D point, two steps are taken: first, the set of nearest points in its neighborhood is found and the Euclidean distances are computed; then the standard deviation of the distances between the point and its neighbors is computed. In this way it can be determined whether the point is an outlier:
where d_i is the statistical distance threshold of the point; K is the number of selected neighboring points, which may be taken as 20; M is a threshold controlling the number of outliers; and d_std is the standard deviation of the distances from p_i to its neighbors. For a point p_i, if the Euclidean distance between p_j and p_i is greater than d_i, then p_j is considered an outlier, as shown in fig. 16.
S3.3: Point cloud registration. The inter-frame point clouds are registered with the probability closest point iterative algorithm (PICP) to obtain the relative pose of the non-cooperative target, which is compared with the ground-truth pose to assess the robustness and accuracy of the algorithm. Given two point cloud models, a reference point cloud {p_i} and a target point cloud {p'_i}, the non-cooperative target pose estimation problem can be converted into solving for the optimal transformation parameters that minimize the distance from the reference point cloud to the target point cloud, yielding the rotation matrix R and translation vector t of the non-cooperative target:
the single point distance error can be expressed as:
e_i = p_i − (R p'_i + t)    (30)
From the above, the registration problem can be converted into minimizing the sum of squared Euclidean distances of the errors between the reference point cloud and the target point cloud, i.e. equation (31):

min_{R,t} Σ_{i=1}^{n} ‖p_i − (R p'_i + t)‖²    (31)
Considering the influence of near interference points on the point cloud registration process, a distance weight is added to equation (31) to reduce their effect on registration, so the new loss function can be expressed as

Σ_{i=1}^{n} w_i ‖p_i − (R p'_i + t)‖²    (32)
where w_i is the distance weight and w_i ∈ [0, 1]. From the algorithm's point of view, w_i indicates how much the corresponding point pair should be emphasized: such point pairs deserve to be highlighted in an iteration, and the loss function (32) reaches its minimum only when they achieve the desired registration. When w_i is at its maximum, the solution for this point pair is the same as the corresponding value in the original algorithm, meaning the point is fully taken into account; conversely, as the value of w_i tends to 0, the error the point pair contributes to the loss function shrinks, so such point pairs are more easily weakened and ignored. As w_i approaches zero (it cannot actually reach 0), the contribution of the corresponding point pair approaches zero, meaning the pair is essentially not considered and does not participate in the calculation.
To weaken near interference points and control the weight w_i reasonably, the locations of the near interference points must be considered. For the above model, near interference points typically appear at the edges of the target, so the weights can be constructed from the distance to the point cloud centroid. For a point pair, the distances from the points to the centroids of the source and target point clouds are:

d_pi = ‖p_i − q_μ‖,  d_qi = ‖p'_i − q'_μ‖    (33)
where q_μ and q'_μ are the centroids of the reference point cloud and the target point cloud, respectively. Taking d_pi and d_qi as independent variables, a two-dimensional Gaussian distribution function is selected to evaluate the weight of each point, as in equation (34). Non-cooperative targets such as satellites are approximately symmetric in shape and uniform in structure; at the same time, the two-dimensional Gaussian function is rotationally symmetric, which means it treats all directions alike, so the weights are not biased toward either side in a way that would cause registration failure. Furthermore, the Gaussian function is single-valued, and the weight of each point varies monotonically with its distance from the centroid.
Since w_i ∈ [0, 1], the value of σ can be set accordingly, so that w_i can be expressed as:
combining equation (32), the loss function with near interference point consideration can be obtained as:
considering the problem of the feature recognition algorithm accuracy degradation caused by feature destruction under extreme illumination conditions, taking the matching confidence coefficient obtained by the SuperGlue algorithm as the confidence coefficient of the point cloud, taking the confidence coefficient as the weight to participate in the point cloud registration process so as to reduce the influence of mismatching on final pose estimation, and obtaining a new loss function as follows:
where P_i is the probability weight and P_i ∈ [0, 1]. Considering the match confidence obtained by the SuperGlue algorithm: the higher the match confidence, the stronger the features of the matched point pair in the image recognition process and the more reliable the three-dimensional point obtained from it; conversely, a lower match confidence indicates a greater likelihood that the pair is a mismatch, and the influence of such points during registration should be minimized. The probability weight P_i can therefore be expressed as:
where P(p_i) and P(p'_i) are the reference point probability and the target point probability, respectively. Therefore, by jointly considering the complex background and extreme illumination effects, the loss function of the probability closest point iterative (PICP) algorithm is obtained as:
According to the above equation, the point cloud registration problem can be restated as a least squares problem. To simplify the computation, the transformation of the point cloud can be decomposed into two steps, translation and rotation: the rigid-body motion is treated as a rotation of the source point cloud about its centroid plus a translation of the centroid. The centroids of the two point clouds are therefore:
Combining the above gives:

Since the cross term involving (p_i − p̄ − R(p'_i − p̄')) sums to zero, the optimization function can be expressed as:
the barycenter removing coordinates of the reference point cloud and the target point cloud are calculated by:
q_i = p_i − p̄,  q'_i = p'_i − p̄'    (43)
the least squares problem can be optimized as follows:
the error term for R is expanded with:
the removal of the irrelevant items to R is:
singular value decomposition of the above is:
the method comprises the following steps:
thus, there are:
the rotation matrix and the displacement vector of the point cloud registration can be obtained through the above formula. Compared with the traditional nearest point iterative algorithm (ICP), the probability nearest point iterative algorithm (PICP) provided by the invention adds the distance weight and the probability weight for each point in the point cloud so as to solve the problems of complex background and strong light which need to be faced in the real environment for non-cooperative target pose estimation.
The error sources are identified as the far and near interference points caused by the starry-sky background, together with the mismatches caused by the extreme illumination environment. For far interference points, an outlier removal method is proposed to eliminate their influence on the pose measurement process; for near interference points and unavoidable mismatches, distance and probability weights are added on top of the traditional point cloud registration algorithm. Photography of the non-cooperative satellite and sparse point cloud generation are realized by building a multi-camera group in the simulation environment, and the pose measurement accuracy of the traditional ICP algorithm and the PICP algorithm is compared under three working conditions: translation, rotation, and combined translation and rotation. Experiments show that under non-cooperative satellite translation, the observation error and fluctuation of the traditional ICP algorithm far exceed those of PICP; under rotation, the traditional ICP algorithm fails completely while PICP still measures the pose of the non-cooperative target accurately and stably; under combined translation and rotation, the traditional ICP algorithm again fails completely, and although the PICP error increases noticeably relative to the pure translation and pure rotation conditions, its result remains far superior to that of ICP.
The system comprises a camera vision module, a sparse point cloud generation module, and a point cloud registration module. The camera vision module establishes the ideal camera model and the distortion model, builds the multi-camera vision system, and calibrates the intrinsic parameters of the simulation cameras; the sparse point cloud generation module, based on the system established in S1, uses the deep convolutional neural network to extract and match feature points, performs convolutional-neural-network-based multi-view three-dimensional point cloud reconstruction of the space target, and completes sparse point cloud generation; the point cloud registration module performs non-cooperative target pose estimation based on the probability closest point iterative algorithm (PICP).
An electronic device comprising a memory storing a computer program and a processor implementing the steps of the above method when the processor executes the computer program.
A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the above method.
The method and system for space target reconstruction and pose estimation based on multi-view vision have been described in detail above, and the principle and implementation of the invention have been explained; the description of the embodiments is only intended to help understand the method and its core idea. At the same time, since those skilled in the art may make changes to the specific embodiments and the scope of application according to the idea of the present invention, the content of this description should not be construed as limiting the present invention.

Claims (10)

1. The space target reconstruction and pose estimation method based on the multi-view vision is characterized by comprising the following steps of:
the method specifically comprises the following steps:
s1, an ideal camera model and a distortion model are established, a multi-view camera vision system is established, and internal parameters of a simulation camera are calibrated;
s2, according to the system established in the S1, the deep convolutional neural network is used for extracting and matching the characteristic points, the multi-view visual space target three-dimensional point cloud reconstruction of the convolutional neural network is carried out, and the sparse point cloud generation is completed;
s3, estimating the pose of the non-cooperative target based on a probability closest point iterative algorithm (PICP).
2. The method according to claim 1, wherein: in the step S1 of the method,
s1.1, modeling a monocular camera vision system, and establishing an ideal camera model and a camera distortion model;
a pinhole camera model is adopted for a general optical camera: using the pinhole imaging principle, the inverted two-dimensional image of a three-dimensional object is transferred onto a virtual plane P in front of the camera lens to form an upright image; the pinhole camera imaging model involves conversions among four coordinate systems, namely the world coordinate system, the camera coordinate system, the physical imaging coordinate system, and the image pixel coordinate system;
the transformation process of the world coordinate system and the camera coordinate system involves the pose R, t of the camera placed in the world coordinate system, also referred to as an external parameter of the camera in the usual sense; in monocular vision experiments, the world coordinate system often coincides with the camera coordinate system to avoid the calculation of external parameters, and in binocular vision experiments, the world coordinate system adopts a mode of coinciding with the camera coordinate system of the left camera to reduce calculation;
The transformation of the camera coordinate system with the physical imaging coordinate system can be described as a similar triangle scaling process, assuming that the physical imaging plane is parallel to the object image plane and perpendicular to the camera optical axis,
the coordinates of the three-dimensional point P (X, Y, Z) in space in the physical imaging plane can be expressed as:
the transformation relation between the physical imaging plane coordinate system and the image coordinate system is translation and scaling, and the origin of the physical imaging plane coordinate system is positioned at the optical center of the camera, so that the image coordinate systemThe origin of (2) is in the upper left corner of the image, the u-axis is directed in the column direction of the image, the v-axis is directed in the row direction of the image, assuming that the camera optical center is shifted from the image coordinate system by a distance (p u ,p v ) (distance m), the coordinates of the point P in the image coordinate system are, according to equation (1):
meanwhile, the physical imaging coordinate system is related to the image coordinate system by scaling: equation (2) uses m as the coordinate unit, whereas a digital image is a matrix whose unit is the pixel; assuming the pixel coordinate system is scaled by a factor α along the u-axis and by a factor β along the v-axis, where α and β are determined by the physical length of a single pixel of the image along the x-axis and y-axis respectively, it follows from equation (2) that:
combining the above, the transformation relation between the world coordinate system and the image pixel coordinate system is obtained:
where s is a scale factor and f_x, f_y, c_x, c_y are called the camera intrinsic parameters; this can be abbreviated as:
equation (5) is the conversion relation between the image pixel coordinate system and the world coordinate system of the undistorted camera in the ideal model; introducing distortion parameters to eliminate errors due to real processes;
the camera distortion can be divided into tangential distortion and radial distortion, the tangential distortion of the camera has little influence on the image imaging process, and the radial distortion has great influence on the final imaging of the image, so that the undistorted image required by the subsequent experiment can be obtained only by correcting the radial distortion;
taking radial distortion into consideration, a radial distortion parameter is introduced into the ideal pinhole imaging model to eliminate the influence caused by the distortion; since distortion occurs during imaging, the transformation from the world coordinate system to the camera coordinate system is applied first: for any point P(X, Y, Z) in the world coordinate system, there is:
where P_c is the coordinate of point P in the camera coordinate system; let the distance from P_c to the origin of the coordinate system be r; a polynomial is used to remove the distortion of P_c:
where k_1, k_2, k_3 in (8) are radial distortion parameters and p_1, p_2 are tangential distortion parameters;
although the tangential distortion of the camera can be set to 0 in actual experiments, it can still be solved during camera calibration; the distortion parameters of the camera are:
distCoeffs = [k_1  k_2  p_1  p_2  k_3]    (9)
From equations (5) and (9), the transformation from the real world coordinate system to the image pixel coordinate system involves nine parameters in total: f_x, f_y, c_x, c_y, k_1, k_2, k_3, p_1, p_2; the first four are collectively called the camera intrinsic parameters and the last five the distortion parameters;
s1.2, modeling a multi-camera vision system;
in an ideal binocular vision measurement model, the imaging planes of the two cameras are coplanar, their optical axes are parallel, and they have the same intrinsic parameters; the extrinsic parameters contain only a translation along the X_LR axis and no rotation; the coordinate systems of the left and right cameras are O_L-X_LY_LZ_L and O_R-X_RY_RZ_R respectively, and the line connecting the origins of the two camera coordinate systems is called the baseline, usually denoted b; in the ideal case, the optical axis of each camera passes through the center of its image coordinate system;
let the coordinates of an arbitrary spatial point P be (X, Y, Z); its projections in the pixel coordinate systems o_L-u_Lv_L and o_R-u_Rv_R of the left and right cameras are p_L(u_L, v_L) and p_R(u_R, v_R), with corresponding coordinates (x_L, y_L) and (x_R, y_R) in the physical imaging coordinate systems; according to the similar-triangles principle:
wherein f is the focal length of the camera;
substituting the conversion relation between the physical imaging coordinate system and the pixel coordinate system gives:
the coordinates of the P point are thus obtained as:
where d is defined as the difference between the abscissas in the left and right cameras, commonly referred to as the parallax (disparity); introducing multiple cameras of the same specification on the basis of the binocular stereo vision measurement model forms a multi-camera observation model; any two cameras in the multi-camera observation model can be regarded as a pair of binocular stereo vision measurement models, so a multi-camera observation model with n cameras contains a combination of n(n−1)/2 binocular stereo measurement models.
3. The method according to claim 2, characterized in that: also included in S1 is:
s1.3, calibrating internal parameters of the simulation camera; performing internal reference solving on a camera in a simulation environment;
the simulation camera model is selected as an OpenGL perspective camera model, and the relationship between the simulation camera model and the ideal pinhole camera model calibration parameters is as follows:
where θ is the field-of-view angle in the pitch direction; Aspect is the aspect ratio; n is the near-plane distance in m; f is the far-plane distance in m; f_x, f_y, c_x, c_y are the camera intrinsic parameters;
the intrinsic parameters of the simulation camera under the current settings can be calculated from equation (16); since the environment is a simulation and can be regarded as an ideal camera model, there is no distortion, and both radial and tangential distortion are 0.
4. A method according to claim 3, characterized in that: in the step S2 of the process,
s2.1, image acquisition and image preprocessing are carried out, and extraction of feature point pixel positions and corresponding descriptors is achieved by using a SuperPoint algorithm;
the SuperPoint adopts a semi-self-supervision training method, and the training process is divided into three steps, namely, generation of a basic extractor, self-labeling of characteristic points, final network training and error evaluation;
base extractor generation: virtual three-dimensional objects are used as the training set, so that corner points with strong features can be extracted in preparation for the subsequent self-labeling;
And (3) self-labeling of characteristic points: on the basis of obtaining a basic feature point extractor, carrying out self-labeling on general pictures which are not labeled; performing homography transformation on the non-marked image to generate a new image, re-using a basic extractor to mark characteristic points, and performing inverse transformation after extraction to generate a final self-marked image;
final network training: the self-labeling images are subjected to known transformation, the pose relation between the two images can be obtained, and then the two images are respectively input into a SuperPoint network for training; because the relative pose relation of the two images is known, the error analysis can be carried out on the finally extracted characteristic points and descriptors;
s2.2, detecting and matching the feature points, and matching the descriptors obtained in the S2.1 by using a SuperGlue algorithm to obtain a final feature point matching pair and a matching confidence;
s2.3, mismatching and rejecting, namely rejecting the mismatching pair in the S2.2 by using a RANSAC algorithm;
setting the size of the data set as K and the minimum number of points required to solve the model as S, the steps of the RANSAC algorithm are as follows:
s2.3.1 randomly sample S points from the data set K as the data for solving an initial model;
s2.3.2 fit a candidate model of the data set from the S points sampled in the previous step;
s2.3.3 substitute the remaining points of the data set into the model and compute their errors; a point whose error is below the threshold P is taken as an inlier;
s2.3.4 repeat this N times, select the set with the largest number of inliers, and re-fit to obtain a new candidate model;
s2.3.5 repeat the above steps until the specified number of iterations is reached, then end the algorithm;
for different outlier ratios, different numbers of iterations are needed to reach the expected accuracy; for any data set, the required number of iterations can be expressed as:
where w is the probability that a point randomly chosen from the data set is an inlier and p is the probability that the sampling succeeds.
5. The method according to claim 4, wherein: in S2, further comprising:
s2.4, resolving the relative pose of the camera and determining space points, resolving the relative pose of the camera by utilizing epipolar constraint, and obtaining three-dimensional coordinates of feature points by utilizing triangulation on the basis of obtaining the relative pose of the camera; performing three-dimensional point cloud reconstruction based on triangulation;
the relative pose relation of the multiple camera groups can be obtained by camera calibration, the generation of sparse point cloud is realized by utilizing the principle of triangulation on the basis,
consider images I_1 and I_2 taken by two cameras placed in parallel at different poses, with the left camera coordinate system taken as the reference coordinate system and T the transformation matrix of the right camera; let the camera optical centers be O_1 and O_2; an arbitrary three-dimensional point P in space projects to pixel p_1 in I_1 and to pixel p_2 in I_2; in theory the lines O_1p_1 and O_2p_2 intersect at P, but because of noise, lens distortion, and other factors the two rays emitted from the optical centers often have no intersection in space; the problem is therefore solved through the least squares function constructed in this process; let x_1 and x_2 be the normalized coordinates of the two feature points, then:
s_1 x_1 = s_2 R x_2 + t    (18)
in equation (18) the camera rotation R and translation t are known; the unknowns are the depths s_1 and s_2 of the feature point in the two images; left-multiplying both sides by x_1^∧ (the skew-symmetric matrix of x_1) gives:
s_2 can then be solved from equation (19), and hence the three-dimensional coordinates of the spatial point are obtained;
s2.5, optimizing the point cloud, and simultaneously optimizing the pose of the camera and the three-dimensional coordinates of the space points by using a Bundle Adjustment algorithm;
in order to minimize the re-projection error, the coordinates of the three-dimensional points and the pose of the camera need to be optimized simultaneously, so that the light emitted from any characteristic point is finally converged to the optical center, namely BA (Bundle Adjustment);
taking a camera observation model into consideration, establishing least square for errors generated in the process once:
e=h(x,y)-h(ξ,p) (20)
wherein ξ represents the Lie group element corresponding to the true camera pose R, t at the current moment, p represents the true three-dimensional coordinate of the landmark at the current moment, x represents the Lie group element corresponding to the pose R, t solved by the camera at this moment, and y represents the three-dimensional coordinate of the point corresponding to landmark p observed in state x;
Considering the overall observation process, the cost function can be expressed as:
where z = h(x, y) and z_ij is the observation obtained when the camera at pose ξ_i observes landmark p_j;
since equation (21) is a nonlinear function, it is linearized by adding an increment Δx to the full state vector of the equation, which gives:
wherein E_ij is the partial derivative of the cost function with respect to the landmark's three-dimensional point coordinates in this state, given by:
F_ij is the partial derivative of the cost function with respect to the camera pose in this state, given by:
the above equation was solved using the LM (Levenberg-Marquardt) algorithm:
f(x+δx)≈f(x)+Jδx (25)
where J is the Jacobian matrix of first-order partial derivatives of the cost function, with J = [F  E];
minimizing the reprojection error and writing f(x) = ε gives:
(J^T J + λI) δx = −J^T ε    (26)
the simultaneous optimization of the pose of the camera and the coordinates of the three-dimensional points can be realized by solving the equation.
6. The method according to claim 5, wherein: in the step (3) of the process,
s3.1: generating a probability point cloud, and fusing the point cloud generated by the S2.5 in the sparse point cloud generating module with the matching confidence obtained by the S2.2 to obtain the probability point cloud;
s3.2: far interference point removal based on outlier rejection, so as to improve the subsequent registration accuracy; meanwhile, in order to speed up the registration algorithm, the point cloud is downsampled to reduce its size;
In order to realize the preliminary processing of the source point cloud and the target point cloud, an outlier elimination algorithm based on statistical filtering is adopted; the point cloud in the three-dimensional coordinate system can be expressed as:
P = {p_i ∈ R³},  i = 1, 2, 3, …    (27)
wherein p_i = (x_i, y_i, z_i) are the three-dimensional coordinates of a point in the Cartesian coordinate system;
for each 3D point, two steps are taken: first, the set of nearest points in its neighborhood is found and the Euclidean distances are computed; then the standard deviation of the distances between the point and its neighbors is computed; in this way it can be determined whether the point is an outlier:
wherein d_i is the statistical distance threshold of the point; k is the number of selected neighboring points; M is a threshold controlling the number of outliers; d_std is the standard deviation of the distances from p_i to its neighbors;
for a point p_i, if the Euclidean distance between p_j and p_i is greater than d_i, then p_j is considered an outlier.
7. The method according to claim 6, wherein: in S3, further comprising:
s3.3: registering point clouds, namely registering the point clouds among frames by using a probability nearest iteration point algorithm (PICP), obtaining the relative pose of a non-cooperative target, and comparing the relative pose with a pose true value to measure the robustness and the accuracy of the algorithm;
given two point cloud models, a reference point cloud {p_i} and a target point cloud {p'_i}, the non-cooperative target pose estimation problem is converted into solving for the optimal transformation parameters that minimize the distance from the reference point cloud to the target point cloud, so that the rotation matrix R and translation vector t of the non-cooperative target can be obtained:
the single point distance error can be expressed as:
e_i = p_i − (R p'_i + t)    (30)
from the above, the registration problem can be converted into minimizing the sum of squared Euclidean distances of the errors between the reference point cloud and the target point cloud, namely equation (31):
considering the influence of near interference points on the point cloud registration process, a distance weight is added on the basis of equation (31) to reduce their effect on registration, so the new loss function can be expressed as:
wherein w_i is the distance weight and w_i ∈ [0, 1];
if the value of w_i tends to 0, the error the point pair contributes to the loss function decreases accordingly;
near interference points are typically present at the edges of the target, so weights can be constructed by distance to the point cloud centroid; for a point pair, the distances from the point to the centroid of the source and target point clouds are as follows:
wherein q_μ and q'_μ represent the centroids of the reference point cloud and the target point cloud, respectively;
taking d_pi and d_qi as independent variables, a two-dimensional Gaussian distribution function is selected to evaluate the weight of each point, as in equation (34); non-cooperative targets such as satellites are approximately symmetric in shape and uniform in structure; meanwhile, the two-dimensional Gaussian function is rotationally symmetric, so the weight is not biased toward either side in a way that would cause registration failure; furthermore, the Gaussian function is single-valued, and the weight of each point varies monotonically with its distance from the centroid;
since w_i ∈ [0, 1], the value of σ can be set accordingly, so that w_i can be expressed as:
combining equation (32), the loss function with near interference point consideration can be obtained as:
considering the degradation of feature recognition accuracy caused by feature destruction under extreme illumination, the match confidence obtained by the SuperGlue algorithm is taken as the confidence of each point in the point cloud and used as a weight in the point cloud registration process, so as to reduce the influence of mismatches on the final pose estimation; the new loss function is obtained as:
wherein P_i is the probability weight and P_i ∈ [0, 1];
considering the match confidence obtained by the SuperGlue algorithm: the higher the match confidence, the stronger the features of the matched point pair in the image recognition process and the more reliable the three-dimensional point obtained from it; conversely, a lower match confidence indicates a greater likelihood that the pair is a mismatch; the probability weight P_i can therefore be expressed as:
wherein P(p_i) and P(p'_i) are the reference point probability and the target point probability, respectively;
therefore, by comprehensively considering complex background and extreme illumination effects, the loss function of the probability nearest iteration point (PICP) algorithm can be obtained as follows:
according to the above, the point cloud registration problem can be restated as a least squares problem; to simplify the computation, the transformation of the point cloud can be decomposed into two steps, translation and rotation; the rigid-body motion is treated as a rotation of the source point cloud about its centroid plus a translation of the centroid; the centroids of the two point clouds are therefore:
combining the above can be obtained:
since the cross term involving (p_i − p̄ − R(p'_i − p̄')) sums to zero, the optimization function can be expressed as:
the de-centroided coordinates of the reference point cloud and the target point cloud are computed as:
q_i = p_i − p̄,  q'_i = p'_i − p̄'    (43)
the least squares problem can be optimized as follows:
the error term for R is expanded with:
removing the terms irrelevant to R leaves:
singular value decomposition of the above is:
which gives:
thus, there are:
the rotation matrix and the displacement vector of the point cloud registration can be obtained through the above formula.
8. The utility model provides a space target reconstruction and position appearance estimation system based on multi-view vision which characterized in that:
the system comprises a camera vision module, a sparse point cloud generation module and a point cloud registration module;
the camera vision module is used for establishing an ideal camera model and a distortion model, establishing a multi-camera vision system and calibrating an internal parameter of the simulation camera;
the sparse point cloud generation module is used for realizing extraction and matching of characteristic points by using a deep convolutional neural network according to the system established in the S1, performing multi-vision visual space target three-dimensional point cloud reconstruction of the convolutional neural network, and completing sparse point cloud generation;
The point cloud registration module is based on non-cooperative target pose estimation of a probabilistic closest point iterative algorithm (PICP).
9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 7.
CN202310964313.8A 2023-08-02 2023-08-02 Space target reconstruction and pose estimation method and system based on multi-view vision Pending CN117197333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310964313.8A CN117197333A (en) 2023-08-02 2023-08-02 Space target reconstruction and pose estimation method and system based on multi-view vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310964313.8A CN117197333A (en) 2023-08-02 2023-08-02 Space target reconstruction and pose estimation method and system based on multi-view vision

Publications (1)

Publication Number Publication Date
CN117197333A true CN117197333A (en) 2023-12-08

Family

ID=88987665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310964313.8A Pending CN117197333A (en) 2023-08-02 2023-08-02 Space target reconstruction and pose estimation method and system based on multi-view vision

Country Status (1)

Country Link
CN (1) CN117197333A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117826843A (en) * 2024-03-04 2024-04-05 湖北华中电力科技开发有限责任公司 Unmanned aerial vehicle intelligent obstacle avoidance method and system based on three-dimensional point cloud
CN117826843B (en) * 2024-03-04 2024-05-03 湖北华中电力科技开发有限责任公司 Unmanned aerial vehicle intelligent obstacle avoidance method and system based on three-dimensional point cloud

Similar Documents

Publication Publication Date Title
Ishikawa et al. Lidar and camera calibration using motions estimated by sensor fusion odometry
CN107301654B (en) Multi-sensor high-precision instant positioning and mapping method
CN110853075B (en) Visual tracking positioning method based on dense point cloud and synthetic view
US11210804B2 (en) Methods, devices and computer program products for global bundle adjustment of 3D images
CN104376552B (en) A kind of virtual combat method of 3D models and two dimensional image
CN112102458A (en) Single-lens three-dimensional image reconstruction method based on laser radar point cloud data assistance
Zeller et al. Depth estimation and camera calibration of a focused plenoptic camera for visual odometry
CN114399554B (en) Calibration method and system of multi-camera system
CN106485690A (en) Cloud data based on a feature and the autoregistration fusion method of optical image
Knyaz et al. Deep learning of convolutional auto-encoder for image matching and 3d object reconstruction in the infrared range
CN111998862B (en) BNN-based dense binocular SLAM method
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
Frohlich et al. Absolute pose estimation of central cameras using planar regions
Eichhardt et al. Affine correspondences between central cameras for rapid relative pose estimation
CN116129037B (en) Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof
CN117197333A (en) Space target reconstruction and pose estimation method and system based on multi-view vision
CN113781525B (en) Three-dimensional target tracking method based on original CAD model
CN112712566B (en) Binocular stereo vision sensor measuring method based on structure parameter online correction
CN114494644A (en) Binocular stereo matching-based spatial non-cooperative target pose estimation and three-dimensional reconstruction method and system
CN112630469B (en) Three-dimensional detection method based on structured light and multiple light field cameras
CN116894876A (en) 6-DOF positioning method based on real-time image
Buck et al. Capturing uncertainty in monocular depth estimation: Towards fuzzy voxel maps
CN114419259A (en) Visual positioning method and system based on physical model imaging simulation
Heide et al. UCSR: registration and fusion of cross-source 2D and 3D sensor data in unstructured environments
Wang et al. Fast and accurate satellite multi-view stereo using edge-aware interpolation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination