CN111127524A - Method, system and device for trajectory tracking and three-dimensional image reconstruction

Method, system and device for trajectory tracking and three-dimensional image reconstruction

Info

Publication number
CN111127524A
CN111127524A (application CN201811290448.6A)
Authority
CN
China
Prior art keywords
image frame
dimensional
matching
frame
points
Prior art date
Legal status
Pending
Application number
CN201811290448.6A
Other languages
Chinese (zh)
Inventor
席明
章国锋
赵长飞
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201811290448.6A
Publication of CN111127524A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/292 Multi-camera tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/223 Analysis of motion using block-matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Abstract

The embodiment of the application discloses a method, a system and a device for trajectory tracking and three-dimensional scene reconstruction, relates to the technical field of image processing, and solves the problems of poor trajectory-tracking stability and low precision in the prior art, as well as the low accuracy of three-dimensional scene reconstruction in the prior art. The specific scheme is as follows: the trajectory tracking and three-dimensional reconstruction device extracts corner points from the first-view and second-view image frames of a target acquired by a binocular camera to generate a three-dimensional point cloud and key frames; tracks consecutive frames by multi-plane fitting to continuously update the three-dimensional point cloud and the key frame set and solve the camera pose; obtains a dense depth map from the camera pose and the three-dimensional point cloud; performs confidence judgment on the depth map; and finally obtains a three-dimensional reconstruction model of the scene. The method and the device are used in image tracking and three-dimensional scene reconstruction processes.

Description

Method, system and device for trajectory tracking and three-dimensional image reconstruction
Technical Field
The embodiment of the application relates to the technical field of image information processing, in particular to a method, a system and a device for trajectory tracking and three-dimensional reconstruction based on a binocular camera.
Background
Currently, with the widespread use of mobile smart devices, a large number of virtual reality and augmented reality applications are continually emerging on the market. For example, superimposing a virtual object onto a picture of a real scene can be applied in fields such as games, medical treatment, education and navigation. With the development of robots and the extensive research on unmanned vehicles and unmanned aerial vehicles in today's society, high-precision Simultaneous Localization and Mapping (SLAM) and three-dimensional reconstruction have become hot research topics for many researchers.
Existing SLAM and three-dimensional reconstruction methods are mainly based on personal computer (PC) or monocular-camera equipment. When performing target tracking, the camera motion trajectory is tracked with the Oriented FAST and Rotated BRIEF (ORB) feature method, and the model used to recover the camera motion trajectory is chosen according to whether the scene is planar or non-planar; when tracking is lost, relocation is performed by computing a bag of words (BoW). However, because this target-tracking method relies on a fixed geometric model, its tracking accuracy is low. In addition, since BoW relocation mainly computes and matches the BoW of the key frames and the current frame, it easily fails in regions with poor features and its relocation effect is unsatisfactory.
Disclosure of Invention
The embodiment of the application provides a method and a system for trajectory tracking and three-dimensional image reconstruction, which can track a trajectory quickly and accurately, improve the precision of the depth map, and further improve the reliability of the three-dimensional reconstruction result.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
in a first aspect of the embodiments of the present application, a method for trajectory tracking and three-dimensional reconstruction is provided, where the method includes: extracting corner points of the first image frame and the second image frame, where the first image frame is an image frame acquired by a first camera of the binocular camera and the second image frame is an image frame acquired by a second camera of the binocular camera; acquiring three-dimensional point clouds corresponding to the first image frame and the second image frame according to the acquired corner points, where the three-dimensional point clouds are composed of three-dimensional points; fitting the three-dimensional points in the three-dimensional point clouds to two or more planes in a preset three-dimensional coordinate system; acquiring matching pairs consisting of the feature blocks to which the three-dimensional points map on the first image frame and the second image frame respectively, according to the transformation relation between the plane where each three-dimensional point lies and the corresponding first and second image frames, where a feature block is an image block centered on a corner point; acquiring the pose of the binocular camera according to the three-dimensional points and the corresponding matching pairs; acquiring a depth map according to the pixel values of a feature block on the first image frame and the pixel values of its matching block on the second image frame; and generating a reconstruction model according to the acquired camera pose and depth map results. In this way, determining the matching pairs and solving the camera pose by plane fitting yields more accurate matches, so the camera pose is solved more accurately.
With reference to the first aspect, in a possible implementation manner, the method further includes: performing confidence judgment on the obtained depth map result; and generating a reconstruction model according to the acquired camera pose, the depth map result and the confidence judgment result. In this way, confidence judgment is performed on the depth map result, and the reconstruction model is then generated from the high-confidence depth map results, so the accuracy of the reconstruction model is improved.
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, the obtaining three-dimensional point clouds corresponding to a first image frame and a second image frame according to corner points of the first image frame and the second image frame includes, if the first image frame and the second image frame are key frames, constructing pyramid images for the first image frame and the second image frame, and searching for a matching block of a feature block of the second image frame in the first image frame to obtain a matching pair; triangularizing the matched pair to generate a three-dimensional point cloud; and if the first image frame and the second image frame are non-key frames, taking the three-dimensional point cloud corresponding to the previous image frame of the first image frame and the previous image frame of the second image frame as the three-dimensional point cloud corresponding to the first image frame and the second image frame. Therefore, more accurate matching can be obtained by updating the three-dimensional point cloud according to the key frame, and the solving of the camera pose is more accurate.
With reference to the first aspect and the possible implementation manners described above, in another possible implementation manner, the obtaining the pose of the binocular camera according to the three-dimensional points and the corresponding matching pairs includes: performing sum-of-squared-differences (SSD) matching between the image blocks on the first image frame and the second image frame associated with the three-dimensional points and the image blocks of the candidate corner points, according to the initial poses of the first image frame and the second image frame, so as to obtain a matching relationship between each three-dimensional point and two-dimensional coordinates of the first image frame and two-dimensional coordinates of the second image frame, where a candidate corner point is a corner point visible at the initial coordinates of the first image frame and the second image frame; determining target three-dimensional points according to the obtained matching relationship, where a target three-dimensional point is a three-dimensional point successfully matched with two-dimensional coordinates of the first image frame and two-dimensional coordinates of the second image frame; and calculating the camera pose of the image frames according to the target three-dimensional points and the two-dimensional coordinates of the first image frame and the second image frame. In this way, SSD matching yields the matching relationship between the three-dimensional points and the two-dimensional coordinates of the first and second image frames, and the camera pose is calculated from the target three-dimensional points determined by that relationship, so the camera pose is solved more accurately.
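The SSD comparison at the heart of this step can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the patch half-width and the way candidate corners are gathered near a point's initial projection are assumptions.

```python
import numpy as np

def ssd(block_a: np.ndarray, block_b: np.ndarray) -> float:
    """Sum of squared differences between two equally sized patches."""
    d = block_a.astype(np.float32) - block_b.astype(np.float32)
    return float(np.sum(d * d))

def best_corner_match(ref_block, image, candidate_corners, half=4):
    """Return the candidate corner whose surrounding block has minimal SSD.

    `candidate_corners` is an iterable of (u, v) pixel coordinates visible
    near the point's initial projection (mirroring the 'candidate corner'
    notion in the text; how they are collected is assumed here).
    """
    best, best_cost = None, np.inf
    for (u, v) in candidate_corners:
        block = image[v - half:v + half + 1, u - half:u + half + 1]
        if block.shape != ref_block.shape:   # skip corners too close to the border
            continue
        cost = ssd(ref_block, block)
        if cost < best_cost:
            best, best_cost = (u, v), cost
    return best, best_cost
```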
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the determining a target three-dimensional point according to the obtained matching relationship includes: according to the plane where the successfully matched three-dimensional points are located, calculating the homography transformation matrices H1 and H2 from the plane to the first image frame and the second image frame; according to the homography transformation matrices H1 and H2, mapping the unmatched three-dimensional points to two-dimensional coordinates of the first image frame and of the second image frame respectively, and iteratively updating the successfully matched three-dimensional points, where the iteration is performed one or more times; if the number of newly matched three-dimensional points is less than a first threshold, stopping the iteration; aligning the first image frame and the second image frame with thumbnails of the frame before the first image frame and the frame before the second image frame respectively to obtain inter-frame transformation matrices S1 and S2; and projecting the remaining unmatched three-dimensional points to the first image frame and the second image frame respectively according to the inter-frame transformation matrices S1 and S2 and the camera poses of the frames before the first and second image frames, to determine the target three-dimensional points. In this way, iterative matching of the unmatched three-dimensional points through the homography matrices H1 and H2 yields more matches, and projecting the remaining unmatched three-dimensional points according to the inter-frame transformation matrices S1 and S2 yields more accurate matches, so the camera pose is solved more accurately.
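The homography-guided growing of matches can be illustrated with OpenCV, under the assumption that each matched three-dimensional point carries 2D coordinates in its fitted plane's own frame; the claim itself does not prescribe how H1 and H2 are computed.

```python
import numpy as np
import cv2

def grow_matches_via_plane(plane_pts_2d, image_pts_2d, unmatched_pts_2d):
    """Estimate a plane-to-image homography from already matched points,
    then predict image locations for plane points not yet matched.

    `plane_pts_2d` holds matched points' 2D coordinates in the fitted
    plane's own frame (an assumption of this sketch) and `image_pts_2d`
    their image coordinates; at least four pairs are required.
    """
    H, _ = cv2.findHomography(plane_pts_2d.astype(np.float32),
                              image_pts_2d.astype(np.float32),
                              cv2.RANSAC, 3.0)
    if H is None:                      # degenerate configuration
        return None, None
    pred = cv2.perspectiveTransform(
        unmatched_pts_2d.reshape(-1, 1, 2).astype(np.float32), H)
    return H, pred.reshape(-1, 2)      # predicted search locations in the image
```

The predicted locations then seed a local SSD search such as the one sketched above; per the claim, the loop repeats until fewer than the first threshold of new matches are found.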
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the obtaining a matching relationship between the three-dimensional point and the two-dimensional coordinates of the first image frame and the two-dimensional coordinates of the second image frame further includes matching an unmatched three-dimensional point to the first image frame according to the camera pose of the first image frame, and updating the matched three-dimensional point; projecting the updated successfully matched three-dimensional points to a second image frame; and calculating the pixel value error of the three-dimensional point in the first image frame and the second image frame matching block, and deleting the corresponding matching relation if the error is greater than a second threshold value. Therefore, the matching relation with large pixel value error of the first image frame and the second image frame matching block is deleted, so that more accurate matching can be obtained, and the solving of the camera pose is more accurate.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the initial poses of the first image frame and the second image frame are: camera poses of a frame before the first image frame and a frame before the second image frame, or initial poses of the first image frame and the second image frame which are respectively calculated according to a motion model; the initial coordinates of the first image frame and the second image frame are as follows: three-dimensional points visible from the first image frame and the second image frame are projected to the first image frame and the second image frame, respectively, and initial coordinates of the first image frame and the second image frame are obtained. Therefore, the matching relation between the three-dimensional point and the two-dimensional coordinates of the first image frame and the two-dimensional coordinates of the second image frame is obtained by performing SSD matching according to the initial pose and the initial coordinates, and then the camera pose is calculated according to the target three-dimensional point determined by the matching relation, so that the solution of the camera pose is more accurate.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the key frame is determined as follows: if the first image frame or the second image frame satisfies any of the following: (1) the matching rate of the feature blocks meets a matching threshold; (2) the position vector between it and the previous first or second key frame is greater than the position threshold; (3) the ratio of the number of feature blocks it shares with the previous first or second image frame to the total number of feature blocks on that previous frame is less than a threshold; or (4) the number of frames between it and the previous first or second key frame is greater than a preset threshold, then the first or second image frame is added to the key frame set as the first or second key frame. In this way, first and second image frames that meet the key-frame criteria are added to the key frame set, the three-dimensional point cloud can be updated according to the key frames, more accurate matching is obtained, and the camera pose is solved more accurately.
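A hedged sketch of these key-frame criteria follows; all threshold values are invented for illustration, and `frame` is assumed to expose the named quantities as attributes (match_rate, position, feature_ids, index).

```python
import numpy as np

def is_key_frame(frame, prev_frame, prev_key_frame,
                 match_thresh=0.3, pos_thresh=0.1,
                 overlap_thresh=0.6, max_gap=30):
    """The text joins criteria (1)-(4) with 'or', so any one suffices here."""
    rate_ok = frame.match_rate >= match_thresh                        # (1)
    moved_far = np.linalg.norm(frame.position
                               - prev_key_frame.position) > pos_thresh  # (2)
    shared = len(frame.feature_ids & prev_frame.feature_ids)
    low_overlap = shared / max(len(prev_frame.feature_ids), 1) < overlap_thresh  # (3)
    stale = frame.index - prev_key_frame.index > max_gap              # (4)
    return rate_ok or moved_far or low_overlap or stale
```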
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the fitting three-dimensional points in the three-dimensional point clouds corresponding to the first image frame and the second image frame to two or more planes in a preset three-dimensional coordinate system includes: fitting the three-dimensional points to two or more planes in a preset three-dimensional coordinate system by a random sample consensus (RANSAC) method according to the plane equation Ax + By + Cz + D = 0, where (A, B, C)^T = (P2 − P1) × (P3 − P1), with P1, P2, P3 being three points randomly selected from the three-dimensional point cloud, and D = −(A, B, C) · P1. In this way, plane fitting is performed on the three-dimensional points by the RANSAC method, and determining the matching pairs and solving the camera pose in combination with the planes yields more accurate matches, so the camera pose is solved more accurately.
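A minimal single-plane RANSAC routine matching the formula above; the iteration count and inlier threshold are made up. Fitting "two or more planes" can then be done by re-running the routine on the outliers of the previous fit.

```python
import numpy as np

def ransac_fit_plane(points, iters=200, inlier_thresh=0.02):
    """Fit one plane Ax + By + Cz + D = 0 to an (N, 3) point cloud by RANSAC.

    The normal is the cross product of two edge vectors of a random point
    triple, i.e. (A,B,C)^T = (P2-P1) x (P3-P1), and D = -(A,B,C)·P1.
    """
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    rng = np.random.default_rng()
    for _ in range(iters):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)      # (A, B, C)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                          # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal.dot(p1)                      # D = -(A,B,C)·P1
        dist = np.abs(points @ normal + d)       # point-to-plane distances
        inliers = dist < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane, best_inliers
```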
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, if the difference between the numbers of feature blocks in the first image frame and the second image frame is greater than a preset threshold K1, or the numbers of feature blocks in both the first image frame and the second image frame are less than K2, the method further includes: creating thumbnails of the first image frame or the second image frame and of the key frames in the key frame queue; obtaining a ranking of the matching degree between the first or second image frame and the key frames by a thumbnail alignment method; performing feature-block matching between the feature blocks of the most closely matching key frames and the first and second image frames respectively; updating the successfully matched feature pairs into the successful-match set M; and calculating the camera poses of the first image frame and the second image frame respectively according to the successfully matched feature pairs in the set M. In this way, when tracking of an image frame is lost, relocalization by the thumbnail alignment method avoids the impact of the tracking loss on the three-dimensional reconstruction effect.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the acquiring the pose of the binocular camera according to the three-dimensional points and the obtained matching pairs further includes: creating a matching vector V_n(i) according to the matching pairs, where V_n(i) indicates the number of feature blocks of the i-th key frame successfully matched with the first image frame and the second image frame; according to V_n(i), performing feature-block matching between the first and second image frames and the feature blocks of the i-th key frame not yet successfully matched with them, and updating the set M, where the i-th key frame is the key frame with the highest number of successful feature-block matches with the first and second image frames; calculating second camera poses of the first image frame and the second image frame according to the successfully matched feature pairs in the set M; and if the offset between the second camera poses and the camera poses of the first and second image frames is greater than a third threshold, adjusting the camera poses of the first and second image frames according to the second camera poses. In this way, loop detection is performed on the motion trajectories of the feature blocks, feature blocks of the same name on non-consecutive frames are matched efficiently, and the influence of accumulated error on the camera pose result is avoided.
With reference to the first aspect and the possible implementation manners described above, in another possible implementation manner, the method further includes performing local bundle adjustment (BA) optimization and global BA optimization on the three-dimensional point cloud and the camera poses of the key frames, and updating the three-dimensional point cloud and the key-frame camera poses. In this way, when the background is idle, BA optimization of the three-dimensional point cloud and the key-frame camera poses makes the camera poses more accurate.
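As a rough illustration of the optimization involved, the sketch below refines only one frame's pose against fixed three-dimensional points (pose-only refinement); the local and global BA of this implementation manner would additionally optimize the points and the neighbouring key-frame poses jointly.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def refine_pose(K, points_3d, points_2d, rvec0, tvec0):
    """Minimise reprojection error over a single camera pose while the
    three-dimensional points stay fixed; points_3d is (N, 3) float,
    points_2d is (N, 2) float, rvec0/tvec0 are the initial pose.
    """
    def residuals(x):
        rvec, tvec = x[:3], x[3:]
        proj, _ = cv2.projectPoints(points_3d, rvec, tvec, K, None)
        return (proj.reshape(-1, 2) - points_2d).ravel()

    x0 = np.hstack([np.ravel(rvec0), np.ravel(tvec0)])
    res = least_squares(residuals, x0, method="lm")   # Levenberg-Marquardt
    return res.x[:3], res.x[3:]                        # refined rvec, tvec
```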
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the obtaining a depth map according to pixel values of a feature block on a first image frame and pixel values of a matching block of the feature block on a second image frame includes transversely dividing the first image frame or the second image frame into N blocks; respectively calculating the depth maps of N parts of the first image frame or the second image frame by adopting different threads; wherein N is an integral multiple of the number of threads; and integrating the depth maps of the N parts of the first image frame or the second image frame to obtain a depth map result of the first image frame or the second image frame. In this way, the image frame is divided into blocks, each block is processed by different threads, and then the results are integrated, so that the generation speed of the depth map result is increased.
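A sketch of the strip-parallel depth computation, where `compute_strip_depth` is a hypothetical per-strip stereo routine supplied by the caller. Because the stereo pair is rectified, matching runs along rows, so horizontal strips can be processed independently.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def depth_map_parallel(left, right, compute_strip_depth,
                       n_threads=4, n_strips=8):
    """Split the image into N horizontal strips (N a multiple of the thread
    count, as in the text), compute each strip's depth on its own worker,
    then stitch the results back together.
    """
    assert n_strips % n_threads == 0
    h = left.shape[0]
    bounds = [(i * h // n_strips, (i + 1) * h // n_strips)
              for i in range(n_strips)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        strips = list(pool.map(
            lambda b: compute_strip_depth(left[b[0]:b[1]], right[b[0]:b[1]]),
            bounds))
    return np.vstack(strips)   # depth map for the whole frame
```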
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the performing confidence judgment on the obtained depth map result includes: defining a set of criterion vectors C = (Cm, Cc, Cp) and a set of threshold vectors T = (Tm, Tc, Tp), where Cm represents the sum of absolute differences (SAD) value at pixel (x, y); Cc = 2·SAD(x, y, d1) − SAD(x, y, d1+1) − SAD(x, y, d1−1) represents the SAD difference between the adjacent disparities and the current disparity at pixel (x, y); and Cp represents the ratio, to the SAD of the current depth, of the largest SAD within a neighborhood of the current depth value that is still smaller than the SAD of the current depth; and if the criterion vector C(x, y) of pixel (x, y) is smaller than the threshold vector T(x, y), determining that the confidence of the depth map result at that pixel is 1. In this way, confidence judgment is performed on the depth map result, and the reconstruction model is then generated from the high-confidence depth map results, so the accuracy of the reconstruction model is improved.
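A sketch of the three-component test, with made-up thresholds, an assumed 11-disparity neighbourhood, and an assumed reading of the Cp criterion; `sad(x, y, d)` is a hypothetical SAD-cost lookup.

```python
def depth_confidence(sad, x, y, d1, thresholds=(40.0, 4.0, 0.9)):
    """Return True (confidence 1) when each criterion is below its threshold,
    i.e. the componentwise test C(x, y) < T(x, y) described above.
    """
    tm, tc, tp = thresholds                      # made-up threshold vector T
    cm = sad(x, y, d1)                           # Cm: SAD at the current disparity
    cc = 2 * sad(x, y, d1) - sad(x, y, d1 + 1) - sad(x, y, d1 - 1)  # Cc
    neighbours = [sad(x, y, d) for d in range(d1 - 5, d1 + 6) if d != d1]
    smaller = [s for s in neighbours if s < cm]  # competing disparities
    cp = (max(smaller) / max(cm, 1e-6)) if smaller else 0.0  # Cp (assumed form)
    return cm < tm and cc < tc and cp < tp
```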
In a second aspect of the embodiments of the present application, a trajectory tracking and three-dimensional reconstruction system is provided, which may include a binocular camera and a trajectory tracking and three-dimensional reconstruction device. The binocular camera is used for extracting corner points of a first image frame and a second image frame, where the first image frame is an image frame acquired by a first camera of the binocular camera and the second image frame is an image frame acquired by a second camera of the binocular camera. The trajectory tracking and three-dimensional reconstruction device is used for: acquiring three-dimensional point clouds corresponding to the first image frame and the second image frame according to their corner points, the three-dimensional point clouds being composed of three-dimensional points; fitting the three-dimensional points in the three-dimensional point clouds to two or more planes in a preset three-dimensional coordinate system; obtaining matching pairs consisting of the feature blocks to which the three-dimensional points map on the first image frame and the second image frame respectively, according to the transformation relation between the plane where each three-dimensional point lies and the first and second image frames, where a feature block is an image block centered on a corner point; acquiring the pose of the binocular camera according to the three-dimensional points and the matching pairs; acquiring a depth map according to the pixel values of a feature block on the first image frame and the pixel values of its matching block on the second image frame; and generating a reconstruction model according to the camera pose and depth map results. In this way, determining the matching pairs and solving the camera pose by plane fitting yields more accurate matches, so the camera pose is solved more accurately.
With reference to the second aspect, in a possible implementation manner, the trajectory tracking and three-dimensional reconstruction apparatus is further configured to: perform confidence judgment on the obtained depth map result; and generate a reconstruction model according to the camera pose, the depth map result and the confidence judgment result. In this way, confidence judgment is performed on the depth map result, and the reconstruction model is generated from the high-confidence depth map results, so the accuracy of the reconstruction model is improved.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the obtaining three-dimensional point clouds corresponding to the first image frame and the second image frame according to corner points of the first image frame and the second image frame includes, if the first image frame and the second image frame are key frames, constructing pyramid images for the first image frame and the second image frame, and searching for a matching block of a feature block of the second image frame in the first image frame to obtain a matching pair; triangularizing the matched pair to generate a three-dimensional point cloud; and if the first image frame and the second image frame are non-key frames, taking the three-dimensional point cloud corresponding to the previous image frame of the first image frame and the previous image frame of the second image frame as the three-dimensional point cloud corresponding to the first image frame and the second image frame. Therefore, more accurate matching can be obtained by updating the three-dimensional point cloud according to the key frame, and the solving of the camera pose is more accurate.
With reference to the second aspect and the possible implementation manners described above, in another possible implementation manner, the obtaining of the pose of the binocular camera according to the three-dimensional points and the matching pairs includes: performing sum-of-squared-differences (SSD) matching between the image blocks on the first image frame and the second image frame associated with the three-dimensional points and the image blocks of the candidate corner points, according to the initial poses of the first image frame and the second image frame, so as to obtain a matching relationship between each three-dimensional point and two-dimensional coordinates of the first image frame and two-dimensional coordinates of the second image frame, where a candidate corner point is a corner point visible at the initial coordinates of the first image frame and the second image frame; determining target three-dimensional points according to the matching relationship, where a target three-dimensional point is a three-dimensional point successfully matched with two-dimensional coordinates of the first image frame and two-dimensional coordinates of the second image frame; and calculating the camera pose of the image frames according to the target three-dimensional points and the two-dimensional coordinates of the first image frame and the second image frame. In this way, SSD matching yields the matching relationship between the three-dimensional points and the two-dimensional coordinates of the first and second image frames, and the camera pose is calculated from the target three-dimensional points determined by that relationship, so the camera pose is solved more accurately.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the determining a target three-dimensional point according to the matching relationship includes: according to the plane where the successfully matched three-dimensional points are located, calculating the homography transformation matrices H1 and H2 from the plane to the first image frame and the second image frame; according to the homography transformation matrices H1 and H2, mapping the three-dimensional points not yet successfully matched to two-dimensional coordinates of the first image frame and of the second image frame respectively, and iteratively updating the successfully matched three-dimensional points, where the iteration is performed one or more times; if the number of newly matched three-dimensional points is less than a first threshold, stopping the iteration; aligning the first image frame and the second image frame with thumbnails of the frame before the first image frame and the frame before the second image frame respectively to obtain inter-frame transformation matrices S1 and S2; and projecting the remaining unmatched three-dimensional points to the first image frame and the second image frame respectively according to the inter-frame transformation matrices S1 and S2 and the camera poses of the frames before the first and second image frames, to determine the target three-dimensional points. In this way, iterative matching of the unmatched three-dimensional points through the homography matrices H1 and H2 yields more matches, and projecting the remaining unmatched three-dimensional points according to the inter-frame transformation matrices S1 and S2 yields more accurate matches, so the camera pose is solved more accurately.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the obtaining a matching relationship between the three-dimensional point and the two-dimensional coordinates of the first image frame and the two-dimensional coordinates of the second image frame further includes matching an unmatched three-dimensional point to the first image frame according to the camera pose of the first image frame, and updating the matched three-dimensional point; projecting the updated successfully matched three-dimensional points to a second image frame; and calculating the pixel value error of the three-dimensional point in the first image frame and the second image frame matching block, and deleting the corresponding matching relation if the error is greater than a second threshold value. Therefore, the matching relation with large pixel value error of the first image frame and the second image frame matching block is deleted, so that more accurate matching can be obtained, and the solution of the camera pose is more accurate.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the initial poses of the first image frame and the second image frame are: camera poses of a frame before the first image frame and a frame before the second image frame, or initial poses of the first image frame and the second image frame which are respectively calculated according to a motion model; the initial coordinates of the first image frame and the second image frame are as follows: three-dimensional points visible from the first image frame and the second image frame are projected to the first image frame and the second image frame, respectively, and initial coordinates of the first image frame and the second image frame are obtained. Therefore, the matching relation between the three-dimensional point and the two-dimensional coordinates of the first image frame and the two-dimensional coordinates of the second image frame is obtained by performing SSD matching according to the initial pose and the initial coordinates, and then the camera pose is calculated according to the target three-dimensional point determined by the matching relation, so that the solution of the camera pose is more accurate.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the fitting three-dimensional points in the three-dimensional point clouds corresponding to the first image frame and the second image frame to two or more planes in a preset three-dimensional coordinate system includes: fitting the three-dimensional points to two or more planes in a preset three-dimensional coordinate system by a random sample consensus (RANSAC) method according to the plane equation Ax + By + Cz + D = 0, where (A, B, C)^T = (P2 − P1) × (P3 − P1), with P1, P2, P3 being three points randomly selected from the three-dimensional point cloud, and D = −(A, B, C) · P1. In this way, plane fitting is performed on the three-dimensional points by the RANSAC method, and determining the matching pairs and solving the camera pose in combination with the planes yields more accurate matches, so the camera pose is solved more accurately.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, if the difference between the numbers of feature blocks in the first image frame and the second image frame is greater than a preset threshold K1, or the numbers of feature blocks in both the first image frame and the second image frame are less than K2, the method further includes: creating thumbnails of the first image frame or the second image frame and of the key frames in the key frame queue; obtaining a ranking of the matching degree between the first or second image frame and the key frames by a thumbnail alignment method; performing feature-block matching between the feature blocks of the most closely matching key frames and the first and second image frames respectively; updating the successfully matched feature pairs into the successful-match set M; and calculating the camera poses of the first image frame and the second image frame respectively according to the successfully matched feature pairs in the set M. In this way, when tracking of an image frame is lost, relocalization by the thumbnail alignment method avoids the impact of the tracking loss on the three-dimensional reconstruction effect.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the obtaining the pose of the binocular camera according to the three-dimensional points and the matching pairs further includes: creating a matching vector V_n(i) according to the matching pairs, where V_n(i) indicates the number of feature blocks of the i-th key frame successfully matched with the first image frame and the second image frame; according to V_n(i), performing feature-block matching between the first and second image frames and the feature blocks of the i-th key frame not yet successfully matched with them, and updating the set M, where the i-th key frame is the key frame with the highest number of successful feature-block matches with the first and second image frames; calculating second camera poses of the first image frame and the second image frame according to the successfully matched feature pairs in the set M; and if the offset between the second camera poses and the camera poses of the first and second image frames is greater than a third threshold, adjusting the camera poses of the first and second image frames according to the second camera poses. In this way, loop detection is performed on the motion trajectories of the feature blocks, feature blocks of the same name on non-consecutive frames are matched efficiently, and the influence of accumulated error on the camera pose result is avoided.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the obtaining a depth map according to pixel values of feature blocks on a first image frame and pixel values of matching blocks of the feature blocks on a second image frame includes transversely dividing the first image frame or the second image frame into N blocks; respectively calculating the depth maps of N parts of the first image frame or the second image frame by adopting different threads; wherein N is an integral multiple of the number of threads; and integrating the depth maps of the N parts of the first image frame or the second image frame to obtain a depth map result of the first image frame or the second image frame. In this way, the image frame is divided into blocks, each block is processed by different threads, and then the results are integrated, so that the generation speed of the depth map result is increased.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the performing confidence judgment on the obtained depth map result includes: defining a set of criterion vectors C = (Cm, Cc, Cp) and a set of threshold vectors T = (Tm, Tc, Tp), where Cm represents the sum of absolute differences (SAD) value at pixel (x, y); Cc = 2·SAD(x, y, d1) − SAD(x, y, d1+1) − SAD(x, y, d1−1) represents the SAD difference between the adjacent disparities and the current disparity at pixel (x, y); and Cp represents the ratio, to the SAD of the current depth, of the largest SAD within a neighborhood of the current depth value that is still smaller than the SAD of the current depth; and if the criterion vector C(x, y) of pixel (x, y) is smaller than the threshold vector T(x, y), determining that the confidence of the depth map result at that pixel is 1. In this way, confidence judgment is performed on the depth map result, and the reconstruction model is then generated from the high-confidence depth map results, so the accuracy of the reconstruction model is improved.
In a third aspect of the embodiments of the present application, there is provided a trajectory tracking and three-dimensional reconstruction apparatus, which may include: a memory for storing a computer program; a processor for executing the computer program for implementing the method for trajectory tracking and three-dimensional reconstruction as described in the first aspect or any of the possible implementations of the first aspect.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements a trajectory tracking and three-dimensional reconstruction method according to the first aspect or any one of the possible implementations of the first aspect.
Drawings
Fig. 1A is a schematic view of an application scenario architecture of a trajectory tracking and three-dimensional reconstruction system according to an embodiment of the present application;
fig. 1B is a schematic diagram of a hardware structure of a trajectory tracking and three-dimensional reconstruction apparatus according to an embodiment of the present disclosure;
fig. 2 is an exemplary diagram of an image frame obtained by shooting with a binocular camera and an exemplary diagram of a generation effect of a reconstruction model according to an embodiment of the present disclosure;
fig. 3 is a flowchart of a trajectory tracking and three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 4 is a schematic diagram of a feature block detection and matching result provided in the embodiment of the present application;
FIG. 5 is a diagram illustrating multi-threaded accelerated image segmentation provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an error accumulation provided in an embodiment of the present application;
fig. 7 is a schematic diagram of a confidence level determination result provided in the embodiment of the present application;
FIG. 8 is a diagram illustrating an exemplary effect of another reconstruction model provided in an embodiment of the present application;
fig. 9 is a flowchart of a camera pose calculation method according to an embodiment of the present application;
fig. 10 is a flowchart of a target three-dimensional point determination method provided in an embodiment of the present application;
FIG. 11 is a flow chart of a multi-plane fitting method provided by an embodiment of the present application;
FIG. 12 is a flowchart of a method for repositioning camera poses according to an embodiment of the present application;
fig. 13 is a flowchart of a loop detection method according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a binocular camera tracking and three-dimensional reconstruction system. The basic principle is as follows: the video acquired by the binocular camera is processed by a trajectory tracking and three-dimensional reconstruction device to obtain the camera pose and a sparse three-dimensional point cloud; a dense depth map is then obtained, and finally a reconstruction model of the three-dimensional scene is obtained.
As shown in fig. 1A, an application scenario architecture diagram of a trajectory tracking and three-dimensional reconstruction system according to an embodiment of the present invention is shown, where the application scenario may include a three-dimensional scenario 10, a binocular camera 20, a trajectory tracking and three-dimensional reconstruction apparatus 30, and an application platform 40.
The binocular camera 20 may be connected to the trajectory tracking and three-dimensional reconstruction apparatus 30 in a wired or wireless manner, and the trajectory tracking and three-dimensional reconstruction apparatus 30 may be connected to the application platform 40 in a wired or wireless manner.
The binocular camera 20 may include a first camera and a second camera, the first camera and the second camera respectively include an image viewing window, the first camera and the second camera respectively capture an image of an object through their viewing windows, an image frame captured by the first camera may be referred to as a first image frame, and an image captured by the second camera may be referred to as a second image frame. The binocular camera 20 may continuously photograph the three-dimensional scene through the first camera and the second camera in the moving process, and may transmit a first image frame photographed by the first camera and a second image frame photographed by the second camera to the trajectory tracking and three-dimensional reconstruction device in real time.
The trajectory tracking and three-dimensional reconstruction device is used for calculating and processing the first image frames captured by the first camera and the second image frames captured by the second camera, including corner extraction, three-dimensional point cloud acquisition, plane fitting, camera pose solving, depth map solving and reconstruction-model generation, and for reconstructing the three-dimensional image and transmitting it to the application platform 40. The trajectory tracking and three-dimensional reconstruction apparatus shown in fig. 1A is only a schematic diagram; the apparatus may further include a display for displaying the three-dimensional image reconstruction process, which is not limited here.
The application platform 40 may be a mobile phone terminal, a computer, a tablet computer, a monitor, etc., but the present invention is not limited thereto. The application on the application platform 40 may be an augmented reality system application, or may be other applications, and may be used to display a reconstructed three-dimensional image, where the display form may be to superimpose a virtual three-dimensional image onto a picture of a real scene, or may be a pure virtual three-dimensional image, and the present invention is not limited thereto. The application on the application platform 40 can be system applications in various fields such as games, medical treatment, education, tourism, navigation, criminal investigation and military affairs, and taking navigation as an example, when a user uses the navigation application, the user can obtain a three-dimensional image of a destination or a three-dimensional image of a building and a road interested along the way according to needs, and taking tourism as an example, a tourist can feel ancient buildings existing in history through a virtual three-dimensional image in a history track real scene.
It can be understood by those skilled in the art that fig. 1A is only a schematic diagram, and does not limit the location and mutual interaction of other functional units and each functional unit in the application scenario architecture.
As shown in fig. 2, which gives an example of the image frames obtained as the binocular camera 20 continuously photographs the three-dimensional scene through the first camera and the second camera according to the embodiment of the present invention, A1 and A2 are the first frames acquired by the first camera and the second camera respectively, B1 and B2 are the second frames acquired by the first camera and the second camera respectively, and C1 and C2 are the third frames acquired by the first camera and the second camera respectively.
Fig. 1B is a schematic diagram of the hardware structure of the trajectory tracking and three-dimensional reconstruction apparatus 30. As shown in fig. 1B, it includes a processor 301, a communication line 302, a memory 303, and at least one communication interface (fig. 1B only illustrates the case in which the communication interface 304 is included).
The processor 301 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present application.
The communication link 302 may include a path for transmitting information between the aforementioned components.
The communication interface 304 may be any device, such as a transceiver, for communicating with other devices or communication networks, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc.
The memory 303 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that may store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be separate and coupled to the processor via a communication line 302. The memory may also be integral to the processor.
Wherein the memory 303 is used for storing computer execution instructions for executing the present application, wherein the memory 303 may store instructions for implementing three modular functions: the binocular camera tracking instruction, the depth map restoration instruction and the three-dimensional map reconstruction instruction are controlled and executed by the processor 301. The processor 301 is configured to execute computer-executable instructions stored in the memory 303, so as to implement the trajectory tracking and three-dimensional reconstruction method provided by the following embodiments of the present application. Specifically, the processor 301 executes a binocular camera tracking instruction stored in the memory 303 to perform corner extraction, three-dimensional point cloud acquisition, multi-plane fitting and camera pose solving, the processor 301 executes a depth map restoration instruction stored in the memory 303 to perform depth map solving and confidence level judgment according to the calculated camera pose and three-dimensional point cloud, and the processor 301 executes a three-dimensional map reconstruction instruction in the memory 303 to generate a reconstruction model according to the calculated camera pose and depth map results and the confidence level judgment result, so as to perform three-dimensional image reconstruction. The memory 303 shown in fig. 1B is only a schematic diagram, and the memory may further include other functional instructions, which is not limited in this respect.
Optionally, the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
In particular implementations, as an embodiment, the processor 301 may include one or more CPUs, such as CPU0 and CPU1 in fig. 1B.
In particular implementations, as an embodiment, the trajectory tracking and three-dimensional reconstruction apparatus 30 may include multiple processors. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
The trajectory tracking and three-dimensional reconstruction apparatus 30 may be a general-purpose device or a special-purpose device. In a specific implementation, the trajectory tracking and three-dimensional reconstruction apparatus 30 may be integrated with the application platform 40, and may be a desktop computer equipped with two cameras, a portable computer, a web server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or a device with a structure similar to that in fig. 1B. The embodiment of the present application does not limit the type of the trajectory tracking and three-dimensional reconstruction apparatus 30.
The trajectory tracking and three-dimensional reconstruction method provided by the embodiment of the present application will now be specifically described with reference to fig. 1A, fig. 1B and fig. 2.
An embodiment of the present invention provides a trajectory tracking and three-dimensional reconstruction method, and as shown in fig. 3, a flowchart of a trajectory tracking and three-dimensional reconstruction method to which the embodiment of the present invention may be applied is shown. The method is applied to a track tracking and three-dimensional reconstruction system, and comprises the following steps:
301. the processor 301 executes the binocular camera tracking instruction in the memory 303 to extract the corner points of the first image frame and the second image frame; the first image frame is an image frame acquired by a first camera of the binocular camera, and the second image frame is an image frame acquired by a second camera of the binocular camera.
Specifically, features from accelerated segment test (FAST) corners of the first image frame and the second image frame may be extracted; Harris corners or Binary Robust Invariant Scalable Keypoints (BRISK) corners may also be extracted, and the embodiment of the present invention is not limited in this respect.
The binocular camera acquires the first image frame and the second image frame according to a camera model, wherein the camera model may be a perspective camera model or other camera models.
Taking a perspective camera model as an example, in the process of the binocular camera moving through a three-dimensional scene, images are continuously obtained using the perspective camera model, and the camera parameters corresponding to each image frame can be divided into internal parameters and external parameters. The internal parameters are represented by a 3×3 intrinsic matrix K:

$$K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$$

where f_x and f_y are the focal lengths in the x and y directions, in pixels, and (c_x, c_y)^T is the location, in the image frame, of the centers of the first and second view windows of the binocular camera.
The external parameters are represented by a 3×3 rotation matrix R and a translation vector T. (R, T) transforms a point X in the world coordinate system to the camera coordinate system:

X_t = R·X + T

From this, a projection equation mapping a three-dimensional point X in the world coordinate system to a two-dimensional image point x can be obtained, expressed as:

x = π(K, R, T, X)

If the camera internal parameters do not change during tracking, the formula can be simplified as:

x = π(R, T, X)
generally, the internal parameters of the two viewing windows of the binocular camera can be considered to be the same and oriented towards a line perpendicular to the centers of the two viewing windows. X-representation of three-dimensional points in two viewfinder coordinate systemsleft、XrightAnd satisfies the following conditions:
Xright=Xleft+(-b 0 0)T
wherein b is the straight line distance between the center points of the left and right viewfinder windows.
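The pinhole relations above can be checked numerically; the intrinsics and baseline below are made-up values for illustration, not parameters from the embodiment.

```python
import numpy as np

def make_K(fx, fy, cx, cy):
    """Intrinsic matrix K as defined above."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def project(K, R, T, X):
    """x = pi(K, R, T, X): world point -> pixel coordinates."""
    Xc = R @ X + T              # X_t = R·X + T (camera coordinates)
    uvw = K @ Xc
    return uvw[:2] / uvw[2]     # perspective division

# Numeric check of the two-viewfinder relation X_right = X_left + (-b, 0, 0)^T:
K = make_K(500.0, 500.0, 320.0, 240.0)   # made-up intrinsics
R, T = np.eye(3), np.zeros(3)            # left camera at the origin
b = 0.12                                 # made-up 12 cm baseline
X = np.array([0.5, 0.2, 3.0])
x_left = project(K, R, T, X)
x_right = project(K, R, T, X + np.array([-b, 0.0, 0.0]))
# Same row (equal v); the right u is shifted by the disparity fx*b/Z = 20 px.
```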
Before tracking with the binocular camera, the binocular camera equipment may be calibrated according to the stereo matching principle, for example rectified using the parallel-optical-axis principle, or according to other rectification principles; the embodiment of the present invention is not limited in this respect. The image frame data acquired by the binocular camera may also be corrected: the pose difference may be corrected according to an image rectification algorithm, and other image errors may also be corrected, the embodiment of the present invention again not being limited. The correction may be performed manually or by other automatic correction methods, and the embodiment of the present invention is not limited thereto.
302. The processor 301 executes the binocular camera tracking instruction in the memory 303, and acquires three-dimensional point clouds corresponding to the first image frame and the second image frame according to the corner points of the first image frame and the second image frame, wherein the three-dimensional point clouds are composed of three-dimensional points.
In a specific implementation, as an embodiment, the acquiring three-dimensional point clouds corresponding to the first image frame and the second image frame according to the corner points of the first image frame and the second image frame includes,
if the first image frame and the second image frame are key frames, constructing pyramid images for the first image frame and the second image frame, searching a matching block of a second image frame feature block in the first image frame, and obtaining a matching pair;
triangularizing the obtained matching pairs to generate three-dimensional point cloud;
and if the first image frame and the second image frame are non-key frames, taking the three-dimensional point cloud corresponding to the previous image frame of the first image frame and the previous image frame of the second image frame as the three-dimensional point cloud corresponding to the first image frame and the second image frame.
Taking patch extraction as the example of the feature block: if the first image frame and the second image frame are key frames, for the initial three-dimensional point cloud and the point cloud newly added with each key frame, a patch matching search is performed along the epipolar line between the first image frame and the second image frame to obtain reliable matching pairs. Namely: for each feature block $x_{left} = (u_{left}, v_{left})^T$ in the first image frame, search the region $u \in [u_{left}-b, u_{left}]$, $v \in [v_{left}-a, v_{left}+a]$ of the second image frame and compute the sum of squared differences (SSD) of the pixel values of the two feature blocks to obtain the point $x_{right} = (u_{right}, v_{right})^T$ with the minimum SSD; similarly, search the region $u \in [u_{right}, u_{right}+b]$, $v \in [v_{right}-a, v_{right}+a]$ of the first image frame for the point $x_{left} = (u_{left}, v_{left})^T$ with the minimum SSD, and take $(x_{left}, x_{right})$ as a matching pair. Here a is a preset tolerance allowed, when determining a matching pair, in the direction perpendicular to the baseline of the two viewfinders, the baseline being the shortest line connecting the centers of the two viewfinders; b is the distance between the left and right cameras. As shown in fig. 4, a schematic diagram of feature block detection and matching results according to an embodiment of the present application is shown, wherein feature block a in the first image frame P1 and feature block a' in the second image frame P2 are a matching pair.
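A minimal sketch of this SSD patch search, assuming rectified grayscale images stored as numpy arrays, follows; the patch half-size and search bounds are illustrative, and the symmetric right-to-left check described above would be run with the roles of the two images swapped.

```python
import numpy as np

def ssd(p, q):
    """Sum of squared differences between two equally sized patches."""
    d = p.astype(np.float32) - q.astype(np.float32)
    return float((d * d).sum())

def match_patch(left, right, u_left, v_left, b=64, a=2, half=4):
    """For the feature block centered at (u_left, v_left) in the left image,
    search the band u in [u_left - b, u_left], v in [v_left - a, v_left + a]
    of the right image and return the center with minimum SSD."""
    ref = left[v_left - half:v_left + half + 1, u_left - half:u_left + half + 1]
    best, best_uv = np.inf, None
    for v in range(v_left - a, v_left + a + 1):
        for u in range(max(u_left - b, half), u_left + 1):
            cand = right[v - half:v + half + 1, u - half:u + half + 1]
            if cand.shape != ref.shape:
                continue               # skip windows clipped by the image border
            s = ssd(ref, cand)
            if s < best:
                best, best_uv = s, (u, v)
    return best_uv, best
```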
Triangulation of the matching pairs can adopt the (rectified-stereo) formula:
$$X_t = \frac{b}{u_{t,left}-u_{t,right}} \begin{pmatrix} u_{t,left}-c_x \\ v_{t,left}-c_y \\ f \end{pmatrix}$$
wherein, for a matching pair $(x_{left}, x_{right})$, $(u_{t,left}, v_{t,left})$ are the image coordinates of $x_{left}$ and $(u_{t,right}, v_{t,right})$ are the image coordinates of $x_{right}$, and $X_t$ represents the three-dimensional space coordinates, in the real three-dimensional space, of the three-dimensional point corresponding to the matching pair $(x_{left}, x_{right})$; other formulas, such as a reconstruct-type function, may also be used, and the embodiment of the present invention is not limited in this respect.
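Under the rectified-stereo reading of the formula above, and assuming a single focal length f = fx = fy, a minimal Python sketch of the triangulation is:

```python
import numpy as np

def triangulate(u_left, v_left, u_right, f, cx, cy, b):
    """X_t = (b / d) * (u_left - cx, v_left - cy, f)^T with disparity
    d = u_left - u_right, for a rectified matching pair."""
    d = u_left - u_right
    if d <= 0:
        return None                       # invalid match: zero/negative disparity
    s = b / d
    return np.array([s * (u_left - cx), s * (v_left - cy), s * f])

# With the example values used earlier (f=700, b=0.12):
print(triangulate(495.0, 275.0, 453.0, f=700.0, cx=320.0, cy=240.0, b=0.12))
# -> [0.5 0.1 2.0]
```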
The three-dimensional point cloud corresponding to the first first image frame and the first second image frame acquired by the binocular camera is taken as the initial three-dimensional point cloud, and each time a key frame is subsequently added, the three-dimensional points of that key frame are added to the three-dimensional point cloud.
And if the first image frame and the second image frame are non-key frames, abandoning the use of the first image frame and the second image frame for three-dimensional point cloud updating, and taking the three-dimensional point cloud corresponding to the previous image frame of the first image frame and the previous image frame of the second image frame as the three-dimensional point cloud corresponding to the first image frame and the second image frame.
Wherein a key frame is determined as follows: the first image frame or the second image frame is taken as a first key frame or a second key frame and added to the key frame set if it satisfies any of the following conditions: (i) the matching rate of the feature blocks meets a matching threshold; (ii) the position vector between the first image frame or the second image frame and the previous first key frame or previous second key frame is greater than a position threshold; (iii) the ratio of the number of feature blocks shared between the first image frame or the second image frame and the previous first image frame or previous second image frame to the total number of feature blocks on that previous frame is less than a threshold; or (iv) the number of frames between the first image frame or the second image frame and the previous first key frame or previous second key frame is greater than a preset threshold. How each threshold is set is determined according to the specific situation, and the embodiment of the present invention is not limited thereto.
303. The processor 301 executes the binocular camera tracking instruction in the memory 303 to fit three-dimensional points in the three-dimensional point cloud corresponding to the first image frame and the second image frame to two or more planes in a preset three-dimensional coordinate system.
Specifically, the multiple planes can be fitted to the three-dimensional points visible from the first image frame and the second image frame by the random sample consensus (RANSAC) method, or by other iterative fitting methods, which is not limited in the embodiment of the present application. Taking the RANSAC method as an example, the basic idea is as follows: randomly draw a minimal sample from the point set and fit a candidate model; determine which points agree with the model within a preset threshold, forming a consensus set; iterate these steps, continuously keeping the model with the largest consensus set; after the prescribed number of samplings, if a consensus set has been found the algorithm succeeds, and the largest consensus set obtained over all samplings is used to classify inliers and outliers, after which the algorithm terminates.
The preset coordinate system may be a world coordinate system, or may be another coordinate system, which is not limited in this embodiment.
304. The processor 301 executes a binocular camera tracking instruction in the memory 303, and obtains a matching pair composed of feature blocks mapped to three-dimensional points on a first image frame and a second image frame respectively according to a transformation relation between a plane where the three-dimensional points are located and the first image frame and the second image frame, wherein the feature blocks are image blocks taking the corner points as centers.
Specifically, taking the FAST corners of the first image frame and the second image frame as an example, a patch centered on the position of the FAST corner may be extracted as the feature block; the feature block may also be extracted by other patch extraction methods, which is not limited in the embodiment of the present invention. In addition, the embodiment of the present invention does not limit the size of the patch.
The transformation relation between the plane where the three-dimensional point is located and the first image frame and the second image frame can be represented by a transformation matrix, and feature blocks which are not successfully matched can be matched again according to the transformation matrix to obtain more matched pairs.
305. The processor 301 executes the binocular camera tracking instruction in the memory 303, and acquires the pose of the binocular camera according to the three-dimensional point and the acquired matching pair.
Specifically, an Efficient Perspective-n-Point (EPnP) method can be used to calculate the coordinates and orientation of the binocular camera in the preset coordinate system from the three-dimensional points and the two feature blocks of the matching pair on the first image frame and the second image frame corresponding to the three-dimensional points, thereby determining the pose of the binocular camera. An iterative closest point (ICP) method may also be used, in which ICP matches the respective three-dimensional points of two consecutive frames to obtain the relative pose between the two frames; other calculation methods may also be used to calculate the camera pose of the image frame, which is not limited in the embodiment of the present invention.
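As an illustrative sketch rather than the embodiment itself, an EPnP solution can be obtained with OpenCV's solvePnP, assuming at least four 3D-2D correspondences and undistorted, rectified images:

```python
import numpy as np
import cv2

def estimate_pose(points_3d, points_2d, K):
    """Solve (R, T) from Nx3 world points and their Nx2 projections via EPnP
    (EPnP requires N >= 4 correspondences)."""
    ok, rvec, tvec = cv2.solvePnP(points_3d.astype(np.float32),
                                  points_2d.astype(np.float32),
                                  K.astype(np.float32),
                                  distCoeffs=None,          # images assumed undistorted
                                  flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        raise RuntimeError("EPnP failed")
    R, _ = cv2.Rodrigues(rvec)                              # rotation vector -> 3x3 matrix
    return R, tvec.reshape(3)
```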
306. The processor 301 executes depth map restoration instructions in the memory 303 to obtain a depth map based on the pixel values of the feature blocks on the first image frame and the pixel values of the matching blocks of said feature blocks on the second image frame.
In a specific implementation, as an embodiment, the obtaining a depth map according to pixel values of a feature block on the first image frame and pixel values of a matching block of the feature block on the second image frame includes,
transversely dividing the first image frame or the second image frame into N blocks, as shown in fig. 5, a multi-thread accelerated image partitioning schematic diagram according to an embodiment of the present application. In order to prevent large errors at the block boundaries, two adjacent blocks may overlap each other; the specific overlap width is determined according to the specific situation, which is not limited in the embodiment of the present invention.
Respectively calculating depth maps of N parts of the first image frame or the second image frame by adopting different threads; wherein, the N is an integral multiple of the thread number;
and integrating the depth maps of the N parts of the first image frame or the second image frame to obtain a depth map result of the first image frame or the second image frame.
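A minimal sketch of this overlapping-strip parallelization follows; the strip count, overlap width, and the per-strip routine compute_depth are placeholders for whatever stereo routine is actually used.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def depth_by_strips(left, right, compute_depth, n_strips=8, overlap=16):
    """Split the images into N horizontal strips (adjacent strips overlap to
    avoid boundary errors), compute each strip's depth in its own thread,
    and keep only the non-overlapping core of each strip's result."""
    h = left.shape[0]
    base = h // n_strips
    out = np.zeros(left.shape[:2], np.float32)

    def work(i):
        core_top = i * base
        core_bot = h if i == n_strips - 1 else (i + 1) * base
        top = max(core_top - overlap, 0)
        bot = min(core_bot + overlap, h)
        d = compute_depth(left[top:bot], right[top:bot])
        out[core_top:core_bot] = d[core_top - top:core_bot - top]

    with ThreadPoolExecutor() as pool:
        list(pool.map(work, range(n_strips)))
    return out
```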
Specifically, the obtaining a depth map result of the first image frame or the second image frame includes:
On the basis of the semi-global matching (SGM) method, the disparity result map is calculated, with the specific formula:
$$E(D)=\sum_p \Big( C(p,D_p) + \sum_{q \in N_p} P_1\, T[|D_p-D_q|=1] + \sum_{q \in N_p} P_2\, T[|D_p-D_q|>1] \Big)$$
wherein $C(p, D_p)$ is the error energy brought about by the matching of pixel point p, $P_1 T[|D_p-D_q|=1]$ applies a constant penalty term $P_1$ to surrounding pixel points whose disparity differs by 1 from that of pixel point p, and $P_2 T[|D_p-D_q|>1]$ applies a larger constant penalty term $P_2$ to surrounding pixel points whose disparity differs by more than 1 from that of pixel point p. The second and third parts introduce constraint terms, namely the constant penalty terms, into the pixel matching cost in order to ensure the accuracy and smoothness of the disparity result map.
The cost value is then calculated from the computed disparity result.
Specifically, for a pixel point with disparity d, the aggregated cost can be computed by accumulating costs along r directional paths centered on that pixel; the number of directions is not limited in the embodiment of the present invention. A and B in fig. 6 show schematic diagrams of error accumulation over 16 directional paths centered on the pixel point according to the embodiment of the present application, wherein A in fig. 6 shows the minimum path cost of disparity d for pixel point p, and B in fig. 6 shows a schematic diagram of the 16 directional paths of pixel point p.
The cost value corresponding to pixel point p at each disparity is:
$$S(p,d)=\sum_r L_r(p,d)$$
wherein $L_r(p, d)$ denotes the cost value of pixel point p corresponding to disparity d along direction r, which can be calculated as:
$$L_r(p,d)=C(p,d)+\min\Big(L_r(p-r,d),\; L_r(p-r,d-1)+P_1,\; L_r(p-r,d+1)+P_1,\; \min_i L_r(p-r,i)+P_2\Big)-\min_k L_r(p-r,k)$$
wherein $P_1$ and $P_2$ are the constant penalty terms.
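A sketch of this path aggregation for a single direction r = (1, 0) (left to right), assuming a precomputed matching-cost volume C of shape (height, width, disparities), is shown below; in practice the recursion is run for all r directions and the results summed.

```python
import numpy as np

def aggregate_left_to_right(C, P1, P2):
    """SGM recursion along r = (1, 0):
    L_r(p,d) = C(p,d) + min(L_r(p-r,d), L_r(p-r,d-1)+P1,
                            L_r(p-r,d+1)+P1, min_i L_r(p-r,i)+P2)
             - min_k L_r(p-r,k)."""
    h, w, D = C.shape
    L = np.empty((h, w, D), np.float32)
    L[:, 0, :] = C[:, 0, :]                     # paths start at the left border
    for x in range(1, w):
        prev = L[:, x - 1, :]                   # L_r at p - r, shape (h, D)
        prev_min = prev.min(axis=1, keepdims=True)
        minus = np.pad(prev[:, :-1], ((0, 0), (1, 0)),
                       constant_values=np.inf) + P1   # disparity d-1 term
        plus = np.pad(prev[:, 1:], ((0, 0), (0, 1)),
                      constant_values=np.inf) + P1    # disparity d+1 term
        best = np.minimum(np.minimum(prev, minus),
                          np.minimum(plus, prev_min + P2))
        L[:, x, :] = C[:, x, :] + best - prev_min     # subtraction bounds growth
    return L   # sum L over all directions, then argmin over d gives the disparity
```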
307. Processor 301 executes the three-dimensional map reconstruction instructions in memory 303 to generate a reconstructed model from the camera pose and depth map results.
The reconstruction model may be a truncated signed distance function (TSDF) model, or other models may be used, which is not limited in the embodiment of the present invention. Taking the TSDF model as an example of reconstructing the three-dimensional map of the scene: the TSDF model is constructed, and the three-dimensional surface of the scene is extracted from the TSDF model by the ray casting (RayCast) algorithm; the TSDF volume may be compressed by a hash algorithm to save memory, so that reconstruction of larger three-dimensional scenes can be supported.
As shown in D in fig. 2, an exemplary diagram of the effect of generating a reconstruction model to which the embodiment of the present application may be applied is shown.
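For illustration, a minimal dense-grid TSDF fusion sketch is given below (the hashed, compressed variant mentioned above is omitted); the truncation distance, grid layout, and the weighted running-average update rule D <- (w*D + d_new)/(w + 1) are standard assumptions rather than the specific embodiment.

```python
import numpy as np

def integrate_tsdf(tsdf, weight, origin, voxel, depth, K, R, T, trunc=0.05):
    """Fuse one depth map into a dense TSDF volume. tsdf/weight are
    (nx, ny, nz) float arrays, origin the world position of the grid corner,
    voxel the voxel edge length in meters."""
    nx, ny, nz = tsdf.shape
    idx = np.stack(np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                               indexing="ij"), axis=-1).reshape(-1, 3)
    pts = origin + (idx + 0.5) * voxel              # voxel centers in world coords
    cam = R @ pts.T + T[:, None]                    # world -> camera coordinates
    z = cam[2]
    valid = z > 1e-6                                # keep voxels in front of the camera
    uv = np.rint((K @ (cam / np.where(valid, z, 1.0)))[:2]).astype(int)
    h, w = depth.shape
    inside = valid & (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h)
    d_meas = np.zeros_like(z)
    d_meas[inside] = depth[uv[1, inside], uv[0, inside]]
    sdf = d_meas - z                                # signed distance along the ray
    upd = inside & (d_meas > 0) & (sdf > -trunc)    # truncation band and free space
    new = np.clip(sdf / trunc, -1.0, 1.0)
    t, wgt = tsdf.reshape(-1), weight.reshape(-1)   # flat views into the volumes
    t[upd] = (wgt[upd] * t[upd] + new[upd]) / (wgt[upd] + 1.0)
    wgt[upd] += 1.0
```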
In a specific implementation, as an embodiment, as shown in fig. 3, the method further includes:
308. the processor 301 executes the depth map restoration instruction in the memory 303, and performs confidence judgment on the obtained depth map result.
In a specific implementation, as an embodiment, the performing the confidence judgment on the obtained depth map result includes,
defining a set of criterion vectors $C = (C_m, C_c, C_p)$ and a set of threshold vectors $T = (T_m, T_c, T_p)$,
wherein $C_m(x,y) = cost_{min}(x, y, d_1)$ denotes the minimum SAD value of pixel point $(x, y)$; $C_c = 2\,SAD(x,y,d_1) - SAD(x,y,d_1+1) - SAD(x,y,d_1-1)$ denotes the sum of absolute differences (SAD) difference between the adjacent disparities and the current disparity at pixel point $(x, y)$; and
$$C_p(x,y) = \frac{c_2(x,y)}{c_1(x,y)}$$
denotes the ratio of the largest SAD, within a neighborhood of the current depth value, that is smaller than the SAD of the current depth, to the SAD of the current depth, wherein $c_1(x,y) = SAD(x,y,d_1)$, $c_2(x,y) = SAD(x,y,d_2)$, and $c_2 \le c_1$.
Wherein SAD is the sum of absolute errors, for each pixel point in the disparity result map with disparity value $d_1 = D(x_i, y_i)$, over a range $d \in [d_1 - deviation, d_1 + deviation]$:
$$SAD(x,y,d) = \sum_i \big| I_L(x_i, y_i) - I_R(x_i - d, y_i) \big|$$
wherein $I_L(x_i, y_i)$ is the gray value of pixel point i in the left view and $I_R(x_i - d, y_i)$ is the gray value of pixel point i in the right view.
If the criterion vector C(x, y) of pixel point (x, y) is smaller than the threshold vector T(x, y), the confidence of the depth map recovery result at that pixel point is determined to be 1.
As shown in fig. 7, a schematic diagram of a confidence determination result according to an embodiment of the present invention is shown. Wherein A in fig. 7 is the original image frame; B in fig. 7 is the optimal disparity image calculated by the conventional SGM algorithm; C in fig. 7 is the confidence determination result, where white represents pixel positions whose depth map result has high reliability and black represents pixel positions whose depth map result has low reliability; D in fig. 7 is the confidence map obtained from the confidence result and the depth map result.
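A per-pixel sketch of this confidence test is given below, assuming the SAD values over all candidate disparities are available as a 1-D array; the thresholds are placeholders, and the exact form of Cp follows one possible reading of the criterion above (c2 taken as the largest SAD in the neighborhood of d1 that does not exceed c1).

```python
import numpy as np

def depth_confidence(sad, d1, T=(100.0, -1.0, 0.9), deviation=2):
    """Evaluate C = (Cm, Cc, Cp) for one pixel with SAD curve `sad` and
    chosen disparity d1 (assumed interior); return 1 if C < T holds
    component-wise, else 0."""
    Cm = sad[d1]                                    # minimum matching cost
    Cc = 2 * sad[d1] - sad[d1 + 1] - sad[d1 - 1]    # curvature around the minimum
    lo, hi = max(d1 - deviation, 0), min(d1 + deviation + 1, len(sad))
    neigh = np.delete(sad[lo:hi], d1 - lo)          # neighborhood without d1 itself
    c1 = sad[d1]
    under = neigh[neigh <= c1]
    Cp = under.max() / c1 if (under.size and c1 > 0) else 0.0
    return int(np.all(np.array([Cm, Cc, Cp]) < np.array(T)))
```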
309. The processor 301 executes the three-dimensional map reconstruction instruction in the memory 303, and generates a reconstruction model according to the camera pose, the depth map result and the confidence level judgment result.
And if the confidence coefficient judgment result is 0, giving up the depth map result corresponding to the pixel point, and taking the depth map result corresponding to the pixel point with the confidence coefficient judgment result of 1 and the camera pose as the basis for generating the reconstruction model.
As shown in fig. 8, another reconstruction model generation effect diagram to which the embodiment of the present application can be applied is shown. Wherein, a1 and a2 in fig. 8 are a first frame first image frame and a first frame second image frame respectively acquired by a first camera and a second camera, B1 and B2 in fig. 8 are a second frame first image frame and a second frame second image frame respectively acquired by the first camera and the second camera, C1 and C2 in fig. 8 are a third frame first image frame and a third frame second image frame respectively acquired by the first camera and the second camera, D in fig. 8 is a schematic diagram of a motion trajectory and a three-dimensional point cloud of a binocular camera, and E in fig. 8 is a reconstruction model effect diagram generated by combining a camera pose, a depth map result and a confidence coefficient judgment result.
In a specific implementation, as an embodiment, as shown in fig. 9, a flowchart of a camera pose calculation method according to an embodiment of the present invention is shown. The acquiring the pose of the binocular camera according to the three-dimensional points and the matching pairs comprises:
901. The processor 301 executes the binocular camera tracking instruction in the memory 303 to perform sum of squared differences (SSD) matching between the image blocks on the first image frame and the second image frame that are associated with the three-dimensional points and the image blocks of candidate corner points, according to the initial poses of the first image frame and the second image frame, to obtain the matching relation between the three-dimensional points and the first-image-frame two-dimensional coordinates and second-image-frame two-dimensional coordinates; wherein the candidate corner points are corner points visible at the initial coordinates of the first image frame and the second image frame.
The image block associated with the three-dimensional point refers to an image block on the first key frame associated with the three-dimensional point.
The initial poses of the first image frame and the second image frame are as follows:
the camera poses of the frame preceding the first image frame and the frame preceding the second image frame; for example, when the motion between consecutive frames is relatively slow, such as in an indoor handheld scene, these can be regarded as the initial camera poses of the image frames; or
And respectively calculating the initial poses of the first image frame and the second image frame according to the motion model. The embodiment of the invention does not limit the specific motion model.
The initial coordinates of the first image frame and the second image frame are as follows:
and projecting three-dimensional points visible from the first image frame and the second image frame to the first image frame and the second image frame respectively to obtain initial coordinates of the first image frame and the second image frame.
Specifically, the camera poses of the frame preceding the first image frame and the frame preceding the second image frame may be taken as the initial camera poses of the image frames, or the initial camera poses of the image frames may be calculated from a motion model.
902. The processor 301 executes the binocular camera tracking instruction in the memory 303, and determines a target three-dimensional point according to the obtained matching relationship; the target three-dimensional point is a three-dimensional point successfully matched with the two-dimensional coordinates of the first image frame and the two-dimensional coordinates of the second image frame.
903. The processor 301 executes the binocular camera tracking instructions in the memory 303 to calculate the camera pose of the image frame based on the target three-dimensional point and the first image frame two-dimensional coordinates and the second image frame two-dimensional coordinates.
Specifically, according to the matching relation between the three-dimensional points and the first-image-frame two-dimensional coordinates and second-image-frame two-dimensional coordinates, the camera pose of the image frame can be calculated by the Efficient Perspective-n-Point (EPnP) method. The camera pose of the image frame may also be calculated by other methods such as iterative closest point (ICP), which is not limited in the embodiment of the present invention.
In a specific implementation, as an embodiment, as shown in fig. 10, a flowchart of a target three-dimensional point determination method according to an embodiment of the present invention is shown. The determining the target three-dimensional point according to the matching relationship comprises:
1001. The processor 301 executes the binocular camera tracking instruction in the memory 303 to calculate the homography transformation matrices $H_1$ and $H_2$ from the plane to the first image frame and the second image frame according to the plane where the successfully matched three-dimensional points are located.
Specifically, the successfully matched three-dimensional points are grouped according to the planes where they are located, and the homography transformation matrices $H_1$ and $H_2$ from each plane to the first image frame and the second image frame are calculated respectively.
1002. The processor 301 executes the binocular camera tracking instruction in the memory 303 to map the three-dimensional points that were not successfully matched to the first-image-frame two-dimensional coordinates and the second-image-frame two-dimensional coordinates respectively according to the homography transformation matrices $H_1$ and $H_2$, and to iteratively update the successfully matched three-dimensional points, wherein the iterative update may be performed once or multiple times.
Specifically, the unmatched three-dimensional points are mapped to the first-image-frame two-dimensional coordinates and the second-image-frame two-dimensional coordinates respectively according to the homography transformation matrices $H_1$ and $H_2$; the feature blocks around the mapped two-dimensional coordinates are then used as candidate points for feature block search, and the newly matched three-dimensional points are added to the three-dimensional point cloud.
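For illustration, mapping points through a homography, and estimating such a homography from correspondences, can be sketched as follows; the use of OpenCV's findHomography here is an assumption for the sketch, not a statement of the embodiment.

```python
import numpy as np
import cv2

def map_points(H, pts):
    """Map Nx2 points through a 3x3 homography H: homogeneous transform
    followed by perspective division."""
    p = np.hstack([pts, np.ones((len(pts), 1))])
    q = (H @ p.T).T
    return q[:, :2] / q[:, 2:3]

# H1 (plane -> first image frame) can be estimated from >= 4 correspondences:
# H1, _ = cv2.findHomography(plane_pts_2d, left_image_pts_2d, cv2.RANSAC)
```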
1003. The processor 301 executes the binocular camera tracking instruction in the memory 303 to determine whether the number of newly matched three-dimensional points in step 1002 is less than a first threshold.
The first threshold is set according to specific situations, and the embodiment of the present invention is not limited thereto.
If the number of the newly matched three-dimensional points is less than the first threshold, the iteration is stopped, and step 1004 is executed.
If the number of the newly matched three-dimensional points is not less than the first threshold, steps 1001 and 1002 are executed again.
1004. The processor 301 executes the binocular camera tracking instruction in the memory 303 to align the first image frame and the second image frame with the thumbnails of the frame preceding the first image frame and the frame preceding the second image frame, respectively, to obtain the inter-frame transformation matrices $S_1$ and $S_2$.
1005. The processor 301 executes the binocular camera tracking instruction in the memory 303 to project the updated three-dimensional points that are still not successfully matched to the first image frame and the second image frame respectively, according to the inter-frame transformation matrices $S_1$ and $S_2$ and the camera poses of the frame preceding the first image frame and the frame preceding the second image frame, and to determine the target three-dimensional points.
In a specific implementation, as an embodiment, the obtaining a matching relationship between the three-dimensional point and the two-dimensional coordinates of the first image frame and the two-dimensional coordinates of the second image frame further includes:
matching the three-dimensional points which are not successfully matched with the first image frame according to the camera pose of the first image frame, and updating the three-dimensional points which are successfully matched;
projecting the updated successfully matched three-dimensional point to a second image frame;
and calculating the pixel value error of the three-dimensional point in the first image frame and the second image frame matching block, and deleting the corresponding matching relation if the error is greater than a second threshold value. The second threshold is set according to specific situations, and the embodiment of the present invention is not limited thereto.
In a specific implementation, as an embodiment, fig. 11 is a flowchart of a multi-plane fitting method according to an embodiment of the present invention, where fitting three-dimensional points in three-dimensional point clouds corresponding to a first image frame and a second image frame to two or more planes in a preset three-dimensional coordinate system includes:
1101. The processor 301 executes the binocular camera tracking instruction in the memory 303 to randomly select three points $P_1$, $P_2$, $P_3$ from the three-dimensional point cloud.
1102. The processor 301 executes the binocular camera tracking instruction in the memory 303 to calculate a plane equation according to the formula $Ax + By + Cz + D = 0$, wherein $(A, B, C)^T = (P_2 - P_1) \times (P_3 - P_1)$ and $D = -(A, B, C) \cdot P_1$.
1103. The processor 301 executes the binocular camera tracking instruction in the memory 303 to determine whether the distance $d_i$ from each of the other three-dimensional points to the plane is less than a preset threshold d.
The preset threshold d is determined according to a specific scene, and the embodiment of the present invention is not limited thereto.
1104. If $d_i < d$, the processor 301 executes the binocular camera tracking instruction in the memory 303 and takes the three-dimensional point i as an inlier of the plane; if $d_i \ge d$, the three-dimensional point i is not taken as an inlier of the plane.
1105. It is judged whether the number of iterations is greater than a preset maximum iteration threshold $N_{max}$, or whether the number of inliers of the plane is greater than 60% of the total number of the three-dimensional points.
1106. If the number of iterations is greater than the preset maximum iteration threshold $N_{max}$, or the number of inliers of the plane is greater than 60% of the total number of the three-dimensional points, the processor 301 executes the binocular camera tracking instruction in the memory 303, adds the plane to a plane queue, and removes the inliers of the plane from the three-dimensional point cloud; otherwise, steps 1102-1104 are performed again.
Specifically, if the number of iterations is greater than the preset maximum iteration threshold $N_{max}$, or the number of inliers of the plane is greater than 60% of the total number of the three-dimensional points, the best-fitting plane equation is selected and refined using the inliers of the plane so that it is more accurate, the plane equation is added to the plane queue, and the inliers of the plane are simultaneously removed from the point cloud set.
Wherein the maximum iteration threshold $N_{max}$ is determined according to the specific situation, and the embodiment of the present invention is not limited thereto.
After a plane and the inner points of the plane are determined, the next plane is fitted according to the remaining three-dimensional points by adopting the same steps.
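A compact sketch of this iterative multi-plane RANSAC fitting is given below; the distance threshold, maximum iteration count, and the 60% inlier-ratio early exit mirror steps 1101-1106, while the minimum-support stopping rule is an added assumption.

```python
import numpy as np

def fit_planes(points, d_thresh=0.02, n_max=200, ratio=0.6, min_support=30):
    """Repeatedly fit planes Ax+By+Cz+D=0 to a point cloud by RANSAC,
    removing each accepted plane's inliers before fitting the next."""
    planes, pts = [], points.copy()
    while len(pts) >= 3:
        best_inl, best_plane = None, None
        for _ in range(n_max):
            P1, P2, P3 = pts[np.random.choice(len(pts), 3, replace=False)]
            n = np.cross(P2 - P1, P3 - P1)          # (A, B, C)
            norm = np.linalg.norm(n)
            if norm < 1e-9:
                continue                            # degenerate (collinear) sample
            n = n / norm
            D = -n @ P1
            inl = np.abs(pts @ n + D) < d_thresh    # point-to-plane distances
            if best_inl is None or inl.sum() > best_inl.sum():
                best_inl, best_plane = inl, (n, D)
            if inl.sum() > ratio * len(pts):
                break                               # enough inliers: accept early
        if best_inl is None or best_inl.sum() < min_support:
            break                                   # no well-supported plane remains
        planes.append(best_plane)
        pts = pts[~best_inl]                        # remove inliers, fit next plane
    return planes
```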
In a specific implementation, as an embodiment, fig. 12 shows a flowchart of a method for relocating the camera pose according to an embodiment of the present invention. If the change in the number of feature blocks in the first image frame or the second image frame is greater than a preset threshold K1, or the numbers of feature blocks in both the first image frame and the second image frame are less than K2, the method includes:
1201. processor 301 executes binocular camera tracking instructions in memory 303 to create thumbnails of the first image frame or the second image frame and the keyframes in the keyframe queue.
1202. The processor 301 executes the binocular camera tracking instruction in the memory 303 to obtain the matching degree ranking of the first image frame or the second image frame and the key frame through the thumbnail alignment method.
1203. The processor 301 executes the binocular camera tracking instruction in the memory 303 to perform feature block matching between the feature blocks of the key frames ranked highest in matching degree and the first image frame and the second image frame respectively, and to update the successfully matched feature pairs into the feature block matching success set M.
1204. The processor 301 executes the binocular camera tracking instruction in the memory 303 to calculate the camera poses of the first image frame and the second image frame respectively according to the successfully matched feature pairs in the feature block matching success set M.
Specifically, the key frames ranked highest in matching degree are added to the set L; the feature blocks corresponding to the feature tracks contained in each key frame in the set L are matched against the corner points of the image frames, and all successfully matched feature pairs are added to the feature block matching success set M; the camera poses of the first image frame and the second image frame are then calculated according to the successfully matched feature pairs, for example using EPnP. Other calculation methods may also be used, and the embodiment of the present invention is not limited in this respect.
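One plausible sketch of the thumbnail ranking (downscale and blur both images, compare by SSD) is shown below; the thumbnail size and blur kernel are illustrative assumptions.

```python
import numpy as np
import cv2

def rank_keyframes(frame, keyframes, size=(40, 30)):
    """Rank key frames by thumbnail similarity to the current frame; a
    smaller SSD means a higher matching degree."""
    def thumb(img):
        t = cv2.resize(img, size, interpolation=cv2.INTER_AREA)
        return cv2.GaussianBlur(t, (3, 3), 0).astype(np.float32)
    f = thumb(frame)
    scores = [float(((thumb(k) - f) ** 2).sum()) for k in keyframes]
    return np.argsort(scores)      # indices of key frames, best match first
```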
In a specific implementation, as an embodiment, fig. 13 is a flowchart of a loop detection method according to an embodiment of the present application, where the loop detection process specifically includes,
1301. The processor 301 executes the binocular camera tracking instruction in the memory 303 to create a matching vector $V_n(i)$ from the matching pairs, wherein the matching vector $V_n(i)$ indicates the number of feature blocks successfully matched between the i-th key frame and the first image frame and the second image frame, i.e., the degree of overlap between the i-th key frame and the image frames.
Specifically, when the first image frame or the second image frame is determined to be a key frame, a matching vector is created, and before creating the matching vector, since the first image frame or the second image frame and the key frame adjacent to the first image frame or the second image frame in the time domain do not form a loop relationship, the key frame adjacent to the first image frame or the second image frame in the time domain in the set L needs to be removed.
1302. The processor 301 executes the binocular camera tracking instruction in the memory 303 to perform, according to the matching vector $V_n(i)$, feature block matching on the feature blocks of the i-th key frame that were not successfully matched with the first image frame and the second image frame, and to update the set M; wherein the i-th key frame is the key frame with the highest number of feature blocks successfully matched with the first image frame and the second image frame.
1303. The processor 301 executes the binocular camera tracking instruction in the memory 303 to update the matching vector $V_{n+1}(i)$ according to the set M.
Specifically, $V_{n+1}(i)$ is defined in the same way as $V_n(i)$.
1304. It is judged whether $V_{n+1}(i)$ is less than a preset threshold.
1305. If $V_{n+1}(i)$ is less than the preset threshold, the second camera poses of the first image frame and the second image frame are calculated according to the successfully matched feature pairs in the set M.
If $V_{n+1}(i)$ is greater than or equal to the preset threshold, steps 1302-1304 are performed again.
1306. If the offset between the second camera poses of the first image frame and the second image frame and the camera poses of the first image frame and the second image frame is greater than a third threshold, the camera poses of the first image frame and the second image frame are adjusted according to the second camera poses of the first image frame and the second image frame.
In a specific implementation, as an embodiment provided by the embodiments of the present application, the method may further include: according to the following formula,
$$\min \sum_{i,j} \Big( \big\| \pi_{left}(R_j, T_j, X_i) - x_{i,j}^{left} \big\|^2 + \big\| \pi_{right}(R_j, T_j, X_i) - x_{i,j}^{right} \big\|^2 \Big)$$
local bundle adjustment (BA) optimization and global BA optimization are performed on the three-dimensional point cloud and the camera poses of the key frames, and the three-dimensional point cloud and the camera poses of the key frames are updated.
Wherein $X_i$ represents a point in the real three-dimensional space, $\pi_{left}(R_j, T_j, X_i)$ represents the projection point of $X_i$ on the first camera image plane, $\pi_{right}(R_j, T_j, X_i)$ represents the projection point of $X_i$ on the second camera image plane, $(x_{i,j}^{left}, x_{i,j}^{right})$ is a matching pair, and $\|\pi_{left}(R_j, T_j, X_i) - x_{i,j}^{left}\|$ and $\|\pi_{right}(R_j, T_j, X_i) - x_{i,j}^{right}\|$ are respectively the Euclidean distances (i.e., deviations) between the two projection points and the feature blocks, i.e., the three-dimensional point projection errors.
Specifically, when the background of the trajectory tracking and three-dimensional reconstruction device is idle, BA optimization and updating of the three-dimensional point cloud and the camera poses of the key frames can be carried out, obtaining more accurate camera poses.
The BA optimization includes local optimization and global optimization; specifically, local BA optimization is performed once each time a key frame is added, and global BA is performed periodically. How the period is set is determined according to the specific situation, and the embodiment of the present invention is not limited thereto.
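For illustration, the stereo reprojection residual minimized by BA can be sketched as below, with scipy's least_squares as an assumed generic solver and a simplified parameterization (per-frame Rodrigues vector plus translation, followed by the point coordinates); this is not the embodiment's own optimizer.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def ba_residuals(params, n_frames, n_points, obs, K, b):
    """Stacked stereo reprojection errors. obs is a list of tuples
    (frame j, point i, uv_left, uv_right)."""
    poses = params[:6 * n_frames].reshape(n_frames, 6)
    pts = params[6 * n_frames:].reshape(n_points, 3)
    res = []
    for j, i, uv_l, uv_r in obs:
        R, _ = cv2.Rodrigues(poses[j, :3])
        Xl = R @ pts[i] + poses[j, 3:]              # left viewfinder coordinates
        Xr = Xl + np.array([-b, 0.0, 0.0])          # X_right = X_left + (-b,0,0)^T
        res.extend((K @ (Xl / Xl[2]))[:2] - uv_l)   # pi_left residual
        res.extend((K @ (Xr / Xr[2]))[:2] - uv_r)   # pi_right residual
    return np.asarray(res)

# usage sketch: x0 stacks the initial poses and points;
# sol = least_squares(ba_residuals, x0, args=(n_frames, n_points, obs, K, b))
```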
According to the method, the trajectory tracking and three-dimensional reconstruction device of the binocular camera tracking system extracts FAST corners from the left and right views of the target and generates an initial three-dimensional point cloud and key frames; by tracking consecutive frames in a multi-plane fitting manner, the three-dimensional point cloud and the key frame set are continuously updated and the camera pose is solved; a dense depth map is obtained according to the camera pose and the three-dimensional point cloud, confidence judgment is performed on the depth map, and finally a reconstruction model of the three-dimensional scene is obtained. More accurate matching can be obtained in the tracking process, tracking is more stable, and an accurate camera pose is solved; combining the conventional SGM algorithm with confidence analysis of the depth result improves the accuracy of the depth map and of the reconstruction model, making the reconstruction of the three-dimensional map more accurate.
Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical functional division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another device, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units, that is, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially, or in the part contributing to the prior art, or in whole or in part, embodied in the form of a software product, where the software product is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (26)

1. A trajectory tracking and three-dimensional reconstruction method is characterized by comprising the following steps:
extracting angular points of the first image frame and the second image frame; the first image frame is an image frame acquired by a first camera of a binocular camera, and the second image frame is an image frame acquired by a second camera of the binocular camera;
acquiring three-dimensional point clouds corresponding to the first image frame and the second image frame according to the corner points of the first image frame and the second image frame, wherein the three-dimensional point clouds are composed of three-dimensional points;
fitting three-dimensional points in the three-dimensional point clouds corresponding to the first image frame and the second image frame to two or more planes in a preset three-dimensional coordinate system;
obtaining a matching pair consisting of feature blocks mapped to the three-dimensional points on the first image frame and the second image frame respectively according to the transformation relation between the plane where the three-dimensional points are located and the first image frame and the second image frame, wherein the feature blocks are image blocks taking the angular points as centers;
acquiring the pose of the binocular camera according to the three-dimensional points and the matching pairs;
acquiring a depth map according to the pixel values of the feature blocks on the first image frame and the pixel values of the matched blocks of the feature blocks on the second image frame;
and generating a reconstruction model according to the camera pose and depth map result.
2. The trajectory tracking and three-dimensional reconstruction method according to claim 1, further comprising:
performing confidence judgment on the obtained depth map result;
and generating a reconstruction model according to the camera pose, the depth map result and the confidence coefficient judgment result.
3. The trajectory tracking and three-dimensional reconstruction method according to claim 1 or 2, wherein the obtaining three-dimensional point clouds corresponding to the first image frame and the second image frame according to the corner points of the first image frame and the second image frame comprises:
if the first image frame and the second image frame are key frames, constructing pyramid images for the first image frame and the second image frame, searching a matching block of a feature block of the second image frame in the first image frame, and obtaining a matching pair;
triangularizing the matching pairs to generate three-dimensional point cloud;
and if the first image frame and the second image frame are non-key frames, taking the three-dimensional point cloud corresponding to the previous image frame of the first image frame and the previous image frame of the second image frame as the three-dimensional point cloud corresponding to the first image frame and the second image frame.
4. The trajectory tracking and three-dimensional reconstruction method according to claim 3, wherein the obtaining the pose of the binocular camera according to the three-dimensional points and the matching pairs comprises:
carrying out sum of squared differences (SSD) matching between the image blocks on the first image frame and the second image frame that are associated with the three-dimensional points and the image blocks of candidate corner points according to the initial poses of the first image frame and the second image frame, to obtain a matching relation between the three-dimensional points and the first-image-frame two-dimensional coordinates and second-image-frame two-dimensional coordinates; wherein the candidate corner points are corner points visible at initial coordinates of the first and second image frames;
determining a target three-dimensional point according to the matching relation; the target three-dimensional point is a three-dimensional point successfully matched with the two-dimensional coordinates of the first image frame and the two-dimensional coordinates of the second image frame;
and calculating the camera pose of the image frame according to the target three-dimensional point and the two-dimensional coordinates of the first image frame and the two-dimensional coordinates of the second image frame.
5. The trajectory tracking and three-dimensional reconstruction method according to claim 4, wherein the determining the target three-dimensional point according to the matching relationship comprises:
calculating homography transformation matrices $H_1$ and $H_2$ from the plane to the first image frame and the second image frame according to the plane where the successfully matched three-dimensional points are located;
mapping the three-dimensional points which are not successfully matched to the first-image-frame two-dimensional coordinates and the second-image-frame two-dimensional coordinates respectively according to the homography transformation matrices $H_1$ and $H_2$, and iteratively updating the successfully matched three-dimensional points, wherein the iterative updating is performed once or multiple times;
if the number of newly matched three-dimensional points is less than a first threshold, stopping the iteration; aligning the first image frame and the second image frame respectively with thumbnails of the frame preceding the first image frame and the frame preceding the second image frame to obtain inter-frame transformation matrices $S_1$ and $S_2$;
projecting the updated three-dimensional points that are still not successfully matched to the first image frame and the second image frame respectively, according to the inter-frame transformation matrices $S_1$ and $S_2$ and the camera poses of the frame preceding the first image frame and the frame preceding the second image frame, and determining the target three-dimensional points.
6. The trajectory tracking and three-dimensional reconstruction method according to claim 4 or 5, wherein the obtaining of the matching relationship between the three-dimensional point and the two-dimensional coordinates of the first image frame and the two-dimensional coordinates of the second image frame comprises,
matching the three-dimensional points which are not successfully matched to the first image frame according to the camera pose of the first image frame, and updating the three-dimensional points which are successfully matched;
projecting the updated successfully matched three-dimensional point to the second image frame;
and calculating pixel value errors of the three-dimensional points in the first image frame and the second image frame matching blocks, and deleting the matching relation if the errors are larger than a second threshold value.
7. The trajectory tracking and three-dimensional reconstruction method according to claim 4 or 5, wherein the initial poses of the first image frame and the second image frame are:
camera pose of a frame preceding the first image frame and a frame preceding the second image frame, or
Respectively calculating initial poses of the first image frame and the second image frame according to a motion model;
the initial coordinates of the first image frame and the second image frame are as follows:
projecting three-dimensional points visible from the first image frame and the second image frame to the first image frame and the second image frame respectively to obtain initial coordinates of the first image frame and the second image frame.
8. The trajectory tracking and three-dimensional reconstruction method according to any one of claims 1-2 and 4-5, wherein the fitting of the three-dimensional points in the three-dimensional point cloud corresponding to the first image frame and the second image frame to two or more planes in a preset three-dimensional coordinate system comprises:
fitting the three-dimensional points to two or more planes in a preset three-dimensional coordinate system by a random sample consensus (RANSAC) method according to the formula $Ax + By + Cz + D = 0$;
wherein $(A, B, C)^T = (P_2 - P_1) \times (P_3 - P_1)$, $P_1$, $P_2$, $P_3$ are three points randomly selected from the three-dimensional point cloud, and $D = -(A, B, C) \cdot P_1$.
9. The trajectory tracking and three-dimensional reconstruction method according to any one of claims 1-2 and 4-5, wherein if the difference between the number of feature blocks in the first image frame or the second image frame is greater than a preset threshold K1, or the number of feature blocks in both the first image frame and the second image frame is less than K2, the method further comprises:
creating thumbnails of the first image frame or the second image frame and the key frames in the key frame queue;
obtaining the matching degree ranking of the first image frame or the second image frame and the key frame through a thumbnail alignment method;
and respectively carrying out feature block matching on the feature blocks of the key frames with high matching degree and the first image frame and the second image frame, updating the feature pairs successfully matched to a feature block matching successful set M, and respectively calculating the camera poses of the first image frame and the second image frame according to the feature pairs successfully matched in the feature block matching successful set M.
10. The trajectory tracking and three-dimensional reconstruction method according to claim 9, wherein the obtaining the pose of the binocular camera according to the three-dimensional points and the matching pairs further comprises:
creating a matching vector $V_n(i)$ from the matching pairs, the matching vector $V_n(i)$ being used for indicating the number of feature blocks successfully matched between the i-th key frame and the first image frame and the second image frame;
performing, according to the matching vector $V_n(i)$, feature block matching on the feature blocks of the i-th key frame which are not successfully matched with the first image frame and the second image frame, and updating the set M; wherein the i-th key frame is the key frame with the highest number of feature blocks successfully matched with the first image frame and the second image frame;
calculating a second camera pose of the first image frame and the second image frame according to the successfully matched feature pairs in the set M;
if the offset between the second camera poses corresponding to the first image frame and the second image frame and the camera poses of the first image frame and the second image frame is greater than a third threshold, adjusting the camera poses of the first image frame and the second image frame according to the second camera poses of the first image frame and the second image frame.
11. The trajectory tracking and three-dimensional reconstruction method according to any one of claims 1-2 and 4-5, wherein the obtaining a depth map according to pixel values of a feature block on the first image frame and pixel values of a matching block of the feature block on the second image frame comprises:
transversely dividing the first image frame or the second image frame into N blocks;
respectively calculating depth maps of N parts of the first image frame or the second image frame by adopting different threads; wherein, the N is an integral multiple of the thread number;
and integrating the depth maps of the N parts of the first image frame or the second image frame to obtain a depth map result of the first image frame or the second image frame.
12. The method of claim 11, wherein the determining a confidence level of the obtained depth map result comprises,
defining a set of criterion vectors $C = (C_m, C_c, C_p)$ and a set of threshold vectors $T = (T_m, T_c, T_p)$, wherein $C_m$ represents the minimum SAD value of pixel point $(x, y)$; $C_c = 2\,SAD(x,y,d_1) - SAD(x,y,d_1+1) - SAD(x,y,d_1-1)$ represents the sum of absolute differences (SAD) difference between the adjacent disparities and the current disparity at pixel point $(x, y)$; and $C_p$ represents the ratio of the largest SAD, within a neighborhood of the current depth value, that is smaller than the SAD of the current depth, to the SAD of the current depth;
and if the criterion vector C (x, y) of the pixel point (x, y) is smaller than the threshold vector T (x, y), determining that the confidence of the depth map result of the pixel point is 1.
13. A trajectory tracking and three-dimensional reconstruction system, comprising:
the binocular camera is used for extracting angular points of the first image frame and the second image frame; the first image frame is an image frame acquired by a first camera of a binocular camera, and the second image frame is an image frame acquired by a second camera of the binocular camera;
the track tracking and three-dimensional reconstruction device is used for acquiring three-dimensional point clouds corresponding to the first image frame and the second image frame according to the corner points of the first image frame and the second image frame, and the three-dimensional point clouds are composed of three-dimensional points;
fitting three-dimensional points in the three-dimensional point clouds corresponding to the first image frame and the second image frame to two or more planes in a preset three-dimensional coordinate system;
obtaining a matching pair consisting of feature blocks mapped to the three-dimensional points on the first image frame and the second image frame respectively according to the transformation relation between the plane where the three-dimensional points are located and the first image frame and the second image frame, wherein the feature blocks are image blocks taking the angular points as centers;
acquiring the pose of the binocular camera according to the three-dimensional points and the matching pairs;
acquiring a depth map according to the pixel values of the feature blocks on the first image frame and the pixel values of the matched blocks of the feature blocks on the second image frame;
and generating a reconstruction model according to the camera pose and depth map result.
14. The trajectory tracking and three-dimensional reconstruction system of claim 13, wherein the trajectory tracking and three-dimensional reconstruction device is further configured to:
performing confidence judgment on the obtained depth map result;
and generating a reconstruction model according to the camera pose, the depth map result and the confidence coefficient judgment result.
15. The trajectory tracking and three-dimensional reconstruction system according to claim 13 or 14, wherein the acquiring three-dimensional point clouds corresponding to the first image frame and the second image frame according to corner points of the first image frame and the second image frame comprises,
if the first image frame and the second image frame are key frames, constructing pyramid images for the first image frame and the second image frame, searching a matching block of a feature block of the second image frame in the first image frame, and obtaining a matching pair;
triangularizing the matching pairs to generate three-dimensional point cloud;
and if the first image frame and the second image frame are non-key frames, taking the three-dimensional point cloud corresponding to the previous image frame of the first image frame and the previous image frame of the second image frame as the three-dimensional point cloud corresponding to the first image frame and the second image frame.
16. The trajectory tracking and three-dimensional reconstruction system of claim 15, wherein the obtaining the pose of the binocular camera according to the three-dimensional points and the matching pairs comprises:
carrying out sum of squared differences (SSD) matching between the image blocks on the first image frame and the second image frame that are associated with the three-dimensional points and the image blocks of candidate corner points according to the initial poses of the first image frame and the second image frame, to obtain a matching relation between the three-dimensional points and the first-image-frame two-dimensional coordinates and second-image-frame two-dimensional coordinates; wherein the candidate corner points are corner points visible at initial coordinates of the first and second image frames;
determining a target three-dimensional point according to the matching relation; the target three-dimensional point is a three-dimensional point successfully matched with the two-dimensional coordinates of the first image frame and the two-dimensional coordinates of the second image frame;
and calculating the camera pose of the image frame according to the target three-dimensional point and the two-dimensional coordinates of the first image frame and the two-dimensional coordinates of the second image frame.
17. The trajectory tracking and three-dimensional reconstruction system of claim 16, wherein said determining a target three-dimensional point according to said matching relationship comprises:
calculating homography transformation matrices $H_1$ and $H_2$ from the plane to the first image frame and the second image frame according to the plane where the successfully matched three-dimensional points are located;
mapping the three-dimensional points which are not successfully matched to the first-image-frame two-dimensional coordinates and the second-image-frame two-dimensional coordinates respectively according to the homography transformation matrices $H_1$ and $H_2$, and iteratively updating the successfully matched three-dimensional points, wherein the iterative updating is performed once or multiple times;
if the number of newly matched three-dimensional points is less than a first threshold, stopping the iteration; aligning the first image frame and the second image frame respectively with thumbnails of the frame preceding the first image frame and the frame preceding the second image frame to obtain inter-frame transformation matrices $S_1$ and $S_2$;
projecting the updated three-dimensional points that are still not successfully matched to the first image frame and the second image frame respectively, according to the inter-frame transformation matrices $S_1$ and $S_2$ and the camera poses of the frame preceding the first image frame and the frame preceding the second image frame, and determining the target three-dimensional points.
18. The trajectory tracking and three-dimensional reconstruction system of claim 16 or 17, wherein the obtaining of the matching relationship between the three-dimensional points and the two-dimensional coordinates of the first image frame and the two-dimensional coordinates of the second image frame comprises:
matching the three-dimensional points that have not been successfully matched to the first image frame according to the camera pose of the first image frame, and updating the set of successfully matched three-dimensional points;
projecting the updated successfully matched three-dimensional points onto the second image frame;
and calculating the pixel-value error between the matching blocks of each three-dimensional point in the first image frame and the second image frame, and deleting the matching relation if the error is greater than a second threshold.
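A minimal reading of the photometric cross-check in claim 18; the error measure (mean absolute difference over the block) and the threshold semantics are our assumptions.

```python
import numpy as np

def cross_check(block_first, block_second, second_threshold):
    """Photometric consistency check between the matching blocks of a
    projected 3D point in the first and second image frames. Returns
    False when the match should be deleted (error above the threshold)."""
    err = np.mean(np.abs(block_first.astype(np.float32)
                         - block_second.astype(np.float32)))
    return err <= second_threshold
```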
19. The trajectory tracking and three-dimensional reconstruction system according to claim 16 or 17, wherein the initial poses of the first image frame and the second image frame are:
the camera poses of the frame preceding the first image frame and the frame preceding the second image frame, or
initial poses calculated for the first image frame and the second image frame, respectively, according to a motion model;
and the initial coordinates of the first image frame and the second image frame are:
the coordinates obtained by projecting the three-dimensional points visible to the first image frame and the second image frame onto the first image frame and the second image frame, respectively.
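The claim leaves the motion model unspecified; a constant-velocity model is a common choice, and a sketch under that assumption follows.

```python
import numpy as np

def predict_initial_pose(T_prev, T_prev2):
    """Constant-velocity motion model: extrapolate the previous
    inter-frame motion one step forward. Poses are 4x4 rigid
    transforms (world -> camera); T_prev is the pose of frame k-1,
    T_prev2 that of frame k-2."""
    delta = T_prev @ np.linalg.inv(T_prev2)  # motion from k-2 to k-1
    return delta @ T_prev                    # predicted pose for frame k
```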
20. The trajectory tracking and three-dimensional reconstruction system according to any one of claims 13-14 and 16-17, wherein the fitting of three-dimensional points in the three-dimensional point cloud corresponding to the first image frame and the second image frame to two or more planes in a predetermined three-dimensional coordinate system comprises:
fitting the three-dimensional points to two or more planes in the preset three-dimensional coordinate system using a random sample consensus (RANSAC) method according to the plane equation Ax + By + Cz + D = 0;
wherein (A, B, C)^T = (P2 − P1) × (P3 − P1), with P1, P2 and P3 being three points randomly selected from the three-dimensional point cloud, and D = −(A, B, C) · P1.
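A minimal RANSAC plane-fitting sketch mirroring the construction of claim 20; the iteration count and inlier threshold are illustrative assumptions.

```python
import numpy as np

def ransac_plane(points, iters=200, inlier_thresh=0.02):
    """Fit a plane Ax + By + Cz + D = 0 to an Nx3 point cloud with
    RANSAC: the normal is the cross product of two edge vectors of a
    randomly chosen point triple, as in claim 20."""
    best_plane, best_inliers = None, None
    for _ in range(iters):
        p1, p2, p3 = points[np.random.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)  # (A, B, C)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) triple
            continue
        normal /= norm
        D = -normal @ p1                     # D = -(A, B, C) . P1
        dist = np.abs(points @ normal + D)   # point-to-plane distances
        inliers = dist < inlier_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_plane, best_inliers = (*normal, D), inliers
    return best_plane, best_inliers
```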
21. The trajectory tracking and three-dimensional reconstruction system according to any one of claims 13-14 and 16-17, wherein if the difference in the number of feature blocks of the first image frame or the second image frame is greater than a preset threshold K1, or the number of feature blocks in both the first image frame and the second image frame is less than K2, the method further comprises:
creating thumbnails of the first image frame or the second image frame and of the key frames in the key frame queue;
obtaining a ranking of the matching degree between the first image frame or the second image frame and the key frames by a thumbnail alignment method;
and performing feature block matching between the feature blocks of the key frames with high matching degree and the first image frame and the second image frame, respectively, updating the successfully matched feature pairs into a set M of successful feature block matches, and calculating the camera poses of the first image frame and the second image frame, respectively, according to the successfully matched feature pairs in the set M.
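The thumbnail alignment ranking of claim 21 could, for example, score downsampled frames by SAD; the thumbnail size and scoring function in this sketch are assumptions, and the patent's alignment step may differ in detail.

```python
import numpy as np
import cv2

def rank_keyframes_by_thumbnail(frame, keyframes, size=(40, 30)):
    """Rank keyframes by how well their thumbnails match the current
    frame; lower SAD over downsampled images = better match.
    Returns keyframe indices ordered from best to worst."""
    thumb = cv2.resize(frame, size).astype(np.float32)
    scores = [np.abs(thumb - cv2.resize(kf, size).astype(np.float32)).sum()
              for kf in keyframes]
    return np.argsort(scores)
```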
22. The trajectory tracking and three-dimensional reconstruction system of claim 21, wherein said obtaining the pose of the binocular camera from the three-dimensional points and the matching pairs comprises:
creating a matching vector Vn(i) from the matching pairs, the matching vector Vn(i) indicating the number of feature blocks of the i-th key frame that are successfully matched with the first image frame and the second image frame;
performing, according to the matching vector Vn(i), feature block matching between the feature blocks of the i-th key frame that have not been successfully matched and the first image frame and the second image frame, and updating the set M; wherein the i-th key frame is the key frame with the highest number of feature blocks successfully matched with the first image frame and the second image frame;
calculating second camera poses of the first image frame and the second image frame according to the successfully matched feature pairs in the set M;
and if the offset between the second camera poses of the first image frame and the second image frame and the camera poses of the first image frame and the second image frame is greater than a third threshold, adjusting the camera poses of the first image frame and the second image frame according to the second camera poses of the first image frame and the second image frame.
23. The trajectory tracking and three-dimensional reconstruction system according to any one of claims 13-14 and 16-17, wherein said obtaining a depth map from pixel values of a feature block on the first image frame and pixel values of a matching block of the feature block on the second image frame comprises:
transversely dividing the first image frame or the second image frame into N blocks;
calculating the depth maps of the N parts of the first image frame or the second image frame using different threads, respectively; wherein N is an integer multiple of the number of threads;
and integrating the depth maps of the N parts of the first image frame or the second image frame to obtain a depth map result of the first image frame or the second image frame.
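A sketch of the strip-parallel depth computation in claim 23; compute_strip stands in for any per-strip stereo depth routine and is hypothetical (a real block matcher would also need a few border rows of context around each strip).

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def depth_map_in_strips(left, right, compute_strip,
                        n_threads=4, n_strips=8):
    """Split the image into N horizontal strips (N a multiple of the
    thread count, as in claim 23), compute each strip's depth on a
    worker thread, and stitch the results back together."""
    assert n_strips % n_threads == 0
    h = left.shape[0]
    bounds = [(i * h // n_strips, (i + 1) * h // n_strips)
              for i in range(n_strips)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        strips = list(pool.map(
            lambda b: compute_strip(left, right, b[0], b[1]), bounds))
    return np.vstack(strips)  # strips come back in submission order
```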
24. The trajectory tracking and three-dimensional reconstruction system of claim 22, wherein the confidence determination of the obtained depth map result comprises:
defining a criterion vector C = (Cm, Cc, Cp) and a threshold vector T = (Tm, Tc, Tp); wherein Cm represents the sum-of-absolute-differences (SAD) value at pixel (x, y); Cc = 2·SAD(x, y, d1) − SAD(x, y, d1+1) − SAD(x, y, d1−1) represents the difference between the SADs of the disparities adjacent to the current disparity d1 and the SAD of the current disparity at pixel (x, y); and Cp represents the ratio of the largest SAD within a neighborhood of the current depth value that remains below the SAD of the current depth, to the SAD of the current depth;
and if the criterion vector C(x, y) of a pixel (x, y) is less than the threshold vector T(x, y) in every component, determining the confidence of the depth map result at that pixel to be 1.
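One plausible reading of the confidence test in claim 24, with the caveat that the claim's definition of Cp is ambiguous and the distinctiveness ratio below is only one interpretation; the thresholds Tm, Tc, Tp are assumed to be given.

```python
import numpy as np

def depth_confidence(sad, d1, Tm, Tc, Tp):
    """Confidence test for one pixel. `sad` holds the SAD score for each
    candidate disparity, d1 is the chosen (minimum-SAD) disparity, and
    0 < d1 < len(sad) - 1 is assumed. Returns 1 if every criterion is
    below its threshold, following the claim's C < T rule, else 0."""
    Cm = sad[d1]                                  # SAD of the chosen disparity
    Cc = 2 * sad[d1] - sad[d1 + 1] - sad[d1 - 1]  # curvature at the minimum
    Cp = sad[d1] / np.delete(sad, d1).min()       # one reading of the ratio Cp
    return int(Cm < Tm and Cc < Tc and Cp < Tp)
```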
25. An apparatus for trajectory tracking and three-dimensional reconstruction, the apparatus comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the trajectory tracking and three-dimensional reconstruction method according to any one of claims 1 to 12.
26. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the trajectory tracking and three-dimensional reconstruction method according to any one of claims 1 to 12.
CN201811290448.6A 2018-10-31 2018-10-31 Method, system and device for tracking trajectory and reconstructing three-dimensional image Pending CN111127524A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811290448.6A CN111127524A (en) 2018-10-31 2018-10-31 Method, system and device for tracking trajectory and reconstructing three-dimensional image

Publications (1)

Publication Number Publication Date
CN111127524A true CN111127524A (en) 2020-05-08

Family

ID=70494445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811290448.6A Pending CN111127524A (en) 2018-10-31 2018-10-31 Method, system and device for tracking trajectory and reconstructing three-dimensional image

Country Status (1)

Country Link
CN (1) CN111127524A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140241576A1 (en) * 2013-02-28 2014-08-28 Electronics And Telecommunications Research Institute Apparatus and method for camera tracking
CN104915965A (en) * 2014-03-14 2015-09-16 华为技术有限公司 Camera tracking method and device
US20160379375A1 (en) * 2014-03-14 2016-12-29 Huawei Technologies Co., Ltd. Camera Tracking Method and Apparatus
US20160086336A1 (en) * 2014-09-19 2016-03-24 Qualcomm Incorporated System and method of pose estimation
CN105157609A (en) * 2015-09-01 2015-12-16 大连理工大学 Two-sets-of-camera-based global morphology measurement method of large parts
CN106556412A (en) * 2016-11-01 2017-04-05 哈尔滨工程大学 The RGB D visual odometry methods of surface constraints are considered under a kind of indoor environment
CN106873619A (en) * 2017-01-23 2017-06-20 上海交通大学 A kind of processing method in unmanned plane during flying path
CN106910222A (en) * 2017-02-15 2017-06-30 中国科学院半导体研究所 Face three-dimensional rebuilding method based on binocular stereo vision
CN108364344A (en) * 2018-02-08 2018-08-03 重庆邮电大学 A kind of monocular real-time three-dimensional method for reconstructing based on loopback test
CN108564616A (en) * 2018-03-15 2018-09-21 中国科学院自动化研究所 Method for reconstructing three-dimensional scene in the rooms RGB-D of fast robust
CN108520559A (en) * 2018-04-04 2018-09-11 西安因诺航空科技有限公司 A method of the unmanned plane location navigation based on binocular vision

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HONGYANG CHEN et al.: "Registration of 3D Reconstruction Point Clouds Based On The Assistant Cameras Array System", 2015 8th International Symposium on Computational Intelligence and Design *
DU YUNAN: "Research on 3D Model Reconstruction Technology for Indoor Environments of Mobile Robots Based on Laser and Vision", China Master's Theses Full-text Database (Information Science and Technology) *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111830966A (en) * 2020-06-04 2020-10-27 深圳市无限动力发展有限公司 Corner recognition and cleaning method, device and storage medium
CN111830966B (en) * 2020-06-04 2023-12-19 深圳市无限动力发展有限公司 Corner recognition and cleaning method, device and storage medium
CN111798536A (en) * 2020-06-15 2020-10-20 北京三快在线科技有限公司 Method and device for constructing positioning map
CN111798536B (en) * 2020-06-15 2024-03-22 北京三快在线科技有限公司 Construction method and device of positioning map
CN111784842A (en) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 Three-dimensional reconstruction method, device, equipment and readable storage medium
CN112102475B (en) * 2020-09-04 2023-03-07 西北工业大学 Space target three-dimensional sparse reconstruction method based on image sequence trajectory tracking
CN112102475A (en) * 2020-09-04 2020-12-18 西北工业大学 Space target three-dimensional sparse reconstruction method based on image sequence trajectory tracking
CN112085845A (en) * 2020-09-11 2020-12-15 中国人民解放军军事科学院国防科技创新研究院 Outdoor scene rapid three-dimensional reconstruction device based on unmanned aerial vehicle image
CN112085845B (en) * 2020-09-11 2021-03-19 中国人民解放军军事科学院国防科技创新研究院 Outdoor scene rapid three-dimensional reconstruction device based on unmanned aerial vehicle image
CN112085844A (en) * 2020-09-11 2020-12-15 中国人民解放军军事科学院国防科技创新研究院 Unmanned aerial vehicle image rapid three-dimensional reconstruction method for field unknown environment
CN112232274A (en) * 2020-11-03 2021-01-15 支付宝(杭州)信息技术有限公司 Depth image model training method and device
CN112861609A (en) * 2020-12-30 2021-05-28 中国电子科技集团公司信息科学研究院 Method for improving multi-thread content key frame identification efficiency
CN112861609B (en) * 2020-12-30 2024-04-09 中国电子科技集团公司信息科学研究院 Multithreading content key frame identification efficiency improvement method
CN112837424A (en) * 2021-02-04 2021-05-25 脸萌有限公司 Image processing method, device, equipment and computer readable storage medium
CN112837424B (en) * 2021-02-04 2024-02-06 脸萌有限公司 Image processing method, apparatus, device and computer readable storage medium
CN113256718A (en) * 2021-05-27 2021-08-13 浙江商汤科技开发有限公司 Positioning method and device, equipment and storage medium
CN115205461A (en) * 2022-07-15 2022-10-18 小米汽车科技有限公司 Scene reconstruction method and device, readable storage medium and vehicle
CN115205461B (en) * 2022-07-15 2023-11-14 小米汽车科技有限公司 Scene reconstruction method and device, readable storage medium and vehicle

Similar Documents

Publication Publication Date Title
CN111127524A (en) Method, system and device for tracking trajectory and reconstructing three-dimensional image
WO2020259248A1 (en) Depth information-based pose determination method and device, medium, and electronic apparatus
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
Tanskanen et al. Live metric 3D reconstruction on mobile phones
US10659768B2 (en) System and method for virtually-augmented visual simultaneous localization and mapping
Ventura et al. Wide-area scene mapping for mobile visual tracking
Chen et al. Rise of the indoor crowd: Reconstruction of building interior view via mobile crowdsourcing
Arth et al. Real-time self-localization from panoramic images on mobile devices
Ventura et al. Global localization from monocular slam on a mobile phone
WO2019205852A1 (en) Method and apparatus for determining pose of image capture device, and storage medium therefor
JP5722502B2 (en) Planar mapping and tracking for mobile devices
KR102367361B1 (en) Location measurement and simultaneous mapping method and device
WO2015135323A1 (en) Camera tracking method and device
CN108519102B (en) Binocular vision mileage calculation method based on secondary projection
WO2017022033A1 (en) Image processing device, image processing method, and image processing program
KR20150013709A (en) A system for mixing or compositing in real-time, computer generated 3d objects and a video feed from a film camera
WO2012166329A1 (en) Real-time self-localization from panoramic images
CN111833447A (en) Three-dimensional map construction method, three-dimensional map construction device and terminal equipment
CN112541973B (en) Virtual-real superposition method and system
WO2023005457A1 (en) Pose calculation method and apparatus, electronic device, and readable storage medium
CN105809664B (en) Method and device for generating three-dimensional image
Zhao et al. RTSfM: Real-time structure from motion for mosaicing and DSM mapping of sequential aerial images with low overlap
US8509522B2 (en) Camera translation using rotation from device
TW202244680A (en) Pose acquisition method, electronic equipment and storage medium
Zhu et al. PairCon-SLAM: Distributed, online, and real-time RGBD-SLAM in large scenarios

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200508