WO2019170166A1 - Depth camera calibration method and apparatus, electronic device, and storage medium - Google Patents

Depth camera calibration method and apparatus, electronic device, and storage medium

Info

Publication number
WO2019170166A1
Authority
WO
WIPO (PCT)
Prior art keywords
camera
pose
frame
label
tag
Prior art date
Application number
PCT/CN2019/085515
Other languages
French (fr)
Chinese (zh)
Inventor
方璐
苏卓
韩磊
戴琼海
Original Assignee
清华-伯克利深圳学院筹备办公室
Priority date
Filing date
Publication date
Application filed by 清华-伯克利深圳学院筹备办公室 filed Critical 清华-伯克利深圳学院筹备办公室
Publication of WO2019170166A1 publication Critical patent/WO2019170166A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Definitions

  • Embodiments of the present invention relate to machine vision technology, for example, to a depth camera calibration method and apparatus, an electronic device, and a storage medium.
  • RGB-D cameras (i.e., depth cameras) combine a traditional RGB camera with a depth sensor, and have the advantages of high precision, small size, passivity, and rich information.
  • a camera calibration method based on a specific calibration object is generally adopted, which is relatively cumbersome, and requires a field of view between adjacent cameras to have a large overlapping range, and cannot adapt to a situation in which the field of view overlap is small.
  • the calibration method based on Simultaneous Localization And Mapping (SLAM) uses the camera to move according to the preset trajectory to capture images, and then offline processing the image for calibration. It is not possible to quickly calibrate 360° panoramic RGB-D cameras online.
  • the embodiment of the invention provides a depth camera calibration method and device, an electronic device and a storage medium, so as to realize real-time online panoramic multi-camera self-calibration without human intervention.
  • an embodiment of the present invention provides a depth camera calibration method, including: controlling at least two depth cameras in a panoramic depth camera system to synchronously acquire images during motion, where each depth camera is provided with a corresponding label; acquiring a pose of at least one first-label camera when acquiring a plurality of frame images; and, in a case where a second-label camera has a historical-view overlap with the first-label camera, determining a relative pose of the second-label camera and the first-label camera at a same moment according to the images corresponding to the historical-view overlap and the poses of the first-label camera.
  • an embodiment of the present invention further provides a depth camera calibration apparatus, including:
  • a camera control module configured to control at least two depth cameras in the panoramic depth camera system to synchronously acquire images during motion, wherein each depth camera is provided with a label;
  • a pose obtaining module configured to acquire a pose of the at least one first label camera when acquiring a plurality of frame images
  • a relative pose calculation module configured to, in a case where a second-label camera has a historical-view overlap with the first-label camera, determine a relative pose of the second-label camera and the first-label camera at a same moment according to the images corresponding to the historical-view overlap and the poses of the first-label camera when acquiring each frame image.
  • an embodiment of the present invention further provides an electronic device, including:
  • one or more processors;
  • a memory set to store one or more programs
  • a panoramic depth camera system comprising at least two depth cameras, the at least two depth cameras covering a panoramic field of view for acquiring images
  • the one or more programs are executed by the one or more processors such that the one or more processors implement a depth camera calibration method as described in any of the embodiments herein.
  • the embodiment of the present invention further provides a computer readable storage medium, where a computer program is stored, and when the program is executed by the processor, the depth camera calibration method according to any embodiment of the present application is implemented.
  • FIG. 1 is a flowchart of a depth camera calibration method according to Embodiment 1 of the present invention.
  • FIG. 2 is a schematic diagram of a panoramic depth camera system according to Embodiment 1 of the present invention.
  • FIG. 3 is a flowchart of a depth camera calibration method according to Embodiment 2 of the present invention.
  • FIG. 4 is a flowchart of a depth camera calibration method according to Embodiment 3 of the present invention.
  • FIG. 5 is a flowchart of obtaining a first label camera pose according to Embodiment 4 of the present invention.
  • FIG. 6 is a structural block diagram of a depth camera calibration apparatus according to Embodiment 5 of the present invention.
  • FIG. 7 is a schematic structural diagram of an electronic device according to Embodiment 6 of the present invention.
  • FIG. 1 is a flowchart of a depth camera calibration method according to Embodiment 1 of the present invention.
  • the present embodiment is applicable to a multi-camera self-calibration.
  • the calibration in this embodiment refers to calculating a relative pose between cameras.
  • the method may be performed by a depth camera calibration device or an electronic device, and the depth camera calibration device may be implemented by software and/or hardware, for example by a central processing unit (CPU) that controls and calibrates the cameras. Further, the device can be integrated in a portable mobile electronic device. As shown in FIG. 1, the method may include steps S101, S102, and S103.
  • In step S101, at least two depth cameras in the panoramic depth camera system are controlled to synchronously acquire images during motion.
  • each depth camera is set with a corresponding label.
  • the panoramic depth camera system (which may be simply referred to as a camera system) includes at least two depth cameras (RGB-D cameras) that cover a 360-degree panoramic field of view.
  • the size of the camera and the number of cameras to be used can be selected according to specific needs, and multiple cameras can be fixed on the platform (for example, components of rigid structure) according to the size of the camera and the number of cameras to meet the field of view coverage.
  • the number of cameras required is determined according to the field of view of the single camera and the required field of view. The sum of the fields of view of all cameras needs to be larger than the required field of view.
  • a single camera with the same viewing angle is used.
  • Exemplarily, six cameras can be used, in consideration of the vertical field of view. Depending on the size and number of the cameras used, the cameras can be laid out reasonably; exemplarily, for cameras with a length of 10 to 15 cm, a width of 3 to 5 cm, a height of 3 to 5 cm, and an RGB resolution of 640×480, a regular hexagonal prism with a base edge length of 5 cm is selected as the shaft, and each camera is fixed on the shaft by a fixing member with its lens facing outward, as shown in FIG. 2.
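  • The required camera count follows from dividing the required coverage by the per-camera horizontal field of view. A minimal sketch of this layout calculation follows; the 60° horizontal field of view is an assumed value chosen to reproduce the six-camera hexagonal layout described above.

```python
import math

def cameras_needed(horizontal_fov_deg: float, required_fov_deg: float = 360.0) -> int:
    """Smallest number of identical cameras whose summed horizontal
    fields of view cover the required field of view."""
    return math.ceil(required_fov_deg / horizontal_fov_deg)

# Assumed example: 60-degree cameras on a regular hexagonal prism,
# one per face, give full 360-degree horizontal coverage.
print(cameras_needed(60.0))  # -> 6
```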
  • adjacent cameras may have no overlap of views or a small overlap of views, for example, one or two degrees of view overlap.
  • the depth camera in the panoramic depth camera system synchronously acquires images
  • the synchronization mode may be hardware synchronization or software synchronization.
  • hardware synchronization refers to using a signal (such as a rising edge signal) to simultaneously trigger all cameras to capture images at the same time
  • software synchronization refers to time-stamping the images captured by the multiple cameras as they are buffered: a buffer area is maintained, and at each output step the frames from the multiple cameras whose timestamps are closest to one another are taken to be the images acquired at the same time.
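  • A minimal sketch of this software synchronization, assuming each camera delivers (timestamp, frame) pairs into a per-camera buffer; the buffer length and the pairing rule (take the newest frame of the first camera as the reference time) are illustrative assumptions.

```python
from collections import deque

class SoftwareSync:
    """Buffer time-stamped frames from several cameras and emit, per round,
    one frame per camera with the most similar timestamps."""

    def __init__(self, num_cameras: int, maxlen: int = 30):
        self.buffers = [deque(maxlen=maxlen) for _ in range(num_cameras)]

    def push(self, cam_id: int, timestamp: float, frame) -> None:
        self.buffers[cam_id].append((timestamp, frame))

    def pop_synced(self):
        # Until every camera has buffered at least one frame, nothing to emit.
        if any(len(buf) == 0 for buf in self.buffers):
            return None
        # Use the newest frame of camera 0 as the reference time, then pick
        # each camera's buffered frame whose timestamp is closest to it.
        ref_t = self.buffers[0][-1][0]
        return [min(buf, key=lambda tf: abs(tf[0] - ref_t)) for buf in self.buffers]
```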
  • the panoramic depth camera system needs to be in motion to acquire images as observation information.
  • For example, a robot-mounted panoramic depth camera system, or one held by a user, performs free motion in an ordinary room; rotational motion should be increased as much as possible during the movement so as to increase the historical-view overlap between the cameras and obtain more observation information for calibration.
  • Each depth camera is provided with a corresponding attribute label, which is used to distinguish the role of the camera.
  • the first label indicates that the camera is a reference camera
  • the second label indicates that the camera is a non-reference camera.
  • the specific value of the label may be 0 or 1; for example, 0 represents a non-reference camera and 1 represents a reference camera.
  • In step S102, the pose of the at least one first-label camera when acquiring each frame image is acquired.
  • the camera character may be determined according to the camera tag.
  • At least one of the at least two depth cameras included in the panoramic depth camera system may be preset as a reference camera of the system; the first-label camera is the reference camera. A reference camera may be preset in consideration of calibration accuracy. For the reference camera in the camera system, throughout the calibration process, the pose of the reference camera when capturing each of the multiple frame images must be acquired in real time.
  • The pose is the pose change of the reference camera when capturing the current frame image relative to the previous frame image. The pose change may include a position (translation matrix T) and an orientation (rotation matrix R), involving six degrees of freedom X, Y, Z, α, β, and γ, where X, Y, and Z are the position parameters of the camera and α, β, and γ are the orientation parameters of the camera.
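  • The six degrees of freedom above can be packed into a single 4×4 rigid transform. The NumPy sketch below shows one way to do this; the Z-Y-X Euler-angle convention for α, β, and γ is an assumption, since the embodiment does not fix a convention.

```python
import numpy as np

def pose_matrix(x, y, z, alpha, beta, gamma):
    """Build a 4x4 rigid transform from the position (X, Y, Z) and the
    orientation angles (alpha, beta, gamma); Z-Y-X convention assumed."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rz = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cg, -sg], [0.0, sg, cg]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx   # rotation matrix R
    T[:3, 3] = [x, y, z]       # translation vector T
    return T
```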
  • In step S103, if a second-label camera has a historical-view overlap with the first-label camera, the relative pose of the second-label camera and the first-label camera at the same moment is determined according to the images corresponding to the historical-view overlap and the poses of the first-label camera when acquiring the multiple frame images.
  • Each second-label camera eventually overlaps with the historical view of the first-label camera, so the relative pose of each second-label camera and the first-label camera at the same moment is obtained; that is, the relative poses among the multiple cameras are obtained, completing the real-time online camera self-calibration process.
  • the depth cameras to be calibrated in the panoramic depth camera system are in motion, and the cameras of these motions synchronously acquire images in real time and maintain key frames under multiple cameras.
  • For the reference camera, single-camera pose estimation is performed on the acquired frame images, yielding the camera pose at each frame. For a non-reference camera, single-camera pose estimation is not performed; instead, the frames it collects are used as observation information to determine whether its historical view overlaps with that of any reference camera. If they overlap, the relative pose of the non-reference camera and the corresponding reference camera at the same moment can be calculated.
  • The depth camera calibration method of this embodiment takes a first-label camera preset in the panoramic depth camera system as a reference, and uses the historical-view overlap that arises between a second-label camera and the first-label camera while the moving system captures images to determine their relative pose, enabling fast online self-calibration of multiple cameras.
  • The method requires neither a calibration object nor a fixed large field-of-view overlap between adjacent cameras: when the camera system is built, adjacent cameras may have only a small overlapping view or none at all, and the motion of the panoramic depth camera system is exploited so that different cameras collect images with overlapping historical views, on which the calibration is based.
  • the method has a small amount of calculation and can be calibrated online based on the CPU.
  • the depth camera calibration method of the embodiment of the present invention is applicable to an application background such as indoor robot navigation or three-dimensional scene reconstruction.
  • the present embodiment further optimizes the determination of the overlap of historical angles of view and the calculation of the relative pose between cameras.
  • the method further includes: performing feature point matching between the current frame image collected by each of the multiple depth cameras and its previous key frame, obtaining a conversion relationship matrix between the two frames; if the conversion relationship matrix is greater than or equal to a preset conversion threshold, determining that the current frame image is the current key frame of the corresponding depth camera, and storing that key frame.
  • the first frame image acquired by each depth camera defaults to a key frame.
  • each subsequent frame image is compared with the camera's most recent key frame to determine whether that frame image is a key frame.
  • the preset conversion threshold is set in advance according to the motion condition when the image is acquired by the depth camera. For example, if the pose changes greatly when the camera takes two adjacent frames, the preset conversion threshold is set larger.
  • Feature point matching can use related matching algorithms, for example feature matching of the color images acquired by the RGB-D cameras based on the Oriented FAST and Rotated BRIEF (ORB) algorithm (a sparse algorithm), or dense registration using the direct method, as sketched below.
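  • A hedged OpenCV sketch of this key frame test: ORB features of the current frame are matched against the previous key frame, and the frame is promoted to a key frame when the estimated inter-frame motion is large enough. Using the translation of a RANSAC-fitted 2D similarity transform as a stand-in for the embodiment's conversion relationship matrix, and the threshold value itself, are illustrative assumptions.

```python
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1000)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def is_keyframe(frame_gray, prev_keyframe_gray, motion_threshold=20.0):
    """Return True if the current frame should become the new key frame."""
    kp1, des1 = orb.detectAndCompute(prev_keyframe_gray, None)
    kp2, des2 = orb.detectAndCompute(frame_gray, None)
    if des1 is None or des2 is None:
        return True                      # nothing to compare against
    matches = bf.match(des1, des2)
    if len(matches) < 10:
        return True                      # view changed too much to match
    src = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst = np.float32([kp2[m.trainIdx].pt for m in matches])
    M, _ = cv2.estimateAffinePartial2D(src, dst)  # RANSAC inside
    if M is None:
        return True
    # Translation magnitude of the fitted transform as a motion proxy.
    return float(np.hypot(M[0, 2], M[1, 2])) >= motion_threshold
```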
  • FIG. 3 is a flowchart of a depth camera calibration method according to Embodiment 2 of the present invention. As shown in FIG. 3, the method includes: Step S301 to Step S306.
  • In step S301, at least two depth cameras in the panoramic depth camera system are controlled to synchronously acquire images during motion.
  • each depth camera is set with a corresponding label.
  • In step S302, a pose of the at least one first-label camera when acquiring a plurality of frame images is acquired.
  • In step S303, the current frame image acquired by the second-label camera is matched against the historical key frames of the at least one first-label camera; if a historical key frame and the current frame image reach the matching threshold, it is determined that the second-label camera has a historical-view overlap with the corresponding first-label camera.
  • each time the second-label camera acquires a new frame image, feature point matching is performed against the key frames (i.e., historical key frames) stored under the first-label camera; if the matching threshold is reached, a historical-view overlap is considered to have occurred.
  • Feature point matching can use related matching algorithms, for example, feature matching of color images acquired by RGB-D cameras based on sparse ORB algorithm, or dense registration using direct method.
  • the matching threshold may be the number of preset matching feature points.
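  • A minimal sketch of this overlap test, assuming the historical key frames are stored as (id, descriptor-array) pairs; the threshold of 50 matches and the ratio test are assumptions.

```python
import cv2

matcher = cv2.BFMatcher(cv2.NORM_HAMMING)

def find_overlap(current_des, historical_keyframes, match_threshold=50):
    """Return (keyframe_id, good_matches) for the first historical key frame
    of a first-label camera that matches the second-label camera's current
    frame well enough, or None if no historical-view overlap is found."""
    for kf_id, kf_des in historical_keyframes:
        pairs = matcher.knnMatch(current_des, kf_des, k=2)
        good = [p[0] for p in pairs
                if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
        if len(good) >= match_threshold:
            return kf_id, good
    return None
```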
  • In step S304, abnormal data in the feature point correspondence between the current frame image acquired by the second-label camera and the corresponding historical key frame is removed, and the relative positional relationship between the current frame image and the corresponding historical key frame is calculated according to the remaining feature point correspondences.
  • In step S305, a transformation relationship between the pose of the second-label camera when capturing the current frame image and the pose of the first-label camera when acquiring the corresponding historical key frame is determined according to the relative positional relationship.
  • In step S306, the relative pose of the second-label camera and the first-label camera at the current frame moment is determined according to the transformation relationship and the multiple poses of the first-label camera between acquiring the corresponding historical key frame and acquiring the current frame image.
  • The Random Sample Consensus (RANSAC) algorithm may be used to remove the abnormal data, and the relative positional relationship between the two frames with overlapping views is calculated from the remaining feature point correspondences; from this relative positional relationship, the corresponding camera pose transformation relationship is found. Since the pose of the first-label camera is estimated continuously while it collects frame images, the first-label camera poses between the historical key frame involved in the overlap and the current frame are known, from which the relative pose, at the current frame moment (i.e., the moment the historical views overlap), of the second-label camera having the view overlap and the first-label camera is derived. The relative pose likewise comprises six degrees of freedom, involving the rotation matrix R and the translation matrix T.
  • the panoramic depth camera system includes three depth cameras A, B, and C, with camera A being the first label camera and cameras B and C being the second label camera.
  • the three cameras synchronously acquire images.
  • Suppose the 10th frame of camera B overlaps with the 5th frame (a historical key frame) of camera A. From feature point matching between the two overlapping frames, the transformation relationship between camera B's pose at its 10th frame and camera A's pose at its 5th frame can be calculated. Since camera A performs pose estimation continuously and has recorded its poses from the 5th frame to the 10th frame, the relative pose of camera B and camera A at the 10th frame can be derived, as illustrated by the sketch below.
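  • The chain of transforms in this example can be written out directly. In the hedged NumPy sketch below, `T_A_k_to_now` is camera A's accumulated pose change from the matched historical key frame (its 5th frame) to the current frame (its 10th frame), and `T_Ak_to_Bnow` is the transform recovered by feature matching between the overlapping frames; both names and the pose conventions in the comments are assumptions.

```python
import numpy as np

def relative_pose_now(T_A_k_to_now, T_Ak_to_Bnow):
    """Relative pose of camera B with respect to camera A at the current
    frame time. Convention assumed: each 4x4 matrix maps coordinates of
    its source frame into the coordinates of camera A's key frame k."""
    # Undo A's motion since the key frame, then apply the matched
    # key-frame-to-B transform: T = T_A(now)^-1 @ T_B(now).
    return np.linalg.inv(T_A_k_to_now) @ T_Ak_to_Bnow
```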
  • the historical key frame is used to determine that the second tag camera overlaps with the historical view of the first tag camera, and the pose relationship between the second tag camera and the first tag camera is calculated by the two frames of the overlapping angle of view. Then, using the first label camera pose between the corresponding historical key frame and the current frame under the first label camera, the relative pose of the second label camera and the first label camera at the same moment can be derived, and the calculation is simple and capable Guarantee fast online calibration.
  • Global optimization of the calculated relative pose may include: if a key frame exists among the current frame images synchronously acquired by the depth cameras whose relative poses have been calculated, performing loopback detection between that key frame and the historical key frames under those depth cameras; if the loopback succeeds (that is, a matching historical key frame is found), updating the relative pose between the corresponding depth cameras according to the key frame and the corresponding historical key frame.
  • the loopback detection is to determine whether the depth camera moves to a place that has been reached or has a large overlap with the historical angle of view.
  • Loopback detection is performed at the key frame rate: for each frame image captured by a depth camera, it is determined whether the frame is a key frame; if so, loopback detection is performed; if not, the next key frame is awaited before loopback detection.
  • A key frame may be determined as follows: each frame image captured by the depth camera is matched against the camera's previous key frame to obtain the transformation relationship matrix between the two frames; if the conversion relationship matrix is greater than or equal to the preset conversion threshold, the current frame image is determined to be the camera's current key frame.
  • The comparison objects of the loopback detection in this embodiment are the historical key frames of every depth camera in the camera system whose relative pose has been calculated, rather than only the camera's own historical key frames. If the loopback succeeds, the relative pose between the corresponding depth cameras is updated according to the current key frame and the corresponding historical key frame, reducing the cumulative error and improving the accuracy of the relative poses between cameras.
  • the loopback detection of the current key frame and the historical key frame may be performed by matching the current key frame with the ORB feature point of the historical key frame. If the matching degree is high, the loopback is successful.
  • one or more historical key frames with high matching degree may be selected to perform an optimized update of the relative pose between the corresponding cameras. It should be noted that if the matching historical key frame belongs to the depth camera itself, the depth camera's own pose is optimized according to the current key frame and the historical key frame.
  • The above relative-pose optimization update can start once the relative pose of one pair of cameras in the system has been obtained, and the relative poses between cameras are updated as the cameras move and acquire images. Once the relative poses between all cameras in the system have been calculated and a preset condition is met (for example, the number of optimizations of the relative poses reaches a preset count, or a preset error requirement is satisfied), the optimization update stops and the final, more accurate calibration parameters are obtained.
  • The relative pose between cameras B and A is likewise optimized based on the key frames captured by camera A and/or camera B. If the relative pose of camera C and camera A is obtained at the same time, the relative pose of camera B and camera C can further be derived, thereby obtaining the relative poses among the multiple cameras, which are then optimized. At the current frame moment, the three cameras synchronously acquire corresponding images; if key frames a and b both pass loopback detection successfully, the optimization update is performed according to key frames a and b respectively. The relative-pose calculation during the optimization update is the same as the initial relative-pose calculation method and is not repeated here.
  • the calculated relative position and orientation of the camera can be optimized in real time according to the image acquired by the camera, and the accuracy of the camera calibration is improved.
  • the depth camera calibration method of each of the above embodiments is to use a first label camera preset in the panoramic depth camera system as a reference to obtain a relative pose between the cameras.
  • This embodiment provides another depth camera calibration method based on the above embodiments to further speed up the online calibration speed.
  • FIG. 4 is a flowchart of a depth camera calibration method according to Embodiment 3 of the present invention. As shown in FIG. 4, the method includes steps S401 to S405.
  • In step S401, at least two depth cameras in the panoramic depth camera system are controlled to synchronously acquire images during motion.
  • each depth camera is set with a corresponding label.
  • In step S402, a pose of the at least one first-label camera when acquiring a plurality of frame images is acquired.
  • In step S403, if a second-label camera has a historical-view overlap with the first-label camera, the relative pose of the second-label camera and the first-label camera at the same moment is determined according to the images corresponding to the historical-view overlap and the poses of the first-label camera when acquiring the multiple frame images.
  • In step S404, the label of the second-label camera is modified to the first label.
  • The second-label camera that had the historical-view overlap with the first-label camera is thus included in the reference range, expanding the reference range.
  • In step S405, the operations of acquiring the poses of the at least one first-label camera when acquiring multiple frame images, determining the relative pose at the same moment of a second-label camera having a historical-view overlap and the corresponding first-label camera, and modifying the label (i.e., repeating S402 to S404) are performed repeatedly until no second-label camera remains among the at least two depth cameras.
  • When the labels of all depth cameras in the camera system have been modified to the first label, every second-label camera has had its relative pose to the other cameras calculated, and the calibration result is obtained. Using each newly calibrated second-label camera as an additional reference expands the reference range and increases the probability that the remaining second-label cameras encounter a historical-view overlap, which further speeds up the calibration; a schematic sketch of this loop follows.
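  • A schematic sketch of steps S402 to S404 as a label-promotion loop. The two callables stand in for the overlap detection and relative-pose computation described above and are injected so the loop itself is self-contained; labels follow the 0/1 convention mentioned earlier (1 = reference).

```python
from dataclasses import dataclass

@dataclass
class Camera:
    cam_id: int
    label: int  # 1 = first label (reference), 0 = second label

def calibrate(cameras, find_overlap_pair, compute_relative_pose):
    """Repeat until no second-label camera remains (step S405). In the
    real system new frames keep arriving between iterations; here the
    injected callables are assumed to account for that."""
    relative_poses = {}
    while any(c.label == 0 for c in cameras):
        pair = find_overlap_pair(cameras)          # (second, first) or None
        if pair is None:
            continue                               # keep acquiring images
        second, first = pair
        relative_poses[(second.cam_id, first.cam_id)] = \
            compute_relative_pose(second, first)   # S403
        second.label = 1                           # S404: promote to first label
    return relative_poses
```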
  • FIG. 5 is a flowchart of acquiring a first label camera pose according to Embodiment 4 of the present invention. As shown in FIG. 5, the method includes: Step S501, Step S502, and Step S503.
  • In step S501, for each first-label camera, feature extraction is performed on each frame image acquired by that camera to obtain at least one feature point of each frame image.
  • Feature extraction finds pixel points with landmark characteristics (i.e., feature points) in the frame image, for example corner points, textures, and pixels on image edges.
  • Feature extraction for each frame of image may use an ORB algorithm to find at least one feature point in the frame image.
  • In step S502, feature point matching is performed on two adjacent frame images to obtain the feature point correspondence between them.
  • Sparse ORB feature registration or direct-method dense registration can be used to obtain the feature point correspondence between the two adjacent frames.
  • H(X1, X2) denotes the Hamming distance between two feature descriptors X1 and X2: the two descriptors are XORed and the number of 1 bits is counted, giving the distance of one feature point correspondence between the adjacent frame images.
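  • For binary descriptors such as ORB's (stored as uint8 arrays), the Hamming distance is the popcount of the XOR, as a one-line NumPy sketch shows:

```python
import numpy as np

def hamming_distance(x1: np.ndarray, x2: np.ndarray) -> int:
    """XOR the two binary descriptors (uint8 arrays) and count the 1 bits."""
    return int(np.unpackbits(np.bitwise_xor(x1, x2)).sum())
```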
  • In step S503, abnormal data in the feature point correspondences are removed, and the camera pose is iteratively calculated according to a linear component containing the second-order statistics of the remaining feature points and a nonlinear component containing the camera pose. The iteration can be performed using the Gauss-Newton method, seeking the pose that minimizes the reprojection error; each iteration applies the update

    Δξ = -(J(ξ)ᵀJ(ξ))⁻¹J(ξ)ᵀr(ξ)

  where r(ξ) denotes the vector containing all reprojection errors; J(ξ) is the Jacobian matrix of r(ξ); ξ represents the Lie algebra of the camera pose; Δξ represents the increment of ξ at each iteration; R_i and R_j represent the rotation matrices of the camera when the i-th and the j-th frame images are acquired; C_i,j represents the set of feature point correspondences between the i-th frame image and the j-th frame image; [·]× represents the cross-product (skew-symmetric) matrix; and |C_i,j| represents the number of correspondences in C_i,j.
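  • A generic sketch of the Gauss-Newton iteration matching the update rule above; the residual and Jacobian callables, the iteration cap, and the convergence tolerance are assumptions rather than the embodiment's specific implementation.

```python
import numpy as np

def gauss_newton(residual, jacobian, xi0, max_iters=20, tol=1e-8):
    """Minimize ||r(xi)||^2 with the update
    delta = -(J^T J)^{-1} J^T r,  xi <- xi + delta."""
    xi = np.asarray(xi0, dtype=float)
    for _ in range(max_iters):
        r = residual(xi)          # vector of all reprojection errors
        J = jacobian(xi)          # Jacobian of r at xi
        delta = -np.linalg.solve(J.T @ J, J.T @ r)
        xi = xi + delta
        if np.linalg.norm(delta) < tol:
            break
    return xi
```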
  • Not every feature point of one frame image necessarily exists in the other frame; feature points absent from one of the two frames that are nevertheless matched in step S502 give rise to abnormal correspondences, which is why the abnormal data must be removed.
  • Calculating the camera pose amounts to solving the nonlinear least-squares problem between two frame images with the following cost function (equation (2)):

    E = Σ_{(p,q)∈C_i,j} || T_i p − T_j q ||²

  where E represents the reprojection error, in Euclidean space, of the i-th frame image compared to the j-th frame image (in this embodiment, the previous frame image); T_i represents the pose of the camera when the i-th frame image is acquired (per the foregoing explanation of the camera pose, this actually refers to the pose change of the i-th frame image relative to the previous frame image); T_j represents the pose of the camera when the j-th frame image is captured; and N represents the total number of frames acquired by the camera.
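  • A direct transcription of the per-frame-pair cost, assuming the matched feature points are available as homogeneous 3D coordinates (one row per correspondence) and the poses as 4×4 matrices; the alignment-error form is the reconstruction used above, not a formula quoted from the original filing.

```python
import numpy as np

def pair_cost(T_i, T_j, P_i, P_j):
    """Sum of squared errors over the correspondences C_{i,j}:
    E = sum_k || T_i p_k - T_j q_k ||^2, with p_k, q_k the k-th rows of
    the homogeneous point arrays P_i and P_j (shape (n, 4))."""
    diff = (P_i @ T_i.T) - (P_j @ T_j.T)  # rows are T_i p_k - T_j q_k
    return float((diff[:, :3] ** 2).sum())
```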
  • This embodiment does not evaluate the cost function of equation (2) directly; instead it calculates a linear component containing the second-order statistics of the remaining feature points and a nonlinear component containing the camera pose. In the expression of the nonlinear term, the linear part, which is fixed between two frame images, is computed once as a whole W; it does not need to be recomputed per feature point, which reduces the complexity of the camera pose calculation and enhances its real-time performance.
  • The derivation process of equation (1) is described below, and the principle of the reduced algorithm complexity is analyzed along with the derivation.
  • The camera pose when the camera captures the i-th frame image in Euclidean space is T_i = [R_i | t_i]; in fact, T_i refers to the pose transformation matrix of the i-th frame image relative to the j-th frame image (in this embodiment, the previous frame image), comprising a rotation matrix R_i and a translation matrix t_i.
  • The rigid transformation T_i in Euclidean space is represented by the Lie algebra ξ_i on the SE3 space; that is, ξ_i also represents the camera pose when the camera acquires the i-th frame image, and T(ξ_i) maps the Lie algebra ξ_i to T_i in Euclidean space.
  • r_il represents the l-th row of the rotation matrix R_i; t_il represents the l-th element of the translation vector t_i, l = 0, 1, 2; and I_3×3 represents the 3×3 identity matrix.
  • J(ξ)ᵀJ(ξ) contains four non-zero 6×6 submatrices; taking one of them as an example below, the other three non-zero submatrices are calculated similarly and are not described again.
  • For each correspondence, the Jacobian matrix is determined by the geometric terms ξ_i and ξ_j and by structure terms; correspondences of the same frame pair share the same geometric terms but have different structure terms. Related algorithms compute these quantities with a cost that depends on the number of feature point correspondences in C_i,j, whereas this embodiment computes them with fixed complexity: only the second-order statistic W of the structure terms needs to be calculated, without every correspondence participating in the computation of its own structure term; that is, the four non-zero submatrices can be calculated with complexity O(1) instead of complexity O(|C_i,j|).
  • Thus the sparse matrices J(ξ)ᵀJ(ξ) and J(ξ)ᵀr(ξ) required in the iterative step of the nonlinear Gauss-Newton optimization, ξ ← ξ − (J(ξ)ᵀJ(ξ))⁻¹J(ξ)ᵀr(ξ), can be computed efficiently with complexity O(M) instead of the original complexity O(N_coor), where N_coor represents the total number of feature point correspondences over all frame pairs and M represents the number of frame pairs. N_coor is approximately 300 per frame pair in sparse matching and approximately 10,000 in dense matching, much larger than the number M of frame pairs.
  • A globally consistent optimization update may be performed on the acquired poses. For example, if the current frame image collected by the first-label camera is a key frame, loopback detection is performed between the current key frame and the historical key frames of the first-label camera; if the loopback succeeds, a globally consistent optimization update is applied to the acquired first-label camera poses according to the current key frame.
  • A key frame may be determined by matching each frame image captured by the first-label camera against the camera's previous key frame to obtain the transformation relationship matrix between the two frames; if the conversion relationship matrix is greater than or equal to the preset conversion threshold, the current frame image is determined to be the camera's current key frame.
  • Globally consistent optimization update means that, during calibration, as the camera moves to a place it has already reached, or one with a large overlap with a historical view, the current frame image remains consistent with the previously captured images instead of producing staggering, aliasing, and similar phenomena.
  • Loopback detection determines, from the depth camera's current observation, whether the camera has moved to a place it has already reached or one with a large overlap with a historical view. If the loopback succeeds, a globally consistent optimization update of the first-label camera poses is performed according to the current key frame to reduce the cumulative error.
  • the loopback detection of the current key frame and the historical key frame may be performed by matching the current key frame with the ORB feature point of the historical key frame. If the matching degree is high, the loopback is successful. In an embodiment, one or more historical key frames with high matching degree may be selected to perform global consistent optimization update of the camera pose according to the matching degree between the current key frame and the historical key frame.
  • The globally consistent optimization update of the camera poses minimizes, according to the correspondences between the current key frame and the one or more highly matching historical key frames, the conversion error between the current key frame and all highly matching historical key frames, with the cost function

    min_{T_i ∈ SE3, i ∈ [1, N−1]} Σ E_i,j

  where the sum ranges over all frame pairs, each formed by a matching historical key frame and the current key frame; N indicates the number of historical key frames with high matching degree with the current key frame; and E_i,j indicates the conversion error between the i-th frame and the j-th frame, the conversion error being the reprojection error.
  • the relative pose of the non-key frame and its corresponding key frame needs to be kept unchanged.
  • The optimization update algorithm may use a related bundle adjustment (BA) algorithm, or the method of step S503 may be used in order to improve the optimization speed; it is not described again in detail. The algorithm of this embodiment (i.e., the method in S503) can also be used for the calculation and optimization of the relative poses between cameras.
  • The camera pose is calculated iteratively using the linear component containing the second-order statistics of the feature points and the nonlinear component containing the camera pose; in the nonlinear term, the linear part that is fixed between two frame images is computed once as a whole W, which reduces the complexity of the camera pose calculation, enhances its real-time performance, and keeps the hardware requirements low.
  • the embodiment of the present invention can implement pose estimation and optimization based on the flow and principle of SLAM.
  • the pose estimation is implemented by the front-end visual odometer thread
  • the pose optimization is implemented by the back-end loopback detection and optimization algorithm, for example the related bundle adjustment (BA) algorithm or the algorithm of this embodiment.
  • The pose estimation and optimization of the first-label camera, the calculation of the relative poses between cameras from the view overlaps, and the optimization of those relative poses can be performed simultaneously.
  • the map in SLAM refers to the motion trajectory of the camera in the world coordinate system and the position of the key frames observed in the motion trajectory in the world coordinate system. If the camera system is rigidly deformed due to physical impact, the embodiment only needs to start the calibration procedure for rapid recalibration without re-arranging the calibration object.
  • the embodiment provides a depth camera calibration device, which can be used to perform the depth camera calibration method provided by any embodiment of the present application, and has the corresponding functional modules and beneficial effects of the execution method.
  • the device can be implemented by means of hardware and/or software, for example by means of a CPU.
  • The depth camera calibration device and the panoramic depth camera system need to exchange control signals, images, and the like; many communication methods exist between the two, for example wired communication through a serial port or a network cable, or wireless communication through Bluetooth, wireless broadband, and the like.
  • the device includes a camera control module 61, a pose acquisition module 62, and a relative pose calculation module 63.
  • the camera control module 61 is configured to control at least two depth cameras in the panoramic depth camera system to synchronously acquire images during motion, wherein each depth camera is provided with a corresponding label.
  • the pose acquisition module 62 is configured to acquire a pose when at least one first label camera captures a plurality of frame images.
  • The relative pose calculation module 63 is configured to: in a case where a second-label camera has a historical-view overlap with the first-label camera, determine the relative pose of the second-label camera and the first-label camera at the same moment according to the images corresponding to the historical-view overlap and the poses of the first-label camera when acquiring the multiple frame images.
  • the foregoing apparatus may further include: a label modification module and an operation execution module.
  • a label modification module configured to modify the label of the second-label camera to the first label after the relative pose calculation module 63 calculates the relative pose of the second-label camera and the first-label camera at the same moment; and
  • an operation execution module configured to repeatedly perform the operations of acquiring the poses of the at least one first-label camera when acquiring multiple frame images, determining the relative pose at the same moment of a second-label camera having a historical-view overlap and the corresponding first-label camera, and modifying the label (i.e., repeating the operations of the pose acquisition module 62, the relative pose calculation module 63, and the label modification module) until no second-label camera remains among the at least two depth cameras of the panoramic depth camera system.
  • The apparatus may further include a key frame determining module configured to, while the pose acquisition module 62 acquires the poses of the at least one first-label camera, perform feature point matching between the current frame image collected by each of the multiple depth cameras and its previous key frame to obtain the conversion relationship matrix between the two frames; if the conversion relationship matrix is greater than or equal to the preset conversion threshold, determine that the current frame image is the current key frame of the corresponding depth camera and store the key frame, specifically storing the key frame together with the depth camera to which it belongs.
  • The apparatus may further include a viewing angle overlap determining module configured to: before the relative pose of the second-label camera and the first-label camera at the same moment is determined, perform feature point matching between the current frame image acquired by the second-label camera and the historical key frames of the at least one first-label camera; and, if a historical key frame and the current frame image reach the matching threshold, determine that the second-label camera has a historical-view overlap with the corresponding first-label camera.
  • the relative pose calculation module 63 includes a relative position relationship calculation unit, a transformation relationship calculation unit, and a relative pose calculation unit.
  • The relative positional relationship calculation unit is configured to remove abnormal data from the feature point correspondence between the current frame image acquired by the second-label camera and the corresponding historical key frame, and to calculate the relative positional relationship between the current frame image and the corresponding historical key frame according to the remaining feature point correspondences.
  • The transformation relationship calculation unit is configured to calculate, according to the relative positional relationship, the transformation relationship between the pose of the second-label camera when acquiring the current frame image and the pose of the first-label camera when acquiring the corresponding historical key frame.
  • The relative pose calculation unit is configured to determine the relative pose of the second-label camera and the first-label camera at the current frame moment according to the transformation relationship and the multiple poses of the first-label camera between acquiring the corresponding historical key frame and acquiring the current frame image.
  • the pose acquisition module 62 includes a feature extraction unit, a feature matching unit, and an iterative computing unit.
  • a feature extraction unit configured to perform feature extraction on each frame image acquired by the first tag camera for each first tag camera to obtain at least one feature point of each frame image
  • the feature matching unit is configured to perform feature point matching on the adjacent two frames of images to obtain a feature point correspondence relationship between adjacent two frames of images;
  • an iterative calculation unit configured to remove abnormal data from the feature point correspondences and to iteratively calculate the camera pose according to the linear component containing the second-order statistics of the remaining feature points and the nonlinear component containing the camera pose, computing J(ξ)ᵀJ(ξ) and J(ξ)ᵀr(ξ) for the Gauss-Newton update described above, where r(ξ), J(ξ), ξ, Δξ, R_i, R_j, C_i,j, [·]×, and |C_i,j| are defined as in the foregoing embodiment.
  • In the expression of the nonlinear term, W represents the linear component, and r_il and r_jl represent the nonlinear components, where r_il is the l-th row of the rotation matrix R_i and r_jl is the transpose of the l-th row of the rotation matrix R_j, l = 0, 1, 2.
  • the foregoing apparatus may further include: a loopback detection module and an optimization update module.
  • The loopback detection module is configured to: after the poses of the at least one first-label camera when capturing multiple frame images are acquired, if the current frame image captured by the first-label camera is a key frame, perform loopback detection according to the current key frame and the historical key frames of the first-label camera.
  • the optimization update module is configured to perform globally consistent optimization update on the acquired first label camera pose according to the current key frame in the case of successful loopback.
  • The loopback detection module is further configured to: after the relative pose calculation module 63 calculates the relative pose of the second-label camera and the first-label camera at the same moment, if a key frame exists among the current frame images synchronously acquired by the depth cameras whose relative poses have been calculated, perform loopback detection according to that key frame and the historical key frames under those depth cameras. The optimization update module is further configured to, in the case of a successful loopback, update the relative pose between the corresponding depth cameras according to the key frame and the corresponding historical key frame.
  • each unit and module included is divided according to functional logic, but is not limited to the above division, as long as the corresponding function can be implemented;
  • the specific names of the functional units are also for convenience of distinguishing from each other and are not intended to limit the scope of protection of the present application.
  • the embodiment provides an electronic device comprising: one or more processors, a memory and a panoramic depth camera system.
  • the memory is set to store one or more programs.
  • a panoramic depth camera system includes at least two depth cameras that cover a panoramic field of view and are arranged to capture images.
  • the one or more programs are executed by the one or more processors such that the one or more processors implement a depth camera calibration method as described in any of the embodiments herein.
  • FIG. 7 is a schematic structural diagram of an electronic device according to Embodiment 6 of the present invention.
  • FIG. 7 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present invention.
  • the electronic device 712 shown in FIG. 7 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
  • electronic device 712 is embodied in the form of a general purpose computing device.
  • Components of electronic device 712 may include, but are not limited to, one or more processors (or processing unit 716), system memory 728, and a bus 718 that connects different system components, including system memory 728 and processing unit 716.
  • Bus 718 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
  • These architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Electronic device 712 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 712, including volatile and non-volatile media, removable and non-removable media.
  • System memory 728 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 730 and/or cache memory 732.
  • Electronic device 712 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • The storage system 734 can be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard disk drive"). Although not shown in FIG. 7, a disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media), may also be provided.
  • each drive can be coupled to bus 718 via one or more data medium interfaces.
  • System memory 728 can include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the various embodiments of the present application.
  • A program/utility 740 having a set (at least one) of program modules 742 may be stored, for example, in system memory 728; such program modules 742 include, but are not limited to, an operating system, one or more applications, other program modules, and program data, and each of these examples, or some combination thereof, may include an implementation of a network environment.
  • Program module 742 typically performs the functions and/or methods of the embodiments described herein.
  • The electronic device 712 can also communicate with one or more external devices 714 (e.g., a keyboard, a pointing device, a display 724), with one or more devices that enable a user to interact with the electronic device 712, and/or with any device (e.g., a network card or modem) that enables the electronic device 712 to communicate with one or more other computing devices. Such communication can take place via an input/output (I/O) interface 722. Moreover, the electronic device 712 can communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 720.
  • The network adapter 720 communicates with the other modules of the electronic device 712 via the bus 718. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 712, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
  • the processing unit 716 performs various functional applications and data processing by executing programs stored in the system memory 728, such as implementing the depth camera calibration method provided by embodiments of the present invention.
  • the electronic device 712 can also include a panoramic depth camera system 750 that includes at least two depth cameras that cover the panoramic field of view for acquiring images.
  • the panoramic depth camera system 750 is coupled to the processing unit 716 and the system memory 728.
  • the depth camera included in the panoramic depth camera system 750 can acquire images under the control of the processing unit 716.
  • the panoramic depth camera system 750 can be embedded in an electronic device.
  • The one or more processors are central processing units; the electronic device is a portable mobile electronic device, such as a mobile robot, a drone, a three-dimensional visual interaction device (such as VR glasses or a wearable helmet), or a smart terminal (such as a mobile phone or a tablet).
  • the present embodiment provides a computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements a depth camera calibration method as described in any of the embodiments of the present application.
  • the computer storage medium of the embodiments of the present invention may employ any combination of one or more computer readable mediums.
  • the computer readable medium can be a computer readable signal medium or a computer readable storage medium.
  • the computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • a computer readable storage medium can be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.
  • A computer readable signal medium may include a data signal propagated in baseband or as part of a carrier, carrying computer readable program code. Such a propagated data signal can take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • The computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can send, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages, or a combination thereof, including an object-oriented programming language such as Java, Smalltalk, or C++, and a conventional procedural programming language such as the "C" language or a similar programming language.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer, partly on the remote computer, or entirely on the remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through an Internet connection using an Internet service provider).
  • The modules or operations of the above-described embodiments of the present invention may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; alternatively, they may be separately fabricated as individual integrated circuit modules, or multiple of these modules or operations may be fabricated as a single integrated circuit module. Thus, the application is not limited to any specific combination of hardware and software.

Abstract

Disclosed are a depth camera calibration method and apparatus, an electronic device, and a storage medium. The method comprises: controlling at least two depth cameras in a panoramic depth camera system to synchronously collect images during a movement process, each depth camera being provided with a corresponding tag; obtaining the poses of at least one first tag camera when collecting the images; and, if a second tag camera has a historical viewing-angle overlap with the first tag camera, calculating the relative pose of the second tag camera and the first tag camera at a same moment according to the images corresponding to the historical viewing-angle overlap and the poses of the first tag camera when collecting the images.

Description

Depth camera calibration method and apparatus, electronic device and storage medium
This application claims priority to Chinese Patent Application No. 201810179738.7, filed with the Chinese Patent Office on March 5, 2018, the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present invention relate to machine vision technology, for example, to a depth camera calibration method and apparatus, an electronic device, and a storage medium.
Background
With the development of robot navigation, virtual reality, and augmented reality technology, RGB-D cameras (i.e., depth cameras) are widely used in robot navigation, static scene reconstruction, dynamic human body reconstruction, and the like. An RGB-D camera combines a traditional RGB camera with a depth camera, and has the advantages of high precision, small size, large amount of information, passivity, and rich information.
At present, because the field of view (FoV) of a single RGB-D camera is limited, navigation with a single RGB-D camera cannot acquire all the surrounding information at once, and the small field of view also imposes certain limitations on three-dimensional reconstruction: in dynamic object reconstruction, for example, a whole-body or multi-object model often cannot be obtained, and dynamic objects and static objects within the field of view cannot be reconstructed simultaneously.
If a multi-camera system is used, accurate extrinsic calibration of the cameras is required. At present, camera calibration methods based on a specific calibration object are generally adopted; such methods are relatively cumbersome, require a large overlapping field of view between adjacent cameras, and cannot handle cases in which the field-of-view overlap is very small. Calibration methods based on Simultaneous Localization And Mapping (SLAM) move the camera along a preset trajectory to capture images and then process the images offline for calibration; they cannot quickly calibrate a 360° panoramic RGB-D camera online.
Summary
The embodiments of the present invention provide a depth camera calibration method and apparatus, an electronic device, and a storage medium, so as to realize real-time, online, panoramic multi-camera self-calibration without human intervention.
In a first aspect, an embodiment of the present invention provides a depth camera calibration method, including:
controlling at least two depth cameras in a panoramic depth camera system to synchronously acquire images during motion, wherein each depth camera is provided with a label;
acquiring the pose of at least one first-label camera when capturing each frame image; and
if a second-label camera has a historical viewing-angle overlap with the first-label camera, determining the relative pose of the second-label camera and the first-label camera at the same moment according to the images corresponding to the historical viewing-angle overlap and the poses of the first-label camera when capturing the frame images.
In a second aspect, an embodiment of the present invention further provides a depth camera calibration apparatus, including:
a camera control module, configured to control at least two depth cameras in a panoramic depth camera system to synchronously acquire images during motion, wherein each depth camera is provided with a label;
a pose acquisition module, configured to acquire the pose of at least one first-label camera when capturing each frame image; and
a relative pose calculation module, configured to, in a case where a second-label camera has a historical viewing-angle overlap with the first-label camera, determine the relative pose of the second-label camera and the first-label camera at the same moment according to the images corresponding to the historical viewing-angle overlap and the poses of the first-label camera when capturing the frame images.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
a memory configured to store one or more programs; and
a panoramic depth camera system including at least two depth cameras, the at least two depth cameras covering a panoramic field of view and configured to acquire images;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the depth camera calibration method described in any of the embodiments of the present application.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium on which a computer program is stored, and when the program is executed by a processor, the depth camera calibration method described in any of the embodiments of the present application is implemented.
Brief description of the drawings
The accompanying drawings required in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative work.
FIG. 1 is a flowchart of a depth camera calibration method according to Embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of a panoramic depth camera system according to Embodiment 1 of the present invention;
FIG. 3 is a flowchart of a depth camera calibration method according to Embodiment 2 of the present invention;
FIG. 4 is a flowchart of a depth camera calibration method according to Embodiment 3 of the present invention;
FIG. 5 is a flowchart of acquiring the pose of a first-label camera according to Embodiment 4 of the present invention;
FIG. 6 is a structural block diagram of a depth camera calibration apparatus according to Embodiment 5 of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to Embodiment 6 of the present invention.
Detailed description
The present application is further described in detail below with reference to the accompanying drawings and embodiments. It is understood that the specific embodiments described herein are merely illustrative of the application and are not a limitation of the application. It should also be noted that, for convenience of description, only the parts related to the present application, rather than all structures, are shown in the drawings.
Embodiment 1
FIG. 1 is a flowchart of a depth camera calibration method according to Embodiment 1 of the present invention. This embodiment is applicable to multi-camera self-calibration; calibration in this embodiment refers to calculating the relative poses between cameras. The method may be performed by a depth camera calibration apparatus or an electronic device. The depth camera calibration apparatus may be implemented by software and/or hardware, for example, by a central processing unit (CPU), with the CPU performing camera control and calibration; further, the apparatus may be integrated in a portable mobile electronic device. As shown in FIG. 1, the method may include steps S101, S102, and S103.
In step S101, at least two depth cameras in the panoramic depth camera system are controlled to synchronously acquire images during motion.
Each depth camera is provided with a corresponding label.
The panoramic depth camera system (which may be referred to simply as the camera system) includes at least two depth cameras (RGB-D cameras) that together cover a 360-degree panoramic field of view. In practical applications, the camera size and the number of cameras to be used can be selected according to specific requirements, and multiple cameras are fixed on a platform (for example, a rigid structural part) according to the camera size and the number of cameras so as to meet the field-of-view coverage requirement, completing the initial construction of the panoramic depth camera system. In an embodiment, the required number of cameras is determined from the field of view of a single camera and the required field of view; the sum of the fields of view of all cameras needs to be larger than the required field of view. Illustratively, for single cameras with identical viewing angles, the number of cameras n needs to satisfy n×FoV>α, where FoV denotes the field of view of a single camera and α denotes the required field of view of the camera system being built. For example, taking α=360°, with a single-camera horizontal field of view of 65° and a vertical field of view of 80°, n=5 or n=6 may be selected; considering the vertical field-of-view requirement, six cameras may be used. The cameras are then laid out reasonably according to the camera size and the number of cameras. Illustratively, for an RGB-D camera 10 to 15 cm long, 3 to 5 cm wide, and 3 to 5 cm high with a resolution of 640×480, a regular hexagonal prism with a base side length of 5 cm is selected as the axis, and the camera lenses are fixed on the axis facing outward by fixing members, as shown in FIG. 2. It should be noted that, when building the camera system, depending on the specific usage requirements, adjacent cameras may have no viewing-angle overlap or only a small viewing-angle overlap, for example, an overlap of one or two degrees.
The depth cameras in the panoramic depth camera system acquire images synchronously; the synchronization may be hardware synchronization or software synchronization. In an embodiment, hardware synchronization means using a signal (such as a rising-edge signal) to simultaneously trigger all cameras to capture images at the same moment; software synchronization means that, when buffering the images captured by the multiple cameras, the images are time-stamped, and the images with the closest timestamps are considered to be images captured at the same moment, that is, a buffer area is maintained and the image frames of the multiple cameras with the closest timestamps are output each time.
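As an illustration of the software-synchronization strategy just described, the following is a minimal sketch (the class and method names are hypothetical, not part of the patented system) that buffers time-stamped frames per camera and outputs one frame per camera whose timestamps are closest to a common reference:

```python
import collections

class SoftwareSync:
    """Minimal sketch of timestamp-based software synchronization.

    Frames from each camera are buffered with their timestamps; a set of
    frames (one per camera) is emitted when every buffer holds a frame whose
    timestamp lies within `tolerance` seconds of a common reference time.
    """
    def __init__(self, num_cameras, tolerance=0.010):
        self.buffers = [collections.deque() for _ in range(num_cameras)]
        self.tolerance = tolerance

    def push(self, cam_id, timestamp, frame):
        self.buffers[cam_id].append((timestamp, frame))

    def try_pop_synced_set(self):
        if any(not b for b in self.buffers):
            return None  # some camera has not delivered a frame yet
        # Reference time: the newest among the oldest buffered frames.
        ref = max(b[0][0] for b in self.buffers)
        # Drop frames that are too old to ever match the reference time.
        for b in self.buffers:
            while len(b) > 1 and ref - b[0][0] > self.tolerance:
                b.popleft()
        if all(abs(b[0][0] - ref) <= self.tolerance for b in self.buffers):
            return [b.popleft()[1] for b in self.buffers]
        return None
```

With a hardware trigger such a buffer is unnecessary, since all cameras expose on the same edge signal.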
To perform camera self-calibration online, the panoramic depth camera system needs to be kept in motion so that the acquired images can serve as observations. For example, a robot carrying the panoramic depth camera system, or a user holding it, moves freely in an ordinary room, increasing rotational motion as much as possible during the movement so as to increase the historical viewing-angle overlap between cameras and obtain more observations, which facilitates calibration.
Each depth camera is provided with a corresponding attribute label, which is used to distinguish the role of the camera. For example, the first label indicates that the camera is a reference camera, and the second label indicates that the camera is a non-reference camera. For simplicity, the label value may be 0 or 1; for example, 0 represents a non-reference camera and 1 represents a reference camera.
In step S102, the pose of at least one first-label camera when capturing each frame image is acquired.
The camera role can be determined from the camera label. Among the at least two depth cameras included in the panoramic depth camera system, there is at least one reference camera; for example, any camera in the system may be preset as the reference camera, and the first-label camera is the reference camera. It should be noted that if multiple reference cameras are preset, the relative poses between the reference cameras need to be known accurately; considering calibration accuracy, a single reference camera is generally preset. For the reference camera in the camera system, throughout the calibration process, the pose of the reference camera when capturing each frame image needs to be acquired in real time. The pose refers to the pose change of the reference camera when capturing the current frame image relative to when it captured the previous frame image; the pose may include, for example, position (translation matrix T) and orientation (rotation matrix R), involving six degrees of freedom X, Y, Z, α, β, and γ, where X, Y, and Z are position parameters and α, β, and γ are orientation parameters of the camera.
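To make the pose convention concrete, the sketch below (illustrative only) assembles a pose from a rotation matrix R and a translation t as a 4×4 homogeneous transform and recovers the frame-to-frame pose change described above:

```python
import numpy as np

def make_pose(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def relative_pose(T_prev, T_curr):
    """Pose change of the current frame relative to the previous frame:
    T_rel such that T_curr = T_prev @ T_rel."""
    return np.linalg.inv(T_prev) @ T_curr
```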
In step S103, if a second-label camera has a historical viewing-angle overlap with the first-label camera, the relative pose of the second-label camera and the first-label camera at the same moment is determined according to the images corresponding to the historical viewing-angle overlap and the poses of the first-label camera when capturing the frame images.
As the panoramic depth camera system moves, every second-label camera will eventually have a historical viewing-angle overlap with the first-label camera, so the relative pose of each second-label camera and the first-label camera at the same moment can be obtained; that is, the relative poses between the multiple cameras are obtained, completing the real-time online camera self-calibration process.
It should be noted that, throughout the calibration process, the depth cameras to be calibrated in the panoramic depth camera system are all in motion; these moving cameras synchronously acquire images in real time and maintain keyframes for each camera. For the reference camera, single-camera pose estimation is performed on the captured frame images to obtain the camera pose at each frame; meanwhile, for the non-reference cameras, single-camera pose estimation is not performed, and their captured frame images serve as observations for determining whether a historical viewing-angle overlap with any reference camera has occurred. If there is an overlap, the relative pose of the non-reference camera and the corresponding reference camera at the same moment can be calculated.
The depth camera calibration method of this embodiment, based on the first-label camera preset in the panoramic depth camera system, uses the historical viewing-angle overlap between a second-label camera and the first-label camera, accumulated while the panoramic depth camera system captures images during motion, to determine the relative pose of the second-label camera and the first-label camera, realizing fast online self-calibration of multiple cameras. The method requires no calibration object and no fixed, large field-of-view overlap between adjacent cameras; when the camera system is built, adjacent cameras may have a small viewing-angle overlap or none at all, and the motion of the panoramic depth camera system lets different cameras capture images with overlapping historical viewing angles, from which calibration can be performed. The method has a small computational load and can perform online calibration on a CPU. The depth camera calibration method of the embodiments of the present invention is suitable for application backgrounds such as indoor robot navigation and three-dimensional scene reconstruction.
Embodiment 2
On the basis of the above embodiment, this embodiment further optimizes the determination of historical viewing-angle overlap and the calculation of the relative poses between cameras.
For each depth camera to be calibrated in the camera system, whenever a frame image is captured, it is determined whether the frame image is a keyframe, so as to facilitate the determination of historical viewing-angle overlap; keyframes are also needed if loop-closure optimization is performed later. In an embodiment, while acquiring the pose of the at least one first-label camera when capturing each frame image, the method may further include: matching feature points between the current frame image captured by each depth camera and that camera's last keyframe to obtain a transformation matrix between the two frames; and, if the transformation matrix is greater than or equal to a preset transformation threshold, determining that the current frame image is the current keyframe of the corresponding depth camera and storing the keyframe.
The first frame image captured by each depth camera is a keyframe by default; each subsequently captured frame image is compared with that camera's most recent keyframe to determine whether it is a keyframe. The preset transformation threshold is set in advance according to the motion of the depth camera while capturing images; for example, if the camera pose changes greatly between two adjacent frames, the preset transformation threshold is set larger. Feature point matching may use a related matching algorithm, for example, matching features of the color images captured by the RGB-D cameras based on the Oriented FAST and Rotated BRIEF (ORB) algorithm (a sparse algorithm), or using a direct method for dense registration.
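A minimal sketch of this keyframe decision is given below; the embodiment only requires comparing the transformation matrix against a preset threshold, so the decomposition into separate translation and rotation-angle tests, and the threshold values themselves, are illustrative assumptions:

```python
import numpy as np

def is_keyframe(T_rel, trans_thresh=0.10, rot_thresh_deg=10.0):
    """Decide whether the current frame is a new keyframe from the transform
    T_rel (4x4) between the current frame and the camera's last keyframe."""
    t_norm = np.linalg.norm(T_rel[:3, 3])
    # Rotation angle recovered from the trace of the rotation block.
    cos_angle = np.clip((np.trace(T_rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    angle_deg = np.degrees(np.arccos(cos_angle))
    return t_norm >= trans_thresh or angle_deg >= rot_thresh_deg
```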
FIG. 3 is a flowchart of a depth camera calibration method according to Embodiment 2 of the present invention. As shown in FIG. 3, the method includes steps S301 to S306.
In step S301, at least two depth cameras in the panoramic depth camera system are controlled to synchronously acquire images during motion.
Each depth camera is provided with a corresponding label.
In step S302, the pose of at least one first-label camera when capturing each frame image is acquired.
In step S303, feature points of the current frame image captured by a second-label camera are matched against the historical keyframes of the at least one first-label camera; if a historical keyframe and the current frame image reach the matching threshold, it is determined that the second-label camera has a historical viewing-angle overlap with the corresponding first-label camera.
Each time the second-label camera captures a new frame image, its feature points are matched against the keyframes stored for the first-label camera (i.e., the historical keyframes); if the matching threshold is reached, a historical viewing-angle overlap is considered to have occurred. Feature point matching may use a related matching algorithm, for example, matching features of the color images captured by the RGB-D cameras based on the sparse ORB algorithm, or using a direct method for dense registration. The matching threshold may be a preset number of matched feature points.
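The overlap test can be sketched with OpenCV's ORB implementation as follows; the threshold value and the brute-force matcher choice are illustrative assumptions, not prescribed by the embodiment:

```python
import cv2

def detect_overlap(curr_img, keyframe_imgs, match_thresh=50):
    """Sketch of the historical viewing-angle overlap test: ORB features of
    the current frame of a second-label camera are matched against the stored
    keyframes of a first-label camera; `match_thresh` is a number of matched
    feature points."""
    orb = cv2.ORB_create()
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    def descriptors(img):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
        _, des = orb.detectAndCompute(gray, None)
        return des

    des_curr = descriptors(curr_img)
    if des_curr is None:
        return None
    for idx, kf_img in enumerate(keyframe_imgs):
        des_kf = descriptors(kf_img)
        if des_kf is None:
            continue
        if len(bf.match(des_curr, des_kf)) >= match_thresh:
            return idx  # overlap found with this historical keyframe
    return None
```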
In step S304, abnormal data is removed from the feature-point correspondences between the current frame image captured by the second-label camera and the corresponding historical keyframe, and the relative positional relationship between the current frame image and the corresponding historical keyframe is calculated from the remaining feature-point correspondences.
In step S305, the transformation relationship between the pose of the second-label camera when capturing the current frame image and the pose of the first-label camera when capturing the corresponding historical keyframe is determined from the relative positional relationship.
In step S306, the relative pose of the second-label camera and the first-label camera at the current frame moment is determined from the transformation relationship and the poses of the first-label camera from the capture of the corresponding historical keyframe up to the capture of the current frame image.
The Random Sample Consensus (RANSAC) algorithm may be used to remove the abnormal data, and the relative positional relationship between the two frames with overlapping viewing angles is calculated from the remaining feature-point correspondences; from the relative positional relationship between the images, the corresponding camera pose transformation relationship can be obtained. Since the poses of the first-label camera when capturing the frame images have been estimated all along, the first-label camera poses from the historical keyframe corresponding to the overlap up to the current frame are all known, from which the relative pose, at the current frame moment (i.e., the moment when the historical viewing-angle overlap occurs), of the second-label camera and the first-label camera can be derived. The relative pose likewise involves six degrees of freedom, comprising the rotation matrix R and the translation matrix T.
Illustratively, the panoramic depth camera system includes three depth cameras A, B, and C, where camera A is the first-label camera and cameras B and C are second-label cameras. The three cameras acquire images synchronously. At frame 10, the 10th frame of camera B has a historical viewing-angle overlap with the 5th frame (a historical keyframe) of camera A. From the feature-point matching between these two overlapping images, the transformation relationship between the pose of camera B at frame 10 and the pose of camera A at frame 5 can be calculated. Since camera A has been performing pose estimation all along and has recorded its poses from frame 5 to frame 10, deriving frame by frame yields the relative pose of camera B and camera A at frame 10.
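The frame-by-frame derivation in this example can be written compactly as a chain of 4×4 transforms. In the sketch below (names are illustrative), M is the transform obtained from the overlap matching (camera B's frame 10 expressed relative to camera A's keyframe at frame 5), and the increments are camera A's recorded frame-to-frame pose changes for frames 6 through 10:

```python
import numpy as np
from functools import reduce

def extrinsic_at_current_frame(M_b10_in_a5, A_increments_5_to_10):
    """Chain camera A's per-frame pose increments (each the pose change
    relative to the previous frame) to move the matched transform M from
    A's keyframe 5 to the current frame:  E = D^{-1} @ M,
    where D is the accumulated motion of camera A from frame 5 to 10."""
    D = reduce(lambda acc, T: acc @ T, A_increments_5_to_10, np.eye(4))
    return np.linalg.inv(D) @ M_b10_in_a5
```

With D the accumulated motion of camera A between the two moments, E is the relative pose of camera B with respect to camera A at frame 10.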
In the depth camera calibration method of this embodiment, historical keyframes are used to determine that the second-label camera has a historical viewing-angle overlap with the first-label camera; the pose transformation relationship between the second-label camera and the first-label camera is calculated from the two overlapping frames, and then, using the first-label camera poses between the corresponding historical keyframe and the current frame, the relative pose of the second-label camera and the first-label camera at the same moment can be derived. The computation is simple, which ensures fast online calibration.
In an embodiment, to obtain more accurate relative poses between cameras and improve calibration accuracy, after the relative pose of the second-label camera and the first-label camera at the same moment is calculated, the already-calculated relative poses may be globally optimized. For example, this may include: if a keyframe is present among the current frame images synchronously captured by the depth cameras whose relative poses have been calculated, performing loop-closure detection based on that keyframe and the historical keyframes of the depth cameras whose relative poses have been calculated; and, if the loop closure succeeds (i.e., a matching historical keyframe is found), optimizing and updating the relative poses between the corresponding depth cameras according to that keyframe and the corresponding historical keyframe.
Loop-closure detection determines whether a depth camera has moved to a place it has reached before or to a place with a large overlap with a historical viewing angle. Loop-closure detection is performed on keyframes: for each frame image captured by a depth camera, it is necessary to determine whether the frame image is a keyframe; if so, loop-closure detection is performed; if not, the next keyframe is awaited before performing loop-closure detection. The keyframe determination may be performed by matching feature points between each frame image captured by the depth camera and that camera's last keyframe to obtain a transformation matrix between the two frames; if the transformation matrix is greater than or equal to the preset transformation threshold, the current frame image is determined to be the camera's current keyframe.
Since the relative poses between cameras are involved, the comparison objects of the loop-closure detection in this embodiment are the historical keyframes of the depth cameras in the camera system whose relative poses have been calculated, not merely the camera's own historical keyframes. If the loop closure succeeds, the relative poses between the corresponding depth cameras are optimized and updated according to the current keyframe and the corresponding historical keyframe, so as to reduce the accumulated error and improve the accuracy of the relative poses between cameras.
In an embodiment, performing loop-closure detection between the current keyframe and the historical keyframes may consist of matching the ORB feature points of the current keyframe against those of the historical keyframes; a high matching degree indicates a successful loop closure. In an embodiment, according to the degree of matching between the current keyframe and the historical keyframes, one or more historical keyframes with a high matching degree may be selected for optimizing and updating the relative poses between the corresponding cameras. It should be noted that if a matching historical keyframe belongs to the depth camera itself, the camera's own pose is optimized according to the current keyframe and the historical keyframe. The above optimization and update process for the relative poses can be started once the relative pose of one pair of cameras in the camera system has been obtained; the already-calculated relative poses between cameras are updated as the cameras capture images during motion. When the relative poses between all cameras in the camera system have been calculated and a preset condition is satisfied (for example, the number of optimizations of the relative poses between the cameras reaches a preset number, or a preset error requirement is met), the optimization and update stops, yielding the final, more accurate calibration parameters.
Illustratively, again taking the panoramic depth camera system with three depth cameras A, B, and C as an example, after the relative pose of camera B and camera A at the same moment has been calculated, while determining whether camera C and camera A have a historical viewing-angle overlap (to determine their relative pose), the relative pose between cameras B and A is also optimized and updated according to the keyframes captured by camera A and/or camera B. Once the relative pose of camera C and camera A at the same moment has been obtained, the relative pose of camera B and camera C can be further derived, thereby obtaining the relative poses among the multiple cameras, which are then optimized and updated. At the current frame moment, the three cameras synchronously capture their images; if it is determined that image a captured by camera A and image b captured by camera B are keyframes while image c captured by camera C is not, loop-closure detection is performed on keyframes a and b. If both loop closures succeed, optimization and updating are performed according to keyframes a and b respectively; the relative pose calculation during the optimization and update is the same as the initial relative pose calculation and is not repeated here.
In this embodiment, the already-calculated relative poses between cameras can be optimized and updated in real time according to the images captured by the cameras, improving the accuracy of the camera calibration.
Embodiment 3
The depth camera calibration methods of the above embodiments use the first-label camera preset in the panoramic depth camera system as the reference to obtain the relative poses between cameras. On the basis of the above embodiments, this embodiment provides another depth camera calibration method to further speed up online calibration. FIG. 4 is a flowchart of a depth camera calibration method according to Embodiment 3 of the present invention. As shown in FIG. 4, the method includes steps S401 to S405.
In step S401, at least two depth cameras in the panoramic depth camera system are controlled to synchronously acquire images during motion.
Each depth camera is provided with a corresponding label.
In step S402, the pose of at least one first-label camera when capturing each frame image is acquired.
In step S403, if a second-label camera has a historical viewing-angle overlap with the first-label camera, the relative pose of the second-label camera and the first-label camera at the same moment is determined according to the images corresponding to the historical viewing-angle overlap and the poses of the first-label camera when capturing the frame images.
In step S404, the label of the second-label camera is modified to the first label.
That is to say, after the relative pose is calculated, the second-label camera that had the historical viewing-angle overlap with the first-label camera is brought into the reference set, expanding the reference range.
In step S405, the operations of acquiring the poses of the at least one first-label camera when capturing the frame images, determining the relative pose at the same moment of a second-label camera having a historical viewing-angle overlap and the corresponding first-label camera, and modifying the label (i.e., repeating S402 to S404) are performed repeatedly until the at least two depth cameras no longer include a second-label camera. When the labels of all depth cameras in the camera system have been modified to the first label, the relative pose of every second-label camera with respect to the other cameras has been calculated, and the calibration result is obtained.
In this embodiment, after the relative pose of a second-label camera and the first-label camera is obtained by means of the historical viewing-angle overlap, that second-label camera is also used as a reference, expanding the reference range and increasing the probability that other second-label cameras encounter a historical viewing-angle overlap, which can further speed up calibration.
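A sketch of this label-promotion loop is shown below; the camera objects, the `find_overlap` callback (standing in for steps S402 and S403), and the bookkeeping are hypothetical scaffolding around the embodiment's control flow:

```python
import numpy as np

def calibrate(cameras, find_overlap):
    """Sketch of the Embodiment-3 loop. `cameras` is a list of objects with
    `.id` and `.label` (1 = first/reference label, 0 = second label);
    `find_overlap(cam, refs)` returns (ref_cam, rel_pose_4x4) when a
    historical viewing-angle overlap is found, else None. In the real
    system this loop runs continuously as new frames stream in."""
    extrinsics = {c.id: np.eye(4) for c in cameras if c.label == 1}
    while any(c.label == 0 for c in cameras):
        for cam in [c for c in cameras if c.label == 0]:
            refs = [c for c in cameras if c.label == 1]
            hit = find_overlap(cam, refs)
            if hit is None:
                continue
            ref_cam, rel_pose = hit
            # Express the newly calibrated camera relative to the initial reference.
            extrinsics[cam.id] = extrinsics[ref_cam.id] @ rel_pose
            cam.label = 1  # S404: promote to the reference set
    return extrinsics
```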
For the determination of keyframes, the determination of historical viewing-angle overlap, the calculation of relative poses, and the optimization and updating of relative poses, please refer to the description of the foregoing embodiments, which is not repeated here.
Embodiment 4
On the basis of the above embodiments, this embodiment further optimizes "acquiring the pose of at least one first-label camera when capturing the frame images" to increase computation speed. FIG. 5 is a flowchart of acquiring the pose of a first-label camera according to Embodiment 4 of the present invention. As shown in FIG. 5, the method includes steps S501, S502, and S503.
In step S501, for each first-label camera, feature extraction is performed on each frame image captured by the first-label camera to obtain at least one feature point of each frame image.
Feature extraction on an image serves to find pixels with distinctive features in the frame image (i.e., feature points); for example, they may be pixels at corners, textures, or edges of a frame image. Feature extraction on each frame image may use the ORB algorithm to find at least one feature point in the frame image.
In step S502, feature point matching is performed on two adjacent frame images to obtain the feature-point correspondences between the two adjacent frames.
Considering that the camera captures images at a relatively high rate during motion, parts of the content of two adjacent frames captured by the same camera are identical, so there is a certain correspondence between the feature points of the two frames. Sparse ORB feature registration or direct dense registration may be used to obtain the feature-point correspondences between two adjacent frame images.
In an embodiment, taking one feature point correspondence between two adjacent frames as an example, suppose feature points X1 and X2, which represent the same texture feature in the two frames, are located at different positions in the two frame images. Let H(X1, X2) denote the Hamming distance between the two feature points: the two feature descriptors are XORed, and the number of 1s in the result is counted as the Hamming distance for this feature point correspondence between the two adjacent frames.
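For example, for 256-bit ORB descriptors stored as 32 bytes each, the Hamming distance can be computed as sketched below (illustrative code):

```python
import numpy as np

def hamming_distance(d1, d2):
    """Hamming distance between two binary descriptors stored as uint8
    arrays: XOR the descriptors and count the 1 bits in the result."""
    x = np.bitwise_xor(d1, d2)
    return int(np.unpackbits(x).sum())

# Example: two 256-bit descriptors differing in three bits.
d1 = np.zeros(32, dtype=np.uint8)
d2 = d1.copy()
d2[0] = 0b00000111
print(hamming_distance(d1, d2))  # 3
```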
In step S503, the abnormal data in the feature-point correspondences is removed, and the nonlinear term $J_{C_{i,j}}(\xi)^T J_{C_{i,j}}(\xi)$ in $J(\xi)^T J(\xi)$ is computed from a linear component consisting of second-order statistics of the remaining feature points and a nonlinear component consisting of the camera poses; then $\delta=-(J(\xi)^T J(\xi))^{-1}J(\xi)^T r(\xi)$ is iterated multiple times to solve for the pose at which the reprojection error $r(\xi)$ is smaller than a preset threshold. Specifically, the Gauss-Newton method may be used for the iterative computation. Preferably, the pose that minimizes the reprojection error may be computed.
Here $r(\xi)$ denotes the vector containing all reprojection errors; $J(\xi)$ is the Jacobian matrix of $r(\xi)$; $\xi$ denotes the Lie algebra of the camera poses; $\delta$ denotes the update increment of each iteration; $R_i$ denotes the rotation matrix of the camera when capturing the $i$-th frame image; $R_j$ denotes the rotation matrix of the camera when capturing the $j$-th frame image; $p_i^k$ denotes the $k$-th feature point on the $i$-th frame image; $p_j^k$ denotes the $k$-th feature point on the $j$-th frame image; $C_{i,j}$ denotes the set of feature-point correspondences between the $i$-th and $j$-th frame images; $\|C_{i,j}\|$ denotes the norm of $C_{i,j}$, i.e., the number of feature-point correspondences between the $i$-th and $j$-th frame images; and $[\,\cdot\,]_\times$ denotes the cross-product (skew-symmetric) matrix.
In an embodiment, the expression of the nonlinear term is:

$$\sum_{k\in C_{i,j}} [R_i p_i^k]_\times\,[R_j p_j^k]_\times \;=\; R_j W^T R_i^T \;-\; \operatorname{tr}\!\big(R_j W^T R_i^T\big)\, I_{3\times 3}, \tag{1}$$

where $W=\sum_{k\in C_{i,j}} p_i^k (p_j^k)^T$ denotes the linear component, and $\bar r_{il}$ and $r_{jl}$ denote the nonlinear components: the element in row $l$ and column $m$ of $R_j W^T R_i^T$ equals $\bar r_{im}\,W\,r_{jl}$ and the trace equals $\sum_l \bar r_{il}\,W\,r_{jl}$, where $\bar r_{il}$ is the $l$-th row of the rotation matrix $R_i$ and $r_{jl}$ is the transpose of the $l$-th row of the rotation matrix $R_j$, $l=0,1,2$ (this embodiment counts from 0 following programming practice, so $l=0$ denotes what is usually called the first row of the matrix, and so on).
In an embodiment, the feature-point correspondences between two adjacent frames obtained in step S502 may contain unqualified abnormal data. For example, of two adjacent frame images, each frame necessarily contains feature points that the other frame does not have; performing the matching operation of step S502 on them produces abnormal correspondences. In an embodiment, the RANSAC algorithm may be used to remove the abnormal data, and the remaining feature-point correspondences are denoted

$$C_{i,j}=\big\{\,c_{i,j}^k=(p_i^k,\;p_j^k)\,\big\},$$

where $c_{i,j}^k$ denotes the correspondence between the $k$-th feature points of the $i$-th and $j$-th frame images, and $j=i-1$.
Computing the camera poses amounts to solving the nonlinear least-squares problem between two frame images with the following cost function:

$$E=\sum_{i=1}^{N}\;\sum_{k\in C_{i,j}}\big\|\,T_i\,\tilde p_i^k-T_j\,\tilde p_j^k\,\big\|_2^2,\qquad j=i-1, \tag{2}$$

where $E$ denotes the reprojection error in Euclidean space of the $i$-th frame image relative to the $j$-th frame image (in this embodiment, the previous frame image); $T_i$ denotes the pose of the camera when capturing the $i$-th frame image (per the foregoing explanation of camera pose, it actually refers to the pose change when capturing the $i$-th frame image relative to the previous frame image); $T_j$ denotes the pose of the camera when capturing the $j$-th frame image; $N$ denotes the total number of frames captured by the camera; $\tilde p_i^k$ denotes the homogeneous coordinates of the $k$-th feature point $p_i^k$ on the $i$-th frame image; and $\tilde p_j^k$ denotes the homogeneous coordinates of the $k$-th feature point $p_j^k$ on the $j$-th frame image. It should be noted that, for the same $i$ and $k$, $p_i^k$ and $\tilde p_i^k$ denote the same point; the difference is that $p_i^k$ is in local coordinates and $\tilde p_i^k$ is in homogeneous coordinates.
To speed up computation, this embodiment does not evaluate the cost function (2) directly; instead, the nonlinear term $J_{C_{i,j}}(\xi)^T J_{C_{i,j}}(\xi)$ in $J(\xi)^T J(\xi)$ is computed from the linear component containing the second-order statistics of the remaining feature-point correspondences and the nonlinear component containing the camera poses, and $\delta=-(J(\xi)^T J(\xi))^{-1}J(\xi)^T r(\xi)$ is iterated multiple times to solve for the pose at which the reprojection error is smaller than the preset threshold. As can be seen from the expression (1) for the nonlinear term, when computing it, the linear part $\sum_{k} p_i^k (p_j^k)^T$, which is fixed between the two frame images, is treated as a whole $W$; the computation does not have to be carried out once per feature-point correspondence, which reduces the complexity of the camera pose computation and enhances its real-time performance.
The derivation of formula (1) is explained below, and the principle of the reduced algorithmic complexity is analyzed along the way.
The camera pose in Euclidean space when capturing the $i$-th frame image is $T_i=[R_i\,|\,t_i]$; in fact, $T_i$ refers to the pose transformation of the camera when capturing the $i$-th frame image relative to capturing the $j$-th frame image (in this embodiment, the previous frame image), comprising the rotation matrix $R_i$ and the translation matrix $t_i$. The rigid transformation $T_i$ in Euclidean space is represented by the Lie algebra $\xi_i$ on the SE(3) space; that is, $\xi_i$ also represents the camera pose when capturing the $i$-th frame image, and $T(\xi_i)$ maps the Lie algebra $\xi_i$ to $T_i$ in Euclidean space.
For each feature-point correspondence $c_{i,j}^k$, its reprojection error is:

$$r_k(\xi_i,\xi_j)=T(\xi_i)\,\tilde p_i^k-T(\xi_j)\,\tilde p_j^k. \tag{3}$$

The reprojection error in Euclidean space in formula (2) can be expressed as $E(\xi)=\|r(\xi)\|$, with $r(\xi)$ denoting the vector containing all reprojection errors, namely:

$$r(\xi)=\big(\cdots,\;r_k(\xi_i,\xi_j)^T,\;\cdots\big)^T. \tag{4}$$

$r_k(\xi_i,\xi_j)$ can be written component-wise as (for brevity, $\xi_i$ is omitted below): the $l$-th component is

$$r_k^{(l)}=\big(\bar r_{il}\,p_i^k+t_{il}\big)-\big(\bar r_{jl}\,p_j^k+t_{jl}\big),\qquad l=0,1,2, \tag{5}$$

where $\bar r_{il}$ denotes the $l$-th row of the rotation matrix $R_i$ and $t_{il}$ denotes the $l$-th element of the translation vector $t_i$.

$$J(\xi)^T J(\xi)=\sum_{(i,j)} J_{C_{i,j}}^T J_{C_{i,j}},\qquad J_{C_{i,j}}=\big(\cdots,\;(J_{C_{i,j}}^m)^T,\;\cdots\big)^T, \tag{6}$$

where $J_{C_{i,j}}$ denotes the Jacobian matrix corresponding to the feature-point correspondences between the $i$-th and $j$-th frame images, and $m$ indexes the $m$-th feature-point correspondence.
$(J_{C_{i,j}}^k)^T J_{C_{i,j}}^k$ is composed of 6×6 square blocks; $(J_{C_{i,j}}^k)^T$ denotes the transpose of the matrix $J_{C_{i,j}}^k$, whose expression is as follows:

$$J_{C_{i,j}}^k=\Big[\;I_{3\times 3}\quad -[R_i p_i^k]_\times\quad -I_{3\times 3}\quad [R_j p_j^k]_\times\;\Big], \tag{7}$$

where $I_{3\times 3}$ denotes the 3×3 identity matrix. According to formulas (6) and (7), the four nonzero 6×6 sub-matrices of $J_{C_{i,j}}^T J_{C_{i,j}}$ are:

$$A_{uv}=\sum_{k\in C_{i,j}}\big(J_u^k\big)^T J_v^k,\quad u,v\in\{i,j\},\qquad J_i^k=\big[\,I_{3\times 3}\;\; -[R_i p_i^k]_\times\,\big],\quad J_j^k=-\big[\,I_{3\times 3}\;\; -[R_j p_j^k]_\times\,\big]. \tag{8}$$

The following takes $A_{ij}$ as an example; the other three nonzero sub-matrices are computed analogously and are not detailed again:

$$A_{ij}=\sum_{k\in C_{i,j}}\big(J_i^k\big)^T J_j^k=-\sum_{k\in C_{i,j}}\begin{bmatrix} I_{3\times 3} & -[R_j p_j^k]_\times\\[2pt] [R_i p_i^k]_\times & -[R_i p_i^k]_\times [R_j p_j^k]_\times \end{bmatrix}. \tag{9}$$

Here, combining with formula (5) and the identity $[a]_\times[b]_\times=b\,a^T-(a^T b)\,I_{3\times 3}$, one obtains:

$$\sum_{k\in C_{i,j}} [R_i p_i^k]_\times [R_j p_j^k]_\times=\sum_{k\in C_{i,j}}\Big(R_j\,p_j^k\,(p_i^k)^T R_i^T-\big((R_i p_i^k)^T R_j p_j^k\big)\,I_{3\times 3}\Big). \tag{10}$$

Denoting $\sum_{k\in C_{i,j}} p_i^k (p_j^k)^T$ by $W$ and combining with formula (5), the nonlinear term $\sum_k [R_i p_i^k]_\times [R_j p_j^k]_\times$ in formula (10) simplifies to formula (1); the structure terms in this nonlinear term are linearized into $W$. Although, with respect to the structure terms $p_i^k$ and $p_j^k$, $\sum_k [R_i p_i^k]_\times [R_j p_j^k]_\times$ is nonlinear, the above analysis shows that all nonzero elements of $J_{C_{i,j}}^T J_{C_{i,j}}$ are linear in the second-order statistics of the structure terms in $C_{i,j}$; the second-order statistics of the structure terms are $\sum_k p_i^k (p_j^k)^T$ and $\sum_k p_i^k (p_i^k)^T$ (the off-diagonal $[\,\cdot\,]_\times$ blocks additionally involve only the sums $\sum_k p_i^k$ and $\sum_k p_j^k$). That is to say, the sparse matrix $J_{C_{i,j}}^T J_{C_{i,j}}$ is element-wise linear in the second-order statistics of the structure terms in $C_{i,j}$.
It should be noted that the Jacobian matrix of each correspondence $c_{i,j}^k$ is determined by the geometric terms $\xi_i,\xi_j$ and the structure terms $p_i^k$ and $p_j^k$. For all correspondences in the same frame pair $C_{i,j}$, the corresponding Jacobian matrices share the same geometric terms but have different structure terms. For a frame pair $C_{i,j}$, when computing $J_{C_{i,j}}^T J_{C_{i,j}}$, related algorithms depend on the number of feature-point correspondences in $C_{i,j}$, whereas this embodiment can compute $J_{C_{i,j}}^T J_{C_{i,j}}$ efficiently with fixed complexity: only the second-order statistic $W$ of the structure terms needs to be computed, without every correspondence having to contribute its structure terms to the computation separately; that is, the four nonzero sub-matrices of $J_{C_{i,j}}^T J_{C_{i,j}}$ can be computed with complexity O(1) instead of complexity O($\|C_{i,j}\|$).
Therefore, the sparse matrices $J^T J$ and $J^T r$ required in the iterative step of the nonlinear Gauss-Newton optimization of $\delta=-(J(\xi)^T J(\xi))^{-1}J(\xi)^T r(\xi)$ can be computed efficiently with complexity O(M) instead of the original computational complexity O($N_{coor}$), where $N_{coor}$ denotes the total number of feature-point correspondences over all frame pairs and M denotes the number of frame pairs. In general, $N_{coor}$ is approximately 300 in sparse matching and approximately 10000 in dense matching, much larger than the number of frame pairs M.
Following the above derivation, in the camera pose computation, for each frame pair, $W$ is computed, and then formulas (1), (10), (9), (8), and (6) are evaluated to obtain $J(\xi)^T J(\xi)$; the $\xi$ that minimizes $r(\xi)$ can then be obtained by iterative computation.
在一实施例中,为了得到更为准确的第一标签相机位姿,在获取至少一个第一标签相机采集多个帧图像时的位姿之后,可以对获取的位姿进行全局一致的优化更新,例如可以包括:若第一标签相机采集的当前帧图像为关键帧,则根据当前关键帧和该第一标签相机的历史关键帧进行回环检测;若回环成功,根据当前关键帧对已获取的第一标签相机位姿进行全局一致的优化更新。In an embodiment, in order to obtain a more accurate first label camera pose, after obtaining the pose of the at least one first label camera to capture the plurality of frame images, a global consistent optimization update may be performed on the acquired pose. For example, if the current frame image collected by the first tag camera is a key frame, the loop detection is performed according to the current key frame and the historical key frame of the first tag camera; if the loopback is successful, the acquired according to the current key frame pair. The first label camera poses a globally consistent optimized update.
That is, for each frame image captured by the first tag camera, it is necessary to determine whether the frame image is a keyframe; if it is, loop closure detection is performed; if not, the system waits for the next keyframe before performing loop closure detection. The keyframe decision may be made by matching feature points between each frame image captured by the first tag camera and the previous keyframe of that camera to obtain a transformation relationship matrix between the two frames; if the transformation relationship matrix is greater than or equal to a preset transformation threshold, the current frame image is determined to be the current keyframe of that camera.
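A minimal sketch of such a keyframe test follows; it interprets the "transformation relationship matrix greater than or equal to a preset threshold" condition as thresholding the motion encoded by the relative transform, and the threshold values are illustrative assumptions:

```python
import numpy as np

def is_keyframe(T_rel, trans_thresh=0.2, rot_thresh_deg=10.0):
    """Decide whether the current frame is a new keyframe from the 4x4
    relative transform T_rel between it and the last keyframe: promote
    the frame when either the translation magnitude or the rotation
    angle exceeds its (illustrative) threshold."""
    t = np.linalg.norm(T_rel[:3, 3])                   # translation magnitude
    cos_angle = (np.trace(T_rel[:3, :3]) - 1.0) / 2.0  # rotation angle from trace
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return t >= trans_thresh or angle >= rot_thresh_deg
```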
A globally consistent optimized update means that, during calibration, as the camera moves, when a depth camera returns to a place it has previously reached or has a large overlap with a historical viewpoint, the current frame image is consistent with the previously captured images, rather than producing staggering, aliasing and similar artifacts. Loop closure detection determines, from the depth camera's current observation, whether the camera has moved to a previously reached place or to a place with a large overlap with a historical viewpoint; if the loop closure succeeds, a globally consistent optimized update of the first tag camera poses is performed based on the current keyframe, reducing the accumulated error.
In an embodiment, loop closure detection between the current keyframe and the historical keyframes may be performed by matching the ORB feature points of the current keyframe against those of the historical keyframes; if the matching degree is high, the loop closure is successful. In an embodiment, according to the matching degree between the current keyframe and the historical keyframes, one or more historical keyframes with a high matching degree may be selected for the globally consistent optimized update of the camera poses.
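A hedged OpenCV-based sketch of this ORB matching step is given below; the feature count and match threshold are assumptions, not values from the patent:

```python
import cv2

def loop_closure_candidates(curr_img, history_imgs, min_matches=50):
    """Score the current keyframe against each historical keyframe with
    cross-checked Hamming-distance ORB matches and keep the well-matched
    candidates, best first. `min_matches` is illustrative."""
    orb = cv2.ORB_create(nfeatures=1000)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    _, d_curr = orb.detectAndCompute(curr_img, None)
    candidates = []
    for idx, img in enumerate(history_imgs):
        _, d_hist = orb.detectAndCompute(img, None)
        if d_curr is None or d_hist is None:
            continue
        matches = matcher.match(d_curr, d_hist)
        if len(matches) >= min_matches:
            candidates.append((idx, len(matches)))
    return sorted(candidates, key=lambda c: -c[1])
```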
In an embodiment, the globally consistent optimized update of the camera poses solves, based on the correspondences between the current keyframe and the one or more historical keyframes with a high matching degree, the problem of minimizing the transformation error between the current keyframe and all well-matched historical keyframes, with (equation image PCTCN2019085515-appb-000059) as the cost function. Here, E(T_1, T_2, ..., T_N-1 | T_i ∈ SE3, i ∈ [1, N-1]) denotes the transformation error over all frame pairs (any one historical matching keyframe together with the current keyframe forms a frame pair); N denotes the number of historical keyframes with a high matching degree with the current keyframe; and E_i,j denotes the transformation error between the i-th frame and the j-th frame, the transformation error being the reprojection error.
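The cost being minimized can be sketched as a plain sum of pairwise errors; `pair_error` stands in for the reprojection-error term and is an assumption of this sketch, not the patent's exact formulation:

```python
import numpy as np

def total_transform_error(poses, frame_pairs, pair_error):
    """Sum the pairwise transformation errors E_i,j over all frame pairs
    formed by the current keyframe and its matched historical keyframes;
    the optimization described above minimizes this sum over the poses."""
    return sum(pair_error(poses[i], poses[j]) for i, j in frame_pairs)

# Illustrative call: two identity poses, one frame pair, squared-distance error.
poses = [np.eye(4), np.eye(4)]
print(total_transform_error(poses, [(0, 1)],
                            lambda Ti, Tj: float(np.sum((Ti - Tj) ** 2))))
```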
In an embodiment, during the camera pose optimization update, the relative poses between non-keyframes and their corresponding keyframes need to be kept unchanged. For example, the optimization update algorithm may use a related BA algorithm, or the method in step S503 may be used to improve the optimization speed, the details of which are not repeated here. Likewise, the algorithm of this embodiment (i.e., the method in S503) may also be used for the computation and optimization of the relative poses between cameras.
In this embodiment, the camera pose is computed iteratively from a linear component containing the second-order statistics of the feature points and a nonlinear component containing the camera pose. When computing the nonlinear term (equation image PCTCN2019085515-appb-000060), the fixed linear part (equation image PCTCN2019085515-appb-000061) is treated as a whole W, which reduces the complexity of the camera pose computation, enhances its real-time performance, and places low demands on the hardware. Applying the above algorithm to pose solving and to the back-end optimization yields fast and globally consistent calibration parameters.
It should be noted that embodiments of the present invention can implement pose estimation and optimization based on the SLAM pipeline and principles: pose estimation is implemented by the front-end visual odometry thread, and pose optimization is implemented by the back-end loop closure detection and optimization algorithm, for example using a related Bundle Adjustment (BA) algorithm or the algorithm in this embodiment.
During SLAM, the following operations are performed on the captured images: pose estimation and optimization of the first tag cameras, computation of the relative poses between cameras from view overlaps, and optimization of the relative poses already computed. These operations can be performed simultaneously. Through globally consistent SLAM, the pose of each camera is optimized and the computed relative poses between cameras are continuously updated, while a local map and a globally consistent global map can be maintained, suiting the application background of conventional RGB-D cameras for indoor robot navigation or three-dimensional scene reconstruction. The map in SLAM refers to the motion trajectory of the camera in the world coordinate system and the positions, in the world coordinate system, of the keyframes observed along that trajectory. If the camera system undergoes rigid deformation due to a physical impact, this embodiment only needs to launch the calibration procedure for a rapid recalibration, without re-arranging any calibration object.
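A minimal sketch of this concurrent layout, with stubbed helpers (none of which are the patent's actual routines), might look like:

```python
import threading
import queue

pose_updates = queue.Queue()

def estimate_pose(camera_id, frame):       # stub front-end visual odometry step
    return frame

def detect_loop_closure(camera_id, pose):  # stub loop closure test
    return False

def optimize_poses_globally():             # stub globally consistent optimization
    pass

def front_end(camera_id, frames):
    """Per-camera visual odometry thread: push pose updates to the queue."""
    for frame in frames:
        pose_updates.put((camera_id, estimate_pose(camera_id, frame)))

def back_end():
    """Back-end thread: consume pose updates, run loop closure and optimization."""
    while True:
        camera_id, pose = pose_updates.get()
        if detect_loop_closure(camera_id, pose):
            optimize_poses_globally()

threading.Thread(target=back_end, daemon=True).start()
front_end(camera_id=0, frames=range(5))
```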
Embodiment 5
This embodiment provides a depth camera calibration apparatus, which can be used to perform the depth camera calibration method provided by any embodiment of the present application, and has the functional modules and beneficial effects corresponding to the executed method. The apparatus may be implemented by hardware and/or software, for example by a CPU. For technical details not exhaustively described in this embodiment, reference may be made to the depth camera calibration method provided by any embodiment of the present application. Control signals, images and the like need to be transmitted between the depth camera calibration apparatus and the panoramic depth camera system, and many communication modes are available between the two: for example, wired communication through a serial port, a network cable or the like, or wireless communication through Bluetooth, wireless broadband or the like. As shown in FIG. 6, the apparatus includes: a camera control module 61, a pose acquisition module 62 and a relative pose calculation module 63.
The camera control module 61 is configured to control at least two depth cameras in the panoramic depth camera system to synchronously capture images during motion, wherein each depth camera is provided with a corresponding tag.
The pose acquisition module 62 is configured to acquire the poses of at least one first tag camera when capturing multiple frame images.
The relative pose calculation module 63 is configured to, in the case where a second tag camera has a historical view overlap with a first tag camera, determine the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the historical view overlap and the poses of the first tag camera when capturing multiple frame images.
In an embodiment, the above apparatus may further include: a tag modification module and an operation execution module.
The tag modification module is configured to modify the tag of the second tag camera to the first tag after the relative pose calculation module 63 has calculated the relative pose of the second tag camera and the first tag camera at the same moment.
The operation execution module is configured to repeatedly perform the operations of acquiring the poses of at least one first tag camera when capturing multiple frame images, determining the relative pose at the same moment between a second tag camera having a historical view overlap and the corresponding first tag camera, and modifying the tag (i.e., repeatedly executing the operations of the pose acquisition module 62, the relative pose calculation module 63 and the tag modification module), until the at least two depth cameras of the panoramic depth camera system contain no second tag camera; a sketch of this loop is given below.
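The following is a hedged sketch of that repeat-until-done loop; the camera objects, `find_overlap` and `compute_relative_pose` are assumed interfaces, and in practice new frames keep arriving until every camera eventually gains a view overlap:

```python
from dataclasses import dataclass

@dataclass
class Camera:
    id: int
    tag: str  # "first" or "second"

def calibrate(cameras, find_overlap, compute_relative_pose):
    """Promote 'second'-tag cameras to 'first' as overlaps are found,
    recording each camera's relative pose, until none remain."""
    relative_poses = {}
    while any(cam.tag == "second" for cam in cameras):
        for cam in cameras:
            if cam.tag != "second":
                continue
            firsts = [c for c in cameras if c.tag == "first"]
            match = find_overlap(cam, firsts)   # (first_cam, keyframe) or None
            if match is not None:
                first_cam, keyframe = match
                relative_poses[cam.id] = compute_relative_pose(cam, first_cam, keyframe)
                cam.tag = "first"               # promote the newly calibrated camera
    return relative_poses
```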
In an embodiment, the above apparatus may further include: a keyframe determination module, configured to, while the pose acquisition module 62 acquires the poses of at least one first tag camera when capturing multiple frame images, match feature points between the current frame image captured by each depth camera and that camera's previous keyframe to obtain a transformation relationship matrix between the two frames; and, if the transformation relationship matrix is greater than or equal to a preset transformation threshold, determine the current frame image to be the current keyframe of the corresponding depth camera and store the keyframe, specifically storing the keyframe and the depth camera to which it belongs.
In an embodiment, the above apparatus may further include: a view overlap determination module, configured to, before the relative pose of the second tag camera and the first tag camera at the same moment is determined, match feature points between the current frame image captured by the second tag camera and the historical keyframes of the above at least one first tag camera; and, if a historical keyframe and the current frame image reach a matching threshold, determine that the second tag camera has a historical view overlap with the corresponding first tag camera.
In an embodiment, the relative pose calculation module 63 includes: a relative position relationship calculation unit, a transformation relationship calculation unit and a relative pose calculation unit.
The relative position relationship calculation unit is configured to remove the abnormal data from the feature point correspondences between the current frame image captured by the second tag camera and the corresponding historical keyframe, and to calculate the relative position relationship between the current frame image and the corresponding historical keyframe according to the remaining feature point correspondences.
The transformation relationship calculation unit is configured to calculate, according to the above relative position relationship, the transformation relationship between the pose of the second tag camera when capturing the current frame image and the pose of the first tag camera when capturing the corresponding historical keyframe.
The relative pose calculation unit is configured to determine the relative pose of the second tag camera and the first tag camera at the current frame moment according to the above transformation relationship and the multiple poses of the first tag camera from capturing the corresponding historical keyframe to capturing the current frame image.
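A hedged sketch of this composition using 4x4 homogeneous transforms follows; the exact chaining convention of `T_match` is an assumption of this sketch:

```python
import numpy as np

def relative_pose_now(T1_hist, T1_now, T_match):
    """Chain the transformation relation T_match (between the second tag
    camera's current pose and the first tag camera's pose at the matched
    historical keyframe) with the first camera's own motion from the
    historical keyframe to the current frame, T1_hist -> T1_now, to get
    the cameras' relative pose at the current frame moment."""
    T2_now = T1_hist @ T_match              # second camera's current pose, world frame
    return np.linalg.inv(T1_now) @ T2_now   # relative pose w.r.t. the first camera now
```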
In an embodiment, the pose acquisition module 62 includes: a feature extraction unit, a feature matching unit and an iterative calculation unit.
The feature extraction unit is configured to, for each first tag camera, perform feature extraction on each frame image captured by that first tag camera to obtain at least one feature point of each frame image.
The feature matching unit is configured to perform feature point matching on two adjacent frame images to obtain the feature point correspondences between the two adjacent frame images.
The iterative calculation unit is configured to remove the abnormal data from the feature point correspondences, to calculate the nonlinear term (equation image PCTCN2019085515-appb-000062) in J(ξ)^T J(ξ) from the linear component containing the second-order statistics of the remaining feature points and the nonlinear component containing the camera pose, and to perform multiple iterations of δ = -(J(ξ)^T J(ξ))^-1 J(ξ)^T r(ξ) to solve for the pose at which the reprojection error is less than a preset threshold;
where r(ξ) denotes the vector containing all reprojection errors; J(ξ) is the Jacobian matrix of r(ξ); ξ denotes the Lie algebra of the camera pose; δ denotes the increment of r(ξ) at each iteration; R_i denotes the rotation matrix of the camera when capturing the i-th frame image; R_j denotes the rotation matrix of the camera when capturing the j-th frame image; (equation image PCTCN2019085515-appb-000063) denotes the k-th feature point on the i-th frame image; (equation image PCTCN2019085515-appb-000064) denotes the k-th feature point on the j-th frame image; C_i,j denotes the set of feature point correspondences between the i-th frame image and the j-th frame image; ||C_i,j||-1 denotes the number of feature point correspondences between the i-th frame image and the j-th frame image; []_× denotes the vector product; and ||C_i,j|| denotes the norm of C_i,j.
In an embodiment, the expression of the nonlinear term (equation image PCTCN2019085515-appb-000065) is:

(equation image PCTCN2019085515-appb-000066)

where (equation image PCTCN2019085515-appb-000067) denotes the linear component; r_il^T and r_jl denote the nonlinear components, r_il^T being the l-th row of the rotation matrix R_i and r_jl being the transpose of the l-th row of the rotation matrix R_j, l = 0, 1, 2.
In an embodiment, the above apparatus may further include: a loop closure detection module and an optimization update module.
The loop closure detection module is configured to, after the poses of at least one first tag camera when capturing multiple frame images have been acquired, perform loop closure detection according to the current keyframe and the historical keyframes of the first tag camera if the current frame image captured by the first tag camera is a keyframe.
The optimization update module is configured to, in the case where the loop closure succeeds, perform a globally consistent optimized update of the acquired first tag camera poses according to the current keyframe.
In an embodiment, the above loop closure detection module is further configured to, after the relative pose calculation module 63 has calculated the relative pose of the second tag camera and the first tag camera at the same moment, perform loop closure detection according to a keyframe among the current frame images synchronously captured by the depth cameras whose relative poses have been calculated, if such a keyframe exists, together with the historical keyframes of those depth cameras; the above optimization update module is further configured to, in the case where the loop closure succeeds, optimize and update the relative poses between the corresponding depth cameras according to that keyframe and the corresponding historical keyframe.
It should be noted that, in the embodiments of the above depth camera calibration apparatus, the units and modules included are divided merely according to functional logic, but the division is not limited to the above, as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are merely for ease of mutual distinction and are not intended to limit the protection scope of the present application.
Embodiment 6
This embodiment provides an electronic device, including: one or more processors, a memory, and a panoramic depth camera system. The memory is configured to store one or more programs. The panoramic depth camera system includes at least two depth cameras, the at least two depth cameras covering a panoramic field of view and being configured to capture images. When the one or more programs are executed by the one or more processors, the one or more processors implement the depth camera calibration method according to any embodiment of the present application.
FIG. 7 is a schematic structural diagram of an electronic device according to Embodiment 6 of the present invention. FIG. 7 shows a block diagram of an exemplary electronic device suitable for implementing embodiments of the present invention. The electronic device 712 shown in FIG. 7 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in FIG. 7, the electronic device 712 is embodied in the form of a general-purpose computing device. The components of the electronic device 712 may include, but are not limited to: one or more processors (or processing units 716), a system memory 728, and a bus 718 connecting different system components (including the system memory 728 and the processing unit 716).
The bus 718 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The electronic device 712 typically includes a variety of computer-system-readable media. These media can be any available media that can be accessed by the electronic device 712, including volatile and non-volatile media, and removable and non-removable media.
The system memory 728 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 730 and/or a cache memory 732. The electronic device 712 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 734 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard disk drive"). Although not shown in FIG. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, a DVD-ROM or other optical media) may be provided. In these cases, each drive may be connected to the bus 718 via one or more data media interfaces. The system memory 728 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present application.
A program/utility 740 having a set (at least one) of program modules 742 may be stored, for example, in the system memory 728; such program modules 742 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 742 generally perform the functions and/or methods of the embodiments described in the present application.
The electronic device 712 may also communicate with one or more external devices 714 (such as a keyboard, a pointing device, a display 724, etc.), may communicate with one or more devices that enable a user to interact with the electronic device 712, and/or may communicate with any device (such as a network card, a modem, etc.) that enables the electronic device 712 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 722. Moreover, the electronic device 712 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network, such as the Internet) through a network adapter 720. As shown in the figure, the network adapter 720 communicates with the other modules of the electronic device 712 through the bus 718. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 712, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
The processing unit 716 executes various functional applications and data processing by running the programs stored in the system memory 728, for example implementing the depth camera calibration method provided by the embodiments of the present invention.
The electronic device 712 may also include: a panoramic depth camera system 750 containing at least two depth cameras, the at least two depth cameras covering a panoramic field of view and being used to capture images. The panoramic depth camera system 750 is connected to the processing unit 716 and the system memory 728. The depth cameras included in the panoramic depth camera system 750 can capture images under the control of the processing unit 716. In an embodiment, the panoramic depth camera system 750 may be embedded in the electronic device.
In an embodiment, the one or more processors are central processing units; the electronic device is a portable mobile electronic device, such as a mobile robot, an unmanned aerial vehicle, a three-dimensional visual interaction device (such as VR glasses or a wearable helmet), or a smart terminal (such as a mobile phone or a tablet computer).
Embodiment 7
This embodiment provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the depth camera calibration method according to any embodiment of the present application is implemented.
The computer storage medium of the embodiments of the present invention may adopt any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in combination with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, the programming languages including object-oriented programming languages such as Java, Smalltalk and C++, and also including conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The above embodiment numbers are for description only and do not represent the merits of the embodiments.
Those of ordinary skill in the art should understand that the modules or operations of the above embodiments of the present invention may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they may be implemented by program code executable by a computer device, so that they may be stored in a storage device and executed by the computing device, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or operations among them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts between the embodiments, reference may be made to one another.

Claims (15)

  1. A depth camera calibration method, comprising:
    controlling at least two depth cameras in a panoramic depth camera system to synchronously capture images during motion, wherein each depth camera is provided with a tag;
    acquiring the poses of at least one first tag camera of the at least two depth cameras when capturing multiple frame images;
    in a case where a second tag camera of the at least two depth cameras has a historical view overlap with the first tag camera, determining the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the historical view overlap and the poses of the first tag camera when capturing multiple frame images.
  2. The method according to claim 1, after determining the relative pose of the second tag camera and the first tag camera at the same moment, further comprising:
    modifying the tag of the second tag camera to the first tag;
    repeatedly performing the operations of acquiring the poses of at least one first tag camera when capturing multiple frame images, determining the relative pose at the same moment between a second tag camera having a historical view overlap and a first tag camera, and modifying the tag, until the at least two depth cameras contain no second tag camera.
  3. The method according to claim 1, while acquiring the poses of at least one first tag camera of the at least two depth cameras when capturing multiple frame images, further comprising:
    matching feature points between the current frame image captured by each depth camera and the previous keyframe of each depth camera to obtain a transformation relationship matrix between the current frame image and the previous keyframe;
    in a case where the transformation relationship matrix is greater than or equal to a preset transformation threshold, determining the current frame image to be the current keyframe of each depth camera and storing the current keyframe.
  4. The method according to claim 1, before determining the relative pose of the second tag camera and the first tag camera at the same moment, further comprising:
    matching feature points between the current frame image captured by the second tag camera and the historical keyframes of the at least one first tag camera;
    in a case where a historical keyframe of the at least one first tag camera and the current frame image captured by the second tag camera reach a matching threshold, determining that the second tag camera has a historical view overlap with the first tag camera corresponding to the historical keyframe.
  5. The method according to claim 4, wherein determining the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the historical view overlap and the poses of the first tag camera when capturing multiple frame images comprises:
    removing the abnormal data from the feature point correspondences between the current frame image captured by the second tag camera and the historical keyframe, and calculating the relative position relationship between the current frame image and the historical keyframe according to the remaining feature point correspondences;
    determining, according to the relative position relationship between the current frame image and the historical keyframe, the transformation relationship between the pose of the second tag camera when capturing the current frame image and the pose of the first tag camera when capturing the historical keyframe;
    determining, according to the transformation relationship and the multiple poses of the first tag camera from capturing the historical keyframe to capturing the current frame image, the relative pose of the second tag camera and the first tag camera at the current frame moment.
  6. The method according to claim 1, wherein acquiring the poses of at least one first tag camera of the at least two depth cameras when capturing multiple frame images comprises:
    for each first tag camera, performing feature extraction on the multiple frame images captured by the first tag camera respectively to obtain at least one feature point of each frame image;
    performing feature point matching on two adjacent frame images of the multiple frame images to obtain the feature point correspondences between the two adjacent frame images;
    removing the abnormal data from the feature point correspondences, calculating the nonlinear term (equation image PCTCN2019085515-appb-100001) in J(ξ)^T J(ξ) from the linear component containing the second-order statistics of the remaining feature points and the nonlinear component containing the camera pose, and performing multiple iterations of δ = -(J(ξ)^T J(ξ))^-1 J(ξ)^T r(ξ) to solve for the pose at which the reprojection error is less than a preset threshold;
    wherein r(ξ) denotes the vector containing all reprojection errors; J(ξ) is the Jacobian matrix of r(ξ); ξ denotes the Lie algebra of the camera pose; δ denotes the increment of r(ξ) at each iteration; R_i denotes the rotation matrix of the camera when capturing the i-th frame image; R_j denotes the rotation matrix of the camera when capturing the j-th frame image; (equation image PCTCN2019085515-appb-100002) denotes the k-th feature point on the i-th frame image; (equation image PCTCN2019085515-appb-100003) denotes the k-th feature point on the j-th frame image; C_i,j denotes the set of feature point correspondences between the i-th frame image and the j-th frame image; ||C_i,j||-1 denotes the number of feature point correspondences between the i-th frame image and the j-th frame image; []_× denotes the vector product; and ||C_i,j|| denotes the norm of C_i,j.
  7. The method according to claim 6, wherein the expression of the nonlinear term (equation image PCTCN2019085515-appb-100004) is:

    (equation image PCTCN2019085515-appb-100005)

    wherein (equation image PCTCN2019085515-appb-100006) denotes the linear component; r_il^T and r_jl denote the nonlinear components, r_il^T being the l-th row of the rotation matrix R_i and r_jl being the transpose of the l-th row of the rotation matrix R_j, l = 0, 1, 2.
  8. The method according to claim 1 or 6, after acquiring the poses of at least one first tag camera of the at least two depth cameras when capturing multiple frame images, further comprising:
    in a case where the current frame image captured by the first tag camera is a keyframe, performing loop closure detection according to the current keyframe and the historical keyframes of the first tag camera;
    in a case where the loop closure detection based on the current keyframe and the historical keyframes of the first tag camera succeeds, performing a globally consistent optimized update of the acquired first tag camera poses according to the current keyframe.
  9. The method according to claim 1 or 2, after determining the relative pose of the second tag camera and the first tag camera at the same moment, further comprising:
    in a case where there is a current keyframe among the current frame images synchronously captured by the depth cameras whose relative poses have been determined, performing loop closure detection according to the current keyframe and the historical keyframes of the depth cameras whose relative poses have been determined;
    in a case where the loop closure detection based on the current keyframe and the historical keyframes of the depth cameras whose relative poses have been calculated succeeds, optimizing and updating the relative poses between the corresponding depth cameras according to the current keyframe and the corresponding historical keyframe.
  10. A depth camera calibration apparatus, comprising:
    a camera control module, configured to control at least two depth cameras in a panoramic depth camera system to synchronously capture images during motion, wherein each depth camera is provided with a tag;
    a pose acquisition module, configured to acquire the poses of at least one first tag camera of the at least two depth cameras when capturing multiple frame images;
    a relative pose calculation module, configured to, in a case where a second tag camera of the at least two depth cameras has a historical view overlap with the first tag camera, determine the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the historical view overlap and the poses of the first tag camera when capturing each frame image.
  11. The apparatus according to claim 10, further comprising:
    a tag modification module, configured to modify the tag of the second tag camera to the first tag;
    an operation execution module, configured to repeatedly perform the operations of acquiring the poses of at least one first tag camera when capturing multiple frame images, determining the relative pose at the same moment between a second tag camera having a historical view overlap and a first tag camera, and modifying the tag, until the at least two depth cameras contain no second tag camera.
  12. The apparatus according to claim 10, further comprising:
    a keyframe determination module, configured to match feature points between the current frame image captured by each depth camera and the previous keyframe of each camera to obtain a transformation relationship matrix between the current frame image and the previous keyframe, and, in a case where the transformation relationship matrix is greater than or equal to a preset transformation threshold, determine the current frame image to be the current keyframe of each depth camera and store the current keyframe.
  13. An electronic device, comprising:
    at least one processor;
    a memory, configured to store at least one program;
    a panoramic depth camera system, comprising at least two depth cameras, the at least two depth cameras covering a panoramic field of view and being used to capture images;
    wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the depth camera calibration method according to any one of claims 1-9.
  14. The electronic device according to claim 13, wherein the at least one processor is a central processing unit, and the electronic device is a portable mobile electronic device.
  15. A computer-readable storage medium, on which a computer program is stored, wherein, when the program is executed by a processor, the depth camera calibration method according to any one of claims 1-9 is implemented.
PCT/CN2019/085515 2018-03-05 2019-05-05 Depth camera calibration method and apparatus, electronic device, and storage medium WO2019170166A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810179738.7 2018-03-05
CN201810179738.7A CN108447097B (en) 2018-03-05 2018-03-05 Depth camera calibration method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2019170166A1 true WO2019170166A1 (en) 2019-09-12

Family

ID=63193477

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/085515 WO2019170166A1 (en) 2018-03-05 2019-05-05 Depth camera calibration method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN108447097B (en)
WO (1) WO2019170166A1 (en)


Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108447097B (en) * 2018-03-05 2021-04-27 清华-伯克利深圳学院筹备办公室 Depth camera calibration method and device, electronic equipment and storage medium
CN109242913B (en) * 2018-09-07 2020-11-10 百度在线网络技术(北京)有限公司 Method, device, equipment and medium for calibrating relative parameters of collector
CN109218562B (en) 2018-09-07 2021-04-27 百度在线网络技术(北京)有限公司 Clock synchronization method, device, equipment, storage medium and vehicle
CN109360243B (en) * 2018-09-28 2022-08-19 安徽爱观视觉科技有限公司 Calibration method of multi-degree-of-freedom movable vision system
CN109584302B (en) * 2018-11-27 2023-12-01 北京旷视科技有限公司 Camera pose optimization method, camera pose optimization device, electronic equipment and computer readable medium
CN111563840B (en) * 2019-01-28 2023-09-05 北京魔门塔科技有限公司 Training method and device of segmentation model, pose detection method and vehicle-mounted terminal
CN109946680B (en) * 2019-02-28 2021-07-09 北京旷视科技有限公司 External parameter calibration method and device of detection system, storage medium and calibration system
CN110166714A (en) * 2019-04-11 2019-08-23 深圳市朗驰欣创科技股份有限公司 Double light fusion methods of adjustment, double light fusion adjustment device and double light fusion devices
CN110232715B (en) * 2019-05-08 2021-11-19 奥比中光科技集团股份有限公司 Method, device and system for self calibration of multi-depth camera
CN110132306B (en) * 2019-05-20 2021-02-19 广州小鹏汽车科技有限公司 Method and system for correcting vehicle positioning error
CN110349249B (en) * 2019-06-26 2021-04-06 华中科技大学 Real-time dense reconstruction method and system based on RGB-D data
CN110363821B (en) * 2019-07-12 2021-09-28 顺丰科技有限公司 Monocular camera installation deviation angle acquisition method and device, camera and storage medium
CN110415286B (en) * 2019-09-24 2020-01-17 杭州蓝芯科技有限公司 External parameter calibration method of multi-flight time depth camera system
CN110866953B (en) * 2019-10-31 2023-12-29 Oppo广东移动通信有限公司 Map construction method and device, and positioning method and device
CN113781548A (en) * 2020-06-10 2021-12-10 华为技术有限公司 Multi-device pose measurement method, electronic device and system
CN112115980A (en) * 2020-08-25 2020-12-22 西北工业大学 Binocular vision odometer design method based on optical flow tracking and point line feature matching
CN112802112B (en) * 2021-04-12 2021-07-16 北京三快在线科技有限公司 Visual positioning method, device, server and storage medium
CN113269876A (en) * 2021-05-10 2021-08-17 Oppo广东移动通信有限公司 Map point coordinate optimization method and device, electronic equipment and storage medium
CN113870358B (en) * 2021-09-17 2024-05-24 聚好看科技股份有限公司 Method and equipment for jointly calibrating multiple 3D cameras


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9247238B2 (en) * 2011-01-31 2016-01-26 Microsoft Technology Licensing, Llc Reducing interference between multiple infra-red depth cameras
US9338440B2 (en) * 2013-06-17 2016-05-10 Microsoft Technology Licensing, Llc User interface for three-dimensional modeling
CN106105192B (en) * 2014-01-03 2021-05-18 英特尔公司 Real-time 3D reconstruction by depth camera
WO2017117517A1 (en) * 2015-12-30 2017-07-06 The Johns Hopkins University System and method for medical imaging
CN106157304A (en) * 2016-07-01 2016-11-23 成都通甲优博科技有限责任公司 A kind of Panoramagram montage method based on multiple cameras and system
CN106204443A (en) * 2016-07-01 2016-12-07 成都通甲优博科技有限责任公司 A kind of panorama UAS based on the multiplexing of many mesh
CN107025668B (en) * 2017-03-30 2020-08-18 华南理工大学 Design method of visual odometer based on depth camera

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016118352A1 (en) * 2015-01-19 2016-07-28 Aquifi, Inc. Multiple camera system with auto recalibration
CN105844624A (en) * 2016-03-18 2016-08-10 上海欧菲智能车联科技有限公司 Dynamic calibration system, and combined optimization method and combined optimization device in dynamic calibration system
CN106097300A (en) * 2016-05-27 2016-11-09 西安交通大学 A kind of polyphaser scaling method based on high-precision motion platform
CN108447097A (en) * 2018-03-05 2018-08-24 清华-伯克利深圳学院筹备办公室 Depth camera scaling method, device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689585A (en) * 2019-10-09 2020-01-14 北京百度网讯科技有限公司 Multi-phase external parameter combined calibration method, device, equipment and medium
CN110689585B (en) * 2019-10-09 2022-06-21 阿波罗智能技术(北京)有限公司 Multi-phase external parameter combined calibration method, device, equipment and medium
US11394872B2 (en) 2019-10-09 2022-07-19 Apollo Intelligent Driving Technology (Beijing) Co., Ltd. Method and apparatus for jointly calibrating external parameters of multiple cameras, device and medium

Also Published As

Publication number Publication date
CN108447097B (en) 2021-04-27
CN108447097A (en) 2018-08-24

Similar Documents

Publication Publication Date Title
WO2019170166A1 (en) Depth camera calibration method and apparatus, electronic device, and storage medium
WO2019170164A1 (en) Depth camera-based three-dimensional reconstruction method and apparatus, device, and storage medium
CN107747941B (en) Binocular vision positioning method, device and system
CN107888828B (en) Space positioning method and device, electronic device, and storage medium
CN106446815B (en) A kind of simultaneous localization and mapping method
JP6768156B2 (en) Virtually enhanced visual simultaneous positioning and mapping systems and methods
CN109242913B (en) Method, device, equipment and medium for calibrating relative parameters of collector
WO2020001168A1 (en) Three-dimensional reconstruction method, apparatus, and device, and storage medium
WO2018049581A1 (en) Method for simultaneous localization and mapping
Urban et al. Mlpnp-a real-time maximum likelihood solution to the perspective-n-point problem
US20190371003A1 (en) Monocular vision tracking method, apparatus and non-volatile computer-readable storage medium
WO2019205852A1 (en) Method and apparatus for determining pose of image capture device, and storage medium therefor
CN110880189B (en) Combined calibration method and combined calibration device thereof and electronic equipment
US9888215B2 (en) Indoor scene capture system
CN109084746A (en) Monocular mode for the autonomous platform guidance system with aiding sensors
WO2019238114A1 (en) Three-dimensional dynamic model reconstruction method, apparatus and device, and storage medium
CN110111388B (en) Three-dimensional object pose parameter estimation method and visual equipment
CN109461208B (en) Three-dimensional map processing method, device, medium and computing equipment
US11557083B2 (en) Photography-based 3D modeling system and method, and automatic 3D modeling apparatus and method
WO2020228643A1 (en) Interactive control method and apparatus, electronic device and storage medium
WO2022156755A1 (en) Indoor positioning method and apparatus, device, and computer-readable storage medium
WO2021004416A1 (en) Method and apparatus for establishing beacon map on basis of visual beacons
CN107577451B (en) Multi-Kinect human body skeleton coordinate transformation method, processing equipment and readable storage medium
CN113034652A (en) Virtual image driving method, device, equipment and storage medium
GB2580691A (en) Depth estimation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19763796

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19763796

Country of ref document: EP

Kind code of ref document: A1