CN108447097B - Depth camera calibration method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN108447097B
CN108447097B
Authority
CN
China
Prior art keywords
camera
pose
frame
tag
label
Prior art date
Legal status
Active
Application number
CN201810179738.7A
Other languages
Chinese (zh)
Other versions
CN108447097A (en)
Inventor
方璐
苏卓
韩磊
戴琼海
Current Assignee
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Tsinghua-Berkeley Shenzhen Institute Preparation Office
Priority date
Filing date
Publication date
Application filed by Tsinghua-Berkeley Shenzhen Institute Preparation Office filed Critical Tsinghua-Berkeley Shenzhen Institute Preparation Office
Priority to CN201810179738.7A priority Critical patent/CN108447097B/en
Publication of CN108447097A publication Critical patent/CN108447097A/en
Priority to PCT/CN2019/085515 priority patent/WO2019170166A1/en
Application granted granted Critical
Publication of CN108447097B publication Critical patent/CN108447097B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10024 Color image
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Abstract

The embodiment of the invention discloses a depth camera calibration method and device, an electronic device and a storage medium, wherein the method comprises: controlling at least two depth cameras in a panoramic depth camera system to synchronously acquire images while the system moves, wherein each depth camera is provided with a corresponding tag; acquiring the pose of at least one first tag camera when it acquires each frame of image; and, if the historical view angles of a second tag camera and the first tag camera overlap, calculating the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the overlapping historical view angles and the poses of the first tag camera when it acquired each frame of image. Based on the first tag camera preset in the panoramic depth camera system, the relative pose between the cameras is determined from overlapping historical view angles while the camera system moves and acquires images, so that rapid online self-calibration of multiple cameras is realized. No calibration object is needed, and no fixed large field-of-view overlap is required between adjacent cameras.

Description

Depth camera calibration method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to a machine vision technology, in particular to a depth camera calibration method and device, electronic equipment and a storage medium.
Background
With the development of robot navigation, virtual reality and augmented reality technologies, RGB-D cameras (i.e., depth cameras) are widely used for robot navigation, static scene reconstruction, dynamic body reconstruction, and the like. An RGB-D camera combines a traditional RGB camera with a depth camera, and has advantages such as high precision, small size and rich information.
At present, because the Field of View (FoV) of a single RGB-D camera is limited, a single RGB-D camera cannot acquire all surrounding information simultaneously for navigation, and the small field of view also limits three-dimensional reconstruction: for example, a whole-body model or models of multiple objects cannot be acquired in dynamic object reconstruction, and dynamic and static objects in the field of view cannot be reconstructed at the same time.
If a multi-camera system is used instead, precise extrinsic calibration of the cameras is required. Calibration methods based on a specific calibration object are commonly used at present; they are relatively cumbersome, require a large field-of-view overlap between adjacent cameras, and cannot handle cases where the overlap is small. Some calibration methods based on Simultaneous Localization and Mapping (SLAM) move a camera along a preset trajectory to collect images and then process the images offline, so they cannot quickly calibrate a 360-degree panoramic RGB-D camera system online.
Disclosure of Invention
The embodiment of the invention provides a depth camera calibration method, a depth camera calibration device, electronic equipment and a storage medium, and aims to realize real-time online panoramic multi-camera self-calibration without human intervention.
In a first aspect, an embodiment of the present invention provides a depth camera calibration method, including:
controlling at least two depth cameras in the panoramic depth camera system to synchronously acquire images during motion, wherein each depth camera is provided with a corresponding tag;
acquiring the pose of at least one first tag camera when it acquires each frame of image;
if the historical view angles of a second tag camera and the first tag camera overlap, calculating the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the overlapping historical view angles and the poses of the first tag camera when it acquired each frame of image.
In a second aspect, an embodiment of the present invention further provides a depth camera calibration apparatus, including:
the camera control module is used for controlling at least two depth cameras in the panoramic depth camera system to synchronously acquire images during motion, wherein each depth camera is provided with a corresponding tag;
the pose acquisition module is used for acquiring the pose of at least one first tag camera when it acquires each frame of image;
and the relative pose calculation module is used for calculating, in the case that the historical view angles of a second tag camera and the first tag camera overlap, the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the overlapping historical view angles and the poses of the first tag camera when it acquired each frame of image.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
a panoramic depth camera system comprising at least two depth cameras covering a panoramic field of view for capturing images;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a depth camera calibration method as described in any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the depth camera calibration method according to any embodiment of the present invention.
According to the embodiment of the invention, based on a first tag camera preset in the panoramic depth camera system, the relative poses of a second tag camera and the first tag camera are determined from the overlap of their historical view angles while the panoramic depth camera system moves and acquires images, so that rapid online self-calibration of multiple cameras is realized. The method needs no calibration object and no fixed large field-of-view overlap between adjacent cameras; when the camera system is built, adjacent cameras may have only a small overlapping view angle or none at all, and the motion of the panoramic depth camera system makes different cameras acquire images with overlapping historical view angles so that calibration can proceed. The method requires little computation and can perform online calibration on a CPU.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below of the drawings required for the embodiments or the technical solutions in the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a depth camera calibration method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a panoramic depth camera system according to an embodiment of the present invention;
FIG. 3 is a flowchart of a depth camera calibration method according to a second embodiment of the present invention;
fig. 4 is a flowchart of a depth camera calibration method according to a third embodiment of the present invention;
fig. 5 is a flowchart of acquiring a pose of a first tag camera according to a fourth embodiment of the present invention;
fig. 6 is a block diagram of a depth camera calibration apparatus according to a fifth embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a depth camera calibration method according to an embodiment of the present invention, where the embodiment is applicable to a case of multi-camera self-calibration, and calibration in the embodiment refers to calculating a relative pose between cameras. The method may be executed by a depth camera calibration apparatus or an electronic device, where the depth camera calibration apparatus may be implemented by software and/or hardware, for example, by a Central Processing Unit (CPU), and the CPU controls and calibrates the camera; further, the apparatus may be integrated in a portable mobile electronic device. As shown in fig. 1, the method specifically includes:
s101, controlling at least two depth cameras in the panoramic depth camera system to synchronously acquire images in the motion process, wherein each depth camera is provided with a corresponding label.
Therein, a panoramic depth camera system (referred to simply as a camera system) comprises at least two depth cameras (RGB-D cameras) covering a 360-degree panoramic field of view. In practical application, the size and number of cameras can be selected according to specific requirements, and the cameras are fixed on a platform (for example, a rigid component) so as to meet the field-coverage requirement and complete the initial construction of the panoramic depth camera system. Specifically, the required number of cameras is determined from the field of view of a single camera and the required overall field of view: the sum of the fields of view of all cameras must exceed the required field of view. For example, for identical cameras the number n of cameras must satisfy n × FoV > α, where FoV is the field of view of a single camera and α is the required field of view of the camera system; with α = 360°, a transverse field of view of 65° and a longitudinal field of view of 80° per camera, n is 5 or 6, and 6 cameras can be used considering the longitudinal field-of-view requirement. The cameras are then arranged according to their size and number; for example, RGB-D cameras 10-15 cm long, 3-5 cm wide and 3-5 cm high with a resolution of 640 × 480 can be mounted lens-outwards, via fixing pieces, around a regular hexagonal prism whose base edge is 5 cm long, as shown in fig. 2. It should be noted that, depending on the specific system requirements, adjacent cameras may have no view-angle overlap or only a small overlap, for example one or two degrees, when the camera system is built.
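As a rough illustration of the field-of-view requirement above, the sketch below (not part of the patent) computes the smallest integer n with n × FoV strictly greater than α; with the example values in the text it returns 6.

```python
import math

def min_camera_count(required_fov_deg: float, camera_fov_deg: float) -> int:
    """Smallest n such that n * camera_fov_deg strictly exceeds required_fov_deg."""
    n = math.ceil(required_fov_deg / camera_fov_deg)
    return n + 1 if n * camera_fov_deg == required_fov_deg else n

# Example values from the text: 360-degree panorama, 65-degree transverse FoV per camera.
print(min_camera_count(360.0, 65.0))  # -> 6
```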
The depth cameras in the panoramic depth camera system acquire images synchronously; the synchronization can be done in hardware or in software. Hardware synchronization uses a signal (such as a rising-edge signal) to trigger all cameras to acquire images at the same time. Software synchronization time-stamps the images buffered from each camera and treats the images with the closest timestamps as acquired at the same time; that is, one buffer is maintained and the frames from each camera with the closest timestamps are output each time.
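A hypothetical sketch of this software-synchronization idea: each camera pushes time-stamped frames into a buffer, and one frame per camera is emitted when the head timestamps are close enough. The tolerance value is an assumed parameter, not taken from the patent.

```python
from collections import deque

class SoftwareSync:
    def __init__(self, num_cameras, max_skew_s=0.02):
        self.buffers = [deque() for _ in range(num_cameras)]
        self.max_skew_s = max_skew_s  # assumed timestamp tolerance in seconds

    def push(self, cam_id, timestamp, frame):
        self.buffers[cam_id].append((timestamp, frame))

    def pop_synced(self):
        """Return one frame per camera if every buffer has a head frame and the head
        timestamps differ by at most max_skew_s; otherwise return None."""
        if any(len(b) == 0 for b in self.buffers):
            return None
        times = [b[0][0] for b in self.buffers]
        if max(times) - min(times) <= self.max_skew_s:
            return [b.popleft()[1] for b in self.buffers]
        # drop the oldest head frame and try again on the next call
        oldest = min(range(len(times)), key=lambda i: times[i])
        self.buffers[oldest].popleft()
        return None
```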
When the camera self-calibration is carried out on line, the panoramic depth camera system needs to be in a motion state, and an acquired image is used as observation information. For example, a robot is used to carry a panoramic depth camera system or a user holds the panoramic depth camera system to move freely in a conventional room, and it is ensured that rotational movement is increased as much as possible in the movement process, so that the historical view angle overlap between cameras in the movement process is increased, more observation information is obtained, and the calibration is facilitated.
Each depth camera is assigned a corresponding attribute tag, which distinguishes the role of the camera: for example, a first tag indicates a reference camera and a second tag indicates a non-reference camera. For simplicity, the tag value may be 0 or 1, for example 0 for a non-reference camera and 1 for a reference camera.
And S102, acquiring the pose of at least one first tag camera when it acquires each frame of image.
The camera role is determined from the camera tag. At least one of the depth cameras in the panoramic depth camera system is preset as a reference camera, and a first tag camera is a reference camera. It should be noted that if several reference cameras are preset, the relative poses between them must be known accurately; in view of calibration accuracy, usually one reference camera is preset. For the reference camera, the pose at which each frame of image is acquired must be obtained in real time throughout the calibration process. Here the pose refers to the change of the reference camera's pose when acquiring the current frame relative to the previous frame; it comprises a position (translation matrix T) and an attitude (rotation matrix R) and involves six degrees of freedom X, Y, Z, alpha, beta and gamma, where X, Y and Z are the position parameters and alpha, beta and gamma are the attitude parameters.
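For illustration, a camera pose with these six degrees of freedom can be assembled into a 4 × 4 homogeneous transform as below; the Euler-angle convention used is an assumption, since the patent does not fix one.

```python
import numpy as np

def pose_matrix(x, y, z, alpha, beta, gamma):
    """Build a 4x4 pose from translation (x, y, z) and rotations (alpha, beta, gamma)
    about the x, y and z axes (Z*Y*X order assumed for illustration)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    pose = np.eye(4)
    pose[:3, :3] = rz @ ry @ rx      # rotation matrix R
    pose[:3, 3] = np.array([x, y, z])  # translation T
    return pose
```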
S103, if the historical view angles of the second tag camera and the first tag camera overlap, calculating the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the overlapping historical view angles and the poses of the first tag camera when it acquired each frame of image.
As the panoramic depth camera system moves, the historical view angles of the second tag cameras and the first tag camera come to overlap, so the relative pose of each second tag camera and the first tag camera at the same moment can be obtained, i.e. the relative poses between the cameras, completing the real-time online camera self-calibration.
It should be noted that, throughout the calibration process, the depth cameras to be calibrated in the panoramic depth camera system are all in motion; the moving cameras synchronously acquire images in real time, and keyframes are maintained under each camera. For a reference camera, single-camera pose estimation is performed on each acquired frame to obtain the camera pose when that frame was acquired. For a non-reference camera, no single-camera pose estimation is performed; each acquired frame is used as observation information to judge whether it overlaps with the historical view angle of any reference camera, and if so, the relative pose of the non-reference camera and the corresponding reference camera at the same time can be calculated, as sketched below.
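A schematic dispatch of this per-frame processing; the four injected callables are placeholders for the steps described in this embodiment (single-camera pose estimation, overlap search, relative pose computation and keyframe maintenance) and are assumptions, not functions defined by the patent.

```python
def process_synchronized_frames(frames, tags, estimate_pose, find_overlap,
                                compute_relative_pose, maybe_store_keyframe):
    """frames: dict cam_id -> image; tags: dict cam_id -> 0 (non-reference) or 1 (reference)."""
    for cam_id, frame in frames.items():
        if tags[cam_id] == 1:
            estimate_pose(cam_id, frame)            # reference camera: single-camera pose estimation
        else:
            hit = find_overlap(cam_id, frame)       # non-reference: check historical view-angle overlap
            if hit is not None:
                compute_relative_pose(cam_id, hit)  # relative pose at the same time instant
        maybe_store_keyframe(cam_id, frame)         # keyframes are maintained under every camera
```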
The depth camera calibration method of this embodiment is based on a first tag camera preset in the panoramic depth camera system: while the panoramic depth camera system moves and acquires images, the relative poses of a second tag camera and the first tag camera are determined from the overlap of their historical view angles, realizing rapid online self-calibration of multiple cameras. The method needs no calibration object and no fixed large field-of-view overlap between adjacent cameras; when the camera system is built, adjacent cameras may have only a small overlapping view angle or none at all, and the motion of the panoramic depth camera system makes different cameras acquire images with overlapping historical view angles so that calibration can proceed. The method requires little computation and can run online on a CPU. The depth camera calibration method provided by the embodiment of the invention is suitable for applications such as indoor robot navigation and three-dimensional scene reconstruction.
Example two
On the basis of the above embodiment, the present embodiment further optimizes the determination of the overlap of the historical viewing angles and the calculation of the relative pose between the cameras.
For each depth camera to be calibrated in the camera system, every time a frame is acquired it is determined whether that frame is a keyframe; keyframes are used to determine the overlap of historical view angles and are also needed if loop optimization is performed later. Specifically, while the pose of each frame acquired by at least one first tag camera is being obtained, the method may further include: matching feature points between the current frame acquired by each depth camera and the latest keyframe under that camera to obtain a transformation relation matrix between the two frames; and, if the transformation relation matrix is larger than or equal to a preset transformation threshold, determining the current frame to be a keyframe under the corresponding depth camera and storing it.
The first frame acquired by each depth camera is a keyframe by default; each subsequently acquired frame is compared with the latest keyframe under that camera to decide whether it is a keyframe. The preset transformation threshold is set in advance according to how the depth camera moves while acquiring images; for example, if the pose changes greatly between two adjacent frames, the threshold is set larger. Feature point matching may use existing algorithms, such as sparse feature matching of the color images acquired by the RGB-D cameras based on the Oriented FAST and Rotated BRIEF (ORB) algorithm, or dense registration using direct methods. A sketch of such a keyframe test follows.
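The sketch below assumes OpenCV's ORB features; the transformation relation matrix is approximated here by the mean displacement of matched feature points, and the threshold value is illustrative rather than taken from the patent.

```python
import cv2
import numpy as np

orb = cv2.ORB_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def is_keyframe(curr_gray, last_keyframe_gray, motion_threshold=30.0):
    """Return True if the current frame should become the new keyframe."""
    kp1, des1 = orb.detectAndCompute(last_keyframe_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return True
    matches = matcher.match(des1, des2)
    if len(matches) < 8:
        return True  # too little overlap with the last keyframe
    # mean pixel displacement of matched feature points as a motion proxy
    disp = [np.linalg.norm(np.array(kp1[m.queryIdx].pt) - np.array(kp2[m.trainIdx].pt))
            for m in matches]
    return float(np.mean(disp)) >= motion_threshold
```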
Fig. 3 is a flowchart of a depth camera calibration method according to a second embodiment of the present invention, and as shown in fig. 3, the method includes:
s301, controlling at least two depth cameras in the panoramic depth camera system to synchronously acquire images in the motion process, wherein each depth camera is provided with a corresponding label.
S302, acquiring the pose of at least one first label camera when acquiring each frame of image.
S303, matching feature points between the current frame image acquired by the second tag camera and the historical keyframes of the at least one first tag camera; and if a historical keyframe and the current frame image reach the matching threshold, determining that the historical view angle of the second tag camera overlaps with the historical view angle of the corresponding first tag camera.
Every time the second tag camera acquires a new frame, its feature points are matched against the keyframes (i.e. the historical keyframes) stored under the first tag camera; if the matching threshold is reached, the historical view angles are considered to overlap. Feature point matching may use existing algorithms, for example sparse ORB-based matching of the color images acquired by the RGB-D cameras, or dense registration using direct methods. The matching threshold may be a preset number of matched feature points, as in the sketch below.
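A hypothetical sketch of this overlap test, assuming OpenCV ORB descriptors; the threshold value and helper names are illustrative, not from the patent.

```python
import cv2

def find_overlapping_keyframe(curr_des, keyframe_descriptors, match_threshold=50):
    """curr_des: ORB descriptors of the second tag camera's current frame;
    keyframe_descriptors: list of (keyframe_id, ORB descriptors) stored under a
    first tag camera. Returns (keyframe_id, match_count) or None if no overlap."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    best = None
    for kf_id, kf_des in keyframe_descriptors:
        matches = matcher.match(kf_des, curr_des)
        if len(matches) >= match_threshold and (best is None or len(matches) > best[1]):
            best = (kf_id, len(matches))
    return best
```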
S304, removing abnormal data from the feature point correspondences between the current frame acquired by the second tag camera and the corresponding historical keyframe, and calculating the relative position relation between the current frame and the corresponding historical keyframe from the remaining correspondences;
S305, calculating, from the relative position relation, the transformation relation between the pose of the second tag camera when acquiring the current frame and the pose of the first tag camera when acquiring the corresponding historical keyframe;
S306, calculating the relative pose of the second tag camera and the first tag camera at the current frame time according to the transformation relation and the poses of the first tag camera from acquiring the corresponding historical keyframe to acquiring the current frame.
A Random Sample Consensus (RANSAC) algorithm can be used to remove the abnormal data, and the relative position relation between the two frames with overlapping view angles is calculated from the remaining feature point correspondences; from this relative position relation, the corresponding camera pose transformation relation is obtained. Since the poses of the first tag camera are estimated for every frame it acquires, its poses between the historical keyframe involved in the overlap and the current frame are known, so the relative pose at the current frame time (i.e. the time when the historical view angles overlap) of the second tag camera and the first tag camera can be deduced. The relative pose likewise has six degrees of freedom, involving a rotation matrix R and a translation matrix T.
Illustratively, the panoramic depth camera system includes three depth cameras A, B and C, where camera A is a first tag camera and cameras B and C are second tag cameras. The three cameras acquire images synchronously. When the 10th frame is acquired, the 10th frame of camera B and the 5th frame (a historical keyframe) of camera A have overlapping historical view angles, and the transformation between the pose of camera B when acquiring its 10th frame and the pose of camera A when acquiring its 5th frame can be calculated from the feature point matches between the two overlapping images. Because camera A's pose is estimated continuously and its poses from the 5th frame to the 10th frame are recorded, the relative pose of camera B and camera A at the 10th frame can be deduced, as in the sketch below.
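A minimal numerical sketch of this chaining; the convention that each stored pose increment expresses frame i in frame i-1's coordinates is an assumption made for illustration and is not stated in the patent.

```python
import numpy as np

def relative_pose_at_current_frame(T_refkf_to_second_now, ref_pose_increments,
                                   kf_index, now_index):
    """
    T_refkf_to_second_now: 4x4 transform from the reference camera at its historical
        keyframe to the second tag camera at the current frame, estimated from the
        matched feature points of the two overlapping images (S304-S305).
    ref_pose_increments: list of 4x4 per-frame pose changes of the reference camera
        (frame i expressed in frame i-1), as maintained in S102.
    Returns the 4x4 relative pose of the second tag camera with respect to the
    reference camera at the current frame time (S306).
    """
    # accumulate the reference camera's motion from the keyframe to the current frame
    T_kf_to_now = np.eye(4)
    for i in range(kf_index + 1, now_index + 1):
        T_kf_to_now = T_kf_to_now @ ref_pose_increments[i]
    # express the second tag camera's pose in the reference camera's current frame
    return np.linalg.inv(T_kf_to_now) @ T_refkf_to_second_now
```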
In the depth camera calibration method of this embodiment, historical keyframes are used to determine that the second tag camera and the first tag camera have overlapping historical view angles; the pose transformation between the two cameras is calculated from the two overlapping frames, and then, using the first tag camera's poses between the historical keyframe and the current frame, the relative pose of the second tag camera and the first tag camera at the same time is deduced. The computation is simple and fast online calibration is ensured.
Optionally, in order to obtain a more accurate relative pose between the cameras and improve calibration accuracy, after the relative pose of the second tag camera and the first tag camera at the same time is calculated, the calculated relative pose may be globally optimized. Specifically: if a current frame synchronously acquired by a depth camera whose relative pose has been calculated is a keyframe, loop detection is performed between that keyframe and the historical keyframes of the depth cameras whose relative poses have been calculated; if the loop closure succeeds (i.e. a matching historical keyframe is found), the relative pose between the corresponding depth cameras is optimized and updated according to the keyframe and the corresponding historical keyframe.
Loop detection determines whether the depth camera has moved to a place it has reached before or one that overlaps largely with a historical view angle. Loop detection is performed on keyframes: for each frame acquired by the depth camera, it is determined whether the frame is a keyframe; if so, loop detection is performed, and if not, loop detection waits for the next keyframe. The keyframe judgment can match the feature points of each frame acquired by the depth camera against the last keyframe of that camera to obtain a transformation relation matrix between the two frames; if the transformation relation matrix is larger than or equal to the preset transformation threshold, the current frame is determined to be a keyframe under that camera.
Because the relative pose between cameras is involved, the comparison objects for loop detection in this embodiment are the historical keyframes of all depth cameras in the system whose relative poses have been calculated, not only the camera's own historical keyframes. If the loop closure succeeds, the relative pose between the corresponding depth cameras is optimized and updated according to the current keyframe and the corresponding historical keyframe, reducing the accumulated error and improving the accuracy of the relative poses between cameras.
Specifically, loop detection between the current keyframe and historical keyframes can be performed by matching their ORB feature points; if the matching degree is high, the loop closure succeeds. It should be noted that if the matched historical keyframe belongs to the depth camera itself, the pose of that depth camera is optimized according to the current keyframe and the historical keyframe. The optimization and updating of the relative pose can start once the relative pose of one pair of cameras in the system has been obtained, and the calculated relative poses are kept updated as the cameras move and acquire images. Once the relative poses between all cameras in the system have been calculated and a preset condition is met (for example, the relative poses between all cameras have been optimized a preset number of times, or a preset error requirement is satisfied), the optimization stops and the final, accurate calibration parameters are obtained.
Illustratively, still taking the panoramic depth camera system with three depth cameras A, B and C as an example: after the relative pose of camera B and camera A at the same time has been calculated, the relative pose between B and A keeps being optimized and updated from the keyframes captured by camera A and/or camera B, while it is simultaneously determined whether historical view angle overlap occurs between camera C and camera A (to determine their relative pose). Once the relative pose of camera C and camera A at the same time is obtained, the relative pose of camera B and camera C can be deduced, so the relative poses between all cameras are obtained and then optimized and updated. For the current frame, the three cameras synchronously acquire images; if image a acquired by camera A and image b acquired by camera B are judged to be keyframes while image c acquired by camera C is not, loop detection is carried out on keyframes a and b, and if both loop closures succeed, optimization and updating are carried out according to keyframes a and b respectively. The relative pose calculation in the optimization process is the same as the initial relative pose calculation and is not repeated here.
According to the embodiment, the calculated relative pose between the cameras can be optimized and updated in real time according to the images acquired by the cameras, and the camera calibration accuracy is improved.
EXAMPLE III
The depth camera calibration method of the embodiments described above uses a first tag camera preset in the panoramic depth camera system as the reference for calculating the relative poses between the cameras. On the basis of those embodiments, this embodiment provides another depth camera calibration method to further accelerate online calibration. Fig. 4 is a flowchart of a depth camera calibration method according to a third embodiment of the present invention; as shown in fig. 4, the method includes:
S401, controlling at least two depth cameras in the panoramic depth camera system to synchronously acquire images during motion, wherein each depth camera is provided with a corresponding tag.
S402, acquiring the pose of at least one first tag camera when it acquires each frame of image.
S403, if the historical view angles of a second tag camera and the first tag camera overlap, calculating the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the overlapping historical view angles and the poses of the first tag camera when it acquired each frame of image.
S404, modifying the tag of the second tag camera into the first tag.
That is, after its relative pose has been calculated, the second tag camera whose historical view angle overlapped with that of the first tag camera is included in the reference set, expanding the reference range.
S405, repeatedly executing the operations of acquiring the pose of at least one first tag camera when it acquires each frame of image, calculating the relative pose, at the same moment, of a second tag camera whose historical view angle overlaps with a corresponding first tag camera, and modifying the tag (i.e. repeating S402 to S404) until the at least two depth cameras no longer contain a second tag camera. When the tags of all depth cameras in the system have been modified into first tags, every former second tag camera has had its relative pose with the other cameras calculated, and the calibration result is obtained.
In this embodiment, after the relative poses of the second tag camera and the first tag camera are obtained by overlapping the historical view angles, the second tag camera is also used as a reference, so that the reference range is expanded, the probability that the historical view angles of other second tag cameras are overlapped is increased, and the calibration speed can be further increased.
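A schematic sketch of the S401-S405 loop described above; the two callables are assumed placeholders for the acquisition and calibration steps and are not functions defined by the patent.

```python
def calibrate_until_done(tags, acquire_frames, try_calibrate_against_references):
    """tags: dict cam_id -> 0 (second tag) or 1 (first tag).
    try_calibrate_against_references(cam_id, frames) returns a relative pose once a
    historical view-angle overlap with a first tag camera is found, else None."""
    relative_poses = {}
    while any(tag == 0 for tag in tags.values()):            # S405 stop condition
        frames = acquire_frames()                             # S401 / S402
        for cam_id in [c for c, t in tags.items() if t == 0]:
            pose = try_calibrate_against_references(cam_id, frames)   # S403
            if pose is not None:
                relative_poses[cam_id] = pose
                tags[cam_id] = 1                              # S404: promote to first tag
    return relative_poses
```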
For the determination of the keyframe, the determination of the overlap of the historical viewing angles, the calculation of the relative pose, and the optimization update of the relative pose, please refer to the description of the foregoing embodiments, which is not repeated herein.
Example four
On the basis of the above embodiments, this embodiment further optimizes the step of acquiring the pose of at least one first tag camera when it acquires each frame of image, in order to improve the calculation speed. Fig. 5 is a flowchart of acquiring the pose of a first tag camera according to a fourth embodiment of the present invention; as shown in fig. 5, the method includes:
S501, for each first tag camera, extracting features from each frame of image it acquires to obtain at least one feature point per frame.
Feature extraction finds pixel points with distinctive features (i.e. feature points) in a frame, for example corners, textures and edges. The ORB algorithm can be used to find at least one feature point in each frame of image.
And S502, performing feature point matching on the two adjacent frames of images to obtain the corresponding relationship of the feature points between the two adjacent frames of images.
Since the camera acquires images at a high frequency during motion, the contents of two adjacent frames from the same camera are largely the same, so a correspondence also exists between their feature points. Sparse ORB feature registration or dense direct-method registration can be used to obtain the feature point correspondences between two adjacent frames.
Specifically, take one feature point between two adjacent frames as an example: suppose feature points X1 and X2 represent the same texture feature in the two frames but lie at different positions, and H(X1, X2) denotes the Hamming distance between X1 and X2. The two feature descriptors are XORed, and the number of 1 bits in the result is taken as the Hamming distance, i.e. the feature point correspondence, of that feature point between the two adjacent frames.
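As a small illustration, the Hamming distance between two binary ORB descriptors can be computed by XORing them and counting the 1 bits:

```python
import numpy as np

def hamming_distance(d1: np.ndarray, d2: np.ndarray) -> int:
    """d1, d2: uint8 binary descriptors of equal length (e.g. 32-byte ORB descriptors)."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())
```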
S503, removing abnormal data from the feature point correspondences, calculating the nonlinear term of J(ξ)^T·J(ξ) through a linear component containing the second-order statistics of the remaining feature points and a nonlinear component containing the camera pose, iterating the update δ = −(J(ξ)^T·J(ξ))^{-1}·J(ξ)^T·r(ξ) several times, and solving for the pose at which the reprojection error r(ξ) is smaller than a preset threshold. Specifically, the iterative calculation can be performed using the Gauss-Newton method; preferably, the pose at which the reprojection error is minimized is calculated.
Here r(ξ) is the vector containing all reprojection errors, J(ξ) is the Jacobian matrix of r(ξ), ξ is the Lie-algebra representation of the camera poses, and δ is the increment applied in each iteration. R_i and R_j are the rotation matrices of the camera when the i-th and j-th frame images are acquired; P_i^k and P_j^k are the k-th feature points on the i-th and j-th frame images; C_{i,j} is the set of feature point correspondences between the i-th and j-th frame images, and ||C_{i,j}|| is the number of such correspondences; [·]_× denotes the skew-symmetric (cross-product) matrix of a vector.
Further, the expression for this nonlinear term is given by formula (1) (presented as an equation image in the original), in which W denotes the linear component, while r_il^T and r_jl form the nonlinear component: r_il^T is the l-th row of the rotation matrix R_i and r_jl is the transpose of the l-th row of R_j, with l = 0, 1, 2 (this embodiment counts from 0 in the programming convention, so l = 0 denotes the first row of the matrix, and so on).
Specifically, the feature point correspondences between two adjacent frames obtained in S502 may contain abnormal data; for example, each of the two adjacent frames necessarily contains feature points that do not appear in the other frame, so the matching operation of S502 can produce abnormal correspondences. Preferably, the RANSAC algorithm is used to remove the abnormal data, and the remaining feature point correspondences are written as (P_i^k, P_j^k) ∈ C_{i,j}, where (P_i^k, P_j^k) is the correspondence of the k-th feature point between the i-th and j-th frame images, and j = i − 1.
Calculating the camera pose amounts to solving a nonlinear least squares problem between two images, with the cost function (2) summing, over the feature point correspondences in C_{i,j}, the reprojection errors between the transformed homogeneous points T_i·P̃_i^k and T_j·P̃_j^k. Here E denotes the reprojection error, in Euclidean space, of the i-th frame image compared with the j-th frame image (the previous frame in this embodiment); T_i is the pose of the camera when the i-th frame image is acquired (as explained above, this pose actually means the change of the i-th frame relative to the previous frame) and T_j is the pose when the j-th frame image is acquired; N is the total number of frames acquired by the camera; P̃_i^k and P̃_j^k are the homogeneous coordinates of the k-th feature points P_i^k and P_j^k on the i-th and j-th frame images. Note that for the same values of i and k, P_i^k and P̃_i^k denote the same point; the difference is that P_i^k is a local coordinate while P̃_i^k is a homogeneous coordinate.
In order to accelerate the computation, this embodiment does not evaluate the cost function (2) directly. Instead, the nonlinear term of J(ξ)^T·J(ξ) is computed from a linear component containing the second-order statistics of the feature point correspondences and a nonlinear component containing the camera pose, and the update δ = −(J(ξ)^T·J(ξ))^{-1}·J(ξ)^T·r(ξ) is iterated repeatedly to solve for the pose at which the reprojection error is smaller than the preset threshold. When the nonlinear term is computed using this expression, the linear part that is fixed between the two frame images is treated as a whole W instead of being computed per feature point correspondence, which reduces the complexity of the camera pose calculation and enhances its real-time performance.
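The exact blocks are given by the patent's formulas (1) and (10); purely as an illustration of the idea of a per-frame-pair second-order statistic, the sketch below accumulates the outer products of corresponding points once per frame pair so that later steps need not loop over individual correspondences. The function name and the particular statistic are assumptions for illustration only.

```python
import numpy as np

def second_order_statistic(points_i, points_j):
    """points_i, points_j: (K, 3) arrays of corresponding feature points of one frame
    pair. Returns a 3x3 sum of outer products, computed once per frame pair; blocks
    built from such statistics and the rows of R_i, R_j no longer depend on K."""
    return points_j.T @ points_i
```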
The derivation of formula (1) is outlined below, together with an analysis of how it reduces the complexity of the algorithm.
In Euclidean space, the camera pose when the i-th frame image is acquired is T_i = [R_i | t_i]; T_i is in fact the pose transformation matrix between the i-th frame image and the j-th frame image (the previous frame in this embodiment), comprising the rotation matrix R_i and the translation matrix t_i. The rigid transformation T_i in Euclidean space is represented by its Lie algebra ξ_i in the SE(3) space; ξ_i likewise represents the camera pose when the i-th frame image is acquired, and T(ξ_i) maps the Lie algebra ξ_i to T_i in Euclidean space.
For each feature point correspondence (P_i^k, P_j^k), the reprojection error is the difference between the transformed homogeneous points T(ξ_i)·P̃_i^k and T(ξ_j)·P̃_j^k. The reprojection error in Euclidean space can then be expressed as E(ξ) = ||r(ξ)||, where r(ξ) is the vector stacking all of these reprojection errors.
T(ξ_i)·P_i^k can be written component-wise (omitting ξ_i for simplicity of presentation): its l-th component is r_il^T·P_i^k + t_il, where r_il^T is the l-th row of the rotation matrix R_i and t_il is the l-th element of the translation vector t_i, l = 0, 1, 2.
Formula (6) gives the Jacobian matrix corresponding to the m-th feature point correspondence between the i-th and j-th frame images, and formula (7) gives the resulting 6 × 6 square blocks of J(ξ)^T·J(ξ), built from this Jacobian and its transpose, in which I_{3×3} denotes the 3 × 3 identity matrix. According to formulas (6) and (7), J(ξ)^T·J(ξ) has four non-zero 6 × 6 sub-matrices.
the following are
Figure BDA0001588408500000129
For example, the other three non-zero submatrices are calculated similarly, and are not described again.
Figure BDA00015884085000001210
Wherein, the combination formula (5) can obtain:
Figure BDA00015884085000001211
will be provided with
Figure BDA00015884085000001212
Expressed as W, in combination with equation (5), the non-linear term in equation (10) can be expressed
Figure BDA00015884085000001213
Simplified as formula (1), structural terms in the nonlinear term
Figure BDA00015884085000001214
Is linearized as W. Albeit to the structural item
Figure BDA00015884085000001215
In the case of a non-woven fabric,
Figure BDA00015884085000001216
is non-linear, but through the analysis described above,
Figure BDA00015884085000001217
all non-zero elements of (1) and Ci,jThe second-order statistics of the medium structure terms are in linear relation, and the second-order statistics of the structure terms are
Figure BDA0001588408500000131
And
Figure BDA0001588408500000132
that isThat is, a sparse matrix
Figure BDA0001588408500000133
To Ci,jThe second order statistics of the mesostructure terms are element linear.
It should be noted that each corresponding relationship
Figure BDA0001588408500000134
The Jacobian matrixes are all provided with geometric terms xii,ξjAnd structural items
Figure BDA0001588408500000135
Figure BDA0001588408500000136
And (6) determining. For the same frame pair Ci,jAll corresponding relations in (2), their corresponding jacobian matrices share the same geometric terms but have different structural terms. For one frame pair Ci,jCalculating
Figure BDA0001588408500000137
When existing algorithms rely on Ci,jThe number of the corresponding relations of the medium feature points, and the embodiment can efficiently calculate with fixed complexity
Figure BDA0001588408500000138
Only the second-order statistic W of the structural item needs to be calculated, and the related structural item does not need to be involved in the calculation of each corresponding relation, namely
Figure BDA0001588408500000139
The four non-zero submatrices in the system can replace the complexity O (| | C) with the complexity O (1)i,j| |) is calculated.
Thus, when δ ═ - (J (ξ)TJ(ξ))-1J(ξ)TSparse matrixes JTJ and JTr required in the iteration step of nonlinear Gauss-Newton optimization of r (xi) can be efficiently calculated by the complexity O (M) instead of the original calculation complexity O (N)coor),NcoorRepresenting all feature point correspondences for all frame pairsThe total number, M, indicates the number of frame pairs. In general, O (N)coor) Approximately 300 in sparse matching and 10000 in dense matching, which is much larger than the number of frame pairs M.
Through the derivation, in the camera pose calculation process, W is calculated for each frame pair, and then the expressions (1), (10), (9), (8) and (6) are calculated to obtain
Figure BDA00015884085000001310
Further, ξ can be obtained by iterative calculation when r (ξ) is the smallest.
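For reference, a generic Gauss-Newton loop implementing the update δ = −(J^T·J)^{-1}·J^T·r is sketched below; the residual_and_jacobian callable stands in for the efficient construction of r(ξ) and J(ξ) described above and is an assumed placeholder, not part of the patent.

```python
import numpy as np

def gauss_newton(xi0, residual_and_jacobian, max_iters=20, tol=1e-6):
    """xi0: initial pose parameters; residual_and_jacobian(xi) -> (r, J)."""
    xi = np.asarray(xi0, dtype=float).copy()
    for _ in range(max_iters):
        r, J = residual_and_jacobian(xi)
        delta = -np.linalg.solve(J.T @ J, J.T @ r)   # delta = -(J^T J)^{-1} J^T r
        xi = xi + delta
        if np.linalg.norm(delta) < tol or np.linalg.norm(r) < tol:
            break
    return xi
```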
Optionally, in order to obtain a more accurate pose of the first tag camera, after obtaining the pose of at least one first tag camera when acquiring each frame of image, the obtained pose may be optimized and updated in a globally consistent manner, and the method specifically includes: if the current frame image acquired by the first label camera is a key frame, loop detection is carried out according to the current key frame and the historical key frame of the first label camera; and if the loop is successful, performing globally consistent optimization updating on the obtained first tag camera pose according to the current key frame.
That is to say, for each frame of image acquired by the first tag camera, it is necessary to determine whether the frame is a keyframe; if so, loop detection is performed, and if not, loop detection waits for the next keyframe. The keyframe judgment can match the feature points of each frame acquired by the first tag camera against the last keyframe of that camera to obtain a transformation relation matrix between the two frames; if the transformation relation matrix is larger than or equal to the preset transformation threshold, the current frame is determined to be a keyframe under that camera.
The global consistent optimization updating refers to that in the calibration process, along with the movement of the camera, when the depth camera moves to a place where the depth camera arrives once or has a large overlap with a historical view angle, the current frame image is consistent with the collected image, and phenomena such as interlacing and aliasing are not generated. And the loop detection is to judge whether the camera moves to a place which is reached once or a place which has larger overlap with a historical visual angle according to the current observation of the depth camera, and if the loop detection is successful, the global consistent optimization updating is carried out on the pose of the first label camera according to the current key frame, so that the accumulated error is reduced.
Specifically, the loop detection of the current key frame and the historical key frame may be performed by performing matching operation on ORB feature points of the current key frame and the historical key frame, and if the matching degree is high, the loop is successful.
Preferably, the camera pose is updated by globally consistent optimization: using the correspondences between the current keyframe and one or more historical keyframes with a high matching degree, the cost E(T_1, T_2, …, T_{N−1} | T_i ∈ SE(3), i ∈ [1, N−1]) is minimized, i.e. the problem of minimizing the conversion error between the current keyframe and all well-matched historical keyframes is solved. Here E(T_1, T_2, …, T_{N−1}) represents the conversion error of all frame pairs (each matched historical keyframe together with the current keyframe forms one frame pair); N is the number of historical keyframes with a high matching degree with the current keyframe; and E_{i,j} denotes the conversion error, i.e. the reprojection error, between the i-th frame and the j-th frame.
Specifically, during the pose optimization, the relative poses of non-keyframes with respect to their corresponding keyframes are kept unchanged. The optimization itself may use an existing bundle adjustment (BA) algorithm, or the method of S503 to increase the optimization speed, which is not detailed again. Likewise, the algorithm of this embodiment (i.e. the method in S503) can also be used for the calculation and optimization of the relative poses between cameras.
This embodiment iteratively calculates the camera pose by computing the nonlinear term of J(ξ)^T·J(ξ) from a linear component containing the second-order statistics of the feature points and a nonlinear component containing the camera pose; the linear part that is fixed between two frames is treated as a whole W during the calculation, which reduces the complexity of the camera pose calculation, enhances its real-time performance and keeps the hardware requirements low. Applying this algorithm both to pose solving and to back-end optimization yields fast and globally consistent calibration parameters.
It should be noted that the embodiment of the present invention may implement pose estimation and optimization based on the SLAM pipeline: pose estimation is implemented by a front-end visual odometry thread, and pose optimization by back-end loop detection and optimization, for example using an existing Bundle Adjustment (BA) algorithm or the algorithm of this embodiment.
In the SLAM process, the following operations are performed on the acquired images: estimating and optimizing the pose of the first tag camera, calculating the relative poses between cameras using view-angle overlap, and optimizing the calculated relative poses; these operations may be performed simultaneously. Through globally consistent SLAM, the pose of each camera is optimized and the calculated relative poses between cameras are continuously updated; at the same time, globally consistent local and global maps can be maintained, suiting application backgrounds such as indoor robot navigation or three-dimensional scene reconstruction with conventional RGB-D cameras. The map in SLAM refers to the motion trajectory of the camera in the world coordinate system and the positions, in the world coordinate system, of the keyframes observed along that trajectory. If the camera system is deformed by a physical impact, only the calibration program needs to be run again for a quick recalibration, and no calibration object needs to be arranged.
EXAMPLE five
This embodiment provides a depth camera calibration apparatus, which can execute the depth camera calibration method provided by any embodiment of the invention and has the corresponding functional modules and beneficial effects. The apparatus may be implemented in hardware and/or software, for example by a CPU. For technical details not described in this embodiment, refer to the depth camera calibration method provided by any embodiment of the present invention. Control signals, images and the like are transmitted between the depth camera calibration apparatus and the panoramic depth camera system; they may communicate in various ways, for example over wired links such as a serial port or a network cable, or wirelessly, for example over Bluetooth or wireless broadband. As shown in fig. 6, the apparatus includes: a camera control module 61, a pose acquisition module 62 and a relative pose calculation module 63.
The camera control module 61 is used for controlling at least two depth cameras in the panoramic depth camera system to synchronously acquire images during motion, wherein each depth camera is provided with a corresponding tag;
the pose acquisition module 62 is used for acquiring the pose of at least one first tag camera when it acquires each frame of image;
and the relative pose calculation module 63 is used for calculating, when the historical view angles of a second tag camera and the first tag camera overlap, the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the overlapping historical view angles and the poses of the first tag camera when it acquired each frame of image.
Optionally, the apparatus may further include:
a tag modification module configured to modify the tag of the second tag camera into the first tag after the relative pose calculation module 63 calculates the relative pose of the second tag camera and the first tag camera at the same time;
and an operation execution module, configured to repeatedly execute the operations of acquiring the pose of at least one first tag camera when it captures each frame of image, calculating the relative pose, at the same moment, of a second tag camera whose historical view angle overlaps with that of the corresponding first tag camera, and modifying the tag (that is, repeatedly executing the operations of the pose acquisition module 62, the relative pose calculation module 63 and the tag modification module), until the at least two depth cameras of the panoramic depth camera system no longer contain a second tag camera.
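A minimal Python sketch of this repeat-until-done control flow is given below; it is illustrative only, and the three callables passed in (estimate_poses, find_overlap, compute_relative_pose) are hypothetical hooks standing in for the pose acquisition, view-angle-overlap detection and relative pose calculation steps described above.

```python
def calibrate(cameras, estimate_poses, find_overlap, compute_relative_pose):
    """Propagate the 'first' tag until the system contains no second-tag camera.

    cameras: objects with a mutable .label and an .id; at least one starts as "first".
    Returns a dict mapping camera id to its relative pose w.r.t. the matched
    first-tag camera.
    """
    extrinsics = {}
    while any(cam.label == "second" for cam in cameras):
        first = [c for c in cameras if c.label == "first"]
        estimate_poses(first)                        # per-frame poses of first-tag cameras
        for cam in [c for c in cameras if c.label == "second"]:
            match = find_overlap(cam, first)         # overlapping historical view angle?
            if match is None:
                continue                             # keep moving until an overlap appears
            extrinsics[cam.id] = compute_relative_pose(cam, match)
            cam.label = "first"                      # calibrated camera joins the reference set
    return extrinsics
```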
Optionally, the apparatus may further include: a key frame determining module, used for performing, while the pose acquisition module 62 acquires the pose of at least one first tag camera when it captures each frame of image, feature point matching between the current frame image acquired by each depth camera and the previous key frame under that depth camera, so as to obtain a conversion relation matrix between the two frame images; and if the transformation indicated by the conversion relation matrix is greater than or equal to a preset conversion threshold, determining that the current frame image is a key frame under the corresponding depth camera and storing the key frame, specifically storing the key frame together with the depth camera to which it belongs.
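One plausible concrete reading of this keyframe test, sketched in Python below, is to threshold the magnitude of the relative motion encoded by the conversion relation matrix; the rotation and translation thresholds are illustrative values, not figures from the patent.

```python
import numpy as np

def is_new_keyframe(T_prev_curr, rot_thresh_deg=15.0, trans_thresh_m=0.3):
    """Decide whether the current frame becomes a keyframe from the 4x4 transform
    between the last keyframe and the current frame (thresholds are illustrative)."""
    R, t = T_prev_curr[:3, :3], T_prev_curr[:3, 3]
    # rotation angle recovered from the trace of R (angle-axis magnitude)
    angle = np.degrees(np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)))
    return angle >= rot_thresh_deg or np.linalg.norm(t) >= trans_thresh_m
```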
Optionally, the apparatus may further include: a view angle overlap determining module, used for performing feature point matching between the current frame image acquired by the second tag camera and the historical key frames of the at least one first tag camera before the relative pose of the second tag camera and the first tag camera at the same moment is calculated; and if a historical key frame and the current frame image reach a matching threshold, determining that the historical view angle of the second tag camera overlaps with the historical view angle of the corresponding first tag camera.
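The overlap test can be sketched as follows; ORB descriptors and a brute-force Hamming matcher are our own illustrative choices (the patent does not prescribe a particular feature type), and min_matches plays the role of the matching threshold mentioned above.

```python
import cv2

def find_overlapping_keyframe(curr_img, historical_keyframes, min_matches=50):
    """Return the best-matching historical keyframe of a first-tag camera, or None.

    historical_keyframes are assumed to carry precomputed binary descriptors in a
    .descriptors attribute (a hypothetical field, as in the map sketch above)."""
    orb = cv2.ORB_create()
    _, des_curr = orb.detectAndCompute(curr_img, None)
    if des_curr is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    best_kf, best_count = None, 0
    for kf in historical_keyframes:
        matches = matcher.match(des_curr, kf.descriptors)
        if len(matches) > best_count:
            best_kf, best_count = kf, len(matches)
    return best_kf if best_count >= min_matches else None
```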
Optionally, the relative pose calculation module 63 includes:
the relative position relation calculation unit is used for removing abnormal data in the feature point corresponding relation between the current frame image acquired by the second label camera and the corresponding historical key frame and calculating the relative position relation between the current frame image and the corresponding historical key frame according to the remaining feature point corresponding relation;
the transformation relation calculating unit is used for calculating the transformation relation between the pose of the second tag camera when acquiring the current frame image and the pose of the first tag camera when acquiring the corresponding historical key frame according to the relative position relation;
and the relative pose calculation unit is used for calculating the relative pose of the second tag camera and the first tag camera at the current frame moment according to the transformation relation and the poses of the first tag camera from capturing the corresponding historical key frame to capturing the current frame image.
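Assuming poses are represented as 4x4 homogeneous transforms with the naming convention T_a_b ("pose of b expressed in frame a"), the composition performed by these three units can be sketched as follows; the function and argument names are our own.

```python
import numpy as np

def relative_pose_at_current_frame(T_world_first_kf, T_world_first_now, T_firstkf_second_now):
    """Compose the second camera's current world pose from the first camera's pose at the
    matched historical keyframe and the keyframe-to-current-frame transform, then express
    it relative to the first camera at the current frame moment."""
    T_world_second_now = T_world_first_kf @ T_firstkf_second_now
    return np.linalg.inv(T_world_first_now) @ T_world_second_now
```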
Optionally, the pose acquisition module 62 includes:
The feature extraction unit is used for performing feature extraction on each frame of image acquired by each first tag camera to obtain at least one feature point of each frame of image;
the feature matching unit is used for performing feature point matching on two adjacent frames of images to obtain the feature point correspondences between the two adjacent frames of images;
the iteration calculation unit is used for removing abnormal data from the feature point correspondences, computing the nonlinear term of J(ξ)^T J(ξ) from a linear component containing the second-order statistics of the remaining feature points and a nonlinear component containing the camera pose, performing repeated iterations of

δ = -(J(ξ)^T J(ξ))^{-1} J(ξ)^T r(ξ),

and taking the pose obtained when the reprojection error is smaller than a preset threshold;

wherein r(ξ) represents a vector containing all reprojection errors, J(ξ) is the Jacobian matrix of r(ξ), ξ represents the Lie algebra of the camera pose, and δ represents the increment in each iteration; R_i represents the rotation matrix of the camera when the i-th frame image is acquired; R_j represents the rotation matrix of the camera when the j-th frame image is acquired; p_i^k represents the k-th feature point on the i-th frame image; p_j^k represents the k-th feature point on the j-th frame image; C_{i,j} represents the set of feature point correspondences between the i-th frame image and the j-th frame image; |C_{i,j}| represents the number of feature point correspondences between the i-th frame image and the j-th frame image; [·]_× represents the cross-product operator.
Further, the nonlinear term of J(ξ)^T J(ξ) is expressed as the product of a linear component and nonlinear components (the full expression is given as a formula image in the original publication and is not reproduced here), wherein the linear component contains the second-order statistics of the matched feature points; r_il^T and r_jl represent the nonlinear components, r_il^T being the l-th row of the rotation matrix R_i, r_jl being the transpose of the l-th row of the rotation matrix R_j, with l = 0, 1, 2.
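For orientation only, the iteration above is a standard Gauss-Newton step. The sketch below shows the generic update δ = -(J(ξ)^T J(ξ))^{-1} J(ξ)^T r(ξ) and deliberately does not reproduce the patent's decomposition of J(ξ)^T J(ξ) into second-order statistics and pose-dependent terms; residual_fn and jacobian_fn are placeholder callables.

```python
import numpy as np

def gauss_newton_pose(xi0, residual_fn, jacobian_fn, tol=1e-6, max_iters=50):
    """Generic Gauss-Newton iteration: delta = -(J^T J)^{-1} J^T r, xi <- xi + delta.

    residual_fn(xi) returns the stacked reprojection errors r(xi); jacobian_fn(xi)
    returns their Jacobian J(xi). The plain additive update is shown for brevity;
    a real SE(3) solver would apply the increment on the manifold."""
    xi = np.asarray(xi0, dtype=float).copy()
    for _ in range(max_iters):
        r = residual_fn(xi)
        if 0.5 * float(r @ r) < tol:     # stop once the reprojection error is small enough
            break
        J = jacobian_fn(xi)
        delta = -np.linalg.solve(J.T @ J, J.T @ r)
        xi = xi + delta
    return xi
```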
Optionally, the apparatus may further include:
the loop detection module is used for performing loop detection according to the current key frame and the historical key frames of the first tag camera if, after the pose of at least one first tag camera when capturing each frame of image has been acquired, the current frame image acquired by the first tag camera is a key frame;
and the optimization updating module is used for performing, if the loop closure is successful, a globally consistent optimization update on the acquired poses of the first tag camera according to the current key frame.
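As a very rough stand-in for such a globally consistent update (a real system would run pose-graph optimization or bundle adjustment over all keyframes), the sketch below merely spreads the translational drift revealed by a loop closure linearly along the keyframes of the loop; it is illustrative only and is not the optimization used in the patent.

```python
import numpy as np

def distribute_loop_correction(poses, loop_start, loop_end, T_measured_end):
    """poses: list of 4x4 world poses of the keyframes on the detected loop.
    T_measured_end: pose of the last keyframe implied by the loop-closure match.
    Spreads the translational drift linearly from loop_start to loop_end."""
    drift = T_measured_end[:3, 3] - poses[loop_end][:3, 3]
    n = loop_end - loop_start
    corrected = list(poses)
    for k in range(loop_start + 1, loop_end + 1):
        w = (k - loop_start) / n
        corrected[k] = poses[k].copy()
        corrected[k][:3, 3] += w * drift
    return corrected
```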
Optionally, the loop detection module is further configured to, after the relative pose calculation module 63 calculates the relative pose of the second tag camera and the first tag camera at the same moment, perform loop detection according to the key frame and the historical key frames under the depth cameras whose relative poses have been calculated, if a key frame exists among the current frame images synchronously acquired by those depth cameras; the optimization updating module is further configured to, if the loop closure is successful, optimize and update the relative pose between the corresponding depth cameras according to the key frame and the corresponding historical key frame.
It should be noted that, in the embodiment of the depth camera calibration apparatus, the units and modules included in the embodiment are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
EXAMPLE six
The embodiment provides an electronic device, including: one or more processors, a memory, and a panoramic depth camera system. The memory is used for storing one or more programs. The panoramic depth camera system includes at least two depth cameras covering a panoramic field of view and is used for capturing images. When the one or more programs are executed by the one or more processors, they cause the one or more processors to implement the depth camera calibration method described in any embodiment of the present invention.
Fig. 7 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present invention. FIG. 7 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present invention. The electronic device 712 shown in fig. 7 is only an example and should not bring any limitations to the function and the scope of use of the embodiments of the present invention.
As shown in fig. 7, electronic device 712 is embodied in the form of a general purpose computing device. Components of electronic device 712 may include, but are not limited to: one or more processors (or otherwise referred to as processing units 716), a system memory 728, and a bus 718 that couples various system components including the system memory 728 and the processing units 716.
Bus 718 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 712 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by electronic device 712 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 728 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)730 and/or cache memory 732. The electronic device 712 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 734 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard drive"). Although not shown in FIG. 7, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 718 by one or more data media interfaces. System memory 728 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
Program/utility 740 having a set (at least one) of program modules 742 may be stored, for instance, in system memory 728, such program modules 742 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may include an implementation of a network environment. Program modules 742 generally perform the functions and/or methodologies of embodiments of the invention as described herein.
The electronic device 712 may also communicate with one or more external devices 714 (e.g., keyboard, pointing device, display 724, etc.), with one or more devices that enable a user to interact with the electronic device 712, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 712 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 722. Also, the electronic device 712 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 720. As shown, the network adapter 720 communicates with the other modules of the electronic device 712 via the bus 718. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 712, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 716 executes programs stored in the system memory 728 to perform various functional applications and data processing, such as implementing a depth camera calibration method provided by embodiments of the present invention.
The electronic device 712 may further include: panoramic depth camera system 750 includes at least two depth cameras covering a panoramic field of view for capturing images. The panoramic depth camera system 750 is connected with the processing unit 716 and the system memory 728. The depth camera included in the panoramic depth camera system 750 may capture images under the control of the processing unit 716. In particular, the panoramic depth camera system 750 may be installed in an electronic device in a built-in manner.
Optionally, one or more processors are central processing units; the electronic device is a portable mobile electronic device, such as a mobile robot, a drone, a three-dimensional visual interaction device (such as VR glasses or a wearable helmet), or a smart terminal (such as a mobile phone or a tablet computer).
EXAMPLE seven
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements a depth camera calibration method according to any embodiment of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The above embodiment numbers are for description only and do not indicate that any embodiment is superior or inferior to another.
It will be appreciated by those of ordinary skill in the art that the modules or operations of the embodiments of the invention described above may be implemented using a general purpose computing device, and may be centralized on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented using program code executable by a computing device, such that the program code can be stored in a storage device and executed by a computing device; or they may be separately fabricated into individual integrated circuit modules, or a plurality of the modules or operations may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

1. A depth camera calibration method is characterized by comprising the following steps:
controlling at least two depth cameras in the panoramic depth camera system to synchronously acquire images in the motion process, wherein each depth camera is provided with a corresponding label;
acquiring the pose of at least one first label camera when acquiring each frame of image;
the acquiring of the pose of each frame of image acquired by at least one first tag camera comprises:
for each first label camera, performing feature extraction on each frame of image acquired by the first label camera to obtain at least one feature point of each frame of image;
matching feature points of two adjacent frames of images to obtain a corresponding relationship of the feature points between the two adjacent frames of images;
removing abnormal data from the feature point correspondences, and computing the nonlinear term of J(ξ)^T J(ξ) from a linear component containing the second-order statistics of the remaining feature points and a nonlinear component containing the camera pose;
performing repeated iterations of δ = -(J(ξ)^T J(ξ))^{-1} J(ξ)^T r(ξ), and taking the pose obtained when the reprojection error is smaller than a preset threshold;
wherein r(ξ) represents a vector containing all reprojection errors, J(ξ) is the Jacobian matrix of r(ξ), J(ξ)^T is the transpose of J(ξ), ξ represents the Lie algebra of the camera pose, and δ represents the increment in each iteration; R_i represents the rotation matrix of the camera when the i-th frame image is acquired; R_j represents the rotation matrix of the camera when the j-th frame image is acquired; p_i^k represents the k-th feature point on the i-th frame image; p_j^k represents the k-th feature point on the j-th frame image; C_{i,j} represents the set of feature point correspondences between the i-th frame image and the j-th frame image; |C_{i,j}| represents the number of feature point correspondences between the i-th frame image and the j-th frame image; [·]_× represents the cross-product operator;
if the historical view angles of the second label camera and the first label camera are overlapped, calculating the relative pose of the second label camera and the first label camera at the same moment according to the images corresponding to the overlapped historical view angles and the pose of the first label camera when the first label camera collects the images of each frame;
the calculating the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the historical view angle in an overlapping manner and the pose of the first tag camera when the first tag camera collects each frame of image comprises: removing abnormal data in the feature point corresponding relation between the current frame image acquired by the second label camera and the corresponding historical key frame, and calculating the relative position relation between the current frame image and the corresponding historical key frame according to the remaining feature point corresponding relation; calculating a transformation relation between the pose of the second tag camera when acquiring the current frame image and the pose of the first tag camera when acquiring the corresponding historical key frame according to the relative position relation; and calculating the relative poses of the second label camera and the first label camera at the current frame moment according to the transformation relation and the positions of the first label camera between the acquisition of the corresponding historical key frame and the acquisition of the current frame image.
2. The method of claim 1, further comprising, after calculating the relative pose of the second tag camera and the first tag camera at the same time:
modifying the tag of the second tag camera to be a first tag;
and repeatedly executing the operations of acquiring the pose of at least one first tag camera when acquiring each frame of image, calculating the relative pose of a second tag camera with the overlapped historical view angles and the corresponding first tag camera at the same moment, and modifying the tags until the at least two depth cameras do not contain the second tag camera.
3. The method according to claim 1, wherein while acquiring the pose of the at least one first tag camera at the time of capturing each frame of image, further comprising:
respectively performing feature point matching between the current frame image acquired by each depth camera and the previous key frame under the corresponding depth camera to obtain a conversion relation matrix between the two frame images;
and if the conversion relation matrix is larger than or equal to a preset conversion threshold value, determining the current frame image as a key frame under the corresponding depth camera, and storing the key frame.
4. The method of claim 1, further comprising, prior to calculating the relative pose of the second tag camera and the first tag camera at the same time:
performing feature point matching on the current frame image acquired by the second label camera and the historical key frame of the at least one first label camera;
and if the historical key frame and the current frame image reach a matching threshold value, determining that the historical view angle of the second label camera is overlapped with the historical view angle of the corresponding first label camera.
5. The method of claim 1, wherein the nonlinear term of J(ξ)^T J(ξ) is expressed as the product of a linear component and nonlinear components (the full expression is given as a formula image in the original publication), wherein the linear component contains the second-order statistics of the matched feature points; r_il^T and r_jl represent the nonlinear components, r_il^T being the l-th row of the rotation matrix R_i, r_jl being the transpose of the l-th row of the rotation matrix R_j, with l = 0, 1, 2.
6. The method of claim 1, after acquiring the pose of the at least one first tag camera as it captures each frame of image, further comprising:
if the current frame image acquired by the first label camera is a key frame, performing loop detection according to the current key frame and a historical key frame of the first label camera;
and if the loop closure is successful, performing a globally consistent optimization update on the acquired pose of the first tag camera according to the current key frame.
7. The method of claim 1, further comprising, after calculating the relative pose of the second tag camera and the first tag camera at the current frame moment:
if a key frame exists among the current frame images synchronously acquired by the depth cameras whose relative poses have been calculated, performing loop detection according to the key frame and the historical key frames under the depth cameras whose relative poses have been calculated;
and if the loop closure is successful, optimizing and updating the relative pose between the corresponding depth cameras according to the key frame and the corresponding historical key frame.
8. A depth camera calibration device, comprising:
the camera control module is used for controlling at least two depth cameras in the panoramic depth camera system to synchronously acquire images in the motion process, wherein each depth camera is provided with a corresponding label;
the pose acquisition module is used for acquiring the pose of at least one first tag camera when it captures each frame of image, wherein acquiring the pose of at least one first tag camera when it captures each frame of image comprises: for each first tag camera, performing feature extraction on each frame of image acquired by the first tag camera to obtain at least one feature point of each frame of image; performing feature point matching on two adjacent frames of images to obtain the feature point correspondences between the two adjacent frames of images; removing abnormal data from the feature point correspondences, and computing the nonlinear term of J(ξ)^T J(ξ) from a linear component containing the second-order statistics of the remaining feature points and a nonlinear component containing the camera pose; performing repeated iterations of δ = -(J(ξ)^T J(ξ))^{-1} J(ξ)^T r(ξ), and taking the pose obtained when the reprojection error is smaller than a preset threshold; wherein r(ξ) represents a vector containing all reprojection errors, J(ξ) is the Jacobian matrix of r(ξ), J(ξ)^T is the transpose of J(ξ), ξ represents the Lie algebra of the camera pose, and δ represents the increment in each iteration; R_i represents the rotation matrix of the camera when the i-th frame image is acquired; R_j represents the rotation matrix of the camera when the j-th frame image is acquired; p_i^k represents the k-th feature point on the i-th frame image; p_j^k represents the k-th feature point on the j-th frame image; C_{i,j} represents the set of feature point correspondences between the i-th frame image and the j-th frame image; |C_{i,j}| represents the number of feature point correspondences between the i-th frame image and the j-th frame image; [·]_× represents the cross-product operator;
and the relative pose calculation module is configured to, when the historical view angle of a second tag camera overlaps with the historical view angle of the first tag camera, calculate the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the overlapping historical view angles and the pose of the first tag camera when it captures each frame of image, wherein calculating the relative pose of the second tag camera and the first tag camera at the same moment according to the images corresponding to the overlapping historical view angles and the pose of the first tag camera when it captures each frame of image comprises: removing abnormal data from the feature point correspondences between the current frame image acquired by the second tag camera and the corresponding historical key frame, and calculating the relative position relation between the current frame image and the corresponding historical key frame according to the remaining feature point correspondences; calculating, according to the relative position relation, the transformation relation between the pose of the second tag camera when acquiring the current frame image and the pose of the first tag camera when acquiring the corresponding historical key frame; and calculating the relative pose of the second tag camera and the first tag camera at the current frame moment according to the transformation relation and the poses of the first tag camera from acquiring the corresponding historical key frame to acquiring the current frame image.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
a panoramic depth camera system comprising at least two depth cameras covering a panoramic field of view for capturing images;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the depth camera calibration method of any one of claims 1-7.
10. The electronic device of claim 9, wherein the one or more processors are central processors; the electronic device is a portable mobile electronic device.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a depth camera calibration method as set forth in any one of claims 1 to 7.
CN201810179738.7A 2018-03-05 2018-03-05 Depth camera calibration method and device, electronic equipment and storage medium Active CN108447097B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810179738.7A CN108447097B (en) 2018-03-05 2018-03-05 Depth camera calibration method and device, electronic equipment and storage medium
PCT/CN2019/085515 WO2019170166A1 (en) 2018-03-05 2019-05-05 Depth camera calibration method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810179738.7A CN108447097B (en) 2018-03-05 2018-03-05 Depth camera calibration method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108447097A CN108447097A (en) 2018-08-24
CN108447097B true CN108447097B (en) 2021-04-27

Family

ID=63193477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810179738.7A Active CN108447097B (en) 2018-03-05 2018-03-05 Depth camera calibration method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN108447097B (en)
WO (1) WO2019170166A1 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108447097B (en) * 2018-03-05 2021-04-27 清华-伯克利深圳学院筹备办公室 Depth camera calibration method and device, electronic equipment and storage medium
CN109218562B (en) * 2018-09-07 2021-04-27 百度在线网络技术(北京)有限公司 Clock synchronization method, device, equipment, storage medium and vehicle
CN109242913B (en) * 2018-09-07 2020-11-10 百度在线网络技术(北京)有限公司 Method, device, equipment and medium for calibrating relative parameters of collector
CN109360243B (en) 2018-09-28 2022-08-19 安徽爱观视觉科技有限公司 Calibration method of multi-degree-of-freedom movable vision system
CN109584302B (en) * 2018-11-27 2023-12-01 北京旷视科技有限公司 Camera pose optimization method, camera pose optimization device, electronic equipment and computer readable medium
CN111563840B (en) * 2019-01-28 2023-09-05 北京魔门塔科技有限公司 Training method and device of segmentation model, pose detection method and vehicle-mounted terminal
CN109946680B (en) * 2019-02-28 2021-07-09 北京旷视科技有限公司 External parameter calibration method and device of detection system, storage medium and calibration system
CN110166714A (en) * 2019-04-11 2019-08-23 深圳市朗驰欣创科技股份有限公司 Double light fusion methods of adjustment, double light fusion adjustment device and double light fusion devices
CN110232715B (en) * 2019-05-08 2021-11-19 奥比中光科技集团股份有限公司 Method, device and system for self calibration of multi-depth camera
CN110132306B (en) * 2019-05-20 2021-02-19 广州小鹏汽车科技有限公司 Method and system for correcting vehicle positioning error
CN110349249B (en) * 2019-06-26 2021-04-06 华中科技大学 Real-time dense reconstruction method and system based on RGB-D data
CN110363821B (en) * 2019-07-12 2021-09-28 顺丰科技有限公司 Monocular camera installation deviation angle acquisition method and device, camera and storage medium
CN110415286B (en) * 2019-09-24 2020-01-17 杭州蓝芯科技有限公司 External parameter calibration method of multi-flight time depth camera system
CN114663528A (en) 2019-10-09 2022-06-24 阿波罗智能技术(北京)有限公司 Multi-phase external parameter combined calibration method, device, equipment and medium
CN110866953B (en) * 2019-10-31 2023-12-29 Oppo广东移动通信有限公司 Map construction method and device, and positioning method and device
CN113781548A (en) * 2020-06-10 2021-12-10 华为技术有限公司 Multi-device pose measurement method, electronic device and system
CN112115980A (en) * 2020-08-25 2020-12-22 西北工业大学 Binocular vision odometer design method based on optical flow tracking and point line feature matching
CN112802112B (en) * 2021-04-12 2021-07-16 北京三快在线科技有限公司 Visual positioning method, device, server and storage medium
CN113269876A (en) * 2021-05-10 2021-08-17 Oppo广东移动通信有限公司 Map point coordinate optimization method and device, electronic equipment and storage medium
CN113870358A (en) * 2021-09-17 2021-12-31 聚好看科技股份有限公司 Method and equipment for joint calibration of multiple 3D cameras

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105844624A (en) * 2016-03-18 2016-08-10 上海欧菲智能车联科技有限公司 Dynamic calibration system, and combined optimization method and combined optimization device in dynamic calibration system
CN106105192A (en) * 2014-01-03 2016-11-09 英特尔公司 Rebuild by the real-time 3D of depth camera
CN106157304A (en) * 2016-07-01 2016-11-23 成都通甲优博科技有限责任公司 A kind of Panoramagram montage method based on multiple cameras and system
CN106204443A (en) * 2016-07-01 2016-12-07 成都通甲优博科技有限责任公司 A kind of panorama UAS based on the multiplexing of many mesh
WO2017117517A1 (en) * 2015-12-30 2017-07-06 The Johns Hopkins University System and method for medical imaging
CN107025668A (en) * 2017-03-30 2017-08-08 华南理工大学 A kind of design method of the visual odometry based on depth camera

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9247238B2 (en) * 2011-01-31 2016-01-26 Microsoft Technology Licensing, Llc Reducing interference between multiple infra-red depth cameras
US9338440B2 (en) * 2013-06-17 2016-05-10 Microsoft Technology Licensing, Llc User interface for three-dimensional modeling
US11019330B2 (en) * 2015-01-19 2021-05-25 Aquifi, Inc. Multiple camera system with auto recalibration
CN106097300B (en) * 2016-05-27 2017-10-20 西安交通大学 A kind of polyphaser scaling method based on high-precision motion platform
CN108447097B (en) * 2018-03-05 2021-04-27 清华-伯克利深圳学院筹备办公室 Depth camera calibration method and device, electronic equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106105192A (en) * 2014-01-03 2016-11-09 英特尔公司 Rebuild by the real-time 3D of depth camera
WO2017117517A1 (en) * 2015-12-30 2017-07-06 The Johns Hopkins University System and method for medical imaging
CN105844624A (en) * 2016-03-18 2016-08-10 上海欧菲智能车联科技有限公司 Dynamic calibration system, and combined optimization method and combined optimization device in dynamic calibration system
CN106157304A (en) * 2016-07-01 2016-11-23 成都通甲优博科技有限责任公司 A kind of Panoramagram montage method based on multiple cameras and system
CN106204443A (en) * 2016-07-01 2016-12-07 成都通甲优博科技有限责任公司 A kind of panorama UAS based on the multiplexing of many mesh
CN107025668A (en) * 2017-03-30 2017-08-08 华南理工大学 A kind of design method of the visual odometry based on depth camera

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
3D scene reconstruction by multiple structured-light based commodity depth cameras; Jianfeng Wang et al.; 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2012-08-31; pp. 1-4 *
Three-dimensional reconstruction technology combining Kinect point cloud data and sequential images; Chen Yingbo; China Master's Theses Full-text Database (Information Science and Technology); 2018-02-15 (No. 02); pp. I138-2398 *

Also Published As

Publication number Publication date
WO2019170166A1 (en) 2019-09-12
CN108447097A (en) 2018-08-24

Similar Documents

Publication Publication Date Title
CN108447097B (en) Depth camera calibration method and device, electronic equipment and storage medium
CN108898630B (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN109242913B (en) Method, device, equipment and medium for calibrating relative parameters of collector
CN109087359B (en) Pose determination method, pose determination apparatus, medium, and computing device
CN107888828B (en) Space positioning method and device, electronic device, and storage medium
WO2019170164A1 (en) Depth camera-based three-dimensional reconstruction method and apparatus, device, and storage medium
CN108805917B (en) Method, medium, apparatus and computing device for spatial localization
CN110322500B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
CN110163903B (en) Three-dimensional image acquisition and image positioning method, device, equipment and storage medium
WO2018049581A1 (en) Method for simultaneous localization and mapping
CN110111388B (en) Three-dimensional object pose parameter estimation method and visual equipment
WO2018129715A1 (en) Simultaneous positioning and dense three-dimensional reconstruction method
CN109461208B (en) Three-dimensional map processing method, device, medium and computing equipment
CN109084746A (en) Monocular mode for the autonomous platform guidance system with aiding sensors
CN110349212B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
CN112184824A (en) Camera external parameter calibration method and device
WO2023005457A1 (en) Pose calculation method and apparatus, electronic device, and readable storage medium
CN112183506A (en) Human body posture generation method and system
CN110096134B (en) VR handle ray jitter correction method, device, terminal and medium
CN113763478B (en) Unmanned vehicle camera calibration method, device, equipment, storage medium and system
CN109035303A (en) SLAM system camera tracking and device, computer readable storage medium
CN112085842B (en) Depth value determining method and device, electronic equipment and storage medium
WO2023160445A1 (en) Simultaneous localization and mapping method and apparatus, electronic device, and readable storage medium
CN113496503A (en) Point cloud data generation and real-time display method, device, equipment and medium
CN110853098A (en) Robot positioning method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221213

Address after: 518000 2nd floor, building a, Tsinghua campus, Shenzhen University Town, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen International Graduate School of Tsinghua University

Address before: 518055 Nanshan Zhiyuan 1001, Xue Yuan Avenue, Nanshan District, Shenzhen, Guangdong.

Patentee before: TSINGHUA-BERKELEY SHENZHEN INSTITUTE

TR01 Transfer of patent right