WO2024093635A1 - Camera pose estimation method and apparatus, and computer-readable storage medium - Google Patents
- Publication number: WO2024093635A1 (application PCT/CN2023/124164)
- Authority: WO (WIPO PCT)
- Prior art keywords: camera, slave, pose estimation, transformation relationship, parameter
Classifications
- G06T7/70: Image analysis; determining position or orientation of objects or cameras
- G06T17/05: Three-dimensional [3D] modelling; geographic models
- G06T3/06: Geometric image transformations in the plane of the image; topological mapping of higher dimensional structures onto lower dimensional surfaces
- G06T2207/10028: Image acquisition modality; range image, depth image, 3D point clouds
- Y02A90/30: Technologies for adaptation to climate change; assessment of water resources
Definitions
- the embodiments of the present invention relate to the field of unmanned aerial survey technology, and specifically to a camera pose estimation method, device, equipment and computer-readable storage medium.
- aerial photogrammetry technology can produce topographic maps and image maps of various scales, and can also establish terrain databases to provide basic data for various geographic information systems and land information systems, so that people can plan and manage land more finely.
- the precise geographic information data provided by aerial photogrammetry technology can also be used to build high-precision maps, which brings convenience to people's travel positioning.
- Aerial photogrammetry often uses aerial triangulation, that is, using continuously captured aerial photos with a certain overlap, based on a small number of field control points, to establish a route model or regional network model corresponding to the field by photogrammetric methods, so as to obtain the plane coordinates and elevations of the densified points.
- UAV aerial survey technology is a powerful supplement to traditional aerial photogrammetry methods.
- when a UAV carries multiple cameras for aerial photography, it can effectively capture orthographic and oblique images.
- the combined frame of multiple cameras is large, so a high-altitude operation can complete the flight mission in a single pass.
- an embodiment of the present invention provides a camera pose estimation method to solve the problems existing in the prior art.
- a camera pose estimation method characterized by comprising:
- Acquire external parameters of each camera in a multi-camera shooting device including at least two cameras, wherein the relative position relationship between each of the cameras is fixed, and the at least two cameras include a master camera and one or more slave cameras;
- Three-dimensional points are generated according to the multiple images taken by the multi-camera shooting device, and the three-dimensional points are optimized and calculated according to the first geographical location and the posture transformation relationship to obtain the optimized first posture of each camera.
- determining a first extrinsic parameter of the master camera and a second extrinsic parameter of each of the slave cameras from the extrinsic parameters, and calculating a pose transformation relationship of each of the slave cameras relative to the master camera according to the first extrinsic parameter and the second extrinsic parameter further includes:
- the main camera extrinsic parameter and the slave camera extrinsic parameter are calculated according to a plurality of images taken by the main camera and the slave camera at the same track position;
- the conversion relationship between the image taken by the master camera and the image taken by the slave camera is calculated according to the formula T01 = Tw0' * Tw1, where T01 is the conversion relationship, Tw0 is the master camera extrinsic parameter, Tw0' is the inverse matrix of Tw0, and Tw1 is the slave camera extrinsic parameter;
- the conversion relationship of the image is determined as the posture transformation relationship.
- generating three-dimensional points according to the multiple images captured by the multi-camera shooting device further includes:
- the three-dimensional point is generated according to the relative transformation relationship.
- the optimizing calculation of the three-dimensional point according to the first geographical location and the posture transformation relationship further includes:
- a projection matrix for the optimization is determined according to the formula Pi = k · [Rc | tc] · [Ri | ti], where Pi is the projection matrix, [Rc | tc] is the posture transformation relationship, [Ri | ti] is the first geographical location, k is the internal parameter of any camera of the master camera and the slave cameras, i is the sequence number of the first geographical location, and c is the sequence number of the posture transformation relationship;
- the minimized reprojection error of the three-dimensional point is calculated according to the projection matrix, using the formula min Σo ||Pi · x − xo||², where x is the three-dimensional point, xo is the two-dimensional feature point obtained after reprojecting the three-dimensional point, and o is the serial number of the three-dimensional point.
- after the optimization calculation, the camera posture estimation method further includes: eliminating, from the three-dimensional points, points whose pixel errors are greater than 4 pixels according to the reprojection error obtained after the optimization calculation; eliminating points whose observation-point angles are less than 2 degrees; and globally optimizing the three-dimensional points.
- the camera pose estimation method further includes: calculating a second geographical location of each slave camera from the first geographical location of the master camera and the posture transformation relationship, and optimizing the three-dimensional points according to the second geographical location and the posture transformation relationship to obtain the optimized second posture of each slave camera.
- after acquiring the extrinsic parameters, the method further includes: obtaining the pose transformation relationship of each of the slave cameras relative to the master camera that was calculated from the first extrinsic parameter and the second extrinsic parameter during the last operation of the multi-camera shooting device.
- a camera pose estimation device comprising:
- a first acquisition module used to acquire an external parameter of each camera in a multi-camera shooting device including at least two cameras, wherein a relative position relationship between each of the cameras is fixed, and the at least two cameras include a master camera and one or more slave cameras;
- a first calculation module is used to determine a first extrinsic parameter of the master camera and a second extrinsic parameter of each of the slave cameras from the extrinsic parameters, and calculate a posture transformation relationship of each of the slave cameras relative to the master camera according to the first extrinsic parameter and the second extrinsic parameter;
- a second acquisition module configured to acquire a first geographic location of the main camera through a sensor
- the second calculation module is used to generate three-dimensional points according to the multiple images taken by the multi-camera shooting device, and optimize the three-dimensional points according to the first geographical location and the posture transformation relationship to obtain the optimized first posture of each camera.
- a camera pose estimation device comprising:
- a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other via the communication bus;
- the memory is used to store at least one program, and the program enables the processor to perform operations such as the above-mentioned camera pose estimation method.
- a computer-readable storage medium wherein at least one program is stored in the storage medium, and the program enables a camera pose estimation device to perform operations corresponding to the above method.
- by obtaining the master camera extrinsic parameters and the slave camera extrinsic parameters, the pose transformation relationship of each slave camera relative to the master camera is calculated, and the first geographic location of the master camera is then acquired.
- the first geographic location and the pose transformation relationship are used to optimize the 3D points generated from the multiple target images taken by the master camera and the slave cameras. Since the first geographical location of the master camera is its actual location, and the relative position between the master camera and each slave camera is fixed, obtaining the pose transformation relationship of a slave camera relative to the master camera is equivalent to obtaining the actual position of that slave camera.
- optimizing the 3D points based on the actual camera positions solves the problem of poor optimization results caused by weak feature point matching when the ground texture of the captured images is poor. The method depends less on feature point matching, improves the optimization effect on poorly textured ground, improves optimization accuracy and applicability, and thus improves the accuracy of 3D reconstruction.
- FIG1 is a schematic diagram showing a flow chart of a camera pose estimation method provided by an embodiment of the present invention
- FIG2 is a schematic diagram showing the structure of a camera pose estimation device provided by an embodiment of the present invention.
- FIG3 shows a schematic structural diagram of a camera pose estimation device provided by an embodiment of the present invention.
- the inventors of the present application have designed a camera pose estimation method after research.
- by obtaining the main camera extrinsic parameters and the slave camera extrinsic parameters, calculating the pose transformation relationship between them, obtaining the first geographical location of the main camera, and then optimizing the three-dimensional points generated by three-dimensional reconstruction according to the first geographical location and the pose transformation relationship, the dependence of the optimization process on the matching relationship of feature points is reduced. The accuracy of the optimized three-dimensional points is therefore less affected by inaccurate matching over complex terrain, which improves the accuracy of three-dimensional reconstruction by UAV aerial survey.
- FIG. 1 shows a flow chart of a camera pose estimation method provided by an embodiment of the present invention. As shown in FIG. 1 , the method includes the following steps:
- Step 110 Obtaining external parameters of each camera in a multi-camera shooting device including at least two cameras, wherein the relative position relationship between each camera is fixed, and the at least two cameras include a master camera and one or more slave cameras.
- the main camera refers to the camera used as the orthographic lens in the multi-camera shooting device, that is, the camera that shoots toward the target.
- in a multi-camera shooting device, the shooting directions of the individual cameras are often inconsistent.
- only a multi-camera shooting device with one main camera and one or more slave cameras is taken as an example.
- a geographic location acquisition device, such as any of various types of location sensors, needs to be installed on the main camera in the multi-camera shooting device.
- the other cameras serve as slave cameras, and based on the geographic location of the main camera and the fixed relative position relationship between the other cameras and the main camera, the posture transformation relationship of the slave cameras relative to the main camera can be obtained in subsequent steps, or the geographic location of the slave cameras can be further obtained.
- obtaining the extrinsic parameters of each camera in a multi-camera shooting device including at least two cameras refers to obtaining the extrinsic parameters of the master camera and the slave camera.
- the extrinsic parameters of the camera refer to the parameters of the camera in the world coordinate system, including the rotation matrix R and the translation matrix T.
- the camera extrinsic parameters can be obtained in a variety of ways. For example, an aerial triangulation calculation can be performed using multiple images taken by the master camera and the slave cameras at the same trajectory point to obtain the master camera extrinsic parameters and the slave camera extrinsic parameters. They can also be obtained by camera self-calibration, for which many methods exist.
- for example, the Tsai two-step method can be used, as can the Zhang calibration method, an active system controlling the camera to perform specific movements, the layered step-by-step calibration method, or self-calibration based on the Kruppa equations. Different methods can be chosen according to actual conditions.
- the embodiments of the present application do not specifically limit this.
- each camera in the multi-camera shooting device is rigidly connected. If one or more cameras in the multi-camera shooting device rotate the shooting angle, the direction of the camera changes, and the posture also changes. Since the relative position relationship between each camera is fixed, the movement of all cameras in the multi-camera shooting device is consistent. For example, all cameras in the multi-camera shooting device rotate a certain angle at the same time, or the multi-camera shooting device moves a certain distance as a whole. At this time, even if the posture of each camera changes, the relative posture between each camera remains unchanged.
- obtaining the camera extrinsic parameters provides the data basis for the subsequent camera pose estimation calculations, and calculations based on these extrinsic parameters yield relatively accurate results.
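As a quick check of the rigidity argument above, here is a minimal numpy sketch (not part of the patent), assuming Tw0 and Tw1 are 4x4 camera-to-world poses and the rig moves by a common world-frame transform G:

```python
import numpy as np

def rigid(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(a):
    """Rotation matrix about the z axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical poses of the master (Tw0) and one slave (Tw1) camera.
Tw0 = rigid(rot_z(0.10), [0.0, 0.0, 100.0])
Tw1 = rigid(rot_z(0.35), [0.5, 0.2, 100.0])

# Relative pose of the slave with respect to the master.
T01 = np.linalg.inv(Tw0) @ Tw1

# The whole rig undergoes a common motion G (the drone flies and turns).
G = rigid(rot_z(0.70), [10.0, -3.0, 5.0])
T01_after = np.linalg.inv(G @ Tw0) @ (G @ Tw1)

assert np.allclose(T01, T01_after)  # the relative pose is unchanged
```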
- Step 120 Determine a first extrinsic parameter of the master camera and a second extrinsic parameter of each of the slave cameras from the extrinsic parameters, and calculate a posture transformation relationship of each of the slave cameras relative to the master camera based on the first extrinsic parameter and the second extrinsic parameter.
- the main camera extrinsics and slave camera extrinsics obtained in step 110 are used to calculate the pose transformation relationship between the main camera extrinsics and the slave camera extrinsics.
- the pose transformation relationship may be a transformation formula, a parameter, or one or more calculation formulas.
- the purpose is to enable the main camera extrinsics or the slave camera extrinsics to be converted to each other after calculation in combination with the pose transformation relationship.
- Different forms of pose transformation relationships can be used in the calculation according to actual conditions, as long as the conversion between the main camera extrinsics and the slave camera extrinsics can be conveniently realized.
- the embodiment of the present application does not make any special limitation on this.
- through the pose transformation relationship, the fixed relative position data between the main camera and the slave cameras can be added to the optimization process in the subsequent camera pose estimation, so that the relative positions of the main camera and the slave cameras serve as an additional optimization constraint. The optimization then no longer relies solely on the matching relationship of image feature points, which improves its applicability and stability and makes the camera pose estimation result more accurate.
- Step 130 Acquire a first geographic location of the main camera through a sensor.
- acquiring the first geographical location of the main camera refers to directly obtaining the first geographical location of the main camera through a sensor, and the geographical location refers to the position data of the main camera in the world coordinate system.
- the sensor can be a gyroscope or a GPS, or other sensors.
- the first geographic location can be obtained through different sensors according to actual conditions. It only needs to be able to conveniently obtain the position data of the main camera in the world coordinate system directly or through certain calculations.
- the embodiment of the present application does not make any special limitations on this.
- the first geographic location of the main camera is obtained through the sensor, making data acquisition more convenient and simple, and the obtained geographic location is the accurate actual location of the main camera, providing an accurate data basis for subsequent optimization calculations.
- Step 140 Generate three-dimensional points based on the multiple images taken by the multi-camera shooting device, optimize and calculate the three-dimensional points according to the first geographical location and the posture transformation relationship, and obtain the optimized first posture of each camera.
- generating three-dimensional points based on multiple images taken by a multi-camera shooting device means that after the main camera and the slave camera take multiple images, three-dimensional reconstruction is performed based on the multiple images to generate three-dimensional points.
- aerial triangulation is performed based on the multiple images taken by the main camera and the slave cameras to generate three-dimensional points.
- the three-dimensional points are generated by extracting feature points from the multiple images; depending on the number of images, the generated three-dimensional points will be sparser or denser.
- the three-dimensional points are optimized and calculated according to the first geographic location and the posture transformation relationship to obtain the optimized first posture of each camera, which means that the data of the first geographic location and the posture transformation relationship are substituted into the optimization process as optimization parameters, and the three-dimensional points are optimized to obtain the optimized first posture.
- the method for optimizing the calculation of the three-dimensional points according to the first geographic location and the position transformation relationship can be the bundle adjustment method.
- the bundle adjustment method refers to extracting the optimal 3D model and camera parameters (intrinsic and extrinsic) from a visual reconstruction: after the camera poses and the feature point positions are optimally adjusted, the bundles of light rays emanating from each feature point converge at the camera optical centers. This process is abbreviated as BA.
- the first geographic location and the position transformation relationship can be substituted into the bundle adjustment method to achieve the optimization calculation of the three-dimensional points.
- the optimization of the three-dimensional points then no longer depends solely on the matching relationship of feature points across the multiple images, so more accurate optimization results can be obtained even when the terrain is complex and the feature point matches of the target images are inaccurate.
- the applicability and optimization accuracy of 3D reconstruction in complex terrain are improved.
- the pose transformation relationship between the main camera extrinsic parameters and the slave camera extrinsic parameters can be calculated by obtaining the main camera extrinsic parameters and the slave camera extrinsic parameters, and then the first geographic location of the main camera is obtained, and the three-dimensional point is optimized by the pose transformation relationship and the first geographic location.
- data acquisition is more convenient, and it is only necessary to install a sensor for obtaining the first geographic location on the main camera. Since the relative position between each camera does not change, the precise position of each camera can be calculated by the pose transformation relationship and the first geographic location obtained by the sensor on the main camera.
- the optimization process does not completely rely on the matching relationship between the feature points of the multiple target images.
- even when the matching relationship of the feature points is inaccurate, the first geographic location and the pose transformation relationship participate in the optimization calculation, so the accuracy of the three-dimensional points generated by the optimized three-dimensional reconstruction can be improved, thereby improving applicability.
- step 120 further includes:
- Step a01 Calculating the main camera extrinsic parameters and the slave camera extrinsic parameters according to a plurality of images taken by the main camera and the slave camera at the same track position;
- Step a02 Calculate the conversion relationship between the image taken by the main camera and the image taken by the slave camera according to the main camera extrinsic parameter and the slave camera extrinsic parameter, using the formula T01 = Tw0' * Tw1, where T01 is the conversion relationship, Tw0 is the master camera extrinsic parameter, Tw0' is the inverse matrix of Tw0, and Tw1 is the slave camera extrinsic parameter;
- Step a03 Determine the conversion relationship of the image as the posture transformation relationship.
- in step a01, the main camera extrinsic parameters and the slave camera extrinsic parameters are calculated based on multiple images taken by the main camera and the slave camera at the same track position. Specifically, aerial triangulation is performed on those images, and the main camera extrinsic parameters and the slave camera extrinsic parameters are obtained from the result of the aerial triangulation.
- the same trajectory position means that during UAV aerial survey, the flight route of the UAV can form a trajectory.
- a trajectory contains multiple trajectory positions, which can also be called trajectory points.
- each trajectory position has fixed coordinates in the world coordinate system, so calculating the main camera extrinsic parameters and the slave camera extrinsic parameters from multiple images taken at the same trajectory position can be understood as calculating them from multiple images taken by the main camera and the slave camera at the same position.
- Tw0 refers to the extrinsic parameter of the main camera in the world coordinate system, and Tw0' refers to the transformation from the main camera coordinate system to the world coordinate system.
- Tw0' is obtained by inverting the main camera extrinsic parameter Tw0, and it is used to compute the transformation between the main camera extrinsic parameter and the slave camera extrinsic parameter, that is, T01 = Tw0' * Tw1.
- the positions of the main camera and the slave camera in the same coordinate system can be converted to each other. Since the relative positions of multiple cameras remain basically unchanged, the position data of other cameras can be obtained based on the position data of any camera. The data obtained in this way will not be affected by the shooting conditions or image quality.
- the extrinsic parameters of any camera can be used as optimization data through the conversion relationship T01, which reduces the dependence on the feature point matching relationship. Accurate optimization results can be obtained in multiple images, and the optimization results will not be affected by inaccurate feature point matching due to poor image quality obtained by shooting, thereby improving applicability.
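In code, both directions of the conversion reduce to a matrix product. A hedged sketch, assuming 4x4 homogeneous extrinsic matrices:

```python
import numpy as np

def conversion_relationship(Tw0: np.ndarray, Tw1: np.ndarray) -> np.ndarray:
    """T01 = Tw0' * Tw1, where Tw0' is the inverse of the master extrinsic Tw0."""
    return np.linalg.inv(Tw0) @ Tw1

def slave_from_master(Tw0: np.ndarray, T01: np.ndarray) -> np.ndarray:
    """Rearranging the same identity, Tw1 = Tw0 * T01 recovers the slave extrinsic."""
    return Tw0 @ T01
```

Once T01 has been estimated at one trajectory position, the second function yields the slave extrinsic from the master extrinsic alone, independent of the current image quality.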
- generating three-dimensional points according to the multiple images captured by the multi-camera shooting device further includes:
- Step b01 extracting feature point information of each of the multiple images
- Step b02 Generate bag-of-words information according to the feature point information
- Step b03 performing matching calculation on at least two images having the same feature descriptor in the bag-of-words information to obtain a matching relationship between the two matching images;
- Step b04 calculating the relative conversion relationship between every two images in all images according to the matching relationship
- Step b05 Generate three-dimensional points based on the relative transformation relationship.
- the feature point information of each target image can be extracted by FAST feature point extraction: traverse each pixel in each target image, take the 16 surrounding pixels on a circle of radius 3 centered on the current pixel, and compare them in turn; if the grayscale difference is greater than a set threshold, the pixel is marked as a feature point.
- the set threshold can be set according to the actual situation, and the embodiment of the present application places no special restriction on this. ORB or SURF feature point extraction can also be chosen, as long as the feature point information of each target image can be extracted conveniently; the embodiment of the present application places no special restriction on this either. One possible FAST implementation is sketched below.
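One possible implementation (an assumption on our part, not something the patent mandates) is OpenCV's FAST detector, which exposes exactly the grayscale-difference threshold described above:

```python
import cv2

# Hypothetical input file; any grayscale aerial frame works here.
img = cv2.imread("aerial_frame.jpg", cv2.IMREAD_GRAYSCALE)

# FAST compares the 16 pixels on a radius-3 circle around each candidate pixel;
# `threshold` is the grayscale-difference threshold from the text.
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
keypoints = fast.detect(img, None)
print(f"detected {len(keypoints)} feature points")
```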
- the feature point information can be all the pixels in an area centered on the feature point or covering the feature point, or it can be parameter information of a single pixel or multiple pixels.
- the purpose is to enable subsequent word bag information to be generated based on the feature point information.
- the feature point information can be in various forms, as long as it can facilitate the generation of subsequent word bag information.
- the embodiments of the present application do not make special limitations on this.
- the bag-of-words information is generated from the feature point information, specifically by clustering the feature point information into words; the resulting words constitute the bag-of-words information.
- for example, if the feature point information consists of all pixels in multiple areas, and the areas contain lakes and grassland, then bag-of-words information containing words for lakes and grassland can be generated.
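A minimal sketch of the clustering step, assuming descriptors have already been extracted and using scikit-learn's KMeans as a stand-in for whatever clustering the implementation actually uses:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors: np.ndarray, n_words: int = 1000) -> KMeans:
    """Cluster feature descriptors; each cluster center is one visual word."""
    return KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(all_descriptors)

def bow_vector(vocab: KMeans, image_descriptors: np.ndarray) -> np.ndarray:
    """Normalized histogram of visual words: the image's bag-of-words vector."""
    words = vocab.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)
```

Images whose bag-of-words vectors share words (the "same feature descriptor" condition in step b03) become candidates for pairwise matching.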
- step b03 at least two target images with the same feature descriptor in the word bag information are matched and calculated to obtain the matching relationship between the two matched target images.
- the corresponding target images in the word bag information are matched according to the feature descriptor through loop detection.
- the feature descriptor refers to a descriptor (Descriptor), which is a data structure that describes features.
- the dimension of a descriptor can be multidimensional and is used to describe feature points.
- the acquisition method is as follows: take the image feature point as the center, take an S*S neighborhood window, randomly select a pair of points in the window, compare the pixel intensities of the two and perform a binary assignment, then continue to randomly select N pairs of points, repeating the binary assignment to form a binary code. This code is the description of the feature point, that is, the feature descriptor.
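Transcribing that procedure into Python gives a sketch like the following; the window size S, pair count N, and border handling are illustrative assumptions:

```python
import numpy as np

def binary_descriptor(gray: np.ndarray, kp_xy, S: int = 31, N: int = 256,
                      seed: int = 0) -> np.ndarray:
    """BRIEF-style code: compare N random pixel pairs inside an SxS window
    centered on the feature point and binarize each comparison."""
    x, y = int(kp_xy[0]), int(kp_xy[1])
    half = S // 2
    # assumes the keypoint lies at least S//2 pixels away from the image border
    patch = gray[y - half:y + half + 1, x - half:x + half + 1]
    rng = np.random.default_rng(seed)        # fixed seed: same pairs for all keypoints
    pairs = rng.integers(0, S, size=(N, 4))  # each row: (r1, c1, r2, c2)
    bits = patch[pairs[:, 0], pairs[:, 1]] > patch[pairs[:, 2], pairs[:, 3]]
    return bits.astype(np.uint8)             # N-bit binary description
```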
- step b03 after the matching relationship is obtained, further, in order to improve the accuracy of the matching relationship, the erroneous matching relationship can be filtered out by geometric filtering.
- step b04 the relative transformation relationship of the target image is calculated based on the matching relationship.
- the matching relationship obtained in step b03 is used to calculate the relative transformation relationship of the extrinsic parameters of each pair of matched target images, and then rotation averaging and translation averaging are performed.
- rotation averaging refers to estimating the absolute rotation of each camera given the relative rotation measurements.
- translation averaging refers to estimating the absolute position of each camera given the relative translation measurements.
- both the relative rotation measurements and the relative translation measurements can be obtained from the relative conversion relationships.
- the L2 norm can be used because, during iterative optimization, the L2 cost is a sum of squares, which solvers handle efficiently and converge on quickly.
- alternatively, the L1 norm can be used, because the L1 norm responds more stably to noise.
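To make the averaging step concrete, here is a hedged sketch of rotation averaging with scipy; the pair convention R_j ≈ R_ij · R_i is an assumption for illustration, and the `soft_l1` loss is one way to approximate the more noise-robust L1 behavior mentioned above:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def rotation_averaging(rel_rotations: dict, n_cams: int, robust: bool = False):
    """Estimate absolute rotations from relative measurements R_ij (3x3 matrices
    keyed by camera-index pairs (i, j)), assuming R_j ~= R_ij @ R_i."""
    def residuals(rvecs):
        R = [Rotation.from_rotvec(rvecs[3*i:3*i + 3]) for i in range(n_cams)]
        res = []
        for (i, j), Rij in rel_rotations.items():
            # zero when the predicted relative rotation matches the measurement
            err = (R[j].inv() * Rotation.from_matrix(Rij) * R[i]).as_rotvec()
            res.extend(err)
        return np.asarray(res)

    x0 = np.zeros(3 * n_cams)  # start at identity; the global gauge stays free
    sol = least_squares(residuals, x0, loss="soft_l1" if robust else "linear")
    return [Rotation.from_rotvec(sol.x[3*i:3*i + 3]).as_matrix()
            for i in range(n_cams)]
```

Translation averaging follows the same pattern, with relative translation residuals in place of the relative rotation residuals.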
- the optimizing calculation of the three-dimensional point according to the first geographical location and the posture transformation relationship further includes:
- Step b06 Determine the projection matrix used for optimization according to the formula Pi = k · [Rc | tc] · [Ri | ti], where Pi is the projection matrix, [Rc | tc] is the conversion relationship, [Ri | ti] is the first geographical location, k is the internal parameter of any camera in the master camera and the slave cameras, i is the sequence number of the first geographical location, and c is the sequence number of the camera;
- Step b07 Calculate the minimized reprojection error of the three-dimensional point according to the projection matrix, using the formula min Σo ||Pi · x − xo||², where x is the three-dimensional point, xo is the two-dimensional feature point obtained after reprojecting the three-dimensional point, and o is the sequence number of the three-dimensional point.
- in step b06, the conversion relationship [Rc | tc] is the posture transformation relationship obtained in the preceding steps.
- the camera intrinsic parameter k can be the main camera intrinsic parameter or a slave camera intrinsic parameter; different cameras can be optimized according to actual conditions, as long as an accurate projection matrix is obtained in the end.
- when the camera intrinsic parameter k is the main camera intrinsic parameter, the optimized first pose obtained from the optimization calculation is the first pose of the main camera.
- the reprojection error is minimized by the least squares method, that is, the distance between the three-dimensional point reprojected onto the two-dimensional image plane and the corresponding feature point is minimized.
- the Ceres solver can be used to iteratively solve for the optimal solution; different tools can also be used to assist the calculation according to actual conditions, and this application places no special restriction on this. A sketch of the residual being minimized follows.
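A numpy sketch of the projection matrix from step b06 and the per-point residual from step b07 (the text names Ceres for the iterative solve; this stand-in only evaluates the quantity being minimized, and the variable names reflect one plausible reading of the two factors):

```python
import numpy as np

def projection_matrix(k, Rc, tc, Ri, ti):
    """Pi = k · [Rc|tc] · [Ri|ti]; k is the 3x3 intrinsic matrix."""
    rig_to_cam = np.hstack([Rc, np.reshape(tc, (3, 1))])            # 3x4
    world_to_rig = np.vstack([np.hstack([Ri, np.reshape(ti, (3, 1))]),
                              [0.0, 0.0, 0.0, 1.0]])                # 4x4
    return k @ rig_to_cam @ world_to_rig                            # 3x4

def reprojection_error(P, x3d, x2d):
    """Pixel distance between the reprojected 3D point x and the feature xo."""
    xh = P @ np.append(x3d, 1.0)        # homogeneous projection
    return np.linalg.norm(xh[:2] / xh[2] - x2d)
```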
- the method further includes:
- Step d01 according to the reprojection error obtained after optimizing the three-dimensional points, eliminating the points whose pixel errors are greater than 4 pixels from the three-dimensional points;
- Step d02 Eliminate, from the three-dimensional points, points whose observation-point angles are less than 2 degrees;
- Step d03 globally optimizing the three-dimensional points.
- in step d01, the pixel error can be calculated from the reprojection error.
- the formula for calculating the reprojection error is Σo ||Pi · x − xo||², where Pi is the projection matrix, x is the three-dimensional point, xo is the two-dimensional feature point obtained by reprojecting the three-dimensional point, and o is the serial number of the three-dimensional point.
- the pixel error is the difference between the position of the 3D point projected onto the 2D plane and the position of the 2D feature point.
- the observation point refers to a 3D point generated from the multiple images captured by the multi-camera shooting device. If a 3D point can be observed by two cameras at the same time, the angle formed at the 3D point by the lines from it to the two camera centers is the observation-point angle. If the largest observation-point angle among all observation-point angles of the same observation point is less than 2 degrees, the observation point is eliminated.
- when the observation-point angle is less than 2 degrees, the viewing directions of the two cameras are nearly parallel, and the generated 3D points often have large errors.
- when the reprojection error is greater than 4 pixels, that is, when the difference between the 3D point projected onto the 2D plane and the 2D pixel position is greater than 4 pixels, the error of that 3D point can likewise be considered large. Therefore, by eliminating points with pixel errors greater than 4 pixels and points whose observation-point angle is less than 2 degrees, the accuracy of the remaining 3D points is higher and the global optimization of the 3D points works better; both rules are sketched below.
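A sketch of the two elimination rules, assuming the world-frame centers of the cameras observing each point are known:

```python
import numpy as np

def observation_angle_deg(X, c1, c2):
    """Angle at 3D point X between the rays toward camera centers c1 and c2."""
    r1, r2 = np.asarray(c1) - X, np.asarray(c2) - X
    cosang = r1 @ r2 / (np.linalg.norm(r1) * np.linalg.norm(r2))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def keep_point(reproj_err_px, X, cam_centers, max_err=4.0, min_angle=2.0):
    """Rule d01: drop if the pixel error exceeds 4 pixels.
    Rule d02: drop if the largest observation-point angle is under 2 degrees.
    Assumes at least two observing cameras."""
    if reproj_err_px > max_err:
        return False
    largest = max(observation_angle_deg(X, c1, c2)
                  for a, c1 in enumerate(cam_centers)
                  for c2 in cam_centers[a + 1:])
    return largest >= min_angle
```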
- the camera pose estimation method further includes:
- Step e01 Calculate the second geographical location of the slave camera according to the first geographical location of the master camera and the posture transformation relationship;
- Step e02 Optimizing and calculating the three-dimensional points according to the second geographical location and the posture transformation relationship to obtain each second posture optimized from the camera.
- calculating the second geographical location of the slave camera based on the first geographical location of the main camera and the posture transformation relationship means that the second geographical location is obtained by combining the first geographical location with the calculated posture transformation relationship; the second geographical location refers to the position data of the slave camera in the world coordinate system.
- in step e02, the three-dimensional points are optimized according to the second geographic location and the posture transformation relationship: the second geographic location and the posture transformation relationship are substituted into the optimization as parameters, the three-dimensional points are optimized, and the optimized second posture of each slave camera is obtained.
- in this way, the optimization of the three-dimensional points does not depend solely on the matching relationship of the feature points of the multiple images, and more accurate optimization results can be obtained when the feature point matches are inaccurate over complex terrain, thereby improving the applicability and the effect of three-dimensional reconstruction optimization.
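Under the pose conventions assumed in the earlier sketches (Tw0 and Tw1 as 4x4 camera-to-world poses, so a pose's translation column is the camera position), step e01 reduces to one matrix product:

```python
import numpy as np

def slave_geolocation(Tw0: np.ndarray, T01: np.ndarray) -> np.ndarray:
    """Second geographic location of a slave camera: compose the master pose
    with the slave's pose transformation relationship, then read the center."""
    Tw1 = Tw0 @ T01      # since T01 = inv(Tw0) @ Tw1
    return Tw1[:3, 3]    # translation column: camera position in the world frame
```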
- the method further includes:
- Step f01 Obtain the historical pose transformation relationship of each slave camera relative to the master camera that was calculated from the first extrinsic parameter and the second extrinsic parameter during the last operation of the multi-camera shooting device, use the historical pose transformation relationship as the pose transformation relationship, and jump to the step of obtaining the first geographic location of the master camera through the sensor.
- step f01 since each camera in the multi-camera shooting device is rigidly connected, the relative position relationship of each camera will not change, so the historical posture transformation relationship obtained in the previous operation can be read multiple times and reused.
- the data obtained in this aerial photography can also be saved for subsequent use.
- the data can be saved as a json format file, and can also be saved as other types of data according to actual conditions. It is only necessary to ensure that the data can be easily read and used repeatedly, and the embodiment of the present application does not make any special limitations on this.
- each operation after the first operation can directly use the fixed and unchanging data calculated in the previous operation, which simplifies the operation process and improves the calculation efficiency.
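A minimal sketch of persisting the historical pose transformation relationships as JSON for reuse on later flights; the file name and dictionary layout are illustrative:

```python
import json
import numpy as np

def save_pose_transforms(path: str, transforms: dict) -> None:
    """Store each slave camera's 4x4 pose transformation relationship by name."""
    with open(path, "w") as f:
        json.dump({name: T.tolist() for name, T in transforms.items()}, f, indent=2)

def load_pose_transforms(path: str) -> dict:
    """Reload the matrices saved during a previous operation."""
    with open(path) as f:
        return {name: np.asarray(T) for name, T in json.load(f).items()}

# e.g. save_pose_transforms("rig_calibration.json", {"slave_1": T01})
```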
- Fig. 2 shows a functional block diagram of a camera pose estimation device 200 according to an embodiment of the present invention.
- the device comprises: a first acquisition module 210 , a first calculation module 220 , a second acquisition module 230 and a second calculation module 240 .
- a first acquisition module 210 is used to acquire external parameters of each camera in a multi-camera shooting device including at least two cameras, wherein the relative position relationship between each camera is fixed, and the at least two cameras include a master camera and one or more slave cameras;
- a first calculation module 220 is used to determine a first extrinsic parameter of the master camera and a second extrinsic parameter of each of the slave cameras from the extrinsic parameters, and calculate a posture transformation relationship of each of the slave cameras relative to the master camera according to the first extrinsic parameter and the second extrinsic parameter;
- a second acquisition module 230 configured to acquire a first geographical location of the main camera through a sensor
- the second calculation module 240 is used to generate three-dimensional points according to the multiple images taken by the multi-camera shooting device, and optimize the three-dimensional points according to the first geographical location and the posture transformation relationship to obtain the optimized first posture of each camera.
- the first calculation module 220 further includes:
- a first calculation unit configured to calculate the main camera extrinsic parameters and the slave camera extrinsic parameters according to a plurality of images taken by the main camera and the slave camera at the same track position;
- the third computing unit is used to determine the conversion relationship of the image as the posture transformation relationship.
- the second calculation module 240 further includes:
- a fourth computing unit configured to extract feature point information of each of the plurality of images
- a fifth computing unit configured to generate bag-of-words information according to the feature point information
- a sixth calculation unit configured to perform a matching calculation on at least two images having the same feature descriptor in the bag-of-words information to obtain a matching relationship between the two matching images
- a seventh calculation unit configured to calculate a relative transformation relationship between every two images in all images according to the matching relationship
- An eighth calculation unit is used to generate the three-dimensional point according to the relative transformation relationship.
- the second calculation module 240 further includes:
- a tenth calculation unit is used to calculate the minimized reprojection error of the three-dimensional point according to the projection matrix, using the formula min Σo ||Pi · x − xo||², where x is the three-dimensional point, xo is a two-dimensional feature point obtained by reprojecting the three-dimensional point, and o is the sequence number of the three-dimensional point.
- the camera pose estimation apparatus 200 further includes:
- a first elimination module is used to eliminate points whose pixel errors are greater than 4 pixels from the three-dimensional points according to the reprojection errors obtained after optimizing and calculating the three-dimensional points;
- a second elimination module is used to eliminate points whose angles between observation points and the three-dimensional points are less than 2 degrees;
- the optimization module is used to perform global optimization on the three-dimensional points.
- the camera pose estimation apparatus 200 further includes:
- a third calculation module configured to calculate a second geographical location of the slave camera according to the first geographical location of the master camera and the posture transformation relationship;
- the fourth calculation module is used to optimize the calculation of the three-dimensional point according to the second geographical location and the posture transformation relationship to obtain the second posture optimized from each camera.
- the camera pose estimation apparatus 200 further includes:
- a fifth calculation module configured to calculate a second geographical location of the slave camera according to a first geographical location and a position transformation relationship of the master camera
- the sixth calculation module is used to optimize the calculation of the three-dimensional point according to the second geographical location and the posture transformation relationship to obtain the second posture optimized from each camera.
- the camera pose estimation apparatus 200 further includes:
- the third acquisition module is used to obtain the historical posture transformation relationship of each slave camera relative to the main camera calculated according to the first external parameter and the second external parameter during the last operation of the multi-camera shooting device, use the historical posture transformation relationship as the posture transformation relationship, and jump to the step of obtaining the first geographic location of the main camera through the sensor.
- FIG3 shows a schematic structural diagram of a camera pose estimation device according to an embodiment of the present invention.
- the specific embodiment of the present invention does not limit the specific implementation of the camera pose estimation device.
- the camera pose estimation device may include: a processor 302 , a memory 306 , a communication interface 304 , and a communication bus 308 .
- the processor 302, the memory 306 and the communication interface 304 communicate with each other via the communication bus 308.
- the memory 306 is used to store at least one program 310 , and the program 310 enables the processor 302 to execute the relevant steps in the above-mentioned camera pose estimation method embodiment.
- An embodiment of the present invention further provides a computer-readable storage medium, in which at least one program is stored.
- when the program runs on a camera pose estimation device, the camera pose estimation device can execute the camera pose estimation method in any of the above method embodiments.
- modules in the devices in the embodiments may be adaptively changed and arranged in one or more devices different from the embodiments.
- the modules or units or components in the embodiments may be combined into one module or unit or component, and may be divided into a plurality of submodules or subunits or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless otherwise expressly stated, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Abstract
A camera pose estimation method and apparatus, and a computer-readable storage medium. The camera pose estimation method comprises: acquiring extrinsic parameters of each camera in a multi-camera photographic apparatus that comprises at least two cameras (110); determining, from the extrinsic parameters, a first extrinsic parameter of a master camera and a second extrinsic parameter of each slave camera, and according to the first extrinsic parameter and the second extrinsic parameter, calculating a pose transformation relationship of each slave camera relative to the master camera (120); acquiring a first geographical location of the master camera by means of a sensor (130); and generating a three-dimensional point according to a plurality of images captured by the multi-camera photographic apparatus, and performing optimization calculation on the three-dimensional point according to the first geographical location and the pose transformation relationship, so as to obtain an optimized first pose of each camera (140). By means of the method, the embodiments of the present application can solve the problem of an optimization effect being poor due to a feature point matching effect being relatively poor when a captured image has relatively poor ground texture, and the embodiments of the present application have relatively low dependence on feature point matching, and can improve the optimization effect when the ground texture is relatively poor.
Description
The embodiments of the present invention relate to the field of unmanned aerial survey technology, and specifically to a camera pose estimation method, device, equipment and computer-readable storage medium.

At present, with the continuous advancement of science and technology, aerial photogrammetry technology can produce topographic maps and image maps of various scales, and can also establish terrain databases to provide basic data for various geographic information systems and land information systems, so that people can plan and manage land more finely. The precise geographic information data provided by aerial photogrammetry technology can also be used to build high-precision maps, which brings convenience to people's travel positioning. Aerial photogrammetry often uses aerial triangulation, that is, using continuously captured aerial photos with a certain overlap, based on a small number of field control points, to establish a route model or regional network model corresponding to the field by photogrammetric methods, so as to obtain the plane coordinates and elevations of the densified points. UAV aerial survey technology is a powerful supplement to traditional aerial photogrammetry methods. It is flexible, efficient, fast, fine and accurate, has low operating costs and a wide application range. Using UAV aerial survey technology, a UAV carrying multiple cameras for aerial photography can effectively capture orthographic and oblique images; the combined frame of multiple cameras is large, and a high-altitude operation can complete the flight mission in a single pass.

With existing drone aerial survey technology, when a drone is equipped with multiple cameras for aerial photography, the matching effect of feature points is often poor when encountering ground with poor texture characteristics, such as large forests or large areas of water.

Summary of the invention

In view of the above problems, an embodiment of the present invention provides a camera pose estimation method to solve the problems existing in the prior art.

According to one aspect of an embodiment of the present invention, a camera pose estimation method is provided, comprising:

acquiring external parameters of each camera in a multi-camera shooting device including at least two cameras, wherein the relative position relationship between the cameras is fixed, and the at least two cameras include a master camera and one or more slave cameras;

determining a first extrinsic parameter of the master camera and a second extrinsic parameter of each of the slave cameras from the extrinsic parameters, and calculating a pose transformation relationship of each of the slave cameras relative to the master camera according to the first extrinsic parameter and the second extrinsic parameter;

acquiring a first geographic location of the master camera through a sensor;

generating three-dimensional points according to the multiple images taken by the multi-camera shooting device, and optimizing the three-dimensional points according to the first geographical location and the pose transformation relationship to obtain the optimized first pose of each camera.

In an optional manner, determining the first extrinsic parameter of the master camera and the second extrinsic parameter of each of the slave cameras from the extrinsic parameters, and calculating the pose transformation relationship of each of the slave cameras relative to the master camera according to the first extrinsic parameter and the second extrinsic parameter, further includes:

calculating the main camera extrinsic parameter and the slave camera extrinsic parameter according to a plurality of images taken by the main camera and the slave camera at the same track position;
calculating, according to the main camera extrinsic parameter and the slave camera extrinsic parameter, the conversion relationship between the image taken by the main camera and the image taken by the slave camera, the calculation formula being:

T01 = Tw0' * Tw1

wherein T01 is the conversion relationship, Tw0 is the master camera extrinsic parameter, Tw0' is the inverse matrix of Tw0, and Tw1 is the slave camera extrinsic parameter;
determining the conversion relationship of the image as the pose transformation relationship.

In an optional manner, generating three-dimensional points according to the multiple images captured by the multi-camera shooting device further includes:

extracting feature point information of each of the multiple images;

generating bag-of-words information according to the feature point information;

performing matching calculation on at least two images having the same feature descriptor in the bag-of-words information to obtain a matching relationship between the two matching images;

calculating the relative transformation relationship between every two images in all images according to the matching relationship;

generating the three-dimensional points according to the relative transformation relationship.

In an optional manner, optimizing the three-dimensional points according to the first geographical location and the pose transformation relationship further includes:
determining the projection matrix used for optimization according to the following formula:

Pi = k · [Rc | tc] · [Ri | ti]

wherein Pi is the projection matrix, [Rc | tc] is the pose transformation relationship, [Ri | ti] is the first geographical location, k is the internal parameter of any camera of the master camera and the slave cameras, i is the sequence number of the first geographical location, and c is the sequence number of the pose transformation relationship;

calculating the minimized reprojection error of the three-dimensional point according to the projection matrix, the formula being:

min Σo ||Pi · x − xo||²

wherein x is the three-dimensional point, xo is the two-dimensional feature point obtained after reprojecting the three-dimensional point, and o is the serial number of the three-dimensional point.
In an optional manner, after optimizing the three-dimensional points according to the first geographical location and the pose transformation relationship, the camera pose estimation method further includes:

eliminating, from the three-dimensional points, points whose pixel errors are greater than 4 pixels according to the reprojection error obtained after the optimization calculation;

eliminating, from the three-dimensional points, points whose observation-point angles are less than 2 degrees;

globally optimizing the three-dimensional points.

In an optional manner, the camera pose estimation method further includes:

calculating the second geographical location of the slave camera according to the first geographical location of the master camera and the pose transformation relationship;

optimizing the three-dimensional points according to the second geographical location and the pose transformation relationship to obtain the optimized second pose of each slave camera.
In an optional manner, after the obtaining extrinsic parameters of each camera in a multi-camera shooting device including at least two cameras, the method further includes:
obtaining the pose transformation relationship of each slave camera relative to the master camera that was calculated, according to the first extrinsic parameter and the second extrinsic parameter, during the previous operation of the multi-camera shooting device.
According to another aspect of the embodiments of the present invention, a camera pose estimation apparatus is provided, including:
a first acquisition module, configured to obtain extrinsic parameters of each camera in a multi-camera shooting device including at least two cameras, where the relative positional relationship between the cameras is fixed, and the at least two cameras include one master camera and one or more slave cameras;
a first calculation module, configured to determine a first extrinsic parameter of the master camera and a second extrinsic parameter of each slave camera from the extrinsic parameters, and calculate a pose transformation relationship of each slave camera relative to the master camera according to the first extrinsic parameter and the second extrinsic parameter;
a second acquisition module, configured to obtain a first geographical location of the master camera through a sensor; and
a second calculation module, configured to generate three-dimensional points according to the multiple images captured by the multi-camera shooting device, and perform an optimizing calculation on the three-dimensional points according to the first geographical location and the pose transformation relationship, to obtain an optimized first pose of each camera.
According to another aspect of the embodiments of the present invention, a camera pose estimation device is provided, including:
a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another via the communication bus; and
the memory is configured to store at least one program, and the program causes the processor to perform the operations of the camera pose estimation method described above.
According to yet another aspect of the embodiments of the present invention, a computer-readable storage medium is provided. The storage medium stores at least one program, and the program causes a camera pose estimation device to perform the operations corresponding to the method described above.
According to the camera pose estimation method, apparatus, device, and computer-readable storage medium of the embodiments of the present invention, the master camera extrinsic parameters and the slave camera extrinsic parameters are obtained, the pose transformation relationship of the slave camera extrinsic parameters relative to the master camera extrinsic parameters is calculated, the first geographical location of the master camera is obtained, and the three-dimensional points generated from the multiple target images captured by the master camera and the slave cameras are then optimized according to the first geographical location and the pose transformation relationship. Because the first geographical location of the master camera is its actual location, and the actual positional relationship between the master camera and each slave camera is fixed, obtaining the pose transformation relationship of a slave camera relative to the master camera is equivalent to obtaining the actual location of that slave camera. Optimizing the three-dimensional points based on the actual camera locations solves the problem that, when the ground texture in the captured images is poor, weak feature point matching leads to a poor optimization result. The method depends less on feature point matching, improves the optimization result when the ground texture is poor, improves the accuracy and applicability of the optimization, and thereby improves the accuracy of three-dimensional reconstruction.
The above description is merely an overview of the technical solutions of the embodiments of the present invention. To enable a clearer understanding of the technical means of the embodiments of the present invention so that they can be implemented according to the contents of the specification, and to make the above and other objects, features, and advantages of the embodiments more apparent, specific implementations of the present invention are set forth below.
The accompanying drawings are used only to illustrate the embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 is a schematic flowchart of a camera pose estimation method provided by an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a camera pose estimation apparatus provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a camera pose estimation device provided by an embodiment of the present invention.
Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the present invention can be implemented in various forms and should not be limited to the embodiments set forth herein.
Regarding the now widely used three-dimensional reconstruction by UAV aerial survey, the inventors have observed that when existing UAV aerial surveys perform three-dimensional reconstruction through aerial triangulation, feature points are extracted from and matched across the images captured by each camera on the UAV, and only the matching information is used to perform bundle adjustment (BA) optimization on the generated three-dimensional points. Because of this strong dependence on matching information, problems such as insufficient stability, degraded feature point matching, and abnormal camera position solutions occur when the ground texture is poor. To improve the stability of three-dimensional reconstruction by UAV aerial survey and expand its applicability, it is particularly important to develop a camera pose estimation method that is more accurate, less dependent on matching relationships, and more widely applicable to different terrain scenes.
To solve the above problems, the inventors of the present application designed a camera pose estimation method: the master camera extrinsic parameters and the slave camera extrinsic parameters are obtained, the pose transformation relationship between them is calculated, the first geographical location of the master camera is obtained, and the three-dimensional points generated by three-dimensional reconstruction are then optimized according to the first geographical location and the pose transformation relationship. This reduces the dependence of the optimization process on feature point matching relationships, so that in complex terrain the accuracy of the optimized three-dimensional points is less likely to be affected by inaccurate matching relationships, improving the accuracy of three-dimensional reconstruction by UAV aerial survey.
FIG. 1 shows a flowchart of a camera pose estimation method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step 110: Obtain extrinsic parameters of each camera in a multi-camera shooting device including at least two cameras, where the relative positional relationship between the cameras is fixed, and the at least two cameras include one master camera and one or more slave cameras.
In this step, the master camera is the camera that serves as the orthographic lens in the multi-camera shooting device, that is, the camera shooting directly toward the target. In three-dimensional reconstruction, to capture images of the same position from different angles so that feature points can be extracted to generate a three-dimensional model, the shooting directions of the cameras are usually different. There is generally only one master camera, and the remaining cameras are slave cameras. The embodiments of the present application are described using a multi-camera shooting device with one master camera and one or more slave cameras as an example.
In this embodiment, a geographical location acquisition device, such as any of various types of position sensors, needs to be installed on the master camera of the multi-camera shooting device. The other cameras serve as slave cameras. Based on the geographical location of the master camera and the fixed relative positional relationship between the other cameras and the master camera, the pose transformation relationship of each slave camera relative to the master camera can be obtained in subsequent steps, and the geographical location of each slave camera can be further obtained.
In this step, obtaining the extrinsic parameters of each camera in a multi-camera shooting device including at least two cameras means obtaining the extrinsic parameters of the master camera and of each slave camera. The extrinsic parameters of a camera are its parameters in the world coordinate system, including the rotation matrix R and the translation matrix T. The camera extrinsic parameters can be obtained in various ways. For example, an aerial triangulation calculation can be performed on multiple images captured by the master camera and the slave cameras at the same trajectory point to obtain the master camera extrinsic parameters and the slave camera extrinsic parameters; alternatively, they can be obtained by camera self-calibration. There are also various methods for camera self-calibration: the Tsai two-step method, Zhang's calibration method, an active system controlling the camera to perform specific motions, layered step-by-step calibration, or camera self-calibration based on the Kruppa equations. Different methods can be used to obtain the camera extrinsic parameters according to the actual situation, and the embodiments of the present application place no particular limitation on this.
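As a concrete illustration of one of the calibration options named above, the following Python sketch runs Zhang's checkerboard calibration with OpenCV, which returns the intrinsics together with a per-view rotation and translation. The board dimensions, square size, and function name are illustrative assumptions, not details taken from this text.

# Minimal sketch: Zhang's checkerboard calibration with OpenCV, one of the
# self-calibration options the text lists. Board size and square size are
# illustrative assumptions.
import cv2
import numpy as np

def calibrate(images, board=(9, 6), square=0.025):
    # One set of 3D corner coordinates on the planar board (z = 0)
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square
    obj_pts, img_pts = [], []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)
    # Returns intrinsics K, distortion, and per-view extrinsics (rvecs, tvecs)
    ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, gray.shape[::-1], None, None)
    return K, dist, rvecs, tvecs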
In this step, the cameras in the multi-camera shooting device are rigidly connected. If one or more cameras in the device were rotated to a different shooting angle, the orientation of those cameras would change and so would their poses. Because the relative positional relationship between the cameras is fixed, all cameras in the multi-camera shooting device move together: for example, all cameras rotate by the same angle at the same time, or the whole device moves by a certain distance. In such cases, even though the pose of each camera changes, the relative poses between the cameras remain unchanged.
Obtaining the extrinsic parameters of each camera in the multi-camera shooting device provides the data basis for the subsequent camera pose estimation calculations, and calculations based on the camera extrinsic parameters yield relatively accurate results.
Step 120: Determine a first extrinsic parameter of the master camera and a second extrinsic parameter of each slave camera from the extrinsic parameters, and calculate a pose transformation relationship of each slave camera relative to the master camera according to the first extrinsic parameter and the second extrinsic parameter.
In this step, the pose transformation relationship between the master camera extrinsic parameters and the slave camera extrinsic parameters is calculated from the extrinsic parameters obtained in step 110. The pose transformation relationship is a transformation formula; it may also be a parameter, or one or more calculation formulas. Its purpose is to allow the master camera extrinsic parameters and the slave camera extrinsic parameters to be converted into each other when combined with the pose transformation relationship. Different forms of the pose transformation relationship may be used in the calculation according to the actual situation, as long as the conversion between the master camera extrinsic parameters and the slave camera extrinsic parameters can be conveniently realized; the embodiments of the present application place no particular limitation on this.
Obtaining the pose transformation relationship between the master camera extrinsic parameters and the slave camera extrinsic parameters allows the fixed relative position data between the master camera and the slave cameras to be added to the optimization process in subsequent camera pose estimation. The camera pose estimation can then use the relative positions of the master camera and the slave cameras as a basis for optimization, rather than relying only on the matching relationships of image feature points, which improves the applicability and stability of the optimization and makes the camera pose estimation results more accurate.
Step 130: Obtain a first geographical location of the master camera through a sensor.
In this step, obtaining the first geographical location of the master camera means obtaining it directly through a sensor; the geographical location is the position data of the master camera in the world coordinate system.
In this step, the sensor may be a gyroscope or a GPS receiver, or another sensor. The first geographical location can be obtained through different sensors according to the actual situation; it is only necessary that the position data of the master camera in the world coordinate system can be conveniently obtained, either directly or through some calculation, and the embodiments of the present application place no particular limitation on this.
Obtaining the first geographical location of the master camera through a sensor makes data acquisition simple and convenient, and the obtained geographical location is the accurate actual position of the master camera, providing an accurate data basis for the subsequent optimization calculation.
Step 140: Generate three-dimensional points according to the multiple images captured by the multi-camera shooting device, and perform an optimizing calculation on the three-dimensional points according to the first geographical location and the pose transformation relationship, to obtain an optimized first pose of each camera.
In this step, generating three-dimensional points according to the multiple images captured by the multi-camera shooting device means that, after the master camera and the slave cameras capture multiple images, three-dimensional reconstruction is performed on those images to generate the three-dimensional points. In the embodiments of the present application, aerial triangulation is performed on the multiple images captured by the master camera and the slave cameras to generate the three-dimensional points.
Here, the three-dimensional points are generated by extracting feature points from the multiple images; depending on the number of images, the generated three-dimensional points will accordingly be sparser or denser.
In this step, performing the optimizing calculation on the three-dimensional points according to the first geographical location and the pose transformation relationship to obtain the optimized first pose of each camera means substituting the data of the first geographical location and the pose transformation relationship into the optimization process as optimization parameters, optimizing the three-dimensional points, and obtaining the optimized first pose.
In this step, the method of performing the optimizing calculation on the three-dimensional points according to the first geographical location and the pose transformation relationship may be bundle adjustment. Bundle adjustment refers to extracting the optimal 3D model and camera parameters (intrinsic and extrinsic) from a visual reconstruction: after the camera poses and the positions of the feature points are optimally adjusted, the bundles of light rays reflected from each feature point converge to the optical center. This process is referred to as BA for short. The first geographical location and the pose transformation relationship can be substituted into the bundle adjustment to realize the optimizing calculation of the three-dimensional points.
Generating the three-dimensional points from the multiple target images captured by the master camera and the slave cameras, and then optimizing the three-dimensional points according to the first geographical location and the pose transformation relationship, means that the optimization does not rely solely on the matching relationships of the feature points across the multiple images. Even in relatively complex terrain, where the feature point matching relationships of the target images are imprecise, relatively accurate optimization results can still be obtained; a sketch of such an optimization follows. Involving the first geographical location of the master camera and the pose transformation relationship in the optimization improves the applicability and accuracy of three-dimensional reconstruction in complex terrain.
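To make the idea concrete, the following Python sketch shows a bundle adjustment residual that combines reprojection terms for a master/slave pair with a prior tying the master camera to its sensed geographic position. This is a minimal sketch under stated assumptions, not the patent's own implementation: the pose parameterization, the unit weighting of the GPS term, and the single shared intrinsic matrix K are all illustrative simplifications.

# Minimal sketch of a bundle adjustment residual combining reprojection
# terms with the fixed master-to-slave transform [Rc|tc] and the master
# camera's sensed position. Assumptions: axis-angle pose parameterization,
# one shared K, unweighted GPS prior.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(K, R, t, X):
    """Project world points X (N,3) through extrinsic [R|t] and intrinsics K."""
    Xc = X @ R.T + t                    # world -> camera frame
    uv = Xc @ K.T
    return uv[:, :2] / uv[:, 2:3]       # perspective divide

def residuals(params, K, R_c, t_c, gps_pos, obs_master, obs_slave, n_pts):
    rvec, t0 = params[:3], params[3:6]          # master pose (axis-angle + t)
    X = params[6:].reshape(n_pts, 3)            # 3D points being optimized
    R0 = Rotation.from_rotvec(rvec).as_matrix()
    # The slave pose follows rigidly from the master via the fixed [Rc|tc]
    R1, t1 = R_c @ R0, R_c @ t0 + t_c
    r_m = (project(K, R0, t0, X) - obs_master).ravel()
    r_s = (project(K, R1, t1, X) - obs_slave).ravel()
    r_gps = (-R0.T @ t0) - gps_pos              # master camera centre vs. sensor
    return np.concatenate([r_m, r_s, r_gps])

# usage sketch:
# sol = least_squares(residuals, x0, args=(K, R_c, t_c, gps_pos, m2d, s2d, N))

Because the slave pose is derived from the master pose inside the residual, the GPS prior constrains both cameras at once, which is the reason weak feature matching degrades this formulation less.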
From the combination of steps 110, 120, 130, and 140 above, it can be seen that, according to the camera pose estimation method provided by the present application, the pose transformation relationship between the master camera extrinsic parameters and the slave camera extrinsic parameters is calculated from the obtained extrinsic parameters, the first geographical location of the master camera is then obtained, and the three-dimensional points are optimized using the pose transformation relationship and the first geographical location. In this way, data acquisition is more convenient: a sensor for obtaining the first geographical location only needs to be installed on the master camera. Because the relative positions of the cameras do not change, the precise position of each camera can be calculated from the pose transformation relationship and the first geographical location obtained by the sensor on the master camera, without installing additional sensors on every camera, which reduces equipment cost. Furthermore, by involving in the optimization calculation the first geographical location of the master camera and the pose transformation relationship, which reflects the relative positions of the master camera and the slave cameras, the optimization process does not rely entirely on the matching relationships of the feature points of the multiple target images. When performing three-dimensional reconstruction of complex terrain with imprecise feature point matching, involving the first geographical location and the pose transformation relationship in the optimization calculation improves the accuracy of the generated three-dimensional points and improves applicability.
In one embodiment of the present invention, step 120 further includes:
Step a01: calculating the master camera extrinsic parameters and the slave camera extrinsic parameters according to multiple images captured by the master camera and the slave camera at the same trajectory position;
Step a02: calculating, according to the master camera extrinsic parameters and the slave camera extrinsic parameters, the conversion relationship from the image captured by the master camera to the image captured by the slave camera, using the formula:
T01 = Tw0'*Tw1
where T01 is the conversion relationship, Tw0 is the master camera extrinsic parameter, Tw0' is the inverse matrix of Tw0, and Tw1 is the slave camera extrinsic parameter;
Step a03: determining the image conversion relationship as the pose transformation relationship.
In step a01, the master camera extrinsic parameters and the slave camera extrinsic parameters are calculated from multiple images captured by the master camera and the slave camera at the same trajectory position. Specifically, aerial triangulation is performed on the multiple images captured by the master camera and the slave camera at the same trajectory position, and the master camera extrinsic parameters and the slave camera extrinsic parameters are obtained from the aerial triangulation results.
Here, the same trajectory position means the following: during a UAV aerial survey, the UAV's flight route forms a trajectory, and one trajectory contains multiple trajectory positions, which may also be called trajectory points; each trajectory position has fixed coordinates in the world coordinate system. Thus, calculating the master camera extrinsic parameters and the slave camera extrinsic parameters from multiple images captured by the master camera and the slave camera at the same trajectory position can be understood as calculating them from multiple images captured by the master camera and the slave camera at the same position.
In step a02, Tw0 is the extrinsic parameter of the master camera in the world coordinate system, and Tw0' is the transformation from the master camera coordinate system to the world coordinate system. Tw0' can be obtained by taking the inverse matrix of the master camera extrinsic parameter Tw0, and is used to calculate the conversion relationship between the master camera extrinsic parameters and the slave camera extrinsic parameters, namely Tw0'*Tw1.
Obtaining the conversion relationship T01 between the master camera extrinsic parameters and the slave camera extrinsic parameters allows the positions of the master camera and the slave camera in the same coordinate system to be converted into each other. Because the relative positions of the cameras remain essentially unchanged, the position data of any one camera can be used to derive the position data of the other cameras. Data obtained in this way is not affected by shooting conditions or image quality. In subsequent optimization, the extrinsic parameters of any camera can serve as optimization data through the conversion relationship T01, which reduces the dependence on feature point matching relationships. Accurate optimization results can be obtained for a wide variety of images, and the optimization results are not compromised by inaccurate feature point matching caused by poor image quality, which improves applicability.
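The relation T01 = Tw0'*Tw1 can be computed directly on homogeneous extrinsic matrices; the sketch below is a minimal example under that assumption (the 4x4 [R|t; 0 1] layout and the function names are illustrative, not taken from the text).

# Minimal sketch: derive the fixed master-to-slave transform T01 from the
# two extrinsics, then recover the slave extrinsic from any later master
# extrinsic. Assumes 4x4 homogeneous [R|t; 0 1] matrices.
import numpy as np

def relative_transform(Tw0: np.ndarray, Tw1: np.ndarray) -> np.ndarray:
    """T01 = Tw0^-1 * Tw1: the slave extrinsic expressed in the master's frame."""
    return np.linalg.inv(Tw0) @ Tw1

def slave_from_master(Tw0_new: np.ndarray, T01: np.ndarray) -> np.ndarray:
    """Because the rig is rigid, T01 is constant, so Tw1_new = Tw0_new * T01."""
    return Tw0_new @ T01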
In one embodiment of the present invention, generating the three-dimensional points according to the multiple images captured by the multi-camera shooting device further includes:
Step b01: extracting feature point information of each of the multiple images;
Step b02: generating bag-of-words information according to the feature point information;
Step b03: performing a matching calculation on at least two images that have the same feature descriptors in the bag-of-words information, to obtain a matching relationship between the two matched images;
Step b04: calculating, according to the matching relationship, the relative transformation relationship between every two images among all the images;
Step b05: generating the three-dimensional points according to the relative transformation relationships.
In step b01, the feature point information of each target image can be extracted using FAST feature point extraction: each pixel of the target image is traversed, 16 surrounding pixels are selected on a circle of radius 3 centered on the current pixel and compared in turn, and a pixel is marked as a feature point if the grayscale difference exceeds a set threshold. The threshold can be set according to the actual situation, and the embodiments of the present application place no particular limitation on this. Methods such as ORB feature extraction or SURF feature extraction may also be chosen; it is only necessary that the feature point information of each target image can be conveniently extracted, and the embodiments of the present application place no particular limitation on this.
The feature point information may be all pixels in a region centered on, or covering, a feature point, or it may be parameter information of a single pixel or multiple pixels. The purpose is to allow the subsequent bag-of-words information to be generated from the feature point information. The feature point information may take various forms according to the actual situation, as long as it facilitates the generation of the subsequent bag-of-words information; the embodiments of the present application place no particular limitation on this.
In step b02, generating bag-of-words information from the feature point information specifically means producing words by clustering the feature point information; these words constitute the bag-of-words information. For example, if the feature point information consists of all the pixels in several regions, and each region contains lakes and grassland, then bag-of-words information containing "lake" and "grassland" can be generated correspondingly.
In step b03, performing a matching calculation on at least two target images that have the same feature descriptors in the bag-of-words information to obtain the matching relationship between the two matched target images specifically means matching the corresponding target images in the bag-of-words information according to their feature descriptors through loop closure detection. A feature descriptor is a data structure that characterizes a feature; a descriptor may have multiple dimensions and is used to describe a feature point. It is obtained as follows: taking an image feature point as the center, an S*S neighborhood window is selected; a pair of points is chosen at random within the window, the pixel values of the two points are compared, and a binary value is assigned; N pairs of points are then selected at random and the binary assignment is repeated, forming a binary code. This code is the description of the feature point, that is, the feature descriptor.
In step b03, after the matching relationships are obtained, the mismatched relationships among them can further be filtered out by geometric filtering to improve the accuracy of the matching relationships.
In step b04, calculating the relative transformation relationships of the target images according to the matching relationships specifically means using the matching relationships obtained in step b03 to calculate the relative transformation relationship of the extrinsic parameters of each pair of matched target images, and then performing rotation averaging and translation averaging. Rotation averaging estimates the absolute rotations of the cameras given relative rotation measurements, and translation averaging estimates the absolute positions of the cameras given relative translation measurements; both the relative rotation measurements and the relative translation measurements can be obtained from the relative transformation relationships. For rotation averaging, the L2 norm can be used, because the L2 norm is a sum of squares, which the optimization code solves efficiently during its iterations, so convergence is fast. For translation averaging, the L1 norm can be used, because the L1 norm responds to noise more stably. A sketch of the matching and geometric-filtering part of this pipeline follows.
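The sketch below illustrates the extract-describe-match-filter portion of steps b01 to b04 using OpenCV's ORB (one of the options the text names). The pairwise brute-force matcher and RANSAC-based fundamental matrix filtering are illustrative assumptions standing in for the bag-of-words index, which in a full system would be used first to select candidate image pairs.

# Minimal sketch: ORB features, brute-force descriptor matching, and
# geometric filtering of mismatches with a RANSAC fundamental matrix.
import cv2
import numpy as np

def match_pair(img0, img1, max_features=2000):
    orb = cv2.ORB_create(nfeatures=max_features)
    kp0, des0 = orb.detectAndCompute(img0, None)
    kp1, des1 = orb.detectAndCompute(img1, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des0, des1)
    pts0 = np.float32([kp0[m.queryIdx].pt for m in matches])
    pts1 = np.float32([kp1[m.trainIdx].pt for m in matches])
    # Geometric filtering: keep only matches consistent with epipolar geometry
    F, inlier_mask = cv2.findFundamentalMat(pts0, pts1, cv2.FM_RANSAC, 3.0)
    keep = inlier_mask.ravel() == 1
    return pts0[keep], pts1[keep]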
In one embodiment of the present invention, the optimizing calculation of the three-dimensional points according to the first geographical location and the pose transformation relationship further includes:
Step b06: determining the projection matrix used for optimization according to the following formula:
Pi = k·[Rc|tc]·[Ri|ti]
where Pi is the projection matrix, [Rc|tc] is the conversion relationship, [Ri|ti] is the first geographical location, k is the intrinsic parameter of any one of the master camera and the slave cameras, i is the index of the first geographical location, and c is the index of the camera;
Step b07: calculating the minimized reprojection error of the three-dimensional points according to the projection matrix, with the formula:
min Σo ||Pi·x − xo||²
where x is a three-dimensional point, xo is the two-dimensional feature point obtained by reprojecting the three-dimensional point, and o is the index of the three-dimensional point.
In step b06, the conversion relationship [Rc|tc] is multiplied by the first geographical location [Ri|ti], and the camera intrinsic parameter k is included in the calculation to obtain the projection matrix Pi, which expresses the conversion from three-dimensional points to two-dimensional points and provides the data basis for subsequent calculations.
Here, the camera intrinsic parameter k may be the master camera intrinsic parameter or a slave camera intrinsic parameter, and the optimization can target different cameras according to the actual situation, as long as an accurate projection matrix is ultimately obtained. For example, when the optimization targets the master camera, the camera intrinsic parameter k is the master camera intrinsic parameter, and the optimized first pose obtained after the optimizing calculation is the first pose of the master camera.
Here, k is the intrinsic parameter of any one of the master camera and the slave cameras; if k is the master camera intrinsic parameter, then [Rc|tc] is the identity matrix.
In step b07, the reprojection error is calculated by least squares, that is, the distance from the reprojection of each three-dimensional point to its observation on the two-dimensional image plane is minimized. When calculating the reprojection error, the Ceres solver can be used to iteratively find the optimal solution; different tools may also be used to assist the calculation according to the actual situation, and the present application places no particular limitation on this.
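The projection of step b06 and the per-point pixel error used in step b07 can be written in a few lines. The sketch below assumes 4x4 homogeneous pose matrices and a 3x3 intrinsic matrix K (an illustrative layout, not mandated by the text).

# Minimal sketch of Pi = K·[Rc|tc]·[Ri|ti] and the pixel error |Pi·x - xo|.
# Assumes [Rc|tc] and [Ri|ti] are given as 4x4 homogeneous matrices.
import numpy as np

def projection_matrix(K, T_c, T_i):
    """Compose the 3x4 projection matrix Pi = K [Rc|tc][Ri|ti]."""
    return K @ (T_c @ T_i)[:3, :]

def pixel_error(P, X, x_obs):
    """Reprojection (pixel) error for one 3D point X and its observed 2D match."""
    uvw = P @ np.append(X, 1.0)       # homogeneous projection
    uv = uvw[:2] / uvw[2]             # perspective divide
    return np.linalg.norm(uv - x_obs)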
In one embodiment of the present invention, after the optimizing calculation of the three-dimensional points according to the first geographical location and the pose transformation relationship, the method further includes:
Step d01: according to the reprojection errors obtained after the optimizing calculation of the three-dimensional points, removing, from the three-dimensional points, points whose pixel error is greater than 4 pixels;
Step d02: removing, from the three-dimensional points, points whose observation angle is less than 2 degrees;
Step d03: performing global optimization on the three-dimensional points.
In step d01, the pixel error can be calculated from the reprojection error. The formula for calculating the reprojection error is:
min Σo ||Pi·x − xo||²
where x is a three-dimensional point, xo is the two-dimensional feature point obtained by reprojecting the three-dimensional point, and o is the index of the three-dimensional point. From the reprojection error formula, the formula for calculating the pixel error is:
|Pi·x − xo|
where Pi is the projection matrix, x is the three-dimensional point, and xo is the two-dimensional feature point obtained by reprojecting the three-dimensional point. The pixel error is the difference between the position of the three-dimensional point projected onto the two-dimensional plane and the position of the corresponding two-dimensional feature point.
In step d02, an observation point is a three-dimensional point generated from the multiple images captured by the multi-camera shooting device. If a three-dimensional point can be observed by two cameras simultaneously, the angle formed by the lines from the three-dimensional point to the two cameras is the observation angle. If the largest of all the observation angles of the same observation point is less than 2 degrees, that observation point is removed.
When the observation angle is less than 2 degrees, the angle between the two cameras' viewing rays can be considered extremely small, and three-dimensional points generated in this situation tend to have large errors. Likewise, when the reprojection error is greater than 4 pixels, that is, when the difference between a three-dimensional point's projection onto the two-dimensional plane and the position of the two-dimensional pixel is greater than 4 pixels, the three-dimensional point can also be considered to have a large error. Therefore, by removing points whose pixel error is greater than 4 pixels and points whose observation angle is less than 2 degrees, the remaining three-dimensional points are more accurate, and the subsequent global optimization of the three-dimensional points yields better results.
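Both culling rules have simple closed forms; the sketch below applies them given per-observation pixel errors and the centers of the observing cameras. The thresholds follow the text, but the array layout and the choice to reject a point if any of its observations exceeds the pixel threshold are illustrative assumptions.

# Minimal sketch of the two culling rules: pixel error > 4 px, and maximum
# observation (parallax) angle < 2 degrees across all camera pairs.
import numpy as np

def keep_point(X, cam_centers, pixel_errors, max_err_px=4.0, min_angle_deg=2.0):
    if np.any(np.asarray(pixel_errors) > max_err_px):
        return False                              # assumption: any bad view rejects
    rays = np.asarray(cam_centers) - X            # rays from point to each camera
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)
    widest = 0.0
    for a in range(len(rays)):
        for b in range(a + 1, len(rays)):
            cosang = np.clip(rays[a] @ rays[b], -1.0, 1.0)
            widest = max(widest, np.degrees(np.arccos(cosang)))
    return widest >= min_angle_deg                # drop if even the widest angle is tiny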
In one embodiment of the present invention, the camera pose estimation method further includes:
Step e01: calculating a second geographical location of each slave camera according to the first geographical location of the master camera and the pose transformation relationship;
Step e02: performing the optimizing calculation on the three-dimensional points according to the second geographical location and the pose transformation relationship, to obtain an optimized second pose of each slave camera.
In step e01, calculating the second geographical location of a slave camera according to the first geographical location of the master camera and the pose transformation relationship means combining the obtained first geographical location with the calculated pose transformation relationship to compute the second geographical location of the slave camera; the second geographical location is the position data of the slave camera in the world coordinate system.
In step e02, performing the optimizing calculation on the three-dimensional points according to the second geographical location and the pose transformation relationship to obtain the optimized second pose of each slave camera means substituting the data of the second geographical location and the pose transformation relationship into the optimization process as optimization parameters, optimizing the three-dimensional points, and obtaining the optimized second pose.
By optimizing the three-dimensional points according to the second geographical location and the pose transformation relationship to obtain the second pose of each slave camera, the optimization of the three-dimensional points does not rely solely on the matching relationships of the feature points across the multiple images. Relatively accurate optimization results can be obtained even when the feature point matching relationships are imprecise in complex terrain, which improves applicability and improves the effect of the three-dimensional reconstruction optimization.
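Computing the slave camera's geographic location from the master's is a single rigid-transform application. A minimal sketch, assuming the same 4x4 homogeneous convention as above and that the extrinsics map world coordinates into the camera frame (both are assumptions about conventions the text leaves open):

# Minimal sketch: given the master's pose Tw0 and the fixed transform T01,
# the slave's extrinsic, and hence its geographic location (camera centre),
# follows directly.
import numpy as np

def slave_location(Tw0: np.ndarray, T01: np.ndarray) -> np.ndarray:
    Tw1 = Tw0 @ T01                      # slave extrinsic in the world frame
    R, t = Tw1[:3, :3], Tw1[:3, 3]
    return -R.T @ t                      # camera centre in world coordinates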
In one embodiment of the present invention, after the obtaining extrinsic parameters of each camera in a multi-camera shooting device including at least two cameras, the method further includes:
Step f01: obtaining the historical pose transformation relationship of each slave camera relative to the master camera that was calculated, according to the first extrinsic parameter and the second extrinsic parameter, during the previous operation of the multi-camera shooting device; using the historical pose transformation relationship as the pose transformation relationship; and jumping to the step of obtaining the first geographical location of the master camera through a sensor.
In step f01, because the cameras in the multi-camera shooting device are rigidly connected and their relative positional relationships do not change, the historical pose transformation relationship obtained in a previous operation can be read and reused many times. After each aerial photography operation, the data acquired during that operation can also be saved for later use. The data can be saved as a JSON file, or as other types of data according to the actual situation; it is only necessary to ensure that the data can be conveniently read and reused, and the embodiments of the present application place no particular limitation on this.
After the historical pose transformation relationship has been obtained and used as the pose transformation relationship, there is no need to recalculate the cameras' pose transformation relationship, and the method can jump directly to the next step.
By obtaining the historical pose transformation relationship of each slave camera relative to the master camera calculated from the first extrinsic parameter and the second extrinsic parameter during the previous operation, and using the historical pose transformation relationship as the pose transformation relationship, every operation after the first can directly reuse the fixed data already calculated in the previous operation, which simplifies the workflow and improves computational efficiency.
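A minimal sketch of persisting and reloading the rig's transforms as JSON, as the text suggests (the file name and schema are illustrative assumptions):

# Minimal sketch: save the fixed master-to-slave transforms after a flight
# and reload them on the next one, skipping the recalculation step.
import json
import numpy as np

RIG_FILE = "rig_transforms.json"   # illustrative file name

def save_rig(transforms: dict) -> None:
    """transforms maps a slave camera name to its 4x4 T01 matrix."""
    with open(RIG_FILE, "w") as f:
        json.dump({name: T.tolist() for name, T in transforms.items()}, f)

def load_rig() -> dict:
    with open(RIG_FILE) as f:
        return {name: np.array(T) for name, T in json.load(f).items()}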
FIG. 2 shows a functional block diagram of a camera pose estimation apparatus 200 according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes: a first acquisition module 210, a first calculation module 220, a second acquisition module 230, and a second calculation module 240.
The first acquisition module 210 is configured to obtain extrinsic parameters of each camera in a multi-camera shooting device including at least two cameras, where the relative positional relationship between the cameras is fixed, and the at least two cameras include one master camera and one or more slave cameras.
The first calculation module 220 is configured to determine a first extrinsic parameter of the master camera and a second extrinsic parameter of each slave camera from the extrinsic parameters, and calculate a pose transformation relationship of each slave camera relative to the master camera according to the first extrinsic parameter and the second extrinsic parameter.
The second acquisition module 230 is configured to obtain a first geographical location of the master camera through a sensor.
The second calculation module 240 is configured to generate three-dimensional points according to the multiple images captured by the multi-camera shooting device, and perform an optimizing calculation on the three-dimensional points according to the first geographical location and the pose transformation relationship, to obtain an optimized first pose of each camera.
In some embodiments, the first calculation module 220 further includes:
a first calculation unit, configured to calculate the master camera extrinsic parameters and the slave camera extrinsic parameters according to multiple images captured by the master camera and the slave camera at the same trajectory position;
a second calculation unit, configured to calculate, according to the master camera extrinsic parameters and the slave camera extrinsic parameters, the conversion relationship from the image captured by the master camera to the image captured by the slave camera, using the formula T01 = Tw0'*Tw1, where T01 is the conversion relationship, Tw0 is the master camera extrinsic parameter, Tw0' is the inverse matrix of Tw0, and Tw1 is the slave camera extrinsic parameter; and
a third calculation unit, configured to determine the image conversion relationship as the pose transformation relationship.
In some embodiments, the second calculation module 240 further includes:
a fourth calculation unit, configured to extract feature point information of each of the multiple images;
a fifth calculation unit, configured to generate bag-of-words information according to the feature point information;
a sixth calculation unit, configured to perform a matching calculation on at least two images that have the same feature descriptors in the bag-of-words information, to obtain a matching relationship between the two matched images;
a seventh calculation unit, configured to calculate, according to the matching relationship, the relative transformation relationship between every two images among all the images; and
an eighth calculation unit, configured to generate the three-dimensional points according to the relative transformation relationships.
In some embodiments, the second calculation module 240 further includes:
a ninth calculation unit, configured to determine the projection matrix used for optimization according to the formula Pi = k·[Rc|tc]·[Ri|ti], where Pi is the projection matrix, [Rc|tc] is the pose transformation relationship, [Ri|ti] is the first geographical location, k is the intrinsic parameter of any one of the master camera and the slave cameras, i is the index of the first geographical location, and c is the index of the pose transformation relationship; and
a tenth calculation unit, configured to calculate the minimized reprojection error of the three-dimensional points according to the projection matrix, with the formula min Σo ||Pi·x − xo||², where x is a three-dimensional point, xo is the two-dimensional feature point obtained by reprojecting the three-dimensional point, and o is the index of the three-dimensional point.
In some embodiments, the camera pose estimation apparatus 200 further includes:
a first removal module, configured to remove, from the three-dimensional points and according to the reprojection errors obtained after the optimizing calculation of the three-dimensional points, points whose pixel error is greater than 4 pixels;
a second removal module, configured to remove, from the three-dimensional points, points whose observation angle is less than 2 degrees; and
an optimization module, configured to perform global optimization on the three-dimensional points.
In some embodiments, the camera pose estimation apparatus 200 further includes:
a third calculation module, configured to calculate a second geographical location of each slave camera according to the first geographical location of the master camera and the pose transformation relationship; and
a fourth calculation module, configured to perform the optimizing calculation on the three-dimensional points according to the second geographical location and the pose transformation relationship, to obtain an optimized second pose of each slave camera.
In some embodiments, the camera pose estimation apparatus 200 further includes:
a fifth calculation module, configured to calculate the second geographical location of each slave camera according to the first geographical location of the master camera and the pose transformation relationship; and
a sixth calculation module, configured to perform the optimizing calculation on the three-dimensional points according to the second geographical location and the pose transformation relationship, to obtain the optimized second pose of each slave camera.
In some embodiments, the camera pose estimation apparatus 200 further includes:
a third acquisition module, configured to obtain the historical pose transformation relationship of each slave camera relative to the master camera calculated, according to the first extrinsic parameter and the second extrinsic parameter, during the previous operation of the multi-camera shooting device, use the historical pose transformation relationship as the pose transformation relationship, and jump to the step of obtaining the first geographical location of the master camera through a sensor.
FIG. 3 shows a schematic structural diagram of a camera pose estimation device according to an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the camera pose estimation device.
As shown in FIG. 3, the camera pose estimation device may include: a processor 302, a memory 306, a communication interface 304, and a communication bus 308.
The processor 302, the memory 306, and the communication interface 304 communicate with one another via the communication bus 308.
The memory 306 is configured to store at least one program 310, and the program 310 causes the processor 302 to execute the relevant steps in the embodiments of the camera pose estimation method described above.
An embodiment of the present invention further provides a computer-readable storage medium. The storage medium stores at least one program, and when the program runs on a camera pose estimation device, it enables the camera pose estimation device to execute the camera pose estimation method in any of the method embodiments described above.
The algorithms or displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such systems is apparent from the above description. Moreover, the embodiments of the present invention are not directed to any particular programming language. It should be understood that the content of the present invention described herein can be implemented in various programming languages, and the above description of a specific language is intended to disclose the best mode of the present invention.
Numerous specific details are set forth in the description provided herein. It is understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure an understanding of this description.
Similarly, it should be understood that, in the above description of exemplary embodiments, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof in order to streamline the disclosure and aid in understanding one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim.
Those skilled in the art will appreciate that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may likewise be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names. Unless otherwise specified, the steps in the above embodiments should not be understood as limiting the order of execution.
Claims (10)
- A camera pose estimation method, characterized by comprising: acquiring an extrinsic parameter of each camera in a multi-camera shooting device comprising at least two cameras, wherein the relative position relationship between the cameras is fixed, and the at least two cameras comprise one master camera and one or more slave cameras; determining a first extrinsic parameter of the master camera and a second extrinsic parameter of each slave camera from the extrinsic parameters, and calculating a pose transformation relationship of each slave camera relative to the master camera according to the first extrinsic parameter and the second extrinsic parameter; acquiring a first geographic location of the master camera through a sensor; and generating three-dimensional points according to a plurality of images captured by the multi-camera shooting device, and performing optimization calculation on the three-dimensional points according to the first geographic location and the pose transformation relationship, to obtain an optimized first pose of each camera.
- The camera pose estimation method according to claim 1, wherein determining the first extrinsic parameter of the master camera and the second extrinsic parameter of each slave camera from the extrinsic parameters, and calculating the pose transformation relationship of each slave camera relative to the master camera according to the first extrinsic parameter and the second extrinsic parameter, further comprises: calculating the master camera extrinsic parameter and the slave camera extrinsic parameter according to a plurality of images captured by the master camera and the slave camera at the same trajectory position; calculating, according to the master camera extrinsic parameter and the slave camera extrinsic parameter, a conversion relationship from the image captured by the master camera to the image captured by the slave camera, the calculation formula being:
T01 = Tw0' * Tw1
where T01 is the conversion relationship, Tw0 is the master camera extrinsic parameter, Tw0' is the inverse matrix of Tw0, and Tw1 is the slave camera extrinsic parameter; and determining the conversion relationship of the images as the pose transformation relationship.
- The camera pose estimation method according to claim 1, wherein generating the three-dimensional points according to the plurality of images captured by the multi-camera shooting device further comprises: extracting feature point information of each of the plurality of images; generating bag-of-words information according to the feature point information; performing matching calculation on at least two images having the same feature descriptor in the bag-of-words information, to obtain a matching relationship between the two matched images; calculating a relative conversion relationship between every two images among all the images according to the matching relationship; and generating the three-dimensional points according to the relative conversion relationship.
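The formula in claim 2 composes the inverse of the master extrinsic with the slave extrinsic. A minimal sketch, assuming both extrinsics are expressed as 4x4 homogeneous matrices captured at the same trajectory position (the function and variable names are illustrative, not from the disclosure):

```python
import numpy as np

def master_to_slave(tw0: np.ndarray, tw1: np.ndarray) -> np.ndarray:
    """T01 = Tw0' * Tw1: compose the inverse of the master extrinsic Tw0
    with the slave extrinsic Tw1 (both 4x4 homogeneous matrices)."""
    return np.linalg.inv(tw0) @ tw1

# sanity checks: identical extrinsics give the identity; a slave shifted
# 0.2 m along x yields a pure 0.2 m translation in T01
tw0 = np.eye(4)
tw0[:3, 3] = [1.0, 2.0, 0.5]
tw1 = tw0.copy()
tw1[0, 3] += 0.2
assert np.allclose(master_to_slave(tw0, tw0), np.eye(4))
assert np.allclose(master_to_slave(tw0, tw1)[:3, 3], [0.2, 0.0, 0.0])
```

Because the rig is rigid, T01 should be near-constant across trajectory positions; in practice, estimates from several positions can be averaged to suppress noise.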
- The camera pose estimation method according to claim 1, wherein performing optimization calculation on the three-dimensional points according to the first geographic location and the pose transformation relationship further comprises: determining a projection matrix for the optimization according to the following formula:
P_i = k · [R_c | t_c] · [R_i | t_i]
where P_i is the projection matrix, [R_c | t_c] is the pose transformation relationship, [R_i | t_i] is the first geographic location, k is the intrinsic parameter of any one of the master camera and the slave cameras, i is the sequence number of the first geographic location, and c is the sequence number of the camera; and calculating the minimized reprojection error of the three-dimensional points according to the projection matrix, the formula being:
min Σ_o ‖ x_o − P_i · x ‖²
where x is a three-dimensional point, x_o is the two-dimensional feature point against which the reprojection of the three-dimensional point is compared, and o is the sequence number of the three-dimensional point.
- The camera pose estimation method according to claim 1, wherein, after the optimization calculation is performed on the three-dimensional points according to the first geographic location and the pose transformation relationship, the camera pose estimation method further comprises: eliminating, according to the reprojection error obtained after the optimization calculation on the three-dimensional points, those three-dimensional points whose pixel error is greater than 4 pixels; eliminating those three-dimensional points whose observation angle is less than 2 degrees; and performing global optimization on the three-dimensional points.
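An illustrative sketch of the projection and the two gates in claims 4 and 5. Assumptions (not from the disclosure): poses are 4x4 homogeneous matrices, k is a pinhole intrinsic matrix, and all names and numeric values below are hypothetical toy data.

```python
import numpy as np

def projection_matrix(k, rig_ct, geo_i):
    """P_i = k * [Rc|tc] * [Ri|ti]: chain the fixed rig transform with the
    geo-referenced pose at trajectory position i, then apply intrinsics."""
    return k @ (rig_ct @ geo_i)[:3, :]

def reproject(P, X):
    """Project a 3D point X (3,) with a 3x4 matrix P into pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def observation_angle_deg(c1, c2, X):
    """Angle at 3D point X subtended by two camera centers (claim-5 gate)."""
    v1, v2 = c1 - X, c2 - X
    cosang = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# toy data: identity rig transform and geo pose, 800 px focal length
k = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P = projection_matrix(k, np.eye(4), np.eye(4))

X = np.array([0.1, -0.05, 2.0])   # a triangulated 3D point
obs = np.array([361.0, 219.0])    # its matched 2D feature point
err = np.linalg.norm(reproject(P, X) - obs)

# keep the point only if it passes both the 4-pixel and 2-degree gates
keep = err <= 4.0 and observation_angle_deg(
    np.zeros(3), np.array([0.5, 0.0, 0.0]), X) >= 2.0
print(f"reprojection error {err:.2f} px, keep point: {keep}")
```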
- The camera pose estimation method according to claim 1, wherein the camera pose estimation method further comprises: calculating a second geographic location of each slave camera according to the first geographic location of the master camera and the pose transformation relationship; and performing optimization calculation on the three-dimensional points according to the second geographic location and the pose transformation relationship, to obtain an optimized second pose of each slave camera.
- The camera pose estimation method according to claim 1, wherein, after the extrinsic parameter of each camera in the multi-camera shooting device comprising at least two cameras is acquired, the camera pose estimation method further comprises: obtaining the pose transformation relationship of each slave camera relative to the master camera calculated from the first extrinsic parameter and the second extrinsic parameter during the previous operation of the multi-camera shooting device.
- A camera pose estimation apparatus, characterized by comprising: a first acquisition module, configured to acquire an extrinsic parameter of each camera in a multi-camera shooting device comprising at least two cameras, wherein the relative position relationship between the cameras is fixed, and the at least two cameras comprise one master camera and one or more slave cameras; a first calculation module, configured to determine a first extrinsic parameter of the master camera and a second extrinsic parameter of each slave camera from the extrinsic parameters, and to calculate a pose transformation relationship of each slave camera relative to the master camera according to the first extrinsic parameter and the second extrinsic parameter; a second acquisition module, configured to acquire a first geographic location of the master camera through a sensor; and a second calculation module, configured to generate three-dimensional points according to a plurality of images captured by the multi-camera shooting device, and to perform optimization calculation on the three-dimensional points according to the first geographic location and the pose transformation relationship, to obtain an optimized first pose of each camera.
- A camera pose estimation device, characterized by comprising: a processor, a memory, a communication interface, and a communication bus, the processor, the memory, and the communication interface communicating with one another via the communication bus; the memory being configured to store at least one program, the program causing the processor to perform the operations of the camera pose estimation method according to any one of claims 1-7.
- A computer-readable storage medium, characterized in that at least one program is stored in the storage medium, and when the program runs on a camera pose estimation device, the camera pose estimation device performs the operations of the camera pose estimation method according to any one of claims 1-7.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2022113751658 | 2022-11-04 | ||
CN202211375165.8A CN115423863B (en) | 2022-11-04 | 2022-11-04 | Camera pose estimation method and device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024093635A1 (en) | 2024-05-10 |
Family
ID=84208028
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/124164 WO2024093635A1 (en) | 2022-11-04 | 2023-10-12 | Camera pose estimation method and apparatus, and computer-readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115423863B (en) |
WO (1) | WO2024093635A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115423863B (en) * | 2022-11-04 | 2023-03-24 | 深圳市其域创新科技有限公司 | Camera pose estimation method and device and computer readable storage medium |
CN116580083B (en) * | 2023-07-13 | 2023-09-22 | 深圳创维智慧科技有限公司 | Pose estimation method and device of image pickup device, electronic device and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111566701B (en) * | 2020-04-02 | 2021-10-15 | 深圳市瑞立视多媒体科技有限公司 | Method, device and equipment for calibrating scanning field edge under large-space environment and storage medium |
WO2022204855A1 (en) * | 2021-03-29 | 2022-10-06 | 华为技术有限公司 | Image processing method and related terminal device |
CN115187658B (en) * | 2022-08-29 | 2023-03-24 | 合肥埃科光电科技股份有限公司 | Multi-camera visual large target positioning method, system and equipment |
- 2022-11-04: CN application CN202211375165.8A, granted as CN115423863B (en), status active
- 2023-10-12: WO application PCT/CN2023/124164, published as WO2024093635A1 (en), status unknown
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3113481A1 (en) * | 2015-06-29 | 2017-01-04 | Thomson Licensing | Apparatus and method for controlling geo-tagging in a camera |
CN111862180A (en) * | 2020-07-24 | 2020-10-30 | 三一重工股份有限公司 | Camera group pose acquisition method and device, storage medium and electronic equipment |
CN114219852A (en) * | 2020-08-31 | 2022-03-22 | 北京魔门塔科技有限公司 | Multi-sensor calibration method and device for automatic driving vehicle |
CN115205383A (en) * | 2022-06-17 | 2022-10-18 | 深圳市优必选科技股份有限公司 | Camera pose determination method and device, electronic equipment and storage medium |
CN115423863A (en) * | 2022-11-04 | 2022-12-02 | 深圳市其域创新科技有限公司 | Camera pose estimation method and device and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115423863A (en) | 2022-12-02 |
CN115423863B (en) | 2023-03-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110070615B (en) | Multi-camera cooperation-based panoramic vision SLAM method | |
CN107316325B (en) | Airborne laser point cloud and image registration fusion method based on image registration | |
WO2024093635A1 (en) | Camera pose estimation method and apparatus, and computer-readable storage medium | |
WO2021004416A1 (en) | Method and apparatus for establishing beacon map on basis of visual beacons | |
WO2020014909A1 (en) | Photographing method and device and unmanned aerial vehicle | |
TWI820395B (en) | Method for generating three-dimensional(3d) point cloud of object, system for 3d point set generation and registration, and related machine-readable medium | |
US8259994B1 (en) | Using image and laser constraints to obtain consistent and improved pose estimates in vehicle pose databases | |
CN105352509B (en) | Unmanned plane motion target tracking and localization method under geography information space-time restriction | |
CN112419374B (en) | Unmanned aerial vehicle positioning method based on image registration | |
CN111127524A (en) | Method, system and device for tracking trajectory and reconstructing three-dimensional image | |
CN111383205B (en) | Image fusion positioning method based on feature points and three-dimensional model | |
CN108519102B (en) | Binocular vision mileage calculation method based on secondary projection | |
WO2020063878A1 (en) | Data processing method and apparatus | |
CN112197764B (en) | Real-time pose determining method and device and electronic equipment | |
Habbecke et al. | Automatic registration of oblique aerial images with cadastral maps | |
JP2019032218A (en) | Location information recording method and device | |
CN108876828A (en) | A kind of unmanned plane image batch processing three-dimensional rebuilding method | |
Zingoni et al. | Real-time 3D reconstruction from images taken from an UAV | |
Kostavelis et al. | Visual odometry for autonomous robot navigation through efficient outlier rejection | |
Duran et al. | Accuracy comparison of interior orientation parameters from different photogrammetric software and direct linear transformation method | |
CN109493415A (en) | A kind of the global motion initial method and system of aerial images three-dimensional reconstruction | |
KR102249381B1 (en) | System for generating spatial information of mobile device using 3D image information and method therefor | |
US9852542B1 (en) | Methods and apparatus related to georeferenced pose of 3D models | |
WO2021051220A1 (en) | Point cloud fusion method, device, and system, and storage medium | |
CN117115271A (en) | Binocular camera external parameter self-calibration method and system in unmanned aerial vehicle flight process |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23884567; Country of ref document: EP; Kind code of ref document: A1 |