CN113781567A - Aerial image target geographic positioning method based on three-dimensional map generation - Google Patents
Aerial image target geographic positioning method based on three-dimensional map generation
- Publication number
- CN113781567A CN113781567A CN202111179403.3A CN202111179403A CN113781567A CN 113781567 A CN113781567 A CN 113781567A CN 202111179403 A CN202111179403 A CN 202111179403A CN 113781567 A CN113781567 A CN 113781567A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods (G—Physics; G06T—Image data processing or generation, in general; G06T7/00—Image analysis; G06T7/70—Determining position or orientation of objects or cameras)
- G01C11/02—Picture taking arrangements specially adapted for photogrammetry or photographic surveying, e.g. controlling overlapping of pictures (G01C—Measuring distances, levels or bearings; surveying; navigation; G01C11/00—Photogrammetry or videogrammetry)
- G01S19/42—Determining position (G01S—Radio direction-finding; radio navigation; G01S19/00—Satellite radio beacon positioning systems; G01S19/38, G01S19/39—Navigation solutions using time-stamped satellite signals, e.g. GPS, GLONASS or GALILEO)
- G06T17/05—Geographic models (G06T17/00—Three dimensional [3D] modelling)
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tessellation (G06T17/00—Three dimensional [3D] modelling)
- G06T2207/10016—Video; Image sequence (G06T2207/00—Indexing scheme for image analysis or image enhancement; G06T2207/10—Image acquisition modality)
- G06T2207/30181—Earth observation (G06T2207/30—Subject of image; Context of image processing)
- G06T2207/30244—Camera pose (G06T2207/30—Subject of image; Context of image processing)
Abstract
The invention relates to a method for geolocating targets in aerial images based on three-dimensional map generation, and belongs to the technical field of image processing. To introduce as few sensor error sources as possible, the method uses only an online-acquired aerial image sequence and GPS information to generate and optimize the airborne camera poses and a sparse geographic map in real time. Map points within the visual range are simultaneously projected onto the image and triangulated, quickly yielding a triangular mesh with geographic information, and the GPS values of target pixels are then accurately estimated by linear interpolation inside the triangles, based on the mesh vertices. The method obtains good ground-target geolocation results in varied complex environments, at different flight heights, and under different attitude angles.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an aerial image target geographic positioning method based on three-dimensional map generation.
Background
Geolocating ground objects of interest from an airborne platform is important in many applications, such as visual surveillance. However, because the measurement accuracy of airborne sensors such as gyroscopes is limited, current UAV-based ground-target positioning algorithms struggle to deliver precise geographic positions in real time, and their accuracy at medium and high altitudes is hard to guarantee. Common solutions include: 1) Ground target localization based on image registration. This scheme requires prior reference imagery, and the precise position of the ground target is obtained by matching against that prior image information. It can usually locate multiple points simultaneously, but it relies on prior knowledge such as satellite images, and positioning accuracy depends heavily on the spatial accuracy of that prior data. 2) Ground target localization based on coordinate transformation. This method uses the UAV's GPS position and computes the target's actual geographic coordinates from the transformations among the image, body, and geodetic coordinate systems, typically introducing the UAV's attitude angles. Although this direct computation scheme is simple and computationally cheap, its geolocation accuracy remains insufficient, and the low-accuracy measured attitude angles are the main error source, especially when the UAV flies high.
To address this problem, the literature "Vision-Based Target Three-Dimensional Localization Using Unmanned Aerial Vehicles" (65(10):8052-8061, 2018) designs a vision-based geolocation method to determine the 3-D position of a target: two or more images containing the target, taken from different viewpoints, are used to accurately estimate the target height and the yaw-angle measurement bias, and to localize the ground target in three dimensions. Such multi-view observation schemes are an important current research direction, but they still face insufficient geolocation accuracy, limited flight height, and difficulty in accurately locating stationary and moving targets at the same time.
Disclosure of Invention
Technical problem to be solved
In order to improve multi-target geolocation in unmanned aerial vehicle imagery, the invention provides an aerial image target geographic positioning method based on three-dimensional map generation.
Technical scheme
An aerial image target geographic positioning method based on three-dimensional map generation is characterized by comprising the following steps:
step 1: pose estimation and three-dimensional map generation: aiming at an aerial image acquired on line, firstly extracting image feature points and carrying out feature matching with adjacent key frames; then, determining and optimizing the pose of the current camera by using a pose estimation method, generating three-dimensional map points and judging whether a key frame is inserted or not; finally, temporarily storing the current image and the obtained corresponding camera pose into a buffer area B; the cache region can support the storage of b images and the poses of the corresponding cameras and a GPS;
step 2: and (3) joint optimization of pose and map: inserting the new key frame K determined in the step 1 into a key frame list, and updating the newly added three-dimensional points into a three-dimensional map; then, based on the new key frame, the associated key frame with the common view relation and the corresponding three-dimensional points, performing joint optimization of the coordinates of the map points and the poses of the key frames by minimizing the reprojection error;
and step 3: and (3) recovering the geographic information of the three-dimensional map: when the number N of frames in the key frame list is greater than 3, determining a geographic transformation matrix of a reference geographic coordinate system and a three-dimensional map coordinate system according to key frame poses in a map and a GPS (global positioning system) corresponding to the key frame poses one by one, wherein the geographic transformation matrix comprises a scaling coefficient s, a rotation matrix R and a translation matrix T; the ECEF rectangular coordinate system is selected as a reference geographic coordinate system, so that the GPS coordinate needs to be converted into the ECEF coordinate when the conversion matrix is estimated;
and 4, step 4: and (3) effective key frame fast screening: when the number of the cached images in the step 1 is equal to m and an effective geographic transformation matrix exists in the step 3, acquiring the image to be processed and the pose and the GPS value of the corresponding camera from the initial position of the buffer area B, and further estimating the target geographic position of the image; firstly, calculating a monitoring field range by utilizing a GPS value corresponding to an image, the approximate flying height of the unmanned aerial vehicle and the field angle of an airborne camera; then, defining an area by taking the origin and 2 times of the diagonal length of the field of view range as a radius, and further determining an effective key frame in the area in the three-dimensional map; the cameras corresponding to the key frames are likely to have a common-view relationship with the cameras corresponding to the images;
and 5: and (3) image triangular mesh generation: based on the selected effective key frames, firstly, acquiring a visible three-dimensional point set of the key frames in the three-dimensional map, and avoiding introducing all map points to cause subsequent larger workload; then, re-projecting the three-dimensional points by using the pose of the image, and keeping a two-dimensional pixel point set projected in the image; finally, the discrete two-dimensional pixel point set in the image is divided into a plurality of triangular meshes by adopting Delaunay triangulation, and the formation of a triangle by using the nearest three points is ensured to form a uniform and smooth triangular mesh;
step 6: estimating the latitude and longitude of the image target: based on a triangular mesh generated by an image, firstly confirming a triangle to which a target pixel point belongs, and obtaining pixel coordinates of three vertexes; then, according to the distribution of the target points and the positions of the three vertexes of the image pixels, estimating the three-dimensional coordinates of the target pixel points in the map by utilizing the three-dimensional map coordinates of the three vertexes and adopting a triangular internal linear interpolation method; and finally, calculating the ECEF coordinate of the target in the reference geographic coordinate system by using the geographic transformation matrix obtained in the step 3, and further obtaining the GPS position of the target.
Preferably: the pose estimation method in step 1 is specifically: the initialization process uses 2D-2D estimation and the frame tracking process uses 3D-2D estimation.
Preferably: generating three-dimensional map points and deciding whether to insert a key frame in step 1 is specifically: a key frame is inserted according to whether the overlap ratio between the current image and the latest key frame exceeds a set threshold.
Advantageous effects
The aerial image target geolocation method based on three-dimensional map generation provided by the invention introduces as few sensor error sources as possible: it avoids attitude angles, flat-terrain assumptions, and any prior information such as a Digital Elevation Model (DEM). Using only an online-acquired airborne image sequence and GPS information, it generates and optimizes a sparse three-dimensional geographic map of the scene in real time and rapidly estimates the geographic positions of target pixels in the image through map point projection and triangulation. This effectively improves the accuracy and speed of ground target positioning, making the method suitable for real-time UAV monitoring or emergency rescue systems.
Compared with the prior art, the method uses only a monocular aerial image sequence and GPS information, avoiding both the error sources of additional sensors such as attitude angles and constraints such as prior terrain data. By generating the three-dimensional geographic map online, it effectively improves the accuracy and speed of target geolocation, enables fast and accurate estimation of the longitude and latitude of multiple targets in aerial images, and positions moving and stationary targets equally well.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a flow chart of a method for geo-locating a target in an aerial image generated based on a three-dimensional map.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides an aerial image target geolocation method based on three-dimensional map generation, comprising: pose estimation and three-dimensional map generation, joint optimization of pose and map, recovery of the three-dimensional map's geographic information, fast screening of valid key frames, image triangular mesh generation, and image target longitude and latitude estimation.
(a) Pose estimation and three-dimensional map generation: for an aerial image acquired online, first extract image feature points and match them against adjacent key frames; then determine and optimize the current camera pose with a 2D-2D (initialization) or 3D-2D (frame tracking) pose estimation method, generate three-dimensional map points, and decide whether to insert a key frame; finally, temporarily store the current image and its camera pose in buffer B, which can hold b images together with their camera poses and GPS values.
(b) Joint optimization of pose and map: insert the new key frame K determined in step 1 into the key frame list and add the newly created three-dimensional points to the three-dimensional map; then, based on the new key frame, its associated key frames with a co-visibility relation, and the corresponding three-dimensional points, jointly optimize map point coordinates and key frame poses by minimizing the reprojection error.
(c) Recovery of the three-dimensional map's geographic information: when the number N of frames in the key frame list exceeds 3, determine the geographic transformation matrix between the reference geographic coordinate system and the three-dimensional map coordinate system from the key frame poses in the map and their one-to-one corresponding GPS values; it comprises a scale factor s, a rotation matrix R, and a translation matrix T. An Earth-Centered, Earth-Fixed (ECEF) rectangular coordinate system is chosen as the reference geographic coordinate system, so the GPS coordinates must be converted to ECEF coordinates when estimating the transformation matrix.
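The geodetic-to-ECEF conversion used in (c) is a standard geodesy computation. The sketch below assumes the WGS-84 ellipsoid (the patent does not name the ellipsoid it uses), with latitude/longitude in degrees and altitude in metres:

```python
import math

# WGS-84 ellipsoid constants (an assumption; the patent does not specify them)
_A = 6378137.0                 # semi-major axis (m)
_F = 1.0 / 298.257223563       # flattening
_E2 = _F * (2.0 - _F)          # first eccentricity squared

def gps_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert geodetic GPS coordinates (degrees, metres) to ECEF (metres)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # prime-vertical radius of curvature at this latitude
    n = _A / math.sqrt(1.0 - _E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - _E2) + alt_m) * math.sin(lat)
    return x, y, z
```

At the equator this returns (6378137, 0, 0), and at the pole the z value equals the polar radius a(1 - f), which is a quick sanity check on the constants.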
(d) Fast screening of valid key frames: when the number of buffered images from step 1 equals m and a valid geographic transformation matrix exists from step 3, fetch the image to be processed together with its camera pose and GPS value from the head of buffer B, then estimate the image's target geographic positions. First, compute the monitored field-of-view range (W_r(m) × H_r(m)) from the image's GPS value, the approximate UAV flight height, and the airborne camera's field-of-view angle; then define a region centered at that position with radius 2 times the diagonal length of the field-of-view range, and determine the valid key frames inside this region in the three-dimensional map. The cameras of these key frames are likely to share a co-visibility relation with the image's camera.
(e) Image triangular mesh generation: based on the selected valid key frames, first gather the set of three-dimensional points they can see in the three-dimensional map, avoiding the heavy downstream workload of introducing all map points; then reproject these points with the image's pose and keep the two-dimensional pixels that fall inside the image; finally, partition this discrete pixel set into triangles by Delaunay triangulation, which guarantees that the three nearest points form each triangle, yielding a uniform and smooth mesh.
(f) Image target longitude and latitude estimation: based on the image's triangular mesh, first identify the triangle containing the target pixel and obtain the pixel coordinates of its three vertices; then, according to the position of the target relative to the three vertex pixels, estimate the target pixel's three-dimensional map coordinates from the vertices' three-dimensional map coordinates by linear interpolation inside the triangle; finally, compute the target's ECEF coordinates in the reference geographic coordinate system with the geographic transformation matrix from step 3, and from them obtain the target's GPS position.
In order that those skilled in the art will better understand the present invention, the following detailed description is given with reference to specific examples.
1) Pose estimation and three-dimensional map generation
For an image I acquired online by the UAV, ORB feature points are first extracted; they are effective features that are fast to compute. Then the system checks whether initialization has succeeded. If not, the initialization procedure runs: the camera pose is determined by feature matching and 2D-2D pose estimation, as in formula (1), and the key frame list and three-dimensional map are initialized. If initialization has succeeded, the current image is matched against the latest key frame and the pose is estimated 3D-2D, as in formula (2); new three-dimensional map points are generated and key frame insertion is decided (specifically, a key frame is inserted according to whether the overlap ratio between the current image and the latest key frame exceeds a set threshold, here 0.6). Finally, the current image and its camera pose are temporarily stored in buffer B. The buffer holds b images with their camera poses and GPS values; b is typically set to 10 × f for a processing frame rate f, i.e. roughly 10 s of data, and is set to 150 here.
In formula (1), p_{t-1} and p_t are the matched point sets between the current frame and the initial key frame, and K is the camera intrinsic matrix; the camera pose [R_t|T_t] is obtained by solving for the fundamental matrix F and decomposing it. In formula (2), p_i is a 2D feature point of image I whose corresponding 3D map point is P_i = (x_i, y_i, z_i)^T, and p'_i is the projection of map point P_i onto image I, i = 1, 2, ..., n. The camera pose [R_t|T_t] is solved by minimizing the reprojection error over the n matched point pairs.
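The 3D-2D pose estimation of formula (2) is, in essence, a Perspective-n-Point problem. As an illustration only (not the patent's implementation), the sketch below solves for [R_t|T_t] linearly via a Direct Linear Transform from n ≥ 6 correspondences whose pixels have already been normalized by K^-1; production systems typically use an optimized PnP solver (e.g. OpenCV's solvePnP) followed by reprojection-error minimization:

```python
import numpy as np

def pnp_dlt(points_3d, points_2d_norm):
    """Estimate camera pose [R|t] from n >= 6 3D-2D correspondences with a
    Direct Linear Transform; points_2d_norm are K^-1-normalized pixel
    coordinates. A linear sketch of the 3D-2D case only."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d_norm):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    P = vt[-1].reshape(3, 4)            # projection matrix, up to scale/sign
    if np.linalg.det(P[:, :3]) < 0:     # true M = s*R has det = s^3 > 0
        P = -P
    u_m, s_m, vt_m = np.linalg.svd(P[:, :3])
    R = u_m @ vt_m                      # nearest rotation matrix to M
    t = P[:, 3] / s_m.mean()            # undo the DLT's unknown scale
    return R, t
```

With noise-free correspondences this recovers the exact pose; with real detections it serves as the initial guess for the minimization in formula (2).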
2) Pose and map joint optimization
When step 1 decides to add a new key frame K, K is inserted into the key frame list. The new key frame, its associated key frames with a co-visibility relation, and their visible 3D map points then undergo joint bundle optimization of map point coordinates and key frame poses by solving a minimized-reprojection-error problem. Assume the associated key frame set of the current frame I is {K_cj | j = 1, 2, ..., C} with poses {[R_j|T_j] | j = 1, 2, ..., C}, and the set of 3D map points seen by these key frames' cameras is {P_i | i = 1, 2, ..., D}; the 2D feature point of P_i in key frame K_cj is p_ij, and p'_ij is the reprojection of P_i in K_cj. The optimization problem is shown in formula (3).
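The objective of formula (3) can be made concrete as a residual function: for every observation of map point P_i in key frame K_cj, the residual is p_ij - p'_ij. The sketch below only evaluates these residuals (the function and argument names are illustrative assumptions); a real system feeds them to a nonlinear least-squares solver such as g2o or Ceres to optimize poses and points jointly:

```python
import numpy as np

def reprojection_residuals(K, poses, points3d, observations):
    """For each observation (j, i, u, v) -- map point i seen in keyframe j at
    pixel (u, v) -- compute p_ij - p'_ij, where p'_ij is P_i reprojected with
    pose [R_j|T_j]. Bundle adjustment minimizes the squared norm of the stacked
    residual vector; this helper only evaluates it."""
    res = []
    for j, i, u, v in observations:
        R, T = poses[j]
        pc = K @ (R @ points3d[i] + T)       # project into keyframe j
        res.append([u - pc[0] / pc[2], v - pc[1] / pc[2]])
    return np.asarray(res)
```

Residuals are zero exactly when every map point reprojects onto its detected feature, which is the fixed point the joint optimization drives toward.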
3) three-dimensional map geographic information recovery
The ECEF rectangular coordinate system is chosen here as the reference geographic coordinate system. Suppose there are N key frames {K_1, K_2, ..., K_N} at the current time, with poses {[R_1|T_1], [R_2|T_2], ..., [R_N|T_N]} and GPS values {g_1, g_2, ..., g_N}; the camera coordinates of the key frames in the three-dimensional map coordinate system are {T_1, T_2, ..., T_N}, and, through the fixed conversion between the GPS and ECEF coordinate systems, their coordinates in the reference geographic coordinate system are {e_1, e_2, ..., e_N}. When N > 3, the transformation between the two spatial rectangular coordinate systems is solved from the one-to-one correspondence between the point sets {T_1, T_2, ..., T_N} and {e_1, e_2, ..., e_N}; it comprises a scale factor s, a rotation matrix R, and a translation matrix T. The scale factor s is computed as in formulas (4) and (5), where O_T and O_e are the centroids of the two point sets. To solve for the rotation matrix R, a matrix H is built according to formula (6) and decomposed by SVD to obtain the rotation, as in formula (7). Based on s and R, the translation matrix T is computed according to formula (8).
H = (T_1 - O_T, ..., T_N - O_T)^T · (e_1 - O_e, ..., e_N - O_e)   (6)
It should be noted that when N is not greater than 3, the transformation matrix cannot be estimated from the formulas above. In that case s is set to 0, and the rotation and translation matrices are set to an identity matrix and a zero matrix, respectively.
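The s, R, T estimation of formulas (4)-(8) follows the classic centroid-plus-SVD construction for aligning two point sets under a similarity transform. Since the exact published formulas appear only in the patent figures, the sketch below is a standard (Umeyama/Kabsch-style) version of the same idea and may differ from the patent in detail:

```python
import numpy as np

def similarity_transform(T_pts, e_pts):
    """Estimate scale s, rotation R and translation t aligning map-frame camera
    centres {T_i} to their ECEF positions {e_i}, i.e. e_i ~ s*R*T_i + t."""
    T_pts = np.asarray(T_pts, float)
    e_pts = np.asarray(e_pts, float)
    o_T = T_pts.mean(axis=0)                       # centroid O_T
    o_e = e_pts.mean(axis=0)                       # centroid O_e
    dT = T_pts - o_T
    de = e_pts - o_e
    s = np.linalg.norm(de) / np.linalg.norm(dT)    # scale from spread ratio
    H = dT.T @ de                                  # 3x3 correlation, cf. formula (6)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                             # rotation with det(R) = +1
    t = o_e - s * R @ o_T                          # translation closes the loop
    return s, R, t
```

The sign-correction matrix D guards against the SVD returning a reflection, which would otherwise mirror the map instead of rotating it.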
4) Efficient keyframe fast screening
When the number of images buffered in step 1 equals b and the geographic transformation matrix from step 3 has s ≠ 0, the image to be processed I_0, its camera pose, and its GPS value g_0(lon, lat, alt) are first fetched from the head of buffer B with a first-in-first-out strategy. Then, from the approximate UAV flight height H(m), the airborne camera's field-of-view angle FOV, and the image resolution (w_p × h_p), the monitored field-of-view range (W_r(m) × H_r(m)) is computed as in formula (9). Finally, taking g_0 as the origin and 2 times the diagonal length of the field-of-view range, 2 × l, as the radius, a search region is defined for valid key frames whose cameras may share a co-visibility relation with the camera of image I_0; within this region, a key frame's camera GPS value g_0'(lon', lat', alt') must satisfy formula (10).
Where R = 6371004 (m) is the mean radius of the earth, and rad() is a function that converts degrees to radians.
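Formulas (9) and (10) are published only as figures, so the sketch below is an assumption-laden stand-in: formula (9) is approximated by the footprint of a nadir-looking pinhole camera, and the formula (10) membership test by an equirectangular small-distance approximation using the patent's earth radius R = 6371004 m:

```python
import math

EARTH_R = 6371004.0  # mean earth radius used in the patent (m)

def view_footprint(height_m, fov_w_deg, fov_h_deg):
    """Approximate ground footprint (W_r, H_r) of a nadir-looking camera at
    height H with horizontal/vertical field-of-view angles; a stand-in for
    formula (9), whose exact form appears only in the patent figures."""
    w_r = 2.0 * height_m * math.tan(math.radians(fov_w_deg) / 2.0)
    h_r = 2.0 * height_m * math.tan(math.radians(fov_h_deg) / 2.0)
    return w_r, h_r

def within_search_radius(g0, g1, radius_m):
    """Equirectangular distance check in the spirit of formula (10): is the
    key frame's camera GPS g1 = (lon, lat) within radius_m of the image's
    GPS g0 = (lon, lat)?"""
    lon0, lat0 = g0
    lon1, lat1 = g1
    dx = EARTH_R * math.radians(lon1 - lon0) * math.cos(math.radians(lat0))
    dy = EARTH_R * math.radians(lat1 - lat0)
    return math.hypot(dx, dy) <= radius_m
```

The search radius of step 4 would then be 2 × l with l = hypot(W_r, H_r), the diagonal of the footprint.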
5) Image triangular mesh generation
Based on the selected valid key frames, the set of 3D map points they can see, P = {P_i | i = 1, 2, ..., D}, is first gathered, avoiding the heavy downstream workload of introducing all map points. Then, with the camera intrinsic matrix K and image I_0's camera pose [R_0|T_0], the 3D point set is reprojected according to formula (11) to obtain the 2D point set p = {p_i | i = 1, 2, ..., D}, which is screened to keep the pixels that project inside image I_0: p' = {p_i ∈ p | 0 ≤ u_i ≤ w_p, 0 ≤ v_i ≤ h_p}. Finally, a Delaunay triangulation algorithm divides the discrete point set p' into triangles by point-by-point insertion, guaranteeing that the three nearest points form each triangle; the resulting mesh is uniform and smooth and satisfies the Delaunay properties of uniqueness, optimality, and maximal regularity:
s_i · (u_i, v_i, 1)^T = K(R_0 · P_i + T_0),  i = 1, 2, ..., D   (11)
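Formula (11), the in-image screening, and the meshing step can be sketched together as follows. SciPy's Delaunay (Qhull-based) stands in for the point-by-point insertion algorithm the patent describes, and all names are illustrative:

```python
import numpy as np
from scipy.spatial import Delaunay  # Delaunay mesh over the kept pixels

def project_and_mesh(K, R0, T0, points3d, w_p, h_p):
    """Reproject visible map points P_i with image I_0's pose [R_0|T_0] per
    formula (11), keep pixels inside the w_p x h_p image (and in front of the
    camera), and split them into Delaunay triangles."""
    P = np.asarray(points3d, float)
    cam = (R0 @ P.T).T + T0                 # camera-frame coordinates
    pix = (K @ cam.T).T
    uv = pix[:, :2] / pix[:, 2:3]           # s_i*(u_i, v_i, 1)^T = K(R_0 P_i + T_0)
    keep = ((cam[:, 2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] <= w_p)
            & (uv[:, 1] >= 0) & (uv[:, 1] <= h_p))
    uv = uv[keep]
    tri = Delaunay(uv)                      # simplices: 3 vertex indices per triangle
    return uv, tri.simplices
```

Points behind the camera or outside the image bounds are dropped before meshing, matching the p' screening above.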
6) image target latitude and longitude estimation
First, all triangles of image I_0's mesh are traversed to find the triangle ΔS_1S_2S_3 containing the target pixel p_t(u, v); its three vertices have pixel coordinates S_1(u_1, v_1), S_2(u_2, v_2), S_3(u_3, v_3) with corresponding three-dimensional map coordinates, and this triangle is unique. Next, the linear interpolation coefficients k_1, k_2, k_3 of the three vertices S_1, S_2, S_3 are computed according to formula (12), and the pixel p_t's three-dimensional map coordinates P_t(x, y, z) are estimated according to formula (13). Finally, with the geographic transformation matrix [s × R | T] obtained in step 3, the target's ECEF coordinates e_t(X, Y, Z) in the reference geographic coordinate system are computed according to formula (4), and from them the target's GPS value is obtained.
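The triangle-interior linear interpolation of formulas (12)-(13) amounts to barycentric weighting: the same coefficients k_1, k_2, k_3 that express p_t in terms of the vertex pixels are applied to the vertices' 3D map coordinates. A minimal sketch under that assumption:

```python
import numpy as np

def interpolate_target(pt_uv, tri_uv, tri_xyz):
    """Express target pixel p_t as barycentric weights k1, k2, k3 of its
    triangle's vertices S1, S2, S3 (cf. formula (12)), then apply the same
    weights to the vertices' 3D map coordinates to get P_t (cf. formula (13))."""
    u, v = pt_uv
    (u1, v1), (u2, v2), (u3, v3) = tri_uv
    det = (v2 - v3) * (u1 - u3) + (u3 - u2) * (v1 - v3)
    k1 = ((v2 - v3) * (u - u3) + (u3 - u2) * (v - v3)) / det
    k2 = ((v3 - v1) * (u - u3) + (u1 - u3) * (v - v3)) / det
    k3 = 1.0 - k1 - k2
    tri_xyz = np.asarray(tri_xyz, float)
    return k1 * tri_xyz[0] + k2 * tri_xyz[1] + k3 * tri_xyz[2]
```

The resulting P_t is then mapped to ECEF with e_t = s·R·P_t + T and converted back to longitude/latitude.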
The method introduces as few sensor error sources as possible: using only an online-acquired aerial image sequence and GPS information, it generates and optimizes the airborne camera poses and a sparse geographic map in real time, projects and triangulates the map points within visual range on the image to quickly obtain a geo-referenced triangular mesh, and then accurately estimates the GPS values of target pixels by linear interpolation inside the triangles, based on the mesh vertices. It achieves good ground-target geolocation in varied complex environments, at different flight heights, and under different attitude angles. For urban scenes with rich three-dimensional structure, the positioning accuracy is better than 1 m at flight heights from 0 m to 2000 m.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present disclosure.
Claims (3)
1. An aerial image target geographic positioning method based on three-dimensional map generation is characterized by comprising the following steps:
step 1: pose estimation and three-dimensional map generation: aiming at an aerial image acquired on line, firstly extracting image feature points and carrying out feature matching with adjacent key frames; then, determining and optimizing the pose of the current camera by using a pose estimation method, generating three-dimensional map points and judging whether a key frame is inserted or not; finally, temporarily storing the current image and the obtained corresponding camera pose into a buffer area B; the cache region can support the storage of b images and the poses of the corresponding cameras and a GPS;
step 2: joint optimization of pose and map: insert the new key frame K determined in step 1 into the key frame list, and update the newly added three-dimensional points into the three-dimensional map; then, based on the new key frame, its associated key frames having a common-view relation and the corresponding three-dimensional points, jointly optimize the map point coordinates and the key frame poses by minimizing the reprojection error;
step 3: geographic information recovery of the three-dimensional map: when the number of frames N in the key frame list is greater than 3, determine the geographic transformation matrix between the reference geographic coordinate system and the three-dimensional map coordinate system from the key frame poses in the map and the GPS values in one-to-one correspondence with them, the transformation comprising a scale factor s, a rotation matrix R and a translation vector T; the ECEF rectangular coordinate system is selected as the reference geographic coordinate system, so the GPS coordinates must be converted into ECEF coordinates when the transformation matrix is estimated;
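The patent does not reproduce the estimation formulas, but a similarity transform (s, R, T) between corresponding point sets is commonly solved in closed form with Umeyama's method once the GPS fixes have been converted to ECEF; the sketch below makes that assumption and uses the standard WGS-84 geodetic-to-ECEF conversion:

```python
import numpy as np

A, E2 = 6378137.0, 6.69437999014e-3          # WGS-84 semi-major axis, e^2

def gps_to_ecef(lat_deg, lon_deg, h):
    """Convert a geodetic GPS fix (degrees, metres) to ECEF coordinates."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    n = A / np.sqrt(1.0 - E2 * np.sin(lat) ** 2)   # prime-vertical radius
    return np.array([(n + h) * np.cos(lat) * np.cos(lon),
                     (n + h) * np.cos(lat) * np.sin(lon),
                     (n * (1.0 - E2) + h) * np.sin(lat)])

def umeyama(src, dst):
    """Closed-form similarity with dst_i = s * R @ src_i + T, estimated
    from N >= 3 corresponding points (Umeyama, 1991)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    cov = (dst - mu_d).T @ (src - mu_s) / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                           # guard against a reflection
    R = U @ S @ Vt
    var_src = ((src - mu_s) ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    T = mu_d - s * R @ mu_s
    return s, R, T
```

With key frame positions in map coordinates as `src` and their ECEF-converted GPS fixes as `dst`, the returned (s, R, T) plays the role of the geographic transformation matrix of this step.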
step 4: fast screening of effective key frames: when the number of images cached in step 1 equals m and a valid geographic transformation matrix exists from step 3, fetch the image to be processed together with its corresponding camera pose and GPS value from the head of buffer B, and then estimate the geographic position of the target in that image; first, calculate the monitored field-of-view range from the GPS value of the image, the approximate flying height of the unmanned aerial vehicle and the field angle of the airborne camera; then define a region centred at the origin with a radius of 2 times the diagonal length of the field-of-view range, and determine the effective key frames inside this region in the three-dimensional map; the cameras corresponding to these key frames are likely to have a common-view relation with the camera corresponding to the image;
step 5: image triangular mesh generation: based on the selected effective key frames, first obtain the set of their visible three-dimensional points in the three-dimensional map, which avoids the heavy subsequent workload of introducing all map points; then re-project these three-dimensional points with the pose of the image and keep the set of two-dimensional pixel points that project inside the image; finally, partition the discrete two-dimensional pixel point set in the image into triangular meshes by Delaunay triangulation, which guarantees that triangles are formed from the nearest three points and yields a uniform, smooth triangular mesh;
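A minimal sketch of this triangulation step using `scipy.spatial.Delaunay`; the pixel coordinates below are made up for illustration:

```python
import numpy as np
from scipy.spatial import Delaunay

# Hypothetical 2D pixel points kept after re-projecting visible map points
# into a 640x480 image: the four corners plus the image centre.
pts = np.array([[0, 0], [640, 0], [640, 480], [0, 480], [320, 240]], dtype=float)

tri = Delaunay(pts)                      # non-overlapping triangles over the points
target = np.array([[100.0, 100.0]])     # a target pixel to geolocate later
idx = int(tri.find_simplex(target)[0])  # containing triangle's index, -1 if outside
vertices = tri.simplices[idx]           # indices of that triangle's three vertices
```

`find_simplex` gives the triangle lookup needed in step 6; each row of `tri.simplices` indexes three of the projected map points, whose known three-dimensional coordinates anchor the interpolation.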
step 6: estimating the longitude and latitude of the image target: based on the triangular mesh generated for the image, first identify the triangle to which the target pixel point belongs and obtain the pixel coordinates of its three vertices; then, according to the positions of the target point and the three vertex pixels, estimate the three-dimensional map coordinate of the target pixel point from the three vertices' three-dimensional map coordinates by linear interpolation inside the triangle; finally, calculate the ECEF coordinate of the target in the reference geographic coordinate system with the geographic transformation matrix obtained in step 3, and thereby obtain the GPS position of the target.
2. The aerial image target geographic positioning method based on three-dimensional map generation according to claim 1, wherein the pose estimation method in step 1 is specifically: the initialization process uses 2D-2D correspondences, and the frame tracking process uses 3D-2D correspondences.
3. The aerial image target geographic positioning method based on three-dimensional map generation according to claim 1, wherein generating three-dimensional map points and deciding whether to insert a key frame in step 1 is specifically: a key frame is inserted according to whether the proportion of the overlapping area between the current image and the latest key frame is larger than a set threshold.
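Claim 3 leaves the comparison direction implicit; assuming the usual convention that a new key frame is taken once the overlap with the latest key frame is no longer larger than the threshold, the rule might be sketched as follows (the axis-aligned footprint model and the threshold value are assumptions):

```python
def overlap_ratio(box_a, box_b):
    """Intersection area of two axis-aligned image footprints
    (x_min, y_min, x_max, y_max), as a fraction of box_a's area."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    return ix * iy / area_a

def should_insert_keyframe(cur_box, last_kf_box, threshold=0.8):
    """Insert a key frame when the overlap ratio with the latest key frame
    is not larger than the set threshold (direction assumed)."""
    return overlap_ratio(cur_box, last_kf_box) <= threshold
```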
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111179403.3A CN113781567B (en) | 2021-10-08 | 2021-10-08 | Aerial image target geographic positioning method based on three-dimensional map generation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113781567A true CN113781567A (en) | 2021-12-10 |
CN113781567B CN113781567B (en) | 2024-05-31 |
Family
ID=78855185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111179403.3A Active CN113781567B (en) | 2021-10-08 | 2021-10-08 | Aerial image target geographic positioning method based on three-dimensional map generation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113781567B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105678754A (en) * | 2015-12-31 | 2016-06-15 | 西北工业大学 | Unmanned aerial vehicle real-time map reconstruction method |
CN108986037A (en) * | 2018-05-25 | 2018-12-11 | 重庆大学 | Monocular visual odometry positioning method and system based on the semi-direct method
CN109945853A (en) * | 2019-03-26 | 2019-06-28 | 西安因诺航空科技有限公司 | Geographical coordinate positioning system and method based on 3D point cloud aerial images
CN110501017A (en) * | 2019-08-12 | 2019-11-26 | 华南理工大学 | Mobile robot navigation map construction method based on ORB_SLAM2
WO2020155616A1 (en) * | 2019-01-29 | 2020-08-06 | 浙江省北大信息技术高等研究院 | Digital retina-based photographing device positioning method |
CN112434709A (en) * | 2020-11-20 | 2021-03-02 | 西安视野慧图智能科技有限公司 | Aerial survey method and system based on real-time dense three-dimensional point cloud and DSM of unmanned aerial vehicle |
Non-Patent Citations (1)
Title |
---|
YANG Tao; ZHANG Yanning; ZHANG Xiuwei; ZHANG Xingong: "Real-time registration algorithm for aerial video based on scene complexity and invariant features", Acta Electronica Sinica, no. 05, 15 May 2010 (2010-05-15) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115439528A (en) * | 2022-04-26 | 2022-12-06 | 亮风台(上海)信息科技有限公司 | Method and equipment for acquiring image position information of target object |
CN115439528B (en) * | 2022-04-26 | 2023-07-11 | 亮风台(上海)信息科技有限公司 | Method and equipment for acquiring image position information of target object |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107945220B (en) | Binocular vision-based reconstruction method | |
US8666661B2 (en) | Video navigation | |
JP4685313B2 (en) | Method for processing passive volumetric image of any aspect | |
EP2442275A2 (en) | Method and apparatus for three-dimensional image reconstruction | |
CN112567201A (en) | Distance measuring method and apparatus | |
KR101668006B1 (en) | Satellite Based Method and System for Constructing 3D GIS Data | |
KR101711964B1 (en) | Free space map construction method, free space map construction system, foreground/background extraction method using the free space map, and foreground/background extraction system using the free space map | |
CN112923933A (en) | Laser radar SLAM algorithm and inertial navigation fusion positioning method | |
CN112162297B (en) | Method for eliminating dynamic obstacle artifacts in laser point cloud map | |
JP5762131B2 (en) | CALIBRATION DEVICE, CALIBRATION DEVICE CALIBRATION METHOD, AND CALIBRATION PROGRAM | |
EP3612799A1 (en) | Distributed device mapping | |
CN111383205B (en) | Image fusion positioning method based on feature points and three-dimensional model | |
CN112184786B (en) | Target positioning method based on synthetic vision | |
JP2012118666A (en) | Three-dimensional map automatic generation device | |
CN113447949B (en) | Real-time positioning system and method based on laser radar and prior map | |
Lee et al. | Vision-based terrain referenced navigation for unmanned aerial vehicles using homography relationship | |
CN111829514A (en) | Road surface working condition pre-aiming method suitable for vehicle chassis integrated control | |
Khoshelham et al. | Vehicle positioning in the absence of GNSS signals: Potential of visual-inertial odometry | |
CN110598370B (en) | Robust attitude estimation of multi-rotor unmanned aerial vehicle based on SIP and EKF fusion | |
Zhang et al. | Online ground multitarget geolocation based on 3-D map construction using a UAV platform | |
CN113781567B (en) | Aerial image target geographic positioning method based on three-dimensional map generation | |
CN110160503B (en) | Unmanned aerial vehicle landscape matching positioning method considering elevation | |
Liu et al. | Implementation and analysis of tightly integrated INS/stereo VO for land vehicle navigation | |
CN115830116A (en) | Robust visual odometer method | |
US11514588B1 (en) | Object localization for mapping applications using geometric computer vision techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||