CN113781567B - Aerial image target geographic positioning method based on three-dimensional map generation - Google Patents
- Publication number
- CN113781567B (application CN202111179403.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- geographic
- dimensional
- pose
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C11/00—Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
- G01C11/02—Picture taking arrangements specially adapted for photogrammetry or photographic surveying, e.g. controlling overlapping of pictures
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S19/00—Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
- G01S19/38—Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
- G01S19/39—Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
- G01S19/42—Determining position
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/05—Geographic models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30181—Earth observation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30244—Camera pose
Abstract
The invention relates to a method for geolocating targets in aerial images based on three-dimensional map generation, and belongs to the technical field of image processing. The method minimizes the number of sensor error sources introduced: using only an online-acquired aerial image sequence and GPS information, it generates and optimizes the onboard camera pose and a sparse geographic map in real time. Map points within the visible range are projected onto the image and triangulated to quickly obtain a triangular mesh of the image carrying geographic information, and the GPS value of a target pixel is then accurately estimated by linear interpolation inside the enclosing triangle, based on the mesh vertices. The method achieves good ground-target geolocation results in various complex environments, at different flying heights and at different attitude angles.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method for geolocating targets in aerial images based on three-dimensional map generation: using unmanned aerial vehicle (UAV) onboard video, it quickly constructs a three-dimensional map of the scene to accurately estimate the geographic position of image targets.
Background
Geolocating a ground target of interest from an aerial platform is important in many applications, such as visual surveillance. However, because onboard sensors such as gyroscopes have limited measurement accuracy, most current UAV-based ground target positioning algorithms struggle to obtain an accurate geographic position immediately, and accuracy at medium and high altitudes is especially hard to guarantee. Common solutions include: 1) Ground target positioning based on image registration. This scheme requires prior information about base imagery, and obtains the accurate position of the ground target by matching against that prior imagery. It can generally position multiple points simultaneously, but it relies on prior knowledge such as satellite images, and its accuracy is strongly affected by the spatial position accuracy of that prior data. 2) Ground target positioning based on coordinate transformation. This method uses the GPS position of the UAV and computes the actual geographic coordinates of the target from the transformation between the body coordinate system and the geodetic coordinate system, which usually introduces the UAV attitude angles. This direct calculation scheme is simple and computationally cheap, but its geolocation accuracy remains insufficient: the low-accuracy measured attitude angles are the main error source, especially when the UAV flies high.
To address this problem, the literature "Vision-Based Target Three-Dimensional Geolocation Using Unmanned Aerial Vehicles, IEEE Transactions on Industrial Electronics, 65(10):8052–8061, 2018" devised a vision-based geolocation method that determines the 3-D position of an object: two or more images of the object taken from different viewpoints are used to accurately estimate the target altitude and the yaw-angle measurement bias, and thereby localize the ground object in three dimensions. Such multipoint-observation solutions are an important current research direction, but they still face insufficient geolocation accuracy, limited flying height, and the difficulty of precisely positioning static and moving targets at the same time.
Disclosure of Invention
Technical problem to be solved
In order to improve the multi-target geographic positioning effect in an unmanned aerial vehicle image, the invention provides an aerial image target geographic positioning method based on three-dimensional map generation.
Technical solution
An aerial image target geographic positioning method based on three-dimensional map generation is characterized by comprising the following steps:
Step 1: pose estimation and three-dimensional map generation: for an aerial image acquired on line, firstly extracting image feature points and performing feature matching with adjacent key frames; then, determining and optimizing the pose of the current camera by using a pose estimation method, generating three-dimensional map points and judging whether a key frame is inserted or not; finally, the current image and the obtained corresponding camera pose are temporarily stored in a buffer area B; the buffer area can support and store b images and the pose and GPS of the corresponding camera;
Step 2: pose and map joint optimization: inserting the new key frame K determined in the step 1 into a key frame list, and simultaneously updating the newly added three-dimensional points into the three-dimensional map; then, based on the new key frame, the associated key frame with the common view relation and the corresponding three-dimensional points, joint optimization of map point coordinates and key frame pose is performed by minimizing the reprojection error;
step 3: three-dimensional map geographic information recovery: when the number of frames N in the key frame list is greater than 3, determining a geographic transformation matrix of a reference geographic coordinate system and a three-dimensional map coordinate system according to GPS (global positioning system) corresponding to the key frame pose in the map one by one, wherein the geographic transformation matrix comprises a scaling coefficient s, a rotation matrix R and a translation matrix T; the ECEF rectangular coordinate system is selected as a reference geographic coordinate system, so that GPS coordinates are required to be converted into ECEF coordinates when a conversion matrix is estimated;
step 4: effective key frame rapid screening: when the number of the cached images in the step 1 is equal to m and an effective geographic transformation matrix exists in the step 3, acquiring an image to be processed and the pose and GPS value of a camera corresponding to the image from the initial position of a buffer area B, and further estimating the target geographic position of the image; firstly, calculating a monitoring field range by using a GPS value corresponding to an image, the approximate flying height of an unmanned aerial vehicle and the field angle of an airborne camera; then, an origin is used as an origin, and 2 times of the diagonal length of the view field range is used as a radius to define a region, so that effective key frames in the region are determined in the three-dimensional map; the cameras corresponding to the key frames are likely to have a common-view relationship with the cameras corresponding to the images;
step 5: generating an image triangular grid: based on the selected effective key frames, firstly, a visual three-dimensional point set of the key frames in the three-dimensional map is obtained, and the subsequent large workload caused by introducing all map points is avoided; then, re-projecting the three-dimensional points by using the pose of the image, and reserving a two-dimensional pixel point set projected in the image; finally, the Delaunay triangulation is adopted to divide the discrete two-dimensional pixel point set in the image into a plurality of triangular grids, and the triangle is formed by the nearest three points, so that a uniform and smooth triangular grid is formed;
Step 6: estimating longitude and latitude of an image target: firstly confirming a triangle to which a target pixel belongs based on a triangle grid generated by an image, and obtaining pixel coordinates of three vertexes; then, according to the distribution of the target point and the three vertexes at the image pixel positions, estimating the three-dimensional coordinates of the target pixel point in the map by using the three-dimensional map coordinates of the three vertexes by adopting a triangle internal linear interpolation method; and finally, calculating ECEF coordinates of the target in a reference geographic coordinate system by using the geographic transformation matrix obtained in the step 3, and further obtaining the GPS position of the target.
Preferably: the pose estimation method in the step 1 specifically comprises the following steps: the initialization process is 2D-2D, and the frame tracking process is 3D-2D.
Preferably: in the step 1, three-dimensional map points are generated simultaneously and whether key frames are inserted or not is judged specifically as follows: and judging whether to insert the key frame according to whether the occupation ratio of the overlapping area of the current image and the latest key frame is larger than a set threshold value.
Advantageous effects
The aerial image target geolocation method based on three-dimensional map generation reduces sensor error sources as much as possible: it avoids attitude angles, terrain-flatness assumptions and any prior information such as a Digital Elevation Model (DEM), and generates and optimizes a sparse three-dimensional geographic map of the scene in real time using only an online-acquired onboard image sequence and GPS information. Fast estimation of the geographic position of target pixels in the image is achieved through map point projection and triangulation, effectively improving the accuracy and speed of ground target positioning, which makes the method suitable for UAV real-time monitoring or emergency rescue systems.
Compared with the prior art, the method uses only a monocular aerial image sequence and GPS information. By avoiding sensor error sources such as attitude angles and constraints such as terrain-data priors, and by generating the three-dimensional geographic map online, it effectively improves the accuracy and speed of target geolocation, achieves fast and accurate estimation of the longitude and latitude of multiple targets in an aerial image, and positions moving and static targets equally well.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
FIG. 1 is a flow chart of a method for geolocating an aerial image target based on three-dimensional map generation.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The invention provides an aerial image target geographic positioning method based on three-dimensional map generation, which comprises the following steps: pose estimation and three-dimensional map generation, pose and map joint optimization, three-dimensional map geographic information recovery, effective key frame rapid screening, image triangular grid generation and image target longitude and latitude estimation.
(A) Pose estimation and three-dimensional map generation: for an aerial image acquired online, first extract image feature points and perform feature matching with adjacent key frames; then determine and optimize the current camera pose using 2D-2D (initialization) or 3D-2D (frame tracking) pose estimation, generate three-dimensional map points and decide whether to insert a key frame; finally, temporarily store the current image and the obtained camera pose in buffer B. The buffer can hold b images with their corresponding camera poses and GPS values.
(B) Pose and map joint optimization: insert the new key frame K determined in step 1 into the key frame list, and at the same time add the newly created three-dimensional points to the three-dimensional map; then, based on the new key frame, its associated key frames with a co-view relationship and the corresponding three-dimensional points, jointly optimize the map point coordinates and key frame poses by minimizing the reprojection error.
(C) Three-dimensional map geographic information recovery: when the number of frames N in the key frame list is greater than 3, determine the geographic transformation matrix between the reference geographic coordinate system and the three-dimensional map coordinate system from the one-to-one correspondence between key frame poses in the map and their GPS values; the transformation comprises a scale factor s, a rotation matrix R and a translation matrix T. The Earth-Centered Earth-Fixed (ECEF) rectangular coordinate system is chosen as the reference geographic coordinate system, so the GPS coordinates need to be converted into ECEF coordinates when estimating the transformation matrix.
(D) Fast screening of valid key frames: when the number of buffered images from step 1 equals m and a valid geographic transformation matrix exists from step 3, the image to be processed and the pose and GPS value of its camera are fetched from the head of buffer B, and target geographic position estimation is then performed on that image. First, the monitored field-of-view range W_r(m) × H_r(m) is calculated from the GPS value of the image, the approximate flying height of the UAV and the field angle of the onboard camera; then a region is defined with the GPS position g as the center and 2 times the diagonal length of the field-of-view range as the radius, and the valid key frames within that region are determined in the three-dimensional map. The cameras of these key frames are likely to have a co-view relationship with the camera of the image.
(E) Image triangular mesh generation: based on the selected valid key frames, first obtain the set of three-dimensional points visible to those key frames in the three-dimensional map, avoiding the heavy subsequent workload of introducing all map points; then re-project these three-dimensional points using the image pose and keep the set of two-dimensional pixel points that fall inside the image; finally, use Delaunay triangulation to divide the discrete two-dimensional pixel point set into triangles, each formed by the three nearest points, yielding a uniform and smooth triangular mesh.
(F) Image target longitude and latitude estimation: first determine, from the triangular mesh generated for the image, the triangle that the target pixel belongs to and obtain the pixel coordinates of its three vertices; then, according to the distribution of the target point and the three vertices over the image pixel positions, estimate the three-dimensional map coordinate of the target pixel from the vertices' three-dimensional map coordinates by linear interpolation inside the triangle; finally, use the geographic transformation matrix obtained in step 3 to calculate the ECEF coordinate of the target in the reference geographic coordinate system, and from it obtain the GPS position of the target.
In order that those skilled in the art will better understand the present invention, the following detailed description of the present invention will be provided with reference to specific examples.
1) Pose estimation and three-dimensional map generation
Based on an image I acquired online by the UAV, ORB feature points of the image are extracted first; they are effective feature points that are fast to compute. Then it is determined whether initialization has succeeded. If not, the initialization process is executed: the camera pose is determined through feature matching and 2D-2D pose estimation, as shown in formula (1), and the key frame list and the three-dimensional map are initialized at the same time. If initialization has succeeded, feature matching and 3D-2D pose estimation are performed between the current image and the latest key frame, new three-dimensional map points are generated, and it is decided whether to insert a key frame (specifically, according to whether the overlap ratio between the current image and the latest key frame exceeds a set threshold, here 0.6). Finally, the current image and the obtained camera pose are temporarily stored in buffer B. The buffer can hold b images with their corresponding camera poses and GPS values; b is generally set to 10·f according to the frame rate f at which the processor runs the algorithm, i.e. roughly the data of 10 s is buffered; here b = 150.
In formula (1), p_{t-1} and p_t are the sets of matched points between the current frame and the initial key frame, and K is the camera intrinsic matrix; the camera pose [R_t|T_t] is obtained by solving for the fundamental matrix F and decomposing it. In formula (2), p_i is a 2D feature point of image I, its corresponding 3D map point is P_i = (x_i, y_i, z_i)^T, and p_i' is the projected pixel of map point P_i on image I, i = 1, 2, ..., n. The camera pose [R_t|T_t] is solved by minimizing the reprojection error over the n matched point pairs.
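As an illustration of the 2D-2D stage, the following NumPy sketch implements the classical linear eight-point algorithm for the fundamental matrix F and checks the epipolar constraint on synthetic correspondences. The intrinsics, motion and point cloud are invented for the demonstration; formula (1), whose image is not reproduced in this text, is assumed to be the standard epipolar relation p_t^T F p_{t-1} = 0.

```python
import numpy as np

def eight_point_fundamental(p1, p2):
    """Linear eight-point estimate of the fundamental matrix F with
    p2_h^T F p1_h = 0 for matched pixel points (n >= 8 pairs)."""
    A = np.array([[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
                  for (u1, v1), (u2, v2) in zip(p1, p2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)        # null direction of A
    U, S, Vt = np.linalg.svd(F)     # enforce rank 2
    return U @ np.diag([S[0], S[1], 0.0]) @ Vt

def project(K, R, t, P):
    """Pinhole projection of Nx3 points P into pixel coordinates."""
    q = (K @ (R @ P.T + t[:, None])).T
    return q[:, :2] / q[:, 2:]

# synthetic noise-free two-view setup (invented values, not from the patent)
rng = np.random.default_rng(0)
K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
R, t = np.eye(3), np.array([1.0, 0.2, 0.1])        # second-view motion
P = rng.uniform([-2, -2, 4], [2, 2, 8], size=(20, 3))
p1 = project(K, np.eye(3), np.zeros(3), P)
p2 = project(K, R, t, P)

F = eight_point_fundamental(p1, p2)
h = lambda p: np.hstack([p, np.ones((len(p), 1))])
err = np.abs(np.sum(h(p2) * (h(p1) @ F.T), axis=1))  # epipolar residuals
```

A production system would add Hartley normalization and RANSAC on top of this linear solve before decomposing F into [R_t|T_t].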
2) Pose and map joint optimization
When step 1 decides to add a new key frame K, K is inserted into the key frame list; then, based on the new key frame, its associated key frames with a co-view relationship and their visible 3D map points, joint bundle optimization of map point coordinates and key frame poses is carried out by solving a minimum-reprojection-error problem. Assume the associated key frame set of the current frame I is {K_cj | j = 1, 2, ..., c} with poses {[R_j|T_j] | j = 1, 2, ..., c}, and the set of 3D map points seen by the cameras of these key frames is {P_i | i = 1, 2, ..., d}; the 2D feature point corresponding to the three-dimensional point P_i in key frame K_cj is p_ij, and p'_ij is the 2D projected pixel of P_i in K_cj. The optimization problem is as shown in formula (3):
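The joint optimization of formula (3) is a bundle-adjustment problem. A minimal NumPy sketch of its residual, the reprojection errors p_ij − p'_ij stacked over all observations, can look as follows; a real implementation would hand this residual to a nonlinear least-squares solver. The scene, intrinsics and poses here are synthetic, and the demo only verifies that the residual vanishes at the ground truth and grows when one pose is perturbed.

```python
import numpy as np

def reprojection_residuals(K, poses, points, observations):
    """Stack the residuals p_ij - p'_ij of formula (3) over all
    (keyframe j, map point i, observed pixel) triples."""
    res = []
    for j, i, p_obs in observations:
        R, T = poses[j]
        q = K @ (R @ points[i] + T)        # project P_i into keyframe K_cj
        res.append(p_obs - q[:2] / q[2])
    return np.concatenate(res)

# synthetic scene: 2 keyframes, 30 map points (all values invented)
rng = np.random.default_rng(1)
K = np.array([[700., 0., 320.], [0., 700., 240.], [0., 0., 1.]])
points = rng.uniform([-1, -1, 5], [1, 1, 9], size=(30, 3))
poses = [(np.eye(3), np.zeros(3)), (np.eye(3), np.array([0.5, 0.0, 0.0]))]
obs = [(j, i, (K @ (R @ points[i] + T))[:2] / (K @ (R @ points[i] + T))[2])
       for j, (R, T) in enumerate(poses) for i in range(len(points))]

r0 = reprojection_residuals(K, poses, points, obs)   # at ground truth: ~0
perturbed = [poses[0], (poses[1][0], poses[1][1] + np.array([0.01, 0., 0.]))]
r1 = reprojection_residuals(K, perturbed, points, obs)
```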
3) Three-dimensional map geographic information retrieval
The ECEF rectangular coordinate system is chosen here as the reference geographic coordinate system. Assume N key frames {K_1, K_2, ..., K_N} with corresponding poses {[R_1|T_1], [R_2|T_2], ..., [R_N|T_N]} and corresponding GPS values {g_1, g_2, ..., g_N}; then the coordinates of the key frame cameras in the three-dimensional map coordinate system are {T_1, T_2, ..., T_N}, and their coordinates in the reference geographic coordinate system are {e_1, e_2, ..., e_N} according to the fixed conversion between GPS and the ECEF coordinate system. When N is greater than 3, the transformation between the two spatial rectangular coordinate systems, comprising a scale factor s, a rotation matrix R and a translation matrix T, is solved from the one-to-one correspondence between the point set {T_1, T_2, ..., T_N} in the three-dimensional map coordinate system and the point set {e_1, e_2, ..., e_N} in the geographic coordinate system. The calculation of the scale factor s is shown in formulas (4) and (5), where O_T and O_e are the centroids of the two point sets. To solve for the rotation matrix R, a matrix H is formed according to formula (6), and an SVD decomposition of H yields the rotation matrix, as shown in formula (7). Based on s and R, the translation matrix T is computed according to formula (8).
H = (T_1 − O_T, ..., T_N − O_T)^T · (e_1 − O_e, ..., e_N − O_e)   (6)
It should be noted that when N is less than or equal to 3, the transformation matrix cannot be estimated by the above formulas. In that case s is set to 0, and the rotation and translation matrices are set to the identity matrix and the zero matrix, respectively.
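The scale, rotation and translation recovery of formulas (4)-(8) matches the well-known centroid-and-SVD (Umeyama/Kabsch) construction. Since only formula (6) survives in this text, the sketch below assumes the standard variant: scale from the ratio of centered norms, rotation from the SVD of H, and translation from the centroids.

```python
import numpy as np

def similarity_from_correspondences(T_pts, e_pts):
    """Solve e_i ~ s * R @ T_i + T for scale s, rotation R and translation T
    from N paired points: centroids and norm ratio for s (assumed form of
    formulas (4)-(5)), H as in formula (6), SVD of H for R (formula (7)),
    then T (formula (8))."""
    O_T, O_e = T_pts.mean(axis=0), e_pts.mean(axis=0)
    A, B = T_pts - O_T, e_pts - O_e
    s = np.linalg.norm(B) / np.linalg.norm(A)
    H = A.T @ B                                     # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no mirror
    R = Vt.T @ D @ U.T
    T = O_e - s * R @ O_T
    return s, R, T

# demo: recover a known similarity transform (invented values)
rng = np.random.default_rng(3)
a = np.deg2rad(30)
R_true = np.array([[np.cos(a), -np.sin(a), 0.],
                   [np.sin(a),  np.cos(a), 0.],
                   [0., 0., 1.]])
X = rng.normal(size=(10, 3))
e = 2.5 * X @ R_true.T + np.array([100., -50., 7.])
s, R, T = similarity_from_correspondences(X, e)
```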
4) Efficient key frame fast screening
When the number of images buffered in step 1 equals b and the geographic transformation scale s obtained in step 3 is non-zero, first fetch, with a first-in-first-out strategy, the image I_0 to be processed and the pose and GPS value g_0(lon, lat, alt) of its camera from the head of buffer B. Then, from the approximate flying height H (m) of the UAV, the field angle FOV of the onboard camera and the image resolution w_p × h_p, calculate the monitored field-of-view range W_r(m) × H_r(m), as shown in formula (9). Finally, with g_0 as the center and 2 times the field-of-view diagonal length l, i.e. 2l, as the radius, define the search region for valid key frames that have a co-view relationship with the camera of image I_0; the GPS value g'(lon', lat', alt') of a key frame camera inside this region should satisfy formula (10).
where R = 6371004 m is the average radius of the earth and rad(·) is a function converting degrees to radians.
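Step 4 needs a GPS-to-ECEF conversion (also used when estimating the transform in step 3) and the field-of-view footprint of formula (9). The sketch below uses the standard WGS-84 geodetic-to-ECEF formulas; the footprint function assumes formula (9) has the usual nadir-looking pinhole form W_r = 2H·tan(FOV/2), which is an assumption since the formula image is not reproduced here.

```python
import math

WGS84_A = 6378137.0           # WGS-84 semi-major axis (m)
WGS84_E2 = 6.69437999014e-3   # first eccentricity squared

def geodetic_to_ecef(lon_deg, lat_deg, alt_m):
    """Convert GPS (longitude, latitude, altitude) to ECEF rectangular
    coordinates under the WGS-84 ellipsoid."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    return ((n + alt_m) * math.cos(lat) * math.cos(lon),
            (n + alt_m) * math.cos(lat) * math.sin(lon),
            (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat))

def footprint(height_m, fov_h_deg, fov_v_deg):
    """Rough nadir-looking ground footprint W_r x H_r (m) of a camera at
    the given flying height; assumed pinhole form of formula (9)."""
    return (2.0 * height_m * math.tan(math.radians(fov_h_deg) / 2.0),
            2.0 * height_m * math.tan(math.radians(fov_v_deg) / 2.0))
```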
5) Image triangle mesh generation
Based on the selected valid key frames, first obtain their visible three-dimensional point set P = {P_i | i = 1, 2, ..., D} in the three-dimensional map, avoiding the heavy subsequent workload of introducing all map points. Then, using the camera intrinsic matrix K and the camera pose [R_0|T_0] of image I_0, re-project the three-dimensional point set according to formula (11) to obtain a two-dimensional point set {p_i = (u_i, v_i) | i = 1, 2, ..., D}, and screen it to keep the set p' of two-dimensional pixel points that fall inside image I_0, i.e. with P_i ∈ P and 0 ≤ u_i ≤ w_p, 0 ≤ v_i ≤ h_p. Finally, a Delaunay triangulation algorithm divides the discrete point set p' in the image into triangles by point-by-point insertion, guaranteeing that each triangle is formed by the three nearest points; this yields a uniform, smooth triangular mesh satisfying the excellent properties of uniqueness, optimality and maximal regularity:
s_i · (u_i, v_i, 1)^T = K(R_0 · P_i + T_0),  i = 1, 2, ..., D   (11)
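The re-projection of equation (11), the in-image screening and the triangulation can be sketched as follows, using scipy.spatial.Delaunay for the mesh (a library choice assumed here, not named by the patent). The intrinsics and map points are synthetic, with one point placed behind the camera to exercise the screening.

```python
import numpy as np
from scipy.spatial import Delaunay

def project_and_mesh(K, R0, T0, P, w_p, h_p):
    """Re-project map points into image I_0 per equation (11), keep those
    inside the image with positive depth, and Delaunay-triangulate them."""
    q = (K @ (R0 @ P.T + T0[:, None])).T          # s_i * (u_i, v_i, 1)^T
    uv = q[:, :2] / q[:, 2:]
    keep = ((q[:, 2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] <= w_p)
            & (uv[:, 1] >= 0) & (uv[:, 1] <= h_p))
    pts = uv[keep]
    return pts, Delaunay(pts).simplices           # triangle vertex indices

# synthetic camera and map points (invented values)
rng = np.random.default_rng(2)
K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
P_good = np.column_stack([rng.uniform(-3, 3, 50), rng.uniform(-2, 2, 50),
                          np.full(50, 10.0)])
P = np.vstack([P_good, [[0., 0., -10.]]])         # one point behind camera
pts, tri = project_and_mesh(K, np.eye(3), np.zeros(3), P, 640, 480)
```

The point behind the camera is dropped by the depth test, so only the 50 in-view points enter the triangulation.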
6) Image target longitude and latitude estimation
First, traverse all triangles of the triangular mesh of image I_0 and identify the triangle ΔS_1S_2S_3 containing the target pixel point p_t(u, v); this triangle is unique. Let the pixel coordinates of its three vertices be S_1(u_1, v_1), S_2(u_2, v_2), S_3(u_3, v_3), with corresponding three-dimensional map coordinates P_1, P_2, P_3. Then calculate the linear interpolation coefficients k_1, k_2, k_3 of the three vertices S_1, S_2, S_3 according to formula (12), and estimate the three-dimensional coordinate P_t(x, y, z) of pixel p_t in the map according to formula (13). Finally, apply the geographic transformation [s, R|T] obtained in step 3 to calculate the ECEF coordinate e_t(X, Y, Z) of the target in the reference geographic coordinate system, and from it obtain the GPS value of the target.
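Linear interpolation inside a triangle amounts to barycentric interpolation, which is assumed here to be the content of the missing formulas (12)-(13); the vertex pixel and 3D coordinates below are invented for the demo. After P_t is obtained, the step-3 transform (e_t = s·R·P_t + T) would map it to ECEF.

```python
import numpy as np

def interp_in_triangle(pt, S, P):
    """Estimate the 3D map coordinate of target pixel pt(u, v) from the
    triangle's vertex pixel coordinates S (3x2) and vertex 3D map
    coordinates P (3x3), by barycentric (linear) interpolation."""
    # solve k1*S1 + k2*S2 + k3*S3 = pt subject to k1 + k2 + k3 = 1
    A = np.vstack([S.T, np.ones(3)])               # 3x3 linear system
    k = np.linalg.solve(A, np.array([pt[0], pt[1], 1.0]))
    return k, k @ P                                # coefficients, P_t(x, y, z)

S = np.array([[0., 0.], [10., 0.], [0., 10.]])     # vertex pixels (invented)
P = np.array([[0., 0., 5.], [1., 0., 5.], [0., 1., 6.]])  # vertex 3D coords
k, Pt = interp_in_triangle(np.array([10., 0.]), S, P)     # target at vertex S2
```

A target lying exactly on a vertex gets coefficient 1 for that vertex, and a target at the centroid averages the three vertex coordinates.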
The method reduces the introduction of sensor error sources as far as possible: using only an online-acquired aerial image sequence and GPS information, it generates and optimizes the onboard camera pose and a sparse geographic map in real time, projects and triangulates the map points within the visible range on the image to quickly obtain a triangular mesh of the image carrying geographic information, and then accurately estimates the GPS value of a target pixel by linear interpolation inside the enclosing triangle based on the mesh vertices. The method obtains good ground-target geolocation results in various complex environments, at different flying heights and at different attitude angles. For urban scenes with rich three-dimensional structure, the positioning error is less than 1 m at flying heights of 0-2000 m.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the spirit and scope of the invention.
Claims (3)
1. An aerial image target geographic positioning method based on three-dimensional map generation is characterized by comprising the following steps:
Step 1: pose estimation and three-dimensional map generation: for an aerial image acquired on line, firstly extracting image feature points and performing feature matching with adjacent key frames; then, determining and optimizing the pose of the current camera by using a pose estimation method, generating three-dimensional map points and judging whether a key frame is inserted or not; finally, the current image and the obtained corresponding camera pose are temporarily stored in a buffer area B; the buffer area can support and store b images and the pose and GPS of the corresponding camera;
Step 2: joint pose and map optimization: insert the new key frame K determined in step 1 into the key frame list, and update the newly added three-dimensional points into the three-dimensional map; then, based on the new key frame, its associated key frames with a common-view relationship, and the corresponding three-dimensional points, jointly optimize the map point coordinates and key frame poses by minimizing the reprojection error;
step 3: three-dimensional map geographic information recovery: when the number of frames N in the key frame list is greater than 3, determine the geographic transformation between the reference geographic coordinate system and the three-dimensional map coordinate system from the one-to-one correspondence between the key frame poses in the map and their GPS values; the transformation comprises a scale coefficient s, a rotation matrix R and a translation matrix T; the ECEF rectangular coordinate system is selected as the reference geographic coordinate system, so the GPS coordinates must be converted into ECEF coordinates when estimating the transformation;
step 4: rapid screening of effective key frames: when the number of images cached in step 1 equals m and a valid geographic transformation exists from step 3, fetch the image to be processed, together with its corresponding camera pose and GPS value, from the head of buffer B, and then estimate the geographic position of the target in that image; first, compute the monitored field-of-view range from the GPS value of the image, the approximate flying height of the unmanned aerial vehicle, and the field angle of the onboard camera; then, taking this position as the center and twice the diagonal length of the field-of-view range as the radius, delimit a region and determine the effective key frames within it in the three-dimensional map; the cameras corresponding to these key frames are likely to have a common-view relationship with the camera corresponding to the image;
step 5: image triangular mesh generation: based on the selected effective key frames, first obtain the set of three-dimensional points in the map visible to those key frames, avoiding the large workload that introducing all map points would cause; then re-project these three-dimensional points using the pose of the image and retain the set of two-dimensional pixel points that project inside the image; finally, apply Delaunay triangulation to partition the discrete two-dimensional pixel point set of the image into triangles, each formed by the three nearest points, yielding a uniform and smooth triangular mesh;
Step 6: estimating the longitude and latitude of an image target: first, based on the triangular mesh generated for the image, identify the triangle containing the target pixel and obtain the pixel coordinates of its three vertices; then, according to the distribution of the target point and the three vertices in image pixel coordinates, estimate the three-dimensional coordinate of the target pixel point in the map from the three vertices' three-dimensional map coordinates by triangle-interior linear interpolation; finally, compute the ECEF coordinates of the target in the reference geographic coordinate system using the geographic transformation obtained in step 3, and thereby obtain the GPS position of the target.
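Steps 4 and 5 above can be sketched together: a footprint-radius screen over keyframe map positions, followed by pinhole re-projection of the candidate 3-D map points into the image. This is an illustrative sketch under assumed pinhole intrinsics (fx, fy, cx, cy); none of the names come from the patent:

```python
import math

# Illustrative sketch of steps 4-5: keyframe screening by a radius of
# twice the field-of-view footprint diagonal, then pinhole re-projection
# of candidate 3-D map points, keeping only in-image, in-front pixels.

def footprint_diagonal(height_m, fov_h_deg, fov_v_deg):
    """Diagonal of the ground footprint for an assumed nadir-looking camera."""
    w = 2.0 * height_m * math.tan(math.radians(fov_h_deg) / 2.0)
    h = 2.0 * height_m * math.tan(math.radians(fov_v_deg) / 2.0)
    return math.hypot(w, h)

def screen_keyframes(center, keyframe_xy, radius):
    """Indices of keyframes whose map positions fall inside the circle."""
    cx, cy = center
    return [i for i, (x, y) in enumerate(keyframe_xy)
            if math.hypot(x - cx, y - cy) <= radius]

def project_points(points_w, R, t, fx, fy, cx, cy, width, height):
    """Pinhole projection Xc = R*Xw + t; keep pixels inside the image."""
    kept = []
    for X in points_w:
        xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        if xc[2] <= 0:  # behind the camera
            continue
        u = fx * xc[0] / xc[2] + cx
        v = fy * xc[1] / xc[2] + cy
        if 0.0 <= u < width and 0.0 <= v < height:
            kept.append((u, v))
    return kept
```

The retained 2-D points would then be handed to an off-the-shelf Delaunay triangulator (e.g. `scipy.spatial.Delaunay`) to build the image mesh of step 5.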
2. The aerial image target geographic positioning method based on three-dimensional map generation according to claim 1, wherein the pose estimation method in step 1 is specifically: the initialization process is 2D-2D, and the frame tracking process is 3D-2D.
3. The aerial image target geographic positioning method based on three-dimensional map generation according to claim 1, wherein in step 1, generating three-dimensional map points and judging whether to insert a key frame is specifically: judging whether to insert a key frame according to whether the overlap-area ratio between the current image and the latest key frame is greater than a set threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111179403.3A CN113781567B (en) | 2021-10-08 | 2021-10-08 | Aerial image target geographic positioning method based on three-dimensional map generation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113781567A CN113781567A (en) | 2021-12-10 |
CN113781567B true CN113781567B (en) | 2024-05-31 |
Family
ID=78855185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111179403.3A Active CN113781567B (en) | 2021-10-08 | 2021-10-08 | Aerial image target geographic positioning method based on three-dimensional map generation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113781567B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115439528B (en) * | 2022-04-26 | 2023-07-11 | 亮风台(上海)信息科技有限公司 | Method and equipment for acquiring image position information of target object |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105678754A (en) * | 2015-12-31 | 2016-06-15 | 西北工业大学 | Unmanned aerial vehicle real-time map reconstruction method |
CN108986037A (en) * | 2018-05-25 | 2018-12-11 | 重庆大学 | Monocular vision odometer localization method and positioning system based on semi-direct method |
CN109945853A (en) * | 2019-03-26 | 2019-06-28 | 西安因诺航空科技有限公司 | A kind of geographical coordinate positioning system and method based on 3D point cloud Aerial Images |
CN110501017A (en) * | 2019-08-12 | 2019-11-26 | 华南理工大学 | A kind of Mobile Robotics Navigation based on ORB_SLAM2 ground drawing generating method |
WO2020155616A1 (en) * | 2019-01-29 | 2020-08-06 | 浙江省北大信息技术高等研究院 | Digital retina-based photographing device positioning method |
CN112434709A (en) * | 2020-11-20 | 2021-03-02 | 西安视野慧图智能科技有限公司 | Aerial survey method and system based on real-time dense three-dimensional point cloud and DSM of unmanned aerial vehicle |
Non-Patent Citations (1)
Title |
---|
Real-time registration algorithm for aerial video based on scene complexity and invariant features; Yang Tao; Zhang Yanning; Zhang Xiuwei; Zhang Xingong; Acta Electronica Sinica; 2010-05-15 (No. 05); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107945220B (en) | Binocular vision-based reconstruction method | |
US8666661B2 (en) | Video navigation | |
JP4685313B2 (en) | Method for processing passive volumetric image of any aspect | |
US8610708B2 (en) | Method and apparatus for three-dimensional image reconstruction | |
CN112419374B (en) | Unmanned aerial vehicle positioning method based on image registration | |
CN106780729A (en) | A kind of unmanned plane sequential images batch processing three-dimensional rebuilding method | |
CN112567201A (en) | Distance measuring method and apparatus | |
CN110345937A (en) | Appearance localization method and system are determined in a kind of navigation based on two dimensional code | |
CN112923933A (en) | Laser radar SLAM algorithm and inertial navigation fusion positioning method | |
CN112162297B (en) | Method for eliminating dynamic obstacle artifacts in laser point cloud map | |
JP5762131B2 (en) | CALIBRATION DEVICE, CALIBRATION DEVICE CALIBRATION METHOD, AND CALIBRATION PROGRAM | |
CN114459467B (en) | VI-SLAM-based target positioning method in unknown rescue environment | |
CN114217665B (en) | Method and device for synchronizing time of camera and laser radar and storage medium | |
Lee et al. | Vision-based terrain referenced navigation for unmanned aerial vehicles using homography relationship | |
CN113447949B (en) | Real-time positioning system and method based on laser radar and prior map | |
Zhang et al. | Online ground multitarget geolocation based on 3-D map construction using a UAV platform | |
Khoshelham et al. | Vehicle positioning in the absence of GNSS signals: Potential of visual-inertial odometry | |
CN113781567B (en) | Aerial image target geographic positioning method based on three-dimensional map generation | |
Han et al. | Geolocation of multiple targets from airborne video without terrain data | |
CN110160503B (en) | Unmanned aerial vehicle landscape matching positioning method considering elevation | |
US11514588B1 (en) | Object localization for mapping applications using geometric computer vision techniques | |
Liu et al. | Implementation and analysis of tightly integrated INS/stereo VO for land vehicle navigation | |
Rodriguez III et al. | A direct geolocation method for aerial imaging surveys of invasive plants | |
CN115830116A (en) | Robust visual odometer method | |
CN116124094A (en) | Multi-target co-location method based on unmanned aerial vehicle reconnaissance image and combined navigation information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||