CN114842220B - Unmanned aerial vehicle visual positioning method based on multi-source image matching - Google Patents
- Publication number: CN114842220B (application CN202210321285.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/74 — Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
- G06N3/045 — Neural networks; combinations of networks
- G06N3/08 — Neural networks; learning methods
- G06V10/42 — Global feature extraction by analysis of the whole pattern
- G06V10/761 — Proximity, similarity or dissimilarity measures
- G06V10/774 — Generating sets of training patterns; bootstrap methods
- G06V10/803 — Fusion of input or preprocessed data at the feature extraction level
- G06V10/82 — Image or video recognition using neural networks
- G06V20/13 — Terrestrial scenes; satellite images
- G06V20/17 — Terrestrial scenes taken from planes or by drones
- G06T2207/10032 — Satellite or aerial image; remote sensing
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- Y02T10/40 — Engine management systems
Abstract
The invention provides an unmanned aerial vehicle (UAV) visual positioning method based on multi-source image matching. First, a feature extraction network is trained with existing multi-source matched images and with images of the real UAV operating scene; the trained network then extracts features from the UAV image, and the UAV position is estimated from previous positioning points, narrowing the positioning search range; finally, features are extracted from the satellite imagery within the estimated position range and matched against the UAV image features with a similarity measure to obtain the positioning result. The method handles the heterogeneous matching problem between satellite and UAV images, applies to varied scenes, is computationally light, and meets the real-time positioning requirement of a UAV platform.
Description
Technical Field
The invention belongs to the technical field of multi-source remote sensing matching, and particularly relates to an unmanned aerial vehicle visual positioning method based on multi-source image matching.
Background
Unmanned aerial vehicle (UAV) positioning is usually provided by satellite navigation, but since this is passive signal reception, the navigation signal is easily jammed in special scenarios. Once the signal is lost, the accumulated error of the inertial measurement unit grows ever larger over time. Computer vision, by contrast, processes and analyzes visual information to detect, identify, track and locate targets, and has strong anti-interference capability. UAV positioning based on visual matching can therefore solve the UAV positioning problem under satellite-denied conditions.
UAV visual positioning methods fall roughly into three categories: map-free methods (e.g., visual odometry), map-building methods (e.g., simultaneous localization and mapping, SLAM), and map-based methods (e.g., image matching). Each has its advantages, disadvantages and range of application: map-building and map-free methods require only the cameras mounted on the UAV, but their inter-frame motion estimation errors accumulate severely; image-matching methods require an additional pre-recorded geo-referenced image library, but recover the absolute position of the UAV without accumulating error.
Image matching methods divide mainly into traditional methods and deep learning methods. Traditional methods extract features with hand-crafted descriptors and seek correspondences between local features (regions, lines, points) through descriptor similarity and/or spatial geometric relations. The use of locally salient features lets such methods run quickly and remain robust to noise, complex geometric deformation and significant radiometric differences. However, with the spread of higher-resolution, larger-sized data, they can no longer meet the demands for more correspondences, higher accuracy and more flexible application. With the release of large labeled datasets, deep learning methods, convolutional neural networks (CNNs) in particular, have achieved very good results in image matching. The main advantage of a CNN is that it automatically learns features useful for matching under the guidance of labeled data. Compared with hand-crafted descriptors, learned features contain not only low-level spatial information but also high-level semantic information. Thanks to this strong automatic feature extraction capability, deep learning methods achieve higher matching accuracy.
Although image-matching-based UAV visual positioning has significant advantages, several problems remain: first, the imaging conditions of the geo-referenced image and the UAV image differ, so multi-source image matching faces a heterogeneity problem; second, a small amount of annotated data is hard to adapt to varied application scenes; finally, the UAV platform imposes strict real-time requirements.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a UAV visual positioning method based on multi-source image matching. First, a feature extraction network is trained with existing multi-source matched images and with images of the real UAV operating scene; the trained network then extracts features from the UAV image, and the UAV position is estimated from previous positioning points, narrowing the positioning search range; finally, features are extracted from the satellite imagery within the estimated position range and matched against the UAV image features with a similarity measure to obtain the positioning result. The method handles the heterogeneous matching problem between satellite and UAV images, applies to varied scenes, is computationally light, and meets the real-time positioning requirement of a UAV platform.
The unmanned aerial vehicle visual positioning method based on multi-source image matching is characterized by comprising the following steps:
step 1: train a twin network for feature extraction with matched satellite and virtual unmanned aerial vehicle images collected from Google Earth Pro, and save the network parameters; then retrain the network with labeled real-scene unmanned aerial vehicle take-off position images to obtain a trained feature extraction network adapted to the real operating scene; the twin network consists of, in sequence, convolution layer 1, max-pooling layer 1, convolution layer 2, max-pooling layer 2, convolution layers 3, 4 and 5, max-pooling layer 3, and convolution layers 6 and 7; the kernel of convolution layer 1 is 7×7×24 with stride 1, of convolution layer 2 is 5×5×24 with stride 1, of convolution layers 3 and 4 is 3×3×96 with stride 1, of convolution layer 5 is 3×3×64 with stride 1, of convolution layer 6 is 3×3×128 with stride 1, and of convolution layer 7 is 8×8×128 with stride 1; max-pooling layers 1 and 2 use 3×3 kernels with stride 2, and max-pooling layer 3 uses a 6×6 kernel with stride 4;
step 2: cut the unmanned aerial vehicle image uniformly into three parts horizontally and three vertically to form a nine-grid, downsample each cell to 128×128, extract features with the network trained in step 1, and let the 9 extracted features together form the descriptor I′_k of the whole unmanned aerial vehicle image;
Step 3: take the n most recent positioning points of the unmanned aerial vehicle at the current moment for position estimation; from the longitude and latitude of each point and the flight time between consecutive points, compute the flight speeds over the n−1 position intervals in the longitude and latitude directions, recording the speed sequence over the n−1 longitude intervals as V_long and the speed sequence over the n−1 latitude intervals as V_lat; treating the motion of the unmanned aerial vehicle as uniform, the mean speed of V_long and of V_lat obeys a t distribution, and the speed range in the longitude or latitude direction is [V̄_* − t_{α/2}(n−2)·S_*/√(n−1), V̄_* + t_{α/2}(n−2)·S_*/√(n−1)], where V̄_* and S_*² denote the mean and variance of the speed in the longitude or latitude direction, α is the confidence parameter of the t distribution, and t_{α/2}(n−2) is the two-sided quantile of the t distribution for the n−1 sampling intervals at confidence 1−α, obtained from a t-distribution table; n takes a value from 3 to 6 and α is 0.005; the first n positioning points are initialized at the initial moment with the take-off position;
multiply the speed range by the time difference between the current moment and the flight moment of the nth position point to obtain the displacement range over that period; if the obtained displacement range is smaller than 20 m, widen it to 20 m; adding the displacement range to the coordinates of the nth position point gives the current position range of the unmanned aerial vehicle;
step 4: within the position range obtained in step 3, cut a satellite image every 10 m; each time, cut it uniformly into three parts horizontally and three vertically to form a nine-grid, downsample each cell to 128×128, extract features with the network trained in step 1, and put the 9 features of each satellite image, together with its position label, into a library to be matched;
step 5: traverse the library to be matched, computing the L2 norm between each of the 9 features of every satellite image and the feature of the corresponding nine-grid cell of the unmanned aerial vehicle image obtained in step 2; the sum of the 9 L2 norm values serves as the similarity measure of the whole image, so every satellite image in the library obtains a similarity value with respect to the unmanned aerial vehicle image, and the position label of the satellite image with the smallest similarity value is taken as the current positioning result; if the difference between the two smallest similarity values exceeds the threshold β, the current positioning result is considered reliable and is taken as a positioning point; β is set to 0.25;
step 6: repeat steps 2 to 5 until the visual navigation flight task ends.
The beneficial effects of the invention are as follows: because the network is trained on both virtual dataset images and real-scene images, it better extracts global high-order semantic features and resolves the heterogeneity between multi-source images; the position estimation step effectively narrows the matching search range, improves positioning accuracy, and preserves the real-time performance of the system.
Drawings
FIG. 1 is a flow chart of a method for visual positioning of an unmanned aerial vehicle based on multi-source image matching;
FIG. 2 is a flow chart of a position estimation of the present invention;
fig. 3 is a result image of unmanned aerial vehicle visual positioning at Northwestern Polytechnical University using the method of the invention.
Detailed Description
The invention will be further illustrated with reference to the following figures and examples, which include but are not limited to the following examples.
As shown in fig. 1, the invention provides an unmanned aerial vehicle visual positioning method based on multi-source image matching, which comprises the following specific implementation processes:
Step 1: train a twin network for feature extraction with matched satellite and virtual unmanned aerial vehicle images collected from Google Earth Pro, and save the network parameters; then retrain the network with labeled real-scene unmanned aerial vehicle take-off position images to obtain a trained feature extraction network adapted to the real operating scene. The specific structure of the twin network is shown in Table 1, where Conv denotes a convolution layer, Pooling a pooling layer, C a convolution operation, and MP a max-pooling operation.
TABLE 1
Layer | Type | Output size | Kernel size | Stride |
Conv1 | C | 128×128×24 | 7×7×24 | 1 |
Pooling1 | MP | 64×64×24 | 3×3 | 2 |
Conv2 | C | 64×64×64 | 5×5×24 | 1 |
Pooling2 | MP | 32×32×64 | 3×3 | 2 |
Conv3 | C | 32×32×96 | 3×3×96 | 1 |
Conv4 | C | 32×32×96 | 3×3×96 | 1 |
Conv5 | C | 32×32×64 | 3×3×64 | 1 |
Pooling3 | MP | 8×8×64 | 6×6 | 4 |
Conv6 | C | 8×8×128 | 3×3×128 | 1 |
Conv7 | C | 1×1×128 | 8×8×128 | 1 |
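As a sanity check, the layer hyper-parameters of Table 1 can be walked through with the standard convolution/pooling output-size formula. This is an illustrative sketch, not code from the patent: the padding values are assumptions inferred so that each layer reproduces the output size listed in the table (the patent states only kernel sizes and strides).

```python
# Verify that Table 1's kernel/stride choices yield the listed output sizes.
# Padding per layer is an inferred assumption, not given in the patent.

def out_size(size, kernel, stride, pad):
    """Spatial output size of a square conv/pool layer (floor division)."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, assumed padding, expected spatial output)
layers = [
    ("Conv1",    7, 1, 3, 128),
    ("Pooling1", 3, 2, 1,  64),
    ("Conv2",    5, 1, 2,  64),
    ("Pooling2", 3, 2, 1,  32),
    ("Conv3",    3, 1, 1,  32),
    ("Conv4",    3, 1, 1,  32),
    ("Conv5",    3, 1, 1,  32),
    ("Pooling3", 6, 4, 1,   8),
    ("Conv6",    3, 1, 1,   8),
    ("Conv7",    8, 1, 0,   1),
]

size = 128  # each nine-grid cell is downsampled to 128x128 before the network
for name, k, s, p, expected in layers:
    size = out_size(size, k, s, p)
    assert size == expected, (name, size, expected)
print(size)  # 1 -- the final 1x1x128 map is a 128-dimensional descriptor
```

The chain confirms that a 128×128 cell collapses to a single 128-channel vector at Conv7, which is what makes the per-cell L2 comparison in step 5 a plain vector distance.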
Step 2: cut the unmanned aerial vehicle image uniformly into three parts horizontally and three vertically to form a nine-grid, downsample each cell to 128×128, extract features with the network trained in step 1, and let the 9 extracted features together form the descriptor I′_k of the whole unmanned aerial vehicle image.
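The nine-grid cutting of step 2 can be sketched as follows. This is a minimal illustration with assumed names (`split_nine_grid`); it only computes the crop geometry, while the real pipeline then downsamples each cell to 128×128 and runs it through the trained network.

```python
# Hedged sketch of the step-2 nine-grid cutting: split an HxW image into a
# 3x3 grid of (approximately) equal cells, returned as crop rectangles.

def split_nine_grid(h, w):
    """Return (top, left, height, width) for each cell of a 3x3 uniform grid."""
    cells = []
    for i in range(3):          # row index
        for j in range(3):      # column index
            top, left = i * h // 3, j * w // 3
            bottom, right = (i + 1) * h // 3, (j + 1) * w // 3
            cells.append((top, left, bottom - top, right - left))
    return cells

cells = split_nine_grid(1080, 1920)
print(len(cells))   # 9
print(cells[0])     # (0, 0, 360, 640)
```

The 9 per-cell feature vectors, concatenated in grid order, form the image descriptor I′_k used for matching.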
Step 3: take the n most recent trusted positioning points of the unmanned aerial vehicle at the current moment for position estimation (by default the position information of the take-off area is known, so the earliest n points can be initialized directly at the take-off position; thereafter the n trusted points are continuously updated through matching during flight); from the longitude and latitude of each point and the flight time between consecutive points, compute the flight speeds over the n−1 position intervals in the longitude and latitude directions, recording the speed sequence over the n−1 longitude intervals as V_long and the speed sequence over the n−1 latitude intervals as V_lat; treating the motion of the unmanned aerial vehicle as uniform, the mean speed of V_long and of V_lat obeys a t distribution, and the speed range in the longitude or latitude direction is [V̄_* − t_{α/2}(n−2)·S_*/√(n−1), V̄_* + t_{α/2}(n−2)·S_*/√(n−1)], where V̄_* and S_*² denote the mean and variance of the speed in the longitude or latitude direction, α is the confidence parameter of the t distribution, and t_{α/2}(n−2) is the two-sided quantile of the t distribution for the n−1 sampling intervals at confidence 1−α, obtained from a t-distribution table; n takes a value from 3 to 6 and α is 0.005;
multiply the speed range by the time difference between the current moment and the flight moment of the nth position point to obtain the displacement range over that period; if the obtained displacement range is smaller than 20 m, widen it to 20 m; adding the displacement range to the coordinates of the nth position point gives the current position range of the unmanned aerial vehicle;
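The step-3 interval computation can be sketched in a few lines. This is an illustrative reading of the method, not the patent's code: the t critical values are quoted from a standard table (as the patent prescribes) for α = 0.005 and degrees of freedom n−2, and the 20 m rule is interpreted here as a minimum half-window around the interval midpoint.

```python
# Sketch of the step-3 position estimation under the patent's assumptions
# (uniform motion, t-distributed mean speed). T_TABLE holds two-sided
# critical values t_{alpha/2}(df) for alpha = 0.005, df = n - 2 in 1..4.
import math

T_TABLE = {1: 127.321, 2: 14.089, 3: 7.453, 4: 5.598}

def speed_range(speeds):
    """Confidence interval for the mean of the n-1 interval speeds (m/s)."""
    m = len(speeds)                       # m = n - 1 intervals
    mean = sum(speeds) / m
    var = sum((v - mean) ** 2 for v in speeds) / (m - 1)   # sample variance
    half = T_TABLE[m - 1] * math.sqrt(var) / math.sqrt(m)  # t * S / sqrt(n-1)
    return mean - half, mean + half

def displacement_range(speeds, dt, floor=20.0):
    """Displacement interval over dt seconds, widened to at least `floor` m."""
    lo, hi = speed_range(speeds)
    lo, hi = lo * dt, hi * dt
    if hi - lo < floor:                   # keep a minimum 20 m search window
        mid = (lo + hi) / 2
        lo, hi = mid - floor / 2, mid + floor / 2
    return lo, hi

lo, hi = displacement_range([9.8, 10.1, 10.0, 10.1], dt=2.0)
```

Run separately for the longitude and latitude speed sequences, this yields the rectangular search region that step 4 covers with 10 m satellite crops.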
the above-described position estimation process is shown in fig. 2.
Step 4: within the position range obtained in step 3, cut a satellite image every 10 m; each time, cut it uniformly into three parts horizontally and three vertically to form a nine-grid, downsample each cell to 128×128, extract features with the network trained in step 1, and put the 9 features of each satellite image, together with its position label, into a library to be matched.
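The candidate-generation part of step 4 amounts to sampling crop centres on a 10 m lattice over the estimated region. A minimal sketch, with assumed names and with the region given as metre offsets (east, north) from a reference point:

```python
# Sketch of step-4 candidate generation: one satellite crop every 10 m
# within the estimated position range. Ranges are (min, max) in metres.

def candidate_positions(x_range, y_range, step=10.0):
    xs, ys = [], []
    x = x_range[0]
    while x <= x_range[1] + 1e-9:
        xs.append(x)
        x += step
    y = y_range[0]
    while y <= y_range[1] + 1e-9:
        ys.append(y)
        y += step
    return [(x, y) for x in xs for y in ys]

cands = candidate_positions((0.0, 20.0), (0.0, 20.0))
print(len(cands))  # 9 candidate crop centres on a 3x3 lattice
```

With the 20 m minimum window from step 3, the library to be matched therefore never drops below a 3×3 grid of candidates, which bounds the per-frame matching cost.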
Step 5: traverse the library to be matched, computing the L2 norm between each of the 9 features of every satellite image and the feature of the corresponding nine-grid cell of the unmanned aerial vehicle image obtained in step 2; the sum of the 9 L2 norm values serves as the similarity measure of the whole image, so every satellite image in the library obtains a similarity value with respect to the unmanned aerial vehicle image, and the position label of the satellite image with the smallest similarity value is taken as the current positioning result. If the difference between the two smallest similarity values exceeds the threshold β, the current matching result is considered highly reliable and is appended to the end of the sequence, updating the n trusted positioning points of step 3; β is set to 0.25.
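The step-5 matching rule can be sketched directly. This is an illustrative implementation with assumed names (`image_distance`, `match`); toy 2-d feature vectors stand in for the 128-d network descriptors.

```python
# Sketch of step-5 matching: the image similarity is the sum of the 9
# per-cell L2 distances; the smallest sum wins, and the gap between the two
# best sums must exceed beta = 0.25 for the fix to be trusted.
import math

def image_distance(uav_feats, sat_feats):
    """Sum of L2 norms between corresponding nine-grid cell features."""
    total = 0.0
    for u, s in zip(uav_feats, sat_feats):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(u, s)))
    return total

def match(uav_feats, library, beta=0.25):
    """Return (best_label, reliable) over a library of (label, feats) pairs."""
    scored = sorted((image_distance(uav_feats, f), label) for label, f in library)
    best, label = scored[0]
    second = scored[1][0] if len(scored) > 1 else float("inf")
    return label, (second - best) > beta

uav = [[0.0, 0.0]] * 9
library = [("A", [[0.1, 0.0]] * 9), ("B", [[1.0, 1.0]] * 9)]
label, reliable = match(uav, library)
print(label, reliable)  # A True
```

Only reliable fixes are appended to the trusted point sequence, so an ambiguous match never corrupts the step-3 speed statistics.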
Step 6: repeat steps 2 to 5 until the visual navigation flight task ends.
To verify the effectiveness of the method, simulation experiments were carried out on an Intel Xeon E5-2680 v4 2.40 GHz CPU with 64 GB of memory and an NVIDIA RTX 3090 graphics card, under the Ubuntu 16.04 operating system with PyTorch 1.7.1 and Python 3.8.5. The virtual data were built with the 3D modeling function of Google Earth Pro, taking central Los Angeles as the experimental area, 4942×3408 m² in extent. The area contains typical urban landscape, with many houses, streets and vehicles, as well as open suburban terrain, from which simulated unmanned aerial vehicle images were obtained. The converted aerial images and the cropped satellite images were annotated with the coordinates of each image, generating matched pairs of the two heterogeneous images, for a total of 10000 virtual samples. Real unmanned aerial vehicle images were meanwhile obtained by aerial photography and manually annotated to generate the corresponding matched satellite images. Each real-data scene comprises 600 sample pairs. To simulate the practical application scenario, a long-distance unmanned aerial vehicle video was also annotated: 4 minutes long, covering a 2 km flight at 170 m altitude, at 1920×1080 resolution and 30 frames per second.
Stochastic Gradient Descent (SGD) is used as the optimizer, with learning rate 0.01, momentum 0.9, and 50 epochs in total. In the virtual-data training stage the learning rate is divided by 10 every 20 epochs; in the fine-tuning stage 30 images are used, augmented by rotation, with the learning rate unchanged.
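The schedule above reduces to simple step decay. A minimal sketch, assuming the decay applies only in the virtual-data stage (function names are illustrative, not from the patent):

```python
# Step-decay schedule described in the text: base lr 0.01, divided by 10
# every 20 epochs during virtual-data training; constant during fine-tuning.

def virtual_stage_lr(epoch, base_lr=0.01):
    """Learning rate at a given 0-indexed epoch of the virtual-data stage."""
    return base_lr / (10 ** (epoch // 20))

def finetune_lr(epoch, base_lr=0.01):
    """Fine-tuning keeps the learning rate unchanged."""
    return base_lr

print(virtual_stage_lr(0))   # 0.01
print(virtual_stage_lr(25))
print(virtual_stage_lr(45))
```

Over the 50-epoch run this gives three plateaus (epochs 0-19, 20-39, 40-49) at 0.01, 0.001 and 0.0001.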
Table 2 gives the matching accuracies obtained after training the network on the virtual data produced with Google Earth Pro and on the real aerial data, respectively. After training on the virtual data alone, the network has a preliminary heterogeneous matching capability but transfers poorly to real data, reaching only 36.4%; after fine-tuning on real data, matching performance improves markedly, rising to 61.5% on real data. Table 3 gives the results after adding the nine-grid cutting of the images: test accuracy improves significantly at every stage. Fig. 3 shows a result image of visual positioning at Northwestern Polytechnical University in Xi'an, where gray solid points mark the actual flight positions and white hollow points mark the visual positioning results. The visual positioning results lie essentially around the true positions, with good positioning accuracy.
TABLE 2
| Trained on virtual data | Virtual data training + real data fine-tuning |
Virtual data testing | 66.3% | 48.7% |
Real data testing | 36.4% | 61.5% |
TABLE 3 Table 3
| Test after virtual data training | Test after virtual data training + real data fine-tuning |
Without nine-grid | 66.3% | 61.5% |
With nine-grid | 75.4% | 69.7% |
Claims (1)
1. The unmanned aerial vehicle visual positioning method based on multi-source image matching is characterized by comprising the following steps:
step 1: train a twin network for feature extraction with matched satellite and virtual unmanned aerial vehicle images collected from Google Earth Pro, and save the network parameters; then retrain the network with labeled real-scene unmanned aerial vehicle take-off position images to obtain a trained feature extraction network adapted to the real operating scene; the twin network consists of, in sequence, convolution layer 1, max-pooling layer 1, convolution layer 2, max-pooling layer 2, convolution layers 3, 4 and 5, max-pooling layer 3, and convolution layers 6 and 7; the kernel of convolution layer 1 is 7×7×24 with stride 1, of convolution layer 2 is 5×5×24 with stride 1, of convolution layers 3 and 4 is 3×3×96 with stride 1, of convolution layer 5 is 3×3×64 with stride 1, of convolution layer 6 is 3×3×128 with stride 1, and of convolution layer 7 is 8×8×128 with stride 1; max-pooling layers 1 and 2 use 3×3 kernels with stride 2, and max-pooling layer 3 uses a 6×6 kernel with stride 4;
step 2: cut the unmanned aerial vehicle image uniformly into three parts horizontally and three vertically to form a nine-grid, downsample each cell to 128×128, extract features with the network trained in step 1, and let the 9 extracted features together form the descriptor I′_k of the whole unmanned aerial vehicle image;
Step 3: take the n most recent positioning points of the unmanned aerial vehicle at the current moment for position estimation; from the longitude and latitude of each point and the flight time between consecutive points, compute the flight speeds over the n−1 position intervals in the longitude and latitude directions, recording the speed sequence over the n−1 longitude intervals as V_long and the speed sequence over the n−1 latitude intervals as V_lat; treating the motion of the unmanned aerial vehicle as uniform, the mean speed of V_long and of V_lat obeys a t distribution, and the speed range in the longitude or latitude direction is [V̄_* − t_{α/2}(n−2)·S_*/√(n−1), V̄_* + t_{α/2}(n−2)·S_*/√(n−1)], where V̄_* and S_*² denote the mean and variance of the speed in the longitude or latitude direction, α is the confidence parameter of the t distribution, and t_{α/2}(n−2) is the two-sided quantile of the t distribution for the n−1 sampling intervals at confidence 1−α, obtained from a t-distribution table; n takes a value from 3 to 6 and α is 0.005; the first n positioning points are initialized at the initial moment with the take-off position;
multiply the speed range by the time difference between the current moment and the flight time of the n-th position point to obtain the displacement range over that period; if the obtained displacement range is smaller than 20 m, widen it to 20 m. Adding the displacement range to the coordinates of the n-th position point yields the current position range of the unmanned aerial vehicle;
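The speed interval and displacement clamp of Step 3 can be sketched for a single axis (longitude or latitude) in local metres; the standard-error form S_*/√(n−1) is an assumption chosen to be consistent with the t_{α/2}(n−2) quantile named in the text, and the quantile itself is passed in rather than looked up from a table:

```python
import math

def position_range_1d(coords, times, t_quantile, now, min_disp=20.0):
    """One-axis sketch of Step 3.

    coords/times: the n most recent fixes (n = 3..6) and their timestamps.
    t_quantile:   t_{alpha/2}(n-2), supplied externally (no table here).
    Returns (lo, hi): bounds on the current coordinate along this axis.
    """
    speeds = [(coords[i + 1] - coords[i]) / (times[i + 1] - times[i])
              for i in range(len(coords) - 1)]           # n-1 interval speeds
    m = len(speeds)                                       # m = n - 1
    mean = sum(speeds) / m                                # V-bar_*
    var = sum((v - mean) ** 2 for v in speeds) / (m - 1)  # sample variance S*^2
    half = t_quantile * math.sqrt(var / m)                # CI half-width
    dt = now - times[-1]
    lo, hi = (mean - half) * dt, (mean + half) * dt       # displacement range
    if hi - lo < min_disp:                                # keep at least 20 m
        mid = (lo + hi) / 2.0
        lo, hi = mid - min_disp / 2.0, mid + min_disp / 2.0
    return coords[-1] + lo, coords[-1] + hi
```

With a perfectly steady track the interval collapses to a point, so the 20 m floor keeps the search window from vanishing.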
Step 4: crop the satellite image within the position range obtained in Step 3 every 10 m; each crop is cut uniformly into three parts in the transverse direction and three in the longitudinal direction to form a nine-square grid; downsample each cell to 128×128, extract features with the network trained in Step 1, and put the 9 features of each satellite crop, together with its position label, into the library to be matched;
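The 10 m candidate enumeration of Step 4 might look as follows in local metric coordinates; the conversion to longitude/latitude and the per-crop nine-grid feature extraction (identical to Step 2) are omitted:

```python
def candidate_positions(x_range, y_range, step=10.0):
    """Enumerate satellite-crop centres every `step` metres over the
    position range from Step 3.  Coordinates here are local metres, an
    illustrative assumption; the real method labels each crop with its
    longitude/latitude."""
    xs, x = [], x_range[0]
    while x <= x_range[1] + 1e-9:   # small epsilon guards float round-off
        xs.append(x)
        x += step
    ys, y = [], y_range[0]
    while y <= y_range[1] + 1e-9:
        ys.append(y)
        y += step
    return [(x, y) for y in ys for x in xs]
```

Each returned centre would be cropped from the satellite map, nine-grid cut, featurized, and stored as a (features, position label) pair in the library to be matched.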
Step 5: traverse the library to be matched and compute the L2 norm between each of the 9 features of every satellite image and the feature of the corresponding nine-grid cell of the unmanned aerial vehicle image obtained in Step 2; the sum of the 9 L2 norm values serves as the similarity measure for the whole image. Each satellite image in the library thus obtains a similarity measure with respect to the unmanned aerial vehicle image, and the position label of the satellite image with the smallest similarity measure is taken as the current unmanned aerial vehicle positioning result. If the difference between the two smallest similarity measures exceeds a threshold β, the current positioning result is considered reliable and is kept as a positioning point; β is taken as 0.25;
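A sketch of the Step 5 matching rule: the similarity of an image pair is the sum of the 9 cell-wise L2 distances (lower is more similar), and a fix is only kept as a positioning point when the two smallest scores are separated by more than β:

```python
import math

def similarity(uav_feats, sat_feats):
    """Sum over the 9 grid cells of the L2 distance between the
    corresponding feature vectors."""
    return sum(math.dist(u, s) for u, s in zip(uav_feats, sat_feats))

def locate(uav_feats, library, beta=0.25):
    """Pick the best-matching satellite crop from `library`, a list of
    (features, position_label) pairs.  The result is flagged reliable only
    when the gap between the two smallest scores exceeds beta."""
    scored = sorted((similarity(uav_feats, f), pos) for f, pos in library)
    best_score, best_pos = scored[0]
    reliable = len(scored) < 2 or scored[1][0] - best_score > beta
    return best_pos, reliable
```

The reliability gate discards ambiguous matches (e.g. repetitive terrain) so that only confident fixes feed the Step 3 motion model.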
Step 6: repeat Steps 2 to 5 until the visual navigation flight task is finished.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210321285.3A CN114842220B (en) | 2022-03-24 | 2022-03-24 | Unmanned aerial vehicle visual positioning method based on multi-source image matching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114842220A CN114842220A (en) | 2022-08-02 |
CN114842220B true CN114842220B (en) | 2024-02-27 |
Family
ID=82563377
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210321285.3A Active CN114842220B (en) | 2022-03-24 | 2022-03-24 | Unmanned aerial vehicle visual positioning method based on multi-source image matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114842220B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020181685A1 (en) * | 2019-03-12 | 2020-09-17 | 南京邮电大学 | Vehicle-mounted video target detection method based on deep learning |
CN113239952A (en) * | 2021-03-30 | 2021-08-10 | 西北工业大学 | Aerial image geographical positioning method based on spatial scale attention mechanism and vector map |
CN113361508A (en) * | 2021-08-11 | 2021-09-07 | 四川省人工智能研究院(宜宾) | Cross-view-angle geographic positioning method based on unmanned aerial vehicle-satellite |
WO2022022695A1 (en) * | 2020-07-31 | 2022-02-03 | 华为技术有限公司 | Image recognition method and apparatus |
Non-Patent Citations (2)
Title |
---|
Research on a UAV Target Localization Method Based on POS and Image Matching; Zhang Yan; Li Jianzeng; Li Deliang; Du Yulong; Journal of Ordnance Engineering College; 2015-02-15 (01); full text *
A Convolutional Neural Network Method for Registering UAV Images with Satellite Images; Lan Chaozhen; Shi Qunshan; Cui Zhixiang; Qin Jianqi; Xu Qing; Journal of Geomatics Science and Technology; 2020-02-15 (01); full text *
Also Published As
Publication number | Publication date |
---|---|
CN114842220A (en) | 2022-08-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102273559B1 (en) | Method, apparatus, and computer readable storage medium for updating electronic map | |
CN111862126B (en) | Non-cooperative target relative pose estimation method combining deep learning and geometric algorithm | |
CN107024216B (en) | Intelligent vehicle fusion positioning system and method introducing panoramic map | |
WO2019153245A1 (en) | Systems and methods for deep localization and segmentation with 3d semantic map | |
GB2568286A (en) | Method of computer vision based localisation and navigation and system for performing the same | |
US11430199B2 (en) | Feature recognition assisted super-resolution method | |
CN113593017A (en) | Method, device and equipment for constructing surface three-dimensional model of strip mine and storage medium | |
CN109033245B (en) | Mobile robot vision-radar image cross-modal retrieval method | |
CN113052106B (en) | Airplane take-off and landing runway identification method based on PSPNet network | |
AU2020375559B2 (en) | Systems and methods for generating annotations of structured, static objects in aerial imagery using geometric transfer learning and probabilistic localization | |
CN113516664A (en) | Visual SLAM method based on semantic segmentation dynamic points | |
CN111611918B (en) | Traffic flow data set acquisition and construction method based on aerial data and deep learning | |
CN114564545A (en) | System and method for extracting ship experience course based on AIS historical data | |
CN111812978B (en) | Cooperative SLAM method and system for multiple unmanned aerial vehicles | |
CN115496900A (en) | Sparse fusion-based online carbon semantic map construction method | |
JP2020153956A (en) | Mobile location estimation system and mobile location method | |
CN114842220B (en) | Unmanned aerial vehicle visual positioning method based on multi-source image matching | |
CN113227713A (en) | Method and system for generating environment model for positioning | |
CN115187614A (en) | Real-time simultaneous positioning and mapping method based on STDC semantic segmentation network | |
CN110826432B (en) | Power transmission line identification method based on aviation picture | |
CN112836586A (en) | Intersection information determination method, system and device | |
Sikdar et al. | Unconstrained Vision Guided UAV Based Safe Helicopter Landing | |
Sun et al. | Accurate deep direct geo-localization from ground imagery and phone-grade gps | |
CN115994934B (en) | Data time alignment method and device and domain controller | |
Astudillo et al. | Mono-LSDE: Lightweight Semantic-CNN for Depth Estimation from Monocular Aerial Images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||