CN114842220B - Unmanned aerial vehicle visual positioning method based on multi-source image matching - Google Patents
- Publication number: CN114842220B (application CN202210321285.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/74 — Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
- G06N3/045 — Neural networks; combinations of networks
- G06N3/08 — Neural networks; learning methods
- G06V10/42 — Global feature extraction by analysis of the whole pattern
- G06V10/761 — Proximity, similarity or dissimilarity measures
- G06V10/774 — Generating sets of training patterns; bootstrap methods
- G06V10/803 — Fusion of input or preprocessed data at the feature extraction level
- G06V10/82 — Image or video recognition using neural networks
- G06V20/13 — Terrestrial scenes; satellite images
- G06V20/17 — Terrestrial scenes taken from planes or by drones
- G06T2207/10032 — Satellite or aerial image; remote sensing
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- Y02T10/40 — Engine management systems
Abstract
The invention provides an unmanned aerial vehicle (UAV) visual positioning method based on multi-source image matching. First, a feature extraction network is trained with existing multi-source matched images and with images of the real UAV operating scene; the trained network then extracts features from the UAV image, and the UAV position is estimated from previous positioning points, narrowing the positioning search range; finally, features are extracted from the satellite imagery within the estimated position range and matched against the UAV image features with a similarity measure to obtain the positioning result. The method handles the heterogeneous matching problem between satellite and UAV images, applies to varied scenes, is computationally light, and meets the real-time positioning requirement of a UAV platform.
Description
Technical Field
The invention belongs to the technical field of multi-source remote sensing matching, and particularly relates to an unmanned aerial vehicle visual positioning method based on multi-source image matching.
Background
Unmanned aerial vehicle (UAV) positioning is usually provided by satellite navigation, but since this is passive signal reception, the navigation signal is easily jammed in special scenarios. Once the signal is lost, the accumulated error of the inertial measurement unit grows ever larger over time. Computer vision, by contrast, processes and analyzes visual information to detect, identify, track and locate targets, and has strong anti-interference capability. UAV positioning based on visual matching can therefore solve the UAV positioning problem under satellite-denied conditions.
UAV visual positioning methods fall roughly into three categories: map-free methods (e.g., visual odometry), map-building methods (e.g., simultaneous localization and mapping, SLAM), and map-based methods (e.g., image matching). Each has its advantages, disadvantages and range of application: map-building and map-free methods require only the cameras mounted on the UAV, but their inter-frame motion estimation errors accumulate severely; image-matching methods require an additional pre-recorded geo-referenced image library, but recover the absolute position of the UAV without accumulating error.
Image matching methods divide mainly into traditional methods and deep learning methods. Traditional methods extract features with hand-crafted descriptors and seek correspondences between local features (regions, lines, points) through descriptor similarity and/or spatial geometric relations. The use of locally salient features lets such methods run quickly and remain robust to noise, complex geometric deformation and significant radiometric differences. However, with the spread of higher-resolution, larger-sized data, they can no longer meet the demands for more correspondences, higher accuracy and more flexible application. With the release of large labeled datasets, deep learning methods, convolutional neural networks (CNNs) in particular, have achieved very good results in image matching. The main advantage of a CNN is that it automatically learns features useful for matching under the guidance of labeled data. Compared with hand-crafted descriptors, learned features contain not only low-level spatial information but also high-level semantic information. Thanks to this strong automatic feature extraction capability, deep learning methods achieve higher matching accuracy.
Although image-matching-based UAV visual positioning has significant advantages, several problems remain: first, the imaging conditions of the geo-referenced image and the UAV image differ, so multi-source image matching faces a heterogeneity problem; second, a small amount of annotated data is hard to adapt to varied application scenes; finally, the UAV platform imposes strict real-time requirements.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a UAV visual positioning method based on multi-source image matching. First, a feature extraction network is trained with existing multi-source matched images and with images of the real UAV operating scene; the trained network then extracts features from the UAV image, and the UAV position is estimated from previous positioning points, narrowing the positioning search range; finally, features are extracted from the satellite imagery within the estimated position range and matched against the UAV image features with a similarity measure to obtain the positioning result. The method handles the heterogeneous matching problem between satellite and UAV images, applies to varied scenes, is computationally light, and meets the real-time positioning requirement of a UAV platform.
The unmanned aerial vehicle visual positioning method based on multi-source image matching is characterized by comprising the following steps:
step 1: train a twin network for feature extraction with matched satellite and virtual unmanned aerial vehicle images collected from Google Earth Pro, and save the network parameters; then retrain the network with labeled real-scene unmanned aerial vehicle take-off position images to obtain a trained feature extraction network adapted to the real operating scene; the twin network consists of, in sequence, convolution layer 1, max-pooling layer 1, convolution layer 2, max-pooling layer 2, convolution layers 3, 4 and 5, max-pooling layer 3, and convolution layers 6 and 7; the kernel of convolution layer 1 is 7×7×24 with stride 1, of convolution layer 2 is 5×5×24 with stride 1, of convolution layers 3 and 4 is 3×3×96 with stride 1, of convolution layer 5 is 3×3×64 with stride 1, of convolution layer 6 is 3×3×128 with stride 1, and of convolution layer 7 is 8×8×128 with stride 1; max-pooling layers 1 and 2 use 3×3 kernels with stride 2, and max-pooling layer 3 uses a 6×6 kernel with stride 4;
step 2: cut the unmanned aerial vehicle image uniformly into three parts horizontally and three vertically to form a nine-grid, downsample each cell to 128×128, extract features with the network trained in step 1, and let the 9 extracted features together form the descriptor I′_k of the whole unmanned aerial vehicle image;
Step 3: take the n most recent positioning points of the unmanned aerial vehicle at the current moment for position estimation; from the longitude and latitude of each point and the flight time between consecutive points, compute the flight speeds over the n−1 position intervals in the longitude and latitude directions, recording the speed sequence over the n−1 longitude intervals as V_long and the speed sequence over the n−1 latitude intervals as V_lat; treating the motion of the unmanned aerial vehicle as uniform, the mean speed of V_long and of V_lat obeys a t distribution, and the speed range in the longitude or latitude direction is [V̄_* − t_{α/2}(n−2)·S_*/√(n−1), V̄_* + t_{α/2}(n−2)·S_*/√(n−1)], where V̄_* and S_*² denote the mean and variance of the speed in the longitude or latitude direction, α is the confidence parameter of the t distribution, and t_{α/2}(n−2) is the two-sided quantile of the t distribution for the n−1 sampling intervals at confidence 1−α, obtained from a t-distribution table; n takes a value from 3 to 6 and α is 0.005; the first n positioning points are initialized at the initial moment with the take-off position;
multiply the speed range by the time difference between the current moment and the flight moment of the nth position point to obtain the displacement range over that period; if the obtained displacement range is smaller than 20 m, widen it to 20 m; adding the displacement range to the coordinates of the nth position point gives the current position range of the unmanned aerial vehicle;
step 4: within the position range obtained in step 3, cut a satellite image every 10 m; each time, cut it uniformly into three parts horizontally and three vertically to form a nine-grid, downsample each cell to 128×128, extract features with the network trained in step 1, and put the 9 features of each satellite image, together with its position label, into a library to be matched;
step 5: traverse the library to be matched, computing the L2 norm between each of the 9 features of every satellite image and the feature of the corresponding nine-grid cell of the unmanned aerial vehicle image obtained in step 2; the sum of the 9 L2 norm values serves as the similarity measure of the whole image, so every satellite image in the library obtains a similarity value with respect to the unmanned aerial vehicle image, and the position label of the satellite image with the smallest similarity value is taken as the current positioning result; if the difference between the two smallest similarity values exceeds the threshold β, the current positioning result is considered reliable and is taken as a positioning point; β is set to 0.25;
step 6: repeat steps 2 to 5 until the visual navigation flight task ends.
The beneficial effects of the invention are as follows: because the network is trained on both virtual dataset images and real-scene images, it better extracts global high-order semantic features and resolves the heterogeneity between multi-source images; the position estimation step effectively narrows the matching search range, improves positioning accuracy, and preserves the real-time performance of the system.
Drawings
FIG. 1 is a flow chart of a method for visual positioning of an unmanned aerial vehicle based on multi-source image matching;
FIG. 2 is a flow chart of a position estimation of the present invention;
fig. 3 is a result image of unmanned aerial vehicle visual positioning at Northwestern Polytechnical University using the method of the invention.
Detailed Description
The invention will be further illustrated with reference to the following figures and examples, which include but are not limited to the following examples.
As shown in fig. 1, the invention provides an unmanned aerial vehicle visual positioning method based on multi-source image matching, which comprises the following specific implementation processes:
Step 1: train a twin network for feature extraction with matched satellite and virtual unmanned aerial vehicle images collected from Google Earth Pro, and save the network parameters; then retrain the network with labeled real-scene unmanned aerial vehicle take-off position images to obtain a trained feature extraction network adapted to the real operating scene. The specific structure of the twin network is shown in Table 1, where Conv denotes a convolution layer, Pooling a pooling layer, C a convolution operation, and MP a max-pooling operation.
TABLE 1
Layer | Type | Output size | Kernel size | Stride |
Conv1 | C | 128×128×24 | 7×7×24 | 1 |
Pooling1 | MP | 64×64×24 | 3×3 | 2 |
Conv2 | C | 64×64×64 | 5×5×24 | 1 |
Pooling2 | MP | 32×32×64 | 3×3 | 2 |
Conv3 | C | 32×32×96 | 3×3×96 | 1 |
Conv4 | C | 32×32×96 | 3×3×96 | 1 |
Conv5 | C | 32×32×64 | 3×3×64 | 1 |
Pooling3 | MP | 8×8×64 | 6×6 | 4 |
Conv6 | C | 8×8×128 | 3×3×128 | 1 |
Conv7 | C | 1×1×128 | 8×8×128 | 1 |
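As a sanity check, the layer hyper-parameters of Table 1 can be walked through with the standard convolution/pooling output-size formula. This is an illustrative sketch, not code from the patent: the padding values are assumptions inferred so that each layer reproduces the output size listed in the table (the patent states only kernel sizes and strides).

```python
# Verify that Table 1's kernel/stride choices yield the listed output sizes.
# Padding per layer is an inferred assumption, not given in the patent.

def out_size(size, kernel, stride, pad):
    """Spatial output size of a square conv/pool layer (floor division)."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, assumed padding, expected spatial output)
layers = [
    ("Conv1",    7, 1, 3, 128),
    ("Pooling1", 3, 2, 1,  64),
    ("Conv2",    5, 1, 2,  64),
    ("Pooling2", 3, 2, 1,  32),
    ("Conv3",    3, 1, 1,  32),
    ("Conv4",    3, 1, 1,  32),
    ("Conv5",    3, 1, 1,  32),
    ("Pooling3", 6, 4, 1,   8),
    ("Conv6",    3, 1, 1,   8),
    ("Conv7",    8, 1, 0,   1),
]

size = 128  # each nine-grid cell is downsampled to 128x128 before the network
for name, k, s, p, expected in layers:
    size = out_size(size, k, s, p)
    assert size == expected, (name, size, expected)
print(size)  # 1 -- the final 1x1x128 map is a 128-dimensional descriptor
```

The chain confirms that a 128×128 cell collapses to a single 128-channel vector at Conv7, which is what makes the per-cell L2 comparison in step 5 a plain vector distance.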
Step 2: cut the unmanned aerial vehicle image uniformly into three parts horizontally and three vertically to form a nine-grid, downsample each cell to 128×128, extract features with the network trained in step 1, and let the 9 extracted features together form the descriptor I′_k of the whole unmanned aerial vehicle image.
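The nine-grid cutting of step 2 can be sketched as follows. This is a minimal illustration with assumed names (`split_nine_grid`); it only computes the crop geometry, while the real pipeline then downsamples each cell to 128×128 and runs it through the trained network.

```python
# Hedged sketch of the step-2 nine-grid cutting: split an HxW image into a
# 3x3 grid of (approximately) equal cells, returned as crop rectangles.

def split_nine_grid(h, w):
    """Return (top, left, height, width) for each cell of a 3x3 uniform grid."""
    cells = []
    for i in range(3):          # row index
        for j in range(3):      # column index
            top, left = i * h // 3, j * w // 3
            bottom, right = (i + 1) * h // 3, (j + 1) * w // 3
            cells.append((top, left, bottom - top, right - left))
    return cells

cells = split_nine_grid(1080, 1920)
print(len(cells))   # 9
print(cells[0])     # (0, 0, 360, 640)
```

The 9 per-cell feature vectors, concatenated in grid order, form the image descriptor I′_k used for matching.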
Step 3: take the n most recent trusted positioning points of the unmanned aerial vehicle at the current moment for position estimation (by default the position information of the take-off area is known, so the earliest n points can be initialized directly at the take-off position; thereafter the n trusted points are continuously updated through matching during flight); from the longitude and latitude of each point and the flight time between consecutive points, compute the flight speeds over the n−1 position intervals in the longitude and latitude directions, recording the speed sequence over the n−1 longitude intervals as V_long and the speed sequence over the n−1 latitude intervals as V_lat; treating the motion of the unmanned aerial vehicle as uniform, the mean speed of V_long and of V_lat obeys a t distribution, and the speed range in the longitude or latitude direction is [V̄_* − t_{α/2}(n−2)·S_*/√(n−1), V̄_* + t_{α/2}(n−2)·S_*/√(n−1)], where V̄_* and S_*² denote the mean and variance of the speed in the longitude or latitude direction, α is the confidence parameter of the t distribution, and t_{α/2}(n−2) is the two-sided quantile of the t distribution for the n−1 sampling intervals at confidence 1−α, obtained from a t-distribution table; n takes a value from 3 to 6 and α is 0.005;
multiply the speed range by the time difference between the current moment and the flight moment of the nth position point to obtain the displacement range over that period; if the obtained displacement range is smaller than 20 m, widen it to 20 m; adding the displacement range to the coordinates of the nth position point gives the current position range of the unmanned aerial vehicle;
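The step-3 interval computation can be sketched in a few lines. This is an illustrative reading of the method, not the patent's code: the t critical values are quoted from a standard table (as the patent prescribes) for α = 0.005 and degrees of freedom n−2, and the 20 m rule is interpreted here as a minimum half-window around the interval midpoint.

```python
# Sketch of the step-3 position estimation under the patent's assumptions
# (uniform motion, t-distributed mean speed). T_TABLE holds two-sided
# critical values t_{alpha/2}(df) for alpha = 0.005, df = n - 2 in 1..4.
import math

T_TABLE = {1: 127.321, 2: 14.089, 3: 7.453, 4: 5.598}

def speed_range(speeds):
    """Confidence interval for the mean of the n-1 interval speeds (m/s)."""
    m = len(speeds)                       # m = n - 1 intervals
    mean = sum(speeds) / m
    var = sum((v - mean) ** 2 for v in speeds) / (m - 1)   # sample variance
    half = T_TABLE[m - 1] * math.sqrt(var) / math.sqrt(m)  # t * S / sqrt(n-1)
    return mean - half, mean + half

def displacement_range(speeds, dt, floor=20.0):
    """Displacement interval over dt seconds, widened to at least `floor` m."""
    lo, hi = speed_range(speeds)
    lo, hi = lo * dt, hi * dt
    if hi - lo < floor:                   # keep a minimum 20 m search window
        mid = (lo + hi) / 2
        lo, hi = mid - floor / 2, mid + floor / 2
    return lo, hi

lo, hi = displacement_range([9.8, 10.1, 10.0, 10.1], dt=2.0)
```

Run separately for the longitude and latitude speed sequences, this yields the rectangular search region that step 4 covers with 10 m satellite crops.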
the above-described position estimation process is shown in fig. 2.
Step 4: within the position range obtained in step 3, cut a satellite image every 10 m; each time, cut it uniformly into three parts horizontally and three vertically to form a nine-grid, downsample each cell to 128×128, extract features with the network trained in step 1, and put the 9 features of each satellite image, together with its position label, into a library to be matched.
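The candidate-generation part of step 4 amounts to sampling crop centres on a 10 m lattice over the estimated region. A minimal sketch, with assumed names and with the region given as metre offsets (east, north) from a reference point:

```python
# Sketch of step-4 candidate generation: one satellite crop every 10 m
# within the estimated position range. Ranges are (min, max) in metres.

def candidate_positions(x_range, y_range, step=10.0):
    xs, ys = [], []
    x = x_range[0]
    while x <= x_range[1] + 1e-9:
        xs.append(x)
        x += step
    y = y_range[0]
    while y <= y_range[1] + 1e-9:
        ys.append(y)
        y += step
    return [(x, y) for x in xs for y in ys]

cands = candidate_positions((0.0, 20.0), (0.0, 20.0))
print(len(cands))  # 9 candidate crop centres on a 3x3 lattice
```

With the 20 m minimum window from step 3, the library to be matched therefore never drops below a 3×3 grid of candidates, which bounds the per-frame matching cost.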
Step 5: traverse the library to be matched, computing the L2 norm between each of the 9 features of every satellite image and the feature of the corresponding nine-grid cell of the unmanned aerial vehicle image obtained in step 2; the sum of the 9 L2 norm values serves as the similarity measure of the whole image, so every satellite image in the library obtains a similarity value with respect to the unmanned aerial vehicle image, and the position label of the satellite image with the smallest similarity value is taken as the current positioning result. If the difference between the two smallest similarity values exceeds the threshold β, the current matching result is considered highly reliable and is appended to the end of the sequence, updating the n trusted positioning points of step 3; β is set to 0.25.
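The step-5 matching rule can be sketched directly. This is an illustrative implementation with assumed names (`image_distance`, `match`); toy 2-d feature vectors stand in for the 128-d network descriptors.

```python
# Sketch of step-5 matching: the image similarity is the sum of the 9
# per-cell L2 distances; the smallest sum wins, and the gap between the two
# best sums must exceed beta = 0.25 for the fix to be trusted.
import math

def image_distance(uav_feats, sat_feats):
    """Sum of L2 norms between corresponding nine-grid cell features."""
    total = 0.0
    for u, s in zip(uav_feats, sat_feats):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(u, s)))
    return total

def match(uav_feats, library, beta=0.25):
    """Return (best_label, reliable) over a library of (label, feats) pairs."""
    scored = sorted((image_distance(uav_feats, f), label) for label, f in library)
    best, label = scored[0]
    second = scored[1][0] if len(scored) > 1 else float("inf")
    return label, (second - best) > beta

uav = [[0.0, 0.0]] * 9
library = [("A", [[0.1, 0.0]] * 9), ("B", [[1.0, 1.0]] * 9)]
label, reliable = match(uav, library)
print(label, reliable)  # A True
```

Only reliable fixes are appended to the trusted point sequence, so an ambiguous match never corrupts the step-3 speed statistics.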
Step 6: repeat steps 2 to 5 until the visual navigation flight task ends.
To verify the effectiveness of the method, simulation experiments were carried out on an Intel Xeon E5-2680 v4 2.40 GHz CPU with 64 GB of memory and an NVIDIA RTX 3090 graphics card, under the Ubuntu 16.04 operating system with PyTorch 1.7.1 and Python 3.8.5. The virtual data were built with the 3D modeling function of Google Earth Pro, taking central Los Angeles as the experimental area, 4942×3408 m² in extent. The area contains typical urban landscape, with many houses, streets and vehicles, as well as open suburban terrain, from which simulated unmanned aerial vehicle images were obtained. The converted aerial images and the cropped satellite images were annotated with the coordinates of each image, generating matched pairs of the two heterogeneous images, for a total of 10000 virtual samples. Real unmanned aerial vehicle images were meanwhile obtained by aerial photography and manually annotated to generate the corresponding matched satellite images. Each real-data scene comprises 600 sample pairs. To simulate the practical application scenario, a long-distance unmanned aerial vehicle video was also annotated: 4 minutes long, covering a 2 km flight at 170 m altitude, at 1920×1080 resolution and 30 frames per second.
Stochastic Gradient Descent (SGD) is used as the optimizer, with learning rate 0.01, momentum 0.9, and 50 epochs in total. In the virtual-data training stage the learning rate is divided by 10 every 20 epochs; in the fine-tuning stage 30 images are used, augmented by rotation, with the learning rate unchanged.
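The schedule above reduces to simple step decay. A minimal sketch, assuming the decay applies only in the virtual-data stage (function names are illustrative, not from the patent):

```python
# Step-decay schedule described in the text: base lr 0.01, divided by 10
# every 20 epochs during virtual-data training; constant during fine-tuning.

def virtual_stage_lr(epoch, base_lr=0.01):
    """Learning rate at a given 0-indexed epoch of the virtual-data stage."""
    return base_lr / (10 ** (epoch // 20))

def finetune_lr(epoch, base_lr=0.01):
    """Fine-tuning keeps the learning rate unchanged."""
    return base_lr

print(virtual_stage_lr(0))   # 0.01
print(virtual_stage_lr(25))
print(virtual_stage_lr(45))
```

Over the 50-epoch run this gives three plateaus (epochs 0-19, 20-39, 40-49) at 0.01, 0.001 and 0.0001.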
Table 2 gives the matching accuracies obtained after training the network on the virtual data produced with Google Earth Pro and on the real aerial data, respectively. After training on the virtual data alone, the network has a preliminary heterogeneous matching capability but transfers poorly to real data, reaching only 36.4%; after fine-tuning on real data, matching performance improves markedly, rising to 61.5% on real data. Table 3 gives the results after adding the nine-grid cutting of the images: test accuracy improves significantly at every stage. Fig. 3 shows a result image of visual positioning at Northwestern Polytechnical University in Xi'an, where gray solid points mark the actual flight positions and white hollow points mark the visual positioning results. The visual positioning results lie essentially around the true positions, with good positioning accuracy.
TABLE 2
| Trained on virtual data | Virtual data training + real data fine-tuning |
Virtual data testing | 66.3% | 48.7% |
Real data testing | 36.4% | 61.5% |
TABLE 3 Table 3
| Test after virtual data training | Test after virtual data training + real data fine-tuning |
Without nine-grid | 66.3% | 61.5% |
With nine-grid | 75.4% | 69.7% |
Claims (1)
1. The unmanned aerial vehicle visual positioning method based on multi-source image matching is characterized by comprising the following steps:
step 1: train a twin network for feature extraction with matched satellite and virtual unmanned aerial vehicle images collected from Google Earth Pro, and save the network parameters; then retrain the network with labeled real-scene unmanned aerial vehicle take-off position images to obtain a trained feature extraction network adapted to the real operating scene; the twin network consists of, in sequence, convolution layer 1, max-pooling layer 1, convolution layer 2, max-pooling layer 2, convolution layers 3, 4 and 5, max-pooling layer 3, and convolution layers 6 and 7; the kernel of convolution layer 1 is 7×7×24 with stride 1, of convolution layer 2 is 5×5×24 with stride 1, of convolution layers 3 and 4 is 3×3×96 with stride 1, of convolution layer 5 is 3×3×64 with stride 1, of convolution layer 6 is 3×3×128 with stride 1, and of convolution layer 7 is 8×8×128 with stride 1; max-pooling layers 1 and 2 use 3×3 kernels with stride 2, and max-pooling layer 3 uses a 6×6 kernel with stride 4;
step 2: cut the unmanned aerial vehicle image uniformly into three parts horizontally and three vertically to form a nine-grid, downsample each cell to 128×128, extract features with the network trained in step 1, and let the 9 extracted features together form the descriptor I′_k of the whole unmanned aerial vehicle image;
Step 3: take the n most recent positioning points of the unmanned aerial vehicle at the current moment for position estimation; from the longitude and latitude of each point and the flight time between consecutive points, compute the flight speeds over the n−1 position intervals in the longitude and latitude directions, recording the speed sequence over the n−1 longitude intervals as V_long and the speed sequence over the n−1 latitude intervals as V_lat; treating the motion of the unmanned aerial vehicle as uniform, the mean speed of V_long and of V_lat obeys a t distribution, and the speed range in the longitude or latitude direction is [V̄_* − t_{α/2}(n−2)·S_*/√(n−1), V̄_* + t_{α/2}(n−2)·S_*/√(n−1)], where V̄_* and S_*² denote the mean and variance of the speed in the longitude or latitude direction, α is the confidence parameter of the t distribution, and t_{α/2}(n−2) is the two-sided quantile of the t distribution for the n−1 sampling intervals at confidence 1−α, obtained from a t-distribution table; n takes a value from 3 to 6 and α is 0.005; the first n positioning points are initialized at the initial moment with the take-off position;
multiply the speed range by the time difference between the current moment and the flight time of the n-th position point to obtain the displacement range over that period; if the obtained displacement range is smaller than 20 m, widen it to 20 m. Adding the displacement range to the coordinates of the n-th position point yields the current position range of the unmanned aerial vehicle;
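The speed interval and displacement clamp of Step 3 can be sketched for a single axis (longitude or latitude) in local metres; the standard-error form S_*/√(n−1) is an assumption chosen to be consistent with the t_{α/2}(n−2) quantile named in the text, and the quantile itself is passed in rather than looked up from a table:

```python
import math

def position_range_1d(coords, times, t_quantile, now, min_disp=20.0):
    """One-axis sketch of Step 3.

    coords/times: the n most recent fixes (n = 3..6) and their timestamps.
    t_quantile:   t_{alpha/2}(n-2), supplied externally (no table here).
    Returns (lo, hi): bounds on the current coordinate along this axis.
    """
    speeds = [(coords[i + 1] - coords[i]) / (times[i + 1] - times[i])
              for i in range(len(coords) - 1)]           # n-1 interval speeds
    m = len(speeds)                                       # m = n - 1
    mean = sum(speeds) / m                                # V-bar_*
    var = sum((v - mean) ** 2 for v in speeds) / (m - 1)  # sample variance S*^2
    half = t_quantile * math.sqrt(var / m)                # CI half-width
    dt = now - times[-1]
    lo, hi = (mean - half) * dt, (mean + half) * dt       # displacement range
    if hi - lo < min_disp:                                # keep at least 20 m
        mid = (lo + hi) / 2.0
        lo, hi = mid - min_disp / 2.0, mid + min_disp / 2.0
    return coords[-1] + lo, coords[-1] + hi
```

With a perfectly steady track the interval collapses to a point, so the 20 m floor keeps the search window from vanishing.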
Step 4: crop the satellite image within the position range obtained in Step 3 every 10 m; each crop is cut uniformly into three parts in the transverse direction and three in the longitudinal direction to form a nine-square grid; downsample each cell to 128×128, extract features with the network trained in Step 1, and put the 9 features of each satellite crop, together with its position label, into the library to be matched;
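The 10 m candidate enumeration of Step 4 might look as follows in local metric coordinates; the conversion to longitude/latitude and the per-crop nine-grid feature extraction (identical to Step 2) are omitted:

```python
def candidate_positions(x_range, y_range, step=10.0):
    """Enumerate satellite-crop centres every `step` metres over the
    position range from Step 3.  Coordinates here are local metres, an
    illustrative assumption; the real method labels each crop with its
    longitude/latitude."""
    xs, x = [], x_range[0]
    while x <= x_range[1] + 1e-9:   # small epsilon guards float round-off
        xs.append(x)
        x += step
    ys, y = [], y_range[0]
    while y <= y_range[1] + 1e-9:
        ys.append(y)
        y += step
    return [(x, y) for y in ys for x in xs]
```

Each returned centre would be cropped from the satellite map, nine-grid cut, featurized, and stored as a (features, position label) pair in the library to be matched.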
Step 5: traverse the library to be matched and compute the L2 norm between each of the 9 features of every satellite image and the feature of the corresponding nine-grid cell of the unmanned aerial vehicle image obtained in Step 2; the sum of the 9 L2 norm values serves as the similarity measure for the whole image. Each satellite image in the library thus obtains a similarity measure with respect to the unmanned aerial vehicle image, and the position label of the satellite image with the smallest similarity measure is taken as the current unmanned aerial vehicle positioning result. If the difference between the two smallest similarity measures exceeds a threshold β, the current positioning result is considered reliable and is kept as a positioning point; β is taken as 0.25;
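A sketch of the Step 5 matching rule: the similarity of an image pair is the sum of the 9 cell-wise L2 distances (lower is more similar), and a fix is only kept as a positioning point when the two smallest scores are separated by more than β:

```python
import math

def similarity(uav_feats, sat_feats):
    """Sum over the 9 grid cells of the L2 distance between the
    corresponding feature vectors."""
    return sum(math.dist(u, s) for u, s in zip(uav_feats, sat_feats))

def locate(uav_feats, library, beta=0.25):
    """Pick the best-matching satellite crop from `library`, a list of
    (features, position_label) pairs.  The result is flagged reliable only
    when the gap between the two smallest scores exceeds beta."""
    scored = sorted((similarity(uav_feats, f), pos) for f, pos in library)
    best_score, best_pos = scored[0]
    reliable = len(scored) < 2 or scored[1][0] - best_score > beta
    return best_pos, reliable
```

The reliability gate discards ambiguous matches (e.g. repetitive terrain) so that only confident fixes feed the Step 3 motion model.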
Step 6: repeat Steps 2 to 5 until the visual navigation flight task is finished.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210321285.3A CN114842220B (en) | 2022-03-24 | 2022-03-24 | Unmanned aerial vehicle visual positioning method based on multi-source image matching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114842220A CN114842220A (en) | 2022-08-02 |
CN114842220B true CN114842220B (en) | 2024-02-27 |
Family
ID=82563377
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210321285.3A Active CN114842220B (en) | 2022-03-24 | 2022-03-24 | Unmanned aerial vehicle visual positioning method based on multi-source image matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114842220B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020181685A1 (en) * | 2019-03-12 | 2020-09-17 | 南京邮电大学 | Vehicle-mounted video target detection method based on deep learning |
CN113239952A (en) * | 2021-03-30 | 2021-08-10 | 西北工业大学 | Aerial image geographical positioning method based on spatial scale attention mechanism and vector map |
CN113361508A (en) * | 2021-08-11 | 2021-09-07 | 四川省人工智能研究院(宜宾) | Cross-view-angle geographic positioning method based on unmanned aerial vehicle-satellite |
WO2022022695A1 (en) * | 2020-07-31 | 2022-02-03 | 华为技术有限公司 | Image recognition method and apparatus |
Non-Patent Citations (2)
Title |
---|
Research on a UAV Target Localization Method Based on POS and Image Matching; Zhang Yan; Li Jianzeng; Li Deliang; Du Yulong; Journal of Ordnance Engineering College; 2015-02-15 (01); full text *
A Convolutional Neural Network Method for Registering UAV Images with Satellite Images; Lan Chaozhen; Shi Qunshan; Cui Zhixiang; Qin Jianqi; Xu Qing; Journal of Geomatics Science and Technology; 2020-02-15 (01); full text *
Also Published As
Publication number | Publication date |
---|---|
CN114842220A (en) | 2022-08-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102273559B1 (en) | Method, apparatus, and computer readable storage medium for updating electronic map | |
CN111862126B (en) | Non-cooperative target relative pose estimation method combining deep learning and geometric algorithm | |
CN107024216B (en) | Intelligent vehicle fusion positioning system and method introducing panoramic map | |
WO2019153245A1 (en) | Systems and methods for deep localization and segmentation with 3d semantic map | |
GB2568286A (en) | Method of computer vision based localisation and navigation and system for performing the same | |
US11430199B2 (en) | Feature recognition assisted super-resolution method | |
CN113593017A (en) | Method, device and equipment for constructing surface three-dimensional model of strip mine and storage medium | |
CN109033245B (en) | Mobile robot vision-radar image cross-modal retrieval method | |
CN113052106B (en) | Airplane take-off and landing runway identification method based on PSPNet network | |
AU2020375559B2 (en) | Systems and methods for generating annotations of structured, static objects in aerial imagery using geometric transfer learning and probabilistic localization | |
CN113516664A (en) | Visual SLAM method based on semantic segmentation dynamic points | |
CN111611918B (en) | Traffic flow data set acquisition and construction method based on aerial data and deep learning | |
CN114564545A (en) | System and method for extracting ship experience course based on AIS historical data | |
CN111812978B (en) | Cooperative SLAM method and system for multiple unmanned aerial vehicles | |
CN115496900A (en) | Sparse fusion-based online carbon semantic map construction method | |
JP2020153956A (en) | Mobile location estimation system and mobile location method | |
CN114842220B (en) | Unmanned aerial vehicle visual positioning method based on multi-source image matching | |
CN113227713A (en) | Method and system for generating environment model for positioning | |
CN115187614A (en) | Real-time simultaneous positioning and mapping method based on STDC semantic segmentation network | |
CN110826432B (en) | Power transmission line identification method based on aviation picture | |
CN112836586A (en) | Intersection information determination method, system and device | |
Sikdar et al. | Unconstrained Vision Guided UAV Based Safe Helicopter Landing | |
Sun et al. | Accurate deep direct geo-localization from ground imagery and phone-grade gps | |
CN115994934B (en) | Data time alignment method and device and domain controller | |
Astudillo et al. | Mono-LSDE: Lightweight Semantic-CNN for Depth Estimation from Monocular Aerial Images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||