CN116206139A - Unmanned aerial vehicle image upscaling matching method based on local self-convolution - Google Patents

Unmanned aerial vehicle image upscaling matching method based on local self-convolution Download PDF

Info

Publication number
CN116206139A
CN116206139A CN202211717727.2A
Authority
CN
China
Prior art keywords
image
convolution
point
self
gaussian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211717727.2A
Other languages
Chinese (zh)
Inventor
谷鑫斌
张瑛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202211717727.2A priority Critical patent/CN116206139A/en
Publication of CN116206139A publication Critical patent/CN116206139A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a local self-convolution image upscaling matching method suitable for unmanned aerial vehicle visual navigation, which comprises the following steps: S1, performing self-convolution on each pixel point in a training image to realize the scale-up operation of the training image; S2, extracting features of the self-convolved training image by using the SIFT method to obtain descriptors; S3, matching the training image with the query image based on the extracted SIFT descriptors; S4, eliminating erroneous matches generated in the matching process. The image upscaling matching method provided by the invention can effectively remove unnecessary details of the image while retaining or highlighting important edge information.

Description

Unmanned aerial vehicle image upscaling matching method based on local self-convolution
Technical Field
The invention relates to image matching, in particular to an unmanned aerial vehicle image upscaling matching method based on local self-convolution.
Background
In environments where positioning systems such as GPS (Global Positioning System) are unavailable, visual navigation requires matching the unmanned aerial vehicle's visual image against a reference image (a remote-sensing satellite image carrying geographic position information) in order to determine the specific position of the unmanned aerial vehicle; the key problem to be solved is feature matching between images with different ground resolutions.
1. When no upscaling is performed, the ground resolutions of the image to be matched and the reference image differ greatly, so the local feature points cannot be accurately matched to the global feature points and many mismatched or unmatched feature points appear, as shown in fig. 6.
2. When the scale adjustment is based on downsampling or Gaussian filtering, some feature points that could be matched are lost, as shown in fig. 7.
3. Upscaling based on self-convolution preserves the matchable feature points more effectively, as shown in fig. 8.
In image feature point matching, the query image is generally an image captured by a satellite, and the training image is an image captured by the unmanned aerial vehicle.
Image feature point matching mainly involves two tasks: feature point detection and descriptor extraction. Feature point detection typically relies on corner detectors such as Harris and FAST. Its goal is to find salient points that can be detected in two completely independent detection passes over different images of the same region, which may vary greatly in illumination, scale, rotation, and viewpoint. Descriptor extraction builds a feature vector from the area around each feature point; the goal is to construct a descriptor that can be matched using metrics such as the Euclidean distance or the Hamming distance.
The reference image is typically a large-scale image captured by a satellite camera. When features are detected with methods such as SIFT, the matching time increases and the matching accuracy suffers, because the training image contains many unnecessary details compared with the query image. The training image therefore needs to be upscaled to remove these unnecessary details. Current image upscaling methods mainly filter the image with a Gaussian kernel, but this blurs all of the image information, so some important image structures may not be well preserved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an unmanned aerial vehicle image upscaling matching method based on local self-convolution, which can effectively remove unnecessary details of an image while retaining or highlighting important edge information.
The aim of the invention is realized by the following technical scheme: an unmanned aerial vehicle image upscaling matching method based on local self-convolution comprises the following steps:
s1, performing self-convolution on each pixel point in a training image to realize scale-up operation of the training image;
in step S1, when performing self-convolution on each pixel point in the training image, a matrix of size (2a+1)×(2b+1) centered on that pixel point is used as its convolution kernel;
the pixel values of the pixel points contained in this matrix of size (2a+1)×(2b+1) are used as the matrix elements.
The step S1 includes:
S101, for any point (x0, y0), compute the value g(x0, y0) of the self-convolution image g(x, y) at (x0, y0):
g(x0, y0) = Σ(dx = -a..a) Σ(dy = -b..b) w_(x0,y0)(dx, dy) · f(x0-dx, y0-dy)
wherein
w_(x0,y0) = f_(x0,y0)
f_(x0,y0) represents the matrix of size (2a+1)×(2b+1) of the original image centered on (x0, y0), w_(x0,y0) is the convolution kernel of size (2a+1)×(2b+1) at (x0, y0), and f(x0-dx, y0-dy) represents the pixel value at the pixel point (x0-dx, y0-dy);
S102, repeat step S101 for every point of the training image to obtain the value of the self-convolution image g(x, y) at each point; the resulting self-convolution image g(x, y) is taken as the upscaled image;
for a pixel point on the image boundary, where a full convolution kernel of size (2a+1)×(2b+1) cannot be obtained, the pixel value of that point is taken directly as its self-convolution result.
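A minimal NumPy sketch of step S1 under the formula above, assuming the kernel at each pixel is simply the (2a+1)×(2b+1) patch of the original image centred on it (the function name and the absence of any rescaling of the output are choices of this sketch, not details fixed by the description):

```python
import numpy as np

def local_self_convolution(img, a=1, b=1):
    """Step S1 sketch: upscale an image by local self-convolution.

    For each interior pixel (x0, y0) the (2a+1) x (2b+1) patch centred on it
    is used as its own convolution kernel; boundary pixels, where the full
    window does not fit, keep their original value.
    """
    f = img.astype(np.float64)
    g = f.copy()                                   # boundary pixels stay unchanged
    H, W = f.shape
    for x0 in range(a, H - a):
        for y0 in range(b, W - b):
            patch = f[x0 - a:x0 + a + 1, y0 - b:y0 + b + 1]
            # sum over (dx, dy) of f(x0+dx, y0+dy) * f(x0-dx, y0-dy)
            g[x0, y0] = np.sum(patch * patch[::-1, ::-1])
    return g
```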
S2, extracting features of the self-convolved training image by using the SIFT method to obtain descriptors;
said step S2 comprises the sub-steps of:
S201, constructing a DoG pyramid of the upscaled image g(x, y):
A1, constructing a Gaussian scale space as the Gaussian blur result:
the Gaussian scale space of an image is defined as a function L(x, y, σ) obtained by convolving the Gaussian kernel function G(x, y, σ) with the input image I(x, y):
L(x,y,σ)=G(x,y,σ)*I(x,y)
where * denotes convolution and
G(x,y,σ) = (1/(2πσ²)) · exp(-(x²+y²)/(2σ²))
σ is called the scale space factor; it is the standard deviation of the Gaussian normal distribution and reflects the degree to which the image is blurred: the larger its value, the more blurred the image and the larger the corresponding scale;
A2, first apply Gaussian blur to the obtained upscaled image g(x, y); the Gaussian-blurred version of g(x, y) serves as the first layer of the Gaussian pyramid. Then repeatedly downsample, starting from this blurred image, to obtain a series of successively smaller images; each downsampled image forms one layer, and the layers, taken in downsampling order, form the image pyramid of g(x, y);
A3, Gaussian-blur each layer image of the image pyramid with n sequentially arranged scale space factors to obtain n Gaussian-blurred images with different scale space factors; for any two adjacent scale space factors, the ratio of the latter to the former is k;
For the n Gaussian-blurred images with different scale space factors, the DoG spaces are computed:
for the Gaussian-blurred images corresponding to two adjacent scale space factors, the DoG (Difference of Gaussians) is computed as follows:
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)=L(x,y,kσ)-L(x,y,σ)
wherein L (x, y, σ) is the gaussian scale space of the image;
Since there are n Gaussian-blurred images with different scale space factors, a total of n-1 DoG spaces are obtained;
A4, repeat step A3 for each layer image of the image pyramid to obtain the DoG pyramid of g(x, y);
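A compact sketch of the pyramid construction just described (A1–A4); the number of layers, n, σ0 = 1.6 and the choice k = 2^(1/(n-1)) are illustrative SIFT-style defaults rather than values fixed by the description:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(g, layers=4, n=5, sigma0=1.6):
    """S201 sketch: Gaussian-blurred images and DoG spaces for each pyramid layer."""
    k = 2.0 ** (1.0 / (n - 1))                # ratio between adjacent scale space factors
    layer = g.astype(np.float64)
    dog = []
    for _ in range(layers):
        # n Gaussian-blurred versions of this layer with increasing scale factors
        blurred = [gaussian_filter(layer, sigma0 * k ** i) for i in range(n)]
        # adjacent differences give the n-1 DoG spaces of this layer
        dog.append([blurred[i + 1] - blurred[i] for i in range(n - 1)])
        layer = blurred[0][::2, ::2]          # downsample to form the next, smaller layer
    return dog
```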
S202, detecting extrema in the DoG pyramid and eliminating the extreme points that do not meet the conditions, so as to obtain the feature points:
B1, for each layer image of the image pyramid, the n-1 DoG spaces obtained from it are processed as follows:
to find the extreme points of the scale space, each pixel point of every DoG space is compared with all of its neighboring points in the same scale space and in the adjacent scale spaces; when its pixel value is larger than the pixel values of all neighboring points or smaller than all of them, the current pixel point is an extreme point;
B2, the obtained extreme points are taken as candidate feature points, and the extreme points that do not meet the conditions are eliminated:
For any candidate feature point x, its offset is defined as Δx and its contrast is the absolute value |D(x)| of D(x); applying a Taylor expansion to D(x):
D(x+Δx) = D(x) + (∂D(x)/∂x)^T·Δx + (1/2)·Δx^T·(∂²D(x)/∂x²)·Δx
Since x is an extreme point of D(x), differentiating the above formula and setting the derivative to 0 gives
Δx = -(∂²D(x)/∂x²)^(-1)·(∂D(x)/∂x)
Substituting the obtained Δx back into the Taylor expansion of D(x) gives
D(x+Δx) = D(x) + (1/2)·(∂D(x)/∂x)^T·Δx
Let the contrast threshold be T; if
|D(x+Δx)| ≥ T
the feature point is retained; otherwise it is removed;
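A sketch of this extremum detection and contrast test: each pixel is compared against its 26 neighbours in the DoG stack and kept only if |D| exceeds the threshold T; the Taylor sub-pixel refinement is deliberately omitted, so this is a simplification of the procedure above, and the default T assumes intensities scaled to [0, 1]:

```python
import numpy as np

def detect_extrema(dog_spaces, T=0.03):
    """S202 sketch: scale-space extrema in the n-1 DoG spaces of one pyramid layer."""
    stack = np.stack(dog_spaces)                       # shape (n-1, H, W)
    keypoints = []
    for s in range(1, stack.shape[0] - 1):
        for x in range(1, stack.shape[1] - 1):
            for y in range(1, stack.shape[2] - 1):
                centre = stack[s, x, y]
                if abs(centre) < T:                    # low-contrast candidate, discard
                    continue
                cube = stack[s - 1:s + 2, x - 1:x + 2, y - 1:y + 2]
                # extremum over the 26 neighbours (8 in this DoG space, 9 in each adjacent one)
                if centre == cube.max() or centre == cube.min():
                    keypoints.append((s, x, y))
    return keypoints
```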
S203, calculating the main direction of the feature points:
at the scale image of the feature point,
L(x,y)=G(x,y,σ)*I(x,y)
the gradient magnitude and direction are computed over the region centered on the feature point with radius 3×1.5σ; the magnitude m(x, y) and direction θ(x, y) of the gradient at each point L(x, y) are obtained by
m(x,y) = sqrt( (L(x+1,y)-L(x-1,y))² + (L(x,y+1)-L(x,y-1))² )
θ(x,y) = arctan( (L(x,y+1)-L(x,y-1)) / (L(x+1,y)-L(x-1,y)) )
After the gradients are computed, a histogram is used to collect the gradient directions and magnitudes of the pixels in the neighborhood of the feature point; the horizontal axis of the histogram is the gradient-direction angle, the vertical axis is the accumulated gradient magnitude for that direction, and the peak of the histogram gives the main direction of the feature point;
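A sketch of the main-direction computation: gradients inside a window of radius 3×1.5σ are accumulated into an orientation histogram and the peak bin is returned (10 bins of 36° each are assumed, one of the two binnings this description mentions; the finite-difference gradients follow the formulas above):

```python
import numpy as np

def main_orientation(L, x, y, sigma, nbins=10):
    """S203 sketch: principal gradient direction of a feature point, in degrees."""
    r = int(round(3 * 1.5 * sigma))                    # neighbourhood radius
    hist = np.zeros(nbins)
    H, W = L.shape
    for i in range(max(1, x - r), min(H - 1, x + r + 1)):
        for j in range(max(1, y - r), min(W - 1, y + r + 1)):
            dx = L[i + 1, j] - L[i - 1, j]
            dy = L[i, j + 1] - L[i, j - 1]
            m = np.hypot(dx, dy)                       # gradient magnitude
            theta = np.degrees(np.arctan2(dy, dx)) % 360.0
            hist[int(theta // (360.0 / nbins)) % nbins] += m
    return (np.argmax(hist) + 0.5) * (360.0 / nbins)   # centre of the peak bin
```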
S204, generating the feature descriptor:
for each feature point, to ensure the rotation invariance of the vector, the coordinate axes are rotated by an angle θ in the neighborhood coordinates centered on the feature point, where θ is the main-direction angle of the feature point;
after rotation, a 16×16 window centered on the main direction is taken; the gradient magnitude and gradient direction of each pixel in the window are computed, and a Gaussian function G(x, y, σ) with σ = 4 is used to assign a weight to the magnitude of each sample point,
where:
G(x,y,σ) = (1/(2πσ²)) · exp(-(x²+y²)/(2σ²))
Finally, the weighted accumulated magnitude in each of 8 directions is computed on every 4×4 sub-block to form a seed point; that is, each keypoint is described by 16 seed points, so that one keypoint produces a 128-dimensional SIFT feature vector;
finally, the length of the obtained feature vector is normalized to further remove the influence of illumination, giving the SIFT feature, i.e., the descriptor.
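In practice the whole of step S2 (scale space, keypoints and 128-dimensional descriptors) can be delegated to an off-the-shelf SIFT implementation instead of being re-implemented; a brief OpenCV sketch, assuming the upscaled image from step S1 has been saved under an illustrative file name:

```python
import cv2

# self-convolved (upscaled) training image produced by step S1; the file name is illustrative
train_up = cv2.imread("train_selfconv.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(train_up, None)
# 'descriptors' is an (N, 128) array: one 128-dimensional SIFT vector per keypoint
```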
S3, matching the training image with the query image based on the extracted SIFT descriptor;
First, the query image is substituted into step S1 to obtain its self-convolution result, and this self-convolution image of the query image is then substituted into step S2 in place of the training image to obtain the feature points and descriptors of the query image. For each descriptor corresponding to a feature point in the training image, the Euclidean distances to the descriptors of all feature points in the query image are computed; if the Euclidean distance between a descriptor on the training image and a descriptor on the query image is smaller than a given threshold, the feature points corresponding to the two descriptors are considered successfully matched.
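A sketch of this threshold matching, assuming descriptors have already been extracted from the self-convolved training and query images; the function name and the default threshold are illustrative, since the description does not fix a value:

```python
import numpy as np

def match_by_distance(desc_train, desc_query, threshold=200.0):
    """Step S3 sketch: match descriptors whose Euclidean distance is below a threshold."""
    # pairwise Euclidean distances, shape (num_train, num_query)
    d = np.linalg.norm(desc_train[:, None, :] - desc_query[None, :, :], axis=2)
    matches = []
    for i in range(d.shape[0]):
        j = int(np.argmin(d[i]))          # nearest query descriptor for training descriptor i
        if d[i, j] < threshold:
            matches.append((i, j))
    return matches
```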
S4, eliminating error matching generated in the matching process.
In step S4, erroneous matches are eliminated using either of the following methods:
1. cross-filtering:
If the Euclidean distance between a descriptor on the training image and a descriptor on the query image is smaller than a given threshold, the feature points corresponding to the two descriptors are considered successfully matched. When a feature point on the training image has been matched to a feature point on the query image, a check in the opposite direction is performed, i.e., the feature point on the query image is matched back against the feature points on the training image; if this reverse matching also succeeds, the match is considered correct, and if it does not succeed, the match is considered a false match and is removed.
2. Ratio test:
For each match, the two nearest-neighbor descriptors are returned, i.e., the two descriptors on the query image with the smallest Euclidean distances to the matched descriptor on the training image; the match is considered correct only when the ratio of the Euclidean distance to the first (nearest) descriptor to the Euclidean distance to the second descriptor is smaller than a set threshold, and if it is larger than the set threshold, the match is considered erroneous and is removed.
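Both rejection strategies are available directly through OpenCV's brute-force matcher; a sketch under the usual interpretation of the ratio test (nearest distance divided by second-nearest distance), with the descriptor arrays assumed to come from step S2 and 0.7 used only as an illustrative ratio:

```python
import cv2

def filter_matches(desc_train, desc_query, ratio=0.7):
    """Step S4 sketch: reject false matches by cross-filtering or a ratio test."""
    # 1. Cross-filtering: keep only matches that are mutual in both directions.
    bf_cross = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    cross_matches = bf_cross.match(desc_train, desc_query)

    # 2. Ratio test: keep a match only when its nearest neighbour is clearly
    #    closer than its second-nearest neighbour.
    bf = cv2.BFMatcher(cv2.NORM_L2)
    pairs = bf.knnMatch(desc_train, desc_query, k=2)
    ratio_matches = [m for m, n in (p for p in pairs if len(p) == 2)
                     if m.distance < ratio * n.distance]
    return cross_matches, ratio_matches
```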
The beneficial effects of the invention are as follows: the method matches images captured by the unmanned aerial vehicle in real time against previously acquired satellite images, and the self-convolution method can effectively remove unnecessary details of the images while retaining or highlighting important edge information.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is an original image in an embodiment;
FIG. 3 is the self-convolution image obtained using a convolution kernel of size 5×5 in an embodiment;
FIG. 4 is an image obtained by extracting edges from the original image using the Prewitt algorithm in an embodiment;
FIG. 5 is an image obtained by extracting edges from the self-convolution image using the Prewitt algorithm in an embodiment;
FIG. 6 is a diagram showing the result of matching an unprocessed training image with a query image in an embodiment;
FIG. 7 is a diagram showing the result of matching a training image processed with Gaussian blur with a query image in an embodiment;
FIG. 8 is a diagram showing the result of matching a training image processed with the self-convolution method with a query image in an embodiment;
FIG. 9 is an image of the third layer of the Gaussian pyramid of a training image in an embodiment;
FIG. 10 is an image of the third layer of the DoG pyramid of a training image in an embodiment;
FIG. 11 is a descriptor of a training image in an embodiment (only the upper left corner is shown).
Detailed Description
The technical solution of the present invention will be described in further detail with reference to the accompanying drawings, but the scope of the present invention is not limited to the following description.
As shown in fig. 1, an unmanned aerial vehicle image upscaling matching method based on local self-convolution comprises the following steps: S1, performing the upscaling operation on a training image by local self-convolution; S2, extracting features of the self-convolved training image by using the SIFT method; S3, matching the training image with the query image; S4, eliminating erroneous matches generated in the matching process.
In step S1, when performing self-convolution on each pixel point in the training image, a matrix of size (2a+1)×(2b+1) centered on that pixel point is used as its convolution kernel;
the pixel values of the pixel points contained in this matrix of size (2a+1)×(2b+1) are used as the matrix elements.
The step S1 includes:
S101, for any point (x0, y0), compute the value g(x0, y0) of the self-convolution image g(x, y) at (x0, y0):
g(x0, y0) = Σ(dx = -a..a) Σ(dy = -b..b) w_(x0,y0)(dx, dy) · f(x0-dx, y0-dy)
wherein
w_(x0,y0) = f_(x0,y0)
f_(x0,y0) represents the matrix of size (2a+1)×(2b+1) of the original image centered on (x0, y0), w_(x0,y0) is the convolution kernel of size (2a+1)×(2b+1) at (x0, y0), and f(x0-dx, y0-dy) represents the pixel value at the pixel point (x0-dx, y0-dy);
S102, repeat step S101 for every point of the training image to obtain the value of the self-convolution image g(x, y) at each point; the resulting self-convolution image g(x, y) is taken as the upscaled image;
for a pixel point on the image boundary, where a full convolution kernel of size (2a+1)×(2b+1) cannot be obtained, the pixel value of that point is taken directly as its self-convolution result.
For a 5×5 image with pixel values x11, x12, …, x55, self-convolution is performed with 3×3 kernels. Since the kernel size is (3, 3), only the values at the positions (2,2), (2,3), (2,4), (3,2), (3,3), (3,4), (4,2), (4,3) and (4,4) change, while the boundary pixels keep their original values; the changed values give the self-convolution result of the image.
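A quick numerical check of this behaviour, reusing the step-S1 sketch given earlier; the random 5×5 array merely stands in for the values x11 … x55:

```python
import numpy as np

rng = np.random.default_rng(0)
img5 = rng.integers(0, 256, size=(5, 5)).astype(np.float64)   # stands in for x11 ... x55

g5 = local_self_convolution(img5, a=1, b=1)                   # 3x3 local kernels
print(np.array_equal(g5[[0, 4], :], img5[[0, 4], :]))         # True: boundary rows unchanged
print(np.array_equal(g5[:, [0, 4]], img5[:, [0, 4]]))         # True: boundary columns unchanged
print(np.any(g5[1:4, 1:4] != img5[1:4, 1:4]))                 # True: the nine interior positions change
```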
When the Prewitt operator is used to compute the edges of this image (only the x direction is shown here), we focus on the (3, 3) position, whose value is
x24(x24+x35)-x22(x22-x33)+x34(x34+x23)-x32(x32-x43)+x44(x44+x33)-x42(x42-x31)+(x14*x25+x34*x45+x43*x54)-(x41*x52+x12*x23+x21*x32)
This is the value extracted by the Prewitt operator at position (3, 3) after the image has been self-convolved. Compared with the value extracted by Prewitt from the non-self-convolved image,
x24-x22+x34-x32+x44-x42
the weighted form takes the surrounding, farther pixels into account with different weights and finally adds two symmetric terms to correct the value.
For fig. 2, when a convolution kernel of size 5×5 is used, the resulting self-convolution image is fig. 3.
It can be seen that the self-convolution eliminates the lines of smaller width. Extracting edges from fig. 2 and fig. 3 with the Prewitt algorithm gives fig. 4 and fig. 5, respectively.
It is obvious that self-convolution of the image effectively removes unnecessary details, while the important edge information of the image is well preserved and even enhanced.
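The comparison of figs. 4 and 5 can be reproduced with a standard filtering call; a sketch that reuses the step-S1 function sketched earlier, with a = b = 2 giving the 5×5 window of the embodiment (the file name is illustrative, and only the x-direction Prewitt kernel is shown):

```python
import cv2
import numpy as np

prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=np.float64)

img = cv2.imread("original.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)  # illustrative path
selfconv = local_self_convolution(img, a=2, b=2)        # 5x5 local kernels, as used for fig. 3

edges_raw = cv2.filter2D(img, -1, prewitt_x)            # cf. fig. 4: edges of the original image
edges_selfconv = cv2.filter2D(selfconv, -1, prewitt_x)  # cf. fig. 5: edges after self-convolution
```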
In step S2, SIFT is used to extract features of the image:
the SIFT algorithm was proposed by Lowe in 2004; it is invariant to rotation, scaling, brightness changes and the like, and is a very stable local feature.
To extract SIFT features, the method mainly comprises the following steps:
An image pyramid is constructed first;
images of different blur levels can be obtained using different Gaussian kernels, and the Gaussian scale space of an image is derived from its convolution with Gaussians of different scales:
L(x,y,σ)=G(x,y,σ)*I(x,y)
where G(x, y, σ) is the Gaussian kernel function,
G(x,y,σ) = (1/(2πσ²)) · exp(-(x²+y²)/(2σ²))
σ is called the scale space factor; it is the standard deviation of the Gaussian normal distribution and reflects the degree to which the image is blurred: the larger its value, the more blurred the image and the larger the corresponding scale.
The scale space is constructed in order to detect feature points that exist at different scales. Let k be the scale factor between two adjacent Gaussian scale spaces; the DoG is defined as:
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)=L(x,y,kσ)-L(x,y,σ)
wherein L (x, y, σ) is the gaussian scale space of the image;
An image pyramid is a group of results obtained from the same image at different resolutions: the original image is first Gaussian-blurred, and the blurred image is then downsampled to obtain a series of images of successively smaller size;
carrying out Gaussian blur on each layer of image of the image pyramid by using different parameters sigma to obtain the Gaussian pyramid;
the resulting gaussian pyramid third layer is shown in fig. 9. In downsampling, the first image of the group of images above the pyramid is downsampled from the third image of the group below. After the construction of the Gaussian pyramid is completed, the adjacent Gaussian pyramids are subtracted to obtain the DoG pyramid. The resulting DoG pyramid is shown in fig. 10.
The purpose of constructing the scale space is to detect feature points that exist at different scales. A good operator for detecting such feature points is the Laplacian of Gaussian (LoG), ∇²G, in its scale-normalized form σ²∇²G, where
∇²G = ∂²G/∂x² + ∂²G/∂y²
direct use of LoG computation is relatively expensive, and DoG is typically used to approximate LoG. Let k be the scale factor of two adjacent gaussian scale spaces, the definition of DoG:
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)=L(x,y,kσ)-L(x,y,σ)
where L (x, y, σ) is the gaussian scale space of the image.
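The approximation D(x, y, σ) ≈ (k − 1)σ²∇²G that justifies replacing the LoG by the DoG can be checked numerically; a small sketch on an arbitrary grid, with σ = 1.6 and k = 2^(1/3) as illustrative values:

```python
import numpy as np

sigma, k = 1.6, 2.0 ** (1.0 / 3.0)
ax = np.linspace(-8.0, 8.0, 401)
x, y = np.meshgrid(ax, ax)
r2 = x ** 2 + y ** 2

def gaussian(s):
    return np.exp(-r2 / (2 * s ** 2)) / (2 * np.pi * s ** 2)

dog = gaussian(k * sigma) - gaussian(sigma)
# analytic Laplacian of the Gaussian: (r^2 - 2*sigma^2) / sigma^4 * G(x, y, sigma)
log = (r2 - 2 * sigma ** 2) / sigma ** 4 * gaussian(sigma)
approx = (k - 1) * sigma ** 2 * log

# first-order approximation in (k - 1): the residual shrinks as k approaches 1
print(np.max(np.abs(dog - approx)), np.max(np.abs(dog)))
```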
Then detecting extremum in the DoG space, and eliminating bad extremum points:
For a candidate feature point x, its offset is defined as Δx and its contrast is the absolute value |D(x)| of D(x); applying a Taylor expansion to D(x):
D(x+Δx) = D(x) + (∂D(x)/∂x)^T·Δx + (1/2)·Δx^T·(∂²D(x)/∂x²)·Δx
Since x is an extreme point of D(x), differentiating the above formula and setting the derivative to 0 gives
Δx = -(∂²D(x)/∂x²)^(-1)·(∂D(x)/∂x)
Substituting the obtained Δx back into the Taylor expansion of D(x) gives
D(x+Δx) = D(x) + (1/2)·(∂D(x)/∂x)^T·Δx
Let the contrast threshold be T; if
|D(x+Δx)| ≥ T
the feature point is retained; otherwise it is removed.
Then, the main direction of the characteristic points is calculated:
At the scale image of the feature point,
L(x,y)=G(x,y,σ)*I(x,y)
the gradient magnitude and angle are computed over the region centered on the feature point with radius 3×1.5σ; the magnitude m(x, y) and direction θ(x, y) of the gradient at each point L(x, y) can be obtained by
m(x,y) = sqrt( (L(x+1,y)-L(x-1,y))² + (L(x,y+1)-L(x,y-1))² )
θ(x,y) = arctan( (L(x,y+1)-L(x,y-1)) / (L(x+1,y)-L(x-1,y)) )
After the gradients are computed, a histogram is used to collect the gradient directions and magnitudes of the pixels in the neighborhood of the feature point. The horizontal axis of the histogram is the gradient-direction angle (the gradient direction ranges from 0 to 360 degrees; the histogram has 10 bins of 36 degrees each, or alternatively 8 bins of 45 degrees each), the vertical axis is the accumulated gradient magnitude for that direction, and the peak of the histogram gives the main direction of the feature point.
Finally, generating feature descriptions
1. The coordinate frame is rotated to the main direction, ensuring rotation invariance.
2. The descriptor is generated, finally forming a 128-dimensional feature vector.
3. Normalization: the length of the feature vector is normalized to further remove the influence of illumination.
The SIFT features are thus obtained.
To ensure the rotation invariance of the vector, the coordinate axes are rotated by θ (the main direction of the feature point) in the neighborhood coordinates centered on the feature point.
After rotation, a 16×16 window centered on the main direction is taken. The gradient magnitude and gradient direction of each pixel in the window are computed and then weighted with a Gaussian window. Finally, the accumulated value in each of 8 directions is computed on every 4×4 sub-block to form a seed point; that is, each keypoint is described by 16 seed points, so that one keypoint produces a 128-dimensional SIFT feature vector. The resulting descriptor is shown in fig. 11.
In step S4, mismatches are eliminated; this embodiment uses the ratio test. The two candidate methods are as follows:
1. Cross-filtering:
If the Euclidean distance between a descriptor on the training image and a descriptor on the query image is smaller than a given threshold, the feature points corresponding to the two descriptors are considered successfully matched. When a feature point on the training image has been matched to a feature point on the query image, a check in the opposite direction is performed, i.e., the feature point on the query image is matched back against the feature points on the training image; if this reverse matching also succeeds, the match is considered correct, and if it does not succeed, the match is considered a false match and is removed.
2. Ratio test:
For each match, the two nearest-neighbor descriptors are returned, i.e., the two descriptors on the query image with the smallest Euclidean distances to the matched descriptor on the training image. The match is considered correct only when the ratio of the Euclidean distance to the first (nearest) descriptor to the Euclidean distance to the second descriptor is smaller than a set threshold; if it is larger than the set threshold, the match is considered erroneous and is removed.
While the foregoing description illustrates and describes a preferred embodiment of the present invention, it is to be understood that the invention is not limited to the form disclosed herein and is not to be construed as excluding other embodiments; it is capable of use in various other combinations, modifications and environments, and of changes or modifications within the scope of the inventive concept described herein, whether guided by the above teachings or by the knowledge or skill of the relevant art. All modifications and variations that do not depart from the spirit and scope of the invention are intended to fall within the scope of the appended claims.

Claims (6)

1. An unmanned aerial vehicle image upscaling matching method based on local self-convolution, characterized by comprising the following steps:
S1, performing self-convolution on each pixel point in a training image to realize the scale-up operation of the training image;
S2, extracting features of the self-convolved training image by using the SIFT method to obtain descriptors;
s3, matching the training image with the query image based on the extracted SIFT descriptor;
s4, eliminating error matching generated in the matching process.
2. The unmanned aerial vehicle image upscaling matching method based on local self-convolution according to claim 1, characterized in that: in step S1, when performing self-convolution on each pixel point in the training image, a matrix of size (2a+1)×(2b+1) centered on that pixel point is used as its convolution kernel;
the pixel values of the pixel points contained in this matrix of size (2a+1)×(2b+1) are used as the matrix elements.
3. The unmanned aerial vehicle image upscaling matching method based on local self-convolution according to claim 2, characterized in that step S1 comprises:
S101, for any point (x0, y0), compute the value g(x0, y0) of the self-convolution image g(x, y) at (x0, y0):
g(x0, y0) = Σ(dx = -a..a) Σ(dy = -b..b) w_(x0,y0)(dx, dy) · f(x0-dx, y0-dy)
wherein
w_(x0,y0) = f_(x0,y0)
f_(x0,y0) represents the matrix of size (2a+1)×(2b+1) of the original image centered on (x0, y0), w_(x0,y0) is the convolution kernel of size (2a+1)×(2b+1) at (x0, y0), and f(x0-dx, y0-dy) represents the pixel value at the pixel point (x0-dx, y0-dy);
S102, repeat step S101 for every point of the training image to obtain the value of the self-convolution image g(x, y) at each point; the resulting self-convolution image g(x, y) is taken as the upscaled image;
for a pixel point on the image boundary, where a full convolution kernel of size (2a+1)×(2b+1) cannot be obtained, the pixel value of that point is taken directly as its self-convolution result.
4. The unmanned aerial vehicle image upscaling matching method based on local self-convolution according to claim 1, characterized in that step S2 comprises the following sub-steps:
S201, constructing a DoG pyramid of the upscaled image g(x, y):
A1, constructing a Gaussian scale space as the Gaussian blur result:
the Gaussian scale space of an image is defined as a function L(x, y, σ) obtained by convolving the Gaussian kernel function G(x, y, σ) with the input image I(x, y):
L(x,y,σ)=G(x,y,σ)*I(x,y)
where * denotes convolution and
G(x,y,σ) = (1/(2πσ²)) · exp(-(x²+y²)/(2σ²))
σ is called the scale space factor; it is the standard deviation of the Gaussian normal distribution and reflects the degree to which the image is blurred: the larger its value, the more blurred the image and the larger the corresponding scale;
A2, first apply Gaussian blur to the obtained upscaled image g(x, y); the Gaussian-blurred version of g(x, y) serves as the first layer of the Gaussian pyramid. Then repeatedly downsample, starting from this blurred image, to obtain a series of successively smaller images; each downsampled image forms one layer, and the layers, taken in downsampling order, form the image pyramid of g(x, y);
A3, Gaussian-blur each layer image of the image pyramid with n sequentially arranged scale space factors to obtain n Gaussian-blurred images with different scale space factors; for any two adjacent scale space factors, the ratio of the latter to the former is k;
for the n Gaussian-blurred images with different scale space factors, the DoG spaces are computed:
for the Gaussian-blurred images corresponding to two adjacent scale space factors, the DoG is computed as follows:
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)=L(x,y,kσ)-L(x,y,σ)
wherein L (x, y, σ) is the gaussian scale space of the image;
Since there are n Gaussian-blurred images with different scale space factors, a total of n-1 DoG spaces are obtained;
A4, repeat step A3 for each layer image of the image pyramid to obtain the DoG pyramid of g(x, y);
S202, detecting extrema in the DoG pyramid and eliminating the extreme points that do not meet the conditions, so as to obtain the feature points:
for each layer image of the image pyramid, the n-1 DoG spaces obtained from it are processed as follows:
to find the extreme points of the scale space, each pixel point of every DoG space is compared with all of its neighboring points in the same scale space and in the adjacent scale spaces; when its pixel value is larger than the pixel values of all neighboring points or smaller than all of them, the current pixel point is an extreme point;
then, the obtained extreme points are taken as candidate feature points, and the extreme points that do not meet the conditions are eliminated:
For any candidate feature point x, its offset is defined as Δx and its contrast is the absolute value |D(x)| of D(x); applying a Taylor expansion to D(x):
D(x+Δx) = D(x) + (∂D(x)/∂x)^T·Δx + (1/2)·Δx^T·(∂²D(x)/∂x²)·Δx
Since x is an extreme point of D(x), differentiating the above formula and setting the derivative to 0 gives
Δx = -(∂²D(x)/∂x²)^(-1)·(∂D(x)/∂x)
Substituting the obtained Δx back into the Taylor expansion of D(x) gives
D(x+Δx) = D(x) + (1/2)·(∂D(x)/∂x)^T·Δx
Let the contrast threshold be T; if
|D(x+Δx)| ≥ T
the feature point is retained; otherwise it is removed;
S203, calculating the main direction of the feature points:
at the scale image of the feature point,
L(x,y)=G(x,y,σ)*I(x,y)
the gradient magnitude and direction are computed over the region centered on the feature point with radius 3×1.5σ; the magnitude m(x, y) and direction θ(x, y) of the gradient at each point L(x, y) are obtained by
m(x,y) = sqrt( (L(x+1,y)-L(x-1,y))² + (L(x,y+1)-L(x,y-1))² )
θ(x,y) = arctan( (L(x,y+1)-L(x,y-1)) / (L(x+1,y)-L(x-1,y)) )
After the gradients are computed, a histogram is used to collect the gradient directions and magnitudes of the pixels in the neighborhood of the feature point; the horizontal axis of the histogram is the gradient-direction angle, the vertical axis is the accumulated gradient magnitude for that direction, and the peak of the histogram gives the main direction of the feature point;
S204, generating the feature descriptor:
for each feature point, to ensure the rotation invariance of the vector, the coordinate axes are rotated by an angle θ in the neighborhood coordinates centered on the feature point, where θ is the main-direction angle of the feature point;
after rotation, a 16×16 window centered on the main direction is taken; the gradient magnitude and gradient direction of each pixel in the window are computed, and a Gaussian function G(x, y, σ) with σ = 4 is used to assign a weight to the magnitude of each sample point,
where:
G(x,y,σ) = (1/(2πσ²)) · exp(-(x²+y²)/(2σ²))
Finally, the weighted accumulated magnitude in each of 8 directions is computed on every 4×4 sub-block to form a seed point; that is, each keypoint is described by 16 seed points, so that one keypoint produces a 128-dimensional SIFT feature vector;
finally, the length of the obtained feature vector is normalized to further remove the influence of illumination, giving the SIFT feature, i.e., the descriptor.
5. The unmanned aerial vehicle image upscaling matching method based on local self-convolution according to claim 1, characterized in that step S3 comprises:
first, the query image is substituted into step S1 to obtain its self-convolution result, and this self-convolution image of the query image is then substituted into step S2 in place of the training image to obtain the feature points and descriptors of the query image; for each descriptor corresponding to a feature point in the training image, the Euclidean distances to the descriptors of all feature points in the query image are computed, and if the Euclidean distance between a descriptor on the training image and a descriptor on the query image is smaller than a given threshold, the feature points corresponding to the two descriptors are considered successfully matched.
6. The unmanned aerial vehicle image upscaling matching method based on local self-convolution according to claim 1, characterized in that, when eliminating erroneous matches in step S4, either of the following methods is used:
1. cross-filtering:
If the Euclidean distance between a descriptor on the training image and a descriptor on the query image is smaller than a given threshold, the feature points corresponding to the two descriptors are considered successfully matched. When a feature point on the training image has been matched to a feature point on the query image, a check in the opposite direction is performed, i.e., the feature point on the query image is matched back against the feature points on the training image; if this reverse matching also succeeds, the match is considered correct, and if it does not succeed, the match is considered a false match and is removed.
2. Ratio test:
For each match, the two nearest-neighbor descriptors are returned, i.e., the two descriptors on the query image with the smallest Euclidean distances to the matched descriptor on the training image; the match is considered correct only when the ratio of the Euclidean distance to the first (nearest) descriptor to the Euclidean distance to the second descriptor is smaller than a set threshold, and if it is larger than the set threshold, the match is considered erroneous and is removed.
CN202211717727.2A 2022-12-29 2022-12-29 Unmanned aerial vehicle image upscaling matching method based on local self-convolution Pending CN116206139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211717727.2A CN116206139A (en) 2022-12-29 2022-12-29 Unmanned aerial vehicle image upscaling matching method based on local self-convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211717727.2A CN116206139A (en) 2022-12-29 2022-12-29 Unmanned aerial vehicle image upscaling matching method based on local self-convolution

Publications (1)

Publication Number Publication Date
CN116206139A true CN116206139A (en) 2023-06-02

Family

ID=86508575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211717727.2A Pending CN116206139A (en) 2022-12-29 2022-12-29 Unmanned aerial vehicle image upscaling matching method based on local self-convolution

Country Status (1)

Country Link
CN (1) CN116206139A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117132913A (en) * 2023-10-26 2023-11-28 山东科技大学 Ground surface horizontal displacement calculation method based on unmanned aerial vehicle remote sensing and feature recognition matching
CN117132913B (en) * 2023-10-26 2024-01-26 山东科技大学 Ground surface horizontal displacement calculation method based on unmanned aerial vehicle remote sensing and feature recognition matching

Similar Documents

Publication Publication Date Title
EP2534612B1 (en) Efficient scale-space extraction and description of interest points
EP2138978B1 (en) System and method for finding stable keypoints in a picture image using localized scale space properties
Bouchiha et al. Automatic remote-sensing image registration using SURF
CN108986152B (en) Foreign matter detection method and device based on difference image
CN103065135A (en) License number matching algorithm based on digital image processing
CN110634137A (en) Bridge deformation monitoring method, device and equipment based on visual perception
CN111242050A (en) Automatic change detection method for remote sensing image in large-scale complex scene
CN112017223A (en) Heterologous image registration method based on improved SIFT-Delaunay
CN114897705A (en) Unmanned aerial vehicle remote sensing image splicing method based on feature optimization
Liu et al. Multi-sensor image registration by combining local self-similarity matching and mutual information
CN114359591A (en) Self-adaptive image matching algorithm with edge features fused
CN112614167A (en) Rock slice image alignment method combining single-polarization and orthogonal-polarization images
CN110516731B (en) Visual odometer feature point detection method and system based on deep learning
CN112907580A (en) Image feature extraction and matching algorithm applied to comprehensive point-line features in weak texture scene
CN116206139A (en) Unmanned aerial vehicle image upscaling matching method based on local self-convolution
CN110929598A (en) Unmanned aerial vehicle-mounted SAR image matching method based on contour features
CN107808165B (en) Infrared image matching method based on SUSAN corner detection
CN115205558B (en) Multi-mode image matching method and device with rotation and scale invariance
CN116091998A (en) Image processing method, device, computer equipment and storage medium
CN114004770B (en) Method and device for accurately correcting satellite space-time diagram and storage medium
CN114255398A (en) Method and device for extracting and matching features of satellite video image
CN115601569A (en) Different-source image optimization matching method and system based on improved PIIFD
CN114972453A (en) Improved SAR image region registration method based on LSD and template matching
Hou et al. Navigation landmark recognition and matching algorithm based on the improved SURF
CN113222028A (en) Image feature point real-time matching method based on multi-scale neighborhood gradient model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination